Solved: in.emaild.mta high CPU usage

NorthernNetGuy · 2020-01-15

I've been seeing extremely high CPU usage from in.emaild.mta for the past 2 days. No significant changes have been made.

It's currently consuming 120% CPU (5400 appliance, dual core). I've tried rebooting and failing over. I'm not seeing much in the queue when running tecli show emulator queue, but there are 4 items stuck in there (we are using cloud emulation), and the cloud queue is rolling through fast as well.

I'm a little lost as to why the CPU usage has shot up. Looking at the logs, we're not seeing any significant increase in mail traffic.

fw ctl multik stat
ID | Active  | CPU    | Connections | Peak
----------------------------------------------
 0 | Yes     | 1      |        4738 |     9502
 1 | Yes     | 0      |        4738 |     9609

HeikoAnkenbrand · 2020-01-15

MTA

1) An e-mail is sent to the MTA (on the Security Gateway) on TCP port 25 (the only supported port).

2) Postfix on the Security Gateway receives all e-mails (clear and encrypted) and responds to the sender.

3) Postfix on the Security Gateway decrypts the e-mail (if needed) and saves it in the incoming queue (marked PF¹ in the MTA flow diagram).

4) The in.emaild.mta process is configured as the Postfix content filter.
Postfix sends each e-mail to the in.emaild.mta process on TCP port 10025.
The e-mail is parsed by the MIME parser, and any attachments are sent to the Threat Emulation daemon (ted) for emulation.
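To illustrate the MIME-parsing part of step 4, here is a minimal sketch using Python's standard email package. It only walks the message and lists the parts marked as attachments (the parts that, in the real flow, would be handed to ted). It is not Check Point's implementation, and list_attachments is a hypothetical helper name.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def list_attachments(raw: bytes) -> list:
    """Return (filename, content type, decoded size) for each attachment part.

    Hypothetical helper: models only the MIME-parsing step, not the hand-off to ted.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    found = []
    for part in msg.walk():  # walks nested multiparts as well
        if part.get_content_disposition() == "attachment":
            payload = part.get_payload(decode=True) or b""
            found.append((part.get_filename(), part.get_content_type(), len(payload)))
    return found


if __name__ == "__main__":
    # Build a small test message with one PDF attachment.
    demo = EmailMessage()
    demo["From"] = "alice@example.com"
    demo["To"] = "bob@example.com"
    demo["Subject"] = "Report"
    demo.set_content("See attached.")
    demo.add_attachment(b"%PDF-1.4 demo", maintype="application",
                        subtype="pdf", filename="report.pdf")
    print(list_attachments(demo.as_bytes()))
```

A message with no attachment parts simply returns an empty list.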

in.emaild.mta is used by:

Anti-Virus over MTA: Anti-Virus is supported on MTA in R80.10 and R80.20 with the latest engine update.

Anti-Spam over MTA: MTA can provide Anti-Spam protection.

Threat Emulation over MTA: uses ted.

Threat Extraction over MTA: uses scrubd.

To debug MTA performance, refer to the MTA Debugging and Performance Troubleshooting Toolkit.

To debug in.emaild.mta, refer to sk60387.

Start debug:
fw debug in.emaild.mta on TDERROR_ALL_ALL=5
Replicate the issue
Stop debug:
fw debug in.emaild.mta off TDERROR_ALL_ALL=0
Analyze:
$FWDIR/log/emaild.mta.elg*
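With TDERROR_ALL_ALL=5 the .elg files grow quickly, so collapsing repeated messages can make it easier to see what dominates the log. A minimal sketch, assuming plain-text .elg files; the script and its function names are hypothetical, not a Check Point tool. It strips an optional leading "[date time]" stamp and masks hex/decimal tokens so that lines differing only in addresses, ports or durations are counted together.

```python
import glob
import os
import re
import sys
from collections import Counter

# Optional leading timestamp such as "[15 Jan  8:00:22] " seen in .elg excerpts.
TIMESTAMP = re.compile(r"^\[[^\]]*\]\s*")
# Hex/decimal tokens (addresses, ports, durations) differ per line; mask them
# so that repeated messages collapse into one pattern.
NUMBER = re.compile(r"\b(?:0x[0-9a-fA-F]+|[0-9a-fA-F]*\d[0-9a-fA-F]*)\b")


def normalize(line):
    return NUMBER.sub("N", TIMESTAMP.sub("", line.strip()))


def summarize(lines, top=10):
    """Return the `top` most frequent normalized messages with their counts."""
    return Counter(normalize(line) for line in lines if line.strip()).most_common(top)


def read_logs(pattern):
    """Yield lines from every file matching pattern (e.g. rotated .elg files)."""
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            yield from fh


if __name__ == "__main__":
    # Defaults to $FWDIR/log/emaild.mta.elg*, the files named in the debug steps.
    fwdir = os.environ.get("FWDIR", ".")
    default = os.path.join(fwdir, "log", "emaild.mta.elg*")
    pattern = sys.argv[1] if len(sys.argv) > 1 else default
    for message, count in summarize(read_logs(pattern)):
        print(f"{count:8d}  {message}")
```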

➜ CCSM Elite, CCME, CCTE ➜ www.checkpoint.tips

TP_Master · 2020-01-20

Hi,

An update: we've identified a problem in our AV-related code that causes this high CPU usage.

We will issue a fix for this tomorrow in the form of a new MTA engine update.

Customers who do not use AV with MTA are not affected by this issue.

Feel free to post or DM any questions and we'll answer them.

NorthernNetGuy · 2020-01-15

I'm going to follow up on my own post.

After digging, I've found that the Anti-Virus behavior has changed. For some reason, all of the links in our internal mail signatures are now being scanned. I'm trying to find out what caused this behavior change and to stop these links from being scanned constantly. I would have expected the links to be hashed and saved so that they don't need to be scanned every time.

HeikoAnkenbrand · 2020-01-16

Hi @NorthernNetGuy,

The problem has been occurring more and more frequently with some customers in recent days.

PS:
Hi @PhoneBoy,

Is this a known problem? Can TAC or R&D share any information here, or should we open a ticket?

NorthernNetGuy · 2020-01-16

It looks like this happened after /scripts/del_all_tmp_files.py was run, which removed all files from /var/log/opt/CPsuite-R80/fw1//tmp/dlp

Not sure if it's related, but that's all I've been able to find so far.

EDIT

It looks like /var/log/opt/CPsuite-R80/fw1/log/ is missing almost all its files as well.

We are on R80.30 and the files exist in /CPsuite-R80.30/, but the file count in /CPsuite-R80/ used to be much larger. It looks like we lost about 15 GB of files in the /CPsuite-R80/ directory.
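To put numbers on what disappeared, you can compare file counts and total sizes across the directories mentioned above. A minimal standard-library sketch; dir_stats is a hypothetical helper name, and which directories you pass in is up to you (for example /var/log/opt/CPsuite-R80/fw1/log from the post).

```python
import os
import stat
import sys


def dir_stats(root):
    """Return (file_count, total_bytes) for regular files under root.

    Symlinks are not followed; files that disappear mid-scan are skipped.
    A missing root simply yields (0, 0).
    """
    count = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # removed (e.g. by a cleanup script) while walking
            if stat.S_ISREG(st.st_mode):
                count += 1
                total += st.st_size
    return count, total


if __name__ == "__main__":
    # Pass the directories to compare on the command line.
    for root in sys.argv[1:]:
        files, size = dir_stats(root)
        print(f"{root}: {files} files, {size / 1024 ** 3:.2f} GiB")
```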

Gareth_somers · 2020-01-16

We have the exact same issue. Yesterday at about 13:30, in.emaild.mta CPU and memory usage shot up on both members of an R80.30 cluster. We don't see anything unusual logged at that time, and we still have the old files in /var/log/opt/CPsuite-R80/fw1//tmp/dlp and /var/log/opt/CPsuite-R80/fw1/log/.

PID PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7336 15 0 2259m 1.6g 26m S 184 21.6 1282:30 in.emaild.mta
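Assuming top is in its default Irix mode, %CPU is relative to a single core, so values above 100% (184% here, 120% in the opening post) mean the process is using more than one core's worth of CPU. A small sketch to pull fields out of a top process row and convert %CPU into a share of the whole box; the helper names are hypothetical, and you have to supply the gateway's core count yourself.

```python
def parse_top_row(header, row):
    """Map top's column names to the values in one process row.

    COMMAND is the last column and may contain spaces, so limit the split.
    """
    columns = header.split()
    return dict(zip(columns, row.split(None, len(columns) - 1)))


def share_of_total(cpu_percent, cores):
    """Convert a per-core %CPU value into a percentage of total CPU capacity."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cpu_percent / cores


if __name__ == "__main__":
    header = "PID PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND"
    row = "7336 15 0 2259m 1.6g 26m S 184 21.6 1282:30 in.emaild.mta"
    fields = parse_top_row(header, row)
    print(fields["COMMAND"], fields["%CPU"])
    # Opening post: 120% on a dual-core 5400 is about 60% of total capacity.
    print(share_of_total(120.0, 2))
```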

CPU usage below:

There is nothing out of the ordinary in the logs. We do see "Save Sender ID lists" happen at about this time every hour, but we're not sure whether that's related. IPS and other updates run much earlier in the day.

NorthernNetGuy · 2020-01-17

Something I found in my emaild.smtp.elg: the entire file is filled with the following:

 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,ff89,6cb16f1b,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,c8ff,3407587b,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,cdba,b9682c38,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,964e,682f3d24,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,d275,6cb16f1a,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,b760,3407587b,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,a7e9,b9682c38,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,d730,6cb16f1a,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
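The <...> values in these lines look like a hex-encoded connection tuple. Assuming the field order is source IP, source port, destination IP, destination port, IP protocol (an interpretation, not something documented in this thread), the last two fields decode to 0x19 = 25 (SMTP) and 6 (TCP), and the source address is the same in every line while the destination varies. A sketch to decode and tally them; the names are hypothetical.

```python
import ipaddress
import re
from collections import Counter

# Assumed layout: <src IP, src port, dst IP, dst port, IP protocol>, all in hex.
CONN = re.compile(
    r"<([0-9a-fA-F]{1,8}),([0-9a-fA-F]{1,4}),([0-9a-fA-F]{1,8}),"
    r"([0-9a-fA-F]{1,4}),([0-9a-fA-F]{1,2})>"
)
PROTOCOLS = {6: "TCP", 17: "UDP"}


def decode_conn(line):
    """Decode the first <...> connection tuple in a log line, or return None."""
    m = CONN.search(line)
    if m is None:
        return None
    src, sport, dst, dport, proto = (int(g, 16) for g in m.groups())
    return {
        "src": str(ipaddress.IPv4Address(src)),
        "sport": sport,
        "dst": str(ipaddress.IPv4Address(dst)),
        "dport": dport,
        "proto": PROTOCOLS.get(proto, str(proto)),
    }


def count_destinations(lines):
    """Count (dst, dport) pairs across lines that contain a tuple."""
    return Counter(
        (c["dst"], c["dport"]) for c in map(decode_conn, lines) if c is not None
    )
```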

I also found that my scrubd.elg file looks off: only a few lines a day show up. Not sure if this is normal.

[15 Jan  8:00:22] Warning:cp_timed_blocker_handler: A handler [0x80ee990] blocked for 4 seconds.
[15 Jan  8:00:22] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xa6990].
[15 Jan  9:00:30] Warning:cp_timed_blocker_handler: A handler [0x80ee990] blocked for 7 seconds.
[15 Jan  9:00:30] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xa6990].
[15 Jan  9:10:59] Warning:cp_timed_blocker_handler: A handler [0xf49b9200] blocked for 3 seconds.
[15 Jan  9:10:59] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPshrd-R80.30/lib/libEntitlementStatusCollector.so], Function offset [0xd200].
[15 Jan  9:41:39] Warning:cp_timed_blocker_handler: A handler [0x80ee990] blocked for 21 seconds.
[15 Jan  9:41:39] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xa6990].
[15 Jan  9:42:59] Warning:cp_timed_blocker_handler: A handler [0xf76c5ed0] blocked for 5 seconds.
[15 Jan  9:42:59] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPsuite-R80.30/fw1/lib/libDaemonBasics.so], Function offset [0xfbed0].
[15 Jan  9:42:59] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_ZN3NAC2IS22BasicDaemonApplication13T_ReconfAsyncEPv], offset [0xfbed0].
[16 Jan  3:57:11] Warning:cp_timed_blocker_handler: A handler [0x80ee990] blocked for 6 seconds.
[16 Jan  3:57:11] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xa6990].
[16 Jan 14:56:10] Warning:cp_timed_blocker_handler: A handler [0x80ee990] blocked for 10 seconds.
[16 Jan 14:56:10] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xa6990].
[17 Jan  2:30:57] Warning:cp_timed_blocker_handler: A handler [0x80f4f00] blocked for 8 seconds.
[17 Jan  2:30:57] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xacf00].
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: A handler [0x80ffc10] blocked for 5 seconds.
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xb7c10].
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_ZN11ScrubDaemon20s_AMWInstallPolicyCBEP7fwd_envPcS2_S2_iPvS3_], offset [0xb7c10].
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: A handler [0xf7d433c0] blocked for 5 seconds.
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPshrd-R80.30/lib/libmessaging.so], Function offset [0x43c0].
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: A handler [0xf785a2e0] blocked for 5 seconds.
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPshrd-R80.30/lib/libComUtils.so], Function offset [0x1a2e0].
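To see which component these warnings point at, you can total the reported blocking time per library. In the excerpt above, scrubd accounts for 61 of the 79 seconds reported. A sketch that assumes each "blocked for N seconds" warning is followed by the "Handler info: Library [...]" line for the same handler, as in this excerpt; the function name is hypothetical. It scans the whole text rather than line by line, so entries that run together on one line are still matched.

```python
import re
from collections import Counter

# A "blocked for N seconds" warning, or the "Handler info: Library [...]" line
# naming the library that owns that handler. Other "Handler info" lines
# (e.g. "Nearest symbol name") do not match and are ignored.
EVENT = re.compile(
    r"A handler \[0x[0-9a-fA-F]+\] blocked for (?P<secs>\d+) seconds"
    r"|Handler info: Library \[(?P<lib>[^\]]+)\]"
)


def blocked_seconds_by_library(text):
    """Sum reported blocking time per library (pairing assumption above)."""
    totals = Counter()
    pending = None  # seconds from the last "blocked" warning not yet attributed
    for m in EVENT.finditer(text):
        if m.group("secs") is not None:
            pending = int(m.group("secs"))
        elif pending is not None:
            totals[m.group("lib")] += pending
            pending = None
    return totals
```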

PhoneBoy · 2020-01-17

Possibly a recent update to the Threat Emulation engine is to blame.
It's best to open a TAC case so we can investigate.

NorthernNetGuy · 2020-01-17

Stopping the AV blade and then killing in.emaild.mta has been my workaround for now.

I've got a TAC case open, and they've escalated it to R&D. Hoping to get the AV blade enabled again soon.

Andi_Schill · 2020-01-17

We have the same issue!

Wolfgang · 2020-01-20

We have also had reports from some customers with the same problem. For the past few days, the in.emaild.mta process has been at 90-100% CPU. We see spikes every 1-2 minutes, each lasting about 1 minute.

With the AV blade disabled, utilization is normal.
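One way to confirm a pattern like "spikes every 1-2 minutes, each lasting about a minute" is to sample the process's CPU usage at a fixed interval (for example with top -b -d 10 -p <pid>) and look for runs above a threshold. A minimal sketch; the 90% threshold and the sample values are illustrative assumptions, not measurements from this thread.

```python
def find_spikes(samples, threshold=90.0):
    """Return (start, end) times of consecutive samples with CPU >= threshold.

    samples: (seconds, cpu_percent) pairs in time order. `end` is the time of
    the last sample at or above the threshold in that run.
    """
    spikes = []
    start = last = None
    for t, cpu in samples:
        if cpu >= threshold:
            if start is None:
                start = t
            last = t
        elif start is not None:
            spikes.append((start, last))
            start = None
    if start is not None:  # a spike still running at the end of the data
        spikes.append((start, last))
    return spikes


def gaps_between(spikes):
    """Seconds between the starts of consecutive spikes."""
    return [b[0] - a[0] for a, b in zip(spikes, spikes[1:])]


if __name__ == "__main__":
    # Illustrative samples every 10 s (made-up values).
    cpu = [20, 95, 96, 97, 30, 30, 92, 93, 20]
    samples = list(zip(range(0, 10 * len(cpu), 10), cpu))
    spikes = find_spikes(samples)
    print(spikes, gaps_between(spikes))
```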

Gareth_somers · 2020-01-21

Just as an FYI, CPU usage on our cluster has been back to normal since 09:15 GMT today. We have a TAC ticket open about this, which we'll keep on hold for 24 hours to confirm everything is OK. Thanks.

TP_Master · 2020-01-21

Correct.

The issue was related to an SSL certificate change we made in our cloud services last week, combined with an error in our handling of certificate-related errors. We have fixed the issue on the SSL certificate side, so MTA CPU usage should have returned to normal starting about 3 hours ago, without the need for an engine update.

We will still issue a new MTA engine update soon (as planned), which includes a fix for this issue as well as stability and other improvements.

christopher · 2020-05-06

We have had the same issue since installing MTA Hotfix 68.

My solution to get rid of the 100% CPU usage from emaild.mta was to disable the option "Activate Continuous Download" under Anti-Spam & Mail -> Advanced -> SMTP.

Maybe this is also enabled in your setup; take a look.
