
in.emaild.mta high cpu usage


I've been seeing extremely high CPU usage from in.emaild.mta for the past 2 days. No significant changes have been made.

 

Currently it's consuming 120% of CPU (5400, dual core). I've tried rebooting and failing over. I'm not seeing much in the queue when running tecli show emulator queue, but there are 4 items stuck in there (we are using cloud); the cloud queue is rolling through fast as well.

 

I'm a little lost as to why the CPU usage has shot up. Looking at the logs, we're not seeing any significant increase in mail traffic.

 

2020-01-15_14h40_09.png

 

 

fw ctl multik stat
ID | Active  | CPU    | Connections | Peak
----------------------------------------------
 0 | Yes     | 1      |        4738 |     9502
 1 | Yes     | 0      |        4738 |     9609

 

 

  

 


Accepted Solutions

MTA 

1) An e-mail is sent to the MTA (on the Security Gateway) on TCP port 25 (this is the only supported port).

2) Postfix on the Security Gateway receives all e-mails (clear and encrypted) and responds to the sender.

3) Postfix on the Security Gateway decrypts the e-mail (if needed) and saves it in the incoming queue (marked as PF1 on the diagram below).

4) The in.emaild.mta process is configured as the Postfix content filter.
Each e-mail is sent by Postfix to the in.emaild.mta process on TCP port 10025.
The e-mail is parsed by the MIME parser, and the attachments (if any) are sent to the Threat Emulation Daemon (ted) for emulation.
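
As a quick sanity check of the flow above, here is a minimal sketch that tests whether a TCP port accepts connections. The function name port_is_open is hypothetical, and probing 127.0.0.1 assumes the script runs on the gateway itself and that both listeners are reachable on loopback, which the steps above do not state. Only the port numbers 25 and 10025 come from the description. This is a generic socket check, not a Check Point tool.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example use (assumption: run locally on the gateway):
#   port_is_open("127.0.0.1", 25)     # Postfix SMTP listener
#   port_is_open("127.0.0.1", 10025)  # in.emaild.mta content filter
```

If port 25 answers but 10025 does not, the problem is more likely in the content filter than in Postfix itself, though that is only a first hint.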

Flow Diagram 1.png

in.emaild.mta is used for:

Anti-Virus over MTA: Anti-Virus is supported on MTA in R80.10 and R80.20 with the latest engine update.

Anti-Spam over MTA: the MTA can perform Anti-Spam functions.

Threat Emulation over MTA: uses TED.

Threat Extraction over MTA: uses scrub.

 

To debug MTA performance, refer to the MTA Debugging and Performance Troubleshooting Toolkit.

To debug in.emaild.mta, refer to sk60387:

  1. Start debug:
    fw debug in.emaild.mta on TDERROR_ALL_ALL=5
  2. Replicate the issue
  3. Stop debug:
    fw debug in.emaild.mta off TDERROR_ALL_ALL=0
  4. Analyze:
    $FWDIR/log/emaild.mta.elg*
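
For step 4, a small sketch like the following can list the matching debug files newest first, so the most recent output is read before older files. The helper name list_debug_logs is hypothetical; the glob pattern is the one given in step 4, and FWDIR is assumed to be set in the environment as it is in a gateway shell.

```python
import glob
import os


def list_debug_logs(fwdir=None):
    """Return paths matching $FWDIR/log/emaild.mta.elg*, newest first."""
    # Fall back to the FWDIR environment variable when no directory is given.
    base = fwdir if fwdir is not None else os.environ["FWDIR"]
    pattern = os.path.join(base, "log", "emaild.mta.elg*")
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)
```

Calling list_debug_logs() with no argument raises KeyError if FWDIR is not set, which makes a missing environment obvious instead of silently searching the wrong directory.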

 

 

Employee+

Hi 

I'd like to update that we've identified a problem with our AV-related code that causes this high CPU.

We will issue a fix for this tomorrow in the form of a new MTA engine update.

Customers who do not use AV with MTA are not affected by this issue. 

Feel free to post/DM for any questions and we'll answer. 

 

14 Replies

I'm going to follow up on my own post.

 

After digging, I've found that the Anti-Virus behavior has changed. For some reason, all of the links in our internal mail signatures are now being scanned. I'm trying to find out what caused this change and to stop these links from being scanned constantly. I would have expected the links to be hashed and saved so that they don't need to be scanned every time.


Hi @David_Spencer,

The problem has been occurring more and more frequently with some customers in recent days.

PS:
Hi @PhoneBoy,

Is this a known problem? Can TAC or R&D share any information here, or should we open a ticket?


It looks like this happened after /scripts/del_all_tmp_files.py was run, removing all files from /var/log/opt/CPsuite-R80/fw1//tmp/dlp

Not sure if it's related but that's all I've been able to find so far.

 

EDIT

 

It looks like /var/log/opt/CPsuite-R80/fw1/log/ is missing almost all its files as well.

 

We are on R80.30 and the files exist in /CPsuite-R80.30/, but the file count in /CPsuite-R80/ used to be much larger. It looks like we lost about 15 GB of files in the /CPsuite-R80/ directory.


We have the exact same issue. Yesterday at about 13:30, the emaild.mta CPU and memory usage shot up on both members of an R80.30 cluster. We don't see anything unusual logged at the time, and we still have old files in /var/log/opt/CPsuite-R80/fw1//tmp/dlp and /var/log/opt/CPsuite-R80/fw1/log/

  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7336 15   0 2259m 1.6g  26m S  184 21.6   1282:30 in.emaild.mta
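
To track this over time rather than eyeballing it, a process row like the one above can be parsed into named fields. This is a minimal sketch that assumes the exact column layout shown in the header above (PID through COMMAND); parse_top_row is a hypothetical helper name, and other top versions or configurations may print different columns.

```python
TOP_COLUMNS = ["PID", "PR", "NI", "VIRT", "RES", "SHR", "S",
               "%CPU", "%MEM", "TIME+", "COMMAND"]


def parse_top_row(line: str) -> dict:
    """Split one top process row into a dict keyed by the column names above."""
    # Split on runs of whitespace; the last field keeps the full command.
    fields = line.split(None, len(TOP_COLUMNS) - 1)
    if len(fields) != len(TOP_COLUMNS):
        raise ValueError(f"unexpected top row: {line!r}")
    row = dict(zip(TOP_COLUMNS, fields))
    row["%CPU"] = float(row["%CPU"])
    row["%MEM"] = float(row["%MEM"])
    return row


row = parse_top_row(" 7336 15   0 2259m 1.6g  26m S  184 21.6   1282:30 in.emaild.mta")
print(row["COMMAND"], row["%CPU"], row["%MEM"])
```

A %CPU value above 100 here means the process is using more than one core's worth of CPU time, which matches the multi-core figures reported earlier in the thread.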

CPU usage below:

image.png

 

There is nothing out of the ordinary in the logs. We do see the "Save Sender ID lists" event happen at about this time every hour; not sure if that's related. IPS and other updates run much earlier in the day.

 

 

 


Something I found in my emaild.smtp.elg: the entire file is filled with the following:

 

 

 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,ff89,6cb16f1b,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,c8ff,3407587b,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,cdba,b9682c38,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,964e,682f3d24,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,d275,6cb16f1a,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,b760,3407587b,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,a7e9,b9682c38,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
 fw_track_conn: <cc6510ae,d730,6cb16f1a,19,6> already tracked
 fwd_add_to_tracked: fw_track_conn failed: Operation now in progress
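
To see which connections these messages refer to, the tuples can be decoded and counted. The sketch below assumes each tuple is <source IP, source port, destination IP, destination port, IP protocol> written in hexadecimal; nothing in the log excerpt confirms that layout, so treat the decoded values as a guess. The names decode_tracked and count_destinations are hypothetical.

```python
import ipaddress
import re
from collections import Counter

# Matches lines such as: fw_track_conn: <cc6510ae,ff89,6cb16f1b,19,6> already tracked
TUPLE_RE = re.compile(
    r"fw_track_conn: <([0-9a-f]+),([0-9a-f]+),([0-9a-f]+),([0-9a-f]+),([0-9a-f]+)>"
    r" already tracked"
)


def decode_tracked(line):
    """Decode one 'already tracked' line, or return None if it does not match."""
    m = TUPLE_RE.search(line)
    if m is None:
        return None
    # Assumed field order: src IP, src port, dst IP, dst port, protocol (all hex).
    src, sport, dst, dport, proto = (int(x, 16) for x in m.groups())
    return (str(ipaddress.IPv4Address(src)), sport,
            str(ipaddress.IPv4Address(dst)), dport, proto)


def count_destinations(lines):
    """Count 'already tracked' messages per (destination IP, destination port)."""
    counts = Counter()
    for line in lines:
        decoded = decode_tracked(line)
        if decoded is not None:
            counts[(decoded[2], decoded[3])] += 1
    return counts
```

Under that assumption, the fourth field 19 is port 25 and the fifth field 6 is TCP, which would make these SMTP connections. Running count_destinations over the whole file shows whether a handful of destinations account for most of the messages.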

 

 

I also found that my scrubd.elg file looks off: only a few lines a day show up. Not sure if this is normal.

[15 Jan  8:00:22] Warning:cp_timed_blocker_handler: A handler [0x80ee990] blocked for 4 seconds.
[15 Jan  8:00:22] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xa6990].
[15 Jan  9:00:30] Warning:cp_timed_blocker_handler: A handler [0x80ee990] blocked for 7 seconds.
[15 Jan  9:00:30] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xa6990].
[15 Jan  9:10:59] Warning:cp_timed_blocker_handler: A handler [0xf49b9200] blocked for 3 seconds.
[15 Jan  9:10:59] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPshrd-R80.30/lib/libEntitlementStatusCollector.so], Function offset [0xd200].
[15 Jan  9:41:39] Warning:cp_timed_blocker_handler: A handler [0x80ee990] blocked for 21 seconds.
[15 Jan  9:41:39] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xa6990].
[15 Jan  9:42:59] Warning:cp_timed_blocker_handler: A handler [0xf76c5ed0] blocked for 5 seconds.
[15 Jan  9:42:59] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPsuite-R80.30/fw1/lib/libDaemonBasics.so], Function offset [0xfbed0].
[15 Jan  9:42:59] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_ZN3NAC2IS22BasicDaemonApplication13T_ReconfAsyncEPv], offset [0xfbed0].
[16 Jan  3:57:11] Warning:cp_timed_blocker_handler: A handler [0x80ee990] blocked for 6 seconds.
[16 Jan  3:57:11] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xa6990].
[16 Jan 14:56:10] Warning:cp_timed_blocker_handler: A handler [0x80ee990] blocked for 10 seconds.
[16 Jan 14:56:10] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xa6990].
[17 Jan  2:30:57] Warning:cp_timed_blocker_handler: A handler [0x80f4f00] blocked for 8 seconds.
[17 Jan  2:30:57] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xacf00].
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: A handler [0x80ffc10] blocked for 5 seconds.
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: Handler info: Library [scrubd], Function offset [0xb7c10].
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [_ZN11ScrubDaemon20s_AMWInstallPolicyCBEP7fwd_envPcS2_S2_iPvS3_], offset [0xb7c10].
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: A handler [0xf7d433c0] blocked for 5 seconds.
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPshrd-R80.30/lib/libmessaging.so], Function offset [0x43c0].
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: A handler [0xf785a2e0] blocked for 5 seconds.
[17 Jan  8:08:49] Warning:cp_timed_blocker_handler: Handler info: Library [/opt/CPshrd-R80.30/lib/libComUtils.so], Function offset [0x1a2e0].
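
To judge whether these warnings are getting worse, the blocked durations can be totalled per handler address. This is a minimal sketch based only on the line format visible in the excerpt above; summarize_blocked is a hypothetical helper name, and the regular expression may need adjusting if other builds word the message differently.

```python
import re
from collections import defaultdict

# Matches: Warning:cp_timed_blocker_handler: A handler [0x80ee990] blocked for 4 seconds.
BLOCKED_RE = re.compile(
    r"Warning:cp_timed_blocker_handler: A handler \[(0x[0-9a-f]+)\]"
    r" blocked for (\d+) seconds"
)


def summarize_blocked(text: str) -> dict:
    """Return {handler: {"count", "total", "max"}} for all 'blocked for' warnings."""
    stats = defaultdict(lambda: {"count": 0, "total": 0, "max": 0})
    # findall scans the whole text, so entries fused onto one line are still found.
    for handler, seconds in BLOCKED_RE.findall(text):
        secs = int(seconds)
        entry = stats[handler]
        entry["count"] += 1
        entry["total"] += secs
        entry["max"] = max(entry["max"], secs)
    return dict(stats)
```

Feeding it the contents of scrubd.elg gives one row per handler address, which makes it easier to compare a normal day with a day when the CPU problem occurs.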

 

Admin
Possibly a recent update to the Threat Emulation engine is to blame.
Best to open a TAC case so we can investigate.

Stopping the AV blade and then killing in.emaild.mta has been my workaround for now.

I've got a TAC case open, they've deferred to R&D for this. Hoping to get the AV blade enabled again soon.


We have the same issue!

Gold

We have also had reports from some customers with the same problem. For the past few days, the "in.emaild.mta" process has been at 90-100% CPU. We see spikes every 1-2 minutes, each lasting about 1 minute.

With the AV blade disabled, utilization returns to normal.

Just as an FYI, we're seeing CPU usage back to normal on our cluster since 09:15 GMT today. We have a ticket open with TAC about this, which we'll keep on hold for 24 hours to confirm everything is OK. Thanks.
Employee+

Correct:

The issue was related to an SSL certificate change we made in our cloud services last week, combined with an error in our handling of certificate-related errors. We have fixed the issue on the SSL certificate side, so all MTA CPU usage should have been back to normal since about 3 hours ago, without the need for an engine update.

We will still issue a new MTA engine update soon (as planned), which includes a fix for this issue as well as other stability improvements.

 


We have had the same issue since installing MTA Hotfix 68.

My solution to get rid of the 100% CPU usage from emaild.mta was to disable the option "Activate Continuous Download" under Anti-Spam & Mail -> Advanced -> SMTP.

Maybe this is also activated in your setup... take a look.
