abideenshaikh
Explorer

Keep getting "Stopping CUL mode" in /var/log/messages

I keep seeing the messages below in /var/log/messages. We have also noticed that our CPU utilization at times peaks above 80%. For the last 24 hours we have been experiencing degraded network performance and increased latency for the devices sitting behind the firewall. Please advise.

 

Apr 8 10:33:13 2021 cor-fw02 kernel: [fw4_1];CLUS-120202-2: Stopping CUL mode after 17 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
Apr 8 10:33:19 2021 cor-fw02 kernel: [fw4_1];CLUS-120200-2: Starting CUL mode because CPU-02 usage (84%) on the local member increased above the configured threshold (80%).
Apr 8 10:33:30 2021 cor-fw02 kernel: [fw4_1];CLUS-120202-2: Stopping CUL mode after 11 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
Apr 8 10:33:35 2021 cor-fw02 kernel: [fw4_1];CLUS-120200-2: Starting CUL mode because CPU-02 usage (82%) on the local member increased above the configured threshold (80%).
Apr 8 10:33:49 2021 cor-fw02 kernel: [fw4_1];CLUS-120202-2: Stopping CUL mode after 14 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
Apr 8 10:33:50 2021 cor-fw02 kernel: [fw4_1];CLUS-120200-2: Starting CUL mode because CPU-02 usage (81%) on the local member increased above the configured threshold (80%).
Apr 8 10:34:00 2021 cor-fw02 kernel: [fw4_1];CLUS-120202-2: Stopping CUL mode after 10 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
Apr 8 10:34:04 2021 cor-fw02 kernel: [fw4_1];CLUS-120200-2: Starting CUL mode because CPU-02 usage (85%) on the local member increased above the configured threshold (80%).
Apr 8 10:34:15 2021 cor-fw02 kernel: [fw4_1];CLUS-120202-2: Stopping CUL mode after 10 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
Apr 8 10:34:18 2021 cor-fw02 kernel: [fw4_1];CLUS-120200-2: Starting CUL mode because CPU-02 usage (89%) on the local member increased above the configured threshold (80%).
Apr 8 10:34:28 2021 cor-fw02 kernel: [fw4_1];CLUS-120202-2: Stopping CUL mode after 10 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
Apr 8 10:34:44 2021 cor-fw02 kernel: [fw4_0];[10.1.3.230:28091 -> 10.45.44.112:25] [ERROR]: fileapp_parser_smtp_process_content_type: invalid state sequence
Apr 8 10:34:44 2021 cor-fw02 kernel: [fw4_0];[10.1.3.230:28091 -> 10.45.44.112:25] [ERROR]: fileapp_parser_execute_ctx: context 68 func failed
Apr 8 10:34:44 2021 cor-fw02 kernel: [fw4_0];[10.1.3.230:28091 -> 10.45.44.112:25] [ERROR]: fileapp_parser_print_execution_failure_info: encountered internal error on context 68
Apr 8 10:34:44 2021 cor-fw02 kernel: [fw4_0];[10.1.3.230:28091 -> 10.45.44.112:25] [ERROR]: fileapp_parser_print_execution_failure_info: parser_ins: flags = 1, state = 7
Apr 8 10:34:44 2021 cor-fw02 kernel: [fw4_0];[10.1.3.230:28091 -> 10.45.44.112:25] [ERROR]: fileapp_parser_print_execution_failure_info: parser_opq: type = 3, filename = NULL, uuid = 190975, flags = 93, handler type = -1
Apr 8 10:34:44 2021 cor-fw02 kernel: [fw4_0];[10.1.3.230:28091 -> 10.45.44.112:25] [ERROR]: fileapp_parser_print_execution_failure_info: failed for conn_key: <dir 0, 10.1.3.230:28091 -> 10.45.44.112:25 IPP 6>
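
For reference, a quick way to gauge how often CUL mode is toggling (a sketch using standard shell tools; the grep patterns assume the message format shown above):

[Expert@cor-fw02:0]# grep -c "Starting CUL mode" /var/log/messages                                          # how many times CUL was triggered
[Expert@cor-fw02:0]# grep "CLUS-1202" /var/log/messages | awk '{print $1, $2, substr($3,1,5)}' | uniq -c    # CUL start/stop events per minute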

 

We are running the OS version below:

[Expert@cor-fw02:0]# clish -c "show version all"
Product version Check Point Gaia R80.40
OS build 294
OS kernel version 3.10.0-957.21.3cpx86_64
OS edition 64-bit

 

 

_Val_
Admin

CUL means Cluster Under Load. It is not the reason for the slowness; rather, it is an additional indication that your firewall appliances are experiencing performance issues. You need to analyse what is causing CPU 2 to spike above 80% and try to rectify the root cause.
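
For example, a minimal sketch of how you could watch per-core load and the busiest processes from expert mode (cpview and top are standard on Gaia; exact views and key bindings may vary by version):

[Expert@cor-fw02:0]# cpview     # interactive view of per-core CPU, memory and traffic statistics
[Expert@cor-fw02:0]# top        # press 1 to expand per-CPU lines and see which core is pegged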

abideenshaikh
Explorer

Hello,

Thanks for the response. It's not just CPU 2 that is peaking. I have pulled some history from cpview -t (please see the screenshots); all of the CPUs are peaking. Please point me in the right direction: where should I start checking?

abideenshaikh_0-1617879298887.png

 

abideenshaikh_1-1617879298893.png

 

 

abideenshaikh_2-1617879298898.png

 

_Val_
Admin

Fine, but this does not change the recommendation. You need to investigate the root cause before anything else. 

Timothy_Hall
Champion

Agree with Val, CUL is just a symptom of the underlying problem, not the cause.  If you have at least Jumbo HFA 69 loaded, Spike Detective might be helpful:

sk166454: CPU Spike Detective

The next step is to determine which execution space (user or kernel) the heavy CPU utilization is occurring in. Try running the expert mode command sar, which will show you that day's CPU statistics. Beyond that, please provide the output of the Super Seven commands for further analysis:

S7PAC - Super Seven Performance Assessment Command.
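
Regarding sar, a minimal sketch (sar is part of sysstat on Gaia; the historical file path below is an assumption and depends on the day of the month):

[Expert@cor-fw02:0]# sar -u -P ALL 5 3                     # live per-CPU user/system/idle, 3 samples 5 seconds apart
[Expert@cor-fw02:0]# sar -u -P ALL -f /var/log/sa/sa08     # assumed path to the historical data for the 8th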

New 2021 IPS/AV/ABOT Self-Guided Video Series
now available at http://www.maxpowerfirewalls.com
abideenshaikh
Explorer

Hi Timothy and Val, thanks for the quick response. Please see the Super Seven output below.

[Expert@cor-fw02:0]# fwaccel stat
+---------------------------------------------------------------------------------+
|Id|Name |Status |Interfaces |Features |
+---------------------------------------------------------------------------------+
|0 |SND |enabled |eth1,eth5,eth2,eth4,Sync |Acceleration,Cryptography |
| | | | |Crypto: Tunnel,UDPEncap,MD5, |
| | | | |SHA1,NULL,3DES,DES,AES-128, |
| | | | |AES-256,ESP,LinkSelection, |
| | | | |DynamicVPN,NatTraversal, |
| | | | |AES-XCBC,SHA256,SHA384 |
+---------------------------------------------------------------------------------+

Accept Templates : enabled
Drop Templates : enabled
NAT Templates : enabled


[Expert@cor-fw02:0]# fwaccel stats -s
Accelerated conns/Total conns : 310/16733 (1%)
Accelerated pkts/Total pkts : 2767685456/2828887261 (97%)
F2Fed pkts/Total pkts : 61201805/2828887261 (2%)
F2V pkts/Total pkts : 12077955/2828887261 (0%)
CPASXL pkts/Total pkts : 94279305/2828887261 (3%)
PSLXL pkts/Total pkts : 2413309861/2828887261 (85%)
CPAS pipeline pkts/Total pkts : 0/2828887261 (0%)
PSL pipeline pkts/Total pkts : 0/2828887261 (0%)
CPAS inline pkts/Total pkts : 0/2828887261 (0%)
PSL inline pkts/Total pkts : 0/2828887261 (0%)
QOS inbound pkts/Total pkts : 1190521646/2828887261 (42%)
QOS outbound pkts/Total pkts : 1352420783/2828887261 (47%)
Corrected pkts/Total pkts : 0/2828887261 (0%)


[Expert@cor-fw02:0]# grep -c processor /proc/cpuinfo
4


[Expert@cor-fw02:0]# fw ctl affinity -l -r
CPU 0: eth1 eth5 eth2 eth4 Sync
CPU 1: fw_2
in.emaild.mta cp_file_convertd fwd scrub_cp_file_convertd scanengine_s mpdaemon rad vpnd pdpd scanengine_b in.acapd rtmd scrubd pepd in.geod mta_monitor wsdnsd lpd usrchkd watermark_cp_file_convertd fgd50 cprid cpd
CPU 2: fw_1
in.emaild.mta cp_file_convertd fwd scrub_cp_file_convertd scanengine_s mpdaemon rad vpnd pdpd scanengine_b in.acapd rtmd scrubd pepd in.geod mta_monitor wsdnsd lpd usrchkd watermark_cp_file_convertd fgd50 cprid cpd
CPU 3: fw_0
in.emaild.mta cp_file_convertd fwd scrub_cp_file_convertd scanengine_s mpdaemon rad vpnd pdpd scanengine_b in.acapd rtmd scrubd pepd in.geod mta_monitor wsdnsd lpd usrchkd watermark_cp_file_convertd fgd50 cprid cpd
All:


[Expert@cor-fw02:0]# netstat -ni
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
Sync 1500 0 3056044 0 0 0 16189862 0 0 0 BMRU
eth1 1500 0 277465925 0 0 0 277025714 0 0 0 BMRU
eth2 1430 0 166346 0 0 0 173374 0 0 0 BMRU
eth4 1500 0 1406127355 0 702 702 1233970185 0 0 0 BMRU
eth5 1500 0 1214968288 0 0 0 1398640000 0 0 0 BMRU
lo 65536 0 5776315 0 0 0 5776315 0 0 0 LMNRU
vpnt110 1500 0 0 0 0 0 0 0 0 0 MOPRU
vpnt130 1500 0 0 0 0 0 0 0 0 0 MOPRU


[Expert@cor-fw02:0]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 3 | 6016 | 22288
1 | Yes | 2 | 5957 | 22284
2 | Yes | 1 | 6019 | 22274

 

[Expert@cor-fw02:0]# cpstat os -f multi_cpu -o 1

 

Processors load
---------------------------------------------------------------------------------
|CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
---------------------------------------------------------------------------------
| 1| 0| 40| 60| 40| ?| 302|
| 2| 9| 37| 54| 46| ?| 290|
| 3| 7| 38| 55| 45| ?| 290|
| 4| 7| 39| 54| 46| ?| 291|
---------------------------------------------------------------------------------

 

 

Processors load
---------------------------------------------------------------------------------
|CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
---------------------------------------------------------------------------------
| 1| 0| 40| 60| 40| ?| 302|
| 2| 9| 37| 54| 46| ?| 290|
| 3| 7| 38| 55| 45| ?| 290|
| 4| 7| 39| 54| 46| ?| 291|
---------------------------------------------------------------------------------

 

 

Processors load
---------------------------------------------------------------------------------
|CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
---------------------------------------------------------------------------------
| 1| 1| 41| 59| 41| ?| 222479|
| 2| 5| 37| 58| 42| ?| 222487|
| 3| 7| 32| 61| 39| ?| 222487|
| 4| 6| 38| 56| 44| ?| 111244|
---------------------------------------------------------------------------------

abideenshaikh
Explorer

Spike detective logs are attached. 

the_rock
Authority

Question: is this something new, or did you ever have this issue before? Is just one member showing this, or both? Have you tried failing over? Also, if you run ps -auxw it will show you CPU/memory utilization per process, so you can see what is actually "eating" up the resources.
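
A minimal sketch of that (the --sort option assumes the GNU ps shipped with Gaia):

[Expert@cor-fw02:0]# ps auxw --sort=-%cpu | head -15    # processes using the most CPU right now
[Expert@cor-fw02:0]# ps auxw --sort=-%mem | head -15    # processes using the most memory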

abideenshaikh
Explorer

Hey Rock,

We have never had this performance issue before. We will replace the secondary GW on Tuesday and will then be in a position to confirm whether the other GW has the issue too. Please find the ps -auxw output attached.

 

 

Timothy_Hall
Champion

Hmm, everything looks pretty well balanced; the firewall is just quite busy. It looks like you have a large number of blades enabled, so please provide the output of the enabled_blades command.

One slightly unusual thing is that you have the QoS blade enabled, which will cause a fair amount of CPU overhead. I assume you are relying on the QoS blade for prioritization of voice/video? If not, you should turn it off. Also, be careful about applying QoS between internal high-speed LANs; normally you'll only want QoS on traffic traversing your Internet-facing network interface.

Your PSLXL traffic percentage is pretty high; make sure that rules enforcing APCL/URLF do so only for traffic to and from the Internet, and not between high-speed internal LAN networks. The next step would be to look over your Threat Prevention policy, identify situations where it is being asked to scan traffic between high-speed internal LAN networks, and try to minimize that where possible with a "null" TP profile (one with all five blades unchecked) applied to high-speed internal traffic at the top of the TP policy. TP exceptions will not help in this case.
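
For reference, a hedged example of checking the blade list (the output line is illustrative only, not from this gateway):

[Expert@cor-fw02:0]# enabled_blades
fw vpn ips appi urlf av anti_bot mon qos    # illustrative output; your list will differ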

 

New 2021 IPS/AV/ABOT Self-Guided Video Series
now available at http://www.maxpowerfirewalls.com
ggiordano
Participant

Do you have a high number of VPN connections, especially site-to-site? What is the model of your gateway?

During this pandemic period, with many users working from home, I have seen an increasing CPU usage trend for my customers, in particular where the customer has a 5200 or 5400 gateway. These models do not support the AES-NI instruction set, which means more CPU work is required for VPN traffic.

An interesting read is the "Best Practices - VPN Performance" article (sk105119).
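
A quick way to check whether the gateway's CPU exposes AES-NI (a sketch; the aes flag in /proc/cpuinfo indicates hardware AES support):

[Expert@cor-fw02:0]# grep -o -w aes /proc/cpuinfo | head -1    # prints "aes" if AES-NI is available, nothing otherwise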
