fw_worker_0 using 100% CPU

Sanjay_S · ‎2018-11-05

Hi All,

We have been facing the issue below for a week, and it is proving very difficult for us to troubleshoot.

The fw_worker_0 process is using 100% CPU, which is causing slow connectivity. The SKs we checked suggest that the Application filtering blade is causing the issue, but we did not enable that blade. We have enabled only URL filtering, not Application filtering. Is there any way to resolve this issue as soon as possible?

Please help.

G_W_Albrecht · ‎2018-11-05

I would start troubleshooting with sk112134: How to troubleshoot the issue with CoreXL "fw_worker_0" consuming CPU at 100%.

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist

Sanjay_S · ‎2018-11-05

Thanks Gunther,

I am checking this and will get back to you if I find anything.

Timothy_Hall · ‎2018-11-05

Gateway version and Jumbo HFA level?

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Gaia 4.18 (R82) Immersion Tips, Tricks, & Best Practices Video Course
Now Available at https://shadowpeak.com/gaia4-18-immersion-course

Sanjay_S · ‎2018-11-05

Hi Timothy, Below is the version we are on.

This is Check Point's software version R80.10 - Build 439
kernel: R80.10 - Build 448

No HFA is installed.

G_W_Albrecht · ‎2018-11-05

No Jumbo installed? I would not suggest staying in that state...


Mike_A · ‎2018-11-05

As the SK from Gunther states, check the output of fw ctl multik stat.

1.) What version are you running on your gateways? (R77.20/R77.30/R80.10)

2.) Do you have route-based VPN enabled, which could cause CoreXL to be disabled (thus pinning all your VPN traffic to worker_0)?

a.) I had the same issue and the dispatch global parameter did help drop the CPU on worker 0.

This setting can be made on the fly and then made persistent in fwkern.conf. Please note that this only worked for me in R77.20. In R77.30 the dispatch global parameter is subject to a check: as I was told by Diamond, their internal notes state that its value (default 0, I set it to 1) cannot be equal to or greater than the number of active CPUs shown by fw ctl multik stat. If you only have 1 active CPU, you cannot use this parameter in R77.30; the command will be accepted, but the parameter will not move to any value other than 0.

If CoreXL is enabled, you can check your affinity settings (fw ctl affinity -l -v -r) to see how your SND and workers are distributed.

Sanjay_S · ‎2018-11-05

Hi Mike, below is the version we are running.

This is Check Point's software version R80.10 - Build 439
kernel: R80.10 - Build 448

And CoreXL is not enabled on this gateway.

Mike_A · ‎2018-11-05

Sanjay,

Is there a reason you don't have CoreXL enabled? Without CoreXL you are forcing all processes to be pinned to a single CPU, which, I can only assume, is causing your 100% usage.

Here is an SK (sk105261) that covers the dynamic dispatcher in R80.10 and how to enable, disable, check, and monitor dispatching across the various cores, but CoreXL needs to be enabled.

- Mike

Timothy_Hall · ‎2018-11-06

Lots of speculation here, but let's cut through it. Please provide the outputs from the following commands:

fwaccel stat
fwaccel stats -s
grep -c ^processor /proc/cpuinfo
/sbin/cpuinfo
fw ctl affinity -l -r
sim affinity -l
netstat -ni
fw ctl multik stat
cpstat os -f multi_cpu -o 1
free -m
enabled_blades
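
To make it easier to post everything at once, the commands above could be run in one pass and saved to a single file. This is only a rough sketch: the function name collect_diag and the output path are made up for illustration, and commands that are not present on the box are noted in the file rather than treated as errors. "cpstat os -f multi_cpu -o 1" is deliberately left out because, as far as I know, the -o interval makes it keep printing until interrupted, so it is better run by hand.

```shell
#!/bin/sh
# Sketch: run each diagnostic command from the list above and append its
# output to one file. collect_diag is a hypothetical helper name.
collect_diag() {
    out="$1"
    : > "$out"                              # start with an empty file
    while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue
        bin=${cmd%% *}                      # first word = the executable
        echo "===== $cmd =====" >> "$out"
        if command -v "$bin" >/dev/null 2>&1; then
            # stdin comes from /dev/null so the command cannot eat the list
            $cmd >> "$out" 2>&1 < /dev/null
        else
            echo "(command not found: $bin)" >> "$out"
        fi
    done <<'EOF'
fwaccel stat
fwaccel stats -s
grep -c ^processor /proc/cpuinfo
/sbin/cpuinfo
fw ctl affinity -l -r
sim affinity -l
netstat -ni
fw ctl multik stat
free -m
enabled_blades
EOF
}
```

In expert mode on the gateway this could be called as collect_diag /var/tmp/diag.txt (path is just an example), and the resulting file attached to the reply.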

You mentioned initially that you have URLF enabled but not APCL; you almost certainly need to optimize your URLF policy to keep LAN-speed traffic from getting inappropriately inspected in PXL. See my post here:

https://community.checkpoint.com/message/28972-re-layers-and-the-cleanup-rule?commentID=28972#commen...


Kris_Pellens · ‎2018-11-05

Have (or had) you enabled any other blade (besides firewall and url filtering)?

I have a system on R80.20; I turned on IPS; then I turned it off.

However, amw remained loaded; one or more fwk_worker processes go up to 100%.

Can you check whether or not amw is loaded; if it is, just unload it (fw amw unload) and redeploy the policy.

Sanjay_S · ‎2018-11-26

Hi All,

Thank you for all your replies. The CPU now looks stable after I followed the procedure below.

Enabled NAT and Drop Templates in SecureXL.
Installed JHF Take 154.
Enabled CoreXL.
Enabled Dynamic Dispatcher.
Disabled URL Filtering.
Disabled IPS.
Optimized the rule base and rearranged a few rules.
Enabled the IPS blade.
Disabled ‘Accept outgoing packets originating from Gateway’.
Disabled IPS and enabled URL Filtering.
Enabled the IPS blade again.

So I suspect the issue was with the 'Accept outgoing packets originating from Gateway' setting.

Thomas_Eichelbu · ‎2019-06-27

Hello,

Maybe you can try these commands:

Using the "top" command, locate the worker process you want to focus on ...
and then issue this command with the number of that worker ...
1."echo 1 > /proc/cpkstats/fw_worker_XXX_stats"
Run it for a few seconds; keep in mind this could cause some performance issues ...

2."cat /proc/cpkstats/fw_worker_XXX_stats"
It will print a table with the top F2F SRC and DST pairs … you can search for the heaviest sessions and analyze the traffic …
Keep in mind it shows only F2F traffic, which is not accelerated by SecureXL ...

3."echo 0 > /proc/cpkstats/fw_worker_XXX_stats"
Use this command to finally stop the trace …
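
The three steps above could be wrapped into one small helper so the trace is not left running by accident. This is a minimal sketch, not something from TAC: the function name trace_worker and the default of 5 seconds are assumptions, and the stats file path has to be supplied by you with the real worker number in place of XXX.

```shell
#!/bin/sh
# Sketch: start the per-worker trace, wait, print the table, stop the trace.
# trace_worker is a hypothetical helper; $1 is the full path to the
# worker's stats file, $2 the number of seconds to sample (default 5).
trace_worker() {
    stats_file="$1"
    seconds="${2:-5}"
    echo 1 > "$stats_file"      # step 1: start collecting
    sleep "$seconds"            # keep this short, the trace adds load
    cat "$stats_file"           # step 2: print the top F2F SRC/DST pairs
    echo 0 > "$stats_file"      # step 3: stop the trace
}
```

Call it with the stats file of the worker you picked out in top and, optionally, a duration in seconds. If the run is interrupted before it finishes, issue the "echo 0" from step 3 by hand so the trace does not stay enabled.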

This helped us today to identify a very heavy connection which caused a massive load on a cluster, dropped VPNs, and made work impossible ...
Check Point TAC showed us this really helpful set of commands!

best regards
Thomas.
