Sanjay_S
Advisor

fw_worker_0 using 100% CPU

Hi All,

We have been facing the issue below for a week, and it is proving very difficult for us to troubleshoot.

The fw_worker_0 process is using 100% CPU, which is causing slow connectivity. The SKs we checked suggest that the Application Filtering blade is causing the issue, but we did not enable that blade. We have only enabled URL Filtering, not Application Filtering. Is there any solution to resolve this issue ASAP?

Please help.

12 Replies
G_W_Albrecht
Champion

Sanjay_S
Advisor

Thanks Gunther,

I am checking this. I will get back to you if I find something.

Timothy_Hall
Champion

Gateway version and Jumbo HFA level?

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

"Max Capture: Know Your Packets" Video Series
now available at http://www.maxpowerfirewalls.com
Sanjay_S
Advisor

Hi Timothy, Below is the version we are on.

This is Check Point's software version R80.10 - Build 439
kernel: R80.10 - Build 448

No HFA is installed.

G_W_Albrecht
Champion

No Jumbo installed? I would not suggest staying in that state...

Mike_A
Advisor

As the SK from Gunther states, check your fw ctl multik stat.

1.) What version are you running on your gateways? (R77.20/R77.30/R80.10)

2.) Do you have Route based VPN enabled which could cause CoreXL to be disabled (thus pinning all your traffic for VPN to worker_0)?

         a.) I had the same issue and the dispatch global parameter did help drop the CPU on worker 0.

This setting can be changed on the fly and then made persistent in fwkern.conf. Please note that this only worked for me in R77.20. When I went to R77.30, the dispatch global parameter gained a check. As I was told by Diamond, their internal notes state that the value (default 0, I set it to 1) cannot be equal to or greater than the number of active CPUs shown by fw ctl multik stat. If you only have 1 active CPU, you cannot use this in R77.30: the command will be accepted, but the parameter will stay at 0.
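For the persistent part, a minimal sketch of recording a kernel parameter in fwkern.conf could look like the following. It assumes the usual name=value line format and the $FWDIR/boot/modules/fwkern.conf location. set_persistent_param and example_param are names made up for this illustration, since the exact parameter name is not spelled out above. The on-the-fly change is a separate step (fw ctl set int).

```shell
#!/bin/sh
# Sketch only: record a kernel parameter in fwkern.conf so it survives a reboot.
# set_persistent_param is a helper invented for this example.
# The live change is made separately with: fw ctl set int <name> <value>

set_persistent_param() {
    name="$1"     # parameter name (take the real one from the SK)
    value="$2"    # integer value to persist
    conf="${3:-$FWDIR/boot/modules/fwkern.conf}"   # assumed default location

    touch "$conf" || return 1

    # Drop any earlier line for this parameter, then append the new value.
    grep -v "^${name}=" "$conf" > "${conf}.tmp" || true
    printf '%s=%s\n' "$name" "$value" >> "${conf}.tmp"

    # Copy the content back instead of moving the file, so the original
    # file keeps its owner and permissions.
    cat "${conf}.tmp" > "$conf"
    rm -f "${conf}.tmp"
}
```

For example, set_persistent_param example_param 1 would leave a line example_param=1 in the file. Substitute the real parameter name, and check the live value with fw ctl get int before relying on it.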

If CoreXL is enabled you can check your affinity settings (fw ctl affinity -l -v -r) and check where your SND and workers are distributed. 

Sanjay_S
Advisor

Hi Mike, Below is the version we are running on.

This is Check Point's software version R80.10 - Build 439
kernel: R80.10 - Build 448

And CoreXL is not enabled on this gateway.

Mike_A
Advisor

Sanjay,

Is there a reason you don't have CoreXL enabled? Without CoreXL you are forcing all processes to be pinned to a single CPU, which, I can only assume, is causing your 100% usage.

Here is an SK (sk105261) that references the dynamic dispatcher in R80.10 and how to enable/disable/check and monitor the dispatching across various cores but CoreXL needs to be enabled. 

- Mike 

Timothy_Hall
Champion

Lots of speculation here, but let's cut through it.  Please provide outputs from the following commands:

fwaccel stat
fwaccel stats -s
grep -c ^processor /proc/cpuinfo
/sbin/cpuinfo
fw ctl affinity -l -r
sim affinity -l
netstat -ni
fw ctl multik stat
cpstat os -f multi_cpu -o 1
free -m
enabled_blades
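
To gather all of these in one go, something like the sketch below could be used. It simply runs each command from the list and appends the output to a single file. collect_diag, the report path and the per-command time limit are assumptions made for this example, not Check Point tooling. The limit is there because cpstat with -o 1 keeps polling until it is stopped.

```shell
#!/bin/sh
# Sketch only: run each requested diagnostic and save everything to one file.
# collect_diag and its arguments are invented for this example.

collect_diag() {
    out="$1"              # report file to (re)create
    limit="${2:-10}"      # seconds before a still-running command is stopped
    : > "$out" || return 1

    while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue
        printf '===== %s =====\n' "$cmd" >> "$out"

        # Run in the background so a polling command cannot block the list.
        # exec makes the kill below reach the command itself, not a wrapper shell.
        sh -c "exec $cmd" >> "$out" 2>&1 < /dev/null &
        pid=$!
        ( sleep "$limit"; kill "$pid" 2>/dev/null ) > /dev/null 2>&1 &
        watchdog=$!

        status=0
        wait "$pid" || status=$?
        kill "$watchdog" 2>/dev/null || true
        wait "$watchdog" 2>/dev/null || true

        # Note failures (including commands stopped by the time limit).
        [ "$status" -eq 0 ] || printf '(exit status %s)\n' "$status" >> "$out"
    done <<'EOF'
fwaccel stat
fwaccel stats -s
grep -c ^processor /proc/cpuinfo
/sbin/cpuinfo
fw ctl affinity -l -r
sim affinity -l
netstat -ni
fw ctl multik stat
cpstat os -f multi_cpu -o 1
free -m
enabled_blades
EOF
}
```

Calling collect_diag /var/log/fw_diag.txt (a hypothetical path) from expert mode would leave one report file that can be pasted into a reply.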

You mentioned initially that you have URLF enabled but not APCL; you almost certainly need to optimize your URLF policy to keep LAN-speed traffic from getting inappropriately inspected in PXL.  See my post here:

https://community.checkpoint.com/message/28972-re-layers-and-the-cleanup-rule?commentID=28972#commen... 

Kris_Pellens
Contributor

Have (or had) you enabled any other blade (besides firewall and url filtering)?

I have a system on R80.20; I turned on IPS; then I turned it off.

However, amw remained loaded; one or more fwk_worker processes go up to 100%.

Can you check whether or not amw is loaded; if it is, just unload it (fw amw unload) and redeploy the policy.

Sanjay_S
Advisor

Hi All,

Thank you for all your replies. The CPU now looks stable after I followed the procedure below.

  • NAT & Drop Templates are enabled on SecureXL.
  • JHF Take 154 is installed.
  • Enabled CoreXL.
  • Enabled Dynamic Dispatcher.
  • Disabled URL Filtering.
  • Disabled IPS.
  • Optimized the rule base and re-arranged a few rules.
  • Enabled the IPS Blade.
  • Disabled the ‘Accept outgoing packets originating from Gateway’
  • Disabled IPS and enabled the URL Filtering.
  • Enabled IPS blade again.

So I suspect the issue was with the 'Accept outgoing packets originating from Gateway' setting.

Thomas_Eichelbu
Collaborator


Hello,

 

Maybe you can try these commands:

Use the "top" command to locate the worker process you want to focus on, and then issue these commands with the number of that worker:

1. "echo 1 > /proc/cpkstats/fw_worker_XXX_stats"
Let it run for a few seconds; keep in mind this could cause some performance issues.

2. "cat /proc/cpkstats/fw_worker_XXX_stats"
This prints a table with the top F2F SRC and DST pairs. You can look for the heaviest sessions and analyze that traffic. Keep in mind that it shows only F2F traffic, which is not accelerated by SecureXL.

3. "echo 0 > /proc/cpkstats/fw_worker_XXX_stats"
Use this command to stop the trace when you are done.
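
The three steps above can be wrapped in a small helper so the stop step is not forgotten. This is only a sketch: trace_worker and the CPKSTATS_DIR override are names invented here, and the /proc/cpkstats path is taken as written above.

```shell
#!/bin/sh
# Sketch only: wraps the three /proc/cpkstats steps in one function.
# CPKSTATS_DIR is a hypothetical override, useful for a dry run on another box.
CPKSTATS_DIR="${CPKSTATS_DIR:-/proc/cpkstats}"

trace_worker() {
    worker="$1"           # worker number as seen in top, e.g. 0
    seconds="${2:-5}"     # sampling time; keep it short on a loaded gateway
    stats="${CPKSTATS_DIR}/fw_worker_${worker}_stats"

    if [ ! -w "$stats" ]; then
        echo "cannot write $stats" >&2
        return 1
    fi

    echo 1 > "$stats" || return 1   # step 1: start collecting
    sleep "$seconds"
    cat "$stats"                    # step 2: print the collected table
    echo 0 > "$stats"               # step 3: stop collecting
}
```

For example, trace_worker 0 5 would sample fw_worker_0 for five seconds. If the run is interrupted during the sleep, step 3 still has to be issued by hand.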

 

This helped us today to identify a very heavy connection which caused a massive load on a cluster, dropped VPNs and made work impossible ...
Check Point TAC showed us this really helpful set of commands!

best regards
Thomas.