Tom_Cripps
Silver

Upgrading to R80.30 has caused one fw_worker to be stuck at 100%

Hi,

Since our upgrade to R80.30, the standby member of our cluster has had a fw_worker stuck at 100% CPU. It isn't a particular fw_worker; it can change: when one drops, another one essentially takes its place.

We're also now seeing that when we attempt policy installations we essentially lose the Gaia shell and are presented with the raw Bash shell, as you would see if booted into maintenance mode.

Does anything obvious stick out to anyone?

Tom


Re: Upgrading to R80.30 has caused one fw_worker to be stuck at 100%

With this one-liner you can view the process load on a given core (set CORE to the core number). This can help you locate the process.

# PID, last-used core (psr), %CPU, %MEM and command for processes on core $CORE
CORE=3; ps -e -o pid,psr,%cpu,%mem,cmd | grep -E "^[[:space:]]*[[:digit:]]+[[:space:]]+${CORE}[[:space:]]"

Read more here: ONELINER - process utilization per core
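A minimal sketch of the same idea using awk instead of a grep pattern, assuming a procps-style ps that supports the psr field. Comparing the psr column exactly sidesteps two pitfalls of a regex: PIDs are padded to varying widths, and core 3 can also match cores 30 to 39. filter_core is a hypothetical helper name. Keep in mind that ps reports %CPU averaged over each process's lifetime, so a thread that only recently started spinning may look quieter here than in top or cpview.

```shell
#!/usr/bin/env bash
# Sketch: show processes last scheduled on a given core.
# Assumes a procps-style ps that supports the "psr" output field.

# filter_core (hypothetical helper): read "pid psr %cpu %mem cmd" rows on
# stdin and print the header plus rows whose psr column equals $1 exactly.
filter_core() {
  awk -v core="$1" 'NR == 1 || $2 == core'
}

# Override the core from the environment, e.g. CORE=5 ./percore.sh
CORE=${CORE:-3}
ps -e -o pid,psr,%cpu,%mem,cmd | filter_core "$CORE"
```

The script name percore.sh is arbitrary; save it under any name and run it from expert mode.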

Tom_Cripps
Silver

Re: Upgrading to R80.30 has caused one fw_worker to be stuck at 100%

Hi Heiko,

Thank you for this. It turned out that CCP was not being allowed. We have temporarily added a rule to allow CCP, although QA feel this should be implied.

Tom

Employee+

Re: Upgrading to R80.30 has caused one fw_worker to be stuck at 100%

In R80.30 CCP Encryption was introduced. I would recommend checking the R80.30 ClusterXL Admin Guide and reading up on the new feature to make sure that it is not causing you any issues.

To test, you can run `cphaconf ccp_encrypt off` from expert mode on both cluster members; if the issues stop, CCP encryption was the cause.


Re: Upgrading to R80.30 has caused one fw_worker to be stuck at 100%

This is covered in the third edition of my book; it is probably one of these two things:

1) Cluster members inappropriately attempting to inspect CCP traffic from other clusters: sk132672: High CPU on ClusterXL due to inspection of CCP packets

2) As Heiko said, it could be the overhead of CCP encryption. Quoted from my book:


CCP Encryption

Starting in version R80.30 with Gaia kernel version 3.10, all CCP traffic is automatically encrypted to protect it against tampering. You can confirm the configuration state of CCP encryption with the expert mode command cphaprob ccp_encrypt. As long as your firewall has AES New Instructions (AES-NI – covered in Chapter 9) as part of its processor architecture, the additional load incurred by this CCP encryption is expected to be negligible, and my lab testing has seemed to confirm this. I’d recommend leaving CCP encryption enabled due to the security benefits it provides.
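Whether a gateway's CPU advertises AES-NI can be checked from expert mode: on x86 Linux, including the 3.10 kernel that CCP encryption requires, the kernel lists an aes flag in the flags line of /proc/cpuinfo when the instruction set is available. A minimal sketch, not from the book; has_aesni is a hypothetical helper, and the path argument exists only so it can be tested against a sample file. A missing flag is not conclusive proof that the hardware lacks AES-NI (a very old kernel or a hypervisor may not report it).

```shell
#!/usr/bin/env bash
# Sketch: report whether the CPU advertises AES-NI (the "aes" cpuinfo flag).
# has_aesni is a hypothetical helper; the path argument exists for testing.

has_aesni() {
  local cpuinfo=${1:-/proc/cpuinfo}
  # Look only at "flags" lines; -w requires "aes" as a whole word,
  # so a flag such as "vaes" does not count as a match.
  grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -qw aes
}

if has_aesni; then
  echo "AES-NI: present"
else
  echo "AES-NI: not reported"
fi
```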


However, let’s suppose you just upgraded your cluster to R80.30 or later with Gaia kernel 3.10, and you are noticing increased CPU utilization that you can’t seem to pin down. If you suspect the new CCP encryption feature is causing the unexplained CPU load (keep in mind this is more likely on firewall hardware that does not support AES-NI – see Chapter 9), try these steps to confirm:


1. Baseline the firewall’s CPU usage
2. From expert mode, on all cluster members execute command cphaconf ccp_encrypt off
3. Examine the firewall’s CPU usage; if it drops substantially, consider leaving CCP encryption disabled, but be mindful of the security ramifications
4. From expert mode, on all cluster members execute command cphaconf ccp_encrypt on
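Steps 1 and 3 need a repeatable CPU measurement. A minimal sketch, not from the book: it samples the cpu lines of /proc/stat twice and prints the busy percentage overall and per core. The per-core view matters here because a single spinning fw_worker barely moves the aggregate on a many-core gateway. Run it on each member before and after disabling CCP encryption. busy_from_snapshots and cpu_busy_report are hypothetical helper names, and counting iowait as idle time is a choice of this sketch.

```shell
#!/usr/bin/env bash
# Sketch: per-core CPU busy % over a sampling window, from /proc/stat.
# Hypothetical helpers, not Check Point tools. Usage: ./cpubusy.sh SECONDS

# busy_from_snapshots BEFORE AFTER: each argument holds "cpu..." lines from
# /proc/stat. Fields 2..9 are user nice system idle iowait irq softirq steal;
# idle + iowait count as idle time. Prints one "name busy%" line per CPU.
busy_from_snapshots() {
  { printf '%s\n' "$1"; echo '--'; printf '%s\n' "$2"; } | awk '
    $1 == "--" { second = 1; next }
    !second {
      for (i = 2; i <= 9; i++) t0[$1] += $i
      i0[$1] = $5 + $6
      next
    }
    {
      t = 0
      for (i = 2; i <= 9; i++) t += $i
      dt = t - t0[$1]
      di = ($5 + $6) - i0[$1]
      busy = (dt > 0) ? 100 * (dt - di) / dt : 0
      printf "%-6s %5.1f%%\n", $1, busy
    }'
}

# cpu_busy_report SECONDS: take two /proc/stat snapshots SECONDS apart.
cpu_busy_report() {
  local before after
  before=$(grep '^cpu' /proc/stat)
  sleep "${1:-10}"
  after=$(grep '^cpu' /proc/stat)
  busy_from_snapshots "$before" "$after"
}

if [ "$#" -gt 0 ]; then
  cpu_busy_report "$1"
else
  echo "usage: $0 SECONDS (run before and after 'cphaconf ccp_encrypt off')" >&2
fi
```

A longer window (for example 60 seconds) smooths out bursts; the same window should be used for both the baseline and the comparison run.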


Book "Max Power 2020: Check Point Firewall Performance Optimization" Third Edition
Now Available at www.maxpowerfirewalls.com
Tom_Cripps
Silver

Re: Upgrading to R80.30 has caused one fw_worker to be stuck at 100%

Hi Tim,

I just purchased the book, so I will take a look over the new material. I would suggest, though, that it may be due to the linked SK. We're not encrypting CCP, as we must still be running the old kernel.

The output of cphaprob ccp_encrypt is off because we are still utilising the 2.6 kernel.

We're looking into the possibility that the issue is related to that SK, unless you have anything else for us to check?


Re: Upgrading to R80.30 has caused one fw_worker to be stuck at 100%

I guess the question is what the heck that fw_worker is doing that makes it so busy: is it processing traffic, or performing some internal function like state sync, CCP processing, etc.?

I suppose it could be related to an elephant flow (which I'll be speaking about at CPX) that got assigned to that worker. Try identifying the Firewall Worker instance number (this is usually different from the core number; use fw ctl affinity -l -r), then in cpview visit these two screens for that particular instance:

  • Advanced...CoreXL...Instances...FW-Instance#...Top FW-Lock consumers
  • CPU...Top-Connections...Instance#...Top Connections

If these screens don't show anything unusual for the busy instance, that tells me it is hung up on some kind of internal function and not something directly related to processing traffic.
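To see which core each worker thread is currently on, the fw_worker threads can also be listed with ps from expert mode. A minimal sketch, assuming the numeric suffix in fw_worker_N is the CoreXL instance number (confirm against fw ctl affinity -l -r before relying on it); list_workers is a hypothetical helper. As with any ps-based view, %CPU here is a lifetime average, not the current load.

```shell
#!/usr/bin/env bash
# Sketch: list fw_worker threads with the core they last ran on (psr).
# Assumption: the N in fw_worker_N is the CoreXL instance number; verify with
# "fw ctl affinity -l -r". list_workers is a hypothetical helper.

# list_workers: read "pid psr %cpu comm" rows on stdin, keep fw_worker_N rows,
# print "instance core %cpu" sorted by %cpu, busiest first.
list_workers() {
  awk '$4 ~ /^fw_worker_[0-9]+$/ {
         inst = $4
         sub(/^fw_worker_/, "", inst)
         print inst, $2, $3
       }' | LC_ALL=C sort -k3,3nr
}

echo "instance core %cpu"
ps -e -o pid,psr,%cpu,comm | list_workers
```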

Tom_Cripps
Silver

Re: Upgrading to R80.30 has caused one fw_worker to be stuck at 100%

We're seeing ~500,000 inbound packets handled by that worker in 20 seconds; bear in mind this is the standby member. The top connection is from 0.0.0.0:8116 to an interface we use for DMZ management, not Sync.
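For scale, that count works out to a steady per-second rate on a member that should be mostly idle (UDP port 8116 is the port ClusterXL uses for CCP):

```latex
\frac{\sim 500\,000\ \text{packets}}{20\ \text{s}} \approx 25\,000\ \text{packets/s}
```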


Tom

abihsot__
Copper

Re: Upgrading to R80.30 has caused one fw_worker to be stuck at 100%

Hello,

I am seeing similar behavior on the standby node as well, except in my case one CPU is 100% used by fw_full. I am running R80.30 JHF111, kernel 2.6.

And fwd.elg is full of messages:

[17 Jan 17:41:16] fwd: restarting scrub_cp_file_convertd
[17 Jan 17:41:16] fwd: restarting scrub_cp_file_convertd
[17 Jan 17:41:16] fwd: restarting scrub_cp_file_convertd
[17 Jan 17:41:19] fwd: restarting scrub_cp_file_convertd
[17 Jan 17:41:20] fwd: restarting scrub_cp_file_convertd
[17 Jan 17:41:20] fwd: restarting scrub_cp_file_convertd
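To see how often fwd is restarting that daemon, the log can be bucketed by minute. A minimal sketch, assuming the timestamp format shown above and the usual location $FWDIR/log/fwd.elg; restarts_per_minute is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: count "restarting scrub_cp_file_convertd" lines per minute.
# Assumes lines look like "[17 Jan 17:41:16] fwd: restarting ...".

# restarts_per_minute: read fwd.elg text on stdin, print "count day mon HH:MM".
# uniq -c counts adjacent lines, which works because the log is chronological.
restarts_per_minute() {
  awk '/fwd: restarting scrub_cp_file_convertd/ {
         day = $1; sub(/^\[/, "", day)
         split($3, t, ":")
         print day, $2, t[1] ":" t[2]
       }' | uniq -c
}

LOG=${FWDIR:+$FWDIR/log/fwd.elg}
if [ -n "$LOG" ] && [ -r "$LOG" ]; then
  restarts_per_minute < "$LOG"
else
  echo "fwd.elg not found; is \$FWDIR set (expert mode)?" >&2
fi
```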

Tom_Cripps
Silver

Re: Upgrading to R80.30 has caused one fw_worker to be stuck at 100%

Hi Tim,

Just an update for you. We were told that R80.20 introduced a feature that allows CCP to set its mode automatically. If I remember right, we were using Multicast before, and it had changed to Unicast. Setting it to Broadcast has fixed the problem.

Thanks for the pointers.

Tom


Re: Upgrading to R80.30 has caused one fw_worker to be stuck at 100%

Got it, thanks!
