As we switched from R80.10 to R80.30 (kernel 3.1) with UMFW, I am looking for in-depth information on whether packet handling changed because of, e.g., how SecureXL interacts with the user-mode firewall. I found all the great information gathered by @HeikoAnkenbrand, but so far I could not find anything on the topic below.
After the upgrade we see much lower CPU usage on the SNDs than before, but much higher load on the fw_worker instances.
Before the change we averaged 80% CPU on the SNDs and 30% on the fw_workers; now we have 10% on the SNDs but 70% on the fw_workers. We upgraded the hardware to faster CPUs (3.2 GHz as opposed to 2 GHz), but the core count stayed the same at 16. We used Multi-Queue before, but with R80.30 we can now use Multi-Queue on all interfaces.
Nevertheless, my feeling is that some processing previously done by the SNDs has now moved to the fw_workers.
Note that we currently have a high number of VPN clients connected due to Corona.
Thanks for any insights.
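The utilization figures above can be turned into rough busy-core counts. The post does not say how the 16 cores are split between SNDs and fw_workers, so the 4 SND / 12 fw_worker split below is purely hypothetical, and multiplying by clock speed is only a crude proxy (real throughput does not scale linearly with GHz). The percentages and clock speeds are the ones quoted above.

```python
def busy_cores(snd_cores: int, snd_util: float,
               fw_cores: int, fw_util: float) -> tuple[float, float]:
    """Busy core-equivalents (SND, fw_worker) for utilizations given as 0.0-1.0."""
    return round(snd_cores * snd_util, 2), round(fw_cores * fw_util, 2)


def ghz_consumed(busy: float, clock_ghz: float) -> float:
    """Crude proxy: busy core-equivalents times clock speed."""
    return round(busy * clock_ghz, 2)


# HYPOTHETICAL split of the 16 cores (not stated in the post).
SND_CORES, FW_CORES = 4, 12

before = busy_cores(SND_CORES, 0.80, FW_CORES, 0.30)  # R80.10 on 2.0 GHz cores
after = busy_cores(SND_CORES, 0.10, FW_CORES, 0.70)   # R80.30 UMFW on 3.2 GHz cores

for label, busy, clock in (("before", before, 2.0), ("after", after, 3.2)):
    total = round(sum(busy), 2)
    print(f"{label}: SND={busy[0]} FW={busy[1]} total={total} cores "
          f"~{ghz_consumed(total, clock)} GHz")
```

Under this assumed split, the gateway would be doing more total work after the upgrade, on faster cores, rather than just moving the same work around. Substitute your real SND/worker counts from your CoreXL configuration before drawing any conclusions.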
This is an exciting question. I can confirm similar behavior on some firewalls. What surprises me is that the basic process is already producing about 10%-20% CPU load (without firewall traffic).
In UMFW, the fw instances are threads of the fwk0_dev_0 process, so by default top shows the CPU utilization of all threads summed under the main thread. top can also display the utilization per thread.
A small sample calculation for the utilization of process fwk0_dev_0:
fwk0_dev_0 = ∑ fwk0_x + ∑ fwk0_dev_x + fwk0_kissd + fwk0_hp
Threads of process fwk0_dev_0:
- fwk0_X -> fw instance thread that takes care of the packet processing
- fwk0_dev_X -> thread that takes care of communication between the fw instances and other CP daemons
- fwk0_kissd -> legacy Kernel Infrastructure (obsolete)
- fwk0_hp -> (high priority) cluster thread
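The formula above can be sketched as a small script that buckets per-thread CPU values (as read from top in per-thread mode) into the four thread types. The thread names follow the list above; the CPU values in the example are made up. Threads the formula does not list (such as fwk0_service or fwk0_HeavyIoctl, which appear in R80.40) are reported as "other" rather than dropped, since top's per-process figure includes every thread.

```python
import re
from collections import defaultdict

# Thread-name patterns taken from the list above.
CATEGORIES = (
    ("fw_instance", re.compile(r"fwk0_\d+")),   # fwk0_X: packet processing
    ("dev", re.compile(r"fwk0_dev_\d+")),       # fwk0_dev_X: communication with CP daemons
    ("kissd", re.compile(r"fwk0_kissd")),       # legacy kernel infrastructure
    ("hp", re.compile(r"fwk0_hp")),             # high-priority cluster thread
)


def classify(thread_name: str) -> str:
    """Map a thread name to its category, or 'other' if the formula does not list it."""
    for label, pattern in CATEGORIES:
        if pattern.fullmatch(thread_name):
            return label
    return "other"


def process_cpu(thread_cpu: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return (process total, per-category sums) from per-thread %CPU values."""
    per_category: dict[str, float] = defaultdict(float)
    for name, cpu in thread_cpu.items():
        per_category[classify(name)] += cpu
    return round(sum(per_category.values()), 2), dict(per_category)


# Illustrative (made-up) per-thread %CPU values.
sample = {"fwk0_0": 12.0, "fwk0_1": 10.5, "fwk0_dev_0": 1.0,
          "fwk0_dev_1": 0.5, "fwk0_kissd": 0.0, "fwk0_hp": 0.3}
total, parts = process_cpu(sample)
print(total, parts)
```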
Yes, I also found the Shift+H option in top to display the individual fw_worker threads.
But if traffic is now distributed differently, then at least in my case the SND/CoreXL distribution configuration needs to change as well.
Before doing that, though, I wanted to understand why I see this (heavy) shift in load from the SNDs to the fw_workers' CoreXL instances.
This is in R80.40:
8445 admin 0 -20 2295088 1.058g 126768 S 0.3 14.1 5:10.29 1 fwk0_dev_0
8479 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:00.00 3 fwk0_kissd
8593 admin 0 -20 2295088 1.058g 126768 S 0.3 14.1 4:22.95 3 fwk0_0
8594 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 4:49.87 1 fwk0_1
8595 admin 0 -20 2295088 1.058g 126768 S 0.7 14.1 4:11.93 2 fwk0_2
8617 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:01.29 3 fwk0_service
8618 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 1:06.37 1 fwk0_dev_1
8620 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 1:08.77 2 fwk0_dev_2
8621 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:29.05 2 fwk0_HeavyIoctl
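To see which cores these threads were running on, the P column (last CPU used) of the per-thread top output can be grouped by core. This sketch assumes exactly the column layout shown above (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND). Note that P is only the CPU a thread last ran on when the snapshot was taken, not its configured affinity.

```python
from collections import defaultdict
from typing import NamedTuple


class ThreadRow(NamedTuple):
    pid: int
    cpu_pct: float
    cpu_time: str   # TIME+ column (cumulative CPU time)
    last_cpu: int   # P column: CPU the thread last ran on
    name: str


def parse_line(line: str) -> ThreadRow:
    # Assumed layout: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
    f = line.split()
    return ThreadRow(int(f[0]), float(f[8]), f[10], int(f[11]), f[12])


def group_by_cpu(text: str) -> dict[int, list[ThreadRow]]:
    """Group thread rows by the CPU they last ran on, sorted by CPU number."""
    groups: dict[int, list[ThreadRow]] = defaultdict(list)
    for line in text.splitlines():
        if line.strip():
            row = parse_line(line)
            groups[row.last_cpu].append(row)
    return dict(sorted(groups.items()))


# The R80.40 snapshot quoted above.
TOP_H = """\
8445 admin 0 -20 2295088 1.058g 126768 S 0.3 14.1 5:10.29 1 fwk0_dev_0
8479 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:00.00 3 fwk0_kissd
8593 admin 0 -20 2295088 1.058g 126768 S 0.3 14.1 4:22.95 3 fwk0_0
8594 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 4:49.87 1 fwk0_1
8595 admin 0 -20 2295088 1.058g 126768 S 0.7 14.1 4:11.93 2 fwk0_2
8617 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:01.29 3 fwk0_service
8618 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 1:06.37 1 fwk0_dev_1
8620 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 1:08.77 2 fwk0_dev_2
8621 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:29.05 2 fwk0_HeavyIoctl
"""

for cpu, rows in group_by_cpu(TOP_H).items():
    total = round(sum(r.cpu_pct for r in rows), 1)
    print(f"CPU {cpu}: {total}% ->", ", ".join(r.name for r in rows))
```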
Probably the biggest change between R80.10 and R80.30 is templating. It used to be handled in SecureXL; now all templates have moved to FWK.
If FWK matches a packet to a template (and the connection is accelerated), it re-injects the packet back into SXL, as shown in sk153832.
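As a rough mental model, an Accept template can be thought of as a connection key with the source port wildcarded, so a new connection from the same client to the same server and service matches without a full rulebase lookup. The toy classes below are illustrative only, not Check Point code. They assume every first connection is accepted, creates a template and is offloaded to SXL, whereas real template creation depends on the rulebase and the blades in use.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    proto: str


@dataclass
class SecureXL:
    """Toy stand-in for SecureXL's connection table (hypothetical)."""
    connections: set = field(default_factory=set)

    def reinject(self, pkt: Packet) -> None:
        self.connections.add(pkt)   # later packets of this connection can stay in SXL


@dataclass
class FirewallWorker:
    """Toy FWK that owns the Accept templates (hypothetical)."""
    templates: set = field(default_factory=set)

    @staticmethod
    def template_key(pkt: Packet) -> tuple:
        # Source port deliberately left out: that is what makes it a template.
        return (pkt.src, pkt.dst, pkt.dport, pkt.proto)

    def handle_new(self, pkt: Packet, sxl: SecureXL) -> str:
        key = self.template_key(pkt)
        if key in self.templates:
            sxl.reinject(pkt)
            return "template"
        # Assumption: the rulebase accepts the connection and allows a template.
        self.templates.add(key)
        sxl.reinject(pkt)
        return "rulebase"


sxl, fwk = SecureXL(), FirewallWorker()
print(fwk.handle_new(Packet("10.0.0.5", 50001, "192.0.2.10", 443, "tcp"), sxl))  # rulebase
print(fwk.handle_new(Packet("10.0.0.5", 50002, "192.0.2.10", 443, "tcp"), sxl))  # template
```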
Not really. However, in R80.40, there is something called "dynamic split" (sk164155).
With this feature, the system automatically balances the number of SNDs and FWKs to keep CPU utilization reasonable.
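The idea behind such balancing can be illustrated with a simple threshold rule: if the workers are hot while the SNDs have headroom, move a core from SND to FWK, and vice versa. This is a hypothetical sketch, not Check Point's Dynamic Split algorithm; the thresholds and the one-core step are invented for illustration, and sk164155 describes the actual feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    snd: int   # cores running SND/IRQ work
    fwk: int   # cores running Firewall Worker instances


def rebalance(split: Split, snd_util: float, fwk_util: float,
              high: float = 0.80, low: float = 0.50, min_cores: int = 1) -> Split:
    """One balancing step with HYPOTHETICAL thresholds (utilizations are 0.0-1.0).

    Moves a single core toward the side that is above `high` when the other
    side is below `low`, never shrinking either side under `min_cores`.
    """
    if fwk_util > high and snd_util < low and split.snd > min_cores:
        return Split(split.snd - 1, split.fwk + 1)
    if snd_util > high and fwk_util < low and split.fwk > min_cores:
        return Split(split.snd + 1, split.fwk - 1)
    return split


print(rebalance(Split(snd=4, fwk=12), snd_util=0.10, fwk_util=0.85))  # Split(snd=3, fwk=13)
print(rebalance(Split(snd=4, fwk=12), snd_util=0.10, fwk_util=0.70))  # Split(snd=4, fwk=12)
```

A real implementation would also need averaging over time and some hysteresis, so that cores are not moved back and forth on every sample.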
Definitely, which is why I documented these changes in my book. In R80.20+ there is a shift in responsibilities to the Firewall Workers, which increases their CPU load. Running the Firewall Workers in USFW mode instead of kernel mode also adds overhead for traffic reaching them, which incurs additional CPU load. But now that USFW has lifted the 40-core limit, you can have lots and lots of Firewall Worker cores to handle these new responsibilities.
This shift may well require reducing the number of SND cores after an upgrade to R80.20+, but this is not a hard-and-fast rule; it depends heavily on how much traffic is fully accelerated by SecureXL (Packets/sec in fwaccel stats -s output). With the R80.20 changes, in some cases much more traffic can be fully accelerated by SecureXL than before, which increases the load on the SND/IRQ cores...
The overall performance impact of this (especially in R80.40) should be neutral to positive.
After upgrading to R80.40 from R80.30 on my 4400 appliance, I can tell you that my CPU usage is WAY higher. I can't say whether that's because of UMFW, but I can say UMFW was turned on as part of the upgrade on a 4400 appliance with a whopping 2 cores. For all I know, R80.40's HTTPS Inspection may be handling more HTTPS traffic than ever and the increase is due to that. I don't have the expertise or time to dig in and find out exactly why the CPU is so much higher than on R80.30; I just know I'm getting more CPU alerts than ever since upgrading. Thankfully we are due for hardware upgrades this year, so I'm just limping along until that can occur.
Quick update on this. I was working with CP support on an unrelated issue and they noticed that usermode FW was turned on for our 4000 series gateway and they turned it off. My CPU usage was cut in half when they did this. Clearly this should not be turned on for a box this small. Just a reminder, I did not turn on usermode fw, it was turned on automatically by the R80.40 install.
That's not supposed to happen, see my response here:
However, whether USFW is enabled by default has been in a bit of flux over time. Can you recall when you fresh-loaded R80.40 on your 4000?
Hehe, I know it's not supposed to happen; that was the entire reason I replied: to let you know that it is being enabled by default on an R80.40 install on a certified 4000 series appliance that does not meet the minimum requirements to have it enabled.
I wrote the following about a month ago:
Just a quick FYI. I upgraded a 4400 series cluster to R80.40, and UMFW was enabled after installing via a Blink upgrade. My firewall now averages about twice the CPU load it had when running R80.30. I have been following this and other UMFW threads closely because it sure sounds like we should be disabling UMFW on any firewall with fewer than 40 cores, but I'm hesitant to do that considering that Check Point obviously started enabling UMFW by default in R80.40 regardless of the number of cores (or there is a bug in their code that incorrectly enables it).
SecureXL underwent dramatic changes in R80.20, and these changes apply regardless of whether the Firewall Workers are in kernel mode or USFW. More overall responsibilities were shifted to the Firewall Workers, and this is a bit more discernible when in USFW mode as you can see the CPU being used by the individual fwk* processes/threads mentioned by Heiko, instead of all the CPU time just being lumped into sy/si in kernel mode.
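In kernel mode, Firewall Worker time is counted in a core's system (sy) and softirq (si) figures rather than under a process. On Linux those figures come from the per-CPU lines of /proc/stat (fields: user, nice, system, idle, iowait, irq, softirq, steal, ...), and percentages are computed from the difference between two samples. The sample lines below are made up; on a live system you would read /proc/stat twice, a few seconds apart.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse one 'cpuN ...' line from /proc/stat (values in clock ticks)."""
    parts = line.split()
    # guest/guest_nice, if present, are already included in user/nice, so skip them.
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before: str, after: str) -> dict[str, float]:
    """Share of each time category between two samples of the same CPU line."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {k: b[k] - a[k] for k in FIELDS}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}


# Made-up samples of the same core taken some seconds apart.
t0 = "cpu2 1000 0 5000 20000 10 0 3000 0 0 0"
t1 = "cpu2 1100 0 5600 20200 10 0 3100 0 0 0"
pct = cpu_percentages(t0, t1)
print(f"us={pct['user']:.0f}% sy={pct['system']:.0f}% "
      f"si={pct['softirq']:.0f}% id={pct['idle']:.0f}%")
```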
All packets still come through SecureXL/sim/SND first after being emptied from the interface ring buffers by SoftIRQ in R80.20+, but unless a packet matches an existing connection in SecureXL's state table, it is sent to a Firewall Worker instance, which decides whether the connection matches an Accept template, which path the connection should be processed in, etc. This shift in responsibilities is so important to tuning that I created tables in the third edition of my book documenting the shift in tasks between SND/IRQ cores and Firewall Workers that occurred in R80.20, as well as how the processing paths changed. Hopefully these tables will help...
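The flow described above can be modelled, very loosely, to show why per-connection work lands on the Firewall Workers while established, accelerated connections stay on the SNDs. This sketch only counts where each packet would be handled. Connection keys are plain strings, and the assumption that every connection becomes fully accelerated after its first packet is a simplification (many connections stay in the medium or slow path).

```python
from collections import Counter


def dispatch(packets, accelerate_new: bool = True) -> Counter:
    """Count where each packet is handled in a simplified R80.20+ model.

    packets: iterable of connection keys (any hashable, e.g. a 5-tuple).
    """
    sxl_conns = set()   # stand-in for SecureXL's connection table
    where = Counter()
    for conn in packets:
        # Every packet is first looked up by SecureXL on an SND/IRQ core.
        if conn in sxl_conns:
            where["SND"] += 1   # existing accelerated connection: stays on the SND
        else:
            # No SecureXL state: a Firewall Worker checks templates/rulebase
            # and chooses the processing path.
            where["FWK"] += 1
            if accelerate_new:
                sxl_conns.add(conn)   # assumed: later packets are fully accelerated
    return where


long_lived = ["A"] * 100                                      # one long connection
short_lived = [f"c{i}" for i in range(50) for _ in range(2)]  # 50 two-packet connections
print(dispatch(long_lived))
print(dispatch(short_lived))
```

In this toy model, traffic made of many short connections sends a much larger share of packets to the Firewall Workers, while long-lived accelerated flows keep the work on the SND/IRQ cores.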
Thanks for the detailed information.
Table 6 especially was exactly what I was looking for.
This confirms my feeling that we'll need to re-evaluate our SND/fw_worker distribution settings.