Question on traffic handling between Kernel Mode FW and Usermode FW
Hi,
since we switched from R80.10 to R80.30 (kernel 3.10) with UMFW, I am looking for in-depth information on whether packet handling has changed, for example because of how SecureXL is handled in user mode. I found all the great information gathered by @HeikoAnkenbrand, but so far I could not find anything on the topic below.
After the upgrade we see much less CPU usage on the SNDs than before, but a much higher load on the fw_worker instances.
Before the change we averaged 80% CPU on the SNDs and 30% on the fw_workers; now we have 10% on the SNDs but 70% on the fw_workers. We upgraded the hardware to faster CPUs (3.2 GHz instead of 2 GHz), but the core count stayed the same at 16. We used Multi-Queue before, but with R80.30 we can now use Multi-Queue on all interfaces.
Nevertheless, my feeling is that some processing previously done by the SNDs has now moved to the fw_workers.
Note that we currently have a high number of VPN clients connected due to Corona.
Thanks for any insights.
regards Thomas
This is an exciting question. I can confirm similar behavior on some firewalls. What surprises me is that the basic process is already producing about 10%-20% CPU load (without firewall traffic).
In UMFW the fw instances are threads of the fwk0_dev_0 process, so by default top shows the CPU utilization of all threads combined under the main thread. top also has an option to show the utilization per thread.
A small calculation sample for the utilization of process fwk0_dev_0:
$$
\mathrm{fwk0\_dev\_0} = \sum_{x=0}^{\mathrm{max\_CoreXL\_number}} \mathrm{fwk0\_}x + \sum_{x=0}^{\mathrm{max\_CoreXL\_number}} \mathrm{fwk0\_dev\_}x + \mathrm{fwk0\_kissd} + \mathrm{fwk0\_hp}
$$
Threads of the process fwk0_dev_0:
- fwk0_X -> fw instance thread that takes care of the packet processing
- fwk0_dev_X -> the thread that takes care of communication between fw instances and other CP daemons
- fwk0_kissd -> legacy Kernel Infrastructure (obsolete)
- fwk0_hp -> (high priority) cluster thread
Read more here:
R80.x - Performance Tuning Tip – User Mode Firewall vs. Kernel Mode Firewall
Hi Heiko,
yes, I also found the Shift+H option in top to display the individual fw_worker threads.
But if traffic is now distributed differently, then at least in my case the SND/CoreXL split needs to change as well.
Before changing it, though, I want to understand why I see this (heavy) shift in load from the SNDs to the fw_worker CoreXL instances.
Regards Thomas
I should add that there are these new "snd" processes, which also consume CPU.
So there is at least one other process that seems to be related to SND processing ...
Regards Thomas
This is in R80.40:
...
8445 admin 0 -20 2295088 1.058g 126768 S 0.3 14.1 5:10.29 1 fwk0_dev_0
8479 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:00.00 3 fwk0_kissd
8593 admin 0 -20 2295088 1.058g 126768 S 0.3 14.1 4:22.95 3 fwk0_0
8594 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 4:49.87 1 fwk0_1
8595 admin 0 -20 2295088 1.058g 126768 S 0.7 14.1 4:11.93 2 fwk0_2
8617 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:01.29 3 fwk0_service
8618 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 1:06.37 1 fwk0_dev_1
8620 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 1:08.77 2 fwk0_dev_2
8621 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:29.05 2 fwk0_HeavyIoctl
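The calculation from earlier in the thread (process utilization as the sum of its threads) can be tried against the per-thread top output above. This is only a rough sketch: the column positions are assumed from the sample lines (the %CPU value is the ninth field, the thread name the last), the grouping into "instance", "dev" and "other" is my own labelling, and the total covers only the threads shown here because the listing is truncated.

```python
from collections import defaultdict

# Per-thread lines copied from the top output above (the elided part is left out).
TOP_LINES = """\
8445 admin 0 -20 2295088 1.058g 126768 S 0.3 14.1 5:10.29 1 fwk0_dev_0
8479 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:00.00 3 fwk0_kissd
8593 admin 0 -20 2295088 1.058g 126768 S 0.3 14.1 4:22.95 3 fwk0_0
8594 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 4:49.87 1 fwk0_1
8595 admin 0 -20 2295088 1.058g 126768 S 0.7 14.1 4:11.93 2 fwk0_2
8617 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:01.29 3 fwk0_service
8618 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 1:06.37 1 fwk0_dev_1
8620 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 1:08.77 2 fwk0_dev_2
8621 admin 0 -20 2295088 1.058g 126768 S 0.0 14.1 0:29.05 2 fwk0_HeavyIoctl
""".splitlines()


def classify(thread_name):
    """Group a thread by name. The group names are illustrative, not official."""
    if thread_name.startswith("fwk0_dev_"):
        return "dev"
    if thread_name.startswith("fwk0_") and thread_name[len("fwk0_"):].isdigit():
        return "instance"
    return "other"


def cpu_by_group(lines):
    """Sum the %CPU column per thread group (assumed layout: 13 fields per line)."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        if len(fields) < 13:
            continue  # skip headers, blank lines, or truncated rows
        totals[classify(fields[-1])] += float(fields[8])
    return dict(totals)


totals = cpu_by_group(TOP_LINES)
process_total = sum(totals.values())
for group, cpu in sorted(totals.items()):
    print(f"{group:8s} {cpu:.1f}%")
print(f"{'total':8s} {process_total:.1f}%")
```

On a loaded gateway the "instance" group is the number to compare against the SND cores when deciding on the CoreXL split.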
Probably the biggest change between R80.10 and R80.30 is templating. It used to be handled in SecureXL; now all templates have moved to FWK.
If FWK matches a packet to a template (and an accelerated connection), it re-injects the packet into SXL, as shown in sk153832.
Not really. However, in R80.40, there is something called "dynamic split" (sk164155).
With this feature, the system automatically balances the number of SNDs and FWKs to keep CPU utilization reasonable.
Definitely, which is why I documented these changes in my book. There is a shift in responsibilities to Firewall Workers in R80.20+ which increases their CPU load. Running in USFW mode instead of kernel mode for the Firewall Workers also causes additional overhead reaching the Firewall Workers, which incurs additional CPU load. But now with the 40 core limit lifted by USFW you can have lots and lots of Firewall Worker cores to handle these new responsibilities.
This shift may well require reducing the number of SND cores after an upgrade to R80.20+, but this is not a hard and fast rule and highly depends on how much traffic is fully accelerated by SecureXL (Packets/sec in fwaccel stats -s output). With the R80.20 changes, in some cases much more traffic can be fully accelerated by SecureXL than before, thus increasing the load on the SND/IRQ cores...
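To put a number on "how much traffic is fully accelerated", the summary from fwaccel stats -s can be pasted into a small parser. This is a hedged sketch: it assumes summary lines of the form `label : part/total (pct%)`, and the sample text below is illustrative only (the labels are as I remember them and the numbers are made up, not taken from any real gateway).

```python
import re

# ILLUSTRATIVE sample: assumed line layout, invented numbers.
SAMPLE = """\
Accelerated conns/Total conns : 120/400 (30%)
Accelerated pkts/Total pkts   : 7000/10000 (70%)
F2Fed pkts/Total pkts   : 2000/10000 (20%)
PSLXL pkts/Total pkts   : 1000/10000 (10%)
"""

# Matches "<label> : <part>/<total>"; the trailing "(NN%)" is ignored and recomputed.
LINE_RE = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<part>\d+)/(?P<total>\d+)")


def parse_ratios(text):
    """Return {label: part/total} for every line that fits the assumed layout."""
    ratios = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # ignore anything that does not look like a ratio line
        total = int(match["total"])
        ratios[match["label"]] = int(match["part"]) / total if total else 0.0
    return ratios


ratios = parse_ratios(SAMPLE)
for label, ratio in ratios.items():
    print(f"{label:32s} {ratio:6.1%}")
```

A high share of fully accelerated packets points toward keeping more SND cores; a high share of traffic handled by the Firewall Workers points the other way.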
CET (Europe) Timezone Course Scheduled for July 1-2
Now, I am even more worried about my 4-core 3600 that has UMFW enabled by default 😄
Why do you have it in the first place? There is no point in enabling UMFW with fewer than 40 cores.
That's how it came from Check Point. Would you mind asking R&D to confirm it is not enabled by mistake? It does not make sense to me either...
It should not be enabled, unless you are running R80.40. Do you?
It came from the factory with R80.30 and UMFW enabled by default. I am sure about that. Now it is indeed running R80.40. Shall I leave it like that?
Yes, you can.
All right, good to know. Thank you!
Sorry to hijack this thread with something else. I will open another one about UMFW on appliances with fewer cores, in the hope of getting it clarified.
It’s also default in any appliance with more than 40 cores.
The overall performance impact of this (especially in R80.40) should be neutral to positive.
@PhoneBoy wrote:
The overall performance impact of this (especially in R80.40) should be neutral to positive.
After upgrading from R80.30 to R80.40 on my 4400 appliance, I can tell you that my CPU usage is WAY higher. I can't say whether that's because of UMFW or not, but I can say UMFW was turned on as part of the upgrade on a 4400 appliance with a whopping 2 cores. For all I know, it could be that R80.40's HTTPS inspection is handling more HTTPS traffic than ever and the increase is due to that. I don't have the expertise or the time to dig in and find out exactly why the CPU is so much higher than on R80.30; I just know I'm getting more CPU alerts than ever since upgrading. Thankfully we are due for hardware upgrades this year, so I'm just limping along until that can happen.
Could be something else.
Quick update on this. I was working with CP support on an unrelated issue and they noticed that usermode FW was turned on for our 4000 series gateway and they turned it off. My CPU usage was cut in half when they did this. Clearly this should not be turned on for a box this small. Just a reminder, I did not turn on usermode fw, it was turned on automatically by the R80.40 install.
That's not supposed to happen, see my response here:
However whether USFW is enabled by default has been in a bit of flux over time, can you recall when you fresh-loaded R80.40 on your 4000?
Hehe, I know it's not supposed to happen, that was the entire reason I replied back to let you know that it is being enabled by default on an R80.40 install on a certified 4000 series appliance that does not meet the minimum requirements to have it enabled.
I wrote the following about a month ago:
Just a quick FYI. I upgraded a 4400 series cluster to R80.40, and UMFW was enabled after installing via a BLINK upgrade. My firewall is now averaging about twice the CPU load it had when running R80.30. I have been following this and other UMFW threads closely because it sure sounds like we should be disabling UMFW on any firewall with fewer than 40 cores, but I'm hesitant to do that considering that Check Point has obviously started enabling UMFW by default in R80.40 regardless of the number of cores you have (or they have a bug in their code that incorrectly enables it).
There will be a post from @shais that should give us more information and answers on why, when and where USFW is enabled, what the performance impact is, etc. So, stay tuned.
SecureXL underwent dramatic changes in R80.20, and these changes apply regardless of whether the Firewall Workers are in kernel mode or USFW. More overall responsibilities were shifted to the Firewall Workers, and this is a bit more discernible when in USFW mode as you can see the CPU being used by the individual fwk* processes/threads mentioned by Heiko, instead of all the CPU time just being lumped into sy/si in kernel mode.
All packets still come through SecureXL/sim/SND first after being emptied from interface ring buffers by SoftIRQ in R80.20+, but unless the packet matches an existing connection in SecureXL's state table, the packet is sent to a Firewall Worker instance which decides whether the connection matches an Accept template, which path the connection should be processed in, etc. This shift in responsibilities is so important to tuning that I created these tables in the third edition of my book documenting the shift in tasks between SND/IRQ cores and Firewall Workers that occurred in R80.20, as well as how the processing paths changed. Hopefully these tables will help...
Comparison of Processing Paths: R80.10 vs. R80.20+
SND/IRQ Core Tasks: R80.10 vs. R80.20+
Firewall Worker Instance Tasks: R80.10 vs. R80.20+
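The R80.20+ packet flow described above (SecureXL first, Firewall Worker on a state-table miss) can be pictured with a toy model. This is a conceptual sketch only, not Check Point code: the class, the template key (source, destination, destination port) and the rule that every new connection ends up accelerated are simplifying assumptions made for illustration.

```python
class ToyGateway:
    """Toy model of the R80.20+ flow: SND lookup first, worker on a miss."""

    def __init__(self):
        self.sxl_table = set()   # connections SecureXL already knows (SND side)
        self.templates = set()   # Accept templates, held by the workers in this model

    def handle(self, src, sport, dst, dport):
        conn = (src, sport, dst, dport)
        # 1. Every packet reaches SecureXL on an SND core first.
        if conn in self.sxl_table:
            return "SND"
        # 2. Miss: the packet goes to a Firewall Worker, which checks templates.
        template = (src, dst, dport)  # assumed key: source port is ignored
        if template in self.templates:
            self.sxl_table.add(conn)  # offload the connection back to SecureXL
            return "FWK:template"
        # 3. No template: full rule base evaluation on the worker.
        #    Assumption: the connection is accepted and eligible for acceleration.
        self.templates.add(template)
        self.sxl_table.add(conn)
        return "FWK:rulebase"


gw = ToyGateway()
path_log = [
    gw.handle("10.0.0.1", 40000, "10.0.1.1", 443),  # first packet, new connection
    gw.handle("10.0.0.1", 40000, "10.0.1.1", 443),  # same connection again
    gw.handle("10.0.0.1", 40001, "10.0.1.1", 443),  # new connection, same template
    gw.handle("10.0.0.1", 40001, "10.0.1.1", 443),  # now known to SecureXL
]
print(path_log)
```

Even in this simplified picture, every new connection costs worker time, which fits the observation that a high connection rate moves load from the SNDs to the Firewall Workers.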
Hi Timothy,
thanks for the detailed information.
Especially table 6 was exactly what I was looking for.
This confirms my feeling that we'll need to re-evaluate our SND/fw_worker distribution setting.
Thanks again
Thomas
However, if you manually adjust the settings, you lose the ability to leverage that.
