Dynamic dispatcher issue with R80.30 - Part 2
Hi again,
The issue has now reappeared (https://community.checkpoint.com/t5/Next-Generation-Firewall/Dynamic-dispatcher-issue-with-R80-30/m-...) on the other cluster member.
I still wonder how this uneven connection distribution can happen with the Dynamic Dispatcher:
[Expert@FW:0]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 15 | 64943 | 74447
1 | Yes | 7 | 53890 | 63290
2 | Yes | 14 | 33710 | 52682
3 | Yes | 6 | 15808 | 41551
4 | Yes | 13 | 5164 | 32555
5 | Yes | 5 | 1542 | 26356
6 | Yes | 12 | 792 | 26865
7 | Yes | 4 | 875 | 26358
8 | Yes | 11 | 800 | 26930
9 | Yes | 3 | 940 | 26791
10 | Yes | 10 | 743 | 27393
So @HeikoAnkenbrand already explained:
The rank for each CoreXL FW instance is calculated according to its CPU utilization (only for first packet).
The higher the CPU utilization, the higher the CoreXL FW instance's rank is, hence this CoreXL FW instance is less likely to be selected by the CoreXL SND.
The CoreXL Dynamic Dispatcher allows for better load distribution and helps mitigate connectivity issues during traffic "peaks", as connections opened at a high rate that would have been assigned to the same CoreXL FW instance by a static decision, will now be distributed to several CoreXL FW instances.
The following points can influence an asymmetrical distribution:
- Elephant flows with high CPU utilization per CPU core
- Other FW processes that increase the CPU usage of a core.
In your example these processes:
mpdaemon fwd pdpd lpd pepd dtpsd in.acapd dtlsd in.asessiond rtmd vpnd cprid cpd
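For reference, both factors can be checked directly on the gateway. print_heavy_conn is used later in this thread to list detected elephant flows; the dynamic_dispatching check is an assumption based on sk105261 and simply confirms the dispatcher is actually enabled:
fw ctl multik dynamic_dispatching get_mode   # assumption (sk105261): shows whether the Dynamic Dispatcher is on
fw ctl multik print_heavy_conn               # lists connections the gateway has flagged as elephant flows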
I understand that an elephant flow causes high CPU utilization, but it should not cause a high connection count on one CPU and much lower counts on the others. If a new connection is established while an elephant flow is running, the Dynamic Dispatcher should still assign it to a less loaded CPU. Other FW processes might also increase load, but (at least to my understanding) they should not distribute connections unevenly.
Can anyone help?
Thanks and regards
Thomas
Accepted Solutions
So it looks like the issue was solved by adding two fwkern parameters.
In $FWDIR/boot/modules/fwkern.conf:
fwmultik_enable_round_robin=1
fwmultik_enable_increment_first=1
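The entries in fwkern.conf are read at boot, so they take effect after a reboot. A minimal sketch for checking them afterwards, assuming these fwmultik parameters are exposed to the standard "fw ctl get int" query:
grep fwmultik $FWDIR/boot/modules/fwkern.conf     # confirm the two lines are present
fw ctl get int fwmultik_enable_round_robin        # assumption: returns 1 once the parameter is loaded
fw ctl get int fwmultik_enable_increment_first    # assumption: returns 1 once the parameter is loaded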
This was done by CP support.
Regards Thomas
The Dynamic Dispatcher does not directly care about the number of connections currently assigned to a firewall worker instance when it makes its dispatching decision for a new connection; all it looks at is the current CPU load on the firewall worker instance cores. If all connections were exactly identical in CPU utilization (which is not always directly proportional to bandwidth utilization; it depends on which processing path the connection takes on the firewall worker instance), then yes, the total number of connections per worker instance would be evenly distributed. However, each connection differs in how much bandwidth it attempts to use and how much CPU it consumes at any given moment, and of course elephant flows can cause mayhem.
If you run top and hit 1, what is the CPU load of the worker instance cores? Barring any current elephant flows (use the command fw ctl multik print_heavy_conn to see if there are any), I'd consider a CPU utilization variance of up to 25% across the assigned firewall worker instance cores completely normal.
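In concrete terms, and assuming nothing beyond the commands already referenced in this thread, the per-core view and the instance-to-core mapping can be gathered like this:
top                     # press "1" to expand the per-core utilization view
fw ctl affinity -l -r   # shows which CoreXL FW instance (and which daemons) run on which core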
Hi Timothy,
this is the top output:
top - 11:12:41 up 27 days, 15:40, 4 users, load average: 6.56, 6.52, 6.36
Threads: 458 total, 8 running, 450 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 2.0 sy, 0.0 ni, 33.3 id, 0.0 wa, 0.0 hi, 64.7 si, 0.0 st
%Cpu1 : 0.0 us, 4.4 sy, 0.0 ni, 67.6 id, 0.0 wa, 0.0 hi, 28.0 si, 0.0 st
%Cpu2 : 0.0 us, 3.4 sy, 0.0 ni, 66.9 id, 0.0 wa, 0.0 hi, 29.8 si, 0.0 st
%Cpu3 : 1.3 us, 2.7 sy, 0.0 ni, 96.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 4.0 us, 2.7 sy, 0.0 ni, 93.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 8.9 us, 4.1 sy, 0.0 ni, 87.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 61.9 us, 23.0 sy, 0.0 ni, 15.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 70.6 us, 26.4 sy, 0.0 ni, 2.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu8 : 0.0 us, 4.0 sy, 0.0 ni, 62.5 id, 0.0 wa, 0.0 hi, 33.5 si, 0.0 st
%Cpu9 : 0.0 us, 6.2 sy, 0.0 ni, 54.7 id, 0.0 wa, 0.0 hi, 39.1 si, 0.0 st
%Cpu10 : 6.7 us, 5.4 sy, 0.0 ni, 87.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 4.4 us, 4.7 sy, 0.0 ni, 90.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 2.7 us, 5.3 sy, 0.0 ni, 92.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 33.0 us, 15.4 sy, 0.0 ni, 51.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 69.5 us, 22.7 sy, 0.0 ni, 7.5 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu15 : 71.7 us, 26.6 sy, 0.0 ni, 1.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 26411572+total, 24251463+free, 15387788 used, 6213300 buff/cache
KiB Swap: 67103500 total, 67103500 free, 0 used. 24679282+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22523 admin 0 -20 11.158g 9.118g 484696 R 97.0 3.6 4952:37 fwk0_0
22524 admin 0 -20 11.158g 9.118g 484696 R 92.7 3.6 3820:58 fwk0_1
22525 admin 0 -20 11.158g 9.118g 484696 S 89.7 3.6 3246:35 fwk0_2
22526 admin 0 -20 11.158g 9.118g 484696 R 80.7 3.6 2865:29 fwk0_3
22527 admin 0 -20 11.158g 9.118g 484696 S 46.2 3.6 2694:37 fwk0_4
The CPUs with fw workers are 3-7 and 10-15, and you can see in the top output above that the CPU usage varies greatly (far more than 25%).
[Expert@FW01:0]# fw ctl affinity -l -r
CPU 0:
CPU 1:
CPU 2:
CPU 3: fw_9
mpdaemon fwd dtlsd vpnd in.asessiond lpd in.acapd rtmd pdpd dtpsd pepd cprid cpd
CPU 4: fw_7
mpdaemon fwd dtlsd vpnd in.asessiond lpd in.acapd rtmd pdpd dtpsd pepd cprid cpd
CPU 5: fw_5
mpdaemon fwd dtlsd vpnd in.asessiond lpd in.acapd rtmd pdpd dtpsd pepd cprid cpd
CPU 6: fw_3
mpdaemon fwd dtlsd vpnd in.asessiond lpd in.acapd rtmd pdpd dtpsd pepd cprid cpd
CPU 7: fw_1
mpdaemon fwd dtlsd vpnd in.asessiond lpd in.acapd rtmd pdpd dtpsd pepd cprid cpd
CPU 8:
CPU 9:
CPU 10: fw_10
mpdaemon fwd dtlsd vpnd in.asessiond lpd in.acapd rtmd pdpd dtpsd pepd cprid cpd
CPU 11: fw_8
mpdaemon fwd dtlsd vpnd in.asessiond lpd in.acapd rtmd pdpd dtpsd pepd cprid cpd
CPU 12: fw_6
mpdaemon fwd dtlsd vpnd in.asessiond lpd in.acapd rtmd pdpd dtpsd pepd cprid cpd
CPU 13: fw_4
mpdaemon fwd dtlsd vpnd in.asessiond lpd in.acapd rtmd pdpd dtpsd pepd cprid cpd
CPU 14: fw_2
mpdaemon fwd dtlsd vpnd in.asessiond lpd in.acapd rtmd pdpd dtpsd pepd cprid cpd
CPU 15: fw_0
mpdaemon fwd dtlsd vpnd in.asessiond lpd in.acapd rtmd pdpd dtpsd pepd cprid cpd
I think "fw ctl multik print_heavy_conn" will not work with R80.30 in UMFW, right ?
If I run the command the output is zero ...
Regards Thomas
