Automatic sim affinity deprecated in R80.40

Daniel_Szydelko · ‎2021-03-26

Hello,

We recently upgraded a cluster of HP open servers from Gaia R77.30 to R80.40 (CPUSE clean install).

Before the upgrade, the open servers used the following settings:
- on-board network interfaces
- license for 4 cpu cores
- CoreXL settings: 2/2 (2x SND / 2x FW Instance)
- blade: FW only

Before the upgrade, the affinity was:

[Expert@sg1:0]# fw ctl affinity -l -r -v -a
CPU 0: eth0 (irq 91) eth3 (irq 107)
CPU 1: eth1 (irq 123) eth2 (irq 131)
CPU 2: fw_1
CPU 3: fw_0
CPU 4:
CPU 5:
CPU 6:
CPU 7:
CPU 8:
CPU 9:
CPU 10:
CPU 11:
CPU 12:
CPU 13:
CPU 14:
CPU 15:
All: rtmd mpdaemon fwd cpd cprid

After the upgrade to R80.40 we noticed that only one SND CPU core is utilized:

[Expert@sg1:0]# fw ctl affinity -l -r -v -a
CPU 0: eth0 (irq 96) eth1 (irq 111) eth2 (irq 101) eth3 (irq 106)
CPU 1:
CPU 2: fw_1
in.asessiond rtmd mpdaemon fwd lpd cprid cpd
CPU 3: fw_0
in.asessiond rtmd mpdaemon fwd lpd cprid cpd
CPU 4:
CPU 5:
CPU 6:
CPU 7:
CPU 8:
CPU 9:
CPU 10:
CPU 11:
CPU 12:
CPU 13:
CPU 14:
CPU 15:
All:
The current license permits the use of CPUs 0, 1, 2, 3 only.

cpview showed the state of the second SND CPU core as OTHER.
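
A quick way to spot this kind of imbalance is to count the interfaces attached to each CPU in the affinity listing. This is only a minimal sketch, assuming the output format shown above; the function name snd_distribution is made up for illustration.

```shell
#!/bin/bash
# Count interfaces per CPU in "fw ctl affinity -l -r" style output.
# Assumes lines like: "CPU 0: eth0 (irq 96) eth1 (irq 111)".
# CPUs without any "(irq N)" entry are not printed.
snd_distribution() {
    awk '/^CPU [0-9]+:/ {
             cpu = $2
             sub(":", "", cpu)
             # gsub returns the number of matches; "&" keeps the text unchanged
             n = gsub(/\(irq [0-9]+\)/, "&")
             if (n > 0) printf "CPU %s: %d interface(s)\n", cpu, n
         }' "$@"
}
```

Usage would be along the lines of `fw ctl affinity -l -r -v -a | snd_distribution`. A single output line means all interfaces share one SND core.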

We also noticed repeated CPU spikes on the one SND core in use, logged in /var/log/messages:

Feb 19 09:40:48 2021 sg1 kernel: [fw4_1];CLUS-120200-1: Starting CUL mode because CPU-00 usage (81%) on the local member increased above the configured threshold (80%).
Feb 19 09:40:58 2021 sg1 kernel: [fw4_1];CLUS-120202-1: Stopping CUL mode after 10 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
Feb 19 09:45:00 2021 sg1 kernel: [fw4_1];CLUS-120200-1: Starting CUL mode because CPU-00 usage (86%) on the local member increased above the configured threshold (80%).
Feb 19 09:45:10 2021 sg1 kernel: [fw4_1];CLUS-120202-1: Stopping CUL mode after 10 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
Feb 19 09:46:56 2021 sg1 kernel: [fw4_1];CLUS-120200-1: Starting CUL mode because CPU-00 usage (82%) on the local member increased above the configured threshold (80%).
Feb 19 09:47:06 2021 sg1 kernel: [fw4_1];CLUS-120202-1: Stopping CUL mode after 10 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
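
To get a feel for how often this happens and on which core, the "Starting CUL mode" entries can be summarized per CPU. This is a rough sketch, assuming the message format matches the excerpt above; the function name cul_summary is hypothetical.

```shell
#!/bin/bash
# Summarize "Starting CUL mode" events per CPU: event count and peak usage.
# Reads the files given as arguments, or stdin if none are given.
cul_summary() {
    grep 'Starting CUL mode' "$@" |
        # Extract "CPU-NN" and the usage percentage from each start message
        sed -n 's/.*because \(CPU-[0-9]*\) usage (\([0-9]*\)%).*/\1 \2/p' |
        awk '{ n[$1]++; if ($2 > max[$1]) max[$1] = $2 }
             END { for (c in n) printf "%s events=%d peak=%d%%\n", c, n[c], max[c] }' |
        sort
}
```

For example: `cul_summary /var/log/messages`. If every event names the same CPU, that core is the bottleneck.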

We tried to force automatic interface affinity by:
- rebooting both SGs
- running the sim affinity -a command
- pushing more traffic from different sources and destinations
but nothing got the second SND core to take any load.

Because these open servers use on-board network cards, Multi-Queue is not supported.

As we didn't want to set affinities manually, we opened a TAC ticket to investigate why automatic sim affinity isn't working.

Finally we received information from TAC:

I have checked internally and found that the feature for 'sim' to have an automatic affinity, where the interface affinity would be checked and adjusted accordingly every minute as per the load is not available on R80.40.
sk98737 is also under process to be updated accordingly.
From the activity, the sim affinity balancing on the interface will not be achieved as it's deprecated, We will have to set the affinity of interfaces manually, please try using:
fw ctl affinity -s -i <Interface Name> <CPU ID>

Can anyone confirm that automatic sim affinity is deprecated starting from R80.40?

Best Regards
Daniel Szydelko.

_Val_ · ‎2021-03-26

Please confirm you are using 3.10 kernel after upgrade.

Daniel_Szydelko · ‎2021-03-26

[Expert@sg1:0]# clish -c "show version all"
Product version Check Point Gaia R80.40
OS build 294
OS kernel version 3.10.0-957.21.3cpx86_64
OS edition 64-bit

[Expert@sg1:0]# U=`cpprod_util FwIsUsermode`; [ $U == 0 ] && { U="Kernel Mode"; } || { U="User Mode"; }; echo; echo "Firewall: $U";echo;

Firewall: Kernel Mode

_Val_ · ‎2021-03-26

Correct, R80.40 is 3.10 kernel based. The platform uses a new affinity mechanism provided by the 3.10 OS kernel, which makes automatic sim affinity obsolete.

Since you are using an open server with just 4 cores, and only 2 of them as FWKs, I would suggest manual affinity settings, as support advises you to do.

Daniel_Szydelko · ‎2021-03-26

OK. Is there any documentation describing this change?
I only found sk170012, which just says: "sim affinity" has been deprecated in R80.40.

ATRG: CoreXL has not been updated yet and still states:
===
Default affinity settings for interfaces:

- If SecureXL is enabled - the default affinities of all interfaces are 'Automatic' - the affinity for each interface is automatically reset every 60 seconds, and balanced between available CPU cores based on the current load.
===

By new affinity mechanism do you mean Dynamic Split?

_Val_ · ‎2021-03-26

"sim affinity" is a command, and it is changed to "fw affinity" because of the kernel change.
As your TAC engineer already communicated, the ATRG is being changed; please give the SecureKnowledge team a chance to do that. There is a process which takes some days. Frankly, this is an oversight on our end, apologies for that.
"Dynamic Split" is now called "Dynamic Workflows" (DW). DW leverages new affinity capability of RH 3.10 kernel, but it is not the point. You cannot use DW for two reasons: open server, not enough cores. You need at least 8 licensed cores for that. Open server support is still on the road map.

By "new capabilities" I mean what DW is actually leveraging: OS-level support for queuing all CPUs to all HW and SW interrupts, with the ability to enable and disable some of those queues on the fly.

Daniel_Szydelko · ‎2021-03-26

1. OK. I was aware that sim affinity changed to fw affinity, but not that manual affinity is needed starting with R80.40 (3.10 kernel) because automatic affinity stopped working / is deprecated.
2. Understood
3. Correct, based on sk164155 - Dynamic Balancing is supported only on Check Point Appliances

Anyway, thanks Val.

Daniel_Szydelko · ‎2021-03-26

The Performance Tuning R80.40 Administration Guide has also not been updated and says:

Note - When the SecureXL is enabled (this is the default), only the SecureXL SIM Affinity configuration defines the interfaces affinities (see sim affinity). Security Gateway ignores the interface affinity settings in the $FWDIR/conf/fwaffinity.conf file.

Timothy_Hall · ‎2021-03-26

Automatic interface affinity actually went away in version R80.20 when SecureXL was substantially reworked, as mentioned in the third edition of my book. Allocating enough SND/IRQ cores in your CoreXL split and manually enabling Multi-Queue on the interfaces that need it (kernel 2.6.18) is what replaced this functionality. On firewalls utilizing the 3.10 kernel, Multi-Queue is enabled on all interfaces that support it by default (except for the management interface in some early Jumbo HFAs). Multi-Queue attempts to keep the SND/IRQ cores balanced as automatic interface affinity once did and frankly does a much better job of it, but MQ's balancing mechanism is not perfect under all traffic conditions.

If your NICs do not support Multi-Queue (please post output of expert mode command mq_mng -vv --show to confirm) then TAC is correct and you are stuck doing manual interface affinity to spread out the interface processing duties across your SND/IRQ cores. Having your cores limited to 4 by license will certainly not help this situation, and you'll need to do some traffic analysis with cpview to try to keep your busiest interfaces off of the same SND/IRQ core.
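
The planning step described above can be sketched as a simple greedy split: take the busiest interface first and put each one on the SND core with the least load assigned so far. This is only an illustration. The per-interface load figures are assumed to come from your own measurements (for example, rates read off cpview), and the function name plan_affinity is hypothetical. The script only prints fw ctl affinity -s -i commands; it does not run them.

```shell
#!/bin/bash
# Greedy split of interfaces across SND cores.
# $1:    space-separated SND core IDs, e.g. "0 1"
# stdin: lines of "<interface> <load>" (any consistent unit, e.g. packets/sec)
# Output: one "fw ctl affinity -s -i <interface> <cpu>" line per interface.
plan_affinity() {
    # Heaviest interface first; ties broken by interface name
    sort -k2,2nr -k1,1 |
        awk -v cores="$1" '
            BEGIN { n = split(cores, id, " ") }
            {
                # Pick the core with the smallest load assigned so far
                best = 1
                for (i = 2; i <= n; i++) if (sum[i] < sum[best]) best = i
                sum[best] += $2
                printf "fw ctl affinity -s -i %s %s\n", $1, id[best]
            }'
}
```

For example, `printf 'eth0 500\neth1 300\neth2 200\neth3 100\n' | plan_affinity "0 1"` (made-up numbers) pairs eth0 with eth3 on CPU 0 and eth1 with eth2 on CPU 1. Review the printed commands before running any of them.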

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com

Daniel_Szydelko · ‎2021-03-26

Timothy_Hall, that's clear, and it applies where the device supports MQ.

These open servers use the tg3 driver for the on-board network interface cards, so MQ is not supported:

[Expert@sg1:0]# ethtool -i eth0
driver: tg3
version: 3.137
firmware-version: 5719-v1.46 NCSI v1.4.16.0
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

and any MQ command produces empty results.

In this case we will go ahead with manual configuration of interface affinity.

I hope Check Point is going to update ATRG: CoreXL and the Performance Tuning Guide.

Timothy_Hall · ‎2021-03-26

SecureXL changed very little between versions R70 and R80.10 which was a relatively long period of time, so there is still quite a bit of documentation/SKs/courseware that reflects how SecureXL operated pre-R80.20. I flag it whenever I run into one of these, but clearly some still need to be updated.


PhoneBoy · ‎2021-03-28

Automatic sim affinity was removed as of R80.20 and that is definitely documented here:
https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...
At least on 8+ core Check Point appliances, it's possible to leverage Dynamic Split (from R80.40), which changes the SND/FWK mix on the fly.
(Open Servers don't support this as of yet)

Daniel_Szydelko · ‎2021-03-28

Thx. It's really easy to miss this information when the main documents (sk98737 - ATRG: CoreXL and the Performance Tuning Administration Guides for R80.30, R80.40 and even R81) still describe pre-R80.20 behavior in this particular case.

Daniel_Szydelko · ‎2021-04-13

Just to close this thread:

To set interface affinity manually we should use:
fw ctl affinity -s -i <Interface Name> <CPU ID>
- it survives a reboot
- it automatically updates $FWDIR/conf/fwaffinity.conf, like this:

[Expert@sg1:0]# cat $FWDIR/conf/fwaffinity.conf
# Process / Interface Affinity Settings
# -------------------------------------
#
# Each line shoud contain:
# 1. A type - 1 character. "i" for interface, "n" for process name, "k" for kernel instance.
# 2. An ID - interface name, process name, or kernel instance number.
# a. For interfaces, you can also write "default", and the setting would apply to any interface not
# mentioned in the file.
# 3. The desired affinity. Either:
# a. One or more CPU numbers.
# b. "all" - all CPUs are eligible.
# c. "ignore" - do nothing for this entry.
# d. "auto" - use any free CPU. A free CPU is one that doesn't appear in any line in this file,
# and doesn't run a worker thread.
#
i default auto
i eth0 0
i eth1 0
i eth2 1
i eth3 1
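
To double-check the resulting layout, the interface lines of the file can be grouped by CPU. This is a minimal sketch, assuming the "i <name> <cpu>" format shown above; the function name affinity_by_cpu is made up, and only entries with a single CPU number are handled.

```shell
#!/bin/bash
# Group interfaces by CPU from fwaffinity.conf-style input.
# Only "i <interface> <cpu>" lines with one numeric CPU are considered;
# comments, "default", "auto", "all" and "ignore" entries are skipped.
affinity_by_cpu() {
    awk '$1 == "i" && $2 != "default" && $3 ~ /^[0-9]+$/ {
             ifs[$3] = ifs[$3] " " $2
         }
         END { for (c in ifs) printf "CPU %s:%s\n", c, ifs[c] }' "$@" |
        sort -k2,2n
}
```

For example: `affinity_by_cpu $FWDIR/conf/fwaffinity.conf`. With the file above this should list eth0 and eth1 under CPU 0, and eth2 and eth3 under CPU 1.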

So disregard what the official Performance Tuning R80.40 Administration Guide says:

Running the 'fw ctl affinity -s' command in Gateway Mode

Description

The fw ctl affinity -s command configures the CoreXL affinity settings on a Security Gateway for:

- Interfaces
- User-space processes
- CoreXL Firewall instances

Notes:

- Changes you make with this command do not survive the Security Gateway reboot.
If you want the settings to survive reboot, do one of these:
. Manually edit the $FWDIR/conf/fwaffinity.conf configuration file.
. Run the sim affinity -s command (configures the affinity for interfaces only).
- The fw ctl affinity -s command cannot configure affinity for interfaces, if you already configured affinity for interfaces with the SecureXL sim affinity command (in Automatic or Static mode).
