CoreXL SND
Dear All,
I have a few questions about CoreXL SND:
1. Assume my firewall has two SNDs. When traffic reaches the firewall, how is it decided which SND should process it? Is that also based on the load of the two SNDs?
2. Does the working position of the SND correspond to the 'little i'?
3. Is the load on an SND core generally lower than that on a Firewall Instance? If full load occurs, what are the possible reasons?
4. After R80, the SND allocates work based on the load of the Firewall Instances, so has the Global Dispatcher table been completely abandoned?
Thank you.
How many total cores does the system have (8 or more?), and what blades are enabled?
To show the current mapping use: fw ctl affinity -l -r
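For example, a minimal check sequence (fw ctl multik stat is a suggested companion command here, not something mentioned above; output varies by version):
fw ctl affinity -l -r   # which cores are SNDs vs. Firewall instances
fw ctl multik stat      # per-instance status, core assignment and connection counts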
Performance Tuning Guide
ATRG: CoreXL
https://support.checkpoint.com/results/sk/sk98737
Dynamic Balancing
https://support.checkpoint.com/results/sk/sk164155
Hi Chris,
My firewall has 8 CPU cores.
In this case you will also want to review the multi-queue configuration; use: mq_mng --show
Is the gateway configured for a large MTU? Which version / JHF is used?
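A minimal sketch of those checks (the version/MTU commands are standard Gaia/Linux ones, shown here as a suggestion):
mq_mng --show    # per-interface MQ mode and queue counts
fw ver           # gateway version
cpinfo -y all    # installed hotfix (JHF) takes
ip link show     # interface MTUs (look for e.g. "mtu 9000")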
Hey Chris,
Extremely helpful links! Just curious if you know, and I welcome, as always, any other opinions @GigaYang @Timothy_Hall @AkosBakos
Do you guys have any idea what OTHER represents in the screenshot below? It's my eve-ng lab and I gave it 10 CPU cores when it was created a few months ago, though I never checked this until I saw this post. It's an R81.20 jumbo 92 gateway (this one is not a cluster).
Tx!
Andy
Are you using a trial/eval license? Any chance your security gateway container is only allowing 8 cores?
Also is it being presented to the VM as 10 full cores or 5x2 with SMT?
Hey Tim,
That's right, it's an eval license. And as for your 2nd question, it's 10 cores altogether.
Btw @Timothy_Hall, sorry if this may sound like a dumb question, but would the output below indicate each interface's multi-queue is configured for 8 CPU cores?
[Expert@CP-GW:0]# mq_mng -o
Total 10 cores. Available for MQ 2 cores
i/f driver driver mode state mode (queues) cores
actual/avail
------------------------------------------------------------------------------------------------
eth0 vmxnet3 Kernel Up Auto (8/8) 0,1,0,1,0,1,0,1
eth1 vmxnet3 Kernel Up Auto (8/8) 0,1,0,1,0,1,0,1
eth2 vmxnet3 Kernel Up Auto (8/8) 0,1,0,1,0,1,0,1
[Expert@CP-GW:0]#
Andy
Looks like the Check Point code and MQ are only recognizing/using 8 cores (the max for vmxnet3 is 8 cores/queues anyway). 10 cores may not actually be supported; can you increase the number of cores to 12 or 16? Pretty sure you will see all cores being utilized with those core counts.
Will do it shortly and let you know, great suggestion.
Andy
Just tested it, and I believe it makes sense that it's a license limitation.
Andy
[Expert@CP-GW:0]# mq_mng -o
Total 16 cores. Available for MQ 2 cores
i/f driver driver mode state mode (queues) cores
actual/avail
------------------------------------------------------------------------------------------------
eth0 vmxnet3 Kernel Up Auto (8/8) 0,1,0,1,0,1,0,1
eth1 vmxnet3 Kernel Up Auto (8/8) 0,1,0,1,0,1,0,1
eth2 vmxnet3 Kernel Up Auto (8/8) 0,1,0,1,0,1,0,1
[Expert@CP-GW:0]#
I assigned 16 cores.
Hi @GigaYang
- First: what is the version of your GW? R81.20, which take?
- Is Dynamic Balancing enabled?
https://community.checkpoint.com/t5/General-Topics/R80-40-Dynamic-split-of-CoreXL/m-p/66872#M13714
The task of the SND in my (poor) words: it handles the traffic between the interfaces and the FW workers. (for easier understanding)
Here is a thread about packet flow:
SND
Q2:
the "small" "i" means the outside of the incoming interface. The "big" "I" means the inside ... and so on
Q3:
There are a lot of possible scenarios. When a lot of blades are enabled, there can be a lot of traffic that can't be accelerated.
Here is the SK: https://support.checkpoint.com/results/sk/sk32578
Then there are rulebase issues, where templating gets stopped.
You can check with the fwaccel stat command:
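Something like this (a sketch; the exact output wording varies by version):
fwaccel stat
# check that "Accelerator Status" is on, and look at the
# "Accept Templates" line -- if it says "disabled from rule #X",
# that rule is where templating stopped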
Check these things first, before you move further.
Q4:
What do you mean here? Yes, if Dynamic Balancing is enabled, the GW will do everything for the best performance.
Akos
Hi AkosBakos,
My firewall is on R82 now, upgraded from R81.20.
If you are using R82 (or the latest JHFAs for R81.20) AND you are on a Quantum Lightspeed or Quantum Force 9000/19000/29000 appliance, something called UPPAK will be enabled instead of the traditional KPPAK (fwaccel stat to check). Basically, most of SecureXL executes in user space (usim) instead of as a driver in kernel space (sim) if UPPAK is active.
If UPPAK is enabled the SND cores will always register at 100% CPU utilization regardless of traffic load, at least as reported by the Gaia/Linux tools top/vmstat/etc. This is EXPECTED BEHAVIOR due to the migration from interrupt-based to poll-mode processing in UPPAK mode which leverages something called DPDK. Check Point-based status tools such as the CPU screen of cpview will show you the "real" CPU load on the SND cores based on how much traffic they are actually handling.
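To make the check concrete (a sketch using the commands mentioned above):
fwaccel stat   # reports whether UPPAK or the traditional KPPAK is active
cpview         # the CPU screen shows the "real" SND load under UPPAK,
               # unlike top/vmstat which will sit at ~100% on SND cores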
This new SND CPU behavior in UPPAK mode has caused and will continue to cause confusion going forward; check out the DPDK Wikipedia page for more info. I initially assumed this poll-mode behavior was a quite undesirable "busy wait" approach (which traditionally fell into the realm of sloppy programming and was to be desperately avoided at all costs), but it allows modern systems to scale to much greater capacities while keeping the infamous killer of network performance known as jitter to a minimum.
Hi @Timothy_Hall,
Thanks a lot. We will buy two 9300 gateways in 2025.
Just a couple of nagging "admin" notes.
1. Fixed the title, CoreXL is without a space in the middle.
2. The diagram you are referring to is not 100% accurate, according to R&D. Not a big deal, though; the principles it shows still stand.
"The devil is in the detail"
Point 2: Is there a more accurate diagram somewhere?
Á
You got excellent responses so far. I always refer to the ATRG link @Chris_Atkinson gave. Will say though, in R81.20, I never had a need to modify those settings manually.
Andy
I'll take a crack here, I think you are conflating Multi-Queue with the Dynamic Dispatcher:
1. Assume my firewall has two SNDs. When traffic reaches the firewall, how is it decided which SND should process it? Is that also based on the load of the two SNDs?
Assuming that all interfaces with Multi-Queue (MQ) enabled are set for Auto or Dynamic mode (mq_mng --show to check), the NIC driver (which is what actually implements MQ, it is not technically Check Point code) runs a hash calculation of the L3 src/dst IP addresses and L4 src/dst ports, and assigns that flow's packets to an SND core for handling. The assignment at this level is not load-based to my knowledge, as opposed to something like the Dynamic Dispatcher which assigns connections/flows to a Firewall Worker Instance based on load. I believe the flow's reply packets will also be hashed to that same SND core. You can observe how well the traffic balancing is working in cpview via the Advanced... SecureXL... Network Per CPU screen.
If the traffic balance is way out of whack between the SNDs this is usually because someone has messed with the MQ configuration (see the first sentence of the last paragraph) which you should NEVER DO in R80.40 and later, and this will be especially disastrous if Dynamic Split is enabled. All SNDs are considered to be equal as far as processing power. If the traffic is well-balanced between the SNDs but CPU utilization is way out of whack between the SNDs, generally this is a Check Point code issue with SecureXL. Most of the time TAC will be required to assist here, but I will discuss some undocumented techniques for doing this yourself in my CPX 2025 Vegas speech on the CheckMates track.
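A hedged way to observe this in practice (both tools are mentioned above; screens and output vary by version):
mq_mng --show   # confirm all interfaces are in Auto/Dynamic MQ mode
cpview          # then Advanced -> SecureXL -> Network Per CPU to see
                # how evenly flows are hashed across the SND cores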
2. Does the working position of the SND correspond to the 'little i'?
Traditionally (prior to R80.20) "i" would only indicate the entrance to the slowpath, and packets in the medium/fast path would not traverse it at all. In R80.20+ the first packet of every new connection/session always goes slowpath, and the rest of the connection is then hopefully offloaded to the Medium Path or fastpath.
However, new capture tools like fw monitor -F (available in R80.20+) now show all packets coming into the SND as "i", so I'm not sure what to think now. In the modern releases (R80.20+), I suppose "i" could be interpreted as the point where a packet is handed off from the Gaia Linux OS code (NIC driver & ring buffer) to the Check Point code (SecureXL dispatcher or worker instance code). If someone from R&D could further clarify this, that would be helpful.
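For reference, a sketch of the -F syntax: five comma-separated fields (src, sport, dst, dport, proto), where 0 acts as a wildcard; the address below is made up:
fw monitor -F "10.1.1.100,0,0,0,0"   # everything from 10.1.1.100
fw monitor -F "0,0,0,0,0"            # all traffic (use with care)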
3. Is the load on an SND core generally lower than that on a Firewall Instance? If full load occurs, what are the possible reasons?
This is highly dependent on the distribution of traffic between the fastpath, medium path, and slowpath (fwaccel stats -s). On a firewall with no deep inspection blades enabled (APCL, TP etc.) a high percentage of traffic will be completely processed on the SND cores only (other than the first packet of a new connection/session, which always goes slowpath). However, the bulk of traffic on modern firewalls is examined by deep inspection in the Medium Path, and sometimes the slowpath, on a Firewall Worker Instance. So the inspection operations are much more intensive on a worker instance compared to an SND, which is why there generally tend to be more worker instances than SND instances on most firewalls, unless the percentage of fastpath traffic is extremely high.
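A quick sketch of reading that distribution (field names differ slightly across versions):
fwaccel stats -s
# "Accelerated pkts" -> fastpath, handled entirely on the SNDs
# "PXL pkts"         -> Medium Path, inspected on worker instances
# "F2F pkts"         -> slowpath, fully processed by a worker instance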
When you say "full load" I assume you mean either just the SNDs are saturated or only the Worker Instances are saturated. Dynamic Split can help with this if there is enough spare CPU capacity overall. The most common cause of high load on worker instances is excessive slowpath/F2F traffic. The most common cause of high load on SNDs is a very high amount of fastpath traffic, or possibly an MQ or Check Point SecureXL code issue.
4. After R80, the SND allocates work based on the load of the Firewall Instances, so has the Global Dispatcher table been completely abandoned?
As far as the SND that runs the Dynamic Dispatcher is concerned, all Worker Instances are equal in overall capability, unless the server architecture has Intel's P-cores and E-cores present, which is a whole other can of worms. But for the most part, when the first packet of a new connection/session arrives at an SND, it assigns the connection and all its subsequent packets to the least-loaded Worker Instance. This assignment is tracked in the SNDs by what I believe you are calling the "Global Dispatcher Table", which can be viewed with fw ctl multik gconn; this is necessary because all the packets of a single connection must always be handled by the same Worker Instance, even with Hyperflow.
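As a sketch, the two commands that expose this bookkeeping (output format is version dependent):
fw ctl multik gconn   # global connections table: which worker instance
                      # owns each connection
fw ctl multik stat    # per-instance load the Dynamic Dispatcher balances against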
Amazing answer, THX @Timothy_Hall
Indeed! I always think Tim's book is superb, which it is, but then sometimes I see even better explanations here 🙂
Yes. The book is great, I have bought it.
When traffic goes via the Accelerated Path, why does the SND core load increase?
If it's not the fw_worker that is loaded, the load must arise somewhere. That's the SND 🙂
You said it, that's so true.
@GigaYang I think the link below would also be helpful.
Andy
I believe the little "i" still has the same meaning as before: a trap before the fw VM.
Hey Giga,
Just something I noticed, though this sort of made sense to me already 🙂
Andy
CP-GW> set dynamic-balancing state enable
Dynamic Balancing is not supported on open server appliances
CP-GW>
Yep, but you can force Dynamic Split active on a system that does not support it for lab purposes. Here is the secret expert mode command we use in my Gateway Performance Optimization course (which is implemented in VMware); a reboot will be required:
dynamic_split -o enable_automation
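After the reboot, a hedged way to confirm the state from clish (per sk164155; the command form mirrors the set command shown earlier):
show dynamic-balancing state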
Let's see if it works, will report soon.
Andy
Guess eve-ng is special lol
Andy
[Expert@CP-GW:0]# dynamic_split -o enable_automation
Dynamic Balancing is not supported on open server appliances
[Expert@CP-GW:0]#
