GigaYang
Collaborator

Core XL SND

Dear All,

I have a few questions about CoreXL SND:
1. Assume that my firewall has two SNDs. When traffic reaches the firewall, how is it decided which SND processes it? Is the decision also based on the load of the two SNDs?
2. Does the working position of SND correspond to the 'little i'?
3. Is the load of SND Core generally smaller than that of Firewall Instance? If full load occurs, what are the possible reasons?
4. After R80, the SND allocates work based on the load of the Firewall Instances, so has the Global Dispatcher table been completely abandoned?

Thank you.


Accepted Solutions
AkosBakos
Leader

Hi @GigaYang 

The task of the SND, in my own (poor) words: it handles the traffic between the network interfaces and the FW workers (a simplification, for easier understanding).

Here is a thread about packet flow:

https://community.checkpoint.com/t5/General-Topics/R81-x-Security-Gateway-Architecture-Logical-Packe...


Q2:

The "small" "i" means the outside of the incoming interface (pre-inbound). The "big" "I" means the inside (post-inbound), and so on for "o" and "O" on the outbound side.

Q3:

There are a lot of possible scenarios. Because there are a lot of blades enabled, there can be a lot of traffic that can't be accelerated.
Here is the sk: https://support.checkpoint.com/results/sk/sk32578

Then there are rulebase issues, where the templating stopped.

You can check with the fwaccel stat command:

[Screenshot: 2024-12-29 13_59_31-sk32578 - SecureXL Mechanism.png]

Check these things first, before you move further.

Q4:

What do you mean here? Yes, if Dynamic Balancing is enabled, the GW will do everything for the best performance.

Akos

----------------
\m/_(>_<)_\m/

Timothy_Hall
Legend

I'll take a crack at this. I think you are conflating Multi-Queue with the Dynamic Dispatcher:

1. Assume that my firewall has two SNDs. When traffic reaches the firewall, how is it decided which SND processes it? Is the decision also based on the load of the two SNDs?

Assuming that all interfaces with Multi-Queue (MQ) enabled are set for Auto or Dynamic mode (mq_mng --show to check), the NIC driver (which is what actually implements MQ, it is not technically Check Point code) runs a hash calculation of the L3 src/dst IP addresses and L4 src/dst ports, and assigns that flow's packets to an SND core for handling.  The assignment at this level is not load-based to my knowledge, as opposed to something like the Dynamic Dispatcher which assigns connections/flows to a Firewall Worker Instance based on load.  I believe the flow's reply packets will also be hashed to that same SND core.  You can observe how well the traffic balancing is working in cpview via the Advanced...SecureXL..Network Per CPU screen. 
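
To make the "hash, not load" point concrete, here is a minimal sketch of hash-based flow placement. It is only an illustration: real NIC drivers use their own receive-side hashing, and SHA-256, the function name snd_for_flow, and the endpoint-sorting trick used to keep both directions on one core are all assumptions made for this example.

```python
import hashlib


def snd_for_flow(src_ip, src_port, dst_ip, dst_port, num_snds):
    """Pick an SND index from addresses and ports only; load is never consulted."""
    # Sort the two endpoints so both directions of a flow produce the same key.
    endpoints = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = repr(endpoints).encode()
    # SHA-256 is a stand-in for whatever hash the NIC driver really uses.
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_snds


request = snd_for_flow("10.1.1.10", 40000, "192.0.2.80", 443, num_snds=2)
reply = snd_for_flow("192.0.2.80", 443, "10.1.1.10", 40000, num_snds=2)
print(request, reply)  # both directions land on the same SND index
```

Because the result depends only on the packet headers, a handful of very heavy flows can still end up on the same SND; nothing in this scheme moves them based on CPU load.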

If the traffic balance is way out of whack between the SNDs, this is usually because someone has messed with the MQ configuration (see the first sentence of the previous paragraph), which you should NEVER DO in R80.40 and later; this will be especially disastrous if Dynamic Split is enabled.  All SNDs are considered equal as far as processing power.  If the traffic is well-balanced between the SNDs but CPU utilization is way out of whack between them, this is generally a Check Point code issue with SecureXL.  Most of the time TAC will be required to assist here, but I will discuss some undocumented techniques for doing this yourself in my CPX 2025 Vegas speech on the CheckMates track.

2. Does the working position of SND correspond to the 'little i'?

Traditionally (prior to R80.20) "i" would only indicate the entrance to the slowpath, and packets in the medium path or fastpath would not traverse it at all.  The exception, in R80.20+, is the first packet of every new connection/session, which always goes slowpath; the rest of the connection is then hopefully offloaded to the medium path or fastpath.

However new capture tools like fw monitor -F available in R80.20+ now show all packets coming into the SND as "i" so I'm not sure what to think now.  In the modern releases (R80.20+), I suppose "i" could be interpreted as when a packet is handed off from the Gaia Linux OS code (NIC driver & ring buffer) to the Check Point code (SecureXL dispatcher or worker instance code).  If someone from R&D could further clarify this that would be helpful.

3. Is the load of SND Core generally smaller than that of Firewall Instance? If full load occurs, what are the possible reasons?

This is highly dependent on the distribution of traffic between the fastpath, medium path, and slowpath (fwaccel stats -s).  On a firewall with no deep inspection blades enabled (APCL, TP, etc.) a high percentage of traffic will be completely processed on the SND cores only (other than the first packet of a new connection/session, which always goes slowpath).  However, the bulk of traffic on modern firewalls is examined by deep inspection in the medium path, and sometimes the slowpath, on a Firewall Worker Instance.  So the inspection operations are much more intensive on a worker instance than on an SND, which is why there generally tend to be more worker instances than SND instances on most firewalls, unless the percentage of fastpath traffic is extremely high.

When you say "full load" I assume you mean either just the SNDs are saturated or only the Worker Instances are saturated.  Dynamic Split can help with this if there is enough spare CPU capacity overall.  The most common cause of high load on worker instances is excessive slowpath/F2F traffic.  The most common cause of high load on SNDs is a very high amount of fastpath traffic, or possibly an MQ or Check Point SecureXL code issue.
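
As a rough illustration of that reasoning, the arithmetic can be sketched as below. The packet counts are made-up inputs, not parsed from fwaccel stats -s, and the function name tier_share is hypothetical; the only assumption taken from the text is that fastpath traffic is finished on the SND while medium path and slowpath traffic needs a worker instance.

```python
def tier_share(fastpath_pkts, medium_pkts, slowpath_pkts):
    """Return the fraction of packets finished on the SND vs. needing a worker."""
    total = fastpath_pkts + medium_pkts + slowpath_pkts
    if total == 0:
        return {"snd_only": 0.0, "worker": 0.0}
    return {
        # Fastpath packets are fully handled on the SND cores.
        "snd_only": fastpath_pkts / total,
        # Medium path and slowpath packets involve a Firewall Worker Instance.
        "worker": (medium_pkts + slowpath_pkts) / total,
    }


# Hypothetical counts for a gateway with few deep inspection blades enabled.
mostly_fastpath = tier_share(fastpath_pkts=900, medium_pkts=80, slowpath_pkts=20)
print(mostly_fastpath)
```

A high snd_only share points at the SNDs as the likelier bottleneck, while a high worker share points at the worker instances.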

4. After R80, the SND allocates work based on the load of the Firewall Instances, so has the Global Dispatcher table been completely abandoned?

As far as the SND that runs the Dynamic Dispatcher is concerned, all Worker Instances are equal in overall capability, unless the server architecture has Intel's P-cores and E-cores present, which is a whole other can of worms.  But for the most part, when the first packet of a new connection/session arrives at an SND, it assigns the connection and all its subsequent packets to the least-loaded Worker Instance.  This assignment is tracked in the SNDs by what I believe you are calling the "Global Dispatcher Table", which can be viewed with fw ctl multik gconn; this is necessary because all the packets of a single connection must always be handled by the same Worker Instance, even with Hyperflow.
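
The interplay of "least-loaded" assignment and a connection table can be sketched as follows. This is a toy model, not Check Point's implementation: the class name, the use of a packet counter as the load metric, and the tuple used as a connection key are all assumptions for illustration.

```python
class DynamicDispatcherSketch:
    """Toy model: new connections go to the least-loaded worker, then stay there."""

    def __init__(self, num_instances):
        # Hypothetical load metric: packets handled so far per Worker Instance.
        self.load = [0] * num_instances
        # Stand-in for the dispatcher table: connection key -> instance index.
        self.conn_table = {}

    def dispatch(self, conn):
        if conn not in self.conn_table:
            # First packet of a connection: pick the least-loaded instance.
            least_loaded = min(range(len(self.load)), key=self.load.__getitem__)
            self.conn_table[conn] = least_loaded
        # Every later packet follows the table, regardless of current load.
        instance = self.conn_table[conn]
        self.load[instance] += 1
        return instance


dispatcher = DynamicDispatcherSketch(num_instances=3)
conn_a = ("10.1.1.10", 40000, "192.0.2.80", 443, "tcp")
conn_b = ("10.1.1.11", 40001, "192.0.2.80", 443, "tcp")
first = dispatcher.dispatch(conn_a)
dispatcher.dispatch(conn_a)
second = dispatcher.dispatch(conn_b)
again = dispatcher.dispatch(conn_a)
print(first, second, again)
```

The table is what keeps conn_a on its original instance even after another instance has become less loaded, which is why load-based dispatch does not make the table redundant.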

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com


26 Replies
Chris_Atkinson
Employee

How many total cores does the system have (8 or more?), and what blades are enabled?

To show the current mapping use: fw ctl affinity -l -r

 

Performance Tuning Guide

https://sc1.checkpoint.com/documents/R81.20/WebAdminGuides/EN/CP_R81.20_PerformanceTuning_AdminGuide...

https://sc1.checkpoint.com/documents/R81.20/WebAdminGuides/EN/CP_R81.20_PerformanceTuning_AdminGuide...

ATRG: CoreXL

https://support.checkpoint.com/results/sk/sk98737

Dynamic Balancing

https://support.checkpoint.com/results/sk/sk164155

CCSM R77/R80/ELITE
GigaYang
Collaborator

Hi Chris,

My firewall has 8 CPU cores. 

Chris_Atkinson
Employee

In this case you will also want to review the multi-queue configuration, use: mq_mng --show

https://sc1.checkpoint.com/documents/R81.20/WebAdminGuides/EN/CP_R81.20_PerformanceTuning_AdminGuide...

Is the gateway configured for a large MTU, and which version / JHF is used?

the_rock
Legend

Hey Chris,

Extremely helpful links! Just curious if you know, and as always I welcome any other opinions, @GigaYang @Timothy_Hall @AkosBakos 

Do you guys have any idea what OTHER represents in the screenshot below? It's my eve-ng lab, and I gave it 10 CPU cores when it was created a few months ago, though I never checked this until I saw this post. It's an R81.20 jumbo 92 gateway (this one is not a cluster).

Tx!

Andy

 

[Image: Screenshot_1.png]

Timothy_Hall
Legend

Are you using a trial/eval license?  Any chance your security gateway container is only allowing 8 cores? 

Also is it being presented to the VM as 10 full cores or 5x2 with SMT?

the_rock
Legend

Hey Tim,

That's right, it's an eval license. And as far as your 2nd question, it's 10 cores altogether.

Btw, @Timothy_Hall, sorry if this may sound like a dumb question, but would the output below indicate that each interface's multi-queue is configured for 8 CPU cores?

[Expert@CP-GW:0]# mq_mng -o
Total 10 cores. Available for MQ 2 cores
i/f     driver    driver mode   state   mode (queues)   cores
                                        actual/avail
------------------------------------------------------------------------------------------------
eth0    vmxnet3   Kernel        Up      Auto (8/8)      0,1,0,1,0,1,0,1
eth1    vmxnet3   Kernel        Up      Auto (8/8)      0,1,0,1,0,1,0,1
eth2    vmxnet3   Kernel        Up      Auto (8/8)      0,1,0,1,0,1,0,1

[Expert@CP-GW:0]#

Andy

Timothy_Hall
Legend

Looks like the Check Point code and MQ are only recognizing/using 8 cores (max for vmxnet3 is 8 cores/queues anyway).  10 cores may not actually be supported, can you increase the number of cores to 12 or 16?  Pretty sure you will see all cores being utilized with those core counts.
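
For anyone reading the mq_mng -o rows quoted in this thread, here is a small sketch that separates the two numbers in play: the queue count in parentheses (actual/avail, going by the column header) and the set of distinct cores listed in the last column. The row layout is inferred only from the three sample rows pasted above, so the regular expression and the function name summarize are assumptions and may not match other releases or drivers.

```python
import re

# Sample rows copied from the output posted earlier in this thread.
SAMPLE = """\
eth0 vmxnet3 Kernel Up Auto (8/8) 0,1,0,1,0,1,0,1
eth1 vmxnet3 Kernel Up Auto (8/8) 0,1,0,1,0,1,0,1
eth2 vmxnet3 Kernel Up Auto (8/8) 0,1,0,1,0,1,0,1
"""

# Layout assumed from the sample: i/f, driver, driver mode, state, mode, (actual/avail), cores.
ROW = re.compile(
    r"^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+\((\d+)/(\d+)\)\s+([\d,]+)\s*$"
)


def summarize(text):
    """Map each interface to its queue counts and the distinct cores it lists."""
    summary = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip headers, separators, and prompts
        iface, _driver, _drv_mode, _state, _mode, actual, avail, cores = match.groups()
        summary[iface] = {
            "queues_actual": int(actual),
            "queues_avail": int(avail),
            "distinct_cores": sorted({int(c) for c in cores.split(",")}),
        }
    return summary


result = summarize(SAMPLE)
for iface, info in result.items():
    print(iface, info)
```

On the sample rows this reports 8 queues per interface, with the core column naming only cores 0 and 1.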

the_rock
Legend

Will do it shortly and let you know, great suggestion.

Andy

the_rock
Legend

Just tested it, and I believe it makes sense that it's a license limitation.

Andy

 

[Expert@CP-GW:0]# mq_mng -o
Total 16 cores. Available for MQ 2 cores
i/f     driver    driver mode   state   mode (queues)   cores
                                        actual/avail
------------------------------------------------------------------------------------------------
eth0    vmxnet3   Kernel        Up      Auto (8/8)      0,1,0,1,0,1,0,1
eth1    vmxnet3   Kernel        Up      Auto (8/8)      0,1,0,1,0,1,0,1
eth2    vmxnet3   Kernel        Up      Auto (8/8)      0,1,0,1,0,1,0,1

[Expert@CP-GW:0]#

 

I assigned 16 cores.

 

[Image: Screenshot_2.png]

GigaYang
Collaborator

Hi AkosBakos,

My firewall is running R82 now, upgraded from R81.20.

Timothy_Hall
Legend

If you are using R82 (or the latest JHFAs for R81.20) AND you are on a Quantum Lightspeed or Quantum Force 9000/19000/29000 appliance, something called UPPAK will be enabled instead of the traditional KPPAK (fwaccel stat to check).  Basically, if UPPAK is active, most of SecureXL executes in user space (usim) instead of as a driver in kernel space (sim).

If UPPAK is enabled the SND cores will always register at 100% CPU utilization regardless of traffic load, at least as reported by the Gaia/Linux tools top/vmstat/etc.  This is EXPECTED BEHAVIOR due to the migration from interrupt-based to poll-mode processing in UPPAK mode which leverages something called DPDK.  Check Point-based status tools such as the CPU screen of cpview will show you the "real" CPU load on the SND cores based on how much traffic they are actually handling.

This new SND CPU behavior in UPPAK mode has caused and will continue to cause confusion going forward; check out the DPDK Wikipedia page for more info.  I initially assumed this poll-mode behavior was a quite undesirable "busy wait" approach (which traditionally fell into the realm of sloppy programming and was to be desperately avoided at all costs), but it allows modern systems to scale to much greater capacities while keeping the infamous killer of network performance known as jitter to a minimum.
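
Why an OS tool and a traffic-aware tool disagree can be shown with a toy model of a polling core. This is not DPDK or Check Point code; the function name, the tick and poll counts, and the idea of measuring "useful" polls are assumptions chosen purely to illustrate the effect.

```python
def poll_mode_core(packets_per_tick, ticks, polls_per_tick=10):
    """Simulate one poll-mode core; return (OS-visible busy share, useful share)."""
    total_polls = 0
    useful_polls = 0
    for _ in range(ticks):
        pending = packets_per_tick
        for _ in range(polls_per_tick):
            total_polls += 1  # the core spins whether or not a packet is waiting
            if pending > 0:
                pending -= 1
                useful_polls += 1
    # The loop never sleeps, so a tool like top would count it as fully busy.
    os_view = 1.0
    return os_view, useful_polls / total_polls


idle = poll_mode_core(packets_per_tick=0, ticks=100)
light = poll_mode_core(packets_per_tick=2, ticks=100)
print(idle, light)
```

In both runs the OS-visible figure is 100%, while the share of polls that actually found a packet tracks the offered traffic, which is the kind of number a traffic-aware view is trying to report.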

GigaYang
Collaborator

Hi Timothy_Hall,

Thanks a lot. We will buy two 9300 gateways in 2025.

the_rock
Legend

You got excellent responses so far. I always refer to the ATRG link @Chris_Atkinson gave. I will say though, in R81.20 I never had a need to modify those settings manually.

Andy

AkosBakos
Leader

Amazing answer, THX @Timothy_Hall 

the_rock
Legend

Indeed! I always think Tim's book is superb, which it is, but then sometimes I see even better explanations here 🙂

GigaYang
Collaborator

Yes. The book is great; I have bought it.

GigaYang
Collaborator

When traffic goes via the Accelerated Path, why does the SND core load increase?

the_rock
Legend

Hey Giga,

Just something I noticed, though this sort of made sense to me already 🙂

Andy

CP-GW> set dynamic-balancing state enable
Dynamic Balancing is not supported on open server appliances

CP-GW>

Timothy_Hall
Legend

Yep, but you can force Dynamic Split active on a system that does not support it for lab purposes.  Here is the secret expert mode command we use in my Gateway Performance Optimization course, which is implemented in VMware; a reboot will be required:

dynamic_split -o enable_automation

 

the_rock
Legend

Let's see if it works, will report soon.

Andy

the_rock
Legend

Guess eve-ng is special lol

Andy

[Expert@CP-GW:0]# dynamic_split -o enable_automation
Dynamic Balancing is not supported on open server appliances
[Expert@CP-GW:0]#

Timothy_Hall
Legend

Wow really?  What code release and HFA?

the_rock
Legend

[Expert@CP-GW:0]# cpinfo -y fw1

This is Check Point CPinfo Build 914000250 for GAIA
[FW1]
HOTFIX_TEX_ENGINE_R8120_AUTOUPDATE
HOTFIX_INEXT_NANO_EGG_AUTOUPDATE
HOTFIX_R81_20_JUMBO_HF_MAIN Take: 92
HOTFIX_R80_40_MAAS_TUNNEL_AUTOUPDATE
HOTFIX_PUBLIC_CLOUD_CA_BUNDLE_AUTOUPDATE
HOTFIX_GOT_TPCONF_AUTOUPDATE

FW1 build number:
This is Check Point's software version R81.20 - Build 043
kernel: R81.20 - Build 050

[Expert@CP-GW:0]#

the_rock
Legend

Btw, for context, no matter what "flavor" I try, it's the same thing.

Andy

 

[Expert@CP-GW:0]# dynamic_split -h
-h - Shows built-in help
-o enable - Enables the CoreXL Dynamic Balancing (requires a reboot)
-o disable - Disables the CoreXL Dynamic Balancing (requires a reboot)
-o stop - Stops the CoreXL Dynamic Balancing ("freezes" it; survives a reboot)
-o start - Starts the CoreXL Dynamic Balancing ("resumes" it; survives a reboot)
-p - Shows the current state of the CoreXL Dynamic Balancing
-r - Resets the CoreXL configuration to the default and keeps the CoreXL Dynamic Balancing enabled
-v <config> <value> - Set Dynamic Balancing configuration value (use "default" to change <config> back to default)
[Expert@CP-GW:0]# dynamic_split -o start
Dynamic Balancing is not running

[Expert@CP-GW:0]# dynamic_split -o enable
Dynamic Balancing is not supported on open server appliances
[Expert@CP-GW:0]#
