Solved: VSX cluster and CoreXL

Wipeout_ · ‎2025-04-14

I have a 2-node VSX cluster in which CoreXL is not enabled in cpconfig (VS0), but it is enabled for all 4 Virtual Systems configured via SmartConsole. I confirmed this with "fw ctl multik stat" and by checking in "top" that the number of fwk instances for each VS matches what is configured in SmartConsole. That part is correct.

# fw ctl affinity -l -r
CPU 0: Mgmt
CPU 1:
CPU 2:
CPU 3:
CPU 4:
CPU 5:
CPU 6:
CPU 7:
CPU 8:
... ...
CPU 43:
CPU 44:
CPU 45:
CPU 46:
CPU 47:
All:
Interface eth3-01: has multi queue enabled
Interface eth3-02: has multi queue enabled
Interface Sync: has multi queue enabled
Interface eth1-01: has multi queue enabled
Interface eth2-01: has multi queue enabled
Interface eth2-02: has multi queue enabled
Interface eth2-03: has multi queue enabled
Interface eth2-04: has multi queue enabled

CPU #0 shows a high average load, with some peaks...

https://sc1.checkpoint.com/documents/R81.20/WebAdminGuides/EN/CP_R81.20_VSX_AdminGuide/Content/Topic... shows the following:

Important - Enabling CoreXL on VS0 is not recommended because of increased memory overhead and potential performance degradation. Most VSX deployments and use cases do not require more than a single Firewall instance for VS0 as its main purpose is managing the VSX Gateway.

Should I enable CoreXL in cpconfig on VS0 for better performance?
Any other suggestions?

Thanks!

Chris_Atkinson · ‎2025-04-14

A similar recent discussion was had here in case it is useful for you:

https://community.checkpoint.com/t5/General-Topics/CoreXL-is-turned-off-by-default-on-a-brand-new-98...

CCSM R77/R80/ELITE

_Val_ · ‎2025-04-14

As the Admin guide clearly says, you should NOT. Use the default settings.

Wipeout_ · ‎2025-04-14

OK, thanks Val.
What about having 48 cores, but a single one with high CPU that seems to be assigned to Mgmt?

_Val_ · ‎2025-04-14

No issue. The MGMT interface is not used for production traffic, right? Logs, policy installs, and control messages do not require a lot of firepower, so a single CPU queued to that interface is just fine.

Wipeout_ · ‎2025-04-14

Thanks again for your replies Val.

As I pasted before, "fw ctl affinity" does not show much information.

But running "fw ctl multik stat" for each VS shows that all of them use cores in the range 2-23+.

And here is mq_mng:

# mq_mng -o
Total 48 cores. Multiqueue 4 cores
i/f       type   state   mode          cores
------------------------------------------------------------------------------------------------
Mgmt      igb    Up      Off           0
Sync      igb    Up      Auto (4/4)    0,24,1,25
eth1-01   igb    Up      Auto (4/4)    0,24,1,25
eth2-01   i40e   Up      Auto (4/4)    0,24,1,25
eth2-02   i40e   Up      Auto (4/4)    0,24,1,25
eth2-03   i40e   Up      Auto (4/4)    0,24,1,25
eth2-04   i40e   Up      Auto (4/4)    0,24,1,25
eth3-01   i40e   Up      Auto (4/4)    0,24,1,25
eth3-02   i40e   Up      Auto (4/4)    0,24,1,25

MQ is not sharing cores with the VS workers, but core 0 seems to be shared between Mgmt, Sync, and MQ.
The same four cores are also shared between Sync and all the data interfaces.
I suppose that can be tuned. What would be the way to do it, maybe "mq_mng_reconf_all_vs"? I can't find any documentation about it.

Thanks again
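To make the core sharing described above easier to see, here is a minimal Python sketch that parses the "mq_mng -o" text pasted in this post and groups interfaces by core. It only works on the pasted text, not on a live gateway. The helper names are made up for illustration, and the worker range 2-23 is an assumption taken from the "2-23+" mentioned above.

```python
import re
from collections import defaultdict

# Output copied from the "mq_mng -o" paste in this thread.
MQ_MNG_OUTPUT = """\
Total 48 cores. Multiqueue 4 cores
i/f type state mode cores
------------------------------------------------
Mgmt igb Up Off 0
Sync igb Up Auto (4/4) 0,24,1,25
eth1-01 igb Up Auto (4/4) 0,24,1,25
eth2-01 i40e Up Auto (4/4) 0,24,1,25
eth2-02 i40e Up Auto (4/4) 0,24,1,25
eth2-03 i40e Up Auto (4/4) 0,24,1,25
eth2-04 i40e Up Auto (4/4) 0,24,1,25
eth3-01 i40e Up Auto (4/4) 0,24,1,25
eth3-02 i40e Up Auto (4/4) 0,24,1,25
"""

CORE_LIST = re.compile(r"\d+(,\d+)*")


def parse_mq_mng(text):
    """Return {interface: set of core numbers} (hypothetical helper)."""
    cores_by_iface = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least 5 columns and end with a core list.
        if len(fields) < 5 or not CORE_LIST.fullmatch(fields[-1]):
            continue
        cores_by_iface[fields[0]] = {int(c) for c in fields[-1].split(",")}
    return cores_by_iface


def interfaces_by_core(cores_by_iface):
    """Invert the mapping: {core: [interfaces queued on that core]}."""
    by_core = defaultdict(list)
    for iface, cores in cores_by_iface.items():
        for core in cores:
            by_core[core].append(iface)
    return dict(by_core)


# Assumption: workers use cores 2..23, based on the "2-23+" quoted above.
ASSUMED_WORKER_CORES = set(range(2, 24))

by_core = interfaces_by_core(parse_mq_mng(MQ_MNG_OUTPUT))
snd_cores = set(by_core)

for core in sorted(by_core):
    print(f"core {core}: {len(by_core[core])} interfaces: {', '.join(by_core[core])}")
print("overlap with assumed worker cores:", sorted(snd_cores & ASSUMED_WORKER_CORES))
```

With this paste, core 0 carries Mgmt plus every multi-queue interface, while cores 1, 24 and 25 carry the multi-queue interfaces only, and none of them overlaps the assumed worker range.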

Lesley · ‎2025-04-14

As posted, there is no need to enable CoreXL in VS0. High load or CPU spikes could indicate an issue. Adding more CPUs might help for now, but if there is a memory leak, that only postpones the moment things go wrong.

Run "hcp -r all" on the loaded system. Does anything show up there? Zombies, core dumps, etc.

What version are you running? Check with "cpinfo -y all".

What does top look like? Is the system swapping?

-------
Please press "Accept as Solution" if my post solved it 🙂

Wipeout_ · ‎2025-04-14

The "problem" is that core 0 averages 80% and sometimes goes above 90%.
That seems like too much for management-related processes only.

But I have "detected" the problem. These are all the cores:

As you can see, only 4 of them are above 50%.

One of them averages 80%: CPU #0. The others are #1, #24 and #25.

As checked via "mq_mng -o" and "cpview", those are the CPUs assigned as SNDs, that is, to traffic processing.
At this point, I understand the solution could be to enable CoreXL Dynamic Balancing.
https://support.checkpoint.com/results/sk/sk164155

Our gateways are still on R80.40, but we can upgrade to R81.20 (we have to anyway), where CoreXL Dynamic Balancing is enabled automatically.

"This is the behavior when you upgrade VSX Gateways / VSX Cluster Members R80.40 - R81.10, on which CoreXL Dynamic Balancing was not disabled explicitly, to R81.20 (or higher) and then install a Jumbo Hotfix Accumulator:

CoreXL Dynamic Balancing will be enabled by default.
Any previously configured manual affinity settings for interfaces / daemons will be overridden."

What do you think?

Lesley · ‎2025-04-16

Check this one out:

https://support.checkpoint.com/results/sk/sk176908

Wipeout_ · ‎2025-04-16

Thanks Lesley.
I had already checked the workers, and the load is correctly balanced between multiple cores/VS instances.
The high load comes from the 4 processors assigned to SND.

In fact, this high CPU usage cannot be attributed to specific processes because it is caused by software interrupts.
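Software interrupt time does not show up per process, but on Linux it is counted per CPU in /proc/stat (the seventh counter on each "cpuN" line). The sketch below, in Python, computes the softirq share per CPU between two snapshots. The two snapshots are made-up numbers for illustration only, not values from this gateway, and the function names are hypothetical.

```python
# Counter order on each per-CPU line of /proc/stat:
# user nice system idle iowait irq softirq steal guest guest_nice
SOFTIRQ = 6

# Hypothetical snapshots, taken a few seconds apart (illustrative values only).
BEFORE = """\
cpu0 1000 0 500 8000 0 0 4000 0 0 0
cpu2 3000 0 1000 9000 0 0 100 0 0 0
"""
AFTER = """\
cpu0 1010 0 510 8020 0 0 4160 0 0 0
cpu2 3060 0 1020 9118 0 0 102 0 0 0
"""


def parse_proc_stat(text):
    """Return {cpu_name: [counters]} for the per-CPU lines only."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep "cpu0", "cpu1", ... and skip the aggregate "cpu" line.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            counters[fields[0]] = [int(v) for v in fields[1:]]
    return counters


def softirq_percent(before, after):
    """Percentage of elapsed CPU time spent in softirq, per CPU."""
    result = {}
    for cpu, new in after.items():
        deltas = [n - o for n, o in zip(new, before[cpu])]
        # Sum user..steal only; guest time is already included in user/nice.
        total = sum(deltas[:8])
        result[cpu] = 100.0 * deltas[SOFTIRQ] / total if total else 0.0
    return result


pct = softirq_percent(parse_proc_stat(BEFORE), parse_proc_stat(AFTER))
for cpu, value in sorted(pct.items()):
    print(f"{cpu}: {value:.1f}% softirq")
```

On a real system the two snapshots would come from reading /proc/stat twice with a short pause in between. A CPU whose time is mostly softirq while no process stands out in top would match what is described above for the SND cores.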

emmap · ‎2025-04-16

You should upgrade anyway as R80.40 is out of support, but yes dynamic balancing will likely help you out here.

Lesley · ‎2025-04-16

An upgrade could help solve the performance issues, either through a bug fix or through new functionality. It is not worth spending any more time on R80.40.

Second, Dynamic Balancing is a good fit for this. But the current R81.20 takes 98 and 99 have open bugs for this feature, so pick a take after that (not released yet) or an older one.

Alex- · ‎2025-04-16

"Second, dynamic balancing is good for this. But the current take 98 and 99 R81.20 has open bugs for this feature. So pick version after that (not released yet) or older."

Is this documented somewhere? Just upgraded a VSX Security Group to T99. 😀

Lesley · ‎2025-04-16

PRJ-58188, PRHF-35819 (Security Gateway): After an upgrade, Dynamic Balancing does not start. The "dynamic_balancing -p" command returns "Dynamic Balancing is currently Initializing". Refer to sk182615.

PRJ-58560, PRHF-37532 (Security Gateway): Enabling the CoreXL Dynamic Split feature causes high CPU load on Maestro Security Group Members because of multiple "mq_mng -u" processes. Refer to sk183251.

PRJ-47275, PMTR-92832 (Security Gateway): When Dynamic Split is enabled, SND synchronization fails between members on Active site and Standby site, although it should occur automatically, when one of the members receives an additional SND.

The second one above is still present in take 99. I have not seen the others return in take 99. A custom fix is possible.

Alex- · ‎2025-04-16

OK, thanks. We'll monitor the situation, as both SKs pointed to T99 as the solution.
