Wipeout_
Participant

VSX cluster and CoreXL


I have a 2-node VSX cluster in which CoreXL is not enabled in cpconfig (VS0), but it is enabled for all 4 Virtual Systems configured via SmartConsole. I confirmed this with "fw ctl multik stat" and by checking in "top" that the number of fwk instances for each VS matches what is configured in SmartConsole. That part is correct.

# fw ctl affinity -l -r
CPU 0: Mgmt
CPU 1:
CPU 2:
CPU 3:
CPU 4:
CPU 5:
CPU 6:
CPU 7:
CPU 8:
... ...
CPU 43:
CPU 44:
CPU 45:
CPU 46:
CPU 47:
All:
Interface eth3-01: has multi queue enabled
Interface eth3-02: has multi queue enabled
Interface Sync: has multi queue enabled
Interface eth1-01: has multi queue enabled
Interface eth2-01: has multi queue enabled
Interface eth2-02: has multi queue enabled
Interface eth2-03: has multi queue enabled
Interface eth2-04: has multi queue enabled


CPU #0 shows a high average load and some load peaks.

https://sc1.checkpoint.com/documents/R81.20/WebAdminGuides/EN/CP_R81.20_VSX_AdminGuide/Content/Topic... shows the following:

Important - Enabling CoreXL on VS0 is not recommended because of increased memory overhead and potential performance degradation. Most VSX deployments and use cases do not require more than a single Firewall instance for VS0 as its main purpose is managing the VSX Gateway.


Should I enable CoreXL in cpconfig on VS0 for better performance tuning?
Any other suggestions?

Thanks!


Accepted Solutions
Chris_Atkinson
Employee

A similar recent discussion was had here in case it is useful for you:

https://community.checkpoint.com/t5/General-Topics/CoreXL-is-turned-off-by-default-on-a-brand-new-98...

CCSM R77/R80/ELITE


14 Replies
_Val_
Admin

As the Admin guide clearly says, you should NOT. Use the default settings.

Wipeout_
Participant

OK, thanks Val.
What about having 48 cores but a single one with high CPU usage, which seems to be assigned to Mgmt?

_Val_
Admin

No issue. The MGMT interface is not used for production traffic, right? Logs, policy installs, and control messages do not require lots of firepower, so a single CPU assigned to that interface is just fine.

Wipeout_
Participant

Thanks again for your replies Val.

As I pasted before, "fw ctl affinity" does not show much information.

But running "fw ctl multik stat" for each VS shows that all of them use cores in the range 2-23+.

And mq_mng:

# mq_mng -o
Total 48 cores. Multiqueue 4 cores
i/f       type   state   mode         cores
------------------------------------------------------------------------------------------------
Mgmt      igb    Up      Off          0
Sync      igb    Up      Auto (4/4)   0,24,1,25
eth1-01   igb    Up      Auto (4/4)   0,24,1,25
eth2-01   i40e   Up      Auto (4/4)   0,24,1,25
eth2-02   i40e   Up      Auto (4/4)   0,24,1,25
eth2-03   i40e   Up      Auto (4/4)   0,24,1,25
eth2-04   i40e   Up      Auto (4/4)   0,24,1,25
eth3-01   i40e   Up      Auto (4/4)   0,24,1,25
eth3-02   i40e   Up      Auto (4/4)   0,24,1,25


Multi-Queue is not sharing cores with the VS workers, but core 0 appears to be shared between Mgmt, Sync and Multi-Queue.
The Sync and data interfaces also share the same cores.
I suppose that can be tuned. What would be the way to do it, maybe "mq_mng_reconf_all_vs"? I cannot find any documentation about it.
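
Editor's note: the core sharing described above can be read directly off the table. The following is a minimal Python sketch that assumes the row layout pasted in this post (interface name first, comma-separated core list last) and lists every core used by more than one interface. The function names `cores_by_interface` and `shared_cores` are hypothetical helpers written for this illustration; they are not part of any Check Point tool.

```python
from collections import defaultdict

# Rows copied from the "mq_mng -o" output pasted in this post.
SAMPLE = """\
Total 48 cores. Multiqueue 4 cores
i/f type state mode cores
Mgmt igb Up Off 0
Sync igb Up Auto (4/4) 0,24,1,25
eth1-01 igb Up Auto (4/4) 0,24,1,25
eth2-01 i40e Up Auto (4/4) 0,24,1,25
eth2-02 i40e Up Auto (4/4) 0,24,1,25
eth2-03 i40e Up Auto (4/4) 0,24,1,25
eth2-04 i40e Up Auto (4/4) 0,24,1,25
eth3-01 i40e Up Auto (4/4) 0,24,1,25
eth3-02 i40e Up Auto (4/4) 0,24,1,25
"""


def cores_by_interface(text):
    """Map interface name -> list of core numbers (assumed to be the last column)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least 5 fields; header rows end in a non-numeric word.
        if len(fields) < 5:
            continue
        parts = fields[-1].split(",")
        if not all(part.isdigit() for part in parts):
            continue
        result[fields[0]] = [int(part) for part in parts]
    return result


def shared_cores(mapping):
    """Return {core: [interfaces]} for cores used by more than one interface."""
    users = defaultdict(list)
    for iface, cores in mapping.items():
        for core in cores:
            users[core].append(iface)
    return {core: names for core, names in sorted(users.items()) if len(names) > 1}


SHARED = shared_cores(cores_by_interface(SAMPLE))
for core, names in SHARED.items():
    print(f"core {core}: {', '.join(names)}")
```

For the table in this post, the sketch reports cores 0, 1, 24 and 25 as shared, with core 0 additionally serving the Mgmt interface.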

Thanks again

Lesley
Mentor

As posted, there is no need to enable CoreXL on VS0. High load or CPU spikes could indicate an issue. Adding more CPUs might help for now, but if there is a memory leak it is only a matter of time before things go wrong.

Run "hcp -r all" on the loaded system. Does it report anything, such as zombie processes or core dumps?

What version are you running? Check with "cpinfo -y all".

What does top look like? Is the system swapping?

-------
If you like this post please give a thumbs up(kudo)! 🙂
Wipeout_
Participant

The "problem" is that core 0 always averages 80% and sometimes exceeds 90%.
That seems too much for management-related processes alone.

But I have found the cause. These are all the cores:

cores_all.png

As you can see, only 4 of them are above 50%.

cores_network.png
One of them averages 80%: that is CPU #0. The others are #1, #24 and #25.

As "mq_mng -o" and "cpview" show, those are the CPUs assigned as SNDs, that is, to traffic processing.
At this point, I understand the solution could be enabling CoreXL Dynamic Balancing.
https://support.checkpoint.com/results/sk/sk164155


Given that our gateways are still on R80.40, we can upgrade to R81.20 (we have to anyway) so that CoreXL Dynamic Balancing is enabled automatically.

"This is the behavior when you upgrade VSX Gateways / VSX Cluster Members R80.40 - R81.10, on which CoreXL Dynamic Balancing was not disabled explicitly, to R81.20 (or higher) and then install a Jumbo Hotfix Accumulator:

  1. CoreXL Dynamic Balancing will be enabled by default.
  2. Any previously configured manual affinity settings for interfaces / daemons will be overridden."


What do you think?
Lesley
Mentor

Check this one out:

https://support.checkpoint.com/results/sk/sk176908

Wipeout_
Participant

Thanks Lesley.
I had already checked the workers, and the load is correctly balanced across multiple cores/VS instances.
The high load comes from the 4 processors assigned to SND.

In fact, this high CPU usage cannot be traced to specific processes because it is caused by software interrupts.

cores_interrupts.png
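
Editor's note: on a Linux-based system, the per-CPU share of software-interrupt time can be derived from two samples of /proc/stat, whose per-CPU lines list cumulative ticks in the order user, nice, system, idle, iowait, irq, softirq, steal. The following is a minimal Python sketch of that calculation using generic Linux accounting, not a Check Point tool. The function names `parse_proc_stat`, `softirq_share` and `sample_live` are hypothetical, and whether running such a script on a production gateway is appropriate is left as an open question.

```python
import time

SOFTIRQ = 6  # column index: user, nice, system, idle, iowait, irq, softirq, steal


def parse_proc_stat(text):
    """Return {cpu_id: [tick counters]} for the per-CPU lines of /proc/stat."""
    cpus = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-CPU lines start with "cpuN"; the aggregate "cpu" line is skipped.
        if fields and fields[0].startswith("cpu") and fields[0][3:].isdigit():
            cpus[int(fields[0][3:])] = [int(value) for value in fields[1:]]
    return cpus


def softirq_share(before, after):
    """Percentage of elapsed ticks spent in softirq, per CPU, between two samples."""
    shares = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        # Guest time is already included in user/nice, so only the first
        # eight counters (user through steal) are summed for the total.
        delta = [n - o for n, o in zip(new[:8], old[:8])]
        total = sum(delta)
        if total > 0 and len(delta) > SOFTIRQ:
            shares[cpu] = 100.0 * delta[SOFTIRQ] / total
    return shares


def sample_live(interval=1.0, path="/proc/stat"):
    """Take two samples of /proc/stat 'interval' seconds apart (Linux only)."""
    with open(path) as handle:
        before = parse_proc_stat(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_proc_stat(handle.read())
    return softirq_share(before, after)
```

Calling `sample_live()` returns a dictionary keyed by CPU number. CPUs whose value stays high while no single process shows matching usage are spending their time in interrupt handling, which is consistent with what the screenshot above shows for the SND cores.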

emmap
Employee

You should upgrade anyway as R80.40 is out of support, but yes dynamic balancing will likely help you out here.

Lesley
Mentor

An upgrade could help solve the performance issues, either by fixing a bug or through new functionality. It is not worth spending any more time on R80.40.

Second, Dynamic Balancing is good for this. But the current R81.20 Takes 98 and 99 have open bugs for this feature, so pick a take after those (not released yet) or an older one.

Alex-
Leader

"Second, Dynamic Balancing is good for this. But the current R81.20 Takes 98 and 99 have open bugs for this feature, so pick a take after those (not released yet) or an older one."

Is this documented somewhere? Just upgraded a VSX Security Group to T99. 😀

Lesley
Mentor

PRJ-58188, PRHF-35819 (Security Gateway): After an upgrade, Dynamic Balancing does not start. The "dynamic_balancing -p" command returns "Dynamic Balancing is currently Initializing". Refer to sk182615.

PRJ-58560, PRHF-37532 (Security Gateway): Enabling the CoreXL Dynamic Split feature causes high CPU load on Maestro Security Group Members because of multiple "mq_mng -u" processes. Refer to sk183251.

PRJ-47275, PMTR-92832 (Security Gateway): When Dynamic Split is enabled, SND synchronization fails between members on Active site and Standby site, although it should occur automatically, when one of the members receives an additional SND.

The second one above is still present in Take 99. I have not seen the others return in Take 99. A custom fix is possible.

Alex-
Leader

OK, thanks. We'll monitor the situation, as both SKs pointed to T99 as the solution.
