We recently moved from Open Servers to VSX clusters and now performance is really bad. Traffic to the internet is terrible.
People working over VPN can hardly work. We just don't know where to look anymore. We are on R80.10 take 121.
Clusters are 15600 and 23500 models.
Does anybody have any idea where to look?
By default, a VS on the VSX cluster uses 1 CPU core.
For more performance, you need to increase the number of cores.
To configure CoreXL on a Virtual System, edit the Virtual System object; the Virtual System General Properties window opens, where you can increase the number of CoreXL firewall instances.
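To sanity-check what a VS is actually using, a hedged sketch from the CLI of the VSX member (VS ID 2 below is just this thread's problem VS; adjust to your environment):

fw ctl affinity -l -r        # run in VS0: core assignments for interfaces and each VS's fwk
vsenv 2                      # switch into the context of the VS in question
fw ctl multik stat           # CoreXL instances of this VS and how connections spread across them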
We assigned 10 firewall instances to that VS, and we dedicated 10 CPUs to it.
More information is required here.
What blades are enabled? It might be something connected with additional blades enabled for all traffic.
What do you see in the top command output - which processes are using most of the CPU?
Try starting with the Super Seven commands from Tim Hall's presentation (listed below); they are integrated in Common Check Point Commands (ccc) under the Gateway Performance Optimization section.
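For reference, the Super Seven are commonly listed as the following (a hedged list from memory; the ccc menu is the authoritative source):

fwaccel stat                       # SecureXL status and accept templates
fwaccel stats -s                   # share of accelerated vs. PXL vs. F2F traffic
grep -c ^processor /proc/cpuinfo   # number of cores the gateway sees
fw ctl affinity -l -r              # core assignments for interfaces, instances and daemons
netstat -ni                        # per-interface RX-DRP / RX-ERR counters
fw ctl multik stat                 # CoreXL instances and connection distribution
cpstat os -f multi_cpu             # per-core CPU utilization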
The following blades are enabled: IPS, Anti-Virus/Anti-Bot, URL Filtering, Application Control, and VPN.
Can you send the output of:
fw ctl affinity -l
fw ctl multik stat
cpmq get
top (extended to see all individual cores)
top - 15:46:46 up 1 day, 23:20, 2 users, load average: 4.14, 4.61, 5.00
Tasks: 519 total, 1 running, 518 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.6%us, 1.5%sy, 0.0%ni, 89.6%id, 0.0%wa, 0.1%hi, 1.2%si, 0.0%st
Mem: 131774100k total, 19940328k used, 111833772k free, 389664k buffers
Swap: 33551672k total, 0k used, 33551672k free, 11931140k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6069 admin 0 -20 2653m 2.1g 178m S 379 1.6 5724:45 fwk2_dev
13600 admin 15 0 716m 227m 40m S 7 0.2 7:40.63 fw_full
13598 admin 15 0 292m 75m 39m S 1 0.1 0:50.09 cpd
2952 admin 0 -20 710m 179m 31m S 1 0.1 22:39.70 fwk0_dev
3845 admin 15 0 365m 342m 10m S 1 0.3 4:17.23 rad
6071 admin 0 -20 1267m 734m 99m S 1 0.6 35:17.06 fwk1_dev
3163 admin 15 0 286m 74m 44m S 0 0.1 0:24.01 cpd
3503 admin 15 0 615m 100m 41m S 0 0.1 0:40.87 fw_full
12837 admin 15 0 292m 75m 39m S 0 0.1 0:30.06 cpd
12839 admin 15 0 563m 72m 40m S 0 0.1 0:04.51 fw_full
1 admin 15 0 1976 724 624 S 0 0.0 0:04.91 init
2 admin RT -5 0 0 0 S 0 0.0 0:00.04 migration/0
3 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
4 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
5 admin RT -5 0 0 0 S 0 0.0 0:00.00 migration/1
6 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1
7 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/1
8 admin RT -5 0 0 0 S 0 0.0 0:15.59 migration/2
9 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2
10 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/2
11 admin RT -5 0 0 0 S 0 0.0 3:14.68 migration/3
12 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3
13 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/3
14 admin RT -5 0 0 0 S 0 0.0 0:05.87 migration/4
15 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/4
16 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/4
17 admin RT -5 0 0 0 S 0 0.0 0:08.82 migration/5
18 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/5
19 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/5
20 admin RT -5 0 0 0 S 0 0.0 0:02.10 migration/6
21 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/6
22 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/6
23 admin RT -5 0 0 0 S 0 0.0 0:04.03 migration/7
24 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/7
25 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/7
26 admin RT -5 0 0 0 S 0 0.0 0:06.17 migration/8
27 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/8
28 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/8
29 admin RT -5 0 0 0 S 0 0.0 0:52.68 migration/9
[Expert@vsx-lvn-pub2:0]# fw ctl multik stat
fw: CoreXL is disabled
[Expert@vsx-lvn-pub2:0]# cpmq get
Active ixgbe interfaces:
eth1-01 [Off]
eth1-02 [Off]
eth1-03 [Off]
eth1-04 [Off]
Active igb interfaces:
Mgmt [Off]
Sync [Off]
eth2-01 [Off]
The first question you need to ask yourself is: how many cores per VS are enabled? If more than one, what are the affinity settings?
VS 2 is the VS with issues
Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1 2
eth1-02: CPU 3 4
eth1-03: CPU 5 6
eth1-04: CPU 7 8
eth2-01: CPU 9 10
VS_0: CPU 39
VS_0 fwk: CPU 39
VS_1: CPU 11 12 13 14 15 16 17 18 19 20
VS_1 fwk: CPU 11 12 13 14 15 16 17 18 19 20
VS_2: CPU 21 22 23 24 25 26 27 28 29 30
VS_2 fwk: CPU 21 22 23 24 25 26 27 28 29 30
VS_3 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
Hi Bart! Your core allocation is set wrong. You are using hyperthreaded CPUs so you have to be mindful about numbering!
You have allocated the same physical cores to SecureXL on the interfaces (0-10) and to VS2 (the matching sibling cores 20-30).
Please have a look at the article I wrote not that long ago:
Security Gateway Performance Optimization - VSX
Make your own spreadsheet and re-allocate cores correctly.
Note that having 2 CPUs per interface is not going to help you unless you use Multi-Queue - you may as well stick with a single CPU per interface.
In your case I would start like this and then tweak depending on CPU usage:
Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1 (21)
eth1-02: CPU 2 (22)
eth1-03: CPU 3 (23)
eth1-04: CPU 4 (24)
eth2-01: CPU 5 (25)
VS_0: CPU 6 26
VS_0 fwk: CPU 6 26
VS_1: CPU 10 11 12 13 14 30 31 32 33 34
VS_1 fwk: CPU 10 11 12 13 14 30 31 32 33 34
VS_2: CPU 15 16 17 18 19 35 36 37 38 39
VS_2 fwk: CPU 15 16 17 18 19 35 36 37 38 39
VS_3 fwk: CPU 7 8 9 27 28 29
Or design your own, but make sure you take the hyperthreaded numbering into account, i.e. physical core 0 also holds hyperthreaded core 20, so don't mix SecureXL and CoreXL on those!
Let us know if you need more info or commands to set affinities (a sketch is below).
And remember to press 1 when you run the top command so you see all individual cores, not just the summary.
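A hedged sketch of the commands involved, run from VS0 (double-check the syntax against the R80.10 VSX Administration Guide and the article above before applying it in production):

# Pin the fwk instances of VS 2 to the suggested cores (physical cores plus their HT siblings)
fw ctl affinity -s -d -vsid 2 -cpu 15 16 17 18 19 35 36 37 38 39
# Verify the result
fw ctl affinity -l -r
# With SecureXL enabled, interface (SND/IRQ) affinity is managed by sim affinity;
# "sim affinity -s" switches to static mode so you can dedicate cores to interfaces
sim affinity -l
sim affinity -s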
Thanks Kaspars,
I've read your article; it was very helpful, and I've modified the affinity following your guidelines.
I will keep you posted next week when production starts again on Monday.
Btw, I had to guess some things, so ideally send us the fw ctl multik stat output to confirm that the suggested config will work OK.
Also, if you have the possibility, set up some sort of SNMP graphs for all CPU cores to further fine-tune your CoreXL and SecureXL (see the sketch below).
You would also have to adjust the allocation depending on the total number of cores; my example was for 40 HT cores.
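For the SNMP graphs, a hedged pointer: Gaia exposes per-core utilization through the Check Point MIB (the multiProcTable), assuming the SNMP agent is enabled on the gateway and your poller has the Check Point MIB file loaded:

# On the gateway (clish, VS0 context):
set snmp agent on
set snmp community public read-only        # example community string - use your own
# On the monitoring host (table name assumes the Check Point MIB is loaded):
snmpwalk -v2c -c public <gateway-ip> CHECKPOINT-MIB::multiProcTable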
What is your experience with Multi-Queue? Is it advisable to enable it on some interfaces?
If the RX-DRP rate on a busy interface is >0.1% (viewed with netstat -ni), even though enough SND/IRQ cores have been allocated that the busy interface has its own dedicated SND/IRQ core (as shown by sim affinity -l), then Multi-Queue should be enabled. Multi-Queue does cause some slight additional overhead on the SND/IRQ core to "stick" the packets associated with a single connection to the same queue every time to avoid out-of-order delivery, so enabling Multi-Queue is not always a no-brainer. More SND/IRQ cores should be allocated first if possible. Specifically, if all SND/IRQ cores are very busy (>75% utilization) and you can't allocate any more due to a limited number of cores, enabling Multi-Queue will actually make things worse.
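A rough way to eyeball that RX-DRP percentage per interface (a hedged one-liner that assumes the usual Gaia netstat -ni column order, RX-OK in field 4 and RX-DRP in field 6; adjust the field numbers if your output differs):

netstat -ni | awk 'NR>2 && $4+0 > 0 { printf "%-12s RX-DRP %.4f%%\n", $1, ($6/$4)*100 }'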
--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com
Output of fw ctl multik stat after the change:
[Expert@vsx-lvn-pub2:2]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 15-19+ | 1232 | 6691
1 | Yes | 15-19+ | 935 | 6653
2 | Yes | 15-19+ | 1120 | 9340
3 | Yes | 15-19+ | 932 | 6128
4 | Yes | 15-19+ | 1281 | 11506
5 | Yes | 15-19+ | 968 | 7107
6 | Yes | 15-19+ | 1073 | 8319
7 | Yes | 15-19+ | 1006 | 7035
8 | Yes | 15-19+ | 1072 | 6552
9 | Yes | 15-19+ | 1123 | 5669
[Expert@vsx-lvn-pub2:2]# vsenv 1
Context is set to Virtual Device vsx-lvn-pub2_fw-lvn-snx (ID 1).
[Expert@vsx-lvn-pub2:1]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 10-14+ | 391 | 115231
1 | Yes | 10-14+ | 183 | 76175
2 | Yes | 10-14+ | 312 | 116117
3 | Yes | 10-14+ | 860 | 114412
4 | Yes | 10-14+ | 445 | 90342
[Expert@vsx-lvn-pub2:1]# vsenv 3
Context is set to Virtual Device vsx-lvn-pub2_vlan802 (ID 3).
[Expert@vsx-lvn-pub2:3]# fw ctl multik stat
fw: CoreXL is disabled
[Expert@vsx-lvn-pub2:3]# vsenv 0
Context is set to Virtual Device vsx-lvn-pub2 (ID 0).
[Expert@vsx-lvn-pub2:0]# fw ctl multik stat
fw: CoreXL is disabled
I am on R77.30 (going to R80.10 soon). I thought the problem was fixed in R80.10, but it seems like you are having the same issue that I see on R77.30. I am not sure we can ever get the same performance on VSX as on regular gateways. You need to fine-tune your VSX environment to improve the performance.
I guess so, but even a policy push will make the cluster, or at least that VS, unstable.
Pushing policy makes the VS unresponsive for a couple of minutes.
Just want to share my experience. We have three 4-node VSX clusters, all on 23800 hardware. One cluster was upgraded to R80.10 (from R77.30) a couple of months ago, and one a few days ago.
The only issue I have is that the performance is not the same as on a normal (non-VSX) gateway. I also wish there were no downtime when changing the CoreXL value.
Other than that we have no issues; it is stable and reliable, and we have never noticed issues when pushing policy.
If you upgrade to R80.20 there will be no downtime when changing cores.
As for performance, you can tweak it to be fairly close now with the 64-bit kernel on a VS, but of course it will never be as fast as non-VSX.
What do the interfaces look like - are you clocking up RX or TX errors?
watch netstat -ni
and ethtool for the interfaces - look at the queueing.
Also dig into the input queue configuration:
sk61143
You may need to increase it (from old memory here), as it feeds all the VSs.
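A hedged sketch of those checks (the interface names are just the ones from this thread):

watch -d -n 1 'netstat -ni'                  # watch RX/TX error and drop counters tick live
ethtool -S eth1-01 | grep -iE 'drop|fifo'    # NIC-level drop and fifo/queue counters
ethtool -g eth1-01                           # current vs. maximum RX/TX ring (queue) sizes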
Good luck,
Jerry Lee
Normally I would enable MQ on a busy 10Gbps interface, as a single core cannot cope with that much traffic (the relevant cpmq commands are sketched below).
We tried to enable it on a 4x1Gbps bond, but for some reason it didn't work that well: the bond kept losing members and as a result went up and down every minute, so we rolled it back. I didn't have time to investigate it further, I'm afraid, but I believe there was something wrong in our config. Too many things "to do" at the moment.
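For reference, a hedged sketch of the Multi-Queue commands on R80.10 (cpmq set is interactive, and a reboot is typically needed after changing Multi-Queue settings - see sk98348):

cpmq get -a     # Multi-Queue status for all supported interfaces
cpmq set        # interactively choose which interfaces get Multi-Queue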
The strange situation you encountered with Multi-Queue is why I am generally not a fan of turning on features that aren't enabled by default, unless you know for sure that you need them. The KISS principle is your friend... 🙂
--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com
We worked with TAC and a Diamond engineer to troubleshoot and tweak this issue.
All inbound traffic to all of the VSs on the 21400 4-node cluster was being affected.
Jerry
Hello Bart,
We have a similar setup to yours, but we're using the 23800 appliances. We've ditched R80.10 and replaced it with R80.20. Have you considered upgrading to R80.20? Reading between the lines, the performance on R80.10 could be better. From the R80.20 release notes:
Has the user experience improved after you made the proposed changes?
Many thanks.
Regards,
Kris
Actually, 64-bit mode is already available in R80.10.
VSX Enhancements:
We do see improvement after setting the CPU affinity right - I mean, I definitely now see that all CPUs are used.
I also enabled Multi-Queue on my busiest interfaces, and there too I see an improvement.
But it's still not what it should be; in particular, if we push a policy, all connections are just dropped for 10 minutes. This is not good. We don't see this on any other VSX cluster we have.
Which CPU cores are busy when it happens? Is it the ten VS2 cores or the interface cores?
As Kris suggested, switching to the 64-bit kernel will give the fwk a lot more memory headroom if it's FWK that's maxing out.
Check with vs_bits -stat:
[Expert@vsx-lvn-pub2:0]# vs_bits -stat
All VSs are at 32 bits