Bad Performance
We recently moved from OpenServers to VSX clusters, and now performance is really bad. Traffic to the internet is terrible.
People working over VPN can hardly work. We just don't know where to look anymore. We are on R80.10 take 121.
The clusters are 15600 and 23500 models.
Does anybody have any idea where to look?
By default, a VS on a VSX cluster uses 1 CPU core.
For more performance, you need to increase the number of cores.
To configure CoreXL on a Virtual System:
- Open SmartConsole.
- From the Gateways & Servers view or Object Explorer, double-click the Virtual System.
The Virtual System General Properties window opens.
- From the navigation tree, select CoreXL.
- Select the number of firewall instances for the Virtual System.
- Click OK.
We assigned 10 firewall instances to that VS,
and we dedicated 10 CPUs to it.
More information is required here.
Which blades are enabled? The problem might be connected with additional blades enabled for all traffic.
What do you see in the top command output? Which processes are using the most CPU?
Try starting with the Super Seven commands from Tim Hall's presentation; they are integrated into Common Check Point Commands (ccc) under the Gateway Performance Optimization section.
The following blades are enabled: IPS, Anti-Virus/Anti-Bot, URL Filtering, Application Control, and VPN.
Can you send the output of:
fw ctl affinity -l
fw ctl multik stat
cpmq get
top (extended to see all individual cores)
top - 15:46:46 up 1 day, 23:20, 2 users, load average: 4.14, 4.61, 5.00
Tasks: 519 total, 1 running, 518 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.6%us, 1.5%sy, 0.0%ni, 89.6%id, 0.0%wa, 0.1%hi, 1.2%si, 0.0%st
Mem: 131774100k total, 19940328k used, 111833772k free, 389664k buffers
Swap: 33551672k total, 0k used, 33551672k free, 11931140k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6069 admin 0 -20 2653m 2.1g 178m S 379 1.6 5724:45 fwk2_dev
13600 admin 15 0 716m 227m 40m S 7 0.2 7:40.63 fw_full
13598 admin 15 0 292m 75m 39m S 1 0.1 0:50.09 cpd
2952 admin 0 -20 710m 179m 31m S 1 0.1 22:39.70 fwk0_dev
3845 admin 15 0 365m 342m 10m S 1 0.3 4:17.23 rad
6071 admin 0 -20 1267m 734m 99m S 1 0.6 35:17.06 fwk1_dev
3163 admin 15 0 286m 74m 44m S 0 0.1 0:24.01 cpd
3503 admin 15 0 615m 100m 41m S 0 0.1 0:40.87 fw_full
12837 admin 15 0 292m 75m 39m S 0 0.1 0:30.06 cpd
12839 admin 15 0 563m 72m 40m S 0 0.1 0:04.51 fw_full
1 admin 15 0 1976 724 624 S 0 0.0 0:04.91 init
2 admin RT -5 0 0 0 S 0 0.0 0:00.04 migration/0
3 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
4 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
5 admin RT -5 0 0 0 S 0 0.0 0:00.00 migration/1
6 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1
7 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/1
8 admin RT -5 0 0 0 S 0 0.0 0:15.59 migration/2
9 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2
10 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/2
11 admin RT -5 0 0 0 S 0 0.0 3:14.68 migration/3
12 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/3
13 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/3
14 admin RT -5 0 0 0 S 0 0.0 0:05.87 migration/4
15 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/4
16 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/4
17 admin RT -5 0 0 0 S 0 0.0 0:08.82 migration/5
18 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/5
19 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/5
20 admin RT -5 0 0 0 S 0 0.0 0:02.10 migration/6
21 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/6
22 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/6
23 admin RT -5 0 0 0 S 0 0.0 0:04.03 migration/7
24 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/7
25 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/7
26 admin RT -5 0 0 0 S 0 0.0 0:06.17 migration/8
27 admin 15 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/8
28 admin RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/8
29 admin RT -5 0 0 0 S 0 0.0 0:52.68 migration/9
[Expert@vsx-lvn-pub2:0]# fw ctl multik stat
fw: CoreXL is disabled
[Expert@vsx-lvn-pub2:0]# cpmq get
Active ixgbe interfaces:
eth1-01 [Off]
eth1-02 [Off]
eth1-03 [Off]
eth1-04 [Off]
Active igb interfaces:
Mgmt [Off]
Sync [Off]
eth2-01 [Off]
Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1 2
eth1-02: CPU 3 4
eth1-03: CPU 5 6
eth1-04: CPU 7 8
eth2-01: CPU 9 10
VS_0: CPU 39
VS_0 fwk: CPU 39
VS_1: CPU 11 12 13 14 15 16 17 18 19 20
VS_1 fwk: CPU 11 12 13 14 15 16 17 18 19 20
VS_2: CPU 21 22 23 24 25 26 27 28 29 30
VS_2 fwk: CPU 21 22 23 24 25 26 27 28 29 30
VS_3 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
The first question to ask yourself: how many cores are enabled per VS? If more than one, what are the affinity settings?
VS 2 is the VS with issues
Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1 2
eth1-02: CPU 3 4
eth1-03: CPU 5 6
eth1-04: CPU 7 8
eth2-01: CPU 9 10
VS_0: CPU 39
VS_0 fwk: CPU 39
VS_1: CPU 11 12 13 14 15 16 17 18 19 20
VS_1 fwk: CPU 11 12 13 14 15 16 17 18 19 20
VS_2: CPU 21 22 23 24 25 26 27 28 29 30
VS_2 fwk: CPU 21 22 23 24 25 26 27 28 29 30
VS_3 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
Hi Bart! Your core allocation is wrong. You are using hyperthreaded CPUs, so you have to be mindful of the numbering!
You have allocated the same physical cores to SecureXL on the interfaces (0-10) and to VS2 (the matching sibling cores 20-30).
Please have a look at the article I wrote not long ago:
Security Gateway Performance Optimization - VSX
Make your own spreadsheet and re-allocate the cores correctly.
Note that having 2 CPUs per interface is not going to help you unless you use Multi-Queue; you may as well stick with a single CPU per interface.
In your case I would start like this and then tweak depending on CPU usage:
Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1 (21)
eth1-02: CPU 2 (22)
eth1-03: CPU 3 (23)
eth1-04: CPU 4 (24)
eth2-01: CPU 5 (25)
VS_0: CPU 6 26
VS_0 fwk: CPU 6 26
VS_1: CPU 10 11 12 13 14 30 31 32 33 34
VS_1 fwk: CPU 10 11 12 13 14 30 31 32 33 34
VS_2: CPU 15 16 17 18 19 35 36 37 38 39
VS_2 fwk: CPU 15 16 17 18 19 35 36 37 38 39
VS_3 fwk: CPU 7 8 9 27 28 29
Or design your own, but make sure you take the hyperthreaded numbering into account, i.e. physical core 0 also holds hyperthreaded core 20, so don't mix SecureXL and CoreXL on those!
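The sibling rule above can be checked mechanically before applying an allocation. The following is a minimal Python sketch, not a Check Point tool: it assumes the numbering described in this thread (40 logical CPUs, where CPU n and CPU n + 20 share a physical core), and the function names are made up for illustration. On other hardware the sibling mapping may differ and should be verified first.

```python
# Sketch only: the sibling rule is an assumption taken from this thread
# (40 logical CPUs, so CPU n and CPU n + 20 share one physical core).

def physical_core(cpu, total_logical=40):
    """Map a logical CPU number to its physical core under the n / n+half rule."""
    return cpu % (total_logical // 2)


def cores_used(assignments, total_logical=40):
    """Return {physical_core: set of owner names} for a {name: [cpus]} mapping."""
    used = {}
    for name, cpus in assignments.items():
        for cpu in cpus:
            used.setdefault(physical_core(cpu, total_logical), set()).add(name)
    return used


def find_conflicts(interface_cpus, fwk_cpus, total_logical=40):
    """Return physical cores shared by interface (SND) and fwk (CoreXL) affinities."""
    snd = cores_used(interface_cpus, total_logical)
    fwk = cores_used(fwk_cpus, total_logical)
    shared = sorted(snd.keys() & fwk.keys())
    return {core: (sorted(snd[core]), sorted(fwk[core])) for core in shared}


# Affinity as originally posted in the thread.
ORIGINAL_INTERFACES = {
    "Mgmt": [0], "Sync": [0],
    "eth1-01": [1, 2], "eth1-02": [3, 4], "eth1-03": [5, 6],
    "eth1-04": [7, 8], "eth2-01": [9, 10],
}
ORIGINAL_FWK = {
    "VS_0": [39],
    "VS_1": list(range(11, 21)),
    "VS_2": list(range(21, 31)),
    "VS_3": list(range(2, 20)) + list(range(22, 40)),
}

# Allocation suggested in the reply (sibling CPUs 21-25 are left unassigned).
PROPOSED_INTERFACES = {
    "Mgmt": [0], "Sync": [0],
    "eth1-01": [1], "eth1-02": [2], "eth1-03": [3],
    "eth1-04": [4], "eth2-01": [5],
}
PROPOSED_FWK = {
    "VS_0": [6, 26],
    "VS_1": [10, 11, 12, 13, 14, 30, 31, 32, 33, 34],
    "VS_2": [15, 16, 17, 18, 19, 35, 36, 37, 38, 39],
    "VS_3": [7, 8, 9, 27, 28, 29],
}

original_conflicts = find_conflicts(ORIGINAL_INTERFACES, ORIGINAL_FWK)
proposed_conflicts = find_conflicts(PROPOSED_INTERFACES, PROPOSED_FWK)

for core, (interfaces, systems) in original_conflicts.items():
    print(f"physical core {core}: {interfaces} share it with {systems}")
print("conflicts in proposed allocation:", len(proposed_conflicts))
```

Under that assumption, the original affinity shares every interface core with at least one Virtual System, while the suggested allocation shares none.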
Let us know if you need more info or the commands to set affinities.
And remember to press 1 when you run top, so you see all individual cores and not just the summary.
Thanks Kaspars,
I've read your article; it was very helpful. I've modified the affinity following your guidelines.
I will keep you posted next week when production starts again on Monday.
By the way, I had to guess some things, so ideally send us the fw ctl multik stat output to confirm that the suggested config will work.
Also, if you can, set up SNMP graphs for all CPU cores to further fine-tune your CoreXL and SecureXL settings.
You would also have to adjust the allocation depending on the total number of cores. My example was for 40 HT cores.
What is your experience with Multi-Queue? Is it advisable to enable it on some interfaces?
If the RX-DRP rate on a busy interface is >0.1% (viewed with netstat -ni) even though enough SND/IRQ cores have been allocated that the busy interface has its own dedicated SND/IRQ core (as shown by sim affinity -l), then Multi-Queue should be enabled. Multi-Queue does cause some slight additional overhead on the SND/IRQ core, because it has to "stick" the packets associated with a single connection to the same queue every time to avoid out-of-order delivery, so enabling Multi-Queue is not always a no-brainer. More SND/IRQ cores should be allocated first if possible. Specifically, if all SND/IRQ cores are very busy (>75% utilization) and you can't allocate any more due to a limited number of cores, enabling Multi-Queue will actually make things worse.
--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com
Exclusively at CPX 2025 Las Vegas Tuesday Feb 25th @ 1:00pm
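The Multi-Queue rule of thumb in this post can be written down as a small decision helper. This is a hedged Python sketch rather than anything official: it assumes the drop rate is RX-DRP divided by RX-OK (both read from netstat -ni), and the function names, parameters, and returned strings are invented for illustration. The thresholds (0.1% drops, 75% utilization) come from the post itself.

```python
# Sketch only: assumes drop rate = RX-DRP / RX-OK, counters taken from `netstat -ni`.

def rx_drop_rate_percent(rx_ok, rx_drp):
    """Return RX-DRP as a percentage of RX-OK (0.0 if nothing was received)."""
    if rx_ok <= 0:
        return 0.0
    return 100.0 * rx_drp / rx_ok


def multi_queue_advice(rx_ok, rx_drp, has_dedicated_snd_core,
                       can_add_snd_cores, all_snd_cores_busy):
    """Apply the rule of thumb from the post to one busy interface.

    all_snd_cores_busy means every SND/IRQ core is above roughly 75% utilization.
    """
    if rx_drop_rate_percent(rx_ok, rx_drp) <= 0.1:
        return "no action needed"
    if not has_dedicated_snd_core:
        return "give the interface a dedicated SND/IRQ core first"
    if can_add_snd_cores:
        return "allocate more SND/IRQ cores first"
    if all_snd_cores_busy:
        return "do not enable Multi-Queue"
    return "enable Multi-Queue"


# Hypothetical counters, not taken from the thread.
print(multi_queue_advice(rx_ok=1_000_000, rx_drp=2_000,
                         has_dedicated_snd_core=True,
                         can_add_snd_cores=False,
                         all_snd_cores_busy=False))
```

The order of the checks mirrors the post: spare SND/IRQ capacity is used before Multi-Queue, and Multi-Queue is avoided when the SND/IRQ cores are already saturated.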
Output of fw ctl multik stat after the change:
[Expert@vsx-lvn-pub2:2]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 15-19+ | 1232 | 6691
1 | Yes | 15-19+ | 935 | 6653
2 | Yes | 15-19+ | 1120 | 9340
3 | Yes | 15-19+ | 932 | 6128
4 | Yes | 15-19+ | 1281 | 11506
5 | Yes | 15-19+ | 968 | 7107
6 | Yes | 15-19+ | 1073 | 8319
7 | Yes | 15-19+ | 1006 | 7035
8 | Yes | 15-19+ | 1072 | 6552
9 | Yes | 15-19+ | 1123 | 5669
[Expert@vsx-lvn-pub2:2]# vsenv 1
Context is set to Virtual Device vsx-lvn-pub2_fw-lvn-snx (ID 1).
[Expert@vsx-lvn-pub2:1]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 10-14+ | 391 | 115231
1 | Yes | 10-14+ | 183 | 76175
2 | Yes | 10-14+ | 312 | 116117
3 | Yes | 10-14+ | 860 | 114412
4 | Yes | 10-14+ | 445 | 90342
[Expert@vsx-lvn-pub2:1]# vsenv 3
Context is set to Virtual Device vsx-lvn-pub2_vlan802 (ID 3).
[Expert@vsx-lvn-pub2:3]# fw ctl multik stat
fw: CoreXL is disabled
[Expert@vsx-lvn-pub2:3]# vsenv 0
Context is set to Virtual Device vsx-lvn-pub2 (ID 0).
[Expert@vsx-lvn-pub2:0]# fw ctl multik stat
fw: CoreXL is disabled
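To compare how evenly connections are spread across firewall instances, output like the above can be summarized with a short script. This is a minimal Python sketch that assumes the pipe-separated layout shown in this thread; the function names are hypothetical, and the CPU column is kept as plain text because its exact notation is not explained here.

```python
# Sketch only: parses the `fw ctl multik stat` table layout as pasted in this thread.

def parse_multik_stat(text):
    """Return one dict per firewall instance row; header and ruler lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        rows.append({
            "id": int(parts[0]),
            "active": parts[1] == "Yes",
            "cpu": parts[2],  # kept verbatim, e.g. "15-19+"
            "connections": int(parts[3]),
            "peak": int(parts[4]),
        })
    return rows


def connection_summary(rows):
    """Return total, minimum and maximum current connections across instances."""
    counts = [row["connections"] for row in rows]
    return {"total": sum(counts), "min": min(counts), "max": max(counts)}


# VS 2 rows copied from the output in this thread.
VS2_OUTPUT = """\
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 15-19+ | 1232 | 6691
1 | Yes | 15-19+ | 935 | 6653
2 | Yes | 15-19+ | 1120 | 9340
3 | Yes | 15-19+ | 932 | 6128
4 | Yes | 15-19+ | 1281 | 11506
5 | Yes | 15-19+ | 968 | 7107
6 | Yes | 15-19+ | 1073 | 8319
7 | Yes | 15-19+ | 1006 | 7035
8 | Yes | 15-19+ | 1072 | 6552
9 | Yes | 15-19+ | 1123 | 5669
"""

vs2_rows = parse_multik_stat(VS2_OUTPUT)
vs2_summary = connection_summary(vs2_rows)
print(len(vs2_rows), "instances:", vs2_summary)
```

A large gap between the minimum and maximum would suggest that some instances carry far more connections than others; for VS 2 the counts are fairly close to each other.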
I am on R77.30 (going to R80.10 soon). I thought the problem was fixed in R80.10, but it seems you are having the same issue that I see on R77.30. I am not sure we can ever get the same performance on VSX as on regular gateways. You need to fine-tune your VSX environment to improve performance.
I guess so, but even a policy push makes the cluster, or at least that VS, unstable.
Pushing policy makes the VS unresponsive for a couple of minutes.
Just want to share my experience. We have three 4-node VSX clusters, all on 23800 hardware. One cluster was upgraded to R80.10 (from R77.30) a couple of months ago, and one a few days ago.
The only issue I have is that performance is not the same as on a normal (non-VSX) gateway. I also wish there were no downtime when changing the CoreXL value.
Other than that we have no issues; it is stable and reliable. We have never noticed any problems when pushing policy.
If you upgrade to R80.20, there will be no downtime when changing cores.
As for performance, you can now tweak it to be fairly close with the 64-bit kernel on the VS, but of course it will never be as fast as non-VSX.
What do the interfaces look like? Are you clocking up RX or TX errors?
watch netstat -ni
Also run ethtool on the interfaces and look at the queueing.
Then dig into the input queue config:
sk61143
You may need to increase it (from old memory here), as it feeds all the VSs.
Good luck,
Jerry Lee
Normally I would enable MQ on a busy 10 Gbps interface, as a single core cannot cope with that much traffic.
We tried to enable it on a 4x1 Gbps bond, but for some reason it didn't work that well. The bond kept losing members and went up and down every minute as a result, so we rolled it back. I'm afraid we didn't have time to investigate any further, but I believe there was something wrong in our config. Too many things to do at the moment.
The strange situation you encountered with Multi-Queue is why I am generally not a fan of turning on features that aren't enabled by default, unless you know for sure that you need them. The KISS principle is your friend... 🙂
We worked with o-tac and a Diamond engineer to troubleshoot and tweak this issue.
All inbound traffic to all of the VSs on the 21400 4-node cluster was being affected.
Jerry
Hello Bart,
We have a similar setup to yours, but we're using the 23800 appliances. We've ditched R80.10 and replaced it with R80.20. Have you considered upgrading to R80.20? Reading between the lines, performance on R80.10 leaves room for improvement. From the R80.20 release notes:
- Significant boost to Virtual Systems performance, utilizing up to 32 CoreXL FW instances for each Virtual System.
- Dynamic Dispatcher - Packets are processed by different FW worker (FWK) instances based on the current instance load.
- Changes in the number of FW worker instances (FWK) in a VSLS setup do not require downtime.
- SecureXL Penalty Box supports the contexts of each Virtual System, see sk74520.
Has the user experience improved since you made the proposed changes?
Many thanks.
Regards,
Kris
Actually, 64-bit mode is already available in R80.10. From its release notes:
VSX Enhancements:
- 64-bit support for VSX Gateways, increasing concurrent connections capacity.
- Content Awareness for VSX Gateways.
We do see an improvement after setting the CPU affinity correctly; I can now definitely see that all CPUs are used.
I also enabled Multi-Queue on my busiest interfaces, and I see an improvement there as well.
But it's still not what it should be. In particular, when we push a policy, all connections are dropped for 10 minutes. This is not good. We don't see this on any other VSX cluster we have.
Which CPU cores are busy when it happens? Is it the ten VS2 cores or the interface cores?
As suggested by Kris, switching to the 64-bit kernel will make a lot more memory available if it's FWK that's maxing out.
Check with vs_bits -stat
[Expert@vsx-lvn-pub2:0]# vs_bits -stat
All VSs are at 32 bits
