Bart_Leysen
Contributor

Bad Performance

We recently moved from OpenServers to VSX Clusters and now performance is really bad. Traffic to the internet is terrible.

People working over VPN can hardly work. We just don't know where to look anymore. We are on R80.10 Take 121.

Clusters are 15600 and 23500 models.

Does anybody have any idea where to look?

60 Replies
Hsu_Teddy
Participant

By default, a VS on a VSX cluster uses 1 CPU core.

For more performance, you need to increase the number of cores.

To configure CoreXL on a Virtual System:

  1. Open SmartConsole.
  2. From the Gateways & Servers view or Object Explorer, double-click the Virtual System.

    The Virtual System General Properties window opens.

  3. From the navigation tree, select CoreXL.
  4. Select the number of firewall instances for the Virtual System.
  5. Click OK.

Check Point VSX R80.10 Administration Guide 
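
A quick way to verify the result from the CLI is roughly the following (a sketch assuming standard VSX tooling; VS ID 2 is only an example):

vsenv 2               # switch the shell context to the Virtual System in question
fw ctl multik stat    # should show one row per CoreXL firewall instance with its CPU and connection counts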

0 Kudos
Bart_Leysen
Contributor

We assigned 10 firewall instances to that VS,

and we dedicated 10 CPUs to it.

0 Kudos
AlekseiShelepov
Advisor

More information is required here.

Which blades are enabled? It might be something connected with additional blades being enabled for all traffic.

What can you see in the top command output, and which processes are using most of the CPU?

Try to start with the Super Seven commands from Tim Hall's presentation; they are integrated in Common Check Point Commands (ccc) under the Gateway Performance Optimization section.
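
From memory, the Super Seven are roughly these (please verify against Tim Hall's material, the exact list may differ slightly):

fwaccel stat                          # SecureXL status and accept templates
fwaccel stats -s                      # accelerated vs. F2F traffic percentages
grep -c ^processor /proc/cpuinfo      # total number of cores seen by the OS
/sbin/cpuinfo                         # hyperthreading status
fw ctl affinity -l -r                 # core-to-interface/instance assignments
netstat -ni                           # RX-DRP / RX-ERR counters per interface
fw ctl multik stat                    # CoreXL instances and their load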

Bart_Leysen
Contributor

The following blades are enabled: IPS, Anti-Virus/Anti-Bot, URL Filtering, Application Control and VPN.

I will look at the Super Seven commands, thanks.
0 Kudos
Kaspars_Zibarts
Employee

Can you send the output of:

fw ctl affinity -l

fw ctl multik stat

cpmq get

top (extended to see all individual cores)

Bart_Leysen
Contributor

top - 15:46:46 up 1 day, 23:20,  2 users,  load average: 4.14, 4.61, 5.00
Tasks: 519 total,   1 running, 518 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.6%us,  1.5%sy,  0.0%ni, 89.6%id,  0.0%wa,  0.1%hi,  1.2%si,  0.0%st
Mem:  131774100k total, 19940328k used, 111833772k free,   389664k buffers
Swap: 33551672k total,        0k used, 33551672k free, 11931140k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6069 admin      0 -20 2653m 2.1g 178m S  379  1.6   5724:45 fwk2_dev
13600 admin     15   0  716m 227m  40m S    7  0.2   7:40.63 fw_full
13598 admin     15   0  292m  75m  39m S    1  0.1   0:50.09 cpd
 2952 admin      0 -20  710m 179m  31m S    1  0.1  22:39.70 fwk0_dev
 3845 admin     15   0  365m 342m  10m S    1  0.3   4:17.23 rad
 6071 admin      0 -20 1267m 734m  99m S    1  0.6  35:17.06 fwk1_dev
 3163 admin     15   0  286m  74m  44m S    0  0.1   0:24.01 cpd
 3503 admin     15   0  615m 100m  41m S    0  0.1   0:40.87 fw_full
12837 admin     15   0  292m  75m  39m S    0  0.1   0:30.06 cpd
12839 admin     15   0  563m  72m  40m S    0  0.1   0:04.51 fw_full
    1 admin     15   0  1976  724  624 S    0  0.0   0:04.91 init
    2 admin     RT  -5     0    0    0 S    0  0.0   0:00.04 migration/0
    3 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    4 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/0
    5 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
    6 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
    7 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/1
    8 admin     RT  -5     0    0    0 S    0  0.0   0:15.59 migration/2
    9 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/2
   10 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/2
   11 admin     RT  -5     0    0    0 S    0  0.0   3:14.68 migration/3
   12 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/3
   13 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/3
   14 admin     RT  -5     0    0    0 S    0  0.0   0:05.87 migration/4
   15 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/4
   16 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/4
   17 admin     RT  -5     0    0    0 S    0  0.0   0:08.82 migration/5
   18 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/5
   19 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/5
   20 admin     RT  -5     0    0    0 S    0  0.0   0:02.10 migration/6
   21 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/6
   22 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/6
   23 admin     RT  -5     0    0    0 S    0  0.0   0:04.03 migration/7
   24 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/7
   25 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/7
   26 admin     RT  -5     0    0    0 S    0  0.0   0:06.17 migration/8
   27 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/8
   28 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/8
   29 admin     RT  -5     0    0    0 S    0  0.0   0:52.68 migration/9


[Expert@vsx-lvn-pub2:0]# fw ctl multik stat
fw: CoreXL is disabled

[Expert@vsx-lvn-pub2:0]# cpmq get

Active ixgbe interfaces:
eth1-01 [Off]
eth1-02 [Off]
eth1-03 [Off]
eth1-04 [Off]

Active igb interfaces:
Mgmt [Off]
Sync [Off]
eth2-01 [Off]

[Expert@vsx-lvn-pub2:0]# fw ctl affinity -l
Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1 2
eth1-02: CPU 3 4
eth1-03: CPU 5 6
eth1-04: CPU 7 8
eth2-01: CPU 9 10
VS_0: CPU 39
VS_0 fwk: CPU 39
VS_1: CPU 11 12 13 14 15 16 17 18 19 20
VS_1 fwk: CPU 11 12 13 14 15 16 17 18 19 20
VS_2: CPU 21 22 23 24 25 26 27 28 29 30
VS_2 fwk: CPU 21 22 23 24 25 26 27 28 29 30
VS_3 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
0 Kudos
_Val_
Admin

The first question you need to ask yourself: how many cores per VS are enabled? If more than one, what are the affinity settings?

0 Kudos
Bart_Leysen
Contributor

VS 2 is the VS with issues

Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1 2
eth1-02: CPU 3 4
eth1-03: CPU 5 6
eth1-04: CPU 7 8
eth2-01: CPU 9 10
VS_0: CPU 39
VS_0 fwk: CPU 39
VS_1: CPU 11 12 13 14 15 16 17 18 19 20
VS_1 fwk: CPU 11 12 13 14 15 16 17 18 19 20
VS_2: CPU 21 22 23 24 25 26 27 28 29 30
VS_2 fwk: CPU 21 22 23 24 25 26 27 28 29 30
VS_3 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

0 Kudos
Kaspars_Zibarts
Employee

Hi Bart! Your core allocation is set wrong. You are using hyperthreaded CPUs so you have to be mindful about numbering! 

You have allocated the same physical cores to SecureXL on the interfaces (0-10) and to VS_2 (the matching sibling cores 20-30).

Please have a look at the article I wrote not that long ago:

Security Gateway Performance Optimization - VSX 

Make your own spreadsheet and re-allocate cores correctly.

Note that having 2 CPUs per interface is not going to help you unless you use Multi-Queue - may as well stick with a single CPU per interface.

In your case I would start with something like this and then tweak depending on CPU usage:

Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1 (21)
eth1-02: CPU 2 (22)
eth1-03: CPU 3 (23)
eth1-04: CPU 4 (24)
eth2-01: CPU 5 (25)
VS_0: CPU 6 26
VS_0 fwk: CPU 6 26
VS_1: CPU 10 11 12 13 14 30 31 32 33 34
VS_1 fwk: CPU 10 11 12 13 14 30 31 32 33 34
VS_2: CPU 15 16 17 18 19 35 36 37 38 39
VS_2 fwk: CPU 15 16 17 18 19 35 36 37 38 39
VS_3 fwk: CPU 7 8 9 27 28 29

Or design your own, but make sure you take hyperthreaded numbering into account, i.e. physical core 0 also holds hyperthreaded core 20, so don't mix SecureXL and CoreXL on those!
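
For the worker side of that re-allocation, the commands would look roughly like this (a sketch only, using my example core numbers; verify the syntax against the VSX admin guide for your version, and handle the interface/SecureXL affinity separately, e.g. via sim affinity):

fw ctl affinity -s -d -vsid 1 -cpu 10-14 30-34    # pin the VS_1 fwk instances to their cores
fw ctl affinity -s -d -vsid 2 -cpu 15-19 35-39    # pin the VS_2 fwk instances to their cores
fw ctl affinity -l -a -v                          # verify the resulting assignment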

Kaspars_Zibarts
Employee

Let us know if you need more info or commands to set affinities

And remember to press 1 when you run the top command so you see all individual cores, not just the summary 🙂

0 Kudos
Bart_Leysen
Contributor

Thanks Kaspars,

I've read your article, it was very helpful; I've modified the affinity following your guidelines.

I will keep you posted next week when production starts again on Monday.

Kaspars_Zibarts
Employee

By the way, I had to guess some things, so ideally send us the fw ctl multik stat command output to confirm that the suggested config will work OK.

Also, if you have the possibility, set up some sort of SNMP graphs for all CPU cores to further fine-tune your CoreXL and SecureXL.
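
If a standard SNMP poller is available, something along these lines should give per-core load (assuming SNMP is enabled in Gaia and the poller understands HOST-RESOURCES-MIB; double-check the OID for your setup):

snmpwalk -v2c -c <community> <gateway-ip> 1.3.6.1.2.1.25.3.3.1.2    # hrProcessorLoad, one value per CPU core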

0 Kudos
Kaspars_Zibarts
Employee

You would also have to adjust the allocation depending on the total number of cores. My example was for 40 HT cores.

0 Kudos
Bart_Leysen
Contributor

What is your experience with Multi-Queue? Is it advisable to enable it on some interfaces?

0 Kudos
Timothy_Hall
Champion

If the RX-DRP rate on a busy interface is >0.1% (viewed with netstat -ni) even though enough SND/IRQ cores have been allocated such that the busy interface has its own dedicated SND/IRQ core as shown by sim affinity -l, then Multi-Queue should be enabled.  Multi-Queue does cause some slight additional overhead on the SND/IRQ core to "stick" the packets associated with a single connection to the same queue every time to avoid out-of-order delivery, so enabling Multi-Queue is not always a no-brainer.  More SND/IRQ cores should be allocated first if possible.  Specifically, if all SND/IRQ cores are very busy (>75% utilization) and you can't allocate any more due to a limited number of cores, enabling Multi-Queue will actually make things worse.
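
To put a number on that drop ratio, a one-liner along these lines can help (a rough sketch; the column positions of netstat -ni differ between versions, so adjust the field numbers if needed):

netstat -ni | awk 'NR>2 && $4+0>0 {printf "%-10s RX-OK=%d RX-DRP=%d (%.3f%%)\n", $1, $4, $6, 100*$6/$4}'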

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
0 Kudos
Bart_Leysen
Contributor

Output of fw ctl multik stat after the change:

[Expert@vsx-lvn-pub2:2]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 15-19+ | 1232 | 6691
1 | Yes | 15-19+ | 935 | 6653
2 | Yes | 15-19+ | 1120 | 9340
3 | Yes | 15-19+ | 932 | 6128
4 | Yes | 15-19+ | 1281 | 11506
5 | Yes | 15-19+ | 968 | 7107
6 | Yes | 15-19+ | 1073 | 8319
7 | Yes | 15-19+ | 1006 | 7035
8 | Yes | 15-19+ | 1072 | 6552
9 | Yes | 15-19+ | 1123 | 5669
[Expert@vsx-lvn-pub2:2]# vsenv 1
Context is set to Virtual Device vsx-lvn-pub2_fw-lvn-snx (ID 1).
[Expert@vsx-lvn-pub2:1]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 10-14+ | 391 | 115231
1 | Yes | 10-14+ | 183 | 76175
2 | Yes | 10-14+ | 312 | 116117
3 | Yes | 10-14+ | 860 | 114412
4 | Yes | 10-14+ | 445 | 90342
[Expert@vsx-lvn-pub2:1]# vsenv 3
Context is set to Virtual Device vsx-lvn-pub2_vlan802 (ID 3).
[Expert@vsx-lvn-pub2:3]# fw ctl multik stat
fw: CoreXL is disabled

[Expert@vsx-lvn-pub2:3]# vsenv 0
Context is set to Virtual Device vsx-lvn-pub2 (ID 0).
[Expert@vsx-lvn-pub2:0]# fw ctl multik stat
fw: CoreXL is disabled

0 Kudos
Muazzam_Saeed
Participant

I am on R77.30 (going to R80.10 soon). I thought the problem was fixed in R80.10, but it seems like you are having the same issue that I see on R77.30. I am not sure we can ever get the same performance on VSX as on regular gateways. You need to fine-tune your VSX environment to improve the performance.

0 Kudos
Bart_Leysen
Contributor

I guess so, but even a policy push will make the cluster unstable, or at least that VS.

Pushing policy will make the VS unresponsive for a couple of minutes.

0 Kudos
Muazzam_Saeed
Participant

Just want to share my experience. We have three 4-node VSX clusters; all the hardware is 23800. One cluster was upgraded to R80.10 (from R77.30) a couple of months ago, and one a few days ago.

The only issue I have is that the performance is not the same as on a normal (non-VSX) gateway. I also wish there were no downtime when changing the CoreXL value.

Other than that we have no other issues; it is stable and reliable. No issues ever noticed when pushing the policy.

0 Kudos
Kaspars_Zibarts
Employee

If you upgrade to R80.20, there will be no downtime when changing cores.

As for performance, you can now tweak it to be fairly close with the 64-bit kernel on the VS, but of course it will never be as fast as non-VSX.

Jerry_Lee
Participant

What do the interfaces look like - are you clocking up RX or TX errors? 

watch netstat -ni

and ethtool for the interfaces - look at the queueing.

Also dig into the SK below and look at the input queue config:

sk61143

You may need to increase it (from old memory here), as it feeds all the VSs.
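
The usual checks on the queue/ring buffer side look roughly like this (eth1-01 is just an example name; follow sk61143 for the supported way to change and persist the value):

ethtool -g eth1-01                   # current vs. maximum RX ring (input queue) size
ethtool -S eth1-01 | grep -i drop    # driver-level drop counters
ethtool -G eth1-01 rx 4096           # example value only: raise the RX ring size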

Good luck,

Jerry Lee

0 Kudos
Kaspars_Zibarts
Employee

Normally I would enable MQ on a busy 10Gbps interface, as a single core cannot cope with that much traffic.

We tried to enable it on a 4x1Gbps bond, but for some reason it didn't work that well. The bond kept losing members and as a result was going up and down every minute, so we rolled it back. I didn't have time to investigate it any further, I'm afraid, but I believe there was something wrong in our config. Too many things "to do" at the moment.
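
For completeness, enabling it is roughly this on our release (from memory; a reboot was needed for the change to take effect):

cpmq get     # show which supported interfaces currently have Multi-Queue on
cpmq set     # interactive: select the interfaces to enable Multi-Queue on
reboot       # the change only takes effect after a reboot on R80.10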

0 Kudos
Timothy_Hall
Champion

The strange situation you encountered with Multi-Queue is why I am generally not a fan of turning on features that aren't enabled by default, unless you know for sure that you need them.  The KISS principle is your friend...  🙂

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Jerry_Lee
Participant

We worked with TAC and a Diamond engineer to troubleshoot and tweak this issue.

All inbound traffic to all of the VSs on the 21400 4-node cluster was being affected.

Jerry


0 Kudos
Kris_Pellens
Collaborator

Hello Bart,

We have a similar setup to yours, but we're using the 23800 appliances. We've ditched R80.10 and replaced it with R80.20. Have you considered upgrading to R80.20? Reading between the lines, it sounds like the performance on R80.10 could be better. From the R80.20 release notes:

  1. Significant boost to Virtual Systems performance, utilizing up to 32 CoreXL FW instances for each Virtual System.
  2. Dynamic Dispatcher - Packets are processed by different FW worker (FWK) instances based on the current instance load.
  3. Changes in the number of FW worker instances (FWK) in a VSLS setup do not require downtime.
  4. SecureXL Penalty Box supports the contexts of each Virtual System, see sk74520.

Has the user experience improved after you made the proposed changes?

Many thanks.

Regards,

Kris

Kaspars_Zibarts
Employee

Actually, 64-bit mode is already available in R80.10 🙂

VSX Enhancements:

  • 64-bit support for VSX Gateways, increasing concurrent connections capacity.
  • Content Awareness for VSX Gateways.
0 Kudos
Bart_Leysen
Contributor

We do see improvement after setting the CPU affinity right; I definitely now see that all CPUs are used.

I also enabled Multi-Queue on my busiest interfaces, and here too I see an improvement.

But it's still not what it should be; especially when we push a policy, all connections are just dropped for 10 minutes. This is not good. We don't see this on any other VSX cluster we have.

0 Kudos
Kaspars_Zibarts
Employee

Which CPU cores are busy when it happens? Is it the ten VS_2 cores or the interface cores?

As suggested by Kris, switching to the 64-bit kernel will give you a lot more memory if it's the FWK that's maxing out.

Check with vs_bits -stat.

0 Kudos
Bart_Leysen
Contributor

[Expert@vsx-lvn-pub2:0]# vs_bits -stat
All VSs are at 32 bits

0 Kudos
