Bart_Leysen

Bad Performance

We recently moved from OpenServers to VSX Clusters and now performance is really bad. Traffic to the internet is terrible.

People working over VPN can hardly work. We just don't know where to look anymore. We are on R80.10 Take 121.

Clusters are 15600 and 23500 models.

Does anybody have any idea where to look?

60 Replies

Re: Bad Performance

By default, a VS on a VSX cluster uses 1 CPU core.

For more performance, you need to increase the number of cores.

To configure CoreXL on a Virtual System:

  1. Open SmartConsole.
  2. From the Gateways & Servers view or Object Explorer, double-click the Virtual System.

    The Virtual System General Properties window opens.

  3. From the navigation tree, select CoreXL.
  4. Select the number of firewall instances for the Virtual System.
  5. Click OK.

Check Point VSX R80.10 Administration Guide 

Bart_Leysen

Re: Bad Performance

We assigned 10 firewall instances to that VS, and we dedicated 10 CPUs to it.


Re: Bad Performance

More information is required here.

Which blades are enabled? The problem might be related to additional blades being enabled for all traffic.

What do you see in the top output? Which processes are using the most CPU?

Start with the Super Seven commands from Tim Hall's presentation. They are integrated into Common Check Point Commands (ccc) under the Gateway Performance Optimization section.

Bart_Leysen

Re: Bad Performance

The following blades are enabled: IPS, Anti-Virus/Anti-Bot, URL Filtering, Application Control, and VPN.

I will look at the Super Seven commands, thanks.

Re: Bad Performance

Can you send the output of:

fw ctl affinity -l

fw ctl multik stat

cpmq get

top (extended to see all individual cores)

Bart_Leysen

Re: Bad Performance

top - 15:46:46 up 1 day, 23:20,  2 users,  load average: 4.14, 4.61, 5.00
Tasks: 519 total,   1 running, 518 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.6%us,  1.5%sy,  0.0%ni, 89.6%id,  0.0%wa,  0.1%hi,  1.2%si,  0.0%st
Mem:  131774100k total, 19940328k used, 111833772k free,   389664k buffers
Swap: 33551672k total,        0k used, 33551672k free, 11931140k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6069 admin      0 -20 2653m 2.1g 178m S  379  1.6   5724:45 fwk2_dev
13600 admin     15   0  716m 227m  40m S    7  0.2   7:40.63 fw_full
13598 admin     15   0  292m  75m  39m S    1  0.1   0:50.09 cpd
 2952 admin      0 -20  710m 179m  31m S    1  0.1  22:39.70 fwk0_dev
 3845 admin     15   0  365m 342m  10m S    1  0.3   4:17.23 rad
 6071 admin      0 -20 1267m 734m  99m S    1  0.6  35:17.06 fwk1_dev
 3163 admin     15   0  286m  74m  44m S    0  0.1   0:24.01 cpd
 3503 admin     15   0  615m 100m  41m S    0  0.1   0:40.87 fw_full
12837 admin     15   0  292m  75m  39m S    0  0.1   0:30.06 cpd
12839 admin     15   0  563m  72m  40m S    0  0.1   0:04.51 fw_full
    1 admin     15   0  1976  724  624 S    0  0.0   0:04.91 init
    2 admin     RT  -5     0    0    0 S    0  0.0   0:00.04 migration/0
    3 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    4 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/0
    5 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
    6 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
    7 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/1
    8 admin     RT  -5     0    0    0 S    0  0.0   0:15.59 migration/2
    9 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/2
   10 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/2
   11 admin     RT  -5     0    0    0 S    0  0.0   3:14.68 migration/3
   12 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/3
   13 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/3
   14 admin     RT  -5     0    0    0 S    0  0.0   0:05.87 migration/4
   15 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/4
   16 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/4
   17 admin     RT  -5     0    0    0 S    0  0.0   0:08.82 migration/5
   18 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/5
   19 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/5
   20 admin     RT  -5     0    0    0 S    0  0.0   0:02.10 migration/6
   21 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/6
   22 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/6
   23 admin     RT  -5     0    0    0 S    0  0.0   0:04.03 migration/7
   24 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/7
   25 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/7
   26 admin     RT  -5     0    0    0 S    0  0.0   0:06.17 migration/8
   27 admin     15   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/8
   28 admin     RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/8
   29 admin     RT  -5     0    0    0 S    0  0.0   0:52.68 migration/9


[Expert@vsx-lvn-pub2:0]# fw ctl multik stat
fw: CoreXL is disabled

[Expert@vsx-lvn-pub2:0]# cpmq get

Active ixgbe interfaces:
eth1-01 [Off]
eth1-02 [Off]
eth1-03 [Off]
eth1-04 [Off]

Active igb interfaces:
Mgmt [Off]
Sync [Off]
eth2-01 [Off]

[Expert@vsx-lvn-pub2:0]# fw ctl affinity -l
Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1 2
eth1-02: CPU 3 4
eth1-03: CPU 5 6
eth1-04: CPU 7 8
eth2-01: CPU 9 10
VS_0: CPU 39
VS_0 fwk: CPU 39
VS_1: CPU 11 12 13 14 15 16 17 18 19 20
VS_1 fwk: CPU 11 12 13 14 15 16 17 18 19 20
VS_2: CPU 21 22 23 24 25 26 27 28 29 30
VS_2 fwk: CPU 21 22 23 24 25 26 27 28 29 30
VS_3 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

Re: Bad Performance

The first question to ask yourself: how many cores are enabled per VS? If more than one, what are the affinity settings?

Bart_Leysen

Re: Bad Performance

VS 2 is the VS with issues. The affinity settings are the ones in the fw ctl affinity -l output I posted above.


Re: Bad Performance

Hi Bart! Your core allocation is wrong. You are using hyperthreaded CPUs, so you have to be mindful of the numbering!

You have allocated the same physical cores to SecureXL on the interfaces (CPUs 0-10) and to VS2 (the matching sibling CPUs 20-30).

Please have a look at the article I wrote not long ago:

Security Gateway Performance Optimization - VSX 

Make your own spreadsheet and re-allocate cores correctly.

Note that having 2 CPUs per interface is not going to help you unless you use Multi-Queue, so you may as well stick with a single CPU per interface.

In your case I would start with something like this and then tweak it depending on CPU usage (the number in parentheses is the HT sibling of each interface CPU):

Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1 (21)
eth1-02: CPU 2 (22)
eth1-03: CPU 3 (23)
eth1-04: CPU 4 (24)
eth2-01: CPU 5 (25)
VS_0: CPU 6 26
VS_0 fwk: CPU 6 26
VS_1: CPU 10 11 12 13 14 30 31 32 33 34
VS_1 fwk: CPU 10 11 12 13 14 30 31 32 33 34
VS_2: CPU 15 16 17 18 19 35 36 37 38 39
VS_2 fwk: CPU 15 16 17 18 19 35 36 37 38 39
VS_3 fwk: CPU 7 8 9 27 28 29

Or design your own, but make sure you take the hyperthreaded numbering into account, i.e. physical core 0 also holds hyperthreaded core 20, so don't mix SecureXL and CoreXL on those!
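
If you would rather script the sibling check than eyeball it, something along these lines could work. This is only a rough Python sketch, not a Check Point tool. It assumes 40 logical CPUs where CPU n and CPU n+20 share a physical core (the numbering described above), and the helper names physical_core, parse_affinity and find_conflicts are made up for this example. It reads text in the format printed by fw ctl affinity -l and reports every interface that shares a physical core with a VS.

```python
import re
from typing import Dict, List, Set, Tuple

# Assumption: 40 logical CPUs, with CPU n and CPU n + 20 on the same physical core.
LOGICAL_CPUS = 40
HALF = LOGICAL_CPUS // 2

# Affinity as originally posted in this thread.
ORIGINAL = """\
Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1 2
eth1-02: CPU 3 4
eth1-03: CPU 5 6
eth1-04: CPU 7 8
eth2-01: CPU 9 10
VS_0: CPU 39
VS_0 fwk: CPU 39
VS_1: CPU 11 12 13 14 15 16 17 18 19 20
VS_1 fwk: CPU 11 12 13 14 15 16 17 18 19 20
VS_2: CPU 21 22 23 24 25 26 27 28 29 30
VS_2 fwk: CPU 21 22 23 24 25 26 27 28 29 30
VS_3 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
"""

# Suggested starting layout from this thread. The sibling CPUs shown in
# parentheses in the post are not assigned to anything, so they are omitted.
PROPOSED = """\
Mgmt: CPU 0
Sync: CPU 0
eth1-01: CPU 1
eth1-02: CPU 2
eth1-03: CPU 3
eth1-04: CPU 4
eth2-01: CPU 5
VS_0: CPU 6 26
VS_0 fwk: CPU 6 26
VS_1: CPU 10 11 12 13 14 30 31 32 33 34
VS_1 fwk: CPU 10 11 12 13 14 30 31 32 33 34
VS_2: CPU 15 16 17 18 19 35 36 37 38 39
VS_2 fwk: CPU 15 16 17 18 19 35 36 37 38 39
VS_3 fwk: CPU 7 8 9 27 28 29
"""


def physical_core(cpu: int) -> int:
    """Map a logical CPU to its physical core under the n / n+20 assumption."""
    return cpu % HALF


def parse_affinity(text: str) -> Dict[str, Set[int]]:
    """Parse 'name: CPU a b c' lines into {name: {a, b, c}}."""
    result: Dict[str, Set[int]] = {}
    for line in text.splitlines():
        match = re.match(r"^(.+?):\s*CPU\s+([\d\s]+)$", line.strip())
        if match:
            result[match.group(1)] = {int(n) for n in match.group(2).split()}
    return result


def find_conflicts(affinity: Dict[str, Set[int]]) -> List[Tuple[str, str, List[int]]]:
    """Return (interface, VS entry, shared physical cores) for every overlap."""
    interfaces = {k: v for k, v in affinity.items() if not k.startswith("VS_")}
    systems = {k: v for k, v in affinity.items() if k.startswith("VS_")}
    conflicts = []
    for if_name, if_cpus in interfaces.items():
        if_cores = {physical_core(c) for c in if_cpus}
        for vs_name, vs_cpus in systems.items():
            shared = sorted(if_cores & {physical_core(c) for c in vs_cpus})
            if shared:
                conflicts.append((if_name, vs_name, shared))
    return conflicts


for label, text in (("original", ORIGINAL), ("proposed", PROPOSED)):
    found = find_conflicts(parse_affinity(text))
    print(f"{label}: {len(found)} interface/VS overlaps")
    for if_name, vs_name, cores in found:
        print(f"  {if_name} shares physical core(s) {cores} with {vs_name}")
```

The check only compares interfaces against VS entries. Overlaps between two Virtual Systems are not flagged, since whether those matter depends on how busy each VS is.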

Re: Bad Performance

Let us know if you need more info or the commands to set affinities.

And remember to press 1 when you run top so you see all individual cores, not just the summary :-)

Bart_Leysen

Re: Bad Performance

Thanks Kaspars,

I've read your article and it was very helpful. I've modified the affinity following your guidelines.

I will keep you posted next week when production starts again on Monday.

Re: Bad Performance

By the way, I had to guess at some things, so ideally send us the fw ctl multik stat output to confirm that the suggested config will work.

Also, if you can, set up some sort of SNMP graphs for all CPU cores to further fine-tune your CoreXL and SecureXL settings.


Re: Bad Performance

You would also have to adjust the allocation depending on the total number of cores. My example was for 40 HT cores.
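
For other core counts, a small helper can take the place of the spreadsheet. The Python sketch below is only an illustration and rests on one assumption: logical CPU n and logical CPU n + (total / 2) are siblings on the same physical core, which matches the 40-CPU numbering above but should be verified on the actual box (on Linux, /sys/devices/system/cpu/cpu0/topology/thread_siblings_list shows the siblings of CPU 0). The function name expand_to_logical is made up for this example.

```python
from typing import Iterable, List


def expand_to_logical(physical_cores: Iterable[int], logical_cpus: int) -> List[int]:
    """Return both HT threads (logical CPU numbers) for the given physical cores.

    Assumes CPU n and CPU n + logical_cpus // 2 are siblings.
    """
    if logical_cpus <= 0 or logical_cpus % 2:
        raise ValueError("expected a positive, even number of logical CPUs")
    half = logical_cpus // 2
    cores = sorted(set(physical_cores))
    if any(core < 0 or core >= half for core in cores):
        raise ValueError(f"physical cores must be in 0..{half - 1}")
    # First the primary threads, then their siblings, as in the layout above.
    return cores + [core + half for core in cores]


# Physical cores 15-19 on a 40-CPU box: the VS_2 list from the example layout.
print(expand_to_logical(range(15, 20), 40))  # [15, 16, 17, 18, 19, 35, 36, 37, 38, 39]

# The same idea on a hypothetical 32-CPU box.
print(expand_to_logical(range(8, 12), 32))   # [8, 9, 10, 11, 24, 25, 26, 27]
```

Planning in physical cores and expanding to logical CPUs at the end makes it harder to hand half of a core to SecureXL and the other half to CoreXL by accident.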

Bart_Leysen

Re: Bad Performance

What is your experience with Multi-Queue? Is it advisable to enable it on some interfaces?


Re: Bad Performance

If the RX-DRP rate on a busy interface is >0.1% (viewed with netstat -ni) even though enough SND/IRQ cores have been allocated that the busy interface has its own dedicated SND/IRQ core (as shown by sim affinity -l), then Multi-Queue should be enabled.  Multi-Queue does cause some slight additional overhead on the SND/IRQ core, because it has to "stick" the packets associated with a single connection to the same queue every time to avoid out-of-order delivery, so enabling Multi-Queue is not always a no-brainer.  More SND/IRQ cores should be allocated first if possible.  Specifically, if all SND/IRQ cores are very busy (>75% utilization) and you can't allocate any more because of a limited number of cores, enabling Multi-Queue will actually make things worse.
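
The 0.1% check is easy to script against saved netstat -ni output. The Python sketch below is a minimal illustration under two assumptions: the rate is computed as RX-DRP divided by RX-OK, and the output has the usual header row starting with Iface. The counters in SAMPLE are invented for the example, and rx_drop_rates is a made-up helper name. It covers only the drop-rate half of the rule; whether the interface already has a dedicated SND/IRQ core still has to be checked with sim affinity -l.

```python
from typing import Dict

RX_DRP_THRESHOLD_PERCENT = 0.1  # threshold from the rule of thumb above

# Hypothetical counters, laid out like `netstat -ni` output.
SAMPLE = """\
Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1-01    1500   0  1000000      0   2500      0   900000      0      0      0 BMRU
eth1-02    1500   0  2000000      0    100      0  1800000      0      0      0 BMRU
"""


def rx_drop_rates(netstat_output: str) -> Dict[str, float]:
    """Return {interface: RX-DRP as a percentage of RX-OK}."""
    rates: Dict[str, float] = {}
    columns = None
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Iface":
            columns = fields  # remember the header so columns are found by name
            continue
        if columns is None or len(fields) != len(columns):
            continue  # skip the title line and anything malformed
        row = dict(zip(columns, fields))
        rx_ok = int(row["RX-OK"])
        rx_drp = int(row["RX-DRP"])
        rates[row["Iface"]] = 100.0 * rx_drp / rx_ok if rx_ok else 0.0
    return rates


rates = rx_drop_rates(SAMPLE)
candidates = sorted(name for name, rate in rates.items()
                    if rate > RX_DRP_THRESHOLD_PERCENT)
for name in sorted(rates):
    print(f"{name}: RX-DRP rate {rates[name]:.3f}%")
print("Above threshold:", candidates)
```

The counters are cumulative since boot, so comparing two snapshots taken during busy hours gives a more meaningful rate than a single reading.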

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

"IPS Immersion Training" Self-paced Video Class
Now Available at http://www.maxpowerfirewalls.com
Bart_Leysen

Re: Bad Performance

Output of fw ctl multik stat after the change:

[Expert@vsx-lvn-pub2:2]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 15-19+ | 1232 | 6691
1 | Yes | 15-19+ | 935 | 6653
2 | Yes | 15-19+ | 1120 | 9340
3 | Yes | 15-19+ | 932 | 6128
4 | Yes | 15-19+ | 1281 | 11506
5 | Yes | 15-19+ | 968 | 7107
6 | Yes | 15-19+ | 1073 | 8319
7 | Yes | 15-19+ | 1006 | 7035
8 | Yes | 15-19+ | 1072 | 6552
9 | Yes | 15-19+ | 1123 | 5669
[Expert@vsx-lvn-pub2:2]# vsenv 1
Context is set to Virtual Device vsx-lvn-pub2_fw-lvn-snx (ID 1).
[Expert@vsx-lvn-pub2:1]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 10-14+ | 391 | 115231
1 | Yes | 10-14+ | 183 | 76175
2 | Yes | 10-14+ | 312 | 116117
3 | Yes | 10-14+ | 860 | 114412
4 | Yes | 10-14+ | 445 | 90342
[Expert@vsx-lvn-pub2:1]# vsenv 3
Context is set to Virtual Device vsx-lvn-pub2_vlan802 (ID 3).
[Expert@vsx-lvn-pub2:3]# fw ctl multik stat
fw: CoreXL is disabled

[Expert@vsx-lvn-pub2:3]# vsenv 0
Context is set to Virtual Device vsx-lvn-pub2 (ID 0).
[Expert@vsx-lvn-pub2:0]# fw ctl multik stat
fw: CoreXL is disabled


Re: Bad Performance

I am on R77.30 (going to R80.10 soon). I thought the problem was fixed in R80.10, but it seems you are having the same issue that I see on R77.30. I am not sure we can ever get the same performance on VSX as on regular gateways. You need to fine-tune your VSX environment to improve the performance.

Bart_Leysen

Re: Bad Performance

I guess so, but even a policy push makes the cluster unstable, or at least that VS.

Pushing policy will make the VS unresponsive for a couple of minutes.


Re: Bad Performance

Just want to share my experience. We have three 4-node VSX clusters, all on 23800 hardware. One cluster was upgraded to R80.10 (from R77.30) a couple of months ago, and one a few days ago.

The only issue I have is that the performance is not the same as on a normal (non-VSX) gateway. I also wish there were no downtime when changing the CoreXL value.

Other than that we have no issues; it is stable and reliable. We have never noticed any problems when pushing policy.


Re: Bad Performance

If you upgrade to R80.20 then there will be no downtime in changing cores.

As for performance, you can now tweak it to be fairly close with the 64-bit kernel on the VS, but of course it will never be as fast as non-VSX.

Re: Bad Performance

What do the interfaces look like? Are you clocking up RX or TX errors?

watch netstat -ni

Also run ethtool on the interfaces and look at the queueing.

Also dig into this SK and look at the input queue config:

sk61143

You may need to increase it (going from memory here), as it feeds all the VSs.

Good luck,

Jerry Lee


Re: Bad Performance

Normally I would enable MQ on a busy 10Gbps interface, as a single core cannot cope with that much traffic.

We tried to enable it on a 4x1Gbps bond, but for some reason it didn't work that well. The bond kept losing members and was going up and down every minute as a result, so we rolled it back. I didn't have time to investigate any further, I'm afraid, but I believe there was something wrong in our config. Too many things "to do" at the moment.


Re: Bad Performance

The strange situation you encountered with Multi-Queue is why I am generally not a fan of turning on features that aren't enabled by default, unless you know for sure that you need them.  The KISS principle is your friend...  :-)


Re: Bad Performance

We worked with TAC and a Diamond engineer to troubleshoot and tweak this issue.

All inbound traffic to all of the VSs on the 21400 4-node cluster was being affected.

Jerry


Re: Bad Performance

Hello Bart,

We have a similar setup to yours, but we're using 23800 appliances. We've ditched R80.10 and replaced it with R80.20. Have you considered upgrading to R80.20? Reading between the lines, the performance on R80.10 could be better. From the R80.20 release notes:

  1. Significant boost to Virtual Systems performance, utilizing up to 32 CoreXL FW instances for each Virtual System.
  2. Dynamic Dispatcher - Packets are processed by different FW worker (FWK) instances based on the current instance load.
  3. Changes in the number of FW worker instances (FWK) in a VSLS setup do not require downtime.
  4. SecureXL Penalty Box supports the contexts of each Virtual System, see sk74520.

Has the user experience been improved after you made the proposed changes?

Many thanks.

Regards,

Kris

Re: Bad Performance

Actually, 64-bit mode is already available in R80.10 :-)

VSX Enhancements:

  • 64-bit support for VSX Gateways, increasing concurrent connections capacity.
  • Content Awareness for VSX Gateways.
Bart_Leysen

Re: Bad Performance

We do see an improvement after setting the CPU affinity correctly. I can now definitely see that all CPUs are used.

I also enabled Multi-Queue on my busiest interfaces, and I see an improvement there as well.

But it's still not what it should be. In particular, when we push a policy, all connections are dropped for 10 minutes. This is not good. We don't see this on any other VSX cluster we have.


Re: Bad Performance

Which CPU cores are busy when it happens? Is it the ten VS2 cores or the interface cores?

As suggested by Kris, switching to the 64-bit kernel will give you a lot more memory, if it's FWK that's maxing out.

Check with vs_bits -stat

Bart_Leysen

Re: Bad Performance

[Expert@vsx-lvn-pub2:0]# vs_bits -stat
All VSs are at 32 bits
