I'm trying to squeeze out as much performance using a Check Point CPAP-SG3800 as possible. It's currently running R81, but we might move it to R81.10 if there are some performance benefits to be had.
The issue at hand here is that the customer wants to have two LACP bonds with three members in each bond. This is not due to redundancy, it's strictly about performance and load sharing.
This firewall is running firewall blade only, and they are going to have a lot of 4K video streams passing the firewall. They have based their expectations on the 3800 datasheet which claims:
Claiming 3.6 Gbps throughput and 2.75 Gbps IP-sec VPN throughput (AES-128).
The design is not ideal from the get-go. Going for 3x 1Gbps LACP for achieving 3Gbps is not the best way of handling things, I would much rather have 10Gbps interfaces. But it is what it is I suppose. The bonding is running this configuration:
add bonding group 1
add bonding group 2
add bonding group 1 interface eth1
add bonding group 1 interface eth2
add bonding group 1 interface eth3
add bonding group 2 interface Mgmt
add bonding group 2 interface eth4
add bonding group 2 interface eth5
set bonding group 1 mode 8023AD
set bonding group 1 lacp-rate fast
set bonding group 1 mii-interval 100
set bonding group 1 down-delay 200
set bonding group 1 up-delay 200
set bonding group 1 xmit-hash-policy layer3+4
set bonding group 2 mode 8023AD
set bonding group 2 lacp-rate fast
set bonding group 2 mii-interval 100
set bonding group 2 down-delay 200
set bonding group 2 up-delay 200
set bonding group 2 xmit-hash-policy layer3+4
The customer is also involving Jumbo Frames, so every interface that is a part of bond2 is running MTU 9216. Bond1 which only contains the Internet link is running MTU 1500. Currently, there is no traffic on the site that is running Jumbo Frames on the client/server so in practice all current traffic is MTU 1500.
The 3x 1Gbps LACP bond that makes up the Internet link has a 10Gbps connection to the ISP and the ISP ensures 10Gbps speed within the customer WAN between their main locations. So IP-sec VPN going from one of their main sites to another main site is supposed to have 10Gbps throughout the whole chain with the support of using MTU 9216.
Internal VLAN to VLAN traffic is almost reaching 3 Gbps throughput and is working almost better than expected all things considered. But traffic over IP-sec VPN is hitting a brick wall at around 500-650 Mbps. This is where our real issues begin.
I know the datasheet is some kind of "best-case scenario", but the customer expects it to be higher than this. What makes it even worse is that latency on the entire firewall quadruples once these 4K video streams are passing over IP-sec VPN so even when we have almost idle SND's and fw workers with low load, all traffic is getting affected.
The VPN community is running:
DH Group 19
CoreXL Split / Dynamic Load Balancing is enabled and the appliance is running in USFW mode. When looking at load during these IP-sec sessions we can see the number of SND's increasing from 2 to 4, so the appliance is normally running 6+2, but once this heavy IP-sec traffic hits it's adjusting itself to 4 + 4. When looking at multi-queue all interfaces seems to stay at 2-queues all the time, even after the number of SND's increase during the load.
The problem is that this traffic seems to be locked to a single thread/CPU? Even when going from 2x SND into 4x SND there is always 1x SND stuck at 100% while the others barely see any load. The VPN daemon is supposed to be able to scale over multiple threads but it seems like we are hitting some kind of limit here?
When looking at TOP we can see that it's ksoftirqd/0 that is utilising 100% CPU.
I'm not entirely sure if we can expect this to improve by tweaking things? My thoughts were to disable CoreXL Split and simply put it at 2+6 and manually override multi-queue to have 6 queues on all interfaces with the hopes of this traffic being able to scale. But considering it doesn't seem to be able to scale with the current 4 + 4 I suspect that this won't really change anything?
The customer wants to have Jumbo Frames enabled on the IP-sec VPN. I suppose this might improve performance, as the ISP has Jumbo Frames enabled so in theory we should be able to have MTU 9216 for VPN traffic but this would require us to re-design as they also have IP-sec VPN with third parties. I suppose we would have to utilise VTI and set MTU 9216 on the VTI interface, while also having MTU 9216 on the internet link. Then the other VPN communities could still be using domain-based VPN and normal MTU? Seems like a tiresome process, and if we are limited by hardware I'm not sure it will do anything?
We could also lower the settings on the VPN community, but this should be most affected by phase 2 settings and it's already running AES-128-GCM and that's supposedly the best you can choose in terms of performance when being accelerated using AES-NI.
I might try to flip it from USFW into KMFW. Does anyone have any pointers here? It's the numbers from the datasheet that is putting pressure on us here and I have a hard time understanding how we would be able to achieve anywhere close to 2.75 Mbps via IP-sec VPN on this hardware if the traffic is being stuck to a single-threaded process. These are Intel Atom cores after all.
The gateway on the other end of the tunnel is running an 8-core open server licence, it's configured with 4 + 4 and the interfaces have 4-queues via multi-queue. And its Intel Xeon CPU and AES-NI acceleration does not seem to be any kind of bottleneck. When checking CPview and TOP when the heavy VPN traffic hits there is nothing pointing at this gateway having any limiting factor, at least not at our current 500-650 Mbps throughput. The NIC is on the HCL-list and it's the recommended Intel driver known to work great with Check Point GAiA. This one is also running R81.
Does anyone have any pointers here?
Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME