The best a single connection can do here is to be handled on a single CPU for inspection, and perhaps another one for encryption/decryption (see sk118097: MultiCore Support for IPsec VPN). HyperFlow only helps with certain types of Threat Prevention inspection and does not truly spread all inspection duties across multiple cores for an elephant flow.
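If you want to confirm how the gateway sees one of these connections today, a couple of read-only checks from expert mode will show the CoreXL core layout and any flows already flagged as "heavy"; this is a minimal sketch assuming an R80.20+ gateway where these commands are available:

    # Show which cores are SNDs and which are CoreXL firewall worker instances
    fw ctl affinity -l -r

    # List connections the gateway has flagged as heavy (elephant flows),
    # including the firewall instance currently handling each one
    fw ctl multik print_heavy_conn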
The keys to maximum performance for this traffic inside a VPN tunnel will be:
1) Get the connection into the fastpath, so that all inspection and encryption/decryption is handled completely in the fastpath on a single SND. While one of these connections is actively up and running, run fw tab -t connections -z. If you don't see the connection there at all, it is not in the slowpath and is therefore eligible to be forced into the fastpath via fast_accel: sk156672: SecureXL Fast Accelerator (fw fast_accel) for R80.20 and above. A quick sketch of the commands appears after this list.
2) If both of your firewalls support the AES-NI processor extension (they almost certainly do; use fw ctl get int AESNI_is_supported to check, or see the sketch after this list), use the AES-GCM-128 variant of AES for IKE Phase 2/IPsec. GCM combines the encryption and hashing into a single operation that AES-NI can accelerate 4-10X compared to running it on the main CPU in software. If the processor architecture does not support AES-NI (unlikely), AES-128 will be slightly more efficient. GCM is not supported for IKE/Phase 1 until R82, but the vast majority of traffic sent through a VPN tunnel rides inside the IPsec/Phase 2 tunnel, so using GCM in IKE/Phase 1 won't make much of a difference performance-wise.
3) Leaving PFS disabled avoids an expensive Diffie-Hellman calculation every time the Phase 2 tunnel expires, but by default that only happens once every 60 minutes, so it won't make much of a difference either way.
4) It is assumed that the MTU is 1500 between all of these systems, so you won't have to deal with fragmentation or TCP MSS clamping; if not, get that fixed ASAP. It is also assumed the network is clean and that netstat -ni shows no RX-ERR/OVR/DRP on the relevant interfaces (a quick check is sketched after this list).
5) One final thing to try is disabling SMT/Hyperthreading, so that each SND instance is assigned a "full" physical core rather than sharing one with another instance. This might get you another 20-30% boost under high load (see the sketch after this list for checking the current SMT state).
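For step 1, a minimal sketch of checking the current path breakdown and then forcing the flow into the fastpath with fast_accel, assuming an R80.20+ gateway; the addresses, port and protocol number are placeholders you would replace with your real elephant-flow endpoints (see sk156672 for the full syntax):

    # How much traffic is currently in the accelerated, medium (PXL) and slow (F2F) paths
    fwaccel stats -s

    # Enable the SecureXL Fast Accelerator feature
    fw ctl fast_accel enable

    # Force traffic matching source, destination, destination port and protocol into the fastpath
    # (10.1.1.10, 10.2.2.20, 443 and 6/TCP are placeholders)
    fw ctl fast_accel add 10.1.1.10 10.2.2.20 443 6

    # Verify the rule is present
    fw ctl fast_accel show_table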
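For step 2, a quick way to verify AES-NI support from expert mode, using the kernel parameter mentioned above plus a generic Linux cross-check of the CPU flags:

    # Check Point's kernel parameter referenced in step 2
    fw ctl get int AESNI_is_supported

    # Generic cross-check: the 'aes' CPU flag indicates AES-NI capable hardware
    grep -m1 -ow aes /proc/cpuinfo || echo "AES-NI flag not found"

The algorithm change itself is made in SmartConsole in the VPN Community object's Encryption settings, followed by policy installation.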
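For step 4, two quick sanity checks; the peer address 203.0.113.1 is a placeholder, and 1472 bytes of ICMP payload plus 28 bytes of IP/ICMP headers equals a 1500-byte packet:

    # RX-ERR / RX-DRP / RX-OVR should be zero, or at least not climbing
    netstat -ni

    # Confirm a full 1500-byte path MTU toward the peer with the Don't Fragment bit set
    ping -M do -s 1472 -c 4 203.0.113.1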
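For step 5, SMT/Hyperthreading is toggled from cpconfig on appliances that support it (a reboot is required), but you can confirm its current state first; lscpu is a standard Linux tool available in expert mode:

    # "Thread(s) per core: 2" means SMT is currently on; 1 means it is off
    lscpu | grep -i "thread(s) per core"

    # Logical CPU count; this should drop by half after SMT is disabled
    grep -c ^processor /proc/cpuinfo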
Once all of the above is in place, when one of these heavy connections is running in the tunnel you should see one of your SNDs climb to near 100% CPU utilization; that performance level is pretty much all you are going to get unless you move to a bigger box with faster individual CPUs. If the connection seems to be topping out while the SND is not near 100%, something else may be bottlenecking it that could be rectified to pick up some more speed. Multi-Queue might be able to spread the processing across multiple SNDs automatically, but for a single connection I'd say that is pretty unlikely; make sure MQ is enabled on the relevant interfaces anyway.
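To see whether you have actually hit that single-SND ceiling, watch per-core utilization while the big transfer is running. top and cpview are both available in expert mode; the Multi-Queue status command below is an assumption that applies to R80.30 and later where mq_mng is present (exact command names vary by version):

    # Live per-core utilization; press 1 inside top for the per-CPU breakdown,
    # or use cpview and navigate to the CPU section instead
    top

    # Confirm Multi-Queue is active on the relevant interfaces (assumed R80.30+ syntax)
    mq_mng --show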
I doubt GRE would be any faster than 100% fastpath handling of the IPsec tunnel, even though GRE avoids the encryption/decryption overhead.
Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com