Dear community,
I am currently investigating an issue on a CPSG 3800 cluster running only S2S VPNs. Throughput is limited to roughly 300 Mbps because CPU 0 is constantly at 100% load. Besides normal SND tasks, there is the process shown below which is causing the load:
Has any of you had similar issues and found a solution?
Cheers,
Michael
Okay problem is finally "solved": Check Point provided a POC hardware (6400) with better CPUs but lower VPN throughput according to datasheet. But single thread performance matters in this case. Bandwidth is now at interface maximum (1 Gbps) with CPU 0 load at about 30%.
Conclusion: Either "someone" fixes the datasheets or performance figures must be divided by ten...
A similar issue was resolved in JHFs. Check you have the latest installed. For example from R81.10:
https://sc1.checkpoint.com/documents/Jumbo_HFA/R81.10/R81.10/R81.10-List-of-all-Resolved-Issues.htm
PRJ-42145, PMTR-88118 | SecureXL | SNDs may reach 100% CPU utilization and are not released in some Site to Site VPN scenarios. |
Thank you for your reply. But take 95 is already installed.
I suggest contacting TAC and referencing the fix in the JHF (PRJ-42145, PMTR-88118) so that they can communicate it with relevant owners in R&D.
In the meantime we kind of "solved" it by setting P2 integrity to SHA1 after reviewing sk73980.
But honestly: This gateway is specified to achieve 2.75 Gbps IPsec performance.
It should reach that with up to date crypto and NOT by using deprecated ciphers.
TAC case is open. 😉
Just my personal honest opinion...I would NOT do settings from that sk, because it essentially "solves" speed, but severely impacts security as far as VPN goes.
That's a bit generic... E.g. No one wants 3DES. Whereas the AES-NI friendly protocols are better on both fronts.
Hey Chris,
I had more than one instance where TAC suggested that sk on the phone to customers, and every single time they got an argument back about security, which is 100% valid. TAC's response was always that while it's true, it should help the speed. In my view, that's sort of a hard sell for a security company...
We all have different objectives & situations/ constraints that we're dealing with. Security is a sound argument but often doesn't fly if an appliance is undersized for the task at hand.
Ultimately it's a balance like most things.
I get it, but if you were a customer and a security vendor told you that, I'm sure you would not be too happy about it.
Absolutely right! I wouldn't do that either but, you might know how "urgent" things can get. 8)
Do you know how many SNDs there were on the machine, and what the CPU usage on those SNDs was when the issue happened?
There's dynamic balancing configured and working fine. All other cores are at ~90% idle.
Hashes for HMACs are vastly different from the same hashes used for file integrity. They don't depend nearly as much on the security of the hash, since they're computed per packet, and finding collisions retrospectively isn't useful.
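To make the "computed per packet" point concrete, here is a minimal sketch of how an ESP-style integrity tag is derived with Python's standard hmac module. The transform names and truncation lengths follow the usual IPsec definitions (96-bit tags for MD5 and SHA-1, 128-bit for SHA-256); the key and packet bytes are made up for illustration, and this is not how the gateway itself implements it.

```python
import hashlib
import hmac
import os

# Integrity transform -> (hashlib name, truncated tag length in bytes).
TRANSFORMS = {
    "HMAC-MD5-96": ("md5", 12),
    "HMAC-SHA-1-96": ("sha1", 12),
    "HMAC-SHA-256-128": ("sha256", 16),
}


def packet_icv(key: bytes, packet: bytes, transform: str) -> bytes:
    """Return the truncated HMAC tag for one packet."""
    hash_name, icv_len = TRANSFORMS[transform]
    return hmac.new(key, packet, hash_name).digest()[:icv_len]


def verify(key: bytes, packet: bytes, icv: bytes, transform: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(packet_icv(key, packet, transform), icv)


if __name__ == "__main__":
    key = os.urandom(32)      # hypothetical per-SA integrity key
    packet = os.urandom(1400)  # hypothetical packet contents
    tag = packet_icv(key, packet, "HMAC-SHA-256-128")
    print(len(tag), verify(key, packet, tag, "HMAC-SHA-256-128"))
```

A forger needs the secret key to produce a valid tag for a modified packet, which is why a collision found offline in the bare hash does not translate directly into a forged packet.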
MD5 provides more than enough integrity assurance for HMAC use for the next century. The only reason to use anything longer is checkbox-compliance where somebody cares more about not seeing the string "md5" in a configuration than they care about the actual security of the environment.
Fully agreed. But there are "checkboxes".
I've successfully argued with compliance assessors for a few standards (including PCI-DSS) that the proscription against MD5 and SHA1 doesn't apply to HMACs.
Next, I need to convince them that the requirement for passwords is "expiration every 90 days or better" and that no expiration is better, like NIST says. It just hasn't come up in an assessment yet.
So VPN and no blades other than FW?
Have you tuned / altered your CoreXL config at all?
Yes, FW/VPN only and we altered different aspects of CoreXL including disabling dynamic balancing / multiqueueing with no impact on the main problem. We also checked if KSFW mode makes a difference but as expected, it has not.
From the perf top output above, I assume this is not an issue that can be solved at the CoreXL level. It looks like a software bug to me.
Understood. Specific issues aside, my point is that VPN performance may require tuning for best results, and the numbers can also depend on the testing methodology.
Hope the SR is resolved for you quickly.
Thanks a lot!
I've had one case where a customer with around 80 incoming tunnels changed the community from "tunnel per gateway pair" to "tunnel per host pair" and that immediately caused the appliance to become unresponsive. Something to double-check?
Yes, we checked that, too. It's already "tunnel per gateway pair". Only a handful of VPNs and about 30 firewall rules...
That setting is usually checked for permanent tunnel...is that the case? Or is it just regular tunnel?
Andy
This is a permanent tunnel connecting to Zscaler. We configured it according to sk174848 and it has been working fine for months. The customer did not change anything, but all of a sudden packet loss started last Monday.
My guess is that traffic never got above ~350 Mbps but no one noticed. Today, after changing P2 integrity from sha256 to sha1 we noticed rates above 500 Mbps with cpu 0 load at about 50% which is okay.
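A back-of-the-envelope way to read those two data points, assuming throughput scales linearly with the load on the one busy core (an assumption on my part, not something measured here):

```python
def single_core_ceiling_mbps(observed_mbps: float, core_load_percent: float) -> float:
    """Extrapolate the throughput at which one core would hit 100% load.

    Assumes load grows linearly with throughput, which is only a rough model.
    """
    if not 0 < core_load_percent <= 100:
        raise ValueError("core load must be in (0, 100]")
    return observed_mbps * 100.0 / core_load_percent


if __name__ == "__main__":
    # Figures quoted in this thread for the 3800.
    observations = [
        ("P2 integrity sha256", 350, 100),
        ("P2 integrity sha1", 500, 50),
    ]
    for label, mbps, load in observations:
        ceiling = single_core_ceiling_mbps(mbps, load)
        print(f"{label}: estimated single-core ceiling {ceiling:.0f} Mbps")
```

Under that assumption the sha1 setting would top out somewhere around 1 Gbps on CPU 0, still well short of the datasheet figure.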
Currently we're waiting for R&D to come back with suggestions on how to switch back to sha256. 😉
K, got it, fair enough : - )
Here is what an escalation engineer from Dallas gave me a few months ago when a customer had a similar issue. The client never implemented it, since they had more pressing projects on the go, so I'm not sure whether it would help.
Andy
fw_clamp_tcp_mss_control -> change to true in GuiDBedit
mss_value -> change to 1200
fw_clamp_vpn_mss -> change to 1: # fw ctl set int fw_clamp_vpn_mss 1
sim_clamp_vpn_mss -> change to 1 in $PPKDIR/conf/simkern.conf
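For context on why an MSS of 1200 leaves plenty of headroom, here is a rough sketch of the ESP tunnel-mode size arithmetic. It assumes IPv4 without options, a 20-byte TCP header, and the generic ESP layout (8-byte header, optional IV, padding plus a 2-byte trailer, truncated ICV). The function and parameter names are mine, not Check Point's.

```python
def esp_tunnel_packet_size(inner_len: int, iv_len: int, block_size: int,
                           icv_len: int, nat_t: bool = False) -> int:
    """Size of the outer IPv4 packet carrying one inner packet in ESP tunnel mode."""
    outer_ip = 20                  # outer IPv4 header, no options
    udp_encap = 8 if nat_t else 0  # UDP header when NAT-T is in use
    esp_header = 8                 # SPI + sequence number
    align = max(block_size, 4)     # ESP needs at least 4-byte alignment
    trailer = 2                    # pad length + next header fields
    # Round (inner packet + trailer) up to the next multiple of the alignment.
    padded = -(-(inner_len + trailer) // align) * align
    return outer_ip + udp_encap + esp_header + iv_len + padded + icv_len


if __name__ == "__main__":
    inner = 1200 + 20 + 20  # clamped MSS + TCP header + inner IPv4 header
    # AES-CBC with HMAC-SHA-256-128: 16-byte IV, 16-byte blocks, 16-byte ICV
    print(esp_tunnel_packet_size(inner, iv_len=16, block_size=16, icv_len=16))
    # NULL encryption with HMAC-SHA-1-96: no IV, 12-byte ICV
    print(esp_tunnel_packet_size(inner, iv_len=0, block_size=1, icv_len=12))
```

Both cases stay far below a 1500-byte MTU, so a 1200-byte clamp is a conservative value that should rule out fragmentation of the encrypted packets on typical paths.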
If your VPN peer supports it, in P2 try using a Galois Counter Mode version of AES such as AES-GCM-128 or AES-GCM-256. GCM takes the encryption and hashing function and stitches them into a single operation, which can be fully offloaded to the AES-NI silicon. This should give you a nice boost; we switch to a GCM-based algorithm from 3DES in a lab exercise of my Gateway Performance Optimization Course and the speed increase is impressive. The 3800 does have 8 cores but they are low-power ATOM and not very speedy, the performance of the 3800 was dissected in this earlier thread:
Maximum reachable bandwidth 3800
This would be right if we had encryption in P2. But this is a VPN tunnel with Zscaler ZIA and we're only using the hash function. See sk174848.
Maybe that's part of the problem? Don't know but our L3 engineer forwarded it to R&D. I'm pretty curious.
BTW: I don't expect the box to provide more than 700-800 Mbps throughput.
The explanation can be found here: sk177966: Secure Network Distributor (SND) shows high CPU usage when 3DES encryption is enabled
sk174848 tells you to Configure the encryption properties according to the following recommendation:
5.2.2.2. IKE SA Phase 2 – NULL or AES + MD5
So why did you use sha256 in P2 ? Usually, AES-128 / SHA-1 is good enough for P2...
Tunnel has been created according to sk174848.
See the configuration screenshots there. But it also performs badly with sha1, and I bet it's worse with md5. But that's not the point: the issue is that the hashing almost fully saturates one CPU core, and if an appliance is specified for 2.75 Gbps performance, I would expect it to deliver something close to that number.
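If anyone wants a feel for how much the hash choice matters on a single thread, a quick user-space comparison can be sketched like this. It only measures Python's hmac module on whatever machine runs it, so the absolute numbers say nothing about the gateway's kernel code path; the 1400-byte packet size and the iteration count are arbitrary choices of mine.

```python
import hashlib
import hmac
import time


def hmac_mbps(hash_name: str, packet_len: int = 1400, packets: int = 20000) -> float:
    """Single-threaded HMAC throughput in Mbps for fixed-size packets."""
    key = b"k" * 32
    packet = b"\x00" * packet_len
    start = time.perf_counter()
    for _ in range(packets):
        hmac.new(key, packet, hash_name).digest()
    elapsed = time.perf_counter() - start
    return packet_len * packets * 8 / elapsed / 1e6


if __name__ == "__main__":
    # md5 may be unavailable on FIPS-restricted systems.
    for name in ("md5", "sha1", "sha256"):
        print(f"HMAC-{name}: {hmac_mbps(name):.0f} Mbps on one thread")
```

The relative ordering is the interesting part: it depends heavily on whether the CPU has hardware SHA extensions, which is one reason single-core results differ so much between appliance generations.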