Lithin_Mathew
Contributor

Endpoint VPN users facing extreme slowness

We are currently running a Check Point 23500 appliance cluster (Active/Standby) in our data centre.

We have approximately 2,500 to 3,000 active remote VPN users connected to the firewall at a time during peak business hours.

The Internet link on the Check Point firewall is 2 Gbps, and utilisation peaks at up to 800 Mbps during business hours.

There are 20 CPU cores, and with Multi-Threading enabled that gives 40 virtual CPUs in total. CPU utilisation peaks at a maximum of 55% during peak business hours.

Hub mode is configured to route all traffic through the gateway (due to security reasons we cannot change it).

Enabled blades:

[Expert@QTS-CP-NW-FW02:0]# enabled_blades
fw vpn cvpn urlf av appi ips identityServer anti_bot content_awareness mon vpn

Most of the remote VPN users have an Internet speed of about 200 Mbps; some even have 500 Mbps.

But after connecting to the Check Point Endpoint VPN, the speed drops below 15 Mbps download and 50 Mbps upload, which is affecting 2,000+ users.

Below are some of the checks we have done on our side:

1. We have auto_detect set for endpoint_vpn_ipsec_transport in the firewall properties in GuiDBedit.

2. SecureXL is enabled:

[Expert@QTS-CP-NW-FW02:0]# fwaccel stats -s
Accelerated conns/Total conns : 10/39553 (0%)
Accelerated pkts/Total pkts : 163746283249/335101509859 (48%)
F2Fed pkts/Total pkts : 9663120065/335101509859 (2%)
F2V pkts/Total pkts : 2927705054/335101509859 (0%)
CPASXL pkts/Total pkts : 0/335101509859 (0%)
PSLXL pkts/Total pkts : 161692106545/335101509859 (48%)
QOS inbound pkts/Total pkts : 0/335101509859 (0%)
QOS outbound pkts/Total pkts : 0/335101509859 (0%)
Corrected pkts/Total pkts : 0/335101509859 (0%)

 

3. We tried changing the Remote Access VPN Phase 1 and Phase 2 settings to the weaker AES-128 / SHA-1 combination, but there was still no improvement.
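The whole-number percentages printed by fwaccel stats -s hide some detail, so the counters from item 2 can be recomputed with more precision. The following is only a minimal sketch: the helper name parse_fwaccel_stats and the regular expression are illustrative assumptions, not part of any Check Point tooling, and the input is simply the text pasted from the output in this post.

```python
import re

# Counters copied verbatim from the `fwaccel stats -s` output in this post.
FWACCEL_STATS = """\
Accelerated conns/Total conns : 10/39553 (0%)
Accelerated pkts/Total pkts : 163746283249/335101509859 (48%)
F2Fed pkts/Total pkts : 9663120065/335101509859 (2%)
F2V pkts/Total pkts : 2927705054/335101509859 (0%)
CPASXL pkts/Total pkts : 0/335101509859 (0%)
PSLXL pkts/Total pkts : 161692106545/335101509859 (48%)
"""

# Hypothetical pattern: "<name>/Total <unit> : <count>/<total> (..%)"
LINE_RE = re.compile(r"^(?P<name>.+?)/Total \w+\s*:\s*(?P<num>\d+)/(?P<den>\d+)")


def parse_fwaccel_stats(text):
    """Return {counter name: share of the total, in percent}."""
    shares = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            num, den = int(match["num"]), int(match["den"])
            shares[match["name"]] = 100.0 * num / den if den else 0.0
    return shares


shares = parse_fwaccel_stats(FWACCEL_STATS)
for name, pct in shares.items():
    print(f"{name:<20} {pct:6.2f}%")
```

This only restates the ratios already present in the output; it does not by itself say which path the Remote Access VPN traffic is taking.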

 

Also, we have the Multiple Interfaces option checked under VPN Clients --> Office Mode:

"Support connectivity enhancement for gateways with multiple external interfaces"

 

We need assistance identifying what is causing the slowness over the Check Point VPN.

17 Replies
G_W_Albrecht
Legend

I read that you have 3000 RA clients in Hub mode. When you divide the GW's 2 Gbps by 3000 x 2 (as most traffic goes through the GW twice), what is left for each client? Routing all connections through the gateway makes for a heavy load...
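The division described above can be written out as a quick worst-case estimate. This is only a back-of-the-envelope sketch: it assumes every client is pushing traffic at the same moment and that every flow crosses the Internet link twice in Hub mode. The function name and parameters are illustrative; the figures come from the original post.

```python
def per_client_share_mbps(link_mbps, clients, passes_through_gateway=2):
    """Evenly divide link capacity across clients, counting each pass over the link."""
    return link_mbps / (clients * passes_through_gateway)


LINK_MBPS = 2000  # 2 Gbps Internet link, from the original post
CLIENTS = 3000    # upper figure for concurrent RA clients, from the original post

share = per_client_share_mbps(LINK_MBPS, CLIENTS)
print(f"Even share per client: {share:.2f} Mbps")  # prints 0.33 Mbps
```

Real traffic is bursty, so this is a lower bound for the case where everyone is active at once, not a prediction of what any single user will measure.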

Lithin_Mathew
Contributor

Hi @G_W_Albrecht ,

Thank you for your quick response.

We have NetFlow enabled on the gateway, and according to the bandwidth utilisation report it never exceeds 800 Mbps. If bandwidth were the issue, we should have seen it peaking at up to 2 Gbps, right?

G_W_Albrecht
Legend

What is the traffic on each of the 2 x 2000+ connections? 15 Mbps (download) and 50 Mbps (upload), times 2000+, times two?

Lithin_Mathew
Contributor

Hi @G_W_Albrecht

Actually, I forgot to mention the points below as well:

1. We ran the speed test during non-business hours, at 2 am EST, when there were only around 50 active remote VPN users, and the results were the same.

2. During peak hours we also ran the speed test from servers within the DC (which are behind the CP firewall), and we get download speeds of up to 700 Mbps on these servers. The point to note is that all of this traffic goes over the same single 2 Gbps circuit.

So I am not sure that bandwidth is what is causing the VPN slowness.

 

G_W_Albrecht
Legend

I would consult with TAC about this!

Lithin_Mathew
Contributor

Thanks @G_W_Albrecht for checking on this. We already have a ticket open with Check Point Support for this; it is currently escalated to Tier 3, but we are still unable to find the root cause, so I thought I would ask the Check Point community for help.

_Val_
Admin

I second @G_W_Albrecht, hub mode might be the main reason for slow connectivity. 

PhoneBoy
Admin

It may be partially a client-side issue and should be addressed via TAC.

That said, with appropriate controls on the endpoint, you don’t need to “route all traffic” back to your gateways.
To me, that seems like a much more scalable approach.

Timothy_Hall
Champion

You don't mention your code version. Make sure you are running at least R80.40 Jumbo HFA Take 53+, where major scalability improvements were added for Visitor Mode traffic. Do you know if your users are utilizing Visitor Mode?

I need to know your CoreXL split, and what individual core utilization looks like during busy periods on the SND/IRQ cores vs. the Firewall Worker cores. I also need to see netstat -ni output to ensure the network interfaces are running cleanly without frame loss. Please provide the output of the Super Seven commands, ideally taken when Remote Access VPN traffic is high.

"Max Capture: Know Your Packets" Video Series
now available at http://www.maxpowerfirewalls.com
G_W_Albrecht
Legend

I can second that: Visitor Mode can be an issue, see sk159372: Visitor Mode in Remote Access clients

Even with R80.40 Jumbo HFA Take 53, which lifted the limit of 1024 simultaneous Visitor Mode connections, Visitor Mode can only work by adding additional encapsulation to the traffic...

Lithin_Mathew
Contributor

Hi @Timothy_Hall ,

We are running R80.30 Take 196.

NAT-T is enabled in VPN Clients > Remote Access. As of right now we have 1,900+ users connected to the RA VPN, and only 3 of them are using Visitor Mode.

We had high CPU on the SND cores before we enabled Multi-Queue (before June 2020). After enabling Multi-Queue and adding more cores to the SND (currently 6 cores for Multi-Queue and 34 cores for FW Workers), we have not seen the SNDs go above 60% CPU during peak hours.

I have attached all the outputs here.

Timothy_Hall
Champion

Your firewall appears to be well-tuned and not struggling.  Your issue kind of sounds like this SK, but your SNDs don't seem to be overloaded:

sk165853: High CPU usage on one CPU core when the number of Remote Access users is high

NAT-T should be getting handled in the kernel, but what does the CPU utilization of vpnd look like when things are slow?  I'm wondering if some condition is forcing large amounts of RA VPN traffic to get handled by vpnd.

Beyond that, a low MTU somewhere in the network path could be causing problems with IPsec and its inability to fragment. Try forcing a slow client to use either Visitor Mode or NAT-T as specified here and see what happens: sk107433: How to change transport method with Endpoint Clients
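To put rough numbers on the low-MTU concern raised in this reply, the sketch below estimates the largest inner packet that fits in a single ESP tunnel-mode packet for a given path MTU. It assumes standard ESP framing with AES-CBC and HMAC-SHA1-96 over an IPv4 outer header, optionally wrapped in UDP for NAT-T. The actual overhead added by the Endpoint client, and Visitor Mode's TCP encapsulation in particular, is not modelled here, and the function name is illustrative.

```python
def max_inner_packet(path_mtu, nat_t=True, block_size=16, iv_len=16, icv_len=12):
    """Largest inner IPv4 packet that fits in one ESP tunnel-mode packet.

    Defaults assume AES-CBC (16-byte block and IV) with HMAC-SHA1-96
    (12-byte ICV) and an outer IPv4 header without options.
    """
    outer_ip = 20            # outer IPv4 header
    udp = 8 if nat_t else 0  # NAT-T wraps ESP in a UDP header
    esp_header = 8           # SPI + sequence number
    esp_trailer = 2          # pad length + next header

    # Bytes left for the encrypted part: inner packet + padding + trailer.
    room = path_mtu - outer_ip - udp - esp_header - iv_len - icv_len
    # CBC ciphertext must be a whole number of cipher blocks.
    room -= room % block_size
    return room - esp_trailer


for mtu in (1500, 1400, 1300):
    print(mtu, max_inner_packet(mtu, nat_t=True), max_inner_packet(mtu, nat_t=False))
```

Under these assumptions a 1500-byte path leaves 1422 bytes for the inner packet with NAT-T and 1438 without it, so any hop with a smaller MTU shrinks that budget further and can lead to fragmentation or drops.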
Lithin_Mathew
Contributor

Hi @Timothy_Hall , @G_W_Albrecht , @_Val_ , @PhoneBoy ,

Thanks, all, for your help on this. We were able to get this fixed at last.

After about 8 hours of troubleshooting with about 6 Check Point engineers from TAC, we identified the culprit: the Multiple Interfaces option in VPN Clients, which was checked.

Even though we only had a single WAN interface, the option had been left checked for a very long time (more than 2 years), and the impact was only felt when COVID started and a large number of users moved to remote VPN.

The Internet speed test showed less than 2 Mbps with the option checked, and it went up to 40 Mbps after the option was unchecked.

_Val_
Admin

Great to know it is resolved! Thanks for sharing

Duane_Toler
Collaborator

Excellent news!

Curious, if you can say:  Was this similar to the kernel parameter "fw ctl set int tunnel_test_do_in_kernel 1" ? (as in sk164933 and sk128652).  Your stated solution to disable the probing for multiple interfaces seems to be similar to the effects of that kernel value.  Perhaps Val could elucidate further.

 

Either way, congrats and I can imagine your collective relief!

 

514numbers
Participant

Hi, you seem to be pointing to current stats, but what is your past baseline? Have you tried removing any traffic from the route 0 hub mode with sk167000 (works really well)? Perhaps offloading some O365 / Teams traffic, or whatever you want, with that SK can help. Are you using any sort of QoS (either on chkp or elsewhere)?

Lithin_Mathew
Contributor

Hi @514numbers ,

We have certain limitations in our environment that prevent us from deploying sk167000: we have a set of users for whom Outlook won't work unless they are connected to the VPN (this was done for security reasons). I also wanted to know whether Check Point has a feature for creating multiple VPN profiles for Remote Access VPN (a feature we used on the Cisco ASA firewall), so that we can have different settings for different groups of users connecting to the same gateway.

Regarding QoS, we do have it enabled. I shared the enabled_blades output in the initial post.