stuart2020
Participant

Mobile VPN Client Disconnects

Hello,

I have an issue where our Mobile Access VPN clients disconnect and reconnect intermittently, many times a day. The gateways are an active/standby cluster of Check Point 15400 appliances running R80.40 Take 87, and the VPN clients are on version E84.50. The clients connect automatically using machine certificate authentication. When the VPN disconnects, it reconnects straight away and works for a while before disconnecting again. The client helpdesk log shows the following messages:

[6 Aug 7:42:19] Starting connect...
[6 Aug 7:42:19] Creating primary conn flow to FAKE-CLUSTER-NAME (2)
[6 Aug 7:42:19] Transport is auto detect
[6 Aug 7:42:19] No need to upgrade client, client version is 986102502
[6 Aug 7:42:19] Starting new connection (2)
[6 Aug 7:42:20] No need to download topology
[6 Aug 7:42:20] No need to upgrade client, client version is 986102502
[6 Aug 7:42:20] no need executing firewall step
[6 Aug 7:42:23] Office mode IP was set successfully
[6 Aug 7:42:23] No reply from the gw ip=xx.xx.xx.xx for tunnel test packet. Office Mode IP=xx.xx.xx.xx, source port=18001.
[6 Aug 7:42:26] OM started successfully with IP = xx.xx.xx.xx.
[6 Aug 7:42:26] No reply from the gw ip=xx.xx.xx.xx for tunnel test packet. Office Mode IP=xx.xx.xx.xx, source port=18002.
[6 Aug 7:42:26] Client state is connecting
[6 Aug 7:42:26] Connection was successfully established (2)
[6 Aug 7:43:17] No reply from the gw ip=xx.xx.xx.xx for tunnel test packet. Office Mode IP=xx.xx.xx.xx, source port=18004.
[6 Aug 7:43:19] No reply from the gw ip=xx.xx.xx.xx for tunnel test packet. Office Mode IP=xx.xx.xx.xx, source port=18005.
[6 Aug 7:43:21] No reply from the gw ip=xx.xx.xx.xx for tunnel test packet. Office Mode IP=xx.xx.xx.xx, source port=18006.
[6 Aug 7:43:23] No reply from the gw ip=xx.xx.xx.xx for tunnel test packet. Office Mode IP=xx.xx.xx.xx, source port=18007.
[6 Aug 7:43:25] No reply from the gw ip=xx.xx.xx.xx for tunnel test packet. Office Mode IP=xx.xx.xx.xx, source port=18008.
[6 Aug 7:43:27] No reply from the gw ip=xx.xx.xx.xx for tunnel test packet. Office Mode IP=xx.xx.xx.xx, source port=18009.
[6 Aug 7:43:29] No reply from the gw ip=xx.xx.xx.xx for tunnel test packet. Office Mode IP=xx.xx.xx.xx, source port=18010.
[6 Aug 7:43:31] No reply from the gw ip=xx.xx.xx.xx for tunnel test packet. Office Mode IP=xx.xx.xx.xx, source port=18011.
[6 Aug 7:43:33] No reply from the gw ip=xx.xx.xx.xx for tunnel test packet. Office Mode IP=xx.xx.xx.xx, source port=18012.
[6 Aug 7:43:35] IKE tunnel disconnected, error code=-1000. Reason: Site is not responding.
[6 Aug 7:43:35] Client state is connected
[6 Aug 7:43:35] Tunnel (2) disconnected. State is connected. Trying to reconnect.
[6 Aug 7:43:49] IKE connection failed, error code=-1000. Reason: Site is not responding.
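To put numbers on the pattern in the log above, a minimal Python sketch can count the consecutive "No reply ... for tunnel test packet" lines that precede each "IKE tunnel disconnected" event and measure how long they span. It assumes the `[D Mon H:MM:SS] message` line format shown in the excerpt; the function name `probe_failures_before_disconnect` is hypothetical.

```python
import re
from datetime import datetime

# Matches client helpdesk log lines such as "[6 Aug 7:43:35] IKE tunnel disconnected, ..."
LINE_RE = re.compile(r"^\[(\d{1,2} \w{3} \d{1,2}:\d{2}:\d{2})\] (.*)$")

def probe_failures_before_disconnect(lines):
    """Return a list of (failed_probe_count, seconds_spanned), one per tunnel disconnect."""
    results = []
    run = []  # timestamps of consecutive failed tunnel test probes
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        # The log omits the year; prepend a fixed leap year so parsing is unambiguous.
        ts = datetime.strptime("2000 " + m.group(1), "%Y %d %b %H:%M:%S")
        msg = m.group(2)
        if "for tunnel test packet" in msg:
            run.append(ts)
        elif msg.startswith("Connection was successfully established"):
            run = []  # probes logged before this point belong to connection setup
        elif msg.startswith("IKE tunnel disconnected"):
            span = (run[-1] - run[0]).total_seconds() if run else 0.0
            results.append((len(run), span))
            run = []
    return results
```

Run against the excerpt above, this should report 9 failed probes spanning 16 seconds (roughly one probe every 2 seconds) before the client declares the site unreachable, which helps when lining client-side timestamps up with gateway logs.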

 

Control connections are disabled, and a Global policy is used to allow this type of traffic instead. The logs show this traffic being accepted, not dropped. I have followed sk106853 to add a NAT rule, because VPND is listening on the internal interface. The client PC runs Windows 10, with no other local firewall blocking this traffic.

Does anyone have any suggestions as to what could be causing this issue? 

Thank you.

 

10 Replies
PhoneBoy
Admin

Given that the message is "Site is not responding", the cause is related either to the connection or to the site itself.
Have you tried debugging on the gateway side?
Maybe start here: https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut... 

stuart2020
Participant

Thank you. These are some of the entries I can see in the VPND log file.

[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:06][tunnel] IkeTcptDataHandler: Recieved error status -1 for connection 1790 (tunnel: 637). Closing connection
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:06][tunnel] SharedConnection::Clone: New reference to TCPT connection 1790 (peer: 94.194.145.251). Current reference count: 3
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:06][tunnel] SharedConnection::RemoveConnectionHandler: 0 handlers still registered
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:06][tunnel] TransportConnectionTable::Remove(conn): Removing connection with 94.194.145.251 from table (50 active connections).
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:06][tunnel] SharedConnection::Release: Released reference to TCPT connection 1790 (peer: 94.194.145.251). Current reference count: 2
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:06][tunnel] SharedConnection::Release: Released reference to TCPT connection 1790 (peer: 94.194.145.251). Current reference count: 1
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:06][tunnel] SharedConnection::Release: Released reference to TCPT connection 1790 (peer: 94.194.145.251). Current reference count: 0
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:06][tunnel] TCPTConnection::~TCPTConnection: Destroying connection with 94.194.145.251, tunnel: 637, ID: 1790
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:06][tunnel] SharedConnection::~SharedConnection: Destroying connection 1790.
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:06][tunnel] Killed TransportConnection (1145037 Total: 50)

[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:09][tunnel] -- updatePayloadMap: received payload PA_HASH.
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:09][tunnel] ProcessInfo: identifyPayloads succeeded.
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:09][tunnel] ProcessInfo: got neither notify nor delete - aborting
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:09][tunnel] TalkToEngine: Engine RC is << FWIKE_ERROR >>
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:09][tunnel] TalkToEngine: received Error reply from Engine
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:09][tunnel] KillNegotiation: Killing negotiation 472660 (0xe065a58) ...

 

[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:09][tunnel] NegotiationTable::DeleteNegotiation: peer: 94.194.145.251 found in negByPeer hash
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:09][tunnel] NegotiationTable::DeleteNegotiation: last neg for peer: 94.194.145.251, remove from hash
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:09][tunnel] WaitingNegotiations::WakeNegotiations: called for: e065a58, there are 0 waiting negs
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:09][tunnel] KillNegotiation: Negotiation 472660 (0xe065a58) dead.
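To correlate these gateway-side events with the client's disconnect times, a minimal Python sketch can extract TCPT connections closed on an error status (keyed by peer IP) and killed negotiations from the VPND log. It assumes only the `[vpnd PID TID]@HOST[D Mon HH:MM:SS][tunnel] message` format and message wording visible in the excerpt above; the function name and the returned structure are hypothetical.

```python
import re
from collections import defaultdict

# Only [tunnel] topic lines, e.g. "[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:06][tunnel] <message>"
HEADER_RE = re.compile(r"^\[vpnd \d+ \d+\]@(\S+?)\[([^\]]+)\]\[tunnel\] (.*)$")
# 'Recieved' is spelled exactly as it appears in the VPND log
TCPT_ERR_RE = re.compile(r"IkeTcptDataHandler: Recieved error status (-?\d+) for connection (\d+)")
DESTROY_RE = re.compile(r"TCPTConnection::~TCPTConnection: Destroying connection with ([\d.]+), tunnel: (\d+), ID: (\d+)")
KILLED_RE = re.compile(r"KillNegotiation: Negotiation (\d+) \(0x[0-9a-f]+\) dead")

def summarize_vpnd(lines):
    closures = defaultdict(list)  # peer IP -> [(timestamp, connection ID, error status)]
    pending = {}                  # connection ID -> (timestamp, status), until the destroy line names the peer
    dead_negotiations = []        # [(timestamp, negotiation ID)]
    for line in lines:
        m = HEADER_RE.match(line.strip())
        if not m:
            continue
        _host, ts, msg = m.groups()
        if (e := TCPT_ERR_RE.search(msg)):
            pending[e.group(2)] = (ts, int(e.group(1)))
        elif (d := DESTROY_RE.search(msg)):
            peer, _tunnel, conn_id = d.groups()
            if conn_id in pending:
                err_ts, status = pending.pop(conn_id)
                closures[peer].append((err_ts, conn_id, status))
        elif (k := KILLED_RE.search(msg)):
            dead_negotiations.append((ts, k.group(1)))
    return dict(closures), dead_negotiations
```

Comparing the per-peer closure times with the client's "Tunnel (2) disconnected" timestamps would show whether the gateway is tearing down the TCPT connection before, or only after, the client gives up.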

 

The entries below look strange to me, as I have noticed connections going into Visitor Mode. This is a cluster, but only one external interface is configured on each node.
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:46][tunnel] ComputeMyNatDHashes: Found 12 interfaces. GW is a cluster
[vpnd 8495 4081866688]@GATEWAY-1[3 Aug 13:53:46][tunnel] ComputeMyNatDHashes: RA NAT-T interfaces limiting is disabled for this negotiation

PhoneBoy
Admin

If this is happening for multiple users (falling back to Visitor Mode), it could cause a performance issue, particularly on gateway versions prior to R81.
I recommend engaging TAC on this if you haven't already.

stuart2020
Participant

Yes, I have raised this with TAC. I will chase them today, as the ticket was opened last Tuesday and I haven't had a response yet.

Thank you. 

Markus_Genser
Contributor

Have you checked your routing?

The Office Mode IP range has to be routed through the external interface!

If the range is routed through the internal interfaces, the firewall has trouble matching it for VPN.

 

BR

Markus

stuart2020
Participant

Hi Markus,

Yes, I have checked the routing. We have a static route pointing the Office Mode network to the uplink Internet router.

Just to confirm: we also have another static route, for a /16 subnet, which points to the internal network and also includes the Office Mode network. Will Check Point prefer the more specific route (the Office Mode /24)?

Thank you. 

Markus_Genser
Contributor

Yes, the firewall will use the more specific route.
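That answer follows the general longest-prefix-match rule. A minimal Python sketch shows which route a destination selects; the prefixes (10.20.0.0/16 for the internal summary route, 10.20.30.0/24 for the Office Mode range) and the next-hop labels are hypothetical stand-ins, since the thread does not give the real subnets. It models standard route selection, not Check Point's internal lookup code.

```python
import ipaddress

# Hypothetical routing table: prefix -> next-hop label
ROUTES = {
    ipaddress.ip_network("10.20.0.0/16"): "internal-core",     # broad internal route
    ipaddress.ip_network("10.20.30.0/24"): "internet-router",  # Office Mode range
    ipaddress.ip_network("0.0.0.0/0"): "internet-router",      # default route
}

def lookup(dst):
    """Return the next hop of the most specific (longest) matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]

print(lookup("10.20.30.15"))  # internet-router: the /24 beats the /16
print(lookup("10.20.5.1"))    # internal-core: only the /16 and default match
```

An Office Mode address matches both the /16 and the /24, and the /24 wins because its prefix is longer; any other address in the /16 still goes to the internal network.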

Have you checked whether the firewall or the SND cores are under high load during the disconnects?

We had to implement sk165853 for a customer in the past.

Also, since you use machine certificates, have you checked the certificates installed on the client?

A couple of times I have seen clients with multiple machine certificates installed, where the wrong one was used for authentication.

The RA client automatically uses the longest-valid certificate for authentication.
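Assuming "longest valid" means the currently valid certificate whose validity period ends last, a minimal Python sketch models that selection rule to show which of several installed machine certificates would be chosen. The `MachineCert` type and the example subjects are hypothetical; this is not the Endpoint client's actual code, and reading the Windows certificate store is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class MachineCert:
    subject: str          # hypothetical label for the certificate
    not_before: datetime  # start of validity period
    not_after: datetime   # end of validity period

def pick_longest_valid(certs: List[MachineCert], now: datetime) -> Optional[MachineCert]:
    """Assumed rule: among certificates valid at `now`, prefer the latest expiry."""
    valid = [c for c in certs if c.not_before <= now <= c.not_after]
    if not valid:
        return None
    return max(valid, key=lambda c: c.not_after)
```

Under this model, a machine certificate from an unrelated CA that simply expires later would be chosen over the one the gateway trusts, which is the kind of mismatch described above.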

 

My last tip would be to check the changelogs of the latest Jumbo Hotfix Accumulators (JHFAs) to see whether any of the listed fixes match your issue.

 

BR

 

 

stuart2020
Participant

The disconnects occur while utilisation is low on all cores.

I have removed all other machine certificates in case they were causing the problem, but that hasn't resolved it.

A few issues have been reported and resolved in the changelogs for newer Jumbo Hotfix takes, so applying the latest take might be my next step.

Thank you.

George_Casper
Contributor

What are the uptime and memory utilization on your 15400s? We had a similar issue on 15400s running R80.40 Take 87, along with a memory leak that built up over time. We had to reboot the gateway to clear it up.

stuart2020
Participant

The gateways have been rebooted within the last few weeks. RAM: 24 GB total, 13 GB free.

Interesting point about the memory leak. That could be another reason to apply the latest Jumbo Hotfix.
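A single snapshot such as 13 GB free out of 24 GB (about 46% used) cannot reveal a leak by itself; a leak shows up as used memory rising steadily with uptime. The following Python sketch fits a least-squares slope to periodic (hours since reboot, used GB) samples. How the samples are collected on the gateway (for example, reading /proc/meminfo on Gaia at a fixed interval) and what growth rate counts as suspicious are assumptions, and the function names are hypothetical.

```python
def used_percent(total_gb, free_gb):
    """Share of RAM in use, as a percentage."""
    return (total_gb - free_gb) / total_gb * 100

def used_growth_gb_per_day(samples):
    """Least-squares slope of memory use over time.

    samples: list of (hours_since_reboot, used_gb) tuples.
    Returns the growth rate in GB per day.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    num = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples need distinct timestamps")
    return num / den * 24  # slope is in GB/hour; scale to GB/day

print(round(used_percent(24, 13), 1))  # 45.8
```

A slope that stays clearly positive over several days of similar traffic would support the leak theory; a flat slope would point the investigation elsewhere.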

Thank you. 
