Mike_Jensen
Advisor

IPSec site to site - some subnets lose connectivity to remote side

Have any of you ever encountered an issue with a site-to-site IPSec VPN where you have multiple subnets on one side and, at what seem to be random times, one or a few of those subnets lose all connectivity to the far end?

At my Headquarters location I have the IPSec VPN running on a Check Point appliance running R80.30 with Jumbo Hotfix Accumulator Take 196.  The Branch office has a third-party device (Palo Alto).

Twice this week a handful of subnets at Headquarters lost connectivity to the private network at the Branch office while other subnets continued to operate fine.  In both instances connectivity restored itself in about an hour without any manual intervention. 

During these partial outages, the logs in SmartConsole show traffic from the affected subnet(s) being encrypted and sent on its way to the Branch office.  On the Branch end, that traffic never appears.

I would open a TAC case, but I don't know if they will be able to troubleshoot this unless the problem is happening at that moment.

My VPN settings are as follows:

No NAT-T
IKEv2 only

Phase 1:
AES-128
SHA256
DH Group 19

Phase 2:
AES-GCM-128
SHA256
PFS Group 19
No compression

Phase 1 and 2 renegotiation times were left at the defaults of 1440 minutes and 3600 seconds.

All of the subnets needed are in the Check Point encryption domain, and on the Branch end the subnets have Proxy IDs configured on the Palo Alto.
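
As a rough way to sanity-check that last point, the sketch below lists every HQ/Branch subnet pair that has no matching Proxy ID. The addresses are hypothetical (the real ones are not given here), the function name is made up for illustration, and the exact-match comparison assumes the Check Point side negotiates one tunnel per subnet pair without combining adjacent networks into a supernet.

```python
from ipaddress import ip_network


def uncovered_pairs(local_subnets, remote_subnets, proxy_ids):
    """Return (local, remote) subnet pairs that have no exactly matching Proxy ID."""
    # Normalize to network objects so "10.1.0.0/24" style strings compare reliably.
    configured = {(ip_network(l), ip_network(r)) for l, r in proxy_ids}
    missing = []
    for local in local_subnets:
        for remote in remote_subnets:
            if (ip_network(local), ip_network(remote)) not in configured:
                missing.append((local, remote))
    return missing


# Hypothetical addressing, for illustration only.
hq_subnets = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24"]
branch_subnets = ["192.168.50.0/24"]

# Proxy IDs written from the HQ point of view: (HQ subnet, Branch subnet).
proxy_ids = [
    ("10.1.0.0/24", "192.168.50.0/24"),
    ("10.1.1.0/24", "192.168.50.0/24"),
]

print(uncovered_pairs(hq_subnets, branch_subnets, proxy_ids))
# -> [('10.1.2.0/24', '192.168.50.0/24')]
```

An empty list means every pair is covered. If the gateway does propose supernets, the comparison would need to use containment (for example `subnet_of`) instead of equality.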

Per the "Max Power 2020" book I know I am not using the recommended settings for VPNs with third party devices (IKEv2 and PFS), but I wanted to try for the additional security in this case.

If IKEv2 or PFS is the issue here, would it affect all subnets or none?

I am aware of SK165003, where, when NAT-T is used, traffic needs to be initiated from the far-end third-party device before it starts traversing the tunnel properly, but I am not using NAT-T in this case.

Any ideas?

Accepted Solutions
Timothy_Hall
Champion

Hard to say if IKEv2 is the issue; my overall recommendation is to always try IKEv2 first and see how it works, but do not hesitate to go back to IKEv1 in an interoperable scenario if issues arise.

Make sure that the data lifesize option on the Palo Alto is disabled or set to an unreachably high value.  Also ensure that there is no VPN tunnel idle timer set on the Palo side.  

Edit: Since you seem to be losing only some subnets, it could suggest a problem with subsequent Phase 2 negotiations, which perform PFS.  So a first step might be to try disabling PFS while still using IKEv2.
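
To illustrate why a data lifesize matters, here is a minimal arithmetic sketch of how quickly a volume-based limit can be reached compared with the 3600-second Phase 2 lifetime mentioned in the question. The lifesize and throughput figures are made-up examples, not values read from either device, and the function names are hypothetical.

```python
PHASE2_LIFETIME_S = 3600  # time-based Phase 2 lifetime quoted in the question


def seconds_until_lifesize(lifesize_mb: float, throughput_mbps: float) -> float:
    """Seconds of sustained traffic needed to use up a volume-based SA limit."""
    megabits = lifesize_mb * 8  # megabytes -> megabits
    return megabits / throughput_mbps


def effective_lifetime(lifesize_mb: float, throughput_mbps: float) -> float:
    """The SA is rekeyed at whichever limit is hit first: time or volume."""
    return min(PHASE2_LIFETIME_S, seconds_until_lifesize(lifesize_mb, throughput_mbps))


# Hypothetical example: 1024 MB lifesize with 50 Mbit/s of sustained traffic.
print(effective_lifetime(1024, 50))   # -> 163.84 (seconds, well under an hour)

# With a very large lifesize the time-based lifetime wins instead.
print(effective_lifetime(10_000_000, 50))  # -> 3600
```

A busy subnet can therefore trigger rekeys far more often than the time-based lifetime suggests, and only on the side that enforces the lifesize, which is why disabling it or setting it unreachably high keeps both peers on the same schedule.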

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com

HeikoAnkenbrand
Champion

Hi @Mike_Jensen,

Use this one-liner to debug Phase 1 and Phase 2 issues quickly and easily in real time.

ONELINER - Ease VPN Debug - with IKE live view 

 

➜ CCSM Elite, CCME, CCTE

9 Replies
PhoneBoy
Admin
This smacks of a mismatch in key renegotiation timing settings between the different sites.
Given that IPsec SAs are usually per subnet, this might explain why only some subnets drop and why it starts working again after an hour.
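
The reasoning above can be sketched as a toy model: each subnet has its own child SA with its own start time, so if the peer drops one SA early, only that subnet is affected, and the local side keeps encrypting into the dead SA until its own timer runs out. This is an illustration of the hypothesis, not a confirmed diagnosis. The subnets, timestamps, and names (`ChildSA`, `stale_window`) are hypothetical; only the 3600-second lifetime comes from the question.

```python
from dataclasses import dataclass

PHASE2_LIFETIME_S = 3600  # Phase 2 lifetime quoted in the question


@dataclass
class ChildSA:
    subnet: str
    established_at: int  # seconds since an arbitrary reference point

    def expires_at(self) -> int:
        return self.established_at + PHASE2_LIFETIME_S


def stale_window(sa: ChildSA, peer_dropped_at: int) -> int:
    """Seconds the local side keeps using an SA the peer has already discarded."""
    return max(0, sa.expires_at() - peer_dropped_at)


# Hypothetical SAs, each negotiated when its subnet first sent traffic.
sas = [
    ChildSA("10.1.0.0/24", established_at=0),
    ChildSA("10.1.1.0/24", established_at=900),
    ChildSA("10.1.2.0/24", established_at=2400),
]

# Assume the peer discards only the second SA, 1000 seconds into the timeline.
broken = sas[1]
print(broken.subnet, stale_window(broken, peer_dropped_at=1000))
# -> 10.1.1.0/24 3500
```

In this model the outage for the affected subnet lasts at most one Phase 2 lifetime and ends without intervention when the stale SA expires and a fresh one is negotiated, while the other subnets keep working throughout.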
Mike_Jensen
Advisor

Hi PhoneBoy,

I double-checked the lifetimes for Phase 1 and Phase 2, and they are identical on both devices.

PhoneBoy
Admin
You're going to have to do detailed VPN debug on both sides to see what is going on.
Mike_Jensen
Advisor
Ok, I have the commands to do that on the Check Point side. The only thing to do now is wait until the issue occurs again as I can't seem to reproduce it.
Mike_Jensen
Advisor
I ended up having to switch to IKEv1. I tried disabling PFS first and that didn't resolve the issue.
For phase two encryption I also had to switch from AES-128-GCM to AES-128.
Does AES GCM only work with IKEv2?
Timothy_Hall
Champion

The GCM versions of AES should work with IKEv1 but they are relatively new, which of course often causes problems with interoperability until everything settles out between the different firewall vendors over the years...

Mike_Jensen
Advisor

After a couple of TAC cases for this, it was determined that a value in GuiDBedit was not set to the R80 default.  My policy had at some point been migrated from R77.x, and the value for "ike_keep_child_sa_interop_devices" was set to "false".  This value was changed to "true" (the R80 default), and this VPN has been working a lot better since.
