
VPN issue with IKEv2 and Cisco ASA

Hi,

Last week we upgraded our security gateway from R77.30 to R80.20. After this upgrade, we lost connectivity on one of our VPNs. This VPN is with a third-party gateway, a Cisco ASA, and we are using IKEv2.

The issue is weird, and I've isolated the following:

1) If the negotiation is triggered on the ASA side, everything works as expected. As a workaround, they are bouncing the tunnel on their side and generating traffic towards us, which allows us to connect (if we are the first to generate traffic, it won't work).

2) If we initiate the connection, we are unable to reach the other side of the VPN, but they are able to reach our network. So traffic generated on their side of the VPN always reaches us without issues.

3) Child SAs are only being negotiated on re-keys; I'm assuming the first ones are created during the IKE_AUTH exchange, as per the RFC.

 

I have a case open with TAC, but so far no meaningful replies. I can also share the vpnd.elg files, as well as the ikev2.xmll files, if you are interested in taking a look.

 

Thanks

18 Replies

Two guesses:

1) You had a custom subnet_for_range_and_peer directive defined in the $FWDIR/conf/user.def.R77CMP file on your SMS, and when the gateway was upgraded to R80.10+, this file no longer applied. Any special directives in the old file need to be copied to the $FWDIR/conf/user.def.FW1 file on the SMS and policy reinstalled to apply to the new gateway version. sk98239: Location of 'user.def' files on Security Management Server

2) You had a custom kernel definition affecting the VPN in the $FWDIR/boot/modules/fwkern.conf, $FWDIR/boot/modules/vpnkern.conf or $PPKDIR/boot/conf/simkern.conf file(s) on the upgraded gateway itself that did not survive the upgrade process.
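
If it helps, a subnet_for_range_and_peer entry in user.def.FW1 typically looks something like the sketch below (the peer and range addresses are placeholders, not from this thread; double-check the exact syntax against the SKs for your version):

```
// Hypothetical example: force one Phase 2 subnet/selector towards peer 203.0.113.10
subnet_for_range_and_peer = {
    <203.0.113.10, 10.1.1.0, 10.1.1.255; 255.255.255.0>
};
```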

If it is neither of those things, try disabling SecureXL VPN acceleration for that peer and see if it impacts the issue: sk151114: "fwaccel off" does not affect disabling acceleration of VPN tunnels in R80.20 and above
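
Per that SK, disabling acceleration for just that peer can be sketched like this (the peer IP is a placeholder):

```shell
# Disable SecureXL VPN acceleration for a single peer (R80.20 and above)
vpn accel off 203.0.113.10

# Re-test the tunnel, then turn acceleration back on
vpn accel on 203.0.113.10
```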

Also watch out for sk116776: Instability issues in VPN Tunnel with Cisco using IKEv2

R80.40 addendum for book "Max Power 2020" now available
for free download at http://www.maxpowerfirewalls.com

Hi Timothy,

 

So, you definitely have something there... When this tunnel was created, an entry was indeed added to the user.def file. However, it was added in a different location than the one mentioned in sk98239. We have an MDS but, according to the SK, the file shouldn't be defined there: the subnet_for_range_and_peer entry was defined under /var/opt/CPmds-R80.20/conf/user.def.R77CMP.

I have since removed this entry and added it to the correct location, /opt/CPmds-R80.20/customers/<CMA-NAME>/CPsuite-R80.20/fw1/conf/user.def.FW1, and installed policy, but with no success. I've also tried adding the entry under /var/opt/CPmds-R80.20/conf/user.def.FW1, also without success. I did run fw tab -t subnet_for_range_and_peer, which shows the correct entry for this VPN on the gateway after installing policy; however, I was still experiencing the same issues.
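
For reference, the verification command I ran on the gateway was along these lines:

```shell
# Dump the kernel table populated by the user.def directive
# (-u: no row limit, -f: formatted output)
fw tab -t subnet_for_range_and_peer -u -f
```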

I've tried disabling fwaccel as well as vpn accel, without success. As far as custom kernel definitions go, I checked and couldn't find any...

 

I believe the issue is related to the user.def files. I'll point support in that direction, but if you could provide some insight as to why this is still happening, despite the fact that I've moved the definitions, I'd appreciate it.

 

Thanks!

 


I'm assuming that for VPN Tunnel Sharing in the community settings you have it set to "one tunnel per subnet". As a test, try setting it to "one tunnel per pair of hosts" and reinstalling policy. If the problem goes away, you have confirmed that it is indeed a subnet/selector issue and not something else. In general it is not a good idea to leave it set to "pair of hosts", as a large number of IPSec tunnels can be generated.

 


I agree. And if I recall correctly, we tried that when we were first setting up the tunnel and it worked.

 

I'll raise a change to test that, but as you've said, this is not an ideal solution. If it works, what can we do to go back to "one tunnel per subnet"?

 

Thanks


You'll probably need to work with TAC and figure out why your subnet_for_range_and_peer directive is not working properly, as that should definitely work with IKEv2. Because the directive is showing up in the gateway's tables, it sounds like you have it defined in the correct user.def* instance on the MDS/SMS/Domain.

You can use "pair of hosts" permanently, but only if you have just a few hosts on each end that need to use the tunnel *and* the Firewall/Network Policy Layer is sufficiently locked down to prevent a large number of tunnels from starting. With "pair of hosts", a separate IPSec/Phase 2 tunnel is started for every combination of host IP addresses (/32s) that are allowed to communicate. So if two Class C networks are using the tunnel and the rulebase allows the entirety of the networks to communicate with each other, in theory more than 64,000 separate tunnels could try to start, which will quickly bang against the soft limit of 10,000 concurrent tunnels and cause intermittent VPN connectivity. If PFS is enabled, a separate computationally expensive Diffie-Hellman calculation will occur for each and every IPSec/Phase 2 tunnel, which will cause a massive amount of firewall CPU overhead and further problems.
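
To put a number on it, a quick back-of-the-envelope calculation for two fully-open /24 (Class C) networks:

```shell
# One IPSec/Phase 2 tunnel per pair of /32 hosts; 254 usable hosts per /24
echo $((254 * 254))   # prints 64516 -- far beyond the ~10,000 tunnel soft limit
```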

 


I'm already working with TAC on that and will post updates here once I have them. Do you have any ideas on how to troubleshoot this? I guess I could run a kernel debug and check out vpnd.elg after correcting the user.def file, and maybe see if I'm missing something.

 

Thanks for your help so far!


All IKE negotiations take place in process space via vpnd on the firewall, so you'll need to debug vpnd (vpnd.elg) and probably turn on IKE debugging, which is output to ikev2.xmll. I don't think you'll need to perform kernel-level debugging for this issue, at least not initially.
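
A typical sequence on the gateway looks something like this:

```shell
# Truncate the logs and turn on vpnd + IKE debugging in one shot;
# output lands in $FWDIR/log/vpnd.elg and $FWDIR/log/ikev2.xmll (IKEv2)
vpn debug trunc

# ...reproduce the failed negotiation from the Check Point side...

# Turn debugging back off
vpn debug ikeoff
vpn debug off
```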

 


Hi,

 

So, we've isolated the issue. Apparently the ASA was erroneously detecting the need to use NAT-T during the IKE_SA_INIT phase when we started the communication. My guess is that, when the ASA initiated the communication, it negotiated NAT-T with us (the Check Point is configured to support NAT-T), which established the tunnel successfully and allowed communication.

 

The ASA was on version 9.8, for future reference.


Yep, just saw this with a customer that upgraded from R80.10 to R80.30 and transitioned from a single 4600 to a ClusterXL cluster of 5400s with R80.30 JHFA 50. Everything worked after the upgrade except a domain-based site-to-site VPN to a Cisco ASA using IKEv2. Using IkeView we could see that when the Check Point was initiating, Phase 1 would complete, but when the Check Point sent the Auth packet with the Traffic Selectors and such...no response from the Cisco. So the Check Point just kept sending the Auth packet over and over again. vpn accel off had no effect on the issue.

After some lengthy debugging on the Cisco side, we found out that the Cisco was determining that NAT-T needed to be used, which is simply wrong, as we double-checked and triple-checked that there was no NAT between the two peers. The Auth packet was being silently dropped by the Cisco, since it was expecting it to come in on UDP 4500 instead of UDP 500. Once we set force_nat_t to true via GuiDBedit for the Check Point cluster object, the tunnel came up and worked normally.
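
For anyone needing the same workaround, the change in GuiDBedit was along these lines (the object name is a placeholder):

```
Table:  Network Objects -> network_objects
Object: <your cluster object>
Field:  force_nat_t = true    (boolean)
Then File -> Save All, and install policy.
```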

This discovery led to a spirited discussion between myself and the Cisco administrator, as he insisted that nothing had changed on his end (which is true), but he took offense when I said the Cisco was "erroneously" starting NAT-T (which is also true).  Clearly Check Point is doing something different in IKEv2 between R80.10 and R80.30 that is tripping up the Cisco ASA in regards to NAT-T; I couldn't see anything that would cause a peer gateway to determine NAT-T was required.  The Peer ID IP address and source IP address on the IKE packets matched exactly.
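
For background on what the ASA is checking: in IKEv2, each peer sends NAT_DETECTION_SOURCE_IP / NAT_DETECTION_DESTINATION_IP notifies in IKE_SA_INIT, each carrying a SHA-1 hash over the IKE SPIs plus an address and port (RFC 7296, section 2.23). If the hash a peer computes over what it actually sees on the wire differs from what was sent, it assumes a NAT and floats to UDP 4500. A rough sketch with made-up SPIs and address:

```shell
# NAT_DETECTION notify data = SHA-1( SPIi | SPIr | IP address | port )
spis='\001\001\001\001\001\001\001\001\002\002\002\002\002\002\002\002'  # 8-byte SPIi + 8-byte SPIr
addr='\306\063\144\001'   # 198.51.100.1 as octal byte escapes
port='\001\364'           # UDP 500 (0x01F4)
sent=$(printf "${spis}${addr}${port}" | sha1sum | cut -d' ' -f1)
seen=$(printf "${spis}${addr}${port}" | sha1sum | cut -d' ' -f1)
# Identical address/port on both ends -> hashes match -> no NAT assumed
[ "$sent" = "$seen" ] && echo "no NAT detected"
```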

 



@Timothy_Hall wrote:

Clearly Check Point is doing something different in the IKEv2 Auth packet between R80.10 and R80.30 that is tripping up the Cisco ASA in regards to NAT-T; I looked at every bit in the Auth packet and couldn't see anything that would cause a peer gateway to determine NAT-T was required.  The Peer ID IP address and source IP address on the IKE packets matched exactly.

 


My thoughts exactly, and I racked my brain over it because I had some PCAPs and the IKEv2 and NAT-T RFCs side by side and still couldn't figure out what Check Point was doing for the ASA to detect it as a NAT-T device.

Do you have the ASA OS version? I read somewhere there was a bug in a 10.something release regarding NAT-T detection, and I believe my peer was on that release or a subsequent one.


Never caught the Cisco device version; it was a strange problem, to be sure.

Was the firewall you experienced it on part of an HA cluster? That was the one other thing that changed in our case, besides the code version going from R80.10 to R80.30.


Yes, it was also an HA cluster. But in our case it was a cluster upgrade, and since it stayed a cluster, I don't know if it could be something related to ClusterXL. I've been meaning to test this in a lab environment but haven't gotten around to it, unfortunately.


Hello, Tiago and all. Thank you for this post to the forum; it was very helpful today.

I had the exact same issue today. Nothing in the Check Point Knowledge Base helped; thankfully, Google led me to this thread 🙂

What I can add is that, for troubleshooting purposes, we changed the encryption method to "IKEv1 only" on both the Cisco side and the Check Point side, and the tunnel and traffic worked fine.

If we switch back to IKEv2, the tunnel is up and traffic reaches the Cisco side, but it does not return to the Check Point.

We needed to disable force NAT-T on the Cisco side (I did not try force-enabling it on the Check Point side) so that everything works fine again.

So, on R77.30 it was working with encryption method IKEv2 yesterday, and after last night's upgrade to R80.20 it stopped working.

No errors on the Check Point side, but I did not do a full debug. I only disabled acceleration for the Cisco peer, for debugging purposes.

Can anyone raise this for investigation purposes?

I did not open a case because we needed to fix it or migrate the tunnel ASAP.

Thank you all 🙂


Yeah, I have not had the best of luck with IKEv2 in interoperable VPN situations. IKEv1 has been around a long time and works well.

 


Sorry it took me this long to check this message; it must have slipped through my email queue...

Did you manage to solve the issue?


Hello,

we figured out that this happens with a Cisco CSR as well.

In addition, we see that the Check Point does not respond to UDP-encapsulated ESP packets from the CSR (in the case of this NAT-T behavior).

 

I have a new idea for a bug bounty program:

For each bug, CP should extend the CCSM cert by one month 🙂

 




I have a new idea for a bug bounty program:

For each bug, CP should extend the CCSM cert by one month 🙂

 


👍👨‍🎓

and now to something completely different

My memory on this subject is not fresh (it happened almost a year ago), but I believe it doesn't reply because, when one side is using NAT-T and the other isn't, the checksums (not sure if it's the checksums or some other value) don't match on the opposite side. The firewall drops it correctly, because it believes it to be a replay attack.

I do remember looking through the RFC, the PCAPs and the debugs, and I couldn't find a single RFC violation on the Check Point side.

 

No CCSM for me, but I would appreciate a voucher for the TAC courses 😄 
