IKEv2 VPN issues after upgrade to R80.40

mk1 · ‎2021-07-19

Hello all,

After upgrading our HA cluster from R80.20 to R80.40 with the latest jumbo (take 118), we started having issues with two VPN tunnels that use IKEv2. One of them is with a Palo Alto device, and the other is with Azure. We opened a case with TAC and they gave us a custom patch that was supposed to improve things or fix the issues, but unfortunately that did not happen. This is not the first time we have had VPN issues after upgrading to R80.40; the previous time it was related to gateways with a dual-ISP setup. We have "keep_IKE_SAs" enabled in Global Properties -> Advanced settings. We were also advised to enable the delete_ikev2sa_before_init_ex option, but that didn't help either. We are still receiving logs like:

Informational exchange: Sending notification to peer: Invalid IKE SPI IKE SPIs: 2d49d13048e8c3d7:136debd1278baccd
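
For readers who want to see whether these notifications keep hitting the same SPI pair or a different one each time, here is a minimal Python sketch that counts them in exported log text. It assumes the message wording is exactly as quoted above; the function name and the sample list are made up for illustration, and the order of the two SPIs (which one is the initiator's) is not assumed.

```python
import re
from collections import Counter

# Matches the notification text quoted in the post. IKE SPIs are 8 bytes,
# so each one is 16 hex characters. The surrounding log format is an
# assumption; adjust the pattern to your own export.
SPI_RE = re.compile(
    r"Invalid IKE SPI IKE SPIs:\s*([0-9a-fA-F]{16}):([0-9a-fA-F]{16})"
)

def count_invalid_spi(lines):
    """Count 'Invalid IKE SPI' notifications per (first SPI, second SPI) pair."""
    counts = Counter()
    for line in lines:
        m = SPI_RE.search(line)
        if m:
            counts[(m.group(1).lower(), m.group(2).lower())] += 1
    return counts

# Sample input: the single log line from the post.
sample = [
    "Informational exchange: Sending notification to peer: Invalid IKE SPI "
    "IKE SPIs: 2d49d13048e8c3d7:136debd1278baccd",
]

for (first_spi, second_spi), n in count_invalid_spi(sample).items():
    print(f"{first_spi}:{second_spi} seen {n} time(s)")
```

If one pair dominates the counts, the peer is probably still using a single stale IKE SA; many different pairs would point more toward repeated failed renegotiations.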

We asked the 3rd parties to reset the tunnels on their end so that new keys would be generated, but that didn't help either.

Has anyone had similar problems?

Thank you!

Timothy_Hall · ‎2021-07-19

I assume you have seen these:

sk116615: During IKEv2 Initial Phase re-negotiation initiated by Check Point Security Gateway to 3rd...

sk172926: Traffic stops passing at certain times over the Site to Site VPN between the Check Point C...

Unfortunately the interoperability for IKEv2 is still evolving, and it took over 10 years to get IKEv1 to a reasonable level of interoperability. Rekeying is an area of particular concern; try setting the IPSec tunnel lifetime to a greater value than the IKE tunnel lifetime and see if that helps. You may need to fall back to IKEv1.

Attend my online "Be your Own TAC: Part Deux" CheckMates event
March 27th with sessions for both the EMEA and Americas time zones

mk1 · ‎2021-07-19

Hello Tim,

Yes, we've already tried sk116615, but it didn't help, and that SK is more useful when the traffic is mostly initiated from the 3rd party to us. As for the other one (sk172926), we already use the latest jumbo, so according to Check Point the issue should be solved, but apparently it is not. We can ask the 3rd party with the Palo Alto to change to IKEv1, but that's not an option for Azure: as far as I know, they would have to switch to a different plan that supports IKEv1, which means paying a different price. What is even stranger is that while we were running R80.20, the tunnel with Azure was configured as route-based on Azure and as policy-based on our Check Point, and everything was fine. We have now also converted our side from policy-based to route-based, but it didn't help.

Timothy_Hall · ‎2021-07-19

Right, you can mix the domain-based approach with a route-based approach on the other side and still have it work. Keep in mind the only real difference between domain-based and route-based VPNs is how traffic is determined to be "interesting" (to borrow a Cisco term) and requires encryption vs. just being sent in the clear. Once established, the VPN protocols more or less operate the same regardless of which one you are using.

However, the traffic selector determination (called Proxy-IDs/subnets in IKEv1 Phase 2) is also impacted. With a route-based VPN you normally use what Check Point calls a "universal tunnel" (dual 0.0.0.0/0's), whereas with domain-based, individual subnets are negotiated. This is controlled from the VPN Tunnel Sharing screen of the VPN Community; do you have it set to "one tunnel per gateway"?

With the introduction of per-VPN Community VPN domains in R80.40, that code was definitely touched and may be the cause of your issue. You will need to take a closer look at the selectors being proposed with vpn debug ikeon and ikeview. Any chance the R80.40 changes are causing many more IPSec SAs to be negotiated than before and you are hitting some kind of limit on SAs that can simultaneously exist on the peer? I remember reading an SK about this but can't find it right now.

Whether a peer configured for route-based VPNs will accept a non-universal tunnel (i.e. a subset of it) is highly dependent on the other side's vendor type and configuration.
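
To make the "subset of the universal tunnel" point concrete, here is a small Python sketch using the standard `ipaddress` module. It only checks set containment between selector pairs; it does not model any vendor's acceptance logic, and the subnets and function name are hypothetical values chosen for illustration.

```python
import ipaddress

def selector_within(proposed, accepted):
    """Return True if the proposed (local, remote) selector pair fits
    entirely inside the accepted (local, remote) selector pair."""
    p_local, p_remote = (ipaddress.ip_network(n) for n in proposed)
    a_local, a_remote = (ipaddress.ip_network(n) for n in accepted)
    return p_local.subnet_of(a_local) and p_remote.subnet_of(a_remote)

# Hypothetical selectors, for illustration only.
subnet_pair = ("10.1.0.0/16", "192.168.50.0/24")   # domain-based style
universal_pair = ("0.0.0.0/0", "0.0.0.0/0")        # "universal tunnel"

# A subnet pair is always a subset of the universal tunnel...
print(selector_within(subnet_pair, universal_pair))
# ...but the universal tunnel is never a subset of a specific subnet pair.
print(selector_within(universal_pair, subnet_pair))
```

Containment is only a necessary condition. Whether the peer actually accepts or narrows the proposal still depends on its vendor and configuration, which is why comparing the selectors seen in ikeview against the peer's configured ones is the useful next step.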


mk1 · ‎2021-07-19

After we switched to route-based VPN, we changed from "One VPN tunnel per subnet pair" to "One VPN tunnel per Gateway pair", and changed both encryption domains to be empty (dummy) network groups (the routing was statically added via vpnt interface).

Regarding your other question about IPsec SAs, please check whether the stats below help:

cpstat -f all vpn | grep SA

IKE current SAs: 147
IKE current SAs initiated by me: 21
IKE current SAs initiated by peer: 126
IKE max concurrent SAs: 149
IKE max concurrent SAs initiated by me: 22
IKE max concurrent SAs initiated by peer: 127
IKE total SAs: 258
IKE total SAs initiated by me: 24
IKE total SAs initiated by peer: 234
IKE total SA attempts: 315
IKE total SA attempts initiated by me: 208
IKE total SA attempts initiated by peer: 107
IKE current ongoing SA negotiations: 2
IKE max concurrent SA negotiations: 6
IPsec current Inbound SAs: 3247
IPsec current Outbound SAs: 254
IPsec max concurrent Inbound SAs: 4704
IPsec max concurrent Outbound SAs: 288
IPsec total Inbound SAs: 6394
IPsec total Outbound SAs: 6540

The appliances are 5600.
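
For anyone comparing numbers like these against a peer's SA limit, the counters can be turned into a few ratios. Below is a minimal Python sketch that parses the "name: value" lines shown above; the parser and the choice of ratios are illustrative assumptions, not a Check Point tool, and only a subset of the posted counters is included.

```python
def parse_cpstat(text):
    """Parse 'name: value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so the counter name stays intact.
        name, sep, value = line.rpartition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value.strip())
    return stats

# A subset of the counters posted above.
output = """\
IKE current SAs: 147
IKE current SAs initiated by peer: 126
IPsec current Inbound SAs: 3247
IPsec current Outbound SAs: 254
IPsec max concurrent Inbound SAs: 4704
IPsec max concurrent Outbound SAs: 288
"""

stats = parse_cpstat(output)
inbound = stats["IPsec current Inbound SAs"]
outbound = stats["IPsec current Outbound SAs"]

print(f"inbound/outbound IPsec SAs: {inbound / outbound:.1f}")                # 12.8
print(f"inbound IPsec SAs per IKE SA: {inbound / stats['IKE current SAs']:.1f}")  # 22.1
```

With the posted values there are roughly 13 inbound IPsec SAs for every outbound one. Whether that is normal for this gateway is not something the numbers alone can answer, but tracking the ratio over time, or comparing it with a pre-upgrade baseline if one exists, would show whether SAs are piling up.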

the_rock · ‎2021-07-19

I agree with all you said, but it would be nice if TAC had some sort of document, not necessarily an sk, that clearly states Check Point supports using domain-based on the CP side and route-based on the 3rd party side. Sadly, this all depends on who you talk to, which is not how it should be at all, in my opinion.

biskit · ‎2023-12-20

I suspect I'm having a similar problem now (R81.10).

https://support.checkpoint.com/results/sk/sk116615 no longer exists. Does anyone know of the new version of this sk?

PhoneBoy · ‎2023-12-20

Believe this is the replacement SK: https://support.checkpoint.com/results/sk/sk176957

Chris_Atkinson · ‎2021-07-19

Hi

Out of interest do you have the settings from sk108600 scenario 4 in place?

CCSM R77/R80/ELITE

mk1 · ‎2021-07-19

Hello Chris,

Yes, the TAC engineer changed that to true. He said that starting from R80.10/R80.20 that should be the default value, but because our R80.40 management was upgraded from older versions, it was not changed automatically.

the_rock · ‎2021-07-19

From personal experience, those messages can sometimes be a "red herring". Seeing them doesn't always mean the tunnel is actually down per se... I have seen instances where, even in R81, they come up, but pings or RDP work fine from either side of the VPN. I am not surprised that the custom patch you were given did nothing, because I read that a few other customers said the same thing.

Here are some questions I have for you:

-Do you see those messages ONLY when tunnels are down or is it totally random?

-When you do see them in the logs, can you communicate back and forth through the VPN tunnel?

-Can you try setting it up so that only the remote end initiates traffic and see if the problem is still there?

mk1 · ‎2021-07-19

Hello the_rock,

You're absolutely correct - sometimes we see traffic coming through the tunnel, although the logs still show the messages I described above.

On your questions:

1. Totally random, but in most cases the tunnels are actually down, and from time to time some production traffic comes through the tunnel.

2. Sometimes yes, sometimes no.

3. Actually this is already the case - only the 3rd parties initiate traffic to us, but the issue still exists.

the_rock · ‎2021-07-19

I sort of figured those answers were coming...anywho...I will keep you posted on what I find out, because we are testing a couple of tunnels in the lab (route-based, and a combo of domain/route-based), so we will see how far we get.

the_rock · ‎2021-07-21

Ok, so as promised, here is my update:

-IKEv1 seems way more stable in lab testing, but speeds are not that great

-as far as IKEv2, yes, it does work for both route-based (100% on both sides) OR domain-based CP and route-based (other side), but again, it's not as reliable as IKEv1, even if you turn off SecureXL AND vpn accel

mk1 · ‎2021-07-23

Meanwhile we uninstalled the custom hotfix that Check Point gave us, and things now look better. Maybe that is because we implemented some of the changes after the hotfix was already installed.
