the_rock
Legend
Legend

VPN Check Point - Palo Alto issue

Hey everyone,

Hope someone can give a good suggestion/idea about this. I was helping a hospital with a route-based VPN tunnel from their CP cluster to Palo Alto. The tunnel has been there since 2020, I think, but has always worked only intermittently.

The PAN guy was saying that, for some odd reason, when there is an issue they see the ID on their end as 0.0.0.0, though it should be 169.254.0.103, which is what's configured. I'm not really sure why that would happen, or whether it's related to CP or Palo Alto.

For the context, all other vpn tunnels on CP side work just fine, just this one. Any clue as to why the ID would show different on peer side?

We did not have time to really debug, so simply ended up lowering the encryption methods and that brought up the tunnel.

Thanks as always for any ideas. Just for context, we tried an unnumbered VTI as well; that would send ID 0.0.0.0, so that's expected, but a numbered VTI definitely should NOT.

Andy

1 Solution

Accepted Solutions
the_rock
Legend
Legend

Though we won't know for sure until the next maintenance window, I'm fairly positive this is NOT a CP fw issue at this point. Thanks a lot @AmirArama for being so kind as to get the IKE trace file and confirm that the ID being sent to the PAN fw is not 0.0.0.0, as they claimed during the call. We also verified with TAC that the correct auth methods were being used, so if they have the issue again, I will advise them to open a case with Palo Alto support and investigate further.

Appreciate everyone who responded.

Best,

Andy


44 Replies
Timothy_Hall
Legend Legend
Legend

When you say "ID" do you mean Proxy-ID (subnets/domains) or IKE Peer ID?

By default Palo Alto uses route-based VPNs and will propose a universal tunnel (0.0.0.0/0, 0.0.0.0/0 - one tunnel per gateway pair) in IKE Phase 2, although it can be configured to mimic a domain-based VPN and propose specific subnets, similar to "pair of subnets" on the Check Point side.  Whether you are using an unnumbered or numbered VTI doesn't affect the Proxy-ID negotiation, at least to my knowledge.

Using IKEv1 I presume?  IKEv2 has had some rather nasty interoperability issues, the most prominent of which was tunnel narrowing.

Another line of inquiry would be whether the direction in which the tunnel is initiated affects stability.  So for example if the Palo initiates the tunnel it is stable, but when the Check Point initiates the tunnel it is not.

Also check the obvious things: make sure the Phase 1 & Phase 2 timers match, and that the Palo is not configured with a data lifesize, idle timer, or anything else that could bring the tunnel down prematurely.
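A quick way to make the timer and setting comparison less error-prone is to write both sides down and diff them. The sketch below is only an illustration: the setting names and all values are hypothetical and were not taken from either gateway in this thread.

```python
def ike_mismatches(local: dict, peer: dict) -> dict:
    """Return {setting: (local_value, peer_value)} for every setting that differs."""
    keys = sorted(set(local) | set(peer))
    return {k: (local.get(k), peer.get(k)) for k in keys if local.get(k) != peer.get(k)}


# Hypothetical values for illustration only; fill in what each side really has.
checkpoint = {
    "ike_version": 2,
    "p1_lifetime_s": 86400,
    "p2_lifetime_s": 3600,
    "pfs_group": 14,
    "lifesize_kb": None,  # no data lifesize configured
}
palo_alto = {
    "ike_version": 2,
    "p1_lifetime_s": 28800,
    "p2_lifetime_s": 3600,
    "pfs_group": 14,
    "lifesize_kb": 4608000,
}

for name, (cp, pan) in ike_mismatches(checkpoint, palo_alto).items():
    print(f"{name}: Check Point={cp} Palo Alto={pan}")
```

Anything the loop prints is a candidate for a premature rekey or teardown and is worth aligning before deeper debugging.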

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
(1)
the_rock
Legend
Legend

Hey Tim,

Thanks a lot for your insight, as always. I meant ike peer ID, not proxy ID, should have clarified.

Funny thing is, I asked the guy on their end, who uses the Palo Alto firewall, about the route-based tunnel, and he said they do have an actual enc. domain configured, which consists of 4 subnets (a combo of /24, /27 and /28 nets) and a single host (/32).

As I mentioned, I find it so odd that on Cp side, this is the ONLY tunnel, out of probably 10 or so, they constantly have issues with, all the other ones work fine. Ironically enough, guy told me that on the other end, they also only have issues with this tunnel to CP lol

Anywho, yes, we confirmed phase 1 and phase 2 settings many times and any time you lower enc. settings, all works fine, no issues, but if you try using say aes-256 / sha256/384/512, nothing works.

I'm really not sure how to approach this at this point... I will see if I can have a remote session with the guy who has access to the Palo Alto firewall, as I'm somewhat familiar with those. It's been some time since I worked on them, but I do remember the basics. I wish their VPN config was as easy as Fortinet's, which I find works 100% of the time.

O well, not everything is simple, I guess : - )

Andy

0 Kudos
Timothy_Hall
Legend Legend
Legend

Lowering the strength of the encryption algorithms shouldn't make a difference, but one point of pain in the past I've seen has been PFS which can be spotty as far as interoperability.  Maybe try leaving PFS off and increasing the other algorithm strengths for Phase 1 and Phase 2 and see what happens? 

Only other thing I can think of is that there are some differences in the VPN code implementations in sim/fastpath vs. on the worker cores, especially with some of the higher-bit hash algorithms, which were implemented in sim/SecureXL relatively recently.  You could try excluding only this VPN peer and its tunnels from SecureXL acceleration with vpn accel off (peer IP) and see if that stabilizes things: sk151114: "fwaccel off" does not have an effect on disabling acceleration of VPN tunnels in R80.20 a...

(1)
the_rock
Legend
Legend

I know PFS was on, DH group 14, so something to consider as well, though we did change it to group 2, but kept PFS. Did not touch vpn accel, but definitely something to consider.

Thanks Tim.

Andy

0 Kudos
CaseyB
Advisor

Since you bring up the encryption settings, is the Palo limiting their proposed encryption settings? Check Point has issues with getting a ton of options. I had this issue a few months ago with a vendor migrating to Palo.

0 Kudos
the_rock
Legend
Legend

I honestly dont know mate, sorry. All I know is that this worked fine before, but Im not the guy to wonder why things worked in the past and they dont now...I was 5 years younger 5 years ago and now Im not lol

Anywho, we will have a call next week and see how they wish to proceed. Im thinking we may get TAC cases going for both CP and PAN support and see how far we get. Like anything, we have to eliminate certain things to get logical sense of whats causing this. For now, tunnel works fine, but we do not want to leave it as such with weak encryption settings. There is an app doctors and nurses use through this tunnel to access patient records, so if tunnel is broke, its literally a matter of life and death, considering this is used strictly in that hospital's emergency room, so we have to make sure 100% tunnel is up, no matter what.

Andy

0 Kudos
CaseyB
Advisor

I hear you; I specifically manage tunnels for a hospital. Lots of vendor interoperability going on here.

If it is the same thing, these are the messages I was finding.

  • Child SA exchange: Sending notification to peer: No proposal chosen MyMethods Phase2: AES-GCM-256 + HMAC-SHA2-256, No IPComp, No ESN, Group 19 (256-bit random ECP group)
  • Initial exchange: Sending notification to peer: Invalid Key Exchange payload
  • Child SA exchange: Exchange failed: timeout reached.

IKEv2 negotiation for Site-to-Site VPN tunnel with 3rd party peer fails if IKEv2 SA payload contains...

0 Kudos
the_rock
Legend
Legend

Hi @CaseyB ...thanks for that sk. I dont believe its relevant here sadly, as errors mostly showed no response from peer. Plus, they are on latest R81.20 version + jumbo 65.

Below is snippet of some errors.

Andy

 

Line  89966: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][tunnel] tnlmon_init: added gateway = 10.66.52.18 to GWArray

Line  89970: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][IKED] get_iked_handler_for_address: daemon index for 10.66.52.18 is 2. number of iked daemons: 3

Line  89971: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][IKED] is_handled_by_local_daemon: this is IKED 0, 10.66.52.18 is handled by IKED 2

Line  89987: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][tunnel] RIM_Sync: Updating route for peer <10.66.52.18, 1>

Line  89988: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][tunnel] tnlmon_db_get: Failed to get gateway = 10.66.52.18, type = 1 values

Line  89989: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][tunnel] RIM_Sync: failed to retrieve the attributes for dbkey = <10.66.52.18, 1>

Line  93806: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:41:05] GetEntryIsakmpObjectsHash: received ipaddr: 10.66.52.18 as key, found fwobj: NULL

Line  93807: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:41:05] InsertXIsakmpObjectsHash: inserting interface: 10.66.52.18 for object: g_htrs

Line  93888: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:41:05][tunnel] VIF_Diff_GetVPNIfTable: Inserting <36, 10.66.52.18>

0 Kudos
Duane_Toler
Advisor

Is 10.66.52.18 the Palo peer?  If so, I notice the debug says this is being handled by IKED instance 2.  Have you looked through that debug?  $FWDIR/log/iked2.elg
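To avoid reading the wrong daemon's log, the owning iked instance can be pulled out of debug lines like the ones quoted above. This is a minimal sketch that assumes the two message formats shown in that snippet; the function name is made up for illustration, and other versions may word the messages differently.

```python
import re

# Both patterns are based only on the two debug lines quoted in this thread.
PATTERNS = (
    re.compile(r"daemon index for (?P<peer>\d+\.\d+\.\d+\.\d+) is (?P<idx>\d+)"),
    re.compile(r"(?P<peer>\d+\.\d+\.\d+\.\d+) is handled by IKED (?P<idx>\d+)"),
)


def iked_owner(lines):
    """Map each peer IP to the index of the iked daemon that handles it."""
    owners = {}
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                owners[match.group("peer")] = int(match.group("idx"))
    return owners


sample = [
    "[iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][IKED] get_iked_handler_for_address: "
    "daemon index for 10.66.52.18 is 2. number of iked daemons: 3",
    "[iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][IKED] is_handled_by_local_daemon: "
    "this is IKED 0, 10.66.52.18 is handled by IKED 2",
]

for peer, idx in iked_owner(sample).items():
    print(f"{peer}: check $FWDIR/log/iked{idx}.elg")
```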

Does one side have "Prefer IKEv2 but support IKEv1" enabled?  I had a VPN on a Check Point gateway talking to a peer with an unknown device and I was seeing dual SAs being attempted.  One of the ends did not appreciate that and we had a flaky VPN, too.  We swapped the Check Point side to IKEv2 only and it helped.

 

0 Kudos
the_rock
Legend
Legend

That's right Duane, that IP is the PAN side of the tunnel. We tried all the combinations to make it work with strong enc. algorithms, and nothing helped until we switched to IKEv1 on both sides and lowered the enc. methods.

I think what @AmirArama said is definitely best idea. We will have call next week with everyone and see how they wish to proceed. I will update the thread once we have more info.

Thanks as always for the help.

Best,

Andy

0 Kudos
Duane_Toler
Advisor

Yeah I agree.  Debug it as hard and heavy as you can for whatever maintenance window you can get.  That's wild.  Good luck and godspeed to all.  Maybe you can get TAC Tier 3 or High-End team on it, too.

Another item I haven't seen mentioned here yet:  Do you have DPD / Permanent Tunnels enabled on this VPN?  Likewise, does PAN have DPD and or its own "Tunnel Monitoring" enabled?  In a completely grabbing-at-straws effort, I scoped out an off-the-cuff PAN article about their version of it which also doesn't require RFC-compliant DPD.  I have zero intel about this, tho.  My PAN-fu is nil.

0 Kudos
the_rock
Legend
Legend

Yep, DPD is enabled on both sides, though same issue was there even when it was not. Man, this is driving me bonkers, but I wont lose more sleep over it...lol

Andy

0 Kudos
Duane_Toler
Advisor

I dunno how relevant this PAN doc was, but it did say it had two kinds of Tunnel Monitoring: one was DPD and one was a ping of sorts and either of them would bring down the tunnel.  https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClFaCAK#:~:text=DPD%20is%2...

Again, I have zero background on this and I'm just parsing "vendor words" to mean something.  This might not even be useful, but in the off chance it is... then ok.

For DPD on the Check Point side, you'll need Permanent Tunnels enabled to do the same.  The Site-to-Site VPN admin guide has a whole section on DPD that might be worth a read, plus a ckp_regedit command that could be useful.

Good luck!

(1)
the_rock
Legend
Legend

Thanks man, will definitely keep that article in mind and send it to the guy that has PAN access. This is why I love the community, EVERYONE is so KIND and HELPFUL.

I truly appreciate it brother.

Have a nice weekend! Well, its my weekend already, since I have Friday off lol

Andy

0 Kudos
the_rock
Legend
Legend

Keep in mind that starting with R81 or R81.10, when you enable the permanent tunnel setting inside the community, an interoperable object belonging to that community automatically shows up as DPD in GuiDBedit. We double-checked and, sure enough, that's exactly what it showed, not tunnel_test, which is the default. It all makes sense, considering the customer is on R81.20; I can't recall which jumbo, though that's not relevant in this case.

Best,

Andy

0 Kudos
the_rock
Legend
Legend

Forgot to say, yes, we ended up using ikev1, but again, even with ikev2, it works, but its intermittent. I cant say 100% if its same with ikev1, but will verify with them.

Thanks again.

Andy

0 Kudos
AmirArama
Employee
Employee

Hi,
i guess it's not related to Route vs. Domain based VPN.

my suggestion: enable VPN debug on the GW, dig in, and look for the sent ID. that's the only way i know to analyze the negotiation.

the_rock
Legend
Legend

I think you are absolutely correct @AmirArama . Plan is to have guy on the other end open PAN case and they can check further. So far, we cant see any obvious issues on CP end.

Andy

0 Kudos
the_rock
Legend
Legend

I will mark your post as solution for now, as it definitely makes lots of sense, but will update the thread once we make it work with strong enc. methods to share how it was done.

Best,

Andy

0 Kudos
the_rock
Legend
Legend

Hey guys,

Just to update on this, we had remote with TAC, could not get tunnel working with higher enc. methods and ikev2, TAC guy checked the debugs, no luck there either, so we had to lower the enc methods to get the tunnel working. I guess at this point it remains a mystery...lol.

We even turned off vpn accel, disabled PFS, exact same issue...at one point tunnel showed as up on CP side, but nothing on PAN end. Their debugs did not show anything useful at all.

Andy

0 Kudos
Duane_Toler
Advisor

Is this VSX, by chance?  Or do you have a gateway with multiple external interfaces and the VPN peer routed out a non-default route interface?  I have a customer where I use this scenario and use the BestRoutingSenderIP registry setting to adjust IKE IDs per outgoing interface (for PSK VPNs; obviously, certificates wouldn't need this).

After upgrading to R81.20, we hit a bug in VSX where it was always using 0.0.0.0 as the outgoing IKE ID and even the ESP packets were being sent with source IP 0.0.0.0.  Weirdly, only one VPN tunnel was behaving this way, too.  This was fixed with a kernel parameter edit, which is documented in sk160672:

fw ctl set -f int ipsec_use_p1_src_ip=1 

I even saw this in 'fw monitor':

15:54:56.194367 IP 0.0.0.0 > x.x.x.19: ESP(spi=0x74fe85d6,seq=0x45), length 96

Yours might not be behaving this way, but maybe it's worth running this by TAC.

As Timothy was saying, I've also seen weird AES-256/AES-GCM-256 interoperability issues, but these now seem to have been resolved in earlier JHFs.  You said you're on JHF 65 now, so these issues seem statistically less likely to be the case... but perhaps you have a new one.

Good luck, again, and I hope the last round of debugs provided you and TAC with something useful.

0 Kudos
the_rock
Legend
Legend

Hey Duane,

Thanks for that mate, but its not vsx. I reviewed the debugs myself, since TAC guy said it would take some time and I think I got a pretty good idea what might be the issue.

Will update tomorrow once I verify.

Andy

0 Kudos
the_rock
Legend
Legend

Hey @Duane_Toler 

Here is one thing I find interesting from the debugs, again, this is just me, I could be way off here...

So, this was when the tunnel was set as PERMANENT (though not sure that even matters here), and tunnel mgmt was set to "per gateway"

remote peer is 10.66.52.18

it was using ikev1, 3des/sha1

from the debug:

Line 1036: [iked2 4131 4066877888]@cmhfw1[9 Jul 18:45:00][ikev2] peer: (ext addr: 10.66.52.18). peer_ip: 0.0.0.0 Using port 500
Line 1048: [iked2 4131 4066877888]@cmhfw1[9 Jul 18:45:00][ikev2] peer: (ext addr: 10.66.52.18). peer_ip: 0.0.0.0 Using port 500
Line 1063: [iked2 4131 4066877888]@cmhfw1[9 Jul 18:45:00][ikev2] peer: (ext addr: 10.66.52.18). peer_ip: 0.0.0.0 Using port 500
Line 1075: [iked2 4131 4066877888]@cmhfw1[9 Jul 18:45:00][ikev2] peer: (ext addr: 10.66.52.18). peer_ip: 0.0.0.0 Using port 500
Line 1128: [iked2 4131 4066877888]@cmhfw1[9 Jul 18:45:00][ikev2] peer: (ext addr: 10.66.52.18). peer_ip: 0.0.0.0 Using port 500
Line 1140: [iked2 4131 4066877888]@cmhfw1[9 Jul 18:45:00][ikev2] peer: (ext addr: 10.66.52.18). peer_ip: 0.0.0.0 Using port 500
Line 1150: [iked2 4131 4066877888]@cmhfw1[9 Jul 18:45:00][ikev2] SPI: 1e26c09c peer: 10.66.52.18
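For a longer debug file, lines like these can be tallied rather than eyeballed. The sketch below assumes the exact line format shown above and only counts the peer_ip field as it is logged; it makes no claim about what that field means, and it is not necessarily the IKE ID payload sent on the wire.

```python
import ipaddress
import re
from collections import Counter

# Format assumed from the iked2 lines quoted above.
PEER_LINE = re.compile(r"ext addr: (?P<ext>[0-9.]+)\)\. peer_ip: (?P<peer_ip>[0-9.]+)")


def unspecified_peer_ip_counts(lines):
    """Count, per external peer address, the lines whose peer_ip is 0.0.0.0."""
    counts = Counter()
    for line in lines:
        match = PEER_LINE.search(line)
        if match and ipaddress.ip_address(match.group("peer_ip")).is_unspecified:
            counts[match.group("ext")] += 1
    return counts


sample = [
    "[iked2 4131 4066877888]@cmhfw1[9 Jul 18:45:00][ikev2] peer: (ext addr: 10.66.52.18). peer_ip: 0.0.0.0 Using port 500",
    "[iked2 4131 4066877888]@cmhfw1[9 Jul 18:45:00][ikev2] peer: (ext addr: 10.66.52.18). peer_ip: 0.0.0.0 Using port 500",
    "[iked2 4131 4066877888]@cmhfw1[9 Jul 18:45:00][ikev2] SPI: 1e26c09c peer: 10.66.52.18",
]

for ext, count in unspecified_peer_ip_counts(sample).items():
    print(f"{ext}: {count} line(s) with peer_ip 0.0.0.0")
```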

 

The CP ID is presented as 0.0.0.0, which sort of makes sense, as it was set as a permanent tunnel, BUT here is the "kicker": once we disabled permanent tunnel, changed "per gateway" to "per subnet" and installed policy (we also changed to ikev2, aes256/sha256), it was the exact same issue. It was still presenting 0.0.0.0 to PAN, though the enc domains are ONLY subnets, no host.

To make it work, we changed back to ikev1, low enc methods and bam, all good again.

Here is what I believe...maybe to make this work, we should leave tunnel as non permanent, but change per gateway?

We have not tried that scenario...thoughts?

Andy

 

0 Kudos
AmirArama
Employee
Employee

I assume this is debug from the PAN?

I don't know what it means that PAN says the peer IP is 0.0.0.0. Does that mean they receive it from CP? How exactly? Ask them to show you the exact IKE packet in which we send 0.0.0.0.

on that only PAN support can respond.

There is a difference between a traffic selector of 0.0.0.0, which is negotiated for a universal tunnel such as with tunnel per gw or a route-based VPN tunnel, and an ID of 0.0.0.0, which is the ID the gw sends in the IKE negotiation. (The debug says peer IP, not ID and not TS... again, ask PAN.)

 

Now, in the CP debug, what ID do you see it send towards the PAN?

You can easily see it in the ike.xmll file for ike v2 as the id ("IPV4_ADDR" or other, and what value exactly).

If you really see it sent 0.0.0.0 as the ID (not traffic selector) can you check what is selected in the link selection tab of your CP GW?

0 Kudos
the_rock
Legend
Legend

Hey Amir,

Thanks a lot for responding, appreciated! So, to clarify, we do not have any PAN debugs; that's all from the CP end. Sadly, the guy on the other end who has access to the PAN firewall is not a PAN TAC person, and I'm not sure a case with them was ever opened. In all honesty, I'm sure a ticket is not even needed, because we know 100% that when IKEv2 is selected with high enc. methods, the tunnel never works, as they see CP sending ID 0.0.0.0. As soon as it's changed to IKEv1 and low enc. methods, it works fine and the ID sent from the CP fw does NOT show as 0.0.0.0 on the PAN end.

Now, why that is, I have no clue in the world. Thats why I was thinking, would it make sense to say try "per gateway" option and leave as non permanent? I feel like thats only thing we had not tried...

Andy

0 Kudos
AmirArama
Employee
Employee

Tunnel mgmt has nothing to do with the ID sent.

Is your CP GW configured as DAIP? what is configured in this GW object Link selection tab?

What is the gw version?

You also can change the the ike v2 ID to FQDN. Look for the sk.

0 Kudos
the_rock
Legend
Legend

Sorry, I thought I attached the screenshot, but maybe not. Link selection is configured as per below (the IP of the local CP is listed there). No, it's NOT DAIP.

Andy

Screenshot_1.png

0 Kudos
AmirArama
Employee
Employee

Ok.

So I would check the ikev2.xmll file on the gw while 'vpn debug trunc ALL=5' is on and you are using ikev2.

Look under the relevant peer under authentication. 

The type should be IPV4_ADDR usually (where at the screenshot it says KEY_ID)

Then at the data, see the actual ID sent, and whether it's indeed 0.0.0.0 or not.

If it is 0.0.0.0 you have something to show to TAC to investigate, or just consider change to FQDN  as mentioned.

If it's another valid value, then it means you don't send 0.0.0.0 as the ID, and I would ask the PAN side for proof of the IKE packet coming with 0.0
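If either side can produce the raw Identification payload, the ID type and value can be decoded directly instead of relying on how a GUI or log renders them. The sketch below follows the payload layout in RFC 7296 section 3.5 (one byte of ID type, three reserved bytes, then the identification data). It assumes you already have the decrypted payload body, since in IKEv2 the ID payloads travel inside the encrypted IKE_AUTH exchange; the function name and sample bytes are made up for illustration.

```python
import ipaddress

# ID type numbers as defined in RFC 7296, section 3.5.
ID_TYPES = {
    1: "ID_IPV4_ADDR",
    2: "ID_FQDN",
    3: "ID_RFC822_ADDR",
    5: "ID_IPV6_ADDR",
    9: "ID_DER_ASN1_DN",
    10: "ID_DER_ASN1_GN",
    11: "ID_KEY_ID",
}


def decode_id_payload(body: bytes):
    """Decode an IKEv2 Identification payload body (after the generic payload header)."""
    if len(body) < 4:
        raise ValueError("ID payload body must be at least 4 bytes")
    id_type, data = body[0], body[4:]  # bytes 1-3 are reserved
    name = ID_TYPES.get(id_type, f"UNKNOWN({id_type})")
    if id_type == 1 and len(data) == 4:
        value = str(ipaddress.IPv4Address(data))
    elif id_type in (2, 3):
        value = data.decode("ascii", errors="replace")
    else:
        value = data.hex()
    return name, value


# Sample bytes built by hand: the configured ID from this thread, then an all-zero ID.
print(decode_id_payload(bytes([1, 0, 0, 0, 169, 254, 0, 103])))
print(decode_id_payload(bytes([1, 0, 0, 0, 0, 0, 0, 0])))
```

A result of ID_IPV4_ADDR with 0.0.0.0 would back up the PAN side's claim; any other type or value points the investigation elsewhere.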

(1)
the_rock
Legend
Legend

Thanks a lot for that! Sadly, I don't see that file in the debugs TAC attached to the case, so I updated the case to have them verify whether the file is on the sftp, if possible.

I will review other debug files to see if I can get this info.

 

Andy
