the_rock
Legend
Legend

VPN route based query

Hey everyone,

Hope someone can give a good suggestion/idea about this. I was helping a hospital with a route-based VPN tunnel from their CP cluster to a Palo Alto firewall. The tunnel has been there since 2020, I think, but has always worked intermittently.

The PAN guy was saying that for some odd reason, when there is an issue, they see the ID on their end as 0.0.0.0, though it should be 169.254.0.103, which is what's configured. I'm not really sure why that would happen, or whether it's something related to CP or Palo Alto.

For context, all other VPN tunnels on the CP side work just fine; it's just this one. Any clue as to why the ID would show up differently on the peer side?

We did not have time to really debug, so we simply ended up lowering the encryption methods, and that brought up the tunnel.

Thanks as always for any ideas. Just for context, we tried an unnumbered VTI as well, and that sends ID 0.0.0.0, so that's expected, but a numbered VTI definitely should NOT.

Andy

1 Solution

Accepted Solutions
AmirArama
Employee
Employee

Hi,
I guess it's not related to route-based vs. domain-based VPN.

My suggestion: enable VPN debug on the GW, dig in, and look for the sent ID. That's the only way I know to analyze the negotiation.


19 Replies
Timothy_Hall
Legend Legend
Legend

When you say "ID" do you mean Proxy-ID (subnets/domains) or IKE Peer ID?

By default Palo Alto uses route-based VPNs and will propose a universal tunnel (0.0.0.0/0, 0.0.0.0/0 - one tunnel per gateway pair) in IKE Phase 2, although it can be configured to mimic a domain-based VPN and propose specific subnets, similar to "pair of subnets" on the Check Point side. Whether you are using an unnumbered or numbered VTI doesn't affect the Proxy-ID negotiation, at least to my knowledge.

Using IKEv1 I presume?  IKEv2 has had some rather nasty interoperability issues, the most prominent of which was tunnel narrowing.
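For readers less familiar with the term: in IKEv2 the responder may answer with a subset of the traffic selectors the initiator proposed, which is what "narrowing" refers to. The snippet below is only a rough sketch of that idea using Python's `ipaddress` module. It is not either vendor's implementation, and the subnets are made-up examples, not the ones from this tunnel.

```python
import ipaddress


def narrow(proposed, allowed):
    """Return the overlap of two CIDR networks, or None if they are disjoint.

    For CIDR blocks the overlap is always the smaller network, because two
    blocks either nest inside each other or do not intersect at all.
    """
    p = ipaddress.ip_network(proposed)
    a = ipaddress.ip_network(allowed)
    if p.subnet_of(a):
        return p
    if a.subnet_of(p):
        return a
    return None


# Hypothetical case: one peer proposes the universal selector while the
# other side only allows a specific subnet.
print(narrow("0.0.0.0/0", "10.1.0.0/24"))

# Hypothetical case: two specific subnets that do not overlap at all.
print(narrow("10.1.0.0/24", "10.2.0.0/24"))
```

If one side proposes 0.0.0.0/0 and the other side narrows it to a specific subnet, both peers have to cope with the narrowed result, and that is where interoperability tends to get tested.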

Another line of inquiry would be whether the direction in which the tunnel is initiated affects stability. For example, the tunnel may be stable when the Palo initiates it, but not when the Check Point does.

Also check the obvious things: make sure the Phase 1 and Phase 2 timers match, and that the Palo is not configured with a data lifesize, idle timer, or anything else that could bring down the tunnel prematurely.
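One way to make the "do both sides match" check less error-prone is to write each side's settings down as key/value pairs and diff them. This is a minimal sketch; the setting names and every value below are hypothetical placeholders, not the actual configuration of this tunnel and not a statement about either vendor's defaults.

```python
def diff_settings(local, peer):
    """Return {setting: (local_value, peer_value)} for every setting that differs."""
    keys = sorted(set(local) | set(peer))
    return {
        key: (local.get(key), peer.get(key))
        for key in keys
        if local.get(key) != peer.get(key)
    }


# Hypothetical values, typed in by hand from each side's configuration.
check_point = {
    "ike_version": 1,
    "phase1_lifetime_s": 86400,
    "phase2_lifetime_s": 3600,
    "pfs_group": 14,
    "data_lifesize_kb": None,
}
palo_alto = {
    "ike_version": 1,
    "phase1_lifetime_s": 28800,
    "phase2_lifetime_s": 3600,
    "pfs_group": 14,
    "data_lifesize_kb": None,
}

for setting, (cp_value, pan_value) in diff_settings(check_point, palo_alto).items():
    print(f"{setting}: Check Point={cp_value} Palo Alto={pan_value}")
```

A setting that exists on only one side shows up with `None` for the other, which also catches things like an idle timer or data lifesize configured on just one peer.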

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
the_rock
Legend
Legend

Hey Tim,

Thanks a lot for your insight, as always. I meant IKE peer ID, not Proxy-ID; I should have clarified.

Funny thing is, I asked the guy on their end, who uses the Palo Alto firewall, about the route-based tunnel, and he said they do have an actual enc domain configured, which consists of 4 subnets (a combo of /24, /27 and /28 nets) and a single host (/32).

As I mentioned, I find it so odd that on the CP side this is the ONLY tunnel, out of probably 10 or so, that they constantly have issues with; all the other ones work fine. Ironically enough, the guy told me that on the other end they also only have issues with this tunnel to CP lol

Anywho, yes, we confirmed Phase 1 and Phase 2 settings many times, and any time you lower the enc. settings all works fine, no issues, but if you try using, say, AES-256 with SHA-256/384/512, nothing works.

I'm really not sure how to approach this at this point... I will see if I can have a remote session with the guy who has access to the Palo Alto firewall, as I'm somewhat familiar with those; it's been some time since I worked on them, but I do remember the basics. I wish their VPN config was as easy as Fortinet, which I find works 100% of the time.

Oh well, not everything is simple, I guess :-)

Andy

Timothy_Hall
Legend Legend
Legend

Lowering the strength of the encryption algorithms shouldn't make a difference, but one point of pain in the past I've seen has been PFS which can be spotty as far as interoperability.  Maybe try leaving PFS off and increasing the other algorithm strengths for Phase 1 and Phase 2 and see what happens? 

The only other thing I can think of is that there are some differences in the VPN code implementations in sim/fastpath vs. on the worker cores, especially with some of the higher-bit hash algorithms, which were implemented in sim/SecureXL relatively recently. You could try excluding only this VPN peer and its tunnels from SecureXL acceleration with vpn accel off (peer IP) and see if that stabilizes things: sk151114: "fwaccel off" does not have an effect on disabling acceleration of VPN tunnels in R80.20 a...

the_rock
Legend
Legend

I know PFS was on, DH group 14, so something to consider as well, though we did change it to group 2, but kept PFS. Did not touch vpn accel, but definitely something to consider.

Thanks Tim.

Andy

CaseyB
Advisor

Since you bring up the encryption settings, is the Palo limiting their proposed encryption settings? Check Point has issues with getting a ton of options. I had this issue a few months ago with a vendor migrating to Palo.

the_rock
Legend
Legend

I honestly don't know mate, sorry. All I know is that this worked fine before, but I'm not the guy to wonder why things worked in the past and don't now... I was 5 years younger 5 years ago and now I'm not lol

Anywho, we will have a call next week and see how they wish to proceed. I'm thinking we may get TAC cases going with both CP and PAN support and see how far we get. Like anything, we have to eliminate certain things to get a logical sense of what's causing this. For now the tunnel works fine, but we do not want to leave it as is with weak encryption settings. There is an app doctors and nurses use through this tunnel to access patient records, so if the tunnel is broken, it's literally a matter of life and death, considering this is used strictly in that hospital's emergency room. We have to make sure the tunnel is 100% up, no matter what.

Andy

CaseyB
Advisor

I hear you; I specifically manage tunnels for a hospital. Lots of vendor interoperability going on here.

If it is the same thing, these are the messages I was finding.

  • Child SA exchange: Sending notification to peer: No proposal chosen MyMethods Phase2: AES-GCM-256 + HMAC-SHA2-256, No IPComp, No ESN, Group 19 (256-bit random ECP group)
  • Initial exchange: Sending notification to peer: Invalid Key Exchange payload
  • Child SA exchange: Exchange failed: timeout reached.

IKEv2 negotiation for Site-to-Site VPN tunnel with 3rd party peer fails if IKEv2 SA payload contains...
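If you want to check quickly whether a debug or log export contains the same three messages, a simple substring scan is enough. The sketch below assumes the log lines are already loaded as plain text; the message strings are copied from the list above, while the labels and the function name are made up for illustration.

```python
from collections import Counter

# Substrings taken from the messages quoted above; the labels are arbitrary.
SIGNATURES = {
    "no_proposal_chosen": "No proposal chosen",
    "invalid_ke_payload": "Invalid Key Exchange payload",
    "exchange_timeout": "Exchange failed: timeout reached",
}


def count_failures(lines):
    """Count how many lines contain each known failure message."""
    counts = Counter()
    for line in lines:
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts


sample = [
    "Child SA exchange: Sending notification to peer: No proposal chosen",
    "Initial exchange: Sending notification to peer: Invalid Key Exchange payload",
    "Child SA exchange: Exchange failed: timeout reached.",
    "some unrelated line",
]

for label, hits in count_failures(sample).items():
    print(label, hits)
```

If none of the three labels show up in the counts, the failure is probably a different one from what this SK describes.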

the_rock
Legend
Legend

Hi @CaseyB... thanks for that SK. Sadly, I don't believe it's relevant here, as the errors mostly showed no response from peer. Plus, they are on the latest R81.20 version + jumbo 65.

Below is a snippet of some of the errors.

Andy

 

Line  89966: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][tunnel] tnlmon_init: added gateway = 10.66.52.18 to GWArray

Line  89970: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][IKED] get_iked_handler_for_address: daemon index for 10.66.52.18 is 2. number of iked daemons: 3

Line  89971: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][IKED] is_handled_by_local_daemon: this is IKED 0, 10.66.52.18 is handled by IKED 2

Line  89987: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][tunnel] RIM_Sync: Updating route for peer <10.66.52.18, 1>

Line  89988: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][tunnel] tnlmon_db_get: Failed to get gateway = 10.66.52.18, type = 1 values

Line  89989: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][tunnel] RIM_Sync: failed to retrieve the attributes for dbkey = <10.66.52.18, 1>

Line  93806: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:41:05] GetEntryIsakmpObjectsHash: received ipaddr: 10.66.52.18 as key, found fwobj: NULL

Line  93807: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:41:05] InsertXIsakmpObjectsHash: inserting interface: 10.66.52.18 for object: g_htrs

Line  93888: [iked0 4129 4067185088]@cmhfw1[11 Jun 20:41:05][tunnel] VIF_Diff_GetVPNIfTable: Inserting <36, 10.66.52.18>

Duane_Toler
Advisor

Is 10.66.52.18 the Palo peer?  If so, I notice the debug says this is being handled by IKED instance 2.  Have you looked through that debug?  $FWDIR/log/iked2.elg
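To find the right file without reading the debug by eye, the "is handled by IKED N" line can be parsed directly. This is a small sketch based only on the log line format quoted in this thread and the iked2.elg file name mentioned above; the function names are made up, and other versions may format the line differently.

```python
import re

# Pattern for the "is_handled_by_local_daemon" debug line quoted in this thread.
HANDLED_BY = re.compile(
    r"is_handled_by_local_daemon: this is IKED (\d+), "
    r"(\d{1,3}(?:\.\d{1,3}){3}) is handled by IKED (\d+)"
)


def iked_owner(lines, peer_ip):
    """Return the index of the IKED daemon that handles peer_ip, or None."""
    for line in lines:
        match = HANDLED_BY.search(line)
        if match and match.group(2) == peer_ip:
            return int(match.group(3))
    return None


def elg_path(index):
    """Build the per-daemon debug file name, following the iked2.elg example."""
    return f"$FWDIR/log/iked{index}.elg"


sample = [
    "[iked0 4129 4067185088]@cmhfw1[11 Jun 20:31:59][IKED] "
    "is_handled_by_local_daemon: this is IKED 0, 10.66.52.18 is handled by IKED 2",
]

owner = iked_owner(sample, "10.66.52.18")
print(owner, elg_path(owner))
```

The debug posted above came from iked0, so the actual negotiation with this peer would be in the file for the daemon index this returns.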

Does one side have "Prefer IKEv2 but support IKEv1" enabled?  I had a VPN on a Check Point gateway talking to a peer with an unknown device and I was seeing dual SAs being attempted.  One of the ends did not appreciate that and we had a flaky VPN, too.  We swapped the Check Point side to IKEv2 only and it helped.

 

the_rock
Legend
Legend

That's right Duane, that IP is the PAN side of the tunnel. We tried all the combinations to make it work with strong enc. algorithms; nothing helped until we switched to IKEv1 on both sides and lowered the enc. methods.

I think what @AmirArama said is definitely the best idea. We will have a call next week with everyone and see how they wish to proceed. I will update the thread once we have more info.

Thanks as always for the help.

Best,

Andy

Duane_Toler
Advisor

Yeah I agree.  Debug it as hard and heavy as you can for whatever maintenance window you can get.  That's wild.  Good luck and godspeed to all.  Maybe you can get TAC Tier 3 or High-End team on it, too.

Another item I haven't seen mentioned here yet: Do you have DPD / Permanent Tunnels enabled on this VPN? Likewise, does PAN have DPD and/or its own "Tunnel Monitoring" enabled? In a completely grabbing-at-straws effort, I scoped out an off-the-cuff PAN article about their version of it, which also doesn't require RFC-compliant DPD. I have zero intel about this, tho. My PAN-fu is nil.

the_rock
Legend
Legend

Yep, DPD is enabled on both sides, though the same issue was there even when it was not. Man, this is driving me bonkers, but I won't lose more sleep over it... lol

Andy

Duane_Toler
Advisor

I dunno how relevant this PAN doc was, but it did say it had two kinds of Tunnel Monitoring: one was DPD and one was a ping of sorts and either of them would bring down the tunnel.  https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClFaCAK#:~:text=DPD%20is%2...

Again, I have zero background on this and I'm just parsing "vendor words" to mean something.  This might not even be useful, but in the off chance it is... then ok.

For DPD on the Check Point side, you'll need Permanent Tunnels enabled to do the same.  The Site-to-Site VPN admin guide has a whole section on DPD that might be worth a read, plus a ckp_regedit command that could be useful.

Good luck!

the_rock
Legend
Legend

Thanks man, will definitely keep that article in mind and send it to the guy that has PAN access. This is why I love the community, EVERYONE is so KIND and HELPFUL.

I truly appreciate it brother.

Have a nice weekend! Well, its my weekend already, since I have Friday off lol

Andy

the_rock
Legend
Legend

Keep in mind that starting in R81 or R81.10, when you enable the permanent tunnel setting inside the community, an interoperable object belonging to that community automatically shows up as DPD in GuiDBedit. We double-checked, and sure enough, that's exactly what it showed, not tunnel_test, which is the default. It all makes sense, considering the customer is on R81.20; I can't recall what jumbo, though that's not relevant in this case.

Best,

Andy

the_rock
Legend
Legend

Forgot to say, yes, we ended up using IKEv1. Even with IKEv2 the tunnel works, but it's intermittent. I can't say 100% whether it's the same with IKEv1, but will verify with them.

Thanks again.

Andy

AmirArama
Employee
Employee

Hi,
I guess it's not related to route-based vs. domain-based VPN.

My suggestion: enable VPN debug on the GW, dig in, and look for the sent ID. That's the only way I know to analyze the negotiation.

the_rock
Legend
Legend

I think you are absolutely correct @AmirArama. The plan is to have the guy on the other end open a PAN case so they can check further. So far, we can't see any obvious issues on the CP end.

Andy

the_rock
Legend
Legend

I will mark your post as solution for now, as it definitely makes lots of sense, but will update the thread once we make it work with strong enc. methods to share how it was done.

Best,

Andy

