Solved: Re: VPN redundancy issue from onprem cluster to Ha...

the_rock · ‎2025-08-19

Hey guys,

We already have a case open with the TAC escalation team on this, but I figured I would also post it here to see if anyone has seen this sort of issue before. To make a long story short, here is the breakdown:

-on-prem cluster, 6200 appliances, R81.20 jumbo 99; mgmt is Smart-1 Cloud, R82

-2 POPs involved, let's call them POP 2 and POP 3

-if users randomly get connected to POP 3, there are no issues at all, but if they land on POP 2, nothing works.

A senior P81 guy checked everything and verified there are no issues on their end. They checked the routing and the logs, and it all checked out fine.

The drop on the CP side shows that, according to policy, the packet should not have been decrypted. Funny enough, before that error appeared, my colleague and I came up with the idea, for redundancy, of setting 2 interoperable objects as center gateways and the on-prem cluster as the satellite. That worked for maybe a week, but then the issue happened.

TAC said it was fine to have an empty group as the encryption domain for all 3 entities (if you will), but no joy. Below is one interesting thing that came up when they ran a debug, which is super odd, since they even verified that the VTIs are configured properly.

@;166106316.42204434;14Aug2025 14:37:41.021523;[vs_0];[tid_1];[fw4_1];get_peer_vpn_if_mapping_cpip: no vpn interface for peer x.x.x.x;

@;166106316.42204435;14Aug2025 14:37:41.021525;[vs_0];[tid_1];[fw4_1];dynamic_vpn_ip: dir 0, 10.255.0.34:1 -> 192.168.32.50:0 IPP 1 Chain: 0x7f77a4531bc8, IP: 192.168.32.50 Decr_Peer: x.x.x.x Position: 18 ;

@;166106316.42204436;14Aug2025 14:37:41.021526;[vs_0];[tid_1];[fw4_1];connection_should_be_tagged: connection should have been tagged.;

@;166106316.42204437;14Aug2025 14:37:41.021527;[vs_0];[tid_1];[fw4_1];fwconn_get_app_opaque: connection not found;
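
For anyone who wants to sift a larger capture for the same pattern, here is a minimal Python sketch that splits debug lines like the four above into fields and lists the peers reported with "no vpn interface". The field layout is only inferred from these four lines, and the names `DebugLine`, `parse_debug_line` and `peers_without_vpn_interface` are made up for illustration; they are not part of any Check Point tooling.

```python
import re
from typing import Iterable, NamedTuple, Optional, Set


class DebugLine(NamedTuple):
    timestamp: str   # e.g. "14Aug2025 14:37:41.021523"
    instance: str    # firewall instance tag, e.g. "fw4_1"
    function: str    # kernel function that printed the message
    message: str     # remainder of the line


def parse_debug_line(line: str) -> Optional[DebugLine]:
    """Split one ';'-separated debug line; return None if it does not fit.

    Assumed layout (inferred from the sample lines, not from documentation):
    @;counter;timestamp;[vs];[tid];[instance];function: message;
    """
    line = line.strip()
    if not line.startswith("@;"):
        return None
    parts = line.split(";", 6)
    if len(parts) < 7:
        return None
    _, _, timestamp, _, _, instance, rest = parts
    rest = rest.strip().rstrip(";").strip()
    function, sep, message = rest.partition(":")
    if not sep:
        return None
    return DebugLine(timestamp, instance.strip("[]"), function.strip(), message.strip())


def peers_without_vpn_interface(lines: Iterable[str]) -> Set[str]:
    """Collect peers named in 'no vpn interface for peer <addr>' messages."""
    peers = set()
    for raw in lines:
        parsed = parse_debug_line(raw)
        if parsed is None:
            continue
        match = re.search(r"no vpn interface for peer (\S+)", parsed.message)
        if match:
            peers.add(match.group(1))
    return peers
```

Feeding it the full debug file line by line should make it easier to see whether the message only ever shows up for the POP 2 peer.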

For what it's worth, all tunnels show as permanent and UP, and resetting the tunnel sadly does not help.

I would be grateful for any insights/suggestions.

Best,
Andy

PhoneBoy · ‎2025-08-19

When you're doing VTIs, empty encryption domains are normal.
That said, it's hard to say which "Check Point" end may be the issue here.

the_rock · ‎2025-08-19

I don't know, very strange problem...

Best,
Andy

Chris_Atkinson · ‎2025-08-19

So if this is VTIs, how is all the supporting routing configured? Is there a diagram explaining it?

CCSM R77/R80/ELITE

the_rock · ‎2025-08-19

Hey Chris,

My colleague might have a diagram, but the odd thing is that although the VTIs are configured the exact same way for both tunnels, one never seems to work. Let's see what the escalation team says. I wish SD-WAN was supported with SASE; this would never be an issue.

Best,
Andy

AmirArama · ‎2025-08-20

Hi

How many ISPs do you have on the on-prem cluster?

If you have at least two, you can set up one tunnel from each ISP.

Configure redundant tunnels on the P81 side: for each tunnel, configure a different on-prem interface public IP.

On the CP side, configure two interoperable devices as center gateways in the community.

Two VTIs, one per interoperable device.

Set static routes so that each SASE GW VPN peer IP goes out via the corresponding ISP (as P81 expects).

Note that if SASE is able to initiate the tunnel to on-prem (inbound is working), it will work. But if it cannot, and only your on-prem side can initiate the tunnel to SASE, then the ID the GW sends during negotiation must exactly match what is configured under Remote ID in the SASE tunnel configuration, or SASE will reject it. (This can be seen in the VPN debug. Note that Gaia currently can't send a different ID per interface.)

Make sure both tunnels to both SASE GWs are UP (vpn tu tlist).

Configure BGP over each VTI, and make sure you advertise your relevant networks to the SASE peers and accept routes from them.

Verify BGP is established: show bgp peers

Create a routemap, or inbound route filters plus route redistribution, to accept/advertise routes.

Verify the routes are learned and advertised.

Let me know at which stage you hit an issue.
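
The one-tunnel-per-ISP layout above can be sanity-checked on paper before touching the gateways. Below is a rough Python sketch, assuming a two-tunnel design where each tunnel should have its own SASE peer IP, on-prem public IP, VTI and ISP next hop. The `Tunnel` record, the `check_design` helper, the VTI names and all addresses (documentation ranges) are hypothetical placeholders, not values from this thread or from any Check Point or P81 API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tunnel:
    sase_peer_ip: str      # public IP of the SASE gateway (interoperable device)
    onprem_public_ip: str  # on-prem ISP interface IP configured on the SASE side
    vti_name: str          # VTI bound to this peer
    isp_next_hop: str      # next hop of the static route towards the peer


def check_design(tunnels: List[Tunnel]) -> List[str]:
    """Return a list of problems; an empty list means no overlap was found.

    Assumption: with one tunnel per ISP, none of these values should be
    shared between tunnels.
    """
    problems = []
    for field in ("sase_peer_ip", "onprem_public_ip", "vti_name", "isp_next_hop"):
        values = [getattr(t, field) for t in tunnels]
        duplicates = sorted({v for v in values if values.count(v) > 1})
        for value in duplicates:
            problems.append(f"{field} {value} is used by more than one tunnel")
    return problems


if __name__ == "__main__":
    # Hypothetical example values only.
    design = [
        Tunnel("203.0.113.10", "198.51.100.1", "vpnt1", "198.51.100.254"),
        Tunnel("203.0.113.20", "192.0.2.1", "vpnt2", "192.0.2.254"),
    ]
    for problem in check_design(design) or ["no overlaps found"]:
        print(problem)
```

This only checks the plan for overlaps; tunnel state and BGP still have to be verified on the gateways themselves with the commands mentioned above.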

the_rock · ‎2025-08-20

Hey Amir,

Thanks for all that, appreciated! Yes, we verified all the points, so now I'm waiting for the escalation engineer from TAC to provide next steps based on the debugs.

But again, here is what baffles me personally: why would it show this when we verified (with both P81 and CP TAC) that the VTIs are set correctly?

@;166106316.42204434;14Aug2025 14:37:41.021523;[vs_0];[tid_1];[fw4_1];get_peer_vpn_if_mapping_cpip: no vpn interface for peer x.x.x.x;

@;166106316.42204435;14Aug2025 14:37:41.021525;[vs_0];[tid_1];[fw4_1];dynamic_vpn_ip: dir 0, 10.255.0.34:1 -> 192.168.32.50:0 IPP 1 Chain: 0x7f77a4531bc8, IP: 192.168.32.50 Decr_Peer: x.x.x.x Position: 18 ;

@;166106316.42204436;14Aug2025 14:37:41.021526;[vs_0];[tid_1];[fw4_1];connection_should_be_tagged: connection should have been tagged.;

@;166106316.42204437;14Aug2025 14:37:41.021527;[vs_0];[tid_1];[fw4_1];fwconn_get_app_opaque: connection not found;

Best,
Andy

AmirArama · ‎2025-08-20

I don't know.

I can't comment on a few debug prints without the full kernel debug, or without the whole context: configurations, status, and other outputs.

the_rock · ‎2025-08-20

Fair enough. I sent you the case number via DM, if you are able to check, no rush.

Best,
Andy

the_rock · ‎2025-08-20

I had a good conversation with one of the customer's SEs, and he told me the escalation team is 100% sure, based on everything they have discovered, that this is an on-prem cluster issue and not the SASE side. They engaged R&D on it, so once they get back to me, I will update on how this all gets sorted out. It might take some time, but I'm confident it will get solved.

Best,
Andy

the_rock · ‎2025-08-25

Quick update: the escalation engineer asked me if I could recreate the affected VTI and try that. I was more than willing to, but even after I removed the interface from the route it was part of, it kept complaining, so I simply ended up disabling/re-enabling the VTI on both fws and bam, ping started working right away on the affected POP.

I asked the client to test, but it's amazing news; let's see what they say.

24 hours later, looks good. It's still a bit confusing why this worked, which makes me wonder if it could be a bug or something else, but as long as it works, good enough for me.

Best,
Andy

VPN redundancy issue from onprem cluster to Harmony sase