the_rock
Legend

VPN redundancy issue from on-prem cluster to Harmony SASE

Hey guys,

We already have a case open with the TAC escalation team on this, but I figured I would also post it here to see if anyone has seen this sort of issue before. To make a long story short, here is the breakdown:

-On-prem cluster, 6200 appliances, R81.20 jumbo 99, mgmt is Smart-1 Cloud, R82

-2 POPs involved, let's call them POP 2 and POP 3

-If users randomly get connected to POP 3, no issues at all, but if they land on POP 2, nothing works.

A senior P81 guy checked everything and verified there are no issues on their end. They checked the routing and the logs, and it all checked out fine.

The drop on the CP side shows that, according to policy, the packet should not have been decrypted. Funny enough, before that error happened, my colleague and I came up with an idea for redundancy: set 2 interoperable objects as center gateways and the on-prem cluster as satellite. That worked for maybe a week, but then the issue happened.

TAC said it was fine to have an empty group as the enc. domain for all 3 entities (if you will), but no joy. Below is one interesting thing that came up when they ran a debug, which is super odd, since they even verified the VTIs are configured properly.

@;166106316.42204434;14Aug2025 14:37:41.021523;[vs_0];[tid_1];[fw4_1];get_peer_vpn_if_mapping_cpip: no vpn interface for peer x.x.x.x;

@;166106316.42204435;14Aug2025 14:37:41.021525;[vs_0];[tid_1];[fw4_1];dynamic_vpn_ip: dir 0, 10.255.0.34:1 -> 192.168.32.50:0 IPP 1 Chain: 0x7f77a4531bc8, IP: 192.168.32.50 Decr_Peer: x.x.x.x Position: 18 ;

@;166106316.42204436;14Aug2025 14:37:41.021526;[vs_0];[tid_1];[fw4_1];connection_should_be_tagged: connection should have been tagged.;

@;166106316.42204437;14Aug2025 14:37:41.021527;[vs_0];[tid_1];[fw4_1];fwconn_get_app_opaque: connection not found;
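For anyone who wants to sift a larger debug file for the same pattern, here is a minimal Python sketch. It assumes every line follows the semicolon-separated layout of the four lines above (marker, counter, timestamp, three bracketed tags, then `function: message`). The names `parse_debug_line` and `peers_without_vpn_interface` are made up for this example and are not part of any Check Point tool.

```python
from typing import List, NamedTuple, Optional


class DebugLine(NamedTuple):
    timestamp: str
    instance: str   # e.g. "fw4_1"
    function: str   # e.g. "get_peer_vpn_if_mapping_cpip"
    message: str


def parse_debug_line(line: str) -> Optional[DebugLine]:
    """Split one '@;...;' debug line into fields (layout assumed from the post)."""
    line = line.strip()
    if not line.startswith("@;"):
        return None
    parts = line.split(";")
    if len(parts) < 7:
        return None
    timestamp = parts[2]
    instance = parts[5].strip("[]")
    # Everything after the bracketed tags is "function: message"
    body = ";".join(parts[6:]).strip().rstrip(";").strip()
    function, _, message = body.partition(":")
    return DebugLine(timestamp, instance, function.strip(), message.strip())


def peers_without_vpn_interface(lines: List[str]) -> List[str]:
    """Return the peer field of every 'no vpn interface for peer' line."""
    peers = []
    for raw in lines:
        entry = parse_debug_line(raw)
        if entry and entry.message.startswith("no vpn interface for peer"):
            peers.append(entry.message.rsplit(" ", 1)[-1])
    return peers


# The four lines from the post, peer address masked as in the original
SAMPLE = [
    "@;166106316.42204434;14Aug2025 14:37:41.021523;[vs_0];[tid_1];[fw4_1];get_peer_vpn_if_mapping_cpip: no vpn interface for peer x.x.x.x;",
    "@;166106316.42204435;14Aug2025 14:37:41.021525;[vs_0];[tid_1];[fw4_1];dynamic_vpn_ip: dir 0, 10.255.0.34:1 -> 192.168.32.50:0 IPP 1 Chain: 0x7f77a4531bc8, IP: 192.168.32.50 Decr_Peer: x.x.x.x Position: 18 ;",
    "@;166106316.42204436;14Aug2025 14:37:41.021526;[vs_0];[tid_1];[fw4_1];connection_should_be_tagged: connection should have been tagged.;",
    "@;166106316.42204437;14Aug2025 14:37:41.021527;[vs_0];[tid_1];[fw4_1];fwconn_get_app_opaque: connection not found;",
]

print(peers_without_vpn_interface(SAMPLE))  # ['x.x.x.x']
```

This only groups the messages by peer; it says nothing about why the gateway could not map the peer to a VPN interface.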

 

For what it's worth, all tunnels show as permanent and UP, and resetting the tunnel sadly does not help.

 

I am grateful for any insights/suggestions.

 

Andy


Accepted Solutions
the_rock
Legend

I had a good conversation with one of the customer's SEs, and he told me the escalation team is 100% sure, based on everything they have discovered, that this is an on-prem cluster issue and not the SASE side. They engaged R&D on it, so once they get back to me, I will update on how this all gets sorted out. It might take some time, but I'm confident it will get solved.

Andy

the_rock
Legend

Quick update: the esc. engineer asked me if I could recreate the affected VTI and try that. I was more than willing to, but even after I deleted the interface from the route it was part of, it kept complaining, so I simply ended up disabling/re-enabling the VTI on both fws and bam, ping started working right away on the affected POP.

I asked the client to test, but it's amazing news. Let's see what they say.

24 hours later, looks good. It's still a bit confusing why this worked. It makes me wonder if it could be a bug or something else, but as long as it works, good enough for me.

 

Andy

10 Replies
PhoneBoy
Admin

When you're doing VTIs, empty encryption domains are normal.
That said, it's hard to say which "Check Point" end may be the issue here.

the_rock
Legend

I don't know, very strange problem...

Andy

Chris_Atkinson
Employee

So if this is VTIs, how is all the supporting routing configured? Is there a diagram explaining it?

CCSM R77/R80/ELITE
the_rock
Legend

Hey Chris,

My colleague might have a diagram, but the odd thing is that though the VTIs are configured the exact same way for both tunnels, one never seems to work. Let's see what the escalation team says. I wish SD-WAN was supported with SASE; this would never be an issue.

Andy

AmirArama
Employee

Hi

How many ISPs do you have on the on-prem cluster?

If you have at least two, you can set one tunnel from each ISP.

Configure redundant tunnels on the P81 side: for each tunnel, configure a different on-prem interface public IP.

On the CP side, configure two interoperable devices as center in the community.

Two VTIs, one per interoperable device.

Set static routes such that each SASE GW VPN peer IP goes out via the corresponding ISP (as P81 expects).

Notice that if SASE is able to initiate the tunnel to on-prem (inbound is working), it will work. But if not, and only your on-prem can initiate the tunnel to SASE, then the ID the GW sends during negotiation must exactly match what is configured in the SASE tunnel configuration under remote ID, or SASE will reject it. (This can be seen in vpn debug. Note that Gaia currently can't send a different ID per interface.)

Make sure both tunnels to both SASE GWs are UP (vpn tu tlist).

Configure BGP for each VTI and make sure you advertise your relevant networks to the SASE peers and accept routes from them.

Verify BGP is established: show bgp peers

Create a routemap or inbound route filters + route redistribution to accept/advertise routes.

Verify routes are learned and advertised.

Let me know at which stage you hit an issue.
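As a rough illustration of the pairing described above (one tunnel per ISP, one VTI per interoperable device, peer route leaving via the matching ISP), here is a small Python sketch that sanity-checks a written-down plan before it is configured. It does not talk to the gateway or to SASE. The `TunnelPlan` fields, the `vpnt1`/`vpnt2` interface names, the ISP labels and the documentation-range IPs are all placeholders invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TunnelPlan:
    pop: str              # SASE POP label (placeholder)
    peer_ip: str          # public IP of the SASE gateway (placeholder)
    local_public_ip: str  # on-prem public IP used as tunnel source
    isp_of_local_ip: str  # ISP that owns local_public_ip
    vti: str              # VTI paired with this peer (hypothetical name)
    egress_isp: str       # ISP the static route to peer_ip leaves through


def check_plan(tunnels: List[TunnelPlan]) -> List[str]:
    """Return a list of human-readable problems; empty means the plan is consistent."""
    problems = []
    # Each tunnel needs its own peer, its own local public IP and its own VTI
    for field in ("peer_ip", "local_public_ip", "vti"):
        counts = Counter(getattr(t, field) for t in tunnels)
        for value, n in counts.items():
            if n > 1:
                problems.append(f"{field} {value} is used by {n} tunnels")
    # The route to each peer should leave via the ISP that owns the tunnel source IP
    for t in tunnels:
        if t.egress_isp != t.isp_of_local_ip:
            problems.append(
                f"{t.pop}: route to {t.peer_ip} leaves via {t.egress_isp}, "
                f"but source {t.local_public_ip} belongs to {t.isp_of_local_ip}"
            )
    return problems


plan = [
    TunnelPlan("POP 2", "203.0.113.10", "198.51.100.1", "ISP-A", "vpnt1", "ISP-A"),
    TunnelPlan("POP 3", "203.0.113.20", "192.0.2.1", "ISP-B", "vpnt2", "ISP-B"),
]
print(check_plan(plan))  # []
```

A plan that passes this check can still fail on the wire; the tunnel state, the IKE ID and the BGP session still have to be verified on the gateway as listed above.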

the_rock
Legend

Hey Amir,

Thanks for all that, appreciated! Yes, we verified all the points, so now I'm waiting for the escalation guy from TAC to provide next steps based on the debugs.

But again, here is what baffles me personally: why would it show this when we verified (with both P81 and CP TAC) that the VTIs are set correctly?

@;166106316.42204434;14Aug2025 14:37:41.021523;[vs_0];[tid_1];[fw4_1];get_peer_vpn_if_mapping_cpip: no vpn interface for peer x.x.x.x;

@;166106316.42204435;14Aug2025 14:37:41.021525;[vs_0];[tid_1];[fw4_1];dynamic_vpn_ip: dir 0, 10.255.0.34:1 -> 192.168.32.50:0 IPP 1 Chain: 0x7f77a4531bc8, IP: 192.168.32.50 Decr_Peer: x.x.x.x Position: 18 ;

@;166106316.42204436;14Aug2025 14:37:41.021526;[vs_0];[tid_1];[fw4_1];connection_should_be_tagged: connection should have been tagged.;

@;166106316.42204437;14Aug2025 14:37:41.021527;[vs_0];[tid_1];[fw4_1];fwconn_get_app_opaque: connection not found;

AmirArama
Employee

I don't know.

I can't comment on a few debug prints without the full kernel debug, as well as the whole context, configurations, status, and other outputs.

the_rock
Legend

Fair enough. I sent you the case number via DM, if you are able to check, no rush.

Andy

