Re: VPN between 2 Checkpoint clusters fails when 2 ISP paths are available

Chris_W23 · ‎2024-10-18

Good day,

I am having an issue with a VPN tunnel between 2 Checkpoint clusters dropping when both of my service provider's links are available. The tunnel establishes and works when only 1 ISP path is connected.

The cluster at Site A is a pair of open server systems running 81.10. The cluster at Site B is a pair of 3800 appliances running 81.10 as well.

Our local service provider has provisioned two distinct L2 circuits for us, each with a unique VLAN within their network, and hands them off to us at each location on access ports.

This configuration has been in place since earlier this year, but we found out recently that the provider's secondary circuit (shown in red in the attached image) has not been functioning correctly and was not passing traffic. Our tunnel has worked properly and failover between the cluster nodes has worked correctly as well, but that was only when the provider's primary path was available. When that path went down, the tunnel went down and didn't re-establish over the secondary path.

When the provider fixed the secondary path and it became active (the paths are active-active), the tunnel went down and didn't come back after 5 minutes. We tried resetting the tunnel and that did not help. When we physically disconnected the secondary circuit, the tunnel re-established. Further tests have shown that when both circuits are available, the tunnel drops.

In our design, the FW interfaces are connected to a pair of our switches (themselves clustered/stacked too), with each switch uplinked to one of the ISP switches. The firewalls are on the same internal VLAN (53 or 24) as the access ports to the ISP switches. The intention here is to merge the provider VLANs on our switches and have each FW node be able to communicate over either ISP path. The problem seems to be that when both paths are up, something gets confused and the traffic doesn't go through to establish the tunnel.

Does this design make sense, and *should* it work, or are we missing something that would cause it not to function as intended?

Thanks

Chris

PhoneBoy · ‎2024-10-18

What's not clear from this diagram is how the gateway determines which ISP path to take.
Can you describe this in more detail?

Chris_W23 · ‎2024-10-18

Hi,

Thanks for the reply. I guess the honest answer is that we don't know how it would determine which path to take when both paths are active. The gateways aren't configured in dual-ISP mode or anything like that, and it's basically two L2 services between the sites.

I think we were under the assumption that the L2 broadcasts and the uplinked switches would make the path determinations, instead of being the responsibility of the gateways.

Thanks

Chris

PhoneBoy · ‎2024-10-21

Do the different paths have a different next-hop MAC address?
That might cause some issues with SecureXL, which I believe caches this information.
"Failing over" would result in a MAC address change for the next hop and SecureXL would be still sending the traffic to the old MAC in this case.
That's just a guess, though.
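
To make the guess above concrete, here is a toy Python model of a forwarding cache that remembers the next-hop MAC per destination. It is not how SecureXL is actually implemented; the `NextHopCache` class, the IP address, and the MAC addresses are all hypothetical and exist only to show how a cached entry could keep pointing at the old MAC after the path changes.

```python
class NextHopCache:
    """Toy fast path that caches the next-hop MAC per destination IP (hypothetical)."""

    def __init__(self):
        self._entries = {}

    def resolve(self, dest_ip, arp_table):
        # The ARP table is consulted only on a cache miss; a hit reuses the stored MAC.
        if dest_ip not in self._entries:
            self._entries[dest_ip] = arp_table[dest_ip]
        return self._entries[dest_ip]

    def flush(self):
        # Drop all cached entries so the next lookup re-reads the ARP table.
        self._entries.clear()


# Hypothetical peer address and MACs, one MAC per ISP path.
PEER_IP = "192.0.2.2"
arp_table = {PEER_IP: "aa:aa:aa:aa:aa:01"}

cache = NextHopCache()
before = cache.resolve(PEER_IP, arp_table)

# The path changes, so the ARP table now shows a different next-hop MAC.
arp_table[PEER_IP] = "bb:bb:bb:bb:bb:02"
stale = cache.resolve(PEER_IP, arp_table)   # still the old MAC

cache.flush()
fresh = cache.resolve(PEER_IP, arp_table)   # picks up the new MAC

print(before, stale, fresh)
```

If something like this is happening, the traffic would keep going to the old MAC until the cached entry is cleared, which would match a tunnel that does not recover on its own.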

Chris_Atkinson · ‎2024-10-18

The diagram raises multiple considerations, not least of which are potential spanning-tree issues.

Are the switches Layer-2 only or is there some routing configuration that we aren't seeing here?

CCSM R77/R80/ELITE
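
As a rough illustration of the spanning-tree concern, the Python sketch below floods one broadcast frame through a simplified model of the design. It assumes each site's switch stack behaves as a single bridge, that both provider circuits are transparent L2 wires merged into one VLAN at each end, and that nothing (such as spanning tree) blocks either link. The function name `flood_broadcast` and the hop limit are hypothetical; real behavior depends on whether the provider passes BPDUs and how the switches are configured.

```python
from collections import deque


def flood_broadcast(links, source_switch, max_hops):
    """Count how many copies of one broadcast each switch sees.

    links: list of (switch_a, switch_b) tuples, one per L2 circuit.
    max_hops: artificial cut-off; a real loop has no such limit.
    """
    seen = {}
    # Each queue item is (switch, index of the link the frame arrived on, hop count).
    queue = deque([(source_switch, None, 0)])
    while queue:
        switch, ingress, hops = queue.popleft()
        seen[switch] = seen.get(switch, 0) + 1
        if hops == max_hops:
            continue
        for idx, (a, b) in enumerate(links):
            # A bridge floods out every link except the one the frame came in on.
            if idx == ingress or switch not in (a, b):
                continue
            neighbor = b if switch == a else a
            queue.append((neighbor, idx, hops + 1))
    return seen


# One circuit between Site A and Site B: the frame arrives once and stops.
single = flood_broadcast([("A", "B")], "A", max_hops=6)

# Two circuits bridged together at both sites: the frame keeps circulating.
looped = flood_broadcast([("A", "B"), ("A", "B")], "A", max_hops=6)

print(single)
print(looped)
```

With one circuit, each site sees the frame exactly once. With both circuits bridged, the copies keep bouncing between the sites until the artificial hop limit stops them, which is the kind of loop and MAC-table flapping that would explain the tunnel dropping only when both paths are up.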

Chris_W23 · ‎2024-10-21

Hi Chris, the switches are L2 only for these services.

Thanks

Chris

Chris_Atkinson · ‎2024-10-21

In that case you need to rework the logical topology so that each firewall is aware that it has two paths/routes.

I.e., map a separate interface on the firewall to each path.

CCSM R77/R80/ELITE
