WAN Failover when BGP Peer becomes unavailable in a stretched cluster environment

dphonovation · ‎2022-10-04

Hi all,

I'm looking for advice on the best way to handle a fault scenario in a stretched 2-site environment. It will be a stretched VMware/vSAN/NSX environment. I'm still waiting for the interconnect between the two sites to be completed, but I have tested the basics described here (including being able to advertise any of my BGP routes from either site) using independent clusters at each site. When the interconnect is done, I plan to move all Checkpoints under the same MGMT server.

I will have 2 sites with an independent Fiber Layer 2 interconnect provided by the ISP:
* Each site has an Ethernet direct routing peer expecting BGP from me
* I am advertising a /29
* Each site has a different BGP gateway

On my side:
* Each site has 2 VIRTUAL checkpoints
* Each site has its own WAN route
* Sites are interconnected to each other on a Layer 2 interconnection

Assumptions I'm making:
* It must be a 4-node cluster (1 logical cluster, 2 members at each site, with cluster sync and all VIPs stretched across the interconnect). If I have a VM (e.g. 10.10.171.248) whose default gateway is the MGMT VIP (10.10.171.1), I need that VIP to move between the sites so the VM can still get out no matter which side the VM or the active Checkpoint is on. If the VM ends up living on a site where WAN is down, the stretch should carry its traffic over to the other side. I don't really see a way around this. If the Checkpoints participate in a VLAN that I need stretched, I can't make 2 clusters (2 logical clusters, 2 members at each site), since both clusters cannot hold the VIP that represents the default gateway on that VLAN at the same time. Please correct me if I'm clearly missing something with this assumption.

Here is my drawing:

Imgur link for high rez: https://i.imgur.com/AqdFmNG.png

These Checkpoints are virtual. The only way there is a "hard failure" or a "link down" is if VMware/vSphere itself fails. In that case the cluster will fail over to the other Checkpoint onsite, which runs on a different ESX host, as expected. In a total power loss scenario, the only 2 remaining Checkpoints would be at the opposite site. I'm OK with this. Both sites will be advertising the same BGP route, and whichever site is failed over to becomes the primary route, so my ISP core will route incoming traffic to me via the opposite site. Outgoing traffic will still use the internal VIP (MGMT in this case) and would also work.

But what happens if my BGP peer goes down? For instance, the BGP gateway goes down (i.e. I'm unable to ping it, but all Checkpoints are powered on and would still see the interface as up), or for whatever reason BGP cannot be established with the peer. How can I ensure that if Site 1 cannot reach its WAN peer, the 4-node cluster attempts a failover? Without a failover, the LAN-side cluster MGMT VIP (10.10.171.1, used as the default gateway for the VM) stays with the Checkpoint that has no WAN peer.

There is no BFD. Is ping detection enough (on the default route or the BGP peer)? Or should I be using the clusterXL_monitor_ips script as seen here: https://sc1.checkpoint.com/documents/R81/WebAdminGuides/EN/CP_R81_ClusterXL_AdminGuide/Topics-CXLG/c... ?

To put it more simply: if WAN becomes unreachable at Site 1, I want both the ClusterXL WAN VIP and the ClusterXL MGMT VIP to move to Site 2.
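To make the ping-detection idea concrete, here is a minimal sketch of the decision logic only, not of any Check Point tooling. It assumes a probe function that returns True when the BGP peer answers (for example a single ping), and it only reports when a failover or failback would be warranted. The class name `PeerMonitor`, the thresholds, and the event names are all hypothetical; how the event is turned into an actual ClusterXL failover is left open.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PeerMonitor:
    """Tracks reachability of one WAN/BGP peer (hypothetical helper)."""
    probe: Callable[[], bool]   # assumption: returns True if the peer replied
    fail_threshold: int = 3     # consecutive misses before the WAN counts as down
    recover_threshold: int = 5  # consecutive replies before it counts as up again
    wan_up: bool = True
    _misses: int = 0
    _hits: int = 0

    def tick(self) -> str:
        """Run one probe and return 'failover', 'failback' or 'none'."""
        if self.probe():
            self._hits += 1
            self._misses = 0
            if not self.wan_up and self._hits >= self.recover_threshold:
                self.wan_up = True
                return "failback"
        else:
            self._misses += 1
            self._hits = 0
            if self.wan_up and self._misses >= self.fail_threshold:
                self.wan_up = False
                return "failover"
        return "none"


# Simulated probe results instead of real pings, so the sketch runs anywhere.
replies = iter([True, False, False, False, True])
monitor = PeerMonitor(probe=lambda: next(replies),
                      fail_threshold=3, recover_threshold=1)
events = [monitor.tick() for _ in range(5)]
print(events)  # ['none', 'none', 'none', 'failover', 'failback']
```

Requiring several consecutive misses keeps a single lost ping from moving the VIPs, and a separate recovery threshold keeps the cluster from flapping back as soon as one reply gets through.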

Appreciate any advice, including "you're dumb, do it this way". Thanks for reading.

Chris_Atkinson · ‎2022-10-04

Just quickly (will come back on the wider topology question), the upcoming R81.20 introduces:

Support for Routed control scripts to allow ClusterXL fail-over and tear down of BGP connections.

CCSM R77/R80/ELITE

dphonovation · ‎2022-10-04

Sounds just like what I need!!!

dphonovation · ‎2022-10-05

Just a friendly reminder ping. Also, when is R81.20 expected to be released?

Chris_Atkinson · ‎2022-10-05

No worries, I haven't forgotten.

Currently expected before the end of 2022.

Chris_Atkinson · ‎2022-10-05

In general, what does the rest of the network fabric around the firewalls look like? And is the interconnect planned to be single, dual, or diverse (i.e. on a different carrier from the Internet links)?

As it stands, it's difficult to say what could practically be changed or improved, and for what gain, e.g. with additional dynamic routing.

dphonovation · ‎2022-10-10

The interconnect is a single point of failure: Layer 2, provided by the ISP, on its own switch.

Each site gets WAN from the same ISP but through its own gateway (not interconnect dependent). I advertise a BGP range to that gateway peer from each site as I choose.

I've thought about it over the past few days and decided to break them up into two 2-node clusters instead of one 4-node cluster. They will be managed by the same SMS via the interconnect, but each in its own /24 space. This way I can treat them as separate sites. I split my block into 4, assigned 1 to each site, and the rest will be shared.
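As a rough illustration of that split, the sketch below divides one block into 4 equal subnets, gives 1 to each site, and keeps the remaining 2 as shared. The prefix 203.0.113.0/24 is a documentation range used purely as a placeholder (the real block and its size are not stated here), and the helper name `split_block` is hypothetical.

```python
import ipaddress


def split_block(cidr: str, parts: int = 4):
    """Split a block into `parts` equal subnets; parts must be a power of two."""
    extra_bits = parts.bit_length() - 1
    if parts < 1 or 2 ** extra_bits != parts:
        raise ValueError("parts must be a power of two")
    block = ipaddress.ip_network(cidr)
    return list(block.subnets(prefixlen_diff=extra_bits))


# Placeholder prefix: substitute the block actually being advertised.
subnets = split_block("203.0.113.0/24", 4)
plan = {
    "site1": subnets[0],
    "site2": subnets[1],
    "shared": subnets[2:],  # advertised from either site as needed
}
for name, nets in plan.items():
    print(name, nets)
```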

As for failover when BGP goes down on a node in the cluster, I think the cluster monitor script is my only option to force a failover? (I'm imagining, for instance, someone mistakenly hitting the disconnect network adapter button in VMware.) That is, until the routed scripts feature comes along.
