BGP peering with ClusterXL across 2 sites - use a ...

Rccou · ‎2020-03-11

Hi,

We have 2 sites connected by a 10Gb circuit, and a pair of firewalls running R80.30 set up as Active/Standby using ClusterXL.

We have an eBGP peering to a remote entity which uses the Cluster VIP on our side.

The problem, as we saw yesterday during a downtime window for one of the sites, is that when we take a site down for maintenance the FWs stop talking to each other, so the Standby FW switches to Active and takes over the VIP. However, the newly Active FW was at the site we had taken down, so the whole company lost connection to the remote entity.

I read that it is best practice to use the VIP for ClusterXL and BGP, but is that only really the case when both FWs are in the same rack?

If they are in different sites would it make more sense to have 2 eBGP peerings and an iBGP peering between them?

Is there going to be any problem with setting this up, due to them being essentially the same cluster?

Will this work?

Chris_Atkinson · ‎2020-03-11

Could you please tell us more about your BGP configuration and process used to take a site offline?

Are you using graceful restart and is there only a single non-redundant path between the sites...

CCSM R77/R80/ELITE

Rccou · ‎2020-03-11

Hi Chris,

The site was taken offline by powering down the switches that connect the FW to the peer router. It's a fibre across a DC.

This would also have taken down the site-to-site cluster communications, so it would trigger the cluster switchover. Here's another question: would the VIP move from the old Active (would this go to Standby?) to the new Active, or would they both keep the VIP independently?

The BGP peering is with a Juniper MX. It's completely basic eBGP. Each FW has a peering to a different MX router, both coming from the same VIP. No iBGP between them.

(if it's done wrong or suboptimally then feel free to say - I inherited it)

Thanks

Wolfgang · ‎2020-03-11

Rccou,

what you describe is normal behaviour with ClusterXL HA.

One node is active with the VIP and the other node is in standby.

If the active node fails, the standby becomes active and the VIP moves to it.
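The failover rule above can be sketched as a toy model. This is only an illustration of the rule as stated in this post, not of how ClusterXL works internally: the `Member` class, its single `alive` flag, and the `holder_of_vip` function are hypothetical names, and the real product looks at more than one health flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    """Hypothetical stand-in for one cluster member."""
    name: str
    alive: bool = True  # simplified: one flag for "this member is healthy"


def holder_of_vip(active: Member, standby: Member) -> Optional[Member]:
    """Return the member that holds the VIP under the simple HA rule."""
    if active.alive:
        return active       # normal case: the active member keeps the VIP
    if standby.alive:
        return standby      # active failed: the standby takes over the VIP
    return None             # both members failed: nobody answers on the VIP


fw_a = Member("fw-site-a")
fw_b = Member("fw-site-b")

print(holder_of_vip(fw_a, fw_b).name)  # active member holds the VIP

fw_a.alive = False                     # simulate a failure of the active member
print(holder_of_vip(fw_a, fw_b).name)  # standby has taken over the VIP
```

Note that the model has no notion of which site is still connected to the rest of the network, which is exactly the gap in the incident described earlier in the thread.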

Maybe this design does not fit your needs, and you would be better off not using ClusterXL.

For example, run one gateway as a single instance at each location, with both active, and build your redundancy with dynamic routing internally and externally (BGP, OSPF, BFD).

Wolfgang
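To make the difference between the two designs concrete, here is a rough sketch that compares them for a single-site outage. It is a simplification under stated assumptions: the site names, `reachable_single_vip`, and `reachable_independent` are made up for illustration, and routing convergence, BFD timers, and the internal network are ignored entirely.

```python
SITES = ("site-a", "site-b")  # hypothetical site names


def reachable_single_vip(site_up: dict, vip_site: str) -> bool:
    """One eBGP session sourced from the cluster VIP.

    The remote entity is reachable only if the site that currently
    holds the VIP is itself up.
    """
    return site_up[vip_site]


def reachable_independent(site_up: dict) -> bool:
    """One gateway with its own eBGP session per site.

    The remote entity is reachable while at least one site is up.
    """
    return any(site_up[s] for s in SITES)


# Walk through each single-site outage and each possible VIP location.
for down in SITES:
    site_up = {s: s != down for s in SITES}
    for vip_site in SITES:
        print(
            f"down={down} vip_on={vip_site} "
            f"single_vip={reachable_single_vip(site_up, vip_site)} "
            f"independent={reachable_independent(site_up)}"
        )
```

In this model the single-VIP design fails whenever the VIP ends up at the site that is down, which matches the outage described in the first post, while the per-site design survives any single-site outage.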

