Achieving Zero downtime in Active-Active setup while a Firewall goes for a Reboot.

bookman · ‎2021-07-27

Query: We are running a pair of Check Point firewalls with ClusterXL in active-active mode. The pair supports a 24x7 service, so we cannot afford any downtime while one of the active firewalls is rebooted. Our concern is the packets in transit through that firewall at the moment of the reboot.

How can we ensure that these in-transit packets have passed through this firewall gracefully, and that no new packets arrive at it, before the firewall is rebooted?

Version: R80.10

Thanks in Advance.

Benedikt_Weissl · ‎2021-07-27

Both Load Sharing Unicast and Load Sharing Multicast modes should also provide high availability, so you could try to temporarily "disable" the cluster node manually with "clusterxl_admin down" before restarting it.

If uptime is mission-critical, you might want to migrate to a Maestro setup.
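
The suggestion above can be sketched as an ordered runbook. This is only a minimal illustration: "clusterxl_admin down" comes from the reply itself, while the "cphaprob state" checks, the "clusterxl_admin up" step after the reboot, and the run_steps helper with its runner callback are assumptions about a typical workflow, not a vendor-documented procedure. Run on the member, the script only prints the commands (dry run).

```python
from typing import Callable, List

# Ordered steps for taking one ClusterXL member out of service before a reboot.
# "clusterxl_admin down" is from the thread; the "cphaprob state" checks and
# the final "clusterxl_admin up" are assumptions about a typical workflow.
PRE_REBOOT = [
    "cphaprob state",        # confirm both members look healthy first
    "clusterxl_admin down",  # shift traffic off this member
    "cphaprob state",        # confirm this member now reports as down
    "reboot",
]
POST_REBOOT = [
    "cphaprob state",        # check member state once it is back
    "clusterxl_admin up",    # only needed if it is still administratively down
    "cphaprob state",
]


def run_steps(steps: List[str], runner: Callable[[str], bool]) -> List[str]:
    """Run steps in order, stopping at the first one the runner rejects.

    `runner` is a hypothetical callback that executes (or just prints) a
    command and returns True on success. Returns the steps that completed.
    """
    done = []
    for step in steps:
        if not runner(step):
            break
        done.append(step)
    return done


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    def dry_run(cmd: str) -> bool:
        print(f"[dry-run] {cmd}")
        return True

    run_steps(PRE_REBOOT, dry_run)
```

Stopping at the first failed step matters here: if the state check after "clusterxl_admin down" does not look right, the reboot should not go ahead.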

PhoneBoy · ‎2021-07-27

Precisely what kind of traffic is it and what blades are enabled?
In general, even in an Active/Passive setup, active connections should survive failover, though there are a few exceptions.
And there is also this bug in R80.40 (not relevant for your version): https://supportcenter.checkpoint.com/supportcenter/portal?action=portlets.SearchResultMainAction&eve...

That said, if you try to do this during a period where both gateways are in heavy use, you WILL have issues as one gateway tries to deal with the increased load.
Active/Active setups in general are better served by either a Maestro setup or a properly-sized Active/Standby cluster.
You’re also better off (most likely) on a newer release since R80.10 is nearing End of Support.
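
The load concern above comes down to simple arithmetic, sketched below. The utilization figures and the function names are hypothetical, and the sketch assumes two identically sized members whose entire load moves to the survivor.

```python
def survivor_load_percent(member_loads_percent):
    """Load the single surviving member would carry, in percent of its capacity.

    Assumes identical members and that all traffic shifts to the survivor.
    """
    return sum(member_loads_percent)


def can_absorb(member_loads_percent, limit_percent=100):
    """True if the survivor stays at or below `limit_percent` utilization."""
    return survivor_load_percent(member_loads_percent) <= limit_percent


if __name__ == "__main__":
    quiet_hours = [35, 40]  # hypothetical per-member utilization, in percent
    busy_hours = [60, 65]
    print(survivor_load_percent(quiet_hours), can_absorb(quiet_hours))  # 75 True
    print(survivor_load_percent(busy_hours), can_absorb(busy_hours))    # 125 False
```

In the second case the survivor would need 125% of its capacity, which is the situation a properly sized Active/Standby cluster avoids by design.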

genisis__ · ‎2021-07-27

I recently upgraded a number of clusters during the working day. The Gateways were originally running R77.30, and they are now running R80.40.

In my case I failed over from active (N1) to standby (N2), rebuilt N1 as R80.40 and ensured MVC was enabled, then simply failed over from N2 to N1. I believe we had 1 or 2 ping drops. I then repeated the process for N2.

Also, as indicated elsewhere in this thread, I suggest you go to R80.40 (with a minimum JHFA of T118) or R81.x.

Timothy_Hall · ‎2021-07-27

This: set the member you are about to reboot administratively down first. This will cause all traffic to shift off this member immediately with practically no disruption. If you just reboot or pull power on an active member, it may take the surviving member 2-2.5 seconds to figure out the other member is gone and start handling its traffic.
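
To put the 2-2.5 second detection window in perspective, here is a rough back-of-the-envelope sketch. The packet rate is a hypothetical figure, and the result is only an upper bound on packets that arrive for the departed member before the survivor takes over, not a prediction of actual loss.

```python
def packets_in_window(packets_per_second, window_seconds):
    """Packets arriving for the departed member during the detection window."""
    return packets_per_second * window_seconds


if __name__ == "__main__":
    rate = 10_000  # hypothetical packets per second handled by the rebooted member
    for window in (2.0, 2.5):
        print(f"{window} s window: up to {packets_in_window(rate, window):.0f} packets")
```

At that assumed rate the window covers up to 20,000 to 25,000 packets, which is why taking the member administratively down before the reboot is worth the extra step.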

Bob_Zimmerman · ‎2021-07-27

And depending on how they are set up, the surviving member may think it has failed and may refuse to take over. This is fairly common when using a simple cable for sync. Sync should always be over a switch.

bookman · ‎2021-08-06

Thank you all for taking the time to answer the query.
