Failover between different HW with cphacu

Hi wonderful checkmates!

I've got a quick question for you:

I want to do a zero-downtime upgrade.

I'm upgrading from R77.20 on 4400 appliances to brand-new 5600 appliances running R80.30.

Do you think that with different HW the cluster will go Active/Down and cphacu start will work?

I've never tried it before, but I think it will work with the same number of CoreXL instances on both sides.
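For context, this is roughly how I would compare the CoreXL setup on the two members before attempting it (a sketch using standard Gaia expert-mode commands; verify on your own appliances):

```shell
# Run on each cluster member in expert mode.

# Show the CoreXL firewall instances and their CPU affinity;
# the instance counts on both members would need to match.
fw ctl multik stat

# Show this member's cluster state (Active / Standby / Down / Ready)
cphaprob stat

# The CoreXL instance count itself is changed interactively:
#   cpconfig  ->  "Check Point CoreXL"
# (a reboot is required for the new count to take effect)
```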

D!Z

13 Replies
Admin

Generally speaking, clustering appliances with different hardware is NOT supported.
It definitely won't work here, because the 4400 and the 5600 have different numbers of cores.
Copper

Hi @PhoneBoy ,

 

I know that cluster will be in Active/Down.

To be specific, what I'm asking here is:

If I set the EXACT SAME number of CoreXL instances on both members via cpconfig (even if at the HW level there are different numbers of physical CPU cores), do you think the cphacu start command still won't work properly?

 

D!Z

Admin

The number of workers will be the same, but the number of firewall instances will still be different.
That won't work.
Copper

Do you have any suggestions on how to achieve a zero-downtime upgrade in this case?
Sapphire

You could involve TAC and let them research a solution - but as we all know, this is not supported...


Connectivity upgrade should work according to this:

https://sc1.checkpoint.com/documents/Best_Practices/Cluster_Connectivity_Upgrade/html_frameset.htm

Ideally, test the process in a lab 🙂

It might be tricky, though, to push policy in the middle of the upgrade if you are changing interface names, as you then need to re-do the topology and anti-spoofing settings.

In those cases, I normally pre-build the new boxes in the lab and install the latest policies using a lab Mgmt server. Then you do a "hard" failover by connecting one of the new firewalls in place of the existing standby, and then running cpstop on the old active member. Then add the other new one, and once both are running, establish SIC and update the relevant parts of the cluster object on the production Mgmt server.

You will lose 1-3 pings, and the rest of the connections will have to be re-established, of course. So it's not zero downtime.

If you're not changing interface names, try the process from the document; it should work.
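The "hard" failover described above boils down to something like this (a rough sketch, assuming expert mode on the gateways; the exact steps depend on your cabling and topology):

```shell
# 1. Physically connect one pre-built new firewall in place of the
#    current standby member.

# 2. On the old ACTIVE member, stop all Check Point services so the
#    new box can take over:
cpstop

# 3. On the new member, watch the takeover happen:
cphaprob stat

# 4. Once both new members are cabled in and running: establish SIC
#    from the production Mgmt server, update the relevant parts of
#    the cluster object, and install policy.
```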


Oh bummer - your 4400 is most likely running a 32-bit OS, so in that case you won't be able to sync the connection tables with the 64-bit Gaia on the 5600.
Copper

Hi Guys,

 

Thank you all for your feedback.

In this case I will accept a downtime, but it is a very minimal one, and it gives me the opportunity to roll back immediately to the old cluster member.

 

Let me explain and share it with the community; also, let me know what you guys think about it:

 

1. Disconnect the R77.20 standby cluster member

2. Connect the new R80.30 member with ONLY the interface that leads to the Mgmt server

3. Change the cluster version, fix the cluster member topology, install policy (remove the "if fails" flag)

4. Disconnect the R77.20 member (DOWNTIME)

5. Quickly connect all the remaining interfaces on the R80.30 member (check the ARP tables on the network equipment/routers connected to the R80.30 gateway and clear them if needed)

6. Verify that everything is working fine -- END OF PROCEDURE

**In case there are issues, you can switch back to the R77.20 cluster member
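A few commands that may help with the verification in steps 5-6 (a sketch; the router-side syntax is vendor-dependent):

```shell
# On the new R80.30 member:
cphaprob stat        # overall cluster member state
cphaprob -a if       # monitored cluster interfaces and their status
fw stat              # confirm which policy is installed on the gateway

# On the adjacent routers/switches (example for Cisco IOS):
#   clear arp-cache
# so traffic is steered to the new member's MAC addresses immediately.
```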

 

I know this will interrupt all connections, and the customer has to agree to this.

At least you will have the ability to quickly switch from the node with the old SW version to the new one, and vice versa.

This is basically the same procedure I have applied with cphacu in other situations.

In this case we cannot use it, as we already discussed, so I think this is the only way to do it.

 

If anyone has another idea/solution, that would be helpful.


Step 0 - take a backup of the Mgmt 😉

It should actually be safe to connect all interfaces (at your own risk): the cluster logic should keep the member with the higher version in the READY state, not Active, until you stop or disconnect the R77.20 member. You should be able to test all the steps in a VM lab if you have one.

4400 ->  2 Cores -> Intel Celeron Dual-Core E3400 2.6 GHz

5600 -> 4 Cores -> Intel Core i5-4590S, 3.00GHz (Quad-Core)

If you want to use all 4 cores on the 5600 appliance, there is no way to switch between the systems without losing sessions.

More read here: R80.x - cheat sheet - ClusterXL

 


When you start, the systems should have the following status:

[4400 A] -> active

[4400 B] -> standby

 

1. [4400 B] Power off the R77.20 standby cluster member (4400 B)

2. [5600 B] Connect the new R80.30 member and configure the interfaces, routes, etc. with the same settings as the old [4400 B].

3. Establish SIC, add the license, change the cluster version, fix the cluster member topology, install policy on gateway [5600 B] (remove the "if fails" flag)

     Note: The member with the lower CCP version (Gaia version) remains active [4400 A].

4. [4400 A] Power off the R77.20 appliance (4400 A)

     Note: Now you lose all your sessions, and [5600 B] should become active.

5. If possible, delete all ARP entries on all participating routers in real time.

6. [5600 A] Connect the second new R80.30 member and configure the interfaces, routes, etc. with the same settings as the old [4400 A]

7. Establish SIC, add the license, fix the cluster member topology, install policy on both new gateways (re-add the "if fails" flag)
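Steps 2 and 6 (configuring interfaces and routes on the new members) can be done in Gaia clish along these lines (a sketch with placeholder names and addresses; mirror the old member's actual settings):

```shell
# Gaia clish on the new 5600 (eth1, 192.168.1.2 and the route below
# are placeholders - copy the real values from the old 4400)
set interface eth1 ipv4-address 192.168.1.2 mask-length 24
set interface eth1 state on
set static-route 10.0.0.0/8 nexthop gateway address 192.168.1.1 on
save config
```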

 

Copper

NICE, @HeikoAnkenbrand!

This was exactly my idea, with a more detailed explanation!!!

 

I'm happy that you agreed on this one!

Copper

Always take a snapshot of the Mgmt. ALWAYS! haha 😄
