Benjamin_Olson
Participant

Optimal steps for ClusterXL cluster upgrade

I have an upcoming change that will involve replacing a pair of ClusterXL firewalls with new hardware.  My organization has only been using ClusterXL for a year, and this will be the first time we are replacing a cluster that is extremely high-impact to applications and end users.  The new cluster will have different physical member IPs but will take over the existing virtual IPs (VIPs).  I am wondering what the recommended steps are for transitioning from one cluster to another with the least amount of impact.

For the clusters I have replaced since moving from VRRP to ClusterXL, I have stood up the new cluster side by side with the existing one, using different physical IPs in the same subnets, added the members to SmartConsole with those IPs, and left the VIPs blank on the new cluster object until the time to cut over to them.  The new pair already has the same policy before the change is done, but no VIP information.

For the actual cutover my steps have been:

1) Update the new cluster object to add both VIPs and save
2) Begin the policy push to the new cluster
3) While the policy is pushing, stop services on the backup member of the old cluster, then stop services on the primary member
4) As soon as the policy shows as installed, verify that the VIPs show up on the new cluster
5) Manually refresh ARP on the connecting L3 switches if necessary to avoid stale ARP cache entries (see the verification sketch below)
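
For steps 4 and 5, a rough verification sketch; the switch commands assume Cisco IOS, and the addresses are hypothetical examples:

# On a new cluster member: confirm state and that the VIPs are bound
cphaprob stat          # expect one Active and one Standby member
cphaprob -a if         # the interface list should now show the virtual IPs

# On the connecting L3 switch (Cisco IOS assumed): refresh stale ARP entries
show ip arp 10.10.10.1     # hypothetical VIP; check which MAC it resolves to
clear ip arp 10.10.10.1    # or "clear arp-cache" to flush the whole table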

Can I do anything to improve upon this plan?  Any suggestions for minimizing the impact?

Lari_Luoma
Ambassador

Hi!

Your plan sounds about right. Make sure you remove the old cluster from the network entirely (shut down the switch ports) after the change is done; just stopping the services might not be enough. Good luck!

Benjamin_Olson
Participant

Yep, before actually removing the old hardware, the last step would be to push the policy out to the old cluster, which removes the VIP information from those gateways so they don't come back online accidentally.

JozkoMrkvicka
Mentor

One important lesson learned from the past:

TURN OFF AUTOMATIC START OF CHECK POINT PROCESSES AFTER REBOOT ON THE OLD DEVICES.

In cpconfig there is an option to disable the automatic start of Check Point processes. Disable it on the old devices so that an accidental reboot does not bring them back up with cpstart.

By the way, if the interface names differ between the devices, the firewall will figure it out and assign each subnet to the correct interface where that subnet is configured.
If the Topology has interface Lan1.20 with subnet 10.10.10.0/24 and the new appliance has interface eth5.20 with the same subnet 10.10.10.0/24, the firewall will recognize this during the policy push and no issues will be seen.

Kind regards,
Jozko Mrkvicka
phlrnnr
Advisor

This method works well:

https://community.checkpoint.com/t5/Enterprise-Appliances-and-Gaia/Migrating-cluster-from-old-to-new...

If you have IP space in your existing cluster networks, you may be able to join the new hardware to the existing cluster and bring it in as Standby / Ready (depending on OS / software version), and then cut over to it with a blip, or possibly with full connection synchronization.  That way the VIP just moves over automatically, since the new hardware is already part of the cluster.

Might be worth testing it out in a lab.  I just tested the process in the link above in the lab, swapping out open servers for 6500 appliances.  I didn't use new IPs on the new cluster members; I just made sure the appliance I was replacing was down before bringing up the new one, and made sure to clear ARP on the corresponding routers.  I lost one ping in the failover to the new hardware.  As expected, since full sync didn't work across the different hardware, connections had to be re-established, but the impact was minimal, and most applications these days recover pretty well.
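
If you lab this, a minimal check sequence would be something along these lines (standard ClusterXL commands; the expected states are assumptions based on the test above):

# On the new member after it joins the cluster
cphaprob stat        # should report Standby, or Ready across mismatched hardware/versions
cphaprob syncstat    # sync statistics; full sync may not establish across different hardware

# On the adjacent routers (Cisco IOS assumed)
clear arp-cache      # so the VIP's new MAC is learned right after the failover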

Maarten_Sjouw
Champion
Champion

This will work IF you run the same software version on all members. When you do a version upgrade in the same migration, however, you must make sure the new members cannot take over the VIPs. I had a situation like this a while back: we had only connected the internet side (we manage through the internet), the new boxes saw the old members as down and took over the VIP as well, and when an ARP request came in, the newer, faster boxes responded first... This means you need to keep clustering shut down until you are ready to migrate.
I still have not found a way to prevent ClusterXL from starting until I allow it that would survive a reboot.
VRRP has that mechanism, and it can simply be configured from clish and saved; a sketch follows below.
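
For comparison, the VRRP-side mechanism is, if I recall the Gaia clish syntax correctly, along these lines:

# Gaia clish on a VRRP member: keep it from taking over until you are ready
set vrrp disable-all-virtual-routers on
save config        # persists across reboots, unlike cphastop on ClusterXL

# Re-enable when ready to take traffic
set vrrp disable-all-virtual-routers off
save config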
Regards, Maarten
JozkoMrkvicka
Mentor

1. clusterXL_admin down -p (-p means PERSISTENT; the member stays down until you specifically bring it back up)
2. cpconfig -> Automatic start of Check Point Products -> Disable
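
A short sketch of both steps on an old member; the pnote name in the comment is what I have seen registered, so verify it on your version:

# 1. Persistently force the member into Down state
clusterXL_admin down -p
cphaprob stat      # the member should now report Down
cphaprob list      # the persistent admin-down pnote should be listed

# 2. Then run cpconfig and disable the automatic start of Check Point
#    products, so an accidental reboot does not bring the member back
cpconfig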

Kind regards,
Jozko Mrkvicka
Maarten_Sjouw
Champion
Champion

clusterXL_admin down -p does not keep the member down in this situation. Down does not really mean down; it means Problem state. So when there is (according to its own measurements) no other active member, it WILL become active.
I know cpconfig and how to disable autostart, but that is not really something you think about when you are doing these things. Yes, you can also disable cluster membership there, but that requires a reboot.

I was just pointing out that there is a very simple way to do this with VRRP and no easy way with ClusterXL; cphastop will also stop clustering, but it will not survive a reboot either.
Regards, Maarten
JozkoMrkvicka
Mentor

Hmmm... I have never faced a situation where I had to use the -p flag with clusterXL_admin down, but it is a really good question whether a member in admin-down state with the -p flag would switch to Active if the other member is also down (let's say powered off). I think it shouldn't; it should stay down, but with Check Point services started.

Anyway, in case we need to make sure a member stays down the whole time, we switch off the external interface plus the sync link via console and the problem is solved 🙂
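
In clish, that console workaround is just the following; the interface names are hypothetical, though on many appliances the dedicated sync port shows up as "Sync":

# Via console on the member that must stay down
set interface eth0 state off     # external interface
set interface Sync state off     # sync link
save config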

Kind regards,
Jozko Mrkvicka
Maarten_Sjouw
Champion
Champion

The case was that we had a cluster and added 2 new members to replace the old ones. We were not yet ready to do the actual replacement, but the physical installation had already been done. The new members ran a newer version and considered the older members to be DOWN.
Regards, Maarten
Alexander_Asta1
Explorer

Hi @Benjamin_Olson ,

Instead of stopping the services on the old firewalls, I would suggest shutting down the interfaces. IMHO it is:

- simpler: you can prepare the clish commands for "interface state off" in advance and just copy-paste them

- faster: a quicker option in case of rollback

A couple of months ago I did something similar for one of our customers; the migration was both software and hardware (we switched to new devices running a newer version). Our plan was:

1. Prepare the new devices (network config and policy)

2. Shut down the interfaces on the switch and on the firewalls

3. Configure the VIPs on the new cluster and push policy

4. On the day of the migration, copy-paste the commands to shut down the FW and switch interfaces for the old cluster

5. Copy-paste the commands to enable the interfaces on the FW and switch for the new cluster

6. We had an issue with one VPN tunnel, so we had to roll back and just copy-pasted the interface commands again (down for the new and up for the old cluster, of course)

You can also choose to shut down the interfaces on only one side (switch or firewall); a sketch of the prepared commands is below.
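
A hedged example of the prepared copy-paste blocks (Gaia clish plus Cisco IOS; all interface names are hypothetical):

# Old cluster members, Gaia clish: take the data interfaces down
set interface eth1 state off
set interface eth2 state off
save config

# New cluster members, Gaia clish: bring the data interfaces up
set interface eth1 state on
set interface eth2 state on
save config

# Switch side, Cisco IOS: the same idea per port
configure terminal
interface GigabitEthernet1/0/1
shutdown             # old member port; "no shutdown" on the new member ports
end

# Rollback = the same blocks with on/off (shutdown/no shutdown) swapped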

JozkoMrkvicka
Mentor

Just one important note:
You cannot configure any VLANs on an interface that is down. So turning the interface off (set interface state off) will not allow you to configure anything on that interface. You need to keep it turned on and make sure the link is not up instead (simply shut the port on the switch).
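
In clish terms (eth5 / VLAN 20 reuse the example from earlier in the thread):

# This will NOT work: the parent interface is administratively off
set interface eth5 state off
add interface eth5 vlan 20     # rejected while eth5 is off

# Do this instead: keep eth5 on, configure the VLAN,
# and keep the link dark by shutting the port on the switch side
set interface eth5 state on
add interface eth5 vlan 20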
Kind regards,
Jozko Mrkvicka