VSX cluster Hardware Upgrade

Mark_Tremblay · ‎2021-07-21

Hi All,

We are currently running two 12600s in a VSX cluster on R80.30. We have purchased two new 7000s and would like to migrate the current VSX cluster to the new hardware. Is there a recommended migration path/doc for this scenario?

Thanks,

Mark

Bob_Zimmerman · ‎2021-07-21

This block of shell code run in expert mode will tell you how many times each physical interface or bond is used:

{ vsenv 0 >/dev/null 2>&1;for iface in $(ifconfig|egrep '^[^ ]'|cut -d' ' -f1);do ip addr show $iface|grep inet>/dev/null;if [ $? -eq 0 ];then echo "$iface";fi;done;for vsid in $(ls /proc/vrf/|sort -n|tail -n +2);do vsenv $vsid>/dev/null;ifconfig|egrep '^[^ ]'|cut -d' ' -f1;done }|egrep -v '^(lo|usb|wrpj?|br)[0-9]*$'|cut -d. -f1|sort|uniq -c

Are you using any interface names which don't exist on the new box?
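
One way to answer that question is to save the interface names from an old member and from a new member into two files and compare them. This is only a minimal sketch, not an official tool: the function name missing_on_new and the sample interface names are made up for illustration, and each file is assumed to hold one interface name per line with any VLAN suffix already stripped.

```shell
#!/bin/sh
# Hypothetical helper: print names that appear in the old list but not in the new list.
# Assumption: both files hold one interface name per line.
missing_on_new() {
    old_list="$1"
    new_list="$2"
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
    # "|| true" keeps the exit status at 0 when nothing is missing.
    sort -u "$old_list" | grep -Fxv -f "$new_list" || true
}

# Example with made-up interface names.
old=$(mktemp)
new=$(mktemp)
printf '%s\n' bond1 eth1-01 eth3-01 > "$old"
printf '%s\n' bond1 eth1-01 eth2-01 > "$new"
missing_on_new "$old" "$new"   # prints eth3-01
rm -f "$old" "$new"
```

Anything this prints is a name the management will expect but the new box cannot provide until it is created (a bond) or moved to a different interface.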

Mark_Tremblay · ‎2021-07-21

We have a bond interface that hasn't been created on the new boxes.

Bob_Zimmerman · ‎2021-07-21

That's not a problem, fortunately. You just need to build the bond before you run your 'vsx_util reconfigure'. What would be a problem is if you were using eth3-## interfaces directly. The 7000 only has two card slots. If you're using interfaces which cannot exist on the new box, you will have to use 'vsx_util change_interfaces' first to move to interfaces which can exist. The ideal VSX deployment has everything in bonds, as bonds are easy to move between physical ports on the members.
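
As a rough sketch of the card-slot check, the filter below reads interface names and prints the ones that sit in a slot above a given limit. The function name slots_above is hypothetical, and it assumes expansion-card interfaces follow the eth<slot>-<port> naming seen in eth3-##, with VLAN suffixes already stripped.

```shell
#!/bin/sh
# Hypothetical helper: read interface names on stdin, print those in a slot above $1.
# Assumption: expansion-card interfaces are named eth<slot>-<port>, e.g. eth3-01.
slots_above() {
    max_slot="$1"
    # sed prefixes each matching name with its slot number; other names (bonds, Mgmt) are dropped.
    sed -n 's/^eth\([0-9][0-9]*\)-[0-9][0-9]*$/\1 &/p' |
        awk -v max="$max_slot" '$1 > max { print $2 }'
}

# Example with made-up names and a two-slot box.
printf '%s\n' bond1 eth1-01 eth3-02 | slots_above 2   # prints eth3-02
```

Any name printed here would need 'vsx_util change_interfaces' before the swap.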

If you're going to a different version (I don't know if the 7000 can run R80.30), you will have to use 'vsx_util upgrade' on the management to update the object definitions to the new version.

If all of your interfaces can exist on the new boxes, you need to make your bonds, run through the initial config (either via web UI or config_system), shut down the member you are replacing, use 'vsx_util reconfigure' on the management to have the new physical box take over the old object, then apply any dynamic routing configuration to the VSs. The process is then repeated for the other member.

If all the boxes are running the same version, you can also add the new members to the cluster with 'vsx_util add_member' and remove the old members from the cluster with 'vsx_util remove_member'. This is commonly done if you have hardware information in your hostnames.

Mark_Tremblay · ‎2021-07-22

Ok, I see where you were going with the initial interface question. We are using the bond interface and all of the eth1-xx interfaces. We were not planning on upgrading so we will have to check and make sure the 7000s can run with R80.30. Which method do you see used more often - "vsx_util reconfigure" or "add_member" then "remove_member"?

Thanks again!

RamGuy239 · ‎2021-07-22

The Check Point 7000 appliances can't run the regular GAiA R80.30 release. The CPU in the 7000 series requires the 3.10 kernel, and R80.30 only has the 2.6 kernel. There is a special release of R80.30 with the 3.10 kernel that can be used:

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

But this is not a widely adopted or recommended release. These special 3.10 kernel releases of R80.20 and R80.30 exist because some newer appliances and open servers required the 3.10 kernel for their CPUs, while the 3.10 kernel for regular gateway releases was pushed to R80.40.

R80.40 is the first regular release of GAiA that features the 3.10 kernel for gateways. I would recommend going with R80.40, R81 or R81.10 instead of the limited R80.30 3.10 release.
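
To see which kernel line a given box is on, the output of uname -r can be checked. The helper below is just a sketch: the function name is hypothetical and the release strings in the example are illustrative, not captured from real appliances.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel release string is 3.10 or newer.
# With no argument it checks the running kernel via uname -r.
kernel_is_310_or_newer() {
    release="${1:-$(uname -r)}"
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%.*}             # text before the next dot
    minor=${minor%%[!0-9]*}       # drop any trailing non-digits such as "-foo"
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

# Example with illustrative release strings.
for r in 2.6.18-92 3.10.0-957; do
    if kernel_is_310_or_newer "$r"; then
        echo "$r: 3.10 or newer"
    else
        echo "$r: older than 3.10"
    fi
done
```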

Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME

Bob_Zimmerman · ‎2021-07-22

Oof. Yeah, I would not recommend running a special release. That means this swap will involve an upgrade at the same time as a member replacement. Fortunately, that's not a big deal, as member replacements are the recommended way to upgrade VSX firewalls anyway! Here are the rough steps:

1. Take a backup of your management server. For a SmartCenter, 'migrate export' works. For an MDS, mds_backup.
2. On the management, use 'vsx_util upgrade' to update the objects on the management server to the new version. This also builds the policy for the new version, which usually takes a few minutes per context. This doesn't push the policies, it just compiles them for the reconfigure. You won't be able to push after this until you upgrade at least one member.
3. Set up the replacement member 1. I would go with R80.40, as any version which can manage R80.30 can also manage R80.40 (though it might require a jumbo on the management). Go through the first-time setup (either web UI or via config_system), update CPUSE, pick a jumbo and install it.
4. Shut down the old member 1 and plug the cables into the new member 1.
5. Use 'vsx_util reconfigure' on the management to establish SIC with the new member 1 and have it take over the object. This will push all the interface definitions and static routes down to the new member 1.
6. Apply any dynamic routing configuration to the VSs.
7. Here, you can do whatever kind of failover you want. MVC allows R80.30 and R80.40 to sync. You can also just tell people that ongoing traffic won't survive the upgrade, kill the old member 2, and the new member 1 will take over.
8. Repeat steps 3, 4, 5, and 6 for member 2.

For an upgrade, it's recommended to combine steps 3 and 4. You shut down the old member 1, then reinstall the OS from scratch and treat it like a whole new member. It's basically the same process (minus the 'vsx_util upgrade') to replace a failed member, too.
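
The order of operations above can be laid out as a printed checklist. The sketch below runs nothing against the management or the firewalls; it only echoes the sequence so it can be pasted into a change plan. The function name and the member names are placeholders, and the vsx_util commands themselves are still run by hand.

```shell
#!/bin/sh
# Sketch only: prints the order of operations for a member-by-member swap.
# Member names passed as arguments are placeholders, not real object names.
print_swap_plan() {
    total=$#
    i=0
    echo "management: take a backup (migrate export or mds_backup)"
    echo "management: vsx_util upgrade"
    for member in "$@"; do
        i=$((i + 1))
        echo "$member: build the replacement (first-time setup, CPUSE update, jumbo)"
        echo "$member: shut down the old box and move the cables"
        echo "management: vsx_util reconfigure for $member"
        echo "$member: apply dynamic routing configuration to the VSs"
        # A failover is only needed while an old member is still carrying traffic.
        if [ "$i" -lt "$total" ]; then
            echo "cluster: fail traffic over to the new $member before continuing"
        fi
    done
}

print_swap_plan member1 member2
```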

Mark_Tremblay · ‎2021-07-22

Thanks guys for all the info. I've got to wrap my head around all of these steps and put together a plan.

CheckPointerXL · ‎2023-12-14

Hey Bob, any news on doing a hardware upgrade with the newer R81.xx versions?

Is the suggested plan still to replace the standby member, run vsx_util reconfigure, apply the clish configs, cpstop, and so on, then do the same with the remaining member? Do you suggest combining the hardware and software upgrade, or performing them separately?

Bob_Zimmerman · ‎2023-12-15

I always prefer to separate software upgrades from hardware swaps, but a software upgrade on VSX is a special case. It's generally best on VSX to completely rebuild the cluster members as you "upgrade" them. As long as VSX is only aware of interfaces which can exist on both the old hardware and the new hardware, you should be fine. I still prefer to separate them when easy, but it's a milder preference. For non-VSX firewalls, the preference to separate software upgrades from hardware swaps is much stronger because a lot more can go wrong.

I'm pretty aggressive on software upgrades, in part so I don't need to combine a software upgrade and hardware swap. I'm running R81.20 on about 15 clusters, and R81.10 on about 50. I expect to have everything on R81.20 by the end of February. By staying very current on all of my firewalls, a replacement box is likely to be able to run exactly the same software as an older box I'm replacing.

Keep this set of management debugs handy. They let you make changes to the VSX object and to VSs when the VSX members are unreachable. For example, if you rebuild one member and discover you can't use 'vsx_util reconfigure' to reprovision it because the management expects an interface the hardware doesn't have, those debugs can let you fix the management's expectations. After you make changes in this way, you ABSOLUTELY MUST reprovision the firewalls, but if you're upgrading, that's the next step anyway.

CheckPointerXL · ‎2023-12-16

Thank you very much Bob

So assuming all interfaces are bonds and there is no software upgrade, is it good practice for a hardware upgrade/replacement to follow the instructions in sk101515? In my opinion it should work...

Magnus-Holmberg · ‎2023-12-16

The admin guide states:

"Note - All VSX Cluster Members must use the same type of platform, with the same specifications and configuration."

When I have done hardware upgrades, I have installed the new cluster in parallel and used VSX provisioning to provision the same VSs.
Then I just shut down the traffic interfaces to the old cluster and open up the VLANs to the new cluster.
Yes, there is a short downtime, but rollbacks, troubleshooting, pre-verification, etc. are all much easier.

Some downsides of doing it this way:
- Licenses within DMN need to support multiple VS.
- Names can't be the same.
- VPN communities need to be moved within SmartConsole.
- Less than 30 seconds of downtime.

https://www.youtube.com/c/MagnusHolmberg-NetSec

CheckPointerXL · ‎2023-12-16

Hey Magnus, thanks for your feedback

I think what the admin guide says is the same recommendation as for a classic Gaia cluster. In practice, having another machine with the same or higher CoreXL should reduce downtime to nearly zero during the replacement/upgrade.

What you usually do seems to require too much effort for my taste.

Maybe simply adding a third member to the cluster is a quicker solution? Of course, you would have to accept a new physical name and management IP, because you can't inherit them from the production members.
