Luis_Miguel_Mig
Advisor

Gateway Cluster Hardware Upgrade

I am upgrading the hardware of a cluster made up of two open server gateways. The management server has a 10-gateway license and already manages 10 gateways.

Is it possible to have a cluster made up of two gateways with different hardware?

So what process would you recommend for the migration?
I was thinking of three options:


1) Shut down one of the old gateways, connect one of the new gateways with the configuration of the old gateway, establish SIC, push policies and fail over. Finally, repeat the process for the second old gateway.

Is this possible, given that we will have a cluster made up of gateways with different hardware?

2) Add the two new gateways with new IP addresses (the cluster will be made up of four gateways at this stage), fail over to them and shut down the old gateways.

Is this possible, given that we will then have 12 gateways but a license for only 10?

3) Shut down the old gateways, connect the new gateways, establish SIC and push the policies.
This is the least preferred procedure as it will require an outage.

29 Replies
Vladimir
Champion

You can ask CP for temporary licenses that will allow you to manage more gateways. I am sure they will accommodate. 

Kaspars_Zibarts
Employee

I would go with option 1. Depending on the release you are running, you can actually achieve a seamless upgrade. See sk107042, "ClusterXL upgrade methods and paths".

HW-wise (assuming you are on a fairly recent SW release like R77.30 or R80.10), it all depends on whether you use CoreXL. If you do and are changing the number of FWK instances, then connection sync is not possible. You also have to take care of interface naming (with open servers you can keep the same interface names, which makes life easier).

The easiest approach is to keep the same SW level on the new members, but with the latest SW releases it is also possible to upgrade to a newer version during the process fairly seamlessly.

We have gone from appliances to chassis on a VSX gateway (downgrading from R77.30 to R76) with a single ping packet lost.

Too many open questions to give you an exact answer :) but it's not complicated.

Good luck!

Kaspars_Zibarts
Employee

One comment that I forgot: you can allow "out of state" connections for the cutover window to minimise the outage. Once you have built the second member, and just before pushing policy, set the policy to allow OOS connections. The failover to the new member will then be less noticeable. Re-instate the setting by pushing policy again once you are running on the new firewall.

Kaspars_Zibarts
Employee

Here are the steps, not in absolute detail but enough to tweak them to your requirements. They have been tested on four different cluster HW+SW upgrades with one ping lost.

The assumption is that interface names do not change and you use the same physical cables to the switches! You will need extra steps to amend those if they do. I have excluded obvious steps like backups.

Preparation:

  1. Pre-build both new firewalls with exactly the same OS configuration as the old ones (routes, interfaces, users, backups, DNS, NTP etc.)

Start of the upgrade:

  1. Disable stateful inspection in Global Properties to allow "out of state" connections during cutover (connections cannot be synchronized if the CoreXL configuration is changing)
  2. Set "Maintain current active member" on the ClusterXL tab of the cluster object if it is set otherwise
  3. Un-tick the box that forces policy installation to succeed on both members (allow only one to succeed) - this is needed while the members run different SW versions
  4. Push policy to both existing firewalls

FW2 upgrade:

  1. Run cpstop on OLDFW2 (do not shut it down, as it's easier to roll back with cpstart)
  2. Connect the cables from OLDFW2 to NEWFW2
  3. Establish SIC to NEWFW2
  4. Change the SW version in the cluster object
  5. Push policy - it should only succeed on NEWFW2 (the old member has a different SW version)
  6. Do your final checks of choice - throughput / connections / ping through the FW etc. (we run scripts to collect that; see the command sketch after this list)
  7. cphaprob stat should show the state Ready on NEWFW2
  8. Fail over to the new firewall by running cpstop on OLDFW1
  9. Check that NEWFW2 becomes Active with cphaprob stat
  10. Do your testing now
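
A rough sketch of the checks in steps 6-9, run from expert mode on the new member (hostnames and the ping target are placeholders; the exact checks you run may differ):

  cphaprob stat               # member should show Ready before the cutover, Active after
  fw ctl pstat                # includes sync statistics - verify sync is working
  fw tab -t connections -s    # rough count of connections the member is handling
  ping <host-behind-the-fw>   # basic traffic check through the gateway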

FW1 upgrade:

  1. Connect all cables from OLDFW1 (which is in cpstop state) to NEWFW1
  2. Establish SIC to NEWFW1
  3. Push policy and make sure it now succeeds on both cluster members
  4. cphaprob stat should show the state Standby on the new firewall NEWFW1
  5. Fail over to the new firewall by running clusterXL_admin down on NEWFW2 (see the sketch after this list)
  6. Check that NEWFW1 becomes Active with cphaprob stat
  7. Do your checks again
  8. Re-enable ClusterXL on NEWFW2 with clusterXL_admin up
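
For steps 5-8, a minimal sketch of the controlled failover (expert mode; NEWFW1/NEWFW2 follow the naming above):

  # on NEWFW2 - take it administratively down so NEWFW1 takes over
  clusterXL_admin down
  cphaprob stat               # confirm NEWFW1 is now Active
  # ... run your checks on NEWFW1 ...
  # back on NEWFW2 - return it to the cluster once the checks pass
  clusterXL_admin up
  cphaprob stat               # one member Active, the other Standby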

Finalise:

  1. Re-enable stateful inspection in Global Properties (turn off allowing out-of-state connections)
  2. Reset the cluster object's ClusterXL active-member setting to the original value
  3. Set policy installation back to both cluster members
  4. Push policy
  5. Check and update licenses in SmartUpdate
  6. Check sim affinity for SecureXL (see the sketch below)

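For step 6, a minimal sketch of the affinity checks I would run on each new member (expert mode, syntax from the R77.x/R80.10 era discussed here; newer releases consolidate this under fw ctl affinity):

  sim affinity -l             # SecureXL SND interface-to-core assignments
  fw ctl affinity -l -a       # affinity of interfaces, daemons and kernel instances
  fw ctl multik stat          # CoreXL instance count and load overview
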
And that's it - go and enjoy your beer! :)

ROLLBACK
Connect all cables back to the old firewalls.
Connect with SSH and run cpstart on both.
Re-enable stateful inspection in Global Properties.
Reset the cluster object's ClusterXL active-member setting to the original value.
Set policy installation back to both cluster members.
Check and update the appliance version in the GUI.
Push policy.

Luis_Miguel_Mig
Advisor

Thanks Kaspars, I hadn't thought of the OOS. Good idea.

I was thinking about your VSX implementation. I was wondering how you would design the network interfaces. In a Check Point cluster you typically have three separate types of interfaces: cluster interfaces, non-monitored private interfaces and sync interfaces.

Did you keep these interfaces separate in a VSX environment where the VSX gateway runs on two separate physical boxes? I guess that in a VSX environment it is still a good idea to have separate physical interfaces for sync, cluster and non-monitored (mgmt) traffic if possible.

Kaspars_Zibarts
Employee

Hey Luis, I'm not entirely sure I understood your question about VSX. Typically I take a slightly different approach with VSX HW+SW upgrades:

LAB part

  1. Change VSLS to run all VSes active on one box with vsx_util vsls; set VSX2 with higher priority (it's needed later so the box does not fail over back to VSX1). The vsx_util commands used throughout are sketched after this list.
  2. I would have an MDS (management) replica in a lab environment - do a data freeze in production and restore the production MDS data in the lab
  3. Pre-build the new VSX gateways with the physical interfaces and other OS settings as required
  4. Upgrade / change the VSX object version using vsx_util upgrade if you are changing the gateway SW version
  5. Change interface names using vsx_util change_interfaces on the MDS if required
  6. Push out the VSX config using vsx_util reconfigure
  7. Verify licenses
  8. Change CoreXL if required
  9. Depending on your VSX environment, set all policies to allow OOS connections if CoreXL has changed
  10. Now your two new boxes are fully pre-configured!
  11. Create an MDS backup in the lab
  12. Rack the new VSX gateways in the production racks and power them on (no cables)
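
The vsx_util commands referenced above are all run from the MDS / management server in expert mode and prompt interactively for the management IP, administrator credentials and the VSX cluster object; a bare sketch:

  vsx_util vsls                 # step 1: manage VS distribution / set all VSes active on one member
  vsx_util upgrade              # step 4: change the VSX object's gateway version
  vsx_util change_interfaces    # step 5: rename physical interfaces in the management DB if needed
  vsx_util reconfigure          # step 6: push the full VSX configuration to a (re)built member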

PROD part

    1. Restore the MDS backup from the lab to prod (at this point you will lose control over your VSX cluster)
    2. Run cpstop on VSX2 and move all cables to NEWVSX2
    3. Test SIC (it should be working) and make sure all VSes are trusted. NEWVSX2 will be in the READY state
    4. Do a hard cutover by running cpstop on VSX1
    5. Connections should keep working as you have OOS allowed
    6. Do your checks on NEWVSX2 (see the sketch after this list)
    7. Now move the cables to NEWVSX1
    8. Test SIC (it should be working) and make sure all VSes are trusted. NEWVSX1 should be in the STANDBY state
    9. Do a hard cutover by running cpstop on NEWVSX2
    10. Do your checks on NEWVSX1
    11. Re-enable NEWVSX2
    12. Check licenses
    13. Check logs
    14. Set VSX1 to have higher priority if needed
    15. Turn off OOS

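For the "do your checks" steps, a minimal sketch of what I would run on the new member after each cutover (expert mode; the VS ID is just an example):

  cphaprob state              # overall member state (Ready / Active / Standby)
  vsenv 2                     # switch to the context of VS 2
  cphaprob stat               # cluster state for that VS
  fw tab -t connections -s    # connection count inside that VS context
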
ROLLBACK

  1. Run cpstart on the old firewalls and plug the cables back in
  2. Restore the production MDS backup on the prod MDS

That's to give you an idea of the approach I have been using for years now. You will need a lot of small tweaks to suit your environment.

Again, you can always PM me :)

Merry Xmas!

Luis_Miguel_Mig
Advisor

It is about the network interfaces. With a physical appliance you would typically have dedicated interfaces for mgmt and sync, and then cluster interfaces with multiple VLANs for data. I was wondering whether you keep that design with dedicated interfaces in a VSX environment, or end up with sync, mgmt and data on the same trunk.

Luis_Miguel_Mig
Advisor

Hi Kaspars, to send you a PM I need you to follow me.

GDell_CP
Participant

Hi Kaspars,

 

We don't have a lab environment and we are planning to upgrade our VSX cluster from R77.20 to R77.30. However, we also need to replace the CP 4200 GW appliances with 4800s. We have bonding (bond0) configured with interfaces that have different names between the 4200 and the 4800. We are running MDS R80.10 and I was thinking:

 

  1. In production, move all VSes to OLDGW2 using vsx_util vsls
  2. In the R80.10 SmartConsole, add a new bond (bond1)
  3. From the MDS, run vsx_util change_interfaces and select "2. Apply changes to the management database only"
  4. Select to replace bond0 with bond1
  5. Set up the new 4800 GWs with R77.30 and bond1, including the new interfaces in the bond
  6. Upgrade the VSX cluster to R77.30 via vsx_util upgrade on the MDS
  7. Disconnect OLDGW1 and connect NEWGW1 with the same mgmt and sync IPs and the new bond1 interfaces
  8. Run vsx_util reconfigure and select to reconfigure OLDGW1 (but NEWGW1 is physically connected)
  9. Disable the "Drop out of state" enforcement (allow OOS) and push the cluster policy to one gateway
  10. Perform a hard cutover and check traffic
  11. Disconnect OLDGW2 and replace it with NEWGW2 running R77.30
  12. Perform a vsx_util reconfigure on OLDGW2 (but NEWGW2 is physically connected)
  13. Push policy on both and run VSLS to distribute the VSes across the firewalls

My main concern is the bond that needs to be replaced with another bond that has different interface names between the two appliances.

 

George

Kaspars_Zibarts
Employee

Actually it's easier than that, if the only interfaces in use are Mgmt, Sync and bond0 (the only production interface).

You don't need to rename it as part of the upgrade, as the management server does not care which interface names form the bond. For example:

OLDGW: bond0 = eth1+eth2

NEWGW: bond0 = eth1-01 + eth1-02

You don't need to worry about the eth1 > eth1-01 and eth2 > eth1-02 change, as that's invisible to the VSX object - it only "sees" bond0 :)
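
If it helps, a minimal Gaia clish sketch of rebuilding bond0 on the new gateway from the new interface names (interface names and bond mode are just examples matching the case above; adjust to your own setup):

  add bonding group 0
  add bonding group 0 interface eth1-01
  add bonding group 0 interface eth1-02
  set bonding group 0 mode 8023AD    # assuming LACP; use your existing bond mode
  save config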

Otherwise it should work! Good luck

GDell_CP
Participant

Aaaah,

So just remove the interfaces that do not exist on the new GW, add those that are missing, and then do a vsx_util reconfigure.

Can I run cphacu start to move traffic over, or just do a cpstop on the old GW so traffic fails over to the new GW?

George Dellota

Kaspars_Zibarts
Employee

Not too sure I understood you correctly, so it might be easier if you sent a screenshot of your VSX object's physical interfaces.

For example: if the interface names inside bond1 and bond2 changed on the new appliance, it would not matter - you don't need any special steps during the upgrade (no vsx_util change_interfaces).

But it would matter if the eth2-0x interface names changed; then you would run vsx_util change_interfaces just as you described.

BTW, I haven't had time to dig into it, but we had to run the vsx_util change_interfaces command on the same interface twice, despite the fact that it reported successful completion the first time round. I discovered this accidentally by searching for the old interface name after running the command the first time and finding some references still present in the DB. Running the command a second time actually "fixes" it; I wish I had an explanation. The second time it even says that the previous run did not fully complete - do you want to complete it? Answer yes.

Just remember to back up your mgmt before you start for easy rollback! :)

GDell_CP
Participant

Okay, I understand what you mean about the interfaces within bond0. So this is how our current 4200 appliance interfaces look:

When I replace the cluster with a 4800, I’ll configure each GW with bond0 and add eth1 and eth2.

I would have to do a vsx_util change_interfaces (twice) to delete eth1-02, eth1-03 and eth1-04 and add the new interfaces eth4-eth7, and then do a vsx_util reconfigure to "sync" the MDS and GW settings.

I uncheck “Drop out of state TCP packets” in the global properties before the swap of OLDGW2

Do a hard cut on the running OLDGW2 and traffic (hopefully) fails over to NEWGW1 and redo the previous steps on NEWGW2.

Sounds about right?

Kaspars_Zibarts
Employee

Sounds about right! :)

Rikus_van_Tond1
Explorer

Kaspars' plan seems very solid. Just a quick question: if you swap out a Check Point cluster with an open server, what happens with the interfaces? Your active member stays original, as it's the old Check Point device; when you click "Get Interfaces" to fetch the new interface configuration from the open server, it will be different from the active/standby members'. Is it fine to temporarily have different interfaces in SmartDashboard?
Rikus_van_Tond1
Explorer

Hi Opal,

 

If you replace a 12600 with a Check Point open server, the interface names of the two appliances might be different; for example the Check Point appliance might use modules eth1-01 / eth3-02 etc. What will happen midway through, when you install the new standby, move cables from the existing cluster's eth1-01 to eth3 (open server) and establish SIC? When you then go into the topology table and "get interfaces", will it fetch the new eth3? That will now also be mismatched with the active member.

PhoneBoy
Admin

Clustering is only supported with identical hardware.

You should be able to get a temporary license from either UserCenter or your Check Point SE to support the management of additional gateways.

Luis_Miguel_Mig
Advisor

Thanks Dameon,

Absolutely, it makes sense to support clustering only with identical hardware. But what about when the open servers require a hardware upgrade? I guess Check Point supports, or at least should support, one procedure, right? Is there any other procedure that Check Point would recommend?

Option 1) may not be ideal, but I haven't been able to come up with anything better. I think 1) may be best in terms of minimizing the service outage and also providing an easy/quick rollback if required.

PhoneBoy
Admin

Sync won't work (or could potentially have unexpected behavior) unless the CPUs in the different systems are identical.

Assuming they are different, the only way to swap things out with minimal disruption is to temporarily disable the "Drop Out of State" options before the gateways are physically swapped.

You would disable this before swapping and leave it set for maybe 24 hours afterwards to allow long-standing connections to re-establish.

Similar to the following thread on CPUG: Zero downtime upgrade? 

Note: This setting is not recommended long-term as this reduces the overall security of your gateways.

For TCP/ICMP, these are set in the Global Properties.

For UDP, refer to the following SK (note it's an "Expert" level SK, so you may not have access): How to configure the Security Gateway to drop Out of State UDP packets 

 

Takumi_Tsumura
Contributor

Hi, Daemon

R80.10 is not listed in the Version field of the SK.

Is this solution available in R80.10, too?

Takumi,

PhoneBoy
Admin

It should work the same in R80.10

Takumi_Tsumura
Contributor

Thank you.

I would be glad if you could add it to this SK.

PhoneBoy
Admin

You're welcome to leave feedback on the SK to this effect.

I did spot-check that this particular Global Property is available via GuiDBedit in R80.10 (fw_drop_out_of_state_udp).

Takumi_Tsumura
Contributor

Thank you.

I will try it.

Regards,

Ricardo_Sichera
Explorer

Hello,

Resurrecting this thread to ask a question:

Can you say whether a zero-downtime upgrade would be possible for an R80.30 13500-to-15400 appliance-only cluster migration? Both seem to show 16 cores when using cpview.

Also, referencing the solution provided in the thread below, could you clarify the procedure for minimal downtime?

https://community.checkpoint.com/t5/Enterprise-Appliances-and-Gaia/Migrating-cluster-from-old-to-new...

 

Thanks.

RS

CheckPointerXL
Advisor

Hello Phoneboy,

Is this still valid?

I mean, will a hardware replacement between devices with the same CPUs handle the failover from old to new hardware with no disruption, like a normal cluster failover? Another question: by "same CPU" do you mean the number of cores or the number of CoreXL/SecureXL instances?

thanks

 
PhoneBoy
Admin

ClusterXL is only supported with identical hardware for all cluster members.
It may work in situations where the hardware is "close enough" (same core count and SND/worker config).
This is not guaranteed and hasn't changed.

The alternative method I described, changing the "out of state" configuration, is not foolproof and comes with a security risk.

AlekseiShelepov
Advisor

As I understand it, you will migrate to a new open server (not a Check Point appliance). In this case you need to make sure that you have the same number of enabled cores (SND, fw_worker) on both servers, and preferably the same software version with the same hotfixes. A cluster like this should work fine, based on my experience.
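
A quick sketch of how to compare the CoreXL/SND layout on the old and new servers before trusting sync (expert mode; just the checks I would start with, not an exhaustive list):

  fw ctl multik stat          # number and load of CoreXL firewall instances (fw_worker)
  fw ctl affinity -l -r       # per-core view of interface (SND) and instance assignments
  cplic print                 # confirm the installed licenses on the new hardware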

Personally I would choose the first plan. Install policy, check the cluster status, check sessions, etc. If everything is fine, then fail over. If synchronization is not OK and sessions are lost, it would still be faster than plan 3 :)

But as Vladimir mentioned, evaluation licenses can help for the second plan.

Luis_Miguel_Mig
Advisor

Thanks Aleksei, good to know that you have tested a similar process/environment successfully.

