Zero Downtime Upgrade - R80.10 - R80.40
I will be following the sk for Zero Downtime Upgrade to upgrade my VSX Cluster from R80.10 to R80.40.
My confusion is about the following part:
In the Install Policy window:
In the Policy field, select the default policy for this VSX Cluster object.
This policy is called:
<Name of VSX Cluster object>_VSX
Now, I also have a VS that carries the traffic of my environment. In this step, do I need to install only the cluster policy, or the cluster policy plus the VS policy?
Also, in the following part:
Stop all Check Point services:
cpstop
Is this like a normal failover, where switching members causes a few timeouts while traffic shifts to the new member, so traffic should be back to normal after a few timeouts?
Good point there. I don't think the same process applies to VSX as to a regular firewall cluster. You might wish to check with TAC.
The "In the Install Policy window" part occurs multiple times in the process. Which one are you concerned about? It's specifically talking about the VS0 policy, which normally governs management access to the cluster members themselves. This policy is installed as a part of vsx_util reconfigure, but pushing after that is a good idea to get the policy rebuilt for the new version.
The failover when you stop services on member 1 would be a stateless failover. All ongoing connections will be lost, and new connections should work immediately. There is no time at which a new connection cannot be formed, thus zero downtime. If you want your upgrade to be more like a normal failover (to preserve long-running connections), you should look at the Multi-Version Cluster Upgrade.
Normal failovers shouldn't involve a few timeouts. The last several upgrades I have installed, nobody outside my team even noticed the change.
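To make the stateless failover described above concrete, here is a minimal toy model in Python. It is only a sketch: the `Member` class, the `fail_over` function, and the rule "state is copied only when versions match" are hypothetical simplifications for illustration, not Check Point code or an API, and MVC is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Toy cluster member (hypothetical model, not a Check Point API)."""
    name: str
    version: str
    connections: set = field(default_factory=set)

    def handle(self, conn_id: str, is_new: bool) -> bool:
        """Return True if the packet is accepted."""
        if is_new:
            # A new connection is always recorded and accepted in this toy model.
            self.connections.add(conn_id)
            return True
        # A mid-stream packet is accepted only if the connection is already known.
        return conn_id in self.connections


def fail_over(old: Member, new: Member) -> None:
    """Move traffic from old to new; state is copied only if versions match."""
    if old.version == new.version:
        new.connections |= old.connections


m1 = Member("M1", "R80.10")
m2 = Member("M2", "R80.40")
m1.handle("long-download", is_new=True)

fail_over(m1, m2)  # versions differ, so nothing is synchronized

download_survives = m2.handle("long-download", is_new=False)
new_session_works = m2.handle("new-session", is_new=True)
print(download_survives, new_session_works)  # False True
```

The existing download is lost because M2 never learned about it, while a brand-new session is accepted straight away, which is the sense in which the upgrade has "zero downtime".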
I would definitely go with the MVC upgrade. Plus, if you are running VSLS cluster mode as opposed to HA, you can fail over one VS at a time, which gives you better control over the upgrade.
I looked at the MVC upgrade, but my connections are static NAT based, and that is a limitation in MVC, hence going with Zero Downtime. I wouldn't mind a few dropped connections as long as they are restored in a minute or two.
I think you misunderstood the limitation. It only applies to failovers from R80.40 back to an earlier version, which should only happen if the upgrade breaks things anyway. It also only applies if you are using VMAC mode.
OK, got it.
One more thing here is putting me off.
At the bottom of this link, a note states:
When Cluster Members of different versions are on the same network, Cluster Members of the new (upgraded) version remain in the state Ready, and Cluster Members of the previous version remain in state Active Attention.
Cluster Members in the state Ready do not process traffic and do not synchronize with other Cluster Members.
This is the condition before MVC is switched on, and it will change once MVC is switched on. Is this correct?
Wouldn't this condition correct itself once MVC is enabled? Isn't it always the case during an MVC upgrade that an upgraded member is in the "Ready" state at first? But then why might the next steps be required, such as removing physical interfaces, shutting down interfaces, etc.?
Yes, that's correct. It specifically says to install the cluster object policy. My confusion is this: after upgrading the secondary member, I need to force a failover. In that case, the upgraded member should also have the VS policy so that it can handle the running VS traffic. But the sk says to install the cluster object policy, hence my confusion about whether only the cluster policy is to be installed, or both the cluster and VS policies.
Also, during normal failover testing, users didn't even notice that anything had gone wrong or changed. I just wanted to confirm that it is going to be the same in this case. Do you mean that during the upgrade, when the member is switched, it takes more time to build connections than in a failover scenario?
You are right: the document only refers to the VS0 policy. To be honest, I always install policy on all VSes just to be sure. It takes extra time, but I think it's worth it. 🙂
As for the failover, for damage control you can allow out-of-state connections before the upgrade and revert the setting afterwards. This way, if a TCP connection isn't synchronised but is still ongoing, it will be accepted and there will be no need to restart it (e.g. long-running jobs like backups).
Thanks... this looks helpful
For the policy question, it depends. 'vsx_util upgrade' changes the version of the VSX cluster object, all the physical member objects, all of the hidden VS member objects, and all of the VS cluster objects. You should install policy with the new version before failing traffic to a member (physical or VS) running the new version. If you're doing the VSLS trick, you only need to install the VS0 policy to get it updated, then you can install the individual VS policies as you are ready to fail them over.
As for the second part, a Zero Downtime Upgrade is not a normal failover. R80.10 can't sync the connection table with R80.40. Think of it as rebooting the firewall, but it comes back up instantly rather than needing to wait for POST, wait for OS startup, wait for service startup, and so on. If somebody is downloading a 100 GB file, and you do the Zero Downtime Upgrade when they have 99 GB, that connection will not survive the failover. They will have to start the download over again (fortunately, most applications have ways to recover from interrupted connections now, but some still don't).
Thank you.. this clears my confusion
That's what I meant by allowing out-of-state connections: then the download at 99 GB will continue.
Yeah, but that has some other concerns. Most notably, it's a global property, so it applies to all firewalls in the environment. Very few people run just one VSX cluster by itself under a management server, so this setting might get pushed to other firewalls completely unrelated to the upgrade.
Also, I don't think it adds ongoing connections to the connections table, it just doesn't drop them. This would deal with some long-running connections like the download or backup which eventually end, but some systems like ATMs often keep the same connection open for over a year with very little data. When you eventually switch the setting off, I think any connections like that will be dropped when you push policy.
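The concern above can be illustrated with a small Python sketch. It encodes the poster's stated assumption (that out-of-state packets are passed but not added to the connections table), which is not verified here. The `Gateway` class and its `allow_out_of_state` flag are hypothetical names for illustration only, not real Check Point settings or APIs.

```python
class Gateway:
    """Toy stateful gateway (hypothetical model for illustration only)."""

    def __init__(self, allow_out_of_state: bool = False):
        self.allow_out_of_state = allow_out_of_state
        self.table = set()  # known connections

    def packet(self, conn_id: str, syn: bool = False) -> str:
        if syn:
            # A connection opened normally is recorded in the table.
            self.table.add(conn_id)
            return "accept"
        if conn_id in self.table:
            return "accept"
        # Unknown mid-stream packet: passed only while the relaxed setting is on,
        # and (per the assumption above) NOT recorded in the table.
        return "accept" if self.allow_out_of_state else "drop"


gw = Gateway(allow_out_of_state=True)   # relaxed setting during the upgrade
during_upgrade = gw.packet("atm-session")  # connection predates this member
recorded = "atm-session" in gw.table

gw.allow_out_of_state = False           # setting reverted after the upgrade
after_revert = gw.packet("atm-session")
print(during_upgrade, recorded, after_revert)  # accept False drop
```

Under that assumption, a connection that outlives the relaxed window, like the ATM session, is dropped as soon as the setting is reverted, while one that finishes earlier, like a backup, is unaffected.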
Yes, that's totally correct, so you always need to evaluate it against your specific environment.
During upgrades from R80.10 to R80.30, I experienced a few active/active scenarios that were less fun.
So I try to be extra careful about those and actually turn off the production NICs to the VSX node (on the switch) to make sure that everything works correctly, keeping only sync and VS0 open.
I'm not 100% sure what the reason was anymore, but we experienced it on three cluster upgrades, and after that we just said "F*** it, let's make it bulletproof".
It's never 100% guaranteed. I saw a weird state even with only Mgmt and Sync connected during one of the latest rollbacks. 🙂
Plus, we are talking about R80.40, and it's a totally different beast from R80.30, hehe.
Hehe, I have no production VSX on R80.40 yet. 🙂
But I suspect it's a similar upgrade, as I'm running R80.30 with the 3.10 kernel.
@Magnus-Holmberg @Kaspars_Zibarts your conversation is making me nervous.. 😄
Hey Magnus, big fan of your YouTube content. Good to hear from you. 😀
Can an active/active scenario occur once the members are switched?
I don't fully remember the scenario, but I believe it was after we had made the failover with VSLS: one member was on R80.10 (possibly with a 32-bit VS), and then the second member came up with R80.30 (without any HFA) and 64-bit for the VS.
The members simply didn't see each other anymore, so both went active instead of being Active/Ready.
We didn't spend much time troubleshooting, as it was the middle of the night. Instead, when doing those jumps, we killed all the interfaces except VS0 and sync, so that even if the member went active it would not take any traffic.
Having said that, we recently tested upgrading R80.10 VSLS to R80.30 with CDT, and it worked perfectly.
"At this moment, all connections that were initiated through the old VSX Cluster Member M1 are dropped (because VSX Cluster Members with different software versions cannot synchronize)"
Does this also apply to Jumbo Hotfixes?
No. When you install a JHF, the members stay on the same main version, and they continue to sync without any additional effort.
