No Downtime (Zero Downtime) hardware refresh
Hi Mates,
I need some advice from you experts!
One of my customers is going to replace some 5000-series gateways, running in a classic HA (Active/Passive) cluster, with brand-new 9400 gateways.
How in the world can I do this with zero downtime without messing with the SND cores? Once I change them, there's no way to revert to 20/24 cores (or however many the 9400 has) without, again, downtime! I mean, I could just change the number of SND cores on one 9400, join it to the cluster (5x00 + 9400 with lowered cores), and it would be just fine, but err... my brain is in a boot loop and I can't figure it out!
The only way I see is to remove the standby member of the current 5000 cluster, add the new 9400 gateway, and try to be lightning fast at disabling ClusterXL on the 5000 when the 9400 becomes active.
Any ideas? (They will be highly appreciated.)
Thanks
Labels: ClusterXL
Accepted Solutions
Sadly, I doubt anyone can guarantee them that they will not lose a single packet. The last time I did this, no packets were lost, though I always see one timeout when we run a constant ping.
Andy
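To put a number on that "one timeout", a hedged sketch: save the output of a constant ping run through the cluster during the swap and look for gaps in the `icmp_seq` numbers. This assumes the default Linux iputils `ping` output format; the address and sample lines are made up, and each missing sequence number is only a rough stand-in for one ping interval of lost traffic.

```python
import re

# Assumes Linux iputils ping reply lines such as:
#   64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.41 ms
# Requiring "bytes from" skips "no answer yet for icmp_seq=N" lines (ping -O)
# and the "PING ... bytes of data." header.
REPLY_RE = re.compile(r"\d+ bytes from .*?icmp_seq=(\d+)")


def missing_sequences(ping_output: str) -> list[int]:
    """Return sequence numbers with no reply between the first and last reply."""
    replied = {int(m.group(1)) for m in REPLY_RE.finditer(ping_output)}
    if not replied:
        return []
    first, last = min(replied), max(replied)
    return [seq for seq in range(first, last + 1) if seq not in replied]


if __name__ == "__main__":
    # Hypothetical capture; replace with your own saved ping output.
    sample = """PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.41 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.39 ms
64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=0.52 ms
"""
    lost = missing_sequences(sample)
    interval_s = 1.0  # ping's default interval; adjust if you used -i
    print(f"lost probes: {lost} (~{len(lost) * interval_s:.0f} s of outage)")
```

Probes still unanswered when ping is stopped are not counted, since only the range between the first and last reply is checked.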
I would follow the process below. I have done it many times with no issues.
Best,
Andy
https://community.checkpoint.com/t5/Security-Gateways/Replace-Upgrade-Cluster/m-p/157228#M27268
Hey Andy.
I know about that, but I was thinking about something like MVC (Multi-Version Cluster), but for hardware. Believe it or not, the customer doesn't want a single packet or session to be lost 😞 A difficult one, but it is what it is.
I already did this once by messing with the SND cores, but that was a cluster of 7000 gateways doing about 2 Mbps with a "peak" of 8 Mbps :))) so I could afford to have 2 SND cores for everything.
This one is different, though: the 5400's CPUs are screaming, so I cannot mess with the 9400's SND cores.
I think I will let them know that there will be a short outage, and that's it: move the traffic to the other site and do the hardware upgrade.
Thanks!
I agree with Andy; we only ever promise 99.999%.
Akos
\m/_(>_<)_\m/
I think there is a saying in North America (well, maybe more specifically the USA, not sure about here in Canada) that goes "Only 2 things in life are guaranteed... taxes and death". Though, that's true no matter where in the world you go lol
Andy
Haha! That's a really good one!
Indeed, I once messed up a whole cluster in the middle of the day with a simple Accelerated Policy installation. Both members rebooted (kernel panic) at the same time! So... nothing is guaranteed (besides what you've already mentioned 🙂)
What was the version?
\m/_(>_<)_\m/
Some R81 (not R81.x0, just plain R81)... ancient times 🙂 It was the first time Accelerated Policy was implemented.
As another saying goes, "No point crying over spilled milk": all we can do is learn from our mistakes and not repeat them.
That's it 🙂
Andy
Let's see... I've messed up in the past with Fortinet, Palo Alto, Cisco, Check Point, haha. If life were perfect, none of us would have these jobs lol
Andy
It is wise not to guarantee zero downtime for such a swap.
For awareness, the devices also operate with different SecureXL modes by default.
Agree 100% 🙂
What is the current version on the 5000 cluster?
If the 9400 appliance has more cores than the 5000, this should be the better way to go. If the names of all configured interfaces match between the old and the new member, you should be able to disconnect the old standby member from the cluster (cpstop and/or shut all its ports), connect the new 9400 member, reset SIC, push policy, and it should go into the Standby state.
You can also disable the checking of out-of-state packets in Global Properties during the first failover.
The best option is to configure the new 9400 member in advance, using new cabling, so you do not have to play with cables during the change window. The new 9400 member will simply be cabled already, and its ports on the switch (or firewall) are disabled or enabled depending on which member you want to work with (old vs. new). During the replacement change itself, you just shut all ports of the old member, enable the ports of the new member, and that's it.
Jozko Mrkvicka
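Since this approach relies on the interface names matching between the old and the new member, a quick pre-check can be scripted. A minimal sketch, assuming you have already collected the configured interface names from each appliance; the names in the example are hypothetical, not the actual 5000 or 9400 port names.

```python
def compare_interfaces(old: set[str], new: set[str]) -> dict[str, list[str]]:
    """Compare configured interface names on the old and new cluster member.

    In this sketch, names present on the old member but absent on the new one
    are treated as blockers for a like-for-like swap; extra names on the new
    member are only reported.
    """
    return {
        "missing_on_new": sorted(old - new),
        "extra_on_new": sorted(new - old),
    }


if __name__ == "__main__":
    # Hypothetical interface lists; replace with what your appliances actually show.
    old_member = {"Mgmt", "Sync", "eth1", "eth2", "eth3"}
    new_member = {"Mgmt", "Sync", "eth1", "eth2", "eth1-01", "eth1-02"}
    result = compare_interfaces(old_member, new_member)
    print(result)
    if result["missing_on_new"]:
        print("Not a like-for-like swap, map these first:", result["missing_on_new"])
```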
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Since I literally keep all my emails and notes from ages ago, I checked a case from a client back in the R76 days, when they asked TAC this same question: how to ensure they would not lose a single packet. TAC's answer was that no one at Check Point could give them a guarantee like that.
I'm 100% positive that even if you opened a case nowadays and asked them this, they would most likely tell you the same.
Best,
Andy
I see there are lots of opinions already expressed here.
I just want to add a note from my personal experience. Even if you are confident you can perform an upgrade or hardware migration with only minimal downtime, announce an extended service interruption window beforehand. The unexpected happens, even to the best of us.
It is always better to tell the business there will be a service interruption and manage the procedure without it than hope for the best and miss it because of a random contingency.
Hey Val
Of course! My usual window for this kind of work is 30 minutes, and I like to tell my customers that even if I know everything will go smoothly, they still have to be aware that a full outage may occur during this time frame.
I've done lots of replacements, but this is only the second time I've been asked for "no downtime". It worked once 🙂 by messing with the SND cores, but now that's not possible due to the high traffic passing through the gateways.
So in the end, the customer has to be aware that even a policy installation can go wrong!
I would say 30 minutes is a bit too short; maybe at least 60, or even 90 minutes if possible.
Andy
I agree with Andy, and don't forget the revert process and how long it takes.
If you get stuck somewhere in the process (15 min) -> you start debugging (30 min) -> no success -> decision point (10 min) -> you decide to revert -> the revert process (30 min)
Akos
\m/_(>_<)_\m/
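Adding up that worst-case timeline shows why a 30-minute window is tight. A small sketch that sums the phases (the durations are Akos's estimates, not measured values) and checks the total against the window sizes mentioned in this thread:

```python
# Worst-case phases from Akos's post, in minutes (estimates, not measurements).
PHASES = [
    ("stuck somewhere in the process", 15),
    ("debugging", 30),
    ("decision point", 10),
    ("revert process", 30),
]


def worst_case_minutes(phases: list[tuple[str, int]]) -> int:
    """Sum the duration of every phase."""
    return sum(minutes for _, minutes in phases)


if __name__ == "__main__":
    total = worst_case_minutes(PHASES)
    for window in (30, 60, 90):  # window sizes discussed above
        verdict = "fits" if total <= window else "does NOT fit"
        print(f"{window}-minute window: worst case {total} min {verdict}")
```

With these numbers the worst case is 85 minutes, so only the 90-minute window covers a full revert.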
