Hi all,
OS: Gaia R81.20
Environment: Maestro + VSX
We have two management servers running in active-passive HA; both are VMs running on vCenter.
The vCenter servers are in two physically separate locations: one is the primary data center and the other is the DR site.
In case of failover to the DR site, the entire vCenter environment will be available there, including the Check Point Management server,
since it's continuously replicated as a hot backup.
Since there's plenty of redundancy through vCenter, is there any point in also having a secondary Management server in this case?
Or did I miss something...
Thanks in advance!
It entirely depends on the types of failures that you are attempting to guard against and what interdependencies / risks you choose to accept.
To me personally, and again it's just my honest opinion, I would never bother with mgmt HA in such a scenario, because if there is constant replication on the vCenter side, you don't really have a need for another server.
Just my 2 cents.
Andy
If I understand the question correctly, you're asking whether a second management server is required for each data center?
Then my answer is: No.
We also only have one MDS per data center.
One thing to keep in mind is the CRL check. The default is 24 hours. This is for VPN tunnels only, from Check Point gateways towards other Check Point gateways on the same mgmt! If the mgmt is down for too long, firewalls cannot do the CRL check. (The CRL check can be disabled, but that's not secure.)
HA mgmt could be handy if you have frequent changes on the system. If the system is allowed to be down for a couple of hours, I would not invest in HA mgmt.
It entirely depends on the types of failures that you are attempting to guard against and what interdependencies / risks you choose to accept.
Thank you Chris!
1. Could I please trouble you to elaborate on why redundancy is more important with VSX than in other environments?
2. We're using SRM, not vMotion, so I don't think that should be a problem.
3. There are a couple of other teams, such as DevOps and SysOps, who will be involved in a disaster recovery.
4. The machines are all in the same dedicated subnet.
Thanks again!
Many types of changes on VSX systems must be done from the management server and pushed down to the firewall. This includes most changes to logical interfaces (building a new one, removing one, changing the VLAN, changing the IP or mask, etc.), and all changes to static routing. If your disaster involves things being unable to reach each other, fixing it could require interface or routing changes.
I've personally seen way too many situations where the VM environment broke catastrophically and we needed to make extensive firewall changes to fix it. As a result, I don't trust having everything under one hypervisor management system, including vCenter. Since admins have to go through my firewalls to get to the hypervisor management system, I run management HA in VMs managed by two totally separate hypervisor managers, and I personally consider anything less an existential threat to the environment.
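Not from the thread, just to make that dependency concrete: any scripted VSX or policy change has the same shape, in that it authenticates to the management server's Web API and publishes there, never talking to the gateway directly, so if the management has vanished, even the login step fails. The hostname and credentials below are placeholders, and the VSX-specific provisioning calls (which vary by version) are deliberately left as a comment.

```python
# Minimal sketch (placeholders, not this thread's environment): every scripted
# change goes through the management server's Web API, so nothing here works
# while the management is unreachable.
import requests

MGMT = "https://mgmt.example.local"   # assumption: management server address

# Log in -- this is the step that fails outright if the management is down.
login = requests.post(f"{MGMT}/web_api/login",
                      json={"user": "admin", "password": "..."},
                      verify=False)
login.raise_for_status()
headers = {"X-chkp-sid": login.json()["sid"]}

# ... the actual VSX interface/route or object changes would go here; the
# exact API commands for them are version-dependent and omitted ...

# Publish so the change can then be installed on the gateways.
requests.post(f"{MGMT}/web_api/publish", json={}, headers=headers, verify=False)
requests.post(f"{MGMT}/web_api/logout", json={}, headers=headers, verify=False)
```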
To me personally, and again it's just my honest opinion, I would never bother with mgmt HA in such a scenario, because if there is constant replication on the vCenter side, you don't really have a need for another server.
Just my 2 cents.
Andy
Out of curiosity: what is MVP?
I know in sports it stands for most valuable player, but I believe in the community context it means most valuable professional... I THINK :-)
Andy
Thank you!
Just wanted to see if I'm missing anything...
If I understand the question correctly, you're asking whether a second management server is required for each data center?
Then my answer is: No.
We also only have one MDS per data center.
Not quite...
It's one domain: one production data center and one backup data center.
There's full and constant replication between both sites.
The question is whether we need one virtual MGMT server in each site, or whether one at the production site is enough, since it's fully backed up at the DR site.
One thing to keep in mind is the CRL check. The default is 24 hours. This is for VPN tunnels only, from Check Point gateways towards other Check Point gateways on the same mgmt! If the mgmt is down for too long, firewalls cannot do the CRL check. (The CRL check can be disabled, but that's not secure.)
HA mgmt could be handy if you have frequent changes on the system. If the system is allowed to be down for a couple of hours, I would not invest in HA mgmt.
This is very interesting, and indeed I did not think about it.
Won't all CRL info be replicated to the backup machine?
If there's a loss of connectivity between the tunnel peers for 24 hours, then we're in pretty bad shape as it is...
Agree with @Bob_Zimmerman. You need to review your RPO/RTO policies. How long is Site A "down" before you declare "DR Event"? Then when "DR Event" is declared, how long will you need to get basic networking services online? How much is currently offline, and how long before basic network services are online? During the state change are any firewall/VPN/routing changes required?
Plus, having the HA mgmt in Site B allows you to do general maintenance on Site A without worries, or do your dry(-ish) run exercises. Just because a CRL check is "every 24 hours", keep in mind that the last CRL check was not "24 hours ago from this moment in time." The last CRL check was (for example) 14 hours ago. You only have 10 hours remaining before that next CRL check! Don't go with "ok, we got 24 hours; good enough". This is a common fallacy. Nope, you're already 14 or 19 or 23 hours into that last retry.
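The "you don't actually get 24 hours" point is just arithmetic. As a rough sketch (the 24-hour validity is the default mentioned earlier in the thread, and the 14-hour offset is the example from this post):

```python
# Rough sketch of the point above: the clock on the CRL validity window
# started at the last successful check, not at the moment the management died.
from datetime import datetime, timedelta

CRL_VALIDITY = timedelta(hours=24)                       # default mentioned above
last_crl_check = datetime.now() - timedelta(hours=14)    # example from the post

mgmt_down_since = datetime.now()
time_left = (last_crl_check + CRL_VALIDITY) - mgmt_down_since
print(f"Roughly {time_left} until CRL fetches start failing, not 24 hours.")
```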
All of these tiny details are always overlooked when people do "DR planning". I see it ALL. THE. TIME. No one ever understands what the "D" in "DR" is.... until it happens. You need to plan on this with the expectation that your management server has vanished and is unrecoverable. Thanos just snapped it out of existence. Now what are you going to do? Has your SAN or SRM been Thanos-snapped, too? You need to plan as if an asymmetric 50% of your infrastructure disappeared.
Tabletop exercises are great, and each time you do, you need to use different Choose Your Own Adventure paths. I absolutely positively would not rely on vCenter to be your DR plan for the things that are responsible for your network and perimeter OAM services. vCenter requires ESX, and SAN, and iSCSI connectors via fabric connectors (be it Ethernet or FibreChannel or whatever). If you have an entire vCenter/ESX/SAN stack in Site B, that's fine. Just don't plan to "move Site A to Site B during DR event" (had a customer try that... they underestimated).
You said you had a "hot backup" and that is FANTASTIC! 👏👏 I always recommend having OAM things be hot in Site B. Even if you don't need a policy change, you will have your logs! You will have your visibility, and when everyone comes screaming at you about "The Firewall", you have logs to prove "nope, it isn't me." Even better, you will have a jump host to SSH to your firewalls in Site B. You have the BGP routes to the ISP and local LANs. You'll have your backup local VPN user, too. You have access, you have your things at the ready. Everyone is coming to you for those logs or to troubleshoot routing, VPN, etc. But you can play it cool, because you had a hot management server at the ready. 8) Let the server team scramble and fall over themselves trying to figure out why they had a misconfigured VLAN on vCenter. 🫣
What a great explanation, @Duane_Toler!
For that matter, what even gets detected as an outage? I had one a while ago in which a SAN filled up and all the VMs acted like their drives had been pulled. They were still up and on the network! They could respond to ping. If you sent them a SYN, they would reply with a SYN-ACK. Even stuff in the RAMdisk image worked! Nothing else did, though.
Remote access to the environment depended on VMs which were stored on the SAN. The DR VPN boxes were pointed at the primary DC's authentication servers and wouldn't fail over to their local authentication as long as those servers still responded to ping. We had to get somebody physically into the datacenter to plug into some routers and add blackhole routes before any of the server or VM admins could log in at all to even start figuring out what was happening.
If the DR environment had been a full copy of production, it would have had a similar SAN which would have filled up at the same time (or very shortly after). Trying to bring up a copy of the management in the DR environment would have failed due to the nature of what had gone wrong.
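A hedged aside on the detection point: ping and even a completed TCP handshake prove very little when the storage underneath is gone, so any "is this an outage?" check needs an application-level answer. The host, port, and URL below are placeholders, not anything from this environment.

```python
# Placeholder host/URL; the point is the layering, not the target.
import socket
import urllib.request

HOST = "mgmt.example.local"

def tcp_open(host: str, port: int, timeout: float = 3.0) -> bool:
    # A VM with a dead SAN can still pass this check.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def app_answers(url: str, timeout: float = 5.0) -> bool:
    # Only a real application response suggests the service can do work.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

print("TCP handshake check:", tcp_open(HOST, 443))
print("Application check  :", app_answers(f"https://{HOST}/"))
```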
Yes! Exactly! "What classifies as a DR event?". Lots of monitoring needed for all the little things, too. Sadly, we always miss something (sigh, "humans"). Indeed, what is DR-worthy and what is just an "inconvenience".
I can't quite recall the exact circumstances (yes, it involved "firewall" in some way), but I had a customer having some issue at one point, and they started getting twitchy and asked me [just a vendor/consultant!] "do we need to declare DR?" Uh, as an "outsider" I never expected that I would have to make that call on their behalf! I recall that we didn't, but it was certainly an experience. It certainly made me think about that exact question... "what is indeed DR worthy?"
Here's an SK that gives more reasons why you want management HA if you're doing DR-type things:
https://support.checkpoint.com/results/sk/sk100731
Gateway tries to fetch the CRL from the first Security Management server that responds. By default only the IP address of the primary Security Management server is written in that file.
CRL fetching fails because the gateway tries to fetch CRL from the primary Security Management server that is down.
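As an illustration of "the first Security Management server that responds" (not what the gateway does internally, just a reachability sketch): TCP 18264 is the Internal CA port gateways use to fetch the CRL, and the management addresses below are placeholders.

```python
# Illustrative only: check which management servers answer on TCP 18264,
# the Internal CA port used for CRL fetch. Addresses are placeholders.
import socket

MGMT_SERVERS = ["192.0.2.10", "192.0.2.20"]   # assumption: primary, secondary

def ica_port_answers(ip: str, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((ip, 18264), timeout=timeout):
            return True
    except OSError:
        return False

for ip in MGMT_SERVERS:
    print(f"{ip} answers on 18264: {ica_port_answers(ip)}")
```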