Re: Maestro Dual Site Active/Active

HeikoAnkenbrand · ‎2025-10-26

Check Point is planning to operate the Maestro environment in Dual Site mode as Active-Active. Are there any further details available on this? In another forum post, Check Point published the following:

The following information is already known*:

Active/Active mode allows two geographically remote data centers to be protected behind a single Security Group.
Both sites can handle traffic simultaneously
Traffic is synchronized between both sites
Inter-site asymmetric traffic is supported due to inter-site correction
Based on UIPS addresses (Unique IP-address per site)
UIPS enables configuration of multiple addresses for each interface, with one address designated for each site. The UIPS configuration is set as an alias interface unique to all members within the same site.
Traffic is distributed between the sites based on dynamic routing and UIPS.
Each site has its DR Manager responsible for communicating with a third-party peer using its own UIPS.
Through this communication, the third-party peer constructs its routing table, enabling it to accurately forward traffic to the appropriate site.
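
To make the UIPS and routing points above more concrete, here is a minimal toy model, not Check Point configuration syntax. It assumes one interface carrying one UIPS per site, and a third-party peer that simply prefers the lowest metric. The interface name, addresses, prefixes, metrics, and all class and function names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy model only: every name, address, and metric here is a made-up
# illustration, not Check Point syntax or documented behavior.


@dataclass
class Interface:
    name: str
    uips: Dict[str, str]  # site name -> that site's unique IP (UIPS)


@dataclass
class Route:
    prefix: str
    next_hop: str  # UIPS of the site that advertised the route
    site: str
    metric: int  # lower is preferred in this sketch


@dataclass
class Peer:
    """Third-party router that learns routes from each site's DR Manager."""

    table: Dict[str, List[Route]] = field(default_factory=dict)

    def learn(self, route: Route) -> None:
        self.table.setdefault(route.prefix, []).append(route)

    def best(self, prefix: str) -> Route:
        # Pick the advertised route with the lowest metric.
        return min(self.table[prefix], key=lambda r: r.metric)


def advertise(iface: Interface, site: str, prefix: str, metric: int) -> Route:
    """A site's DR Manager announces a prefix using its own UIPS as next hop."""
    return Route(prefix, iface.uips[site], site, metric)


# One interface, two addresses: one UIPS per site (documentation ranges).
eth = Interface("eth1-05", {"site1": "192.0.2.1", "site2": "198.51.100.1"})

peer = Peer()
peer.learn(advertise(eth, "site1", "10.1.0.0/16", metric=10))
peer.learn(advertise(eth, "site2", "10.1.0.0/16", metric=20))
peer.learn(advertise(eth, "site1", "10.2.0.0/16", metric=20))
peer.learn(advertise(eth, "site2", "10.2.0.0/16", metric=10))

print(peer.best("10.1.0.0/16").site)  # site1
print(peer.best("10.2.0.0/16").site)  # site2
```

In this sketch the peer forwards traffic for each prefix to whichever site advertised it with the better metric, which is the sense in which traffic distribution is "based on dynamic routing and UIPS" rather than on any load balancing inside the Security Group.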

Limitations
IPv6 is not supported
vsx_util reconfigure is not supported
Proxy ARP is not supported
Anti-spoofing is supported only if the "defined by routes" option is used
Bridge mode (L2) is not supported
All limitations of ClusterXL Active/Active apply here except VSX, which is supported
Managing via an uplink is not supported
Since this is a new technology, all deployments must be done in coordination with Check Point R&D until further notice.

Considerations

Consider whether you really need active/active dual site or whether two single sites would be enough. A deployment with two single sites is simpler, but connections are not synchronized between the sites.
The main benefit in my opinion with A/A dual site is the support for asymmetric connections.

Has Check Point already made an official statement regarding the following points?

Will this be introduced with R82.10, or possibly with a JHF under R82?
What additional information is available about Active/Active mode?
Will this also be supported with ElasticXL?
What inter-site correction traffic can be expected?

*reference

➜ CCSM Elite, CCME, CCTE ➜ www.checkpoint.tips

Mark_Gabert · ‎2025-10-26

We’re also using Maestro with version R81.20. Is it already possible to use the active/active mode?

When we purchased our Maestro environment, the active/active mode was announced. When will it be possible to switch to this mode, and will it be possible to make the change after an update without any downtime for the security group?

emmap · ‎2025-10-26

It's already considered a GA option in R82, it's just sort of tucked away. I don't know if there's a roadmap to fully make it public, that probably depends on how many people take it up. It's tucked away because it is a niche solution that we don't really feel is necessary for most architectures. Generally, when different datacentres are not sharing layer 2 address spaces and are being managed via diverse routing, they are also split at the application layer and hence there is not a requirement to sync state between them. We would not recommend adding complexity when it is not fully justified. If multiple DCs share the same layer 2 address spaces, A/A is not a fit because it's a layer 3 failover architecture (different IP addressing per site, dynamic routing to manage traffic paths).

As for inter-site correction, if a connection was established on site 1 and then the routing changes so that it's routed through site 2, its packets will correct back over to site 1 to continue processing that connection. Any new connections between the same IPs will process on site 2. The site_sync connections will be utilised for this correction, so it's a rare case where production data is carried over Sync.
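
The correction behaviour described above can be sketched as a toy model. This is only an illustration of the stated logic (the owning site keeps processing an established connection, new connections are owned by the site they arrive at). The class, method, and site names are hypothetical and say nothing about how the Maestro correction layer is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Toy model only: names are hypothetical, not Check Point internals.

# src IP, src port, dst IP, dst port, protocol
FiveTuple = Tuple[str, int, str, int, str]


@dataclass
class SecurityGroup:
    # State is synchronized between sites, so either site can look up
    # which site owns an existing connection.
    owner: Dict[FiveTuple, str] = field(default_factory=dict)

    def handle(self, conn: FiveTuple, arrival_site: str) -> str:
        """Return a short description of where the packet is processed."""
        owning_site = self.owner.get(conn)
        if owning_site is None:
            # New connection: the site that sees the first packet owns it.
            self.owner[conn] = arrival_site
            return f"new connection, processed on {arrival_site}"
        if owning_site != arrival_site:
            # Routing changed mid-connection: the packet is corrected
            # over the site sync link to the owning site.
            return f"corrected from {arrival_site} to {owning_site} via site sync"
        return f"processed locally on {owning_site}"


sg = SecurityGroup()
existing = ("10.0.0.5", 40000, "10.9.0.7", 443, "tcp")
later = ("10.0.0.5", 40001, "10.9.0.7", 443, "tcp")

print(sg.handle(existing, "site1"))  # new connection, processed on site1
print(sg.handle(existing, "site2"))  # corrected from site2 to site1 via site sync
print(sg.handle(later, "site2"))     # new connection, processed on site2
```

The second call is the case where production data crosses the sync link. The third shows that a new connection between the same IPs is simply owned by site 2, so correction traffic should shrink as older connections age out.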

HeikoAnkenbrand · ‎2025-10-27

@emmap , thank you very much for the information.

Many of our customers operate their data centers across two separate locations, often 40–50 km apart. Since 10G, 25G, or 100G Layer 2 connections are quite expensive, a single-site installation with long-range transceivers is usually very costly and not economically viable. Therefore, we typically have to use the dual-site option, which, however, comes with the limitation of active/standby operation. A dual-site active/active solution would therefore be a very good and interesting option in this case.

If it is already considered GA in R82, is there a possibility to test it together with you, the local Check Point SEs, or your Maestro team, or to obtain more detailed information?

emmap · ‎2025-10-27

If you contact your local sales team they will be able to assist with more information.

Again though, it has to fit the architecture. You won't be spanning a VIP across two sites, each site will have its own IP addressing and routing. So it's not conceptually similar to a stretched single site setup.

HeikoAnkenbrand · ‎2025-10-27

Hi @emmap,

If I understand correctly, each site will have its own IP address scheme.
Within a security group, each site will have its own IP address on each interface of the security group.
Traffic between site 1 and site 2 will be routed through a dedicated internal network.
Incorrectly sent packets will be forwarded via a correction layer.

For the forwarding of IPs between site 1 and site 2, dynamic routing will likely be used.

Can it be visualized schematically like in this picture?

Is that correct?

emmap · ‎2025-10-27

The dynamic routing would be up to and across the surrounding network devices. Between the MHOs directly we just have site sync, like any dual site deployment. The two sites don't form routing adjacencies directly with each other.

It's basically like two separate single site deployments, only with state sync and correction.

HeikoAnkenbrand · ‎2025-10-27

Ok, then it would look more like this. The dynamic routing takes place between the different IP pools on both sides.

emmap · ‎2025-10-27

Yes, that's a more accurate picture of it.

HeikoAnkenbrand · ‎2025-10-28

THX @emmap,
I will include the image like this in the original article.

RS_Daniel · ‎2025-10-27

Hello,

I would say that data centers sharing Layer 2 address spaces are actually quite common. In these cases, an Active/Active deployment becomes a necessary feature, since a failover between DCs would otherwise result in connection loss.

In fact, this is the recommended deployment for Cisco ACI architectures using stretched bridge domains, but using ClusterXL in Active/Active mode, since Maestro Active/Active is not GA yet.

The main concern is that customers already running Maestro feel a bit unsure about relying on a feature that’s disabled by default and needs several approval steps before it can even be tested.
You have to go through multiple stages — submit a request, wait for approval, apply a specific hotfix — and at each step there are warnings about design alignment, involving PS, or getting R&D directly engaged. It doesn’t really inspire confidence for production use 😅.

Regards

emmap · ‎2025-10-27

Hi, yes sharing layer 2 is quite common, but that's the point I was making - Dual site A/A is probably not suitable for such cases as it requires that routing layer to make use of it. Traffic is not load balanced between sites at the MHO layer or anything, it's just based on what gets routed where. The fact that it is more of a niche solution is part of why it's hidden like it is, we don't want to have customers going down a path that introduces complexities unnecessarily.

Lari_Luoma · ‎2025-10-27

Each site in Dual-Site A/A has an independent L2 domain, not a shared one, and failover is handled by BGP.

RS_Daniel · ‎2025-10-28

Hello @Lari_Luoma and @emmap, in Cisco ACI scenarios with stretched networks, both sites are able to share L2 domains; this is usually the case in multi-pod deployments. And yes, failover is handled by routing protocols, in ACI scenarios usually EIGRP for the internal network and BGP for the external one. When active/active is not used at the firewall level, the problem is not who is responsible for the failover, but that connections must be reset when the failover happens.

With ClusterXL we already have this feature, and Check Point recommends it for Active/Active data centers, so I do not see why it is so strongly discouraged with Maestro. I am facing this situation with our customers and would like to understand it better. Could you share some examples of where it could cause problems or add unnecessary complexity? Thanks in advance!

Regards

emmap · ‎2025-10-28

The point that we're making is that Maestro Dual Site A/A is not suitable for stretched layer 2 scenarios, because it is not a stretched layer 2 cluster setup. I can't comment on how it would integrate into ACI, it's not an area I am familiar with. We do have members of our architecture team who can help here if you have customers asking. Your local office can help set up a meeting there.

Maestro Dual Site A/A is similar to CXL Active/Active Geo Cluster, it is not like ClusterXL Load Sharing.

CXL LS: https://sc1.checkpoint.com/documents/R82/WebAdminGuides/EN/CP_R82_ClusterXL_AdminGuide/Content/Topic...

CXL A/A: https://sc1.checkpoint.com/documents/R82/WebAdminGuides/EN/CP_R82_ClusterXL_AdminGuide/Content/Topic...

Load Sharing is a layer 2 scenario where packets are pivoted (or multicast) to cluster members across the customer network links. A/A is a layer 3 scenario where a routing layer takes care of packet distribution. If your setup can work with CXL A/A then it can probably work with Maestro DS A/A.

Lari_Luoma · ‎2025-10-27

Hi All,
Here is my architectural perspective on A/A Dual-Site.

Active-Active across two sites often sounds appealing in theory: both sites are used, no idle resources, and automatic failover. In practice, it usually adds complexity without providing meaningful benefit for most deployments.

There are very few real-world scenarios that actually require both sites to be active with full connection synchronization. Even when latency is low, synchronization between two locations introduces unnecessary overhead, operational complexity, and more room for error. Most enterprise traffic simply doesn’t need it.

The few environments where Active-Active dual-site makes sense are low-latency, high-throughput systems where both sites must process traffic simultaneously, such as:

Financial trading or market data platforms
Real-time middleware or messaging systems with stateless connections
Certain industrial or IoT control systems with independent session flows
Architectures that truly require asymmetric routing where traffic enters through one site and exits through another.

For almost everyone else, a single active Maestro stack per site achieves the same goals — high availability, redundancy, and scalability — and is much simpler to deploy and maintain. Active-Active dual-site is niche by nature; for most customers, it adds complexity without improving resilience or performance.

Modern applications, which are almost entirely HTTPS-based, don’t rely on session synchronization between sites. They reconnect quickly and gracefully during failover events. Meanwhile, most legacy applications that do maintain long-lived TCP sessions tend to be sensitive to latency and timing variations, so they wouldn’t function reliably in an Active-Active dual-site setup anyway. In both cases, cross-site synchronization provides little practical value.

HeikoAnkenbrand · ‎2025-10-29

I completely agree with you @Lari_Luoma .

In most cases, the added value of an active/active setup is not really necessary. It might be more reasonable to consider two single-site installations instead. Every increase in complexity also leads to a higher risk of failure. That’s why I’m a fan of more traditional, simpler solutions that have proven themselves in many customer installations. From my point of view, these solutions are technically easier to manage and maintain.

Furthermore, an active/active setup usually requires additional technical prerequisites. These can include more complex networking configurations, higher synchronization requirements, and stricter latency or performance constraints, all of which add extra layers of difficulty to the overall system design:

In most cases, external BGP routing with one or even multiple AS numbers is required, or alternatively, Geo DNS with very short DNS cache times. In addition, connectivity to multiple providers via BGP is usually necessary.

This level of complexity also increases the risk of errors due to the intricate architecture — whether caused by human mistakes, misconfigurations, or system failures that were not anticipated during the design phase.

Gennady · ‎2025-10-28

Good day!

We (our team) are working on QA testing of R81.20 JHF take 118. There is one thing worth mentioning in regard to Active/Active. The environment is Dual-Site Maestro with Single Security Group.

If we install JHF take 118 on the MHO before installing it on the Security Group, we always get Active/Active with no traffic loss. SSH to the SMO address brings us to Site-1 or Site-2 randomly. Changing the HA mode from Active UP to Primary UP brings Active/Active back to Active/Standby.

If we install JHF take 118 on the Security Group before the MHO, there is no Active/Active and everything works as usual.

At the moment we are in a rush to complete the test installations. Later on, I will finish testing the "MHO before SG" JHF installation scenario to find out why Active/Active appears and why there are no connection interruptions.

With JHF take 98, we see the following in the correction debug when a packet arrives on a Standby Site:

Obviously, the same drop doesn't happen on JHF take 118.