CheckMate-R77
Contributor

ClusterXL synchronization network bandwidth

Hello everyone.

I would like to ask whether, on a Check Point ClusterXL (two Gaia R80.10 gateways) running in HA (active/standby) mode, the synchronization interface/network can have less bandwidth than the "production" interfaces. In the documentation I have found that for sync only distance (delay) matters. For example, can I use 10 Gbit links for the DMZ, internal, and external networks and "only" 1 Gbit for the sync interface? Or should I use a 4x 1 Gbit bond if a single 1 Gbit link is insufficient?

 

Thanks for your precious help.

CheckMate-R77
Contributor

I have also found this in the ClusterXL Admin Guide (https://sc1.checkpoint.com/documents/R80.10/WebAdminGuides/EN/CP_R80.10_ClusterXL_AdminGuide/html_fr...):

"Note - There is no requirement for throughput of Sync interface to be identical to, or larger than throughput of traffic interfaces (although, to prevent a possible bottle neck, a good practice for throughput of Sync interface is to be at least identical to throughput of traffic interfaces)."

So there can be a bottleneck in my link configuration, especially during full sync transfers :-(.

_Val_
Admin

@CheckMate-R77, as you have already quoted, there are two types of synchronisation: full and delta sync. Although full sync is extensive, it is not equivalent to the passing traffic; it just transfers all kernel tables as-is from one member to another. It can also lag a bit, delaying full functionality of the cluster but not affecting production traffic. Delta sync, however, is a direct function of production traffic, and it is time-sensitive.

There is no exact formula to calculate the required bandwidth, but it is assumed that you might need between 10 and 30% of your production bandwidth. You have limited control over delta sync by disabling or delaying sync for specific services, but that does not give you much flexibility anyway.
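To make that rule of thumb concrete, here is a back-of-envelope sketch (my own illustration in Python, not an official Check Point formula) that turns the 10-30% guideline into a sync link estimate:

```python
# Back-of-envelope sync bandwidth estimate based on the 10-30% rule of
# thumb above. Illustrative only; actual delta sync volume depends on
# connection rate, sync exclusions, and blade configuration.

STANDARD_NIC_GBPS = [1, 10, 25, 40, 100]

def sync_estimate_gbps(production_gbps, low=0.10, high=0.30):
    """Return the (low, high) estimated sync bandwidth range in Gbps."""
    return production_gbps * low, production_gbps * high

def smallest_nic(required_gbps):
    """Pick the smallest standard NIC speed that covers the requirement."""
    return next(s for s in STANDARD_NIC_GBPS if s >= required_gbps)

low, high = sync_estimate_gbps(30)   # e.g. 3x 10 Gbit production links
print(f"Estimated sync need: {low:.1f}-{high:.1f} Gbps")
print(f"Suggested sync NIC:  {smallest_nic(high)} Gbps")
# -> 3.0-9.0 Gbps, so a 10 Gbps sync interface, as advised below.
```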

In your specific case I would advise using a 10 Gbps interface for sync, to be on the safe side. Mind you, you may try bonded interfaces, but as stated in another comment on this post, the nature of LACP does not allow you to multiply bandwidth in this specific case.

 

PhoneBoy
Admin

2x 1Gb for sync should suffice.
_Val_
Admin

No, it will not.

 

LACP will not give you 2x1 = 2 Gbps, because the balancing is done per pair of IPs, and the IPs are always the same. You will have 1 Gbps available for sync there.
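As a toy illustration of why (my own sketch; the real bonding driver hash is implementation-specific): a layer-3 transmit hash picks the slave link from the source/destination IP pair, and sync always uses the same two member IPs, so every frame lands on the same slave.

```python
# Toy model of an LACP/bond transmit hash (illustrative only, NOT the
# actual Linux bonding algorithm). With a layer-3 hash policy the slave
# is chosen from the IP pair, and ClusterXL sync always uses the same
# two member IPs.
import ipaddress

def pick_slave(src_ip: str, dst_ip: str, num_slaves: int) -> int:
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % num_slaves   # simplified XOR-style hash

# Member A -> member B sync traffic over a 2-slave bond:
for _ in range(3):
    print(pick_slave("192.0.2.1", "192.0.2.2", 2))  # same slave every time

# Production traffic with many client/server IP pairs would spread across
# both slaves; sync traffic cannot, so the bond caps out at one link.
```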

HeikoAnkenbrand
Champion

I completely agree with @_Val_  here.

On the sync interface, LACP is only used to make sync fail-safe. In earlier versions you could define several sync interfaces; that's no longer possible.

If you want to be 100% safe, you have to use two 10 Gbit/s interfaces as a bond for the sync.

 

➜ CCSM Elite, CCME, CCTE
Wolfgang
Authority

I never saw more than 800 Mbit/s on a sync link. And that value was only seen with IPSO clustering in forwarding mode and four fully utilized 10 Gbit interfaces. And that utilization was only seen during a full sync.

We had some clusters running with heavily utilized 10 Gbit links and a 2x 1 Gbit active/passive bond as sync. The highest sync utilization ever seen was 450 Mbit/s.

That's my experience, but maybe someone can show us more production throughput on a sync interface.

Wolfgang

HeikoAnkenbrand
Champion

I agree with you @Wolfgang.

In practice, I haven't seen a firewall that generated more than 1 Gbit/s of sync traffic.

But if you want to be on the safe side, you have to use two 10 Gbit/s interfaces as a bond.

PS: But I also have several firewalls running with two 1 Gbit/s sync interfaces as an LACP bond :-)

➜ CCSM Elite, CCME, CCTE
HeikoAnkenbrand
Champion

It would be interesting to hear what R&D recommends here :-)

It's just an idea: maybe you can calculate this from the connections in the state table. Is there a rule of thumb here?

➜ CCSM Elite, CCME, CCTE
Julian_Sanchez
Collaborator

Hello, 

Sorry, I have a question: how can you check the utilization of the sync interface in VSX?

Is there any command? I tried to use SmartMonitor but couldn't find it there.

Regards, 

Julian S. 

_Val_
Admin

The rule of thumb is to calculate it based on the overall bandwidth utilization of your VSX cluster. We are talking about up to 5% of overall bandwidth. If you want to be on the safe side, take up to 10%.

Timothy_Hall
Champion

In my experience 1 Gbit is sufficient for cluster state table sync unless the cluster has an extremely high new connection rate passing through it.  Loss and re-transmits on the sync interface as reported by cphaprob syncstat are typically caused by overall high CPU load on the cluster members, not by a lack of raw bandwidth on the sync interface.  High CPU load can be mitigated with CoreXL/SecureXL tuning as described in my "Max Power" book, as long as the firewall hardware was sized appropriately.  By selectively disabling synchronization for services such as DNS, HTTP, and HTTPS the amount of sync traffic (and associated CPU utilization) can be reduced significantly.

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
CheckMate-R77
Contributor

What a great discussion we have here. I think I'll follow the Check Point recommendation anyway.

By the way, it would be very interesting to test this in a CP lab.

 

Thank you all

_Val_
Admin

I am glad you guys have never seen an issue with the sync interface being a bottleneck. I have, in a couple of very special VSX-related cases. That does not mean you are safe with a physical cluster.

Vladimir
Champion

Hi Tim,

I know this subject has been talked about in the past, but can you pitch in with your opinion on whether current versions prefer direct-link or via-switch connectivity for sync interfaces?

I too fail to justify 10G or multiple 10G links for state table sync unless it is used for some kind of large IoT environment.

Bob_Zimmerman
Authority

This actually came up just the other day in another thread.

I would concur with the people here that LACP on a sync bond is not useful, but round robin works perfectly and gives you the full throughput of all the interfaces you add to the bond. I wouldn't recommend LACP anyway because that implies a single switch carrying all the sync traffic. If it fails, both members will see all of their sync ports go down, and bad things will happen. A bond with round robin transmission can go through two totally separate switches which don't know about each other. That lets you lose a switch without the cluster caring.

Magnus-Holmberg
Advisor

LACP does not imply that you are using a single switch.

LACP can be used towards a switch stack.
Also, most datacenter switches support functions such as vPC (Cisco Virtual Port Channel), allowing you to build an LACP bond across different physical switches within a vPC pair.

https://www.youtube.com/c/MagnusHolmberg-NetSec
Bob_Zimmerman
Authority

Not sure if you're bringing up vPC as a joke, but in my experience, it's deeply, deeply janky. I've had it cause major outages on a variety of Nexus gear. Using it implies you care more about throughput than reliability. I guess load-sharing firewall clusters are a thing, and still need sync, but I wouldn't use vPC even then, since we've already established sync traffic to a given member always gets hashed to the same link for transmission.

As for stackable switches, most issues I've had with switch stacks were OS panics, and if one member panics, it often brings down the whole stack. Also not great for reliability.

Using totally independent switches is more reliable (since one can't interfere with the other) and allows for higher throughput.

Magnus-Holmberg
Advisor

My main point was that there are technologies for building LACP across different physical boxes.
Cisco is the largest network vendor, but I believe all the others (Juniper, Huawei, Aruba, etc.) have similar functions.

I do agree that vPC can complicate things a lot and cause outages for different reasons.
I have seen a lot of misconfiguration around it; the most common issue is the lack of actual redundancy testing.
With early software versions this was definitely the case, for example the lack of a delay before vPC ports come up after a switch recovers.
By default this more or less caused traffic outages, as traffic was blackholed for 30 seconds during recovery.

Building without vPC is definitely an option even within a datacenter, as it allows you to upgrade switches more or less independently.
You don't need to care about vPC compatibility within a pair, which simplifies things.
And you still get the additional throughput with a round-robin bond, just as you said, although I'm not sure I would do round-robin in all cases, as the network can see it as MAC flapping.
Within ACI I believe it would be treated as a loop and the traffic would actually be killed, so I would do an HA bond there if not running LACP.

Having said that, we have not experienced any issues with vPC over the last few years,
either in FabricPath networks with 5K/7K or within ACI with 9K.
We more or less only use it towards other network equipment (well, including Check Point VSX running on HPE servers).
In some FabricPath datacenters we do not use vPC, for the reasons you mention above: additional stability and a very simple design.

vPC simplifies datacenter migrations by a lot and actually allows you to have redundancy between the datacenters.
But you can of course achieve this with a dedicated vPC migration pair of switches; you don't need to run it across the entire fabric.
And in that case you are interested in throughput, and also in the fact that STP is horrible within datacenters.

Regards,
Magnus

https://www.youtube.com/c/MagnusHolmberg-NetSec
Bob_Zimmerman
Authority

Regarding "the network can see round-robin bonding as MAC flapping": that's why I say totally independent switches. As long as they don't talk to each other, each one only sees the MAC address in one place. That is:

FW1.eth1 -> SW1.g0/0, SW1.g0/1 -> FW2.eth1

FW1.eth2 -> SW2.g0/0, SW2.g0/1 -> FW2.eth2

Then you bond eth1 and eth2 on each firewall. FW1 sends a sync frame to FW2. It picks an interface using the round-robin config. That switch has only one path to the destination MAC. It picks the next interface using the round-robin config. Again, that switch has only one path to the destination. The switches can even be connected, as long as you use an isolated VLAN on each one which isn't propagated to the other. That allows for dumb human error, though, so I prefer not to connect them at all.
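To complement the hash sketch earlier in the thread (again, just an illustration): round-robin ignores addresses entirely, so a single IP pair still uses every link, and because each slave is cabled to a separate switch, each switch sees the firewall's MAC on exactly one port.

```python
# Toy round-robin transmit: the slave is chosen by a frame counter, not
# by addresses, so even one IP pair (sync) uses all links in turn.
from itertools import count

slaves = ["eth1 -> SW1", "eth2 -> SW2"]   # one slave per independent switch
frames = count()

def pick_slave_rr():
    return slaves[next(frames) % len(slaves)]

for _ in range(4):
    print(pick_slave_rr())   # alternates SW1, SW2, SW1, SW2

# SW1 only ever sees the firewall's MAC on its own port (likewise SW2),
# so neither switch observes MAC flapping.
```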

Vladimir
Champion

While I wouldn't do Sync LACP in Cisco vPC (bad experiences in the past), I dig the concept with HPE/Aruba VSF or Distributed Trunking.

Timothy_Hall
Champion

In short, direct cabling for the sync interface in ClusterXL is fine now, but it didn't use to be. The following is from memory, which is a bit hazy since it was so long ago.

The long answer is that a hub/switch was required on the Nokia IPSO boxes when they were using Nokia Clustering (active-active basically, not VRRP) for the "Clustering Network" between the two IPSO appliances.  This was actually a distinct and separate interface from the Check Point state sync network.  If a direct cable was used for the Nokia Clustering Network and one of the members was powered off or the cable was unplugged, it would cause a traffic interruption as the cluster state bounced and reformed with one member.  Thus the use of a hub or switch to maintain link integrity (green light) on the NIC during such an event, to prevent the bounce.

As for ClusterXL, Check Point introduced something called "New HA" around version NG (R50), which is more or less what we still use today.  The original HA implementation was renamed "Legacy HA" and remained available for a few versions after that.  The "Legacy HA" code did require a hub or switch on the sync network to avoid a similar traffic interruption caused by a cluster bounce if link state was lost on the sync interface.  However, "New HA" changed how it dealt with failures specifically on the sync interface.  I'm not completely sure about this, but I think that when a cluster member detects a sync interface failure, under New HA it waits ~2.5 seconds (the ClusterXL dead timer) to decide what to do before possibly changing state.  If that member is currently active and detects a sync interface failure (especially due to a dead or flapping peer), it remains active without any bouncing.

When Legacy HA would detect a sync interface failure, based on what it knew from the last CCP update from the peer, it would assume that the peer was in a "better" state and immediately go standby.  If the sync failure was due to the other member dying or otherwise going away, it would take the member that just went standby ~2.5 seconds (the ClusterXL dead timer) to realize it was the only surviving member of the cluster, and return to active state (i.e. "bounce").  Needless to say if the peer member had a hardware or severe operating system problem and was constantly flapping the sync interface, this would lead to numerous cluster bounce events that were definitely noticeable.

As far as bandwidth for the sync interface, 1Gbps should really be enough.  If you are having sync interface issues I doubt it is because the members are saturating a 1Gbps interface with delta state sync updates.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Bob_Zimmerman
Authority
Authority

With new-mode HA, a member whose sync interface goes down still checks to see if it's the healthiest member of the cluster and may decide to go down to prevent contention for the cluster VIPs. Specifically, if you have a cluster interface which doesn't have any other devices on it which respond to ping, the cluster member (even if it was active at the time of the sync failure) will assume it has a problem.

And agreed, saturating sync throughput is not a common cause of sync issues. The big benefit of bonded sync is fault tolerance, not throughput. 1g of sync throughput should be enough for half a million connections per second, no problem. Probably much more. Sure, some environments see connection volumes that high, but not many.
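As a back-of-envelope check of that number (my own sketch; the per-connection sync cost is an assumption, not a published figure):

```python
# Rough sanity check of "1 Gbps handles ~500k connections/second".
# ASSUMPTION: each synced connection costs on the order of 250 bytes on
# the wire across its delta sync updates; the real figure varies with
# blades, NAT, and sync exclusions.
LINK_BPS = 1_000_000_000      # 1 Gbps sync link
BYTES_PER_CONN = 250          # assumed average sync cost per connection

conns_per_sec = LINK_BPS / (BYTES_PER_CONN * 8)
print(f"~{conns_per_sec:,.0f} new connections per second")  # ~500,000
```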

PhoneBoy
Admin

In the past, we used to say "use the largest interface type in use for the sync interface" (meaning if you were using 1Gb data interfaces, use 1Gb sync interfaces too).
I believe the current guidance is to plan for 2 Gbps of sync traffic max.

_Val_
Admin

Second that. The rule of thumb is to provision up to 10% of production bandwidth. That means, if you run 20 Gbps of production traffic, you may need up to 2 Gbps for sync, which means a 10 Gbit interface.

Vladimir
Champion

In this case, it may be prudent for CP to implement SFP+ as an on-board "Sync" interface on larger appliances.
