Raj_Khatri
Advisor

Best Practice for HA sync interface

What is the best practice for the sync interface when connecting 2 cluster members using ClusterXL?

We have always connected the cluster members directly to each other on the sync interface. Just curious whether that is best practice, vs. connecting the members to a switch for sync.

12 Replies
Claudio_Bolcato
Contributor

From my experience, I can say it depends on whether the cluster is physical or virtual.

If it's physical, the sync interface should be in a dedicated VLAN, and you should pay attention to CCP traffic. Some switches have problems with multicast traffic, so you may have to switch CCP to broadcast.

On pre-R80.10 versions you have to specify the cluster ID (also called MAC magic or fwha_mac_magic) on both members (default value 1), and it must not conflict with any other cluster on the network.

In a virtual environment (VMware) you have to check the security settings of the port groups to which the vNICs are connected.

Further info can be found in these SKs: Connecting multiple clusters to the same network segment (same VLAN, same switch) and How to set ClusterXL Control Protocol (CCP) in Broadcast / Multicast mode in ClusterXL.
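If you want to check these settings from expert mode, something like this works (a minimal sketch; the set_ccp command applies to pre-R80.30 versions, where CCP could still be switched between multicast and broadcast):

    cphaprob -a if                      # cluster interfaces and the current CCP mode
    fw ctl get int fwha_mac_magic       # current cluster ID / MAC magic (default 1)
    cphaconf set_ccp broadcast          # switch CCP to broadcast if multicast is a problem

Newer versions moved CCP to unicast, so the last command no longer applies there.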

Teddy_Brewski
Contributor

And is it common/best practice to bond sync interfaces as well? So dedicating two interfaces in a bond used for sync only.

Roman_Niewiado1
Contributor

I would say that bonding the sync interface is a very good idea.

Having 2 sync interfaces is not supported.

Kris_Meldrum
Contributor

In my experience the preference was always a direct connection vs. using a switch between cluster members for sync. The reasoning is that if the switch experiences a failure, it can cause the cluster to become unstable; a direct connection just removes a potential failure/latency point. If the firewalls are not physically close enough for a direct connection then using a switch is fine, but my recommendation is always direct.

Bonding the sync interface is the correct way to do it, as 2 separate sync interfaces (while technically supported) will cause issues. However, IMO I wouldn't bother bonding the sync interface unless:

1) You find that you are over-running your sync interface with traffic (you can check this; see the sketch after this list). In most deployments this won't be the case unless you are moving a lot of data or have some other special circumstances.

2) You need a little extra redundancy. If you want to account for a single port failure in your cluster design then go ahead and bond, but in my view you're not gaining a whole lot of redundancy. Just my .02
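For point 1, a quick way to see whether the sync link is actually struggling, from expert mode (a sketch; availability and exact output vary by version):

    cphaprob syncstat          # sync transport statistics, including lost/retransmitted updates
    cphaprob -reset syncstat   # zero the counters, then re-check after a busy period

Steadily growing lost-update counters are the usual sign that sync is being over-run.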

JozkoMrkvicka
Mentor

In our environment, the 2 nodes are 500 km apart, so using a direct connection is not possible.

Using a 10G link is also recommended, preferably as a 2x 10G bond interface.

Increasing the RX/TX ring size is also good to have.
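To check and raise the ring sizes from expert mode (a sketch; eth2 is just an example interface, and the maximum depends on the NIC):

    ethtool -g eth2                   # show current and hardware-maximum ring sizes
    ethtool -G eth2 rx 4096 tx 4096   # raise RX/TX rings toward the maximum

Note the change briefly resets the interface and does not persist across reboots on its own, so do it in a maintenance window and persist it per your Gaia version.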

Kind regards,
Jozko Mrkvicka
Anthony_Kahwati
Collaborator

Found this thread and am pretty sure it answers my question, but wanted to make sure there has been no change of opinion over the last 2-and-a-bit years.

I'm favouring a direct gateway-to-gateway sync, using a bonded pair of 10G interfaces. Should I consider anything else, or is this the ideal scenario?

The devices are in the same DC, about 40-50 m apart as the cable is run. I have the option of switches if I want, but prefer P2P if that is still favoured.

Thanks

Bob_Zimmerman
Leader

You can do direct-wired sync, and it is supported, but it will cause problems. Not may, will. The TAC had already recommended against direct-wired sync for years when I started in 2006.

One of the things a cluster member monitors to determine its health is its interface status. With direct-wired sync, if one member dies or is rebooted, the remaining member sees its interface go down. This is a failure, and the member can't tell if its interface died, or if something outside it failed, so it causes the member to evaluate whether it is the healthiest member left in the cluster. This is a pretty conservative process, as you really don't want a member whose NIC failed thinking it is healthiest and claiming the VIPs while the other member is also claiming the VIPs.

Among other things, a member checking to see if it is healthiest will ping addresses on all of its monitored interfaces (and on the highest and lowest VLANs of monitored interfaces with subinterfaces). If it gets a response, the interface is marked as good. If it doesn't get a response, the member interprets this as further evidence that it is the one with the problem and therefore should not take over the VIPs.

This can easily lead to situations where you reboot one member to install an update, and the other member thinks it has failed and goes down.
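For anyone who wants to watch this decision process, the standard ClusterXL status commands from expert mode are enough:

    cphaprob state    # overall state of each member (Active, Standby, Down, ...)
    cphaprob -a if    # per-interface status and which interfaces are monitored
    cphaprob list     # the critical devices (pnotes) behind the state decision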


The best way I've seen to run sync is two interfaces in a non-LACP bond (round robin works) going to two totally separate switches. If one switch dies? No problem: the bond just handles it. If one member dies? No problem: the other member has link on all of its interfaces. If the link between one of the switches and one of the members dies? No problem: the members realize that sync is broken, but the interfaces are still up, so no failover is triggered; you just lose redundancy.
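For reference, the bond itself is ordinary Gaia clish (a sketch; the group number and member interfaces are examples, and the members must have no IP configured before being added):

    add bonding group 1
    set bonding group 1 mode round-robin
    add bonding group 1 interface eth2
    add bonding group 1 interface eth3

Then put the sync network on bond1 in the cluster topology, and verify member status with cat /proc/net/bonding/bond1 from expert mode.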

Anthony_Kahwati
Collaborator

Thanks Bob. I needed this. It's so obvious, and it's how we currently have it. Makes sense to carry this into the new DCs.

Have a great weekend.

Bob_Zimmerman
Leader

Absolutely! While in the TAC, I handled a fair number of tickets from people whose clusters were not working how they expected. Direct-wired sync was the most common cause, and the most common issue it caused was that intentionally rebooting one member would take the whole cluster down. The other member would just refuse to take over because it thought it had failed.

Check Point's cluster status state machine is complicated, and can easily cause problems if you don't know about some of the things it does. When set up properly, installing an update should not disrupt traffic at all.

coolhandclay
Employee

As a current Check Point tech: having your sync cables directly connected is fine and even recommended. Here is a link to the doc.

https://sc1.checkpoint.com/documents/R81/WebAdminGuides/EN/CP_R81_ClusterXL_AdminGuide/Topics-CXLG/S...

Connecting the physical network interfaces of the Cluster Members directly using a cross-cable. In a cluster with three or more members, use a dedicated hub or switch.

All that will happen is the active gateway will go from Active to Active Attention (shown as "Active!", with a "!") because it no longer sees the sync interface as up; but since the other firewall is down, it stays up.
Bob_Zimmerman
Leader

Note how at the bottom of that page, there is a link to Supported Topologies for Synchronization Network. All four listed topologies have a switch in the middle.

If you use a cable for sync with no switch in the path and you reboot one member, the remaining member will start probing on all monitored interfaces to see if it can get responses from things. If it finds a monitored interface other than the one which is down where it can't reach anything, it will refuse to take over. If it's already Active, it may actually go Down.

By default, the highest and lowest VLANs on a given physical interface or bond are monitored. This means if you are in the process of adding a new highest VLAN or a new lowest VLAN and it doesn't have anything on it yet, and you are using direct-wired sync, rebooting one cluster member may cause the other member to go Down. This also applies if you are decommissioning the highest VLAN or lowest VLAN, the endpoints in that VLAN are evacuated, but the VLAN hasn't been removed from the cluster configuration.

If it didn't do this, a failure could cause both members to try to become Active at the same time, which is worse than the cluster going down entirely.
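Incidentally, you can see exactly which VLAN IDs a member is monitoring, and whether it is set to monitor all VLANs instead of just the highest and lowest (a sketch; fwha_monitor_all_vlan is the standard ClusterXL kernel parameter, but check the documentation for your version before changing it):

    cphaprob -a if                         # monitored interfaces, including the VLAN IDs being probed
    fw ctl get int fwha_monitor_all_vlan   # 0 = monitor only highest/lowest VLAN (the default)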

JozkoMrkvicka
Mentor

In addition to that, if you are using a trunk interface with multiple VLANs on it, the synchronization VLAN has to be the lowest VLAN.

Kind regards,
Jozko Mrkvicka