Create a Post
cancel
Showing results for 
Search instead for 
Did you mean: 
Raj_Khatri
Advisor

Best Practice for HA sync interface

What is the best practice for the sync interface when connecting 2 cluster members using ClusterXL?

We have always connected the cluster members together using the sync interface between them.  Just curious if that is according to best practice vs. connecting the members directly to a switch for sync.

9 Replies
Claudio_Bolcato
Contributor

From my experience I can say it depends if is a physical cluster or virtual.

If it's physical the sync interface should be in a dedicated vlan and you should keep attention on CCP traffic. Some switches can cause problems with multicast traffic so you have to switch to CCP in broadcast.

In pre R80.10 you have to specify the cluster ID (or MAC magic or fwha_mac_magic) on both members (default value 1) and it must not be in conflict with other cluster on the network.

In a virtual environment (vmware) you have to check the security settings of the portgroups where the vNICs are connected on.

Further info can be found on this sk Connecting multiple clusters to the same network segment (same VLAN, same switch)  or How to set ClusterXL Control Protocol (CCP) in Broadcast / Multicast mode in ClusterXL 

Teddy_Brewski
Contributor

And is it a common/best practice to bond sync interfaces as well? So dedicated two interfaces in a bond used for sync only.

0 Kudos
Roman_Niewiado1
Contributor

I would say that bonding the sync interface is a very good idea.

Having 2 sync interfaces is not supported.

0 Kudos
Kris_Meldrum
Contributor

In my experience the preference was always a direct connection vs using a switch in between cluster members for sync. The reason behind this is if the switch experiences a failure it can cause the cluster to become unstable. A direct connection just removes a potential failure point/latency point. If the firewalls are not physically close enough to do a direct connection then using a switch is fine but my recommendation is always direct.

Bonding the sync interface is the correct way to do it as 2 sync interfaces (while technically supported) will cause issues. However, IMO I wouldn't bother bonding the Sync interface unless:

1) You find that you are over-running your sync interface with traffic. In most deployments this won't be the case, you need to be moving a lot of data or have some other special circumstances.

2) If you need a little extra redundancy. If you find you want to account for a single port failure in your cluster design then go ahead and bond but in my view you're not gaining a whole lot of redundancy. Just my .02

JozkoMrkvicka
Leader
Leader

In our environment, we have 2 nodes 500 km away, so using direct connection is not possible.

Using 10G link is also recommended, preffered as 2x 10G bond interface.

Increased of RX/TX ring size is also good to have.

Kind regards,
Jozko Mrkvicka
0 Kudos
Anthony_Kahwati
Contributor

Found this thread and am pretty sure it answers my question but wanted to make sure no change of opinion over the last 2 and a bit years.

I'm favouring a direct gateway to gateway sync, using a bonded pair of 10G interfaces. Should I consider anything else or is this the ideal scenario?

The devices are in the same DC, about 40 - 50m apart as the cable is run. I have the option of switches if I want to but prefer P2P if that is still favoured.

Thanks

0 Kudos
Bob_Zimmerman
Advisor

You can do direct-wired sync, and it is supported, but it will cause problems. Not may, will. The TAC had already recommended against direct-wired sync for years when I started in 2006.

One of the things a cluster member monitors to determine its health is its interface status. With direct-wired sync, if one member dies or is rebooted, the remaining member sees its interface go down. This is a failure, and the member can't tell if its interface died, or if something outside it failed, so it causes the member to evaluate whether it is the healthiest member left in the cluster. This is a pretty conservative process, as you really don't want a member whose NIC failed thinking it is healthiest and claiming the VIPs while the other member is also claiming the VIPs.

Among other things, a member checking to see if it is healthiest will ping addresses on all of its monitored interfaces (and on the highest and lowest VLANs of monitored interfaces with subinterfaces). If it gets a response, the interface is marked as good. If it doesn't get a response, the member interprets this as further evidence that it is the one with the problem and therefore should not take over the VIPs.

This can easily lead to situations where you reboot one member to install an update, and the other member thinks it has failed and goes down.

 

The best way I've seen to run sync is two interfaces in a non-LACP bond (round robin works) going to two totally separate switches. If one switch dies? No problem: the bond just handles it. If one member dies? No problem: the other member has link on all of its interfaces. If the link between one of the switches and one of the members dies? No problem: the members realize that sync is broken, but the interfaces are still up, so no failover is triggered; you just lose redundancy.

Anthony_Kahwati
Contributor

Thanks Bob. I needed this. It's so obvious and how we currently have it. Makes sense to carry this into the new DC's.

Have a agreat weekend.

0 Kudos
Bob_Zimmerman
Advisor

Absolutely! While in the TAC, I handled a fair number of tickets from people whose clusters were not working how they expected. Direct-wired sync was the most common cause, and the most common issue it caused was intentionally rebooting one member would take the whole cluster down. The other member would just refuse to take over because it thought it had failed.

Check Point's cluster status state machine is complicated, and can easily cause problems if you don't know about some of the things it does. When set up properly, installing an update should not disrupt traffic at all.

0 Kudos