AlekseiShelepov
Advisor

VRRP vs ClusterXL

What would you choose for a cluster on Gaia R77.X - R80.X? 

It is time to decide once and for all. The Ultimate Battle!

Let's start a small holy war here, discuss advantages and downsides for each technology or just personal preferences. You can read some opinions and insights in these threads:

What is ClusterXL and VRRP ?  

HA Cluster with 150+ VLAN Interfaces

ClusterXL - standby cannot reach gateway 

VRRP: 2
ClusterXL: 18
HeikoAnkenbrand
Champion

Hi Aleksei,

I don't want to vote here! From my point of view it's a design question. I always lean towards ClusterXL.

See these SKs:

VRRP:

Configuration requirements / considerations and limitations for VRRP cluster on Gaia OS

ClusterXL:

Recommended configuration for ClusterXL 

ATRG: ClusterXL 

Regards,

Heiko

➜ CCSM Elite, CCME, CCTE
AlekseiShelepov
Advisor

So, in which designs might I prefer VRRP? Do you mean that if the design I have in mind for a new project requires 3 cluster members, or 200 VLAN interfaces, or VSX, then I simply cannot use VRRP? Well, then I don't see any reason to use VRRP at all. Why not just use ClusterXL, which has none of these limitations that might affect the network in the future and is easier to configure? It is used for session synchronisation anyway. In which situations (designs) would VRRP perform better?

I am mainly interested in arguments for using VRRP on Check Point devices with Gaia on modern versions. I understand why it was used previously on IPSO. But now... There are more ways to mess up a VRRP configuration and to end up with some interfaces active on the master node and some on the secondary, and it is more difficult to monitor.

I heard only a couple of theoretical advantages of VRRP over ClusterXL:

1) In VRRP you can configure the priority of each interface more granularly

Here I'm not sure if anyone configures this nowadays. This might be used in some very specific situation, but I believe it could be also configured on ClusterXL.

2) VRRP is a standard protocol used by multiple vendors and described in an RFC

I don't think that's an advantage. How does this help? Again, ClusterXL is used for session synchronisation anyway. Other vendors also have their own HA protocols, which can work better than VRRP on their devices.

We can also add the VRRP FAQ to the list of things to read on this topic.

Maik
Advisor

Well, I remember asking my instructor exactly the same question during my CCSE class back in July this year, as I just did not see any "real" benefits of VRRP. The only difference that could be an advantage over ClusterXL is the possibility to configure advanced VRRP, where each interface is monitored more or less individually. But this can also lead to the so-called "Black Hole Effect", which is basically asymmetric routing caused by a defective interface combined with the separate monitoring of each interface. (The "solution" here is the "monitored circuit" VRRP option, which basically cancels out the advantage of individual monitoring.)

But to go back to my CCSE class: my instructor basically said that the only real advantage of VRRP is that it's a well-known, recognized standard, and some people just prefer well-known, common things over proprietary solutions. In addition, ClusterXL operates in the Security Gateway software (which runs on top of Gaia), while VRRP sits directly in the Gaia OS itself; at least that's what I heard back then. It could be that running directly on Gaia brings a performance benefit, but I can't confirm that for sure.
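To make the "Black Hole Effect" described above a bit more concrete, here is a toy Python model of two members that elect a master separately on every interface. It is only a sketch: the function names, member names and priorities are made up for illustration, and it ignores timers, preemption and everything else a real VRRP implementation does.

```python
# Toy model of per-interface VRRP election. Every name here is hypothetical;
# this is not Check Point code and it ignores timers, preemption and adverts.

def elect_per_interface(priorities, interfaces, failed):
    """Return {interface: master} when each interface elects on its own.

    priorities: {member: priority}; the highest priority wins
    failed:     set of (member, interface) pairs whose link is down
    """
    masters = {}
    for ifname in interfaces:
        # Only members whose link on this interface is up can become master.
        alive = {m: p for m, p in priorities.items() if (m, ifname) not in failed}
        masters[ifname] = max(alive, key=alive.get) if alive else None
    return masters


def is_split(masters):
    """True when different members are master on different interfaces."""
    return len({m for m in masters.values() if m is not None}) > 1


priorities = {"fw1": 100, "fw2": 95}
interfaces = ["eth1", "eth2"]

healthy = elect_per_interface(priorities, interfaces, failed=set())
broken = elect_per_interface(priorities, interfaces, failed={("fw1", "eth2")})

print(healthy, is_split(healthy))
print(broken, is_split(broken))
```

In the healthy case fw1 is master everywhere. Once fw1 loses eth2, fw1 stays master on eth1 while fw2 takes over eth2, so traffic can enter through one member and would have to leave through the other. That split is the situation the monitored circuit option is meant to prevent.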

_Val_
Admin

At this point in time there is no sense in choosing VRRP over ClusterXL HA. Check Point has had to maintain VRRP in Gaia since the acquisition of the Nokia Security Appliances business, to ensure a smooth transition from IPSO appliances into the brave new world.

Maria_Pologova
Collaborator

For example, one huge disadvantage of VRRP when using OSPF is described in "OSPF Graceful Restart configured in Clish or in Gaia Portal does not work":

When configuring OSPF Graceful Restart with VRRP, fast cluster failovers/failbacks can still lead to a traffic outage. Therefore, it is recommended to use non-preempt mode in VRRP. This happens because the VRRP Backup members have no information about OSPF routes, as the VRRP Master member has not completed the restart operations yet.

With ClusterXL, OSPF routing information should sync between cluster members.

HeikoAnkenbrand
Champion

Launches a survey on whether VRRP should be abolished.

Regards,

Heiko

➜ CCSM Elite, CCME, CCTE
Maria_Pologova
Collaborator

As Timothy Hall mentioned in "What is ClusterXL and VRRP?":

I recommend ClusterXL over VRRP unless one has the rare need to present more than one Cluster IP (VIP) on a single interface (which VRRP can do but ClusterXL can't), or there is some external load balancing algorithm in use (like OSPF) controlling the traffic distribution with load sharing via VRRP.

There are still rare cases where VRRP is preferred.

And if some people are comfortable with VRRP, let them use it. After all, it is a matter of preference, and it is always good to have a choice. :)

Maarten_Sjouw
Champion

Well, as someone who has used quite a bit of IPSO and VRRP before, I'm still leaning towards VRRP. One of the simplest things you can do in VRRP is find out which interfaces (if any) are misbehaving. This mostly comes up during implementations and switch migrations, where someone forgets to add a VLAN to a switch. The show vrrp interfaces command will show exactly which interface is not communicating with the other side.

To this day I have not found a way to get this info in ClusterXL. Even worse, by default it will ONLY check the first and last VLAN on an interface!!

The command set vrrp disable-all-virtual-routers on will allow you to completely shut VRRP down without losing the config.

And yes, VRRP is part of the OS, where it should be (IMHO).

Regards, Maarten
Maria_Pologova
Collaborator

Even though I don't like the default behavior myself, fw ctl set int fwha_monitor_all_vlan 1 does its job for the period of a migration. But I think it is a fair disadvantage to mention.
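As a rough illustration of the difference discussed here, the following Python sketch models which VLANs on a trunk would be watched under the default behavior versus with fwha_monitor_all_vlan set to 1. It assumes that "first and last" means the lowest and highest VLAN ID on the interface; the function name and the VLAN numbers are hypothetical and this is not how the kernel code is actually written.

```python
# Hypothetical model of ClusterXL VLAN monitoring on one physical interface.
# Assumption: the default watches only the lowest and highest VLAN ID, and
# monitor_all=True stands in for "fw ctl set int fwha_monitor_all_vlan 1".

def monitored_vlans(vlan_ids, monitor_all=False):
    """Return the sorted VLAN IDs that would be monitored."""
    ordered = sorted(set(vlan_ids))
    if monitor_all or len(ordered) <= 2:
        return ordered
    return [ordered[0], ordered[-1]]


trunk = [30, 10, 20, 40]

default_set = monitored_vlans(trunk)
full_set = monitored_vlans(trunk, monitor_all=True)
unwatched = [v for v in full_set if v not in default_set]

print("default:", default_set)
print("monitor all:", full_set)
print("not watched by default:", unwatched)
```

With these example numbers, VLANs 20 and 40 are the interesting ones: a VLAN like 20 that someone forgot to add on the switch would go unnoticed under the default, which is exactly the migration scenario raised earlier in the thread.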

With ClusterXL you don't need to make any configuration on the gateway itself, apart from configuring the sync interfaces, so you have nothing to lose. :) And there are different methods of bringing the cluster member down, such as:

  • clusterXL_admin down - script that creates a fake critical device and brings it down. Usually used to perform manual failover.
  • cphastop - stops the cluster member from passing traffic. Used in case of emergency.
Maarten_Sjouw
Champion

I just don't like having to set a hidden variable to enable something that should be enabled from the start. Besides that, the cphaprob commands don't show me the same level of detail that I can see with show vrrp interfaces.

  • clusterXL_admin down only puts the member in problem state and does not survive a reboot
    • set mcvr vrid XX priority 195 changes the priority when the other member has 200 as its priority and 10 as delta
  • cphastop does about the same as the vrrp disable command; not sure whether it survives a reboot?

  • show configuration vrrp shows the configured state and show vrrp shows the actual state.
  • show configuration mcvr shows the actual configuration of the cluster IP's and the priorities.
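The priority numbers in the set mcvr example above (195 versus 200 with a delta of 10) can be worked through with a small Python sketch. It assumes the usual monitored circuit rule that a member's effective priority is its configured priority minus the delta for every failed monitored interface; the member names and helper functions are hypothetical.

```python
# Worked example for the 195 / 200 / delta 10 case. Names are hypothetical.
# Assumed rule: effective priority = configured priority - delta * failures.

def effective_priority(base, delta, failed_interfaces):
    """Priority a member advertises after subtracting its failure penalty."""
    return base - delta * failed_interfaces


def master(members):
    """members: {name: (base, delta, failed_interfaces)}; highest wins."""
    return max(members, key=lambda name: effective_priority(*members[name]))


# The peer keeps priority 200, this member was lowered to 195.
all_healthy = {"peer": (200, 10, 0), "this": (195, 10, 0)}
# Same priorities, but the peer has lost one monitored interface.
peer_one_down = {"peer": (200, 10, 1), "this": (195, 10, 0)}

print(master(all_healthy))      # peer: 200 beats 195
print(master(peer_one_down))    # this: 200 - 10 = 190 is below 195
```

Read this way, setting 195 demotes the member below a healthy peer but still leaves it high enough to take over as soon as the peer loses a single monitored interface.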

Another thing that is default with VRRP is a virtual MAC address, and yes, I know you can, no sorry, MUST enable that with ClusterXL as well.

And as it has been said already a number of times in this thread, it is mainly a matter of opinion.

Within our team we also have this discussion on a regular basis. Three years ago our current team was formed by combining a team with an open server background and a team with an IPSO background, and we still run both very happily next to each other.

With BGP there is an advantage, as it allows you to use the VIP with VRRP, which was not possible with R77.x. But I don't have any BGP clusters running on R80.x, so I have not really been able to check that.

Regards, Maarten
Maria_Pologova
Collaborator

With -p (permanent), clusterXL_admin down -p does survive a reboot. A small remark: this command puts a critical device (a pnote) in problem state, which brings the cluster member to the down state.

  • cphaprob stat - Gives you an overview of how the cluster is doing. 
  • cphaprob -ia list - View the list of critical devices on a cluster member, and of all the other machines in the cluster.
  • cphaprob -a if - Prints the summary of cluster interfaces with the following information:

    - Number of required cluster interfaces - including the Sync interfaces (the maximal number of good cluster interfaces seen since the last reboot)

    - Number of required secured (trusted) interfaces (the maximal number of good sync interfaces seen since the last reboot)

    - Names of monitored cluster interfaces (refer to CCP and VLAN interfaces)

    - State of cluster interfaces (based on arrival/transmission of CCP packets)

    - CCP mode on cluster interfaces

    - Number of cluster Virtual IP addresses

    - Virtual IP addresses

    - Virtual MAC addresses

In most cases these three commands are enough for troubleshooting. In this respect VRRP and ClusterXL have the same functionality, just with different commands. For me personally, ClusterXL is more straightforward than VRRP. And again, what I posted above regarding OSPF with VRRP was the decisive consideration against VRRP in my scenario.

Looking at BGP on Gaia OS, I don't see any limitations on using the VIP in VRRP or ClusterXL.

Peter_Sandkuijl
Employee

@ Maarten and all, make sure to check out ClusterXL R80.20 improvements such as status:

Improved sync stats:

Improved CLI (no need for expert mode)

And more such as automatic CCP mode and improved sync redundancy options

BR

Peter !!

AlekseiShelepov
Advisor

So you are ready to put yourself into a situation with the many limitations of VRRP, which can affect you in the future, because of two functions that you use (and maybe not very often)?

As was mentioned here, in general you can do the VLAN checking and disable clustering on ClusterXL. Maybe not in a way that is very comfortable for you, maybe not in the best way, but it's possible to come up with something.

Isn't ClusterXL worth it for the easier management in everyday operations, with less chance of a mistake? As I see it for now, you prefer VRRP because you are used to it and because it shows all VLAN interfaces in monitoring. But it brings too many limitations and drawbacks in my opinion. And using both ClusterXL and VRRP in the same environment adds unnecessary complexity.

A simple thing, checking which cluster member is currently active in SmartView Monitor, is not possible with VRRP. I have seen situations where some VRRP interfaces were in master state on one node and some on the other, and the cluster members didn't fail over. With a VRRP setup ClusterXL is still used for session synchronisation, so you would need to troubleshoot two things instead of one in case of synchronisation issues. And so on.
