Re: ClusterXL member disable?

Maarten_Sjouw · ‎2018-04-16

I was preparing an upgrade and replacement of a cluster. In our IPSO/GAIA and VRRP era this was pretty simple: on the member you just issue the 'set vrrp disable-all-virtual-routers on' command and the cluster member will not participate in VRRP.

How can you do this with ClusterXL?

Situation:

2 old HP cluster members running R77.30

2 new 5000 appliances running R80.10

All 4 have an external IP from where we manage them, all 4 have been added to the cluster in the SmartConsole.

Now when I load a policy on the new 5000 units one of the units will start fighting with the HP that is the master.

I need to be able to stop that behaviour until we can migrate to the new units.

Regards, Maarten

Daniel_Taney · ‎2018-04-16

In CLI expert mode, try clusterXL_admin down on whichever boxes you'd like to stop participating in the Cluster. You can also do clusterXL_admin up to put the member back into service.
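A minimal sketch of that sequence, for illustration only. The `run` wrapper and the `DRY_RUN` variable are hypothetical helpers, not Check Point tools: by default the script only prints what it would execute, so it can be read through safely on any box. `clusterXL_admin` and `cphaprob stat` are the commands named in this thread.

```shell
#!/bin/bash
# Sketch only: DRY_RUN and run() are hypothetical helpers, not Check Point tools.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Take this member out of service, then show the cluster state.
member_down() {
    run clusterXL_admin down
    run cphaprob stat
}

# Put the member back into service, then show the cluster state.
member_up() {
    run clusterXL_admin up
    run cphaprob stat
}

member_down
```

Setting DRY_RUN=0 in expert mode on a cluster member would execute the commands for real.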

R80 CCSA / CCSE

Maarten_Sjouw · ‎2018-04-16

Nope. Even after clusterXL_admin down, a member that thinks it is alone will still try to bring up the cluster and activate the cluster IPs.

On both clusters (R77 and R80) the members do not show the other two members, so even though it is 1 cluster object, they still operate as 2 separate clusters. fw unloadlocal or cpstop are the only quick ways we could think of to stop ClusterXL from trying to take the Master role.

I am really getting to the point of turning the cluster into a VRRP cluster.

Regards, Maarten

HeikoAnkenbrand · ‎2018-04-16

You have two choices.

The cluster member with the lower software version always remains active, so you can reinstall a gateway without anything happening. If you install the policy on the gateway with the higher version, it goes into Ready mode. As soon as the other gateway is reinstalled or cpstop is executed on it, the Ready gateway goes into Active mode.

Alternatively, in cpconfig you can remove one, two or three gateways from ClusterXL and re-enable them after the update.

➜ CCSM Elite, CCME, CCTE ➜ www.checkpoint.tips

Maarten_Sjouw · ‎2018-04-16

As soon as I install the R80.10 Policy it does try to take over the cluster. It does not seem to "see" the old members.

So there is NO CLI command to temporarily disable the cluster, only cpstop, fw unloadlocal and cpconfig are the possibilities?

I really do not understand why everyone is so happy with clusterXL then.

Regards, Maarten

AlekseiShelepov · ‎2018-04-16

I really do not understand why everyone is so happy with clusterXL then.

I can tell you why I am not happy with VRRP and IPSO clustering. It adds another point which you need to make sure is working. To check which node is active and to check cluster status you need to deal with two things. ClusterXL synchronization of sessions is used in any case. You cannot make priorities and fully manage cluster behavior from policy, but you need to manually do that on cluster members. You need to add interfaces first in VRRP configuration, then in policy. Additional cadmin account.

With ClusterXL you know that you deal with each cluster member separately only for its own network settings. If you do fw unloadlocal it should still have proper interfaces and routing. And for all cluster things, go to policy.

As soon as I install the R80.10 Policy it does try to take over the cluster. It does not seem to "see" the old members.

I believe this is the main problem, not ClusterXL itself.

Also I don't understand why it is not possible to use cpstop, if you still want to break the cluster.

cphastop

Description Running cphastop on a cluster member stops the cluster member from passing traffic. State synchronization also stops. It is still possible to open connections directly to the cluster member. In High Availability Legacy mode, running cphastop may cause the entire cluster to stop functioning.

Maarten_Sjouw · ‎2018-04-16

Well this is a command, cphastop, that might just be the one I was looking for.

My main advantages for VRRP are that it just gives me better control over the behavior of the cluster and I have a much better view on the interface states. As soon as 1 of the interfaces is seen as Master on both you know you have an issue in the network or with spoofing/rulebase.

Regards, Maarten

AlekseiShelepov · ‎2018-04-16

And just out of curiosity, am I understanding it right?

You have a cluster object in a policy into which you have added four gateways - two old ones with R77.30 and VRRP installed on HP servers, and two new 5000 appliances with R80.10 and ClusterXL (with probably different number of cores).

But how would you know that an interface is seen as master on both nodes? I mean after the initial setup, when some time has passed and someone installs a slightly wrong policy. I think you would see drops in the logs first.

I had cases where some interfaces were shown as master on one node and other interfaces were master on the other. And I didn't know that until I decided to check vrrp, just to look at what is there. So how did that happen, and why did it stay like that? I expect all interfaces to be master only on the active node.

Priority in policy doesn't matter in that case, as I understand it. cphaprob stat just shows that both nodes are active. Then I need to go to the vrrp output to see what is there.

For me VRRP is the opposite: it complicates things and adds new entities.

And IPSO clustering... I would prefer just not to touch it.

Maarten_Sjouw · ‎2018-04-16

No, everything is running ClusterXL. I myself am from a team that worked with Nokias and IPSO for a long time, and VRRP was THE clustering method we used. Nowadays we have joined, together with our customers, another team that used to install HPs everywhere with SPLAT on them, so only ClusterXL was possible. We have ClusterXL on most HPs from that team and VRRP on most of the clusters from our old team, so I know both. But the ease of shutting down VRRP should be similar to cphastop/cphastart.

clusterXL_admin down does not do the trick in this case, as there should always be a member up, and as the 2 versions do not seem to "see" each other, there will be an R77.30 member up and an R80.10 member up.

I just want to be able to install policy and have the cluster in a shut state until we are ready to switch over in a maintenance window.

Master-Master can have several causes: different VRIDs, network links down, VLANs not connected between the switches the firewalls are connected to, or a policy problem.

Regards, Maarten

Daniel_Taney · ‎2018-04-16

Not to hijack the thread here, but for my own curiosity, what is the difference between clusterXL_admin <down/up> and cphastop / cphastart? I've sometimes used the two commands interchangeably and assumed they did more or less the same thing.

R80 CCSA / CCSE

AlekseiShelepov · ‎2018-04-16

I do not use cphastop on any occasion. When I'm ready to shut down a cluster or a member of that cluster, I run cpstop.

I am not sure if I use the right words to describe that, so maybe someone would correct me later.

cphastop - stops session synchronization and the ClusterXL (HA) module, so the node does not pass traffic. In my opinion, it is similar to running cpstop, but it doesn't stop all processes. Requires full synchronization of sessions afterwards. Important note from the guides:

Running cphastart on a cluster member activates ClusterXL on the member. It does not initiate full synchronization. cpstart is the recommended way to start a cluster member.

clusterXL_admin down - sends a signal that a critical monitored device is down on this node, so the whole node should switch to down state, but it does not disable synchronization. If the standby node itself has some problems and/or synchronization is not working, the node where you entered this command should not go down. Does not require full synchronization after clusterXL_admin up.

This is the recommended method for a failover, and it doesn't really break the cluster on the same level as cphastop. It is also easy and quick to get the node back in the game again.
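To summarise the distinction above in one place, here is a small sketch. `pick_command` and its intent names (failover, stop-ha, shutdown) are made up for illustration and are not part of any Check Point tooling; the function only prints the command this post associates with each intent and never runs anything.

```shell
#!/bin/bash
# Hypothetical helper (not a Check Point tool): maps an intent to the command
# described in the post above. It only prints the command; it never runs it.
pick_command() {
    case "$1" in
        # Planned failover: state sync stays up; undo with clusterXL_admin up.
        failover) echo "clusterXL_admin down" ;;
        # Stop the ClusterXL module: sync stops; cpstart is the recommended way back.
        stop-ha)  echo "cphastop" ;;
        # Full stop of the member.
        shutdown) echo "cpstop" ;;
        *)        echo "unknown intent: $1" >&2; return 1 ;;
    esac
}

pick_command failover
pick_command stop-ha
```

Treat it as a memory aid under the assumptions in this thread, not as a statement of how every ClusterXL version behaves.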

Daniel_Taney · ‎2018-04-18

Very helpful, thanks!

R80 CCSA / CCSE

Maarten_Sjouw · ‎2019-02-26

To get back to this discussion: cphastop also does not survive a reboot. In this particular case we really needed it to survive a reboot, as someone decided to work on the UPS system and shut down the whole site, so when the boxes came back up, both an R77.30 and an R80.10 member were active.

The main reason I didn't want to run cpstop is that the gateways were all connected to the internet.

Regards, Maarten

Vincent_Bacher · ‎2018-04-16

Add the following line to $FWDIR/boot/modules/fwkern.conf

"fwha_version=9999"

and reboot the device

Maybe "fw ctl set int fwha_version 9999" works as well.

Just kidding

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
