Re: failover logic operation in VSX?

Michael_Briceño · ‎2017-08-07

Hello friends,

I'm noticing strange behavior in the VSX cluster that I have implemented.

To explain: one of the configured virtual instances fails over at random. Specifically, it is a Virtual Router.

At some point this VR switches to the other member of the VSX cluster (the other VSes remain on the active member), and communication to the Internet is completely lost. It is worth mentioning that this VR is the path for all communication to the outside (Internet).

When I run cphaprob stat in VSID 0, I see that the VR is active on the other member, and I have to force it to switch back by connecting to the secondary device and running:
#vsenv 8
#clusterXL_admin down

After that, the connection recovers immediately. When these random failovers happen, shouldn't they be transparent? Or what is the failover logic here?

The VSX cluster mode is Single VS failover.

Thanks in advance.

Regards,

PhoneBoy · ‎2017-08-08

Failovers should generally be transparent unless you're trying to force it to failover for one reason or another.

What version of code are we talking here (with Jumbo Hotfix if installed)?

Have you, by chance, opened a ticket with TAC on this?

Michael_Briceño · ‎2017-08-08

Hi Dameon,
We currently have VSX deployed in a cluster, with 5 Virtual Systems, 4 Virtual Switches and 1 Virtual Router. All these instances share LACP links, each with its own VLANs. The problem is that the VR switches over at random and becomes active on the second cluster member, causing the connection to be lost.
If I do nothing, the outage lasts approximately 10 to 15 minutes, after which the VR returns as active to the primary member, where it normally runs.

Because a 10 to 15 minute loss of connection is too long, I find it necessary to manually switch it back so that it is active on the primary member again. I hope that makes sense.

Currently, the JH installed is as follows:

[Expert@fw01:0]# cpinfo -y all

This is Check Point CPinfo Build 914000176 for GAIA
[FW1]
HOTFIX_R77_30
HOTFIX_GEYSER_PINK5_HF
HOTFIX_R77_30_HF5_PINK_PERF
HOTFIX_R77_30_HF5_PINK_PERF_003
HOTFIX_GEYSER_HF_BASE_861
HOTFIX_R77_30_JUMBO_HF Take: 216

FW1 build number:
This is Check Point's software version R77.30 - Build 048
kernel: R77.30 - Build 048

[SecurePlatform]
HOTFIX_GAIA_GEYSER_PINK5_HF
HOTFIX_R77_30_JUMBO_HF Take: 216

[PPACK]
HOTFIX_R77_30
HOTFIX_GAIA_GEYSER_PINK_HF
HOTFIX_R77_30_JUMBO_HF Take: 216

[CPinfo]
No hotfixes..

[CVPN]
HOTFIX_R77_30
HOTFIX_R77_30_HF5_PINK_PERF
HOTFIX_R77_30_JUMBO_HF Take: 216

[CPUpdates]
BUNDLE_R77_30_JUMBO_HF Take: 216

[DIAG]
HOTFIX_R77_30

[rtm]
No hotfixes..

TAC advised us to increase the concurrent sessions table, based on what they saw in the CPInfo we sent, but the random incidents continue.

Regards,

Michael Briceño.

_Val_ · ‎2017-08-08

Do you use VSLS or not?

Michael_Briceño · ‎2017-08-08

Hello Valeri,

I am not using VSLS.

PhoneBoy · ‎2017-08-08

Right, but individual VS failover can only happen if you are using VSLS.

If you are running Chassis HA mode, then ALL VSes will fail over, not just one.

Have you opened a TAC case on this?

_Val_ · ‎2017-08-08

Dameon and Michael, although usually ClusterXL per VS failover is automatically configured when building a VSLS cluster, it can also be manually changed after the fact through cpconfig.

To correct the issue, I suggest the following set of actions:

1. Make sure the VSX cluster is not configured in VSLS mode. If it was set to VSLS by mistake, this should be corrected with "vsx_util convert cluster".

2. If the VSX cluster object is configured in HA mode, disable per VS failover through cpconfig on each cluster member and then reboot.

In any case, the root cause is per VS failover. It should not be in place if VRs are part of the topology. The only legitimate use of this feature is with VSLS.

Michael_Briceño · ‎2017-08-08

Thanks Valeri,
I will review what you describe and validate the cluster mode on both cluster members, to rule out any configuration errors.

_Val_ · ‎2017-08-08

Just one last note. Even after you put the correct config in place and are in a supported configuration, there is still a chance of an external network issue causing a failover. If you experience failovers after configuring the proper ClusterXL mode, look for an external problem.

Michael_Briceño · ‎2017-08-08

Yes, we have an open case for this and it is under review. We are waiting for their answer.

_Val_ · ‎2017-08-08

It sounds like you have VSX configured in VSLS config while using a Virtual Router to share a physical interface. VSLS is not supported with Virtual Routers. You need to use a Virtual Switch instead.

If you are NOT using VSLS, then failover per VS/router would not happen. You should observe full failover from one physical cluster member to another.

It would help to get more details about your issue. The original description is a bit vague.

Michael_Briceño · ‎2017-08-08

Hello Valeri,

Thank you for your comments. To give more details, I am not currently using VSLS. The cphaprob stat output is attached:

[Expert@fw01:0]# cphaprob stat

Cluster Mode: Single VS failover

Number     Unique Address  Assigned Load   State

1 (local)  10.10.10.1      100%            Active
2          10.10.10.2      0%              Standby

Cluster name: FWCL

Virtual Devices Status on each Cluster Member
============================================================

We are using the Virtual Router only because the network requires policy-based routing (PBR).

It is not clear to me what the failover behavior is in this mode. Only the VR with ID 8 switches over from time to time, and there is no general hardware failure or disconnected cable that would trigger a failover. The strange thing is that only the VR switches, resulting in something like this:

Hence my question: what is the failover logic in this cluster mode, Single VS failover?

I hope I have been clear in my explanation.

Regards,

Michael Briceño.

_Val_ · ‎2017-08-08

No, sir, you ARE using VSLS, according to this output. Even if every VS is active on the same cluster member, VSLS is in play. Without VSLS you would only see Active/Standby per physical cluster member. You are in an unsupported configuration, and it is causing your router to fail over.

To fix it, you have to use "vsx_util convert cluster" on the management and reconfigure VSX cluster from VSLS to HA mode. You will have to reboot VSX cluster members afterwards. Refer to VSX admin guide for more details.

_Val_ · ‎2017-08-08

As I cannot edit the previous reply,

Cluster Mode: Single VS failover - this is VSLS by definition.
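
For anyone who wants to check this from saved output rather than by eye, here is a minimal sketch in Python. It only looks for the "Cluster Mode:" line in text captured from cphaprob stat, as pasted earlier in this thread. The function names and the set of mode strings are assumptions made for illustration: the only mode string taken from this thread is "Single VS failover", and other releases may word the line differently.

```python
import re

# Assumption: only the mode string quoted in this thread is treated as
# per-VS failover. Extend this set if your output uses other wording.
PER_VS_MODES = {"single vs failover"}


def cluster_mode(cphaprob_output):
    """Return the text after 'Cluster Mode:' or None if no such line exists."""
    for line in cphaprob_output.splitlines():
        match = re.match(r"\s*Cluster Mode:\s*(.+?)\s*$", line)
        if match:
            return match.group(1)
    return None


def is_per_vs_failover(cphaprob_output):
    """True if the reported cluster mode is one of the per-VS failover modes."""
    mode = cluster_mode(cphaprob_output)
    return mode is not None and mode.lower() in PER_VS_MODES


# Sample text copied from the output posted in this thread.
SAMPLE = """\
Cluster Mode: Single VS failover

Number Unique Address Assigned Load State

1 (local) 10.10.10.1 100% Active
2 10.10.10.2 0% Standby
"""

if __name__ == "__main__":
    print(cluster_mode(SAMPLE))
    print(is_per_vs_failover(SAMPLE))
```

The sketch reads text only and does not run any Check Point command itself, so it can be used on output saved from either cluster member.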

Michael_Briceño · ‎2017-08-14

Hello Valeri,

I did what you recommended. In short, Single VS failover mode is effectively VSLS, so the Virtual Router was flapping at random because it is not supported in this mode.

We have changed the cluster mode through vsx_util and High Availability mode is now working. We will keep monitoring the platform to confirm that these sudden failovers no longer occur.

Just as a note, when you create the VSX cluster objects, these are the ClusterXL options:
Checkpoint Secure Platform (ClusterXL)
Check Point ClusterXL Virtual System Load Sharing
Crossbeam System
Check Point IPSO VRRP

The High Availability option does not appear.

Regards,

Michael Briceño.

_Val_ · ‎2017-08-14

Well, not exactly. The first option, Checkpoint Secure Platform (ClusterXL), is classic HA. There are only two options for Check Point proper, so it is either VSLS or not 🙂

Glad it worked out for you.

bochsmann · ‎2017-08-10

Hello,

I think Valeri is not fully correct. I guess you have single VS failover enabled in cpconfig (the new default in R77.30 is enabled, while it was disabled before).

With this value enabled, only the single broken VS fails over; with it disabled, the machine fails over all VSes, regardless of VSLS.

In HA you MUST disable this, as otherwise you will run into the trouble you described.

HTH

Bernd Ochsmann
