HA Failover in Azure

venkata_marutur · ‎2018-03-21

Hello Team

I would like to request Checkpoint to provide more sk's with different scenarios specially regarding HA in Azure.

The only sk that most of the guys point to is "How to deploy checkpoint cluster in Azure" which is a good platform to cover most of the stuff (Because I see lot of folks running into issues with creds or service account related issues) but there are some scenarios which the sk does not cover.

Example: I deployed a vsec cluster in Azure according to the sk and my HA test script came back with "All tests are successful". One day suddenly the service account used for the HA has initiated an API call to Azure to point all the routes to the standby node and standby node is still in standby according to cphaprob state. So all the traffic stopped passing the firewall. I dont know the command like clusterxl_admin up in an Azure enviroment, so I had to change the priority in the dashboard and push policy.

My questions are:

1) Why the API call was triggered automatically ? what caused it?

2) Why did the failover fail even after the tests are successful ?

3) Is there any command to generate a failover in Azure gateways (Except shutting down an interface) ?

Please correct me if I am wrong.

Thanks.

Martin_Valenta · ‎2018-03-21

1) API calls are triggered when, clusterXL detect change between active/standby

2) I've never seen an issue that test would be on 100 % and clusterXL would not trigger API calls. Probably something changed between time when you made a test and time when failover happened.

3)In Expert mode:

clusterXL_admin down -p;sleep 2;cphaprob stat;clusterXL_admin up -p;sleep 2;cphaprob stat

that will trigger failover and move member to "down" state, wait for 2 seconds, give you status after "down" registration and then put member back to "standby" status and again show you cphaprob status

venkata_marutur · ‎2018-03-22

Thanks for the info Martin!

1) I want to know what change did the clusterXL see to trigger the API call.

2) It did trigger the API call successfully and the API also changed the routes to standby node successfully but the standby node did not change from standby to active, that is my question.

3) I've tried clusterxl_admin up on the standby and looks like there is no such command. Also got to know form other sources that shutting down an interface is the only option in azure, no commands yet.

Thanks.

Martin_Valenta · ‎2018-03-22

1) it's based on cluster member state, there is daemon which monitor cluster state and if it see that failover occured it will initiate API calls

2) that should not happen

3) you must do it in expert mode, otherwise from clish you can do only "set interface eth1 state off/on"

venkata_marutur · ‎2018-03-22

1) Checking azure_had.elg file ?

3) Yeah did that from expert mode, does not exist.

Martin_Valenta · ‎2018-03-22

1) elg file containt dump of jsons used for api calls and showing all errors related to api calls, you can enable debug on azure_ha.json file to get more details in that log file

3)

[Expert@gateway1:0]# fw ver
This is Check Point's software version R77.30 - Build 024
[Expert@gateway1:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 (local) 10.8.104.196 100% Active
2 10.8.104.197 0% Standby

[Expert@gateway1:0]# clusterXL_admin down -p
Setting member to administratively down state ...
Member current state is Down
[Expert@gateway1:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 (local) 10.8.104.196 0% Down
2 10.8.104.197 100% Active

[Expert@gateway1:0]# cat /etc/in-azure
gey_hvm-48-205.vhd

Are you a member of CheckMates?

HA Failover in Azure