Troubleshooting Azure HA cluster failover and the API call
We are deploying a new cluster for a customer and we wanted to test failover. I have tested this in a test Azure account previously and this worked.
I built another test environment today and I am showing the same symptoms as the customer.
Everything seems to deploy fine: we can establish SIC with the management server, install policy, and so on. However, when we fail over, either by running clusterXL_admin down or by powering off the active gateway, the failover is triggered within Check Point (cphaprob stat on the secondary gateway shows it is now active), but the cluster-vip IP still shows in Azure on the other gateway. It has not moved across to the second gateway.
This suggests to me that either the gateway isn't triggering the API call, or the API call is triggered but not actioned. How do we troubleshoot this?
I was hoping to get some help from the community before going through TAC, because you have to do the initial hoop jumping before you get to someone who knows cloud.
Thanks
Scott
I'd start with running $FWDIR/scripts/azure_ha_test.py and see what it says.
So the output I get is:
Image version is: harry_main-294-801-GW
Reading configuration file...
Setting api versions for "ha" solution
ARM versions are: {
"resources": "?api-version=2019-07-01"
}
Error:
The hostname xxxxfw002 should be either 'xxxxfw01' or 'xxxxfw02'
[Expert@xxxxfw002:0]#
What is it comparing the hostname to? The name in SmartConsole or the name in Azure?
It must be Azure, as I have checked SmartConsole and the fw002 object name there matches the fw002 hostname in Gaia.
Yes, it is checking the name of the VM in the Azure portal.
If you deployed the ARM template and then manually changed the hostname, you're in for some fun changes to the azure_ha_test.py and azure_had.py scripts on the gateways.
This is the part of the script where it looks for the hardcoded cluster_name + '1' as the name of the first member:
if conf['hostname'] not in {cluster_name + '1', cluster_name + '2'}:
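To see why xxxxfw002 is rejected, here is a minimal standalone sketch of that check. It is not the actual Check Point script: the function names are hypothetical, the error text is copied from the output posted above, and the cluster name 'xxxxfw0' is only inferred from the two names that error message lists.

```python
def allowed_member_names(cluster_name):
    # Same rule as the quoted line: cluster name plus a suffix of '1' or '2'.
    return {cluster_name + '1', cluster_name + '2'}


def check_hostname(hostname, cluster_name):
    """Return an error string if the hostname breaks the rule, else None.

    Hypothetical helper for illustration only; the message format mirrors
    the error shown earlier in the thread.
    """
    allowed = allowed_member_names(cluster_name)
    if hostname not in allowed:
        first, second = sorted(allowed)
        return "The hostname %s should be either '%s' or '%s'" % (
            hostname, first, second)
    return None


if __name__ == '__main__':
    # 'xxxxfw0' is an assumption inferred from the error message.
    print(check_hostname('xxxxfw002', 'xxxxfw0'))
    print(check_hostname('xxxxfw02', 'xxxxfw0'))
```

With a cluster name of 'xxxxfw0', the only accepted hostnames are 'xxxxfw01' and 'xxxxfw02', so a member renamed to 'xxxxfw002' fails the test before any Azure API call is attempted.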
Please also check
It explains manual testing without executing the failover
And the important part about the naming convention (because of the hardcoded scripts):
Naming Constraints
Do not change the name of any resources.
Cluster Members VM names must match the Cluster name with a suffix of '1' and '2'.
Network Interface names must match the Cluster Member VM names with a suffix of '-eth0' and '-eth1'.
The IP address of the cluster has to match the configuration file.
By default it should match the cluster name.
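The VM and interface rules above can be checked by hand against what the Azure portal shows. The sketch below assumes you have typed the VM and NIC names into a plain dictionary yourself; it does not query Azure, and the function names and the 'xxxxfw0' cluster name are hypothetical.

```python
def expected_names(cluster_name):
    # Members are <cluster>1 and <cluster>2; each has <member>-eth0 and -eth1.
    members = [cluster_name + suffix for suffix in ('1', '2')]
    nics = {m: [m + '-eth0', m + '-eth1'] for m in members}
    return members, nics


def find_naming_violations(cluster_name, vm_nics):
    """vm_nics maps each VM name to the list of its NIC names (entered by hand)."""
    members, nics = expected_names(cluster_name)
    problems = []
    for member in members:
        if member not in vm_nics:
            problems.append('missing VM named %s' % member)
            continue
        for nic in nics[member]:
            if nic not in vm_nics[member]:
                problems.append('VM %s has no interface named %s' % (member, nic))
    for vm in vm_nics:
        if vm not in members:
            problems.append('unexpected VM name %s' % vm)
    return problems


if __name__ == '__main__':
    # Example input: the second member was renamed after deployment.
    observed = {
        'xxxxfw01': ['xxxxfw01-eth0', 'xxxxfw01-eth1'],
        'xxxxfw002': ['xxxxfw002-eth0', 'xxxxfw002-eth1'],
    }
    for problem in find_naming_violations('xxxxfw0', observed):
        print(problem)
```

An empty result only means the names follow the convention; it says nothing about whether the failover API call itself succeeds.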
