Azure based Vsec R80.10 Cluster - Secondary node issue
Hi, I have deployed an R80.10 Check Point cluster into Microsoft Azure. ClusterXL is working (active/standby) and I can manage and push policies to both cluster nodes (inbound connectivity is OK).
However, when I run the Azure test script, which checks the connectivity to Azure needed for UDR and cluster IP changes, the secondary node can't resolve DNS. The primary node works fine. If I try to ping 8.8.8.8, for example, I get no response, as if the node has no outbound Internet connectivity at all rather than just a DNS issue. This is very odd because I can manage the cluster nodes and ClusterXL is working. But because the secondary node has no outbound connectivity, failover is not working, and it also can't contact checkpoint.com to get its contract status, so it's complaining about licensing. Any ideas?
Output from the secondary node, where the test fails, is below.
[Expert@vsec-node-2]# $FWDIR/scripts/azure_ha_test.py
Image version is: ogu_GAR1-289
Reading configuration file...
Testing if DNS is configured...
- Primary DNS server is: 8.8.8.8
Testing if DNS is working...
Error:
Failed to resolve login.windows.net
!
[Expert@vsec-node-2]# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2001ms
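The two symptoms above (name resolution failing and no outbound reachability) can be separated with a small check. This is a minimal sketch, assuming Python 3 on a machine where you can run it. It is not the logic of azure_ha_test.py, and the function name check_outbound and its parameters are made up for illustration.

```python
import socket

def check_outbound(host="login.windows.net", port=443, timeout=3.0,
                   resolve=socket.gethostbyname,
                   connect=socket.create_connection):
    """Return (ok, detail) after a DNS lookup followed by a TCP connect.

    resolve/connect are injectable so the check can be exercised offline.
    """
    try:
        addr = resolve(host)  # step 1: can we resolve the name at all?
    except OSError as exc:    # socket.gaierror is a subclass of OSError
        return False, f"DNS: failed to resolve {host} ({exc})"
    try:
        conn = connect((addr, port), timeout)  # step 2: can we reach it?
    except OSError as exc:
        return False, f"TCP: cannot reach {addr}:{port} ({exc})"
    conn.close()
    return True, f"{host} -> {addr}, port {port} reachable"

# Example call (performs real network I/O):
# print(check_outbound())
```

A "DNS:" result combined with a failed ping to the DNS server itself, as in the output above, points at general outbound connectivity rather than at the resolver configuration.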
For anyone else who experiences this issue: the cause was that the secondary node was NATing its own traffic behind the cluster IP address. The cluster IP was assigned to the primary node, so asymmetric routing was occurring.
The solution was a "no NAT" rule for both vSEC nodes, so that traffic originating from a node itself is hidden behind that node's own public IP address rather than behind the cluster IP address. I've not had to do this on my R77.30 vSECs, so it looks like a missing step in the R80.10 vSEC guide.
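The asymmetric routing described above can be written down as a toy model. This is only a sketch of the reasoning, not Check Point or Azure code. The function reply_lands_on and the node names are hypothetical.

```python
def reply_lands_on(sender, active, no_nat_rule):
    """Return the member that receives the reply to traffic sent by `sender`.

    With the "no NAT" rule the member keeps its own address, so the reply
    comes back to it. Without the rule the source is hidden behind the
    cluster IP, which is attached to the active member, so the reply is
    delivered there instead.
    """
    return sender if no_nat_rule else active

# The standby member sends a DNS query while node-1 holds the cluster IP.
without_rule = reply_lands_on("node-2", active="node-1", no_nat_rule=False)
with_rule = reply_lands_on("node-2", active="node-1", no_nat_rule=True)
print(without_rule)  # node-1: reply goes to the wrong member
print(with_rule)     # node-2: reply returns to the sender
```

In this model the active member is unaffected either way, which matches the primary node working fine while only the secondary node lost outbound connectivity.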
Hi:
How much time does failover take? The SK says under 2 minutes, but I think that's too long.
Unfortunately, failover takes around 2 minutes.
ClusterXL fails over in seconds, but the API calls to Azure to change the routes and the cluster IP take their time. In fairness, UDRs change pretty quickly, but the disassociation of the cluster IP from the primary node and its association to the secondary node is the main delay.
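If you want to measure the outage window in your own environment, a polling loop is enough. This is a minimal sketch, assuming you supply a probe function that returns True while the service behind the cluster IP answers. The name measure_outage and all of its parameters are made up for illustration.

```python
import time

def measure_outage(probe, interval=1.0, max_wait=300.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Return the seconds between the first failed probe and the next
    successful one, or None if no outage and recovery is seen in max_wait.
    """
    start = clock()
    down_since = None
    while clock() - start < max_wait:
        ok = probe()
        now = clock()
        if not ok and down_since is None:
            down_since = now          # outage begins
        elif ok and down_since is not None:
            return now - down_since   # service is back
        sleep(interval)
    return None
```

Start it before triggering the failover. The probe could be a TCP connect to a published service. The result is only as precise as the polling interval.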
I have observed that even 30-40 minutes after failover, the cluster IP has still not been disassociated from the primary node and associated to the secondary node.
The cluster testing script $FWDIR/scripts/azure_ha_test.py reports "All tests were successful!"
Every check passes, namely:
1. DNS
2. login.windows.net:443
3. Interfaces
4. Credentials
5. Route Tables
6. Load Balancers
7. Azure interface configuration
Any suggestions?
Which Check Point version are you running? Have you run the test script on both nodes? I've only had failover/failback issues on R77.30 Azure-based clusters; I haven't had enough time with R80.10 yet. Typically this was either because I had modified the inbound NAT rules on the Azure load balancer to include additional services such as SNMP, which bizarrely seems to cause an issue, or because another Azure admin had restricted public IPs at the subscription level so I couldn't associate the public IP to the secondary node.
You can manually associate the IP once it has been disassociated. It's not ideal, but if you've got a 30-40 minute outage and need to restore service, it is possible.
Check Point R80.10.
The cluster testing script $FWDIR/scripts/azure_ha_test.py reports "All tests were successful!" on both nodes.
The inbound NAT rules are not being updated, and the cluster IP is not being disassociated from the primary node and associated to the secondary node after failover.
I have done some troubleshooting as follows:
1. Removed all LB inbound NAT rules and did a failover.
Result: Success. The UDRs are updated and point at M2 as next hop. The cluster VIP is disassociated from M1 and associated to M2 automatically.
2. Added 1 inbound NAT rule on the LB and did a failover.
Result: Failure. The UDRs are updated and point at M2 as next hop.
The cluster VIP does not move to M2.
In Azure we can see this in the activity log:
- Operation name: Write NetworkInterfaces FAILED
- Time stamp: Fri Apr 27 2018 13:08:43 GMT+0530 (India Standard Time)
- Event initiated by: check-point-cluster-ha-failover
- Error code: InvalidResourceReference
- Message: Resource /subscriptions/d3e8c785-de15-4ba5-8afb-953e277061a2/resourceGroups/CPClust_RG/providers/Microsoft.Network/networkInterfaces/CPClust1-eth0/ipConfigurations/cluster-vip referenced by resource /subscriptions/d3e8c785-de15-4ba5-8afb-953e277061a2/resourceGroups/CPClust_RG/providers/Microsoft.Network/virtualNetworks/VNET01/subnets/Frontend was not found. Please make sure that the referenced resource exists, and that both resources are in the same region.
Has anyone else encountered this error? I'm facing the same in 2 deployments.
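The resource IDs in that message are long, so it can help to split them before checking in the portal which NIC and IP configuration the failed write refers to. This is a minimal sketch in Python 3, assuming the usual /subscriptions/.../resourceGroups/.../providers/... layout seen in the log above. The helper name parse_resource_id is made up for illustration.

```python
def parse_resource_id(resource_id):
    """Split an Azure resource ID into its named parts."""
    parts = resource_id.strip("/").split("/")
    if (len(parts) < 8
            or parts[0].lower() != "subscriptions"
            or parts[2].lower() != "resourcegroups"
            or parts[4].lower() != "providers"):
        raise ValueError("unexpected resource ID layout: " + resource_id)
    rest = parts[6:]  # alternating type/name pairs after the provider
    if len(rest) % 2:
        raise ValueError("dangling resource type in: " + resource_id)
    return {
        "subscription": parts[1],
        "resource_group": parts[3],
        "provider": parts[5],
        "resources": list(zip(rest[0::2], rest[1::2])),
    }

missing = parse_resource_id(
    "/subscriptions/d3e8c785-de15-4ba5-8afb-953e277061a2"
    "/resourceGroups/CPClust_RG/providers/Microsoft.Network"
    "/networkInterfaces/CPClust1-eth0/ipConfigurations/cluster-vip"
)
print(missing["resources"])
# [('networkInterfaces', 'CPClust1-eth0'), ('ipConfigurations', 'cluster-vip')]
```

Read this way, the message says that the cluster-vip IP configuration on CPClust1-eth0 could not be found while something else still referenced it.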
There is a new sk125435 for this problem. It says that a new template will be published in a week or so.
There is a workaround that can be done today with a fixed azure_had.py that can be requested from TAC.
Arnfinn
Thanks, Arnfinn Strand.
I'm checking with TAC, but there's a bit of a delay from them.
Do you have any other source for the fixed azure_had.py script?
Got the fixed script. Due to changes in the API permissions on Azure, the new script needs to be loaded on both cluster members.
Also, the inbound NAT rules need to point at the active member's private IP and not at the cluster VIP.
Following from that, 2 NAT rules need to be implemented in Dashboard that receive the traffic on the member IPs of both cluster members (whichever of them is active) and not on the cluster VIP.
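As a rough illustration of that last point, here is a toy lookup with one destination NAT rule per member IP and none for the cluster VIP. This is only a sketch in Python 3, not Dashboard or Azure configuration. All addresses, the port and the function names are hypothetical.

```python
# Hypothetical addresses, for illustration only.
MEMBER_IPS = {"10.0.1.10", "10.0.1.11"}  # private IPs of the two members
CLUSTER_VIP = "10.0.1.12"
BACKEND = "10.0.2.20"                     # internal server being published

def build_dnat_rules(member_ips, port, backend):
    """One destination NAT rule per member IP, so either member can be active."""
    return [{"orig_dst": ip, "port": port, "xlate_dst": backend}
            for ip in sorted(member_ips)]

def translate(rules, dst, port):
    """Return the translated destination, or None if no rule matches."""
    for rule in rules:
        if rule["orig_dst"] == dst and rule["port"] == port:
            return rule["xlate_dst"]
    return None

rules = build_dnat_rules(MEMBER_IPS, 443, BACKEND)
print(translate(rules, "10.0.1.11", 443))  # 10.0.2.20
print(translate(rules, CLUSTER_VIP, 443))  # None
```

Because the load balancer hands traffic to whichever member is active, a rule is needed for each member IP, while a rule that matches only on the cluster VIP would never be hit in this model.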
The only source I know of is TAC. Sorry.
I think that's too slow for mission-critical services. I would rather suggest that our customer use a higher-level VM size for Check Point.