I am curious about the fail-over logic in the Azure environment.
Hi
I am curious about the fail-over logic in the Azure environment.
To test a customer configuration, I deployed a CloudGuard ClusterXL cluster.
Most things worked as intended, but fail-over did not.
I created a VM server behind the firewall backend and ran ping to 8.8.8.8 from it.
With FW_A active, communication was normal, but the problem appeared after fail-over to FW_B.
In the tcpdump results, request packets arrive at FW_B and are sent out again with the firewall's VIP as the source.
However, I cannot see any response packets.
I waited about 10 minutes, considering the nature of the Azure environment, but the result was the same.
I have not configured a load balancer on the firewall frontend. Could this be the problem?
If you need any other configuration details, please let me know.
The test environment I have configured is below:
CP - Frontend VIP: 10.4.0.7
CP - Frontend FW_A: 10.4.0.5
CP - Frontend FW_B: 10.4.0.6
CP - Backend FW_A: 10.4.1.6
CP - Backend FW_B: 10.4.1.7
Hi ChoiYunSoo,
The topology you are describing is not correct. Did you deploy the cluster using the Marketplace template for high availability? You should have a frontend load balancer and subnet by default.
A few things to note:
1. The backend subnet should contain only the Check Point VM interfaces. You need to deploy a separate subnet for the VM server.
2. You need to implement the Hide NAT scenario for outgoing traffic, per the admin guide.
Please refer to the topology and instructions of the admin guide for this:
Network (checkpoint.com)
Hope this helps
Thanks for your reply
I would like to respond to some points in your advice.
1. I deployed the Check Point cluster from the Marketplace.
2. I configured a separate subnet for the VM server.
3. I configured Hide NAT for the VM server's subnet.
As you advised, I also added both firewall IPs and the VIP to the frontend LB, but the symptom is the same.
Hi ChoiYunSoo,
The outbound response should be directed to the public IP of the member originating the request.
Can you confirm whether the HA was set up with public IPs?
Did you also run tcpdump on the second member? If you received a response on the standby member, I suggest verifying whether the outbound response unexpectedly passes through the external LB.
Hi Rivka-Strilitz
Thank you for the reply.
I tried to add the firewall's public VIP as a frontend IP on the LB as you said, but it seems I cannot add it.
Are the settings below what you meant?
Everything the CP folks said is correct. Ping me if you need help. I have a perfectly working cluster in my Azure lab, and we can do any tests you like.
Best,
Andy
Hi Legend
Thank you for the reply.
Below is the current configuration of my test lab.
The only thing I think is unique is that FW_A shows the Frontend, Backend, and VIP interfaces, but FW_B does not show the VIP interface.
Is there anything I did wrong in the configuration below?
Thanks for the details. I will review in the morning and update.
Best,
Andy
Thank you for your help.
To help you understand the setup, I will post the updated configuration along with the Check Point settings.
Since it may take some time to review all this, in the meantime, can you run the commands below from both members and post the output? The output below is from my lab. Also, SUPER IMPORTANT: MAKE SURE anti-spoofing is DISABLED, as it is not supported on any interface, and it would also cause policy failure.
Andy
master:
[Expert@cpazurecluster1:0]# cphaprob state
Cluster Mode: High Availability (Active Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 (local) 10.5.1.5 100% ACTIVE CPAZUREcluster1
2 10.5.1.6 0% STANDBY CPAZUREcluster2
Active PNOTEs: None
Last member state change event:
Event Code: CLUS-114904
State change: ACTIVE(!) -> ACTIVE
Reason for state change: Reason for ACTIVE! alert has been resolved
Event time: Sat Feb 10 16:01:44 2024
Cluster failover count:
Failover counter: 0
Time of counter reset: Sat Feb 10 15:59:48 2024 (reboot)
[Expert@cpazurecluster1:0]# cd /opt/CPsuite-R81.20/fw1/scripts/azure_
azure_conf.py azure_ha_globals.py azure_had.py
azure_ha_cli.py azure_ha_test.py
[Expert@cpazurecluster1:0]# cd /opt/CPsuite-R81.20/fw1/scripts/azure_ha_test.py
-bash: cd: /opt/CPsuite-R81.20/fw1/scripts/azure_ha_test.py: Not a directory
[Expert@cpazurecluster1:0]# cd /opt/CPsuite-R81.20/fw1/scripts/
[Expert@cpazurecluster1:0]# ./azure_ha_
azure_ha_cli.py azure_ha_test.py
[Expert@cpazurecluster1:0]# ./azure_ha_test.py
Setting api versions for "ha" solution
ARM versions are: {
"resources": "?api-version=2019-07-01"
}
Testing if DNS is configured...
- Primary DNS server is: 168.63.129.16
Testing if DNS is working...
- DNS resolving test was successful
Testing connectivity to login.windows.net:443...
Testing ClusterXL parameters...
Testing cluster interface configuration...
Testing credentials...
Getting information about the environment...
Getting information about the VM cpazurecluster1...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourceGroups/CP-cluster/providers/Microsoft.Network/networkInterfaces/CPAZUREcluster1-eth0
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/networkInterfaces
Name : CPAZUREcluster1-eth0
Attempting to read - [OK]
Attempting to write - [OK]
Getting information about the VM cpazurecluster2...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourceGroups/CP-cluster/providers/Microsoft.Network/networkInterfaces/CPAZUREcluster2-eth0
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/networkInterfaces
Name : CPAZUREcluster2-eth0
Attempting to read - [OK]
Attempting to write - [OK]
Testing cluster public IP address...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourcegroups/CP-cluster/providers/Microsoft.Network/publicIPAddresses/CPAZUREcluster
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/publicIPAddresses
Name : CPAZUREcluster
Attempting to read - [OK]
Verifying Azure interface configuration...
- Interface eth0: local IP address = 10.5.0.4, peer IP address = 10.5.0.5
- Interface eth1: local IP address = 10.5.1.5, peer IP address = 10.5.1.6
- Interface vpnt7: local IP address = 10.5.0.4, peer IP address = 10.5.0.5
All tests were successful!
[Expert@cpazurecluster1:0]#
**************************************************************
backup:
[Expert@cpazurecluster2:0]# cphaprob state
Cluster Mode: High Availability (Active Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 10.5.1.5 100% ACTIVE CPAZUREcluster1
2 (local) 10.5.1.6 0% STANDBY CPAZUREcluster2
Active PNOTEs: None
Last member state change event:
Event Code: CLUS-114802
State change: INIT -> STANDBY
Reason for state change: There is already an ACTIVE member in the cluster (member 1)
Event time: Sat Feb 10 16:11:31 2024
Cluster failover count:
Failover counter: 0
Time of counter reset: Sat Feb 10 15:59:48 2024 (reboot)
[Expert@cpazurecluster2:0]# cd /opt/CPsuite-R81.20/fw1/scripts/
[Expert@cpazurecluster2:0]# ./azure_ha_test.py
Setting api versions for "ha" solution
ARM versions are: {
"resources": "?api-version=2019-07-01"
}
Testing if DNS is configured...
- Primary DNS server is: 168.63.129.16
Testing if DNS is working...
- DNS resolving test was successful
Testing connectivity to login.windows.net:443...
Testing ClusterXL parameters...
Testing cluster interface configuration...
Testing credentials...
Getting information about the environment...
Getting information about the VM cpazurecluster2...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourceGroups/CP-cluster/providers/Microsoft.Network/networkInterfaces/CPAZUREcluster2-eth0
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/networkInterfaces
Name : CPAZUREcluster2-eth0
Attempting to read - [OK]
Attempting to write - [OK]
Getting information about the VM cpazurecluster1...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourceGroups/CP-cluster/providers/Microsoft.Network/networkInterfaces/CPAZUREcluster1-eth0
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/networkInterfaces
Name : CPAZUREcluster1-eth0
Attempting to read - [OK]
Attempting to write - [OK]
Testing cluster public IP address...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourcegroups/CP-cluster/providers/Microsoft.Network/publicIPAddresses/CPAZUREcluster
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/publicIPAddresses
Name : CPAZUREcluster
Attempting to read - [OK]
Verifying Azure interface configuration...
- Interface eth0: local IP address = 10.5.0.5, peer IP address = 10.5.0.4
- Interface eth1: local IP address = 10.5.1.6, peer IP address = 10.5.1.5
- Interface vpnt7: local IP address = 10.5.0.5, peer IP address = 10.5.0.4
All tests were successful!
[Expert@cpazurecluster2:0]#
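When comparing `cphaprob state` output from both members, it can help to extract the member table programmatically and confirm that both members agree on which one is ACTIVE. The following is a minimal Python sketch, assuming the row layout shown in the output above (ID, optional `(local)`, address, load, state, name). The names `MEMBER_ROW`, `parse_members`, and `active_members` are hypothetical helpers, not part of any Check Point tooling.

```python
import re

# Matches member rows such as:
#   "1 (local) 10.5.1.5 100% ACTIVE CPAZUREcluster1"
#   "2 10.5.1.6 0% STANDBY CPAZUREcluster2"
# The layout is assumed from the output pasted in this thread.
MEMBER_ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<local>\(local\)\s+)?"
    r"(?P<address>\d+\.\d+\.\d+\.\d+)\s+(?P<load>\d+)%\s+"
    r"(?P<state>\S+)\s+(?P<name>\S+)\s*$"
)


def parse_members(output):
    """Return one dict per cluster member row found in `cphaprob state` output."""
    members = []
    for line in output.splitlines():
        match = MEMBER_ROW.match(line)
        if match:
            members.append({
                "id": int(match.group("id")),
                "local": match.group("local") is not None,
                "address": match.group("address"),
                "state": match.group("state"),
                "name": match.group("name"),
            })
    return members


def active_members(output):
    """Names of members whose state is exactly ACTIVE (not ACTIVE(!))."""
    return [m["name"] for m in parse_members(output) if m["state"] == "ACTIVE"]


SAMPLE = """\
Cluster Mode: High Availability (Active Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 (local) 10.5.1.5 100% ACTIVE CPAZUREcluster1
2 10.5.1.6 0% STANDBY CPAZUREcluster2
Active PNOTEs: None
"""

print(active_members(SAMPLE))
```

Running `active_members` on the output saved from each member and checking that both lists contain the same single name is a quick way to rule out a split view of the cluster state.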
Also @ChoiYunSoo, can you run the commands below while the member that is having issues is active?
from expert:
curl_cli -k google.com
ping 8.8.8.8
ip r g 8.8.8.8
clish -c "show route"
Please compare with the member that works to make sure the output is exactly the same.
Best,
Andy
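Comparing two `show route` outputs by eye is error-prone because the `age` field differs on every run. A minimal Python sketch of the comparison follows, assuming the route-line format seen in Gaia `clish -c "show route"` output (a route code, a prefix with mask, then details). The helper names `normalize_routes` and `route_diff` are hypothetical.

```python
import re


def normalize_routes(show_route_output):
    """Return the set of route lines with the volatile age field removed."""
    routes = set()
    for line in show_route_output.splitlines():
        line = line.strip()
        # Keep only route entries: a 1-2 letter code followed by a prefix/mask.
        # Legend lines such as "Codes: C - Connected, ..." are skipped.
        if not re.match(r"^[A-Za-z]{1,2}\s+\d+\.\d+\.\d+\.\d+/\d+\s", line):
            continue
        line = re.sub(r",\s*age\s+\d+\s*$", "", line)  # drop ", age 659"
        routes.add(re.sub(r"\s+", " ", line))          # collapse whitespace
    return routes


def route_diff(output_a, output_b):
    """Return (routes only in A, routes only in B), each sorted."""
    routes_a = normalize_routes(output_a)
    routes_b = normalize_routes(output_b)
    return sorted(routes_a - routes_b), sorted(routes_b - routes_a)


SAMPLE_A = """\
S 0.0.0.0/0 via 10.4.0.1, eth0, cost 0, age 659
C 10.4.0.0/24 is directly connected, eth0
"""
SAMPLE_B = """\
S 0.0.0.0/0 via 10.4.0.1, eth0, cost 0, age 667
C 10.4.0.0/24 is directly connected, eth0
"""

only_a, only_b = route_diff(SAMPLE_A, SAMPLE_B)
print(only_a, only_b)
```

Two empty lists mean the routing tables match apart from route age. Any entry in either list is a route present on one member but not the other.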
Thank you for your active help.
Here are the answers to your inquiries
* FW_A (Standby)
[Expert@northclu11:0]# cphaprob stat
Cluster Mode: High Availability (Active Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 (local) 10.4.1.6 0% STANDBY FW_A
2 10.4.1.7 100% ACTIVE FW_B
Active PNOTEs: None
Last member state change event:
Event Code: CLUS-114802
State change: DOWN -> STANDBY
Reason for state change: There is already an ACTIVE member in the cluster (member 2)
Event time: Fri Feb 23 03:59:40 2024
Last cluster failover event:
Transition to new ACTIVE: Member 1 -> Member 2
Reason: ADMIN_DOWN PNOTE
Event time: Fri Feb 23 03:59:36 2024
Cluster failover count:
Failover counter: 1
Time of counter reset: Fri Feb 23 03:53:33 2024 (reboot)
[Expert@northclu11:0]#
[Expert@northclu11:0]#
[Expert@northclu11:0]# ./azure_ha_test.py
Setting api versions for "ha" solution
ARM versions are: {
"resources": "?api-version=2019-07-01"
}
Testing if DNS is configured...
- Primary DNS server is: 168.63.129.16
Testing if DNS is working...
- DNS resolving test was successful
Testing connectivity to login.windows.net:443...
Testing ClusterXL parameters...
Testing cluster interface configuration...
Testing credentials...
Getting information about the environment...
Getting information about the VM northclu11...
Id : /subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/North_CLU_1/providers/Microsoft.Network/networkInterfaces/NorthClu11-eth0
Subscription : 1efe27ac-5c1b-497b-bc60-6510b07d1c92
Resource group: North_CLU_1
Type : Microsoft.Network/networkInterfaces
Name : NorthClu11-eth0
Attempting to read - [OK]
Attempting to write - [Forbidden]
Error:
HTTP/1.1 403 Forbidden
b'{"error":{"code":"LinkedAuthorizationFailed","message":"The client \'b7a8cf26-f859-41aa-b8af-f103f9a14aa9\' with object id \'b7a8cf26-f859-41aa-b8af-f103f9a14aa9\' has permission to perform action \'Microsoft.Network/networkInterfaces/write\' on scope \'/subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/North_CLU_1/providers/Microsoft.Network/networkInterfaces/NorthClu11-eth0\'; however, it does not have permission to perform action(s) \'Microsoft.Network/virtualNetworks/subnets/join/action\' on the linked scope(s) \'/subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/ODL-checkpoint_v1-72163-01/providers/Microsoft.Network/virtualNetworks/North-Hub/subnets/VMSS-FrontEnd\' (respectively) or the linked scope(s) are invalid."}}'
[Expert@northclu11:0]#
[Expert@northclu11:0]#
[Expert@northclu11:0]# curl_cli -k google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
[Expert@northclu11:0]#
[Expert@northclu11:0]#
[Expert@northclu11:0]# clish -c "show route"
Codes: C - Connected, S - Static, R - RIP, B - BGP (D - Default),
O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA),
A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed,
NP - NAT Pool, U - Unreachable, i - Inactive
S 0.0.0.0/0 via 10.4.0.1, eth0, cost 0, age 659
S 10.0.0.0/8 via 10.4.1.1, eth1, cost 0, age 659
S 10.4.0.0/16 via 10.4.1.1, eth1, cost 0, age 659
C 10.4.0.0/24 is directly connected, eth0
C 10.4.1.0/24 is directly connected, eth1
C 127.0.0.0/8 is directly connected, lo
S 168.63.129.16/32 via 10.4.0.1, eth0, cost 0, age 659
S 169.254.169.254/32 via 10.4.0.1, eth0, cost 0, age 659
S 172.16.0.0/12 via 10.4.1.1, eth1, cost 0, age 659
S 192.168.0.0/16 via 10.4.1.1, eth1, cost 0, age 659
[Expert@northclu11:0]#
* FW_B (Active)
[Expert@northclu12:0]# cphaprob stat
Cluster Mode: High Availability (Active Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 10.4.1.6 0% STANDBY FW_A
2 (local) 10.4.1.7 100% ACTIVE FW_B
Active PNOTEs: None
Last member state change event:
Event Code: CLUS-114704
State change: STANDBY -> ACTIVE
Reason for state change: No other ACTIVE members have been found in the cluster
Event time: Fri Feb 23 03:59:36 2024
Last cluster failover event:
Transition to new ACTIVE: Member 1 -> Member 2
Reason: ADMIN_DOWN PNOTE
Event time: Fri Feb 23 03:59:36 2024
Cluster failover count:
Failover counter: 1
Time of counter reset: Fri Feb 23 03:53:33 2024 (reboot)
[Expert@northclu12:0]# cd /opt/CPsuite-R81.10/fw1/scripts/
[Expert@northclu12:0]# ./azure_ha_test.py
Setting api versions for "ha" solution
ARM versions are: {
"resources": "?api-version=2019-07-01"
}
Testing if DNS is configured...
- Primary DNS server is: 168.63.129.16
Testing if DNS is working...
- DNS resolving test was successful
Testing connectivity to login.windows.net:443...
Testing ClusterXL parameters...
Testing cluster interface configuration...
Testing credentials...
Getting information about the environment...
Getting information about the VM northclu12...
Id : /subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/North_CLU_1/providers/Microsoft.Network/networkInterfaces/NorthClu12-eth0
Subscription : 1efe27ac-5c1b-497b-bc60-6510b07d1c92
Resource group: North_CLU_1
Type : Microsoft.Network/networkInterfaces
Name : NorthClu12-eth0
Attempting to read - [OK]
Attempting to write - [Forbidden]
Error:
HTTP/1.1 403 Forbidden
b'{"error":{"code":"LinkedAuthorizationFailed","message":"The client \'08b7ff4e-a0e2-462e-a85c-d5dea401b99c\' with object id \'08b7ff4e-a0e2-462e-a85c-d5dea401b99c\' has permission to perform action \'Microsoft.Network/networkInterfaces/write\' on scope \'/subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/North_CLU_1/providers/Microsoft.Network/networkInterfaces/NorthClu12-eth0\'; however, it does not have permission to perform action(s) \'Microsoft.Network/virtualNetworks/subnets/join/action\' on the linked scope(s) \'/subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/ODL-checkpoint_v1-72163-01/providers/Microsoft.Network/virtualNetworks/North-Hub/subnets/VMSS-FrontEnd\' (respectively) or the linked scope(s) are invalid."}}'
[Expert@northclu12:0]#
[Expert@northclu12:0]#
[Expert@northclu12:0]# curl_cli -k google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
[Expert@northclu12:0]# clish -c "show route"
Codes: C - Connected, S - Static, R - RIP, B - BGP (D - Default),
O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA),
A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed,
NP - NAT Pool, U - Unreachable, i - Inactive
S 0.0.0.0/0 via 10.4.0.1, eth0, cost 0, age 667
S 10.0.0.0/8 via 10.4.1.1, eth1, cost 0, age 667
S 10.4.0.0/16 via 10.4.1.1, eth1, cost 0, age 667
C 10.4.0.0/24 is directly connected, eth0
C 10.4.1.0/24 is directly connected, eth1
C 127.0.0.0/8 is directly connected, lo
S 168.63.129.16/32 via 10.4.0.1, eth0, cost 0, age 667
S 169.254.169.254/32 via 10.4.0.1, eth0, cost 0, age 667
S 172.16.0.0/12 via 10.4.1.1, eth1, cost 0, age 667
S 192.168.0.0/16 via 10.4.1.1, eth1, cost 0, age 667
[Expert@northclu12:0]#
[Expert@northclu12:0]#
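The `403 Forbidden` bodies in the `azure_ha_test.py` output above are long, and the key detail is easy to miss: the write on the NIC (in resource group North_CLU_1) is allowed, but `Microsoft.Network/virtualNetworks/subnets/join/action` is denied on a linked subnet in a different resource group (ODL-checkpoint_v1-72163-01). As a reading aid, here is a minimal Python sketch that pulls the missing action and linked scope out of such a body. It assumes the message wording shown above; `explain_linked_auth_failure` is a hypothetical helper, and the sample body is abbreviated with placeholder IDs.

```python
import json
import re


def explain_linked_auth_failure(body):
    """Extract the denied action and linked scope from an ARM error body.

    Returns None if the body is not a LinkedAuthorizationFailed error or
    the message does not follow the wording seen in this thread.
    """
    error = json.loads(body)["error"]
    if error.get("code") != "LinkedAuthorizationFailed":
        return None
    match = re.search(
        r"does not have permission to perform action\(s\) '([^']+)' "
        r"on the linked scope\(s\) '([^']+)'",
        error["message"],
    )
    if match is None:
        return None
    return {"missing_action": match.group(1), "linked_scope": match.group(2)}


# Abbreviated sample; the angle-bracket values are placeholders.
SAMPLE_BODY = json.dumps({"error": {
    "code": "LinkedAuthorizationFailed",
    "message": (
        "The client '<client-id>' with object id '<object-id>' has permission "
        "to perform action 'Microsoft.Network/networkInterfaces/write' on scope "
        "'<nic-scope>'; however, it does not have permission to perform "
        "action(s) 'Microsoft.Network/virtualNetworks/subnets/join/action' "
        "on the linked scope(s) '<subnet-scope>' (respectively) or the linked "
        "scope(s) are invalid."
    ),
}})

print(explain_linked_auth_failure(SAMPLE_BODY))
```

When feeding it the output above, pass only the JSON text inside the `b'...'` wrapper and unescape the `\'` sequences first. The result names the permission and scope to check on the identity the cluster members use for Azure API calls.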
Let me examine this later carefully and will update.
Sorry, I just drove to the office today and had a quick look. To me, this appears 100% right. Here is my question: when the problematic firewall is active, are you having issues connecting outbound, period, or is it only certain apps that don't work?
Best,
Andy
Thank you for the reply.
To summarize the problem: after fail-over from FW_A to FW_B, all communication stops.
The backend LB recognizes the fail-over and sends traffic to FW_B.
However, FW_A continues to hold the VIP on its frontend interface.
When the server pings the firewalls' real IPs and the VIP, VIP traffic is delivered to FW_A even though FW_B is active.
I believe this is the core issue.
FW_B receives traffic from the server and forwards it after NATing the source IP to the VIP, but since FW_A still owns the VIP, FW_B cannot receive the replies.
However, tcpdump on FW_A shows no return traffic arriving there either.
I suspect there may be a problem with the API call to Azure that transfers the VIP during failover.
However, I cannot determine whether the problem is in my settings or a Check Point bug.
I don't think it's a CP bug. It sounds like something in the Azure configuration.
I think so too. The probability of it being a Check Point bug is very small.
I suspect that I configured something incorrectly in the Azure environment,
but I don't know what it is.
Please let me know if you see any mistakes or anything missing in what I have set up.
I welcome your response at any time.
I agree 100%. The only way for me to tell would be if we did a remote session. It is hard to say for sure based on the screenshots you sent :-(
Andy
Hi Legend
Found the cause
There was a problem with the application in the environment.
API communication did not occur between Check Point and Azure due to a client secret issue.
Thanks for your help
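The thread does not show how the client secret problem was found. One way to test an application's secret independently of the gateways is to request a token directly with the OAuth 2.0 client credentials flow. The following is a minimal Python sketch under stated assumptions: it uses the Microsoft identity platform v2.0 token endpoint at `login.microsoftonline.com` and the Azure Resource Manager default scope, and the tenant ID, client ID, and secret are placeholders to be supplied. `build_token_request` and `check_secret` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Microsoft identity platform v2.0 token endpoint (tenant ID is a placeholder).
TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"


def build_token_request(tenant_id, client_id, client_secret):
    """Build a client-credentials token request for Azure Resource Manager."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://management.azure.com/.default",
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL.format(tenant=tenant_id), data=form, method="POST"
    )


def check_secret(tenant_id, client_id, client_secret, timeout=10):
    """Return (True, None) if a token is issued, else (False, description)."""
    request = build_token_request(tenant_id, client_id, client_secret)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return "access_token" in json.load(response), None
    except urllib.error.HTTPError as err:
        # On rejection the endpoint normally returns a JSON body describing why.
        detail = json.loads(err.read().decode("utf-8", "replace"))
        return False, detail.get("error_description", detail.get("error"))
```

Calling `check_secret("<tenant-id>", "<client-id>", "<client-secret>")` from a machine with internet access gives a direct yes/no answer on the secret. A rejected or expired secret is typically reported in the returned description, which separates a credential problem from a permission problem on the Azure resources.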
Excellent!
Andy