ChoiYunSoo
Contributor
I am curious about the fail-over logic in the Azure environment.

Hi

 

I am curious about the fail-over logic in the Azure environment.

To test a customer configuration, I deployed a CloudGuard ClusterXL cluster.

Most things worked as intended, but failover did not.

 

I created a VM server behind the firewall's backend and ran ping from it to 8.8.8.8.

While FW_A was active, communication was normal, but a problem occurred after failover.

 

The tcpdump results show request packets arriving at FW_B and being sent out again, NATed to the firewall's VIP.

But I can't see any response packets at all.

I waited about 10 minutes considering the nature of the Azure environment, but the result was the same.
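
The symptom described here (echo requests visible, no replies) can be checked mechanically against a saved capture. The following is a minimal sketch, assuming tcpdump's default one-line text output for ICMP. The sample lines and the server address 10.4.2.4 are hypothetical and are not taken from the actual capture. Because Hide NAT may rewrite the ICMP id, the matching is only meaningful for lines captured on one side of the NAT.

```python
import re

# Matches tcpdump's default text form for ICMP echo traffic, e.g.
# "IP 10.4.2.4 > 8.8.8.8: ICMP echo request, id 7, seq 1, length 64"
ICMP_RE = re.compile(
    r"IP (?P<src>[\d.]+) > (?P<dst>[\d.]+): ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)


def unanswered_requests(lines):
    """Return sorted (id, seq) pairs of echo requests that have no echo reply."""
    requests, replies = set(), set()
    for line in lines:
        m = ICMP_RE.search(line)
        if not m:
            continue  # not an ICMP echo line
        key = (int(m["id"]), int(m["seq"]))
        (requests if m["kind"] == "request" else replies).add(key)
    return sorted(requests - replies)


# Hypothetical capture lines (assumption, for illustration only).
sample = [
    "16:40:01.000001 IP 10.4.2.4 > 8.8.8.8: ICMP echo request, id 7, seq 1, length 64",
    "16:40:01.010001 IP 8.8.8.8 > 10.4.2.4: ICMP echo reply, id 7, seq 1, length 64",
    "16:40:02.000001 IP 10.4.2.4 > 8.8.8.8: ICMP echo request, id 7, seq 2, length 64",
]
print(unanswered_requests(sample))
```

If every request sent after the failover ends up in the returned list, the replies are not reaching the member on which the capture was taken.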

 

I have not configured a load balancer on the firewall's frontend. Could this be the problem?

If you need any other configuration details, please let me know.

 

The test environment I have configured is below:

 

CP - Frontend VIP: 10.4.0.7 

CP - Frontend FW_A: 10.4.0.5

CP - Frontend FW_B: 10.4.0.6

 

CP - Backend FW_A: 10.4.1.6

CP - Backend FW_B: 10.4.1.7

2024-02-20_16-42-39.png

 

 

 


Accepted Solutions
ChoiYunSoo
Contributor

Hi Legend

 

Found the cause

There was a problem with the application in the environment.

API communication did not occur between Check Point and Azure due to a client secret issue.

Thanks for your help

 

 

19 Replies
Edan_Leventhal
Employee

Hi ChoiYunSoo,
The topology you are describing is not correct. Did you deploy the cluster using the Marketplace template for high availability? You should have a frontend load balancer and subnet by default.
A few things to note:
1. The backend subnet should contain only the Check Point VM interfaces. You need to deploy a separate subnet for the VM server.
2. You need to implement a Hide NAT scenario for the outgoing traffic, per the admin guide.

Please refer to the topology and instructions of the admin guide for this:
Network (checkpoint.com)
HA topology.png

Hope this helps

ChoiYunSoo
Contributor

Thanks for your reply

 

There are parts of your advice that I would like to respond to.

 

1. I deployed the Check Point cluster from the Marketplace.

2. I configured a separate subnet for the VM server.

3. I configured Hide NAT for the VM server's subnet.

 

And as per your advice, I added both firewall IP and VIP to Frontend-LB and configured it, but the symptom is the same.

2024-02-21_16-24-53.png

2024-02-21_16-30-39.png

Rivka-Strilitz
Employee

Hi ChoiYunSoo,
The outbound response should be directed to the public IP of the member originating the request.
Can you confirm whether the HA was set up with public IPs?
Did you also run tcpdump on the second member? If you received a response on the standby member, I suggest verifying whether the outbound response unexpectedly passes through the external LB.

ChoiYunSoo
Contributor

 Hi Rivka-Strilitz

Thank you for your reply.

I tried to add the firewall's public VIP as a frontend IP on the load balancer, as you suggested, but it seems I can't add it.

Are the settings below what you meant?

2024-02-22_10-23-47.png

the_rock
Legend

Everything the CP folks said is correct. Ping me if you need help. I have a perfectly working cluster in my Azure lab, and we can run any tests you like.

Best,

 

Andy

ChoiYunSoo
Contributor

Hi Legend 

Thank you for your reply.

Below is the current configuration of my test lab.

The only thing that looks unusual to me is that FW_A shows the Frontend, Backend, and VIP interfaces, but FW_B does not show the VIP interface.

Is there anything I did wrong in the configuration below?

1.png 2.png

 

3.png

 

4.png 5.png

 

 

 

the_rock
Legend

Thanks for the details. I will review in the morning and update.

Best,

Andy

ChoiYunSoo
Contributor

Thank you for your help.
To help you understand, I am uploading the Azure configuration and the Check Point settings as well.

2024-02-22_13-45-18.png 2024-02-22_13-37-47.png 2024-02-22_13-36-43.png 2024-02-22_13-36-49.png 2024-02-22_13-44-13.png

the_rock
Legend

Since it may take some time to review all this, in the meantime, can you run the commands below on both members and post the output? The output from my lab is below for comparison. Also, SUPER IMPORTANT: MAKE SURE anti-spoofing is DISABLED. Having it enabled on any interface is not supported, and it would also cause policy failure.

Andy

master:

 

[Expert@cpazurecluster1:0]# cphaprob state

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 (local) 10.5.1.5 100% ACTIVE CPAZUREcluster1
2 10.5.1.6 0% STANDBY CPAZUREcluster2


Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114904
State change: ACTIVE(!) -> ACTIVE
Reason for state change: Reason for ACTIVE! alert has been resolved
Event time: Sat Feb 10 16:01:44 2024

Cluster failover count:
Failover counter: 0
Time of counter reset: Sat Feb 10 15:59:48 2024 (reboot)


[Expert@cpazurecluster1:0]# cd /opt/CPsuite-R81.20/fw1/scripts/azure_
azure_conf.py azure_ha_globals.py azure_had.py
azure_ha_cli.py azure_ha_test.py
[Expert@cpazurecluster1:0]# cd /opt/CPsuite-R81.20/fw1/scripts/azure_ha_test.py
-bash: cd: /opt/CPsuite-R81.20/fw1/scripts/azure_ha_test.py: Not a directory
[Expert@cpazurecluster1:0]# cd /opt/CPsuite-R81.20/fw1/scripts/
[Expert@cpazurecluster1:0]# ./azure_ha_
azure_ha_cli.py azure_ha_test.py
[Expert@cpazurecluster1:0]# ./azure_ha_test.py
Setting api versions for "ha" solution
ARM versions are: {
"resources": "?api-version=2019-07-01"
}
Testing if DNS is configured...
- Primary DNS server is: 168.63.129.16
Testing if DNS is working...
- DNS resolving test was successful
Testing connectivity to login.windows.net:443...
Testing ClusterXL parameters...
Testing cluster interface configuration...
Testing credentials...
Getting information about the environment...
Getting information about the VM cpazurecluster1...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourceGroups/CP-cluster/providers/Microsoft.Network/networkInterfaces/CPAZUREcluster1-eth0
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/networkInterfaces
Name : CPAZUREcluster1-eth0
Attempting to read - [OK]
Attempting to write - [OK]
Getting information about the VM cpazurecluster2...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourceGroups/CP-cluster/providers/Microsoft.Network/networkInterfaces/CPAZUREcluster2-eth0
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/networkInterfaces
Name : CPAZUREcluster2-eth0
Attempting to read - [OK]
Attempting to write - [OK]
Testing cluster public IP address...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourcegroups/CP-cluster/providers/Microsoft.Network/publicIPAddresses/CPAZUREcluster
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/publicIPAddresses
Name : CPAZUREcluster
Attempting to read - [OK]
Verifying Azure interface configuration...
- Interface eth0: local IP address = 10.5.0.4, peer IP address = 10.5.0.5
- Interface eth1: local IP address = 10.5.1.5, peer IP address = 10.5.1.6
- Interface vpnt7: local IP address = 10.5.0.4, peer IP address = 10.5.0.5

All tests were successful!
[Expert@cpazurecluster1:0]#

 

**************************************************************

 

backup:

 

[Expert@cpazurecluster2:0]# cphaprob state

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 10.5.1.5 100% ACTIVE CPAZUREcluster1
2 (local) 10.5.1.6 0% STANDBY CPAZUREcluster2


Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114802
State change: INIT -> STANDBY
Reason for state change: There is already an ACTIVE member in the cluster (member 1)
Event time: Sat Feb 10 16:11:31 2024

Cluster failover count:
Failover counter: 0
Time of counter reset: Sat Feb 10 15:59:48 2024 (reboot)


[Expert@cpazurecluster2:0]# cd /opt/CPsuite-R81.20/fw1/scripts/
[Expert@cpazurecluster2:0]# ./azure_ha_test.py
Setting api versions for "ha" solution
ARM versions are: {
"resources": "?api-version=2019-07-01"
}
Testing if DNS is configured...
- Primary DNS server is: 168.63.129.16
Testing if DNS is working...
- DNS resolving test was successful
Testing connectivity to login.windows.net:443...
Testing ClusterXL parameters...
Testing cluster interface configuration...
Testing credentials...
Getting information about the environment...
Getting information about the VM cpazurecluster2...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourceGroups/CP-cluster/providers/Microsoft.Network/networkInterfaces/CPAZUREcluster2-eth0
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/networkInterfaces
Name : CPAZUREcluster2-eth0
Attempting to read - [OK]
Attempting to write - [OK]
Getting information about the VM cpazurecluster1...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourceGroups/CP-cluster/providers/Microsoft.Network/networkInterfaces/CPAZUREcluster1-eth0
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/networkInterfaces
Name : CPAZUREcluster1-eth0
Attempting to read - [OK]
Attempting to write - [OK]
Testing cluster public IP address...
Id : /subscriptions/40c8d051-e4b3-45ea-b165-451d47e33fec/resourcegroups/CP-cluster/providers/Microsoft.Network/publicIPAddresses/CPAZUREcluster
Subscription : 40c8d051-e4b3-45ea-b165-451d47e33fec
Resource group: CP-cluster
Type : Microsoft.Network/publicIPAddresses
Name : CPAZUREcluster
Attempting to read - [OK]
Verifying Azure interface configuration...
- Interface eth0: local IP address = 10.5.0.5, peer IP address = 10.5.0.4
- Interface eth1: local IP address = 10.5.1.6, peer IP address = 10.5.1.5
- Interface vpnt7: local IP address = 10.5.0.5, peer IP address = 10.5.0.4

All tests were successful!
[Expert@cpazurecluster2:0]#
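
When comparing long `azure_ha_test.py` outputs from two members, it can help to pull out only the checks that did not pass. The following is a minimal sketch, assuming the output format shown in this thread: a `Name : ...` line per resource followed by `Attempting to read - [STATUS]` and `Attempting to write - [STATUS]` lines, and a final `All tests were successful!` line. The function names are hypothetical helpers and are not part of the Check Point scripts.

```python
import re

STATUS_RE = re.compile(r"Attempting to (read|write) - \[(\w+)\]")
NAME_RE = re.compile(r"^Name\s*:\s*(\S+)")


def failed_checks(output):
    """Return (resource_name, operation, status) for every check that is not OK."""
    failures, current = [], None
    for line in output.splitlines():
        line = line.strip()
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)  # remember which resource is being tested
            continue
        status = STATUS_RE.search(line)
        if status and status.group(2) != "OK":
            failures.append((current, status.group(1), status.group(2)))
    return failures


def all_passed(output):
    """True only if the success banner is present and no check failed."""
    return "All tests were successful!" in output and not failed_checks(output)


# Excerpt in the format of the lab output in this post.
sample = """Name : CPAZUREcluster2-eth0
Attempting to read - [OK]
Attempting to write - [OK]

All tests were successful!"""
print(failed_checks(sample), all_passed(sample))
```

A member whose write check returns a status such as `[Forbidden]` would appear in the list with the NIC name and the failed operation.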

the_rock
Legend

Also @ChoiYunSoo, can you run the commands below while the member that is having issues is active?

From expert mode:

curl_cli -k google.com

ping 8.8.8.8

ip r g 8.8.8.8

clish -c "show route"

Please compare with one that works to ensure 100% it is the same.
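
The comparison of `show route` output between the two members can be sketched as follows. This is a minimal sketch, assuming the Gaia `show route` line format in which static routes end with a volatile `age N` field; that field is stripped so that only real differences remain. The function names and the sample routes are hypothetical.

```python
import re

# A route entry starts with a one- or two-letter code followed by a prefix.
ROUTE_RE = re.compile(r"^[A-Za-z]{1,2}\s+\d+\.\d+\.\d+\.\d+/\d+")


def normalize_routes(show_route_output):
    """Return the set of route lines with the trailing 'age' field removed."""
    routes = set()
    for line in show_route_output.splitlines():
        line = line.strip()
        if not ROUTE_RE.match(line):
            continue  # skips the legend and blank lines
        routes.add(re.sub(r",\s*age\s+\d+\s*$", "", line))
    return routes


def route_diff(output_a, output_b):
    """Return (routes only in A, routes only in B), each sorted."""
    a, b = normalize_routes(output_a), normalize_routes(output_b)
    return sorted(a - b), sorted(b - a)


# Hypothetical outputs (assumption, for illustration only).
member_a = """S 0.0.0.0/0 via 10.4.0.1, eth0, cost 0, age 659
C 10.4.0.0/24 is directly connected, eth0"""
member_b = """S 0.0.0.0/0 via 10.4.0.1, eth0, cost 0, age 667
C 10.4.0.0/24 is directly connected, eth0"""
print(route_diff(member_a, member_b))
```

Two empty lists mean the routing tables match apart from route age.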

Best,

Andy

ChoiYunSoo
Contributor

Thank you for your active help.

 

Here are the answers to your inquiries

 

* FW_A (Standby)


[Expert@northclu11:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 (local) 10.4.1.6 0% STANDBY FW_A
2 10.4.1.7 100% ACTIVE FW_B


Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114802
State change: DOWN -> STANDBY
Reason for state change: There is already an ACTIVE member in the cluster (member 2)
Event time: Fri Feb 23 03:59:40 2024

Last cluster failover event:
Transition to new ACTIVE: Member 1 -> Member 2
Reason: ADMIN_DOWN PNOTE
Event time: Fri Feb 23 03:59:36 2024

Cluster failover count:
Failover counter: 1
Time of counter reset: Fri Feb 23 03:53:33 2024 (reboot)


[Expert@northclu11:0]#
[Expert@northclu11:0]#
[Expert@northclu11:0]# ./azure_ha_test.py
Setting api versions for "ha" solution
ARM versions are: {
"resources": "?api-version=2019-07-01"
}
Testing if DNS is configured...
- Primary DNS server is: 168.63.129.16
Testing if DNS is working...
- DNS resolving test was successful
Testing connectivity to login.windows.net:443...
Testing ClusterXL parameters...
Testing cluster interface configuration...
Testing credentials...
Getting information about the environment...
Getting information about the VM northclu11...
Id : /subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/North_CLU_1/providers/Microsoft.Network/networkInterfaces/NorthClu11-eth0
Subscription : 1efe27ac-5c1b-497b-bc60-6510b07d1c92
Resource group: North_CLU_1
Type : Microsoft.Network/networkInterfaces
Name : NorthClu11-eth0
Attempting to read - [OK]
Attempting to write - [Forbidden]
Error:
HTTP/1.1 403 Forbidden
b'{"error":{"code":"LinkedAuthorizationFailed","message":"The client \'b7a8cf26-f859-41aa-b8af-f103f9a14aa9\' with object id \'b7a8cf26-f859-41aa-b8af-f103f9a14aa9\' has permission to perform action \'Microsoft.Network/networkInterfaces/write\' on scope \'/subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/North_CLU_1/providers/Microsoft.Network/networkInterfaces/NorthClu11-eth0\'; however, it does not have permission to perform action(s) \'Microsoft.Network/virtualNetworks/subnets/join/action\' on the linked scope(s) \'/subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/ODL-checkpoint_v1-72163-01/providers/Microsoft.Network/virtualNetworks/North-Hub/subnets/VMSS-FrontEnd\' (respectively) or the linked scope(s) are invalid."}}'
[Expert@northclu11:0]#
[Expert@northclu11:0]#
[Expert@northclu11:0]# curl_cli -k google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
[Expert@northclu11:0]#
[Expert@northclu11:0]#
[Expert@northclu11:0]# clish -c "show route"
Codes: C - Connected, S - Static, R - RIP, B - BGP (D - Default),
O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA),
A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed,
NP - NAT Pool, U - Unreachable, i - Inactive

S 0.0.0.0/0 via 10.4.0.1, eth0, cost 0, age 659
S 10.0.0.0/8 via 10.4.1.1, eth1, cost 0, age 659
S 10.4.0.0/16 via 10.4.1.1, eth1, cost 0, age 659
C 10.4.0.0/24 is directly connected, eth0
C 10.4.1.0/24 is directly connected, eth1
C 127.0.0.0/8 is directly connected, lo
S 168.63.129.16/32 via 10.4.0.1, eth0, cost 0, age 659
S 169.254.169.254/32 via 10.4.0.1, eth0, cost 0, age 659
S 172.16.0.0/12 via 10.4.1.1, eth1, cost 0, age 659
S 192.168.0.0/16 via 10.4.1.1, eth1, cost 0, age 659
[Expert@northclu11:0]#
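
The 403 body in the output above is long, and the useful part is easy to miss: the service principal may write the NIC but lacks `Microsoft.Network/virtualNetworks/subnets/join/action` on the linked subnet, which in this output sits in resource group `ODL-checkpoint_v1-72163-01` rather than `North_CLU_1`. The following is a minimal sketch for extracting those fields, assuming the body has already been decoded to a JSON string. The function name is hypothetical, and the sample uses placeholder identifiers instead of the real ones.

```python
import json
import re

MISSING_RE = re.compile(
    r"does not have permission to perform action\(s\) '(?P<action>[^']+)' "
    r"on the linked scope\(s\) '(?P<scope>[^']+)'"
)


def explain_linked_auth_failure(body):
    """Extract the error code, missing action and linked scope from an ARM error body."""
    error = json.loads(body)["error"]
    m = MISSING_RE.search(error["message"])
    return {
        "code": error["code"],
        "missing_action": m["action"] if m else None,
        "linked_scope": m["scope"] if m else None,
    }


# Placeholder identifiers (assumption); the message wording follows the output above.
sample = json.dumps({"error": {
    "code": "LinkedAuthorizationFailed",
    "message": (
        "The client '<client-id>' with object id '<client-id>' has permission to "
        "perform action 'Microsoft.Network/networkInterfaces/write' on scope "
        "'<nic-scope>'; however, it does not have permission to perform action(s) "
        "'Microsoft.Network/virtualNetworks/subnets/join/action' on the linked "
        "scope(s) '<subnet-scope>' (respectively) or the linked scope(s) are invalid."
    ),
}})
print(explain_linked_auth_failure(sample))
```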

 

 

* FW_B (Active)

[Expert@northclu12:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 10.4.1.6 0% STANDBY FW_A
2 (local) 10.4.1.7 100% ACTIVE FW_B


Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114704
State change: STANDBY -> ACTIVE
Reason for state change: No other ACTIVE members have been found in the cluster
Event time: Fri Feb 23 03:59:36 2024

Last cluster failover event:
Transition to new ACTIVE: Member 1 -> Member 2
Reason: ADMIN_DOWN PNOTE
Event time: Fri Feb 23 03:59:36 2024

Cluster failover count:
Failover counter: 1
Time of counter reset: Fri Feb 23 03:53:33 2024 (reboot)


[Expert@northclu12:0]# cd /opt/CPsuite-R81.10/fw1/scripts/
[Expert@northclu12:0]# ./azure_ha_test.py
Setting api versions for "ha" solution
ARM versions are: {
"resources": "?api-version=2019-07-01"
}
Testing if DNS is configured...
- Primary DNS server is: 168.63.129.16
Testing if DNS is working...
- DNS resolving test was successful
Testing connectivity to login.windows.net:443...
Testing ClusterXL parameters...
Testing cluster interface configuration...
Testing credentials...
Getting information about the environment...
Getting information about the VM northclu12...
Id : /subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/North_CLU_1/providers/Microsoft.Network/networkInterfaces/NorthClu12-eth0
Subscription : 1efe27ac-5c1b-497b-bc60-6510b07d1c92
Resource group: North_CLU_1
Type : Microsoft.Network/networkInterfaces
Name : NorthClu12-eth0
Attempting to read - [OK]
Attempting to write - [Forbidden]
Error:
HTTP/1.1 403 Forbidden
b'{"error":{"code":"LinkedAuthorizationFailed","message":"The client \'08b7ff4e-a0e2-462e-a85c-d5dea401b99c\' with object id \'08b7ff4e-a0e2-462e-a85c-d5dea401b99c\' has permission to perform action \'Microsoft.Network/networkInterfaces/write\' on scope \'/subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/North_CLU_1/providers/Microsoft.Network/networkInterfaces/NorthClu12-eth0\'; however, it does not have permission to perform action(s) \'Microsoft.Network/virtualNetworks/subnets/join/action\' on the linked scope(s) \'/subscriptions/1efe27ac-5c1b-497b-bc60-6510b07d1c92/resourceGroups/ODL-checkpoint_v1-72163-01/providers/Microsoft.Network/virtualNetworks/North-Hub/subnets/VMSS-FrontEnd\' (respectively) or the linked scope(s) are invalid."}}'
[Expert@northclu12:0]#
[Expert@northclu12:0]#
[Expert@northclu12:0]# curl_cli -k google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
[Expert@northclu12:0]# clish -c "show route"
Codes: C - Connected, S - Static, R - RIP, B - BGP (D - Default),
O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA),
A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed,
NP - NAT Pool, U - Unreachable, i - Inactive

S 0.0.0.0/0 via 10.4.0.1, eth0, cost 0, age 667
S 10.0.0.0/8 via 10.4.1.1, eth1, cost 0, age 667
S 10.4.0.0/16 via 10.4.1.1, eth1, cost 0, age 667
C 10.4.0.0/24 is directly connected, eth0
C 10.4.1.0/24 is directly connected, eth1
C 127.0.0.0/8 is directly connected, lo
S 168.63.129.16/32 via 10.4.0.1, eth0, cost 0, age 667
S 169.254.169.254/32 via 10.4.0.1, eth0, cost 0, age 667
S 172.16.0.0/12 via 10.4.1.1, eth1, cost 0, age 667
S 192.168.0.0/16 via 10.4.1.1, eth1, cost 0, age 667
[Expert@northclu12:0]#
[Expert@northclu12:0]#
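
The output of `ip r g 8.8.8.8` was not posted, but the routing decision can be reproduced from the `show route` table above with a longest-prefix match. The following is a minimal sketch using Python's standard `ipaddress` module. The route list is copied from the output above (the loopback route is omitted), and the address 10.4.2.4 used for the server subnet is a hypothetical example.

```python
import ipaddress

# (prefix, gateway, interface) copied from the `show route` output above.
ROUTES = [
    ("0.0.0.0/0", "10.4.0.1", "eth0"),
    ("10.0.0.0/8", "10.4.1.1", "eth1"),
    ("10.4.0.0/16", "10.4.1.1", "eth1"),
    ("10.4.0.0/24", None, "eth0"),  # directly connected
    ("10.4.1.0/24", None, "eth1"),  # directly connected
    ("168.63.129.16/32", "10.4.0.1", "eth0"),
    ("169.254.169.254/32", "10.4.0.1", "eth0"),
    ("172.16.0.0/12", "10.4.1.1", "eth1"),
    ("192.168.0.0/16", "10.4.1.1", "eth1"),
]


def lookup(destination):
    """Return (prefix, gateway, interface) of the most specific matching route."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), gateway, dev)
        for prefix, gateway, dev in ROUTES
        if dest in ipaddress.ip_network(prefix)
    ]
    net, gateway, dev = max(matches, key=lambda route: route[0].prefixlen)
    return str(net), gateway, dev


print(lookup("8.8.8.8"))   # internet destination
print(lookup("10.4.0.7"))  # frontend VIP
print(lookup("10.4.2.4"))  # hypothetical server address
```

Since both members show the same table, a lookup like this should give the same answer on FW_A and FW_B, which suggests the difference between them lies outside the gateway routing.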

 

2024-02-23_18-00-25.png

 

the_rock
Legend

Let me examine this carefully later, and I will update.

the_rock
Legend

Sorry, I just drove to the office today and had a quick look. To me, this appears 100% right. Here is my question: when the problematic firewall is active, do you have issues with all outbound connections, or do only certain apps fail?

Best,

Andy

ChoiYunSoo
Contributor

Thank you for your reply.

To summarize the problem: after failover from FW_A to FW_B, all communication stops working.

The backend LB recognizes the failover and sends traffic to FW_B.

However, FW_A continues to hold the VIP on its frontend interface.

When the server pings the firewall's real IP and the VIP, the VIP traffic is delivered to FW_A even though FW_B is active.

I believe this is the core issue.

FW_B receives traffic from the server and forwards it after NATing the source IP to the VIP, but since FW_A owns the VIP, FW_B cannot receive the return traffic.

However, when I run tcpdump on FW_A, no traffic is received on FW_A either.

I suspect that there may be a problem with the API call to Azure that transfers the VIP when failover occurs.

However, I cannot accurately determine whether there is a problem with my settings or a checkpoint bug.
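
The mismatch described in this post (the cluster's active member differs from the member holding the VIP in Azure) can be expressed as a simple check. The following is a minimal sketch that parses the member table of `cphaprob state`, assuming the column layout shown in this thread. The `vip_holder` argument is a hypothetical input: the name of the member whose NIC currently carries the VIP, as observed in the Azure portal or by other means.

```python
import re

# Member rows look like: "1 (local) 10.4.1.6 0% STANDBY FW_A"
MEMBER_RE = re.compile(
    r"^(?P<id>\d+)\s+(?P<local>\(local\)\s+)?(?P<addr>[\d.]+)\s+"
    r"(?P<load>\d+)%\s+(?P<state>\S+)\s+(?P<name>\S+)$"
)


def parse_members(cphaprob_output):
    """Return one dict per cluster member found in the cphaprob state table."""
    members = []
    for line in cphaprob_output.splitlines():
        m = MEMBER_RE.match(line.strip())
        if m:
            members.append({
                "id": int(m["id"]),
                "local": m["local"] is not None,
                "state": m["state"],
                "name": m["name"],
            })
    return members


def vip_follows_active(members, vip_holder):
    """True if exactly one member is ACTIVE and it is the one holding the VIP."""
    active = [m["name"] for m in members if m["state"].startswith("ACTIVE")]
    return len(active) == 1 and active[0] == vip_holder


table = """1 10.4.1.6 0% STANDBY FW_A
2 (local) 10.4.1.7 100% ACTIVE FW_B"""
# FW_B is active while the VIP is still observed on FW_A.
print(vip_follows_active(parse_members(table), vip_holder="FW_A"))
```

A `False` result after failover is consistent with the VIP not having been moved in Azure.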

the_rock
Legend

I don't think it's a CP bug; it sounds like something in the Azure config.

ChoiYunSoo
Contributor

I think so too. The probability of it being a Check Point bug is very small.

I suspect that I may have configured something incorrectly in the Azure environment,

but I don't know what it is.

Please let me know if you see any mistakes or anything additional I need to set up.

I welcome your response at any time.

the_rock
Legend

I agree 100%. The only way for me to tell would be a remote session. It's hard to say for sure based on the screenshots you sent : - (

Andy

ChoiYunSoo
Contributor

Hi Legend

 

Found the cause

There was a problem with the application in the environment.

API communication did not occur between Check Point and Azure due to a client secret issue.

Thanks for your help
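
For readers who hit the same symptom, one way to check a service principal's client secret independently of the cluster is to request a token with the OAuth 2.0 client credentials flow. The following is a minimal sketch using only the Python standard library and the Microsoft identity platform v2.0 token endpoint. The function names are hypothetical, the tenant ID, client ID, and secret must be supplied by you, and this is not necessarily how the Check Point scripts authenticate.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client-credentials token request for Azure Resource Manager."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://management.azure.com/.default",
    }).encode()
    return urllib.request.Request(url, data=body, method="POST")


def secret_is_accepted(request):
    """Send the request; True if a token is issued, False on any HTTP error response."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return "access_token" in json.load(response)
    except urllib.error.HTTPError:
        # A wrong or expired secret is rejected with an HTTP error status.
        return False


# Usage (requires network access and real values):
# request = build_token_request("<tenant-id>", "<client-id>", "<client-secret>")
# print(secret_is_accepted(request))
```

Obtaining a token only shows that the secret is valid. It says nothing about which role assignments the service principal has, so the read and write checks in `azure_ha_test.py` are still worth running afterwards.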

 

 

the_rock
Legend

Excellent!

Andy
