vSEC Cluster in Azure: anyone know how to set it up?
I'd like to know if someone can help me, or point me in the right direction, to set up an HA cluster of Check Point vSEC on Azure.
sk110194 doesn't say anything about the proper way to configure the HA cluster, just that we need to work with Active Directory and the API. To be honest, I have never worked with Active Directory on Azure before and have no API knowledge.
So, can someone give me some tips on how to deploy it? We already have everything running, but failover has to be done manually by changing the route tables to point to the new active member.
Thanks for any help
For HA to work correctly in Azure, we must make calls to the Azure API.
The API calls allow us to monitor state and fail over the relevant routes when needed.
In order to call the API, you need credentials.
Those credentials need to be created and configured on the instances, as described in the SK.
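As a rough illustration of that last step, here is a quick sanity check of the parsed credentials JSON. The field names below are assumptions for illustration only; the actual schema of the file is defined in the SK, so check your own copy.

```python
import json

# Illustrative field names only -- the real schema of
# $FWDIR/conf/azure-ha.json is defined by Check Point; check your own file.
REQUIRED_FIELDS = ("subscription", "tenant", "client_id", "client_secret", "resource_group")

def missing_fields(conf):
    """Return the required fields that are absent or empty in the parsed config."""
    return [field for field in REQUIRED_FIELDS if not conf.get(field)]

# Example with a deliberately incomplete config:
sample = json.loads('{"subscription": "sub-id", "tenant": "tenant-id", "client_id": ""}')
print(missing_fields(sample))  # ['client_id', 'client_secret', 'resource_group']
```

Empty or missing credentials are a common cause of failed failovers, so a check like this (or simply `python -m json.tool` against the file) is worth running before testing HA.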
My name is Dmitry and I'm from Check Point R&D.
It is possible to deploy a Check Point vSEC High Availability cluster in MS Azure. The deployment and configuration process is described in sk110194, which you're referring to.
- After deploying the vSEC Cluster from the Azure Marketplace, you should follow the steps in the article to create a service account and assign it to the cluster's resource group. No API knowledge is required to do that - it can be done via the Azure portal.
- It is then required to configure the high availability daemon on each cluster member (see the section "Deployment using a Solution Template" that covers the creation of a service account and the configuration of the HA daemon).
- On failover, the HA daemon will make all the API calls automatically and reassign all routing tables accordingly.
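Conceptually, the route reassignment the daemon performs on failover looks something like the sketch below. The data structures and names are invented for illustration; the real daemon makes these changes through the Azure REST API.

```python
# Conceptual sketch of failover route reassignment: every user-defined
# route whose next hop is the failed member is repointed at the new
# active member. Structures here are illustrative, not Check Point's code.

def repoint_routes(route_table, old_active_ip, new_active_ip):
    """Return a copy of the route table with next hops moved to the new active member."""
    updated = []
    for route in route_table:
        if route["next_hop"] == old_active_ip:
            route = {**route, "next_hop": new_active_ip}
        updated.append(route)
    return updated

udr = [
    {"prefix": "0.0.0.0/0",   "next_hop": "10.0.1.10"},  # was pointing at member A
    {"prefix": "10.0.2.0/24", "next_hop": "10.0.1.10"},
    {"prefix": "10.0.0.0/16", "next_hop": "vnet-local"},
]
print(repoint_routes(udr, "10.0.1.10", "10.0.1.11"))
```

The API round-trips needed to rewrite each affected route table are why failover in Azure takes minutes rather than the sub-second failover of on-premises ClusterXL.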
- Note that the cluster must be properly configured in SmartDashboard / SmartConsole and policy must be installed (see section "SmartDashboard Configuration" for instructions).
If you have specific comments on the SK, please feel free to share them with me. If you are having trouble configuring the cluster, you may open a support ticket or contact your local SE.
Thanks for your detailed answer Dmitry,
Actually, we already have the cluster working in Azure, following sk110194 and this video, which is really helpful: Checkpoint vSEC cluster Deployment in azure - YouTube
But now we are stuck: everything is OK for incoming traffic, but outgoing traffic generated from the LAN behind the Check Point in Azure doesn't work.
Off the top of my head, I would troubleshoot each hop.
From host behind the vSEC Gateway, try to ping the vSEC Gateway.
Use tcpdump on the vSEC gateway to confirm traffic is being received on the expected interface.
If not, you will need to review the User Defined Routes to ensure they are configured correctly.
I have followed sk110194 and applied the route tables as explained in the article.
When I NAT behind the cluster member's front-end IP, outgoing internet access is fine. When I NAT behind the cluster's virtual front-end IP, we have no internet access.
From fw monitor we see that traffic is correctly being NATed behind the cluster IP, but we see no return traffic. Is there any further configuration we need to do in the Azure routing tables for this to work?
We had the same issue. To get outgoing Azure traffic hide-NATed behind the cluster IP address, it should leave the gateway hide-NATed behind its _private_ interface address, and Azure will translate the source address to the cluster IP.
This works fine until a failover, when the secondary still translates to the private address of the _primary_ gateway, because the members have identical policies/config but different interface addresses.
I am still trying to find an answer to a failover scenario and outgoing internet traffic.
You need to set the external interface as Sync only, not Cluster or Cluster + Sync. There is no need for a VIP to be defined. The gateways will then automatically NAT behind their respective front-end IPs.
You do not need to define a manual NAT rule to hide behind a specific IP; just auto hide-NAT all subnets behind the gateway.
Thanks for sharing. I inherited the system as-is... no auto hide-NAT enabled on the Azure subnet objects, just a single NAT rule to the cluster PIP, which was not working.
On the other hand, due to the nature of Azure's fabric (heavy use of policy routing via UDRs), a lot of traffic arrives and leaves on the same inside interface. That is why I tried to apply hide NAT explicitly/manually only to traffic heading out to the Internet. I do not want traffic from Azure subnets with auto hide-NAT configured to be NATed when it is not going out to the internet and both ingress and egress are inside.
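The intent described above can be sketched as a simple destination test. The address ranges below are generic RFC 1918 examples, not the poster's actual address plan, and in practice this logic is expressed as NAT rules in the policy, not code.

```python
# Sketch of the policy intent: hide-NAT only traffic whose destination is
# outside the internal ranges, so east-west traffic that enters and leaves
# the same inside interface is left alone. Ranges are examples only.
import ipaddress

INTERNAL_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def should_hide_nat(dst):
    """True if the destination is Internet-bound, i.e. outside all internal ranges."""
    addr = ipaddress.ip_address(dst)
    return not any(addr in net for net in INTERNAL_RANGES)

print(should_hide_nat("8.8.8.8"))    # True  -> hide-NAT on the way out
print(should_hide_nat("10.0.2.15"))  # False -> east-west, no NAT
```

A manual NAT rule whose destination column is "negated internal networks" achieves the same effect in the policy itself.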
I have created an additional (third) interface as per sk113583 on an existing two-interface cluster (created using the template in the Check Point CloudGuard IaaS High Availability for Microsoft Azure R80.10 and above Deployment Guide), as we have a requirement to route to certain parts of the network via this new interface (eth2). As per your advice, I have defined eth2 as an 'External' interface with 'Sync' and put it behind a new load balancer IP 172.19.16.70 on the 'backend-lb'.
I need traffic leaving this interface to hide-NAT behind the active gateway's IP address (.68 when GW1 is active and .69 when GW2 is active).
However, traffic only ever gets NATed to GW1's IP address (172.19.16.68), even when GW2 is active, resulting in traffic being dropped because the load balancer sends return traffic to GW1, which is offline.
Any ideas what needs to change to achieve the desired result?
Can I ask what the point is of having a cluster in Azure?
There is no physical device that you would need to replace.
HA failover takes a bit of time; in an average environment, almost as long as redeploying the appliance.
Keepalive packets sometimes cause split brain, so you have an outage.
From my personal point of view, it is quite bad practice to try to replicate an on-premises datacentre in the cloud instead of using the benefits that the cloud introduces.
The gateways in the cluster are placed into an availability set. Although they are in the same datacenter, they should be on different racks and hardware. This ensures that in case of any failure or maintenance, the cluster will fail over to the secondary gateway. This process takes around 4 minutes due to the changes made automatically via API calls.
I have the same question as Vladislav Nedosekin
I have a CloudGuard cluster but have had lots of problems when failover takes place, and it happens occasionally. Every few days or weeks, failover occurs randomly for no good reason, out of the blue, and most of the time the whole API call process does not complete smoothly until I manually run clusterXL_admin down/up.
We decided to shut down the secondary node manually for a while (we have been in this state for a few months), and since then we have had a stable environment, apart from one Microsoft maintenance activity that was known in advance.
I agree about the availability set, but I think we have saved a lot of money every month since then.
We've had a vSEC cluster deployed for around 6 months now and I agree with you; there are a lot of issues when it comes to clustering and failover.
Whenever a change related to interfaces, static routes, dynamic routing or route redistribution is made on the gateways, the routing daemon crashes, causing a failover. Unlike on premise, a failover in Azure takes 5-6 minutes due to API calls so this effectively results in 6 minutes of downtime.
As a workaround, we always run clusterXL_admin down on the standby member to avoid failover. We then perform the necessary changes and re-enable the cluster on the standby member.
Due to the nature of public cloud environments, we are reliant on things like the public cloud APIs to perform cluster failovers.
Obviously we can't control how long it takes for these API calls to complete.
That said, the experience is not currently optimal and we are looking for ways to improve it.
If you are still facing the LAN outgoing traffic issue, check the Effective Routes on the network interface of any LAN VM to understand which routes are active.
Validate that your HA-related configuration is correct (including the API credentials and the role assigned to the respective resource group):
[Expert]# python -m json.tool $FWDIR/conf/azure-ha.json
Reconfigure it:
[Expert]# $FWDIR/scripts/azure_ha_cli.py reconf
Then test the configuration and work on any errors the test reports.
FACT: when routes overlap, user-defined routes have the highest priority, then ExpressRoute / on-premises routes, and lastly system-defined routes.
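That precedence can be shown with a small sketch: longest prefix match wins, and on a tie the route's origin decides. This is purely illustrative of how Azure selects the effective route; the structures below are invented, and Azure does this inside its fabric.

```python
# Illustrative only: Azure effective-route selection as longest prefix
# match, with ties broken by origin (user-defined > BGP > system).
import ipaddress

ORIGIN_PRIORITY = {"user": 0, "bgp": 1, "system": 2}

def effective_route(routes, dst):
    """Pick the route Azure would use for dst from a list of candidate routes."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r["prefix"])]
    # Longer prefix wins; ties broken by origin priority.
    return min(candidates,
               key=lambda r: (-ipaddress.ip_network(r["prefix"]).prefixlen,
                              ORIGIN_PRIORITY[r["origin"]]))

routes = [
    {"prefix": "0.0.0.0/0",   "origin": "system", "next_hop": "internet"},
    {"prefix": "0.0.0.0/0",   "origin": "user",   "next_hop": "10.0.1.10"},
    {"prefix": "10.1.0.0/16", "origin": "bgp",    "next_hop": "expressroute"},
]
print(effective_route(routes, "8.8.8.8")["next_hop"])   # 10.0.1.10 (UDR beats system)
print(effective_route(routes, "10.1.2.3")["next_hop"])  # expressroute (longest prefix)
```

This is why a forgotten UDR can silently hijack traffic away from the gateway: it outranks everything else at the same prefix length.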
Thanks for the information. I already used the script a long time ago:
# python -m json.tool $FWDIR/conf/azure-ha.json
And while the results looked OK, we found later that the API user credentials were incorrect.
I'll go over the configuration again and see if it is mature enough to be as stable as it should be.
What do the internal interfaces need to be set to? Sync, or Cluster + Sync?
Both interfaces set to sync.
"You do not need to define a manual NAT rule to hide behind a specific IP, just auto hide NAT all subnet behind Gateway."
I currently have 2 NAT rules in the manner below, and failover works (the UDRs change), but outgoing traffic still NATs to the GW1 IP.
1. Internal LAN subnet | Any | Any | GW1-external private IP(hide) | Original | Original
2. Internal LAN subnet | Any | Any | GW2-external private IP(hide) | Original | Original
Please let me know what I am missing.
Hi Sarath M
External = Sync
Internal = cluster + Sync.
Do not use manual hide-NAT rules. Use automatic NAT (object NAT) and choose to hide behind the gateway. This will ensure that during a failover, traffic is NATed behind the correct active gateway.
During a discussion I had with a Check Point engineer at an event for CloudGuard IaaS in Azure, I asked the same question, and his reply was that "at the moment it is not possible to configure HA based on ClusterXL or VRRP due to the multicast limitation on Azure VNets".
In some documents he shared with me, the solution uses the load balancer in Azure, which means losing session statefulness.
Public Clouds generally don't support Multicast for their networking.
This pretty much leaves you with two classes of solutions for High Availability / Load Sharing firewalls:
- A solution that interacts with the underlying networking somehow
  - In Azure, you can do this with User-Defined Routes, but this requires extra monitoring and API calls that turn failovers into events that can take tens of seconds.
  - AWS doesn't have User-Defined Routes, but there are ways to achieve a similar result, with similar limitations.
- A solution that relies on load balancers
  - This allows for greater scalability, since you can have more than a couple of gateways in a "cluster", but it cannot be used in all topologies/use cases.
  - It also typically does not allow for stateful failover, which, when you consider the rest of the application stack being protected, is not a significant limitation.
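Conceptually, the load-balancer approach works like the sketch below. The names and the probe are invented for illustration; a real Azure load balancer does this in its fabric via health probes.

```python
# Purely illustrative: a load balancer forwards new flows only to members
# that pass the health probe, so "failover" is just the probe marking a
# member down -- no route changes, but also no stateful session carry-over.
def pick_backend(members, is_healthy, flow_hash):
    """Hash each new flow onto one of the currently healthy members."""
    healthy = [m for m in members if is_healthy(m)]
    if not healthy:
        raise RuntimeError("no healthy gateway available")
    return healthy[flow_hash % len(healthy)]

members = ["gw-a", "gw-b"]
alive = {"gw-a": False, "gw-b": True}  # gw-a just failed its probe
print(pick_backend(members, alive.get, flow_hash=7))  # gw-b
```

Because membership of the healthy pool changes on failure, flows re-hash onto the survivors and existing connections are not preserved, which is the stateless trade-off mentioned above.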
Speaking of which, we just released our new HA solution for Azure.
We were early adopters, installing vSEC clusters in Azure, and experienced many issues (not just cluster related). Everything except cluster failover has gotten better, even while still being on R77.30. I now have my management servers on R80.20 and am just waiting for a fix from the developers to have my log server updated. vSEC cluster failover is inconsistent: sometimes it completes successfully in 4 minutes, and other times it takes multiple hours, to the point that shutting down a member has resulted in higher uptime. My question is whether cluster failover is any better with R80.10, or whether we cannot take advantage of availability sets and should just use single gateways instead. What are you doing?
We have the same issues with the cluster on Azure. It is very unstable. We were thinking of redeploying using the newer templates, but we are not sure which to deploy: "IAAS R80.10 Cluster" or "IAAS High Availability".
I have a query about deploying a Check Point cluster from the marketplace.
I am wondering where we can define the firewall physical IP addresses. I was watching a YouTube video in which they configured the cluster interfaces directly.
These will be assigned by Azure from the subnets that you define in the deployment.
You specify a FrontEnd and a BackEnd subnet, and the firewalls will grab IPs from those subnets.
The script also gets the public IPs and associates them as well.
Doing a quick check on YouTube, those videos seem quite old.
I find that the cloud changes so fast these days that before doing a new deployment, I always go to my work Azure subscription and spin one up to find out what has changed.