Check Point Clustering Query

Nilay_Vyas · ‎2018-08-22

Hi,

I am fairly new to Check Point and I am not a security expert either. I am working on a lab design from a network perspective and I have a question related to Check Point R80 clustering.

I have been advised that Check Point requires a switch between the cluster members to perform clustering properly, or else they will end up in a split-brain scenario.

My initial design for lab was

cp cluster ( cp a and cp b)

wan 1 --> cp a (Lan and WAN L3 on the CP) --> Lan Switch stack 1--> users

wan 2 --> cp b (LAN and WAN L3 on the CP) --> Lan Switch stack 2--> users

cp a <--> cp b for clustering, with the LAN and WAN ports monitored for failover.

CP cluster setup: Active/Standby.

I have been told it needs to be

wan 1 -- Lan Switch stack 1 -- CP A -- Lan switch stack 1 -- Users

wan 2 -- Lan switch stack 2 -- CP B -- Lan switch stack 2 -- users

cp a <-- lan switch stack 1 / 2 --> cp b

I fail to understand the reason behind this. I have been told that each WAN and LAN interface does its keepalives, like HSRP, over the switch's L2 broadcast domain, and that cluster sync is only there to share the session information.

If we go with the first design and WAN 1 goes down, the CP doesn't know whether the issue is related to the port or to the CP itself, and it may fail or half fail.

Could someone please explain the following to me?

- What is the real requirement: do we need an L2 switch between the cluster members or not?

- Can't Check Point share the port up/down information via the sync link and make a decision (like ASA or FG)?

- How do communication and failover happen in the sync or failover scenario?

I would love to understand the mechanics behind it and best practice or validated practice.

Thanks,

Nilay.

Lari_Luoma1 · ‎2018-08-22

Check Point supports two types of clustering: ClusterXL, a proprietary Check Point mechanism, and VRRP, which is an open standard. I'm assuming that you are referring to a ClusterXL cluster in your question.

Cluster members need to see each other in the same L2 network, and no routing devices are allowed between them. Clusters should meet the following requirements: max. delay 30 ms, max. packet loss 2-3%.
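As a rough illustration of the two figures above, here is a minimal Python sketch that checks measured values against them. The function name and the choice of 3% as the hard upper bound are my assumptions for illustration; this is not a Check Point tool.

```python
MAX_DELAY_MS = 30.0      # maximum delay quoted above
MAX_LOSS_PERCENT = 3.0   # upper end of the quoted 2-3% range (assumed hard limit)


def link_ok_for_cluster(delay_ms: float, loss_percent: float) -> bool:
    """Return True if the measured delay and loss are within the quoted limits."""
    return delay_ms <= MAX_DELAY_MS and loss_percent <= MAX_LOSS_PERCENT


if __name__ == "__main__":
    # Hypothetical measurements between the two cluster members.
    print(link_ok_for_cluster(delay_ms=5.0, loss_percent=0.1))
    print(link_ok_for_cluster(delay_ms=45.0, loss_percent=0.1))
```

The measurements themselves would come from whatever tool you already use to test the path between the members.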

Cluster members communicate using Cluster Control Protocol (CCP) packets, which are sent every 100 ms by default (which is also the minimum value). CCP packets can be sent as multicast, broadcast or unicast. Multicast is the default in regular clusters, but you can change the mode to broadcast. Scalable Platform appliances and VSX in VSLS mode work with unicast.

CCP packets are sent over the sync and cluster interfaces (except on SP appliances, where they are sent only over the sync link) and are responsible for synchronization and health checks.

ClusterXL cluster can work in high-availability or load sharing mode.

Synchronization

Synchronization happens over the sync link. A full sync happens when a cluster member boots, and a delta sync happens every time new information is available. Some connections use delayed synchronization by default.

PNOTEs

A ClusterXL cluster monitors its interfaces and critical processes via the PNOTE mechanism (problem notification). PNOTEs are also called critical devices. It's a very flexible method and also allows you to add your own PNOTEs to be monitored. If a PNOTE goes down (interface, process...), the cluster member declares itself as down. This usually means that a failover takes place.

When all PNOTEs report their states as OK, the machine will try to change its state to 'Active', depending on the cluster configuration (HA mode / LS mode) and states of the peer members.
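To make the PNOTE idea concrete, here is a toy Python model of it. This is only a sketch of the behaviour described above, not how ClusterXL is implemented; the class and the PNOTE names ("interfaces", "policy", "my_custom_check") are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterMember:
    """Toy model of a cluster member and its critical devices (PNOTEs)."""
    name: str
    # Maps a hypothetical PNOTE name to True (OK) or False (problem).
    pnotes: dict = field(default_factory=dict)

    def report(self, pnote: str, ok: bool) -> None:
        """Register or update the state of one PNOTE."""
        self.pnotes[pnote] = ok

    def is_down(self) -> bool:
        # A single failed PNOTE is enough for the member to declare itself down.
        return not all(self.pnotes.values())


member = ClusterMember("cp-a")
member.report("interfaces", True)
member.report("policy", True)
member.report("my_custom_check", True)   # a user-added PNOTE
assert not member.is_down()              # all OK: member may try to go Active

member.report("interfaces", False)       # e.g. a monitored cable is unplugged
assert member.is_down()
```

Whether a member that is not down actually becomes Active still depends on the cluster mode and the peer states, which this sketch does not model.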

With ClusterXL cluster it's almost impossible to get a split-brain situation.

Failover

When a cluster member stops receiving CCP packets from its peer, it starts so-called probing to determine whether the problem is on its own interface or on the peer side. If the member itself has a problem, it sets itself to the Down state. If the peer has the issue, the member's status becomes Active Attention. In the Active Attention state, the cluster member forwards traffic, but the other cluster members are down.
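The outcome of probing can be summarised in a few lines of Python. This is a simplified sketch: the probing itself is not modelled, and the boolean input and function name are assumptions standing in for whatever the member concludes about its own interface.

```python
def state_after_ccp_loss(own_interface_ok: bool) -> str:
    """State a member ends up in once CCP packets from the peer stop arriving."""
    if not own_interface_ok:
        # Probing showed the problem is local, so the member steps down.
        return "Down"
    # The problem is on the peer side: keep forwarding traffic, but flag it.
    return "Active Attention"


assert state_after_ccp_loss(own_interface_ok=False) == "Down"
assert state_after_ccp_loss(own_interface_ok=True) == "Active Attention"
```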

Please see sk66527 for more information about recommended configuration for ClusterXL.

Lari_Luoma · ‎2018-08-23

It seems that I was logged in with my private account when posting last night... 🙂

Timothy_Hall · ‎2018-08-23

> Could someone please explain the following to me?

>

> - What is the real requirement: do we need an L2 switch between the cluster members or not?

You do not need a switch between the two ClusterXL members for the sync network, a simple Cat5e or better network cable will be fine; I actually prefer using a piece of cable/fiber as it does not have any components like a power supply or silicon that can fail. Nokia Clustering (not VRRP) did actually require a switch for the "cluster interface" between the members to maintain link integrity at all times, otherwise if one of the members was powered off it would cause a big bounce in the cluster and brief outage. Obviously if you have three or more members of the same ClusterXL cluster you will need a switch; note however that all cluster members assume the sync network is secure, and there is no encryption or even authentication present there. If an attacker can inject or tamper with the cleartext traffic on the sync network very bad things can happen; this is pretty difficult with a piece of cable/fiber unless an attacker is physically present with the gateways and is vampire tapping into the sync cable itself in which case you have much, much bigger problems to worry about.

> - Can't Check Point share the port up/down information via the sync link and make a decision (like ASA or FG)?

Even though the sync network uses the CCP protocol to synchronize the state tables, my understanding is that cluster status and health is not communicated via the sync network. It is not analogous to a "failover" cable from the Cisco PIX/ASA world. All members of a ClusterXL cluster send several CCP multicasts a second out all clustered interfaces (i.e. those presenting a cluster/virtual IP address) to test the network connectivity between them and report state/load to each other.

> - How do communication and failover happen in the sync or failover scenario?

Obviously in the case of a catastrophic failure of the active member, CCP emanations cease from the failed member. The surviving member notices this and after waiting the dead interval (approximately 2.5 seconds by default unless a freeze or CUL is active) goes active and begins passing traffic.

In the case of an impairment on the active member (i.e. unplugging a network cable plugged directly into the active member), the failure is reported in the next CCP update sent over the other interfaces that are still working. When the standby member of the cluster is notified of this failure on the active member, the standby notices that it is in a "better" or "less failed" state than the other member and immediately goes active. In the case of an equal failure (i.e. a switch both members are attached to has its power cord pulled), they both report an identical failure to each other and nothing happens; whichever firewall was previously active remains active, as there is nothing to gain by attempting a failover.
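The two cases above (a dead peer versus a "less failed" comparison) can be sketched roughly in Python. This is an illustration only: reducing each member's health to a count of failed interfaces is my simplifying assumption, and the function names are hypothetical.

```python
DEAD_INTERVAL_S = 2.5  # approximate default dead interval mentioned above


def peer_considered_dead(seconds_since_last_ccp: float) -> bool:
    """Catastrophic failure: no CCP heard from the peer for the dead interval."""
    return seconds_since_last_ccp >= DEAD_INTERVAL_S


def standby_should_take_over(active_failures: int, standby_failures: int) -> bool:
    """Impairment: the standby goes active only if it is strictly less failed."""
    return standby_failures < active_failures


# Cable pulled on the active member only: the standby is in the better state.
assert standby_should_take_over(active_failures=1, standby_failures=0)

# Shared switch loses power: identical failure on both, so no failover.
assert not standby_should_take_over(active_failures=1, standby_failures=1)
```

The strict comparison is the point of the sketch: with an equal failure on both sides, nothing changes and the previously active member stays active.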

I glossed over a few details but hopefully that helped.

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

New Book: "Max Power 2026" Coming Soon
Check Point Firewall Performance Optimization

HeikoAnkenbrand · ‎2018-08-23

> Could someone please explain the following to me?

>

> - What is the real requirement: do we need an L2 switch between the cluster members or not?

Timothy described the problems well. I just want to point out that the active gateway in an HA cluster goes into the "Active Attention" state if there is no connection via the Cat5e cable. This has no direct effect, but the active machine does report a problem. Active Attention means that a problem has been detected, but the cluster member is still forwarding packets because it is the only machine in the cluster or there are no other active machines in the cluster. In any other situation the state of the machine would be Down.

I like to avoid this and use a switch. From my point of view, however, both are possible without any problems. I think switch "yes or no" is a fundamental discussion.

I once started a poll on the topic:

Regards

Heiko

➜ CCSM Elite, CCME, CCTE, CCVS ➜ www.checkpoint.tips

Nilay_Vyas · ‎2018-08-23

Thank you guys for so much information. I am so happy that I can get such a good wealth of knowledge and explanation from the experts which helps me to understand the technology much better.

I still have a question,

I understand that if we have a switch between the CPs, then CCP will communicate across all the interfaces, and if one of the interfaces goes down the cluster will fail over, which is all good.

However, what if we don't have a switch? So we have:

the WAN on the CP,

the LAN switch on the CP trunk,

and the sync interface.

The LAN switch will take care of the LAN interface sync via CCP.

The sync interface between the CPs will take care of whatever sync is supposed to do.

But the WAN interfaces are connected directly to the CPs:

- How do they communicate their active state and failover?

- What happens if the WAN interface goes down on one CP? As they can't see each other's WAN interfaces over an L2 device, how do they get notified that it is down? Does the sync interface track their WAN state and tell the other device "my WAN is down, please take over the active state", with the other device taking over without verifying over L2 that it is really down, or is it something else?

Thanks

Nilay.

Lari_Luoma · ‎2018-08-24

WAN-interfaces in a cluster must be connected to the same L2-network as well.

Say that you have the following cluster:

member 1:

eth1 (external): 172.80.3.2

eth2 (internal): 192.168.1.2

sync: 10.255.255.1

member 2:

eth1: 172.80.3.3

eth2: 192.168.1.3

sync: 10.255.255.2

eth1 and eth2 have virtual IPs of their respective networks set to .1.

Because the cluster interfaces need to be in the same L3 network, they also need to share the underlying L2 network (VLAN). If you have geographically separated cluster members, you need to extend your L2 infrastructure to both sites. By default the active cluster member uses gratuitous ARP to advertise its MAC address.
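If you want to sanity-check an addressing plan like the one above, a small Python sketch using the standard ipaddress module can confirm that the member and virtual IPs of each interface fall into one subnet. The /24 prefix length is an assumption, since no mask is given in the example.

```python
import ipaddress

PREFIX_LEN = 24  # assumption: the example does not state the subnet mask

# Member 1, member 2 and (where present) the virtual IP of each interface.
cluster_interfaces = {
    "eth1": ["172.80.3.2", "172.80.3.3", "172.80.3.1"],
    "eth2": ["192.168.1.2", "192.168.1.3", "192.168.1.1"],
    "sync": ["10.255.255.1", "10.255.255.2"],
}


def same_subnet(addresses, prefix_len=PREFIX_LEN):
    """Return True if all addresses belong to the same network."""
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        for addr in addresses
    }
    return len(networks) == 1


for name, addrs in cluster_interfaces.items():
    print(name, same_subnet(addrs))
```

This only checks the L3 side; that both members' ports really sit in the same VLAN still has to be verified on the switches.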

Example L2 connectivity of my cluster:

external interface from both members: VLAN 299

internal interface from both members: VLAN 199

sync interface from both members: VLAN 99

Hope this clarified it.

-lari-
