Solved: ClusterXL – Should the standby member ever generate traffic with the VIP?

RemoteUser

In a ClusterXL HA, does the standby member ever send traffic using the Virtual IP, or should only the active member use it?

G_W_Albrecht

VIP is used for all communication with the Cluster, represented by its active member - so only the active member will use the VIP.

https://sc1.checkpoint.com/documents/R81.20/WebAdminGuides/EN/CP_R81.20_ClusterXL_AdminGuide/Content...

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist

RemoteUser

So in a ClusterXL High Availability setup, if I log into the standby node and generate traffic to a server (e.g. ping, curl, telnet), should the source IP always be the member’s physical interface address, and never the cluster VIP?

Bob_Zimmerman

Unfortunately, traffic sent by the standby member typically goes out from the VIP, then replies go back to the active member which doesn't know anything about the connection and drops it.
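The failure mode described above can be sketched as a toy model. This is only an illustration of the reasoning, not Check Point code: the `Member` class, its methods, and all addresses are hypothetical, and each member is assumed to track only the connections it originated itself.

```python
from dataclasses import dataclass, field

VIP = "192.0.2.1"           # hypothetical cluster Virtual IP
SERVER = "198.51.100.10"    # hypothetical RADIUS server


@dataclass
class Member:
    name: str
    physical_ip: str
    # Connections this member originated, stored as (src, dst) as seen on the wire.
    connections: set = field(default_factory=set)

    def send_hidden_behind_vip(self, dst: str) -> tuple:
        """Originate a connection whose source address is rewritten to the VIP."""
        packet = (VIP, dst)
        self.connections.add(packet)
        return packet

    def receive_reply(self, src: str, dst: str) -> str:
        """Accept a reply only if it matches a connection this member knows about."""
        if (dst, src) in self.connections:
            return "accept"
        return "drop"


active = Member("active", "192.0.2.2")
standby = Member("standby", "192.0.2.3")

# The standby originates the request, but it leaves with the VIP as its source.
request = standby.send_hidden_behind_vip(SERVER)

# The reply is addressed to the VIP, so the network delivers it to the active member,
# which has no record of the connection.
verdict_on_active = active.receive_reply(src=SERVER, dst=VIP)
print(request, verdict_on_active)
```

In this model the same reply would be accepted if it reached the standby, which is why the proposed fixes all aim at keeping the VIP out of the standby's own outbound connections.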

In the past, I've used fwha_cluster_hide_active_only to fix this, but it seems it no longer works how it did in the past. Now, I have to make explicit no-NAT rules everywhere to get the standby member able to talk out to stuff like RADIUS, or to Check Point for updates.

RemoteUser

So, by setting a no-NAT rule for the standby, we force connections to exit only with the physical IP and not with the VIP.
That's what we want, right?

Bob_Zimmerman

Rules aren't aware of which member is active or standby. The rule can only say the cluster members to anywhere (or to specific destinations), don't NAT. And this affects all traffic from the firewalls, including (for example) traffic from clients using the firewall as an explicit web proxy.

Frustrating to say the least.

emmap

Rather than noNAT rules, you can use the table.def file to prevent the cluster NAT from occurring for specific services.

https://support.checkpoint.com/results/sk/sk31832

Be aware this will completely break outbound traffic from VSs because they don't have routable IPs per interface. So don't do it for HTTPS or DNS or the like. Historically I've only seen the cluster NAT be an issue for RADIUS and SecurID; for general outbound stuff like DNS and HTTPS you shouldn't have an issue leaving it at default. If you do, please raise a TAC case so we can fix it.

Bob_Zimmerman

The problem with table.def modifications is they get wiped out by every upgrade, and upgrades are infrequent enough that everyone has forgotten about the modifications by the time one rolls around. We then have to deal with whatever it was being broken for a week while we rediscover the modification.

I just went through that for something minor after upgrading to R82. I don't want to do it again for R82.10 for something bigger like my cluster members' ability to get to the Internet to fetch updates, or to get to my RADIUS servers so I can log in to half of them. We don't yet have automated testing of authentication, so it would take time for people to even notice how widespread the problem is.

I opened a ticket with diamond for problems connecting out from my standby members for AV/AB/IPS updates, and the solution was originally fwha_cluster_hide_active_only. We went through the process of deploying that everywhere, then a few months ago, we started having the same problem again. Opened another ticket, and our diamond reps said we needed to get rid of the kernel parameter and use a no-NAT rule instead.

the_rock

I would say a no-NAT rule definitely makes more sense in this case, at least to me.

Andy

JozkoMrkvicka

There is one more hidden feature which wasn't mentioned yet. Even if you implement a no-NAT rule, communication from the standby member goes over the synchronization interface to the active member. This is the default starting from R80.40. The active member is the one that decides whether the traffic leaves or is dropped. With no-NAT in place, you will face asymmetric routing for the standby member. But the standby member will somehow manage it, and communication will work.

Flow from the standby member using no-NAT to any RADIUS server:

Request:

standby physical IP --> sync interface --> active member --> radius

Reply from radius:

radius --> standby physical IP
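The two paths above can be written down as hop lists to make the asymmetry explicit. This is a minimal sketch; the function name and the addresses are hypothetical and only restate the flow described in this post.

```python
def standby_radius_flow(standby_ip: str, radius_ip: str) -> dict:
    """Return the request and reply paths for standby-originated traffic with no-NAT."""
    request_path = [
        f"standby {standby_ip}",
        "sync interface",
        "active member",
        f"radius {radius_ip}",
    ]
    # The reply skips the active member and goes straight to the standby's physical IP.
    reply_path = [f"radius {radius_ip}", f"standby {standby_ip}"]
    return {
        "request": request_path,
        "reply": reply_path,
        # Asymmetric when the reply is not simply the request path in reverse.
        "asymmetric": reply_path != list(reversed(request_path)),
    }


flow = standby_radius_flow("192.0.2.3", "198.51.100.10")
print(" --> ".join(flow["request"]))
print(" --> ".join(flow["reply"]))
print("asymmetric:", flow["asymmetric"])
```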

More info here:

Traffic from the Standby member to any other host goes through the SYNC interface

Outgoing connections from cluster members are sent with cluster Virtual IP address instead of member...

Asymmetric Connections in ClusterXL R80.20 and Higher (section 3.4).

Kind regards,
Jozko Mrkvicka

JozkoMrkvicka

Let's say we have a no-NAT rule for the cluster members. The active member is in location A, the standby member is in location B, and the RADIUS server is in the same location as the standby member, location B.

How will the RADIUS packet travel? As follows:

The standby member in location B sends the RADIUS request over the sync interface to the active member in location A. The active member then sends the same RADIUS packet to the RADIUS server in location B. The RADIUS server in location B replies directly to the standby member in location B. So we have B-A-B-B. If locations A and B are many kilometers apart, there might be some delay.
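To get a rough feel for the added delay of the B-A-B-B path, the hops can be summed with assumed latencies. The figures below are made-up placeholders, not measurements; substitute values from your own links.

```python
# Hypothetical one-way latencies in microseconds; replace with measured values.
INTER_SITE_US = 5000   # between location A and location B
INTRA_SITE_US = 200    # within a single location

# B-A-B-B: standby (B) -> active (A), active (A) -> RADIUS (B), RADIUS (B) -> standby (B)
hops = [("B", "A"), ("A", "B"), ("B", "B")]


def hop_latency_us(src_site: str, dst_site: str) -> int:
    """One-way latency of a hop, depending on whether it crosses sites."""
    return INTRA_SITE_US if src_site == dst_site else INTER_SITE_US


via_active_us = sum(hop_latency_us(src, dst) for src, dst in hops)

# For comparison: a request and reply that both stay inside location B.
direct_us = 2 * INTRA_SITE_US

print(f"via active member: {via_active_us} us, direct: {direct_us} us")
```

With these placeholder numbers the request crosses the inter-site link twice before the reply comes back locally, so nearly all of the round-trip time is spent between the two locations.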

Kind regards,
Jozko Mrkvicka

the_rock

All super valid!

the_rock

I believe that to be the case, but will test in the lab to be 100% sure.

Andy

the_rock

Hey bro, just tested it and as you mentioned, the VIP is only used by whichever member is active.

Andy

RemoteUser

Hi bro,
So you tested it and when generating traffic from the standby, you can confirm the source is the physical interface IP, not the VIP, right?

the_rock

That's right.

RemoteUser

Ok, so if I see the VIP being used as the source from the standby member, that would indicate a misconfiguration or an error, right?

the_rock

Not really a misconfiguration, but it could be that a no-NAT rule is needed, as Bob said.

Andy

RemoteUser

Ah OK, understood. Thanks, buddy.

the_rock

No problem! If anything else, let me know, I can test.

Andy

emmap

How did you check this? It's not the expected behaviour.

the_rock

Just did some captures (tcpdump, fw monitor), that's all. I will test it again in the lab tomorrow.

Andy

emmap

Please make sure there's a policy installed, you have VIPs on all outbound interfaces, and you don't have any noNAT rules in there.

the_rock

I'm 99.99% sure that is the case, but will double check tomorrow.

Best,

Andy

the_rock

Just tested it, and as I suspected: there is no no-NAT rule, a policy is installed, and VIPs are configured.

Andy

emmap

OK, so when you are checking this, you are tcpdumping on 'any' interface?

The default behaviour for the standby member (tested on R82 ClusterXL, current jumbo) is that it sends the packets over the SYNC interface to the active member, which then hide-NATs them behind the VIP. So the tcpdump on the standby looks like it's not NATing the traffic (because it isn't), but the actual network sees a packet from the VIP coming from the active member's MAC. The active member then sends the replies back over the sync link when it gets them.

The purpose of this behaviour is to get around things like MAC learning on switches updating their ARP tables to point the VIP to the standby when it reaches out for updates etc.
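The difference between what a capture on the standby shows and what the network sees can be sketched as follows. This is a toy model of the behaviour described in this post, not Check Point code; the `Packet` type, the function names, and all addresses are hypothetical.

```python
from dataclasses import dataclass, replace

VIP = "192.0.2.1"                  # hypothetical cluster Virtual IP
STANDBY_IP = "192.0.2.3"           # hypothetical standby physical IP
STANDBY_MAC = "00:00:5e:00:53:03"  # hypothetical MAC addresses
ACTIVE_MAC = "00:00:5e:00:53:02"
SERVER = "198.51.100.10"           # hypothetical destination


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_mac: str


def standby_originates(dst_ip: str) -> Packet:
    """What a capture on the standby shows: its own physical IP, no NAT applied."""
    return Packet(src_ip=STANDBY_IP, dst_ip=dst_ip, src_mac=STANDBY_MAC)


def active_forwards(pkt: Packet) -> Packet:
    """The active member receives the packet over sync and hides it behind the VIP."""
    return replace(pkt, src_ip=VIP, src_mac=ACTIVE_MAC)


seen_on_standby = standby_originates(SERVER)
seen_on_wire = active_forwards(seen_on_standby)

print("capture on standby:", seen_on_standby)
print("seen by the network:", seen_on_wire)
```

Under these assumptions, a capture taken only on the standby reports the physical IP as the source, while the destination sees the VIP and the active member's MAC, which would reconcile the two observations in this thread.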

RemoteUser

Where can I find this documentation to study it?

emmap

Section 3.4 here:

https://support.checkpoint.com/results/sk/sk169154

RemoteUser

Thanks.

the_rock

Nope... I used eth0, which is the correct interface. I get all the points you made; it all makes sense.

Andy
