Remove interface from ClusterXL

morris · ‎2023-07-13

Hello guys,

Is there a way to configure ClusterXL so that it does not fail over when CCP packets are lost or an interface is down?
I know that goes against the idea of redundancy and clustering, but the interface I am talking about is somewhat special.

Yes, you can put an interface into private mode, but then you lose the virtual IP as well.
Yes, you can monitor the link instead of the CCP packets, but we do not want to monitor the interface at all while still having a virtual IP.

Short info about the topology:

- 2 appliances form a cluster
- several interfaces, each with a virtual IP, connected via in-house cabling
- 1 interface is connected via L2 to a switch hosted at an external partner site

This last interface, or rather the connection to the switch at the external site, has short outages here and there. Each time, the active gateway fails over, which makes no difference because the standby gateway uses the same connection.

Is there any configuration where we can keep the virtual IP but the interface plays no role in ClusterXL?

best regards,

the_rock · ‎2023-07-13

Personally, I have never heard of anything like that being possible.

Andy

Chris_Atkinson · ‎2023-07-13

There are parameters to control sensitivity in some failover scenarios, but those tend not to extend to specific interfaces.

Your requirement has similarities to this previous discussion:

https://community.checkpoint.com/t5/General-Topics/Cluster-XL-Interface-Preference/td-p/10107

CCSM R77/R80/ELITE

the_rock · ‎2023-07-13

Hey Chris,

Thanks for that, good to know. Quick question: if you wanted to do this, do you just add the interface name to the file once it is created? $FWDIR/conf/discntd.if

Andy

Chris_Atkinson · ‎2023-07-13

I suggest this would require testing or confirmation via TAC. Reviewing some SKs, there are mixed results depending on version.

CCSM R77/R80/ELITE

the_rock · ‎2023-07-13

That's what I recall from the R77.30 days, but I assume it might be different now...

Andy

morris · ‎2023-07-17

Hi Chris,

I've tested it in our lab.

If you add the interface name to $FWDIR/conf/discntd.if, the cluster won't fail over when the link goes down.
But at the same time you lose your VIP on that interface, as cphaprob -a if shows:

CCP mode: Manual (Unicast)
Required interfaces: 3
Required secured interfaces: 1

Interface Name:      Status:

eth1                 UP
eth2                 Non-Monitored
Sync (S)             UP
Mgmt                 UP

S - sync, LM - link monitor, HA/LS - bond type

Virtual cluster interfaces: 2

eth1                 192.168.1.60
Mgmt                 192.168.0.60

At the moment I don't see any solution to this.
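
For anyone repeating this lab test, the check for "interface is Non-Monitored and has lost its VIP" can be scripted. This is only a minimal sketch in Python: it assumes the output layout matches the paste above, the function names are made up for illustration, and other versions may print the sections differently.

```python
# Sample text copied from the lab output above; the layout is assumed, not guaranteed.
SAMPLE = """\
CCP mode: Manual (Unicast)
Required interfaces: 3
Required secured interfaces: 1

Interface Name: Status:

eth1 UP
eth2 Non-Monitored
Sync (S) UP
Mgmt UP

S - sync, LM - link monitor, HA/LS - bond type

Virtual cluster interfaces: 2

eth1 192.168.1.60
Mgmt 192.168.0.60
"""


def parse_cphaprob_if(text):
    """Return (statuses, vips) dictionaries keyed by interface name."""
    statuses, vips = {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Interface Name"):
            section = "status"
            continue
        if line.startswith("Virtual cluster interfaces"):
            section = "vips"
            continue
        if line.startswith("S - sync"):
            # Legend line: the status table has ended.
            section = None
            continue
        parts = line.split()
        if section == "status":
            statuses[parts[0]] = parts[-1]  # "Sync (S) UP" -> Sync: UP
        elif section == "vips":
            vips[parts[0]] = parts[-1]
    return statuses, vips


def non_monitored_without_vip(text):
    """List interfaces that are Non-Monitored and have no virtual IP."""
    statuses, vips = parse_cphaprob_if(text)
    return [name for name, status in statuses.items()
            if status == "Non-Monitored" and name not in vips]


print(non_monitored_without_vip(SAMPLE))
```

On the sample above this reports eth2, which is exactly the interface that stopped triggering failover and lost its VIP.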

Bob_Zimmerman · ‎2023-07-17

Do you need things to be able to talk to the VIP, or only through the VIP?

If only through, you may be able to use proxy ARP to set up a manual VIP. I'm not 100% sure it would work, but it's similar enough to how one of my major clusters works that I think it would. On both members:

add arp proxy ipv4-address <VIP> macaddress <member's MAC for eth2> real-ipv4-address <member's IP for eth2>

Be sure "Merge manual and automatic proxy ARP configuration" is checked in Global Properties. Traffic to the VIP itself (e.g., a client trying to connect to the VIP via SSH) wouldn't work. Traffic through the VIP (i.e., using the VIP as a gateway address for a route) should work.

I'm not sure what would happen during failover. Traffic through that interface might fail while everything learns the new ARP entry.

morris · ‎2023-07-17

The VIP acts as a gateway.

I will give that proxy ARP a try.

the_rock · ‎2023-07-17

You can try that, but I would be shocked if that worked...

Bob_Zimmerman · ‎2023-07-17

It definitely works in general. It's basically how VSX works internally (and more generally, how off-net cluster VIPs work). For a route without a gateway, the sender ARPs for the destination address and uses that MAC for the frame's destination. For a route with a gateway, the sender ARPs for the gateway instead and uses that MAC for the frame's destination. This is what the proxy ARP is trying to manipulate.
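
To make the "which address does the sender ARP for" rule concrete, here is a rough Python sketch. It is a simplified model and not how any particular OS implements it; the function name and the addresses (taken from the documentation ranges) are made up for illustration.

```python
import ipaddress


def arp_target(dst, routes):
    """Return the IP address the sender resolves via ARP to reach dst.

    routes is a list of (network, gateway) pairs. A gateway of None means
    the network is directly connected. The longest matching prefix wins.
    """
    dst_ip = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), gw)
        for net, gw in routes
        if dst_ip in ipaddress.ip_network(net)
    ]
    if not matches:
        raise LookupError(f"no route to {dst}")
    _, gateway = max(matches, key=lambda m: m[0].prefixlen)
    # No gateway: ARP for the destination itself. Gateway: ARP for the gateway.
    return dst if gateway is None else gateway


# Hypothetical host routing table: one connected network plus a default route.
routes = [("192.0.2.0/24", None), ("0.0.0.0/0", "192.0.2.1")]
print(arp_target("192.0.2.50", routes))    # on-link destination
print(arp_target("198.51.100.9", routes))  # reached through the gateway
```

If 192.0.2.1 were the manual VIP, the proxy ARP entry only has to answer that second lookup. The sender never needs the VIP to be a "real" address on the firewall.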

An IP network stack doesn't care how traffic gets to it. If it owns the destination IP, it responds. If it doesn't own the destination IP and forwarding isn't enabled, it drops the packet. If it doesn't own the destination IP and forwarding is enabled, it forwards according to the routing table. Cluster VIPs get some additional per-member NAT stuff which you can't configure manually in the rules. Traffic to the cluster VIP gets translated to instead go to the member's unique IP, so it sees the connection and responds. If we don't need the cluster VIP to accept connections, then we don't actually need that part.
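
The three cases above fit in a few lines. Again a simplified Python sketch with made-up names and addresses, and it deliberately leaves out the per-member NAT that real cluster VIPs get:

```python
def handle_packet(dst_ip, owned_ips, forwarding_enabled):
    """Model of what an IP stack does with a packet that arrives on an interface."""
    if dst_ip in owned_ips:
        return "respond"   # the stack owns the destination address
    if not forwarding_enabled:
        return "drop"      # not ours, and we are not a router
    return "forward"       # not ours: consult the routing table


# Hypothetical member: it owns only its unique IP, not the manual proxy-ARP VIP.
member_ips = {"192.0.2.11"}

print(handle_packet("198.51.100.9", member_ips, True))  # traffic through the VIP
print(handle_packet("192.0.2.11", member_ips, True))    # traffic to the member itself
```

Traffic sent through the VIP carries the final destination in the IP header, so the member simply forwards it. A packet addressed to the VIP itself does not hit the "respond" branch, because the member does not own that address without the extra NAT.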

The part I'm not sure about is whether proxy ARP works at all on an interface marked as Non-Monitored/Private or placed in discntd.if. It should, since those are still valid interfaces in the firewall kernel, but I haven't personally tried it.

the_rock · ‎2023-07-17

I agree, I know it works 100%, BUT I don't see how it would help in @morris's case, that's all.

Andy

Bob_Zimmerman · ‎2023-07-17

The idea is to let you use one consistent gateway address (the VIP in the proxy ARP entry) regardless of which member is active, just like a normal cluster VIP. As long as nothing has to talk to the VIP, and as long as proxy ARP entries on non-monitored interfaces work, this should work.

The other potential complication is during failover. Old versions of the firewall used to flush out gratuitous ARP replies for all of the proxy ARP entries on failover. That seems to no longer happen in R80.10 and up. Without that, traffic will keep going to the now-standby member until the ARP entries time out and are relearned.
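
A toy model of that failover window, in Python. The assumptions are mine and heavily simplified: the neighbor learned member A's MAC at t=0, it re-resolves the entry only when it expires every arp_timeout seconds, and whichever member is active at that moment answers. Real ARP caches behave in more complicated ways, and the numbers are made up.

```python
def member_receiving_traffic(t, failover_at, arp_timeout, garp_on_failover):
    """Return which member ("A" or "B") a neighbor sends frames to at time t.

    Member A is active until failover_at, member B afterwards.
    All times are whole seconds.
    """
    if t < failover_at:
        return "A"
    if garp_on_failover:
        # A gratuitous ARP updates the neighbor's cache immediately.
        return "B"
    # Without one, the stale entry survives until its next expiry.
    expiries = -(-failover_at // arp_timeout)  # ceiling division
    next_expiry = expiries * arp_timeout
    return "B" if t >= next_expiry else "A"


# Failover at t=70 s with a 60 s cache timeout: the entry expires at t=120 s.
for t in (60, 100, 120):
    print(t, member_receiving_traffic(t, 70, 60, garp_on_failover=False))
```

In this model the neighbor keeps sending to the now-standby member for 50 seconds after the failover, and with garp_on_failover=True the gap disappears.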

_Val_ · ‎2023-07-18

I do not think the statement that G-ARP is not sent after a failover in R80.10+ is correct. Please refer to sk120495 and also to the ClusterXL guide; the latter mentions G-ARP multiple times.

Bob_Zimmerman · ‎2023-07-18

It definitely is correct. I've been fighting an issue related to gratuitous ARP replies for several years across many TAC tickets. I recently built lab environments at R80.10, R80.40, and R81.10 to try a fix for VMAC, and none of them sent out gratuitous ARP replies for any manual proxy ARP entry on failover. They only send them for the cluster VIPs. It's incredibly annoying and wasted over six months of my time.

the_rock · ‎2023-07-18

I'm with you, @Bob_Zimmerman. I tested the same thing and the results were exactly what you described.

Andy

_Val_ · ‎2023-07-18

Uh, I thought you referred to no G-ARPs for VIPs. Sorry I misread your post.

Bob_Zimmerman · ‎2023-07-18

No problem! I should have been clearer. This issue has been a painful dive into the depths of how several features work internally and how each of them has changed since R77.

Chris_Atkinson · ‎2023-07-18

I feel for you if this is a situation that you cannot solve by routing (i.e., NATs in the same subnet as the VIP).

Proxy ARP is a horrible thing to have to rely on, for many reasons, based on my own multi-vendor experience.

CCSM R77/R80/ELITE

Bob_Zimmerman · ‎2023-07-18

My situation is so much worse than just this proxy ARP stuff implies. We're doing proxy ARP for addresses on networks the firewall doesn't have any IP on. Think of it like a bridge mode firewall, but created by M.C. Escher (or maybe H.R. Giger, depending on who you ask). We needed to be able to use features which aren't available in normal bridge mode, so we manually fake all the normal layer 3 to layer 2/1 logic.

On the plus side, it constantly refreshes my knowledge of the low levels of the network stack.

_Val_ · ‎2023-07-17

I am not sure I understand the request. Disabling CCP probing is out of the question. Are you trying to improve the cluster tolerance in case of intermittent network failures? What's the goal?

JozkoMrkvicka · ‎2023-07-18

Why not just put the standby member into the permanent DOWN state (down -p)?

ClusterXL_admin down -p

But here I am not sure how Check Point will act if the physical interface on the active member goes down while the same physical interface is up on the member that was set down (ClusterXL_admin down -p). Will a failover happen or not?

Kind regards,
Jozko Mrkvicka
