R80.30 ClusterXL Member1 blocking traffic Member2 working fine

supbhath · 2020-02-08

Hi,

We have two cluster members. Member1 has license CPAP-SG5600-NGTP and Member2 has license CPAP-SG5600-NGTP-HA.

While Member1 is the active member, the servers behind it are not accessible. But while Member2 is the active member, the servers are reachable.

The logs show traffic passing through Member1 while Member1 is active.

Configuration, OS and hotfixes are all identical on both members. Please suggest what may be the reason for Member1 not working.

PhoneBoy · 2020-02-09

Have you confirmed that the routing configuration is identical on both cluster members?
Have you run tcpdump to see what's happening when traffic passes through the primary member?

Daniel_Schlifka · 2020-02-09

This sounds somewhat strange; the cluster policy should be more or less identical for all cluster members.
Normally there is no reason why one gateway would drop traffic that the other members let pass.
So you might want to check a few things.

Is the cluster OK?
->> Check that all cluster member interfaces are shown as up on both cluster members (cphaprob stat).
->> Check that the cluster is synced
(see the "monitoring delta synchronization" section in the R80.30 CLI Reference Guide)
-> use 'show cluster statistics sync' from clish (or 'cphaprob syncstat' from expert mode)

Is the traffic truly blocked?
- Verify that with SmartLog; if SmartLog doesn't show anything, use fw monitor or cppcap.
If it is blocked, check why -> usually it's anti-spoofing and/or a cluster member object was accidentally used instead of the cluster object in the firewall policy. If it's anti-spoofing, check that the address is allowed by the corresponding anti-spoofing group, and also that there is no asymmetric routing in place.
Check Point gateways are stateful by default, so they drop asymmetric traffic (traffic that arrives at one interface/IP and leaves via another interface/IP on the server side). Most of the time this happens when the server's default route points towards another interface. The firewall considers the traffic a new connection, and because it never received a TCP SYN for that connection, it drops it. If you're unable to change the routing, you can define exceptions, but be aware that this can get nasty with the IPS blade enabled. Clean routing should always be the preferred solution.
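To make the stateful point concrete, here is a toy Python model of a connection table. It is a simplification for illustration only, not Check Point's actual implementation: only a TCP SYN may open a connection, so when the firewall sees just the reply direction (the SYN took another path) it drops the packet. The class name, IP addresses and ports are made up.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulFirewall:
    """Toy stateful filter: only a TCP SYN may open a connection (not Check Point's real logic)."""
    table: set = field(default_factory=set)

    @staticmethod
    def _key(src, sport, dst, dport):
        # Order the two endpoints so both directions of a connection share one key
        a, b = (src, sport), (dst, dport)
        return (a, b) if a <= b else (b, a)

    def packet(self, src, sport, dst, dport, flags):
        key = self._key(src, sport, dst, dport)
        if key in self.table:
            return "accept"      # belongs to a known connection
        if flags == {"SYN"}:
            self.table.add(key)  # assume the rule base allows this new connection
            return "accept"
        return "drop"            # first packet seen is not a SYN: out of state


# Symmetric path: the firewall sees the SYN, so the SYN-ACK matches the table
fw = StatefulFirewall()
print(fw.packet("10.0.1.10", 40000, "10.0.2.20", 443, {"SYN"}))
print(fw.packet("10.0.2.20", 443, "10.0.1.10", 40000, {"SYN", "ACK"}))

# Asymmetric path: the SYN went another way, this firewall only sees the reply
fw2 = StatefulFirewall()
print(fw2.packet("10.0.2.20", 443, "10.0.1.11", 40001, {"SYN", "ACK"}))
```

Fixing the server's routing so both directions cross the same gateway is the equivalent of making the first case happen instead of the second.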

If it's not blocked by the policy, the ARP caches on the server side might be the issue.
By default, ClusterXL uses the active member's physical MAC address and sends gratuitous ARP when failing over. If the ARP caches on the server side have a long expiration timer, the server might still send Ethernet frames towards the MAC of the previously active cluster member, or expect them from the old MAC.
A short-term workaround on the server side is to ping the gateway's IP address, which should renew the server's ARP entry. In some cases, depending on the installed OS, you might have to flush the ARP cache on the server. -> Check the ARP table on the server side.
If that is your issue, you can work around it long-term by using virtual MAC addresses (see sk50840).
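A toy model of the ARP-cache behaviour described above may help. It assumes a neighbouring device that either honours or ignores gratuitous ARP and keeps entries for a long timer (4 hours here, an assumed value). The class, the virtual IP and all MAC addresses are hypothetical (taken from the documentation range), and only the neighbour's ARP cache is modelled, not switch MAC tables or STP. It shows why a stale entry keeps frames flowing to the old member, and why a virtual MAC sidesteps this: the MAC no longer changes on failover.

```python
class ArpCache:
    """Toy ARP cache of a device adjacent to the cluster (hypothetical model)."""

    def __init__(self, timeout, honours_garp=True):
        self.timeout = timeout            # entry lifetime in seconds
        self.honours_garp = honours_garp  # some devices may ignore gratuitous ARP
        self.entries = {}                 # ip -> (mac, learned_at)

    def learn(self, ip, mac, now):
        self.entries[ip] = (mac, now)

    def gratuitous_arp(self, ip, mac, now):
        # The new active member announces "ip is-at mac"; applied only if honoured
        if self.honours_garp:
            self.learn(ip, mac, now)

    def lookup(self, ip, now):
        entry = self.entries.get(ip)
        if entry is None or now - entry[1] > self.timeout:
            return None  # no valid entry: the device would send a new ARP request
        return entry[0]

    def flush(self):
        self.entries.clear()


# Hypothetical addresses for illustration only
VIP = "192.0.2.1"
MEMBER1_MAC = "00:00:5e:00:53:01"
MEMBER2_MAC = "00:00:5e:00:53:02"
VMAC = "00:00:5e:00:53:ff"


def frames_reach_active(use_vmac, honours_garp):
    """Fail over from Member2 to Member1 and check where the neighbour sends frames."""
    cache = ArpCache(timeout=4 * 3600, honours_garp=honours_garp)
    cache.learn(VIP, VMAC if use_vmac else MEMBER2_MAC, now=0)  # Member2 active
    new_mac = VMAC if use_vmac else MEMBER1_MAC
    cache.gratuitous_arp(VIP, new_mac, now=60)                  # failover to Member1
    return cache.lookup(VIP, now=120) == new_mac


for vmac in (False, True):
    for garp in (True, False):
        print(f"VMAC={vmac!s:5} GARP honoured={garp!s:5} -> "
              f"reaches Member1: {frames_reach_active(vmac, garp)}")
```

Calling `flush()` on the neighbour's cache corresponds to the "flush the ARP cache" workaround: the next lookup returns nothing, so the device re-ARPs and learns the current MAC.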

HtH

Timothy_Hall · 2020-02-10

Sounds like a classic ARP issue. Check the ARP caches of the Layer 3 devices adjacent to the cluster while Member1 is active, and I'll bet you still find Member2's MAC address in them. Assuming the default settings on your cluster object, this would indicate that the gratuitous ARP function is not working properly with your network. You can verify this on Member1 by running tcpdump with the -e option, which shows Layer 2 MAC addresses: you will still see Member2's MAC address as the destination of incoming frames.
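When a tcpdump -e capture is long, a small helper can tally destination MACs. This is a sketch that assumes the usual `tcpdump -n -e -i <iface>` line shape (`<timestamp> <src_mac> > <dst_mac>, ethertype ...`) on a single named interface; captures on `-i any` or with VLAN tags may print differently and not match. The function names are hypothetical and the sample lines are made up.

```python
import re
from collections import Counter

# One MAC address such as 00:1c:7f:aa:bb:cc (either case)
MAC = r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}"
# Assumed `tcpdump -n -e -i <iface>` line shape:
#   <timestamp> <src_mac> > <dst_mac>, ethertype IPv4 (0x0800), length N: ...
LINE_RE = re.compile(rf"^\S+ ({MAC}) > ({MAC}),")


def dst_mac_counts(lines):
    """Count captured frames per destination MAC; non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[m.group(2).lower()] += 1
    return counts


def still_sent_to(lines, old_mac):
    """Number of frames still addressed to the previously active member's MAC."""
    return dst_mac_counts(lines)[old_mac.lower()]


# Example with made-up capture lines (MACs are from the documentation range)
sample = [
    "10:00:00.000001 00:00:5e:00:53:10 > 00:00:5e:00:53:02, ethertype IPv4 (0x0800), "
    "length 98: 10.1.1.5 > 10.2.2.10: ICMP echo request, id 1, seq 1, length 64",
    "10:00:01.000001 00:00:5e:00:53:10 > 00:00:5e:00:53:02, ethertype IPv4 (0x0800), "
    "length 98: 10.1.1.5 > 10.2.2.10: ICMP echo request, id 1, seq 2, length 64",
]
print(still_sent_to(sample, "00:00:5E:00:53:02"))
```

For instance, save the output of `tcpdump -n -e -i eth1 host 10.1.1.5` (interface and host are placeholders) to a file on Member1 and pass the open file to `still_sent_to` together with Member2's MAC; a non-zero count matches the symptom described above.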

Try flushing the surrounding ARP caches while Member1 is active and see if things start working. If they do, try enabling VMAC on the cluster object, but be sure to set portfast mode on all switchports used by the cluster members first, to avoid what I call in my book a "slow" failover caused by STP.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com

supbhath · 2020-06-24

Hi,

I have enabled VMAC and cleared the ARP caches on all connected ports of the adjacent L3 switches, but still no luck.

Timothy_Hall · 2020-06-24

Try all the following from Member 1 while it is active.

As mentioned earlier, you'll need to use tcpdump -e to verify that the inbound destination MAC address is correct when Member 1 is active.

If you have verified that, the next step is to verify that the traffic is reaching the Firewall Worker (INSPECT) with fw monitor -F, in particular the "i" capture point. If packets are showing up in tcpdump but not reaching "i", you have some kind of inbound layer 2 networking problem, such as ARP.

If the traffic is reaching "i", the next step is to run fw ctl zdebug drop, which displays all traffic dropped by INSPECT/SecureXL in real time, along with the reason.

If the traffic is not being dropped, use fw monitor to verify that it reaches "i" and then "I". You should next see the packet enter "o"; if it doesn't, you have a layer 3 routing problem in Gaia.

In fw monitor -F, assuming the packet reaches "o", it should next go through "O". If it goes through "O" but does not appear in a tcpdump on the egress interface, or is not actually transmitted to the network, you have an outbound layer 2 networking problem, such as the inability to form an IP-MAC mapping for the next-hop router or destination.
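The sequence of checks above can be summarised as a small decision function. This is a sketch of the reasoning only: the function name and parameters are hypothetical, the observations must still be collected by hand with tcpdump, fw monitor -F and fw ctl zdebug drop, and the two cases the steps above don't cover (seen at "i" but not "I" without a reported drop, seen at "o" but not "O") are simply flagged as such.

```python
def diagnose(dst_mac_ok, points, zdebug_drop=False, seen_on_egress=False):
    """Map the observations taken on Member1 to the likely problem area.

    dst_mac_ok:     tcpdump -e on the ingress interface shows the packet
                    addressed to Member1's MAC (or the cluster VMAC)
    points:         fw monitor -F capture points the packet was seen at,
                    any of "i", "I", "o", "O"
    zdebug_drop:    fw ctl zdebug drop printed a drop for this packet
    seen_on_egress: tcpdump on the egress interface shows the packet
    """
    if not dst_mac_ok:
        return "inbound frames not addressed to Member1: ARP / layer 2 upstream"
    if "i" not in points:
        return "inbound layer 2 problem (e.g. ARP)"
    if zdebug_drop:
        return "dropped by INSPECT/SecureXL: check the reported drop reason"
    if "I" not in points:
        return "reached i but not I: not covered by the steps above"
    if "o" not in points:
        return "layer 3 routing problem in Gaia"
    if "O" not in points:
        return "reached o but not O: not covered by the steps above"
    if not seen_on_egress:
        return "outbound layer 2 problem (e.g. no IP-MAC mapping for the next hop)"
    return "packet leaves Member1: look past the gateway"


# Example: packet passes i and I but never enters o
print(diagnose(True, {"i", "I"}))
```

The checks are ordered the same way as the steps above, so the first failing observation determines the answer.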
