Solved: Standby Cluster Member cannot reach the Internet

tpattgeek · ‎2020-08-16

Hi all,

I recently deployed a cluster for a customer with a fresh R80.30 install on both the virtual MGMT and the 3100 gateways. We cut over from a single appliance to the cluster, which connects via layer 2 to a cable modem. VRRP is working fine, but the standby member is failing to reach the Internet. All routes are the same and valid, but the standby's ARP table only shows the inside next hop and the outside VIP. After checking the logs, I see the standby is hitting the policy and being blocked (since there are no rules in the policy to allow the gateways' public IPs). The logs show the source of all traffic as the standby's physical external IP.

I checked all of the common SKs, and sk43807 seems to be the most promising one.

Is the recommended fix to implement all of the steps from this SK in order?

Since it only bypasses hide NAT for the specific ports, how do you resolve ICMP failures from the standby member?

The option for using a virtual MAC is unchecked; should that always be enabled, especially with a cable modem?

The reason I mention the virtual MAC is that I attempted to fail over to the standby as a means to test this, and all connectivity was lost. The standby assumed the Active role, but no traffic was allowed through the device, which is why I think multiple adjustments are needed. I added a rule to allow the standby gateway through the policy, and connectivity was restored to the standby unit, but I don't believe that is the proper fix... Any help is appreciated!

PhoneBoy · ‎2020-08-18

It's actually port and protocol listed in no_hide_services_ports.
ICMP is Protocol 1, as I recall.
I think you can leave port as zero in this case, though haven't tried it.
May be worth involving the TAC.
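
Editor's note: as a rough illustration of the port and protocol pairing described above, here is a minimal Python sketch that renders `<port, protocol>` pairs in the notation used in this thread. The protocol numbers (ICMP 1, TCP 6, UDP 17) are the standard IANA values. The names `PROTOCOL_NUMBERS` and `format_entry` are hypothetical helpers, not part of any Check Point tool, and using port 0 for ICMP follows the untested suggestion above. The exact syntax required in the configuration should be taken from sk43807.

```python
# Standard IANA protocol numbers: ICMP = 1, TCP = 6, UDP = 17.
PROTOCOL_NUMBERS = {"icmp": 1, "tcp": 6, "udp": 17}


def format_entry(protocol: str, port: int = 0) -> str:
    """Render one <port, protocol-number> pair in the thread's notation.

    Port 0 is used for ICMP here only because the post above suggests it;
    that choice is an assumption, not confirmed behaviour.
    """
    number = PROTOCOL_NUMBERS[protocol.lower()]
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"<{port}, {number}>"


entries = [
    format_entry("tcp", 443),  # HTTPS
    format_entry("udp", 53),   # DNS
    format_entry("icmp"),      # ICMP, port left at zero
]
print(", ".join(entries))
```

This only builds the text of the entries; it does not read or modify any file on the gateway or management server.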

cosmos · ‎2020-09-08

Hey guys, I'm trying to understand why this doesn't work out of the box with ClusterXL. Is my understanding correct that, with ClusterXL, the standby member forwards outbound traffic via the sync network by default?

In the same scenario, regardless of the value of fwha_forw_packet_to_not_active I am unable to even reach the default gateway from the standby member, and I don't see any outbound traffic from the device based on the routing table - instead I get drops on the active gateway "dropped by fw_send_log_drop Reason: Rulebase drop" and I'm seeing these drops on the Mgmt interface (secondary sync) hitting the cleanup rule. Tcpdump confirms the traffic is forwarded over the management interface.

The behaviour only occurs with the HA module running: when I cphastop the standby member, I can at least reach the default gateway. I would prefer the outbound traffic to come from the cluster IP if it's forwarded to the active member, since the external interface is private and we only have a NAT for the cluster IP.

Whatever is happening with cluster hide and fold, it's just not right and I've never seen a permanent fix without messing with table.def that leads to other issues (this is also not a fix for me, due to the NAT above).

@PhoneBoy I noticed you linked to sk80520 in another article, I'm willing to go down this path and contact TAC if necessary but it's making what was supposed to be a short engagement unprofitable. I'd really love to see a permanent fix for this that works out of the box. IMHO if the standby gateway wants to send stuff to active, mr active needs to be prepared to deal with it 🙂

https://community.checkpoint.com/t5/Enterprise-Appliances-and-Gaia/ClusterXL-standby-cannot-reach-ga...
https://community.checkpoint.com/t5/Enterprise-Appliances-and-Gaia/Problem-accessing-standby-cluster...
https://community.checkpoint.com/t5/Enterprise-Appliances-and-Gaia/Connectivity-issues-from-standby-...
https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...
https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

PhoneBoy · ‎2020-09-10

Do you have explicit rules in the rulebase allowing the traffic in this case?

cosmos · ‎2020-09-28

As it turned out, explicit rules were needed until the manager was upgraded to R80.40 (it was running R80.10 with the cluster at R80.40). I had to do a double take, didn't think this was possible before I saw the compatibility matrix... supported with limitations 🙂

tpattgeek · ‎2020-09-29

Yes, we had explicit rules allowing this as long as the primary unit remained Active. As soon as the secondary unit became Active, all traffic failed.

This actually turned out to be an issue with the ISP. We went from a standalone deployment to a cluster and the secondary's public IP was not routing properly, although it should've been. It uncovered a greater underlying problem, but there was no issue with the cluster itself. Thanks all for the input!

cosmos · ‎2020-09-10

Didn't think it was necessary if we "allow outgoing traffic from gateway" in global properties? Now that you mention it, if I do have explicit rules allowing this traffic (i.e. GW cluster to Internets) wouldn't anti-spoofing break the flow, dropping packets on a sync interface with the src IP of the standby member's external interface?

Would be great to have a separate routing table / VRF for management and control over 'service routes' 🙂

Alex1994 · ‎2024-06-19

I am also having this same issue in R81.10. Still looking for a proper solution.

Chris_Atkinson · ‎2024-06-19

My understanding was the issue above was resolved with the correct rules & routing.

CCSM R77/R80/ELITE

Alex1994 · ‎2024-06-19

There is nothing wrong with the routing or the rules.

1. If the gateway becomes primary, ICMP packets are forwarded to the gateway, and the secondary (which was primary before the failover) is then unable to ping its gateway.

2. There is no routing issue. Once the gateway is removed from the cluster, it is able to ping its gateway.

bnosie · ‎2024-08-21

I have added the following to no_hide_services_ports to allow http, https, ssh, dns, tftp, and icmp:

<80, 6>, <443, 6>, <53, 17>, <69, 17>, <22, 6>, <0, 1>

Installed policy, and now almost everything works, except I am unable to check for updates on the standby.

Routing shows it should be taking the external interface, but the logs show the traffic originating from the inside interface.
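
Editor's note: to double-check a list like the one quoted in this post, the pairs can be decoded back into protocol and port. The sketch below is a hypothetical Python helper (`parse_entries` is not a Check Point utility) that assumes the `<port, protocol>` ordering used earlier in the thread and maps only the three standard IANA protocol numbers that appear here.

```python
import re

# Standard IANA protocol numbers that appear in the thread.
PROTOCOL_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}


def parse_entries(text: str) -> list[tuple[int, str]]:
    """Extract (port, protocol-name) pairs from '<port, proto>' text."""
    pairs = re.findall(r"<\s*(\d+)\s*,\s*(\d+)\s*>", text)
    return [
        (int(port), PROTOCOL_NAMES.get(int(proto), f"proto-{proto}"))
        for port, proto in pairs
    ]


# The list exactly as posted above.
posted = "<80, 6>, <443, 6>, <53, 17>, <69, 17>, <22, 6>, <0, 1>"
for port, proto in parse_entries(posted):
    print(f"{proto}/{port}")
```

Decoded this way, the posted list covers tcp/80, tcp/443, udp/53, udp/69, tcp/22 and ICMP with port 0, so DNS appears only as UDP. Whether that has any bearing on the update check is not established in this thread.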
