tpattgeek
Explorer

Standby Cluster Member cannot reach the Internet

Hi all,

 

I recently deployed a cluster for a customer with a fresh R80.30 install on both the virtual MGMT and the 3100 gateways. We cut over from a single appliance to the cluster, which connects via layer 2 to a cable modem. VRRP is working fine, but the standby member cannot reach the Internet. All routes are the same and valid, but the standby's ARP table only shows the inside next hop and the outside VIP. After checking the logs, I see the standby's traffic is hitting the policy and being blocked (since there are no rules in the policy to allow the gateways' public IPs). The logs show the source of all this traffic as the standby's physical external IP.

I checked all of the common SKs, and sk43807 seems to be the most promising one.

Is the recommended fix to implement all of the steps from this SK in order?

Since it only bypasses hide NAT for the specific ports, how do you resolve ICMP failures from the standby member?

The option for using a virtual MAC is unchecked; should that always be enabled, especially with a cable modem?

The reason I mention the virtual MAC is that I attempted to fail over to the standby to test this, and all connectivity was lost. The standby assumed the Active role, but no traffic passed through the device, which is why I think multiple adjustments are needed. I added a rule to allow the standby gateway through the policy, and connectivity was restored to the standby unit, but I don't believe that is the proper fix... Any help is appreciated!

 

0 Kudos
1 Solution

Accepted Solutions
PhoneBoy
Admin

Do you have explicit rules in the rulebase allowing the traffic in this case?


0 Kudos
10 Replies
PhoneBoy
Admin

It's actually port and protocol listed in no_hide_services_ports.
ICMP is Protocol 1, as I recall.
I think you can leave port as zero in this case, though haven't tried it.
May be worth involving the TAC.
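
To make the port-and-protocol pairing concrete, here is a minimal Python sketch that formats `<port, protocol>` entries in the form used later in this thread. The helper `no_hide_entry` and the `IP_PROTOCOLS` table are hypothetical names for illustration, not part of any Check Point tooling; the protocol numbers are the standard IANA values (ICMP 1, TCP 6, UDP 17). Using port 0 for ICMP is the untested assumption mentioned above.

```python
# IANA-assigned IP protocol numbers (standard values).
IP_PROTOCOLS = {"icmp": 1, "tcp": 6, "udp": 17}


def no_hide_entry(port: int, protocol: str) -> str:
    """Format one <port, protocol-number> pair (hypothetical helper)."""
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"<{port}, {IP_PROTOCOLS[protocol.lower()]}>"


# ICMP has no ports, so port 0 is used here as an untested assumption.
print(no_hide_entry(0, "icmp"))
print(no_hide_entry(443, "tcp"))
```

The sketch only produces the text of an entry; where and how it is applied on the management server is covered by the SK and, if in doubt, by TAC.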

0 Kudos
cosmos
Advisor

Hey guys, I'm trying to understand why this doesn't work out of the box with ClusterXL. Is my understanding correct that, with ClusterXL, the standby member by default forwards outbound traffic via the sync network?

In the same scenario, regardless of the value of fwha_forw_packet_to_not_active I am unable to even reach the default gateway from the standby member, and I don't see any outbound traffic from the device based on the routing table - instead I get drops on the active gateway "dropped by fw_send_log_drop Reason: Rulebase drop" and I'm seeing these drops on the Mgmt interface (secondary sync) hitting the cleanup rule. Tcpdump confirms the traffic is forwarded over the management interface.

The behaviour occurs only with the HA module running: when I cphastop the standby member I can at least reach the default gateway. I would prefer the outbound traffic to come from the cluster IP if it's forwarded to the active member, since the external interface is private and we only have a NAT for the cluster IP.

Whatever is happening with cluster hide and fold, it's just not right and I've never seen a permanent fix without messing with table.def that leads to other issues (this is also not a fix for me, due to the NAT above).

@PhoneBoy I noticed you linked to sk80520 in another article, I'm willing to go down this path and contact TAC if necessary but it's making what was supposed to be a short engagement unprofitable. I'd really love to see a permanent fix for this that works out of the box. IMHO if the standby gateway wants to send stuff to active, mr active needs to be prepared to deal with it 🙂

https://community.checkpoint.com/t5/Enterprise-Appliances-and-Gaia/ClusterXL-standby-cannot-reach-ga... 
https://community.checkpoint.com/t5/Enterprise-Appliances-and-Gaia/Problem-accessing-standby-cluster... 
https://community.checkpoint.com/t5/Enterprise-Appliances-and-Gaia/Connectivity-issues-from-standby-... 
https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut... 
https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut... 

0 Kudos
PhoneBoy
Admin

Do you have explicit rules in the rulebase allowing the traffic in this case?

0 Kudos
cosmos
Advisor

As it turned out, explicit rules were needed until the manager was upgraded to R80.40 (it was running R80.10 with the cluster at R80.40). I had to do a double take, didn't think this was possible before I saw the compatibility matrix... supported with limitations 🙂

0 Kudos
tpattgeek
Explorer

Yes, we had explicit rules allowing this as long as the primary unit remained Active.  As soon as the secondary unit became Active, all traffic failed.

 

This actually turned out to be an issue with the ISP.  We went from a standalone deployment to a cluster and the secondary's public IP was not routing properly, although it should've been.  It uncovered a greater underlying problem, but there was no issue with the cluster itself.  Thanks all for the input!

0 Kudos
cosmos
Advisor

Didn't think it was necessary if we "allow outgoing traffic from gateway" in global properties? Now that you mention it, if I do have explicit rules allowing this traffic (i.e. GW cluster to Internets) wouldn't anti-spoofing break the flow, dropping packets on a sync interface with the src IP of the standby member's external interface?

Would be great to have a separate routing table / VRF for management and control over 'service routes' 🙂
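
The anti-spoofing concern can be illustrated with a deliberately simplified Python model: a packet passes only if its source address falls inside the networks expected behind the interface it arrived on. This is a sketch of the general idea, not Check Point's implementation, and it says nothing about whether the gateway exempts forwarded cluster traffic. The function name, the sync network, and the standby address are all hypothetical (documentation address ranges).

```python
import ipaddress


def passes_antispoofing(src_ip: str, interface_networks: list[str]) -> bool:
    """Return True if src_ip lies inside any network expected behind the interface."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in ipaddress.ip_network(net) for net in interface_networks)


# Hypothetical topology, using documentation address ranges.
SYNC_TOPOLOGY = ["192.0.2.0/24"]        # networks expected behind the sync interface
STANDBY_EXTERNAL_IP = "198.51.100.3"    # standby member's external address

# A packet sourced from the standby's external IP arriving on the sync interface.
print(passes_antispoofing(STANDBY_EXTERNAL_IP, SYNC_TOPOLOGY))
```

Under this model the packet fails the check, which is the flow the question above is worried about.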

 

0 Kudos
Alex1994
Explorer

I am also having this same issue in R81.10. Still looking for a proper solution.

0 Kudos
Chris_Atkinson
Employee

My understanding was the issue above was resolved with the correct rules & routing.

CCSM R77/R80/ELITE
0 Kudos
Alex1994
Explorer

There is nothing wrong with routing or rules.

1. If the gateway becomes primary, ICMP packets are forwarded to it, and the secondary (which was primary before failover) is then unable to ping its gateway.

2. There is no routing issue. Once the gateway is removed from the cluster, it can ping its gateway again.

0 Kudos
bnosie
Participant

I have added the following to no_hide_services_ports to allow http, https, ssh, dns, tftp, and icmp:

<80, 6>, <443, 6>, <53, 17>, <69, 17>, <22, 6>, <0, 1>
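
As a sanity check on a list like this, here is a small Python sketch that parses the `<port, protocol>` pairs and prints them as readable `protocol/port` names. The function `parse_entries` and the `PROTOCOL_NAMES` table are hypothetical helpers for illustration; the protocol numbers are the standard IANA values.

```python
import re

# IANA IP protocol numbers mapped back to names (standard values).
PROTOCOL_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}


def parse_entries(text: str) -> list[tuple[int, str]]:
    """Extract (port, protocol-name) pairs from '<port, proto>' entries."""
    pairs = re.findall(r"<\s*(\d+)\s*,\s*(\d+)\s*>", text)
    return [
        (int(port), PROTOCOL_NAMES.get(int(proto), f"proto-{proto}"))
        for port, proto in pairs
    ]


entries = parse_entries("<80, 6>, <443, 6>, <53, 17>, <69, 17>, <22, 6>, <0, 1>")
for port, proto in entries:
    print(f"{proto}/{port}")
```

Read back this way, the list covers DNS over UDP only (53/17); nothing in this thread says whether TCP 53 is also needed.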

Installed policy, and now almost everything works, except I am unable to check for updates on the standby.  

Routing shows it should be taking the external interface, but logs show it originating from inside interface.

0 Kudos
