Richard_Scott1
Participant

ClusterXL - standby cannot reach gateway

I've got an R77.30 cluster of two nodes (running on VMware).

The active node can ping the default gateway and onward to the rest of the network without any issue.

However, the standby node can't even ping the gateway, let alone anything beyond it. If I unload the policy from the node, it is able to ping it.

Logs suggest the traffic is being NATed to the cluster's address. The gateway can ping the active, standby and cluster addresses.

I've tried fw ctl set int fwha_forw_packet_to_not_active 1 on both nodes, but that didn't help.

The management interface is reachable via a different gateway (and static route).

Any suggestions greatly appreciated!

Accepted Solutions
JH_Ranger
Participant

Thanks, Scott. I spent 2 weeks troubleshooting this issue.

I found it strange that when doing a tcpdump on the sync interface, the ClusterXL control traffic (UDP port 8116) was visible on both the standby and active firewalls, but DNS queries, HTTPS requests and other traffic were only seen on the standby (blocked by the VMware switch because they came from a different source MAC).

After setting the "Forged Transmits" to "Accept", everything works as expected.
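The diagnostic in this reply (capture on the sync interface of both members, then check which frames actually arrive at the peer) can be sketched as a simple set comparison. This is a minimal illustration, not a Check Point tool: the Frame record, its fields and the MAC addresses are hypothetical stand-ins for whatever a real capture export would provide, and CCP is assumed to be identifiable by UDP port 8116.

```python
from dataclasses import dataclass

CCP_PORT = 8116  # ClusterXL Cluster Control Protocol runs over UDP port 8116


@dataclass(frozen=True)
class Frame:
    """Hypothetical, simplified capture record (field names are illustrative)."""
    src_mac: str
    proto: str      # "udp", "tcp", ...
    dst_port: int
    frame_id: int   # stand-in for whatever uniquely identifies a captured frame


def is_ccp(frame: Frame) -> bool:
    """True if the frame looks like CCP control traffic."""
    return frame.proto == "udp" and frame.dst_port == CCP_PORT


def lost_between(sent, received):
    """Return frames seen leaving one member's sync NIC but never seen on the
    peer's sync NIC, split into (CCP control frames, all other frames)."""
    seen = set(received)  # frozen dataclasses are hashable
    lost = [f for f in sent if f not in seen]
    return [f for f in lost if is_ccp(f)], [f for f in lost if not is_ccp(f)]


# Hypothetical example mirroring the symptom above: CCP reaches the active
# member, but forwarded DNS/HTTPS frames with another source MAC do not.
standby_tx = [
    Frame("00:50:56:00:00:0a", "udp", 8116, 1),  # CCP control frame
    Frame("00:50:56:00:00:0b", "udp", 53, 2),    # forwarded DNS query (hypothetical MAC)
    Frame("00:50:56:00:00:0b", "tcp", 443, 3),   # forwarded HTTPS (hypothetical MAC)
]
active_rx = [standby_tx[0]]

ccp_lost, data_lost = lost_between(standby_tx, active_rx)
print(len(ccp_lost), len(data_lost))  # 0 2
print(sorted({f.src_mac for f in data_lost}))
```

If every lost frame shares a source MAC that differs from the sending adapter's own MAC, that points at the vSwitch security policy rather than at the firewall policy.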

26 Replies
PhoneBoy
Admin

Reading this SK, I'm pretty sure that's the wrong "fix" for the problem: ClusterXL drops traffic with 'dropped by fwha_forw_run Reason: Failed to send to another cluste... 

But perhaps the debugging in this SK might be helpful in figuring out where the true problem is.

Richard_Scott1
Participant

Thanks. I've tried the 'fw ctl zdebug' command both with and without that "fix" (currently disabled).

The only error displayed is:

;fw_log_drop_ex: Packet proto=-1 ?:0 -> ?:0 dropped by fwha_select_arp_packet Reason: CPHA replies to arp;
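On a busy gateway, zdebug drop output can run to thousands of lines, so it can help to tally drops by the kernel function that dropped them and the stated reason. A minimal Python sketch, assuming the line format matches the single sample quoted above; the regular expression is inferred from that one line and may need adjusting for other versions, and parse_drop/summarize are hypothetical helper names.

```python
import re
from collections import Counter

# Pattern inferred from the one sample line in this thread (an assumption).
DROP_RE = re.compile(
    r"fw_log_drop_ex: Packet proto=(?P<proto>-?\d+) (?P<src>\S+) -> (?P<dst>\S+) "
    r"dropped by (?P<func>\w+) Reason: (?P<reason>[^;]+);"
)


def parse_drop(line):
    """Return a dict of fields for one drop line, or None if it does not match."""
    m = DROP_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["proto"] = int(fields["proto"])
    fields["reason"] = fields["reason"].strip()
    return fields


def summarize(lines):
    """Count drops per (dropping function, reason) pair, skipping other lines."""
    counts = Counter()
    for line in lines:
        d = parse_drop(line)
        if d is not None:
            counts[(d["func"], d["reason"])] += 1
    return counts


sample = (";fw_log_drop_ex: Packet proto=-1 ?:0 -> ?:0 dropped by "
          "fwha_select_arp_packet Reason: CPHA replies to arp;")
print(parse_drop(sample)["func"])  # fwha_select_arp_packet
print(summarize([sample, sample, "unrelated line"]))
```

Feeding it a saved capture of the zdebug output (one drop per line) shows at a glance whether the ARP drop above is the only thing being dropped or just the noisiest.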

PhoneBoy
Admin

There's at least one TAC case that mentions this error message, and the fix in it was the kernel variable you set.

That suggests a TAC case might be in order for further troubleshooting.

Richard_Scott1
Participant

ok - thanks. I'll get a case logged.

Maarten_Sjouw
Champion

One of the things with running clusters in VMware is that they run better with VRRP. You can then also control this behaviour from the Dashboard.

Regards, Maarten
Richard_Scott1
Participant

The decision not to use VRRP was taken by a Check Point engineer, not us. They migrated us from an old (R6x) physical cluster to the new R77.30 VMware-based one.

Support has responded to the ticket, asking for any errors shown by zdebug drop, even though I'd included that in the original ticket.

I'm now waiting for them to set up a remote session.

Maarten_Sjouw
Champion

Yeah, I know. You open the ticket and upload the cpinfo, and the first thing they ask for is? Yep, a cpinfo.

Regards, Maarten
Richard_Scott1
Participant

End of the working day for me here in the UK and no progress. I updated the ticket 8 hours ago with suitable timeslots for remote access, but haven't heard anything at all.

Maarten_Sjouw
Champion

If you have the opportunity, switch to VRRP and see what comes of it.

Be aware that in the VM switch you have to disable all security features on the ports connected to your firewalls and make sure IGMP snooping is allowed.

If you do choose VRRP, do not use extended vMAC, just the standard vMAC mode.

As a last resort, check whether you need to change the CCP protocol from multicast to broadcast.

Regards, Maarten
Richard_Scott1
Participant

It's a production firewall, so I'm slightly hesitant to switch to VRRP yet.

Had a ticket update overnight, to say someone else is working it and asking me questions I've already answered 🙂

Richard_Scott1
Participant

Not impressed with support at all.

Spent an hour on the phone and clearly explained that the problem affects all outgoing traffic from the standby node (specifically NTP, DNS and HTTPS) but the tech has focused solely on NTP and wants screenshots of the NTP configuration, diagnostics of NTP, etc. 

Completely ignoring the basic fault that there's zero outbound connectivity to anything via the default gateway.

G_W_Albrecht
Legend

What I would want to know is the current business impact of this issue: are any TP updates not working on the standby node?

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist
Richard_Scott1
Participant

The immediate impact is that the standby can't get any Check Point updates or sync time. We're also looking at deploying the IPS blade onto it, but can't while the standby can't get out for updates, etc.

G_W_Albrecht
Legend

Do you already know sk43807 Anti-Virus / URL Filtering / IPS update fails on the Standby member of ClusterXL in High Ava...?

Networks_Winter
Explorer

Hi, 

We have the same issue, where the secondary node is not able to reach the next-hop gateway. Did you come to a resolution?

Daniel_Bourne
Participant

Hello! We are having the same issue. Were you able to get this resolved?

Thanks,

Vincent_Van_Bru
Explorer
Hello,
Was there any feedback from TAC on this?
Thanks.
Gregory_Link
Contributor

We are having this exact same issue. Did anyone find a resolution? We have firewall appliances, though, not VMs.

Ruan_Kotze
Advisor

Hi,

Have a look at my thread over here: https://community.checkpoint.com/t5/Enterprise-Appliances-and-Gaia/Connectivity-issues-from-standby-...

My issue occurred after upgrading the environment from R80.30.  We ended up having to follow step 4 in sk43807 (also mentioned by @G_W_Albrecht ).

Thanks,
Ruan
Gregory_Link
Contributor

So, not exactly the same issue, as we are only running ClusterXL and not VRRP. We're working with support and our Sales Engineer now and will update this post with our results. Check Point currently thinks it's a network issue, because it can see DNS requests going out but no replies on the standby.

Diego_dg
Collaborator

Hi! We are having the same issue (ARP dropped by CPHA), but on the active node, with the message:

;fw_log_drop_ex: Packet proto=-1 ?:0 -> ?:0 dropped by fwha_select_arp_packet Reason: CPHA replies to arp;

I have contacted TAC and have an SR open, but this is an old installation on R77.30 (which we are going to upgrade soon; in the meantime we need to fix this issue).

Did anyone find the cause of these drops? Best regards

Scott_Paisley
Advisor

I know this thread is nearly 5 years old, but I don't see a solution, and we hit exactly the same issue.

R81.10 machines running on ESXi VM hosts, secondary can't ping the gateway unless the policy is unloaded. Gateway management traffic works fine, probably because it doesn't pass through the policy.

The standby box actually tries to pass external traffic through the active box over the sync connection, which I believe is by design.

My colleague found a setting on the vSwitch in ESX that seems to be causing the problem. Under Policies, there is a setting for 'Forged transmits'. The default is Reject. Setting it to Accept on the VLAN the sync traffic uses seems to have fixed it.

The Check Point gateway uses some kind of virtual MAC for that traffic which the vSwitch doesn't like, so the vSwitch apparently drops it.
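The behaviour described above can be modelled as a simple source-MAC check. This is a hedged sketch of VMware's documented Forged Transmits semantics (the vSwitch compares the source MAC a guest transmits with the effective MAC of its virtual adapter), not vSphere code: the function name and MAC addresses are hypothetical, and other vSwitch security policies (MAC address changes, promiscuous mode) are ignored here.

```python
from enum import Enum


class ForgedTransmits(Enum):
    ACCEPT = "Accept"
    REJECT = "Reject"  # the default reported in this thread


def vswitch_passes(src_mac: str, adapter_mac: str, policy: ForgedTransmits) -> bool:
    """Simplified model of the 'Forged transmits' check on frames a VM sends.

    With REJECT, a frame whose source MAC differs from the adapter's effective
    MAC address is dropped; with ACCEPT, no source-MAC check is made.
    """
    if policy is ForgedTransmits.ACCEPT:
        return True
    return src_mac.lower() == adapter_mac.lower()


ADAPTER_MAC = "00:50:56:00:00:0a"    # hypothetical sync-interface MAC of the standby
FORWARDED_SRC = "00:50:56:00:00:0b"  # hypothetical virtual source MAC on forwarded traffic

for policy in ForgedTransmits:
    print(policy.value, vswitch_passes(FORWARDED_SRC, ADAPTER_MAC, policy))
# Accept True
# Reject False
```

Under this model, CCP frames sent with the adapter's own MAC would pass either way, which matches the symptom of control traffic working while forwarded data traffic vanishes; whether CCP actually uses the adapter MAC in a given version is not established in this thread.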

Chris_Atkinson
Employee

Note these settings for VMware are documented in sk101214.

CCSM R77/R80/ELITE
Scott_Paisley
Advisor

Hi. Is there an equivalent for NSX-T? We just hit exactly the same issue, but we can't resolve it the same way as there is no access to the ESX vSwitch in this environment, only the NSX-T overlay settings.

Diego_dg
Collaborator

Hi, I remember that the message about fwha_select_arp_packet was expected behaviour; in our case it seems the issue was not related to Check Point.
