ClusterXL - standby cannot reach gateway
I've got an R77.30 cluster of two nodes (running on VMware).
The active node can ping the default gateway and onward to the rest of the network without any issue.
However, the standby node can't even ping the gateway, let alone anything beyond it. If I unload the policy from the node, then it is able to ping it.
Logs suggest the traffic is being NAT'd to the cluster's address. The gateway can ping the active, standby and cluster addresses.
I've tried fw ctl set int fwha_forw_packet_to_not_active 1 on both nodes, but that didn't help.
The management interface is reachable via a different gateway (and static route).
Any suggestions greatly appreciated!
- Tags:
- clusterxl
Accepted Solutions
Thanks Scott. I spent 2 weeks troubleshooting this issue.
I found it strange that when running tcpdump on the sync interface, the ClusterXL control traffic (8116/UDP) was visible on both the standby and active firewalls, but DNS queries, HTTPS requests and other traffic were only seen on the standby (blocked by the VMware switch because they came from a different source MAC).
After setting the "Forged Transmits" to "Accept", everything works as expected.
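The capture comparison above can be sketched in Python as a set difference over the flows seen on each node. The `Flow` tuple, the addresses and the way flows would be extracted from tcpdump output are all assumptions for illustration; only the idea of comparing what each node sees on the sync interface comes from the post.

```python
from typing import Iterable, NamedTuple, Set


class Flow(NamedTuple):
    """One observed flow; the fields are illustrative, not a tcpdump format."""
    src: str
    dst: str
    proto: str
    dport: int


def seen_only_on(first: Iterable[Flow], second: Iterable[Flow]) -> Set[Flow]:
    """Return flows present in the first capture but missing from the second."""
    return set(first) - set(second)


# Hypothetical flows on the sync interface (documentation addresses only).
ccp = Flow("192.0.2.2", "192.0.2.1", "udp", 8116)
dns = Flow("192.0.2.2", "198.51.100.53", "udp", 53)
https = Flow("192.0.2.2", "198.51.100.10", "tcp", 443)

standby_capture = [ccp, dns, https]
active_capture = [ccp]

# Flows that left the standby but never arrived at the active node.
missing = seen_only_on(standby_capture, active_capture)
for flow in sorted(missing):
    print(flow)
```

Anything that shows up only in the standby's capture is a candidate for having been dropped between the two nodes, which is what pointed at the virtual switch here.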
I'm pretty sure that's the wrong "fix" for the problem, reading this sk: ClusterXL drops traffic with 'dropped by fwha_forw_run Reason: Failed to send to another cluste...
But perhaps the debugging in this SK might be helpful in figuring out where the true problem is.
Thanks. I've tried the 'fw zdebug' command both with and without that "fix" (currently disabled).
The only error displayed is:
;fw_log_drop_ex: Packet proto=-1 ?:0 -> ?:0 dropped by fwha_select_arp_packet Reason: CPHA replies to arp;
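For anyone sifting through a lot of zdebug output, here is a minimal Python sketch that pulls the dropping function and the reason out of lines shaped like the one above. The regular expression is an assumption based only on that single quoted line, and `parse_drop` and `summarize` are hypothetical helper names, not Check Point tooling.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Pattern inferred from the one drop line quoted in this thread (assumption).
DROP_RE = re.compile(
    r"fw_log_drop_ex: Packet proto=(?P<proto>-?\d+) "
    r"(?P<src>\S+) -> (?P<dst>\S+) "
    r"dropped by (?P<function>\S+) Reason: (?P<reason>[^;]*)"
)


def parse_drop(line: str) -> Optional[dict]:
    """Return the fields of a drop line, or None if the line does not match."""
    match = DROP_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["proto"] = int(fields["proto"])
    fields["reason"] = fields["reason"].strip()
    return fields


def summarize(lines: Iterable[str]) -> Counter:
    """Count drops per (function, reason) pair, skipping unrelated lines."""
    counts: Counter = Counter()
    for line in lines:
        fields = parse_drop(line)
        if fields is not None:
            counts[(fields["function"], fields["reason"])] += 1
    return counts


sample = (
    ";fw_log_drop_ex: Packet proto=-1 ?:0 -> ?:0 dropped by "
    "fwha_select_arp_packet Reason: CPHA replies to arp;"
)
parsed = parse_drop(sample)
print(parsed)
print(summarize([sample, "unrelated line", sample]))
```

Grouping by function and reason makes it easier to see whether one drop reason dominates or whether several different ones are mixed together.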
There's at least one TAC case that mentions this error message, with the fix being the kernel variable you set.
Which suggests a TAC case might be in order for further troubleshooting.
OK, thanks. I'll get a case logged.
One of the things with running clusters in VMware is that they run better with VRRP. Then you can also control this behaviour from the Dashboard.
The decision not to use VRRP was taken by a Check Point engineer, not us. They migrated us from an old (R6x) physical cluster to the new R77.30 VMware-based one.
Support has responded to the ticket, asking for any error shown by zdebug drop, even though I'd included that in the original ticket.
I'm now waiting for them to set up a remote session.
Yeah, I know. When you open the ticket you upload the cpinfo, and the first thing they ask for is? Yep, a cpinfo.
End of the working day for me here in the UK and no progress. I updated the ticket 8 hours ago with suitable timeslots for remote access, but I've not heard anything at all.
If you have the opportunity, switch to VRRP and see what comes of it.
Be aware that in the VM switch you have to disable all security features on the ports connected to your firewalls and make sure IGMP snooping is allowed.
If you do choose VRRP, do not use extended vMAC, just the standard vMAC mode.
As a last resort, check whether you need to change the Cluster Control Protocol (CCP) from multicast to broadcast.
It's a production firewall, so I'm slightly hesitant to switch to VRRP yet.
Had a ticket update overnight to say someone else is working it, and they're asking me questions I've already answered 🙂
Not impressed with support at all.
Spent an hour on the phone and clearly explained that the problem affects all outgoing traffic from the standby node (specifically NTP, DNS and HTTPS) but the tech has focused solely on NTP and wants screenshots of the NTP configuration, diagnostics of NTP, etc.
Completely ignoring the basic fault that there's zero outbound connectivity to anything via the default gateway.
What I would want to know is the current business impact of this issue. Are any TP updates not working on the standby node?
The immediate impact is that the standby can't get any Check Point updates or sync its time. We're also looking at deploying the IPS blade onto the cluster, but can't while the standby is liable to fall out of sync for updates, etc.
Do you already know sk43807 Anti-Virus / URL Filtering / IPS update fails on the Standby member of ClusterXL in High Ava...?
Hi,
We have the same issue where the secondary node is not able to reach the next-hop gateway. Did you come to a resolution for your issue?
Hello! We are having the same issue. Were you able to get this resolved?
Thanks,
Was there any feedback from TAC on this?
Thanks.
We are having this exact same issue. Did anyone find a resolution? We have firewall appliances though and not VMs.
Hi,
Have a look at my thread over here: https://community.checkpoint.com/t5/Enterprise-Appliances-and-Gaia/Connectivity-issues-from-standby-...
My issue occurred after upgrading the environment from R80.30. We ended up having to follow step 4 in sk43807 (also mentioned by @G_W_Albrecht).
Thanks,
Ruan
So, not exactly the same issue, as we are only running ClusterXL and not VRRP. We are working with support and our Sales Engineer now, and I will update this post with our results. Check Point currently thinks it's a network issue because it can see DNS requests going out from the standby but no replies coming back.
Hi! We are having the same issue (ARP dropped by CPHA), but on the active node, with the message:
;fw_log_drop_ex: Packet proto=-1 ?:0 -> ?:0 dropped by fwha_select_arp_packet Reason: CPHA replies to arp;
I have contacted TAC and have an SR, but this is an old installation on R77.30 (which we are going to upgrade soon, but in the meantime we need to fix this issue).
Did anyone find the cause of these drops? Best regards
I know this thread is nearly 5 years old, but I don't see a solution, and we hit exactly the same issue.
R81.10 machines running on ESXi VM hosts; the secondary can't ping the gateway unless the policy is unloaded. Gateway management traffic works fine, probably because it doesn't pass through the policy.
The standby box actually tries to pass external traffic through the active box using the sync connection, which is designed behaviour, I believe.
My colleague found a setting on the vSwitch in ESX that seems to be causing the problem. Under policies, there is a setting for 'Forged transmits'. The default is Reject. Setting it to Accept on the VLAN the sync traffic uses seems to be working now.
The Check Point apparently uses some kind of virtual MAC for that traffic that the vSwitch doesn't like, so the vSwitch drops it.
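To illustrate the behaviour described above, here is a toy Python model of the vSwitch check. It assumes that with Forged Transmits set to Reject the switch drops any outbound frame whose source MAC differs from the MAC assigned to the VM's adapter. The MAC values and the names `Frame` and `vswitch_forwards` are made up for illustration and are not taken from VMware or Check Point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """An outbound Ethernet frame leaving a VM's virtual adapter."""
    src_mac: str
    description: str


def vswitch_forwards(frame: Frame, adapter_mac: str, forged_transmits_accept: bool) -> bool:
    """Toy model: Reject drops frames whose source MAC is not the adapter's own."""
    if forged_transmits_accept:
        return True
    return frame.src_mac.lower() == adapter_mac.lower()


# Hypothetical values: the MAC assigned to the standby's sync vNIC, and a
# different source MAC used for traffic forwarded towards the active member.
ADAPTER_MAC = "00:50:56:aa:bb:02"
own = Frame("00:50:56:AA:BB:02", "frame sent with the adapter's own MAC")
foreign = Frame("02:00:00:00:00:99", "forwarded DNS query with another source MAC")

for accept in (False, True):
    policy = "Accept" if accept else "Reject"
    for frame in (own, foreign):
        verdict = "forwarded" if vswitch_forwards(frame, ADAPTER_MAC, accept) else "dropped"
        print(f"Forged transmits={policy}: {frame.description} -> {verdict}")
```

Under this model only the frame carrying a foreign source MAC is affected by the policy, which would match seeing cluster control traffic on both nodes while forwarded traffic never arrives.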
Note these settings for VMware are documented in sk101214.
Hi. Is there an equivalent for NSX-T? We just hit exactly the same issue, but we can't resolve it the same way as there is no access to the ESX vSwitch in this environment, only the NSX-T overlay settings.
Hi, I remember that the message about fwha_select_arp_packet was expected behaviour. In our case it seems that the issue was not related to Check Point.
