Collaborator

Connectivity issues from standby gateway after R80.10 -> R80.30 upgrade


Good day,

I recently completed an upgrade from R80.10 to R80.30 (Management + 2 gateways in an HA cluster).  The upgrade itself was successful, but I have noticed one issue on the standby gateway: we cannot ping, run nslookups, etc. from the standby node.  License checks also fail on this node.

What I have attempted thus far:

  1. Ran "fw ctl set int fwha_forw_packet_to_not_active 1" on both gateways
  2. Followed the guidance in sk147093 (fw ctl zdebug output matched that in the SK, as per below, IP sanitised)

121670435;[cpu_1];[SIM-207375815];update_tcp_state: invalid state detected (current state: 0x10000, th_flags=0x10, cdir=0) -> dropping packet, conn: [<1.1.1.1,2022,2.2.2.2,88,6>][PPK0];
@;121670435;[cpu_1];[SIM-207375815];sim_pkt_send_drop_notification: (0,0) received drop, reason: general reason, conn:
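
For anyone comparing their own drops against the line above, here is a minimal Python sketch that pulls the connection tuple and TCP flags out of such a line. The regular expressions are based only on the single sample in this post, and reading th_flags as the standard TCP header flag bits (0x10 = ACK) is an assumption; the function and field names are made up for illustration.

```python
import re

# Sample drop line copied from the post above (IPs sanitised by the author).
SAMPLE = (
    "121670435;[cpu_1];[SIM-207375815];update_tcp_state: invalid state detected "
    "(current state: 0x10000, th_flags=0x10, cdir=0) -> dropping packet, "
    "conn: [<1.1.1.1,2022,2.2.2.2,88,6>][PPK0];"
)

# <src,sport,dst,dport,proto> as it appears in the sample line.
CONN_RE = re.compile(r"<([\d.]+),(\d+),([\d.]+),(\d+),(\d+)>")
FLAGS_RE = re.compile(r"th_flags=(0x[0-9a-fA-F]+)")

# Standard TCP flag bits; assumed to be what th_flags encodes.
TCP_FLAGS = {0x01: "FIN", 0x02: "SYN", 0x04: "RST",
             0x08: "PSH", 0x10: "ACK", 0x20: "URG"}


def parse_drop(line):
    """Return the connection tuple and TCP flags from a drop line, or None."""
    conn = CONN_RE.search(line)
    if conn is None:
        return None
    src, sport, dst, dport, proto = conn.groups()
    result = {
        "src": src, "src_port": int(sport),
        "dst": dst, "dst_port": int(dport),
        "proto": int(proto),
        "tcp_flags": [],
    }
    flags = FLAGS_RE.search(line)
    if flags:
        value = int(flags.group(1), 16)
        result["tcp_flags"] = [name for bit, name in TCP_FLAGS.items() if value & bit]
    return result


if __name__ == "__main__":
    print(parse_drop(SAMPLE))
```

Run against the sample, this reports a TCP (protocol 6) connection to destination port 88 with only the ACK bit set, which makes it easier to see which service the standby was trying to reach when the packet was dropped.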

It is important to note that all connectivity is restored when I do a fw unloadlocal.  There have also been no changes to either the NAT or firewall policies.

I've found a couple of posts on CheckMates describing similar issues, but unfortunately no resolution apart from the steps above.

I will also log a TAC case, but I am hoping to hear whether anyone has experienced similar issues after an upgrade.

Thanks,

Ruan

 


Accepted Solutions
Collaborator

Hi Everyone,

We worked with TAC and managed to resolve the issue.  In the end we had to follow step 4 in sk43807.  All updates etc. are working and all warnings in SmartConsole have been cleared.

Cheers,

Ruan

16 Replies
Champion
All the R80.30 gateway clusters we run use VRRP, and there I can set this NAT function on the cluster object. I still do not understand why this option is not available for ClusterXL.
I really don't.
Regards, Maarten

Champion

Looks like sk147493 - seems no R80.30 Jumbo has this fix yet...

Collaborator
I understood from TAC that there is a hotfix available, but they prefer not to deploy it as it might be overwritten by the next Jumbo, causing a behaviour regression.

Collaborator

I have opened a case with TAC.  They seemed surprised that the kernel parameter did not fix the issue. I will update this thread once we have a resolution.

Champion

What I do wonder is why this is regarded as an issue. Usually I do not issue ping or nslookup from the CLI of standby cluster members. Or is there a very good reason to do that?

Champion
The failing nslookup means the gateway has no access to the Check Point cloud, so when there is a failover, many things need to get their updates at that moment...
That means there is no updated URL/APCL database and no updated IPS (when IPS is set so that the gateway fetches updates by itself), and Dynamic Objects will fail during the first minute.
CPUSE will also not be able to show you the list of available downloads, so when you want to update the cluster with the latest Jumbo, you first need to make the member active and wait for it to get the update list, and so on.
Regards, Maarten

Explorer

I'm still seeing this issue in R80.40. Is the 'solution' listed here by Ruan still considered the accepted approach to remediate?

Thanks

Explorer

Same issue for me with a new R80.40 cluster on OpenPlatform (Dell R640).  Whichever member is in standby state cannot establish outbound connections (UDP, TCP, ICMP, etc.).  The odd thing is that only traffic generated by the firewall itself and going outbound has the issue.  Traffic that establishes a connection to the firewall (pings, ssh, etc.) from external IPs on the Internet works just fine.

I checked the SYNC interface: traffic generated locally by the STANDBY member and destined for the Internet is forwarded to the ACTIVE member over the SYNC interface, ignoring the default gateway.  The packets carry the STANDBY member's external interface IP as the source, the external Google server I am pinging as the destination, and the MAC address of the ACTIVE member's sync interface as the destination MAC.  This is with a simple ping to Google.  DNS to an external DNS server doesn't work either, as that is also forwarded to the ACTIVE member.

This seems like pretty basic stuff that needs to work.  I don't have this problem on clusters running R80.10 or earlier (I have never used R80.10 or R80.20 on gateways so I don't know about them).  We really shouldn't need any workarounds for the STANDBY to be able to send local outbound traffic to check for updates.  The default route is not being used, so something in the Check Point firewall code is intercepting the local traffic and forwarding it to the ACTIVE member's SYNC interface IP.  An 'ip route get <www.google.com IP>' shows that the traffic should be forwarded to the default gateway and not over the SYNC interface.
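
The 'ip route get' check described above can be scripted if you want to repeat it across members. The following is a minimal Python sketch that parses the command's output and reports which next hop and interface the kernel routing table would pick. The sample output, addresses and interface name are hypothetical (documentation ranges), and the exact output layout can vary between iproute2 versions.

```python
import re

# Hypothetical output of `ip route get 198.51.100.7` on a standby member.
SAMPLE_OUTPUT = "198.51.100.7 via 192.0.2.1 dev eth1 src 192.0.2.3\n    cache"


def parse_route_get(output):
    """Extract the next hop (via), interface (dev) and source (src) fields."""
    fields = {}
    for key in ("via", "dev", "src"):
        match = re.search(rf"\b{key}\s+(\S+)", output)
        if match:
            fields[key] = match.group(1)
    return fields


def follows_default_gateway(fields, default_gateway):
    """True if the routing table sends the traffic to the given gateway."""
    return fields.get("via") == default_gateway


if __name__ == "__main__":
    route = parse_route_get(SAMPLE_OUTPUT)
    print(route)
    print("uses default gateway:", follows_default_gateway(route, "192.0.2.1"))
```

This only tells you what the Linux routing table would do. As described in the post, the packet can still leave over the SYNC interface, so a packet capture on that interface remains the real test.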

Explorer

I fixed the issue with 'fw ctl set int fwha_forw_packet_to_not_active 1'.  I tried that once (had the same setting on other sites) and it didn't work so I assumed something else was problematic.  Not sure why it didn't work the first time I tried it but it is working now.

It appears that forwarding traffic originating from a cluster member to the active member is expected and normal behavior.  The gateways hide behind the cluster IP.  The STANDBY sends its own originating traffic through the SYNC network with its own external IP as the source.  The ACTIVE member NATs the source IP so the traffic looks like it came from the external cluster IP.  Return traffic goes back to the ACTIVE member and is then forwarded to the STANDBY.
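
To make the flow described above concrete, here is a toy Python model of it. It is only an illustration of the description in this post, not how ClusterXL is implemented: all addresses are hypothetical documentation addresses, and the NAT table is keyed by the remote server address alone, which a real connection table would not do.

```python
from dataclasses import dataclass, replace

# Hypothetical addresses for illustration only.
STANDBY_IP = "192.0.2.3"     # standby member's own external IP
CLUSTER_VIP = "192.0.2.1"    # external cluster IP
SERVER_IP = "198.51.100.7"   # some server on the Internet


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


class ActiveMember:
    """Toy model of the active member hiding standby traffic behind the VIP."""

    def __init__(self, cluster_vip):
        self.cluster_vip = cluster_vip
        self.nat_table = {}  # remote server -> original standby source

    def forward_outbound(self, pkt):
        # Packet arrived over the sync network with the standby's IP as source.
        self.nat_table[pkt.dst] = pkt.src
        return replace(pkt, src=self.cluster_vip)

    def forward_return(self, pkt):
        # Reply arrives addressed to the cluster IP; hand it back to the standby.
        original_src = self.nat_table.get(pkt.src)
        if original_src is None:
            return None  # no matching outbound connection
        return replace(pkt, dst=original_src)


if __name__ == "__main__":
    active = ActiveMember(CLUSTER_VIP)
    outbound = active.forward_outbound(Packet(src=STANDBY_IP, dst=SERVER_IP))
    print("on the wire:", outbound)
    reply = active.forward_return(Packet(src=SERVER_IP, dst=CLUSTER_VIP))
    print("back to standby:", reply)
```

The point of the model is that the remote server only ever sees the cluster IP, so the reply necessarily lands on the active member, which has to pass it back to the standby.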

Explorer

It does work, but I guess the question still remains as to why a CLI hack is needed for what is basically a required function.  Ah well, perhaps in R80.50 and beyond it will become a simple tick box.

Champion

A simple tick button that has always been there for a VRRP cluster....

Regards, Maarten

Explorer

If you found a tick box in the GUI for 'fw ctl set int fwha_forw_packet_to_not_active 1', please do share.

Explorer

Interestingly, I just found sk93204, which was last updated in May 2020 and is marked as applying to R80.40.  It says you should not use the 'fwha_forw_packet_to_not_active 1' setting on R80.20+ for a 'Standby drops' problem, and that nothing needs to be done for that problem on the newer versions.  It also says to use 'fw ctl set int fwha_silent_standby_mode 1' on R80.20+ to solve a 'Peer drops' problem.  I never set that and only just found the article.  I am pretty sure I had the 'Standby drops' problem, though, because the standby couldn't connect outbound to Check Point.  From my experience, 'fwha_forw_packet_to_not_active 1' is still needed.

Participant

This behaviour changed in R80.40. In R80.30 the traffic is still sent between the gateways, but they both use the same interface. In R80.40 the standby uses the sync interface instead. This broke things for us (I have a ticket open). The standby gateway needs to access a website through a VPN tunnel, which worked fine before R80.40.

Since the upgrade, the standby forwards the packet over the sync interface; the active member encrypts it, sends it through the tunnel, gets the reply, decrypts it, and then drops it because it considers the interface illegal:

dropped by vpn_encrypt_chain Reason: illegal interface group

Collaborator

Hi Jon,

Something else that worked well for me in a couple of other instances (and is perhaps a bit easier to do) is to create NAT rules which force a "no hide":

[Attached image: Ruan_Kotze_0-1603188954065.jpeg]