johnnyringo
Advisor

R80.40 NAT port exhaustion: why do cluster members show vastly different high port capacity?

Saw a few warnings/errors today on a specific R80.40 gateway regarding NAT pool exhaustion. This showed up before with R80.30, since we have a source NAT hide rule for traffic coming from the Internet to the application.

I went to the gateway, ran cpview, and looked under Advanced -> NAT. The problematic gateway showed a High Port capacity of 66. The other gateway in the cluster showed 16,533, which seems to be the normal value.

I've also confirmed this by walking SNMP OID 1.3.6.1.4.1.2620.1.56.1301.3.1.8: one cluster member shows a value of 16,533, the other shows a number in the mid-60s. I did the same on some other R80.40 gateways and the numbers were always the same.
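When comparing the two members side by side, the raw walk output can be reduced to just the integer values. A minimal sketch, assuming standard net-snmp `snmpwalk` output lines (the sample lines and helper name here are illustrative, not captures from the gateways above):

```python
import re

def high_port_capacities(snmpwalk_output: str) -> list[int]:
    """Extract integer values from snmpwalk lines such as
    'SNMPv2-SMI::enterprises.2620... = INTEGER: 16533'."""
    return [int(m.group(1))
            for m in re.finditer(r"INTEGER:\s*(\d+)", snmpwalk_output)]

# Illustrative output, one line captured per cluster member
member_a = "SNMPv2-SMI::enterprises.2620.1.56.1301.3.1.8.1 = INTEGER: 16533"
member_b = "SNMPv2-SMI::enterprises.2620.1.56.1301.3.1.8.1 = INTEGER: 66"

print(high_port_capacities(member_a))  # [16533]
print(high_port_capacities(member_b))  # [66]
```

Running this against both members' walks makes the 66 vs. 16,533 discrepancy immediately visible in a monitoring script rather than by eyeballing cpview.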

Very confused why this would be. I do understand the port allocations changed in R80.40, but I would certainly expect each member to report the same capacity. I've replicated this in a lab setup and found it's consistent in R80.40, and failing the firewalls over had no effect on the numbers reported.

 

0 Kudos
9 Replies
Chris_Atkinson
Employee

What model gateway is used here out of interest? 

CCSM R77/R80/ELITE
0 Kudos
johnnyringo
Advisor

CloudGuard IaaS (High Availability) on Google Cloud Platform

0 Kudos
Chris_Atkinson
Employee

I presume fewer than 5 cores are assigned to each instance; has GNAT been enabled manually?

CCSM R77/R80/ELITE
0 Kudos
johnnyringo
Advisor

No, everything should just be running a factory-default configuration.  

0 Kudos
johnnyringo
Advisor

Just to be sure, I've verified GNAT is disabled on both members:

[Expert@cp-member-a:0]# modinfo -p $FWDIR/boot/modules/fw_kern*.o | sort -u | awk 'BEGIN {FS=":"} ; {print $1}' | xargs -n 1 fw ctl get int | grep gnat_

enable_cgnat_hairpinning = 0
fwx_cgnat_sync_table = 0
fwx_gnat_enabled = 0


[Expert@cp-member-b:0]# modinfo -p $FWDIR/boot/modules/fw_kern*.o | sort -u | awk 'BEGIN {FS=":"} ; {print $1}' | xargs -n 1 fw ctl get int | grep gnat_

enable_cgnat_hairpinning = 0
fwx_cgnat_sync_table = 0
fwx_gnat_enabled = 0
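The same verification can be scripted so that both members' output is checked programmatically rather than read by eye. A small sketch, assuming the `name = value` format shown above (the function name is my own; the parameter names come from the output itself):

```python
def gnat_disabled(fw_ctl_output: str) -> bool:
    """Return True when every 'name = value' line reports 0,
    i.e. all gnat-related kernel parameters are disabled."""
    for line in fw_ctl_output.strip().splitlines():
        name, _, value = line.partition("=")
        if name.strip() and int(value.strip()) != 0:
            return False
    return True

# Output from the one-liner above, as pasted for member A
output = """\
enable_cgnat_hairpinning = 0
fwx_cgnat_sync_table = 0
fwx_gnat_enabled = 0
"""
print(gnat_disabled(output))  # True
```

Feeding each member's output through a check like this confirms GNAT is off everywhere before ruling it out as the cause.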

 

0 Kudos
the_rock
Legend

0 Kudos
johnnyringo
Advisor

Yeah, I already read that last year after hitting a similar issue in R80.30, where we hit NAT exhaustion at around 1,200 connections (nowhere near the 16,533 capacity). TAC could never explain why. In this case, the question is why the capacity on two different cluster members reports 66 vs. 16,533.

This thread is interesting: R80.40 GNAT issue after Upgrade

But these were fresh R80.40 deployments. Also, I ran the one-liner and verified fwx_gnat_enabled = 0 on both members.

0 Kudos
the_rock
Legend

I saw someone post the link below, which TAC gave them when they had the same issue, but I can't recall what they ended up changing from the SK. Let me see if I can find that post.

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

0 Kudos
johnnyringo
Advisor

Well, TAC just replied. What a surprise, it's a bug. Never saw that coming! 😂

sk177228: /var/log/messages Is Flooded on the Standby Member with the Log 'allocate_port_impl: Could...

Currently the custom Hotfix is only available on Take 120. We're on a mix of Take 125 and 139.

I really wonder if just force-enabling GNAT is the better solution. I still don't understand why it would only be enabled for 5 vCPUs or higher.
