Greetings.
I have a customer facing what looks like Hide-NAT port exhaustion, but it seems to be a case of the NAT table not clearing correctly. Judging by the CPview history data, this is not normal for the customer: usage suddenly maxed out, CPview -> NAT now shows all the capacity being used, and the logs show messages about "NAT Hide failure - There are currently not available ports for hide operation.".
The customer already did a smart thing and changed the NAT rule for this traffic to use another address for Hide-NAT. They did this yesterday but still, CPview -> NAT is showing that the old Hide-NAT is still using its full capacity. We know that the new address is in place as traffic is working and it's showing in the statistics as well.
But we don't understand why the statistics for the old address are still showing 99% used.
The customer is running a Check Point High Availability cluster on a set of two Check Point Quantum 28800 Plus appliances running R80.40 with the latest general availability jumbo hotfix accumulator take 118.
I've tried to verify a few things. GNAT is enabled by default on R80.40 as long as there are more than six firewall instances / CoreXL workers. We also notice that the log entries use the format protocol, source IP, destination IP, destination port, which indicates GNAT. Without GNAT, the format would have been protocol, source IP and destination IP (no destination port).
Just to make sure I got a few outputs from the customer:
[Expert@TK-FW-12:0]# fw ctl get int fwx_alloc_entry_expiration
fwx_alloc_entry_expiration = 300
[Expert@TK-FW-12:0]# fw ctl get int fwx_alloc_free_port_timeout
fwx_alloc_free_port_timeout = 1
[Expert@TK-FW-12:0]# dynamic_split -p
Dynamic Balancing is currently off
[Expert@TK-FW-12:0]# fw ctl get int fwx_gnat_enabled
fwx_gnat_enabled = 1
Configuring Check Point CoreXL...
=================================
CoreXL is currently enabled with 60 IPv4 firewall instances and 4 IPv6 firewall instances.
These are all default settings for R80.40. GNAT is active and the default expiration time for NAT is 300 seconds. I was confused about why CoreXL Split / Dynamic Balancing was not in use on an appliance with so many cores, but it seems it was never on by default on R80.40; that only came with R81+. On R80.40 it has to be enabled manually regardless of the number of cores you have.
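To make the same comparison quickly on the other cluster member, a minimal sketch like the one below could be used. It only parses the `name = value` lines that `fw ctl get int` prints in the outputs above and compares them with the values reported in this post. The function name `check_kernel_ints` is made up for illustration, and the "expected" values are simply what this customer's gateway returned, which I am assuming are the R80.40 defaults.

```shell
#!/bin/sh
# Sketch: compare "name = value" lines (the format printed by
# `fw ctl get int`) against the values reported in this post.
# check_kernel_ints is a hypothetical helper name, not a Check Point tool.

check_kernel_ints() {
    # Reads lines on stdin; prints one verdict per recognised parameter.
    # Lines that are not one of the three parameters are ignored.
    while IFS= read -r line; do
        name=${line%% = *}     # text before " = "
        value=${line##* = }    # text after " = "
        case "$name" in
            fwx_alloc_entry_expiration)  expected=300 ;;  # value from this post
            fwx_alloc_free_port_timeout) expected=1 ;;    # value from this post
            fwx_gnat_enabled)            expected=1 ;;    # value from this post
            *) continue ;;
        esac
        if [ "$value" = "$expected" ]; then
            echo "$name: $value (matches)"
        else
            echo "$name: $value (differs, expected $expected)"
        fi
    done
}

# Usage on a gateway in Expert mode (not run here):
#   for p in fwx_alloc_entry_expiration fwx_alloc_free_port_timeout fwx_gnat_enabled; do
#       fw ctl get int "$p"
#   done | check_kernel_ints
```

Pasted session output can also be piped in as-is, since the prompt lines do not match any of the three parameter names and are skipped.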
It's been over 18 hours since the customer changed the NAT rule from using the old IP to using the new IP. Still CPview -> NAT is showing 99% used on the old IP. When running a cppcap for over five minutes we can't see anything on the active member indicating traffic or existing connections that could be keeping the old hide-NAT table alive:
[Expert@TK-FW-12:0]# cppcap -f "src 185.176.215.252 and dst 146.192.252.64" -DNT
0 packets captured (0 B)
This doesn't make much sense to me. If no traffic passing the firewall is using the old IP for hide-NAT, why doesn't the table clear? Is this a bug?
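To put the timing in perspective, here is a quick arithmetic sketch using only the numbers already quoted in this post: the 300-second expiration, the 18 hours since the rule change, and the five-minute capture. The variable names are illustrative, and whether `fwx_alloc_entry_expiration` is the timer that actually governs these entries when GNAT is enabled is an assumption on my part.

```shell
#!/bin/sh
# Sketch: how many expiration periods have passed since the NAT rule change.
# All inputs are figures quoted in this post; nothing is read from the gateway.

expiration=300       # fwx_alloc_entry_expiration, in seconds
elapsed_hours=18     # time since the customer changed the Hide-NAT address
capture_minutes=5    # minimum length of the cppcap capture

elapsed_seconds=$((elapsed_hours * 3600))
intervals=$((elapsed_seconds / expiration))
capture_seconds=$((capture_minutes * 60))

echo "elapsed_seconds=$elapsed_seconds"      # 64800
echo "expiration_intervals=$intervals"       # 216
echo "capture_seconds=$capture_seconds"      # 300
```

So the old address has had more than 200 full expiration periods to drain, and the capture alone covered at least one full period with no matching packets.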
The commands I know for viewing and manually clearing the NAT table do not work on R80.40 with GNAT enabled, so I have no way to verify the current state of the NAT table or to clear it manually. That means I cannot tell whether CPview is simply showing the wrong data, whether the NAT table is somehow stuck and not clearing, or whether something else is going on.
Normally I would have used:
fw tab -t fwx_alloc -s, to view the statistics of the NAT-table
fw tab -t fwx_alloc -x, to clear it
But fwx_alloc does not exist on R80.40 with GNAT enabled:
No such table fwx_alloc
sk165153 for GNAT doesn't really give me any similar commands to run. All it contains is information on how to do kernel debugs.
Do we have any commands that can be used on R80.40+ with GNAT enabled?