The NAT Cache
The first packet of a new connection accepted by the Network Access layer that involves
NAT will always make a trip to the Firewall Path (F2F) by default, unless NAT
Templates are enabled. Even if the firewall has a very large NAT rulebase, most of the
time the firewall doesn't even have to evaluate it due to the NAT Cache Table
fwx_cache which is enabled by default. Essentially the most common NAT rulebase
hits are cached in a special state table that can determine the required NAT operation
quite efficiently during future NAT lookups. The presence of this NAT caching
mechanism is probably why hit counts are not available for NAT rules. Once the first
packet of an accepted connection has been NATted, the NAT rulebase and its
fwx_cache table are never consulted again for that particular connection, and as such the
NAT applied to a connection's packets cannot ever change after the connection’s first
packet.
By default this NAT cache can contain up to 10,000 cached NAT entries (the cached
entries are expired from the table after 30 minutes by default). Whenever an “Original
Packet” NAT rule match occurs in the NAT rule base, the source, destination, and
service port number associated with the matched NAT rule are cached in the fwx_cache
table, along with the necessary NAT operation to be performed under “Translated
Packet”. If the same source, destination, and service port number show up for a new
allowed connection, the NAT rulebase itself is never consulted, and the cached NAT
operation is performed instead, thus saving the overhead of a full NAT rulebase lookup.
So how many of the potential 10,000 NAT cache entries are in use on your firewall?
Run the command fw tab -t fwx_cache -s
#VALS indicates the number of current cached NAT entries and #PEAK indicates the
peak number of entries in the table used since the firewall was last booted. If both
#VALS and #PEAK are substantially less than 10,000, either you are not doing very much
NAT on your firewall, or a maximal level of NAT caching is already occurring in the
Firewall Path and there is nothing more to do.
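The check described above can be scripted. What follows is a minimal sketch, assuming the summary line printed by fw tab -s uses the column order HOST NAME ID #VALS #PEAK #SLINKS. The helper name nat_cache_usage and the sample values are made up for illustration.

```shell
# Hypothetical helper: read `fw tab -t fwx_cache -s` output on stdin and
# report how full the NAT cache is relative to its limit (default 10,000).
# Assumes the column order HOST NAME ID #VALS #PEAK #SLINKS.
nat_cache_usage() {
    awk -v limit="${1:-10000}" '
        $2 == "fwx_cache" {
            vals = $4; peak = $5
            printf "vals=%d peak=%d limit=%d", vals, peak, limit
            if (peak + 0 >= limit + 0) printf " (cache has been full)\n"
            else                       printf " (cache has never filled)\n"
        }'
}

# Illustrative sample only; on a gateway you would run:
#   fw tab -t fwx_cache -s | nat_cache_usage
sample='HOST                  NAME                               ID #VALS #PEAK #SLINKS
localhost             fwx_cache                        8116  3214 10000      0'
printf '%s\n' "$sample" | nat_cache_usage
```

If the limit has been changed from the default, pass the new limit as the first argument so the comparison stays meaningful.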
However, what if #PEAK shows 10,000 exactly? First off, fear not. Using up all the
entries in the NAT cache table does not cause an outage or otherwise impede traffic; it
just results in more NAT rulebase lookups than there otherwise would be, which does
impact performance. So how can we assess whether the NAT cache size should be
increased?
As mentioned earlier, hit counts are not available for NAT rules. However, you can
poke around directly in the fwx_cache table on the live gateway to get an idea of which
NAT rules are being used the most. The fwx_cache table does track the NAT rule
number of cached entries; this lengthy command will show the top 20 most commonly
cached/hit NAT rules:
fw tab -u -t fwx_cache|awk '{ print $3 }'|cut -c5-8|sort -n|uniq -c| sort -nr|head -20
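The tail end of that pipeline is an ordinary frequency count. Here is a small sketch of just that stage, using made-up rule numbers in place of the values that awk and cut extract from the live table. The function name top_rules is hypothetical.

```shell
# Made-up rule numbers standing in for the values extracted by awk and cut.
rules='12
7
12
3
12
7'

# sort -n places identical numbers next to each other, uniq -c prefixes each
# distinct number with how often it occurred, and sort -nr lists the largest
# counts first. head keeps the top N lines (20 unless told otherwise).
top_rules() {
    sort -n | uniq -c | sort -nr | head -n "${1:-20}"
}

printf '%s\n' "$rules" | top_rules 2
```

With this sample the first line reports rule 12 with three hits and the second reports rule 7 with two. The exact column padding produced by uniq -c varies between implementations.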
Hopefully you should now have a good idea of the most commonly hit rules in your
NAT policy. But here comes the next big question: If the #PEAK value for the
fwx_cache table is exactly 10,000, how much would potentially increasing the NAT
cache size help? There is no direct way to answer this question, but there is a way to see
how much CPU overhead the NAT cache is currently saving at its current size: we can
simply turn it off and see what happens to CPU utilization on the firewall!
To accomplish this, run the command mpstat 1 in a separate window. This
command will show CPU utilization (user/sys/soft/hi as also shown in the top
command) once per second. Let the command run for a while to get a good idea of the
firewall’s current CPU load. Now in a separate window disable the NAT cache on the
fly with: fw ctl set int fwx_do_nat_cache 0 . Observe the CPU utilization
closely after the change. Does it go up, down or stay the same? You can probably draw
your own conclusions about what to do given what you’ve observed thus far, but in
general if the CPU load remains the same or drops (and earlier #PEAK showed exactly
10,000 or whatever the limit was set to) increasing the fwx_cache size won’t help. If
however the overall CPU load went up, and especially if it went up a lot, increasing the
fwx_cache size may help, especially if you have a large NAT rulebase (1500+ rules).
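The rule of thumb above can be written down as a tiny helper. This is only a sketch: nat_cache_verdict is a hypothetical name, the two inputs are average CPU busy percentages you noted yourself from mpstat before and after disabling the cache, and the 2-point noise margin is an arbitrary assumption, not a documented threshold.

```shell
# Hypothetical helper: compare average CPU busy percentages observed before
# and after disabling the NAT cache. The third argument is a noise margin in
# percentage points (default 2, an arbitrary assumption).
nat_cache_verdict() {
    awk -v before="$1" -v after="$2" -v margin="${3:-2}" 'BEGIN {
        if (after - before > margin)
            print "load rose: a larger fwx_cache may help"
        else
            print "load flat or lower: a larger fwx_cache is unlikely to help"
    }'
}

nat_cache_verdict 35 52
nat_cache_verdict 35 34
```

The verdict only matters if #PEAK had already reached the cache limit; otherwise the cache was never full and resizing it is moot.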
Before considering an increase in the NAT cache size, make sure the firewall has plenty
of free memory (run free -m and look at the third line as detailed in the last chapter).
The fwx_cache table by default can have up to 10,000 entries, and it appears that each
entry consumes about 50 bytes of memory, so the maximum amount of RAM that
fwx_cache can consume by default is 500,000 bytes (or about 500 KB). To increase
the NAT cache size see sk21834: How to modify values of properties related to NAT cache table "fwx_cache".
If you have determined that you need to increase it, I’d recommend just
doubling the fwx_cache size to start with – please resist the urge to crank it up to an
obnoxiously high value as that can cause its own problems and actually hurt
performance. After increasing the value run the commands shown earlier again to assess
the impact of the change, and if it still needs to be increased try doubling it again. Just
don’t forget to turn the NAT cache back on by setting fwx_do_nat_cache back to 1
when you are done!
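The memory estimate given earlier is simple arithmetic, and it is easy to redo for each doubling. A minimal sketch, assuming the rough figure of 50 bytes per entry quoted above (an observation, not a documented value); nat_cache_bytes is a hypothetical helper name.

```shell
# Estimate worst-case fwx_cache memory use in bytes. The default of 50 bytes
# per entry is the approximate figure from the text, not a documented value.
nat_cache_bytes() {
    entries="$1"
    bytes_per_entry="${2:-50}"
    echo $(( entries * bytes_per_entry ))
}

nat_cache_bytes 10000   # default limit
nat_cache_bytes 20000   # after doubling once
nat_cache_bytes 40000   # after doubling twice
```

Even after two doublings the estimate stays around 2 MB, so on these assumptions memory is rarely the limiting factor for modest increases.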