Tchangoloro
Participant

Packet Loss and Traffic Drops Caused by ARP Table Overflow in Layer 2 Core Topology

The purpose of this post is to present a technical analysis of the behavior observed in the customer environment, detailing the symptoms, investigation process, identified root cause, and corrective actions applied.

During the analysis, the environment presented intermittent packet loss affecting communication between internal networks and services traversing the Check Point Firewall (R82.10). The issue was not constant, which made the initial identification more complex and required a deeper investigation.

From an operational standpoint, one of the main observed behaviors was the firewall dropping traffic unexpectedly. Legitimate traffic was being matched against the cleanup rule without any clear indication of misconfiguration in security policies or NAT rules. The behavior appeared random and inconsistent across different traffic flows.

The initial investigation focused on common areas such as:

  • Security policy evaluation

  • NAT rules

  • Connection table utilization

  • CPU and memory consumption

No abnormal behavior or resource exhaustion was identified in these components that could justify the observed symptoms.

The analysis then progressed to system-level verification, where /var/log/messages was reviewed. This step revealed repeated kernel "neighbor table overflow" entries:

Apr 8 11:20:03 2026 FW-CP-01 kernel:[140434.823181] neighbour: arp_cache: neighbor table overflow!
Apr 8 11:20:03 2026 FW-CP-01 kernel:[140434.823291] neighbour: arp_cache: neighbor table overflow!
Apr 8 11:20:03 2026 FW-CP-01 kernel:[140434.823416] neighbour: arp_cache: neighbor table overflow!
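
To gauge how often the overflow occurs, the log entries can be grouped per minute. The following is a minimal sketch, assuming the timestamp layout shown in the excerpt above (month, day, HH:MM:SS as the first three fields); the function name overflow_per_minute is hypothetical and not part of any Check Point tooling.

```shell
#!/bin/sh
# Count neighbor-table overflow messages per minute in a syslog-style file.
# Assumption: fields 1-3 are month, day and HH:MM:SS, as in the excerpt above.
# $1: path to the log file (for example /var/log/messages)
overflow_per_minute() {
    grep -iE 'neighbou?r table overflow' "$1" |
        awk '{ split($3, t, ":"); print $1, $2, t[1] ":" t[2] }' |
        uniq -c   # the log is chronological, so equal minutes are adjacent
}
```

Running overflow_per_minute /var/log/messages in expert mode prints one line per minute with the number of overflow messages seen in that minute, which helps correlate bursts with the intermittent drops.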

This finding indicated a condition related to the ARP/neighbor table at the operating system level.

From a topology perspective, the client environment is designed with the core switch operating at Layer 2, delegating Layer 3 responsibilities to the Check Point firewall. As a result, the firewall is responsible for both inter-VLAN routing and maintaining ARP resolution (IP-to-MAC mapping) for all connected devices.

Considering the high number of devices in the environment, this design leads to a significant increase in ARP table entries on the firewall.

To support the analysis, the following command was used to monitor the ARP table size:

ip -s neigh | wc -l

The number of ARP entries repeatedly approached, and at times reached, the configured ARP cache limit.
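
A raw line count does not show what kind of entries fill the table. The sketch below, under the assumption that the neighbor state is the last field of each line printed by ip neigh show, breaks the table down by state and expresses the count as a percentage of a limit. Both function names are hypothetical helpers.

```shell
#!/bin/sh
# Summarize neighbor entries by state (REACHABLE, STALE, INCOMPLETE, ...).
# Reads `ip -4 neigh show` output on stdin; the state is taken from the
# last field of each non-empty line.
neigh_state_summary() {
    awk 'NF { count[$NF]++ } END { for (s in count) print s, count[s] }' | sort
}

# Print table utilization as an integer percentage (rounded down).
# $1: current entry count, $2: configured limit
neigh_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}
```

In expert mode this could be used as ip -4 neigh show | neigh_state_summary, followed by neigh_usage_pct "$(ip -4 neigh show | wc -l)" 4096, where 4096 is the limit configured in this environment. A large share of INCOMPLETE or FAILED entries would point to scanning or unresolved addresses rather than real hosts.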

Based on this evidence, the root cause was identified as an insufficient ARP table size. The cache-size parameter was set to its default value of 4096 entries, which is not adequate for the scale and characteristics of the client's environment.

Once this threshold was reached:

  • The system began aggressively removing entries from the ARP table

  • ARP resolution became inconsistent

  • The firewall was intermittently unable to resolve destination MAC addresses

  • Valid traffic could not be properly forwarded

This behavior directly explains the packet loss observed and the fact that legitimate traffic was being dropped and matched against the cleanup rule.

As a corrective action, the ARP table size was increased using the following command:

set arp table cache-size 16000

The change was applied immediately, without requiring a reboot or policy installation.
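
One way to confirm that the messages stopped after the change is to count only overflow entries logged later than the moment the command was applied. This is a rough sketch, assuming the bracketed value in each kernel log line is seconds since boot and that the gateway was not rebooted in between; overflow_after is a hypothetical helper name.

```shell
#!/bin/sh
# Count overflow messages whose bracketed kernel timestamp (seconds since
# boot) is later than a reference value.
# $1: log file, $2: uptime in seconds at the moment of the change
overflow_after() {
    awk -v ref="$2" '
        /neighbou?r table overflow/ && match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            # strip the surrounding brackets to get the numeric timestamp
            ts = substr($0, RSTART + 1, RLENGTH - 2)
            if (ts + 0 > ref + 0) n++
        }
        END { print n + 0 }
    ' "$1"
}
```

The reference value can be noted at change time with cut -d' ' -f1 /proc/uptime. A result of 0 from overflow_after /var/log/messages <reference> some time later is consistent with the overflow condition being gone.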

After the adjustment:

  • The “neighbour table overflow” messages were no longer observed

  • Packet loss symptoms were eliminated

  • Traffic stopped being unexpectedly matched against the cleanup rule

  • Overall network stability was restored

In conclusion, the observed behavior was caused by ARP table exhaustion due to the number of devices in an architecture where Layer 3 functions are centralized on the firewall. The implemented adjustment resolved the issue, and this analysis documents the scenario for future reference and preventive actions.


Reference SK:
https://support.checkpoint.com/results/sk/sk43772

6 Replies
Tchangoloro
Participant

I’d like to thank my friend and colleague Paulo Henrique for his support during the analysis of this issue. His contribution was important in identifying the root cause and helping drive the investigation forward.

Timothy_Hall
MVP Gold

Great write-up, this was covered in the last edition of my book and will be covered in the new one as well.  I characterize this issue as a "rolling outage": some systems can seem to communicate OK through the firewall, then suddenly can't, then can again, etc.  Tough to troubleshoot, as it is not a Check Point code problem; it is a Linux issue.  Whenever I see a subnet mask on firewall interfaces of /22 or larger (i.e. /22 through /8), I'm definitely on the lookout for this issue.  It could have just been someone being lazy with subnet mask assignments, or there really could be hundreds or thousands of systems directly adjacent to the firewall overflowing the ARP cache, which is shared by all firewall interfaces.
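
The subnet-mask check described above can be sketched as a quick filter over the interface list. This is only an illustration, assuming the one-line-per-address layout of ip -o -4 addr show (interface name in field 2, address/prefix in field 4); wide_subnets is a hypothetical helper name.

```shell
#!/bin/sh
# List interfaces whose IPv4 prefix length is /22 or shorter, i.e. subnets
# with 1022 or more usable host addresses. Loopback is skipped.
# Reads `ip -o -4 addr show` output on stdin.
wide_subnets() {
    awk '$3 == "inet" && $2 != "lo" {
        split($4, a, "/")
        if (a[2] + 0 <= 22) print $2, $4
    }'
}
```

Usage in expert mode: ip -o -4 addr show | wide_subnets. Any interface it prints is a candidate for contributing a large number of neighbor entries.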

New Book: "Max Power 2026" Coming Soon
Check Point Firewall Performance Optimization
Bob_Zimmerman
MVP Gold

I mostly see this triggered by inventory scanners checking for new systems they don't yet know about. Say you have a firewall with 20 interfaces with /24 blocks (e.g., a whole /24 playground for 20 different application teams to build things in). Ping through the firewall to every address on those blocks to discover new stuff which hasn't been added to asset management, and the firewall's ARP table will fill with thousands of "Incomplete" entries.
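
The arithmetic behind that scenario is easy to check: a /24 has 254 usable host addresses, so 20 such interfaces can produce up to 20 × 254 = 5080 neighbor entries if every address is probed, which is above the 4096 limit mentioned in the original post. A small sketch, with hypothetical helper names:

```shell
#!/bin/sh
# Usable host addresses in an IPv4 subnet of the given prefix length
# (valid for prefix lengths up to /30).
usable_hosts() {
    echo $(( (1 << (32 - $1)) - 2 ))
}

# Worst-case neighbor entries if every address on every interface is probed.
# $1: number of interfaces, $2: prefix length shared by all of them
worst_case_entries() {
    echo $(( $1 * $(usable_hosts "$2") ))
}
```

For example, worst_case_entries 20 24 prints 5080, and usable_hosts 22 prints 1022.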

It's really annoying how tightly Linux hangs on to open ARP requests.

PhoneBoy
Admin

I ran into a similar problem when I set my default route to an interface rather than a specific IP address.
In that case, it wasn't resolved by increasing the arp cache.

Bob_Zimmerman
MVP Gold

Oof. I thought there was a guardrail against that. Setting a route to an interface with no gateway says to ARP out that interface for the addresses. With the default route set that way, you'll ARP for everything unless you have a more specific route pointing elsewhere.
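
A quick way to spot this condition is to look for a default route that has no "via" next hop. The sketch below assumes the usual ip -4 route show layout, where a gateway route reads "default via <address> dev <interface>"; gatewayless_defaults is a hypothetical helper name.

```shell
#!/bin/sh
# Print default routes that have no next-hop gateway ("via").
# Reads `ip -4 route show` output on stdin.
gatewayless_defaults() {
    awk '$1 == "default" && $0 !~ / via / { print }'
}
```

Usage in expert mode: ip -4 route show | gatewayless_defaults. Any output means the firewall will ARP directly for every destination that lacks a more specific route.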

PhoneBoy
Admin

This was a while ago...having done it back in the R77.x days. 🙂
