Jason_Carrillo
Collaborator

RX-DRP/rx_missed_errors on Interface

We have a cluster of Open Hardware R80.10 systems that show a decent number of RX-DRPs when we run netstat -ni, and those numbers coincide with rx_missed_errors in the ethtool -S output. I've attached the netstat -ni and ethtool -S output for that gateway.
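For anyone who wants to compare the same counters, these are the commands we're correlating (eth0 is just the example interface here):

netstat -ni | grep eth0
ethtool -S eth0 | grep rx_missed_errors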

Output of ethtool -i:

[Expert@FW1:0]# ethtool -i eth0
driver: ixgbe
version: 3.9.15-NAPI
firmware-version: 0x61bd0001
bus-info: 0000:04:00.0

I've got 3 other similarly outfitted clusters that don't see errors on the same scale, but they are nowhere near as busy as this main gateway.

The volume of these errors is pretty small, 0.0087%, and as far as we can tell they aren't causing any issues, but their presence prevents us from running Optimized Drops on this firewall. When we turn on Optimized Drops, SecureXL crashes once these errors accrue, and then our CPU usage goes up across all the cores.
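For reference, that percentage is just rx_missed_errors divided by total packets received. As a quick sanity check with made-up counter values (not our real numbers), the math works out like this:

echo "122700 1410322803" | awk '{printf "%.4f%%\n", $1/$2*100}'
0.0087%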

We rely on acceleration quite a bit to keep traffic moving:

[Expert@FW1:0]# fwaccel stats -s
Accelerated conns/Total conns : 266989/361053 (73%)
Accelerated pkts/Total pkts : 1334142802/1410322803 (94%)
F2Fed pkts/Total pkts : 41668137/1410322803 (2%)
PXL pkts/Total pkts : 34511864/1410322803 (2%)
QXL pkts/Total pkts : 0/1410322803 (0%)
[Expert@FW1:0]#

Any input is appreciated; I'm just trying to figure out whether there is an easy fix for this or something more sinister going on underneath.

6 Replies
HeikoAnkenbrand
Champion

RX Error counters are incremented by frames received by the NIC that are corrupted in some way:

Possible duplex mismatch on both interfaces of the link.
Faulty NIC, cable, physical media issue.
CRC failures.
In addition, a NIC speed/duplex mismatch with the connecting port on the switch/router might be the cause.
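A quick way to verify the negotiated speed/duplex on the gateway side (the switch port has to be checked separately, and eth0 is just an example interface):

ethtool eth0 | grep -Ei 'speed|duplex|link'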

RX Drops in excess of 0.1% - 0.5% of total received packets (it is recommended to err toward the smaller value) are indicative of an issue. Anything below these values is likely just random errors with a trivial effect on performance.
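If you want to compute the rate straight from netstat, a one-liner along these lines works; note that eth0 and the field positions (RX-OK in $4, RX-DRP in $6) are assumptions that depend on your netstat build, so check the header line first:

netstat -ni | awk '$1 == "eth0" { printf "RX-DRP rate: %.4f%%\n", $6 / $4 * 100 }'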
➜ CCSM Elite, CCME, CCTE ➜ www.checkpoint.tips
HeikoAnkenbrand
Champion

I think 0.0087% should not be an issue.

➜ CCSM Elite, CCME, CCTE ➜ www.checkpoint.tips
HeikoAnkenbrand
Champion

Or see this SK:

Excessive RX Errors / RX Drops found on interface

➜ CCSM Elite, CCME, CCTE ➜ www.checkpoint.tips
christian_konne
Participant

I also think 0.0087% is not a problem.

Timothy_Hall
Legend

That RX-DRP percentage is quite low and really nothing to worry about.  I highly doubt that RX-DRPs are interfering with the SecureXL Optimized Drop function as they are two completely different things, but in general I'm not a fan of enabling optimized drops unless you need them as it can lead to complications with SecureXL like the ones you are experiencing.

Since it looks like you have a high percentage of fully-accelerated traffic (SXL path), you almost certainly need to reduce the number of kernel instances via cpconfig, thus adding more SND/IRQ instances to help handle all the fully accelerated traffic.  An rx_missed_error indicates no ring buffer slot is available for an incoming frame, which is mainly caused by not enough SND/IRQ CPU resources being available to empty it in a timely fashion.  Please provide the output of the following commands and I can recommend how many additional SND/IRQ cores you should allocate:

fwaccel stat

fw ctl affinity -l -r

free -m

grep -c ^processor /proc/cpuinfo

sim affinity -l

enabled_blades
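One more data point worth collecting while you're at it: since rx_missed_errors means the RX ring filled up before it could be emptied, it helps to know whether the ring is already at its hardware maximum. Enlarging the ring only buys headroom (and can add latency), so treat it as a stop-gap rather than a fix, and follow the relevant SK before changing it on a production gateway. The value below is illustrative only:

ethtool -g eth0          # show current vs. maximum RX/TX ring sizes
ethtool -G eth0 rx 4096  # example value; must not exceed the reported maximum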

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Jason_Carrillo
Collaborator

"I highly doubt that RX-DRPs are interfering with the SecureXL Optimized Drop function as they are two completely different things,"

I'll see if I can find the ticket, but that is what TAC told me. Turning off Optimized Drops on this particular cluster fixed the issue. I get what you are saying though.

I've attached the requested output, and I think you might be on to something, because I am only allocating two cores to the SNDs on this cluster.
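Assuming the fix is what I think it is, the rebalance itself would be done from cpconfig (exact menu wording varies by version); roughly:

[Expert@FW1:0]# cpconfig
(choose the Check Point CoreXL option, lower the number of firewall
instances so the freed cores become SNDs, then reboot the member)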
