Kaspars_Zibarts
Employee

RX-DRP / rx_missed_errors on VSX 23800 R80.30 with Multi-Queue

I'm seeing some interesting appliance behaviour with the increased traffic due to the home-working situation.

Traffic through the gateway has grown considerably, but no core exceeds 60% load.

We have five 10Gbps interfaces configured in three bonds, all using MQ. With the traffic increase we started seeing rx_missed_errors on 4 out of 5 interfaces. I ran this one-liner to get the discard percentage:

ifconfig | awk '/Link /{print $1}' | grep ^eth | while read line; do perc=`ethtool -S $line | awk '/rx_packets/{tot=$2} /rx_missed_errors/{err=$2 ; tot = tot + 1; perc = err * 100 / tot; print perc}'`; echo "$line $perc"; done
 

We are getting close to the magic 0.1% recommended by @Timothy_Hall, and it makes me nervous as we are still far from the appliance's performance limits.

[screenshot: discard percentage per interface]

 

What's peculiar is that it's not the busiest interface that has the most discards; it's actually the one with the least traffic (bond0 / eth2-01).

Of course I could chuck more CPU at it, but considering it runs at 60% now, I shouldn't really be seeing any discards if you ask me.

MQ has 6 cores configured and all seem to be running at roughly the same level.

I might try increasing the ring buffer on my standby node from 512 to 1024, just as a test.
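The ring buffer change itself is just standard ethtool; a minimal sketch of what I'd run on the standby node (eth2-01 used only as an example interface, and a plain ethtool -G change is not persistent across reboots, so it is really just for this test):

# show current and hardware-maximum RX/TX ring sizes
ethtool -g eth2-01

# bump the RX ring from 512 to 1024 descriptors (non-persistent test change)
ethtool -G eth2-01 rx 1024

For a permanent change I'd set it through the Gaia interface configuration instead, assuming the appliance exposes the rx-ringsize setting in clish.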

But any other thoughts are welcome! 

P.S. I already tried disabling optimised drops, but it did not help.

9 Replies
Timothy_Hall
Champion

Please provide the full output of ethtool -S on the physical interfaces of concern.  RX-DRP is a roll-up of many different error counters, not all of which are directly related to ring buffer misses.  Many other factors can cause RX-DRPs even though the ring buffers are never full; some of these situations are covered in my books in the section called "RX-DRP Revisited: Still Racking Them Up?".
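While you are collecting that, a rough way to show only the non-zero counters is an awk filter like the one below; it is just a readability aid rather than an official tool, and eth2-01 is only an example interface name:

# print only counters that are non-zero and look drop/error related
ethtool -S eth2-01 | awk -F: '$2+0 > 0 && /err|drop|miss|fifo|xoff/'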

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Kaspars_Zibarts
Employee

Actually my calculation was based on rx_missed_errors, not RX-DRP; that was more of an entry point into the research.

But sure, here's one sample 🙂

NIC statistics:
rx_packets: 59234278986
tx_packets: 47389040467
rx_bytes: 45116214092656
tx_bytes: 24548776983785
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
multicast: 372954
collisions: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 0
rx_missed_errors: 34374835
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
rx_pkts_nic: 59234278994
tx_pkts_nic: 47389040476
rx_bytes_nic: 45604360801935
tx_bytes_nic: 24959156206165
lsc_int: 1
tx_busy: 0
non_eop_descs: 0
broadcast: 0
rx_no_buffer_count: 0
tx_timeout_count: 0
tx_restart_queue: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
tx_flow_control_xon: 1093
rx_flow_control_xon: 0
tx_flow_control_xoff: 430937
rx_flow_control_xoff: 0
rx_csum_offload_errors: 1561370
alloc_rx_page_failed: 0
alloc_rx_buff_failed: 0
rx_no_dma_resources: 0
hw_rsc_aggregated: 0
hw_rsc_flushed: 0
fdir_match: 0
fdir_miss: 59239067677
fdir_overflow: 0
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
tx_queue_0_packets: 8196375609
tx_queue_0_bytes: 4218487439982
tx_queue_1_packets: 8090617291
tx_queue_1_bytes: 4199299032683
tx_queue_2_packets: 7811562109
tx_queue_2_bytes: 4021101469143
tx_queue_3_packets: 7788556218
tx_queue_3_bytes: 4036432035023
tx_queue_4_packets: 7579470416
tx_queue_4_bytes: 3925032074877
tx_queue_5_packets: 7907086610
tx_queue_5_bytes: 4146499979964
tx_queue_6_packets: 165795
tx_queue_6_bytes: 11656223
tx_queue_7_packets: 1410
tx_queue_7_bytes: 174840
tx_queue_8_packets: 1591
tx_queue_8_bytes: 197284
tx_queue_9_packets: 2286723
tx_queue_9_bytes: 210933242
tx_queue_10_packets: 190
tx_queue_10_bytes: 23560
tx_queue_11_packets: 141
tx_queue_11_bytes: 17484
tx_queue_12_packets: 1334792
tx_queue_12_bytes: 176220049
tx_queue_13_packets: 1219862
tx_queue_13_bytes: 160897604
tx_queue_14_packets: 1098819
tx_queue_14_bytes: 145017031
tx_queue_15_packets: 1035504
tx_queue_15_bytes: 135040573
tx_queue_16_packets: 935553
tx_queue_16_bytes: 122521022
tx_queue_17_packets: 1104874
tx_queue_17_bytes: 142870639
tx_queue_18_packets: 1068024
tx_queue_18_bytes: 140193716
tx_queue_19_packets: 1316526
tx_queue_19_bytes: 169215721
tx_queue_20_packets: 1404518
tx_queue_20_bytes: 184257242
tx_queue_21_packets: 2391428
tx_queue_21_bytes: 324914066
tx_queue_22_packets: 3018
tx_queue_22_bytes: 374232
tx_queue_23_packets: 3454
tx_queue_23_bytes: 428296
rx_queue_0_packets: 10397547303
rx_queue_0_bytes: 7826569033239
rx_queue_1_packets: 10108678420
rx_queue_1_bytes: 7705304743529
rx_queue_2_packets: 9677411142
rx_queue_2_bytes: 7455949515349
rx_queue_3_packets: 9602311738
rx_queue_3_bytes: 7306073555722
rx_queue_4_packets: 9619741384
rx_queue_4_bytes: 7238526203214
rx_queue_5_packets: 9828589007
rx_queue_5_bytes: 7583791043384
Kaspars_Zibarts
Employee

And one more, from the other node where I increased the RX ring buffer to 1024.

Discards have reduced noticeably here.

[screenshot: discard percentages after the RX ring buffer increase]

NIC statistics:
     rx_packets: 492281539
     tx_packets: 333985053
     rx_bytes: 342486420872
     tx_bytes: 170250356763
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     multicast: 851467
     collisions: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_fifo_errors: 0
     rx_missed_errors: 142523
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     rx_pkts_nic: 492281544
     tx_pkts_nic: 333985058
     rx_bytes_nic: 346487185143
     tx_bytes_nic: 173133431192
     lsc_int: 2
     tx_busy: 0
     non_eop_descs: 0
     broadcast: 14173266
     rx_no_buffer_count: 0
     tx_timeout_count: 0
     tx_restart_queue: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     tx_flow_control_xon: 11
     rx_flow_control_xon: 0
     tx_flow_control_xoff: 1074
     rx_flow_control_xoff: 0
     rx_csum_offload_errors: 2889
     alloc_rx_page_failed: 0
     alloc_rx_buff_failed: 0
     rx_no_dma_resources: 0
     hw_rsc_aggregated: 0
     hw_rsc_flushed: 0
     fdir_match: 0
     fdir_miss: 482249215
     fdir_overflow: 0
     os2bmc_rx_by_bmc: 0
     os2bmc_tx_by_bmc: 0
     os2bmc_tx_by_host: 0
     os2bmc_rx_by_host: 0
     tx_queue_0_packets: 53013804
     tx_queue_0_bytes: 28368267507
     tx_queue_1_packets: 59589477
     tx_queue_1_bytes: 30486582691
     tx_queue_2_packets: 51973202
     tx_queue_2_bytes: 25714775321
     tx_queue_3_packets: 54907040
     tx_queue_3_bytes: 30003511161
     tx_queue_4_packets: 51534709
     tx_queue_4_bytes: 24595977836
     tx_queue_5_packets: 58331198
     tx_queue_5_bytes: 30653022467
     tx_queue_6_packets: 764
     tx_queue_6_bytes: 32252
     tx_queue_7_packets: 0
     tx_queue_7_bytes: 0
     tx_queue_8_packets: 5
     tx_queue_8_bytes: 620
     tx_queue_9_packets: 2297882
     tx_queue_9_bytes: 211230646
     tx_queue_10_packets: 0
     tx_queue_10_bytes: 0
     tx_queue_11_packets: 0
     tx_queue_11_bytes: 0
     tx_queue_12_packets: 29953
     tx_queue_12_bytes: 2944055
     tx_queue_13_packets: 72259
     tx_queue_13_bytes: 6893659
     tx_queue_14_packets: 166301
     tx_queue_14_bytes: 15482975
     tx_queue_15_packets: 139160
     tx_queue_15_bytes: 12882424
     tx_queue_16_packets: 110788
     tx_queue_16_bytes: 10380792
     tx_queue_17_packets: 201929
     tx_queue_17_bytes: 18778639
     tx_queue_18_packets: 178532
     tx_queue_18_bytes: 16604576
     tx_queue_19_packets: 254072
     tx_queue_19_bytes: 23568047
     tx_queue_20_packets: 316342
     tx_queue_20_bytes: 29289618
     tx_queue_21_packets: 867640
     tx_queue_21_bytes: 80135140
     tx_queue_22_packets: 0
     tx_queue_22_bytes: 0
     tx_queue_23_packets: 0
     tx_queue_23_bytes: 0
     rx_queue_0_packets: 90810065
     rx_queue_0_bytes: 59060853917
     rx_queue_1_packets: 77696100
     rx_queue_1_bytes: 56167445221
     rx_queue_2_packets: 79309725
     rx_queue_2_bytes: 55689291854
     rx_queue_3_packets: 74184195
     rx_queue_3_bytes: 49611856230
     rx_queue_4_packets: 92202028
     rx_queue_4_bytes: 67993290818
     rx_queue_5_packets: 78079431
     rx_queue_5_bytes: 53963683758

 

 

Timothy_Hall
Champion

The interfaces look fine.  Even though you are not seeing the associated SND CPUs hit 100%, some RX-DRPs can still occur.  This happens when a large burst of traffic arrives fast enough to fill the ring buffer before a SoftIRQ run can even start; it can also happen if the CPU briefly bursts to 100% and then recovers after a few seconds, since monitoring tools like cpview and sar won't show spikes that short in history mode.  The first and best solution is always to add more SND cores by reducing the number of worker cores.  I realize this can be a pain on a VSX system that has all processors already assigned to various VSes.
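As a rough sketch for planning those CoreXL changes, the current SND vs. worker split can be checked before and after with the standard affinity listing; output layout varies between versions, and on VSX the workers are listed per VS:

# per-CPU view: interfaces indicate SND work, firewall instances indicate workers
fw ctl affinity -l -r

# interrupt distribution of the 10G interfaces across cores
grep eth /proc/interrupts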

I'm pretty sure that the interfaces are just very busy, especially since you are seeing some Ethernet Flow Control (tx_flow_control_xoff) requests but no actual overruns of the NIC buffer.  If you can't add more SND cores (or it would be too difficult to reassign everything in VSX) and Multi-Queue is already enabled, then doubling the ring buffer is acceptable.

I assume you are using the 2.6.18 kernel, which has lower queue number limits than 3.10.  If you are bumping against these limits (i.e. 2-8 for igb and 16 for ixgbe - check with ethtool -i), adding more SND cores beyond these values won't help these interfaces.  The limit for the i40e and mlx_core drivers is a whopping 48 queues.
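A minimal sketch for confirming which driver and queue limit actually applies to a given interface (cpmq is the Multi-Queue management tool on the 2.6.18-based releases; exact output differs per version, and eth2-01 is only an example):

# identify the driver (igb / ixgbe / i40e / mlx...) behind the interface
ethtool -i eth2-01

# show the current Multi-Queue state and active queue counts on 2.6.18-based Gaia
cpmq get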

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Kaspars_Zibarts
Employee

Great! That's exactly what I wanted to hear! Thanks heaps! I'll add two more SND cores - I can do some chess moves with CoreXL 🙂

Would you say we might gain some performance with the 3.10 kernel and R80.40? Apparently MQ has improved there.

Timothy_Hall
Champion

MQ is enabled by default on all interfaces under Gaia 3.10 except the management interface, and the queue limits are higher for some interfaces too.  Gaia 3.10 also permits the use of much later network driver versions than 2.6.18 did (since it is really the network drivers doing much of the MQ heavy lifting), so it may well be more efficient.  MQ is configured directly from clish too, and the MQ setup is per VS under VSX.  There is also a new "out of the box" tool that is supposed to auto-configure MQ into the most efficient state possible when first loaded.
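For checking this on a 3.10 box, the standard ethtool channel query works against the newer drivers, and as far as I recall mq_mng is the management tool there (treat the exact flags as version-dependent):

# current vs. hardware-maximum combined queue (channel) count for an interface
ethtool -l eth2-01

# overall Multi-Queue view on Gaia 3.10 based releases
mq_mng --show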

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Kaspars_Zibarts
Employee
To be seen! I have 26k appliances configured and ready in the lab to swap out the 41k. I'm a bit nervous as we run 40Gbps on the 41k, but the 26k is supposed to handle way more than that 🙂
Kaspars_Zibarts
Employee

A slightly more efficient one-liner for the drop/discard percentage calculation that also works on the 3.10 kernel 🙂

 

netstat -ni | awk '{print $1" "$4" "$6}' | egrep "^bond|^eth" | while read line; do echo $line | awk '{name=$1; tot=$2; err=$3; tot = tot + 1; perc = err * 100 / tot; print name" "perc}'; done
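If you want to skip the inner while/echo round trip, the same calculation fits in a single awk pass; this assumes the same netstat -ni column layout as above (RX-OK in column 4, RX-DRP in column 6), which can shift between net-tools builds:

netstat -ni | awk '/^(eth|bond)/ {print $1, $6 * 100 / ($4 + 1)}'

The + 1 just keeps the divide-by-zero guard from the original version.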

 

[screenshot: output of the one-liner above]

 

krit
Participant

Very handy, thanks!
