RX-OVR drops and 10 gb hardware buffer

Daniel_Westlund · 2018-09-04

Customer's RX drops equal their RX overruns (RX-DRP matches RX-OVR). They are having issues with too many drops, which show up on policy install, when the adjacent switches see the link going down. They have two bonded 1 Gb interfaces on internal and on external. Regarding RX-OVR drops, according to SK33781:

the number of times the receiver hardware was unable to hand received data to a hardware buffer - the internal FIFO buffer of the chip is full, but it still tries to handle incoming traffic; most likely, the input rate of traffic exceeded the ability of the receiver to handle the data.

Max Power says the best solution here is to bond more interfaces. As they already have two bonded interfaces, we are considering 10 Gb interfaces. To my question then: does a 10 Gb interface have a larger hardware buffer than two bonded 1 Gb interfaces? And my follow-up question: do you have numbers on this? These are 13800s with the onboard 1 Gb and 10 Gb interfaces. This is a big customer, we are in a precarious position on this one, and the answer may help decide how we proceed. This is very time sensitive as well. Thank you.

JozkoMrkvicka · 2018-09-04

One option is to increase the RX ring buffer size, BUT as Tim has mentioned, this is not the solution.

Check this one:

https://community.checkpoint.com/message/22354-re-increasing-fifo-buffers-on-firewall-interface

cc Timothy Hall

Kind regards,
Jozko Mrkvicka

Timothy_Hall · 2018-09-05

10Gbps interfaces definitely have a lot more NIC hardware buffer space and much more processing power to avoid overruns. 1Gbps interfaces tend to start running out of gas somewhere north of 900Mbps of throughput, especially if most frames are near the minimum size. Increasing ring buffer sizes is a last resort and may get you into further trouble.

So a 10Gbps interface would certainly help, but before going down that road there are a few things that you should check:

1) Is the traffic being reasonably load balanced between the two 1Gbps bonded interfaces? Please post the output of netstat -ni showing both physical interfaces that are part of the bond. The RX-OK and TX-OK values should be roughly equivalent between the two interfaces. If RX-OK is substantially higher on one interface vs. another, check your bond operation mode/hash policy & STP setup on the upstream switch; if TX-OK is substantially higher on one interface vs. another, check your bond operation mode/hash policy on the firewall itself via the Gaia web interface.

2) Some driver/NIC combinations increment RX-DRP and RX-OVR in lock-step, and it is not possible to conclusively determine what is going on with just the netstat command. Please post the output of ethtool -S (interface) for the two physical interfaces. With this info it will be possible to see whether overruns or drops/misses are occurring on the interfaces. The mitigation strategy for one vs. the other is quite different.

3) If the adjacent switches are truly losing link integrity, that sounds like a physical issue, since no matter how bad RX-DRPs/RX-OVRs get, they should never cause a loss of link integrity. The ethtool output will show if there are actual carrier transitions occurring. It is also possible the switches are seeing the firewall NICs sending a flow control XOFF, but that situation is quite different from actually losing carrier.
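
For check 1, the RX-OK/TX-OK comparison can be scripted. The following is a minimal Python sketch, not a Check Point tool: it assumes the usual netstat -ni column layout, and the helper names (parse_netstat_ni, imbalance) are made up for illustration. The two sample rows are the eth1-05 and eth1-06 counters posted further down in this thread.

```python
def parse_netstat_ni(text):
    """Return {iface: {"RX-OK": int, "TX-OK": int}} from `netstat -ni` text."""
    counters = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Iface":
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # skips the "Kernel Interface table" banner and odd lines
        row = dict(zip(header, fields))
        counters[row["Iface"]] = {
            "RX-OK": int(row["RX-OK"]),
            "TX-OK": int(row["TX-OK"]),
        }
    return counters


def imbalance(counters, iface_a, iface_b, column):
    """Ratio of the busier slave to the quieter one (1.0 means perfectly even)."""
    low, high = sorted((counters[iface_a][column], counters[iface_b][column]))
    return float("inf") if low == 0 else high / low


# Rows copied from the netstat -ni output posted later in this thread.
SAMPLE = """\
Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1-05    1500   0 3207332404      0 15410 15410 5565121913      0      0      0 BMsRU
eth1-06    1500   0 2664661974      0 30081 30081 1757965501      0      0      0 BMsRU
"""

stats = parse_netstat_ni(SAMPLE)
for column in ("RX-OK", "TX-OK"):
    ratio = imbalance(stats, "eth1-05", "eth1-06", column)
    print(f"{column}: busier slave carries {ratio:.2f}x the quieter one")
```

A ratio near 1.0 means the two slaves carry similar load; how far from 1.0 counts as "substantially higher" is a judgment call, not something the script decides.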

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Attend my online "Be your Own TAC: Part Deux" CheckMates event
March 27th with sessions for both the EMEA and Americas time zones

Daniel_Westlund · 2018-09-05

Thanks! An SE is going on site today and we may get the info you asked about. Since I posted this, I found a thread on CPUG about an issue similar to what we are seeing here, so for the time being I'm running with your ideas from that thread: that the RX-OVR may just be regular drops in lock-step, as you say here, and that "Keep all connections" should reduce the load on policy push, which is when we see connections drop, RX drops, and the link showing as down from the switch. The customer understands that this is more of a fix for the symptom than for the core issue.

Daniel_Westlund · 2018-09-05

Below are the netstat -ni output and the ethtool -S (interface) output from one of the physical interfaces; they all look similar. I don't see overrun errors on the physical interface. I do see in netstat -ni that TX-OK is about 3 times higher for the first interface in the bond than for the second. They are running 802.3ad for the bond operation mode. The ClusterXL Admin guide says "All the slave interfaces of a bond must be connected to the same switch. The switch itself must support and be configured for Link Aggregation, by the same standard (for example, 802.3ad, or XOR) as the Security Gateway bond." I do not know that to be the case on the switch, but I can check. Let me know your thoughts:

Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
Mgmt       1500   0        0      0      0      0        0      0      0      0 BMU
Sync       1500   0 139761106      0   4769      0 483359145      0      0      0 BMRU
bond1      1500   0 5871992169      0 45491 45491 7323084565      0      0      0 BMmRU
bond1.5    1500   0 256770036      0      0      0 197671961      0      0      0 BMmRU
bond1.6    1500   0        7      0      0      0     7051      0      0      0 BMmRU
bond1.10   1500   0 5349158116      0      0      0 6813570243      0      0      0 BMmRU
bond1.162 1500   0 7998804      0      0      0 9863117      0      0      0 BMmRU
bond1.164 1500   0       94      0      0      0   302840      0      0      0 BMmRU
bond1.2096 1500   0 5913743      0      0      0 7977585      0      0      0 BMmRU
bond1.2128 1500   0 240366030      0      0      0 287076273      0      0      0 BMmRU
bond2      1500   0 8035743271      0 232231 232231 6498159149      0      0      0 BMmRU
bond2.2    1500   0 417512540      0      0      0 493632825      0      0      0 BMmRU
bond2.3    1500   0 6657176536      0      0      0 5183424132      0      0      0 BMmRU
bond2.4    1500   0 221669412      0      0      0 173697823      0      0      0 BMmRU
bond2.7    1500   0        7      0      0      0     7051      0      0      0 BMmRU
bond2.11   1500   0 715470330      0      0      0 639767094      0      0      0 BMmRU
bond2.12   1500   0       26      0      0      0    16350      0      0      0 BMmRU
bond2.69   1500   0   252108      0      0      0   204379      0      0      0 BMmRU
bond2.209 1500   0 4406029      0      0      0   753373      0      0      0 BMmRU
bond2.2231 1500   0   575132      0      0      0   176454      0      0      0 BMmRU
bond2.2232 1500   0        7      0      0      0     7051      0      0      0 BMmRU
bond2.2233 1500   0 3237699      0      0      0 3268497      0      0      0 BMmRU
eth1-05    1500   0 3207332404      0 15410 15410 5565121913      0      0      0 BMsRU
eth1-06    1500   0 2664661974      0 30081 30081 1757965501      0      0      0 BMsRU
eth1-07    1500   0 4295070957      0 130127 130127 4859459422      0      0      0 BMsRU
eth1-08    1500   0 3740674264      0 102104 102104 1638701316      0      0      0 BMsRU
lo        16436   0 14648647      0      0      0 14648647      0      0      0 LRU

NIC statistics:
     rx_packets: 3743909849
     tx_packets: 1640140201
     rx_bytes: 3340303812889
     tx_bytes: 695877051313
     rx_broadcast: 9467756
     tx_broadcast: 8181805
     rx_multicast: 506441
     tx_multicast: 14089
     multicast: 506441
     collisions: 0
     rx_crc_errors: 0
     rx_no_buffer_count: 83
     rx_missed_errors: 102104
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 0
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 3340303812889
     tx_dma_out_of_sync: 0
     lro_aggregated: 0
     lro_flushed: 0
     lro_recycled: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     os2bmc_rx_by_bmc: 0
     os2bmc_tx_by_bmc: 0
     os2bmc_tx_by_host: 0
     os2bmc_rx_by_host: 0
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_frame_errors: 0
     rx_fifo_errors: 102104
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_queue_0_packets: 196959828
     tx_queue_0_bytes: 83040941180
     tx_queue_0_restart: 0
     tx_queue_1_packets: 201112750
     tx_queue_1_bytes: 92708792933
     tx_queue_1_restart: 1
     tx_queue_2_packets: 213423707
     tx_queue_2_bytes: 86340004477
     tx_queue_2_restart: 1
     tx_queue_3_packets: 216066507
     tx_queue_3_bytes: 83047793486
     tx_queue_3_restart: 1
     tx_queue_4_packets: 204186677
     tx_queue_4_bytes: 73263555527
     tx_queue_4_restart: 0
     tx_queue_5_packets: 204344731
     tx_queue_5_bytes: 92081661081
     tx_queue_5_restart: 2
     tx_queue_6_packets: 213009461
     tx_queue_6_bytes: 93208852639
     tx_queue_6_restart: 1
     tx_queue_7_packets: 191036540
     tx_queue_7_bytes: 77855042012
     tx_queue_7_restart: 0
     rx_queue_0_packets: 453259566
     rx_queue_0_bytes: 409861555187
     rx_queue_0_drops: 0
     rx_queue_0_csum_err: 835
     rx_queue_0_alloc_failed: 0
     rx_queue_1_packets: 489210382
     rx_queue_1_bytes: 429176166020
     rx_queue_1_drops: 0
     rx_queue_1_csum_err: 782
     rx_queue_1_alloc_failed: 0
     rx_queue_2_packets: 525554274
     rx_queue_2_bytes: 416704083839
     rx_queue_2_drops: 0
     rx_queue_2_csum_err: 1347
     rx_queue_2_alloc_failed: 0
     rx_queue_3_packets: 477789928
     rx_queue_3_bytes: 449291458631
     rx_queue_3_drops: 0
     rx_queue_3_csum_err: 1047
     rx_queue_3_alloc_failed: 0
     rx_queue_4_packets: 464353091
     rx_queue_4_bytes: 423139957349
     rx_queue_4_drops: 0
     rx_queue_4_csum_err: 1394
     rx_queue_4_alloc_failed: 0
     rx_queue_5_packets: 455349424
     rx_queue_5_bytes: 390616991572
     rx_queue_5_drops: 0
     rx_queue_5_csum_err: 1065
     rx_queue_5_alloc_failed: 0
     rx_queue_6_packets: 428981359
     rx_queue_6_bytes: 384441256460
     rx_queue_6_drops: 0
     rx_queue_6_csum_err: 493
     rx_queue_6_alloc_failed: 0
     rx_queue_7_packets: 449411832
     rx_queue_7_bytes: 407122763165
     rx_queue_7_drops: 0
     rx_queue_7_csum_err: 1075
     rx_queue_7_alloc_failed: 0
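
The handful of counters that bear on the drops-versus-overruns and carrier questions can be pulled out of output like the above with a short script. This is a minimal Python sketch: the counter names are the ones that appear in this igb output and may differ on other drivers, and the helper names are hypothetical. The sample text is an excerpt of the eth1-08 values above.

```python
# Counters discussed in this thread; names as they appear in the igb output.
WATCHED = (
    "rx_missed_errors",
    "rx_no_buffer_count",
    "rx_fifo_errors",
    "rx_over_errors",
    "tx_carrier_errors",
    "rx_flow_control_xoff",
    "tx_flow_control_xoff",
)


def parse_ethtool_stats(text):
    """Parse `ethtool -S` text into {counter_name: int}."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # The "NIC statistics:" banner has no value, so it is skipped here.
        if sep and value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def summarize(stats, watched=WATCHED):
    """Keep only the watched counters that are present in the output."""
    return {name: stats[name] for name in watched if name in stats}


SAMPLE = """\
NIC statistics:
     rx_packets: 3743909849
     rx_no_buffer_count: 83
     rx_missed_errors: 102104
     tx_carrier_errors: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xoff: 0
     rx_over_errors: 0
     rx_fifo_errors: 102104
"""

for name, value in summarize(parse_ethtool_stats(SAMPLE)).items():
    print(f"{name:24} {value}")
```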

Timothy_Hall · 2018-09-05

OK now we are getting somewhere:

0) No carrier transitions, at least on the eth1-08 interface. Looks like bond1 consists of eth1-05 and eth1-06, while bond2 is eth1-07 and eth1-08. RX-DRP percentage is far less than the target of 0.1% though.

1) Looks like you only provided the ethtool stats for eth1-08, but it is showing 99% misses/drops. There were 83 overruns, probably caused by the ring buffer being full and putting backpressure on the NIC buffer, which was then overrun a few times. I would expect the other interfaces to be similar; see #3 below...

2) Inbound RX balancing of the bonds looks good, but TX numbers are far enough apart that you probably should set L3/L4 hash balancing if you haven't already, although you don't seem to be having any problems on the TX side.

3) Given this is a 13800 with 20 cores, you are almost certainly running with the default split of 2/18 (4/36 if SMT is enabled) for CoreXL allocations. So only two physical SND/IRQ cores are emptying the ring buffers of four very busy 1Gbps interfaces; they are not keeping up, which causes the drops/misses. If a large percentage of traffic is accelerated (use fwaccel stats -s to check), those 2 cores will be getting absolutely killed and will seriously crimp the throughput of the box.

Would strongly recommend adjusting the CoreXL split via cpconfig. If "Accelerated pkts/Total pkts" is >50% as reported by fwaccel stats -s, reduce the number of kernel instances from 18 to 14 to allocate 6 SND/IRQ cores; you may also want to disable SMT/Hyperthreading in this instance.

If "Accelerated pkts/Total pkts" is <50% as reported by fwaccel stats -s, reduce the number of kernel instances from 18/36 to 16/32 to allocate more SND/IRQ cores, and leave SMT/Hyperthreading on.
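
The core arithmetic behind these splits can be sanity-checked with a few lines of Python. This sketch assumes the simplification that every core is either a kernel instance or an SND/IRQ core; snd_cores is a hypothetical helper, not a Check Point command, and the core counts are the ones quoted above (20 physical cores, 40 with SMT).

```python
def snd_cores(total_cores, kernel_instances):
    """Cores left over for SND/IRQ work after assigning kernel instances."""
    if not 0 < kernel_instances < total_cores:
        raise ValueError("kernel_instances must be between 1 and total_cores - 1")
    return total_cores - kernel_instances


# (description, total cores, kernel instances) taken from the figures above.
SCENARIOS = [
    ("default split, SMT off", 20, 18),
    ("default split, SMT on", 40, 36),
    (">50% accelerated, SMT off", 20, 14),
    ("<50% accelerated, SMT off", 20, 16),
    ("<50% accelerated, SMT on", 40, 32),
]

for label, total, instances in SCENARIOS:
    print(f"{label:28} {snd_cores(total, instances)} SND/IRQ / {instances} instances")
```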


Daniel_Westlund · 2018-09-05

Thanks again. We are actually way below 50% acceleration because they have a ton of blades turned on, and we already have Hyperthreading enabled with 8 SNDs and 32 workers, like you suggested. Multi-Queue is enabled on these interfaces (igb) and we added all 8 SNDs to it. Our next plan is to change to "Keep all connections" on policy install. I don't know offhand how to set L3/L4 hash balancing, but I am looking into it.

Timothy_Hall · 2018-09-07

Right, since you have plenty of SND/IRQ instances, automatic interface affinity (assuming SecureXL is enabled) will eventually assign each firewall interface its own dedicated SND/IRQ core to empty its ring buffer. If you are still experiencing RX-DRPs in that situation, that's the time to enable Multi-Queue on your busiest interfaces, which you have already done.

You may want to run sar -n EDEV to see exactly when you are piling up those RX-DRPs. If they are only happening around the time of policy installation that is expected to some degree, and setting "Keep all connections" under Connection Persistence will help. If you are slowly accumulating them over time though, I'd argue that is not an actual problem since your RX-DRP rate is well below the generally recommended 0.1% (yours is actually 0.002%). This is a rather deep topic; please see the "RX-DRP Analysis & Discussion" section of my book for a discussion about why we don't necessarily expect RX-DRP to always be zero, even on a well-tuned firewall. RX-ERR and RX-OVR on the other hand are another matter entirely and should be zero or very close to it.
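
The 0.002% figure can be reproduced from the netstat -ni counters posted earlier in this thread. The Python sketch below is illustrative only: it assumes the rate is RX-DRP divided by RX-OK and that bond1 and bond2 are combined, which is a guess at how the number was derived; drop_rate_percent is a hypothetical helper.

```python
TARGET_PERCENT = 0.1  # generally recommended ceiling quoted in this thread


def drop_rate_percent(rx_ok, rx_drp):
    """RX-DRP as a percentage of RX-OK (0.0 if nothing was received)."""
    return 100.0 * rx_drp / rx_ok if rx_ok else 0.0


# (RX-OK, RX-DRP) for the two bonds, from the netstat -ni output in this thread.
BONDS = {
    "bond1": (5871992169, 45491),
    "bond2": (8035743271, 232231),
}

for name, (rx_ok, rx_drp) in BONDS.items():
    print(f"{name}: {drop_rate_percent(rx_ok, rx_drp):.4f}%")

total_ok = sum(rx_ok for rx_ok, _ in BONDS.values())
total_drp = sum(rx_drp for _, rx_drp in BONDS.values())
combined = drop_rate_percent(total_ok, total_drp)
print(f"combined: {combined:.4f}% (under target: {combined < TARGET_PERCENT})")
```

These are lifetime counters, so the result says nothing about when the drops occurred; that is what the sar -n EDEV history is for.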


Daniel_Westlund · 2018-09-07

Thank you. We are sure the drops in question are happening on policy push so I think you're right that Keep all connections is a good next step. 23900s are on the way as well.
