RX-OVR drops and 10 gb hardware buffer
The customer's RX-DRP counts equal their RX-OVR counts. They are having issues with too many drops, which manifest at policy install as the adjacent switches seeing the link go down. They have two 1Gbps interfaces bonded on both the internal and external sides. Regarding RX-OVR drops, according to sk33781:
the number of times the receiver hardware was unable to hand received data to a hardware buffer - the internal FIFO buffer of the chip is full, but it still tries to handle incoming traffic; most likely, the input rate of traffic exceeded the ability of the receiver to handle the data.
The Max Power book says the best solution here is to bond more interfaces. As they already have two bonded interfaces, we are considering 10Gbps interfaces. So, to my question: does a 10Gbps interface have a larger hardware buffer than two bonded 1Gbps interfaces? And my follow-up question: do you have numbers on this? These are 13800 appliances with the onboard 1Gbps and 10Gbps interfaces. This is a big customer, we are in a precarious position on this one, and the answer may help decide how we proceed. It is very time sensitive as well. Thank you.
One option is to increase the RX ring buffer size, BUT as Tim has mentioned, this is not the real solution.
Check this one:
https://community.checkpoint.com/message/22354-re-increasing-fifo-buffers-on-firewall-interface
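For reference, a hedged sketch of how the current and maximum RX ring sizes can be inspected and, strictly as a last resort, raised with ethtool; the interface name is one of the bond members that appears later in this thread, and the size value is purely illustrative:
# Show the hardware maximums and the currently configured ring sizes
ethtool -g eth1-05
# Last resort only: raise the RX ring toward the hardware maximum
# (illustrative value; larger rings add latency and can mask the real bottleneck)
ethtool -G eth1-05 rx 1024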
cc Timothy Hall
Jozko Mrkvicka
10Gbps interfaces definitely have a lot more NIC hardware buffer space and much more processing power to avoid overruns. 1Gbps interfaces tend to start running out of gas somewhere north of 900Mbps of throughput, especially if most frames are near the minimum size. Increasing ring buffer sizes is a last resort and may get you into further trouble.
So a 10Gbps interface would certainly help, but before going down that road there are a few things that you should check:
1) Is the traffic being reasonably load balanced between the two 1Gbps bonded interfaces? Please post the output of netstat -ni showing both physical interfaces that are part of the bond (a command sketch follows this list). The RX-OK and TX-OK values should be roughly equivalent between the two interfaces. If RX-OK is substantially higher on one interface vs. the other, check your bond operation mode/hash policy & STP setup on the upstream switch; if TX-OK is substantially higher on one interface vs. the other, check your bond operation mode/hash policy on the firewall itself via the Gaia web interface.
2) Some driver/NIC combinations increment RX-DRP and RX-OVR in lock-step, and it is not possible to conclusively determine what is going on with just the netstat command. Please post the output of ethtool -S (interface) for the two physical interfaces; with this info it will be possible to see whether there are overruns or drops/misses occurring on the interfaces. The mitigation strategy for one vs. the other is quite different.
3) If the adjacent switches are truly losing link integrity, that sounds like a physical issue, since no matter how bad the RX-DRPs/RX-OVRs get, they should never cause a loss of link integrity. The ethtool output will show whether actual carrier transitions are occurring; it is also possible the switches are seeing the firewall NICs send a flow-control XOFF, but that situation is quite different from actually losing carrier.
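A hedged sketch of the checks in items 1 and 2; the interface names are the bond members that turn up later in this thread, and the exact ethtool -S counter names vary by driver (the grep patterns match the igb counters posted below):
# Per-interface counters for the bond members; compare RX-OK and TX-OK
netstat -ni | egrep 'Iface|eth1-05|eth1-06|eth1-07|eth1-08'
# Full NIC statistics: rx_missed_errors suggests a full ring buffer,
# while rx_fifo_errors / rx_over_errors point at a NIC FIFO overrun
ethtool -S eth1-05 | egrep 'missed|fifo|over|no_buffer|carrier'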
--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com
CET (Europe) Timezone Course Scheduled for July 1-2
Thanks! An SE is going on site today, and we may get the info you asked about. Since I posted this, I found a thread on CPUG about an issue similar to what we are seeing here, so for the time being I'm running with your ideas from that thread: that the RX-OVRs may just be regular drops incrementing in lock-step, as you say here, and that "Keep all connections" should reduce the load at policy push, which is when we see connections drop, RX-DRPs climb, and the link show as down from the switch. The customer understands that this is more a fix for the symptom than for the core issue.
Below are netstat -ni output and ethtool -S (interface) output from one of the physical interfaces; they all look similar. I don't see overrun errors on the physical interfaces. I do see in netstat -ni that TX-OK is about three times higher for the first interface in the bond than for the second. They are running 802.3ad as the bond operational mode. The ClusterXL Admin Guide says: "All the slave interfaces of a bond must be connected to the same switch. The switch itself must support and be configured for Link Aggregation, by the same standard (for example, 802.3ad, or XOR) as the Security Gateway bond." I do not know that to be the case on the switch, but I can check (a sketch of how to confirm the bond mode from the gateway side follows the output). Let me know your thoughts:
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
Mgmt 1500 0 0 0 0 0 0 0 0 0 BMU
Sync 1500 0 139761106 0 4769 0 483359145 0 0 0 BMRU
bond1 1500 0 5871992169 0 45491 45491 7323084565 0 0 0 BMmRU
bond1.5 1500 0 256770036 0 0 0 197671961 0 0 0 BMmRU
bond1.6 1500 0 7 0 0 0 7051 0 0 0 BMmRU
bond1.10 1500 0 5349158116 0 0 0 6813570243 0 0 0 BMmRU
bond1.162 1500 0 7998804 0 0 0 9863117 0 0 0 BMmRU
bond1.164 1500 0 94 0 0 0 302840 0 0 0 BMmRU
bond1.2096 1500 0 5913743 0 0 0 7977585 0 0 0 BMmRU
bond1.2128 1500 0 240366030 0 0 0 287076273 0 0 0 BMmRU
bond2 1500 0 8035743271 0 232231 232231 6498159149 0 0 0 BMmRU
bond2.2 1500 0 417512540 0 0 0 493632825 0 0 0 BMmRU
bond2.3 1500 0 6657176536 0 0 0 5183424132 0 0 0 BMmRU
bond2.4 1500 0 221669412 0 0 0 173697823 0 0 0 BMmRU
bond2.7 1500 0 7 0 0 0 7051 0 0 0 BMmRU
bond2.11 1500 0 715470330 0 0 0 639767094 0 0 0 BMmRU
bond2.12 1500 0 26 0 0 0 16350 0 0 0 BMmRU
bond2.69 1500 0 252108 0 0 0 204379 0 0 0 BMmRU
bond2.209 1500 0 4406029 0 0 0 753373 0 0 0 BMmRU
bond2.2231 1500 0 575132 0 0 0 176454 0 0 0 BMmRU
bond2.2232 1500 0 7 0 0 0 7051 0 0 0 BMmRU
bond2.2233 1500 0 3237699 0 0 0 3268497 0 0 0 BMmRU
eth1-05 1500 0 3207332404 0 15410 15410 5565121913 0 0 0 BMsRU
eth1-06 1500 0 2664661974 0 30081 30081 1757965501 0 0 0 BMsRU
eth1-07 1500 0 4295070957 0 130127 130127 4859459422 0 0 0 BMsRU
eth1-08 1500 0 3740674264 0 102104 102104 1638701316 0 0 0 BMsRU
lo 16436 0 14648647 0 0 0 14648647 0 0 0 LRU
NIC statistics:
rx_packets: 3743909849
tx_packets: 1640140201
rx_bytes: 3340303812889
tx_bytes: 695877051313
rx_broadcast: 9467756
tx_broadcast: 8181805
rx_multicast: 506441
tx_multicast: 14089
multicast: 506441
collisions: 0
rx_crc_errors: 0
rx_no_buffer_count: 83
rx_missed_errors: 102104
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 0
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 0
tx_tcp_seg_failed: 0
rx_flow_control_xon: 0
rx_flow_control_xoff: 0
tx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_long_byte_count: 3340303812889
tx_dma_out_of_sync: 0
lro_aggregated: 0
lro_flushed: 0
lro_recycled: 0
tx_smbus: 0
rx_smbus: 0
dropped_smbus: 0
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
rx_errors: 0
tx_errors: 0
tx_dropped: 0
rx_length_errors: 0
rx_over_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 102104
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_queue_0_packets: 196959828
tx_queue_0_bytes: 83040941180
tx_queue_0_restart: 0
tx_queue_1_packets: 201112750
tx_queue_1_bytes: 92708792933
tx_queue_1_restart: 1
tx_queue_2_packets: 213423707
tx_queue_2_bytes: 86340004477
tx_queue_2_restart: 1
tx_queue_3_packets: 216066507
tx_queue_3_bytes: 83047793486
tx_queue_3_restart: 1
tx_queue_4_packets: 204186677
tx_queue_4_bytes: 73263555527
tx_queue_4_restart: 0
tx_queue_5_packets: 204344731
tx_queue_5_bytes: 92081661081
tx_queue_5_restart: 2
tx_queue_6_packets: 213009461
tx_queue_6_bytes: 93208852639
tx_queue_6_restart: 1
tx_queue_7_packets: 191036540
tx_queue_7_bytes: 77855042012
tx_queue_7_restart: 0
rx_queue_0_packets: 453259566
rx_queue_0_bytes: 409861555187
rx_queue_0_drops: 0
rx_queue_0_csum_err: 835
rx_queue_0_alloc_failed: 0
rx_queue_1_packets: 489210382
rx_queue_1_bytes: 429176166020
rx_queue_1_drops: 0
rx_queue_1_csum_err: 782
rx_queue_1_alloc_failed: 0
rx_queue_2_packets: 525554274
rx_queue_2_bytes: 416704083839
rx_queue_2_drops: 0
rx_queue_2_csum_err: 1347
rx_queue_2_alloc_failed: 0
rx_queue_3_packets: 477789928
rx_queue_3_bytes: 449291458631
rx_queue_3_drops: 0
rx_queue_3_csum_err: 1047
rx_queue_3_alloc_failed: 0
rx_queue_4_packets: 464353091
rx_queue_4_bytes: 423139957349
rx_queue_4_drops: 0
rx_queue_4_csum_err: 1394
rx_queue_4_alloc_failed: 0
rx_queue_5_packets: 455349424
rx_queue_5_bytes: 390616991572
rx_queue_5_drops: 0
rx_queue_5_csum_err: 1065
rx_queue_5_alloc_failed: 0
rx_queue_6_packets: 428981359
rx_queue_6_bytes: 384441256460
rx_queue_6_drops: 0
rx_queue_6_csum_err: 493
rx_queue_6_alloc_failed: 0
rx_queue_7_packets: 449411832
rx_queue_7_bytes: 407122763165
rx_queue_7_drops: 0
rx_queue_7_csum_err: 1075
rx_queue_7_alloc_failed: 0
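For completeness, a hedged sketch of how the bond mode, transmit hash policy, and LACP partner details can be confirmed from the gateway side; the bond names come from the netstat -ni output above, and the /proc path is standard Linux bonding rather than anything Check Point specific:
# Shows the 802.3ad mode, transmit hash policy, per-slave state,
# and the LACP partner information reported by the switch
cat /proc/net/bonding/bond1
cat /proc/net/bonding/bond2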
OK now we are getting somewhere:
0) No carrier transitions, at least on the eth1-08 interface. Looks like bond1 consists of eth1-05 and eth1-06, while bond2 is eth1-07 and eth1-08. RX-DRP percentage is far less than the target of 0.1% though.
1) It looks like you only provided the ethtool stats for eth1-08, but it shows that roughly 99% of the events are misses/drops (rx_missed_errors) rather than true overruns. There were 83 overruns (rx_no_buffer_count), probably caused by the ring buffer filling up and creating backpressure into the NIC buffer, which was then overrun a few times. I would expect the other interfaces to be similar; see #3 below...
2) Inbound RX balancing of the bonds looks good, but the TX numbers are far enough apart that you should probably set an L3/L4 hash policy if you haven't already, although you don't seem to be having any problems on the TX side.
3) Given this is a 13800 with 20 cores, you are almost certainly running the default CoreXL split of 2/18 (4/36 if SMT is enabled). So only two physical SND/IRQ cores are emptying the ring buffers of four very busy 1Gbps interfaces; they are not keeping up, which causes the drops/misses. If a large percentage of traffic is accelerated (use fwaccel stats -s to check; a command sketch follows below), those two cores will be getting absolutely killed and will seriously crimp the throughput of the box.
I would strongly recommend adjusting the CoreXL split via cpconfig. If "Accelerated pkts/Total pkts" as reported by fwaccel stats -s is >50%, reduce the number of kernel instances from 18 to 14 to allocate 6 SND/IRQ cores; you may also want to disable SMT/Hyperthreading in that case.
If "Accelerated pkts/Total pkts" as reported by fwaccel stats -s is <50%, reduce the number of kernel instances from 18/36 to 16/32 to allocate more SND/IRQ cores and leave SMT/Hyperthreading on.
--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com
CET (Europe) Timezone Course Scheduled for July 1-2
Thanks again. We are actually well below 50% acceleration because they have a lot of blades turned on, and we already have Hyperthreading enabled with 8 SNDs and 32 workers, as you suggested. Multi-Queue is enabled on these interfaces (igb driver) and we added all 8 SNDs to Multi-Queue. Our next plan is to change to "Keep all connections" for policy install. I don't know offhand how to set L3/L4 hash balancing, but I am looking into how to do that (see the sketch below).
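For reference, a hedged sketch of where the transmit hash policy is typically set; the clish syntax below is my assumption of the Gaia bonding command and should be verified against the Gaia Administration Guide for your version, and the group number is taken from the outputs above:
# Gaia clish (verify the exact syntax for your Gaia version)
set bonding group 1 xmit-hash-policy layer3+4
save config
# Confirm what the bonding driver is actually using
grep -i "hash policy" /proc/net/bonding/bond1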
Right, since you have plenty of SND/IRQ instances, automatic interface affinity (assuming SecureXL is enabled) will eventually assign each firewall interface its own dedicated SND/IRQ core to empty its ring buffer. If you are still experiencing RX-DRPs in that situation, that is the time to enable Multi-Queue on your busiest interfaces, which you have already done.
You may want to run sar -n EDEV to see exactly when you are piling up those RX-DRPs (a sketch follows). If they are only happening around the time of policy installation, that is expected to some degree, and setting "Keep all connections" under Connection Persistence will help. If you are slowly accumulating them over time though, I'd argue that is not an actual problem, since your RX-DRP rate is well below the generally recommended 0.1% (yours is actually 0.002%). This is a rather deep topic; please see the "RX-DRP Analysis & Discussion" section of my book for a discussion of why we don't necessarily expect RX-DRP to always be zero, even on a well-tuned firewall. RX-ERR and RX-OVR, on the other hand, are another matter entirely and should be zero or very close to it.
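A hedged sketch of that check; the 5-second interval is illustrative, and the awk one-liner just recomputes the cumulative RX-DRP percentage from the netstat -ni columns shown earlier in the thread:
# Per-interface error/drop rates every 5 seconds; watch rxdrop/s around a policy push
sar -n EDEV 5
# Cumulative RX-DRP as a percentage of RX-OK per interface
netstat -ni | awk 'NR>2 && $4 > 0 { printf "%-12s %.4f%%\n", $1, ($6/$4)*100 }'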
--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com
CET (Europe) Timezone Course Scheduled for July 1-2
Thank you. We are sure the drops in question are happening at policy push, so I think you're right that "Keep all connections" is a good next step. 23900s are on the way as well.
