Kaspars Zibarts

Weird 10Gb interface hangup

Discussion created by Kaspars Zibarts on Jun 28, 2018
Latest reply on Jun 28, 2018 by Aleksei Shelepov

This more of a "friday" post for fun. Although problem was real - in one of our 5900 clusters running R80.10 the standby member out of blue produced some obscure errors on one of the 10Gb bond trunks (eth1-04)

 

Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: Detected Tx Unit Hang
Jun 27 19:29:19 2018 fwf2 kernel: Tx Queue <0>
Jun 27 19:29:19 2018 fwf2 kernel: TDH, TDT <37a>, <124>
Jun 27 19:29:19 2018 fwf2 kernel: next_to_use <124>
Jun 27 19:29:19 2018 fwf2 kernel: next_to_clean <37a>
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: tx_buffer_info[next_to_clean]
Jun 27 19:29:19 2018 fwf2 kernel: time_stamp <2a1b97a3e>
Jun 27 19:29:19 2018 fwf2 kernel: jiffies <2a1b98956>
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: tx hang 1 detected on queue 0, resetting adapter
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: Reset adapter
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
Jun 27 19:29:19 2018 fwf2 kernel: bonding: bond0: link status down for idle interface eth1-04, disabling it in 200 ms.
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe: eth1-04: ixgbe_setup_mrqc: configure Symmetric RSS
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe: eth1-04: ixgbe_up_complete: Double vlan mode is not set
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: detected SFP+: 6

 

And then few seconds later all interfaces in the expansion slot started reporting continuously

Jun 27 19:30:55 2018 fwf2 kernel: ixgbe 0000:05:00.0: eth1-01: -1 Spoofed packets detected
Jun 27 19:30:55 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: -1 Spoofed packets detected
Jun 27 19:30:55 2018 fwf2 kernel: ixgbe 0000:06:00.0: eth1-03: -1 Spoofed packets detected
Jun 27 19:30:55 2018 fwf2 kernel: ixgbe 0000:05:00.1: eth1-02: -1 Spoofed packets detected

 

It was resolved by node reboot. The only relevant SK I found was this Intermittent outages of TCP traffic on 10GbE interfaces in IP Appliances running Gaia OS but it's not applicable to R80.10 nor 5900 and offload is definitely disabled on interfaces.

 

Here's the best part - the display on the appliance at the time showed this

 

 

Does this mean firewall needs to go to toilet?? P-p-p-peee.... 

Outcomes