Solved: Re: Weird 10Gb interface hangup

Kaspars_Zibarts · ‎2018-06-28

This is more of a "Friday" post for fun, although the problem was real: in one of our 5900 clusters running R80.10, the standby member out of the blue produced some obscure errors on one of the 10Gb bond trunks (eth1-04):

Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: Detected Tx Unit Hang
Jun 27 19:29:19 2018 fwf2 kernel: Tx Queue <0>
Jun 27 19:29:19 2018 fwf2 kernel: TDH, TDT <37a>, <124>
Jun 27 19:29:19 2018 fwf2 kernel: next_to_use <124>
Jun 27 19:29:19 2018 fwf2 kernel: next_to_clean <37a>
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: tx_buffer_info[next_to_clean]
Jun 27 19:29:19 2018 fwf2 kernel: time_stamp <2a1b97a3e>
Jun 27 19:29:19 2018 fwf2 kernel: jiffies <2a1b98956>
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: tx hang 1 detected on queue 0, resetting adapter
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: Reset adapter
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
Jun 27 19:29:19 2018 fwf2 kernel: bonding: bond0: link status down for idle interface eth1-04, disabling it in 200 ms.
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe: eth1-04: ixgbe_setup_mrqc: configure Symmetric RSS
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe: eth1-04: ixgbe_up_complete: Double vlan mode is not set
Jun 27 19:29:19 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: detected SFP+: 6

And then, a few seconds later, all interfaces in the expansion slot started reporting this continuously:

Jun 27 19:30:55 2018 fwf2 kernel: ixgbe 0000:05:00.0: eth1-01: -1 Spoofed packets detected
Jun 27 19:30:55 2018 fwf2 kernel: ixgbe 0000:06:00.1: eth1-04: -1 Spoofed packets detected
Jun 27 19:30:55 2018 fwf2 kernel: ixgbe 0000:06:00.0: eth1-03: -1 Spoofed packets detected
Jun 27 19:30:55 2018 fwf2 kernel: ixgbe 0000:05:00.1: eth1-02: -1 Spoofed packets detected

It was resolved by rebooting the node. The only relevant SK I found was "Intermittent outages of TCP traffic on 10GbE interfaces in IP Appliances running Gaia OS", but it's not applicable to R80.10 or the 5900, and offload is definitely disabled on the interfaces.
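For anyone wanting to check whether their own logs show the same two signatures, here is a minimal Python sketch that counts them per interface. The regular expressions are derived only from the log lines quoted above; the function name `summarize` is made up for illustration, and other ixgbe driver versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns modelled on the kernel messages quoted in this thread (assumption:
# your driver version logs them with the same wording).
TX_HANG = re.compile(r"ixgbe \S+: (\S+): Detected Tx Unit Hang")
SPOOFED = re.compile(r"ixgbe \S+: (\S+): -?\d+ Spoofed packets detected")


def summarize(lines):
    """Count Tx unit hang and spoofed-packet messages per interface name."""
    hangs, spoofed = Counter(), Counter()
    for line in lines:
        match = TX_HANG.search(line)
        if match:
            hangs[match.group(1)] += 1
            continue
        match = SPOOFED.search(line)
        if match:
            spoofed[match.group(1)] += 1
    return hangs, spoofed
```

To use it, pass any iterable of log lines, for example an open file handle on a copy of the messages file (typically /var/log/messages, but adjust for your system). It is best run on a copy taken off the box rather than on the gateway itself.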

Here's the best part - the display on the appliance at the time showed this

Does this mean firewall needs to go to toilet?? P-p-p-peee....

AlekseiShelepov · ‎2018-06-28

I guess some things never change...

I think I used a wrong ISO file that time.

Timothy_Hall · ‎2018-06-28

Sounds like the NIC hardware is what went into the toilet. The display was telling you that the "stream" of outbound packets was no longer getting handled by the NIC, and that the firewall's bladder was too full, which can certainly be uncomfortable to say the least. 🙂

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

New Book: "Max Power 2026" Coming Soon
Check Point Firewall Performance Optimization

Kaspars_Zibarts · ‎2018-06-29

Mike_Jones · ‎2019-04-08

Sorry to bring an old post back to life, but I'm having a similar error on a 10Gb interface on a 5900. A reboot initially fixed the issue, but it came back and again required a reboot (not to mention a disk check on each reboot).

Did you have any more problems with your 5900 after your reboot?

Timothy_Hall · ‎2019-04-08

What code version is the firewall using? Make sure you have the latest GA Jumbo HFA applied as updated NIC driver versions are sometimes bundled in Jumbo HFAs.
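One way to see which NIC driver version an interface is actually running is `ethtool -i <interface>`, which prints `key: value` lines such as `driver`, `version`, `firmware-version` and `bus-info`. The following is a minimal Python sketch for collecting those fields so they can be compared before and after a Jumbo HFA; the helper names are made up for illustration, and it assumes Python 3.7+ with `ethtool` on the PATH.

```python
import subprocess


def parse_ethtool_info(text):
    """Turn the 'key: value' lines printed by `ethtool -i` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI addresses
        # (which contain colons themselves) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(interface):
    """Run `ethtool -i <interface>` and return the parsed fields."""
    result = subprocess.run(
        ["ethtool", "-i", interface],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ethtool_info(result.stdout)
```

For example, `driver_info("eth1-04")["version"]` would return the driver version string for the interface discussed in this thread, if the command is available on the system where the script runs.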

Mike_Jones · ‎2019-04-08

Product version Check Point Gaia R80.10
OS build 479
OS kernel version 2.6.18-92cpx86_64
OS edition 64-bit

Using GA Jumbo HFA Take 169

Maarten_Sjouw · ‎2020-03-26

Well, bringing back this old one again.
Last weekend we had a 5900 with a 10Gb bond (2 interfaces), part of a VSX cluster, with exactly the same problem: a messages file completely filled with anti-spoofing messages. The older messages file was no longer available.
Code running is R80.20 with Jumbo 118.

Regards, Maarten

Mike_Jones · ‎2020-03-27

Sorry, I should have posted back with details on this. This was confirmed by CP to be a NIC manufacturer hardware issue. The 5900 was affected, and I think maybe a couple of other models. However, it isn't always an issue on these models. There are some checks you can do to see if you have the issue, but unfortunately I have moved out of the firewall world and don't have access to check the details. In short, contact CP support.

Found another detail from the past: it affected 4-port cards, but not 2-port cards. This went to R&D for investigation, and they confirmed it was not software/driver related, but rather HW design.

Maarten_Sjouw · ‎2020-03-27

Thanks Mike, we will check with TAC.

Regards, Maarten

Timothy_Hall · ‎2020-03-27

Thanks for the follow-up. Trying to distinguish NIC hardware problems from NIC driver problems can be pretty tough.

bad_joojoo · ‎2021-03-24

Hi, it would be interesting to know whether you checked dmesg after the issue occurred and can confirm whether you're seeing a VETO bit message just after the ixgbe interfaces are taken offline. I had an opportunity to look at something similar and was fortunate enough to also capture an "error level 5" message from the PCIe drivers (effectively stating they were going to sleep). Subsequently, I found that either a reboot or reloading the ixgbe driver (this reloads all ixgbe interfaces, so take care) brings it back into service.
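As a rough way to do that dmesg check, the Python sketch below looks for any line mentioning "veto" and returns the ixgbe lines that came shortly before it. The exact wording of the VETO message is not quoted in this thread, so the case-insensitive substring match is an assumption, and the function name and `before` parameter are made up for illustration.

```python
def find_veto_context(lines, before=5):
    """Return (veto_line, preceding_ixgbe_lines) pairs from dmesg output.

    `before` is how many lines ahead of each match to inspect for
    ixgbe messages.
    """
    cleaned = [line.rstrip("\n") for line in lines]
    hits = []
    for index, line in enumerate(cleaned):
        # Assumption: the message of interest contains the word "veto".
        if "veto" in line.lower():
            start = max(0, index - before)
            context = [prev for prev in cleaned[start:index] if "ixgbe" in prev]
            hits.append((line, context))
    return hits
```

Feed it the saved output of `dmesg` split into lines. An empty result only means no line contained "veto"; it does not rule out the hardware issue.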

Did you ever find a resolution?

Kind Regards

Ju
