knassif
Participant

RX frame errors

hello,

I am having a frame issue on some interfaces connected to a Gigamon: we see RX frame errors increasing on those interfaces. Has anyone experienced this and found a solution? We have replaced the NIC cards, cables, and SFPs, but the issue remains. The gateways are 6400 models; we don't see the issue on the 16000 Turbo hardware.

 

thanks

PhoneBoy
Admin

Have you done a performance assessment on the 6400 to ensure it is not overloaded?
You can start with this: https://community.checkpoint.com/t5/Scripts/S7PAC-Super-Seven-Performance-Assessment-Commands/m-p/40... 

knassif
Participant

No, we haven't done a performance assessment; however, that firewall has no load at the moment, as it is not taking live traffic yet. We are trying to find and fix the issue before it takes load.

the_rock
Legend

Can you please run cpview and check below? (just go all the way to the bottom, where it shows drops)

Andy

Screenshot_1.png

knassif
Participant

 

cpview is not showing errors; however, ifconfig shows frame errors on the interfaces forming the bond.

 

 

the_rock
Legend

And you said they constantly keep increasing? If the answer to that question is yes, when did this start happening?

Andy

knassif
Participant

Yes, it constantly keeps increasing. We don't see this behavior on the 16000s, which are also connected to the same Gigamon device; the only difference is that the 16000s have different drivers for the interfaces.

Below is output from the 6400, showing RX errors:

[Expert@idboinfw007:0]# ethtool -i eth1-02
driver: i40e
version: 2.10.19.82
firmware-version: 6.80 0x8000a368 0.0.0
expansion-rom-version:
bus-info: 0000:01:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
[Expert@idboinfw007:0]# ethtool -s eth1-02
[Expert@idboinfw007:0]# ethtool -S eth1-02
NIC statistics:
rx_packets: 166537830
tx_packets: 20506
rx_bytes: 12314087925
tx_bytes: 2542744
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
collisions: 0
rx_length_errors: 10991
rx_crc_errors: 0
rx_unicast: 0
tx_unicast: 0
rx_multicast: 20502
tx_multicast: 20506
rx_broadcast: 166517328
tx_broadcast: 0
rx_unknown_protocol: 0
tx_linearize: 0
tx_force_wb: 0
tx_busy: 0
rx_alloc_fail: 0
rx_pg_alloc_fail: 0
tx-0.packets: 20500
tx-0.bytes: 2542000
rx-0.packets: 166537426
rx-0.bytes: 12314035213
tx-1.packets: 1
tx-1.bytes: 124
rx-1.packets: 86
rx-1.bytes: 11712
tx-2.packets: 2
tx-2.bytes: 248
rx-2.packets: 62
rx-2.bytes: 7954
tx-3.packets: 3
tx-3.bytes: 372
rx-3.packets: 256
rx-3.bytes: 33046
veb.rx_bytes: 0
veb.tx_bytes: 0
veb.rx_unicast: 0
veb.tx_unicast: 0
veb.rx_multicast: 0
veb.tx_multicast: 0
veb.rx_broadcast: 0
veb.tx_broadcast: 0
veb.rx_discards: 0
veb.tx_discards: 0
veb.tx_errors: 0
veb.rx_unknown_protocol: 0
veb.tc_0_tx_packets: 0
veb.tc_0_tx_bytes: 0
veb.tc_0_rx_packets: 0
veb.tc_0_rx_bytes: 0
veb.tc_1_tx_packets: 0
veb.tc_1_tx_bytes: 0
veb.tc_1_rx_packets: 0
veb.tc_1_rx_bytes: 0
veb.tc_2_tx_packets: 0
veb.tc_2_tx_bytes: 0
veb.tc_2_rx_packets: 0
veb.tc_2_rx_bytes: 0
veb.tc_3_tx_packets: 0
veb.tc_3_tx_bytes: 0
veb.tc_3_rx_packets: 0
veb.tc_3_rx_bytes: 0
veb.tc_4_tx_packets: 0
veb.tc_4_tx_bytes: 0
veb.tc_4_rx_packets: 0
veb.tc_4_rx_bytes: 0
veb.tc_5_tx_packets: 0
veb.tc_5_tx_bytes: 0
veb.tc_5_rx_packets: 0
veb.tc_5_rx_bytes: 0
veb.tc_6_tx_packets: 0
veb.tc_6_tx_bytes: 0
veb.tc_6_rx_packets: 0
veb.tc_6_rx_bytes: 0
veb.tc_7_tx_packets: 0
veb.tc_7_tx_bytes: 0
veb.tc_7_rx_packets: 0
veb.tc_7_rx_bytes: 0
port.rx_bytes: 23697907860
port.tx_bytes: 2624768
port.rx_unicast: 986953
port.tx_unicast: 0
port.rx_multicast: 91526543
port.tx_multicast: 20506
port.rx_broadcast: 166517328
port.tx_broadcast: 0
port.tx_errors: 0
port.rx_dropped: 0
port.tx_dropped_link_down: 0
port.rx_crc_errors: 0
port.illegal_bytes: 0
port.mac_local_faults: 0
port.mac_remote_faults: 0
port.tx_timeout: 0
port.rx_csum_bad: 0
port.rx_length_errors: 10991
port.link_xon_rx: 0
port.link_xoff_rx: 0
port.link_xon_tx: 0
port.link_xoff_tx: 0
port.rx_size_64: 6437141
port.rx_size_127: 222539300
port.rx_size_255: 26399150
port.rx_size_511: 1818432
port.rx_size_1023: 67740
port.rx_size_1522: 1769061
port.rx_size_big: 0
port.tx_size_64: 0
port.tx_size_127: 0
port.tx_size_255: 20506
port.tx_size_511: 0
port.tx_size_1023: 0
port.tx_size_1522: 0
port.tx_size_big: 0
port.rx_undersize: 0
port.rx_fragments: 0
port.rx_oversize: 0
port.rx_jabber: 0
port.VF_admin_queue_requests: 0
port.arq_overflows: 0
port.tx_hwtstamp_timeouts: 0
port.rx_hwtstamp_cleared: 0
port.tx_hwtstamp_skipped: 0
port.fdir_flush_cnt: 1
port.fdir_atr_match: 0
port.fdir_atr_tunnel_match: 0
port.fdir_atr_status: 0
port.fdir_sb_match: 0
port.fdir_sb_status: 1
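
With this many counters it is easy to miss the few that matter. A minimal sketch, assuming the `ethtool -S` text has been saved to a string, that keeps only the nonzero counters whose names look error-related. The function names and the `ERROR_HINTS` substring list are illustrative assumptions, not part of ethtool or the i40e driver; the sample values are copied from the output above.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S` text into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so the header line "NIC statistics:" is skipped.
        name, sep, value = line.strip().rpartition(":")
        value = value.strip()
        if sep and name and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


# Substrings that usually mark a problem counter (an assumption, not a driver spec).
ERROR_HINTS = ("error", "drop", "crc", "undersize", "oversize",
               "fragments", "jabber", "illegal", "fault", "discard", "csum_bad")


def nonzero_error_counters(stats):
    """Return only the error-like counters that are not zero."""
    return {name: count for name, count in stats.items()
            if count != 0 and any(hint in name for hint in ERROR_HINTS)}


# A subset of the eth1-02 output posted in this thread.
SAMPLE = """\
NIC statistics:
rx_packets: 166537830
rx_errors: 0
rx_length_errors: 10991
rx_crc_errors: 0
port.rx_crc_errors: 0
port.rx_length_errors: 10991
port.rx_undersize: 0
port.rx_fragments: 0
port.rx_oversize: 0
port.rx_jabber: 0
"""

print(nonzero_error_counters(parse_ethtool_stats(SAMPLE)))
```

On this subset the only counters that survive the filter are `rx_length_errors` and `port.rx_length_errors`, both at 10991.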

the_rock
Legend

Might be worth opening a TAC case.

Timothy_Hall
Legend

Is 10991 roughly how many framing errors are being reported by ifconfig and netstat -ni? If not, please run these commands within a few seconds of each other:

netstat -ni | grep eth1-02

ethtool -S eth1-02

ifconfig eth1-02
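
The reason for running these close together is that the counters keep moving. A rough sketch, assuming the command outputs have been captured as text, of comparing the ifconfig `frame:` counter with the ethtool `rx_length_errors` counter. The helper names and the 2% tolerance are hypothetical choices for illustration; the two sample values come from outputs posted at different times in this thread, so they are not expected to agree.

```python
import re


def ifconfig_frame_errors(ifconfig_text):
    """Extract the 'frame:N' counter from classic ifconfig output."""
    match = re.search(r"\bframe:(\d+)", ifconfig_text)
    if match is None:
        raise ValueError("no 'frame:' counter found in ifconfig output")
    return int(match.group(1))


def ethtool_counter(ethtool_text, name):
    """Extract one named counter from `ethtool -S` output (exact name match)."""
    pattern = rf"^\s*{re.escape(name)}:\s*(\d+)\s*$"
    match = re.search(pattern, ethtool_text, re.MULTILINE)
    if match is None:
        raise ValueError(f"counter {name!r} not found in ethtool output")
    return int(match.group(1))


def roughly_equal(a, b, rel_tol=0.02):
    """True if two counters differ by at most rel_tol of the larger one."""
    return abs(a - b) <= rel_tol * max(a, b, 1)


# Values taken from two different posts in this thread (not sampled together).
IFCONFIG_LINE = "RX packets:185198049 errors:0 dropped:0 overruns:0 frame:12213"
ETHTOOL_TEXT = "rx_crc_errors: 0\nrx_length_errors: 10991\nport.rx_length_errors: 10991\n"

frames = ifconfig_frame_errors(IFCONFIG_LINE)
length_errors = ethtool_counter(ETHTOOL_TEXT, "rx_length_errors")
print(frames, length_errors, roughly_equal(frames, length_errors))
```

Here the two numbers differ by 1222, well outside the tolerance, which is what one would expect from snapshots taken some time apart while the counter is still climbing.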

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Timothy_Hall
Legend

It is quite strange that both physical interfaces of the bond are reporting the exact same number of framing errors. Assuming they are actively incrementing, this would suggest some kind of regular emanation from the switch that the NIC thinks is not a properly formed Ethernet frame (perhaps a bridging/STP advertisement or some other kind of proprietary media test?). On the firewall, run sar -n EDEV: is it reporting a consistent number of rxfram/s errors in each 10-minute sample period all day long? It could also be some kind of invalid frame getting sent to the broadcast address and being flooded by the switch, but I was under the impression that a switch will not forward an invalid frame, so it is likely something the switch itself is creating.

Unfortunately there is no easy way to see what these supposedly invalid frames actually are with a packet capture on the firewall, as the bad frames will be simply discarded by the NIC hardware.  

Please provide output from the following commands from expert mode on the firewall; there may be some other side effects being caused by this condition that will help point to the issue:

ethtool -i eth1-02

ethtool -S eth1-04

ethtool -S eth1-02

Make sure all elements of the bond configuration are EXACTLY the same on both the firewall and switch side. 

I assume the network counters on the switchport side are error-free? 
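
To make the sar -n EDEV question above concrete: the point is whether the rxfram/s value is about the same in every sample period (a regular, machine-generated stream) or arrives in bursts. A minimal sketch, assuming the per-interval rxfram/s values have already been copied out of the sar report into a list. The function name, the 10% spread threshold, and both sample lists are hypothetical and are not data from this thread.

```python
from statistics import mean


def is_steady_stream(rates, max_rel_spread=0.10):
    """True if every sample is nonzero and (max - min) stays within
    max_rel_spread of the mean rate."""
    if not rates or min(rates) <= 0:
        return False
    return (max(rates) - min(rates)) <= max_rel_spread * mean(rates)


# Hypothetical rxfram/s values, one per 10-minute sar sample (NOT from the thread).
steady = [0.25, 0.26, 0.25, 0.24, 0.25, 0.25]
bursty = [0.00, 0.00, 3.10, 0.00, 0.02, 0.00]

print(is_steady_stream(steady), is_steady_stream(bursty))
```

A steady result would fit the idea of a periodic control or discovery frame; a bursty one would point more toward traffic-dependent corruption.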

knassif
Participant

Yes, we have deleted the bond and recreated it, and replaced all cables, NICs, and SFPs; same issue. Note that the 6400 is connected to the Gigamon. If we remove the Gigamon and connect directly to the switch, we don't see those frame errors. The 16000 Turbo, however, is connected to the same Gigamon device and shows no frame errors. My thought is that this is specific to the 6400 model and how its interface firmware and drivers handle the traffic from the Gigamon: those drivers don't get updated very often by the manufacturer, whereas the higher-end 16000 interfaces get more frequent driver and firmware updates that might include a fix or somehow ignore those frame errors. We did have a case with TAC and replaced the cards, but they couldn't find the root cause. Output of sar -n EDEV in 10-minute intervals is also provided. Yes, the switch and the Gigamon are showing no errors. I will attach the requested output.
 
This is the driver version on the 16000 firewalls (these are 40Gig interfaces):
 
ethtool -i eth1-01
driver: mlx5_core
version: 5.5-1.0.3 (13 May 22)
firmware-version: 12.26.6402 (CP_0010110027)
expansion-rom-version: 
bus-info: 0000:86:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
 
show interface eth1-01
state on
mac-addr 10:70:fd:2b:d7:9c
type ethernet
link-state link up
instance 0
mtu 1500
auto-negotiation on
speed 40G
 
 
 
this is the driver version on the 6400:
 
ethtool -i eth1-01
driver: i40e
version: 2.10.19.82
firmware-version: 6.80 0x8000a368 0.0.0
expansion-rom-version: 
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
 
show interface eth1-02
state on
mac-addr 00:1c:7f:a4:f9:2f
type ethernet
link-state link up
instance 0
mtu 1500
auto-negotiation off
speed 10G
 
************
 
show bonding group 1
Bond Configuration
    xmit-hash-policy layer2
    down-delay 200
    primary Not configured
    lacp-rate slow
    mode 8023AD
    up-delay 200
    mii-interval 100
    min-links 0
    Bond Interfaces
        eth1-02
        eth1-04
 
 
bond1       Link encap:Ethernet  HWaddr 00:1C:7F:A4:F9:2F  
            UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
            RX packets:370414572 errors:0 dropped:0 overruns:0 frame:24426
            TX packets:45631 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000 
            RX bytes:27292087999 (25.4 GiB)  TX bytes:5658244 (5.3 MiB)
 
eth1-02     Link encap:Ethernet  HWaddr 00:1C:7F:A4:F9:2F  
            UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
            RX packets:185198049 errors:0 dropped:0 overruns:0 frame:12213
            TX packets:22809 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000 
            RX bytes:13645337567 (12.7 GiB)  TX bytes:2828316 (2.6 MiB)
 
 
 
 
eth1-04     Link encap:Ethernet  HWaddr 00:1C:7F:A4:F9:2F  
            UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
            RX packets:185198921 errors:0 dropped:0 overruns:0 frame:12213
            TX packets:22805 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000 
            RX bytes:13645401092 (12.7 GiB)  TX bytes:2827820 (2.6 MiB)
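
Two things stand out in the ifconfig output above: the bond's frame counter is the sum of its two members, and the two members report the same value. A small sketch, assuming classic net-tools ifconfig formatting as shown in this thread, that checks both. The function name is illustrative; the sample text is trimmed from the output above.

```python
import re


def frame_errors_by_interface(ifconfig_text):
    """Map each interface name to its RX 'frame:N' counter."""
    counts = {}
    current = None
    for line in ifconfig_text.splitlines():
        # An interface block starts with the name in column 0.
        header = re.match(r"^(\S+)\s+Link encap:", line)
        if header:
            current = header.group(1)
        found = re.search(r"\bframe:(\d+)", line)
        if found and current:
            counts[current] = int(found.group(1))
    return counts


# Trimmed from the ifconfig output posted above.
IFCONFIG_TEXT = """\
bond1       Link encap:Ethernet  HWaddr 00:1C:7F:A4:F9:2F
            RX packets:370414572 errors:0 dropped:0 overruns:0 frame:24426
eth1-02     Link encap:Ethernet  HWaddr 00:1C:7F:A4:F9:2F
            RX packets:185198049 errors:0 dropped:0 overruns:0 frame:12213
eth1-04     Link encap:Ethernet  HWaddr 00:1C:7F:A4:F9:2F
            RX packets:185198921 errors:0 dropped:0 overruns:0 frame:12213
"""

counts = frame_errors_by_interface(IFCONFIG_TEXT)
members = ("eth1-02", "eth1-04")
print(counts["bond1"] == sum(counts[m] for m in members))  # bond equals sum of members
print(len({counts[m] for m in members}) == 1)              # members report the same count
```

Both checks hold for the posted numbers (12213 + 12213 = 24426), so the same frames appear to be arriving on both links at the same rate.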
 
 
knassif
Participant

Attached netstat, ethtool, and sar -n EDEV output.

knassif
Participant

ethtool -i and -S output attached.

knassif
Participant

The only difference on the interface firmware is these settings; I don't know whether it could be related or not:


supports-eeprom-access: yes
supports-register-dump: yes

Timothy_Hall
Legend

OK, that helped a lot. From what I can tell there is a constant, regular stream of frames that are too small (at least according to i40e) coming from the Gigamon; in the distant past these were called "runts" while too-long frames were called "jabbers". I highly doubt these framing errors are actually legitimate frames getting corrupted, so this would appear to be just cosmetic and not impact real traffic. This assertion is confirmed by the netstat output showing that these framing errors are not even incrementing RX-ERR.

For this stream of framing errors to be so consistent, it must be some kind of regular emanation from the Gigamon itself, probably:

  • STP/Bridge announcements (disabling Spanning Tree is NOT an option, but you could try portfast and see if that helps)
  • LLDP (try disabling this on the Gigamon ports if enabled)
  • CDP (Cisco Discovery Protocol; try disabling this on the Gigamon ports if enabled)
  • Gigamon Discovery (appears to be some kind of proprietary Gigamon discovery protocol; try disabling it)
  • If there are any other discovery/probing/health check type of protocols enabled on the Gigamon, try disabling them on the relevant switchports, including possibly some proprietary VLAN trunking 802.1q discovery/healthcheck/probing if the ports are trunked
knassif
Participant

Understood. Checking back with my colleagues, they suspect it may be related to Gigamon discovery; however, we can't disable that since this is how the Gigamon device works. Is it possible, though, to get a driver/firmware update on these interfaces to make those frame drops/errors go away? It seems the 40Gig interface on the 16000 firewall has a way of handling it and not showing it in its stats.

Timothy_Hall
Legend

Yes, it could be a driver update issue; the current Gaia i40e 2.10.19.82 driver is from early 2020. You can see the changelog for the i40e driver at the URL below, and while it doesn't seem to have any fixes directly relevant to this issue, Check Point TAC may have a newer driver available.

Also one more question: what code version and Jumbo HFA level are you using on your gateway?  The i40e driver version is 2.10.19.82 for R81.10 and later, in R80.40 and earlier it was 2.7.12.

https://github.com/dmarion/deb-i40e/blob/master/debian/changelog

 

knassif
Participant

The 6400 and the 16000 are both on R81.20 T10. Yes, I think it is the latest firmware, from 2020; they don't update those drivers often for some reason, but I will ask TAC whether they can check for newer driver/firmware versions. Thanks.

knassif
Participant

It is also a VSX.
