Our company purchased a pair of the new 6800 gateways. We initially applied Take 203 before it became GA last week, as we weren't comfortable deploying new hardware without any fixes applied. Supposedly the new 6800s can only run special R80.10 and R80.20 ISOs, and general-release Takes and hotfixes can't be installed.
A week after putting them into production, the backup member of the 6800 HA pair had its NIC hang twice within a period of 24 hours. The submitted cpinfo indicated issues with the NIC registers, and there was a recommendation to RMA the quad-port NIC card to "fix" the issue.
The following SK described what we were seeing in our logs.
Not more than 96 hours later, the same behavior occurred on the primary member of the 6800 cluster, resulting in a failover to the backup. Again we RMAed the NIC card. On the recommendation of support, we changed the logging timeout from 10 seconds to 2 seconds to better capture in the logs the behavior of the card before it failed. The card failed a second time on what was the primary member; this time the unit flapped over 30 times over a period of 45 minutes before it finally failed. The issue was more severe this time, as the Cisco switch actually shut down the attached ports in a self-protection mode due to the continued, prolonged flapping.
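For anyone trying to quantify this kind of flapping, a quick sketch for counting link transitions per interface out of a syslog-style file is below. It assumes the ixgbe driver's usual "NIC Link is Up/Down" kernel messages; exact wording can vary by driver version, so adjust the pattern to what your logs actually show.

```shell
#!/bin/sh
# Count link up/down transitions per interface from a syslog-style file.
# Assumes ixgbe-style kernel messages such as:
#   kernel: ixgbe 0000:06:00.0 eth2-01: NIC Link is Down
# Message wording may differ by driver version; treat this as a sketch.
count_flaps() {
    logfile="$1"
    grep -E 'NIC Link is (Up|Down)' "$logfile" \
        | awk '{
            # the interface name is the token just before "NIC"
            for (i = 1; i <= NF; i++)
                if ($i == "NIC") { iface = $(i-1); sub(/:$/, "", iface) }
            counts[iface]++
          }
          END { for (ifc in counts) print ifc, counts[ifc] }'
}
```

A high transition count in a short window is a good early signal to investigate before an upstream switch err-disables the port.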
We are still waiting for a solution from Check Point Support and R&D on this issue.
We initially thought the problem was isolated to our environment, but later heard the issue also occurred in Check Point's lab environment in Canada on a pair of 4800s after they applied Take 203. They initially thought the issue was flow control, but the issues persisted after they turned flow control off.
I was later told that another customer who purchased 6800s was having the same issues, but on R80.20 and Take 74. That customer had replaced the NIC and was also looking to RMA the entire chassis.
@Mark_Thomasson1, please clarify whether the instability manifested in R80.20 only after JHFA 74 was installed.
I'll be deploying a pair of 6500s this weekend and would like to get some feedback from the community on their experience with the 6000 series.
Thank you,
Vladimir
Have you tried R80.20 Ongoing Take 80?
As the 6000 series uses a special ISO that doesn't allow installation of standard Takes, only those intended for that platform:
R80.10 - Take 203, which is GA
R80.20 - Take 74, which is not GA
If you don't have confidence in either of those Takes, you are left with the "unpatched" version of R80.10 or R80.20 as your only option.
Is this only happening to NIC ports that are located on a slot expansion card, or is it happening on the built-in NIC ports as well?
The fake TX hang would seem to indicate a driver issue, whereas a "real" hang would normally indicate a hardware issue. Can you please provide the output of ethtool -i (interfacename), ethtool -S (interfacename), and ethtool -k (interfacename) for an interface that has hung? (Hopefully just after the hang and prior to a reboot/reset.) In the case of the last command, all NIC card offloads should be *off*, as offloads have caused some strange issues in the past, and it is certainly possible these 6000 series boxes are using some kind of new NIC hardware with some type of new offload enabled.
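For anyone who needs to grab this evidence quickly after a hang (and before a reboot clears the counters), the three commands above can be bundled into one timestamped file for attaching to a TAC case. This is just a convenience sketch; the output path and interface name are placeholders:

```shell
#!/bin/sh
# Capture driver info (-i), statistics (-S), and offload settings (-k)
# for one interface into a single timestamped file, suitable for a TAC case.
# Run as root right after a hang, before any reboot/reset clears the state.
capture_nic_state() {
    iface="$1"
    # default output path is an assumption; override with a second argument
    outfile="${2:-/var/tmp/${iface}-state-$(date +%Y%m%d-%H%M%S).txt}"
    for flag in -i -S -k; do
        {
            echo "=== ethtool $flag $iface ($(date)) ==="
            ethtool "$flag" "$iface" 2>&1
        } >> "$outfile"
    done
    echo "$outfile"
}
```

Usage would be e.g. capture_nic_state eth2-01, which prints the path of the file it wrote.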
We're not troubleshooting by posting the output of commands here; this issue has already been escalated to R&D.
Understood, please post the solution here once it is found by R&D for the benefit of all. Thanks!
I don't know how Alex can state the issue is NOT the platform when we are the customer facing the issue and no root cause has been established.
Since the 6000 series runs special builds of R80.10 and R80.20 that MAY or MAY NOT be a contributing factor, I don't know, again, how anyone can definitively rule that out.
We have taken the 6800 out of production as a result.
We did have another crash/hang yesterday while the unit was sitting idle on the network and not passing any load.
We were able to upload a cpinfo to our SR shortly afterward, in hopes of further addressing the issue.
Hi @Timothy_Hall ,
I was assured by Check Point that the cards in my 6800 are not affected by this. However, I seem to be having the problem on R80.20 Take 87. I'll be opening a TAC case shortly. In the meantime, here is the output of the commands you asked for, before I reboot the machine.
[Expert@<removed>:0]# ethtool -i eth2-01
driver: ixgbe
version: 3.9.15-NAPI
firmware-version: 0x800000cb
bus-info: 0000:06:00.0
[Expert@<removed>:0]# ethtool -S eth2-01
NIC statistics:
rx_packets: 317574224
tx_packets: 608453986
rx_bytes: 70420161246
tx_bytes: 702313270365
rx_errors: 5249807244155220
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
multicast: 2624903623517988
collisions: 0
rx_over_errors: 0
rx_crc_errors: 2624903622077610
rx_frame_errors: 0
rx_fifo_errors: 0
rx_missed_errors: 20999228976620880
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
rx_pkts_nic: 321846298
tx_pkts_nic: 609048206
rx_bytes_nic: 2624976580835660
tx_bytes_nic: 784557258971
lsc_int: 3
tx_busy: 0
non_eop_descs: 0
broadcast: 2624903622077622
rx_no_buffer_count: 0
tx_timeout_count: 0
tx_restart_queue: 0
rx_long_length_errors: 2624903622077610
rx_short_length_errors: 2624903622077610
tx_flow_control_xon: 2624903622077610
rx_flow_control_xon: 2624903622077610
tx_flow_control_xoff: 2624903622077610
rx_flow_control_xoff: 2624903622077610
rx_csum_offload_errors: 2
alloc_rx_page_failed: 0
alloc_rx_buff_failed: 0
rx_no_dma_resources: 41997972621937425
hw_rsc_aggregated: 0
hw_rsc_flushed: 0
fdir_match: 2624903622077610
fdir_miss: 2624903622077610
fdir_overflow: 0
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
tx_queue_0_packets: 608453986
tx_queue_0_bytes: 702313270365
rx_queue_0_packets: 317574224
rx_queue_0_bytes: 70420161246
[Expert@<removed>:0]# ethtool -k eth2-01
Offload parameters for eth2-01:
Cannot get device udp large send offload settings: Operation not supported
Cannot get device GRO settings: Operation not supported
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: off
Most of those ethtool counters look completely invalid. Either something is seriously wrong in the ixgbe NIC driver, or there is some kind of corruption occurring. You could try to remove and reload the ixgbe driver from the kernel with the following commands, but keep in mind it will cause an outage on all network interfaces that utilize the ixgbe driver (and a failover if you are using a cluster), not just the eth2-01 interface:
modprobe -r ixgbe; modprobe ixgbe
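Before issuing that reload, it may be worth enumerating exactly which interfaces are bound to ixgbe, so you know the full blast radius of the outage. A sketch using the standard Linux sysfs driver symlink (paths are the usual ones, but double-check on your Gaia version):

```shell
#!/bin/sh
# List network interfaces whose kernel driver is ixgbe, by resolving the
# standard sysfs device/driver symlink. Sysfs layout assumed to be the
# usual Linux one; verify on your specific Gaia build.
list_ixgbe_ifaces() {
    sysroot="${1:-/sys/class/net}"
    for dev in "$sysroot"/*; do
        drv_link="$dev/device/driver"
        [ -e "$drv_link" ] || continue
        drv=$(basename "$(readlink -f "$drv_link")")
        [ "$drv" = "ixgbe" ] && basename "$dev"
    done
}
```

Every interface this prints will drop (and a cluster member will likely fail over) when the module is removed and reloaded.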
Hi,
A quick update from R&D: this issue is definitely not appliance-related (6k series or any other).
We are still studying the issue; most likely the root cause will be found in the ixgbe driver code, triggered by a rare type of traffic burst.
We will update with more details as soon as we are sure.
We would like to reach out to customers who experience this problem and provide them with a driver which we believe should resolve fake TX HANG.
Please, feel free to contact Keren Nitzan (kerenni@checkpoint.com) and myself (alexkim@checkpoint.com) if you have a customer with this issue.
Thank you for this update! Do you know if this issue manifests itself in the 6800 appliance R80.20 GA code (Check_Point_R80.20_T101_R80.10_Dual_6000_T18.iso), or only after an Ongoing Jumbo Hotfix Accumulator is applied (e.g. post Take 74, etc.)?
From what we know, it's not a matter of a specific JHF. We are gathering more information about such cases, but it seems it can happen on R80.20 GA too. It's more related to the characteristics of the traffic flowing through the NIC.
It turned out to be a hardware problem with the 4-port cards, affecting a small subset of customers; a full Root Cause Analysis describing the nature of the problem is not yet available.
As a permanent fix, we will be getting a different quad-port NIC which is not susceptible to the same problem (a design improvement), once those cards are available.
Interesting, thanks for the follow-up.