Hello mates,
I am fighting with a very strange issue - bond interfaces going down a few hours after reconfiguring interfaces on Virtual Systems.
There is a cluster of two 19200 appliances (R81.20 JHF92) in VSLS, with bond interfaces to Cisco switches using LACP and VPC.
After configuring two VSs (interfaces, VLANs, routes, a blank any-any-allow policy), everything is fine.
The only thing is that no VLANs are configured on the switches, because these VSs are prepared to replace existing plain devices that have the same IPs. So to make sure everything is OK until the migration date, there is no traffic on the VS interfaces.
After 4-6 hours, most of the bonds go down; the Cisco switches report the ports as disabled and there is no way to bring them back up.
On the Check Point side, the bonds show different Aggregator IDs, the interfaces are "churned", and the only way to bring them up is to reboot the appliances.
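For reference, the Aggregator IDs and churn state above come from /proc/net/bonding. Assuming bonding group 4 maps to bond4 (as on my boxes), something like this shows both, on kernels that expose the churn machine:
[Expert@fw2:0]# grep -iE "aggregator|churn" /proc/net/bonding/bond4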
This has happened 3 times so far, every time several hours after reconfiguring VS interfaces.
I opened a ticket after the first occurrence, but nothing useful came out of it - only sk115516, which does not help prevent it from happening again.
There is nothing useful in /var/log/messages.
Does anyone have similar problems?
Any idea which log files to check or what debugs could be run? I am pretty sure this can be reproduced.
Thanks,
Dilian
Interesting, strange behaviour.
Regarding "churned": https://support.checkpoint.com/results/sk/sk169760
One of the peer's LACP (etherchannel) interfaces is suspended or is otherwise no longer active as an LACP interface.
Regarding the Cisco-side bond:
Is the bond ID the same on the newly generated LACP and the existing one? Is there anything in common between the existing switch config and the new one?
Akos
Not sure how to respond to this 😞 but after restarting the appliances everything works fine.
Tomorrow I will try to edit the VS config to see if the issue happens again.
If it happens again I'd suggest disabling UPPAK from cpconfig to see if it affects the issue. UPPAK has its tendrils sunk pretty deeply into the network drivers via DPDK, and it being the cause of your bond issue is not outside the realm of possibility.
Hi Timothy,
After disabling UPPAK the issue is not happening again.
We have an open ticket with TAC, with R&D involved, to figure out the root cause of the problem.
Thanks for the help!
Interesting that UPPAK was the cause, thanks for the follow-up.
Check the (very long) output of cat /proc/net/bonding/<bond name> before and after the event occurs, specifically the "details partner lacp pdu" section for each interface, to see if the remote side changes its LACP information.
You mentioned the interfaces being "churned", so you likely already saw this, though.
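One way to capture before/after state is a simple snapshot loop in expert mode; a rough sketch, assuming the bond is bond4 and logging to /var/tmp (adjust names to your setup):
[Expert@fw2:0]# while true; do date >> /var/tmp/bond4_lacp.log; grep -A 8 "details partner lacp pdu" /proc/net/bonding/bond4 >> /var/tmp/bond4_lacp.log; sleep 60; done
Diffing the entries from before and after the event should show whether the partner's LACP information changed.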
On the Cisco side, if this is Nexus VPC, then check the status of the etherchannel to see if it has suspended the port-channel member interface. You can run a "debug port-channel error" or "debug port-channel trace" to hopefully catch any switch-side errors.
On IOS-XE, it's "debug etherchannel ..." for similar.
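To summarize, on the Nexus side I'd start with something like this (the interface name is a placeholder, use your member ports):
switch# show port-channel summary
switch# show lacp interface ethernet 1/1
switch# show lacp counters
switch# debug port-channel error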
This issue sounds somewhat similar to a supposedly-fixed limitation of Lightspeed cards:
Bond may become unstable because of LACP packet losses (on the network or in the interface).
Workaround - Configure the LACP "slow" rate for this Bond on each side.
Because you are on a Quantum Force appliance it will use UPPAK by default, just like a Lightspeed appliance, so the above may apply to you. If setting both sides to the slow rate doesn't help, the last thing to try would be to disable UPPAK via cpconfig to go back to KPPAK and see if that impacts the problem.
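For reference, setting the slow rate would look roughly like this - Gaia clish on the gateway, and the member interfaces on the Nexus side, where the slow rate is called "normal" (group and interface numbers are placeholders):
fw2:0> set bonding group 4 lacp-rate slow
fw2:0> save config
switch(config)# interface ethernet 1/1
switch(config-if)# lacp rate normal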
It is a new bond implementation, but it was configured almost 4 months ago.
It's been stable, except for these 3 moments when VS interface changes were made.
Here is the ethtool info; it is identical on all involved interfaces (10Gb SFP+):
[Expert@fw2:0]# ethtool -i eth1-04
driver: net_ice
version: DPDK 20.11.7.4.0 (29 Mar 24)
firmware-version: 4.20 0x800178e2 1.3346.0
expansion-rom-version:
bus-info: 0000:17:00.7
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
On both sides the LACP rate is slow/normal:
fw2:0> show bonding group 4
Bond Configuration
xmit-hash-policy layer2
down-delay 200
primary Not configured
lacp-rate slow
mode 8023AD
up-delay 200
mii-interval 100
min-links 0
Bond Interfaces
eth1-04
eth3-04
#### edit
There is something I just remembered - the bond on the CP device is created with one port from line card 1 (model CPAC-8-1/10F-D) and the second port from line card 3 (model CPAC-4-10/25F-D).
There is a difference in firmware, but the driver is the same:
[Expert@fw2:0]# ethtool -i eth1-03
driver: net_ice
version: DPDK 20.11.7.4.0 (29 Mar 24)
firmware-version: 4.20 0x800178e2 1.3346.0
expansion-rom-version:
bus-info: 0000:17:00.5
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[Expert@fw2:0]# ethtool -i eth3-04
driver: net_ice
version: DPDK 20.11.7.4.0 (29 Mar 24)
firmware-version: 4.30 0x8001b94f 1.3415.0
expansion-rom-version:
bus-info: 0000:b1:00.2
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
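A quick way to compare the bond members side by side (same interface names as above):
[Expert@fw2:0]# for i in eth1-04 eth3-04; do echo "== $i"; ethtool -i $i | grep -E "^(driver|version|firmware-version)"; done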