Dilian_Chernev
Collaborator

VSX and Bond interfaces going down after a few hours

Hello mates,

I am fighting a very strange issue: bond interfaces go down a few hours after reconfiguring interfaces on the Virtual Systems.

There is a cluster of two 19200 appliances (R81.20 JHF92) running VSLS, with bond interfaces to Cisco switches using LACP and vPC.
After configuring two VSs (interfaces, VLANs, routes, and a blank any-any-allow policy), everything is fine.
The only thing is that no VLANs are configured on the switches, because these VSs are being prepared to replace existing standalone devices that have the same IPs. So, to make sure everything is OK until the migration date, there is no traffic on the VS interfaces.

After 4-6 hours, most of the bonds go down. The Cisco switches report the ports as disabled, and there is no way to bring them back up.
On the Check Point side, the bonds show different Aggregator IDs and the interfaces are "churned"; the only way to bring them back up is to reboot the appliances.

This has happened three times so far, each time several hours after reconfiguring the VS interfaces.

I opened a ticket after the first occurrence, but nothing useful came out of it - only sk115516, which does not help prevent it from happening again.
There is nothing useful in /var/log/messages.

Does anyone have similar problems?
Any idea which log files to check or what debugs could be run? I am pretty sure this can be reproduced.
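For reference, the bond and LACP state can be inspected from expert mode with standard Linux bonding diagnostics next time it happens (the bond and interface names below are examples, not the exact ones from this setup):

```shell
# Dump the kernel bonding state: aggregator IDs, partner MAC/key, and
# the per-slave actor/partner churn state for an 802.3ad bond
cat /proc/net/bonding/bond4

# Watch for LACPDUs on a member interface to confirm whether the peer
# is still sending them (ethertype 0x8809 = Slow Protocols; at slow
# rate you should see one LACPDU roughly every 30 seconds)
tcpdump -nni eth1-04 ether proto 0x8809
```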

Thanks,

Dilian

4 Replies
AkosBakos
Leader

Hi @Dilian_Chernev 

Interesting, strange behaviour.

Regarding "churned": https://support.checkpoint.com/results/sk/sk169760

One of the peer's LACP (etherchannel) interfaces is suspended or is otherwise no longer active as an LACP interface.

Regarding the Cisco-side bond:

Is the bond ID the same on the newly generated LACP aggregator and the existing one? Is there anything in common between the existing switch config and the new one?
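On the Cisco side, the suspended state and the LACP partner details can be checked like this (assuming NX-OS, since vPC is in use; the port-channel and interface numbers are placeholders):

```shell
# Member state per port-channel; suspended members are flagged (s)
show port-channel summary

# LACP partner (Check Point side) system ID, key, and port state
show lacp neighbor interface port-channel 4

# Why a given member port is down/suspended
show interface ethernet 1/4 status

# vPC consistency checks, relevant since the bond spans a vPC pair
show vpc consistency-parameters interface port-channel 4
```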

Akos

----------------
\m/_(>_<)_\m/
Dilian_Chernev
Collaborator

Not sure how to respond to this 😞 but after restarting the appliances everything works fine.
Tomorrow I will try to edit the VS config to see if the issue happens again.

Timothy_Hall
Legend

  • Is this a new bond implementation on your 19200? 
  • Were the bonds ever stable? 
  • What is the interface speed and driver type of the physical interfaces (ethtool -i ethXX)?

This issue sounds somewhat similar to a supposedly-fixed limitation of Lightspeed cards:

Bond may become unstable because of LACP packet losses (on the network or in the interface).

Workaround - Configure the LACP "slow" rate for this Bond on each side

Because you are on a Quantum Force appliance, it will utilize UPPAK by default, just like a Lightspeed appliance, so the above may apply to you.  If setting the slow rate on both sides doesn't help, the last thing to try would be to disable UPPAK via cpconfig (falling back to KPPAK) and see if that affects the problem.
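For reference, setting the slow LACP rate on both sides would look roughly like this (the bonding group, interface numbers, and NX-OS assumption are placeholders for your environment):

```shell
# Gaia clish, on each cluster member
set bonding group 4 lacp-rate slow
save config

# Cisco NX-OS, on each vPC peer, per member interface
# (the 30-second slow rate is the default; "lacp rate fast" would
# switch to the 1-second rate, so make sure it is not configured)
configure terminal
interface ethernet 1/4
  no lacp rate fast
```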

Attend my 60-minute "Be your Own TAC: Part Deux" Presentation
Exclusively at CPX 2025 Las Vegas Tuesday Feb 25th @ 1:00pm
Dilian_Chernev
Collaborator

It is a new bond implementation, but it was configured almost 4 months ago.
It has been stable, except for these three occasions when changes to the VS interfaces were made.

Here is the ethtool info; it is identical on all involved interfaces (10Gb SFP+):

[Expert@fw2:0]# ethtool -i eth1-04
driver: net_ice
version: DPDK 20.11.7.4.0 (29 Mar 24)
firmware-version: 4.20 0x800178e2 1.3346.0
expansion-rom-version:
bus-info: 0000:17:00.7
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

On both sides, the LACP rate is slow/normal.


fw2:0> show bonding group 4
Bond Configuration
xmit-hash-policy layer2
down-delay 200
primary Not configured
lacp-rate slow
mode 8023AD
up-delay 200
mii-interval 100
min-links 0
Bond Interfaces
eth1-04
eth3-04


