Vladimir
Champion

No LACP on 2x10G bond with Cisco 3850

Having an issue with R80.10 gateways (ClusterXL members) on 13500 appliances (non-VSX):

LACP issue with a Cisco 3850. A bond consisting of two 10G interfaces (on the same card in the 13500s) shows its Load Sharing state as DOWN (bond1):

[Expert@CICNYCP1:0]# cphaconf show_bond -a

           |             |      |Slaves     |Slaves |Slaves
Bond name  |Mode         |State |configured |in use |required
-----------+-------------+------+-----------+-------+---------
bond1      |Load Sharing |DOWN  |2          |2      |1
bond2      |Load Sharing |UP    |2          |2      |1

[Expert@CICNYCP1:0]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Active Aggregator Info:
Aggregator ID: 2
Number of ports: 1
Actor Key: 33
Partner Key: 1
Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth2-02
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1c:7f:62:4c:8f
Aggregator ID: 1

Slave Interface: eth2-01
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1c:7f:62:4c:8e
Aggregator ID: 2
-----------------------------------------
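The output above already hints at the problem: the two slaves land in different Aggregator IDs (1 and 2), the active aggregator holds only one port, and its Partner Mac Address is all zeros, i.e. no LACPDUs from the switch have been paired on it. A quick way to confirm on the gateway side whether LACPDUs arrive on each slave at all (interface names as above; ethertype 0x8809 is the IEEE Slow Protocols type that carries LACP):

# Capture LACP PDUs on each slave; no packets = the switch is not sending them
tcpdump -nnei eth2-01 ether proto 0x8809
tcpdump -nnei eth2-02 ether proto 0x8809

# Compare each slave's aggregator against the active one
grep -E 'Slave Interface|Aggregator ID|Partner Mac' /proc/net/bonding/bond1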

While the Cisco side claims it is up:

# show etherchannel 2 summary

Number of channel-groups in use: 1
Number of aggregators: 1

...

Group  Port-channel  Protocol  Ports
------+-------------+---------+-----------------------------------------------
2      Po2(SU)       LACP      Te1/0/1(P) Te2/0/1(P)
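If the 3850 really sees both links bundled, its LACP partner table should list the gateway's MAC on both Te1/0/1 and Te2/0/1. Standard IOS commands for that view (a suggestion, not output from this system):

show lacp neighbor
show etherchannel 2 detail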

bond2, which has the same configuration on a different card's 2x10G ports but is connected to a Nexus switch, works fine.

FYI: the data shown above is a doctored sample; I do not presently have access to the systems, but I am fairly sure I am reproducing it accurately. I will update this question with live data next week.

If any of you have bumped into this one before, please share the solution or troubleshooting methods.

10 Replies
Mikhail_Razumov
Participant

Hello Vladimir,

I'm having a similar issue on a 6200 with a 1G bond: 3 bonds are UP, 1 is DOWN.

Have you succeeded in resolving the issue?

Regards,

Mikhail

Chris_Atkinson
Employee

What version/JHF is in use, and what is the switch-side configuration?

Mikhail_Razumov
Participant

Hello Chris,

R81.20, no JHF.

The switch-side configuration is similar for all 4 bonds, but only 1 bond is down.

interface Port-channel3
description chkp01
switchport trunk encapsulation dot1q

interface GigabitEthernet0/5
description chkp01
switchport trunk encapsulation dot1q
channel-protocol lacp
channel-group 3 mode active

interface GigabitEthernet0/6
description chkp01
switchport trunk encapsulation dot1q
channel-protocol lacp
channel-group 3 mode active
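One thing worth checking with a config like this: it shows switchport trunk encapsulation dot1q but no explicit switchport mode trunk or native VLAN line, so the operational trunking state is worth a look. Standard IOS commands, using the interface names from the config above:

show interfaces Gi0/5 switchport
show interfaces trunk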

Chris_Atkinson
Employee

All ports on the same NIC on the CP side?

What does the etherchannel output on the Cisco side show, and is there anything in the logs there?
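For the logs part, a sketch of what could be collected on the switch, assuming the group numbering used later in this thread (Po3); standard IOS commands:

show etherchannel 3 summary
show lacp 3 internal
show logging | include LACP|EC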

Mikhail_Razumov
Participant

Same NIC, yes. It's the 8-port built-in NIC, ports 5 and 6.

The Cisco is not under my management, but I have this output from it (look for Po3):

Group  Port-channel  Protocol  Ports
------+-------------+---------+-----------------------------------------------
1      Po1(SU)       LACP      Gi0/1(P)  Gi0/2(P)
2      Po2(SU)       LACP      Gi0/3(P)  Gi0/4(P)
3      Po3(SU)       LACP      Gi0/5(P)  Gi0/6(P)
4      Po4(SU)       LACP      Gi0/7(P)  Gi0/8(P)
14     Po14(SU)      LACP      Gi0/23(P) Gi0/24(P)

C3560CHKP#show interfaces status

Port   Name         Status     Vlan  Duplex  Speed   Type
Gi0/1  chkp01_L2VS  connected  1     a-full  a-1000  10/100/1000BaseTX
Gi0/2  chkp01_L2VS  connected  1     a-full  a-1000  10/100/1000BaseTX
Gi0/3  chkp02_L3VS  connected  1     a-full  a-1000  10/100/1000BaseTX
Gi0/4  chkp02_L3VS  connected  1     a-full  a-1000  10/100/1000BaseTX
Gi0/5  chkp01_L3VS  connected  1     a-full  a-1000  10/100/1000BaseTX
Gi0/6  chkp01_L3VS  connected  1     a-full  a-1000  10/100/1000BaseTX
Gi0/7  chkp02_L2VS  connected  1     a-full  a-1000  10/100/1000BaseTX
Gi0/8  chkp02_L2VS  connected  1     a-full  a-1000  10/100/1000BaseTX

Chris_Atkinson
Employee

Is it impacting one machine or both members of a cluster?

Depending on the answer, my suggestions would be as follows:

1. Reboot the affected machine.

2. Remove and re-add the bond slave interfaces on the CP side (see the sketch after this list).

3. If the issue persists, raise it for investigation with TAC.
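A minimal sketch of step 2 in Gaia clish, assuming a hypothetical bond group 3 with hypothetical member names eth1-05 and eth1-06 (the real interface names on the 6200 were not posted):

delete bonding group 3 interface eth1-05
add bonding group 3 interface eth1-05
delete bonding group 3 interface eth1-06
add bonding group 3 interface eth1-06
save config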

 

Mikhail_Razumov
Participant

Both cluster members.

Rebooted many times.

Re-added bond interfaces.

Maybe I'll have to open a ticket. My hope is that Vladimir (the topic starter) has found a solution, since my symptoms are similar.

Chris_Atkinson
Employee

Perhaps, though the hardware/software combination is quite different.

Assuming the cabling has all been retraced, VLANs confirmed, etc.?

Mikhail_Razumov
Participant

Cabling retraced.

Note that both subordinate interfaces in the bond are UP. If it were a cabling problem, would they be UP?

VLANs are checked as well, though I doubt they can influence LACP. Can they?

Mikhail_Razumov
Participant

Finally we've found that the issue is related to sk120684.

For an unknown reason, it was necessary to explicitly define native VLAN 1 on the Cisco 3560 trunk port in order to resolve the issue, even though it should be 1 by default.
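For reference, a minimal sketch of the change described above, applied to the Po3 and member ports from the earlier output. The explicit switchport mode trunk line is an assumption, since the posted configuration did not include it:

interface Port-channel3
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk native vlan 1

interface range GigabitEthernet0/5 - 6
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk native vlan 1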