Is the Link State Monitoring dead?

Vladimir · ‎2018-06-22

This subject is occasionally being brought-up and Tim Hall has written about CCP behaviour on interfaces connected to VLANs with no pingable hosts.

I do recall, from IPSO days, that there was a feature called Link State Monitoring and have decided to give it a shot on a pair of 15400s running R80.10 and destined to be connected to a L2 only switch.

Even with "Link State Monitoring" enabled on both members (sk31336), when HA cluster members are connected to a Layer 2 switch, the only interfaces of the cluster that remain "UP" during reboot of the second HA member, are those that have other pingable hosts on same VLAN:

[Expert@CXLM01:0]# cat $FWDIR/boot/modules/fwkern.conf
fwha_forw_packet_to_not_active=1
fwha_monitor_if_link_state=1
[Expert@CXLM01:0]#

[Expert@CXLM01:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 (local) 192.168.8.11 100% Active Attention
2 192.168.8.12 0% ClusterXL Inactive or Machine is Down

Local member is in current state since Thu Jun 21 08:14:53 2018

[Expert@CXLM01:0]#
[Expert@CXLM01:0]# cphaprob -a if

Required interfaces: 7
Required secured interfaces: 1

eth2-01 Inbound: DOWN (48.8 secs) Outbound: DOWN (49 secs) non sync(non secured), broadcast
eth2-03 UP non sync(non secured), broadcast (External interface connected to the VLAN with the router)
eth2-04 Disconnected non sync(non secured), broadcast
eth2-05 Inbound: DOWN (48.6 secs) Outbound: DOWN (49 secs) non sync(non secured), broadcast
Mgmt UP non sync(non secured), broadcast (Management interface connected to the VLAN with SMS)
Sync Inbound: DOWN (48.8 secs) Outbound: DOWN (48.9 secs) sync(secured), broadcast
eth3-01 Inbound: DOWN (48.6 secs) Outbound: DOWN (48.9 secs) non sync(non secured), broadcast (eth3-01.332)
eth3-01 Inbound: DOWN (48.6 secs) Outbound: DOWN (48.9 secs) non sync(non secured), broadcast (eth3-01.24)

Virtual cluster interfaces: 10

eth2-01 10.255.100.10 VMAC address: 00:1C:7F:00:00:01
eth2-03 192.168.8.9 VMAC address: 00:1C:7F:00:00:01
eth2-05 10.255.101.10 VMAC address: 00:1C:7F:00:00:01
Mgmt 192.168.20.60 VMAC address: 00:1C:7F:00:00:01
eth3-01.48 192.168.48.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.332 172.16.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.40 192.168.40.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.24 192.168.24.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.32 192.168.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.324 172.16.24.10 VMAC address: 00:1C:7F:00:00:01

[Expert@CXLM01:0]#

When testing a cluster failover using "clusterXL_admin down/up", you may be lulled in a false sense of security, since failover will be flowless: "Standby" will become "Active" and all the interfaces of both member will remain "Up".

Member 1:

[Expert@CXLM01:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 (local) 192.168.254.1 100% Active
2 192.168.254.2 0% Standby

Local member is in current state since Thu Jun 21 08:14:53 2018

[Expert@CXLM01:0]# clusterXL_admin down
Setting member to administratively down state ...
Member current state is Down
[Expert@CXLM01:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 (local) 192.168.254.1 0% Down
2 192.168.254.2 100% Active

Local member is in current state since Thu Jun 21 08:32:58 2018

[Expert@CXLM01:0]# cphaprob -a if

Required interfaces: 7
Required secured interfaces: 1

eth2-01 UP non sync(non secured), broadcast
eth2-03 UP non sync(non secured), broadcast
eth2-04 Disconnected non sync(non secured), broadcast
eth2-05 UP non sync(non secured), broadcast
Mgmt UP non sync(non secured), broadcast
Sync UP sync(secured), broadcast
eth3-01 UP non sync(non secured), broadcast (eth3-01.332)
eth3-01 UP non sync(non secured), broadcast (eth3-01.24)

Virtual cluster interfaces: 10

eth2-01 10.255.100.10 VMAC address: 00:1C:7F:00:00:01
eth2-03 192.168.8.9 VMAC address: 00:1C:7F:00:00:01
eth2-05 10.255.101.10 VMAC address: 00:1C:7F:00:00:01
Mgmt 192.168.20.60 VMAC address: 00:1C:7F:00:00:01
eth3-01.48 192.168.48.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.332 172.16.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.40 192.168.40.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.24 192.168.24.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.32 192.168.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.324 172.16.24.10 VMAC address: 00:1C:7F:00:00:01

[Expert@CXLM01:0]#

Member 2:

[Expert@CXLM02:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 192.168.254.1 100% Active
2 (local) 192.168.254.2 0% Standby

Local member is in current state since Thu Jun 21 08:18:04 2018

[Expert@CXLM02:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 192.168.254.1 0% Down
2 (local) 192.168.254.2 100% Active

Local member is in current state since Thu Jun 21 08:32:15 2018

[Expert@CXLM02:0]# cphaprob -a if

Required interfaces: 7
Required secured interfaces: 1

eth2-01 UP non sync(non secured), broadcast
eth2-03 UP non sync(non secured), broadcast
eth2-04 Disconnected non sync(non secured), broadcast
eth2-05 UP non sync(non secured), broadcast
Mgmt UP non sync(non secured), broadcast
Sync UP sync(secured), broadcast
eth3-01 UP non sync(non secured), broadcast (eth3-01.332)
eth3-01 UP non sync(non secured), broadcast (eth3-01.24)

Virtual cluster interfaces: 10

eth2-01 10.255.100.10 VMAC address: 00:1C:7F:00:00:01
eth2-03 192.168.8.9 VMAC address: 00:1C:7F:00:00:01
eth2-05 10.255.101.10 VMAC address: 00:1C:7F:00:00:01
Mgmt 192.168.20.60 VMAC address: 00:1C:7F:00:00:01
eth3-01.48 192.168.48.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.332 172.16.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.40 192.168.40.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.24 192.168.24.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.32 192.168.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.324 172.16.24.10 VMAC address: 00:1C:7F:00:00:01

[Expert@CXLM02:0]#

This is a bit annoying, especially from the point of view of monitoring interface states on remote equipment and possible automation actions associated with state changes.

Anyone cares to chime-in?

PhoneBoy · ‎2018-06-28

CCP monitors the entire path (not just the link).

As such, what you're observing is expected behavior.

In R80.20, we'll have a way to monitor just the link (without the path).

Rolf_Sommerhald · ‎2018-10-05

> In R80.20, we'll have a way to monitor just the link (without the path).

That's exactly what we need, but could not find yet how to do it in ClusterXL R80.20 Administration Guide, besides the new Automatic and Unicast Modes for CCP.

Where can we find more about checking Link State only (on Gaia), and how to disable end-to-end path checking by CCP on selected interfaces? Thanks.

PhoneBoy · ‎2018-10-06

It's possible this requires the new kernel...will check.

Are you a member of CheckMates?

Is the Link State Monitoring dead?