
Is the Link State Monitoring dead?

Question asked by Vladimir Yakovlev Champion on Jun 22, 2018
Latest reply on Oct 6, 2018 by Dameon Welch-Abernathy

This subject comes up occasionally, and Timothy Hall has written about CCP behaviour on interfaces connected to VLANs with no pingable hosts.

 

I recall from the IPSO days that there was a feature called Link State Monitoring, and I decided to give it a shot on a pair of 15400s running R80.10 that are destined to be connected to an L2-only switch.

 

 

Even with "Link State Monitoring" enabled on both members (sk31336), when the HA cluster members are connected to a Layer 2 switch, the only cluster interfaces that remain "UP" while the second HA member reboots are those that have other pingable hosts on the same VLAN:


[Expert@CXLM01:0]# cat $FWDIR/boot/modules/fwkern.conf
fwha_forw_packet_to_not_active=1
fwha_monitor_if_link_state=1
[Expert@CXLM01:0]#
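
To rule out a typo or a missing entry on one of the members, the persisted setting can be checked with a small helper. This is only a sketch: the function name `fwkern_param_enabled` is made up for illustration, and it only inspects the boot-time file shown above, not the value currently loaded in the kernel (that can be read with `fw ctl get int fwha_monitor_if_link_state`).

```shell
#!/bin/sh
# Hypothetical helper (not a Check Point tool): succeeds if the given
# kernel parameter is set to 1 in the given fwkern.conf-style file.
fwkern_param_enabled() {
    param="$1"   # e.g. fwha_monitor_if_link_state
    file="$2"    # e.g. $FWDIR/boot/modules/fwkern.conf
    # Match "name=1" on a line of its own, tolerating stray whitespace.
    grep -Eq "^[[:space:]]*${param}[[:space:]]*=[[:space:]]*1[[:space:]]*$" "$file"
}
```

On each member it could be called as `fwkern_param_enabled fwha_monitor_if_link_state "$FWDIR/boot/modules/fwkern.conf" && echo persisted`.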

 

[Expert@CXLM01:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 (local) 192.168.8.11 100% Active Attention
2 192.168.8.12 0% ClusterXL Inactive or Machine is Down

Local member is in current state since Thu Jun 21 08:14:53 2018

[Expert@CXLM01:0]#
[Expert@CXLM01:0]# cphaprob -a if

Required interfaces: 7
Required secured interfaces: 1

eth2-01 Inbound: DOWN (48.8 secs) Outbound: DOWN (49 secs) non sync(non secured), broadcast
eth2-03 UP non sync(non secured), broadcast (External interface connected to the VLAN with the router)
eth2-04 Disconnected non sync(non secured), broadcast
eth2-05 Inbound: DOWN (48.6 secs) Outbound: DOWN (49 secs) non sync(non secured), broadcast
Mgmt UP non sync(non secured), broadcast (Management interface connected to the VLAN with SMS)
Sync Inbound: DOWN (48.8 secs) Outbound: DOWN (48.9 secs) sync(secured), broadcast
eth3-01 Inbound: DOWN (48.6 secs) Outbound: DOWN (48.9 secs) non sync(non secured), broadcast (eth3-01.332)
eth3-01 Inbound: DOWN (48.6 secs) Outbound: DOWN (48.9 secs) non sync(non secured), broadcast (eth3-01.24)

Virtual cluster interfaces: 10

eth2-01 10.255.100.10 VMAC address: 00:1C:7F:00:00:01
eth2-03 192.168.8.9 VMAC address: 00:1C:7F:00:00:01
eth2-05 10.255.101.10 VMAC address: 00:1C:7F:00:00:01
Mgmt 192.168.20.60 VMAC address: 00:1C:7F:00:00:01
eth3-01.48 192.168.48.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.332 172.16.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.40 192.168.40.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.24 192.168.24.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.32 192.168.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.324 172.16.24.10 VMAC address: 00:1C:7F:00:00:01

[Expert@CXLM01:0]#

 

When testing a cluster failover using "clusterXL_admin down/up", you may be lulled into a false sense of security, since the failover will be flawless: the "Standby" member will become "Active" and all the interfaces of both members will remain "UP".

 

Member 1:

[Expert@CXLM01:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 (local) 192.168.254.1 100% Active
2 192.168.254.2 0% Standby

Local member is in current state since Thu Jun 21 08:14:53 2018

[Expert@CXLM01:0]# clusterXL_admin down
Setting member to administratively down state ...
Member current state is Down
[Expert@CXLM01:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 (local) 192.168.254.1 0% Down
2 192.168.254.2 100% Active

Local member is in current state since Thu Jun 21 08:32:58 2018

[Expert@CXLM01:0]# cphaprob -a if

Required interfaces: 7
Required secured interfaces: 1

eth2-01 UP non sync(non secured), broadcast
eth2-03 UP non sync(non secured), broadcast
eth2-04 Disconnected non sync(non secured), broadcast
eth2-05 UP non sync(non secured), broadcast
Mgmt UP non sync(non secured), broadcast
Sync UP sync(secured), broadcast
eth3-01 UP non sync(non secured), broadcast (eth3-01.332)
eth3-01 UP non sync(non secured), broadcast (eth3-01.24)

Virtual cluster interfaces: 10

eth2-01 10.255.100.10 VMAC address: 00:1C:7F:00:00:01
eth2-03 192.168.8.9 VMAC address: 00:1C:7F:00:00:01
eth2-05 10.255.101.10 VMAC address: 00:1C:7F:00:00:01
Mgmt 192.168.20.60 VMAC address: 00:1C:7F:00:00:01
eth3-01.48 192.168.48.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.332 172.16.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.40 192.168.40.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.24 192.168.24.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.32 192.168.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.324 172.16.24.10 VMAC address: 00:1C:7F:00:00:01

[Expert@CXLM01:0]#

 

Member 2:

 

[Expert@CXLM02:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 192.168.254.1 100% Active
2 (local) 192.168.254.2 0% Standby

Local member is in current state since Thu Jun 21 08:18:04 2018

[Expert@CXLM02:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number Unique Address Assigned Load State

1 192.168.254.1 0% Down
2 (local) 192.168.254.2 100% Active

Local member is in current state since Thu Jun 21 08:32:15 2018

[Expert@CXLM02:0]# cphaprob -a if

Required interfaces: 7
Required secured interfaces: 1

eth2-01 UP non sync(non secured), broadcast
eth2-03 UP non sync(non secured), broadcast
eth2-04 Disconnected non sync(non secured), broadcast
eth2-05 UP non sync(non secured), broadcast
Mgmt UP non sync(non secured), broadcast
Sync UP sync(secured), broadcast
eth3-01 UP non sync(non secured), broadcast (eth3-01.332)
eth3-01 UP non sync(non secured), broadcast (eth3-01.24)

Virtual cluster interfaces: 10

eth2-01 10.255.100.10 VMAC address: 00:1C:7F:00:00:01
eth2-03 192.168.8.9 VMAC address: 00:1C:7F:00:00:01
eth2-05 10.255.101.10 VMAC address: 00:1C:7F:00:00:01
Mgmt 192.168.20.60 VMAC address: 00:1C:7F:00:00:01
eth3-01.48 192.168.48.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.332 172.16.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.40 192.168.40.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.24 192.168.24.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.32 192.168.32.10 VMAC address: 00:1C:7F:00:00:01
eth3-01.324 172.16.24.10 VMAC address: 00:1C:7F:00:00:01

[Expert@CXLM02:0]#

 

This is a bit annoying, especially from the point of view of monitoring interface states on remote equipment and possible automation actions associated with state changes.
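
For the monitoring side, here is a rough sketch of how the "cphaprob -a if" output could be reduced to one state per interface for a script or an external poller. The function name `cluster_if_states` is my own invention, and the parsing assumes the R80.10 output layout pasted above; other versions may print it differently.

```shell
#!/bin/sh
# Hypothetical helper: reads "cphaprob -a if" output on stdin and prints
# "<interface> <STATE>" for each monitored interface.
cluster_if_states() {
    awk '
        /^Virtual cluster interfaces/ { exit }   # stop before the VIP/VMAC list
        /^Required / || NF < 2 { next }          # skip counters and blank lines
        $2 == "UP"           { print $1, "UP"; next }
        $2 == "Disconnected" { print $1, "DISCONNECTED"; next }
        /DOWN/               { print $1, "DOWN"; next }  # Inbound and/or Outbound DOWN
    '
}
```

Something like `cphaprob -a if | cluster_if_states | grep -v ' UP$'` would then list only the interfaces that ClusterXL does not consider up, which in the reboot scenario above would include every VLAN without another pingable host.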

 

Anyone care to chime in?
