jimm
Participant

ClusterXL not failing over when interface down

I have a pair of 3600 appliances running R80.40 T197. For the most part HA is working fine: if one member goes down, the other takes over. However, if I take down the external-facing (and SMS management) interface, the active member becomes active-attention and the standby doesn't take over. I have a separate sync-dedicated interface on each member, so it's not as if they lose sight of each other. The SMS will lose sight of the member, since it's the external/mgmt interface that goes down.

Any ideas what I should check?

6 Replies
Chris_Atkinson
Employee

What pnotes do you see in cphaprob outputs?

cphaprob stat

cphaprob -a if

cphaprob -ia list

CCSM R77/R80/ELITE
the_rock
Legend

Please send the output of the commands @Chris_Atkinson asked for, plus the output of cphaprob syncstat as well.

Cheers,

Andy

jimm
Participant

Re the cphaprob commands, below is what I was seeing on the active member when its external interface (eth1) was down. eth5 was dedicated to sync; eth1 was a secondary sync interface as well as the external interface. I have since resolved the issue by removing the sync role from eth1 and leaving it only on eth5 (dedicated sync).

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 169.254.1.1 0% STANDBY CPFW01
2 (local) 169.254.1.2 100% ACTIVE(!) CPFW02


Active PNOTEs: IAC

Last member state change event:
Event Code: CLUS-110205
State change: ACTIVE -> ACTIVE(!)
Reason for state change: Interface eth5 is down (disconnected / link down)
Event time: Wed Jul 19 09:00:14 2023

CCP mode: Manual (Unicast)
Required interfaces: 4
Required secured interfaces: 2


Interface Name: Status:

eth5 (S) UP
Mgmt Non-Monitored
eth1 (S) Inbound: DOWN (8.6 secs)
Outbound: DOWN (8.8 secs)
eth2.105 UP
eth2.979 UP

eth1 10.3.2.13
eth2.970 192.168.238.65
eth2.975 192.168.238.129
eth2.105 192.168.238.209
eth2.973 192.168.238.97
eth2.979 10.138.254.1
eth2.971 192.168.238.1
eth2.974 192.168.238.113


Built-in Devices:

Device Name: Interface Active Check
Current state: problem (non-blocking)


Delta Sync Statistics

Sync status: OK

Drops:
Lost updates................................. 0
Lost bulk update events...................... 0
Oversized updates not sent................... 0

Sync at risk:
Sent reject notifications.................... 0
Received reject notifications................ 0

Sent messages:
Total generated sync messages................ 4719247
Sent retransmission requests................. 42
Sent retransmission updates.................. 31
Peak fragments per update.................... 1

Received messages:
Total received updates....................... 306886
Received retransmission requests............. 20

Sync Interface:
Name......................................... eth1
Link speed................................... 2▒|▒
Rate......................................... 351020[Bps]
Peak rate.................................... 449670[Bps]
Link usage................................... 0%
Total........................................ 220568[MB]

Queue sizes (num of updates):
Sending queue size........................... 512
Receiving queue size......................... 256
Fragments queue size......................... 50

Timers:
Delta Sync interval (ms)..................... 100

emmap
Employee

For future reference, as of R80 we only support a single sync interface, and recommend it to be dedicated to the purpose (so not also a cluster interface with a VIP that passes traffic).

If you require redundant sync paths, you can use a bond. Details on options are in the ClusterXL Admin Guide for your version.
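
A quick way to spot the multiple-sync-interface situation is to count the interfaces flagged (S) in saved cphaprob -a if output. The Python sketch below assumes the (S) marker denotes a sync interface, which is what the output earlier in this thread suggests, and that interface rows start with the interface name. `sync_interfaces` is a hypothetical helper name, not part of any Check Point tooling.

```python
import re

# Assumed row layout: interface name first, then "(S)" for a sync interface,
# e.g. "eth5 (S) UP" or "eth1 (S) Inbound: DOWN (8.6 secs)".
SYNC_ROW = re.compile(r"^\s*(?P<name>[A-Za-z][\w.\-]*)\s+\(S\)")


def sync_interfaces(if_output):
    """Return the names of interfaces marked (S), in order of appearance."""
    names = []
    for line in if_output.splitlines():
        m = SYNC_ROW.match(line)
        if m and m.group("name") not in names:
            names.append(m.group("name"))
    return names


# Interface rows copied from the output posted earlier in this thread.
SAMPLE = """\
eth5 (S) UP
Mgmt Non-Monitored
eth1 (S) Inbound: DOWN (8.6 secs)
Outbound: DOWN (8.8 secs)
eth2.105 UP
eth2.979 UP
"""

if __name__ == "__main__":
    found = sync_interfaces(SAMPLE)
    if len(found) > 1:
        print("More than one sync interface configured:", ", ".join(found))
```

On the rows from this thread it reports eth5 and eth1, matching the original configuration before the sync role was removed from eth1.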

the_rock
Legend

I recall single sync interface support even before R55... maybe it was only recommended (not 100% positive), but redundant sync interface setups always had issues.

Andy

_Val_
Admin

It sounds to me like your SMS is only connected to one of the cluster members. Is that right? Can you please share a diagram of your setup?

