Luciano_Cirino
Contributor

Failure in the implementation of ClusterXL

Good afternoon everyone,

I'm having a problem with a ClusterXL implementation. After completing the setup, one of the cluster members, FW_01, is in DOWN state. I'm not sure whether this started before or after the access rules were created.

Has anyone experienced this before?

Attached is a screenshot of the command output for each of the members.

4 Replies
the_rock
Legend

Please send the output of the commands below from both members.

cphaprob roles

cphaprob state

cphaprob -a if

cphaprob list

cphaprob syncstat

Cheers,

Andy
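
For anyone who wants to gather all five outputs in one pass, here is a minimal sketch. It assumes a Python 3.7+ interpreter is available in the shell where `cphaprob` runs, which may not be true on every member. The names `DIAG_COMMANDS`, `run_command`, `collect_diagnostics` and `format_report` are made up for this illustration and are not part of any Check Point tool; only the `cphaprob` commands themselves come from the list above.

```python
import subprocess
from typing import Callable, Dict, List

# Commands taken from the list above, in argv form.
DIAG_COMMANDS: List[List[str]] = [
    ["cphaprob", "roles"],
    ["cphaprob", "state"],
    ["cphaprob", "-a", "if"],
    ["cphaprob", "list"],
    ["cphaprob", "syncstat"],
]


def run_command(argv: List[str]) -> str:
    """Run one command and return its combined stdout/stderr as text."""
    result = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.stdout


def collect_diagnostics(
    runner: Callable[[List[str]], str] = run_command,
) -> Dict[str, str]:
    """Return a mapping of command line -> output for every diagnostic command."""
    return {" ".join(argv): runner(argv) for argv in DIAG_COMMANDS}


def format_report(outputs: Dict[str, str]) -> str:
    """Join the outputs into one text block that can be pasted into a reply."""
    sections = [f"# {cmd}\n{out.rstrip()}\n" for cmd, out in outputs.items()]
    return "\n".join(sections)
```

Running `print(format_report(collect_diagnostics()))` on each member would produce one block per member. The `runner` parameter exists only so the collection logic can be exercised without a cluster.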

Luciano_Cirino
Contributor

Member 1 - FW_01

FW_01> cphaprob roles

ID Role

1 (local) Non-Master
2 Master

FW_01> cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID         Unique Address  Assigned Load   State          Name

1 (local)  172.16.0.1      0%              DOWN           FW_01
2          172.16.0.2      100%            ACTIVE(!)      FW_02


Active PNOTEs: LPRB, IAC

Last member state change event:
Event Code: CLUS-112000
State change: INIT -> DOWN
Reason for state change: USER DEFINED PNOTE
Event time: Mon Jun 19 15:00:24 2023

Last cluster failover event:
Transition to new ACTIVE: Member 1 -> Member 2
Reason: FULLSYNC PNOTE - cpstop
Event time: Mon Jun 19 15:00:06 2023

Cluster failover count:
Failover counter: 1
Time of counter reset: Wed Jun 14 13:08:01 2023 (reboot)


FW_01> cphaprob -a if

CCP mode: Manual (Unicast)
Required interfaces: 2
Required secured interfaces: 1


Interface Name:      Status:

eth1                 UP
Sync (S)             UP
Mgmt (P)             DOWN (940275 secs)

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 2

eth1 10.24.3.254
Mgmt 192.168.1.3

FW_01> cphaprob list

Built-in Devices:

Device Name: Interface Active Check
Current state: problem

Registered Devices:

Device Name: Local Probing
Registration number: 8
Timeout: none
Current state: problem
Time since last report: 940291 sec

FW_01> cphaprob syncstat

Delta Sync Statistics

Sync status: OK

Drops:
Lost updates................................. 0
Lost bulk update events...................... 0
Oversized updates not sent................... 0

Sync at risk:
Sent reject notifications.................... 0
Received reject notifications................ 0

Sent messages:
Total generated sync messages................ 1142352
Sent retransmission requests................. 2
Sent retransmission updates.................. 0
Peak fragments per update.................... 1

Received messages:
Total received updates....................... 13704124
Received retransmission requests............. 0

Sync Interface:
Name......................................... Sync
Link speed................................... 1000Mb/s
Rate......................................... 78480 [Bps]
Peak rate.................................... 78480 [Bps]
Link usage................................... 0%
Total........................................ 65614 [MB]

Queue sizes (num of updates):
Sending queue size........................... 512
Receiving queue size......................... 256
Fragments queue size......................... 50

Timers:
Delta Sync interval (ms)..................... 100

Reset on Mon Jun 19 15:00:30 2023 (triggered by fullsync).

============================================================================================================

 

Member 2 - FW_02

 

FW_02> cphaprob roles

ID Role

1 Non-Master
2 (local) Master

FW_02> cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID         Unique Address  Assigned Load   State          Name

1          172.16.0.1      0%              DOWN           FW_01
2 (local)  172.16.0.2      100%            ACTIVE(!)      FW_02


Active PNOTEs: LPRB, IAC

Last member state change event:
Event Code: CLUS-116505
State change: DOWN -> ACTIVE(!)
Reason for state change: All other machines are dead (timeout), Interface Sync is down (disconnected / link down)
Event time: Mon Jun 19 15:00:06 2023

Last cluster failover event:
Transition to new ACTIVE: Member 1 -> Member 2
Reason: Available on member 1
Event time: Mon Jun 19 15:00:06 2023

Cluster failover count:
Failover counter: 1
Time of counter reset: Wed Jun 14 13:08:01 2023 (reboot)


FW_02> cphaprob -a if

CCP mode: Manual (Unicast)
Required interfaces: 2
Required secured interfaces: 1


Interface Name:      Status:

eth1                 UP
Sync (S)             UP
Mgmt (P)             DOWN (940895 secs)

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 2

eth1 10.24.3.254
Mgmt 192.168.1.3

FW_02> cphaprob list

Built-in Devices:

Device Name: Interface Active Check
Current state: problem (non-blocking)

Registered Devices:

Device Name: Local Probing
Registration number: 8
Timeout: none
Current state: problem
Time since last report: 940907 sec

FW_02> cphaprob syncstat

Delta Sync Statistics

Sync status: OK

Drops:
Lost updates................................. 0
Lost bulk update events...................... 0
Oversized updates not sent................... 0

Sync at risk:
Sent reject notifications.................... 0
Received reject notifications................ 0

Sent messages:
Total generated sync messages................ 13940522
Sent retransmission requests................. 1
Sent retransmission updates.................. 35
Peak fragments per update.................... 1

Received messages:
Total received updates....................... 972144
Received retransmission requests............. 1

Sync Interface:
Name......................................... Sync
Link speed................................... 1000Mb/s
Rate......................................... 78470 [Bps]
Peak rate.................................... 78470 [Bps]
Link usage................................... 0%
Total........................................ 65644 [MB]

Queue sizes (num of updates):
Sending queue size........................... 512
Receiving queue size......................... 256
Fragments queue size......................... 50

Timers:
Delta Sync interval (ms)..................... 100

Reset on Mon Jun 19 14:54:13 2023 (triggered by fullsync).

 

One more detail: members 1 and 2 can communicate with each other and with the management server, but they cannot reach the internet. The management server can communicate with everyone and can reach the internet.
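
The interface section of `cphaprob -a if` is easy to skim past in a long paste, so here is a rough sketch that pulls out the interfaces reported as DOWN. It assumes the line layout seen in the output above (`name`, optional `(flag)`, `UP` or `DOWN`, optional `(N secs)`); other versions or bond interfaces may print differently. `InterfaceStatus`, `parse_interfaces` and `down_interfaces` are hypothetical names chosen for this example.

```python
import re
from typing import List, NamedTuple, Optional


class InterfaceStatus(NamedTuple):
    name: str
    flags: str                # e.g. "S" (sync) or "P" (probing); "" if none
    status: str               # "UP" or "DOWN"
    down_secs: Optional[int]  # seconds reported next to DOWN, if any


# Matches lines such as:
#   eth1 UP
#   Sync (S) UP
#   Mgmt (P) DOWN (940275 secs)
_IF_LINE = re.compile(
    r"^\s*(?P<name>\S+)"
    r"(?:\s+\((?P<flags>[^)]*)\))?"
    r"\s+(?P<status>UP|DOWN)"
    r"(?:\s+\((?P<secs>\d+)\s+secs\))?\s*$"
)


def parse_interfaces(output: str) -> List[InterfaceStatus]:
    """Extract the interface status lines from `cphaprob -a if` text."""
    found = []
    for line in output.splitlines():
        m = _IF_LINE.match(line)
        if m:
            secs = m.group("secs")
            found.append(
                InterfaceStatus(
                    name=m.group("name"),
                    flags=m.group("flags") or "",
                    status=m.group("status"),
                    down_secs=int(secs) if secs else None,
                )
            )
    return found


def down_interfaces(output: str) -> List[InterfaceStatus]:
    """Return only the interfaces whose status is DOWN."""
    return [i for i in parse_interfaces(output) if i.status == "DOWN"]
```

Fed the FW_01 output above, this would report only Mgmt as DOWN, with 940275 seconds, which is roughly 10.9 days. The same interface is reported DOWN on FW_02.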
the_rock
Legend

It appears there is a sync issue. If I were you, I would run cphastop; cphastart on both members and test again.

Andy
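
To re-check the member states after restarting clustering, the member table printed by `cphaprob stat` can be parsed with a few lines of Python. This is only a sketch based on the table layout in the outputs posted earlier in this thread. Treating `ACTIVE` and `STANDBY` as the only healthy states in a High Availability cluster is an assumption of this example, and `Member`, `parse_members` and `unhealthy_members` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Member(NamedTuple):
    member_id: int
    is_local: bool
    address: str
    load_percent: int
    state: str
    name: str


# Matches rows such as:
#   1 (local) 172.16.0.1 0% DOWN FW_01
#   2 172.16.0.2 100% ACTIVE(!) FW_02
_MEMBER_ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+"
    r"(?P<local>\(local\)\s+)?"
    r"(?P<addr>\d+\.\d+\.\d+\.\d+)\s+"
    r"(?P<load>\d+)%\s+"
    r"(?P<state>.+?)\s+"
    r"(?P<name>\S+)\s*$"
)

# Assumption: in HA mode only these two states mean "no problem detected".
HEALTHY_STATES = {"ACTIVE", "STANDBY"}


def parse_members(output: str) -> List[Member]:
    """Extract the cluster member rows from `cphaprob stat` text."""
    members = []
    for line in output.splitlines():
        m = _MEMBER_ROW.match(line)
        if m:
            members.append(
                Member(
                    member_id=int(m.group("id")),
                    is_local=m.group("local") is not None,
                    address=m.group("addr"),
                    load_percent=int(m.group("load")),
                    state=m.group("state"),
                    name=m.group("name"),
                )
            )
    return members


def unhealthy_members(output: str) -> List[Member]:
    """Return the members whose state is not in HEALTHY_STATES."""
    return [m for m in parse_members(output) if m.state not in HEALTHY_STATES]
```

With the FW_01 output from this thread, both members would be flagged: FW_01 as `DOWN` and FW_02 as `ACTIVE(!)`.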

PhoneBoy
Admin

Edited the original post for clarity and moved it to the correct space

