Zoxir86
Contributor

Cluster inconsistency after policy installation


Hello guys

 

We have noticed that after upgrading from R80.30 to R81.10, two of our clusters behave erratically after policy installation. The primary remains active(F) while believing the secondary is in standby, whereas the secondary shows itself as active and the primary as Lost. This goes on for 4-5 minutes until both members converge to the correct state. cphaprob says that no CCP packets are sent on the Sync interface, but tcpdump shows otherwise.

I noticed that the values of the following parameters differ between the two members:

 

[Expert@GW-01:0]# fw ctl get int fwha_mac_magic

fwha_mac_magic = 254

[Expert@GW-01:0]# fw ctl get int fwha_mac_forward_magic

fwha_mac_forward_magic = 253

 

[Expert@GW-02:0]# fw ctl get int fwha_mac_magic

fwha_mac_magic = 1

[Expert@GW-02:0]# fw ctl get int fwha_mac_forward_magic

fwha_mac_forward_magic = 254

 

Are these parameters still relevant in R81.10? I have a case open with TAC, but it has been slow, so any opinion would be helpful.
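A quick way to compare these values across both members is to save the same fw ctl get int output on each member to a text file and let a small script flag any parameter whose value differs. This is only an illustrative sketch: the helper name compare_kparams and the file names are hypothetical, and it assumes the saved output contains plain "name = value" lines as shown above (other lines, such as the Expert prompt, are ignored).

```shell
# Hypothetical helper: compare "name = value" lines (as printed by
# "fw ctl get int <param>") saved from two cluster members.
# Prints one MISMATCH line per parameter present in both files with
# different values; lines without " = " (e.g. the Expert prompt) are skipped.
compare_kparams() {
  awk -F' = ' '
    { sub(/\r$/, "") }                    # tolerate CRLF line endings
    NF == 2 {
      if (FNR == NR) {
        first[$1] = $2                    # first file: remember each value
      } else if (($1 in first) && first[$1] != $2) {
        printf "MISMATCH %s: %s vs %s\n", $1, first[$1], $2
      }
    }
  ' "$1" "$2"
}
```

For example, on each member run for p in fwha_mac_magic fwha_mac_forward_magic; do fw ctl get int $p; done > gw01.txt (gw02.txt on the other member), copy both files to one place and run compare_kparams gw01.txt gw02.txt. No output means the values match; with the values posted above, both parameters would be reported as mismatched.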


Your commands are outdated! Better look into the current CP_R81.10_ClusterXL_AdminGuide and into sk42096: Cluster member is stuck in 'Ready' state.

CCSE CCTE CCSM SMB Specialist
Zoxir86
Contributor

I did not see any mention of the cluster ID or MAC magic in the new admin guide. I am afraid this is something the gateways inherited from previous versions, since they were upgraded in place. On another R81.10 cluster, where I did a clean install, the values are the same on both members.


What CCP mode does each member believe it is operating in?

Note that the following was introduced starting from R80.40:

* Support for Cluster Control Protocol (CCP) in Unicast mode for any number of cluster members, eliminating the need for CCP Broadcast, Multicast or Automatic modes.

* Eliminated the need for MAC Magic configuration when several clusters are connected to the same subnet.

* Cluster Control Protocol encryption is now enabled by default.

 

Zoxir86
Contributor

You're onto something: while the active member has CCP correctly set to unicast, on the standby it looks like this:

CCP mode: Manual (Multicast)


They should ideally both be the same, i.e. both auto or both unicast.

Zoxir86
Contributor

Any idea as to how I can change it?


To set the CCP mode:

  • In Gaia Clish, run:

    set cluster ccp {auto | unicast | multicast | broadcast}

  • In Expert mode, run:

    cphaconf set_ccp {auto | unicast | multicast | broadcast}

This configuration applies immediately and survives reboot.

the_rock
Champion

I checked a client's cluster and both members have exactly the same values: fwha_mac_magic = 254 and fwha_mac_forward_magic = 253.

Can you paste the output of cphaprob state and cphaprob -a if?

Zoxir86
Contributor

gw2

cphaprob -a if

CCP mode: Manual (Multicast)
Required interfaces: 5
Required secured interfaces: 1


Interface Name: Status:

eth5 UP
eth8 UP
Sync (S) UP
bond0.11 (LS) UP
bond0.48 (LS) UP

S - sync, LM - link monitor, HA/LS - bond type

Virtual cluster interfaces: 16

 

gw1

 

cphaprob -a if

CCP mode: Manual (Unicast)
Required interfaces: 5
Required secured interfaces: 1


Interface Name: Status:

eth5 UP
eth8 UP
Sync (S) UP
bond0.11 (LS) UP
bond0.48 (LS) UP

S - sync, LM - link monitor, HA/LS - bond type

Virtual cluster interfaces: 16
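The first line of cphaprob -a if already shows the mismatch (Manual (Multicast) on gw2, Manual (Unicast) on gw1). The same comparison can be scripted from saved output so it is easy to repeat after each change. A minimal sketch, assuming the cphaprob -a if output of each member was saved to a text file; ccp_mode and check_ccp_modes are hypothetical helper names, not Check Point tools:

```shell
# Hypothetical helpers: extract and compare the "CCP mode:" line from
# "cphaprob -a if" output saved on each cluster member.
ccp_mode() {
  # Print the text after "CCP mode:", without CR or trailing blanks.
  sed -n 's/^[[:space:]]*CCP mode:[[:space:]]*//p' "$1" |
    tr -d '\r' | sed 's/[[:space:]]*$//' | head -n 1
}

check_ccp_modes() {
  m1=$(ccp_mode "$1")
  m2=$(ccp_mode "$2")
  if [ -z "$m1" ] || [ -z "$m2" ]; then
    echo "UNKNOWN: no 'CCP mode:' line found"
    return 2
  fi
  if [ "$m1" = "$m2" ]; then
    echo "OK: both members report CCP mode '$m1'"
  else
    echo "MISMATCH: '$m1' vs '$m2'"
    return 1
  fi
}
```

Run against the two outputs above (gw1 first), check_ccp_modes would print MISMATCH: 'Manual (Unicast)' vs 'Manual (Multicast)' and return a non-zero status, so it can serve as a quick re-check after changing the mode with cphaconf set_ccp.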

the_rock
Champion

[Expert@HostName]# cphaconf set_ccp unicast

I don't believe multicast is supported in R81.10, from what I remember of the last time a customer and I tried changing it, but give it a go. It's been a few months, so it's possible it was another protocol.

Zoxir86
Contributor

I changed it, but the issue still happens. This is what cphaprob state on GW-02 looks like after policy installation. There is no unique address for GW-01 either:

ID Unique Address Assigned Load State Name

1 none 0% GW-01
2 (local) a.b.c.d 100% GW-02


Active PNOTEs: LPRB, IAC

Last member state change event:
Event Code: CLUS-116505
State change: DOWN -> ACTIVE(!)
Reason for state change: All other machines are dead (timeout), Interface Sync is down (Cluster Control Protocol packets are not received)
Event time: Thu Jul 14 16:59:52 2022

Last cluster failover event:
Transition to new ACTIVE: Member 1 -> Member 2
Reason: Available on member 1
Event time: Thu Jul 14 16:59:52 2022

Cluster failover count:
Failover counter: 6
Time of counter reset: Fri Jun 24 08:32:48 2022 (reboot)

the_rock
Champion

What mode did you change it to? 

Zoxir86
Contributor

I changed both members to unicast

the_rock
Champion

So what is the current output of cphaprob state and cphaprob -a if?

Zoxir86
Contributor

CCP mode: Manual (Unicast)
Required interfaces: 5
Required secured interfaces: 1

Same for both members.

the_rock
Champion

Okay, and what about the MAC magic settings? Are they the same? If so, it should be possible to make it work. You may need to do a quick failover by running clusterXL_admin down on the master, or cphastop and cphastart on BOTH members.

Andy

charlokt
Explorer

The same thing happened to me. I upgraded from R80.30 to R81.10 and the cluster was not established, as it kept poking both members. Would it be necessary to configure the CCP mode so that the cluster remains Active/Passive, and keep it in the state it had on R80.30?

the_rock
Champion

I upgraded one customer's cluster from R80.40 to R81.10 and never had this problem. The mode has always been unicast.


It's also worth checking CCP encryption per sk169777.

Zoxir86
Contributor

This might be worth a try. I will probably check this on Monday and come back here with the results.

Zoxir86
Contributor

This unfortunately did not work

Zoxir86
Contributor

This is exactly the behavior I am seeing too; still no meaningful reply from TAC.

Zoxir86
Contributor

Rebooting the standby member has solved the issue for now; I will monitor it for a while.

Has it remained stable since?

(Note sk174510 was recently published for an MVC upgrade issue)

Zoxir86
Contributor

It has. I have marked the post as the solution; thank you for your help.

Oliver_Fink
Advisor

So, if the kernel parameters are different, what is the content of $FWDIR/boot/modules/fwkern.conf on both nodes? Are any kernel parameters set differently there?

Zoxir86
Contributor

The parameters mentioned above are not in fwkern.conf

the_rock
Champion

I believe they would be there if they had been set manually:

$FWDIR/boot/modules/fwkern.conf
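That check can be run the same way on both nodes in Expert mode, where FWDIR is already set. A minimal sketch, assuming fwkern.conf uses its usual layout of one name=value entry per line; check_fwkern is a hypothetical helper name:

```shell
# Hypothetical helper: report whether fwha_mac_magic / fwha_mac_forward_magic
# are set persistently in $FWDIR/boot/modules/fwkern.conf on this member.
check_fwkern() {
  if [ -z "${FWDIR:-}" ]; then
    echo "FWDIR is not set (run this in Expert mode)" >&2
    return 2
  fi
  conf="$FWDIR/boot/modules/fwkern.conf"
  if [ ! -f "$conf" ]; then
    echo "not found: $conf"
    return 0
  fi
  # Print matching name=value entries; say so explicitly if there are none.
  if ! grep -E '^[[:space:]]*(fwha_mac_magic|fwha_mac_forward_magic)[[:space:]]*=' "$conf"; then
    echo "MAC magic parameters not set in $conf"
  fi
}
```

If the two nodes print different entries (or only one of them prints an entry), the difference is persisted in fwkern.conf. If neither prints an entry for these parameters, the differing values are not coming from fwkern.conf, which matches what was reported above.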
