
ClusterXL

Hello all, I need help with a ClusterXL issue. We have two 23800s configured as a cluster, running R80.10 Take 42 since December 2017. We recently discovered that on the active firewall, 'cphaprob stat' shows Active/Down, whereas on the standby firewall it shows Active/Standby. We tried cpstop and cpstart and rebooted the problematic member, but that didn't resolve the issue.

On the Active Firewall:- 

cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number       Unique Address   Assigned Load   State

1 (local)    10.254.254.2     100%            Active
2            10.254.254.1     0%              Down

 On the Standby Firewall:-

cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

Number       Unique Address   Assigned Load   State

1            10.254.254.2     100%            Active
2 (local)    10.254.254.1     0%              Standby
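The mismatch between the two outputs above can also be checked mechanically. Below is a minimal Python sketch, assuming the `cphaprob stat` output from each member has been pasted into a string; the function names `parse_cphaprob_stat` and `disagreements` are made up for illustration and are not part of any Check Point tool.

```python
import re

# Matches member rows such as "1 (local) 10.254.254.2 100% Active".
# The row layout is assumed from the output posted in this thread.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<local>\(local\)\s+)?"
    r"(?P<addr>\d+\.\d+\.\d+\.\d+)\s+(?P<load>\d+)%\s+(?P<state>.+?)\s*$"
)


def parse_cphaprob_stat(text):
    """Return {member_id: (is_local, state)} from pasted `cphaprob stat` output."""
    members = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            is_local = match.group("local") is not None
            members[int(match.group("id"))] = (is_local, match.group("state"))
    return members


def disagreements(view_a, view_b):
    """List the member ids whose state differs between the two members' views."""
    a = parse_cphaprob_stat(view_a)
    b = parse_cphaprob_stat(view_b)
    return sorted(mid for mid in a.keys() & b.keys() if a[mid][1] != b[mid][1])
```

Fed the two outputs posted here, `disagreements` should report member 2, which one side sees as Down and the other as Standby.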

Couldn't find anything in pnotes. Everything looks okay.

FW1:- Standby

cphaprob list

There are no pnotes in problem state

* Use -l option to show full list of pnotes

FW2:- Active

cphaprob list

There are no pnotes in problem state

* Use -l option to show full list of pnotes

Has anyone come across a similar issue? Any suggestions would be much appreciated.

10 Replies

Re: ClusterXL

My suggestion is to find where the problem is.

cphaprob stat

cphaprob -l list

cphaprob -a if

cphaconf show_bond

cphaconf cluster_id get

fw ctl pstat

fw tab -t connections -s

fw ctl affinity -l -a

fw ctl multik stat

cat /var/log/messages

How to troubleshoot failovers in ClusterXL 

How to troubleshoot failovers in ClusterXL - Advanced Guide 
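To compare both members side by side, it can help to capture all of the commands above into one report per member. The following is a rough Python sketch, assuming a Python 3 interpreter is available wherever it runs (which may not be true on the gateway itself); `COMMANDS`, `run` and `collect` are hypothetical names, and the command list is simply copied from this reply, minus `cat /var/log/messages`, which is better reviewed separately.

```python
import shlex
import subprocess

# Commands copied from the reply above.
COMMANDS = [
    "cphaprob stat",
    "cphaprob -l list",
    "cphaprob -a if",
    "cphaconf show_bond",
    "cphaconf cluster_id get",
    "fw ctl pstat",
    "fw tab -t connections -s",
    "fw ctl affinity -l -a",
    "fw ctl multik stat",
]


def run(cmd):
    """Run one command and return its combined stdout/stderr as text."""
    proc = subprocess.run(
        shlex.split(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.stdout


def collect(commands=COMMANDS, runner=run):
    """Return one report: each command as a heading, followed by its output."""
    sections = []
    for cmd in commands:
        try:
            out = runner(cmd)
        except OSError as exc:  # for example, the command does not exist here
            out = "could not run: {}".format(exc)
        sections.append("### {}\n{}".format(cmd, out.rstrip()))
    return "\n\n".join(sections) + "\n"
```

Writing the result of `collect()` to a file on each member and diffing the two files is one way to spot where the members differ.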

Vladimir

Re: ClusterXL

May be related to the improper handling of multicast by the switch.

If you can, try changing the CCP mode to broadcast and see if this'll resolve your issues: How to set ClusterXL Control Protocol (CCP) in Broadcast / Multicast mode in ClusterXL 


Re: ClusterXL

See SK:

sk33221

sk31934

Or changing the CCP mode to broadcast:

# cphaconf set_ccp broadcast

I don't understand why multicast mode is used by default anyway. Broadcast is always the best way.

Vladimir

Re: ClusterXL

Because CCP in broadcast mode is blasted out of all interfaces of the cluster to every connected network, whereas properly configured multicast allows cluster members to subscribe to it reducing unnecessary traffic in the broadcast domain. 

Re: ClusterXL

Yes, you're right, but with 2*10 CCP packets per interface that's no problem from my point of view. This has been a philosophical discussion for 20 years (broadcast or multicast), though. If the switches are configured correctly, multicast is better from my point of view. CCP broadcast is uncomplicated.

Re: ClusterXL

Sumanth Parvathaneni

Run this command:

# tcpdump -eni Sync port 8116

This worked for me.
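If the capture is saved to a text file, a quick tally of source MAC addresses shows whether CCP traffic from both members is actually arriving on the Sync interface. This is only a sketch in Python, assuming the usual `tcpdump -e` line layout (timestamp, source MAC, `>`, destination MAC); `count_ccp_sources` is a made-up helper name.

```python
import re
from collections import Counter

MAC = r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}"

# Assumed `tcpdump -e` layout: "<time> <src mac> > <dst mac>, ethertype ..."
# An optional "(oui ...)" annotation after the source MAC is tolerated.
LINE = re.compile(
    r"^\S+\s+(?P<src>" + MAC + r")(?:\s+\(oui [^)]*\))?\s+>\s+(?P<dst>" + MAC + r")"
)


def count_ccp_sources(capture_text):
    """Count captured frames per source MAC address."""
    counts = Counter()
    for line in capture_text.splitlines():
        match = LINE.match(line)
        if match:
            counts[match.group("src").lower()] += 1
    return counts
```

If only one source MAC shows up in a capture taken on the member that reports its peer as Down, that would suggest the peer's CCP packets are not reaching it.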

- Jeremy

Re: ClusterXL

sk20576 - as mentioned above, changing CCP mode from Multicast to Broadcast resolved this issue for me.  

Employee

Re: ClusterXL

In my last case it was the difference in affinity settings (fw ctl affinity -l -r -v) and multi-queue configuration on both members... 

Also, check hotfix level for both members... cpinfo -y all

Or maybe different time configured on both members... use NTP if possible

Change CCP to broadcast, especially if there is a Cisco Nexus switch in the middle; it has happened to me several times...
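For the hotfix and affinity comparisons, saving the same command's output from each member and listing the lines that appear on only one side is usually enough to spot a mismatch. Here is a small Python sketch of that idea; `only_in_one` is a hypothetical helper, and it makes no assumption about the output format beyond comparing whole lines.

```python
def only_in_one(output_a, output_b):
    """Return (lines only in A, lines only in B), ignoring blank lines and indentation."""
    lines_a = {line.strip() for line in output_a.splitlines() if line.strip()}
    lines_b = {line.strip() for line in output_b.splitlines() if line.strip()}
    return sorted(lines_a - lines_b), sorted(lines_b - lines_a)
```

Two empty lists mean the outputs match line for line; lines that legitimately differ per member (hostnames, timestamps) will show up too and need to be read with that in mind.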

Re: ClusterXL

Check whether the switch the firewall is connected to has EEE (Energy Efficient Ethernet) enabled. If it does, disable it and check the cluster state again.


Re: ClusterXL

What does the long error message on the "Down" cluster gateway show?
# cpstat ha -f all

If there are interface errors, what does the following command show?
# cphaprob -a if

Do you see sync errors or cluster state changes with the following command?
# clish -c "show routed cluster-state detailed"

Any sync errors:
# fw ctl pstat