Dan_Currens
Participant

Mismatch in the number of CoreXL FW instances on new 3800 cluster, but no mismatch can be found.


Two 3800 gateways are running in a cluster on R80.40 with Jumbo Hotfix Take 125.

On node "A" of the cluster, /var/log/messages is full of:

fwk: CLUS-114802-2: State change: DOWN -> STANDBY | Reason: There is already an ACTIVE member in the cluster (member 1)
fwk: CLUS-113900-2: State change: STANDBY -> DOWN | Reason: Mismatch in the number of CoreXL FW instances has been detected

The cluster will not fail over to this node.

I tried forcing the number of CoreXL instances to 6 in cpconfig and rebooting.

I also tried removing the nodes from the cluster and re-adding them.

Any suggestions?

 

The cluster does show as healthy in cphaprob stat:

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID         Unique Address  Assigned Load   State          Name

1          192.168.142.6   100%            ACTIVE         windICSfw1B
2 (local)  192.168.142.5   0%              STANDBY        windICSfw1A

Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114802
State change: DOWN -> STANDBY
Reason for state change: There is already an ACTIVE member in the cluster (member 1)
Event time: Thu Jan 13 09:36:39 2022

Last cluster failover event:
Transition to new ACTIVE: Member 2 -> Member 1
Reason: Mismatch in the number of CoreXL FW instances has been detected
Event time: Tue Jan 11 14:19:18 2022

Cluster failover count:
Failover counter: 3007

[Expert@windICSfw1A:0]# grep -c ^processor /proc/cpuinfo
8

[Expert@windICSfw1B:0]# grep -c ^processor /proc/cpuinfo
8

[Expert@windICSfw1A:0]# fw ctl affinity -l -r -a
CPU 0: Mgmt
CPU 1: fw_5
mpdaemon fwd cprid lpd pepd rad rtmd wsdnsd in.asessiond vpnd usrchkd in.acapd cprid cpd
CPU 2: fw_3
mpdaemon fwd cprid lpd pepd rad rtmd wsdnsd in.asessiond vpnd usrchkd in.acapd cprid cpd
CPU 3: fw_1
mpdaemon fwd cprid lpd pepd rad rtmd wsdnsd in.asessiond vpnd usrchkd in.acapd cprid cpd
CPU 4:
CPU 5: fw_4
mpdaemon fwd cprid lpd pepd rad rtmd wsdnsd in.asessiond vpnd usrchkd in.acapd cprid cpd
CPU 6: fw_2
mpdaemon fwd cprid lpd pepd rad rtmd wsdnsd in.asessiond vpnd usrchkd in.acapd cprid cpd
CPU 7: fw_0
mpdaemon fwd cprid lpd pepd rad rtmd wsdnsd in.asessiond vpnd usrchkd in.acapd cprid cpd
All:
Interface eth5: has multi queue enabled
Interface eth1: has multi queue enabled
Interface eth2: has multi queue enabled

[Expert@windICSfw1B:0]# fw ctl affinity -l -r -a
CPU 0: Mgmt
CPU 1: fw_5
mpdaemon fwd cprid lpd vpnd rad rtmd wsdnsd pepd in.asessiond usrchkd in.acapd cprid cpd
CPU 2: fw_3
mpdaemon fwd cprid lpd vpnd rad rtmd wsdnsd pepd in.asessiond usrchkd in.acapd cprid cpd
CPU 3: fw_1
mpdaemon fwd cprid lpd vpnd rad rtmd wsdnsd pepd in.asessiond usrchkd in.acapd cprid cpd
CPU 4:
CPU 5: fw_4
mpdaemon fwd cprid lpd vpnd rad rtmd wsdnsd pepd in.asessiond usrchkd in.acapd cprid cpd
CPU 6: fw_2
mpdaemon fwd cprid lpd vpnd rad rtmd wsdnsd pepd in.asessiond usrchkd in.acapd cprid cpd
CPU 7: fw_0
mpdaemon fwd cprid lpd vpnd rad rtmd wsdnsd pepd in.asessiond usrchkd in.acapd cprid cpd
All:
Interface eth5: has multi queue enabled
Interface eth1: has multi queue enabled
Interface eth2: has multi queue enabled
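Since the error reports a CoreXL mismatch while the two affinity listings look identical, one quick cross-check is to count the distinct fw_N instances in each member's saved "fw ctl affinity -l -r -a" output. This is a minimal sketch, assuming each member's output has been saved to a text file; the file names and the helper names count_fw_instances and compare_members are hypothetical. It only compares what the affinity listing shows, not whatever ClusterXL itself compares internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct CoreXL FW instances (fw_0, fw_1, ...)
# in saved "fw ctl affinity -l -r -a" output. -o prints each match on its
# own line, sort -u drops duplicates, tr strips padding some wc builds add.
count_fw_instances() {
  grep -o 'fw_[0-9][0-9]*' "$1" | sort -u | wc -l | tr -d ' '
}

# Hypothetical helper: compare the instance counts of two saved outputs.
# Prints both counts and returns 1 if they differ.
compare_members() {
  local a b
  a=$(count_fw_instances "$1")
  b=$(count_fw_instances "$2")
  echo "$1: $a FW instances, $2: $b FW instances"
  if [ "$a" -eq "$b" ]; then
    echo "match"
  else
    echo "MISMATCH"
    return 1
  fi
}

# Usage (file names are assumptions). On each member, in Expert mode:
#   fw ctl affinity -l -r -a > /tmp/affinity_$(hostname).txt
# then copy both files to one machine and run:
#   compare_members /tmp/affinity_windICSfw1A.txt /tmp/affinity_windICSfw1B.txt
```

With the outputs posted above, both files list fw_0 through fw_5, so the sketch would report 6 instances on each member.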

The license looks correct: CPSG-C-8-U


Accepted Solutions
HeikoAnkenbrand
Champion

Hi @Dan_Currens 

You are also seeing the message "CLUS-114802 There is already an ACTIVE member in the cluster (member X)", but that one should be OK.

I would do the following:

1) Reboot both gateways

2) Check whether Dynamic Split is enabled:
     # dynamic_split -p
    I have also seen various errors with Dynamic Split enabled. If necessary, I would disable it.
    
3) Install the latest R80.40 Jumbo Hotfix (Take 139).

Timothy_Hall
Champion

This would appear to be some kind of bug, since your CoreXL split is identical on both members. However, do you have another, separate ClusterXL cluster running R80.10 or earlier on the same VLAN(s) as the problematic cluster? I'm wondering if it could be a MAC Magic issue caused by the older cluster, as Automatic MAC Magic was not introduced until R80.20. Make sure that Automatic MAC Magic is enabled on all of your clusters that support it, using this clish command: show cluster mmagic

Also:

Cluster failover count:
Failover counter: 3007

Your cluster is flapping like crazy; TAC should probably be engaged to see what is happening.

"Max Capture: Know Your Packets" Self-Guided Video Series
available at http://www.maxpowerfirewalls.com
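The failover counter above is cumulative. To see when the member is actually being pulled DOWN for the reported mismatch, the CLUS-113900 events can be tallied per day from /var/log/messages. This is a minimal sketch, assuming each log line starts with the month and day as in standard syslog (the exact Gaia line prefix is an assumption; adjust the awk fields if it differs); count_mismatch_flaps is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally "Mismatch in the number of CoreXL FW instances"
# state changes (event CLUS-113900) per day in a syslog-style file.
# Assumes each line begins with "Mon DD" (e.g. "Jan 13 09:36:39 ...").
count_mismatch_flaps() {
  local logfile="${1:-/var/log/messages}"
  grep 'CLUS-113900' "$logfile" \
    | grep 'Mismatch in the number of CoreXL FW instances' \
    | awk '{ count[$1 " " $2]++ } END { for (d in count) print d, count[d] }' \
    | sort -k1,1M -k2,2n   # order by month name, then day number
}

# Usage, in Expert mode on the affected member:
#   count_mismatch_flaps /var/log/messages
```

Each output line is "Mon DD count", which makes it easier to correlate bursts of flapping with changes on the network, such as another cluster on the same VLAN.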

6 Replies
rickardsv
Participant

Hi, I had a similar issue, but mine was caused by using the wrong license.

Like you, I tried changing the CoreXL settings in cpconfig, but it did not help. I ended up disabling CoreXL, rebooting, and then enabling it again, and after that it worked.

AmitShmuel
Employee

Dynamic Balancing does not change the number of loaded FW instances; it only activates or deactivates instances as needed, which does not create a mismatch.

It does require the maximum number of FW instances to be loaded, but this should be configured on all cluster members.

_Val_
Admin

Please open a TAC case for this.

Dan_Currens
Participant

That was it: there was another old cluster in the same VLAN. Thank you for your reply.
