CoreXL doesn't see one core

pepso100 · ‎2022-11-24

Hi guys,

I have just converted a standalone FW to a cluster, but the cluster will not sync because of the error "Mismatch in the number of CoreXL FW instances has been detected".

[Expert@gw_b:0]# cphaprob state

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 169.254.0.2 100% ACTIVE(!) gwA
2 (local) 169.254.0.1 0% DOWN gwB

Active PNOTEs: COREXL

Last member state change event:
Event Code: CLUS-113905
State change: ACTIVE(!) -> DOWN
Reason for state change: Mismatch in the number of CoreXL FW instances has been detected
Event time: Thu Nov 24 14:55:43 2022

Last cluster failover event:
Transition to new ACTIVE: Member 2 -> Member 1
Reason: Mismatch in the number of CoreXL FW instances has been detected
Event time: Thu Nov 24 14:55:43 2022

Cluster failover count:
Failover counter: 1
Time of counter reset: Thu Nov 24 14:55:07 2022 (reboot)

1. Licenses should be fine on both members of the cluster.

[Expert@gw_a:0]# cplic print
Host Expiration Features
192.168.1.100 16Dec2022 CPSG-C-8-U CPSB-FW CPSB-VPN CPSB-IPSA CPSB-DL

[Expert@gw_b:0]# cplic print
Host Expiration Features
192.168.1.100 24Dec2022 CPSG-C-8-U CPSB-FW CPSB-VPN CPSB-IPSA CPSB-DLP

2. The number of cores is the same on both members of the cluster.

[Expert@gw_a:0]# cpstat -f cpu os | grep -i CPUs
CPUs Number: 6

[Expert@gw_b:0]# cpstat -f cpu os | grep -i CPUs
CPUs Number: 6

3. Here is the difference.

GW_A has only:

fw_0: CPU 5
fw_1: CPU 2

and GW_B has:
fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 4

[Expert@gw_a:0]# fw ctl affinity -l
Interface eth0: CPU 0
Interface eth1: CPU 0
Interface eth2: CPU 0
Interface eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 2
Daemon mpdaemon: CPU 2 5
Daemon fwd: CPU 2 5
Daemon scrubd: CPU 2 5
Daemon core_uploader: CPU 2 5
Daemon rtmd: CPU 2 5
Daemon rad: CPU 2 5
Daemon in.msd: CPU 2 5
Daemon usrchkd: CPU 2 5
Daemon pdpd: CPU 2 5
Daemon cp_file_convertd: CPU 2 5
Daemon pepd: CPU 2 5
Daemon lpd: CPU 2 5
Daemon in.asessiond: CPU 2 5
Daemon scanengine_b: CPU 2 5
Daemon watermark_cp_file_convertd: CPU 2 5
Daemon vpnd: CPU 2 5
Daemon in.acapd: CPU 2 5
Daemon scrub_cp_file_convertd: CPU 2 5
Daemon cprid: CPU 2 5
Daemon cpd: CPU 2 5

[Expert@gw_b:0]# fw ctl affinity -l
Interface eth0: CPU 0
Interface eth1: CPU 0
Interface eth2: CPU 0
Interface eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 4
Daemon mpdaemon: CPU 2 4 5
Daemon fwd: CPU 2 4 5
Daemon pdpd: CPU 2 4 5
Daemon lpd: CPU 2 4 5
Daemon cp_file_convertd: CPU 2 4 5
Daemon vpnd: CPU 2 4 5
Daemon scrub_cp_file_convertd: CPU 2 4 5
Daemon pepd: CPU 2 4 5
Daemon rad: CPU 2 4 5
Daemon rtmd: CPU 2 4 5
Daemon in.asessiond: CPU 2 4 5
Daemon usrchkd: CPU 2 4 5
Daemon in.acapd: CPU 2 4 5
Daemon watermark_cp_file_convertd: CPU 2 4 5
Daemon in.msd: CPU 2 4 5
Daemon scrubd: CPU 2 4 5
Daemon cprid: CPU 2 4 5
Daemon cpd: CPU 2 4 5

Does anyone have an idea what the issue could be?

Thank you very much for any advice.

Chris_Atkinson · ‎2022-11-24

Have you checked cpconfig > corexl?

Could you please share the version / jumbo take information for the cluster?

CCSM R77/R80/ELITE

pepso100 · ‎2022-11-24

GW_B

[Expert@gw_b:0]# cpinfo -y all

This is Check Point CPinfo Build 914000214 for GAIA
[IDA]
No hotfixes..

[MGMT]
No hotfixes..

[CPFC]
HOTFIX_TEX_ENGINE_R81_AUTOUPDATE

[FW1]
HOTFIX_GOT_TPCONF_AUTOUPDATE
HOTFIX_TEX_ENGINE_R81_AUTOUPDATE

FW1 build number:
This is Check Point's software version R81 - Build 959
kernel: R81 - Build 813

[SecurePlatform]
No hotfixes..

[PPACK]
No hotfixes..

[CPinfo]
No hotfixes..

[AutoUpdater]
No hotfixes..

[DIAG]
No hotfixes..

[CVPN]
No hotfixes..

[CPUpdates]
BUNDLE_GOT_TPCONF_AUTOUPDATE Take: 63
BUNDLE_TEX_ENGINE_R81_AUTOUPDATE Take: 10

GW_A

gw_a> cpinfo -y all

This is Check Point CPinfo Build 914000231 for GAIA
[IDA]
No hotfixes..
[MGMT]
No hotfixes..
[CPFC]
HOTFIX_TEX_ENGINE_R81_AUTOUPDATE
[FW1]
HOTFIX_TEX_ENGINE_R81_AUTOUPDATE
HOTFIX_R80_40_MAAS_TUNNEL_AUTOUPDATE
HOTFIX_GOT_TPCONF_AUTOUPDATE

FW1 build number:
This is Check Point's software version R81 - Build 959
kernel: R81 - Build 813
[SecurePlatform]
No hotfixes..
[PPACK]
No hotfixes..
[CPinfo]
No hotfixes..
[AutoUpdater]
No hotfixes..
[DIAG]
No hotfixes..
[CVPN]
No hotfixes..
[CPUpdates]
BUNDLE_CPSDC_AUTOUPDATE Take: 21
BUNDLE_HCP_AUTOUPDATE Take: 58
BUNDLE_CORE_FILE_UPLOADER_AUTOUPDATE Take: 17
BUNDLE_R80_40_MAAS_TUNNEL_AUTOUPDATE Take: 49
BUNDLE_INFRA_AUTOUPDATE Take: 55
BUNDLE_DEP_INSTALLER_AUTOUPDATE Take: 25
BUNDLE_GENERAL_AUTOUPDATE Take: 13
BUNDLE_GOT_TPCONF_AUTOUPDATE Take: 111
BUNDLE_TEX_ENGINE_R81_AUTOUPDATE Take: 12
[CPDepInst]
No hotfixes..
[core_uploader]
HOTFIX_CHARON_HF
[hcp_wrapper]
HOTFIX_HCP_AUTOUPDATE
[cpsdc_wrapper]
HOTFIX_CPSDC_AUTOUPDATE

Lesley · ‎2022-11-24

One gateway has 2 FW workers and the other one has 3. I would pick a number and use it on both members 🙂

After the change, I think you need to reboot for it to take effect.
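
A minimal sketch of that comparison, assuming the `fw ctl affinity -l` output of each member has been saved as text. The function name `count_fw_instances` and the variables are hypothetical, and the sample data is an excerpt of the output posted at the top of this thread.

```python
import re

# Matches CoreXL instance lines such as "fw_2: CPU 4".
# Daemon and Interface lines do not start with "fw_" and are ignored.
FW_LINE = re.compile(r"^fw_(\d+):\s*CPU\s+\d+")


def count_fw_instances(affinity_output: str) -> int:
    """Count the fw_N lines in saved `fw ctl affinity -l` output."""
    return sum(
        1
        for line in affinity_output.splitlines()
        if FW_LINE.match(line.strip())
    )


# Excerpts copied from the first post (not live output).
GW_A = """Interface eth0: CPU 0
fw_0: CPU 5
fw_1: CPU 2
Daemon fwd: CPU 2 5"""

GW_B = """Interface eth0: CPU 0
fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 4
Daemon fwd: CPU 2 4 5"""

counts = {"gw_a": count_fw_instances(GW_A), "gw_b": count_fw_instances(GW_B)}
print(counts)
if len(set(counts.values())) > 1:
    print("CoreXL instance count differs between cluster members")
```

This only reads text that was already collected; it does not run any Check Point command itself.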


pepso100 · ‎2022-11-24

How can I do that?

The number of workers should always be the number of cores minus 1 (for the SND), no?

_Val_ · ‎2022-11-24

Reading the comments here, it seems your boxes have a different number of CoreXL instances configured. Run cpconfig and check whether any manual modifications were made. Set up the same amount of instances and reboot the members. This should clear up the issue.

pepso100 · ‎2022-11-24

Hi Val,

Thanks for the advice.

I did exactly what you said (I set up 6 instances on both nodes via cpconfig and rebooted them), and now it looks like the number of CoreXL instances is the same, but the output is slightly different on GW_A and GW_B.

[Expert@gw_a:0]# fw ctl affinity -l
Interface eth0: CPU 0
Interface eth1: CPU 0
Interface eth2: CPU 0
Interface eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 2
fw_3: CPU 2
fw_4: CPU 2
fw_5: CPU 2
Daemon mpdaemon: CPU 2 5
Daemon fwd: CPU 2 5
Daemon scrub_cp_file_convertd: CPU 2 5
Daemon core_uploader: CPU 2 5
Daemon rtmd: CPU 2 5
Daemon rad: CPU 2 5
Daemon in.msd: CPU 2 5
Daemon pdpd: CPU 2 5
Daemon watermark_cp_file_convertd: CPU 2 5
Daemon lpd: CPU 2 5
Daemon scrubd: CPU 2 5
Daemon in.asessiond: CPU 2 5
Daemon scanengine_b: CPU 2 5
Daemon vpnd: CPU 2 5
Daemon cp_file_convertd: CPU 2 5
Daemon usrchkd: CPU 2 5
Daemon in.acapd: CPU 2 5
Daemon pepd: CPU 2 5
Daemon cprid: CPU 2 5
Daemon cpd: CPU 2 5

[Expert@gw_b:0]# fw ctl affinity -l
Interface eth0: CPU 0
Interface eth1: CPU 0
Interface eth2: CPU 0
Interface eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 4
fw_3: CPU 1
fw_4: CPU 3
fw_5: CPU 0
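
The difference between the two outputs above is the placement of the instances, not their number: on gw_a five instances share CPU 2, while on gw_b each instance has its own CPU. A small sketch that makes this visible, assuming the instance lines have been saved as text. The helper names `instances_per_cpu` and `stacked_cpus` are hypothetical.

```python
import re
from collections import defaultdict

# Matches "fw_3: CPU 1" and also multi-CPU forms such as "fw_3: CPU 1 2".
FW_LINE = re.compile(r"^fw_(\d+):\s*CPU\s+(\d+(?:\s+\d+)*)$")


def instances_per_cpu(affinity_output: str) -> dict:
    """Map each CPU id to the list of fw instance ids pinned to it."""
    per_cpu = defaultdict(list)
    for line in affinity_output.splitlines():
        match = FW_LINE.match(line.strip())
        if match:
            for cpu in match.group(2).split():
                per_cpu[int(cpu)].append(int(match.group(1)))
    return dict(per_cpu)


def stacked_cpus(affinity_output: str) -> dict:
    """Keep only the CPUs that carry more than one fw instance."""
    return {
        cpu: ids
        for cpu, ids in instances_per_cpu(affinity_output).items()
        if len(ids) > 1
    }


# Instance lines copied from the two outputs above.
GW_A = """fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 2
fw_3: CPU 2
fw_4: CPU 2
fw_5: CPU 2"""

GW_B = """fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 4
fw_3: CPU 1
fw_4: CPU 3
fw_5: CPU 0"""

print("gw_a stacked:", stacked_cpus(GW_A))
print("gw_b stacked:", stacked_cpus(GW_B))
```

Whether this stacking is related to the remaining COREXL pnote is not established anywhere in this thread; the sketch only surfaces the difference.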

But I still have the same issue.

[Expert@gw_a:0]# cphaprob state

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 (local) 169.254.0.2 100% ACTIVE(!) gwA
2 169.254.0.1 0% DOWN gwB

Active PNOTEs: COREXL

Last member state change event:
Event Code: CLUS-113905
State change: STANDBY -> ACTIVE(!)
Reason for state change: Mismatch in the number of CoreXL FW instances has been detected
Event time: Thu Nov 24 15:06:52 2022

Last cluster failover event:
Transition to new ACTIVE: Member 2 -> Member 1
Reason: Mismatch in the number of CoreXL FW instances has been detected
Event time: Thu Nov 24 15:06:52 2022

Cluster failover count:
Failover counter: 1
Time of counter reset: Thu Nov 24 15:06:17 2022 (reboot)

Ilya_Yusupov · ‎2022-11-24

Hi @pepso100 ,

Based on your last output, I can only guess that one member is running in user space mode while the second one is in kernel mode.

You can validate this through cpconfig, under the CoreXL section.

pepso100 · ‎2022-11-25

Hi Ilya,

Where in cpconfig can I see whether I am in kernel mode or user space mode?

Thank you.

gw_b> cpconfig
This program will let you re-configure
your Check Point products configuration.

Configuration Options:
----------------------
(1) Licenses and contracts
(2) SNMP Extension
(3) PKCS#11 Token
(4) Random Pool
(5) Secure Internal Communication
(6) Disable cluster membership for this gateway
(7) Enable Check Point Per Virtual System State
(8) Enable Check Point ClusterXL for Bridge Active/Standby
(9) Check Point CoreXL
(10) Automatic start of Check Point Products

(11) Exit

Enter your choice (1-11) :9

Configuring Check Point CoreXL...
=================================

CoreXL is currently enabled with 6 IPv4 firewall instances.

(1) Change the number of firewall instances
(2) Disable Check Point CoreXL

Ilya_Yusupov · ‎2022-11-25

Sorry, I forgot to ask first which version you are running, as the option is there in R81.10.

pepso100 · ‎2022-11-25

Yes, you're right, I am running R81.10.

_Val_ · ‎2022-11-25

This is not what I asked you to do. Run cpconfig and show us what you have in corexl on both members.

pepso100 · ‎2022-11-25

Hi Valeri,

1. In your original post you wrote: "Set up the same amount of instances and reboot the members", so I did.

So now I am a little confused by your reply "this is not what I asked you to do".

2. This morning, when I booted up both FWs, the cluster status was OK.

[Expert@gw_b:0]# cphaprob state

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 169.254.0.2 0% STANDBY gwA
2 (local) 169.254.0.1 100% ACTIVE gwB

Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114904
State change: ACTIVE(!) -> ACTIVE
Reason for state change: Reason for ACTIVE! alert has been resolved
Event time: Fri Nov 25 10:26:29 2022

Cluster failover count:
Failover counter: 0
Time of counter reset: Fri Nov 25 10:25:17 2022 (reboot)

[Expert@gw_a:0]# cphaprob state

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 (local) 169.254.0.2 0% STANDBY gwA
2 169.254.0.1 100% ACTIVE gwB

Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114802
State change: INIT -> STANDBY
Reason for state change: There is already an ACTIVE member in the cluster (member 2)
Event time: Fri Nov 25 10:26:30 2022

Cluster failover count:
Failover counter: 0
Time of counter reset: Fri Nov 25 10:25:17 2022 (reboot)

3.

[Expert@gw_a:0]# fw ctl affinity -l
Interface eth0: CPU 0
Interface eth1: CPU 0
Interface eth2: CPU 0
Interface eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 4
fw_3: CPU 1
fw_4: CPU 3
fw_5: CPU 0
[Expert@gw_a:0]# cpconfig
This program will let you re-configure
your Check Point products configuration

[Expert@gw_b:0]# fw ctl affinity -l
Interface eth0: CPU 0
Interface eth1: CPU 0
Interface eth2: CPU 0
Interface eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 4
fw_3: CPU 1
fw_4: CPU 3
fw_5: CPU 0
[Expert@gw_b:0]# cpconfig
This program will let you re-configure
your Check Point products configuration.

4. Here is the CoreXL output from cpconfig:

[Expert@gw_a:0]# cpconfig
This program will let you re-configure
your Check Point products configuration.

Configuration Options:
----------------------
(1) Licenses and contracts
(2) SNMP Extension
(3) PKCS#11 Token
(4) Random Pool
(5) Secure Internal Communication
(6) Disable cluster membership for this gateway
(7) Enable Check Point Per Virtual System State
(8) Enable Check Point ClusterXL for Bridge Active/Standby
(9) Check Point CoreXL
(10) Automatic start of Check Point Products

(11) Exit

Enter your choice (1-11) :9

Configuring Check Point CoreXL...
=================================

CoreXL is currently enabled with 6 IPv4 firewall instances.

(1) Change the number of firewall instances
(2) Disable Check Point CoreXL

(3) Exit
Enter your choice (1-3) : 1

This machine has 6 CPUs.

Note: All cluster members must have the same number of firewall instances
enabled.

How many IPv4 firewall instances would you like to enable (2 to 6) [4] ? ^C

[Expert@gw_b:0]# cpconfig
This program will let you re-configure
your Check Point products configuration.

Configuration Options:
----------------------
(1) Licenses and contracts
(2) SNMP Extension
(3) PKCS#11 Token
(4) Random Pool
(5) Secure Internal Communication
(6) Disable cluster membership for this gateway
(7) Enable Check Point Per Virtual System State
(8) Enable Check Point ClusterXL for Bridge Active/Standby
(9) Check Point CoreXL
(10) Automatic start of Check Point Products

(11) Exit

Enter your choice (1-11) :9

Configuring Check Point CoreXL...
=================================

CoreXL is currently enabled with 6 IPv4 firewall instances.

(1) Change the number of firewall instances
(2) Disable Check Point CoreXL

(3) Exit
Enter your choice (1-3) : 1

This machine has 6 CPUs.

Note: All cluster members must have the same number of firewall instances
enabled.

How many IPv4 firewall instances would you like to enable (2 to 6) [4] ? ^C

I have no idea what fixed the issue.

I would really like to understand the background: what caused it and what fixed it.

Anyway thank you for all your help!

_Val_ · ‎2022-11-25

On one of the members you had 3, not 4, CoreXL instances available. Now you have 4 cores, which is the default for 6 cores in total, on both cluster members. This is what fixed your issue.

pepso100 · ‎2022-11-25

I knew that I had a different number of CoreXL instances.

The thing is, I set the same number of CoreXL instances (6, via cpconfig), and despite that "fw ctl affinity -l" showed a different number of CoreXL instances per cluster member.

As you can see from the output above, I have 6 CoreXL instances: "CoreXL is currently enabled with 6 IPv4 firewall instances."

So after my change in cpconfig, nothing changed (yesterday).

Today, when I booted up both FWs, the number of CoreXL instances is the same.

_Val_ · ‎2022-11-25

A reboot is required after changing the number of CoreXL instances. Once you rebooted, it was all green.
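
A rough sketch of what "all green" could mean when checking saved `cphaprob state` output after each reboot. It assumes a two-member High Availability cluster like the one in this thread and the line layout shown in the posts above. The function name `summarize_state` is hypothetical, and splitting multiple pnotes on commas or whitespace is an assumption, since only single-pnote output appears here.

```python
import re

# Member rows look like "1 (local) 169.254.0.2 100% ACTIVE(!) gwA"
# or "2 169.254.0.1 0% DOWN gwB".
MEMBER = re.compile(
    r"^(\d+)\s+(\(local\)\s+)?(\d{1,3}(?:\.\d{1,3}){3})\s+(\d+)%\s+(\S+)\s+(\S+)$"
)
PNOTES = re.compile(r"^Active PNOTEs:\s*(.*)$")


def summarize_state(state_output: str):
    """Return (member name -> state, active pnotes, healthy flag)."""
    members, pnotes = {}, []
    for raw in state_output.splitlines():
        line = raw.strip()
        member = MEMBER.match(line)
        if member:
            members[member.group(6)] = member.group(5)
            continue
        found = PNOTES.match(line)
        if found and found.group(1).strip() != "None":
            # Assumption: several pnotes are separated by commas or spaces.
            pnotes = [p for p in re.split(r"[,\s]+", found.group(1)) if p]
    # Healthy here means: no pnotes, one ACTIVE and one STANDBY member.
    healthy = not pnotes and sorted(members.values()) == ["ACTIVE", "STANDBY"]
    return members, pnotes, healthy


# Excerpts copied from the outputs posted in this thread.
THURSDAY = """ID Unique Address Assigned Load State Name

1 (local) 169.254.0.2 100% ACTIVE(!) gwA
2 169.254.0.1 0% DOWN gwB

Active PNOTEs: COREXL"""

FRIDAY = """ID Unique Address Assigned Load State Name

1 169.254.0.2 0% STANDBY gwA
2 (local) 169.254.0.1 100% ACTIVE gwB

Active PNOTEs: None"""

for label, text in (("Thursday", THURSDAY), ("Friday", FRIDAY)):
    print(label, summarize_state(text))
```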

pepso100 · ‎2022-11-25

1. Yesterday I changed the number of CoreXL instances to 6 via cpconfig.

2. Then I rebooted both FWs.

3. After they booted up, I finally had the same number of CoreXL instances on both members, but:

3-A) the output of fw ctl affinity -l was different, as you can see below

[Expert@gw_a:0]# fw ctl affinity -l
Interface eth0: CPU 0
Interface eth1: CPU 0
Interface eth2: CPU 0
Interface eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 2
fw_3: CPU 2
fw_4: CPU 2
fw_5: CPU 2
Daemon mpdaemon: CPU 2 5
Daemon fwd: CPU 2 5
Daemon scrub_cp_file_convertd: CPU 2 5
Daemon core_uploader: CPU 2 5
Daemon rtmd: CPU 2 5
Daemon rad: CPU 2 5
Daemon in.msd: CPU 2 5
Daemon pdpd: CPU 2 5
Daemon watermark_cp_file_convertd: CPU 2 5
Daemon lpd: CPU 2 5
Daemon scrubd: CPU 2 5
Daemon in.asessiond: CPU 2 5
Daemon scanengine_b: CPU 2 5
Daemon vpnd: CPU 2 5
Daemon cp_file_convertd: CPU 2 5
Daemon usrchkd: CPU 2 5
Daemon in.acapd: CPU 2 5
Daemon pepd: CPU 2 5
Daemon cprid: CPU 2 5
Daemon cpd: CPU 2 5

[Expert@gw_b:0]# fw ctl affinity -l
Interface eth0: CPU 0
Interface eth1: CPU 0
Interface eth2: CPU 0
Interface eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 4
fw_3: CPU 1
fw_4: CPU 3
fw_5: CPU 0

3-B) the output of cphaprob state showed that the cluster was not in sync

[Expert@gw_a:0]# cphaprob state

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 (local) 169.254.0.2 100% ACTIVE(!) gwA
2 169.254.0.1 0% DOWN gwB

Active PNOTEs: COREXL

Last member state change event:
Event Code: CLUS-113905
State change: STANDBY -> ACTIVE(!)
Reason for state change: Mismatch in the number of CoreXL FW instances has been detected
Event time: Thu Nov 24 15:06:52 2022

Last cluster failover event:
Transition to new ACTIVE: Member 2 -> Member 1
Reason: Mismatch in the number of CoreXL FW instances has been detected
Event time: Thu Nov 24 15:06:52 2022

Cluster failover count:
Failover counter: 1
Time of counter reset: Thu Nov 24 15:06:17 2022 (reboot)

Summary:

Yesterday: after changing the number of CoreXL instances and rebooting, only the number of CoreXL instances changed (it was the same on both nodes), but the cluster was still not in sync.

Today: when I booted up both FWs, everything was fine (the number of CoreXL instances matches and the cluster is in sync).

Chris_Atkinson · ‎2022-11-27

If the problem persists, please capture CPinfo and hwdiag outputs and contact support to investigate further.


maad-pul · ‎2024-06-24

How should this be done in an HA cluster? I have an HA cluster with 2 CPUs. If I add 2 CPUs to the STANDBY member and change the CoreXL instances to 3 via cpconfig, will I still get "Mismatch in the number of CoreXL FW instances has been detected" with the cluster in ACTIVE(!)/DOWN? Can I do a cpstop on the ACTIVE(!) member so that the "DOWN" one takes over?

_Val_ · ‎2022-11-27

Just to be clear, setting 6 CoreXL instances on a system where only 6 cores are present is wrong.
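
One way to see a likely reason in the output posted earlier: with 6 instances on 6 cores, fw_5 lands on CPU 0, which is also the CPU handling all four interfaces. The sketch below flags such overlaps in saved `fw ctl affinity -l` output. The function name `shared_cpus` is hypothetical, and treating any overlap as undesirable is an assumption based on this comment, not a documented rule.

```python
import re

IFACE_LINE = re.compile(r"^Interface\s+(\S+):\s*CPU\s+(\d+(?:\s+\d+)*)$")
FW_LINE = re.compile(r"^fw_(\d+):\s*CPU\s+(\d+(?:\s+\d+)*)$")


def shared_cpus(affinity_output: str) -> set:
    """Return CPUs that serve both an interface and a fw instance."""
    iface_cpus, fw_cpus = set(), set()
    for raw in affinity_output.splitlines():
        line = raw.strip()
        iface = IFACE_LINE.match(line)
        if iface:
            iface_cpus.update(int(c) for c in iface.group(2).split())
            continue
        fw = FW_LINE.match(line)
        if fw:
            fw_cpus.update(int(c) for c in fw.group(2).split())
    return iface_cpus & fw_cpus


# gw_b with 6 instances, copied from the output earlier in the thread.
SIX_INSTANCES = """Interface eth0: CPU 0
Interface eth1: CPU 0
Interface eth2: CPU 0
Interface eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 4
fw_3: CPU 1
fw_4: CPU 3
fw_5: CPU 0"""

# gw_b with 3 instances, copied from the first post.
THREE_INSTANCES = """Interface eth0: CPU 0
Interface eth1: CPU 0
Interface eth2: CPU 0
Interface eth3: CPU 0
fw_0: CPU 5
fw_1: CPU 2
fw_2: CPU 4"""

print("6 instances, shared CPUs:", shared_cpus(SIX_INSTANCES))
print("3 instances, shared CPUs:", shared_cpus(THREE_INSTANCES))
```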
