Solved: After a reboot firewall will not rejoin cluster reporting DSD problem

P_Williams

Hi,

I rebooted a firewall in an HA cluster and it now won't rejoin the cluster. This is a new environment that had been working fine, and the config was saved before the reboot.

-FW-01:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID          Unique Address   Assigned Load   State    Name

1 (local)   169.254.1.1      0%              DOWN     xxxx1
2           169.254.1.2      100%            ACTIVE   xxxx2

Active PNOTEs: FSYNC, DSD

Last member state change event:
Event Code: CLUS-112000
State change: INIT -> DOWN
Reason for state change: USER DEFINED PNOTE
Event time: Tue Mar 25 14:47:09 2025

Cluster failover count:
Failover counter: 0
Time of counter reset: Tue Mar 25 14:45:17 2025 (reboot)
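If you want to check this output from a script rather than by eye, here is a rough Python sketch that pulls the member rows and the "Active PNOTEs" line out of captured `cphaprob stat` text. It assumes the output has been saved as plain text in the layout shown above; the names `parse_cphaprob_stat` and `HEALTHY_STATES` are hypothetical, and treating only ACTIVE/STANDBY as healthy is an assumption for HA mode.

```python
import re

# Output captured from `cphaprob stat` in the post above (flattened spacing is fine).
STAT_SAMPLE = """\
Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 (local) 169.254.1.1 0% DOWN xxxx1
2 169.254.1.2 100% ACTIVE xxxx2

Active PNOTEs: FSYNC, DSD
"""

# One member row: ID, optional "(local)", unique address, load %, state, name.
MEMBER_ROW = re.compile(
    r"^(?P<id>\d+)\s+(?P<local>\(local\)\s+)?"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<load>\d+)%\s+(?P<state>\S+)\s+(?P<name>\S+)$"
)

# Assumption: in HA mode, a member in any other state needs attention.
HEALTHY_STATES = {"ACTIVE", "STANDBY"}


def parse_cphaprob_stat(text):
    """Return (members, active_pnotes) parsed from captured `cphaprob stat` text."""
    members, pnotes = [], []
    for raw in text.splitlines():
        line = raw.strip()
        row = MEMBER_ROW.match(line)
        if row:
            members.append({
                "id": int(row["id"]),
                "local": row["local"] is not None,
                "address": row["ip"],
                "load": int(row["load"]),
                "state": row["state"],
                "name": row["name"],
            })
        elif line.startswith("Active PNOTEs:"):
            names = line.split(":", 1)[1].split(",")
            pnotes = [n.strip() for n in names if n.strip()]
    return members, pnotes


members, pnotes = parse_cphaprob_stat(STAT_SAMPLE)
unhealthy = [m["name"] for m in members if m["state"] not in HEALTHY_STATES]
print(unhealthy, pnotes)  # ['xxxx1'] ['FSYNC', 'DSD']
```

The regex uses `\s+` between fields, so it copes with both the aligned and the flattened versions of the table.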

When I tail the DSD log, it reports an issue with eth5:1:

FW-01:0]# tail $FWDIR/log/dsd.elg
[dsd 24308 4119955328]@QH-1MER-FW-01[26 Mar 8:40:55] ds_verify_state_do: verify state
[dsd 24308 4119955328]@QH-1MER-FW-01[26 Mar 8:40:55] mq_mng_to_buf: Reading mq_mng state
get_irqs_from_mq_mng_buf: interface eth5:1 wasn't found in 'mq_mng -ov' output.

init_interface_structure: Failed to get irqs for interface eth5:1

[dsd 24308 4119955328]@QH-1MER-FW-01[26 Mar 8:41:01] recover_record_mq_mng: Recording mq_mng state
[dsd 24308]@QH-1MER-FW-01[26 Mar 8:41:01] Warning:cp_timed_blocker_handler: A handler [0x805ec20] blocked for 6 seconds.
[dsd 24308]@QH-1MER-FW-01[26 Mar 8:41:01] Warning:cp_timed_blocker_handler: Handler info: Library [dsd], Function offset [0x16c20].
[dsd 24308]@QH-1MER-FW-01[26 Mar 8:41:01] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [ds_single_cycle], offset [0x16c20].
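To pick out which interfaces dsd is complaining about in a longer dsd.elg, a small Python sketch like the one below could help. It only matches the two message texts quoted above; the function name `interfaces_without_irqs` is hypothetical, and it assumes you have copied dsd.elg somewhere you can run Python. Note that eth5:1 looks like the Linux-style name for an alias on eth5, which is presumably the `add interface eth5 alias` line in the config.

```python
import re

# Lines copied from the dsd.elg excerpt above.
LOG_LINES = [
    "get_irqs_from_mq_mng_buf: interface eth5:1 wasn't found in 'mq_mng -ov' output.",
    "init_interface_structure: Failed to get irqs for interface eth5:1",
    "[dsd 24308]@QH-1MER-FW-01[26 Mar 8:41:01] Warning:cp_timed_blocker_handler: "
    "A handler [0x805ec20] blocked for 6 seconds.",
]

# The two dsd.elg messages seen in this thread that name an interface without IRQs.
PATTERNS = [
    re.compile(r"interface (\S+) wasn't found in 'mq_mng -ov' output"),
    re.compile(r"Failed to get irqs for interface (\S+)"),
]


def interfaces_without_irqs(lines):
    """Return interface names, in first-seen order, that dsd reported as missing IRQs."""
    found = []
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match and match.group(1) not in found:
                found.append(match.group(1))
    return found


print(interfaces_without_irqs(LOG_LINES))  # ['eth5:1']
```

For a real file, pass the open file object instead of the list, e.g. `interfaces_without_irqs(open("dsd.elg"))`.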

I have googled those error messages without much success.

FW01 eth5 config

set interface eth5 comments "xxxxxxx"
set interface eth5 state on
set interface eth5 auto-negotiation on
set interface eth5 mtu 1500
add interface eth5 alias x.x.x.133/28
set interface eth5 ipv4-address x.x.x.139 mask-length 28

FW02 eth5 config

set interface eth5 comments "xxxxxxx"
set interface eth5 state on
set interface eth5 auto-negotiation on
set interface eth5 mtu 1500
add interface eth5 alias x.x.x.133/28
set interface eth5 ipv4-address x.x.x.140 mask-length 28
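Comparing the two members' eth5 lines mechanically shows that the only difference is each member's own IP, and that both carry the identical alias line. A minimal Python sketch of that comparison, assuming the clish lines are pasted in as text (the function name `config_differences` is hypothetical):

```python
# eth5 clish lines for each member, copied from the post above.
FW01_ETH5 = """\
set interface eth5 comments "xxxxxxx"
set interface eth5 state on
set interface eth5 auto-negotiation on
set interface eth5 mtu 1500
add interface eth5 alias x.x.x.133/28
set interface eth5 ipv4-address x.x.x.139 mask-length 28
""".splitlines()

FW02_ETH5 = """\
set interface eth5 comments "xxxxxxx"
set interface eth5 state on
set interface eth5 auto-negotiation on
set interface eth5 mtu 1500
add interface eth5 alias x.x.x.133/28
set interface eth5 ipv4-address x.x.x.140 mask-length 28
""".splitlines()


def config_differences(a, b):
    """Return (lines only in a, lines only in b, lines in both), keeping the input order."""
    in_a, in_b = set(a), set(b)
    only_a = [line for line in a if line not in in_b]
    only_b = [line for line in b if line not in in_a]
    shared = [line for line in a if line in in_b]
    return only_a, only_b, shared


only_01, only_02, shared = config_differences(FW01_ETH5, FW02_ETH5)
print(only_01)  # ['set interface eth5 ipv4-address x.x.x.139 mask-length 28']
print(only_02)  # ['set interface eth5 ipv4-address x.x.x.140 mask-length 28']
print([line for line in shared if " alias " in line])  # ['add interface eth5 alias x.x.x.133/28']
```

A set-based comparison ignores line order, which suits clish dumps where the order of `set` lines is not meaningful.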

PhoneBoy

Until the "user defined pnote" is cleared, your cluster member will not join.
It looks like this is done with cphaconf set_pnote -a unregister
See: https://sc1.checkpoint.com/documents/R81.20/WebAdminGuides/EN/CP_R81.20_CLI_ReferenceGuide/Content/T...

P_Williams

Thanks PhoneBoy for the response.

It looks like your command would unregister all devices, which I probably don't want to do. Instead, I could unregister just the devices that aren't working:

cphaconf set_pnote -d <Name of Critical Device> [-p] [-g] unregister

The two devices with the problem are Fullsync and DSD:

FW-01> show cluster members pnotes problem

Registered Devices:

Device Name: Fullsync
Registration number: 0
Timeout: none
Current state: problem
Time since last report: 5.6 sec

Device Name: DSD
Registration number: 9
Timeout: none
Current state: problem
Time since last report: 11273.5 sec
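The pnotes output is a simple "Key: value" block per device, so it is easy to turn into structured data. This Python sketch (hypothetical function name `parse_pnotes`, assuming the output is saved as text in the format above) groups the lines per device and extracts the seconds since each device last reported. On this sample it makes the contrast visible: Fullsync reported 5.6 seconds ago, while DSD had not reported for 11273.5 seconds, roughly three hours.

```python
# Output of `show cluster members pnotes problem` from the post above.
PNOTES_SAMPLE = """\
Registered Devices:

Device Name: Fullsync
Registration number: 0
Timeout: none
Current state: problem
Time since last report: 5.6 sec

Device Name: DSD
Registration number: 9
Timeout: none
Current state: problem
Time since last report: 11273.5 sec
"""


def parse_pnotes(text):
    """Group 'Key: value' lines into one dict per device; each 'Device Name' starts a new one."""
    devices = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # blank lines and the "Registered Devices:" header
        key, value = key.strip(), value.strip()
        if key == "Device Name":
            devices.append({})
        if devices:
            devices[-1][key] = value
    return devices


devices = parse_pnotes(PNOTES_SAMPLE)
# (name, seconds since last report) for each device currently in the problem state.
problem = [
    (d["Device Name"], float(d["Time since last report"].split()[0]))
    for d in devices
    if d.get("Current state") == "problem"
]
print(problem)  # [('Fullsync', 5.6), ('DSD', 11273.5)]
```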

So I believe the commands would look like this, and they would have to be run on both firewalls:

cphaconf set_pnote -d Fullsync -p unregister

cphaconf set_pnote -d DSD -p unregister

My concern is that this looks like I am telling the firewall not to monitor these devices, so rather than resolving the issue, it would just ignore it. Is that right? I can see that DSD is not registered as a critical device on some of our other firewall clusters, so I could perhaps unregister that, but Fullsync sounds quite important. Or is Fullsync in the problem state because of the DSD state, so that removing DSD would also clear the Fullsync state?

AkosBakos

Which version? R81.20, and which take?

----------------
\m/_(>_<)_\m/

P_Williams

It is R81.20 Take 92.

AkosBakos

HI @P_Williams

Is Dynamic Balancing enabled on both members?

Akos

P_Williams

I believe it is; I am sure I checked this yesterday. What is the command to see if it's enabled? I can't find the document I took it from.

AkosBakos

You can check it in cpview -> Sysinfo.

P_Williams

Hi Akos,

Yes, it is enabled on both gateways.

AkosBakos

Maybe TAC should get involved here.

emmap

Alias IPs are not supported in clusters.

https://support.checkpoint.com/results/sk/sk89980
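Since the fix was to find and remove alias IPs, a quick way to audit a config copied over from an older gateway is to scan the clish dump for alias lines. This is a minimal Python sketch, assuming the input is the text of `show configuration` saved from clish and read as a list of lines; the function name `find_aliases` is hypothetical, and the regex only covers the `add interface <if> alias <ip/len>` form seen in this thread.

```python
import re

# Gaia clish alias lines look like: add interface eth5 alias x.x.x.133/28
ALIAS_LINE = re.compile(r"^add interface (\S+) alias (\S+)$")


def find_aliases(config_lines):
    """Return (interface, address) pairs for every alias line in a clish config dump."""
    hits = []
    for line in config_lines:
        match = ALIAS_LINE.match(line.strip())
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


# Sample: FW01's eth5 lines from the original post.
CONFIG = [
    'set interface eth5 comments "xxxxxxx"',
    "set interface eth5 state on",
    "add interface eth5 alias x.x.x.133/28",
    "set interface eth5 ipv4-address x.x.x.139 mask-length 28",
]

for interface, address in find_aliases(CONFIG):
    print(f"{interface}: alias {address} (not supported on cluster members, see sk89980)")
```

Running it against both members' dumps before joining them to a cluster would flag anything carried over from the old config.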

P_Williams

We believe this was the reason. They are new R81.20 firewalls that we built from old R80.40 firewalls. We had decided to copy the whole config across and remove anything unwanted afterwards. Unfortunately, the alias issue hit us before we got around to deleting the aliases.

We raised it with TAC, who jumped on a call with us, and we:

Removed the alias from FW01 (the down firewall) via the GUI, then rebooted that firewall. After the reboot it still wouldn't join the cluster, but that was expected.
Ran cpstop on the active firewall, FW02. Although it was slightly stressful to watch (these are production firewalls for a large organization), FW01 did become the active firewall and there was no downtime.
Removed the alias config from FW02 and rebooted it. This time FW02 joined the cluster as the standby.
Pushed a policy to the cluster to confirm all was good.

The only slight issue we had was with the IPsec VPNs. I think this was because forcing the failover with cpstop means some elements have to restart (the technical terms escape me right now :-)), so we had to reset the VPNs manually.
