After a reboot firewall will not rejoin cluster reporting DSD problem
Hi,
I rebooted a firewall in an HA cluster and it now won't rejoin the cluster. This is a new environment that had been working fine, and the config was saved before the reboot.
-FW-01:0]# cphaprob stat
Cluster Mode: High Availability (Active Up) with IGMP Membership
ID         Unique Address  Assigned Load  State   Name
1 (local)  169.254.1.1     0%             DOWN    xxxx1
2          169.254.1.2     100%           ACTIVE  xxxx2
Active PNOTEs: FSYNC, DSD
Last member state change event:
Event Code: CLUS-112000
State change: INIT -> DOWN
Reason for state change: USER DEFINED PNOTE
Event time: Tue Mar 25 14:47:09 2025
Cluster failover count:
Failover counter: 0
Time of counter reset: Tue Mar 25 14:45:17 2025 (reboot)
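For anyone wanting to watch this from a script, here is a minimal sketch that pulls the local member's state and the active pnotes out of `cphaprob stat` text. It assumes the layout shown in the output above (a "(local)" marker on the local row, state in the second-to-last column); the function names are made up for illustration and are not Check Point tools.

```shell
#!/usr/bin/env bash
# Sketch only: parses saved `cphaprob stat` text; it does not change cluster state.

# State of the local member: its row contains "(local)", and the state is
# the second-to-last field (the last field is the member name).
local_member_state() {
    awk '/\(local\)/ { print $(NF-1) }'
}

# Names listed after "Active PNOTEs:", one per line.
active_pnotes() {
    sed -n 's/^[[:space:]]*Active PNOTEs:[[:space:]]*//p' |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//; /^$/d'
}

# Sample rows copied from the post above.
sample_stat() {
    cat <<'EOF'
ID Unique Address Assigned Load State Name
1 (local) 169.254.1.1 0% DOWN xxxx1
2 169.254.1.2 100% ACTIVE xxxx2
Active PNOTEs: FSYNC, DSD
EOF
}

sample_stat | local_member_state   # prints DOWN
sample_stat | active_pnotes        # prints FSYNC and DSD on separate lines
```

On a gateway the same functions could be fed live output, for example `cphaprob stat | local_member_state`.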
When I tail the DSD log, it reports a problem with eth5:1:
FW-01:0]# tail $FWDIR/log/dsd.elg
[dsd 24308 4119955328]@QH-1MER-FW-01[26 Mar 8:40:55] ds_verify_state_do: verify state
[dsd 24308 4119955328]@QH-1MER-FW-01[26 Mar 8:40:55] mq_mng_to_buf: Reading mq_mng state
get_irqs_from_mq_mng_buf: interface eth5:1 wasn't found in 'mq_mng -ov' output.
init_interface_structure: Failed to get irqs for interface eth5:1
[dsd 24308 4119955328]@QH-1MER-FW-01[26 Mar 8:41:01] recover_record_mq_mng: Recording mq_mng state
[dsd 24308]@QH-1MER-FW-01[26 Mar 8:41:01] Warning:cp_timed_blocker_handler: A handler [0x805ec20] blocked for 6 seconds.
[dsd 24308]@QH-1MER-FW-01[26 Mar 8:41:01] Warning:cp_timed_blocker_handler: Handler info: Library [dsd], Function offset [0x16c20].
[dsd 24308]@QH-1MER-FW-01[26 Mar 8:41:01] Warning:cp_timed_blocker_handler: Handler info: Nearest symbol name [ds_single_cycle], offset [0x16c20].
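To see at a glance which interfaces dsd is complaining about, a small filter like the one below can help. It is a sketch that assumes the message wording seen in the log lines above; `missing_mq_interfaces` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: list interfaces that dsd reports as missing from 'mq_mng -ov'.
# The pattern is taken from the wording of the log lines quoted above.

missing_mq_interfaces() {
    sed -n "s/.*interface \([^ ]*\) wasn't found in 'mq_mng -ov' output.*/\1/p" |
        sort -u
}

# Sample lines copied from the dsd.elg excerpt above.
sample_log() {
    cat <<'EOF'
get_irqs_from_mq_mng_buf: interface eth5:1 wasn't found in 'mq_mng -ov' output.
init_interface_structure: Failed to get irqs for interface eth5:1
EOF
}

sample_log | missing_mq_interfaces   # prints eth5:1
```

On the gateway this would be run as `missing_mq_interfaces < "$FWDIR/log/dsd.elg"`. A name with a colon, such as eth5:1, is the usual form for an alias on eth5 rather than a physical port, which fits the `add interface eth5 alias` line in the config below.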
I have googled those error messages without much success.
FW01 eth5 config
set interface eth5 comments "xxxxxxx"
set interface eth5 state on
set interface eth5 auto-negotiation on
set interface eth5 mtu 1500
add interface eth5 alias x.x.x.133/28
set interface eth5 ipv4-address x.x.x.139 mask-length 28
FW02 eth5 config
set interface eth5 comments "xxxxxxx"
set interface eth5 state on
set interface eth5 auto-negotiation on
set interface eth5 mtu 1500
add interface eth5 alias x.x.x.133/28
set interface eth5 ipv4-address x.x.x.140 mask-length 28
Accepted Solutions
Alias IPs are not supported in clusters.
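A quick way to check a member for leftover aliases is to filter its saved configuration for `add interface ... alias` lines. The sketch below assumes the config syntax shown in the original post; `alias_lines` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: print any interface alias definitions found in Gaia config text.

alias_lines() {
    # Matches lines such as: add interface eth5 alias x.x.x.133/28
    # "|| true" keeps the exit status 0 when nothing matches.
    grep -E '^[[:space:]]*add interface [^[:space:]]+ alias ' || true
}

# Sample copied from the FW01 eth5 config in the original post.
sample_config() {
    cat <<'EOF'
set interface eth5 comments "xxxxxxx"
set interface eth5 state on
set interface eth5 auto-negotiation on
set interface eth5 mtu 1500
add interface eth5 alias x.x.x.133/28
set interface eth5 ipv4-address x.x.x.139 mask-length 28
EOF
}

sample_config | alias_lines   # prints the "add interface eth5 alias" line
```

From expert mode, something like `clish -c "show configuration" | alias_lines` should give the same view on a live member; empty output would mean no aliases are defined.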
Until the "user defined pnote" is cleared, your cluster member will not join.
It looks like this is done with cphaconf set_pnote -a unregister
See: https://sc1.checkpoint.com/documents/R81.20/WebAdminGuides/EN/CP_R81.20_CLI_ReferenceGuide/Content/T...
Thanks PhoneBoy for the response.
It looks like your command would unregister all devices, which I probably don't want to do. I could unregister only the devices that aren't working:
cphaconf set_pnote -d <Name of Critical Device> [-p] [-g] unregister
The two devices with the problem are Fullsync and DSD:
FW-01> show cluster members pnotes problem
Registered Devices:
Device Name: Fullsync
Registration number: 0
Timeout: none
Current state: problem
Time since last report: 5.6 sec
Device Name: DSD
Registration number: 9
Timeout: none
Current state: problem
Time since last report: 11273.5 sec
I believe the commands would look like this, and would have to be run on both firewalls:
cphaconf set_pnote -d Fullsync -p unregister
cphaconf set_pnote -d DSD -p unregister
My concern is that this looks like I am telling the firewall not to monitor these devices, so it would not resolve the issue, only tell the cluster to ignore it. Is that right? I can see that DSD is not registered as a critical device on some of our other firewall clusters, so I could maybe unregister that, but Fullsync sounds quite important. Or is Fullsync in the problem state because of the DSD state, so that removing DSD would also clear the Fullsync state?
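Before unregistering anything, it can be useful to list exactly which devices report a problem. This read-only sketch assumes the "Device Name:" / "Current state:" layout from the listing above; `problem_devices` is a hypothetical helper name and it makes no changes to the cluster.

```shell
#!/usr/bin/env bash
# Sketch only: print the names of pnote devices whose current state is "problem".

problem_devices() {
    awk '
        /^[ \t]*Device Name:/ {
            sub(/^[ \t]*Device Name:[ \t]*/, "")
            sub(/[ \t]+$/, "")
            name = $0
        }
        /^[ \t]*Current state:/ {
            sub(/^[ \t]*Current state:[ \t]*/, "")
            sub(/[ \t]+$/, "")
            if ($0 == "problem") print name
        }
    '
}

# Sample copied from the pnote listing above.
sample_pnotes() {
    cat <<'EOF'
Registered Devices:
Device Name: Fullsync
Registration number: 0
Timeout: none
Current state: problem
Time since last report: 5.6 sec
Device Name: DSD
Registration number: 9
Timeout: none
Current state: problem
Time since last report: 11273.5 sec
EOF
}

sample_pnotes | problem_devices   # prints Fullsync and DSD on separate lines
```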
Which version? R81.20, and which take?
\m/_(>_<)_\m/
It is R81.20 Take 92.
I believe it is; I am sure I checked this yesterday. What is the command to see if it's enabled? I can't find the document I took it from.
You can check it in cpview -> Sysinfo
\m/_(>_<)_\m/
Hi Akos,
Yes, it is enabled on both gateways:
| Platform                  Gaia 64Bit                   |
| Configuration             Check Point Security Gateway |
| CoreXL Status             On                           |
| CoreXL instances          18                           |
| Dynamic Balancing Status  On                           |
| SecureXL Status           On                           |
| USFW Status               On                           |
| UPPAK Status              Off                          |
Maybe TAC should get involved here.
\m/_(>_<)_\m/
Alias IPs are not supported in clusters.
This was, we believe, the reason. These are new R81.20 firewalls that we had built from old R80.40 firewalls. We had decided to copy the config wholesale and then remove anything unwanted afterwards. Unfortunately, the alias issue caught us before we got to the point of deleting them.
We raised it with TAC, who jumped on a call with us, and we did the following:
- Removed the alias from FW01 (the down firewall) via the GUI, then rebooted that firewall. After the reboot it still wouldn't join the cluster, but that was expected.
- Ran cpstop on the active firewall, FW02. Although slightly stressful to watch (these are production firewalls for a large organization), the down firewall did become the active firewall and there was no downtime.
- Removed the alias config from FW02 and rebooted it. This time FW02 joined the cluster as the standby.
- Pushed a policy to the cluster to confirm all was good.
The only slight issue we had was with the IPsec VPNs. I think this was because forcing the failover via cpstop means that some elements have to restart (the tech words are not coming to me now :-)), so we had to manually reset the VPNs.
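After a sequence like the one above, a simple pass/fail check on `cphaprob stat` output can confirm that the cluster has settled. This is a sketch under my own definition of "healthy" for an HA pair (exactly one ACTIVE member, every other member STANDBY); the function name is hypothetical and the second sample is illustrative, not captured from these firewalls.

```shell
#!/usr/bin/env bash
# Sketch only: exit 0 if the member table shows one ACTIVE member and all
# other members STANDBY; exit 1 otherwise. Reads `cphaprob stat` text on stdin.

cluster_is_healthy() {
    awk '
        # Member rows start with a numeric ID; state is the second-to-last field.
        $1 ~ /^[0-9]+$/ && NF >= 5 {
            state = $(NF-1)
            if (state == "ACTIVE") active++
            else if (state != "STANDBY") bad++
        }
        END { exit !(active == 1 && bad == 0) }
    '
}

# Member rows copied from the original post (FW01 down).
sample_down() {
    cat <<'EOF'
1 (local) 169.254.1.1 0% DOWN xxxx1
2 169.254.1.2 100% ACTIVE xxxx2
EOF
}

# Hypothetical rows showing what a recovered pair might look like.
sample_recovered() {
    cat <<'EOF'
1 (local) 169.254.1.1 100% ACTIVE xxxx1
2 169.254.1.2 0% STANDBY xxxx2
EOF
}

if sample_down | cluster_is_healthy; then echo "healthy"; else echo "not healthy"; fi
if sample_recovered | cluster_is_healthy; then echo "healthy"; else echo "not healthy"; fi
```

On a gateway this could be run as `cphaprob stat | cluster_is_healthy && echo OK` after each reboot or failover step.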
