Hello,
I have two 5600 appliances running in HA on R80.30, but this behaviour was already present on R80.10.
Every time I make a change in the PBR settings (adding a table, etc.), add an interface, or add an OSPF route redistribution, the member I am working on gets degraded to Down. If I do this on the primary member, a failover occurs.
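For illustration, a change as small as this in clish is enough to trigger it (the interface name and addresses are just placeholders, not my real config; a PBR table or an OSPF redistribution change behaves the same way):
set interface eth5 ipv4-address 10.99.99.1 mask-length 24
set static-route 10.99.0.0/24 nexthop gateway address 10.1.1.254 on
save config
Right after saving such a change, /var/log/messages on the member I worked on shows this: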
May 7 10:08:22 2020 DETKDUSIPS09 kernel: [SIM4];sim_restore_ip_options: failed to properly restore IP options
May 7 10:08:22 2020 DETKDUSIPS09 kernel: [fw4_1];[xxxxxxxxx:46770 -> xxxxxxxxxxxx] [ERROR]: cmik_loader_fw_context_match_cb: match_cb for CMI APP 3 failed on context 56, executing context 366 and adding the app to apps in exception
May 7 10:08:23 2020 DETKDUSIPS09 routed[27001]: [routed] NOTICE: task_cmd_init(143): command subsystem initialized.
May 7 10:08:23 2020 DETKDUSIPS09 routed[27001]: [routed] NOTICE: Start routed[27001] version routed-12.30.2019-11:21:08 instance 0
May 7 10:08:23 2020 DETKDUSIPS09 routed[27001]: routed_syslog_on: tracing to "/var/log/routed_messages" started
May 7 10:08:23 2020 DETKDUSIPS09 kernel: Passive ARP hook already uninstalled!
May 7 10:08:23 2020 DETKDUSIPS09 kernel: [fw4_1];Global param: set int fwha_cbs_which_member_is_running_gated to '0'
May 7 10:08:23 2020 DETKDUSIPS09 kernel: [fw4_1];CLUS-120105-1: routed PNOTE ON
May 7 10:08:23 2020 DETKDUSIPS09 kernel: [fw4_1];CLUS-111700-1: State change: ACTIVE -> DOWN | Reason: ROUTED PNOTE
May 7 10:08:23 2020 DETKDUSIPS09 kernel: [fw4_1];CLUS-214704-1: Remote member 2 (state STANDBY -> ACTIVE) | Reason: No other ACTIVE members have been found in the cluster
May 7 10:08:23 2020 DETKDUSIPS09 routed[12380]: [routed] ERROR: recv(header) returns 0
May 7 10:08:24 2020 DETKDUSIPS09 kernel: [fw4_1];CLUS-100102-1: Failover member 1 -> member 2 | Reason: ROUTED PNOTE
May 7 10:08:25 2020 DETKDUSIPS09 xpand[10387]: admin localhost t -volatile:configurationChange
May 7 10:08:25 2020 DETKDUSIPS09 xpand[10387]: admin localhost t -volatile:configurationSave
May 7 10:08:29 2020 DETKDUSIPS09 kernel: [SIM4];sim_restore_ip_options: failed to properly restore IP options
May 7 10:08:31 2020 DETKDUSIPS09 kernel: [fw4_1];CLUS-120105-1: routed PNOTE OFF
May 7 10:08:31 2020 DETKDUSIPS09 kernel: [fw4_1];CLUS-114802-1: State change: DOWN -> STANDBY | Reason: There is already an ACTIVE member in the cluster (member 2)
May 7 10:08:38 2020 DETKDUSIPS09 kernel: [SIM4];sim_restore_ip_options: failed to properly restore IP options
Does anybody have an idea what is causing this?
Thanks
Frank
Hi @Gro_Tea
Maybe one of these SKs will help you:
sk131352: Cluster member is down and routed pnote is in a problem state
or
sk62570: How to troubleshoot failovers in ClusterXL - Advanced Guide
Hello Heiko,
thanks for the reply, I checked the links...
The problem is that the pnote is only active for a few seconds.
The pnote coming ON at 10:08:23 degrades the member from Active to Down. Then, without any interaction, the status goes from Down to Standby (because the other member is now active) only 8 seconds later.
This happens while making changes in routing contexts (creating an interface, modifying/adding PBR, ...). Very annoying when you need to make changes and almost every action causes a failover.
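For what it's worth, I just watch it happen with the standard commands while making the change:
cphaprob state
cphaprob -l list
cphaprob state shows the member going ACTIVE -> DOWN -> STANDBY, and cphaprob -l list shows the routed device reporting a problem and then OK again.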
Regards,
Frank
Have you seen this SK:
sk109051: Troubleshooting Dynamic Routing - Cluster XL - PNOTE issues
If the routed pnote has a Timeout of "None" in the output of cphaprob -l list, even the slightest blip in that process will cause a failure of that pnote and an instant failover. Perhaps you have a very large routed configuration and when it is changed the daemon goes "out to lunch" parsing the config for just long enough to trip the pnote? The 5600 is not the speediest box in the world and that may be part of the issue.
It might be interesting to try increasing the timeout for the routed pnote to give it a little more leeway. Would not recommend going beyond 2 or 3 seconds though.
Hi Tim,
Thanks, that sounds interesting and I want to try it. To increase the timeout, do I have to unregister the device and then register it again with the new timeout?
cphaprob -d routed [-p] unregister
cphaprob -d routed -t <timeout in sec> -s ok [-p] register
If I do a "cphaprob -d routed unregister" it answers me with the list of usage...
How do I increase the timeout?
Thanks
Frank
Actually check this SK first, as it is not immediately obvious how to modify the timeout for the routed pnote:
sk108069: "PNOTE Reporting" setting in Gaia OS causes frequent ClusterXL failovers
Hi,
the mentioned checkbox for PNOTE Reporting was removed with R77.30 and is disabled by default.
It seems setting a timeout for the routed device is not that easy...
Thanks
Frank
Hi, Frank!
Did you find any solution for this failover problem after creating an interface, adding a route, and so on?
I now have the same problem, but I can't find out how to fix it.
Are you applying your changes first on the standby or active?
Hello! On the active member first. Is this not right?
Please see sk57100 for an example process
https://support.checkpoint.com/results/sk/sk57100
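Roughly, the idea (this is just the outline as I understand it; the SK has the authoritative steps) is to change the standby member first and control the failover yourself:
make the change on the current standby member and save config
cphaprob state          (verify the standby member is healthy again)
clusterXL_admin down    (on the active member: controlled failover to the already-updated member)
make the same change on the member that is now standby and save config
clusterXL_admin up      (re-enable that member)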
Hello! Thank you for the answer. I understand about adding/deleting interfaces, but we get a failover when we add/edit IP routes. If we add a static route or redistribute a static route into OSPF on the active node, we get a failover after a few seconds. You can see the messages from the active node (I added part of the messages file).
For context, are you seeing any impact from the failover, or do you just observe that it happens?
Are you using graceful restart?
Are the OSPF router IDs aligned for both members?
Yes, all four nodes have the same router ID in the OSPF process.
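For reference, this is how we checked it on each member (standard clish commands):
show router-id
show ospf neighbors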
About the impact of the failover: I can't say exactly, but in any case we need to understand why we get a failover after adding/editing routes and how to fix it.
I checked in my lab with a virtual Check Point, and when I redistribute a static route or a directly connected interface into OSPF I don't get these messages and there is no failover (but the real Check Point cluster does log these messages, and after them we get a failover):
May 15 09:06:25 2025 cp-int-1 routed[31560]: [routed] NOTICE: task_cmd_init(145): command subsystem initialized.
May 15 09:06:25 2025 cp-int-1 routed[31560]: [routed] NOTICE: Start routed[31560] version routed-11.06.2024-17:55:19 instance 0
May 15 09:06:25 2025 cp-int-1 routed[31560]: [routed] NOTICE: mc_enabling_check_startup(131): Starting up with multicast routing enabled (see routed_messages for subsequent messages)
May 15 09:06:25 2025 cp-int-1 routed[31560]: routed_syslog_on: tracing to "/var/log/routed_messages" started
Does anybody know what these messages mean?
Hello!
I found the solution for the failover that occurred after every routing change.
It turned out that we had a static route on the gateways to a subnet used for NAT, and the cluster address of one of our own interfaces was specified as the next-hop. This static route was then redistributed into OSPF for the neighboring routers. The gateways complained in routed_messages that the next-hop address of these static routes belonged to a local interface, which is wrong. As soon as I deleted these static routes the failover error disappeared, and now the member state no longer changes after routing changes.
I get the same error on a test setup where I simulate the problem (you can see it in the attachment).
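To make it concrete, the problematic configuration looked roughly like this (all addresses are placeholders; 10.1.1.10 stands for the cluster VIP of one of our own interfaces, and the exact route-redistribution syntax may differ slightly between versions):
set static-route 192.168.200.0/24 nexthop gateway address 10.1.1.10 on
set route-redistribution to ospf2 instance default from static-route 192.168.200.0/24 on
Removing the route was enough to stop the failovers:
set static-route 192.168.200.0/24 off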
In the logs you can see that routed gets started. I guess that is because it was stopped before, maybe to reload its configuration. But routed is a registered device in the cluster.
# cphaprob -l li
[…]
Registered Devices:
[…]
Device Name: routed
Registration number: 2
Timeout: none
Current state: OK
Time since last report: XXXX sec
If a registered device is not available, the cluster fails over.
Thank you for the answer!
I checked, and all cluster members have the routed device registered with status OK.
Yes, the routed process gets started, as I wrote. Before that, the routed process seems to have been stopped. That is why the cluster fails over. After the (re)start of routed everything is fine again, except that the cluster now runs on the other member. The failover happens in the time between the stop and the restart of the routed process, I guess.
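A quick way to confirm that is to compare the routed PID before and after a routing change, and to look for the restart messages:
ps -ef | grep routed
grep "Start routed" /var/log/messages | tail -5
If the PID changes right around the failover, routed really was stopped and restarted, which is what trips the pnote.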
Frank had this issue 5 years ago, so your question is kind of... I would suggest creating a post of your own!