Hi all,
This has now happened a few times in the last 6 months. The Standby firewall does not receive the CCP packets and marks the Sync interface as down. The cluster then goes into a split-brain scenario.
It resolves itself in less than a minute, and all BGP peers are re-established. Any idea why this is happening?
Note: Sys_admin installed the Threat Prevention policy right after this. There were spike detective alerts for temain right before this happened (could be totally unrelated).
Active firewall
Dec 11 01:53:31 2025 F1-2 spike_detective: spike info: type: cpu, cpu core: 42, top consumer: fwk0_dev_57, start time: 11/12/25 01:53:18, spike duration (sec): 12, initial cpu usage: 91, average cpu usage: 74, perf taken: 0
Dec 11 01:54:37 2025 F1-2 spike_detective: spike info: type: thread, thread id: 115061, thread name: temain, start time: 11/12/25 01:54:30, spike duration (sec): 6, initial cpu usage: 100, average cpu usage: 100, perf taken: 1
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-210300-2: Remote member 1 (state STANDBY -> DOWN) | Reason: Interface is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-114402-2: State change: ACTIVE -> STANDBY | Reason: Member state has been changed after returning from ACTIVE/ACTIVE scenario (remote cluster member 1 has higher priority)
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-210305-2: Remote member 1 (state DOWN -> ACTIVE(!)) | Reason: Interface is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-210300-2: Remote member 1 (state ACTIVE(!) -> DOWN) | Reason: Interface is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-114704-2: State change: STANDBY -> ACTIVE | Reason: No other ACTIVE members have been found in the cluster
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-100102-2: Failover member 1 -> member 2 | Reason: Available on member 1
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-214802-2: Remote member 1 (state DOWN -> STANDBY) | Reason: There is already an ACTIVE member in the cluster
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-211700-2: Remote member 1 (state STANDBY -> DOWN) | Reason: ROUTED PNOTE
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-100201-2: Failover member 2 -> member 1 | Reason: Member state has been changed after returning from ACTIVE/ACTIVE scenario (remote cluster member 1 has higher priority)
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-120105-2: routed PNOTE ON
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-111705-2: State change: ACTIVE -> ACTIVE(!) | Reason: ROUTED PNOTE
Dec 11 01:55:28 2025 F1-2 fwk: CLUS-120105-2: routed PNOTE OFF
Dec 11 01:55:28 2025 F1-2 fwk: CLUS-114904-2: State change: ACTIVE(!) -> ACTIVE | Reason: Reason for ACTIVE! alert has been resolved
Standby Firewall
Dec 11 01:55:24 2025 F1-1 fwk: CLUS-110300-1: State change: STANDBY -> DOWN | Reason: Interface Sync is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:25 2025 F1-1 fwk: CLUS-216400-1: Remote member 2 (state ACTIVE -> LOST) | Reason: Timeout Control Protocol packet expired member declared as DEAD
Dec 11 01:55:25 2025 F1-1 fwk: CLUS-116505-1: State change: DOWN -> ACTIVE(!) | Reason: All other machines are dead (timeout), Interface Sync is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:25 2025 F1-1 fwk: CLUS-100201-1: Failover member 2 -> member 1 | Reason: Available on member 2
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-214802-1: Remote member 2 (state LOST -> STANDBY) | Reason: There is already an ACTIVE member in the cluster
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-110305-1: State change: ACTIVE! -> DOWN | Reason: Interface Sync is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-214904-1: Remote member 2 (state STANDBY -> ACTIVE) | Reason: Reason for ACTIVE! alert has been resolved
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-114802-1: State change: DOWN -> STANDBY | Reason: There is already an ACTIVE member in the cluster (member 2)
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-120105-1: routed PNOTE ON
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-111700-1: State change: STANDBY -> DOWN | Reason: ROUTED PNOTE
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-100102-1: Failover member 1 -> member 2 | Reason: Interface Sync is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:27 2025 F1-1 routed[168442]: [routed] ERROR: cpcl_recv: Failed to receive cluster message header, connection will need to be reestablished. errno = 104 (Connection reset by peer)
Dec 11 01:55:27 2025 F1-1 routed[168442]: [routed] ERROR: cpcl_recv: deleting peer task 0x8f1aee4 due to failure to read from the socket
Dec 11 01:56:02 2025 F1-1 fwk: CLUS-120105-1: routed PNOTE OFF
Dec 11 01:56:02 2025 F1-1 fwk: CLUS-114802-1: State change: DOWN -> STANDBY | Reason: There is already an ACTIVE member in the cluster (member 2)
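To read the two excerpts above as one timeline, a small parser can help. This is only a sketch: the regular expression is inferred from the "State change" lines quoted in this post, not from any official ClusterXL log specification, and the function name parse_state_changes is made up for illustration.

```python
import re
from datetime import datetime

# Layout inferred from the fwk "State change" lines quoted in this thread
# (an assumption, not an official format definition).
STATE_CHANGE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) (?P<host>\S+) fwk: "
    r"(?P<code>CLUS-\d+-\d+): State change: (?P<old>\S+) -> (?P<new>\S+) "
    r"\| Reason: (?P<reason>.*)$"
)


def parse_state_changes(lines):
    """Return (timestamp, host, old_state, new_state, reason) tuples sorted by time."""
    events = []
    for line in lines:
        match = STATE_CHANGE.match(line.strip())
        if match is None:
            continue  # "Remote member", routed and spike_detective lines are skipped
        ts = datetime.strptime(match["ts"], "%b %d %H:%M:%S %Y")
        events.append((ts, match["host"], match["old"], match["new"], match["reason"]))
    # sorted() is stable, so lines sharing a timestamp keep their input order
    return sorted(events, key=lambda event: event[0])


# Three lines copied from the logs above, plus one "Remote member" line that is ignored
sample = [
    "Dec 11 01:55:27 2025 F1-2 fwk: CLUS-114402-2: State change: ACTIVE -> STANDBY | Reason: Member state has been changed after returning from ACTIVE/ACTIVE scenario (remote cluster member 1 has higher priority)",
    "Dec 11 01:55:27 2025 F1-2 fwk: CLUS-210300-2: Remote member 1 (state STANDBY -> DOWN) | Reason: Interface is down (Cluster Control Protocol packets are not received)",
    "Dec 11 01:55:24 2025 F1-1 fwk: CLUS-110300-1: State change: STANDBY -> DOWN | Reason: Interface Sync is down (Cluster Control Protocol packets are not received)",
    "Dec 11 01:55:25 2025 F1-1 fwk: CLUS-116505-1: State change: DOWN -> ACTIVE(!) | Reason: All other machines are dead (timeout), Interface Sync is down (Cluster Control Protocol packets are not received)",
]

events = parse_state_changes(sample)
for ts, host, old, new, reason in events:
    print(f"{ts:%H:%M:%S} {host}: {old} -> {new}")
```

Merging both members this way only makes sense if their clocks are in sync, which the logs themselves do not prove.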
I see these suspicious messages:
Dec 11 01:53:31 2025 F1-2 spike_detective: spike info: type: cpu, cpu core: 42, top consumer: fwk0_dev_57, start time: 11/12/25 01:53:18, spike duration (sec): 12, initial cpu usage: 91, average cpu usage: 74, perf taken: 0
Dec 11 01:54:37 2025 F1-2 spike_detective: spike info: type: thread, thread id: 115061, thread name: temain, start time: 11/12/25 01:54:30, spike duration (sec): 6, initial cpu usage: 100, average cpu usage: 100, perf taken: 1
The question is why this device consumes so much CPU. I guess it's VSX, so it may be worth analysing what exactly caused the spike and considering an adjustment of the VS core assignment.
It is not VSX. The CPU spikes are short-lived, mostly for temain threads. It's a 28600 box and is not being overutilized. I will investigate the CPU issue anyway. Thanks.
ClusterXL typically has split-brain prevention mechanisms, so either a member is overwhelmed or there is some Layer 2 issue.
What is the topology of the sync interface? Is it a bond, are there intermediate switches, etc.?
It is not a bond interface, and there is no switch in between. The members are directly connected. The cable was replaced after we saw this issue earlier.
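To see which threads keep showing up across a longer stretch of /var/log/messages, the spike_detective lines can be summed per consumer. A rough sketch follows; the key/value layout is taken only from the two lines quoted above, and parse_spike and seconds_by_consumer are hypothetical helper names.

```python
import re
from collections import defaultdict

# Everything after "spike info: " is treated as a comma-separated list of
# "key: value" pairs, which matches the two samples in this thread (assumption).
SPIKE = re.compile(r"spike_detective: spike info: (?P<fields>.*)$")


def parse_spike(line):
    """Return the spike fields as a dict, or None for other log lines."""
    match = SPIKE.search(line.strip())
    if match is None:
        return None
    fields = {}
    for part in match["fields"].split(", "):
        key, _, value = part.partition(": ")
        fields[key] = value
    return fields


def seconds_by_consumer(lines):
    """Sum the spike duration per thread name (or top consumer for CPU spikes)."""
    totals = defaultdict(int)
    for line in lines:
        fields = parse_spike(line)
        if fields is None:
            continue
        name = fields.get("thread name") or fields.get("top consumer") or "unknown"
        totals[name] += int(fields["spike duration (sec)"])
    return dict(totals)


spike_lines = [
    "Dec 11 01:53:31 2025 F1-2 spike_detective: spike info: type: cpu, cpu core: 42, top consumer: fwk0_dev_57, start time: 11/12/25 01:53:18, spike duration (sec): 12, initial cpu usage: 91, average cpu usage: 74, perf taken: 0",
    "Dec 11 01:54:37 2025 F1-2 spike_detective: spike info: type: thread, thread id: 115061, thread name: temain, start time: 11/12/25 01:54:30, spike duration (sec): 6, initial cpu usage: 100, average cpu usage: 100, perf taken: 1",
]

totals = seconds_by_consumer(spike_lines)
print(totals)
```

Run over several days of logs, this would show whether temain is the usual suspect or whether other consumers spike as often.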
Which version / JHF are we working with?
It's R81.20, take 113.
If I were you, I would install the recommended one, take 119. Either way, what @Vincent_Bacher said makes total sense, at least to me.
Yup, thank you. I will check on the CPU usage and plan to install T119.
Good idea, since T119 has some CXL fixes.
Right! Thanks.
PRJ-62301, PMTR-115027 (ClusterXL): In ClusterXL High Availability (HA), in some scenarios, the Active cluster member stops sending Cluster Control Protocol (CCP) heartbeats, and the Standby member may misinterpret this as an Interface Active Check (IAC) failure.
You can also look at historical data: run cpview -t and then press t again.
I feel it will improve the situation, for sure.
I see the point Vince is making. That could absolutely happen due to a CPU spike.
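One way to sanity-check the CPU theory is to measure how long before the first "Interface Sync is down" message each spike ended. The sketch below uses only the timestamps posted in this thread. It assumes the spike_detective "start time" is day/month/year and that both members have synchronized clocks; neither is confirmed here, and the helper names are made up.

```python
from datetime import datetime, timedelta


def spike_end(start_text, duration_sec):
    """End time of a spike, given its 'start time' and 'spike duration (sec)' fields."""
    # Assumption: 11/12/25 means 11 December 2025, matching the syslog date
    start = datetime.strptime(start_text, "%d/%m/%y %H:%M:%S")
    return start + timedelta(seconds=duration_sec)


def gap_seconds(spike_end_time, syslog_timestamp):
    """Seconds between the end of a spike and a later syslog event."""
    event = datetime.strptime(syslog_timestamp, "%b %d %H:%M:%S %Y")
    return (event - spike_end_time).total_seconds()


# First "State change: STANDBY -> DOWN" on F1-1, copied from the standby log
first_sync_down = "Dec 11 01:55:24 2025"

fwk_gap = gap_seconds(spike_end("11/12/25 01:53:18", 12), first_sync_down)
temain_gap = gap_seconds(spike_end("11/12/25 01:54:30", 6), first_sync_down)

print(f"fwk0_dev_57 spike ended {fwk_gap:.0f} s before the Sync interface went down")
print(f"temain spike ended {temain_gap:.0f} s before the Sync interface went down")
```

A gap of this size does not settle it either way, since spike_detective only reports spikes above its own thresholds, but it is a quick number to compare against earlier occurrences.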
Perhaps I didn't express myself clearly enough as a non-native English speaker, but thank you for the flowers.
You absolutely did, I got everything you had to say. Don't worry, English is not my first language either, lol.