Hi all,
This has now happened a few times in the last 6 months. The standby firewall does not receive CCP packets and marks the Sync interface as down. The cluster then goes into a split-brain scenario.
It resolves itself in less than a minute, and all BGP peers are re-established. Any idea why this is happening?
Note: Sys_admin installed the Threat Prevention policy right after this. There were spike_detective alerts for temain right before this happened (could be totally unrelated).
Active firewall (F1-2, member 2):
Dec 11 01:53:31 2025 F1-2 spike_detective: spike info: type: cpu, cpu core: 42, top consumer: fwk0_dev_57, start time: 11/12/25 01:53:18, spike duration (sec): 12, initial cpu usage: 91, average cpu usage: 74, perf taken: 0
Dec 11 01:54:37 2025 F1-2 spike_detective: spike info: type: thread, thread id: 115061, thread name: temain, start time: 11/12/25 01:54:30, spike duration (sec): 6, initial cpu usage: 100, average cpu usage: 100, perf taken: 1
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-210300-2: Remote member 1 (state STANDBY -> DOWN) | Reason: Interface is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-114402-2: State change: ACTIVE -> STANDBY | Reason: Member state has been changed after returning from ACTIVE/ACTIVE scenario (remote cluster member 1 has higher priority)
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-210305-2: Remote member 1 (state DOWN -> ACTIVE(!)) | Reason: Interface is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-210300-2: Remote member 1 (state ACTIVE(!) -> DOWN) | Reason: Interface is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-114704-2: State change: STANDBY -> ACTIVE | Reason: No other ACTIVE members have been found in the cluster
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-100102-2: Failover member 1 -> member 2 | Reason: Available on member 1
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-214802-2: Remote member 1 (state DOWN -> STANDBY) | Reason: There is already an ACTIVE member in the cluster
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-211700-2: Remote member 1 (state STANDBY -> DOWN) | Reason: ROUTED PNOTE
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-100201-2: Failover member 2 -> member 1 | Reason: Member state has been changed after returning from ACTIVE/ACTIVE scenario (remote cluster member 1 has higher priority)
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-120105-2: routed PNOTE ON
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-111705-2: State change: ACTIVE -> ACTIVE(!) | Reason: ROUTED PNOTE
Dec 11 01:55:28 2025 F1-2 fwk: CLUS-120105-2: routed PNOTE OFF
Dec 11 01:55:28 2025 F1-2 fwk: CLUS-114904-2: State change: ACTIVE(!) -> ACTIVE | Reason: Reason for ACTIVE! alert has been resolved
Standby firewall (F1-1, member 1):
Dec 11 01:55:24 2025 F1-1 fwk: CLUS-110300-1: State change: STANDBY -> DOWN | Reason: Interface Sync is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:25 2025 F1-1 fwk: CLUS-216400-1: Remote member 2 (state ACTIVE -> LOST) | Reason: Timeout Control Protocol packet expired member declared as DEAD
Dec 11 01:55:25 2025 F1-1 fwk: CLUS-116505-1: State change: DOWN -> ACTIVE(!) | Reason: All other machines are dead (timeout), Interface Sync is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:25 2025 F1-1 fwk: CLUS-100201-1: Failover member 2 -> member 1 | Reason: Available on member 2
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-214802-1: Remote member 2 (state LOST -> STANDBY) | Reason: There is already an ACTIVE member in the cluster
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-110305-1: State change: ACTIVE! -> DOWN | Reason: Interface Sync is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-214904-1: Remote member 2 (state STANDBY -> ACTIVE) | Reason: Reason for ACTIVE! alert has been resolved
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-114802-1: State change: DOWN -> STANDBY | Reason: There is already an ACTIVE member in the cluster (member 2)
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-120105-1: routed PNOTE ON
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-111700-1: State change: STANDBY -> DOWN | Reason: ROUTED PNOTE
Dec 11 01:55:27 2025 F1-1 fwk: CLUS-100102-1: Failover member 1 -> member 2 | Reason: Interface Sync is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:27 2025 F1-1 routed[168442]: [routed] ERROR: cpcl_recv: Failed to receive cluster message header, connection will need to be reestablished. errno = 104 (Connection reset by peer)
Dec 11 01:55:27 2025 F1-1 routed[168442]: [routed] ERROR: cpcl_recv: deleting peer task 0x8f1aee4 due to failure to read from the socket
Dec 11 01:56:02 2025 F1-1 fwk: CLUS-120105-1: routed PNOTE OFF
Dec 11 01:56:02 2025 F1-1 fwk: CLUS-114802-1: State change: DOWN -> STANDBY | Reason: There is already an ACTIVE member in the cluster (member 2)
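In case it helps to line up the two logs, here is a rough sketch that merges the fwk CLUS lines from both members into one timeline. The function names (parse_fwk_lines, merged_timeline) and the regex are my own and not part of any Check Point tool. It assumes the line layout shown above and that both members' clocks are in sync. Lines from spike_detective and routed are skipped.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Dec 11 01:55:27 2025 F1-2 fwk: CLUS-210300-2: Remote member 1 ..."
# The layout is assumed from the excerpts in this post.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"(?P<host>\S+) fwk: (?P<code>CLUS-\d+-\d+): (?P<msg>.*)$"
)


def parse_fwk_lines(text):
    """Return (timestamp, host, code, message) tuples for fwk CLUS lines."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not an fwk CLUS line (e.g. routed, spike_detective)
        # Collapse repeated spaces so single-digit days parse as well.
        ts = datetime.strptime(" ".join(m["ts"].split()), "%b %d %H:%M:%S %Y")
        events.append((ts, m["host"], m["code"], m["msg"]))
    return events


def merged_timeline(*logs):
    """Merge several log excerpts and sort by timestamp.

    sorted() is stable, so lines with the same second keep their
    original relative order within each log.
    """
    events = []
    for log in logs:
        events.extend(parse_fwk_lines(log))
    return sorted(events, key=lambda e: e[0])


# Short excerpts copied from the logs above.
ACTIVE_LOG = """
Dec 11 01:54:37 2025 F1-2 spike_detective: spike info: type: thread, thread id: 115061, thread name: temain, start time: 11/12/25 01:54:30, spike duration (sec): 6, initial cpu usage: 100, average cpu usage: 100, perf taken: 1
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-210300-2: Remote member 1 (state STANDBY -> DOWN) | Reason: Interface is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:27 2025 F1-2 fwk: CLUS-114704-2: State change: STANDBY -> ACTIVE | Reason: No other ACTIVE members have been found in the cluster
"""

STANDBY_LOG = """
Dec 11 01:55:24 2025 F1-1 fwk: CLUS-110300-1: State change: STANDBY -> DOWN | Reason: Interface Sync is down (Cluster Control Protocol packets are not received)
Dec 11 01:55:25 2025 F1-1 fwk: CLUS-216400-1: Remote member 2 (state ACTIVE -> LOST) | Reason: Timeout Control Protocol packet expired member declared as DEAD
Dec 11 01:55:27 2025 F1-1 routed[168442]: [routed] ERROR: cpcl_recv: deleting peer task 0x8f1aee4 due to failure to read from the socket
"""

if __name__ == "__main__":
    for ts, host, code, msg in merged_timeline(ACTIVE_LOG, STANDBY_LOG):
        print(ts.strftime("%H:%M:%S"), host, code, msg)
```

Sorted this way, the first cluster event is on F1-1 at 01:55:24 (Sync marked down), about three seconds before F1-2 logs anything at 01:55:27. That ordering only means something if the clocks on both members agree.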