We have a similar issue, but we are on VSX R80.20 T-141. The problem is that it happens every night at the same time, and after 1 hour 10 minutes it fails back over to the 1st member, so we get two failovers every night. Any ideas or solutions?
----1st cluster member----
[Expert@fwg-01:0]# cphaprob stat
Cluster Mode: VSX High Availability (Active Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 (local) 192.168.168.129 100% ACTIVE fwg-a
2 192.168.168.130 0% STANDBY fwg-b
Active PNOTEs: None
Last member state change event:
Event Code: CLUS-114704
State change: STANDBY -> ACTIVE
Reason for state change: No other ACTIVE members have been found in the cluster
Event time: Wed May 6 02:20:59 2020
Last cluster failover event:
Transition to new ACTIVE: Member 2 -> Member 1
Reason: VSX PNOTE
Event time: Wed May 6 02:20:59 2020
Cluster failover count:
Failover counter: 125
Time of counter reset: Thu Aug 29 16:31:55 2019 (reboot)
---2nd cluster member---
fwg-03:0> cphaprob stat
Cluster Mode: VSX High Availability (Active Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 192.168.168.129 100% ACTIVE fwg-a
2 (local) 192.168.168.130 0% STANDBY fwg-b
Active PNOTEs: None
Last member state change event:
Event Code: CLUS-114802
State change: DOWN -> STANDBY
Reason for state change: There is already an ACTIVE member in the cluster (member 1)
Event time: Wed May 6 02:21:15 2020
Last cluster failover event:
Transition to new ACTIVE: Member 2 -> Member 1
Reason: VSX PNOTE
Event time: Wed May 6 02:20:59 2020
Cluster failover count:
Failover counter: 125
---Messages log from 1st unit---
May 10 01:11:21 2020 fwg-01 kernel: routed[29708]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffeac0a0 error 4
May 11 01:11:23 2020 fwg-01 kernel: routed[17390]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ff8f4760 error 4
---Messages log from 2nd unit---
May 10 02:21:07 2020 fwg-03 kernel: routed[26954]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffc16990 error 4
May 11 02:21:10 2020 fwg-sca-03 kernel: routed[3829]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffbb7ff0 error 4
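To check whether the routed crashes line up with the failover and failback, here is a rough Python 3 sketch (my own helper functions, not a Check Point tool; it assumes the messages log lines are copied off the gateways as shown above). It pulls the routed segfaults out of the lines, groups them per night and prints the gap between the first and the last crash. It also lists the distinct faulting rip values, since the same address in every crash would point to the same bug each time.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# routed segfault lines copied from the messages logs of both members (see above)
LINES = [
    "May 10 01:11:21 2020 fwg-01 kernel: routed[29708]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffeac0a0 error 4",
    "May 11 01:11:23 2020 fwg-01 kernel: routed[17390]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ff8f4760 error 4",
    "May 10 02:21:07 2020 fwg-03 kernel: routed[26954]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffc16990 error 4",
    "May 11 02:21:10 2020 fwg-sca-03 kernel: routed[3829]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffbb7ff0 error 4",
]

# Captures: timestamp, host name, faulting instruction pointer (rip)
PATTERN = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) (\S+) kernel: routed\[\d+\]: segfault .* rip ([0-9a-f]+)"
)


def parse(lines):
    """Return (timestamp, host, rip) tuples for every routed segfault, oldest first."""
    events = []
    for line in lines:
        m = PATTERN.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%b %d %H:%M:%S %Y")
            events.append((ts, m.group(2), m.group(3)))
    return sorted(events)


def gap_per_night(events):
    """Map each date to (first host, last host, time between first and last crash)."""
    by_day = defaultdict(list)
    for ts, host, _rip in events:
        by_day[ts.date()].append((ts, host))
    result = {}
    for day, crashes in sorted(by_day.items()):
        first, last = crashes[0], crashes[-1]  # already sorted by time
        result[day] = (first[1], last[1], last[0] - first[0])
    return result


if __name__ == "__main__":
    events = parse(LINES)
    print("distinct faulting rip values:", {rip for _, _, rip in events})
    for day, (first_host, last_host, gap) in gap_per_night(events).items():
        print(f"{day}: {first_host} -> {last_host} after {gap}")
```

With the four lines above it reports a gap of 1:09:46 on May 10 and 1:09:47 on May 11, and a single rip value (0000000008378d5b), which matches the roughly 1 hour 10 minute failback we see.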
---Log from fwk.elg (the dates differ from the logs above, but the same pattern appears every time)---
[2 May 2:08:11][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE ON
[2 May 2:08:11][fw4_0];[vs_0];CLUS-111500-1: State change: ACTIVE -> DOWN | Reason: VSX PNOTE due to problem in Virtual System 2
[2 May 2:08:11][fw4_0];[vs_0];CLUS-214704-1: Remote member 2 (state STANDBY -> ACTIVE) | Reason: No other ACTIVE members have been found in the cluster
[2 May 2:08:11][fw4_0];[vs_0];CLUS-100102-1: Failover member 1 -> member 2 | Reason: VSX PNOTE
[2 May 2:08:12][fw4_0];[vs_0];CLUS-111500-1: State remains: DOWN | Reason: Previous problem resolved, VSX PNOTE due to problem in Virtual System 1
[2 May 2:08:26][fw4_0];[vs_0];CLUS-211505-1: Remote member 2 (state ACTIVE -> ACTIVE(!)) | Reason: VSX PNOTE
[2 May 2:08:29][fw4_0];[vs_0];CLUS-214904-1: Remote member 2 (state ACTIVE(!) -> ACTIVE) | Reason: Reason for ACTIVE! alert has been resolved
[2 May 2:08:35][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE OFF
[2 May 2:08:35][fw4_0];[vs_0];CLUS-114802-1: State change: DOWN -> STANDBY | Reason: There is already an ACTIVE member in the cluster (member 2)
[2 May 2:15:01][fw4_0];[vs_0];CLUS-211500-1: Remote member 2 (state ACTIVE -> DOWN) | Reason: VSX PNOTE
[2 May 2:15:01][fw4_0];[vs_0];CLUS-114704-1: State change: STANDBY -> ACTIVE | Reason: No other ACTIVE members have been found in the cluster
[2 May 2:15:01][fw4_0];[vs_0];CLUS-100201-1: Failover member 2 -> member 1 | Reason: Available on member 2
[2 May 2:15:16][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE ON
[2 May 2:15:16][fw4_0];[vs_0];CLUS-111505-1: State change: ACTIVE -> ACTIVE(!) | Reason: VSX PNOTE due to problem in Virtual System 1
[2 May 2:15:19][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE OFF
[2 May 2:15:19][fw4_0];[vs_0];CLUS-114904-1: State change: ACTIVE(!) -> ACTIVE | Reason: Reason for ACTIVE! alert has been resolved
[2 May 2:15:25][fw4_0];[vs_0];CLUS-214802-1: Remote member 2 (state DOWN -> STANDBY) | Reason: There is already an ACTIVE member in the cl
[6 May 1:11:09][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE ON
[6 May 1:11:09][fw4_0];[vs_0];CLUS-111500-1: State change: ACTIVE -> DOWN | Reason: VSX PNOTE due to problem in Virtual System 2
[6 May 1:11:09][fw4_0];[vs_0];CLUS-214704-1: Remote member 2 (state STANDBY -> ACTIVE) | Reason: No other ACTIVE members have been found in the cluster
[6 May 1:11:10][fw4_0];[vs_0];CLUS-100102-1: Failover member 1 -> member 2 | Reason: VSX PNOTE
[6 May 1:11:10][fw4_0];[vs_0];CLUS-111500-1: State remains: DOWN | Reason: Previous problem resolved, VSX PNOTE due to problem in Virtual System 1
[6 May 1:11:22][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE OFF
[6 May 1:11:22][fw4_0];[vs_0];CLUS-114802-1: State change: DOWN -> STANDBY | Reason: There is already an ACTIVE member in the cluster (member 2)
[6 May 2:20:59][fw4_0];[vs_0];CLUS-211500-1: Remote member 2 (state ACTIVE -> DOWN) | Reason: VSX PNOTE
[6 May 2:20:59][fw4_0];[vs_0];CLUS-114704-1: State change: STANDBY -> ACTIVE | Reason: No other ACTIVE members have been found in the cluster
[6 May 2:20:59][fw4_0];[vs_0];CLUS-100201-1: Failover member 2 -> member 1 | Reason: Available on member 2
[6 May 2:21:15][fw4_0];[vs_0];CLUS-214802-1: Remote member 2 (state DOWN -> STANDBY) | Reason: There is already an ACTIVE member in the cluster
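The same check for fwk.elg: a rough Python 3 sketch (helper names are hypothetical, not Check Point tools) that pairs each "Failover member 1 -> member 2" line with the next "Failover member 2 -> member 1" line, reports how long member 2 stayed ACTIVE, and notes which Virtual System raised the VSX PNOTE. The fwk.elg timestamps have no year, so the script assumes 2020 from the cphaprob output; LOG holds only the relevant lines copied from the excerpt above.

```python
import re
from datetime import datetime, timedelta

# Assumption: fwk.elg timestamps carry no year; 2020 is taken from the cphaprob output.
YEAR = 2020

# Subset of the fwk.elg lines above (PNOTE and failover events only)
LOG = """\
[2 May 2:08:11][fw4_0];[vs_0];CLUS-111500-1: State change: ACTIVE -> DOWN | Reason: VSX PNOTE due to problem in Virtual System 2
[2 May 2:08:11][fw4_0];[vs_0];CLUS-100102-1: Failover member 1 -> member 2 | Reason: VSX PNOTE
[2 May 2:15:01][fw4_0];[vs_0];CLUS-100201-1: Failover member 2 -> member 1 | Reason: Available on member 2
[6 May 1:11:09][fw4_0];[vs_0];CLUS-111500-1: State change: ACTIVE -> DOWN | Reason: VSX PNOTE due to problem in Virtual System 2
[6 May 1:11:10][fw4_0];[vs_0];CLUS-100102-1: Failover member 1 -> member 2 | Reason: VSX PNOTE
[6 May 2:20:59][fw4_0];[vs_0];CLUS-100201-1: Failover member 2 -> member 1 | Reason: Available on member 2
"""

STAMP = re.compile(r"^\[(\d{1,2} \w{3} \d{1,2}:\d{2}:\d{2})\]")
FAILOVER = re.compile(r"Failover member (\d+) -> member (\d+)")
PNOTE_VS = re.compile(r"ACTIVE -> DOWN \| Reason: VSX PNOTE due to problem in Virtual System (\d+)")


def parse_time(stamp):
    """Turn '2 May 2:08:11' into a datetime, adding the assumed year."""
    return datetime.strptime(f"{stamp} {YEAR}", "%d %b %H:%M:%S %Y")


def failover_windows(text):
    """Return (start, end, duration, vs) for each period member 2 held ACTIVE."""
    windows = []
    start = None
    vs = None
    for line in text.splitlines():
        m = STAMP.match(line)
        if not m:
            continue
        ts = parse_time(m.group(1))
        pv = PNOTE_VS.search(line)
        if pv:
            vs = int(pv.group(1))  # VS that took member 1 down
        fo = FAILOVER.search(line)
        if fo:
            src, dst = fo.groups()
            if (src, dst) == ("1", "2"):
                start = ts
            elif (src, dst) == ("2", "1") and start is not None:
                windows.append((start, ts, ts - start, vs))
                start, vs = None, None
    return windows


if __name__ == "__main__":
    for start, end, duration, vs in failover_windows(LOG):
        print(f"{start:%d %b %H:%M:%S} -> {end:%H:%M:%S}: member 2 ACTIVE for {duration} (PNOTE on VS {vs})")
```

On the excerpt above it gives 0:06:50 for 2 May and 1:09:49 for 6 May, both triggered by Virtual System 2. To run it on a full copy of fwk.elg, LOG can be replaced with the file contents.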