Collaborator

Regarding firewall failover instances

We just had a failover. I believe failovers are normally not much of an issue in Check Point, but I want to clarify: do I need to investigate further, or is there nothing to worry about? Below is the output of cphaprob stat:

 

[Expert@VWC-FW08:0]# cphaprob stat

 

Cluster Mode:   High Availability (Active Up) with IGMP Membership

 

ID         Unique Address  Assigned Load   State          Name

 

1 (local)  172.16.1.1      0%              STANDBY        xxx-FW0x

2          172.16.1.2      100%            ACTIVE         xxx-FW0x

 

Active PNOTEs: None

 

Last member state change event:

   Event Code:                 CLUS-114802

   State change:               DOWN -> STANDBY

   Reason for state change:    There is already an ACTIVE member in the cluster (member 2)

   Event time:                 Wed Apr 29 01:52:30 2020

 

Last cluster failover event:

   Transition to new ACTIVE:   Member 1 -> Member 2

   Reason:                     Interface eth6.906 is down (Cluster Control Protocol packets are not received)

   Event time:                 Wed Apr 29 02:01:21 2020

 

Cluster failover count:

   Failover counter:           2

   Time of counter reset:      Wed Jan 22 22:13:46 2020 (reboot)

So, regarding this "Cluster Control Protocol packets are not received" issue: is it insignificant, something that can happen occasionally and that I should not worry about? I have already made that firewall active again and the other one standby, and I don't see any issues with either firewall or any PNOTEs.

Thanks and Regards.

5 Replies
Admin

CCP probes all monitored interfaces 3 times every second. Losing connectivity on one of the networks causes a failover, as in your case. It seems that your old "Active" member lost a VLAN for some time while the Standby did not. As a single event it might not be significant, but it is worth investigating why it happened anyway.

Most probably, someone was working with the network equipment, pulled a cable, changed a switch config, etc. 
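If you want to track this over time rather than eyeballing the output, here is a minimal Python sketch that pulls the "Last cluster failover event" fields out of saved cphaprob stat text. The function name `parse_last_failover` and the `interface` key are made up for illustration, and the parsing assumes the layout pasted in this thread; other versions may print it differently.

```python
import re

# Excerpt of the cphaprob stat output from the original post.
SAMPLE = """\
Last cluster failover event:
   Transition to new ACTIVE:   Member 1 -> Member 2
   Reason:                     Interface eth6.906 is down (Cluster Control Protocol packets are not received)
   Event time:                 Wed Apr 29 02:01:21 2020

Cluster failover count:
   Failover counter:           2
   Time of counter reset:      Wed Jan 22 22:13:46 2020 (reboot)
"""


def parse_last_failover(stat_output):
    """Return the fields of the last failover event as a dict (hypothetical helper)."""
    marker = "Last cluster failover event:"
    if marker not in stat_output:
        return {}
    # Keep only the text between the failover header and the counter section.
    section = stat_output.split(marker, 1)[1]
    section, _, counters = section.partition("Cluster failover count:")
    info = {}
    for line in section.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    # Pull out the interface name when the reason names one.
    iface = re.search(r"Interface (\S+) is down", info.get("Reason", ""))
    if iface:
        info["interface"] = iface.group(1)
    count = re.search(r"Failover counter:\s*(\d+)", counters)
    if count:
        info["Failover counter"] = int(count.group(1))
    return info


if __name__ == "__main__":
    for key, value in parse_last_failover(SAMPLE).items():
        print(f"{key}: {value}")
```

Saving the result after each failover makes it easy to see whether the same interface (here eth6.906) keeps showing up, which would point at a specific VLAN or switch port rather than a one-off event.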

Collaborator

Thank you
Contributor

We have a similar issue, but we are on VSX R80.20 T-141. The problem occurs every night at the same time, and after 1 hour 10 minutes it fails back over to the 1st member, so it happens twice a night. Any ideas or solutions?

----1st cluster member----
[Expert@fwg-01:0]# cphaprob stat

Cluster Mode: VSX High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 (local) 192.168.168.129 100% ACTIVE fwg-a
2 192.168.168.130 0% STANDBY fwg-b


Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114704
State change: STANDBY -> ACTIVE
Reason for state change: No other ACTIVE members have been found in the cluster
Event time: Wed May 6 02:20:59 2020

Last cluster failover event:
Transition to new ACTIVE: Member 2 -> Member 1
Reason: VSX PNOTE
Event time: Wed May 6 02:20:59 2020

Cluster failover count:
Failover counter: 125
Time of counter reset: Thu Aug 29 16:31:55 2019 (reboot)

 

---2nd cluster member---
fwg-03:0> cphaprob stat

Cluster Mode: VSX High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 192.168.168.129 100% ACTIVE fwg-a
2 (local) 192.168.168.130 0% STANDBY fwg-b


Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114802
State change: DOWN -> STANDBY
Reason for state change: There is already an ACTIVE member in the cluster (member 1)
Event time: Wed May 6 02:21:15 2020

Last cluster failover event:
Transition to new ACTIVE: Member 2 -> Member 1
Reason: VSX PNOTE
Event time: Wed May 6 02:20:59 2020

Cluster failover count:
Failover counter: 125

 

---Messages log from 1st unit---

May 10 01:11:21 2020 fwg-01 kernel: routed[29708]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffeac0a0 error 4

May 11 01:11:23 2020 fwg-01 kernel: routed[17390]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ff8f4760 error 4

---Messages log from 2nd unit---

May 10 02:21:07 2020 fwg-03 kernel: routed[26954]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffc16990 error 4

May 11 02:21:10 2020 fwg-sca-03 kernel: routed[3829]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffbb7ff0 error 4

 

---Log from fwk.elg (the dates differ from the logs above, but similar entries appear every time)---

[2 May 2:08:11][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE ON
[2 May 2:08:11][fw4_0];[vs_0];CLUS-111500-1: State change: ACTIVE -> DOWN | Reason: VSX PNOTE due to problem in Virtual System 2
[2 May 2:08:11][fw4_0];[vs_0];CLUS-214704-1: Remote member 2 (state STANDBY -> ACTIVE) | Reason: No other ACTIVE members have been found in the cluster
[2 May 2:08:11][fw4_0];[vs_0];CLUS-100102-1: Failover member 1 -> member 2 | Reason: VSX PNOTE
[2 May 2:08:12][fw4_0];[vs_0];CLUS-111500-1: State remains: DOWN | Reason: Previous problem resolved, VSX PNOTE due to problem in Virtual System 1
[2 May 2:08:26][fw4_0];[vs_0];CLUS-211505-1: Remote member 2 (state ACTIVE -> ACTIVE(!)) | Reason: VSX PNOTE
[2 May 2:08:29][fw4_0];[vs_0];CLUS-214904-1: Remote member 2 (state ACTIVE(!) -> ACTIVE) | Reason: Reason for ACTIVE! alert has been resolved
[2 May 2:08:35][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE OFF
[2 May 2:08:35][fw4_0];[vs_0];CLUS-114802-1: State change: DOWN -> STANDBY | Reason: There is already an ACTIVE member in the cluster (member 2)
[2 May 2:15:01][fw4_0];[vs_0];CLUS-211500-1: Remote member 2 (state ACTIVE -> DOWN) | Reason: VSX PNOTE
[2 May 2:15:01][fw4_0];[vs_0];CLUS-114704-1: State change: STANDBY -> ACTIVE | Reason: No other ACTIVE members have been found in the cluster
[2 May 2:15:01][fw4_0];[vs_0];CLUS-100201-1: Failover member 2 -> member 1 | Reason: Available on member 2
[2 May 2:15:16][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE ON
[2 May 2:15:16][fw4_0];[vs_0];CLUS-111505-1: State change: ACTIVE -> ACTIVE(!) | Reason: VSX PNOTE due to problem in Virtual System 1
[2 May 2:15:19][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE OFF
[2 May 2:15:19][fw4_0];[vs_0];CLUS-114904-1: State change: ACTIVE(!) -> ACTIVE | Reason: Reason for ACTIVE! alert has been resolved
[2 May 2:15:25][fw4_0];[vs_0];CLUS-214802-1: Remote member 2 (state DOWN -> STANDBY) | Reason: There is already an ACTIVE member in the cl

[6 May 1:11:09][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE ON
[6 May 1:11:09][fw4_0];[vs_0];CLUS-111500-1: State change: ACTIVE -> DOWN | Reason: VSX PNOTE due to problem in Virtual System 2
[6 May 1:11:09][fw4_0];[vs_0];CLUS-214704-1: Remote member 2 (state STANDBY -> ACTIVE) | Reason: No other ACTIVE members have been found in the cluster
[6 May 1:11:10][fw4_0];[vs_0];CLUS-100102-1: Failover member 1 -> member 2 | Reason: VSX PNOTE
[6 May 1:11:10][fw4_0];[vs_0];CLUS-111500-1: State remains: DOWN | Reason: Previous problem resolved, VSX PNOTE due to problem in Virtual System 1
[6 May 1:11:22][fw4_0];[vs_0];CLUS-120103-1: VSX PNOTE OFF
[6 May 1:11:22][fw4_0];[vs_0];CLUS-114802-1: State change: DOWN -> STANDBY | Reason: There is already an ACTIVE member in the cluster (member 2)
[6 May 2:20:59][fw4_0];[vs_0];CLUS-211500-1: Remote member 2 (state ACTIVE -> DOWN) | Reason: VSX PNOTE
[6 May 2:20:59][fw4_0];[vs_0];CLUS-114704-1: State change: STANDBY -> ACTIVE | Reason: No other ACTIVE members have been found in the cluster
[6 May 2:20:59][fw4_0];[vs_0];CLUS-100201-1: Failover member 2 -> member 1 | Reason: Available on member 2
[6 May 2:21:15][fw4_0];[vs_0];CLUS-214802-1: Remote member 2 (state DOWN -> STANDBY) | Reason: There is already an ACTIVE member in the cluster
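
To check the "1 hour 10 minutes" pattern against the log, here is a rough Python sketch that picks the "Failover member X -> member Y" lines out of fwk.elg text and measures the time between them. The helper name `find_failovers` is hypothetical, and the year has to be passed in as an assumption because these log lines do not carry one.

```python
import re
from datetime import datetime

# The two failover lines from the 6 May excerpt above.
SAMPLE = [
    "[6 May 1:11:10][fw4_0];[vs_0];CLUS-100102-1: Failover member 1 -> member 2 | Reason: VSX PNOTE",
    "[6 May 2:20:59][fw4_0];[vs_0];CLUS-100201-1: Failover member 2 -> member 1 | Reason: Available on member 2",
]

FAILOVER = re.compile(
    r"^\[(\d+ \w+ +\d+:\d+:\d+)\].*"
    r"Failover member (\d+) -> member (\d+) \| Reason: (.+)$"
)


def find_failovers(lines, year):
    """Return (timestamp, from_member, to_member, reason) for each failover line."""
    events = []
    for line in lines:
        match = FAILOVER.match(line.strip())
        if match:
            # The log timestamp has no year, so the caller supplies one.
            when = datetime.strptime(f"{match.group(1)} {year}", "%d %b %H:%M:%S %Y")
            events.append((when, int(match.group(2)), int(match.group(3)), match.group(4)))
    return events


if __name__ == "__main__":
    events = find_failovers(SAMPLE, year=2020)
    for earlier, later in zip(events, events[1:]):
        print(f"member {earlier[1]} -> {earlier[2]} at {earlier[0]}, "
              f"back after {later[0] - earlier[0]}")
```

For the 6 May lines the gap between the two failovers is 1:09:49, which matches the description. On the 2 May excerpt the same calculation gives under seven minutes (2:08:11 to 2:15:01), so the gap is not identical in every pasted sample.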

 

Admin

It looks like routed is crashing, which might be causing the failover.
I recommend opening a TAC case for further troubleshooting.
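
While waiting on TAC, it can help to summarize the crashes per member. Below is a minimal Python sketch that parses the kernel segfault lines pasted above and groups them by host. The function name `summarize_segfaults` is made up for illustration, the regex assumes the exact line layout shown in this thread, and host names are used exactly as pasted.

```python
import re
from collections import defaultdict

# The four segfault lines from the messages logs posted above.
SAMPLE = [
    "May 10 01:11:21 2020 fwg-01 kernel: routed[29708]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffeac0a0 error 4",
    "May 11 01:11:23 2020 fwg-01 kernel: routed[17390]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ff8f4760 error 4",
    "May 10 02:21:07 2020 fwg-03 kernel: routed[26954]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffc16990 error 4",
    "May 11 02:21:10 2020 fwg-sca-03 kernel: routed[3829]: segfault at 0000000000000000 rip 0000000008378d5b rsp 00000000ffbb7ff0 error 4",
]

SEGFAULT = re.compile(
    r"^(\w+ +\d+) (\d+:\d+:\d+) (\d{4}) (\S+) kernel: (\w+)\[(\d+)\]: "
    r"segfault at (\w+) rip (\w+)"
)


def summarize_segfaults(lines):
    """Group segfault records by host name (hypothetical helper)."""
    by_host = defaultdict(list)
    for line in lines:
        match = SEGFAULT.match(line.strip())
        if match:
            date, clock, year, host, process, pid, address, rip = match.groups()
            by_host[host].append({
                "date": f"{date} {year}",
                "time": clock,
                "process": process,
                "pid": int(pid),
                "address": address,
                "rip": rip,
            })
    return dict(by_host)


if __name__ == "__main__":
    for host, crashes in summarize_segfaults(SAMPLE).items():
        for crash in crashes:
            print(host, crash["date"], crash["time"], crash["process"], "rip", crash["rip"])
```

On the pasted lines, every crash has the same rip value and a fault address of zero, and the times line up with the nightly failovers (about 01:11 on the first unit, about 02:21 on the second). That is worth mentioning in the TAC case, since it suggests the same code path is failing each time.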
Contributor

Yes, routed is crashing. We have opened a case with Check Point and provided all the log files, cpinfo, core dumps, etc., and are waiting for their response. Thanks for the quick response.
