Geomix7
Collaborator

ClusterXL - Failover due to Fullsync PNOTE ON

Hello All,

We faced an unexpected failover due to a Fullsync PNOTE ON error (CLUS-120108).

According to SK125152:

CLUS-120108

Fullsync PNOTE <ON | OFF>

ON - problem

After the failover, I verified that sync communication is OK and that this member is in standby mode in the cluster.

In addition, see the syncstat statistics below.


Has anyone faced the same issue? Do you know what triggers this behavior?

Thanks

8 Replies
_Val_
Admin

Full sync only happens when the second member is coming up from boot/initialization, before it becomes a fully operational cluster member (usually in standby mode). Check the uptime; it seems to me one of your boxes rebooted itself.

Also, a Fullsync PNOTE by itself should not cause a failover. Please post the logs you got and the full message.
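A quick way to check this on each member is a sketch like the one below. `uptime` is standard Linux; the `cphaprob` commands are the Check Point cluster CLI and are shown only as comments here, since they exist on the gateway itself:

```shell
# Show how long this member has been up; a recent reboot would explain a full sync
uptime

# On the cluster member itself, also check the cluster state and PNOTE list
# (Check Point CLI, available on the gateway only):
#   cphaprob state
#   cphaprob -l list
```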

Geomix7
Collaborator

Hello Val,

 

Uptime is 160 days for all members of the cluster. The failover occurred on Sun Mar 14 18:44:26 2021.

 

Please find attached the cpwd_admin list output and the messages log.

Thanks

 

_Val_
Admin

You did something with the SNMP settings, which triggered a cpstop/cpstart, which in turn caused the Active member to go down.

It says it right there in your messages:

Mar 14 18:44:24 2021 HQ pm[16877]: Disabled snmpd
Mar 14 18:44:24 2021 HQ xpand[16895]: Configuration changed from localhost by user admin by the service /usr/sbin/snmpd
Mar 14 18:44:24 2021 HQ snmpd: Destroying the lists of sensors
Mar 14 18:44:24 2021 HQ pm[16877]: Reaped:  snmpd[8268]
Mar 14 18:44:26 2021 HQ kernel: [fw4_1];CLUS-120108-2: Fullsync PNOTE ON
Mar 14 18:44:26 2021 HQ kernel: [fw4_1];CLUS-120130-2: cpstop
Mar 14 18:44:26 2021 HQ kernel: [fw4_1];CLUS-113500-2: State change: ACTIVE -> DOWN | Reason: FULLSYNC PNOTE - cpstop
Geomix7
Collaborator

We did not change anything manually in the SNMP configuration. I have already opened a case with support and will update the post accordingly.

Thanks

_Val_
Admin

Sure, please keep us posted.

This line, however, suggests something was done:

Mar 14 18:44:24 2021 HQ xpand[16895]: Configuration changed from localhost by user admin
Geomix7
Collaborator

Hello all,

Support could not find the root cause of the issue.

Since the issue has occurred only once, their only suggestion is to update to the latest Jumbo Hotfix take (we are on Take 78), because it resolves many performance and stability issues.

 

Thanks

the_rock
Authority

Val is definitely correct. Based on that message in the logs you sent, it seems that someone manually changed something in the config. Maybe try the command below: cd /var/log and then run grep -i PNOTE messages.*
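For reference, here is what that search looks like, demonstrated against a scratch file built from two log lines quoted earlier in this thread (the /tmp sample is only for illustration; on the gateway you would run it against /var/log/messages and its rotations, i.e. messages.*):

```shell
# Write two sample lines from this thread to a scratch file
printf '%s\n' \
  'Mar 14 18:44:26 2021 HQ kernel: [fw4_1];CLUS-120108-2: Fullsync PNOTE ON' \
  'Mar 14 18:44:24 2021 HQ pm[16877]: Disabled snmpd' \
  > /tmp/messages.sample

# Case-insensitive search for PNOTE events; only matching lines are printed,
# so the snmpd line above is filtered out
grep -i pnote /tmp/messages.sample
```

Running the same grep across messages.* covers rotated logs too, so older PNOTE transitions are not missed.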

 

Andy

Geomix7
Collaborator

Mar 14 18:44:26 2021 HQ kernel: [fw4_1];CLUS-120108-2: Fullsync PNOTE ON
Mar 14 18:44:26 2021 HQ kernel: [fw4_1];CLUS-113500-2: State change: ACTIVE -> DOWN | Reason: FULLSYNC PNOTE - cpstop
Mar 14 18:44:28 2021 HQ kernel: [fw4_0];fwhak_drv_report_process_state: no running process, reporting pnote fwd
Mar 14 18:44:30 2021 HQ kernel: [fw4_1];CLUS-120105-2: routed PNOTE ON
Mar 14 18:44:31 2021 HQ kernel: [fw4_0];fwhak_drv_report_process_state: no running process, reporting pnote cphad
Mar 14 18:45:34 2021 HQ kernel: [fw4_1];CLUS-113601-2: State remains: INIT | Reason: FULLSYNC PNOTE - cpstart
Mar 14 18:45:36 2021 HQ kernel: [fw4_1];CLUS-100201-2: Failover member 2 -> member 1 | Reason: FULLSYNC PNOTE - cpstop
Mar 14 18:46:22 2021 HQ kernel: [fw4_1];CLUS-120207-2: LPRB PNOTE : local probing has started on interface bond1.399
Mar 14 18:46:51 2021 HQ kernel: [fw4_1];CLUS-120207-2: LPRB PNOTE : local probing has started on interface bond2
Mar 14 18:46:52 2021 HQ kernel: [fw4_1];CLUS-120207-2: LPRB PNOTE : local probing had stopped on interface bond2
