
standby cluster member fails randomly

Hello CheckMates,

Here is the issue: several times now, the standby member has stopped answering ICMP, HTTP, HTTPS and SSH requests. Only a reboot of the member helps.

In /var/log/messages there are only two lines that correlate with the time of that failover:

Jun 10 17:10:46 2019 cpfw-msk-2 kernel: [fw4_1];CLUS-220201-2: Starting CUL mode because CPU usage (81%) on the remote member 1 increased above the configured threshold (80%).
Jun 10 17:10:56 2019 cpfw-msk-2 kernel: [fw4_1];CLUS-120202-2: Stopping CUL mode after 10 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
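
For anyone who wants to pull these events out of a larger /var/log/messages, here is a minimal Python sketch. It assumes the CUL messages always follow the layout of the two lines quoted above; the regular expressions and the field names (`ts`, `host`, `cpu` and so on) are my own labels, not anything defined by Check Point.

```python
import re

# Header shared by both quoted messages: timestamp, host, kernel instance, message code.
# The group names are illustrative labels chosen for this sketch.
CUL_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d \d{4}) (?P<host>\S+) kernel: "
    r"\[(?P<instance>[^\]]+)\];(?P<code>CLUS-\d+-\d+): "
    r"(?P<action>Starting|Stopping) CUL mode"
)
# Only the "Starting" message carries a measured CPU value and a member number.
CPU_RE = re.compile(r"CPU usage \((?P<cpu>\d+)%\) on the remote member (?P<member>\d+)")
THRESHOLD_RE = re.compile(r"configured threshold \((?P<threshold>\d+)%\)")


def parse_cul_line(line):
    """Return a dict describing a CUL start/stop message, or None for other lines."""
    header = CUL_RE.match(line)
    if header is None:
        return None
    event = header.groupdict()
    cpu = CPU_RE.search(line)
    if cpu:
        event["cpu"] = int(cpu.group("cpu"))
        event["member"] = int(cpu.group("member"))
    threshold = THRESHOLD_RE.search(line)
    if threshold:
        event["threshold"] = int(threshold.group("threshold"))
    return event


# The two lines from this post, used as sample input.
SAMPLE_LINES = [
    "Jun 10 17:10:46 2019 cpfw-msk-2 kernel: [fw4_1];CLUS-220201-2: Starting CUL mode "
    "because CPU usage (81%) on the remote member 1 increased above the configured "
    "threshold (80%).",
    "Jun 10 17:10:56 2019 cpfw-msk-2 kernel: [fw4_1];CLUS-120202-2: Stopping CUL mode "
    "after 10 sec (short CUL timeout), because no member reported CPU usage above the "
    "configured threshold (80%) during the last 10 sec.",
]

for sample in SAMPLE_LINES:
    print(parse_cul_line(sample))
```

Counting the "Starting" events per day and per member would show how often the threshold is crossed and on which member the load was reported.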

 

I have read some CheckMates articles related to these messages; however, I wonder whether these messages mean a failover. And what is the possible cause?

Meanwhile, the Monitoring blade does not show any high peaks on either cluster member: the first screenshot is the active member, the second is the standby.

 

6 Replies

Re: standby cluster member fails randomly

The message does not indicate a failover. In fact, the opposite: the CUL feature freezes the ClusterXL (CLX) state during high CPU utilisation in order to avoid a failover.

 

Something is going on with the CPU; beyond that, you need to look further.


Re: standby cluster member fails randomly

Hi,

Are you getting these messages when you install policy?

Please filter by type Control in SmartLog and check the description for any hints around that time.

Capture.PNG


Re: standby cluster member fails randomly

CUL == Cluster Under Load
Looking at cpview history around the error message times might give some insights.

Re: standby cluster member fails randomly

Agree with the others, you need to identify why CPU load is so high on the standby; the CUL is just a symptom of your problem and not the cause.  cpview in history mode (-t) and the sar command can be helpful.  If you can identify in which "space" the excessive CPU is being consumed (us/sy/ni/si/hi) that will help guide where to look next.
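
To illustrate what "which space" means, here is a rough Python sketch that computes the us/sy/ni/si/hi split from two samples of the first `cpu` line of /proc/stat. It assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names and the sample numbers are made up for the example, and this is not how cpview or sar work internally.

```python
# Short labels in the order the Linux kernel writes them on the "cpu" line of /proc/stat:
# user, nice, system, idle, iowait, irq (hi), softirq (si), steal.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_breakdown(before, after):
    """Percentage of CPU time spent in each space between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    delta = {name: second[name] - first[name] for name in first if name in second}
    total = sum(delta.values())
    if total <= 0:
        return {name: 0.0 for name in delta}
    return {name: 100.0 * ticks / total for name, ticks in delta.items()}


# Hypothetical samples taken a few seconds apart (invented numbers).
BEFORE = "cpu 100 0 50 800 10 5 35 0"
AFTER = "cpu 200 0 100 1500 20 10 170 0"

for name, pct in cpu_breakdown(BEFORE, AFTER).items():
    print(f"{name}: {pct:.1f}%")
```

A large si or hi share usually points at traffic and interrupt handling, while a large us share points at a user-space process; on the gateway itself, sar and cpview report the same kind of split without any scripting.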

Any dynamic routing being used on this gateway cluster?  There are a few known causes of high CPU on the standby when that feature is in use, see sk95966 and sk105863 for more details.

 


Re: standby cluster member fails randomly

Thank you colleagues!

Timothy, I guess those limitations are not applicable to my case, because the version is R80.20.

Nevertheless, the weird thing is that the Monitoring blade shows me no peaks at those moments.

Can the cpview history give me more information, and how far back can I drill down in this history (a day, a week or more)?

 

 

Thanks in advance.
