Hongyu_Chen
Participant

Standby management server disconnected alert.

We have two management servers, mgmt-1 and mgmt-2, both running Gaia R81.10. mgmt-1 is active and mgmt-2 is standby. mgmt-1 is configured to send an email alert when it thinks mgmt-2 is disconnected.

Recently we have been receiving many more mgmt-2 disconnected alerts than before, but when we log in to mgmt-2 it seems to be fine.

I don't know how mgmt-1 detects the status of mgmt-2; maybe it uses ping or some other method. We are investigating the root cause of the alerts.

My question is: while we investigate, is there a workaround to increase the probe interval from mgmt-1 to mgmt-2 and so reduce the number of alerts? They are very annoying. I don't want to simply disable the function, because mgmt-1 is also managing the security gateways. I just want mgmt-1 to send fewer mgmt-2 disconnected alerts.

Thank you!

9 Replies
Chris_Atkinson
Employee

Could you please share the JHF (jumbo) level of the machines? Is it recent?

How's the latency between the machines, and is there any packet loss?

What is the spec/load of the respective management machines?
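
One way to put numbers on the latency and packet-loss questions above is a short probe run from a host on the same network as mgmt-1. The following is a minimal Python sketch, not a Check Point tool. It times TCP connections rather than ICMP ping, and the host name `mgmt-2` and port `443` in the usage comment are placeholders (assumptions) to be replaced with whatever address and open TCP port apply in your environment. It does not reproduce the check that mgmt-1 itself performs, which this thread leaves unknown.

```python
import socket
import time
from typing import Dict, List, Optional


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if it failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers timeouts, refused connections and name-resolution errors.
        return None


def summarize(samples: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize probe results: attempts, failures, loss percentage, latency."""
    ok = [s for s in samples if s is not None]
    lost = len(samples) - len(ok)
    return {
        "sent": len(samples),
        "lost": lost,
        "loss_pct": 100.0 * lost / len(samples) if samples else 0.0,
        "avg_ms": sum(ok) / len(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


def run_probe(host: str, port: int, count: int = 20,
              interval: float = 1.0) -> Dict[str, Optional[float]]:
    """Probe host:port `count` times, pausing `interval` seconds between tries."""
    samples = []
    for _ in range(count):
        samples.append(tcp_probe(host, port))
        time.sleep(interval)
    return summarize(samples)


# Example (placeholder host and port, adjust to your environment):
# print(run_probe("mgmt-2", 443))
```

Running it over a longer window, and comparing the timestamps of failed probes with the timestamps of the alert emails, may help show whether the alerts line up with real connectivity gaps or with periods of high load.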

CCSM R77/R80/ELITE
Hongyu_Chen
Participant

Hi Chris,

JHF is R81.10 Take 66.

We did sometimes see high CPU. Please see the screenshot below.

mgmt-2.JPG

But there is no packet loss.

mgmt-2-traffic.JPG

Our priority is to adjust the probing interval, if it is adjustable, because the alert emails are pouring into the mailbox.

Chris_Atkinson
Employee

You'd need to check the probing interval with TAC, as I'm unsure. Otherwise, you can definitely set the sync timeout per sk176165.

Hongyu_Chen
Participant

Thanks Chris. sk176165 seems to apply to MDS, while ours is not an MDS.

_Val_
Admin

@Hongyu_Chen It does not matter; it is also applicable to SMS.

the_rock
Legend

Just as a test, though I don't have mgmt HA, I ran it on a single SMS and it worked fine, no issues. I was also able to push policy afterwards, so I would say it's most likely safe to do in your case as well. Sadly, I don't have access to my R81.10 lab mgmt, as it's being used for something else at the moment, but I did it on R81.20, and I'm positive it makes no difference.

the_rock
Legend

All valid points by Chris. Also, any relevant logs you can provide about this?

Hongyu_Chen
Participant

What logs do you want to see?

the_rock
Legend

Never mind, my bad, I used the wrong term. You said there are some alerts, but I assumed there might be some logs related to the issue.

