Both cluster nodes stuck in INIT mode for 1 hour
Hi
Last night I added proper licenses to a relatively freshly installed Check Point R81.10 (Take 110) cluster (running on VMware).
I rebooted the nodes to get them aligned in terms of licensed cores.
After the reboot, both nodes came up as INIT.
I understand that they are both waiting for each other to determine who should be ACTIVE and STANDBY.
Each node could reach all interfaces of the other node.
The clock on the VMware host was correct.
Security settings on the VMware interfaces/VLANs were set correctly.
I rebooted them several times.
I also tried pushing policy, and so on.
Almost exactly 1 hour after I left the nodes alone, they came back and formed the cluster successfully.
Is there a logical explanation, and how can I troubleshoot this?
Accepted Solutions
Hi
cphaprob stat showed both nodes to be in INIT state.
This was shown on both nodes, and the other node was never listed in the output of cphaprob stat when I was checking.
As mentioned, I ran cpstop;cpstart numerous times.
When I ran clusterXL_admin down/up, it threw an error message saying that it could not bring ClusterXL down (or up).
cphaprob -a if showed all interfaces to be there.
It was "green" on the SYNC interface plus the highest and lowest VLAN interfaces.
The cphaprob list command showed:
Device Name: Interface Active Check
Current state: problem (non-blocking)
And I understand that this is the normal behaviour when the nodes are not sure about the state of the other (who should be ACTIVE and who should be STANDBY).
I suspected it to be "something" on VMware, so we moved both nodes to the same host, but to no avail.
We actually tried moving them to three different hosts.
Currently the cluster is functioning, and I don't have the liberty to experiment with it right now.
I will for sure run your commands if I run into this problem again (I'm pretty sure I will).
Much appreciated.
Sounds like one of the clustering processes has/had an issue. Is the problem still there? If I had that scenario, I would first try cphastop; cphastart to see if it helps.
Otherwise, I would gather the output of the commands below on both members to make sure it all matches:
cphaprob roles
cphaprob state
cphaprob mvc
cphaprob syncstat
cphaprob -a if
cphaprob -i list
cphaprob -l list
OK, my bad: obviously some of them won't match, because one member would be master and the other backup, but at least the ones for interfaces have to match, for sure.
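A rough sketch of how those outputs could be collected on each member, so the two files can be compared side by side. The function name collect_cluster_info and the output file path are made up for illustration. Only the cphaprob subcommands listed above are used, and nothing here is an official Check Point script.

```shell
#!/bin/sh
# Hypothetical helper: runs the cphaprob checks listed above and writes
# their output, each under a labelled header, into the file given as $1.
collect_cluster_info() {
    out_file="$1"
    for args in "roles" "state" "mvc" "syncstat" "-a if" "-i list" "-l list"; do
        echo "=== cphaprob $args ==="
        # $args is left unquoted on purpose so "-a if" splits into two words.
        cphaprob $args 2>&1
        echo
    done > "$out_file"
}

# Example call (run in expert mode on each cluster member):
#   collect_cluster_info "/var/tmp/cphaprob_$(hostname).txt"
```

After copying both files to one machine, a plain diff of the two shows where the members disagree. The role and state lines are expected to differ, while the interface sections should line up.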
Best,
Andy
Hey Kenny,
Yeah, that also makes total sense to me. I had that issue in the lab once ages ago, probably before R77, and it turned out to be a problem on VMware; I can't recall exactly what now.
Glad it's sorted out, mate.
Best,
Andy
