Hi all!
We had a full crash of both gateways in a two-node VSX cluster.
Versions: SMS R81.20, VSX gateways R80.40.
We managed to restore the first node using vsx_util reconfigure. That left us with a working cluster that had a single working node.
We tried to restore the second node the same way. Right after the vsx_util reconfigure command finished, many connections started to fail. At that point the gateway had been switched into VSX mode and had received the configuration and the virtual systems via push.
Running "cphaprob state" on the first node showed that 2 of the 4 virtual devices were in Standby. So presumably the second node had taken the Active state for those 2 virtual devices, even though it was still being restored. The remaining tasks on it were: reboot, configure the license, configure local.arp, enable dynamic objects, install policies...
We tried running "cphaprob state" and "clusterXL_admin down" to force a failover. Neither command produced any output, and the state of the virtual devices on the first node did not change. Disconnecting the interfaces on the second node didn't change anything either.
Shutting down the second node made the first node active for all virtual systems again.
- Why did the second node become active for those virtual devices while it was still not fully restored?
- Is there any way to avoid this behaviour?
- What would be the correct procedure?
Thanks all!