Hey Jon,
Just wondering whether this issue ever recurred for you and, if so, whether you managed to resolve it?
I'm facing a very similar issue: whenever I reboot a cluster member, it comes up in INIT state and can stay that way for over an hour before it goes active:
Cluster Mode: High Availability (Primary Up) with IGMP Membership
ID         Unique Address  Assigned Load  State   Name
1 (local)  169.254.99.1    0%             INIT    fw-01
2          169.254.99.2    100%           ACTIVE  fw-02
The other member shows that connectivity is lost during this time:
Cluster Mode: High Availability (Primary Up) with IGMP Membership
ID         Unique Address  Assigned Load  State   Name
1          169.254.99.1    0%             LOST    fw-01
2 (local)  169.254.99.2    100%           ACTIVE  fw-02
And if I reboot them both, they'll just sit there for ages unable to see each other:
Cluster Mode: High Availability (Primary Up) with IGMP Membership
ID         Unique Address  Assigned Load  State   Name
1 (local)  169.254.99.1    0%             INIT    fw-01

Cluster Mode: High Availability (Primary Up) with IGMP Membership
ID         Unique Address  Assigned Load  State   Name
2 (local)  169.254.99.2    0%             INIT    fw-02
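In case it helps anyone compare notes on how long a member sits in INIT, here is a minimal Python sketch for pulling the member rows out of state output like the above. It only parses pasted text and does not run anything on the gateway. It assumes the row layout shown in this post (ID, optional "(local)", address, load, state, name); the names parse_members and local_state are my own and are not part of any Check Point tooling.

```python
import re

# One member row, e.g. "1 (local) 169.254.99.1 0% INIT fw-01".
# Assumption: rows follow the layout pasted in this post.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s*(?P<local>\(local\))?\s+"
    r"(?P<addr>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<load>\d+)%\s+"
    r"(?P<state>\S+)\s+"
    r"(?P<name>\S+)\s*$"
)


def parse_members(output):
    """Return one dict per member row found in the pasted state output."""
    members = []
    for line in output.splitlines():
        m = ROW.match(line)
        if m:  # header and "Cluster Mode" lines do not match and are skipped
            members.append({
                "id": int(m.group("id")),
                "local": m.group("local") is not None,
                "address": m.group("addr"),
                "load": int(m.group("load")),
                "state": m.group("state"),
                "name": m.group("name"),
            })
    return members


def local_state(output):
    """State of the row marked (local), or None if there is no such row."""
    for member in parse_members(output):
        if member["local"]:
            return member["state"]
    return None


sample = """\
Cluster Mode: High Availability (Primary Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 (local) 169.254.99.1 0% INIT fw-01
2 169.254.99.2 100% ACTIVE fw-02
"""

print(local_state(sample))
```

Feeding it a timestamped capture taken every few seconds after a reboot would show exactly when the local member leaves INIT, which might make it easier to line up with whatever else is happening on the ESX side.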
I've checked connectivity on all interfaces between the members and all seems fine.
The strangest thing is that once the cluster has formed, if I gracefully take the members offline and bring them back with "clusterXL_admin down" and "clusterXL_admin up", everything works perfectly. I've failed over like this many times with no issues, but if a host is rebooted or suffers a power failure, we are back to the original issue.
We are also running on ESX (with Promiscuous Mode, MAC Address Changes and Forged Transmits enabled on each connected port group).
Any advice you or anybody else could share would be greatly appreciated.