Reevsie147
Contributor

Posting this in case anyone else runs into the issue I was having. I think I've finally worked it out, so I'm dropping it here in case it saves someone a big headache!

Symptoms were:

  • After a cluster member reboot, the cluster wouldn't converge for approximately an hour, with the rebooted member sitting in INIT state.
  • Once the cluster finally did converge, operation was flawless and we could perform manual failovers with clusterXL_admin up/down without any issues.
  • If a member was ever rebooted (or suffered a power issue), we would be back to waiting an hour for the cluster to come back online.

I triple-checked the ESX port group settings, enabled MAC learning on the DVS switches, and verified that NTP was working correctly, but still no luck: the cluster would still take approximately an hour to come up.

Then, while looking at the monitoring for the cluster on the Gateways and Cluster tab, I happened to notice that a member that had just booted reported its "uptime" as a negative figure (around -3600 seconds). I rechecked the date/time and time zones on the cluster members, and they were all correct.

I then noticed that the time on my ESX host was an hour out. I fixed it, rebooted the two guest cluster members, and everything began working perfectly. I'm still a bit stumped as to why this should make any difference, as GAiA was reporting the time correctly and I thought the guest was abstracted from the host hypervisor, but it works!
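
One possible explanation, which is my assumption and not something confirmed in this thread: a VM typically takes its initial clock from the host at power-on, so the boot timestamp could have been recorded from the wrong host time before NTP corrected the guest clock. The Python sketch below only illustrates that arithmetic. The function name and all values are hypothetical, and it assumes the host clock was running fast.

```python
from datetime import datetime, timedelta, timezone


def reported_uptime_seconds(boot_stamp: datetime, now: datetime) -> float:
    """Uptime computed as a plain difference between 'now' and the boot stamp."""
    return (now - boot_stamp).total_seconds()


# Hypothetical values: the true time at boot and a host clock one hour fast.
true_boot = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
host_skew = timedelta(hours=1)

# Assumption: the boot timestamp is taken from the (wrong) host-supplied clock...
boot_stamp = true_boot + host_skew
# ...and 30 seconds later NTP has already pulled the guest back to the true time.
now_after_ntp = true_boot + timedelta(seconds=30)

print(reported_uptime_seconds(boot_stamp, now_after_ntp))  # -3570.0
```

Under that assumption the figure would only turn positive about an hour after boot, which would line up with the roughly one-hour wait for the cluster to converge.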

TLDR: Save yourself a headache and make sure the time on your ESX hosts is correct if you're attempting to use ClusterXL on CloudGuard IaaS.
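
As a rough way to sanity-check this, here is a minimal Python sketch that compares a host clock reading against a trusted reference and flags skew. How you obtain the two readings is up to you. The function names, the example readings, and the 5-second tolerance are all hypothetical choices of mine, not Check Point or VMware recommendations.

```python
from datetime import datetime, timezone


def clock_skew_seconds(host_time: datetime, reference_time: datetime) -> float:
    """Positive result means the host clock is ahead of the reference."""
    return (host_time - reference_time).total_seconds()


def skew_is_acceptable(host_time: datetime, reference_time: datetime,
                       tolerance_seconds: float = 5.0) -> bool:
    """True when the absolute skew is within the (hypothetical) tolerance."""
    return abs(clock_skew_seconds(host_time, reference_time)) <= tolerance_seconds


# Hypothetical readings: the host shows 13:00 while the reference says 12:00.
reference = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
host = datetime(2024, 1, 1, 13, 0, 0, tzinfo=timezone.utc)

print(clock_skew_seconds(host, reference))   # 3600.0
print(skew_is_acceptable(host, reference))   # False
```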
