Hello,
I'm playing with ElasticXL in R82, which is great, but it looks like I encountered a little "bug", or better to say a "specific situation that was probably not taken into consideration by R&D".
I have 2 appliances and I want to check a couple of ElasticXL scenarios with them:
1) both in one Site (Load Balancing) = works flawlessly
2) 1st in Site 1, 2nd in Site 2 (High Availability) = works flawlessly
These two scenarios are quite obvious, so there should be no issue ... and there isn't, great!
However ... I wouldn't be me if I didn't check a 3rd scenario ...
3) 1st in Site 1, 2nd in Site 2 ... but not using the Sync interface (copper), using a different interface (fiber) for sync instead.
Why such a scenario?
Let's consider that we have 2 DCs separated geographically.
In DC#1 I want to have (up to 3) appliances that will act as LB, and in DC#2 I want to have (up to 3) appliances that will also act as LB ... but will be part of the same cluster.
In this scenario DC#1 (Site 1) will be active, and DC#2 (Site 2) will be standby.
In my opinion this is a really revolutionary change that we get thanks to ElasticXL - one cluster that is LB and HA at the same time!
This 3rd scenario will not work with the regular Sync interface (copper, limited to ~100 m), so here is what I did:
a) the ElasticXL cluster was created (1st appliance)
b) I added eth1-01 (fiber) to the Sync bonding group and changed the configuration of this group so that eth1-01 is the primary interface
c) I powered up appliance2 (factory configuration)
d) because of the factory configuration there is no bonding group, etc., so I created a bonding group with the eth1-01 and Sync interfaces (rough clish sketch after this list)
e) I addressed this "bond1" interface as 192.0.2.254 ... so that the appliance could be discovered by exl_detectiond
f) after that I connected the two appliances via eth1-01 using a fiber cable
g) appliance1 discovered appliance2 and marked it as an available member with state "request_to_join"
So far ... so good ... but ...
h) after I executed "add cluster member ...... site 2" in gclish, bad things started to happen
i) the state changed to "joining_cluster" and that's it ... nothing more ... it got stuck
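For reference, steps d) and e) on appliance2 were more or less this in clish (a rough sketch only - the bonding group ID, the exact "primary" syntax and the /24 mask are assumptions on my side, adjust as needed):
add bonding group 1
add bonding group 1 interface eth1-01
add bonding group 1 interface Sync
set bonding group 1 mode active-backup
set bonding group 1 primary eth1-01
set interface bond1 ipv4-address 192.0.2.254 mask-length 24
set interface bond1 state on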
After a couple of minutes of waiting I began troubleshooting and discovered this on appliance2:
You can see above what the problem is ...
The Sync interface (copper) was addressed as 192.0.2.15 ... not the bond1 interface.
Because of that the joining process cannot finish, since this copper interface has no link.
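For anyone who wants to check the same thing on their box, from expert mode you can look at something like this (plain Linux commands, nothing ElasticXL-specific; I'm assuming the bond really shows up as bond1):
ip addr | grep -B 2 "inet 192.0.2"
cat /proc/net/bonding/bond1
The first command shows which interface actually carries the 192.0.2.x sync address, the second one shows the bond mode, the primary slave and the link state of eth1-01 / Sync.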
Question:
Is it possible to use this 3rd scenario when NOT using copper as Sync?
For sure it will work if I have all the appliances in the same DC, connect them via Sync (copper), create the cluster (both Sites), then add the fiber interfaces to the Sync bonding group ... and only after that move the appliances that should be in Site 2 (Standby) to DC#2.
I think you can see what the problem with that approach is 🙂
Just wondering if it is possible to do it when the appliances are already physically separated?
Best
m.