marcyn
Collaborator

R82 member unable to join cluster

Hello,

I'm playing with ElasticXL in R82, which is great, but it looks like I have run into a small "bug", or rather a "specific situation that was probably not taken into consideration by R&D".

I have 2 appliances and I want to check a couple of ElasticXL scenarios with them:
1) both in one Site (LoadBalancing) = works flawlessly
2) 1st in Site1, 2nd in Site2 (HighAvailability) = works flawlessly
These two scenarios are the obvious ones, so there should be no issue ... and there is none, great!

 

However ... I wouldn't be me if I did not check a 3rd scenario...
3) 1st in Site1, 2nd in Site2 ... but using a different interface (fiber) for sync instead of the Sync interface (copper).

Why such a scenario?
Let's consider that we have 2 DCs separated geographically.
In DC#1 I want to have (up to 3) appliances that will act as LB, and in DC#2 I want to have (up to 3) appliances that will also act as LB ... but will be part of the same cluster.
In this scenario DC#1 (Site 1) will be active, and DC#2 (Site 2) will be standby.
In my opinion this is a really revolutionary change that we get thanks to ElasticXL: one cluster that is LB and HA at the same time!

This 3rd scenario will not work with the regular Sync interface (copper, limited to 100 m), so here is what I did:
a) I created the ElasticXL cluster (1st appliance)
b) I added eth1-01 (fiber) to the Sync bonding group and changed the group's configuration so that eth1-01 is the primary
c) I powered up appliance2 (factory configuration)
d) with the factory configuration there is no bonding group, etc., so I created a bonding group with the eth1-01 and Sync interfaces
e) I assigned 192.0.2.254 to this "bond1" interface so that it could be discovered by exl_detectiond
f) after that I connected the two appliances via eth1-01 using a fiber cable
g) appliance1 discovered appliance2 and marked it as an available member with state "request_to_join"

So far ... so good ... but ...

h) after I executed "add cluster member ...... site 2" in gclish, bad things started to happen
i) the state changed to "joining_cluster" and that's it ... nothing more ... it got stuck

After a couple of minutes of waiting I began troubleshooting and discovered this on appliance2:

img1.png

You can see above what the problem is ...
The Sync interface (copper) was assigned 192.0.2.15 ... not bond1.
Because of that the join process cannot finish, since the copper Sync interface has no link.
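
To make the failure mode concrete, here is a minimal Python sketch of the check described above. It is only an illustration and does not reflect how ElasticXL is implemented: the `Interface` class and both helper functions are hypothetical, and the address of bond1 after the join attempt is not stated in this post, so it is left as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    """Hypothetical, simplified view of one interface on appliance2."""
    name: str
    address: Optional[str]  # IPv4 address, or None if not shown / unaddressed
    link_up: bool


def holder_of(interfaces: List[Interface], address: str) -> Optional[Interface]:
    """Return the interface that carries the given address, if any."""
    for iface in interfaces:
        if iface.address == address:
            return iface
    return None


def sync_reachable(interfaces: List[Interface], member_sync_ip: str) -> bool:
    """The join can only progress if the sync address sits on an interface with link."""
    holder = holder_of(interfaces, member_sync_ip)
    return holder is not None and holder.link_up


# State seen on appliance2: 192.0.2.15 landed on the copper Sync port (no link).
# Assumption: bond1's address at this point is unknown, so None is a placeholder.
observed = [
    Interface("Sync", "192.0.2.15", link_up=False),
    Interface("bond1", None, link_up=True),
]

# State that would let the join finish: the same address on bond1 (fiber link up).
wanted = [
    Interface("Sync", None, link_up=False),
    Interface("bond1", "192.0.2.15", link_up=True),
]

print(sync_reachable(observed, "192.0.2.15"))
print(sync_reachable(wanted, "192.0.2.15"))
```

The first call reports that the sync address is unreachable because it sits on the port without link; the second shows the same address being reachable once it is placed on bond1.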

 

Question:
Is it possible to use this 3rd scenario when NOT using copper for Sync?

It will surely work if I have all of the appliances in the same DC, connect them via Sync (copper), create the cluster (both Sites), then add the fiber interfaces to the Sync bonding group ... and only after that move the appliances that should be in Site 2 (Standby) to DC#2.

I think you see what the problem is here 🙂

Just wondering whether it is possible to do this when the appliances are already separated?

Best
m.

6 Replies
emmap
Employee

Typically we would expect the Sync interfaces to run through a switching layer, as you can't directly connect more than 2 cluster members together and EXL is designed for more than 2 cluster members. Hence the available onboard sync interfaces are fine as the inter-DC links are not directly connected to the appliances. 

marcyn
Collaborator

Yes, I agree that typically it will work as you described (a couple of appliances and a switch, with sync via the copper port).

In my 3rd scenario the on-board Sync interface(s) will not be fine in all cases (lower models have only a copper Sync interface, only higher models have fiber).
If the appliances are geographically separated (for example by a couple of kilometers or more), copper will not be enough ... and a fiber connection will be necessary.
Hence, on models with copper Sync, we need to add fiber interfaces to the Sync bonding group and use them for sync between the appliances.

I'm thinking about such a scenario:
DC#1 = 1-3 appliances (active)
DC#2 = 1-3 appliances (standby)
Here I can use a direct fiber connection ... or a fiber switch as well.

I'm just thinking about using ElasticXL as LB+HA at the same time...
LB in each Site, HA between two Sites (two DCs).


BTW
In my lab I have a 23500 model, which has a copper Sync interface.
I'm trying to "move it" to fiber.
To my surprise, after I added a fiber interface to the Sync bonding group, connected the fiber cable, and then disconnected Sync (copper) ... it doesn't work ... I will play with this a little more.

ShaiF
Employee

Hi Marcyn,

Have you fetched the topology and installed policy after adding the slave, before disconnecting eth1-Sync?

Regards,

Shai.

marcyn
Collaborator

Ah yes ... I may have forgotten about that ... I made so many changes that it is possible (and probably this is the case).
I will try that and see if it helps - it should 🙂

But my question still remains unanswered: is this 3rd scenario possible without using the on-board Sync interface (in my case copper)? Or is the on-board Sync interface "necessary" during the "first sync"?
Of course, on higher/newer models with a fiber Sync interface there will be no issue with it.

ShaiF
Employee

It will be possible once we fix the flow. You will create the bond, put 192.0.2.254 on it, and it will join without the physical Sync interface needing to be connected.

 

ShaiF
Employee

Hi Marcyn,
Indeed, such a scenario was not tested. We will push to fix it in GA.
Regards,

Shai.

