Maarten_Sjouw
Champion

ClusterXL and new interface

Last week we had some problems with a customer setup. First we ran into an issue where someone did a "Get Interfaces with Topology" and was able to OK the topology without a Sync interface set.

The new interface was still down because the switch ports were not enabled yet. Although the new interface (a bond with 2 members) was set up correctly, the config change was not pushed to the gateway until the interfaces were enabled. When checking with "cphaprob -a if", the new interface simply did not get added. Only after the interfaces were enabled and we pushed policy again (during this time we also reset the Sync interface in the topology) did the interfaces finally show up.

So far I can understand that the sync problem could have caused the other issue.

However, the next step was that, due to time constraints, the customer decided to do a rollback. The guy who controlled the switch ports just shut them down without telling us, so we saw the standby member change state to down. We then removed the VIP from the bond interface and pushed policy. You would expect the interface to be removed from the list in "cphaprob -a if", but no: it remained there and the VIP was not removed...

Only removing the bond interface brought the state back to active/standby. We first tried removing the member interfaces, which did not change anything, and then tried to turn the state off on the bond, but you cannot; you can only delete it.

Regards, Maarten
11 Replies
Kaspars_Zibarts
Employee

Very strange. I had some fun stuff with bonds today, although on VSX, R80.10. I added two new members to an existing bond and all seemed OK:

show bonding group showed all

cat /proc/net/bonding/bondx showed all 4 up

ifconfig showed counters increasing on all 4

Except

cphaconf show_bonding bondx

Still showed only 2 original members.

A box restart fixed it. The problem appeared the same way on both cluster gateways.

Not too sure if it's related, but bond and cpha seem to be the common theme :)
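
Editor's sketch: the mismatch described above (the kernel reporting four members up while ClusterXL lists only two) can be checked mechanically. The Python below is a minimal sketch, not a Check Point tool. It parses the standard Linux /proc/net/bonding/<bond> text and compares it with a member list typed in by hand from the "cphaconf show_bonding" output, since no assumption is made here about that command's output format. The function names, the sample text, and the interface names eth1 to eth4 are all hypothetical.

```python
def parse_bond_members(proc_text):
    """Return {member_name: mii_status} from /proc/net/bonding/<bond> text."""
    members = {}
    current = None
    for line in proc_text.splitlines():
        line = line.strip()
        if line.startswith("Slave Interface:"):
            # Start of a per-member section.
            current = line.split(":", 1)[1].strip()
            members[current] = None
        elif (line.startswith("MII Status:") and current is not None
              and members[current] is None):
            # First MII Status line after a member header belongs to that member;
            # the bond-level MII Status (before any member) is ignored.
            members[current] = line.split(":", 1)[1].strip()
    return members


def missing_from_cluster(proc_text, cluster_members):
    """Members the kernel has in the bond that the cluster list lacks."""
    kernel_members = parse_bond_members(proc_text)
    return sorted(set(kernel_members) - set(cluster_members))


# Made-up sample in the kernel's bonding format; interface names are hypothetical.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth1
MII Status: up

Slave Interface: eth2
MII Status: up

Slave Interface: eth3
MII Status: up

Slave Interface: eth4
MII Status: up
"""

if __name__ == "__main__":
    # Member names as read manually from "cphaconf show_bonding" (assumed input).
    seen_by_cluster = ["eth1", "eth2"]
    print(parse_bond_members(SAMPLE))
    print(missing_from_cluster(SAMPLE, seen_by_cluster))
```

On a real gateway the text would come from reading /proc/net/bonding/bondX instead of SAMPLE. A non-empty result would indicate the same symptom reported in this post.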

Daniel_Taney
Advisor

Did you try cphastop / cphastart after changing the bonding group topology? I seem to remember having to do this to get ClusterXL to "acknowledge" that the number of members had changed. Although... I suppose a full reboot would accomplish the same thing :)

R80 CCSA / CCSE
Kaspars_Zibarts
Employee

Nah, since it was 4am I wanted to save time :) but you're probably right. A clustering process restart would have done the trick. I just seem to remember that a similar exercise on another VSX cluster with the same HW worked just fine. Weird.

Daniel_Taney
Advisor

Oh... well, at 4am, when in doubt just reboot it! :)

R80 CCSA / CCSE
Kaspars_Zibarts
Employee

Interestingly enough, I ran a test this morning and the only way to solve it was to reboot! Neither cphastop/cphastart nor cpstop/cpstart helped. SR raised.

Ilya_Yusupov
Employee

Which version are you talking about?

When you say "remove the VIP", I'm guessing you mean changing the interface to a non-monitored interface, right?

Maarten_Sjouw
Champion

This is on an R80.10 build 423 (no JHF) gateway cluster with an R80.10 JHF 103 Management setup.

Indeed, I removed the VIP and changed the setting to private network in SmartConsole.

Regards, Maarten
Ilya_Yusupov
Employee

What do you mean by removing the VIP and changing to private? Doesn't changing it to private remove the VIP?

Maarten_Sjouw
Champion

Is that part really relevant? I wanted to make sure the IP was removed from the interface and then changed the setting to Private.

In the process, after I found it was not removed from the "cphaprob -a if" output, I tried removing the members from the bond interface, but the interface still remained in the cphaprob list.

Regards, Maarten
Timothy_Hall
Legend

Yep, I have seen this "stuck interface" behavior before with ClusterXL, usually involving a failure to recognize the Sync interface after a change, even though it has been properly configured on the SmartConsole cluster object and in Gaia OS. See this old but relevant SK for the increasingly brutal steps to shake it loose; I usually end up going to at least Step 4:

sk39047: Output of 'cphaprob -a if' command shows 'Sync will not function since there aren't any syn...

These newer SKs describe similar "stuck" behavior with the Sync interface:

sk114113: Changing ClusterXL Sync Interface to an existing non-monitored interface fails

sk114212: Synchronization in cluster is broken after moving the "1st Sync" Network Objective to an i...

--

CheckMates Break Out Sessions Speaker

CPX 2019 Las Vegas & Vienna - Tuesday@13:30

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Maarten_Sjouw
Champion

Yeah Tim, I found these SKs as well while we were in the first stage of this change, where the sync was removed from the topology due to the "Get Interfaces with Topology". The part I described here came after everything worked fine: sync was OK, the state was active/standby, and all interfaces that should be there were there.

After that they decided to roll back and shut down the new interface. This is where the cluster went into the active-attention/down state, and that is the starting point of this adventure. Only removing the bond interface resolved the issue.

Regards, Maarten
