Hi,
I have a couple of questions about VSNext.
We recently tried to deploy an ElasticXL cluster with two 19100 gateways, using VSNext, to replace two ClusterXL HA clusters. Two VSs were created:
VGW External
VGW Internal
It is a simple architecture:
Internet < Bond10 > VS_EXT (19100 and MGMT's Default GW) < Bond11 > Core < Bond12 > VS_INT < Bond13 > Internal Networks
The activity and maintenance window went the way many of you have described here on CheckMates:
1. We configured everything on Member 1 (interfaces, bonds, routes, DNS, NTP, etc.) on each VS as required (VS0; only one VSW, which was the 500 for mgmt; VS_EXT; and VS_INT). Pretty simple, as VSNext promised. We had 2 firewalls in very little time compared with traditional VSX.
2. We left Member 2 with just a clean install.
Then we...
3. Cloned the current policy sets.
4. Connected the M1 mgmt port.
5. Created the 3 objects (SMO/VS0, VS_EXT, VS_INT)
6. Replaced the old objects with the new ones wherever needed.
7. Installed policy, everything OK.
8. Connected only Member 1 to the network, like this:
Internet < Bond10 > VS_EXT < Bond11 > Core > Normal ClusterXL
We wanted to test everything only for VS_EXT and only with the SMO member before cloning the second one. Basically, we disconnected every fiber from the old External ClusterXL members and connected them to Bond10 and Bond11 on ElasticXL Member 1. AFAIK, traffic only needed to cross Bond10, then the firewall, then Bond11. The other VS was irrelevant in this window.
Everything went from bad to worse. Traffic passed through the firewall, but latency was huge and connectivity was very intermittent: some traffic worked, some didn't. It was a big difference compared with normal ClusterXL HA. Resources were OK: 10% maximum CPU usage, 8 GB of 64 GB RAM used, and 95% of traffic being accelerated. We had to roll back.
I have some questions about this:
1. Could having the other interfaces disconnected lead to this problem? I don't think so, but who knows.
2. Could not having the second member joined and cloned cause traffic issues? I've been thinking about this one a lot. Every admin guide, video and forum post says: install policy, clone, test. Can you confirm whether this could lead to this kind of problem?
3. The admin guide says we need 4 interfaces: MGMT (connected), Sync (disconnected), External (connected, topology external), and Internal (connected, topology based on routes). Could the disconnected Sync interface cause something like this?
4. Do you think this step-by-step procedure is OK?
For us, the activity should have been almost as simple as disconnecting the old platform and connecting ElasticXL, and that's it. Now I think we are missing something. I'd appreciate every comment. Thanks in advance.