Moving VS on VSX with 3 members doesn't work so well

Marco32 · ‎2024-03-29

Hi there,

I'm testing a VSX cluster with 2 members in my lab, and it works well. I added a 3rd member, and before any VSLS movement everything seemed fine.

This is the same situation on all the members:

To move the VSs, I tried different options:

2. Distribute all Virtual Systems so that each cluster member is equally loaded
3. Set all VSes active on one member
4. Manually set priority and weight

The result is not good: on some members a VS is Lost, on others it is Standby or Down. Now I have:

Member1:

Member2:

1 | 10 | ACTIVE | STANDBY | DOWN

Member3:

Traffic on the sync interface is flowing to and from all 3 nodes.

The only fix I've found is restarting the VS on the member where its state is not the one I expect.
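Spotting which member holds an unexpected state can be scripted once the states are written down. A minimal sketch, assuming VSLS mode where each VS should be Active on one member, Standby on one member and Backup on the rest. The input is typed in by hand and modeled on the `1 | 10 | ACTIVE | STANDBY | DOWN` row above, read as VS ID, weight, then the state on members 1, 2 and 3 (that column order is an assumption); it does not parse real `cphaprob` output.

```python
from collections import Counter

# States considered healthy in VSLS mode (assumption: 1 ACTIVE, 1 STANDBY, rest BACKUP).
HEALTHY = {"ACTIVE", "STANDBY", "BACKUP"}


def find_anomalies(states_by_vs):
    """Return (vs_id, problem) tuples for every VS whose member states look wrong.

    states_by_vs maps a VS ID to a list of states, one per cluster member, in
    member order. This input format is hypothetical, not a Check Point format.
    """
    problems = []
    for vs_id, states in sorted(states_by_vs.items()):
        counts = Counter(s.upper() for s in states)
        # Any state outside ACTIVE/STANDBY/BACKUP (e.g. DOWN, LOST) is flagged.
        # Note that LOST can be transient while a VS restarts into Backup.
        for member, state in enumerate(states, start=1):
            if state.upper() not in HEALTHY:
                problems.append((vs_id, f"member{member} is {state.upper()}"))
        if counts["ACTIVE"] != 1:
            problems.append((vs_id, f"{counts['ACTIVE']} ACTIVE members (expected 1)"))
        if counts["STANDBY"] != 1:
            problems.append((vs_id, f"{counts['STANDBY']} STANDBY members (expected 1)"))
    return problems


if __name__ == "__main__":
    # VS 1 as posted for Member2: ACTIVE | STANDBY | DOWN
    observed = {1: ["ACTIVE", "STANDBY", "DOWN"]}
    for vs_id, problem in find_anomalies(observed):
        print(f"VS {vs_id}: {problem}")
```

For the posted row this reports only that member 3 is DOWN, i.e. the member whose VS would need the restart.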

Sometimes installing policy on the VS also brings its state back to what it should be.

What should I check? Why does this happen?

R81.10 JHF130 in a VMware environment

Thanks

M.

Duane_Toler · ‎2024-04-01

I'd suggest running "cphaprob -a if" on each VSX gateway and verify you have the same number of interfaces on each host. For any VLANs, make sure you have the same VLANs up and reachable on all hosts/interfaces as you'd expect. Looks like CCP is not being passed on host 2 for VS ID 2.

Check spanning tree and the allowed VLAN list on the VLAN trunk ports. For any port-channels, make sure all member interfaces of the port-channel are up, too.
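With three members and many VLANs, comparing the interface lists by eye gets error-prone. A minimal sketch, assuming you have already copied the interface names reported by `cphaprob -a if` on each member into Python sets; the member and interface names below are hypothetical, and the command output itself is not parsed.

```python
def missing_interfaces(ifaces_by_member):
    """Map each interface name to the sorted list of members that lack it.

    ifaces_by_member: {member_name: set_of_interface_names}, collected by hand
    from each member. Only interfaces missing on at least one member appear.
    """
    # Union of every interface seen on any member.
    all_ifaces = set().union(*ifaces_by_member.values())
    result = {}
    for iface in sorted(all_ifaces):
        lacking = sorted(m for m, ifs in ifaces_by_member.items() if iface not in ifs)
        if lacking:
            result[iface] = lacking
    return result


if __name__ == "__main__":
    # Hypothetical example: VLAN 200 was never added on member2.
    members = {
        "member1": {"eth1", "eth2.100", "eth2.200"},
        "member2": {"eth1", "eth2.100"},
        "member3": {"eth1", "eth2.100", "eth2.200"},
    }
    print(missing_interfaces(members))
```

An empty result only means the names match; it says nothing about whether the VLANs are actually allowed and forwarding on the switch side.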

--
Ansible for Check Point APIs series: https://www.youtube.com/@EdgeCaseScenario and Substack

emmap · ‎2024-04-01

It is normal behaviour for a VS moving to a 'backup' state to become 'lost' for a short time as it restarts itself as part of that process. This is also why you don't even see it on member 2, as it's not running. If it's not coming back by itself, that restart process needs troubleshooting. Check things like resource availability on the VM (does it have enough CPU/RAM/disk?), crash dumps, fwk.elg log file for that VS, things like that.

Marco32 · ‎2024-04-09

Hi emmap,

I increased RAM and CPU, and it seems better in my lab. The active VS changes quickly, but the standby and backup VSs need a few minutes to become stable.

Moving this to production in a stretched-VLAN environment, I see that the Sync network needs a maximum latency of 100 ms and no more than 5% packet loss.

In this case, is the latency meant as end-to-end (one-way delay) from source to destination, or as the round-trip time from source to destination and back? Does this delay affect only the sync connection (state tables etc.), or also the CCP packets that check interface reachability?

emmap · ‎2024-04-10

Sync and CCP are both on UDP 8116, so I think that would mean it's one-way latency, but I can't 100% confirm that. If you need a definitive answer, it might be best to ask TAC.
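Until the one-way vs. round-trip question is confirmed, one cautious approach is to test measurements against both readings of the 100 ms / 5% limits. A minimal sketch, assuming RTT samples come from something like ping between the Sync interfaces; it estimates one-way delay as half the worst RTT, which only holds for a symmetric path, and the sample values are made up.

```python
# Limits quoted above for the Sync network.
MAX_LATENCY_MS = 100.0
MAX_LOSS_PCT = 5.0


def check_sync_link(rtt_samples_ms, sent, received):
    """Evaluate ping results against the Sync limits under both readings of "latency".

    rtt_samples_ms: round-trip times (ms) of the replies that came back.
    One-way delay is estimated as RTT / 2, which assumes a symmetric path.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    loss_pct = 100.0 * (sent - received) / sent
    # With no replies at all, treat latency as unbounded.
    worst_rtt = max(rtt_samples_ms) if rtt_samples_ms else float("inf")
    return {
        "loss_pct": loss_pct,
        "loss_ok": loss_pct <= MAX_LOSS_PCT,
        "ok_if_limit_is_rtt": worst_rtt <= MAX_LATENCY_MS,
        "ok_if_limit_is_one_way": worst_rtt / 2 <= MAX_LATENCY_MS,
    }


if __name__ == "__main__":
    # Made-up sample: 20 pings sent, 19 replies, one slow 140 ms reply.
    print(check_sync_link([60.0] * 18 + [140.0], sent=20, received=19))
```

A link that passes only the one-way check is exactly the case worth confirming with TAC before going to production.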

