Hi all!
We had a full crash on both VSX gateways of a 2 node VSX cluster.
Versions (SMS R81.20 / VSX gateways R80.40)
We managed to restore the first node using vsx_util reconfigure, ending up with a working cluster with a single working node.
We tried to restore the second one using the same method, but just after the vsx_util reconfigure command finished (so the gateway was set into VSX mode and received the configuration and virtual systems via push), many communications started to fail.
Checking the first node's status with "cphaprob state" showed that 2 out of 4 virtual devices were in Standby mode. So supposedly the 2nd node, which was still in the process of being restored (there were tasks still to be done: reboot, configure the license, configure local.arp, enable dynamic objects, install policies...), had 2 virtual devices in Active state.
We tried "cphaprob state" and "clusterxl_admin down" to force a failover, but these commands did not show any output, and nothing changed in the status of the virtual devices on the 1st node. Disconnecting interfaces on the 2nd node didn't change anything either.
Shutting down this 2nd node made the first node become active for all virtual systems.
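For reference, a minimal sketch of how a per-VS state check and forced failover is normally done on VSX: clusterXL_admin and cphaprob act on the current VS context, so you switch into that context first (VSID 1 below is only an illustrative value).
vsenv 1                # switch to the context of the affected virtual system
cphaprob state         # show this member's cluster state for that VS
clusterXL_admin down   # set this member administratively down for the VS, forcing a failover
clusterXL_admin up     # bring it back once the peer is stable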
- Why did the node become active for the virtual devices while it was still not fully restored?
- Is there any way to avoid this behaviour?
- What would be the correct procedure?
Thanks all!
Is the reason for the initial crash understood and resolved?
Which JHF is the cluster using? (R80.40 is EOL)
The VSX recovery procedure is outlined in sk101515.
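For context, a rough outline of how that recovery is typically driven; the exact steps and prompts are in sk101515 and depend on the environment, so this is only a sketch:
# On the Security Management Server (Expert mode); vsx_util is interactive
vsx_util reconfigure    # select the VSX cluster object and the member to rebuild;
                        # the VSX configuration and virtual systems are then pushed to it
# On the rebuilt member afterwards
cphaprob state          # check cluster membership per VS
fw vsx stat -v          # check that all virtual systems are loaded and have a policy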
Sounds like you did everything right. Other than checking the logs to see if there is anything obvious, I would definitely open a TAC case to see if they can provide a reason.
Andy
Is the reason for the initial crash understood and resolved? Which JHF is the cluster using?
It happened while trying to recover a deleted virtual system, so presumably it will not crash again unless we repeat the same actions.
The VSX recovery procedure is outlined in sk101515.
Thanks, that was just what I needed.
Just one doubt: step 10 shows how to prevent the cluster member from becoming active before the reconfigure ends, by using cphastop, cphaconf... After rebooting, does it require any command to make it become active again?
Thanks!
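As a rough sketch of the post-reboot check, assuming the member was frozen with clusterXL_admin's persistent flag rather than only cphastop (after a plain cphastop, cluster services normally come back on their own at boot):
cphaprob state          # see whether this member rejoined the cluster after the reboot
clusterXL_admin up -p   # only needed if the member was previously set down with the persistent (-p) flag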
Thanks Chris! Sorry for the late response.
The sk101515 procedure went flawlessly.
I would only add another step: after the vsx_util reconfigure and before the reboot, I would set the virtual devices down (with the persistence flag) using clusterXL_admin.
Then reboot, connect the cables, perform the pushes, install the required policies and do the remaining configuration (e.g. for certain configurations, internet access is a prerequisite).
After that, the different virtual devices can be brought back and checked one by one via clusterXL_admin, as sketched below.
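A minimal sketch of that extra step, assuming clusterXL_admin's persistent (-p) flag and using VSID 2 as an illustrative virtual system; repeat per VS on the member being restored:
vsenv 2                  # switch to the context of the virtual system being frozen
clusterXL_admin down -p  # set this member administratively down for the VS, persisting across the reboot
# ... reboot, cabling, policy installation, remaining configuration ...
vsenv 2
clusterXL_admin up -p    # clear the persistent admin-down state once the VS is ready
cphaprob state           # confirm the member rejoins as Standby (or Active) as expected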
Thanks again!
Do you have VSLS in use? I can imagine that as soon as the second member's VS goes to Standby, VSLS kicks in and makes sure the load is distributed 1:1 between the nodes.
Did you check the output of "cphaprob -a if"? Maybe one member had a different number of required interfaces, which in many cases triggers an unplanned failover.
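For reference, a quick set of standard checks to run on both members (per VS context) and compare:
cphaprob state    # member and per-VS cluster state
cphaprob -a if    # required interfaces and their status for the current VS context
fw vsx stat -v    # list of virtual systems and whether each has its policy installed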
My cluster is Active/Standby, not VSLS.
With the sk101515 procedure that Chris_Atkinson mentioned, everything went OK.
I would suggest opening an SR# with TAC to get the issue resolved! Are you aware that version R80.40 has been out of support since April 2024?
Sorry for the late response.
Yes, I was aware, but first of all I wanted to restore the cluster without adding extra factors.
Thanks
Always configure Active Up before an operation like this.
Your cluster config was probably Primary Up.
In the end, sk101515 gives the steps needed to avoid the problem of the second node becoming active unexpectedly during the reinstall.
Thanks