Matlu
Advisor

Cluster HA broke for one of my contexts in VSX

Hello, everyone.

I don't have much experience with VSX and MDS environments, so I hope you can help clear up a doubt.

I currently have a problem with one of my contexts: its cluster HA has been lost.

[Attached screenshot: CD1.png]

If the “Standby” member of the context is lost, what is the most practical way to recover it?

Should I still be able to access the member shown as “Lost” from the CLI?

Can I find out the root cause of why my context's cluster broke?

Thanks for your comments.

Duane_Toler
Advisor

First step: log in to that second node and run "cphaprob stat" in VS0. Then switch to the affected VS (vsenv 3) and run "cphaprob stat" again for that VS. Start there; it should give you some hints.
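A minimal sketch of that first step as an Expert-mode shell function. It assumes vsenv can be called from inside a function in your Expert shell (not verified here) and defaults to VSID 3, the VS from this thread; compare_vs_state is a hypothetical helper name, not a Check Point command.

```shell
# compare_vs_state: hypothetical helper, not a Check Point command.
# Prints "cphaprob stat" for VS0 and then for one VS, per the first step above.
# Assumption: vsenv can be called from inside a shell function on this member.
compare_vs_state() {
  local vsid="${1:-3}"   # default: VS 3, the affected VS in this thread
  echo "=== VS0 ==="
  vsenv 0 >/dev/null || return 1
  cphaprob stat
  echo "=== VS $vsid ==="
  vsenv "$vsid" >/dev/null || return 1
  cphaprob stat
  # Note: the shell stays in the context of VS $vsid after this returns.
}
```

Run "compare_vs_state 3" on each member and compare the outputs side by side.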

--
Ansible for Check Point APIs series: https://www.youtube.com/@EdgeCaseScenario and Substack
the_rock
Legend

I don't know whether all of the commands below work on VSX, but they're worth comparing between the members.

Andy

cphaprob roles
cphaprob state
cphaprob -i list
cphaprob -l list
cphaprob syncstat
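To run the list above in one go and paste the result into the thread, a sketch like the following labels each query. It runs in whatever context the shell is currently in, so run "vsenv 3" first to target the affected VS. The function name is hypothetical, and as noted above, not every query is guaranteed to work on VSX.

```shell
# run_cluster_checks: hypothetical helper that runs the cphaprob queries listed
# above in the current context (use "vsenv <VSID>" first to target one VS).
run_cluster_checks() {
  local query
  for query in "roles" "state" "-i list" "-l list" "syncstat"; do
    echo "=== cphaprob $query ==="
    # $query is unquoted on purpose so "-i list" becomes two arguments.
    # shellcheck disable=SC2086
    cphaprob $query 2>&1
  done
}
```

Running it on both members and diffing the two outputs is a quick way to spot which query differs.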

Lesley
Mentor

Are you able to push policy to the problem VS, and to VS0?

Is the issue still there if you restart the VS itself?

  1. Connect to the command line on the VSX Gateway.
  2. Go to the context of the Virtual System:
    • In Gaia Clish, run:
      set virtual-system <VSID>
    • In the Expert mode, run:
      vsenv <VSID>
  3. Stop the Virtual System:
    cpstop
  4. Start the Virtual System:
    cpstart
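The Expert-mode variant of those steps can be wrapped in a small function, sketched below. It assumes vsenv can be called from inside a function in your shell, and it refuses VSID 0 as a precaution, since the aim here is to restart only the affected VS. restart_vs is a hypothetical name, not a Check Point command.

```shell
# restart_vs: hypothetical helper following the Expert-mode steps above:
# enter the VS context with vsenv, then run cpstop and cpstart there.
# Assumption: vsenv can be called from inside a shell function on this member.
restart_vs() {
  local vsid="$1"
  case "$vsid" in
    ''|*[!0-9]*) echo "usage: restart_vs <VSID>" >&2; return 2 ;;
  esac
  # Precaution: this helper is meant for a single VS, so it refuses VS0.
  if [ "$vsid" -eq 0 ]; then
    echo "restart_vs: refusing to run cpstop in VS0" >&2
    return 2
  fi
  vsenv "$vsid" >/dev/null || return 1
  cpstop
  cpstart   # run even if cpstop reported an error, so the VS comes back up
}
```

For this thread that would be "restart_vs 3", followed by "cphaprob state" to see whether the VS rejoins the cluster.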
the_rock
Legend

That sounds very logical to me.

Andy

Matlu
Advisor

Hello,

Initially the problem affected only one VS (VSID 3), but a few hours later the whole box (VS0) rebooted unexpectedly, for no apparent reason.

The device is up again, but VS 3 is still not available to the cluster.

Are there any files that might indicate the root cause of why an instance like this crashes?

Regards.

genisis__
Mentor

What version are you running, and which Jumbo Hotfix is installed?
What files are in /var/log/crash and /var/log/dump?
If there are files there whose timestamps match when the node rebooted, copy them off and run cpinfo as soon as possible.

Raise a TAC case to investigate these files (if there are any).

How long had the VS been stable? If it had been fine, what changed in the environment?
As the others have said, start with the cphaprob commands to determine the status (I suspect this will give you a clue).
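To match files in /var/log/crash and /var/log/dump against the reboot time, the sketch below lists files modified after a given timestamp, oldest first. It relies on GNU find's -newermt test and -printf action; Gaia is Linux-based, but GNU find being available there is an assumption. The helper name is hypothetical, and the directories can be overridden.

```shell
# list_crash_files_since: hypothetical helper. Lists regular files modified
# after the given time in the crash/dump directories mentioned above.
# Assumption: GNU find (for -newermt and -printf).
list_crash_files_since() {
  local since="$1" d
  shift
  # Default to the two directories mentioned in this thread.
  if [ "$#" -eq 0 ]; then
    set -- /var/log/crash /var/log/dump
  fi
  for d in "$@"; do
    [ -d "$d" ] || continue   # skip directories missing on this member
    find "$d" -type f -newermt "$since" -printf '%TY-%Tm-%Td %TH:%TM  %p\n'
  done | sort                 # lines start with the timestamp, so oldest first
}
```

For example, with the reboot window reported later in the thread: list_crash_files_since "2025-03-18 15:00"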

 

Matlu
Advisor

Hello.
Version: R81.20
Jumbo Hotfix: Take 82

[Expert@FWCP-AC:3]# cphaprob state

HA module not started.

[Expert@FWCP-AC:3]# last reboot
reboot system boot 3.10.0-1160.15.2 Tue Mar 18 15:30 - 17:52 (02:22)
reboot system boot 3.10.0-1160.15.2 Tue Mar 18 15:22 - 17:52 (02:30)
reboot system boot 3.10.0-1160.15.2 Tue Mar 18 15:15 - 17:52 (02:36)
reboot system boot 3.10.0-1160.15.2 Tue Mar 18 15:06 - 17:52 (02:45)

wtmp begins Tue Mar 18 12:32:46 2025

Greetings
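The "last reboot" output above shows four boots between 15:06 and 15:30, roughly every 7 to 9 minutes, which looks more like a reboot loop than a single unexpected reboot. A small sketch that pulls out the boot start times, assuming the column layout shown above (boot_times is a hypothetical name):

```shell
# boot_times: hypothetical helper. Reads "last reboot" output on stdin and
# prints the start time of each boot record.
# Assumes the layout shown in the thread:
#   reboot system boot <kernel> <Day> <Mon> <DD> <HH:MM> - ...
boot_times() {
  awk '$1 == "reboot" && $2 == "system" && $3 == "boot" { print $5, $6, $7, $8 }'
}
```

"last reboot | boot_times" prints one line per boot, newest first; on the output above that is four lines, from "Tue Mar 18 15:30" down to "Tue Mar 18 15:06". Those times are the ones to compare against file timestamps in /var/log/crash and /var/log/dump.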

the_rock
Legend

Can you check that clustering is enabled in cpconfig? If it is, try running cphastop followed by cphastart.

Andy
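A sketch of the second half of that suggestion: restart clustering in the current context (run "vsenv 3" first), then check that "cphaprob state" no longer prints the "HA module not started." message seen earlier in the thread. Enabling clustering in cpconfig is an interactive menu step and is left out; the function name is hypothetical.

```shell
# restart_ha_and_check: hypothetical helper. Runs cphastop and cphastart in the
# current context, then checks the cphaprob state output for the message
# "HA module not started." reported earlier in this thread.
restart_ha_and_check() {
  local out
  cphastop                 # may fail if HA is already down; carry on regardless
  cphastart || return 1
  out=$(cphaprob state 2>&1)
  printf '%s\n' "$out"
  if printf '%s\n' "$out" | grep -q 'HA module not started'; then
    echo "HA is still not running in this context" >&2
    return 1
  fi
  return 0
}
```

A non-zero return means HA is still down in that VS, which is a good point to collect cpinfo and involve TAC.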

