CheckMates : Products : Quantum : Security Gateways
My context HA environment broke in VSX
Hello, everyone.
I don't have much experience with VSX and MDS environments, so hopefully you can clear up my doubts.
I currently have a problem in one of my contexts: its cluster has lost HA.
If the “Standby” member of the context is lost, what is the most practical way to recover it?
Should I still be able to reach the member that appears as “Lost” through the CLI?
Is there a way to find the root cause of why my context's cluster broke?
Thanks for your comments.
First step: log in to that second node and run "cphaprob stat" in VS0. Then switch to the affected VS (vsenv 3) and run "cphaprob stat" again in that context. Start there; it should give you some hints.
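A minimal sketch of the same check as a shell function, assuming `vsenv` and `cphaprob` can be called from the shell that runs it (they can in an interactive Expert-mode session on a VSX member; a non-interactive script may need `vsenv` loaded first). The function name `show_vs_cluster_state` is hypothetical.

```shell
# Hypothetical helper: print the ClusterXL state for VS0, then for each VS ID given.
# Assumes vsenv and cphaprob are available in this shell (Expert mode on a VSX member).
show_vs_cluster_state() {
  local vs
  for vs in 0 "$@"; do
    echo "=== VS $vs ==="
    vsenv "$vs"          # switch to the Virtual System context
    cphaprob state       # cluster state as seen from that context
  done
  vsenv 0                # return to the VS0 context when done
}

# Usage on the second node, for the affected VS 3:
#   show_vs_cluster_state 3
```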
I don't know whether all of the commands below work on VSX, but they are worth comparing between the members.
Andy
cphaprob roles
cphaprob state
cphaprob -i list
cphaprob -l list
cphaprob syncstat
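To compare the members side by side, the commands above can be captured into one file per member and diffed. A minimal sketch; the function name `collect_cluster_diag` and the default path under /var/tmp are hypothetical, and on VSX it reports on whatever context the shell is currently in.

```shell
# Hypothetical helper: run the cphaprob commands listed above and save their
# output to one file, so the files from both members can be diffed.
# Runs in the current context (on VSX, run vsenv <VSID> first).
collect_cluster_diag() {
  local out args
  out=${1:-/var/tmp/cluster_diag_$(hostname).txt}
  {
    for args in "roles" "state" "-i list" "-l list" "syncstat"; do
      echo "### cphaprob $args"
      # $args is left unquoted on purpose so "-i list" becomes two arguments
      cphaprob $args
    done
  } > "$out" 2>&1
  echo "$out"
}

# Usage: run on each member, copy one file across, then compare:
#   collect_cluster_diag /var/tmp/diag_member1.txt
#   diff /var/tmp/diag_member1.txt /var/tmp/diag_member2.txt
```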
Are you able to push policy to the problem VS, and to VS0?
Is the issue still there if you restart the VS itself?
- Connect to the command line on the VSX Gateway.
- Go to the context of the Virtual System:
  - In Gaia Clish, run:
    set virtual-system <VSID>
  - In Expert mode, run:
    vsenv <VSID>
- Stop the Virtual System:
  cpstop
- Start the Virtual System:
  cpstart
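The Expert-mode path of these steps can be sketched as one function, assuming `vsenv`, `cpstop`, `cpstart` and `cphaprob` are callable from the current shell. It adds a final `cphaprob state` to show whether the VS rejoined the cluster. `restart_vs` is a hypothetical name, not a Check Point command.

```shell
# Hypothetical helper: restart one Virtual System and show its cluster state.
# Assumes vsenv, cpstop, cpstart and cphaprob are available (Expert mode, VSX).
# This interrupts traffic through the VS: run it in a maintenance window.
restart_vs() {
  local vsid=$1
  if [ -z "$vsid" ]; then
    echo "usage: restart_vs <VSID>" >&2
    return 1
  fi
  vsenv "$vsid" || return 1   # enter the VS context; stop if that fails
  cpstop
  cpstart
  cphaprob state              # did the VS rejoin the cluster?
}
```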
That sounds very logical to me.
Andy
Hello,
Initially the problem affected only one VS (ID 3), but a few hours later the whole box (VS0) unexpectedly rebooted for no apparent reason.
The device is back up, but VS 3 (vsenv 3) is still not part of the cluster.
Are there any files that could point to the root cause of why an instance like this crashes?
Regards.
What version are you running and what Jumbo is installed?
What files are in /var/log/crash and /var/log/dump?
If you see files there that match the time the node rebooted, copy them off the box and run cpinfo as soon as possible.
Raise a TAC case to have these files investigated (if there are any).
How long has the VS been stable? If it's been fine until now, what changed in the environment?
As the others have said, start with the cphaprob commands to determine the status (I suspect this will give you a clue).
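To match files in /var/log/crash and /var/log/dump against the reboot time, a minimal sketch that lists the files changed in the last N minutes (default 24 hours) with their timestamps. The function name `recent_crash_files` is hypothetical; it only uses standard `find` and `ls`.

```shell
# Hypothetical helper: list files in the crash/dump directories modified in the
# last N minutes (default 1440 = 24 hours), to compare against "last reboot".
recent_crash_files() {
  local minutes=${1:-1440}
  if [ "$#" -gt 0 ]; then shift; fi
  if [ "$#" -eq 0 ]; then
    set -- /var/log/crash /var/log/dump
  fi
  # -mmin -N: modified less than N minutes ago; errors for missing dirs are hidden
  find "$@" -type f -mmin "-$minutes" -exec ls -l {} + 2>/dev/null
}

# Usage: recent_crash_files 360    # files from the last 6 hours
```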
Hello.
I am running R81.20 with JHF Take 82.
[Expert@FWCP-AC:3]# cphaprob state
HA module not started.
[Expert@FWCP-AC:3]# last reboot
reboot system boot 3.10.0-1160.15.2 Tue Mar 18 15:30 - 17:52 (02:22)
reboot system boot 3.10.0-1160.15.2 Tue Mar 18 15:22 - 17:52 (02:30)
reboot system boot 3.10.0-1160.15.2 Tue Mar 18 15:15 - 17:52 (02:36)
reboot system boot 3.10.0-1160.15.2 Tue Mar 18 15:06 - 17:52 (02:45)
wtmp begins Tue Mar 18 12:32:46 2025
Greetings
Can you check in cpconfig that clustering is enabled? If it is, try cphastop; cphastart.
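The cphastop/cphastart part can be made conditional on the "HA module not started." message shown above. A minimal sketch, assuming `cphaprob`, `cphastop` and `cphastart` are available in the current context (vsenv 3 on VSX); `restart_ha_if_stopped` is a hypothetical name. The cpconfig check stays manual, since cpconfig is an interactive menu.

```shell
# Hypothetical helper: restart ClusterXL only when cphaprob reports that the
# HA module is not running (the message seen in this thread), then show the state.
# Assumes cphaprob, cphastop and cphastart are available in the current context.
restart_ha_if_stopped() {
  if cphaprob state 2>&1 | grep -q "HA module not started"; then
    cphastop
    cphastart
  fi
  cphaprob state
}
```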
Andy
Do you already have an SR# open with Check Point TAC? I fear getting help here won't be easy...
Hello,
Where can I review the logs for cluster events, for example when the cluster breaks, one of the members freezes, or there is an unexpected failover?
Is there a way to review only the events from the last 24 hours?
Could you share the syntax, please?
Thanks for the help.
Hey bro, it should all be in /var/log/messages.
Example:
grep -i DOWN /var/log/messages*
You can replace the word DOWN with anything else you want to search for, e.g. clusterXL, freeze, etc.
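For the "last 24 hours" part of the question, a minimal sketch that searches several keywords at once and keeps only lines stamped with today's or yesterday's date (so roughly the last 24 to 48 hours). It assumes each line starts with a "Mon DD" date, as in the Gaia output posted in this thread, and relies on GNU `date -d`. `recent_cluster_events` is a hypothetical name.

```shell
# Hypothetical helper: case-insensitive search for several cluster keywords,
# limited to lines dated today or yesterday.
# Assumes each log line starts with "Mon DD" (e.g. "Mar 18 14:27:50 2025 ...").
recent_cluster_events() {
  local today yesterday
  today=$(date '+%b %e')                    # e.g. "Mar 18"; single-digit days are space-padded
  yesterday=$(date -d yesterday '+%b %e')   # GNU date syntax
  # -h drops the file name prefix so the ^date anchor below works
  grep -hiE 'down|freeze|clusterxl' "$@" | grep -E "^($today|$yesterday) "
}

# Usage: recent_cluster_events /var/log/messages*
```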
Buddy,
The command you gave me works very well, but is there a way to print only the last 100 lines?
Right now it prints everything.
grep -i DOWN /var/log/messages* > /var/log/clusterissue.txt
cd /var/log
tail -50 clusterissue.txt
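The same result is possible without a temporary file by piping into `tail -n 100`. One catch: the `messages*` glob sorts by name (messages, messages.1, messages.10, messages.2, ...), so the tail is not necessarily the newest events. A minimal sketch that orders the files oldest first by modification time with `ls -tr`; `tail_cluster_events` is a hypothetical name.

```shell
# Hypothetical helper: print the last N matching lines (default 100) from the
# live and rotated messages files, without writing a temporary file.
# ls -tr lists files oldest first by modification time, so older rotated files
# come before newer ones and the tail really holds the most recent matches.
tail_cluster_events() {
  local pattern=$1 count=${2:-100} dir=${3:-/var/log}
  # File names are unquoted on purpose: the messages files contain no spaces.
  grep -hi -- "$pattern" $(ls -tr "$dir"/messages*) | tail -n "$count"
}

# Usage: tail_cluster_events down 100
```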
Example from my lab:
Andy
***********************
[Expert@CP-FW-01:0]# grep -i DOWN /var/log/messages* > /var/log/clusterissue.txt; cd /var/log; tail -50 clusterissue.txt
/var/log/messages.10:Mar 18 14:27:50 2025 CP-FW-01 xpand[6967]: admin localhost t +installer:update_status_message Contacting the Download Center
/var/log/messages.10:Mar 18 14:27:51 2025 CP-FW-01 xpand[6967]: admin localhost t +installer:update_status_message Received 148 results from the Download Center
/var/log/messages.3:Mar 19 02:30:10 2025 CP-FW-01 xpand[6967]: admin localhost t +installer:update_status_message Contacting the Download Center
/var/log/messages.3:Mar 19 02:30:12 2025 CP-FW-01 xpand[6967]: admin localhost t +installer:update_status_message Received 148 results from the Download Center
/var/log/messages.6:Mar 18 20:29:01 2025 CP-FW-01 xpand[6967]: admin localhost t +installer:update_status_message Contacting the Download Center
/var/log/messages.6:Mar 18 20:29:01 2025 CP-FW-01 xpand[6967]: admin localhost t +installer:update_status_message Received 148 results from the Download Center
[Expert@CP-FW-01:0]#
Consider running an HCP check on your system as well.
Before you restart your VS, arrange a maintenance window for the outage!
\m/_(>_<)_\m/
Hi.
I rebooted the standby box of my VSX cluster, but when it came back up, the cluster for VS ID 3 was still broken.
I entered VS ID 3 on the standby member and ran ‘cphastart’; the member immediately appeared in the cluster, but as DOWN.
I then ran ‘clusterXL_admin up’, but the member's status does not change and it stays DOWN.
It is very strange.
Is there any other way to recover this member so that it correctly forms the cluster for VS 3?
Greetings.
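When a member stays DOWN after `clusterXL_admin up`, the likely reason is that another critical device (pnote) is still reporting a problem, since `clusterXL_admin up` only clears the administrative one. `cphaprob -l list` (from the command list earlier in the thread, run inside vsenv 3) shows all of them. A minimal sketch that prints only the devices in a problem state, assuming the `Device Name:` / `Current state:` layout of recent versions; check it against your own output first. `problem_pnotes` is a hypothetical name.

```shell
# Hypothetical helper: print the critical devices (pnotes) that cphaprob reports
# in a "problem" state. Assumes "Device Name:" / "Current state:" lines;
# adjust the patterns if your version prints the list differently.
problem_pnotes() {
  cphaprob -l list | awk -F': *' '
    /^[ \t]*Device Name:/ { name = $2 }   # remember the current device
    /^[ \t]*Current state:/ && tolower($2) ~ /problem/ { print name }
  '
}

# Usage (inside the VS context): vsenv 3; problem_pnotes
```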
Hey bro,
I know this may sound weird, but a few times when I had this problem with customers, we had to reboot BOTH boxes to get it working.
Andy
