Commands not executing in Management Server R80.10
Hello All,
For the last 2 days, we have been facing a very strange issue every morning: commands are not getting executed on the management server. CPU and memory utilization are normal.
Rebooting the management server fixes the issue, but it comes back the next morning.
We have collected a few outputs during the issue, as per the TAC suggestion, and are attaching them herewith.
We have logged a ticket with Check Point TAC, but they have not been able to fix the issue either.
Kindly help with any troubleshooting we can perform to fix this issue.
Which commands do not get executed? What is shown in the logs from the time of the issue?
Commands such as cpview, cpstat, cpinfo, reboot, etc. are not getting executed.
[Expert@DSPMGMT:0]# tail -f /var/log/messages
Jan 24 08:58:29 2020 DSPMGMT PAM-tacplus[1819]: auth failed: 2
Jan 24 09:21:24 2020 DSPMGMT snmpd: Error: Timeout waiting for response from database server.
Jan 24 09:22:04 2020 DSPMGMT monitord[3873]: Error: Timeout waiting for response from database server.
Jan 24 09:22:24 2020 DSPMGMT snmpd: Error: Timeout waiting for response from database server.
Jan 24 09:38:01 2020 DSPMGMT PAM-tacplus[4844]: auth failed: 2
Jan 24 09:58:43 2020 DSPMGMT PAM-tacplus[6059]: auth failed: 2
Jan 24 10:49:33 2020 DSPMGMT PAM-tacplus[8861]: auth failed: 2
Jan 24 10:54:43 2020 DSPMGMT PAM-tacplus[9166]: auth failed: 2
Jan 24 10:54:49 2020 DSPMGMT PAM-tacplus[9166]: auth failed: 2
Jan 24 10:56:38 2020 DSPMGMT PAM-tacplus[9325]: auth failed: 2
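The excerpt above mixes two unrelated kinds of messages: TACACS+ authentication failures and database timeouts from snmpd and monitord. A minimal sketch in Python, assuming the log lines follow exactly the format shown above and that a copy of /var/log/messages is analysed on a machine with Python 3 (the names `parse_line`, `summarize`, and `SAMPLE` are illustrative, not part of any Check Point tooling), groups the messages by process and reports when the first database timeout appeared:

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line format, taken from the excerpt in this thread:
#   Jan 24 09:22:04 2020 DSPMGMT monitord[3873]: Error: Timeout waiting ...
# The "[pid]" part is optional (snmpd logs without it).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"(?P<host>\S+) (?P<proc>[^\[:\s]+)(?:\[\d+\])?: (?P<msg>.*)$"
)

DB_TIMEOUT = "Timeout waiting for response from database server"

# Lines copied from the post above, used as sample input.
SAMPLE = """\
Jan 24 08:58:29 2020 DSPMGMT PAM-tacplus[1819]: auth failed: 2
Jan 24 09:21:24 2020 DSPMGMT snmpd: Error: Timeout waiting for response from database server.
Jan 24 09:22:04 2020 DSPMGMT monitord[3873]: Error: Timeout waiting for response from database server.
Jan 24 09:22:24 2020 DSPMGMT snmpd: Error: Timeout waiting for response from database server.
Jan 24 09:38:01 2020 DSPMGMT PAM-tacplus[4844]: auth failed: 2
Jan 24 09:58:43 2020 DSPMGMT PAM-tacplus[6059]: auth failed: 2
Jan 24 10:49:33 2020 DSPMGMT PAM-tacplus[8861]: auth failed: 2
Jan 24 10:54:43 2020 DSPMGMT PAM-tacplus[9166]: auth failed: 2
Jan 24 10:54:49 2020 DSPMGMT PAM-tacplus[9166]: auth failed: 2
Jan 24 10:56:38 2020 DSPMGMT PAM-tacplus[9325]: auth failed: 2
""".splitlines()


def parse_line(line):
    """Return (timestamp, process, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # %b expects English month abbreviations (C/English locale).
    ts = datetime.strptime(m.group("ts"), "%b %d %H:%M:%S %Y")
    return ts, m.group("proc"), m.group("msg")


def summarize(lines):
    """Count messages per (process, message) and find the first DB timeout."""
    counts = Counter()
    first_timeout = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines in a format this sketch does not know
        ts, proc, msg = parsed
        counts[(proc, msg)] += 1
        if first_timeout is None and DB_TIMEOUT in msg:
            first_timeout = ts
    return counts, first_timeout


if __name__ == "__main__":
    counts, first_timeout = summarize(SAMPLE)
    for (proc, msg), n in counts.most_common():
        print(f"{n:3d}  {proc}: {msg}")
    print("first database timeout:", first_timeout)
```

Separating the two groups makes it easier to see whether the auth failures are background noise from before the problem started, and it gives TAC an exact time for the first timeout.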
If you've opened a TAC case and provided the necessary details, it will make its way to them.
Good afternoon. Was there a resolution to this? We are having identical problems with a Smart-1 5050, R80.30. The only difference is the power cords must be reseated. A warm reboot or shutdown -r does not help. Thank you for any info you can provide. I do have a case open with TAC.
Can't say I have ever seen that before... what did TAC say?
TAC is still working on it. Trying to duplicate the problem with our configuration.
Just curious, as I like to approach every problem logically. So, when you say this happened 2 days ago, is there anything you can think of that may have changed on the mgmt server 2 or 3 days ago? Can you maybe check the audit logs to see if there is anything of interest from when this issue occurred? One thing that comes to my mind is GuiDBedit, but unless someone inadvertently made changes there, I guess it might not be relevant. Just to be on the safe side, I would try doing an "install database" on the server itself.
TAC has a valid idea... if they can import your config into their lab and try to fix it there, they can provide the solution.
Thanks for your interest. I don't recall saying it happened two days ago though - it started about 12 days ago and is very intermittent. We're about 14 hours total into troubleshooting: reinstalling from the R80.30 ISO (twice), patching to the latest hotfix, migrate export/import, etc. Push policy, all is good. Wait x amount of minutes/hours/days, then the same problem.
My gut says it's hardware sensor related - or maybe ILMI related because only reseating the power cables will bring it back to the point where the GAIA portal and the dashboard are useable again. But that's just my opinion. As soon as that database timeout message appears in /var/log/messages, that's it for the portal and dashboard.
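Since the database timeout message is described above as the first sign that the portal and dashboard are about to become unusable, one way to catch the exact moment is to watch the log for that string. A minimal sketch in Python, under stated assumptions: the function name `watch_for_marker` and its parameters are hypothetical, the file is polled rather than followed with inotify, log rotation is not handled, and a line caught mid-write may be seen in two pieces.

```python
import os
import time

# Message text as it appears in /var/log/messages in this thread.
TIMEOUT_MARKER = "Timeout waiting for response from database server"


def watch_for_marker(path, marker=TIMEOUT_MARKER, poll_seconds=5.0,
                     max_polls=None, from_start=False):
    """Poll a log file and return the first line that contains `marker`.

    By default only lines appended after the call starts are examined.
    Returns None if `max_polls` polls pass without a match; with
    max_polls=None the function keeps polling until a match appears.
    """
    with open(path, "r", errors="replace") as fh:
        if not from_start:
            fh.seek(0, os.SEEK_END)  # ignore what is already in the file
        polls = 0
        while max_polls is None or polls < max_polls:
            line = fh.readline()
            while line:  # drain everything written since the last poll
                if marker in line:
                    return line.rstrip("\n")
                line = fh.readline()
            polls += 1
            time.sleep(poll_seconds)
    return None


if __name__ == "__main__":
    # Blocks until the marker shows up, then prints the matching line.
    hit = watch_for_marker("/var/log/messages")
    print("database timeout seen:", hit)
```

Recording the matching line and its timestamp each time the problem recurs would show whether the failures follow a pattern, which may help when discussing a hardware cause with TAC.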
Sorry, my apologies. I read the original post, which said "since last 2 days"... that's what I wanted to respond to, but I replied to you instead, sorry about that. Though now that you've said all that, I would agree 100% with your assessment... did you ask TAC for an RMA? I can't see what else they can ask you to do, except send a replacement.
I forgot to add I've had practically zero problems like this. For roughly 14 months it's been rock solid with regular operational rule changes, IPS, other blade updates, VPN stuff, regular hotfix updates, etc. No real negative work stopping events like this for a long time.
Well, for such an expensive machine as a Smart-1 5050, it had better work way longer than 14 months 🙂
No worries. Agreed. Decision on RMA late tomorrow.
