entsupport
Explorer

Commands not executing in Management Server R80.10

Hello All,

For the last two days we have been facing a very strange issue every morning: commands are not getting executed on the management server. CPU and memory utilization are normal.

After rebooting the management server the issue goes away, but it comes back the next morning.

We have collected a few outputs during the issue, as suggested by TAC, and are attaching them herewith.

We have logged a ticket with Check Point TAC, but they have not been able to fix the issue either.

Kindly advise on any troubleshooting we can perform to fix this issue.

12 Replies
G_W_Albrecht
Legend

Which commands do not get executed? What is shown in the logs from the time of the issue?

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist
entsupport
Explorer

Commands such as cpview, cpstat, cpinfo, and reboot are not getting executed.

[Expert@DSPMGMT:0]# tail -f /var/log/messages
Jan 24 08:58:29 2020 DSPMGMT PAM-tacplus[1819]: auth failed: 2
Jan 24 09:21:24 2020 DSPMGMT snmpd: Error: Timeout waiting for response from database server.
Jan 24 09:22:04 2020 DSPMGMT monitord[3873]: Error: Timeout waiting for response from database server.
Jan 24 09:22:24 2020 DSPMGMT snmpd: Error: Timeout waiting for response from database server.
Jan 24 09:38:01 2020 DSPMGMT PAM-tacplus[4844]: auth failed: 2
Jan 24 09:58:43 2020 DSPMGMT PAM-tacplus[6059]: auth failed: 2
Jan 24 10:49:33 2020 DSPMGMT PAM-tacplus[8861]: auth failed: 2
Jan 24 10:54:43 2020 DSPMGMT PAM-tacplus[9166]: auth failed: 2
Jan 24 10:54:49 2020 DSPMGMT PAM-tacplus[9166]: auth failed: 2
Jan 24 10:56:38 2020 DSPMGMT PAM-tacplus[9325]: auth failed: 2
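
To see at a glance which daemons are reporting what, an excerpt like the one above can be grouped by process and message. This is only a rough sketch: the regular expression assumes every line has the same shape as the lines shown here, and `count_by_process` and `SAMPLE` are illustrative names, not part of any Check Point tool.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
#   "Jan 24 09:21:24 2020 DSPMGMT snmpd: Error: ..."
#   "Jan 24 08:58:29 2020 DSPMGMT PAM-tacplus[1819]: auth failed: 2"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"(?P<host>\S+) (?P<proc>[^\[:\s]+)(?:\[\d+\])?: (?P<msg>.*)$"
)

def count_by_process(lines):
    """Count (process, message) pairs; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("proc"), match.group("msg"))] += 1
    return counts

# First four lines of the excerpt above, used as sample input.
SAMPLE = """\
Jan 24 08:58:29 2020 DSPMGMT PAM-tacplus[1819]: auth failed: 2
Jan 24 09:21:24 2020 DSPMGMT snmpd: Error: Timeout waiting for response from database server.
Jan 24 09:22:04 2020 DSPMGMT monitord[3873]: Error: Timeout waiting for response from database server.
Jan 24 09:22:24 2020 DSPMGMT snmpd: Error: Timeout waiting for response from database server.
"""

if __name__ == "__main__":
    for (proc, msg), n in count_by_process(SAMPLE.splitlines()).most_common():
        print(f"{n:3d}  {proc}: {msg}")
```

On a live system the input would be the lines of /var/log/messages instead of `SAMPLE`.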

PhoneBoy
Admin

Someone from R&D will probably have to have a look at this.
If you've opened a TAC case and provided the necessary details, it will make its way to them.
Blake_Fithen
Participant

Good afternoon.  Was there a resolution to this?  We are having identical problems with a Smart-1 5050, R80.30. The only difference is the power cords must be reseated.  A warm reboot or shutdown -r does not help.   Thank you for any info you can provide.  I do have a case open with TAC.

the_rock
Legend

Can't say I have ever seen that before... what did TAC say?

Blake_Fithen
Participant

TAC is still working on it.  Trying to duplicate the problem with our configuration. 

the_rock
Legend

Just curious, as I like to approach every problem logically. So, when you say this happened 2 days ago, is there anything you can think of that may have changed on the mgmt server 2 or 3 days ago? Can you check the audit logs to see if there is anything of interest from when this issue occurred? One thing that comes to mind is GuiDBedit, but unless someone inadvertently made changes there, I guess it might not be relevant. Just to be on the safe side, I would try doing an "install database" on the server itself.

TAC has a valid idea... if they can import your config into their lab and try to fix it, they can provide the solution.

Blake_Fithen
Participant

Thanks for your interest. I don't recall saying it happened two days ago though - it started about 12 days ago and is very intermittent.  We're about 14 hours total into troubleshooting, reinstalling from R80.30 ISO (twice). Patch to latest hotfix, migrate export/import, etc, push policy, all is good.  Wait x amount of minutes/hours/days, then same problem.

My gut says it's hardware sensor related - or maybe ILMI related, because only reseating the power cables will bring it back to the point where the Gaia portal and the dashboard are usable again.  But that's just my opinion.   As soon as that database timeout message appears in /var/log/messages, that's it for the portal and dashboard.
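
To pin down when each episode starts, the first timeout line can be pulled out of the log and its timestamp compared with the time the portal stopped responding. A minimal sketch, assuming the log lines carry the "Mon DD HH:MM:SS YYYY" prefix seen in the excerpt earlier in this thread; `first_timeout` is a made-up helper name.

```python
from datetime import datetime

# Message text quoted from the /var/log/messages excerpt earlier in the thread.
TIMEOUT_TEXT = "Timeout waiting for response from database server"

def first_timeout(lines):
    """Return the timestamp of the first timeout line, or None if there is none."""
    for line in lines:
        if TIMEOUT_TEXT in line:
            # Assumption: the first four whitespace-separated fields are
            # month, day, time and year, e.g. "Jan 24 09:21:24 2020".
            stamp = " ".join(line.split()[:4])
            return datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
    return None

if __name__ == "__main__":
    with open("/var/log/messages", errors="replace") as log:
        print(first_timeout(log))
```

If the timestamp consistently precedes the moment the portal and dashboard hang, that would at least support treating the message as the onset marker; it says nothing about the root cause.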

the_rock
Legend

Sorry, my apologies. I read the original post, which said "since last 2 days"... that's what I wanted to respond to, but I replied to you instead. Now that you have said all that, though, I agree 100% with your assessment... did you ask TAC for an RMA? I can't see what else they can ask you to do, except send a replacement.

Blake_Fithen
Participant

I forgot to add I've had practically zero problems like this.  For roughly 14 months it's been rock solid with regular operational rule changes, IPS, other blade updates, VPN stuff, regular hotfix updates, etc.   No real negative work stopping events like this for a long time. 

the_rock
Legend

Well, a machine as expensive as the Smart-1 5050 had better work way longer than 14 months 🙂

Blake_Fithen
Participant

No worries.  Agreed.   Decision on RMA late tomorrow.
