Ricardo_Gros
Collaborator

Reboot no explanation

Hi, 

 

One of our customers has an appliance that has recently started rebooting without explanation.

 

We have noticed the following behaviour:

 

[Expert@clusterFW2:0]# last -x |head |tac
reboot   system boot  2.6.18-92cpx86_6 Fri Apr 12 02:44          (00:03)
runlevel (to lvl 3)   2.6.18-92cpx86_6 Fri Apr 12 02:44 - 02:48  (00:03)
runlevel (to lvl 6)   2.6.18-92cpx86_6 Fri Apr 12 02:48 - 02:48  (00:00)
shutdown system down  2.6.18-92cpx86_6 Fri Apr 12 02:48 - 15:59  (13:10)
reboot   system boot  2.6.18-92cpx86_6 Fri Apr 12 02:51          (13:07)
runlevel (to lvl 3)   2.6.18-92cpx86_6 Fri Apr 12 02:51 - 15:59  (13:07)
sseidewi pts/2        dez7acomdv010.in Fri Apr 12 06:05 - 06:25  (00:20)
admin    pts/2        dez7acomdv002.in Fri Apr 12 09:29 - 09:41  (00:12)
admin    pts/2        dez7acomdv001.in Fri Apr 12 14:23 - 14:57  (00:33)
admin    pts/2        dez7acomdv001.in Fri Apr 12 15:46   still logged in

 

This looks like a normal reboot, but the runlevel 6 entry makes me wonder: I would not expect a normal reboot to show runlevel 6.

In the messages file I can see the restart message, but no errors before it, and the system reboots normally.

There are no crash dumps available or errors. 

Can I somehow confirm that the system was not rebooted by someone simply pressing the power button or entering a command?
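
One way to narrow this down is to check whether each boot in the `last -x` output was preceded by a `shutdown` record. The sketch below is only an illustration: `classify_boots` is a hypothetical helper name, and it assumes the column layout shown in the output above. A `shutdown` record suggests the system went through an orderly shutdown sequence; it does not by itself show who or what triggered it.

```shell
# classify_boots (hypothetical helper): reads `last -x` records in
# chronological order (oldest first) on stdin. For every boot after the
# first one seen, it reports whether a "shutdown" record was logged since
# the previous boot.
classify_boots() {
    awk '
        $1 == "shutdown" { clean = 1 }
        $1 == "reboot" {
            if (seen) {
                # Fields 5-8 hold weekday, month, day and time of the boot.
                printf "%s %s %s %s: %s\n", $5, $6, $7, $8, \
                    (clean ? "preceded by clean shutdown" \
                           : "no shutdown record (possible crash or power loss)")
            }
            seen = 1
            clean = 0
        }
    '
}
```

It could be run as `last -x | tac | classify_boots`. A boot with no preceding `shutdown` record would point more towards a crash, power loss, or hardware reset than towards a command.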

13 Replies
Nick_Doropoulos
Advisor

Try the dmesg log file:

less /var/log/dmesg

Ricardo_Gros
Collaborator

Hi, dmesg shows nothing relevant. Any other ideas?

 

I also found this:

Apr 12 02:43:06 2019 clusterFW2 kernel:
Apr 12 02:43:06 2019 clusterFW2 kernel: FW-1: stopping debug messages for the next 54 seconds
Apr 12 02:57:39 2019 clusterFW2 syslogd 1.4.1: restart.
Apr 12 02:57:39 2019 clusterFW2 syslogd: local sendto: Network is unreachable

 

Nick_Doropoulos
Advisor

Another log I would check is the /var/log/messages file. Could you provide the entries whose timestamps correspond to the most recent reboot?
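
A small filter can pull just the relevant window out of the file. This is a sketch only: `messages_window` is a hypothetical helper name, and it assumes the syslog line format `Mon DD HH:MM:SS ...` with zero-padded times.

```shell
# messages_window MONTH DAY START END (hypothetical helper):
# prints syslog lines from stdin whose month and day match and whose
# HH:MM:SS time falls within [START, END]. Zero-padded times compare
# correctly as plain strings.
messages_window() {
    awk -v mon="$1" -v day="$2" -v start="$3" -v end="$4" '
        $1 == mon && $2 == day && $3 >= start && $3 <= end
    '
}
```

For example, `messages_window Apr 12 02:40:00 03:00:00 < /var/log/messages` would print everything logged on 12 April between 02:40 and 03:00. The same filter can be applied to any rotated copies of the file.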

Ricardo_Gros
Collaborator

Apr 12 02:43:06 2019 clusterFW2 kernel: [fw4_5];FW-1 - cifs_process_read_andx: /var/log/jail/sys/class/misc/mcelog
Apr 12 02:43:06 2019 clusterFW2 last message repeated 98 times
Apr 12 02:43:06 2019 clusterFW2 kernel:
Apr 12 02:43:06 2019 clusterFW2 kernel: FW-1: stopping debug messages for the next 54 seconds
Apr 12 02:57:39 2019 clusterFW2 syslogd 1.4.1: restart.
Apr 12 02:57:39 2019 clusterFW2 syslogd: local sendto: Network is unreachable
Apr 12 02:57:39 2019 clusterFW2 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Apr 12 02:57:39 2019 clusterFW2 kernel: Linux version 2.6.18-92cpx86_64 (builder@Lnx50BccCmp9.checkpoint.com) (gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)) #1 SMP Mon Oct 8 10:34:42 IDT 2018

 

The reboot happened in this time frame.

G_W_Albrecht
Legend

I would rather involve TAC here, and be sure to save all log files before they are overwritten!

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist
Ricardo_Gros
Collaborator

Hi Gunter, 

 

TAC is involved. We already had one RMA, and the new machine has the exact same problem. That led me to research this further, because it is very strange.

 

I have been searching for mcelog or something similar, but could not find it.

Vladimir
Champion

@Ricardo_Gros, can you let us know if the problem was identified and resolved?

If so, what was causing the reboots and how was it remedied?

Thank you,

 

Vladimir

Ricardo_Gros
Collaborator

Hi,

 

Sorry for the late reply. This issue was a kernel bug of some sort and was solved by installing the Jumbo take that was the newest at the time.

 

 

Nick_Doropoulos
Advisor

Could you please provide the output of the following commands as well:

fw ver
enabled_blades

JozkoMrkvicka
Authority

Are syslog messages forwarded to a syslog receiver? If so, you should be able to see there exactly what happened.

Do you have LOM? You can find some interesting logs there.

What is the status of the PSU?

Was there any on-site activity for this node? A manual on-site reboot, for example.

Kind regards,
Jozko Mrkvicka
Alissa20
Explorer

Agreed, involving TAC will make things much easier. Also make sure that you don't overwrite the log files.

 
 
Manujeet
Explorer

Can anyone tell me about runlevels 3 and 6?

_Val_
Admin

Those are part of early boot operations. Why?
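
For reference, the numbers can be mapped to their usual meanings under the SysV init convention used by Red Hat style systems. That this convention applies to the appliance is an assumption, based on the Red Hat GCC string in the kernel banner quoted earlier in the thread. `describe_runlevel` is a hypothetical helper written for illustration.

```shell
# describe_runlevel N (hypothetical helper): prints the conventional
# meaning of SysV runlevel N on Red Hat style systems.
describe_runlevel() {
    case "$1" in
        0) echo "halt (system shutdown)" ;;
        1) echo "single-user mode" ;;
        2) echo "multi-user mode without NFS" ;;
        3) echo "full multi-user mode, text console" ;;
        4) echo "unused / site-defined" ;;
        5) echo "multi-user mode with graphical login" ;;
        6) echo "reboot" ;;
        *) echo "unknown runlevel: $1" ;;
    esac
}
```

Under that convention, the `to lvl 3` entries in the `last -x` output mark the system reaching normal multi-user operation after boot, and the `to lvl 6` entry marks init switching to the reboot runlevel, which is what an orderly reboot is expected to log.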
