
Reboot no explanation

Hi,

A customer of ours has an appliance that has recently started rebooting without explanation.

We have noticed the following behaviour:

 

[Expert@clusterFW2:0]# last -x |head |tac
reboot   system boot  2.6.18-92cpx86_6 Fri Apr 12 02:44          (00:03)
runlevel (to lvl 3)   2.6.18-92cpx86_6 Fri Apr 12 02:44 - 02:48  (00:03)
runlevel (to lvl 6)   2.6.18-92cpx86_6 Fri Apr 12 02:48 - 02:48  (00:00)
shutdown system down  2.6.18-92cpx86_6 Fri Apr 12 02:48 - 15:59  (13:10)
reboot   system boot  2.6.18-92cpx86_6 Fri Apr 12 02:51          (13:07)
runlevel (to lvl 3)   2.6.18-92cpx86_6 Fri Apr 12 02:51 - 15:59  (13:07)
sseidewi pts/2        dez7acomdv010.in Fri Apr 12 06:05 - 06:25  (00:20)
admin    pts/2        dez7acomdv002.in Fri Apr 12 09:29 - 09:41  (00:12)
admin    pts/2        dez7acomdv001.in Fri Apr 12 14:23 - 14:57  (00:33)
admin    pts/2        dez7acomdv001.in Fri Apr 12 15:46   still logged in

 

This looks like a normal reboot. However, the runlevel 6 entry makes me wonder, since I did not expect a normal reboot to show runlevel 6.

In the messages file I can see the restart message, but no errors before it, and the system boots normally afterwards.

There are no crash dumps or error messages available.

Can I somehow confirm that the system was not rebooted by someone simply pressing the power button or entering a command?
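
One way to approach this question, offered as a sketch rather than a definitive diagnostic: on a SysV-init style system, runlevel 6 is the reboot runlevel, so a `runlevel (to lvl 6)` record followed by a `shutdown` record is what an orderly reboot normally leaves behind. A boot record with no preceding `shutdown` record points more towards a crash, hard reset, or power loss. The helper below (`classify_boots` is a made-up name) labels each boot record in `last -x` output accordingly. It cannot tell who or what requested an orderly reboot, whether a command, a watchdog, or a power-button event.

```shell
#!/bin/sh
# classify_boots: read `last -x` output, oldest record first, on stdin and
# label each "reboot" (system boot) record by what preceded it.
classify_boots() {
    awk '
        # A "shutdown" record is written during an orderly shutdown.
        $1 == "shutdown" { clean = 1; next }
        $1 == "reboot" {
            if (clean)
                label = "ORDERLY"             # shutdown record came first
            else if (seen_boot)
                label = "NO_SHUTDOWN_RECORD"  # crash, reset or power loss?
            else
                label = "UNKNOWN"             # nothing earlier in the input
            print label "\t" $0
            seen_boot = 1
            clean = 0
        }
    '
}

# Usage on the appliance (hypothetical invocation; tac puts oldest first):
#   last -x | tac | classify_boots
```

Fed the records shown above, this labels the 02:44 boot as UNKNOWN (nothing earlier in the excerpt) and the 02:51 boot as ORDERLY, because a `shutdown` record precedes it.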

11 Replies

Try the dmesg log file:

less /var/log/dmesg


Hi, dmesg shows nothing relevant. Any other ideas?

I also found the following:

Apr 12 02:43:06 2019 clusterFW2 kernel:
Apr 12 02:43:06 2019 clusterFW2 kernel: FW-1: stopping debug messages for the next 54 seconds
Apr 12 02:57:39 2019 clusterFW2 syslogd 1.4.1: restart.
Apr 12 02:57:39 2019 clusterFW2 syslogd: local sendto: Network is unreachable

 


Another log I would check is /var/log/messages. Could you post the entries whose timestamps match the time of the last reboot?
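
A minimal sketch of how such entries could be pulled out, assuming the file uses the `Mon DD HH:MM:SS ...` timestamp layout seen elsewhere in this thread. The function name `log_window` and its arguments are hypothetical.

```shell
#!/bin/sh
# log_window FILE DAY FROM TO
# Print the lines of a syslog-style FILE stamped on DAY (for example "Apr 12")
# with a time between FROM and TO (HH:MM:SS, both inclusive).
log_window() {
    awk -v day="$2" -v from="$3" -v to="$4" '
        # Fields 1 and 2 are month and day, field 3 is the time of day.
        # HH:MM:SS strings of equal length compare correctly as text.
        ($1 " " $2) == day && $3 >= from && $3 <= to
    ' "$1"
}

# Example (hypothetical window around the reboot discussed here):
#   log_window /var/log/messages "Apr 12" 02:40:00 03:00:00
```

Rotated files such as /var/log/messages.1 may need the same treatment if the reboot is older than the current file.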


Apr 12 02:43:06 2019 clusterFW2 kernel: [fw4_5];FW-1 - cifs_process_read_andx: /var/log/jail/sys/class/misc/mcelog
Apr 12 02:43:06 2019 clusterFW2 last message repeated 98 times
Apr 12 02:43:06 2019 clusterFW2 kernel:
Apr 12 02:43:06 2019 clusterFW2 kernel: FW-1: stopping debug messages for the next 54 seconds
Apr 12 02:57:39 2019 clusterFW2 syslogd 1.4.1: restart.
Apr 12 02:57:39 2019 clusterFW2 syslogd: local sendto: Network is unreachable
Apr 12 02:57:39 2019 clusterFW2 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Apr 12 02:57:39 2019 clusterFW2 kernel: Linux version 2.6.18-92cpx86_64 (builder@Lnx50BccCmp9.checkpoint.com) (gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)) #1 SMP Mon Oct 8 10:34:42 IDT 2018

 

The reboot happened in this time frame.


I would rather involve TAC here, and be sure to save all log files before they are overwritten!
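
A rough sketch of one way to snapshot the logs before rotation overwrites them. The function name `save_logs` and the archive naming are assumptions, and the destination should sit outside the source directory and have enough free space.

```shell
#!/bin/sh
# save_logs SRC_DIR DEST_DIR
# Pack everything under SRC_DIR into a timestamped tar.gz inside DEST_DIR
# and print the path of the archive.
save_logs() {
    src=$1
    dest=$2
    archive="$dest/logs-$(hostname)-$(date +%Y%m%d-%H%M%S).tar.gz"
    mkdir -p "$dest" || return 1
    rc=0
    # -C keeps the paths inside the archive relative to SRC_DIR.
    tar -czf "$archive" -C "$src" . || rc=$?
    # GNU tar exits with 1 when a file changed while it was being read,
    # which is expected for live logs; treat 2 and above as failure.
    [ "$rc" -le 1 ] || return "$rc"
    # Refuse to report success if no archive was actually written.
    [ -s "$archive" ] || return 1
    echo "$archive"
}

# Example (hypothetical destination path):
#   save_logs /var/log /var/tmp/reboot-case
```

Copying the resulting archive off the box is worthwhile as well, since a replacement unit will not carry the old disk contents.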


Hi Gunter,

TAC is involved. We already had one RMA, and now the replacement machine has the exact same problem. That led me to do my own research, because this is very strange.

I am looking for mcelog or something similar, but could not find it.


@Ricardo_Gros, can you let us know if the problem was identified and resolved?

If so, what was causing the reboots and how was it remedied?

Thank you,

Vladimir


Hi,

Sorry for the late reply. The issue turned out to be a kernel bug of some sort and was solved by installing what was, at the time, the newest Jumbo take.

 

 


Could you please provide the following information as well:

fw ver
enabled_blades


Are syslog messages forwarded to a syslog receiver? If so, you should be able to see there exactly what happened.

Do you have LOM? You can find some interesting logs there.

What is the status of the PSU?

Was there any on-site activity for this node? A manual on-site reboot, for example.

Kind regards,
Jozko Mrkvicka
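
For the first question, a small sketch of how to check for forwarding, assuming it is configured in the classic sysklogd style where a rule's action is `@host` in a file such as /etc/syslog.conf. The function name `remote_syslog_targets` is made up, and forwarding set up by other means would not show up here.

```shell
#!/bin/sh
# remote_syslog_targets FILE
# Print the hosts that a sysklogd-style config FILE forwards messages to,
# i.e. rules whose action (last field) starts with "@".
remote_syslog_targets() {
    awk '
        /^[ \t]*(#|$)/ { next }                  # skip comments and blanks
        $NF ~ /^@/     { print substr($NF, 2) }  # strip the leading "@"
    ' "$1"
}

# Example (path is an assumption for this platform):
#   remote_syslog_targets /etc/syslog.conf
```

No output means no `@host` rule was found in that file.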

Agreed, involving TAC will make things much easier. Also make sure that you don't overwrite the log files.