Matlu
Advisor

Unexpected crash of ClusterXL active member

Hello, Folks.

Today our client's Main Cluster had a problem.

At about 10:00 am (GMT-5), the client lost all services (Internet access, published services, and communication between internal segments).
In practice, the active member of the cluster "died" without any apparent cause.

The client tried to force a failover to the other member with the command "clusterXL_admin down", but the command did not work, and they had to restart the appliance.

They have already had similar bad experiences with this cluster, and it seems to be a problem with the hardware that was sold (Appliance 6000).

Is it possible to determine the root cause that made the equipment simply stop working and caused this outage for the customer?

Best regards.

Check Point: R81.10 with JHF Take 95

4 Replies
PhoneBoy
Admin

If the system crashed, there is going to be a vmcore somewhere.
I would highly recommend engaging with the TAC on this as, if it's a hardware failure, an RMA may be required.
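
To make the "vmcore somewhere" part more concrete, here is a minimal sketch that walks a few directories and lists any file with "vmcore" in its name, newest first. The default directories (/var/log/crash and /var/crash) are an assumption on my part, not something confirmed in this thread, and find_vmcores is just an illustrative helper name. Check the real dump location on your system or with TAC.

```python
import os
from datetime import datetime, timezone

# ASSUMPTION: typical crash-dump directories; verify on your own system or with TAC.
ASSUMED_CRASH_DIRS = ["/var/log/crash", "/var/crash"]


def find_vmcores(roots=ASSUMED_CRASH_DIRS):
    """Return (path, size_bytes, mtime_utc) for each file whose name contains 'vmcore'."""
    found = []
    for root in roots:
        # os.walk yields nothing for a directory that does not exist,
        # so missing locations are skipped without an error.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if "vmcore" in name:
                    path = os.path.join(dirpath, name)
                    st = os.stat(path)
                    mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
                    found.append((path, st.st_size, mtime))
    # Newest first, so the dump closest to the outage is listed at the top.
    return sorted(found, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for path, size, mtime in find_vmcores():
        print(f"{mtime:%Y-%m-%d %H:%M:%S} UTC  {size:>12} bytes  {path}")
```

If the list comes back empty, that only means nothing matched in the directories that were searched. It does not prove that no dump was written.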

the_rock
Legend

TAC may suggest upgrading to R81.20, but it is hard to say whether that would solve anything. As @PhoneBoy said, it sounds like a vmcore was generated, so that would need to be investigated further.

Andy

JozkoMrkvicka
Mentor

As soon as possible once the device is "alive" again, collect all logs present on the device (dmesg, /var/log/messages) and a cpinfo. TAC may be able to spot what went wrong, but only if you act before the logs are overwritten with newer ones.
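
A rough sketch of that collection step, in case it helps: it bundles the messages files and a dmesg snapshot into one timestamped archive so they survive log rotation. The glob pattern /var/log/messages* (to catch rotated copies), the archive naming, and the collect_logs helper are my own assumptions for illustration. cpinfo is left out on purpose because it is a separate tool with its own options, so run it as TAC instructs.

```python
import glob
import os
import subprocess
import tarfile
import time

# ASSUMPTION: the current messages file plus any rotated copies next to it.
ASSUMED_LOG_PATTERNS = ["/var/log/messages*"]


def collect_logs(dest_dir, patterns=ASSUMED_LOG_PATTERNS, include_dmesg=True):
    """Bundle matching log files (and optionally dmesg output) into a .tar.gz.

    Returns the path of the archive that was written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(dest_dir, f"crash-logs-{stamp}.tar.gz")

    dmesg_file = None
    if include_dmesg:
        # Snapshot the kernel ring buffer into a plain text file.
        dmesg_file = os.path.join(dest_dir, f"dmesg-{stamp}.txt")
        with open(dmesg_file, "w") as out:
            subprocess.run(["dmesg"], stdout=out,
                           stderr=subprocess.STDOUT, check=False)

    with tarfile.open(archive, "w:gz") as tar:
        for pattern in patterns:
            for path in sorted(glob.glob(pattern)):
                tar.add(path)  # tarfile stores absolute paths without the leading "/"
        if dmesg_file:
            tar.add(dmesg_file)
    return archive


if __name__ == "__main__":
    print(collect_logs("/var/tmp/crash-evidence"))
```

Keep in mind that after a reboot dmesg only shows the kernel ring buffer of the current boot, so the messages files are usually the more useful part of the bundle. The destination /var/tmp/crash-evidence is only an example path.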

Kind regards,
Jozko Mrkvicka
the_rock
Legend

Very good point @JozkoMrkvicka 

