Maller
Contributor

cluster member down after upgrade from R80.40 to R81.20 via CPUSE

Hello,

Today we tried to upgrade a two-member 5600 cluster running OSPF from R80.40 to R81.20. It ended in total disaster.

The standby member was dead after the reboot, with these messages:

Nov 11 16:58:24 2023 ctsmdpc01fw routed[27361]: [routed] ERROR: cpcl_cxl_runtime_status(1216): HA mode not started
Nov 11 16:58:25 2023 ctsmdpc01fw routed[27361]: [routed] ERROR: cpcl_cxl_runtime_status(1216): HA mode not started
Nov 11 16:58:25 2023 ctsmdpc01fw routed[27361]: [routed] ERROR: cpcl_cxl_runtime_status(1216): HA mode not started

It seems the cluster membership was deleted: the member reports Standalone, the sync IP is lost, etc.

gateway01fw> show routed cluster-state

Cluster: Standalone
Master/Slave: Master
Sync IP: N/A    
Cluster Sync: N/A
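
For anyone who wants to check this state on several members from a script, here is a minimal Python sketch that parses the output shown above. It assumes the output keeps the simple "Key: Value" layout pasted in this post; the function names `parse_cluster_state` and `looks_standalone` are made up for illustration and are not part of any Check Point tool.

```python
# Minimal sketch: parse the "Key: Value" lines printed by
# "show routed cluster-state" (layout assumed from the paste above).

SAMPLE = """\
Cluster: Standalone
Master/Slave: Master
Sync IP: N/A
Cluster Sync: N/A
"""


def parse_cluster_state(text):
    """Return a dict of the 'Key: Value' pairs; lines without a colon are skipped."""
    state = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            state[key.strip()] = value.strip()
    return state


def looks_standalone(state):
    """True when routed reports no cluster, as in the failed upgrade above."""
    return state.get("Cluster") == "Standalone" or state.get("Sync IP") == "N/A"


state = parse_cluster_state(SAMPLE)
print(state)
print("standalone:", looks_standalone(state))
```

The sample text is copied from this post; how a healthy member reports these fields is not shown in the thread, so the check only flags the failure pattern seen here.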

We are going to open an SR with Check Point, but I would like to know if anyone has run into a similar problem.

 

thanks

18 Replies
Ruan_Kotze
Advisor

Can you check if ClusterXL is enabled in cpconfig?

Maller
Contributor

Yes, it was enabled. We tried disabling it, rebooting, and enabling it again, but the result was the same.

In the end we reverted to the snapshot.

 

_Val_
Admin

Just in case: did you change the cluster object version, then compile and install a new policy? Was it installed successfully? An R80.40 policy version will not work on R81.20. It seems that the gateway loads the default policy, where clustering is not present, hence the HA error.

Please post the output of "fw stat".

Maller
Contributor

After the upgrade, the gateway lost its connection to management, so we had no way to do anything. We reverted to the snapshot.

But your observation about changing the object version in management is absolutely right. I think the object version was not changed beforehand.

We'll try again in a few days.

Thanks

 

the_rock
Legend

That definitely could have been part of the problem @Maller 

Andy

_Val_
Admin

That should not happen. Did SIC not work at all?

the_rock
Legend

@_Val_ makes a very good point, actually. Did you change the cluster object to R81.20 in the General Properties tab?

Best regards,

Andy

Chris_Atkinson
Employee

Which method or process did you use for the upgrade (e.g. MVC), and was a policy install performed successfully afterwards?

CCSM R77/R80/ELITE
Maller
Contributor

Hello,

No, I didn't have the opportunity to enable MVC, install policy, or do anything else.

Steps followed:

1- Verify the applicable CPUSE Software Packages.
2- Download the applicable CPUSE Software Packages.
3- Install the applicable CPUSE Software Packages.

After step 3 the gateway rebooted and crashed.

 

the_rock
Legend

That's very unfortunate. I always follow the zero downtime upgrade method and have never had an issue. I hope TAC can look into this further for you.

Andy

Matlu
Advisor

Hello,

I went through a similar event.

Did you get any results from TAC after this happened to you?

What was the root cause of the problem?

Can you update this post with your comments, please?

Maller
Contributor

Hello Matlu,

Yes, as Val indicated, it seems to be related to the object version change on the management server. Our team followed an old procedure used in R80.X upgrades, where the standby node was upgraded before changing the cluster object version on the management server. To upgrade to R81.X, the object must be upgraded in management first. These are the mistakes that happen when nobody reads the upgrade guide 😞

Thanks

 

Matlu
Advisor

Hello,

Did you use the CPUSE package or the Blink Image package?

Once you have downloaded the package (either CPUSE or Blink Image), do you have to change the Cluster object version in SmartConsole before "Installing" it?

Is this a required step before installing the package on the passive member?

I have one doubt: if you change the version of the Cluster object before installing the package on the passive member, do you have to install policy, or do you just change the version?

Wouldn't this cause more errors?

Maller
Contributor

Hi Matlu

Answering your questions:

Did you use the CPUSE package or the Blink Image package? CPUSE package

** ************************************************************************* **
** Majors **
** ************************************************************************* **
Display name Status
R81.20 Gaia Fresh Install and upgrade Downloaded    <--

 

Once you have downloaded the package (either CPUSE or Blink Image), do you have to change the Cluster object version in SmartConsole before "Installing" it? YES

Is this a required step before installing the package on the passive member? YES

If you change the version of the Cluster object before installing the package on the passive member, do you have to install policy, or do you just change the version? Just change it. After the standby node is upgraded, you then have to install policy.

Wouldn't this cause more errors? Yes, the policy install will finish OK on the R81.20 node and fail on the node that has not been upgraded yet. But you have to deselect the option "For gateway clusters, if installation on a cluster member fails, do not install on that cluster".

When all members of the cluster are upgraded, select this option again.
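
To make the order easier to follow, the sequence described in this reply can be written down as a small checklist. This is only a sketch of what is stated in this thread, not the official procedure; the names `STEPS`, `next_step` and `may_install_package` are hypothetical, and the Check Point upgrade guide remains the reference.

```python
# Ordered checklist transcribed from this thread (not from the official guide).
STEPS = [
    "Change the cluster object version to R81.20 in SmartConsole (no policy install yet)",
    "Install the CPUSE package on the standby member",
    "Deselect 'For gateway clusters, if installation on a cluster member fails, "
    "do not install on that cluster'",
    "Install policy (expected to succeed on the upgraded member and fail on the other)",
    "Upgrade the remaining member(s)",
    "Re-select the cluster install option",
]


def next_step(completed):
    """Return the first step not yet done, or None when all steps are done.

    `completed` is the number of steps already finished, counted from the top.
    """
    if not 0 <= completed <= len(STEPS):
        raise ValueError("completed must be between 0 and len(STEPS)")
    return STEPS[completed] if completed < len(STEPS) else None


def may_install_package(object_version, target="R81.20"):
    """Guard for the mistake in this thread: the object version must change first."""
    return object_version == target


print(next_step(0))
print(may_install_package("R80.40"))
```

The guard function captures the root cause reported here: the package was installed on the standby member while the cluster object was still set to R80.40.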

 

the_rock
Legend

This is part of the reason why I never use or recommend this method. It probably goes without saying that the cluster version has to be changed when upgrading, but a zero downtime upgrade seems more "natural" to me, if you will.

I have done it that way for years and never had an issue. Besides, literally every customer I have ever done this for doesn't care if they lose a handful of pings or the connection is down for a minute, which is why this is all done after hours anyway.

Just my 2 cents...

Andy

Naama_Specktor
Employee

Hello,

My name is Naama Specktor and I am a Check Point employee.

I would appreciate it if you could share the SR #, here or in a PM.

Thanks in advance,

Naama 

TheJP
Explorer

Just to confirm, I also get this on standby cluster members when installing hotfixes.

I went through an R80.40 ClusterXL gateway upgrade tonight. /var/log/messages gets spammed with "[routed] ERROR: cpcl_cxl_runtime_status(1216): HA mode not started" messages every second. I went from base to T41 and T53 just to check. It happens regardless of the version.
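
If it helps to quantify the spam, here is a rough Python sketch that counts these messages per second. It assumes the line format pasted in the first post of this thread (timestamp, optional year, hostname, `routed[pid]:`); the sample lines are copied from that post, and `count_ha_errors` is just an illustrative name.

```python
import re
from collections import Counter

# Matches lines like the ones pasted in the first post of this thread.
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?:\d{4}\s+)?\S+\s+"
    r"routed\[\d+\]:.*HA mode not started"
)

SAMPLE = """\
Nov 11 16:58:24 2023 ctsmdpc01fw routed[27361]: [routed] ERROR: cpcl_cxl_runtime_status(1216): HA mode not started
Nov 11 16:58:25 2023 ctsmdpc01fw routed[27361]: [routed] ERROR: cpcl_cxl_runtime_status(1216): HA mode not started
Nov 11 16:58:25 2023 ctsmdpc01fw routed[27361]: [routed] ERROR: cpcl_cxl_runtime_status(1216): HA mode not started
"""


def count_ha_errors(lines):
    """Count 'HA mode not started' messages, keyed by timestamp (1 s resolution)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group("ts")] += 1
    return counts


for timestamp, n in count_ha_errors(SAMPLE.splitlines()).items():
    print(timestamp, n)
```

To run it against a real log, replace `SAMPLE.splitlines()` with an open file handle, for example a copy of /var/log/messages.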

the_rock
Legend

And if you try cphastop; cphastart ... any change? Reboot?

Andy
