Solved: Re: Spark 1530 Cluster upgrade ( R81.10.00 - R81.10.07 )-> kernel crash

K_R_V · ‎2023-08-25

We upgraded 2 Spark 1530 clusters, and experienced the same behavior on both clusters.

What we did:

We upgraded the standby firewall, FW2, from R81.10.00 to R81.10.07. After the reboot, FW2 had the outgoing policy; after a policy push, the correct policy was installed and the firewall was in STANDBY state.
So far so good.

Cluster 1:

We upgraded FW1. The firewall then rebooted constantly, responding to ping and SSH for 10 seconds before each reboot. Console output:

INIT: version 2.88 booting
INIT: Entering runlevel: 3
Booting CRASH kernel...

After a factory reset and upgrade, the cluster was back online.

Cluster 2:
We initiated a failover to FW2 with the cluster down script, and FW2 became the active firewall. At that moment, FW1 crashed and rebooted, over and over again. In the 10-second window in which it was reachable, I managed to log in and run "fw unloadlocal".
This stopped the crashing, and I was able to upgrade the device. After the upgrade and policy install, the cluster was OK again.
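For anyone who has to catch the same short window, here is a rough Python sketch of the idea: keep retrying an SSH connection and run "fw unloadlocal" as soon as the box answers. It is only an illustration under assumptions that are not from this thread: key-based SSH login is already set up, the login user is `admin`, the login shell accepts the command directly, and the address `192.0.2.10` is a placeholder. Keep in mind that "fw unloadlocal" removes the installed policy, so the gateway is not enforcing it until the next policy install.

```python
import subprocess
import time
from typing import Callable, List


def build_ssh_command(host: str, user: str = "admin",
                      remote_cmd: str = "fw unloadlocal") -> List[str]:
    """Build a non-interactive ssh command line.

    Assumptions (not from the thread): key-based login works for `user`
    and the login shell runs `remote_cmd` directly.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never prompt for a password
        "-o", "ConnectTimeout=3",    # give up quickly while the box is rebooting
        f"{user}@{host}",
        remote_cmd,
    ]


def run_once(cmd: List[str]) -> bool:
    """Run the command once; True only if ssh exited with status 0."""
    try:
        return subprocess.run(cmd, timeout=8).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False


def retry_until_success(host: str, attempts: int = 120, delay: float = 1.0,
                        runner: Callable[[List[str]], bool] = run_once,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Return the 1-based attempt number that succeeded, or 0 if none did."""
    cmd = build_ssh_command(host)
    for attempt in range(1, attempts + 1):
        if runner(cmd):
            return attempt
        sleep(delay)
    return 0
```

Calling `retry_until_success("192.0.2.10")` would retry roughly once per second for about two minutes. The `runner` and `sleep` parameters exist only so the loop can be exercised without touching a real firewall.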

Are we doing something wrong here? Has anybody else experienced the same behavior? Is this a known issue?

We have 30 more clusters to upgrade!

K_R_V · ‎2024-01-02

Upgrades to R81.10.08 - Build 616 do not have this issue; the problem seems to be resolved in this version!

G_W_Albrecht · ‎2023-08-25

Contact TAC to get this resolved!

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist

K_R_V · ‎2023-08-25

A TAC case has been created, but in my experience with TAC, this will take some time and frustration (a discussion for another day! 😁)

We'll see where it goes, but I'm just wondering whether anybody else has had the same issue.

the_rock · ‎2023-08-25

You are 100% right, but personally, if I were you, I would see whether the issue can be escalated. At this point, no customer would feel comfortable continuing the process, especially considering the seriousness of the issue you experienced with the first cluster.

Just my opinion.

Andy

K_R_V · ‎2023-09-01

Feedback from TAC, and the case was closed 🙄

The current occurrence is quite rare and happening on Centrally Managed SMB appliances
due to a policy issue where the policy is not installed on the gateway/cluster after the upgrade.

We do suggest performing the upgrade and once the upgrade performed, installing the policy
in order to avoid a roll-back to previous versions.

Is there anything else we can assist you with?

the_rock · ‎2023-09-01

If I were you, I would escalate that issue through your sales person; that's totally unacceptable. That answer does not make much sense logically. Let's think about it for a second: there are probably thousands of customers who do upgrades on regular Gaia devices, and if the same issue occurred there, would they blame it on the fact that the right policy was not installed?!

Andy

G_W_Albrecht · ‎2023-09-01

According to your description, the answer is unacceptable: in cluster 2, the issue happened after the failover and before the upgrade.

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist

the_rock · ‎2023-09-01

Personally, I think the answer TAC gave @K_R_V makes no sense. I can't see logically how any of that (whether the failover or the crash itself) can be caused by policy.

Andy

PhoneBoy · ‎2023-09-01

Note that it’s required to perform a policy install after doing a major upgrade (e.g. from R80.20.xx to R81.10.xx).
I have never seen it be necessary when upgrading within the same major release.

Jones · ‎2023-11-16

I'm experiencing a similar issue when upgrading a centrally managed 1600 cluster from R80.20.50 to R81.10.08, or from R81.10.05 to R81.10.08. The first unit went fine, but the second unit keeps crashing the kernel. I ended up reverting to R80.20.50.

Did you find any solution for this issue for your other cluster upgrades?

K_R_V · ‎2023-11-17

The other clusters have not been upgraded yet because of some issues with VMAC in R81.10.05, R81.10.07 and R81.10.08. This should be fixed in R81.10.10, so I'm waiting for the GA version.

The upgrade issue is not present in R81.10.00, so you may be able to upgrade to that version.

Is your cluster also using VMAC?

Jones · ‎2023-11-17

I upgraded to R81.10.05, and that went fine. But upgrading to R81.10.08 reverted me back to R80.20.50, so I stayed there. I'm not using VMAC, so this must be something else.

K_R_V · ‎2024-01-02

Upgrades to R81.10.08 - Build 616 do not have this issue; the problem seems to be resolved in this version!

Jones · ‎2024-01-08

According to sk181079, Build 683 is the latest and replaced 608.

Date	Description
30 Nov. 2023	Release of Build 996001683, replacing Build 996001608.
07 Sep. 2023	Release of Build 996001608, replacing Build 996001601.
31 Aug. 2023	First release of this document (Build 996001601).
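To keep track of which build is current while planning the remaining upgrades, the release history above can be read programmatically. The following is just a minimal sketch in Python: the rows are copied from the table quoted above, while the helper names (`parse_history`, `latest_build`, `short_build`) are made up for illustration. The short "Build 683" form is taken here to be the last three digits of the full number, which matches how the builds are referred to in this thread.

```python
import re
from datetime import date

# Rows copied from the release history quoted above (tab-separated).
RELEASE_HISTORY = (
    "30 Nov. 2023\tRelease of Build 996001683, replacing Build 996001608.\n"
    "07 Sep. 2023\tRelease of Build 996001608, replacing Build 996001601.\n"
    "31 Aug. 2023\tFirst release of this document (Build 996001601).\n"
)

# Explicit month map so parsing does not depend on the system locale.
MONTHS = {m: i + 1 for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])}


def parse_history(text):
    """Return a list of (release_date, build_number) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        date_part, description = line.split("\t", 1)
        day, month, year = date_part.replace(".", "").split()
        released = date(int(year), MONTHS[month], int(day))
        # The first "Build <digits>" in each row is the build being released.
        build = int(re.search(r"Build (\d+)", description).group(1))
        rows.append((released, build))
    return rows


def latest_build(rows):
    """Build number of the most recently dated row."""
    return max(rows)[1]


def short_build(build):
    """Last three digits, e.g. 996001683 -> 683."""
    return build % 1000


rows = parse_history(RELEASE_HISTORY)
print(latest_build(rows), short_build(latest_build(rows)))
```

Comparing an installed build against the table is then a plain integer comparison, for example `installed >= latest_build(rows)`.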

K_R_V · ‎2024-01-08

Fixed in R81.10.08 (996001690), but was proved by TAC.

Spark 1530 Cluster upgrade ( R81.10.00 - R81.10.07 )-> kernel crash