K_R_V
Collaborator

Spark 1530 cluster upgrade (R81.10.00 -> R81.10.07) -> kernel crash

We upgraded 2 Spark 1530 clusters and experienced the same behavior on both.

What we did:

We upgraded the standby firewall FW2 from R81.10.00 to R81.10.07. After rebooting, FW2 had the outgoing policy; after a policy push, the correct policy was installed and the firewall was in STANDBY state.
So far so good.
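
(For reference, this is roughly how we verified the upgraded member; these are standard Check Point CLI commands run from expert mode, and the exact output on the 1500 series Embedded Gaia may look slightly different.)

# Installed firmware version and build
fw ver

# Confirm a real policy is loaded (not just the initial policy)
fw stat

# Cluster membership state; the upgraded member should report STANDBY
cphaprob state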

Cluster 1:

We then upgraded FW1. The firewall was constantly rebooting; it responded to ping and SSH for about 10 seconds before rebooting again. Console output:

INIT: version 2.88 booting
INIT: Entering runlevel: 3
Booting CRASH kernel...

After a factory reset and a fresh upgrade, the cluster was back online.

Cluster 2:
We initiated a failover to FW2 with the cluster down script, and FW2 became the active firewall. At that moment, FW1 crashed and rebooted, over and over again. In the 10-second window it was reachable, I managed to log in and do a "fw unloadlocal".
This prevented the crashing, and I was able to upgrade the device. After the upgrade and policy install, the cluster was OK again.
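
In case it helps anyone hitting the same crash loop, the sequence looked roughly like this; I'm using clusterXL_admin as the "cluster down script" here, and whether the 1500 series exposes it under exactly that name is an assumption on my part:

# On FW1 (still active), force the failover to FW2
clusterXL_admin down

# FW1 then went into its crash/reboot loop. During the ~10 seconds it answered SSH,
# unload the local policy so it stops crashing
fw unloadlocal

# With no policy loaded the box stayed up; run the upgrade, install policy from
# the management server, then clear the admin-down state if it is still set
clusterXL_admin up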

Are we doing something wrong here? Has anybody experienced the same behavior? Is this a known issue?

We have 30 more clusters to upgrade!

14 Replies
G_W_Albrecht
Legend

Contact TAC to get this resolved!

CCSE CCTE CCSM SMB Specialist
K_R_V
Collaborator

A TAC case has been created, but in my experience with TAC, this will take some time and frustration (a discussion for another time! 😁)

We'll see where it goes, but I'm just wondering if somebody else has had the same issue...

the_rock
Legend

You are 100% right, but personally, if I were you, I would see if the issue can be escalated. At this point, no customer would feel comfortable continuing the process, especially considering the seriousness of the issue you experienced with the first cluster.

Just my opinion.

Andy

K_R_V
Collaborator

Feedback from TAC, and case closed 🙄

The current occurrence is quite rare and happening on Centrally Managed SMB appliances
due to a policy issue where the policy is not installed on the gateway/cluster after the upgrade.

We do suggest performing the upgrade and once the upgrade performed, installing the policy
in order to avoid a roll-back to previous versions.

Is there anything else we can assist you with?

the_rock
Legend

If I were you, I would escalate that issue through your sales person; that's totally unacceptable. That answer logically does not make much sense... let's think about it for a second. There are probably thousands of customers who do upgrades on regular Gaia devices; if the same issue occurred, would they blame it on the fact that the right policy was not installed?!

Andy

G_W_Albrecht
Legend

According to your description, the answer is unacceptable: in cluster 2 the issue happened after the failover and before the upgrade.

CCSE CCTE CCSM SMB Specialist
the_rock
Legend

Personally, the answer @K_R_V was given by TAC makes no sense. I can't logically see how any of that (whether it be the failover or the crash itself) could be caused by policy.

Andy

PhoneBoy
Admin

Note that it's required to perform a policy install after doing a major upgrade (e.g. from R80.20.xx to R81.10.xx).
I've never seen it be necessary when upgrading within the same major release.
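
For centrally managed gateways the install can also be scripted from the management server with the Management API; a minimal sketch, where the policy package name "Standard" and the cluster object name "Spark-Cluster-1" are placeholders for your own names:

# Run on the management server; -r true logs in as the local admin
mgmt_cli -r true install-policy policy-package "Standard" targets.1 "Spark-Cluster-1"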

Jones
Collaborator

I'm experiencing a similar issue when upgrading a 1600 centrally managed cluster from R80.20.50 to R81.10.08 or from R81.10.05 to R81.10.08. The first unit went fine, but the second unit keeps crashing the kernel. I ended up reverting back to R80.20.50.

Did you find any solution for this issue for your other cluster upgrades?

K_R_V
Collaborator

The other clusters have not yet been upgraded due to some issues with VMAC in R81.10.05, .07 and .08. This should be fixed in R81.10.10, so I'm waiting for the GA version.

The upgrade issue is not in R81.10.00, so maybe you can upgrade to that version.

Is your cluster also using VMAC?
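
If you want to check whether VMAC is in play on your side, this is the standard ClusterXL check; I'm assuming the Embedded Gaia on the 1500/1600 series exposes the same kernel parameter:

# 1 = VMAC mode enabled, 0 = disabled
fw ctl get int fwha_vmac_global_param_enabled

# Interface view of the cluster; with VMAC enabled the virtual MAC is listed
cphaprob -a if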

Jones
Collaborator

I upgraded to R81.10.05 and that went fine, but upgrading to R81.10.08 reverted me back to R80.20.50, so I stayed there. I'm not using VMAC, so this must be something else.

K_R_V
Collaborator

Upgrades with version R81.10.08 - Build 616 do not have this issue; the problem seems to be resolved in this version!

Jones
Collaborator

According to sk181079, Build 683 is the latest and replaced 608.

Date          Description
30 Nov. 2023  Release of Build 996001683, replacing Build 996001608.
07 Sep. 2023  Release of Build 996001608, replacing Build 996001601.
31 Aug. 2023  First release of this document (Build 996001601).
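
To see which build is actually running on the appliance, the version string includes the build number; I'm assuming the 1500/1600 series prints it the same way as other gateways:

# Prints the installed release and build number
fw ver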
K_R_V
Collaborator

Fixed in R81.10.08 (996001690); this was confirmed by TAC.

