To summarize our attempt:
2x MHO-140 Orchestrators, running R80.20SP Take 295
4x CPAP-SG6500 appliances in Security Group 1, running R80.20SP Take 295 + Memleak Portfix.
We started by upgrading the MHO-140s to R80.20SP Take 317, then upgraded them to R81.10. This went flawlessly, with no issues whatsoever.
Then we moved on to the gateways. There are four in total (1_01, 1_02, 1_03 and 1_04), and we decided to start with half of them. We ran clusterXL down on 1_01 and 1_02. We then removed the memleak portfix and rebooted, installed Take 315 and rebooted, and installed the R80.20SP Upgrade hotfix and rebooted again. Finally, we upgraded the Deployment Agent and imported the R81.10SP upgrade package.
Everything was fine up to this point, and the upgrade itself went well. However, when 1_01 and 1_02 booted after the upgrade, they pushed 1_03 and 1_04 into Ready state and all traffic was lost. It fixed itself after a few minutes. 1_01 and 1_02 were in Down state, as expected.
Next, we changed the gateway object in SmartConsole from R80.20SP to R81.10, ran the mgmt_cli command, and then ran the sp_upgrade script. This is when the issues started: the script was unable to fetch policy from the management server. The script gives no information about what it is doing, where it might be failing, or why. This makes troubleshooting very frustrating and difficult.
After several unsuccessful attempts to get the policy installed on the upgraded gateways, we had to roll back using snapshots.
Afterwards we discovered sk174844. It almost seems as if this SK is mandatory. It states that it applies to R80.20SP, R80.30SP and R81SP, so in other words it is mandatory regardless of the environment? We find it strange that neither the R81.10 Maestro Admin Guide nor the original upgrade SK (sk173363) references it. sk174844 is dated 2021-08-03, which means it was created weeks before the hotfix for R80.20SP, R80.30SP and R81SP that was required to perform the upgrade in the first place.
Given that timeline, the absence of any reference to sk174844 in the admin guide or in sk173363 is all the more puzzling. It seems our upgrade was doomed to fail from the get-go: we had no information about sk174844, so the policy fetch was never going to work?
Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME