
CPUSE will fail to install new Jumbo on restored gateway

Just ran into an interesting scenario with CPUSE failing to install take 203 on the very last gateway (nearly 40 updated without any issues). I won't be creating a TAC case, out of pure laziness and having too much to do as it is.

The DA agent version is 1677, so all good there, and the gateway had take 154 installed before the attempt to upgrade to 203.

It turned out that this particular box had recently been fully rebuilt from the factory image due to an SSD failure (the second SSD dying on 5900 appliances! Not a good trend there). So we went R77.30 > R80.10 > take 154 > backup restore. All went great and the box was running like a charm.

But now, when I attempted to install take 203, it failed at a very early stage with the following error:

cpuse error.jpg

Digging into the more detailed logs, I found that CPUSE was looking for an older file that was not there (/opt/CPInstLog/install_cpfc_wrapper_HOTFIX_R80_10_JUMBO_HF.log).

detailed error.jpg

So I compared the contents of the deployment agent backup directory, /opt/CPda/backup/, on both cluster members.

This was the restored node:

cpda fw1.jpg

And this was the secondary, which was still in its "original" state:

cpda fw2.jpg

OK, a bunch of archives were missing.

Then it clicked: when we restored the box from backup, we did not install all the jumbo HFs that had originally been installed over time, but went straight to the latest one, take 154, which was running on the node when the backup was taken.

So the quick fix was simply to copy all the missing archives from /opt/CPda/backup on the "original" node to the restored one, after which the take 203 installation succeeded.
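
For anyone who wants to script the comparison rather than eyeball two directory listings, here is a minimal Python 3 sketch. It assumes the contents of /opt/CPda/backup from both nodes are reachable as two local directories (for example, copied to a staging machine first). The function names and the dry-run default are my own illustration, not anything CPUSE provides.

```python
from pathlib import Path
import shutil


def missing_archives(reference_dir, restored_dir):
    """Return names of files present in reference_dir but absent from restored_dir."""
    reference = {p.name for p in Path(reference_dir).iterdir() if p.is_file()}
    restored = {p.name for p in Path(restored_dir).iterdir() if p.is_file()}
    return sorted(reference - restored)


def copy_missing(reference_dir, restored_dir, dry_run=True):
    """Copy files that exist only in reference_dir into restored_dir.

    With dry_run=True (the default) nothing is copied; the names that
    would be copied are just returned for review.
    """
    names = missing_archives(reference_dir, restored_dir)
    if not dry_run:
        for name in names:
            # copy2 preserves timestamps, which helps when auditing later
            shutil.copy2(Path(reference_dir) / name, Path(restored_dir) / name)
    return names
```

Calling `copy_missing(ref, restored)` first lists what is missing; repeating it with `dry_run=False` performs the copy. It only compares file names, so it will not notice archives that exist on both nodes but differ in content.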

It might be a known issue, but there is definitely room for improvement in CPUSE for the case where you restore from a backup instead of a snapshot.

14 Replies
Admin

Just discussed this during the User Group meeting in Tallinn at the beginning of this week, and before that, a couple of months ago, in Zurich.

The CPUSE repository is in the /var/log partition. This part of your filesystem is not backed up by either the snapshot or the backup tool. When restoring from one of those, you are likely to lose at least some of the downloaded packages.

It looks to me like in your case CPUSE fails after the backup restore because some of those packages are missing.


Valeri, the backup restore succeeds, but the next time we try to install a JHF on such a box, it fails. You must have misunderstood the problem.

Admin

No, I did not, but I edited my previous comment to be more accurate. It will fail because some package info is missing, and I explained why.

Admin

LOL, @Kaspars_Zibarts, you basically explained it yourself, I should read till the end before answering.

Anyhow, this is not a bug, this is expected behaviour, but not many people know that. 


I would disagree with that. So you are saying that if you do disaster recovery using an OS reinstall followed by a backup restore, you won't be able to install new JHFs?

Sapphire

This comes down to the CPUSE mechanism, which depends on backups in a directory/partition that is not included in snapshots or backups. The new HF-over-HF method may help a lot here, as an uninstall is no longer needed when installing a higher version.

Sapphire

You are forced to do an OS reinstall, an HF/JT reinstall, and a restore...


Yeah, but you don't want to reinstall every single JHF that existed on the gateway before it crashed, as there can be rather many over 1-2 years. You want to install the latest one that was there when the backup was taken, and then CPUSE should take care of any "missing" old JHF info, which is irrelevant anyway. I understand that it can be tricky with custom HFs, but not with generic Jumbos: those should be handled by CPUSE without jumping through hoops.
I checked sk91400 and sk108902 but cannot find anything that says that when you restore onto a fresh OS install, you must reinstall all the old JHFs in the exact order in which they were previously installed on that GW.

Admin

@Kaspars_Zibarts I hate to say that, but I think you are wrong here.

You have mentioned https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

It says in the limitation section: "Restore is only allowed using the same Gaia version on the source and target computers"

That means that before pulling the backup, you need to restore the exact combination of binaries, i.e. vanilla plus HFAs. If you do not do that, unexpected results are bound to happen.

Another article, https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut... describes the optimal combination of snapshots and backups, which would allow you to avoid most of the hassle when restoring GWs.

Admin

I think there's a difference here between expected behavior (which you document) and optimal behavior 🙂

As said earlier, I just think there's room for improvement. 🙂 In our case we only had local snapshots, which died when the SSD failed, so the only option was a fresh install along with JHF reinstallation and a backup restore. You always want a system that is as simple as possible to restore in critical conditions. But we can park the case now.

Admin

@Kaspars_Zibarts, I agree with you, there is room to improve the built-in disaster recovery tools.

On the other hand 🙂 Just yesterday, during the CheckMates Live event in Athens, I was drilling into the guys that any critical backup should be taken off the box, snapshots included, specifically to cover cases when an SSD/HDD fails. It is not straightforward in the case of snapshots, I know.

Admin

Seems like something we can improve upon. @Tsahi_Etziony 


We had the same issue on an MDM 80.20 restored from backup. TAC had to modify cpregistry (after a month) so that we could install the latest JHF. I guess there is indeed room for improvement here.
