Marcel_Wildenbe
Contributor

upgrade to R80.20 failed

Hi CheckMates,

Last night I tried to upgrade our MDS from R80.10 to R80.20.

I ran into a few issues, but the most aggravating came when the installer got stuck and I had to reboot to get any further. The snapshot made by the installer was not removed, and a new attempt now tells me there is not enough free space.

CP support tells me to run an MDS export, do a fresh install and import, but I would like to avoid the hassle and just remove the LV.

Can I remove this Logical Volume and if so, how do I do that?

It is Gaia running on VMware 5.5, so it uses LVM for snapshots. "show snapshots" shows no snapshots, but lvm_manager shows lv_fcd_new of 300 GB, non-configurable, containing "Factory defaults volume", which was not present prior to the upgrade.

31 Replies
JozkoMrkvicka
Authority

Try to check Snapshot Management from WebUI.

Kind regards,
Jozko Mrkvicka
0 Kudos
Marcel_Wildenbe
Contributor

I did that, and it is the equivalent of "show snapshots" in CLISH.

No snapshots available and no free space available

0 Kudos
JozkoMrkvicka
Authority

What about "delete snapshot <TAB>" from clish? Nothing?

Do you get any options with "set snapshot <TAB>"?

Some articles regarding Snapshot management:

Snapshot location on Gaia and SecurePlatform 

Managing partition sizes via LVM manager on Gaia OS 

How to add hardware resources, such as log storage, to a VMware Virtual Machine running Gaia OS 

https://community.checkpoint.com/thread/6448-lvmmanager-successor-on-r8010 

Kind regards,
Jozko Mrkvicka
0 Kudos
Marcel_Wildenbe
Contributor

Hi Jozko,

No options are available with "delete snapshot <TAB>".

"set snapshot <TAB>" offers import, export and revert. There are no options after export or revert, so no valid snapshot is present.

In expert:

# lvs --all
  LV         VG       Attr   LSize   Origin Snap%  Move Log Copy%
  lv_current vg_splat -wi-ao 300.00G
  lv_fcd_new vg_splat -wi-a- 300.00G
  lv_log     vg_splat -wi-ao   3.44T

It is the second LV, with attributes "-wi-a-", so without the "o", which stands for "open".
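To make the attribute reading above concrete, here is a minimal Python sketch that parses the `lvs` output from this post and reports which volumes are active and open. It relies on the standard `lvs` attribute layout, where the fifth character is the state ("a" for active) and the sixth is the device-open flag ("o"). The function name `parse_lvs` and the `SAMPLE` text are illustrative only, and "not open" only means nothing currently holds the device; it is not by itself a statement that the volume is safe to remove.

```python
from typing import Dict, List

# Output copied from the post above; on a real system it would come from `lvs --all`.
SAMPLE = """
  LV         VG       Attr   LSize   Origin Snap%  Move Log Copy%
  lv_current vg_splat -wi-ao 300.00G
  lv_fcd_new vg_splat -wi-a- 300.00G
  lv_log     vg_splat -wi-ao   3.44T
"""


def parse_lvs(output: str) -> List[Dict[str, object]]:
    """Parse the first four columns of `lvs` output and decode two attr flags."""
    rows: List[Dict[str, object]] = []
    for line in output.strip().splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "LV":
            continue  # skip the header and any short or empty lines
        name, vg, attr, size = fields[:4]
        rows.append({
            "lv": name,
            "vg": vg,
            "attr": attr,
            "size": size,
            "active": attr[4] == "a",  # 5th attr character: state
            "open": attr[5] == "o",    # 6th attr character: device open
        })
    return rows


for row in parse_lvs(SAMPLE):
    state = "open" if row["open"] else "not open"
    print(row["lv"], row["size"], state)
```

With the sample above, only lv_fcd_new is reported as not open, which matches the observation in the post.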

0 Kudos
PhoneBoy
Admin

While I've heard of customers doing an in-place CPUSE upgrade on their MDS, there can be issues in doing so, particularly if the MDSes also contain a fair amount of logs relative to the available free disk space.

I'm guessing that's what happened here. 

I would expect that TAC would be able to help you manually remove the logical volume that got created here.

Note: one reason to do a migrate export/import in this case: by doing so, you can do a fresh install of R80.20 and leverage the new xfs filesystem, which offers better performance.

Kaspars_Zibarts
Employee

That was a very important point, Dameon! You get xfs by doing export/import! I did lab testing using CPUSE and realized only afterwards that I'm missing out on the new file system!

0 Kudos
Marcel_Wildenbe
Contributor

Thanks Dameon, for your reply. Valid point you mention, regarding the xfs filesystem.

The free disk space needed for a snapshot is not related to the amount of disk space occupied by the logs, imho. The logs are written to the lv_log Logical Volume, whereas the snapshot creates its own Logical Volume. They live side by side on the physical disk, regardless of their content. Please correct me if I am wrong.

Anyway, I will get in touch with Boaz. Hopefully he can help me out.

Fun fact: I did run a "dry run" of the upgrade on a restored snapshot of the original MDS on a different VM and it worked like a charm... That made me confident to do the actual upgrade in production.

0 Kudos
Boaz_Orshav
Employee

Hi Marcel

Sorry to hear you had a problem during installation.

We have worked hard on reducing upgrade failures, and I can say that on the latest versions (80.10+80.20) we have an over 99% success rate.

Note that rebooting the machine while the installer is "stuck" is very problematic, since you might end up in a situation where the new partition was created but not yet defined as a snapshot.

I guess this is what happened on your machine, and that's why you could not run snapshot management commands to remove it.

In order to analyze this and make sure it won't happen to others (and for sure won't happen in the next version), I would need some more information, so I will contact you offline to investigate further.

The target is 100% success, so if you encounter any problem please contact us and we'll solve the issue.

0 Kudos
Marcel_Wildenbe
Contributor

Hi Boaz,

Thanks for reaching out. I will reply to your mail a.s.a.p. I am aware that the reboot probably got me into this predicament. Hopefully we can work this out.

0 Kudos
JozkoMrkvicka
Authority

After how many minutes (or hours) did you reboot? What was the overall progress when it got stuck?

Kind regards,
Jozko Mrkvicka
0 Kudos
Kaspars_Zibarts
Employee

Mine didn't get stuck, but it took a good 4 hours in the lab (which is rather underpowered: 8 cores, 24 GB RAM and RAID 5).

0 Kudos
Marcel_Wildenbe
Contributor

I did a dry run of the upgrade on a snapshot of the production database, and it was much quicker than the real thing.

That is why I stopped it (after 36 minutes).

0 Kudos
Marcel_Wildenbe
Contributor

I would like to go back to the initial question: "Can I remove this Logical Volume and if so, how do I do that?"

0 Kudos
Marcel_Wildenbe
Contributor

Boaz (from Check Point) has successfully helped me remove the Logical Volume.

I am going to re-schedule a service window to upgrade.

Marcel_Wildenbe
Contributor


Last night we made a new attempt to upgrade the MDS, under the supervision of Boaz Orshav (R&D) and colleagues.
The upgrade failed: after it completed, one of the five Management Domains did not come up. MDSSTAT was showing all the Management Servers as up,
but that domain was not up in SmartConsole (MDS), and its Management Server did not receive any logs.
So we reverted to the AutoSnapshot created at the start of the procedure.
0 Kudos
Kaspars_Zibarts
Employee

What did you end up doing? CPUSE, or export and import on a fresh filesystem?

0 Kudos
Marcel_Wildenbe
Contributor

CPUSE

Amir_Rehman
Contributor

My CPUSE upgrade fails.

I am attempting to upgrade an SM from R80.10 to R80.20. However, I get an error during the DB import process.

The DB export runs successfully to 100%, the server reboots, and then the import process starts.

It reverts to R80.10 automatically when the DB import fails at 50%.

Any ideas, please?

Import_process.PNG

 

automatic_revert.PNG

0 Kudos
Maik
Advisor

@Amir_Rehman, did you run the pre-upgrade verifier before you started the actual upgrade? Errors during the database import in particular could be related to issues that the verifier should detect in the first place.

0 Kudos
Amir_Rehman
Contributor

No, I did not.

I think it is only required when you upgrade from R77.30 (or lower) to R80.20, as per the CP documentation.

0 Kudos
Tal_Paz-Fridman
Employee

Hi Amir,

 

In order to investigate the issue it would be helpful if you could zip and send me all the files in /opt/CPInstLog/ 
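As a rough illustration of the request above, the following Python sketch zips a log directory and lists its largest files, which can help decide what to send when the folder is big. This is only a sketch: it assumes a Python interpreter is available on the machine where you run it, and the names `zip_directory` and `largest_files` are hypothetical helpers, not Check Point tools. Write the archive outside the source directory so it does not try to include itself.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def zip_directory(src_dir: str, archive_base: str) -> str:
    """Zip everything under src_dir; returns the path of the created .zip file.

    archive_base is the output path without the ".zip" extension.
    """
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)


def largest_files(src_dir: str, count: int = 10) -> List[Tuple[int, str]]:
    """Return up to `count` (size_in_bytes, path) pairs, largest first."""
    files = [
        (p.stat().st_size, str(p))
        for p in Path(src_dir).rglob("*")
        if p.is_file()
    ]
    return sorted(files, reverse=True)[:count]
```

For example, `zip_directory("/opt/CPInstLog", "/var/tmp/CPInstLog_bundle")` would produce /var/tmp/CPInstLog_bundle.zip; the /var/tmp destination is just an assumed location with enough free space.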

 

Thanks

Tal

tfridman@checkpoint.com

 

 

0 Kudos
Amir_Rehman
Contributor

 

 

The folder is more than 2 GB. Is there any specific file I can share with you? Log_file_1.PNG

0 Kudos
lucafabbri365
Collaborator

Hello @Amir_Rehman,
I have a similar issue: the database import process is stuck at 48%. It has been there for more than 45 minutes, and no revert operation has started yet. Did you manage to solve the issue?

Thank you,
Luca

0 Kudos
Tal_Paz-Fridman
Employee

Hi Luca,

 

Our investigation has revealed that in Amir's case it is related to a problem in the Endpoint Security Management Server.

Did you have it enabled before the upgrade?

If not, please send me the contents of your /opt/CPInstLog/ folder.

Thanks

Tal

tfridman@checkpoint.com

0 Kudos
Amir_Rehman
Contributor

Hi,

Sorry I did not update the forum earlier.

My problem was Endpoint Policy Manager. I disabled the EPM blade and the upgrade worked fine.

 

 

0 Kudos
lucafabbri365
Collaborator

Hello,

I don't have Endpoint Policy Manager; however, after some time the database import process completed successfully. I just had to be patient.

Thank you,
Luca

0 Kudos
venkata_marutur
Contributor

Hello Luca

Just curious, how long was it "stuck at 48% while importing" in your case? I am in the same boat at the moment.
Thanks.
0 Kudos
lucafabbri365
Collaborator

Hello Nickel,

It was about 10 minutes.

 

Bye,
Luca

0 Kudos
Jeissonsr
Explorer

Hi,

I have the same issue, how did you solve it?

Thanks

0 Kudos
