Solved: Re: Clearing disk space on R80.40 Management Server

johnnyringo · ‎2022-12-26

So over holiday break I'm tasked with prepping the upgrade of our Management Server from R80.40 to R81.10. This is a cloud-based deployment in GCP which does not support upgrade in place. I have to create a new R81.10 VM, migrate the database over, reset SIC on all gateways, and transfer the license key. Doesn't sound too bad.

The problem is that a snapshot on the old server requires about 18 GB of free disk space, and we currently only have 6. I've been looking for a knowledge base article on this but, surprisingly, have not found anything. I did find some files that were safe to delete:

rm -f /opt/CPInstLog/DeploymentAgent.log.*

This cleared about 2 GB but I've still got a ways to go. I'll probably have to open a support case, but are there any quick fixes I may have missed?

PS - In GCP, the deployment template uses a default size of 100 GB for the management server. I'd really recommend making this at least 120 GB, since our usage is typical of a medium-size deployment. Going forward, we'll be deploying the management server via Terraform and specifying a 200 GB disk to ensure we don't have this problem in the future.

the_rock · ‎2022-12-26

I personally would do something like this from expert mode: find /var/log -size +300000000c (this searches /var/log for files larger than 300 MB, but it can be applied to any size and any directory).
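
A variation on the find command above, as a rough sketch: it takes the threshold in MiB, stays on one filesystem with -xdev, and sorts the hits by size. The function name big_files is made up for illustration, and the sketch assumes GNU find (for -printf) is what expert mode provides.

```shell
#!/bin/bash
# List regular files under a directory that exceed a size threshold,
# largest first, without crossing into other mounted filesystems.
# Usage: big_files <dir> <min_size_in_MiB>
big_files() {
    local dir="$1" min_mib="$2"
    # -xdev keeps find on one filesystem; the trailing "c" means bytes.
    find "$dir" -xdev -type f -size +"$(( min_mib * 1024 * 1024 ))"c \
        -printf '%s\t%p\n' 2>/dev/null | sort -rn
}

# Example (expert mode): files over 300 MiB on the /var/log partition
# big_files /var/log 300
```

Each output line is the size in bytes, a tab, and the path, so the biggest candidates appear at the top.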

By the way, this is what TAC sent us when a customer raised concerns about scaling up resources on Azure. I assume the same would apply to other cloud platforms, but I'm not 100% positive.

>>In order to increase all the parameters (disk space, memory and CPU), I would recommend the procedure described in the following SK:
https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

>>The above SK shows that you can pick the "size" you want and preset the CPU, HDD and memory according to the image selected.
To see what sizes are supported, you may have a look at the following SK:
https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

>>In order to just increase the disk space, you may follow the SK linked below:

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

Note: We support in-place upgrades for Azure gateways now, but not for the mgmt server. So, as an alternative to the above procedure for the mgmt server specifically, you may consider redeploying a bigger R81.10 VM with the required HDD, CPU and memory.

johnnyringo · ‎2022-12-27

Ok, the steps in the third link show why my options are limited. The root partition is only 21.5 GB but I need 18 GB free for the snapshot, so increasing the disk is the only way to do this.

# parted -l

Model: Google PersistentDisk (scsi)
Disk /dev/sda: 107GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name  Flags
 1      17.4kB  315MB   315MB   ext3                  boot
 2      315MB   8902MB  8587MB  linux-swap(v1)
 3      8902MB  107GB   98.5GB                        lvm


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_splat-lv_log: 46.2GB
Sector size (logical/physical): 512B/4096B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0.00B  46.2GB  46.2GB  xfs


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/vg_splat-lv_current: 21.5GB
Sector size (logical/physical): 512B/4096B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0.00B  21.5GB  21.5GB  xfs

johnnyringo · ‎2022-12-28

Started testing the disk size increase today in lab and created a blog post to walk through the steps:

https://layer77.net/2022/12/28/re-sizing-the-disk-of-a-checkpoint-r80-40-management-server-in-gcp/

It's worth noting that for Snapshots, GAIA actually uses unmounted free disk space. So simply deleting files from / or /var/log is pointless.

the_rock · ‎2022-12-28

All good points @johnnyringo

_Val_ · ‎2022-12-29

It is not "unmounted"; it is a hidden partition, which, yes, is used for snapshots.

If you are interested in creating a local snapshot, you are right, cleaning /var/log/ will not help. However, if you want to export your snapshot at some point and save it to an external server, then yes, you need enough space in /var/log/ to get it out.
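
To check both kinds of space distinguished here, something like the following might help. It is only a sketch: it assumes the snapshot space corresponds to unallocated space in the LVM volume group (vg_splat in the parted output earlier in the thread) and that the standard LVM vgs tool is available in expert mode. parse_vg_free is a hypothetical helper name.

```shell
#!/bin/bash
# Print "<vg_name> <free_GiB>" for each volume group.
# Reads `vgs` output on stdin so the parsing can be tried without LVM access.
parse_vg_free() {
    awk '{ gsub(/[gG]$/, "", $2); printf "%s %s\n", $1, $2 }'
}

# Usage on the server (assumption: `vgs` is on the PATH in expert mode):
#   vgs --noheadings --units g -o vg_name,vg_free | parse_vg_free
# Space for exporting a snapshot or a migrate_server file is a separate
# question, answered by the mounted filesystem:
#   df -Pk /var/log
```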

johnnyringo · ‎2022-12-29

Hmmm...yeah, we're still way down the rabbit hole on this one. I should have known to expect nothing less from Checkpoint.

Anyway, would it be correct to state the upgrade w/ VM migration requirements as the following?

1. The upgrade process requires creating an exported R81.10 snapshot on the old R80.40 server.
2. This image must be created using $FWDIR/scripts/migrate_server export -v R81.10
3. The image must be created and stored temporarily on the old server on a mounted volume. This doesn't necessarily have to be /var/log, but people are using /var/log because it's an already mounted partition that has the best chance of having enough free space.
4. If it's not possible to clear space on /var/log for the snapshot, a new partition could be created and mounted at /snapshots or whatever (I can't find specific instructions on this, but would assume it's possible).
5. On the new R81.10 server, create /var/log/mdss.json with the old hostname and new IP address of the mgmt server.
6. After creating the R81.10 snapshot, copy it to the new R81.10 server via SCP or FTP. Then import it using $FWDIR/scripts/migrate_server import
7. Reset SIC on the gateways to sync them up with the new R81.10 management server.
8. If using BYOL, modify the existing license to use the new IP address and apply.

PhoneBoy · ‎2022-12-29

Just to be clear, a snapshot (which is a copy of the root filesystem) and migrate_server export output are very different things.
However, the basic procedure you outline above is more or less correct.
You can mount a separate filesystem to put the migrate_server export output onto if /var/log doesn't have enough space.
This can be done with the mount command in expert mode (similar to a regular Linux server).

Note that you shouldn't have to reset SIC on upgrade, a policy install from the new MDS should be sufficient.
Depending on whether or not you're changing the IP address in the process, you may need to add some manual rules to allow the policy push to succeed from the new MDS.
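
The corrected flow can be laid out as a dry run. This sketch only prints the commands and does not execute them. The export line is the form quoted elsewhere in this thread; the import arguments are an assumption that mirrors the export syntax, and the address 192.0.2.10 and the admin account are hypothetical placeholders.

```shell
#!/bin/bash
# Print (do not run) the migration steps discussed in this thread.
# Usage: migration_plan <target_version> <export_file> <new_mgmt_address>
migration_plan() {
    local version="$1" export_file="$2" new_mgmt="$3"
    cat <<EOF
# 1. On the old R80.40 server: export the database for the target version
\$MDS_FWDIR/scripts/migrate_server export -v ${version} ${export_file}
# 2. Copy the export to the new server (account name is a placeholder)
scp ${export_file} admin@${new_mgmt}:${export_file}
# 3. On the new server: import (argument order assumed to mirror export)
\$MDS_FWDIR/scripts/migrate_server import -v ${version} ${export_file}
# 4. Install policy from the new server; reset SIC only if that fails
EOF
}

migration_plan R81.10 /var/log/cpmgmt-r8110-database.tgz 192.0.2.10
```

Printing the plan first makes it easy to review paths and versions before touching either server.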

johnnyringo · ‎2022-12-29

Right, but do you see the problem here? ./migrate_server doesn't provide an estimate on how large the output file will be, so I have no way of knowing how much space I need to clear.

I'd assume the output of ./migrate_server export is about the same as a snapshot?

RamGuy239 · ‎2022-12-29

$MDS_FWDIR/scripts/migrate_server export, as long as you don't add -l or -x to have logs or indexing data included, will normally be 250 MB to 1.5 GB in size, in my experience. I've done a ton of advanced upgrades, and I have yet to see a compressed export of the database larger than 8 GB. And that 8 GB export was using the old $FWDIR/bin/upgrade_tools/migrate export, not $MDS_FWDIR/scripts/migrate_server export, which tends to be more efficient.

It all depends on your database, of course. The number of policy packages, IPS profiles, network objects, etc. will affect the size, but you will most likely be somewhere between 250 MB and 2 GB. During the export, however, it has to save all the files temporarily before compressing, so you need more than 2 GB free for the export to succeed.

Shouldn't be any need to involve lv_current/root. You can specify the location, and the recommendation is to use a path on lv_log as lv_log tends to have more space compared to lv_current, especially on management installations:

$MDS_FWDIR/scripts/migrate_server export -v R81.10 /var/log/cpmgmt-r8110-database.tgz

A simple "df -hT" will show you how much space you have left on the partitions. If you don't have enough space on lv_log aka /var/log you can most likely remove some log files from $FWDIR/log/ (/var/log/CPsuite-R80.40/fw1/log). There should be tons of *.log, *.logaccount_ptr and *.loginitial_ptr you can safely remove to free up space.
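
Before removing anything, it may be worth measuring how much the older files actually occupy, especially if logs have to be retained. A minimal sketch, assuming GNU find and awk; old_log_usage is a hypothetical helper name and the 90-day cutoff in the example is arbitrary.

```shell
#!/bin/bash
# Report how much space log files older than N days occupy in a directory,
# so the effect of a cleanup can be judged before anything is deleted.
# Usage: old_log_usage <log_dir> <days>   (prints total size in KiB)
old_log_usage() {
    local dir="$1" days="$2"
    find "$dir" -maxdepth 1 -type f -mtime +"$days" \
        \( -name '*.log' -o -name '*.logaccount_ptr' -o -name '*.loginitial_ptr' \) \
        -printf '%s\n' | awk '{ sum += $1 } END { printf "%d\n", sum / 1024 }'
}

# Example (expert mode): old_log_usage "$FWDIR/log" 90
```

The function only reads file sizes; deleting or archiving the matches remains a separate, deliberate step.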

Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME

the_rock · ‎2022-12-29

Good point, but generally, migrate_server should not be anywhere close in size to the snapshot.

PhoneBoy · ‎2022-12-30

A volume snapshot is the size of the root partition (actual space used).
migrate_server output is significantly smaller.
However, because of how the file is assembled, additional space will be required above and beyond the final compressed export file.
Assuming you're not including logs as part of the export, a very conservative estimate for space required is...half the size of your root filesystem.
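
That rule of thumb is easy to turn into a quick check. The sketch below assumes "size of your root filesystem" means the total size reported by df, and the helper names are invented for illustration. With the df figures posted elsewhere in this thread (root 20961280 1K-blocks, 17506988 available on /var/log), half of root is 10480640 KB, roughly 10 GiB, which fits in the roughly 16.7 GiB free on /var/log.

```shell
#!/bin/bash
# Rule of thumb from this thread: reserve half the size of the root
# filesystem for an export without logs. Sizes are 1K blocks (df -Pk).
required_kb() { echo $(( $1 / 2 )); }                        # arg: root size
fs_size_kb()  { df -Pk "$1" | awk 'NR == 2 { print $2 }'; }  # total size
fs_avail_kb() { df -Pk "$1" | awk 'NR == 2 { print $4 }'; }  # free space

# Succeeds when <target_dir> has room for an export sized from <root_dir>.
# Usage: enough_space <root_dir> <target_dir>
enough_space() {
    local need avail
    need=$(required_kb "$(fs_size_kb "$1")")
    avail=$(fs_avail_kb "$2")
    echo "need ${need} KB, have ${avail} KB on $2"
    [ "$avail" -ge "$need" ]
}

# Example (expert mode): enough_space / /var/log
```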

johnnyringo · ‎2023-01-02

Assuming you're not including logs as part of the export, a very conservative estimate for space required is...

We do need to migrate the logs for compliance reasons. This is where I'm just trying to get a ballpark number. I don't see any option to filter (e.g. only include logs from the last 60 days).

PhoneBoy · ‎2023-01-02

migrate_server can be used to migrate logs.
However, they can (and, in the vast majority of cases, should) be migrated separately.
You can simply copy the files from $FWDIR/log over to the target system with sftp/scp or similar.
Once you’ve done so, you will need to reindex the logs using: https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...
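
Copying the files by hand also gives a way to apply the date filter that migrate_server lacks. A small sketch, assuming GNU find; recent_logs is a hypothetical helper, and NEW_MGMT, the admin account, and the staging directory in the usage comment are placeholders.

```shell
#!/bin/bash
# List log files modified within the last N days, as a candidate set
# for copying to the new server.
# Usage: recent_logs <log_dir> <days>
recent_logs() {
    local dir="$1" days="$2"
    find "$dir" -maxdepth 1 -type f -name '*.log' -mtime -"$days" | sort
}

# Example (expert mode on the old server). Review the list first, then copy:
#   recent_logs "$FWDIR/log" 60
#   scp $(recent_logs "$FWDIR/log" 60) admin@NEW_MGMT:/var/log/log-staging/
```

The unquoted command substitution in the scp example relies on the log file names containing no spaces.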

RamGuy239 · ‎2022-12-29

I haven't done any management upgrades running on public cloud, but I can't imagine the procedure being much different compared to appliances, open servers and private cloud. Why do you need to reset SIC?

When doing a $MDS_FWDIR/scripts/migrate_server export, you also export licenses and certificates. Normally there is no need to reset SIC after doing an "advanced upgrade", aka moving a management server from one installation to another on a higher version.

Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME

johnnyringo · ‎2023-01-02

I haven't done any management upgrades running on public cloud

Lol, that's too bad. It's lots of fun.

_Val_ · ‎2022-12-30

Several issues with the process above:

1. Why do you say you must do a snapshot of R80.40 server? If it is a management server, migrate export is more than enough to back it up. It is much more economical than creating a snapshot.

2. If you really HAVE to do a snapshot, why not rely on your virtualization platform snapshot for that?

3. On the new server, you can just set up the same IP and hostname, not sure why you need mdss.json

4. You do not use a snapshot to migrate a server; it is done with your mgmt export file, see point 1.

5. No need to reset SIC, even if you change the management server IP. CA and SIC will be intact after migration.

johnnyringo · ‎2023-01-02

1. Why do you say you must do a snapshot of R80.40 server? If it is a management server, migrate export is more than enough to back it up. It is much more economical than creating a snapshot.

Again, the fundamental problem is migrate_server export doesn't provide any estimate on what the output file size will be. Note that I do have to migrate logs for compliance reasons. I am not sure if I'm looking at 2 GB or 20 GB.

Sure, I can just wing it and hope for the best. But as I think anyone who's worked in IT can tell you, when servers run out of disk space, it can result in unpredictable behavior.

2. If you really HAVE to do a snapshot, why not rely on your virtualization platform snapshot for that?

Already am. It's our standard policy to do that for any 3rd party VM in case the vendor's backup/restore tool doesn't work as expected.

3. On the new server, you can just set up the same IP and hostname, not sure why you need mdss.json

And how, pray tell, are two servers going to exist at the same time with the same IP address?

5. No need to reset sic, even if you change the management server IP. CA and SIC will be intact after migration

I definitely remember having to reset SIC when migrating from R80.30 to .40. Perhaps the process has changed. I don't remember mdss.json being mentioned and would suspect it's intended specifically to eliminate that requirement.

RamGuy239 · ‎2023-01-02

My experience using $MDS_FWDIR/scripts/migrate_server export with -l or -x is very poor. I have had some scenarios where customers want to export with logs, which has never worked for me. Even when reducing their log size to the past five days, the amount of storage is too vast for the script to handle. The script has a built-in timeout of two hours (I think it used to be shorter?), and if you have any half-decent amount of logs, you end up at the timeout before it's able to complete. I posted about how the gzip process is single-threaded and will knock its head into a brick wall when trying to compress if you are lucky enough to get to the stage where it tries to compress all the logs.

https://community.checkpoint.com/t5/Management/MDS-FWDIR-scripts-migrate-server-export-with-logs-and...

I'm unsure if the script has been updated since I tried this, but I doubt it has. You also have to remember that the logging format changed from R80.xx to R81.xx. Unless something changed recently, you could not apply -x when doing export on R80.xx management when your output is R81.xx.

The easiest way to move logs, in my experience, is to create your new management on a new host and transfer your Gaia configuration and the database ($MDS_FWDIR/scripts/migrate_server export -v R81.xx). Replicate your Gaia configuration on the new host and import the database. When your new host is running successfully, change the IP on the old management host and transfer the logs you want to move directly using SCP. You can dump *.log files from $FWDIR/log on the old management over to $FWDIR/log on the new management, do a reboot or a cpstop && cpstart after the transfer is complete, and log indexing should start based on the log files you transferred. Older logs that are not getting indexed can be opened in SmartConsole by pointing to the specific *.log file.

Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME

johnnyringo · ‎2023-01-02

You can dump *.log files from $FWDIR/log on the old management over to $FWDIR/log on the new management, do a reboot or a cpstop && cpstart after the transfer is complete, and log indexing should start based on the log files you transferred.

Ahh thank you. This is exactly where I was going to go with the next question.

In prep for the upgrade a few weeks ago, I did adjust logging on heavily-hit rules so we're only logging abnormal traffic, or traffic that is required for troubleshooting. This has resulted in each daily log being in the 50-100 MB range, so it's no problem to copy these manually outside of the normal migration process.

G_W_Albrecht · ‎2022-12-31

Very clear, especially as a snapshot is a binary image of the entire root (lv_current) disk partition. See sk108902: Best Practices - Backup on Gaia OS

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist

johnnyringo · ‎2023-01-02

Very clear

What is?

Blason_R · ‎2022-12-27

I guess the commands below will help you.

# cd /var/log

# du -sh * | grep -E 'M|G|T'

If you could give us the output of the above command, it will help us identify which files can probably be deleted.
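
A sorted variant may be easier to read, since the grep on M, G and T also matches those letters in file names. This is a sketch assuming GNU du; top_usage is a hypothetical helper name.

```shell
#!/bin/bash
# Show the N largest entries directly under a directory, in KiB,
# staying on that directory's filesystem. The first line of output
# is the total for the directory itself.
# Usage: top_usage <dir> <count>
top_usage() {
    local dir="$1" count="$2"
    du -xak --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Example (expert mode): top_usage /var/log 15
```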

Thanks and Regards,
Blason R
CCSA,CCSE,CCCS

johnnyringo · ‎2022-12-27

Sorry, should have mentioned /var/log is in a separate partition

Filesystem                      1K-blocks     Used Available Use% Mounted on
/dev/mapper/vg_splat-lv_current  20961280 16505692   4455588  79% /
/dev/sda1                          297485    27216    254909  10% /boot
tmpfs                             7572656     3856   7568800   1% /dev/shm
/dev/mapper/vg_splat-lv_log      45066752 27559764  17506988  62% /var/log

G_W_Albrecht · ‎2022-12-31

Find more hints and commands here:

https://community.checkpoint.com/t5/Management/There-is-insufficient-disk-space-to-complete-the-oper...

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist
