nmelay
Collaborator

Gaia partition misalignment

During Gaia installation (appliance or open server), partitions are NOT aligned on a 1MB boundary; instead they are cylinder-aligned, in an MS-DOS-compatible way.

This alignment turns into a real performance problem with today's RAID arrays, SSDs and Advanced Format (AF) HDDs.
Filesystem blocks being misaligned with storage blocks leads to read-before-write operations, which can incur a severe performance hit.
My own measurements showed storage performance being more than halved on some specific workloads.
(The worst case scenario probably is heavy SmartEvent activity.)
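
For anyone who wants to verify this on their own box: alignment can be checked from Expert mode without touching the disk. A minimal sketch, assuming the first disk is /dev/sda (adjust the device name to your system):

    # Print partition start positions in sectors; with 512-byte sectors,
    # a 1MB-aligned partition starts on a multiple of 2048.
    parted -s /dev/sda unit s print

    # fdisk shows the same information; look at the "Start" column.
    fdisk -lu /dev/sda

On a misaligned install you will typically see the first partition start at sector 63 (the old cylinder/track boundary) instead of 2048.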

This issue was fixed in Windows Server 2008 and RHEL 6, when the performance hit first became obvious.
Gaia should have inherited the fix from RHEL, but this did not happen due to the use of a custom installer.
The packaged fdisk utility was fixed; the installer was not.

Fixing an installed Check Point system is almost impossible and requires LVM wizardry.

Please fix the installer and make sure partitions are 1MB-aligned at installation time.
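
For reference, this is roughly what "fixed" partitioning looks like with stock tools. A hypothetical sketch only; the device name and sizes are placeholders, not the actual Gaia installer logic:

    # parted with optimal alignment starts partitions on 1MiB boundaries.
    parted -s -a optimal /dev/sda mklabel msdos
    parted -s -a optimal /dev/sda mkpart primary ext3 1MiB 301MiB
    parted -s -a optimal /dev/sda mkpart primary 301MiB 100%

    # Modern fdisk (util-linux 2.17.2 and later) likewise defaults to
    # starting partitions at sector 2048, i.e. a 1MB boundary.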

43 Replies
PhoneBoy
Admin

When a platform is installed from ISO, the hard drive is reformatted and repartitioned.
When a platform is "fresh installed" from CPUSE, the existing hard drive partitioning is reused.
This means the partitioning you had from R81.20 is preserved when you "fresh install" R81.10 on top of it.

From a "support" perspective, this should not create any issues.
If anything, it might improve performance on R81.10 as well (especially in virtualized environments).

Bob_Zimmerman
Authority

What about the above needs explaining? It looks normal to me.

R81.20 uses GNU GRUB 2.02 beta2 (check via 'grub2-install --version'). On GPT disks, GRUB2 puts its second stage in a dedicated BIOS boot partition with the GPT GUID "21686148-6449-6E6F-744E-656564454649" (which is actually a bit of an easter egg, as explained on the Wikipedia page).

R81.10 uses GNU GRUB 0.97. It doesn't use the separate BIOS boot partition. The downgrade deletes it. Upgrading to R81.20 again creates a new one from the end of the 300 MB "Microsoft basic" partition. It's a little disappointing that the upgrade doesn't put it in the same spot as a clean installation, but it doesn't ultimately matter.
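
If you want to see this on your own R81.20 box, you can list the partitions from Expert mode. A quick sketch, assuming the disk is /dev/sda (sgdisk may or may not be present on a given Gaia build):

    # parted marks the BIOS boot partition with the "bios_grub" flag;
    # sgdisk shows it as partition type EF02.
    parted /dev/sda print
    sgdisk -p /dev/sda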

genisis__
Leader

Great, thanks.

flachance
Advisor

When you tried with R80.40, did you have to do anything special? I built a VM with the R81.20 ISO, updated CPUSE, imported the R80.40 CPUSE package, and ran 'installer clean-install <R80.40 package>'. It's been running for days (stuck at 83%). Is there a way to kill that install?

Bob_Zimmerman
Authority

What VM platform are you using?

What boot ROM does the VM have? BIOS or UEFI? For example, Hyper-V generation 2 machines use UEFI.

I tried it in VirtualBox and later in Hyper-V on a generation 1 (BIOS) machine. A VM with UEFI boot ROM definitely wouldn't be able to boot after the downgrade. Maybe it could also fail to downgrade GRUB or something? I'm not sure.
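
One quick way to tell which mode a running Gaia/Linux VM actually booted in, from Expert mode:

    # The EFI sysfs tree only exists when the system was booted via UEFI.
    [ -d /sys/firmware/efi ] && echo "UEFI boot" || echo "legacy BIOS boot"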

flachance
Advisor

Trying on a ProLiant DL380 Gen10, booting UEFI.

flachance
Advisor

Changing to Legacy BIOS mode and retesting

RamGuy239
Advisor

I tested this in a lab on VMware ESXi. There was no problem downgrading from R81.20 to R81.10, R81 or R80.40 using CPUSE or Blink. This doesn't reformat the disk, so the alignment should stay the same.

But only R81.20 supports the VMware Paravirtual controller (PVSCSI) and (U)EFI boot. When trying to downgrade an R81.20 host running on UEFI, or on BIOS with PVSCSI, the downgrade would stop at around 80-90% and not move for days.

In other words, you can use R81.20 to get the 4k alignment on older versions by installing R81.20 and then downgrading to an older version with CPUSE or Blink. But you can't use R81.20 to get UEFI boot support on older versions. This also means you can't use Hyper-V Generation 2 unless you stay on R81.20, as Generation 2 requires UEFI.

Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME
nmelay
Collaborator

That's a good point: if you plan to install R81.20 and then downgrade to R80.x or R81.10 on a VMware VM, you should make sure you selected the LSI Logic SAS storage controller at VM creation time, just to be on the safe side.
R81.10 does include a PVSCSI driver, so it *should* still work. Not sure about R80.40.
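
If you want to check a given version before committing to the controller change, something like this from Expert mode should show whether the driver is there (assuming the standard Linux module name, vmw_pvscsi):

    # Is the PVSCSI driver shipped with this kernel, and is it loaded?
    modinfo vmw_pvscsi
    lsmod | grep vmw_pvscsi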

RamGuy239
Advisor

As someone who is dealing with some rather extensive and complex Check Point management installations with various degrees of performance issues: does anyone have a layman's explanation of how big an improvement this is going to bring? Most of the management installations are running on VMware ESXi, and I would normally do an advanced upgrade to a new host whenever we move between versions, which would fix the alignment automatically. But as this seemingly affects all installation types, even appliances, it would be great to have some idea of how much impact it has.

I suppose it's most relevant for management installations, as those are far more disk-intensive than gateway installations. But gateways are also affected, so they get tossed into the mix as well, for open servers, virtual installations and appliances alike.

As there is no way to get the disk reformatted without opting for a proper clean install from USB/ISO, this makes things somewhat messy, especially for customers running on open server or on appliances without LOM. We have already pushed for clean installations recently because that was the only way to get the new XFS file system applied. Having to push for another round of clean installations is not ideal, but if the benefits are great enough it has to be considered.

How does this affect the images available within sk158292? I suppose those images are based on fresh installations, meaning the R81.20 images have the alignment corrected?

Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME
PhoneBoy
Admin

For XFS, we have this work that @Kaspars_Zibarts did: https://community.checkpoint.com/t5/Security-Gateways/Real-life-comparison-of-XFS-and-EXT3-file-syst... 

I imagine it helps with disk writes of larger files, as shown by @nmelay's comment above (roughly 50% improvement in install time).

RamGuy239
Advisor

Thanks for the reply. From my understanding, XFS comes with the added benefit of being more resilient to file system corruption as well. There is lots of talk about big writes; my question is more about the PostgreSQL database. With larger Check Point managements controlling 20+ gateway objects and 10+ admins with write access hammering the management every day, there are a lot of slowdowns when opening and closing objects, when searching, and especially when publishing changes.

Tossing additional CPU cores and RAM onto the host has been the "easy solution" thus far, but you reach a point where you are just adding icing on top of the cake with barely any added value. R81.10 improved the performance somewhat, but there are still complaints. In one of the more demanding environments I'm working on, we have done pretty much everything we can at this point: making sure the disk uses thick provisioning instead of thin provisioning, making sure all the CPU cores are pinned to a specific socket and are not using virtual / SMT cores, and tweaking the CPM heap size manually. Currently we are sitting at 12 cores (Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz) and 128GB RAM, running on local SSD storage with the VMFS6 file system on the VMware ESXi server.

I'm very tempted to try moving this to R81.20 using the VMware Paravirtual controller, as PVSCSI support is part of the slightly bumped kernel in R81.20. But I'm not deep enough into VMware, disk alignment and disk utilisation to know what to expect in performance gains from doing so. Having a JHF install faster is not something anyone cares about unless it's taking an obscene amount of time; the same goes for snapshots, etc. It's the daily friction caused by SmartConsole slowdowns that is causing frustration.

We split the management into three different installations with the move to R81.10: a dedicated management installation, a dedicated log server and a dedicated SmartEvent server. All three live on the very same VMware ESXi server, though, so they are still competing for the same underlying hardware. But it's a new ESXi server dedicated to running only these three hosts, and the server itself, with an Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz, 384GB RAM and 26TB of SSD, should be capable of handling this, one would think.

Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME
Bob_Zimmerman
Authority

Red Hat only supports ext3's "ordered" journaling mode, which only journals filesystem metadata. Yes, XFS is more resilient.

The 5218R isn't a great processor for running a management server. It has 20 cores and only a 125W TDP, so each core is slow (~6.25W of TDP budget per core). Of that family of Xeons, a 6226R (16 cores, 150W, ~9.38W per core) would provide much better performance for about the same cost. Managements need a lot of single-threaded performance; they don't use more than about four cores. Log servers generally only need 2-3 cores, but high-IOPS disks. SmartEvent correlation units can do better with more cores, but they mostly need RAM to keep event candidates around.

Running management, log, and SmartEvent all on the same physical host does cause them to compete as if all running on the same OS, yes. Sharing processors via a hypervisor has a performance hit due to the hypervisor running three supervisors instead of one supervisor running directly. Each supervisor's scheduler doesn't know what the other schedulers are trying to do, and the hypervisor's scheduler doesn't know what any of the guest schedulers are trying to do.

Hypervisors are enormously worse at sharing I/O than they are at sharing processors, so your I/O performance under virtualization is generally 10% of the performance you can get running on the hardware directly. Paravirtualization helps some, but this is part of why virtualization-focused IOMMU (VT-d and AMD-Vi) came about. It lets you hand an entire PCIe card (e.g., a SATA controller) to a guest to remove the hypervisor performance hit. I have seen high-quality SSDs perform worse than spinning disks due to the virtualization performance hit to I/O operations.

JHF installation time and snapshot time are good proxies for storage subsystem IOPS performance because they involve a huge number of small file writes.
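
If you want a crude, repeatable proxy for that kind of workload without installing anything, a throwaway small-file write loop works; the paths and counts below are arbitrary, and the numbers are only meaningful relative to other runs on comparable hardware:

    # Time the creation and sync of 10,000 small files.
    mkdir -p /var/tmp/iops_test && cd /var/tmp/iops_test
    time ( for i in $(seq 1 10000); do
        dd if=/dev/zero of=file_$i bs=4k count=1 2>/dev/null
    done; sync )
    cd / && rm -rf /var/tmp/iops_test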

RamGuy239
Advisor

@Bob_Zimmerman Thank you for the detailed information. I might take the storage question up with the customer. I don't know whether it's NVMe storage or SATA/SCSI-based; I would think it's NVMe considering the rest of the hardware in the server, but you never know. If it is indeed NVMe, it becomes a question of how the storage is handled on the server: it should be possible to pass individual NVMe drives directly through to each installation. But then there is the question of redundancy. I suppose passthrough takes the VMware software layer out of the equation, so the drives can't get redundancy at the software level in VMware, which would mean using Gaia to set up some kind of software RAID? I'm not entirely sure what capabilities Gaia has for such things, or how Gaia fares in terms of supporting NVMe storage at all. Do you have to have controllers from the Gaia HCL on your VMware ESXi server for this to be supported and possible in the first place?
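
Before worrying about passthrough or software RAID, it may be worth checking what the Gaia guest actually sees today. A purely diagnostic sketch from Expert mode:

    # Which storage controllers does the VM expose, and are any NVMe
    # block devices visible to the guest?
    lspci | grep -i -e nvme -e scsi -e sata
    ls /dev/nvme* 2>/dev/null || echo "no NVMe block devices visible"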

Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME
