R81.10 New Recommended Jumbo - Take #130

eranzo
Employee


Hi All,

R81.10 Jumbo HF Take #130 is now our Recommended Jumbo take and is available for download to all via CPUSE (recommended) and via the Jumbo documentation (R81.10 Jumbo Hotfix Accumulator).

A full list of resolved issues can be found in the Jumbo documentation (R81.10 Jumbo Hotfix Accumulator).

Note:

  • Central Deployment allows you to perform a batch deployment of Hotfixes on your Security Gateways and clusters from SmartConsole!! For more information, see sk168597.
  • With Blink images, you can upgrade your environment to the required Major version, including its recommended Jumbo hotfix, in one step, using a single image file.

You can install Blink images using CPUSE. More details can be found in sk120193.

 

Thanks,

Release Operations Group

23 Comments
the_rock
Legend

Fantastic news for Christmas!

Best,

Andy

maxtaan
Contributor

Great News!

the_rock
Legend

We are definitely asking all our customers who are on R81.10 to update the jumbo take to 130.

Andy

Henrik_Noerr1
Advisor

OK, sometimes I am sure we are the only customer finding all kinds of issues in the JHF.

I mostly see praise here on CheckMates, and I have no idea why we, three weeks after GA, are the first to report this.

 

After installing on several VSX clusters, we see high load. We saw cpcgroup and cxld using load that was not normal.

It seems new, undocumented functionality was added that monitors SND load; this itself caused the heightened load.

I guess customer installs have simply not reached t130 yet.

Attached is a screenshot after the JHF upgrade from Take 110 to 130. You can easily see when the fix was applied.

cpu load.png

Platform Open Server, some 32 cores

VSX - 40VS

fix:

fw ctl set int fwha_cpu_utlization_monitor_enable 0 

Add it to fwkern.conf for persistence.
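
A minimal sketch of the runtime change plus the persistent one, assuming the standard Gaia location of fwkern.conf ($FWDIR/boot/modules/fwkern.conf); verify against your SR/TAC guidance before relying on it:

# apply at runtime (no reboot needed)
fw ctl set int fwha_cpu_utlization_monitor_enable 0

# verify the current value
fw ctl get int fwha_cpu_utlization_monitor_enable

# persist across reboots (path assumed per standard Gaia practice)
echo 'fwha_cpu_utlization_monitor_enable=0' >> $FWDIR/boot/modules/fwkern.conf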

 

6-0003825377

 

/Henrik

 

the_rock
Legend

@Henrik_Noerr1 Just wondering, doesn't that kernel parameter simply "hide" the issue?

Best,

Andy

Henrik_Noerr1
Advisor

Hey Andy,

What do you mean by 'hide' the issue? It solved the high CPU load quite clearly.

 

/Henrik

the_rock
Legend

Seems to me that's what that kernel value does, but I could be mistaken.

Mattias_Jansson
Collaborator

Interesting! I hadn't noticed this behaviour before you posted this.

We installed Take 130 on December 28.


take130.JPG

 

Thanks for the tip!

 

 

Hugo_vd_Kooij
Advisor

To me the variable "fwha_cpu_utlization_monitor_enable" reads as "do (not) monitor the CPU load to determine if a cluster member is operating properly". So it should impact the pnote information.

I would expect that if the monitoring creates more CPU load, the issue would result in cluster flapping. Has anyone seen that happen? Or is the impact significant but just too low to cause a failover?
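
A quick way to check whether this monitor actually registers a pnote on the member is to list the critical devices (standard ClusterXL commands; whether the CPU monitor appears as its own device, and under which name, is an assumption to verify):

# list registered critical devices (pnotes) and their states
cphaprob -l list

# overall cluster state of this member
cphaprob state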

Hugo_vd_Kooij
Advisor

We have a major issue as well with one VSX cluster. But the suggested setting does not have any impact as far as we can tell. R&D is working on it.

Auke__
Explorer

Same problem here on VSX cluster. R&D working on it...

the_rock
Legend

@Hugo_vd_Kooij That's exactly how I understood that value as well.

Best,

Andy

Wolfgang
Authority

Same here, a decrease of 20% in CPU utilization after disabling this monitoring at 10:40 AM.

This is a Maestro VSX 16600 appliance.

CPU.png

Does anyone have an answer from TAC on what exactly "fwha_cpu_utlization_monitor_enable" monitors?

MatanYanay
Employee

Hi all

Thanks for raising this issue. We are aware of it and plan to fix it in the upcoming jumbo, which we aim to release by EOM.

Matan.

 

Michael_Terlien
Explorer

I am planning to upgrade a VSX cluster to the latest version. Is R81.20 unaffected by this bug? Or is it recommended to hold off on upgrades until this bug is resolved?

David_C1
Advisor

Is this issue only affecting VSX? We have a few clusters with Check Point appliances, and I have not observed an increase in CPU after applying Take 130.

Dave

MatanYanay
Employee

Hi,

The issue arises in VSX because each CXLD writes and reads the same file instead of a file per VS, leading to high CPU usage.

Thanks 

Hugo_vd_Kooij
Advisor

So this issue is also related to the number of virtual instances. You may see only a slight issue with 2 virtual systems but a big issue with 20 virtual systems, as more processes start to fight over access to the same file.

PNH
Participant

My findings after upgrading 2 days ago (I like trying to dissect stuff like this, and this may be inaccurate):

fw ctl set int fwha_cpu_utlization_monitor_enable 0 

seems (probably among other things) to make the cxld process stop calling an affinity script 2 times a minute.

This script is told to write to a file /cpus.txt, which is deleted immediately after, I guess, it has been read (it looks like it is in the actual root directory, as the process does not seem to be in a jail or anything. Isn't /tmp better?).

(You can see these calls logged in cxld.elg.)

All the cxld processes I have seen going crazy started behaving badly at different intervals; they all have the /cpus.txt file open, but lsof marks it as deleted.

The cxld processes that have not gone crazy yet do not have this file open (all the time), but the rest seems to be the same. But since I could not find anything similar to "strace" to see what system calls the process makes, I do not know for sure.

Also, the broken cxld processes write all of their 8 x 21 MB .elg files within the same second, which possibly causes a little more load on the disks too.

This parameter apparently does not exist on Jumbo 110, and there I see the call to the affinity script only on startup. (It too seems to write to /cpus.txt, by the way, but if the calls don't happen in all contexts simultaneously, I guess it is OK.)

You may need to restart to make the already hung cxld processes recover.
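
If you want to spot the stuck processes yourself, a rough sketch using standard Linux tools on the gateway (the exact lsof output formatting may differ):

# cxld processes still holding the deleted /cpus.txt open
lsof -nP 2>/dev/null | grep cxld | grep cpus.txt
# stuck ones show the file flagged as "(deleted)"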

 

Sorry for my ramblings..

 

 

 

 

Henrik_Noerr1
Advisor
Javi_San
Explorer

Thanks @Henrik_Noerr1
The solution from sk181891 worked like a charm in our VSX cluster!

MatanYanay
Employee

Hi all 

Please note we just released R81.10 Jumbo HF Take #132, including a fix for the issue that was discussed in this thread.

Thanks,

 

Matan. 

the_rock
Legend

Great news @MatanYanay 
