tjoll
Contributor

High CPU usage on R82.10

Hi all,

A week ago, I migrated my lab environment (open server on Proxmox) from R82 JHF44 to R82.10. The management server was clean installed, and I performed a migrate export/import with the upgrade tools. The gateway was also clean installed; I reconfigured it manually, set up a SIC connection, and installed the policy from my new R82.10 management server.

Since moving to R82.10, I see high CPU usage in my VM. Most of the time, the high CPU usage and load average are caused by the process usim_x86. According to sk180299, this is caused by UPPAK mode, which is the (new) default in R82.10. Although the SK says this is a cosmetic issue affecting a few Linux commands, the load is visible on the hypervisor as well.

CPU.png

My first thought was that the hypervisor might be misinterpreting the output of certain Linux commands. But I also see higher fan RPM on my hypervisor, a lot of disk I/O, and an increase in power draw.

Disk.png

Power.png

Is anybody familiar with this behavior in the new version, or am I hitting a bug that is causing these issues?

Thanks in advance.

Mitchel

the_rock
MVP Platinum

Did you try changing to kernel mode?

Best,
Andy
Chris_Atkinson
MVP Platinum CHKP

Unfortunately, this is not possible. As of R82.10, it's UPPAK only, to my knowledge.

https://sc1.checkpoint.com/documents/R82.10/WebAdminGuides/EN/CP_R82.10_RN/Content/Topics-RN/Softwar...

CCSM R77/R80/ELITE
the_rock
MVP Platinum

Good to know!

Best,
Andy
tjoll
Contributor

Yeah, I tried to do that as well but apparently, the option is gone now.

the_rock
MVP Platinum

Right, that's what Chris mentioned as well. I was not aware, sorry.

Best,
Andy
Chris_Atkinson
MVP Platinum CHKP

Have you reviewed HCP for any anomalies? And to confirm, the gateway is also a VM?

tjoll
Contributor

Yes, HCP does not show anything interesting:

Failed tests:

Test name Status Runtime (sec)
==========================================================================
Local Address Port Usage..........................[INFO] 0.03417
Status of unsaved changes Gaia Clish..............[WARNING] 0.11794
Connection Distribution...........................[INFO] 0.07834
Cpu spikes........................................[INFO] 0.02417
Traffic distribution..............................[INFO] 5.03691
Template efficiency...............................[INFO] 2.05703
Non-FQDN Objects..................................[WARNING] 0.04003
Domain Objects - DNS Passive Learning.............[WARNING] 0.60138

The gateway is a VM running on a Proxmox host.

Ilya_Yusupov
Employee

Hi @tjoll ,

Do you see the same high CPU usage in cpview?
I'm asking because the usim process works in poll mode, and there is a known issue where top in Linux always shows 100% usage for poll-mode processes.

tjoll
Contributor

Hi,

No, the CPU usage in cpview is very low.

Cpview.png

htop looks like this:

htop.png

Basically, everything looks normal as described in sk180299. Inside the VM, the usim_x86 load looks like a cosmetic issue affecting certain Linux commands. But that does not match what I see on my hypervisor:

- Increased CPU usage: the hypervisor should not be aware of sk180299, so it should not show high CPU usage. It has gone from 3-4% to a constant 30%.
- Increased fan RPM: the hypervisor is generating more heat, so the fans spin faster.
- Increased power draw: the hypervisor now pulls around 60 watts, up from around 30-40 watts.
- Increased disk I/O: the hypervisor was doing around 200-400 KB/s of disk I/O. On R82.10 it keeps climbing and is currently at 1.2 MB/s. iotop does not show that many read/write actions, though.

To summarize: since the new version, I see different behavior on the hypervisor without a good explanation.
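
For a sense of scale, the power figures above can be turned into energy with simple arithmetic. This is only a sketch: the function name `extra_energy_kwh` is made up for illustration, and it assumes the extra draw stays constant around the clock, which the thread does not confirm.

```python
def extra_energy_kwh(baseline_watts, observed_watts, hours):
    """Extra energy in kWh for a constant increase in power draw.

    Assumption (not from the thread): the increase is constant over the
    whole period.
    """
    extra_watts = observed_watts - baseline_watts
    return extra_watts * hours / 1000.0


if __name__ == "__main__":
    hours_per_year = 24 * 365
    # Baseline was reported as roughly 30-40 W, the new draw as roughly 60 W.
    for baseline in (40, 30):
        kwh = extra_energy_kwh(baseline, 60, hours_per_year)
        print(f"baseline {baseline} W -> {kwh:.1f} kWh extra per year")
```

With the reported numbers, a single mini PC would use roughly 175 to 263 kWh more per year if the gateway runs continuously.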

Thanks.

Timothy_Hall
MVP Gold

This is due to the mandatory use of UPPAK on all gateways in R82.10, not just Quantum Force appliances anymore.  Any Linux-based tools (top, vmstat, sar, etc.) will show that all cores acting as SNDs are always at 100% utilization, regardless of the actual system load.  This is due to the use of "poll mode" in UPPAK, which is part of the Data Plane Development Kit (DPDK).  KPPAK used SoftIRQ interrupts instead, where CPU load generally tracked the overall SND traffic load. 

Any Check Point-based measurement tools (cpview, cpstat, etc.) will show the "true" load on the SND cores independent of the CPU utilization.  So if cpview reports 5% utilization on an SND core, that means 5% of the time there was traffic available to process, even though it is always running at 100% CPU.  You may want to check out my 2025 CPX Presentation, which covered this effect.
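
To make the difference between the two numbers concrete, here is a minimal toy model in Python. It is not how UPPAK or cpview are implemented. The function name `simulate_snd_core`, the time-slot model, and the 5% traffic probability are assumptions chosen only to mirror the 5% example above. It compares what the OS would count as busy time for a core that spins in a poll loop against one that wakes only when traffic arrives.

```python
import random


def simulate_snd_core(slots, traffic_probability, seed=42):
    """Toy model of one SND core over a number of equal time slots.

    Returns three fractions between 0 and 1:
      poll_mode_os_cpu      - share of slots the OS sees as busy in poll mode
      interrupt_mode_os_cpu - share of slots the OS sees as busy when the
                              core only wakes up for traffic
      useful_work           - share of slots in which traffic was available
    """
    rng = random.Random(seed)
    poll_busy = 0
    interrupt_busy = 0
    useful = 0
    for _ in range(slots):
        has_traffic = rng.random() < traffic_probability
        # Poll mode: the loop spins in every slot, traffic or not.
        poll_busy += 1
        if has_traffic:
            useful += 1
            # Interrupt-driven: the core is busy only when traffic arrives.
            interrupt_busy += 1
    return {
        "poll_mode_os_cpu": poll_busy / slots,
        "interrupt_mode_os_cpu": interrupt_busy / slots,
        "useful_work": useful / slots,
    }


if __name__ == "__main__":
    result = simulate_snd_core(slots=100_000, traffic_probability=0.05)
    for name, value in result.items():
        print(f"{name}: {value:.1%}")
```

In this model the poll-mode core is always fully busy from the OS point of view, while the share of slots with traffic stays near 5%. That matches the gap between top and cpview described above.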

It should get really interesting when CloudGuard gateways are upgraded to R82.10, and the customer incurs a huge cloud bill due to excessive CPU usage.

Gaia 4.18 (R82) Immersion Tips, Tricks, & Best Practices Video Course
Now Available at https://shadowpeak.com/gaia4-18-immersion-course
tjoll
Contributor

I understand the higher CPU usage within the VM. That's also described in the SK mentioned earlier, and the usage in cpview should be the "real" CPU usage. But I'm seeing much higher usage outside the VM, in the hypervisor, which should not be aware of the processes running within the VM. So I would expect cpview and the hypervisor to show the same values, but apparently they don't. Do you know how that's possible? From my point of view, idle CPU usage has increased in the new version, and I'm not sure whether it "works as designed" or whether I've hit a bug. Higher idle CPU usage can (sometimes) decrease overall performance, because CPU cycles are spent on other things instead of traffic processing.

Thanks for all your help.

Timothy_Hall
MVP Gold

No, what you will see in the hypervisor for CPU utilization will match what the standard Linux-based tools in Gaia report.  cpview is not a standard Linux tool.

tjoll
Contributor

Okay that's possible regarding the statistics of the vm. I'm seeing the same on a host level:

Screenshot_20260105_230555_Brave.jpg

I'm also seeing an increase in the host's fan RPM and power draw. Both can be the result of genuinely higher CPU load. I would not expect the fans to spin faster for a CPU that is basically idling. I also would not expect two very small fans in a mini PC to draw 30 watts of extra power for an idling CPU.

For some reason, it doesn't add up. Things are still quite hard to explain, in my opinion.

Bob_Zimmerman
MVP Gold

From the perspective of the OS scheduler (and the hypervisor scheduler above it), the processor time is actually being used. This isn't a cosmetic issue, it's real usage. This usage can essentially be preempted when there is actual work to do, but it can't be directly reduced.
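
As an illustration of why the OS-level number is real, here is a small sketch of how top-style tools derive per-core utilization from the tick counters in /proc/stat. Any tick that is not idle or iowait counts as busy, so a thread that spins in a poll loop accrues busy ticks whether or not it has packets to process. The helper name `cpu_busy_percent` and both sample snapshots are made up for illustration; they are not output from a Gaia gateway.

```python
def cpu_busy_percent(before, after):
    """Busy percentage for one core between two /proc/stat 'cpuN' lines.

    Field order after the label: user nice system idle iowait irq softirq
    steal guest guest_nice. Guest time is already included in user time,
    so only the first eight fields are summed.
    """
    def parse(line):
        values = [int(v) for v in line.split()[1:]]
        idle = values[3] + values[4]   # idle + iowait
        total = sum(values[:8])
        return idle, total

    idle0, total0 = parse(before)
    idle1, total1 = parse(after)
    delta_total = total1 - total0
    if delta_total == 0:
        return 0.0
    delta_idle = idle1 - idle0
    return 100.0 * (delta_total - delta_idle) / delta_total


if __name__ == "__main__":
    # Hypothetical core running a spinning user-space poll thread:
    # all 100 ticks in the interval land in "user".
    polling = cpu_busy_percent(
        "cpu1 1000 0 200 5000 0 0 0 0 0 0",
        "cpu1 1100 0 200 5000 0 0 0 0 0 0",
    )
    # Hypothetical interrupt-driven core: 95 of 100 ticks are idle.
    interrupt_driven = cpu_busy_percent(
        "cpu2 1000 0 200 5000 0 0 10 0 0 0",
        "cpu2 1002 0 201 5095 0 0 12 0 0 0",
    )
    print(polling, interrupt_driven)
```

The hypervisor scheduler sits one level above this accounting and sees a vCPU that never goes idle, which is why the host shows the load too.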

Timothy_Hall
MVP Gold

Absolutely correct, Bob.  In my research on DPDK, I keep seeing references to changing the poll rate somehow, but I can't figure out how to do it in R82.10.  In lab and training environments, it would be helpful to reduce the poll rate on the SNDs from 100% to, say, 25%. R&D assist here?  @PhoneBoy @_Val_ 

Yes, I understand doing so may cause buffering misses and harm performance.  But if a training center or lab environment sets up two ClusterXL HA gateways with 6 or more total cores each, suddenly four of your CPUs are fully spoken for in ESXi or whatever.  Multiply that by 10 students, and suddenly you have 40 of your cores that are always fully subscribed.  Pretty sure most existing training environments aren't going to be able to handle that very well, let alone in the cloud, where you are getting charged by the cycle.
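
The arithmetic behind that estimate can be sketched in a few lines. Two SND cores per gateway is inferred from "four of your CPUs" across two gateways. The helper names and the 32-core host size are hypothetical and only there to show how quickly the numbers grow.

```python
import math


def pinned_cores(students, gateways_per_student=2, snd_cores_per_gateway=2):
    """Cores held at 100% by poll-mode SNDs across a whole class."""
    return students * gateways_per_student * snd_cores_per_gateway


def hosts_needed(cores, cores_per_host):
    """Hosts required just to back the always-busy SND cores."""
    return math.ceil(cores / cores_per_host)


if __name__ == "__main__":
    total = pinned_cores(students=10)
    print(total)                                    # 10 * 2 * 2 = 40
    print(hosts_needed(total, cores_per_host=32))   # hypothetical 32-core host
```

This counts only the SND cores. The remaining cores on each gateway still need capacity when they do real work.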

tjoll
Contributor

Thanks Bob and Timothy for the clarification. 

I was put on the wrong track by this line in the SK: "This means that the User Space packet threads together with the kernel threads can result in an elevated CPU Load Average, but it does not reflect an actual increase in the CPU usage." I assumed that the Linux tools could not show the CPU usage correctly and that the CPU was actually idling.

In the end, it is real usage, which explains what I saw. Maybe that line needs to be rephrased, or the release notes should state that there is an increase in CPU usage. Indeed, as Timothy mentioned, if all our engineers spin up a gateway in our lab, one of the hosts can become fully utilized very quickly. It would be nice to be able to change the polling rate.

Thanks all, for the input and explanation. 

Have a nice day.

Mitchel

Chris_Atkinson
MVP Platinum CHKP

I'll submit some feedback on the SK to the effect that the baseline CPU may appear higher and not track the network traffic profile from a performance perspective as in the past.

Jon_Paine
Employee

"...let alone in the cloud, where you are getting charged by the cycle."

The supported instance types are "fixed", meaning you pay for the whole processor regardless of whether you use it or not. R82.10 should not trigger any increase in charges. (Usual disclaimers apply.)

Some clouds do have "burstable" instance types or families, where you pay more if the CPU spikes above, say, 30%. I have not tested them (why would I...?), but as far as I know, they are not supported for Gaia.
