Solved: Re: High CPU usage on R82.10

tjoll · ‎2026-01-04

Hi all,

A week ago, I migrated my lab environment (open server/Proxmox) from R82 JHF44 to R82.10. For the management server (clean install), I performed a migrate export/import with the upgrade tools. The gateway was also a clean install: I reconfigured it manually, set up a SIC connection, and installed the policy from my new R82.10 management.

Since running R82.10, I see high CPU usage in my VM. Most of the time, the high CPU usage and load average are caused by the process usim_x86. According to sk180299, this is caused by UPPAK mode, which is the (new) default in R82.10. Although the SK says it's a cosmetic issue affecting a few Linux commands, it's visible on the hypervisor as well.

My first thought was that it might be related to certain Linux commands that the hypervisor is interpreting in a wrong way. But I also see a higher fan rpm of my hypervisor, a lot of disk I/O and an increase in power draw.

Is anybody familiar with this behavior in the new version, or am I hitting a bug that is causing my issues?

Thanks in advance.

Mitchel

the_rock · ‎2026-01-04

Did you try changing to kernel mode?

Best,
Andy

Chris_Atkinson · ‎2026-01-04

Unfortunately this is not possible. As of R82.10 it's UPPAK only, to my knowledge.

https://sc1.checkpoint.com/documents/R82.10/WebAdminGuides/EN/CP_R82.10_RN/Content/Topics-RN/Softwar...

CCSM R77/R80/ELITE

the_rock · ‎2026-01-04

Good to know!

Best,
Andy

tjoll · ‎2026-01-05

Yeah, I tried to do that as well but apparently, the option is gone now.

the_rock · ‎2026-01-05

Right, that's what Chris mentioned as well. I was not aware, sorry.

Best,
Andy

Chris_Atkinson · ‎2026-01-04

Have you reviewed HCP for any anomalies? And to confirm, the gateway is also a VM?

tjoll · ‎2026-01-05

Yes, HCP does not show anything interesting:

Failed tests:

Test name                                         Status     Runtime (sec)
==========================================================================
Local Address Port Usage..........................[INFO]     0.03417
Status of unsaved changes Gaia Clish..............[WARNING]  0.11794
Connection Distribution...........................[INFO]     0.07834
Cpu spikes........................................[INFO]     0.02417
Traffic distribution..............................[INFO]     5.03691
Template efficiency...............................[INFO]     2.05703
Non-FQDN Objects..................................[WARNING]  0.04003
Domain Objects - DNS Passive Learning.............[WARNING]  0.60138

The gateway is a vm running on a proxmox host.

Ilya_Yusupov · ‎2026-01-04

Hi @tjoll ,

Do you see the same high CPU usage in cpview?
I'm asking because the usim process works in poll mode, and there is a known issue where top in Linux always shows 100% usage for poll-mode processes.

tjoll · ‎2026-01-05

Hi,

No, the CPU usage in cpview is very low.

htop looks like this:

Basically everything looks normal, as described in sk180299. As I read it, the usim_x86 load is a cosmetic issue in certain Linux commands. But that does not match what I see on my hypervisor:

- Increased CPU usage: the hypervisor should not be affected by sk180299, so it should not show high CPU usage. It has gone from 3-4% to a constant 30% or so.
- Increased fan RPM: the hypervisor is generating more heat, so the fans are spinning faster.
- Increased power draw: the hypervisor is now pulling around 60 watts, where it used to be around 30-40 watts.
- Increased disk I/O: the hypervisor was doing around 200-400 KB/s of disk I/O. On R82.10 it keeps increasing and is currently at 1.2 MB/s. iotop does not show that many read/write actions, though.

To summarize a bit: I see a different behavior on the hypervisor since the new version without a good explanation.

Thanks.

Timothy_Hall · ‎2026-01-05

This is due to the mandatory use of UPPAK on all gateways in R82.10, not just Quantum Force appliances anymore. Any Linux-based tools (top, vmstat, sar, etc.) will show that all cores acting as SNDs are always at 100% utilization, regardless of the actual system load. This is due to the use of "poll mode" in UPPAK, which is part of the Data Plane Development Kit (DPDK). KPPAK used SoftIRQ interrupts instead, where CPU load generally tracked the overall SND traffic load.
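For readers who want to see where the Linux-side number comes from, here is a minimal sketch of how top-style tools derive per-core utilization from two /proc/stat snapshots. The snapshot values below are invented for illustration (cpu0 stands in for a polling SND core, cpu1 for a mostly idle core); the helper name is hypothetical and not part of any Check Point tooling.

```python
def cpu_busy_percent(before: str, after: str) -> dict:
    """Per-core busy % between two /proc/stat snapshots."""

    def parse(snapshot: str) -> dict:
        cores = {}
        for line in snapshot.splitlines():
            fields = line.split()
            # Per-core lines: "cpuN user nice system idle iowait irq softirq steal ..."
            # The aggregate "cpu" line is skipped.
            if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
                ticks = [int(x) for x in fields[1:9]]
                idle = ticks[3] + ticks[4]  # idle + iowait
                cores[fields[0]] = (sum(ticks), idle)
        return cores

    first, second = parse(before), parse(after)
    result = {}
    for core, (total2, idle2) in second.items():
        total1, idle1 = first[core]
        delta_total = total2 - total1
        busy = delta_total - (idle2 - idle1)
        result[core] = 100.0 * busy / delta_total if delta_total else 0.0
    return result


# Invented sample data: cpu0 never accumulates idle ticks (poll mode),
# cpu1 is idle for 95 of 100 ticks.
BEFORE = """cpu  1050 0 0 1000 0 0 0 0 0 0
cpu0 1000 0 0 0 0 0 0 0 0 0
cpu1 50 0 0 1000 0 0 0 0 0 0"""
AFTER = """cpu  1155 0 0 1095 0 0 0 0 0 0
cpu0 1100 0 0 0 0 0 0 0 0 0
cpu1 55 0 0 1095 0 0 0 0 0 0"""

usage = cpu_busy_percent(BEFORE, AFTER)
print(usage)
```

On a live gateway, the two snapshots would be the contents of /proc/stat read a few seconds apart. A core that polls continuously never accumulates idle ticks, so this calculation reports it as fully busy regardless of traffic.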

Any Check Point-based measurement tools (cpview, cpstat, etc.) will show the "true" load on the SND cores independent of the CPU utilization. So if cpview reports 5% utilization on an SND core, that means 5% of the time there was traffic available to process, even though it is always running at 100% CPU. You may want to check out my 2025 CPX Presentation, which covered this effect.
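The difference between the two numbers can be illustrated with a toy model of a poll loop. This is only a sketch of the idea described above, not how cpview actually computes its figure; the function name and the arrival pattern are hypothetical.

```python
def poll_loop_stats(arrivals_per_poll):
    """Toy model: each list entry is the number of packets found by one poll.

    Every poll consumes CPU whether or not packets were waiting, so the
    OS-level view is always 100%. The traffic-based view only counts the
    polls that actually found work.
    """
    total = len(arrivals_per_poll)
    useful = sum(1 for n in arrivals_per_poll if n > 0)
    return {
        "os_view_busy_pct": 100.0 if total else 0.0,
        "polls_with_traffic_pct": 100.0 * useful / total if total else 0.0,
    }


# Hypothetical pattern: traffic is present on 1 poll out of every 20.
stats = poll_loop_stats([1] + [0] * 19)
print(stats)
```

With traffic on one poll in twenty, the model reports 5% for the traffic-based view while the OS-level view stays at 100%, which is the same gap seen between cpview and top.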

It should get really interesting when CloudGuard gateways are upgraded to R82.10, and the customer incurs a huge cloud bill due to excessive CPU usage.

New Book: "Max Power 2026" Coming Soon
Check Point Firewall Performance Optimization

tjoll · ‎2026-01-05

I understand the higher CPU usage within the VM; that's also described in the SK mentioned earlier, and the usage in cpview should be the "real" CPU usage. But I'm seeing much higher usage outside the VM, in the hypervisor, which should not be aware of the processes running inside the VM. So I expected the values in cpview and in the hypervisor to match, but apparently they don't. Do you know how that's possible?

From my point of view, idle CPU usage has increased in the new version, and I'm not sure whether it "works as designed" or whether I've hit a bug. Higher idle CPU usage can (sometimes) decrease overall performance, because CPU cycles are spent on other things instead of traffic processing.

Thanks for all your help.

Timothy_Hall · ‎2026-01-05

No, what you will see in the hypervisor for CPU utilization will match what the standard Linux-based tools in Gaia report. cpview is not a standard Linux tool.

tjoll · ‎2026-01-05

Okay that's possible regarding the statistics of the vm. I'm seeing the same on a host level:

I'm also seeing an increase in the host's fan RPM and power draw. Both can be the result of actually higher CPU usage. I would not expect the fans to speed up for a CPU that is basically idling, nor would I expect two very small fans in a mini PC to draw 30 watts of extra power for an idling CPU.

For some reason, it doesn't add up. Things are still quite unexplainable, in my opinion.

Bob_Zimmerman · ‎2026-01-05

From the perspective of the OS scheduler (and the hypervisor scheduler above it), the processor time is actually being used. This isn't a cosmetic issue, it's real usage. This usage can essentially be preempted when there is actual work to do, but it can't be directly reduced.
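A small sketch can make this concrete. It compares the CPU time the OS charges to a process while it busy-polls with the CPU time charged while it blocks for the same wall-clock duration. The function names are illustrative only; this is generic Python, not UPPAK or DPDK code.

```python
import time


def cpu_time_used(wait_fn, seconds):
    """Return the CPU time the OS charges to this process while wait_fn runs."""
    start = time.process_time()
    wait_fn(seconds)
    return time.process_time() - start


def busy_poll(seconds):
    """Spin until the deadline, checking the clock on every iteration."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass  # no work to do, but the core stays occupied


def blocking_wait(seconds):
    """Sleep until woken, the way an interrupt-driven design idles."""
    time.sleep(seconds)


busy = cpu_time_used(busy_poll, 0.2)
blocked = cpu_time_used(blocking_wait, 0.2)
print(f"busy poll:     {busy:.3f} s of CPU time")
print(f"blocking wait: {blocked:.3f} s of CPU time")
```

Neither function does any useful work, yet the polling variant is charged close to the full wall-clock interval. That charge is what the guest scheduler and, in turn, the hypervisor see.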

Timothy_Hall · ‎2026-01-05

Absolutely correct, Bob. In my research on DPDK, I keep seeing references to changing the poll rate somehow, but I can't figure out how to do it in R82.10. In lab and training environments, it would be helpful to reduce the poll rate on the SNDs from 100% to, say, 25%. R&D assist here? @PhoneBoy @_Val_

Yes, I understand doing so may cause buffering misses and harm performance. But if a training center or lab environment sets up two ClusterXL HA gateways with 6 or more total cores each, suddenly four of your CPUs are fully spoken for in ESXi or whatever. Multiply that by 10 students, and suddenly you have 40 of your cores that are always fully subscribed. Pretty sure most existing training environments aren't going to be able to handle that very well, let alone in the cloud, where you are getting charged by the cycle.
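The arithmetic above can be written out as a quick sketch. The default of two SND cores per gateway is an assumption inferred from "four of your CPUs" across two gateways; the actual split depends on the CoreXL configuration, and the function name is hypothetical.

```python
def always_busy_cores(students, gateways_per_student=2, snd_cores_per_gateway=2):
    """Cores pinned at 100% by poll-mode SNDs across a training environment.

    snd_cores_per_gateway=2 is an assumed value, not a documented default.
    """
    return students * gateways_per_student * snd_cores_per_gateway


per_student = always_busy_cores(1)
per_class = always_busy_cores(10)
print(f"per student: {per_student} cores, class of 10: {per_class} cores")
```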

tjoll · ‎2026-01-05

Thanks Bob and Timothy for the clarification.

I was put on the wrong track by this line in the SK: "This means that the User Space packet threads together with the kernel threads can result in an elevated CPU Load Average, but it does not reflect an actual increase in the CPU usage." I assumed that the Linux tools could not show the CPU usage correctly and that the CPU was actually idling.

In the end, it is real usage, which explains the things I saw. Maybe that line needs to be rephrased, or the release notes should mention that there is an increase in CPU usage. As Timothy mentioned, if all our engineers spin up a gateway in our lab, one of the hosts can become fully utilized very quickly. It would be nice to be able to change the polling rate.

Thanks all, for the input and explanation.

Have a nice day.

Mitchel

Chris_Atkinson · ‎2026-01-05

I'll submit some feedback on the SK to the effect that the baseline CPU may appear higher and not track the network traffic profile from a performance perspective as in the past.

Jon_Paine · ‎2026-01-09

> let alone in the cloud, where you are getting charged by the cycle.

The supported instance types are "fixed" meaning you pay for the whole processor, regardless of whether you use it or not. R82.10 should not trigger any increase in charges. (Usual disclaimers apply.)

Some clouds do have "burstable" instance types or families, where you pay more if the CPU spikes above, say 30%. I have not tested (why would I...?), but as far as I know, they are not supported for Gaia.

Timothy_Hall · ‎2026-01-09

Got it, thanks for the clarification. However, even if you pay for the entire processor and it runs at 100% utilization all day, every day, will the cloud provider have an issue with that? Would they start denying scheduling (steal) in that case?

Jon_Paine · ‎2026-01-12

For authoritative answers, please check with each of your cloud providers / billing consultancy / enterprise agreement.

Given the size of a host in the cloud, having one R82.10 instance running hotter is unlikely to make a noticeable difference.

Usual disclaimers apply, my shirt says Check Point, not anything else.

emmap · ‎2026-01-13

Just to note this for the conversation: R82.10 is not yet listed as being supported on virtual platforms. This may be worth revisiting when support is added, which will likely come in a JHF take. I have no insight into whether anything will actually change, but it'll be easier to get more information about it.

https://www.checkpoint.com/support-services/hcl/#virtual-machines

emmap · ‎2026-01-19

For clarity, looks like VMs are supported on the HCL now.

tjoll · ‎2026-01-20

I think open server was already supported as I checked the following documentation: https://sc1.checkpoint.com/documents/R82.10/WebAdminGuides/EN/CP_R82.10_RN/Content/Topics-RN/Support...

That said, I don't think they will change the DPDK architecture only for open servers. A fully utilized CPU in a virtual environment can still cause issues in the future.

emmap · ‎2026-01-20

Virtual Machine support is separated from Open Server support on the HCL. I don't know if there's any roadmap to change anything around DPDK though - probably this will rest on feedback from the field.
