Understanding if high CPU utilization is 'normal' ...

Parabol · ‎2023-03-09

Hi all,

We have automated alerts setup with our SNMP monitoring platform, so that if our checkpoint gateways exceed 80%+ CPU utilization for a duration of 10-minutes or longer, we receive an alert.

The alerts generally trigger quite frequently, although it's very inconsistent. A particular gateway might trigger for multiple days, then nothing. There isn't really any pattern.

I've tried to look into the alerts to understand if this is "normal" operation, or something which needs further investigation.

My method has been to use cpview -t to check the historical util, this shows some information such as CPU type (e.g. CoreXL_FW).

I also check the /var/log/spike_detective logs, but I find the process information doesn't mean much to me, e.g.:

spike info: type: thread, thread id: 86227, thread name: fwk3_3, start time: 21/02/23 05:11:54, spike duration (sec): 29, initial cpu usage: 100, average cpu usage: 100, perf taken: 0

I wondered if others have alternative methods to investigate high CPU utilization to understand the cause? Or is it quite normal to have frequent "spikes" and periods of high CPU in normal operation? If so perhaps we need to tweak our alert threshold.

Thanks in advance, and I appreciate it's quite a variable question depending on factors like throughput, active blades, model etc..

Timothy_Hall · ‎2023-03-09

A policy installation to a gateway will cause a spike which is expected. The next most common thing is the presence of elephant flows, try running fw ctl multik print_heavy_conn to see all elephant flows for the last 24 hours. These will normally coincide with spikes reported by the Spike Detective depending upon just how busy overall your firewall is.

Attend my online "Be your Own TAC: Part Deux" CheckMates event
March 27th with sessions for both the EMEA and Americas time zones

the_rock · ‎2023-03-09

Excellent advice by @Timothy_Hall as always. I would also suggest run below commands:

ps -auxw

free -m

top

cpview

cpwd_admin list

Btw, below is output of the command Tim gave on my fw, but obviously, there are no issues, as its basic lab with just single windows fw behind it (with https inspection on)

[Expert@quantum-firewall:0]# fw ctl multik print_heavy_conn
[fw_1]; Conn: 172.16.10.178:62747 -> 34.104.35.123:80 IPP 6; Instance load: 63%; Connection instance load: 99%; StartTime: 08/03/23 21:56:37; Duration: 104; IdentificationTime: 08/03/23 21:56:44; Service: 6:80; Total Packets: 17866; Total Bytes: 19523333;
[Expert@quantum-firewall:0]# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 3 | 11 | 65
1 | Yes | 2 | 8 | 73
2 | Yes | 1 | 11 | 79
3 | Yes | 0 | 9 | 67
[Expert@quantum-firewall:0]# free -m
total used free shared buff/cache available
Mem: 21860 5206 11350 8 5303 15644
Swap: 32191 0 32191
[Expert@quantum-firewall:0]#

I also strongly recommend his book. Dont worry, he did not pay me to say this, I just think its EXCELLENT 👍👍

https://www.amazon.ca/Max-Power-Firewall-Performance-Optimization/dp/1981481222/ref=sr_1_1?crid=PRBO...

Are you a member of CheckMates?

Understanding if high CPU utilization is 'normal' or not?