Hi everyone,
At the moment I have an ongoing issue with a customer. The symptoms are as follows:
High CPU load:
top - 12:50:06 up 9:41, 3 users, load average: 24.71, 12.38, 6.78
Tasks: 347 total, 30 running, 317 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.4 us, 81.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.3 hi, 15.4 si, 0.0 st
KiB Mem : 98087944 total, 69572780 free, 15522848 used, 12992316 buff/cache
KiB Swap: 67108860 total, 67108860 free, 0 used. 81122528 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
20460 admin 20 0 216456 103184 25532 R 64.4 0.1 32:58.43 rad 
11626 admin 20 0 0 0 0 R 55.8 0.0 72:38.48 fw_worker_7 
11628 admin 20 0 0 0 0 R 54.6 0.0 70:53.74 fw_worker_9 
11623 admin 20 0 0 0 0 R 49.5 0.0 74:40.62 fw_worker_4 
11624 admin 20 0 0 0 0 R 44.8 0.0 71:14.27 fw_worker_5 
11627 admin 20 0 0 0 0 R 41.3 0.0 71:24.60 fw_worker_8 
11625 admin 20 0 0 0 0 R 40.7 0.0 70:17.07 fw_worker_6 
11622 admin 20 0 0 0 0 R 38.5 0.0 72:11.25 fw_worker_3 
11619 admin 20 0 0 0 0 R 37.5 0.0 74:28.74 fw_worker_0 
11620 admin 20 0 0 0 0 R 37.2 0.0 73:07.65 fw_worker_1 
19952 admin 20 0 940080 353524 49376 R 31.5 0.4 110:26.24 fw_full
We see the load increase slowly on the workers first, and later on the RAD daemon.
RAD shows no errors in the RAD directory or in SmartConsole, and the CPU spike log is empty.
The only fix for now is to fail over to the other member, after which the cycle starts over. At the moment I have to fail over every 10-15 minutes.
A TAC case is in progress as we speak. I wanted to reach out to the community for a second opinion and maybe some ideas.
We also upgraded the setup yesterday from R81.20 take 113 to take 115, with no improvement. Disabling blades (AV, IPS, etc.) gave the same result.
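To put a number on the top summary line above: adding up the sy, hi and si fields gives the share of CPU time spent in kernel and interrupt context, which is where the fw_worker threads show up. This is only a rough sketch that assumes the `%Cpu(s):` line format shown in this post; the function name `kernel_cpu_share` is made up for illustration.

```shell
# Sum the sy (system), hi (hardware IRQ) and si (soft IRQ) fields of every
# "%Cpu(s):" line read on stdin, e.g. from "top -bn1".
kernel_cpu_share() {
    awk '/^%Cpu\(s\):/ {
        sum = 0
        # Fields come in "value label," pairs starting at field 2
        for (i = 2; i < NF; i += 2) {
            label = $(i + 1)
            sub(/,$/, "", label)
            if (label == "sy" || label == "hi" || label == "si") sum += $i
        }
        printf "kernel+irq: %.1f%%\n", sum
    }'
}
```

Fed the summary line from the first sample, it prints `kernel+irq: 97.6%` (81.9 sy + 0.3 hi + 15.4 si), so almost nothing is left for user space or idle.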
Please provide the output of the following commands, ideally taken while the issue is occurring, prior to failing over:
fwaccel stat
fwaccel stats -s
enabled_blades
netstat -ni
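Since the window before each failover is short, it may help to grab all of these outputs in one go. The sketch below is just one way to do that, not an official tool: `collect_diag` is a hypothetical helper, and the output path in the commented example is an assumption. It only runs the commands you hand it.

```shell
# Append the output of each given command, under a header, to one file.
# A command that fails or is not found is noted in the file, not treated as fatal.
collect_diag() {
    local outfile="$1"; shift
    local cmd
    for cmd in "$@"; do
        {
            echo "===== $cmd ====="
            # bash -c lets each entry carry its own arguments
            bash -c "$cmd" 2>&1 || echo "(command failed or not found: $cmd)"
            echo
        } >> "$outfile"
    done
}

# Example call from expert mode (path is an assumption):
# collect_diag "/var/log/diag_$(date +%Y%m%d_%H%M%S).txt" \
#     "fwaccel stat" "fwaccel stats -s" "enabled_blades" "netstat -ni"
```

Running it once while the load is climbing and once right after a failover gives two files that can be compared directly.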
Any chance you are on a Quantum Force 3900/9XXX/19XXX/29XXX or Lightspeed appliance? UPPAK is in play there.
If URLF is enabled, this could be the URL categorization cache thrashing because you have far more than 1,000 surfing users behind the firewall. This cache is not synced between cluster members, so a failover would fix the issue temporarily. It could also be the AV anti-malware cache thrashing, but it sounds like you tried turning off AV, and the issue persisted.
The only other thing a failover would do is dump all connections out of the Medium Path into the fastpath upon failover by default, which would significantly reduce the CPU load on your firewall worker instances temporarily. This effect upon failover was discussed in my CPX Presentation which you may want to review.
Hi,
Thank you for the reply. This gateway runs in kernel mode (open server).
AV is indeed off, and I suspect URL Filtering as you suggest. The info is below 🙂
fwaccel stat
+---------------------------------------------------------------------------------+
|Id|Name |Status |Interfaces |Features |
+---------------------------------------------------------------------------------+
|0 |KPPAK |enabled |eth,eth,et,eth,eth,|Acceleration,Cryptography |
| | | |eth,eth,eth,eth | |
| | | | |Crypto: Tunnel,UDPEncap,MD5, |
| | | | |SHA1,3DES,DES,AES-128,AES-256,|
| | | | |ESP,LinkSelection,DynamicVPN, |
| | | | |NatTraversal,AES-XCBC,SHA256, |
| | | | |SHA384,SHA512 |
+---------------------------------------------------------------------------------+
Accept Templates : enabled
Drop Templates : enabled
NAT Templates : enabled
LightSpeed Accel : disabled
fwaccel stats -s
Accelerated conns/Total conns : 56744/159331 (35%)
LightSpeed conns/Total conns : 0/159331 (0%)
Accelerated pkts/Total pkts : 7073015862/7843049171 (90%)
LightSpeed pkts/Total pkts : 0/7843049171 (0%)
F2Fed pkts/Total pkts : 770033309/7843049171 (9%)
F2V pkts/Total pkts : 46691949/7843049171 (0%)
CPASXL pkts/Total pkts : 0/7843049171 (0%)
PSLXL pkts/Total pkts : 4516731459/7843049171 (57%)
CPAS pipeline pkts/Total pkts : 0/7843049171 (0%)
PSL pipeline pkts/Total pkts : 0/7843049171 (0%)
QOS inbound pkts/Total pkts : 0/7843049171 (0%)
QOS outbound pkts/Total pkts : 0/7843049171 (0%)
Corrected pkts/Total pkts : 0/7843049171 (0%)
enabled_blades
fw urlf appi SSL_INSPECT anti_bot mon
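The percentages in `fwaccel stats -s` are shown as whole numbers, which makes small shifts between two samples hard to see. As a rough sketch, assuming the `name : count/total (NN%)` layout shown above, the ratios can be recomputed to two decimals. The function name `path_shares` is hypothetical.

```shell
# Recompute each "count/total" ratio from "fwaccel stats -s" output on stdin.
path_shares() {
    awk '
        match($0, /[0-9]+\/[0-9]+/) {
            # n[1] = count, n[2] = total
            split(substr($0, RSTART, RLENGTH), n, "/")
            name = substr($0, 1, index($0, ":") - 1)
            sub(/[ \t]+$/, "", name)
            if (n[2] + 0 > 0)
                printf "%-32s %6.2f%%\n", name, 100 * n[1] / n[2]
        }
    '
}

# Example call on the gateway:
# fwaccel stats -s | path_shares
```

Lines without a `count/total` pair, such as the template status lines, are skipped.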
All those outputs look OK. I'm pretty sure this is a cache thrash issue caused by AB and/or URLF; see the last two paragraphs on the second page, quoted from the most recent edition of my Gateway Performance Course:
Apart from what Tim asked for, maybe send us the output of the commands below as well.
Andy
************
fw tab -t connections -s
fw ctl multik print_heavy_conn
Hi,
In the thread URL-filtering-blade-RAD-process-causing-high-CPU-tip we discussed a few RAD issues.
In our case of high CPU with RAD, we had to disable the RAD autodebug option as described in sk182859.
Cheers!
Good call @D_W
Hi,
Thanks for the tip, autodebug is already disabled 🙂
We see loads of the following RAD error.
FlowError=RAD request exceeded maximum handing time
On the other hand, the CPU issue persists even at moments when there are 0 RAD errors. The errors had been gone all day and just popped up again:
grep "FlowError=" $FWDIR/log/rad_events/Errors/* | grep -oP '(?<=FlowError=).*' | sort | uniq -c | sort -nr
483 RAD request exceeded maximum handing time
15 Failed to fetch Check Point resources. Timeout was reached
14 Failed to fetch Check Point resources. Couldn't resolve host name
1 Failed to fetch Check Point resources. Couldn't connect to server
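Because the errors come and go, it can also be useful to see which files under the Errors directory actually contain them. The sketch below assumes that directory holds several files, as the `Errors/*` glob above suggests; the function name `flow_errors_by_file` is made up for illustration.

```shell
# Print "file:count" for every file containing FlowError= lines,
# highest count first. Files with no matches are left out.
flow_errors_by_file() {
    # The extra /dev/null forces grep to prefix the file name even for one file
    grep -c "FlowError=" "$@" /dev/null |
        awk -F: '$NF + 0 > 0 { print $NF, $0 }' |
        sort -nr |
        cut -d' ' -f2-
}

# Example call on the gateway:
# flow_errors_by_file $FWDIR/log/rad_events/Errors/*
```

Comparing the file names or timestamps against the times of the CPU spikes shows whether the RAD errors line up with the load at all.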
That error is about connectivity issues. What is using the CPU now, RAD or the FW workers?
Hi Val,
First we see increased load on the fw_workers; shortly after, RAD joins in with high load as well.
RAD errors have been clear for most of the day today. We experienced high load without any RAD errors in the relevant folder.
This is how it looks mid-issue. The customer notices problems at a load average of around 25.
top - 17:36:03 up 14:27, 4 users, load average: 10.60, 7.64, 7.45
Tasks: 350 total, 19 running, 331 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.0 us, 78.1 sy, 0.0 ni, 11.4 id, 0.0 wa, 0.5 hi, 6.0 si, 0.0 st
KiB Mem : 98087944 total, 49163212 free, 15571648 used, 33353084 buff/cache
KiB Swap: 67108860 total, 67108860 free, 0 used. 81177276 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
19952 admin 20 0 962644 374820 49380 R 86.5 0.4 200:43.26 fw_full 
11622 admin 20 0 0 0 0 R 69.4 0.0 133:56.11 fw_worker_3 
11619 admin 20 0 0 0 0 R 62.3 0.0 137:19.11 fw_worker_0 
11625 admin 20 0 0 0 0 R 61.6 0.0 130:51.28 fw_worker_6 
11624 admin 20 0 0 0 0 R 60.0 0.0 133:16.94 fw_worker_5 
11626 admin 20 0 0 0 0 R 57.4 0.0 134:03.56 fw_worker_7 
11623 admin 20 0 0 0 0 R 57.1 0.0 136:49.37 fw_worker_4 
11621 admin 20 0 0 0 0 R 56.5 0.0 133:35.66 fw_worker_2 
11628 admin 20 0 0 0 0 R 54.2 0.0 131:37.48 fw_worker_9 
11627 admin 20 0 0 0 0 R 53.5 0.0 132:21.71 fw_worker_8 
20460 admin 20 0 220916 111744 28284 R 53.2 0.1 60:11.37 rad 
11620 admin 20 0 0 0 0 R 44.5 0.0 134:50.44 fw_worker_1
Do you use external DNS servers like 9.9.9.9? They will eventually block the requests due to too many requests/minute.
Or maybe you hit a limit at 
$FWDIR/conf/rad_conf.C:
:max_flows (1000)
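A quick read-only check of the value currently configured can look like the sketch below. It assumes the `:max_flows (N)` syntax quoted above; the function name `rad_max_flows` is hypothetical, and nothing is modified.

```shell
# Print the number configured for :max_flows in the given file
# (defaults to $FWDIR/conf/rad_conf.C).
rad_max_flows() {
    local conf="${1:-$FWDIR/conf/rad_conf.C}"
    sed -n 's/.*:max_flows[[:space:]]*(\([0-9][0-9]*\)).*/\1/p' "$conf"
}
```

Any change to the value itself is better left to TAC guidance.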
The CSV file only showed max flows being hit this morning, not later today. TAC noticed that we reached the cap but said it was not necessary to increase the max flows value. The firewalls use an internal Infoblox server for DNS. What happens upstream of that I don't know; I can ask if it matters 🙂
Update: we suspect the customer was under attack. I noticed the following logs:
SYN Defender: activated <interface>. Number of not established connections is 5017
Above 5000 not-established connections, SYN Defender kicks in and does the following (copied from the SK):
When the Gateway decides that a server is under attack, it switches to SYN Relay Defense. SYN Relay counters the attack by making sure that the three-way handshake is complete before sending a SYN packet to the connection's destination.
Even if the destination server is not listening on that port, the Gateway will respond with a SYN-ACK to make sure that the client completes the three-way handshake with an ACK; it does this to determine the legitimacy of the connection. After the Gateway has determined that the connection is legitimate, it forwards the packet to the firewall layer and eventually to the destination server
--------------
After I disabled this protection, the load went down and the firewall became stable, even though the customer was still under attack and the firewall was still dropping traffic. Note that this protection has a critical performance impact. We then blocked the attack upstream of the firewall and enabled the protection again.
I see loads of host and port scans. If the firewall has to reply to all of them because of the above protection, I can imagine it struggles.
Enable the SecureXL penalty box feature which will help a lot. It should be enabled by default as far as I am concerned.
 
					
				
				
			
		