Hi all,
A customer is facing high CPU on VSX, and even with TAC assisting I am struggling to figure out how to debug which blade or which part of fw_worker is causing it.
The system is R80.20 Jumbo Take 183. Several virtual systems run on the gateway, but only virtual system 1 is affected. A number of blades are enabled, and a week ago CPU usage suddenly started peaking from 40% to 100%. A reboot did not help, and a traffic switchover also had no effect. Turning off some of the "not so important" blades had minimal impact as well.
I suspect there might be an issue with DNS, Active Directory, or something seemingly unrelated, but currently I have no evidence.
Could you please give me a hint where to look for what might be causing fw_worker to use so much CPU time?
Thank you.
Please find output from some commands:
[Expert@gateway:0]# cpinfo -y all
This is Check Point CPinfo Build 914000202 for GAIA
[IDA]
No hotfixes..
[CPFC]
HOTFIX_R80_20_JUMBO_HF_MAIN Take: 183
[MGMT]
HOTFIX_R80_20_JUMBO_HF_MAIN Take: 183
[FW1]
HOTFIX_R80_20_JUMBO_HF_MAIN Take: 183
FW1 build number:
This is Check Point's software version R80.20 - Build 256
kernel: R80.20 - Build 255
[SecurePlatform]
HOTFIX_R80_20_JUMBO_HF_MAIN Take: 183
[CPinfo]
No hotfixes..
Only this particular virtual system is affected; overall, there is not much traffic on it:
[Expert@gateway:0]# vsx stat -l
...
VSID: 1
VRID: 1
Type: Virtual System
Name: virtualsystem1
Security Policy: policy
Installed at: 18Nov2020 8:38:19
SIC Status: Trust
Connections number: 69145
Connections peak: 72530
Connections limit: 99900
So far we have tried switching off the Anti-Virus and Anti-Bot blades, as the system is in production:
[Expert@gateway:1]# enabled_blades
fw urlf av appi identityServer anti_bot mon
Affinity looks fine. We have added additional CPUs to the virtual system, but taking into account the sudden increase in CPU usage with the same amount of traffic, I would prefer a more sophisticated solution than just "more power" (a sketch of the per-VS affinity syntax follows the output below).
[Expert@gateway:0]# fw ctl affinity -l
Mgmt: CPU 0
Sync: CPU 1
eth1-01: CPU 1
eth1-02: CPU 0
eth1-03: CPU 0
eth1-04: CPU 1
eth1-05: CPU 1
eth3-01: CPU 0
eth3-02: CPU 0
eth3-03: CPU 1
eth3-04: CPU 1
VS_0 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_1 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_2 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_3 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_4 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_5 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_6 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_7 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
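For reference, a minimal sketch of how cores can be dedicated to a single VS, using the documented VSX affinity syntax (the 2-9 range below is only an illustrative choice, not what is configured on this box):
[Expert@gateway:0]# fw ctl affinity -s -d -vsid 1 -cpu 2-9    (pin the VS 1 fwk instances to CPUs 2-9)
[Expert@gateway:0]# fw ctl affinity -l -r    (list assignments per CPU to verify the result)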
[Expert@gateway:0]# top
Tasks: 415 total, 1 running, 414 sleeping, 0 stopped, 0 zombie
Cpu(s): 6.0%us, 1.6%sy, 0.0%ni, 91.2%id, 0.1%wa, 0.1%hi, 1.1%si, 0.0%st
Mem: 32778976k total, 15861368k used, 16917608k free, 389068k buffers
Swap: 18892344k total, 0k used, 18892344k free, 6372636k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20156 admin 0 -20 3560m 2.8g 300m S 594 8.9 5069:25 fwk1_dev_0
8156 admin 0 -20 772m 167m 57m S 6 0.5 57:17.01 fwk6_dev_0
20605 admin 16 0 299m 74m 39m S 6 0.2 42:36.06 cpd
25484 admin 15 0 751m 221m 40m S 6 0.7 151:10.55 fw_full
3683 admin 15 0 256m 144m 53m S 4 0.5 482:48.48 rad
9099 admin 16 0 296m 67m 39m S 4 0.2 19:33.27 cpd
...
8456 admin 3 -20 3175m 2.5g 236m S 87 7.8 432:49.48 fwk1_3
8457 admin 0 -20 3175m 2.5g 236m R 84 7.8 429:57.89 fwk1_4
8458 admin 0 -20 3175m 2.5g 236m R 80 7.8 422:10.97 fwk1_5
8454 admin 0 -20 3175m 2.5g 236m S 79 7.8 457:55.30 fwk1_1
8455 admin 0 -20 3175m 2.5g 236m R 72 7.8 429:56.26 fwk1_2
8453 admin 0 -20 3175m 2.5g 236m R 72 7.8 434:19.71 fwk1_0
Which blades are enabled? How long are the spikes? Is it an Internet facing VS?
I would start by checking whether the SecureXL/CoreXL statistics shift during the spikes, and rule out the usual suspects: heavy connections and drops happening deep in the policy (cleanup rule). Do you have drop optimization enabled?
Since those are FWK processes, it could be either F2F or PXL traffic. I would look at the former first.
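A minimal way to check both from the VS 1 context (standard commands; the exact counters shown vary by take):
[Expert@gateway:0]# vsenv 1
[Expert@gateway:1]# fwaccel stats -s    (share of F2F vs. PSLXL/CPASXL traffic)
[Expert@gateway:1]# fw ctl multik stat    (connection distribution across the fwk instances)
If one fwk instance is far busier than the rest, that often points at a single heavy connection rather than a blade.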
I have seen quite a few cases where Internet scans had a significant impact on an Internet-facing VS. That could be consistent with your symptoms.
How many cores do you run on that VS?
Enabled blades:
fw urlf av appi identityServer anti_bot mon
The problem is that these are not typical spikes; it is continuous 80% CPU consumption across 8 cores during office hours. Two weeks ago it was around 50% with 5 cores, presumably with the same traffic pattern.
Yes, it is an external firewall, so it can receive traffic directly from the Internet.
A reboot did not help. Switchover to the other member had the same result: high CPU.
There was no known configuration change on the firewall before the issue started.
Most probably not AI. Was the situation changing gradually? Can you put a finger on the moment in time when it changed?
Any chance to turn off or simplify the policy for the rest of the blades? Also, acceleration and CoreXL statistics would help, plus the fwaccel stat output.
The customer claims the issue started on November 18th. With cpview I identified November 17th at 11:00, when the graphs show a sudden spike in CPU interrupts from 1 million to 1.1 billion; at the same time the Interface Errors counter started showing an error rate of 50k, even though netstat output shows 0 errors. Furthermore, interface throughput briefly dropped from 715 Mbps to 0 just before these events. Nothing relevant was found in the local logs, and the traffic logs do not show anything unusual, at least nothing clearly visible.
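(Assuming this was done with cpview's history mode; for anyone following along, the same graphs can be pulled up after the fact with:
[Expert@gateway:1]# cpview -t
and then navigating to the CPU and Network views for the time in question.)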
The output from fwaccel looks as I would expect; the majority of traffic goes via PSLXL.
[Expert@gateway:1]# fwaccel stats -s
Accelerated conns/Total conns : 1298/73821 (1%)
Accelerated pkts/Total pkts : 12358235182/12405674200 (99%)
F2Fed pkts/Total pkts : 47439018/12405674200 (0%)
F2V pkts/Total pkts : 47430970/12405674200 (0%)
CPASXL pkts/Total pkts : 0/12405674200 (0%)
PSLXL pkts/Total pkts : 11758122745/12405674200 (94%)
CPAS inline pkts/Total pkts : 0/12405674200 (0%)
PSL inline pkts/Total pkts : 0/12405674200 (0%)
QOS inbound pkts/Total pkts : 0/12405674200 (0%)
QOS outbound pkts/Total pkts : 0/12405674200 (0%)
Corrected pkts/Total pkts : 0/12405674200 (0%)
Drop templating is also activated:
[Expert@fwcluster01:1]# fwaccel stats -d
Reason Value Reason Value
-------------------- --------------- -------------------- ---------------
General 967 CPASXL Decision 0
PSLXL Decision 1285558 Clear Packet on VPN 0
Encryption Failed 0 Drop Template 286770
Decryption Failed 0 Interface Down 0
Cluster Error 0 XMT Error 0
Anti-Spoofing 194825 Local Spoofing 8506
Sanity Error 209 Monitored Spoofed 0
QXL Decision 0 C2S Violation 0
S2C Violation 0 Loop Prevention 0
DOS Fragments 0 DOS IP Options 0
DOS Blacklists 0 DOS Penalty Box 0
DOS Rate Limiting 0 Syn Attack 0
Reorder 1025 Virt Defrag Timeout 7368
Errors on interfaces:
[Expert@gateway:0]# netstat -ni
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
Mgmt 1500 0 24089392 0 0 0 30078984 0 0 0 BMRU
Sync 1500 0 106256569 0 384 0 155747864 0 0 0 BMRU
bond1 1500 0 2814106924 0 0 0 4647754685 0 0 0 BMmRU
bond2 1500 0 7756011045 0 0 0 5639537713 0 0 0 BMmRU
eth1-01 1500 0 123220626 0 0 0 143293137 0 0 0 BMRU
eth1-02 1500 0 0 0 0 0 0 0 0 0 BMsU
eth1-03 1500 0 5013412227 3 73523 73523 4622912696 0 0 0 BMRU
eth1-04 1500 0 32681606 0 0 0 31840976 0 0 0 BMRU
eth1-05 1500 0 418773297 0 0 0 1003323082 0 0 0 BMRU
eth3-01 1500 0 1797471464 0 0 0 4498085216 0 0 0 BMsRU
eth3-02 1500 0 1016637167 0 0 0 149672804 0 0 0 BMsRU
eth3-03 1500 0 6412175 0 0 0 5634342714 0 0 0 BMsRU
eth3-04 1500 0 7749603797 0 0 0 5198377 0 0 0 BMsRU
lo 16436 0 2284471 0 0 0 2284471 0 0 0 LRU
I see practically no accelerated connections here. Matching the first packet of every connection through the rulebase might also cause performance degradation, especially with 60-70k concurrent connections.
fwaccel stat should show whether templates are disabled. Also, were there any policy changes around November 17th at 11:00? The audit logs should show them.
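A minimal check for that, run in the VS 1 context:
[Expert@gateway:1]# fwaccel stat
Besides the accelerator status, the output shows the Accept Templates state and, if templates are disabled, which layer/rule turned them off.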
That is correct. With the blades fw urlf av appi identityServer anti_bot mon enabled, I would say it is expected that traffic goes through PSLXL.
Templates are enabled.
I suspect that one of the blades might be causing the high CPU, but I could not figure out how to prove it. I have tried disabling some of them, the ones I considered safe to disable without a maintenance window, but with no effect. The CPUs are busy with the fwk1_* processes.
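One hedged suggestion for narrowing this down to specific traffic rather than a blade: recent R80.20 Jumbo takes include CoreXL heavy-connection detection. If your take supports it, the following should list connections that a fwk instance has flagged as CPU-heavy:
[Expert@gateway:1]# fw ctl multik print_heavy_conn
That often points at the offending flow faster than toggling blades one by one.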
Hello Martin, did you find the culprit?
As Val said, the CPU jump could be caused by excessive rulebase lookup overhead due to the lack of SecureXL Accept Templates. Another situation that can manifest itself as perpetually high CPU is an issue with the state sync network in HA, usually driven by a dramatic jump in the new connections/sec rate. Does cpview reveal a jump in this rate around the time this started?
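Two quick checks that could complement the cpview history, both from the VS 1 context (standard commands, sketched here only as a starting point):
[Expert@gateway:1]# cphaprob syncstat    (sync transport statistics; retransmissions or lost updates point at a struggling sync network)
[Expert@gateway:1]# fw tab -t connections -s    (connections table size; watching the delta over a minute gives a rough feel for churn)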
What is the size of your Internet-routable addressing space on this firewall? /24? Larger? Drop Optimization/Templates is not the greatest at dealing with spikes in "trash" hack/scanning traffic coming in from the Internet, which can suddenly increase without warning and is exacerbated by a large Internet-routable footprint. A SecureXL penalty box setup does a much better job.