Hi all,
A customer is facing high CPU on VSX, but even with TAC assisting, I am struggling to figure out how to debug which blade or which part of fw_worker is causing it.
The system is R80.20 Jumbo Take 183. The gateway runs several virtual systems, but only virtual system 1 is affected. Several blades are enabled, and a week ago the CPUs suddenly started peaking from 40% to 100%. A reboot did not help, and a traffic switchover had no effect either. Turning off some of the "not so important" blades had minimal impact as well.
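To frame the question, this is roughly how I have been trying to narrow it down so far (a sketch of the commands; corrections welcome):

[Expert@gateway:0]# vsenv 1            # switch to the context of VS 1
[Expert@gateway:1]# fw ctl multik stat # per-CoreXL-instance connection counts and peaks
[Expert@gateway:1]# cpview             # live view; "cpview -t" for history before/after the spike
[Expert@gateway:0]# top -H             # per-thread view to see the individual fwk1 worker threads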
I suspect there might be an issue with DNS, Active Directory, or something else seemingly unrelated, but so far I have no evidence.
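To test that theory, I plan to run the following in the VS 1 context (a sketch; the DC name is a placeholder):

[Expert@gateway:0]# vsenv 1
[Expert@gateway:1]# adlog a dc                 # AD Query: status of the configured domain controllers
[Expert@gateway:1]# pdp connections pep        # health of the PDP<->PEP identity channel
[Expert@gateway:1]# nslookup dc1.example.local # placeholder DC FQDN - basic DNS sanity check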
Could you please give me any hint where to look to find out what might be causing fw_worker to use so much CPU time?
Thank you.
Please find the output of some commands below:
[Expert@gateway:0]# cpinfo -y all
This is Check Point CPinfo Build 914000202 for GAIA
[IDA]
No hotfixes..
[CPFC]
HOTFIX_R80_20_JUMBO_HF_MAIN Take: 183
[MGMT]
HOTFIX_R80_20_JUMBO_HF_MAIN Take: 183
[FW1]
HOTFIX_R80_20_JUMBO_HF_MAIN Take: 183
FW1 build number:
This is Check Point's software version R80.20 - Build 256
kernel: R80.20 - Build 255
[SecurePlatform]
HOTFIX_R80_20_JUMBO_HF_MAIN Take: 183
[CPinfo]
No hotfixes..
Only this particular virtual system is affected, and overall there is not that much traffic on it:
[Expert@gateway:0]# vsx stat -l
...
VSID: 1
VRID: 1
Type: Virtual System
Name: virtualsystem1
Security Policy: policy
Installed at: 18Nov2020 8:38:19
SIC Status: Trust
Connections number: 69145
Connections peak: 72530
Connections limit: 99900
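The connection count is within the limit, but for completeness the table usage can also be confirmed per VS (a sketch):

[Expert@gateway:0]# vsenv 1
[Expert@gateway:1]# fw tab -t connections -s   # VALS/PEAK vs. the configured limit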
So far we have tried switching off the Anti-Virus and Anti-Bot blades, as the system is in production:
[Expert@gateway:1]# enabled_blades
fw urlf av appi identityServer anti_bot mon
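Before disabling more blades, I would like to check for individual heavy connections. If the installed Jumbo take supports it (I believe recent R80.20 takes include heavy-connection detection), something like this should work in the VS 1 context:

[Expert@gateway:0]# vsenv 1
[Expert@gateway:1]# fw ctl multik print_heavy_conn   # connections flagged as CPU-heavy per CoreXL instance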
Affinity looks fine. We have added additional CPUs to the virtual system, but taking into account the sudden increase in CPU time with the same amount of traffic, I would prefer a more sophisticated solution than just "more power".
[Expert@gateway:0]# fw ctl affinity -l
Mgmt: CPU 0
Sync: CPU 1
eth1-01: CPU 1
eth1-02: CPU 0
eth1-03: CPU 0
eth1-04: CPU 1
eth1-05: CPU 1
eth3-01: CPU 0
eth3-02: CPU 0
eth3-03: CPU 1
eth3-04: CPU 1
VS_0 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_1 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_2 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_3 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_4 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_5 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_6 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
VS_7 fwk: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
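Since all virtual systems share the same CPU pool (2-19), I also want to confirm from the per-CPU side that no interface/SND work lands on the worker cores (a sketch):

[Expert@gateway:0]# fw ctl affinity -l -r   # reverse view: for each CPU, what is pinned to it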
[Expert@gateway:0]# top
Tasks: 415 total, 1 running, 414 sleeping, 0 stopped, 0 zombie
Cpu(s): 6.0%us, 1.6%sy, 0.0%ni, 91.2%id, 0.1%wa, 0.1%hi, 1.1%si, 0.0%st
Mem: 32778976k total, 15861368k used, 16917608k free, 389068k buffers
Swap: 18892344k total, 0k used, 18892344k free, 6372636k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20156 admin 0 -20 3560m 2.8g 300m S 594 8.9 5069:25 fwk1_dev_0
8156 admin 0 -20 772m 167m 57m S 6 0.5 57:17.01 fwk6_dev_0
20605 admin 16 0 299m 74m 39m S 6 0.2 42:36.06 cpd
25484 admin 15 0 751m 221m 40m S 6 0.7 151:10.55 fw_full
3683 admin 15 0 256m 144m 53m S 4 0.5 482:48.48 rad
9099 admin 16 0 296m 67m 39m S 4 0.2 19:33.27 cpd
...
8456 admin 3 -20 3175m 2.5g 236m S 87 7.8 432:49.48 fwk1_3
8457 admin 0 -20 3175m 2.5g 236m R 84 7.8 429:57.89 fwk1_4
8458 admin 0 -20 3175m 2.5g 236m R 80 7.8 422:10.97 fwk1_5
8454 admin 0 -20 3175m 2.5g 236m S 79 7.8 457:55.30 fwk1_1
8455 admin 0 -20 3175m 2.5g 236m R 72 7.8 429:56.26 fwk1_2
8453 admin 0 -20 3175m 2.5g 236m R 72 7.8 434:19.71 fwk1_0
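All six fwk1 worker threads sit between roughly 70% and 90%, so the load is spread evenly across every CoreXL instance of VS 1 rather than concentrated on one hot instance. That makes me suspect a traffic or lookup pattern common to all instances. My next step (a sketch; please validate) would be:

[Expert@gateway:0]# vsenv 1
[Expert@gateway:1]# fwaccel stats -s     # share of accelerated vs. F2F (fwk-handled) traffic
[Expert@gateway:1]# fw ctl zdebug drop   # short run only - look for drop storms driving retransmits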