Hi everyone,
At the moment I have an ongoing issue with a customer. The symptoms are as follows:
High CPU load:
top - 12:50:06 up 9:41, 3 users, load average: 24.71, 12.38, 6.78
Tasks: 347 total, 30 running, 317 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.4 us, 81.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.3 hi, 15.4 si, 0.0 st
KiB Mem : 98087944 total, 69572780 free, 15522848 used, 12992316 buff/cache
KiB Swap: 67108860 total, 67108860 free, 0 used. 81122528 avail Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20460 admin     20   0  216456 103184  25532 R  64.4  0.1  32:58.43 rad
11626 admin     20   0       0      0      0 R  55.8  0.0  72:38.48 fw_worker_7
11628 admin     20   0       0      0      0 R  54.6  0.0  70:53.74 fw_worker_9
11623 admin     20   0       0      0      0 R  49.5  0.0  74:40.62 fw_worker_4
11624 admin     20   0       0      0      0 R  44.8  0.0  71:14.27 fw_worker_5
11627 admin     20   0       0      0      0 R  41.3  0.0  71:24.60 fw_worker_8
11625 admin     20   0       0      0      0 R  40.7  0.0  70:17.07 fw_worker_6
11622 admin     20   0       0      0      0 R  38.5  0.0  72:11.25 fw_worker_3
11619 admin     20   0       0      0      0 R  37.5  0.0  74:28.74 fw_worker_0
11620 admin     20   0       0      0      0 R  37.2  0.0  73:07.65 fw_worker_1
19952 admin     20   0  940080 353524  49376 R  31.5  0.4 110:26.24 fw_full
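To make the CPU breakdown in the summary line easier to read, here is a rough Python sketch that splits the `%Cpu(s)` line into its fields and adds up the kernel-side time (`sy` + `hi` + `si`). This is only an illustration: `parse_cpu_line` is a made-up helper name, not a Check Point tool, and the input is simply the line pasted above.

```python
import re

# The %Cpu(s) summary line exactly as pasted from top above.
CPU_LINE = "%Cpu(s): 2.4 us, 81.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.3 hi, 15.4 si, 0.0 st"


def parse_cpu_line(line):
    """Return {field: percent} from a top '%Cpu(s):' line (hypothetical helper)."""
    pairs = re.findall(r"([\d.]+)\s+([a-z]{2})\b", line)
    return {name: float(value) for value, name in pairs}


fields = parse_cpu_line(CPU_LINE)

# Kernel-side time: system calls plus hardware and software interrupts.
kernel_share = fields["sy"] + fields["hi"] + fields["si"]

print(f"user:   {fields['us']:.1f}%")
print(f"kernel: {kernel_share:.1f}% (sy + hi + si)")
print(f"idle:   {fields['id']:.1f}%")
```

On this sample almost all CPU time is kernel-side (about 97.6%), with 2.4% user time and 0% idle.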
We can see the load slowly increase on the fw_worker processes and later on the RAD daemon.
RAD shows no errors in the rad directory or in SmartConsole. The CPU spike log is empty.
The only fix for now is to fail over to the other member, after which the cycle starts over. At the moment I have to fail over every 10-15 minutes.
A TAC case is in progress as we speak. I wanted to reach out to the community for a second opinion and maybe share some ideas.
We also upgraded the setup yesterday from R81.20 take 113 to take 115, with no improvement. Disabling blades (AV, IPS, etc.) gave the same result.