SOLR CPU usage R81.10

Storsveen · 2022-02-15

Hi!

I have noticed that after upgrading our management server (management + SmartEvent/log server) to R81.10, the CPU utilization of our event server is almost constantly at 90-99%. SOLR uses about 7.5 of the 8 cores.

top - 14:14:19 up 37 min,  1 user,  load average: 12.79, 13.95, 11.50
Tasks: 439 total,   3 running, 436 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  0.7 sy, 93.7 ni,  3.9 id,  0.0 wa,  0.5 hi,  0.2 si,  0.0 st
KiB Mem : 32726544 total,  3412652 free,  6690484 used, 22623408 buff/cache
KiB Swap: 33029636 total, 33029636 free,        0 used. 25099216 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 9731 admin     39  19  0.246t 0.012t 9.686g S 745.5 38.2 215:25.35 /opt/CPshrd-R81.10/jre_64/bin/java -D_solr=TRUE -Xdump:directory=/var/log/dump/usermode -Xdump:heap:events=gpf+user -Xdump:tool:none+
 9321 admin     20   0 1052836 695548  39836 R  10.6  2.1   4:28.82 fwd -n
10682 admin     39  19  659876 198044  83292 S   6.0  0.6   4:35.70 /opt/CPrt-R81.10/log_indexer/log_indexer
 2183 admin     20   0    3908   1732   1068 R   0.7  0.0   0:00.03 top s
  236 admin     20   0       0      0      0 S   0.3  0.0   0:00.70 [kworker/u16:29]
 8802 admin     20   0   29508   2208   1524 S   0.3  0.0   0:02.02 /bin/redis-server 127.0.0.1:6379
 9491 admin     20   0  212080  74036  38120 S   0.3  0.2   0:02.07 fwmha -H
10432 admin     20   0 8106120 1.900g  13876 S   0.3  6.1   3:47.45 /opt/CPshrd-R81.10/jre_64/bin/java -D_CPM=TRUE -Xaot:forceaot -Xmx3072m -Xms192m -Xgcpolicy:optavgpause -Djava.io.tmpdir=/opt/CPsuit+
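In top's summary line, "ni" is the share of CPU time spent running niced user processes (positive nice value). The SOLR java process runs at NI 19, so the 93.7 ni above is almost entirely SOLR, with log_indexer (also NI 19) adding a little. A minimal sketch for pulling that figure out of a top summary line, so it can be logged over time; the function name `cpu_nice_pct` is hypothetical and not part of Gaia:

```shell
#!/bin/sh
# Hypothetical helper: print the "ni" (niced user processes) percentage
# from a top summary line such as "%Cpu(s):  1.0 us,  0.7 sy, 93.7 ni, ...".
cpu_nice_pct() {
    # Split the line on commas; the field whose second word is "ni"
    # carries the niced-CPU percentage in its first word.
    printf '%s\n' "$1" | tr ',' '\n' | awk '$2 == "ni" { print $1 }'
}

# Example on the server itself: take the second batch-mode sample,
# since the first one may not reflect the current interval.
#   cpu_nice_pct "$(top -b -n 2 -d 1 | grep '^%Cpu(s)' | tail -n 1)"
```

Sampling this every few minutes (for example from cron) gives a simple record of how long SOLR stays pinned after a cpstop/cpstart.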

We have experienced a lot of issues with missing logs, and SmartConsole complains about losing its connection to the log server.

At the moment, one of our Maestro Security Groups is not able to send logs to the log server, and I believe this is related to the high CPU usage.
The CPU usage goes right back up after a cpstop/cpstart or a reboot of the log server.

Any tips for further troubleshooting this issue? I can see that the size of the local log file ($FWDIR/log/fw.log) on the appliances is increasing rapidly.

Fernando_Lopez · 2022-05-03

Hi Storsveen!

Did you get any update on this case?

I'm having similar issues with one of my customers and would be really glad if you could share any info that could help.

Thanks!

Ruan_Kotze · 2022-05-03

Had a ticket open with TAC for the same symptoms after upgrading the environment to R81.10: SOLR was taking up a lot of CPU and also crashing and restarting all the time. After a lot of troubleshooting, we ended up doubling the RAM (it was running inside a VM, so that was easy to do) and the issues immediately disappeared.

Edit: To be fair, a couple of internal segmentation firewalls were also added to the above SMS during this period, so the amount of logs processed increased. TAC can very quickly check whether it's a sizing issue.

Arskazv · 2022-08-02

I have somewhat similar issues with a dedicated R81.10 SmartEvent server.

SOLR generates dumps at least once a day, and the SOLR logs continuously report:

2022-08-03T09:32:17,029 WARN [DefaultQuartzScheduler_Worker-4] com.checkpoint.rfl.SolrPerformanceMonitoringJob.execute:2 - <<== Threads Report: number of threads - 276 ==>>
2022-08-03T09:32:17,031 WARN [DefaultQuartzScheduler_Worker-4] com.checkpoint.rfl.SolrPerformanceMonitoringJob.execute:5 - <<== Memory Report: Total Memory - 2048 MB; Used - 1943 MB, Free - 104 MB, Heap - 1943 MB, Non Heap - 104 MB, Percent Used - 94 %==>>
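The Memory Report line shows SOLR using 1943 of 2048 MB (94 %), which suggests its JVM is running close to its memory ceiling. A minimal sketch for scanning a SOLR log for such reports above a threshold; the function names are hypothetical, and because the thread does not give the SOLR log file's location, the path is passed in as an argument:

```shell
#!/bin/sh
# Hypothetical helper: print the "Percent Used" value from a
# SolrPerformanceMonitoringJob Memory Report line (empty if absent).
solr_mem_pct() {
    printf '%s\n' "$1" | sed -n 's/.*Percent Used - \([0-9][0-9]*\) %.*/\1/p'
}

# Hypothetical helper: print every Memory Report line at or above a threshold.
# $1: path to a SOLR log file (location assumed, supply your own)
# $2: threshold in percent, default 90
solr_mem_alerts() {
    threshold=${2:-90}
    grep 'Memory Report' "$1" | while IFS= read -r line; do
        pct=$(solr_mem_pct "$line")
        if [ -n "$pct" ] && [ "$pct" -ge "$threshold" ]; then
            printf '%s\n' "$line"
        fi
    done
}
```

Counting the matches per day (for example with `wc -l`) makes it easy to see whether memory pressure lines up with the times the dumps are written.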

[Expert@smartevent-p3:0]# top
top - 09:35:22 up 1 day, 19:05, 1 user, load average: 12.68, 11.15, 7.91
Tasks: 289 total, 1 running, 288 sleeping, 0 stopped, 0 zombie
%Cpu0 : 16.3 us, 3.6 sy, 70.6 ni, 7.9 id, 0.0 wa, 0.4 hi, 1.2 si, 0.0 st
%Cpu1 : 2.4 us, 0.8 sy, 89.8 ni, 6.7 id, 0.0 wa, 0.4 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.4 sy, 92.8 ni, 6.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.0 us, 0.0 sy, 95.6 ni, 4.0 id, 0.0 wa, 0.4 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 92.5 ni, 7.1 id, 0.0 wa, 0.4 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.4 sy, 90.5 ni, 8.7 id, 0.0 wa, 0.4 hi, 0.0 si, 0.0 st
%Cpu6 : 0.8 us, 0.8 sy, 94.9 ni, 3.2 id, 0.0 wa, 0.4 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 93.7 ni, 6.0 id, 0.0 wa, 0.4 hi, 0.0 si, 0.0 st
%Cpu8 : 0.0 us, 0.4 sy, 92.9 ni, 5.9 id, 0.0 wa, 0.8 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.4 sy, 94.5 ni, 4.3 id, 0.0 wa, 0.4 hi, 0.4 si, 0.0 st
%Cpu10 : 0.0 us, 0.4 sy, 94.0 ni, 5.2 id, 0.0 wa, 0.4 hi, 0.0 si, 0.0 st
%Cpu11 : 3.2 us, 0.4 sy, 90.1 ni, 5.9 id, 0.0 wa, 0.4 hi, 0.0 si, 0.0 st
KiB Mem : 32725928 total, 4313080 free, 6906192 used, 21506656 buff/cache
KiB Swap: 33029636 total, 32893956 free, 135680 used. 24518796 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4048 admin 39 19 0.211t 7.566g 5.370g S 1079 24.2 124:35.81 java
5093 admin 20 0 467808 360408 22168 S 24.9 1.1 367:47.50 cpsead
4328 admin 39 19 1220200 741492 9136 S 15.0 2.3 377:20.99 log_indexer
3887 admin 20 0 9982.2m 896472 13948 S 0.4 2.7 43:33.54 java

SmartEvent reports are empty every now and then.

IZoom · 2023-04-25

Did you solve it? I have the same issue.

Mantecoso · 2023-05-08

Try this on the SmartLog or SmartEvent server:

$FWDIR/scripts/override_server_setting.sh -p RFL_SOLR_MAX_HEAP 8192

cpstop;cpstart
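Because cpstop stops all Check Point services, a dry run that checks the requested heap against the server's RAM before printing the suggested commands can help. A minimal sketch, assuming the RFL_SOLR_MAX_HEAP value is in MB (the Memory Report above is in MB) and using "no more than half of physical RAM" as an arbitrary safety margin, not a Check Point guideline; `plan_solr_heap` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the commands to raise the SOLR heap,
# or refuse if the heap exceeds an assumed margin of half the physical RAM.
# $1: requested heap in MB (unit assumed), $2: MemTotal in kB (/proc/meminfo)
plan_solr_heap() {
    heap_mb=$1
    mem_kb=$2
    heap_kb=$((heap_mb * 1024))
    if [ "$heap_kb" -gt $((mem_kb / 2)) ]; then
        echo "refusing: ${heap_mb} MB heap exceeds half of ${mem_kb} kB RAM" >&2
        return 1
    fi
    # Single quotes keep $FWDIR literal so the command is printed, not run.
    printf '$FWDIR/scripts/override_server_setting.sh -p RFL_SOLR_MAX_HEAP %s\n' "$heap_mb"
    printf '%s\n' 'cpstop;cpstart'
}

# Example on the server itself (Expert mode); review the output, then run it:
#   plan_solr_heap 8192 "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
```

Printing instead of executing keeps the restart a deliberate step; the servers in this thread report about 32 GB of RAM, so an 8192 MB heap passes the check.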
