R80.40 management server performance issues
All,
Since we did an in-place upgrade from R80.10 to R80.40 JHF Take 83 on our management server, we have been seeing decreased performance. The issue seems to be across the board: both SmartConsole and SSH sessions have been very laggy since the upgrade. We did reboot the management server after the upgrade; it seemed fine for a couple of days but has gotten worse since. I can see that Java is taking up a ton of CPU (see the attached top output). Has anyone seen or experienced this? I do have a TAC case opened, but wanted to check whether anyone out there has run into this.
Thanks,
Bill
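For anyone gathering the same evidence: a batch-mode capture is easier to attach to a case than a screenshot. Something along these lines should work from expert mode on Gaia, assuming the stock procps top and ps (the exact flags here are illustrative, not from the original post):
# top -b -n 1 | head -25
# ps -eo pid,ni,pcpu,pmem,etime,comm --sort=-pcpu | head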
---
The process that is taking up all the CPU is niced, so it will use CPU only when available.
I suspect the indexing going on is related to the upgrade and will subside shortly.
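A quick way to verify the nice value for yourself, assuming the standard procps ps on Gaia (a positive NI value means the scheduler deprioritizes the process in favor of anything running at normal priority):
# ps -eo pid,ni,pri,pcpu,comm --sort=-pcpu | head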
---
Hi Bill,
Can you please send us your 'SmartEventCollectLogs' output, or point us to the TAC case with this info?
---
Hi Dror,
Pardon my ignorance. How do I get the SmartEventCollectLogs output? Would I need to run 'SmartEventSetDebugLevel all trace' first, and if so, for how long?
---
No.
Once the load is high again, simply run it as-is from the expert-mode CLI:
SmartEventCollectLogs
and attach the output here and to the TAC ticket.
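For anyone following along, the full sequence from a Gaia clish login would presumably be just the following (expert is the standard Gaia command to drop into the expert-mode shell; the collector script takes no arguments per the note above):
> expert
# SmartEventCollectLogs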
---
As Phoneboy said, the indexing processes run with minimum CPU and I/O priority and will get shoved out of the way when other work needs to get done. However, the very high wio (I/O wait) values on some cores and not others are a bit concerning. Are you using RAID for your disks? Is the RAID array in an Optimal state? (Use the expert-mode raid_diagnostic command.)
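A minimal sketch of how to check both points: raid_diagnostic is the command named above, while vmstat and top are the stock Linux tools on a Gaia open server (the 'wa' column in vmstat is I/O wait; pressing 1 inside top shows the per-core breakdown):
# raid_diagnostic
# vmstat 5 5
# top    (then press 1 for the per-core view)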
You may want to check this recent SK, which is for gateways but sounds eerily similar to your situation: sk170560: High CPU, high IOWait utilization on random CPUs, and delayed CLI outputs on various comma...
now available at maxpowerfirewalls.com
---
Hi Phoneboy and Tim,
Thanks for your inputs. I will check out the SK as well. I checked the RAID on the disks: the RAID-5 state is Optimal and all drives checked out fine. Our management server is an open server doing double duty as a management server and a log server at the same time. Our logging partition is 8 TB, of which about 6 TB is consumed. The servers were upgraded about two weeks ago; a reboot seems to help for several days afterward. We recently rebooted it and are keeping an eye on it for performance issues.
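If it helps to double-check those partition figures, the stock tools will confirm them; using $FWDIR/log as the location of the firewall logs is an assumption here, as the layout varies by open-server install:
# df -h
# du -sh $FWDIR/log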
---
Hmm, even with the RAID in an Optimal state, it is starting to smell like your disk path is a bit oversubscribed. Please post the output of these two commands to show the logging and indexing load:
cpstat mg -f log_server
cpstat mg -f indexer
now available at maxpowerfirewalls.com
---
# cpstat mg -f log_server
Log Receive Rate: 9355
Log Receive Rate Peak: 61544
Log Receive Rate Last 10 Minutes: 13169
Log Receive Rate Last Hour: 12465
---
We were hit by the fix 'NEW: Solr server process is restarted automatically if it is not responsive for a long time.' in R80.30 Take 219 and had to downgrade to Take 217.
We saw the same issue you are describing. The same "fix" was implemented in R80.40 Take 78, so perhaps you are hit by the same issue? Tail cpm.elg and you will see it crashing constantly.
The issue is fixed in Take 87 (see sk170634), but Take 87 only just went GA, so I wouldn't recommend it yet.
Best regards,
Henrik
---
Hi Henrik,
I did the tail on cpm.elg and can see Solr restarting. I will post this info in our case as well.
tail -f cpm.elg | grep -E 'Stopping|Starting'
25/11/20 07:48:42,784 INFO fts.solr.SolrServerRunner [qtp-536021905-133083]: Stopping Solr with $MDS_TEMPLATE/scripts/solr_stop.sh script
25/11/20 07:48:42,787 INFO fts.solr.SolrServerRunner [qtp-536021905-133083]: Starting Solr server with command: /opt/CPshrd-R80.40/jre_64/bin/java -D_CPM_SOLR=TRUE -Xmx8192m -Xms64m -Xgcpolicy:optavgpause -Djava.io.tmpdir=/opt/CPsuite-R80.40/fw1/tmp -Xaggressive -Xshareclasses:none -Xdump:heap:events=gpf+user -Xdump:directory=/var/log/dump/usermode -Xdump:tool:none -Xdump:tool:events=gpf+abort+traceassert+corruptcache,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=systhrow,filter=java/lang/OutOfMemoryError,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,priority=1,exec=kill -9 %pid -Dsolr.solr.home=/opt/CPsuite-R80.40/fw1/Solr/solr/ -DNGM.SOLR.LOG.DIR=/opt/CPsuite-R80.40/fw1/log -Djava.util.logging.config.file=/opt/CPsuite-R80.40/fw1/Solr/etc/logging.properties -DSTART=/opt/CPsuite-R80.40/fw1/Solr/start.config -Djetty.home=/opt/CPsuite-R80.40/fw1/Solr/ -DSTOP.KEY=checkpointkey -DSTOP.PORT=8982 -Dpath=/opt/CPsuite-R80.40/fw1/cpm-server/java_is.jar:/opt/CPsuite-R80.40/fw1/cpm-server/java_sic.jar:/opt/CPshrd-R80.40/jars/jetty_assist.jar -jar /opt/CPsuite-R80.40/fw1/Solr/start.jar
25/11/20 07:49:06,916 INFO fts.solr.SolrServerRunner [qtp-536021905-206009]: Stopping Solr with $MDS_TEMPLATE/scripts/solr_stop.sh script
25/11/20 07:49:06,918 INFO fts.solr.SolrServerRunner [qtp-536021905-206009]: Starting Solr server with command: [same java command line as above]
25/11/20 07:49:14,937 INFO fts.solr.SolrServerRunner [qtp-536021905-21461]: Stopping Solr with $MDS_TEMPLATE/scripts/solr_stop.sh script
25/11/20 07:49:14,940 INFO fts.solr.SolrServerRunner [qtp-536021905-21461]: Starting Solr server with command: [same java command line as above]
25/11/20 07:49:47,380 INFO fts.solr.SolrServerRunner [qtp-536021905-176307]: Stopping Solr with $MDS_TEMPLATE/scripts/solr_stop.sh script
25/11/20 07:49:47,383 INFO fts.solr.SolrServerRunner [qtp-536021905-176307]: Starting Solr server with command: [same java command line as above]
25/11/20 07:49:48,261 INFO fts.solr.SolrServerRunner [qtp-536021905-161867]: Stopping Solr with $MDS_TEMPLATE/scripts/solr_stop.sh script
25/11/20 07:49:48,263 INFO fts.solr.SolrServerRunner [qtp-536021905-161867]: Starting Solr server with command: [same java command line as above]
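To put a number on how often Solr is bouncing, a simple count over the same file works; $FWDIR/log/cpm.elg as the path is an assumption based on the standard management log directory:
# grep -c 'Starting Solr server' $FWDIR/log/cpm.elg
# grep 'Stopping Solr' $FWDIR/log/cpm.elg | tail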
---
Your log rate is quite high: ~10,000 logs/sec with peaks of ~60,000 logs/sec.
Please send the cpm.elg files and the 'SmartEventCollectLogs' output here (what is the TAC ticket #?).
Also, is it still happening consistently, both the CPM Solr restarts and the high load causing slowness on the management server?
---
Hi Dror,
I have uploaded today's cpm.elg and today's SmartEventCollectLogs output to our case, in the incoming folder. The case is 6-0002423430. The load may not be as high since we are on holiday today.
---
I cannot find your case number for some reason.
If the load is higher today, please re-generate and re-upload.
Please also send it directly to my email: drora@checkpoint.com.
Thanks.
---
The load is higher today. I am in the process of getting the info to you. I will upload it to the case again and email it to you as well once completed.