Collaborator

R80.40 management server performance issues

All,

Since an in-place upgrade of our management server from R80.10 to R80.40 JHF 83, its performance has degraded across the board. Both SmartConsole and SSH sessions have been very laggy since the upgrade. We rebooted the management server after the upgrade; it seemed fine for a couple of days, but then got worse again. Java is taking up a ton of CPU (see the attached top output). Has anyone else experienced this? I do have a TAC case open, but wanted to check whether anyone out there has seen the same thing.

Thanks,

Bill

 

 

12 Replies
Admin

The process that is taking up all the CPU is niced, so it will use CPU only when available.
I suspect the indexing going on is related to the upgrade and will subside shortly.

Employee++

Hi Bill,

Can you please send us your 'SmartEventCollectLogs' output, or point us to the TAC case that has this info?

 

Collaborator

Hi Dror,

Pardon my ignorance. How do I get the SmartEventCollectLogs output? Would I need to run 'SmartEventSetDebugLevel all trace' first, and if so, for how long?

Employee++

No.
Once the load is high again, simply run it as-is from the expert mode CLI:
SmartEventCollectLogs
and attach the output here and to the TAC ticket.

Champion

As Phoneboy said, the indexing processes run with minimum CPU and I/O priority and will get shoved out of the way when other work needs to get done. However, the very high wio values on some cores but not others are a bit concerning. Are you using RAID for your disks? If so, is the RAID array in an Optimal state? (Use the expert mode raid_diagnostic command to check.)
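To put numbers on the uneven I/O wait across cores, here is a minimal sketch that reads /proc/stat-formatted text and prints each core's iowait share. It assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function name iowait_per_core is made up for illustration. Note that these counters are cumulative since boot, so this is a long-run average, not the instantaneous value that top shows.

```shell
#!/bin/bash
# Print each core's iowait share (whole percent, truncated) of total CPU time.
# Reads /proc/stat-style text on stdin. Fields after the cpuN label are:
#   user nice system idle iowait irq softirq steal ...
# iowait_per_core is a hypothetical helper name, not a Check Point tool.
iowait_per_core() {
    awk '/^cpu[0-9]+ / {
        total = 0
        # Sum user..steal (fields 2-9); guest time is already counted in user.
        for (i = 2; i <= NF && i <= 9; i++) total += $i
        if (total > 0) printf "%s %d\n", $1, 100 * $6 / total
    }'
}

# Usage on a live system (values are cumulative since boot):
#   iowait_per_core < /proc/stat
```

A few cores with a much higher percentage than the rest would match the pattern described above.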

You may want to check this recent SK, which is for gateways but sounds eerily similar to your situation:  sk170560: High CPU, high IOWait utilization on random CPUs, and delayed CLI outputs on various comma...

Gaia 3.10 Immersion Self-paced Video Series
now available at http://www.maxpowerfirewalls.com
Collaborator

Hi Phoneboy and Tim,

Thanks for your input. I will check out the SK as well. I checked the RAID on the disks: the RAID-5 state is optimal and all drives checked out fine. Our management server is an open server doing double duty as both management server and log server. Our logging partition is 8 TB, of which about 6 TB is in use. The servers were upgraded about 2 weeks ago. A reboot seems to help for several days afterwards. We recently rebooted it and are keeping an eye on it for performance issues.

 

Champion

Hmm, even with the RAID in an optimal state, it is starting to smell like your disk path is a bit oversubscribed. Please post the output of these two commands, which display logging and indexing load:

cpstat mg -f log_server

cpstat mg -f indexer

 

Collaborator

# cpstat mg -f log_server

Log Receive Rate:                 9355
Log Receive Rate Peak:            61544
Log Receive Rate Last 10 Minutes: 13169
Log Receive Rate Last Hour:       12465

Contributor

We were hit by the fix 'NEW: Solr server process is restarted automatically if it is not responsive for a long time.' in R80.30 Take 219 and had to downgrade to Take 217.

We saw the same issue you are describing. The same "fix" was implemented in R80.40 Take 78, so perhaps you are hitting the same issue? Tail cpm.elg and you will see it crashing constantly.

Alternatively, upgrade to Take 87, where the issue is fixed. See sk170634.

Take 87 only just went GA, though, so I wouldn't recommend it yet.

Best regards,

Henrik

Collaborator

Hi Henrik,

I tailed cpm.elg and can see Solr restarting. I will post this info in our case as well.

tail -f cpm.elg | grep -E 'Stopping|Starting'

25/11/20 07:48:42,784 INFO fts.solr.SolrServerRunner [qtp-536021905-133083]: Stopping Solr with $MDS_TEMPLATE/scripts/solr_stop.sh script
25/11/20 07:48:42,787 INFO fts.solr.SolrServerRunner [qtp-536021905-133083]: Starting Solr server with command: /opt/CPshrd-R80.40/jre_64/bin/java -D_CPM_SOLR=TRUE -Xmx8192m -Xms64m -Xgcpolicy:optavgpause -Djava.io.tmpdir=/opt/CPsuite-R80.40/fw1/tmp -Xaggressive -Xshareclasses:none -Xdump:heap:events=gpf+user -Xdump:directory=/var/log/dump/usermode -Xdump:tool:none -Xdump:tool:events=gpf+abort+traceassert+corruptcache,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=systhrow,filter=java/lang/OutOfMemoryError,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,priority=1,exec=kill -9 %pid -Dsolr.solr.home=/opt/CPsuite-R80.40/fw1/Solr/solr/ -DNGM.SOLR.LOG.DIR=/opt/CPsuite-R80.40/fw1/log -Djava.util.logging.config.file=/opt/CPsuite-R80.40/fw1/Solr/etc/logging.properties -DSTART=/opt/CPsuite-R80.40/fw1/Solr/start.config -Djetty.home=/opt/CPsuite-R80.40/fw1/Solr/ -DSTOP.KEY=checkpointkey -DSTOP.PORT=8982 -Dpath=/opt/CPsuite-R80.40/fw1/cpm-server/java_is.jar:/opt/CPsuite-R80.40/fw1/cpm-server/java_sic.jar:/opt/CPshrd-R80.40/jars/jetty_assist.jar -jar /opt/CPsuite-R80.40/fw1/Solr/start.jar
25/11/20 07:49:06,916 INFO fts.solr.SolrServerRunner [qtp-536021905-206009]: Stopping Solr with $MDS_TEMPLATE/scripts/solr_stop.sh script
25/11/20 07:49:06,918 INFO fts.solr.SolrServerRunner [qtp-536021905-206009]: Starting Solr server with command: /opt/CPshrd-R80.40/jre_64/bin/java -D_CPM_SOLR=TRUE -Xmx8192m -Xms64m -Xgcpolicy:optavgpause -Djava.io.tmpdir=/opt/CPsuite-R80.40/fw1/tmp -Xaggressive -Xshareclasses:none -Xdump:heap:events=gpf+user -Xdump:directory=/var/log/dump/usermode -Xdump:tool:none -Xdump:tool:events=gpf+abort+traceassert+corruptcache,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=systhrow,filter=java/lang/OutOfMemoryError,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,priority=1,exec=kill -9 %pid -Dsolr.solr.home=/opt/CPsuite-R80.40/fw1/Solr/solr/ -DNGM.SOLR.LOG.DIR=/opt/CPsuite-R80.40/fw1/log -Djava.util.logging.config.file=/opt/CPsuite-R80.40/fw1/Solr/etc/logging.properties -DSTART=/opt/CPsuite-R80.40/fw1/Solr/start.config -Djetty.home=/opt/CPsuite-R80.40/fw1/Solr/ -DSTOP.KEY=checkpointkey -DSTOP.PORT=8982 -Dpath=/opt/CPsuite-R80.40/fw1/cpm-server/java_is.jar:/opt/CPsuite-R80.40/fw1/cpm-server/java_sic.jar:/opt/CPshrd-R80.40/jars/jetty_assist.jar -jar /opt/CPsuite-R80.40/fw1/Solr/start.jar
25/11/20 07:49:14,937 INFO fts.solr.SolrServerRunner [qtp-536021905-21461]: Stopping Solr with $MDS_TEMPLATE/scripts/solr_stop.sh script
25/11/20 07:49:14,940 INFO fts.solr.SolrServerRunner [qtp-536021905-21461]: Starting Solr server with command: /opt/CPshrd-R80.40/jre_64/bin/java -D_CPM_SOLR=TRUE -Xmx8192m -Xms64m -Xgcpolicy:optavgpause -Djava.io.tmpdir=/opt/CPsuite-R80.40/fw1/tmp -Xaggressive -Xshareclasses:none -Xdump:heap:events=gpf+user -Xdump:directory=/var/log/dump/usermode -Xdump:tool:none -Xdump:tool:events=gpf+abort+traceassert+corruptcache,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=systhrow,filter=java/lang/OutOfMemoryError,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,priority=1,exec=kill -9 %pid -Dsolr.solr.home=/opt/CPsuite-R80.40/fw1/Solr/solr/ -DNGM.SOLR.LOG.DIR=/opt/CPsuite-R80.40/fw1/log -Djava.util.logging.config.file=/opt/CPsuite-R80.40/fw1/Solr/etc/logging.properties -DSTART=/opt/CPsuite-R80.40/fw1/Solr/start.config -Djetty.home=/opt/CPsuite-R80.40/fw1/Solr/ -DSTOP.KEY=checkpointkey -DSTOP.PORT=8982 -Dpath=/opt/CPsuite-R80.40/fw1/cpm-server/java_is.jar:/opt/CPsuite-R80.40/fw1/cpm-server/java_sic.jar:/opt/CPshrd-R80.40/jars/jetty_assist.jar -jar /opt/CPsuite-R80.40/fw1/Solr/start.jar
25/11/20 07:49:47,380 INFO fts.solr.SolrServerRunner [qtp-536021905-176307]: Stopping Solr with $MDS_TEMPLATE/scripts/solr_stop.sh script
25/11/20 07:49:47,383 INFO fts.solr.SolrServerRunner [qtp-536021905-176307]: Starting Solr server with command: /opt/CPshrd-R80.40/jre_64/bin/java -D_CPM_SOLR=TRUE -Xmx8192m -Xms64m -Xgcpolicy:optavgpause -Djava.io.tmpdir=/opt/CPsuite-R80.40/fw1/tmp -Xaggressive -Xshareclasses:none -Xdump:heap:events=gpf+user -Xdump:directory=/var/log/dump/usermode -Xdump:tool:none -Xdump:tool:events=gpf+abort+traceassert+corruptcache,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=systhrow,filter=java/lang/OutOfMemoryError,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,priority=1,exec=kill -9 %pid -Dsolr.solr.home=/opt/CPsuite-R80.40/fw1/Solr/solr/ -DNGM.SOLR.LOG.DIR=/opt/CPsuite-R80.40/fw1/log -Djava.util.logging.config.file=/opt/CPsuite-R80.40/fw1/Solr/etc/logging.properties -DSTART=/opt/CPsuite-R80.40/fw1/Solr/start.config -Djetty.home=/opt/CPsuite-R80.40/fw1/Solr/ -DSTOP.KEY=checkpointkey -DSTOP.PORT=8982 -Dpath=/opt/CPsuite-R80.40/fw1/cpm-server/java_is.jar:/opt/CPsuite-R80.40/fw1/cpm-server/java_sic.jar:/opt/CPshrd-R80.40/jars/jetty_assist.jar -jar /opt/CPsuite-R80.40/fw1/Solr/start.jar
25/11/20 07:49:48,261 INFO fts.solr.SolrServerRunner [qtp-536021905-161867]: Stopping Solr with $MDS_TEMPLATE/scripts/solr_stop.sh script
25/11/20 07:49:48,263 INFO fts.solr.SolrServerRunner [qtp-536021905-161867]: Starting Solr server with command: /opt/CPshrd-R80.40/jre_64/bin/java -D_CPM_SOLR=TRUE -Xmx8192m -Xms64m -Xgcpolicy:optavgpause -Djava.io.tmpdir=/opt/CPsuite-R80.40/fw1/tmp -Xaggressive -Xshareclasses:none -Xdump:heap:events=gpf+user -Xdump:directory=/var/log/dump/usermode -Xdump:tool:none -Xdump:tool:events=gpf+abort+traceassert+corruptcache,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=systhrow,filter=java/lang/OutOfMemoryError,priority=1,range=1..0,exec=javaCompress.sh CPM_SOLR %pid -Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,priority=1,exec=kill -9 %pid -Dsolr.solr.home=/opt/CPsuite-R80.40/fw1/Solr/solr/ -DNGM.SOLR.LOG.DIR=/opt/CPsuite-R80.40/fw1/log -Djava.util.logging.config.file=/opt/CPsuite-R80.40/fw1/Solr/etc/logging.properties -DSTART=/opt/CPsuite-R80.40/fw1/Solr/start.config -Djetty.home=/opt/CPsuite-R80.40/fw1/Solr/ -DSTOP.KEY=checkpointkey -DSTOP.PORT=8982 -Dpath=/opt/CPsuite-R80.40/fw1/cpm-server/java_is.jar:/opt/CPsuite-R80.40/fw1/cpm-server/java_sic.jar:/opt/CPshrd-R80.40/jars/jetty_assist.jar -jar /opt/CPsuite-R80.40/fw1/Solr/start.jar
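To quantify how often Solr is being restarted, a rough sketch like this counts "Starting Solr server" events per minute. It assumes every line begins with a "DD/MM/YY HH:MM:SS,mmm" timestamp as in the excerpt above; solr_restarts_per_minute is a made-up helper name.

```shell
#!/bin/bash
# Count Solr start events per minute in cpm.elg-style lines on stdin.
# Assumes each line starts with "DD/MM/YY HH:MM:SS,mmm ..." as in the excerpt.
# solr_restarts_per_minute is a hypothetical helper name.
solr_restarts_per_minute() {
    awk '/SolrServerRunner/ && /Starting Solr server/ {
             # Key on the date plus HH:MM (first 5 characters of the time field).
             count[$1 " " substr($2, 1, 5)]++
         }
         END { for (minute in count) print minute, count[minute] }' | sort
}

# Usage:
#   solr_restarts_per_minute < cpm.elg
```

Run against the five "Starting" lines above, this reports one start at 07:48 and four at 07:49. The final sort is lexical, so it is chronological only within a single day.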

Employee++

Your log rate is quite high: ~10,000 logs/sec with peaks of 60K logs/sec.
Please send the cpm.elg files and the 'SmartEventCollectLogs' output here (TAC ticket #?).
Also, do both the CPM_Solr restarts and the high load that slows down the management server continue to happen consistently?

 

Collaborator

Hi Dror,

I have uploaded today's cpm.elg and SmartEventCollectLogs output to our case, in the incoming folder. The case is 6-0002423430. The load may not be as high as usual, since we are on holiday today.
