Nico_V
Iron

Re: Management server slowness in R80.10

OK, problem solved for us, but the solution is probably useless to most of you reading, given the origin of the issue and the definitely-not-recommended deployment:
We were essentially testing the management server inside the GNS3 VM, which is itself virtualized on ESXi, so Gaia was running as a nested VM.

We have now created a dedicated VM on the same host and it works wonderfully (the installer even recognized the virtualization technology, VMware, right away and showed the VMware logo).

Thanks for all the pointers, I learned a lot in the process.

Re: Management server slowness in R80.10

Hi,

I'm not sure whether I have interpreted the ps values correctly, and whether there may be an issue with memory allocation.

This is an MDS; memory_allocation is set to 4096m.

We are experiencing severe issues logging in to the MDS and the domains (admins cannot log in).

We have several Java processes that consume a lot of CPU.

Some CPUs spike to around 95% utilization for a few seconds from time to time and then drop back to 30-40%.

Could you please give me a hint about the memory utilization of the Java processes?

It looks like the virtual set size (VSZ) is at 5.8 GB and the resident set size (RSS) is at 1.5 GB.

As we have set memory_allocation to 4 GB (4096m), this should be OK, or have I misinterpreted something?

Is the memory_allocation shared between the Java processes?

What about the 32-bit Java process?

And what about all the other Java processes and their memory allocation?

[Expert@MDS-R80.10:0]# ps -aux --sort -c
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
admin    25868  154  1.1 5871712 1515484 ?     Ssl  08:57 694:49 /opt/CPshrd-R80/jre_64/bin/java -D_CPM=TRUE -Xaot:forceaot -Xmx4096m -Xms192m -Xgcpol
admin     5719  152  0.8 8146888 1128252 ?     Sl   08:59 681:48 /opt/CPshrd-R80/jre_64/bin/java -D_CPM_SOLR=TRUE -Xmx2048m -Xms64m -Xgcpolicy:optavgp
admin    26016 54.0  0.5 1631208 754920 ?      Ssl  08:57 243:25 /opt/CPshrd-R80/jre_64/bin/java -D_smartview=TRUE -Xdump:directory=/var/log/dump/user
admin    14364 45.6  0.3 459236 408216 ?       Ssl  09:04 202:32 /opt/CPshrd-R80/jre_32/bin/java -Xmx256m -Xms128m -Xshareclasses:none -Dfile.encoding
admin    25920 45.1  2.7 51262804 3668900 ?    SNsl 08:57 203:20 /opt/CPshrd-R80/jre_64/bin/java -D_solr=TRUE -Xdump:directory=/var/log/dump/usermode

top - 17:12:12 up 18 days,  7:57,  1 user,  load average: 13.07, 12.56, 13.46
Tasks: 610 total,   3 running, 605 sleeping,   0 stopped,   2 zombie
Cpu(s): 19.8%us,  1.3%sy,  1.5%ni, 76.7%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  131868088k total, 83197084k used, 48671004k free,  2050804k buffers
Swap: 67103496k total,    28332k used, 67075164k free, 44201004k cached
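
For reference, a minimal sketch to line up each Java process's configured heap ceiling (-Xmx) against its actual resident memory, using only standard ps/grep tooling rather than any Check Point utility. It assumes the heap flag appears on the command line as -Xmx<N>m or -Xmx<N>g, as it does in the output above:

# Compare each Java process's configured -Xmx with its resident set size (RSS).
ps -eo pid,rss,args | grep '[j]ava' | while read -r pid rss args; do
    xmx=$(echo "$args" | grep -oE -- '-Xmx[0-9]+[mMgG]' | head -n 1)
    printf 'PID %-7s RSS %6d MB  configured heap %s\n' "$pid" $((rss / 1024)) "${xmx:-not set}"
done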


Re: Management server slowness in R80.10

High memory utilization by Java processes is not necessarily indicative of a problem. Based on your top output you have 128GB of RAM, 83GB of which is being used for code execution and 45GB for disk buffering/caching. Swap utilization is negligible, which means that all code is executing fully in RAM and is not being slowed down by paging/swapping.

I'm not seeing a problem memory-wise; how many cores does this box have? During the login problems you could be running short of available CPU slices (less likely), or hitting some kind of heavy disk I/O contention (more likely). The latter would show up as high wa values in top while the issue is occurring.
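
If the spikes are too short to catch interactively, one option (a sketch rather than an official Check Point procedure; it assumes sysstat's iostat is present, as it is on Gaia, and that /var/log has room for the capture file) is to log timestamped, extended I/O statistics in the background while the login problem is reproduced:

# Record per-device extended I/O statistics every second, with timestamps,
# so short iowait spikes can be reviewed after the fact.
iostat -x -t 1 > /var/log/iostat_capture.log &
echo "Capture running as PID $!; stop it with: kill $!"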

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

"IPS Immersion Training" Self-paced Video Class
Now Available at http://www.maxpowerfirewalls.com

Re: Management server slowness in R80.10

Hello Timothy,

we have 24 CPU cores, and from time to time some of them show wait states for just a second. I had to wait 10-30 seconds to catch the following output; the next top refresh already shows 0.x% wait on those CPUs.

top - 19:47:49 up 19 days, 10:33,  1 user,  load average: 16.41, 18.07, 18.15
Tasks: 726 total,  10 running, 714 sleeping,   0 stopped,   2 zombie
Cpu0  : 51.1%us,  8.5%sy,  0.0%ni, 23.1%id, 11.1%wa,  0.0%hi,  6.2%si,  0.0%st
Cpu1  : 31.2%us,  5.5%sy,  1.0%ni, 62.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 29.2%us,  6.2%sy,  1.3%ni, 63.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 28.2%us,  5.5%sy,  5.8%ni, 60.2%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu4  : 51.1%us, 10.4%sy,  0.0%ni, 31.9%id,  6.5%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  : 81.6%us,  8.4%sy,  0.0%ni, 10.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  : 51.8%us,  4.9%sy,  0.3%ni, 43.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  : 29.1%us,  3.9%sy,  4.6%ni, 62.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  : 44.8%us,  3.6%sy,  0.3%ni, 50.6%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu9  : 28.7%us,  3.6%sy,  0.7%ni, 66.8%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu10 : 32.0%us,  5.5%sy,  1.9%ni, 60.2%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu11 : 62.0%us,  6.8%sy,  2.3%ni, 28.6%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu12 : 42.7%us,  5.8%sy,  0.6%ni, 35.0%id, 15.5%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu13 : 43.4%us,  9.7%sy,  0.0%ni, 46.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 : 46.4%us,  5.5%sy,  0.6%ni, 45.8%id,  1.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 : 31.6%us,  5.2%sy,  4.9%ni, 58.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu16 : 36.9%us,  4.5%sy,  0.3%ni, 57.3%id,  1.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 : 51.5%us, 27.2%sy,  0.3%ni, 20.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu18 : 43.3%us,  5.9%sy,  1.6%ni, 48.9%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu19 : 59.4%us,  5.2%sy,  1.3%ni, 33.1%id,  1.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 : 71.8%us,  6.5%sy,  0.6%ni, 19.7%id,  1.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu21 : 32.8%us,  8.1%sy,  0.6%ni, 58.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu22 : 30.4%us,  3.6%sy,  0.3%ni, 38.2%id, 27.5%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 : 45.8%us, 13.3%sy,  0.3%ni, 40.3%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  131868088k total, 130985980k used,   882108k free,  3141268k buffers
Swap: 67103496k total,    28332k used, 67075164k free, 64537056k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25868 admin     18   0 5732m 1.4g  12m S  199  1.1   2407:30 java
 5719 admin     21   0 7896m 1.3g 450m S  104  1.0   2058:32 java
 4379 admin     25   0 1040m 707m  48m R   97  0.5 226:22.75 fwm
24215 admin     25   0  865m 533m  49m R   94  0.4 197:02.74 fwm
13797 cp_postg  18   0 1667m 1.6g 1.5g R   88  1.3 104:00.13 postgres
13776 cp_postg  18   0 1606m 1.5g 1.5g R   85  1.2 618:29.11 postgres
24246 admin     18   0  680m 357m  46m R   62  0.3 120:20.54 fwm
26016 admin     18   0 1585m 734m  13m S   35  0.6   1152:20 java
 4595 admin     16   0  890m 566m  48m R   34  0.4 259:16.34 fwm
 4717 admin     15   0  709m 395m  47m S   33  0.3 141:22.46 fwm
24299 admin     15   0  911m 595m  48m S   29  0.5 195:31.70 fwm

Here are some I/O statistics (taken every second):

(The first output listed here is not the first sample of the command, so you can trust the values 🙂)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.86    0.71    2.29    0.29    0.00   66.85

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   779.21  0.00 732.67     0.00 12095.05    16.51     0.28    0.38   0.07   4.95
sdb               0.00     0.00  0.00 891.09     0.00 23706.93    26.60     0.92    1.04   0.06   5.25
dm-0              0.00     0.00  0.00  0.00     0.00 12380.20     0.00     0.46    0.00   0.00   5.15
dm-1              0.00     0.00  0.00 2927.72     0.00 23421.78     8.00     5.34    1.83   0.02   5.54

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.48    1.46    3.75    0.25    0.00   72.06

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00  1.00  0.00    16.00  7368.00  7384.00     0.28  282.00  39.00   3.90
dm-1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          30.13    0.67    5.49    0.17    0.00   63.55

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00  0.00 807.92     0.00 13306.93    16.47     0.68    0.85   0.05   3.76
dm-0              0.00     0.00  0.00  0.00     0.00 13306.93     0.00     1.16    0.00   0.00   3.76
dm-1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          29.67    2.46    3.87    0.33    0.00   63.67

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  1455.45  0.00 636.63     0.00 16736.63    26.29     2.19    3.43   0.04   2.67
sdb               0.00     0.00  0.00 629.70     0.00 14217.82    22.58     0.95    1.50   0.08   4.95
dm-0              0.00     0.00  0.00  0.00     0.00 10336.63     0.00     0.60    0.00   0.00   4.75
dm-1              0.00     0.00  0.00 2577.23     0.00 20617.82     8.00     9.47    3.67   0.01   2.67

Ivan_Moore
Nickel

Re: Management server slowness in R80.10

Now that we have been running R80.10 since April, I would go back to R77 in a heartbeat if I could. R80.10 is the biggest hunk of junk Check Point might have ever put out. Performance has gotten worse and worse on our system despite working with R&D for months. Throughout the day folks can't even log into SmartConsole most of the time. Policy pushes, global assignments, etc. fail without error; they just fail.

I spent nearly 60 hours upgrading our Provider-1 environment to R80.10, twice, because they said it would take 10 hours to upgrade our primary the first time, and 24 hours in I had to make the decision to back out once it finished because there was not enough time left in our maintenance window. I won't get that time back, and the results make me hate the time I spent.

And we have to get everything upgraded to R80.10 by May??? I can't trust R80.x after this experience and will recommend to anyone to avoid it like the plague, as it's not ready.


Re: Management server slowness in R80.10

Hi Ivan! Sorry to hear about your experience with R80.10; it does not sound good at all. I got curious because for some of us it's been relatively "painless", with only minor bumps. Would you mind sharing rough numbers from your environment, so that others facing the upgrade-or-not question can judge better?

I can happily provide ours:

Parameter                    Number
Domain count                 <30
Gateway count                <50
Total rule count             <20000
CPU cores (R80.10/R77.30)    16/8
RAM GB                       124/16
Number of admins             <30
Separate log server          yes
Managing VSX                 yes

I'm just guessing that yours is considerably bigger. Maarten Sjouw would have some comments here, I'm sure 🙂

Re: Management server slowness in R80.10

Hello Ivan,

Regarding your login issues:

please make sure you have the latest SmartConsole R80.10 version installed.

This may help...


Re: Management server slowness in R80.10

I would disagree; the last two releases (takes 56 and 73) made it worse, for us and a few others who reported it here.

Ivan_Moore
Nickel

Re: Management server slowness in R80.10

Parameter                    Number
Domain count                 42
Gateway count                284
Total rule count             >40000
CPU cores (R80.10/R77.30)    16
RAM GB                       197
Number of admins             >30
Separate log server          yes
Managing VSX                 yes

We also have our management servers in different regions: primary MDM in the US, secondary MDM in Germany, each with local MLMs (two per region). We have a lot of admins configured, but at any point in time probably no more than 10 are connected.

We use Tufin for policy management, so there are a ton of API calls, pretty much constantly during the day.

But kick off a couple of policy installs and things crawl. We tried to create a new domain and had issues with the CMA being created; then it got stuck and didn't clean up after itself properly. We didn't know this until we restarted the MDM a few days later and it wouldn't start. I only found the issue after running cpm doctor and seeing the errors that way. Months later, we are still waiting on how to fix some of the issues found in that report.

Ivan


Re: Management server slowness in R80.10

I think you might have hit the nail on the head! We had a lot of slowness connected to Tufin activity (it took a while to work that out). So make sure you are running the latest take, 142, and have the API on the 64-bit JRE with extended memory instead of the default 256 MB / 32-bit setup (https://community.checkpoint.com/thread/9495-api-dying-on-mds-take-142-every-few-days ). Or should I say: hopefully CP has already worked that out for you 🙂
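
For anyone wanting to confirm what their API/Solr Java processes are actually running with, a minimal check using only standard ps/grep (the jre_32/jre_64 path pattern is taken from the ps output earlier in this thread, not from an official Check Point procedure):

# Show each Java process's JRE path (32- vs 64-bit) and its configured -Xmx,
# to verify the relevant processes run on the 64-bit JRE with an enlarged heap.
ps -eo pid,args | grep '[j]ava' | while read -r pid args; do
    jre=$(echo "$args" | grep -oE 'jre_(32|64)')
    xmx=$(echo "$args" | grep -oE -- '-Xmx[0-9]+[mMgG]' | head -n 1)
    echo "PID $pid  JRE: ${jre:-unknown}  heap: ${xmx:-default}"
done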

Ivan_Moore
Nickel

Re: Management server slowness in R80.10

We are on take 121 right now and have been looking at 142. We will probably jump to that once our other fixes are ported.
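
For reference, two standard commands can confirm the currently installed take and the state of the management API server before and after the jump (output details vary by version, so treat this as a sketch):

cpinfo -y all     # lists installed hotfixes, including the Jumbo Hotfix take
api status        # reports whether the management API server is up and responding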


Re: Management server slowness in R80.10

And one more thing that you hopefully knew already: client-to-MDS latency makes a huge difference these days. We had admins in Brazil who basically were not able to use the primary MDS in Europe until we provided a virtual terminal server for them here in Europe. The latency was around 180 ms, impossible to work with. Payback for having multiple concurrent admins in read/write mode 🙁

Ivan_Moore
Nickel

Re: Management server slowness in R80.10

I also know there are MDS HA performance issues, which I think may be something we are running into as well.

Re: Management server slowness in R80.10

For R80.10:

Parameter                    Number
Domain count                 23
Gateway count                85
Total rule count             <10000
CPU cores (R80.10)           8
RAM GB                       96
Number of admins             >60
Separate log server          no, max retention 28 days (scripted)
Managing VSX                 yes

For R77.30:

Parameter                    Number
Domain count                 130
Gateway count                505
Total rule count             <20000
Number of MDS                3
CPU cores per MDS            8
RAM GB                       96
Number of admins             >80
Separate log server          no, max retention 28 days (scripted)
Managing VSX                 yes

We are not using Tufin. We do use admin terminal servers for our internal users, which are close to the MDS in a network sense.

We have not heard of slowness problems from our customers with read-only access.

The R80.10 MDS was built from scratch. As soon as R80.20 comes out we will wait a couple of weeks, then upgrade to R80.20 and start planning the migration of the R77.30 setup.

Regards, Maarten