Daniel_Collins
Collaborator

R80.20 Management Performance

Hello Check Mates!

I hope you can help shed some light on an issue we're seeing with one of our customers. The customer is commercially sensitive because of long-standing issues they've had with a 61k appliance, and a recent code upgrade on the system (management only, at the moment) to R80.20 has degraded performance from the customer's perspective.

What we're seeing is this:

- Slowness when stacking and unstacking the subject headings in the rulebase
 - There are around 700 rules with 200 subject headings in the policy
 - When you press the button to drop the subject headings, the wireframes appear for the rules, and the rule content pops into the console a few seconds later
- When adding objects to rules (clicking the *), there is a delay of a second or more before the search box appears

The management server is on R80.20 with the latest T91 of the JHF installed. It is very well specced: 16 cores / 18GB RAM / SSD-based flash storage in VMware. The console is being run on a machine with 32 cores and 64GB of RAM and a similar storage setup. We observed the server via SSH while testing these issues and saw no noticeable load on the system, no use of swap, and no %WA on I/O.

From our perspective as a partner, the behaviour we see, other than the rule stacking, is as we'd expect from an R80.x install of management. I do not have a point of comparison for the rule stacking issue, because all of the customers I have worked with lately (in the R80.x days) have significantly smaller rulebases or far fewer subject headings.

The customer was on R77.30 before and has noticed that the server performs significantly worse in R80.20 than it did previously. We can replicate these issues through a database export into a lab server, as well as by exporting the policy via the python script into a fresh management server; the issue follows the policy.

There is an element of expectation here, but this customer is commercially sensitive as we will be trying to ensure they continue to replace the 61k's with another Check Point appliance (something that's not SP based) so we're looking to see what we can do in terms of tuning up performance of the management server.

We're not in a position to re-jig the policy (in terms of in-line layers, due to the 61k being on R76SP.50 and the consultancy time needed to do so prior to a replacement solution), but the policy is very tidy. There is perhaps some duplication, but nothing severe.

I've been through the VMware tuning guide in sk104848 and it has not made any noticeable difference.

Any thoughts?

PhoneBoy
Admin
It's true that R77.x and R80.x have very different performance characteristics with respect to the management server.
You mention around 700 rules with 200 section headings.
Any idea how many objects we're talking about?

While there are many differences between R77.x and R80.x, one important one might explain the differences you're seeing.
In SmartDashboard R77.x, prior to rendering anything in SmartDashboard, the entire rulebase would need to be read into the client.
That meant a longer startup time for the client, but you're working with all the data locally, so it meant the client appeared "fast."
Any changes you made in SmartDashboard were made on local data until you hit Save or Install Policy.
The downside to this approach is that only one admin could be logged in in read/write mode at one time.

In R80.x, the underlying infrastructure is dramatically different.
SmartConsole now reads in information as needed using various API calls versus trying to get it all in one go.
This reduces the startup time and allows for things like multiple concurrent read/write administrators working on the same policy, multiple sessions, etc.

It does mean certain operations like you describe take longer than they previously did.
Time to complete said operations can be impacted by network latency/bandwidth as well.
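
To make the contrast concrete, here is a toy Python model of the two loading strategies described above. It is only a sketch: `FakeServer`, `EagerClient` and `LazyClient` are hypothetical names invented for illustration and are not Check Point APIs.

```python
from dataclasses import dataclass


@dataclass
class FakeServer:
    """Hypothetical stand-in for a management server; counts round trips."""
    sections: dict[str, list[str]]
    round_trips: int = 0

    def fetch_all(self) -> dict[str, list[str]]:
        # One large transfer that returns the whole rulebase.
        self.round_trips += 1
        return {name: list(rules) for name, rules in self.sections.items()}

    def fetch_section(self, name: str) -> list[str]:
        # One small transfer per section, issued on demand.
        self.round_trips += 1
        return list(self.sections[name])


class EagerClient:
    """R77.x-style model: load everything at startup, then work locally."""

    def __init__(self, server: FakeServer) -> None:
        self.rulebase = server.fetch_all()

    def expand(self, name: str) -> list[str]:
        return self.rulebase[name]  # no server round trip


class LazyClient:
    """R80.x-style model: fetch a section only when it is expanded."""

    def __init__(self, server: FakeServer) -> None:
        self.server = server

    def expand(self, name: str) -> list[str]:
        return self.server.fetch_section(name)  # one round trip per expand
```

In this model the eager client pays once at startup and never again, while the lazy client starts instantly but pays one round trip for every section it opens.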
HeikoAnkenbrand
Champion

Hi @Daniel_Collins 

Read more here:

R80.10 Management Performance Guide

➜ CCSM Elite, CCME, CCTE
Martin_Valenta
Advisor

Only 16 CPUs and 18 GB of RAM? I would definitely add more RAM. Also, since it's a VM, it might not get enough I/O even with SSD if disk resources are shared with other VMs. The machine running SmartConsole doesn't really need 32 CPUs and 64 GB of RAM; that will not speed anything up. It might have helped with R77.x, which was client-based management, but with R80.x everything is handled by the management server.

Timothy_Hall
Champion

What is the network latency between the SmartConsole client system and the SMS?  Due to some of the changes mentioned by Dameon in regards to most of the processing happening on the SMS, SmartConsole performance can now be dramatically affected by high network latency in R80+.  Make sure that your .NET libraries are up to date on the SmartConsole system (especially if it is an older OS such as Windows 7) and run dxdiag to ensure all hardware-based graphics acceleration is working correctly.  Also make sure you are running the latest version of the R80.20 SmartConsole software available here: sk137593: R80.20 SmartConsole Releases
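
One rough way to answer the latency question from the SmartConsole machine is to time a few TCP connections to the management server. The sketch below uses only the Python standard library; the host and port are placeholders you would have to supply (any TCP port the SMS accepts connections on), and a TCP connect time only approximates one network round trip.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int, samples: int = 5,
                   timeout: float = 2.0) -> float:
    """Return the median TCP connect time to host:port in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close a connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

For example, `tcp_connect_ms("sms.example.com", 443)` would return the median of five connection attempts, where both the hostname and the port are assumptions for illustration.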

Any chance that the SmartConsole is being run from inside an RDP session?  If so make sure Font Smoothing is enabled in the RDP client, it makes a huge difference.

On the SMS side even though the OS doesn't seem to be short of memory, your 18GB of total RAM is used to set the maximum Java heap sizes available to SMS processes such as cpm which is what the SmartConsole GUI is interacting with.  Java heap sizes continue to scale upwards until about 35.6GB of RAM, if a process like cpm doesn't have enough heap when working with a large configuration, it can wind up expending more CPU time performing heap garbage collection than actually getting useful work done.  Given the size of your configuration you may want to try increasing RAM to 32GB to memory-scale the Java heap sizes higher which can make a big difference to processes such as cpm.  Core-based resource scaling tops out at 12+ cores, so your 16-core allocation is perfect.

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Chris_Atkinson
Employee

 

Not much really to add to the above, some great insight shared by all.

If the machine running the console is a VM ensure it has enough video memory allocated. I presume also that the production SMS was upgraded using a method providing the XFS file system?

CCSM R77/R80/ELITE
Tomer_Noy
Employee

A lot of good feedback and tips were given.

I can emphasize two points to check, based on what you wrote:

1) Since the Management is running on a VM, make sure that it has dedicated resources. Sometimes you can allocate memory and CPU to a VM, but if it's shared, then other VMs can take it.

2) Latency can be a significant factor in R80.x compared to R77.x. Since most of the processing is done on the server side, the client may need to make many requests to the server to perform operations. If your latency is towards 200ms (or above), then that could have a big impact. 
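
A back-of-the-envelope model shows why latency matters once an operation needs many requests. This is only a sketch: it assumes the requests are issued one after another and ignores server processing time, and the request count in the example is made up for illustration.

```python
def sequential_wait_ms(requests: int, rtt_ms: float) -> float:
    """Time spent waiting on the network when requests run back to back."""
    return requests * rtt_ms
```

Under these assumptions, an operation that needs 50 sequential requests waits `sequential_wait_ms(50, 200)` = 10000 ms at 200 ms latency, but only `sequential_wait_ms(50, 1)` = 50 ms at 1 ms latency.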

 

We have recently done a lot of work to move some of the client-server requests to background threads, thus allowing us to process in parallel and avoid blocking the UI. Most of the work concentrated on the rulebase, which is a complex component with many calculations. If you indeed have high latency, then you can open a ticket and request to get these fixes as a private HF. We are working to get them into the next version and hopefully to future jumbo versions as well.

JozkoMrkvicka
Mentor

Is there a recommended setup for a VM that is hosting SmartConsole?

Besides the already mentioned points:
1. Dedicated resources
2. The latency between SmartConsole and Management below 200 ms
3. Enabled Font Smoothing

According to the R80.20 and R80.30 Release Notes these are minimum requirements:
[image.png: screenshot of the minimum requirements, not preserved]

 

Kind regards,
Jozko Mrkvicka
Timothy_Hall
Champion

In regard to dedicated CPU resources, an easy way to see if performance is being impacted by using virtual (non-dedicated) CPU's in a virtualized environment is to look at the CPU "steal" percentage ("st" in top and "%steal" in sar -u) on the SMS/MDS.  A nonzero steal indicates the percentage of time execution on a virtual CPU was blocked by the hypervisor waiting for availability of a "real" CPU.  Obviously this is not a desirable condition from a performance perspective, and generally if steal is consistently >20% you should probably look at allocating dedicated CPUs.  Steal percentage will always be zero on bare metal (non-virtualized) SMS/MDS hardware.
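
If you want to log steal over time rather than watch top, the same figure can be derived from two samples of `/proc/stat`. The following is a minimal Python sketch, assuming a Linux kernel that reports the steal counter as the eighth value on the aggregate `cpu` line; the function names are made up for illustration.

```python
def parse_cpu_line(stat_text: str) -> list[int]:
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before: str, after: str) -> float:
    """Steal time as a percentage of all CPU time between two samples."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    # Field order: user nice system idle iowait irq softirq steal ...
    # guest and guest_nice are already counted in user and nice,
    # so only the first eight counters are summed.
    deltas = [b - a for a, b in zip(first[:8], second[:8])]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0
```

To use it, read `/proc/stat` into a string, wait a few seconds, read it again, and pass both strings to `steal_percent`.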

 

Daniel_Collins
Collaborator

Thanks everyone for your feedback, it's really useful.

You mention high latency. What are you referring to: back-end server latency or C2S latency?

It's worth mentioning that all of the customer's issues are easy for me to replicate on my home lab server. Although it is not as well equipped, my C2S latency is around 1ms (it's only one hop away) and I get all the same GUI and rule loading issues.

As someone has mentioned, I genuinely don't think throwing more resources at it is the answer. I have a bit of a VMware background, and allocating more resources than the underlying system sometimes has just causes more contention issues. Besides, looking at the system performance, it's not wanting for anything.

From our perspective, for a management server used only for policy management (no logging, nothing else) by a single administrator with 4 policies and 4 gateways, 8 CPUs and 16GB of RAM should really be sufficient. We are, however, happy to add more resources; I am just concerned about exacerbating the customer's perception of the product's "degrading" performance.

The performance issues also happen when it's the only VM on my lab box with nothing else running, so there are no contention issues there (SSD storage too).

The SmartConsole is running on physical hardware, not virtualized, in both my environment and the customer's, although the customer does RDP to the machine and I do not.

Martin_Valenta
Advisor
Everybody screamed at first when they moved from R77.x to R80.x, but in the end the change of architecture brings more benefits to all customers.
Daniel_Collins
Collaborator

Thanks everyone for all your feedback it's been quite helpful.

We think we've made a good start with TAC. They provided us with a "fixed" SmartConsole and some changes to the Java heap size for CPM, and that has significantly improved system performance, mostly thanks to the new console, which is a vast improvement.

TAC have confirmed that this *should* be integrated into the main train of the console soon. Although I am not privy to the changes made, I can only guess it's some caching/optimizing of the content pulled from the server.

Martin_Valenta
Advisor
I would be interested to know what tuning on SmartConsole they did..
Daniel_Collins
Collaborator

Me too! But they wouldn't disclose what's been changed ☹️

Timothy_Hall
Champion

There are some hints to what was probably changed in this thread:

https://community.checkpoint.com/t5/Policy-Management/Searching-Network-Objects-in-R80-xx-is-cripple...

 

Tomer_Noy
Employee

The optimizations were mainly in three areas:

1) Specific optimizations in groups with thousands of objects

2) Identifying UI requests to the server that were blocking the main processing thread, and moving them to background threads. Especially in cases of high latency, or slow responses from the server, this yielded significant improvements. The positive impact is due to the fact that the user doesn't "feel" that his UI is blocked and waiting, plus we are able to make many requests in parallel (instead of sequentially).

3) Additional general optimizations that were found in the investigation (lower impact)
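
The effect described in point 2 can be illustrated with a small, self-contained Python sketch. It is not the SmartConsole implementation: `fake_request` is a hypothetical stand-in that simply sleeps to simulate one client-server round trip, and the thread pool stands in for the background threads mentioned above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(item: int, latency_s: float = 0.05) -> int:
    """Hypothetical server call: sleeps to simulate one round trip."""
    time.sleep(latency_s)
    return item * 2


def fetch_sequential(items: list[int]) -> list[int]:
    # Each request waits for the previous one; the caller is blocked throughout.
    return [fake_request(item) for item in items]


def fetch_parallel(items: list[int], workers: int = 8) -> list[int]:
    # Requests overlap on worker threads; results come back in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_request, items))
```

With eight items and a simulated 50 ms round trip, the sequential version waits for roughly eight round trips, while the parallel version waits for roughly one, because the requests overlap. The higher the latency, the bigger the gap.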

 

Increasing caching was evaluated, but it didn't provide a significant improvement. Also, there are downsides to "over caching" since we need to make sure that objects are up-to-date and this involves extra notifications and monitoring for updates.

 

Kudos to Amir Jaron, @Nurit_Gr and their developers that implemented this.

 

The improvements are in R80.40, so anyone who wants to get them via the EA is more than welcome to join.
We also plan to integrate them into later JHFs.

 

Daniel_Collins
Collaborator
Thanks a lot for your feedback, it's much appreciated.

So for clarity this will be factored into future releases of major versions of Check Point rather than new builds of the console for older versions? Just concerned the customer might upgrade their console version (because of a new JHF) and these performance improvements aren't there...
Tomer_Noy
Employee
We plan to integrate the improvements into future JHFs as well. I can’t give a specific date on that...

If you are running with private fixes, it’s always recommended to look at the JHF / SmartConsole build SK to verify that your fixes are included before updating.

Server side JHFs have a mechanism to warn you if you’re about to lose a fix, but the SmartConsole is a full replacement, so there are no checks.
Chris_Atkinson
Employee

@Tomer_Noy Are you able to share any further info on the applicable JHF takes & SmartConsole builds now?

CCSM R77/R80/ELITE
Tomer_Noy
Employee

R80.20 JHF take 100 (that was just released) includes these fixes.

They are listed under: PRJ-7609

Chris_Atkinson
Employee

Thanks Tomer!

(Now available per sk137593)
CCSM R77/R80/ELITE