R77.20.85 performance issue on centrally managed SMB
Guys,
That build is causing significant traffic delays, and CPU load is higher than on R77.20.81.
Is anyone else experiencing a similar problem?
I see. I will try to replicate the behavior in a lab and see whether an upgrade versus a clean install makes any difference.
I just think that for a small-office appliance, a remote upgrade via the WebUI MUST work well. If it doesn't, something needs to be corrected.
Offices that use a 1400 appliance are not like big companies that have their own IT team to do a clean install after hours.
Remember that the unit holds three different firmware images: active, previous, and backup. An upgrade through the WebGUI changes only the active firmware: the current image becomes the previous firmware, and the backup firmware is left untouched. Reverting to the factory default image and settings installs the backup firmware as the active firmware.
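To make the slot behavior described above easier to follow, here is a minimal Python sketch. It is only a toy model: the class, method names, and image labels are hypothetical and do not correspond to any Check Point tool or API. What happens to the previous slot on a factory revert is not stated in this thread, so the sketch leaves it unchanged as an assumption.

```python
from dataclasses import dataclass


@dataclass
class FirmwareSlots:
    """Hypothetical model of the three image slots (not a vendor API)."""
    active: str
    previous: str
    backup: str

    def webui_upgrade(self, new_image: str) -> None:
        # The running image moves to the 'previous' slot,
        # the new image becomes active, and backup is untouched.
        self.previous = self.active
        self.active = new_image

    def factory_revert(self) -> None:
        # The backup image is installed as the active image.
        # Assumption: 'previous' is left as-is (the thread does not say).
        self.active = self.backup


# Image labels below are placeholders chosen for illustration.
slots = FirmwareSlots(active="R77.20.81", previous="older-image", backup="factory-image")

slots.webui_upgrade("R77.20.85")
after_upgrade = (slots.active, slots.previous, slots.backup)
print(after_upgrade)

slots.factory_revert()
after_revert = (slots.active, slots.previous, slots.backup)
print(after_revert)
```

The point of the model is that a WebGUI upgrade never touches the backup slot, so a factory revert can bring back an image that is older than both the active and the previous one.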
I am 99.999% sure it is a problem in the firmware. I saw a message from the top command complaining about an incorrect kernel HZ value. That would explain why it is so slow, but I had no time to look into it more closely.
If someone is running 85 with central management and has no performance issues, then that would be a different story.
The issue affects both locally managed and centrally managed SMB appliances, to different degrees: CPU load is high and the GUI times out. It is a firmware issue. A strong indication is the kernel HZ value error message that appears when you run the top command.
Installing the firmware via USB will, I believe from when I did it, wipe your configuration and reset the box to factory defaults. In addition, it upgrades the primary/backup firmware image on the unit.
Coming in late to this thread, but the behavior you are describing sounds eerily similar to the CPU downclocking that occurs when a CPU fan fails, as the CPU slows itself down to keep from literally bursting into flames. The load average rises, the system is sluggish, and always busy for no glaringly obvious reason. Not saying you have a CPU fan failure here, but perhaps some kind of issue in the firmware that is causing the CPU to downclock or run at a non-standard clock speed. That incorrect kernel HZ message in top is quite intriguing...
Thanks for your comments, Tim. In this case it was not a fan failure, because a previous build runs just fine on the very same hardware. I have also run 'show diag' and there was no indication of a fan failure.
I am purely speculating here, but I assume the problem is that they tried to adjust CPU performance dynamically based on the current load, and that did not go as planned. The fact is that the build boots and behaves well for some time, and then load spikes start to happen more and more often for no apparent reason.
The HZ message appears on previous builds as well. It shows up at some random time when running 'top', so it is likely not the root cause.
Nevertheless, from what I know, R&D has identified the problem and the fix is currently going through a full cycle of QA testing. Whether they will communicate what the problem was, I don't know, but for me it is enough that it will be fixed.
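For anyone who wants to put numbers on "load spikes happen more and more often" when comparing builds, here is a minimal Python sketch. It assumes you have captured lines in the Linux /proc/loadavg format from the gateway at regular intervals and are analyzing them on another machine. The function names, the threshold, and the sample values are all made up for illustration and are not measurements from this thread.

```python
def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg-style text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def count_spikes(samples: list[float], threshold: float) -> int:
    """Count how many times the load rises from at-or-below threshold to above it."""
    spikes = 0
    above = False
    for value in samples:
        if value > threshold and not above:
            spikes += 1
        above = value > threshold
    return spikes


# Hypothetical captured lines, one per sampling interval.
captured = [
    "0.20 0.15 0.10 1/80 1234",
    "2.50 0.90 0.40 3/82 1301",
    "2.10 1.20 0.60 2/82 1377",
    "0.30 0.95 0.60 1/80 1420",
    "3.00 1.40 0.80 4/83 1502",
]

one_minute = [parse_loadavg(line)[0] for line in captured]
spike_count = count_spikes(one_minute, threshold=1.0)
print(f"samples={len(one_minute)} spikes={spike_count}")
```

Running the same capture on .81 and .85 over a comparable period would give a simple spike count per build to attach to a TAC SR.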
How is the fix working so far?
running good
Thank you. Just to confirm: are you running build 731 centrally managed or standalone? I am just surprised that no docs are out on this release of fixes.
Yes, there should be some info about 751. sk140193 still directs us to the download of 731.
I agree with Pedro. Also, I do not understand why we are not getting notifications in the Security Management Server when new firmware is available, the way we do for the other appliances.
The performance issue is solved for sure, but I can't say much more because I have not tested it much today.
Just let me clarify that I was provided with a jumbo hotfix. That was a quick hack just to prove where the problem was. The build that we will get officially as GA may very well be different. Let's be patient and wait and we will see.
Corrupted firmware will cause process crashes, boot failures, spontaneous reboots, etc. Degraded performance is very unlikely to be caused by that. It is either a configuration issue or just a plain bug in the firmware... or binaries compiled in debug mode that somehow made it into the release version of the image.
100% agreed..
Degraded performance can have a wide spectrum of causes, and of course, corrupted firmware that makes daemons crash will cost resources. So I would assume that it is unlikely.
I saw another instance of this same issue recently.
Anyone experiencing this issue is encouraged to open a TAC SR.
That would require a test lab with access from the Internet for CheckPoint to investigate, which unfortunately I do not have.
But what about CheckPoint itself? Don't you guys have a test lab and a spare quality engineer to try to reproduce this issue and see whether it is common or configuration specific? Given how many of us have reported it already, it looks more like a common issue to me.
If you can point the appropriate person to this thread, maybe something like that can be arranged.
I can't say if there's enough information in this thread to reproduce the issue, which is why it's important everyone who has this issue open a TAC SR.
That said, I did point the relevant parties to this thread already.
Thanks, Dameon. I may open an SR next week, if it is fine with TAC that I provide them only with CPinfo files and such.
Dameon, I am setting up a lab with version 85 to try to reproduce. My production gateways are back to .81 version.
A reminder that the issue is not limited to centrally managed devices. Standalone devices show latency and GUI timeouts when navigating. The issue has already been reported and can be associated with high CPU. It is not present with build 541 or 551 of R77.20.81.
Hi Hristo,
Thanks for identifying and raising this issue.
So far we have not managed to replicate this behavior internally.
We would like to work with you to investigate the problem.
Please contact me directly amire@checkpoint.com
Amir, SMB Director R&D
Hello Amir,
Thanks for your support. Mail sent.
I'll keep this thread posted on recent developments about this.
Amir, I would appreciate it if the GUI lag/latency that causes timeouts were also looked into. I am not sure whether it is related to the new event viewer features that were added. This is on standalone. Thanks.
We are trying, together with CheckPoint R&D, to identify in which build the problem first appeared. We will probably know that tomorrow.
Today we identified the build where it all started, and I was provided with a jumbo hotfix to test. So far, it looks good, but I will do a bit more testing tomorrow before I make any final statements.
That's good news. Are you also testing the GUI response?
I'm sorry, mate, mine are centrally managed appliances.
Can you share with us? When did it start?
I would prefer such info to come from CheckPoint rather than from me, as I am not sure how accurate my version would be.
