I see. I will try to replicate the behavior in a lab and see if upgrade vs clean install makes any difference.
I only think that for a small office appliance, remote upgrade via WebUI MUST work well. If it doesn't, something needs to be corrected.
Offices that use a 1400 appliance are not like big companies that have their own IT team to do a clean install after hours.
Remember that the unit has three different firmware images: active, previous, and backup. An upgrade via the WebUI changes only the active firmware: the current firmware becomes the previous firmware and the backup firmware is untouched. Reverting to the factory default image and settings installs the backup firmware as the active firmware.
I am 99.999% sure it is a problem in the firmware. I saw a message from the top command complaining about an incorrect kernel HZ value. That would explain why it is so slow. But I had no time to look into it more closely.
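To make the image rotation described above concrete, here is a minimal Python sketch. The class name, function names, and build labels are all hypothetical and are not taken from the appliance or any Check Point tool. What happens to the previous image on a factory reset is not stated in this thread, so the sketch assumes it is left unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FirmwareSlots:
    """Hypothetical model of the three firmware images on the unit."""
    active: str
    previous: str
    backup: str


def webui_upgrade(slots: FirmwareSlots, new_build: str) -> FirmwareSlots:
    """WebUI upgrade: the new build becomes active, the old active
    image becomes previous, and the backup image is untouched."""
    return FirmwareSlots(active=new_build,
                         previous=slots.active,
                         backup=slots.backup)


def factory_default(slots: FirmwareSlots) -> FirmwareSlots:
    """Factory default: the backup image is installed as active.
    Assumption: the previous image is left as it was."""
    return replace(slots, active=slots.backup)


if __name__ == "__main__":
    # Build labels are placeholders, not real build numbers.
    slots = FirmwareSlots(active="build-B", previous="build-A", backup="build-A")
    slots = webui_upgrade(slots, "build-C")
    print(slots)
    print(factory_default(slots))
```

The point of the model is that a WebUI upgrade never touches the backup slot, so a factory reset after any number of WebUI upgrades still lands on whatever build was last written to backup.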
If someone is running 85 with central management and has no performance issues then that would be a different story.
The issue affects SMB appliances under both local and central management, to differing degrees: CPU load is high and the GUI times out. It is a firmware issue. A strong indication is the kernel HZ value error message that appears when you run the top command.
As I recall from when I did it, installing the firmware via USB wipes your configuration and sets the box to factory defaults, and it also upgrades the primary/backup firmware image on the unit.
Coming in late to this thread, but the behavior you are describing sounds eerily similar to the CPU downclocking that occurs when a CPU fan fails, as the CPU slows itself down to keep from literally bursting into flames. The load average rises, the system is sluggish, and always busy for no glaringly obvious reason. Not saying you have a CPU fan failure here, but perhaps some kind of issue in the firmware that is causing the CPU to downclock or run at a non-standard clock speed. That incorrect kernel HZ message in top is quite intriguing...
Thanks for your comments, Tim. In this case it was not a fan failure, because a previous build runs just fine on the very same hardware. I also ran 'show diag' and there was no indication of a fan failure.
I am speculating remotely here, but I assume the problem is that they tried to dynamically adjust CPU performance based on current load and that did not go as planned. The fact is that the build boots and behaves well for some time, and then load spikes start to happen more and more often for no apparent reason.
The HZ message appears on previous builds as well. It appears at some random time when running 'top'. So likely not the root cause.
Nevertheless, from what I know, R&D has identified the problem and the fix is currently going through full-cycle QA testing. Whether they will communicate what the problem was, I don't know, but for me it is enough that it will be fixed after all.
Thank you. Just to confirm, are you running build 731 centrally managed or standalone? I am just surprised that no docs are out on this release of fixes.
Agree with Pedro. Also, I do not understand why we are not getting notifications in the Security Management server when new firmware is available, as we do for the other appliances.
Performance issue is solved for sure. But can't say a lot more because I have not tested it much today.
Just let me clarify that I was provided with a jumbo hotfix. That was a quick hack just to prove where the problem was. The build that we will get officially as GA may very well be different. Let's be patient and wait and we will see.
Corrupted firmware will cause process crashes, boot failures, spontaneous reboots, etc. Degraded performance is very unlikely to be caused by that. It is either a configuration issue, or just a plain bug in the firmware, or binaries compiled in debug mode that somehow made it into the release version of the image.
Degraded performance can have a wide spectrum of causes, and of course corrupted firmware that makes daemons crash will cost resources. So I would assume that it is unlikely,
That would require a test lab accessible from the Internet for CheckPoint to investigate, which unfortunately I do not have.
But what about CheckPoint itself? Don't you guys have a test lab and a spare quality engineer to try to reproduce this issue and see whether it is common or configuration specific? Given those of us who have already reported it, it looks like a common issue to me.
If you can point the appropriate person to this thread, maybe something like that can be arranged.
I can't say if there's enough information in this thread to reproduce the issue, which is why it's important everyone who has this issue open a TAC SR.
That said, I did point the relevant parties to this thread already.
A reminder that the issue is not only on centrally managed devices. Standalone devices show latency and GUI timeouts when navigating. This has already been reported and appears to be associated with high CPU; it is not present with build 541 or 551 for R77.20.81.
Thanks for identifying and raising this issue.
So far we have not managed to replicate this behavior internally.
We would like to work with you to investigate the problem.
Please contact me directly at email@example.com
Amir, SMB Director R&D
Amir, I would appreciate it if attention were also given to the GUI lag/latency causing timeouts. I am not sure whether it is related to the newly added event viewer features. This is on standalone. Thanks.
We are trying, together with CheckPoint R&D, to identify in which build the problem first appeared. We will probably know that tomorrow.
Today we identified the build where it all started, and I was provided with a jumbo hotfix to test. So far, it looks good. But I will do a bit more testing tomorrow before I make any final statements.