HristoGrigorov

R77.20.85 performance issue on centrally managed SMB

Guys,

The R77.20.85 build is causing significant traffic delays, and CPU load is higher than on R77.20.81.

Is anyone else experiencing a similar problem?

123 Replies
Pedro_Espindola
Advisor

I see. I will try to replicate the behavior in a lab and see if upgrade vs clean install makes any difference.

I just think that for a small-office appliance, remote upgrade via the WebUI MUST work well. If it doesn't, something needs to be corrected.

Offices that use a 1400 appliance are not like big companies that have their own IT team to do a clean install after hours.

0 Kudos
G_W_Albrecht
Legend

Remember that the unit holds three different firmware images: active, previous and backup. An upgrade through the WebGUI changes only the active firmware: the current firmware is moved to the previous slot, and the backup firmware is untouched. Reverting to the factory default image and settings installs the backup firmware as the active firmware.
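
For anyone who finds the three-slot behaviour easier to follow as code, here is a minimal sketch in Python. It is only a model of what is described above, not something that runs on the appliance: the `FirmwareSlots` class, the function names and the slot labels are hypothetical, and leaving the previous slot unchanged on a factory reset is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FirmwareSlots:
    """Hypothetical model of the three firmware images on the unit."""
    active: str
    previous: str
    backup: str


def webui_upgrade(slots: FirmwareSlots, new_build: str) -> FirmwareSlots:
    # Only the active image changes; the old active image moves to the
    # previous slot and the backup image is left alone.
    return replace(slots, active=new_build, previous=slots.active)


def factory_default(slots: FirmwareSlots) -> FirmwareSlots:
    # The backup image becomes the active image.
    # Assumption: the previous slot is left as it was.
    return replace(slots, active=slots.backup)


if __name__ == "__main__":
    # Slot contents are illustrative labels, not real image names.
    unit = FirmwareSlots(active="R77.20.81", previous="older-build", backup="factory-image")
    unit = webui_upgrade(unit, "R77.20.85")
    print(unit)
    print(factory_default(unit))
```

The point of the model is that a WebGUI upgrade never touches the backup slot, so a factory reset brings back whatever image was in that slot, not the build that was running before the upgrade.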

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist
HristoGrigorov

I am 99.999% sure it is a problem in the firmware. I saw a message from the top command complaining about an incorrect kernel HZ value. That would explain why it is so slow. But I had no time to look into it more closely.

If someone is running 85 with central management and has no performance issues then that would be a different story. 

0 Kudos
Naftali_Oziel
Collaborator

The issue shows up on both locally managed and centrally managed SMB appliances, to different degrees: CPU load is high and the GUI times out. It is a firmware issue. A good indication is the kernel HZ value error message that appears when you use the top command.

As far as I recall from when I did it, installing the firmware via USB wipes your configuration and sets the box to factory defaults; in addition, it upgrades the primary/backup firmware image on the unit.

0 Kudos
Timothy_Hall
Legend

Coming in late to this thread, but the behavior you are describing sounds eerily similar to the CPU downclocking that occurs when a CPU fan fails, as the CPU slows itself down to keep from literally bursting into flames.  The load average rises, the system is sluggish, and always busy for no glaringly obvious reason.  Not saying you have a CPU fan failure here, but perhaps some kind of issue in the firmware that is causing the CPU to downclock or run at a non-standard clock speed.  That incorrect kernel HZ message in top is quite intriguing...

--

CheckMates Break Out Sessions Speaker

CPX 2019 Las Vegas & Vienna - Tuesday@13:30

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
0 Kudos
HristoGrigorov

Thanks for your comments, Tim. In this case it was not a fan failure, because a previous build runs just fine on the very same hardware. I also ran 'show diag' and there was no indication of a fan failure.

I am purely speculating here, but I assume the problem is that they tried to adjust CPU performance dynamically based on current load and it did not go as planned. The fact is that the build boots and behaves well for some time, and then load spikes start to happen more and more often for no apparent reason.
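
One way to put numbers on "load spikes happen more and more often" is to sample the 1-minute load average at a fixed interval and look at when each spike starts. Below is a rough sketch in Python, assuming the samples are lines in the Linux /proc/loadavg format and have been collected somewhere Python is available (not necessarily on the appliance itself). The function names, the threshold and the sample values are made up for illustration.

```python
def parse_load1(line: str) -> float:
    """Return the 1-minute load average from a /proc/loadavg-style line."""
    return float(line.split()[0])


def spike_onsets(samples, threshold):
    """Return the timestamps at which load rises to or above the threshold.

    samples is an iterable of (timestamp_seconds, load1) pairs in time order.
    Consecutive samples above the threshold count as one spike.
    """
    onsets = []
    above = False
    for ts, load in samples:
        if load >= threshold and not above:
            onsets.append(ts)
        above = load >= threshold
    return onsets


def gaps(onsets):
    """Seconds between consecutive spike onsets; shrinking gaps mean more frequent spikes."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


if __name__ == "__main__":
    # Illustrative samples only, one per minute.
    raw = ["0.20 0.18 0.12 1/80 11206", "2.50 1.10 0.40 3/82 11300",
           "0.40 0.90 0.45 1/80 11310", "3.00 1.50 0.70 4/83 11400"]
    samples = [(i * 60, parse_load1(line)) for i, line in enumerate(raw)]
    print(gaps(spike_onsets(samples, threshold=2.0)))
```

Comparing the gaps between a good build and the problem build on the same hardware would show whether the spikes really do become more frequent over uptime.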

The HZ message appears on previous builds as well. It shows up at random times when running 'top', so it is likely not the root cause.

Nevertheless, from what I know, R&D has identified the problem and the fix is currently going through full-cycle QA testing. Whether they will communicate what the problem was, I don't know, but for me it is enough that it will be fixed.

0 Kudos
Naftali_Oziel
Collaborator

How is the fix working so far? 

0 Kudos
KAPIL_RANA
Explorer

Running well.

0 Kudos
Naftali_Oziel
Collaborator

Thank you. Just to confirm, are you running build 731 centrally managed or standalone? I'm just surprised that no docs are out on this release of fixes.

0 Kudos
Pedro_Espindola
Advisor

Yes, there should be some info about 751. sk140193 still directs us to the download of 731.

0 Kudos
HristoGrigorov

I agree with Pedro. Also, I do not understand why we are not getting notifications in the Security Management server when new firmware is available, as we do for the other appliances.

0 Kudos
HristoGrigorov

Performance issue is solved for sure. But can't say a lot more because I have not tested it much today.

Just let me clarify that I was provided with a jumbo hotfix. That was a quick hack just to prove where the problem was. The build that we will get officially as GA may very well be different. Let's be patient and wait and we will see.

0 Kudos
HristoGrigorov

Corrupted firmware will cause process crashes, boot failures, spontaneous reboots, etc. Degraded performance is very unlikely to be caused by that. It is either a configuration issue or just a plain bug in the firmware... or binaries compiled in debug mode that somehow made it into the release version of the image :)

Naftali_Oziel
Collaborator

100% agreed..

0 Kudos
G_W_Albrecht
Legend

Degraded performance can have a wide spectrum of causes, and of course corrupted firmware that makes daemons crash will cost resources. So I would assume that it is unlikely,

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist
0 Kudos
PhoneBoy
Admin

I saw another instance of this same issue recently.

Anyone experiencing this issue is encouraged to open a TAC SR.

0 Kudos
HristoGrigorov

That would require a test lab with access from the Internet for CheckPoint to investigate, which unfortunately I do not have.

But what about CheckPoint itself? Don't you guys have a test lab and a spare quality engineer to try to reproduce this issue and see if it is common or configuration-specific? Because with those of us who have reported it already, it seems more like a common issue to me.

If you can point the appropriate person to this thread, maybe something like that can be arranged.

0 Kudos
PhoneBoy
Admin

I can't say if there's enough information in this thread to reproduce the issue, which is why it's important everyone who has this issue open a TAC SR.

That said, I did point the relevant parties to this thread already. 

HristoGrigorov

Thanks, Dameon. I may open an SR next week if it is fine with TAC that I provide them only with CPinfo files and such.

0 Kudos
Pedro_Espindola
Advisor

Dameon, I am setting up a lab with version 85 to try to reproduce. My production gateways are back to .81 version.

0 Kudos
Naftali_Oziel
Collaborator

A reminder that the issue is not only on centrally managed devices. Standalone devices show latency with GUI timeouts when navigating. The issue has already been reported and may be associated with high CPU; it is not present with build 541 or 551 for R77.20.81.

0 Kudos
Amir_Erman
Employee

Hi Hristo, 

Thanks for identifying and raising this issue.

Currently we have not managed to replicate this behavior internally.

We would like to work with you to investigate the problem.

Please contact me directly at amire@checkpoint.com

Amir, SMB Director R&D

0 Kudos
HristoGrigorov

Hello Amir,

Thanks for your support. Mail sent.

I'll keep this thread posted on recent developments about this.

0 Kudos
Naftali_Oziel
Collaborator

Amir, I would appreciate it if the GUI lag/latency that causes timeouts were also looked into; I am not sure if it's related to the newly added event viewer features. This is on standalone. Thanks.

0 Kudos
HristoGrigorov

We are trying, together with CheckPoint R&D, to identify in which of the builds the problem first appeared. We will probably know that tomorrow.

HristoGrigorov

Today we identified the build where it all started and I was provided with a jumbo hotfix to test. So far, it looks good. But I will do a bit more testing tomorrow before I make any final statements.

0 Kudos
Naftali_Oziel
Collaborator

That's good news.. are you also testing the GUI response?

0 Kudos
HristoGrigorov

I'm sorry mate, mine are centrally managed appliances.

0 Kudos
Pedro_Espindola
Advisor

Can you share with us? When did it start?

0 Kudos
HristoGrigorov

I prefer such info to come from CheckPoint and not from me, as I am not sure how accurate mine would be.

0 Kudos
