Kaspars_Zibarts
Employee

SmartLog becomes slow / unusable

This is a long shot, and the Check Point crew is not going to like me, as there has been very little detail or investigation so far, but it's worth a try.

In a nutshell: our SmartLog searches occasionally become so slow that SmartLog is unusable.

We run an MLM (VM, 16 cores / 128 GB / 6 TB) with a handful of CLMs, on R80.10 take 42, with a log rate averaging 10k/sec.

We had a case open, but that led nowhere (due to a late initial response from R&D, and then a lack of time and resources on our part, as it was around Xmas).

Typically an MLM reboot fixes it for a while, and then the problem comes back.

We were waiting for R80.20, which promises a faster file system, but my colleague pointed out a very interesting fact: we used Tufin a lot to do firewall changes back in December, and we had a lot of SmartLog "slowness". Then we backed off from Tufin changes for a while in Feb/Mar, and log search got noticeably better.

Yesterday I did a batch of changes through Tufin SecureChange (9-10 AM), log search ground to a halt again, and we were forced to reboot the MLM around 1 PM.

As far as I know, Tufin uses the API to make changes in R80.10 management. Could it be that long or complex API queries and their results somehow clog up the available resources on the MDS/CMAs? We also noticed that every time I run API scripts (non-Tufin), it has the same impact on SmartLog performance.

I know, Tomer Sole, you might want to say something as you love the API :)

And don't get me wrong: I love both SmartLog and the API, so I'm not whinging, I just want to hear from others. I will keep digging.

Timothy_Hall
Champion

The processes associated with log indexing/searching and the API that may be related to your problem are:

  • cpm
  • SOLR/java_solr
  • RFL/LogCore
  • INDEXER/log_indexer
  • SmartLog_Server

What I would suggest is baselining these key processes while search performance is good, with regard to:

  • CPU & Memory Usage (Use ps and top commands)
  • Disk Usage - Unfortunately the current Gaia kernel does not support the use of the iotop command, so there is no direct way to view disk utilization per process; you will be stuck looking at system-wide disk I/O stats with iostat and having to infer what is going on.

Then wait for a period of terrible search performance (or induce it with a bunch of Tufin queries), and examine how these processes have changed. If you spot one that is chewing up lots of CPU/memory/disk, try GENTLY killing that process with kill -15 (not -9 unless necessary), give the killed process 60 seconds to respawn, wait another 60 seconds for it to fully initialize, and see if good search performance has returned.

Once you have identified the process that may be going out to lunch on you, further debugging can be attempted.
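For anyone wanting to script the before/after comparison described above, here is a minimal sketch in Python. It is not a Check Point tool: the substrings in WATCHED are simply the process names from the list above, and matching them against the ps command line is an assumption about how those processes appear on your system. The helper names (parse_ps, summarize, delta, snapshot) are made up for illustration, and it assumes a Python 3 interpreter and the standard procps `ps -eo` output format.

```python
import subprocess

# Name fragments taken from the process list above. Matching them as
# substrings of the ps command line is an assumption; adjust as needed.
WATCHED = ("cpm", "java_solr", "LogCore", "log_indexer", "SmartLog_Server")
PS_CMD = ["ps", "-eo", "pid,pcpu,pmem,rss,args"]


def parse_ps(text):
    """Parse `ps -eo pid,pcpu,pmem,rss,args` output into a list of dicts."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue
        pid, pcpu, pmem, rss, args = parts
        rows.append({"pid": int(pid), "cpu": float(pcpu),
                     "mem": float(pmem), "rss_kb": int(rss), "args": args})
    return rows


def summarize(rows, watched=WATCHED):
    """Sum CPU %, memory % and RSS (KB) per watched process name."""
    totals = {name: {"cpu": 0.0, "mem": 0.0, "rss_kb": 0} for name in watched}
    for row in rows:
        for name in watched:
            if name in row["args"]:
                totals[name]["cpu"] += row["cpu"]
                totals[name]["mem"] += row["mem"]
                totals[name]["rss_kb"] += row["rss_kb"]
                break  # count each process once, under the first match
    return totals


def delta(before, after):
    """Return after-minus-before for every watched name and metric."""
    return {name: {key: after[name][key] - before[name][key]
                   for key in before[name]}
            for name in before}


def snapshot():
    """Take a live snapshot by running ps (needs procps-style ps)."""
    out = subprocess.check_output(PS_CMD, universal_newlines=True)
    return summarize(parse_ps(out))


if __name__ == "__main__":
    baseline = snapshot()
    input("Baseline taken. Press Enter once searches are slow again... ")
    for name, change in delta(baseline, snapshot()).items():
        print(name, change)
```

Note that the %CPU column from ps is an average over the process lifetime rather than an instantaneous reading, so top is still worth a look alongside this. Saving the baseline to a file instead of waiting at a prompt would be a natural change if the slowdown takes hours to appear.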

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Kaspars_Zibarts
Employee

Funnily enough, that's exactly what I wrote a script for: collecting stats before and after, plus checking for any zombie processes. But then I had no time to "execute" it. Tomorrow it is.
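The script itself was not posted, so the following is only a hypothetical sketch of the zombie-process part of such a check, in Python. It assumes procps-style `ps -eo pid,ppid,stat,comm` output, where a state starting with "Z" marks a defunct process; the function name find_zombies is made up for illustration.

```python
import subprocess


def find_zombies(ps_text):
    """Return (pid, ppid, command) for every process whose state starts with Z."""
    zombies = []
    for line in ps_text.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, ppid, stat, comm = parts
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), comm))
    return zombies


if __name__ == "__main__":
    out = subprocess.check_output(["ps", "-eo", "pid,ppid,stat,comm"],
                                  universal_newlines=True)
    for pid, ppid, comm in find_zombies(out):
        print("zombie pid=%d parent=%d cmd=%s" % (pid, ppid, comm))
```

The parent PID is included because a zombie can only be cleared by its parent reaping it (or by the parent exiting), so that is the process worth looking at.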

Oren_Koren
Employee Alumnus

Hey,

I want to take this up internally as well (with BizDev & Tufin).

Please send me an email at orenkor@checkpoint.com about this topic (I can't find your email address in your profile).

Thanks,

Oren

Kaspars_Zibarts
Employee

Thanks, Oren Koren and Tomer Sole, for such a quick response! You guys saved my day (without an SR!)

In a nutshell: the MDM/MLM upgrade to take 91 seems to have resolved our slow log issues when using Tufin or API scripts.

Some notes that I'm copying from Tomer's emails, which made the correlation:

Tufin uses an external database to model the policy based on Check Point logs. So I can see how the log server could get to some thresholds

Did you move your Management to R80.10 Jumbo Hotfix Take 70 from Jan. 15? A performance issue with the Management API regarding very large groups was resolved in that update.

Tomer_Sole
Mentor

Cool, also thanks for the badge!

I'm also happy to hear the via-Tufin slowness was resolved. We value Tufin as a great technology partner of Check Point.
