SmartLog becomes slow / unusable
This is a long shot, and the Check Point crew are not going to like me because very little detail or investigation has been done, but it's worth a try.
In a nutshell: our SmartLog searches occasionally become so slow that SmartLog is unusable.
We run an MLM (VM, 16 cores / 128GB / 6TB) with a handful of CLMs, R80.10 take 42, log rate averaging 10k/sec.
We had a case open but it led nowhere (due to a late response from R&D initially, and then a lack of time and resources on our part, as it was around Xmas).
Typically an MLM reboot fixes it for a while and then it comes back.
We were waiting for R80.20, which promises a faster file system, but my colleague pointed out a very interesting fact: we used Tufin a lot to make firewall changes back in December and we had a lot of SmartLog "slowness". Then we backed off from Tufin changes for a while in Feb/Mar, and log search became noticeably better.
Yesterday I did a batch of changes through Tufin SecureChange (9-10AM) and log search ground to a halt again, and we were forced to reboot the MLM around 1PM.
As far as I know, Tufin uses the API to make changes in R80.10 management. Could it be that long or complex API queries and results clog up the available resources on the MDS/CMAs? We have also noticed that every time I run API scripts (non-Tufin), it has the same impact on SmartLog performance.
Tomer Sole, I know you might want to say something, as you love the API.
And don't get me wrong: I love both SmartLog and the API, so I'm not whinging, I just want to hear from others. I will keep digging.
The processes associated with log indexing/searching and the API that may be related to your problem are:
- cpm
- SOLR/java_solr
- RFL/LogCore
- INDEXER/log_indexer
- SmartLog_Server
What I would suggest is baselining these key processes when search performance is good in regards to:
- CPU & Memory Usage (Use ps and top commands)
- Disk Usage - Unfortunately the current Gaia kernel does not support the use of the iotop command, so there is no direct way to view disk utilization per process; you will be stuck looking at system-wide disk I/O stats with iostat and having to infer what is going on.
Then wait for a period of terrible search performance (or induce it with a bunch of Tufin queries) then examine how these processes have changed. If you spot one that is chewing up lots of CPU/memory/disk, try GENTLY killing that process with kill -15 (not -9 unless necessary), give the killed process 60 seconds to respawn, wait another 60 seconds for it to fully initialize, and see if good search performance has returned.
Once you have identified the process that may be going out to lunch on you, further debugging can be attempted.
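The baseline-then-compare step above could be sketched roughly as follows. This is a minimal illustration, not a Check Point tool: it assumes Python 3.7 or later is available (if not on the management server itself, the `ps` output can be saved to a file and parsed elsewhere), and it assumes the process names from the list above appear as substrings of the `ps` command line, which may differ per build. The helper names `parse_ps`, `snapshot`, and `compare` are hypothetical.

```python
import subprocess
from collections import defaultdict

# Substrings taken from the process list above. How they actually appear in the
# ps command line on a given Gaia build is an assumption; adjust as needed.
# More specific names come first because the first match wins.
WATCHED = ["java_solr", "LogCore", "log_indexer", "SmartLog_Server", "cpm"]

def parse_ps(ps_output, watched=WATCHED):
    """Sum %CPU and RSS (KiB) per watched name from `ps -eo pid,pcpu,rss,args` text."""
    totals = defaultdict(lambda: {"cpu": 0.0, "rss_kib": 0, "count": 0})
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        _pid, pcpu, rss, args = parts
        for name in watched:
            if name in args:  # crude substring match, may over-match
                entry = totals[name]
                entry["cpu"] += float(pcpu)
                entry["rss_kib"] += int(rss)
                entry["count"] += 1
                break
    return dict(totals)

def snapshot():
    """Run ps once and return the parsed totals for the watched processes."""
    result = subprocess.run(["ps", "-eo", "pid,pcpu,rss,args"],
                            capture_output=True, text=True, check=True)
    return parse_ps(result.stdout)

def compare(baseline, current):
    """Return the change in %CPU and RSS per watched name between two snapshots."""
    zero = {"cpu": 0.0, "rss_kib": 0, "count": 0}
    delta = {}
    for name in set(baseline) | set(current):
        before, after = baseline.get(name, zero), current.get(name, zero)
        delta[name] = {"cpu": after["cpu"] - before["cpu"],
                       "rss_kib": after["rss_kib"] - before["rss_kib"]}
    return delta
```

Calling `snapshot()` while searches are fast, again while they are slow, and passing both results to `compare()` gives a per-process delta to inspect. Note that `ps` reports %CPU averaged over the process lifetime, so `top` in batch mode may be a better source for short spikes.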
--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com
CET (Europe) Timezone Course Scheduled for July 1-2
Funnily enough, that's exactly what I wrote a script to do: collect stats before and after, plus check for any zombie processes. But then I had no time to "execute" it. Tomorrow it is.
Hey,
I also want to take this up internally (with BizDev & Tufin).
Please send me an email at orenkor@checkpoint.com about this topic (I can't find your email address in your profile).
Thanks,
Oren
Thanks Oren Koren and Tomer Sole for such a quick response! You guys saved my day (without an SR!)
In a nutshell: upgrading the MDM/MLM to take 91 seems to have resolved our slow log issues when using Tufin or API scripts.
Some notes that I'm copying from Tomer's emails, which made the correlation:
Tufin uses an external database to model the policy based on Check Point logs. So I can see how the log server could reach some thresholds.
Did you move your Management to R80.10 Jumbo Hotfix Take 70 from Jan. 15? A performance issue with the Management API regarding very large groups was resolved in that update.
Cool, also thanks for the badge!
I'm also happy to hear the via-Tufin slowness was resolved. We value Tufin as a great technology partner of Check Point.
