Harald_Hansen
Advisor

R80.20 (mlm) log servers swapping

I have seen a couple of R80.20 MDS log servers (MLM) swapping; this may affect regular log servers as well. The customers have enough RAM, so the file system cache eats up about 60% of available memory. Still, swap usage is slowly increasing. We are running either GA or the ongoing JHF take on both systems. The MDS itself is neither swapping nor using as much cache as the MLM.

Anyone experiencing the same problem?

Check your swap usage with free/top and sar:

 

[Expert@mdlog:0]# free -m
              total        used        free      shared  buff/cache   available
Mem:          15921        5505         271         241       10144        9522
Swap:          8189         526        7663

[Expert@mlm:0]# for safile in $safiles; do sar -S -f "/var/log/sa/$safile" |grep Average|awk '{print $3}'; done
0
6160
28746
58714
27729
...
411624
453370
495278
518524
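The loop above depends on a `$safiles` variable that is not shown in the post. As a minimal sketch, assuming the daily sysstat files live in /var/log/sa and follow the usual saDD naming, the same per-day average of used swap (kbswpused, in KB) could be collected like this. The function names `avg_swpused` and `report_swap_history` are made up for illustration.

```shell
# Print kbswpused from the "Average:" line of `sar -S` output read on stdin.
# Assumes the sysstat column order: Average: kbswpfree kbswpused %swpused ...
avg_swpused() {
    awk '$1 == "Average:" { print $3 }'
}

# Hypothetical helper: walk the daily sysstat files and print one line per file.
# The saDD file name pattern is an assumption; adjust it to your system.
report_swap_history() {
    for f in /var/log/sa/sa[0-9][0-9]; do
        [ -f "$f" ] || continue
        used=$(sar -S -f "$f" 2>/dev/null | avg_swpused)
        printf '%s %s\n' "${f##*/}" "${used:-n/a}"
    done
}
```

Note that the glob sorts by day of month, so the output is only chronological if all files belong to the same month.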

 

 

0 Kudos
8 Replies
Timothy_Hall
Legend

About 10GB of your 16GB of RAM is used for buffering/caching and could be reallocated for code execution if needed, so the system is definitely not running low on available memory. Processes will sometimes reserve areas of swap even though they aren't actually swapping. That is OK and won't slow down the system, since no active paging/swapping is happening. What do sar -W and (to a lesser extent) sar -B say about actual active paging/swapping activity?
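One way to read the sar -W output is to count how many sampling intervals show any swap-in or swap-out at all. The sketch below is only an illustration: it assumes the standard sysstat layout, where pswpin/s and pswpout/s are the last two columns, and the function name `count_swap_intervals` is hypothetical.

```shell
# Read `sar -W` output on stdin and count intervals with non-zero
# pswpin/s or pswpout/s (assumed to be the last two columns).
count_swap_intervals() {
    awk '
        # Keep only data rows: skip the banner, header, blank and Average lines.
        $1 != "Average:" && NF >= 3 &&
        $(NF-1) ~ /^[0-9]+(\.[0-9]+)?$/ && $NF ~ /^[0-9]+(\.[0-9]+)?$/ {
            total++
            if ($(NF-1) + 0 > 0 || $NF + 0 > 0) active++
        }
        END { printf "%d of %d intervals show swap activity\n", active, total }
    '
}
```

It could be run as `sar -W | count_swap_intervals` for today, or with `sar -W -f /var/log/sa/<file>` for an earlier day.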

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
0 Kudos
Harald_Hansen
Advisor

We seem to have a small but steady amount of swap-ins/outs under load; at night, with no load, sar does not register anything.
0 Kudos
Timothy_Hall
Legend

That doesn't sound too concerning to me. Since you are on the new kernel, it might be interesting to run iotop to see which processes are paging. If it is just SOLR and LogCore/RFL, that is constant log indexing and is expected.
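A different angle from iotop, which shows live I/O, is to look at which processes currently hold swap. The sketch below reads the VmSwap field from /proc/<pid>/status, which the kernel exposes on reasonably recent versions. It shows who occupies swap, not who is actively paging. The function name `swap_by_process` and its optional root-directory argument are made up for illustration.

```shell
# List processes holding swap, largest first, as "<KB> <name>".
# Optional first argument: an alternative proc root (defaults to /proc).
swap_by_process() {
    root="${1:-/proc}"
    for status in "$root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        # Processes may exit mid-scan, so read errors are silenced.
        awk '
            $1 == "Name:"   { name = $2 }
            $1 == "VmSwap:" { kb = $2 }
            END { if (kb > 0) printf "%d %s\n", kb, name }
        ' "$status" 2>/dev/null
    done | sort -rn
}
```

Running `swap_by_process | head` as root would show the top holders. Only the first word of a process name is printed.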

 

0 Kudos
Harald_Hansen
Advisor

Well, on the other system we are now at 2 GB of swap after a week. If this continues, swap will be full in about 3 weeks. Also, writing to swap is undesirable on SSD drives, which is why we overprovision memory.
0 Kudos
VSX_Bernie
Contributor

@Harald_Hansen  - Did you ever come to any sort of conclusion regarding this?

We recently switched to dedicated log servers to conserve resources on our MDS servers.
The log servers have less RAM, and therefore also less swap.

Because of this we realized the exact same as you.
Swap is being slowly consumed over time, even though plenty of memory is available.

It is definitely related to logging.
We have SNMP monitoring of swap for both our MDS servers and log servers.

From this we can see that swap consumption on the MDS servers looked exponential while they handled all the logs.
We can now see that the consumption rate on the MDS servers has dropped drastically since we moved logging to dedicated log servers, but swap is still being consumed.

A reboot always fixes the issue temporarily, but then consumption builds up again over time.
On a log server logging for approximately 45 Virtual Systems with 16GB of swap, free swap was 15.65GB after reboot.

After 38 days free swap has decreased to 8.47GB, a consumption of 7.18GB.
Weekly consumption is around 0.67GB, but it varies.
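For reference, averaging the free-swap figures quoted above over the whole 38 days gives roughly 1.32GB per week, which is higher than the 0.67GB weekly figure. That figure may reflect a typical week rather than the overall average. The sketch below reproduces the calculation and projects when swap would run out, assuming growth stays linear, which is an assumption the data may not support. The function name `swap_projection` is made up for illustration.

```shell
# Usage: swap_projection <free_GB_at_start> <free_GB_now> <days_elapsed>
# Prints the average weekly consumption and a linear time-to-full estimate.
swap_projection() {
    awk -v first="$1" -v last="$2" -v days="$3" 'BEGIN {
        rate = (first - last) / days * 7      # GB consumed per week
        if (rate <= 0) { print "no swap growth"; exit }
        printf "%.2f GB/week, %.1f weeks until swap is full\n", rate, last / rate
    }'
}

swap_projection 15.65 8.47 38
```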

This really seems like a memory leak issue, but only affecting swap.
Available memory looks to be unaffected.

I know from experience that in R80.40 there used to be a well-known (at least to TAC it was) general GAIA memory leak issue, that affected both SMS and GWs alike.

It was a lot like this, but for available memory instead of swap.


It is worth mentioning that we are running R81.10 Take 156, so this is still a problem for newer versions.
I have monitoring data from two years back, when we were running R81.10 Take 55 on these servers, and I can see it has been an issue ever since.

0 Kudos
VSX_Bernie
Contributor

Also, I know that a low-memory situation would cause swap usage to increase, and that the system would not automatically free that swap afterwards.
That is not what is happening here: none of the log servers has ever had less than 20GB of available memory.

0 Kudos
Harald_Hansen
Advisor

Hi,

I haven't followed up on this issue. I'll need to check on some customers, though in my original case it was deemed acceptable. 

Best regards,
Harald

 

VSX_Bernie
Contributor

All right - thanks for sharing

0 Kudos
