
MDS root partition nearly full stopping mgmt HA sync in R80.10

Hi, been a long time since I have posted here, too busy 🙂

Just stumbled across an interesting thing with an R80.10 Take 142 MDS. We have an HA setup, and a couple of days ago sync suddenly stopped working, with a yellow warning in SmartConsole:

[Screenshot: mgmt_ha_sync_error.png]

When I tried to sync manually, the FWM process died on the primary MDS. Analysis showed that the root partition reached 100% during the sync.

I checked by hand and saw a lot of disk space used in $MDSDIR/tmp/mgha, so I cleaned it up manually, and after a reboot the MDS was functioning again.

At this point we had about 10GB free on the 100GB root partition. Another attempt to sync the MDS ended the same way: the partition filled up with huge files in $MDSDIR/tmp/mgha. So the sync clearly needed more than 10GB, but there was nothing obvious left to clean up.
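For anyone wanting to repeat that check, here is a minimal sketch of how the biggest space consumers under a directory can be listed. It only uses GNU du, sort and head; the function name largest_entries is made up for this example, and the path is the one mentioned above.

```shell
#!/bin/bash
# largest_entries DIR [COUNT]
# Hypothetical helper (not a Check Point tool): lists the COUNT largest
# entries directly under DIR, sizes in KB, largest first.
# -x keeps du on one filesystem, so other mounted partitions are not counted.
# The first output line is the total for DIR itself.
largest_entries() {
    local dir="$1"
    local count="${2:-10}"
    du -xk --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

Usage would be something like `largest_entries "$MDSDIR/tmp/mgha"` or `largest_entries / 15` from expert mode.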

Went into our lab and noticed that the same MDS in the lab environment had 40GB free out of 100GB, which felt strange, as the lab is a 100% replica of production. So I had two options: build a new VM with a bigger root partition, or try to salvage the existing VM that the MDS runs on, keeping the same 100GB root partition.

Since I had similar disk usage on the secondary MDS, I decided to take a full backup and restore it on the same VM to see if it made any difference. And voila! After the backup and restore, root partition usage went down from 90% to 60%! That suggests the MDS accumulates a lot of temp data in the CMA directories, which a backup and restore seems to clean up.

Then I did the same on the primary MDS (took a backup and restored it on the same VM) and we were back in business: root partition usage dropped to 61%. Here's the disk usage before and after the restore:


Before the restore:

[Expert@mds01:0]# df -h
Filesystem                       Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_splat-lv_current  97G   83G   9.2G   90%   /
/dev/sda1                        289M  24M   251M   9%    /boot
tmpfs                            63G   4.0K  63G    1%    /dev/shm
/dev/mapper/vg_splat-lv_log      238G  91G   135G   41%   /var/log


After the restore:

[Expert@mds01:0]# df -h
Filesystem                       Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_splat-lv_current  97G   57G   36G    61%   /
/dev/sda1                        289M  24M   251M   9%    /boot
tmpfs                            63G   4.0K  63G    1%    /dev/shm
/dev/mapper/vg_splat-lv_log      238G  109G  117G   49%   /var/log

 

After this, HA sync worked like clockwork, and I measured that it consumed 18GB of temp disk space on the root partition during the process! That roughly matches our backup size.
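One way such a measurement could be taken is to sample the root partition while the sync runs and keep the highest value seen. This is only a sketch built on plain df and awk; the function names used_kb and peak_used_kb, the interval, and the sample count are all assumptions for illustration, not how the figure above was actually obtained.

```shell
#!/bin/bash
# used_kb PATH
# Prints the used space (KB) of the filesystem that holds PATH.
# df -P forces one line per filesystem even with long device names.
used_kb() {
    df -Pk "$1" | awk 'NR==2 {print $3}'
}

# peak_used_kb PATH INTERVAL_SECONDS SAMPLES
# Hypothetical helper: polls used_kb SAMPLES times, sleeping INTERVAL_SECONDS
# between polls, and prints the highest value observed.
peak_used_kb() {
    local path="$1" interval="$2" samples="$3"
    local peak=0 current i
    for (( i = 0; i < samples; i++ )); do
        current=$(used_kb "$path")
        if [ "$current" -gt "$peak" ]; then
            peak="$current"
        fi
        sleep "$interval"
    done
    echo "$peak"
}
```

Recording `used_kb /` just before starting the sync, then running `peak_used_kb / 5 720` (one hour at 5-second intervals) in a second session, gives a baseline and a peak; the difference approximates the temp space the sync needed.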

Just wondering if anyone else has noticed anything like this? And a word of warning if you run MDS HA: have a look at your root partition usage and make sure you have enough free disk space to do a full sync.
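A quick pre-check along those lines could look like the sketch below. It relies only on standard df and awk; the function names free_kb and has_headroom are invented for this example, and the threshold is something you would have to pick for your own environment (in our case the sync needed about 18GB).

```shell
#!/bin/bash
# free_kb PATH
# Prints the available space (KB) of the filesystem that holds PATH.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# has_headroom PATH REQUIRED_GB
# Hypothetical helper: returns 0 if at least REQUIRED_GB is free on the
# filesystem holding PATH, 1 if not, 2 if the free space could not be read.
has_headroom() {
    local avail_kb required_kb
    avail_kb=$(free_kb "$1")
    [ -n "$avail_kb" ] || return 2
    required_kb=$(( $2 * 1024 * 1024 ))
    [ "$avail_kb" -ge "$required_kb" ]
}
```

For example, `has_headroom / 20 || echo "WARNING: less than 20GB free on /, full HA sync may fill the disk"` could be run by hand or from cron before a planned sync.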

And for those running R80.20: is it any more efficient with temp disk space during a full HA sync?