MDS backup too big and slow in R80.10
For those running an MDS management solution: what's your take on backups after R80.10? In our case, in R77.30 the backup was approx. 3GB in size and it took less than half an hour to restore the MDS and have it up and running. With R80.10 the backup has grown to 18GB(!) within a year, and the actual process takes well over an hour, if not closer to two. As an engineer I might accept the argument that R80.10 brought in so many new features, thus increasing backup size, but from a business and disaster recovery point of view it is a complete shambles.
Ironically, it makes even the support process painfully slow: I was asked to upload an MDS backup yesterday, and considering that CP FTP servers are over 50ms away from us, it will take a couple of hours to complete.
I have been raising SRs for years trying to point out the inefficiency of the MDS backup process: the same MDS TGZ is archived and compressed 4 times... Seriously. In order to restore a backup now (official MDS Gaia backup) we would need nearly 100GB of free disk space. Not that it costs too much money, but it makes it so slow.
I'm not expecting many votes, as probably not that many run MDS, but it would still be good to hear opinions on the matter.
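To put the upload time mentioned above in perspective, here is a rough sketch of how round-trip latency can cap a single TCP stream, using the common bound of throughput ≤ window / RTT. The window sizes are assumptions chosen for illustration (they are not taken from the thread), and real transfers also depend on link bandwidth, packet loss, and the FTP client and server.

```python
def transfer_hours(size_gb: float, rtt_ms: float, window_bytes: int = 64 * 1024) -> float:
    """Rough lower bound on transfer time for one TCP stream limited by window/RTT."""
    # A sender can have at most one window of data in flight per round trip.
    throughput_bytes_per_s = window_bytes / (rtt_ms / 1000.0)
    size_bytes = size_gb * 1024 ** 3
    return size_bytes / throughput_bytes_per_s / 3600.0


if __name__ == "__main__":
    # 18 GB backup and 50 ms RTT come from the post; window sizes are assumed.
    for window_kib in (64, 256, 1024):
        hours = transfer_hours(18, 50, window_kib * 1024)
        print(f"window {window_kib:>4} KiB: ~{hours:.1f} h for 18 GB at 50 ms RTT")
```

If the window stays small, latency rather than bandwidth dominates, which is one plausible reason an upload of this size could run to hours.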
- Tags:
- backup
- mds
- provider-1
I have already gone through it, Tomer, straight after updating to Take 42, and it made no difference. Postgres just keeps growing like nuts every month.
How many revisions do you have, and how many IPS updates? The difference is that we didn't include history in our backups prior to R80.
When this info originally came out, I went and deleted every single revision on every single CMA that we had (we're talking hundreds), but it hardly made a dent in the backup size. I have already asked that we would rather not have revisions at all and instead rely on good old backup and restore, but got nowhere. Is it possible to disable revisions across the whole MDS?
First of all, if you deleted all your previous revisions and the size was still big, then even if you could disable revisions, that wouldn't have solved your particular problem.
So now I'm thinking this is worthy of a support ticket. Please send me your support ticket privately so that I can track its findings.
We have also seen our backup size grow. I've been running into a memory consumption issue while purging and haven't been able to purge for a while, so I assumed that was the cause in our case. We are working on getting that straightened out first. The other thing we've noticed is that while backups are running, SmartConsole becomes slow or unresponsive.
We have always run the backup out of hours, as it needs to halt the CMAs. As for memory consumption, I didn't notice any issue yesterday when I purged over 10000 revisions. No problems there. But we have 128GB on that VM, so I believe that should suffice.
After upgrading memory, the memory consumption issue went away.
OK, morning update: after purging over 10000 revisions across all CMAs, the backup size remained the same.
Before
-rw-r--r--. 1 netbackup1 netbackup 17G Aug 22 04:10 backup_mds01_22_Aug_2018_02_30.tgz
After
-rw-r--r--. 1 netbackup1 netbackup 17G Aug 24 04:08 backup_mds01_24_Aug_2018_02_30.tgz
SR it is then
I just want to say I'm tracking your support request and hope this comes to a resolution soon that might fit for other customers as well.
Much appreciated! thanks heaps
Are you running the MDS on VMware or dedicated hardware?
I've also noticed the same, mostly caused by slow disk IO.
Running on dedicated hardware sped up the backups and the performance of the MDS itself.
It's a VM with proper storage. So IO is not the issue. How big is your backup? How much did it grow from R77 to R80?
Our backup is around 30 GB. On dedicated hardware it takes 1:15.
Before our transition to dedicated hardware it took around 5 hours.
We didn't expect disk IO issues either, but we saw a lot of improvement.
The dump of the PostgreSQL database was causing a lot of disk IO and took some time.
We migrated a long time ago, so I no longer know the size of the R77.30 backup.
Did you mean 1hr to create the backup? That's normal. Ours is 17GB and it takes approximately half that time to create. The problem for me is restore time. It takes way too long, IMO.
Physical hardware also made a big difference to the restore. Restoring the 30 GB and reinitializing the Solr database took approx. 14 hours in VMware. On physical hardware it was 2 hours, I think. My opinion is that big MDS environments are not suitable for VMware.
Hard to say about 30GB, but our 17GB restores in ~2hrs in the lab VM, which is slightly under-powered. Still a long time compared to what it used to be in R77. And it keeps growing quite fast (~1GB per month). I just think the whole Gaia backup process for MDS can be improved dramatically, as the basic mds backup file that forms the biggest chunk of the archive is actually compressed and archived 4 times! Seems like overkill. They could have used simple tar without compression after it's been compressed once.
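The point about repeated compression can be illustrated with a small sketch: once data has been gzipped, further gzip passes tend to cost CPU and disk IO without shrinking it meaningfully. The payload below is a synthetic stand-in for a database dump, not real MDS data, and the function name is made up for this example.

```python
import gzip
import random


def gzip_layers(payload: bytes, layers: int) -> list[int]:
    """Return the size in bytes after each successive gzip pass over the payload."""
    sizes = []
    data = payload
    for _ in range(layers):
        data = gzip.compress(data)
        sizes.append(len(data))
    return sizes


def synthetic_dump(lines: int = 20_000, seed: int = 0) -> bytes:
    """Build fake SQL-like text (an assumption for illustration only)."""
    rng = random.Random(seed)
    rows = (
        f"INSERT INTO objects VALUES ({rng.randrange(10**9)}, 'host-{rng.randrange(10**6)}');\n"
        for _ in range(lines)
    )
    return "".join(rows).encode()


if __name__ == "__main__":
    payload = synthetic_dump()
    print(f"original:          {len(payload)} bytes")
    for i, size in enumerate(gzip_layers(payload, 4), start=1):
        print(f"after gzip pass {i}: {size} bytes")
```

On data like this, the first pass does essentially all the useful work, which is the argument for wrapping an already compressed file in plain tar.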
The restore itself is 2 hours. But when you start the MDS, it takes up to several hours to rebuild the Solr index.
And yes... why gzip the archive multiple times? That takes time and is not necessary.
That's my point: if we whinge enough here we might get some attention. Sorry Tomer, nothing against you personally; on the contrary, you have always gone the extra mile and it is really appreciated. This is more constructive feedback about the product. And yes, we will have another remote session next week; I had issues with my lab yesterday (the backup restore took too long, so I was not ready when the meeting time was up).
Are you just simply running an mds_backup?
1.) Are you by chance using the -l flag to not include your logs as well?
2.) What dir are you running the mds_backup from?
I recently upgraded from R77.30 MDS to R80.10 and my backups grew by maybe 300-400MB. Nothing crazy.
Hi, no, this is a bog-standard Gaia backup that wraps mds_backup. Logs are excluded. The biggest increase is the postgres DB dump (part of mds_backup), which has grown from 3GB originally to nearly 10GB in one year. So keep an eye on yours. Or it could be that ours is "broken" somewhere, as we went from R77.30 to R80 and then R80.10. Who knows. I'll keep this thread posted on my SR progress.
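For anyone wanting to check which part of their own backup is growing, a minimal sketch like the following lists the largest files inside a tar archive using Python's standard `tarfile` module. It only looks at the top level of whatever archive it is given; since the thread describes archives nested inside archives, inner files would need to be extracted and inspected separately. The function name is made up for this example, and the archive path is whatever you pass on the command line.

```python
import sys
import tarfile


def largest_members(archive_path: str, top: int = 10) -> list[tuple[int, str]]:
    """Return (size_in_bytes, member_name) for the largest regular files in a tar archive."""
    # Mode "r:*" lets tarfile auto-detect gzip, bz2, xz, or an uncompressed tar.
    with tarfile.open(archive_path, mode="r:*") as tar:
        files = [(member.size, member.name) for member in tar if member.isfile()]
    return sorted(files, reverse=True)[:top]


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for size, name in largest_members(sys.argv[1]):
            print(f"{size / 1024 ** 2:10.1f} MiB  {name}")
```

Comparing this listing between two backups taken some weeks apart should show whether the database dump or something else accounts for the growth.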
Our case is the opposite: we had almost 50GB backup files under R77.30, probably because of the many historical database revisions we had kept in each CMA. Under R80.10 the backup is about 22GB, because older revisions got lost during the migration. During the last 3-4 months the backup has grown by 1GB, which is not terrible, and it usually takes approximately 1h20-1h30 to create the backup file. Restoration time is about 1 hour in a VMware lab, but the Solr rebuild does take additional time.
From time to time, pg_dump processes stay in the process table even after the backup file has been created, sometimes dbedit locks are not removed, etc. I would say that the older pre-R80 backup was more bulletproof than the current one. No positive comments about R80.10: no quick cross-domain search, SmartConsole is buggy, it has short freeze moments all the time which can even be observed on CP demo streams, full text search is not working properly, etc.
Yep, that's what I meant: a full restore including the Solr DB restore, which may extend total time by an hour or even more.
Otherwise I actually liked R80.10. No doubt we raised a fair few cases, but that was expected with such a major update to the SW architecture. We use Tufin quite extensively, which normally covers CP shortcomings like cross-CMA search. SmartConsole has been rather stable since the last JHF, I must say.
I wanted to keep this thread purely about backup and restore time, but the feedback is much appreciated!
Hi Kaspars,
Thank you for raising this item for discussion.
Eran Habad and I (in Management R&D) would like to further investigate this issue in order to better understand the reason for the backup growth.
We would greatly appreciate it if you can use your SR and ask Support to open a CFG task to be assigned to Eran's group. On this SR, please attach a recent backup file + ask support to run the CPM Doctor utility and provide the output. Both items should give us an overall view of your system and what is backed up.
I hope that you can share this info with us.
Thanks!
Hi Tomer, the SR was created on 24th August. I'll pass the message above directly into the case. You guys have the full backup there.
Kaspars,
Was there ever any resolution to this? I'm interested to know what TAC found, if you can share, as to the reasoning behind the extremely large backups.
Thanks!
- Mike
Apparently some things have been discovered in our backup, and we're not the only ones. We have not received a full fix, just a partial one. So in short, we are still waiting, but it looks like there's light at the end of the tunnel. I will definitely post here if we eventually get noticeable results.
Great, thank you!
Some good news from R&D, Mike Andretta! Our current backup is 17GB, going down to 7GB! Back to R77.30 size.
I just wanted to let you know that I have tested the private hotfix on our replication with your database and managed to generate an mds_backup file which is 6.6GB.
I will update you once the fix is inside the R80.20 Jumbo hotfix.