Bob_Zimmerman
MVP Gold

Unreliable Snapshots

I've recently had three separate systems blow up when I tried to revert to snapshots. It looks like something is causing the systems to create an extremely large file in /tmp named snap_log_backup.tgz. This file fills up all the storage in the snapshot, so a snapshot that would ordinarily be 14 GB is 32 GB (the size of lv_current) instead. All three were running R82 when the snapshots were taken: two on Jumbo Take 44 and one (a management server) on Jumbo Take 60.

Reverting to such a snapshot results in a non-bootable system. It hangs during init, and you need to be able to power-cycle it to get back to the boot menu and into maintenance mode. And of course this happened to two systems built by somebody unfamiliar with LOMs, in a facility which isn't typically staffed, so I had to send someone there in person to hit a button.

Check your snapshot sizes with 'lvs', then get the "Size" and "Used" values for lv_current from 'df -h'. A healthy snapshot should be about as big as lv_current's Used value. A snapshot with this problem will be about as big as lv_current's Size.
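If you have a lot of boxes to check, here's a minimal sketch of that comparison in shell, assuming you've already pulled the three numbers out in GB. The function name classify_snapshot and its "healthy"/"suspect" labels are made up for illustration; all it does is report which of lv_current's two values the snapshot size is closer to.

```shell
#!/bin/bash
# Hypothetical helper: decide whether a snapshot looks normal or bloated.
# Arguments (all in GB, decimals allowed):
#   $1 = snapshot size from 'lvs'
#   $2 = lv_current "Used" from 'df'
#   $3 = lv_current "Size" from 'df'
# Prints "healthy" if the snapshot is closer to Used, "suspect" if closer to Size.
classify_snapshot() {
    awk -v snap="$1" -v used="$2" -v size="$3" 'BEGIN {
        d_used = snap - used; if (d_used < 0) d_used = -d_used
        d_size = snap - size; if (d_size < 0) d_size = -d_size
        print (d_size < d_used ? "suspect" : "healthy")
    }'
}

classify_snapshot 14 14 32   # prints "healthy" (normal snapshot)
classify_snapshot 32 14 32   # prints "suspect" (bloated by snap_log_backup.tgz)
```

To get the numbers already in GB, LVM's 'lvs --units g --nosuffix' and GNU 'df -BG /' should work, assuming lv_current is mounted as / on your box; if not, just read them off 'lvs' and 'df -h' by eye.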

You might be able to fix the snapshot by mounting it read-write (without '-o ro') and deleting /tmp/snap_log_backup.tgz from inside it. On the third system, this got the snapshot to restore at least well enough to boot, though I haven't tested the restored system exhaustively.
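A rough sketch of that cleanup, assuming the snapshot is already mounted read-write somewhere. The function name, the /mnt/snap mount point, and the <vg>/<snapshot_lv> device path in the comments are placeholders; take the real volume group and LV names from 'lvs' before mounting anything.

```shell
#!/bin/bash
# Hypothetical helper: delete snap_log_backup.tgz from a mounted snapshot.
# $1 = directory where the snapshot LV is mounted read-write.
# Surrounding steps (names are placeholders, get them from 'lvs'):
#   mkdir -p /mnt/snap
#   mount /dev/<vg>/<snapshot_lv> /mnt/snap    # no '-o ro'
#   strip_snap_log_backup /mnt/snap
#   umount /mnt/snap
strip_snap_log_backup() {
    local root="$1"
    local target="$root/tmp/snap_log_backup.tgz"
    if [ -f "$target" ]; then
        # Show how big the file is before removing it, for the record.
        ls -lh "$target"
        rm -f "$target"
        echo "removed $target"
    else
        echo "no snap_log_backup.tgz under $root/tmp"
    fi
}
```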

I don't yet know anything about what causes the problem in the first place. Support apparently couldn't reproduce it, but it has bitten me every single time I've needed to restore a snapshot in the last few months.

1 Reply
PhoneBoy
Admin

I imagine that if the snapshot is bigger than your snapshot partition, that could cause an issue.
Hopefully we can get to the bottom of it.
