Hi,
I’d like to elaborate a bit on two issues @jgar encountered: the postgres disk space consumption on standby machines, and the IPS purge not running.
Postgres disk space consumption on standby machines
Roughly speaking, when a record is deleted or updated in postgres, postgres does not release the freed disk space to the OS, but keeps it for internal use. In some cases, postgres ‘recycles’ this space for later record updates or inserts (via ‘autovacuum’, for instance), but in other cases, due to internal calculations, it just keeps consuming more disk space and will eventually require manual cleanup. This extra disk space that could be reclaimed is called ‘bloat’.
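To make the ‘internal calculations’ a bit more concrete, here is a minimal Python sketch of the trigger rule documented for stock PostgreSQL autovacuum: a table is vacuumed once its dead rows exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (number of rows). The values used (50 and 0.2) are the stock PostgreSQL defaults and are an assumption here; the settings on the Management DB may differ, and the function names are hypothetical. Note also that a plain (auto)vacuum generally only makes the space reusable inside postgres and does not hand it back to the OS.

```python
# Sketch only: stock PostgreSQL defaults, assumed rather than taken from our DB config.
DEFAULT_VACUUM_THRESHOLD = 50      # autovacuum_vacuum_threshold
DEFAULT_VACUUM_SCALE_FACTOR = 0.2  # autovacuum_vacuum_scale_factor


def autovacuum_would_trigger(dead_rows, total_rows,
                             threshold=DEFAULT_VACUUM_THRESHOLD,
                             scale_factor=DEFAULT_VACUUM_SCALE_FACTOR):
    """Return True if the dead-row count exceeds the autovacuum trigger point."""
    return dead_rows > threshold + scale_factor * total_rows


def dead_row_fraction(live_rows, dead_rows):
    """Fraction of rows in a table that are dead (0.0 for an empty table)."""
    total = live_rows + dead_rows
    return dead_rows / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical table with 1000 rows: trigger point is 50 + 0.2 * 1000 = 250.
    print(autovacuum_would_trigger(200, 1000))
    print(autovacuum_would_trigger(300, 1000))
    print(dead_row_fraction(700, 300))
```

The live and dead row counts for a real table can be read from the n_live_tup and n_dead_tup columns of the pg_stat_user_tables view.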
The postgres DB owner, in this case us :), is supposed to maintain the health of the DB.
Full sync is a Management operation that creates a lot of bloat: put simply, a full sync deletes all the records on the machine being synchronized and copies them over from the synchronizing machine.
Before R81.10, full sync was a heavy, manual operation.
In R81.10, thanks to a major improvement in full sync duration, a full sync runs automatically every 5 minutes in every domain except the System domain (replacing the incremental sync, which wasn’t stable enough).
The bottom line is that we create bloat every 5 minutes, and in some cases the automatic cleanup is not fast enough to keep up with it.
This behavior has the greatest impact on MDSs with many standby CMAs and on standby SMC machines.
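As a rough illustration of why the cleanup can fall behind, the sketch below models each 5-minute sync cycle as adding a fixed number of dead rows, and the automatic cleanup as reclaiming at most a fixed number per cycle. All names and numbers are hypothetical and are not measurements from a real machine.

```python
def bloat_backlog(cycles, dead_rows_per_sync, max_reclaimed_per_cycle):
    """Return the unreclaimed dead-row backlog after each sync cycle."""
    backlog = 0
    history = []
    for _ in range(cycles):
        backlog += dead_rows_per_sync                     # full sync rewrites the records
        backlog -= min(backlog, max_reclaimed_per_cycle)  # cleanup reclaims what it can
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Hypothetical: cleanup keeps up, so the backlog stays at zero.
    print(bloat_backlog(3, dead_rows_per_sync=100, max_reclaimed_per_cycle=150))
    # Hypothetical: cleanup is too slow, so the backlog grows every cycle.
    print(bloat_backlog(3, dead_rows_per_sync=100, max_reclaimed_per_cycle=60))
```

In this simplified model the backlog grows without bound whenever a sync produces more dead rows than one cycle of cleanup can reclaim, which is why machines with many standby domains would be hit hardest.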
A solution to this problem is in progress.
- For MDSs, a solution is in development. It will not require a code change and should be released in an updatable way within the next few months.
- For SMCs (and SAs), several customers have already received HFs with a fix (@jgar, you should receive an HF next week).
Once the fix is proven to solve the problem, it will be released via the Jumbo HF (PRHF-29307).
IPS purge not running on SA in Full HA environment
The issue described above, with the IPS purge not running, is simply a bug. The IPS purge has a mechanism that avoids deleting IPS packages when they are suspected to be in use by other domains/machines in the environment. In the SMC/SA HA case, this call failed, even though it was not required there in the first place. This was fixed under PRHF-29565 and will be released in the next Jumbo HFs.
Thanks,
Natan