Replication is not a backup. Or is it? Here's how we used delayed replication to recover data after accidentally deleting labels.
Replication is not your database backup tool (gitlab-ce
With a delayed replica, we recovered the data in just 1.5 hours. Here's how it went.
Point-in-time recovery with PostgreSQL
PostgreSQL has a built-in feature that restores the state of a database to a specific point in time. It is called Point-in-Time Recovery (PITR).
To use this feature for a cold backup, we regularly take a base backup of the database and store it in an archive (the GitLab archives live in
What is delayed replication?
Delayed replication is the application of changes from the WAL with a delay: a transaction that commits at time X appears on the replica only at time X + d, where d is the configured delay.
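The relationship can be sketched in a few lines (the commit time below is hypothetical):

```python
from datetime import datetime, timedelta

delay = timedelta(hours=8)  # corresponds to recovery_min_apply_delay = '8h'

# hypothetical commit time on the primary (time X)
commit_time = datetime(2018, 10, 12, 9, 0)

# the commit becomes visible on the delayed replica only at X + d
visible_on_replica = commit_time + delay
print(visible_on_replica)  # 2018-10-12 17:00:00
```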
There are two ways to set up a physical database replica in PostgreSQL: recovery from a WAL archive and streaming replication.
How to set up a delayed replica
A delayed replica recovering from a WAL archive is configured in recovery.conf. Example:
standby_mode = 'on'
restore_command = '/usr/bin/envdir /etc/wal-e.d/env /opt/wal-e/bin/wal-e wal-fetch -p 4 "%f" "%p"'
recovery_min_apply_delay = '8h'
recovery_target_timeline = 'latest'
With these settings, we have configured a delayed replica that recovers from the archive. restore_command fetches WAL segments (here via wal-e) from the archive, recovery_min_apply_delay holds back the changes for eight hours, and recovery_target_timeline = 'latest' makes the replica follow timeline changes in the archive, for example after a cluster failover.
With recovery_min_apply_delay you can also set up delayed streaming replication, but there are a couple of pitfalls around replication slots, hot standby feedback, and so on. The WAL archive avoids them.
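For comparison, a streaming variant of the same delayed replica would look roughly like this (the host and user in primary_conninfo are placeholders; the slot and feedback pitfalls mentioned above apply to this setup):

```
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'  # hypothetical
recovery_min_apply_delay = '8h'
recovery_target_timeline = 'latest'
```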
The recovery_min_apply_delay parameter only appeared in PostgreSQL 9.4. In earlier versions, delayed replication requires a combination of pg_xlog_replay_pause() and pg_xlog_replay_resume(), or holding the WAL segments in the archive for the duration of the delay.
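On those older versions, a periodic job can toggle replay so that the replica stays roughly eight hours behind. The pg_xlog_replay_pause()/pg_xlog_replay_resume() functions are real; the control loop itself is a hypothetical sketch of the decision logic such a job could use:

```python
from datetime import datetime, timedelta

DELAY = timedelta(hours=8)  # how far behind we want the replica to stay

def desired_action(last_replay_ts, now):
    """Decide which SQL call a periodic control job should issue so the
    replica lags the primary by roughly DELAY (simplified sketch)."""
    if now - last_replay_ts > DELAY:
        # fallen further behind than the target delay: let replay catch up
        return "SELECT pg_xlog_replay_resume();"
    # within the delay window: hold replay where it is
    return "SELECT pg_xlog_replay_pause();"

now = datetime(2018, 10, 12, 17, 30)
print(desired_action(datetime(2018, 10, 12, 9, 0), now))   # resume (8.5h behind)
print(desired_action(datetime(2018, 10, 12, 10, 0), now))  # pause (7.5h behind)
```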
How does PostgreSQL do it?
It's interesting to see how PostgreSQL implements delayed apply. Let's look at recoveryApplyDelay(XLogReaderState *record):
static bool
recoveryApplyDelay(XLogReaderState *record)
{
    uint8       xact_info;
    TimestampTz xtime;
    long        secs;
    int         microsecs;

    /* nothing to do if no delay configured */
    if (recovery_min_apply_delay <= 0)
        return false;

    /* no delay is applied on a database not yet consistent */
    if (!reachedConsistency)
        return false;

    /*
     * Is it a COMMIT record?
     *
     * We deliberately choose not to delay aborts since they have no effect on
     * MVCC. We already allow replay of records that don't have a timestamp,
     * so there is already opportunity for issues caused by early conflicts on
     * standbys.
     */
    if (XLogRecGetRmid(record) != RM_XACT_ID)
        return false;

    xact_info = XLogRecGetInfo(record) & XLOG_XACT_OPMASK;

    if (xact_info != XLOG_XACT_COMMIT &&
        xact_info != XLOG_XACT_COMMIT_PREPARED)
        return false;

    if (!getRecordTimestamp(record, &xtime))
        return false;

    recoveryDelayUntilTime =
        TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);

    /*
     * Exit without arming the latch if it's already past time to apply this
     * record
     */
    TimestampDifference(GetCurrentTimestamp(), recoveryDelayUntilTime,
                        &secs, &microsecs);
    if (secs <= 0 && microsecs <= 0)
        return false;

    while (true)
    {
        /* Shortened: wait on the latch until recoveryDelayUntilTime, then */
        break;
    }

    return true;
}
The bottom line is that the delay is based on the physical time recorded in the transaction's commit timestamp (xtime). As you can see, the delay applies only to commits and does not affect other records: all changes are applied immediately, but the commit is held back, so we see the changes only after the configured delay.
How to use a delayed replica for data recovery
Let's say we have a database cluster in production and a replica with an eight-hour delay. Let's see how to recover data using our example of accidentally deleted labels.
When we became aware of the problem, we paused WAL replay on the delayed replica:
SELECT pg_xlog_replay_pause();
With replay paused, there was no risk that the replica would replay the DELETE. A useful thing if you need time to figure everything out.
The point is that the delayed replica must reach the moment just before the DELETE. We knew the approximate physical time of the deletion, so we removed recovery_min_apply_delay and added recovery_target_time in recovery.conf. That way the replica reaches the right moment without delay:
recovery_target_time = '2018-10-12 09:25:00+00'
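Putting the pieces together, the recovery.conf for the recovery run looked roughly like this (same restore_command as in the setup above):

```
standby_mode = 'on'
restore_command = '/usr/bin/envdir /etc/wal-e.d/env /opt/wal-e/bin/wal-e wal-fetch -p 4 "%f" "%p"'
recovery_target_time = '2018-10-12 09:25:00+00'
recovery_target_timeline = 'latest'
# note: no recovery_min_apply_delay here - we want to reach the target quickly
```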
With timestamps, it is better to round the target down a little so as not to overshoot; but the more you round down, the more data you lose. If we replay past the DELETE, everything is deleted again and we have to start over (or even fall back to a cold backup for PITR).
We restarted the delayed Postgres instance, and WAL segments were replayed up to the specified time. You can track progress at this stage with a query:
SELECT
-- current location in WAL
pg_last_xlog_replay_location(),
-- current transaction timestamp (state of the replica)
pg_last_xact_replay_timestamp(),
-- current physical time
now(),
-- the amount of time still to be applied until recovery_target_time has been reached
'2018-10-12 09:25:00+00'::timestamptz - pg_last_xact_replay_timestamp() as delay;
If the timestamp no longer changes, recovery is complete. You can customize what the instance does once the target is reached with recovery_target_action (pause, promote, or shutdown).
The database is now in the state just before that ill-fated query. Now you can, for example, export data. We exported the deleted label data along with all the links to issues and merge requests and transferred them to the production database. If the losses are large, you can simply promote the replica and use it as the primary. But then all the changes made after the point we recovered to will be lost.
It is better to use transaction IDs instead of timestamps. It is useful to log these IDs, for example for DDL statements (such as DROP TABLE), by using log_statement = 'ddl'. If we had a transaction ID, we would set recovery_target_xid and replay everything up to the transaction just before the DELETE.
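With a transaction ID recorded in the log, the target might look like this (the xid value is hypothetical):

```
recovery_target_xid = '12345678'
# stop *before* this transaction rather than after it
recovery_target_inclusive = false
```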
Getting back to normal operation is simple: remove the changes from recovery.conf and restart Postgres. Soon the replica will again be eight hours behind, and we are ready for future troubles.
Recovery Benefits
With a delayed replica, instead of a cold backup, you do not have to spend hours restoring an entire snapshot from the archive. For instance, it takes us five hours to fetch the entire 2 TB base backup, and then you still have to apply up to a full day's WAL to reach the desired state (in the worst case).
A delayed replica is better than a cold backup in two ways:
- You do not need to get the entire base backup from the archive.
- There is a fixed eight-hour window of WAL segments that must be repeated.
Also, since the delayed replica continuously replays the archived WAL, we are constantly verifying that PITR would work, and by monitoring the replica's lag we would quickly notice corruption or other problems with the WAL archive.
In this example, it took us 50 minutes to restore, that is, the speed was 110 GB of WAL data per hour (the archive was still on
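As a back-of-envelope sanity check on those figures (the 50 minutes and 110 GB/h come from the text above):

```python
replay_rate_gb_per_h = 110   # observed WAL replay speed
restore_minutes = 50         # how long the recovery took

# implied amount of WAL that was replayed
wal_gb = replay_rate_gb_per_h * restore_minutes / 60
print(round(wal_gb, 1))  # ~91.7 GB of WAL
```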
Summary: where a delayed replica is useful (and where not)
Use delayed replication as a first aid if you accidentally lose data and notice this disaster within the configured delay.
But keep in mind: replication is not a backup.
Backup and replication serve different purposes. A cold backup comes in handy if you accidentally ran a DELETE or DROP TABLE: we fetch the backup from cold storage and restore the earlier state of a table or an entire database. A DROP TABLE, by contrast, is reproduced almost instantly on all replicas of the working cluster, so ordinary replication will not save you here. Replication itself keeps the database available when individual servers fail and distributes the load.
Even with a delayed replica, we sometimes really do need a cold backup in a safe place, in case of a data center failure, silent corruption, or other events that you don't notice immediately. Replication alone is of no use here.
Source: habr.com