Postgres backup summary

Backups are essential for databases. There are various ways to back up your data, and this article summarizes a variety of tools.

pg_dump/pg_restore

pg_dump and pg_dumpall extract a database as a script file or another archive format. This kind of backup is classified as a logical backup, and it can be much smaller than a physical backup. This is partly because indexes are not stored in a SQL dump: only the CREATE INDEX commands are stored, and the indexes must be rebuilt when restoring from a logical backup.

One advantage of the SQL dump method is that the output file can be reloaded into a newer version of Postgres, so dump and restore is often chosen for version upgrades and migrations. Another advantage is that these tools can back up specific database objects and ignore others: for example, when only a certain subset of tables needs to be backed up in a test environment, or when a single table needs a quick backup before a risky operation while the database is running.

pg_dump creates consistent backups even while the database is in use, and it does not block other users from reading or writing. The dump is also internally consistent: it represents a snapshot of the database at the moment the dump started. Although dumps don't block other operations, they can be long-running (hours or days, depending on hardware and database size), and because of the way Postgres implements concurrency (multiversion concurrency control), a long-running dump can degrade performance until it completes.

To dump a single database table, you can run a command like this:

pg_dump -t my_table > table.sql

To restore the above backup, run the following command:

psql -f table.sql

Of course, pg_dump supports many more options than this; see the Postgres manual: http://postgres.cn/docs/11/app-pgdump.html
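
The section heading also mentions pg_restore: it is used with pg_dump's non-text archive formats rather than with plain SQL scripts. As a minimal sketch (the database name mydb is only a placeholder), a custom-format dump and its restore might look like this:

pg_dump -Fc mydb > mydb.dump        # -Fc selects the custom archive format
pg_restore -d mydb mydb.dump        # restore the archive into an existing database

One advantage of the custom format is that pg_restore can pick out individual objects from the archive at restore time.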

pg_dump sequentially scans the entire dataset as it creates the file. Reading the whole database therefore acts as a basic corruption check on all table data (though not on indexes), and pg_dump will report an error if it hits corrupt data. Internally, pg_dump executes SELECT statements, so if you're having trouble running pg_dump, try running similar queries in psql to confirm that you can actually read the data from the database you're using.
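
For example, a quick sanity check in psql (mydb and my_table are placeholders) could be:

psql -d mydb -c "SELECT count(*) FROM my_table;"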

Server and file system backup

Linux administrators are often accustomed to using rsync or other file-oriented tools to back up the entire machine running the database. However, Postgres cannot be safely backed up with file-oriented tools while it is running, and there is no simple way to pause writes. To get the database into a state where the files can be copied consistently with rsync, you either have to shut the database down, or do the extra work of setting up an archive of the changes (WAL archiving, covered in the next section).
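
As a rough sketch of the first approach (the paths and the remote destination backup-host are assumptions; the service name matches the examples later in this article), the database is stopped before the files are copied:

$ sudo systemctl stop postgresql-15.service
$ sudo rsync -a /var/lib/pgsql/15/data/ backup-host:/backups/pgdata/   # backup-host is a placeholder
$ sudo systemctl start postgresql-15.service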

Physical Backup and WAL Archiving

In addition to basic dump files, more sophisticated Postgres backup methods rely on saving the database's write-ahead log (WAL). The WAL records changes to all database blocks, saving them into segments with a default size of 16MB, and a server's continuous sequence of WAL files is called its WAL stream. Before the database files can be copied safely, you need to start archiving the WAL stream and then take a "base backup" (for example with pg_basebackup). Archived WAL segments can then be replayed on top of the base backup, which is what makes point-in-time recovery possible.
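
Archiving is enabled through a few settings in postgresql.conf; a minimal sketch (the archive directory is just an example, and a restart is required after changing wal_level or archive_mode) looks like this:

wal_level = replica
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/15/archive/%f'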

Create a base backup using the pg_basebackup utility:

$ sudo -u postgres pg_basebackup -h localhost -p 5432 -U postgres \
	-D /var/lib/pgsql/15/backups -Ft -z -Xs -P -c fast
  • This command should be run as the postgres user.
  • The -D parameter specifies where to save the backup.
  • The -Ft parameter indicates that the backup is stored in tar format.
  • The -Xs parameter indicates that WAL files will be streamed into the backup while it runs. This is important because there can be a lot of WAL activity during the backup, and you probably don't want to have to keep those files on the primary during that time. This is the default behavior, but worth pointing out.
  • The -z parameter indicates that the tar files will be compressed; compression is only available with the tar format, and the suffix .gz is automatically added to all tar file names.
  • The -P parameter prints the progress of the backup in real time.
  • The -c fast parameter means that the checkpoint is taken immediately. If this parameter is not specified, the backup will not start until Postgres triggers a checkpoint on its own.

As soon as you enter the command, the backup starts immediately. Depending on the size of your cluster, it may take some time to complete. However, the backup operation does not interrupt any other connections to the database.

Steps to restore from a backup using pg_basebackup

  • Make sure the database is stopped.
$ sudo systemctl stop postgresql-15.service
$ sudo systemctl status postgresql-15.service
  • Delete the contents of the Postgres data directory to simulate disaster.
$ sudo rm -rf /var/lib/pgsql/15/data/*
  • Extract the base.tar.gz into the data directory.
$ sudo -u postgres ls -l /var/lib/pgsql/15/backups
total 29016
-rw-------. 1 postgres postgres   182000 Nov 23 21:09 backup_manifest
-rw-------. 1 postgres postgres 29503703 Nov 23 21:09 base.tar.gz
-rw-------. 1 postgres postgres	17730 Nov 23 21:09 pg_wal.tar.gz


$ sudo -u postgres tar -xvf /var/lib/pgsql/15/backups/base.tar.gz \
     -C /var/lib/pgsql/15/data
  • Extract pg_wal.tar.gz to a new directory outside the data directory. In this case, a directory named pg_wal is created in the backup directory.

$ sudo -u postgres mkdir -p /var/lib/pgsql/15/backups/pg_wal

$ sudo -u postgres tar -xvf /var/lib/pgsql/15/backups/pg_wal.tar.gz \
      -C /var/lib/pgsql/15/backups/pg_wal/
  • Create a recovery.signal file.
$ sudo -u postgres touch /var/lib/pgsql/15/data/recovery.signal
  • Set restore_command in postgresql.conf to copy the WAL files that were streamed during the backup.
$ echo "restore_command = 'cp /var/lib/pgsql/15/backups/pg_wal/%f %p'" | \
      sudo tee -a /var/lib/pgsql/15/data/postgresql.conf
  • Start the database.
$ sudo systemctl start postgresql-15.service
$ sudo systemctl status postgresql-15.service
  • The database is now up and running based on the information contained in the previous base backup.
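
To double-check that recovery has finished and the server is open for normal connections, a query like the following can be used; pg_is_in_recovery() should return f once recovery is complete:

$ sudo -u postgres psql -c "SELECT pg_is_in_recovery();"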

pgBackRest

This is a very powerful backup tool. There are many large Postgres environments that depend on pgBackRest.

pgBackRest can perform three types of backups:

  • Full backup - copies the entire contents of the DB cluster to the backup.
  • Differential backup - copies only the DB cluster files that have changed since the last full backup.
  • Incremental backup - copies only the DB cluster files that have changed since the last full, differential, or incremental backup.
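
On the command line the backup type is chosen with the --type option; for example (the stanza name demo matches the setup example below and is otherwise an assumption):

$ sudo -u postgres pgbackrest --stanza=demo --type=full backup
$ sudo -u postgres pgbackrest --stanza=demo --type=diff backup
$ sudo -u postgres pgbackrest --stanza=demo --type=incr backup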

pgBackRest has some special features, such as:

  • Allows going back to a point in time - PITR (Point in Time Recovery)
  • Incremental (delta) restores - a restore that reuses the database files that already exist on disk and updates them based on the WAL segments. This makes restores faster, especially with a large database, since the entire database does not have to be restored.
  • Support for multiple backup repositories - for example, a local repository plus a remote one for redundancy.

Regarding archiving, users can set the archive_command parameter so that pgBackRest copies WAL files to an external archive. Depending on the required data retention policy, backups can be expired after a set period or retained indefinitely.
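
A minimal sketch of such a setup (the stanza name, paths, and retention count are assumptions): archive_command in postgresql.conf hands WAL segments to pgBackRest, and retention is set in pgbackrest.conf:

# postgresql.conf
archive_mode = on
archive_command = 'pgbackrest --stanza=demo archive-push %p'

# /etc/pgbackrest/pgbackrest.conf
[demo]
pg1-path=/var/lib/pgsql/15/data

[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2

With repo1-retention-full=2, the two most recent full backups (and the WAL needed to use them) are kept, and older backups expire automatically.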

To set up pgBackRest after installation, create a stanza by running the following command:

$ sudo -u postgres pgbackrest --stanza=demo --log-level-console=info stanza-create
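
After the stanza exists and archiving is configured, the setup can be validated with the check command, and existing backups can be listed with info (stanza name demo as above):

$ sudo -u postgres pgbackrest --stanza=demo --log-level-console=info check
$ sudo -u postgres pgbackrest --stanza=demo info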

To perform a delta restore to a specific point in time:


$ sudo systemctl stop postgresql-15.service
$ sudo -u postgres pgbackrest \
--stanza=db --delta \
--type=time "--target=2022-09-01 00:00:05.010329+00" \
--target-action=promote restore

 After the restore is complete, restart the database and verify that the user tables have been restored.


$ sudo systemctl start postgresql-15.service
$ sudo -u postgres psql -c "select * from users limit 1"

Backup timing

pgBackRest has quite a few options for configuring policies that meet specific needs. Your backup strategy will depend on several factors, including recovery point objectives and available storage, and the right solution varies with these requirements: the goal is to strike a balance between recovery time, storage used, and I/O overhead on the source database.

A common recommendation is to combine the backup and WAL archiving features of pgBackRest: take a full base backup weekly, archive the WAL files continuously, and consider other forms of incremental backup in between, perhaps even pg_dump.
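
Purely as an illustration (the schedule and stanza name are assumptions, not a fixed recommendation), such a policy could be driven from the postgres user's crontab:

# full backup every Sunday at 01:00, incremental backups on the other days
0 1 * * 0   pgbackrest --stanza=demo --type=full backup
0 1 * * 1-6 pgbackrest --stanza=demo --type=incr backup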

Conclusion

                                      Logical backup   Physical backup
                                      pg_dump          pg_basebackup   pgBackRest
Data backup                           √                √               √
Back up a specified table or schema   √                ×               ×
DDL/schema                            √                √               √
Migrate between different versions    √                ×               ×
Restore to a point in time            ×                ×               √
Incremental/differential backup       ×                ×               √
