PG backup and recovery (Part 2)

2. Physical backup

2.1 Cold backup

A physical cold backup stops the database and then copies the PGDATA directory at the file level. Its main disadvantage is that the database must stay shut down for the duration of the copy to guarantee data consistency.
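
A minimal sketch of the cold backup described above, assuming pg_ctl and tar are on the PATH and that all data lives inside PGDATA (tablespaces stored elsewhere would need to be copied separately). The function name cold_backup and the paths in the usage line are placeholders for illustration, not part of PostgreSQL.

```shell
# Hypothetical helper: stop the server, archive PGDATA, start the server again.
# Usage: cold_backup <pgdata_dir> <output_tarball>
cold_backup() {
  pgdata="$1"
  out="$2"

  # A clean shutdown is what makes the file-level copy consistent.
  pg_ctl -D "$pgdata" stop -m fast || return 1

  # Archive the directory relative to its parent so it restores as one folder.
  tar -czf "$out" -C "$(dirname "$pgdata")" "$(basename "$pgdata")"
  rc=$?

  # Restart even if tar failed, then report tar's exit status.
  pg_ctl -D "$pgdata" start || return 1
  return "$rc"
}
```

It could be called as `cold_backup /data/pgsql12/data /data/pg_backup/cold.tar.gz`, where the tarball path is an assumed location outside PGDATA.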

2.2 Hot backup

1. Snapshot backup

A snapshot backup uses the snapshot function of the file system or block device to capture the data files while the server is running, so that the copy is consistent as of the moment the snapshot was taken.

2. PITR hot backup (point-in-time recovery)

PITR is built on PostgreSQL's write-ahead log (WAL), which records every change made to the database. WAL is also what makes crash recovery possible: when the database restarts after a crash, it locates the last checkpoint and replays all changes recorded in the WAL after that checkpoint, bringing the database to a consistent state. PITR relies on the same principle and uses WAL to make a backup consistent.

A physical hot backup in PostgreSQL has two parts. The first is an online copy of the data files, called the base backup. Because the files are copied while the database is running, the base backup alone is not guaranteed to be consistent. The second is the WAL backup. The base backup plus the WAL backup together form a complete backup from which a consistent database can be restored.

1) pg_hba.conf file

pg_hba.conf is PostgreSQL's client authentication configuration file. Both ordinary user logins and primary-standby replication connections need a matching record in this file before they can connect.

Each pg_hba.conf record mainly consists of five fields: Type (connection type), Database (database name), User (user name), Address (IP address and mask), and Method (authentication method).

  • Type

Indicates the connection type that the record matches.

"local" matches connections over a Unix-domain socket;

"host" matches TCP/IP connections, with or without SSL;

"hostssl" matches only TCP/IP connections encrypted with SSL;

"hostnossl" matches only TCP/IP connections that do not use SSL.

  • Database

Indicates which databases the record applies to: "all", "sameuser", "samerole", "replication", or the name of a specific database;

"all" does not cover "replication". Replication connections need their own record;

Multiple databases are separated by commas.

  • User

Indicates which database users the record applies to: "all" or a specific user name;

Multiple users are separated by commas;

A separate file containing user names can be referenced by prefixing the file name with @.

  • Address

Indicates the client address the record applies to, given as a host name or as an IP address with a mask;

0.0.0.0/0 means all IPv4 addresses.

  • Method

Indicates the authentication method: "trust", "reject", "md5", "password", "scram-sha-256", "gss", "sspi", "ident", "peer", "pam", "ldap", "radius", or "cert";

"password" sends the password in plain text.
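
To make the five fields concrete, here is a small sketch that appends three example records to a pg_hba.conf file. The helper name append_hba_rules and the 192.168.0.0/24 subnet are assumptions for illustration; db1 and repl are the database and replication user that appear later in this article.

```shell
# Hypothetical helper: append example records to a pg_hba.conf file.
# Field order: TYPE  DATABASE  USER  ADDRESS  METHOD
append_hba_rules() {
  hba_file="$1"
  cat >> "$hba_file" <<'EOF'
# Local connections over the Unix-domain socket, matched by OS user name
local   all           all                        peer
# Password logins to db1 from one subnet
host    db1           all       192.168.0.0/24   md5
# Replication needs its own record because "all" does not cover it
host    replication   repl      192.168.0.0/24   md5
EOF
}
```

For example, `append_hba_rules /data/pgsql12/data/pg_hba.conf` followed by `pg_ctl -D /data/pgsql12/data reload` would apply the records. Records are matched from top to bottom and the first match wins, so an appended record has no effect if an earlier, broader record already matches the connection.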

2) WAL log backup

Method | Characteristics
WAL archiving | Always lags behind by up to one WAL segment, because a segment is archived only once it is complete; implemented with commands such as cp or scp
Streaming replication | Two modes, synchronous and asynchronous; near-real-time backup; primary-standby replication is built on this method

  • WAL archive configuration
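# Relevant parameters; the database must be restarted after changing them
# The postgres user must be granted access to the archive directory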

$ vi postgresql.conf
wal_level = replica             # minimal, replica, or logical
archive_mode = on               # enables archiving; off, on, or always
                                # (change requires restart)
archive_command = 'cp %p /data/pgsql12/archive/%f && echo %f >> /data/pgsql12/archive/archive.list'             # command to use to archive a logfile segment


# Restart the database
$ pg_ctl -D /data/pgsql12/data stop
$ pg_ctl -D /data/pgsql12/data start


# View the archived WAL files
$ pwd
/data/pgsql12/archive
$ ll
total 16388
-rw------- 1 postgres postgres 16777216 Sep  5 16:44 00000001000000000000000D
-rw------- 1 postgres postgres       25 Sep  5 16:44 archive.list
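
The archive_command above overwrites an archived file if one with the same name already exists. A slightly more defensive variant is sketched below. It is not the author's configuration: the helper name archive_wal is made up, and it keeps the archive.list bookkeeping from the example above.

```shell
# Hypothetical helper: copy one WAL segment into the archive directory,
# refusing to overwrite a file that is already there.
# Usage: archive_wal <path_to_segment (%p)> <segment_name (%f)> <archive_dir>
archive_wal() {
  src="$1"
  name="$2"
  archive_dir="$3"

  # An existing file with the same name means something is wrong; fail so
  # PostgreSQL keeps the segment and retries instead of losing data.
  if [ -e "$archive_dir/$name" ]; then
    echo "archive_wal: $name already archived" >&2
    return 1
  fi

  # Copy under a temporary name, then rename, so a partial copy is never
  # visible under the final name.
  cp "$src" "$archive_dir/$name.tmp" || return 1
  mv "$archive_dir/$name.tmp" "$archive_dir/$name" || return 1
  echo "$name" >> "$archive_dir/archive.list"
}
```

One way to use it, under the assumption that the function is saved in a script ending with `archive_wal "$@"`, is to point archive_command at that script and pass `%p %f /data/pgsql12/archive` as its arguments. PostgreSQL treats a non-zero exit status as a failed archive attempt and retries the segment later.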

3) pg_basebackup tool

pg_basebackup connects to the database instance through the replication protocol, so pg_hba.conf must allow a replication connection for the backup user before the tool can be used.

pg_basebackup can only back up the entire database instance; it cannot back up a single database or table. It can connect to either a primary or a standby instance to take the backup.

  • Basic syntax and important parameters:
pg_basebackup [OPTION]...
Option | Meaning
-D, --pgdata | Target directory that the backup is written to
-F, --format | Output format: p (plain, the default) or t (tar)
-r, --max-rate | Maximum transfer rate for the data
-R, --write-recovery-conf | Write the configuration needed for replication (in version 12: standby.signal plus connection settings in postgresql.auto.conf)
-X, --wal-method | How to include the WAL: none, fetch, or stream. stream is recommended because it avoids the backup failing when WAL it needs has already been recycled on the source
-z, --gzip | Compress the output with gzip; used together with -F t
-Z, --compress=0-9 | Compression level; a higher number compresses more and uses more CPU
-c, --checkpoint | Checkpoint mode: fast or spread
-C, --create-slot | Create a replication slot
-S, --slot=SLOTNAME | Name of the replication slot to use
-l, --label=LABEL | Backup label, which makes later maintenance easier for operations staff
-n, --no-clean | Do not clean up after errors
-N, --no-sync | Do not wait for changes to be written safely to disk
-P, --progress | Print backup progress information
-v, --verbose | Print detailed information
  • Examples of commonly used backup commands

Common commands used to build a standby database:

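# -h, -p, -U  host, port, and database user of the source database
# -D          target data directory
# -P, -v      print progress and detailed backup information
# -R          write the replication configuration: in version 12 this creates standby.signal and appends the connection settings (and the slot name, if one is given) to postgresql.auto.conf
# -X stream   once the backup starts, open a streaming connection and receive WAL from the primary, which avoids the risk in fetch mode of WAL being overwritten before it is copied
# -C, -S      create a replication slot with the given name on the source database
# -l          if omitted, a default label is created; using this option to record a timestamp is recommended for easier maintenance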

-bash-4.2$ pg_basebackup -h 192.168.0.175 -p 5432 -D /data/pgsql12/data/ -U repl  -P -v -R -X stream -C -S ${pgstandby_slotname}


-bash-4.2$ cat backup_label.old                             # view the label information
START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2020-09-05 13:06:31 CST
LABEL: pg_basebackup base backup
START TIMELINE: 1
-bash-4.2$

Commonly used commands for base backup and recovery
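
As one illustration of a backup-oriented invocation (as opposed to building a standby), the sketch below takes a gzip-compressed tar-format base backup into a timestamped directory, using only options from the table above. The helper name take_base_backup and the directory layout are assumptions.

```shell
# Hypothetical helper: take a compressed tar-format base backup into a
# directory named after the current time.
# Usage: take_base_backup <host> <port> <user> <backup_root>
take_base_backup() {
  host="$1"; port="$2"; user="$3"; root="$4"
  stamp="$(date +%Y%m%d_%H%M%S)"
  target="$root/$stamp"

  # -F t -z    tar output compressed with gzip
  # -X stream  stream the WAL needed by the backup while it runs
  # -l         label recorded with the backup, here the timestamp
  pg_basebackup -h "$host" -p "$port" -U "$user" \
    -D "$target" -F t -z -X stream -P -v -l "base_$stamp" || return 1

  # Print the directory that now holds the backup.
  echo "$target"
}
```

A call such as `take_base_backup 127.0.0.1 5432 repl /data/pg_backup` would leave base.tar.gz and pg_wal.tar.gz in the new directory. To restore, base.tar.gz is extracted into an empty PGDATA directory and pg_wal.tar.gz into its pg_wal subdirectory.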

4) Point-in-time recovery example

1. First take a base backup with pg_basebackup

$ pg_basebackup -D /data/pg_backup/  -v -P -Urepl -h 127.0.0.1 -p5432 -R
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/F000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_15981"
32566/32566 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/F000138
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed

2. Write some data

db1=# create table t3(id int);
CREATE TABLE
db1=# insert into t3 values (generate_series(1,1000));
INSERT 0 1000

db1=# select now();
              now
-------------------------------
 2020-09-05 17:34:21.595701+08
(1 row)

db1=# select pg_current_wal_lsn(),
                  pg_walfile_name(pg_current_wal_lsn()),
                  pg_walfile_name_offset(pg_current_wal_lsn());
 pg_current_wal_lsn |     pg_walfile_name      |      pg_walfile_name_offset
--------------------+--------------------------+-----------------------------------
 0/100201A8         | 000000010000000000000010 | (000000010000000000000010,131496)
(1 row)

db1=# select pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 0/100201A8
(1 row)




db1=# create table t4(id int);
CREATE TABLE
db1=# insert into t4 values (generate_series(1,1000));
INSERT 0 1000

3. Simulate a database failure and restore from the base backup and the archived WAL

$ pg_ctl -D /data/pgsql12/data/ stop
waiting for server to shut down.... done
server stopped
$ ll /data/pgsql12/archive
total 65544
-rw------- 1 postgres postgres 16777216 Sep  5 16:44 00000001000000000000000D
-rw------- 1 postgres postgres 16777216 Sep  5 17:26 00000001000000000000000E
-rw------- 1 postgres postgres 16777216 Sep  5 17:26 00000001000000000000000F
-rw------- 1 postgres postgres      337 Sep  5 17:26 00000001000000000000000F.00000028.backup
-rw------- 1 postgres postgres 16777216 Sep  5 17:40 000000010000000000000010
-rw------- 1 postgres postgres      141 Sep  5 17:40 archive.list

4. Replace the PGDATA directory with the base backup

$ mv data data_bak
$ mv pg_backup data         # watch the directory permissions in this step: by default the backup directory's permissions are not correct; cp -r can be used instead

5. Modify the recovery parameters in the postgresql.auto.conf file

# Recover to a point in time
restore_command = 'cp /data/pgsql12/archive/%f %p > /data/pgsql12/archive/recovery.log 2>&1 '
recovery_target_time = '2020-09-05 17:34:21'

# Recover to an LSN
restore_command = 'cp /data/pgsql12/archive/%f %p > /data/pgsql12/archive/recovery.log 2>&1 '
recovery_target_lsn='0/100201A8'
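
The two blocks above are alternatives: only one recovery target should be set at a time. In PostgreSQL 12, targeted recovery is requested with a recovery.signal file in the data directory. The base backup in step 1 was taken with -R, which creates standby.signal instead, and that is why the logs in step 6 report "entering standby mode". The sketch below writes the settings and switches the signal file. The helper name configure_pitr is made up, and the restore_command is simplified to omit the log redirection.

```shell
# Hypothetical helper: append PITR settings to postgresql.auto.conf and
# request targeted recovery instead of standby mode.
# Usage: configure_pitr <pgdata> <archive_dir> <target_key> <target_value>
configure_pitr() {
  pgdata="$1"; archive_dir="$2"; target_key="$3"; target_value="$4"

  # Only the two targets used in this article are accepted.
  case "$target_key" in
    recovery_target_time|recovery_target_lsn) ;;
    *) echo "unsupported target: $target_key" >&2; return 1 ;;
  esac

  {
    # %%f and %%p become the literal %f and %p placeholders PostgreSQL expands.
    printf "restore_command = 'cp %s/%%f %%p'\n" "$archive_dir"
    printf "%s = '%s'\n" "$target_key" "$target_value"
  } >> "$pgdata/postgresql.auto.conf"

  # recovery.signal asks for targeted recovery; standby.signal would take
  # precedence if both files were present, so remove it.
  rm -f "$pgdata/standby.signal"
  touch "$pgdata/recovery.signal"
}
```

For the time-based case this would be `configure_pitr /data/pgsql12/data /data/pgsql12/archive recovery_target_time '2020-09-05 17:34:21'`. With the default recovery_target_action of pause, the server stops replay at the target and waits, as the logs in step 6 show; calling pg_wal_replay_resume() then ends recovery.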

6. Start the database for recovery

$ pg_ctl  -D /data/pgsql12/data start -l /data/pgsql12/logs/logfile
waiting for server to start.... done
server started


# Log output for recovery to a point in time
2020-09-05 18:04:14.891 CST [16354] LOG:  starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-09-05 18:04:14.891 CST [16354] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-09-05 18:04:14.891 CST [16354] LOG:  listening on IPv6 address "::", port 5432
2020-09-05 18:04:14.895 CST [16354] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-09-05 18:04:14.912 CST [16355] LOG:  database system was interrupted; last known up at 2020-09-05 17:26:43 CST
2020-09-05 18:04:14.928 CST [16355] LOG:  entering standby mode
2020-09-05 18:04:15.290 CST [16355] LOG:  restored log file "00000001000000000000000F" from archive
2020-09-05 18:04:15.310 CST [16355] LOG:  redo starts at 0/F000028
2020-09-05 18:04:15.312 CST [16355] LOG:  consistent recovery state reached at 0/F000138
2020-09-05 18:04:15.419 CST [16354] LOG:  database system is ready to accept read only connections
2020-09-05 18:04:15.861 CST [16355] LOG:  restored log file "000000010000000000000010" from archive
2020-09-05 18:04:15.875 CST [16355] LOG:  recovery stopping before commit of transaction 496, time 2020-09-05 17:37:55.830747+08
2020-09-05 18:04:15.875 CST [16355] LOG:  recovery has paused
2020-09-05 18:04:15.875 CST [16355] HINT:  Execute pg_wal_replay_resume() to continue.


# Log output for recovery to an LSN

2020-09-05 18:09:51.809 CST [16442] LOG:  starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-09-05 18:09:51.809 CST [16442] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2020-09-05 18:09:51.809 CST [16442] LOG:  listening on IPv6 address "::", port 5432
2020-09-05 18:09:51.814 CST [16442] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-09-05 18:09:51.919 CST [16443] LOG:  database system was interrupted; last known up at 2020-09-05 17:26:43 CST
2020-09-05 18:09:51.989 CST [16443] LOG:  entering standby mode
2020-09-05 18:09:52.086 CST [16443] LOG:  restored log file "00000001000000000000000F" from archive
2020-09-05 18:09:52.119 CST [16443] LOG:  redo starts at 0/F000028
2020-09-05 18:09:52.121 CST [16443] LOG:  consistent recovery state reached at 0/F000138
2020-09-05 18:09:52.121 CST [16442] LOG:  database system is ready to accept read only connections
2020-09-05 18:09:52.164 CST [16443] LOG:  restored log file "000000010000000000000010" from archive
2020-09-05 18:09:52.179 CST [16443] LOG:  recovery stopping after WAL location (LSN) "0/100201A8"
2020-09-05 18:09:52.179 CST [16443] LOG:  recovery has paused
2020-09-05 18:09:52.179 CST [16443] HINT:  Execute pg_wal_replay_resume() to continue.


7. Data verification

postgres=# \c db1
You are now connected to database "db1" as user "postgres".
db1=# \d
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | t1   | table | postgres
 public | t2   | table | postgres
 public | t3   | table | postgres
(3 rows)

db1=# select count(*) from t3;
 count
-------
  1000
(1 row)

db1=# select count(*) from t4;
ERROR:  relation "t4" does not exist
LINE 1: select count(*) from t4;

Open questions:

1. When recovering to an LSN, how do we find the LSN to recover to?

2. The following error appears when this instance is attached directly to the primary in a primary-standby setup. How should it be handled?


2020-09-05 18:05:48.345 CST [16387] ERROR:  replication slot "pgstandby1" does not exist
2020-09-05 18:05:53.351 CST [16388] ERROR:  replication slot "pgstandby1" does not exist
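
For the first question, one possible approach (an assumption, not something tested in this article) is to inspect the archived WAL with pg_waldump, which ships with PostgreSQL, and look at the commit records. The helper name list_commits is made up for illustration.

```shell
# Hypothetical helper: list the commit records found in one WAL segment file.
# Usage: list_commits <wal_segment_file>
list_commits() {
  # -r Transaction limits the output to records from the transaction
  # resource manager; grep then keeps only the commits.
  pg_waldump -r Transaction "$1" | grep 'desc: COMMIT'
}
```

Running `list_commits /data/pgsql12/archive/000000010000000000000010` should print one line per committed transaction, each carrying the transaction ID, the record's LSN, and the commit timestamp. The LSN of the last commit to keep is a candidate value for recovery_target_lsn.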

Origin blog.csdn.net/weixin_37692493/article/details/108500420