PGPool-II + PostgreSQL streaming replication: high availability with automatic failover

 

PostgreSQL streaming replication supports a hot standby switchover, but promotion normally has to be triggered manually by creating a trigger file. In HA scenarios where the standby must take over automatically when the primary goes down, pgpool-II can detect the failure and perform the switchover itself. This article builds that automatic primary/standby switchover with pgpool-II on top of streaming replication. Before configuring pgpool, PostgreSQL must already be installed on both machines and streaming replication must already be working; for the replication setup, see the earlier article: http://www.jianshu.com/p/12bc931ebba3 .

Figure: pgpool dual-node cluster architecture

 

As the figure shows, the cluster is built on two pgpool nodes: the PostgreSQL primary and standby implement hot standby through streaming replication, while pgpool1 and pgpool2 sit in front of them as middleware, providing read/write splitting, load balancing, and HA failover for the pg nodes. The two pgpool nodes share a virtual IP, which applications use as the single access address. The two nodes monitor each other through the watchdog: if pgpool1 goes down, pgpool2 automatically takes over the virtual IP and continues to serve clients without interruption.

1. Host planning

Hostname | IP | Role | Port
:----: | :----: | :----: | :----:
master | 192.168.0.108 | PG master | 5432
master | 192.168.0.108 | pgpool1 | 9999
slave | 192.168.0.109 | PG slave | 5432
slave | 192.168.0.109 | pgpool2 | 9999
vip | 192.168.0.150 | virtual IP | 9999
With the host plan settled, register the host names on both the master and slave machines:

[root@localhost ~]# vi /etc/hosts
# add the following entries:
192.168.0.108 master
192.168.0.109 slave
192.168.0.150 vip
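Before going further it is worth confirming that all three names actually resolve; a small helper like the following can be used (a sketch; `getent` is assumed available, as it is on most Linux systems):

```shell
# check_resolve: succeed only if every hostname given resolves.
# Assumes the name-to-IP mappings above are already in place.
check_resolve() {
  for h in "$@"; do
    getent hosts "$h" >/dev/null || { echo "unresolved: $h"; return 1; }
  done
  echo "all resolved"
}

# On the cluster nodes: check_resolve master slave vip
```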

2. Configuring SSH keys

Generate SSH keys on both the master and slave machines:

[root@localhost ~]# su - postgres
[postgres@localhost ~]$ ssh-keygen -t rsa
[postgres@localhost ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[postgres@localhost ~]$ chmod 600 ~/.ssh/authorized_keys

Copy the master's public key to the slave, and the slave's public key to the master:

# on master
[postgres@localhost ~]$ scp ~/.ssh/authorized_keys postgres@slave:~/.ssh/
# on slave
[postgres@localhost ~]$ scp ~/.ssh/authorized_keys postgres@master:~/.ssh/

Verify that the SSH configuration works:

# on master
[postgres@master ~]$ ssh postgres@slave
Last login: Tue Dec 20 21:22:50 2016 from master
# on slave
[postgres@slave ~]$ ssh postgres@master
Last login: Tue Dec 20 21:22:50 2016 from slave

Password-less login in both directions shows the SSH trust relationship is configured correctly.
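The chmod 600 step above matters: sshd ignores an authorized_keys file that is writable by group or others, and the failure mode is a silent fall-back to password prompts. A quick permission check, as a sketch using GNU coreutils stat:

```shell
# perm600: report whether a file's permission bits are exactly 600.
perm600() {
  [ "$(stat -c '%a' "$1")" = "600" ] && echo ok || echo bad
}

# perm600 ~/.ssh/authorized_keys   # expect: ok
```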

3. Installing pgpool

A Chinese-language configuration guide is available at http://pgpool.projects.pgfoundry.org/pgpool-II/doc/pgpool-zh_cn.html

# download pgpool
[root@master opt]#   wget http://www.pgpool.net/mediawiki/images/pgpool-II-3.6.0.tar.gz
# extract
[root@master opt]#   tar -zxvf pgpool-II-3.6.0.tar.gz
# give the files to postgres (pgpool does not have to live under the postgres
# account; it is just convenient because the SSH setup above was done as postgres)
[root@master opt]#   chown -R postgres.postgres /opt/pgpool-II-3.6.0
[root@master ~]# su - postgres
[postgres@master opt]$  cd pgpool-II-3.6.0
[postgres@master pgpool-II-3.6.0]$  ./configure --prefix=/opt/pgpool --with-pgsql=/home/postgres
[postgres@master pgpool-II-3.6.0]$  make
[postgres@master pgpool-II-3.6.0]$  make install

Next, install the optional pgpool extensions (pgpool_regclass, pgpool_recovery). They are not mandatory, but they make the system more stable, so installing them is recommended:

[postgres@master pgpool-II-3.6.0]$  cd src/sql
[postgres@master sql]$  make
[postgres@master sql]$  make install
[postgres@master sql]$  psql -f insert_lock.sql

Installation is complete.

4. Configuring pgpool

4.1 Configuring environment variables

pgpool was installed under the postgres account, so add the environment variables to that account. Do this on both the master and slave nodes:

[postgres@master ~]$ cd /home/postgres
[postgres@master ~]$ vim .bashrc
# add the following
PGPOOLHOME=/opt/pgpool
export PGPOOLHOME
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$PGHOME/bin:$PGPOOLHOME/bin
export PATH
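After re-reading .bashrc, a quick way to confirm the pgpool bin directory actually made it onto PATH (a small sketch; /opt/pgpool/bin matches the --prefix used when building pgpool):

```shell
# path_has: print yes/no depending on whether a directory is on $PATH.
path_has() {
  case ":$PATH:" in
    *":$1:"*) echo yes ;;
    *)        echo no  ;;
  esac
}

# path_has /opt/pgpool/bin   # expect: yes after sourcing .bashrc
```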

4.2 Configuring pool_hba.conf

pool_hba.conf handles client authentication and must be consistent with PostgreSQL's pg_hba.conf: both trust, or both md5. Here md5 authentication is used, set up as follows:

[postgres@master ~]$ cd /opt/pgpool/etc
[postgres@etc~]$ cp pool_hba.conf.sample pool_hba.conf
[postgres@etc~]$ vim pool_hba.conf
# edit as follows
# "local" is for Unix domain socket connections only
local   all         all                            md5
# IPv4 local connections:
host    all         all         0.0.0.0/0          md5
host    all         all         0/0                md5

4.3 Configuring pcp.conf

pcp.conf holds the credentials for pgpool's own management interface; some pgpool administration tools will ask for this password. Configure it as follows:

[postgres@master ~]$ cd /opt/pgpool/etc
[postgres@etc~]$ cp pcp.conf.sample pcp.conf
# use pg_md5 to generate the md5 hash of the management password
[postgres@etc~]$ pg_md5 nariadmin
6b07583ba8af8e03043a1163147faf6a
# pcp.conf stores pgpool's own management username and password, used to administer the cluster
[postgres@etc~]$ vim pcp.conf
# edit as follows
postgres:6b07583ba8af8e03043a1163147faf6a
# save and exit
# register the PostgreSQL database user and password with pgpool
[postgres@etc~]$ pg_md5 -p -m -u postgres pool_passwd
# the database login user is postgres; enter its login password here (it must be correct)
# after the password is entered, a pool_passwd file is generated under pgpool/etc
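For reference, a pcp.conf entry is simply `username:md5(password)`. The same line can be built with plain coreutils, which is handy on a machine where pgpool is not installed yet (a sketch, not a replacement for pg_md5):

```shell
# pcp_entry: emit a pcp.conf line, username:md5-of-password,
# mirroring what `pg_md5 <password>` prints for the hash part.
pcp_entry() {
  printf '%s:%s\n' "$1" "$(printf '%s' "$2" | md5sum | cut -d' ' -f1)"
}

# pcp_entry postgres nariadmin >> /opt/pgpool/etc/pcp.conf
```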

4.4 Granting system command privileges

Set the setuid bit on ifconfig and arping so that the ordinary postgres user can run them; failover_stream.sh and the watchdog need this:

[root@master ~]# chmod u+s /sbin/ifconfig 
[root@master ~]# chmod u+s /usr/sbin/arping 
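To confirm the setuid bit actually took effect, the shell's -u file test can be used (a small sketch):

```shell
# check_setuid: print "setuid" or "plain" depending on whether the
# setuid bit is set on the given file.
check_setuid() {
  [ -u "$1" ] && echo setuid || echo plain
}

# check_setuid /sbin/ifconfig   # expect: setuid after the chmod above
```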

4.5 Configuring pgpool.conf

First check the NIC name with ifconfig; it is needed later when configuring delegate_IP:

[postgres@etc~]$ ifconfig

Figure: NIC name

pgpool.conf configuration on the master:

[postgres@master ~]$ cd /opt/pgpool/etc
[postgres@etc~]$ cp pgpool.conf.sample pgpool.conf
[postgres@etc~]$ vim pgpool.conf

Edit as follows:

# CONNECTIONS
listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898

# - Backend Connection Settings -

backend_hostname0 = 'master'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/postgres/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'slave'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/home/postgres/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd'

# FILE LOCATIONS
pid_file_name = '/opt/pgpool/pgpool.pid'

replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'

sr_check_period = 5
sr_check_user = 'repuser'
sr_check_password = 'repuser'
sr_check_database = 'postgres'

#------------------------------------------------------------------------------
# HEALTH CHECK
#------------------------------------------------------------------------------

health_check_period = 10 # Health check period
                                   # Disabled (0) by default
health_check_timeout = 20
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'postgres'
                                   # Health check user
health_check_password = 'nariadmin' # database password
                                   # Password for health check user
health_check_database = 'postgres'
# Must be set: otherwise, when the primary goes down, pgpool does not notice and
# cannot switch over in time; streaming replication on the standby keeps trying
# to connect and reports connection failures. Only on the next login through
# pgpool would the failed connection reveal the outage and trigger the switch.


# command-line configuration for primary/standby switchover
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/opt/pgpool/failover_stream.sh %H '

#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -
use_watchdog = on
# - Watchdog communication Settings -

wd_hostname = 'master'
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
# - Virtual IP control Setting -

delegate_IP = 'vip'
                                    # delegate IP address
                                    # If this is empty, virtual IP never bring up.
                                    # (change requires restart)
if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # (change requires restart)
if_up_cmd = 'ifconfig eth1:0 inet $_IP_$ netmask 255.255.255.0'
                                    # startup delegate IP command
                                    # (change requires restart)
                                    # change eth1 to match the NIC on your machine
if_down_cmd = 'ifconfig eth1:0 down'
                                    # shutdown delegate IP command
                                    # (change requires restart)
                                    # change eth1 to match the NIC on your machine
# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)
heartbeat_destination0 = 'slave'
                                    # Host name or IP address of destination 0
                                    # for sending heartbeat signal.
                                    # (change requires restart)
heartbeat_destination_port0 = 9694
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
heartbeat_device0 = 'eth1'
                                    # Name of NIC device (such like 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)
                                    # change eth1 to match the NIC on your machine
# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'slave' # the peer node
                                    # Host name or IP address to connect to for other pgpool 0
                                    # (change requires restart)
other_pgpool_port0 = 9999
                                    # Port number for other pgpool 0
                                    # (change requires restart)
other_wd_port0 = 9000
                                    # Port number for other watchdog 0
                                    # (change requires restart)

pgpool.conf configuration on the slave:

# CONNECTIONS
listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898

# - Backend Connection Settings -

backend_hostname0 = 'master'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/postgres/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'slave'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/home/postgres/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd'

# FILE LOCATIONS
pid_file_name = '/opt/pgpool/pgpool.pid'

replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'

sr_check_period = 5
sr_check_user = 'repuser'
sr_check_password = 'repuser'
sr_check_database = 'postgres'

#------------------------------------------------------------------------------
# HEALTH CHECK
#------------------------------------------------------------------------------

health_check_period = 10 # Health check period
                                   # Disabled (0) by default
health_check_timeout = 20
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'postgres'
                                   # Health check user
health_check_password = 'nariadmin' # database password
                                   # Password for health check user
health_check_database = 'postgres'
# Must be set: otherwise, when the primary goes down, pgpool does not notice and
# cannot switch over in time; streaming replication on the standby keeps trying
# to connect and reports connection failures. Only on the next login through
# pgpool would the failed connection reveal the outage and trigger the switch.


# command-line configuration for primary/standby switchover
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/opt/pgpool/failover_stream.sh %H '

#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -
use_watchdog = on
# - Watchdog communication Settings -

wd_hostname = 'slave'  # this node
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
# - Virtual IP control Setting -

delegate_IP = 'vip'
                                    # delegate IP address
                                    # If this is empty, virtual IP never bring up.
                                    # (change requires restart)
if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # (change requires restart)
if_up_cmd = 'ifconfig eth1:0 inet $_IP_$ netmask 255.255.255.0'
                                    # startup delegate IP command
                                    # (change requires restart)
                                    # change eth1 to match the NIC on your machine
if_down_cmd = 'ifconfig eth1:0 down'
                                    # shutdown delegate IP command
                                    # (change requires restart)
                                    # change eth1 to match the NIC on your machine
# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)
heartbeat_destination0 = 'master' # the peer node
                                    # Host name or IP address of destination 0
                                    # for sending heartbeat signal.
                                    # (change requires restart)
heartbeat_destination_port0 = 9694
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
heartbeat_device0 = 'eth1'
                                    # Name of NIC device (such like 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)
                                    # change eth1 to match the NIC on your machine
# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'master' # the peer node
                                    # Host name or IP address to connect to for other pgpool 0
                                    # (change requires restart)
other_pgpool_port0 = 9999
                                    # Port number for other pgpool 0
                                    # (change requires restart)
other_wd_port0 = 9000
                                    # Port number for other watchdog 0
                                    # (change requires restart)
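The slave's pgpool.conf is identical to the master's except for three watchdog peer settings: wd_hostname, heartbeat_destination0, and other_pgpool_hostname0 each point at the opposite node. A quick way to convince yourself that only those lines differ (a sketch using inline fragments; on the real machines you would grep the two installed pgpool.conf files instead):

```shell
# Extract the three node-specific settings from each side and diff them;
# every other line in the two files is expected to be byte-identical.
cat > /tmp/master.frag <<'EOF'
wd_hostname = 'master'
heartbeat_destination0 = 'slave'
other_pgpool_hostname0 = 'slave'
EOF
cat > /tmp/slave.frag <<'EOF'
wd_hostname = 'slave'
heartbeat_destination0 = 'master'
other_pgpool_hostname0 = 'master'
EOF
diff /tmp/master.frag /tmp/slave.frag | grep -c '^[<>]'   # prints 6: three lines differ on each side
```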

Both configuration files set failover_command = '/opt/pgpool/failover_stream.sh %H', so the failover_stream.sh script must be created in the /opt/pgpool directory:

[postgres@master ~]$ cd /opt/pgpool
[postgres@pgpool~]$ touch failover_stream.sh
[postgres@pgpool~]$ vim failover_stream.sh

Note that the script uses pg_ctl promote rather than a trigger file; the trigger-file approach runs into problems when the roles switch back and forth. Edit the content as follows:

#!/bin/sh
# Failover command for streaming replication.
# Arguments: $1: new master hostname.

new_master=$1
trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA"

# Promote the standby database.
/usr/bin/ssh -T $new_master $trigger_command

exit 0

If the script was created by another user, give postgres ownership and execute permission, for example:

[root@opt ~]$ chown -R postgres.postgres /opt/pgpool
[root@opt ~]$ chmod 777  /opt/pgpool/failover_stream.sh

5. PGPool cluster management

Before starting, create the log and pid directories on both the master and slave nodes:

[root@master ~]# mkdir /var/log/pgpool
[root@master ~]# chown -R postgres.postgres /var/log/pgpool
[root@master ~]# mkdir /var/run/pgpool
[root@master ~]# chown -R postgres.postgres /var/run/pgpool

5.1 Starting the cluster

Start the primary and standby PostgreSQL instances:

# on master
[postgres@master ~]$ pg_ctl start -D $PGDATA
# on slave
[postgres@slave ~]$ pg_ctl start -D $PGDATA

Then start pgpool on each node:

# on master
# -D discards the saved status file, so the up/down state of the pg nodes is re-detected
[postgres@master ~]$ pgpool -n -d -D > /var/log/pgpool/pgpool.log 2>&1 &
[1] 3557

# on slave
[postgres@slave ~]$ pgpool -n -d -D > /var/log/pgpool/pgpool.log 2>&1 &
[1] 3557

To terminate pgpool quickly, note the command:

[postgres@ ~]$ pgpool -m fast stop

After pgpool starts, check the status of the cluster nodes:

[postgres@master ~]$ psql -h vip -p 9999
psql (9.6.1)
# enter the password when prompted
Type "help" for help.

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | up     | 0.500000  | primary | 0          | false             | 0
 1       | slave    | 5432 | up     | 0.500000  | standby | 0          | true              | 0
(2 rows)

# On the slave node the command is the same (psql -h vip -p 9999); the two pgpools share the virtual IP for high availability.

Both nodes are up, with the primary and standby in their normal states.
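The show pool_nodes output is also easy to consume from scripts, for example in a monitoring cron job. A sketch that extracts hostname:status pairs with awk (in practice you would feed it the output of `psql -h vip -p 9999 -c 'show pool_nodes'`):

```shell
# node_status: read `show pool_nodes` output on stdin and print
# one hostname:status pair per backend row.
node_status() {
  awk -F'|' 'NR > 2 && NF > 3 {
    gsub(/ /, "", $2); gsub(/ /, "", $4)   # strip padding from hostname, status
    print $2 ":" $4
  }'
}

# psql -h vip -p 9999 -c 'show pool_nodes' | node_status
```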

5.2 pgpool HA tests

5.2.1 Simulating a pgpool outage on the master

# stop the pgpool service on the master node
[postgres@master ~]$ pgpool -m fast stop
# after a short wait, access the cluster
[postgres@master ~]$ psql -h vip -p 9999
psql (9.6.1)
# enter the password when prompted
Type "help" for help.

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | up     | 0.500000  | primary | 0          | false             | 0
 1       | slave    | 5432 | up     | 0.500000  | standby | 0          | true              | 0
(2 rows)
# Access succeeded: after pgpool on the master went down, pgpool on the slave took over
# the VIP and the cluster service, with no interruption to applications.
# After restarting pgpool on the master and then stopping it on the slave, the result is the same.

5.2.2 Simulating a PostgreSQL primary outage on the master

[postgres@master ~]$ pg_ctl stop
# log output on master
2017-07-24 18:52:37.751 PDT [28154] STATEMENT:  SELECT pg_current_xlog_location()
2017-07-24 18:52:37.760 PDT [2553] LOG:  received fast shutdown request
2017-07-24 18:52:37.760 PDT [2553] LOG:  aborting any active transactions
2017-07-24 18:52:37.762 PDT [28156] FATAL:  canceling authentication due to timeout
2017-07-24 18:52:37.763 PDT [2555] LOG:  shutting down
2017-07-24 18:52:37.768 PDT [28158] FATAL:  the database system is shutting down
2017-07-24 18:52:37.775 PDT [28159] FATAL:  the database system is shutting down
2017-07-24 18:52:39.653 PDT [2553] LOG:  database system is shut down

# log output on slave
2017-07-24 18:52:41.455 PDT [2614] LOG:  invalid record length at 0/2A000098: wanted 24, got 0
2017-07-24 18:52:47.333 PDT [2614] LOG:  received promote request
2017-07-24 18:52:47.333 PDT [2614] LOG:  redo done at 0/2A000028
2017-07-24 18:52:47.333 PDT [2614] LOG:  last completed transaction was at log time 2017-07-24 18:17:00.946759-07
2017-07-24 18:52:47.336 PDT [2614] LOG:  selected new timeline ID: 10
2017-07-24 18:52:47.841 PDT [2614] LOG:  archive recovery complete
2017-07-24 18:52:47.851 PDT [2613] LOG:  database system is ready to accept connections

# The log clearly shows the primary went down and the slave was promoted.
# after a short wait, access the cluster
[postgres@master ~]$ psql -h vip -p 9999
Password: 
psql (10beta1)
Type "help" for help.

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | down   | 0.500000  | standby | 0          | false             | 0
 1       | slave    | 5432 | up     | 0.500000  | primary | 0          | true              | 0
(2 rows)
# The slave has been promoted to primary, and the master node's status is down.

5.2.3 Repairing the master node and rejoining the cluster

After the master went down, the slave was promoted to primary, so the repaired master should rejoin the cluster as a standby.
Repair and start the master:

[postgres@master ~]$ cd $PGDATA
[postgres@master data]$ mv recovery.done recovery.conf # be sure to rename .done to .conf
[postgres@master data]$ pg_ctl start

Reattach the node to the pgpool cluster:

# master's node_id is 0, hence -n 0
[postgres@master data]$ pcp_attach_node -d -U postgres -h vip -p 9898 -n 0
# enter the pcp management password when prompted
# check the current status
postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | up     | 0.500000  | standby | 0          | false             | 0
 1       | slave    | 5432 | up     | 0.500000  | primary | 0          | true              | 0
(2 rows)

5.2.4 Powering off the host directly

The slave node is currently the primary. Power the slave server off directly. The switchover works as expected: the slave is marked down and the master is promoted back to primary:

[postgres@master ~]$ psql -h vip -p 9999
Password: 
psql (10beta1)
Type "help" for help.

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | slave    | 5432 | down   | 0.500000  | standby | 0          | false             | 0
(2 rows)

5.3 Resynchronizing timelines

After a switchover, a repaired node's data may have diverged from the new primary. When that node rejoins the cluster as a streaming standby, a timeline synchronization error is likely:

# after the slave machine restarts, this appears because master and slave data are out of sync
[postgres@slave data]$ mv recovery.done recovery.conf
[postgres@slave data]$ pg_ctl start
waiting for server to start....2017-07-24 19:31:44.563 PDT [2663] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2017-07-24 19:31:44.563 PDT [2663] LOG:  listening on IPv6 address "::", port 5432
2017-07-24 19:31:44.565 PDT [2663] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2017-07-24 19:31:44.584 PDT [2664] LOG:  database system was shut down at 2017-07-24 19:31:30 PDT
2017-07-24 19:31:44.618 PDT [2664] LOG:  entering standby mode
2017-07-24 19:31:44.772 PDT [2664] LOG:  consistent recovery state reached at 0/2D000098
2017-07-24 19:31:44.772 PDT [2663] LOG:  database system is ready to accept read only connections
2017-07-24 19:31:44.772 PDT [2664] LOG:  invalid record length at 0/2D000098: wanted 24, got 0
2017-07-24 19:31:44.798 PDT [2668] LOG:  fetching timeline history file for timeline 11 from primary server
2017-07-24 19:31:44.826 PDT [2668] FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/2D000000 on timeline 10 is not in this server's history
    DETAIL:  This server's history forked from timeline 10 at 0/2B0001B0.
2017-07-24 19:31:44.826 PDT [2664] LOG:  new timeline 11 forked off current database system timeline 10 before current recovery point 0/2D000098
 done

In this situation, use the pg_rewind tool to synchronize the timeline, in the following five steps.

5.3.1 Stop the PostgreSQL service on the node to be synchronized

[postgres@slave ~]$ pg_ctl stop 

5.3.2 Synchronize the timeline from the current primary

[postgres@slave data]$ pg_rewind  --target-pgdata=/home/postgres/data --source-server='host=master port=5432 user=postgres dbname=postgres password=nariadmin'
servers diverged at WAL location 0/2B0001B0 on timeline 10
rewinding from last common checkpoint at 0/2B000108 on timeline 10
Done!

5.3.3 Modify pg_hba.conf and recovery.done

# pg_rewind copied pg_hba.conf and recovery.done over from master; change them back to slave's own settings
[postgres@slave ~]$ cd $PGDATA
[postgres@slave data]$ mv recovery.done recovery.conf
[postgres@slave data]$ vi pg_hba.conf
# change slave to master (i.e. slave's streaming-replication peer)
host    replication     repuser         master                   md5
[postgres@slave data]$ vi recovery.conf
# change slave to master (i.e. slave's streaming-replication peer)
primary_conninfo = 'host=master port=5432 user=repuser password=repuser'   

5.3.4 Restart the PostgreSQL service

[postgres@slave data]$ pg_ctl start
waiting for server to start....2017-07-24 19:47:06.821 PDT [2722] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2017-07-24 19:47:06.821 PDT [2722] LOG:  listening on IPv6 address "::", port 5432
2017-07-24 19:47:06.907 PDT [2722] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2017-07-24 19:47:06.930 PDT [2723] LOG:  database system was interrupted while in recovery at log time 2017-07-24 19:25:42 PDT
2017-07-24 19:47:06.930 PDT [2723] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2017-07-24 19:47:06.961 PDT [2723] LOG:  entering standby mode
2017-07-24 19:47:06.966 PDT [2723] LOG:  redo starts at 0/2B0000D0
2017-07-24 19:47:06.971 PDT [2723] LOG:  consistent recovery state reached at 0/2B01CA30
2017-07-24 19:47:06.972 PDT [2722] LOG:  database system is ready to accept read only connections
2017-07-24 19:47:06.972 PDT [2723] LOG:  invalid record length at 0/2B01CA30: wanted 24, got 0
2017-07-24 19:47:06.982 PDT [2727] LOG:  started streaming WAL from primary at 0/2B000000 on timeline 11
 done
server started

5.3.5 Rejoin the cluster

# slave's node_id is 1, hence -n 1
[postgres@slave data]$ pcp_attach_node -d -U postgres -h vip -p 9898 -n 1
Password:  # enter the pcp management password when prompted
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="C", len=6
DEBUG: recv: tos="c", len=20
pcp_attach_node -- Command Successful
DEBUG: send: tos="X", len=4

5.3.6 Check the cluster node status

[postgres@slave data]$ psql -h vip -p 9999
Password: 
psql (10beta1)
Type "help" for help.

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | slave    | 5432 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

With that, the restoration work is complete.

Origin blog.csdn.net/u011250186/article/details/103767124