pgpool-II: testing a high-availability, read/write-splitting setup with one primary and two synchronous streaming standbys

Building PostgreSQL HA with pgpool-II: one primary, multiple standbys, read/write splitting

  • Environment
Server            Role
10.10.56.16:5532  master
10.10.56.17:5532  slave
10.10.56.18:5532  slave
10.10.56.16:9999  pgpool-II master
10.10.56.17:9999  pgpool-II slave
10.10.56.18:9999  pgpool-II slave

Figure: pgpool streaming replication architecture

Setting up one-primary, two-standby synchronous streaming replication

  • PostgreSQL data directories and ports
Server  Data directory       Port
16      /pgdata/pgpool/data  5532
17      /pgdata/pgpool/data  5532
18      /pgdata/pgpool/data  5532

- Installing PostgreSQL itself is omitted here. The pg commands below only work once the environment variables are configured

  • Environment variables: edit /etc/profile as follows
CLWDB1: vim /etc/profile

export PATH=/opt/pgsql-10/bin:$PATH

Setting up the PostgreSQL primary on server 16

  • On server 16, create the database data directory
postgres@CLWDB1:/pgdata> mkdir pgpool/data -p
  • On server 16, initialize the database
CLWDB1:/pgdata/10/data # su - postgres
postgres@CLWDB1:/pgdata/pgpool> initdb -D /pgdata/pgpool/data/
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
...
Success. You can now start the database server using:

    pg_ctl -D /pgdata/pgpool/data/ -l logfile start
  • Edit pg_hba.conf on server 16 and add the following
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# Allow users from any network to connect with MD5 authentication
host    all             all             0.0.0.0/0               md5  

# Allow user repl to use streaming replication from the 10.10.56.0/24 network
host    replication     repl            10.10.56.0/24           trust
host    replication     all             ::1/128                 trust
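The ADDRESS column of pg_hba.conf is a CIDR range, and the prefix length after the slash decides how many leading bits of the client address must match. The sketch below is only an illustration of that matching rule in shell arithmetic; the helper names ip_to_int and cidr_match are made up for this example and are not part of PostgreSQL.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_match IP NETWORK/PREFIX: succeed if IP lies inside the range.
cidr_match() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # Keep only the top "bits" bits of a 32-bit address.
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}
```

For example, cidr_match 10.10.56.17 10.10.56.0/24 succeeds and cidr_match 192.168.1.5 10.10.56.0/24 fails, while a prefix of /0 has an empty mask and therefore matches every address.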
  • Edit postgresql.conf on server 16 as follows (the trailing # rtm is only a comment marking the changed lines)
listen_addresses = '*'                       # rtm
port = 5532                                  # rtm
max_connections = 100                        # rtm
superuser_reserved_connections = 10          # rtm
wal_level = logical                          # rtm
full_page_writes = on                        # rtm
wal_log_hints = off                          # rtm
archive_mode = on                            # rtm
archive_command = '/bin/true'
max_wal_senders = 50                         # rtm
hot_standby = on                             # rtm
log_destination = 'csvlog'                   # rtm
logging_collector = on                       # rtm  
log_directory = 'log'                        # rtm
log_filename = 'postgresql-%Y-%m-%d_%H%M%S'  # rtm
log_rotation_age = 1d                        # rtm
log_rotation_size = 10MB                     # rtm
log_statement = 'mod'                        # rtm
  • Start the PostgreSQL service on server 16
postgres@CLWDB1:/pgdata/pgpool/data> pg_ctl -D /pgdata/pgpool/data/ start
waiting for server to start....2018-07-02 15:00:21.403 CST [9609] LOG:  listening on IPv4 address "0.0.0.0", port 5532
2018-07-02 15:00:21.403 CST [9609] LOG:  listening on IPv6 address "::", port 5532
2018-07-02 15:00:21.405 CST [9609] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5532"
2018-07-02 15:00:21.412 CST [9609] LOG:  redirecting log output to logging collector process
2018-07-02 15:00:21.412 CST [9609] HINT:  Future log output will appear in directory "log".
 done
server started
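When these steps are scripted, it helps to wait until the server really accepts connections before running psql. A minimal sketch, assuming the pg_isready utility shipped with PostgreSQL is on the PATH; the function name wait_for_pg and the default of 30 attempts are choices made for this example.

```shell
# wait_for_pg HOST PORT [TRIES]: poll once per second until the server answers.
wait_for_pg() {
  local tries=${3:-30}
  while [ "$tries" -gt 0 ]; do
    # -q suppresses output; exit status 0 means the server accepts connections
    pg_isready -q -h "$1" -p "$2" && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

A call such as wait_for_pg 10.10.56.16 5532 would then sit between pg_ctl start and the first psql command.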
  • On server 16, change the password of the default user postgres and create the user repl for streaming replication
postgres=# ALTER USER postgres WITH PASSWORD '123456';
ALTER ROLE
postgres=# CREATE USER  repl WITH PASSWORD '123456' REPLICATION;
CREATE ROLE
postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 repl      | Replication                                                | {}
  • Create the test database pgpool and the table pgpool
postgres=# CREATE DATABASE pgpool ;
CREATE DATABASE
postgres=# \c pgpool
You are now connected to database "pgpool" as user "postgres".
pgpool=#
pgpool=# CREATE TABLE pgpool (id serial,age bigint,insertTime timestamp default now());
CREATE TABLE
pgpool=# insert into pgpool (age) values (1);
INSERT 0 1
pgpool=# select * from pgpool;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-02 15:07:03.329849
(1 row)
  • Check whether this database is the primary; f means it is the primary
postgres=# select * from pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

postgres=#
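The same check can be wrapped for use in scripts. This is a sketch only: node_role is a name invented here, and it assumes psql can authenticate without prompting (for example through ~/.pgpass or PGPASSWORD).

```shell
# node_role HOST PORT: print "primary" or "standby" based on pg_is_in_recovery().
node_role() {
  # -A unaligned, -t tuples only, -c run one command: the output is just "t" or "f"
  case "$(psql -h "$1" -p "$2" -U postgres -d postgres -Atc 'select pg_is_in_recovery()')" in
    f) echo primary ;;
    t) echo standby ;;
    *) echo unknown; return 1 ;;
  esac
}
```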

The primary on server 16 is now set up

Setting up standby slave1 on server 17

  • On server 17, create the database data directory
postgres@CLWDB2:/pgdata> mkdir pgpool/data -p
  • On server 17, use pg_basebackup to create a standby online; make sure the primary is running before using this command
postgres@CLWDB2:/pgdata/pgpool/data> pg_basebackup -h 10.10.56.16 -p 5532 -U repl -w -Fp -Xs -Pv -R -D /pgdata/pgpool/data/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
31133/31133 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/20000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
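  • Option reference
-h  address of the running primary               -p  port of the primary
-U  streaming replication user                   -w  never prompt for a password
-Fp write the backup as a plain data directory   -Xs stream the WAL while the backup runs
-Pv show progress and verbose output             -R  write a recovery.conf for the standby
-D  data directory to create for the standby
  • On server 17, add application_name=slave1 to recovery.conf, as follows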
postgres@CLWDB2:/pgdata/pgpool/data> vim recovery.conf
standby_mode = 'on'
primary_conninfo = 'application_name=slave1 user=repl passfile=''/home/postgres/.pgpass'' host=10.10.56.16 port=5532 sslmode=disable sslcompression=1 target_session_attrs=any'
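  • On server 17, change the following postgresql.conf parameters
max_connections = 200   # maximum number of database connections
max_wal_senders = 100  # should be larger than on the primary; otherwise the standby may be unable to serve reads
  • On server 17, set permissions 700 on the data directory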
postgres@CLWDB2:/pgdata/pgpool/data> chmod 700 /pgdata/pgpool/data/
  • On server 17, start the standby
postgres@CLWDB2:/pgdata/pgpool/data> pg_ctl -D /pgdata/pgpool/data/ start
waiting for server to start....2018-07-02 15:20:19.966 CST [23907] LOG:  listening on IPv4 address "0.0.0.0", port 5532
2018-07-02 15:20:19.966 CST [23907] LOG:  listening on IPv6 address "::", port 5532
2018-07-02 15:20:19.970 CST [23907] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5532"
2018-07-02 15:20:20.007 CST [23907] LOG:  redirecting log output to logging collector process
2018-07-02 15:20:20.007 CST [23907] HINT:  Future log output will appear in directory "log".
 done
server started
  • On server 17, connect to the pgpool database and check that the data has been replicated
postgres@CLWDB2:/pgdata/pgpool/data> psql -h 10.10.56.17  -p 5532 -U postgres pgpool
Password for user postgres:
psql (10.3)
Type "help" for help.

pgpool=# \dt
         List of relations
 Schema |  Name  | Type  |  Owner
--------+--------+-------+----------
 public | pgpool | table | postgres
(1 row)

pgpool=# select * from pgpool;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-02 15:07:03.329849
(1 row)
pgpool=#
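  • Option reference
-h address of the standby server      -p  port of the standby
-U database user to connect as
  • On server 17, check whether the database is a standby; t means standby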
pgpool=# select * from pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

pgpool=#
  • On server 16, check the replication status
postgres=# select client_addr,usename,backend_start,application_name,sync_state,sync_priority FROM pg_stat_replication;
 client_addr | usename |         backend_start         | application_name | sync_state | sync_priority
-------------+---------+-------------------------------+------------------+------------+---------------
 10.10.56.17 | repl    | 2018-07-02 15:20:20.066431+08 | slave1           | async      |             0
(1 row)

postgres=#
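  • Column reference
client_addr: address of the standby server           usename: streaming replication user
backend_start: when streaming replication started    application_name: name of the standby
sync_state: synchronization state of the standby     sync_priority: priority of the standby for becoming synchronous

Standby slave1 on server 17 is now set up

Setting up the synchronous standby on server 18

  • On server 18, create the database directory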
postgres@CLWDB3:/pgdata/pgpool> mkdir -p /pgdata/pgpool/data/
  • On server 18, use pg_basebackup to build the standby
postgres@CLWDB3:/pgdata/pgpool> pg_basebackup -h 10.10.56.16 -p 5532 -U repl -w -Fp -Xs -Pv -R -D /pgdata/pgpool/data/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/4000028 on timeline 1
pg_basebackup: starting background WAL receiver
31133/31133 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/40000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
postgres@CLWDB3:/pgdata/pgpool>
  • On server 18, edit recovery.conf and add application_name=slave2
postgres@CLWDB3:/pgdata/pgpool/data> vim recovery.conf
standby_mode = 'on'
primary_conninfo = 'application_name=slave2 user=repl passfile=''/home/postgres/.pgpass'' host=10.10.56.16 port=5532 sslmode=disable sslcompression=1 target_session_attrs=any'
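The recovery.conf files on servers 17 and 18 differ only in application_name, so they can be produced from one template. A minimal sketch: render_recovery_conf is a hypothetical helper, and the connection string is shortened to the fields that matter here (the passfile, sslcompression and target_session_attrs settings written by pg_basebackup -R are left out).

```shell
# render_recovery_conf APP_NAME PRIMARY_HOST PRIMARY_PORT
# Prints a PostgreSQL 10 style recovery.conf for a streaming standby.
render_recovery_conf() {
  cat <<EOF
standby_mode = 'on'
primary_conninfo = 'application_name=$1 user=repl host=$2 port=$3 sslmode=disable'
EOF
}
```

For instance, render_recovery_conf slave2 10.10.56.16 5532 > /pgdata/pgpool/data/recovery.conf would write the file for server 18. The application_name chosen here is the name that synchronous_standby_names on the primary must refer to.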
  • On server 18, change the following postgresql.conf parameters
postgres@CLWDB3:/pgdata/pgpool/data> vim postgresql.conf

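max_connections = 200   # maximum number of database connections
max_wal_senders = 100  # should be larger than on the primary; otherwise the standby may be unable to serve reads
  • Change postgresql.conf on server 16 as follows; the primary must be reloaded or restarted afterwards
synchronous_standby_names = 'slave1,slave2' # standbys that may replicate synchronously with the primary
# synchronous_commit = on  # controls whether a commit on the primary waits for the standby to confirm before returning
  • Restart the primary on server 16 so the change takes effect
postgres@CLWDB1:/pgdata/pgpool/data> pg_ctl -D /pgdata/pgpool/data/ restart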
waiting for server to shut down.... done
server stopped
waiting for server to start....2018-07-02 17:30:46.299 CST [10614] LOG:  listening on IPv4 address "0.0.0.0", port 5532
2018-07-02 17:30:46.300 CST [10614] LOG:  listening on IPv6 address "::", port 5532
2018-07-02 17:30:46.302 CST [10614] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5532"
2018-07-02 17:30:46.309 CST [10614] LOG:  redirecting log output to logging collector process
2018-07-02 17:30:46.309 CST [10614] HINT:  Future log output will appear in directory "log".
 done
server started
postgres@CLWDB1:/pgdata/pgpool/data>
  • On server 18, set permissions 700 on the data directory
postgres@CLWDB3:/pgdata/pgpool/data> chmod 700 /pgdata/pgpool/data/
  • On server 18, start the database service
postgres@CLWDB3:/pgdata/pgpool/data> pg_ctl -D /pgdata/pgpool/data/ start
waiting for server to start....2018-07-02 17:20:15.858 CST [23245] LOG:  listening on IPv4 address "0.0.0.0", port 5532
2018-07-02 17:20:15.858 CST [23245] LOG:  listening on IPv6 address "::", port 5532
2018-07-02 17:20:15.863 CST [23245] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5532"
2018-07-02 17:20:15.931 CST [23245] LOG:  redirecting log output to logging collector process
2018-07-02 17:20:15.931 CST [23245] HINT:  Future log output will appear in directory "log".
 done
server started
postgres@CLWDB3:/pgdata/pgpool/data>
  • On server 18, connect to the pgpool database and check the tables and data
postgres@CLWDB3:/pgdata/pgpool/data> psql -h 10.10.56.18 -p 5532 -U postgres  pgpool
Password for user postgres:
psql (10.3)
Type "help" for help.

pgpool=# \dt
         List of relations
 Schema |  Name  | Type  |  Owner
--------+--------+-------+----------
 public | pgpool | table | postgres
(1 row)

pgpool=# select * from pgpool;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-02 15:07:03.329849
(1 row)
pgpool=#

The database, table and data have been replicated successfully

  • On server 16, query the replication status
pgpool=# select pid,usename,application_name,client_addr,client_port,state,sync_priority,sync_state,replay_lag from pg_stat_replication;
  pid  | usename | application_name | client_addr | client_port |   state   | sync_priority | sync_state | replay_lag
-------+---------+------------------+-------------+-------------+-----------+---------------+------------+------------
 23889 | repl    | slave2           | 10.10.56.18 |       51178 | streaming |             2 | potential  |
 24083 | repl    | slave1           | 10.10.56.17 |       53130 | streaming |             1 | sync       |
(2 rows)

pgpool=#

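Both standbys are streaming. slave1 (server 17) has sync_state sync, so it is the synchronous standby; slave2 (server 18) is potential, meaning it takes over as the synchronous standby if slave1 disconnects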
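To make the sync_state column easier to scan, the rows can be grouped by state. This is a small sketch, assuming the input is psql's unaligned output (application_name|sync_state, one standby per line); summarize_sync is a name chosen for this example.

```shell
# Reads "application_name|sync_state" lines on stdin and groups names by state.
summarize_sync() {
  awk -F'|' '
    { names[$2] = names[$2] " " $1 }
    END {
      printf "sync:%s\n", names["sync"]
      printf "potential:%s\n", names["potential"]
      printf "async:%s\n", names["async"]
    }'
}
```

It could be fed with psql -h 10.10.56.16 -p 5532 -U postgres -Atc "select application_name, sync_state from pg_stat_replication" | summarize_sync.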

  • Check the state of server 18; t means standby
pgpool=# select * from pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)
pgpool=#
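  • The synchronous standby on server 18 is now set up

Installing pgpool-II

  • Introduction
In pgpool master/slave mode there is no restriction on slave nodes: there can be 1 to 127 of them, or none at all
  • Installing as a user with root privileges is recommended; otherwise the VIP may fail to switch over. All installation steps below are run as root
Installing pgpool-II on server 16
  • Upload and unpack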
CLWDB1:/home/postgres/pgsoft/pgpoolsoft # ls
pgpool-II-3.7.3.tar.gz
CLWDB1:/home/postgres/pgsoft/pgpoolsoft # tar xf pgpool-II-3.7.3.tar.gz
  • Run configure with the install prefix /opt/pgpool-3
CLWDB1:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3 # ./configure --prefix=/opt/pgpool-3

checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
...
....
config.status: creating src/tools/Makefile
config.status: creating src/tools/pgmd5/Makefile
config.status: creating src/tools/pcp/Makefile
config.status: creating src/watchdog/Makefile
config.status: creating src/include/config.h
config.status: executing libtool commands

Output like the above means configure succeeded

  • Build and install
CLWDB1:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3 # make && make install
Making all in src
make[1]: Entering directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src'
Making all in parser
...
...
make[2]: Entering directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3'
make[1]: Leaving directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3'

Output like the above means the build and install completed

  • Add export PATH=/opt/pgpool-3/bin:$PATH to the global environment and reload it
CLWDB1:/opt/pgpool-3 # vim /etc/profile

export PATH=/opt/pgpool-3/bin:$PATH

CLWDB1:/opt/pgpool-3 # source /etc/profile
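  • Install pgpool_regclass on the primary. pgpool-II uses it internally with PostgreSQL 8.0 and later to avoid errors when handling temporary tables that share a name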
CLWDB1:cd /home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool-regclass
CLWDB1:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool-regclass # make && make install

sed 's,MODULE_PATHNAME,$libdir/pgpool-regclass,g' pgpool-regclass.sql.in >pgpool-regclass.sql
...
...
/usr/bin/install -c -m 644 .//pgpool_regclass.control '/opt/pgsql-10/share/extension/'
/usr/bin/install -c -m 644 .//pgpool_regclass--1.0.sql pgpool-regclass.sql '/opt/pgsql-10/share/extension/'
/usr/bin/install -c -m 755  pgpool-regclass.so '/opt/pgsql-10/lib/'

Output like the above means the build and install completed

  • Switch to the postgres user and run pgpool-regclass.sql against the template1 database on the primary
CLWDB1:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool-regclass # su - postgres
postgres@CLWDB1:~> cd /home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool-regclass/
postgres@CLWDB1:~/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool-regclass> psql -f pgpool-regclass.sql -h 10.10.56.16 -p 5532 -U postgres template1
Password for user postgres:
CREATE FUNCTION
postgres@CLWDB1:~/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool-regclass>

The pgpool-regclass function is now installed

  • Switch to root, then build and install insert_lock
CLWDB1:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql # pwd
/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql

CLWDB1:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql # make && make install

make -C pgpool-recovery all
make[1]: Entering directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool-recovery'
sed 's,MODULE_PATHNAME,$libdir/pgpool-recovery,g' pgpool-recovery.sql.in >pgpool-recovery.sql
...
...
/usr/bin/install -c -m 644 .//pgpool_adm--1.0.sql  '/opt/pgsql-10/share/extension/'
make[1]: Leaving directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool_adm'

Output like the above means the build and install completed

  • Create insert_lock in the template1 database on the primary
CLWDB1:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql # su - postgres
postgres@CLWDB1:~/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql> cd /home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql

postgres@CLWDB1:~/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql> psql -f insert_lock.sql -h 10.10.56.16 -p 5532 -U postgres template1
Password for user postgres:
psql:insert_lock.sql:3: ERROR:  schema "pgpool_catalog" does not exist
CREATE SCHEMA
CREATE TABLE
INSERT 0 1
GRANT
GRANT
GRANT
GRANT
postgres@CLWDB1:~/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql>

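The insert_lock table is now created. It is mainly used to handle mutual exclusion on table locks between pgpool-II and VACUUM

  • Install the C-language function pgpool-recovery, used for online recovery of nodes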
CLWDB1:cd /home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool-recovery
CLWDB1:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool-recovery # make && make install

make: Nothing to be done for 'all'.
/usr/bin/mkdir -p '/opt/pgsql-10/share/extension'
/usr/bin/mkdir -p '/opt/pgsql-10/share/extension'
/usr/bin/mkdir -p '/opt/pgsql-10/lib'
/usr/bin/install -c -m 644 .//pgpool_recovery.control '/opt/pgsql-10/share/extension/'
/usr/bin/install -c -m 644 .//pgpool_recovery--1.1.sql pgpool-recovery.sql '/opt/pgsql-10/share/extension/'
/usr/bin/install -c -m 755  pgpool-recovery.so '/opt/pgsql-10/lib/'

Output like the above means the build and install completed

  • Install the pgpool-recovery functions in the template1 database on the primary
postgres@CLWDB1:~/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool-recovery> psql -f pgpool-recovery.sql -h 10.10.56.16 -p 5532 -U postgres template1
Password for user postgres:
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
postgres@CLWDB1:~/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src/sql/pgpool-recovery>

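The extension functions are now installed

  • Connect to the primary and create the extensions (optional: the steps above already installed the functions; this only makes them easy to inspect)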
postgres=# create extension pgpool_regclass;
CREATE EXTENSION
postgres=#  CREATE EXTENSION pgpool_recovery;
CREATE EXTENSION
postgres=# select * from pg_extension;
     extname     | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-----------------+----------+--------------+----------------+------------+-----------+--------------
 plpgsql         |       10 |           11 | f              | 1.0        |           |
 pgpool_regclass |       10 |         2200 | t              | 1.0        |           |
 pgpool_recovery |       10 |         2200 | t              | 1.1        |           |
(3 rows)

postgres=#

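Servers 17 and 18 receive the extensions automatically through streaming replication, so only pgpool-II itself needs to be installed there

Installing pgpool-II on server 17

  • Upload and unpack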
CLWDB2:/home/postgres/pgsoft/pgpoolsoft # ls
pgpool-II-3.7.3.tar.gz
CLWDB2:/home/postgres/pgsoft/pgpoolsoft # tar xf pgpool-II-3.7.3.tar.gz
CLWDB2:/home/postgres/pgsoft/pgpoolsoft # ls
pgpool-II-3.7.3  pgpool-II-3.7.3.tar.gz
  • Run configure with the install prefix /opt/pgpool-3
CLWDB2:/home/postgres/pgsoft/pgpoolsoft # cd pgpool-II-3.7.3/
CLWDB2:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3 # ./configure --prefix=/opt/pgpool-3

checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
...
...
config.status: creating src/tools/pcp/Makefile
config.status: creating src/watchdog/Makefile
config.status: creating src/include/config.h
config.status: executing libtool commands

Output like the above means configure succeeded

  • Build and install
CLWDB2:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3 # make && make install

Making all in src
make[1]: Entering directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src'
Making all in parser
...
...
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3'
make[1]: Leaving directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3'
  • Connect to the server and check the extensions
postgres@CLWDB2:~> psql
psql (10.3)
Type "help" for help.

postgres=# select * from pg_extension;
     extname     | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-----------------+----------+--------------+----------------+------------+-----------+--------------
 plpgsql         |       10 |           11 | f              | 1.0        |           |
 pgpool_regclass |       10 |         2200 | t              | 1.0        |           |
 pgpool_recovery |       10 |         2200 | t              | 1.1        |           |
(3 rows)

The extensions have already arrived through streaming replication; pgpool-II is now installed on server 17

Installing pgpool-II on server 18

  • Upload and unpack
CLWDB3:/home/postgres/pgsoft/pgpoolsoft # ls
pgpool-II-3.7.3.tar.gz
CLWDB3:/home/postgres/pgsoft/pgpoolsoft # tar xf pgpool-II-3.7.3.tar.gz
CLWDB3:/home/postgres/pgsoft/pgpoolsoft # ls
pgpool-II-3.7.3  pgpool-II-3.7.3.tar.gz
  • Run configure with the install prefix /opt/pgpool-3
CLWDB3:/home/postgres/pgsoft/pgpoolsoft # cd pgpool-II-3.7.3/
CLWDB3:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3 # ./configure --prefix=/opt/pgpool-3

checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
...
...
config.status: creating src/tools/pgmd5/Makefile
config.status: creating src/tools/pcp/Makefile
config.status: creating src/watchdog/Makefile
config.status: creating src/include/config.h
config.status: executing libtool commands

Output like the above means configure succeeded

  • Build and install
CLWDB3:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3 # make && make install

Making all in src
make[1]: Entering directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3/src'
Making all in parser
...
...
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3'
make[1]: Leaving directory '/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3'

pgpool-II is now built and installed

  • Log in to the database and check the extensions
CLWDB3:/home/postgres/pgsoft/pgpoolsoft/pgpool-II-3.7.3 # su - postgres
postgres@CLWDB3:~> psql -p 5532
psql (10.3)
Type "help" for help.

postgres=# select * from pg_extension;
     extname     | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-----------------+----------+--------------+----------------+------------+-----------+--------------
 plpgsql         |       10 |           11 | f              | 1.0        |           |
 pgpool_regclass |       10 |         2200 | t              | 1.0        |           |
 pgpool_recovery |       10 |         2200 | t              | 1.1        |           |
(3 rows)

postgres=#

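The extensions have been replicated through streaming replication

Configuring pgpool-II on the servers

Configuring pgpool-II on server 16
  • Copy the pgpool-II sample files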
CLWDB1:/opt/pgpool-3/etc # cd /opt/pgpool-3/etc
CLWDB1:/opt/pgpool-3/etc # cp pcp.conf.sample pcp.conf
CLWDB1:/opt/pgpool-3/etc # cp pgpool.conf.sample pgpool.conf
CLWDB1:/opt/pgpool-3/etc # cp pool_hba.conf.sample pool_hba.conf
  • Configure pgpool.conf with the following parameters
listen_addresses = '*'  # rtm
port = 9999     # rtm
pcp_listen_addresses = '*' # rtm
pcp_port = 9898         # rtm
backend_hostname0 = '10.10.56.16'  # rtm
backend_port0 = 5532    # rtm
backend_weight0 = 1     # rtm
backend_data_directory0 = '/pgdata/pgpool/data' # rtm
backend_flag0 = 'ALLOW_TO_FAILOVER'             # rtm
backend_hostname1 = '10.10.56.17'       # rtm
backend_port1 = 5532            # rtm
backend_weight1 = 1     # rtm
backend_data_directory1 = '/pgdata/pgpool/data'  # rtm
backend_flag1 = 'ALLOW_TO_FAILOVER'     # rtm
backend_hostname2 = '10.10.56.18'       # rtm
backend_port2 = 5532            # rtm
backend_weight2 = 1     # rtm
backend_data_directory2 = '/pgdata/pgpool/data'  # rtm
backend_flag2 = 'ALLOW_TO_FAILOVER'     # rtm
enable_pool_hba = on            # rtm
pool_passwd = 'pool_passwd'     # rtm
log_destination = 'stderr,syslog'       # rtm
log_line_prefix = '%t: pid %p: '  # rtm  # printf-style string to output at beginning of each log line.
log_connections = on            # rtm
log_hostname = on               # rtm
log_statement = on              # rtm
log_per_node_statement = on     # rtm
pid_file_name = '/opt/pgpool-3/run/pgpool/pgpool.pid' # rtm
logdir = '/opt/pgpool-3/log/pgpool'     # rtm
replication_mode = off          # rtm
load_balance_mode = on                  # rtm
master_slave_mode = on          # rtm
master_slave_sub_mode = 'stream'        # rtm
sr_check_period = 10            # rtm
sr_check_user = 'pgcheck'       # rtm
sr_check_password = '123456'    # rtm
sr_check_database = 'postgres'  # rtm
delay_threshold = 10000000      # rtm
health_check_period = 20          # rtm
health_check_user = 'repl'           # rtm
health_check_password = '123456'        # rtm
health_check_database = 'pgpool'        # rtm
health_check_max_retries = 3    # rtm
health_check_retry_delay = 3    # rtm
failover_command = '/opt/pgpool-3/script/failover.sh 10.10.56.16'   # rtm
fail_over_on_backend_error = off        # rtm
recovery_user = 'pgcheck'       # rtm
recovery_password = '123456'    # rtm
recovery_1st_stage_command = '/opt/pgpool-3/script/restore_1st.sh'      # rtm
recovery_2nd_stage_command = '/opt/pgpool-3/script/restore_2st.sh'      # rtm
client_idle_limit_in_recovery = 20      # rtm
use_watchdog = on               # rtm
wd_hostname = '10.10.56.16'             # rtm
wd_port = 9000                          # rtm
wd_priority = 1                         # rtm
delegate_IP = '10.10.56.87'             # rtm
if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'       # rtm
if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'                  # rtm
wd_lifecheck_method = 'heartbeat'               # rtm
wd_interval = 15                        # rtm
wd_heartbeat_port = 9694                # rtm
wd_heartbeat_keepalive = 10             # rtm
heartbeat_destination0 = '10.10.56.17'  # rtm
heartbeat_destination_port0 = 9694      # rtm
heartbeat_device0 = 'eth0'               # rtm
heartbeat_destination1 = '10.10.56.18'   # rtm
heartbeat_destination_port1 = 9694       # rtm
heartbeat_device1 = 'eth0'       # rtm
wd_life_point = 3                # rtm
wd_lifecheck_query = 'SELECT 1'  # rtm
wd_lifecheck_dbname = 'template1'        # rtm
wd_lifecheck_user = 'pgcheck'            # rtm
wd_lifecheck_password = '123456'         # rtm
other_pgpool_hostname0 = '10.10.56.17'   # rtm
other_pgpool_port0 = 9999                # rtm
other_wd_port0 = 9000            # rtm
other_pgpool_hostname1 = '10.10.56.18'           # rtm
other_pgpool_port1 = 9999                # rtm
other_wd_port1 = 9000            # rtm
  • Create the PID, log and script directories configured above
CLWDB1:/opt/pgpool-3 # mkdir -p /opt/pgpool-3/run/pgpool
CLWDB1:/opt/pgpool-3 # mkdir -p /opt/pgpool-3/script 
CLWDB1:/opt/pgpool-3 # mkdir -p /opt/pgpool-3/log/pgpool
  • Configure the pgpool-II pcp.conf file, which holds the user credentials for the pcp management commands
CLWDB1:/opt/pgpool-3/log/pgpool # pg_md5 123456
e10adc3949ba59abbe56e057f20f883e
CLWDB1:/opt/pgpool-3/etc # vim pcp.conf

# USERID:MD5PASSWD
postgres:e10adc3949ba59abbe56e057f20f883e

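Above, pg_md5 produces the MD5 hash of the password 123456. Note that each line of pcp.conf is one credential and must not contain spaces

  • Configure the pool_passwd file. It does not exist by default and can be generated with the command below; it determines which users may access pgpool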
CLWDB1:/opt/pgpool-3/etc # pg_md5 -p -m -u postgres pool_passwd
password:   # the password entered here is 123456
CLWDB1:/opt/pgpool-3/etc # tail -3 pool_passwd
postgres:md5a3556571e93b0d20722ba62be61e8c2d
CLWDB1:/opt/pgpool-3/etc #

The above generates an MD5-hashed password for user postgres (password 123456) and writes it to the pool_passwd file

  • Use the same command to add the pgcheck user, password 123456
CLWDB1:/opt/pgpool-3/etc # tail -2 pool_passwd
postgres:md5a3556571e93b0d20722ba62be61e8c2d
pgcheck:md5a3556571e93b0d20722ba62be61e8c2d
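The two files store different hashes. pcp.conf holds the plain MD5 of the password, while a pool_passwd entry follows PostgreSQL's md5 format: the literal prefix md5 followed by the MD5 of the password concatenated with the user name. The sketch below reproduces both with md5sum for illustration; pcp_hash and pool_passwd_entry are names invented here, and pg_md5 remains the tool to use in practice.

```shell
# pcp.conf format: plain MD5 of the password.
pcp_hash() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# pool_passwd format: USER:md5 + MD5(password followed by user name).
pool_passwd_entry() {  # pool_passwd_entry USER PASSWORD
  printf '%s:md5%s\n' "$1" "$(printf '%s%s' "$2" "$1" | md5sum | cut -d' ' -f1)"
}
```

Because the user name is mixed into the hash, two users with the same password get different pool_passwd entries, so an entry cannot simply be copied from one user to another.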
  • Configure pool_hba.conf, the pgpool-II authentication file, which is similar to PostgreSQL's pg_hba.conf
CLWDB1:/opt/pgpool-3/etc # vim pool_hba.conf

local   all         all                               trust
# IPv4 local connections:
host    all         all         127.0.0.1/32          md5 # changed from trust to md5
host    all         all         0.0.0.0/0             md5
host    all         all         ::1/128               trust
CLWDB1:/opt/pgpool-3/etc #

Above, the local entry is changed from trust to md5 to prevent MD5 authentication errors when connecting to the local pgpool-II

  • Start pgpool-II
CLWDB1:/opt/pgpool-3 #  pgpool -n -d > pgpool.log 2>&1 &
[1] 23252
  • Check the system log
CLWDB1:/opt/pgpool-3 # tail -f /var/log/messages

...
...
2018-07-03T15:30:45.492302+08:00 CLWDB1 pgpool[23252]: [47-1] 2018-07-03 15:30:45: pid 23252: LOG:  pgpool-II successfully started. version 3.7.3 (amefuriboshi)
2018-07-03T15:30:41.467767+08:00 CLWDB1 pgpool[23255]: [33-1] 2018-07-03 15:30:41: pid 23255: DEBUG:  STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = MASTER
...
2018-07-03T15:38:56.488631+08:00 CLWDB1 pgpool[23277]: [71-1] 2018-07-03 15:38:56: pid 23277: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.17:9694
2018-07-03T15:38:56.489123+08:00 CLWDB1 pgpool[23281]: [71-1] 2018-07-03 15:38:56: pid 23281: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.18:9694
2018-07-03T15:38:57.022764+08:00 CLWDB1 pgpool[23255]: [192-1] 2018-07-03 15:38:57: pid 23255: DEBUG:  STATE MACHINE INVOKED WITH EVENT = TIMEOUT Current State = MASTER
2018-07-03T15:39:02.028390+08:00 CLWDB1 pgpool[23255]: [193-1] 2018-07-03 15:39:02: pid 23255: DEBUG:  STATE MACHINE INVOKED WITH EVENT = COMMAND FINISHED Current State = MASTER
2018-07-03T15:39:02.028938+08:00 CLWDB1 pgpool[23255]: [194-1] 2018-07-03 15:39:02: pid 23255: DEBUG:  I am the cluster leader node command finished with status:[COMMAND TIMEED OUT] which is success
2018-07-03T15:39:02.029442+08:00 CLWDB1 pgpool[23255]: [194-2] 2018-07-03 15:39:02: pid 23255: DETAIL:  The command was sent to 0 nodes and 0 nodes replied to it

Because pgpool is not yet configured on servers 17 and 18, the health check cannot be carried out

  • Stop the pgpool service
CLWDB1:/opt/pgpool-3 # pgpool -m fast stop
2018-07-03 15:40:12: pid 23452: LOG:  stop request sent to pgpool. waiting for termination...
.done.
[1]+  Done                    pgpool -n -d > pgpool.log 2>&1
CLWDB1:/opt/pgpool-3 #

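Configuring pgpool-II on server 17

  • The pgpool configuration on server 17 is almost the same as on 16, so we copy the configuration files from server 16

  • From server 16, upload the configuration files to server 17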

CLWDB1:/opt/pgpool-3/etc # ls
pcp.conf         pgpool.conf         pgpool.conf.sample-logical       pgpool.conf.sample-replication  pool_hba.conf         pool_passwd
pcp.conf.sample  pgpool.conf.sample  pgpool.conf.sample-master-slave  pgpool.conf.sample-stream       pool_hba.conf.sample
CLWDB1:/opt/pgpool-3/etc # scp pcp.conf pgpool.conf pool_passwd pool_hba.conf super@10.10.56.17:/opt/pgpool-3/etc
Password:       # the password of the super user
pcp.conf                                                                                                                     100%  903     0.9KB/s   00:00
pgpool.conf                                                                                                                  100%   36KB  35.9KB/s   00:00
pool_passwd                                                                                                                  100%   89     0.1KB/s   00:00
pool_hba.conf                                                                                                                100% 3317     3.2KB/s   00:00
CLWDB1:/opt/pgpool-3/etc #
  • On server 17, log in as the super user and hand ownership to root
CLWDB2:/opt/pgpool/etc # su - super
CLWDB2:/opt/pgpool-3/etc # chown root.root /opt/ -R
  • Starting from the copied files, change the following parameters in pgpool.conf
CLWDB2:/opt/pgpool-3/etc # vim pgpool.conf


wd_hostname = '10.10.56.17'             # rtm2 local IP address
wd_port = 9000                          # rtm2
wd_priority = 1                         # rtm2
heartbeat_destination0 = '10.10.56.16'  # rtm2 peer IP address
heartbeat_destination_port0 = 9694      # rtm2
heartbeat_device0 = 'eth0'               # rtm2
other_pgpool_hostname0 = '10.10.56.16'   # rtm2 peer IP address
other_pgpool_port0 = 9999                # rtm2
other_wd_port0 = 9000            # rtm2
  • Create the PID, log and script directories named in the configuration file
CLWDB2:/opt/pgpool-3 # cd /opt/pgpool-3/
CLWDB2:/opt/pgpool-3 # mkdir -p /opt/pgpool-3/run/pgpool
CLWDB2:/opt/pgpool-3 # mkdir -p /opt/pgpool-3/log/pgpool
CLWDB2:/opt/pgpool-3 # mkdir -p /opt/pgpool-3/script
CLWDB2:/opt/pgpool-3 # ls
bin  etc  include  lib  log  run  script  share

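pgpool-II on server 17 is now configured

Configuring pgpool-II on server 18
  • The pgpool configuration files on server 18 are almost the same as on server 16, so we copy them from server 16

  • From server 16, upload the pgpool configuration files to server 18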

CLWDB1:/opt/pgpool-3/etc # scp pcp.conf pgpool.conf pool_passwd pool_hba.conf super@10.10.56.18:/opt/pgpool-3/etc
Password:
pcp.conf                                                                                                                     100%  903     0.9KB/s   00:00
pgpool.conf                                                                                                                  100%   36KB  35.9KB/s   00:00
pool_passwd                                                                                                                  100%   89     0.1KB/s   00:00
pool_hba.conf                                                                                                                100% 3317     3.2KB/s   00:00
CLWDB1:/opt/pgpool-3/etc #
  • On server 18, change the following parameters in the copied pgpool.conf
wd_hostname = '10.10.56.18'             # rtm3
wd_port = 9000                          # rtm3
wd_priority = 1                         # rtm3
heartbeat_destination0 = '10.10.56.17'  # rtm3
heartbeat_destination_port0 = 9694      # rtm3
heartbeat_device0 = 'eth0'               # rtm3
heartbeat_destination1 = '10.10.56.16'   # rtm3
heartbeat_destination_port1 = 9694       # rtm3
heartbeat_device1 = 'eth0'       # rtm3
other_pgpool_hostname1 = '10.10.56.16'           # rtm3
other_pgpool_port1 = 9999                # rtm3
other_wd_port1 = 9000            # rtm3
  • Create the PID, log and script directories named in the configuration file
CLWDB3:/opt/pgpool-3 # cd /opt/pgpool-3/
CLWDB3:/opt/pgpool-3 # mkdir -p /opt/pgpool-3/run/pgpool
CLWDB3:/opt/pgpool-3 # mkdir -p /opt/pgpool-3/log/pgpool
CLWDB3:/opt/pgpool-3 # mkdir -p /opt/pgpool-3/script
CLWDB3:/opt/pgpool-3 # ls
bin  etc  include  lib  log  run  script  share
CLWDB3:/opt/pgpool-3 #
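
The same three directories are created by hand on each server. A minimal sketch of a reusable helper follows; the function name make_pgpool_dirs is hypothetical, and the prefix /opt/pgpool-3 is taken from the transcripts above.

```shell
#!/bin/bash
# make_pgpool_dirs PREFIX   (hypothetical helper name)
# Creates the run, log, and script directories under PREFIX.
# mkdir -p is idempotent, so running it again on an existing tree is harmless.
make_pgpool_dirs() {
    local prefix="$1"
    mkdir -p "$prefix/run/pgpool" "$prefix/log/pgpool" "$prefix/script"
}

# Example, run as the user that owns the pgpool installation:
#   make_pgpool_dirs /opt/pgpool-3
```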

This completes the pgpool configuration on server 18.

  • Note: make sure the network interface named in pgpool.conf above matches the interface name on your own machine; here it is eth0
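
One way to confirm the interface name before starting pgpool is to filter the output of `ip -o link show`. The sketch below is only illustrative: list_ifaces and iface_exists are hypothetical helper names, and both read the command output on stdin so they can be tried on saved text as well.

```shell
#!/bin/bash
# list_ifaces (hypothetical): reads `ip -o link show` output on stdin and
# prints one interface name per line. A trailing "@parent" suffix, which
# appears on some virtual interfaces, is stripped.
list_ifaces() {
    awk -F': ' '{ sub(/@.*/, "", $2); print $2 }'
}

# iface_exists NAME (hypothetical): succeeds if NAME is among the interfaces.
iface_exists() {
    list_ifaces | grep -qx "$1"
}

# Example:
#   ip -o link show | iface_exists eth0 && echo "eth0 present"
```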

  • Start the pgpool-Ⅱ service on each of servers 16, 17, and 18

CLWDB2:/opt/pgpool-3 # pgpool -n -d > pgpool.log 2>&1 &
  • Check the log on server 16
...
...
2018-07-05 10:20:33: pid 9528: DEBUG:  received heartbeat signal from "10.10.56.18(10.10.56.18):9999" node:10.10.56.18:9999 Linux CLWDB3
2018-07-05 10:20:34: pid 9529: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.17:9694
2018-07-05 10:20:34: pid 9531: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.18:9694
2018-07-05 10:20:35: pid 9528: DEBUG:  received heartbeat signal from "10.10.56.18(10.10.56.18):9999" node:10.10.56.18:9999 Linux CLWDB3
2018-07-05 10:20:37: pid 9528: DEBUG:  received heartbeat signal from "10.10.56.18(10.10.56.18):9999" node:10.10.56.18:9999 Linux CLWDB3
2018-07-05 10:20:39: pid 9528: DEBUG:  received heartbeat signal from "10.10.56.18(10.10.56.18):9999" node:10.10.56.18:9999 Linux CLWDB3
2018-07-05 10:20:41: pid 9526: DEBUG:  received watchdog packet type:M
2018-07-05 10:20:41: pid 9526: DEBUG:  reading packet type M of length 118
2018-07-05 10:20:41: pid 9526: DEBUG:  STATE MACHINE INVOKED WITH EVENT = PACKET RECEIVED Current State = STANDBY
2018-07-05 10:20:41: pid 9526: DEBUG:  received packet, watchdog node:[10.10.56.18:9999 Linux CLWDB3] command id:[37] type:[IAM COORDINATOR] state:[STANDBY]
2018-07-05 10:20:41: pid 9526: DEBUG:  sending packet, watchdog node:[10.10.56.18:9999 Linux CLWDB3] command id:[37] type:[NODE INFO] state:[STANDBY]
2018-07-05 10:20:41: pid 9526: DEBUG:  sending watchdog packet to socket:9, type:[I], command ID:37, data Length:253
2018-07-05 10:20:41: pid 9528: DEBUG:  received heartbeat signal from "10.10.56.18(10.10.56.18):9999" node:10.10.56.18:9999 Linux CLWDB3
2018-07-05 10:20:43: pid 9528: DEBUG:  received heartbeat signal from "10.10.56.17(10.10.56.17):9999" node:Not_Set
...
...

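The line STATE MACHINE INVOKED WITH EVENT = PACKET RECEIVED Current State = STANDBY shows that this node is a pgpool standby. The watchdog heartbeat lines show heartbeats being sent to servers 17 and 18, and the received watchdog packet lines show the corresponding replies arriving, so the cluster is healthy.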

  • Check the log on server 17
...
2018-07-05 10:20:03: pid 11321: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.16:9694
2018-07-05 10:20:03: pid 11323: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.18:9694
2018-07-05 10:20:03: pid 11320: DEBUG:  received heartbeat signal from "10.10.56.18(10.10.56.18):9999" node:10.10.56.18:9999 Linux CLWDB3
2018-07-05 10:20:04: pid 11320: DEBUG:  received heartbeat signal from "10.10.56.16(10.10.56.16):9999" node:10.10.56.16:9999 Linux CLWDB1
2018-07-05 10:20:05: pid 11320: DEBUG:  received heartbeat signal from "10.10.56.18(10.10.56.18):9999" node:10.10.56.18:9999 Linux CLWDB3
2018-07-05 10:20:07: pid 11320: DEBUG:  received heartbeat signal from "10.10.56.18(10.10.56.18):9999" node:10.10.56.18:9999 Linux CLWDB3
2018-07-05 10:20:09: pid 11320: DEBUG:  received heartbeat signal from "10.10.56.18(10.10.56.18):9999" node:10.10.56.18:9999 Linux CLWDB3
2018-07-05 10:20:11: pid 11318: DEBUG:  received watchdog packet type:M
2018-07-05 10:20:11: pid 11318: DEBUG:  reading packet type M of length 118
2018-07-05 10:20:11: pid 11318: DEBUG:  STATE MACHINE INVOKED WITH EVENT = PACKET RECEIVED Current State = STANDBY
2018-07-05 10:20:11: pid 11318: DEBUG:  received packet, watchdog node:[10.10.56.18:9999 Linux CLWDB3] command id:[34] type:[IAM COORDINATOR] state:[STANDBY]
2018-07-05 10:20:11: pid 11318: DEBUG:  sending packet, watchdog node:[10.10.56.18:9999 Linux CLWDB3] command id:[34] type:[NODE INFO] state:[STANDBY]
2018-07-05 10:20:11: pid 11318: DEBUG:  sending watchdog packet to socket:9, type:[I], command ID:34, data Length:251
2018-07-05 10:20:11: pid 11320: DEBUG:  received heartbeat signal from "10.10.56.18(10.10.56.18):9999" node:10.10.56.18:9999 Linux CLWDB3
...

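The line STATE MACHINE INVOKED WITH EVENT = PACKET RECEIVED Current State = STANDBY shows that this node is a pgpool standby. The watchdog heartbeat lines show heartbeats being sent to servers 16 and 18, and the received watchdog packet lines show the corresponding replies arriving, so the cluster is healthy.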

  • Check the log on server 18
2018-07-05 10:19:09: pid 11537: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.17:9694
2018-07-05 10:19:09: pid 11533: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.16:9694
2018-07-05 10:19:11: pid 11527: DEBUG:  STATE MACHINE INVOKED WITH EVENT = TIMEOUT Current State = MASTER
2018-07-05 10:19:11: pid 11527: DEBUG:  sending packet, watchdog node:[10.10.56.16:9999 Linux CLWDB1] command id:[28] type:[IAM COORDINATOR] state:[MASTER]
2018-07-05 10:19:11: pid 11527: DEBUG:  sending watchdog packet to socket:9, type:[M], command ID:28, data Length:118
2018-07-05 10:19:11: pid 11527: DEBUG:  sending packet, watchdog node:[10.10.56.17:9999 Linux CLWDB2] command id:[28] type:[IAM COORDINATOR] state:[MASTER]
2018-07-05 10:19:11: pid 11527: DEBUG:  sending watchdog packet to socket:11, type:[M], command ID:28, data Length:118
2018-07-05 10:19:11: pid 11527: DEBUG:  received watchdog packet type:I
2018-07-05 10:19:11: pid 11527: DEBUG:  reading packet type I of length 251
2018-07-05 10:19:11: pid 11527: DEBUG:  STATE MACHINE INVOKED WITH EVENT = PACKET RECEIVED Current State = MASTER
2018-07-05 10:19:11: pid 11527: DEBUG:  received packet, watchdog node:[10.10.56.16:9999 Linux CLWDB1] command id:[28] type:[NODE INFO] state:[MASTER]
2018-07-05 10:19:11: pid 11527: DEBUG:  packet I with command ID 28 is reply to the command M
2018-07-05 10:19:11: pid 11527: DEBUG:  Watchdog node "10.10.56.16:9999 Linux CLWDB1" has replied for command id 28
2018-07-05 10:19:11: pid 11527: DEBUG:  received watchdog packet type:I
2018-07-05 10:19:11: pid 11527: DEBUG:  reading packet type I of length 251
2018-07-05 10:19:11: pid 11527: DEBUG:  STATE MACHINE INVOKED WITH EVENT = PACKET RECEIVED Current State = MASTER
2018-07-05 10:19:11: pid 11527: DEBUG:  received packet, watchdog node:[10.10.56.17:9999 Linux CLWDB2] command id:[28] type:[NODE INFO] state:[MASTER]
2018-07-05 10:19:11: pid 11527: DEBUG:  packet I with command ID 28 is reply to the command M
2018-07-05 10:19:11: pid 11527: DEBUG:  Watchdog node "10.10.56.17:9999 Linux CLWDB2" has replied for command id 28
2018-07-05 10:19:11: pid 11527: DEBUG:  command I with command id 28 is finished with COMMAND_FINISHED_ALL_REPLIED
2018-07-05 10:19:11: pid 11527: DEBUG:  STATE MACHINE INVOKED WITH EVENT = COMMAND FINISHED Current State = MASTER
2018-07-05 10:19:11: pid 11527: DEBUG:  I am the cluster leader node command finished with status:[ALL NODES REPLIED]
2018-07-05 10:19:11: pid 11527: DETAIL:  The command was sent to 2 nodes and 2 nodes replied to it

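In the log above, id:[28] type:[IAM COORDINATOR] state:[MASTER] shows that this pgpool node is the MASTER. The watchdog heartbeat lines show heartbeats being sent to servers 16 and 17 and replies coming back, so the cluster is healthy.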
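
Reading the role out of a long debug log by eye is tedious. As a rough sketch, the most recent state can be pulled from the "Current State = ..." lines shown in the logs above; last_wd_state is a hypothetical helper name, and the log path is whatever file pgpool output was redirected to.

```shell
#!/bin/bash
# last_wd_state LOGFILE (hypothetical): prints the watchdog state from the
# last "Current State = X" entry in LOGFILE, for example STANDBY or MASTER.
last_wd_state() {
    grep -o 'Current State = [A-Z]*' "$1" | tail -n 1 | sed 's/^Current State = //'
}

# Example:
#   last_wd_state /opt/pgpool-3/pgpool.log
```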

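  • Note: because the node priority is left at its default on every pgpool node above, which node takes which role depends on the order in which the nodes were started

  • Check the VIP binding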

CLWDB3:/opt/pgpool-3 # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether fa:16:3e:8b:7c:28 brd ff:ff:ff:ff:ff:ff
    inet 10.10.56.18/24 brd 10.10.56.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.10.56.87/24 scope global secondary eth0:0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe8b:7c28/64 scope link
       valid_lft forever preferred_lft forever
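
The VIP shows up as a secondary inet address in the `ip addr` output above. A small sketch for checking this from a script follows; has_vip is a hypothetical helper name, and it reads the `ip addr` output on stdin.

```shell
#!/bin/bash
# has_vip VIP (hypothetical): reads `ip addr` output on stdin and succeeds
# if an "inet VIP/prefix" entry is present. The trailing slash in the
# pattern keeps 10.10.56.8 from matching 10.10.56.87.
has_vip() {
    grep -qF "inet $1/"
}

# Example:
#   ip addr | has_vip 10.10.56.87 && echo "VIP is bound on this host"
```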

Checking pgpool status

  • Connect to pgpool and check the database status
CLWDB1:~ # psql -h 10.10.56.87 -p 9999 -U postgres pgpool
Password for user postgres:
psql (10.3)
Type "help" for help.

pgpool=# show pool_nodes;
LOG:  statement: show pool_nodes;
 node_id |  hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 10.10.56.16 | 5532 | up     | 0.333333  | primary | 0          | true              | 0
 1       | 10.10.56.17 | 5532 | up     | 0.333333  | standby | 0          | false             | 0
 2       | 10.10.56.18 | 5532 | unused | 0.333333  | standby | 0          | false             | 0
(3 rows)

pgpool=#
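  • Parameter descriptions
-h: the VIP address
-p: the pgpool port
-U: the database user; it must exist both in the PostgreSQL database and in pgpool's authentication file
pgpool: the database to connect to

node_id: ID of the backend database node
hostname: server address
port: database port
status: state of the database; up means it is running, unused means it has started but has no connections
role: role of the database
select_cnt: number of SELECT statements counted
load_balance_node: whether this node is the load-balancing node for the current session
replication_delay: replication delay between the primary and the standby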
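
For monitoring, the status column can be counted instead of read by eye. The sketch below assumes the rows come from psql in unaligned, tuples-only mode (`-At`), so the fields are separated by `|` and status is the fourth field, as in the table above; count_up_nodes is a hypothetical helper name.

```shell
#!/bin/bash
# count_up_nodes (hypothetical): reads unaligned `show pool_nodes` rows on
# stdin (node_id|hostname|port|status|...) and prints how many are "up".
count_up_nodes() {
    awk -F'|' '$4 == "up" { n++ } END { print n + 0 }'
}

# Example:
#   psql -h 10.10.56.87 -p 9999 -U postgres -At -c 'show pool_nodes' pgpool | count_up_nodes
```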
  • View pgpool process information
pgpool=# show pool_processes;
LOG:  statement: show pool_processes;
 pool_pid |     start_time      | database | username |     create_time     | pool_counter
----------+---------------------+----------+----------+---------------------+--------------
 11530    | 2018-07-05 10:16:48 |          |          |                     |
 11550    | 2018-07-05 10:16:48 | pgpool   | postgres | 2018-07-05 10:45:41 | 1
 11551    | 2018-07-05 10:16:48 |          |          |                     |
 11552    | 2018-07-05 10:16:48 |          |          |                     |
 11557    | 2018-07-05 10:16:48 | pgpool   | postgres | 2018-07-05 11:06:22 | 1
 11558    | 2018-07-05 10:16:48 |          |          |                     |
 11562    | 2018-07-05 10:16:48 | pgpool   | postgres | 2018-07-05 11:03:16 | 1
 11563    | 2018-07-05 10:16:48 |          |          |                     |
(32 rows)
pgpool=#
  • View the pgpool configuration
pgpool=# show pool_status;
LOG:  statement: show pool_status;
pgpool=#
                item                 |                    value                    |                                   description                            
--------------------------------------+---------------------------------------------+---------------------------------------------------------------------------------
 listen_addresses                     | *                                           | host name(s) or IP address(es) to listen on
 port                                 | 9999                                        | pgpool accepting port number
 socket_dir                           | /tmp                                        | pgpool socket directory
 pcp_listen_addresses                 | *                                           | host name(s) or IP address(es) for pcp process to listen on
 pcp_port                             | 9898                                        | PCP port # to bind
 pcp_socket_dir                       | /tmp                                        | PCP socket directory
 enable_pool_hba                      | 1                                           | if true, use pool_hba.conf for client authentication
 pool_passwd                          | pool_passwd                                 | file name of pool_passwd for md5 authentication
 authentication_timeout               | 60                                          | maximum time in seconds to complete client authentication
 ssl                                  | 0                                           |
 ...
  • View the pgpool connection pools
pgpool=# show pool_pools;
LOG:  statement: show pool_pools;
 ...
 11561    | 2018-07-05 10:16:48 | 2       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 11561    | 2018-07-05 10:16:48 | 2       | 1          |          |          |                     | 0            | 0            | 0            | 0
 11561    | 2018-07-05 10:16:48 | 3       | 2          |          |          |                     | 0            | 0            | 0            | 0               | 0
 11562    | 2018-07-05 10:16:48 | 0       | 0          | pgpool   | postgres | 2018-07-05 11:03:16 | 3            | 0            | 1            | 11085           | 1
 11562    | 2018-07-05 10:16:48 | 0       | 1          | pgpool   | postgres | 2018-07-05 11:03:16 | 3            | 0            | 1            | 12724           | 1
 11562    | 2018-07-05 10:16:48 | 0       | 2          |          |          |                     | 0            | 0            | 0            | 0

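Managing pgpool with pcp

  • Background
pcp is the set of Linux command-line tools used to manage pgpool. All of their arguments changed in pgpool-Ⅱ 3.5 and later. Authentication is controlled through pcp.conf, which determines which users may connect through pcp to manage pgpool-Ⅱ.
  • pcp parameters
-h: the address of the server where pgpool is installed, or the VIP address
-d: debug mode
-U: the pcp user. This is a user configured in `pcp.conf` and is unrelated to database users. Using the same user everywhere is recommended for easier management.
-v: verbose output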
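
Since pcp users live in `pcp.conf` rather than in the database, each one needs its own entry there. The sketch below assumes the stock `username:md5hash` format, where the hash is the plain MD5 of the password (the value the pg_md5 tool shipped with pgpool prints). pcp_conf_line is a hypothetical helper name, and md5sum from GNU coreutils is assumed to be available.

```shell
#!/bin/bash
# pcp_conf_line USER PASSWORD (hypothetical): prints one pcp.conf entry in
# the form "USER:<md5 of PASSWORD>". printf '%s' avoids hashing a trailing
# newline along with the password.
pcp_conf_line() {
    local user="$1" password="$2"
    printf '%s:%s\n' "$user" "$(printf '%s' "$password" | md5sum | cut -d' ' -f1)"
}

# Example (append the printed line to /opt/pgpool-3/etc/pcp.conf):
#   pcp_conf_line pgcheck 'your_password'
```

Passing a password on the command line leaves it in the shell history, so reading it from a prompt may be preferable in practice.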
  • Check the pgpool cluster status
CLWDB3:/opt/pgpool-3/etc # pcp_watchdog_info -h 10.10.56.87 -p 9898 -U pgcheck -v
Password:
Watchdog Cluster Information
Total Nodes          : 3
Remote Nodes         : 2
Quorum state         : QUORUM EXIST
Alive Remote Nodes   : 2
VIP up on local node : YES
Master Node Name     : 10.10.56.18:9999 Linux CLWDB3
Master Host Name     : 10.10.56.18

Watchdog Node Information
Node Name      : 10.10.56.18:9999 Linux CLWDB3
Host Name      : 10.10.56.18
Delegate IP    : 10.10.56.87
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 4
Status Name    : MASTER

Node Name      : 10.10.56.16:9999 Linux CLWDB1
Host Name      : 10.10.56.16
Delegate IP    : 10.10.56.87
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 7
Status Name    : STANDBY

Node Name      : 10.10.56.17:9999 Linux CLWDB2
Host Name      : 10.10.56.17
Delegate IP    : 10.10.56.87
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 7
Status Name    : STANDBY

CLWDB3:/opt/pgpool-3/etc #

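The output above shows that server 18 is the pgpool Master and that pgpool on servers 16 and 17 is in standby. VIP up on local node : YES together with Master Node Name : 10.10.56.18 shows that the VIP is bound on server 18.

  • Check the node count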
CLWDB3:/opt/pgpool-3/etc # pcp_node_count  -h 10.10.56.87 -p 9898 -U pgcheck -v
Password:
Node Count
____________
 3
CLWDB3:/opt/pgpool-3/etc #
  • View the pgpool cluster configuration
CLWDB3:/opt/pgpool-3/etc # pcp_pool_status -h 10.10.56.87 -p 9898 -U pgcheck -v
Password:
Name [  0]:     listen_addresses
Value:          *
Description:    host name(s) or IP address(es) to listen on

Name [  1]:     port
Value:          9999
Description:    pgpool accepting port number

Name [  2]:     socket_dir
Value:          /tmp
Description:    pgpool socket directory
...
...
  • View the pgpool child process list
CLWDB3:/opt/pgpool-3/etc # pcp_proc_count -h 10.10.56.87 -p 9898 -U pgcheck -v
Password:
No       |       PID
_____________________
0        |       11530
1        |       11532
2        |       11534
3        |       11536
4        |       11538
5        |       11539
6        |       11540
7        |       11541
8        |       11542
9        |       11543
10       |       11544
11       |       11545
12       |       11546
13       |       11547
14       |       11548
15       |       11549
16       |       11550
17       |       11551
18       |       11552
19       |       11553
20       |       11554
21       |       11555
22       |       11556
23       |       11873
24       |       11558
25       |       11559
26       |       11560
27       |       11561
28       |       11562
29       |       11563
30       |       11564
31       |       11565

Total Processes:32
CLWDB3:/opt/pgpool-3/etc #

pgpool script configuration

  • On server 16, create the automatic failover script failover.sh and give it 755 permissions
CLWDB1:/opt/pgpool-3/script # vim failover.sh

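#!/bin/bash  -x
# Promote the chosen standby when the failed node is the old primary.

falling_node=$1                   # %d: failed node, passed in by pgpool
old_primary=10.10.56.16           # %P: old primary (hard-coded here)
new_primary=10.10.56.17           # %H: standby to promote (hard-coded here)
pgdata=/pgdata/pgpool/data2       # %R: data directory on the standby; must match its real path

pghome=/opt/pgsql-10
log=/opt/pgpool-3/log/pgpool/failover.log

date >> "$log"
echo "failed_node_id=$falling_node new_primary=$new_primary" >> "$log"

# The promotion only runs when $1 equals old_primary exactly, so both values
# must use the same form (both node IDs or both host addresses).
if [ "$falling_node" = "$old_primary" ]; then
    if [ "$UID" -eq 0 ]
    then
        # Running as root: switch to the postgres user before using ssh
        su postgres -c "ssh -T postgres@$new_primary $pghome/bin/pg_ctl promote -D $pgdata"
    else
        ssh -T postgres@$new_primary $pghome/bin/pg_ctl promote -D $pgdata
    fi
    exit 0;
fi;
exit 0;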

CLWDB1:/opt/pgpool-3/script # vim failover.sh
CLWDB1:/opt/pgpool-3/script # chmod 755 failover.sh
  • Create failover.sh on servers 17 and 18 and give it 755 permissions. The script is the same as above; the main change is new_primary
Server 17: new_primary = 10.10.56.17
Server 18: new_primary = 10.10.56.18
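
The script above hard-codes the old and new primary separately on each server. As a hedged alternative sketch, the same decision can be driven entirely by arguments. This assumes that failover_command in pgpool.conf passes the placeholders %d, %P, %H, and %R in that order, which is not shown in this section. failover_action is a hypothetical name, and the function only prints the command it would run, so it can be tried safely.

```shell
#!/bin/bash
# failover_action FAILED_ID OLD_PRIMARY_ID NEW_HOST PGDATA   (hypothetical)
# Dry run: prints the promote command when the failed node was the primary,
# or "noop" when a standby failed. Nothing is executed.
failover_action() {
    local failed_id="$1" old_primary_id="$2" new_host="$3" pgdata="$4"
    local pghome=/opt/pgsql-10   # install prefix used throughout this guide
    if [ "$failed_id" = "$old_primary_id" ]; then
        echo "ssh -T postgres@$new_host $pghome/bin/pg_ctl promote -D $pgdata"
    else
        echo "noop"
    fi
}

# Example:
#   failover_action 0 0 10.10.56.17 /pgdata/pgpool/data
```

Comparing node IDs with node IDs avoids the mismatch that occurs when a node ID is compared with an IP address.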
  • On each of servers 16, 17, and 18, create the first-stage online recovery script restore_1st.sh and give it 755 execute permissions
CLWDB1:/opt/pgpool-3/script # vim restore_1st.sh
CLWDB1:/opt/pgpool-3/script # chmod 755 restore_1st.sh
CLWDB1:/opt/pgpool-3/script # ll
total 0
-rwxr-xr-x 1 root root 0 Jul  5 15:04 restore_1st.sh
  • On each of servers 16, 17, and 18, create the second-stage online recovery script restore_2st.sh and give it 755 execute permissions
CLWDB1:/opt/pgpool-3/script # vim restore_2st.sh
CLWDB1:/opt/pgpool-3/script # chmod 755 restore_2st.sh
CLWDB1:/opt/pgpool-3/script # ll
total 0
-rwxr-xr-x 1 root root 0 Jul  5 15:04 restore_1st.sh
-rwxr-xr-x 1 root root 0 Jul  5 15:05 restore_2st.sh
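  • The recovery scripts have not been tested yet and will be filled in later
  • On servers 16, 17, and 18, create .pgpass under /home/postgres so that the commands run during failover and recovery do not stop to ask for a password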
CLWDB1:~ # vim .pgpass

10.10.56.16:5532:replication:repl:123456
10.10.56.17:5532:replication:repl:123456
10.10.56.18:5532:replication:repl:123456

CLWDB1:~ # chmod 700 .pgpass
CLWDB1:~ #
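
libpq ignores a password file that is readable by group or others, so the permissions are worth checking after creating it; `chmod 0600` is the conventional setting. A minimal sketch of such a check follows. pgpass_mode_ok is a hypothetical helper name, and `stat -c` assumes GNU stat as found on Linux.

```shell
#!/bin/bash
# pgpass_mode_ok FILE (hypothetical): succeeds when FILE grants no
# permissions to group or others (octal mode ends in "00").
# Returns 2 if the file cannot be inspected.
pgpass_mode_ok() {
    local mode
    mode=$(stat -c '%a' "$1") || return 2
    case "$mode" in
        0|*00) return 0 ;;   # e.g. 600, 700, 400
        *)     return 1 ;;   # e.g. 644, 640
    esac
}

# Example:
#   pgpass_mode_ok /home/postgres/.pgpass || chmod 0600 /home/postgres/.pgpass
```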

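Configure mutual SSH trust between servers 16, 17, and 18, using the procedure below as a reference.

Server trust
  • On server A, generate the private and public keys: go to the .ssh directory, run ssh-keygen -t rsa, and press Enter at every prompt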
[root@rtm1 ~]# cd ~/.ssh/
[root@rtm1 .ssh]# ls
known_hosts
[root@rtm1 .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:fmPDpqBGqNl7XBb9qh7K+pb0nNL46B352uNp7LjEvNw root@rtm1
The key's randomart image is:
+---[RSA 2048]----+
|                 |
|                 |
|       .         |
|      . .        |
|   .   .S.       |
|  . oooo ..      |
| + + OOo..B      |
|o ..OB=X=* o     |
|  oOB+@BE.       |
+----[SHA256]-----+
[root@rtm1 .ssh]#
[root@rtm1 .ssh]# ls
id_rsa  id_rsa.pub  known_hosts
[root@rtm1 .ssh]#

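id_rsa.pub is the public key, which is copied to the servers that should accept logins. id_rsa is the private key, which stays on server A and is used to prove its identity.

  • On server A, add the key to the agent; Identity added means it succeeded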
[root@rtm1 .ssh]# ssh-agent bash
[root@rtm1 .ssh]# ssh-add id_rsa
Identity added: id_rsa (id_rsa)
[root@rtm1 .ssh]# ls
id_rsa  id_rsa.pub  known_hosts
  • Upload server A's public key to the ~/.ssh/ directory on server B
[root@rtm1 .ssh]# scp id_rsa.pub root@192.168.31.121:~/.ssh/
root@192.168.31.121's password:
id_rsa.pub                                                                                                                    100%  391   502.2KB/s   00:00
[root@rtm1 .ssh]#
  • On server B, add server A's public key
[root@rtm2 ~]# cd ~/.ssh/
[root@rtm2 .ssh]# cat id_rsa.pub >> authorized_keys
[root@rtm2 .ssh]# ls
authorized_keys  id_rsa.pub  known_hosts
[root@rtm2 .ssh]# cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDr8QU4fFqkZUCF9mpYmyhX3DgKF/ue4PqPKiyHCaYMpVa2SjFtDiGH/gEXdSJIjOAp1qpn9UDea2vNIIRxtRex1X9hkDId8zVvlgaByPNoLjZnOeLKncbbldSJYUsjCQ0I3eTdI0REssV8KoH45SNsD39CO7R3Ts+3fRX648C6ITBdU2a6bUckworY6oECR4FhpMMAXSkFbzn2BoRANTVPpLarr59+sXyB2bWSvOubq6UbVhGJd+2tUtNG90yW8JtEbjOjzHNOSS4Cwyfq0Zcm3SDKDe6AjvIHyZFaScY4T6UA2FBXHSrrZqZuO74LXyM4JQAwcmL3pjReAMLUchHP root@rtm1
[root@rtm2 .ssh]#
  • Test whether server B now trusts server A by connecting from server A with ssh; if no password is requested, it worked.
[root@rtm1 .ssh]# ssh 192.168.31.121
Last login: Sun May 20 15:16:17 2018 from 192.168.31.42
[root@rtm2 ~]#
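  • Note
The connection succeeds, which confirms that passwordless login from A to B works. The copy of A's public key file `id_rsa.pub` on server B can now be deleted.


To let server B log in to server A as well, repeat the steps above with the roles swapped. The two servers will then trust each other.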
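
Repeating these steps across three servers makes it easy to append the same key twice. The sketch below adds a public key to authorized_keys only if it is not already there; append_key_once is a hypothetical helper name. Where the ssh-copy-id tool from OpenSSH is installed, it covers the copy and append steps in one command.

```shell
#!/bin/bash
# append_key_once PUBKEY_FILE AUTHORIZED_KEYS_FILE   (hypothetical)
# Appends the key line from PUBKEY_FILE unless an identical line is already
# present, then restricts the file to its owner.
append_key_once() {
    local pub="$1" auth="$2" key
    key=$(cat "$pub") || return 1
    touch "$auth"
    grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
    chmod 600 "$auth"
}

# Example, run on server B after the public key has been copied over:
#   append_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```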
  • Create the automatic failover script
CLWDB1:/opt/pgpool-3/script # vim failover.sh
CLWDB1:/opt/pgpool-3/script # chmod 755 failover.sh
CLWDB1:/opt/pgpool-3/script # ll

-rwxr-xr-x 1 root root 0 Jul  5 16:36 failover.sh
  • The content of failover.sh is as follows
CLWDB1:/opt/pgpool-3/script # vim failover.sh

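#!/bin/bash  -x
# Promote the chosen standby when the failed node is the old primary.

falling_node=$1                   # %d: failed node, passed in by pgpool
old_primary=10.10.56.16           # %P: old primary (hard-coded here)
new_primary=10.10.56.17           # %H: standby to promote (hard-coded here)
pgdata=/pgdata/pgpool/data2       # %R: data directory on the standby; must match its real path

pghome=/opt/pgsql-10
log=/opt/pgpool-3/log/pgpool/failover.log

date >> "$log"
echo "failed_node_id=$falling_node new_primary=$new_primary" >> "$log"

# The promotion only runs when $1 equals old_primary exactly, so both values
# must use the same form (both node IDs or both host addresses).
if [ "$falling_node" = "$old_primary" ]; then
    if [ "$UID" -eq 0 ]
    then
        # Running as root: switch to the postgres user before using ssh
        su postgres -c "ssh -T postgres@$new_primary $pghome/bin/pg_ctl promote -D $pgdata"
    else
        ssh -T postgres@$new_primary $pghome/bin/pg_ctl promote -D $pgdata
    fi
    exit 0;
fi;
exit 0;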

Testing pgpool cluster high availability

  • Check the current pgpool cluster status
CLWDB3:~ # pcp_watchdog_info -h 10.10.56.87 -p 9898 -U pgcheck
Password:
3 YES 10.10.56.16:9999 Linux CLWDB1 10.10.56.16

10.10.56.16:9999 Linux CLWDB1 10.10.56.16 9999 9000 4 MASTER
10.10.56.17:9999 Linux CLWDB2 10.10.56.17 9999 9000 7 STANDBY
10.10.56.18:9999 Linux CLWDB3 10.10.56.18 9999 9000 7 STANDBY
CLWDB3:~ #

The output above shows that 16 is the pgpool Master and that servers 17 and 18 are pgpool Standbys.
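
The compact pcp_watchdog_info output is easy to parse in a script. As a sketch, the master's host can be picked out by matching the MASTER status name at the end of a node line and counting fields back from the end. wd_master_host is a hypothetical helper name, and the field positions are inferred from the output shown above.

```shell
#!/bin/bash
# wd_master_host (hypothetical): reads non-verbose pcp_watchdog_info output
# on stdin and prints the host of the node whose status name is MASTER.
# Node lines end with: HOST PGPOOL_PORT WD_PORT STATUS STATUS_NAME,
# so the host is the fifth field from the end.
wd_master_host() {
    awk '$NF == "MASTER" { print $(NF-4) }'
}

# Example:
#   pcp_watchdog_info -h 10.10.56.87 -p 9898 -U pgcheck | wd_master_host
```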

  • Check the PostgreSQL cluster status
pgpool=# show pool_nodes;
 node_id |  hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 10.10.56.16 | 5532 | up     | 0.333333  | primary | 8          | false             | 0
 1       | 10.10.56.17 | 5532 | up     | 0.333333  | standby | 0          | false             | 0
 2       | 10.10.56.18 | 5532 | up     | 0.333333  | standby | 0          | true              | 0
(3 rows)

pgpool=#

All database nodes are in the up state, so the cluster is healthy.

  • Check the VIP binding on server 16, the pgpool Master
CLWDB1:~ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether fa:16:3e:93:70:27 brd ff:ff:ff:ff:ff:ff
    inet 10.10.56.16/24 brd 10.10.56.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.10.56.87/24 scope global secondary eth0:0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe93:7027/64 scope link
       valid_lft forever preferred_lft forever
CLWDB1:~ #

The output above shows that the VIP is bound on server 16, the pgpool Master.

  • Kill the pgpool service on server 16 to simulate a pgpool server failure
CLWDB1:~ # killall pgpool
CLWDB1:~ # ps -ef| grep pgpool
postgres 10614     1  0 Jul02 ?        00:00:32 /opt/pgsql-10/bin/postgres -D /pgdata/pgpool/data
root     26306  5040  0 09:45 pts/3    00:00:00 grep --color=auto pgpool
CLWDB1:~ #
  • Check the pgpool cluster status
CLWDB2:~ # pcp_watchdog_info -h 10.10.56.87 -p 9898 -U pgcheck
Password:
3 YES 10.10.56.17:9999 Linux CLWDB2 10.10.56.17

10.10.56.17:9999 Linux CLWDB2 10.10.56.17 9999 9000 4 MASTER
10.10.56.16:9999 Linux CLWDB1 10.10.56.16 9999 9000 10 SHUTDOWN
10.10.56.18:9999 Linux CLWDB3 10.10.56.18 9999 9000 7 STANDBY
CLWDB2:~ #

The output above shows that after pgpool on server 16 went down, 17 was elected pgpool Master.

  • Check the VIP binding
CLWDB2:~ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether fa:16:3e:f5:22:50 brd ff:ff:ff:ff:ff:ff
    inet 10.10.56.17/24 brd 10.10.56.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.10.56.87/24 scope global secondary eth0:0
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fef5:2250/64 scope link
       valid_lft forever preferred_lft forever
CLWDB2:~ #

The VIP has been removed from server 16 and is now bound on server 17, the new pgpool Master.

  • Check the database status
pgpool=# show pool_nodes;
 node_id |  hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 10.10.56.16 | 5532 | up     | 0.333333  | primary | 9          | true              | 0
 1       | 10.10.56.17 | 5532 | up     | 0.333333  | standby | 0          | false             | 0
 2       | 10.10.56.18 | 5532 | up     | 0.333333  | standby | 3          | false             | 0
(3 rows)

pgpool=#

The databases are all still up and can continue to serve clients.

  • Now kill pgpool on server 17 as well and check the pgpool status

CLWDB3:/opt/pgpool-3 # tail -f pgpool.log

2018-07-06 14:29:17: pid 14815: DEBUG:  watchdog life checking by heartbeat
2018-07-06 14:29:17: pid 14815: DETAIL:  checking pgpool 1 (10.10.56.16:9999)
2018-07-06 14:29:17: pid 14815: DEBUG:  watchdog checking if pgpool is alive using heartbeat
2018-07-06 14:29:17: pid 14815: DETAIL:  the last heartbeat from "10.10.56.16:9999" received 203 seconds ago
2018-07-06 14:29:17: pid 14815: DEBUG:  watchdog life checking by heartbeat
2018-07-06 14:29:17: pid 14815: DETAIL:  checking pgpool 2 (10.10.56.17:9999)
2018-07-06 14:29:17: pid 14815: DEBUG:  watchdog checking if pgpool is alive using heartbeat
2018-07-06 14:29:17: pid 14815: DETAIL:  the last heartbeat from "10.10.56.17:9999" received 113 seconds ago
2018-07-06 14:29:18: pid 14817: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.16:9694
2018-07-06 14:29:18: pid 14819: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.17:9694
2018-07-06 14:29:20: pid 14817: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.16:9694
2018-07-06 14:29:20: pid 14819: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.17:9694
2018-07-06 14:29:22: pid 14817: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.16:9694
2018-07-06 14:29:22: pid 14819: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.17:9694
2018-07-06 14:29:24: pid 14817: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.16:9694
2018-07-06 14:29:24: pid 14819: DEBUG:  watchdog heartbeat: send heartbeat signal to 10.10.56.17:9694
...
...

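The remaining pgpool can no longer reach any other pgpool node, and the cluster stops working. In other words, at least two of the three pgpool nodes must stay alive for high availability to hold.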
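
The two-out-of-three requirement follows from simple majority arithmetic. The sketch below assumes the watchdog needs a strict majority of all configured nodes, which matches the behavior observed here with three nodes. has_quorum is a hypothetical helper name, not a pgpool command.

```shell
#!/bin/bash
# has_quorum TOTAL ALIVE (hypothetical): succeeds when ALIVE nodes form a
# strict majority of TOTAL nodes, i.e. ALIVE >= floor(TOTAL / 2) + 1.
has_quorum() {
    local total="$1" alive="$2"
    [ "$alive" -ge $(( total / 2 + 1 )) ]
}

# Example: with 3 nodes, 2 alive keeps quorum and 1 alive loses it.
#   has_quorum 3 2 && echo "quorum holds"
#   has_quorum 3 1 || echo "quorum lost"
```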

Testing pgpool load balancing

  • Check the status and node id of each database node
pgpool=# show pool_nodes;
LOG:  statement: show pool_nodes;
 node_id |  hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 10.10.56.16 | 5532 | up     | 0.333333  | primary | 5          | false             | 0
 1       | 10.10.56.17 | 5532 | up     | 0.333333  | standby | 3          | true              | 0
 2       | 10.10.56.18 | 5532 | up     | 0.333333  | standby | 0          | false             | 0
(3 rows)

pgpool=#
  • Connect to the service and run a SELECT statement
pgpool=# select * from pg2 where id =1;
LOG:  statement: select * from pg2 where id =1;
LOG:  DB node id: 1 backend pid: 762 statement: select * from pg2 where id =1;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-03 13:19:31.085511
(1 row)

pgpool=#

The statement above was routed to node id 1, which the node list maps to server 17.

  • Check the PostgreSQL log on server 16
2018-07-06 15:45:38.119 CST,"repl","postgres",5802,"10.10.56.17:46440",5b3f1e22.16aa,1,"idle",2018-07-06 15:45:38 CST,7/31217,0,LOG,00000,"statement: SELECT pg_current_wal_lsn()",,,,,,,,,""
2018-07-06 15:45:41.034 CST,"repl","postgres",5804,"10.10.56.18:42586",5b3f1e25.16ac,1,"idle",2018-07-06 15:45:41 CST,7/31220,0,LOG,00000,"statement: SELECT pg_current_wal_lsn()",,,,,,,,,""
  • Check the PostgreSQL log on server 17
2018-07-06 15:45:38.121 CST,"repl","postgres",915,"10.10.56.17:58760",5b3f1e22.393,1,"idle",2018-07-06 15:45:38 CST,4/14364,0,LOG,00000,"statement: SELECT pg_last_wal_replay_lsn()",,,,,,,,,""
2018-07-06 15:45:39.787 CST,"postgres","pgpool",762,"10.10.56.18:37960",5b3f1d20.2fa,3,"idle",2018-07-06 15:41:20 CST,2/33694,0,LOG,00000,"statement: select * from pg2 where id =1;",,,,,,,,,"psql"
  • Check the PostgreSQL log on server 18
2018-07-06 15:45:38.119 CST,"repl","postgres",5802,"10.10.56.17:46440",5b3f1e22.16aa,1,"idle",2018-07-06 15:45:38 CST,7/31217,0,LOG,00000,"statement: SELECT pg_current_wal_lsn()",,,,,,,,,""
2018-07-06 15:45:41.034 CST,"repl","postgres",5804,"10.10.56.18:42586",5b3f1e25.16ac,1,"idle",2018-07-06 15:45:41 CST,7/31220,0,LOG,00000,"statement: SELECT pg_current_wal_lsn()",,,,,,,,,""

The SELECT was indeed routed to the standby on server 17 and executed there.

  • Test 2: open a new connection and run the SELECT again
pgpool=# select * from pg2 where id =1;
LOG:  statement: select * from pg2 where id =1;
LOG:  DB node id: 2 backend pid: 23249 statement: select * from pg2 where id =1;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-03 13:19:31.085511
(1 row)

pgpool=# select * from pg2 where id = 1;
LOG:  statement: select * from pg2 where id = 1;
LOG:  DB node id: 2 backend pid: 23249 statement: select * from pg2 where id = 1;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-03 13:19:31.085511
(1 row)

pgpool=#

This time the SELECT was routed to node id 2, the standby on server 18, and further statements in the same session stay on that node.

  • Check the log on server 18
2018-07-06 15:51:17.468 CST,"postgres","pgpool",23249,"10.10.56.18:45876",5b3f1f66.5ad1,1,"idle",2018-07-06 15:51:02 CST,3/293,0,LOG,00000,"statement: select * from pg2 where id =1;",,,,,,,,,"psql"
2018-07-06 15:51:18.937 CST,"repl","postgres",23255,"10.10.56.17:34056",5b3f1f76.5ad7,1,"idle",2018-07-06 15:51:18 CST,4/10,0,LOG,00000,"statement: SELECT pg_last_wal_replay_lsn()",,,,,,,,,""

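The SELECT was indeed executed on the database on server 18. The logs on 16 and 17 look the same as before and show that the statement did not run there.

  • Open another new session and run a SELECT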
pgpool=# select * from pgpool where id = 1;
LOG:  statement: select * from pgpool where id = 1;
LOG:  Unable to parse the query: "select * from pgpool where id = 1;" from client CLWDB3(41428)
LOG:  DB node id: 0 backend pid: 6155 statement: select * from pgpool where id = 1;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-02 15:07:03.329849
(1 row)

pgpool=#

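The query was routed to node id 0, the database on server 16. Note that pgpool also logged Unable to parse the query for this statement, so this routing may reflect a fallback to the primary rather than a load-balancing decision.

  • Check the log on server 16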
2018-07-06 15:57:17.773 CST,"postgres","pgpool",6155,"10.10.56.18:43020",5b3f20d2.180b,1,"idle",2018-07-06 15:57:06 CST,7/31566,0,LOG,00000,"statement: select * from pgpool where id = 1;",,,,,,,,,"psql"
2018-07-06 15:57:19.803 CST,"repl","postgres",6161,"10.10.56.17:47070",5b3f20df.1811,1,"idle",2018-07-06 15:57:19 CST,8/1364,0,LOG,00000,"statement: SELECT pg_current_wal_lsn()",,,,,,,,,""
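  • Test result 1
Because every node has the same weight in pgpool, a `query SQL statement` is equally likely
to be routed to each node. pgpool's load balancing is session-based: whichever node a
connection is assigned to first keeps receiving that session's SELECT statements until the session ends.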
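
To check the distribution over many sessions rather than three, the "DB node id: N" lines shown in the sessions above can be tallied. This is a rough sketch; tally_nodes is a hypothetical helper name, and it reads any captured text containing those lines on stdin.

```shell
#!/bin/bash
# tally_nodes (hypothetical): reads log text on stdin and prints
# "NODE_ID COUNT" for each backend that executed a statement,
# sorted by node id.
tally_nodes() {
    grep -o 'DB node id: [0-9]*' \
        | awk '{ c[$NF]++ } END { for (n in c) print n, c[n] }' \
        | sort -n
}

# Example:
#   tally_nodes < /opt/pgpool-3/pgpool.log
```

With equal weights, the counts for SELECT statements should even out across nodes as the number of sessions grows, while writes always add to the primary's count.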
  • Test an INSERT statement
CLWDB3:/opt/pgpool-3 # psql -h 10.10.56.87 -p 9999 -U postgres pgpool
Password for user postgres:
psql (10.3)
Type "help" for help.

pgpool=# insert into pgpool (age) values(2);
LOG:  statement: insert into pgpool (age) values(2);
LOG:  Unable to parse the query: "insert into pgpool (age) values(2);" from client CLWDB3(41892)
LOG:  DB node id: 0 backend pid: 6429 statement: insert into pgpool (age) values(2);
INSERT 0 1
pgpool=# select * from pgpool;
LOG:  statement: select * from pgpool;
LOG:  Unable to parse the query: "select * from pgpool;" from client CLWDB3(41892)
LOG:  DB node id: 0 backend pid: 6429 statement: select * from pgpool;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-02 15:07:03.329849
 17 |   2 | 2018-07-06 16:09:36.44314
(2 rows)

pgpool=#

The INSERT was routed to node id 0, server 16, which is the primary database server.

  • Check the database log on server 16
2018-07-06 16:09:36.443 CST,"postgres","pgpool",6429,"10.10.56.18:43318",5b3f22ab.191d,3,"idle",2018-07-06 16:04:59 CST,5/101756,0,LOG,00000,"statement: insert into pgpool (age) values(2);",,,,,,,,,"psql"
2018-07-06 16:09:37.188 CST,"repl","postgres",6570,"10.10.56.16:53638",5b3f23c1.19aa,1,"idle",2018-07-06 16:09:37 CST,8/1843,0,LOG,00000,"statement: SELECT pg_current_wal_lsn()",,,,,,,,,""

The INSERT was indeed executed on the primary.

  • Open a new connection, run a SELECT first, and then run an INSERT
pgpool=# select * from pg2;
LOG:  statement: select * from pg2;
LOG:  DB node id: 2 backend pid: 23745 statement: select * from pg2;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-03 13:19:31.085511
  2 |   2 | 2018-07-06 16:07:30.1766
(2 rows)

The SELECT SQL was routed to node id 2, i.e. the standby on server 18.

  • Check the database log on server 18
2018-07-06 16:16:36.963 CST,"postgres","pgpool",23745,"10.10.56.18:46788",5b3f24f4.5cc1,3,"idle",2018-07-06 16:14:44 CST,5/316,0,LOG,00000,"statement: select * from pg2;",,,,,,,,,"psql"
2018-07-06 16:16:38.162 CST,"repl","postgres",23815,"10.10.56.16:57740",5b3f2566.5d07,1,"idle",2018-07-06 16:16:38 CST,6/58,0,LOG,00000,"statement: SELECT pg_last_wal_replay_lsn()",,,,,,,,,""

The SELECT SQL was indeed routed to server 18.

  • Continue in the current session and run INSERT SQL
pgpool=# insert into pg2 values (4,4,now());
LOG:  statement: insert into pg2 values (4,4,now());
LOG:  DB node id: 0 backend pid: 6729 statement: insert into pg2 values (4,4,now());
INSERT 0 1
pgpool=#

The INSERT SQL was routed to node id 0, i.e. the primary on server 16.

  • Check the database log on server 16
2018-07-06 16:21:36.766 CST,"postgres","pgpool",6729,"10.10.56.18:43704",5b3f24f4.1a49,12,"idle",2018-07-06 16:14:44 CST,8/1897,0,LOG,00000,"statement: insert into pg2 values (4,4,now());",,,,,,,,,"psql"
2018-07-06 16:21:38.849 CST,"repl","postgres",6981,"10.10.56.16:54286",5b3f2692.1b45,1,"idle",2018-07-06 16:21:38 CST,5/101860,0,LOG,00000,"statement: SELECT pg_current_wal_lsn()",,,,,,,,,""
  • Run SELECT SQL again in the current session
LOG:  statement: select * from pg2;
LOG:  DB node id: 2 backend pid: 23745 statement: select * from pg2;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-03 13:19:31.085511
  2 |   2 | 2018-07-06 16:07:30.1766
  3 |   3 | 2018-07-06 16:16:53.937526
  4 |   4 | 2018-07-06 16:21:36.766792
(4 rows)

pgpool=#

The SELECT SQL was again routed to node id 2, i.e. the database on server 18.

  • Check the database log on server 18
2018-07-06 16:24:19.324 CST,"postgres","pgpool",23745,"10.10.56.18:46788",5b3f24f4.5cc1,4,"idle",2018-07-06 16:14:44 CST,5/317,0,LOG,00000,"statement: select * from pg2;",,,,,,,,,"psql"
2018-07-06 16:24:25.274 CST,"repl","postgres",23959,"10.10.56.17:35838",5b3f2739.5d97,1,"idle",2018-07-06 16:24:25 CST,2/336,0,LOG,00000,"statement: SELECT pg_last_wal_replay_lsn()",,,,,,,,,""
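  • Test result 2
 The insert test above shows that, whichever server the current session was first assigned to, INSERT SQL is always
 routed to the primary on server 16, while SELECT SQL goes to the session's load-balancing node, which is picked at random (by weight) when the session starts.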
  • Test UPDATE SQL
pgpool=# select * from pg2;
LOG:  statement: select * from pg2;
LOG:  DB node id: 1 backend pid: 2347 statement: select * from pg2;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-03 13:19:31.085511
  2 |   2 | 2018-07-06 16:07:30.1766
  3 |   3 | 2018-07-06 16:16:53.937526
  4 |   4 | 2018-07-06 16:21:36.766792
(4 rows)

pgpool=# update pg2 set age = 55 where id = 2;
LOG:  statement: update pg2 set age = 55 where id = 2;
LOG:  DB node id: 0 backend pid: 7317 statement: update pg2 set age = 55 where id = 2;
UPDATE 1
pgpool=# select * from pg2;
LOG:  statement: select * from pg2;
LOG:  DB node id: 1 backend pid: 2347 statement: select * from pg2;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-03 13:19:31.085511
  3 |   3 | 2018-07-06 16:16:53.937526
  4 |   4 | 2018-07-06 16:21:36.766792
  2 |  55 | 2018-07-06 16:07:30.1766
(4 rows)
pgpool=#
  • Test result 3: UPDATE SQL is automatically routed to the primary on server 16

  • Test DELETE SQL

pgpool=# select * from pg2;
LOG:  statement: select * from pg2;
LOG:  DB node id: 1 backend pid: 2347 statement: select * from pg2;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-03 13:19:31.085511
  3 |   3 | 2018-07-06 16:16:53.937526
  4 |   4 | 2018-07-06 16:21:36.766792
  2 |  55 | 2018-07-06 16:07:30.1766
(4 rows)

pgpool=# delete from pg2 where id != 1;
LOG:  statement: delete from pg2 where id != 1;
LOG:  DB node id: 0 backend pid: 7317 statement: delete from pg2 where id != 1;
DELETE 3
pgpool=# select * from pg2;
LOG:  statement: select * from pg2;
LOG:  DB node id: 1 backend pid: 2347 statement: select * from pg2;
 id | age |         inserttime
----+-----+----------------------------
  1 |   1 | 2018-07-03 13:19:31.085511
(1 row)

pgpool=#
  • Test result 4: DELETE SQL is automatically routed to the primary on server 16
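
The INSERT, UPDATE and DELETE results above can also be checked mechanically against the pgpool log. The following is a minimal sketch under stated assumptions: check_writes_on_primary is a hypothetical helper name, the log lines are assumed to have the "DB node id: N backend pid: P statement: ..." form seen in this article, and only statements that begin with insert, update or delete are treated as writes.

```shell
# Report write statements that were NOT sent to the given primary node id.
# Usage: check_writes_on_primary PRIMARY_NODE_ID < pgpool.log
# Exit status is 0 when every write went to the primary, 1 otherwise.
# (hypothetical helper; the log format is assumed from this article)
check_writes_on_primary() {
  awk -v primary="$1" '
    /DB node id: [0-9]+ backend pid: [0-9]+ statement: / {
      # Extract the node id and the statement text from the log line
      node = $0; sub(/.*DB node id: /, "", node); sub(/ .*/, "", node)
      stmt = $0; sub(/.*statement: */, "", stmt)
      if (tolower(stmt) ~ /^(insert|update|delete)/ && node != primary) {
        print "misrouted: node " node ": " stmt
        bad = 1
      }
    }
    END { exit bad + 0 }
  '
}
```

With the logs from these tests, passing 0 as the primary node id should print nothing, since every write was executed on node id 0 (server 16).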

  • The pgpool log can be viewed with journalctl -a | grep pgpool

Testing pgpool-Ⅱ load balancing to a specified node

  • On servers 16, 17 and 18, edit the pgpool configuration file pgpool.conf as follows
database_redirect_preference_list = 'test:1'
  • Reload the pgpool configuration
CLWDB2:/opt/pgpool-3 # pgpool reload
  • Check the current load-balancing state
test=# show pool_nodes;
 node_id |  hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 10.10.56.16 | 5532 | up     | 0.000000  | primary | 0          | false             | 0
 1       | 10.10.56.17 | 5532 | up     | 0.363636  | standby | 4          | true              | 0
 2       | 10.10.56.18 | 5532 | up     | 0.636364  | standby | 6          | false             | 0
(3 rows)

test=#

load_balance_node is now true for server 17, which means query statements will be load-balanced to that server.
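
Reading the load-balancing node off the table by eye works for three nodes; a small filter can do the same for scripted checks. This is only a sketch: load_balance_host is a hypothetical name, and the column positions (hostname in column 2, load_balance_node in column 8) are assumed from the show pool_nodes output of the pgpool version used in this article, so they may differ in other versions.

```shell
# Print the hostname of every row whose load_balance_node column is "true".
# Reads the aligned table printed by `show pool_nodes;` on stdin.
# (hypothetical helper; column positions assumed from the output in this article)
load_balance_host() {
  awk -F'|' '
    NF >= 8 {
      host = $2; lb = $8
      gsub(/ /, "", host); gsub(/ /, "", lb)   # strip the padding spaces
      if (lb == "true") print host
    }
  '
}
```

For example, the output of psql -h 10.10.56.87 -p 9999 -U postgres test -c 'show pool_nodes;' can be piped into load_balance_host. Note that load_balance_node describes the session that ran show pool_nodes, so the value reflects that connection only.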

  • Connect to pgpool and run SELECT SQL
test=# select * from sr where id = 2;
 id | age |         inserttime
----+-----+----------------------------
  2 |   2 | 2018-07-09 20:28:03.358615
(1 row)

test=#
  • Check the log on server 17
2018-07-09 21:18:15.066 CST,"postgres","test",9036,"10.10.56.17:44772",5b435e90.234c,2,"idle",2018-07-09 21:09:36 CST,2/5984,0,LOG,00000,"statement: select * from sr where id = 2;",,,,,,,,,"psql"

The SELECT SQL was load-balanced to server 17.

  • UPDATE SQL
test=# update sr set age = 1 where id = 1;
UPDATE 1
  • Check the log
2018-07-09 21:14:54.371 CST,"postgres","test",30749,"10.10.56.17:60684",5b435e90.781d,9,"idle",2018-07-09 21:09:36 CST,5/7015,0,LOG,00000,"statement: update sr set age = 1 where id = 1;",,,,,,,,,"psql"

The UPDATE SQL was routed to the master on server 16.

Testing PostgreSQL primary/standby failover

  • Check the current database state
CLWDB1:/opt/pgpool-3 # psql -h 10.10.56.87 -p 9999 -U postgres test
Password for user postgres:
psql (10.3)
Type "help" for help.

test=# show pool_nodes;
 node_id |  hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 10.10.56.16 | 5532 | up     | 0.000000  | primary | 0          | false             | 0
 1       | 10.10.56.17 | 5532 | up     | 0.363636  | standby | 8          | true              | 0
 2       | 10.10.56.18 | 5532 | up     | 0.636364  | standby | 6          | false             | 0
(3 rows)

test=#
  • Stop the primary database on server 16 to simulate a primary outage
postgres@CLWDB1:/pgdata/pgpool/data2> pg_ctl -D /pgdata/pgpool/data2/ stop
waiting for server to shut down.... done
server stopped
postgres@CLWDB1:/pgdata/pgpool/data2>
  • Connect to pgpool and check the database state
CLWDB1:/opt/pgpool-3 # psql -h 10.10.56.87 -p 9999 -U postgres test
Password for user postgres:
psql (10.3)
Type "help" for help.

test=# show pool_nodes;
 node_id |  hostname   | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 10.10.56.16 | 5532 | down   | 0.000000  | standby | 0          | false             | 0
 1       | 10.10.56.17 | 5532 | up     | 0.363636  | primary | 8          | true              | 0
 2       | 10.10.56.18 | 5532 | up     | 0.636364  | standby | 6          | false             | 328
(3 rows)

test=#

The Primary database on server 16 has stopped serving, and server 17 has been successfully promoted to primary.
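
The same check can be scripted when a failover test is repeated. The sketch below is an illustration rather than a pgpool feature: up_primary is a hypothetical name, and the column positions (hostname 2, status 4, role 6) are assumed from the show pool_nodes output shown in this article.

```shell
# Print the hostname of the node that is both "up" and "primary" in
# `show pool_nodes;` output read from stdin; exit status 1 if there is none.
# (hypothetical helper; column positions assumed from the output in this article)
up_primary() {
  awk -F'|' '
    NF >= 6 {
      host = $2; status = $4; role = $6
      gsub(/ /, "", host); gsub(/ /, "", status); gsub(/ /, "", role)
      if (status == "up" && role == "primary") { print host; found = 1 }
    }
    END { exit !found }
  '
}
```

Run against the table above it prints 10.10.56.17. Because it returns a non-zero status while no primary is up, it could be used in a retry loop around the psql call to wait for the promotion to finish.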

  • Run SELECT SQL and UPDATE SQL
test=# select * from sr;
 id | age |         inserttime
----+-----+----------------------------
  2 |   2 | 2018-07-09 20:28:03.358615
  1 |   1 | 2018-07-09 20:27:51.981741
(2 rows)

test=# update sr set age = 3 where id = 1;
UPDATE 1
test=# select * from sr;
 id | age |         inserttime
----+-----+----------------------------
  2 |   2 | 2018-07-09 20:28:03.358615
  1 |   3 | 2018-07-09 20:27:51.981741
(2 rows)

test=#

The database executes statements normally and keeps serving clients, so high availability of the database is preserved.

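  • Test results
     Environment: pgpool and PostgreSQL are installed on each of servers 16, 17 and 18; the database on 16 is the primary, and 17 and 18 are synchronous standbys
     Test 1: pgpool high availability. Result: after stopping the Master pgpool on 16, one of 17 and 18 is automatically elected Master according to weight, and the PostgreSQL cluster keeps serving normally, unaffected
     Test 2: building on test 1, also stop the Master pgpool on 17. Result: the pgpool cluster stops serving and the VIP can no longer be obtained
     The official recommendation is to use at least 2 pgpool nodes; there can be 1 to 127 standby nodes

     Test 3: with the pgpool cluster nodes healthy, run insert, delete, update and query SQL against the database. Result: service is normal;
                query SQL is assigned to nodes at random, and operations that modify the database are routed to the primary PostgreSQL database on server 16
     Test 4: with the pgpool cluster nodes healthy, configure queries to go to a specified PostgreSQL standby node. Result: query SQL is automatically routed to the configured standby node
     Test 5: automatic primary/standby failover through pgpool. Stop the primary database on server 16 to simulate an outage. Result: the standby on server 17 is automatically promoted to Primary and keeps serving clients normally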
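  • Problems encountered
When testing load balancing to a specified node, query SQL was always routed to the primary PostgreSQL database no matter how pgpool was configured.
Debugging showed the cause: the test database was named pgpool, which is a special name. Avoid using
pgpool as the name of a test database to prevent this kind of unnecessary problem.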

Reposted from blog.csdn.net/yaoqiancuo3276/article/details/80983201