postgresql|Database|Deployment and use of pgpool-4.4 based on postgresql-12 master-slave replication under centos7

Foreword:

The PostgreSQL database cannot be fully optimized using only its own configuration; it needs some external plug-ins (middleware) to improve the overall performance of the server. In layman's terms, the database cannot reach optimal performance by relying on itself alone. In many cases the overall architecture of the database must be changed to adopt mature cluster technologies such as read-write splitting, load balancing, and caching.

The following table shows some of the more mature cluster solutions:

As can be seen from the table above, pgpool is a relatively comprehensive middleware: it provides connection pooling and load balancing, and it also has a cache function. In fact, the irresistible attractions of this middleware are load balancing and caching; the other functions are secondary.

This article will give a detailed introduction to the load balancing and caching functions of pgpool.

One,

A brief introduction to pgpool

Pgpool-II is middleware that works between PostgreSQL servers and PostgreSQL database clients. It is distributed under the BSD license. It provides the following functions.

Connection pooling

Pgpool-II saves connections to the PostgreSQL servers and reuses them whenever a new connection with the same properties (i.e. username, database, protocol version) comes in. This reduces connection overhead and increases the overall throughput of the system.

Replication

Pgpool-II can manage multiple PostgreSQL servers. Its replication feature keeps live copies of the data on two or more physical disks, so that in the event of a disk failure the service can continue running without stopping.

Load balancing

If the database is replicated, executing a SELECT query on any server returns the same result. Pgpool-II takes advantage of this to reduce the load on each PostgreSQL server by distributing SELECT queries among the servers, increasing the overall throughput of the system. At best, performance improves in proportion to the number of PostgreSQL servers. Load balancing works best when many users execute many queries at the same time.

Limiting exceeding connections

PostgreSQL has a limit on the maximum number of concurrent connections, and new connections are rejected once it is reached; raising the limit, however, increases resource consumption and hurts overall performance. pgpool-II also has a maximum-connection limit, but extra connections are queued instead of being rejected immediately. (With the num_init_children = 32 configured later in this article, pgpool accepts up to 32 concurrent client sessions and queues the rest.)

Watchdog

Watchdog can coordinate multiple Pgpool-II instances to create a robust cluster, avoiding single points of failure and split-brain. The watchdog performs life checks against the other pgpool-II nodes to detect failures of Pgpool-II. If the active Pgpool-II fails, a standby Pgpool-II is promoted to active and takes over the virtual IP.

Query cache

The in-memory query cache saves pairs of SELECT statements and their results. When the same SELECT arrives, Pgpool-II returns the value from the cache. Since no SQL parsing or access to PostgreSQL is involved, the in-memory cache is very fast. On the other hand, in some cases it can be slower than the normal path, because it adds the overhead of storing the cached data.

Pgpool-II speaks PostgreSQL's backend and frontend protocols and relays messages between the two. The database application (frontend) therefore sees Pgpool-II as the actual PostgreSQL server, and the server (backend) sees Pgpool-II as one of its clients. Because Pgpool-II is transparent to both server and client, existing database applications can be used with Pgpool-II without requiring changes to existing business systems.

The architecture, then, is a watchdog using a virtual IP (VIP) to front a cluster that is itself master-slave streaming replication. The VIP can be regarded as the front end and the databases as the back end. A master-slave streaming replication cluster has this characteristic: the master server can read and write, while the slave can only read. Through pgpool's load balancing, both reads and writes go through the VIP; the load balancing strategy distributes read queries between master and slave according to a specific algorithm, while write queries are still handed to the master server.

In this way, the utilization of the cluster naturally increases. Commonly used queries are cached through pgpool's caching function, which naturally improves the query efficiency of the entire cluster.
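
For example, once the cluster described below is up, a quick way to watch this routing in action (a sketch assuming the VIP 192.168.123.222 and port 15433 configured later in this article) is to ask pgpool for its node view:

psql -h 192.168.123.222 -p 15433 -U postgres -c 'show pool_nodes;'
# the output lists each backend's role (primary/standby) and its select_cnt,
# which shows how SELECT queries are being distributed across the nodes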

OK, let’s start with how to build pgpool.

Two,

pgpool's official website: pgpool Wiki

Download and installation tutorials are available there, and fairly recent rpm packages are provided as well. The rpm repository address is: Index of /yum/rpms/4.4/redhat/rhel-7-x86_64

Just configure that address as a yum repository, for example as sketched below.
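
A minimal sketch of such a repository file; the baseurl is assumed to be the pgpool.net yum tree behind the link above, so adjust it if yours differs:

cat > /etc/yum.repos.d/pgpool.repo <<'EOF'
[pgpool44]
name=pgpool-II 4.4 for CentOS 7
baseurl=https://www.pgpool.net/yum/rpms/4.4/redhat/rhel-7-x86_64
enabled=1
gpgcheck=0
EOF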

The postgresql version and general setup are as follows:

Server 11 (192.168.123.11) is the master server and server 12 (192.168.123.12) is the slave server.

For the construction of master-slave replication, see my blog: postgresql|database|[postgresql-12 master-slave replication deployment based on pg_basebackup]_postgresql12 master-slave_wanfeng_END's blog-CSDN blog

Three,

Deployment of pgpool

The deployment work is fairly tedious and tricky. The main reason is that permission issues must be handled carefully; second, there are many parameters, many of which must be adjusted to the actual situation; finally, pgpool has many functions, and configuring them well takes a lot of patience.

First, a brief introduction to pgpool's components. This middleware has three management components. The first is the pool tool used on the database side, which is installed into the postgresql database as a plug-in. The second is the pcp tool on the operating-system side, which is configured in pgpool's main configuration file. The third is pgpoolAdmin, a web-based management tool written in PHP that makes it convenient to view, manage, and configure pgpool in a browser; the current version appears to require a fairly recent PHP, so it is not used here for the time being.

1,

Installation of management tools

In this case, only the database-side pool tool and the pcp management tool are installed. The pool tool is in the source code package.

After the file pgpool-II-4.4.4.tar.gz is uploaded to the server and unpacked, the build is no different from an ordinary postgresql plug-in: the usual make && make install is enough, provided that PGHOME and PGDATA are defined as environment variables.
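
A minimal sketch of that build, assuming the extension sources sit under src/sql of the unpacked tree and that PGHOME/PGDATA are already exported as described above:

tar zxf pgpool-II-4.4.4.tar.gz
cd pgpool-II-4.4.4/src/sql
make && make install
# create the extension once on the master (streaming replication carries it
# to the slave), e.g. in template1:
su - postgres -c 'psql template1 -c "CREATE EXTENSION pgpool_recovery;"'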

2,

yum installation

After configuring the local repository and the official repository mentioned above, you can run the following commands to install it. A memcached service is also installed here; it will serve as the cache service later.

yum install pgpool-II-pg12-debuginfo-4.4.2 pgpool-II-pg12-4.4.2 pgpool-II-pg12-devel-4.4.2 pgpool-II-pg12-extensions-4.4.2 -y
yum install memcached -y && systemctl enable memcached && systemctl enable pgpool && systemctl start pgpool memcached

After the installation is complete, you will see a pgpool-II directory under /etc. It contains the pgpool configuration files and some high-availability failover scripts. Those scripts are not used in this case; only the pgpool service itself is configured. Also note: pgpool must be installed on both servers, while memcached only needs to be installed on one of them.

3,

Configuration files

As you can see, the configuration files all belong to the postgres user and group; you must pay attention to this.

[root@node1 pgpool-II]# ls -al
total 144
drwxr-xr-x.  3 root     root       202 Sep 18 06:18 .
drwxr-xr-x. 83 root     root      8192 Sep 17 19:16 ..
-rw-------   1 postgres postgres   900 Sep 17 11:15 pcp.conf
-rw-------.  1 postgres postgres   858 Jan 22  2023 pcp.conf.sample
-rw-------   1 postgres postgres 52960 Sep 18 02:01 pgpool.conf
-rw-------.  1 postgres postgres 52964 Jan 22  2023 pgpool.conf.sample
-rw-------   1 postgres postgres     2 Sep 17 10:21 pgpool_node_id
-rw-------   1 postgres postgres  3537 Sep 17 11:54 pool_hba.conf
-rw-------.  1 postgres postgres  3476 Jan 22  2023 pool_hba.conf.sample
-rw-------.  1 postgres postgres    45 Sep 17 11:05 pool_passwd
drwxr-xr-x.  2 root     root      4096 Sep 17 10:02 sample_scripts

Configuration of pcp.conf

This file stores the management password of pgpool. This password can be different from the postgresql database password; that is to say, it can be set to anything. Defining it is very simple: append a line in the form username:password to the end of the file. Just note one thing: the password must be md5-hashed, not clear text. (Both methods below work; if you find it troublesome, just use the command on the third line. The user is postgres and the password is 123456.) A quick way to verify these credentials is shown after the file listing below.

[root@node1 pgpool-II]# pg_md5 123456
e10adc3949ba59abbe56e057f20f883e
[root@node1 pgpool-II]# echo "postgres:e10adc3949ba59abbe56e057f20f883e">>./pcp.conf
[root@node1 pgpool-II]# echo "postgres:`pg_md5 123456`">>./pcp.conf
[root@node1 pgpool-II]# cat pcp.conf
# PCP Client Authentication Configuration File
# ============================================
#
# This file contains user ID and his password for pgpool
# communication manager authentication.
#
# Note that users defined here do not need to be PostgreSQL
# users. These users are authorized ONLY for pgpool 
# communication manager.
#
# File Format
# ===========
#
# List one UserID and password on a single line. They must
# be concatenated together using ':' (colon) between them.
# No spaces or tabs are allowed anywhere in the line.
#
# Example:
# postgres:e8a48653851e28c69d0506508fb27fc5
#
# Be aware that there will be no spaces or tabs at the
# beginning of the line! although the above example looks
# like so.
#
# Lines beginning with '#' (pound) are comments and will
# be ignored. Again, no spaces or tabs allowed before '#'.

# USERID:MD5PASSWD
postgres:e10adc3949ba59abbe56e057f20f883e
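
Once pgpool is running, these pcp credentials can be verified with any of the pcp commands; a minimal check, assuming the pcp_port of 19999 configured below:

pcp_node_count -h localhost -p 19999 -U postgres
# prompts for the pcp password (123456 above) and prints the number
# of backend nodes pgpool knows about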

Configuration of pgpool.conf file:

This file is the main configuration file of pgpool. The commented lines have been filtered out below; only the active (uncommented) settings are retained.

Note: the file paths defined in this configuration file need to be created manually, and /var/run/postgresql must belong to the postgres group.

sr_check_user = 'nobody' — this nobody user needs to be created in the master database. The creation command is: create role nobody login replication encrypted password 'replica';

Why the master database? Because this is streaming replication: whatever is created on the master naturally appears on the slave as well. The same is true for the plug-ins mentioned above.
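
A minimal sketch of that manual preparation, using the paths and the role from this article:

# on both servers, create the socket and log directories used in pgpool.conf:
mkdir -p /var/run/postgresql /var/log/pgpool_log
chown postgres:postgres /var/run/postgresql /var/log/pgpool_log
# on the master only (streaming replication carries the role to the slave):
su - postgres -c "psql -c \"create role nobody login replication encrypted password 'replica';\""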

[root@node1 pgpool-II]# sed -e '/^$/d' pgpool.conf |grep -v "\#"
backend_clustering_mode = 'streaming_replication'
listen_addresses = '*'
port = 15433
unix_socket_directories = '/var/run/postgresql'
pcp_listen_addresses = '*'
pcp_port = 19999
pcp_socket_dir = '/var/run/postgresql'
backend_hostname0 = '192.168.123.11'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/usr/local/pgsql/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'node1'
backend_hostname1 = '192.168.123.12'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/usr/local/pgsql/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'node2'
enable_pool_hba = on
pool_passwd = 'pool_passwd'
process_management_mode = dynamic
num_init_children = 32
min_spare_children = 5
max_spare_children = 10
max_pool = 4
child_life_time = 5min
log_destination = 'stderr'
log_connections = on
log_disconnections = on
log_hostname = on
log_statement = on
log_per_node_statement = on
log_client_messages = on
logging_collector = on
log_directory = '/var/log/pgpool_log'
log_filename = 'pgpool-%a.log'
log_file_mode = 0600
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0 
pid_file_name = '/var/run/postgresql/pgpool.pid'
logdir = '/tmp'
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'
load_balance_mode = on
database_redirect_preference_list = 'postgres:1'
sr_check_period = 10
sr_check_user = 'nobody'
sr_check_password = 'replica'
sr_check_database = 'postgres'
delay_threshold = 1
delay_threshold_by_time = 1
prefer_lower_delay_standby = on
use_watchdog = on
hostname0 = '192.168.123.11'
wd_port0 = 9000
pgpool_port0 = 15433
hostname1 = '192.168.123.12'
wd_port1 = 9000
pgpool_port1 = 15433
wd_ipc_socket_dir = '/var/run/postgresql'
delegate_ip = '192.168.123.222'
if_cmd_path = '/sbin'
if_up_cmd = 'ip addr add $_IP_$/24 dev ens33 label ens33:0'
if_down_cmd = 'ip addr del $_IP_$/24 dev ens33'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1 -I ens33'
wd_monitoring_interfaces_list = ''
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
heartbeat_hostname0 = '192.168.123.11'
heartbeat_port0 = 19694
heartbeat_device0 = 'ens33'
heartbeat_hostname1 = '192.168.123.12'
heartbeat_port1 = 19694
heartbeat_device1 = 'ens33'
wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
memory_cache_enabled = off
memqcache_method = 'memcached'
memqcache_memcached_host = '192.168.123.11'
memqcache_memcached_port = 11211
memqcache_total_size = 64MB
memqcache_max_num_cache = 1000000
memqcache_expire = 0
memqcache_cache_block_size = 1MB

Configuration of pool_passwd file:

Important:

su - postgres
pg_md5 -m -p -u postgres pool_passwd
# You will be prompted for a password here. Enter the password of the postgres
# user of the postgresql server; this is the password used to log in to the
# postgresql database in a moment.
[root@node1 pgpool-II]# su - postgres
Last login: Mon Sep 18 06:34:54 CST 2023 on pts/1
[postgres@node1 ~]$ pg_md5 -m -p -u postgres pool_passwd
password: 
[postgres@node1 ~]$ logout
[root@node1 pgpool-II]# cat pool_passwd 
postgres:md5a3556571e93b0d20722ba62be61e8c2d

Configuration of the pool_hba.conf file

This file defines which pgpool users may access which back-end postgresql databases. Its function is similar to the pg_hba.conf file of the postgresql database.

If you don't want too much trouble (that is, don't mind being less secure), then the following configuration is sufficient; note that the md5 method here authenticates against the pool_passwd entries created above:

# IPv4 local connections:
host    all         all         127.0.0.1/32          trust
host    all         all         ::1/128               trust
host    all         all         0.0.0.0/0               md5

Configuration of pgpool_node_id file

This file is an identification file: it tells this pgpool instance which node it is, matching the hostname0/hostname1 numbering above. Therefore, on server 11 the file contains just a 0, and on server 12 it contains just a 1. If there are more nodes, the numbers simply increase in sequence; the maximum appears to be 127 nodes. (The steps for server 12 are sketched after the listing below.)

[root@node1 pgpool-II]# cat pgpool_node_id 
0
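
On server 12, a minimal sketch of finishing up and confirming that the watchdog has raised the VIP:

echo "1" > /etc/pgpool-II/pgpool_node_id
systemctl restart pgpool
# once both pgpool instances are up, the active watchdog node should hold the VIP:
ip addr show ens33 | grep 192.168.123.222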

To be continued!!!

Origin blog.csdn.net/alwaysbefine/article/details/132942114