1. Introduction to PXC
For more details, see the official Percona page: https://www.percona.com/software/mysql-database/percona-xtradb-cluster
PXC (Percona XtraDB Cluster) is an open-source MySQL high-availability solution. It integrates Percona Server and XtraBackup with the Galera library to provide synchronous multi-master replication. The main Galera-based high-availability solutions are MariaDB Galera Cluster and Percona XtraDB Cluster, and the PXC architecture is increasingly mature and widely used in production. Compared with the traditional MHA and dual-master architectures built on master-slave replication, the most prominent feature of Galera Cluster is that it solves the long-standing replication-delay problem and achieves essentially real-time synchronization. The nodes are all peers of one another; Galera Cluster is itself a multi-master architecture. PXC implements synchronous rather than asynchronous replication at the storage-engine level, so its data consistency is very high.
Building a PXC architecture requires at least three MySQL instances to form a cluster. The three instances are not in a master-slave relationship; each one is a master. The nodes are therefore equal, with no hierarchy, which is why this is called a multi-master architecture. Whichever instance a client connects to for reads and writes, the data it sees is the same. After a write on any instance, the cluster synchronizes the newly written data to the other instances. The nodes do not share any storage, making this a highly redundant, shared-nothing cluster architecture.
1.1 PXC advantages and disadvantages
Advantages:
- Provides MySQL cluster high availability with strong data consistency.
- A true multi-node read-write cluster solution.
- Greatly improves on master-slave replication delay, achieving essentially real-time synchronization.
- Newly added nodes are provisioned automatically, without a manual backup in advance, which simplifies maintenance.
- Because every node accepts writes, database failover is easy.
Disadvantages:
- Adding a new node is expensive: the complete data set must be copied from one of the existing nodes. If the data is 100 GB, 100 GB is copied.
- Every update must pass cluster-wide certification before it can be committed on the other nodes, so the cluster's performance is limited by its worst-performing node, the so-called bucket effect.
- To guarantee data consistency, PXC uses certification-based synchronous replication, so when multiple nodes write concurrently, lock conflicts become more serious.
- Writes do not scale out: every write is applied on every node, so PXC is not recommended for write-heavy scenarios.
- Only the InnoDB storage engine is supported.
1.2 PXC principle
The operation flow of PXC is roughly as follows. Before the transaction is committed on the node the client is connected to (the write node), that node packages the changes into a write-set, obtains a global transaction ID, and broadcasts the write-set to the other nodes. The other nodes run the write-set through certification; if they find no conflicting data, they execute the apply_cb and commit_cb operations, otherwise the transaction is discarded.
After the current node (the write node requested by the client) passes certification, it executes the commit_cb operation and returns OK to the client. If certification fails, rollback_cb is executed instead.
A production PXC cluster must have at least three nodes. If a node fails certification and ends up with inconsistent data, it is removed from the cluster and automatically executes a shutdown.
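The certification step is visible from the client side: when two nodes update the same row at almost the same time, the transaction that commits last fails at COMMIT. A sketch of what that looks like (table name `t` is hypothetical):

```sql
-- Session A on node 1:
BEGIN;
UPDATE t SET v = 1 WHERE id = 1;
COMMIT;   -- commits first, wins certification

-- Session B on node 2, concurrently:
BEGIN;
UPDATE t SET v = 2 WHERE id = 1;
COMMIT;   -- loses certification and is rolled back:
-- ERROR 1213 (40001): Deadlock found when trying to get lock;
-- try restarting transaction
```

Applications writing to multiple PXC nodes should therefore be prepared to retry transactions that fail with error 1213 at commit time.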
1.3 Important concepts in PXC
The first concept is cluster size. The number of nodes in the cluster should be kept between three and eight. The minimum of three is to prevent split-brain, which can occur when there are only two nodes. A symptom of split-brain is that any command you enter returns `unknown command`.
When a new node wants to join the PXC cluster, a donor (provider) node is elected from the existing nodes to contribute the data. PXC has two data-transfer methods between nodes: SST (State Snapshot Transfer), a full transfer, and IST (Incremental State Transfer), an incremental transfer. SST supports three methods: XtraBackup, mysqldump, and rsync, while IST transfers only the missing write-sets from the donor's Gcache. mysqldump and rsync are generally only practical when the data volume is small; in practice the XtraBackup method is the one to use for SST.
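If you want to steer donor election toward a particular node instead of letting the cluster choose, Galera provides the wsrep_sst_donor option in my.cnf; a sketch (the node name is illustrative):

```ini
# Prefer pxc-node2 as the donor; the trailing comma allows
# falling back to automatic selection if it is unavailable
wsrep_sst_donor=pxc-node2,
```

This is useful for keeping SST load off the node that serves most client traffic.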
In a node cluster, state switching will occur due to a new node joining failure, synchronization failure, etc. The following are examples of the meaning of these states:
- open: the node has started successfully and is trying to connect to the cluster.
- primary: the node is already in the cluster; this state appears when a new node joins and a donor is being elected for data synchronization.
- joiner: the node is waiting to receive the synchronized data files.
- joined: the node has completed synchronization and is trying to catch up with the progress of the other nodes in the cluster.
- synced: the node is providing service normally; synchronization is complete and it is consistent with the cluster.
- donor: the node is providing the full data set to a newly added node.
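On a running node you can check which of these states it is currently in through the wsrep_local_state_comment status variable:

```sql
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
-- A healthy node that is serving traffic reports: Synced
```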
1.4 Important configuration parameters in PXC
When building PXC, several parameters need to be set in my.cnf:
- wsrep_cluster_name: the logical name of the cluster; it must be identical on all nodes in the cluster.
- wsrep_cluster_address: the IP addresses of all nodes in the cluster.
- wsrep_node_name: the logical name of the current node within the cluster.
- wsrep_node_address: the IP address of the current node.
- wsrep_provider: the path to the Galera library.
- wsrep_sst_method: the SST transfer method; it is strongly recommended to set this to xtrabackup-v2.
- wsrep_sst_auth: the SST authentication credentials, in the form <sst_user>:<sst_pwd>. This user must be created after bootstrapping the first node and granted the necessary privileges.
- pxc_strict_mode: strict mode; the officially recommended value is ENFORCING.
Another particularly important module in PXC is Gcache. Each node caches the most recent write-sets in its Gcache, so a node joining (or rejoining) the cluster that is only missing recent data can receive the missing write-sets via IST instead of a full SST. This lets nodes join the cluster much faster. The Gcache module involves the following parameters:
- gcache.size: the size of the cache for write-set increments. The default is 128 MB, and it is set through the wsrep_provider_options variable. Adjusting it into the 2 GB-4 GB range is recommended; the extra space buffers more incremental information.
- gcache.page_size: when memory (the Gcache) is insufficient, write-sets are written directly to disk page files of this size.
- gcache.mem_size: the size of the in-memory cache in Gcache; a moderate increase can improve the performance of the whole cluster.
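All of these options are passed through wsrep_provider_options. A sketch of a my.cnf line combining them (the sizes are illustrative, not recommendations for every workload):

```ini
# Gcache tuned to favor IST: 2G ring buffer, 128M overflow page files
wsrep_provider_options="gcache.size=2G;gcache.page_size=128M"
```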
1.5 PXC cluster status monitoring
After the cluster is set up, you can view the status of each node through the status variables matching 'wsrep%'. A few important ones that make it easier to discover problems are shown below.
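A sketch of the query, with the status variables most worth watching noted in comments:

```sql
SHOW GLOBAL STATUS LIKE 'wsrep%';
-- wsrep_cluster_size          number of nodes currently in the cluster
-- wsrep_cluster_status        Primary means this component has quorum
-- wsrep_local_state_comment   this node's state (Synced, Donor, Joiner...)
-- wsrep_ready                 ON when the node can accept queries
-- wsrep_flow_control_paused   fraction of time replication was paused;
--                             values approaching 1 indicate a slow node
```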
2. Deploy PXC
2.1 Environmental description
Required file download: 1111
Address planning:
Host name | IP address |
---|---|
pxc-node1 | 192.168.1.61 |
pxc-node2 | 192.168.1.64 |
pxc-node3 | 192.168.1.65 |
Install the dependency packages
yum install -y libev lsof perl-Compress-Raw-Bzip2 perl-Compress-Raw-Zlib perl-DBD-MySQL perl-DBI perl-Digest perl-Digest-MD5 perl-IO-Compress perl-Net-Daemon perl-PlRPC qpress socat openssl openssl-devel
//The qpress package may not be available from the repositories; install it manually from the downloaded files
tar xf qpress-11-linux-x64.tar -C /usr/local/bin/
Install XtraBackup
yum -y install percona-xtrabackup-24-2.4.18-1.el7.x86_64.rpm
Uninstall mariadb
rpm -e mariadb-libs --nodeps
Create mysql user and group
groupadd -r mysql && useradd -M -s /bin/false -r -g mysql mysql
Extract the package to /usr/local/mysql, create the data directory, and grant permissions
tar zxf Percona-XtraDB-Cluster-5.7.28-rel31-31.41.1.Linux.x86_64.ssl101.tar.gz
mv Percona-XtraDB-Cluster-5.7.28-rel31-31.41.1.Linux.x86_64.ssl101 /usr/local/mysql
mkdir /usr/local/mysql/data
chown -R mysql:mysql /usr/local/mysql/
Configure environment variables
tail -1 /etc/profile
export PATH=/usr/local/mysql/bin:$PATH
. /etc/profile
Prepare the configuration file (/etc/my.cnf). The binlog format must be ROW. The configuration files on pxc-node2 and pxc-node3 are the same, except that server_id, wsrep_node_name, and wsrep_node_address must be changed
[client]
port = 3306
socket = /tmp/mysql.sock
[mysql]
prompt="\u@\h \R:\m:\s[\d]> "
no-auto-rehash
[mysqld]
user = mysql
port = 3306
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
socket = /tmp/mysql.sock
pid-file = db.pid
character-set-server = utf8mb4
skip_name_resolve = 1
open_files_limit = 65535
back_log = 1024
max_connections = 512
max_connect_errors = 1000000
table_open_cache = 1024
table_definition_cache = 1024
table_open_cache_instances = 64
thread_stack = 512K
external-locking = FALSE
max_allowed_packet = 32M
sort_buffer_size = 4M
join_buffer_size = 4M
thread_cache_size = 768
interactive_timeout = 600
wait_timeout = 600
tmp_table_size = 32M
max_heap_table_size = 32M
slow_query_log = 1
slow_query_log_file = /usr/local/mysql/data/slow.log
log-error = /usr/local/mysql/data/error.log
long_query_time = 0.1
server-id = 1813306
log-bin = /usr/local/mysql/data/mysql-bin
sync_binlog = 1
binlog_cache_size = 4M
max_binlog_cache_size = 1G
max_binlog_size = 1G
expire_logs_days = 7
master_info_repository = TABLE
relay_log_info_repository = TABLE
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates
binlog_format = row
relay_log_recovery = 1
relay-log-purge = 1
key_buffer_size = 32M
read_buffer_size = 8M
read_rnd_buffer_size = 4M
bulk_insert_buffer_size = 64M
lock_wait_timeout = 3600
explicit_defaults_for_timestamp = 1
innodb_thread_concurrency = 0
innodb_sync_spin_loops = 100
innodb_spin_wait_delay = 30
transaction_isolation = REPEATABLE-READ
innodb_buffer_pool_size = 1024M
innodb_buffer_pool_instances = 8
innodb_buffer_pool_load_at_startup = 1
innodb_buffer_pool_dump_at_shutdown = 1
innodb_data_file_path = ibdata1:1G:autoextend
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 32M
innodb_log_file_size = 2G
innodb_log_files_in_group = 2
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_flush_neighbors = 0
innodb_write_io_threads = 4
innodb_read_io_threads = 4
innodb_purge_threads = 4
innodb_page_cleaners = 4
innodb_open_files = 65535
innodb_max_dirty_pages_pct = 50
innodb_flush_method = O_DIRECT
innodb_lru_scan_depth = 4000
innodb_checksum_algorithm = crc32
innodb_lock_wait_timeout = 10
innodb_rollback_on_timeout = 1
innodb_print_all_deadlocks = 1
innodb_file_per_table = 1
innodb_online_alter_log_max_size = 4G
internal_tmp_disk_storage_engine = InnoDB
innodb_stats_on_metadata = 0
wsrep_provider=/usr/local/mysql/lib/libgalera_smm.so
wsrep_provider_options="gcache.size=1G"
wsrep_cluster_name=pxc-test
wsrep_cluster_address=gcomm://192.168.1.61,192.168.1.64,192.168.1.65
wsrep_node_name=pxc-node1
wsrep_node_address=192.168.1.61
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sst:pwd@123
pxc_strict_mode=ENFORCING
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
[mysqldump]
quick
max_allowed_packet = 32M
Initialize MySQL on each node
mysqld --defaults-file=/etc/my.cnf --user=mysql --basedir=/usr/local/mysql/ --datadir=/usr/local/mysql/data/ --initialize
2.3 Bootstrap the first node to initialize the cluster
Start MySQL on pxc-node1 with --wsrep_new_cluster to bootstrap the new cluster. In the listening ports below, 3306 is the MySQL service and 4567 is the Galera group-communication port
mysqld --defaults-file=/etc/my.cnf --wsrep_new_cluster &
netstat -anput | grep mysql
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 1691/mysqld
tcp6 0 0 :::3306 :::* LISTEN 1691/mysqld
Obtain the temporary password from the error log, then log in to MySQL and change the root password
[root@pxc-node1 ~]# grep 'password' /usr/local/mysql/data/error.log
2021-03-14T11:05:42.083115Z 1 [Note] A temporary password is generated for root@localhost: k-506%(lZJlu
root@localhost 19:09: [(none)]> alter user root@'localhost' identified by '123.com';
Create the SST transfer account in PXC (it must match the wsrep_sst_auth setting in my.cnf)
root@localhost 19:09: [(none)]> grant all privileges on *.* to 'sst'@'localhost' identified by 'pwd@123';
root@localhost 19:13: [(none)]> flush privileges;
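The grant above uses ALL PRIVILEGES for simplicity. If you prefer a least-privilege account, the xtrabackup-v2 SST method only requires a small set of privileges; a hedged alternative for the same user:

```sql
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT
    ON *.* TO 'sst'@'localhost' IDENTIFIED BY 'pwd@123';
FLUSH PRIVILEGES;
```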
2.4 Add other nodes to the cluster
Start MySQL on pxc-node2 and pxc-node3 to add them to the cluster. The process takes a few minutes, so wait patiently
mysqld --defaults-file=/etc/my.cnf &
netstat -anput | grep mysql
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 2653/mysqld
At this moment, pxc-node2 and pxc-node3 are synchronizing data from pxc-node1 to their local data directories
netstat -anput | grep mysql
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 2653/mysqld
tcp 0 0 192.168.1.64:4567 192.168.1.65:48590 ESTABLISHED 2653/mysqld
tcp 0 0 192.168.1.64:41942 192.168.1.61:4567 ESTABLISHED 2653/mysqld
tcp6 0 0 :::3306 :::* LISTEN 2653/mysqld
Because the data (including the mysql system tables) has been synchronized locally, you can log in to the MySQL terminal directly with the root password that was set on pxc-node1
mysql -uroot -p123.com
Check the cluster status; you can see that the cluster currently has three nodes
root@localhost 11:30: [(none)]> show global status like '%wsrep_cluster_s%';
+--------------------------+--------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------+
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | abd434e7-853e-11eb-b686-920f5f5a4d49 |
| wsrep_cluster_status | Primary |
+--------------------------+--------------------------------------+
3 rows in set (0.00 sec)
root@localhost 11:33: [(none)]> show global status like '%wsrep_ready%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wsrep_ready | ON |
+---------------+-------+
1 row in set (0.00 sec)
2.5 Verify replication
Create a database on one node and check whether the other two have synchronized it.
root@localhost 11:35: [(none)]> create database pxc;
root@localhost 11:41: [(none)]> use pxc
root@localhost 11:41: [pxc]> create table test_t1 (
-> id int primary key auto_increment,
-> name varchar(22)
-> );
root@localhost 11:43: [pxc]> insert into test_t1(name) values('zhangsan'),('lisi');
//Check whether it is synchronized on other nodes
root@localhost 11:45: [qin]> select * from pxc.test_t1;
+----+----------+
| id | name |
+----+----------+
| 1 | zhangsan |
| 4 | lisi |
+----+----------+
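Note that the inserted ids are 1 and 4 rather than 1 and 2. This is expected: PXC manages auto_increment_increment and auto_increment_offset automatically (wsrep_auto_increment_control=ON) so that concurrent inserts on different nodes never produce colliding ids. You can inspect the values each node was assigned:

```sql
SHOW VARIABLES LIKE 'auto_increment%';
-- On a 3-node cluster, auto_increment_increment is typically 3
-- and each node receives a different auto_increment_offset
```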