Galera Cluster for MySQL in Detail (II) - Installation and Configuration

Table of Contents

I. Galera Cluster experimental environment

II. Initial installation

1. Install galera-3, mysql-wsrep-5.7, and Percona-XtraBackup-2.4.15

2. Modify the configuration file

3. Initialize the cluster

4. Start the mysqld service on the other cluster nodes

5. Verify the installation

6. Troubleshooting

III. Adding a node using SST

IV. Adding a node using IST

1. Set gcache.size

2. IST test


I. Galera Cluster experimental environment

        This part uses building a three-node Galera Cluster for MySQL 5.7 as an example to walk through the installation steps and basic cluster configuration. The experimental environment is as follows.

        IP addresses and host names:
172.16.1.125 hdp2
172.16.1.126 hdp3
172.16.1.127 hdp4

        Software Environment:
CentOS Linux Release 7.2.1511 (Core)
Galera-3.28
MySQL-5.7.27-wsrep
Percona-XtraBackup-2.4.15

        Hardware environment:
Three virtual machines, each with the following basic configuration:
Two dual-core CPUs, Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
8GB physical memory, 8GB swap
100GB physical disk

II. Initial installation

        We start from the simplest scenario: installing Galera Cluster from scratch, assuming there is no existing data and no application access.

1. Install galera-3, mysql-wsrep-5.7, and Percona-XtraBackup-2.4.15

        Perform the following steps as root on all three hosts.

(1) Install dependencies

yum install perl-Time-HiRes
yum -y install perl-DBD-MySQL.x86_64
yum -y install libaio*

(2) Create the yum repository file

cat > /etc/yum.repos.d/galera.repo <<-END
[galera]
name = Galera
baseurl = https://releases.galeracluster.com/galera-3.28/centos/7/x86_64
gpgkey = https://releases.galeracluster.com/galera-3.28/GPG-KEY-galeracluster.com
gpgcheck = 1

[mysql-wsrep]
name = MySQL-wsrep
baseurl =  https://releases.galeracluster.com/mysql-wsrep-5.7.27-25.19/centos/7/x86_64
gpgkey = https://releases.galeracluster.com/mysql-wsrep-5.7.27-25.19/GPG-KEY-galeracluster.com
gpgcheck = 1
END

(3) Install galera-3 and mysql-wsrep-5.7

yum install -y galera-3 mysql-wsrep-5.7

(4) Confirm the installed rpm packages

[root@hdp2~]#rpm -qa | grep -E 'galera|wsrep'
mysql-wsrep-client-5.7-5.7.27-25.19.el7.x86_64
galera-3-25.3.28-1.el7.x86_64
mysql-wsrep-common-5.7-5.7.27-25.19.el7.x86_64
mysql-wsrep-libs-5.7-5.7.27-25.19.el7.x86_64
mysql-wsrep-server-5.7-5.7.27-25.19.el7.x86_64
mysql-wsrep-5.7-5.7.27-25.19.el7.x86_64
mysql-wsrep-libs-compat-5.7-5.7.27-25.19.el7.x86_64
[root@hdp2~]#

(5) Install xtrabackup
        This step is only needed if SST uses xtrabackup. Pay attention to xtrabackup's compatibility with the MySQL server version; on a version mismatch, an error similar to the following is reported in the xtrabackup log:

innobackupex: Error: Unsupported server version: '5.7.27' Please report a bug at https://bugs.launchpad.net/percona-xtrabackup

For MySQL 5.7.27, download xtrabackup version 2.4.15 from https://www.percona.com/downloads/Percona-XtraBackup-2.4/LATEST/.

# Install xtrabackup
rpm -ivh percona-xtrabackup-24-2.4.15-1.el7.x86_64.rpm

        At this point the package installation is complete. Only a little configuration is needed before the cluster can be started.

2. Modify the configuration file

        Edit /etc/my.cnf and add the following:

[mysqld]
log-error=/var/log/mysqld.log
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_cluster_name="mysql_galera_cluster"
wsrep_cluster_address="gcomm://172.16.1.125,172.16.1.126,172.16.1.127"
wsrep_sst_method=xtrabackup
wsrep_sst_auth=root:P@sswo2d
wsrep_node_name=node1               # node2 and node3 on the other two nodes
wsrep_node_address="172.16.1.125"   # 172.16.1.126 and 172.16.1.127 on the other two nodes

        The system variables are:

  • log-error: the MySQL error log file; after cluster initialization, the initial password is found in this file.
  • wsrep_provider: the galera library file.
  • wsrep_cluster_name: the cluster name.
  • wsrep_cluster_address: the IP addresses of the cluster nodes.
  • wsrep_sst_method: the SST method.
  • wsrep_sst_auth: SST authentication information; xtrabackup uses this user name and password to connect to the database instance.
  • wsrep_node_name: the name of the current node.
  • wsrep_node_address: the address of the current node.
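For example, per the comments in the configuration above, node2's /etc/my.cnf differs only in the two node-specific lines; the rest of the file is identical:

```
wsrep_node_name=node2
wsrep_node_address="172.16.1.126"
```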

3. Initialize the cluster

        Perform the following as root on one of the hosts.

(1) Start the first node

/usr/bin/mysqld_bootstrap

        This command starts the mysqld service on the local machine; the default MySQL data directory is /var/lib/mysql. Note that /usr/bin/mysqld_bootstrap is used only when starting the first node of the cluster, because the script adds the argument --wsrep-new-cluster, which indicates a new cluster.

# Check the mysqld service status
systemctl status mysqld

(2) Find and change the initial password

# Find the initial password
grep -i 'temporary password' /var/log/mysqld.log

# Change the mysql root user password; enter the initial password from the previous step when prompted
mysqladmin -uroot -p password 'P@sswo2d'

(3) Create a non-root administrative account

create user wxy identified by 'P@sswo2d';
grant all on *.* to wxy with grant option;

4. Start the mysqld service on the other cluster nodes

# As root on the other two hosts

systemctl start mysqld

5. Verify the installation

(1) Check the number of cluster nodes

mysql> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.00 sec)
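A check like the one above is easy to wrap in a monitoring script. A minimal sketch, assuming bash; check_size is a hypothetical helper, and on a live node its first argument would come from `mysql -N -s -e "show status like 'wsrep_cluster_size';" | awk '{print $2}'`:

```shell
# Compare the observed cluster size against the expected size.
check_size() {
    if [ "$1" = "$2" ]; then
        echo "cluster OK: $1 nodes"
    else
        echo "cluster DEGRADED: $1 of $2 nodes"
    fi
}

check_size 3 3   # prints "cluster OK: 3 nodes"
```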

(2) Create a table and insert data on each of the three nodes, then check replication

-- node1
create database test;
use test;
create table t1(a int);
insert into t1 values(1);

-- node2
use test;
create table t2(a int);
insert into t2 values(2);

-- node3
use test;
create table t3(a int);
insert into t3 values(3);

        Querying the data on all three nodes returns consistent results:

mysql> select t1.a,t2.a,t3.a from test.t1,test.t2,test.t3;
+------+------+------+
| a    | a    | a    |
+------+------+------+
|    1 |    2 |    3 |
+------+------+------+
1 row in set (0.00 sec)

6. Troubleshooting

        If an error similar to the following appears in the error log while initializing the cluster or starting the mysqld service:

2019-10-05T10:25:29.729981Z 0 [ERROR] WSREP: wsrep_load(): dlopen(): /usr/lib64/galera-3/libgalera_smm.so: symbol SSL_COMP_free_compression_methods, version libssl.so.10 not defined in file libssl.so.10 with link time reference

then the galera plugin failed to load. mysqld can still start successfully, but it only provides single-instance read/write service and does not replicate data between nodes. Upgrade the OpenSSL package to resolve the issue, e.g.:

cd /etc/yum.repos.d/
wget http://mirrors.163.com/.help/CentOS7-Base-163.repo
yum -y upgrade openssl

III. Adding a node using SST

        Assume node1 and node2 are Galera cluster nodes already in use, and we now add a third node, node3. The goal is to synchronize the new node's data via SST without blocking the existing nodes.

(1) Initialize hdp4 as a new node

# Execute on hdp4
systemctl stop mysqld
rm -rf /var/lib/mysql
rm -rf /var/log/mysqld.log

(2) Use tpcc-mysql on hdp2 to apply a simulated load

# Install tpcc-mysql
tar -zxvf tpcc-mysql.tar.gz
cd tpcc-mysql/src
make
cd ..

# Create the test database and tables
mysql -uwxy -pP@sswo2d -h172.16.1.125 -e "create database tpcc_test;"
mysql -uwxy -pP@sswo2d -h172.16.1.125 -Dtpcc_test < create_table.sql
mysql -uwxy -pP@sswo2d -h172.16.1.125 -Dtpcc_test < add_fkey_idx.sql

# Prepare the data
tpcc_load 172.16.1.125 tpcc_test wxy P@sswo2d 10

# Back up the test database for repeated tests
mysqldump --databases tpcc_test -uwxy -pP@sswo2d -h172.16.1.125 > tpcc_test.sql

# Run the load test
tpcc_start -h172.16.1.125 -d tpcc_test -u wxy -p "P@sswo2d" -w 10 -c 32 -r 60 -l 300

        For details on installing and using tpcc-mysql, see https://wxy0327.blog.csdn.net/article/details/94614149.

(3) Start the MySQL instance on hdp4 while the load test is running

systemctl start mysqld

        While hdp4 starts up, hdp2 and hdp3 are not blocked, but lock errors like the following are reported:

Deadlock found when trying to get lock; try restarting transaction
Lock wait timeout exceeded; try restarting transaction

        This issue deserves attention when adding a node online to a production Galera Cluster. Once the MySQL instance on hdp4 has started successfully, the rest of the load test runs without errors. The /var/log/mysqld.log file on hdp2 contains the following information about the SST:

2019-10-16T02:04:07.877299Z 2 [Note] WSREP: Assign initial position for certification: 106026, protocol version: 4
2019-10-16T02:04:07.877385Z 0 [Note] WSREP: Service thread queue flushed.
2019-10-16T02:04:08.307002Z 0 [Note] WSREP: Member 0.0 (node3) requested state transfer from '*any*'. Selected 1.0 (node1)(SYNCED) as donor.
2019-10-16T02:04:08.307028Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 106177)
2019-10-16T02:04:08.377506Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-10-16T02:04:08.377644Z 0 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '172.16.1.127:4444/xtrabackup_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix ''   '' --gtid 'cada8d04-ef2b-11e9-a196-1ea90518b418:106177''
2019-10-16T02:04:08.380201Z 2 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with tar (20191016 10:04:08.542)
WSREP_SST: [INFO] Using socat as streamer (20191016 10:04:08.544)
WSREP_SST: [INFO] Streaming the backup to joiner at 172.16.1.127 4444 (20191016 10:04:08.552)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf $INNOEXTRA --galera-info --stream=$sfmt ${TMPDIR} 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.16.1.127:4444; RC=( ${PIPESTATUS[@]} ) (20191016 10:04:08.555)
2019-10-16T02:04:10.586433Z 0 [Note] WSREP: (cad9d0f0, 'tcp://0.0.0.0:4567') turning message relay requesting off
2019-10-16T02:05:11.408356Z 89 [Note] WSREP: Provider paused at cada8d04-ef2b-11e9-a196-1ea90518b418:108094 (111402)
2019-10-16T02:05:11.679454Z 89 [Note] WSREP: resuming provider at 111402
2019-10-16T02:05:11.679485Z 89 [Note] WSREP: Provider resumed.
2019-10-16T02:05:12.057568Z 0 [Note] WSREP: 1.0 (node1): State transfer to 0.0 (node3) complete.
2019-10-16T02:05:12.057596Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 108185)
WSREP_SST: [INFO] Total time on donor: 0 seconds (20191016 10:05:12.057)
2019-10-16T02:05:12.058281Z 0 [Note] WSREP: Member 1.0 (node1) synced with group.
2019-10-16T02:05:12.058296Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 108185)
2019-10-16T02:05:12.127179Z 2 [Note] WSREP: Synchronized with group, ready for connections
2019-10-16T02:05:12.127215Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-10-16T02:05:20.968604Z 0 [Note] WSREP: 0.0 (node3): State transfer from 1.0 (node1) complete.
2019-10-16T02:09:34.556227Z 0 [Note] WSREP: Member 0.0 (node3) synced with group.

        When systemctl start mysqld is executed on hdp4, all nodes in the cluster first flush committed transactions to disk, then a donor is selected to synchronize the full data set to the new node; in this case the system chose node1 as the donor. node1 then calls xtrabackup to copy the physical files to node3, while write sets generated during the copy are cached. When node1 becomes the donor, its state changes from SYNCED to DONOR/DESYNCED; when the xtrabackup backup completes, its state changes to JOINED; finally, once the cached write sets have been applied, it goes from JOINED back to SYNCED. Similarly, the /var/log/mysqld.log file on hdp4 shows node3's state change process: OPEN -> PRIMARY -> JOINER -> JOINED -> SYNCED.

IV. Adding a node using IST

        The SST method copies the full data set from a donor node to the newly added node, much like taking a full backup from a MySQL master and restoring it on a replica, except that in a Galera cluster the process is triggered automatically based on the new node's state. For a new node in a high-concurrency, large-database environment, SST can be very painful. First, if mysqldump or rsync is used for SST, the donor node is blocked. Second, with data sets of several hundred GB or more, even on a fast network the synchronization can take hours to complete. So when adding a node in production it is best to avoid SST and use IST instead.

        IST sends the new node only the write sets it is missing, provided they are still in the donor's Gcache. The Gcache is a file in which a Galera node keeps copies of write sets. IST is much faster than SST, is non-blocking, and has no significant impact on the donor. Whenever possible, it should be the first choice for adding a new node.

        SST is sometimes unavoidable: this happens when Galera cannot determine the new node's state. The state is stored in the grastate.dat file, and SST is triggered if any of the following occurs:

  • grastate.dat does not exist in the MySQL data directory - the node may be a "clean" new node.
  • grastate.dat has no seqno or group ID - the node may have crashed during DDL.
  • grastate.dat is unreadable due to missing permissions or file system corruption.
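The conditions above can be checked before starting a joiner. A minimal sketch; check_grastate is a hypothetical helper, not part of Galera, and the sample file below is created only for illustration:

```shell
# Decide whether a node is likely to fall back to SST, based on its grastate.dat.
check_grastate() {
    local f="$1"
    if [ ! -r "$f" ]; then
        echo "SST likely: grastate.dat missing or unreadable"
    elif ! grep -q '^uuid:' "$f" || ! grep -Eq '^seqno: *[0-9]' "$f"; then
        echo "SST likely: uuid or seqno not recorded"
    else
        echo "IST possible"
    fi
}

# Illustration with a temporary sample file:
sample=$(mktemp)
printf 'version: 2.1\nuuid: 650c3acb-eff8-11e9-9905-c73959fd46ca\nseqno: 743544\n' > "$sample"
check_grastate "$sample"      # prints "IST possible"
check_grastate /nonexistent   # prints "SST likely: grastate.dat missing or unreadable"
rm -f "$sample"
```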

1. Set gcache.size

        As mentioned in the earlier article on IST, using IST requires two preconditions: the new node has the same UUID as the cluster, and the Gcache is large enough to hold the incremental write sets. The first point is easily satisfied, since a physical backup of a cluster instance made with xtrabackup automatically preserves the cluster UUID for the new node. To satisfy the second condition, some calculation is needed to estimate the required Gcache size. For the tpcc-mysql load test, for example, it can be estimated as follows.

(1) Run the load test

tpcc_start -h172.16.1.125 -d tpcc_test -u wxy -p "P@sswo2d" -w 10 -c 32 -r 60 -l 300

(2) Run the following query while the load test is executing
 

set global show_compatibility_56=on;
set @start := (select sum(variable_value/1024/1024) from information_schema.global_status where variable_name like 'wsrep%bytes'); 
do sleep(60); 
set @end := (select sum(variable_value/1024/1024) from information_schema.global_status where variable_name like 'wsrep%bytes'); 
select round((@end - @start),2) as `Mb/min`, round((@end - @start),2) * 60 as `Mb/hour`;

        The query measures the number of bytes written per minute, with the following result:

+--------+---------+
| Mb/min | Mb/hour |
+--------+---------+
| 116.66 | 6999.60 |
+--------+---------+

        The load test runs for a total of 6 minutes (1 minute of warm-up, then 5 minutes measured), so gcache.size only needs to be larger than 117 * 6 MB. Here it is set to 1G, which is ample for demonstrating IST data synchronization. Add the following parameter to /etc/my.cnf on all three nodes, then restart the instances for it to take effect.

wsrep_provider_options="gcache.size=1073741824"
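The sizing arithmetic above can be written out as a quick calculation; the rates are taken from the measurement, with the per-minute figure rounded up:

```shell
# Estimate the Gcache size needed to cover the whole test run.
mb_per_min=117          # measured ~116.66 MB/min, rounded up
test_minutes=6          # 1 minute warm-up + 5 minutes measured
needed_mb=$((mb_per_min * test_minutes))
gcache_bytes=$((1024 * 1024 * 1024))   # the 1G chosen above

echo "need at least ${needed_mb} MB; gcache.size=${gcache_bytes} bytes"
# need at least 702 MB; gcache.size=1073741824 bytes
```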

2. IST test

        Again assume node1 and node2 are Galera cluster nodes already in use and we now add a third node, node3. To avoid SST, use xtrabackup to create a full backup from node1, restore it on node3, and create the Galera state file there so that the node's Galera state can be determined and SST skipped. To bring the data as close as possible to the latest state before IST, an incremental backup is also created.

(1) Initialize hdp4 as a new node

# Execute as root on hdp4
systemctl stop mysqld
rm -rf /var/lib/mysql/*
rm -rf /var/log/mysqld.log 
rm -rf /tmp/incremental/*

(2) Reload the test database into the cluster

mysql -uwxy -pP@sswo2d -h172.16.1.125 -Dtpcc_test < tpcc_test.sql

(3) Run the load test to simulate a production load

tpcc_start -h172.16.1.125 -d tpcc_test -u wxy -p "P@sswo2d" -w 10 -c 32 -r 60 -l 300

        Steps (4), (5), (6), and (7) below are all performed while step (3) is running.

(4) Use xtrabackup to take a manual backup of the cluster

# Execute the following command as the mysql user on hdp2 to take a full backup (passwordless SSH from hdp2 to hdp4 has been configured in advance)
innobackupex --defaults-file=/etc/my.cnf --user=wxy --password=P@sswo2d --socket=/var/lib/mysql/mysql.sock --galera-info --no-lock --stream=xbstream ./ | ssh [email protected] "xbstream -x -C /var/lib/mysql"

# Then take an incremental backup; this is only for demonstration
scp [email protected]:/var/lib/mysql/xtrabackup_checkpoints /home/mysql/
innobackupex --defaults-file=/etc/my.cnf --user=wxy --password=P@sswo2d --socket=/var/lib/mysql/mysql.sock --incremental --incremental-basedir=/home/mysql --galera-info --no-lock --stream=xbstream ./ | ssh [email protected] "xbstream -x -C /tmp/incremental"

(5) Restore the data files on hdp4
        After step (4) completes, execute the following commands as the mysql user on hdp4:

# Restore the full backup
innobackupex --apply-log --redo-only /var/lib/mysql/
# Apply the incremental backup
innobackupex --apply-log --redo-only /var/lib/mysql/ --incremental-dir=/tmp/incremental

(6) Generate the grastate.dat file
        Generate the grastate.dat file from the contents of xtrabackup_galera_info so that incremental synchronization (IST) is used. Execute the following commands as the mysql user on hdp4:

# View xtrabackup_galera_info
cat /var/lib/mysql/xtrabackup_galera_info

# Generate the grastate.dat file; the uuid and seqno values come from xtrabackup_galera_info
tee /var/lib/mysql/grastate.dat <<EOF
# GALERA saved state
version: 2.1
uuid:    650c3acb-eff8-11e9-9905-c73959fd46ca
seqno:   743544
safe_to_bootstrap: 0
EOF
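Rather than copying the uuid and seqno by hand, they can be split out of xtrabackup_galera_info with shell parameter expansion. A minimal sketch; the sample value below is hard-coded for illustration, while on the node it would come from `cat /var/lib/mysql/xtrabackup_galera_info`:

```shell
# xtrabackup_galera_info holds "uuid:seqno"; split it and emit a grastate.dat body.
galera_info="650c3acb-eff8-11e9-9905-c73959fd46ca:743544"

uuid="${galera_info%:*}"     # everything before the last colon
seqno="${galera_info##*:}"   # everything after the last colon

cat <<EOF
# GALERA saved state
version: 2.1
uuid:    ${uuid}
seqno:   ${seqno}
safe_to_bootstrap: 0
EOF
```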

(7) Start the new node instance

# Execute as root on hdp4
systemctl start mysqld

        The /var/log/mysqld.log file on hdp2 contains the following information about the IST:

2019-10-17T06:37:00.517675Z 1 [Note] WSREP: Assign initial position for certification: 758430, protocol version: 4
2019-10-17T06:37:00.517777Z 0 [Note] WSREP: Service thread queue flushed.
2019-10-17T06:37:00.961803Z 0 [Note] WSREP: Member 2.0 (node3) requested state transfer from '*any*'. Selected 1.0 (node2)(SYNCED) as donor.
2019-10-17T06:37:01.223935Z 0 [Note] WSREP: 1.0 (node2): State transfer to 2.0 (node3) complete.
2019-10-17T06:37:01.224331Z 0 [Note] WSREP: Member 1.0 (node2) synced with group.
2019-10-17T06:37:02.301018Z 0 [Note] WSREP: (4ce6e1a5, 'tcp://0.0.0.0:4567') turning message relay requesting off
2019-10-17T06:38:14.740957Z 0 [Note] WSREP: 2.0 (node3): State transfer from 1.0 (node2) complete.
2019-10-17T06:39:31.183193Z 0 [Note] WSREP: Member 2.0 (node3) synced with group.

        As shown, the system selected node2 as the donor; its /var/log/mysqld.log file contains the following information about the IST:

2019-10-17T06:37:00.991588Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 758624)
2019-10-17T06:37:01.045666Z 2 [Note] WSREP: IST request: 650c3acb-eff8-11e9-9905-c73959fd46ca:743544-758430|tcp://172.16.1.127:4568
2019-10-17T06:37:01.045701Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-10-17T06:37:01.045885Z 0 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '172.16.1.127:4444/xtrabackup_sst' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix ''   '' --gtid '650c3acb-eff8-11e9-9905-c73959fd46ca:743544' --bypass'
2019-10-17T06:37:01.046440Z 2 [Note] WSREP: sst_donor_thread signaled with 0
2019-10-17T06:37:01.048205Z 0 [Note] WSREP: async IST sender starting to serve tcp://172.16.1.127:4568 sending 743545-758430

This shows that hdp3 sends write sets 743545-758430 to hdp4.
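As a sanity check, the number of write sets transferred should equal the seqno range served by the donor:

```shell
# Seqnos taken from the donor log line above.
ist_first=743545
ist_last=758430
writesets=$((ist_last - ist_first + 1))
echo "${writesets} writesets expected"   # 14886, matching the joiner's "Receiving IST: 14886 writesets"
```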

        The IST incremental synchronization and the state change process of node3 can be seen in /var/log/mysqld.log on hdp4:

2019-10-17T06:37:01.445113Z 0 [Note] WSREP: Signalling provider to continue.
2019-10-17T06:37:01.445151Z 0 [Note] WSREP: Initialized wsrep sidno 2
2019-10-17T06:37:01.445185Z 0 [Note] WSREP: SST received: 650c3acb-eff8-11e9-9905-c73959fd46ca:743544
2019-10-17T06:37:01.445588Z 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.27'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server - (GPL), wsrep_25.19
2019-10-17T06:37:01.446278Z 2 [Note] WSREP: Receiving IST: 14886 writesets, seqnos 743544-758430
2019-10-17T06:37:01.446519Z 0 [Note] WSREP: Receiving IST...  0.0% (    0/14886 events) complete.
2019-10-17T06:37:02.539991Z 0 [Note] WSREP: (852308ca, 'tcp://0.0.0.0:4567') turning message relay requesting off
2019-10-17T06:37:11.453604Z 0 [Note] WSREP: Receiving IST... 11.1% ( 1648/14886 events) complete.
2019-10-17T06:37:21.485718Z 0 [Note] WSREP: Receiving IST... 23.5% ( 3504/14886 events) complete.
2019-10-17T06:37:31.686873Z 0 [Note] WSREP: Receiving IST... 37.8% ( 5632/14886 events) complete.
2019-10-17T06:37:31.751232Z 24 [Note] Got an error writing communication packets
2019-10-17T06:37:41.721891Z 0 [Note] WSREP: Receiving IST... 56.2% ( 8368/14886 events) complete.
2019-10-17T06:37:51.733818Z 0 [Note] WSREP: Receiving IST... 67.3% (10016/14886 events) complete.
2019-10-17T06:37:54.160644Z 39 [Note] Got an error writing communication packets
2019-10-17T06:38:01.739171Z 0 [Note] WSREP: Receiving IST... 80.0% (11904/14886 events) complete.
2019-10-17T06:38:03.165189Z 45 [Note] Got an error reading communication packets
2019-10-17T06:38:11.778534Z 0 [Note] WSREP: Receiving IST... 94.9% (14128/14886 events) complete.
2019-10-17T06:38:14.765552Z 0 [Note] WSREP: Receiving IST...100.0% (14886/14886 events) complete.
2019-10-17T06:38:14.765871Z 2 [Note] WSREP: IST received: 650c3acb-eff8-11e9-9905-c73959fd46ca:758430
2019-10-17T06:38:14.766840Z 0 [Note] WSREP: 2.0 (node3): State transfer from 1.0 (node2) complete.
2019-10-17T06:38:14.766873Z 0 [Note] WSREP: Shifting JOINER -> JOINED (TO: 777023)
2019-10-17T06:39:31.208799Z 0 [Note] WSREP: Member 2.0 (node3) synced with group.
2019-10-17T06:39:31.208825Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 777023)
2019-10-17T06:39:31.241137Z 2 [Note] WSREP: Synchronized with group, ready for connections
2019-10-17T06:39:31.241155Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.


Origin blog.csdn.net/wzy0623/article/details/102607193