MHA high availability configuration and failover of MySQL database

Table of contents

1. About MHA

1.1 Overview of MHA

1.2 Composition of MHA

1.3 Features of MHA

1.4 The working principle of MHA (emphasis)

2. Experiment preparation

2.1 Construction ideas

2.2 Environment preparation

2.3 Turn off the firewall and SELinux on all servers

2.4 Add domain name resolution on Master, Slave1, Slave2

3. Build MySQL MHA

3.1 Modify the configuration files of Master, Slave1, and Slave2

3.2 Create two soft links on the Master, Slave1, and Slave2 nodes

3.3 Configure MySQL as one master and two slaves

3.4 View the binary log file and synchronization point on the Master node

3.5 Perform the synchronization operation on the Slave1 and Slave2 nodes

3.6 Set Slave1 and Slave2 to read-only mode

3.7 Verify master-slave replication

4. Install the MHA software

4.1 Install the MHA dependencies on all servers; first install the epel repository

4.2 Install the manager component on the MHA-manager node (only needed on the manager)

4.3 Tools generated under /usr/local/bin after the manager component is installed

4.4 Scripts generated under /usr/local/bin after the node component is installed (normally triggered by MHA Manager scripts; no manual operation required)

4.5 Configure passwordless authentication on all servers

4.6 Passwordless login test

4.7 Configure MHA on the manager node

4.7.1 Copy the relevant scripts to /usr/local/bin on the manager node

4.7.2 Copy the VIP management script for automatic failover to /usr/local/bin; the master_ip_failover script manages the VIP and the failover

4.7.3 Modify master_ip_failover: delete everything, add the content below, and adjust the relevant parameters

4.7.4 Create the MHA software directory and copy the configuration file; the app1.cnf configuration file manages the MySQL node servers

4.7.5 Enable the virtual IP on the Master server

4.7.6 Test SSH passwordless authentication on the manager node; if normal, the output ends with "successfully"

4.7.7 Test the MySQL master-slave connection on the manager node; "MySQL Replication Health is OK" at the end indicates it is normal

4.7.8 Start MHA on the manager node

4.7.9 Check the MHA status; the current master is the master node

4.7.10 Check the MHA log; the current master is 192.168.40.172

4.7.11 Check that the master's VIP address 192.168.40.210 exists; the VIP does not disappear when the manager node stops the MHA service

5. Fault simulation

5.1 Monitor and observe the log records on the manager node

5.2 Stop the MySQL service on the Master

5.3 Check the configuration file and find that server1 is gone

5.4 Check on slave1 and slave2 whether slave1 takes over the VIP

5.5 Algorithm for choosing the failover candidate master

6. Troubleshooting

6.1 Repair the original master server

6.2 Repair master-slave replication (check the binary log file and synchronization point on the current master, slave1 (192.168.40.16))

6.3 Perform the synchronization operation on the original master server (192.168.40.172)

6.4 Modify the configuration file app1.cnf on the manager node (re-add the record, because it is removed automatically when the failure is detected)

6.5 Start MHA on the manager node

7. Summary

1. About MHA

1.1 Overview of MHA

  • MHA is an excellent suite of failover and master-slave replication software for MySQL high-availability environments. It was created to solve the single point of failure of MySQL.
  • During a MySQL failover, MHA can complete the switchover automatically within 0-30 seconds.
  • MHA ensures data consistency to the greatest extent during failover, achieving high availability in the true sense.

 

 1.2 Composition of MHA

MHA Node (data node) -------- runs on every MySQL server

MHA Manager (management node) ------- can be deployed on an independent machine to manage multiple master-slave clusters, or it can be deployed on one of the slave nodes

ps: MHA Manager periodically probes the master node in the cluster. When the master fails, it automatically promotes the slave with the latest data to be the new master, and then repoints all the other slaves to the new master. The entire failover process is completely transparent to the application.

 1.3 Features of MHA

During automatic failover, MHA tries to save the binary log from the crashed master server, so that data loss is minimized.

This is not always possible: for example, if the master's hardware fails or it is unreachable via SSH, MHA cannot save the binary log and can only fail over, losing the most recent data.

MHA can be combined with semi-synchronous replication to greatly reduce the risk of data loss: if even one slave has received the latest binary log, MHA can apply it to all the other slaves, keeping the data on all nodes consistent.

At present, MHA supports a one-master-multiple-slave architecture, requiring at least three servers (one master and two slaves).

Summary: MHA exists to solve failover, to preserve data as completely as possible, and to keep the logs of all nodes consistent.

 1.4 The working principle of MHA (emphasis)

1. Save the binary log events (binlog events) from the crashed Master;

2. Identify the Slave with the most recent updates;

3. Apply the differential relay logs to the other Slaves;

4. Apply the binary log events saved from the Master;

5. Promote one Slave to be the new Master;

6. Point the other Slaves at the new Master for replication.

2. Experiment preparation 

2.1 Construction ideas

1. MHA architecture
   1) Database installation
   2) One master, two slaves
   3) MHA setup

2. Failure simulation
   1) The primary database fails
   2) The candidate primary becomes the new primary
   3) The recovered original primary rejoins MHA as a slave

2.2 Environment preparation

master server:      192.168.40.172   MySQL and the MHA node component      hostname: master
slave1 server:      192.168.40.16    MySQL and the MHA node component      hostname: slave1
slave2 server:      192.168.40.17    MySQL and the MHA node component      hostname: slave2
MHA manager server: 192.168.40.170   MHA node and manager components       hostname: manager

2.3 Turn off the firewall and SELinux on all servers

# run on all four servers
systemctl stop firewalld
systemctl disable firewalld
setenforce 0

 2.4 Add domain name resolution in Master, Slave1, Slave2
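The original leaves this step without commands. A minimal sketch, assuming the hostnames and IPs listed in section 2.2, appends the mappings to /etc/hosts on the Master, Slave1 and Slave2 nodes:

```shell
# Append host-name resolution for the three database nodes (run on each of them).
# IPs and hostnames are the ones listed in section 2.2.
cat >> /etc/hosts <<'EOF'
192.168.40.172 master
192.168.40.16  slave1
192.168.40.17  slave2
EOF
```

After this, the hostname-based grants in section 3.3 ('mha'@'master', 'mha'@'slave1', 'mha'@'slave2') can resolve.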

 

 3. Build MySQL MHA

  3.1 Modify the configuration files of Master, Slave1, and Slave2

### Modify the main MySQL configuration file /etc/my.cnf on the Master, Slave1 and Slave2 nodes
### Master node ###
vim /etc/my.cnf
[mysqld]
server-id = 1
log_bin = master-bin
log-slave-updates = true
 
systemctl restart mysqld
 
 
### Slave1 node ###
vim /etc/my.cnf
[mysqld]
server-id = 2 						   ### the server-id of the three servers must all differ
log_bin = master-bin
relay-log = relay-log-bin
relay-log-index = slave-relay-bin.index
 
systemctl restart mysqld
 
 
### Slave2 node ###
vim /etc/my.cnf
[mysqld]
server-id = 3 
log_bin = master-bin
relay-log = relay-log-bin
relay-log-index = slave-relay-bin.index
 
systemctl restart mysqld

 

 

  3.2 Create two soft links for Master, Slave1, and Slave2 nodes
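The original does not show the commands for this step. A common sketch, assuming MySQL was installed from source under /usr/local/mysql (adjust the paths to your layout), links the mysql client and mysqlbinlog into a directory on PATH so MHA's node scripts can find them:

```shell
# Two soft links, one for the mysql client and one for mysqlbinlog
# (paths assume a /usr/local/mysql source install; adjust if yours differs).
ln -sf /usr/local/mysql/bin/mysql /usr/sbin/
ln -sf /usr/local/mysql/bin/mysqlbinlog /usr/sbin/
```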

 3.3 Configure MySQL as one master and two slaves

################# Grant on the Master, Slave1 and Slave2 nodes ###################
 
grant replication slave on *.* to 'myslave'@'192.168.40.%' identified by '123456';
# allow the myslave user to replicate from databases on the 192.168.40 network with this password
 
grant all privileges on *.* to 'mha'@'192.168.40.%' identified by 'manager';
# allow the mha user to perform all operations on databases on the 192.168.40 network with the password manager
 
grant all privileges on *.* to 'mha'@'master' identified by 'manager';
grant all privileges on *.* to 'mha'@'slave1' identified by 'manager';
grant all privileges on *.* to 'mha'@'slave2' identified by 'manager';
# grant again by hostname, in case the slaves cannot connect to the master by IP
 
 
# reload the privilege tables
flush privileges;

  3.4 View binary files and synchronization points on the Master node

 
################# on the master ##################
show master status;
# note: everyone's binary log file name and offset will be different

 3.5 Perform synchronization operations on Slave1 and Slave2 nodes 

change master to master_host='192.168.40.172',master_user='myslave',master_password='123456',master_log_file='master-bin.000001',master_log_pos=1455;
start slave;
show slave status\G

  3.6 Set Slave1 and Slave2 to read-only mode

set global read_only=1;

  3.7 Verify master-slave replication
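The original leaves this verification step empty. One minimal, hypothetical check is to create a database on the master and confirm it appears on both slaves (the database name test_repl is made up for illustration):

```sql
-- on the Master
create database test_repl;

-- on Slave1 and Slave2: the new database should have replicated over
show databases like 'test_repl';

-- and both replication threads should be running
-- (Slave_IO_Running: Yes, Slave_SQL_Running: Yes)
show slave status\G
```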

 

  4. Install MHA software

 4.1 Install the MHA dependencies on all servers; first install the epel repository

### Install the node component on the master, slave1, slave2 and mha servers (every machine)
 
yum install epel-release --nogpgcheck -y
 
yum install -y perl-DBD-MySQL \
perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN
 
 
# put the package mha4mysql-node-0.57.tar.gz into the /opt directory
 
cd /opt
tar zxvf mha4mysql-node-0.57.tar.gz
cd mha4mysql-node-0.57
perl Makefile.PL
make && make install

4.2. Install the manager component on the MHA-manager node (only need to be installed on the manager)

  • The required version differs per operating system; for the CentOS 7.4 used here, version 0.57 must be chosen.
  • The node component must be installed on all servers first, and the manager component is installed on the MHA-manager node last, because the manager depends on the node component.
## download the required package to /opt ##
 
## unpack and install the manager component on the MHA-manager node ##
cd /opt
tar zxf mha4mysql-manager-0.57.tar.gz
cd mha4mysql-manager-0.57
perl Makefile.PL
make && make install

4.3 After the manager component is installed, several tools will be generated under /usr/local/bin

masterha_check_ssh       check MHA's SSH configuration
masterha_check_repl      check the MySQL replication status
masterha_manager         script that starts the manager
masterha_check_status    check the current MHA running status
masterha_master_monitor  detect whether the master is down
masterha_master_switch   control failover (automatic or manual)
masterha_conf_host       add or delete configured server information
masterha_stop            stop the manager

 4.4 After the node component is installed, several scripts will be generated under /usr/local/bin (these tools are normally triggered by MHA Manager scripts and need no manual operation), mainly:

save_binary_logs        save and copy the master's binary log
apply_diff_relay_logs   identify differential relay log events and apply them to the other slaves
filter_mysqlbinlog      remove unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs        purge relay logs (without blocking the SQL thread)
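Of these node tools, purge_relay_logs is the one commonly run by hand or from cron on the slaves, because MHA wants the relay logs kept until it no longer needs them. A sketch of a cron entry, reusing the mha account granted in section 3.3 (the schedule, workdir and log path are arbitrary choices for illustration):

```shell
# /etc/cron.d/purge_relay_logs (sketch) - run hourly on each slave
0 * * * * root /usr/local/bin/purge_relay_logs --user=mha --password=manager --disable_relay_log_purge --workdir=/tmp >> /var/log/purge_relay_logs.log 2>&1
```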

4.5 Configure passwordless authentication on all servers 

### on the manager node, configure passwordless authentication to all database nodes
ssh-keygen -t rsa 				# press Enter at every prompt
ssh-copy-id 192.168.40.172
ssh-copy-id 192.168.40.16
ssh-copy-id 192.168.40.17
#### the database servers import each other's keys ####
 
 
### on the master
ssh-keygen -t rsa 				# press Enter at every prompt
ssh-copy-id 192.168.40.16
ssh-copy-id 192.168.40.17
 
 
### on slave1
ssh-keygen -t rsa 				# press Enter at every prompt
ssh-copy-id 192.168.40.17
ssh-copy-id 192.168.40.172
 
 
### on slave2
ssh-keygen -t rsa 				# press Enter at every prompt
ssh-copy-id 192.168.40.16
ssh-copy-id 192.168.40.172

4.6 Password-free login test 

### passwordless login test ###
 
### on the manager node
ssh 192.168.40.172
ssh 192.168.40.16
ssh 192.168.40.17
 
### on the master node
ssh 192.168.40.16
ssh 192.168.40.17
 
 
### on the slave1 node
ssh 192.168.40.172
ssh 192.168.40.17
 
 
### on the slave2 node
ssh 192.168.40.172
ssh 192.168.40.16

 4.7 Configure MHA on the manager node

 4.7.1 Copy the relevant scripts to the /usr/local/bin directory on the manager node

cp -rp /opt/mha4mysql-manager-0.57/samples/scripts /usr/local/bin
 
# four executable files are copied over
ll /usr/local/bin/scripts/

master_ip_failover       script for VIP management during automatic failover
master_ip_online_change  VIP management during an online switchover
power_manager            script to shut down a host after a failure
send_report              script to send an alert report after failover

4.7.2 Copy the VIP management script used during automatic failover to the /usr/local/bin directory; here the master_ip_failover script manages the VIP and the failover

cp /usr/local/bin/scripts/master_ip_failover /usr/local/bin

 4.7.3 Modify master_ip_failover: delete everything, add the following content, and adjust the relevant parameters

### modify master_ip_failover: delete everything, add the following content, and adjust the relevant parameters
 
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
 
use Getopt::Long;
 
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
############################# added section #########################################
my $vip = '192.168.40.210';									# the VIP address
my $brdc = '192.168.40.255';								# the VIP's broadcast address
my $ifdev = 'ens33';										# the NIC the VIP binds to
my $key = '1';												# the virtual NIC sequence number the VIP binds to
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";		# i.e. this variable expands to: ifconfig ens33:1 192.168.40.210
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";		# i.e. this variable expands to: ifconfig ens33:1 down
my $exit_code = 0;											# exit status code 0
#my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
#my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
##################################################################################
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
 
exit &main();
 
sub main {
 
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
 
if ( $command eq "stop" || $command eq "stopssh" ) {
 
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
 
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
## A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
 
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
 
# in vim, strip the leading '#' to uncomment the pasted script body
:2,87 s/^#//
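One detail worth double-checking after editing: the copied script must be executable, or the later masterha_check_repl and failover calls to it fail. cp -rp normally preserves the execute bit, so this is a defensive sketch rather than a required step:

```shell
# Ensure the failover script is executable (path from section 4.7.2).
if [ -f /usr/local/bin/master_ip_failover ]; then
    chmod +x /usr/local/bin/master_ip_failover
fi
```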

 4.7.4 Create the MHA software directory and copy the configuration file, here use the app1.cnf configuration file to manage the mysql node server

mkdir /etc/masterha
cp /opt/mha4mysql-manager-0.57/samples/conf/app1.cnf /etc/masterha
# edit the app1.cnf configuration file: delete the original content and add the following
vim /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/usr/local/mysql/data
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=manager
ping_interval=1
remote_workdir=/tmp
repl_password=123456
repl_user=myslave
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.40.16 -s 192.168.40.17
shutdown_script=""
ssh_user=root
user=mha
 
[server1]
hostname=192.168.40.172
port=3306
 
[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.40.16
port=3306
 
[server3]
hostname=192.168.40.17
port=3306

 4.7.5 Enable virtual IP on Master service

ifconfig ens33:1 192.168.40.210/24

 4.7.6 Test SSH passwordless authentication on the manager node; if everything is normal, the output ends with "successfully"

masterha_check_ssh -conf=/etc/masterha/app1.cnf

  4.7.7 Test the MySQL master-slave connection on the manager node; "MySQL Replication Health is OK" at the end indicates it is normal

masterha_check_repl -conf=/etc/masterha/app1.cnf

  4.7.8 Start MHA on the manager node

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

--remove_dead_master_conf: when a master-slave switchover occurs, the old master's IP is removed from the configuration file
manager_log: where the log is stored
--ignore_last_failover: by default, if MHA detects consecutive outages less than 8 hours apart, it refuses to fail over again; this limitation exists to avoid a ping-pong effect. After a failover, MHA records it in its working directory as the file app1.failover.complete (per the settings above). If that file is found at the next switchover, the failover is not triggered unless the file is deleted first. For convenience, --ignore_last_failover is set here so that marker file is ignored.
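The app1.failover.complete marker described above lives in the manager_workdir configured in section 4.7.4. If a repeated failover is being refused and --ignore_last_failover is not in use, removing the marker is the usual workaround; a small sketch (the path follows from manager_workdir=/var/log/masterha/app1):

```shell
# Simulate and then clear the marker that blocks a repeated failover.
mkdir -p /var/log/masterha/app1
touch /var/log/masterha/app1/app1.failover.complete   # written by MHA after a failover
rm -f /var/log/masterha/app1/app1.failover.complete   # delete it to allow the next failover
```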

4.7.9 Check the MHA status, you can see that the current master is the master node.

masterha_check_status --conf=/etc/masterha/app1.cnf

  4.7.10 Check the MHA log to see that the current master is 192.168.40.172, as shown below.

cat /var/log/masterha/app1/manager.log | grep "current master"

  4.7.11 Check whether the master's VIP address 192.168.40.210 exists. This VIP address will not disappear when the manager node stops the MHA service.

ifconfig

 5. Fault simulation

### monitor and observe the log records on the manager node
tail -f /var/log/masterha/app1/manager.log
 
### stop the mysql service on the Master node (192.168.40.172)
systemctl stop mysqld
pkill -9 mysql
 
### check on slave1 and slave2: after one normal automatic switchover, the MHA process exits. MHA automatically edits the app1.cnf file and removes the failed master (192.168.40.172).
# check whether slave1 has taken over the VIP
ifconfig
 
 
Algorithm for choosing the failover candidate master:
1. Slaves are compared by replication position (position/GTID); if the data differs, the slave closest to the master becomes the candidate master.
2. If the data is identical, the candidate master is chosen following the order in the configuration file.
3. If a weight is set (candidate_master=1), the candidate master is forced by weight.
(1) By default, a slave that is more than 100MB of relay logs behind the master is disqualified, even with a weight.
(2) With check_repl_delay=0, the slave is forced to be the candidate master even if it is far behind.

5.1 Monitor and observe log records on the manager node

tail -f /var/log/masterha/app1/manager.log

  5.2 Stop the mysql service on the Master 

systemctl stop mysqld

 5.3 Check the configuration file and find that server1 is gone

vim /etc/masterha/app1.cnf

 5.4 Check on slave1 and slave2 to see if slave1 takes over VIP
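The original shows no commands for this check. On slave1, the VIP from section 4.7.3 should now be bound to the ens33:1 alias; a quick look (interface name and VIP as configured earlier):

```shell
# On slave1: the VIP 192.168.40.210 should appear on the ens33:1 alias
ifconfig ens33:1

# equivalent check with iproute2
ip addr show dev ens33
```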

 

 5.5 Algorithm for choosing the failover candidate master

Slaves are compared by replication position (position/GTID); if the data differs, the slave closest to the master becomes the candidate master.
If the data is identical, the candidate master is chosen following the order in the configuration file.
If a weight is set (candidate_master=1), the candidate master is forced by weight.
By default, a slave that is more than 100MB of relay logs behind the master is disqualified even with a weight; with check_repl_delay=0, the slave is forced to be the candidate master even if it is far behind.

 6. Troubleshooting

1. # repair the original master's MySQL
systemctl restart mysqld
 
2. # repair master-slave replication
# check the binary log file and synchronization point on the current master, slave1 (192.168.40.16)
show master status;
# perform the synchronization operation on the original master server (192.168.40.172)
change master to master_host='192.168.40.16',master_user='myslave',master_password='123456',master_log_file='master-bin.000003',master_log_pos=154;
start slave;
show slave status\G
 
3. # modify the configuration file app1.cnf on the manager node (re-add the record, because it is removed automatically when the failure is detected)
vi /etc/masterha/app1.cnf
....
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.40.172 -s 192.168.40.17
....
[server1]
hostname=192.168.40.16
port=3306
 
[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.40.172
port=3306
 
[server3]
hostname=192.168.40.17
port=3306
 
4. # start MHA on the manager node
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

6.1 Repair the original master server

 

  6.2 Repair master-slave replication (check the binary log file and synchronization point on the current master server, slave1 (192.168.40.16))

mysql -u root -p
 
show master status;

 6.3 Perform the synchronization operation on the original master server (192.168.40.172)

 

reset slave;
change master to master_host='192.168.40.16',master_user='myslave',master_password='123456',master_log_file='master-bin.000002',master_log_pos=154;
 
start slave;
show slave status\G

  6.4 Modify the configuration file app1.cnf on the manager node (re-add the record, because it is removed automatically when the failure is detected)

vim /etc/masterha/app1.cnf
……
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.40.172 -s 192.168.40.17
......
[server1]
hostname=192.168.40.16
port=3306
 
[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.40.172
port=3306
 
[server3]
hostname=192.168.40.17
port=3306

 6.5 Start MHA on the manager node

masterha_stop --conf=/etc/masterha/app1.cnf
 
nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
 
masterha_check_status --conf=/etc/masterha/app1.cnf

7. Summary

1. MHA

① Role: MySQL high availability and failover.

② Core part: the two MHA components.

manager: its main job is to start and stop MHA and to monitor the various health states of MySQL.

node: when a failure occurs, it saves the binary log as completely as possible and carries out the failover (floating the VIP address to the new master).

③ Files MHA requires (2):

master_ip_failover: a script that defines VIP-based detection and failover (the VIP moves from the old master to the new master).

app1.cnf: the main MHA configuration file. It defines MHA's working directory, the log location, the MySQL binary log location, the user and password MHA uses to log in to MySQL, and the replication user and password the slaves use to synchronize from the master (five grants in total).

④ Actions MHA takes during failover:

(1) MHA tries several times to detect whether the master is still alive.

(2) MHA tries several times to save the master's binary log as completely as possible.

(3) Based on the configuration in app1.cnf, MHA promotes one slave server to be the new master server.

(4) MHA switches the master's VIP address over to the new master.

(5) After electing the new master, MHA executes the change master operation on the remaining slaves to point them at the new master, keeping the MySQL cluster healthy.

2. Common MHA pitfalls

① Create the soft links.

② Passwordless (key-based) SSH login between the nodes.

③ The five account grants (three of them are required in the test environment).

④ When running MHA for the first time, the virtual IP must be added manually on the master.

⑤ Configuration files and their verification (the master_ip_failover failover script and the app1.cnf main configuration file of MHA).

⑥ Install the node component first, then the manager component.

Origin blog.csdn.net/m0_57554344/article/details/131902501