MySQL database MHA high availability configuration and troubleshooting

MHA overview

MHA (Master High Availability) is a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu at DeNA in Japan (he later moved to Facebook). It is a set of tools for failover and master promotion in MySQL replication environments.
When the master fails, MHA can typically complete the failover automatically within 0 to 30 seconds, and during the failover it preserves data consistency as far as possible, which is what makes it high availability in a practical sense.
The software consists of two parts: MHA Manager (management node) and MHA Node (data node). MHA Manager can be deployed on a separate machine to manage multiple master-slave clusters, or it can be deployed on one of the slaves. MHA Node runs on each MySQL server. MHA Manager periodically probes the master of the cluster. When the master fails, it automatically promotes the slave with the latest data to be the new master and then repoints all other slaves to the new master. The entire failover process is transparent to the application.
During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always feasible. For example, if the master's hardware has failed or the host cannot be reached over SSH, MHA cannot save the binary logs; it fails over anyway and the most recent data is lost. Using MySQL's semi-synchronous replication greatly reduces this risk, and MHA can be combined with it: as long as at least one slave has received the latest binary log events, MHA can apply them to all other slaves, which keeps all nodes consistent.
At present, MHA mainly supports a one-master, multiple-slave architecture. To build MHA, a replication cluster must have at least three database servers: one acts as the master, one as the standby master, and one as a plain slave. Because three servers are a significant machine cost, Taobao modified MHA on this basis, and Taobao's TMHA now supports one master and one slave.
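
The promotion rule described above (pick the slave holding the latest data) can be illustrated with a small sketch. This is not MHA's actual election code, which also weighs settings such as candidate_master; the function name pick_latest_slave and the three-column input format (host, master log file, read position) are assumptions made purely for illustration.

```shell
# Illustrative only: choose the slave that has read furthest into the
# master's binary log. Input lines: "<host> <master_log_file> <read_pos>".
pick_latest_slave() {
    # Sort by log file name, then numerically by position; the last line wins.
    sort -k2,2 -k3,3n | tail -n 1 | cut -d' ' -f1
}

# Example with made-up positions for the two slaves used later in this lab:
printf '%s\n' \
    '20.0.0.6 master-bin.000001 1739' \
    '20.0.0.7 master-bin.000001 1520' | pick_latest_slave
```

Sorting on the file name works here only because binary log suffixes are zero-padded to a fixed width.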

MHA composition

MHA Manager (management node) receives external signals and monitors the working status of the data nodes below it.
MHA Node (data node) is the working unit that carries out the actual tasks.

MHA features

During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost.
Semi-synchronous replication, available since MySQL 5.5, greatly reduces the risk of data loss.

In a traditional architecture, however, there is only one MySQL master, so a failure of that single point paralyzes the entire cluster.
To handle this, a new master has to be established whenever the current master goes down. MHA takes care of the monitoring and the switch.

Time for the hands-on part! Lab environment:

master 20.0.0.5
slave01 20.0.0.6
slave02 20.0.0.7

Compile and install MySQL

Compile and install MySQL. Because the system used here is CentOS 7, MHA 0.57 is used for compatibility, together with MySQL 5.6.

yum -y install ncurses \
bison \
cmake \
gcc \
ncurses-devel \
gcc-c++

tar zxvf mysql-5.6.26.tar.gz
cd mysql-5.6.26/
cmake \
-DCMAKE_INSTALL_PREFIX=/usr/local/mysql \
-DDEFAULT_CHARSET=utf8 \
-DDEFAULT_COLLATION=utf8_general_ci \
-DWITH_EXTRA_CHARSETS=all \
-DSYSCONFDIR=/etc
make && make install

Copy configuration files and service startup scripts

[root@5centos mysql-5.6.26]# cp support-files/my-default.cnf /etc/my.cnf
cp: overwrite "/etc/my.cnf"? y
[root@5centos mysql-5.6.26]# cp support-files/mysql.server /etc/rc.d/init.d/mysqld

Add mysqld service

[root@5centos mysql-5.6.26]# chmod +x /etc/rc.d/init.d/mysqld
[root@7CentOS mysql-5.6.26]# chkconfig --add mysqld

Add environment variables

[root@5centos mysql-5.6.26]# echo 'export PATH=$PATH:/usr/local/mysql/bin' >> /etc/profile
[root@5centos mysql-5.6.26]# source /etc/profile

Add program running user

[root@5centos mysql-5.6.26]# groupadd mysql
[root@5centos mysql-5.6.26]# useradd -M -s /sbin/nologin mysql -g mysql
[root@5centos mysql-5.6.26]# chown -R mysql.mysql /usr/local/mysql

Initialize the database

/usr/local/mysql/scripts/mysql_install_db \
--basedir=/usr/local/mysql \
--datadir=/usr/local/mysql/data \
--user=mysql

Modify Master configuration file

[root@5centos mysql-5.6.26]# vim /etc/my.cnf
server-id = 1	## server-id must be different on each of the three MySQL servers
log_bin = master-bin
log-slave-updates = true

Modify slave01 and slave02 configuration files

[root@localhost mysql-5.6.26]# vim /etc/my.cnf
server-id = 2  ## on slave02 I used 3
log_bin = master-bin
relay-log = relay-log-bin
relay-log-index = slave-relay-bin.index
[root@7CentOS mysql-5.6.26]# systemctl restart mysqld
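
Since replication breaks when two servers share a server-id, a quick check before restarting can save some debugging. The sketch below assumes the three my.cnf files have been copied to one place under hypothetical names; check_unique_server_ids is an illustrative helper, not part of MySQL or MHA.

```shell
# Hypothetical helper: report any server-id value that appears in more
# than one of the given option files.
check_unique_server_ids() {
    dups=$(
        for f in "$@"; do
            # Extract the numeric value of the first server-id line,
            # ignoring surrounding spaces and any trailing comment.
            sed -n 's/^[[:space:]]*server[-_]id[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$f" | head -n 1
        done | sort | uniq -d
    )
    if [ -n "$dups" ]; then
        echo "duplicate server-id: $dups"
        return 1
    fi
    echo "server-id values are unique"
}

# Intended use (file names are placeholders):
# check_unique_server_ids master.cnf slave01.cnf slave02.cnf
```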

Start the service on all three servers

ln -s /usr/local/mysql/bin/mysql /usr/sbin/
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/sbin/
/usr/local/mysql/bin/mysqld_safe --user=mysql &

Enter the database

[root@5centos mysql-5.6.26]# mysql -u root -p
Enter password:   ## lab only, so an empty password is used for convenience
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.6.26-log Source distribution

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> 

Configure remote login

GRANT ALL ON *.* TO 'remote_user'@'%' IDENTIFIED BY 'abc123';

Configure master-slave service

mysql_slave is the account used for master-slave replication.
mha_manager is the account the MHA manager uses to connect to MySQL.

mysql> GRANT REPLICATION SLAVE ON *.* TO 'mysql_slave'@'20.0.0.%'  IDENTIFIED BY 'abc123';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON *.* TO 'mha_manager'@'20.0.0.%'  IDENTIFIED BY 'abc123';
Query OK, 0 rows affected (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

In theory the following three grants are not needed. In this lab environment, however, the MHA replication check reported that the two slaves could not connect to the master by host name, so the following grants were added on all databases.

mysql> GRANT ALL PRIVILEGES ON *.* TO 'mha_manager'@'master' IDENTIFIED BY 'abc123';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON *.* TO 'mha_manager'@'slave1' IDENTIFIED BY 'abc123';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON *.* TO 'mha_manager'@'slave2' IDENTIFIED BY 'abc123';
Query OK, 0 rows affected (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

View master status

mysql> show master status;
+-------------------+----------+--------------+------------------+-------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| master-bin.000001 |     1739 |              |                  |                   |
+-------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
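
The File and Position values above are the coordinates each slave needs when it is pointed at the master. As a minimal sketch of how they map onto a replication statement, the hypothetical helper below (build_change_master is not a MySQL tool) reads one tab-separated line, such as `mysql -N -B -e 'SHOW MASTER STATUS'` prints, and emits the statement. The host, user, and password are the ones used in this walkthrough.

```shell
# Hypothetical helper: turn one "File<TAB>Position" line into a
# CHANGE MASTER TO statement for the slaves.
build_change_master() {
    # Extra columns (Binlog_Do_DB and so on) land in _rest and are ignored.
    read -r file pos _rest
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "20.0.0.5" "mysql_slave" "abc123" "$file" "$pos"
}

# Example with the values from the output above:
printf 'master-bin.000001\t1739\n' | build_change_master
```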

Set up slave service

mysql> CHANGE MASTER TO MASTER_HOST='20.0.0.5', MASTER_USER='mysql_slave', MASTER_PASSWORD='abc123', MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=1739;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
            Slave_IO_State: Waiting for master to send event
            …… (output omitted) ……
            Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
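
Both Slave_IO_Running and Slave_SQL_Running must be Yes for replication to be healthy. When repeating this check on several slaves, it can be scripted. The sketch below assumes the status text arrives on standard input; check_slave_threads is a hypothetical helper name.

```shell
# Hypothetical helper: read `SHOW SLAVE STATUS\G` output on stdin and
# succeed only if both replication threads report "Yes".
check_slave_threads() {
    status=$(cat)
    io=$(printf '%s\n' "$status" | awk '$1 == "Slave_IO_Running:" { print $2 }')
    sql=$(printf '%s\n' "$status" | awk '$1 == "Slave_SQL_Running:" { print $2 }')
    if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
        echo "replication threads OK"
    else
        echo "replication problem: IO=${io:-?} SQL=${sql:-?}"
        return 1
    fi
}

# Intended use on a slave (needs a running server, so it is left commented out):
# mysql -u root -p -e 'SHOW SLAVE STATUS\G' | check_slave_threads
```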

Set both slaves to read-only

mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)

Install MHA

Install node on all servers, along with the dependencies MHA requires that were not needed earlier

[root@localhost bao]# yum install epel-release --nogpgcheck -y
[root@localhost bao]# yum install -y perl-DBD-MySQL \
> perl-Config-Tiny \
> perl-Log-Dispatch \
> perl-Parallel-ForkManager \
> perl-ExtUtils-CBuilder \
> perl-ExtUtils-MakeMaker \
> perl-CPAN

Compile and install node

[root@localhost bao]# tar zxvf mha4mysql-node-0.57.tar.gz 
[root@localhost bao]# cd mha4mysql-node-0.57/
perl Makefile.PL
make && make install

Compile and install manager (on the MHA manager node only)

[root@localhost mha4mysql-node-0.57]# cd ..
[root@localhost bao]# tar zxvf mha4mysql-manager-0.57.tar.gz
[root@localhost bao]# cd mha4mysql-manager-0.57/
[root@localhost mha4mysql-manager-0.57]# perl Makefile.PL
[root@localhost mha4mysql-manager-0.57]# make && make install

After the manager is installed, several tools are generated under /usr/local/bin, mainly the following:
masterha_check_ssh: checks the SSH configuration of MHA
masterha_check_repl: checks the MySQL replication status
masterha_manager: starts the manager
masterha_check_status: reports the current MHA running status
masterha_master_monitor: detects whether the master is down
masterha_master_switch: controls failover (automatic or manual)
masterha_conf_host: adds or removes configured server entries
masterha_stop: stops the manager

After node is installed, several scripts are also generated under /usr/local/bin. These are normally triggered by the MHA Manager scripts and need no manual operation. The main ones are:
save_binary_logs: saves and copies the master's binary logs
purge_relay_logs: purges relay logs (without blocking the SQL thread)
apply_diff_relay_logs: identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog: removes unnecessary ROLLBACK events (MHA no longer uses this tool)

[root@localhost mha4mysql-manager-0.57]# cd /usr/local/bin/
[root@localhost bin]# ls
apply_diff_relay_logs  masterha_conf_host        masterha_stop
filter_mysqlbinlog     masterha_manager          purge_relay_logs
masterha_check_repl    masterha_master_monitor   save_binary_logs
masterha_check_ssh     masterha_master_switch
masterha_check_status  masterha_secondary_check

SSH configuration

MHA

[root@localhost bin]# ssh-keygen -t rsa
## press Enter at each prompt, and answer y at the end
[root@localhost bin]# ssh [email protected]
[root@localhost bin]# ssh [email protected]
[root@localhost bin]# ssh [email protected]

master

ssh-keygen -t rsa
ssh [email protected]
ssh [email protected]

slave01

ssh-keygen -t rsa
ssh [email protected]
ssh [email protected]

slave02

ssh-keygen -t rsa
ssh [email protected]
ssh [email protected]
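
The goal of this section is that every node can log in to every other node as root without a password. One common way to distribute the public keys is ssh-copy-id. The sketch below only prints the commands for a given node rather than running them; the helper name print_key_copy_cmds is hypothetical, and the host list is taken from the lab environment above (the manager's own address is not listed in this article, so it is not included).

```shell
# Database hosts from this walkthrough; adjust to your own environment.
HOSTS="20.0.0.5 20.0.0.6 20.0.0.7"

# Print (not run) the ssh-copy-id command for every host except this one.
print_key_copy_cmds() {
    self=$1
    for h in $HOSTS; do
        if [ "$h" != "$self" ]; then
            echo "ssh-copy-id root@$h"
        fi
    done
}

# Example: commands to run on slave01 (20.0.0.6)
print_key_copy_cmds 20.0.0.6
```

On the manager node, passing its own address (which is not in HOSTS) yields commands for all three database servers.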

Configure MHA-Manager components

## On the MHA node, copy the sample scripts to /usr/local/bin
[root@mha ~]# cp -ra /root/mha4mysql-manager-0.57/samples/scripts /usr/local/bin
[root@mha ~]# ll /usr/local/bin/scripts/
total 32
-rwxr-xr-x. 1 1001 1001  3648 May 31  2015 master_ip_failover ## manages the VIP during automatic failover
-rwxr-xr-x. 1 1001 1001  9870 May 31  2015 master_ip_online_change ## manages the VIP during an online (manual) switch
-rwxr-xr-x. 1 1001 1001 11867 May 31  2015 power_manager ## shuts down the failed host after a failure
-rwxr-xr-x. 1 1001 1001  1360 May 31  2015 send_report ## sends an alert after a failover
[root@mha ~]# cp /usr/local/bin/scripts/master_ip_failover /usr/local/bin/ ## the VIP management script used during automatic failover

Modify the master_ip_failover script

[root@mha ~]# vim /usr/local/bin/master_ip_failover  ## delete the existing content and rewrite it as follows
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
#########################################################
my $vip   = '20.0.0.200';    ## virtual (floating) IP address
my $brdc  = '20.0.0.255';    ## broadcast address
my $ifdev = 'ens33';         ## network interface name
my $key   = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig ens33:$key down";
my $exit_code     = 0;
#my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
#my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
#########################################################
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        # Take the VIP down on the old master
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        # Bring the VIP up on the new master
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

Create MHA software directory and copy configuration files

[root@mha ~]# cd /usr/local/bin/scripts/
[root@mha scripts]# cp master_ip_online_change /usr/local/bin/
[root@mha scripts]# cp send_report /usr/local/
[root@mha scripts]# mkdir /etc/masterha
[root@mha ~]# cp /root/mha4mysql-manager-0.57/samples/conf/app1.cnf /etc/masterha
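[root@mha ~]# vim /etc/masterha/app1.cnf  ## delete the existing content and fill in the following
[server default]
# manager log file
manager_log=/var/log/masterha/app1/manager.log
# manager working directory
manager_workdir=/var/log/masterha/app1
# where the master stores its binlogs; must match the binlog path configured on the master so that MHA can find them
master_binlog_dir=/usr/local/mysql/data
# switch script used during automatic failover (the script shown above)
master_ip_failover_script=/usr/local/bin/master_ip_failover
# switch script used for a manual (online) switch
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
# password of the monitoring user created earlier (mha_manager)
password=abc123
# interval in seconds between pings to the master (default 3); failover starts automatically after three attempts without a response
ping_interval=1
# where binlogs are saved on the remote MySQL host when a switch happens
remote_workdir=/tmp
# password of the replication user created earlier
repl_password=abc123
# name of the replication user created earlier
repl_user=mysql_slave
# addresses of the slaves, used to double-check the master from other hosts
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 20.0.0.6 -s 20.0.0.7
# script that powers off the failed host after a failure (its main purpose is to prevent split-brain; not used here)
shutdown_script=""
# SSH login user
ssh_user=root
# monitoring user created earlier
user=mha_manager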

[server1]
hostname=20.0.0.5
port=3306

[server2]
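# mark this server as a candidate master
candidate_master=1
hostname=20.0.0.6
# by default MHA will not choose a slave as the new master if it is more than 100 MB of relay logs behind the master; 0 makes MHA ignore replication delay when choosing
check_repl_delay=0
port=3306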

[server3]
hostname=20.0.0.7
port=3306
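
Before running the MHA checks, it can help to list which hosts the configuration actually declares and compare them with the lab environment. The sketch below assumes a config file in the layout shown above, with each key on its own line; list_mha_servers is a hypothetical helper, not an MHA tool.

```shell
# Hypothetical helper: print "[serverN] <hostname>" for every server
# section in an MHA application config file.
list_mha_servers() {
    awk -F= '
        /^\[server[0-9]+\]/ { section = $0; next }   # entering a [serverN] block
        /^\[/               { section = ""; next }   # any other section
        section != "" && $1 == "hostname" { print section, $2 }
    ' "$1"
}

# Intended use on the manager node:
# list_mha_servers /etc/masterha/app1.cnf
```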

Test passwordless SSH authentication

If passwordless SSH is working, the output ends with "successfully".

[root@mha ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Thu Aug 27 23:19:44 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Aug 27 23:19:44 2020 - [info] Reading application … (output omitted) …
Thu Aug 27 23:19:51 2020 - [info] All SSH connection tests passed successfully.

Verify MySQL replication (MySQL must be running on all servers)

[root@mha ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 20.0.0.200===

Checking the Status of the script.. OK 
Thu Aug 27 23:21:00 2020 - [info]  OK.
Thu Aug 27 23:21:00 2020 - [warning] shutdown_script is not defined.
Thu Aug 27 23:21:00 2020 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

Start MHA

On first setup, the virtual IP has to be brought up manually on the master

[root@mysql1 ~]# /sbin/ifconfig ens33:1 20.0.0.200/24
[root@mha ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:65368) is running(0:PING_OK), master:20.0.0.5
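
The status check above presupposes that the manager process is already running. One common way to start it in the background is sketched here, as an assumption rather than the exact command used for this lab. The wrapper name start_mha_manager is hypothetical; masterha_manager and the two options are part of MHA.

```shell
# Hypothetical wrapper around the real masterha_manager command.
start_mha_manager() {
    conf=$1
    log=$2
    mkdir -p "$(dirname "$log")"
    # --remove_dead_master_conf: drop the failed master's section from the
    #   config file after a successful failover.
    # --ignore_last_failover: do not refuse to fail over just because a
    #   previous failover happened recently.
    nohup masterha_manager --conf="$conf" \
        --remove_dead_master_conf --ignore_last_failover \
        < /dev/null > "$log" 2>&1 &
}

# Intended use on the manager node (left commented out; it needs MHA installed):
# start_mha_manager /etc/masterha/app1.cnf /var/log/masterha/app1/manager.log
```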

Check that the VIP address 20.0.0.200 exists on mysql1. This VIP does not disappear when the MHA service is stopped on the manager node.

[root@mysql1 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 20.0.0.5  netmask 255.255.255.0  broadcast 20.0.0.255
        inet6 fe80::1301:89f0:4405:2aad  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:a9:8d:f9  txqueuelen 1000  (Ethernet)
        RX packets 9588  bytes 10245457 (9.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4708  bytes 579551 (565.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 20.0.0.200  netmask 255.255.255.0  broadcast 20.0.0.255
        ether 00:0c:29:a9:8d:f9  txqueuelen 1000  (Ethernet)

Verification

Watch the manager log

[root@mha ~]# tail -f /var/log/masterha/app1/manager.log

Simulated failure

[root@mysql1 ~]# pkill -9 mysql

Check where the VIP 20.0.0.200 is now (it should have moved to mysql2)

[root@mysql2 ~]# ifconfig 
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 20.0.0.6  netmask 255.255.255.0  broadcast 20.0.0.255
        inet6 fe80::1301:89f0:4405:2aad  prefixlen 64  scopeid 0x20<link>
        inet6 fe80::769:c122:2af2:c353  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:c2:17:3b  txqueuelen 1000  (Ethernet)
        RX packets 9714  bytes 10234408 (9.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5441  bytes 741224 (723.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 20.0.0.200  netmask 255.255.255.0  broadcast 20.0.0.255
        ether 00:0c:29:c2:17:3b  txqueuelen 1000  (Ethernet)
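
Rather than reading the interface listing by eye on each host, the VIP check can be scripted. The sketch below assumes the modern ifconfig format shown above (or `ip addr` output), where the address follows the word inet; has_vip is a hypothetical helper name.

```shell
# Hypothetical helper: read `ifconfig` or `ip addr` output on stdin and
# succeed if the given IPv4 address is configured on some interface.
has_vip() {
    awk -v vip="$1" '
        # Strip a possible /prefix (as printed by ip addr) before comparing.
        $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }
    '
}

# Intended use on each database host:
# ifconfig | has_vip 20.0.0.200 && echo "VIP is on this host"
```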


Origin blog.csdn.net/Ora_G/article/details/108265112