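About MHA:
1. MHA (Master High Availability) monitors the master node and can automatically fail over to one of the slaves by promoting that slave to be the new master. It is built on master-slave replication and also needs cooperation from the client side. MHA currently mainly supports a one-master, multiple-slave architecture. Building MHA requires at least three database servers in one replication cluster, one master and two slaves: one acts as the master, one as the standby master, and one as a plain slave. If budget allows, a dedicated server can also be used as the MHA monitoring and management server.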
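2. How MHA works
1 Save the binary log events (binlog events) from the crashed master
2 Identify the slave that holds the most recent updates
3 Apply the differential relay log to the other slaves
4 Apply the binary log events (binlog events) saved from the master
5 Promote one slave to be the new master
6 Point the other slaves at the new master for replication
Note: MHA relies on SSH with key-based login.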
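Step 2 above can be illustrated with a small shell sketch. It assumes that "most recent" is judged by the binlog coordinates each slave has received (Master_Log_File and Read_Master_Log_Pos in show slave status); latest_slave is a hypothetical helper for illustration, not part of MHA.

```shell
# Hypothetical helper: reads lines of
#   "host master_log_file read_master_log_pos"
# from stdin and prints the host that has received the most binlog data.
latest_slave() {
    # Sort by binlog file name first, then numerically by position;
    # the last line after sorting is the most up-to-date slave.
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}
```

Feeding it one line per slave, for example "172.25.44.2 binlog.000003 1245", prints the address of the slave that is furthest ahead.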
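The MHA software consists of two parts, the Manager package and the Node package, described below.
1. The Manager package mainly includes the following tools:
masterha_check_ssh Check the MHA SSH configuration
masterha_check_repl Check the MySQL replication status
masterha_manager Start MHA
masterha_check_status Check the current MHA running state
masterha_master_monitor Detect whether the master is down
masterha_master_switch Control failover (automatic or manual)
masterha_conf_host Add or remove configured server entries
2. The Node package (these tools are normally triggered by MHA Manager scripts and need no manual operation) mainly includes the following tools:
save_binary_logs Save and copy the master's binary logs
apply_diff_relay_logs Identify differential relay log events and apply them to the other slaves
filter_mysqlbinlog Remove unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs Purge relay logs (without blocking the SQL thread)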
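Environment (server1, 2 and 3 already run master-slave replication; semi-synchronous replication is preferable):
Server1 # MHA manager and master # to save resources, the master and MHA share one host
Server2 # slave
Server3 # slave
Deploying MHA
1. First restore the initial environment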
[root@server1 ~]# cd /var/lib/mysql
[root@server1 mysql]# rm -fr *
[root@server1 mysql]# ls
[root@server1 mysql]# vim /etc/my.cnf
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid


server_id=1 # this id must be unique on every host; no two hosts may share a value
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
[root@server1 mysql]# /etc/init.d/mysqld stop
Stopping mysqld: [ OK ]
[root@server1 mysql]# /etc/init.d/mysqld start
Starting mysqld: [ OK ]
[root@server1 mysql]# grep password /var/log/mysqld.log
2018-08-11T01:39:31.095743Z 1 [Note] A temporary password is generated for root@localhost: Imk-+>OP2AeQ
[root@server1 mysql]# mysql_secure_installation
# Apart from server_id, server2 and server3 are configured exactly like server1; server2 is the standby master
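Since the three hosts differ only in server_id, the replication settings can be produced from a single template. The following is a minimal sketch; repl_settings is a hypothetical helper name, and the option values are the ones from the my.cnf listing above.

```shell
# Hypothetical helper: print the replication settings for one host.
# $1 is the server_id, which must be different on every host.
repl_settings() {
    cat <<EOF
server_id=$1
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW
EOF
}
```

Running repl_settings 2 on server2 and repl_settings 3 on server3 prints the block to paste into the [mysqld] section of /etc/my.cnf on each host.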
Server1:
[root@server1 mysql]# mysql -p
Enter password:
mysql> show master status;
+---------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+------------------------------------------+
| binlog.000003 | 954 | | | 6470b0b9-9d07-11e8-b102-525400488d86:1-4 |
+---------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)
mysql> grant replication slave on *.* to la@'172.25.44.%' identified by 'Ting@666';
Query OK, 0 rows affected, 1 warning (0.06 sec)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.00 sec)
mysql> create database test;
Query OK, 1 row affected (0.04 sec)
Server2,server3:
[root@server2 mysql]# mysql -p
Enter password:
mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> change master to master_host='172.25.44.1',master_user='la',master_password='Ting@666',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.44 sec)
mysql> start slave;
Query OK, 0 rows affected (0.07 sec)
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.44.1
Master_User: la
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000003
Read_Master_Log_Pos: 1245
Relay_Log_File: server2-relay-bin.000002
Relay_Log_Pos: 1452
Relay_Master_Log_File: binlog.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
| test |
+--------------------+
5 rows in set (0.00 sec)
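The two fields to look for in the output above are Slave_IO_Running and Slave_SQL_Running. As a rough sketch, the check can be scripted; slave_threads_ok is a hypothetical helper that only parses text in the show slave status\G layout shown above.

```shell
# Hypothetical helper: reads "show slave status\G" output on stdin and
# succeeds only if both replication threads report "Yes".
slave_threads_ok() {
    awk -F': *' '
        { gsub(/^[ \t*]+/, "", $1) }                      # drop leading padding
        $1 == "Slave_IO_Running"  && $2 == "Yes" { io = 1 }
        $1 == "Slave_SQL_Running" && $2 == "Yes" { sql = 1 }
        END { exit !(io && sql) }                         # status 0 only if both are Yes
    '
}
```

One possible use on a slave is: mysql -p -e 'show slave status\G' | slave_threads_ok && echo "replication is running"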
# If replication fails to start at this point, the fix is:
Server1:
mysql> reset master;
Server2:
mysql> stop slave;
mysql> start slave;
Building the architecture:
1. Data synchronization (logical check of data consistency)
2. Data architecture logic
Server2:
[root@server2 ~]# yum install mha4mysql-node-0.56-0.el6.noarch.rpm -y
Server3:
[root@server3 ~]# yum install mha4mysql-node-0.56-0.el6.noarch.rpm -y
Server1:
[root@server1 ~]# ls
master_ip_failover
master_ip_online_change
mha4mysql-manager-0.56-0.el6.noarch.rpm
mha4mysql-node-0.56-0.el6.noarch.rpm
perl-Config-Tiny-2.12-7.1.el6.noarch.rpm
perl-Email-Date-Format-1.002-5.el6.noarch.rpm
perl-Log-Dispatch-2.27-1.el6.noarch.rpm
perl-Mail-Sender-0.8.16-3.el6.noarch.rpm
perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
perl-MIME-Lite-3.027-2.el6.noarch.rpm
perl-MIME-Types-1.28-2.el6.noarch.rpm
perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
[root@server1 ~]# yum install * -y
[root@server1 ~]# vim /etc/yum.repos.d/rhel-source.repo
[root@server1 ~]# mkdir -p /etc/masterha
[root@server1 ~]# cd /etc/masterha
[root@server1 masterha]# vim app1.cnf
[server default]
# working directory of the manager
manager_workdir=/etc/masterha/
# log file of the manager
manager_log=/etc/masterha/app1.log
# directory where the master keeps its binlogs
master_binlog_dir=/var/lib/mysql
# script run during automatic failover
#master_ip_failover_script= /usr/local/bin/master_ip_failover
# script run during a manual switchover
#master_ip_online_change_script= /usr/local/bin/master_ip_online_change
# password of the monitoring user
password=Ting@666
# monitoring user: root
user=root
# interval in seconds between pings to the master (default 3); failover starts automatically after three attempts get no response
ping_interval=1
# directory on the remote MySQL hosts where binlogs are saved during a switch
remote_workdir=/tmp
# password of the replication user
repl_password=Ting@666
# replication user
repl_user=la
# script that sends an alert after a switch
#report_script=/usr/local/send_report
#secondary_check_script= /usr/local/bin/masterha_secondary_check -s server03 -s server02
#shutdown_script=""
ssh_user=root

[server1]
hostname=172.25.44.1
port=3306

[server2]
hostname=172.25.44.2
port=3306
# Marks this host as the candidate master: once set, this slave is promoted after a failover even if it is not the slave with the latest events in the cluster
#candidate_master=1
# By default MHA does not choose a slave that is more than 100 MB of relay logs behind the master, because recovering it would take a long time. With check_repl_delay=0, MHA ignores replication delay when it picks the new master. This is useful for a host with candidate_master=1, since that candidate must become the new master during the switch
#check_repl_delay=0
[server3]
hostname=172.25.44.3
port=3306
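Before running the MHA checks it can help to list which hosts the configuration actually names. This is a minimal sketch; mha_hosts is a hypothetical helper, and it assumes each [serverN] section lists hostname before port, as in the file above.

```shell
# Hypothetical helper: print "hostname:port" for every [serverN] section
# of the MHA configuration file given as $1.
mha_hosts() {
    awk -F= '
        # A new section starts; remember whether it is a [serverN] section.
        /^\[/                         { in_server = ($0 ~ /^\[server[0-9]+\]/) }
        in_server && $1 == "hostname" { host = $2 }
        in_server && $1 == "port"     { print host ":" $2 }
    ' "$1"
}
```

Calling mha_hosts /etc/masterha/app1.cnf prints one line per database host and skips the [server default] section.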
[root@server1 masterha]# masterha_check_ssh --conf=/etc/masterha/app1.cnf ## check the SSH connections from the MHA Manager to all MHA Nodes; at this point the check fails
[root@server1 ~]# ssh-keygen # generate a key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
7d:ec:3d:41:2f:e8:60:c7:14:e6:3a:48:a1:c7:c6:31 root@server1
The key's randomart image is:
+--[ RSA 2048]----+
| E o |
| + + o . |
| . * o . |
| + o = o . |
| S * * o .|
| . B . o |
| o o |
| . |
| |
+-----------------+
[root@server1 ~]# ssh-copy-id server1 # install the public key
The authenticity of host 'server1 (172.25.44.1)' can't be established.
RSA key fingerprint is b3:76:b6:52:15:42:f0:60:dc:02:cc:56:39:c3:b2:83.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'server1' (RSA) to the list of known hosts.
root@server1's password:
Now try logging into the machine, with "ssh 'server1'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@server1 ~]# scp -r .ssh/ server2:
root@server2's password:
authorized_keys 100% 394 0.4KB/s 00:00
id_rsa 100% 1671 1.6KB/s 00:00
known_hosts 100% 1965 1.9KB/s 00:00
id_rsa.pub 100% 394 0.4KB/s 00:00
[root@server1 ~]# scp -r .ssh/ server3:
root@server3's password:
authorized_keys 100% 394 0.4KB/s 00:00
id_rsa 100% 1671 1.6KB/s 00:00
known_hosts 100% 1965 1.9KB/s 00:00
id_rsa.pub 100% 394 0.4KB/s 00:00
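After the keys are copied, every host should accept a login without asking for a password. The loop below is a rough sketch of that test; check_ssh and the SSH_BIN override are hypothetical names used for illustration, while BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
# Hypothetical helper: try a non-interactive login to each host given as an
# argument and report the result. SSH_BIN can be overridden for a dry run.
check_ssh() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

For this setup the call would be check_ssh server1 server2 server3, or the three IP addresses if name resolution is not configured.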
# Test whether login works without a password (if name resolution is not configured, test with IP addresses):
[root@server1 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
[root@server1 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf # health check
# The check reports an error; fix it
[root@server1 ~]# mysql -p
Enter password:
mysql> grant all on *.* to root@'%' identified by 'Ting@666'; # create the monitoring user root and grant it privileges
Query OK, 0 rows affected, 1 warning (0.05 sec)
[root@server1 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
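## If the check reports an error
Solution: set read_only on the two slaves, server2 and server3 (the slaves serve read traffic; the setting is not written into the configuration file because a slave may be promoted to master at any time)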
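Setting read_only at runtime rather than in my.cnf could look like the sketch below. It assumes the root account may connect to each slave over the network; set_read_only and the MYSQL_BIN override are hypothetical names, while SET GLOBAL read_only is the standard MySQL statement.

```shell
# Hypothetical helper: switch each host given as an argument to read-only.
# The setting is applied at runtime only, so it is lost on restart and can
# be turned off again when a slave is promoted to master.
set_read_only() {
    for host in "$@"; do
        # -p makes the client prompt for the password of root.
        "${MYSQL_BIN:-mysql}" -h "$host" -u root -p -e 'SET GLOBAL read_only = 1;'
    done
}
```

For the two slaves here the call would be set_read_only 172.25.44.2 172.25.44.3.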
## If another error follows:
Solution (on server1):
Manual switchover
[root@server1 ~]# masterha_master_switch --master_state=alive --conf=/etc/masterha/app1.cnf --new_master_host=172.25.44.1 --new_master_port=3306 --orig_master_is_new_slave # with the master alive, manually switch the master to server1
Sat Aug 11 15:16:12 2018 - [info] Switching master to 172.25.44.2(172.25.44.2:3306) completed successfully.
Check on server3
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.44.1 # the master has been switched
Master_User: la
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000001
Read_Master_Log_Pos: 808
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 405
Relay_Master_Log_File: binlog.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Check on server1 # it is now the master, so there is no slave status to show
mysql> show slave status\G;
Empty set (0.00 sec)
ERROR:
No query specified
Automatic failover
[root@server1 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &
[root@server1 ~]# ps ax
28066 pts/0 S 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql
28359 pts/0 Sl 0:09 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/my
[root@server1 ~]# kill -9 28066
[root@server1 ~]# kill -9 28359
Check on the slave
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.44.2
Master_User: la
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000001
Read_Master_Log_Pos: 819
Relay_Log_File: server3-relay-bin.000002
Relay_Log_Pos: 405
Relay_Master_Log_File: binlog.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Master_Server_Id: 2
Master_UUID: 85a2b952-9d07-11e8-be6f-525400d1190e
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: 85a2b952-9d07-11e8-be6f-525400d1190e:1-3
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
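The field that confirms the failover is Master_Host, which now points at 172.25.44.2. A minimal sketch for pulling that value out of the status output follows; current_master is a hypothetical helper that only parses text in the layout shown above.

```shell
# Hypothetical helper: reads "show slave status\G" output on stdin and
# prints the value of the Master_Host field.
current_master() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }            # drop leading padding
        $1 == "Master_Host" { print $2; exit } # first match is enough
    '
}
```

On a slave this could be used as: mysql -p -e 'show slave status\G' | current_master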