Chapter 2 Setting up and configuring MHA
2.1 Environment preparation
2.1.1 Server roles
Role | Hostname | IP address
master | master-53 | 172.16.1.53
slave (candidate master) | slave-54 | 172.16.1.54
slave (MHA manager) | slave-55 | 172.16.1.55
2.1.2 Master-slave replication
Create the rep user on all nodes
grant replication slave on *.* to rep@'172.16.1.%' identified by '123456';
flush privileges;
Set up replication. On the master, look up the current binlog file and position:
show master status;
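On each slave, run the following, replacing the log file and position with the values reported by show master status:
change master to
master_host='172.16.1.53',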
master_port=3306,
master_user='rep',
master_password='123456',
master_log_file='mysql-bin.000002',
master_log_pos=120
;
Start replication
start slave;
Check the result
show slave status\G
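The two fields that matter in that output are Slave_IO_Running and Slave_SQL_Running; both have to read Yes. A minimal sketch of an automated check, assuming the output of show slave status\G is piped in on stdin (the function name check_slave_threads is hypothetical, not part of MySQL or MHA):

```shell
#!/bin/bash
# Reads `show slave status\G` output on stdin and reports whether both
# replication threads are running. Prints OK or FAIL; the exit status matches.
check_slave_threads() {
    awk -F': *' '
        # Strip leading whitespace so aligned and flattened output both match.
        { sub(/^[ \t]+/, "", $1) }
        $1 == "Slave_IO_Running"  { io  = $2 }
        $1 == "Slave_SQL_Running" { sql = $2 }
        END {
            if (io == "Yes" && sql == "Yes") { print "OK"; exit 0 }
            print "FAIL (IO=" io ", SQL=" sql ")"
            exit 1
        }
    '
}

# Possible use on a slave (socket path as used in this chapter):
# mysql -uroot -p -S /data/3306/mysql.sock -e 'show slave status\G' | check_slave_threads
```

Parsing the text output keeps the check independent of the MySQL client library; the trade-off is that it depends on the field names staying as they are in this MySQL version.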
2.2 Installing MHA
2.2.1 Install the MHA node package on all nodes
mkdir -p /home/oldboy/tools
cd /home/oldboy/tools
yum -y install perl-DBD-MySQL
yum localinstall /home/oldboy/tools/mha4mysql-node-0.56-0.el6.noarch.rpm -y
2.2.2 Install the MHA manager package on the manager node
yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
yum -y localinstall /home/oldboy/tools/mha4mysql-manager-0.56-0.el6.noarch.rpm
2.2.3 Generate an SSH key on every node and copy it to the other hosts
master-53
mkdir -p /root/.ssh
ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa &>/dev/null
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.54
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.55
slave-54
mkdir -p /root/.ssh
ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa &>/dev/null
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.53
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.55
slave-55
mkdir -p /root/.ssh
ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa &>/dev/null
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.53
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.54
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.55
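The per-host command lists above follow one pattern: every node copies its key to every other node, and the manager also copies it to itself. A small sketch that generates those commands, assuming the three addresses from the role table in 2.1.1 (the helper name copy_id_commands is hypothetical):

```shell
#!/bin/bash
# Hosts from the role table in 2.1.1.
HOSTS="172.16.1.53 172.16.1.54 172.16.1.55"

# Print the ssh-copy-id commands one node has to run.
# $1 = the node's own IP, $2 = "yes" to include the node itself (the manager).
copy_id_commands() {
    local self=$1 include_self=${2:-no} host
    for host in $HOSTS; do
        if [ "$host" = "$self" ] && [ "$include_self" != "yes" ]; then
            continue
        fi
        echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$host"
    done
}

# Possible use on the manager (review the printed commands, then pipe to sh):
# copy_id_commands 172.16.1.55 yes
```

Printing the commands instead of running them keeps the helper safe to try out; ssh-copy-id still prompts for each remote password when the commands are executed.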
2.3 Node configuration
2.3.1 Enable the binlog and relay log on every node's database
vim /data/3306/my.cnf
[mysqld]
log-bin = /data/3306/mysql-bin
relay-log = /data/3306/relay-bin
relay-log-info-file = /data/3306/relay-log.info
2.3.2 Change the binlog format to MIXED (optional)
vim /data/3306/my.cnf
[mysqld]
log-bin = /data/3306/mysql-bin
relay-log = /data/3306/relay-bin
relay-log-info-file = /data/3306/relay-log.info
binlog_format = "MIXED"
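2.3.3 Disable automatic relay log purging
By default, a slave's relay logs are deleted automatically once the SQL thread has finished executing them. In an MHA setup, however, recovering a lagging slave relies on the relay logs of the other slaves, so automatic purging is disabled. The relay logs then have to be cleaned up manually later on.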
set global relay_log_purge = 0;
or
vim /data/3306/my.cnf
[mysqld]
relay-log-purge = 0
Check the setting
mysql> show global variables like '%relay_log_purge%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| relay_log_purge | OFF   |
+-----------------+-------+
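With automatic purging off, relay logs accumulate until they are removed by hand. The MHA node package ships a purge_relay_logs script for this, and it is commonly run from cron at a different time on each slave. A minimal sketch that prints a cron.d entry; the work directory /data/3306/purge_tmp, the log file path, the /usr/bin install location and the helper name purge_cron_line are assumptions, and the work directory should sit on the same partition as the relay logs because the script works with hard links:

```shell
#!/bin/bash
# Build a cron.d entry that purges relay logs once a day on one slave.
# $1 = the slave's own IP, $2 = minute, $3 = hour.
# The account and password are the MHA admin account used in this chapter;
# the work directory and log file location are assumptions.
purge_cron_line() {
    local host=$1 minute=$2 hour=$3
    printf '%s %s * * * root /usr/bin/purge_relay_logs --user=mha --password=mha --host=%s --port=3306 --workdir=/data/3306/purge_tmp --disable_relay_log_purge >> /var/log/purge_relay_logs.log 2>&1\n' \
        "$minute" "$hour" "$host"
}

# Possible use, staggering the slaves by an hour:
# purge_cron_line 172.16.1.54 0 4
# purge_cron_line 172.16.1.55 0 5
```

Staggering the schedule avoids the situation where every slave has just discarded its relay logs at the moment a failover needs them.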
Summary:
Changes to the my.cnf configuration file
vim /data/3306/my.cnf
[mysqld]
log-bin = /data/3306/mysql-bin
relay-log = /data/3306/relay-bin
relay-log-info-file = /data/3306/relay-log.info
binlog_format = "MIXED"
relay-log-purge = 0
Restart the MySQL service
/data/3306/mysqld restart
2.3.4 Create the MHA admin account on all nodes
Log in to MySQL and create the account
mysql -uroot -poldboy123 -S /data/3306/mysql.sock
grant all on *.* to mha@'172.16.1.%' identified by 'mha';
flush privileges;
2.4 Configuring the MHA manager
Create the MHA configuration file on the manager
mkdir -p /etc/mha
mkdir -p /var/log/mha/app1/
vim /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1.log
master_binlog_dir=/data/3306
user=mha
password=mha
ping_interval=2
repl_password=123456
repl_user=rep
ssh_user=root
[server1]
hostname=172.16.1.53
port=3306
[server2]
candidate_master=1
check_repl_delay=0
hostname=172.16.1.54
port=3306
[server3]
hostname=172.16.1.55
port=3306
no_master=1
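A mistyped hostname in this file only surfaces later, during the checks, so it can help to list the server entries right after editing. A minimal sketch, assuming the key=value layout shown above with no spaces around the equals sign (the function name list_mha_hosts is hypothetical):

```shell
#!/bin/bash
# Print "[serverN] hostname" for every [serverN] block of an MHA config file.
# $1 = path to the config file, e.g. /etc/mha/app1.cnf
list_mha_hosts() {
    awk -F= '
        /^\[/ { section = $0 }    # remember the current section header
        $1 == "hostname" && section ~ /^\[server[0-9]+\]$/ { print section, $2 }
    ' "$1"
}

# Possible use on the manager:
# list_mha_hosts /etc/mha/app1.cnf
```

The [server default] section is skipped on purpose, since it carries shared settings and no hostname.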
2.5 Checking the manager configuration
2.5.1 Check SSH
If the following message appears, SSH mutual trust is configured correctly
[root@slave-55 tools]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Fri Jul 1 01:39:34 2016 - [info] All SSH connection tests passed successfully.
If output like the following appears, the configuration is wrong
[root@slave-55 tools]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Fri Jul 1 01:39:16 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Jul 1 01:39:16 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Jul 1 01:39:16 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Jul 1 01:39:16 2016 - [info] Starting SSH connection tests..
Fri Jul 1 01:39:18 2016 - [debug]
Fri Jul 1 01:39:16 2016 - [debug] Connecting via SSH from root@172.16.1.53(172.16.1.53:22) to root@172.16.1.54(172.16.1.54:22)..
Fri Jul 1 01:39:17 2016 - [debug] ok.
Fri Jul 1 01:39:17 2016 - [debug] Connecting via SSH from root@172.16.1.53(172.16.1.53:22) to root@172.16.1.55(172.16.1.55:22)..
Fri Jul 1 01:39:17 2016 - [debug] ok.
Fri Jul 1 01:39:18 2016 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
Fri Jul 1 01:39:17 2016 - [debug] Connecting via SSH from root@172.16.1.55(172.16.1.55:22) to root@172.16.1.53(172.16.1.53:22)..
Permission denied (publickey,password).
Fri Jul 1 01:39:17 2016 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@172.16.1.55(172.16.1.55:22) to root@172.16.1.53(172.16.1.53:22) failed!
Fri Jul 1 01:39:18 2016 - [debug]
Fri Jul 1 01:39:17 2016 - [debug] Connecting via SSH from root@172.16.1.54(172.16.1.54:22) to root@172.16.1.53(172.16.1.53:22)..
Fri Jul 1 01:39:17 2016 - [debug] ok.
Fri Jul 1 01:39:17 2016 - [debug] Connecting via SSH from root@172.16.1.54(172.16.1.54:22) to root@172.16.1.55(172.16.1.55:22)..
Fri Jul 1 01:39:18 2016 - [debug] ok.
SSH Configuration Check Failed!
at /usr/bin/masterha_check_ssh line 44
In that case the SSH keys need to be redistributed. Note that the manager's SSH mutual trust must include the manager itself.
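In a long check log the failing pair is easy to miss among the debug lines. A minimal sketch that pulls out only the failed source and target, assuming the "SSH connection from ... to ... failed!" wording shown in the log above (the function name failed_ssh_pairs is hypothetical):

```shell
#!/bin/bash
# Reads masterha_check_ssh output on stdin and prints one
# "source -> target" line for each failed SSH connection.
failed_ssh_pairs() {
    grep -E 'SSH connection from .* failed!' |
        sed -E 's/.*SSH connection from ([^ ]+) to ([^ ]+) failed!.*/\1 -> \2/'
}

# Possible use on the manager:
# masterha_check_ssh --conf=/etc/mha/app1.cnf 2>&1 | failed_ssh_pairs
```

Each printed pair names the host whose public key is missing on the target, which is where ssh-copy-id has to be run again.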
2.5.2 Check replication
[root@slave-55 tools]# masterha_check_repl --conf=/etc/mha/app1.cnf
Fri Jul 1 01:44:02 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Jul 1 01:44:02 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Jul 1 01:44:02 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Jul 1 01:44:02 2016 - [info] MHA::MasterMonitor version 0.56.
Creating directory /var/log/mah/app1.log.. done.
Fri Jul 1 01:44:02 2016 - [info] GTID failover mode = 0
Fri Jul 1 01:44:02 2016 - [info] Dead Servers:
Fri Jul 1 01:44:02 2016 - [info] Alive Servers:
Fri Jul 1 01:44:02 2016 - [info] 172.16.1.53(172.16.1.53:3306)
Fri Jul 1 01:44:02 2016 - [info] 172.16.1.54(172.16.1.54:3306)
Fri Jul 1 01:44:02 2016 - [info] 172.16.1.55(172.16.1.55:3306)
Fri Jul 1 01:44:02 2016 - [info] Alive Slaves:
Fri Jul 1 01:44:02 2016 - [info] 172.16.1.54(172.16.1.54:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Fri Jul 1 01:44:02 2016 - [info] Replicating from 172.16.1.53(172.16.1.53:3306)
Fri Jul 1 01:44:02 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Jul 1 01:44:02 2016 - [info] 172.16.1.55(172.16.1.55:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Fri Jul 1 01:44:02 2016 - [info] Replicating from 172.16.1.53(172.16.1.53:3306)
Fri Jul 1 01:44:02 2016 - [info] Current Alive Master: 172.16.1.53(172.16.1.53:3306)
Fri Jul 1 01:44:02 2016 - [info] Checking slave configurations..
Fri Jul 1 01:44:02 2016 - [info] read_only=1 is not set on slave 172.16.1.54(172.16.1.54:3306).
Fri Jul 1 01:44:02 2016 - [info] read_only=1 is not set on slave 172.16.1.55(172.16.1.55:3306).
Fri Jul 1 01:44:02 2016 - [info] Checking replication filtering settings..
Fri Jul 1 01:44:02 2016 - [info] binlog_do_db= , binlog_ignore_db=
Fri Jul 1 01:44:02 2016 - [info] Replication filtering check ok.
Fri Jul 1 01:44:02 2016 - [info] GTID (with auto-pos) is not supported
Fri Jul 1 01:44:02 2016 - [info] Starting SSH connection tests..
Fri Jul 1 01:44:05 2016 - [info] All SSH connection tests passed successfully.
Fri Jul 1 01:44:05 2016 - [info] Checking MHA Node version..
Fri Jul 1 01:44:05 2016 - [info] Version check ok.
Fri Jul 1 01:44:05 2016 - [info] Checking SSH publickey authentication settings on the current master..
Fri Jul 1 01:44:06 2016 - [info] HealthCheck: SSH to 172.16.1.53 is reachable.
Fri Jul 1 01:44:06 2016 - [info] Master MHA Node version is 0.56.
Fri Jul 1 01:44:06 2016 - [info] Checking recovery script configurations on 172.16.1.53(172.16.1.53:3306)..
Fri Jul 1 01:44:06 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/3306 --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000005
Fri Jul 1 01:44:06 2016 - [info] Connecting to root@172.16.1.53(172.16.1.53:22)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/3306, up to mysql-bin.000005
Fri Jul 1 01:44:06 2016 - [info] Binlog setting check done.
Fri Jul 1 01:44:06 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Jul 1 01:44:06 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.16.1.54 --slave_ip=172.16.1.54 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.49-log --manager_version=0.56 --relay_log_info=/data/3306/relay-log.info --relay_dir=/data/3306/data/ --slave_pass=xxx
Fri Jul 1 01:44:06 2016 - [info] Connecting to root@172.16.1.54(172.16.1.54:22)..
Checking slave recovery environment settings..
Opening /data/3306/relay-log.info ... ok.
Relay log found at /data/3306, up to relay-bin.000004
Temporary relay log file is /data/3306/relay-bin.000004
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Jul 1 01:44:07 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.16.1.55 --slave_ip=172.16.1.55 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.49-log --manager_version=0.56 --relay_log_info=/data/3306/relay-log.info --relay_dir=/data/3306/data/ --slave_pass=xxx
Fri Jul 1 01:44:07 2016 - [info] Connecting to root@172.16.1.55(172.16.1.55:22)..
Checking slave recovery environment settings..
Opening /data/3306/relay-log.info ... ok.
Relay log found at /data/3306, up to relay-bin.000004
Temporary relay log file is /data/3306/relay-bin.000004
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Jul 1 01:44:07 2016 - [info] Slaves settings check done.
Fri Jul 1 01:44:07 2016 - [info]
172.16.1.53(172.16.1.53:3306) (current master)
+--172.16.1.54(172.16.1.54:3306)
+--172.16.1.55(172.16.1.55:3306)
Fri Jul 1 01:44:07 2016 - [info] Checking replication health on 172.16.1.54..
Fri Jul 1 01:44:07 2016 - [info] ok.
Fri Jul 1 01:44:07 2016 - [info] Checking replication health on 172.16.1.55..
Fri Jul 1 01:44:07 2016 - [info] ok.
Fri Jul 1 01:44:07 2016 - [warning] master_ip_failover_script is not defined.
Fri Jul 1 01:44:07 2016 - [warning] shutdown_script is not defined.
Fri Jul 1 01:44:07 2016 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
2.6 Starting the MHA manager
Start the manager
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover > /var/log/mha/manager.log < /dev/null 2>&1 &
Check the process
[root@slave-55 tools]# ps -ef|grep master|grep -v grep
root 3790 1164 0 01:49 pts/0 00:00:00 perl /usr/bin/masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master
View the log
tail -F /var/log/mha/app1/manager.log
Check the current MHA status
[root@slave-55 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:3990) is running(0:PING_OK), master:172.16.1.53
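For monitoring it is handy to extract just the master address from that status line. A minimal sketch, assuming the "is running(0:PING_OK), master:" wording shown above (the function name current_master is hypothetical):

```shell
#!/bin/bash
# Reads masterha_check_status output on stdin and prints the current
# master address; prints nothing if the manager is not in PING_OK state.
current_master() {
    sed -n 's/.*is running(0:PING_OK), master:\(.*\)$/\1/p'
}

# Possible use on the manager:
# masterha_check_status --conf=/etc/mha/app1.cnf | current_master
```

An empty result is a signal in its own right: either the manager is stopped or it is in a state other than PING_OK.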
2.7 Testing
Check the replication status on slave-55
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.1.53
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 1381
Relay_Log_File: relay-bin.000004
Relay_Log_Pos: 459
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1381
Relay_Log_Space: 1262
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 53
1 row in set (0.00 sec)
Stop master-53
[root@master-53 tools]# /data/3306/mysqld stop
Stoping MySQL...
Enter password:
[root@master-53 tools]# lsof -i :3306
Check the status on slave-55 again.
Output like the following may appear:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Connecting to master
Master_Host: 172.16.1.54
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 313
Relay_Log_File: relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysql-bin.000006
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 313
Relay_Log_Space: 107
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1045
Last_IO_Error: error connecting to master 'rep@172.16.1.54:3306' - retry-time: 60 retries: 86400
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 53
1 row in set (0.00 sec)
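This error (Last_IO_Errno 1045, access denied) occurs because the rep account was not created on all nodes: the new master 172.16.1.54 has no rep user for the slave to connect with.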
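2.8 Restarting the old master
After master-53 restarts it cannot rejoin the cluster directly. It has to come back as a slave: bring its data up to date and then replicate from the new master.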
change master to
master_host='172.16.1.54',
master_port=3306,
master_user='rep',
master_password='123456',
master_log_file='mysql-bin.000003',
master_log_pos=522
;
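The binlog file and position for this statement do not have to be worked out by hand: during a failover the manager normally writes the full statement into its log. A minimal sketch that pulls the most recent one out of the log, assuming the log contains a line with the text CHANGE MASTER TO in upper case (the exact log wording is an assumption, the password in it is typically masked, and the function name last_change_master is hypothetical):

```shell
#!/bin/bash
# Print the last CHANGE MASTER TO statement found in an MHA manager log.
# $1 = path to the log, e.g. /var/log/mha/app1/manager.log
last_change_master() {
    grep 'CHANGE MASTER TO' "$1" | tail -n 1 |
        sed 's/.*\(CHANGE MASTER TO.*\)$/\1/'
}

# Possible use on the manager:
# last_change_master /var/log/mha/app1/manager.log
```

The printed statement still needs the real replication password filled in before it is run on the old master.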
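Restart the MHA manager process. masterha_manager exits once a failover has completed, and --remove_dead_master_conf removes the failed master's section from the configuration file, so the server entries have to be restored before the manager is started again. The example below uses a separate configuration file, app2.cnf; adjust the hostname values in it to match the actual servers.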
nohup masterha_manager --conf=/etc/mha/app2.cnf --remove_dead_master_conf --ignore_last_failover > /var/log/mha/manager.log < /dev/null 2>&1 &
[root@slave-55 ~]# cat /etc/mha/app2.cnf
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1.log
master_binlog_dir=/data/3306
password=mha
ping_interval=1
repl_password=123456
repl_user=rep
ssh_user=root
user=mha
[server1]
candidate_master=1
check_repl_delay=0
hostname=10.0.0.58
port=3306
[server2]
candidate_master=1
check_repl_delay=0
hostname=172.16.1.57
port=3306
[server3]
hostname=172.16.1.59
no_master=1
port=3306