MySQL MHA

2 MHA Setup and Configuration

2.1 Environment Preparation

2.1.1 Server Roles

 

Role                      Hostname     IP address
master                    master-53    172.16.1.53
slave, candidate master   slave-54     172.16.1.54
slave, MHA manager        slave-55     172.16.1.55

 

2.1.2 Master-Slave Replication

Create the rep replication user on all nodes:

grant replication slave on *.* to rep@'172.16.1.%' identified by '123456';

flush privileges;

Set up replication. On the master, check the current binlog file name and position:

show master status;

On the slaves, run:

change master to

master_host='172.16.1.53',

master_port=3306,

master_user='rep',

master_password='123456',

master_log_file='mysql-bin.000002',

master_log_pos=120

;

Start replication:

start slave;

Check the result:

show slave status\G
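In the output of this command, the two fields that matter most are Slave_IO_Running and Slave_SQL_Running; both should read Yes. As a minimal sketch, the check can be scripted by piping the output through a small filter. The helper name slave_threads_ok is made up for illustration, and the mysql invocation in the comment assumes the socket path used later in this article.

```shell
# Reads the output of `show slave status\G` on stdin and succeeds only
# if both replication threads report "Yes".
# slave_threads_ok is a hypothetical helper, not part of MySQL or MHA.
slave_threads_ok() {
    awk -F': *' '
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END {
            # Strip trailing whitespace before comparing.
            sub(/[ \t\r]+$/, "", io); sub(/[ \t\r]+$/, "", sql)
            exit !(io == "Yes" && sql == "Yes")
        }
    '
}

# Example usage on a slave (assumed credentials and socket path):
#   mysql -uroot -p -S /data/3306/mysql.sock -e 'show slave status\G' \
#       | slave_threads_ok && echo "replication running"
```

The filter anchors on the exact field names, so related fields such as Last_IO_Error do not affect the result.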

2.2 Installing MHA

2.2.1 Install MHA Node on all nodes

mkdir -p /home/oldboy/tools

cd /home/oldboy/tools

yum -y install perl-DBD-MySQL

yum localinstall /home/oldboy/tools/mha4mysql-node-0.56-0.el6.noarch.rpm -y

2.2.2 Install MHA Manager on the manager node

yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes

yum -y localinstall /home/oldboy/tools/mha4mysql-manager-0.56-0.el6.noarch.rpm

 

2.2.3 Generate an SSH key on every node and copy it to the other hosts

 

master-53

mkdir -p /root/.ssh

ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa &>/dev/null

ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.54

ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.55

 

slave-54

mkdir -p /root/.ssh

ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa &>/dev/null

ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.53

ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.55

 

slave-55

mkdir -p /root/.ssh

ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa &>/dev/null

ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.53

ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.54

ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.55
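Since the same command is repeated for every target host, the list can also be generated with a loop. The following is only a sketch: it assumes root login on the three hosts from the server role table, and print_copy_cmds is a hypothetical helper name. It prints the commands instead of running them, so they can be reviewed first.

```shell
# Hosts taken from the server role table in section 2.1.1.
HOSTS="172.16.1.53 172.16.1.54 172.16.1.55"

# Prints one ssh-copy-id command per host. The local host is included on
# purpose: copying the key to the local host as well is harmless.
# print_copy_cmds is a hypothetical helper name.
print_copy_cmds() {
    for host in $HOSTS; do
        echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@${host}"
    done
}

# Example usage on each node, after ssh-keygen has been run there:
#   print_copy_cmds          # review the commands
#   print_copy_cmds | sh     # then execute them
```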

 

 

 

2.3 Node Configuration

2.3.1 Enable the binlog and relay log on every node's database

vim /data/3306/my.cnf

[mysqld]

log-bin = /data/3306/mysql-bin

relay-log = /data/3306/relay-bin

relay-log-info-file = /data/3306/relay-log.info

2.3.2 Change the binlog format to MIXED (optional)

vim /data/3306/my.cnf

[mysqld]

log-bin = /data/3306/mysql-bin

relay-log = /data/3306/relay-bin

relay-log-info-file = /data/3306/relay-log.info

binlog_format = "MIXED"

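2.3.3 Disable automatic relay log purging

By default, a slave's relay logs are deleted automatically once the SQL thread has finished executing them. In an MHA setup, however, recovering a lagging slave depends on the relay logs of the other slaves, so automatic purging must be disabled. The relay logs then have to be cleaned up manually later on.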

set global relay_log_purge = 0;

vim /data/3306/my.cnf

[mysqld]

relay-log-purge = 0

 

Check the setting:

 

mysql> show global variables like '%relay_log_purge%';

+-----------------+-------+

| Variable_name   | Value |

+-----------------+-------+

| relay_log_purge | OFF   |

+-----------------+-------+
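With purging disabled, relay logs accumulate until they are removed by hand. The MHA Node package ships a purge_relay_logs script for this purpose. The sketch below only builds and prints a command line so it can be reviewed or placed into cron; the account, password, and work directory are placeholders (assumptions, not values from this setup), and purge_cmd is a hypothetical helper name.

```shell
# Builds a purge_relay_logs command line for one local MySQL instance.
# purge_relay_logs comes with the MHA Node package; everything passed in
# here (user, password, port) is a placeholder to adapt.
# --workdir should be on the same filesystem as the relay logs, which in
# this article live under /data/3306 (assumed here).
purge_cmd() {
    user=$1; pass=$2; port=$3
    echo "purge_relay_logs --user=${user} --password=${pass} --port=${port} --disable_relay_log_purge --workdir=/data/3306"
}

# Example usage:
#   purge_cmd root 'PASSWORD' 3306
```

If this is scheduled through cron, running it at a different time on each slave avoids purging the relay logs on all slaves at the same moment.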

 

 

Summary

Changes made to the my.cnf configuration file:

vim /data/3306/my.cnf

[mysqld]

log-bin = /data/3306/mysql-bin

relay-log = /data/3306/relay-bin

relay-log-info-file = /data/3306/relay-log.info

binlog_format = "MIXED"

relay-log-purge = 0

Restart the MySQL service:

/data/3306/mysqld restart

 

2.3.4 Create the MHA management account on all nodes

Log in to MySQL and create the account:

mysql -uroot -poldboy123 -S /data/3306/mysql.sock

grant all on *.* to mha@'172.16.1.%' identified by 'mha';

flush privileges;

 

 

2.4 Configuring the MHA Manager

Create the MHA configuration file on the manager:

 

mkdir -p /etc/mha

mkdir -p /var/log/mha/app1/

vim /etc/mha/app1.cnf

[server default]

manager_log=/var/log/mha/app1/manager.log

manager_workdir=/var/log/mha/app1.log

master_binlog_dir=/data/3306

user=mha

password=mha

ping_interval=2

repl_password=123456

repl_user=rep

ssh_user=root

[server1]

hostname=172.16.1.53

port=3306

[server2]

candidate_master=1

check_repl_delay=0

hostname=172.16.1.54

port=3306

[server3]

hostname=172.16.1.55

port=3306

no_master=1
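Before running the checks, it can help to list which servers the configuration file actually defines. The awk-based helper below is a minimal sketch, not an MHA tool; list_mha_servers is a made-up name, and it only understands the simple key=value layout shown above.

```shell
# Prints "section hostname port" for every [serverN] block of an MHA
# configuration file. The [server default] block is skipped.
# list_mha_servers is a hypothetical helper, not part of MHA.
list_mha_servers() {
    awk -F= '
        function emit() { if (sec ~ /^server[0-9]+$/) print sec, host, port }
        /^\[/ {
            emit()
            sec = $0; gsub(/[][]/, "", sec)   # strip the square brackets
            host = ""; port = ""
            next
        }
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        key == "hostname" { host = val }
        key == "port"     { port = val }
        END { emit() }
    ' "$1"
}

# Example usage:
#   list_mha_servers /etc/mha/app1.cnf
```

This is also useful after a failover when the manager was started with --remove_dead_master_conf, because that option removes the dead master's section from the file.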

2.5 Checking the Manager Configuration

2.5.1 Check SSH

Output like the following means SSH trust is configured correctly:

[root@slave-55 tools]# masterha_check_ssh --conf=/etc/mha/app1.cnf

Fri Jul  1 01:39:34 2016 - [info] All SSH connection tests passed successfully.

Output like the following means the configuration is wrong:

[root@slave-55 tools]# masterha_check_ssh --conf=/etc/mha/app1.cnf

Fri Jul  1 01:39:16 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Fri Jul  1 01:39:16 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..

Fri Jul  1 01:39:16 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..

Fri Jul  1 01:39:16 2016 - [info] Starting SSH connection tests..

Fri Jul  1 01:39:18 2016 - [debug]

Fri Jul  1 01:39:16 2016 - [debug]  Connecting via SSH from root@172.16.1.53(172.16.1.53:22) to root@172.16.1.54(172.16.1.54:22)..

Fri Jul  1 01:39:17 2016 - [debug]   ok.

Fri Jul  1 01:39:17 2016 - [debug]  Connecting via SSH from root@172.16.1.53(172.16.1.53:22) to root@172.16.1.55(172.16.1.55:22)..

Fri Jul  1 01:39:17 2016 - [debug]   ok.

Fri Jul  1 01:39:18 2016 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]

Fri Jul  1 01:39:17 2016 - [debug]  Connecting via SSH from root@172.16.1.55(172.16.1.55:22) to root@172.16.1.53(172.16.1.53:22)..

Permission denied (publickey,password).

Fri Jul  1 01:39:17 2016 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@172.16.1.55(172.16.1.55:22) to root@172.16.1.53(172.16.1.53:22) failed!

Fri Jul  1 01:39:18 2016 - [debug]

Fri Jul  1 01:39:17 2016 - [debug]  Connecting via SSH from root@172.16.1.54(172.16.1.54:22) to root@172.16.1.53(172.16.1.53:22)..

Fri Jul  1 01:39:17 2016 - [debug]   ok.

Fri Jul  1 01:39:17 2016 - [debug]  Connecting via SSH from root@172.16.1.54(172.16.1.54:22) to root@172.16.1.55(172.16.1.55:22)..

Fri Jul  1 01:39:18 2016 - [debug]   ok.

SSH Configuration Check Failed!

 at /usr/bin/masterha_check_ssh line 44

In that case, redistribute the SSH keys. Note that the manager's SSH trust must include the manager host itself!
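In a long failure log, the only lines that say which direction is broken are the ones ending in "failed!". A small filter can pull out just those host pairs. This is a sketch based on the log format shown above; failed_ssh_pairs is a hypothetical helper name.

```shell
# Reads masterha_check_ssh output on stdin and prints one
# "source -> target" line for every SSH connection that failed.
# failed_ssh_pairs is a hypothetical helper, not part of MHA.
failed_ssh_pairs() {
    sed -n 's/.*SSH connection from \(.*\) to \(.*\) failed!.*/\1 -> \2/p'
}

# Example usage on the manager:
#   masterha_check_ssh --conf=/etc/mha/app1.cnf 2>&1 | failed_ssh_pairs
```

For each pair printed, the public key of the source host is missing on the target host, so ssh-copy-id needs to be run again on the source host against that target.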

 

2.5.2 Check replication

 

[root@slave-55 tools]# masterha_check_repl --conf=/etc/mha/app1.cnf

Fri Jul  1 01:44:02 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Fri Jul  1 01:44:02 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..

Fri Jul  1 01:44:02 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..

Fri Jul  1 01:44:02 2016 - [info] MHA::MasterMonitor version 0.56.

Creating directory /var/log/mah/app1.log.. done.

Fri Jul  1 01:44:02 2016 - [info] GTID failover mode = 0

Fri Jul  1 01:44:02 2016 - [info] Dead Servers:

Fri Jul  1 01:44:02 2016 - [info] Alive Servers:

Fri Jul  1 01:44:02 2016 - [info]   172.16.1.53(172.16.1.53:3306)

Fri Jul  1 01:44:02 2016 - [info]   172.16.1.54(172.16.1.54:3306)

Fri Jul  1 01:44:02 2016 - [info]   172.16.1.55(172.16.1.55:3306)

Fri Jul  1 01:44:02 2016 - [info] Alive Slaves:

Fri Jul  1 01:44:02 2016 - [info]   172.16.1.54(172.16.1.54:3306)  Version=5.5.49-log (oldest major version between slaves) log-bin:enabled

Fri Jul  1 01:44:02 2016 - [info]     Replicating from 172.16.1.53(172.16.1.53:3306)

Fri Jul  1 01:44:02 2016 - [info]     Primary candidate for the new Master (candidate_master is set)

Fri Jul  1 01:44:02 2016 - [info]   172.16.1.55(172.16.1.55:3306)  Version=5.5.49-log (oldest major version between slaves) log-bin:enabled

Fri Jul  1 01:44:02 2016 - [info]     Replicating from 172.16.1.53(172.16.1.53:3306)

Fri Jul  1 01:44:02 2016 - [info] Current Alive Master: 172.16.1.53(172.16.1.53:3306)

Fri Jul  1 01:44:02 2016 - [info] Checking slave configurations..

Fri Jul  1 01:44:02 2016 - [info]  read_only=1 is not set on slave 172.16.1.54(172.16.1.54:3306).

Fri Jul  1 01:44:02 2016 - [info]  read_only=1 is not set on slave 172.16.1.55(172.16.1.55:3306).

Fri Jul  1 01:44:02 2016 - [info] Checking replication filtering settings..

Fri Jul  1 01:44:02 2016 - [info]  binlog_do_db= , binlog_ignore_db=

Fri Jul  1 01:44:02 2016 - [info]  Replication filtering check ok.

Fri Jul  1 01:44:02 2016 - [info] GTID (with auto-pos) is not supported

Fri Jul  1 01:44:02 2016 - [info] Starting SSH connection tests..

Fri Jul  1 01:44:05 2016 - [info] All SSH connection tests passed successfully.

Fri Jul  1 01:44:05 2016 - [info] Checking MHA Node version..

Fri Jul  1 01:44:05 2016 - [info]  Version check ok.

Fri Jul  1 01:44:05 2016 - [info] Checking SSH publickey authentication settings on the current master..

Fri Jul  1 01:44:06 2016 - [info] HealthCheck: SSH to 172.16.1.53 is reachable.

Fri Jul  1 01:44:06 2016 - [info] Master MHA Node version is 0.56.

Fri Jul  1 01:44:06 2016 - [info] Checking recovery script configurations on 172.16.1.53(172.16.1.53:3306)..

Fri Jul  1 01:44:06 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/3306 --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000005

Fri Jul  1 01:44:06 2016 - [info]   Connecting to root@172.16.1.53(172.16.1.53:22)..

  Creating /var/tmp if not exists..    ok.

  Checking output directory is accessible or not..

   ok.

  Binlog found at /data/3306, up to mysql-bin.000005

Fri Jul  1 01:44:06 2016 - [info] Binlog setting check done.

Fri Jul  1 01:44:06 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..

Fri Jul  1 01:44:06 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.16.1.54 --slave_ip=172.16.1.54 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.49-log --manager_version=0.56 --relay_log_info=/data/3306/relay-log.info  --relay_dir=/data/3306/data/  --slave_pass=xxx

Fri Jul  1 01:44:06 2016 - [info]   Connecting to root@172.16.1.54(172.16.1.54:22)..

  Checking slave recovery environment settings..

    Opening /data/3306/relay-log.info ... ok.

    Relay log found at /data/3306, up to relay-bin.000004

    Temporary relay log file is /data/3306/relay-bin.000004

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Fri Jul  1 01:44:07 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.16.1.55 --slave_ip=172.16.1.55 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.49-log --manager_version=0.56 --relay_log_info=/data/3306/relay-log.info  --relay_dir=/data/3306/data/  --slave_pass=xxx

Fri Jul  1 01:44:07 2016 - [info]   Connecting to root@172.16.1.55(172.16.1.55:22)..

  Checking slave recovery environment settings..

    Opening /data/3306/relay-log.info ... ok.

    Relay log found at /data/3306, up to relay-bin.000004

    Temporary relay log file is /data/3306/relay-bin.000004

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Fri Jul  1 01:44:07 2016 - [info] Slaves settings check done.

Fri Jul  1 01:44:07 2016 - [info]

172.16.1.53(172.16.1.53:3306) (current master)

 +--172.16.1.54(172.16.1.54:3306)

 +--172.16.1.55(172.16.1.55:3306)

Fri Jul  1 01:44:07 2016 - [info] Checking replication health on 172.16.1.54..

Fri Jul  1 01:44:07 2016 - [info]  ok.

Fri Jul  1 01:44:07 2016 - [info] Checking replication health on 172.16.1.55..

Fri Jul  1 01:44:07 2016 - [info]  ok.

Fri Jul  1 01:44:07 2016 - [warning] master_ip_failover_script is not defined.

Fri Jul  1 01:44:07 2016 - [warning] shutdown_script is not defined.

Fri Jul  1 01:44:07 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
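The output is long, but only two things usually need attention: the final health line and any [warning] entries, such as the missing master_ip_failover_script above. The filter below is a minimal sketch that condenses the output to exactly that; repl_check_summary is a hypothetical helper name.

```shell
# Reads masterha_check_repl output on stdin, prints every warning, and
# succeeds only if the final "Health is OK" line was seen.
# repl_check_summary is a hypothetical helper, not part of MHA.
repl_check_summary() {
    awk '
        /\[warning\]/ {
            sub(/^.*\[warning\] */, "")   # drop timestamp and tag
            print "warning: " $0
        }
        /MySQL Replication Health is OK\./ { ok = 1 }
        END { exit !ok }
    '
}

# Example usage on the manager:
#   masterha_check_repl --conf=/etc/mha/app1.cnf 2>&1 | repl_check_summary \
#       && echo "replication check passed"
```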

 

 

 

2.6 Starting the MHA Manager

Start the manager:

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover > /var/log/mha/manager.log < /dev/null 2>&1 & 

Check the process:

[root@slave-55 tools]# ps -ef|grep master|grep -v grep

root       3790   1164  0 01:49 pts/0    00:00:00 perl /usr/bin/masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master

 

View the log:

tail -F /var/log/mha/app1/manager.log

 

Check the current MHA status:

[root@slave-55 ~]# masterha_check_status --conf=/etc/mha/app1.cnf

app1 (pid:3990) is running(0:PING_OK), master:172.16.1.53
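The status line also names the server that MHA currently treats as the master, which is handy for comparing before and after a failover. As a small sketch based on the line format shown above, the address can be extracted like this; current_master is a hypothetical helper name.

```shell
# Reads masterha_check_status output on stdin and prints the master
# address when the manager reports PING_OK; prints nothing otherwise.
# current_master is a hypothetical helper, not part of MHA.
current_master() {
    sed -n 's/.*is running(0:PING_OK), master:\(.*\)$/\1/p'
}

# Example usage on the manager:
#   masterha_check_status --conf=/etc/mha/app1.cnf | current_master
```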

2.7 Testing

Check the replication status on slave-55:

mysql> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 172.16.1.53

                  Master_User: rep

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mysql-bin.000005

          Read_Master_Log_Pos: 1381

               Relay_Log_File: relay-bin.000004

                Relay_Log_Pos: 459

        Relay_Master_Log_File: mysql-bin.000005

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB: mysql

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 1381

              Relay_Log_Space: 1262

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 53

1 row in set (0.00 sec)

 

Stop MySQL on master-53:

[root@master-53 tools]# /data/3306/mysqld stop

Stoping MySQL...

Enter password:

[root@master-53 tools]# lsof -i :3306

 

Check the status on slave-55 again.

Output like the following may appear:

mysql> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Connecting to master

                  Master_Host: 172.16.1.54

                  Master_User: rep

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mysql-bin.000006

          Read_Master_Log_Pos: 313

               Relay_Log_File: relay-bin.000001

                Relay_Log_Pos: 4

        Relay_Master_Log_File: mysql-bin.000006

             Slave_IO_Running: Connecting

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB: mysql

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 313

              Relay_Log_Space: 107

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 1045

                Last_IO_Error: error connecting to master 'rep@172.16.1.54:3306' - retry-time: 60  retries: 86400

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 53

1 row in set (0.00 sec)

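This happens because the rep account was not created on all nodes: the new master (172.16.1.54) has no matching account, so the slave's login is rejected with error 1045 (access denied).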

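2.8 Restarting the Old Master

After master-53 is restarted, it cannot rejoin the cluster directly. It has to come back as a slave: sync its data, then start replicating from the new master.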

change master to

master_host='172.16.1.54',

master_port=3306,

master_user='rep',

master_password='123456',

master_log_file='mysql-bin.000003',

master_log_pos=522

;
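The log file and position for this statement do not have to be worked out by hand. During a failover, the MHA manager log normally records a line of the form "All other slaves should start replication from here. Statement should be: CHANGE MASTER TO ..."; treat the exact wording as an assumption about the MHA version in use. The sketch below prints the most recent such statement from a log file. change_master_hint is a hypothetical helper name, and the password in the logged statement is masked, so it has to be filled in before the statement is executed.

```shell
# Prints the last CHANGE MASTER statement recorded in an MHA manager log.
# Assumes the log contains a "Statement should be: CHANGE MASTER TO ..."
# line, as written by MHA during failover.
# change_master_hint is a hypothetical helper, not part of MHA.
change_master_hint() {
    sed -n 's/.*Statement should be: \(CHANGE MASTER TO .*\)$/\1/p' "$1" | tail -n 1
}

# Example usage on the manager:
#   change_master_hint /var/log/mha/app1/manager.log
```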

Restart the MHA manager process:

nohup masterha_manager --conf=/etc/mha/app2.cnf --remove_dead_master_conf --ignore_last_failover > /var/log/mha/manager.log < /dev/null 2>&1 & 

[root@slave-55 ~]# cat /etc/mha/app2.cnf

[server default]

manager_log=/var/log/mha/app1/manager.log

manager_workdir=/var/log/mha/app1.log

master_binlog_dir=/data/3306

password=mha

ping_interval=1

repl_password=123456

repl_user=rep

ssh_user=root

user=mha

[server1]

candidate_master=1

check_repl_delay=0

hostname=10.0.0.58

port=3306

[server2]

candidate_master=1

check_repl_delay=0

hostname=172.16.1.57

port=3306

[server3]

hostname=172.16.1.59

no_master=1

port=3306


 


Reposted from blog.csdn.net/qq_39583463/article/details/80951244