MYSQL高可用MHA配置测试

环境:  mysql version 5.7.22 on redhat6.4 64bit
         master-db1,mha-node       192.168.0.100
         slave-db2,mha-node        192.168.0.105
         slave-db3,mha-manager    192.168.0.107

1. 192.168.0.100 master-db1 与 192.168.0.105 slave-db2 主从复制:略过
    192.168.0.100 master-db1 与 192.168.0.107 slave-db3 主从复制:略过

2.master-db1创建mha的管理和监控账号mha,密码123456
   create user 'mha'@'%' identified by '123456';
   grant all privileges on *.* to mha;

3.安装mha
  mha node上:
  yum install -y perl-DBD-MySQL ncftp perl-DBI.x86
  rpm -ivh mha4mysql-node-0.57-0.el7.noarch.rpm
  mha manager上:
  yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes ncftp
  rpm -ivh mha4mysql-manager-0.57-0.el7.noarch.rpm

4.配置3台服务器之间的ssh互相免密,如100 master-db1上,其他2台同理:
ssh-keygen会生成pub和private key在root home目录下
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]

5.mha manager配置文件:
  root@rhel64-64bit Desktop]# mkdir -p /etc/mha---->vim mha.conf
  [root@rhel64-64bit Desktop]# mkdir -p /home/mysql-mha
  mha.conf配置文件如下:

[server default]
user=mha
password=123456
manager_workdir=/home/mysql-mha
manager_log=/home/mysql-mha/manager.log
remote_workdir=/home/mysql-mha
ssh_user=root
repl_user=replicant
repl_password=123456
ping_interval=1
master_binlog_dir=/var/lib/mysql
master_ip_failover_script=/usr/bin/master_ip_failover
secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.0.100 -s 192.168.0.105 -s 192.168.0.107

[server1]
hostname=192.168.0.100
port=3306
candidate_master=1

[server2]
hostname=192.168.0.105
port=3306
candidate_master=1

[server3]
hostname=192.168.0.107
port=3306
no_master=1

6.mha manager上执行masterha_check_ssh --conf=/etc/mha/mha.conf报错:

[root@rhel64-64bit bin]# masterha_check_ssh --conf=/etc/mha/mha.conf
Mon Jun 11 13:53:21 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jun 11 13:53:21 2018 - [info] Reading application default configuration from /etc/mha/mha.conf..
Mon Jun 11 13:53:21 2018 - [info] Reading server configuration from /etc/mha/mha.conf..
Mon Jun 11 13:53:21 2018 - [info] Starting SSH connection tests..
Mon Jun 11 13:53:40 2018 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63] 
Mon Jun 11 13:53:22 2018 - [debug]  Connecting via SSH from [email protected](192.168.0.107:22) to [email protected](192.168.0.100:22)..
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Mon Jun 11 13:53:40 2018 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from [email protected](192.168.0.107:22) to [email protected](192.168.0.100:22) failed!
Mon Jun 11 13:54:33 2018 - [debug] 
Mon Jun 11 13:53:21 2018 - [debug]  Connecting via SSH from [email protected](192.168.0.100:22) to [email protected](192.168.0.105:22)..
Mon Jun 11 13:53:57 2018 - [debug]   ok.
Mon Jun 11 13:53:57 2018 - [debug]  Connecting via SSH from [email protected](192.168.0.100:22) to [email protected](192.168.0.107:22)..
SSH Configuration Check Failed!

从报错上看是权限问题,检查100和107机器的/root/.ssh下的id_rsa,id_rsa.pub包括.ssh文件夹等文件权限,600,644,700没有问题的,而且手动测试ssh from107到100也是正常,也是没有找到可以检查的日志,manager.log还没生成。。

查看脚本/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm
 my $local_file =
        "$workdir/$target->{ssh_host}_$target->{ssh_port}_ssh_check.log";
也木有ssh_check.log日志,全机器搜索也没有,大多网上资料说是公钥文件或权限等问题,建议重新配置
只能把所有机器中的.ssh下面的文件全删,重新配置,遇到新问题:
[root@rhel64-64bit .ssh]# ssh 192.168.0.105
Agent admitted failure to sign using the key.
[email protected]'s password:
解决办法:ssh-add   ~/.ssh/id_rsa 应该是把服务器的private key加到ssh-agent的高速缓存中,以便ssh时发送给对方服务器以作认证.

但是重新配置后,masterha_check_ssh问题依旧。。

最终,醒悟到manger和slave-db3是同一个机器,所以应该在107机器上配置ssh到本机的免密登录,问题得以解决:
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]

Mon Jun 11 16:33:42 2018 - [info] All SSH connection tests passed successfully.
7.mha manager上masterha_check_repl --conf=/etc/mha/mha.conf查看复制状态:
[root@rhel64-64bit Desktop]# masterha_check_repl --conf=/etc/mha/mha.conf
MySQL Replication Health is OK.

8.在mha manager上启动manager:
nohup masterha_manager --conf=/etc/mha/mha.conf &

[root@rhel64-64bit Desktop]# ps -ef|grep master
root      6823  6015  0 16:39 pts/10   00:00:00 perl /usr/bin/masterha_manager --conf=/etc/mha/mha.conf
[root@rhel64-64bit Desktop]# jobs -l
[1]+  8930 Running                 nohup masterha_manager --conf=/etc/mha/mha.conf &


[root@rhel64-64bit Desktop]#  masterha_check_status --conf=/etc/mha/mha.conf
mha (pid:8930) is running(0:PING_OK), master:192.168.0.100

9.在master-db1上,手动配置vip,这个和master_ip_failover里面脚本的配置vip网卡信息等要一致

[root@rhel64-64bit ~]# ifconfig eth0:1 192.168.0.109/24
[root@rhel64-64bit ~]# ip add
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:22:98:4d brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.100/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.109/24 brd 192.168.0.255 scope global secondary eth0:1
    inet6 fe80::20c:29ff:fe22:984d/64 scope link 
       valid_lft forever preferred_lft forever

10.故障转移和切换测试:

1>在master-db1上停止mysqld,
结果vip192.168.0.109没有漂移到105上,但是master已经切换到105了
查看/home/mysql-mha/manager.log:

IN SCRIPT TEST====root|sudo /sbin/ifconfig eth0:1 down==root|sudo /sbin/ifconfig eth0:1 192.168.0.109/24===

Enabling the VIP - 192.168.0.109/24 on the new master - 192.168.0.105 
sudo: sorry, you must have a tty to run sudo
可以见vip切换失败是因为没有设置好sudo
------------
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.0.100(192.168.0.100:3306)
The latest slave 192.168.0.105(192.168.0.105:3306) has all relay logs for recovery.
Selected 192.168.0.105(192.168.0.105:3306) as a new master.
192.168.0.105(192.168.0.105:3306): OK: Applying all logs succeeded.
192.168.0.105(192.168.0.105:3306): OK: Activated master IP address.
192.168.0.107(192.168.0.107:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.0.107(192.168.0.107:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.0.105(192.168.0.105:3306)
192.168.0.105(192.168.0.105:3306): Resetting slave info succeeded.
Master failover to 192.168.0.105(192.168.0.105:3306) completed successfully.
sudo: sorry, you must have a tty to run sudo
虽vip没有正常漂移,但是主已经从100服务器切换到105服务器了

检查master_ip_failover脚本,并把下面
my $ssh_start_vip = "sudo /sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "sudo /sbin/ifconfig eth0:$key down";
my $ssh_Bcast_arp= "sudo /sbin/arping -I bond0 -c 3 -A #vip#";
修改为:
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

2>重新测试,停止master-db1上的mysqld服务,检查这次vip可正常漂移到105服务器:

[root@rhel64-64bit .ssh]# ip add
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:5c:b2:0e brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.105/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.109/24 brd 192.168.0.255 scope global secondary eth0:1
    inet6 fe80::20c:29ff:fe5c:b20e/64 scope link 
       valid_lft forever preferred_lft forever

在107的slave-db3的主已经指向新的master,即slave-db2 192.168.0.105服务器了

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.105
                  Master_User: replicant
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mybinlogg.000015
          Read_Master_Log_Pos: 154
               Relay_Log_File: rhel64-64bit-relay-bin.000002
                Relay_Log_Pos: 320
        Relay_Master_Log_File: mybinlogg.000015
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

PS:每次一次自动切换完成后,mha manager进程便停止,并生成一个文件
        /home/mysql-mha/mha.failover.complete
        需要删除,重启mha开始新的监控,否可可能出现下次切换不成功的情况


11.总结:
mysql-mha特点:
1>当master-db1 down后,mha自动从candidate_master选择新的master,保存源master上的binlog,并识别最新
    日志点的slave,应用差异的relay和binlog到新的master,最大程度减小数据的丢失
mysql-mha缺点:
1>并不是绝对保证数据不丢失,有可能不能保存down掉的源master的最新的binlog,硬盘问题或ssh问题都可能导致
     日志保存失败
2>SSH免密登录配置存在一定安全风险
3>本身不支持vip转移,需要借助keepliaved第三方工具或自动开发脚本
4>从masterha_check_status输出,它只监控master



猜你喜欢

转载自blog.csdn.net/aryoyo/article/details/80655787