分析MHA的manager.log查看failover的四个阶段

MHA failover5个阶段：

* Phase 1: Configuration Check Phase..  检查配置文件，选候选slave为新的master
* Phase 2: Dead Master Shutdown Phase.. 强制关闭，并执行master ip漂移
* Phase 3: Master Recovery Phase..      构建新的master
** Phase 3.1: Getting Latest Slaves Phase..    获得最新的binary log file/position
    The latest binary log file/position on all slaves is mysql_bin.000005:190   最新的binary log file和日志位置
    The oldest binary log file/position on all slaves is mysql_bin.000005:190   最旧的binary log file和日志位置
    差异日志
** Phase 3.2: Saving Dead Master's Binlog Phase..  保存Dead master's binlog
   Fetching dead master's binary logs..
** Phase 3.3: Determining New Master Phase..       选取New Master
   From:
     centos129 (current master)
        +--centos128
        +--centos130
   To:
     centos128 (new master)
        +--centos130
** Phase 3.3: New Master Diff Log Generation Phase..  New Master差异日志生成阶段
** Phase 3.4: Master Log Apply Phase.. 
   All relay logs were successfully applied.          所有relay log被运用之后,new master的binlog和position信息：
   Getting new master's binlog name and position..
   mysql_bin.000020:230                               其它的slave 开始复制的日志点
   vip漂移：
   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=centos129 --orig_master_ip=192.168.44.129 --orig_master_port=3306 --new_master_host=centos128 --new_master_ip=192.168.44.128 --new_master_port=3306 --new_master_user='root' --new_master_password='123456' 
   IN SCRIPT TEST====/usr/sbin/ifconfig ens33:1 down==/usr/sbin/ifconfig ens33:1 192.168.44.211===
* Phase 4: Slaves Recovery Phase..   slave的恢复阶段
* Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. 开始并行slave差异日志的产生阶段
* Phase 4.2: Starting Parallel Slave Log Apply Phase..           开始并行slave差异日志的应用阶段
* Phase 5: New master cleanup phase..                            新master的清理阶段

通过脚本masterha_secondary_check 多种线路检测master是否存活

masterha_secondary_check -s centos129 -s centos128  -user=root --master_host=centos128 --master_ip=192.168.44.128 --master_port=3306  --user=root  --master_host=centos129  --master_ip=192.168.44.129  --master_port=3306

决定是否Failover

Monitoring server centos129 is reachable, Master is not reachable from centos129. OK.
Tue May 22 13:53:54 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.44.129' (111))
Tue May 22 13:53:54 2018 - [warning] Connection failed 1 time(s)..
Monitoring server centos128 is reachable, Master is not reachable from centos128. OK.
Tue May 22 13:53:54 2018 - [info] Master is not reachable from al l other monitoring servers. Failover should start.
Tue May 22 13:53:55 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.44.129' (111))
Tue May 22 13:53:55 2018 - [warning] Connection failed 2 time(s)..
Tue May 22 13:53:56 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.44.129' (111))
Tue May 22 13:53:56 2018 - [warning] Connection failed 3 time(s)..
Tue May 22 13:53:56 2018 - [warning] Master is not reachable from health checker!
Tue May 22 13:53:56 2018 - [warning] Master centos129(192.168.44.129:3306) is not reachable!

读配置文件/etc/masterha/app1.cnf，选择侯选的slave为新的master
 Reading server configurations from /etc/masterha/app1.cnf..
Tue May 22 13:53:57 2018 - [info] Dead Servers:
Tue May 22 13:53:57 2018 - [info]   centos129(192.168.44.129:3306)
Tue May 22 13:53:57 2018 - [info] Alive Servers:
Tue May 22 13:53:57 2018 - [info]   centos128(192.168.44.128:3306)
Tue May 22 13:53:57 2018 - [info]   centos130(192.168.44.130:3306)
Tue May 22 13:53:57 2018 - [info] Alive Slaves:
Tue May 22 13:53:57 2018 - [info]   centos128(192.168.44.128:3306)  Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Tue May 22 13:53:57 2018 - [info]     Replicating from 192.168.44.129(192.168.44.129:3306)
Tue May 22 13:53:57 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue May 22 13:53:57 2018 - [info]   centos130(192.168.44.130:3306)  Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Tue May 22 13:53:57 2018 - [info]     Replicating from 192.168.44.129(192.168.44.129:3306)

 * Phase 1: Configuration Check Phase..  检查配置文件，选候选slave为新的master
Tue May 22 13:53:57 2018 - [info] 
Tue May 22 13:53:58 2018 - [info] Dead Servers:
Tue May 22 13:53:58 2018 - [info]   centos129(192.168.44.129:3306)
Tue May 22 13:53:58 2018 - [info] Checking master reachability via mysql(double check)..
Tue May 22 13:53:58 2018 - [info]  ok.
Tue May 22 13:53:58 2018 - [info] Alive Servers:
Tue May 22 13:53:58 2018 - [info]   centos128(192.168.44.128:3306)
Tue May 22 13:53:58 2018 - [info]   centos130(192.168.44.130:3306)
Tue May 22 13:53:58 2018 - [info] Alive Slaves:
Tue May 22 13:53:58 2018 - [info]   centos128(192.168.44.128:3306)  Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Tue May 22 13:53:58 2018 - [info]     Replicating from 192.168.44.129(192.168.44.129:3306)
Tue May 22 13:53:58 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue May 22 13:53:58 2018 - [info]   centos130(192.168.44.130:3306)  Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Tue May 22 13:53:58 2018 - [info]     Replicating from 192.168.44.129(192.168.44.129:3306)
Tue May 22 13:53:58 2018 - [info] ** Phase 1: Configuration Check Phase completed.

 * Phase 2: Dead Master Shutdown Phase..强制关闭，并执行master ip漂移
Tue May 22 13:53:58 2018 - [info] 
Tue May 22 13:53:58 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Tue May 22 13:53:58 2018 - [info] Executing master IP deactivatation script:
Tue May 22 13:53:58 2018 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=centos129 --orig_master_ip=192.168.44.129 --orig_master_port=3306 --command=stopssh --ssh_user=root  

IN SCRIPT TEST====/usr/sbin/ifconfig ens33:1 down==/usr/sbin/ifconfig ens33:1 192.168.44.211===

Disabling the VIP on old master: centos129 
SIOCSIFFLAGS: 鏃犳硶鎸囧畾琚姹傜殑鍦板潃
Tue May 22 13:53:58 2018 - [info]  done.
Tue May 22 13:53:58 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue May 22 13:53:59 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.


Tue May 22 13:53:59 2018 - [info] * Phase 3: Master Recovery Phase.. 构建新的master
Tue May 22 13:53:59 2018 - [info] 
Tue May 22 13:53:59 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..  获得最新的binary log file/position 
Tue May 22 13:53:59 2018 - [info] 
Tue May 22 13:53:59 2018 - [info] The latest binary log file/position on all slaves is mysql_bin.000005:190   最新的binary log file和日志位置
Tue May 22 13:53:59 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue May 22 13:53:59 2018 - [info]   centos128(192.168.44.128:3306)  Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Tue May 22 13:53:59 2018 - [info]     Replicating from 192.168.44.129(192.168.44.129:3306)
Tue May 22 13:53:59 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue May 22 13:53:59 2018 - [info]   centos130(192.168.44.130:3306)  Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Tue May 22 13:53:59 2018 - [info]     Replicating from 192.168.44.129(192.168.44.129:3306)
Tue May 22 13:53:59 2018 - [info] The oldest binary log file/position on all slaves is mysql_bin.000005:190  最旧的binary log file和日志位置
Tue May 22 13:53:59 2018 - [info] Oldest slaves:
Tue May 22 13:53:59 2018 - [info]   centos128(192.168.44.128:3306)  Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Tue May 22 13:53:59 2018 - [info]     Replicating from 192.168.44.129(192.168.44.129:3306)
Tue May 22 13:53:59 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue May 22 13:53:59 2018 - [info]   centos130(192.168.44.130:3306)  Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Tue May 22 13:53:59 2018 - [info]     Replicating from 192.168.44.129(192.168.44.129:3306)
Tue May 22 13:53:59 2018 - [info] 
Tue May 22 13:53:59 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..  保存Dead master's binlog 
Tue May 22 13:53:59 2018 - [info] 
Tue May 22 13:53:59 2018 - [info] Fetching dead master's binary logs..
Tue May 22 13:53:59 2018 - [info] Executing command on the dead master centos129(192.168.44.129:3306): save_binary_logs --command=save --start_file=mysql_bin.000005  --start_pos=190 --binlog_dir=/data/3306/logs/log --output_file=/tmp/saved_master_binlog_from_centos129_3306_20180522135357.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54
  Creating /tmp if not exists..    ok.
 Concat binary/relay logs from mysql_bin.000005 pos 190 to mysql_bin.000005 EOF into /tmp/saved_master_binlog_from_centos129_3306_20180522135357.binlog ..
  Dumping binlog format description event, from position 0 to 123.. ok.
  No need to dump effective binlog data from /data/3306/logs/log/mysql_bin.000005 (pos starts 190, filesize 190). Skipping.
 /tmp/saved_master_binlog_from_centos129_3306_20180522135357.binlog has no effective data events.
Event not exists.
Tue May 22 13:54:00 2018 - [info] Additional events were not found from the orig master. No need to save.
Tue May 22 13:54:00 2018 - [info] 
Tue May 22 13:54:00 2018 - [info] * Phase 3.3: Determining New Master Phase.. 选取New Master
Tue May 22 13:54:00 2018 - [info] 
Tue May 22 13:54:00 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue May 22 13:54:00 2018 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue May 22 13:54:00 2018 - [info] Searching new master from slaves..
Tue May 22 13:54:00 2018 - [info]  Candidate masters from the configuration file:
Tue May 22 13:54:00 2018 - [info]   centos128(192.168.44.128:3306)  Version=5.7.18-enterprise-commercial-advanced-log (oldest major version between slaves) log-bin:enabled
Tue May 22 13:54:00 2018 - [info]     Replicating from 192.168.44.129(192.168.44.129:3306)
Tue May 22 13:54:00 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue May 22 13:54:00 2018 - [info]  Non-candidate masters:
Tue May 22 13:54:00 2018 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Tue May 22 13:54:00 2018 - [info] New master is centos128(192.168.44.128:3306)
Tue May 22 13:54:00 2018 - [info] Starting master failover..
Tue May 22 13:54:00 2018 - [info] 
From:
centos129 (current master)
 +--centos128
 +--centos130

To:
centos128 (new master)
 +--centos130
Tue May 22 13:54:00 2018 - [info] 
Tue May 22 13:54:00 2018 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue May 22 13:54:00 2018 - [info] 
Tue May 22 13:54:00 2018 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue May 22 13:54:00 2018 - [info] 
Tue May 22 13:54:00 2018 - [info] * Phase 3.4: Master Log Apply Phase..
Tue May 22 13:54:00 2018 - [info] 
Tue May 22 13:54:00 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue May 22 13:54:00 2018 - [info] Starting recovery on centos128(192.168.44.128:3306)..
Tue May 22 13:54:00 2018 - [info]  This server has all relay logs. Waiting all logs to be applied.. 
Tue May 22 13:54:00 2018 - [info]   done.
Tue May 22 13:54:00 2018 - [info]  All relay logs were successfully applied.   所有relay log被运用之后,new master的binlog和position信息：
Tue May 22 13:54:00 2018 - [info] Getting new master's binlog name and position..  
Tue May 22 13:54:00 2018 - [info]  mysql_bin.000020:230                          其它的slave 开始复制的日志点
Tue May 22 13:54:00 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='centos128 or 192.168.44.128', MASTER_PORT=3306, MASTER_LOG_FILE='mysql_bin.000020', MASTER_LOG_POS=230, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue May 22 13:54:00 2018 - [info] Executing master IP activate script:
Tue May 22 13:54:00 2018 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=centos129 --orig_master_ip=192.168.44.129 --orig_master_port=3306 --new_master_host=centos128 --new_master_ip=192.168.44.128 --new_master_port=3306 --new_master_user='root' --new_master_password='123456'  

IN SCRIPT TEST====/usr/sbin/ifconfig ens33:1 down==/usr/sbin/ifconfig ens33:1 192.168.44.211===

Enabling the VIP - 192.168.44.211 on the new master - centos128 
Undefined subroutine &main::FIXME_xxx_create_user called at /usr/local/bin/master_ip_failover line 120.
Set read_only=0 on the new master.
Creating app user on the new master..
Tue May 22 13:54:00 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln1261]  Failed to activate master IP address for centos128 with return code 10:0
Tue May 22 13:54:00 2018 - [warning] Proceeding.
Tue May 22 13:54:00 2018 - [info] ** Finished master recovery successfully.
Tue May 22 13:54:00 2018 - [info] * Phase 3: Master Recovery Phase completed.



Tue May 22 13:54:00 2018 - [info] * Phase 4: Slaves Recovery Phase..   slave的恢复阶段
Tue May 22 13:54:00 2018 - [info] 
Tue May 22 13:54:00 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. 开始并行slave差异日志的产生阶段
Tue May 22 13:54:00 2018 - [info] 
Tue May 22 13:54:00 2018 - [info] -- Slave diff file generation on host centos130(192.168.44.130:3306) started, pid: 100996. Check tmp log /var/log/masterha/app1/centos130_3306_20180522135357.log if it takes time..
Tue May 22 13:54:01 2018 - [info] 
Tue May 22 13:54:01 2018 - [info] Log messages from centos130 ...
Tue May 22 13:54:01 2018 - [info] 
Tue May 22 13:54:00 2018 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue May 22 13:54:01 2018 - [info] End of log messages from centos130.
Tue May 22 13:54:01 2018 - [info] -- centos130(192.168.44.130:3306) has the latest relay log events.
Tue May 22 13:54:01 2018 - [info] Generating relay diff files from the latest slave succeeded.
Tue May 22 13:54:01 2018 - [info] 
Tue May 22 13:54:01 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..  开始并行slave差异日志的应用阶段
Tue May 22 13:54:01 2018 - [info] 
Tue May 22 13:54:01 2018 - [info] -- Slave recovery on host centos130(192.168.44.130:3306) started, pid: 101008. Check tmp log /var/log/masterha/app1/centos130_3306_20180522135357.log if it takes time..
Tue May 22 13:54:02 2018 - [info] 
Tue May 22 13:54:02 2018 - [info] Log messages from centos130 ...
Tue May 22 13:54:02 2018 - [info] 
Tue May 22 13:54:01 2018 - [info] Starting recovery on centos130(192.168.44.130:3306)..
Tue May 22 13:54:01 2018 - [info]  This server has all relay logs. Waiting all logs to be applied.. 
Tue May 22 13:54:01 2018 - [info]   done.
Tue May 22 13:54:01 2018 - [info]  All relay logs were successfully applied.
Tue May 22 13:54:01 2018 - [info]  Resetting slave centos130(192.168.44.130:3306) and starting replication from the new master centos128(192.168.44.128:3306)..
Tue May 22 13:54:01 2018 - [info]  Executed CHANGE MASTER.
Tue May 22 13:54:01 2018 - [info]  Slave started.
Tue May 22 13:54:02 2018 - [info] End of log messages from centos130.
Tue May 22 13:54:02 2018 - [info] -- Slave recovery on host centos130(192.168.44.130:3306) succeeded.
Tue May 22 13:54:02 2018 - [info] All new slave servers recovered successfully.
Tue May 22 13:54:02 2018 - [info] 
Tue May 22 13:54:02 2018 - [info] * Phase 5: New master cleanup phase..  新master的清理阶段
Tue May 22 13:54:02 2018 - [info] 
Tue May 22 13:54:02 2018 - [info] Resetting slave info on the new master..
Tue May 22 13:54:02 2018 - [info]  centos128: Resetting slave info succeeded.
Tue May 22 13:54:02 2018 - [info] Master failover to centos128(192.168.44.128:3306) completed successfully.
Tue May 22 13:54:02 2018 - [info] 

----- Failover Report -----

app1: MySQL Master failover centos129 to centos128 succeeded

Master centos129 is down!

Check MHA Manager logs at centos128:/var/log/masterha/app1/manager.log for details.

分析MHA的manager.log查看failover的四个阶段

猜你喜欢