MHA failover process

1 Executing secondary network check script
2 Executing SSH check script: save_binary_logs --command=test -
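3 Monitoring server xxx is reachable, Master is not reachable from xxx. OK
4 After three probes: Master is not reachable from health checker!
5 Read the configuration file and look up the server information
6 Check the surviving slaves: read_only, relay_log_purge, replication filters, and so on
7 Terminating monitoring script.
8 Start the failover. Scripts are executed after the failover and the MHA manager then shuts down. After every shutdown, delete the previous failover file under the workdir (/etc/mha/workdir/3334/3334.failover.error). After a normal switchover, the other slaves automatically point to the new master.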
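The cleanup in step 8 can be scripted. The following is a minimal sketch, not part of MHA itself: the function name clear_failover_marker is hypothetical, and it assumes the marker file is named <name>.failover.error inside the workdir, as in the path quoted above.

```python
from pathlib import Path


def clear_failover_marker(workdir: str, name: str) -> bool:
    """Delete <workdir>/<name>.failover.error if it exists.

    Returns True when a stale marker was removed, False when there was
    nothing to delete. The file naming is an assumption taken from the
    path quoted in step 8.
    """
    marker = Path(workdir) / f"{name}.failover.error"
    if marker.is_file():
        marker.unlink()
        return True
    return False
```

For the setup in this post the call would be clear_failover_marker("/etc/mha/workdir/3334", "3334"), run before the manager is started again.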
Phase 1: Configuration Check Phase completed.
Phase 2: Dead Master Shutdown Phase..
 Forcing shutdown so that applications never connect to the current master..
Tue Dec 17 20:24:14 2019 - [info] Executing master IP deactivation script:
Tue Dec 17 20:24:14 2019 - [info]   /etc/mha/scripts/3334/master_ip_failover.sh --orig_master_host=10.xxx --orig_master_ip=10.xxxx--orig_master_port=3334 --command=stopssh --ssh_user=root  
2019-12-17 20:24:14 master_ip_failover.sh [info]: -------start stop or stopssh
Executing SHUTDOWN script:
Tue Dec 17 20:24:14 2019 - [info]   /etc/mha/scripts/3334/power_manager.sh --command=stopssh --ssh_user=root  --host=10.xx  --ip=10.2xx --port=3334  --pid_file=/data00/mysql_3334/mysqld.pid 
-------------power off script do not power off machine-------------
Tue Dec 17 20:24:14 2019 - [info]  Power off done.
Tue Dec 17 20:24:15 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Dec 17 20:24:15 2019 - [info] 
Tue Dec 17 20:24:15 2019 - [info] * Phase 3: Master Recovery Phase..
Tue Dec 17 20:24:15 2019 - [info] 
Tue Dec 17 20:24:15 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Dec 17 20:24:15 2019 - [info] 
Tue Dec 17 20:24:15 2019 - [info] The latest binary log file/position on all slaves is mysql-bin.000003:154
Tue Dec 17 20:24:15 2019 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Dec 17 20:24:15 2019 - [info]   10.2xx(10.xxx:3334)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Dec 17 20:24:15 2019 - [info]     Replicating from 10.xx(1xx:3334)
Tue Dec 17 20:24:15 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Dec 17 20:24:15 2019 - [info]   10.2xxx(10.xxx:3334)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Dec 17 20:24:15 2019 - [info]     Replicating from 10.23xx(10.xx:3334)
Tue Dec 17 20:24:15 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Dec 17 20:24:15 2019 - [info] The oldest binary log file/position on all slaves is mysql-bin.000003:154
Tue Dec 17 20:24:15 2019 - [info] Oldest slaves:
Tue Dec 17 20:24:15 2019 - [info]   xxx(xx:3334)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Dec 17 20:24:15 2019 - [info]     Replicating from 10.xx(10.xx:3334)
Tue Dec 17 20:24:15 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Dec 17 20:24:15 2019 - [info]   1xxx(1xxx:3334)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Dec 17 20:24:15 2019 - [info]     Replicating from 10.2xxx(10.2xxx:3334)
Tue Dec 17 20:24:15 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Dec 17 20:24:15 2019 - [info] 
Tue Dec 17 20:24:15 2019 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Dec 17 20:24:15 2019 - [info] 
Tue Dec 17 20:24:15 2019 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Tue Dec 17 20:24:15 2019 - [info] 
Tue Dec 17 20:24:15 2019 - [info] * Phase 3.3: Determining New Master Phase..
Tue Dec 17 20:24:15 2019 - [info] 
Tue Dec 17 20:24:15 2019 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Dec 17 20:24:15 2019 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Dec 17 20:24:15 2019 - [info] Searching new master from slaves..
Tue Dec 17 20:24:15 2019 - [info]  Candidate masters from the configuration file:
Tue Dec 17 20:24:15 2019 - [info]   xx(xxx:3334)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Dec 17 20:24:15 2019 - [info]     Replicating from 10.xxx(10.xx:3334)
Tue Dec 17 20:24:15 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Dec 17 20:24:15 2019 - [info]   1xxx(1xxx:3334)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Tue Dec 17 20:24:15 2019 - [info]     Replicating from 10.xxx(10.xxx:3334)
Tue Dec 17 20:24:15 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Dec 17 20:24:15 2019 - [info]  Non-candidate masters:
Tue Dec 17 20:24:15 2019 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Tue Dec 17 20:24:15 2019 - [info] New master is 10xxx2(10.xx:3334)
Tue Dec 17 20:24:15 2019 - [info] Starting master failover..

Tue Dec 17 20:24:15 2019 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Tue Dec 17 20:24:15 2019 - [info] 
Tue Dec 17 20:24:15 2019 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Dec 17 20:24:15 2019 - [info] 
Tue Dec 17 20:24:15 2019 - [info] * Phase 3.5: Master Log Apply Phase..
Tue Dec 17 20:24:15 2019 - [info] 
Tue Dec 17 20:24:15 2019 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Dec 17 20:24:15 2019 - [info] Starting recovery on 10.xxx(xxx:3334)..
Tue Dec 17 20:24:15 2019 - [info]  This server has all relay logs. Waiting all logs to be applied.. 
Tue Dec 17 20:24:15 2019 - [info]   done.
Tue Dec 17 20:24:15 2019 - [info]  All relay logs were successfully applied.
Tue Dec 17 20:24:15 2019 - [info] Getting new master's binlog name and position..
Tue Dec 17 20:24:15 2019 - [info]  mysql-bin.000002:1564
Tue Dec 17 20:24:15 2019 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.xxx', MASTER_PORT=3334, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=1564, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Dec 17 20:24:15 2019 - [info] Executing master IP activate script:
Tue Dec 17 20:24:15 2019 - [info]   /etc/mha/scripts/3334/master_ip_failover.sh --command=start --ssh_user=root --orig_master_host=10.xx --orig_master_ip=10.2xxx --orig_master_port=3334 --new_master_host=10.xxx --new_master_ip=10.xxx --new_master_port=3334 --new_master_user='mha'   --new_master_password=xxx
2019-12-17 20:24:15 master_ip_failover.sh [info]: new master ip----->10xxx
2019-12-17 20:24:15 master_ip_failover.sh [iofo]: new master port---->3334
mysql_cmd---->/usr/local/mysql-5.7.10/bin/mysql -umha -ppassword -h10.xxx -P3334
mysql_cmd_meta--->/usr/local/mysql-5.7.10/bin/mysql -u mha -pxxx -h 10.xxx
2019-12-17 20:24:15 master_ip_failover.sh [info]: set new master read_only=0
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1193 (HY000) at line 1: Unknown system variable 'rpl_semi_sync_slave_enabled'
2019-12-17 20:24:15 master_ip_failover.sh [error]: mysql set readonly=0 on new master fail
Tue Dec 17 20:24:15 2019 - [error][/root/perl5/lib/perl5/MHA/MasterFailover.pm, ln1612]  Failed to activate master IP address for xx(1xx:3334) with return code 1:0
Tue Dec 17 20:24:15 2019 - [error][/root/perl5/lib/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /root/perl5/bin/masterha_manager line 65.
Tue Dec 17 20:24:15 2019 - [info] 
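
The log above prints the CHANGE MASTER TO statement that the other slaves should run, but no slave recovery phase appears afterwards, so the slaves were presumably not repointed in this run. As a rough sketch for manual recovery, the coordinates can be pulled out of the manager log. The function name extract_replication_target and the regular expression are assumptions written against the log line shown above, not an MHA API.

```python
import re

# Matches the statement MHA prints in Phase 3.5, up to MASTER_LOG_POS.
CHANGE_MASTER_RE = re.compile(
    r"CHANGE MASTER TO\s+MASTER_HOST='(?P<host>[^']*)',\s*"
    r"MASTER_PORT=(?P<port>\d+),\s*"
    r"MASTER_LOG_FILE='(?P<log_file>[^']*)',\s*"
    r"MASTER_LOG_POS=(?P<log_pos>\d+)"
)


def extract_replication_target(log_text):
    """Return host, port, binlog file and position from the first
    CHANGE MASTER TO statement in the log text, or None if absent."""
    match = CHANGE_MASTER_RE.search(log_text)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "port": int(match.group("port")),
        "log_file": match.group("log_file"),
        "log_pos": int(match.group("log_pos")),
    }
```

The replication user and password are deliberately not extracted, since they should not be copied around from log files.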

----- Failover Report -----

3334: MySQL Master failover xxx(xx:3334)

Master 1xx(xxx:3334) is down!

Check MHA Manager logs at n230-036-009.novalocal:/etc/mha/workdir/3334/3334.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 1xx(10.xx:3334)
Power off 10.xxx.
The latest slave 10.2xx(10.xx:3334) has all relay logs for recovery.
Selected 10xxx(10.xxx:3334) as a new master.
10.xxx(10.xx:3334): OK: Applying all logs succeeded.
Failed to activate master IP address for 10.2xx(10.xxx:3334) with return code 1:0
Got Error so couldn't continue failover from here.
Tue Dec 17 20:24:15 2019 - [info] Sending mail..
orig_master_host====1xx
new_mster_host=====
----body--->Master 1xx(10.xx:3334) is down! Check MHA Manager logs at n230-036-009.novalocal:/etc/mha/workdir/3334/3334.log for details. Started automated(non-interactive) failover. Invalidated master IP address on 10.xx(10.xx:3334) Power off 1xx. The latest slave 10.23xx(10.xx:3334) has all relay logs for recovery. Selected 10.2xxx(10.2xx:3334) as a new master. 10.2xx(10.xx:3334): OK: Applying all logs succeeded. Failed to activate master IP address for 10.2xx(10.xx:3334) with return code 1:0 Got Error so couldn't continue failover from here.
2019-12-17 20:24:15 send_master_failover_mail.sh [info]: ------failover success-----
2019-12-17 20:24:15 send_master_failover_mail.sh [info]: Master 10.23xx(10.xx:3334) is down!

Check MHA Manager logs at n230-036-009.novalocal:/etc/mha/workdir/3334/3334.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 10.2xx(10.2xx:3334)
Power off 10.xx.
The latest slave 10.xx(10.xx:3334) has all relay logs for recovery.
Selected 10.xx(10xx:3334) as a new master.
10.2xx(10.2xx:3334): OK: Applying all logs succeeded.
Failed to activate master IP address for 10.2xx(10.2xx:3334) with return code 1:0
Got Error so couldn't continue failover from here.

cat: /mha/workdir/3334/3334.log: No such file or directory
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    93  100    17  100    76     72    324 --:--:-- --:--:-- --:--:--   326
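
The failover above stopped at the master IP activation step: master_ip_failover.sh hit ERROR 1193 (Unknown system variable 'rpl_semi_sync_slave_enabled'), which MySQL returns when a statement sets a variable the server does not have. For the semi-sync variables this usually means the semi-sync plugin is not installed on the new master. One possible guard, sketched below, is to set only the variables the server actually reports. The function name build_promotion_statements and the list of wanted settings are hypothetical; the real contents of master_ip_failover.sh are not shown in this post.

```python
def build_promotion_statements(existing_variables, wanted):
    """Build SET GLOBAL statements only for variables the server has.

    existing_variables: names returned by SHOW GLOBAL VARIABLES on the
                        new master (collected elsewhere).
    wanted:             ordered (name, value) pairs the script would like
                        to apply; the values here are assumptions.
    Returns (statements, skipped_names).
    """
    statements = []
    skipped = []
    for name, value in wanted:
        if name in existing_variables:
            statements.append(f"SET GLOBAL {name} = {value}")
        else:
            # Variable is unknown to this server, e.g. plugin not loaded.
            skipped.append(name)
    return statements, skipped
```

With existing_variables containing read_only but not rpl_semi_sync_slave_enabled, the helper emits only the read_only statement and reports the semi-sync variable as skipped, so the script could log a warning instead of returning a non-zero exit code.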
aoerqileng