MySQL High Availability Based on MHA: Theory

Preface

MySQL high-availability system
MySQL high availability, as the name implies, means that when any failure occurs on the MySQL host or service, another host can immediately take over its work, with the minimum requirement of ensuring data consistency. A MySQL high-availability system should therefore meet the following goals:

(1) Data consistency guarantee. This is the most basic requirement and a prerequisite: if the primary and secondary data are inconsistent, the switchover cannot be performed. Of course, consistency here is relative, but eventual consistency should be achieved.

(2) Fast failover. When the master fails, whether through a machine failure or an instance failure, the business must be switched to the standby node in the shortest possible time, so that the impact on the business is minimized.

(3) Simplified daily maintenance. The high-availability platform should automatically handle deployment, maintenance, monitoring, and other tasks, freeing the DBA from manual operations as much as possible and improving day-to-day operational efficiency.

(4) Unified management. When there are many replication sets, instance information, monitoring information, and switchover information should be managed in one place.

(5) No impact on the existing database architecture. If the architecture has to be changed or adjusted to accommodate the high-availability deployment, the cost increases.

Current MySQL high-availability solutions can achieve high database availability to a certain extent; examples include MMM, Heartbeat+DRBD, and NDB Cluster, as well as MariaDB's Galera Cluster and the Group Replication introduced in MySQL 5.7.17. Each of these has its own advantages and disadvantages, and the choice mainly depends on the business's requirements for data consistency. For high availability and reliability of the database, this article recommends the MHA architecture, since MySQL Group Replication was not yet considered production-ready at the time of writing, though it will likely see gradual adoption in production environments.

1. Introduction of MHA Technology

MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu at the Japanese company DeNA (he later moved to Facebook), as an excellent suite of software for failover and master promotion in MySQL high-availability environments. During a failover, MHA can automatically complete the database failover operation within 0~30 seconds, and in the process it preserves data consistency to the greatest extent possible, achieving high availability in the true sense. In addition to failover, MHA also supports online master switching, which is safe and efficient, blocking writes for only about 0.5~2 seconds.

The software consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a separate machine to manage multiple master-slave clusters, or on a single slave node. MHA Node runs on every MySQL server. MHA Manager periodically probes the master in the cluster; when the master fails, it automatically promotes the slave with the most recent data to be the new master and then points all other slaves at it. The entire failover process is completely transparent to the application.
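As a concrete illustration, a minimal Manager configuration for one replication group might look like the sketch below. The hostnames, paths, and credentials are hypothetical; the parameter names are standard MHA options.

```ini
# /etc/mha/app1.cnf -- hypothetical MHA Manager configuration for one group
[server default]
user=mha                          # MySQL account used by the Manager
password=mha_password
ssh_user=root                     # account used for SSH between nodes
manager_workdir=/var/log/mha/app1
remote_workdir=/var/log/mha/app1  # working directory on each MHA Node
repl_user=repl                    # replication account
repl_password=repl_password

[server1]
hostname=db-master.example.com

[server2]
hostname=db-slave1.example.com
candidate_master=1                # eligible to become the new master

[server3]
hostname=db-slave2.example.com
```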

2. MHA provides the following functions

At present, MHA mainly supports a one-master, multiple-slave architecture. To build MHA, a replication cluster needs at least three database servers: one master and two slaves, i.e., one acting as the master, one as a standby master, and one as a slave. Of course, if cost is a concern, a two-node MHA with one master and one slave also works (tested by the author).

To sum up, MHA provides the following functions:

(1) Automated master monitoring and failover

(2) MHA can monitor the status of the master in a replication group; if it goes down, MHA can fail over automatically.

(3) MHA guarantees data consistency by applying the differential relay logs of all slaves.

(4) A failover, including the log-compensation steps, usually takes only 10 to 30 seconds.

(5) Under normal circumstances, MHA chooses the most up-to-date slave as the new master, but you can also designate candidate masters, in which case the new master is elected from among those hosts.

(6) MHA does not introduce consistency problems that would break the replication environment, so it can be used with confidence.

During automatic failover, MHA tries to save the binary log from the downed master to minimize data loss, but this is not always feasible. For example, if the master's hardware fails or it cannot be reached via SSH, MHA cannot save the binary log and fails over anyway, losing the most recent data. Using the semi-synchronous replication available in MySQL 5.5 and above greatly reduces this risk. MHA can be combined with semi-synchronous replication: if even one slave has received the latest binary log events, MHA can apply them to all the other slaves, thereby guaranteeing data consistency across all nodes.
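As a sketch, semi-synchronous replication on MySQL 5.5+ can be enabled roughly as follows. The plugin and variable names are the stock MySQL 5.5/5.6 ones; adjust for your version.

```sql
-- On the master: load and enable the semi-sync master plugin
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;  -- ms before falling back to async

-- On each slave: load and enable the semi-sync slave plugin
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
-- Restart the IO thread so the setting takes effect
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
```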

(7) Interactive, manually initiated master failover

MHA can be configured to perform failover manually and interactively, without monitoring the master's status.

(8) Non-interactive master failover

Non-interactive, automatic failover without the master-monitoring function; monitoring can be handled by other components (for example, Pacemaker heartbeat).

(9) Online switching master to a different host

If you have a faster, better machine and plan to replace the old master with it, this feature is particularly suitable.

The master is not actually down in this case; rather, routine maintenance of the master frequently calls for such a switch.

3. Advantages of MHA

  1. Master failover and slave promotion are very fast.

  2. Automatic detection with multiple checks, and support for calling external script hooks during the switchover.

  3. A master crash does not cause data inconsistency; MHA automatically fills in missing data to maintain consistency.

  4. No need to modify any replication settings; simple to deploy, with no impact on the existing architecture.

  5. No need to add many extra machines to deploy MHA, and multi-instance centralized management is supported.

  6. No performance impact.

  7. Online switching is supported.

  8. Storage-engine agnostic: any engine is supported.

4. MHA workflow

The following figure shows how to manage multiple sets of master-slave replication through MHA Manager. The working principle of MHA can be summarized as follows:
(Figure: an MHA Manager managing multiple master-slave replication groups)

(1) How does MHA monitor the master and perform failover?

1.1 Verify replication settings and confirm the current master status

MHA connects to all hosts and automatically determines which one is the current master; there is no need to specify the master in the configuration file.

If any of the slaves is down, the script exits immediately and stops monitoring.

If some required scripts are not installed on an MHA Node, MHA terminates at this stage and stops monitoring.

1.2 Monitoring master

MHA monitors the master until it goes down.

At this stage MHA does not monitor the slaves, so stopping, restarting, adding, or removing a slave will not affect the running MHA monitoring process. When you do add or remove a slave, update the configuration file and preferably restart MHA.

1.3 Check whether the master failed

If MHA Manager fails to connect to the master for three consecutive ping intervals, it enters this stage.

If secondary_check_script is set, MHA calls that script to perform a second check to determine whether the master is really down.
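In the configuration file this looks like the line below. The monitoring hosts are hypothetical; masterha_secondary_check is the helper script shipped with MHA Manager, which probes the master from additional network routes.

```ini
[server default]
# Check the master from other routes before declaring it dead
secondary_check_script=masterha_secondary_check -s remote_host1 -s remote_host2
```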

The next step is the workflow of masterha_master_switch.

1.4 Verify the slave configuration again

If any invalid replication configuration is found (for example, some slaves point to a different master), MHA stops monitoring and reports an error. This can be ignored by setting ignore_fail.

This step is for safety: the replication configuration may well have changed since monitoring started, so double-checking is the recommended approach.

Check the status of the last failover

If the last failover reported an error, or the last failover ended too recently (default: within 3 days), MHA stops monitoring and does not fail over; set --ignore_last_failover or --wait_on_failover_error on the masterha_manager command line to change this behavior. This, too, is for safety: frequent failovers suggest a network problem or some other error that should be investigated.

1.5 Shut down the failed master server (optional)

If master_ip_failover_script and/or shutdown_script are defined in the configuration file, MHA will call these scripts.

Shutting down the dead master helps avoid split brain (though this is debatable).

1.6 Recover a new master

Save the binary log from the crashed master to the Manager, if possible: if the dead master is still reachable via SSH, copy the binary logs starting from the end_log_pos (Read_Master_Log_Pos) position of the latest slave.

Elect a new master

Generally, the election follows the configuration file settings: to designate candidate masters, set candidate_master=1; to prevent certain hosts from ever being elected, set no_master=1. MHA also identifies the latest slave (the one holding the most recent relay log).
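These flags are per-host settings in the Manager configuration file; a hypothetical example:

```ini
[server2]
hostname=db-slave1.example.com
candidate_master=1   # prefer this host when electing a new master

[server3]
hostname=db-backup.example.com
no_master=1          # never promote this host (e.g. a backup-only slave)
```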

Restore and promote the new master

Based on the old master's binlog, a differential log is generated and applied to the new master. If an error occurs at this step (for example, a duplicate key error), MHA stops the recovery, and the remaining slaves stop recovering as well.

(2) How does MHA switch master online quickly?

The following steps are what masterha_master_switch --master_state=alive does.
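A typical invocation looks like this. The configuration path and hostname are hypothetical; the options are standard masterha_master_switch flags.

```shell
# Online switch: the current master is alive, move the role to db-slave1
masterha_master_switch --master_state=alive \
  --conf=/etc/mha/app1.cnf \
  --new_master_host=db-slave1.example.com \
  --orig_master_is_new_slave \
  --running_updates_limit=2
```

The --orig_master_is_new_slave flag makes the demoted master rejoin the group as a slave of the new master rather than being left out of replication.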

2.1 Verify replication settings and confirm the current master status

MHA connects to all the hosts listed in the configuration file and automatically determines which one is the current master; there is no need to specify the master in the configuration file.

Execute the FLUSH TABLES command on the master (optional); this shortens the time needed later for FLUSH TABLES WITH READ LOCK.

During an online switch, MHA neither monitors the master nor performs failover.

Check whether the following conditions are met.

A. Is the IO thread running on all slaves?

B. Is the SQL thread running on all slaves?

C. Whether Seconds_Behind_Master is less than 2 seconds (--running_updates_limit=N).

D. Whether there are no update statements on the master running longer than 2 seconds.

2.2 Confirm the new master

The new master must be specified with the --new_master_host parameter.

The original master and the new master must have the same replication filter conditions (binlog-do-db and binlog-ignore-db).

2.3 The current master stops writing

If master_ip_online_change_script is defined in the configuration, MHA calls it; writes can be blocked there, for example by executing SET GLOBAL read_only=1.

Execute FLUSH TABLES WITH READ LOCK on the old master to block all writes (--skip_lock_all_tables skips this step).

2.4 Wait for the other slaves to catch up with the current master (no replication delay)

This is done by calling the MASTER_POS_WAIT() function.

2.5 Ensure that the new master can be written

Execute SHOW MASTER STATUS to determine the binary log file name and position of the new master.

If master_ip_online_change_script is set, it is called here; it can, for example, create users with write permissions and execute SET GLOBAL read_only=0.

2.6 Let other slaves point to the new master

Execute CHANGE MASTER and START SLAVE on each slave in parallel.
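Conceptually, what runs on each slave is equivalent to the statements below. The host, log file name, and position are placeholders taken from the new master's SHOW MASTER STATUS output.

```sql
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'db-new-master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_LOG_FILE = 'mysql-bin.000042',  -- File from SHOW MASTER STATUS
  MASTER_LOG_POS = 120;                  -- Position from SHOW MASTER STATUS
START SLAVE;
```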

5. MHA component introduction

The MHA software consists of two parts, the Manager toolkit and the Node toolkit, described below.

The Manager toolkit mainly includes the following tools:

(1) masterha_check_ssh #Check the SSH configuration status of MHA;

(2) masterha_check_repl #check MySQL replication status;

(3) masterha_manager #Start MHA (the monitoring process);

(4) masterha_check_status #Detect the current MHA running status;

(5) masterha_master_monitor #Detect whether the master is down;

(6) masterha_master_switch #Control failover (automatic or manual);

(7) masterha_conf_host #Add or delete configured server information;

The Node toolkit (these tools are usually triggered by MHA Manager scripts and require no human operation) mainly includes the following tools:

(1) save_binary_logs #Save and copy the binary log of the master;

(2) apply_diff_relay_logs #Identify differential relay log events and apply their differential events to other slaves;

(3) purge_relay_logs #Clear relay logs (will not block the SQL thread);
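Because MHA deployments typically disable automatic relay-log purging on the slaves (relay_log_purge=0) so that relay logs remain available for differential recovery, purge_relay_logs is usually run out of band. A common pattern is a cron entry on each slave such as the hypothetical one below (paths and credentials are placeholders; the command-line flags are standard purge_relay_logs options):

```shell
# /etc/cron.d/purge_relay_logs -- run on every slave, staggered per host
0 4 * * * mysql /usr/bin/purge_relay_logs --user=mha --password=mha_password \
  --disable_relay_log_purge --workdir=/var/log/mha \
  >> /var/log/mha/purge_relay_logs.log 2>&1
```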

Note: To minimize data loss when the master suffers hardware damage or goes down, it is recommended to configure MySQL semi-synchronous replication alongside MHA. The principle of semi-synchronous replication is left for the reader to look up (it is not required here).

Origin blog.csdn.net/BIGmustang/article/details/108287945