Distributed database stability: collected notes

what this article is about

This is a preliminary study of how stateful services such as distributed databases ensure high availability of the system. There may be mistakes. Guidance is welcome.

Main text

When we talk about the high availability of distributed databases, we mainly mean how the system stays available through machine downtime and network partitions. This is slightly different from the high availability discussed in online application stability governance, which covers three dimensions: capacity, dependencies, and online changes. The high availability of a distributed database corresponds to a subdivision of the dependency dimension: how to keep the system available under downtime and network partition scenarios. In other words, the high availability that distributed databases talk about is one aspect of online application stability governance.

General idea of high availability in distributed databases

The way distributed databases deal with downtime and network partition failures is the data replica mechanism.
The common form of the replica mechanism is the master-slave architecture.
The mechanism that keeps data consistent under the master-slave architecture is the replicated state machine.

The following is a schematic diagram of the master-slave architecture:
(figure: one master, two slaves)

The following is the Raft paper's description of the replicated state machine:

Replicated state machines are typically implemented with a replicated log: each server stores a log containing a series of commands, and its state machine executes the commands in the log in order.
Every log contains the same commands in the same order, so every state machine processes the same command sequence and therefore arrives at the same state and the same sequence of outputs.
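
To make the idea concrete, here is a minimal sketch of a state machine that applies commands from a replicated log in order. It is not any particular database's implementation; the Command and StateMachine types are hypothetical.

```go
package main

import "fmt"

// Command is one entry in the replicated log, e.g. "SET x 1".
type Command struct {
	Key   string
	Value string
}

// StateMachine applies log entries in order; every replica that applies
// the same log in the same order ends up in the same state.
type StateMachine struct {
	state   map[string]string
	applied int // index of the last applied log entry
}

func NewStateMachine() *StateMachine {
	return &StateMachine{state: map[string]string{}, applied: -1}
}

// Apply executes log[applied+1:] sequentially, never skipping or reordering.
func (sm *StateMachine) Apply(log []Command) {
	for i := sm.applied + 1; i < len(log); i++ {
		cmd := log[i]
		sm.state[cmd.Key] = cmd.Value
		sm.applied = i
	}
}

func main() {
	log := []Command{{"x", "1"}, {"y", "2"}, {"x", "3"}}

	// Two replicas applying the same log reach the same state.
	master, slave := NewStateMachine(), NewStateMachine()
	master.Apply(log)
	slave.Apply(log)
	fmt.Println(master.state, slave.state) // map[x:3 y:2] map[x:3 y:2]
}
```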

The replicated state machine guarantees master-slave data consistency under normal operation (no downtime, no network partition), but when the master goes down, keeping the cluster's data consistent becomes a new problem. This is discussed later in the chapter on replica data consistency.

High availability in case of downtime

There are two types of downtime: the master node going down, and a slave node going down.

master node down

Under the master-slave architecture, the master node is a single point. When the master node goes down, the whole system can no longer run. At this point we need to perform a master-slave switch and promote a slave to be the new master.

Master-slave switching involves two functions: failure detection and switchover. Early on, sentinels were introduced to implement both.

single sentinel mechanism

The sentinel service monitors the whole database cluster. When the master node goes down, a slave node is selected and promoted to master so the cluster can continue serving. A typical example is MySQL's MHA, whose overall structure is shown in the figure below.
(figure: MHA architecture)
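
As a rough illustration of the two sentinel functions, detection and switchover, here is a minimal sketch. The node names and the ping and failover helpers are all hypothetical and are not MHA's actual interface.

```go
package main

import "fmt"

// Node is a database instance as seen by the sentinel.
type Node struct {
	Name  string
	Alive bool // in reality this would be probed over the network
}

// ping stands in for a real health probe (TCP connect, replication heartbeat, ...).
func ping(n *Node) bool { return n.Alive }

// failover promotes the first reachable slave when the master is unreachable.
func failover(master *Node, slaves []*Node) *Node {
	if ping(master) {
		return master // master is healthy, nothing to do
	}
	for _, s := range slaves {
		if ping(s) {
			fmt.Printf("master %s is down, promoting %s\n", master.Name, s.Name)
			return s // the promoted slave is the new master
		}
	}
	return nil // no candidate available
}

func main() {
	master := &Node{Name: "db-master", Alive: false}
	slaves := []*Node{{Name: "db-slave-1", Alive: true}, {Name: "db-slave-2", Alive: true}}

	// A real sentinel would run this check in a loop; one iteration shown here.
	if newMaster := failover(master, slaves); newMaster != nil {
		fmt.Println("current master:", newMaster.Name)
	}
}
```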

Sentinel Cluster

But this introduces another problem: what do we do when the sentinel itself goes down?
The solution is for the sentinel to have a multi-replica mechanism as well. Since failover decisions can only be made by one sentinel node, that node is called the master of the sentinel cluster and the other nodes are its backups, so the sentinel cluster also needs master-slave switching. We then need yet another set of logic to keep the sentinel cluster highly available, and after going around in a circle we are back where we started.

The overall architecture diagram becomes like this:
(figure: sentinel cluster deployed alongside the database cluster)

Sentinel High Availability

If we take the sentinel cluster on its own and make it highly available, we can use a consensus algorithm such as Raft or gossip to elect the sentinel cluster's master. The sentinel cluster then becomes a highly available cluster, as shown in the figure below.

(figure: highly available sentinel cluster)

At this point a very interesting situation arises. Since the sentinel achieves high availability through a distributed consensus algorithm, it is simpler to combine the database master-slave instances with the consensus algorithm directly, drop the sentinel nodes, and implement replica high availability inside the distributed database itself. Many databases now do this: for example, TiDB's TiKV uses the Raft consensus algorithm for master-slave switching and data synchronization, and the newer redis-cluster watchdog mechanism performs master-slave failover based on the gossip protocol.

(figure: redis-cluster watchdog mechanism)

slave node down

Detecting a slave node going down is still the sentinel's responsibility; it raises an alarm when a slave goes down.
Since slave nodes are backups, a slave going down has little impact on the cluster as a whole, so in most cases the sentinel does not need to immediately provision a new machine to replace it. The follow-up handling is: if the slave restarts within a specified time (while the master's logs still cover its replication position), it continues synchronizing logs from the master; if it does not restart within that time, a new slave has to be rebuilt.
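
A sketch of the decision just described, assuming the master's log retention is expressed as the oldest offset it still holds; the field names and offsets are hypothetical.

```go
package main

import "fmt"

// MasterLog models the portion of the replication log the master still retains.
type MasterLog struct {
	OldestOffset int64 // entries before this offset have been purged
	LatestOffset int64
}

// ResumeOrRebuild decides how a restarted slave catches up: incremental sync
// if the master still has the slave's position, otherwise a full rebuild.
func ResumeOrRebuild(slaveOffset int64, log MasterLog) string {
	if slaveOffset >= log.OldestOffset {
		return "incremental: continue pulling log entries from the master"
	}
	return "full rebuild: recreate the slave from a fresh snapshot"
}

func main() {
	log := MasterLog{OldestOffset: 1000, LatestOffset: 5000}
	fmt.Println(ResumeOrRebuild(3200, log)) // restarted quickly, log still covers it
	fmt.Println(ResumeOrRebuild(200, log))  // came back too late, log was purged
}
```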

network partition

Network partitions cause two problems:

The first is misjudgment when the sentinel is partitioned from all the machines in the database cluster.

The second is misjudgment when the network between the sentinel and the master node is partitioned, which triggers a master-slave switch and leads to the split-brain problem.

In the Sentinel High Availability chapter, the sentinel cluster and the database cluster are deployed separately. If the network between them is partitioned, the sentinel cluster cannot observe the state of the database cluster and judges all of its machines to be dead. That judgment is wrong: the master may still be running normally.

The solution is essentially the same as the approach described in the Sentinel High Availability chapter: combine the two. If you also want to guarantee that split brain cannot occur, the choice of protocol matters; the industry generally picks distributed consensus algorithms such as Raft, ZAB, or Paxos, because in a cluster of 2n+1 machines these algorithms guarantee that even with m (m <= n) machines down there is no split brain.

Replica data consistency

Under normal operation the replicated state machine keeps the master and slave replicas consistent, but when the master goes down, a master-slave switch occurs, and a slave is promoted, asynchronous replication can lose data.

If you do not want to lose data, one solution is to use a consensus algorithm for data replication. The common distributed consensus algorithms require that, in a 2n+1 cluster, a write is only considered successful after it reaches n+1 machines. This avoids losing data during a master-slave switch when the master goes down, and the same constraint also prevents split brain.
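
A minimal sketch of the majority-write rule: a write is acknowledged only after n+1 of the 2n+1 replicas have persisted it. The Replica function type here is a hypothetical stand-in for a real replication RPC.

```go
package main

import "fmt"

// Replica reports whether it durably stored the entry; in reality this is an RPC.
type Replica func(entry string) bool

// MajorityWrite returns true only if a quorum (n+1 of 2n+1) acknowledged the entry.
func MajorityWrite(entry string, replicas []Replica) bool {
	quorum := len(replicas)/2 + 1
	acks := 0
	for _, r := range replicas {
		if r(entry) {
			acks++
		}
	}
	return acks >= quorum
}

func main() {
	up := func(string) bool { return true }
	down := func(string) bool { return false }

	// 5 replicas (n = 2): the quorum is 3.
	fmt.Println(MajorityWrite("SET x 1", []Replica{up, up, up, down, down}))   // true
	fmt.Println(MajorityWrite("SET y 2", []Replica{up, up, down, down, down})) // false: write not acknowledged
}
```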

General consensus algorithms and distributed database high availability

Pros and Cons of Consensus Algorithms

From the chapters above we can see that distributed consensus algorithms appear very frequently. They solve master-slave switching in distributed databases, master-slave data consistency, and the split-brain problem caused by network partitions. So the final form of a highly available cluster that does not lose data (the final form deduced from the inventory in the author's mind) is a distributed database built on a distributed consensus algorithm; industry examples include TiDB, Xenon, and others.
However, because distributed consensus algorithms such as Raft, ZAB, and Paxos only consider a write successful after the data reaches n+1 machines, some high-performance databases cannot use such algorithms for high availability and have to make a trade-off in their design. Redis Cluster is one example: it cannot guarantee that no data is lost under split brain or master downtime, but it keeps its high performance, using the gossip protocol only for cluster metadata management and automatic failover when the master goes down.

Note: the consensus problem is in fact the most fundamental problem in distributed systems. Once it is solved, a distributed system can act as a whole and behave like a single node.

How automatic failover in consensus algorithms ensures no data loss when the master goes down (Raft, ZAB)

Most consensus algorithms rely on the following two points:

  1. In a 2n+1 cluster, a write is considered successful only after it reaches n+1 nodes.
  2. In a 2n+1 cluster, when m (m <= n) machines are down, the new master is selected from the remaining n+1 machines. Raft picks the machine whose log is most up to date (among the remaining n+1 machines there must be at least one that holds the latest committed entry, and the node with the most up-to-date log necessarily contains it); ZAB likewise elects the machine with the newest log and then has the other nodes synchronize from it. A sketch of this selection rule follows the list.
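
Here is a sketch of the "most up-to-date log" rule using Raft's comparison order (last log term first, then last log index). In real Raft this check is enforced through the voting rule, where a voter rejects any candidate whose log is less up to date, rather than by a central selection; the Peer struct below is hypothetical.

```go
package main

import "fmt"

// Peer summarizes the tail of a surviving node's log.
type Peer struct {
	Name      string
	LastTerm  int // term of the last log entry
	LastIndex int // index of the last log entry
}

// moreUpToDate implements Raft's log comparison: the higher last term wins,
// and ties are broken by the longer log (higher last index).
func moreUpToDate(a, b Peer) bool {
	if a.LastTerm != b.LastTerm {
		return a.LastTerm > b.LastTerm
	}
	return a.LastIndex > b.LastIndex
}

// pickCandidate returns the survivor with the most up-to-date log; with a
// quorum of survivors, this node is guaranteed to hold every committed entry.
func pickCandidate(survivors []Peer) Peer {
	best := survivors[0]
	for _, p := range survivors[1:] {
		if moreUpToDate(p, best) {
			best = p
		}
	}
	return best
}

func main() {
	survivors := []Peer{
		{"node-a", 5, 120},
		{"node-b", 6, 101},
		{"node-c", 6, 118},
	}
	fmt.Println("new master candidate:", pickCandidate(survivors).Name) // node-c
}
```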

How consensus algorithms avoid split brain under network partition (Raft, ZAB)

To be elected, a new master must receive agreement from a majority of the 2n+1 cluster, that is, from n+1 machines. So when a 2n+1 cluster is partitioned into groups of n and n+1 machines and the old master sits in the n-machine group, that master can no longer complete writes: a write needs confirmation from n+1 machines, but its side of the partition has only n. The n-machine group also cannot elect a master; only the n+1 group can, so at most one side of the partition ever has a working master.
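
A sketch of the quorum arithmetic described above: neither a write nor a leader election can succeed on the minority side of a partition. The partition sizes are just example numbers.

```go
package main

import "fmt"

// hasQuorum reports whether a partition of `reachable` nodes out of a cluster
// of `total` nodes (total = 2n+1) can commit writes or elect a master.
func hasQuorum(reachable, total int) bool {
	return reachable >= total/2+1
}

func main() {
	total := 5 // 2n+1 with n = 2

	// Partition into 2 and 3 nodes: only the 3-node side can elect a master
	// or complete writes, so at most one side ever has a working master.
	fmt.Println("minority side (2 nodes):", hasQuorum(2, total)) // false
	fmt.Println("majority side (3 nodes):", hasQuorum(3, total)) // true
}
```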

References

xenon


Origin blog.csdn.net/whodarewin2005/article/details/129340458