Zookeeper: A Detailed Explanation of the ZAB Protocol

The ZAB protocol is a distributed consensus protocol designed specifically for Zookeeper.

1. What is the ZAB Protocol? An Introduction

  1. The full name of the ZAB protocol is Zookeeper Atomic Broadcast.

  2. Zookeeper is an efficient and reliable distributed coordination service for distributed applications. To solve distributed consistency, Zookeeper did not adopt Paxos directly but used the ZAB protocol instead.

  3. ZAB protocol definition: ZAB is an atomic broadcast protocol with crash recovery support, designed specifically for the distributed coordination service Zookeeper. Atomic broadcast and crash recovery are its two core modes, and we will focus on both below.

  4. Based on this protocol, Zookeeper implements a primary-backup (Leader-Follower) system architecture to keep the data of the replicas in the cluster consistent. The details are shown in the figure below:

[Figure: clients write to the Leader, which replicates to the Followers]

The figure above shows how Zookeeper processes data in the cluster:

All client writes go to the main process (called the Leader), which then replicates them to the backup processes (called Followers). This ensures data consistency. From a design point of view, it is similar to Raft.

  5. What about the replication process? It is similar to 2PC, but ZAB only needs more than half of the Followers to return an Ack before committing, which greatly reduces synchronization blocking and also improves availability.

After this brief introduction, let's focus on message broadcast and crash recovery. The entire Zookeeper cluster switches between these two modes: in short, while the Leader is available, the cluster stays in message broadcast mode; when the Leader becomes unavailable, it enters crash recovery mode.

2. Message Broadcast

The message broadcast process of the ZAB protocol uses an atomic broadcast protocol, similar to a two-phase commit (2PC) process.

All write requests from clients are received by the Leader. The Leader encapsulates each request into a transaction Proposal and sends it to all Followers. Then, based on the Followers' feedback, if more than half of the Followers respond successfully, the Leader executes the commit operation (it commits locally first, then sends a commit to all Followers).

Basically, the entire broadcast process is divided into 3 steps:

1. The Leader copies the data (the Proposal) to all Followers

[Figure: the Leader sends the Proposal to all Followers]

2. The Leader waits for the Followers' Acks; more than half of the servers must respond for the broadcast to succeed

[Figure: Followers return Acks to the Leader]

3. Once more than half have responded successfully, the Leader executes the commit, committing locally at the same time

[Figure: the Leader commits locally and sends commit messages to the Followers]

Through the above 3 steps, data consistency across the cluster can be maintained. In fact, there is a message queue between the Leader and each Follower to decouple them and avoid synchronous blocking.

There are some details:

  1. After the Leader receives a client request, it encapsulates the request into a transaction and assigns it a globally increasing unique ID, called the transaction ID (ZXID). The ZAB protocol must guarantee transaction ordering, so each transaction is processed in ZXID order.

  2. The message queue between the Leader and each Follower decouples them and removes synchronous blocking.

  3. In a Zookeeper cluster, to ensure that all transactions are executed in order, only the Leader server can accept write requests. Even if a Follower server receives a client's write request, it forwards the request to the Leader for processing.

  4. In fact, this is a simplified version of 2PC, and on its own it cannot solve the single point of failure problem. Below we will see how ZAB solves the single point problem (i.e., a Leader crash).
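The broadcast flow above can be sketched in a few lines. This is a minimal, non-networked illustration of the propose → majority-Ack → commit sequence; the class and method names are illustrative, not Zookeeper's actual API, and queues and failure handling are omitted.

```python
# Sketch of ZAB-style message broadcast: the Leader proposes a transaction,
# counts Acks (including its own), and commits once a majority agrees.

class Follower:
    def __init__(self):
        self.log = []        # proposals accepted but not yet committed
        self.committed = []  # committed transactions, in ZXID order

    def propose(self, zxid, txn):
        self.log.append((zxid, txn))
        return True  # Ack

    def commit(self, zxid):
        for i, (z, _) in enumerate(self.log):
            if z == zxid:
                self.committed.append(self.log.pop(i))
                return

class Leader(Follower):
    def __init__(self, followers):
        super().__init__()
        self.followers = followers
        self.counter = 0

    def broadcast(self, txn):
        self.counter += 1
        zxid = self.counter
        self.propose(zxid, txn)                        # step 1: log locally...
        acks = 1 + sum(f.propose(zxid, txn)            # ...and send to Followers
                       for f in self.followers)
        quorum = (1 + len(self.followers)) // 2 + 1    # majority of all servers
        if acks >= quorum:                             # step 2: majority Acked
            self.commit(zxid)                          # step 3: commit locally,
            for f in self.followers:                   # then tell the Followers
                f.commit(zxid)
            return True
        return False

followers = [Follower() for _ in range(4)]
leader = Leader(followers)
leader.broadcast("create /node-a")
```

With 5 servers the quorum is 3, so the commit goes through even if some Followers were slow, which is exactly how ZAB avoids 2PC's full synchronization.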

3. Crash Recovery

We just described message broadcast, but what happens if the Leader crashes in the middle of it? Can the data still be kept consistent? What if the Leader commits locally but the commit request never reaches the Followers?

In fact, when the Leader crashes, the cluster enters the crash recovery mode mentioned at the beginning (here, "crash" means the Leader loses contact with more than half of the Followers). Let's go through it in detail.

Scenario 1: The Leader crashes after copying the data to all Followers but before committing. What then?
Scenario 2: The Leader crashes after receiving the Acks and committing locally, having sent the commit to only some of the Followers. What then?

To handle these problems, ZAB defines 2 principles:

  1. The ZAB protocol ensures that transactions already committed on the Leader are eventually committed by all servers.
  2. The ZAB protocol ensures that transactions that were only proposed (or replicated) on the Leader but never committed are discarded.

Therefore, ZAB needs a leader election algorithm that ensures transactions already committed by the old Leader are eventually committed, while transactions that were only proposed are discarded.

To meet this requirement, if the election algorithm guarantees that the newly elected Leader server holds the highest-numbered transaction in the cluster (i.e., it has the largest ZXID), then the new Leader is guaranteed to hold all committed proposals.
This also has an advantage: it saves the new Leader the step of checking which proposals to commit and which to discard.

[Figure: the server with the largest ZXID is elected as the new Leader]

In this way, both scenarios above are resolved: in Scenario 1, the uncommitted data is eventually discarded, and in Scenario 2, the committed data is eventually synchronized to all servers. This raises a question: how does that synchronization happen?
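The election rule just described can be sketched as a simple maximum over (ZXID, server id). This is a simplification of Zookeeper's actual FastLeaderElection, shown only to illustrate the "largest ZXID wins" idea; the data layout here is an assumption for the example.

```python
# Illustrative election rule: the new Leader is the server whose last
# logged ZXID is largest (epoch compared first, then counter); ties on
# ZXID are broken by the higher server id.

def elect_leader(servers):
    # servers: list of (server_id, last_zxid), last_zxid = (epoch, counter)
    winner = max(servers, key=lambda s: (s[1], s[0]))
    return winner[0]

servers = [
    (1, (1, 7)),   # epoch 1, counter 7
    (2, (2, 3)),   # epoch 2, counter 3 -> largest ZXID, wins
    (3, (1, 9)),
]
print(elect_leader(servers))
```

Note that server 2 wins even though server 3 has a larger counter: the epoch is compared first, so a transaction from a newer Leader generation always outranks one from an older generation.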

4. Data Synchronization

After crash recovery, and before starting formal work (accepting client requests), the Leader server first confirms whether the transactions have been committed by more than half of the Followers, i.e., whether data synchronization is complete. The purpose is to keep the data consistent.

Only when a Follower server has synchronized successfully does the Leader add it to the list of available servers.

In fact, the Leader server relies on the ZXID to decide whether to process or discard a transaction. So how is this ZXID generated?

Answer: In the ZAB protocol, the transaction ID (ZXID) is a 64-bit number. The lower 32 bits are a simple incrementing counter: for each client transaction request, the Leader generates a new Proposal and increments the counter by 1.

The upper 32 bits hold the epoch: when a new Leader is elected, it takes the largest ZXID from its local log, parses the epoch value out of it, and increments that value by one to form its own epoch.

[Figure: ZXID layout, epoch in the high 32 bits and counter in the low 32 bits]

The high 32 bits identify each generation of Leader, and the low 32 bits identify each transaction within that Leader's term. This also lets Followers distinguish different Leaders by the upper 32 bits, which simplifies the data recovery process.
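The 64-bit layout described above is easy to express with bit operations. The helper names below are illustrative (Zookeeper does the equivalent in its own Java utilities); the sketch just shows how the epoch and counter are packed and unpacked.

```python
# Sketch of the ZXID layout: high 32 bits = Leader epoch,
# low 32 bits = per-epoch transaction counter.

EPOCH_SHIFT = 32
COUNTER_MASK = (1 << EPOCH_SHIFT) - 1

def make_zxid(epoch, counter):
    return (epoch << EPOCH_SHIFT) | (counter & COUNTER_MASK)

def epoch_of(zxid):
    return zxid >> EPOCH_SHIFT

def counter_of(zxid):
    return zxid & COUNTER_MASK

def new_leader_zxid(last_zxid):
    # A newly elected Leader parses the epoch out of the largest ZXID in
    # its log, adds one, and restarts the counter at zero.
    return make_zxid(epoch_of(last_zxid) + 1, 0)

zxid = make_zxid(5, 42)
print(hex(zxid))  # epoch 5, counter 42 packed into one 64-bit value
```

Because the epoch occupies the high bits, a plain integer comparison of two ZXIDs automatically orders transactions from a newer Leader generation after all transactions from older generations.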

Based on this strategy, when a Follower connects to the Leader, the Leader server compares the last ZXID it committed with the Follower's ZXID; depending on the result, the Follower either rolls back or synchronizes with the Leader.

5. Summary

It's time to summarize.

The ZAB protocol is actually similar to the Raft protocol we looked at earlier. For example, both use a Leader to ensure consistency (Paxos does not rely on a Leader mechanism for consistency), and both have a mechanism to keep the service available (in fact, Paxos and Raft do this too).

ZAB lets the entire Zookeeper cluster switch between two modes, message broadcast and crash recovery. Message broadcast can be seen as a simplified version of 2PC: ZAB solves 2PC's single point problem through crash recovery, and its synchronous blocking problem through queues.

The correctness of data after crash recovery rests on data synchronization, which in turn is guaranteed by the uniqueness of each transaction's ZXID; the +1 counter operation establishes the order of transactions.

 


 

 


Origin blog.csdn.net/ScorpC/article/details/114120948