Principles of ZooKeeper's ZAB protocol

If any of the concepts are unclear, read this first: the basic concepts of ZooKeeper

There are many consistency protocols, such as Paxos, Raft, and 2PC. ZooKeeper uses the ZAB protocol.

ZAB protocol definition

ZAB (ZooKeeper Atomic Broadcast) is an atomic broadcast protocol with crash recovery, designed specifically for ZooKeeper. It is also called the zk atomic broadcast protocol. It guarantees that data stays consistent across the ZooKeeper cluster and that commands are applied in a single global order.

ZAB has two modes: message broadcast and crash recovery.

Protocol flow

  1. When the cluster starts, when the leader server fails, or when the leader can no longer communicate normally with more than half of the followers, ZAB enters crash recovery mode and a new leader is elected.
  2. Once a new leader has been elected and more than half of the followers in the cluster have completed state synchronization (that is, data synchronization) with it, ZAB leaves crash recovery mode and enters message broadcast mode.
  3. If a server that follows the ZAB protocol joins the cluster at this point, a leader is already broadcasting messages, so the new server automatically enters recovery mode: it finds the leader and synchronizes its data. When synchronization is complete, it takes part in message broadcast as a new follower.
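
The mode switch described in the steps above can be sketched as a small decision function. This is only an illustration, not ZooKeeper code: `Mode`, `has_quorum`, and `mode_for` are hypothetical names, and the sketch assumes the majority is counted over all voting servers with the leader included.

```python
from enum import Enum


class Mode(Enum):
    RECOVERY = "crash recovery"
    BROADCAST = "message broadcast"


def has_quorum(count: int, cluster_size: int) -> bool:
    """True when `count` servers form a strict majority of the cluster."""
    return count > cluster_size // 2


def mode_for(cluster_size: int, leader_alive: bool, synced_servers: int) -> Mode:
    """Pick the ZAB mode for the current cluster state.

    `synced_servers` counts the servers that can reach the leader and have
    finished state synchronization with it (assumption: leader included).
    """
    if leader_alive and has_quorum(synced_servers, cluster_size):
        return Mode.BROADCAST
    # Startup, leader failure, or a lost majority all lead to recovery.
    return Mode.RECOVERY
```

With five servers, a live leader plus two synchronized followers (three servers in total) is enough for broadcast mode; losing one of them sends the cluster back to crash recovery.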

Message broadcast

  • Read request: answered directly by the node that receives it.
  • Write request: if the receiving node is not the leader, it forwards the request to the leader. The leader wraps the request in a transaction and assigns it a globally unique, monotonically increasing transaction ID (ZXID). To preserve ordering, transactions are processed in ZXID order. The leader then broadcasts the transaction to the followers as a proposal. When more than half of the followers in the cluster have replied with a valid ACK, the leader commits the transaction itself and then sends a commit message to all learners. The flow resembles a simplified two-phase commit (2PC).
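
The write path above can be sketched as follows. This is a toy model under stated assumptions, not the ZooKeeper implementation: the `Leader` class and its methods are hypothetical, the leader counts its own acknowledgement toward the majority, and network delivery is left out. The ZXID layout used here (epoch in the high 32 bits, counter in the low 32 bits) is how ZooKeeper builds ZXIDs, although the text above does not go into it.

```python
def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch and a per-epoch counter into one 64-bit ZXID."""
    return (epoch << 32) | (counter & 0xFFFFFFFF)


def split_zxid(zxid: int) -> tuple:
    """Return (epoch, counter) for a ZXID."""
    return zxid >> 32, zxid & 0xFFFFFFFF


class Leader:
    def __init__(self, cluster_size: int, epoch: int, server_id: int = 0):
        self.cluster_size = cluster_size
        self.epoch = epoch
        self.server_id = server_id
        self.counter = 0
        self.pending = {}    # zxid -> (request, set of server ids that ACKed)
        self.committed = []  # (zxid, request) pairs in commit order

    def propose(self, request: str) -> int:
        """Wrap a write request in a transaction and return its ZXID."""
        self.counter += 1
        zxid = make_zxid(self.epoch, self.counter)
        # Assumption: the leader's own ACK counts toward the majority.
        self.pending[zxid] = (request, {self.server_id})
        return zxid

    def on_ack(self, zxid: int, from_server: int) -> list:
        """Record an ACK and commit, in ZXID order, whatever now has a majority."""
        if zxid in self.pending:
            self.pending[zxid][1].add(from_server)
        newly_committed = []
        for z in sorted(self.pending):
            request, acks = self.pending[z]
            if len(acks) <= self.cluster_size // 2:
                break  # an earlier transaction is still waiting: keep the order
            del self.pending[z]
            self.committed.append((z, request))
            newly_committed.append(z)
        return newly_committed
```

In this sketch a later transaction never commits before an earlier one, even if its ACKs arrive first, which is the ordering guarantee that sorting by ZXID is meant to provide.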

Crash recovery

Crash recovery has two stages: leader election and initial synchronization.

The election algorithm ensures that the newly elected leader is the server holding the transaction with the largest ZXID among the machines in the cluster. Because a proposal is committed only after a majority has acknowledged it, and the election itself requires a majority, the server with the largest ZXID is guaranteed to hold every committed proposal.

The election produces a prospective leader, which becomes the actual leader only after initial synchronization has completed.
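
A minimal sketch of the "largest ZXID wins" rule is shown below. The `Vote` type and `pick_leader` function are hypothetical names; breaking ties by the larger server ID is an assumption added here so that the result is deterministic, and the real election exchanges votes over several rounds rather than comparing them in one place.

```python
from typing import NamedTuple


class Vote(NamedTuple):
    zxid: int       # last transaction ID the candidate has logged
    server_id: int  # tie-breaker (assumption): larger ID wins


def pick_leader(votes: list) -> int:
    """Return the server ID of the candidate with the most recent history."""
    # NamedTuple compares field by field: zxid first, then server_id.
    return max(votes).server_id
```

For example, `pick_leader([Vote(0x100000005, 1), Vote(0x100000007, 2), Vote(0x100000007, 3)])` selects server 3: servers 2 and 3 share the largest ZXID, and the tie goes to the larger ID.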

Recovery principles

  • A transaction that the leader has committed must eventually be committed by all servers.
  • A transaction that the leader did not commit is discarded.
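
The two principles can be illustrated by reconciling one follower's log against the new leader's committed history. This is a deliberately simplified sketch: `reconcile` is a hypothetical helper, logs are plain lists of `(zxid, transaction)` pairs, and the sample ZXIDs are made up.

```python
def reconcile(follower_log: list, leader_log: list) -> tuple:
    """Return (new follower log, discarded entries).

    Everything in the leader's committed history ends up in the follower's
    log; entries the leader does not have are dropped as uncommitted.
    """
    leader_zxids = {zxid for zxid, _ in leader_log}
    discarded = [(zxid, txn) for zxid, txn in follower_log
                 if zxid not in leader_zxids]
    return list(leader_log), discarded


# Hypothetical data: the follower still holds a proposal from the old
# leader (counter 3) that never reached a majority.
leader_log = [(0x100000001, "create /a"), (0x100000002, "set /a 1")]
follower_log = [(0x100000001, "create /a"), (0x100000003, "set /a 9")]
new_log, discarded = reconcile(follower_log, leader_log)
```

After the call, `new_log` equals the leader's history and `discarded` holds only the stale proposal.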

Initial synchronization

After crash recovery, and before it starts normal work (accepting client requests), the leader first confirms that its transactions have been committed by more than half of the followers, that is, that data synchronization is complete. The purpose is to keep the data consistent.

Followers that synchronize successfully are added by the leader to its list of available servers.

Synchronization process:

  1. To make sure proposals reach each learner in order, the leader prepares a separate queue for every learner server;
  2. The leader wraps the transactions that each learner has not yet synchronized as proposals;
  3. The leader sends these proposals to each learner one by one, and each proposal is followed immediately by a commit message indicating that the transaction has already been committed, so the learner can accept and apply it directly;
  4. The learner receives the proposals from the leader and applies them locally;
  5. When the learner has updated successfully, it sends an ACK message to the prospective leader;
  6. When the leader receives the ACK from a learner, it adds that learner to the list of actually available followers or observers. Learners that send no ACK, or whose ACK the leader does not receive, are not added to the list.
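
The six steps above can be sketched end to end as follows. This is an in-memory illustration only: `build_sync_queue`, `Learner`, and `synchronize` are hypothetical names, messages are plain tuples, the ACK is modelled as a return value instead of a network message, and the sketch covers only learners that are behind the leader (a learner holding extra, uncommitted entries is not handled).

```python
from collections import deque


def build_sync_queue(leader_log: list, learner_last_zxid: int) -> deque:
    """Steps 1-3: queue a PROPOSAL followed by a COMMIT per missing transaction."""
    queue = deque()
    for zxid, txn in leader_log:
        if zxid > learner_last_zxid:
            queue.append(("PROPOSAL", zxid, txn))
            queue.append(("COMMIT", zxid, None))
    return queue


class Learner:
    def __init__(self, server_id: int, log=None):
        self.server_id = server_id
        self.log = list(log or [])  # committed (zxid, txn) pairs
        self._staged = {}           # proposals waiting for their commit

    def last_zxid(self) -> int:
        return self.log[-1][0] if self.log else 0

    def receive(self, message: tuple) -> None:
        """Step 4: stage a proposal, then apply it when its commit arrives."""
        kind, zxid, txn = message
        if kind == "PROPOSAL":
            self._staged[zxid] = txn
        elif kind == "COMMIT":
            self.log.append((zxid, self._staged.pop(zxid)))


def synchronize(leader_log: list, learners: list) -> list:
    """Run the sync for each learner and return the IDs the leader marks available."""
    target = leader_log[-1][0] if leader_log else 0
    available = []
    for learner in learners:
        queue = build_sync_queue(leader_log, learner.last_zxid())
        while queue:
            learner.receive(queue.popleft())
        # Steps 5-6: a caught-up learner ACKs and is added to the list.
        if learner.last_zxid() == target:
            available.append(learner.server_id)
    return available
```

Keeping one queue per learner means a slow learner only delays its own catch-up; the order of transactions inside each queue still follows the leader's log.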

Origin blog.csdn.net/Anenan/article/details/115077088