ZooKeeper cluster consistency protocol: the ZAB algorithm

ZAB (ZooKeeper Atomic Broadcast) is an atomic broadcast protocol with crash-recovery support, designed specifically for the distributed coordination service ZooKeeper. ZooKeeper relies mainly on ZAB to achieve distributed data consistency. Following ZAB, it builds a master-standby (primary-backup) model to synchronize data across the cluster and keep it consistent. The roles in a ZooKeeper cluster, such as Leader and Follower, were introduced in the earlier article on ZooKeeper cluster deployment.


The master-standby model in ZooKeeper means that a ZooKeeper cluster has exactly one Leader (master node). The Leader handles client transaction requests (write operations) and synchronizes the written data to all Follower nodes.

The core of ZAB is that the entire ZooKeeper cluster has only one Leader node. The Leader converts every client write operation into a transaction proposal (Proposal). After writing the proposal locally, the Leader broadcasts it (data replication) to all Follower nodes and waits for their feedback. In ZAB, the Leader waits until more than half of the servers have answered OK; ZooKeeper counts the Leader's own acknowledgement toward this majority. The Leader then sends a commit message to all Follower nodes, so that the data on the Leader is applied on the Followers as well.

Message broadcasting

The whole process resembles the Paxos protocol in that it is divided into two phases (propose, then commit). In fact, ZAB can be viewed as a Paxos-style algorithm. ZAB only requires more than half of the nodes to answer OK. This mode of ZAB is called message broadcasting, and its steps are as follows:

  1. The client sends a write request.

  2. The Leader converts the client's request into a transaction proposal (Proposal). It assigns each proposal a globally unique, monotonically increasing 64-bit ID, the ZXID. Comparing ZXIDs gives the causal order of proposals.

  3. The Leader keeps a first-in-first-out queue for each Follower. The queue is built on TCP, whose in-order delivery, together with the single Leader, yields a global order. The Leader places the proposal with its ZXID in each queue to send it to all Followers.

  4. Each Follower takes the message from its queue and processes it. Once processing is complete (the proposal is written to its local transaction log), the Follower sends an ACK to the Leader.

  5. Once the Leader has received ACKs from more than half of the servers (the Leader's own ACK counts), it considers the proposal ready to commit.

  6. The Leader sends a COMMIT message to all Followers.
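The six steps can be simulated in a few lines of Python. This is a minimal in-memory sketch, not ZooKeeper's implementation, and it has no network or disk. The classes `Leader` and `Follower`, the `alive` flag used to simulate a crashed server, and the plain integer ZXID counter are all hypothetical simplifications. The sketch assumes, as ZooKeeper does, that the Leader logs and acknowledges its own proposal, so the majority is taken over the whole ensemble.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Hypothetical in-memory Follower with a transaction log and committed list."""
    name: str
    alive: bool = True                              # False simulates a crashed server
    log: list = field(default_factory=list)         # proposals written locally (step 4)
    committed: list = field(default_factory=list)   # ZXIDs applied after COMMIT (step 6)

    def on_proposal(self, zxid, data):
        """Write the proposal to the log and return True as the ACK."""
        if not self.alive:
            return False                            # a crashed server sends no ACK
        self.log.append((zxid, data))
        return True

    def on_commit(self, zxid):
        if self.alive:
            self.committed.append(zxid)


class Leader:
    """Hypothetical Leader running one broadcast round per client write."""

    def __init__(self, followers):
        self.followers = followers
        self.counter = 0                            # simplified ZXID (no epoch)
        self.log = []
        self.committed = []

    def write(self, data):
        """Step 1: a client write request arrives here."""
        self.counter += 1
        zxid = self.counter                         # step 2: assign a ZXID
        self.log.append((zxid, data))               # the Leader logs its own proposal
        acks = 1                                    # ...and counts its own ACK
        for f in self.followers:                    # step 3: send the proposal
            if f.on_proposal(zxid, data):           # step 4: Follower logs and ACKs
                acks += 1
        ensemble = len(self.followers) + 1
        if acks > ensemble // 2:                    # step 5: strict majority reached
            self.committed.append(zxid)
            for f in self.followers:                # step 6: COMMIT to all Followers
                f.on_commit(zxid)
            return True
        return False                                # no quorum yet: still pending


if __name__ == "__main__":
    followers = [Follower("s2"), Follower("s3"),
                 Follower("s4", alive=False), Follower("s5", alive=False)]
    leader = Leader(followers)
    print(leader.write("set /a 1"))                 # True: 3 of 5 servers ACKed
```

With five servers, the Leader plus two live Followers make three acknowledgements out of five. That is a majority, so the write commits even though two Followers are down. Returning `False` only means the proposal has not reached a quorum yet; a real Leader keeps waiting rather than giving up.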

Unlike the Paxos algorithm discussed in the earlier article on the consistency protocol, the process above has no multiple proposers (Proposer): only the Leader issues proposals. As noted at the end of that article, a single proposer guarantees the algorithm's liveness and keeps it from falling into an endless loop of competing proposals.


In addition, to further prevent blocking, the Leader uses a separate queue for each Follower to send and receive messages. Queues decouple the two sides asynchronously: the Leader only has to put a message into the queue for a given Follower. A synchronous approach would easily cause blocking and perform much worse.
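The per-Follower queues can be sketched with Python's `queue.Queue` and threads standing in for ZooKeeper's TCP connections and sender threads. This is an illustration under that assumption only, and the functions `follower_worker` and `broadcast` are hypothetical. The Leader only enqueues and moves on. Each Follower drains its own FIFO queue, so every Follower sees the proposals in ZXID order.

```python
import queue
import threading


def follower_worker(name, inbox, acks):
    """Hypothetical Follower loop: drain this Follower's own FIFO queue in order."""
    while True:
        msg = inbox.get()
        if msg is None:                 # sentinel: no more proposals
            break
        zxid, _data = msg               # a real Follower would log _data here
        acks.put((name, zxid))          # then ACK back to the Leader


def broadcast(proposals, follower_names):
    """Send proposals through one queue per Follower; return ACKed ZXIDs per Follower."""
    acks = queue.Queue()
    inboxes = {name: queue.Queue() for name in follower_names}
    workers = [threading.Thread(target=follower_worker, args=(name, inbox, acks))
               for name, inbox in inboxes.items()]
    for w in workers:
        w.start()
    for zxid, data in proposals:        # the Leader only enqueues; it never waits here
        for inbox in inboxes.values():
            inbox.put((zxid, data))
    for inbox in inboxes.values():
        inbox.put(None)
    for w in workers:
        w.join()
    received = {name: [] for name in follower_names}
    while not acks.empty():             # safe: all producer threads have finished
        name, zxid = acks.get()
        received[name].append(zxid)
    return received


if __name__ == "__main__":
    print(broadcast([(1, "a"), (2, "b")], ["s2", "s3"]))  # {'s2': [1, 2], 's3': [1, 2]}
```

Each queue is FIFO and has exactly one consumer thread. As a result, a slow Follower only delays its own queue and never blocks the Leader or the other Followers.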




Crash recovery

Besides the message broadcasting described above, the other important part of ZAB is crash recovery. To ensure that all processes in the ZooKeeper cluster execute in order, only the Leader accepts write requests. If a Follower receives a client write request, it forwards the request to the Leader for processing.


However, the Leader may crash; a restart is a special kind of crash, since there is no Leader during it. In that case, ZAB requires the cluster to perform crash recovery and elect a new Leader. During this period the nodes are temporarily unavailable and their state is LOOKING. For example, suppose there are 5 servers and Server1 is the Leader. If Server1 and one other server fail and crash, crash recovery and Leader election proceed as follows:

  1. Each server casts a vote, and in the first round every server votes for itself. A vote carries (myid, ZXID).
  2. Each server collects the votes from the other servers.
  3. Each server processes the votes and re-votes. Processing logic: compare ZXID first, then myid; the larger value wins.
  4. Count the votes: as soon as more than half of the machines hold the same vote, the Leader is determined.
  5. Each server changes its state: the new Leader becomes LEADING and the others become FOLLOWING.
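The voting rule can be sketched as a simplified, single-pass model. It assumes that every surviving server sees every vote and that votes are `(myid, ZXID)` tuples as in the text. The function names `better` and `elect` are hypothetical. ZooKeeper's actual `FastLeaderElection` exchanges notifications asynchronously and also compares the election epoch before the ZXID.

```python
from collections import Counter


def better(vote_a, vote_b):
    """Prefer the vote with the larger ZXID; break ties with the larger myid."""
    myid_a, zxid_a = vote_a
    myid_b, zxid_b = vote_b
    return vote_a if (zxid_a, myid_a) > (zxid_b, myid_b) else vote_b


def elect(last_zxids, ensemble_size):
    """last_zxids maps the myid of each surviving server to its last ZXID.

    Returns (leader_myid, states), or None when no vote can reach a majority.
    """
    if not last_zxids:
        return None
    # Step 1: every server first votes for itself with (myid, ZXID).
    votes = {myid: (myid, zxid) for myid, zxid in last_zxids.items()}
    # Steps 2-3: each server sees all votes and switches to the best one.
    best = None
    for vote in votes.values():
        best = vote if best is None else better(best, vote)
    votes = {myid: best for myid in votes}
    # Step 4: the proposed Leader needs a strict majority of the whole ensemble.
    leader, count = Counter(v[0] for v in votes.values()).most_common(1)[0]
    if count <= ensemble_size // 2:
        return None                      # everyone stays LOOKING
    # Step 5: change server states.
    states = {myid: "LEADING" if myid == leader else "FOLLOWING" for myid in votes}
    return leader, states


if __name__ == "__main__":
    print(elect({3: 5, 4: 7, 5: 7}, ensemble_size=5))
    # (5, {3: 'FOLLOWING', 4: 'FOLLOWING', 5: 'LEADING'})
```

Take the five-server example and assume the three survivors are myid 3, 4 and 5 with last ZXIDs 5, 7 and 7. Servers 4 and 5 tie on ZXID and the larger myid wins, so server 5 becomes Leader with 3 of 5 votes. With only two survivors, no candidate can reach a majority and the cluster stays LOOKING.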

Why compare ZXID first and choose the node with the largest ZXID as the Leader? Because crash recovery in ZAB must satisfy the following two requirements:

  1. A proposal (Proposal) that the Leader has committed must eventually be committed by all Follower servers.
  2. A proposal that was only proposed on the Leader and never committed must be discarded.

Therefore the newly elected Leader holds the highest ZXID, which guarantees that it has the latest data. The advantage is that the new Leader does not need to check which proposals should be committed and which should be discarded.

In other words, after the previous Leader crashes, the node with the largest ZXID holds every proposal the former Leader committed. Suppose a proposal has been committed on some node, while other nodes received it but are still waiting for the commit. Those nodes must then commit it too. If a proposal never reached any surviving node (it existed only on the crashed Leader), nobody received a commit for it and it cannot be known to be committed. It is therefore discarded to keep the data consistent.


ZXID is the transaction ID, a 64-bit increasing ID maintained by the system. In the Paxos algorithm (see the earlier article on the consistency protocol), it corresponds to the number of a proposal (Proposal) issued by a Proposer. myid is the value in the myid file that we configured when deploying the ZooKeeper cluster.
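The 64-bit ZXID is not a single flat counter. ZooKeeper places the Leader's epoch in the high 32 bits and a per-epoch counter in the low 32 bits, so any proposal from a newer Leader orders after every proposal from an older one. A small sketch of packing and unpacking follows; the helper names `make_zxid` and `split_zxid` are hypothetical.

```python
def make_zxid(epoch, counter):
    """Pack a ZXID: epoch in the high 32 bits, counter in the low 32 bits."""
    if not (0 <= epoch < 2**32 and 0 <= counter < 2**32):
        raise ValueError("epoch and counter must each fit in 32 bits")
    return (epoch << 32) | counter


def split_zxid(zxid):
    """Unpack a ZXID into (epoch, counter)."""
    return zxid >> 32, zxid & 0xFFFFFFFF


if __name__ == "__main__":
    old = make_zxid(1, 500)    # 500th proposal of epoch 1
    new = make_zxid(2, 1)      # a proposal made in epoch 2 by a later Leader
    print(hex(old), hex(new))  # 0x1000001f4 0x200000001
    print(new > old)           # True: a newer epoch always sorts later
```

This is why comparing ZXIDs during election also favors servers that saw the most recent Leader's proposals.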




ZooKeeper cluster life cycle

Having covered ZAB's message broadcasting and crash recovery, let's look at the life cycle of a ZooKeeper cluster, shown in the following figure:


1. Crash recovery

Let's relate this to the hands-on instance deployment from the earlier article on ZooKeeper cluster deployment. After finishing the configuration, we started the first machine and found that the ZooKeeper cluster did not run successfully, because it had not reached a majority of the cluster size. A newly started ZooKeeper cluster needs more than half of its servers running before an election can take place. Once a new Leader is elected, the cluster can run normally.

2. Normal operation

After we start the second machine from that deployment, the cluster can run normally. More than half of the servers (not counting the configured Observer machines) are now running, so a Leader can be elected.
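The majority rule behind these two steps is simple arithmetic over the voting servers only, since Observers do not vote. A sketch follows, assuming (hypothetically) a deployment with three voting servers; the function names are illustrative.

```python
def quorum_size(voting_servers):
    """Smallest strict majority of the voting servers (Observers excluded)."""
    return voting_servers // 2 + 1


def can_elect(running_voters, voting_servers):
    """A Leader can be elected only once a majority of voters is running."""
    return running_voters >= quorum_size(voting_servers)


def tolerated_failures(voting_servers):
    """How many voting servers may be down while a quorum still exists."""
    return voting_servers - quorum_size(voting_servers)


if __name__ == "__main__":
    voters = 3                 # hypothetical: three voting servers, plus any Observers
    for running in (1, 2, 3):
        print(running, can_elect(running, voters))  # 1 False, 2 True, 3 True
```

With three voting servers, one running machine cannot elect a Leader, but two can. The same arithmetic shows that four voting servers tolerate only one failure, the same as three. This is why voting ensembles are usually given an odd size.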

3. Crash recovery

Crash recovery here refers to the Leader node crashing while the ZooKeeper cluster is running normally. A new Leader must then be elected, and once it is elected, data synchronization is performed. The whole process is crash recovery.

4. Data synchronization

Data synchronization here covers the following case: while the ZooKeeper cluster is running normally, a node crashes and some time later reconnects to the cluster. That node must then synchronize its data so that it is consistent with the rest of the cluster.
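The sketch below shows, in simplified form, how a reconnecting node could be brought in line with the Leader. Logs are modeled as ascending lists of `(epoch, counter)` ZXID tuples. The mode labels are only loosely named after ZooKeeper's `DIFF` and `TRUNC` sync modes; real ZooKeeper also has `SNAP` (sending a full snapshot) and decides between them differently. The function `sync_follower` is hypothetical.

```python
def sync_follower(leader_log, follower_log):
    """Bring a reconnecting node's log in line with the Leader's.

    Both logs are ascending lists of (epoch, counter) ZXID tuples.
    Returns (mode, new_follower_log).
    """
    # Find the longest prefix on which the two logs agree.
    common = 0
    while (common < len(leader_log) and common < len(follower_log)
           and leader_log[common] == follower_log[common]):
        common += 1
    # Entries the Leader never had (for example, an old Leader's proposal that
    # reached no other server) are truncated; otherwise the node is just behind.
    mode = "TRUNC" if common < len(follower_log) else "DIFF"
    # Keep the agreed prefix and append whatever the node is missing.
    return mode, follower_log[:common] + leader_log[common:]


if __name__ == "__main__":
    leader_log = [(1, 1), (1, 2), (2, 1)]
    print(sync_follower(leader_log, [(1, 1), (1, 2), (1, 3)]))  # former Leader
    print(sync_follower(leader_log, [(1, 1)]))                  # lagging Follower
```

A former Leader might rejoin holding `(1, 3)`, a proposal no surviving node received. That entry is truncated and replaced by the new epoch's `(2, 1)`. A node that merely fell behind just receives the missing tail.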


Origin blog.csdn.net/newbie0107/article/details/105022940