[Repost] Understand the ZAB (ZooKeeper Atomic Broadcast) protocol in ten minutes

ZooKeeper is built on ZAB (ZooKeeper Atomic Broadcast). ZAB implements a primary-backup (leader-follower) architecture and keeps the data consistent across all replicas in the cluster.

The ZAB protocol defines four phases: election, discovery, synchronization (sync), and broadcast.

Election decides which node becomes the leader (master). Discovery and synchronization run after a leader has been chosen and perform data recovery.

Broadcast is the normal operating phase: once the leader has been elected and the followers have synchronized their data with it, writes are replicated from the leader to the followers.

This post gives a brief introduction to ZAB so that you can quickly grasp its essence; the specific details can then be studied in the paper. Only two phases are covered here: election and broadcast.

basic concepts

First, some basic concepts in zk (ZooKeeper).

A zk cluster has three roles:

  • leader: what we usually call the master (primary);

  • follower: what we usually call the slave (secondary);

  • observer: can be thought of as an extra copy of the leader's data; it does not vote, so it can be ignored here;

A node in a zk cluster is in one of three states:

  • looking: an election is in progress and the cluster currently has no leader;

  • leading: the state of the leader only;

  • following: the state of a follower;

Every successful write has a globally unique identifier called the zxid. It is a 64-bit positive integer: the high 32 bits are the epoch, which identifies the election era (the current leader's term), and the low 32 bits are an auto-incrementing counter that increases by one on every write. Think of how dates were counted by reign in ancient China: in "the 15th year of Wanli", Wanli is the epoch and 15 is the counter.
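The bit layout can be sketched with plain shifts and masks. This is a minimal illustration, not ZooKeeper's code; the helper names make_zxid, epoch_of and counter_of are hypothetical. It also shows why the layout is useful: any zxid from a later epoch compares larger than every zxid from an earlier one.

```python
EPOCH_SHIFT = 32
LOW_32_MASK = (1 << 32) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch (high 32 bits) and a write counter (low 32 bits) into one zxid."""
    if not (0 <= epoch <= LOW_32_MASK and 0 <= counter <= LOW_32_MASK):
        raise ValueError("epoch and counter must each fit in 32 bits")
    return (epoch << EPOCH_SHIFT) | counter


def epoch_of(zxid: int) -> int:
    """The 'reign' part: which leader era produced this write."""
    return zxid >> EPOCH_SHIFT


def counter_of(zxid: int) -> int:
    """The 'year' part: position of the write within its epoch."""
    return zxid & LOW_32_MASK


if __name__ == "__main__":
    z = make_zxid(5, 15)
    print(hex(z), epoch_of(z), counter_of(z))  # 0x50000000f 5 15
```

Because the epoch occupies the high bits, comparing two zxids as plain integers orders them first by epoch and then by counter.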

A zk cluster usually has an odd number of machines (2n + 1), with exactly one leader; the rest are followers. Electing a leader or writing data requires agreement from at least n + 1 machines (a majority) before the operation can be carried out.
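The majority rule is simple arithmetic. The sketch below (function names are hypothetical) computes the quorum size and how many machines may fail while a quorum can still be formed. It also suggests why odd sizes are preferred: 4 machines need 3 votes and tolerate only 1 failure, the same as 3 machines.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of machines that is more than half of cluster_size."""
    if cluster_size < 1:
        raise ValueError("cluster must have at least one machine")
    return cluster_size // 2 + 1


def has_quorum(agreeing: int, cluster_size: int) -> bool:
    """True if `agreeing` machines are enough to elect a leader or commit a write."""
    return agreeing >= quorum_size(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    """How many machines may fail while a majority can still be formed."""
    return cluster_size - quorum_size(cluster_size)
```

For 2n + 1 machines, quorum_size returns n + 1 and tolerated_failures returns n, matching the rule above.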

Vote priority: compare zxid first; if the zxids are equal, compare the machine id. In both cases the larger value wins.
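The priority rule maps naturally onto tuple comparison: sort by (zxid, machine id) and take the largest. Candidate and best_candidate are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    server_id: int  # machine id
    zxid: int       # last zxid this machine has logged


def priority(c: Candidate) -> Tuple[int, int]:
    # zxid is compared first; the machine id only matters when zxids are equal.
    return (c.zxid, c.server_id)


def best_candidate(candidates: List[Candidate]) -> Candidate:
    # Descending order: the candidate with the largest priority key wins.
    return max(candidates, key=priority)
```

For example, among machines 1, 2 and 3 with zxids 0x100000005, 0x100000007 and 0x100000007, machine 3 wins: it ties with machine 2 on zxid and has the larger id.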

election

A new leader election is triggered when a new cluster starts, when the leader crashes, or when the leader loses contact with more than half of the followers. There are two election algorithms: fast paxos and basic paxos.

fast paxos

ZAB uses the fast paxos algorithm by default.

Every election carries an election round number, similar to the epoch field of the zxid, so that elections from different rounds do not interfere with each other.

When a node enters the looking state, it first votes for itself and then sends its vote to all other machines. A vote message contains <election round, zxid of the node voted for, id of the node voted for>.
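The vote message and the first step of an election could be modelled as below. This is a sketch under stated assumptions: Vote, start_election and broadcast are hypothetical names, the network is replaced by a list of (destination, message) pairs, and we assume the round counter is incremented by one each time a node enters the looking state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Vote:
    election_round: int  # election round of the sender
    zxid: int            # last zxid of the node being voted for
    leader_id: int       # id of the node being voted for


def start_election(my_id: int, my_last_zxid: int, last_round: int) -> Vote:
    """Enter the looking state: start a new round (assumed +1) and vote for ourselves."""
    return Vote(election_round=last_round + 1, zxid=my_last_zxid, leader_id=my_id)


def broadcast(vote: Vote, peer_ids: List[int]) -> List[Tuple[int, Vote]]:
    """One (destination id, message) pair for every other machine."""
    return [(peer, vote) for peer in peer_ids]
```

Usage: start_election(3, 0x100000007, 4) gives Vote(5, 0x100000007, 3), which broadcast then addresses to every peer.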

When another node in the looking state receives a vote, it proceeds as follows:

1. First, it checks whether the vote is valid by comparing the vote's election round with the round it has recorded locally:

2.1 If the vote's round is smaller than the local round, the vote is discarded.

2.2 If the vote's round is larger than the local round:

  1. the node's own vote is out of date, so it clears its locally recorded votes;

  2. it updates its round and its vote to those received;

  3. it notifies all other nodes of its new vote.

2.3 If the vote's round equals the local round, the node compares the priority of the received vote with that of its own vote:

  2.3.1 If the received vote has higher priority, the node changes its own vote to the received one and sends it out.

  2.3.2 If the received vote has lower priority, the vote is ignored.

  2.3.3 If the priorities are equal, the node just updates its record of the sender's vote.

3. After each vote is collected, the node checks the votes recorded so far to see whether any node has received votes from more than half of the cluster. If so, voting ends: the election is declared over and the node updates its state (leading if it was elected, following otherwise), after which the discovery and synchronization phases begin. Otherwise, it keeps collecting votes.
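The receive-and-tally steps can be sketched for a single node as follows. This follows the numbered steps literally and is not ZooKeeper's FastLeaderElection code; LookingNode, on_vote, elected_leader and state are hypothetical names, messages arrive as direct method calls, and the "notify all other nodes" step just appends to an outbox list.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Vote:
    election_round: int  # sender's election round
    zxid: int            # last zxid of the proposed leader
    leader_id: int       # id of the proposed leader


def priority(v: Vote) -> Tuple[int, int]:
    # Larger zxid wins; equal zxids are decided by the larger id.
    return (v.zxid, v.leader_id)


@dataclass
class LookingNode:
    my_id: int
    cluster_size: int
    vote: Vote
    received: Dict[int, Vote] = field(default_factory=dict)  # voter id -> its vote
    outbox: List[Vote] = field(default_factory=list)         # votes to send out

    def __post_init__(self) -> None:
        self.received[self.my_id] = self.vote  # our own vote is counted too

    def _adopt(self, new_vote: Vote) -> None:
        self.vote = new_vote
        self.received[self.my_id] = new_vote
        self.outbox.append(new_vote)  # notify all other nodes

    def on_vote(self, sender_id: int, incoming: Vote) -> None:
        local_round = self.vote.election_round
        if incoming.election_round < local_round:
            return                              # 2.1: stale round, discard
        if incoming.election_round > local_round:
            self.received.clear()               # 2.2.1: our votes are out of date
            self._adopt(incoming)               # 2.2.2 and 2.2.3
        elif priority(incoming) > priority(self.vote):
            self._adopt(incoming)               # 2.3.1: better candidate
        elif priority(incoming) < priority(self.vote):
            return                              # 2.3.2: lower priority, ignore
        self.received[sender_id] = incoming     # 2.3.3 (and the cases above)

    def elected_leader(self) -> Optional[int]:
        # Step 3: does any candidate hold votes from more than half the cluster?
        counts = Counter(v.leader_id for v in self.received.values())
        for candidate, votes in counts.items():
            if votes > self.cluster_size // 2:
                return candidate
        return None

    def state(self) -> str:
        leader = self.elected_leader()
        if leader is None:
            return "looking"
        return "leading" if leader == self.my_id else "following"
```

In a 3-node cluster, node 1 starting with Vote(1, 5, 1) that receives Vote(1, 7, 2) from node 2 adopts it; nodes 1 and 2 then both vote for node 2, which is a majority, so node 1 moves to following.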

basic paxos

1. Each node in the looking state sends a request to the other nodes, asking for their votes.

The other nodes reply with their current vote <zk server id, zxid>. At the start, every node votes for itself.

2. After receiving the replies, if a received vote has a larger zxid than the node's own vote, the node updates its vote to that one.

3. Once all nodes have replied, the votes are counted. If one node has received more than half of the votes, it wins the election. Otherwise, a new round of requests starts, and this repeats until a leader is chosen.
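A minimal synchronous simulation of these rounds is shown below, assuming every node reaches every other node in each round and nobody crashes. basic_election is a hypothetical name. As an assumption, equal zxids are decided by server id using the vote priority from the basic concepts section, so the rounds always converge.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Vote = Tuple[int, int]  # (zxid, server id) of the proposed leader; compares by priority


def basic_election(last_zxids: Dict[int, int], max_rounds: int = 10) -> Tuple[Optional[int], int]:
    """Simulate ask-everyone rounds. Returns (leader id or None, rounds used)."""
    n = len(last_zxids)
    # At the start, every node votes for itself.
    votes: Dict[int, Vote] = {sid: (zxid, sid) for sid, zxid in last_zxids.items()}
    for round_no in range(1, max_rounds + 1):
        # Step 1: every node asks all nodes and receives their current votes.
        replies = dict(votes)
        # Step 2: a node switches to the best vote it saw if it beats its own.
        best_seen = max(replies.values())
        for sid in votes:
            if best_seen > votes[sid]:
                votes[sid] = best_seen
        # Step 3: count the replies; more than half for one node ends the election.
        leader, top = Counter(v[1] for v in replies.values()).most_common(1)[0]
        if top > n // 2:
            return leader, round_no
    return None, max_rounds
```

With last zxids {1: 5, 2: 7, 3: 7}, the first round only shows three self-votes; every node then switches to server 3, and the second round of asking confirms a majority. That extra full round of requests is the cost the next section contrasts with fast paxos.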

difference between basic paxos and fast paxos

fast paxos is push-based: as soon as a node updates its vote, it immediately sends the change to the other nodes. A node may not even have finished sending its own vote to all nodes when it discovers that its vote has lower priority; it then updates its vote and notifies all nodes again.

In basic paxos, each node must finish asking every other node before it knows the new result, and only then can it ask the other nodes about the updated election result.

fast paxos is faster than basic paxos because a node does not have to wait until it has exchanged votes with every other node to know whether its own vote should be updated. This reduces the number of interactions.

Broadcast - master-slave synchronization

Synchronizing data from the leader to the followers is relatively simple. If a write request is received by a follower, the follower forwards it to the leader, which guarantees that every write is executed on the leader.
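The forwarding rule can be illustrated in a few lines. Server, write and handled are hypothetical names; only the routing decision is modelled, not networking or replication.

```python
from typing import List, Optional


class Server:
    """Hypothetical server that only models where a client write gets executed."""

    def __init__(self, server_id: int, leader: Optional["Server"] = None) -> None:
        self.server_id = server_id
        self.leader = leader            # None means this server is the leader
        self.handled: List[str] = []    # writes executed on this server

    def is_leader(self) -> bool:
        return self.leader is None

    def write(self, data: str) -> int:
        """Handle a client write and return the id of the server that executed it."""
        if self.is_leader():
            self.handled.append(data)
            return self.server_id
        # A follower never executes a write itself; it forwards it to the leader.
        return self.leader.write(data)
```

A client connected to follower 2 calling write("x") ends up with the write executed on leader 1; the follower's own handled list stays empty until the leader broadcasts the change.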

The leader first proposes a transaction and commits it only after more than half of the nodes have acknowledged it. When the leader receives a write, it first creates a transaction locally and assigns it a zxid, then sends the proposal to all followers. When a follower receives the proposal, it first writes the transaction to its local transaction log on disk and, once that succeeds, replies to the leader. After receiving replies from more than half of the nodes, the leader commits the transaction and tells all followers to commit it; each follower commits the transaction when it receives that message. Only committed transactions are exposed to clients.
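The proposal/commit flow can be sketched synchronously, assuming reliable in-process calls instead of a network. Leader, Follower, on_proposal and on_commit are hypothetical names; disk_ok simulates a failed log write. Two further simplifications are assumptions: the leader counts itself toward the majority, and write returns False when no majority acknowledges, whereas a real leader would keep waiting rather than give up.

```python
from typing import Dict, List


class Follower:
    def __init__(self, sid: int, disk_ok: bool = True) -> None:
        self.sid = sid
        self.disk_ok = disk_ok             # False simulates a failed disk write
        self.log: Dict[int, str] = {}      # zxid -> data in the local transaction log
        self.committed: List[int] = []     # zxids committed, in order

    def on_proposal(self, zxid: int, data: str) -> bool:
        """Log the proposal first; reply (ACK) only if the log write succeeded."""
        if not self.disk_ok:
            return False
        self.log[zxid] = data
        return True

    def on_commit(self, zxid: int) -> None:
        if zxid in self.log:
            self.committed.append(zxid)


class Leader:
    def __init__(self, epoch: int, followers: List[Follower]) -> None:
        self.epoch = epoch
        self.counter = 0                   # low 32 bits of the zxid
        self.followers = followers
        self.committed: List[int] = []

    def write(self, data: str) -> bool:
        # Create the transaction locally and give it the next zxid.
        self.counter += 1
        zxid = (self.epoch << 32) | self.counter
        # Phase 1: propose to every follower and count the replies.
        acks = 1                           # assumption: the leader counts itself
        for f in self.followers:
            if f.on_proposal(zxid, data):
                acks += 1
        cluster_size = len(self.followers) + 1
        if acks <= cluster_size // 2:
            return False                   # no majority: not committed
        # Phase 2: commit locally, then tell every follower to commit.
        self.committed.append(zxid)
        for f in self.followers:
            f.on_commit(zxid)
        return True
```

With three nodes, one follower whose disk write fails does not block the commit: the leader plus one healthy follower are 2 of 3, which is more than half.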

to sum up

Because all writes go through the leader and are then synchronized to the followers, zxids are generated without conflicts and are globally unique. Globally unique zxids provide a priority order for both elections and data synchronization. For the election part, understanding the principle of fast paxos is enough: its core idea is that, since zxids only increase, the node with the highest priority (the most recent zxid) becomes the leader. Master-slave synchronization happens in two phases, proposal and commit; a write is considered successful once more than half of the nodes have written it successfully.


 



---------------------
Author: owenandhisfriends
Source: CNBLOGS
Original: https://www.cnblogs.com/owenandhisfriends/p/9622208.html
Disclaimer: This is an original article by the author; if you repost it, please include a link to the original post.


Origin www.cnblogs.com/webfactory/p/11539719.html