The Zab Algorithm in Detail

       ZooKeeper uses a protocol called Zab (ZooKeeper Atomic Broadcast) as the core of its consistent replication. According to its authors, it is a new algorithm designed around the specific needs of Yahoo: high throughput, low latency, robustness, and simplicity, without heavy demands on scalability. The core of the protocol is outlined below.

Note that this article discusses only the consistency protocol that ZooKeeper uses, not its source-code implementation.

A ZooKeeper deployment consists of clients and servers. The server side provides consistent replication and storage. The client side builds specific semantics on top of it, such as distributed locks, election algorithms, and distributed mutual exclusion. In terms of what is stored, the servers mostly hold state rather than bulk data content, so ZooKeeper can be treated as a small file system. Because the amount of state is relatively small, all of it can be loaded into memory, which greatly reduces latency.

A crashed server can be restarted. For fault tolerance, a server must "remember" its earlier state, so the data has to be persisted. Under high throughput, however, disk I/O becomes the bottleneck. The solution is to buffer writes and turn random writes into sequential writes.
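As a rough illustration of "random writes become sequential writes", the sketch below appends every update to the end of a single file and syncs in batches. The class name AppendOnlyLog, the replay helper, and the length-prefixed record format are assumptions made for this example; they are not ZooKeeper's actual on-disk format.

```python
import os
import struct


class AppendOnlyLog:
    """Hypothetical append-only log: every update goes to the end of one file."""

    def __init__(self, path):
        self.path = path
        # Append mode keeps all writes sequential, whichever key is updated.
        self._file = open(path, "ab")

    def append(self, payload: bytes) -> None:
        # Length-prefixed record so the log can be replayed after a restart.
        self._file.write(struct.pack(">I", len(payload)) + payload)

    def sync(self) -> None:
        # One flush + fsync can cover a whole batch of buffered appends.
        self._file.flush()
        os.fsync(self._file.fileno())

    def close(self) -> None:
        self.sync()
        self._file.close()


def replay(path):
    """Read the records back in the order they were written."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack(">I", header)
            records.append(f.read(length))
    return records
```

Calling sync() once per batch rather than once per record is the point of the buffering: the disk sees a few large sequential writes instead of many small scattered ones.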

Because ZooKeeper mainly operates on state, it defines two safety properties to keep that state consistent:

 

  • Total order: if message a is sent before message b, then every server should see the two messages in that same order.
  • Causal order: if message a happens before message b (a leads to b) and both are sent, then a is always executed before b.
To guarantee both safety properties, ZooKeeper relies on TCP and a Leader. TCP provides the ordering of messages (first sent, first delivered), and the Leader solves causal ordering: whatever reaches the Leader first is executed first. With a Leader, the ZooKeeper architecture becomes a master-slave model. The master (Leader) can crash, however, so ZooKeeper introduces a Leader election algorithm to keep the system robust. Overall, ZooKeeper's work falls into two parts:
  • Atomic Broadcast
  • Leader election

1. Atomic Broadcast

At any time exactly one node is the Leader; the other nodes are called Followers. For an update request, if the client is connected to the Leader, the Leader executes the request itself; if the client is connected to a Follower, that node must forward the request to the Leader. Read requests, however, are served directly from the Follower the client is connected to; a client that needs the latest data must read from the Leader. ZooKeeper is designed for a read-to-write ratio of 2:1.
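The routing rule above can be sketched with a small in-memory model. The Node class and its methods are hypothetical names for this example, and replication from Leader to Followers is deliberately left out, so a local read on a Follower shows up as stale.

```python
class Node:
    """Hypothetical in-memory node used only to illustrate request routing."""

    def __init__(self, name, leader=None):
        self.name = name
        # A node created without a leader reference is itself the Leader.
        self.leader = leader if leader is not None else self
        self.data = {}

    @property
    def is_leader(self):
        return self.leader is self

    def write(self, key, value):
        if not self.is_leader:
            # A Follower never applies a client update itself; it forwards it.
            return self.leader.write(key, value)
        self.data[key] = value
        return self.name  # name of the node that executed the update

    def read(self, key, latest=False):
        if latest and not self.is_leader:
            # Per the text, a client that needs the newest value asks the Leader.
            return self.leader.read(key)
        # Otherwise the read is served locally and may be stale.
        return self.data.get(key)
```

For example, with leader = Node("A") and follower = Node("B", leader=leader), follower.write("x", 1) is executed by A, follower.read("x") returns the Follower's own (here still empty) view, and follower.read("x", latest=True) goes to the Leader.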
 
The Leader commits requests with a simplified form of two-phase commit. It differs from ordinary two-phase commit in two clear ways:
  • Because there is only one Leader, a request that the Leader proposes to the Followers is always accepted (no other Leader interferes).
  • Not every Follower has to respond successfully; a majority is enough.
Put simply, with 2f + 1 nodes, up to f nodes may fail. Because any two majorities must intersect, the nodes in the intersection can supply the latest state of the system when the Leader changes. If no majority exists (fewer than f + 1 nodes survive), the algorithm stops. One case deserves attention:
Take three nodes A, B, and C, with A as Leader. If B crashes, A and C keep working, because A is the Leader and A and C still form a majority. If A then crashes as well, work cannot continue, because the remaining node cannot form a majority for Leader election.
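The majority arithmetic can be checked with a few lines of Python. The function names below are made up for this sketch; only the 2f + 1 reasoning comes from the text.

```python
from itertools import combinations


def quorum_size(n_nodes: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return n_nodes // 2 + 1


def tolerated_failures(n_nodes: int) -> int:
    """Largest number of nodes that may fail while a majority survives."""
    return n_nodes - quorum_size(n_nodes)


def can_commit(acks: set, cluster: set) -> bool:
    """A proposal commits once a majority of the cluster has acknowledged it."""
    return len(acks & cluster) >= quorum_size(len(cluster))


def majorities_intersect(n_nodes: int) -> bool:
    """Brute-force check that any two minimal majorities share a node."""
    nodes = range(n_nodes)
    majorities = [set(c) for c in combinations(nodes, quorum_size(n_nodes))]
    return all(a & b for a in majorities for b in majorities)
```

With a cluster of {"A", "B", "C"}, can_commit({"A", "C"}, cluster) is true, while can_commit({"C"}, cluster) is false, which matches the three-node case described above.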

2. Leader Election

Leader election relies mainly on the Paxos algorithm; see other posts for the details of that algorithm. Here we only consider some of the problems that Leader election brings. The biggest one is the hand-off between old and new: should the new Leader continue from the old Leader's state? The answer depends on when the old Leader crashed:
  1. The old Leader crashed before COMMIT (the request had already been written locally).
  2. The old Leader crashed after COMMIT, but only some Followers received the commit request.
In the first case, only the old Leader knows about the data. When the old Leader restarts, it must synchronize with the new Leader and delete that data locally to keep the state consistent.
In the second case, the new Leader should be able to obtain the latest data committed by the old Leader through a majority.
After restarting, the old Leader may still believe it is the Leader and keep sending its unfinished requests. Two Leaders would then exist at once and the algorithm would fail. The solution is to attach an id to every message from the Leader, which ZooKeeper calls the zxid. A zxid is a 64-bit number: the upper 32 bits hold the Leader's epoch, which is incremented on every Leader change; the lower 32 bits hold the message counter, which restarts from 0 on every Leader change. Using the zxid, a Follower can easily tell that a request came from an old Leader and reject it.
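The zxid layout described above is easy to express with bit operations. The helper names and the Follower class below are assumptions for this sketch; only the 32/32 split between epoch and counter comes from the text.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack the epoch into the upper 32 bits and the counter into the lower 32."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def epoch_of(zxid: int) -> int:
    return zxid >> COUNTER_BITS


def counter_of(zxid: int) -> int:
    return zxid & COUNTER_MASK


class Follower:
    """Hypothetical Follower that rejects proposals from an older epoch."""

    def __init__(self, current_epoch: int):
        self.current_epoch = current_epoch

    def accept(self, zxid: int) -> bool:
        if epoch_of(zxid) < self.current_epoch:
            return False  # the sender is a stale Leader
        self.current_epoch = epoch_of(zxid)
        return True
```

Because the epoch sits in the high bits, plain integer comparison orders zxids correctly across Leader changes: make_zxid(2, 0) is larger than any zxid issued in epoch 1.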
 
Because data may have to be deleted on the old Leader (case 1), ZooKeeper's data store must support compensating (undo) operations, which in turn requires keeping a log, as a database does.
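One way to picture such a compensating step is a log whose entries record the previous value, so that entries beyond a known-good zxid can be undone in reverse order. The entry layout (zxid, key, old_value, new_value) and both function names are hypothetical, and treating "everything after one zxid" as invalid is a simplification; this is not ZooKeeper's recovery code.

```python
def truncate_to(log, last_valid_zxid):
    """Split a zxid-ordered log into (kept, discarded) around last_valid_zxid."""
    kept = [entry for entry in log if entry[0] <= last_valid_zxid]
    discarded = [entry for entry in log if entry[0] > last_valid_zxid]
    return kept, discarded


def undo(state, discarded):
    """Roll back discarded entries, newest first, using their recorded old values."""
    for _zxid, key, old_value, _new_value in reversed(discarded):
        if old_value is None:
            state.pop(key, None)  # the key did not exist before this entry
        else:
            state[key] = old_value
    return state
```

Given a log of three entries where only the first is known to the new Leader, truncate_to(log, 1) returns the last two as discarded, and undo restores the state as it was after the first entry.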

3. Zab and Paxos

The authors of Zab hold that Zab is not the same as Paxos. They did not use Paxos because Paxos does not guarantee a total order of requests:
"Because multiple leaders can propose a value for a given instance two problems arise. First, proposals can conflict. Paxos uses ballots to detect and resolve conflicting proposals. Second, it is not enough to know that a given instance number has been committed, processes must also be able to figure out which value has been committed."
Paxos does not really consider the logical ordering between requests, only the total ordering of the data. Few people use Paxos directly, though; it is usually simplified and optimized first.
Paxos has several common simplified forms. One of them is that, when a Leader is present, the protocol reduces to a single phase (Phase 2). A single-phase scheme needs a robust Leader, so the focus shifts to Leader election. Taking the Learner's role into account, a "learn" phase is also needed. In this way Paxos simplifies into two phases:
  • The original Phase 2
  • Learn
If we also require that Learn succeed on a majority, this is in effect the Zab protocol. Paxos puts its emphasis on controlling the election process and gives little attention to how a decision is learned; Zab supplements exactly that.
Someone once said that all distributed algorithms are simplified forms of Paxos. That is too absolute, but in many cases it holds. I wonder whether the authors of Zab would agree with it?

4. End

This article analyzes ZooKeeper from the perspective of its protocol and algorithm rather than its source code. Because ZooKeeper changes between versions, some of the scenarios described here may have no corresponding implementation. The article also tries to bring out one point: Zab is a simplified form of Paxos.
[References]
  • A simple totally ordered broadcast protocol
  • Paxos

Origin www.cnblogs.com/aibabel/p/10973596.html