How crash recovery works

We have already covered the message broadcast process in the ZAB protocol. ZAB's broadcast, which is built on an atomic broadcast protocol, works correctly under normal circumstances. However, the cluster enters crash recovery mode once the leader crashes, or once the leader loses contact with more than half of the followers because of network problems. The second case can be caused, for example, by a network partition between the leader and the followers; a leader in that position is no longer a legitimate leader. In the crash recovery state, ZAB has to do two things:
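The condition for a leader to remain legitimate can be sketched as a majority (quorum) check. This is a simplified model, not ZooKeeper code: `leader_is_legal` is a hypothetical helper, and it assumes the leader counts its own vote plus every follower it can still reach, needing strictly more than half of the whole ensemble.

```python
def leader_is_legal(reachable_followers: int, ensemble_size: int) -> bool:
    """Hypothetical helper (not a ZooKeeper API).

    The leader counts itself plus every follower it can currently reach and
    stays legitimate only while that total is a strict majority of the
    ensemble. Otherwise the cluster must fall back to crash recovery.
    """
    votes = reachable_followers + 1  # +1 for the leader's own vote
    return votes > ensemble_size // 2


# A 5-node ensemble: one leader plus 4 followers.
print(leader_is_legal(4, 5))  # True: all followers reachable
print(leader_is_legal(2, 5))  # True: 3 of 5 votes is still a majority
print(leader_is_legal(1, 5))  # False: only 2 of 5 votes, partitioned away
```

In the last case the leader has lost contact with 3 of its 4 followers, i.e. more than half of them, which matches the situation described above.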

1. Election of a new leader

2. Data synchronization
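The two steps can be sketched together as follows. This is a deliberately simplified model, not ZooKeeper's implementation: `Node`, `elect_leader` and `synchronize` are hypothetical names. The election assumes the ordering used by ZooKeeper's fast leader election (most recent transaction id, or zxid, wins; ties go to the larger server id), with the epoch folded into the zxid as an `(epoch, counter)` tuple. Synchronization is reduced to every follower adopting the leader's log wholesale.

```python
from dataclasses import dataclass, field

Zxid = tuple[int, int]  # (epoch, counter); assumption mirroring the zxid layout


@dataclass
class Node:
    sid: int                                       # server id
    log: list[Zxid] = field(default_factory=list)  # accepted proposals, in order

    @property
    def last_zxid(self) -> Zxid:
        return self.log[-1] if self.log else (0, 0)


def elect_leader(alive: list[Node], ensemble_size: int) -> Node:
    # Hypothetical helper: a leader can only be elected by a majority.
    if len(alive) <= ensemble_size // 2:
        raise RuntimeError("no quorum: cannot elect a leader")
    # Most recent zxid wins; ties are broken by the larger server id.
    return max(alive, key=lambda n: (n.last_zxid, n.sid))


def synchronize(leader: Node, alive: list[Node]) -> None:
    # Hypothetical helper: each follower truncates entries the leader lacks
    # and appends the ones it is missing, i.e. it copies the leader's log.
    for node in alive:
        if node is not leader:
            node.log = list(leader.log)


# The old leader (sid 3) crashed; sids 1 and 2 survive in a 3-node ensemble.
alive = [Node(1, [(1, 1), (1, 2)]), Node(2, [(1, 1)])]
leader = elect_leader(alive, ensemble_size=3)
synchronize(leader, alive)
print(leader.sid)              # 1: it holds the newest zxid (1, 2)
print([n.log for n in alive])  # both logs are now [(1, 1), (1, 2)]
```

Choosing the node with the newest zxid means the new leader already holds every proposal the survivors know about, so synchronization only has to bring the others up to its state.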

As explained in the section on message broadcast, ZAB's broadcast mechanism is a simplified version of the 2PC (two-phase commit) protocol: a proposal is committed as soon as more than half of the nodes in the cluster have acknowledged it. On its own, however, this mechanism cannot handle the data inconsistency caused by a Leader crash. ZAB therefore adds a "crash recovery mode" to solve this problem.
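The leader-side commit rule of this simplified 2PC can be sketched as below. `should_commit` is a hypothetical helper, not a ZooKeeper API, and it assumes the leader always acknowledges its own proposal.

```python
def should_commit(follower_acks: list[bool]) -> bool:
    """Hypothetical helper: decide whether the leader may send COMMIT.

    Phase 1: the leader sends PROPOSAL to every follower; follower_acks[i]
    records whether follower i replied with an ACK.
    Phase 2: the leader sends COMMIT once a strict majority of the ensemble
    (its own ACK included) has acknowledged; it does not wait for the rest.
    """
    ensemble_size = len(follower_acks) + 1  # followers plus the leader
    acks = 1 + sum(follower_acks)           # the leader ACKs its own proposal
    return acks > ensemble_size // 2


print(should_commit([True, True, False, False]))   # True: 3 of 5 ACKs
print(should_commit([True, False, False, False]))  # False: 2 of 5 ACKs
```

The gap this leaves is the window between the commit decision and the delivery of COMMIT: if the leader crashes there, some followers hold the proposal without having committed it, and broadcast alone cannot repair that.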

Crash recovery in ZAB must therefore guarantee the following: if a transaction proposal has been processed successfully on one machine, it must be processed successfully on all machines, even when failures occur. To see how this goal is met, let's first consider which scenarios can lead to data inconsistency in ZooKeeper, and then how ZAB's crash recovery handles each of them.
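The guarantee itself can be written as a check over the servers' committed histories once recovery has finished. This is an illustrative sketch: `recovery_guarantee_holds` and the transaction names are hypothetical, and the example models the leader crashing after committing locally but before its COMMIT reached the followers.

```python
def recovery_guarantee_holds(committed: dict[int, list[str]]) -> bool:
    """Hypothetical check of the crash-recovery guarantee.

    committed maps a server id to the transactions it has committed. The
    guarantee holds if every transaction committed on any server is also
    committed on every server.
    """
    all_txns = set().union(*committed.values())
    return all(all_txns <= set(log) for log in committed.values())


# Old leader 1 committed t3 locally, then crashed before sending COMMIT.
before_recovery = {
    1: ["t1", "t2", "t3"],
    2: ["t1", "t2"],
    3: ["t1", "t2"],
}
print(recovery_guarantee_holds(before_recovery))  # False: t3 only on server 1
```

Crash recovery has to turn a state like `before_recovery` into one where this check passes on every server.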

Origin blog.csdn.net/Leon_Jinhai_Sun/article/details/112912417