Discarded messages cannot appear again

Suppose the leader receives a request and generates a proposal, then crashes before any follower has received that proposal. When recovery mode elects a new leader, the message is skipped. If the crashed leader later restarts and registers as a follower, it still holds the proposal for the skipped message. That state is inconsistent with the rest of the system and must be deleted.
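The scenario can be sketched as a toy simulation. This is only an illustration under simplifying assumptions, not ZooKeeper's implementation: each log is a plain Python list of proposal labels, and the names `truncate_to_leader`, `old_leader_log`, and `new_leader_log` are hypothetical.

```python
def truncate_to_leader(follower_log, leader_log):
    """Keep only the longest prefix of follower_log that the leader also has.

    Toy model: a log is a list of proposal labels in the order they were written.
    """
    common = 0
    for mine, theirs in zip(follower_log, leader_log):
        if mine != theirs:
            break
        common += 1
    return follower_log[:common]


# The old leader wrote p3 locally and crashed before sending it to anyone.
old_leader_log = ["p1", "p2", "p3"]

# The new leader was elected without ever seeing p3, then accepted new work.
new_leader_log = ["p1", "p2", "q1"]

# On rejoining as a follower, the old leader drops the skipped proposal p3.
rejoined_log = truncate_to_leader(old_leader_log, new_leader_log)
print(rejoined_log)  # ['p1', 'p2']
```

After the truncation, the rejoined server would still need to fetch `q1` from the new leader to catch up; that step is left out of the sketch.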

ZAB must satisfy both of the conditions above, so it needs a leader election algorithm that guarantees two things: every transaction proposal already committed by the leader is eventually committed, and every transaction proposal that was skipped is discarded.

To meet these requirements:

1. If the leader election algorithm ensures that the newly elected leader holds the transaction proposal with the highest number (the largest ZXID) among all machines in the cluster, then the new leader is guaranteed to have every committed proposal. The reason is that a proposal can be COMMITted only after more than half of the followers have ACKed it, so the proposal must be present in the transaction logs of more than half of the nodes. Therefore, as long as a quorum of nodes is working normally, at least one of them holds the proposal state of every COMMITted message.

2. The zxid is 64 bits wide. The upper 32 bits are the epoch number: each time a new leader is elected, it increments the epoch by 1. The lower 32 bits are a message counter: it is incremented by 1 for each message received and reset to 0 after a new leader is elected. The advantage of this design is that an old leader that restarts after a crash will not be elected leader, because its zxid must be smaller than that of the current leader. When the old leader connects to the new leader as a follower, the new leader makes it clear all proposals that carry the old epoch number and have not been COMMITted.
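The two points above can be sketched in a few lines. This is a minimal illustration, not ZooKeeper's code: the helper names (`make_zxid`, `epoch_of`, `counter_of`, `pick_leader`, `drop_stale_proposals`) are hypothetical, server ids are plain integers, and breaking ties by the larger server id is an assumption made to keep the example deterministic.

```python
EPOCH_SHIFT = 32
COUNTER_MASK = (1 << 32) - 1


def make_zxid(epoch, counter):
    """Pack the epoch into the upper 32 bits and the counter into the lower 32."""
    return (epoch << EPOCH_SHIFT) | (counter & COUNTER_MASK)


def epoch_of(zxid):
    """Extract the epoch number (upper 32 bits)."""
    return zxid >> EPOCH_SHIFT


def counter_of(zxid):
    """Extract the message counter (lower 32 bits)."""
    return zxid & COUNTER_MASK


def pick_leader(last_zxids):
    """Return the id of the server whose last logged zxid is largest.

    last_zxids maps server id -> last zxid in that server's log.
    Assumption: ties are broken by the larger server id.
    """
    return max(last_zxids, key=lambda sid: (last_zxids[sid], sid))


def drop_stale_proposals(log, committed, current_epoch):
    """Remove uncommitted proposals that belong to an epoch before current_epoch."""
    return [z for z in log if z in committed or epoch_of(z) >= current_epoch]


# Epoch 1: two proposals were committed; the old leader also logged a third
# one that never reached any follower before the crash.
z1, z2, z3 = make_zxid(1, 1), make_zxid(1, 2), make_zxid(1, 3)
committed = {z1, z2}

# Servers 2 and 3 survive; both end at z2, so the tie-break picks server 3.
new_leader = pick_leader({2: z2, 3: z2})

# The new leader starts epoch 2 with the counter reset to 0. Because the epoch
# sits in the high bits, even this first zxid exceeds every zxid of epoch 1.
first_new = make_zxid(2, 0)
print(new_leader, first_new > z3)

# When the old leader rejoins as a follower, its uncommitted epoch-1 proposal goes.
cleaned = drop_stale_proposals([z1, z2, z3], committed, current_epoch=2)
print([(epoch_of(z), counter_of(z)) for z in cleaned])
```

Packing the epoch into the high bits means a single integer comparison orders zxids first by epoch and then by counter, which is why an old leader's leftover proposals always compare lower than anything issued by the new leader.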

 


Origin blog.csdn.net/Leon_Jinhai_Sun/article/details/112912528