The Paxos Consensus Algorithm in Detail

1. Problem description

Suppose there is a set of processes, each of which can propose proposals. A consensus algorithm must then guarantee the following:

Among the proposals that are proposed, only one is selected.
If no proposal is proposed, no proposal is selected.
Once a proposal is selected, processes should be able to learn which proposal was selected.

To satisfy safety, the following requirements must hold:

Only a proposal that has actually been proposed can be selected.
Only one proposal can be selected.
A process never believes that a proposal has been selected unless it actually has been.

The Paxos consensus algorithm has three roles: Proposer, Acceptor, and Learner. Participants are assumed to communicate by sending and receiving messages. A Proposer puts forward proposals, the Acceptors decide which proposal is selected, and a Learner learns the selected proposal.

2. Derivation process

This part is mostly descriptive text. If you are not interested in the derivation, you can skip directly to the algorithm flow in Part 3.

First, we want a proposal to be selected even if only one proposal is ever proposed. This implies the requirement:

P1: An Acceptor must approve the first proposal it receives.

However, under P1 alone, several proposals may each be approved by the same or a similar number of Acceptors. For example, proposal A is approved by Acceptor 1 and proposal B by Acceptor 2, so there is still no way to single out one proposal. How can this be solved?

P2: A proposal must be approved by more than half of the Acceptors.

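This requirement means that a proposal can only be selected by a majority, so a cluster of 2F+1 Acceptors can tolerate at most F simultaneous failures. Paxos does not strictly require an odd number of nodes, but an even-sized cluster tolerates no more failures than the next smaller odd size, so clusters are usually deployed with an odd number of nodes.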
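The arithmetic behind "more than half" can be checked with a few lines of Python. This is only an illustration; the function names quorum_size and tolerated_failures are made up for this example.

```python
def quorum_size(n_acceptors: int) -> int:
    """Smallest number of Acceptors that is more than half of the cluster."""
    return n_acceptors // 2 + 1


def tolerated_failures(n_acceptors: int) -> int:
    """Largest number of failed Acceptors that still leaves a majority alive."""
    return n_acceptors - quorum_size(n_acceptors)


# Compare cluster sizes 3, 4 and 5: size, quorum, tolerated failures.
for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

A cluster of 4 needs a quorum of 3 and still tolerates only 1 failure, the same as a cluster of 3.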

So far we have guaranteed that a proposal will be selected as long as one is proposed. Together, P1 and P2 imply that an Acceptor must be able to approve more than one proposal. We therefore introduce a globally ordered number, ProposalID, to uniquely identify each proposal.

When a proposal, whose content we refer to as its value, is approved by more than half of the Acceptors, we consider both the value and the proposal to be selected. A proposal is therefore a pair of a number and a value: [ProposalID, Value].
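One possible way to model a globally ordered, unique ProposalID is a pair compared first by a counter and then by the server's ID. The sketch below assumes that representation; the field names counter and server_id and the helper next_proposal_id are hypothetical, and a real system might use a timestamp in place of the counter.

```python
from typing import NamedTuple, Optional


class ProposalID(NamedTuple):
    """Hypothetical ProposalID: ordered by counter first, then by server_id."""
    counter: int
    server_id: int


class Proposal(NamedTuple):
    """A proposal as the pair [ProposalID, Value]."""
    proposal_id: ProposalID
    value: str


def next_proposal_id(last_seen: Optional[ProposalID], server_id: int) -> ProposalID:
    """Return an ID strictly greater than the largest ID this server has seen."""
    last_counter = last_seen.counter if last_seen is not None else 0
    return ProposalID(last_counter + 1, server_id)


a = next_proposal_id(None, server_id=1)   # ProposalID(counter=1, server_id=1)
b = next_proposal_id(None, server_id=2)   # same counter, larger server_id
c = next_proposal_id(b, server_id=1)      # larger counter beats any server_id
print(a < b < c)
```

Because tuples compare element by element, two servers never produce equal IDs, and any server can always produce an ID larger than one it has seen.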

But this raises another problem. Because an Acceptor can approve more than one proposal, several different proposals may be selected, which violates the requirement from Section 1 that only one proposal is selected. So we require:

P3: If the proposal [ProposalID0, Value0] is selected, then the Value of all subsequent selected proposals must be Value0.

As long as P3 is satisfied, even if multiple proposals are selected, we can guarantee that all selected proposals have the same value.

How can P3 be satisfied? We can strengthen it to:

P3-1: If the proposal [ProposalID0, Value0] is selected, then the value of every subsequent proposal approved by any Acceptor must be Value0.

As long as P3-1 is satisfied, P3 is certainly satisfied, but a new problem appears. Suppose the proposal [ProposalID0, Value0] has been selected, but because communication is asynchronous, some Acceptor N never approved it and does not know that it has been selected. By P1, Acceptor N must approve the first proposal it receives, and it has no way to restrict that proposal's value to Value0, so P3-1, and with it P3, may still be violated. We therefore place the requirement on the Proposer instead:

P3-2: If the proposal [ProposalID0, Value0] is selected, then the value of every subsequent proposal proposed by any Proposer must be Value0.

In summary, the Proposer executes the following steps:

1. The Proposer sends a message containing only a ProposalID to more than half of the Acceptors. This message is called a Prepare request.

2. If the Proposer receives replies from more than half of the Acceptors, it forms its proposal from the ProposalID of its Prepare request and the value of the proposal with the largest ProposalID among the replies. If no reply contains a value, the Proposer may propose any value.

3. The Proposer sends an Accept request to the Acceptors that responded, asking them to approve the proposal.
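The value rule in step 2 is the part that is easiest to get wrong, so here is a small sketch of it. It assumes each reply is either None (that Acceptor has approved nothing yet) or a (ProposalID, value) pair with an integer ProposalID; the name choose_value is made up for this example.

```python
from typing import Optional, Sequence, Tuple

# A reply is None if the Acceptor has approved nothing yet, otherwise the
# (proposal_id, value) pair of the highest-numbered proposal it has approved.
Reply = Optional[Tuple[int, str]]


def choose_value(replies: Sequence[Reply], own_value: str) -> str:
    """Pick the value the Proposer must put into its Accept request."""
    approved = [r for r in replies if r is not None]
    if not approved:
        # No Acceptor reported an approved proposal: any value is allowed.
        return own_value
    # Otherwise adopt the value of the reply with the largest ProposalID.
    _, value = max(approved, key=lambda r: r[0])
    return value


print(choose_value([None, None], "x"))
print(choose_value([None, (3, "a"), (5, "b")], "x"))
```

The first call is free to use the Proposer's own value "x". The second must adopt "b", because it belongs to the reply with the largest ProposalID.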

The Acceptor executes the following steps:

1. When it receives a Prepare request whose ProposalID is greater than that of every Prepare request it has already answered, it replies with the highest-numbered proposal it has approved so far, if any.

2. When it receives an Accept request, it approves the proposal and replies, unless it has already answered a Prepare request with a larger ProposalID.

3. Algorithm flow

Putting this together gives the algorithm flow of Paxos:

1. The Proposer selects a proposal number Mn (to ensure uniqueness, timestamp + ServerID can be used) and sends a Prepare request numbered Mn to more than half of the Acceptors.

2. If an Acceptor A receives a Prepare request numbered Mn, and Mn is greater than the largest Prepare number A has already answered (maxM_pre), then A replies to the Proposer with the highest-numbered proposal it has approved (both number and value, say [Ma_max, va]) and sets maxM_pre = Mn, that is, it no longer answers Prepare requests numbered less than or equal to Mn.

3. If the Proposer receives replies to its Prepare request Mn from more than half of the Acceptors, it sends a proposal [Mn, Vn] to those Acceptors, where Vn is the value of the highest-numbered proposal among the replies (for example, va). If no reply contains a proposal, Vn can be any value. This message is called an Accept request.

4. If an Acceptor receives the Accept request for [Mn, Vn] and maxM_pre <= Mn, that is, it has not answered a Prepare request with a larger number, it approves the proposal and sets Ma_max = Mn, va = Vn.

The pseudo code is as follows:
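A minimal Python sketch of steps 1 to 4 is given here in that role. It is an illustration under assumptions, not the author's original listing: messages are replaced by direct method calls in a single process, proposal numbers are plain integers, and the names Acceptor, on_prepare, on_accept and propose are invented for the example. The variables maxM_pre, Mn and Vn follow the text above.

```python
from typing import List, Optional, Tuple

Proposal = Tuple[int, str]  # (number, value), written [Mn, Vn] in the text


class Acceptor:
    def __init__(self) -> None:
        self.maxM_pre = 0                          # largest Prepare number answered
        self.accepted: Optional[Proposal] = None   # [Ma_max, va], if any

    def on_prepare(self, Mn: int) -> Tuple[bool, Optional[Proposal]]:
        # Step 2: answer only a Prepare request with a larger number.
        if Mn > self.maxM_pre:
            self.maxM_pre = Mn
            return True, self.accepted
        return False, None

    def on_accept(self, Mn: int, Vn: str) -> bool:
        # Step 4: approve unless a larger Prepare request has been answered.
        if self.maxM_pre <= Mn:
            self.accepted = (Mn, Vn)
            return True
        return False


def propose(acceptors: List[Acceptor], Mn: int, value: str) -> Optional[str]:
    """Run one round; return the selected value, or None if the round fails."""
    majority = len(acceptors) // 2 + 1

    # Steps 1 and 2: Prepare phase.
    promises = []
    for acceptor in acceptors:
        ok, previous = acceptor.on_prepare(Mn)
        if ok:
            promises.append((acceptor, previous))
    if len(promises) < majority:
        return None

    # Step 3: adopt the value of the highest-numbered approved proposal, if any.
    approved = [p for _, p in promises if p is not None]
    Vn = max(approved, key=lambda p: p[0])[1] if approved else value

    # Step 4: Accept phase, sent to the Acceptors that responded.
    approvals = sum(acceptor.on_accept(Mn, Vn) for acceptor, _ in promises)
    return Vn if approvals >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, Mn=1, value="X")
second = propose(acceptors, Mn=2, value="Y")   # must adopt the selected "X"
print(first, second)
```

The second round proposes "Y" but ends up selecting "X" again, which is the behaviour P3 asks for.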

The pseudo code reveals a problem. Suppose there are two Proposers, A and B. A sends a Prepare request and gets replies from more than half of the Acceptors. Before A's Accept request arrives, B sends a Prepare request with a larger number, so A's Accept request is rejected. A then returns to step 1 with an even larger number, which in turn causes B's Accept request to be rejected. If the two keep repeating this, no proposal is ever selected, and the result is a livelock.
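The interleaving can be replayed with a toy simulation. The sketch below makes strong simplifying assumptions: a single Acceptor stands in for the majority, it tracks only the largest Prepare number it has answered, and the function duel forces the worst-case ordering by hand. All names are hypothetical.

```python
class Acceptor:
    """Tracks only the largest Prepare number answered (enough to show the race)."""

    def __init__(self) -> None:
        self.max_prepared = 0

    def prepare(self, n: int) -> bool:
        if n > self.max_prepared:
            self.max_prepared = n
            return True
        return False

    def accept(self, n: int) -> bool:
        # Rejected if a larger Prepare request has been answered in the meantime.
        return n >= self.max_prepared


def duel(rounds: int) -> list:
    """Interleave Proposers A and B so that every Accept request arrives too late."""
    acceptor = Acceptor()
    log = []
    n = 1
    acceptor.prepare(n)                 # A prepares with number 1
    current, other = "A", "B"
    for _ in range(rounds):
        higher = n + 1
        acceptor.prepare(higher)        # the other Proposer prepares first
        log.append((current, n, acceptor.accept(n)))  # so this Accept is rejected
        n, current, other = higher, other, current
    return log


for proposer, number, approved in duel(4):
    print(proposer, number, approved)
```

Every entry in the log ends with False: each Proposer's Accept request is overtaken by the other's next Prepare request, so neither makes progress.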

One remedy is to choose a single Proposer as the main Proposer and stipulate that only it may make proposals, which guarantees the liveness of the algorithm.

4. Multi-Paxos algorithm

The original Paxos algorithm (Basic Paxos) can form a resolution for only one value, and forming a resolution requires at least two network round trips. Under high concurrency more round trips may be needed, and in extreme cases a livelock can form. Basic Paxos by itself does not address determining multiple values in succession.

In practice it is almost always necessary to determine multiple values in succession, and to do so efficiently. Multi-Paxos was proposed to solve this problem. It makes two improvements over Basic Paxos:

For each value to be determined, one instance of the Paxos algorithm is run to form a resolution. Each Paxos instance is identified by a unique Instance ID.
A Leader is elected among all Proposers, and only the Leader submits Proposals to the Acceptors for voting. This removes competition between Proposers and solves the livelock problem. When only one Leader in the system submits Values, the Prepare stage can be skipped, which reduces the two stages to one and improves efficiency.

Multi-Paxos process:

Multi-Paxos first needs to elect a Leader. Choosing a Leader is itself the formation of a resolution, so a Basic Paxos instance can be run to elect it. Once a Leader is elected, only the Leader submits proposals. If the Leader goes down, the service is temporarily unavailable until a new Leader is elected. When only one Leader in the system submits proposals, the Prepare stage can be skipped.

Multi-Paxos extends the scope of the Prepare stage to all instances that the Leader submits afterwards. For a run of consecutive submissions, the Leader executes the Prepare stage only once and then executes only the Accept stage for each instance, which reduces the two stages to one and improves efficiency. To distinguish consecutively submitted instances, each is identified by an Instance ID, which the Leader can generate locally by incrementing a counter.
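The "Prepare once, then Accept per instance" idea can be sketched as follows. This is a simplified illustration, not a complete Multi-Paxos: it assumes a fresh cluster and a single Leader, replaces messages with method calls, and leaves out the recovery of values approved under an earlier Leader, which a real Prepare reply would have to carry. The names Acceptor, Leader, ballot, submit and prepare_rounds are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Acceptor:
    def __init__(self) -> None:
        self.promised = 0                            # one promise covers all instances
        self.log: Dict[int, Tuple[int, str]] = {}    # instance_id -> (ballot, value)

    def on_prepare(self, ballot: int) -> bool:
        if ballot > self.promised:
            self.promised = ballot
            return True
        return False

    def on_accept(self, ballot: int, instance_id: int, value: str) -> bool:
        if ballot >= self.promised:
            self.log[instance_id] = (ballot, value)
            return True
        return False


class Leader:
    def __init__(self, acceptors: List[Acceptor], ballot: int) -> None:
        self.acceptors = acceptors
        self.ballot = ballot
        self.majority = len(acceptors) // 2 + 1
        self.next_instance = 0      # Instance IDs are generated locally
        self.prepared = False
        self.prepare_rounds = 0     # counts how often the Prepare stage ran

    def submit(self, value: str) -> Optional[int]:
        """Return the Instance ID the value was selected for, or None on failure."""
        if not self.prepared:
            self.prepare_rounds += 1
            promises = sum(a.on_prepare(self.ballot) for a in self.acceptors)
            if promises < self.majority:
                return None
            self.prepared = True
        instance_id = self.next_instance
        acks = sum(a.on_accept(self.ballot, instance_id, value)
                   for a in self.acceptors)
        if acks < self.majority:
            self.prepared = False   # another Leader took over; prepare again later
            return None
        self.next_instance += 1
        return instance_id


acceptors = [Acceptor() for _ in range(3)]
leader = Leader(acceptors, ballot=1)
ids = [leader.submit(v) for v in ("a", "b", "c")]
print(ids, leader.prepare_rounds)
```

Three values are selected in three consecutive instances, while the Prepare stage runs only once.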

Multi-Paxos allows several nodes that each consider themselves the Leader to submit Proposals concurrently without affecting safety. In that scenario the protocol degenerates into Basic Paxos.

Both Chubby and Boxwood use Multi-Paxos. Zab, used by ZooKeeper, is a closely related atomic broadcast protocol that is often described as a variant of Multi-Paxos.
