Distributed Theory (9): A Detailed Explanation of the Paxos Consensus Algorithm

In a distributed system, nodes can fail and the network can delay or lose messages. According to the CAP theorem, a system can guarantee at most two of Consistency, Availability, and Partition Tolerance.

Systems with strict consistency requirements, such as bank ATMs, sacrifice availability: they refuse service when a failure occurs. MongoDB, Redis, and MapReduce take this approach.

Systems with weaker real-time requirements, such as static websites and query-oriented databases, sacrifice consistency and tolerate inconsistency for a period of time. The Gossip protocol and databases such as CouchDB and Cassandra take this approach.

Figure 1

Background

In 1990, Leslie Lamport proposed the Paxos algorithm in the paper "The Part-Time Parliament". Because the paper was written as a story rather than in conventional mathematical form, it was not taken seriously at first, and it was not formally published until 1998. In 2001, Lamport restated the algorithm in "Paxos Made Simple". For his pioneering contributions to distributed systems, Lamport received the Turing Award in 2013.

Paxos is widely used in distributed systems. Mike Burrows, the author of Google Chubby, put it this way: "There is only one consensus protocol, and that's Paxos."

Later, the Raft algorithm simplified and improved on Paxos, making consensus easier to understand and implement.

Paxos variants

Paxos is an island in a fictional story, whose parliament reaches consensus by voting. Legislators may leave, and messengers may lose messages or deliver them more than once. These correspond to node failures and network failures in a distributed system.

Figure 2

As shown in Figure 2, suppose the legislators want to decide what to eat for lunch. One or more of them may propose at the same time, but only one proposal can be passed. This is Basic Paxos, the most basic protocol in the Paxos family.

Basic Paxos on its own is clearly not efficient enough. If Basic Paxos instances run in parallel, the legislators can pass several proposals at once, such as what to eat for lunch, where to go afterwards, and who pays. This is the Multi-Paxos protocol.

Basic Paxos

Roles

There are three roles in the Paxos algorithm: Proposer, Acceptor, and Learner. In an implementation, one node can play several roles.

Figure 3

  • Proposer: makes proposals
  • Acceptor: votes on proposals
  • Learner: learns the result of the vote and helps propagate it

The Learner does not take part in voting, so to simplify the description we ignore this role from here on.

Algorithm

The algorithm runs in two phases: the Prepare phase and the Accept phase.

The Proposer sends two kinds of requests, Prepare and Accept. Each Acceptor accepts or rejects proposals based on the information it has recorded.

Prepare phase

  • The Proposer chooses a proposal number n and sends a Prepare(n) request to a majority of the Acceptors (or to all of them).
  • When an Acceptor receives the request, if n is greater than any proposal number it has seen before, it replies and promises never again to accept a proposal numbered less than n. In addition, if it has already accepted a proposal (whose number is necessarily smaller than n), it includes that proposal's number and value in the reply.
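The Acceptor's side of the Prepare phase can be sketched in Python as below. This is a minimal in-memory sketch, not a complete implementation: the class `Acceptor` and the method `on_prepare` are hypothetical names, proposal numbers are simplified to positive integers, and the three fields mirror the minProposal, acceptedProposal, and acceptedValue state an Acceptor keeps.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Acceptor:
    min_proposal: int = 0                 # highest number promised so far (0 = none)
    accepted_proposal: int = 0            # number of the last accepted proposal (0 = none)
    accepted_value: Optional[Any] = None  # value of the last accepted proposal

    def on_prepare(self, n: int) -> Tuple[bool, int, Optional[Any]]:
        """Handle Prepare(n); returns (promised, accepted_proposal, accepted_value)."""
        if n > self.min_proposal:
            # Promise to ignore every proposal numbered below n from now on.
            self.min_proposal = n
            return True, self.accepted_proposal, self.accepted_value
        # n is not larger than a number already seen, so no promise is made.
        return False, self.accepted_proposal, self.accepted_value


a = Acceptor()
print(a.on_prepare(3))  # (True, 0, None)
print(a.on_prepare(2))  # (False, 0, None)
```

The second Prepare carries a smaller number, so it is refused and the promise made to 3 stays in force.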

Accept phase

  • Once the Proposer has replies from a majority, it sends an Accept(n, value) request. n is its own proposal number, and value is the value of the highest-numbered proposal reported in the replies. If no Acceptor reported an accepted proposal, value is the Proposer's own value.
  • When an Acceptor receives the request, if n is greater than or equal to the largest number it has seen, it records the proposal number and value and replies that it has accepted.
  • If the Proposer receives acceptances from a majority, its proposal has been chosen. Otherwise, it returns to the first step and starts again with a larger proposal number.

The complete algorithm is shown in Figure 4:

Figure 4
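Putting both phases together, one round can be sketched as below. It is a simplified single-process sketch under several assumptions: Acceptors are plain Python objects called directly instead of over RPC, nothing fails, proposal numbers are positive integers, and the hypothetical helper `propose` sends both requests to every Acceptor. Here `on_accept` replies with the Acceptor's current minProposal, and the Proposer treats a reply larger than n as a rejection; this is one common way to signal rejection, chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Acceptor:
    min_proposal: int = 0                 # highest number promised
    accepted_proposal: int = 0            # 0 means nothing accepted yet
    accepted_value: Optional[Any] = None

    def on_prepare(self, n: int):
        if n > self.min_proposal:
            self.min_proposal = n
            return True, self.accepted_proposal, self.accepted_value
        return False, self.accepted_proposal, self.accepted_value

    def on_accept(self, n: int, value: Any) -> int:
        # Accept if n is at least as large as every number seen so far.
        if n >= self.min_proposal:
            self.min_proposal = self.accepted_proposal = n
            self.accepted_value = value
        # Reply with minProposal; a reply larger than n means "rejected".
        return self.min_proposal


def propose(acceptors: List[Acceptor], n: int, my_value: Any) -> Optional[Any]:
    """One Prepare/Accept round with number n; returns the chosen value or None."""
    majority = len(acceptors) // 2 + 1

    # Prepare phase: keep only the replies that carry a promise.
    promises = [r for r in (a.on_prepare(n) for a in acceptors) if r[0]]
    if len(promises) < majority:
        return None  # preempted: retry later with a larger n

    # Use the value of the highest-numbered accepted proposal, else our own value.
    _, highest_n, highest_value = max(promises, key=lambda r: r[1])
    value = highest_value if highest_n > 0 else my_value

    # Accept phase: count the Acceptors that did not reject.
    accepted = sum(1 for a in acceptors if a.on_accept(n, value) <= n)
    return value if accepted >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, 1, "X"))  # X
print(propose(acceptors, 2, "Y"))  # X  (the earlier value is adopted)
print(propose(acceptors, 1, "Z"))  # None  (stale number, no promises)
```

The second round adopts X instead of its own value Y, which is the value-selection rule of the Accept phase; the stale third round gets no promises and would have to retry with a larger number.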

Each Acceptor must persistently store three values: minProposal, acceptedProposal, and acceptedValue.
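Persistence matters because an Acceptor that forgets a promise after a crash could later accept a proposal it had promised to reject. The minimal sketch below assumes the three values fit in a small JSON file (the file format and the class name `DurableAcceptorState` are hypothetical). It writes the state to a temporary file, flushes it to disk, and atomically renames it over the old file; the Acceptor would call `save()` after updating its state and before sending any reply.

```python
import json
import os
import tempfile
from typing import Any, Optional


class DurableAcceptorState:
    """Hypothetical on-disk store for minProposal, acceptedProposal, acceptedValue."""

    def __init__(self, path: str):
        self.path = path
        self.min_proposal = 0
        self.accepted_proposal = 0
        self.accepted_value: Optional[Any] = None
        if os.path.exists(path):
            # Recover the promised/accepted state after a restart.
            with open(path) as f:
                data = json.load(f)
            self.min_proposal = data["minProposal"]
            self.accepted_proposal = data["acceptedProposal"]
            self.accepted_value = data["acceptedValue"]

    def save(self) -> None:
        """Write all three values atomically; call before replying to any request."""
        data = {
            "minProposal": self.min_proposal,
            "acceptedProposal": self.accepted_proposal,
            "acceptedValue": self.accepted_value,
        }
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        # Atomic rename: a reader sees either the old state or the new one.
        os.replace(tmp, self.path)
```

A fully durable implementation would also fsync the containing directory after the rename; this sketch omits that step.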

Three situations

Three situations can arise when proposals compete in Basic Paxos. Each is described below.

Situation 1: The proposal has already been accepted

As shown in Figure 5, X and Y are values sent by clients, and S1 to S5 are servers, each acting as both Proposer and Acceptor. To keep proposal numbers unique, each number consists of two parts:

sequence number.Server ID

For example, the proposal numbers used by S1 are 1.1, 2.1, 3.1, ...
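In code, reading "3.1" as a decimal number would break uniqueness: server 1's number 3.1 and server 10's number 3.10 would be equal. Comparing the two parts as a pair avoids this. The sketch below assumes, as in the figures, that a server builds its next number from one more than the largest sequence number it has seen plus its own Server ID; `ProposalNumber` and `next_proposal_number` are hypothetical names.

```python
from typing import NamedTuple


class ProposalNumber(NamedTuple):
    # Tuples compare field by field: sequence number first, then Server ID.
    seq: int
    server_id: int

    def __str__(self) -> str:
        return f"{self.seq}.{self.server_id}"


def next_proposal_number(max_seen: ProposalNumber, server_id: int) -> ProposalNumber:
    """A number larger than any seen, unique because it carries this server's ID."""
    return ProposalNumber(max_seen.seq + 1, server_id)


n = next_proposal_number(ProposalNumber(3, 1), server_id=5)
print(n)  # 4.5
```

The sequence number decides the ordering and the Server ID only breaks ties, which is why 4.5 is larger than 3.1.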

 

Figure 5

The figure above is from page 13 of the Paxos lecture (Raft user study).

In this run, S1 receives value X from a client, so S1 acts as a Proposer and sends Prepare(3.1) to S1-S3. None of the Acceptors S1-S3 has accepted a proposal yet, so they all reply with a promise. S1 then sends Accept(3.1, X), and proposal X is accepted.

After X has been accepted, S5 receives value Y from a client and sends Prepare(4.5) to S3-S5. For S3, 4.5 is larger than 3.1, and since S3 has already accepted X, it replies with the proposal (3.1, X). After receiving the replies from S3-S5, S5 replaces its own value Y with X and sends Accept(4.5, X). S3-S5 accept the proposal, and in the end all Acceptors hold the same value, X.

Result: the new Proposer adopts the value that has already been accepted.
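Situation 1 can be replayed with a tiny in-process model. The sketch is hypothetical: `Acceptor`, `choose_value`, and the dictionary `s` of five servers are illustrative names, proposal numbers are `(sequence, server_id)` tuples, and each step is called directly in the order of Figure 5 rather than sent over a network.

```python
class Acceptor:
    def __init__(self):
        self.min_proposal = (0, 0)   # highest number promised
        self.accepted = None         # (number, value) last accepted, or None

    def prepare(self, n):
        if n > self.min_proposal:
            self.min_proposal = n
            return True, self.accepted
        return False, self.accepted

    def accept(self, n, value):
        if n >= self.min_proposal:
            self.min_proposal = n
            self.accepted = (n, value)
            return True
        return False


def choose_value(replies, own_value):
    # Adopt the value of the highest-numbered accepted proposal, else use our own.
    prior = [acc for ok, acc in replies if ok and acc is not None]
    return max(prior)[1] if prior else own_value


s = {i: Acceptor() for i in range(1, 6)}   # S1..S5

# S1 proposes X with 3.1 to S1-S3; both phases complete, so X is chosen.
n1 = (3, 1)
v1 = choose_value([s[i].prepare(n1) for i in (1, 2, 3)], "X")
ok1 = [s[i].accept(n1, v1) for i in (1, 2, 3)]

# Later S5 proposes Y with 4.5 to S3-S5; S3 reports (3.1, X).
n2 = (4, 5)
v2 = choose_value([s[i].prepare(n2) for i in (3, 4, 5)], "Y")
ok2 = [s[i].accept(n2, v2) for i in (3, 4, 5)]

print(v1, ok1, v2, ok2)                     # X [True, True, True] X [True, True, True]
print({a.accepted[1] for a in s.values()})  # {'X'}
```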

Situation 2: The proposal has not been accepted, but the new Proposer can see it

Figure 6

The figure above is from page 14 of the Paxos lecture (Raft user study).

As shown in Figure 6, S3 has accepted the proposal (3.1, X), but S1 and S2 have not yet received the Accept request. S3-S5 then receive Prepare(4.5). S3 replies with its accepted proposal (3.1, X), so S5 replaces its value Y with X and sends Accept(4.5, X) to S3-S5. Since 4.5 is greater than 3.1, this proposal is accepted.

S1 and S2 then receive and accept Accept(3.1, X), and in the end all Acceptors agree on X.

Result: the new Proposer adopts the value already accepted by some Acceptor, and both proposals succeed.
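Situation 2 can be replayed with a small in-process model in the order of Figure 6. As before in spirit, this is a hypothetical sketch: `Acceptor`, `choose_value`, and `s` are illustrative names, proposal numbers are `(sequence, server_id)` tuples, and "delayed" messages are modeled simply by calling them later.

```python
class Acceptor:
    def __init__(self):
        self.min_proposal = (0, 0)   # highest number promised
        self.accepted = None         # (number, value) last accepted, or None

    def prepare(self, n):
        if n > self.min_proposal:
            self.min_proposal = n
            return True, self.accepted
        return False, self.accepted

    def accept(self, n, value):
        if n >= self.min_proposal:
            self.min_proposal = n
            self.accepted = (n, value)
            return True
        return False


def choose_value(replies, own_value):
    # Adopt the value of the highest-numbered accepted proposal, else use our own.
    prior = [acc for ok, acc in replies if ok and acc is not None]
    return max(prior)[1] if prior else own_value


s = {i: Acceptor() for i in range(1, 6)}   # S1..S5
n1, n2 = (3, 1), (4, 5)

# S1 runs Prepare(3.1) on S1-S3, but so far only S3 receives Accept(3.1, X).
v1 = choose_value([s[i].prepare(n1) for i in (1, 2, 3)], "X")
ok_s3 = s[3].accept(n1, v1)

# S5 runs Prepare(4.5) on S3-S5; S3 reports (3.1, X), so S5 switches to X.
v2 = choose_value([s[i].prepare(n2) for i in (3, 4, 5)], "Y")
ok_new = [s[i].accept(n2, v2) for i in (3, 4, 5)]

# The delayed Accept(3.1, X) reaches S1 and S2, which only promised 3.1.
ok_old = [ok_s3] + [s[i].accept(n1, v1) for i in (1, 2)]

print(v2, ok_new, ok_old)   # X [True, True, True] [True, True, True]
```

Both proposals reach a majority, and both carry X, so they do not conflict.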

Situation 3: The proposal has not been accepted, and the new Proposer cannot see it

Figure 7

The figure above is from page 15 of the Paxos lecture (Raft user study).

As shown in Figure 7, S1 has accepted the proposal (3.1, X), while S3 receives Prepare(4.5) before Accept(3.1, X). Since 3.1 is less than 4.5, S3 rejects the Accept request. Proposal X therefore cannot gather acceptances from a majority and is blocked. Because none of S3-S5 had accepted X when they replied to Prepare(4.5), S5 keeps its own value, and proposal Y is chosen without trouble.

Result: the new Proposer uses its own value, and the old proposal is blocked.
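Situation 3 can be replayed the same way in the order of Figure 7. This hypothetical sketch uses illustrative names (`Acceptor`, `choose_value`, `s`) and `(sequence, server_id)` tuples as proposal numbers; S2's copy of Accept(3.1, X) is not modeled, since even if S2 accepted it, X would have only two of five acceptances.

```python
class Acceptor:
    def __init__(self):
        self.min_proposal = (0, 0)   # highest number promised
        self.accepted = None         # (number, value) last accepted, or None

    def prepare(self, n):
        if n > self.min_proposal:
            self.min_proposal = n
            return True, self.accepted
        return False, self.accepted

    def accept(self, n, value):
        if n >= self.min_proposal:
            self.min_proposal = n
            self.accepted = (n, value)
            return True
        return False


def choose_value(replies, own_value):
    # Adopt the value of the highest-numbered accepted proposal, else use our own.
    prior = [acc for ok, acc in replies if ok and acc is not None]
    return max(prior)[1] if prior else own_value


s = {i: Acceptor() for i in range(1, 6)}   # S1..S5
n1, n2 = (3, 1), (4, 5)

# S1 runs Prepare(3.1) on S1-S3 and S1 itself accepts X.
v1 = choose_value([s[i].prepare(n1) for i in (1, 2, 3)], "X")
ok_s1 = s[1].accept(n1, v1)

# S5 runs Prepare(4.5) on S3-S5 before S3 sees Accept(3.1, X): nobody reports X.
v2 = choose_value([s[i].prepare(n2) for i in (3, 4, 5)], "Y")

# Accept(3.1, X) now reaches S3, which has promised 4.5 and rejects it.
ok_s3 = s[3].accept(n1, v1)
ok_new = [s[i].accept(n2, v2) for i in (3, 4, 5)]

print(v2, [ok_s1, ok_s3], ok_new)   # Y [True, False] [True, True, True]
```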

Livelock

Livelock occurs when two or more Proposers keep preempting each other in the Prepare phase, so that no proposal is ever chosen. It is unlikely, but when it happens it seriously hurts performance.

Figure 8

The figure above is from page 16 of the Paxos lecture (Raft user study).

One solution is for a Proposer to wait for a random amount of time after a failed attempt before retrying, which makes simultaneous requests less likely.
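A retry loop with a random wait might look like the sketch below. `try_round` is a hypothetical callback that runs one Prepare/Accept round (choosing a fresh, larger proposal number each time) and returns the chosen value, or None if it was preempted. Letting the upper bound of the wait double after each failure (exponential backoff) is an assumption of this sketch; the text only requires the wait to be random.

```python
import random
import time
from typing import Callable, Optional


def propose_with_backoff(
    try_round: Callable[[], Optional[str]],
    base_delay: float = 0.01,
    max_attempts: int = 10,
) -> Optional[str]:
    """Retry a Paxos round, sleeping a random time after every failed attempt."""
    for attempt in range(max_attempts):
        value = try_round()
        if value is not None:
            return value
        # Random wait whose upper bound doubles after each failure, so competing
        # Proposers are unlikely to restart at the same moment.
        time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    return None


attempts = []


def flaky_round() -> Optional[str]:
    # Hypothetical round that is preempted twice and then succeeds.
    attempts.append(1)
    return "X" if len(attempts) == 3 else None


print(propose_with_backoff(flaky_round, base_delay=0.001))  # X
```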

Multi-Paxos

Multi-Paxos can also solve the livelock described in the previous section. It elects a leader among the Proposers, and only the leader submits proposals, which also allows the Prepare phase to be skipped and reduces overhead. It is also possible to keep Basic Paxos's mechanism of multiple Proposers as is, but performance suffers.

Once Basic Paxos runs in parallel, multiple proposals are processed at the same time, so the Acceptor must be able to store different proposals and also preserve their order.

Figure 9 shows the structure of an Acceptor. Each square is an Entry that stores a proposal value, and Entries are distinguished by an increasing Index.
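The per-Index layout can be sketched as a map from Index to Entry, where each Entry follows the Basic Paxos accept rule on its own. This is a hypothetical in-memory sketch: `Entry`, `MultiPaxosAcceptor`, and the sample values are illustrative, and each Entry keeps its own minProposal here (the optimization of omitting Prepare changes that).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Entry:
    min_proposal: int = 0                 # highest number promised for this Index
    accepted_proposal: int = 0            # 0 = nothing accepted yet
    accepted_value: Optional[Any] = None


@dataclass
class MultiPaxosAcceptor:
    # One Entry per Index; each Index is an independent Basic Paxos instance.
    entries: Dict[int, Entry] = field(default_factory=dict)

    def accept(self, index: int, n: int, value: Any) -> bool:
        e = self.entries.setdefault(index, Entry())
        if n >= e.min_proposal:
            e.min_proposal = e.accepted_proposal = n
            e.accepted_value = value
            return True
        return False

    def log(self) -> List[Any]:
        # Values in increasing Index order; the Index is what orders the proposals.
        return [self.entries[i].accepted_value for i in sorted(self.entries)]


a = MultiPaxosAcceptor()
a.accept(1, 1, "Noodles")    # proposals may arrive out of order
a.accept(0, 1, "Dumplings")
print(a.log())               # ['Dumplings', 'Noodles']
```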

Figure 9

Multi-Paxos has to solve several problems, which we look at in turn.

1. Leader election

One of the simplest election methods is to make the server with the largest Server ID the leader.

Each server sends a heartbeat to the other servers every T. If a server has not received a heartbeat from a server with a higher ID within 2T, it becomes the leader.
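The election rule can be sketched as a small state tracker. This is a hypothetical sketch: `LeaderElector` and its methods are illustrative names, time is passed in explicitly (for example from `time.monotonic()`) so the logic can be checked without real waiting, and actually sending a heartbeat every T is left to the caller.

```python
from typing import Dict


class LeaderElector:
    """Largest-Server-ID election: lead if no higher ID has been heard from in 2T."""

    def __init__(self, my_id: int, interval_t: float):
        self.my_id = my_id
        self.t = interval_t
        self.last_heard: Dict[int, float] = {}   # server ID -> time of last heartbeat

    def on_heartbeat(self, sender_id: int, now: float) -> None:
        self.last_heard[sender_id] = now

    def is_leader(self, now: float) -> bool:
        # A higher ID counts as alive if its heartbeat arrived within the last 2T.
        return not any(
            sid > self.my_id and now - ts < 2 * self.t
            for sid, ts in self.last_heard.items()
        )


e = LeaderElector(my_id=3, interval_t=1.0)
e.on_heartbeat(5, now=0.5)
print(e.is_leader(now=1.0), e.is_leader(now=3.0))  # False True
```

Heartbeats from lower IDs never affect the decision, which matches the "largest Server ID wins" rule.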

Servers that are not the leader must reject client requests or forward them to the leader.

More sophisticated election methods can also be used; they are not covered here.

2. Omitting the Prepare phase

The Prepare phase serves two purposes: blocking older proposals and discovering values that may already have been accepted.

When only one Leader is sending proposals, Prepare requests do not conflict, so the Prepare phase can be omitted, which cuts the number of RPCs roughly in half.

The Prepare request is modified as follows:

  • The Acceptor records a single maximum proposal number that applies to all entries, rather than one per entry.
  • In its reply, the Acceptor returns the highest-numbered proposal it has accepted for the requested entry. If neither the current entry nor any later entry has accepted a proposal, it also replies noMoreAccepted.

Once the Leader has received noMoreAccepted from a majority of Acceptors, it no longer needs the Prepare phase and sends only Accept requests. When an Accept request is rejected, the Leader must use the Prepare phase again.
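The modified Prepare logic and the Leader's decision to skip Prepare can be sketched as below. All names (`Acceptor`, `Leader`, `prepare_needed`, and so on) are hypothetical, proposal numbers are plain integers, and calls are made directly instead of over RPC. An Accept reply larger than the Leader's number is treated as a rejection.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set, Tuple


@dataclass
class Acceptor:
    min_proposal: int = 0   # a single number for the whole log, not one per entry
    accepted: Dict[int, Tuple[int, Any]] = field(default_factory=dict)  # index -> (n, value)

    def prepare(self, index: int, n: int):
        ok = n > self.min_proposal
        if ok:
            self.min_proposal = n
        prop, val = self.accepted.get(index, (0, None))
        # noMoreAccepted: nothing accepted at this index or at any later index.
        no_more = all(i < index for i in self.accepted)
        return ok, prop, val, no_more

    def accept(self, index: int, n: int, value: Any) -> int:
        if n >= self.min_proposal:
            self.min_proposal = n
            self.accepted[index] = (n, value)
        return self.min_proposal   # a value larger than n means "rejected"


@dataclass
class Leader:
    n: int                # current proposal number
    quorum: int           # majority size
    no_more_from: Set[int] = field(default_factory=set)

    def on_prepare_reply(self, acceptor_id: int, ok: bool, no_more: bool) -> None:
        if ok and no_more:
            self.no_more_from.add(acceptor_id)

    @property
    def prepare_needed(self) -> bool:
        return len(self.no_more_from) < self.quorum

    def on_accept_reply(self, reply: int) -> None:
        if reply > self.n:
            # Rejected: another Proposer holds a higher number, so go back to
            # Prepare (a real Leader would also pick a new, larger n).
            self.no_more_from.clear()
```

Usage: after one round of Prepare with noMoreAccepted from a majority, `prepare_needed` becomes False, and the first rejected Accept flips it back to True.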

3. Complete information flow

So far, the information is incomplete in two ways:

  • Basic Paxos only needs a majority of nodes to agree. In Multi-Paxos, however, this means some nodes may never receive certain entries, and we want every node to have the complete log.
  • Only the Proposer knows whether a proposal has been chosen (accepted by a majority), based on the replies it receives; the Acceptors cannot know this.

The first problem has a simple solution: the Proposer sends Accept requests to all nodes.

The second problem is a little more involved. First, we can add a Success RPC through which the Proposer explicitly tells Acceptors which proposals have been chosen. This works, but it can be optimized to reduce the number of requests.

The Proposer adds a firstUnchosenIndex parameter to each Accept request, giving the Index of its first entry that has not yet been chosen. This implies that, as far as the Proposer knows, every entry below that Index has been chosen. The Acceptor can therefore mark entries below firstUnchosenIndex as chosen, but only those whose accepted proposal number equals the proposal number in the request. The reason is that after a Leader switch, different Proposers may hold different information, so marking entries that were accepted under another Proposer's proposal could lead to inconsistency.

Figure 10

As shown in Figure 10, the Proposer is about to send an Accept request for Index 2. Entries 0 and 1 have been chosen, so firstUnchosenIndex = 2. When the Acceptor receives the request and compares the indexes and proposal numbers, it can mark the Dumplings proposal as chosen.
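The marking rule can be sketched as follows. The sketch is hypothetical: entries are numbered from 0 as in Figure 10, proposal numbers are plain integers, `chosen` is an extra per-entry flag, and the values other than Dumplings are made up. Entry 0 was accepted under an older leader's number, so it is not marked even though it lies below firstUnchosenIndex.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Entry:
    accepted_proposal: int = 0
    accepted_value: Optional[Any] = None
    chosen: bool = False


@dataclass
class Acceptor:
    min_proposal: int = 0
    entries: Dict[int, Entry] = field(default_factory=dict)

    def accept(self, index: int, n: int, value: Any, first_unchosen_index: int) -> int:
        if n >= self.min_proposal:
            self.min_proposal = n
            e = self.entries.setdefault(index, Entry())
            e.accepted_proposal, e.accepted_value = n, value
            # Entries below first_unchosen_index are chosen from the Proposer's view,
            # but only those accepted under this same proposal number are safe to mark.
            for i, other in self.entries.items():
                if i < first_unchosen_index and other.accepted_proposal == n:
                    other.chosen = True
        return self.min_proposal

    def first_unchosen_index(self) -> int:
        i = 0
        while i in self.entries and self.entries[i].chosen:
            i += 1
        return i


a = Acceptor()
a.entries[0] = Entry(accepted_proposal=3, accepted_value="Noodles")  # older leader
a.accept(1, 5, "Dumplings", first_unchosen_index=1)
a.accept(2, 5, "Rice", first_unchosen_index=2)
print([e.chosen for _, e in sorted(a.entries.items())])  # [False, True, False]
```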

Because of the Leader switches mentioned above, an explicit request is still needed to make the information complete. When an Acceptor replies to an Accept request, it includes its own firstUnchosenIndex. If this is smaller than the Proposer's, the Proposer sends Success(index, value); the Acceptor marks that index as chosen and replies with its new firstUnchosenIndex, and this repeats until the two indexes are equal.
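The Success exchange can be sketched as a loop on the Proposer side. In this hypothetical sketch the Acceptor tracks chosen entries in a dictionary, `catch_up` stands in for the background sending of Success RPCs, and the Proposer's firstUnchosenIndex is taken to be the length of its list of chosen values; the sample values are made up.

```python
from typing import Any, Dict, List


class Acceptor:
    def __init__(self):
        self.chosen: Dict[int, Any] = {}   # index -> value known to be chosen

    def first_unchosen_index(self) -> int:
        i = 0
        while i in self.chosen:
            i += 1
        return i

    def on_success(self, index: int, value: Any) -> int:
        """Success(index, value): mark the entry chosen, reply with the new index."""
        self.chosen[index] = value
        return self.first_unchosen_index()


def catch_up(proposer_log: List[Any], acceptor: Acceptor, acceptor_index: int) -> int:
    """Send Success until the Acceptor's firstUnchosenIndex reaches the Proposer's.

    Returns the number of Success RPCs sent.
    """
    sent = 0
    idx = acceptor_index
    while idx < len(proposer_log):
        idx = acceptor.on_success(idx, proposer_log[idx])
        sent += 1
    return sent


acc = Acceptor()
acc.chosen.update({0: "Dumplings", 2: "Rice"})   # this Acceptor missed index 1
log = ["Dumplings", "Noodles", "Rice", "Tea"]     # Proposer's firstUnchosenIndex is 4
print(catch_up(log, acc, acc.first_unchosen_index()))  # 2
print(acc.first_unchosen_index())                      # 4
```

Because each reply jumps past entries the Acceptor already knew about, index 2 is skipped and only two Success RPCs are needed.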

Summary

Paxos is an important consensus algorithm for solving distributed consistency problems. This article introduced the most basic variant, Basic Paxos, and Multi-Paxos, which runs proposals in parallel.

For Basic Paxos, we covered the three roles (Proposer, Acceptor, and Learner) and the three situations that can arise during a proposal. For Multi-Paxos, we covered three problems it must solve: Leader election, omitting the Prepare phase, and complete information flow.

In the next article, we will implement a simple demo to verify the algorithm; the implementation will involve more details.



Author: Hash performing artist
Link: https://www.jianshu.com/p/c3264228dfbe
Source: Jianshu
Copyright belongs to the author. For commercial reproduction, please contact the author for authorization; for non-commercial reproduction, please credit the source.


Origin: blog.csdn.net/pansaky/article/details/103046143