Distributed | Paxos consensus algorithm

1. Introduction

  • Paxos is a non-Byzantine fault-tolerant consensus algorithm: it solves the consensus problem in a distributed system where nodes may fail but do not behave maliciously.
  • It provides strong consistency: a read or write succeeds only after more than half of the nodes have acknowledged it.
  • Many widely used consensus algorithms, such as Fast Paxos, Cheap Paxos, Raft, and the ZAB protocol, are variations or improvements on Paxos.
  • Paxos comes in two parts:
    • Basic Paxos algorithm: describes how multiple nodes reach consensus on a single value (the value of a proposal).
    • Multi-Paxos idea: describes how to run multiple Basic Paxos instances to reach consensus on a series of values.
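
The "more than half" requirement above can be made concrete with a small sketch. The 5-node cluster and the helper name `majority` are assumptions for illustration, not part of the original article. The point it shows is that any two majorities share at least one node, which is why a later majority always sees what an earlier majority agreed on.

```python
from itertools import combinations


def majority(n):
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


nodes = ["A", "B", "C", "D", "E"]  # hypothetical 5-node cluster
quorum_size = majority(len(nodes))

# Enumerate every majority quorum and check that each pair shares a node.
quorums = [set(c) for c in combinations(nodes, quorum_size)]
all_intersect = all(q1 & q2 for q1, q2 in combinations(quorums, 2))

print(quorum_size, len(quorums), all_intersect)
```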

2. State machine replication

What exactly does "guaranteeing consistency" guarantee?
1. Distributed consistency is generally achieved through state machine replication. Each operation is a log entry and each node keeps its own operation log; guaranteeing consistency means keeping the logs of all nodes identical by synchronizing them between nodes.

2. Once the logs are identical, identical state machines that are fed the same log sequence produce the same final result.

About the state machine

  • The goal of a state machine is to produce the same output from the same input sequence. Suppose every server runs the same state machine S, starting from the same initial state S0. If all of these state machines process the same input sequence {x0, x1, x2, x3, ..., xn}, they go through the same state transitions S0->S1->S2->S3->...->Sn, end in the same state Sn, and produce the same output sequence {out1(S1), out2(S2), out3(S3), ..., outn(Sn)}.

For example, given an input sequence (applied from top to bottom), as long as every server runs the same state machine and each state machine is in the same state before processing the sequence, they all reach the same final state. In the article's example, replaying the instructions through the state machine leaves every server in the same state, with a equal to 4 and b equal to 3.
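
A minimal sketch of instruction replay, assuming a toy key-value state machine. The class name `KVStateMachine`, the `set`/`add` commands, and the log itself are hypothetical (the article's original figure is not reproduced here); the log was chosen only so that it ends with a = 4 and b = 3 as in the text.

```python
from dataclasses import dataclass, field


@dataclass
class KVStateMachine:
    """Deterministic key-value state machine: same log in, same state out."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + value
        else:
            raise ValueError(f"unknown op: {op}")
        return dict(self.state)  # snapshot of the state after this command


# Hypothetical replicated log, applied from first entry to last.
log = [("set", "a", 1), ("set", "b", 3), ("add", "a", 3)]

# Three replicas start from the same empty state and replay the same log.
replicas = [KVStateMachine() for _ in range(3)]
outputs = [[sm.apply(cmd) for cmd in log] for sm in replicas]
final_states = [sm.state for sm in replicas]
```

Every replica ends in the same state and emits the same sequence of intermediate outputs, which is the property consensus on the log is meant to protect.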

3. Basic Paxos algorithm

3.1 Decision model

  • Client:
    • Initiates read and write requests.
  • Proposer:
    • Turns a client's request into a proposal and initiates an internal vote on it.
    • It acts as the access and coordination point: after receiving a client's request, it starts a two-phase submission and drives the consensus negotiation.
  • Acceptor:
    • Votes on each proposal and stores locally the proposals it has voted for.
    • It takes part in the vote and stores data: it votes on the proposed value, accepts the agreed value, and stores it.
  • Learner:
    • Stores the result of a proposal once the vote has passed (that is, the value on which consensus was reached).
    • It only stores data: it takes no part in the consensus negotiation and simply receives and saves the agreed values.

Rough consensus negotiation process:

  • In the first stage, a client sends a request (for example, the write request set a=3) to a Proposer. The proposer then issues a prepare request such as [1, null] to the Acceptors; in this article's model it is sent to a first acceptor and then copied to the other acceptors following the state machine replication model. When more than half of the acceptors respond that they permit the proposal, the first-stage prepare request has succeeded. The proposer then starts the second stage and sends the proposal id together with its content (such as [1, 3]) to the acceptors. Again, once more than half of them respond, the proposal has passed, and the passed proposal is persisted by the Learners.

3.2 How does Basic Paxos reach consensus

  • A Proposer completes one round of consensus negotiation by running a two-phase submission.

The first stage is called the prepare stage:

  • 1) Prepare request phase:
    • The proposer sends a prepare request carrying a proposal number N, where N is greater than any proposal number this proposer has used before (proposal ids must be globally unique and increasing).
  • 2) Promise response phase:
    • The acceptor decides whether to respond to the proposer's prepare request. If N is not greater than the highest proposal number the acceptor has already promised, it rejects the request; otherwise it promises not to accept lower-numbered proposals and responds.
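
The article does not say how a globally unique, increasing proposal number is produced. One common scheme, sketched here as an assumption rather than as the article's method, interleaves a per-node round counter with a fixed node index. The class name `ProposalIdGenerator` and the `highest_seen` parameter are hypothetical.

```python
class ProposalIdGenerator:
    """Generates ids of the form round * num_nodes + node_index.

    Ids from different nodes never collide (they differ modulo num_nodes),
    and each node's ids strictly increase.
    """

    def __init__(self, node_index, num_nodes):
        if not 0 <= node_index < num_nodes:
            raise ValueError("node_index must be in [0, num_nodes)")
        self.node_index = node_index
        self.num_nodes = num_nodes
        self.round = 0

    def next_id(self, highest_seen=0):
        """Return an id greater than this node's last id and than highest_seen."""
        self.round = max(self.round, highest_seen // self.num_nodes) + 1
        return self.round * self.num_nodes + self.node_index


p = ProposalIdGenerator(node_index=0, num_nodes=3)
q = ProposalIdGenerator(node_index=1, num_nodes=3)
ids = [p.next_id(), q.next_id(), p.next_id()]
# After being rejected in favour of proposal 6, q jumps past it.
retry_id = q.next_id(highest_seen=6)
```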

The second stage is called the accept stage:

  • 1) Accept request phase:
    • After receiving promises from more than half of the acceptors, the proposer enters the second stage and sends an accept request, which carries both the proposal number and the proposal's value.
  • 2) Accepted response phase:
    • The acceptor makes the final decision on whether to accept the proposal. If the proposal number is less than a proposal number the acceptor has already promised, it refuses; otherwise it accepts the proposal.

[Negotiation rules]

  • An acceptor does not accept a proposal whose id is less than the proposal id it has currently promised.
  • Proposal ids must be globally unique and increasing, so that proposals are totally ordered.
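
The acceptor side of these rules can be sketched in a few lines. This is an in-memory illustration only: the method names `on_prepare` and `on_accept`, the tuple-shaped replies, and the convention that 0 means "nothing promised yet" are assumptions, and a real acceptor would persist its state before replying.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Acceptor:
    promised_id: int = 0                        # highest id promised (0 = none)
    accepted: Optional[Tuple[int, Any]] = None  # last accepted (id, value)

    def on_prepare(self, n):
        """Stage one: promise only if n is greater than any id promised before."""
        if n <= self.promised_id:
            return ("reject", self.promised_id)
        self.promised_id = n
        # The promise carries the previously accepted proposal, if any.
        return ("promise", self.accepted)

    def on_accept(self, n, value):
        """Stage two: accept unless a higher id has been promised meanwhile."""
        if n < self.promised_id:
            return ("reject", self.promised_id)
        self.promised_id = n
        self.accepted = (n, value)
        return ("accepted", n)


a = Acceptor()
replies = [
    a.on_prepare(1),    # nothing promised yet, so promise
    a.on_prepare(6),    # 6 > 1, so promise again
    a.on_accept(1, 3),  # 1 < 6, so reject
    a.on_accept(6, 7),  # 6 >= 6, so accept
    a.on_prepare(7),    # promise, reporting the accepted proposal
]
```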

3.2.1 Case 1: A single proposal reaches consensus

Stage one

  • In the Prepare request phase, proposer P sends a prepare request with proposal id 1 to the three acceptors A, B, and C (no proposal content is needed yet, so it is null).
  • As shown in Figure 1, the three acceptors receive proposer P's first-stage request at times 2, 3, and 4 respectively.
  • As shown in Figure 2, acceptors A, B, and C have not yet promised or passed any proposal locally (their state is null), so each replies that it can take the proposal with this id, records the id locally, and promises not to accept any proposal with a smaller id. Each then sends a Promise response. If the acceptor has not passed any proposal before, the response content is empty, meaning there is no proposal yet; otherwise the response contains the passed proposal [id, value]. Once proposer P has received Promise responses from more than half of the acceptors, the prepare stage has succeeded. Here all 3 acceptors responded, which is more than half, so P can move on to the accept stage.

Stage two

  • Having received Promise responses from a majority of acceptors, the proposer enters the Accept request phase and sends an accept request, which this time carries the proposal content. The content is chosen as follows: the proposer takes the value of the highest-numbered proposal found in the Promise responses of stage one. As shown in Figure 2, acceptors A, B, and C had not passed any proposal before, so they all returned "[No proposal]", i.e. an empty response. The proposer therefore uses its own value, 3, as the content of this proposal.
  • In the Accepted response phase, proposal number 1 is not less than the proposal id promised or passed locally by acceptors A, B, and C, so the proposal [1, 3] passes and the value 3 is accepted. The three nodes have reached consensus on this value.
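
Case 1 can be replayed with a small synchronous sketch. Network delivery is replaced by direct method calls, and the names `Acceptor`, `prepare`, `accept`, and `propose` are hypothetical. The second call, with id 2 and value 9, is an extra step not in the article; it illustrates the value-selection rule once a proposal has already been accepted.

```python
class Acceptor:
    def __init__(self):
        self.promised_id = 0   # 0 means nothing promised yet
        self.accepted = None   # last accepted (id, value), if any

    def prepare(self, n):
        if n <= self.promised_id:
            return None                      # reject
        self.promised_id = n
        return ("promise", self.accepted)

    def accept(self, n, value):
        if n < self.promised_id:
            return False
        self.promised_id = n
        self.accepted = (n, value)
        return True


def propose(acceptors, n, my_value):
    """Run both stages; return the value that passed, or None on failure."""
    majority = len(acceptors) // 2 + 1

    # Stage one: collect promises.
    promises = [r for r in (a.prepare(n) for a in acceptors) if r is not None]
    if len(promises) < majority:
        return None

    # Use the value of the highest-numbered accepted proposal reported in
    # the promises; fall back to our own value if all responses were empty.
    prior = [acc for _, acc in promises if acc is not None]
    value = max(prior, key=lambda p: p[0])[1] if prior else my_value

    # Stage two: ask the acceptors to accept [n, value].
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]   # A, B, C
first = propose(acceptors, 1, 3)             # the proposal [1, 3] from Case 1
second = propose(acceptors, 2, 9)            # a later proposer wanting 9
```

In this sketch the later proposer ends up re-proposing 3 rather than 9, because the promises it receives report that [1, 3] was already accepted.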

3.2.2 Case 2: Concurrent proposals reach consensus

In the Prepare request phase of stage one, proposer P issues the proposal [1, null]. Acceptors A and B receive it at times 1 and 2; since neither has passed a proposal locally, both promise proposal 1 and guarantee not to respond to any proposal with an id smaller than 1. At time 3, proposer Q issues its own proposal [6, null], which acceptors A, B, and C receive at times 3, 4, and 5 respectively. Proposal number 6 is greater than the 1 that A and B have promised, so A and B raise their promised id to 6. Acceptor C has neither passed nor promised any proposal, so it also promises proposal 6. At time 6, acceptor C finally receives P's request [1, null]; because C has already promised not to respond to proposals smaller than 6, it rejects proposal 1.

In the Promise response phase of stage one, acceptors A, B, and C have not passed any proposal locally, so their responses to proposers P and Q say that there is no proposal yet. (If a proposal had been passed, the passed proposal [id, value] would be the response content.)

In the Accept request phase of stage two, proposer P has received two responses, from acceptors A and B. That is more than half, so P can start the second-stage request, using the value of the highest-numbered proposal in the responses as the value of its proposal. Since the responses from A and B are both empty (no proposal yet), P uses its own value 3 and sends the accept request [1, 3] to acceptors A, B, and C. Proposer Q has received three responses and also enters stage two, applying the same rule. Since the responses from A, B, and C are all empty, Q uses its own value 7 and sends the accept request [6, 7] to acceptors A, B, and C.

In the Accepted response phase of stage two, acceptors A, B, and C first receive proposer P's accept request [1, 3]. Its proposal id 1 is less than their promised id 6, so A, B, and C reject [1, 3]. They then receive the accept request [6, 7]; proposal number 6 is not less than the promised id 6, so [6, 7] passes. The value 7 is accepted, and the three acceptors have reached consensus on the value 7.
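
The timeline of Case 2 can be replayed as a list of message deliveries. The event list mirrors the times given in the text; the `Acceptor` class and variable names are hypothetical. Because every promise in this scenario is empty, `prepare` here returns only a boolean instead of the previously accepted proposal.

```python
class Acceptor:
    def __init__(self):
        self.promised_id = 0   # 0 means nothing promised yet
        self.accepted = None

    def prepare(self, n):
        if n <= self.promised_id:
            return False
        self.promised_id = n
        return True

    def accept(self, n, value):
        if n < self.promised_id:
            return False
        self.promised_id = n
        self.accepted = (n, value)
        return True


acceptors = {name: Acceptor() for name in "ABC"}

# (time, acceptor, proposal id) for each prepare request that is delivered.
prepare_events = [
    (1, "A", 1), (2, "B", 1),               # P's [1, null]
    (3, "A", 6), (4, "B", 6), (5, "C", 6),  # Q's [6, null]
    (6, "C", 1),                            # P's [1, null] reaches C late
]
promises = {1: [], 6: []}
for _time, name, n in prepare_events:
    if acceptors[name].prepare(n):
        promises[n].append(name)

# Both proposers hold a majority of promises, so both enter stage two.
p_acks = [acceptors[name].accept(1, 3) for name in "ABC"]  # P sends [1, 3]
q_acks = [acceptors[name].accept(6, 7) for name in "ABC"]  # Q sends [6, 7]
chosen = {a.accepted for a in acceptors.values()}
```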

3.2.3 Case 3: Livelock problem arising from concurrent proposals

  • In practice, each node can be both a proposer and an acceptor, because every node can serve read and write requests. The node that receives a write request acts as the proposer and initiates the proposal, while the other nodes vote as acceptors. Several proposers may therefore issue proposals at the same time, which can lead to livelock.
  • Livelock means that every proposal is rejected, and that this goes on indefinitely. Recall that an acceptor rejects any proposal numbered lower than the one it has currently promised. Livelock arises when, under concurrent proposals, a proposal completes only its first-stage request and, before its second-stage request arrives, is superseded by a higher-numbered proposal from another proposer. Its second-stage request is then rejected. If this pattern keeps repeating, the result is livelock.
  • For example, suppose an acceptor experiences the sequence: stage one of proposal [1] ⇒ stage one of proposal [2] ⇒ stage two of proposal [1] ⇒ stage one of proposal [3] ⇒ stage two of proposal [2] ⇒ stage one of proposal [4] ⇒ stage two of proposal [3] ... If acceptor A goes through all of these steps, it rejects proposals 1, 2, and 3. As the cycle continues, acceptor A is never able to accept a proposal.
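
The interleaving above can be reproduced against a single acceptor. The generator `dueling_trace` and the placeholder values are hypothetical; the sketch only shows that, with this ordering of messages, every stage-two request arrives after a higher-numbered stage-one request and is rejected.

```python
class Acceptor:
    def __init__(self):
        self.promised_id = 0   # 0 means nothing promised yet
        self.accepted = None

    def prepare(self, n):
        if n <= self.promised_id:
            return False
        self.promised_id = n
        return True

    def accept(self, n, value):
        if n < self.promised_id:
            return False
        self.promised_id = n
        self.accepted = (n, value)
        return True


def dueling_trace(rounds):
    """Yield events in the order from the text: stage one of 1, stage one
    of 2, stage two of 1, stage one of 3, stage two of 2, and so on."""
    yield ("prepare", 1)
    for n in range(1, rounds + 1):
        yield ("prepare", n + 1)
        yield ("accept", n)


acceptor = Acceptor()
rejected = []
for kind, n in dueling_trace(3):
    if kind == "prepare":
        acceptor.prepare(n)
    elif not acceptor.accept(n, f"value-{n}"):
        rejected.append(n)
```

However many rounds are simulated, the acceptor's `accepted` field stays empty, which is the livelock described above.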

4. The Multi-Paxos idea

  • Basic Paxos reaches consensus on only a single value at a time, and it is prone to livelock under concurrent proposals.
  • In this article's model, the proposer does not send its request to every acceptor individually; it sends it to one acceptor, which then replicates it to the others. Basic Paxos therefore has only replication, and that replication is needed in both stages, which results in too many RPC calls.
  • The Multi-Paxos idea splits the work into two Basic Paxos parts: the first is called election and the second is called replication. A leader is elected by reaching consensus through one Basic Paxos round. Making the leader the only proposer resolves proposal conflicts: all acceptors defer to the leader's proposals, so the leader can ask the acceptors to accept a request directly. It skips the first (prepare) stage and goes straight to the second (accept) stage, which also saves the state replication of the first stage.

Election:

  • The nodes issue internal proposals to vote on who becomes leader. The election is itself a Basic Paxos negotiation (the process is the same as described above), and a new leader is elected after a two-stage submission.

Replication:

  • Once a leader is elected, proposers hand all of their proposals to the leader, and the leader alone decides whether a proposal passes. After a proposal passes, the leader replicates it to the other acceptors, which accept it directly.
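
A rough sketch of why a stable leader saves message rounds. It makes several simplifying assumptions that go beyond the article: the election is modelled as a single prepare round with one ballot number covering all log slots, recovery of values accepted under earlier leaders is ignored, and the names `SlotAcceptor`, `Leader`, `elect`, and `replicate` are hypothetical.

```python
class SlotAcceptor:
    def __init__(self):
        self.promised_id = 0   # 0 means nothing promised yet
        self.log = {}          # slot -> (ballot, value)

    def prepare(self, n):
        if n <= self.promised_id:
            return False
        self.promised_id = n
        return True

    def accept(self, n, slot, value):
        if n < self.promised_id:
            return False
        self.promised_id = n
        self.log[slot] = (n, value)
        return True


class Leader:
    def __init__(self, ballot, acceptors):
        self.ballot = ballot
        self.acceptors = acceptors
        self.next_slot = 0
        self.rounds = 0        # message rounds used so far

    def elect(self):
        """One prepare round that covers every future slot."""
        self.rounds += 1
        granted = sum(a.prepare(self.ballot) for a in self.acceptors)
        return granted > len(self.acceptors) // 2

    def replicate(self, value):
        """Stage two only: no prepare round per value."""
        self.rounds += 1
        slot = self.next_slot
        acks = sum(a.accept(self.ballot, slot, value) for a in self.acceptors)
        if acks > len(self.acceptors) // 2:
            self.next_slot += 1
            return slot
        return None


acceptors = [SlotAcceptor() for _ in range(3)]
leader = Leader(ballot=1, acceptors=acceptors)
elected = leader.elect()
slots = [leader.replicate(v) for v in ["a=3", "b=7", "a=4"]]
```

Three values cost four rounds here (one election plus one accept round each), where running Basic Paxos separately for each value would cost six.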


Origin blog.csdn.net/weixin_41347419/article/details/114647423