Big Data Foundation (4) - Paxos Protocol

1. Overall introduction

  The previous article introduced the commonly used consensus protocols. Because Paxos is one of the more complicated of them, it is covered separately here to make it easier to understand.

  The Paxos algorithm holds a very important position in the distributed systems field, but it has two obvious shortcomings: 1. it is difficult to understand; 2. it is difficult to implement in engineering practice.

2. Background of the Paxos Protocol

  The Paxos algorithm is a highly fault-tolerant consensus algorithm based on message passing. It is widely recognized as one of the most effective algorithms for solving distributed consistency problems.

  There are many articles on the Internet explaining the Paxos algorithm, and their quality is uneven. The best material for learning Paxos is the paper "Paxos Made Simple", followed by the Wikipedia article on Paxos. Interested readers can consult both.

  Mike Burrows, the author of Google Chubby, is often quoted as saying that all consensus protocols are essentially either Paxos or a variant of it.

  In a typical distributed system, there will always be failures such as machine downtime or network anomalies (including message delay, loss, duplication, reordering, and network partition). The problem Paxos solves is how to reach agreement on the value of some piece of data within the cluster, quickly and correctly, in a distributed system where these failures may occur, while ensuring that none of them breaks the consistency of the whole system.

  Note: the "value" here is not just a number in the narrow sense; it can be a log entry or a command. Its meaning depends on the application scenario.

3. Replicated state machine model

  In a distributed environment, a consensus protocol is generally described in terms of the abstraction of a replicated state machine.

  The following describes the replicated Log, which is a typical way to implement a replicated state machine.
[Figure: replicated state machine]

  Each server in the cluster keeps a copy of the Log and an internal state machine. Operations sent by clients are recorded in the Log in order, and each server executes the instructions in its Log one by one, applying them to its internal state machine. If the Log copy on every machine is guaranteed to have exactly the same content, the corresponding state machines also end up in the same overall state. The role of the consensus protocol is to keep the Log copies consistent.
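  The idea can be illustrated with a minimal sketch. The key-value state machine and the `set`/`del` operations below are assumptions made for illustration only; the point is that replicas applying the same Log in the same order reach the same state.

```python
class KVStateMachine:
    """Hypothetical key-value state machine driven by a replicated Log."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # index of the next Log entry to apply

    def apply_log(self, log):
        # Execute the Log entries one by one, in order.
        while self.applied < len(log):
            op, key, value = log[self.applied]
            if op == "set":
                self.state[key] = value
            elif op == "del":
                self.state.pop(key, None)
            self.applied += 1
        return self.state


# The same Log, copied to three servers.
log = [("set", "x", 1), ("set", "y", 2), ("set", "x", 3), ("del", "y", None)]
replicas = [KVStateMachine() for _ in range(3)]
states = [replica.apply_log(list(log)) for replica in replicas]
```

  All three entries of `states` are equal here only because every replica received an identical Log; producing that identical Log is exactly the job left to the consensus protocol.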

  The consensus protocol in a replicated state machine is implemented to achieve the following goals:

  1. Safety guarantee: the state machine never returns a wrong result, and only one of multiple proposals is chosen;
  2. Availability guarantee: as long as a majority of servers are working, the whole service remains available. A configuration of 2n+1 replicated state machines can tolerate up to n failures;
  3. Under normal circumstances, the client can be told that an operation succeeded as soon as a majority of state machines have responded, so that a few slow state machines do not slow down the whole request.
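  The majority arithmetic behind the availability guarantee can be checked with a short sketch. The function names are illustrative assumptions; the brute-force overlap check is only practical for small clusters.

```python
from itertools import combinations


def majority(n_servers):
    # Smallest number of servers that is more than half of the cluster.
    return n_servers // 2 + 1


def tolerated_failures(n_servers):
    # Servers that may fail while a majority is still reachable.
    return n_servers - majority(n_servers)


def majorities_always_overlap(n_servers):
    # Any two majorities share at least one server, which is why two
    # different values can never both be approved by a majority.
    quorums = [set(q) for q in combinations(range(n_servers), majority(n_servers))]
    return all(a & b for a in quorums for b in quorums)
```

  For example, `tolerated_failures(5)` is 2, matching the 2n+1 rule with n = 2, while a cluster of 4 still tolerates only 1 failure.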

4. Basic concepts

Paxos comes in two variants:

  • Single-Decree Paxos: decides a single value.
  • Multi-Paxos: decides a sequence of values, and guarantees that their order is exactly the same on every node. Multi-Paxos can often be viewed as running multiple Single-Decree Paxos instances concurrently.

Paxos has 3 roles:

  • Proposer: puts forward a proposal (a value, an operation command, etc.) to be voted on. There can be multiple Proposers. The value can be any operation, such as "set a variable to some value", and different Proposers can propose different values: one Proposer may propose "set the variable X to 1" while another proposes "set the variable X to 2". Within one round of Paxos, however, at most one value is chosen;
  • Acceptor: votes on the Proposers' proposals and selects a single one from among them. There are N Acceptors, and a value proposed by a Proposer is chosen only after more than half of them (at least ⌊N/2⌋+1) have accepted it. Acceptors are complete peers and independent of one another;
  • Learner: has no vote and does not initiate proposals; it learns from the Acceptors which proposal was finally chosen. As mentioned above, a value is chosen once more than half of the acceptors accept it, so the Learner's job is to synchronize the chosen value to the other Acceptors that have not yet learned it;

Concurrent process: corresponds to the consensus module of each server in the replicated state machine.

Non-Byzantine model with asynchronous communication

  1. Concurrent processes may run at arbitrary speed, may fail by stopping, and may restart and continue running after a failure;
  2. Concurrent processes communicate by sending messages asynchronously. A message may take arbitrarily long to be delivered, may be lost in transit, may be delivered more than once, and multiple messages may arrive in any order. Note, however, that messages are never tampered with;

5. Paxos protocol process

  The proposer sends a proposal (value) to all acceptors. After more than half of the acceptors have approved it, the proposer writes the proposal to the acceptors. In the end all acceptors hold the same, decided value, and it can no longer be modified.

  The protocol is divided into two phases, and each phase has two steps, A and B:

5.1 Prepare phase (claiming the slot)

Phase 1A : the Proposer selects a proposal number n and broadcasts a Prepare(n) request to all Acceptors;

Phase 1B : when an Acceptor receives the Prepare(n) request, if n is larger than the number of every Prepare request it has received before, it promises not to accept any proposal numbered less than n, and replies with the highest-numbered proposal (numbered less than n) that it has already accepted, if any; otherwise it ignores the request;
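  The Acceptor side of the prepare phase can be sketched as follows. This is a minimal illustration under assumed names (`Acceptor`, `promised_n`, `accepted`, `on_prepare`), with proposal numbers assumed to be positive integers; it is not taken from any particular implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    promised_n: int = 0                          # highest Prepare number promised so far
    accepted: Optional[Tuple[int, str]] = None   # (number, value) of the last accepted proposal

    def on_prepare(self, n: int):
        """Handle Prepare(n); return a promise, or None to ignore the request."""
        if n > self.promised_n:
            # Promise not to accept proposals numbered below n, and report
            # the proposal accepted so far (None if there is none).
            self.promised_n = n
            return ("promise", n, self.accepted)
        return None
```

  A fresh Acceptor answers `on_prepare(5)` with `("promise", 5, None)` and then ignores a later `on_prepare(3)`.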

5.2 Accept phase (commit phase)

Phase 2A : this is the most critical point of the whole protocol. The Proposer collects the Acceptors' responses:

  • If it does not receive promises from more than half of the acceptors, the proposal fails immediately;

  If it receives promises from more than half of the Acceptors, there are two cases:

  • If none of the responding Acceptors has accepted a value (all return null), the Proposer sends its own value with proposal number n to all Acceptors;
  • If some Acceptors have already accepted values, the Proposer picks, among all returned values, the one with the largest proposal number and uses it as the value of its proposal; the proposal number is still n. In this case the Proposer cannot propose its own value and must adopt the value returned by the Acceptors, which preserves the principle that a value cannot be changed once it has been decided;
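  The Proposer's decision in this step can be sketched as a single function. The name `choose_value` and the representation of each promise as either `None` or an `(n, value)` pair are assumptions made for illustration.

```python
def choose_value(promises, own_value, n_acceptors):
    """Pick the value to send in the accept request.

    promises holds one entry per Acceptor that promised: None if that
    Acceptor has accepted nothing yet, else its accepted (n, value) pair.
    Returns None when the proposal fails for lack of a majority.
    """
    if len(promises) < n_acceptors // 2 + 1:
        return None  # not more than half responded: the proposal fails
    accepted = [p for p in promises if p is not None]
    if not accepted:
        return own_value  # nobody has accepted anything: free choice
    # Otherwise adopt the value of the highest-numbered accepted proposal.
    return max(accepted, key=lambda p: p[0])[1]
```

  With three Acceptors, promises of `[(1, "a"), None, (3, "b")]` force the Proposer to send `"b"` rather than its own value.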

Phase 2B : when an Acceptor receives an Accept request with proposal number n, it accepts the request unless it has, in the meantime, responded to a Prepare request with a number higher than n.
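  A minimal sketch of this Acceptor rule, under assumed names (`AcceptorState`, `on_accept`): an Accept request numbered n is taken unless a higher-numbered Prepare has already been promised.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AcceptorState:
    promised_n: int = 0                          # highest Prepare number promised
    accepted: Optional[Tuple[int, str]] = None   # last accepted (number, value)


def on_accept(state: AcceptorState, n: int, value: str) -> bool:
    """Handle an Accept request; return True if the proposal is accepted."""
    if n < state.promised_n:
        # A Prepare with a higher number was answered in the meantime.
        return False
    # Assumption of this sketch: accepting also raises the promise to n.
    state.promised_n = n
    state.accepted = (n, value)
    return True
```

  An Acceptor that has promised number 5 rejects `on_accept(state, 3, "a")` but takes `on_accept(state, 5, "b")`.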

5.3 The essence of Paxos protocol

  Once the following two points are understood, the essence of the Paxos protocol is basically understood:

  1. Understand how the acceptor handles a request in the first phase: if a value has already been written locally, the acceptor returns that value, so the proposer has to adopt it; if nothing has been written locally, it records the version number of the request and no longer accepts requests carrying smaller version numbers. In short, the acceptor only trusts the request with the largest version number it has seen, and writes carrying smaller version numbers are invalidated;
  2. Understand how the proposer handles the responses in the second phase: if it does not receive responses from more than half of the acceptors, the proposal fails; if it does and all returned values are empty, it submits the value it wants to write; otherwise it submits the non-null value with the largest version number. The only difference is whether the submitted value is its own or one that was submitted previously.
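  Both points can be seen together in a small in-memory simulation. This is a sketch under simplifying assumptions: no message loss, synchronous calls instead of a network, and proposal numbers chosen by hand. The class and function names are illustrative.

```python
class Acceptor:
    def __init__(self):
        self.promised_n = 0    # highest version number seen so far
        self.accepted = None   # (n, value) written locally, if any

    def prepare(self, n):
        if n > self.promised_n:
            self.promised_n = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, own_value):
    """Run both phases once; return the decided value, or None on failure."""
    quorum = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < quorum:
        return None
    prior = [p for p in promises if p is not None]
    # Use our own value only if no Acceptor reported a written value.
    value = max(prior, key=lambda p: p[0])[1] if prior else own_value
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "X=1")    # decides "X=1"
second = propose(acceptors, 2, "X=2")   # must adopt the already written "X=1"
stale = propose(acceptors, 1, "X=3")    # version number too old: fails
```

  The second proposer carries a higher version number yet still ends up submitting "X=1", which is the behavior point 2 describes.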

6. Protocol Example

  The simplest example: with 1 proposer, 3 Acceptors, and no learner, the proposer writes the value v1 for the name variable to the 3 acceptors, as shown in the figure below:

[Figure: Paxos protocol example]

The execution process is as follows:

  • Phase 1A : the proposer sends prepare(name, n1) to the 3 Acceptors, where n1 is an increasing proposal version number. In effect it says: I want to write the variable name now, and my version number is n1;
  • Phase 1B : each Acceptor receives the proposer's message and checks its locally saved state. It finds that the name variable has never been written and no proposal has been received for it (null, null), so it returns this to the proposer and records internally that a proposer has now made a proposal for name with version number n1;
  • Phase 2A : the proposer receives 3 responses from the Acceptors, each saying: the name variable has not been written yet, you can write it. The proposer confirms that more than half of the acceptors agree and starts the second-phase write: accept(v1, n1), telling the acceptors: I am going to set the name variable to v1, and my version number is the n1 that was just approved;
  • Phase 2B : each acceptor receives accept(v1, n1), checks that the version number matches the one it recorded, saves the value, and responds accepted(v1, n1);
  • Result : the proposer receives 3 accepted responses, which is more than half, so the name variable is decided to be v1;

Origin blog.csdn.net/initiallht/article/details/123991713