Big Data Foundation (5) - Raft Protocol

1. Overall introduction

  Like Paxos, Raft is a consensus protocol that provides comparable consistency guarantees, but it was designed with two main goals: understandability and ease of implementation. Raft defines and describes each technical detail precisely, which makes it easier to build a correct system from it.

2. The main implementation ideas of the Raft protocol

  To achieve these two goals, Raft relies on two main ideas:

  1. Split the consensus problem into three clear sub-problems: leader election, log replication, and safety.
  2. Replace the P2P (peer-to-peer) mode of Paxos with a Master-Slave mode built around a single leader.

3. Introduction to basic concepts

  The following basic concepts are needed to understand the rest of this article.

3.1 Server state machine

  • At any moment, each server in the cluster is in exactly one of three states: Leader, Follower, or Candidate;
  • Under normal conditions, exactly one server is in the Leader state, and it handles all client requests;
  • All other servers are in the Follower state: they passively respond to RPC messages and never initiate requests themselves;
  • Candidate is the state a Follower moves into when it starts a new leader election;

  The state transition diagram is as follows:

Raft state transition diagram
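
  The transitions between the three states can be sketched as a small lookup table. This is only an illustration: the event names are made up for this example, and the transitions follow the description in the Raft paper, not any particular implementation.

```python
from enum import Enum


class State(Enum):
    FOLLOWER = "Follower"
    CANDIDATE = "Candidate"
    LEADER = "Leader"


# Allowed transitions, keyed by (current state, event).
# The event names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    (State.FOLLOWER, "election_timeout"): State.CANDIDATE,
    (State.CANDIDATE, "election_timeout"): State.CANDIDATE,  # start a new election
    (State.CANDIDATE, "won_majority"): State.LEADER,
    (State.CANDIDATE, "discovered_leader_or_higher_term"): State.FOLLOWER,
    (State.LEADER, "discovered_higher_term"): State.FOLLOWER,
}


def next_state(state, event):
    """Return the state after `event`; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

  For example, next_state(State.FOLLOWER, "election_timeout") yields State.CANDIDATE, while a Follower that sees "won_majority" simply stays a Follower, because that event only makes sense for a Candidate.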

3.2 Term

  Raft divides time into a sequence of segments of arbitrary length. Each segment is called a Term, and Terms are numbered with consecutive, increasing integers.

  • Each Term starts with an election, in which one or more servers in the Candidate state compete to become the new leader;
  • If a server wins the election, it acts as leader for the rest of that Term;
  • Some Terms may end without a leader because the vote was split;
  • Raft guarantees that at most one server is elected leader within a given Term;

  The Term sequence diagram is as follows:

Term sequence diagram
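
  Because Term numbers only increase, a server can use them to tell stale messages from current ones. The sketch below shows the comparison as described in the Raft paper; the function name observe_term and the action labels are hypothetical and exist only for this example.

```python
def observe_term(current_term, message_term):
    """Compare a server's Term with the Term carried by an incoming RPC.

    Returns (new_current_term, action), where action is one of:
      "step_down": the message is from a newer Term; adopt it and become Follower
      "reject":    the message is from an older Term; refuse it
      "accept":    same Term; process the message normally
    """
    if message_term > current_term:
        return message_term, "step_down"
    if message_term < current_term:
        return current_term, "reject"
    return current_term, "accept"
```

  A server in Term 3 that receives a message from Term 5 adopts Term 5 and steps down, while a message from Term 2 is refused as stale.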

3.3 Split brain

  If more than one leader exists in the system at the same time, the situation is called split brain. This is a very serious problem, because the leaders can overwrite each other's data and cause data loss.

  Raft prevents split brain within a Term through the following two rules:

  1. A node can cast at most one vote in a given Term;
  2. Only a node that receives votes from a majority of the cluster becomes the leader;
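
  The two rules can be sketched as follows. This is a minimal illustration, not a full RequestVote handler: the class and function names are hypothetical, and the additional Log comparison that Raft performs before granting a vote is left out here.

```python
class Voter:
    """Tracks which candidate this server voted for in the current Term."""

    def __init__(self):
        self.current_term = 0
        self.voted_for = None

    def handle_request_vote(self, candidate_id, term):
        """Return True if the vote is granted."""
        if term < self.current_term:
            return False  # request from an old Term
        if term > self.current_term:
            # A new Term begins, so the vote becomes available again.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id  # rule 1: at most one vote per Term
            return True
        return False


def has_majority(votes, cluster_size):
    """Rule 2: a candidate needs strictly more than half of all servers."""
    return votes > cluster_size // 2
```

  Since two different candidates cannot both collect more than half of the votes when every server votes at most once, at most one leader can be elected per Term.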

3.4 Master-Slave mode

  As mentioned earlier, Raft replaces the P2P mode of Paxos with a Master-Slave mode. Much of the complexity of Paxos comes from the P2P mode, in which multiple concurrent processes have equal status and no primary-secondary relationship. Raft changes this by electing a leader, which simplifies maintaining consistency. If the leader fails, a new leader is elected to take over.

4. Three independent sub-problems of Raft

4.1 Leader Election

  Raft's election is driven by a heartbeat mechanism. When the system starts, all servers are in the Follower state. A server remains a Follower as long as it keeps receiving valid RPCs from a server in the Leader or Candidate state.

  The Leader periodically sends heartbeat messages to all other nodes to maintain its authority. If a Follower receives no heartbeat for a period of time (the election timeout), it assumes that the leader no longer exists and triggers a new leader election.
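
  The heartbeat and election timeout can be sketched with a simple tick counter. The class ElectionTimer and the use of logical ticks instead of wall-clock time are assumptions made to keep the example deterministic.

```python
class ElectionTimer:
    """Counts ticks since the last heartbeat; hypothetical helper for illustration."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.elapsed = 0

    def on_heartbeat(self):
        """A heartbeat from the leader resets the countdown."""
        self.elapsed = 0

    def tick(self):
        """Advance time by one tick; return True when an election should start."""
        self.elapsed += 1
        return self.elapsed >= self.timeout_ticks
```

  As long as on_heartbeat() is called more often than every timeout_ticks ticks, tick() never returns True and the Follower never starts an election.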

  To start an election, the Follower does the following:

  1. Increments its Term number, transitions to the Candidate state, and votes for itself;
  2. Sends a RequestVote RPC message to all other servers in the cluster;

  It then remains in the Candidate state until one of the following happens:

  1. It wins the election;
  2. Another server S establishes itself as the new leader;
  3. A period of time passes with no new leader, and the next Term begins;

  Each of the three cases is explained in detail below:

  • Case 1: If the Candidate receives votes from a majority of the servers in the cluster for the same Term, it wins the election and becomes the new leader. It then sends heartbeat RPCs to the other servers to announce and maintain its leadership. Note that each server can vote for only one candidate in a given Term.

  • Case 2: While waiting, the Candidate may receive an RPC message from another server claiming to be the new leader. If the Term number in the RPC is greater than or equal to the Candidate's own Term number, the Candidate accepts the new leader as legitimate and returns to the Follower state. Otherwise, the Candidate rejects the RPC and stays in the Candidate state.

  • Case 3: If the vote is split and no candidate wins, every candidate waits for its timeout to expire and then starts a new election. This prolongs the time the system is unavailable, so Raft uses randomized election timeouts to make split votes unlikely. In addition, clusters that run a leader-based consensus algorithm usually have an odd number of nodes, which makes it easier for a majority, and therefore a leader, to emerge.
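
  The three cases can be sketched as a single decision function. The function names, the event encoding, and the 150 to 300 ms timeout range are assumptions for this example (the range is the one used as an illustration in the Raft paper); real implementations choose their own values.

```python
import random


def random_election_timeout(rng, low_ms=150, high_ms=300):
    """Pick a timeout at random so that candidates rarely time out together."""
    return rng.uniform(low_ms, high_ms)


def candidate_step(current_term, event, cluster_size):
    """Return (new_state, new_term) for a Candidate.

    `event` is one of (hypothetical encoding):
      ("votes", n)                  n votes collected so far, own vote included
      ("append_entries", term)      RPC from a server claiming to be leader
      ("timeout",)                  election timeout expired without a winner
    """
    kind = event[0]
    if kind == "votes":  # Case 1
        if event[1] > cluster_size // 2:
            return "Leader", current_term
        return "Candidate", current_term
    if kind == "append_entries":  # Case 2
        leader_term = event[1]
        if leader_term >= current_term:
            return "Follower", leader_term
        return "Candidate", current_term
    if kind == "timeout":  # Case 3: start a new election in the next Term
        return "Candidate", current_term + 1
    raise ValueError(f"unknown event: {event!r}")
```

  In a five-node cluster, a Candidate in Term 4 with three votes becomes Leader, one that hears from a leader in Term 4 or later becomes a Follower, and one that times out retries as a Candidate in Term 5.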

4.2 Log Replication

  All client requests are handled by the leader.

  1. When the leader receives an operation command from a client, it appends the command to the end of its Log as a new item;
  2. The leader sends an AppendEntries RPC request to all other servers in the cluster so that they replicate the new item;
  3. Once the new item has been safely replicated on a majority of servers, the leader applies the operation command to its state machine and returns the result to the client;
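
  The three steps can be sketched with in-memory objects standing in for servers. Everything here is a simplification for illustration: the classes SimpleFollower and SimpleLeader are hypothetical, RPCs are plain method calls, the state machine is a dictionary, and an unreachable follower is never retried (a real leader keeps retrying until the follower catches up).

```python
class SimpleFollower:
    def __init__(self, reachable=True):
        self.log = []
        self.reachable = reachable

    def append_entries(self, entries):
        """Stand-in for the AppendEntries RPC; returns True on success."""
        if not self.reachable:
            return False
        self.log.extend(entries)
        return True


class SimpleLeader:
    def __init__(self, term, followers):
        self.term = term
        self.followers = followers
        self.log = []
        self.state_machine = {}  # a key-value store as the state machine

    def handle_client_command(self, key, value):
        entry = (self.term, key, value)
        self.log.append(entry)  # step 1: append to the leader's own Log
        acks = 1  # the leader counts as one replica
        for follower in self.followers:  # step 2: replicate to the others
            if follower.append_entries([entry]):
                acks += 1
        cluster_size = len(self.followers) + 1
        if acks > cluster_size // 2:  # step 3: majority reached, apply and reply
            self.state_machine[key] = value
            return "ok"
        return "not committed"
```

  With three servers, the command is still applied when one follower is down, because the leader and the other follower form a majority. With both followers down, the item stays in the leader's Log but is not applied.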

  The log structure diagram of the server is as follows:

Log structure diagram of the server

  Each Log item in the figure above contains two fields:

  • Operation command
  • Term number: the Term in which the leader received the operation command

  In addition, each Log item has a global index that gives its position in the Log.
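
  As a data structure, the Log described above might look like the following sketch. The names LogEntry and Log are hypothetical, and the choice to start the index at 1 follows the convention of the Raft paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    term: int      # Term in which the leader received the command
    command: str   # the operation command itself


class Log:
    def __init__(self):
        self.entries = []

    def append(self, term, command):
        """Append a new item and return its global index (1-based)."""
        self.entries.append(LogEntry(term, command))
        return len(self.entries)

    def get(self, index):
        """Return the item stored at the given 1-based index."""
        return self.entries[index - 1]

    def last_index(self):
        return len(self.entries)
```

  The index is not stored in the item itself; it is simply the item's position in the list.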

  The leader decides when a Log item can be safely applied to the state machine; such items are called committed items.

  Raft guarantees that committed items are durable and that every server eventually executes them in the same order.

  Raft maintains the following two properties, which underpin the safety of the consensus protocol:

  1. In the Logs of different servers, if two Log items have the same global index and the same Term number, they store the same operation command;
  2. In the Logs of different servers, if two Log items have the same global index and the same Term number, then all preceding Log items in the two Logs are identical;
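
  In the Raft paper, the second property is enforced by a consistency check inside AppendEntries: the leader sends the index and Term of the item that precedes the new ones, and the follower refuses the request if its own Log does not match at that position. The sketch below illustrates that check under simplifying assumptions: a Log is a plain list of (term, command) tuples, indices are 1-based, and the function name and parameters are chosen for this example.

```python
def append_entries(log, prev_index, prev_term, entries):
    """Follower-side check; mutates `log` and returns True if the items were accepted.

    prev_index / prev_term describe the item immediately before `entries`
    in the leader's Log (prev_index == 0 means "start of the Log").
    """
    if prev_index > 0:
        if prev_index > len(log) or log[prev_index - 1][0] != prev_term:
            return False  # Logs diverge at or before prev_index
    for offset, entry in enumerate(entries):
        index = prev_index + 1 + offset
        if index <= len(log):
            if log[index - 1][0] != entry[0]:
                # Conflict: drop this item and everything after it.
                del log[index - 1:]
                log.append(entry)
            # Same index and Term: the item is already present, keep it.
        else:
            log.append(entry)
    return True
```

  When the check fails, the leader retries with an earlier prev_index until it finds the point where the two Logs agree, and from there the follower's Log is brought in line with the leader's.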

4.3 Safety constraints

  Although the two properties above keep Raft working correctly under normal circumstances, they are not enough to guarantee complete safety, that is, that every state machine executes the same operation commands in the same order.

  Consider an example. Suppose a follower is unavailable while the leader commits several operation commands. After the follower recovers, it is elected as the new leader. The new leader would then overwrite the old leader's Log items with its own, and safety could no longer be guaranteed: the old leader has already applied the newly committed operation commands to its state machine, but the new leader never stored them, so the other servers would not apply them. The state machines would end up in inconsistent states.

  To achieve real safety, Raft adds the following two constraints:

  1. Election restriction: only a server whose Log contains all committed operation commands can be elected as the new leader;
  2. Commit restriction: a new leader treats operation commands as committed only once it has committed an operation command from its own current Term. Items left over from earlier Terms are committed indirectly at that point, not on their own;
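
  Both constraints can be sketched as small functions. The rules follow the Raft paper: a voter grants its vote only if the candidate's Log is at least as up-to-date as its own, and a leader advances its commit index only to an item from its current Term that is stored on a majority. The function and parameter names are hypothetical.

```python
def candidate_log_is_up_to_date(cand_last_term, cand_last_index,
                                my_last_term, my_last_index):
    """Constraint 1: compare the last items of the two Logs.

    The Log whose last item has the higher Term is more up-to-date;
    with equal Terms, the longer Log is more up-to-date.
    """
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index


def advance_commit_index(log_terms, match_index, current_term, commit_index):
    """Constraint 2: return the new commit index for the leader.

    log_terms[i - 1] is the Term of the item at index i in the leader's Log.
    match_index lists, for every server including the leader, the highest
    index known to be replicated on that server.
    """
    cluster_size = len(match_index)
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        if replicated > cluster_size // 2 and log_terms[n - 1] == current_term:
            return n
    return commit_index
```

  In a five-node cluster, a leader in Term 4 whose item at index 2 dates from Term 2 does not commit it even when three servers store it. Once an item from Term 4 at index 3 reaches three servers, the commit index moves to 3, which commits index 2 along with it.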

  Of course, these guarantees hold only under the failure model that Raft assumes: servers may crash and messages may be lost or delayed, but servers do not behave maliciously. Outside that model safety is not guaranteed, and the protocol still has room for further improvement.




Origin blog.csdn.net/initiallht/article/details/124279998