[Distributed] Graphical Raft Protocol

1. Distributed Consensus

Suppose the system has only one node (drawn in blue in the illustrations; think of it as a database that stores data). If the client (green) sends the update x=8, the database immediately sets x to 8. Because there is only one database node, agreement on the value of x is trivial: x=8.

But what if there are multiple database nodes? Assume that x is 7 on all three database nodes. The client sends a request to set x=8; two of the three nodes update successfully to x=8, but the third fails to update and still stores x=7. The three nodes now hold different values for the same variable x. This is the problem of distributed consensus.
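The divergence described above can be reproduced with a small sketch. The node names A, B, C and the helper functions are hypothetical and only illustrate the problem; they are not part of any Raft implementation.

```python
# Hypothetical three-node cluster; each dict stands in for one database node.
nodes = {"A": {"x": 7}, "B": {"x": 7}, "C": {"x": 7}}


def naive_write(nodes, key, value, failing=()):
    """Write to every node independently; nodes named in `failing` miss the update."""
    for name, store in nodes.items():
        if name not in failing:
            store[key] = value


def distinct_values(nodes, key):
    """Return the set of values the nodes currently hold for `key`."""
    return {store[key] for store in nodes.values()}


naive_write(nodes, "x", 8, failing={"C"})
print(distinct_values(nodes, "x"))  # two different values: the nodes disagree
```

Without a coordination protocol, nothing in this naive scheme detects or repairs the stale value on the third node.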

2. A Distributed Consensus Solution: the Raft Protocol

Raft is a protocol for solving distributed consensus. The Raft protocol defines three roles:

  • Follower
  • Candidate
  • Leader

First, we briefly introduce the two most important parts of the Raft protocol: Leader election and log replication.

1. Leader election process (Leader Election)

Every node starts as a follower. When a follower stops hearing from the leader, it becomes a candidate.

The candidate then sends a vote request to all other nodes, asking them to vote for it so that it can be elected leader. Each node returns a vote response after receiving the request. If the candidate obtains the votes of a majority of nodes, it becomes the new leader.

In the illustrated process, the node in the lower-left corner is the candidate: it initiates a vote and becomes the leader after receiving responses from a majority of the nodes.
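The "majority of nodes" rule can be written down in a few lines. This is a minimal sketch; the function names are illustrative assumptions.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of votes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def wins_election(votes_received: int, cluster_size: int) -> bool:
    """A candidate wins once its votes (including its own) reach a majority."""
    return votes_received >= majority(cluster_size)


print(majority(3), majority(4), majority(5))  # 2 3 3
```

Because any two majorities of the same cluster overlap in at least one node, at most one candidate can win in a given term.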

2. Log Replication

A client's request is first handled by the leader node. Each data modification is appended to the leader's log in an uncommitted state (for example, the value of x on the leader is still the previous value at this point).

The leader then sends the modification to the follower nodes. Each follower writes it to its own log and responds to the leader.

Once the leader has received write acknowledgements from a majority of nodes, it commits the entry, changing the value of x to the new value, 5. It then notifies the followers to commit the entry as well.

After these steps, the nodes in the cluster have reached consensus on the value of x, namely x = 5.
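The difference between an appended entry and a committed one can be sketched as follows. The `Leader` class, its method names, and the starting value x=0 are assumptions made for illustration; the sketch ignores terms and log ordering.

```python
class Leader:
    def __init__(self, cluster_size, state):
        self.cluster_size = cluster_size
        self.state = dict(state)  # committed values only
        self.log = []             # one dict per log entry

    def propose(self, key, value):
        """Append an uncommitted entry; the visible state does not change yet."""
        self.log.append({"key": key, "value": value, "acks": 1, "committed": False})
        return len(self.log) - 1  # index of the new entry

    def on_follower_ack(self, index):
        """Count one follower acknowledgement; commit once a majority has the entry."""
        entry = self.log[index]
        entry["acks"] += 1
        if not entry["committed"] and entry["acks"] > self.cluster_size // 2:
            entry["committed"] = True
            self.state[entry["key"]] = entry["value"]
        return entry["committed"]


leader = Leader(cluster_size=3, state={"x": 0})  # x=0 is an assumed starting value
index = leader.propose("x", 5)
print(leader.state["x"])       # 0: the entry is logged but not committed
leader.on_follower_ack(index)  # leader plus one follower is a majority of 3
print(leader.state["x"])       # 5
```

The leader counts itself as the first acknowledgement, so in a three-node cluster a single follower response is enough to commit.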

3. Detailed Leader Election Process

There are two timeout controls in the Raft algorithm:

  • Election timeout
  • Heartbeat timeout

1. Election timeout

When the servers start, every node is a follower, and each node sets its own election timeout (usually a random value between 150 and 300 ms).

If a node receives a heartbeat message from the leader, or grants its vote to a candidate, within its timeout, it restarts the timer. Otherwise it starts an election and sends vote requests to the other servers (the node whose timeout expires first is the first to initiate a vote).

A node that receives a vote request and has not yet voted in that term votes for the candidate and updates its own term number to match.
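A follower's election timer might look like the sketch below. The class and function names are hypothetical, time is passed in explicitly as milliseconds so the example stays deterministic, and only the 150 to 300 ms range comes from the text.

```python
import random

ELECTION_TIMEOUT_MS = (150, 300)  # range quoted in the text


def new_election_timeout(rng=random):
    """Pick a fresh timeout so that nodes rarely expire at the same moment."""
    return rng.uniform(*ELECTION_TIMEOUT_MS)


class ElectionTimer:
    def __init__(self, now_ms, rng=random):
        self.rng = rng
        self.reset(now_ms)

    def reset(self, now_ms):
        """Call when a leader heartbeat arrives or a vote is granted."""
        self.deadline_ms = now_ms + new_election_timeout(self.rng)

    def expired(self, now_ms):
        """True once the node should become a candidate."""
        return now_ms >= self.deadline_ms


timer = ElectionTimer(now_ms=0)
print(timer.expired(100))  # False: the deadline is at least 150 ms away
timer.reset(now_ms=100)    # a heartbeat arrived at t=100
print(timer.expired(240))  # False: the new deadline is at least 250 ms
```

Choosing a new random value on every reset is a design choice in this sketch; it keeps nodes from repeatedly timing out in lockstep.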

For example:

Assume that the election timeouts of Node A, Node B, and Node C are 200 ms, 210 ms, and 150 ms, respectively.

If no heartbeat message is received, Node C's election timeout expires first. Node C votes for itself (vote count: 1) and sets its term to 1 (every node starts with term 0, and each new election increments the term by 1).

Node C then sends a [choose me as leader] request to the other follower nodes, Node A and Node B. If a node has not voted for anyone else during this term (term = 1), it votes for Node C and resets its election timeout; otherwise, it refuses to vote. Once Node C has received a majority of the votes, it becomes the leader.
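The example can be replayed with a small simulation. The `Node` class and `run_election` helper are hypothetical, and the sketch leaves out the log comparison that a full Raft implementation also performs before granting a vote.

```python
class Node:
    def __init__(self, name, timeout_ms):
        self.name = name
        self.timeout_ms = timeout_ms
        self.term = 0          # every node starts in term 0
        self.voted_for = None  # at most one vote per term

    def start_election(self):
        self.term += 1
        self.voted_for = self.name  # a candidate votes for itself
        return self.term

    def handle_vote_request(self, candidate, term):
        if term > self.term:        # newer term: forget any earlier vote
            self.term = term
            self.voted_for = None
        if term == self.term and self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False


def run_election(nodes):
    candidate = min(nodes, key=lambda n: n.timeout_ms)  # first node to time out
    term = candidate.start_election()
    votes = 1 + sum(
        peer.handle_vote_request(candidate.name, term)
        for peer in nodes
        if peer is not candidate
    )
    winner = candidate.name if votes > len(nodes) // 2 else None
    return winner, votes


nodes = [Node("A", 200), Node("B", 210), Node("C", 150)]
leader, votes = run_election(nodes)
print(leader, votes)  # C 3
```

A second vote request in the same term, for example from Node B, would be refused because every node has already voted for Node C.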

2. Heartbeat timeout

Once Node C becomes the leader, it sends [Append Entries] messages to the follower nodes Node A and Node B at intervals given by the heartbeat timeout. Node A and Node B receive Node C's requests, append any entries to their own logs, and reply; each such message also restarts their election timers.
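How heartbeats keep a follower from starting an election can be sketched in logical time. The function and all numeric values below are assumptions chosen for illustration (a 150 ms election timeout and a 50 ms heartbeat interval).

```python
def first_timeout(election_timeout_ms, heartbeat_interval_ms,
                  leader_alive_until_ms, horizon_ms):
    """Return the time at which a follower would start an election, or None."""
    deadline = election_timeout_ms  # the follower's timer starts at t=0
    t = heartbeat_interval_ms
    while t <= horizon_ms:
        if t >= deadline:
            return deadline         # no heartbeat arrived in time
        if t <= leader_alive_until_ms:
            deadline = t + election_timeout_ms  # heartbeat received: restart timer
        t += heartbeat_interval_ms
    return deadline if deadline <= horizon_ms else None


print(first_timeout(150, 50, 1000, 1000))  # None: a healthy leader prevents elections
print(first_timeout(150, 50, 400, 1000))   # 550: last heartbeat at 400, plus 150
```

The heartbeat interval has to be well below the election timeout, otherwise followers would time out even while the leader is healthy.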

3. Re-election

Let us look at the process of re-election if the Leader node goes down.

If a follower does not receive a heartbeat message from the leader before its election timeout expires, the follower that times out first becomes a candidate and starts a new election.

Suppose the leader, Node B, goes down, and Node A and Node C stop receiving its messages. Node C times out first and becomes a candidate. It increments the term to 2 and sends a [pick me as leader] vote request to the other nodes.

During term 2, Node A has not voted for any other candidate, so it can vote for Node C. Node C obtains a majority of the votes (including its own), so Node C becomes the new leader.

Suppose there are four nodes, and two of them start an election at the same time and receive the same number of votes. What happens then?

In this scenario, Node B and Node D time out and become candidates at the same time, and each sends out vote requests. During term 4, Node B receives the vote of Node A and Node D receives the vote of Node C, so Node B and Node D end up with the same number of votes (two each, counting their own). Neither has a majority of the four nodes, so no leader is elected and the nodes wait for a new timeout.

When the election timeouts run out again, Node C is the first to time out and starts a new vote in the next term; it obtains a majority of the votes and becomes the new leader.
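The split vote and the retry can be checked with a simple tally. The `tally` helper and the ballot dictionaries are hypothetical; they map each voter to the candidate it voted for.

```python
def tally(cluster, ballots):
    """Return the candidate holding a majority of `cluster`, or None."""
    needed = len(cluster) // 2 + 1
    counts = {}
    for voter, candidate in ballots.items():
        counts[candidate] = counts.get(candidate, 0) + 1
    winners = [c for c, n in counts.items() if n >= needed]
    return winners[0] if winners else None


cluster = ["A", "B", "C", "D"]
split_round = {"B": "B", "D": "D", "A": "B", "C": "D"}  # two votes each
retry_round = {"C": "C", "A": "C", "B": "C", "D": "C"}  # Node C times out first

print(tally(cluster, split_round))  # None
print(tally(cluster, retry_round))  # C
```

With four nodes a majority is three votes, which is why two votes each leaves the term without a leader.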

4. Detailed Log Replication Process

In the Raft algorithm, when the leader receives a request from a client, it records the request as a log entry, replicates the entry to the followers, and applies it to the state machine once it is committed. Data consistency is achieved by synchronizing the log to the follower nodes, so replicated data exists in the form of log entries. The log in Raft must be continuous, with no gaps.

Let's take a look at the detailed process of log replication, that is, the process of sending [Append Entries] messages.

  • First, the leader, Node A, receives a modification request from the client (green node): change x to 5. The leader does not apply or commit the change yet; instead it creates a new log entry, appends it to its local log, and then sends the entry to the follower nodes.

  • The follower nodes (Node B, Node C) copy the entry into their local logs after receiving the request and return a response to the leader;

  • When Node A has received responses from a majority of nodes, it commits its local change (x=5); the followers commit afterwards, once they learn that the entry is committed;

  • After the leader applies the log entry to its state machine, it does not send a separate message to notify the followers. Instead, the next heartbeat or Append Entries RPC tells the followers to apply the log entry to their local state machines.

In this way, all three nodes agree on the value x=5.
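The last step, where followers learn about a commit from the next message, can be sketched like this. The `Follower` class and the `leader_commit` parameter name are assumptions for illustration, and the sketch omits the consistency check on the preceding log entry that real implementations perform.

```python
class Follower:
    def __init__(self):
        self.log = []          # list of (key, value) entries
        self.commit_index = 0  # number of entries known to be committed
        self.state = {}        # local state machine

    def append_entries(self, entries, leader_commit):
        """Handle one Append Entries message; empty `entries` is a heartbeat."""
        self.log.extend(entries)
        new_commit = min(leader_commit, len(self.log))
        # Apply every entry that has become committed since the last message.
        for key, value in self.log[self.commit_index:new_commit]:
            self.state[key] = value
        self.commit_index = max(self.commit_index, new_commit)
        return True  # acknowledgement for the leader


follower = Follower()
follower.append_entries([("x", 5)], leader_commit=0)  # stored, not yet applied
print(follower.state)  # {}
follower.append_entries([], leader_commit=1)          # heartbeat carries the commit
print(follower.state)  # {'x': 5}
```

Piggybacking the commit information on messages the leader sends anyway avoids a separate notification round.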

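Sending another instruction, ADD 2, goes through the same process: the leader appends the entry to its log, replicates it to the followers, and commits it once a majority has acknowledged it.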

5. How Raft Handles Network Partitions

If a network partition occurs, can Raft still handle it?

Assume that initially Node B is the leader of a five-node cluster (Node A through Node E) and continuously sends heartbeat messages to each follower node.

Suppose a network partition occurs: Node A and Node B are in one network, and Node C, Node D, and Node E are in another. When Node C, Node D, and Node E stop receiving heartbeat messages from Node B, they start an election. Suppose Node D receives 3 votes and becomes the leader.

At this point the two partitions have separate leaders: Node B in one and Node D in the other.

Now add client nodes. When client 1 tries to change x to 3, it sends the request to partition 1. Node B receives the request and sends it to all follower nodes. Because it can only receive a response from Node A, it cannot reach a majority of the five nodes, so the commit fails and x=3 is not applied.

When client 2 tries to change x to 8, it sends the request to partition 2. Node D receives the request and sends it to all follower nodes. It receives responses from Node C and Node E, which together with Node D form a majority, so the modification succeeds.

When the network is restored, Node B discovers that a higher term exists (which means there is a new leader) and steps down, stopping its own replication. Node B and Node A roll back their uncommitted log entries and synchronize their logs with the new leader, Node D.
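The partition scenario comes down to two checks: whether a leader can still reach a majority, and what happens to uncommitted entries afterwards. The helpers below are a simplified sketch with hypothetical names; in a real implementation the rollback happens as part of the Append Entries exchange rather than in one step.

```python
CLUSTER_SIZE = 5  # Node A through Node E in the example


def try_commit(acks_from_followers):
    """A leader commits only with a majority of the full cluster, itself included."""
    return 1 + acks_from_followers >= CLUSTER_SIZE // 2 + 1


def reconcile(stale_log, stale_commit_index, leader_log):
    """Drop uncommitted entries, then copy the rest of the new leader's log.

    Assumes both logs share the same committed prefix.
    """
    kept = stale_log[:stale_commit_index]
    return kept + leader_log[len(kept):]


print(try_commit(1))  # False: Node B only hears from Node A
print(try_commit(2))  # True: Node D hears from Node C and Node E

node_b_log = [("x", 3)]  # appended during the partition, never committed
node_d_log = [("x", 8)]  # committed by the majority partition
print(reconcile(node_b_log, 0, node_d_log))  # [('x', 8)]
```

Because the minority side can never commit, discarding its uncommitted entries loses nothing that a client was told had succeeded.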

Origin blog.csdn.net/noaman_wgs/article/details/108260615