Design and Implementation of the Consensus Algorithm Raft Based on Raft + Blockchain (Graduation Thesis + Program Source Code)


Hello everyone, today I will introduce the design and implementation of Raft, a consensus algorithm, in the context of the blockchain. The thesis and source-code download addresses for this graduation project are attached at the end of the article. Friends who need the proposal-report PPT template, thesis-defense PPT template, and so on can find the self-service download instructions in the bottom-left column of my blog homepage.

Article directory:

1. Project introduction

  1. Blockchain, the underlying supporting technology of the currently popular Bitcoin, integrates distributed data storage, P2P transmission, consensus algorithms, cryptography and other computer technologies. The most important of these is the consensus algorithm. Open network environments such as public chains often use PoW, while relatively closed network environments such as private chains, where all nodes are trustworthy, often use Raft or Paxos. This project focuses on blockchain consensus algorithms, comparing the similarities, differences and usage scenarios of the common ones, and provides a concrete implementation of Raft to show how a blockchain reaches consensus in a distributed environment. It implements the two core modules of Raft: leader election and log replication.


2. Resource details

Project difficulty: medium difficulty
Applicable scenario: graduation project on related topics
Word count of supporting paper: 12,940 words, 30 pages
Contains: full set of source code + completed thesis
PPT templates such as proposal report, thesis defense and project report: see the recommended download method on my blog homepage.


3. Keywords

Raft, blockchain, consensus algorithm, leader election, log replication

4. Introduction to the graduation project

Tip: The following is a brief introduction to the graduation thesis. The complete source code of the project and the download address of the complete graduation thesis can be found at the end of the article.

1 Introduction
1.1 Introduction to Blockchain
Blockchain technology originated from the 2008 paper "Bitcoin: A Peer-to-Peer Electronic Cash System" by Satoshi Nakamoto. There is currently no strict, industry-recognized definition of blockchain. In a narrow sense, it is a chained data structure that combines data blocks sequentially in chronological order. It is somewhat similar to a linked list in computer science, except that each node (here called a block) has a more complex structure, and the nodes are connected differently from a linked list. In a broad sense, blockchain is a new distributed infrastructure and computing paradigm that uses the block-chain data structure to verify and store data, distributed node consensus algorithms to generate and update data, cryptography to secure data transmission and access, and smart contracts composed of automated script code to program and operate data.

Put simply, a blockchain is a decentralized distributed database. Why a database? Because its main function is to store information: anything that needs to be saved can be written to the blockchain or read from it. Why decentralized? Because anyone can set up a server, join the blockchain network, and become a node in it. Unlike many current networks that have an administrator, the blockchain is designed from the start to have no administrator and to be controlled by no one. There is no central node in this network; every node is equal and stores the entire database. Data can be read from or written to any node, and eventually all nodes synchronize to maintain the final consistency of the blockchain network.
[Figure: schematic diagram of the blockchain. Each block is distributed around the world and connected into a chain by the network.]

The blockchain is composed of blocks, which are connected into a chain. So what does each block look like? How are they connected into a chain?
In the blockchain network, data is permanently recorded in the form of files, which we call blocks. A block is a record of some or all of the latest Bitcoin transactions not yet recorded in any previous block. Its specific structure is shown in the figure below:
[Figure: structure of a block]

Each block includes a constant called the magic number, the size of the block, the block header, the number of transactions contained in the block, and some or all of the recent new transactions. Within each block, the block header plays a decisive role in the whole blockchain. Its structure is shown in the figure below:
[Figure: structure of the block header]

In summary, each block mainly contains two parts: the block header shown above, which records the meta-information of the current block, and the block body, which records the actual data. Each block header contains the hash of the previous block, so every new block points to its predecessor through this hash, connecting the blocks into a chain.

As mentioned above, the block header contains, among other things, the hash of the current block body and the hash of the previous block. This means that if the content of the current block body changes, or the hash of the previous block changes, the hash of the current block necessarily changes. This has great significance for the blockchain: if someone modifies a block, that block's hash changes. For subsequent blocks to still connect to it (since each next block contains the hash of the previous one), the attacker must modify all subsequent blocks in sequence, otherwise the modified block drops out of the chain. Since the computations involved are time-consuming, it is almost impossible to modify multiple blocks in a short time unless someone controls more than 51% of the computing power of the entire network. The blockchain thus guarantees its own reliability: once data is written, it cannot be tampered with.
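To make the chaining concrete, here is a minimal, hypothetical sketch in Go (illustrative only, not part of the thesis code; the Block fields are simplified): each block stores the hash of its predecessor, so changing any earlier block breaks every later link.

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Block is a simplified, illustrative block: a real Bitcoin header also
// carries a version, timestamp, Merkle root, difficulty and nonce.
type Block struct {
	PrevHash string // hash of the previous block
	Data     string // stand-in for the block body (transactions)
}

// Hash computes the block's own hash over its fields.
func (b Block) Hash() string {
	sum := sha256.Sum256([]byte(b.PrevHash + b.Data))
	return hex.EncodeToString(sum[:])
}

func main() {
	genesis := Block{PrevHash: "", Data: "genesis"}
	second := Block{PrevHash: genesis.Hash(), Data: "tx: A->B 10"}
	fmt.Println(second.Hash())

	// Tampering with the genesis block changes its hash, so the stored
	// PrevHash in `second` no longer matches and the chain is broken.
	genesis.Data = "tampered"
	fmt.Println(second.PrevHash == genesis.Hash()) // false
}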

It is precisely these technical characteristics that allow blockchain to support the digital cryptocurrency system represented by Bitcoin. The core advantage of blockchain is decentralization: by using data encryption, consensus algorithms, distributed data storage and economic incentives, it can realize peer-to-peer transactions based on decentralized credit in a distributed system. This offers a solution to the high cost, low efficiency and insecure data storage common to centralized institutions. With the rapid development and popularity of Bitcoin in recent years, research and application of blockchain technology have grown explosively. It is considered a prototype of the next generation of cloud computing, expected to reshape human social activity as thoroughly as the Internet did, and to drive the transformation from today's Internet of information to an Internet of value.

1.2 Paper content arrangement
For blockchain, its infrastructure model is shown in the figure below:

[Figure: blockchain infrastructure model]

Generally speaking, a blockchain system consists of a data layer, a network layer, a consensus layer, an incentive layer, a contract layer and an application layer. The data layer encapsulates the underlying data blocks and related technologies such as data encryption; the network layer includes distributed networking mechanisms and data dissemination and verification mechanisms; the consensus layer encapsulates the various node consensus algorithms; the incentive layer integrates economic factors into the blockchain technology system, mainly the issuance and distribution mechanisms of economic incentives; the contract layer encapsulates various scripts, algorithms and smart contracts and is the basis of the blockchain's programmability; and the application layer encapsulates the various application scenarios and cases of the blockchain.
This thesis is an implementation of the blockchain-based consensus algorithm Raft. It focuses on the consensus layer, the most important layer of the blockchain; the contract layer, application layer and so on are not involved. The paper is arranged as follows:
Chapter 1 introduces the overall situation of the blockchain, mainly involving the related technologies and development of the blockchain.
Chapter 2 will focus on the consensus layer of the blockchain, which mainly includes common consensus algorithms such as public chains and private chains, and will compare and analyze the similarities, differences, and applicable scenarios of various consensus algorithms.
Chapter 3 will introduce Raft in the consensus algorithm in detail, and give the specific design and implementation of Raft. Finally, Raft will be tested as a whole through MIT’s testing framework and its performance will be measured.
Chapter 4 summarizes the project, mainly lists some problems encountered in learning and implementing Raft and how to solve them, and summarizes what areas need improvement.

2 Consensus Algorithm
A blockchain is first of all a distributed system, evolved from the traditional single-node structure (the CS model) into a multi-machine distributed system. Its most important problem is how to ensure data consistency across multiple machines so that the system reaches consensus, which is usually guaranteed by certain protocols (consensus algorithms). Generally, consensus algorithms can be divided into traditional distributed consensus algorithms and consensus algorithms native to the blockchain. In some specific scenarios, such as private chains, traditional distributed consensus algorithms are more efficient and work better; in a sense, they can also be regarded as a subset of consensus algorithms. This chapter discusses the similarities and differences between traditional distributed consensus algorithms, represented by Raft and PBFT, and blockchain-native algorithms such as PoW and PoS, and the blockchain scenarios each is suited for.
2.1 Similarities and differences between traditional distributed consensus algorithms and blockchain-native consensus algorithms
Similarities:
- Append-only: once data is written, it cannot be modified; only new log entries or new blocks can be appended.
- The minority obeys the majority: once a majority of nodes reach agreement, the system is judged to have reached consensus. For fork resolution, generally a long chain overrides a short chain, and the logs of the majority of nodes override those of the minority.

Differences:
- Most traditional distributed consensus algorithms do not consider Byzantine fault tolerance (Byzantine Paxos excepted); that is, they assume nodes only suffer non-malicious faults such as crashes and network failures, and do not consider malicious nodes tampering with data. The blockchain's native consensus algorithms almost all consider Byzantine fault tolerance.
- Traditional distributed consensus algorithms are log (database) oriented, which is the more general case, while the blockchain consensus model is transaction-oriented. To a certain extent, a traditional distributed consensus algorithm can therefore serve as the lower layer of a blockchain consensus model.

2.2 Applicable Scenarios
Blockchains fall into three categories, described in detail in the book "Blockchain: Defining the New Pattern of Future Finance and Economics": public chains, private chains, and industry chains (alliance chains). Among them:
Public chain:
An open ecological network: any individual or group in the world can send transactions, have them effectively confirmed by the blockchain, and participate in the consensus process. The public chain is the earliest and most widely used kind of blockchain; the major virtual digital currencies of the Bitcoin family are all based on public chains.

Private chain:
A closed ecological network that uses the blockchain's ledger technology only for bookkeeping. A company or an individual can have exclusive write permissions on the chain; such a chain is not much different from other distributed storage solutions.

Industry chain (alliance chain):
A semi-closed ecological network in which several pre-selected nodes within a group are designated as bookkeepers, and each block is generated by the joint decision of these pre-selected nodes (the pre-selected nodes participate in the consensus process). Other access nodes can take part in transactions but not in the accounting process (it is essentially still managed accounting, just distributed; how the pre-selected bookkeepers of each block are determined becomes the main risk point of such a blockchain). Anyone else can run limited queries through the blockchain's open API.

Since the private chain is a closed ecological network, a traditional distributed consensus algorithm should be optimal there. For the semi-closed, semi-open alliance/industry chain, Delegated Proof of Stake is a good fit; one can also consider adding Byzantine fault tolerance or other security mechanisms on top of a traditional consensus algorithm. For public chains, PoW and PoS are better choices; Bitcoin uses the PoW consensus algorithm.

2.3 Various consensus algorithms
As mentioned earlier, since a private chain is a closed ecological network in which every node is trustworthy, algorithms like PoW and PoS are not very efficient there, and traditional distributed consensus algorithms such as Raft, PBFT and Paxos often work better. Of these, the following mainly introduces Raft and PBFT. As for Paxos, although it is also a traditional distributed consensus algorithm, it is difficult to understand and hard to apply in engineering, and the Raft algorithm itself is a simplified improvement of Paxos with many similarities, so Paxos is not introduced here; interested readers can consult the relevant material themselves.
The environment in which a cryptocurrency system like Bitcoin operates is usually a public chain: a globally open ecological network facing extremely complex problems, for which traditional distributed consensus algorithms are often insufficient. Hence consensus algorithms such as PoW, PoS and DPoS are used; the most common ones are briefly introduced below.
2.3.1 Raft
Raft turns consistency of state into consistency of logs via the replicated state machine. Simply put, for N databases, if their initial states are consistent and the subsequent operations are consistent, the final data will certainly remain consistent.
As shown in the figure below, the replicated state machine is implemented by replicating the log. Each server stores a log containing a series of commands (such as x = 3, y = x, etc.), and the state machine executes these commands in order. Because each machine's state machine is deterministic, each executes the same commands on the same state and produces the same results.

[Figure: replicated state machine]
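As a toy illustration of this point (not part of the thesis code), the following Go sketch applies the same command log to two fresh "replicas"; because the commands are deterministic and applied in the same order, both end in the same state.

package main

import "fmt"

// apply executes a fixed command log against a fresh state machine
// (a map of variable names to values) and returns the final state.
func apply(log []func(map[string]int)) map[string]int {
	state := map[string]int{}
	for _, cmd := range log {
		cmd(state) // deterministic commands, applied in log order
	}
	return state
}

func main() {
	log := []func(map[string]int){
		func(s map[string]int) { s["x"] = 3 },
		func(s map[string]int) { s["y"] = s["x"] },
		func(s map[string]int) { s["x"]++ },
	}
	// Two "replicas" applying the same log reach the same state.
	fmt.Println(apply(log)) // map[x:4 y:3]
	fmt.Println(apply(log)) // map[x:4 y:3]
}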

Raft divides servers into three roles, Leader, Follower and Candidate, which can convert into one another. The algorithm has two main processes: leader election and log replication. When the distributed system starts, Raft elects a leader; once elected, the leader is responsible for receiving log entries from clients and replicating them to the other machines to keep the distributed system consistent. More details of Raft, including leader crashes, network partitions and a series of other issues, are covered in Chapter 3 on the implementation of Raft.

2.3.2 PBFT
Raft rests on a premise: Byzantine fault tolerance is not considered. The so-called Byzantine problem asks how a distributed system can maintain consensus when a minority of nodes act maliciously (that is, messages may be forged).
Before introducing the algorithm, briefly consider the Byzantine generals problem, a fictional model proposed by Leslie Lamport to explain the consistency problem. Byzantium was the capital of the Eastern Roman Empire. Because its territory was vast, the generals guarding the borders (the nodes of the system) had to pass messages through messengers and reach certain unanimous decisions. Since there might be traitors among the generals (faulty nodes) who would send different messages to different generals to interfere with consistency, the BFT algorithm was devised to solve this problem; but it was criticized as too complex, and only when PBFT reduced the complexity from exponential to polynomial did it become widely usable. PBFT guarantees safety and liveness as long as the failed nodes do not exceed 1/3 of the total, that is, n >= 3f + 1, where n is the total number of nodes and f is the number of nodes allowed to fail.
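The n >= 3f + 1 bound and the 2f + 1 quorum are easy to tabulate; the following Go snippet is a small illustrative sketch (not from the thesis):

package main

import "fmt"

// maxFaulty returns the largest number of Byzantine nodes f that a PBFT
// cluster of n nodes can tolerate, from n >= 3f+1, i.e. f = (n-1)/3.
func maxFaulty(n int) int { return (n - 1) / 3 }

func main() {
	for _, n := range []int{4, 7, 10} {
		f := maxFaulty(n)
		// A replica commits after collecting 2f+1 matching messages
		// (its own included), which always outvotes the f faulty nodes.
		fmt.Printf("n=%2d  tolerates f=%d  quorum=2f+1=%d\n", n, f, 2*f+1)
	}
}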
For example, suppose there are 4 participants: one is elected army commander and the other 3 are division commanders. The army commander receives an order from the commander-in-chief: advance 500 kilometers. He relays the order to the three division commanders, who execute it and report the results. Division commanders A and B report "I am 500 kilometers east of the capital", while division commander C reports "I am 100 kilometers east of the capital". Summarizing the three reports, the army commander finds that "500 kilometers east of the capital" holds the majority (2 votes > 1 vote), so he ignores division commander C's report and reports back to the commander-in-chief: the troops are now 500 kilometers east of the capital. This, in essence, is the PBFT algorithm.

client: the originator of a request; in the example above, the commander-in-chief.
replica: a replica, i.e. any node participating in providing the service; in the example, the army commander and the division commanders.
primary: the node bearing the main responsibility for providing the service; in the example, the army commander.
backup: the other replicas, defined relative to the primary; in the example, the division commanders.
view: a relatively stable primary-backup configuration is called a view.

1. Request phase: the client sends a request to the primary node, as when the commander-in-chief gives the order to the army commander.
2. Pre-prepare phase: the primary sends pre-prepare messages to all backup nodes. In the example, the army commander tells the division commanders: a new view begins, I am the army commander and everyone listens to me; now I announce the commander-in-chief's order.
3. Prepare phase: after backup node i accepts the pre-prepare message, it enters the prepare phase, sends prepare messages to all replica nodes, and writes both the pre-prepare and prepare messages to its own message log.
4. Commit phase: once a replica node has received 2f prepare messages from different replicas consistent with the pre-prepare message (2f + 1 consistent messages in total, counting its own), the correctness of the message is confirmed, and requests are then executed in order of sequence number n.
5. Reply phase: the result is fed back to the client.

2.3.3 PoW
The Raft and PBFT algorithms above are usually applied to private chains. Facing the far more complex public chains, probability-based algorithms are used instead: consensus built on economic games can in theory be overturned, but the price an attacker must pay grows over time. For example, all miners in the Bitcoin network pay the cost of mining in computing power; once an attack fails, that computing power becomes a sunk cost. PoW is the representative of such consensus algorithms. The PoW consensus algorithm requires guessing, by computation, a value (nonce) such that the hash of the transaction data meets specified conditions. A qualifying hash consists of N leading zeros, where the number of zeros depends on a difficulty value adjusted by the network. Obtaining a valid hash requires a great deal of trial-and-error computation, with the time depending on the machine's hashing speed. When a node presents a valid hash, it proves that the node really did perform a large number of trial computations. This mechanism ensures that only a very small number of nodes in the whole system can write new blocks at a time; anyone wanting to maliciously disrupt the system must pay a substantial economic cost that almost always outweighs the possible benefit, which secures data consistency. If a fork occurs, that is, more than one block is written at the same time, the blockchain automatically continues on the longer chain and discards the shorter one, ensuring consistency via the longest chain.
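As an illustration of the nonce search, here is a simplified Go sketch (not the thesis code: it uses a hex leading-zero prefix rather than Bitcoin's full difficulty-target comparison):

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// mine searches for a nonce such that SHA-256(data || nonce), written in
// hex, starts with `difficulty` zero characters.
func mine(data string, difficulty int) (nonce int, hash string) {
	prefix := strings.Repeat("0", difficulty)
	for {
		sum := sha256.Sum256([]byte(fmt.Sprintf("%s%d", data, nonce)))
		hash = hex.EncodeToString(sum[:])
		if strings.HasPrefix(hash, prefix) {
			return nonce, hash
		}
		nonce++ // brute-force trial and error: this is the "work"
	}
}

func main() {
	nonce, hash := mine("block data", 4)
	fmt.Printf("nonce=%d hash=%s\n", nonce, hash)
	// Verification takes a single hash, which is why PoW is cheap to
	// check but costly to produce.
}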
2.3.4 PoS
PoW solves the consensus problem in the blockchain, but it has one drawback: obtaining a hash that meets the requirements takes a large amount of meaningless brute-force computation, wasting computer resources. The PoS algorithm was introduced to improve on this shortcoming. It is somewhat similar to the shareholder mechanism in real life: the more shares one owns, the easier it is to obtain the bookkeeping right. The typical process is to stake a deposit to bet on a legal block becoming the new block, with the income being interest on the staked capital plus transaction fees. The larger the deposit, the greater the probability of obtaining the accounting right, and legitimate bookkeepers receive the rewards.

2.3.5 DPoS
PoS avoids the waste of resources caused by meaningless computation, but selecting by stake balance gives the richest accounts too much power and may let them dominate the bookkeeping right. DPoS works on the same principle as PoS, the main difference being that nodes elect several delegates who verify transactions and keep the accounts. Its Chinese name is the share-authorization proof mechanism (also called the trustee mechanism). The idea, as implemented in BitShares, is to let everyone who holds shares vote, producing 101 representatives. Any holder can take part in voting and in standing for election; users can vote or withdraw their votes at any time, and the weight of each user's vote is proportional to their holdings. After each round of election, the users with the highest votes (usually 101, though other numbers are possible) become the project's trustees, responsible for packaging blocks, maintaining the operation of the system and receiving the corresponding rewards.
The fundamental purpose of the election is to choose, through everyone's vote, the 101 users in the community most beneficial to the development and operation of the project. These users' server nodes can maintain the system efficiently, and they also contribute their own abilities to push the blockchain project forward. This is somewhat similar to a people's-representative system (but with shorter cycles and higher efficiency). In this way, decentralized electoral consensus is achieved while keeping the whole system efficient and reducing energy waste.

3 Design, implementation and testing of Raft
Chapter 2 briefly summarized the common consensus algorithms in blockchains and analyzed their application scenarios, similarities and differences. This chapter introduces one of them, Raft, in detail, and gives its concrete design, implementation and testing plan.
3.1 Algorithm Introduction
Raft originated from the paper "In Search of an Understandable Consensus Algorithm (Extended Version)" (Ongaro and Ousterhout, 2014), which addresses how to ensure data consistency in a distributed environment so that the system reaches consensus. Before it, the Paxos algorithm dominated distributed-consistency work, and the vast majority of consistency implementations were based on Paxos or influenced by it. However, Paxos's greatest shortcoming is that it is difficult to understand and to apply in engineering practice, and it was in this situation that Raft was created. Raft not only meets the consistency requirements of distributed environments but, more importantly, is simpler and easier to understand than other algorithms and can be widely used in engineering practice, while its safety and efficiency are comparable to theirs.
To be both understandable and practical, the Raft algorithm uses two common techniques. The first is problem decomposition: splitting a complex problem into relatively independent, solvable and understandable sub-problems; for example, Raft is divided into sub-problems such as leader election, log replication and safety. The second is simplifying the state space by reducing the number of states, making the system more coherent and eliminating uncertainty as far as possible; for example, Raft limits the ways in which logs can become inconsistent.
[Figure: server state transition diagram]

A Raft cluster contains n servers and requires that a majority, at least ⌊n/2⌋ + 1 machines, be functioning; for a 5-server cluster, up to 2 machines may fail while the remaining 3 stay up and the system continues to provide normal service. At any time each server is in one of three states: Leader, Candidate or Follower. In normal operation exactly one server is the leader and the rest are followers. Followers are passive: they send no requests and only respond to requests from leaders and candidates. If a follower receives no messages, it becomes a candidate and an election begins, and the candidate that receives votes from a majority of servers becomes the new leader. The leader handles all client requests (if a client contacts a follower, the follower forwards the request to the leader) and remains leader until it crashes, at which point a new election selects the next leader and the process repeats. The server state diagram above illustrates this process.

[Figure: time divided into terms]

As shown in the figure above, the Raft algorithm divides time into terms of arbitrary length, numbered with consecutive integers. Each term begins with a leader election; after a successful election, a single leader manages the entire cluster for the rest of the term (normal operation). In some cases the votes are split and no leader emerges; then that term ends and a new term with a new election begins immediately. The Raft algorithm guarantees that there is at most one leader in any given term.
Servers in Raft communicate through RPC. Basic Raft only requires 2 types of RPC. The RequestVote RPC is triggered by candidates during the election process, and the AppendEntries RPC is triggered by leaders.
3.1.1 Leader election
Raft uses a heartbeat mechanism to trigger leader election. When servers start, they are all initialized to the follower state, and a server remains a follower as long as it receives valid RPCs from a leader or candidate. The leader periodically sends heartbeats (AppendEntries RPCs carrying no log entries) to all followers to prevent new elections. If a follower receives no heartbeat within a period called the election timeout, it assumes there is no leader and starts an election to choose a new one.
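A possible shape of such a randomized election timer in Go is sketched below (a hypothetical helper built on the timer field of the Raft struct shown in section 3.2.2, requiring the "math/rand" and "time" packages; startElection is an assumed method that switches to candidate and sends RequestVote RPCs):

// resetElectionTimer is an illustrative helper, not the thesis code: it
// re-arms the node's election timer with a random timeout in [150ms, 300ms).
// Each valid heartbeat calls it again; if the timer fires first, the
// follower starts an election.
func (rf *Raft) resetElectionTimer() {
	timeout := time.Duration(150+rand.Intn(150)) * time.Millisecond
	if rf.timer == nil {
		rf.timer = time.AfterFunc(timeout, func() {
			rf.startElection() // assumed: become candidate, send RequestVote RPCs
		})
		return
	}
	rf.timer.Stop()
	rf.timer.Reset(timeout)
}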
To start an election, a follower increments its current term and switches to the candidate state. It then votes for itself and sends RequestVote RPCs to the other servers in the cluster. Three outcomes are possible:
1. It wins the election;
2. Another server wins the election;
3. After a period of time, no server has won the election.

For case 1, a candidate wins if it receives votes from a majority of the servers in the cluster for the same term; a server may vote for at most one candidate in a given term. Once a candidate wins the election, it becomes leader and sends heartbeat messages to the other servers to establish its leadership.

For case 2, while waiting for votes a candidate may receive an AppendEntries RPC from another server claiming to be leader. If that leader's term is at least as large as the candidate's current term, the candidate accepts the leader as legitimate and returns to the follower state; if the term in the RPC is smaller than the candidate's current term, the candidate rejects the RPC and remains a candidate.

For case 3, a candidate neither wins nor loses: if many followers become candidates at the same time, the votes are dispersed and possibly no candidate obtains a majority. When this happens, each candidate times out and starts a new election by incrementing its term number and issuing another round of RequestVote RPCs. To keep this situation from recurring indefinitely, Raft uses randomized election timeouts, chosen uniformly from a fixed interval (for example 150~300 ms). This ensures that in most cases only one server times out first; it wins the election and sends heartbeats to the others before they time out. The same mechanism handles split votes: each candidate resets a random election timeout when starting an election and waits for it to expire before the next election, which reduces the likelihood of another split vote in the new election.

3.1.2 Log Replication
Once a leader has been elected, it begins receiving client requests, each containing a command to be executed by the replicated state machines. The leader appends the command to its log and then issues AppendEntries RPCs to the other servers in parallel, asking them to replicate the entry. Once the entry has been safely replicated, the leader applies it to its state machine and returns the result to the client. Each log entry stores a state-machine command together with the term number when the leader received the entry; the term numbers in log entries are used to detect inconsistencies between logs on different servers and to tell whether a log entry is up to date. Each log entry also carries an integer index identifying its position in the log. The details are shown in the figure below:






[Figure: log structure across servers]

The leader decides when it is safe to apply a log entry to the state machine; such entries are said to be committed. Raft guarantees that committed log entries are persistent and will eventually be executed by all available state machines. Once an entry created by the leader has been replicated to a majority of servers, the entry is said to be in the committed state.
Raft maintains two properties of logs. First, if two entries in different logs have the same index and term number, they store the same command. Second, if two entries in different logs have the same index and term number, all preceding log entries are also identical. This is maintained by a consistency check performed by the AppendEntries RPC: when sending an AppendEntries RPC, the leader includes the index and term number of the log entry immediately preceding the new ones. If the follower does not find an entry in its log with the same index and term, it rejects the new entries. Therefore, whenever an AppendEntries RPC returns success, the leader knows that the follower's log is identical to its own up to that point.
In normal operation the logs of the leader and followers stay consistent, but when a leader crashes the logs may diverge. To restore consistency, the leader must find the latest point at which a follower's log agrees with its own, delete the follower's entries after that point, and send its own entries from that point on. These operations happen through the AppendEntries consistency check. The leader maintains a nextIndex for each follower: the index of the next log entry it will send to that follower. When a leader first comes to power, it initializes every nextIndex to the index of its latest log entry + 1. If a follower's log is inconsistent with the leader's, the consistency check fails in the next AppendEntries RPC; after a rejection, the leader decrements nextIndex and retries the AppendEntries RPC. Eventually nextIndex reaches a point where the leader and follower logs match; at that point AppendEntries succeeds, removing any conflicting entries in the follower's log and appending the leader's entries. Once AppendEntries succeeds, the follower's log is consistent with the leader's.
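A sketch of the leader-side backoff just described (illustrative and simplified, not the thesis code verbatim; it assumes a Success field on AppendEntryReply and uses the nextIndex/matchIndex fields defined in section 3.2.2):

// onAppendEntriesReply is an illustrative sketch of how the leader could
// react to one follower's reply. lastSent is the index (1-based, matching
// the Start function below) of the last log entry included in the RPC.
func (rf *Raft) onAppendEntriesReply(server int, lastSent int, reply AppendEntryReply) {
	if reply.Success {
		// The follower's log now matches the leader's up to lastSent.
		rf.matchIndex[server] = lastSent
		rf.nextIndex[server] = lastSent + 1
		return
	}
	// Consistency check failed: back up one entry and retry, so the next
	// AppendEntries RPC carries one more older entry, until the logs agree.
	if rf.nextIndex[server] > 1 {
		rf.nextIndex[server]--
	}
}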

3.2 Algorithm design and implementation
3.2.1 Preparation work
The Raft algorithm can be implemented in a variety of languages, such as C++, Go, Python or Java. Here the Go language is used, for the following reasons. Go has built-in garbage collection and type safety, excellent support for concurrency (goroutines), and a very streamlined RPC library that can be used directly. This helps us implement Raft while focusing on the algorithm logic, without worrying about details such as allocating and freeing memory, network communication and low-level concurrent programming. In addition, the MIT 6.824 course provides a Go-based Raft testing framework whose key interface descriptions guide the implementation, reducing the difficulty of implementing and testing the algorithm and improving testing accuracy.
For these reasons, the final experimental environment is Linux, using the Go language to design, implement and test the Raft algorithm, together with the test framework that accompanies the MIT 6.824 course. The following steps are needed:

  1. Install Go and Git
    sudo apt-get install golang-go
    sudo apt-get install git
  2. Download and configure the Raft test framework
    git clone git://g.csail.mit.edu/6.824-golabs-2018 6.824
    cd 6.824
    export GOPATH="$PWD"
    cd "$GOPATH/src/raft"

After the above steps have been executed successfully, the basic development environment is configured. Some further preparation is needed before the concrete implementation of the Raft algorithm. Raft is a consensus algorithm that keeps multiple machines in a distributed system consistent, so network programming and concurrent programming are essential; Go's very good support for both is an important reason for choosing it. Raft uses RPC (remote procedure call) for network communication, which lets us call remote services as if they were local and hides the details of network communication. The MIT 6.824 test framework provides a concise set of RPC interfaces, so there is no need to implement them ourselves. To use the AppendEntries RPC or the RequestVote RPC in Raft, one only needs to call them like this:
func (rf *Raft) sendRequestVote(server int, args RequestVoteArgs, reply *RequestVoteReply) bool {
	ok := rf.peers[server].Call("Raft.RequestVote", args, reply)
	return ok
}

func (rf *Raft) sendAppendEntries(server int, args AppendEntryArgs, reply *AppendEntryReply) bool {
	ok := rf.peers[server].Call("Raft.AppendEntries", args, reply)
	return ok
}

Here server identifies the specific server the RPC request is sent to, args holds the request parameters, reply receives the response, and Call invokes the RPC interface, with its first string parameter naming the function to call.
Another important piece of background for Raft is concurrent programming, which Go supports well. In a multi-threaded environment the usual way to protect a critical section is locking, and Go offers a resource-management idiom similar to C++ RAII:

mutex.Lock()
defer mutex.Unlock()
// data read and written by multiple threads goes here, e.g. i++

Because the deferred Unlock runs even when the function exits abnormally, this avoids deadlocks that a crash inside the critical section could otherwise cause. In addition, Go provides its own goroutine and channel mechanisms for controlling shared variables between threads:

func Say(str string, done chan int) {
	fmt.Println(str)
	time.Sleep(3 * time.Second)
	done <- 1
}

func main() {
	done := make(chan int)
	go Say("hello", done)
	<-done
}

In Go, starting a thread to perform an operation is as simple as writing go fun(). Each goroutine operates on its own variables, so no synchronization is needed; shared data is passed through channels. Go's design philosophy is exactly this: do not communicate by sharing memory; share memory by communicating. In the example above, the main goroutine runs in main and starts another goroutine with go Say, which prints str, sleeps for 3 s, and then signals completion through done. The main goroutine blocks on <-done until the other goroutine has finished.
3.2.2 Specific design and implementation
The basic flow of the Raft algorithm is: initialize each node (the Raft structure) and start the leader election; once a leader is elected, accept client requests and distribute the log, requiring each node to replicate it until the entry reaches the committed state; if the leader goes down, re-elect and repeat the process. The flow chart is as follows:

The first is the initialization of Raft. According to the paper, a Raft node needs to save at least the following states:

Therefore, Raft’s data structure is defined as follows:

type Raft struct {
	mu        sync.Mutex
	peers     []*labrpc.ClientEnd // all the machines in the cluster
	persister *Persister          // used for persistence
	me        int                 // this server's index into peers

	// Persistent state on all servers
	currentTerm int
	votedFor    int
	logs        []LogEntry

	// Volatile state on all servers
	commitIndex int
	lastApplied int

	// Volatile state on leaders
	nextIndex  []int
	matchIndex []int

	voteNum int // number of votes received in the current election
	state   string
	applyCh chan ApplyMsg
	timer   *time.Timer // timer used for election timeouts
}

At startup, the Make function is called to initialize the Raft structure. After initialization, all nodes are followers. After a period of time (a random value, usually 150 ms to 300 ms), some node starts an election: it first switches to the candidate state and then sends RequestVote RPCs to the other nodes in the cluster in parallel. The key code is as follows:

	rf.state = CANDIDATE
	rf.currentTerm += 1
	rf.votedFor = rf.me
	rf.voteNum = 1
	rf.persist()
	for server := 0; server < len(rf.peers); server++ {
		if server == rf.me {
			continue
		}
		// ……
		// start one goroutine per peer to send the RPC request
		go func(server int, args RequestVoteArgs) {
			var reply RequestVoteReply
			ok := rf.sendRequestVote(server, args, &reply)
			if ok {
				rf.handleVoteResult(reply)
			}
		}(server, args)
	}

Once the leader election succeeds, the whole Raft cluster is in service. If a client sends a request such as x = 8, the leader writes this command into the log and distributes it to every node in the next AppendEntries RPC, asking them to replicate it; once a majority of nodes have replicated it successfully, the entry becomes committed.

func (rf *Raft) Start(command interface{}) (int, int, bool) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	index := -1
	term := -1
	isLeader := false
	nlog := LogEntry{command, rf.currentTerm}
	if rf.state != LEADER {
		return index, term, isLeader
	}

	isLeader = (rf.state == LEADER)
	rf.logs = append(rf.logs, nlog)
	index = len(rf.logs)
	term = rf.currentTerm
	rf.persist()
	return index, term, isLeader
}

As above, the leader's Start function accepts a command and writes it to the log; the entry is then distributed to all followers in the next AppendEntries RPC.

func (rf *Raft) sendAppendEntriesToAllFollowers() {
	for i := 0; i < len(rf.peers); i++ {
		if i == rf.me {
			continue
		}
		// ……
		// start one goroutine per peer to send the RPC request
		go func(server int, args AppendEntryArgs) {
			var reply AppendEntryReply
			ok := rf.sendAppendEntries(server, args, &reply)
			if ok {
				rf.handleAppendEntries(server, reply)
			}
		}(i, args)
	}
}
Here, sendAppendEntries(server, args, &reply) sends an AppendEntries RPC request to a given server; the RPC invokes the AppendEntries handler, which performs the check that keeps the logs consistent.
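For completeness, here is an illustrative sketch of the follower-side consistency check inside an AppendEntries handler (not the thesis code verbatim; the PrevLogIndex, PrevLogTerm and Entries fields on AppendEntryArgs, the Success field on AppendEntryReply, and the Term field on LogEntry follow the Raft paper and are assumptions here):

// AppendEntries is a simplified, illustrative handler showing only the
// consistency check from section 3.1.2 (term comparison, heartbeats and
// commitIndex advancement are omitted). Indexing is 1-based as in Start.
func (rf *Raft) AppendEntries(args AppendEntryArgs, reply *AppendEntryReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()

	// Reject unless our log contains an entry at PrevLogIndex whose term
	// matches PrevLogTerm (index 0 means "before the first entry").
	if args.PrevLogIndex > len(rf.logs) ||
		(args.PrevLogIndex > 0 && rf.logs[args.PrevLogIndex-1].Term != args.PrevLogTerm) {
		reply.Success = false
		return
	}
	// Match found: discard any conflicting suffix and append the leader's
	// entries, so this follower's log extends a prefix of the leader's.
	rf.logs = append(rf.logs[:args.PrevLogIndex], args.Entries...)
	reply.Success = true
}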

3.3 Testing the algorithm
The previous section completed the design and implementation of the algorithm and presented the key parts of the code. This section tests the Raft algorithm using the test framework provided by MIT 6.824. Applying the Raft algorithm in a real distributed system means facing network communication, an RPC framework, concurrent programming and other details, which bring considerable uncertainty to testing; moreover, in a large-scale distributed system the network is extremely complex, the problems that arise are often hard to simulate, and test coverage is hard to guarantee. To keep the focus on the design and implementation of the algorithm, the MIT 6.824 test framework simulates an RPC framework on top of Go's channel mechanism. This avoids reinventing the wheel of an RPC framework, and the channel-based simulation can reproduce as many complex network dynamics as possible, such as machine crashes and restarts or network partitions.
Before testing, it is necessary to simulate the distributed environment and build a cluster of multiple servers to test the Raft algorithm. The testing framework provides the following interfaces:
Omitted

4 Conclusion
The starting point of this graduation project is to lay a foundation for my future work. My department works on distributed CDN and will begin working on blockchain-related content this year, so the chosen topic is an implementation of the Raft consensus algorithm in the context of the blockchain. Of course, Raft is used not only in private chains but also widely in distributed systems in general.
I learned a great deal through this graduation project. At the most basic level I learned the Raft algorithm, and in the process gained a preliminary grasp of the fundamentals of distributed systems. While implementing Raft I learned Go network programming, concurrent programming and RPC, and through the Raft algorithm I came to understand the various consensus mechanisms of the blockchain and their application scenarios.
Of course, limited by time and ability, I have only completed the two core parts of the Raft algorithm, leader election and log replication; log compaction and configuration changes are only briefly understood and not implemented. I hope to improve and optimize them in the future.
Overall, this project tested my willpower and learning ability, improved my programming skills and problem-solving ability, and will be of great help in my future study and work.

Acknowledgements
Omitted

References
[1] Robert Morris, Malte Schwarzkopf. MIT 6.824: Distributed Systems, 2018. https://pdos.csail.mit.edu/6.824/
[2] Yang Baohua, Chen Chang. Principle, Design and Application of Blockchain [M]. Machinery Industry Press, 2017.
[3] Jiang Yong. Vernacular Blockchain [M]. Machinery Industry Press, 2017.
[4] Nakamoto S. Bitcoin: A Peer-to-Peer Electronic Cash System. https://bitcoin.org/bitcoin.pdf, 2008.
[5] Diego Ongaro, John Ousterhout. In Search of an Understandable Consensus Algorithm (Extended Version), 2014.
[6] Yuan Yong, Wang Feiyue. Current Status and Prospects of Blockchain Technology, 2016.


5. Resource download

The source code and complete thesis of this project are listed below; friends in need can click to download. If the link does not work, click the card below and scan the code to download.

Serial number | Complete set of graduation project resources (click to download)
Source code of this project | Design and implementation of the consensus algorithm Raft based on Raft + blockchain (source code + documentation)_Raft_Consensus algorithm Raft.zip
