Exploration of the Next-Generation Consensus Mechanism: DAG-based BFT Consensus

In blockchain terminology, BFT consensus is a mechanism that allows N validator nodes, of which at most f are Byzantine, to reach agreement on an infinitely growing sequence of proposals (blocks or sets of transactions).

Classic BFT consensus algorithms, whether PBFT or the improved HotStuff, have relatively high communication complexity, poor scalability, and high latency when the network is unstable.

In recent years, as DAG techniques have been widely applied in blockchains, DAG-based BFT consensus protocols have been proposed and continuously improved. They exploit efficient DAG implementations and the DAG's naturally asynchronous communication pattern to improve the scalability of consensus, with clear advantages in shortening confirmation time and increasing transaction throughput. However, because the DAG is built asynchronously and has no global ordering mechanism, the data stored on different nodes may drift apart after running for a while. How to still reach agreement on a single commit order despite these differences is the essence of DAG consensus.

This article first introduces the basic theory, DAG-based BFT consensus and round-based BFT consensus, and then explains the DAG-Rider, Tusk and Bullshark protocols in detail.

DAG-based BFT consensus

DAG is short for Directed Acyclic Graph.

A graph consists of two parts: vertices and edges. A directed acyclic graph is simply a graph whose edges are directed and never form a closed loop (cycle).
[Figure: directed tree, DAG, and directed graph]

The figure above shows a directed tree, a DAG, and a general directed graph, from which we can see the differences between the three:

  • Directed tree: each vertex points to exactly one earlier vertex, and the data has an obvious flow direction
  • DAG: each vertex may point to multiple earlier vertices, and the data flow still has an obvious direction
  • Directed graph: unlike a DAG, a directed graph allows data to flow back (cycles), so the overall data flow direction is not obvious

Application of DAG in Consensus

In DAG-based consensus, each consensus message contains a proposal (a block or a set of transactions) and a set of references to earlier messages. Each new message must explicitly point to a number of previous messages, and referencing a message counts as approving and voting for it.

Correspondingly, in the DAG each message is a vertex and each reference is an edge, so all consensus messages together form an ever-growing DAG.

DAG-based consensus can be divided into two layers:

  • Network communication layer: responsible for reliably disseminating and receiving proposals and votes, and for assembling the messages into the DAG
  • Ordering layer with zero communication overhead: each validator independently extracts a common order of proposals by interpreting its local copy of the DAG, without sending any extra messages, while guaranteeing that all validators eventually agree on the same serial commit order

Because of network asynchrony, the DAG views on different validators may differ slightly at any point between a message being created and it being finally committed. Ensuring that all validators nevertheless agree on the commit order is the central challenge of every DAG-based consensus algorithm.
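
As a concrete illustration, here is a minimal sketch in Python of what a consensus message (vertex) and a node's local DAG copy might look like. The field and class names are illustrative assumptions, not the API of any particular implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vertex:
    """One consensus message: a proposal plus references to earlier messages."""
    author: str        # validator that produced the message
    round: int         # round number this message belongs to
    proposal: bytes    # the block / batch of transactions (opaque here)
    references: tuple  # hashes of the referenced (earlier) vertices

class LocalDAG:
    """Each validator maintains its own growing copy of the DAG."""
    def __init__(self):
        self.vertices = {}  # vertex hash -> Vertex

    def insert(self, vhash: str, v: Vertex) -> bool:
        # A vertex is inserted only once every vertex it references is already
        # present locally, i.e. its causal history is complete; otherwise the
        # caller buffers it and retries later.
        if all(ref in self.vertices for ref in v.references):
            self.vertices[vhash] = v
            return True
        return False
```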

Comparison with Traditional BFT

Classic BFT consensus algorithms, whether PBFT or the improved HotStuff, require a leader node to collect transactions into a block, broadcast it, and gather votes from the other nodes; only after three phases of interaction (pre-prepare, prepare, commit) is consensus reached on committing the block. Such protocols depend heavily on the reliable delivery of consensus messages between nodes.

DAG-based consensus uses the DAG to abstract the network communication layer and separates message dissemination from the consensus logic (proposal ordering). The benefit of this separation is that each node can compute the consensus state and commit order asynchronously, which lowers latency and reduces network overhead during communication. In addition, efficient DAG implementations give DAG-based consensus good scalability and high throughput.

Round-based BFT consensus

In a round-based DAG, each vertex is associated with an integer round number. Each validator broadcasts exactly one message per round, and each message references at least N−f messages from the previous round. In other words, to advance to round r, a validator must first obtain N−f round-(r−1) messages from distinct validators.
[Figure: round-based DAG with 4 validators]
The figure above is an example of round-based BFT with 4 validators. Each message in a round references at least 3 messages from the previous round; a node that has not yet received 3 such messages cannot advance to the next round.
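
A minimal sketch of this round-advance rule, reusing the LocalDAG sketch above (the helper name and signature are illustrative):

```python
def can_advance(dag: "LocalDAG", current_round: int, n: int, f: int) -> bool:
    """A validator may broadcast its round r+1 vertex only after it has
    received round-r vertices from at least N - f distinct validators."""
    authors_seen = {v.author for v in dag.vertices.values()
                    if v.round == current_round}
    return len(authors_seen) >= n - f

# Example: with N = 4 validators and f = 1, a node needs vertices from at
# least 3 distinct validators in round r before entering round r + 1.
```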

This is somewhat similar to HotStuff, where every block must carry a quorum certificate for its parent block: it ensures that a majority of nodes advance to the next round in an orderly and balanced manner, which is the foundation of quorum-based BFT fault tolerance.

Three key properties

In a round-based DAG consensus protocol, once a message has been confirmed and inserted into the DAG, it carries the following guarantees:

  • Reliability: a copy of the message is stored on enough validators that eventually every honest participant can download it
  • Non-equivocation: for any validator v and round r, if two validators each have a round-r vertex from v in their local views, then they have exactly the same vertex with exactly the same references
  • Causal ordering: every message carries explicit references to delivered messages of the previous round, so any validator that commits a vertex has exactly the same causal history (messages from earlier rounds) of that vertex in its local view

As mentioned above, a DAG-based BFT consensus protocol lets each validator interpret only its local DAG and still eventually reach agreement on the commit order; the properties above are the key to making this possible.
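
For instance, thanks to causal ordering, once a leader vertex is agreed on, every validator can linearize that vertex's causal history from its own local DAG and obtain the same result. A minimal sketch of such a traversal (deterministic tie-breaking by round and author is one possible choice, not prescribed by any specific protocol):

```python
def ordered_causal_history(dag: "LocalDAG", root_hash: str) -> list:
    """Collect every vertex reachable from root_hash through references.
    Two honest validators that both hold root_hash compute the same set."""
    seen, stack = set(), [root_hash]
    while stack:
        h = stack.pop()
        if h in seen or h not in dag.vertices:
            continue
        seen.add(h)
        stack.extend(dag.vertices[h].references)
    # A deterministic tie-break turns the identical sets into identical orders.
    return sorted(seen, key=lambda h: (dag.vertices[h].round,
                                       dag.vertices[h].author))
```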

At the same time, precisely because these properties remove the Byzantine nodes' ability to equivocate, the consensus logic can be simplified compared with traditional BFT algorithms, making the protocol design simpler.

DAG-Rider, Tusk and Bullshark are relatively mature and popular round-based BFT consensus protocols. Below we introduce and compare these three protocols in detail.

DAG-Rider

In the DAG-Rider protocol, each validator divides its local DAG into waves of 4 consecutive rounds. In the last round of each wave, it randomly selects a leader vertex from the first round of that wave and tries to commit it.

In the figure below, each horizontal line corresponds to one of 4 validators, each vertex is the consensus message sent by a validator in a given round, and vertices v2 and v3 are the randomly selected leaders of wave 2 and wave 3, respectively.
[Figure: DAG-Rider waves with leaders v2 and v3]

Vertex commit process:

  1. The node tries to commit vertex v2 in round 8 of wave 2, but since only two round-8 vertices (fewer than 2f+1) have paths to v2, v2 does not satisfy the commit condition in round 8
  2. Since 2f+1 round-12 vertices have paths to v3, v3 satisfies the commit rule
  3. At the same time, since there is a path from v3 to v2, v2 is committed before v3 in wave 3, i.e. commit order v2 → v3 (see the sketch after this list)
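
A hedged sketch of the DAG-Rider commit check just described: the wave leader (a first-round vertex of the wave) is committed when at least 2f + 1 vertices of the wave's last round have a path to it. Function names are illustrative:

```python
def has_path(dag: "LocalDAG", from_hash: str, to_hash: str) -> bool:
    """True if to_hash is reachable from from_hash by following references."""
    stack, seen = [from_hash], set()
    while stack:
        h = stack.pop()
        if h == to_hash:
            return True
        if h in seen or h not in dag.vertices:
            continue
        seen.add(h)
        stack.extend(dag.vertices[h].references)
    return False

def try_commit_wave_leader(dag: "LocalDAG", leader_hash: str,
                           last_round: int, f: int) -> bool:
    """DAG-Rider style rule: commit the wave leader if at least 2f + 1
    vertices of the wave's last round have a path to it."""
    supporters = [h for h, v in dag.vertices.items()
                  if v.round == last_round and has_path(dag, h, leader_hash)]
    return len(supporters) >= 2 * f + 1
```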

How commit consistency is eventually guaranteed

Because of network asynchrony, the local DAGs of different validators may differ slightly at any time. In other words, at the end of a wave some nodes (call them M1) may commit the wave's leader vertex while other nodes (M2) enter the next wave without committing it. Could M2 then commit a leader in the next wave while the leader chosen in the previous wave remains uncommitted, leaving the nodes in inconsistent states?

In fact, this cannot happen. The fact that M1 committed v2 means at least 2f+1 round-8 vertices have paths to v2, and v3 in round 9 must reference at least 2f+1 distinct round-8 vertices. Since N = 3f+1, these two sets of size 2f+1 intersect in at least f+1 vertices, so there are at least f+1 paths from v3 to v2, and M2 will therefore commit v2 before v3 as soon as it commits v3.
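
The quorum-intersection argument can be written out explicitly. Let A be the set of round-8 vertices with a path to v2 and B the set of round-8 vertices referenced by v3; with N = 3f + 1:

```latex
|A \cap B| \;\ge\; |A| + |B| - N \;\ge\; (2f+1) + (2f+1) - (3f+1) \;=\; f+1
```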

Tusk

Tusk's design improves on DAG-Rider. Each wave in Tusk consists of 3 rounds, and the last round of each wave is also the first round of the next wave.

In the figure below, rounds r1–r3 form wave 1 and rounds r3–r5 form wave 2; vertices L1 and L2 are the randomly selected leaders of the two waves.
[Figure: Tusk waves with leaders L1 and L2]

Vertex commit process:

  1. In round 1, each validator generates a block and broadcasts it
  2. In round 2, each validator votes for the round-1 vertices (i.e. by referencing them)
  3. In round 3, a shared random coin elects a leader among the round-1 vertices (L1 in the figure)
  4. If at least f+1 round-2 vertices reference L1, commit it; otherwise move directly to the next wave

In the figure, L1 cannot be committed immediately because only one round-2 vertex references it. When L2 is elected and is referenced by at least f+1 round-4 vertices, L2 satisfies the commit rule; and since there is a path from L2 to L1, L1 is committed before L2 in wave 2 (commit order L1 → L2).
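
A hedged sketch of Tusk's per-wave commit check as described above, reusing the LocalDAG sketch (the leader is a first-round vertex of the wave; it commits once at least f + 1 second-round vertices reference it):

```python
def try_commit_tusk_leader(dag: "LocalDAG", leader_hash: str,
                           vote_round: int, f: int) -> bool:
    """Tusk-style rule: commit the randomly elected wave leader if at least
    f + 1 vertices in the wave's voting round reference it directly."""
    votes = sum(1 for v in dag.vertices.values()
                if v.round == vote_round and leader_hash in v.references)
    return votes >= f + 1
```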

Similarly, if some nodes commit L1 in wave 1, then at least f+1 round-2 vertices reference L1, and L2 in round 3 must reference at least 2f+1 distinct round-2 vertices; since (f+1) + (2f+1) > N, at least one vertex referenced by L2 also references L1, so a path from L2 to L1 exists. This guarantees that all validators eventually agree on the commit order.

Comparison with DAG-Rider

In the common case, DAG-Rider needs 4 rounds to commit a vertex, while Tusk needs only 3, reducing commit latency.

However, with its 4 rounds of voting, DAG-Rider guarantees that the probability of committing a vertex in each wave is at least 2/3, while Tusk has one fewer voting round, so its per-wave commit probability is only 1/3.

In other words, under asynchrony DAG-Rider needs on average only 3/2 waves (6 rounds) to commit a vertex, while Tusk needs about 3 waves. Since each Tusk wave contains 3 rounds and the last round of a wave is also the first round of the next, 3 waves span 7 rounds.
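
Worked out explicitly, with the expected number of waves being the inverse of the per-wave commit probability:

```latex
\begin{aligned}
\text{DAG-Rider:}\quad & \mathbb{E}[\text{waves}] = \frac{1}{2/3} = \frac{3}{2},
  \qquad \frac{3}{2} \times 4 = 6 \text{ rounds}\\
\text{Tusk:}\quad & \mathbb{E}[\text{waves}] = \frac{1}{1/3} = 3,
  \qquad 2 \times 3 + 1 = 7 \text{ rounds (adjacent waves share a round)}
\end{aligned}
```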

Bullshark

The Bullshark paper proposes two versions of the protocol.

  • Asynchronous version: in the common case it takes only 2 rounds to commit a vertex, further reducing commit latency compared with DAG-Rider and Tusk
  • Partially synchronous version: based on Tusk, it also needs only 2 rounds to commit a vertex, reducing the number of rounds per commit

Asynchronous version

Similar to DAG-Rider, each wave in the asynchronous version contains 4 rounds. However, Bullshark has 3 leaders per wave: a fallback leader, randomly elected from the first round, and two steady-state leaders, predefined in the first and third rounds.

In the figure below, S1A and S1B are the steady-state leaders and F1 is the fallback leader (randomly elected in the last round of each wave).
[Figure: Bullshark asynchronous version with steady-state leaders S1A/S1B and fallback leader F1]

Vertex commit process:

  1. In round 2, the node sees that 2f+1 nodes have voted for S1A, so it can commit S1A
  2. In round 4, the node sees only one vote for S1B and cannot commit it, so the node's vote type in wave 2 changes to fallback
  3. In round 5, the node observes 2f+1 other round-4 vertices and finds that none of them could commit S1B either, so it can conclude that the other nodes are also in fallback vote mode in wave 2, and the steady-state leaders of wave 2 (S2A and S2B) have lost their chance to be committed
  4. In round 8, the node sees that 2f+1 nodes voted for F2, so it can commit F2; before doing so it tries to commit S1B again, but since S1B still does not satisfy the commit rule (fewer than f+1 references in the node's local view), only F2 is committed at this point

As can be seen, in the common case a steady-state leader can be committed every two rounds, a further optimization over Tusk. At the same time, the fallback leader ensures that consensus remains live even under worst-case asynchrony.

Code analysis

Pseudocode of the commit process:
[Figure: pseudocode of the commit process]
From the try_ordering function, we can see that:

  • In rounds with round % 4 == 1: try to commit the previous wave's second steady-state leader (from its third round) or its fallback leader, and confirm the vote type of this validator and of the validators it observes for the current wave
  • In rounds with round % 4 == 3: only the current wave's first steady-state leader needs to be considered for commit

From try_steady_commit and try_fallback_commit we can see that 2f+1 is used as the vote threshold for committing a vertex, unlike Tusk's f+1. This is because Bullshark must not only decide whether the vertex can be committed, but also confirm that a majority (at least 2f+1) of nodes share the same vote type in the next wave.
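
Since the pseudocode itself only appears as a figure here, the following is a hedged Python reconstruction of the commit logic just described, reusing has_path from the earlier sketch. The leader bookkeeping and the steady/fallback vote typing are simplified, and all names are illustrative rather than the paper's:

```python
committed = []  # leader vertices committed so far, oldest first

def support(dag: "LocalDAG", leader_hash: str, vote_round: int) -> int:
    """Number of vote_round vertices with a path to the leader."""
    return sum(1 for h, v in dag.vertices.items()
               if v.round == vote_round and has_path(dag, h, leader_hash))

def try_ordering(dag: "LocalDAG", rnd: int, f: int, leaders: dict) -> None:
    """leaders maps a wave number to its leader hashes, e.g.
    {'S_A': ..., 'S_B': ..., 'F': ...} (an assumed structure)."""
    wave = (rnd - 1) // 4 + 1
    if rnd % 4 == 1 and wave > 1:
        prev = leaders[wave - 1]
        # First round of a new wave: try the previous wave's second
        # steady-state leader, then its fallback leader. The same 2f + 1
        # threshold also fixes which vote type (steady or fallback) this
        # validator, and the validators it observes, carry into the new wave.
        if support(dag, prev['S_B'], rnd - 1) >= 2 * f + 1:
            committed.append(prev['S_B'])
        elif support(dag, prev['F'], rnd - 1) >= 2 * f + 1:
            committed.append(prev['F'])
    elif rnd % 4 == 3:
        cur = leaders[wave]
        # Third round of the wave: only the wave's first steady-state leader
        # is a commit candidate at this point.
        if support(dag, cur['S_A'], rnd - 1) >= 2 * f + 1:
            committed.append(cur['S_A'])
```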

Commit Leader

From the above, when committing a vertex we have already established that at least 2f+1 nodes share our vote type in the current wave.

The role of the commit step is, once some vertex has been determined to be committable, to recursively commit the leader vertices of earlier rounds that were temporarily left uncommitted.

If an earlier leader was committed by some other node, it had at least 2f+1 votes there, and at least 2f+1 nodes share the same vote type; by quorum intersection, the current validator therefore sees at least f+1 of those votes in its own local view.
[Figure: recursively committing earlier leader vertices]

Partially synchronous version

This version simplifies the Tusk consensus protocol, is easier to implement, and is currently used in Sui's consensus mechanism.

Unlike Tusk, where all leader vertices are elected randomly, every odd round in Bullshark's partially synchronous DAG has a predefined leader vertex (highlighted in solid green in the figure below). The commit rule is simple: commit the leader if it receives at least f+1 votes (references) in the next round. In the figure, L3 is committed with 3 votes, while L1 and L2 received fewer than f+1 votes and were not committed immediately.
[Figure: Bullshark partially synchronous version with predefined leaders L1, L2 and L3]
However, because of network asynchrony, different validators may have different local views of the DAG, and some nodes may already have committed L1 and L2 in earlier rounds. Therefore, before committing L3, the node must check whether there are paths from L3 to L2 and to L1; if such a path exists, the earlier leader must be committed first. In the figure there is no path from L3 to L2, so L2 can be skipped, but there is a path from L3 to L1, so the commit order is L1 → L3.
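
A hedged sketch of this rule, including the backward walk that commits earlier skipped leaders first. It reuses has_path from the earlier sketch; the function name and parameters are illustrative, not Sui's actual implementation:

```python
def try_commit_partial_sync(dag: "LocalDAG", leaders: list, new_idx: int,
                            last_idx: int, vote_round: int, f: int,
                            ordered_log: list) -> int:
    """leaders[i] is the predefined leader vertex of the i-th odd round.
    Returns the index of the most recently committed leader."""
    votes = sum(1 for v in dag.vertices.values()
                if v.round == vote_round and leaders[new_idx] in v.references)
    if votes < f + 1:
        return last_idx                      # not enough votes yet, skip for now
    # Walk back over the leaders skipped since the last commit and pick up
    # every one that is still reachable from the newly committed leader.
    to_commit, anchor = [leaders[new_idx]], leaders[new_idx]
    for i in range(new_idx - 1, last_idx, -1):
        if has_path(dag, anchor, leaders[i]):
            to_commit.append(leaders[i])
            anchor = leaders[i]              # continue the chain from here
    for lh in reversed(to_commit):           # oldest first, e.g. L1 -> L3
        ordered_log.append(lh)
    return new_idx
```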

Performance comparison

Following the Bullshark paper, the figure below compares the latency and throughput of HotStuff, Tusk and BullShark for different numbers of validators under normal network conditions.
[Figure: latency and throughput of HotStuff, Tusk and BullShark]

  • HotStuff: with 10, 20 and 50 nodes, throughput is about 70,000 tx/s, 50,000 tx/s and 30,000 tx/s respectively; latency is low, around 2 seconds
  • Tusk: throughput is significantly higher than HotStuff's, peaking at about 110,000 tx/s with 10 nodes and about 160,000 tx/s with 20 or 50 nodes; throughput grows with the number of nodes (thanks to the efficient DAG implementation). Despite the high throughput, Tusk's latency is higher than HotStuff's, around 3 seconds
  • BullShark: BullShark combines Tusk's high throughput with HotStuff's low latency. Its throughput is also significantly higher than HotStuff's, reaching 110,000 tx/s and 130,000 tx/s with 10 and 50 nodes respectively, and its latency stays around 2 seconds regardless of the number of nodes

Summary

  • Tusk's theoretical starting point is DAG-Rider; it is an improved version that reduces the number of rounds per commit in the common case
  • The partially synchronous version of Bullshark further optimizes Tusk, reducing latency again and being simpler to implement; but, like HotStuff, it has no liveness guarantee when the eventual-synchrony assumption does not hold
  • The asynchronous version of Bullshark adds a fallback voting mode on top of the steady-state mode, so it has the same low latency as the partially synchronous version in the common case and still guarantees liveness under asynchrony

In this article we mainly discussed three round-based BFT consensus protocols. In fact there are many other DAG-based BFT consensus protocols, such as Fantom's Lachesis, which does not strictly follow a round-based design and opens up a new line of thinking. As more projects adopt them, DAG consensus will become more complete and diversified, leaving plenty of room for improving blockchain scalability.

References:

  1. DAG-Rider paper: All You Need Is DAG (PODC 2021)
  2. Narwhal and Tusk paper: Narwhal and Tusk: A DAG-based Mempool and Efficient BFT Consensus (arXiv:2105.11827)
  3. Bullshark paper: Bullshark: DAG BFT Protocols Made Practical (CCS 2022)
  4. DAG Meets BFT - The Next Generation of BFT Consensus
  5. Narwhal, Bullshark, and Tusk, Sui's Consensus Engine | Sui Docs
