[Paper Reading] 1 SkyChain: A dynamic blockchain sharding system based on deep reinforcement learning

1. Introduction to the literature

1.1 Document Title
SkyChain: A Deep Reinforcement Learning-Empowered Dynamic Blockchain Sharding System
1.2 Author
author
Zhongshan School of Systems Science and Engineering, China Institute of Data and Computer Science, Guangzhou
Department of Computing, Hong Kong Polytechnic University, National Engineering Research Center for Digital Life, Sun Yat-sen University
1.3 Year: August 2020
1.4 Conference: ICPP

2. Introduction and important information

2.1 Research background

1) Sharding divides the network into multiple disjoint groups that process transactions in parallel to improve throughput. The article's main innovation is dynamic sharding, which copes better with the dynamic environment of a blockchain. In a sharded system, the system is divided into smaller independent parts called shards (or committees), and the nodes in each shard maintain that shard's own ledger. Participants in different shards process transactions in parallel, so multiple blocks can be created and verified at the same time across the system, which significantly improves transaction throughput.
2) A blockchain is dynamic in the sense that nodes can join and leave the system, and malicious attackers can actively corrupt honest nodes, both of which change the number of nodes in the blockchain system over time.
3) Challenges in a dynamic environment: choosing the re-sharding frequency, the number of shards, and the block size.

2.2 Research purpose and significance

2.3 Innovations in the literature

The paper proposes SkyChain, the first dynamic public blockchain sharding protocol, which lets a blockchain system automatically form shards based on the current system state.
The dynamics of a blockchain sharding system can be modeled as a Markov decision process (MDP), and the environment of a blockchain system is high-dimensional, so the authors use deep reinforcement learning (DRL) to obtain the optimal sharding strategy under different environmental conditions. DRL can learn the characteristics of the blockchain sharding system from past experience and choose an appropriate sharding strategy for the current network state to maximize long-term return. The paper also proposes an optimization framework for evaluating performance and security.

1) We propose the first dynamic sharding framework for public blockchains, which maintains a long-term balance between performance and security in the dynamic environment of blockchain systems.
2) We propose an adaptive ledger protocol that guarantees efficient, conflict-free merging and splitting of ledgers based on the dynamic sharding results.
3) We quantify a general sharding system and design a DRL-based sharding method that dynamically adjusts the re-sharding interval, the number of shards, and the block size in the dynamic environment of the blockchain system.

3. Research content

3.1 Model

1. The system is based on the account transaction model.
2. DRL helps the system formulate sharding strategies dynamically during reconfiguration. DRL is a type of machine learning that combines deep learning (DL) with reinforcement learning (RL) to maximize the cumulative reward an agent obtains from interacting with a high-dimensional environment.
3. The environment is the dynamic blockchain environment, and the agent is maintained by every node.

3.2 Adaptive Ledger Protocol

An adaptive ledger protocol is proposed to guarantee efficient, conflict-free merging and splitting of ledgers based on the re-sharding results.
1) State blocks: to let reconfigured nodes (new nodes, or swapped nodes moving from their original shard to another shard) bootstrap quickly, and to make ledger merging and splitting efficient, the protocol defines state blocks. Unlike a transaction block, which stores transaction data, a shard's state block records the latest information in the shard's ledger, including account addresses and account states.
2) sb_i^t denotes the state block of shard i in epoch t.

3.2.1 State block creation

Shard i traverses all transactions following state block sb_i^(t-1), builds a mapping from account addresses to account states, and sets the default values of the mapping. It puts the Merkle tree root in the header of sb_i^t and the mapping in the body of sb_i^t, then runs consensus on the block. Once agreement is reached, the body of sb_i^(t-1) can be discarded.

State blocks let reconfigured nodes obtain the entire ledger state quickly, because these nodes only need to download the latest state block to synchronize with the shard's current state. State blocks also simplify ledger merging and splitting.
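
The creation step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the block layout (a dict with "header" and "body"), the use of SHA-256, the balance-only account state, and the names merkle_root and create_state_block are all assumptions made for illustration.

```python
import hashlib
import json


def merkle_root(leaves):
    """Return the hex Merkle root of a list of byte strings (SHA-256, assumed)."""
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:  # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


def create_state_block(prev_state, transactions, shard_id, epoch):
    """Build a state block from the previous state and this epoch's transactions.

    prev_state: dict mapping address -> balance (body of the previous state block)
    transactions: iterable of (sender, receiver, amount) tuples (hypothetical format)
    """
    state = dict(prev_state)  # default values come from the previous state block
    for sender, receiver, amount in transactions:
        state[sender] = state.get(sender, 0) - amount
        state[receiver] = state.get(receiver, 0) + amount
    # Hash the sorted (address, state) pairs so the root is order-independent.
    leaves = [json.dumps([addr, bal]).encode() for addr, bal in sorted(state.items())]
    header = {"shard": shard_id, "epoch": epoch, "merkle_root": merkle_root(leaves)}
    return {"header": header, "body": state}


sb = create_state_block({"0xa": 10}, [("0xa", "0xb", 3)], shard_id="i", epoch=1)
print(sb["body"])  # latest account states after applying the transactions
```

A node joining the shard would only need this one block: the body gives the current account states, and the Merkle root in the header lets it check them.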

3.2.2 Merger process

Ledger consolidation
In the figure, green blocks are maintained by shard i, blue blocks by shard j, and yellow blocks by shard k.
1) In the reconfiguration phase at time t, shard i creates state block sb_i^t, and shard j creates sb_j^t.
2) After the DRL agents reach consensus, shards i and j exchange the headers of their state blocks and create a new block sb_k^t that contains both sb_i^t and sb_j^t.
3) The two shards jointly run the consensus protocol and reach agreement. The two chains are then connected, and shards i and j are merged into shard k.
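
The merge steps can be sketched as follows. This is an illustrative sketch only: the dict-based block layout, the "parents" field, and the function name merge_state_blocks are assumptions, and the consensus round itself is omitted.

```python
def merge_state_blocks(sb_i, sb_j, merged_shard_id):
    """Combine the state blocks of two shards into one block for the merged shard."""
    if sb_i["header"]["epoch"] != sb_j["header"]["epoch"]:
        raise ValueError("state blocks must belong to the same epoch")
    overlap = sb_i["body"].keys() & sb_j["body"].keys()
    if overlap:  # shard ledgers are expected to be disjoint
        raise ValueError(f"ledgers are not disjoint: {sorted(overlap)}")
    header = {
        "shard": merged_shard_id,
        "epoch": sb_i["header"]["epoch"],
        # Keeping both old headers links the new chain to the two old chains.
        "parents": [sb_i["header"], sb_j["header"]],
    }
    return {"header": header, "body": {**sb_i["body"], **sb_j["body"]}}


sb_i = {"header": {"shard": "i", "epoch": 3}, "body": {"0x01": 5}}
sb_j = {"header": {"shard": "j", "epoch": 3}, "body": {"0x02": 7}}
sb_k = merge_state_blocks(sb_i, sb_j, "k")
print(sb_k["body"])
```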

3.2.3 Split process

split process
1) After the DRL agents reach consensus, shard k obtains the sharding information and creates sb_k^t.
2) Shard k creates new state blocks sb_i^t and sb_j^t, and splits all of its nodes into two disjoint subsets.
3) Each node of shard k stores one of the two state blocks according to the sharding information and runs the consensus protocol to append it to the end of the blockchain. In this way chain k is divided into chain i and chain j, each of which maintains a disjoint ledger after its state block. The two shards then handle different transactions in the next epoch.
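
A matching sketch for the split direction is given below. The rule that assigns an account to one of the two new shards is not stated in these notes, so the hash-bit rule in shard_bit is purely a hypothetical stand-in, as are the block layout and function names.

```python
import hashlib


def shard_bit(address):
    """Hypothetical assignment rule: lowest bit of the SHA-256 hash of the address."""
    return hashlib.sha256(address.encode()).digest()[-1] & 1


def split_state_block(sb_k, child_ids=("i", "j")):
    """Split one state block into two state blocks with disjoint account sets."""
    bodies = ({}, {})
    for address, state in sb_k["body"].items():
        bodies[shard_bit(address)][address] = state
    return tuple(
        {
            "header": {
                "shard": child_id,
                "epoch": sb_k["header"]["epoch"],
                "parent": sb_k["header"],  # both children point back to chain k
            },
            "body": body,
        }
        for child_id, body in zip(child_ids, bodies)
    )


sb_k = {"header": {"shard": "k", "epoch": 4},
        "body": {f"0x{n:02x}": n for n in range(8)}}
sb_i, sb_j = split_state_block(sb_k)
print(len(sb_i["body"]), len(sb_j["body"]))
```

Because every account lands in exactly one of the two bodies, the resulting ledgers are disjoint by construction, which is the property the protocol relies on.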

3.3 Evaluation framework

Each epoch is divided into a consensus period and a reconfiguration period, so the delay of an epoch is the sum of the delays of the two periods:
T_epoch = T_cons + T_reco

3.3.1 Performance

3.3.1.1 Consensus delay

The number of rounds in the consensus period is r_c.
The consensus delay of each round is T_round.
∴ T_cons = r_c × T_round
Take PBFT, a Byzantine fault-tolerant consensus protocol, as an example. It is divided into three stages: pre-prepare, prepare, and commit. To reduce the cost of data transmission, the new block is broadcast only in the pre-prepare stage, and only the block header is broadcast in the last two stages. The waiting time of one round, T_round, can then be calculated:

Calculation formula 1

where
m is the shard size (the time for a message to be accepted by all nodes is at most O(log m)),
S_B is the block size,
S_H is the block header size,
t_a is the cost of adding a new block to the blockchain,
R_t is the data transfer rate,
t_v is the verification time of each stage.

3.3.1.2 Resharding delay

Reconfiguration delay includes:
1) Randomness generation, T_rand
2) State block generation by each shard, T_s
3) Submission of new nodes' identities to the blockchain, T_v
4) Splitting and merging of ledgers, T_r
So:

Calculation formula 2

3.3.1.3 Number of transactions processed

In a sharded system, a cross-shard transaction is a transaction whose related addresses are recorded in the ledgers of different shards. When processing cross-shard transactions, different shard systems use different mechanisms to ensure the atomicity and consistency of cross-shard transactions.
For example, RapidChain uses transaction splitting and Monoxide uses relay transactions.
However, both mechanisms introduce redundant transactions, so a sharded system has to handle several redundant transactions for each cross-shard transaction. Let R_r be the average number of redundant transactions per transaction in the sharded system. The number of transactions processed by a shard in one epoch can then be calculated as
Calculation formula 3
where
S_T is the average transaction size,
R_r is the average number of redundant transactions per transaction in the sharded system,
r_c is the number of rounds.
Therefore, the transaction throughput of each shard, O, can be calculated as
Calculation formula 4
The total transaction throughput is
O_total = kO
where k is the number of shards.
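
Formulas 3 and 4 are not reproduced in these notes, so the following Python sketch only illustrates how the quantities defined above could fit together. The exact form used here (S_B / S_T transactions per block, divided by R_r to discount redundant transactions, then divided by the epoch delay) is an assumption and may differ from the paper. Only T_epoch = T_cons + T_reco, T_cons = r_c × T_round, and O_total = kO come from the text.

```python
def epoch_delay(r_c, t_round, t_reco):
    """T_epoch = T_cons + T_reco, with T_cons = r_c * T_round (from the text)."""
    return r_c * t_round + t_reco


def shard_throughput(r_c, t_round, t_reco, s_b, s_t, r_r):
    """Effective transactions per second of one shard (ASSUMED form)."""
    txs_per_block = s_b / s_t                  # assumed: whole block carries transactions
    effective_txs = r_c * txs_per_block / r_r  # assumed: discount redundant transactions
    return effective_txs / epoch_delay(r_c, t_round, t_reco)


def total_throughput(k, per_shard):
    """O_total = k * O (from the text)."""
    return k * per_shard


# Made-up numbers, chosen only to keep the arithmetic easy to follow.
per_shard = shard_throughput(r_c=10, t_round=2.0, t_reco=20.0,
                             s_b=1000.0, s_t=10.0, r_r=2.0)
print(per_shard, total_throughput(4, per_shard))
```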

3.3.1.4 Constraints

In a blockchain system, network delays mean that a transaction must wait for several rounds of consensus before it is finally confirmed. To keep the ledger as consistent as possible and to prevent blocks from being discarded before reconfiguration begins, the waiting time should be limited to a fraction of the total consensus time, that is
Restrictions

3.3.2 Security

If an insecure shard exists, the whole system becomes insecure. The probability of a failed system is calculated using the hypergeometric distribution.
X denotes the number of malicious nodes in a shard, and F = sn denotes the total number of malicious nodes among the n nodes of the system, where s is the fraction of malicious nodes. The probability of a faulty system, that is, the probability of forming at least one insecure shard (a shard of m nodes with more than mf malicious nodes), is:
Calculation formula 5
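
Formula 5 is not reproduced in these notes, but the hypergeometric tail it is built on can be computed directly. The sketch below is an illustration under assumptions: a shard is treated as m nodes drawn without replacement from n nodes of which F are malicious, f is taken to be the fault tolerance of a shard, and the system-level value uses a simple union bound over k shards, which may not match the paper's exact expression.

```python
from fractions import Fraction
from math import comb, floor


def shard_failure_probability(n, F, m, f):
    """P[X > m*f] for X ~ Hypergeometric(n, F, m).

    n: total nodes, F: malicious nodes, m: shard size,
    f: tolerated malicious fraction inside a shard (e.g. Fraction(1, 3)).
    """
    x_min = floor(m * f) + 1  # smallest count that is "more than m*f"
    total = comb(n, m)
    tail = sum(comb(F, x) * comb(n - F, m - x)
               for x in range(x_min, min(m, F) + 1))
    return tail / total


def system_failure_bound(k, p_shard):
    """Union bound over k shards (an assumption, not the paper's formula)."""
    return min(1.0, k * p_shard)


# Settings from the notes: system tolerance 1/4, shard tolerance 1/3.
n, m = 2000, 150
p = shard_failure_probability(n, F=n // 4, m=m, f=Fraction(1, 3))
print(p, system_failure_bound(n // m, p))
```

Passing f as a Fraction keeps the threshold floor(m*f) exact; a float such as 1/3 can round the wrong way for some shard sizes.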
To make the probability of forming a faulty committee negligible, a security parameter is used to bound this probability. The system is considered safe enough if the following inequality is satisfied.
Constraint 2
Settings
The Byzantine fault tolerance of the system is set to 1/4, and that of each shard to 1/3.
According to the formula for the failure probability, the committee size should be increased appropriately as more nodes join the blockchain system, so that the insecurity probability stays within the given limit.
Node corruption
In each epoch, honest nodes may be corrupted by malicious nodes. Malicious nodes are assumed to have limited attack capability, so corrupting a node takes a certain amount of time on average. An epoch can be considered safe if the following inequality is satisfied.
Calculation formula 6

3.3.3 Problem introduction

calculate

4. Dynamic sharding framework based on DRL

DRL learns general sharding strategies from past experience, based on the current blockchain environment and the rewards it is given, which lets it adapt to complex and dynamic blockchain environments. Because the action space is continuous, the Deep Deterministic Policy Gradient (DDPG) algorithm is used to train the model.

4.1 Model design

The three key components of reinforcement learning are the state space, the action space, and the reward function.
1) State space
The system has n nodes. Nodes can leave at any time, but new nodes can join only during reconfiguration. q denotes the number of pending transactions.
The state space at time t can therefore be expressed as:
s_t = [q, n]_t
2) Action space
When node arrivals follow a given distribution, the epoch length determines the number of system nodes in the next epoch. In addition, the number of shards and the block size change the state of the transaction pool by affecting the rate at which transactions are processed. All three should therefore adapt to the dynamic environment. The action space at time t is defined as:
a_t = [T_epoch, k, S_B]_t
The epoch length satisfies T_epoch ∈ (0, L), where L is the maximum epoch length.
To ensure that ledgers are merged or split efficiently and without conflicts, k = 2^i, i = 0, 1, 2, ..., C, where C is a constant.
N_s = 2^C is the maximum number of shards, and a value M constrains the range of the block size S_B.
3) Reward function
Since scalability can be easily quantified by transaction throughput, we use transaction throughput as our reward function.
The constraints and rewards are defined as follows:
reward function
When the constraint is broken, the reward is set to 0.
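
The state, action, and reward described above can be written down as a small Python sketch. The class and function names are hypothetical, the block-size range (0, M] is an assumption, and the constraint checks themselves (latency and security) are passed in as a boolean because their formulas are not reproduced in these notes.

```python
from dataclasses import dataclass


@dataclass
class State:
    q: int  # number of pending transactions
    n: int  # number of nodes in the system


@dataclass
class Action:
    t_epoch: float  # epoch length, in (0, L)
    i: int          # shard exponent, so that k = 2**i
    s_b: float      # block size, bounded by M (range (0, M] is assumed)


def num_shards(action):
    """k = 2^i keeps merges and splits pairwise."""
    return 2 ** action.i


def is_valid(action, L, C, M):
    """Check the action against the bounds L, C and M described in the text."""
    return 0 < action.t_epoch < L and 0 <= action.i <= C and 0 < action.s_b <= M


def reward(throughput, constraints_satisfied):
    """Throughput is the reward; a broken constraint sets the reward to 0."""
    return throughput if constraints_satisfied else 0.0


a = Action(t_epoch=100.0, i=3, s_b=4.0)
print(num_shards(a), is_valid(a, L=1000.0, C=6, M=8.0), reward(1200.0, True))
```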

4.2 Training methods

The DDPG algorithm itself still needs to be studied separately.
algorithm
1) At each time step t, select a sharding action a_t according to the current blockchain state s_t, add noise N for exploration, and execute it.
2) The blockchain environment returns a reward r_t, measured by system security and throughput, and moves to the next state s_(t+1).
3) Store the transition (s_t, a_t, r_t, s_(t+1)) in the replay buffer R.
4) Sample a fixed number of past transitions from the replay buffer to update the network parameters.
5) Update the target networks with soft updates.
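
The bookkeeping in steps 1 and 3 to 5 can be sketched without a deep learning framework. The snippet below is a simplified illustration, not the authors' training code: network weights are plain lists of floats, the exploration noise is Gaussian (an assumption; the notes only say "noise N"), and the names ReplayBuffer, soft_update and explore are hypothetical. The actor and critic gradient updates are omitted.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Draw a random minibatch, or everything if fewer transitions exist."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def explore(action, sigma):
    """Add zero-mean Gaussian noise to each action component (assumed noise model)."""
    return [a + random.gauss(0.0, sigma) for a in action]


def soft_update(target, online, tau):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target, online)]


buf = ReplayBuffer(capacity=1000)
buf.store([5000, 800], explore([100.0, 3.0, 4.0], sigma=0.1), 1200.0, [4800, 805])
batch = buf.sample(64)
target = soft_update(target=[0.0, 2.0], online=[1.0, 2.0], tau=0.1)
print(len(batch), target)
```

A small tau makes the target networks trail the online networks slowly, which is what keeps the learning targets stable in DDPG.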

4.3 Distributed deployment

An intuitive and simple way to use a trained agent is to deploy it on designated nodes, but this centralization leads to potential security issues.
In our blockchain sharding system, a distributed deployment approach is adopted to solve this problem.
L_a is one of the current leaders, selected as the proposer of the sharding strategy for the current epoch. When the shards complete consensus, L_a uses the current system state as input to create a sharding strategy.
The process has four stages:
1) Broadcast: L_a sends its parameters and the system state to the other leaders.
2) Reply: after receiving the message, each of the other leaders marks it and broadcasts it again as an echo. A threshold defines the maximum difference a leader can tolerate; the message is marked YES only if the difference is within the threshold.
3) Reception: if an honest leader receives the same echo from more than half of the leaders, it accepts the sharding strategy and broadcasts it to the other leaders again with an accept tag, together with a proof that it received more than half identical echoes.
4) Update: after L_a receives accepts from more than half of the leaders, it performs the state transition and broadcasts the update to the other leaders.
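
The threshold check and the majority rule in stages 2 to 4 can be sketched as below. This is a simplified illustration: it assumes each leader compares the proposed strategy with the one its own agent produces and measures the difference as the largest per-component gap, neither of which is spelled out in these notes. Message passing, signatures and proofs are omitted, and the function names are hypothetical.

```python
def echo_vote(proposed, local, threshold):
    """Mark YES (True) if every component of the proposal is within the threshold
    of the strategy this leader computed locally (assumed difference measure)."""
    return all(abs(p - l) <= threshold for p, l in zip(proposed, local))


def majority(votes):
    """True if more than half of the leaders voted YES."""
    return sum(votes) > len(votes) / 2


# Strategy = [T_epoch, shard exponent i, S_B]; values are made up for illustration.
proposal = [100.0, 3.0, 4.0]
local_strategies = [
    [100.2, 3.0, 4.1],  # honest leader, agrees within the threshold
    [99.9, 3.0, 3.9],   # honest leader, agrees within the threshold
    [250.0, 5.0, 8.0],  # leader whose agent disagrees
]
votes = [echo_vote(proposal, local, threshold=0.5) for local in local_strategies]
print(votes, majority(votes))
```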

5. Assessment

Environment: TensorFlow with Python 3.6 on Windows Server 2016.
The generation of new blocks can be modeled as a Poisson process with time-dependent intensity, which means that the decrease in the number of pending transactions is also a Poisson process.
Transaction arrival in the blockchain sharding system is modeled as a Poisson process with arrival rate λ_t = 10000.
The number of nodes in the blockchain changes dynamically, so the change in the number of nodes, N, is assumed to follow a normal distribution with variance σ^2 = 100 and expected value E_n = 0, where N > 0 means new nodes join and N < 0 means nodes leave.
The parameter settings are as follows:
parameter settings
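
The workload model described above can be reproduced with the Python standard library. The sketch below is an illustration, not the authors' simulator: it assumes λ_t = 10000 is the arrival rate per unit of time, draws one node-count change per epoch, and clamps the node count at 1; the function names are hypothetical.

```python
import random


def poisson_arrivals(rate, rng):
    """Count the arrivals of a Poisson process with the given rate in one time unit,
    by summing exponential inter-arrival times."""
    count = 0
    t = rng.expovariate(rate)
    while t < 1.0:
        count += 1
        t += rng.expovariate(rate)
    return count


def simulate(epochs, n0, rate=10000, sigma=10.0, seed=0):
    """Return a list of (transaction arrivals, node count) pairs, one per epoch.

    sigma = 10 corresponds to the variance of 100 given in the notes.
    """
    rng = random.Random(seed)
    n, history = n0, []
    for _ in range(epochs):
        arrivals = poisson_arrivals(rate, rng)
        change = round(rng.gauss(0.0, sigma))  # > 0: nodes join, < 0: nodes leave
        n = max(1, n + change)                 # assumed floor of one node
        history.append((arrivals, n))
    return history


for arrivals, nodes in simulate(epochs=3, n0=1000):
    print(arrivals, nodes)
```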

Comparison schemes:
1) The proposed scheme with a fixed epoch length.
2) The proposed scheme with a fixed number of shards.
3) The proposed scheme with a fixed block size.
Comparison metrics:
Convergence performance, security and latency, and throughput

5.1 Convergence performance

Assessment 1

We can see that the throughput of all schemes increases rapidly from a low level at the beginning of the learning process and levels off after about 5000 training epochs.

5.2 Security and latency

Assessment 2

Set the shard size m = 80, 90, 110, 130, 150, and calculate the probability of a failed system. As shown in the figure, when the number of system nodes is less than 10,000 and the shard size m = 150, the security probability can reach 98%. In addition, it can also be observed that as more and more new nodes join the system, the insecurity probability slowly increases, which means that the blockchain sharding system needs to adjust the shard size when the number of system nodes changes to ensure security.
Assessment 3

Figure 5 shows how the delay, that is, the consensus time within a committee, evolves as the block size gradually increases. The consensus latency depends on both the block size and the shard size. The block size cannot be increased without limit, because larger blocks take longer to be added to the blockchain, which eventually violates one of the constraints.

5.3 Throughput

Assessment 4
Assessment 5

Figures 6 to 11 show the impact of different system parameters on the performance of the blockchain sharding system. The throughput of the DRL-based dynamic sharding framework is compared with the baselines under different consensus-latency thresholds, security parameters, average transaction sizes, transfer rates, initial numbers of nodes, and limits on the number of shards.
Figure 6: throughput remains stable in the fixed block size scheme, while it decreases in the other schemes as the limit ratio increases.
Figure 7 discusses the impact of the security parameter on throughput. The changes become smaller once the security parameter reaches 5, which means that when the shard size is large enough, the system is safe with high probability. The throughput of the scheme with a fixed number of shards changes steadily, because its shard size already ensures a low insecurity probability.
Figure 8 and Figure 9 discuss the impact of transaction size and transfer rate on throughput.
Throughput increases significantly as the transaction size decreases and the transfer rate increases. The reason is that a block can pack more transactions when transactions are smaller, and communication is faster at higher transfer rates. The proposed scheme achieves the highest throughput.
Figure 10 discusses the impact of the initial number of nodes on throughput. Throughput increases as nodes are added and eventually stops increasing because the number of shards is limited (here the maximum number of shards is set to 64).
Figure 11 discusses the impact of the number of shards.
Throughput scales efficiently as the number of shards increases. When the maximum number of shards is 128, the throughput of the proposed scheme reaches 110,000 TPS. At the same limit, the fixed shard number scheme outperforms the fixed epoch length and fixed block size schemes, but its throughput is still lower than that of the proposed scheme.
This result shows that the proposed scheme adapts better to different environments.

6. Summary

The article proposes an adaptive ledger protocol that ensures effective merging or splitting of ledgers based on the results of dynamic sharding without conflict.
SkyChain adopts a DRL-based dynamic sharding method to adjust the epoch length, number of shards and block size to maintain a long-term balance between performance and security.
Outlook:
In future work, the authors plan to consider more factors related to dynamic environments in blockchain sharding systems and to apply the DRL-based dynamic sharding framework to real blockchain systems.

Origin blog.csdn.net/RuRu_Bai/article/details/134692111