Blockchain storage optimization - from MPT tree to KV storage

Advantages and disadvantages of MPT tree storage

A blockchain that stores its state in an MPT (Merkle Patricia Trie) gets the following advantages:

  1. The root hash of the global data can be used for consensus, so any tampering with the data is detected immediately;
  2. All the data as of any historical block can be queried;
  3. Data can be synchronized starting from any specified block because, as noted above, every block gives access to all the data at that time;
  4. Block rollback and replay are easy, which helps when a fork occurs and the node has to switch to the longest chain.

Probably because of these advantages, Ethereum uses an account tree to record the information of all accounts, and each contract account has a storage tree attached that records the contract's data. These two kinds of trees are what Ethereum stores in MPTs.

[Figure: Ethereum MPT storage]

So is there no downside to this storage structure?

Of course there is. One problem is that query latency keeps growing as the amount of data increases. Blockchain services under heavy performance pressure cannot avoid this problem. As the data grows, TPS drops steadily, and contract execution performance degrades the most. Our analysis showed that most of the time is spent reading LevelDB.


Root cause analysis of performance problems

The figure below shows the general structure of an MPT in memory. In a real environment the tree is much deeper because the keys are relatively long. When the application layer reads a key, the storage layer has to read a series of nodes to build the path through the MPT before it finally reaches the value stored in the leaf node. If the value for a key is modified, every node on the path from that leaf node to the root node must be updated and written to disk. In short, the MPT suffers from both read amplification and write amplification.
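The amplification described above can be made concrete with a small counting experiment. The following is only a sketch under simplifying assumptions: it uses a plain hex trie with no path compression or RLP encoding (so it is not a real MPT and overstates the depth), and `CountingStore` is a hypothetical stand-in for LevelDB. All names are made up for this illustration.

```python
import hashlib
import json


class CountingStore:
    """Hypothetical stand-in for LevelDB: a dict that counts reads and writes."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def get(self, key):
        self.reads += 1
        return self.data[key]

    def put(self, key, value):
        self.writes += 1
        self.data[key] = value


def save_node(store, node):
    """Store a node under the hash of its contents and return that hash."""
    encoded = json.dumps(node, sort_keys=True).encode()
    digest = hashlib.sha256(encoded).hexdigest()
    store.put(digest, encoded)
    return digest


def trie_put(store, root, nibbles, value):
    """Set `value` at the path `nibbles` and return the new root hash.

    Every node on the path is read, re-hashed, and written again.
    """
    node = json.loads(store.get(root)) if root else {"children": {}, "value": None}
    if not nibbles:
        node["value"] = value
    else:
        child = node["children"].get(nibbles[0])
        node["children"][nibbles[0]] = trie_put(store, child, nibbles[1:], value)
    return save_node(store, node)


def trie_get(store, root, nibbles):
    """Walk from the root to the leaf, one store read per level."""
    node = json.loads(store.get(root))
    for nibble in nibbles:
        node = json.loads(store.get(node["children"][nibble]))
    return node["value"]


def measure(key_nibbles):
    """Return (reads, writes) caused by one logical update of an existing key."""
    store = CountingStore()
    root = trie_put(store, None, key_nibbles, "v1")
    store.reads = store.writes = 0
    trie_put(store, root, key_nibbles, "v2")
    return store.reads, store.writes
```

With an 8-nibble key, `measure("a1b2c3d4")` reports 9 reads and 9 writes for a single update, where a direct KV store would need one write. Longer keys make the gap proportionally larger in this uncompressed sketch.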

[Figure: MPT tree]


Solution Discussion

To solve the read-write amplification problem, we tried using key-value storage directly (hereinafter KV storage), that is, we no longer build an MPT. Read and write amplification disappear, but the following new problems are introduced:

  1. The hash of the MPT root node used to be used for consensus. What is used for consensus now?
  2. MPT storage can serve historical data, but direct KV storage keeps no historical data. What should be done about that?
  3. If a KV storage node loses power after writing only part of the data produced by a block, how can it continue from the previous block after restarting?
  4. When a chain using a PoW-like probabilistic consensus algorithm switches its main chain, how can a node replay the longest chain from historical blocks?

Below are our thoughts on these questions and the details we noticed during implementation.

The first question: the hash of the MPT root node used to be used for consensus. What is used for consensus now?

With MPT, consensus covers all of the data. With KV storage, consensus covers the ordered set of data modified by executing the block. This is similar to the difference between a full backup and an incremental backup: KV storage uses the "incremental" part produced by each block for consensus. A KV storage node can still verify every block and all data starting from the genesis block. The detail to watch during implementation is the "ordered set". When the nodes of the blockchain network apply a block's modifications, they must all follow the same rule when hashing the changed data for consensus. The rule we used in our test code is to sort the modified key-value pairs by key. This change greatly improves contract execution performance, and TPS can increase by about 3 times.
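As a rough illustration of hashing the ordered change set, here is a minimal sketch. The choice of SHA-256, the length-prefixed encoding, and the name `change_set_hash` are assumptions made for this example; the original only states that the modified key-value pairs are sorted by key. Deleted keys are not handled here.

```python
import hashlib


def change_set_hash(changes):
    """Hash the key-value pairs modified by one block, sorted by key.

    `changes` maps bytes keys to bytes values. Sorting makes the result
    independent of the order in which a node recorded the modifications.
    """
    hasher = hashlib.sha256()
    for key in sorted(changes):
        value = changes[key]
        # Length prefixes keep (b"ab", b"c") distinct from (b"a", b"bc").
        hasher.update(len(key).to_bytes(4, "big") + key)
        hasher.update(len(value).to_bytes(4, "big") + value)
    return hasher.hexdigest()
```

Two nodes that recorded the same modifications in a different order, for example `{b"b": b"2", b"a": b"1"}` and `{b"a": b"1", b"b": b"2"}`, get the same digest, while any difference in a key or value changes it.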

The second question: MPT storage can serve historical data, but direct KV storage keeps no historical data. What should be done about that?

A "full node" that provides historical data query services can be built alongside KV storage. Concretely, the full node provides a mapping from each block's KV consensus hash to the MPT root hash. In other words, the full node still uses MPT storage, but it also builds the ordered KV set in memory so that it can take part in consensus.

The third question: if a KV storage node loses power after writing only part of the data produced by a block, how can it continue from the previous block after restarting?

The point of this question is that the set of modifications produced by a block is an atomic transaction: either all of them are applied, or none of them are. While the modification set of a block is being written in batches, the program may crash or be killed, or the machine room may lose power. Such situations are unavoidable, and a KV storage node has no historical data, so some mechanism must ensure that the node can return to the state after the previous block executed successfully. The solution is to write the old state of the affected data to a WAL (write-ahead log) file before updating the data. When the node restarts and finds that execution of the block failed, it can restore the previous state and then re-execute the block at that height. Once execution succeeds, the associated block execution failure flag can be cleared.
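The recovery procedure above might look roughly like the following sketch. It rests on several assumptions: the `state` dict stands in for the on-disk KV database, the WAL is a single JSON file, the presence of that file plays the role of the "execution failed" flag, and `crash_after` exists only to simulate a power failure. None of these names come from the original implementation.

```python
import json
import os


class KVNode:
    """Toy KV storage node; `state` stands in for the on-disk KV database."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.state = {}

    def apply_block(self, height, changes, crash_after=None):
        # 1. Persist the old values of every key the block touches.
        #    A missing key is recorded as None.
        undo = {"height": height,
                "old": {k: self.state.get(k) for k in changes}}
        with open(self.wal_path, "w") as f:
            json.dump(undo, f)
            f.flush()
            os.fsync(f.fileno())
        # 2. Apply the changes; crash_after simulates a power failure.
        for i, (k, v) in enumerate(changes.items()):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("simulated power failure")
            self.state[k] = v
        # 3. The block is fully applied, so clear the failure marker.
        os.remove(self.wal_path)

    def recover(self):
        """On restart, undo a partially applied block if a WAL file exists."""
        if not os.path.exists(self.wal_path):
            return None
        with open(self.wal_path) as f:
            undo = json.load(f)
        for k, old in undo["old"].items():
            if old is None:
                self.state.pop(k, None)   # key did not exist before the block
            else:
                self.state[k] = old
        os.remove(self.wal_path)
        return undo["height"]   # the caller re-executes the block at this height
```

A node would call `recover()` once at startup. If it returns a height, the state is back to what it was after the previous block, and the block at that height is executed again.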

The last question: when a chain using a PoW-like probabilistic consensus algorithm switches its main chain, how can a node replay the longest chain from historical blocks?

If you use a consensus algorithm with absolute finality such as PBFT, as we do, this problem does not exist. Of course, not every blockchain can simply switch its consensus algorithm, so the question is still worth discussing. The solution is similar to the one for the third question. One WAL can return the node to the state of the previous block, so a series of WALs can return it to the state several blocks back. You only need to go back to the most recent common ancestor block of the two chains and then execute the blocks of the other chain in order to switch to the longest chain. Chains that use a probabilistic consensus algorithm may run into other problems when using KV storage with replay, and we have not considered them. If anyone is interested in adapting KV storage for chains that need replay, you are welcome to share your experience.
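The idea of rolling back through a series of WALs can be sketched as follows. This is an illustration only: the undo records are kept in an in-memory list rather than in WAL files, and `ReorgableState`, `rollback_to`, and `switch_chain` are hypothetical names. If the ancestor id is not found, this sketch rolls back every block.

```python
class ReorgableState:
    """KV state that keeps one undo record per applied block."""

    def __init__(self):
        self.state = {}
        self.undo_log = []   # one (block_id, old_values) entry per block

    def apply_block(self, block_id, changes):
        # Record the old values first (None means the key was absent).
        old = {k: self.state.get(k) for k in changes}
        self.undo_log.append((block_id, old))
        self.state.update(changes)

    def rollback_to(self, ancestor_id):
        """Undo blocks, newest first, until ancestor_id is the tip."""
        while self.undo_log and self.undo_log[-1][0] != ancestor_id:
            _, old = self.undo_log.pop()
            for k, v in old.items():
                if v is None:
                    self.state.pop(k, None)
                else:
                    self.state[k] = v

    def switch_chain(self, ancestor_id, new_blocks):
        """Roll back to the common ancestor, then replay the other branch."""
        self.rollback_to(ancestor_id)
        for block_id, changes in new_blocks:
            self.apply_block(block_id, changes)
```

For example, a node that has applied blocks A, B1, B2 and then learns of a longer branch A, C1, C2, C3 would call `switch_chain("A", [...])` with the three C blocks and their change sets.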


Summary

Replacing MPT storage with KV storage greatly improves performance, but you also need to consider the problems it introduces and have suitable solutions for them.

Original link: https://zhuanlan.zhihu.com/p/75953913 


Origin blog.csdn.net/lingshengxiyou/article/details/128104613