In-depth explanation of data storage in Ethereum

Foreword: This article explains the overall structure of Ethereum and its storage-related parts in simple terms. It focuses on storage and walks through the relevant source code along the way, which also gives a feel for the subtlety of the design.

One, block

The block is one of the most important data structures. It consists mainly of a header and a body.

1. Block source code (some important fields)

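type Block struct {
	header       *Header      // block header
	uncles       []*Header    // uncle block headers
	transactions Transactions // transactions in the block
	hash         atomic.Value // cached block hash
	size         atomic.Value // cached block size
	td           *big.Int     // total difficulty: the sum of Difficulty over all blocks up to this one
	ReceivedAt   time.Time
	ReceivedFrom interface{}
}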
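
1.1, header

type Header struct {
	ParentHash  common.Hash    // hash of the parent block (the link to the parent)
	UncleHash   common.Hash    // hash of the RLP-encoded uncle array of the block
	Coinbase    common.Address // address of the miner who mined this block
	Root        common.Hash    // root hash of the state trie in StateDB
	TxHash      common.Hash    // root hash of the tx trie
	ReceiptHash common.Hash    // root hash of the receipt trie
	Bloom       Bloom          // Bloom filter, used to test whether a Log object is present
	Difficulty  *big.Int       // difficulty
	Number      *big.Int       // block number
	GasLimit    uint64         // upper limit on the gas that all transactions in the block may consume
	GasUsed     uint64         // total gas consumed in the block
	Time        *big.Int       // time at which the block should be created
	Nonce       BlockNonce     // value required for mining
}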

1.2, body

type Body struct {
	Transactions []*Transaction // transactions in the block
	Uncles       []*Header      // uncle block headers
}

Two, the MPT (Merkle Patricia Trie)

Reading the source code is the most direct way in, so let's first look at the fields of the Trie structure.

1. Trie

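type Trie struct {
	root         node        // root node
	db           Database    // database-related; described in more detail below
	originalRoot common.Hash // needed when the trie is first created
	cachegen, cachelimit uint16 // cache generation counter, incremented each time changes to the Trie are committed
}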

From the above we can see that the root field has type node, so let's look at the types that implement node.

2. Implementations of node

type (
	fullNode struct {
		Children [17]node
		flags    nodeFlag
	}
	shortNode struct {
		Key   []byte
		Val   node
		flags nodeFlag
	}
	hashNode  []byte
	valueNode []byte
)

(1) fullNode

A branch node that can have multiple children. Children is a node array of length 17: the first 16 slots correspond to the 16 hexadecimal digits, and a child is placed in the slot that matches the next nibble of its key. The 17th slot holds the value of a key that ends at this node.

(2) shortNode

A node with exactly one child. Key holds the key segment covered by this node, and Val points to the child, which is either another node or a valueNode.

(3) valueNode

A leaf that carries the stored data itself (typically the RLP-encoded value). It has no children.

(4) hashNode

The hash of the RLP encoding of a fullNode or shortNode. It stands in for a node that has not yet been loaded from the database. A fullNode or shortNode also holds its own hash indirectly, as a member of its nodeFlag structure (nodeFlag.hash).
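To make the role of the 17 slots concrete, here is a minimal sketch of a trie built only from fullNode and valueNode. It is a simplification, not go-ethereum's implementation: there is no shortNode path compression and no hashing, and the helpers toNibbles, insert and get are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// node is a stand-in for the trie's node interface.
type node interface{}

// fullNode has one child slot per nibble (0..15) plus slot 16,
// which holds the value of a key that ends at this node.
type fullNode struct {
	Children [17]node
}

// valueNode carries the stored value bytes.
type valueNode []byte

// toNibbles splits every key byte into its high and low 4-bit halves.
func toNibbles(key []byte) []byte {
	out := make([]byte, 0, len(key)*2)
	for _, b := range key {
		out = append(out, b>>4, b&0x0f)
	}
	return out
}

// insert walks the key one nibble at a time, creating branch nodes
// as needed, and stores the value in slot 16 of the last branch.
func insert(root *fullNode, key, value []byte) {
	cur := root
	for _, nib := range toNibbles(key) {
		child, ok := cur.Children[nib].(*fullNode)
		if !ok {
			child = &fullNode{}
			cur.Children[nib] = child
		}
		cur = child
	}
	cur.Children[16] = valueNode(value)
}

// get follows the same path and reports whether a value is stored there.
func get(root *fullNode, key []byte) ([]byte, bool) {
	cur := root
	for _, nib := range toNibbles(key) {
		child, ok := cur.Children[nib].(*fullNode)
		if !ok {
			return nil, false
		}
		cur = child
	}
	v, ok := cur.Children[16].(valueNode)
	return v, ok
}

func main() {
	root := &fullNode{}
	insert(root, []byte("do"), []byte("verb"))
	insert(root, []byte("dog"), []byte("puppy"))

	v, ok := get(root, []byte("dog"))
	fmt.Println(string(v), ok) // puppy true

	_, ok = get(root, []byte("d")) // the path exists, but no value ends here
	fmt.Println(ok)                // false
}
```

Because "do" is a prefix of "dog", the value for "do" sits in slot 16 of a branch that also has children. A real MPT would additionally collapse long single-child paths into shortNodes.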

3. Key encoding

Next, let's see how keys are encoded in the MPT. encoding.go defines three encodings:

(1) KEYBYTES:

The raw key (a []byte), with no special encoding.

(2) HEX:

[Figure: HEX encoding of a key]

Each byte of the key is split into its upper 4 bits and lower 4 bits, and each half (a nibble) is stored in a byte of its own. A terminator byte with the value 16 is then appended to mark that the key ends at a node holding a value. Every element of the encoded key is therefore a hexadecimal digit that can be used as an index into the Children array of the fullNode described above.
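The nibble split can be sketched in a few lines. The function below is modeled on the keybytesToHex helper in go-ethereum's encoding.go, written from the description here rather than copied, so treat it as an illustration. It assumes the trailing marker is the value 16.

```go
package main

import "fmt"

// keybytesToHex expands each key byte into two nibbles and appends
// the terminator value 16 at the end.
func keybytesToHex(key []byte) []byte {
	l := len(key)*2 + 1
	nibbles := make([]byte, l)
	for i, b := range key {
		nibbles[i*2] = b / 16   // upper 4 bits
		nibbles[i*2+1] = b % 16 // lower 4 bits
	}
	nibbles[l-1] = 16 // terminator
	return nibbles
}

func main() {
	fmt.Println(keybytesToHex([]byte{0x12, 0xab})) // [1 2 10 11 16]
}
```

A two-byte key becomes five elements: four nibbles, each small enough to index a fullNode slot, plus the terminator.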

(3) COMPACT:

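[Figure: COMPACT encoding of a key]

COMPACT packs a HEX key back into bytes. The high nibble of the first byte carries two flags: one records whether the key had a terminator, the other whether the number of nibbles is odd. When the count is odd, the low nibble of the first byte holds the first nibble of the key; the remaining nibbles are packed two per byte. Let's see how HEX is converted to COMPACT:
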

func hexToCompact(hex []byte) []byte {
	terminator := byte(0)
	// check whether the key ends in a real value (has a terminator)
	if hasTerm(hex) {
		terminator = 1
		hex = hex[:len(hex)-1] // cut off the tail of the HEX key
	}
	buf := make([]byte, len(hex)/2+1)
	buf[0] = terminator << 5 // the flag byte
	if len(hex)&1 == 1 { // the effective length is odd
		buf[0] |= 1 << 4 // odd flag
		buf[0] |= hex[0] // first nibble is contained in the first byte
		hex = hex[1:]
	}
	decodeNibbles(hex, buf[1:])
	return buf
}
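
To try the conversion on concrete keys, the function can be made runnable by adding the two helpers it calls. The bodies of hasTerm and decodeNibbles below are written from how hexToCompact uses them (terminator value 16, two nibbles packed per byte). They are a sketch and may differ in detail from the versions in encoding.go.

```go
package main

import "fmt"

// hasTerm reports whether a HEX key ends with the terminator value 16.
func hasTerm(hex []byte) bool {
	return len(hex) > 0 && hex[len(hex)-1] == 16
}

// decodeNibbles packs pairs of nibbles into bytes.
// len(nibbles) must be even and out must hold len(nibbles)/2 bytes.
func decodeNibbles(nibbles, out []byte) {
	for bi, ni := 0, 0; ni < len(nibbles); bi, ni = bi+1, ni+2 {
		out[bi] = nibbles[ni]<<4 | nibbles[ni+1]
	}
}

func hexToCompact(hex []byte) []byte {
	terminator := byte(0)
	if hasTerm(hex) {
		terminator = 1
		hex = hex[:len(hex)-1]
	}
	buf := make([]byte, len(hex)/2+1)
	buf[0] = terminator << 5 // terminator flag
	if len(hex)&1 == 1 {
		buf[0] |= 1 << 4 // odd flag
		buf[0] |= hex[0] // first nibble shares the flag byte
		hex = hex[1:]
	}
	decodeNibbles(hex, buf[1:])
	return buf
}

func main() {
	fmt.Printf("%x\n", hexToCompact([]byte{1, 2, 3, 4, 5, 16})) // 312345: terminator, odd
	fmt.Printf("%x\n", hexToCompact([]byte{1, 2, 3, 4, 16}))    // 201234: terminator, even
	fmt.Printf("%x\n", hexToCompact([]byte{1, 2, 3}))           // 1123: no terminator, odd
	fmt.Printf("%x\n", hexToCompact([]byte{1, 2}))              // 0012: no terminator, even
}
```

The four cases cover every combination of the two flags, so the first hex digit of the output is always 0, 1, 2 or 3.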

Three, storage

Everything above was only a brief introduction; storage is the focus of this article. Next we will look at how the various kinds of data are stored. The database used by Ethereum is LevelDB.

(1) header and block storage

headerPrefix        = []byte("h") // headerPrefix + num (uint64 big endian) + hash -> header
tdSuffix            = []byte("t") // headerPrefix + num (uint64 big endian) + hash + tdSuffix -> td
numSuffix           = []byte("n") // headerPrefix + num (uint64 big endian) + numSuffix -> hash
blockHashPrefix     = []byte("H") // blockHashPrefix + hash -> num (uint64 big endian)
bodyPrefix          = []byte("b") // bodyPrefix + num (uint64 big endian) + hash -> block body
blockReceiptsPrefix = []byte("r") // blockReceiptsPrefix + num (uint64 big endian) + hash -> block receipts
lookupPrefix        = []byte("l") // lookupPrefix + hash -> transaction/receipt lookup metadata
bloomBitsPrefix     = []byte("B") // bloomBitsPrefix + bit (uint16 big endian) + section (uint64 big endian) + hash -> bloom bits

The comments in the code above give the key-to-value rules used for storage. Two fields appear throughout: num is the block number (uint64, big-endian) and hash is the block hash. For example, a header is stored under the key headerPrefix + num + hash.
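The key layout can be sketched with the standard library alone. The helper names below (encodeBlockNumber, concat, headerKey, canonicalHashKey, bodyKey) are chosen for illustration and are assumptions; only the prefixes and the layout come from the schema comments above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

var (
	headerPrefix = []byte("h")
	numSuffix    = []byte("n")
	bodyPrefix   = []byte("b")
)

// encodeBlockNumber returns the 8-byte big-endian form of a block number.
func encodeBlockNumber(number uint64) []byte {
	enc := make([]byte, 8)
	binary.BigEndian.PutUint64(enc, number)
	return enc
}

// concat copies its parts into a fresh slice, so the shared prefix
// slices are never modified.
func concat(parts ...[]byte) []byte {
	var out []byte
	for _, p := range parts {
		out = append(out, p...)
	}
	return out
}

// headerKey = headerPrefix + num + hash
func headerKey(number uint64, hash [32]byte) []byte {
	return concat(headerPrefix, encodeBlockNumber(number), hash[:])
}

// canonicalHashKey = headerPrefix + num + numSuffix
func canonicalHashKey(number uint64) []byte {
	return concat(headerPrefix, encodeBlockNumber(number), numSuffix)
}

// bodyKey = bodyPrefix + num + hash
func bodyKey(number uint64, hash [32]byte) []byte {
	return concat(bodyPrefix, encodeBlockNumber(number), hash[:])
}

func main() {
	hash := [32]byte{0xde, 0xad}
	fmt.Println(len(headerKey(1, hash)))       // 41 = 1 + 8 + 32
	fmt.Printf("%x\n", canonicalHashKey(256))  // 6800000000000001006e
	fmt.Printf("%c\n", bodyKey(1, hash)[0])    // b
}
```

Big-endian numbers make keys with the same prefix sort by block number, which suits an ordered key-value store such as LevelDB.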

One point needs special attention here: the ParentHash of a Header cannot be modified after the fact, so before writing a Header to the database we must make sure that its parent, and the parent's parent in turn, have already been written.
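The parent-first rule can be illustrated with a toy in-memory store. Everything in this sketch is hypothetical (the header and headerStore types, the write method, the precomputed Hash field). It only shows the ordering check, not how go-ethereum enforces it.

```go
package main

import (
	"errors"
	"fmt"
)

type hash [32]byte

// header is a pared-down stand-in for the Header struct.
type header struct {
	ParentHash hash
	Number     uint64
	Hash       hash // precomputed here for brevity
}

// headerStore is a map-backed stand-in for the database.
type headerStore struct {
	byHash map[hash]header
}

var errUnknownParent = errors.New("unknown parent header")

// write refuses to store a non-genesis header whose parent is missing,
// so every stored header always links back to a stored ancestor.
func (s *headerStore) write(h header) error {
	if h.Number > 0 {
		if _, ok := s.byHash[h.ParentHash]; !ok {
			return errUnknownParent
		}
	}
	s.byHash[h.Hash] = h
	return nil
}

func main() {
	store := &headerStore{byHash: make(map[hash]header)}
	genesis := header{Number: 0, Hash: hash{0x01}}
	child := header{ParentHash: genesis.Hash, Number: 1, Hash: hash{0x02}}

	fmt.Println(store.write(child))   // unknown parent header
	fmt.Println(store.write(genesis)) // <nil>
	fmt.Println(store.write(child))   // <nil>
}
```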

(2) Transaction Storage

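Let's look at the code. For every transaction in a block, a lookup entry recording the block hash, the block number and the index of the transaction within the block is RLP-encoded and stored under lookupPrefix + transaction hash: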

func WriteTxLookupEntries(db ethdb.Putter, block *types.Block) error {
	// Iterate over each transaction and encode its metadata
	for i, tx := range block.Transactions() {
		entry := TxLookupEntry{
			BlockHash:  block.Hash(),
			BlockIndex: block.NumberU64(),
			Index:      uint64(i),
		}
		data, err := rlp.EncodeToBytes(entry)
		if err != nil {
			return err
		}
		if err := db.Put(append(lookupPrefix, tx.Hash().Bytes()...), data); err != nil {
			return err
		}
	}
	return nil
}
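
Here is a self-contained sketch of the same write path together with the matching read. Two things are swapped in so that it runs with the standard library only: a map stands in for the database, and a fixed-width 48-byte encoding stands in for RLP. The lowercase function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

var lookupPrefix = []byte("l")

// txLookupEntry mirrors the fields written by WriteTxLookupEntries.
type txLookupEntry struct {
	BlockHash  [32]byte
	BlockIndex uint64 // block number
	Index      uint64 // position of the transaction in the block
}

// encode is a stand-in for RLP: 32-byte hash + two big-endian uint64s.
func (e txLookupEntry) encode() []byte {
	buf := make([]byte, 48)
	copy(buf, e.BlockHash[:])
	binary.BigEndian.PutUint64(buf[32:], e.BlockIndex)
	binary.BigEndian.PutUint64(buf[40:], e.Index)
	return buf
}

// decodeEntry reverses encode and rejects data of the wrong length.
func decodeEntry(data []byte) (txLookupEntry, bool) {
	if len(data) != 48 {
		return txLookupEntry{}, false
	}
	var e txLookupEntry
	copy(e.BlockHash[:], data[:32])
	e.BlockIndex = binary.BigEndian.Uint64(data[32:40])
	e.Index = binary.BigEndian.Uint64(data[40:48])
	return e, true
}

// lookupKey = lookupPrefix + transaction hash
func lookupKey(txHash [32]byte) string {
	return string(lookupPrefix) + string(txHash[:])
}

func writeTxLookupEntries(db map[string][]byte, blockHash [32]byte, number uint64, txHashes [][32]byte) {
	for i, txHash := range txHashes {
		entry := txLookupEntry{BlockHash: blockHash, BlockIndex: number, Index: uint64(i)}
		db[lookupKey(txHash)] = entry.encode()
	}
}

func readTxLookupEntry(db map[string][]byte, txHash [32]byte) (txLookupEntry, bool) {
	data, ok := db[lookupKey(txHash)]
	if !ok {
		return txLookupEntry{}, false
	}
	return decodeEntry(data)
}

func main() {
	db := map[string][]byte{}
	blockHash := [32]byte{0xb1}
	txs := [][32]byte{{0x01}, {0x02}}
	writeTxLookupEntries(db, blockHash, 7, txs)

	entry, ok := readTxLookupEntry(db, txs[1])
	fmt.Println(entry.BlockIndex, entry.Index, ok) // 7 1 true
}
```

The entry holds only the location of the transaction, not the transaction itself. A reader uses the block hash and number to fetch the block body, then picks the transaction at Index.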

(3) StateDB module

In Ethereum, an account is represented by a stateObject, and all accounts are managed by StateDB. StateDB has a member called trie that stores the stateObjects; each stateObject has a 20-byte address, which serves as its key. Each time before the transactions of a block are executed, the trie is restored from a hash value (a hashNode). StateDB also holds a map that stores stateObjects, keyed by the address of each stateObject.

The map therefore acts as a local first-level cache and the trie as a second-level cache, while the underlying database is the third level, where the data is actually persisted.
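
The layered read path can be sketched as follows. Plain maps stand in for both the trie and the database, the account type is cut down to two fields, and getAccount and diskReads are hypothetical names, so this shows only the lookup order, not go-ethereum's StateDB.

```go
package main

import "fmt"

type address [20]byte

// account is a cut-down stand-in; go-ethereum keeps the balance as *big.Int.
type account struct {
	Nonce   uint64
	Balance uint64
}

type stateDB struct {
	stateObjects map[address]*account // level 1: objects already loaded in memory
	trie         map[address]account  // level 2: stand-in for the state trie
	disk         map[address]account  // level 3: stand-in for the database
	diskReads    int                  // counts how often level 3 was reached
}

// getAccount checks the map first, then the trie, then the database,
// and fills the upper levels on the way back.
func (s *stateDB) getAccount(addr address) (*account, bool) {
	if obj, ok := s.stateObjects[addr]; ok {
		return obj, true
	}
	acct, ok := s.trie[addr]
	if !ok {
		acct, ok = s.disk[addr]
		if !ok {
			return nil, false
		}
		s.diskReads++
		s.trie[addr] = acct
	}
	obj := &acct
	s.stateObjects[addr] = obj
	return obj, true
}

func main() {
	addr := address{0xaa}
	s := &stateDB{
		stateObjects: map[address]*account{},
		trie:         map[address]account{},
		disk:         map[address]account{addr: {Nonce: 1, Balance: 100}},
	}
	s.getAccount(addr) // falls through to the database
	s.getAccount(addr) // served from the map
	fmt.Println(s.diskReads) // 1
}
```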

 

(4) Storage account (stateObject)

Each stateObject corresponds to one account. The Account holds data such as the balance and the nonce (a counter of the transactions the account has initiated), and the stateObject also contains a trie (the storage trie) that stores the account's State data.
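
A rough sketch of how an account and its storage fit together is shown below. It assumes an Account with Nonce, Balance, Root and CodeHash fields, uses a map in place of the storage trie, and uses SHA-256 over the sorted entries in place of a real Merkle Patricia root. The setState and updateRoot methods are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

type hash [32]byte

// account is a simplified stand-in for the Account data of a stateObject.
type account struct {
	Nonce    uint64
	Balance  uint64 // *big.Int in go-ethereum
	Root     hash   // commitment to this account's storage
	CodeHash []byte
}

type stateObject struct {
	data    account
	storage map[hash]hash // stand-in for the storage trie
}

// updateRoot recomputes data.Root from the storage contents.
// Sorting the keys makes the result independent of write order.
func (s *stateObject) updateRoot() {
	keys := make([]hash, 0, len(s.storage))
	for k := range s.storage {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		return string(keys[i][:]) < string(keys[j][:])
	})
	h := sha256.New()
	for _, k := range keys {
		v := s.storage[k]
		h.Write(k[:])
		h.Write(v[:])
	}
	copy(s.data.Root[:], h.Sum(nil))
}

// setState writes one storage slot and refreshes the root.
func (s *stateObject) setState(key, value hash) {
	s.storage[key] = value
	s.updateRoot()
}

func main() {
	a := &stateObject{storage: map[hash]hash{}}
	b := &stateObject{storage: map[hash]hash{}}

	a.setState(hash{1}, hash{10})
	a.setState(hash{2}, hash{20})
	b.setState(hash{2}, hash{20})
	b.setState(hash{1}, hash{10})

	fmt.Println(a.data.Root == b.data.Root) // true: same storage, same root
}
```

The point of the Root field is that the account record commits to its whole storage, so any change to a storage slot changes the account and, through the state trie, the Root in the block header.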

 

 

Four, takeaways

Working through this gives not only a better understanding of how Ethereum stores data, but also ideas in system design that are worth borrowing, such as multi-level caching and the way data is laid out in storage.


Author: wacxt
Link: https://juejin.im/post/5a4f3aa9f265da3e5468e08e
Source: Juejin (Nuggets)
The copyright belongs to the author. For commercial reprints, please contact the author for authorization, and for non-commercial reprints, please indicate the source.
