Foreword: This article walks through the overall structure of Ethereum in plain terms, with a focus on how it stores data, and explains each part alongside the source code. Reading the code along the way also gives a feel for the care that went into the design.
One, block
The block is one of the most important data structures in Ethereum. It consists mainly of a header and a body.
1. Block source code (some important fields)
type Block struct {
	header       *Header      // block header
	uncles       []*Header    // uncle block headers
	transactions Transactions // transactions in the block
	hash         atomic.Value
	size         atomic.Value
	td           *big.Int     // total difficulty: the sum of Difficulty over all blocks
	ReceivedAt   time.Time
	ReceivedFrom interface{}
}
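1.1 header
type Header struct {
	ParentHash  common.Hash    // hash of the parent block (the pointer to the parent)
	UncleHash   common.Hash    // RLP hash of the block's uncle array
	Coinbase    common.Address // address of the miner who mined this block
	Root        common.Hash    // RLP hash of the root node of the state trie in StateDB
	TxHash      common.Hash    // hash of the root node of the tx trie
	ReceiptHash common.Hash    // hash of the root node of the receipt trie
	Bloom       Bloom          // Bloom filter, used to test whether a Log object may exist
	Difficulty  *big.Int       // difficulty
	Number      *big.Int       // block number
	GasLimit    uint64         // theoretical upper bound on the gas consumed within the block
	GasUsed     uint64         // total gas consumed within the block
	Time        *big.Int       // block creation timestamp
	Nonce       BlockNonce     // value required for mining
}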
1.2 body
type Body struct {
	Transactions []*Transaction // transactions in the block
	Uncles       []*Header      // uncle block headers
}
Two, the MPT (Merkle Patricia Trie)
Reading the source code is usually the best way to learn, so let's start with the fields of the Trie structure.
1. Trie
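type Trie struct {
	root         node        // root node
	db           Database    // backing database, covered in more detail below
	originalRoot common.Hash // needed when the trie is first created
	cachegen, cachelimit uint16 // cache generation counter, incremented each time a change to the Trie is committed
}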
As shown above, the root is of type node, so let's look at the types that implement node.
2. The implementations of node
type (
	fullNode struct {
		Children [17]node
		flags    nodeFlag
	}
	shortNode struct {
		Key   []byte
		Val   node
		flags nodeFlag
	}
	hashNode  []byte
	valueNode []byte
)
(1) fullNode
A branch node that can have multiple children. It holds a node array of length 17. The first 16 slots correspond to the 16 hexadecimal digits (nibbles), and a child is placed in the slot selected by the next nibble of its key. The 17th slot holds the value of a key that ends at this node.
(2) shortNode
A node with exactly one child. Its Key holds the run of nibbles leading to that child, and its member variable Val points to the child, which is either another node or a valueNode.
(3) valueNode
A leaf node. It carries the stored data itself, that is, the RLP-encoded value that belongs to the key.
(4) hashNode
The hash of the RLP encoding of a fullNode or shortNode object. A fullNode or shortNode holds its own hash indirectly, as a member of its nodeFlag structure (nodeFlag.hash). A hashNode can also stand in for a child that has not yet been loaded from the database.
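To make the roles of the node types more concrete, here is a minimal lookup sketch. It is not the go-ethereum implementation: nodeFlag and hashNode resolution are left out, node is reduced to an empty interface, and get and buildExample are hypothetical names used only for this illustration. Keys are given as nibbles ending in the terminator 16 (the HEX form described in the next section).

```go
package main

import (
	"bytes"
	"fmt"
)

// node stands in for the trie's node type (assumption: an empty interface
// is enough for this sketch).
type node interface{}

type (
	fullNode struct {
		Children [17]node // slots 0-15: one per nibble; slot 16: value of a key ending here
	}
	shortNode struct {
		Key []byte // run of nibbles, possibly ending in the terminator 16
		Val node
	}
	valueNode []byte
)

// get follows key from node n, starting at position pos.
// It assumes a valid HEX key (nibbles 0-15 plus a trailing 16)
// and returns nil when nothing is stored under the key.
func get(n node, key []byte, pos int) []byte {
	switch n := n.(type) {
	case valueNode:
		return n
	case *shortNode:
		// The whole Key must match the next nibbles of the lookup key.
		if len(key)-pos < len(n.Key) || !bytes.Equal(n.Key, key[pos:pos+len(n.Key)]) {
			return nil
		}
		return get(n.Val, key, pos+len(n.Key))
	case *fullNode:
		if pos >= len(key) {
			return nil
		}
		// The next nibble selects the child slot.
		return get(n.Children[key[pos]], key, pos+1)
	default:
		return nil // empty slot
	}
}

// buildExample builds a tiny trie by hand holding three keys:
// [1 2] -> "a", [1 3] -> "b" and [1] -> "c".
func buildExample() node {
	branch := &fullNode{}
	branch.Children[2] = &shortNode{Key: []byte{16}, Val: valueNode("a")}
	branch.Children[3] = &shortNode{Key: []byte{16}, Val: valueNode("b")}
	branch.Children[16] = valueNode("c")
	return &shortNode{Key: []byte{1}, Val: branch}
}

func main() {
	root := buildExample()
	fmt.Printf("%q\n", get(root, []byte{1, 2, 16}, 0)) // "a"
	fmt.Printf("%q\n", get(root, []byte{1, 16}, 0))    // "c"
	fmt.Println(get(root, []byte{1, 4, 16}, 0) == nil) // true
}
```

The shortNode at the root compresses the shared prefix [1], the fullNode fans out on the next nibble, and slot 16 is what lets the shorter key [1] keep a value even though longer keys pass through the same node.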
3. Encode the key
Next, let's see how keys are encoded in the MPT. In encoding.go we can see that there are three encodings:
(1) KEYBYTES:
The real key (a []byte), with no special meaning.
(2) HEX:
The upper 4 bits and the lower 4 bits of each byte are stored in two separate bytes (each group of 4 bits is a nibble). A terminator byte with the value 16 may be appended at the end to mark that the key ends in a real value. In this form every element of the key is a single hexadecimal digit, so it can be used directly as an index into the Children array of the fullNode described above.
(3) COMPACT:
The nibbles are packed back two per byte, preceded by a flag byte. In the flag byte, bit 5 records whether the HEX key carried the terminator and bit 4 records whether the number of nibbles is odd. When it is odd, the first nibble is stored in the lower 4 bits of the flag byte. Here is how HEX is converted to COMPACT:
func hexToCompact(hex []byte) []byte {
	terminator := byte(0)
	// check whether the key ends in a real value (carries the terminator)
	if hasTerm(hex) {
		terminator = 1
		hex = hex[:len(hex)-1] // cut off the tail of the HEX key
	}
	buf := make([]byte, len(hex)/2+1)
	buf[0] = terminator << 5 // the flag byte
	if len(hex)&1 == 1 { // the effective length is odd
		buf[0] |= 1 << 4 // odd flag
		buf[0] |= hex[0] // first nibble is contained in the first byte
		hex = hex[1:]
	}
	decodeNibbles(hex, buf[1:])
	return buf
}
Three, storage
Everything so far was only a brief introduction. Storage is the focus of this article, so next we will look at how each kind of data is stored. The database used in Ethereum is LevelDB.
(1) Header and block storage
headerPrefix        = []byte("h") // headerPrefix + num (uint64 big endian) + hash -> header
tdSuffix            = []byte("t") // headerPrefix + num (uint64 big endian) + hash + tdSuffix -> td
numSuffix           = []byte("n") // headerPrefix + num (uint64 big endian) + numSuffix -> hash
blockHashPrefix     = []byte("H") // blockHashPrefix + hash -> num (uint64 big endian)
bodyPrefix          = []byte("b") // bodyPrefix + num (uint64 big endian) + hash -> block body
blockReceiptsPrefix = []byte("r") // blockReceiptsPrefix + num (uint64 big endian) + hash -> block receipts
lookupPrefix        = []byte("l") // lookupPrefix + hash -> transaction/receipt lookup metadata
bloomBitsPrefix     = []byte("B") // bloomBitsPrefix + bit (uint16 big endian) + section (uint64 big endian) + hash -> bloom bits
The comments in the code above give the key-to-value rule for each kind of record. Two fields need explaining. num: the block number (a uint64 in big-endian format); hash: the block hash.
One point needs special attention here: a Header's pointer to its parent (ParentHash) cannot be modified, so when writing a Header to the database we must first make sure that its parent, and the parent of that parent, have already been written.
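(2) Transaction storage
Let's look at the code. For each transaction in a block it writes a lookup entry under the key lookupPrefix + transaction hash, recording the hash and number of the containing block and the position of the transaction within it.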
func WriteTxLookupEntries(db ethdb.Putter, block *types.Block) error {
	// iterate over every transaction and encode its metadata
	for i, tx := range block.Transactions() {
		entry := TxLookupEntry{
			BlockHash:  block.Hash(),
			BlockIndex: block.NumberU64(),
			Index:      uint64(i),
		}
		data, err := rlp.EncodeToBytes(entry)
		if err != nil {
			return err
		}
		if err := db.Put(append(lookupPrefix, tx.Hash().Bytes()...), data); err != nil {
			return err
		}
	}
	return nil
}
(3) StateDB module
In Ethereum, an account is represented as a stateObject, and all accounts are managed by StateDB. StateDB has a member called trie that stores the stateObjects. Each stateObject has a 20-byte address, which is used as its key. Each time, before the transactions of a block are executed, the trie is restored from a hash value (a hashNode). StateDB also has a map that stores stateObjects, with the address of each stateObject as the map key.
In effect, this map serves as a local first-level cache, the trie as a second-level cache, and the underlying database as the third level.
(4) Account storage (stateObject)
Each stateObject corresponds to one account. The Account holds data such as the balance and the nonce (the number of transactions the account has sent), and the stateObject also contains a trie (the storage trie) for storing State data.
Four, takeaways
Besides a better understanding of how Ethereum stores its data, there is a lot to learn from Ethereum in terms of system design, for example its multi-level caching and its data storage layout.
Author: wacxt
Link: https://juejin.im/post/5a4f3aa9f265da3e5468e08e
Source: Juejin (Nuggets)
The copyright belongs to the author. For commercial reprints, please contact the author for authorization, and for non-commercial reprints, please indicate the source.