IPFS Series: Merkle Directed Acyclic Graph (Merkle DAG)

Merkle DAG stands for Merkle Directed Acyclic Graph. It builds on the Merkle Tree, which the American computer scientist Ralph Merkle patented in 1979.

Merkle DAG vs. Merkle Tree

A Merkle DAG is very similar to a Merkle Tree, but not identical: a Merkle DAG never needs to be rebalanced, and its non-leaf nodes may also carry data. The ipfs add command builds a Merkle DAG from the data in the specified file. Following the unixfs data format (encoded with protobuf), the file is split into blocks, and those blocks are connected into a tree structure by "link nodes". The "hash" of a given file is therefore the hash of the root node of its DAG.
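The idea that a file's hash is the hash of its root node can be sketched in a few lines of Go. This is a minimal, simplified sketch: it builds a two-level tree (one root linking to every leaf), hashes with plain SHA-256, and the helper names `chunk` and `rootHash` are hypothetical. Real IPFS encodes nodes in unixfs/protobuf, may build deeper trees, and produces CIDs rather than raw SHA-256 digests.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunk splits data into blocks of at most size bytes (hypothetical helper).
func chunk(data []byte, size int) [][]byte {
	var blocks [][]byte
	for len(data) > size {
		blocks = append(blocks, data[:size])
		data = data[size:]
	}
	if len(data) > 0 {
		blocks = append(blocks, data)
	}
	return blocks
}

// rootHash hashes every leaf block, then hashes the sequence of leaf hashes
// to form the root ("link") node. Simplified: not the real unixfs encoding.
func rootHash(data []byte, blockSize int) string {
	root := sha256.New()
	for _, b := range chunk(data, blockSize) {
		leaf := sha256.Sum256(b)
		root.Write(leaf[:]) // the root node refers to each leaf by its hash
	}
	return hex.EncodeToString(root.Sum(nil))
}

func main() {
	a := []byte("hello merkle dag, hello ipfs")
	b := []byte("hello merkle dag, hello IPFS") // one leaf differs
	fmt.Println(rootHash(a, 8))
	fmt.Println(rootHash(b, 8))
}
```

Because the root hash is computed from the leaf hashes, changing any byte in any block changes the root, so the root hash can stand in for the whole file.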

When a file is larger than 256 KB, or when the item being added is a directory, the content is split into blocks for storage and rearranged into a tree structure by link nodes. To see what the unixfs data format adds, try creating a raw block yourself with the following command:

echo "block test" | ipfs block put

You cannot view a block created this way with the ipfs cat command, because ipfs cat only reads data stored in the unixfs format. A raw block like this one can be read back with ipfs block get instead.
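The 256 KB splitting rule described above comes down to a ceiling division. The sketch below assumes that "256 KB" means 262144 bytes (the size used by the default `size-262144` chunker in go-ipfs/Kubo); `blockCount` is a hypothetical helper that ignores the extra link nodes and unixfs metadata.

```go
package main

import "fmt"

// chunkSize is the 256 KB block size mentioned in the text (262144 bytes).
const chunkSize = 256 * 1024

// blockCount returns how many leaf blocks a file of n bytes is split into
// under a fixed-size chunker. Simplified model: link nodes are not counted.
func blockCount(n int64) int64 {
	return (n + chunkSize - 1) / chunkSize // ceiling division
}

func main() {
	for _, n := range []int64{100 * 1024, 256 * 1024, 1 << 20, 1<<20 + 1} {
		fmt.Printf("%d bytes -> %d block(s)\n", n, blockCount(n))
	}
}
```

A file of exactly 256 KB still fits in one block; a single extra byte beyond a multiple of 256 KB requires one more leaf block.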

Functions of Merkle DAG

Functionally, a Merkle DAG and a Merkle Tree differ considerably. A Merkle Tree is used mainly for verification, for example verifying digital signatures or Bitcoin Merkle proofs. A Merkle DAG serves mainly the following purposes:

Content addressing: a multihash uniquely identifies the content of each data block.
Tamper resistance: recomputing a block's hash makes it easy to check whether the data has been tampered with.
Deduplication: blocks with identical content have the same hash, so duplicate data can be removed to save storage space.
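The three purposes listed above can be illustrated with a tiny in-memory block store. This is a minimal sketch under stated assumptions: `BlockStore`, `Put`, and `Get` are hypothetical names, and keys are hex-encoded SHA-256 digests standing in for real IPFS multihashes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// BlockStore is a hypothetical in-memory content-addressed store.
type BlockStore struct {
	blocks map[string][]byte
}

func NewBlockStore() *BlockStore {
	return &BlockStore{blocks: make(map[string][]byte)}
}

// key derives a block's address from its content (SHA-256 as a stand-in).
func key(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Put stores data under its own hash (content addressing). Storing the same
// content twice returns the same key and keeps one copy (deduplication).
func (s *BlockStore) Put(data []byte) string {
	k := key(data)
	if _, ok := s.blocks[k]; !ok {
		s.blocks[k] = append([]byte(nil), data...) // copy the caller's bytes
	}
	return k
}

// Get re-hashes the stored bytes before returning them, so any modification
// is detected (tamper resistance).
func (s *BlockStore) Get(k string) ([]byte, error) {
	data, ok := s.blocks[k]
	if !ok {
		return nil, errors.New("block not found")
	}
	if key(data) != k {
		return nil, errors.New("block has been tampered with")
	}
	return data, nil
}

// Len reports how many distinct blocks are stored.
func (s *BlockStore) Len() int { return len(s.blocks) }

func main() {
	s := NewBlockStore()
	k1 := s.Put([]byte("block test"))
	k2 := s.Put([]byte("block test"))
	fmt.Println(k1 == k2, s.Len()) // true 1

	s.blocks[k1][0] = 'B' // simulate tampering with the stored bytes
	_, err := s.Get(k1)
	fmt.Println(err) // block has been tampered with
}
```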

Of these, the third (deduplication) is the most important feature of IPFS. In IPFS, each blob is limited to 256 KB (a provisional value that can be tuned to actual performance requirements). The Merkle DAG filters out identical data: a duplicate block only adds a reference to the existing block and takes up no additional storage space.
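Block-level deduplication across files can be sketched as follows. The sketch assumes a hypothetical `storeFile` helper, a plain map keyed by SHA-256 digest as the store, and a tiny 4-byte block size so the effect is visible; the text's actual limit is 256 KB.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const blockSize = 4 // tiny for the demo; the text's blob limit is 256 KB

// storeFile splits data into fixed-size blocks, stores only blocks not yet
// present, and returns the file's list of block references (hashes).
func storeFile(store map[[32]byte][]byte, data []byte) [][32]byte {
	var refs [][32]byte
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data)
		}
		b := data[start:end]
		h := sha256.Sum256(b)
		if _, ok := store[h]; !ok {
			store[h] = b // new content: store the block
		}
		refs = append(refs, h) // every block, new or duplicate, adds a reference
	}
	return refs
}

func main() {
	store := make(map[[32]byte][]byte)
	f1 := storeFile(store, []byte("AAAABBBBCCCCDDDD"))
	f2 := storeFile(store, []byte("AAAABBBBCCCCEEEE")) // differs only in the last block
	fmt.Println(len(f1)+len(f2), len(store)) // 8 5
}
```

The two files hold eight block references in total, but only five distinct blocks are stored, because the three shared blocks are kept once.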

In addition, each node can use ipfs pin to keep a particular resource stored locally, which increases redundancy across the whole network.

By default, content added with ipfs add is automatically pinned in the local repository.
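The effect of pinning can be sketched as a repository with a pin set and a garbage collector that only removes unpinned blocks. This is a simplified model under stated assumptions: `Repo`, `Add`, `Cache`, `Unpin`, and `GC` are hypothetical names, keys are plain strings rather than hashes, and real IPFS pins can also cover a whole DAG recursively, which is not modeled here.

```go
package main

import "fmt"

// Repo is a hypothetical local repository: blocks by key plus a pin set.
type Repo struct {
	blocks map[string][]byte
	pins   map[string]bool
}

func NewRepo() *Repo {
	return &Repo{blocks: map[string][]byte{}, pins: map[string]bool{}}
}

// Add stores a block and pins it, mirroring the default behavior of ipfs add.
func (r *Repo) Add(key string, data []byte) {
	r.blocks[key] = data
	r.pins[key] = true
}

// Cache stores a block without pinning it (e.g. data fetched from peers).
func (r *Repo) Cache(key string, data []byte) { r.blocks[key] = data }

// Unpin removes a block from the pin set; the block stays until the next GC.
func (r *Repo) Unpin(key string) { delete(r.pins, key) }

// GC deletes every unpinned block; pinned blocks are always kept.
func (r *Repo) GC() {
	for k := range r.blocks {
		if !r.pins[k] {
			delete(r.blocks, k) // deleting during range is safe in Go
		}
	}
}

func main() {
	r := NewRepo()
	r.Add("mine", []byte("added locally"))
	r.Cache("fetched", []byte("seen while browsing"))
	r.GC()
	_, kept := r.blocks["mine"]
	_, stillThere := r.blocks["fetched"]
	fmt.Println(kept, stillThere) // true false
}
```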

Data object format

The Merkle DAG defines its own object format. IPFSObject is the storage structure of IPFS, and the data held in each object is limited to less than 256 KB. An IPFSObject has two parts: links, which store references to other blocks, and data, the object's own content. Each link consists of three parts: the link Name, the Hash, and the Size; a link is simply a reference to another IPFSObject. The advantage of this design is that, as in Git, a Merkle DAG greatly reduces storage consumption: if you modify part of a source file, only a few IPFSObjects may change, so there is no need to rewrite the entire content. The data structures of IPFSObject and IPFSLink are as follows:

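// Multihash is a self-describing hash (hash-function code, digest length,
// digest). Modeled here as raw bytes so that the definitions compile.
type Multihash []byte

type IPFSObject struct {
    links []IPFSLink // references to other objects
    data  []byte     // opaque content data
}

type IPFSLink struct {
    Name string    // name or alias of this link
    Hash Multihash // cryptographic hash of the target object
    Size int       // total size of the target object
}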