IPFS Series - Working Principles and Mechanisms

Working principle and mechanism of IPFS

(1) IPFS assigns a unique hash value (a file fingerprint derived from the file's content) to every file. Even if two files differ by only a single bit, their hashes are completely different. This is what enables IPFS to address files by their content;
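A minimal sketch of this property, using Python's standard `hashlib` (SHA-256 stands in for IPFS's hash function here): flipping a single bit of the input produces an entirely unrelated digest.

```python
import hashlib

# Two payloads that differ by a single bit produce unrelated digests.
a = b"hello ipfs"
b = bytes([a[0] ^ 0b00000001]) + a[1:]  # flip one bit of the first byte

hash_a = hashlib.sha256(a).hexdigest()
hash_b = hashlib.sha256(b).hexdigest()

print(hash_a != hash_b)  # True: content addressing distinguishes them
```
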

(2) IPFS de-duplicates identical files across the whole network and provides version management for files;

(3) When a file is requested, the IPFS network looks it up by its hash, which is unique across the entire network;

(4) Hash values are hard to remember and awkward to share, so IPFS provides IPNS to map a hash to a human-readable name;

(5) Besides the data it needs, each node also stores a hash table recording where files are located, which is used to query and download files;

(6) What IPFS ultimately solves for us is data storage: it can greatly reduce storage costs and increase download speeds. IPFS was created to address the shortcomings of today's Internet.

IPFS Architecture - Identity Layer and Routing Layer

The identity layer and routing layer are naturally bundled together. Peer identities are generated and routing rules are defined by the Kademlia protocol (KAD). At its core, KAD builds a loosely structured distributed hash table, or DHT. Every peer that joins the DHT network must generate its own identity information; only with this identity can it take responsibility for storing a share of the network's resource records and the contact information of other members.
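The two Kademlia ideas used here can be sketched in a few lines: a peer's ID is derived by hashing its identity material (IPFS hashes the peer's public key), and routing picks the peer whose ID is closest to a target key under the XOR metric. The five single-byte "public keys" below are placeholders for illustration only.

```python
import hashlib

def node_id(public_key: bytes) -> int:
    # Kademlia-style ID: hash of the peer's identity material.
    return int.from_bytes(hashlib.sha256(public_key).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    # Kademlia's distance metric: bitwise XOR of two IDs.
    return a ^ b

peers = [node_id(bytes([i])) for i in range(5)]   # toy peer table
target = node_id(b"some-content-key")

# Route toward the peer whose ID is XOR-closest to the target key.
closest = min(peers, key=lambda p: xor_distance(p, target))
```
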


The main idea of DHT (Distributed Hash Table) is as follows: the entire network maintains one huge file-index hash table whose entries take the form <Key, Value>. The Key is usually the hash of a file under some hash algorithm (it can also be a file name or a description of the file's content), and the Value is the address of the node storing the file. To query, you only need to supply the Key, and the address of the storage node is looked up in the table and returned to the querying node. Of course, this hash table is split into small pieces and distributed across the nodes of the whole network according to certain algorithms and rules, so each node only has to maintain a small shard of it. When a node queries a file, it simply routes the query message to the node responsible for that shard.
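The scheme above can be sketched as a toy in-process DHT. The sharding rule (key modulo node count) and the example addresses are simplifications for illustration, not how Kademlia actually assigns shards.

```python
import hashlib

class ToyDHT:
    """Toy DHT: each node keeps a shard of the global <Key, Value>
    index, where Key is a content hash and Value is the address of
    the node that stores the content."""

    def __init__(self, node_addrs):
        self.node_addrs = node_addrs
        self.shards = {addr: {} for addr in node_addrs}  # one shard per node

    def _owner(self, key):
        # Deterministically assign each key to one node's shard.
        return self.node_addrs[int(key, 16) % len(self.node_addrs)]

    def publish(self, content, holder_addr):
        key = hashlib.sha256(content).hexdigest()
        self.shards[self._owner(key)][key] = holder_addr
        return key

    def lookup(self, key):
        # Route the query to the shard responsible for this key.
        return self.shards[self._owner(key)].get(key)

dht = ToyDHT(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
key = dht.publish(b"file bytes", holder_addr="10.0.0.2")
print(dht.lookup(key))  # "10.0.0.2"
```
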

Is the IPFS system reliable for storing data?

The redundant-backup technology adopted by the IPFS system is erasure coding, or "EC" for short. Simply put: to n blocks of original data, m blocks of parity data are added. Any n of the resulting n+m blocks are then sufficient to reconstruct the original data; in other words, the scheme tolerates the loss of up to m blocks.

For example, to tolerate 4 disk failures, use an n+4 configuration. Traditional RAID 6 allows two disks to fail, which corresponds to EC in n+2 mode.
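The simplest instance of this idea is single-parity coding (n+1): XOR all data blocks to get one parity block, and any one lost block can be rebuilt from the survivors. Real erasure codes such as Reed–Solomon generalize this to m > 1; the sketch below only shows the m = 1 case.

```python
def xor_parity(blocks):
    # Single-parity erasure code (n+1): the parity block is the XOR
    # of all data blocks, so any ONE lost block can be rebuilt by
    # XOR-ing the remaining blocks together.
    parity = bytes(len(blocks[0]))
    for blk in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, blk))
    return parity

data = [b"AAAA", b"BBBB", b"CCCC"]   # n = 3 data blocks
parity = xor_parity(data)            # m = 1 parity block

# Simulate losing block 1, then recover it from the survivors.
recovered = xor_parity([data[0], data[2], parity])
assert recovered == data[1]
```
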

Security of files stored on IPFS

If a file is something you do not want others to see, encrypt it before adding it to IPFS. Then, even if someone else obtains the file's hash, they still need the key to decrypt the data.
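The flow can be sketched with a toy SHA-256-in-counter-mode stream cipher. This construction is for illustration only and is NOT secure for real use; in practice you would encrypt with a vetted AEAD cipher (e.g. AES-GCM from a cryptography library) before adding the ciphertext to IPFS.

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher: build a keystream from SHA-256 over
    # (key || counter) and XOR it with the data. Illustrative only;
    # use a real AEAD cipher (e.g. AES-GCM) in practice.
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

secret = b"my private document"
key = b"32-byte-key-material-goes-here!!"

ciphertext = keystream_xor(key, secret)   # this is what you'd add to IPFS
# Anyone with the hash retrieves only ciphertext; the key recovers the plaintext.
assert keystream_xor(key, ciphertext) == secret
```
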

Construction and use of the MerkleDAG tree structure:

When a file is stored on IPFS, it is first sliced into 256 KB chunks.

The (MerkleDAG.Add) method is then called repeatedly to build the file's MerkleDAG (Merkle directed acyclic graph).
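The chunk-then-link step can be sketched as follows. This is a toy stand-in for MerkleDAG.Add: it hashes each chunk (the leaves) and then hashes the concatenated leaf hashes to form a single parent, ignoring the fan-out limits and node formats of the real DAG.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 KiB, the default IPFS chunk size

def chunk(data: bytes):
    # Slice the file into fixed-size chunks.
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def merkle_root(data: bytes) -> str:
    # Toy MerkleDAG build: hash each chunk, then hash the
    # concatenation of the leaf hashes to get the parent node.
    leaves = [hashlib.sha256(c).digest() for c in chunk(data)]
    if len(leaves) == 1:
        return leaves[0].hex()           # small file: a single leaf
    return hashlib.sha256(b"".join(leaves)).hexdigest()

big_file = b"x" * (600 * 1024)           # 600 KiB -> 3 chunks
root = merkle_root(big_file)
```
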

File hash value creation process:

a. Run SHA-256 over the sliced file;

b. Take bytes 0 through 31 of the result (the full 32-byte digest);

c. The selected bytes are prefixed with a multihash header and encoded in base58; the result begins with Qm and is the file's final 46-character hash. According to IPFS's underlying code, the MerkleDAG is a multi-way tree in which each node can have at most 174 children.
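Steps a through c can be reproduced end to end. The multihash header for SHA-256 is the two bytes 0x12 (hash code) and 0x20 (digest length 32); base58-encoding the 34-byte result always yields a 46-character string beginning with "Qm", which is exactly a CIDv0. The base58 encoder below is written out by hand since Python's standard library has none.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(raw: bytes) -> str:
    # Standard base58btc encoding (big-endian integer, leading
    # zero bytes preserved as '1' characters).
    num = int.from_bytes(raw, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = BASE58_ALPHABET[rem] + out
    for byte in raw:
        if byte != 0:
            break
        out = "1" + out
    return out

def cid_v0(content: bytes) -> str:
    digest = hashlib.sha256(content).digest()   # step a: SHA-256 (32 bytes)
    multihash = bytes([0x12, 0x20]) + digest    # step b: sha2-256 code + length 32
    return base58_encode(multihash)             # step c: base58 -> "Qm...", 46 chars

cid = cid_v0(b"hello ipfs")
assert cid.startswith("Qm") and len(cid) == 46
```
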

DAGService maintenance

In the source code, DAGService is normally what maintains the MerkleDAG, providing the operations to add to and delete from it.

Is the data of each node complete, and will the data be automatically synchronized between nodes?

  • No: the same file is stored only once in the IPFS network (per unique hash), and three backup copies of it are kept across the network.
  • IPFS uses a distributed hash table to quickly find the nodes that hold the data;
  • The client connects to the retrieved node and downloads the file from it.

What is synchronized: is the index table on each node complete?

Besides the data it needs, each node also stores a hash table recording where files are located, which is used to query and download files.

How IPFS files are retrieved over the network

IPFS uses a distributed hash table to quickly find the nodes that hold the data, then uses the hash to verify that what was retrieved is the correct data and corresponds to the requested file;

To improve network robustness and efficiency, duplicate files with the same hash are deleted, the version history of each file is tracked, and redundant copies are detected.
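De-duplication falls straight out of content addressing, as this toy content-addressed store shows: because the storage key is the hash of the content itself, adding identical bytes a second time stores nothing new, while changing even one character produces a new key and a new entry.

```python
import hashlib

class DedupStore:
    # Toy content-addressed store: identical bytes are kept only
    # once, because the key is the hash of the content itself.
    def __init__(self):
        self.blocks = {}

    def add(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        self.blocks.setdefault(key, content)   # second add is a no-op
        return key

store = DedupStore()
k1 = store.add(b"same document")
k2 = store.add(b"same document")     # duplicate: nothing new stored
k3 = store.add(b"same document!")    # one char changed: new entry
assert k1 == k2 and len(store.blocks) == 2
```
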

Redundant backup capability

  • The number of backups is determined by storage orders. IPFS has a basic backup mechanism: the default redundancy is a minimum of three copies of the same file. There is no upper limit, but market forces mean that extra redundant copies that are rarely accessed will be voluntarily deleted by the storage providers.

Is there only one copy of the same file on the IPFS network?

  • The same file is stored only once in the IPFS network, with three backup copies made across the network.
  • If a file that many people hold hashes to the same value, uploading it again will not create additional copies.
  • A file corresponding to a given hash is saved only once on the IPFS network; anyone with that hash value can access the file, and the hash value is the file's address.
  • Changing even a single word produces a new version: the hash value changes, so the new version must be stored again.

Data loss problem

  • IPFS uses redundant backups and erasure codes to address data loss.

  • IPFS uses an f(n, m) scheme to increase the safety of stored data: adding m blocks of parity data allows the original n blocks to be recovered, although the storage cost rises accordingly, depending on user needs.

  • In addition, IPFS can repair data on its own: if it detects that files have been lost, the system repairs them automatically.

  • Hardware requirements

    • Running an IPFS node in a Docker container has modest requirements.
  • Ecology and activity of IPFS

    • Github: https://github.com/ipfs/go-ipfs
    • Star: 10.1k
  • Besides Filecoin, is there any other chain that combines with IPFS?
    No other chains at present; the IPFS ecosystem also includes IPLD and Multiformats.


Origin blog.csdn.net/wcc19840827/article/details/127974993