IPFS technology series: the underlying foundations of IPFS

Foreword

  This article introduces the technologies that IPFS builds on: the distributed hash table (DHT), the block exchange protocol (BitTorrent), version control (Git), the Self-Certifying File System (SFS), the Merkle Tree, and the Merkle DAG.


1. Distributed Hash Table (DHT)

First, let us look at how earlier peer-to-peer file networks worked.
  The first generation of P2P file networks relied on a central database. The database server received every query and returned to the client a list of addresses where the requested data could be found.
Disadvantages: this design has a single point of failure; if the central server goes down, the entire network is paralyzed.
  The second generation, represented by Gnutella, located data by message flooding. A query was passed on to all nodes in the network until the data was found, and the result was then returned to the querying node.
Disadvantages: blind flooding generates a very large number of requests, which quickly exhausts network resources and easily causes congestion.
  The third generation uses a DHT (Distributed Hash Table): the nodes of the network jointly maintain one huge file index in the form of a hash table of <Key, Value> pairs. The Key is usually the hash of the file under some hash algorithm, and the Value stores the IP address of a node that holds the file. Given a Key, the address of the storage node can be looked up in the table and returned to the querying node, which reduces query time.
IPFS draws on three representative DHT designs: Kademlia DHT, Coral DSHT, and S/Kademlia.
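As a rough illustration of the <Key, Value> index described above, the sketch below uses an ordinary in-memory dictionary as a stand-in for the table; in a real DHT the table is spread over many nodes. The function names, the choice of SHA-256, and the example addresses are assumptions made for illustration only.

```python
import hashlib

# Stand-in for the network-wide index: Key (content hash) -> set of node addresses.
index: dict[str, set[str]] = {}


def content_key(data: bytes) -> str:
    """Derive the Key from the file content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def announce(data: bytes, node_addr: str) -> str:
    """Record that node_addr stores this file and return the file's Key."""
    key = content_key(data)
    index.setdefault(key, set()).add(node_addr)
    return key


def lookup(key: str) -> set[str]:
    """Return the addresses of the nodes that store the file with this Key."""
    return index.get(key, set())


# Two hypothetical nodes announce the same file; a query by Key finds both.
key = announce(b"hello ipfs", "10.0.0.5:4001")
announce(b"hello ipfs", "10.0.0.9:4001")
print(sorted(lookup(key)))
```

The querying node only needs the Key; it never has to ask every node in the network, which is the difference from message flooding.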

1. Kademlia DHT

The main elements of the Kademlia (KAD) network algorithm are as follows:

1) Kademlia binary state tree

First, the node IDs of a Kademlia network are organized as a binary tree with the following characteristics:
1. The position of each network node is reached by starting from the root and following the shortest unique prefix of its ID.
2. Each network node is a leaf of the tree.
For any node, the tree can be decomposed, following the node's ID prefix as a path, into a series of subtrees that do not contain the node itself.
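The decomposition into subtrees can be sketched as follows. This is a minimal illustration that assumes node IDs are written as strings of '0' and '1'; the function name is hypothetical. Subtree i shares the first i bits with the node's ID and differs in the next bit, so it can never contain the node itself.

```python
def sibling_subtrees(node_id: str) -> list[str]:
    """Return the prefixes of the subtrees that do not contain node_id.

    node_id is a string of '0'/'1' bits. The i-th subtree shares the first
    i bits with node_id and has the opposite value in bit i.
    """
    prefixes = []
    for i, bit in enumerate(node_id):
        flipped = "1" if bit == "0" else "0"
        prefixes.append(node_id[:i] + flipped)
    return prefixes


print(sibling_subtrees("0011"))  # ['1', '01', '000', '0010']
```

Every other ID of the same length starts with exactly one of these prefixes, so the subtrees together cover the whole ID space except the node itself.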

2) Node routing table K-Bucket

The node routing table stores the contact information of other nodes, grouped into K-buckets according to their distance from the local node. Each routing entry consists of three parts: IP address, UDP port, and node ID.
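A minimal sketch of such a routing table is shown below. It assumes integer node IDs, uses the XOR of two IDs as their distance, and places a contact in bucket i when the distance d satisfies 2**i <= d < 2**(i+1). The class names, the tiny ID length, and the bucket size k = 2 are assumptions chosen to keep the example small; a full bucket here simply rejects the newcomer, whereas a real implementation first checks whether the oldest entry is still alive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    ip: str
    udp_port: int
    node_id: int


def xor_distance(a: int, b: int) -> int:
    return a ^ b


def bucket_index(local_id: int, other_id: int) -> int:
    """Bucket i holds contacts whose distance d satisfies 2**i <= d < 2**(i+1)."""
    d = xor_distance(local_id, other_id)
    if d == 0:
        raise ValueError("a node does not store itself in its routing table")
    return d.bit_length() - 1


class RoutingTable:
    def __init__(self, local_id: int, id_bits: int = 4, k: int = 2) -> None:
        self.local_id = local_id
        self.k = k
        self.buckets: list[list[Contact]] = [[] for _ in range(id_bits)]

    def add(self, contact: Contact) -> bool:
        """Insert or refresh a contact; return False if its bucket is full."""
        bucket = self.buckets[bucket_index(self.local_id, contact.node_id)]
        if contact in bucket:
            bucket.remove(contact)  # move to the tail: most recently seen
            bucket.append(contact)
            return True
        if len(bucket) < self.k:
            bucket.append(contact)
            return True
        return False

    def closest(self, target_id: int, count: int) -> list[Contact]:
        """Return the known contacts nearest to target_id by XOR distance."""
        known = [c for bucket in self.buckets for c in bucket]
        known.sort(key=lambda c: xor_distance(c.node_id, target_id))
        return known[:count]


table = RoutingTable(local_id=0b0011)
for node_id in (0b0010, 0b0000, 0b0111, 0b1000, 0b1111):
    table.add(Contact("192.0.2.1", 4000 + node_id, node_id))
print([len(bucket) for bucket in table.buckets])  # [1, 1, 1, 2]
```

Because far-away buckets cover half of the ID space each but hold at most k entries, a node knows many of its near neighbours and only a few distant ones.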

2. Coral DSHT

  Coral DSHT (distributed sloppy hash table) is one of the core components of Coral CDN. Kademlia uses the XOR distance: a piece of information is always stored on the nodes whose IDs are closest to its key by XOR distance. The drawback of this design is clear: it ignores factors such as the latency between nodes and the location of the data, which wastes a lot of network bandwidth and storage space. Coral takes a different approach. It measures the connection quality between nodes, groups them into several levels according to round-trip time, and queries key-value pairs level by level.
  Coral DSHT is well suited to retrieving soft-state key-value pairs, where the same Key may have multiple Values stored under it. This mechanism maps a given Key to the addresses of Coral servers on the network.
Coral DSHT mainly has the following characteristics:
1. Index mechanism and layering
2. Routing layer based on key-value pairs
3. Sloppy storage

3. S/Kademlia DHT

  Kademlia is used in completely open P2P networks. Without security measures, it is vulnerable to various attacks from malicious nodes. Building on the Kademlia protocol, the S/Kademlia (S/K) protocol adds implicit identity authentication for node IDs and sibling broadcast.
S/K can resist common attacks such as the eclipse attack and the Sybil attack.

2. Block Exchange Protocol (BitTorrent)

  BitTorrent is a content distribution protocol that uses peer-to-peer technology to reduce the load on centralized servers. In a BitTorrent network, each user uploads and downloads data at the same time, and users forward the file pieces they already hold to one another until every user's download is complete.

Terms involved in BitTorrent:
.torrent: a metadata file obtained from a server (its name usually ends in .torrent) that records information about the data to be downloaded.
tracker: a server on the Internet that coordinates the actions of BitTorrent clients and helps peers connect to each other.
peer: another computer on the Internet that a client can connect to and exchange data with. Peers download from and upload to each other.
seed: a computer that holds a complete copy of a particular torrent. When a torrent is first released, a seed is needed to start the sharing.
swarm: the group of all devices connected to a torrent.
Choking: a strategy of temporarily refusing to upload to a peer. BitTorrent expects peers to upload to each other, so uncooperative peers are temporarily choked.
Pareto efficiency: a state in which resources have been allocated so that they are put to the best possible use.
Tit-for-tat: a peer uploads to another peer at a rate that corresponds to the download rate that peer provides to it.
BitTorrent's peer-to-peer implementation consists of three parts: content publishing, block exchange, and the piece selection algorithm.
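The interaction between choking and tit-for-tat can be sketched as a simple selection rule: upload only to the peers that have recently given us the most data. The function name, the number of upload slots, and the peer names are assumptions for illustration; real clients also periodically unchoke a random peer so that newcomers get a chance, which is omitted here.

```python
def choose_unchoked(download_rates: dict[str, float], slots: int = 3) -> set[str]:
    """Pick the peers we are willing to upload to.

    download_rates maps a peer name to the rate (for example in KiB/s) at
    which we recently downloaded from that peer. Peers that gave us nothing
    stay choked even if slots are free.
    """
    contributors = [(rate, peer) for peer, rate in download_rates.items() if rate > 0]
    # Highest rate first; ties are broken by peer name to keep the result stable.
    contributors.sort(key=lambda item: (-item[0], item[1]))
    return {peer for _, peer in contributors[:slots]}


rates = {"peer_a": 120.0, "peer_b": 0.0, "peer_c": 45.5, "peer_d": 300.0, "peer_e": 10.0}
unchoked = choose_unchoked(rates)
choked = set(rates) - unchoked
print(sorted(unchoked))  # ['peer_a', 'peer_c', 'peer_d']
print(sorted(choked))    # ['peer_b', 'peer_e']
```

A peer that stops uploading drops out of the top slots at the next evaluation and is choked, which is what gives every peer an incentive to contribute.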

3. Version control (Git)

  A version control system records changes to the contents of one or more files so that specific revisions can be recalled later. There are many version control tools, but they fall roughly into three categories: local version control systems, centralized version control systems, and distributed version control systems.

1. Local version control system

  This is the approach many of us use informally: copy the entire directory as a backup, sometimes adding the backup time or another identifier to the name to tell the copies apart. Later, dedicated local version control systems were developed, which use a simple database to record the differences between successive revisions of files. The shortcoming of local version control is obvious: developers working on different systems cannot collaborate.

2. Centralized version control system

  Systems of this kind, such as CVS and Subversion, are quite common. They have a single central management server that stores the revisions of all files, and collaborators connect to this server to fetch the latest files or to submit updates.

3. Distributed version control system

  A distributed version control system avoids the single point of failure of a centralized version control system. Common examples are Git and Mercurial. A client does not just check out the latest snapshot of the files; it mirrors the code repository completely. If any server used for collaboration fails, it can be restored from any mirrored local repository.

Git has the following characteristics:

Snapshot stream: Git stores its data as a stream of snapshots taken over time. Each clone holds the snapshots it has fetched, so the snapshots in different people's repositories may differ.
Local operations: the vast majority of Git operations need only local files and resources.
Append-only data: almost every Git operation only adds data to the Git database, so nearly any operation can be undone.
Integrity check: everything in Git is checksummed before it is stored and is then referred to by that checksum.
Working areas and file states: a Git project has three areas: the working directory, the staging area, and the Git repository. Repositories may be local or remote. A file can be in one of three states: committed, modified, or staged.
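The integrity check can be made concrete for file contents. In a repository that uses Git's default SHA-1 object format, a blob's ID is the SHA-1 hash of the header "blob <size>" followed by a NUL byte and the raw content. The sketch below reproduces that calculation; the function name is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty file always gets the same well-known ID.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID is derived from the content, any corruption of a stored object changes its hash and is detected when the object is read back.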
The basic Git workflow is as follows:
1. Modify files in the working directory.
2. Stage the files, which puts snapshots of them into the staging area.
3. Commit the update, which takes the files as they are in the staging area and stores that snapshot permanently in the Git repository.
Branch: a branch in Git is essentially just a movable pointer to a commit object. Git uses master as the default branch name. New branches can be created from it, and the pointer of the current branch moves forward automatically with each commit.
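The "movable pointer" idea can be shown with a toy model. This is not how Git stores objects: the Repo class, its methods, and the shortened commit IDs are assumptions made purely to show which pointer moves on a commit.

```python
import hashlib


class Repo:
    """Toy model: branches are names that point at commit IDs."""

    def __init__(self) -> None:
        self.commits: dict[str, tuple[str | None, str]] = {}  # id -> (parent, message)
        self.branches: dict[str, str | None] = {"master": None}
        self.head = "master"  # name of the branch that is checked out

    def commit(self, message: str) -> str:
        parent = self.branches[self.head]
        # Toy ID derived from parent and message; real Git hashes the full commit object.
        commit_id = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()[:8]
        self.commits[commit_id] = (parent, message)
        self.branches[self.head] = commit_id  # only the current branch moves
        return commit_id

    def branch(self, name: str) -> None:
        """Create a new pointer at the commit the current branch points to."""
        self.branches[name] = self.branches[self.head]

    def checkout(self, name: str) -> None:
        if name not in self.branches:
            raise KeyError(f"unknown branch: {name}")
        self.head = name


repo = Repo()
first = repo.commit("first commit")
repo.branch("feature")
repo.checkout("feature")
second = repo.commit("work on feature")
print(repo.branches["master"] == first)    # True: master did not move
print(repo.branches["feature"] == second)  # True: feature moved forward
```

Creating a branch copies no files; it only adds one more name that points at an existing commit.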

4. Self-Certifying File System (SFS)

  The Self-Certifying File System (SFS) aims to provide a file system shared across the entire Internet, with the whole global SFS system under a single namespace.
The biggest problem in building a globally shared file system is how a client can authenticate the server. SFS solves this by embedding public key information in the file name. The advantage is that key management no longer has to be implemented inside the file system, and users can choose the key management approach that suits their needs.

The core ideas of SFS are as follows:
1. The SFS file system has self-certifying path names, so there is no need to implement key management inside the file system.
2. It is easy to set up various key management mechanisms on top of SFS, including combinations of mechanisms.
3. SFS decouples key management from key distribution.
4. It implements a global file system.
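The idea of a self-certifying path name can be sketched as follows: the path contains a hash of the server's public key, so a client can check the key a server presents against the path itself, without asking any key authority. This is a simplified illustration, not the exact SFS encoding; the hash input layout, the function names, the host name, and the stand-in key bytes are all assumptions.

```python
import base64
import hashlib


def host_id(location: str, public_key: bytes) -> str:
    """Hash the server location and public key into a printable identifier."""
    digest = hashlib.sha1(location.encode("utf-8") + b"\0" + public_key).digest()
    return base64.b32encode(digest).decode("ascii").lower()


def self_certifying_path(location: str, public_key: bytes) -> str:
    return f"/sfs/{location}:{host_id(location, public_key)}"


def server_matches_path(path: str, presented_key: bytes) -> bool:
    """Check that the key a server presents is the one named by the path."""
    if not path.startswith("/sfs/"):
        raise ValueError("not an SFS-style path")
    name = path[len("/sfs/"):].split("/", 1)[0]
    location, expected = name.rsplit(":", 1)
    return host_id(location, presented_key) == expected


server_key = b"stand-in bytes for the server's public key"
root = self_certifying_path("sfs.example.org", server_key)
print(server_matches_path(root + "/home/alice/notes.txt", server_key))      # True
print(server_matches_path(root + "/home/alice/notes.txt", b"another key"))  # False
```

Anyone who obtains the path, by whatever key management mechanism they prefer, automatically obtains what is needed to authenticate the server.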

5. Merkle Tree and Merkle DAG

1. Merkle Tree

A Merkle Tree (hash tree) is a tree structure, usually a binary tree, although it can also be a multi-way tree. It has the following characteristics:
1. The value of each leaf node is a unit of data from the data set, or the hash of that unit of data.
2. The value of each non-leaf node is computed with a hash algorithm from the values of its child nodes, and therefore depends on all the leaf nodes below it.
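The two characteristics above can be turned into a short routine that computes the root of a binary Merkle tree. This is a minimal sketch: SHA-256 and the rule of pairing an unpaired last node with itself are assumed conventions, and other systems make different choices.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(blocks: list[bytes]) -> bytes:
    """Compute the root hash of a binary Merkle tree over the data blocks."""
    if not blocks:
        raise ValueError("at least one data block is required")
    level = [h(block) for block in blocks]  # leaf values: hashes of the data units
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            left = level[i]
            # Assumed convention: an unpaired last node is paired with itself.
            right = level[i + 1] if i + 1 < len(level) else left
            parents.append(h(left + right))
        level = parents
    return level[0]


blocks = [b"block-0", b"block-1", b"block-2"]
root = merkle_root(blocks)
tampered = merkle_root([b"block-0", b"block-X", b"block-2"])
print(root != tampered)  # True: changing any block changes the root
```

Comparing a single root hash is therefore enough to tell whether two parties hold the same set of blocks.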

Applications of the Merkle Tree

Digital signature: the Merkle tree was originally designed to handle Lamport one-time signatures efficiently. On its own, each Lamport key can sign only one message, but combined with a Merkle tree, multiple messages can be signed.

P2P: in P2P networks, Merkle trees are used to ensure that data blocks received from other nodes are undamaged and have not been replaced, and even to check that other nodes do not cheat or publish false blocks.
Bitcoin: a well-known application of the Merkle proof is Bitcoin, created by Satoshi Nakamoto. The Bitcoin blockchain stores the transactions of each block in a Merkle tree and uses Merkle proofs to support simplified payment verification.
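A Merkle proof lets a party that knows only the root hash check that one block belongs to the tree, using one sibling hash per level instead of the whole data set. The sketch below is a generic illustration with assumed names and conventions (SHA-256, an unpaired last node paired with itself); it is not Bitcoin's exact format, which among other differences applies SHA-256 twice.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, from the leaf hashes up to the root."""
    if not blocks:
        raise ValueError("at least one data block is required")
    levels = [[h(block) for block in blocks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2 == 1:
            cur = cur + [cur[-1]]  # pair an unpaired last node with itself
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def make_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) pairs from the leaf to the root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_proof(block: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = h(block)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


blocks = [f"tx-{i}".encode() for i in range(5)]
levels = build_levels(blocks)
root = levels[-1][0]
proof = make_proof(levels, 4)
print(len(proof))                           # 3 hashes prove membership among 5 blocks
print(verify_proof(b"tx-4", proof, root))   # True
print(verify_proof(b"tx-9", proof, root))   # False
```

The proof grows with the height of the tree rather than with the number of blocks, which is why a light client can verify a payment without downloading every transaction.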

2. Merkle DAG

A Merkle DAG (Merkle directed acyclic graph) is built on the basis of the Merkle tree.
The Merkle DAG also differs from the Merkle Tree in function. It mainly provides the following three functions:
Content addressing: a multihash is used to uniquely identify the content of a data block.
Tamper resistance: checking the hash value makes it easy to confirm whether data has been tampered with.
Deduplication: data blocks with the same content have the same hash value, so duplicate data is easy to remove.
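These three functions can be seen in a very small content-addressed store. This is a sketch under simplifying assumptions: identifiers are plain hexadecimal SHA-256 digests rather than multihashes, nodes are encoded as JSON, and the class and function names are hypothetical.

```python
import hashlib
import json


class BlockStore:
    """Stores blocks under the hash of their content."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        block_id = hashlib.sha256(data).hexdigest()
        self.blocks[block_id] = data  # identical content maps to the same entry
        return block_id

    def get(self, block_id: str) -> bytes:
        data = self.blocks[block_id]
        if hashlib.sha256(data).hexdigest() != block_id:
            raise ValueError("block content does not match its hash")
        return data


def put_node(store: BlockStore, links: list[str], data: bytes = b"") -> str:
    """Store a DAG node that refers to its children by their hashes."""
    encoded = json.dumps({"data": data.hex(), "links": links}, sort_keys=True)
    return store.put(encoded.encode("utf-8"))


store = BlockStore()
chunk_a = store.put(b"aaaa")
chunk_b = store.put(b"bbbb")
file_1 = put_node(store, [chunk_a, chunk_b])
file_2 = put_node(store, [chunk_a, chunk_b, chunk_a])  # reuses existing chunks

print(store.put(b"aaaa") == chunk_a)  # True: same content, same address
print(len(store.blocks))              # 4: two chunks and two file nodes
```

Because a parent node contains the hashes of its children, changing any chunk would change the identifier of every node above it, so a root identifier pins down the entire graph.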

Summary

  This article introduced some of the underlying technologies of IPFS: the distributed hash table (DHT), the block exchange protocol (BitTorrent), version control (Git), the Self-Certifying File System (SFS), the Merkle Tree, and the Merkle DAG.

Origin blog.csdn.net/ggj89/article/details/122581743