IPFS Technology Series: IPFS Module Analysis

Foreword

This article introduces the three major components of IPFS: Multiformat (a library of self-describing format protocols), libp2p (a modular library of P2P network protocols), and IPLD (a data structure model library). They cooperate with one another while remaining largely independent.

1. Multiformat

Multiformat is the self-describing format protocol component of IPFS. It solves the problem that values produced by different programming languages, or belonging to different data types, are hard to tell apart. Its approach is to attach a self-describing field to the data, so that the properties of the data can be determined from that field alone.
Multiformat currently supports the following five protocols:

1. Multi-Hash

Multi-Hash has the following characteristics:
1. It tells the user which hash algorithm produced a value, which matters because some hash algorithms may no longer be secure and are at risk of being broken.
2. It makes upgrading the hash algorithm easier, and it standardizes how the hash algorithm type and the hash length are recorded.
3. Most tools no longer need to perform any checks on the hash themselves.
Multi-Hash format
A Multi-Hash stores three pieces of information: the hash type, the hash length, and the hash value. This layout is known as TLV (type-length-value).

<Multi-Hash> ::= <hash type><length><hash value>

The Multi-Hash format brings several benefits:
1. Given a hash value, you can determine which hash algorithm produced it, and how long the digest is, from the first two bytes of the value.
2. It makes it easier to upgrade the hash algorithm used by a system in the future.
3. It takes up very little extra space.
Multi-Hash records more than 100 common hash types. The names and hexadecimal codes of these hash algorithms can be looked up in the Multi-Hash table.
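
The type-length-value layout described above can be sketched in a few lines of Python. This is only a minimal illustration, not the official implementation: the function names are made up for this example, and only two entries of the Multi-Hash table are included (0x12 for sha2-256 and 0x13 for sha2-512). It also assumes that the type and the length each fit in a single byte, which holds for these two algorithms.

```python
import hashlib

# Two entries from the public multihash table; the real table is much larger.
HASH_CODES = {"sha2-256": 0x12, "sha2-512": 0x13}
HASH_FUNCS = {"sha2-256": hashlib.sha256, "sha2-512": hashlib.sha512}
CODE_NAMES = {code: name for name, code in HASH_CODES.items()}


def multihash_encode(data: bytes, name: str = "sha2-256") -> bytes:
    """Return <type><length><hash value> for the given data (hypothetical helper)."""
    digest = HASH_FUNCS[name](data).digest()
    # Assumption: code and length are both below 128, so each fits in one byte.
    return bytes([HASH_CODES[name], len(digest)]) + digest


def multihash_decode(mh: bytes) -> tuple[str, bytes]:
    """Split a multihash into (algorithm name, digest) and check the length field."""
    code, length, digest = mh[0], mh[1], mh[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length field")
    return CODE_NAMES[code], digest


mh = multihash_encode(b"hello ipfs")
print(mh.hex())                 # the first two bytes are 12 20: sha2-256, 32 bytes
print(multihash_decode(mh)[0])  # sha2-256
```

Reading only the first two bytes is enough to know which algorithm to use for verification, which is the first benefit listed above.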

2. Multi-Base

Multi-Base is a self-describing base-encoding protocol. It stores data together with a description of how that data is encoded. The encoding used for input and output can be chosen freely, and other programs can read the encoding type directly from the value.
A Multi-Base value stores two pieces of information: the encoding type code and the encoded data. No length field is needed, and a 1-byte prefix is enough to distinguish the encoding types.

<Multi-Base> ::= <encoding type><encoded content>

The advantage of Multi-Base is that users can quickly tell the encoding methods apart and can convert between them by calling Multi-Base.
Multi-Base also has its own lookup table, which maps each prefix to an encoding.
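
As a rough sketch of the prefix idea, the following Python code handles three entries of the Multi-Base table: "f" for lowercase base16, "b" for lowercase base32 without padding, and "m" for base64 without padding. The function names are invented for this example, and the many other encodings in the table are left out.

```python
import base64


def multibase_encode(data: bytes, encoding: str) -> str:
    """Prefix the encoded text with the one-character code of its encoding."""
    if encoding == "base16":
        return "f" + data.hex()
    if encoding == "base32":
        return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")
    if encoding == "base64":
        return "m" + base64.b64encode(data).decode("ascii").rstrip("=")
    raise ValueError(f"encoding not covered by this sketch: {encoding}")


def multibase_decode(text: str) -> bytes:
    """Read the prefix, then decode the rest with the matching encoding."""
    prefix, body = text[0], text[1:]
    if prefix == "f":
        return bytes.fromhex(body)
    if prefix == "b":
        # The standard library expects uppercase input with padding restored.
        return base64.b32decode(body.upper() + "=" * (-len(body) % 8))
    if prefix == "m":
        return base64.b64decode(body + "=" * (-len(body) % 4))
    raise ValueError(f"prefix not covered by this sketch: {prefix}")


for name in ("base16", "base32", "base64"):
    encoded = multibase_encode(b"hi", name)
    print(name, encoded, multibase_decode(encoded))
```

Because the decoder only looks at the first character, the same value can be re-encoded in another base without the receiver being told in advance.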

3. Multi-Addr

The purpose of the Multi-Addr component is to add self-describing information to address data. Multi-Addr comes in two versions: a human-readable UTF-8 version, which is shown to users, and a binary version (usually displayed in hexadecimal), which is convenient for network transmission.
A Multi-Addr also stores two kinds of information: the address type code and the encoded address value. Each Multi-Addr is a repeating sequence of type/value pairs, such as: /address type name/address/address type name/address

<UTF-8 Multi-Addr> ::= /<UTF-8 address type>/<UTF-8 address>

Multi-Addr has its own table of address types, which has been integrated into Multiformat.
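
The relationship between the readable version and the binary version can be sketched as follows. This is a simplified illustration that assumes only two address types, ip4 (code 4) and tcp (code 6), and assumes that every type is followed by a value. The function name is made up for this example, and real Multi-Addr libraries support many more types.

```python
import ipaddress
import struct

# Two entries from the public multiaddr table; both codes fit in one byte.
PROTOCOLS = {"ip4": 4, "tcp": 6}


def multiaddr_to_bytes(addr: str) -> bytes:
    """Convert a readable /type/value/... address into its binary form."""
    parts = addr.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError("this sketch expects /type/value pairs")
    out = bytearray()
    for name, value in zip(parts[0::2], parts[1::2]):
        out.append(PROTOCOLS[name])
        if name == "ip4":
            out += ipaddress.IPv4Address(value).packed  # 4 address bytes
        elif name == "tcp":
            out += struct.pack(">H", int(value))        # 2-byte big-endian port
    return bytes(out)


print(multiaddr_to_bytes("/ip4/127.0.0.1/tcp/4001").hex())  # 047f000001060fa1
```

The readable form takes 23 characters, while the binary form takes 8 bytes, which is why the binary version is preferred on the wire.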

4. Multi-Codec

Multi-Codec is a self-describing codec identifier designed to keep data compact.
In addition to codes for Multi-Hash, Multi-Addr, and Multi-Base, Multi-Codec also defines codes for the JSON file type, compression types, image types, and IPLD.
The format of Multi-Codec is as follows:

<Multi-Codec> ::= /<hexadecimal type>/<data content>

Multi-Codec is compatible with the other Multiformat types because its code table was designed to avoid codes that were already in use.
As mentioned earlier, Multi-Codec defines many types of data, including raw data, IPLD data, blockchain data, serialized data, and the other Multiformat types.
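
In the current multicodec table, each type is a numeric code that is written in front of the data as an unsigned varint, rather than between slashes. The following Python sketch shows that prefixing step. The helper names are invented for this example, and only three codes from the table are included (raw 0x55, dag-pb 0x70, dag-cbor 0x71).

```python
# Three entries from the public multicodec table.
CODECS = {"raw": 0x55, "dag-pb": 0x70, "dag-cbor": 0x71}


def varint_encode(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def varint_decode(buf: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    value = shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


def add_codec_prefix(name: str, payload: bytes) -> bytes:
    return varint_encode(CODECS[name]) + payload


def split_codec_prefix(buf: bytes) -> tuple[int, bytes]:
    code, used = varint_decode(buf)
    return code, buf[used:]


tagged = add_codec_prefix("raw", b"abc")
print(tagged.hex(), split_codec_prefix(tagged))
```

Codes below 128 cost a single byte, which is how Multi-Codec stays compact while still being self-describing.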

5. Multi-Stream

Multi-Stream is a self-describing stream protocol that is used to implement self-describing bit strings. It is mainly used for transmission over the network.
A Multi-Stream contains 3 fields: the stream length, the Multi-Codec type, and the encoded data itself. The fields are separated by two delimiters, "/" and a newline.
The format of Multi-Stream is as follows:

<Multi-Stream> ::= <stream length>/<Multi-Codec type>\n<encoded data>
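
The framing can be sketched in Python as follows. This sketch assumes that the length prefix covers the header line including its trailing newline, which is how multistream-select frames its messages, and it only handles headers shorter than 128 bytes so that the length fits in one byte. The function names are made up for this example.

```python
def frame(header: str) -> bytes:
    """Return <length><header>\\n, where length counts the header and the newline."""
    body = header.encode("utf-8") + b"\n"
    if len(body) >= 128:
        raise ValueError("this sketch only handles single-byte lengths")
    return bytes([len(body)]) + body


def unframe(buf: bytes) -> tuple[str, bytes]:
    """Return (header without newline, remaining bytes)."""
    length = buf[0]
    body = buf[1:1 + length]
    if len(body) != length or not body.endswith(b"\n"):
        raise ValueError("malformed frame")
    return body[:-1].decode("utf-8"), buf[1 + length:]


header = frame("/multistream/1.0.0")
print(header)  # b'\x13/multistream/1.0.0\n'
print(unframe(header + b"encoded data"))
```

The receiver reads the length, then the header, and only then decides how to interpret the data that follows.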

2. libp2p

libp2p is the most important module in the implementation of the IPFS protocol stack.
libp2p is responsible for network communication, routing, and exchange of IPFS data.
libp2p abstracts and bundles the utility features that almost every developer of a P2P application needs, including connection multiplexing between nodes, exchange of node information, designated relay nodes, network address translation (NAT) traversal, distributed hash table (DHT) addressing, and message round-trip time (RTT) statistics.

Contributors from Protocol Labs and the open source community have already implemented some of these features and have set long-term goals for the rest.

Use cases:
libp2p is a modular, easily extensible network stack library designed specifically for P2P applications. Its main application scenarios are the Internet of Things, blockchain, distributed messaging, and file transfer.
1. Internet of Things: in the familiar security scenario, a direct connection is established between the security camera and the mobile phone, which reduces the bandwidth pressure on the central server.
2. Blockchain: some blockchain projects use libp2p as their underlying network service, such as Filecoin and Polkadot. In Filecoin, libp2p is used for block data synchronization, file transfer, and node discovery.
3. Distributed messaging: a distributed messaging system can establish connections directly between nodes and send and receive messages without relaying them through a central server.
4. File transfer: both Filecoin and IPFS use libp2p for data transfer.
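
Of the features listed above, DHT addressing is the easiest to illustrate in a few lines. The following Python sketch shows the Kademlia-style idea of measuring the distance between two identifiers with XOR and picking the peers closest to a target. It is a toy model and not libp2p's API: the peer names and function names are hypothetical.

```python
import hashlib


def dht_key(identifier: bytes) -> int:
    """Map an identifier into the key space by hashing it with SHA-256."""
    return int.from_bytes(hashlib.sha256(identifier).digest(), "big")


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style distance: the XOR of the two keys, read as an integer."""
    return dht_key(a) ^ dht_key(b)


def closest_peers(target: bytes, peers: list[bytes], k: int = 2) -> list[bytes]:
    """Return the k peers whose keys are closest to the target's key."""
    return sorted(peers, key=lambda peer: xor_distance(peer, target))[:k]


peers = [b"peer-a", b"peer-b", b"peer-c"]  # hypothetical peer identifiers
print(closest_peers(b"some-content-key", peers))
```

A node that wants a piece of content asks the peers closest to the content key, and each answer leads it to peers that are closer still.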

3. IPLD

IPLD is an abstraction layer for data models based on content addressing. IPLD can link together the various data structures that rely on content addressing, such as blockchain data, Git, and BitTorrent.
[Figure: IPLD data model]
IPLD defines 3 data types: Merkle-Links, Merkle-DAG, and Merkle-Paths.
Merkle link
A Merkle link has two main functions:
1. Cryptographic integrity verification: users can verify the integrity of the data by hashing the target object and comparing the result with the link.
2. Immutable data structures: a data structure that contains Merkle links cannot be changed after it has been referenced.
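
Both functions can be demonstrated with a small Python sketch. This is a toy model rather than IPLD itself: a plain SHA-256 hex digest stands in for a real link, the in-memory store and the helper names are invented for this example, and objects are serialized as JSON with sorted keys so that equal objects always hash the same.

```python
import hashlib
import json

store: dict[str, bytes] = {}  # hypothetical in-memory block store


def put(obj: dict) -> str:
    """Store an object and return its link (the hash of its serialized form)."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    link = hashlib.sha256(data).hexdigest()
    store[link] = data
    return link


def get(link: str) -> dict:
    """Fetch an object and verify that its content still matches the link."""
    data = store[link]
    if hashlib.sha256(data).hexdigest() != link:
        raise ValueError("content does not match its link")
    return json.loads(data)


leaf = put({"text": "hello"})
parent = put({"child": {"/": leaf}})  # the parent refers to the leaf by its hash

print(get(parent))
print(put({"text": "hello!"}) == leaf)  # False: changed content gets a new link
```

Changing the leaf produces a different link, so the parent that refers to the old link keeps pointing at the old content. To refer to the new leaf, a new parent with a new link has to be created.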

Content Identifier (CID)
CID is a self-describing content-addressing identifier that uses hashing to implement content addressing.
CID currently has two versions, CIDv0 and CIDv1. CIDv0 only supports the default IPFS encoding rules and hash algorithm. CIDv1 supports a much wider range of hash algorithms and encoding rules.
1. CIDv1
CIDv1 contains 4 fields: the multibase type prefix, the CID version number, the multicodec content type identifier, and the complete multihash.
The format of CIDv1 is as follows:

<cidv1> ::= <multibase type><cid version><multicodec><multihash>

2. CIDv0
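CIDv0 is defined in the same way as CIDv1. However, in CIDv0 the first three fields are implied rather than written out: the Multi-Base type defaults to Base58, the CID version number defaults to 0, and the Multi-Codec defaults to Protobuf. A CIDv0 string is therefore just the Base58-encoded multihash.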
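
The two layouts can be compared with a short Python sketch. It is a simplified illustration with invented function names, and it assumes sha2-256 as the hash algorithm, base32 (prefix "b") as the Multi-Base encoding for CIDv1, and the codes 0x55 (raw) and 0x70 (dag-pb, the Protobuf format). It hashes the input bytes directly, so the results are not expected to match the identifiers that an IPFS node produces for a file, because a node first splits and wraps the data.

```python
import base64
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
RAW, DAG_PB = 0x55, 0x70  # multicodec codes


def sha2_256_multihash(data: bytes) -> bytes:
    digest = hashlib.sha256(data).digest()
    return bytes([0x12, len(digest)]) + digest


def base58btc(data: bytes) -> str:
    """Plain base58 (Bitcoin alphabet) encoding of a byte string."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    leading_zeros = len(data) - len(data.lstrip(b"\0"))
    return "1" * leading_zeros + out


def cid_v1(data: bytes, codec: int = RAW) -> str:
    """<multibase type><cid version><multicodec><multihash>, base32 encoded."""
    body = bytes([0x01, codec]) + sha2_256_multihash(data)
    return "b" + base64.b32encode(body).decode("ascii").lower().rstrip("=")


def cid_v0(data: bytes) -> str:
    """Only the multihash is stored; base58, version 0, and dag-pb are implied."""
    return base58btc(sha2_256_multihash(data))


print(cid_v1(b"hello ipfs"))  # starts with "bafkrei"
print(cid_v0(b"hello ipfs"))  # starts with "Qm"
```

This also explains why CIDv0 strings always begin with "Qm": they all start with the same two multihash bytes, 0x12 and 0x20.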

Summary

This article has briefly introduced three modules of IPFS: Multiformat (a library of self-describing format protocols), libp2p (a modular library of P2P network protocols), and IPLD (a data structure model library).

Origin blog.csdn.net/ggj89/article/details/122583098