An in-depth understanding of IPFS: the core of IPFS is a versioned file system

"After you have IPFS, you can start to look at everything else in a specific way, and then you will realize that you can replace them all."-
Juan Benet, founder of IPFS

01 A simple understanding of IPFS

This section attempts to give a high-level overview before the in-depth technical summary by Dr. Christian Lundkvist that follows.

IPFS was originally proposed by Juan Benet with the aim of building a system for moving versioned scientific data quickly. Version control lets you track the state of software over time (as Git does). Since then, IPFS has come to be seen as the distributed, permanent Web. "IPFS is a distributed file system designed to connect all computing devices with the same file system. In some ways, this is similar to the original aims of the Web, but IPFS is actually more similar to a single BitTorrent swarm exchanging Git objects. IPFS may become a new major subsystem of the Internet in the future. If it is built right, it could complement or replace HTTP, and even more. It sounds crazy. It is crazy."

At its core, IPFS is a versioned file system: it can take files and manage them, store them at a certain address, and then track their versions over time. IPFS also accounts for how these files move across the network, so it is a distributed file system as well.

IPFS has rules, similar to BitTorrent's, for how data and content move around the network. The file system layer provides some very interesting properties, such as:

- A fully distributed website
- A website without an origin server
- A website that can run entirely in the client browser
- A website that has no server to talk to

Content addressing

IPFS does not refer to objects (pictures, articles, videos) by the server on which they are stored. Instead, it refers to all content by the hash of the file. The idea is that if you want to visit a specific page in a browser, IPFS asks the entire network "Does anyone have the file corresponding to this hash?" A node on IPFS that has it can return the file so that you can access it.

IPFS uses content addressing at the HTTP layer. This is the practice of creating an identifier from a representation of the content itself, rather than an identifier that addresses things by location. In other words, the content determines the address. The mechanism is to take a file and hash it cryptographically, which gives a very small and secure representation of the file and ensures that nobody can simply come up with another file that has the same hash and use it as the address. The address of a file in IPFS usually starts with a hash that identifies a root object, followed by a path walking downward. Instead of talking to a server, you are talking to a specific object, and then you look at a path within that object.
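As a minimal sketch of the idea, the Python snippet below derives an address from the bytes of a file. It assumes a plain hex SHA-256 digest as the address, which is a simplification of the real IPFS hash format, and the page contents, the /ipfs/ prefix and the path inside the object are made up for illustration.

```python
import hashlib

def content_address(content: bytes) -> str:
    """Derive an address from the content itself (hex SHA-256, a simplification)."""
    return hashlib.sha256(content).hexdigest()

page = b"<h1>Hello IPFS</h1>"
same_page_elsewhere = b"<h1>Hello IPFS</h1>"
tampered_page = b"<h1>Hello IPFS!</h1>"

address = content_address(page)

# Identical bytes always yield the identical address, wherever they are stored.
assert content_address(same_page_elsewhere) == address
# Any change to the bytes yields a different address.
assert content_address(tampered_page) != address

# An address is a root hash followed by a path inside that object
# (the prefix and the path are hypothetical here).
print(f"/ipfs/{address}/images/logo.png")
```

Because the address is computed from the bytes, anyone who receives the file can recompute the hash and check that they were given the right content.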

HTTP vs IPFS: finding and retrieving files

HTTP has a nice property: the identifier is the location, so it is easy to find the computer hosting the file and talk to it. This is useful and usually works well, but not in the offline case or in large distributed scenarios where you want to minimize the load on the entire network.

In IPFS, these steps are split into two parts: identify the file with content addressing, then go and find it. Once you have the hash, you ask the network you are connected to "who has this content (hash)?", then connect to the corresponding nodes and download it. The result is a peer-to-peer overlay that gives you very fast routing.
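The two steps can be sketched as a toy in-memory network. This is only an illustration under stated assumptions: the peer names are hypothetical, the lookup simply loops over every peer (a real network would use distributed routing such as a DHT), and a hex SHA-256 digest stands in for the IPFS hash.

```python
import hashlib

def content_address(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

# Hypothetical peers, each holding some content keyed by its hash.
peers = {"peer-a": {}, "peer-b": {}}
doc = b"an article stored somewhere on the network"
peers["peer-b"][content_address(doc)] = doc

def find_providers(wanted: str) -> list[str]:
    """Step 1: ask the network who has the content for this hash."""
    return [name for name, store in peers.items() if wanted in store]

def fetch(wanted: str) -> bytes:
    """Step 2: download from a provider and verify the bytes against the hash."""
    for name in find_providers(wanted):
        content = peers[name][wanted]
        if content_address(content) == wanted:
            return content
    raise KeyError(f"no peer provides {wanted}")

print(find_providers(content_address(doc)))  # ['peer-b']
assert fetch(content_address(doc)) == doc
```

Note that the requester never needs to trust a particular peer: whichever node answers, the downloaded bytes either match the requested hash or are rejected.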

02 IPFS example

On technical inspection, IPFS (InterPlanetary File System) is a combination of well-tested Internet technologies such as DHTs, the Git version control system and BitTorrent. It creates a P2P swarm that can exchange IPFS objects. The totality of IPFS objects forms a cryptographically authenticated data structure called a Merkle DAG, which can be used to model many other data structures. We will introduce IPFS objects and the Merkle DAG in this article, and give examples of structures that can be modeled using IPFS.

IPFS objects

IPFS is essentially a P2P system for retrieving and sharing IPFS objects. The IPFS object is a data structure with two fields:

Data - a blob of unstructured binary data smaller than 256 kB.
Links - an array of link structures. These are links to other IPFS objects.

The link structure has three data fields:

Name - the name of the link.
Hash - the hash of the linked IPFS object.
Size - the cumulative size of the linked IPFS object, including everything reached by following its links.

The Size field is mainly used to optimize P2P networking, and we will mostly ignore it here, since conceptually it is not needed for the logical structure.
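The two structures described above can be written down as a small Python sketch. The class and method names are illustrative rather than part of any IPFS library, and the size calculation ignores the serialization overhead that a real implementation would add.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    name: str  # the name of the link
    hash: str  # the hash of the linked IPFS object
    size: int  # cumulative size of the linked object, including what it links to

@dataclass
class IPFSObject:
    data: bytes = b""
    links: list[Link] = field(default_factory=list)

    def cumulative_size(self) -> int:
        """Own data plus the cumulative size recorded on each link."""
        return len(self.data) + sum(link.size for link in self.links)

leaf = IPFSObject(data=b"Hello World!\n")
parent = IPFSObject(
    links=[Link(name="hello.txt", hash="<hash of leaf>", size=leaf.cumulative_size())]
)
print(parent.cumulative_size())  # 13
```

Because every link carries the cumulative size of its target, a node can tell how much data sits below an object without fetching any of it.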

IPFS objects are usually referenced by their Base58-encoded hash. For example, let's use the IPFS command line tool to view the IPFS object with the hash QmarHSr9aSNaPSR6G9KFPbuLV9aEqJfTk1y9B8pdwqK4Rq (please try it at home):

Readers may notice that all hashes start with "Qm". This is because the hash is actually a multihash, meaning that the hash itself specifies the hash function and the hash length in its first two bytes. In the example above, the first two bytes in hexadecimal are 1220, where 12 denotes the SHA256 hash function and 20 is the length of the hash in bytes (0x20 = 32 bytes).

The data and the named links give the collection of IPFS objects the structure of a Merkle DAG. DAG stands for directed acyclic graph, and Merkle signifies a cryptographically authenticated data structure that uses cryptographic hashes to address content. It is left as an exercise for the reader to work out why this graph can contain no cycles.

To visualize the graph structure, we draw each IPFS object as a graph node containing its data. The links are directed graph edges to other IPFS objects, and the name of each link is the label on its edge. The example above is visualized as follows:
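The two-byte prefix can be checked by decoding the example hash by hand. The sketch below uses a small hand-written Base58 decoder with the Bitcoin-style alphabet so that it needs no third-party package; the function name base58_decode is our own.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_decode(text: str) -> bytes:
    """Decode a Base58 string into bytes."""
    number = 0
    for char in text:
        number = number * 58 + ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading "1" character encodes one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body

raw = base58_decode("QmarHSr9aSNaPSR6G9KFPbuLV9aEqJfTk1y9B8pdwqK4Rq")
function_code, digest_length, digest = raw[0], raw[1], raw[2:]

# 0x12 identifies SHA256 and 0x20 is the digest length (32 bytes).
print(hex(function_code), hex(digest_length), len(digest))  # 0x12 0x20 32
```

The fixed prefix 0x12 0x20 in front of a 32-byte digest is what makes the Base58 text always begin with "Qm".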

Now we will illustrate the various data structures that can be represented by IPFS objects.

File system

IPFS can easily represent a file system composed of files and directories.

Small file

A small file (< 256 kB) is represented by a single IPFS object whose data is the content of the file (plus a small header and footer). It has no links, that is, the link array is empty. Note that the file name is not part of the IPFS object, so two files with different names and the same content have the same IPFS object representation and therefore the same hash.

We can use the command ipfs add to add a small file to IPFS:

We can use ipfs cat to view the file content of the above IPFS object:

Use ipfs object to view the underlying structure:

We visualize the file as follows:

Large file

A large file (> 256 kB) is represented by a list of links to file chunks that are each smaller than 256 kB, plus only minimal data specifying that the object represents a large file. The names of the links to the file chunks are empty strings.
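A rough sketch of this chunking scheme follows. It is a simplified model, not the real IPFS encoding: objects live in a Python dict, the hash is a SHA-256 over an ad hoc serialization, the chunk size is taken as exactly 256 * 1024 bytes, and the b"large-file" marker is a hypothetical stand-in for the minimal data mentioned above.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # block limit taken from the text

def put(store, data, links):
    """Store a simplified object under a hash of its data and links."""
    hasher = hashlib.sha256(data)
    for name, target, size in links:
        hasher.update(f"|{name}|{target}|{size}".encode())
    key = hasher.hexdigest()
    store[key] = (data, links)
    return key

def add_file(store, content):
    """Return the root hash for a file, chunking it if it is large."""
    if len(content) <= CHUNK_SIZE:
        return put(store, content, [])  # small file: no links
    links = []
    for start in range(0, len(content), CHUNK_SIZE):
        chunk = content[start:start + CHUNK_SIZE]
        links.append(("", put(store, chunk, []), len(chunk)))  # empty link name
    return put(store, b"large-file", links)  # hypothetical marker data

def cat(store, key):
    """Reassemble a file by following its chunk links in order."""
    data, links = store[key]
    if not links:
        return data
    return b"".join(cat(store, target) for _, target, _ in links)

store = {}
big = bytes(range(256)) * 3000  # 768,000 bytes, so three chunks
root = add_file(store, big)
assert cat(store, root) == big
print(len(store[root][1]))  # 3
```

Reading the file back only requires the root hash: the chunk hashes and their order are all recorded in the root object's links.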

Directory Structure

A directory is represented by a list of links pointing to IPFS objects that represent files or other directories. The names of the links are the names of the files and directories. For example, consider the following directory structure of the directory test_dir:

The files hello.txt and my_file.txt both contain the string Hello World!\n. The file testing.txt contains the string Testing 123\n.

When representing this directory structure as an IPFS object, it looks like this:

Note that the files containing Hello World!\n are deduplicated automatically: the data in these files is stored in only one logical location in IPFS (addressed by its hash).
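The deduplication can be reproduced with a small in-memory model. The sketch below is an assumption-laden toy: objects are stored in a dict keyed by a SHA-256 over an ad hoc serialization, and the nesting of the three files under test_dir is a hypothetical layout (only the file names and their contents come from the text).

```python
import hashlib

def put(store, data, links):
    """Store a simplified object under a hash of its data and named links."""
    hasher = hashlib.sha256(data)
    for name, target in sorted(links.items()):
        hasher.update(f"|{name}|{target}".encode())
    key = hasher.hexdigest()
    store[key] = (data, dict(links))
    return key

def add_tree(store, tree):
    """Add nested dicts (directories) and bytes (files); return the root hash."""
    if isinstance(tree, bytes):
        return put(store, tree, {})
    return put(store, b"", {name: add_tree(store, child) for name, child in tree.items()})

def resolve(store, key, path):
    """Follow link names from a root object down to a file and return its data."""
    for part in path.split("/"):
        key = store[key][1][part]
    return store[key][0]

store = {}
test_dir = {  # hypothetical layout
    "hello.txt": b"Hello World!\n",
    "my_dir": {
        "my_file.txt": b"Hello World!\n",
        "testing.txt": b"Testing 123\n",
    },
}
root = add_tree(store, test_dir)

hello = store[root][1]["hello.txt"]
my_file = store[store[root][1]["my_dir"]][1]["my_file.txt"]
assert hello == my_file  # same content, same object, stored once
print(len(store))  # 4
print(resolve(store, root, "my_dir/testing.txt"))  # b'Testing 123\n'
```

Three files and two directories produce only four stored objects, because the two files with identical content map to the same hash.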

The IPFS command line tool can seamlessly follow the directory link name to traverse the file system:

Versioned file system

IPFS can represent the data structures used by Git for versioned file systems. The Git commit object is described in the Git Book. At the time of writing, the structure of IPFS commit objects has not yet been fully specified, and discussions are still ongoing.

The main properties of a commit object are that it has one or more links named parent0, parent1, etc., pointing to previous commits, and one link named object (called tree in Git) pointing to the file system structure referenced by that commit.

As an example, take the previous file system directory structure and two commits: the first commit is the original structure, and in the second commit we have updated the file my_file.txt to say "Another World!" instead of the original "Hello World!".

Note again that we get automatic deduplication, so the only new objects in the second commit are the top-level directory, the new directory my_dir and the updated file my_file.txt.
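The two-commit example can be modeled with the same kind of in-memory toy. Everything here is an illustrative assumption: the dict-based store, the ad hoc hashing, the b"commit" marker, the directory layout and the exact replacement text of my_file.txt; only the link names parent0 and object follow the description above.

```python
import hashlib

def put(store, data, links):
    """Store a simplified object under a hash of its data and named links."""
    hasher = hashlib.sha256(data)
    for name, target in sorted(links.items()):
        hasher.update(f"|{name}|{target}".encode())
    key = hasher.hexdigest()
    store[key] = (data, dict(links))
    return key

def add_tree(store, tree):
    """Add nested dicts (directories) and bytes (files); return the root hash."""
    if isinstance(tree, bytes):
        return put(store, tree, {})
    return put(store, b"", {name: add_tree(store, child) for name, child in tree.items()})

def commit(store, tree, parents=()):
    """Create a commit with an 'object' link and parent0, parent1, ... links."""
    links = {"object": add_tree(store, tree)}
    for index, parent in enumerate(parents):
        links[f"parent{index}"] = parent
    return put(store, b"commit", links)

store = {}
first = commit(store, {
    "hello.txt": b"Hello World!\n",
    "my_dir": {"my_file.txt": b"Hello World!\n", "testing.txt": b"Testing 123\n"},
})
known = set(store)

second = commit(store, {
    "hello.txt": b"Hello World!\n",
    "my_dir": {"my_file.txt": b"Another World!\n", "testing.txt": b"Testing 123\n"},
}, parents=[first])

# Besides the commit object itself, only three objects are new:
# the top-level directory, my_dir and the updated my_file.txt.
new_objects = set(store) - known - {second}
print(len(new_objects))  # 3
assert store[second][1]["parent0"] == first
```

Unchanged files such as hello.txt and testing.txt keep their hashes, so the second commit simply links to the objects that already exist.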

03 Blockchain

This is one of the most exciting use cases for IPFS. A blockchain has a natural DAG structure, because past blocks are always linked to, by their hash, from later blocks. More advanced blockchains such as Ethereum also have an associated state database, which has a Merkle-Patricia tree structure and can also be modeled using IPFS objects.

We assume a simple blockchain model, where each block contains the following data:

a list of transaction objects;
a link to the previous block;
the hash of a state tree/database.

The blockchain can then be modeled in IPFS as follows:

When putting the state database on IPFS, we see the benefit of deduplication: between two blocks, only the state entries that have changed need to be stored explicitly.

An interesting point here is the difference between storing data on the blockchain and storing hashes of data on the blockchain. On the Ethereum platform, you pay a considerable fee to store data in the associated state database, in order to limit the bloat of the state database (blockchain bloat). It is therefore a common design pattern, for larger pieces of data, to store in the state database not the data itself but an IPFS hash of the data.

If a blockchain with an associated state database is already represented in IPFS, the difference between storing a hash on the blockchain and storing the data on the blockchain becomes somewhat blurred, because everything is stored in IPFS anyway and the hash of the block only needs the hash of the state database. In this case, if someone stores an IPFS link in the blockchain, we can seamlessly follow the link to access the data, just as if the data were stored in the blockchain itself.

However, we can still distinguish between on-chain and off-chain data storage. We do this by looking at what miners need to process when creating a new block. In the current Ethereum network, miners need to process transactions that update the state database. To do this, they need access to the complete state database so that they can update it wherever it changes.

Therefore, in a blockchain state database represented in IPFS, we would still need to mark data as "on-chain" or "off-chain". "On-chain" data is what miners need to hold locally in order to mine, and it is directly affected by transactions. "Off-chain" data has to be updated by users, and miners do not need to touch it.
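The simple block model and its deduplication can be sketched in a few lines. This is a toy under stated assumptions: objects are JSON values in a dict keyed by SHA-256, the state is a flat mapping with one object per entry rather than a Merkle-Patricia tree, and the account names and balances are invented.

```python
import hashlib
import json

def put(store, value):
    """Store a JSON-serializable value under the hash of its canonical encoding."""
    encoded = json.dumps(value, sort_keys=True).encode()
    key = hashlib.sha256(encoded).hexdigest()
    store[key] = value
    return key

def put_state(store, balances):
    """Store each state entry as its own object; the state root links to them."""
    return put(store, {
        account: put(store, {"account": account, "balance": balance})
        for account, balance in balances.items()
    })

def put_block(store, transactions, state, previous=None):
    """A block: a list of transaction links, a previous-block link, a state hash."""
    return put(store, {
        "transactions": [put(store, tx) for tx in transactions],
        "previous": previous,
        "state": put_state(store, state),
    })

store = {}
genesis = put_block(store, [], {"alice": 10, "bob": 5, "carol": 7})
before = set(store)

block1 = put_block(
    store,
    [{"from": "alice", "to": "bob", "amount": 3}],
    {"alice": 7, "bob": 8, "carol": 7},
    previous=genesis,
)

# New objects: one transaction, two changed state entries, the new state root
# and the block itself. Carol's unchanged entry is reused.
added = set(store) - before
print(len(added))  # 5
assert store[block1]["previous"] == genesis
```

The unchanged entry keeps its hash across blocks, so both state roots link to the same stored object and nothing is written twice.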


Origin blog.csdn.net/weixin_49795899/article/details/114695080