Image distribution in the cloud native era: an introduction to Dragonfly

Welcome to follow the WeChat official account "Cloud Native Notes"

Background

What I'm going to share today is Dragonfly, an open-source image distribution tool from Alibaba. You may have first heard of it while researching large-scale container image distribution solutions. The tool was originally built to solve the image distribution problem, and that is still what it is used for today, but going forward it is positioned as an enterprise-grade file distribution tool.

The key to how Dragonfly solves large-scale container image distribution is P2P, i.e. peer-to-peer, technology. First, a little background on image distribution. As everyone knows, image registries are usually built simply with the registry project, or with the enterprise-grade Harbor. Under normal use both tools are fine, but under a large-scale pull (say, 1000 nodes pulling images at once), Harbor may very well be overwhelmed and crash, or take so long to serve files that many applications sit waiting. Some will say Harbor can be scaled horizontally, but that is not a permanent cure; besides, whether Harbor's backing storage (such as S3) can keep up is another question entirely. Against this background, Alibaba launched Dragonfly and Uber launched Kraken. This article focuses on Dragonfly; Kraken will be covered when the opportunity arises.

The principle of Dragonfly

The key to Dragonfly's large-scale image distribution is P2P technology, so how does it actually solve the problem? Breaking large-scale container image distribution down a little further, there are several sub-problems to address:

  • 1000 nodes all have to pull images from the registry. Even if the registry service does not crash, the time to fetch an image becomes very long, as the performance figure below shows.
  • Among 1000 nodes pulling from the registry, many may be pulling the same image, or at least the same layer files. So one sub-problem is how to avoid repeatedly pulling identical image/layer files. By analogy with programming: if the same code appears two or three times, it should be extracted into a function.
  • Under highly concurrent image pulls, the registry's backing storage may collapse. That is determined by the storage system, which has its own bottleneck; scaling Harbor horizontally does not reduce the pressure on the storage.

For the above problems, Dragonfly's strategy is as follows:

  • never pull the same image/layer file repeatedly, removing unnecessary concurrent pressure on Harbor;
  • use client nodes that have already pulled a file as storage, serving the file to others and reducing the pressure on Harbor's storage;
  • fetch layer files from other client nodes instead of from the registry, which also speeds up file retrieval.

So how effective is Dragonfly at large-scale image distribution?
With Dragonfly, no matter how many clients start downloading, the average download time barely increases (12s in the experiment below, meaning every client completes the file/image download within about 12s).
With wget, the download time keeps growing as clients are added; when the number of wget clients reaches 1200 (in the experiment below), the file source crashes and can no longer serve any client.
Judging from the statistics, Dragonfly clearly speeds up image retrieval. These are Dragonfly's official numbers, sampled from 100 nodes upward, and the improvement is obvious: concurrent pulls by 100 or more nodes are exactly where it pays off.

Test environment (the statistical results are shown in the figure below):
  • Dragonfly server: 2 × (24 cores, 64GB RAM, 2000Mb/s)
  • File source server: 2 × (24 cores, 64GB RAM, 2000Mb/s)
  • Client: 4 cores, 8GB RAM, 200Mb/s
  • Target file size: 200MB

[Figure: average download time vs. number of concurrent clients, Dragonfly vs. wget]

Concepts in Dragonfly

Before going deeper into Dragonfly, we need to introduce a few of its concepts:

  • SuperNode: a long-running process that mainly provides the following two functions:
    • It schedules the network path along which each peer node downloads pieces (you can think of the result as a seed file that tells the client which nodes to fetch which pieces from). The SuperNode acts as scheduler and tracker: each peer reports which piece files it holds, so the SuperNode knows the piece inventory of every peer under it.
    • It acts as a CDN server, caching data pulled from the source so that the same data is not downloaded from the source repeatedly. Before downloading a file, dfget registers with the SuperNode and tells it what it wants to download; the SuperNode then immediately starts fetching those target files from the source.
  • Dfget: Dragonfly's file-fetching tool, responsible for downloading file data, similar to wget. It also plays the peer role (the dfget server command starts the peer service): a node in the peer role can transmit data to other clients in the P2P network.
  • Dfdaemon: a proxy between the container engine (the docker daemon) and the registry (registry or Harbor), also a local long-running process. When an image is pulled, it intercepts the requests sent by the docker daemon: requests for non-layer files are forwarded directly, while requests for layer files are handled by invoking dfget to download them. For dfdaemon to take effect, docker must be configured with proxy parameters pointing at it.
  • P2P: peer-to-peer, a distributed application architecture.
  • Task: a task stores meta-information about the taskFile, its pieces, and so on. A task corresponds one-to-one, via its task ID, with a file on disk. To download a file from the SuperNode, you first register a task: you tell the server what file you intend to download before actually doing it.
  • DfgetTask: represents a download process initiated by dfget or another client. When dfget tries to download a file from the P2P network, the SuperNode creates a DfgetTask object to manage the life cycle of that download.
  • Peer: in a P2P network, peer nodes are both providers and consumers of resources. Before dfget starts a download task from the SuperNode, it starts a web server that serves its downloaded files to other peers, and sends a peer/register request to the SuperNode to join the P2P network. Only then can dfget download files from the P2P network.
  • Piece: one part of the file being downloaded; think of it as a chunk. In Dragonfly, files are never transmitted whole, but piece by piece. (The sketch after this list shows these concepts as illustrative Go types.)
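
To make the relationships concrete, here is a minimal sketch of these concepts as Go types. The names and fields are assumptions chosen for illustration; they are not Dragonfly's actual source.

package concepts

import "time"

// Task is registered with the SuperNode before a download starts; its ID
// corresponds one-to-one with a file on disk.
type Task struct {
	ID        string // derived from the target file's information
	FileURL   string // the source file, e.g. a registry blob URL
	FileSize  int64
	PieceSize int32
}

// Peer is both a provider and a consumer of pieces in the P2P network.
type Peer struct {
	IP       string
	Port     int // the port of the local peer web server started by dfget
	LastSeen time.Time
}

// Piece is one chunk of a file; files are always transferred piece by piece.
type Piece struct {
	TaskID  string
	Index   int
	Holders []Peer // peers that currently hold this piece, as known to the SuperNode
}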

How Dragonfly works

Dragonfly has three components: supernode, dfdaemon, and dfget. Each is explained below.

dfdaemon

What exactly does dfdaemon do? What role does it play?

On nodes that use Dragonfly, docker's proxy parameters are configured as follows:

# Edit the docker systemd unit and add the proxy environment variable
vi /usr/lib/systemd/system/docker.service

[Service]
Environment="HTTP_PROXY=http://10.154.12.120:65001"

# Reload systemd and restart docker for the proxy setting to take effect
systemctl daemon-reload && systemctl restart docker

The address http://10.154.12.120:65001 is the dfdaemon service. Generally, each node runs one dfdaemon, and that node's docker proxy parameter points at it. The effect of docker's proxy setting is that every request sent by the docker daemon goes to http://10.154.12.120:65001, i.e. the dfdaemon service, which then processes it.

Interception by dfdaemon

[Figure: dfdaemon intercepting the docker daemon's requests]

dfdaemon does not intercept every request the docker daemon sends. It only intercepts requests matching blobs/sha256.*; all other requests are simply forwarded. In other words, during an image pull, dfdaemon takes over only the requests that fetch the image's blob (layer) files.
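
As a rough illustration of that filter, the following Go snippet applies the same blobs/sha256.* pattern to a request path. The function name and structure are mine for explanation, not dfdaemon's actual code:

package main

import (
	"fmt"
	"regexp"
)

// Only paths that look like blob (layer) fetches are intercepted;
// everything else is forwarded untouched.
var layerPattern = regexp.MustCompile(`blobs/sha256.*`)

func shouldIntercept(path string) bool {
	return layerPattern.MatchString(path)
}

func main() {
	fmt.Println(shouldIntercept("/v2/library/rabbitmq/blobs/sha256:04f8f8...")) // true: handled via dfget
	fmt.Println(shouldIntercept("/v2/library/rabbitmq/manifests/latest"))       // false: forwarded as-is
}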

What happens after dfdaemon intercepts

dfdaemon intercepts the docker daemon's requests, but only blob file requests receive special handling; everything else is merely forwarded. For example, when docker pulls an image it first needs the manifest; dfdaemon forwards that request directly and returns the result to the docker daemon. After obtaining the manifest, the docker daemon requests the blob files, and these requests are intercepted by dfdaemon for different treatment. Originally, the docker daemon fetched blob files straight from the registry or Harbor; now dfdaemon intercepts the request and obtains the blob itself. So how does dfdaemon get the blob file? In fact, it does not fetch the blob directly; it invokes the dfget command instead. The executed dfget command looks like this:

"/opt/dragonfly/df-client/dfget" "-u" "http://10.154.12.121:7999/v2/library/rabbitmq/blobs/sha256:04f8f8815c88ec1ed64a013b039aef36b2ebc09c66101a35a916a2f73bab6ae3" "-o" "/root/.small-dragonfly/dfdaemon/data/5e096c2a-a93c-42c1-a2b6-9f88585a3d92" "--node" "10.142.113.43" "--expiretime" "30m0s" "--alivetime" "5m0s" "-f" "Expires&Signature" "--dfdaemon" "-s" "20MB" "--totallimit" "20MB" "--node" "10.154.12.127,10.154.12.127" "--header" "User-Agent:docker/18.09.9 go/go1.11.13 git-commit/039a7df kernel/4.14.78-300.el7.bclinux.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.2 \\(linux\\))" "--header" "X-Forwarded-For:10.154.12.127"

dfdaemon hands the task of pulling the blob file over to dfget. After the dfget command finishes, dfdaemon reads the downloaded blob locally (dfget saves the file locally once it has fetched it) and returns the file's content to the docker daemon, which thereby gets its blob file.
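
Conceptually, dfdaemon's handling of an intercepted blob request looks something like the sketch below: shell out to dfget, then stream the downloaded file back. This is a simplified illustration under my own naming, not dfdaemon's real implementation:

package proxy

import (
	"io"
	"net/http"
	"os"
	"os/exec"
)

// serveBlobViaDfget runs dfget to download a blob to outPath, then streams
// the file's content back to the docker daemon.
func serveBlobViaDfget(w http.ResponseWriter, blobURL, outPath string) error {
	// Delegate the actual download to dfget (flags abbreviated; see the
	// full command line above).
	cmd := exec.Command("/opt/dragonfly/df-client/dfget", "-u", blobURL, "-o", outPath)
	if err := cmd.Run(); err != nil {
		return err // a real proxy would fall back or return an error status here
	}
	f, err := os.Open(outPath) // dfget has saved the blob locally
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(w, f) // return the file content to the docker daemon
	return err
}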

dfget

As mentioned above, dfdaemon uses dfget to obtain blob files. So how does dfget obtain them, and how does that differ from the docker daemon fetching files directly?

The dfget blob-fetching process

In fact, the current version of Dragonfly is used mainly for image distribution, and by default only blob file requests are intercepted and processed. Dragonfly's dfget is responsible for actually obtaining the blob files. The figure below shows the whole flow of dfget fetching a blob:
[Figure: the full flow of dfget fetching a blob file]

  • When the dfget command starts, it first registers with the supernode: it reports which blob file it needs locally and announces the peer service it has started, including its address and service port. The supernode returns a taskId to dfget. One detail worth noting here: a dfclient may run multiple peer services locally, and each peer service occupies a port, so this deserves attention when running inside Kubernetes.
  • dfget then requests the supernode again with the taskId to obtain the seed file for the blob. The seed file says which pieces to fetch from which peer nodes; once all the pieces are downloaded, they can be assembled into the complete blob file (see the sketch after this list). The blob is first stored in a temporary directory and then moved to the correct target directory (where dfdaemon reads it and returns it to the docker daemon).
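
Here is a simplified sketch of that piece-by-piece download loop. The types and the sequential-append simplification are mine; the real dfget downloads pieces concurrently and verifies them:

package client

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// PieceRef says where one chunk of the target file can be fetched,
// as listed in the seed file returned by the supernode.
type PieceRef struct {
	Index   int
	PeerURL string // a peer's local web server, e.g. http://<peer-ip>:<port>/...
}

// downloadPieces fetches every piece from its peer and appends it to a
// temporary file, yielding the complete blob.
func downloadPieces(pieces []PieceRef, tmpPath string) error {
	out, err := os.Create(tmpPath)
	if err != nil {
		return err
	}
	defer out.Close()
	for _, p := range pieces {
		resp, err := http.Get(p.PeerURL)
		if err != nil {
			return fmt.Errorf("piece %d: %w", p.Index, err)
		}
		_, err = io.Copy(out, resp.Body)
		resp.Body.Close()
		if err != nil {
			return err
		}
		// The real dfget would also verify the piece and report the
		// successful download back to the supernode here.
	}
	return nil
}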

Note: you might spot a problem here. If the supernode goes down, does dfget fail to fetch the blob? This case has been considered and handled in the source code: if dfget cannot obtain the complete data through Dragonfly, it goes directly to the source registry for the blob file and saves it in the temporary directory.
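
Continuing the illustrative sketch above (reusing PieceRef and downloadPieces), the fallback might look like this: try the P2P path first, and if it fails, fetch the blob straight from the source into the temporary directory. Again, this is an assumption-laden sketch, not the actual source:

// fetchWithFallback belongs to the same illustrative package as downloadPieces above.
func fetchWithFallback(pieces []PieceRef, sourceURL, tmpPath string) error {
	if err := downloadPieces(pieces, tmpPath); err == nil {
		return nil // the P2P network served the whole file
	}
	resp, err := http.Get(sourceURL) // fall back to the source registry
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	out, err := os.Create(tmpPath)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}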

supernode

The supernode plays the central scheduling role in the whole Dragonfly system. dfget needs the seed file from the supernode; if the supernode is gone, dfget can only fall back to the original way of fetching blobs, i.e. plain requests to the source. So what does the supernode actually do:

  • It registers itself as a peer service too, and uses nginx as its file server.
  • It provides interfaces for registration, seed file retrieval, and node status reporting, plus the related monitoring interfaces (sketched below).
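
As a skeleton, the supernode's service surface might be wired up like this. The handler names and URL paths are placeholders I made up to illustrate the shape of the API; the article only tells us that registration, seed retrieval, status reporting, and monitoring exist:

package supernode

import "net/http"

func registerPeer(w http.ResponseWriter, r *http.Request)  {} // dfget joins the P2P network, gets a taskId
func serveSeed(w http.ResponseWriter, r *http.Request)     {} // hand out the piece -> peer mapping
func reportStatus(w http.ResponseWriter, r *http.Request)  {} // peers report finished pieces
func handleMetrics(w http.ResponseWriter, r *http.Request) {} // monitoring

func routes(mux *http.ServeMux) {
	mux.HandleFunc("/peer/register", registerPeer)
	mux.HandleFunc("/peer/seed", serveSeed)
	mux.HandleFunc("/peer/piece/status", reportStatus)
	mux.HandleFunc("/metrics", handleMetrics)
}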

Registration interface

We saw above that every time dfget starts, it registers with the supernode and reports its local peer service information. What does the supernode do inside this registration interface?

  • dfget's registration request contains the information of the file it wants to download, which the supernode parses out;
  • the supernode derives a unique taskId from that file information and returns it to the client;
  • the supernode obtains the expected file size and computes the piece size;
  • the supernode downloads the requested file asynchronously:
    • if the file already exists locally, it stops there;
    • if not, it fetches the file from the source registry and stores it locally on the supernode.
  • once dfget has the taskId, it requests the supernode again for the seed file. The supernode builds the seed file from the file's piece size and piece count: for each piece, which peer node to download it from. For example, if a blob file is 196MB and each piece is 4MB, there are 49 pieces in total, and the supernode tells dfget where to download each of those 49 pieces (see the arithmetic sketch after this list);
  • after downloading each piece, dfget reports the download status to the supernode, so the supernode always knows which pieces are available on the peers under its jurisdiction.
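
The piece arithmetic from the example above, as a quick check (illustrative; the piece size may differ in practice):

package main

import "fmt"

func main() {
	const pieceSize = int64(4) << 20                 // 4MB per piece, as in the example
	fileSize := int64(196) << 20                     // a 196MB blob
	pieces := (fileSize + pieceSize - 1) / pieceSize // round up to whole pieces
	fmt.Printf("%d pieces\n", pieces)                // prints: 49 pieces
}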

[Figure: interaction diagram of the Dragonfly components]

The figure above is an interaction diagram of Dragonfly that I drew; the interactions are explained below. First, the deployment environment:

  • the supernode is deployed with docker on 1 node;
  • dfclient is deployed with docker on 2 nodes. A dfclient consists of dfdaemon and dfget, and docker on both nodes is configured with the proxy parameters.

The flow for fetching a blob file:

  • docker pull is executed on a node;
  • the node's docker daemon forwards its requests to the local dfdaemon;
  • dfdaemon decides, based on its filter, whether to forward or intercept each request;
  • blob file requests are intercepted by dfdaemon;
  • dfdaemon invokes dfget to fetch the blob file;
  • dfget registers itself as a peer;
  • dfget requests the seed file from the supernode;
  • dfget fetches the blob according to the seed file and saves it in the temporary directory;
  • dfget reports the download status to the supernode;
  • the blob is moved from the temporary directory to the target directory, and dfget finishes;
  • dfdaemon reads the file content from the target directory and returns it to the docker daemon;
  • once one blob file is done, the next one begins.

Summary

Advantages of Dragonfly:

  • it clearly speeds up image pulls, and with many clients the improvement is dramatic, so the advantages need no further elaboration.

Risk points of Dragonfly:

  • Dragonfly writes a lot of data to disk and reads a lot of it back, so it places high demands on disk performance.
  • This frequent writing to and reading from disk is what stands in the way of further performance gains.
  • Because local disks are used, disk capacity management also matters and can become a burden, although Dragonfly does clean the disk periodically and by quota.

Note: Dragonfly suits large-scale concurrency scenarios. If your usage scenario does not involve something like 100 nodes pulling images concurrently, it is recommended not to use Dragonfly; a horizontally scaled Harbor is sufficient. After all, Dragonfly brings a certain amount of extra complexity.

The risk points above are to be resolved in Dragonfly 2.0, which will offer a stream mode; Dragonfly 2.0 will also have a preheating feature. In short, the development of this component has just gotten onto the right track, and the road ahead is still long.

Origin: blog.csdn.net/u013276277/article/details/113102551