Introduction to the HULK Container Image Registry

 Zhang Zhifei  360 Cloud Computing

Heroine declaration

An image registry, as the name suggests, stores images. The concept of a Docker repository is similar to Git's, and the registry server can be thought of as a hosting service like GitHub. A user builds an image and pushes it to the registry, so that the next time the image is needed on another machine it only has to be pulled from the registry. This article mainly introduces Harbor, the image registry used by HULK.


What is Harbor

Harbor is an open-source container image registry from VMware: an enterprise-grade registry server for storing and distributing Docker images.

Harbor mainly provides enterprise-level management functions. Image storage itself is handled by the Docker registry, so Harbor effectively acts as a reverse proxy in front of the Docker registry.

1

Harbor architecture

[Figure: Harbor architecture diagram]

As shown in the figure above, Harbor is composed of 6 components:

  1. Proxy: an nginx reverse proxy. The figure above comes from the official website and is out of date: at present, all requests to Harbor must pass through the UI, including the Proxy -> Registry path shown in the figure.

  2. Registry: responsible for storing Docker images and handling Docker push/pull commands. Because Harbor needs to enforce access control on images, the registry directs the client to the token service to obtain a valid token for each pull or push request.

  3. Core services: Harbor's core functions, mainly providing the following services:

    1. UI: provides the web management console, comprising the front-end pages and the back-end API.

    2. Webhook: configured in the Registry; image replication and log updates are both implemented through this mechanism.

    3. Token service: issues tokens. If a request sent by the Docker client carries no token, the registry redirects the request to the token service.

    4. Job services: image replication.

    5. Log collector: Log collection.
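The token flow described under Registry and Token service can be sketched in a few lines. In the Docker Registry v2 token scheme, the "redirect" takes the form of a 401 response whose WWW-Authenticate header names the token service (realm), the service, and the requested scope. The following is a minimal sketch under that assumption; the host name harbor.example.com, the service name, and the repository are hypothetical placeholders, not values taken from HULK.

```python
import re
from urllib.parse import urlencode


def parse_bearer_challenge(header: str) -> dict:
    """Parse the value of a `WWW-Authenticate: Bearer ...` header into a dict."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError("not a Bearer challenge")
    # Parameters have the form key="value" and are separated by commas.
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def token_request_url(challenge: dict) -> str:
    """Build the URL the client would request from the token service."""
    query = {k: challenge[k] for k in ("service", "scope") if k in challenge}
    return challenge["realm"] + "?" + urlencode(query)


# Hypothetical challenge, shaped like what a registry returns with a 401.
header = (
    'Bearer realm="https://harbor.example.com/service/token",'
    'service="harbor-registry",'
    'scope="repository:xxl-api/demo:pull,push"'
)
challenge = parse_bearer_challenge(header)
url = token_request_url(challenge)
```

The client would then retry the original push or pull request with the returned token in an Authorization: Bearer header.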

2

Harbor features used by HULK

User Management


Role-based access control: users are assigned one of three roles: project administrator (MDRWS), developer (RWS), and guest (RS). In addition, there is the system administrator account, admin.
Note: M: manage, D: delete, R: read, W: write, S: query.
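As an illustration of the role table above, the permission letters can be modelled as a simple lookup. This is only a sketch of the idea, not Harbor's actual implementation; the role keys and the can helper are names invented for this example.

```python
# Permission letters, as given in the note above.
PERMISSIONS = {"M": "manage", "D": "delete", "R": "read", "W": "write", "S": "query"}

# Role -> permission letters, as listed in the article.
# The dictionary keys are illustrative names, not Harbor identifiers.
ROLES = {
    "project_admin": "MDRWS",
    "developer": "RWS",
    "guest": "RS",
}


def can(role: str, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return any(PERMISSIONS[letter] == action for letter in ROLES[role])
```

For example, can("developer", "write") is true, while can("guest", "write") is false.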

Project management


Project management is the most important functional module of the system. A project is a logical collection of image repositories and is the unit of both access control and resource management. A project contains multiple image repositories and is associated with multiple members holding different roles. Image replication is also project-based: by adding replication rules, the images under a project can be replicated from one Harbor instance to another.

Configuration management


Configuration management mainly covers Harbor's authentication mode. For internal enterprise use, Harbor is usually connected to the company LDAP; we currently use database authentication. The validity period of tokens can also be set here.

Image replication


HULK's multi-data-center deployment is built on the image replication feature, which synchronizes images between different data centers and different runtime environments.

Currently on HULK, after a user applies for the container service, we create a Harbor project for them (xxl-api in the figure below is the project name in Harbor)

and assign two accounts, one with RWS and one with RS: xxl-api is a read-only user, and xxl-api-p is a developer user that is hidden from the user. This ensures that users can only operate on their own private repositories.

3

Harbor's high availability

  • Load balancing

High availability is achieved by deploying three Harbor instances behind a load balancer (LVS on HULK) that exposes the service externally. The instances share a database and a cache.




  • Multiple data centers

A multi-data-center deployment can cope with S3 failures in a single data center, a data center becoming a network island, and other exceptional situations, while also reducing the load on the primary data center.

At present we run two Harbor deployments, bjyt (master) and shyc2 (slave). Images are pushed to the master, and k8s can pull images from either the master or the slave.

The Harbor components in each data center are completely independent, including S3 and the database, so that the service is unaffected even if a data center becomes an island.

What is an image

An image is built on a union file system (UnionFS); the storage driver currently in use is overlay2.

The base layer of an image is the rootfs. Every program has dependencies at run time, whether language-level libraries, system libraries, or the operating system itself, and these may differ between systems or be missing altogether. To keep the container's runtime environment consistent, Docker packages the operating system files and libraries that the program depends on (this is the image) and, when the container starts, uses it as the container's root directory (the root file system, rootfs). All dependency lookups made by the container process then resolve within this root directory, which is how environment consistency is achieved.

Layers: the foundation of a Dockerfile is the rootfs, and every subsequent instruction, such as RUN or ADD, produces a layer. To make an image smaller, you can therefore combine multiple RUN commands into one, so that several layers become one.

Only the top layer of an image is read-write; the rest are read-only. Deletions are handled through whiteouts: in a union file system, if a deleted file lives in a read-only layer, the top layer shows the file as deleted while the file still exists in the read-only layer, and a hidden whiteout file is created in the top layer to mask it. For example, rm mnt/haha.log has the same effect as touch a/.wh.haha.log.
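The whiteout behaviour from the rm mnt/haha.log example can be modelled with plain sets. This is a simplified sketch that assumes a single directory and the ".wh." naming convention described above; the on-disk representation of whiteouts depends on the storage driver, and merged_view is a name invented for this illustration.

```python
WHITEOUT_PREFIX = ".wh."


def merged_view(layers: list[set[str]]) -> set[str]:
    """Compute the visible file names for layers ordered bottom to top.

    Each layer is the set of file names in one directory. A name that
    starts with ".wh." is a whiteout: it hides the same-named file
    coming from the layers below it.
    """
    visible: set[str] = set()
    for layer in layers:
        # First apply this layer's whiteouts to what is visible so far.
        for name in layer:
            if name.startswith(WHITEOUT_PREFIX):
                visible.discard(name[len(WHITEOUT_PREFIX):])
        # Then add this layer's regular files.
        for name in layer:
            if not name.startswith(WHITEOUT_PREFIX):
                visible.add(name)
    return visible


read_only = {"haha.log", "app.conf"}   # lower, read-only layer
top = {".wh.haha.log", "new.txt"}      # top, read-write layer
view = merged_view([read_only, top])
```

In the merged view haha.log is no longer visible, yet it is still present in the read-only layer, which is why deleting a file in a container does not shrink the underlying image.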


1

Image mounts of a container

Docker supports a variety of graph drivers, including vfs, devicemapper, overlay, overlay2, and aufs. docker-ce currently uses overlay2 as its image storage driver.

The default storage directory of Docker is /var/lib/docker:

[root@p22295v zhangzhifei]# ls -lrt /var/lib/docker/
total 156
drwx--x--x   3 root root  4096 Dec  6  2018 containerd
drwx------   4 root root  4096 Dec  6  2018 plugins
drwx------   3 root root  4096 Dec  6  2018 image
drwx------   2 root root  4096 Dec  6  2018 trust
drwxr-x---   3 root root  4096 Dec  6  2018 network
drwx------   2 root root  4096 Dec  6  2018 swarm
drwx------   2 root root  4096 Dec  6  2018 builder
drwx------  89 root root 12288 Jul 17 11:07 volumes
drwx------   2 root root  4096 Jul 17 14:30 runtimes
drwx------   2 root root  4096 Jul 23 12:51 tmp
drwx------ 758 root root 94208 Jul 29 19:12 overlay2
drwx------  80 root root 12288 Jul 29 19:12 containers

Let's run a container demo:

[root@p22295v zhangzhifei]# docker run -it -d  kraken-agent:dev
83555ad8c034682ad885fc9e320bfb1f8b75498b61a1a8684d738c411caa930b

Starting a container creates a container view layer under /var/lib/docker/overlay2. The layer's directory contains diff, link, lower, merged, and work.

diff holds the layer's own content; for example, a directory created inside the container is added under diff. link records the layer's link name (in fact the name of the link to this layer under the l directory).

According to the data they store and the role they play, these layers can be divided into three parts:

1. Read-only layer

2. Init layer (sandwiched between the read-only layers and the read-write layer, it stores /etc/hosts, /etc/resolv.conf, and similar files. This layer is needed because these files originally belong to the read-only system image layers, yet users often need to write specific values such as the hostname when starting a container, so the files have to be modified in a writable layer. These modifications usually apply only to the current container, and users do not want them committed together with the read-write layer when running docker commit. Docker's approach is therefore to mount these files as a separate layer after modifying them. docker commit only commits the read-write layer, so this content is not included.)

3. Read-write layer (before anything is written, this directory is empty. Once a write is performed in the container, the modified content appears in this layer incrementally.)

View the container mount directory:

[root@p22295v zhangzhifei]# cat /var/lib/docker/image/overlay2/layerdb/mounts/83555ad8c034682ad885fc9e320bfb1f8b75498b61a1a8684d738c411caa930b/mount-id
3695f349587aaa2cdc82fcde1a380c7b567ef870a47e4c28b8b279e4edc9eb40
[root@p22295v zhangzhifei]# # Read-write layer
[root@p22295v zhangzhifei]# ls /var/lib/docker/overlay2/3695f349587aaa2cdc82fcde1a380c7b567ef870a47e4c28b8b279e4edc9eb40/diff/
[root@p22295v zhangzhifei]# # Read-only layer
[root@p22295v zhangzhifei]# ls /var/lib/docker/overlay2/65e5cdd72f2995da4c73f2d9b90e8d974b9d2f18829a2479296aaec24e67d185/diff/
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
# Read-only layer (the binary added with ADD in the Dockerfile)
[root@p22295v zhangzhifei]# ls -lrt /var/lib/docker/overlay2/852fa5138c3da5070b59e6402348a5a281378b28ee08fede9c635e4101f91092/diff/usr/bin/
total 28836
-rwxr-xr-x 1 root root 29526888 Jul 10 16:23 kraken-origin

In the end, these layers are union-mounted at /var/lib/docker/overlay2/3695f349587aaa2cdc82fcde1a380c7b567ef870a47e4c28b8b279e4edc9eb40/merged, presenting a complete file system and runtime environment to the container.

[root@p22295v zhangzhifei]# mount | grep 3695f349587aaa2cdc82fcde1a380c7b567ef870a47e4c28b8b279e4edc9eb40
overlay on /var/lib/docker/overlay2/3695f349587aaa2cdc82fcde1a380c7b567ef870a47e4c28b8b279e4edc9eb40/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/Z7QMVXSKSNAKCUEJ6ZMU5YTFWG:/var/lib/docker/overlay2/l/2OYCXTK7M4QN3DT7IYJK6J7VYT:/var/lib/docker/overlay2/l/UZTDJDVUOBHU2VERRLXF5KMIQO:/var/lib/docker/overlay2/l/NAXXPRFMO4ATUIG6SFPU4LBUUV:/var/lib/docker/overlay2/l/AM4PHUFWOD4UHYIVO5Q6GVZ5L7:/var/lib/docker/overlay2/l/7XLJNT7Q3UQIKHDNV4QG4EX2C3:/var/lib/docker/overlay2/l/3RAVSDXXRS3BASAKZFPT2ESY2K:/var/lib/docker/overlay2/l/FFNAQF5ADFSTEBNZZ4O2R3CP4N:/var/lib/docker/overlay2/l/X6BOWOZKYRN3DZFY6QLLP7OFDP:/var/lib/docker/overlay2/l/P3EO3WHIM2XPDNPIFUP42EGMQI:/var/lib/docker/overlay2/l/EOSBLWDBASO7GKSDILC4XVGO45:/var/lib/docker/overlay2/l/7K7266OIDWAVXLAN6AA3SZXZQZ,upperdir=/var/lib/docker/overlay2/3695f349587aaa2cdc82fcde1a380c7b567ef870a47e4c28b8b279e4edc9eb40/diff,workdir=/var/lib/docker/overlay2/3695f349587aaa2cdc82fcde1a380c7b567ef870a47e4c28b8b279e4edc9eb40/work)
[root@p22295v zhangzhifei]# ls  /var/lib/docker/overlay2/3695f349587aaa2cdc82fcde1a380c7b567ef870a47e4c28b8b279e4edc9eb40/merged
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
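To make the mount output easier to read, the overlay options can be split into their parts: lowerdir is the colon-separated list of read-only layers (topmost first), upperdir is the read-write layer, and workdir is overlayfs's working directory. The following is a small sketch; parse_overlay_options is a helper written for this illustration, and the sample line uses shortened, hypothetical paths in the same shape as the real output.

```python
def parse_overlay_options(mount_line: str) -> dict:
    """Extract lowerdir/upperdir/workdir from one line of `mount` output."""
    # The options are the comma-separated list inside the trailing parentheses.
    start = mount_line.rindex("(") + 1
    end = mount_line.rindex(")")
    result = {}
    for option in mount_line[start:end].split(","):
        key, sep, value = option.partition("=")
        if sep:  # skip flag-only options such as rw or relatime
            result[key] = value
    # lowerdir is a colon-separated list, topmost lower layer first.
    result["lowerdir"] = result.get("lowerdir", "").split(":")
    return result


# Shortened, hypothetical sample line.
line = (
    "overlay on /var/lib/docker/overlay2/abc/merged type overlay "
    "(rw,relatime,"
    "lowerdir=/var/lib/docker/overlay2/l/AAA:/var/lib/docker/overlay2/l/BBB,"
    "upperdir=/var/lib/docker/overlay2/abc/diff,"
    "workdir=/var/lib/docker/overlay2/abc/work)"
)
opts = parse_overlay_options(line)
```

Applied to the real output, the lowerdir list would contain the twelve short links under the l directory, and upperdir would be the container's diff directory.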

Image storage in the image registry

1

Image storage directory structure

Take local storage as an example. The default root is /data/registry/docker/registry/v2, and no image layer is stored more than once.

├── blobs
│   └── sha256
│       │   └── dfa94d685d1c2179324f02bf2a119f6d8ee0d380cef5506566012f7c4936a04a
│       │       └── data
│       ├── e6
│       │   └── e6ae4ac760c8457aca9be07de8ca66b3a358a19b950389a0d158ae885178f6cf
│       │       └── data
│       ├── e7
│       │   └── e71de1ca8f2b18993c258e2bf50edea8c23ea4a78a821bcfef181de50b3c32f4
│       │       └── data
└── repositories
    ├── registry-share-private
    │   ├── push-mount
    │   │   ├── _layers
    │   │   │   └── sha256
    │   │   │       ├── 1b1ad4542c99b8881265610cf5dc09e37d38445529a7584edb2a607fd783216f
    │   │   │       │   └── link
    │   │   ├── _manifests
    │   │   │   ├── revisions
    │   │   │   │   └── sha256
    │   │   │   │       └── 9e4cf4691735c02e59dd49ee561a3f5e56bccf78d57eaa94581e29f69a5162bd
    │   │   │   │           └── link
    │   │   │   └── tags
    │   │   │       └── v1
    │   │   │           ├── current
    │   │   │           │   └── link
    │   │   │           └── index
    │   │   │               └── sha256
    │   │   │                   └── 9e4cf4691735c02e59dd49ee561a3f5e56bccf78d57eaa94581e29f69a5162bd
    │   │   │                       └── link
    │   │   └── _uploads
    │   ├── push-new
    │   │   ├── _layers
    │   │   │   └── sha256
    │   │   │       ├── 1b1ad4542c99b8881265610cf5dc09e37d38445529a7584edb2a607fd783216f
    │   │   │       │   └── link
    │   │   ├── _manifests
    │   │   │   ├── revisions
    │   │   │   │   └── sha256
    │   │   │   │       └── 9e4cf4691735c02e59dd49ee561a3f5e56bccf78d57eaa94581e29f69a5162bd
    │   │   │   │           └── link
    │   │   │   └── tags
    │   │   │       └── v1
    │   │   │           ├── current
    │   │   │           │   └── link
    │   │   │           └── index
    │   │   │               └── sha256
    │   │   │                   └── 9e4cf4691735c02e59dd49ee561a3f5e56bccf78d57eaa94581e29f69a5162bd
    │   │   │                       └── link

1. blobs

This directory holds the actual files: the data of each layer (gzip) and the image manifests (JSON).

2. repositories

Stores the organizational information of images, similar to metadata.

  • Repository name

registry-share-private/push-mount is a repository name: registry-share-private corresponds to the project, and push-mount is the image name.

  • _layers

This directory resembles the blobs directory, but it does not store real data; it only saves the sha256 digest of each layer in a link file. It keeps the sha256 digests of all layers ever uploaded to the repository.

  • _manifests

The manifest information of every version (tag) ever uploaded to this repository. It contains a revisions directory and a tags directory.

  • tags

Each tag (v1 here) has its own set of records: a current directory and an index directory. The link file in the current directory holds the sha256 digest of the tag's current manifest, and the index directory lists the sha256 digests of every version ever uploaded under the tag.

  • revisions

This directory stores the sha256 digests of all versions ever uploaded to the repository.

  • _uploads

A temporary directory; once an image upload completes, the files in it are deleted.
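The layout described above is regular enough to compute paths from a digest. The sketch below follows the directory tree shown earlier, where each blob sits under a directory named after the first two hex characters of its digest; the function names are made up for this example, and the root is the default local-storage path mentioned above.

```python
from pathlib import PurePosixPath

# Default local-storage root from the article.
ROOT = PurePosixPath("/data/registry/docker/registry/v2")


def blob_data_path(digest: str) -> PurePosixPath:
    """Path of the real data for a blob, e.g. digest 'sha256:e6ae...'."""
    algo, hexdigest = digest.split(":", 1)
    return ROOT / "blobs" / algo / hexdigest[:2] / hexdigest / "data"


def layer_link_path(repo: str, digest: str) -> PurePosixPath:
    """Path of the link file that records a layer as part of a repository."""
    algo, hexdigest = digest.split(":", 1)
    return ROOT / "repositories" / repo / "_layers" / algo / hexdigest / "link"


def tag_current_link_path(repo: str, tag: str) -> PurePosixPath:
    """Path of the link file pointing at the tag's current manifest."""
    return ROOT / "repositories" / repo / "_manifests" / "tags" / tag / "current" / "link"
```

Because the link files contain only a digest while the data lives once under blobs, two repositories that share a layer reference the same blob instead of storing it twice.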

2

Image upload process

  • Authentication

  • Obtain a token from the authentication service

  • Check whether the layer to be uploaded already exists in the registry

  • Start uploading the blob

Large blobs are transferred in chunks with PATCH, and small blobs with PUT. After a chunked upload, a final PUT request is still required to mark the upload as complete.

  • Upload the manifest

Once all the blobs have been uploaded, the manifest must be uploaded.
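The steps above map onto a short sequence of Registry V2 API requests. The sketch below only plans the requests and sends nothing; the chunk size threshold is an arbitrary parameter chosen for illustration, and the upload URL is a placeholder because the server returns it in the Location header of the POST response.

```python
def plan_blob_upload(repo: str, digest: str, size: int, chunk_size: int) -> list[tuple[str, str]]:
    """Return the (method, target) sequence for uploading one blob."""
    # Placeholder: the real upload URL is chosen by the server.
    upload = "<Location returned by POST>"
    steps = [
        ("HEAD", f"/v2/{repo}/blobs/{digest}"),  # does the blob already exist?
        ("POST", f"/v2/{repo}/blobs/uploads/"),  # open an upload session
    ]
    if size > chunk_size:
        chunks = -(-size // chunk_size)  # ceiling division
        steps += [("PATCH", upload)] * chunks  # one PATCH per chunk
    # The final PUT carries digest=<digest> as a query parameter; for a
    # small blob it also carries the data itself.
    steps.append(("PUT", upload))
    return steps


def plan_push(repo: str, tag: str, blobs: dict[str, int], chunk_size: int) -> list[tuple[str, str]]:
    """Plan a whole push: every blob first, then the manifest."""
    steps = []
    for digest, size in blobs.items():
        steps += plan_blob_upload(repo, digest, size, chunk_size)
    steps.append(("PUT", f"/v2/{repo}/manifests/{tag}"))
    return steps
```

For example, plan_blob_upload("xxl-api/demo", "sha256:abc", 10, 4) yields one HEAD, one POST, three PATCH requests, and the closing PUT. If the HEAD request shows that the blob already exists, a real client skips the remaining steps for that blob.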

Note:

If a layer of the image being pushed already exists in the registry and the user has read permission on it, Docker first obtains a token and then uses that token to mount the layer, which avoids re-uploading duplicate layers and speeds up the push.

Handling a mount in fact just creates the corresponding layer entry in the _layers directory.


For a layer that already exists but on which the user has no permission, the client must upload it again, yet only one copy is ultimately stored: when the uploaded file is moved into place, the registry first checks whether the destination path exists, and if it does, the file is not overwritten.


For a layer that already exists in the repository, the HEAD request returns 200, indicating that no upload is required.
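The cross-repository mount mentioned in the note corresponds to a single POST request in the Registry V2 API, with the layer digest in the mount parameter and the source repository in the from parameter. The following is a minimal sketch; the repository names are hypothetical and the helper names were chosen for this example.

```python
from urllib.parse import urlencode


def mount_url(target_repo: str, digest: str, source_repo: str) -> str:
    """Build the request path for mounting an existing layer into target_repo."""
    query = urlencode({"mount": digest, "from": source_repo})
    return f"/v2/{target_repo}/blobs/uploads/?{query}"


def next_action(status: int) -> str:
    """Interpret the response to the mount request."""
    if status == 201:
        return "mounted"  # the layer is now linked; nothing to upload
    # Otherwise (for example 202 Accepted) the registry has opened a
    # normal upload session and the client uploads the blob as usual.
    return "upload"


url = mount_url("xxl-api/new", "sha256:abc", "xxl-api/old")
```

This is why pushing an image that shares base layers with an image already in the registry is fast: only the link entries under _layers need to be created.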


Related articles

  • https://docs.docker.com/registry/spec/api/ 

  • https://docs.docker.com/storage/storagedriver/overlayfs-driver/#how-the-overlay-driver-works

  • https://arkingc.github.io/2017/05/05/2017-05-05-docker-filesystem-overlay/

  • https://blog.csdn.net/u010278923/article/details/77941995 

  • https://github.com/uber/kraken

  • https://github.com/dragonflyoss/Dragonfly/



Origin blog.51cto.com/15127564/2666655