Registry container image server details

Introduction

When we work with clusters or containers, we usually deal with images stored locally, and most of us have some understanding of local image storage. But how are images stored on the server side? This article describes the server-side storage structure of container images. It is meant for readers who want to run their own image registry, or who are interested in the underlying principles or optimization of container images.

Related open source projects

There are currently two main open source projects related to container image services.

Registry provides basic image upload, download, and third-party authentication capabilities. Harbor is an enterprise-level extension built on top of Registry. It adds permissions, auditing, image replication, and other functions, and is a CNCF-hosted project. For other details, refer to related articles. This article focuses on the storage details of the Registry project.

Image details

Before looking at the server, let's look at how images and containers are stored on the client.

Union File System (UnionFS)

Docker's storage drivers are built on UnionFS semantics. Here is a brief list of how image storage behaves under UnionFS.

  • UnionFS is a layered file system. A Docker image may consist of multiple layers (note that their order matters).

  • Only the top layer is writable; the other layers are read-only. The advantage of this mechanism is that an image layer can be shared by multiple images. For Docker images, all layers are read-only. When an image is run, a container layer is added on top of the image. Starting ten containers from the same image only adds ten container layers. When a container is destroyed, only its container layer is destroyed.

    • When the container needs to read a file: search from the uppermost layer downward. Once the file is found, it is read into memory; if it is already in memory, it is used directly. (That is, Docker containers running on the same machine share the same files at runtime.)
    • When the container needs to add a file: the file is added directly to the writable container layer at the top, without affecting the image layers.
    • When the container needs to modify a file: search for the file from top to bottom and copy it to the writable container layer once it is found. From then on, the container sees the copy in the container layer and no longer sees the file in the image layer. The container modifies the copy in the container layer.
    • When the container needs to delete a file: search for the file from the top down and record the deletion in the container layer once it is found. That is, the file is not actually deleted, only soft deleted. As a result, the image size can only grow, never shrink.

This has consequences for many security and image optimization questions.

  • Is it safe to write sensitive information during the image build and then delete it in a later build instruction? (No, it is not safe.)
  • Does installing a software package in one build instruction and cleaning it up in the next build instruction reduce the image size? (No, it does not.)
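
The read, add, modify, and delete rules above, and the two answers that follow from them, can be tried out with a small toy model. This is only a sketch of the layering idea under simplifying assumptions: layers are plain dictionaries, and the names `LayeredFS` and `WHITEOUT` are made up for illustration. It is not how Docker's storage drivers are actually implemented.

```python
# Toy model of layered (union) file system semantics.
# Assumption: a layer is a dict mapping path -> bytes; this is not Docker's code.
WHITEOUT = object()  # marker meaning "this path was deleted in this layer"


class LayeredFS:
    def __init__(self, image_layers):
        # image_layers: list of dicts, bottom layer first; treated as read-only.
        self.image_layers = image_layers
        self.container_layer = {}  # the only writable layer

    def read(self, path):
        # Search from the top (container layer) downward.
        for layer in [self.container_layer] + self.image_layers[::-1]:
            if path in layer:
                if layer[path] is WHITEOUT:
                    break  # hidden by a deletion record in an upper layer
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # Adding or modifying a file only touches the container layer.
        self.container_layer[path] = data

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the file is not visible
        self.container_layer[path] = WHITEOUT  # soft delete


def stored_bytes(layers):
    # Total bytes kept in storage across all layers, deleted or not.
    return sum(len(v) for layer in layers for v in layer.values()
               if v is not WHITEOUT)


# A two-layer image: the second build step "deletes" the token.
layer1 = {"/root/.token": b"s3cret", "/bin/app": b"v1"}
layer2 = {"/root/.token": WHITEOUT}
fs = LayeredFS([layer1, layer2])
fs.write("/bin/app", b"v2")  # copy-on-write: layer1 still holds b"v1"
```

Reading `/root/.token` through `fs` fails, yet `layer1` still contains the secret and `stored_bytes([layer1, layer2])` is no smaller than before the deletion, which is why both questions above are answered with "no".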

UnionFS generally has two implementation approaches: 1. File-based: modifying a file rewrites the entire file. 2. Block-based: modifying a file only changes the few blocks that are affected.

Server-side image storage details

Here is an image manifest (image meta information) for reference:

➜  ~ docker pull ccr.ccs.tencentyun.com/paas/service-controller:7b1c981c
7b1c981c: Pulling from paas/service-controller
Digest: sha256:e8b84ce6c245f04e6e453532d676f7c7f0a94b3122f93a89a58f9ae49939e419
Status: Image is up to date for ccr.ccs.tencentyun.com/paas/service-controller:7b1c981c
ccr.ccs.tencentyun.com/paas/service-controller:7b1c981c

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 4671,
      "digest": "sha256:785f4150a5d9f62562f462fa2d8b8764df4215f0f2e3a3716c867aa31887f827"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 44144090,
         "digest": "sha256:e80174c8b43b97abb6bf8901cc5dade4897f16eb53b12674bef1eae6ae847451"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 529,
         "digest": "sha256:d1072db285cc5eb2f3415891381631501b3ad9b1a10da20ca2e932d7d8799988"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 849,
         "digest": "sha256:858453671e6769806e0374869acce1d9e5d97f5020f86139e0862c7ada6da621"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 170,
         "digest": "sha256:3d07b1124f982f6c5da7f1b85a0a12f9574d6ce7e8a84160cda939e5b3a1faad"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 8461461,
         "digest": "sha256:994dade28a14b2eac1450db7fa2ba53998164ed271b1e4b0503b1f89de44380c"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 22178452,
         "digest": "sha256:60a5bd5c14d0f37da92d2a5e94d6bbfc1e2a942d675aee24f055ced76e8a208f"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 22178452,
         "digest": "sha256:60a5bd5c14d0f37da92d2a5e94d6bbfc1e2a942d675aee24f055ced76e8a208f"
      }
   ]
}

[Figure: Registry container image server details (directory layout of the Registry server-side storage)]

Next comes the most important part of this article. The figure above shows the details of Registry server-side storage.

  • The blue items in the figure are directories stored on the server. The text is the directory name, and this name is fixed.

  • The purple items in the figure are files stored on the server. The text is the file name. The content of a link file is a sha256 hash value. A data file stores the actual manifest or image layer.

  • The orange items in the figure are dynamic directories on the server. The directory name depends on the repository name, the image tag, or a sha256 value.

The figure reads from top to bottom. For example, suppose the manifest shown above is stored on the server (file hash: sha256:e8b84ce6c245f04e6e453532d676f7c7f0a94b3122f93a89a58f9ae49939e419). Its storage path is then /docker/registry/v2/blobs/sha256/e8/e8b84ce6c245f04e6e453532d676f7c7f0a94b3122f93a89a58f9ae49939e419/data. In the figure, this corresponds to following the left side all the way down.

Let's break the structure down and look at its details.

  • On the left is the actual storage of all image content. It takes up most of the storage space and includes both the image layers and the image manifests.

    • For example, the image layer sha256:e80174c8b43b97abb6bf8901cc5dade4897f16eb53b12674bef1eae6ae847451 is stored at /docker/registry/v2/blobs/sha256/e8/e80174c8b43b97abb6bf8901cc5dade4897f16eb53b12674bef1eae6ae847451/data
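
The mapping from a digest to its blob path can be written down in a few lines. This is a minimal sketch derived only from the example paths in this article; the helper name `blob_data_path` and the constant `BLOB_ROOT` are made up for illustration.

```python
import posixpath

# Prefix taken from the example paths in this article.
BLOB_ROOT = "/docker/registry/v2/blobs"


def blob_data_path(digest: str) -> str:
    """Map a digest like 'sha256:<hex>' to the path of its data file."""
    algorithm, hex_digest = digest.split(":", 1)
    # Blobs are grouped into directories named after the first two hex characters.
    return posixpath.join(BLOB_ROOT, algorithm, hex_digest[:2], hex_digest, "data")


layer = "sha256:e80174c8b43b97abb6bf8901cc5dade4897f16eb53b12674bef1eae6ae847451"
print(blob_data_path(layer))
```

The same function works for manifests and image configs, since they are stored as blobs in the same way.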


  • On the right is where the image meta information is stored. It is organized in two directory levels, by namespace and by repository name.

    • Each repository is divided into two parts: _layers and _manifests.

    • _layers records which image layer files are referenced by the repository.

    • _manifests records the meta information of the images.

      • revisions contains the manifests of all image versions that have been uploaded to the repository.

      • tags contains all tags in the repository.

        • current records the image that the tag currently points to.
        • index is a directory that records the historical images the tag has pointed to.

    • Computing the sha256 of the manifest shown above gives the hash of the manifest file: sha256:e8b84ce6c245f04e6e453532d676f7c7f0a94b3122f93a89a58f9ae49939e419. The manifest itself is stored at /docker/registry/v2/blobs/sha256/e8/e8b84ce6c245f04e6e453532d676f7c7f0a94b3122f93a89a58f9ae49939e419/data

An example of an image download:

We want to know how the current manifest of the image ccr.ccs.tencentyun.com/paas/service-controller:7b1c981c is located in the server-side storage.

  1. Find the file /docker/registry/v2/repositories/paas/service-controller/_manifests/tags/7b1c981c/current/link. It holds the sha256 of the manifest. Its content should be sha256:e8b84ce6c245f04e6e453532d676f7c7f0a94b3122f93a89a58f9ae49939e419
  2. Find the actual storage file (/docker/registry/v2/blobs/sha256/e8/e8b84ce6c245f04e6e453532d676f7c7f0a94b3122f93a89a58f9ae49939e419/data). The JSON content of this file is the manifest shown earlier.
  3. Based on the manifest, the client downloads the corresponding files one by one. (See the reference documents for the authentication process.)

  • ImageConfig

    sha256: 785f4150a5d9f62562f462fa2d8b8764df4215f0f2e3a3716c867aa31887f827

  • ImageLayer

    sha256: e80174c8b43b97abb6bf8901cc5dade4897f16eb53b12674bef1eae6ae847451
    sha256: d1072db285cc5eb2f3415891381631501b3ad9b1a10da20ca2e932d7d8799988
    sha256: 858453671e6769806e0374869acce1d9e5d97f5020f86139e0862c7ada6da621
    sha256: 3d07b1124f982f6c5da7f1b85a0a12f9574d6ce7e8a84160cda939e5b3a1faad
    sha256: 994dade28a14b2eac1450db7fa2ba53998164ed271b1e4b0503b1f89de44380c
    sha256: 60a5bd5c14d0f37da92d2a5e94d6bbfc1e2a942d675aee24f055ced76e8a208f
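
The three lookup steps can be sketched as a short script that walks the directory tree of a filesystem-backed Registry. Several things here are assumptions: `root` stands for the /docker/registry/v2 directory, the `repositories` path segment follows the upstream Registry filesystem layout, and the function names are made up for illustration. A real client talks to the Registry over HTTP and never reads these files directly.

```python
import json
import os


def read_link(path):
    # A link file contains a single digest such as "sha256:<hex>".
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


def blob_path(root, digest):
    algorithm, hex_digest = digest.split(":", 1)
    return os.path.join(root, "blobs", algorithm, hex_digest[:2], hex_digest, "data")


def resolve_tag(root, repository, tag):
    """Return (manifest digest, config digest, list of layer digests) for a tag."""
    # Step 1: the tag's current link file holds the manifest digest.
    link = os.path.join(root, "repositories", repository,
                        "_manifests", "tags", tag, "current", "link")
    manifest_digest = read_link(link)

    # Step 2: the manifest itself is stored as a blob.
    with open(blob_path(root, manifest_digest), "rb") as f:
        manifest = json.load(f)

    # Step 3: the manifest names the config and the layers to download.
    config_digest = manifest["config"]["digest"]
    layer_digests = [layer["digest"] for layer in manifest["layers"]]
    return manifest_digest, config_digest, layer_digests
```

For the example image, a call would look like `resolve_tag("/docker/registry/v2", "paas/service-controller", "7b1c981c")`, and each returned digest maps to a data file through `blob_path`.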

Tips:

  1. The same image layer file is stored only once. Building on the same base image therefore saves a lot of storage cost.
  2. If you want to compute the hash of the manifest file shown above, make sure the file you copy has no EOL (trailing newline) at the end. [noeol]
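
The second tip is easy to check with Python's standard `hashlib`. The snippet below is a small sketch: `body` is a stand-in byte string, not the real manifest, and `manifest_digest` is a helper name made up for illustration.

```python
import hashlib


def manifest_digest(raw: bytes) -> str:
    # The digest covers the exact bytes, so every whitespace character counts.
    return "sha256:" + hashlib.sha256(raw).hexdigest()


body = b'{"schemaVersion": 2}'  # stand-in content, not the real manifest
with_eol = body + b"\n"         # the same content with a trailing newline

print(manifest_digest(body) == manifest_digest(with_eol))  # False
```

A single trailing newline is enough to produce a completely different digest, so the result will no longer match the name of the blob directory.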

Several storage-related questions

How to optimize image builds?

Optimizations that follow from the characteristics of UnionFS:

  1. During a build, each build instruction generates an image layer. Avoid leaving junk files in a layer; for example, delete the downloaded package within the same instruction that installs the software.
  2. Deleting sensitive resources in a later layer does not make the content disappear, so keep sensitive content out of every layer to avoid security problems. For example, compiling inside the image and deleting the source code at the end only gives a false sense of protection.
  3. Use multi-stage builds to keep intermediate artifacts and build-environment dependencies out of the final image.

After uploading an image to one repository on the server, do I need to upload it again when pushing it to another repository?

Yes. In the Registry design, the repository is the smallest unit of permission, and users are managed and isolated per repository. If this were ignored and an upload were skipped simply because the image layer already exists on the server, a client could obtain other users' images without authorization by constructing a fake manifest. This is the purpose of _layers: it records all the image layers that the repository is permitted to access.

How to optimize the image copy scenario?

The difference between copying an image and uploading one is that, when copying, the user really does own the source image. With the previous question in mind, replication can be optimized: when the image layer already exists at the destination, mark the destination repository as owning the layer directly and skip the unnecessary upload.
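
The role of _layers in both questions can be sketched against a filesystem-backed Registry. This is an illustration only, with several assumptions: `root` stands for the /docker/registry/v2 directory, the `repositories` segment and the `_layers/<algorithm>/<hex>/link` shape follow the upstream Registry filesystem layout, the function names are made up, and the caller is assumed to have already verified that the user owns the source image. A production setup should go through the Registry's own API rather than write these files by hand.

```python
import os


def layer_link_path(root, repository, digest):
    algorithm, hex_digest = digest.split(":", 1)
    return os.path.join(root, "repositories", repository,
                        "_layers", algorithm, hex_digest, "link")


def blob_exists(root, digest):
    algorithm, hex_digest = digest.split(":", 1)
    return os.path.isfile(os.path.join(
        root, "blobs", algorithm, hex_digest[:2], hex_digest, "data"))


def repo_has_layer(root, repository, digest):
    # Access is decided per repository, not by the blob merely existing.
    return os.path.isfile(layer_link_path(root, repository, digest))


def mark_layer(root, repository, digest):
    """Copy shortcut: link an already stored blob into a repository.

    Returns False when the blob is missing, meaning a real upload is needed.
    """
    if not blob_exists(root, digest):
        return False
    link = layer_link_path(root, repository, digest)
    os.makedirs(os.path.dirname(link), exist_ok=True)
    with open(link, "w", encoding="utf-8") as f:
        f.write(digest)
    return True
```

A blob that exists under blobs but has no link in a repository's _layers is invisible to that repository, which is what blocks the fake-manifest trick; `mark_layer` is the copy-time shortcut that adds the link without moving any data.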

Historical image versions

Given the storage structure, this question is fairly easy to answer. In theory, as long as Registry GC is not run and the repository meta information is not deleted, the historical versions of an image remain stored in the repository.

How to get the manifest of an image?

I will not explain this in detail here; interested readers can refer to the following documents:

Integrating with cloud storage services

As open source software, Registry is expected to support a variety of cloud storage products. It provides a standard storage driver interface, and any storage backend can be used as long as this interface is implemented for it.
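
The idea of a pluggable storage driver can be sketched as follows. Note the assumptions: the real interface is defined in Go inside the Registry project and has more methods than shown here. The class and method names below are a simplified, hypothetical Python analogue, and the in-memory backend only exists to show that the rest of the code depends on nothing but the interface.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical, simplified driver interface (not the real Go definition)."""

    @abstractmethod
    def get_content(self, path: str) -> bytes: ...

    @abstractmethod
    def put_content(self, path: str, content: bytes) -> None: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...

    @abstractmethod
    def list_paths(self, prefix: str) -> list: ...


class InMemoryDriver(StorageDriver):
    """Toy backend; a cloud backend would call its object storage SDK instead."""

    def __init__(self):
        self._objects = {}

    def get_content(self, path):
        return self._objects[path]

    def put_content(self, path, content):
        self._objects[path] = content

    def delete(self, path):
        del self._objects[path]

    def list_paths(self, prefix):
        return sorted(p for p in self._objects if p.startswith(prefix))


driver = InMemoryDriver()
driver.put_content("/docker/registry/v2/blobs/sha256/ab/abcd/data", b"layer bytes")
```

Because paths such as the blob and link locations are plain keys, the same directory layout described in this article maps naturally onto object storage.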


Origin blog.51cto.com/14120339/2541632