Harbor: An open source implementation for replicating Docker images across data centers

In March, VMware open-sourced Harbor, an enterprise-grade registry project developed by the VMware China R&D team. Harbor helps users quickly build enterprise-grade registry services. It provides the features enterprise users need, including a graphical management interface, role-based access control (RBAC), remote image replication (synchronization), AD/LDAP integration, and audit logs. In the little more than four months since the project was open-sourced, it has received more than 900 stars and more than 200 forks on GitHub. The GitHub address:

https://github.com/vmware/harbor

The recently released version of Harbor adds policy-based Docker image replication. It synchronizes images between different data centers and different operating environments, and it comes with a friendly management interface that greatly simplifies day-to-day image management. Some users have already deployed bidirectional image replication between remote sites. This article describes in detail how the feature is implemented.

Harbor image replication management interface

Function introduction

In terms of functional design, Harbor remains centered on "projects". A "replication policy" configured on a project identifies the project, and therefore the images, to be replicated. In the policy, the administrator specifies the target instance, that is, the "destination" of the replication, by setting its address and the username and password used to connect to it. When the policy is activated, all images under the source project are copied to the target instance. After that, as long as the policy remains active, every image pushed to or deleted from the source project is synchronized to the target instance, as shown in the following figure:

 

Larger container clusters often need multiple Registry servers for load balancing, and a master-slave publishing mode can be adopted: an image is published only once and is then pushed to multiple Registry instances. Harbor also supports dual-master replication and hierarchical, multi-level image publishing, as shown in the following figure:

Design and Implementation

Copying images between different Registry instances is a very common requirement. In the past, the usual practice was to copy the image data itself, for example by periodically synchronizing the image data on the file system with rsync or, for deployments on IaaS services, by configuring the IaaS storage service layer to replicate objects. These methods require different tools depending on the storage the registry uses. For Harbor, we wanted to reduce this dependency and improve flexibility. For example, a user may have a development registry that uses the file system as storage and want to synchronize its images to a remote publishing registry backed by S3 storage. With this situation in mind, we chose to download and transfer images by calling the registry's own API, which makes replication independent of the underlying storage.

For control, we introduced a new component, the Job Service, to manage image replication tasks. When replication is carried out for a project, a series of tasks (jobs) is generated, one per image, and these are scheduled and managed by the Job Service. While a task runs, the Job Service writes its status to the database so that users can check it through the UI. The general structure is shown in the following figure:

Let's look at how the Job Service is implemented. From the outside, it likewise receives requests through a REST API, and then schedules and executes the tasks. It has two main problems to solve. First, when a large number of replication requests arrive, the load must be throttled so that replication does not consume too many I/O resources. Second, the replication policy may change while a task is running, for example by being disabled, so there must be a mechanism for intervening in a running task from the outside.

We implement the producer-consumer model with a task queue, a dispatcher, and a worker pool, built on Go's native channels. The scheduler puts each task into a channel, and the dispatcher takes tasks from that channel. Meanwhile, each worker is put into a second channel when it finishes its work, and the dispatcher uses this channel to pair tasks with workers. An idle worker therefore receives a task ID from the dispatcher and executes the task, and the degree of concurrency is easily controlled by the number of workers in the worker pool.
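The pairing of tasks and idle workers described above can be sketched roughly as follows. This is a minimal illustration, not Harbor's actual Job Service code: the names `worker`, `runPool`, and `handle` are assumptions made for the example, and the counters exist only to make the concurrency limit observable.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical replication worker that receives job IDs on its own channel.
type worker struct {
	id   int
	jobs chan int64
}

// runPool hands every job ID to an idle worker. It reports how many jobs were
// handled and the highest number that ran at the same time.
// numWorkers must be at least 1 when jobIDs is non-empty.
func runPool(numWorkers int, jobIDs []int64, handle func(workerID int, jobID int64)) (done, peak int) {
	jobQueue := make(chan int64, len(jobIDs)) // filled by the scheduler
	idle := make(chan *worker, numWorkers)    // workers park here when free
	workers := make([]*worker, numWorkers)

	var (
		mu     sync.Mutex // guards active, peak and done
		active int
		wg     sync.WaitGroup
	)
	wg.Add(len(jobIDs))

	for i := range workers {
		w := &worker{id: i, jobs: make(chan int64)}
		workers[i] = w
		go func(w *worker) {
			for {
				idle <- w // announce availability to the dispatcher
				jobID, ok := <-w.jobs
				if !ok {
					return // pool is shutting down
				}
				mu.Lock()
				active++
				if active > peak {
					peak = active
				}
				mu.Unlock()

				handle(w.id, jobID)

				mu.Lock()
				active--
				done++
				mu.Unlock()
				wg.Done()
			}
		}(w)
	}

	// Dispatcher: pair each queued job with the next idle worker.
	go func() {
		for jobID := range jobQueue {
			w := <-idle
			w.jobs <- jobID
		}
	}()

	// Scheduler side: enqueue all jobs, then wait until they are handled.
	for _, id := range jobIDs {
		jobQueue <- id
	}
	close(jobQueue)
	wg.Wait()
	for _, w := range workers {
		close(w.jobs)
	}
	return done, peak
}

func main() {
	ids := []int64{101, 102, 103, 104, 105, 106}
	done, peak := runPool(2, ids, func(workerID int, jobID int64) {
		fmt.Printf("worker %d replicating job %d\n", workerID, jobID)
	})
	fmt.Printf("handled %d jobs, at most %d at a time\n", done, peak)
}
```

Because a job can only start once the dispatcher has taken a worker from the `idle` channel, the number of jobs in flight can never exceed the pool size, whatever the length of the queue.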

 

To solve the second problem, each worker is an abstract state machine that does its actual work through handlers registered for the different states. The state machine can be intervened in from the outside, so a task can be cancelled midway. When an exception occurs during execution, the task is put into the error state and is either discarded or handed back to the scheduler to be retried. Because the states of the state machine are customizable, it is also easy to extend and adjust. The figure shows the state transitions of an abstract task.
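As a rough sketch of this idea, the following Go program registers one handler per state and lets an outside caller cancel the task between transitions. The state names, the `Machine` type, and its methods are hypothetical and chosen for the example; they are not Harbor's actual identifiers, and retry handling is left out.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// State names are illustrative, not Harbor's actual constants.
const (
	StatePending  = "pending"
	StateRunning  = "running"
	StateFinished = "finished"
	StateCanceled = "canceled"
	StateError    = "error"
)

var errTransfer = errors.New("transfer failed") // sample failure for the demo

// Handler performs the work of one state and names the state to enter next.
type Handler func() (next string, err error)

// Machine runs registered handlers until it reaches a state without one.
type Machine struct {
	handlers map[string]Handler
	canceled atomic.Bool
	Trace    []string // states entered, in order
}

func NewMachine() *Machine {
	return &Machine{handlers: make(map[string]Handler)}
}

// Register must be called before Run; the handler map is not synchronized.
func (m *Machine) Register(state string, h Handler) { m.handlers[state] = h }

// Cancel may be called from another goroutine. It takes effect at the next
// transition into a state that still has work to do.
func (m *Machine) Cancel() { m.canceled.Store(true) }

// Run starts in the given state and returns the terminal state it ends in.
func (m *Machine) Run(start string) string {
	state := start
	for {
		h, ok := m.handlers[state]
		if ok && state != StateCanceled && m.canceled.Load() {
			state = StateCanceled
			h, ok = m.handlers[state]
		}
		m.Trace = append(m.Trace, state)
		if !ok {
			return state // no handler registered: terminal state
		}
		next, err := h()
		if err != nil {
			next = StateError // any handler failure moves the task to the error state
		}
		state = next
	}
}

func main() {
	m := NewMachine()
	m.Register(StatePending, func() (string, error) { return StateRunning, nil })
	m.Register(StateRunning, func() (string, error) { return "", errTransfer })
	fmt.Println(m.Run(StatePending), m.Trace) // error [pending running error]
}
```

Adding a sub-state in this sketch only means registering another handler, which is the kind of extensibility the text refers to.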

For the concrete task of replicating an image to a remote instance, the Running state is further divided into several sub-states, as shown in the following figure:

First, the task downloads the manifest of the tag from the source Harbor instance, parses the blobs it references, and checks for each blob whether it already exists in the target instance. Any blob that does not exist is transferred. Finally, the task checks whether the manifest already exists in the target instance and uploads it if it does not. Checking for existing blobs effectively reduces unnecessary network traffic. Checking for an existing manifest matters because uploading a manifest may itself trigger replication: when several Harbor instances replicate to one another in a cycle, the check prevents them from synchronizing endlessly in an infinite loop. Repeating this process for every tag of an image completes the synchronization of the whole image.
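The per-tag procedure above can be illustrated with a small, self-contained Go sketch. It replaces the registry API calls with an in-memory stand-in, so `memRegistry`, `manifest`, `replicateTag`, and `replicateImage` are all hypothetical names for the example, and the digests are made-up placeholders rather than real hashes.

```go
package main

import "fmt"

// manifest lists the blobs that make up one tagged image.
type manifest struct {
	digest string
	blobs  []string
}

// memRegistry is a hypothetical in-memory stand-in for one repository in a registry.
type memRegistry struct {
	blobs     map[string][]byte   // blob digest -> content
	manifests map[string]manifest // tag -> manifest
}

func newMemRegistry() *memRegistry {
	return &memRegistry{blobs: map[string][]byte{}, manifests: map[string]manifest{}}
}

// replicateTag copies one tag from src to dst. It reports how many blobs were
// transferred and whether the manifest had to be pushed.
func replicateTag(src, dst *memRegistry, tag string) (copied int, pushed bool, err error) {
	m, ok := src.manifests[tag] // step 1: pull the manifest from the source
	if !ok {
		return 0, false, fmt.Errorf("tag %q not found in source", tag)
	}
	for _, d := range m.blobs { // step 2: transfer only the blobs the target lacks
		if _, exists := dst.blobs[d]; exists {
			continue
		}
		data, ok := src.blobs[d]
		if !ok {
			return copied, false, fmt.Errorf("blob %s missing from source", d)
		}
		dst.blobs[d] = data
		copied++
	}
	// Step 3: push the manifest only if the target does not already have it.
	if cur, ok := dst.manifests[tag]; ok && cur.digest == m.digest {
		return copied, false, nil
	}
	dst.manifests[tag] = m
	return copied, true, nil
}

// replicateImage repeats the procedure for every tag of the image.
func replicateImage(src, dst *memRegistry) (int, int, error) {
	blobs, manifests := 0, 0
	for tag := range src.manifests {
		copied, pushed, err := replicateTag(src, dst, tag)
		if err != nil {
			return blobs, manifests, err
		}
		blobs += copied
		if pushed {
			manifests++
		}
	}
	return blobs, manifests, nil
}

func main() {
	src, dst := newMemRegistry(), newMemRegistry()
	src.blobs["sha256:a"] = []byte("base layer")
	src.blobs["sha256:b"] = []byte("app v1")
	src.blobs["sha256:c"] = []byte("app v2")
	src.manifests["v1"] = manifest{"sha256:m1", []string{"sha256:a", "sha256:b"}}
	src.manifests["v2"] = manifest{"sha256:m2", []string{"sha256:a", "sha256:c"}}

	b, m, _ := replicateImage(src, dst)
	fmt.Println(b, m) // 3 2: the shared base layer is copied only once
	b, m, _ = replicateImage(src, dst)
	fmt.Println(b, m) // 0 0: nothing is re-sent on the second pass
}
```

The second pass pushes no manifest, which is the property that keeps instances replicating to each other in a cycle from triggering one another forever.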

Summary and Outlook

This article has described the design and implementation of remote image replication in the new version of Harbor. In the future we will extend this feature, for example by adding richer controls and filter conditions to policies so that users can select which images to replicate and control when replication takes place. We also hope that readers and users will send us more feedback. Harbor project website:

https://github.com/vmware/harbor

 

https://my.oschina.net/vmwareharbor/blog/728085
