DockOne Technology Sharing (28): Interpretation of OCI Standards and RunC Principles

With the development of the Internet and container technology over the past two years, almost all major IT vendors and cloud service providers have begun to adopt container-based solutions, and container-related organizations have sprung up everywhere. To keep containers portable, establishing standards for container formats and runtimes has become particularly important.

For this reason, the Linux Foundation established the OCI (Open Container Initiative) in June 2015, aiming to develop open industry standards around container formats and runtimes. As soon as it was established, it won the support of a series of cloud computing vendors including Google, Microsoft, Amazon, and Huawei.

1. What is the container format standard?

In general, the purpose of a container format standard is to keep containers from being bound to higher-level structures, such as a specific client or orchestration stack, and from being bound to a specific vendor or project, that is, to a specific operating system, hardware, CPU architecture, public cloud, and so on.

The standard is currently maintained and developed by the maintainers of the libcontainer and appc projects, and its specification document is maintained as a project on GitHub.

1.1 Purpose of container standardization

Standardizing containers serves the following five purposes.

  1. Standardized operations: Standardized operations for containers include creating, starting, and stopping containers using standard container tools, copying and creating container snapshots using standard file system tools, and downloading and uploading using standard network tools.
  2. Content-independent: Content-independent means that no matter what the specific container content is, the container standard operations can produce the same effect after execution. For example, a container can be uploaded and started in the same way, whether it is a PHP application or a MySQL database service.
  3. Infrastructure agnostic: Whether the infrastructure is a personal laptop, AWS S3, OpenStack, or anything else, every container operation should be supported on it.
  4. Tailor-made for automation: A unified container standard makes operations independent of both content and platform, and one fundamental purpose of that independence is to allow the whole platform of container operations to be automated.
  5. Industrial-grade delivery: One of the goals of the container standard is to make industrial-grade software delivery a reality.

 

1.2 Container standard package (bundle) and configuration

A standard container package should contain at least three parts:

config.json: The basic configuration file. It holds information that is independent of the host and specific to the application, such as security permissions, environment variables, and parameters. The details are as follows:

  1. Container format version
  2. rootfs path and whether it is read-only
  3. Various file mount points and corresponding mount directories in the container (this configuration information must be consistent with the runtime.json configuration)
  4. Initial process configuration, including whether to attach a terminal, the working directory in which the executable runs, environment variables, the executable and its arguments, the uid, gid, and any additional gids, the hostname, and the underlying operating system and CPU architecture.
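The four groups of settings above can be pictured as a small Go structure. This is only a sketch: the JSON key names and the sample values are assumptions chosen for illustration, not a copy of the specification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config groups the settings listed above. Key names are illustrative
// assumptions and may differ from the real specification.
type Config struct {
	Version  string   `json:"version"`
	Root     Root     `json:"root"`
	Mounts   []Mount  `json:"mounts"`
	Process  Process  `json:"process"`
	Hostname string   `json:"hostname"`
	Platform Platform `json:"platform"`
}

type Root struct {
	Path     string `json:"path"`
	Readonly bool   `json:"readonly"`
}

type Mount struct {
	Name string `json:"name"` // expected to match an entry in runtime.json
	Path string `json:"path"` // mount directory inside the container
}

type User struct {
	UID            uint32   `json:"uid"`
	GID            uint32   `json:"gid"`
	AdditionalGids []uint32 `json:"additionalGids"`
}

type Process struct {
	Terminal bool     `json:"terminal"`
	Cwd      string   `json:"cwd"`
	Env      []string `json:"env"`
	Args     []string `json:"args"`
	User     User     `json:"user"`
}

type Platform struct {
	OS   string `json:"os"`
	Arch string `json:"arch"`
}

// sample is a hypothetical config.json used only for this example.
const sample = `{
  "version": "0.1.0",
  "root": {"path": "rootfs", "readonly": true},
  "mounts": [{"name": "proc", "path": "/proc"}],
  "process": {
    "terminal": true,
    "cwd": "/",
    "env": ["PATH=/bin:/usr/bin"],
    "args": ["sh"],
    "user": {"uid": 0, "gid": 0, "additionalGids": []}
  },
  "hostname": "demo",
  "platform": {"os": "linux", "arch": "amd64"}
}`

// parseConfig decodes the JSON document into a Config value.
func parseConfig(data []byte) (*Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	c, err := parseConfig([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("rootfs:", c.Root.Path, "read-only:", c.Root.Readonly)
	fmt.Println("command:", c.Process.Args, "on", c.Platform.OS, c.Platform.Arch)
}
```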


runtime.json : A runtime configuration file that contains runtime host-related information such as memory limits, local device access, mount points, etc. In addition to the above configuration information, the runtime configuration file also provides the feature of "hooks", which can execute some custom scripts before the container runs and after it is stopped. The configuration of hooks includes the execution script path, parameters, environment variables, etc.

rootfs/: The root file system directory, which contains the environment the container needs in order to run, such as /bin, /var, /lib, /dev, /usr and other directories with their files. The rootfs directory must sit at the top level of the bundle directory, alongside the config.json file that holds the configuration.
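As a rough illustration of the bundle layout just described, the following Go sketch checks whether a directory contains the three expected entries. The function names missingBundleEntries and demo are hypothetical helpers written for this example; they are not part of runC.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// missingBundleEntries returns the bundle entries that dir lacks:
// the two configuration files and the rootfs directory.
func missingBundleEntries(dir string) []string {
	var missing []string
	for _, name := range []string{"config.json", "runtime.json"} {
		info, err := os.Stat(filepath.Join(dir, name))
		if err != nil || info.IsDir() {
			missing = append(missing, name)
		}
	}
	info, err := os.Stat(filepath.Join(dir, "rootfs"))
	if err != nil || !info.IsDir() {
		missing = append(missing, "rootfs/")
	}
	return missing
}

// demo checks an empty temporary directory, then populates it with
// placeholder files and checks it again.
func demo() ([]string, []string, error) {
	dir, err := os.MkdirTemp("", "bundle")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(dir)

	before := missingBundleEntries(dir)
	for _, name := range []string{"config.json", "runtime.json"} {
		if err := os.WriteFile(filepath.Join(dir, name), []byte("{}"), 0o644); err != nil {
			return nil, nil, err
		}
	}
	if err := os.Mkdir(filepath.Join(dir, "rootfs"), 0o755); err != nil {
		return nil, nil, err
	}
	after := missingBundleEntries(dir)
	return before, after, nil
}

func main() {
	before, after, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("missing before:", before)
	fmt.Println("missing after: ", after)
}
```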

1.3 Container Runtime and Lifecycle

The container standard format also requires the container to persist its own runtime state to disk, so that external tools can read and use this information. The runtime state is stored as JSON. It is recommended to keep the state file on a temporary file system so that it is removed automatically after a system restart.

For operating systems based on the Linux kernel, this information should be stored under the /run/opencontainer/containers directory. Inside it, a folder named after the container ID holds the container's state file (/run/opencontainer/containers/<containerID>/state.json), which is updated in real time. With such a default location for container state, external applications can easily find all running containers on the system.

The specific information contained in the state.json file needs to be:

  • Version information: Stores the specific version number of the OCI standard.
  • Container ID: Usually a hash value, or a human-readable string. The container ID is included in state.json so that the runtime hooks mentioned above can locate the container simply by loading state.json. A hook can then check for state.json: if the file is missing, the container is considered shut down, and the corresponding predefined script actions are executed.
  • PID: The process ID, on the host, of the first process running in the container.
  • Container file directory: The directory where the container rootfs and the corresponding configuration are stored. External programs can locate the container file directory on the host simply by reading state.json.

The standard container lifecycle should consist of three basic stages.

  • Container creation: Create the file system, namespaces, cgroups, user permissions, and so on.
  • Start the container process: Run the container process. The executable of the process is defined by the args item in config.json.
  • Container stop: The container is a process, so an external program can shut it down (kill it). The container standard should therefore cover catching the stop signal and reclaiming the corresponding resources, to avoid leaving orphaned processes behind.
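The state file described above can be sketched in Go as follows. The JSON key names, the temporary directory standing in for /run/opencontainer/containers, and the helper names are all assumptions made for illustration. The sketch also shows the rule from the text that a missing state.json is treated as "container shut down".

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// State holds the four pieces of information listed above.
// The key names are illustrative assumptions.
type State struct {
	Version    string `json:"version"`
	ID         string `json:"id"`
	Pid        int    `json:"pid"`
	BundlePath string `json:"bundlePath"`
}

// statePath builds <root>/<containerID>/state.json.
func statePath(root, id string) string {
	return filepath.Join(root, id, "state.json")
}

// saveState writes the state file for one container.
func saveState(root string, s State) error {
	if err := os.MkdirAll(filepath.Join(root, s.ID), 0o755); err != nil {
		return err
	}
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	return os.WriteFile(statePath(root, s.ID), data, 0o644)
}

// loadState returns (nil, nil) when state.json is missing, which the
// text above treats as "the container has been shut down".
func loadState(root, id string) (*State, error) {
	data, err := os.ReadFile(statePath(root, id))
	if os.IsNotExist(err) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	var s State
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

// demo saves a state, loads it, removes it, and loads it again.
func demo() (*State, *State, error) {
	root, err := os.MkdirTemp("", "containers")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(root)

	s := State{Version: "0.1.0", ID: "demo", Pid: 1234, BundlePath: "/tmp/demo-bundle"}
	if err := saveState(root, s); err != nil {
		return nil, nil, err
	}
	running, err := loadState(root, "demo")
	if err != nil {
		return nil, nil, err
	}
	if err := os.RemoveAll(filepath.Join(root, "demo")); err != nil {
		return nil, nil, err
	}
	stopped, err := loadState(root, "demo")
	if err != nil {
		return nil, nil, err
	}
	return running, stopped, nil
}

func main() {
	running, stopped, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("while running:", running.ID, running.Pid, running.BundlePath)
	fmt.Println("after removal, state is nil:", stopped == nil)
}
```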

 

1.4 Specific implementation based on the Open Container Format (OCF) standard

As the points above show, the format requirements of the Open Container Specification are very loose. It restricts neither the implementation technology nor the framework used. Several implementations based on OCF already exist, and more projects will likely appear in the near future.

The container runtime opencontainers/runc, the runC project discussed in this article, serves as the reference implementation for later projects.

The virtual machine runtime hyperhq/runv is an implementation of the open container specification based on hypervisor technology.

The test framework huawei-openlab/oct is a test framework based on the open container specification.

2. The working principle and implementation of runC

2.1 The transition of runC from libcontainer

runC evolved from Docker's libcontainer project. runC is essentially libcontainer with a thin client on top.

Essentially, a container provides an execution environment that shares the kernel with the host system but is isolated from the other processes and resources in the system. Docker "isolates" such an execution environment by calling the libcontainer package to manage and allocate namespaces, cgroups, capabilities, and file systems. runC calls the libcontainer package in the same way, but leaves out the higher-level features of Docker such as images and volumes, and implements container management that conforms to the OCF standard in the simplest and most concise way.

In general, since the libcontainer project has been transformed into the runC project, its functions and features have not changed much. The specific points are as follows.

  1. The original nsinit has been removed from the package and moved to the top level. The command is now named runc, and its implementation with the cli package is easy to follow.
  2. According to the open container standard, the single configuration file that mixed all the information together is split into two: config.json and runtime.json.
  3. Following the open container standard, hook scripts were added that run before the container starts and after it stops.
  4. Compared with the original nsinit command, a runc kill command is added to send a SIGKILL signal to the init process of the container with the specified ID.


In general, the features that runC aims to include are:

  1. Support for all Linux namespaces, including user namespaces. User namespaces are not yet supported.
  2. Support for all security-related features native to Linux, including SELinux, AppArmor, seccomp, cgroups, capability drop, pivot_root, uid/gid dropping, and more. Support for these features is complete.
  3. Support for container hot migration, implemented with CRIU. The feature has been implemented, but problems still occur when using it.
  4. Support for containers running on the Windows 10 platform, under development by Microsoft engineers. Currently only the Linux platform is supported.
  5. Support for Arm, Power, and Sparc hardware architectures, backed by Arm, Intel, Qualcomm, IBM, and the wider ecosystem of hardware manufacturers.
  6. Planned support for cutting-edge hardware features such as DPDK, SR-IOV, TPM, secure enclaves, and more.
  7. High-performance tuning for production environments, contributed by Google engineers based on their experience deploying containers in production.
  8. Existing as a formal, real, comprehensive, and concrete standard.

 

2.2 How does runC start the container?

The open container standard defines two configuration files and one dependency package for a container, and runC starts a container from these. First, let's follow the official steps.

runC needs a rootfs to run. The easiest way, if Docker is installed locally, is to download a basic image and export the file system of a container created from it into an archive named busybox.tar:

    docker pull busybox
    docker export $(docker create busybox) > busybox.tar

Then unpack the archive into a rootfs directory:

    mkdir rootfs
    tar -C rootfs -xf busybox.tar

We now have a rootfs directory that follows the OCF standard. Note that Docker is used here only as a convenient way to obtain the rootfs directory; runc itself does not depend on Docker.

Next, you also need config.json and runtime.json. Running runc spec generates standard config.json and runtime.json configuration files; of course, you can also write your own according to the format.

If you have not installed runC, you need to install it according to the following steps. Currently runC only supports Linux platform.

    # create a 'github.com/opencontainers' in your GOPATH/src
    cd github.com/opencontainers
    git clone https://github.com/opencontainers/runc
    cd runc
    make
    sudo make install


Finally, run runc start to start a container.

2.3 The operating principle of runC start

As mentioned above, runC is libcontainer wrapped in a thin cli layer. Here cli is a Go package for quickly developing command-line applications. It handles things such as subcommand definitions, flag definitions, and help text for you. cli is also an open source project hosted on GitHub, at github.com/codegangsta/cli.
Looking at the source code, the execution of runC start can be analyzed as follows:

Picture3.jpg

 

2.3.1. Everything starts from the main() function

The program first executes the main() function in main.go. In this function, the program uses the cli package to declare runC's subcommands, parameters, version number, and help information. The program then calls the handler that matches the subcommand entered by the user; here, that is the startContainer() function in start.go.
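The dispatch step described above can be sketched with the standard library alone. This sketch deliberately does not use the cli package; it only illustrates the idea of mapping a subcommand name to a handler. Apart from the names start, kill, and startContainer, which come from the text, every identifier here is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// handler is the function type called for one subcommand.
type handler func(args []string) (string, error)

// startContainer stands in for the handler of the same name in start.go;
// this body is only a placeholder.
func startContainer(args []string) (string, error) {
	return "start: would load the configuration and start the container", nil
}

// killContainer is a hypothetical handler for the kill subcommand.
func killContainer(args []string) (string, error) {
	if len(args) == 0 {
		return "", fmt.Errorf("kill: missing container ID")
	}
	return "kill: would signal the init process of " + args[0], nil
}

// commands maps subcommand names to their handlers, which is the part
// of the job that the cli package does in runC.
var commands = map[string]handler{
	"start": startContainer,
	"kill":  killContainer,
}

// dispatch looks up the subcommand in args[0] and calls its handler
// with the remaining arguments.
func dispatch(args []string) (string, error) {
	if len(args) == 0 {
		return "", fmt.Errorf("no subcommand given")
	}
	h, ok := commands[args[0]]
	if !ok {
		return "", fmt.Errorf("unknown subcommand %q", args[0])
	}
	return h(args[1:])
}

func main() {
	out, err := dispatch(os.Args[1:])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```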

2.3.2. Create logical container Container and logical process process

The so-called logical container container and logical process process are not really running containers and processes, but structures defined in libcontainer. The logical container container contains various configuration information such as namespace, cgroups, device, and mountpoint. The logical process process contains the instructions to be run in the container, its parameters and environment variables.

For runC, only one container definition is required, and different containers differ only in their instance contents (attributes and parameters). libcontainer, however, has to deal with the lower layers, so it must create completely heterogeneous "logical container objects" (such as Linux containers and Windows containers) on different platforms. This explains why the "factory pattern" is used here: in the future, libcontainer can support more types of containers on more platforms without changing the calling interface.

The following explains the creation process of the logical container Container and the logical process process.

In the startContainer() function, the program first loads *.json into the config structure that libcontainer can use. With config in hand, it calls libcontainer.New() to generate the factory used to create containers. It then calls factory.Create(config) to generate a logical container that holds config. Next, it calls newProcess(config) to fill a process structure with the information in config about the command to run in the container; this is the logical process. Finally, it calls container.Start(process) to start the logical container.
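The call sequence above, and the reason for the factory pattern, can be sketched as follows. All types here are simplified stand-ins written for this example; they follow the call shapes named in the text (New, Create, newProcess, Start) but are not libcontainer's real definitions, and Start only records the process instead of launching anything.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a simplified stand-in for the structure loaded from *.json.
type Config struct {
	Rootfs   string
	Hostname string
	Args     []string
	Env      []string
}

// Process is the "logical process": the command to run and its environment.
type Process struct {
	Args []string
	Env  []string
}

// Container is the interface callers see, whatever the platform.
type Container interface {
	Start(p *Process) error
	Started() bool
}

// Factory creates containers; each platform would supply its own factory.
type Factory interface {
	Create(config *Config) (Container, error)
}

// linuxContainer is one concrete "logical container".
type linuxContainer struct {
	config  *Config
	process *Process
}

func (c *linuxContainer) Start(p *Process) error {
	if p == nil || len(p.Args) == 0 {
		return errors.New("process has no command to run")
	}
	c.process = p // a real implementation would launch the init process here
	return nil
}

func (c *linuxContainer) Started() bool { return c.process != nil }

type linuxFactory struct{}

func (linuxFactory) Create(config *Config) (Container, error) {
	if config == nil {
		return nil, errors.New("nil config")
	}
	return &linuxContainer{config: config}, nil
}

// New returns the factory for the current platform. Only one platform
// is sketched here, so the choice is trivial.
func New() Factory { return linuxFactory{} }

// newProcess copies the command-related fields out of config.
func newProcess(config *Config) *Process {
	return &Process{Args: config.Args, Env: config.Env}
}

func main() {
	config := &Config{Rootfs: "rootfs", Hostname: "demo", Args: []string{"sh"}}
	factory := New()
	container, err := factory.Create(config)
	if err != nil {
		fmt.Println("create failed:", err)
		return
	}
	if err := container.Start(newProcess(config)); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("logical container started:", container.Started())
}
```

Because callers depend only on the Factory and Container interfaces, adding another platform in this sketch would mean adding another pair of concrete types and changing New, while the calling code stays the same.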

2.3.3. Start the logical container container

runC will call Start(), and the Start() function is located in libcontainer/container_linux.go. The main job is to call newParentProcess() to generate a parentprocess instance (structure) and a pipe for communication between runC and the init process in the container.

In the parentprocess instance, besides the pipe that will be used to communicate with the process in the container and various basic settings, another extremely important field is cmd.
The cmd field is a structure defined in the os/exec package. The os/exec package is mainly used to create a new process and execute a specified command in it. Developers import the os/exec package and then fill in the cmd structure: the path and name of the program to run, its arguments, environment variables, various operating-system-specific attributes, extra file descriptors, and so on.

In runC, the program fills the application path field Path of cmd as /proc/self/exe (that is, the application itself, runC). The parameter field Args is filled with init, indicating that the container is initialized. The SysProcAttr field is filled with various attributes such as namespace that runC needs to enable.
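The way the cmd structure is filled can be sketched with the standard os/exec and syscall packages. This is a Linux-only sketch: the particular set of namespace flags is an assumption chosen for illustration, and the command is only constructed, never started.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// newInitCommand builds, but does not start, a command that would
// re-execute the current binary with the single argument "init".
func newInitCommand() *exec.Cmd {
	// Path becomes /proc/self/exe, Args becomes [/proc/self/exe init].
	cmd := exec.Command("/proc/self/exe", "init")
	// Cloneflags asks the kernel to create the child in new namespaces.
	// The flags chosen here are an illustrative subset.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	return cmd
}

// wantsNewPidNamespace reports whether cmd requests a new PID namespace.
func wantsNewPidNamespace(cmd *exec.Cmd) bool {
	return cmd.SysProcAttr != nil &&
		cmd.SysProcAttr.Cloneflags&syscall.CLONE_NEWPID != 0
}

func main() {
	cmd := newInitCommand()
	fmt.Println("path:", cmd.Path)
	fmt.Println("args:", cmd.Args)
	fmt.Println("new PID namespace requested:", wantsNewPidNamespace(cmd))
}
```

Starting such a command would normally require root privileges or a user namespace, which is one reason the sketch stops before calling cmd.Start().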

Then call parentprocess.cmd.Start() to start the init process in the physical container. Next, add the process ID of the init process in the physical container to the Cgroup control group to implement resource control over the processes in the container. The configuration parameters are then piped to the init process. Finally, wait for the init process to complete all initialization work according to the above configuration through the pipeline, or exit with an error.
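The hand-off of configuration through a pipe can be sketched as below. To keep the example self-contained, a goroutine plays the role of the parent writing the configuration, and the calling function plays the role of the init process reading it; in runC the two ends live in different processes. The initConfig type and its fields are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// initConfig is a hypothetical, much reduced version of the settings
// the parent sends to the init process.
type initConfig struct {
	Hostname string   `json:"hostname"`
	Args     []string `json:"args"`
}

// sendAndReceive writes cfg into one end of a pipe as JSON and decodes
// it from the other end.
func sendAndReceive(cfg initConfig) (initConfig, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return initConfig{}, err
	}
	defer r.Close()

	// Parent side: encode the configuration and close the write end.
	errc := make(chan error, 1)
	go func() {
		defer w.Close()
		errc <- json.NewEncoder(w).Encode(cfg)
	}()

	// Init side: block until one complete JSON value has arrived.
	var got initConfig
	if err := json.NewDecoder(r).Decode(&got); err != nil {
		return initConfig{}, err
	}
	if err := <-errc; err != nil {
		return initConfig{}, err
	}
	return got, nil
}

func main() {
	got, err := sendAndReceive(initConfig{Hostname: "demo", Args: []string{"sh"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("init received:", got.Hostname, got.Args)
}
```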

2.3.4. Configuration and creation of physical containers

The init process in the container will first call the StartInitialization() function to receive various configuration parameters from the parent process through the pipeline. Then configure the container as follows:

  1. If specified by the user, the init process will be added to its specified namespace.
  2. Set the session ID of the process.
  3. Initialize network devices.
  4. Mount the file system in the specified directory, and switch the root directory to the newly mounted file system. Set hostname and load profile information.
  5. Finally, use the exec system call to execute the user-specified program running in the container.

 

3. Introduction to the configuration and principle of hot migration

3.1 Introduction to Hot Migration

The so-called live migration is to perform a Checkpoint operation on a container and obtain a series of files, which can be used to restore the container on the local machine or other hosts. Currently, CRIU is used in runC as a tool for live migration, and the Checkpoint and Restore functions for containers are implemented. The brief process is shown in the figure below.

Picture4.jpg

 

3.2 Introduction to the principle of runC hot migration

The work of live migration in runC is mainly done by calling CRIU (Checkpoint and Restore in Userspace). CRIU is responsible for freezing the process and saving it to disk as a series of files, and for restoring the frozen process from these files.

runC uses SWRK mode to call criu. This mode is a combination of criu's other two modes, CLI and RPC, allowing users to run criu as a command-line tool when needed, and to accept user requests for remote calls.

runC mainly completes the hot migration through the following two steps.

  1. Generate the container: build the container structure from state.json or from the *.json configuration files.
  2. Call CRIU in SWRK mode: runC first collects and organizes the information about the container that is to be checkpointed or restored, and fills in the structure that will be sent to CRIU in SWRK mode. The main contents of the structure are as follows:
    req := &criurpc.CriuReq{
        Type: &t,       // Checkpoint or Restore
        Opts: &rpcOpts, // CRIU-related parameters
    }

    The field t specifies whether the request is to perform a Checkpoint operation or a Restore operation, and the field rpcOpts contains various user-specified options and parameters required for CRIU operation.


runC then creates a communication channel between runC (criuClient) and CRIU (criuServer) through syscall.Socketpair(), and uses the os/exec package in Go to start criu in SWRK mode. It sends the request to criuServer through criuClient, and finally receives the execution result through criuClient.
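The socket pair used for this exchange can be sketched with the standard library. This is a Linux-only sketch that does not start criu: both ends stay in the same process, and the "server" end simply echoes the request with a prefix. The socket type, the message contents, and the function name roundTrip are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// roundTrip sends request over one end of a socket pair, lets the
// other end answer, and returns the answer read by the client.
func roundTrip(request []byte) ([]byte, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_SEQPACKET|syscall.SOCK_CLOEXEC, 0)
	if err != nil {
		return nil, err
	}
	client := os.NewFile(uintptr(fds[0]), "criu-client")
	server := os.NewFile(uintptr(fds[1]), "criu-server")
	defer client.Close()
	defer server.Close()

	// Client side: send the request.
	if _, err := client.Write(request); err != nil {
		return nil, err
	}

	// Stand-in for the server: read the request and reply with a prefix.
	buf := make([]byte, 4096)
	n, err := server.Read(buf)
	if err != nil {
		return nil, err
	}
	reply := append([]byte("ok:"), buf[:n]...)
	if _, err := server.Write(reply); err != nil {
		return nil, err
	}

	// Client side: receive the result.
	n, err = client.Read(buf)
	if err != nil {
		return nil, err
	}
	return buf[:n], nil
}

func main() {
	resp, err := roundTrip([]byte("checkpoint"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("response:", string(resp))
}
```

In the real flow the server end of the pair would be handed to the criu child process, for example as an inherited file descriptor, rather than being read in the same process.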

3.3 Configuration and use of runC live migration in the current version

Since the current version of CRIU is not very complete and cannot fully support a small number of features in runC, some modifications to the configuration files are required during live migration. The specific changes and reasons are as follows:

  • Because CRIU does not support seccomp, you need to empty the relevant content about seccomp in the config.json file.
  • Because CRIU does not support external terminals, you need to set the value of terminal in the config.json file to false.
  • Because CRIU requires the file system mounted by runC to be readable, set the read-write capability of the file system in the config.json file to readable.


Part of the configuration is shown in the figure below.

Picture5.jpg


After installing CRIU and its related dependencies correctly and making the above modifications to config.json, you can use the built-in command of runC to perform live migration of the container.

 

http://dockone.io/article/776
