Containers: Namespaces and Cgroups

Container

The container by itself is not where the value lies; the real value is in "container orchestration".

A container is essentially a sandbox technology: a way to "package" your application. The boundary keeps applications from interfering with one another, and an application packed into a container can also be moved around easily.

The core of container technology is to create a "boundary" for a process by constraining and modifying its dynamic behavior.

For most Linux containers, Docker included, Cgroups are the main means of creating those constraints, while Namespaces are the main means of modifying a process's view of the system.

Docker

When Docker creates a container process, it specifies a set of Namespaces that the process should enable. As a result, the process can only "see" the resources, files, devices, state, and configuration defined by those Namespaces. In other words, a container is really just a special process.

Unlike a real virtual machine, when you use Docker there is no separate "Docker container" entity running inside the host. What the Docker project starts for the user is still an ordinary application process; Docker simply attaches various Namespace parameters when creating it. Each such process then believes it is process No. 1 in its own PID Namespace, can only see the directories and files mounted in its own Mount Namespace, and can only reach the network devices in its own Network Namespace, as if it were running inside its own "container", isolated from the rest of the world.
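
You can observe this membership directly: on a Linux host, every process exposes its Namespaces as symlinks under /proc/<pid>/ns. A minimal sketch, assuming a Linux system with /proc mounted (the helper name `namespace_ids` is my own):

```python
import os

def namespace_ids(pid="self"):
    """Map each namespace type (pid, mnt, net, ...) to its identifier.

    Two processes share a namespace exactly when their identifiers
    (e.g. 'pid:[4026531836]') are equal.
    """
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}

if __name__ == "__main__":
    for ns, ident in namespace_ids().items():
        print(f"{ns:8s} {ident}")
```

Run inside a container and on the host, the identifiers for the same process differ, which is exactly the "different view" this section describes.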

Namespace

Namespace technology modifies the "view" of the whole machine that an application process sees: the operating system limits its "sight" so that it can only "see" certain specified content. To the host, however, these "isolated" processes look much like any other process.

Compared with hardware virtualization, isolation based on Linux Namespaces has many deficiencies, the most important being that the isolation is incomplete.

Since a container is just a special process running on the host, multiple containers still share the kernel of that same host. In the Linux kernel, many resources and objects simply cannot be namespaced; the classic example is time. (Newer kernels have since added a limited time namespace, but it does not cover the wall clock.)

Although the No. 1 process in the container, with this "blindfold" in place, can only see what is inside the container, on the host it is, say, process No. 100, competing on equal terms with every other process. This means that although process No. 100 appears isolated, the resources it uses (such as CPU and memory) can be taken at any time by other processes or containers on the host; conversely, process No. 100 might itself eat up all of the host's resources. Neither is acceptable behavior for a "sandbox".

Cgroups

Linux Cgroups (short for Linux Control Groups) is an important kernel feature for setting resource limits on processes. Its main function is to cap the resources that a process group can use, including CPU, memory, disk I/O, network bandwidth, and so on.
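
Which control group a process belongs to is itself visible through /proc. Below is a minimal sketch, assuming a Linux host; it parses /proc/self/cgroup, whose lines have the form hierarchy-id:controllers:path (on cgroup v2 there is a single `0::/...` line, and limits such as cpu.max live under /sys/fs/cgroup/<path>):

```python
def current_cgroups():
    """Parse /proc/self/cgroup into (hierarchy_id, controllers, path) tuples.

    cgroup v1 yields one line per controller, e.g. '4:memory:/docker/<id>';
    cgroup v2 yields a single line '0::/<path>'.
    """
    entries = []
    with open("/proc/self/cgroup") as f:
        for line in f:
            hier, controllers, path = line.rstrip("\n").split(":", 2)
            entries.append((hier, controllers, path))
    return entries

if __name__ == "__main__":
    for hier, controllers, path in current_cgroups():
        print(f"hierarchy={hier!r} controllers={controllers!r} path={path!r}")
```

Inside a Docker container the path typically points at a per-container group (for example something like /docker/<container-id>), which is where Docker writes the limits it was asked to enforce.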

Containers and Processes

A running Docker container is really an application process with several Linux Namespaces enabled, whose resource usage is bounded by its Cgroups configuration. This leads to a very important concept in container technology: the container is a "single-process" model.

Since the essence of a container is a process, the user's application process is the process with PID 1 inside the container, and it is the parent of all processes created afterwards. If process 1 dies, the container dies with it. This means you cannot run two applications that both expect to be process 1 inside the same container, unless you provide a program in advance to act as the shared PID 1 parent of both, which is why many people use software such as systemd or supervisord, rather than the application itself, as the container's startup process.
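
The one duty such a PID 1 cannot skip is waiting on (reaping) its children, or they linger as zombies. A minimal sketch of that parent role, using plain fork/exec/wait (the helper name `run_as_init` and the command passed to it are illustrative, not part of any container runtime):

```python
import os

def run_as_init(argv):
    """Sketch of what a container's PID 1 must do: start the application
    and reap it when it exits. Returns the child's exit code."""
    pid = os.fork()
    if pid == 0:                    # child: become the application process
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)           # exec failed; never run the parent's code
    _, status = os.waitpid(pid, 0)  # parent: reap the child, no zombies
    return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    print("child exit code:", run_as_init(["true"]))
```

A real init such as supervisord additionally handles SIGCHLD for grandchildren it did not start directly, but the reap-and-propagate loop above is the core of the job.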

This is because containers are designed so that the container and its application share the same life cycle.

The /proc directory under Linux holds a set of special files that expose the kernel's current runtime state. By reading them, users can inspect the system and the running processes: CPU usage, memory usage, and so on. /proc is also the main data source for the top command.

If you run the top command inside a container, the figures it shows are actually the host's CPU and memory data, not the container's. The cause is that the /proc filesystem knows nothing about Cgroups limits. In production this must be corrected; otherwise the CPU count and available memory that an application reads inside the container are the host's figures, which can badly confuse the application and put it at risk. This is a common problem with containerized applications in enterprises, and another respect in which containers compare unfavorably with virtual machines.
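
The mismatch is easy to demonstrate: /proc/meminfo reports the host's memory, while the container's actual limit lives in the cgroup filesystem. A minimal sketch, assuming a Linux host; the cgroup file paths cover the common v2 and v1 layouts and the code tolerates their absence:

```python
def host_mem_total_kb():
    """MemTotal from /proc/meminfo: the figure top shows, even in a container."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])   # value is in kB
    raise RuntimeError("MemTotal not found in /proc/meminfo")

def cgroup_mem_limit_bytes():
    """The cgroup memory limit, or None if unlimited / not visible.

    Checks the cgroup v2 file (memory.max) first, then the v1 file
    (memory.limit_in_bytes), as mounted inside a typical container.
    """
    for path in ("/sys/fs/cgroup/memory.max",
                 "/sys/fs/cgroup/memory/memory.limit_in_bytes"):
        try:
            with open(path) as f:
                value = f.read().strip()
        except OSError:
            continue
        return None if value == "max" else int(value)
    return None

if __name__ == "__main__":
    print("host MemTotal (kB):", host_mem_total_kb())
    print("cgroup memory limit (bytes):", cgroup_mem_limit_bytes())
```

An application that sizes its heap or worker pool should consult the cgroup figure (or use a library such as lxcfs at the mount level) rather than trust /proc/meminfo.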

Source (Geekbang Time): https://time.geekbang.org/column/article/14653

Origin blog.csdn.net/xue_xiaofei/article/details/126356744