Explaining Docker's core principles to the boss with stories (and pulling it off)
What is Docker?
"Docker is developed and implemented using the Go language launched by Google. Based on technologies such as Cgroup (resource control), Namespace (resource isolation) and OverlayFS (data storage) in the operating system kernel, it realizes virtualization technology based on the operating system level."
Those who already understand have grasped the essence of that sentence; for everyone else, such an explanation leaves them as lost as before. So let's not rush to define what Docker is. To talk about Docker containers, we first have to talk about virtual machines (Virtual Machine). What exactly is the difference between a Docker container and a virtual machine?
Docker vs virtual machine
Virtual machines are an all-too-familiar concept to developers. We often use VMware Workstation to virtualize operating systems for deploying applications, or the JVM to run Java programs. As shown in the figure below, "a virtual machine manager usually serves as an intermediate translation layer that shields differences in the underlying operating system or hardware": the upper-layer guest operating system (Guest OS) executes programs, the JVM runs Java programs, and so on. "This intermediate translation layer works like an interpreter, translating the instructions issued by the upper layer into the corresponding instructions of the lower-layer operating system for execution."
Just like the "write once, run anywhere" promise of the Java world, "the virtual machine essentially shields underlying differences through an intermediate translation layer, simulating a new environment to achieve platform independence and isolation from the outside world. That is the core idea of virtual-machine virtualization."
This architecture reveals a big problem with virtual machines: every instruction must be translated and interpreted by the virtual machine manager, the intermediate translation layer, before it can run on the real operating system, which means virtual machines carry a performance penalty. In addition, to simulate a Linux environment for one application, you must use VMware to deploy a guest operating system (Guest OS) and then run the application inside it. That guest OS alone occupies several GB of storage and 400-500 MB+ of memory. Today's microservice architectures routinely mean 10+ or even 100+ application components that need to be deployed in isolation, so giving each component its own virtual machine is undoubtedly fatal.
The performance loss and resource waste described above leave virtual machines rather powerless for fine-grained environment isolation, and that conflicts with today's popular microservice architectures, where a system is split into dozens or hundreds of microservice components that must be deployed independently. What Docker advocates is a lightweight container structure: one container per application. That is why Docker was pushed to the top as soon as it came out. So how does it solve the isolation problem?
Docker container core technology
A process in a Docker container runs directly on the underlying operating system, with no intermediate translation layer, so there is no performance loss. The key question is: how does it achieve isolation?
"Here comes Docker
the two core technologies that support containers: Namespace and Cgroups (Control Groups) . " Namespace is mainly used for "resource isolation" . For those computing resources, such as CPU
memory, disk IO and other resources that cannot be isolated, it is necessary to use Cgroups
" resource limitation" to prevent some containers that consume large resources. Occupy the hardware resources ( CPU
, Memory
disk IO, etc.) of the entire physical machine , thereby affecting the performance of other processes.
Namespaces and Cgroups are both features supported by the Linux kernel itself. Had Docker used only these two technologies, it could never have achieved the explosive popularity of its debut. Docker's real innovation was introducing the concept of images and using union file system (UnionFS) technology to implement image layering, so that application deployment artifacts, dependency configuration files, and operating-system binary files can be layered and stacked to build the file system environment the application sees at runtime.
An image contains a base image (Base Image), which generally holds operating-system files, such as centos or debian. It includes only the operating system's binary files, not the kernel, so its size is much smaller than a full operating-system install; a centos base image is only about 70-80MB. In addition, the layered design of images further reduces storage usage. For example, if 100+ application components are all deployed on top of the centos base image, deployment only needs to pull that one centos base image. Like building blocks, the files of each layer are combined and stacked, finally forming the complete directory structure the program sees when it runs.
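The layering is visible in any Dockerfile: each instruction below produces one layer stacked on the previous ones, and a hundred services built FROM the same base image all share that base layer on disk. A hypothetical example (the image name, package, and paths are illustrative):

```dockerfile
# Layer 0: the shared base image - OS binaries only, no kernel (~70-80 MB)
FROM centos:7

# Layer 1: dependencies installed with yum become their own layer
RUN yum install -y nginx && yum clean all

# Layer 2: the application's files are stacked on top
COPY ./app /usr/share/nginx/html

CMD ["nginx", "-g", "daemon off;"]
```

Running `docker image history` on the built image lists these layers and their individual sizes.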
The three core technologies, in plain language
" Docker
Behind the popularity of container technology is actually the combination of Namespace
, Cgroups
and UnionFS
the three major technological innovations, creating Docker
this phenomenal product . " Let's use a more vivid metaphor to help you understand the three major technical relationships:
1. A normal program, when started, runs directly on the operating system. A program started by Docker also runs directly on the operating system, but the Docker engine wraps it in a cube-shaped shell as it starts it (see the figure below);
2. The front, back, left, and right faces of the cube are built with Namespace resource-isolation technology, which separates the process in the container from other processes and creates the illusion that the containerized process is running in an independent environment (see the figure below);
3. The top face of the cube is built with Cgroups resource-limitation technology, which stops the program from growing to occupy other processes' resources and degrading their performance; it is like a binding spell placed on the lid that holds the process strictly within its quota (see the figure below);
4. Finally, look at the bottom face of the cube, built with UnionFS technology: it forms the foundation of the file system the containerized process runs on. Operating-system binaries, dependency configuration files, program artifacts, and so on are layered and stacked through the image to build the entire file system environment the program sees at runtime. For example, the host may run Debian (Debian系统) while the base image is a CentOS environment (CentOS环境): the process in the container then sees a CentOS system (CentOS系统), not Debian. Likewise, dependencies installed with yum install are packaged into the image, so the process in the container need not care whether those dependencies are installed on the host. The containerized process thus sees a complete set of everything the program needs to run, in an operating system seemingly independent of the host's (see the figure below);
5. So the program runs inside the cubic shell created by the three core technologies, and, blindfolded, is fooled into believing it runs in an independent computer environment: it can neither see the running state of external programs nor affect them.
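You can conjure a tiny version of this shell yourself with the `unshare` tool, which starts a command inside freshly created namespaces. A sketch (assumes a Linux host with unprivileged user namespaces enabled; many kernels allow this without root):

```shell
#!/bin/sh
# Try to run a command in a new user namespace, mapping our UID to root (-r).
# Inside the namespace the process believes it is UID 0, while the host
# still sees it as an ordinary unprivileged process - a one-face version
# of the cube Docker builds around a container.
if unshare -U -r true 2>/dev/null; then
    unshare -U -r id -u    # prints 0: "root", but only inside the namespace
else
    echo "user namespaces unavailable on this kernel"
fi
```

Docker does the same thing at container start, only with more namespace types at once (PID, network, mount, UTS, IPC, and so on).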
How to check the host PID of a process in a Docker container?
A process in a Docker container runs directly on the host machine. You can use docker inspect container to look up the host PID of the container's process (see the figure below):
Use ps -ef on the host to view the container's process information:
Because the container here is running nginx, the host shows the corresponding nginx master process, which has spawned two nginx worker child processes.
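The lookup above boils down to two commands. A sketch, assuming a running container named `web` (the name is illustrative):

```shell
#!/bin/sh
# Ask the Docker engine for the host PID of the container's main process;
# the Go template extracts .State.Pid from `docker inspect`'s JSON output.
PID=$(docker inspect -f '{{.State.Pid}}' web)
echo "container main process runs on the host as PID $PID"

# Confirm on the host: the nginx master process (and the workers it forked,
# whose PPID is the master's PID) are ordinary host processes.
ps -ef | grep "$PID"
```

From inside the container the same process reports PID 1, because the PID namespace renumbers it; only the host sees the real PID.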
Docker container flaws
"High performance and lightness are the biggest advantages of containers over virtual machines. Containers are essentially a special process."
However, advantages come with disadvantages. Isolation based on Namespaces and limitation based on Cgroups are not all that thorough, because underneath, containers still share the host's Linux kernel. You can use a different distribution's files inside the container, such as CentOS or Ubuntu, but that does not change the fact that the host's kernel is shared. This means you cannot run a Windows container on a Linux host, nor run a container that requires a newer Linux kernel on a host running an older one.
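The shared kernel is easy to verify: `uname -r` reports the kernel release, and a container reports exactly the host's value, whatever distribution its image files come from. A sketch (the centos:7 image name is illustrative; the container step is skipped if no Docker engine is available):

```shell
#!/bin/sh
# The host's kernel release, e.g. "5.15.0-91-generic".
echo "host kernel:      $(uname -r)"

# A CentOS container on the same machine prints the *same* release,
# because the container has CentOS's files but the host's kernel.
if command -v docker >/dev/null 2>&1; then
    echo "container kernel: $(docker run --rm centos:7 uname -r)"
fi
```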
Secondly, many resources and objects in the Linux kernel cannot be namespaced; the most typical example is time. If the program in your container modifies the system time, the time of the entire host changes with it, which is obviously not what users expect.
In addition, similar to the Namespace situation above, Cgroups' ability to limit resources has many imperfections. The most common problem here concerns the /proc file system. The /proc directory in Linux stores a series of special files that record the current running state of the kernel; by reading these files, users can view information about the system and the running processes, such as CPU usage and memory usage. These files are also the primary data source for commands like top. However, if you run top inside a container, you will find that the CPU and memory figures it shows are actually the host's, not the container's. The reason is that the /proc the Docker engine mounts into the container when starting the process still exposes the host's global state rather than the container's cgroup limits.
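This is easy to see with /proc/meminfo: its MemTotal line inside a container still reports the host's total memory, even when the container is started with a hard memory limit. A sketch (the 100 MB limit and image name are illustrative; the container step is skipped if no Docker engine is available):

```shell
#!/bin/sh
# On the host: total physical memory as /proc reports it.
grep MemTotal /proc/meminfo

# In a container capped at 100 MB, /proc/meminfo shows the same host-wide
# figure: /proc reports kernel-global state, while the real 100 MB limit
# lives in the container's cgroup, which top and free never consult.
if command -v docker >/dev/null 2>&1; then
    docker run --rm -m 100m centos:7 grep MemTotal /proc/meminfo
fi
```

Tools such as lxcfs exist precisely to paper over this gap by mounting cgroup-aware versions of these files into containers.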