Creating a container in Kubernetes is a complex process that involves close collaboration among several internal components as well as a connection to the external CRI runtime. This article walks through the container creation process and how those components work together, so that when problems come up later there is a clear direction for investigation.

1. Groundwork

1.1 Threading model for container management

The kubelet follows a master/worker model. A single master watches the various event sources and creates one goroutine per Pod to handle that Pod's logic. The master and each worker communicate over a channel that carries state updates.
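The master/worker split can be illustrated with a minimal Go sketch. The names podUpdate, podWorkers, dispatch and stop are hypothetical and chosen only for this example; the real kubelet types carry much more state. The sketch assumes one buffered channel and one goroutine per Pod UID.

```go
package main

import (
	"fmt"
	"sync"
)

// podUpdate is a hypothetical event type used only for this sketch.
type podUpdate struct {
	PodUID string
	Event  string
}

// podWorkers fans events out to one goroutine per Pod, keyed by UID.
type podWorkers struct {
	mu      sync.Mutex
	updates map[string]chan podUpdate
	wg      sync.WaitGroup
	handle  func(podUpdate)
}

func newPodWorkers(handle func(podUpdate)) *podWorkers {
	return &podWorkers{updates: make(map[string]chan podUpdate), handle: handle}
}

// dispatch is called by the single master loop. The first event for a Pod
// creates its channel and starts its worker goroutine.
func (p *podWorkers) dispatch(u podUpdate) {
	p.mu.Lock()
	ch, ok := p.updates[u.PodUID]
	if !ok {
		ch = make(chan podUpdate, 8)
		p.updates[u.PodUID] = ch
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for update := range ch {
				p.handle(update)
			}
		}()
	}
	p.mu.Unlock()
	ch <- u
}

// stop closes every channel and waits for the workers to drain them.
func (p *podWorkers) stop() {
	p.mu.Lock()
	for _, ch := range p.updates {
		close(ch)
	}
	p.mu.Unlock()
	p.wg.Wait()
}

func main() {
	var mu sync.Mutex
	seen := map[string][]string{}
	w := newPodWorkers(func(u podUpdate) {
		mu.Lock()
		defer mu.Unlock()
		seen[u.PodUID] = append(seen[u.PodUID], u.Event)
	})
	w.dispatch(podUpdate{"pod-a", "ADD"})
	w.dispatch(podUpdate{"pod-b", "ADD"})
	w.dispatch(podUpdate{"pod-a", "UPDATE"})
	w.stop()
	fmt.Println(seen["pod-a"], seen["pod-b"])
}
```

Because each Pod has exactly one worker reading from its own channel, events for the same Pod are handled in the order they were dispatched, while different Pods proceed independently.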

1.2 Event-driven eventual consistency

After a Pod is created from YAML, Kubernetes keeps adjusting it based on incoming events and the Pod's current state until the Pod converges on the target state.
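A small Go sketch can make this convergence idea concrete. It is a toy model, not kubelet code: podState and syncOnce are hypothetical names, and the "state" is reduced to a sandbox flag and a running-container count. Each event merely triggers a sync; the sync itself looks only at observed versus desired state.

```go
package main

import "fmt"

// podState is a deliberately tiny stand-in for the real Pod status.
type podState struct {
	SandboxReady bool
	Running      int // number of running containers
}

// syncOnce compares the observed state with the desired container count and
// performs at most one corrective step. It returns the new state and whether
// anything changed.
func syncOnce(observed podState, desired int) (podState, bool) {
	switch {
	case !observed.SandboxReady:
		observed.SandboxReady = true
	case observed.Running < desired:
		observed.Running++
	case observed.Running > desired:
		observed.Running--
	default:
		return observed, false
	}
	return observed, true
}

func main() {
	state := podState{}
	// The event type does not choose the action; it only triggers a sync.
	for _, event := range []string{"ADD", "UPDATE", "SYNC", "SYNC"} {
		var changed bool
		state, changed = syncOnce(state, 2)
		fmt.Println(event, state, changed)
	}
}
```

Repeating the sync until it reports no change is what brings the Pod to its target state, regardless of which events arrived or in what order.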

1.3 Component collaboration process

The kubelet struct declaration runs to more than 300 lines of code, which gives a sense of its complexity. If we follow the container creation path and look only at the core flow, though, it comes down to three parts: kubelet, containerRuntime, and the CRI container runtime.

2. How kubelet creates a container

2.1 Get Pod for admission check

The kubelet's event sources fall into two main groups: static Pods and the apiserver. Only ordinary Pods are considered here; they are added directly to the PodManager for management and then go through the admission check.

The admission check involves two key handlers: eviction management and the predicate check. Eviction management looks at the node's current resource pressure and decides whether the Pod can tolerate it. The predicate check uses the currently active containers and the node's information to verify that the node meets the Pod's basic running requirements, for example affinity. If the Pod has a particularly high priority or is a static Pod, the kubelet also tries to preempt resources for it, choosing what to preempt by QoS class, so that its running environment can be satisfied.
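The handler chain described above might be sketched as follows. Everything here is an assumption made for illustration: the pod and node structs, the reason strings, and the functions evictionAdmit, predicateAdmit and canAdmit are hypothetical, and preemption is left out. The point is only that handlers run in order and the first rejection wins.

```go
package main

import "fmt"

// pod and node hold only the fields this sketch needs.
type pod struct {
	Name          string
	MemoryRequest int64
	Critical      bool
	NodeSelector  map[string]string
}

type node struct {
	FreeMemory     int64
	MemoryPressure bool
	Labels         map[string]string
}

type admitResult struct {
	Admit  bool
	Reason string
}

type admitHandler func(p pod, n node) admitResult

// evictionAdmit rejects non-critical Pods while the node is under pressure.
func evictionAdmit(p pod, n node) admitResult {
	if n.MemoryPressure && !p.Critical {
		return admitResult{false, "NodeUnderMemoryPressure"}
	}
	return admitResult{Admit: true}
}

// predicateAdmit checks that the node satisfies the Pod's basic requirements.
func predicateAdmit(p pod, n node) admitResult {
	for k, v := range p.NodeSelector {
		if n.Labels[k] != v {
			return admitResult{false, "NodeSelectorMismatch"}
		}
	}
	if p.MemoryRequest > n.FreeMemory {
		return admitResult{false, "InsufficientMemory"}
	}
	return admitResult{Admit: true}
}

// canAdmit runs the handlers in order and stops at the first rejection.
func canAdmit(p pod, n node, handlers ...admitHandler) admitResult {
	for _, h := range handlers {
		if r := h(p, n); !r.Admit {
			return r
		}
	}
	return admitResult{Admit: true}
}

func main() {
	n := node{FreeMemory: 1 << 30, Labels: map[string]string{"disktype": "ssd"}}
	p := pod{Name: "web", MemoryRequest: 2 << 30}
	fmt.Printf("%+v\n", canAdmit(p, n, evictionAdmit, predicateAdmit))
}
```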

2.2 Create the event channel and the container management goroutine

When the kubelet receives a newly created Pod, it first creates an event channel for it and starts a container management goroutine to consume the events from that channel. The goroutine then waits, based on the last synchronization time, for a newer status of the Pod to appear in the local podCache. For a new Pod this is satisfied mainly by the timestamp update that PLEG performs: the default empty status it broadcasts is used as the latest status.

2.3 Sync latest status

Once the latest status from the local podCache and the Pod information from the event source are available, they are combined with the container statuses held by the statusManager and probeManager to produce the latest perceived Pod status.

2.4 Admission Control Checks

The earlier admission check covers the hard resource limits for running the Pod. The check here is a soft one: it verifies parts of the software environment, such as the container runtime and its version. If this check fails, the corresponding container state is set to Blocked.

2.5 Update container state

After the admission check passes, the statusManager is called to synchronize the latest status of the Pod; at this point the status may be written back to the apiserver.

2.6 Cgroup configuration

After the update completes, a PodContainerManager is started. Its main job is to update the Cgroup configuration for the Pod according to its QoS class.
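To show how a QoS class can drive the Cgroup placement, here is a hedged Go sketch. The classification follows the documented Kubernetes rules (Guaranteed, Burstable, BestEffort), but the types are simplified, the sketch assumes API defaulting has already copied limits into missing requests, and the paths returned by the hypothetical cgroupParent helper are an assumed layout that depends on the cgroup driver in use.

```go
package main

import "fmt"

// resources uses 0 to mean "not set".
type resources struct{ CPUMilli, MemoryBytes int64 }

type container struct{ Requests, Limits resources }

// qosClass classifies a Pod from its containers' requests and limits.
func qosClass(containers []container) string {
	allGuaranteed, anySet := true, false
	for _, c := range containers {
		if c.Requests != (resources{}) || c.Limits != (resources{}) {
			anySet = true
		}
		// Guaranteed needs CPU and memory limits on every container,
		// with requests equal to limits.
		if c.Limits.CPUMilli == 0 || c.Limits.MemoryBytes == 0 || c.Requests != c.Limits {
			allGuaranteed = false
		}
	}
	switch {
	case !anySet:
		return "BestEffort"
	case allGuaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

// cgroupParent maps a QoS class to an assumed Pod-level cgroup path.
func cgroupParent(qos, podUID string) string {
	switch qos {
	case "Guaranteed":
		return "/kubepods/pod" + podUID
	case "Burstable":
		return "/kubepods/burstable/pod" + podUID
	default:
		return "/kubepods/besteffort/pod" + podUID
	}
}

func main() {
	pod := []container{{Requests: resources{250, 64 << 20}, Limits: resources{500, 128 << 20}}}
	qos := qosClass(pod)
	fmt.Println(qos, cgroupParent(qos, "1234"))
}
```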

2.7 Pod basic operating environment preparation

Next, the kubelet prepares the basic environment for the Pod: it creates the Pod data directory and the volume directory, waits for volume mounting to complete, and obtains the image pull secrets by fetching the secrets referenced in the Pod's image configuration. At this point the kubelet's own share of container creation is essentially done.

3. ContainerRuntime

As mentioned earlier, operations on Pods are ultimately driven by synchronizing events and state. containerRuntime does not distinguish whether an event is a creation or an update. It only compares the current Pod information with the target state and constructs the operations needed to reach that state.

3.1 Calculating Pod container changes

Calculating container changes mainly covers four questions: whether the Pod's sandbox has changed, whether the short-lived containers have completed, whether the init containers have completed, and whether the business containers have completed. The result is a set of lists: the containers that need to be killed and the containers that need to be started. Note that if the init containers have not completed, the business containers are not added to the list of containers to start, so startup effectively happens in two stages.
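The two-stage decision can be sketched in Go as below. This is a simplified model under stated assumptions: the state values, the ctr and podActions types, and this computePodActions function are hypothetical stand-ins written for this article, short-lived containers are omitted, and restart policy is ignored.

```go
package main

import "fmt"

type state int

const (
	notCreated state = iota
	running
	succeeded
	failed
	unknown
)

type ctr struct {
	Name  string
	State state
}

// podActions is the outcome of comparing current and target state.
type podActions struct {
	CreateSandbox     bool
	KillPod           bool
	NextInitContainer string
	ContainersToStart []string
	ContainersToKill  []string
}

func computePodActions(sandboxChanged bool, initCtrs, appCtrs []ctr) podActions {
	var a podActions
	if sandboxChanged {
		// A changed sandbox means everything is rebuilt from scratch.
		a.KillPod, a.CreateSandbox = true, true
		if len(initCtrs) > 0 {
			a.NextInitContainer = initCtrs[0].Name
			return a
		}
		for _, c := range appCtrs {
			a.ContainersToStart = append(a.ContainersToStart, c.Name)
		}
		return a
	}
	// Stage one: init containers run one at a time, in order.
	for _, c := range initCtrs {
		if c.State == succeeded {
			continue
		}
		if c.State != running {
			a.NextInitContainer = c.Name
		}
		return a // business containers wait until init has finished
	}
	// Stage two: business containers.
	for _, c := range appCtrs {
		switch c.State {
		case unknown:
			a.ContainersToKill = append(a.ContainersToKill, c.Name)
			a.ContainersToStart = append(a.ContainersToStart, c.Name)
		case notCreated, failed:
			a.ContainersToStart = append(a.ContainersToStart, c.Name)
		}
	}
	return a
}

func main() {
	inits := []ctr{{"init-db", succeeded}}
	apps := []ctr{{"app", notCreated}, {"sidecar", running}}
	fmt.Printf("%+v\n", computePodActions(false, inits, apps))
}
```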

3.2 Terminate containers when initialization fails

If the previous init container is found to have failed, all containers of the current Pod and the containers associated with the sandbox are checked. Any that are still running are killed, and the operation waits for the kills to complete.

3.3 Cleaning up containers in an unknown state

When some of the Pod's containers are already running but their status is still Unknown, they are handled together here: all of them are killed so that the next restart begins from a clean state. Only one of this branch and the one in 3.2 is taken, but the goal is the same: clean up containers that failed to run or whose status could not be obtained.

3.4 Create a container sandbox

Before the Pod's containers are started, a sandbox container is created for it. All containers of the Pod share the sandbox's namespaces and therefore the resources within them. Creating the sandbox is more involved and is covered in section 4.

3.5 Start Pod-related containers

Pod containers currently fall into three categories: short-lived containers, init containers, and business containers, and they are started in that order. If a container fails to be created, a backoff mechanism delays the next attempt. The rest of this section describes how containerRuntime starts a container.
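The backoff idea can be shown with a minimal sketch. It assumes a plain exponential policy with a cap, keyed per container; the backoff type and its next and reset methods are hypothetical names for this example, and the durations are arbitrary rather than the kubelet's actual settings.

```go
package main

import (
	"fmt"
	"time"
)

// backoff tracks a per-key retry delay that doubles on each failure.
type backoff struct {
	initial, max time.Duration
	delay        map[string]time.Duration
}

func newBackoff(initial, max time.Duration) *backoff {
	return &backoff{initial: initial, max: max, delay: make(map[string]time.Duration)}
}

// next records a failure for key and returns how long to wait before retrying.
func (b *backoff) next(key string) time.Duration {
	d, ok := b.delay[key]
	if !ok {
		d = b.initial
	} else {
		d *= 2
		if d > b.max {
			d = b.max
		}
	}
	b.delay[key] = d
	return d
}

// reset forgets the failure history once the container starts successfully.
func (b *backoff) reset(key string) {
	delete(b.delay, key)
}

func main() {
	b := newBackoff(10*time.Second, 5*time.Minute)
	for i := 0; i < 7; i++ {
		fmt.Println(b.next("pod-a/app"))
	}
	b.reset("pod-a/app")
	fmt.Println(b.next("pod-a/app"))
}
```

Keying the delay by container keeps one repeatedly failing container from slowing down retries for the others.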

3.5.1 Check whether the container image is pulled

Before pulling, the container's image reference is assembled. The pull secrets obtained earlier and the image information are then handed to the CRI runtime, which pulls the underlying container image. A backoff mechanism also applies here, so that frequent pull failures do not hurt kubelet performance.

3.5.2 Create a container configuration

Creating the container configuration means generating the configuration data the container needs to run. It mainly includes the Pod hostname and domain name, mounted volumes, configMaps, secrets, environment variables, mounted devices, directories to be mounted, port mappings, the command to execute as generated from the environment, the log directory, and similar information.

3.5.3 Call runtimeService to complete the creation of the container

runtimeService is called with the container's configuration; it in turn calls the CRI, which finally invokes the container creation interface to complete the creation of the container.

3.5.4 Call runtimeService to start the container

The container is started using the container ID returned by the earlier creation call, and the corresponding log directory is created for it.

3.5.5 Execute the callback hook of the container

If the container is configured with a PostStart hook, the hook is executed at this point. If the hook is of the Exec type, the Exec interface of the CRI is called to run it inside the container.
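Steps 3.5.1 through 3.5.5 can be summarized in one short Go sketch. The runtimeService interface below is a hypothetical, heavily simplified stand-in: the method names echo CRI operations, but the signatures are invented for this example and do not match the real gRPC API. fakeRuntime only records the calls so the order is visible.

```go
package main

import "fmt"

// containerConfig carries only the fields this sketch needs.
type containerConfig struct {
	Name      string
	Image     string
	PostStart []string // command for an Exec-type PostStart hook, if any
}

// runtimeService is a simplified, hypothetical view of the runtime calls.
type runtimeService interface {
	PullImage(image string) error
	CreateContainer(sandboxID string, cfg containerConfig) (string, error)
	StartContainer(containerID string) error
	ExecSync(containerID string, cmd []string) error
}

// startContainer runs the steps in order and stops at the first failure.
func startContainer(rt runtimeService, sandboxID string, cfg containerConfig) (string, error) {
	if err := rt.PullImage(cfg.Image); err != nil {
		return "", fmt.Errorf("pull image: %w", err)
	}
	id, err := rt.CreateContainer(sandboxID, cfg)
	if err != nil {
		return "", fmt.Errorf("create container: %w", err)
	}
	if err := rt.StartContainer(id); err != nil {
		return id, fmt.Errorf("start container: %w", err)
	}
	if len(cfg.PostStart) > 0 {
		if err := rt.ExecSync(id, cfg.PostStart); err != nil {
			return id, fmt.Errorf("post-start hook: %w", err)
		}
	}
	return id, nil
}

// fakeRuntime records calls instead of talking to a real runtime.
type fakeRuntime struct {
	calls     []string
	failStart bool
}

func (f *fakeRuntime) PullImage(image string) error {
	f.calls = append(f.calls, "PullImage")
	return nil
}

func (f *fakeRuntime) CreateContainer(sandboxID string, cfg containerConfig) (string, error) {
	f.calls = append(f.calls, "CreateContainer")
	return "ctr-1", nil
}

func (f *fakeRuntime) StartContainer(containerID string) error {
	f.calls = append(f.calls, "StartContainer")
	if f.failStart {
		return fmt.Errorf("container %s failed to start", containerID)
	}
	return nil
}

func (f *fakeRuntime) ExecSync(containerID string, cmd []string) error {
	f.calls = append(f.calls, "ExecSync")
	return nil
}

func main() {
	rt := &fakeRuntime{}
	cfg := containerConfig{Name: "app", Image: "example/app:1.0", PostStart: []string{"/bin/true"}}
	id, err := startContainer(rt, "sandbox-1", cfg)
	fmt.Println(id, err, rt.calls)
}
```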

4. Run the sandbox container

4.1 Pull the sandbox image

First, the sandbox image will be pulled

4.2 Create a sandbox container

4.2.1 Applying the SecurityContext

Before the container is created, its SecurityContext is set up from the configuration in the Pod's SecurityContext, mainly the privilege level, read-only directories, the user and group to run as, and similar information.

4.2.2 Other basic information

In addition to applying SecurityContext, it also maps information such as disconnection, OOMScoreAdj, and Cgroup drivers.

4.2.3 Create a container

Create a container based on the above configuration information

4.3 Create checkpoint

The checkpoint serializes the sandbox's current configuration and stores it as a snapshot.
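As an illustration of what such a snapshot could look like, the sketch below serializes a sandbox description to JSON and protects it with a checksum. The sandboxCheckpoint and portMapping types, their field names, and the choice of SHA-256 are all assumptions made for this example, not the kubelet's actual checkpoint format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

type portMapping struct {
	Protocol      string `json:"protocol"`
	ContainerPort int32  `json:"container_port"`
	HostPort      int32  `json:"host_port"`
}

// sandboxCheckpoint is a hypothetical snapshot of the sandbox configuration.
type sandboxCheckpoint struct {
	Version      string        `json:"version"`
	Name         string        `json:"name"`
	Namespace    string        `json:"namespace"`
	PortMappings []portMapping `json:"port_mappings,omitempty"`
	Checksum     string        `json:"checksum"`
}

// checksumOf hashes the checkpoint with its Checksum field cleared.
func checksumOf(cp sandboxCheckpoint) (string, error) {
	cp.Checksum = ""
	raw, err := json.Marshal(cp)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// marshalCheckpoint fills in the checksum and serializes the snapshot.
func marshalCheckpoint(cp sandboxCheckpoint) ([]byte, error) {
	sum, err := checksumOf(cp)
	if err != nil {
		return nil, err
	}
	cp.Checksum = sum
	return json.Marshal(cp)
}

// unmarshalCheckpoint parses a snapshot and rejects it if the checksum is wrong.
func unmarshalCheckpoint(raw []byte) (sandboxCheckpoint, error) {
	var cp sandboxCheckpoint
	if err := json.Unmarshal(raw, &cp); err != nil {
		return cp, err
	}
	want, err := checksumOf(cp)
	if err != nil {
		return cp, err
	}
	if cp.Checksum != want {
		return cp, errors.New("checkpoint checksum mismatch")
	}
	return cp, nil
}

func main() {
	raw, err := marshalCheckpoint(sandboxCheckpoint{
		Version:      "v1",
		Name:         "web",
		Namespace:    "default",
		PortMappings: []portMapping{{Protocol: "tcp", ContainerPort: 80, HostPort: 8080}},
	})
	if err != nil {
		panic(err)
	}
	cp, err := unmarshalCheckpoint(raw)
	fmt.Println(cp.Name, cp.Namespace, len(cp.PortMappings), err)
}
```

Storing a checksum alongside the data lets a later reader detect a truncated or corrupted snapshot instead of acting on it.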

4.4 Start the sandbox container

Starting the sandbox container calls StartContainer directly with the ID returned by the earlier creation call. The container's DNS configuration file is also rewritten at this point.

4.5 Container Network Settings

Container network configuration is done mainly by calling the CNI plug-in, which is not covered in detail here.
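The order of the sandbox steps in this section can be sketched as a simple step runner. The step type, runSandboxSteps, and the step names are hypothetical and exist only to show the sequence; each Run function is a placeholder where a real implementation would call the runtime or the CNI plug-in.

```go
package main

import (
	"errors"
	"fmt"
)

// errStepFailed is a sentinel used to simulate a failing step.
var errStepFailed = errors.New("step failed")

// step is one named action in the sandbox startup sequence.
type step struct {
	Name string
	Run  func() error
}

// runSandboxSteps executes the steps in order and stops at the first error,
// returning the names of the steps that completed.
func runSandboxSteps(steps []step) ([]string, error) {
	var done []string
	for _, s := range steps {
		if err := s.Run(); err != nil {
			return done, fmt.Errorf("%s: %w", s.Name, err)
		}
		done = append(done, s.Name)
	}
	return done, nil
}

func main() {
	ok := func() error { return nil }
	steps := []step{
		{"pull sandbox image", ok},
		{"create sandbox container", ok},
		{"write checkpoint", ok},
		{"start sandbox container", ok},
		{"rewrite DNS config", ok},
		{"set up network via CNI", func() error { return errStepFailed }},
	}
	done, err := runSandboxSteps(steps)
	fmt.Println(done)
	fmt.Println(err)
}
```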

5. Pod container startup summary

The kubelet is the core steward of container management. It is responsible for admission control, status management, probe management, volume management, QoS management, and the unified coordination of CSI integration. The Runtime layer takes the data assembled by the kubelet and reorganizes it according to the target configuration of the CRI runtime and the resource configuration managed by the kubelet. Based on the latest state of the Pod's containers, it decides which containers to start, stop, or create, builds the basic configuration environment for each container, and finally calls the CRI to create it. The CRI runtime then combines the data passed to it, applies the resource limits to the host and the corresponding namespaces, organizes the data in the form its own container service expects, and calls that service to complete the final creation of the container.

This article is a basic version; further details will be layered on top of it in later articles.

k8s source code reading e-book address: https://www.yuque.com/baxiaoshi/tyado3


Illustrating the bumpy journey at the beginning of Pod's life in kubernetes