Source code analysis: Use of CNI from the perspective of kubelet and container runtime

This is the third note in my Kubernetes networking study series.

In the previous article, we learned about the network configuration operations and their flow by reading the CNI specification. In several of those operations, besides CNI_COMMAND, three other parameters must be provided almost every time: CNI_CONTAINERID, CNI_IFNAME, and CNI_NETNS. All of these parameters come from the container runtime. This article combines the Kubernetes and Containerd source code to analyze how CNI is used.
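To make these parameters concrete, here is a minimal sketch of how a runtime might assemble the environment for one plugin call. It is written in Python for brevity, while the real runtimes are written in Go. The function name `build_cni_env` and the sample values are hypothetical; the variable names, including CNI_PATH for the plugin search path, come from the CNI specification.

```python
import os

def build_cni_env(command, container_id, netns, ifname, plugin_dirs):
    """Return an environment dict for a single CNI plugin invocation."""
    # Assumption of this sketch: start from the current process environment.
    env = dict(os.environ)
    env.update({
        "CNI_COMMAND": command,            # the operation, e.g. ADD or DEL
        "CNI_CONTAINERID": container_id,   # ID generated by the runtime
        "CNI_NETNS": netns,                # path of the network namespace
        "CNI_IFNAME": ifname,              # interface name inside the container
        "CNI_PATH": os.pathsep.join(plugin_dirs),  # plugin search path
    })
    return env

if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    env = build_cni_env("ADD", "sandbox-123", "/run/netns/cni-example",
                        "eth0", ["/opt/cni/bin"])
    for key in sorted(k for k in env if k.startswith("CNI_")):
        print(key, "=", env[key])
```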

The source code of Kubernetes comes from the branch release-1.24, and the source code of Containerd comes from the branch release/1.6.

Use of CNI

[Figure: runtime-with-cni]

Create pods

In the previous kubelet source code analysis, it was mentioned that Kubelet#syncLoop() continuously watches for changes from files, the apiserver, and HTTP, and updates the status of pods accordingly. When I wrote that article, the analysis ended there, because the work after that point is left to the container runtime, which creates and runs the sandbox and the various containers; see kubeGenericRuntimeManager#SyncPod().

kubelet wraps the requests to create and run the sandbox and containers, calls the Container Runtime Interface (CRI for short), and hands the actual work over to the container runtime. (I will find time to look into CRI separately.)

Reference source code

Sandbox container

Recall from the first part of the series: when we looked at the namespaces on the node, the process shown for the pod's network namespace was /pause.

lsns -t net
        NS TYPE NPROCS    PID USER     NETNSID NSFS                                                COMMAND
4026531992 net     126      1 root  unassigned                                                     /lib/systemd/systemd --system --deserialize 31
4026532247 net       1  83224 uuidd unassigned                                                     /usr/sbin/uuidd --socket-activation
4026532317 net       4 129820 65535          0 /run/netns/cni-607c5530-b6d8-ba57-420e-a467d7b10c56 /pause

When Kubernetes creates a pod, it first creates a sandbox container. The sandbox uses the pause image, which runs /pause at startup and then goes to sleep. A Kubernetes pod may contain multiple containers: the sandbox container holds the network namespace, and the other containers of the pod join it. Because the pause image is so simple, it is unlikely to fail and cause the network namespace to be deleted. The sandbox container also plays an important role in the PID namespace: when the pod's containers share a PID namespace, /pause is the process with PID 1 in the process tree, orphaned processes from the other containers are re-parented to it, and it can clean them up when they exit.
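The "sleep and clean up orphans" behaviour can be sketched in a few lines. The real pause program is a small C binary; the Python below is only an illustration of the idea, and the function names `reap_exited_children` and `run_pause_like_loop` are hypothetical.

```python
import os
import signal

def reap_exited_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped

def run_pause_like_loop():
    """Sleep forever; reap children on SIGCHLD, exit on SIGTERM or SIGINT."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_exited_children())
    signal.signal(signal.SIGTERM, lambda signum, frame: os._exit(0))
    signal.signal(signal.SIGINT, lambda signum, frame: os._exit(0))
    while True:
        signal.pause()  # block until any signal arrives
```

Calling `run_pause_like_loop()` never returns on its own, which is the point: the process does nothing except stay alive and collect exited children.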

Create a Sandbox container

CRI's RuntimeServiceServer defines the service interface that the runtime provides. Besides the operations for managing sandboxes and containers, it also includes streaming operations, namely the commonly used exec, attach, and portforward. For streaming, you can refer to the previous article "Source code analysis kubectl port-forward working principle".

Let’s look at the container related parts.

Containerd's criService implements the RuntimeServiceServer interface. A request to create a sandbox container enters the processing flow through CRI's UDS (Unix domain socket) interface /runtime.v1.RuntimeService/RunPodSandbox. criService#RunPodSandbox() is responsible for creating and running the sandbox container and ensuring that the container's status is normal.

  1. The container runtime first initializes the container object and generates the required parameter CNI_CONTAINERID.
  2. It then creates the pod's network namespace and generates the required parameter CNI_NETNS.
  3. Next it calls the CNI interface to configure the pod's network namespace: creating network interfaces, assigning IP addresses, creating veth pairs, setting routes, and so on. These operations are carried out by the specific network plug-in, and different plug-ins implement them differently. Once you understand the specification, the network configuration flow is not hard to follow. Of the sub-steps below, 2 and 3 may be performed multiple times:
    1. Read the network configuration
    2. Find the binary
    3. Execute the binary
    4. Return the result to the container runtime
  4. The last step is to create the sandbox container. This step depends on the operating system: the method for the corresponding operating system is called to create the container.
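The four sub-steps of step 3 can be sketched as follows. This is a simplified Python illustration, not the code Containerd actually runs (that lives in Go libraries). It assumes a single-plug-in configuration file with a top-level "type" field and ignores plug-in chaining; `find_plugin` and `invoke_plugin` are hypothetical names.

```python
import json
import os
import subprocess

def find_plugin(plugin_type, search_dirs):
    """Return the first executable named plugin_type found in search_dirs."""
    for directory in search_dirs:
        candidate = os.path.join(directory, plugin_type)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    raise FileNotFoundError(f"CNI plugin {plugin_type!r} not found in {search_dirs}")

def invoke_plugin(conf_path, search_dirs, command, container_id, netns, ifname):
    # 1. Read the network configuration (JSON with a "type" field).
    with open(conf_path, "rb") as f:
        raw_conf = f.read()
    plugin_type = json.loads(raw_conf)["type"]

    # 2. Find the binary whose file name matches the "type" field.
    binary = find_plugin(plugin_type, search_dirs)

    # 3. Execute it: parameters in the environment, configuration on stdin.
    env = dict(os.environ,
               CNI_COMMAND=command,
               CNI_CONTAINERID=container_id,
               CNI_NETNS=netns,
               CNI_IFNAME=ifname,
               CNI_PATH=os.pathsep.join(search_dirs))
    proc = subprocess.run([binary], input=raw_conf, env=env,
                          capture_output=True, check=True)

    # 4. Return the plug-in's JSON result to the caller (the runtime).
    return json.loads(proc.stdout) if proc.stdout.strip() else {}
```

A failing plug-in exits with a non-zero status, which `check=True` turns into a `subprocess.CalledProcessError`; a real runtime would instead parse the error JSON the plug-in prints.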

To learn about containers from scratch, I recommend reading "Learning Containers From The Bottom Up" by Ivan Velichko.

Reference source code:

Create other containers

The next step is to create the other containers in the pod: ephemeral containers, init containers, and regular containers. When these containers are created, they are added to the sandbox container's network namespace. I will not expand on that here; for the detailed logic, please refer to containerd's containerStore#Create().
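As a rough picture of the order and of the shared namespace, the sketch below builds a creation plan in which every container reuses the sandbox's network namespace path. It is a toy Python model, not kubelet or Containerd code: `PodSpec` and `creation_plan` are hypothetical names, and it does not model details such as init containers running one after another to completion.

```python
from dataclasses import dataclass, field

@dataclass
class PodSpec:
    name: str
    ephemeral: list = field(default_factory=list)
    init: list = field(default_factory=list)
    regular: list = field(default_factory=list)

def creation_plan(pod, sandbox_netns):
    """Return (container, netns) pairs in the order described in the text."""
    plan = [("sandbox", sandbox_netns)]  # the sandbox comes first
    for group in (pod.ephemeral, pod.init, pod.regular):
        for container in group:
            # Later containers join the existing namespace; none creates its own.
            plan.append((container, sandbox_netns))
    return plan

if __name__ == "__main__":
    pod = PodSpec(name="demo", init=["setup"], regular=["app", "sidecar"])
    for container, netns in creation_plan(pod, "/run/netns/cni-example"):
        print(container, "->", netns)
```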

Reference source code

Summary

Following the introduction to the CNI specification in the previous article, this time we covered how CNI is used, how it interacts with the container runtime, and the pod creation process.

Different CNI plug-ins implement different network functions. In the next article, we will use Flannel as an example to learn about the implementation of CNI and the Kubernetes VXLAN network.

Why Flannel? Because k3s, one of the development environments I use most often, uses the Flannel network by default. My other development environment is k8e, which uses Cilium by default. Cilium's CNI will also be covered in this series of articles.

All articles are published on the public account 云原生指北.


Origin my.oschina.net/u/5110404/blog/5608196