kubernetes CRI Past and Present

  In the process of learning kubernetes, we will encounter CRI, CNI, CSI, OCI and other terms, this article attempts to default to a container runtime architecture by analyzing k8s present, to help us better understand the logic behind k8s runtime design. Further extraction CRI, OCI raised background.

A, k8s architecture

  We first need to build the master node when building k8s cluster, and secondly you need to create node node node node is added to the k8s cluster. When we build a good k8s cluster, we can by way of kubectl create -f nginx.yml command to create the application corresponding pod. When we execute the command

After the command will be submitted to the API  Server, it parses yml file and save it to etcd in the form of API objects. Then master components in the Controller Manager will arrange to do the work by way of a control loop to create Pod required for the application. Scheduler will watch the changes in the new Pod in etcd. Such as

If he found a new Pod out now, Scheduler will run the scheduling algorithm, the final choice of the best Node node scheduling algorithm, and writes the name of the Node node NodeName field pod objects above, this step is called Bind Pod to Node (marked in the figure below), and then bind the result back etcd.

       Secondly, when we were building k8s cluster, each node will be the default initialization process to create a kubelet will watch pod changes etcd in kubelet process, when kubelet process to update bind watch the pod, and bind the this node is a node, it will take over the next

Doing things, such as mirroring download, create container and so on.

Two, k8s default runtime schema container

  Next, through the integrated container k8s default runtime architecture kublete see how to create a container (shown below).

1. kubelet 通过 CRI(Container Runtime Interface) 接口(gRPC) 调用 dockershim, 请求创建一个容器, 这一步中, Kubelet 可以视作一个简单的 CRI Client, 而 dockershim 就是接收请求的 Server.

2. dockershim 收到请求后, 通过适配的方式,适配成 Docker Daemon 的请求格式, 发到 Docker Daemon 上请求创建一个容器。在docker 1.12后版本中,docker daemon被拆分成dockerd和containerd,containerd负责操作容器。

3. dockerd收到请求后, 调用containerd进程去创建一个容器。

4. containerd 收到请求后, 并不会自己直接去操作容器, 而是创建一个叫做 containerd-shim 的进程, 让 containerd-shim 去操作容器. 创建containered-shim的目的主要有:

1)让containerd-shim做诸如收集状态, 维持 stdin 等 fd 打开等工作. 

2)允许容器运行时(runC)启动容器后退出,不必为每个容器一直运行一个容器运行时runC。

3)即使在 containerd 和 dockerd 都挂掉的情况下,容器的标准 IO 和其它的文件描述符也都是可用的。

4)向 containerd 报告容器的退出状态

5)在不中断容器运行的情况下升级或重启 dockerd

5. 而containerd-shim 在这一步需要调用 runC 这个命令行工具, 来启动容器,runC是OCI(Open Container Initiative, 开放容器标准) 的一个参考实现。主要用来设置 namespaces 和 cgroups, 挂载 root filesystem等操作。

6.runC启动完容器后本身会直接退出, containerd-shim 则会成为容器进程的父进程, 负责收集容器进程的状态, 上报给 containerd, 并在容器中 pid 为 1 的进程退出后接管容器中的子进程进行清理, 确保不会出现僵尸进程。

 

 三、容器与容器编排背景简述 

       从k8s的容器运行时可以看出,kubelet启动容器的过程经过了很长的一段调用链路。这个是由于在容器及编排领域各大厂商与docker之间的竞争以及docker公司为了抢占paas领域市场,对架构做出的一系列调整。其实 k8s 最开始

的运行时架构远没这么复杂: kubelet 想要创建容器直接通过 docker api 调用 Docker Daemon,Docker Daemon 调 libcontainer 这个库把容器跑起来, 整个过程就搞完了。为了防止Docker垄断以及受控

docker运行时, 各大厂商于是就联合起来搞了开放容器标准OCI(Open Containers Initiative).大家可以基于这个标准开发自己的容器运行时。Docker公司则把 libcontainer 封装了一下, 变成 runC 捐献出来作为 OCI 的参考实现.

        接下来就是 Docker 要搞 Swarm 进军 PaaS 市场, 于是做了个架构切分, 把容器操作都移动到一个单独的 Daemon 进程 containerd 中去, 让 Docker Daemon 专门负责上层的封装编排. 可惜 Swarm 在 k8s 面前实在是不够打, 惨败之后

Docker 公司就把 containerd 捐给 CNCF ,专注于搞 Docker 企业版了. 

      与此同时,容器领域,core os公司搞出了个rkt容器运行时。为了能从 docker 那边分一杯羹, 希望 k8s 原生支持 rkt 作为运行时, 由于core os与google的关系,最终rkt运行时的支持在2016年也被合并进kubelet主干代码里. 这样做后不但

没有给rkt项目带来更多用户,反而给k8s中负责维护 kubelet 的小组 SIG-Node带来了更大的负担,每一次kubelet的更新都要维护docker和rkt两部分代码。与此同时,随着虚拟化技术强隔离容器技术runV(Kata Containers前身,后与intel

clear container 合并)的逐渐成熟。k8s上游对虚拟化容器的支持很快被提上了日程。为了从集成每一种运行时都要维护一份代码中解放出来,k8s SIG-Node工作组决定对容器的操作统一地抽象成一个接口,这样kubelet只需要跟这个接口

打交道,而具体地容器运行时,他们只需要实现该接口,并对kubelet暴露gRPC服务即可。这个统一地抽象地接口就是k8s中俗称的 CRI。

 

四、CRI(容器运行时接口)

       CRI 基于 gRPC 定义了 RuntimeService 和 ImageService 等两个 gRPC 服务,分别用于容器运行时和镜像的管理。如下所示:

// Runtime service defines the public APIs for remote container runtimes
service RuntimeService {
    // Version returns the runtime name, runtime version, and runtime API version.
    rpc Version(VersionRequest) returns (VersionResponse) {}

    // RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
    // the sandbox is in the ready state on success.
    rpc RunPodSandbox(RunPodSandboxRequest) returns (RunPodSandboxResponse) {}
    // StopPodSandbox stops any running process that is part of the sandbox and
    // reclaims network resources (e.g., IP addresses) allocated to the sandbox.
    // If there are any running containers in the sandbox, they must be forcibly
    // terminated.
    // This call is idempotent, and must not return an error if all relevant
    // resources have already been reclaimed. kubelet will call StopPodSandbox
    // at least once before calling RemovePodSandbox. It will also attempt to
    // reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
    // multiple StopPodSandbox calls are expected.
    rpc StopPodSandbox(StopPodSandboxRequest) returns (StopPodSandboxResponse) {}
    // RemovePodSandbox removes the sandbox. If there are any running containers
    // in the sandbox, they must be forcibly terminated and removed.
    // This call is idempotent, and must not return an error if the sandbox has
    // already been removed.
    rpc RemovePodSandbox(RemovePodSandboxRequest) returns (RemovePodSandboxResponse) {}
    // PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
    // present, returns an error.
    rpc PodSandboxStatus(PodSandboxStatusRequest) returns (PodSandboxStatusResponse) {}
    // ListPodSandbox returns a list of PodSandboxes.
    rpc ListPodSandbox(ListPodSandboxRequest) returns (ListPodSandboxResponse) {}

    // CreateContainer creates a new container in specified PodSandbox
    rpc CreateContainer(CreateContainerRequest) returns (CreateContainerResponse) {}
    // StartContainer starts the container.
    rpc StartContainer(StartContainerRequest) returns (StartContainerResponse) {}
    // StopContainer stops a running container with a grace period (i.e., timeout).
    // This call is idempotent, and must not return an error if the container has
    // already been stopped.
    // TODO: what must the runtime do after the grace period is reached?
    rpc StopContainer(StopContainerRequest) returns (StopContainerResponse) {}
    // RemoveContainer removes the container. If the container is running, the
    // container must be forcibly removed.
    // This call is idempotent, and must not return an error if the container has
    // already been removed.
    rpc RemoveContainer(RemoveContainerRequest) returns (RemoveContainerResponse) {}
    // ListContainers lists all containers by filters.
    rpc ListContainers(ListContainersRequest) returns (ListContainersResponse) {}
    // ContainerStatus returns status of the container. If the container is not
    // present, returns an error.
    rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse) {}
    // UpdateContainerResources updates ContainerConfig of the container.
    rpc UpdateContainerResources(UpdateContainerResourcesRequest) returns (UpdateContainerResourcesResponse) {}
    // ReopenContainerLog asks runtime to reopen the stdout/stderr log file
    // for the container. This is often called after the log file has been
    // rotated. If the container is not running, container runtime can choose
    // to either create a new log file and return nil, or return an error.
    // Once it returns error, new container log file MUST NOT be created.
    rpc ReopenContainerLog(ReopenContainerLogRequest) returns (ReopenContainerLogResponse) {}

    // ExecSync runs a command in a container synchronously.
    rpc ExecSync(ExecSyncRequest) returns (ExecSyncResponse) {}
    // Exec prepares a streaming endpoint to execute a command in the container.
    rpc Exec(ExecRequest) returns (ExecResponse) {}
    // Attach prepares a streaming endpoint to attach to a running container.
    rpc Attach(AttachRequest) returns (AttachResponse) {}
    // PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
    rpc PortForward(PortForwardRequest) returns (PortForwardResponse) {}

    // ContainerStats returns stats of the container. If the container does not
    // exist, the call returns an error.
    rpc ContainerStats(ContainerStatsRequest) returns (ContainerStatsResponse) {}
    // ListContainerStats returns stats of all running containers.
    rpc ListContainerStats(ListContainerStatsRequest) returns (ListContainerStatsResponse) {}

    // UpdateRuntimeConfig updates the runtime configuration based on the given request.
    rpc UpdateRuntimeConfig(UpdateRuntimeConfigRequest) returns (UpdateRuntimeConfigResponse) {}

    // Status returns the status of the runtime.
    rpc Status(StatusRequest) returns (StatusResponse) {}
}

// ImageService defines the public APIs for managing images.
service ImageService {
    // ListImages lists existing images.
    rpc ListImages(ListImagesRequest) returns (ListImagesResponse) {}
    // ImageStatus returns the status of the image. If the image is not
    // present, returns a response with ImageStatusResponse.Image set to
    // nil.
    rpc ImageStatus(ImageStatusRequest) returns (ImageStatusResponse) {}
    // PullImage pulls an image with authentication config.
    rpc PullImage(PullImageRequest) returns (PullImageResponse) {}
    // RemoveImage removes the image.
    // This call is idempotent, and must not return an error if the image has
    // already been removed.
    rpc RemoveImage(RemoveImageRequest) returns (RemoveImageResponse) {}
    // ImageFSInfo returns information of the filesystem that is used to store images.
    rpc ImageFsInfo(ImageFsInfoRequest) returns (ImageFsInfoResponse) {}
}

 

 

具体容器运行时则需要实现 CRI 定义的接口(即 gRPC server,通常称为 CRI shim)。容器运行时在启动 gRPC server 时需要监听在本地的 Unix Socket (Windows 使用 tcp 格式)。

 

五、 容器运行时实现

  除了上面介绍的默认的容器运行时的实现,目前容器运行时主要有:

 

参考文档:

https://github.com/kubernetes/cri-api/blob/master/pkg/apis/runtime/v1alpha2/api.proto

https://feisky.gitbooks.io/kubernetes/plugins/CRI.html

https://aleiwu.com/post/cncf-runtime-landscape/

https://www.infoq.cn/article/r*ikOvovTHhADAWw1Hb1

 

Guess you like

Origin www.cnblogs.com/justinli/p/11578951.html