[Cloud Resident Co-Creation] Hands-On Development with Kuasar, a Multi-Sandbox Container Runtime

[Abstract] As cloud-native technology flourishes, the containers that carry applications at the bottom layer matter a great deal. A single container isolation technology can no longer meet the requirements of every scenario: different scenarios call for different container forms, which has given rise to different sandbox technologies. Starting from the development history of sandbox containers, this article introduces the advantages of Kuasar, Huawei Cloud's multi-sandbox container runtime project, and demonstrates how to install and run Kuasar so that developers can get hands-on experience with a multi-sandbox container runtime.

Foreword

At KubeCon + CloudNativeCon Europe 2023, the top annual cloud-native open source summit, Kuasar was officially announced as open source. Kuasar is a cloud-native multi-sandbox container runtime jointly initiated by Huawei Cloud, the Agricultural Bank of China, the openEuler community, and the CNCF project WasmEdge. The announcement drew wide attention and lively discussion from the community and cloud vendors.

The new cloud-native open source project Kuasar draws on Huawei Cloud's years of production practice and its thinking on how sandbox technology should evolve, and is implemented on top of the industry's emerging sandbox interface. While retaining traditional container runtime functions, Kuasar is written entirely in Rust and uses an optimized management model and framework to further reduce management overhead, shorten the call chain, and flexibly extend support for the industry's mainstream sandbox technologies, so that it can cover the full range of cloud-native business scenarios. In addition, by supporting the deployment of multiple security sandboxes on the same node, Kuasar can make full use of node resources, reduce costs, improve efficiency, and give users a more secure and efficient sandbox solution.


Sandbox container development

The container era

image-20230703160021362.png

Docker was born as early as 2013, marking the arrival of the container era. The original container technology uses the namespace and control group (cgroup) features provided by the Linux kernel to isolate and limit resources between container processes. In the container era, the container is the only first-class citizen in Docker.
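
To make these kernel primitives concrete, the following sketch reads a process's namespace and cgroup membership from /proc. The helper names list_namespaces and show_cgroups, and the optional proc-root argument, are illustrative assumptions rather than part of Docker or any container runtime.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names): inspect the kernel
# primitives that container isolation is built on.

# Print the namespace types a process belongs to, one per line.
# $1 = PID (default: self), $2 = proc root (default: /proc).
list_namespaces() (
    pid="${1:-self}"
    proc_root="${2:-/proc}"
    ls "$proc_root/$pid/ns"
)

# Print the cgroup membership of a process.
# $1 = PID (default: self), $2 = proc root (default: /proc).
show_cgroups() (
    pid="${1:-self}"
    proc_root="${2:-/proc}"
    cat "$proc_root/$pid/cgroup"
)
```

On a Linux host, calling list_namespaces with no arguments typically prints entries such as mnt, net, pid, and uts. Two processes share a namespace when the corresponding links under /proc/<pid>/ns resolve to the same target.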

image-20230703160041595.png

Soon afterwards, in 2014, a battle for dominance broke out in the container orchestration field. As Kubernetes eventually became the mainstream container orchestration tool, the Pod became a first-class citizen of container orchestration as well. To be compatible with the Pod concept, Docker introduced the pause container.

However, the pause container often confuses developers, because a Pod and a pause container differ in many ways. In Kubernetes, a Pod is the logical and physical resource carrier for a group of containers, while the pause container only provides a shared namespace between containers. In addition, container runtimes contain a lot of redundant and complex logic for telling pause containers apart from user containers, which makes the code hard to read and develop.

In 2019, containerd graduated within the CNCF and has since become the preferred container runtime for Kubernetes. containerd likewise needs a pause container to run a Pod.

image-20230703175933767.png

Because a runC container shares the kernel with the host system, a container escape brings huge security risks, especially in multi-tenant scenarios.


The rise of sandboxes

Solutions to the problems above emerged one after another. In 2018, as the cloud-native field developed rapidly, many sandbox isolation technologies were applied to containers, and sandbox containers took off. A sandbox container confines container processes to a closed sandbox environment, preventing them from damaging the system or other containers, and therefore offers very strong security. The sandbox naturally fits the definition of a Pod: it provides an isolated environment for a group of containers. Containers that run in a sandbox environment are sandbox containers.

Depending on the isolation boundary, sandboxes can be divided into lightweight virtual machine sandboxes (MicroVM Sandbox), user-mode kernel sandboxes (Application Kernel Sandbox), and WebAssembly sandboxes (Wasm Sandbox). Each was incubated to meet different business needs, so each has its own strengths in different dimensions.

  • Lightweight virtual machine sandbox (MicroVM Sandbox): Emulates a complete virtual machine on the host and runs the containers inside it, which provides very strong security isolation.

  • User-mode kernel sandbox (Application Kernel Sandbox): Uses a kernel program running in user mode to intercept and implement the containers' system calls, which ensures security isolation between containers.

  • WebAssembly sandbox (Wasm Sandbox): Runs the containers in a WebAssembly runtime and relies on WebAssembly to provide process-level isolation.

image-20230703163015164.png

Each type of sandbox has its own strengths across the dimensions of speed, flexibility, security isolation, and standardization and generality. Cloud vendors have already deployed sandbox container products in production, but each sandbox implements its own management-plane program against the containerd Shim v2 interface, and these implementations are not compatible with each other.

image-20230703163609606.png

In March 2023, containerd released the Sandbox API feature in v1.7.0, which provides a set of APIs for managing sandboxes. It decouples the concepts of container and sandbox ("containers belong to containers, and sandboxes belong to sandboxes"): creating a Pod now means creating a sandbox, and the pause container is no longer needed.

image-20230703164309361.png

Sandbox containers have become a security solution in cloud-native scenarios. We hope to use the power of Sandbox API to implement a container runtime that supports multiple sandbox technologies.


Kuasar Project Introduction

Project Description

The emergence of the Sandbox API makes the sandbox a new first-class citizen in the container world. We need a container runtime that supports multiple mainstream sandbox technologies and has an extensible, maintainable, and evolvable mechanism. Thus, Kuasar was born.

Huawei Cloud officially open-sourced Kuasar at KubeCon + CloudNativeCon Europe 2023, the cloud-native summit held in Amsterdam, the Netherlands, in April 2023. The newly open-sourced multi-sandbox container runtime can make full use of node resources, reduce costs, improve efficiency, and give users a more secure and efficient sandbox solution.

Kuasar is a container runtime written in Rust that supports multiple mainstream sandbox isolation technologies at the same time. It has the following characteristics:

  • Sandbox-friendly: Developed against the Sandbox API rather than the current Shim v2 interface, which gives it a natural advantage in sandbox definition and life cycle management.
  • Multi-sandbox mixed deployment: Integrates multiple mainstream sandbox technologies, so different types of sandbox containers can run on a single node.
  • Simplified model: Adopts a 1:N container process management model. Compared with the current 1:1 Shim process approach, it brings a 100% improvement in startup speed and a 99% reduction in memory overhead.

GitHub address: https://github.com/kuasar-io/kuasar

image-20230703170203846.png


Project official website: https://kuasar.io

image-20230703170723726.png


Kuasar positioning

Kuasar is a multi-sandbox container runtime, so what is a container runtime? Simply put, a container runtime is the component responsible for launching containers and managing their running state. Container runtimes fall into two types, high-level and low-level:

  • High-level container runtime: Implements CRI and manages containers and images from a higher-level perspective. containerd, CRI-O, Docker, and iSulad are typical high-level container runtimes.
  • Low-level container runtime: Implements OCI and actually operates the containers. Kata Containers and runC are low-level container runtimes.

image-20230703175846432.png

Kuasar is a low-level container runtime that interacts with the high-level container runtime containerd. Kuasar consists mainly of two modules:

  • Kuasar-Sandboxer: Implements the Sandbox API and is responsible for managing the sandbox lifecycle and resource allocation. Sandboxer interacts with containerd as a plugin.
  • Kuasar-Task: implements the Task API and is responsible for managing the life cycle and resource allocation of containers.

At the northbound interface level, Kuasar is working with containerd to build the latest sandbox interface standard, and the sandboxer plugin has been added to the containerd v2.0 roadmap; in addition, integration with iSulad, the lightweight container engine of the openEuler community, has been completed. At the southbound sandbox level, Kuasar already supports several mainstream security sandboxes, including Cloud Hypervisor (MicroVM), WasmEdge (Wasm), StratoVirt (MicroVM), and Quark (App Kernel). The roadmap plans support for more sandboxes so that Kuasar can adapt to more cloud-native scenarios in the future.


MicroVM Sandboxer

In the lightweight virtual machine scenario, the virtual machine process provides a complete virtualization layer and a Linux kernel. Such virtual machine monitors include Cloud Hypervisor, StratoVirt, Firecracker, and QEMU. In the MicroVM Sandboxer, vmm-sandboxer is responsible for creating the virtual machine and calling its APIs, while vmm-task, running as the init process inside the virtual machine, is responsible for launching container processes. Container I/O streams can be exported through the virtual machine's vsock or UDS (Unix domain socket).

image-20230703180046968.png

Currently only Cloud Hypervisor, QEMU and StratoVirt are supported.


App Kernel Sandboxer

App Kernel Sandbox deeply integrates the KVM virtualization layer and the Guest kernel into a user-mode kernel process, and implements container isolation by intercepting container system calls. Typical representatives include gVisor and Quark.

Quark is an App Kernel sandbox that uses its own hypervisor, QVisor, and a custom kernel, QKernel. QVisor is responsible only for the life cycle management of the KVM virtual machine and does not emulate any devices. QKernel intercepts all syscalls and, when necessary, notifies QVisor to handle them through VM_Exit or eventfd. Memory is shared between QVisor and QKernel by mapping the memory space of the host process into the physical memory space of the VM.

In the App Kernel Sandboxer, quark-sandboxer launches QVisor and QKernel. Whenever containerd needs to start a container in the sandbox, quark-task in QVisor calls QKernel to start a new container. All containers in the same Pod run in the same process.

image-20230703180754844.png

Currently only Quark is supported.


Wasm Sandboxer

If the App Kernel sandbox implements isolation at the virtualization and kernel levels, the WebAssembly sandbox defines an entirely new architecture, including its own instruction set and virtual machine. Every program must be compiled to the WebAssembly instruction set before it can run in the WebAssembly virtual machine, which places significant requirements on applications. Common Wasm sandboxes include WasmEdge and Wasmtime.

wasm-sandboxer and wasm-task launch containers inside a WebAssembly sandbox. When containerd needs to start a container in the sandbox, wasm-task will fork a new process, start a new WasmEdge runtime, and run Wasm code in it. All containers in the same Pod will share the same Namespace and Cgroup resources with the wasm-task process.

image-20230703181104620.png

Currently only WasmEdge is supported. Because of certain technical limitations (mainly that standard input and output cannot be redirected), wasm-task uses fork to start a new runtime. A later version may start the runtime directly inside the process to achieve faster startup and lower memory usage.


Changes to Kuasar's management model

In the current Shim v2 model of container runtimes, every time containerd creates a Pod it must create a corresponding Shim process to manage that Pod, and the Shim process then creates the virtual machine and containers. In this scenario, the ratio of management-plane Shim processes to Pods is 1:1.

In Kuasar, only one Kuasar-Sandboxer process needs to run. containerd manages Pods by calling the interface exposed by the Sandboxer, so there is no longer any need to launch a management process for each Pod. The ratio of management-plane Sandboxer processes to Pods is therefore 1:N. This model greatly reduces the number of resident processes and makes the whole architecture clearer and more concise.
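
One rough way to observe the two models on a node is to count management-plane processes by name. The sketch below assumes that shim processes have names starting with containerd-shim and that Kuasar sandboxers have names ending in -sandboxer (as in vmm-sandboxer, quark-sandboxer, and wasm-sandboxer); the helper name count_mgmt_procs is hypothetical.

```shell
#!/bin/sh
# Count management-plane processes from a list of process names on stdin
# (one name per line, for example the output of `ps -eo comm=`).
# Prints "<shim count> <sandboxer count>".
count_mgmt_procs() {
    awk '
        /^containerd-shim/ { shims++ }
        /-sandboxer$/      { sandboxers++ }
        END { printf "%d %d\n", shims, sandboxers }
    '
}
```

For example, `ps -eo comm= | count_mgmt_procs`. Under Shim v2 the first number would be expected to grow with the number of Pods, while under Kuasar the second number stays at one per sandboxer type regardless of how many Pods are running.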

image-20230703181807237.png

Kuasar changes the current Shim v2 management model, bringing the following benefits:

  1. Clear sandbox management logic: Sandbox management logic and container management logic are completely separated, which makes development easier and the semantics clearer.
  2. Simplified container call chain: The conversion from Task API to Shim v2 API is removed; calls are made directly, which shortens the chain.
  3. Efficient Sandboxer process: The resident Sandboxer process avoids the time spent cold-starting a Shim process, the 1:N management model greatly reduces the number of processes, and the Rust program is memory-safe, with less overhead than Golang.
  4. No more pause container: Creating a Pod no longer creates a pause container, so there is no need to prepare a pause container image snapshot.

Performance

So how does Kuasar perform? Two metrics of greatest interest were chosen to measure it: end-to-end container startup time and memory consumption of management-plane components. They are defined as follows:

  • End-to-end container startup time: The CRI implementation in containerd v1.7.0 no longer disguises the sandbox as a container; it uses the new Sandbox API to create the sandbox and start the container. The test therefore uses CRI as the entry point and measures the actual end-to-end time required to launch a container process.
  • Memory consumption of management-plane components: Measures the memory consumed by the management components (excluding virtual machines), that is, it compares the PSS (Proportional Set Size) of the Sandboxer process with that of all Shim v2 processes. PSS is the physical memory a process actually occupies while running, with the memory used by shared libraries apportioned among the processes that share them.
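
As a sketch of how the PSS comparison could be reproduced, the helper below sums the Pss field of one or more /proc/<pid>/smaps_rollup files. It assumes a Linux kernel that provides smaps_rollup; the function name total_pss_kb is illustrative, and this is not necessarily the tooling used for the measurements in this article.

```shell
#!/bin/sh
# Sum the Pss of the given smaps_rollup files and print the total in kB.
# Each argument is a path such as /proc/<pid>/smaps_rollup.
total_pss_kb() {
    # Without arguments awk would wait on stdin, so report 0 instead.
    if [ "$#" -eq 0 ]; then
        echo 0
        return 0
    fi
    # Match the "Pss:" line only, not Pss_Anon, Pss_File and similar.
    awk '$1 == "Pss:" { sum += $2 } END { printf "%d\n", sum }' "$@"
}
```

For a single sandboxer this could be called as `total_pss_kb /proc/$(pidof vmm-sandboxer)/smaps_rollup`. For Shim v2, pass one smaps_rollup path per shim process to obtain the combined figure.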

image-20230703182912215.png

The following variables are held constant:

  • VMM, Guest OS (except init process), and Guest Kernel are consistent.
  • The container image is the same and uses the local image snapshot directly.
  • All container storage drivers use Overlayfs.
  • The container network is HostNetwork mode.

Startup time test

image-20230703183433601.png

The startup time test is divided into two groups: one measures the startup time of a single Pod, and the other measures the time to start 50 Pods in parallel.

Kuasar's 100% improvement in startup speed comes mainly from two sources. First, with the Sandbox API, creating a Pod no longer creates a separate pause container, which saves the time spent preparing the pause container image snapshot. Second, under the 1:N management model the Sandboxer process is resident, which saves the time spent cold-starting a Shim process and greatly speeds up container startup.
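
A simple way to approximate this kind of startup measurement is to time a command from launch to completion. The helper below is a minimal sketch that assumes GNU date (for the %N nanosecond field); the name elapsed_ms is hypothetical, and the result includes any overhead of the command being wrapped, so it is not the exact methodology behind the figures above.

```shell
#!/bin/sh
# Run a command and print how long it took, in milliseconds.
# Relies on GNU date for nanosecond resolution (%N).
elapsed_ms() (
    start_ns="$(date +%s%N)"
    # Discard the command's own output so only the timing is printed.
    "$@" >/dev/null 2>&1
    end_ns="$(date +%s%N)"
    echo $(( (end_ns - start_ns) / 1000000 ))
)
```

For example, `elapsed_ms bash examples/run_example_container.sh vmm` times the demo script from the Kuasar repository.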


Memory consumption test

image-20230703183528578.png

The memory consumption test is run in three rounds. In each round, 1, 5, 10, 20, 30, and 50 Pods are started, and the PSS values of the Sandboxer process and of all Shim processes are queried.

Kuasar saves nearly 99% of memory, for two reasons. The main one is that the 1:N management model reduces N processes to one, so the memory saved grows in proportion to the number of Pods. The second is that Kuasar is written in Rust, which brings some memory savings of its own compared with the Golang used by the Kata Shim process.


Kuasar installation and deployment

Prerequisites

Before installing and configuring, you need to prepare the following:

image-20230704141422102.png

For other build requirements, please refer to the official website and the GitHub repository:
Official address: https://kuasar.io/docs/developer/build/
GitHub address: https://github.com/kuasar-io/kuasar

image-20230707095618287.png


Installation and deployment

Download and compile from source

For installation and deployment, please refer to the GitHub repository: https://github.com/kuasar-io/kuasar

image-20230707095708348.png

Commands:

git clone https://github.com/kuasar-io/kuasar.git
cd kuasar
make all
make install

Note: The Guest OS image needs to be compiled in a container, so containerd or other container runtimes need to be started.


Configure containerd

The containerd configuration file /etc/containerd/config.toml needs to add three runtimes, namely vmm, quark, wasm:

[proxy_plugins.vmm]
  type = "sandbox"
  address = "/run/vmm-sandboxer.sock"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.vmm]
  runtime_type = "io.containerd.kuasar.v1"
  sandboxer = "vmm"
  io_type = "hvsock"
[proxy_plugins.quark]
  type = "sandbox"
  address = "/run/quark-sandboxer.sock"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.quark]
  runtime_type = "io.containerd.quark.v1"
  sandboxer = "quark"
[proxy_plugins.wasm]
  type = "sandbox"
  address = "/run/wasm-sandboxer.sock"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.wasm]
  runtime_type = "io.containerd.wasm.v1"
  sandboxer = "wasm"

The AppArmor feature is not supported, so disable_apparmor = true also needs to be set in the CRI plugin configuration.
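
Before starting the sandboxers it can help to confirm that all three proxy plugins are present in the configuration. The following check is a minimal sketch; the function name missing_sandboxers is hypothetical, and it only looks for the [proxy_plugins.<name>] table headers shown above, not for the matching runtime entries.

```shell
#!/bin/sh
# Report which Kuasar sandboxers are missing from a containerd config.
# $1 = config path (default: /etc/containerd/config.toml).
# Prints the missing sandboxer names, one per line, and returns
# non-zero if any of them is missing.
missing_sandboxers() (
    config="${1:-/etc/containerd/config.toml}"
    status=0
    for name in vmm quark wasm; do
        # Tolerate leading indentation and spaces before the closing bracket.
        if ! grep -Eq "^[[:space:]]*\[proxy_plugins\.$name[[:space:]]*\]" "$config"; then
            echo "$name"
            status=1
        fi
    done
    exit "$status"
)
```

Calling `missing_sandboxers` with no arguments checks /etc/containerd/config.toml; an empty output with a zero exit status means all three sections were found.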

image-20230707112123151.png


Running the components

Start Kuasar

Start Kuasar with the following commands:

For vmm: nohup vmm-sandboxer --listen /run/vmm-sandboxer.sock --dir /run/kuasar-vmm &
For quark: nohup quark-sandboxer --listen /run/quark-sandboxer.sock --dir /var/lib/kuasar-quark &
For wasm: nohup wasm-sandboxer --listen /run/wasm-sandboxer.sock --dir /run/kuasar-wasm &
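
Because the sandboxers are started in the background, containerd may be launched before their sockets exist. A small polling helper such as the following can bridge that gap; wait_for_socket is a hypothetical name, and the one-second polling interval is an arbitrary choice.

```shell
#!/bin/sh
# Wait until a Unix socket exists, polling once per second.
# $1 = socket path, $2 = timeout in seconds (default: 10).
# Returns 0 once the socket appears, 1 on timeout.
wait_for_socket() (
    sock="$1"
    remaining="${2:-10}"
    while [ ! -S "$sock" ]; do
        [ "$remaining" -gt 0 ] || exit 1
        sleep 1
        remaining=$((remaining - 1))
    done
    exit 0
)
```

For example, `wait_for_socket /run/vmm-sandboxer.sock 30` could be run after launching vmm-sandboxer and before starting containerd.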

Configure containerd environment variables

Start containerd with the environment variable ENABLE_CRI_SANDBOXES=1 set, because the Sandbox API only takes effect when it is:

ENABLE_CRI_SANDBOXES=1 containerd
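
If containerd is managed by systemd, the variable can be set persistently with a drop-in file instead of on the command line. The sketch below assumes the unit is named containerd.service; the function name write_sandbox_dropin and the file name 10-cri-sandboxes.conf are illustrative.

```shell
#!/bin/sh
# Write a systemd drop-in that sets ENABLE_CRI_SANDBOXES=1 for containerd.
# $1 = drop-in directory (the default assumes a unit named containerd.service).
write_sandbox_dropin() (
    dir="${1:-/etc/systemd/system/containerd.service.d}"
    mkdir -p "$dir" || exit 1
    cat > "$dir/10-cri-sandboxes.conf" <<'EOF'
[Service]
Environment="ENABLE_CRI_SANDBOXES=1"
EOF
)
```

After writing the drop-in as root, `systemctl daemon-reload` followed by `systemctl restart containerd` makes it take effect.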

image-20230707142513839.png


Run a container

As the documentation describes, we can test directly with the scripts provided in the code repository.

image-20230707143418833.png

Prepare the demo image:

image-20230707144250934.png


Run the vmm sandbox

bash examples/run_example_container.sh vmm

image-20230707144612228.png


Access the test container:

image-20230707145248796.png


Run the quark sandbox

bash examples/run_example_container.sh quark

image-20230707145603676.png

Access the test container:

image-20230707145724035.png


Run the wasm sandbox

bash examples/run_example_wasm_container.sh

image-20230707152043414.png

View the existing Pods and clean up the demos:

image-20230707152511018.png
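
The same steps can be scripted with crictl. The sketch below lists the Pod sandboxes and then stops and removes every one of them, so it is only suitable for a test node; cleanup_pods is a hypothetical helper name, and it assumes crictl is already configured to talk to containerd.

```shell
#!/bin/sh
# List all Pod sandboxes, then stop and remove each one.
# WARNING: this removes every Pod on the node, not only the demo Pods.
cleanup_pods() {
    crictl pods
    # `crictl pods -q` prints only the Pod IDs, one per line.
    for pod in $(crictl pods -q); do
        crictl stopp "$pod"
        crictl rmp "$pod"
    done
}
```

Run `cleanup_pods` once the demo containers are no longer needed.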


Summary

As a new-generation container runtime, Kuasar no longer uses the Shim v2 interface to manage Pods. Instead, it exposes the new-generation Pod management interface, the Sandbox API, to the container engine. This interface not only makes the logic clearer but also allows multiple sandboxes to be plugged in, with each Sandboxer using its own container isolation technology to manage Pods of its type. Kuasar will build on the sandbox interface and embrace the industry's latest management interfaces, such as DRA (Dynamic Resource Allocation) and CDI (Container Device Interface), to bring more secure, efficient, and convenient container solutions to cloud-native scenarios and stronger security guarantees to cloud-native applications.


References

Kuasar GitHub: https://github.com/kuasar-io

Kuasar official website: https://kuasar.io


This article participated in the 22nd issue of the Huawei Cloud Community [Content Co-creation] event.

[Content Co-creation] Activity 22 details: https://bbs.huaweicloud.com/blogs/402312

Task 19. [ DTSE Tech Talk technology live broadcast NO.28 Kuasar development hands-on practice when multi-sandbox container runtime ]
