Making open source projects go from usable to easy to use | Amazon's open source culture

Amazon's Leadership Principles are the core of Amazon's culture. Like Amazon's DNA, they run through every important decision and deeply influence every Amazonian, as well as every customer, partner, and builder who uses Amazon Cloud Technology. They have also profoundly shaped the way Amazon engages with open source.

In the first two articles, we described how Amazon practices two of its leadership principles in the open source community: "Customer Obsession" and "Insist on the Highest Standards".

To close the series, we want to share how we have put the "Invent and Simplify" leadership principle into practice in the open source community.

Invent and Simplify

Thanks to the freedom of open source, developers have access to thousands of open source projects. But does having access to the source code mean you can actually use it?

Not necessarily. Many open source projects take real effort to get running before they become usable. At Amazon Cloud Technology, the "Invent and Simplify" leadership principle carries a clear mission: make open source code easier for people to use.

Developers tell us that making open source easy to use is the most important thing a cloud provider can do to help them.

Bottlerocket

Project address: GitHub - bottlerocket-os/bottlerocket: An operating system designed for hosting containers

Bottlerocket is a Linux-based operating system built specifically for running containers. We use the Bottlerocket project as an example of Amazon Cloud Technology's efforts to make open source code easier for developers to use.

Why we built Bottlerocket

In 2014, Amazon Cloud Technology launched Amazon ECS along with a preconfigured, ready-to-use AMI optimized for hosting containers. In 2017, we launched Amazon EKS with its own optimized AMI. Both container services are widely used. We listened to developers' feedback on the ECS-optimized AMI, the EKS-optimized AMI, and other container-focused operating systems: they asked for stronger security, consistency across instances, and easier operations. Developers also told us that in production they often run nothing but containers on their Linux hosts, and that this does not require a full Linux distribution. They wanted a container-specific Linux system that ships only the packages it needs. A lightweight operating system also shortens deployment time.

So, on March 10, 2020, Amazon launched Bottlerocket. It is not a general-purpose Linux distribution like Ubuntu or Fedora. It is a new operating system dedicated to hosting Linux containers. Beyond reducing deployment time and the resources the system itself consumes, its most important properties are:

  • Strong built-in security from a minimal system: Bottlerocket includes only the components needed to host containers, which minimizes the attack surface;
  • A read-only file system whose integrity is checked by dm-verity at boot, which helps prevent rootkit-based attacks;
  • No SSH, no interpreter (such as Python), and no shell, which makes it harder for attackers to gain a foothold in the system;
  • Consistent configuration: developers can apply the same settings when upgrading or replacing nodes, which reduces maintenance overhead and makes workflows easier to automate.

Bottlerocket Design

Security

The main security concepts are reducing the attack surface, verifying software integrity, and enforcing privilege boundaries.

Bottlerocket is designed to be minimal, so it includes very few components: no SSH, no interpreter (such as Python), and no shell. Removing these components has an added benefit: attackers have a harder time finding a foothold in the system.

Bottlerocket does more than reduce the number of components. It also hardens the operating system through its build and language choices. These include building position-independent executables (PIE), linking with relocation read-only (RELRO), and writing new components in memory-safe languages such as Rust and Go.

For added security, Bottlerocket verifies itself cryptographically. The system is built from disk images, and Bottlerocket uses dm-verity to verify the image when it boots. Any unexpected change to the disk image causes the operating system to fail to boot.
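The idea behind dm-verity can be made concrete with a small hash-tree example. The Python sketch below shows it. It is a simplification and does not follow dm-verity's on-disk format. Real dm-verity uses a salted hash, packs many hashes into each hash block instead of using a binary tree, and checks blocks lazily as they are read. The block size and function names are illustrative assumptions. The key property still holds: changing any byte of the image changes the root hash, so the image no longer matches the trusted value.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # assumption: 4 KiB data blocks, a common dm-verity choice


def split_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split an image into fixed-size blocks, zero-padding the last one.

    Real block devices are block-aligned; padding exists only for this sketch.
    """
    blocks = [image[i:i + block_size] for i in range(0, len(image), block_size)] or [b""]
    blocks[-1] = blocks[-1].ljust(block_size, b"\0")
    return blocks


def root_hash(image: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Hash every block, then hash pairs of hashes until a single root remains."""
    level = [hashlib.sha256(b).digest() for b in split_blocks(image, block_size)]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def verify_image(image: bytes, trusted_root: bytes) -> bool:
    """Return True only if the image still matches the trusted root hash."""
    return root_hash(image) == trusted_root
```

In this model, the trusted root hash is computed when the image is built and then protected, for example by a signature. A single flipped bit anywhere in the image makes `verify_image` return False.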

Bottlerocket also ships its own software updater instead of relying directly on a conventional Linux package manager. All Bottlerocket updates come from a repository that follows The Update Framework (TUF) specification. TUF is designed to mitigate the common attacks that traditional package manager systems and their software repositories are exposed to.

Bottlerocket also runs SELinux in enforcing mode to prevent modification of the system, and it can deny requests even from highly privileged containers. SELinux implements Mandatory Access Control (MAC) in the Linux kernel, limiting the specific classes of actions a process can take. Today, Bottlerocket's SELinux policy is tightly focused: it prevents containers from making unexpected changes to the operating system. Going forward, we hope to extend these policies to address more, and more varied, threats that try to persist on the host.
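TUF enables several client-side checks. The Python sketch below shows a heavily simplified version of them. It is not the TUF reference implementation or Bottlerocket's updater. Signature verification over the metadata is deliberately left out, and the `TargetsMetadata` structure and its field names are hypothetical. The sketch covers three attacks TUF guards against:

- freeze attacks, caught by rejecting expired metadata;
- rollback attacks, caught by rejecting metadata whose version goes backwards;
- tampered or substituted targets, caught by checking length and hash.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TargetsMetadata:
    """Hypothetical targets metadata whose signatures were already verified."""
    version: int
    expires: datetime
    targets: dict[str, dict]  # target name -> {"length": int, "sha256": hex digest}


class UpdateRejected(Exception):
    pass


def check_target(meta: TargetsMetadata, last_seen_version: int, name: str,
                 payload: bytes, now: datetime | None = None) -> None:
    """Raise UpdateRejected unless the downloaded payload is safe to install."""
    now = now or datetime.now(timezone.utc)
    if meta.expires <= now:
        raise UpdateRejected("metadata expired (possible freeze attack)")
    if meta.version < last_seen_version:
        raise UpdateRejected("metadata version went backwards (possible rollback attack)")
    info = meta.targets.get(name)
    if info is None:
        raise UpdateRejected(f"unknown target {name!r}")
    if len(payload) != info["length"]:
        raise UpdateRejected("length mismatch")
    if hashlib.sha256(payload).hexdigest() != info["sha256"]:
        raise UpdateRejected("hash mismatch")
```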

Consistency

Developers have told us that keeping every instance in a cluster consistent is one of the most important requirements for a container operating system. It was also another key motivation behind Bottlerocket's design.

Bottlerocket improves consistency in three main ways: image-based updates, a read-only root filesystem, and API-driven configuration.

Image-based updates - Today's general-purpose Linux distributions ship with an integrated package management system for installing and updating software. When you manage a large number of hosts, each host may end up with different versions of different packages. Given the huge variety of packages available, the particular combination installed on a host may never have been tested together. This makes it hard to keep instances consistent. Bottlerocket takes a completely different approach: it does not provide a package manager. Instead, it uses prebuilt images that contain the software the operating system needs, along with other software such as diagnostic and observability tools. When an update is available, Bottlerocket downloads a complete new disk image and applies it with a simple reboot. This image-based approach guarantees a strict level of consistency. All Bottlerocket hosts in a fleet run exactly the same software, and every component and version in a Bottlerocket image has been tested together.

Image-based deployment keeps hosts consistent, but every containerized use case is different, and no single set of software and configuration fits them all. Bottlerocket also has to run in different environments (including Raspberry Pi devices) and with different orchestrators (such as Amazon ECS). To reconcile these seemingly opposing requirements, Bottlerocket uses a system of "variants", which provide different images for different use cases. For example, the variant that Amazon Cloud Technology released for Kubernetes 1.15 is called aws-k8s-1.15. Bottlerocket also includes tooling that lets you build your own variants to fit your needs.

Read-only root filesystem - Unlike traditional Linux distributions, Bottlerocket is configured with a read-only root filesystem. This design further improves consistency and reduces configuration drift. Applications cannot modify the disk image, so changes cannot accumulate on one host and make it diverge from the rest of the fleet. When Bottlerocket has finished downloading an update and is ready to install it, the update is written to the secondary partition. After a reboot, the bootloader boots from that partition, which becomes the new active partition, while the previous image is kept in the other partition. If the update causes a problem, this mechanism allows a quick rollback. Bottlerocket also has a separate writable portion of the file system dedicated to persistent user data, such as container images and storage volumes.
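The two-partition update and rollback flow can be modeled as a small state machine. The Python sketch below is only an illustration. The class and method names are hypothetical, and the model does not reflect Bottlerocket's actual partition layout or bootloader logic. It shows why rollback is cheap: the previous image is never overwritten by the update, so going back is just a switch of the active slot.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ABPartitions:
    """Toy model of two root partition slots, A and B (hypothetical)."""
    images: dict[str, str] = field(default_factory=lambda: {"A": "v1", "B": ""})
    active: str = "A"

    @property
    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def stage_update(self, new_image: str) -> None:
        # Write the full new image to the inactive slot; the running system is untouched.
        self.images[self.inactive] = new_image

    def reboot_into_update(self) -> None:
        # The bootloader now boots the slot that was just written.
        self.active = self.inactive

    def rollback(self) -> None:
        # The previous image is still intact in the other slot, so rollback is a switch.
        if not self.images[self.inactive]:
            raise RuntimeError("no previous image to roll back to")
        self.active = self.inactive

    def running_version(self) -> str:
        return self.images[self.active]
```

Usage: call `stage_update("v2")` and then `reboot_into_update()` to run v2 while v1 stays in slot A. If v2 misbehaves, `rollback()` returns to v1 without downloading anything.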

API-driven configuration - On Linux, it is common practice to store software configuration in the /etc directory. Bottlerocket remains compatible with /etc, but exposes it as a temporary in-memory filesystem that is rebuilt on every boot. Apart from this and a small number of other settings, all configuration goes through Bottlerocket's open API. The API supports structured settings, transactional updates, automatic migrations, and more. You can make changes interactively by calling the API from Bottlerocket's "control" container, which you reach through Amazon Systems Manager. You can also make changes programmatically. If you run Bottlerocket on an EC2 instance, you can also provide settings through user data in TOML format.
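Transactional settings mean that changes are staged first and applied together. The toy in-memory settings store below, written in Python, shows the idea. Changes are staged into a named transaction and applied all at once on commit, so a reader never sees a half-applied configuration. If validation fails, the whole transaction is discarded. The class, the dotted key names, and the validation rule are hypothetical. This is not Bottlerocket's API or its apiclient tool.

```python
from __future__ import annotations

import copy


class SettingsStore:
    """Toy model of a settings API with staged transactions (hypothetical)."""

    def __init__(self, live: dict[str, object] | None = None) -> None:
        self.live: dict[str, object] = dict(live or {})
        self.pending: dict[str, dict[str, object]] = {}

    def set(self, key: str, value: object, tx: str = "default") -> None:
        # Stage a change; the live settings are not touched yet.
        self.pending.setdefault(tx, {})[key] = value

    def commit(self, tx: str = "default") -> None:
        # Remove the transaction from the pending set whether or not it succeeds.
        changes = self.pending.pop(tx, {})
        candidate = copy.deepcopy(self.live)
        candidate.update(changes)
        self._validate(candidate)  # any error rejects the whole transaction
        self.live = candidate      # all changes become visible at once

    @staticmethod
    def _validate(settings: dict[str, object]) -> None:
        # Hypothetical rule: a hostname, if present, must be a non-empty string.
        hostname = settings.get("settings.network.hostname")
        if hostname is not None and (not isinstance(hostname, str) or not hostname):
            raise ValueError("invalid hostname")
```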

Bottlerocket can also configure some options on its own. Early in the boot process, it generates settings such as its hostname and network configuration. When you use the aws-k8s-1.15 variant, helper programs configure Kubernetes-specific settings, such as the cluster DNS settings and the name of the pause container image. You can override these initial settings through the API. On EC2 instances, you can also override them through TOML user data.
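Settings generated at boot can be overridden by user data. This layering can be sketched as a recursive merge of nested tables, where nested tables are merged and individual values are replaced. The Python sketch below assumes user data that has already been parsed from TOML into nested dictionaries. The setting names mirror the TOML user-data style but serve only as illustrations, and the helper names and default values are hypothetical.

```python
def generated_defaults(hostname: str) -> dict:
    """Values a node might derive for itself early in boot (illustrative only)."""
    return {
        "settings": {
            "network": {"hostname": hostname},
            "kubernetes": {"cluster-dns-ip": "10.100.0.10"},
        }
    }


def merge(base: dict, override: dict) -> dict:
    """Return a new dict: nested tables are merged, other values are replaced."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


# Equivalent to this TOML user data:
#   [settings.kubernetes]
#   cluster-dns-ip = "172.20.0.10"
user_data = {"settings": {"kubernetes": {"cluster-dns-ip": "172.20.0.10"}}}
effective = merge(generated_defaults("ip-10-0-0-1"), user_data)
```

The generated hostname survives because the user data does not mention it, while the DNS address from the user data takes precedence. The original defaults are not modified.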

Operability

Although Bottlerocket is a stripped-down operating system built for containers, it still provides many general operational capabilities. For example, its built-in automatic update mechanism integrates with Kubernetes. It reduces service disruption by cordoning and draining nodes before updating them. Bottlerocket also provides tools for routine administrative tasks, such as changing settings and manually installing software updates, and for emergencies.

Bottlerocket's update capability is made up of several components. The first is a TUF-based repository containing update images plus signatures that attest to the integrity of both the images and the repository itself. The second is updog, a tool built into Bottlerocket that talks to the repository and retrieves updates. Updog can query for updates and apply them immediately. By default, however, updog follows a "wave"-based rollout strategy. Waves deliver an update to different hosts in the cluster over different periods of time, instead of releasing it to every host at once. This prevents all hosts from updating simultaneously, which could disrupt the containerized workloads. It also means the rollout can be stopped as soon as a problem is found on a host. Each host is assigned a random wave during boot, and users can also pin a host to a specific wave as needed.
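The wave mechanism can be sketched as follows. Each host draws a random seed once, and a list of waves maps seed ranges to a delay after the release time. The seed range, the wave boundaries, and the function names are illustrative assumptions, not updog's actual configuration format.

```python
from __future__ import annotations

import random
from datetime import datetime, timedelta

SEED_SPACE = 2048  # assumption: size of the seed range

# Hypothetical waves: (exclusive upper bound on seed, delay after release).
WAVES = [
    (64, timedelta(hours=0)),            # about 3% of hosts update right away
    (512, timedelta(hours=24)),          # about 22% more a day later
    (SEED_SPACE, timedelta(hours=72)),   # everyone else after three days
]


def assign_seed(rng: random.Random | None = None) -> int:
    """Drawn once at boot; an operator could pin it to force a specific wave."""
    return (rng or random.Random()).randrange(SEED_SPACE)


def eligible_at(release: datetime, seed: int) -> datetime:
    """Earliest time a host with this seed may install the release."""
    for upper_bound, delay in WAVES:
        if seed < upper_bound:
            return release + delay
    raise ValueError("seed outside seed space")


def should_update(now: datetime, release: datetime, seed: int, halted: bool) -> bool:
    # A halted rollout (a problem found in an early wave) stops all later waves.
    return not halted and now >= eligible_at(release, seed)
```

Spreading hosts over waves means a bad update surfaces on a small slice of the fleet first. Setting `halted` then keeps the remaining hosts on the known-good version.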

Bottlerocket's update functionality can also integrate with container orchestrators. Bottlerocket provides a Kubernetes operator that can be deployed into a cluster to drive updates through updog. The operator ensures that only one host in the cluster is updated at a time, and it cordons and drains each node before the update is applied.
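Updating one host at a time is essentially a rolling update loop. The Python sketch below models it with a hypothetical `Node` type that stands in for a cluster node. It is not the operator's actual code and does not call the Kubernetes API. Draining is simplified to clearing a list of pods.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node."""
    name: str
    version: str
    healthy_after_update: bool = True
    schedulable: bool = True
    pods: list[str] = field(default_factory=list)


def rolling_update(nodes: list[Node], target: str, log: list[str]) -> bool:
    """Update nodes strictly one at a time; stop at the first failure."""
    for node in nodes:
        if node.version == target:
            continue
        node.schedulable = False              # cordon: no new pods land here
        log.append(f"drain {node.name}: {len(node.pods)} pods")
        node.pods.clear()                     # drain: evict workloads (simplified)
        node.version = target                 # apply the update and reboot
        if not node.healthy_after_update:
            log.append(f"halt: {node.name} unhealthy")
            return False                      # remaining nodes are left untouched
        node.schedulable = True               # uncordon
    return True
```

Because the loop stops at the first unhealthy node, a bad update affects at most one node while the rest of the cluster keeps serving.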

Bottlerocket does not include SSH, so it needs another mechanism for controlling the operating system, interacting with the API, and gaining administrative access in an emergency. It provides two tools: a "control" container for routine, planned maintenance such as changing settings, and an "admin" container for emergencies. The control container starts when the instance boots and contains the Amazon SSM Agent, so you can interact with it through the Amazon Systems Manager API. The control container also includes two programs. apiclient interacts with the Bottlerocket API. enable-admin-container is a small helper that makes the API calls needed to start the emergency admin container.

The admin container is reserved for emergencies. It runs with full privileges and is subject to no restrictions other than the SELinux policy. It is based on the Amazon Linux 2 container image and provides the tools you would expect in a general-purpose Linux distribution. SSH is installed and running inside the admin container. You can connect to it over Bottlerocket's primary network interface using the SSH key specified when the instance was launched. The admin container also provides a tool called sheltie, which switches from the admin container's Linux namespaces into the host's, so you can operate on the host directly from within the admin container. The admin container is disabled by default, and we recommend keeping it disabled in production deployments of Bottlerocket.

Bottlerocket runs two kinds of containers: those managed by an orchestrator, and those run locally, which we call "host containers". The control and admin containers described above are host containers.

Bottlerocket runs two separate container runtime instances, one for orchestrated containers and one for host containers. There are three reasons for this approach:

  1. Under the SELinux policy, orchestrated containers and host containers can have different, independent security requirements;
  2. Orchestrated containers can be launched through a different runtime from host containers (such as Docker or CRI-O);
  3. Orchestrated containers and host containers sit in separate fault domains, so a configuration change or failure affecting one runtime does not affect the other.

Bottlerocket includes the control container by default, while the admin container has to be enabled explicitly when needed. You can also use the host container system to run your own diagnostic, operations, and management tools on Bottlerocket.

Bottlerocket's Evolution and Future

Bottlerocket is a fully open source operating system. It is made up of existing open source components (such as the Linux kernel) and software packages, along with new components written specifically for Bottlerocket (mostly in Rust and Go). All open source components in Bottlerocket keep their original licenses. The components developed specifically for Bottlerocket are dual-licensed, so developers can choose either the Apache 2.0 license or the MIT license.

Welcome to the Bottlerocket community! Check out our GitHub repository, join the discussion in issues, and contribute to the project by submitting pull requests.

Bottlerocket project address: GitHub - bottlerocket-os/bottlerocket: An operating system designed for hosting containers

Firecracker

Project address: GitHub - firecracker-microvm/firecracker: Secure and fast microVMs for serverless computing.

Developers told us that when all containers must share the same operating system kernel, existing containers cannot isolate applications from each other well enough, and this makes security problems hard to solve.

We listened to these concerns and, in response, launched Firecracker as an open source project.

Firecracker is an open source virtual machine monitor (VMM) built on the Linux Kernel-based Virtual Machine (KVM). It launches lightweight micro virtual machines (microVMs) in a fraction of a second on non-virtualized (bare-metal) hosts. This combines the security and workload isolation of traditional virtual machines with the resource efficiency of containers.

Why we built Firecracker

Security is always a top priority for Amazon. Firecracker follows a minimalist design principle and includes only the components needed to run secure, lightweight virtual machines. At every step of the design process, we optimized Firecracker for security, speed, and efficiency. For example:

  • Firecracker only needs to boot relatively recent Linux kernels, and only kernels built with a specific set of configuration options (out of more than 1,000 kernel build options);

  • It supports no graphics cards or accelerators of any kind, no hardware passthrough, and (most) no legacy devices. Firecracker boots with a minimal kernel configuration, does not rely on an emulated BIOS, and does not use a full device model. The only devices are a paravirtualized network card, a paravirtualized disk, and a one-button keyboard (the reset pin is used when there is no power management device). This minimal device model shortens boot time and also reduces the attack surface, which improves security. For more on how Firecracker supports running container and serverless workloads with extremely low overhead, see the Firecracker documentation;

  • Firecracker is written in Rust, which provides thread safety and memory safety and prevents buffer overflows and many other classes of memory safety bugs that can lead to security vulnerabilities.

Firecracker grew out of the need for lightweight virtualization for serverless computing. In 2014, Amazon Cloud Technology launched Amazon Lambda to give developers a secure serverless experience without having to manage infrastructure. To isolate customers from each other, Lambda initially used a dedicated EC2 instance for each customer. This met our security goals, but it forced trade-offs in how we ran Lambda behind the scenes.

As serverless technology developed rapidly and was widely adopted, its advantages extended to containers, for example with Amazon Fargate. Making serverless and container workloads run more efficiently became a new challenge. Guided by the "Invent and Simplify" principle, we asked ourselves: what should a virtual machine designed for today's world of containers and functions look like?

In 2018, we launched Firecracker as an open source project. Unlike Docker containers or language VMs such as the JVM, Firecracker provides lightweight virtualization dedicated to serverless applications. It creates tiny virtual machines, called microVMs, that contain only the components needed to run a secure, lightweight virtual machine. Each microVM has a memory overhead of less than 5 MiB, which improves efficiency and utilization: thousands of microVMs can be packed onto a single host. Developers can use in-process rate limiters to control precisely how network and storage resources are shared, even across thousands of microVMs. All hardware compute resources can be safely oversubscribed, maximizing the number of workloads that can run on a host.
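Firecracker's rate limiters for network and block devices are configured as token buckets. Each bucket has a size, an optional one-time burst, and a refill time. The Python sketch below shows the general token-bucket technique with those three parameters. It is an independent illustration, not Firecracker's Rust implementation, and the exact semantics there may differ. Time is passed in explicitly in milliseconds so the behavior is deterministic.

```python
class TokenBucket:
    """Token bucket: `size` tokens refill linearly over `refill_time_ms`."""

    def __init__(self, size: int, refill_time_ms: int, one_time_burst: int = 0) -> None:
        self.size = size
        self.refill_time_ms = refill_time_ms
        self.tokens = float(size)          # start with a full bucket
        self.burst = one_time_burst        # extra budget, spent first, never refilled
        self.last_ms = 0.0

    def _refill(self, now_ms: float) -> None:
        elapsed = max(0.0, now_ms - self.last_ms)
        self.tokens = min(self.size, self.tokens + self.size * elapsed / self.refill_time_ms)
        self.last_ms = now_ms

    def consume(self, amount: int, now_ms: float) -> bool:
        """Try to spend `amount` tokens (e.g. bytes or I/O ops); False means throttle."""
        self._refill(now_ms)
        from_burst = min(self.burst, amount)
        remaining = amount - from_burst
        if remaining > self.tokens:
            return False                   # not enough budget: nothing is consumed
        self.burst -= from_burst
        self.tokens -= remaining
        return True
```

For example, a bucket with `size=1000` and `refill_time_ms=100` sustains roughly 10,000 tokens per second. The one-time burst lets a freshly booted guest briefly exceed that rate.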

In summary, Firecracker's development follows these guiding principles:

  • Built-in security: Firecracker provides compute security barriers that enable multi-tenant workloads and cannot be mistakenly disabled by customers. Customer workloads are treated as both sacred (they must not be touched) and malicious (they must be defended against);

  • Lightweight virtualization: Firecracker focuses on transient or stateless workloads rather than long-running or persistent ones. Its hardware resource overhead is known and guaranteed;

  • Minimal features: No functionality is built unless the mission clearly requires it, and each capability has a single implementation;

  • Compute oversubscription: All hardware compute resources that Firecracker exposes to guests can be safely oversubscribed;

  • Open source: Firecracker is an active open source project, open to everyone, and we look forward to collaborating with contributors from around the world.

Design of Firecracker

Firecracker runs on Linux hosts with Linux guest operating systems (see the Kernel Support Policy for the complete list of currently supported kernel versions). In production, Firecracker should be launched through the jailer binary (see Sandboxing for more details). After the Firecracker process starts, and before the InstanceStart command is issued, the user configures the microVM through the Firecracker API. Firecracker emulates the microVM's network device with a TAP device on the host, and backs the microVM's block device with a file on the host's file system.
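Configuring a microVM before InstanceStart comes down to a handful of REST calls sent to Firecracker's API socket. The Python sketch below builds that request sequence and sends it over the Unix domain socket using only the standard library. The socket path, kernel and rootfs paths, TAP device name, and sizes are placeholder assumptions. Check the Firecracker API specification for the authoritative fields and required values.

```python
from __future__ import annotations

import http.client
import json
import socket


def boot_requests(kernel: str, rootfs: str, tap: str,
                  vcpus: int = 1, mem_mib: int = 128) -> list[tuple[str, str, dict]]:
    """Ordered (method, path, body) calls that configure and then start a microVM."""
    return [
        ("PUT", "/boot-source", {"kernel_image_path": kernel,
                                 "boot_args": "console=ttyS0 reboot=k panic=1"}),
        ("PUT", "/drives/rootfs", {"drive_id": "rootfs", "path_on_host": rootfs,
                                   "is_root_device": True, "is_read_only": False}),
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/network-interfaces/eth0", {"iface_id": "eth0", "host_dev_name": tap}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),  # must come last
    ]


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str) -> None:
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self) -> None:
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def send_all(socket_path: str, requests: list[tuple[str, str, dict]]) -> None:
    """Send each request in order, failing fast on any non-2xx response."""
    for method, path, body in requests:
        conn = UnixHTTPConnection(socket_path)
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        resp.read()
        conn.close()
        if resp.status >= 300:
            raise RuntimeError(f"{method} {path} failed with HTTP {resp.status}")
```

Usage, assuming a firecracker process started with `--api-sock /tmp/firecracker.socket`: `send_all("/tmp/firecracker.socket", boot_requests("vmlinux", "rootfs.ext4", "tap0"))`. In production, the same calls would go to the socket of a jailed Firecracker process.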

Figure: A host instance running Firecracker microVMs

As a lightweight virtual machine, each Firecracker process encapsulates one and only one microVM. This means that each microVM runs a single application container or a single application Pod. Different applications from different tenants are therefore isolated at the virtualization level: instead of sharing the host OS kernel, each application gets its own guest Linux kernel. At the same time, Firecracker runs on an ordinary Linux host, where each microVM runs as a KVM-backed process scheduled by the host operating system. Several Firecracker processes run on the same host, one per microVM. Each is statically linked and started in jailer mode, and cgroups and seccomp BPF enforce isolation between these processes.

As a lightweight virtual machine, Firecracker only includes the following features:

  • VirtIO-based network, block, and socket devices (virtio-net, virtio-blk, and virtio-vsock). They handle the microVM's network I/O, its disk I/O, and communication between AF_VSOCK sockets in the microVM and AF_UNIX sockets on the host. The socket device lets applications in the microVM talk to the host while bypassing vhost kernel code on the host;

  • Programmable interval timer;

  • KVM clock;

  • Serial console (e.g., /dev/ttyS0);

  • A keyboard with only a power-off key.

VirtIO is the mainstream paravirtualized I/O approach. It improves efficiency by reducing the number of switches between user mode and kernel mode, and it is also the most commonly used I/O approach in QEMU-KVM.
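The vsock device maps guest AF_VSOCK connections to a Unix domain socket on the host. According to our reading of the Firecracker vsock documentation, a host-initiated connection works in three steps. The host connects to that Unix socket and writes `CONNECT <port>\n`. Once the guest side accepts, Firecracker replies `OK <host_port>\n`. The Python sketch below encodes that handshake. The socket path and guest port are placeholder assumptions, and the helper names are hypothetical.

```python
from __future__ import annotations

import socket


def connect_request(guest_port: int) -> bytes:
    """Handshake line the host writes after connecting to the vsock Unix socket."""
    if not 0 <= guest_port < 2**32:
        raise ValueError("vsock ports are 32-bit")
    return f"CONNECT {guest_port}\n".encode("ascii")


def parse_ack(line: bytes) -> int:
    """Parse the 'OK <host_port>' acknowledgement and return the port number."""
    text = line.decode("ascii").strip()
    status, _, port = text.partition(" ")
    if status != "OK" or not port.isdigit():
        raise ConnectionError(f"vsock handshake failed: {text!r}")
    return int(port)


def open_guest_stream(uds_path: str, guest_port: int) -> socket.socket:
    """Connect to a listener inside the guest and return a ready byte stream."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(uds_path)
    sock.sendall(connect_request(guest_port))
    ack = b""
    # Read one byte at a time so no application data after the ack is consumed.
    while not ack.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("socket closed during handshake")
        ack += chunk
    parse_ack(ack)
    return sock
```

After the handshake, the returned socket carries raw bytes between the host program and the application listening on the given port inside the guest.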

The evolution and future of Firecracker

As an open source project, Firecracker has attracted wide attention from the community and has also incubated several container projects built on the Firecracker runtime. Open source has also greatly broadened the use cases for Firecracker as a general-purpose lightweight virtual machine. Beyond Amazon Cloud Technology's serverless computing products, it can also run containers in environments outside Amazon Cloud Technology, such as on premises.

Take the community's firecracker-containerd project as an example. It allows standard runc containers to run inside microVMs, while containerd on the host OS manages the containers in the guest OS. This keeps it compatible with the OCI (Open Container Initiative) specifications, image formats, and management tools. As a result, a secure container solution isolated by Firecracker virtualization can be adapted to mainstream container orchestration frameworks such as Docker and Kubernetes with only minor changes. This shows the strength of an open source ecosystem in the container space.

Another project with functionality similar to firecracker-containerd is Kata Containers, hosted by the OpenStack Foundation. Starting with version 1.5, Kata Containers added support for the Firecracker VMM (previously it supported only QEMU and NEMU). Like runc, Kata's own runtime complies with the OCI and CRI standards, so Docker, Kubernetes, and OpenStack can all use it directly.

Guided by Amazon's "Invent and Simplify" leadership principle, we will keep working to lower the cost of using open source and to improve the quality of open source products.

Author Zheng Yubin

Senior Developer Evangelist at Amazon Cloud Technology, with 20 years of experience in the ICT industry and in digital transformation, focusing on Amazon Cloud Technology's cloud native and cloud security technologies. He has 18 years of experience as an architect. He provides consulting and technical implementation for data center construction and software-defined data center solutions to customers in finance, education, and manufacturing, as well as to Fortune 500 companies.

 Article source: https://dev.amazoncloud.cn/column/article/63e5a7a62dfc3e07190484fa?sc_medium=regulartraffic&sc_campaign=crossplatform&sc_channel=CSDN

Origin blog.csdn.net/u012365585/article/details/131484084