NKD: an integrated operations and maintenance tool for container cloud clusters and their operating systems

NKD, short for NestOS-kubernetes-Deployer, is an operations and maintenance solution for Kubernetes clusters deployed on NestOS. Its goal is to provide, from outside the cluster, deployment, update, and configuration management services for the cluster infrastructure, including the operating system and the core Kubernetes components.

1. Introduction

As the de facto standard for container cloud workloads in the cloud-native field, Kubernetes greatly simplifies the deployment and management of containerized applications thanks to its rich functionality and flexibility. Container technology packages an application's runtime environment and decouples it from the underlying operating system, and Kubernetes goes further by decoupling applications from the underlying nodes, so that they can be deployed seamlessly across different cloud providers and environments. However, this decoupling also brings new challenges. As Kubernetes adoption has grown, the complexity of operating and maintaining it has become increasingly apparent. Maintaining a Kubernetes cluster demands considerable expertise and resources, and operations staff must master complex Kubernetes configuration and management to keep the cluster running stably. As a result, operations staff tend to focus their effort on managing the cluster itself, take a conservative approach to the underlying operating system and core Kubernetes components, and avoid frequent updates. Yet security fixes and the features of new Kubernetes versions make updating the underlying infrastructure components unavoidable. A solution is therefore needed that simplifies the operation and maintenance of the underlying infrastructure and makes it easier for operations staff to manage and update the cluster's infrastructure.

2. NestOS cloud base operating system

NestOS is a cloud base operating system incubated by the CloudNative SIG of the openEuler community, focused on providing the best container host for large-scale cluster deployments. Traditionally, when a general-purpose operating system serves as the cluster infrastructure, an enterprise system management platform such as Red Hat Satellite or SUSE Manager is the natural choice. These platforms offer package management, configuration management, patching, and other functions that make large clusters easier to operate. However, while the operating system is being upgraded or managed, problems such as network or power failures can leave some nodes stuck in an unstable intermediate state. In addition, these platforms can only manage what falls within their own scope, so they may fail to detect and promptly correct environment inconsistencies caused by ad hoc maintenance by operations staff or temporary debugging by developers on cluster nodes. To address these problems, NestOS packages the operating system with rpm-ostree, which makes updates atomic and eliminates intermediate states: even if an upgrade fails partway, the system can quickly roll back to the previous stable state, preserving system stability. During day-to-day operation, NestOS also takes protective measures: key directories are read-only so that core system files and configuration cannot be modified by accident, and important system configuration is injected and fixed through the Ignition mechanism to keep it persistent and consistent. This keeps the operating system in its expected state at all times and reduces unexpected errors.
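The atomic update and rollback behavior described above can be illustrated with a small Python model. It is a sketch under simplifying assumptions, not rpm-ostree's implementation: the system keeps one booted deployment, at most one staged deployment, and one rollback target, and switching between them is a single assignment. `Deployment`, `DeploymentManager`, and their methods are hypothetical names; on a real NestOS node the corresponding operations are performed with commands such as `rpm-ostree upgrade` and `rpm-ostree rollback`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    """An immutable OS tree, identified by version and commit."""
    version: str
    commit: str


class DeploymentManager:
    """Hypothetical model of atomic OS updates (not rpm-ostree's real API)."""

    def __init__(self, initial: Deployment) -> None:
        self.booted: Deployment = initial
        self.previous: Optional[Deployment] = None
        self.staged: Optional[Deployment] = None

    def stage(self, new: Deployment) -> None:
        # The new tree is prepared next to the running one. The booted
        # deployment is never modified in place, so an interrupted update
        # cannot leave the node in a half-updated state.
        self.staged = new

    def reboot(self) -> Deployment:
        # Activating the staged tree is a single switch at boot time;
        # the old tree is kept as the rollback target.
        if self.staged is not None:
            self.previous, self.booted = self.booted, self.staged
            self.staged = None
        return self.booted

    def rollback(self) -> Deployment:
        # Return to the previously booted tree by swapping the two.
        if self.previous is None:
            raise RuntimeError("no previous deployment to roll back to")
        self.booted, self.previous = self.previous, self.booted
        return self.booted
```

For example, staging version `1.1` leaves the running `1.0` system untouched until `reboot()`, and a later `rollback()` returns the node to `1.0` while keeping `1.1` available as the other deployment.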
The core idea of NestOS resembles that of images in container technology: building immutable infrastructure at the operating system level. The OS version and configuration are frozen at deployment time, turning each node into an immutable entity. This eliminates variation between operating system instances and keeps the versions and configuration of the underlying components consistent, providing a stable and reliable infrastructure. Using NestOS in a cluster therefore brings greater consistency and reliability and makes the operating system simpler and more efficient to manage and maintain.

Figure: Mutable infrastructure vs. immutable infrastructure

However, although NestOS simplifies cluster operations in principle, it also brings new challenges in practice. As a new form of operating system, NestOS requires operations staff to become familiar with its core ideas: any change must be made by rebuilding the system image in the CI/CD pipeline rather than by logging in to the system, operating system changes take effect only after a reboot, and so on. These habits differ from those of general-purpose operating systems and must be adopted for effective management and maintenance. In addition, unlike traditional operating systems, NestOS requires new tools and techniques, as follows:

  1. "System image build toolchain": Any change to business-related components or configuration in NestOS requires building a new version of the system image. Automated testing tools are also needed to verify the reliability of the base components in the image.
  2. "Environment-specific dynamic configuration": NestOS injects environment-specific configuration (such as login credentials, networking, and external storage) at deployment time through the Ignition mechanism. Operations staff therefore need to write configuration files in advance and use the tools provided by NestOS to convert them into machine-readable .ign files for use during operating system deployment.
  3. "System image update source": NestOS can pull the latest version in the current version tree directly through rpm-ostree, but operations staff then have to maintain and manage the system image update source themselves and learn the corresponding deployment tools. New system images can also be distributed as container images to reuse existing CI/CD resources, but this rules out incremental updates: every update requires downloading the complete system image.
  4. "System update and upgrade management": The tools above prepare system updates but do not answer the question of when a system may be upgraded. NestOS provides the zincati component for basic update policy management, but best practice is still to update and maintain nodes within the time windows the business allows, taking into account the workloads currently running on each node. For container cloud scenarios, NestOS provides housekeeper, a service based on the operator mechanism that upgrades a node after the container workloads running on it have been evicted. Users can also build more complex upgrade and maintenance strategies on top of it.
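To show what the machine-readable .ign output mentioned in item 2 looks like, the following Python sketch renders a minimal configuration directly as JSON. It assumes Ignition specification version 3.3.0 and uses only a few of its fields (a login user with SSH keys and one file written from a data URL); the function name `build_ignition`, the `nest` user, and the hostname are illustrative assumptions. In practice operations staff write a human-friendly configuration and let the NestOS tooling perform this conversion.

```python
import json
from urllib.parse import quote


def build_ignition(user: str, ssh_keys: list[str], hostname: str,
                   spec_version: str = "3.3.0") -> str:
    """Render a minimal Ignition v3 config as JSON text (hypothetical helper).

    Uses a login user with SSH keys and a single file, /etc/hostname,
    whose contents are given as an RFC 2397 data URL.
    """
    config = {
        "ignition": {"version": spec_version},
        "passwd": {
            "users": [{"name": user, "sshAuthorizedKeys": ssh_keys}],
        },
        "storage": {
            "files": [{
                "path": "/etc/hostname",
                "mode": 0o644,  # serialized as decimal 420 in the JSON
                "overwrite": True,
                # Percent-encode the contents so they form a valid data URL.
                "contents": {"source": "data:," + quote(hostname + "\n")},
            }],
        },
    }
    return json.dumps(config, indent=2)
```

For example, `build_ignition("nest", ["ssh-ed25519 AAAA... admin@example"], "node1")` returns a JSON document whose `/etc/hostname` entry has the source `data:,node1%0A`.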

Adopting this new way of using an operating system takes proven use cases and time, but the operational challenges it introduces can be addressed with convenient tooling. In this context, NestOS-kubernetes-Deployer was created to provide deployment, update, and configuration management services from outside the cluster and to improve the NestOS experience for operations staff.

3. How NKD helps in container cloud operations and maintenance

NKD targets Kubernetes cluster scenarios and mainly simplifies cluster deployment and upgrades, as follows:

  1. "Cluster infrastructure creation": NKD connects to infrastructure providers and dynamically creates the IaaS resources a cluster requires. It supports both bare-metal and virtualized environments, with OpenStack support being implemented first.
  2. "Operating system image build": NestOS provides a complete image build toolchain that integrates easily into users' existing CI/CD pipelines and makes it convenient to build custom images from official openEuler or user-defined software repositories. NKD currently consumes the image build results and applies the corresponding update source configuration to the cluster; in the future, NKD will also manage the image build process itself.
  3. "Dynamic configuration injection": When NestOS is deployed, the dynamic configuration the system needs after deployment must be passed in through the Ignition mechanism. NKD currently provides a tool that converts user configuration, given as command-line parameters or a configuration file, into .ign files. The ultimate goal is a user-friendly front-end configuration interface that makes it easy to generate the required configuration and provides version management for configuration changes.
  4. "Kubernetes cluster deployment": This is NKD's core capability. NKD automatically merges the configuration required for Kubernetes cluster deployment into the .ign file generated from the user's configuration, so that once a node has booted its operating system it automatically starts creating the Kubernetes cluster, with no manual intervention.
  5. "Cluster status detection and housekeeper deployment": NKD continuously monitors the status of the Kubernetes cluster. Once the cluster has been created, NKD gives users the access credentials and deploys the housekeeper custom resources used for later maintenance and upgrades. This CRD is deployed by default, but users can choose not to deploy it.
  6. "Upgrading and maintaining the operating system or core Kubernetes components": When the operating system or core Kubernetes components need to be upgraded or maintained, NKD uses the image build tools to build a new version of the system image and, once it finds the new image version, creates a housekeeper CR for the cluster. The housekeeper service in the cluster then upgrades the cluster nodes one by one according to its configuration until the whole cluster has been upgraded.
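Item 4 describes merging the cluster bootstrap configuration into the user's .ign file. The Python sketch below shows one way such a merge could work on Ignition-style dictionaries; it is an assumption for illustration, not NKD's actual code. Files are keyed by `path` and systemd units by `name`, and a duplicate key is treated as an error rather than silently overwritten. The helper names and the bootstrap file path used in the usage example are hypothetical.

```python
import copy
from typing import Any

Config = dict[str, Any]


def _merge_keyed(base: list, extra: list, key: str) -> list:
    """Return base + extra, rejecting entries whose key already exists."""
    merged = list(base)
    seen = {item[key] for item in base}
    for item in extra:
        if item[key] in seen:
            raise ValueError(f"conflicting entry: {key}={item[key]!r}")
        merged.append(item)
        seen.add(item[key])
    return merged


def merge_cluster_config(user_cfg: Config, cluster_cfg: Config) -> Config:
    """Merge cluster bootstrap files and systemd units into a user config.

    Both inputs are deep-copied, so neither is modified and the result
    shares no nested objects with them.
    """
    result = copy.deepcopy(user_cfg)
    extra = copy.deepcopy(cluster_cfg)
    storage = result.setdefault("storage", {})
    storage["files"] = _merge_keyed(
        storage.get("files", []),
        extra.get("storage", {}).get("files", []),
        "path",
    )
    systemd = result.setdefault("systemd", {})
    systemd["units"] = _merge_keyed(
        systemd.get("units", []),
        extra.get("systemd", {}).get("units", []),
        "name",
    )
    return result
```

For example, merging a user config that only defines a `nest` user with a cluster config containing a hypothetical `/etc/kubernetes/bootstrap.yaml` file and a `kubelet.service` unit yields a config with all three. Failing on conflicts is a deliberate choice: if the user's configuration already writes a file the cluster bootstrap needs, silently picking one would make node behavior depend on merge order.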
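The one-node-at-a-time upgrade in item 6, in which each node's container workloads are evicted before it is upgraded, can be sketched as follows. This is a hypothetical Python model, not housekeeper's implementation: the four callables stand in for real operations such as cordoning and draining a node through the Kubernetes API and updating its operating system, and the rollout stops at the first failure so that a bad image affects at most one node.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RollingUpgrade:
    """Upgrade nodes strictly one at a time (hypothetical sketch).

    The callables are placeholders for real cluster operations.
    """
    cordon: Callable[[str], None]
    drain: Callable[[str], None]
    upgrade: Callable[[str], bool]  # True if the node came back healthy
    uncordon: Callable[[str], None]
    done: list[str] = field(default_factory=list)

    def run(self, nodes: list[str]) -> list[str]:
        for node in nodes:
            self.cordon(node)   # stop new pods from being scheduled here
            self.drain(node)    # evict the workloads currently running
            if not self.upgrade(node):
                # Halt the rollout and leave the node cordoned for inspection.
                raise RuntimeError(f"upgrade failed on {node}; rollout halted")
            self.uncordon(node)  # return the node to service
            self.done.append(node)
        return self.done
```

Processing nodes sequentially trades upgrade speed for safety; a real controller might allow a configurable number of nodes to be upgraded in parallel.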
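Item 5 says NKD keeps checking cluster status until the cluster is up. A generic polling loop with a deadline is sketched below in Python. `wait_until_ready` and its `is_ready` probe are hypothetical (in practice the probe might, for instance, query the kube-apiserver `/readyz` endpoint), and the sleep and clock functions are injectable so the loop can be tested without real waiting. It is an illustrative assumption, not NKD's actual health-check module.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool], timeout_s: float,
                     interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns True as soon as the probe succeeds, or False once the
    deadline has passed without success.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Using `time.monotonic` as the default clock keeps the deadline correct even if the system wall clock is adjusted while the cluster is being created.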

With this end-to-end solution, users can create a cluster and carry out subsequent updates with one click through NKD alone, without cumbersome manual steps, which simplifies operations. The next section briefly introduces the NKD architecture and future plans.

4. NKD's overall plan

Figure: NKD overall architecture and its interaction with the cluster

The overall architecture of NKD consists of several components: NKDS (NestOS-kubernetes-deployer-service) as the main body, HKO (housekeeper operator) deployed in the cluster, and the installer integrated in the NestOS image. NKD can also work with the NestOS image build toolchain, a configuration management repository (such as git), and a privately deployed container image registry to carry out cluster operations tasks together. NKDS is currently provided as a command-line tool and does not yet offer an external HTTP interface or a front-end configuration page, but the modules its main functions require, including infrastructure management, configuration management, system image management, certificate management, and health checking, have taken initial shape. Housekeeper consists mainly of the cluster-facing HKO component and the HKD (housekeeper daemon) component integrated in the NestOS image. The installer component currently deploys and creates the Kubernetes cluster during the system's Ignition phase; its functions are planned to move into HKD, which will streamline the overall solution and make it easier for users to manage the Kubernetes components they need according to their own requirements.

The ultimate goal of NKD is to provide operations and maintenance as a long-running resident service that can manage multiple clusters, with persistent records of configuration changes, certificate management, multiple update and upgrade strategies, and image source channels. We will continue to optimize NKD's functionality and performance and introduce more intelligent features, such as automatic fault handling and resource optimization.
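The infrastructure management module has to talk to different IaaS back ends, with OpenStack implemented first. One common way to structure this is a provider interface, sketched here in Python. `InfraProvider`, `NodeSpec`, `InMemoryProvider`, and `provision_cluster` are hypothetical names rather than NKD's API; the in-memory provider stands in for a real cloud, and `provision_cluster` deletes already-created machines if a later one fails so that a half-built cluster is not left behind.

```python
import itertools
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    name: str
    role: str      # "master" or "worker"
    flavor: str    # provider-specific size, e.g. an OpenStack flavor name
    ignition: str  # rendered Ignition config passed to the machine


class InfraProvider(ABC):
    """Hypothetical plug-in boundary for IaaS back ends."""

    @abstractmethod
    def create_node(self, spec: NodeSpec) -> str:
        """Provision one machine and return its provider-side ID."""

    @abstractmethod
    def delete_node(self, node_id: str) -> None:
        """Release a machine created by create_node."""


class InMemoryProvider(InfraProvider):
    """Test double that records machines instead of calling a real cloud."""

    def __init__(self) -> None:
        self.nodes: dict[str, NodeSpec] = {}
        self._ids = itertools.count(1)  # IDs are never reused after deletion

    def create_node(self, spec: NodeSpec) -> str:
        node_id = f"vm-{next(self._ids)}"
        self.nodes[node_id] = spec
        return node_id

    def delete_node(self, node_id: str) -> None:
        del self.nodes[node_id]


def provision_cluster(provider: InfraProvider, specs: list[NodeSpec]) -> list[str]:
    """Create every node; if one fails, delete those already created."""
    created: list[str] = []
    try:
        for spec in specs:
            created.append(provider.create_node(spec))
    except Exception:
        for node_id in reversed(created):
            provider.delete_node(node_id)
        raise
    return created
```

Keeping cloud-specific code behind `InfraProvider` means bare-metal or other virtualization back ends could be added later without changing the provisioning logic.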
Our goal is to make NKD a core component of the NestOS ecosystem, to provide comprehensive support for operations and maintenance work in cloud-native scenarios, and to further promote the development and adoption of cloud-native technology.

5. Summary

As an operating system designed specifically for cloud-native scenarios, NestOS greatly helps container cloud operations and maintenance, and NKD addresses the new problems that NestOS introduces into operations. Through continuous optimization and innovation, NKD will make it easier for the industry to adopt immutable infrastructure as represented by NestOS. At the same time, as the Kubernetes community develops, more innovative solutions will drive cloud-native technology toward a more mature and sustainable future. You are welcome to visit the NestOS project website (https://nestos.openeuler.org/) and the NKD project homepage (https://gitee.com/openeuler/nestos-kubernetes-deployer) to exchange ideas and join the discussion.


Origin blog.csdn.net/openEuler_/article/details/132182762