Managing Large-Scale Kubernetes Clusters Effectively and Reliably

Author | Cangmo, Senior Technical Engineer at Ant Financial

# Preface

With its advanced design concepts and excellent technical architecture, Kubernetes has come out on top in the field of container orchestration. More and more companies have begun to run Kubernetes in production, and Alibaba and Ant Financial already use Kubernetes extensively in their production environments. Kubernetes makes it possible for ordinary engineers to operate complex distributed systems and greatly lowers the barrier to deploying containerized applications, but operating and managing a production-grade, highly available Kubernetes cluster remains very difficult. This article shares how Ant Financial manages large-scale Kubernetes clusters effectively and reliably, and describes the core components of the cluster management system in detail.

# System Overview

A Kubernetes cluster management system needs convenient cluster lifecycle management capabilities: creating a complete cluster, upgrading it, and managing its worker nodes. At large scale, the controllability of cluster changes is directly tied to cluster stability, so the management system's monitoring, canary (gray) release, and rollback capabilities are key points of the design. Moreover, in ultra-large clusters, where node counts reach the order of 10K, node hardware failures and component anomalies are the norm. A management system for clusters of this scale must account for these abnormal scenarios from the very start of its design and be able to recover from them automatically.

## Design Pattern

Against this background, we designed a cluster management system oriented toward the final (desired) state.
The system periodically probes the current state of the cluster and checks whether it matches the target state. When they are inconsistent, Operators initiate a series of operations to drive the cluster to the target state. This design follows the classic negative-feedback closed-loop control pattern from control theory: a closed-loop system can effectively resist external disturbances, which in our scenario correspond to node hardware and software failures.

![file](https://img2018.cnblogs.com/blog/1411156/201908/1411156-20190830112950348-62877794.jpg)

## Architecture

![file](https://img2018.cnblogs.com/blog/1411156/201908/1411156-20190830112950631-1659168133.jpg)

As shown above, the meta cluster is a highly available Kubernetes cluster that manages the Master nodes of N business clusters, while the business clusters are the Kubernetes clusters that serve production workloads. SigmaBoss is the cluster management portal, providing users with a convenient interface and a controlled change process.
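As a minimal sketch of this feedback loop (in Go, with entirely hypothetical names; this is not the actual Ant Financial implementation), the core of a final-state controller reduces to comparing observed state against desired state and emitting corrective actions:

```go
package main

import "fmt"

// NodeState is the desired or observed state of a node, reduced to a
// single version string for illustration.
type NodeState struct {
	KubeletVersion string
}

// reconcile compares observed state with the desired (final) state and
// returns the actions needed to close the gap. In the real system an
// Operator would execute these actions and then re-check, forming the
// negative-feedback loop described above.
func reconcile(desired, observed NodeState) []string {
	var actions []string
	if observed.KubeletVersion != desired.KubeletVersion {
		actions = append(actions,
			fmt.Sprintf("upgrade kubelet %s -> %s",
				observed.KubeletVersion, desired.KubeletVersion))
	}
	return actions // an empty list means the node is at its final state
}

func main() {
	desired := NodeState{KubeletVersion: "v1.14.3"}
	observed := NodeState{KubeletVersion: "v1.12.6"}
	fmt.Println(reconcile(desired, observed))
	fmt.Println(reconcile(desired, desired)) // already converged: no actions
}
```

Because the loop only ever compares "what is" to "what should be", a node disturbed by a hardware or software fault is pulled back to the target state on the next cycle without any special-case handling.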
Cluster-Operator, deployed in the meta cluster, provides the ability to create, delete, and update business clusters. It follows the final-state design: when a Master node or component of a business cluster becomes abnormal, it automatically isolates and repairs it, keeping the business cluster's Master nodes stably in their final state. This scheme uses Kubernetes to manage Kubernetes; we call it the Kube-on-Kube scheme, or KOK for short.
Machine-Operator and the node self-healing components, deployed in the business cluster, manage the cluster's worker nodes, providing node addition, removal, upgrade, and fault-handling capabilities. Machine-Operator keeps individual nodes at their final state, and SigmaBoss builds cluster-level canary change and rollback capabilities on top of it.

# Core Components

## Final-state keeping for clusters

Based on Kubernetes CRDs, a Cluster CRD is defined in the meta cluster to describe the final state of a business cluster. Each business cluster corresponds to one Cluster resource, so creating, deleting, or updating a Cluster resource creates, deletes, or updates the corresponding business cluster. Cluster-Operator watches Cluster resources and drives the business cluster's Master components to the final state described by the Cluster resource.
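A minimal Go sketch of what such a Cluster resource might look like (the field names are assumptions for illustration; the real CRD schema is not public):

```go
package main

import "fmt"

// ClusterSpec is a hypothetical desired state for one business cluster.
type ClusterSpec struct {
	MasterVersion  string // desired version of the Master component bundle
	MasterReplicas int    // desired number of Master nodes
}

// ClusterStatus is the observed state recorded by the controller.
type ClusterStatus struct {
	Phase string // e.g. "Provisioning", "Running", "Repairing"
}

// Cluster mirrors the one-resource-per-business-cluster mapping.
type Cluster struct {
	Name   string
	Spec   ClusterSpec
	Status ClusterStatus
}

// needsMasterRollout reports whether the Master components must be
// redeployed: the controller compares the version recorded in spec
// with the version actually running in the business cluster.
func needsMasterRollout(c Cluster, runningVersion string) bool {
	return c.Spec.MasterVersion != runningVersion
}

func main() {
	c := Cluster{
		Name: "business-cluster-1",
		Spec: ClusterSpec{MasterVersion: "v2", MasterReplicas: 3},
	}
	fmt.Println(needsMasterRollout(c, "v1")) // spec moved ahead: rollout needed
	fmt.Println(needsMasterRollout(c, "v2")) // already converged
}
```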
The versions of a business cluster's Master components are maintained centrally in the ClusterPackageVersion CRD. A ClusterPackageVersion resource records the images and default startup parameters of the Master components (such as api-server, controller-manager, scheduler, and operators). A Cluster resource is associated with exactly one ClusterPackageVersion, and changing the ClusterPackageVersion recorded in the Cluster CRD releases or rolls back the business cluster's Master components.

## Final-state keeping for nodes

The node management tasks for a Kubernetes cluster's worker nodes are:

- managing the node's system configuration and kernel patches
- installing, upgrading, and uninstalling components such as Docker and kubelet
- managing the node's final state and schedulability (for example, only opening scheduling after key DaemonSets have finished deploying)
- node self-healing

To carry out these tasks, a Machine CRD is defined in the business cluster to describe the final state of a worker node. Each worker node corresponds to one Machine resource, and nodes are managed by modifying their Machine resources. In the Machine CRD, shown below, spec describes the names and versions of the components to install on the node, and status records the current state of each component on the node. The Machine CRD also provides a pluggable final-state management capability for coordinating with other node-management Operators; this part is described in detail later.

![file](https://img2018.cnblogs.com/blog/1411156/201908/1411156-20190830112950965-429669031.jpg)

Version management of the components on a worker node is done through the MachinePackageVersion CRD. A MachinePackageVersion maintains the rpm version, configuration, and installation method of one component. A Machine resource is associated with N different MachinePackageVersions, which is how installing multiple components is implemented.
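A minimal Go sketch of this spec/status shape (field names are hypothetical, not the actual CRD schema): spec lists the components to install, status records what is actually installed, and their difference is exactly what the controller must act on:

```go
package main

import "fmt"

// ComponentRef names one component and the version that should be
// installed, analogous to a Machine/MachinePackageVersion association.
type ComponentRef struct {
	Name    string // e.g. "docker", "kubelet"
	Version string
}

// MachineSpec is the desired state of one worker node.
type MachineSpec struct {
	Components []ComponentRef
}

// MachineStatus records the observed state of the node.
type MachineStatus struct {
	Installed map[string]string // component name -> installed version
}

// pendingComponents returns the components whose installed version does
// not yet match spec; a Machine controller would act on exactly this list.
func pendingComponents(spec MachineSpec, status MachineStatus) []ComponentRef {
	var out []ComponentRef
	for _, c := range spec.Components {
		if status.Installed[c.Name] != c.Version {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	spec := MachineSpec{Components: []ComponentRef{
		{Name: "docker", Version: "18.09"},
		{Name: "kubelet", Version: "v1.14.3"},
	}}
	status := MachineStatus{Installed: map[string]string{"docker": "18.09"}}
	fmt.Println(pendingComponents(spec, status)) // kubelet is still pending
}
```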
On top of the Machine and MachinePackageVersion CRDs, we designed and implemented Machine-Operator, the node final-state controller. Machine-Operator watches Machine resources, interprets the associated MachinePackageVersions, and performs operations on the node to drive it to its final state and continuously keep it there.

## Pluggable final-state management for nodes

As business demands grow, node management is no longer limited to installing components such as Docker and kubelet. New requirements keep appearing, such as waiting for the log-collection DaemonSet to finish deploying before opening scheduling. If all of these final states were managed by Machine-Operator alone, the coupling between Machine-Operator and the other components would grow, and the system's extensibility would suffer. We therefore designed a pluggable node final-state management mechanism to coordinate Machine-Operator with the other node-management Operators. The design is shown below:

![file](https://img2018.cnblogs.com/blog/1411156/201908/1411156-20190830112951284-627124307.jpg)

- full ReadinessGates: records the complete list of Conditions a node must pass before scheduling may be opened
- Condition ConfigMap: a per-node ConfigMap through which each node-management Operator reports its own final-state status
The coordination works as follows:

1. The external node-management Operators detect and report their own sub-final-state to the corresponding condition in the node's Condition ConfigMap.
2. Machine-Operator collects all of the node's associated sub-final-state condition ConfigMaps by label and synchronizes the conditions into the Machine resource's status.
3. Against the full condition list recorded in ReadinessGates, Machine-Operator checks whether the node has reached its final state; a node that has not reached its final state is not opened for scheduling.

## Node self-healing

As everyone knows, physical machine hardware fails with a certain probability. As the number of nodes in a cluster grows, node failures become the norm, and if failed nodes are not repaired and returned to service promptly, their physical resources sit idle. To solve this problem, we designed a closed-loop self-healing system of fault discovery, isolation, and repair.

For fault discovery, we combine Agent-based reporting with active probing by the monitoring system, ensuring both the timeliness and the reliability of fault detection (the Agent reports in closer to real time, while the monitoring system's active probing covers scenarios where the Agent fails to report an anomaly). Fault information is stored in a unified event center, and any component or system that cares about cluster faults can subscribe to the event center to obtain these fault events.

![file](https://img2018.cnblogs.com/blog/1411156/201908/1411156-20190830112951772-1556384542.jpg)

The node self-healing system creates a different repair workflow for each fault type, for example a hardware repair workflow or an OS reinstallation workflow.
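The fault-type-to-workflow dispatch can be sketched as follows (in Go; the fault-type and workflow names here are made up for illustration and are not the real system's taxonomy):

```go
package main

import "fmt"

// repairWorkflow picks a repair workflow for a given fault type, as the
// self-healing system does when it consumes fault events.
func repairWorkflow(faultType string) string {
	switch faultType {
	case "disk-failure", "memory-failure":
		return "hardware-repair"
	case "os-corruption":
		return "reinstall-system"
	default:
		// Faults with no automated workflow fall through to humans,
		// matching the manual-intervention path described below.
		return "manual-investigation"
	}
}

func main() {
	for _, f := range []string{"disk-failure", "os-corruption", "unknown-fault"} {
		fmt.Printf("%s -> %s\n", f, repairWorkflow(f))
	}
}
```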
A repair workflow first isolates the failed node (pausing scheduling on it), then labels the node's Pods as to-be-migrated so that the PaaS layer or the MigrateController migrates the Pods away. Once these preparatory steps are complete, it attempts to recover the node (hardware repair, OS reinstallation, and so on). A successfully repaired node is reopened for scheduling, while a node that has not been automatically recovered for a long time is escalated for manual investigation and handling.

![file](https://img2018.cnblogs.com/blog/1411156/201908/1411156-20190830112952091-945279333.jpg)

## Risk prevention

On top of the atomic capabilities provided by Machine-Operator, the system designs and implements cluster-dimension canary change and rollback capabilities. In addition, to further reduce the risk of changes, Operators perform a risk assessment before initiating any real change; the architecture is shown below.

![file](https://img2018.cnblogs.com/blog/1411156/201908/1411156-20190830112952641-128298192.jpg)

High-risk change operations (for example, deleting a node or reinstalling its OS) go through a unified rate-limit center, which maintains throttling policies for different types of operations; if a limit is triggered, the change is circuit-broken. To assess whether a change process is healthy, we run health checks on every component before and after the change. Although component health checks can catch most anomalies, they cannot cover every abnormal scenario, so during risk assessment the system also pulls business metrics for the cluster (such as the Pod creation success rate) from the event center and the monitoring system; if a metric becomes abnormal, the change is automatically circuit-broken.
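A minimal sketch of the rate-limit-and-circuit-break idea (in Go; the policy shape and names are assumptions, not the real rate-limit center's design): each high-risk operation type gets a budget per window, and exhausting it breaks the change:

```go
package main

import "fmt"

// Limiter is a hypothetical per-operation-type budget for one window.
type Limiter struct {
	limits map[string]int // operation type -> max ops allowed in the window
	used   map[string]int // operation type -> ops consumed so far
}

func NewLimiter(limits map[string]int) *Limiter {
	return &Limiter{limits: limits, used: map[string]int{}}
}

// Allow consumes one unit of budget for op; it returns false when the
// budget is exhausted, i.e. when the change should be circuit-broken.
func (l *Limiter) Allow(op string) bool {
	if l.used[op] >= l.limits[op] {
		return false
	}
	l.used[op]++
	return true
}

func main() {
	lim := NewLimiter(map[string]int{"reinstall-system": 2})
	for i := 0; i < 3; i++ {
		// The third reinstall in this window exceeds the budget.
		fmt.Println(lim.Allow("reinstall-system"))
	}
}
```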
# Conclusion

This article shared the core designs of Ant Financial's Kubernetes cluster management system at its current stage; its core components make extensive use of the final-state Operator design pattern. Going forward, we will try to move cluster scaling changes to the final-state Operator pattern as well, and explore how, under the final-state model, changes can be made monitorable, canary-released, rolled back, and run unattended.


Origin www.cnblogs.com/alisystemsoftware/p/11434085.html