Best practices for high-availability architecture in cloud-native scenarios

Author: Liu Jiaxu (nickname: Jiaxu), Alibaba Cloud Container Service technical expert

Introduction

With the rapid development of cloud-native technology and its in-depth adoption in enterprise IT, high-availability architecture in cloud-native scenarios has become increasingly important to the availability, stability, and security of enterprise services. Through sound architectural design and the technical support of the cloud platform, a cloud-native high-availability architecture offers high availability, elastic scalability, simplified operations and maintenance, and improved reliability and security, providing enterprises with a more reliable and efficient application runtime environment.

Kubernetes is one of the core technologies of cloud native. It provides container orchestration and management capabilities, including infrastructure automation, elastic scalability, microservice architecture, and automated operations. The high-availability architecture of applications on Kubernetes is therefore the cornerstone of cloud-native high availability. Taking Alibaba Cloud Container Service for Kubernetes (ACK) as an example, this article introduces best practices for the high-availability architecture and governance of applications running on ACK.

Application high availability architecture design

The high-availability architecture design of cloud-native applications is an important prerequisite for developing, deploying, and governing highly available applications. It can be approached from the following aspects:

1. Cluster design: Deploy the components and nodes of the cluster control plane and data plane with multiple nodes and multiple replicas to ensure the high availability of the Kubernetes cluster itself. Taking ACK as an example, it provides cluster high-availability capabilities covering both the control plane and the data plane. On the control plane, the control plane components of ACK Pro managed clusters are deployed across availability zones with multiple replicas and scale automatically based on control plane load; ACK dedicated clusters can be configured with 3 or 5 master nodes. On the data plane, ACK lets users deploy and add nodes across availability zones and ECS deployment sets.

2. Container design: Deploy applications with multiple replicas in the cluster and manage the replicas with Deployment, StatefulSet, or OpenKruise CRDs to achieve application high availability; configure automatic elasticity policies for applications to cope with dynamic load changes. For multi-replica Pods, depending on whether the replicas take primary and secondary roles, high availability can be divided into active-standby high availability and multi-active high availability.

3. Resource scheduling: Use the Kubernetes scheduler to achieve load balancing and failover for applications. Use labels and selectors to restrict the nodes an application can be deployed to, and use affinity, anti-affinity, and topology spread constraints to control the application's scheduling strategy, so that Pods are highly available at different levels such as node, availability zone, deployment set, and topology domain.

4. Storage design: Use persistent storage to save application data, for example by mounting storage through Kubernetes persistent volumes, to avoid data loss. For stateful applications, use StatefulSet to manage the replicas and their storage volumes.

5. Fault recovery: Use the automatic recovery mechanisms of Kubernetes to handle application failures. Health checks and automatic restarts can be used to monitor the health of the application and automatically restart or migrate it when a failure occurs.

6. Network design: Use the service discovery and load balancing features of Kubernetes for application network access, and use Service and Ingress to expose application services.

7. Monitoring and alerting: Use monitoring and alerting systems for Kubernetes (such as Prometheus, Thanos, and Alertmanager) to observe the running state of the application and discover and handle failures in a timely manner.

8. Full-link high-availability design: Full-link high availability means that every component, module, service, and network link involved in a cloud-native application system is highly available. It requires comprehensive consideration and implementation across the overall system architecture, component selection and configuration, service deployment, and operations, and appropriate trade-offs must be made based on specific business and technical requirements.

In short, designing a high-availability architecture for Kubernetes applications requires considering clusters, containers, resource scheduling, storage, fault recovery, networking, and monitoring and alerting together in order to build a reliable and stable application. Existing systems can also be retrofitted for high availability by following the same principles.

The following sections introduce the high-availability techniques provided by Kubernetes according to the design principles above, as well as the related product implementations based on ACK.

K8s high availability technology and its application in ACK

Kubernetes provides a variety of high-availability techniques and mechanisms to keep clusters and applications highly available, including topology spread constraints, Pod anti-affinity, container health checks and automatic restarts, storage redundancy and persistence, and service discovery and load balancing. These techniques and mechanisms help build highly available Kubernetes clusters and applications that provide stable and reliable services; they are introduced below.

3.1 Multi-availability-zone high availability for the control plane and data plane

Deploying control plane and data plane nodes and components across different availability zones is an important way to design a highly available cluster. Availability zones are logically independent data centers provided by a cloud vendor within a region. By deploying nodes in different availability zones, the cluster can continue to provide services even if one availability zone fails or becomes unavailable.

The following are the key points of availability-zone-based high-availability design for Kubernetes cluster nodes. They can be used to implement zone-aware high-availability configuration for control plane and data plane nodes:

  • Select multiple availability zones

    Choose multiple availability zones supported by your cloud vendor for your cluster to reduce the risk of single points of failure.

  • Deploy control plane nodes/components

    Deploy control plane nodes and components (such as etcd, kube-apiserver, kube-controller-manager, and kube-scheduler) in different availability zones, and use the cloud vendor's load balancer or DNS resolution to distribute traffic to them. Configure a highly available etcd cluster with multiple etcd members distributed across availability zones, so that the cluster can continue to work even if one etcd member fails.

  • Deploy data plane nodes

    Distribute data plane nodes in multiple availability zones to ensure that Pods can be scheduled to different availability zones to achieve cross-availability zone high availability.

  • Monitoring and auto-recovery

    Configure monitoring systems to observe the health of the cluster, and set up auto-recovery mechanisms so that failover and recovery happen automatically when a node or availability zone fails.

By deploying control plane nodes/components and data plane nodes across multiple availability zones, the availability and fault tolerance of the Kubernetes cluster are improved, and the cluster can continue to provide services even when a single availability zone or node fails.

ACK provides cluster high-availability capabilities covering both the control plane and the data plane. ACK uses a Kubernetes-on-Kubernetes architecture to host the control plane components of user clusters, including etcd, the API server, and the scheduler. Each control plane component runs with multiple instances managed in a high-availability architecture. If the region where an ACK Pro cluster resides has 3 or more availability zones, the control plane SLA of the ACK Pro managed cluster is 99.95%; if the region has 2 or fewer availability zones, the control plane SLA is 99.50%.

ACK is responsible for the high availability, security, and elastic scaling of these managed components. ACK Pro clusters also provide full observability for the managed components, helping users monitor cluster status and receive alerts.

The following introduces common techniques for spreading workloads across availability zones for high availability and their use in ACK. They are widely applied in data plane high-availability scenarios.

3.1.1 Topology Spread Constraints

Topology spread constraints are a Kubernetes feature for managing how Pods are distributed across a cluster. They ensure that Pods are spread evenly across different nodes and availability zones to improve application availability and stability. The mechanism applies to Pods created by workloads such as Deployment, StatefulSet, DaemonSet, and Job/CronJob.

With topology spread constraints, you can control the distribution of Pods through the following settings:

  • maxSkew

    Specifies the maximum permitted skew of Pods across topology domains. A topology domain can be a node, an availability zone, or another custom domain. maxSkew is a positive integer describing the maximum allowed difference between the number of matching Pods in any topology domain and the global minimum. For example, if maxSkew is set to 2, the number of Pods in any two topology domains may differ by at most 2.

  • topologyKey

    Specifies the node label key that defines the topology domain. You can use well-known node labels (such as topology.kubernetes.io/zone) or custom node labels.

  • whenUnsatisfiable

    Specifies what to do when the topology spread constraint cannot be satisfied. The available options are DoNotSchedule (do not schedule the Pod; the default) and ScheduleAnyway (schedule the Pod anyway, but give priority to nodes that minimize the skew).

With these settings, you can create topology spread policies to ensure that Pods are deployed according to the desired distribution. This is useful for applications that need to spread load evenly across availability zones or nodes to improve reliability and availability.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-run-per-zone
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-run-per-zone
  template:
    metadata:
      labels:
        app: app-run-per-zone
    spec:
      containers:
        - name: app-container
          image: app-image
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: "topology.kubernetes.io/zone"
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: app-run-per-zone

In the example above, a topology spread constraint is created with maxSkew set to 1, topologyKey set to "topology.kubernetes.io/zone", and whenUnsatisfiable set to DoNotSchedule (a labelSelector is included so the constraint counts the workload's own Pods). This means that Pods are strictly spread across the topology domains defined by the node label "topology.kubernetes.io/zone" (which corresponds to the cloud vendor's availability zones), and the number of Pods in any two domains may differ by at most 1. In this way, the workload's Pods are spread across availability zones as evenly as possible.

The control plane of an ACK cluster adopts a multi-availability-zone high-availability architecture by default. The next step is a multi-availability-zone architecture for the data plane. The data plane of an ACK cluster consists of node pools and virtual nodes. Each node pool is a group of ECS instances; through node pools users can manage, scale, and operate the nodes. Virtual nodes provide a serverless container runtime environment based on Elastic Container Instance (ECI).

Behind each node pool is an elastic scaling group (ESS) that supports both manual and automatic node scaling. ACK node pools support deployment sets, which place the ECS instances of the same deployment set on different physical servers, so that the failure of a single physical machine does not take down multiple ECS instances at once. ACK also supports multi-availability-zone node pools: when creating or operating a node pool, you can select multiple vSwitches spanning different availability zones and choose the balanced distribution strategy, so that ECS instances are evenly distributed across the availability zones specified for the scaling group (that is, across the selected VPC vSwitches). If the distribution becomes unbalanced due to insufficient inventory or other reasons, you can perform a rebalancing operation to even out the distribution of resources across availability zones.

Based on the metadata of the different fault domains (nodes, deployment sets, availability zones) combined with topology spread constraints in Kubernetes scheduling, different levels of fault-domain isolation can be achieved. All nodes in an ACK node pool automatically receive topology-related labels, such as "kubernetes.io/hostname", "topology.kubernetes.io/zone", and "topology.kubernetes.io/region". Developers can use topology spread constraints to control how Pods are distributed across these fault domains and improve tolerance to underlying infrastructure failures.
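
For example, you can check which availability zone and region each node belongs to directly from these labels (a read-only check; the -L flag simply adds the label values as extra output columns):

kubectl get nodes -L topology.kubernetes.io/zone -L topology.kubernetes.io/region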

3.1.2 Relationship between topology spread constraints and NodeAffinity/NodeSelector

NodeAffinity and NodeSelector are mainly used to constrain Pods to a specific set of nodes, while topology spread constraints focus on controlling how Pods are distributed across different topology domains. The two mechanisms can be combined for more fine-grained control over Pod scheduling and distribution.

NodeAffinity and NodeSelector can be used together with topology spread constraints: first use NodeAffinity or NodeSelector to select the nodes that meet specific conditions, and then use topology spread constraints to ensure that the Pods placed on those nodes follow the desired topology distribution.
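
The following is a minimal sketch of such a combination (the instance-type value ecs.g7.xlarge and the image name are illustrative): nodeAffinity first narrows the candidate nodes, and the topology spread constraint then spreads the matching Pods evenly across availability zones within that range.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-affinity-and-spread
spec:
  replicas: 6
  selector:
    matchLabels:
      app: app-affinity-and-spread
  template:
    metadata:
      labels:
        app: app-affinity-and-spread
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node.kubernetes.io/instance-type  # restrict the candidate nodes first
                    operator: In
                    values:
                      - ecs.g7.xlarge
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: "topology.kubernetes.io/zone"       # then spread evenly across zones within that range
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: app-affinity-and-spread
      containers:
        - name: app-container
          image: app-image

Because the scheduler skips nodes that do not match the Pod's nodeAffinity/nodeSelector when calculating the skew, the spread is evaluated only within the restricted node set.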

For details, see [1].

For more on combining NodeAffinity and NodeSelector with topology spread constraints, see [2].

3.1.3 Limitations of topology spread constraints

Topology spread constraints have some limitations to be aware of:

  1. Limited number of availability zones: if very few availability zones are available, or only one, topology spread constraints cannot deliver their benefits, and true cross-zone fault isolation and high availability cannot be achieved.

  2. When Pods are removed, there is no guarantee that the constraints remain satisfied. For example, scaling down a Deployment may leave the remaining Pods unevenly distributed.

  3. The scheduler has no prior knowledge of all the zones or other topology domains in the cluster; they are derived from the nodes that currently exist. This can cause problems in autoscaled clusters: when a node pool (or node group) is scaled down to zero nodes and the user expects the cluster to scale back up, that topology domain is not taken into account until it contains at least one node.

See [3] for more details.

3.1.4 Rapid, simultaneous scale-out across multiple availability zones

The ACK node auto-scaling component uses pre-scheduling to determine whether a pending workload can run in a given scaling group, then sends a request to that scaling group to increase the number of instances, and the ESS scaling group finally creates the instances. However, configuring vSwitches from multiple availability zones on a single scaling group leads to the following problem:

When business Pods spanning multiple availability zones cannot be scheduled due to insufficient cluster resources, the node auto-scaling service triggers a scale-out of the scaling group. However, the relationship between the availability zone and the instances to be created cannot be passed on to the ESS scaling group, so the group may keep creating instances in a single availability zone instead of creating them across multiple vSwitches at the same time. This fails to satisfy the need for simultaneous scale-out in multiple availability zones.

Multi-availability-zone balancing is a common deployment pattern in data plane high-availability scenarios. When business pressure grows, an application with a multi-availability-zone balanced scheduling policy expects new instances to be scaled out automatically in multiple availability zones to satisfy the cluster's scheduling demand.

To solve the problem of scaling out nodes in multiple availability zones at the same time, Container Service ACK introduces the ack-autoscaling-placeholder component, which uses a small amount of redundant resources to turn the multi-availability-zone elastic scaling problem into a concurrent, targeted scale-out of per-zone node pools.

The specific principles are as follows:

  1. First, create a node pool for each availability zone and label each node pool with its availability zone.

  2. Using the availability zone label as a nodeSelector, configure ack-autoscaling-placeholder to create placeholder Pods for each availability zone. The placeholder Pods use a PriorityClass with a relatively low value, so application Pods have higher priority than the placeholder Pods.

  3. In this way, when business Pods become Pending, they preempt the placeholder Pods in each availability zone. Once the placeholder Pods, which carry the availability zone nodeSelector, become Pending themselves, the scheduling requirement seen by the node auto-scaling component changes from anti-affinity to a per-zone nodeSelector, so the autoscaler can simply issue scale-out requests to the node pool of the corresponding availability zone.

Taking two availability zones as an example, this method uses the existing architecture to scale out multiple availability zones simultaneously; a minimal sketch of the placeholder pattern is shown below.
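
The following is a minimal, hand-written sketch of the placeholder pattern for a single availability zone (the names, resource sizes, and the registry.k8s.io/pause image are illustrative; in practice the ack-autoscaling-placeholder component generates equivalent objects from its Helm configuration):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: placeholder-low-priority
value: -10                 # lower than the default priority (0) of business Pods
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: placeholder-cn-hangzhou-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: placeholder-cn-hangzhou-a
  template:
    metadata:
      labels:
        app: placeholder-cn-hangzhou-a
    spec:
      priorityClassName: placeholder-low-priority
      nodeSelector:
        topology.kubernetes.io/zone: cn-hangzhou-a   # pin this placeholder to one availability zone
      containers:
        - name: placeholder
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "4"       # headroom that higher-priority business Pods can preempt
              memory: 8Gi

When a higher-priority business Pod with the same zone nodeSelector becomes Pending, it preempts the placeholder Pod; the evicted placeholder, still carrying the zone nodeSelector, then drives the autoscaler to scale out the node pool of that availability zone.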

Please refer to [4] for details.

3.1.5 Automatic recovery after availability zone capacity loss

When an availability zone of a multi-zone high-availability cluster fails, application capacity is usually reduced. Kubernetes automatically restores capacity according to the configured number of replicas or the HPA (Horizontal Pod Autoscaler) configuration, which requires the cluster to be configured with Cluster Autoscaler or virtual nodes so that cluster resources can be expanded automatically. When using Cluster Autoscaler, resources can also be reserved through over-provisioning, so that containers start quickly and business interruptions caused by slow or failed scale-out of the underlying infrastructure are avoided. Please refer to [5] for details.

3.2 Spreading application Pods across nodes with Pod anti-affinity (PodAntiAffinity)

Pod anti-affinity is a Kubernetes scheduling policy that keeps certain Pods from being scheduled onto the same node. It can be used to spread Pods across nodes and thereby improve application availability and fault isolation.

Pod anti-affinity can be configured in the following two ways to control how Pods are spread across nodes:

1. RequiredDuringSchedulingIgnoredDuringExecution

This is a hard requirement: the anti-affinity rule must be satisfied at scheduling time, so the scheduler will not place the Pods on the same node. The IgnoredDuringExecution part means that if the rule stops being satisfied for Pods that are already running (for example, because labels change after scheduling), those Pods are not evicted.

2. PreferredDuringSchedulingIgnoredDuringExecution

This is a soft preference: the scheduler tries to avoid placing the Pods on the same node, but it may still do so if no other feasible node is available.

With Pod anti-affinity, you can control the distribution of Pods in the cluster and avoid placing them on the same node. This is useful for applications that need to run Pods on different nodes to improve availability and fault isolation.

The following is an example of Pod anti-affinity:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-run-per-node
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-run-per-node
  template:
    metadata:
      labels:
        app: app-run-per-node
    spec:
      containers:
        - name: app-container
          image: app-image
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - app-run-per-node
              topologyKey: "kubernetes.io/hostname"

In the example above, an anti-affinity rule requires the scheduler to ensure that these Pods are not placed on the same node. The topology key is set to "kubernetes.io/hostname", so the Pods are spread across different nodes.

Note that Pod anti-affinity and node affinity (NodeAffinity) are different concepts: node affinity expresses a preference (or requirement) for scheduling Pods onto nodes with specific labels, while Pod anti-affinity keeps Pods away from the same node or other topology domain. Combining the two enables more flexible and fault-tolerant scheduling and node distribution.
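
As a minimal sketch of such a combination (the zone name and image are illustrative), the following Deployment prefers, but does not require, nodes in one availability zone, while a hard Pod anti-affinity rule still guarantees that no two replicas share a node:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-affinity-combo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-affinity-combo
  template:
    metadata:
      labels:
        app: app-affinity-combo
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - cn-hangzhou-a        # prefer this zone, but do not require it
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: app-affinity-combo
              topologyKey: "kubernetes.io/hostname"   # never co-locate two replicas on one node
      containers:
        - name: app-container
          image: app-image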

TopologySpreadConstraints can also achieve per-node high-availability scheduling: specifying topologyKey: "kubernetes.io/hostname" makes each node its own topology domain, so the skew is compared between nodes. The following is an example of such a topology spread constraint:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app-container
          image: my-app-image
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: "kubernetes.io/hostname"
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app

In the example above, a topology spread constraint is created with maxSkew set to 1, topologyKey set to "kubernetes.io/hostname", and whenUnsatisfiable set to DoNotSchedule. The number of this workload's Pods on any two nodes can therefore differ by at most one, which forces the Pods to be spread across nodes for high availability.

3.3 Application multi-replica high availability

In Kubernetes, multi-replica high availability of applications can be achieved through a variety of workloads, such as Deployment and StatefulSet, as well as advanced CRD-based workloads from OpenKruise. Taking Deployment as an example, a Deployment is a controller that defines the desired number of Pod replicas; it ensures that the specified number of replicas is always running and automatically creates new replicas when one fails.

3.3.1 Multi-active high availability

Multi-active high availability means that multiple replicas of an application can all receive traffic and process requests independently. The number of replicas can be controlled through HPA, and automatic elasticity triggered by load pressure can be used to adapt to dynamic traffic; the API server and the NGINX Ingress Controller are typical examples. This form of high availability needs to take data consistency and performance across the replicas into account.
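
A minimal sketch of such an elasticity policy (assuming the autoscaling/v2 API available since Kubernetes 1.23; the target name and thresholds are illustrative) scales the app-run-per-zone Deployment from the earlier example between 3 and 10 replicas based on CPU utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-run-per-zone-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-run-per-zone      # the multi-active workload to scale
  minReplicas: 3                 # never drop below the baseline capacity
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU exceeds 70%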

3.3.2 Active-standby high availability

Active-standby high availability means that the replicas of an application include an active (primary) replica and one or more standby replicas. The most common form is one active and one standby; more complex forms such as one active and multiple standbys also exist. The active replica is typically elected by acquiring a distributed lock. This pattern fits controller-style components; for example, etcd, kube-controller-manager, and kube-scheduler all run in active-standby mode.

Users can design their controllers as multi-active or active-standby according to their own business form and scenario.

3.3.3 Using PDBs to further improve application availability

To further improve application availability, a Pod Disruption Budget (PDB) can be configured in Kubernetes. A PDB lets users define a minimum number of available replicas; during voluntary disruptions such as node drains for maintenance, Kubernetes ensures that at least the specified number of replicas keeps running. A PDB prevents too many replicas from being terminated at the same time, which is especially useful when multiple active replicas handle traffic (for example, message queue products), and thereby avoids service interruption.

To use a PDB, add a PodDisruptionBudget resource alongside the Deployment or StatefulSet and specify the minimum number of available replicas. For example, the following Deployment is protected by a PDB:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-with-pdb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-with-pdb
  template:
    metadata:
      labels:
        app: app-with-pdb
    spec:
      containers:
        - name: app-container
          image: app-container-image
          ports:
            - containerPort: 80
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-for-app
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: app-with-pdb

In the example above, the Deployment defines 3 replicas and the PDB specifies that at least 2 replicas must remain available. Even while nodes are being drained for maintenance, Kubernetes keeps at least 2 replicas running, which improves the availability of the application in the cluster and reduces possible service interruptions.

3.4 Health detection and self-healing

In Kubernetes, the state and availability of containers can be monitored and managed by configuring different types of probes. The commonly used probe configurations and policies are:

  • Liveness probes: Liveness probes monitor whether the container is still running properly. If the liveness probe fails (for an HTTP probe, any response outside the success range), the kubelet restarts the container. By configuring a liveness probe, you ensure that the container is restarted automatically when a problem occurs. Liveness probes can use an HTTP request, a TCP socket, or a command execution.

  • Readiness probes: Readiness probes monitor whether the container is ready to receive traffic. Kubernetes forwards traffic to the container only when the readiness probe succeeds. By configuring readiness probes, you ensure that a container is added to the Service's load balancing only after it has fully started and is ready to accept requests.

  • Startup probes: Startup probes monitor whether the container application has finished starting. Unlike liveness and readiness probes, startup probes run during container startup, and the other probes are held back until the startup probe succeeds. If the startup probe keeps failing, Kubernetes considers the container failed and restarts it.

  • Restart policy: A restart policy, set at the Pod level (spec.restartPolicy), defines what happens when a container exits. Kubernetes supports three restart policies:


    • Always: Kubernetes will automatically restart the container no matter how it exits.
    • OnFailure: Kubernetes will automatically restart the container only if the container exits with a non-zero status.
    • Never: Kubernetes will not automatically restart the container no matter how it exits.

These can be configured by adding the corresponding probes and restart policy to the Pod or Deployment specification, for example:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-probe
spec:
  containers:
    - name: app-container
      image: app-image
      livenessProbe:
        httpGet:
          path: /health
          port: 80
        initialDelaySeconds: 10
        periodSeconds: 5
      readinessProbe:
        tcpSocket:
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 10
      startupProbe:
        exec:
          command:
            - cat
            - /tmp/ready
        initialDelaySeconds: 20
        periodSeconds: 15
  restartPolicy: Always

The example above configures a liveness probe (an HTTP GET request to path /health on port 80), a readiness probe (a TCP socket check on port 8080), a startup probe (the command cat /tmp/ready checks whether the container has finished starting), and a Pod-level restart policy of Always. The probes should be adapted to the characteristics and health-check requirements of the actual container.

3.5 Application and data decoupling

In Kubernetes, applications and data can be decoupled by using Persistent Volumes (PV) and Persistent Volume Claims (PVC), or by choosing an appropriate back-end database service for storage.

PVs and PVCs provide an abstraction layer that lets applications use storage independently of the underlying storage technology. Persistent Volumes are storage resources in the cluster that exist independently of Pods and nodes; Persistent Volume Claims are requests for Persistent Volumes that bind storage resources to an application's Pods.
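
A minimal sketch of this decoupling (names, sizes, and the image are illustrative; storageClassName is omitted so the cluster's default StorageClass is used to provision the volume dynamically):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-with-data
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-with-data
  template:
    metadata:
      labels:
        app: app-with-data
    spec:
      containers:
        - name: app-container
          image: app-image
          volumeMounts:
            - name: data
              mountPath: /var/lib/app      # the application only sees this path, not the storage backend
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: app-data-pvc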

To choose the right Persistent Volume Claims and Persistent Volumes, there are several factors to consider:

  • Storage type

    Choose the storage type according to your application's needs. Kubernetes supports a variety of storage types, including local storage, network storage (such as NFS and iSCSI), the cloud provider's persistent storage (such as Alibaba Cloud disks, NAS, or OSS), and external storage plugins (such as Ceph and GlusterFS). Choose based on the application's read/write performance, data protection, and availability requirements.

  • Storage capacity

    Choose the appropriate storage capacity based on your application's storage needs. When creating a Persistent Volume, you can specify the size range of the storage capacity. When creating a Persistent Volume Claim, you can specify the required storage capacity. Make sure you provide your application with enough storage space to meet its data storage needs.

  • Access mode

    Select the access mode according to how the application accesses the storage. Kubernetes supports multiple access modes, including single-node read-write (ReadWriteOnce), multi-node read-write (ReadWriteMany), and multi-node read-only (ReadOnlyMany). Choose according to the application's requirements for access from multiple nodes.

When choosing an appropriate backend data service, such as RDS, you need to consider the following factors:

  • Database types and functions

    Choose the appropriate database type based on your application's needs. Different database types such as relational databases (such as MySQL, PostgreSQL), NoSQL databases (such as MongoDB, Cassandra), etc. provide different functions and adaptability. Choose the appropriate database type based on your application's data model and query needs.

  • Performance and scalability

    Choose a backend data service based on your application's performance requirements. Consider the database's performance metrics (such as throughput, latency) and its ability to scale.

  • Availability and reliability

    Consider availability and reliability when choosing a backend data service. Cloud providers' managed database services typically offer high availability and automated backup capabilities. Make sure you choose a backend data service that meets your application's availability and data protection needs.

3.6 Load balancing high availability configuration

Classic Load Balancer (CLB) instances are deployed across multiple availability zones in most regions to provide cross-data-center disaster recovery within a region. You can use Service annotations to specify the primary and backup availability zones of the SLB/CLB instance. These zones should match the availability zones of the ECS nodes in the node pool, which reduces cross-zone data forwarding and improves network access performance.

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-master-zoneid: "cn-hangzhou-a"
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-slave-zoneid: "cn-hangzhou-b"
  name: nginx
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx
  type: LoadBalancer

To further reduce cross-availability-zone network traffic and improve performance, you can use Topology Aware Hints, introduced in Kubernetes 1.23, to implement topology-aware (same-zone-first) routing.
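
A minimal sketch of enabling it on a Service (assuming a cluster version where the annotation below is supported; in later Kubernetes releases it was renamed service.kubernetes.io/topology-mode):

apiVersion: v1
kind: Service
metadata:
  name: nginx-topology-aware
  annotations:
    service.kubernetes.io/topology-aware-hints: "auto"   # let EndpointSlices carry zone hints so kube-proxy prefers same-zone endpoints
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    run: nginx
  type: ClusterIP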

3.7 Cloud disk high availability configuration

Currently, an Alibaba Cloud disk can only be created and mounted within a single availability zone. To use cloud disks as persistent storage for applications in a multi-availability-zone cluster, use the topology-aware disk StorageClass alicloud-disk-topology [6] provided by ACK to create the PersistentVolumeClaim. The volumeBindingMode of this StorageClass defaults to WaitForFirstConsumer, which delays binding until the consuming Pod is scheduled, so the disk is created in the availability zone where the Pod runs. You can also create a more fine-grained ESSD StorageClass that specifies a multi-availability-zone topology for provisioning volumes. The StorageClass, the storage claim, and a persistent application are shown below; for more details, see [7].

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-disk-topology-essd
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_essd
  fstype: ext4
  diskTags: "a:b,b:c"
  zoneId: "cn-hangzhou-a,cn-hangzhou-b,cn-hangzhou-c" # availability zones in which disks may be created
  encrypted: "false"
  performanceLevel: PL1 # ESSD performance level
  volumeExpandAutoSnapshot: "forced" # take an automatic snapshot before volume expansion; only effective when type is cloud_essd
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topology-disk-pvc
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 100Gi
  storageClassName: alicloud-disk-topology-essd
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql"
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - image: mysql:5.6
        name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "mysql"
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: topology-disk-pvc

3.8 Virtual node high availability configuration

Virtual nodes allow Kubernetes applications to run on Elastic Container Instance (ECI), eliminating node operations and maintenance: instances are created on demand, which reduces the waste of reserved resources. When scaling out rapidly to handle traffic bursts, or when launching a large number of instances for batch jobs, you may encounter insufficient inventory of the requested instance specification in an availability zone or exhaustion of the IP addresses of the specified vSwitch, causing ECI instance creation to fail. Using the multi-availability-zone feature of ACK Serverless improves the success rate of creating ECI instances.

Applications on virtual nodes can be deployed across multiple availability zones by configuring vSwitches in different zones through the ECI Profile.

  • ECI distributes Pod creation requests across all configured vSwitches to spread the load.
  • If a Pod creation request hits a vSwitch with no inventory, ECI automatically switches to the next vSwitch and keeps retrying.

Modify the vswitchIds field in the eci-profile ConfigMap in the kube-system namespace as needed; the change takes effect immediately:

kubectl -n kube-system edit cm eci-profile
apiVersion: v1
data:
  kube-proxy: "true"
  privatezone: "true"
  quota-cpu: "192000"
  quota-memory: 640Ti
  quota-pods: "4000"
  regionId: cn-hangzhou
  resourcegroup: ""
  securitygroupId: sg-xxx
  vpcId: vpc-xxx
  vswitchIds: vsw-xxx,vsw-yyy,vsw-zzz
kind: ConfigMap

Related content can be found in [8].

3.9 Monitoring and alerting configuration

Through the metrics exposed by Kubernetes itself and by components such as kube-state-metrics, you can effectively monitor the availability of resources such as Pods and nodes and their distribution across availability zones, which is essential for quickly discovering and locating problems. In production, the monitoring and alerting system should be built and iterated continuously as Kubernetes and component versions are upgraded; it is recommended to keep track of newly exposed metrics and incorporate them into alerting rules according to your business scenarios. Two alerting configurations for resource availability are listed below for reference.

3.9.1 Alerting on unavailable workload replicas

kube-state-metrics exposes, among other things, the number of unavailable replicas and the total number of replicas for Deployment/StatefulSet/DaemonSet workloads. Based on these metrics you can detect whether an application has unavailable replicas and what fraction of the total they represent, and thus alert on services that are partially or completely impaired.

Taking Deployment as an example, sample alerting rules for Alertmanager/Thanos Ruler are as follows:

# Fire an alert with severity L1 when a Deployment in kube-system or monitoring has unavailable replicas for 1 minute
- alert: SystemPodReplicasUnavailable
  expr: kube_deployment_status_replicas_unavailable{namespace=~"kube-system|monitoring",deployment!~"ack-stub|kubernetes-kdm"} > 0
  labels:
    severity: L1
  annotations:
    summary: "namespace={{$labels.namespace}}, deployment={{$labels.deployment}}: Deployment has unavailable replicas"
  for: 1m
# Fire an alert with severity L1 when a Deployment in kube-system or monitoring has more than 0 replicas and all of them have been unavailable for 1 minute
- alert: SystemAllPodReplicasUnavailable
  expr: kube_deployment_status_replicas_unavailable{namespace=~"kube-system|monitoring"} == kube_deployment_status_replicas{namespace=~"kube-system|monitoring"} and kube_deployment_status_replicas{namespace=~"kube-system|monitoring"} > 0
  labels:
    severity: L1
  annotations:
    summary: "namespace={{$labels.namespace}}, deployment={{$labels.deployment}}: all replicas of the Deployment are unavailable"
  for: 1m

3.9.2 Alerting on the percentage of unhealthy nodes per availability zone

The kube-controller-manager component exposes the number of unhealthy nodes, the percentage of healthy nodes, and the total number of nodes per availability zone, which can be used to configure related alerts.

# HELP node_collector_unhealthy_nodes_in_zone [ALPHA] Gauge measuring number of not Ready Nodes per zones.
# TYPE node_collector_unhealthy_nodes_in_zone gauge
node_collector_unhealthy_nodes_in_zone{zone="cn-shanghai::cn-shanghai-e"} 0
node_collector_unhealthy_nodes_in_zone{zone="cn-shanghai::cn-shanghai-g"} 0
node_collector_unhealthy_nodes_in_zone{zone="cn-shanghai::cn-shanghai-l"} 0
node_collector_unhealthy_nodes_in_zone{zone="cn-shanghai::cn-shanghai-m"} 0
node_collector_unhealthy_nodes_in_zone{zone="cn-shanghai::cn-shanghai-n"} 0
# HELP node_collector_zone_health [ALPHA] Gauge measuring percentage of healthy nodes per zone.
# TYPE node_collector_zone_health gauge
node_collector_zone_health{zone="cn-shanghai::cn-shanghai-e"} 100
node_collector_zone_health{zone="cn-shanghai::cn-shanghai-g"} 100
node_collector_zone_health{zone="cn-shanghai::cn-shanghai-l"} 100
node_collector_zone_health{zone="cn-shanghai::cn-shanghai-m"} 100
node_collector_zone_health{zone="cn-shanghai::cn-shanghai-n"} 100
# HELP node_collector_zone_size [ALPHA] Gauge measuring number of registered Nodes per zones.
# TYPE node_collector_zone_size gauge
node_collector_zone_size{zone="cn-shanghai::cn-shanghai-e"} 21
node_collector_zone_size{zone="cn-shanghai::cn-shanghai-g"} 21
node_collector_zone_size{zone="cn-shanghai::cn-shanghai-l"} 21
node_collector_zone_size{zone="cn-shanghai::cn-shanghai-m"} 21
node_collector_zone_size{zone="cn-shanghai::cn-shanghai-n"} 21

A sample Alertmanager/Thanos Ruler alerting rule is as follows:

# Fire an alert when the percentage of healthy nodes in an availability zone drops to 80% or below.
- alert: HealthyNodePercentagePerZoneLessThan80
  expr: node_collector_zone_health <= 80
  labels:
    severity: L1
  annotations:
    summary: "zone={{$labels.zone}}: percentage of healthy nodes in the availability zone <= 80%"
  for: 5m

Single/multi-cluster high-availability architecture for applications

Based on the high-availability techniques introduced in Chapter 3 and the product capabilities provided by Alibaba Cloud, a high-availability architecture within a single cluster can be fully realized. A multi-cluster high-availability architecture builds on top of this and provides high-availability service capabilities across clusters and regions.

Through multi-region, multi-cluster deployment and a unitized application architecture, the challenges of cross-region network latency, cost, and failure rate can be addressed while maintaining high availability, providing users with a better experience while keeping the business stable and reliable.

4.1 Single-region, multi-availability zone high-availability cluster

Based on the high-availability techniques introduced in Chapter 3 and the capabilities of Alibaba Cloud products, a high-availability architecture within a single cluster can be realized; this is not repeated here.

4.2 Single-region multi-cluster high availability + multi-region multi-cluster high availability

First, each ACK cluster adopts a multi-availability-zone high-availability architecture, and business applications are deployed across availability zones and exposed externally through SLB.

Multi-region deployment is similar in nature to multi-availability-zone deployment, but the differences in network latency, cost, and failure rate between regions require different deployment and application architectures. At the platform layer, stretching a single Kubernetes cluster across regions is not recommended; instead, use multiple regional clusters combined with a unitized application architecture to achieve multi-region high availability. Deploy an independent Kubernetes cluster in each region, each managing its own nodes and applications with its own master and worker nodes. This reduces cross-region network latency and failure rates and improves application availability and performance.

At the same time, adopt a unitized application architecture: split the application into independent units and deploy replicas of each unit in the clusters of multiple regions. With load balancing and DNS resolution, user requests are routed to the nearest region, which reduces network latency and provides high availability and disaster recovery capabilities.

If ACK clusters in different regions need network interconnection, the multi-region VPCs can be connected through Cloud Enterprise Network (CEN). Cross-region business traffic scheduling can be implemented with Global Traffic Manager (GTM) and cloud DNS resolution services.

For unified management of multiple clusters across regions, such as observability, security policies, and unified application delivery across clusters, you can use ACK One. Distributed Cloud Container Platform for Kubernetes (ACK One) is an enterprise-level cloud-native platform launched by Alibaba Cloud for hybrid cloud, multi-cluster, distributed computing, disaster recovery, and similar scenarios. ACK One can connect and manage users' Kubernetes clusters in any region and on any infrastructure, and provides consistent management and community-compatible APIs for unified operations and control over compute, network, storage, security, monitoring, logging, jobs, applications, traffic, and more. To learn more about ACK One, you are welcome to join the DingTalk group (group number: 35688562).

ACK One can also be used to build a "two sites, three data centers" disaster recovery solution for application systems; for details, see [9].

Summary

High-availability architecture and design for applications in cloud-native scenarios are crucial to the availability, stability, and security of enterprise services. They effectively improve application availability and user experience and provide fault isolation and fault tolerance. This article introduced the key principles of high-availability architecture design for cloud-native applications, Kubernetes high-availability techniques and their use and implementation on ACK, and single-cluster and multi-cluster high-availability architectures, and we hope it serves as a useful reference for enterprises with similar needs. ACK will continue to provide customers with secure, stable, performance- and cost-optimized cloud-native products and services.

This article draws on the excellent sharing by Yi Li, head of Alibaba Cloud Container Service, on the analysis of ACK high-availability architecture, for which the author expresses sincere thanks.

Related Links:

[1] https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#interaction-with-node-affinity-and-node-selectors

[2] https://kubernetes.io/blog/2020/05/introducing-podtopologyspread/

[3] https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#known-limitations

[4] https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/configure-auto-scaling-for-cross-zone-deployment

[5] https://help.aliyun.com/document_detail/184995.html

[6] https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/dynamically-provision-a-disk-volume-by-using-the-cli-1#section-dh5-bd8-x0q

[7] https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/use-dynamically-provisioned-disk-volumes

[8] https://help.aliyun.com/zh/ack/serverless-kubernetes/user-guide/create-ecis-across-zones

[9] https://developer.aliyun.com/article/913027
