Elastic scaling

In the Pod Orchestration and Scheduling chapter, we introduced controllers such as Deployment that manage the number of Pod replicas. By adjusting the replicas field you can scale an application manually. In practice, however, manual adjustment is both cumbersome and slow, especially when you need to respond quickly to traffic spikes.

Kubernetes supports automatic elastic scaling of both Pods and cluster nodes. By defining scaling rules, Pods and cluster nodes are scaled automatically when an external condition (such as CPU usage) reaches a configured threshold.

Prometheus and Metrics Server

To scale automatically, the prerequisite is being able to observe runtime data such as the CPU and memory usage of cluster nodes, Pods, and containers. Kubernetes does not implement this monitoring capability itself; instead, it is provided by other projects.

  • Prometheus is an open-source system monitoring and alerting framework that can collect a rich set of metrics. It has become the de facto standard monitoring solution for Kubernetes.
  • Metrics Server is an aggregator of cluster-wide resource usage data. It collects metrics from the Summary API exposed by the kubelet, covering core Kubernetes resources such as Pods, Nodes, and containers, and exposes them through a set of standard APIs (see the example after this list).
    Using HPA (Horizontal Pod Autoscaler) together with Metrics Server enables automatic scaling based on CPU and memory usage, and combining it with Prometheus also enables automatic scaling on custom metrics.
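
As a quick check that Metrics Server is running and serving data, you can query the resource metrics it aggregates with kubectl top (this assumes Metrics Server has been deployed in the cluster; the output will vary):

$ kubectl top nodes    # CPU and memory usage of each node
$ kubectl top pods     # CPU and memory usage of each Pod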

HPA working mechanism

HPA (Horizontal Pod Autoscaler) is a controller that scales Pods horizontally. It periodically checks Pod metrics, calculates the number of replicas required to meet the target value configured in the HPA resource, and then adjusts the replicas field of the target resource (such as a Deployment).

Figure 1 HPA working mechanism


HPA can be configured with a single metric or with multiple metrics. With a single metric, the current metric values of the Pods are summed, divided by the desired target value, and the result is rounded up to obtain the desired number of replicas. For example, suppose a Deployment controls 3 Pods whose CPU utilizations are 70%, 50%, and 90%, and the target configured in the HPA is 50%. The desired number of replicas = (70 + 50 + 90) / 50 = 4.2, which rounds up to 5, so the desired number of replicas is 5.

If multiple metrics are configured, the desired number of replicas is calculated separately for each metric, and the maximum of those values is used as the final desired number of replicas.
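
For example, a metrics section that targets both CPU and memory could look like the following sketch (the 70% and 80% values are only illustrative; the format matches the autoscaling/v2beta1 example used later in this section):

  metrics:
  - resource:
      name: cpu
      targetAverageUtilization: 70
    type: Resource
  - resource:
      name: memory
      targetAverageUtilization: 80
    type: Resource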

Use HPA

The following example demonstrates how to use HPA. First, create a Deployment with 4 replicas from the Nginx image.
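
A manifest along the following lines can be used (a sketch; note that the container must declare a CPU request, otherwise the HPA cannot calculate CPU utilization for it):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          requests:
            cpu: 100m          # required for utilization-based scaling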

$ kubectl get deploy
NAME               READY     UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   4/4       4            4           77s

$ kubectl get pods
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-7cc6fd654c-5xzlt   1/1       Running   0          82s
nginx-deployment-7cc6fd654c-cwjzg   1/1       Running   0          82s
nginx-deployment-7cc6fd654c-dffkp   1/1       Running   0          82s
nginx-deployment-7cc6fd654c-j7mp8   1/1       Running   0          82s

Next, create an HPA that targets 70% CPU utilization and allows the replica count to range from 1 to 10.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: scale
  namespace: default
spec:
  maxReplicas: 10                    # maximum number of replicas of the target resource
  minReplicas: 1                     # minimum number of replicas of the target resource
  metrics:                           # metrics; the target CPU utilization is 70%
  - resource:
      name: cpu
      targetAverageUtilization: 70
    type: Resource
  scaleTargetRef:                    # the target resource
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment

Create the HPA and then view it.

$ kubectl create -f hpa.yaml
horizontalpodautoscaler.autoscaling/scale created

$ kubectl get hpa
NAME      REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
scale     Deployment/nginx-deployment   0%/70%    1         10        4          18s

As you can see, the TARGETS column shows a target of 70% but an actual value of 0%, so the HPA will scale in. The desired number of replicas = (0 + 0 + 0 + 0) / 70 = 0, but because the minimum replica count is 1, the number of Pods is adjusted to 1. After waiting a while, you can see that only one Pod remains.

$ kubectl get pods
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-7cc6fd654c-5xzlt   1/1       Running   0          7m41s

View the HPA details; you can see a record like the following under Events. It means that 21 seconds ago the HPA successfully scaled in, setting the new Pod count to 1, because all metrics were below the target value.

$ kubectl describe hpa scale
...
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  21s   horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

If you look at the Deployment details, you can see a record like the following under Events. It shows that the Deployment's replica count was set to 1, which is consistent with what the HPA reports.

$ kubectl describe deploy nginx-deployment
...
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  7m    deployment-controller  Scaled up replica set nginx-deployment-7cc6fd654c to 4
  Normal  ScalingReplicaSet  1m    deployment-controller  Scaled down replica set nginx-deployment-7cc6fd654c to 1

Cluster Autoscaler

HPA works at the Pod level; when the cluster itself runs out of resources, the only option is to add nodes. Elastic scaling of cluster nodes used to be quite troublesome, but most clusters today run on a cloud, where nodes can be added and removed through API calls, which makes node scaling very convenient.

Cluster Autoscaler is the node autoscaling component provided by Kubernetes. It automatically adds and removes cluster nodes based on Pod scheduling status and resource usage. Because it works by calling cloud provider APIs, its implementation and usage differ across environments, so it is not covered in detail here.

For elastic scaling of cluster nodes on Huawei Cloud CCE, see Creating a Node Scaling Policy.
