How does Kubernetes Autoscaling work?

How does Kubernetes Autoscaling work? This is a question we have been asked a lot recently.

This article explains how Kubernetes Autoscaling works and the advantages it provides when scaling a cluster.

What is Autoscaling?

Imagine filling two buckets from a faucet, and we want the water to start flowing into the second bucket once the first one is 80% full. The solution is simple: connect the two buckets with a pipe at the appropriate height. And when we want to hold more water, we just add more buckets in the same way.

The same is true for our applications and services. The elastic scaling offered by cloud computing frees us from manually adjusting physical servers or virtual machines. If we compare "water filling buckets" with "an application consuming computing resources":

  • The bucket - the unit of scaling - defines what we scale
  • The 80% mark - the metric and trigger for scaling - defines when we scale
  • The pipe - the mechanism that carries out the scaling - defines how we scale

What are we scaling?

In a Kubernetes cluster environment, as users we generally scale two things:

Pods - For a given application, say we run X replicas. When requests exceed the processing capacity of those X Pods, we need to scale the application out. For this process to work seamlessly, our Nodes should have enough resources available to successfully schedule and run the additional Pods;

Nodes - The total capacity of all Nodes represents our cluster capacity. If workload demand exceeds this capacity, we need to add Nodes to the cluster so that workloads can be scheduled and run efficiently. If Pods keep scaling out, the Nodes' available resources will eventually be exhausted, and we have to add more Nodes to increase the total resources available at the cluster level;
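Both forms of scaling ultimately come back to Pod resource requests, since the scheduler uses them to decide whether a Pod fits on a Node; a Pod whose requests cannot be satisfied stays Pending. Below is a minimal sketch of a Deployment with explicit requests - the application name, image, and request values are hypothetical:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-api                          # hypothetical application
spec:
  replicas: 3                                # the "X replicas" mentioned above
  selector:
    matchLabels:
      app: billing-api
  template:
    metadata:
      labels:
        app: billing-api
    spec:
      containers:
      - name: billing-api
        image: example.com/billing-api:1.0   # hypothetical image
        resources:
          requests:                          # what the scheduler reserves per Pod on a Node
            cpu: 500m
            memory: 256Mi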

When to scale?

Typically, we measure a metric continuously, and when the metric exceeds a threshold, act on it by scaling a resource. For example, we might need to measure the average CPU consumption of a Pod and then trigger a scaling operation when the CPU consumption exceeds 80%.

But no single metric fits every use case, and the right metric varies by type of application - for a message queue, the number of messages in the waiting state might be the metric; for memory-intensive applications, memory consumption may be more appropriate. If we had a business application where a Pod of a given size could handle roughly 1000 transactions per second, we would probably pick that metric and scale out when a Pod goes above 850.
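As a hedged sketch of what such a custom-metric rule can look like with the beta HPA API discussed later in this article (assuming a custom metrics adapter exposes a hypothetical per-Pod metric named queue_messages_ready for a hypothetical queue-worker Deployment):

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker                       # hypothetical workload
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: queue_messages_ready     # hypothetical metric served by a custom metrics adapter
      targetAverageValue: 100              # scale out when the per-Pod average exceeds 100 waiting messages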

So far we have only considered scaling out; when workload usage drops, there should also be a way to scale back down gracefully without disrupting requests that are already being processed.

How to scale?

For Pods, we just change the number of replicas in the Deployment or ReplicationController; for Nodes, we need a way to call the cloud provider's API, create a new instance, and make it part of the cluster.

Kubernetes Autoscaling

With the above understanding in place, let's look at the concrete implementation and technology behind Kubernetes Autoscaling.

Cluster Autoscaler

The Cluster Autoscaler is used to dynamically scale the cluster (Nodes). It continuously monitors Pods, and once it finds Pods that cannot be scheduled, it scales up based on that PodCondition. This approach is much more effective than looking at CPU percentages across the cluster. Since Nodes take a minute or more to create (depending on the cloud provider and other factors), it may take some time before the pending Pods can be scheduled.

Within a cluster we may have multiple node pools, for example one pool for billing applications and another for machine-learning workloads. The Cluster Autoscaler provides various flags and ways to tune node scaling behavior; see https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md for more details.
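For illustration, here is a minimal sketch of how the Cluster Autoscaler is typically deployed with per-pool size limits - the node-group names, the AWS cloud provider, the image tag, and the RBAC service account are assumptions and will differ per setup (the exact flag set depends on the Cluster Autoscaler version):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler          # assumes the RBAC objects are created separately
      containers:
      - name: cluster-autoscaler
        image: k8s.gcr.io/cluster-autoscaler:v1.2.2   # example tag; match it to your cluster version
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws                        # hypothetical provider
        - --nodes=1:5:billing-workers                 # min:max:node-group for the billing pool
        - --nodes=0:10:ml-gpu-workers                 # a second pool for machine-learning workloads
        - --scale-down-utilization-threshold=0.5      # Node utilization below which scale-down is considered
        - --skip-nodes-with-local-storage=true        # don't remove Nodes running Pods with local storage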

For scale-down, the Cluster Autoscaler looks at the average utilization of Nodes and takes other relevant factors into account; for example, if a Node is running Pods that cannot be rescheduled elsewhere (say, because of a PodDisruptionBudget), that Node cannot be removed from the cluster. The Cluster Autoscaler terminates Nodes gracefully, typically giving Pods up to 10 minutes to be relocated.
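A PodDisruptionBudget is declared alongside the application; a minimal sketch (the name and selector refer to the hypothetical billing-api application used above):

apiVersion: policy/v1beta1                 # policy/v1 in newer clusters
kind: PodDisruptionBudget
metadata:
  name: billing-api-pdb
spec:
  minAvailable: 2                          # never voluntarily evict below 2 running Pods
  selector:
    matchLabels:
      app: billing-api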

Horizontal Pod Autoscaler (HPA)

HPA is a control loop that monitors and scales the Pods of a Deployment. It is used by creating an HPA object that references the Deployment or replication controller, in which we define the threshold as well as the upper and lower limits for scaling the Deployment. The GA version of HPA (autoscaling/v1) only supports CPU as a monitored metric; the current beta version (autoscaling/v2beta1) adds support for memory and other custom metrics. Once the HPA object is created and it is able to query the Pod's metrics, you can see it report the details:

$ kubectl get hpa
NAME               REFERENCE                     TARGETS  MINPODS MAXPODS REPLICAS AGE
helloetst-ownay28d Deployment/helloetst-ownay28d 8% / 60% 1       4       1        23h
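For reference, a minimal sketch of an HPA object that could produce output like the above - the name, limits, and CPU target simply mirror the sample output:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: helloetst-ownay28d
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloetst-ownay28d
  minReplicas: 1
  maxReplicas: 4
  targetCPUUtilizationPercentage: 60       # the 60% target shown in the TARGETS column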

We can tune the Horizontal Pod Autoscaler by adding flags to the Controller Manager:

  • The flag --horizontal-pod-autoscaler-sync-period determines how often the HPA polls the metrics of the Pod group. The default period is 30 seconds.
  • The default interval between two scale-up operations is 3 minutes, controlled by the flag --horizontal-pod-autoscaler-upscale-delay.
  • The default interval between two scale-down operations is 5 minutes, likewise controlled by a flag: --horizontal-pod-autoscaler-downscale-delay.
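As a sketch, these flags are passed to kube-controller-manager, for example in its static Pod manifest on the control-plane node (the image tag is an example, most other flags are omitted for brevity, and the values shown are the defaults listed above):

apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-controller-manager
    image: k8s.gcr.io/kube-controller-manager:v1.9.6   # example version
    command:
    - kube-controller-manager
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --horizontal-pod-autoscaler-sync-period=30s      # how often Pod metrics are polled
    - --horizontal-pod-autoscaler-upscale-delay=3m0s   # minimum gap between two scale-ups
    - --horizontal-pod-autoscaler-downscale-delay=5m0s # minimum gap between two scale-downs
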
Metrics and Cloud Providers

To collect metrics, the cluster should run Heapster, or enable API aggregation together with the Kubernetes metrics APIs (https://github.com/kubernetes/metrics); the metrics server is the preferred method for Kubernetes 1.9 and above. To scale Nodes, we should enable and configure the appropriate cloud provider for the cluster; see https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/ for more details.

Some plugins

There are also some very good plugins, such as -

All in all, the next time someone asks "How does Kubernetes Autoscaling work?", we hope this short article helps you explain it.

It's ad time again

The conceptual abstractions Kubernetes introduces fit an ideal distributed scheduling system very well. However, the large number of difficult technical concepts also creates a steep learning curve, which directly raises the barrier to using Kubernetes.

Rainbond, an open-source PaaS from Nimbus Cloud, packages these technical concepts into "production-ready" applications and can be used as a Kubernetes panel that developers can operate without special learning. This includes the elastic scaling covered in this article: Rainbond supports both horizontal and vertical scaling :)

In addition, Kubernetes itself is a container orchestration tool and does not provide management workflows, while Rainbond offers ready-made workflows out of the box, including DevOps, automated operations and maintenance, microservice architecture, and an application market.

Learn more: https://www.goodrain.com/scene/k8s-docker

Rainbond Github: https://github.com/goodrain/rainbond
