The ultimate serverless experience with Knative


Author: Dongdao, Senior Technical Expert at Alibaba | Source: Serverless Official Account

Introduction: Serverless is a future that everyone looks forward to, but what capabilities does a system need in order to support serverless applications well? And with the rise of Kubernetes and cloud-native concepts, how should serverless be built on top of Kubernetes? Starting from the core characteristics of serverless applications, this article discusses what a serverless application management platform should provide. After reading it, you should have a solid understanding of how Knative manages serverless applications.

Why do we need Knative

1.png

Serverless is already highly anticipated, and its momentum keeps building. Survey after survey shows that enterprises and developers are already using serverless to build online services, and the proportion continues to grow.

With this trend in mind, let's look at how the IaaS architecture has evolved. At first, enterprises used cloud resources as VMs and deployed their online services directly onto those VMs with tools such as Ansible, SaltStack, Puppet, or Chef. Running applications directly in a VM made the service strongly dependent on the VM's environment configuration. Then, as container technology took off, people began deploying applications into VMs as containers.

But once there are a dozen or even dozens of applications to deploy, quickly rolling out and upgrading them across hundreds of VMs becomes a real headache. Kubernetes solved these problems well, so people now consume cloud resources through Kubernetes. With the popularity of Kubernetes, major cloud vendors began offering Serverless Kubernetes services: users no longer maintain the Kubernetes cluster themselves and can use cloud capabilities directly through Kubernetes semantics.

Since Kubernetes is already so good, why do we need Knative? To answer that, let's first sort out the common characteristics of serverless applications:

  • On-demand usage, automatic elasticity

Cloud resources should be consumed on demand: scale out automatically when business volume rises and scale in automatically when it falls, which requires automatic elasticity.

  • Gray (canary) release

Multi-version management should be supported so that new versions can be rolled out with various gray-release strategies when the application is upgraded.

  • Traffic management

The platform should manage north-south traffic and split it across versions by traffic percentage for gray release.

  • Load balancing and service discovery

As instances are added and removed during scaling, traffic management must provide load balancing and service discovery.

  • Gateway

When multiple applications are deployed in the same cluster, an access-layer gateway is needed to manage the traffic of different applications and of different versions of the same application.

With the rise of Kubernetes and cloud-native concepts, the first instinct may be to deploy serverless applications directly on Kubernetes. So what would it take to run serverless applications on native Kubernetes?

2.png

First, a Deployment is needed to manage the workload, plus a Service to expose it and provide service discovery. When the application undergoes a major change, the release may need to pause for observation and only continue once the new version is confirmed healthy; doing that requires two Deployments.

The v1 Deployment represents the old version, whose replica count is reduced step by step during the gray release; the v2 Deployment represents the new version, whose replica count is increased step by step. HPA (HorizontalPodAutoscaler) provides elasticity, and each Deployment has its own HPA to manage its scaling configuration, as sketched below.
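A rough sketch of this setup is shown below, assuming a hypothetical application called `myapp` listening on port 8080; the image, names, and thresholds are illustrative, and the v2 version would get an identical Deployment/HPA pair with `version: v2` labels.

```yaml
# Sketch only: one Deployment per version, each with its own HPA.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-v1                 # old version; replicas are reduced during the gray release
spec:
  replicas: 3
  selector:
    matchLabels: {app: myapp, version: v1}
  template:
    metadata:
      labels: {app: myapp, version: v1}
    spec:
      containers:
      - name: myapp
        image: registry.example.com/myapp:v1
        ports:
        - containerPort: 8080
---
apiVersion: autoscaling/v2       # use autoscaling/v2beta2 on older clusters
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-v1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-v1
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```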

This creates a conflict: suppose the v1 Deployment originally had three Pods and one Pod was upgraded to v2 as part of the gray release; at that point 1/3 of the traffic actually hits v2. But when the business peak arrives, because both versions have an HPA configured, v1 and v2 scale out at the same time, and the final Pod counts of v1 and v2 no longer match the 1/3 ratio that was originally set.

So the traditional gray-release strategy based on Deployment instance counts inherently conflicts with elastic configuration. If the gray release is driven by traffic percentage instead, the problem goes away, but that may require introducing Istio, as sketched below.
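A minimal sketch of percentage-based gray release with Istio might look like this (host names, subsets, and weights are hypothetical assumptions):

```yaml
# Sketch only: gray release by traffic percentage, independent of replica counts.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp.example.com
  gateways:
  - myapp-gateway
  http:
  - route:
    - destination:
        host: myapp            # the Kubernetes Service
        subset: v1
      weight: 90               # 90% of traffic stays on the old version
    - destination:
        host: myapp
        subset: v2
      weight: 10               # 10% is grayed to the new version
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
  - name: v1
    labels: {version: v1}
  - name: v2
    labels: {version: v2}
```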

3.png

Introducing Istio as the gateway component helps: besides splitting traffic between versions of the same application, Istio can also manage traffic across different applications. It looks good, but let's take a closer look at the problems. First, here is what you have to maintain by hand to run serverless applications on native Kubernetes:

  • Deployment
  • Service
  • HPA
  • Ingress
  • Istio
    • VirtualService
    • Gateway

All of these resources must be maintained for every application; with multiple applications, you maintain multiple copies. The resources are scattered across Kubernetes, the concept of an application is nowhere to be seen, and management becomes very cumbersome.

4.png

Serverless applications need application-oriented management actions such as hosting, upgrade, rollback, gray release, traffic management, and elasticity. What Kubernetes provides is an abstraction over IaaS resources, so between Kubernetes and serverless applications there is a missing layer: application orchestration.

Knative is a serverless application orchestration framework built on Kubernetes. Besides Knative, the community has several FaaS-style orchestration frameworks, but the applications they produce follow no common standard: each framework has its own specification, and none of them is compatible with the Kubernetes API. Incompatible APIs make them hard to use and hard to port. One of the core standards of cloud native is the Kubernetes API, and serverless applications managed by Knative keep the Kubernetes API semantics unchanged. This good compatibility with the Kubernetes API is what makes Knative cloud native.

What is Knative?

5.png

The main problem Knative solves is to provide a general serverless orchestration and scheduling layer on top of Kubernetes, offering application-oriented atomic operations to the serverless applications above it. It exposes its services through Kubernetes-native APIs, maintaining seamless integration with the Kubernetes ecosystem and tool chain. Knative has two core modules, Eventing and Serving; this article focuses on the core architecture of Serving.

Introduction to Knative Serving

6.png

The core of Serving is the Knative Service. Based on the Service's configuration, the Knative controller automatically manages the underlying Kubernetes Service and Deployment, which is how application management gets simplified.

A Knative Service owns a resource called Configuration. Whenever the Service changes in a way that requires a new workload, the Configuration is updated, and every Configuration update creates a unique Revision. A Revision can be thought of as the version-management mechanism of the Configuration: in principle, a Revision is never modified after it is created. A minimal example is sketched below.
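Here is a minimal sketch of a Knative Service (the name, image, and port are assumptions); each change to `spec.template` produces a new, immutable Revision such as `hello-v2`:

```yaml
# Sketch only: a minimal Knative Service.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      name: hello-v1              # each template change yields a new, immutable Revision
    spec:
      containers:
      - image: registry.example.com/hello:v1
        ports:
        - containerPort: 8080
        env:
        - name: TARGET
          value: "Knative"
```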

Route is responsible for Knative traffic management. The Knative Route controller automatically generates the Knative Ingress configuration from the Route configuration, and an Ingress controller then implements the routing policy described by that Ingress.

Knative Serving's serverless orchestration of application workloads starts with traffic. Traffic first reaches Knative's gateway, which automatically splits it across Revisions according to the percentages in the Route configuration; each Revision then has its own independent elastic policy. When incoming requests increase, the current Revision starts scaling out automatically, and each Revision's scaling is independent of the others.

Different Revisions are released gradually by traffic percentage, and each Revision has an independent elastic policy. Through this traffic-driven control, Knative Serving unifies traffic management, elasticity, and gray release, as the sketch below shows.
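A sketch of splitting traffic between two Revisions by percentage (names and percentages are assumptions):

```yaml
# Sketch only: percentage-based traffic split between two Revisions.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      name: hello-v2
    spec:
      containers:
      - image: registry.example.com/hello:v2
  traffic:
  - revisionName: hello-v1
    percent: 90                  # the old Revision keeps 90% of the traffic
  - revisionName: hello-v2
    percent: 10                  # the new Revision is grayed in with 10%
    tag: canary                  # optional tag gives this Revision its own URL
```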

Knative Serving in detail

7.png

The figure above shows how the Knative Autoscaler works. The Route is responsible for ingress traffic, and the Autoscaler is responsible for elastic scaling. When there are no requests, the workload is scaled down to zero. After scale-to-zero, requests arriving through the Route are sent to the Activator. When the first request comes in, the Activator holds the HTTP connection and notifies the Autoscaler to scale up; once the Autoscaler has brought up the first Pod, the Activator forwards the traffic to that Pod. In this way the workload can scale to zero without losing any requests. The per-Revision scaling behavior can be tuned as sketched below.
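The per-Revision autoscaling behavior is configured with annotations on the Revision template; a sketch (the specific values are assumptions) might look like this:

```yaml
# Sketch only: common Knative autoscaling annotations on the Revision template.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev  # default Knative autoscaler
        autoscaling.knative.dev/target: "10"       # target concurrent requests per Pod
        autoscaling.knative.dev/minScale: "0"      # "0" allows scale-to-zero
        autoscaling.knative.dev/maxScale: "20"     # upper bound when scaling out
    spec:
      containers:
      - image: registry.example.com/hello:v1
```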

That covers the core modules and basic principles of Knative Serving, and you should now have a preliminary picture of Knative. Along the way you may also have noticed that using Knative still means maintaining a number of controller components and a gateway component (such as Istio), with ongoing IaaS and operations costs.

8.png

If the gateway is implemented with Istio, Istio itself needs more than a dozen controllers, and making it highly available may take more than twenty. Knative Serving's own dozen or so controllers all need highly available deployments as well. The IaaS and operations costs of all these controllers are considerable. On top of that, the cold-start problem is obvious: scaling to zero reduces cost during business troughs, but the first batch of requests may time out.

The perfect integration of Knative and the cloud

To solve these problems, we deeply integrated Knative with Alibaba Cloud. Users still use Knative's native semantics, while the underlying controllers and gateway are embedded deep in the Alibaba Cloud platform. This ensures users can consume cloud resources through the Knative API without any risk of vendor lock-in, while still enjoying the advantages of Alibaba Cloud's existing infrastructure.

9.png

The first piece is integrating the gateway with the cloud: Alibaba Cloud SLB is used directly as the gateway. The advantages of using the SLB cloud product are:

  • Cloud-product-level support with an SLA guarantee;
  • Pay as you go, with no IaaS resources to reserve;
  • No operations burden for users and no high-availability concerns, since the cloud product is highly available by itself.

10.png

Besides the gateway, the Knative Serving controllers also have a cost, so we integrated them with Alibaba Cloud Container Service as well. Users only need a Serverless Kubernetes cluster with the Knative feature enabled to use cloud capabilities through the Knative API, and they pay nothing for the Knative controllers.

Knative's cold start problem

11.png

Now for the cold-start problem. A traditional application without elastic configuration keeps a fixed number of instances, whereas a serverless application managed by Knative has elastic policies by default and scales down to zero when there is no traffic. With the traditional application, the instances stay up even when no requests arrive, which wastes resources during troughs, but the upside is that requests never time out: anything that comes in can be handled. Once scaled to zero, however, scale-up is only triggered after the first request arrives.

In Knative's model, scaling from 0 to 1 requires five steps executed serially; only after they all complete can the first request be processed, and by then it has often timed out. So although scaling to zero cuts the cost of resident resources, the cold-start problem for the first batch of requests is very real. Elasticity is ultimately a search for the balance between cost and efficiency.

12.png

To solve the cold start of the first instance, we introduced the reserved-instance feature, which is unique to Alibaba Cloud Container Service Knative. Community Knative scales to zero by default when there is no traffic, but the 0-to-1 cold-start problem that follows is hard to solve. Besides IaaS resource provisioning, Kubernetes scheduling, and image pulling, cold start also includes application startup time, which ranges from milliseconds to minutes. Application startup time is entirely a business behavior that the underlying platform can barely control.

ASK Knative's answer is to balance cost against cold start with low-priced reserved instances. Alibaba Cloud ECI offers many instance specifications with different compute capabilities and prices. Below is a price comparison between a standard compute instance and a burstable instance, both with 2 vCPUs and 4 GB of memory.

13.png

As the figure shows, the burstable instance is 46% cheaper than the standard compute type. So when there is no traffic, serving with a burstable instance not only solves the cold-start problem but also saves a lot of money.

Beyond the price advantage, burstable instances have another eye-catching feature: CPU credits. A burstable instance can spend CPU credits to handle bursts of load. It keeps accruing credits over time, and when its baseline performance cannot meet the load, it seamlessly raises its compute performance by consuming the accumulated credits, without affecting the environment or the applications deployed on the instance. CPU credits let you allocate compute from the perspective of the overall business, seamlessly shifting spare capacity from the trough to the peak (think of it as a gasoline-electric hybrid). For more details on burstable instances, see the Alibaba Cloud documentation.

ASK Knative's strategy is therefore to use burstable instances in place of standard compute instances during business troughs, and to switch seamlessly to standard compute instances when the first request arrives. This lowers the cost of the trough, and the CPU credits earned during the trough can be spent when the peak comes, so none of the user's money is wasted.

Using burstable instances as reserved instances is only the default policy; users can specify any other instance type they prefer as the reserved-instance specification. They can also choose to keep at least one standard instance at all times, which effectively turns the reserved-instance feature off.

Summary

Knative is the most popular serverless orchestration framework in the Kubernetes ecosystem. Community Knative needs resident controllers and a resident gateway to provide its services. Besides their IaaS cost, these resident instances add a significant operations burden and make it harder to go serverless. That is why we fully host Knative Serving in ASK, delivering an out-of-the-box, truly serverless experience.

