An In-Depth Look at Concurrency in Serverless Computing

Background

In 2019, Berkeley predicted that serverless would supersede serverful computing [1] and become the new computing paradigm for cloud computing. Serverless provides a brand-new system architecture for application development. With its three core values of elastic scaling, lower cost, and reduced operations that let teams focus on business logic, it frees developers from heavy manual resource management and performance/cost optimization, transforming engineering productivity once again.

According to the official definition of CNCF [2]:

Serverless is a cloud native development model that allows developers to build and run applications without having to manage servers. There are still servers in serverless, but they are abstracted away from app development. A cloud provider handles the routine work of provisioning, maintaining, and scaling the server infrastructure. Developers can simply package their code in containers for deployment. Once deployed, serverless apps respond to demand and automatically scale up and down as needed. Serverless offerings from public cloud providers are usually metered on-demand through an event-driven execution model. As a result, when a serverless function is sitting idle, it doesn’t cost anything.

As the definition makes clear, serverless != no servers; rather, developers have no servers to manage. Among the services cloud vendors provide, a serverless architecture is a design that combines FaaS (Function as a Service) and BaaS (Backend as a Service) offerings to solve problems.

Typical FaaS services: AWS Lambda, Alibaba Cloud Function Compute (FC), Azure Functions, Google Cloud Functions, etc.
Typical BaaS services: AWS S3, DynamoDB, SQS, etc.; Alibaba Cloud OSS, TableStore, MNS, etc.

Serverless Computing

Of course, as demand and technology have evolved, serverless computing services beyond FaaS have appeared in the industry, such as Google Cloud Run, AWS App Runner, Alibaba Cloud Serverless App Engine (SAE), and Alibaba Cloud Serverless Kubernetes (ASK). These services also offer elastic scalability and pay-per-use billing in a serverless form, further expanding the serverless computing camp.

In the serverless computing field, the two most typical product forms, FaaS and Google Cloud Run, both use concurrency as their scaling metric. Next, we will analyze what concurrency means in each product form and why these popular serverless computing products scale on concurrency.

What is concurrency?

Concurrency is one of the core concepts of modern computing: the ability of a system to make progress on multiple tasks at the same time. For example, when your computer runs several programs at once, multiple concurrent processes/threads share CPU time. An application is also doing concurrent work when a single process handles multiple network requests at once, or processes multiple queued jobs in parallel.

Take PHP, famously "the best language in the world", as a classic example from the Web field: it uses a process pool, such as the FastCGI Process Manager in the figure below. Web requests sent to the server are assigned to CGI processes in the pool, and each CGI process handles that single request. If multiple requests arrive at the same time, multiple CGI processes are started to handle them in parallel, but each process can serve only one request at a time. The server handles concurrent requests by context-switching between CGI processes: the operating system scheduler tracks all CGI processes and swaps them on and off the CPU as needed, so each process gets its fair share of CPU time.

Schematic diagram of PHP Web operation
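The dispatch bookkeeping of such a process manager can be sketched in a few lines of Python. This is a toy model for illustration only (no real processes are spawned; real FastCGI managers fork PHP interpreter processes), but it shows the key property: a fixed set of workers, each serving one request at a time, with excess requests forced to wait.

```python
from collections import deque

class ProcessPool:
    """Toy FastCGI-style manager: a fixed set of worker 'processes',
    each serving exactly one request at a time."""
    def __init__(self, size: int):
        self.free = deque(range(size))   # idle CGI process ids
        self.busy = set()

    def assign(self, request):
        if not self.free:
            return None                  # all workers busy: request must wait
        worker = self.free.popleft()
        self.busy.add(worker)
        return worker

    def finish(self, worker):
        self.busy.remove(worker)
        self.free.append(worker)

pool = ProcessPool(4)
workers = [pool.assign(f"req{i}") for i in range(5)]
print(workers)  # [0, 1, 2, 3, None] -- the 5th request waits for a free process
```

Note the contrast with serverless scaling discussed below: here the pool size is fixed, so load beyond it queues up instead of triggering new capacity.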

Today, there are more tools for concurrency, including powerful asynchronous concurrency mechanisms built into modern programming languages, and cloud computing services that help simplify concurrency. Let's look at how some cloud computing services design and use concurrency.

Single instance single concurrency

Cloud vendors' FaaS services all scale on concurrency in essentially the same way. We refer to the official AWS Lambda documentation [3]:

When a function is invoked for the first time, the FaaS service creates a function instance and runs the handler method to process the event. When the handler finishes, the instance remains available for a period of time to process subsequent events. If other events arrive while the function is busy, FaaS creates more function instances to handle those requests concurrently.

The documentation shows that each function instance handles only one event at a time (that is, one concurrent request per instance, also known as single-instance single-concurrency). An instance is considered busy while processing an event, so any concurrent event must go to another instance. Each time a new function instance must be created there is a short "cold start" delay, whose duration depends on your code size and the runtime used. The figure below [4] shows how FaaS scales the number of function instances in real time when multiple concurrent requests must be processed in parallel:

Tip: only the green portions are billed (in milliseconds); the yellow and blank portions cost nothing, so you truly pay only for the compute resources you use.

FaaS scaling and concurrency
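The scaling behavior in the figure can be modeled in a few lines (a toy sketch, not any vendor's actual scheduler): requests that overlap in time each force a new instance and a cold start, while requests arriving after an instance frees up reuse a warm one.

```python
import itertools

class FaasScaler:
    """Toy model of FaaS scaling: one concurrent request per instance;
    new instances are cold-started on demand and kept warm afterwards."""
    def __init__(self):
        self.warm = []                 # idle instances ready for the next event
        self.cold_starts = 0
        self._ids = itertools.count(1)

    def dispatch(self):
        if self.warm:
            return self.warm.pop()     # reuse a warm instance: no delay
        self.cold_starts += 1          # no idle instance: cold start a new one
        return next(self._ids)

    def release(self, instance):
        self.warm.append(instance)     # handler finished; keep instance warm

faas = FaasScaler()
# Three overlapping requests: each needs its own instance (3 cold starts).
a, b, c = faas.dispatch(), faas.dispatch(), faas.dispatch()
for inst in (a, b, c):
    faas.release(inst)
# A later request reuses a warm instance: still 3 cold starts in total.
faas.dispatch()
print(faas.cold_starts)  # 3
```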

This makes the FaaS concurrency model somewhat similar to old-school PHP process managers. In both cases: 1) the PHP process manager achieves concurrency by starting more processes in parallel, each handling only one request at a time; 2) FaaS achieves concurrency by starting more execution-environment container instances in parallel, each processing only one event request at a time. But process-level concurrency like the PHP process manager's has two classic problems to solve:

  • Safe isolation between processes: the operating system must allocate CPU time and system resources fairly; one process may consume excessive resources and degrade the performance of other processes on the same machine.
  • Autoscaling: taking the PHP application as an example, you must manage the number of PHP CGI processes on each server, and manually scale the number of servers running them.

FaaS solves both problems well and clearly has modern features. Take the security isolation of containers in the Function Compute execution environment as an example [5]:

Security isolation of Alibaba Cloud FC computing nodes

  • Virtualization level security isolation
    • X-Dragon bare-metal compute nodes can run function instances from different users; Alibaba Cloud's security sandbox provides function-level virtualization and container isolation. ECS virtual machines are only allowed to run function instances of the same user; ECS isolation provides user-level virtualization isolation, and container technologies such as runc provide function-level container isolation.
  • Network access of function instances is restricted, and the user controls outbound network permissions
    • Each function instance is assigned a private IP address that users cannot reach directly, and instances cannot reach each other over the network. Network isolation is implemented with Open vSwitch, iptables, and routing tables.
  • Function instance resources are limited by the function's CPU/memory quota settings
  • Function Compute is responsible for bug fixes and security upgrades of the sandbox containers that run function instances

With FaaS, an event-driven fully managed computing service, you automatically get isolated execution-environment instances, and the FaaS service automatically manages their number and capacity. All you do is provide your code to the FaaS service and send it events that trigger the code's execution.
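In other words, your entire deliverable is a handler; instances, scaling, and isolation are the platform's job. A minimal AWS Lambda-style Python handler might look like this (the event shape here is a hypothetical example):

```python
def handler(event, context):
    # The platform invokes this once per event; if events arrive while
    # this instance is busy, the platform starts more instances.
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}
```

There is no server, port, or process-pool configuration anywhere in the code: concurrency is entirely the platform's concern.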

A quick overview of FaaS

From the discussion of FaaS concurrent scaling above, it should be clear that single-instance single-concurrency is very friendly to CPU-intensive logic. But many modern workloads are full of I/O operations, and under the classic one-concurrent-request-per-instance model of FaaS they suffer the following pain points:

  1. Serious waste of resources

IO-intensive workload [11]

The blue boxes indicate time when the program is working; the red boxes indicate time spent waiting for IO operations to complete. Since IO requests can take several orders of magnitude longer than CPU instructions, your program may spend most of its time waiting, and instance resources are seriously wasted; as concurrency grows, the waste grows linearly with it. For example, the red portions below are wasted computing resources:

FaaS IO-intensive workload
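The waste is easy to quantify: with one request per instance, an instance's useful work is just the CPU slice of each request. The numbers below are hypothetical, assuming 20 ms of CPU work and 180 ms of IO wait per request:

```python
def instance_utilization(cpu_ms: float, io_wait_ms: float) -> float:
    """Fraction of an instance's busy time spent on real work when it
    serves exactly one request at a time."""
    return cpu_ms / (cpu_ms + io_wait_ms)

# 20 ms of CPU work + 180 ms waiting on a database per request:
print(f"{instance_utilization(20, 180):.0%}")  # 10% -- the instance idles 90% of the time
# And with 100 such requests in flight, 100 instances each sit 90% idle.
```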

  2. Possible unintended consequences for shared resources

Databases are a typical example. A traditional relational database (such as MySQL) has a maximum number of concurrent connections. Traditional long-running servers are often optimized with a "database connection pool", which caps the number of connections a single server instance opens to the database while letting concurrent requests share those connections efficiently. But if each instance handles only one request and keeps one open connection to the database, requests and database connections are in a one-to-one relationship. The result is that during peak load the database can be overwhelmed with too many connections and end up rejecting new ones. If the database instance allows at most 100 connections and FaaS is used, the picture looks like this:

FaaS with DB

Single instance with multiple concurrency

To address the one-concurrent-request-per-instance pain point in the FaaS field, Google Cloud Run provides multiple concurrent requests per instance [6], which nicely resolves the problems of the single-instance single-concurrency scaling model discussed above:

The default maximum concurrency of a single Google Cloud Run instance (that is, the upper limit of concurrent requests per instance) is 80, and it can be raised to as much as 1000.

1. IO wait periods no longer waste resources

Google Cloud Run IO-Intensive workload
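This is why per-instance concurrency pays off for IO-bound code: within one instance, the IO waits of many requests overlap. A small asyncio sketch, where a 100 ms sleep stands in for a database or API call:

```python
import asyncio
import time

async def handle(request_id: int) -> str:
    await asyncio.sleep(0.1)          # stand-in for a 100 ms IO call
    return f"done {request_id}"

async def serve(n: int):
    start = time.perf_counter()
    # One instance serving n requests concurrently: the IO waits overlap.
    results = await asyncio.gather(*(handle(i) for i in range(n)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(serve(10))
print(len(results), f"{elapsed:.1f}s")  # 10 requests finish in ~0.1 s, not 1 s
```

Under single-instance single-concurrency, the same 10 requests would occupy 10 instances for 0.1 s each; here one instance does the same work in roughly the time of a single request.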

2. The impact on shared resources becomes predictable: database connection throughput improves

Google Cloud Run With DB

If each instance is configured with a database connection pool of size 10, each instance can issue up to 10 parallel requests to the database. Since each instance may receive up to 80 concurrent requests, the connection pool automatically blocks incoming requests while waiting for a connection to be released back to the pool. By serving 80 requests over 10 database connections, you can theoretically multiply the request throughput per database connection roughly eightfold before the database reaches its maximum connection limit.
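The blocking behavior described above can be sketched with a semaphore standing in for the pool (toy code; a real pool such as SQLAlchemy's also manages connection lifetimes and timeouts):

```python
import threading

class ConnectionPool:
    """Toy pool: at most `size` callers hold a 'connection' at once;
    the rest block until one is returned, like a real database pool."""
    def __init__(self, size: int):
        self._slots = threading.Semaphore(size)
        self._lock = threading.Lock()
        self._held = 0
        self.peak = 0                     # most connections ever held at once

    def query(self, sql: str) -> str:
        with self._slots:                 # blocks when all connections are busy
            with self._lock:
                self._held += 1
                self.peak = max(self.peak, self._held)
            result = f"rows for: {sql}"   # stand-in for real database work
            with self._lock:
                self._held -= 1
            return result

# One Cloud Run-style instance: 80 concurrent requests sharing 10 connections.
pool = ConnectionPool(10)
threads = [threading.Thread(target=pool.query, args=(f"q{i}",)) for i in range(80)]
for t in threads: t.start()
for t in threads: t.join()
print(pool.peak)  # never exceeds 10, whatever the request concurrency
```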

Interestingly, some FaaS vendors have ventured to implement multiple concurrent requests per instance as well. For example, Alibaba Cloud Function Compute lets you set an instance concurrency, and the second generation of Google Cloud Functions also supports an instance concurrency setting, both designed to serve today's all-important IO-intensive workloads.

Why serverless scales on concurrency

FaaS and Google Cloud Run scale on instance concurrency (that is, the upper limit of concurrent requests per instance) rather than on HPA-style strategies based on CPU metrics, because in the serverless field instance concurrency best expresses "scaling driven by request processing/events":

  • Both FaaS and Google Cloud Run can shrink to zero instances and spin up a new instance when a request arrives. During this 0-to-1 phase, there is no running instance, so metrics such as CPU or memory cannot drive scaling.
  • Better matching of request processing: concurrency tracks the actual number of requests, so computing resources are used more effectively while requests are still answered quickly. A comparison of how fast resources match requests between Alibaba Cloud Function Compute and K8s [7]:

  • Better resource utilization: the instance-concurrency strategy makes better use of computing resources. It scales out quickly during request peaks and keeps a minimal number of instances when traffic is low, reducing waste. FaaS and Google Cloud Run let users run code in any language and automatically scale to match traffic: total concurrency = number of instances processing requests simultaneously × maximum number of concurrent requests per instance.
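Rearranged, the formula in the last bullet tells the platform how many instances a given load needs; a quick sketch:

```python
import math

def instances_for(in_flight_requests: int, per_instance_limit: int) -> int:
    """Instances needed so that
    instances * per-instance concurrency limit >= in-flight requests."""
    if in_flight_requests <= 0:
        return 0            # scale to zero when idle
    return math.ceil(in_flight_requests / per_instance_limit)

print(instances_for(400, 1))    # 400 -- FaaS: one request per instance
print(instances_for(400, 80))   # 5   -- Cloud Run's default limit of 80
print(instances_for(0, 80))     # 0   -- no traffic, no instances
```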

Of course, introducing concurrency also raises new doubts for developers accustomed to scaling on metrics such as CPU. For IO-intensive applications, a CPU-based HPA strategy readily improves availability, performance, and reliability while keeping resource utilization efficient; by contrast, choosing a reasonable maximum concurrency for a single instance is a headache. For this, the industry usually recommends stress testing and iterating toward an appropriate concurrency value for your workload. Alibaba Cloud Function Compute has made the industry's most cutting-edge exploration here, providing automated recommendations ("From Bronze to King: Revealing the Automated Optimal Configuration of Serverless Functions" [8]) and looking ahead to intelligent dynamic concurrency: a mode in which users need not configure the parameter manually; instead it is adjusted dynamically while the function runs, converging to the optimal value based on instance health metrics such as CPU load.

Conclusion

Given the discussion of concurrency above, which of the two serverless product forms should host your application: single-instance single-concurrency (typified by FaaS) or single-instance multi-concurrency (typified by Google Cloud Run)? Here are a few scenarios with my personal recommendations:

In the end, though, you still need to weigh your specific business needs and choose the most suitable products and solutions.
Note: Function Compute FC and Google Cloud Functions v2 in the FaaS camp also support single-instance multi-concurrency.

The suggestions in the table above are borne out by users' preferred deployments in the Alibaba Cloud Function Compute Application Center [9] (see the figures below) and by customer case studies (see reference 12). Especially for tasks where each request must be isolated from the others, or for CPU-intensive tasks, FaaS has unparalleled advantages:

  • For existing applications, extracting CPU-intensive tasks out of the application improves service stability. The article "PDF Generation With AWS Lambda" [10] discusses the benefits of this practice in depth.
  • For new CPU/GPU-intensive businesses, such as audio and video processing and the recently popular large-model AIGC (AI-generated content) applications, FaaS is a natural fit.
Scheduling requests and backend resources in AI scenarios is more demanding than in traditional microservice scenarios, mainly because each AI request consumes a great deal of resources. For example, if Stable Diffusion is deployed on an A10 GPU card (ecs.gn7i-c8g1.2xlarge), one card running the Stable Diffusion service can process only a single-digit number of text-to-image requests at a time. Once too many requests arrive simultaneously, they compete for computing resources and time out. FaaS's "one concurrent request per instance" naturally fits this scenario.

Function computing FC application center file processing application deployment diagram

Function computing FC application center audio and video processing application deployment diagram

Function Compute FC Application Center AI Application Deployment Diagram

references

  1. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-3.pdf
  2. https://glossary.cncf.io/serverless/
  3. https://docs.aws.amazon.com/lambda/latest/operatorguide/scaling-concurrency.html
  4. https://nathanpeck.com/concurrency-compared-lambda-fargate-app-runner/files/Concurrency%20Compared.pptx
  5. https://help.aliyun.com/document_detail/438853.html
  6. https://cloud.google.com/run/docs/about-concurrency?hl=zh-cn
  7. https://developer.aliyun.com/article/1243681
  8. https://developer.aliyun.com/article/1161868
  9. https://help.aliyun.com/document_detail/606948.html
  10. https://medium.com/1mgofficial/pdf-generation-with-aws-lambda-627b8dd07c77
  11. https://realpython.com/python-concurrency/
  12. Case studies from the Function Compute Application Center:
    1. The serverless exploration of NetEase Cloud Music's audio and video algorithms
    2. Focusing on elasticity: the serverless road of Hangzhou Mingshitang
    3. Cutting resource costs by 20%: New Oriental's serverless practice
    4. When Rokid meets Function Compute
    5. Real-time APK repackaging of channel packages for multiple game companies (repackAPK)

Author|Xi Liu (Alibaba Cloud Technical Expert)



This article is the original content of Alibaba Cloud and may not be reproduced without permission.
