How does Serverless achieve large-scale adoption at Alibaba?


Author | Zhao Qingjie (Lu Ling)
Source | Serverless Official Account

1. Results of Serverless's large-scale adoption within the group

In 2020 we made major upgrades to the infrastructure underlying Serverless: compute was upgraded to the fourth-generation Shenlong (X-Dragon) architecture, storage to Pangu 2.0, and the network to the 100G Luoshen network. After the overall upgrade, performance doubled. The BaaS layer was also greatly expanded, adding support for EventBridge and Serverless Workflow and further improving system capabilities.

In addition, we cooperated with more than a dozen BUs in the group to help business teams adopt Serverless products, including in core Double 11 application scenarios, and helped them pass the Double 11 traffic peak successfully. This proves that Serverless remains very stable even in core application scenarios.


2. Two backgrounds, two advantages - accelerating the adoption of Serverless

1. The two backgrounds behind Serverless

Why could Serverless be adopted quickly and at scale within the group? First, two prerequisite backgrounds were in place:

The first background is moving to the cloud. The group's migration to the cloud is an important prerequisite: only on the cloud can a business enjoy the elasticity dividend. On a self-built private cloud, the later gains in efficiency and cost reduction would be very hard to achieve. By Double 11 of 2019, Alibaba had moved 100% of its core systems to the cloud. With that prerequisite met, Serverless had room to play a very important role.

The second background is full cloud-nativization: building a powerful family of cloud-native products to empower the group's internal businesses and help them achieve the two main cloud-based goals of improving efficiency and reducing costs. In 2020, the core systems behind Tmall Double 11 became fully cloud-native; efficiency improved by 100% and costs fell by 80%.
 

2. Two advantages of Serverless

  • Improve efficiency

[Figure: work items for bringing a cloud-native application online]

For a standard cloud-native application, getting from development to launch to operations requires completing every work item marked in orange in the figure above before a microservice application formally goes online. First comes CI/CD code building; then system operations. Visualization work items must not only be configured and wired up, but also combined with traffic assessment, security assessment, and traffic management across the whole data link. Clearly, the bar on manpower is already very high. On top of that, to improve resource utilization, businesses must also be co-located, which raises the bar further.

Clearly, with traditional cloud-native applications it is very difficult for one developer to complete all the work items needed to bring a microservice online; multiple roles are required. In the Serverless era, however, developers only need to complete the coding in the blue box in the figure; for all the remaining work items, the Serverless R&D platform can take the business the rest of the way to launch.
 

  • Reduce costs

Improving efficiency mainly means saving labor costs, while reducing costs targets resource utilization. For ordinary applications we must reserve resources for the peak, but the troughs then cause great waste. In the Serverless scenario we simply pay on demand and no longer reserve resources for peaks; this is Serverless's biggest cost advantage.
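A toy calculation (all numbers invented for illustration) shows why pay-per-use beats reserving for the peak on spiky traffic:

```typescript
// Hypothetical daily traffic profile: requests per hour.
const hourlyRequests: number[] = [
  ...Array(20).fill(1_000),  // 20 quiet hours
  ...Array(4).fill(50_000),  // 4 peak hours
];

// Reserved model: capacity must cover the peak for all 24 hours.
const peakPerHour = Math.max(...hourlyRequests);
const reservedCapacity = peakPerHour * hourlyRequests.length;

// Pay-per-use model: pay only for requests actually served.
const served = hourlyRequests.reduce((a, b) => a + b, 0);

const utilization = served / reservedCapacity;
console.log(`Reserved-capacity utilization: ${(utilization * 100).toFixed(1)}%`);
```

With these made-up numbers the reserved capacity sits idle more than 80% of the time, which is exactly the waste that on-demand billing removes.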


The two backgrounds and two advantages above align with the trend of cloud technology, so business teams within the group took to Serverless readily. Some large BUs have elevated Serverless to the operational level, accelerating the landing of Serverless business scenarios. The Serverless scenarios now implemented within the group are very rich, spanning core applications, personalized recommendation, video processing, AI inference, business inspection, and more.
 

3. Serverless landing scenario - front-end light applications

At present, front-end scenarios are where Serverless has landed fastest and most widely within the group, covering more than ten BUs including Taoxi, Gaode, Feizhu, Youku, and Xianyu. So why are front-end scenarios a good fit for Serverless?

[Figure: capability model of a full-stack engineer]

The figure above is the capability model of a full-stack engineer. A typical micro-application involves three roles: front-end engineer, back-end development engineer, and operations engineer, who together complete the release of an application. To improve efficiency, the full-stack engineer role has emerged in recent years. A full-stack engineer must possess the capabilities of all three roles: not only front-end application development skills, but also back-end system-level development skills, plus attention to the underlying kernel, system resource management, and so on. The bar is obviously very high for a front-end engineer.

In recent years the rise of Node.js has let front-end engineers cover the back-end development role as well: with front-end development skills alone, they can play two roles, front-end engineer and back-end development engineer. The operations engineer, however, still could not be replaced.

The Serverless platform takes care of the bottom three layers of the triangle above, greatly lowering the bar for a front-end engineer to become a full-stack engineer, which is very appealing to front-end business developers.


Another reason is that the business characteristics match. Most front-end applications have traffic peaks, which require capacity assessment in advance and thus carry an assessment cost. Front-end scenarios also iterate quickly and go on- and off-line rapidly, making operations expensive; and without dynamic scaling, there are resource fragments and resource waste. With Serverless, the platform automatically resolves all of the above worries, so Serverless is very attractive for front-end scenarios.

1. Front-end landing scenarios

[Figure: main front-end landing scenarios and technical points]

The figure above lists the main scenarios and technical points of the front-end landing:

BFF converted to the SFF layer: BFF (Backend For Frontend) is mainly operated and maintained by front-end engineers themselves, but in the Serverless era operations are handed over entirely to the Serverless platform, and front-end engineers only need to write business code to get the job done.

Client slimming: front-end business logic sinks into the SFF layer, which handles logic reuse; operations are likewise handed to the Serverless platform, making the client lighter and sinking efficiency-enhancing capabilities.

Cloud-client integration: one codebase running on multiple clients, a very popular development framework pattern, which also requires SFF as support.

CSR/***: using Serverless for server-side rendering, client-side rendering, and so on, to render the front-end first screen quickly; Serverless combined with CDN can serve as a complete front-end acceleration solution.

NoCode: equivalent to a packaging layer on the Serverless platform. Drag and drop a few components to build a front-end page; each component can be packaged and aggregated with Serverless, achieving a NoCode effect.

Mid- and back-office scenarios: mainly single rich-application scenarios. Monolithic applications can be fully hosted in Serverless mode to deliver mid- and back-office applications, again saving operations effort and reducing costs.
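As a toy illustration of the SFF idea above (the event shape and names are invented, not any platform's real API), the front-end engineer writes only a small aggregation function and leaves deployment, scaling, and operations to the platform:

```typescript
interface ProductView {
  id: string;
  title: string;
  priceText: string; // pre-formatted for the page
}

// Hypothetical back-end service the SFF layer fans out to.
async function fetchProduct(id: string): Promise<{ id: string; title: string; priceCents: number }> {
  return { id, title: "Demo product", priceCents: 12999 };
}

// The cloud function: tailor back-end data for one front-end page.
async function handler(event: { productId: string }): Promise<ProductView> {
  const p = await fetchProduct(event.productId);
  return { id: p.id, title: p.title, priceText: `¥${(p.priceCents / 100).toFixed(2)}` };
}
```

Everything around this function - hosting, gray release, scaling, monitoring - is what the Serverless platform absorbs from the operations role.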
 

2. Front-end coding changes

After applying Serverless in front-end scenarios, how does coding change?


Anyone with some front-end knowledge knows the front end is generally split into three layers: State, View, and Logic Engine. Some abstract business logic also sinks into cloud functions at the FaaS layer, and those cloud functions are then exposed as FaaS APIs. In the code, various Actions can be abstracted, and each Action can be served by a FaaS function API.

[Figure: FaaS APIs on a sample page]

Take a simple page as an example. On the left of the page are rendering interfaces that fetch product details, shipping address, and so on, implemented on FaaS APIs; on the right is interaction logic such as purchase and add-to-cart, which is also completed through FaaS APIs.

In page design, a FaaS API is not tied to a single page; it can be reused across multiple pages. By reusing (or dragging and dropping) these APIs, front-end page assembly is completed, which is very convenient for the front end.
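A minimal sketch of this Action-to-FaaS-API mapping (the action and endpoint names are hypothetical): pages dispatch named actions, and each action resolves to a reusable FaaS API.

```typescript
type Action = (payload: Record<string, unknown>) => Promise<{ api: string; payload: Record<string, unknown> }>;

// In a real system each entry would invoke a deployed FaaS function;
// here we only record which API would be hit.
const actions: Record<string, Action> = {
  getProductDetail: async (payload) => ({ api: "faas/getProductDetail", payload }),
  addToCart: async (payload) => ({ api: "faas/addToCart", payload }),
};

// Any page can dispatch the same action, which is what makes the
// FaaS APIs reusable across pages.
async function dispatch(name: string, payload: Record<string, unknown>) {
  const action = actions[name];
  if (!action) throw new Error(`unknown action: ${name}`);
  return action(payload);
}
```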

3. Front-end light-application R&D efficiency: 1-5-10


After applying Serverless on the front end, we summarize its effect on front-end R&D efficiency as 1-5-10, which means:

1-minute quick start: we distilled the main scenarios into application templates. When a user or business team starts a new project, they simply pick the corresponding startup template, which quickly generates the scaffolding code; they then write only their own business function code and get started fast.

5-minute launch: fully reuse the Serverless operations platform, using its built-in capabilities such as gray release, together with front-end gateways, traffic switching, and other functions to complete canary tests.

10-minute troubleshooting: once a function is live, the platform displays business and system metrics. Metrics can trigger alarms, and error logs are pushed to users in the console, helping them locate and analyze problems quickly and grasp the health of the whole Serverless function within 10 minutes.
 

4. The effect of Serverless on the front end

What effect has Serverless had on the front end? We compared the efficiency and person-hours required by three applications under the traditional development model against the FaaS model. Efficiency clearly improved by 38.89% over the cloud-native baseline, a very impressive result for Serverless and front-end applications alike. Serverless scenarios now cover almost the entire group, helping business teams go serverless and achieve the two main goals of improving efficiency and reducing costs.

4. Technology output and expanding new scenarios

During the group's Serverless rollout we discovered many new business demands, such as: How can existing businesses migrate quickly and cheaply? Can execution time be made longer? Can resource specs be raised? And so on. We proposed solutions to these problems and abstracted them into product features. Below we introduce several of the more important ones:

1. Custom container images


The main purpose of custom images is to enable seamless migration of existing services, helping users migrate business code to the Serverless platform with zero code changes.

Migrating existing business is a major pain point. A team cannot sustain two R&D models for long; that causes heavy internal friction. To get business teams onto the Serverless R&D system, a thorough migration plan is needed: we must not only support Serverless for new business, but also help existing business migrate quickly at zero cost. That is why we launched the Custom Container feature.


Characteristics of traditional monolithic Web application scenarios:

  • Application modernization brings operational burdens such as fine-grained responsibility splitting and service governance;
  • Historical baggage makes going serverless hard: dependencies and configuration of business code differ on and off the cloud;
  • Capacity planning and self-built operations and monitoring systems are required;
  • Resource utilization is low (low-traffic services monopolize resources).

Advantages of Function Compute + container images:

  • Low-cost migration of monolithic applications;
  • Free operation and maintenance;
  • No capacity planning, automatic scaling;
  • 100% resource utilization, optimizing idle costs.

The custom container feature lets traditional monolithic Web applications (built on frameworks such as Spring Boot, WordPress, Flask, Express, or Rails) migrate to Function Compute in image mode without any modification, avoiding the resource waste of low-traffic services monopolizing servers. They also gain benefits such as no capacity planning, automatic scaling, and operations-free running.
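As a minimal sketch of image-mode migration (the base image, port, and entry file here are assumptions, not the platform's exact contract), an existing Express app is packaged unchanged, and the platform forwards invocations to the HTTP port the container listens on:

```dockerfile
# Illustrative only: package an existing Express app without code changes.
FROM node:16-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# The platform routes requests to the port the container serves on.
EXPOSE 9000
CMD ["node", "server.js"]
```

The key point is that nothing in `server.js` needs to know it is running on a function platform; the image is the migration unit.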

2. Performance instances


Performance instances reduce usage restrictions and open up more scenarios. For example, the code package limit rose from 50 MB to 500 MB, execution time rose from 10 minutes to 2 hours, and performance specs grew more than 4x, supporting large instances of up to 16 cores and 32 GB to help users run long, time-consuming tasks.


Function Compute already serves many scenarios, and along the way we received much feedback: too many constraints, a high bar to use, insufficient resources for compute-heavy scenarios. For these scenarios we introduced the performance instance feature, whose goal is to reduce Function Compute's usage restrictions and lower the bar to entry. In addition, users can flexibly configure execution time and other parameters on demand.

The 16-core, 32 GB instances we currently support have exactly the same computing power as same-spec ECS, and can be applied to high-performance business scenarios such as AI inference and audio/video transcoding. This feature is very important for expanding application scenarios going forward.

Challenges:

  • Elastic instances have many constraints and a certain usage bar, such as execution time and instance specs;
  • Compute-heavy scenarios such as traditional monolithic applications and audio/video processing require splitting and refactoring services, adding burden;
  • Elastic instances make no clear commitments on resource dimensions such as vCPU, memory, and bandwidth.

Goals:

  • Reduce Function Compute's usage restrictions and lower the bar for enterprise adoption;
  • Be compatible with traditional applications and compute-heavy scenarios;
  • Give users clear resource commitments.

Practice:

  • Introduce performance instances with higher specs and clearer resource commitments;
  • Going forward, performance instances will offer a higher stability SLA and richer feature configuration.

Main scenarios: compute-heavy tasks, long-running tasks, and tasks insensitive to elastic scaling.

  • Audio and video transcoding;
  • AI inference;
  • Other compute scenarios requiring high specs.

Advantages:

In addition to the relaxed limits, performance instances retain all the capabilities of the current Function Compute product: pay-as-you-go, reserved mode, multiple concurrent requests per instance, integration with multiple event sources, multi-zone disaster recovery, automatic scaling, application build and deployment, operations-free running, and more.
 

3. Link tracing


Link tracing capabilities include link reconstruction, topology analysis, and problem location.

A normal microservice cannot do all its work with a single function; it depends on upstream and downstream services. When those services are healthy, link tracing is usually unnecessary, but when a downstream service misbehaves, how do you locate the problem? That is where link tracing quickly reveals upstream/downstream performance bottlenecks or pinpoints where the problem occurred.
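To make concrete what a tracing system records per call, here is a minimal hand-rolled span model; a real deployment would use an OpenTracing-compatible tracer rather than this sketch, and the span names here are invented.

```typescript
interface Span {
  name: string;        // e.g. "fc.invoke", "rds.query"
  id: string;
  parentId?: string;   // links the span into the call tree
  startMs: number;
  endMs?: number;
}

let nextId = 0;

// Open a span, optionally as a child of another span.
function startSpan(name: string, parent?: Span): Span {
  return { name, id: String(nextId++), parentId: parent?.id, startMs: Date.now() };
}

// Close a span and return its duration; per-span durations are what
// surface the upstream/downstream bottleneck.
function finishSpan(span: Span): number {
  span.endMs = Date.now();
  return span.endMs - span.startMs;
}

const root = startSpan("fc.invoke");
const db = startSpan("rds.query", root);
finishSpan(db);
finishSpan(root);
```

The parent links are enough to reconstruct the call tree afterward, which is the "link reconstruction" and "topology analysis" the text describes.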

Function Compute also investigated many open-source tracing solutions inside and outside the group. It currently supports X-Trace, is compatible with open-source solutions, embraces open source, and provides OpenTracing-compatible product capabilities.

[Figures: link tracing demo]

The figure above is a demo of link tracing. Through Function Compute tracing you can visually see the database access overhead of back-end services, avoiding the troubleshooting difficulty brought by complex call relationships among a large number of services. Function Compute also supports function code-level link analysis, helping users optimize cold start and key code paths.

From a business perspective Serverless products bring huge benefits, but the encapsulation also brings a stage-specific problem: the black box. By providing link tracing and thereby exposing the black box to users, we let them improve their own business capabilities from what they learn. This is also a direction for improving the Serverless user experience; we will keep investing here to reduce the cost of using Serverless.

Challenges:

  • From a business perspective Serverless products bring huge benefits, but the encapsulation creates black-box problems;
  • Serverless connects to the cloud ecosystem, and the large number of cloud services creates complex call relationships;
  • Serverless developers still need link reconstruction, topology analysis, and problem location.

Main advantages of FC + X-Trace:

  • Function code-level link analysis helps optimize key code paths such as cold start;
  • Service call-level link tracing helps connect cloud ecosystem services and analyze distributed links.

4. Asynchronous configuration


In Serverless scenarios we provide capabilities such as offline task processing and message queue consumption; in Function Compute these account for about 50% of usage. With heavy message consumption, the business side often raises asynchronous-configuration questions: Where do these messages come from? Where do they go? Which services consume them? How long do they take? What is the consumption success rate? And so on. Making these questions visible and configurable is an important problem to solve.

[Figure: how asynchronous configuration works]

The figure above shows how asynchronous configuration works. An asynchronous invocation is triggered from the user-specified event source, and Function Compute returns a request ID immediately. The function then executes, and the execution result is returned to Function Compute or to the message queue MNS. Through the event source you can then configure triggers, topics, and so on, and consume the message again; for example, a message whose processing failed can be configured for a second round of processing.
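The retry-then-route flow above can be modeled in a few lines (this is an illustrative model, not the platform's real API): on failure the invocation is retried, and the final result is delivered to a success or failure destination for further consumption.

```typescript
type Destination = (r: { ok: boolean; payload: string }) => void;

// Run fn with up to maxRetries retries; route the terminal outcome to
// the appropriate destination (e.g. a queue or another function).
async function invokeAsync(
  fn: () => Promise<string>,
  maxRetries: number,
  onSuccess: Destination,
  onFailure: Destination,
): Promise<void> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const payload = await fn();
      onSuccess({ ok: true, payload });
      return;
    } catch (e) {
      if (attempt === maxRetries) onFailure({ ok: false, payload: String(e) });
    }
  }
}
```

A failure destination is exactly where "secondary processing" of a failed message would hook in.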


Typical application scenarios:

  • The first is the event closed loop, such as analyzing delivery results (e.g. collecting monitoring metrics, configuring alarms); on the production side, customers can not only consume events with FC but also use FC to actively produce events.
  • The second is routine exception handling, such as failure handling and retry strategies.
  • The third is resource reclamation: users can customize message retention time and discard useless messages promptly to save resources, a very significant optimization for asynchronous scenarios.

About the author :
Zhao Qingjie (Ling Lu) works on the Alibaba Cloud Native Serverless team, focusing on Serverless, PaaS, and distributed system architecture. He is committed to building a new generation of Serverless technology platform and making platform technology more inclusive. He previously worked at Baidu, where he was responsible for its largest internal PaaS platform, which carried 80% of online business, and he has extensive experience with PaaS and back-end distributed system architecture.

This article is compiled from the Serverless Live series live broadcast on January 26.
Live broadcast link: https://developer.aliyun.com/topic/serverless/practices


Origin: blog.51cto.com/14902238/2635490