Serverless observability: past, present, and future

Introduction: The observability of Function Compute has evolved from 1.0 to 2.0: from closed-door observability to open source observability, from platform observability to developer observability, and from FaaS-only observability to cloud-native observability.

Author: Xia Wan

Background

Serverless will become the default programming paradigm for the next decade of cloud

 

As the serverless concept has spread, developers have moved from wait-and-see to hands-on trials, and more and more enterprise users have begun migrating their businesses to serverless platforms. Within Alibaba Group, core features of Taobao, Fliggy, Xianyu, Gaode, and Yuque have steadily gone into production on serverless, and enterprises across the industry have unlocked a variety of serverless scenarios. Serverless is becoming the default programming paradigm for the next decade of cloud.

 

For more cases, see the Function Compute customer case studies.

 

Serverless's cost reduction, efficiency gains, and freedom from operations and maintenance have brought real benefits to developers. The serverless solution based on Function Compute saved Bluemo about 60% of its IT costs and saved Graphite Document 58% of its server costs; it improved development efficiency at Malong Technology, which launched a feature within two weeks; and it supports Sina Weibo's load, whose peak-to-valley ratio exceeds 5x, easily handling billions of requests every day.

 

Advertisement: You are welcome to join the cloud-native serverless team (Function Compute, Serverless Workflow, Serverless Application Engine) and help build an industry-leading serverless product portfolio across the public cloud, the Group, and the open source community. See the JD for job requirements. The positions are open long term. If you are interested, contact the author of this article or @Chang, Shuai(shuai.chang).

 

Is observability becoming a stumbling block for Serverless?

 

As developers use serverless more deeply, they have found that locating problems in a serverless architecture is harder than in traditional applications. The main reasons are as follows:


  • Distributed components: Serverless applications often combine multiple cloud services, and a request flows through multiple cloud products. Once end-to-end latency grows or performance falls short of expectations, locating the problem is complicated: you have to investigate each product in turn, step by step.


  • Scheduling black box: The serverless platform takes responsibility for request scheduling and resource allocation. Real-time elastic scaling brings unavoidable cold starts, and scaling neither requires developers' involvement nor is under their control. A cold start affects end-to-end latency. Did this request hit a cold start? Which steps took the time during the cold start? Is there room for optimization? These are questions developers are eager to have answered.


  • Execution environment black box: Developers are used to running code on their own machines. When something goes wrong, they log in to the machine, inspect the scene of the failure, and check the CPU, memory, and I/O of the execution environment. With a serverless application, the machine is not theirs; they cannot log in to it or even see it, and they are left completely in the dark.


  • Non-standardized products: In the serverless scenario, developers cannot control the execution environment, cannot install probes, and cannot use open source or third-party monitoring platforms. Their troubleshooting methods have to change and their accumulated troubleshooting experience no longer applies, which is very uncomfortable.

 

Function Compute is Alibaba Cloud's serverless product. Over the past year, the Function Compute team has put a lot of effort into answering these questions better.


This article introduces what Function Compute has tried in observability and the current state of Function Compute observability.

 

Observability under Serverless

 

Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.

--Wikipedia

 

In application development, observability helps us to judge the health of the system. When the system is running smoothly, it helps us assess risks and predict possible problems. When there is a problem in the system, it helps us quickly locate the problem and stop the loss in time.


A good observability system should help users discover problems as early as possible, locate them, and resolve them end to end.


In serverless, a platform that frees developers from operations and maintenance, observability is the developer's eyes. Without observability, how can we talk about high availability?

 

Observability 1.0


Figure 1: The basics of observability

 

Observability consists of three main parts: logs, metrics, and traces.


Like almost all FaaS products, Function Compute (FC) has supported viewing function logs and metrics since it was first commercialized.

 

  • Function logs

The user configures an SLS Project and Logstore in FC, and FC delivers the logs that the function writes to stdout to the user's Logstore. Users can view function logs in the SLS console, and analyze and aggregate them with SLS capabilities.


  • Basic metrics

FC pushes metric logs to CloudMonitor, which provides basic metrics such as the number of function invocations, the number of errors, function latency, and function memory usage.

 

Function logs and basic metrics are the application's stethoscope. Simple as they are, they can still help users discover and locate problems.
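
The function-log path described above (the function writes to stdout, and FC delivers those lines to the user's Logstore) can be illustrated with a minimal sketch. It assumes a Python function whose entry point takes `(event, context)` and a context object carrying a `request_id` attribute; both the handler shape and the attribute name are assumptions made for illustration, and `log_event` is a hypothetical helper, not part of any SDK. Emitting one JSON object per line is simply a convenient choice that keeps the logs easy to query and aggregate later.

```python
import json
import sys
import time


def log_event(level, message, **fields):
    """Write one JSON log line to stdout and return it (hypothetical helper)."""
    record = {"ts": round(time.time(), 3), "level": level, "message": message}
    record.update(fields)
    line = json.dumps(record, sort_keys=True)
    # stdout is the stream the article describes as being delivered to the Logstore.
    print(line, file=sys.stdout, flush=True)
    return line


def handler(event, context):
    """Hypothetical function entry point; the (event, context) shape is an assumption."""
    # `request_id` is an assumed attribute name, so it is read defensively.
    request_id = getattr(context, "request_id", "unknown")
    started = time.monotonic()
    log_event("INFO", "request received", request_id=request_id, size=len(event))
    # ... business logic would go here ...
    elapsed_ms = round((time.monotonic() - started) * 1000, 2)
    log_event("INFO", "request finished", request_id=request_id, elapsed_ms=elapsed_ms)
    return "ok"
```

Tagging every line with the request ID is what later makes it possible to pull together all log lines that belong to a single invocation.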


Even when developers ran into problems they could not troubleshoot, in the days when the user base was still small, our engineers could provide hands-on support and use backend logs to help users locate problems.

 

For details on using function logs and metrics, see Configure and view function logs and Monitoring metrics.

 

Observability 2.0 - cloud-native observability

 

As serverless has developed, more and more scenarios have landed on it, usage has grown, and product architectures have become more complex. The stethoscope of observability 1.0 can no longer meet the monitoring needs of developers across industries. The nearly black-box execution environment leaves developers with a strong sense of distance and distrust.


Developers need to be in control of their applications. They want to know what each request goes through inside Function Compute and whether a long end-to-end latency was caused by a cold start. They want to inspect the execution environment of a function instance, locate the problem immediately when a request fails, and reuse the open source observability platforms they already know.

 

Faced with these needs, the team went through a long period of heated discussion. Some colleagues believed we should support them. Others believed these needs run counter to the essence of serverless to some extent: serverless exists to hide the underlying computing resources, so users should not need to care about them. Besides, what use are these metrics? Even if users can see a cold start, the system time consumed, and the CPU of the underlying instance, there is nothing they can actually do about it. Are these metrics really meaningful? The two camps argued back and forth, and I was firmly in the opposing camp.

 

Later, the team moved to EFC, where every day we waited for an elevator without knowing when it would come (you enter your destination floor, go to the assigned elevator, and wait quietly, with no way to see which floor the elevator is on). The elevator tells us: just wait here, I will definitely come. You don't need to know which floor I am on now or when I will arrive; knowing would be useless anyway. My scheduling is certainly optimal, and you have to trust a professional elevator dispatching algorithm. But how can I trust you?

 

For developers, Function Compute is that same elevator that arrives who knows when. We tell developers: we will execute your requests reliably, your execution environment is certainly healthy, and if there are too many requests we will scale out automatically. But you don't need to know the metrics of the current instance or when we scale out. Our scheduling is certainly optimal, and you have to trust the scheduling algorithm of a professional R&D team. By the same token, how can developers trust us?

 

Serverless observability is not only about helping developers troubleshoot problems; it is also about gradually lifting the veil on serverless and winning developers' trust in it.

 

So Function Compute observability 2.0 was born. We hope observability 2.0 can become the application's electrocardiogram.


Figure 2: The current state of Function Compute observability

 

  • To show what a request goes through over its life in Function Compute, connect the upstream and downstream services of a distributed system, and embrace open source observability, we integrated OpenTracing to support distributed tracing.
  • To expose system status and provide application-level monitoring, we integrated ARMS (Java) for built-in APM capabilities.
  • To speed up end-to-end problem location, we support request-level metrics (FC Insights) and released the monitoring center, a one-stop solution for problem discovery and investigation.
  • To stay compatible with developers' existing experience, we embrace open source: we integrated OpenTracing and support Grafana dashboards. We also support third-party monitoring platforms, so an APM monitoring system can be connected with almost no code changes.
  • To stay compatible with the observability experience of traditional developers and support probe installation, we extended the programming model with function LifeCycle support, which makes integrating third-party monitoring possible.

 

Figure 3: Function Compute is compatible with open source observability

 

Rather than inventing a brand-new FaaS observability experience of our own, Function Compute is compatible with open source observability: it integrates with Jaeger, supports Grafana dashboards, and supports connecting to excellent third-party monitoring platforms such as New Relic with very small changes. Function Compute is the first FaaS provider that is compatible with open source observability and embraces the container ecosystem and cloud-native developers. A smooth migration of the observability experience in turn supports the smooth migration of applications between containers and serverless platforms.

 

Integrating OpenTracing to support distributed tracing

 

FC integrates with the distributed tracing service to give developers tools such as complete call chain reconstruction, call volume statistics, trace topology analysis, and cold start location, helping developers quickly analyze and diagnose performance bottlenecks in a distributed architecture.


FC tracing has the following characteristics:


• Embracing open source: Fully compatible with the OpenTracing protocol, with no additional learning cost;

• Proactive recording: Reports the end-to-end time a request spends inside Function Compute;

• Transparent scheduling: Exposes code preparation time and instance startup time, making FC the first FaaS product to expose cold start latency and its detailed breakdown;

• Linking upstream and downstream: Connects to upstream applications through the span context, and passes the span context into the function so that it can connect to downstream services.
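
The span-context hand-off described in the list above can be sketched in a few lines. This is a deliberately simplified, standard-library-only model of the OpenTracing concepts (span, span context, inject, extract); it is not the OpenTracing API and not FC's implementation. The header names `x-trace-id` and `x-span-id`, the operation names, and every function in the block are hypothetical, chosen only to show how one trace ID ties together the upstream call, the cold start steps, and the downstream call.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # shared by every span in one end-to-end request
    span_id: str   # unique per span


@dataclass
class Span:
    operation: str
    context: SpanContext
    parent_id: Optional[str] = None
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def finish(self) -> None:
        self.end = time.monotonic()


def start_span(operation: str, parent: Optional[SpanContext] = None) -> Span:
    """Start a root span, or a child span that inherits the parent's trace ID."""
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    context = SpanContext(trace_id, uuid.uuid4().hex[:16])
    return Span(operation, context, parent_id=parent.span_id if parent else None)


def inject(context: SpanContext, headers: Dict[str, str]) -> None:
    """Copy the span context into outgoing request headers (hypothetical names)."""
    headers["x-trace-id"] = context.trace_id
    headers["x-span-id"] = context.span_id


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Rebuild the caller's span context from incoming headers, if present."""
    if "x-trace-id" in headers and "x-span-id" in headers:
        return SpanContext(headers["x-trace-id"], headers["x-span-id"])
    return None


def simulate_request() -> List[Span]:
    """Model one traced invocation: upstream -> platform -> function -> downstream."""
    spans: List[Span] = []
    # Upstream application starts the trace and injects its context.
    upstream = start_span("upstream_call")
    headers: Dict[str, str] = {}
    inject(upstream.context, headers)

    # Platform side: continue the same trace and record the cold start steps.
    invoke = start_span("invoke_function", parent=extract(headers))
    for step in ("cold_start.code_preparation", "cold_start.instance_start"):
        child = start_span(step, parent=invoke.context)
        child.finish()
        spans.append(child)

    # Function side: the span context passed in is reused for downstream calls.
    downstream = start_span("downstream_call", parent=invoke.context)
    downstream.finish()
    invoke.finish()
    upstream.finish()
    spans.extend([downstream, invoke, upstream])
    return spans
```

Because every span carries the same trace ID and a pointer to its parent, a tracing backend can rebuild the full call tree and show how much of the end-to-end latency was spent in each cold start step.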


Figure 4: Example of a trace

 

Figure 5: Details of the tracing capabilities

 

Integrating ARMS for built-in APM capabilities

 

FC connects seamlessly with ARMS application monitoring. Developers only need to add an environment variable to a function to enable APM. The ARMS probe monitors application performance without intruding on the code and provides application-level observability, including the function instance's CPU and memory metrics, Java virtual machine metrics, code profiling information, and SQL queries.


Figure 6: ARMS example

 

Releasing the monitoring center (Insights), a one-stop solution for problem discovery and investigation

 

FC supports request-level metrics: by creating a metric log for every user request, it effectively points a camera at each request. Through request-level metrics, users can clearly see a request's execution time, the memory it used, whether it failed, the error type, whether it hit a cold start, its traceID, and other information. Request-level metrics also make it possible to chain all of the observability capabilities together.
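
To make the idea of one metric record per request concrete, here is a minimal sketch. The `RequestMetric` fields mirror the information listed above (execution time, memory, error type, cold start, trace ID), but the field names, the record layout, and both helper functions are assumptions for illustration; the article does not show FC's actual metric log schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class RequestMetric:
    """One hypothetical metric record per invocation."""
    request_id: str
    function: str
    duration_ms: float
    memory_mb: float
    cold_start: bool
    trace_id: str
    error_type: Optional[str] = None  # None means the request succeeded


def summarize(records: Iterable[RequestMetric]) -> Dict[str, Dict[str, int]]:
    """Roll request-level records up into per-function counters."""
    summary: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"invocations": 0, "errors": 0, "cold_starts": 0}
    )
    for record in records:
        counters = summary[record.function]
        counters["invocations"] += 1
        counters["errors"] += int(record.error_type is not None)
        counters["cold_starts"] += int(record.cold_start)
    return dict(summary)


def traces_to_inspect(records: Iterable[RequestMetric], slow_ms: float) -> List[str]:
    """Return trace IDs of failed or slow requests, the ones worth opening in tracing."""
    return [
        r.trace_id
        for r in records
        if r.error_type is not None or r.duration_ms >= slow_ms
    ]
```

The second helper shows the chaining idea: a request-level record carries the trace ID, so an anomaly found in the metrics leads straight to the matching trace and, through the request ID, to the matching log lines.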

 

The monitoring center brings all of FC's observability capabilities together. It integrates Metrics, Logs, and Tracing, so users can preview metrics, view logs, and analyze traces in one place, with the goal of a one-stop solution for problem discovery and investigation.

 

The monitoring center has the following characteristics:

 

  • Multi-dimensional: Supports metrics along the Region, Service, Function, Qualifier, and Request dimensions, showing the number of invocations and the error distribution in each dimension;
  • Multi-level: Integrates the capabilities of Metrics, Logs, and Tracing to monitor applications from every angle at multiple levels;
  • Full chain: Combines metrics, logs, traces, and other information to peel back the layers step by step, so that problems really can be discovered, located, and solved in one place.

Figure 7: Example of the monitoring center

 

Extending the programming model to integrate third-party monitoring

 

The life cycle of a function instance is controlled entirely by the platform. Users cannot control when an instance is started or reclaimed, and they are not aware of when an instance is paused or resumed. This makes it extremely difficult to run background threads other than the main thread in Function Compute, and a monitoring probe is one of the most important of those background threads.

 

FC extended the programming model and released the Runtime LifeCycle feature. Runtime LifeCycle listens for the life cycle events of a function instance and lets the instance call back into the user's function logic before it is suspended or reclaimed. This feature makes it possible for FC to integrate third-party APM monitoring: users only need to send out the collected metrics before the instance is suspended and clear the in-memory data before the instance is reclaimed, and they can then view monitoring metrics in real time on the APM platform.
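
The flush-before-suspend, clear-before-reclaim pattern described above can be sketched as follows. The callback names `pre_freeze` and `pre_stop`, their `(context)` signature, and the `MetricBuffer` class are assumptions made for this illustration; the article only states that the instance calls back into user logic before it is suspended and before it is reclaimed. The `send` callable stands in for whatever client a third-party APM platform would provide.

```python
from typing import Callable, Dict, List

Metric = Dict[str, float]


class MetricBuffer:
    """Collects metrics in memory and hands them to `send` in batches."""

    def __init__(self, send: Callable[[List[Metric]], None]) -> None:
        self._send = send
        self._pending: List[Metric] = []

    def record(self, name: str, value: float) -> None:
        self._pending.append({name: value})

    def flush(self) -> int:
        """Send everything collected so far; return how many metrics were sent."""
        if not self._pending:
            return 0
        batch, self._pending = self._pending, []
        self._send(batch)
        return len(batch)

    def clear(self) -> None:
        self._pending = []

    def pending_count(self) -> int:
        return len(self._pending)


# Stand-in for an APM client: batches are simply appended to a list.
sent_batches: List[List[Metric]] = []
metrics = MetricBuffer(sent_batches.append)


def handler(event, context):
    """Hypothetical request handler: only records, never blocks on the network."""
    metrics.record("invocations", 1.0)
    return "ok"


def pre_freeze(context):
    """Assumed callback before the instance is suspended: send out collected metrics."""
    metrics.flush()


def pre_stop(context):
    """Assumed callback before the instance is reclaimed: send the rest, then clear memory."""
    metrics.flush()
    metrics.clear()
```

Keeping the network call out of the request path and in the life cycle callbacks is the design point here: without such callbacks, a background flush thread could be paused along with the instance and the buffered metrics would arrive late or not at all.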


Figure 8: Tingyun APM example


Figure 9: NewRelic APM example


Summary

 

The observability of Function Compute has evolved from 1.0 to 2.0: from closed-door observability to open source observability, from platform observability to developer observability, and from FaaS-only observability to cloud-native observability.


As the first FaaS provider that is compatible with open source observability and embraces the container ecosystem and cloud-native developers, Function Compute will also be better able to migrate developers' businesses smoothly.

 

Future plans

 

FC observability has taken a small step forward compared with a year ago, evolving from black-box observability to observability by faint candlelight, but there is still a long way to go before we reach the goal of "white-boxing serverless applications". We hope to stay compatible with developers' monitoring experience and help users move their businesses to serverless smoothly and with confidence.


Next, we will continue to invest in the following:

 

  • Improve the monitoring center with alert configuration and early warning on abnormal metrics;
  • Provide instance-level metrics, so that code problems can be located and the state of the environment at the time of a failure can be traced back;
  • Integrate open source projects: integrate Prometheus and OpenTelemetry, and provide Grafana dashboards;
  • Enrich the metrics. Some metrics are still not easy to expose, and we will expose them gradually;
  •  ... ...

 

I hope Function Compute observability will become a light that illuminates every serverless application.

Original link: https://developer.aliyun.com/article/782787?



Origin blog.csdn.net/alitech2017/article/details/115001213