Observability in Function Compute


Author | Xia Guan, Alibaba Function Compute Team
This article is compiled from the "Serverless Technology Open Course".

Introduction: This article has three parts. The overview introduces the basic concept of observability and its three aspects: Logging, Metrics, and Tracing. The second part describes Logging, Metrics, and Tracing in Function Compute in detail. The last part uses several common scenarios to show how to quickly locate and solve problems in Function Compute.

Overview

What is observability? According to Wikipedia, observability is a measure of how well the internal state of a system can be inferred from its external outputs.

In application development, observability helps us judge the health of a system. When something goes wrong, it helps us locate, troubleshoot, and analyze the problem; when the system is running smoothly, it helps us assess risks and predict possible problems. Assessing risk is similar to weather forecasting: if rain is predicted for tomorrow, you bring an umbrella when you go out. In Function Compute, if a function's concurrency is observed to keep increasing, the business is probably growing rapidly, perhaps thanks to the hard work of the business promotion team. To avoid hitting the concurrency limit and triggering throttling, developers need to raise the concurrency limit in advance.


Observability includes three aspects: Logging, Metrics, and Tracing.

  • Logging means logs. Logs record key information while a function runs. This information is discrete and specific; combining the error log with the function code makes it possible to locate a problem quickly.
  • Metrics are aggregated data, usually displayed as charts. Core metrics such as TPS and error rate reflect how the function is running and how healthy it is.
  • Tracing is request-level tracking. In a distributed system, it shows the latency of a request in each module and helps analyze performance bottlenecks.

Logging, Metrics, and Tracing in Function Compute

1. Logs

How do you view function logs in Function Compute? In traditional server-based development, you write logs to a file on disk and then collect the file contents with a log collection tool. In Function Compute, developers do not maintain servers, so how are the logs printed by the code collected?

1) Configuring logs

Function Compute is integrated with Log Service, so function logs can be recorded in a Logstore provided by the developer. Logging is one item in the service configuration: once a LogProject and a Logstore are configured for a service, the logs that all functions under that service print to stdout are collected in the corresponding Logstore.

2) Writing logs

How do you write logs? Can logs printed directly with console.log or print in the code be collected? The answer is yes. The standard printing facilities of each language write to stdout, for example console.log() in Node.js, print() in Python, and fmt.Println() in Go. Function Compute collects everything printed to stdout and uploads it to the Logstore.

Function Compute is invoked per request: each invocation corresponds to one request, and each request has its own requestID. When the number of requests is large, the volume of logs is massive. How do you tell which log lines belong to which request? The requestID has to be recorded together with each log line. Function Compute provides built-in logging statements that print the requestID with every log line, which makes log filtering easier.
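The idea of attaching the requestID to every log line can be sketched as follows. This is only an illustration, not the runtime's built-in logger: it assumes the handler receives a context object exposing a request_id attribute, and FakeContext and the log helper are hypothetical names introduced here so the sketch runs locally.

```python
import sys
from dataclasses import dataclass


@dataclass
class FakeContext:
    """Hypothetical stand-in for the context object passed to a handler."""
    request_id: str


def log(context, level, message, stream=None):
    """Write one log line, prefixed with the request ID, to stdout by default."""
    if stream is None:
        stream = sys.stdout
    line = f"{context.request_id} [{level}] {message}"
    print(line, file=stream)
    return line


def handler(event, context):
    """Example entry point: every line it logs carries the request ID."""
    log(context, "INFO", f"received event of {len(event)} bytes")
    log(context, "INFO", "processing finished")
    return "ok"


if __name__ == "__main__":
    handler(b"hello", FakeContext(request_id="req-0001"))
```

Because the prefix is written by a single helper, every line from the same invocation can later be found by searching for its requestID.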

3) Viewing logs

Once function logs are collected in a Logstore of Log Service, you can log in to the Log Service console to view them.

The Function Compute console is also integrated with Log Service, so logs can be viewed there as well. The Function Compute console offers two ways to query logs:

  • Simple query: lists the logs for each requestID, and lets you filter logs by requestID.
  • Advanced query: embeds Log Service, so logs can be queried with SQL statements.
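What filtering by requestID amounts to can be illustrated with a small local sketch. It does not call the console or Log Service; it assumes, purely for illustration, that each log line starts with the requestID followed by a space, and both function names are hypothetical.

```python
from collections import defaultdict


def group_by_request(lines):
    """Group raw log lines by their leading request ID token."""
    groups = defaultdict(list)
    for line in lines:
        request_id, _, rest = line.partition(" ")
        groups[request_id].append(rest)
    return dict(groups)


def filter_by_request(lines, request_id):
    """Return only the lines that belong to the given request ID."""
    prefix = request_id + " "
    return [line for line in lines if line.startswith(prefix)]


if __name__ == "__main__":
    sample = [
        "req-1 [INFO] start",
        "req-2 [INFO] start",
        "req-1 [ERROR] database timeout",
        "req-2 [INFO] done",
    ]
    for line in filter_by_request(sample, "req-1"):
        print(line)
```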

Click the link to watch the demo: https://developer.aliyun.com/lesson_2024_18996

2. Metrics

Ways to view metrics:

  • View monitoring metrics on the function details page: FC provides a rich set of system metrics, which can be viewed in the Function Compute console without any configuration.
  • Configure a log dashboard: the log dashboard shows the monitoring metrics provided by Function Compute and can also be tied to the developer's own logs to generate custom monitoring metrics.
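How discrete log records turn into aggregated metrics can be sketched as below. This is a local illustration under stated assumptions, not how the console computes its charts: each record is assumed to be a (unix_seconds, is_error) pair extracted from the logs, and metrics_per_minute is a hypothetical helper name.

```python
from collections import Counter


def metrics_per_minute(records):
    """Aggregate (unix_seconds, is_error) records into per-minute metrics.

    Returns {minute_start: (tps, error_rate)}, where tps is the average
    number of requests per second within that minute.
    """
    totals = Counter()
    errors = Counter()
    for ts, is_error in records:
        minute = int(ts) // 60 * 60  # start of the minute the record falls in
        totals[minute] += 1
        if is_error:
            errors[minute] += 1
    return {m: (totals[m] / 60, errors[m] / totals[m]) for m in sorted(totals)}


if __name__ == "__main__":
    sample = [(0, False), (10, True), (59, False), (60, True), (61, True)]
    for minute, (tps, error_rate) in metrics_per_minute(sample).items():
        print(minute, round(tps, 3), round(error_rate, 3))
```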

3. Tracing

(Figure: waterfall chart of request latency in each link)

Tracing is an important tool for troubleshooting distributed systems: it shows the latency of a request in each link of the system. There are several cases:

  • When Function Compute is one link in the overall chain, you can see the latency of the request within Function Compute. This latency includes the system startup time and the actual execution time of the request, which helps users analyze performance bottlenecks.
  • If you call the FC SDK inside a function, the latency of each SDK API call is visible by default.
  • When the function code accesses other products such as databases, developers can manually instrument the function to analyze that latency.
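Manual instrumentation of the kind described in the last case can be sketched with a small timing helper. This is not the tracing SDK used by Function Compute; Trace, span, and waterfall are hypothetical names, and the sleep calls stand in for a database query and other work.

```python
import time
from contextlib import contextmanager


class Trace:
    """Collects (name, start_offset, duration) spans for one request."""

    def __init__(self):
        self.origin = time.perf_counter()
        self.spans = []

    @contextmanager
    def span(self, name):
        """Time the enclosed block and record it as a span."""
        start = time.perf_counter()
        try:
            yield
        finally:
            end = time.perf_counter()
            self.spans.append((name, start - self.origin, end - start))

    def waterfall(self, width=40):
        """Render the recorded spans as a text waterfall chart."""
        total = max((off + dur for _, off, dur in self.spans), default=0.0) or 1e-9
        rows = []
        for name, off, dur in self.spans:
            pad = int(off / total * width)
            bar = max(1, int(dur / total * width))
            rows.append(f"{name:<12}{' ' * pad}{'#' * bar} {dur * 1000:.1f} ms")
        return "\n".join(rows)


def handler(event, context=None):
    trace = Trace()
    with trace.span("query_db"):
        time.sleep(0.02)  # placeholder for a database call
    with trace.span("render"):
        time.sleep(0.01)  # placeholder for the remaining work
    return trace


if __name__ == "__main__":
    print(handler(None).waterfall())
```

Wrapping only the suspicious calls keeps the overhead low while still showing which step dominates the request.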

Troubleshooting

Function Compute provides many observability features, so how do you use them to locate a problem? Here are a few example scenarios.

Scenario 1: After a new version is released, the function's error rate increases

After a release, watch the function's metrics. As soon as the error rate rises, roll back immediately to avoid an outage. Then check the function logs to locate the cause of the error, fix the problem, and release again.
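The rollback decision in this scenario can be sketched as a comparison of error rates before and after the release. The thresholds below are illustrative assumptions rather than recommended values, and should_roll_back is a hypothetical helper.

```python
def should_roll_back(baseline_errors, baseline_total, new_errors, new_total,
                     max_increase=0.01, min_requests=100):
    """Decide whether the new version's error rate justifies a rollback.

    max_increase: tolerated absolute increase in error rate (assumed value).
    min_requests: minimum traffic on the new version before judging it
                  (assumed value).
    """
    if new_total < min_requests:
        return False  # not enough data on the new version yet
    baseline_rate = baseline_errors / baseline_total if baseline_total else 0.0
    new_rate = new_errors / new_total
    return new_rate - baseline_rate > max_increase


if __name__ == "__main__":
    # 0.5% errors before the release, 5% after it
    print(should_roll_back(5, 1000, 50, 1000))
```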

Scenario 2: The function performs poorly, always takes a long time to execute, or even times out

Turn on tracing, instrument the parts of the function that may be time-consuming, view the waterfall chart of the request, locate the cause of the long execution time, and fix the problem.

Scenario 3: Business volume is growing rapidly, and concurrency is about to reach the concurrency limit

Check the current concurrency through metrics. When you observe that concurrency keeps increasing, contact the Function Compute team in time to raise the concurrency limit.
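The "forecast" in this scenario can be sketched as a linear extrapolation of recent concurrency samples. It assumes roughly linear growth, which real traffic may not follow, and minutes_until_limit is a hypothetical helper; the samples would come from the concurrency metric.

```python
def minutes_until_limit(samples, limit):
    """Estimate minutes from the last sample until concurrency hits the limit.

    samples: list of (minute, concurrency) pairs, ordered by time.
    Fits a least-squares line and extrapolates it. Returns None when there
    are too few samples or concurrency is not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (c - mean_c) for t, c in samples) / var_t
    if slope <= 0:
        return None  # flat or falling: no projected crossing
    intercept = mean_c - slope * mean_t
    t_hit = (limit - intercept) / slope
    return max(0.0, t_hit - samples[-1][0])


if __name__ == "__main__":
    recent = [(0, 100), (10, 150), (20, 200)]
    print(minutes_until_limit(recent, limit=300))
```

An estimate like this gives a rough lead time for requesting a higher limit before throttling starts.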



Origin blog.51cto.com/14902238/2562955