Distributed system observability: monitoring application business metrics

After the 2017 Distributed Tracing Summit, Peter Bourgon wrote a summary article, "Metrics, Tracing, and Logging", which systematically lays out the definitions and characteristics of the three, as well as the relationships and differences between them. In that article, the observability problem is framed as how to handle three types of data: metrics, traces, and logs.

Later, in her book "Distributed Systems Observability", Cindy Sridharan described metrics, traces, and logs as the three pillars of observability.

Relationships and differences between metrics, traces, and logs

In 2018, the CNCF Landscape took the lead in introducing the concept of observability, bringing it from cybernetics into the IT field. In cybernetics, observability refers to the degree to which a system's internal state can be inferred from its external outputs. The more observable a system is, the better we can control it.

What problems can observability solve? Chapter 12 of the Google SRE book gives a concise answer: quick troubleshooting.

In the cloud-native era, distributed systems are becoming more and more complex and change very frequently, and each change may introduce new types of failures. If an application lacks effective monitoring after it goes live, we are likely to remain unaware of the problems it runs into and will only learn that something is wrong from user feedback.

This article mainly describes how to set up metrics monitoring for application business indicators and how to achieve precise alerting. Metrics (which can also be translated as measures or indicators) are key pieces of information that are collected at regular intervals in aggregated, numerical form and plotted as trend charts. Through them, we can observe the status and trends of the system.
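To make "aggregated and numerical" concrete, here is a minimal sketch in plain Java with no monitoring library involved. The class name `OrderMetrics`, the method names, and the "order status" business event are all hypothetical and chosen only for illustration: individual events are not stored, only a running count per status, which a monitoring system could then sample at regular intervals to draw a trend chart.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical example class: aggregates business events into numeric counters.
final class OrderMetrics {
    // One running count per order status; LongAdder keeps increments cheap under contention.
    private final Map<String, LongAdder> countsByStatus = new ConcurrentHashMap<>();

    // Called once per business event; only the aggregate is kept, not the event itself.
    void recordOrder(String status) {
        countsByStatus.computeIfAbsent(status, k -> new LongAdder()).increment();
    }

    // A point-in-time view of all counters, sorted by status for stable output.
    Map<String, Long> snapshot() {
        Map<String, Long> result = new TreeMap<>();
        countsByStatus.forEach((status, adder) -> result.put(status, adder.sum()));
        return result;
    }

    public static void main(String[] args) {
        OrderMetrics metrics = new OrderMetrics();
        metrics.recordOrder("success");
        metrics.recordOrder("success");
        metrics.recordOrder("failed");
        System.out.println(metrics.snapshot()); // prints {failed=1, success=2}
    }
}
```

Sampling `snapshot()` periodically and plotting the values over time gives the kind of trend chart described above. A metrics library typically handles this bookkeeping in a real application.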

Technology stack selection

All of our applications are Spring Boot applications and use Spring Boot Actuator to implement health checks. Starting with Spring Boot 2.0, Actuator is built on Micrometer, which provides more powerful and flexible monitoring capabilities. Micrometer supports integration with a variety of monitoring systems, including Prometheus.

Therefore, we chose Micrometer to collect business metrics, Prometheus to store and query them, Grafana to display them, and Alibaba Cloud's alert center to deliver precise alerts.
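As a rough illustration of the hand-off between the application and Prometheus in this pipeline, the sketch below renders a single counter sample in the Prometheus text exposition format, which is what Prometheus reads when it scrapes an application. This is plain Java and not Micrometer's API: in a Spring Boot application, Micrometer's Prometheus registry produces this output for you. The class name `PrometheusTextSketch`, the metric name `orders_created_total`, and the `status` label are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper: shows the shape of the text that Prometheus scrapes.
final class PrometheusTextSketch {

    // Renders one counter sample with its HELP and TYPE header lines.
    // Simplification: label values are assumed to need no escaping
    // (no backslashes, double quotes, or newlines).
    static String renderCounter(String name, String help,
                                Map<String, String> labels, double value) {
        // Sort labels by name so the output is deterministic.
        Map<String, String> sorted = new TreeMap<>(labels);
        String labelText = sorted.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        String series = labelText.isEmpty() ? name : name + "{" + labelText + "}";
        return "# HELP " + name + " " + help + "\n"
                + "# TYPE " + name + " counter\n"
                + series + " " + value + "\n";
    }

    public static void main(String[] args) {
        // Prints the HELP line, the TYPE line, and one sample line.
        System.out.print(renderCounter(
                "orders_created_total", "Orders created",
                Map.of("status", "success"), 2.0));
    }
}
```

Prometheus stores each scraped sample as a time series identified by the metric name and its labels. Grafana dashboards and alert rules are then written as queries over those series.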

Solution Architecture

Origin blog.csdn.net/shupili141005/article/details/128069418