Exploration and Practice of Apache Dubbo Cloud Native Observability

Dubbo3 observability overview

Apache Dubbo3 has significantly upgraded its cloud-native observability. With the latest version of Dubbo3, you only need to add the dubbo-spring-boot-observability-starter dependency, and the microservice cluster natively gains the following capabilities:

Capability 1: Visualize traffic metrics and health status for the cluster and for individual instances

The latest Dubbo 3.2 release supports observing runtime status at different granularities, such as application, single instance, and single service. The data includes QPS, RT (response time), thread pool usage, error classification statistics, and more.

Capability 2: End-to-end tracing

The latest Dubbo 3.2 release collects trace data for each RPC request through a built-in tracing filter, and then exports that data to the major tracing backends through an exporter.

https://cn.dubbo.apache.org/zh-cn/overview/tasks/observability/

Exploration of Cloud Native Observability

Challenges of cloud-native upgrades

In the first stage of high-quality delivery, DevOps safeguards the quality and efficiency of development and testing, while cloud native safeguards the efficiency and quality of operations and deployment. However, large-scale rapid iteration means frequent changes, and the stability problems caused by changes and by running the system cannot be ignored. Downtime, network failures, system abnormalities, and many unknown problems are unavoidable. An observability system helps detect problems promptly, analyze anomalies efficiently, and restore the system quickly. It also helps avoid known problems in advance and dig into unknown ones, which improves the quality of operations. Building a complete observability platform is therefore necessary for discovering known and unknown anomalies and improving system stability.

Goals of Dubbo observability

As a microservice RPC framework, Dubbo is not positioned to build a large, all-encompassing observability system itself, and doing so would not be realistic. It can, however, provide more basic monitoring data from its own vantage point to help enterprises build their observability systems. Unlike traditional single-dimensional monitoring, observability pays more attention to how data correlates, and it analyzes problems as a whole from both single-dimensional and multi-dimensional perspectives.

The starting point is the well-known three pillars of observability: metrics, tracing, and logging. On this basis, Dubbo provides multi-dimensional aggregated and non-aggregated metrics to help users quickly discover and diagnose problems. These metrics can then be correlated with the tracing system through label information such as application and host. The tracing system provides request-level analysis of performance and errors, and Dubbo provides a tracing facade that connects to the major tracing vendors. After trace analysis, detailed logs can be located through trace data such as the TraceId, the SpanId, and custom data. In those logs, Dubbo provides expert advice and error codes that help developers and operators quickly diagnose and locate problems.

Dubbo multi-dimensional metrics system

The Dubbo multi-dimensional metrics system can be viewed from a vertical and a horizontal perspective. Vertically, Dubbo provides an easy-to-use facade, stores the collected metrics in an in-memory metrics container, decides by metric type whether to aggregate them, and finally exports them to different metrics systems. Horizontally, collection covers the scenarios that are most prone to problems: the RPC request path, interactions with the three centers (registry, configuration center, and metadata center), and thread resource usage.
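The vertical flow described above (facade, in-memory container, optional aggregation, export) can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `MetricsPipelineSketch`, its methods, and the `demo_*` metric names are hypothetical and are not Dubbo classes or Dubbo metric names. Counters are stored as-is, while response times are aggregated into a running maximum before export.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of facade -> container -> aggregation -> export.
// None of these names come from Dubbo.
class MetricsPipelineSketch {

    // In-memory container for non-aggregated counters.
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();
    // In-memory container for an aggregated value (running maximum).
    private final Map<String, LongAccumulator> maxima = new ConcurrentHashMap<>();

    private static String key(String metric, String application) {
        return metric + "{application=\"" + application + "\"}";
    }

    // Facade method: callers report an event without knowing how it is stored.
    void increment(String metric, String application) {
        counters.computeIfAbsent(key(metric, application), k -> new LongAdder()).increment();
    }

    // Facade method for response times: this metric type is aggregated (max).
    void recordRt(String metric, String application, long millis) {
        maxima.computeIfAbsent(key(metric + "_max", application),
                k -> new LongAccumulator(Long::max, 0L)).accumulate(millis);
    }

    // Export step: render every stored value as one "name{labels} value" line.
    String export() {
        Map<String, Long> snapshot = new TreeMap<>(); // sorted for stable output
        counters.forEach((k, v) -> snapshot.put(k, v.sum()));
        maxima.forEach((k, v) -> snapshot.put(k, v.get()));
        StringBuilder out = new StringBuilder();
        snapshot.forEach((k, v) -> out.append(k).append(' ').append(v).append('\n'));
        return out.toString();
    }

    public static void main(String[] args) {
        MetricsPipelineSketch metrics = new MetricsPipelineSketch();
        metrics.increment("demo_requests_total", "app-a");
        metrics.recordRt("demo_rt_milliseconds", "app-a", 12);
        System.out.print(metrics.export());
    }
}
```

Keeping the facade separate from the export step is what allows the same recorded data to be sent to different metrics systems.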

Which metrics does the Dubbo multi-dimensional metrics system collect?

The previous section described metrics collection at a high level, but which specific metrics should Dubbo collect? The following methodologies are the ones Dubbo refers to when choosing metrics.

The four golden signals (from the Google SRE book): Google's summary of its experience monitoring large distributed systems. The four golden signals (latency, traffic, errors, and saturation) help measure end-user experience, service interruption, and business impact at the service level.

The RED method (from Tom Wilkie): The RED method focuses on requests, the actual work done, and the external view (that is, the service consumer's perspective). It covers rate, errors, and duration.

The USE method (from Brendan Gregg): The USE method looks at resources from the inside. It covers utilization, saturation, and errors.
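To make the RED method concrete, the sketch below computes rate, error ratio, and mean duration from a batch of observed requests. It is a minimal illustration, not how Dubbo computes its metrics: the `RedMetricsSketch` class, the `Request` type, and the choice of a mean (rather than a percentile) for duration are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the three RED values for one observation window.
class RedMetricsSketch {

    // One observed request: how long it took and whether it failed.
    static final class Request {
        final long durationMillis;
        final boolean failed;

        Request(long durationMillis, boolean failed) {
            this.durationMillis = durationMillis;
            this.failed = failed;
        }
    }

    // Rate: requests per second over the observation window.
    static double rate(List<Request> requests, double windowSeconds) {
        return requests.size() / windowSeconds;
    }

    // Errors: fraction of requests that failed (0.0 when there were none).
    static double errorRatio(List<Request> requests) {
        if (requests.isEmpty()) {
            return 0.0;
        }
        long failed = requests.stream().filter(r -> r.failed).count();
        return (double) failed / requests.size();
    }

    // Duration: mean latency in milliseconds (0.0 when there were none).
    static double meanDurationMillis(List<Request> requests) {
        return requests.stream().mapToLong(r -> r.durationMillis).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<Request> window = Arrays.asList(
                new Request(10, false), new Request(20, false),
                new Request(30, true), new Request(40, false));
        System.out.println("rate/s   = " + rate(window, 2.0));
        System.out.println("errors   = " + errorRatio(window));
        System.out.println("mean ms  = " + meanDurationMillis(window));
    }
}
```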

Dubbo multi-dimensional metrics system access - export to QOS

The multi-dimensional metrics system was released in version 3.2 and is being iterated on continuously. Users only need to add one dependency:

<dependency>
    <groupId>org.apache.dubbo</groupId>   
    <artifactId>dubbo-spring-boot-observability-starter</artifactId>        
    <version>3.2.x</version>
</dependency>

After the dependency is added, some key metrics are enabled by default. To obtain the metrics data, request the metrics path on the service's port 22222 from the command line. Port 22222 is the QOS (quality of service) and health management port provided by Dubbo, and it can be changed through the QOS configuration.

The Dubbo metrics returned by the query are named in the format dubbo_type_action_unit_otherfun.
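As a small illustration of this naming pattern, the sketch below joins the segments into a metric name. The `MetricNameSketch` class is hypothetical, the segment values used in the example are chosen only to fit the pattern, and skipping an empty trailing segment is an assumption of this sketch rather than a documented Dubbo rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles names shaped like
// dubbo_type_action_unit_otherfun. Not part of Dubbo.
class MetricNameSketch {

    static String compose(String type, String action, String unit, String otherFun) {
        List<String> parts = new ArrayList<>();
        parts.add("dubbo");
        for (String part : new String[] {type, action, unit, otherFun}) {
            // Assumption of this sketch: missing segments are simply left out.
            if (part != null && !part.isEmpty()) {
                parts.add(part);
            }
        }
        return String.join("_", parts);
    }

    public static void main(String[] args) {
        System.out.println(compose("provider", "rt", "milliseconds", "max"));
    }
}
```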

Some users manage ports directly through SpringBoot. Dubbo adapts to this scenario automatically, so metrics can be exported in Prometheus format directly through the SpringBoot management port, as shown in the following configuration:

When you query metrics data through the SpringBoot management port, the metrics built into SpringBoot and the metrics provided by Dubbo are displayed together.

Dubbo multi-dimensional metrics system Prometheus query

Accessing the metrics endpoint directly with curl returns only instantaneous values, whereas metrics analysis usually needs time-series data. For that, Prometheus is used to scrape and store Dubbo metrics externally. For traditional applications deployed on physical or virtual machines, you can use static configuration, file-based discovery, or a metrics discovery service built on your own CMDB system. In the future, you will also be able to use the service discovery that Dubbo Admin provides for the metrics system. Systems deployed in K8s can directly use the service discovery that K8s supports. The automatic scrape configuration for connecting to Prometheus is as follows:

The metrics queried in Prometheus are as follows:

Grafana display of the Dubbo multi-dimensional metrics system

Prometheus focuses on collecting and storing metrics, and its display capabilities are relatively simple. Grafana provides rich metric panels, so building dashboards with Grafana is more intuitive and easier. As the following figures show, service data can be filtered along multiple dimensions, such as application level, instance level, and interface level. The monitoring dashboard also shows metrics derived from the methodologies described earlier, such as traffic, number of requests, latency, errors, and saturation. In addition, it shows application instance information such as Dubbo version distribution and instance distribution.
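Dashboard values such as traffic are usually derived from counter values scraped at successive times rather than from a single instantaneous reading. The sketch below shows the basic arithmetic for two samples of a monotonically increasing counter. It is a deliberate simplification and not Prometheus's actual rate() implementation. The `CounterRateSketch` class is hypothetical, and its handling of a counter reset (treating the later value as the increase since restart) is an assumption of this example.

```java
// Hypothetical sketch: per-second rate from two samples of one counter.
class CounterRateSketch {

    static double perSecondRate(double earlierValue, long earlierMillis,
                                double laterValue, long laterMillis) {
        if (laterMillis <= earlierMillis) {
            throw new IllegalArgumentException("samples must be in time order");
        }
        double delta = laterValue - earlierValue;
        // A counter that went backwards usually means the process restarted.
        // Assumption: count the later value as the increase since the restart.
        if (delta < 0) {
            delta = laterValue;
        }
        double seconds = (laterMillis - earlierMillis) / 1000.0;
        return delta / seconds;
    }

    public static void main(String[] args) {
        // Two scrapes taken 30 seconds apart.
        System.out.println(perSecondRate(100, 0L, 160, 30_000L));
    }
}
```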

Dubbo tracing facade construction

Agents are simple for users to adopt, but supporting tracing by dynamically modifying bytecode is risky, and introducing an agent layer only to provide tracing for the Dubbo layer seems like overkill. Since Dubbo is positioned as a microservice RPC framework, a general-purpose tracing facade is the better fit, leaving specialized work to specialized systems. Dubbo makes access easier for users by adapting to the major tracing systems.

Dubbo tracing facade selection

OpenTelemetry, a tracing facade that is common in the industry, leans toward standardized, unified specifications, is supported by the major vendors, and is a CNCF incubating project. The advantage of Micrometer is that it is the same dependency used for metrics instrumentation, and it is integrated by default in SpringBoot3, which makes access more convenient for users. In addition, Micrometer is positioned as an observability facade, which matches the positioning of Dubbo's tracing work, and OpenTelemetry can still be connected through a bridge.

Micrometer + OpenTelemetry Bridge:

Dubbo tracing architecture

Dubbo collects trace data for each RPC request through a built-in tracing filter, and then exports that data to the major tracing backends through an exporter.

Dubbo tracing access

The Dubbo tracing facade has been released. To connect to a tracing system, you only need to import the starter integration package for that system and add a simple configuration. For a more detailed access guide, see the documentation and examples. [1]

The tracing configuration lets you set the on/off switch, the sampling rate, the exporter, and other options.
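To show what a sampling rate means in practice, the sketch below makes a keep-or-drop decision per trace ID. It is not the sampler used by Dubbo or Micrometer. `TraceSamplerSketch` and its bucket-based approach are hypothetical, chosen because a deterministic decision per trace ID lets every service that sees the same ID reach the same answer.

```java
// Hypothetical sketch of probability-based trace sampling.
class TraceSamplerSketch {

    private static final int BUCKETS = 10_000;
    private final double probability;

    TraceSamplerSketch(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be within [0, 1]");
        }
        this.probability = probability;
    }

    // The same trace ID always maps to the same bucket, so the decision is
    // stable for the whole trace instead of being re-rolled on every hop.
    boolean shouldSample(String traceId) {
        int bucket = Math.floorMod(traceId.hashCode(), BUCKETS);
        return bucket < probability * BUCKETS;
    }

    public static void main(String[] args) {
        TraceSamplerSketch sampler = new TraceSamplerSketch(0.5);
        System.out.println(sampler.shouldSample("example-trace-id"));
    }
}
```

A probability of 0 drops every trace and a probability of 1 keeps every trace. Values in between keep roughly that fraction, assuming trace IDs hash evenly across the buckets.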

Finally, tracing usually requires correlating the trace ID with logs to analyze root causes in more detail. To do this, add MDC output to the logging configuration in advance, as follows, so that traceId and spanId appear in the logs.

Dubbo tracing with Zipkin

The following shows Dubbo connected to Zipkin for tracing. You can see the performance and metadata of some interfaces.

Dubbo tracing with Skywalking

The following shows Dubbo connected to Skywalking for tracing, with request-level trace analysis retrieved by trace ID.

Dubbo log management

Dubbo log management exception

The Dubbo framework has been in development for many years, and its features have grown richer over time. It interacts with the three major centers, and its clients interact with its servers. These internal and external interactions are prone to abnormalities. When you run into a problem, observing the logs often does not reveal the cause, and locating the root cause by analyzing the code is a real headache.

The cause of the problem is unclear:

Dubbo Log Management Expert Advice

If you look carefully at the logs printed by the new Dubbo 3.x versions, you can see that a link to a troubleshooting guide is printed in the log. When you hit a problem, copy this link and open it in your browser to see expert advice for that abnormal log, such as the troubleshooting steps shown in the figure below. The expert advice will become more detailed as Dubbo develops. Making this process more complete requires users and developers to participate together. The Dubbo community is very open and encourages users and developers to take part in building it.

 

Dubbo Observability-Stability Practice

The final step is stability practice built around the whole observability platform. In stability practice, you observe the health of services, troubleshoot and analyze system problems, and finally restore the system quickly. Anomalies can be discovered actively, by on-duty staff watching the monitoring system, or passively, by analyzing anomalies and raising alerts that reach staff promptly through email, IM, SMS, phone calls, and so on. Once an anomaly is found, aggregated service metrics help locate where it occurs. The tracing system then identifies the service-level exception for analysis. Finally, the detailed logs found through the trace information reveal the context of the anomaly so that the root cause can be eliminated. The whole troubleshooting process relies on the observability platform. Because the aim is rapid recovery, the system can first be restored through strategies such as traffic isolation and service degradation to reduce losses. Afterwards, the persistent data kept by the observability platform can be used to analyze anomalies and patterns in detail and locate the root cause.

[1] Documentation and examples

https://cn.dubbo.apache.org/zh-cn/overview/tasks/observability/tracing/

Author: Song Xiaosheng - Senior Engineer of Ping An One Wallet Middleware

This article is the original content of Alibaba Cloud and may not be reproduced without permission.

Origin my.oschina.net/yunqi/blog/10094821