Baidu commercial large-scale microservice distributed monitoring system - Fengjing

Introduction: As an early adopter of Fengjing and, later, a core member of the project, the author has watched it evolve over four years: a difficult start, a quiet period of accumulation in the middle, and rapid growth in the later stage. Each architectural change followed a wave of technology, and each reflects project members innovating with limited resources to solve practical problems.

Fengjing (Phoenix Eye) is the application performance monitoring (APM) system for Baidu's commercial business systems. It focuses on monitoring Java applications and has been adopted by most of Baidu's Java applications (covering thousands of business applications and tens of thousands of containers). It automatically instruments mainstream middleware frameworks (Spring Web, RPC, database, cache, etc.), provides full-stack performance monitoring and full-link tracing and diagnosis, and supplies Baidu's business lines with microservice performance indicators, business golden metrics, health status, monitoring alerts, and more.

picture

△Fengjing product flow chart

  • Data collection: The Phoenix Eye probe is automatically injected into the business process to collect performance information; the business process is completely unaware of it.

  • Data calculation and analysis: Data is routed by type. Time series data is stored in the time series database (TSDB) of the Baidu SIA intelligent monitoring platform and is used to generate visual reports and anomaly alarms. Call chain data is stored in the Palo data warehouse (open-sourced as Doris) for topology analysis and call chain retrieval.

  • Application scenarios: Phoenix Eye provides stability reports, exception alarms, error stack analysis, service latency analysis, call topology analysis, business log correlation analysis, and more.

picture

△Fengjing's architecture evolution timeline

01 Phoenix Eye Project

The project was launched in 2016. By then the middleware of Baidu's Fengchao advertising business system (the distributed RPC framework Stargate, the configuration center, database middleware, and so on) was already mature. As monolithic services were split further, the overall scale of online Java deployments gradually grew, and more and more problems were exposed.

Typical problems were:

  1. Locating problems in core services took a long time. When multiple modules reported large numbers of errors, it took a long time to find the cause.

  2. Obtaining cluster logs was very costly, and the logs carried no call chain relationships, so locating problems was expensive and some problems could not be located at all.

  3. Viewing exception logs required logging in to a specific online instance. With many instances deployed online, troubleshooting took a long time.

The Fengchao (Phoenix Nest) business side urgently needed a distributed tracing system to string together the logs of the entire business side. Baidu's business platform infrastructure team therefore launched the Phoenix Eye project, also called "Phoenix Nest Eye".

02 Phoenix Eye 1.0

In distributed link tracing, probes collect data in one of two ways: intrusive or non-intrusive. The 1.0 probe was intrusive. Business developers first added the probe's dependency jar packages, which collected call relationships and performance data automatically through interceptors; they then added hard-coded calls to supply additional business data.

picture

△ Coding example
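The intrusive style described above can be sketched roughly as follows. This is a minimal illustration, not Fengjing's actual 1.0 API: the `IntrusiveTraceSketch`, `Span`, `tag`, and `placeOrder` names are all hypothetical, and the in-memory `FINISHED` list stands in for the probe's real output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of intrusive tracing; none of these names come from Fengjing. */
final class IntrusiveTraceSketch {

    /** One traced operation; closing it records the elapsed time. */
    static final class Span implements AutoCloseable {
        final String name;
        final Map<String, String> tags = new LinkedHashMap<>();
        private final long startNanos = System.nanoTime();
        long elapsedNanos = -1;

        Span(String name) {
            this.name = name;
        }

        /** Business data is attached by hand-written calls in the business code. */
        Span tag(String key, String value) {
            tags.put(key, value);
            return this;
        }

        @Override
        public void close() {
            elapsedNanos = System.nanoTime() - startNanos;
            FINISHED.add(this);
        }
    }

    /** Stand-in for the probe's output (assumption: a real probe would persist this). */
    static final List<Span> FINISHED = new ArrayList<>();

    /** A business method that had to be edited by hand in order to be traced. */
    static String placeOrder(String userId, String item) {
        try (Span span = new Span("OrderService.placeOrder")) {
            span.tag("userId", userId);
            span.tag("item", item);
            return "order:" + item;
        }
    }
}
```

Every traced method carries tracing calls like these, which is why adoption cost grows with the size of the code base.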

The data collected by the probe was written to disk and then collected through Kafka. The underlying data processing and storage used systems that were popular at the time, such as Storm and HBase, which made the backend architecture relatively complex.

picture

△Fengjing 1.0 Architecture Diagram

03 Phoenix Eye 2.0

The first goal of Fengjing 2.0 was to reduce the cost of adopting the probe. The 2.0 probe used Java agent technology combined with cglib to provide annotation-based AOP, cutting the number of dependency jar packages from N to 1. Instead of writing large amounts of call-chain population code, developers could use AOP wherever possible. The probe's transport layer adopted a more efficient encoding (Protocol Buffers + gzip) and sent data directly to Kafka over HTTP, which greatly reduced disk I/O overhead.
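To make the annotation-based approach concrete, here is a minimal sketch. It swaps in a JDK dynamic proxy (`java.lang.reflect.Proxy`) for cglib so that it has no external dependencies, and every name in it (`AopTraceSketch`, `@Traced`, `OrderService`, `wrap`) is hypothetical rather than taken from the 2.0 probe.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch: annotation-driven tracing via a JDK dynamic proxy (not cglib). */
final class AopTraceSketch {

    /** Marks a method whose calls should be recorded under the given span name. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Traced {
        String value();
    }

    public interface OrderService {
        @Traced("order.create")
        String create(String item);

        String ping(); // not annotated, so not recorded
    }

    static final class SimpleOrderService implements OrderService {
        @Override
        public String create(String item) {
            return "order:" + item;
        }

        @Override
        public String ping() {
            return "pong";
        }
    }

    /** Names of recorded spans; a stand-in for the probe's transport layer. */
    static final List<String> RECORDED = new CopyOnWriteArrayList<>();

    /** Wraps target so that calls to @Traced interface methods are timed and recorded. */
    static <T> T wrap(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Traced traced = method.getAnnotation(Traced.class);
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the business exception unchanged
            } finally {
                if (traced != null) {
                    long micros = (System.nanoTime() - start) / 1_000;
                    RECORDED.add(traced.value());
                    System.out.println(traced.value() + " took " + micros + " us");
                }
            }
        };
        Object instance = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(instance);
    }
}
```

With this style the business method bodies stay free of tracing calls, but someone still has to add the annotation and the wrapping to every application.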

The 2.0 probe was easier to adopt than 1.0 and transmitted data faster. However, the business side still had to add AOP code. With hundreds of applications on the business side, adoption remained a large undertaking, and promotion was still difficult.

04 Phoenix Eye 3.0

In designing the Fengjing 3.0 architecture, the project team kept returning to two questions:

  1. How can business teams adopt the probe quickly, with as few changes as possible, or even with no perceptible change at all?

  2. How can the architecture be made easier to operate and maintain, so that it handles massive data at low operational cost?

To solve problem 1, the 3.0 probe abandoned the intrusive approach entirely in favor of a non-intrusive one: bytecode enhancement.

Several popular monitoring and diagnostic tools were surveyed at the time:

picture

△Survey of the New Relic, Pinpoint, and Greys monitoring probes

The 3.0 probe borrowed runtime enhancement from Greys and the plug-in-based extension design of Pinpoint and New Relic. As a result, the probe automatically injects monitoring code into the business process, and the actual monitoring work is done by a plug-in system, making the monitoring fully aspect-oriented.

picture

△ Schematic diagram of probe active loading
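On the JVM, non-intrusive bytecode enhancement is usually built on the standard `java.lang.instrument` API. The sketch below shows only the hook point and is not the Fengjing probe: the class names and the `com.example.` default prefix are hypothetical, and the actual bytecode rewriting, which normally relies on a bytecode library, is deliberately left out.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical agent skeleton showing where non-intrusive enhancement hooks in. */
final class ProbeAgentSketch {

    /** Sees every class as it is loaded and picks the ones a plug-in would enhance. */
    static final class MatchingTransformer implements ClassFileTransformer {
        final Set<String> matched = ConcurrentHashMap.newKeySet();
        private final String internalPrefix;

        MatchingTransformer(String packagePrefix) {
            // The JVM passes class names in internal form, e.g. "com/example/Foo".
            this.internalPrefix = packagePrefix.replace('.', '/');
        }

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null && className.startsWith(internalPrefix)) {
                matched.add(className);
                // A real probe would return rewritten bytes here (omitted in this sketch).
            }
            return null; // null tells the JVM to keep the class bytes unchanged
        }
    }

    /** Called by the JVM before main() when started with -javaagent:probe.jar=prefix. */
    public static void premain(String agentArgs, Instrumentation inst) {
        String prefix = (agentArgs == null || agentArgs.isEmpty()) ? "com.example." : agentArgs;
        inst.addTransformer(new MatchingTransformer(prefix));
    }
}
```

A deployable agent would declare its class `public`, package it in a jar whose manifest names it as `Premain-Class`, and be attached with `-javaagent`, so the business code and its build stay untouched.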

The back-end storage system moved to Doris, an MPP-based interactive SQL data warehouse developed by Baidu. It is compatible with the MySQL protocol, has a low learning cost, and can be used for both storage and analytical computation. In the early stage this made it possible to avoid technologies such as Spark and Storm and so reduce system complexity.

picture

△Architecture design as shown in the figure

After the architecture upgrade, even a small team could roll out probes quickly in batches, and the computing and storage capacity met demand. By 2017, Fengjing 3.0 had been adopted by more than 100 applications running on more than 1,000 containers.

05 Phoenix Eye 4.0

In 2018, the wave of microservices and virtualization swept in. As the deployment platform kept improving and the Spring Boot ecosystem matured, monoliths could be split quickly into large numbers of microservices that relied on the platform for efficient deployment and operations. Phoenix Eye was integrated into the microservice hosting platform as a basic component and was promoted and adopted company-wide. The number of applications using it surged from the hundreds to the thousands, and the number of containers from the thousands to the tens of thousands.

Many problems surfaced at this stage. The two core technical problems were:

  1. Upgrading the probe required restarting the business application, and restarting an online application disrupts traffic. This made it difficult to upgrade the probe frequently and to introduce new features quickly.

  2. 15 billion records were written in real time every day, with peak traffic of 3 million records per second. Data was easily lost during import, and retrieving a single call chain took about 100 seconds.

In 2019, Fengjing underwent further transformation and upgrading, tackling problems 1 and 2 in turn.

At the probe level, the team studied how to support hot swapping, that is, letting the probe upgrade itself automatically while the business process is running. Initially, to make the probe plug-in classes visible to the business classes, all probe classes were loaded into the system class loader. But the system class loader, being the default loader, cannot be unloaded. Conversely, if all probe classes are placed in a custom class loader, they are completely invisible to the business classes, so the bytecode enhancement cannot be completed.

picture

△ Probe hot-swappable classloader system

To solve the visibility problem, the probe introduces a bridge class. Through the code stubs and the plug-in class library projection that the bridge class provides, user classes can reach the actual probe classes and so be monitored and transformed. Different plug-ins are placed in different custom class loaders, which keeps plug-ins invisible to one another and allows a single plug-in to be hot-swapped on its own. The design details will be explained in a later article.
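The class loader arrangement can be illustrated with a much simplified sketch. It is not the Fengjing design: the `Bridge`, `DemoPlugin`, and `PluginLoader` names are hypothetical, and for the sake of a self-contained example the "plug-in" bytes are read from the application's own class path rather than from a separate plug-in jar. The point it shows is that the bridge type lives in the parent loader, where business code can see it, while each plug-in copy lives in its own discardable loader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of a bridge type plus per-plug-in class loaders. */
final class HotSwapSketch {

    /** Bridge type: loaded by the application loader, so business code can always see it. */
    public interface Bridge {
        String onEnter(String method);
    }

    /** Stand-in for a plug-in class; a real plug-in would ship in its own jar. */
    public static class DemoPlugin implements Bridge {
        public DemoPlugin() {}

        @Override
        public String onEnter(String method) {
            return "traced:" + method;
        }
    }

    /** Defines the plug-in class itself and delegates every other class to its parent. */
    static final class PluginLoader extends ClassLoader {
        private final String pluginName;

        PluginLoader(ClassLoader parent, String pluginName) {
            super(parent);
            this.pluginName = pluginName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(pluginName)) {
                return super.loadClass(name, resolve); // Bridge and JDK classes come from the parent
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded == null) {
                    String resource = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(resource)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        loaded = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(loaded);
                }
                return loaded;
            }
        }
    }

    /** The only reference business code holds; replacing it swaps the plug-in. */
    private static final AtomicReference<Bridge> ACTIVE = new AtomicReference<>();

    /** Loads a fresh copy of the plug-in in a brand-new class loader. */
    static Bridge loadFresh() {
        String name = DemoPlugin.class.getName();
        ClassLoader loader = new PluginLoader(HotSwapSketch.class.getClassLoader(), name);
        try {
            return (Bridge) loader.loadClass(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not load plug-in", e);
        }
    }

    static void swap() {
        ACTIVE.set(loadFresh());
    }

    static String enter(String method) {
        return ACTIVE.get().onEnter(method);
    }
}
```

Calling `swap()` again replaces the active plug-in with a copy from a new loader. Once nothing references the old instance, its loader and classes become eligible for garbage collection, which is what makes unloading possible with custom loaders but not with the system class loader.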

As far as we know, the Phoenix Eye probe is the only monitoring probe in the industry that can be hot-swapped, and we have applied for a patent on it. Its functional correctness and performance have been verified by large-scale online traffic.

The second effort was to optimize call chain retrieval performance.

First, consider the underlying storage structure:

picture

Analysis of slow queries revealed two main causes of slow retrieval. First, many queries did not hit any index, and full table scans over massive data are very slow. Second, there were too many imported fragments, which made file compaction very slow: typical LSM-Tree read amplification. To solve these problems, the call chain storage layer restructured its tables and used a large number of rollups alongside the base tables to reduce query time. By then Doris supported stream import, so the team also took the opportunity to switch from small-batch import to stream import.

picture

△Call chain processing architecture

picture

△ The figure above is a panoramic topology map of microservices that Fengjing builds in real time. As of January 2020, it covered the online traffic topology of dozens of product lines. The finest-grained node is an interface, that is, a function in a Java application. The figure shows roughly 500,000+ non-isolated interface nodes across the platform and 2 million+ connections between interface nodes.

06 Separation of data processing architecture

As the architecture continued to evolve, the volume of data collected by Phoenix Eye kept growing, and so did the demands of the business side.

There were two main problems:

  1. Data visualization depended on front-end development, so the large number of multi-dimensional visual analysis requirements was hard to meet.

  2. Call chains are sampled, which makes statistics derived from them inaccurate and unsuitable for statistical reports.

Both problems come down to how time series data is stored and presented. This involves two basic concepts in distributed tracing: time series data and call chain data. Time series data is a sequence of data points indexed by time, used to view indicators and their trends. Call chain data records the entire course of a request and is used to find which step of the request failed and where the system's bottleneck lies.

Time series data does not need to retain details; only data points consisting of time, dimensions, and indicators are stored, in a dedicated time series database. In practice, Fengjing does not maintain its own time series database. Instead, it connects to the distributed TSDB of the Baidu SIA intelligent monitoring platform, and uses the Baidu SIA platform's multi-dimensional visual analysis reports to meet users' needs for visual multi-dimensional data analysis.
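The difference between the two kinds of data can be sketched as follows. This is an illustration under stated assumptions, not Fengjing's pipeline: the `Span` and `Point` types, the one-minute bucket, and the "keep one in N" sampling rule are all hypothetical. The sketch assumes that statistics are aggregated from every span, while only the detailed call chains are sampled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: full aggregation for time series, sampling for call chain details. */
final class SeriesVsChainSketch {

    /** One detailed call chain record. */
    static final class Span {
        final String service;
        final long epochMillis;
        final long latencyMs;
        final boolean error;

        Span(String service, long epochMillis, long latencyMs, boolean error) {
            this.service = service;
            this.epochMillis = epochMillis;
            this.latencyMs = latencyMs;
            this.error = error;
        }
    }

    /** One time series data point: indicators for a (service, minute) pair. */
    static final class Point {
        long count;
        long errors;
        long totalLatencyMs;
    }

    /** Aggregates every span into per-minute points keyed by "service@minute". */
    static Map<String, Point> toTimeSeries(List<Span> spans) {
        Map<String, Point> points = new TreeMap<>();
        for (Span s : spans) {
            long minute = s.epochMillis / 60_000L;
            Point p = points.computeIfAbsent(s.service + "@" + minute, k -> new Point());
            p.count++;
            if (s.error) {
                p.errors++;
            }
            p.totalLatencyMs += s.latencyMs;
        }
        return points;
    }

    /** Keeps one span in every keepOneIn for detailed storage. */
    static List<Span> sample(List<Span> spans, int keepOneIn) {
        List<Span> kept = new ArrayList<>();
        for (int i = 0; i < spans.size(); i++) {
            if (i % keepOneIn == 0) {
                kept.add(spans.get(i));
            }
        }
        return kept;
    }
}
```

Counting requests from the sampled list alone would undercount by roughly the sampling factor, whereas the aggregated points stay exact however aggressively the details are sampled.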

picture

△The current overall architecture

07 Conclusion

The Fengjing project has run for four years, with many difficulties and setbacks along the way. Through the sustained efforts of project members, it finally reached milestone results. This article has briefly introduced the business background, technical architecture, and product form of Fengjing; follow-up articles will cover the technical implementation details. Please stay tuned.

Finally, you are welcome to follow our official account of the same name, "Baidu Geek Said".