Exploration and Practice of Microservice System Performance Optimization Based on SkyWalking Full-Link Monitoring

With the rapid advancement of open source communities and cloud computing, cloud-native microservices have become the core architecture of new application systems and are increasingly widely used. Gartner defines a microservice as follows: "A microservice is a narrowly scoped, tightly encapsulated, loosely coupled, independently deployable, and independently scalable application component."

Martin Fowler, often called the father of microservices, has noted that the industry has no unified, standard definition of microservices.

Generally speaking, however, microservice architecture is an architectural pattern or style that advocates dividing a single application into a set of small services. Each service runs in its own independent process, and the services coordinate and cooperate with each other to deliver the final value to users. Services communicate with each other using a lightweight communication mechanism (usually an HTTP-based RESTful API).

Each service is built around a specific business capability and can be independently deployed to a production environment, a production-like environment, and so on. This approach improves the response speed, flexibility, and deployment elasticity of the application system, and allows rapid iteration and optimization in step with business development. More and more application systems in the industry have been upgraded to a microservice architecture, which poses new challenges to existing application monitoring systems.

To promote the construction and development of a microservice application monitoring system and to explore a practical path for microservice full-link monitoring in the industry, we introduced the SkyWalking open source observability platform. It collects full-link monitoring information for microservices without code intrusion, visually displays the topology of the microservice system, tracks transaction links, and accurately identifies performance bottlenecks. This makes up for the lack of microservice full-link application monitoring in existing testing tools and methods.

Introduction to SkyWalking

SkyWalking is an open source observability platform and APM (application performance monitoring) system designed for microservices, cloud-native architectures, and container-based (Docker, k8s, Mesos, etc.) architectures. It is used to collect, analyze, aggregate, and visualize data from services and cloud-native infrastructure.

It provides integrated solutions for distributed tracing, service mesh telemetry analysis, and metrics aggregation and visualization. SkyWalking is mainly composed of the following four parts:

01 Agent

Probes collect data and reformat it according to SkyWalking requirements (different probes support different sources).

The Agent runs in each service instance. It collects data such as traces and metrics from the service instance and reports them to the SkyWalking backend through gRPC for analysis by the OAP server. This article will introduce the Agent program in detail in Chapter 3.

02 OAP server

SkyWalking's OAP (Observability Analysis Platform) is the analysis and computing system that processes link sampling data.

The OAP service mainly needs to calculate the following three types of data:

(1) Record data

Detailed link data such as traces and access logs. It is processed by RecordStreamProcessor.

(2) Metrics data

Metric data. Most OAL (Observability Analysis Language) metrics generate this type of data, which is processed by MetricsStreamProcessor.

(3) TopN data

Periodically sampled data, such as slow SQL statements collected at regular intervals. It is processed by TopNStreamProcessor.
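To make the idea of TopN data more concrete, the following is a minimal Python sketch of picking the slowest statements out of one sampling period. It is not the code of TopNStreamProcessor; the `SlowStatement` record and the `top_n_slowest` function are hypothetical names introduced only for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class SlowStatement:
    """One sampled database statement and its latency (hypothetical record shape)."""
    sql: str
    latency_ms: int


def top_n_slowest(samples, n):
    """Return the n slowest statements of one sampling period, slowest first."""
    return heapq.nlargest(n, samples, key=lambda s: s.latency_ms)


# Usage: keep only the two slowest statements seen in this period.
period = [
    SlowStatement("SELECT * FROM orders WHERE id = ?", 120),
    SlowStatement("SELECT * FROM users", 900),
    SlowStatement("UPDATE stock SET qty = ? WHERE sku = ?", 450),
]
slowest = top_n_slowest(period, 2)
```

Only the top entries of each period are kept, which is why this kind of data stays small even when the number of sampled statements is large.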

Detailed data such as traces and access logs is large in volume but does not need to be merged, so it can be handled entirely within the OAP node that receives it. This data is cached, processed asynchronously in batches, and written to external storage (Storage) as a stream.
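The cache-then-batch-write pattern described here can be sketched in a few lines of Python. This is a simplified, synchronous illustration under our own assumptions, not OAP's implementation: `BatchingRecordWriter` and the `write_batch` callback are hypothetical, and the asynchronous part is left out to keep the example short.

```python
class BatchingRecordWriter:
    """Buffers detail records and hands them to storage in batches."""

    def __init__(self, write_batch, batch_size=3):
        self._write_batch = write_batch  # callable that persists a list of records
        self._batch_size = batch_size
        self._buffer = []

    def append(self, record):
        """Cache one record and flush once the batch is full."""
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Write whatever is cached as one batch; do nothing if the cache is empty."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._write_batch(batch)
```

In a real deployment the flush would typically also be triggered by a timer and run on a separate thread, so that slow storage does not block the receiving side.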

Most metric data defined by OAL (Observability Analysis Language) must be aggregated by microservice, so the OAP cluster computes it in two steps.

Step 1: Receive and parse the data sent by the Agent, and aggregate it within the current OAP service node, using OAL or other aggregation modes.

Data that does not need to be aggregated is written directly to external storage (Storage); data that needs to be aggregated by microservice is sent to a designated OAP service node according to routing rules.

Step 2: Receive and parse the data processed in step 1, perform a secondary aggregation, and write the result to external storage (Storage).

Corresponding to these two steps, OAP service nodes take one of two roles: Receiver (handles step 1) and Aggregator (handles step 2).

By default, all OAP service nodes take the Mixed role and can perform both step 1 and step 2. In large-scale SkyWalking deployments, the roles can be separated into a two-level deployment according to network traffic.
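The two-step flow above can be sketched as follows. This is a minimal Python illustration, not OAP code: the function names are hypothetical, the metric is a simple per-service average latency, and the stable-hash routing is our assumption, since the text only says that data is routed "according to routing rules".

```python
import zlib
from collections import defaultdict


def receiver_aggregate(samples):
    """Step 1: on one Receiver, fold raw (service, latency_ms) samples
    into a partial (sum, count) pair per service."""
    partial = defaultdict(lambda: [0, 0])
    for service, latency_ms in samples:
        partial[service][0] += latency_ms
        partial[service][1] += 1
    return {service: tuple(pair) for service, pair in partial.items()}


def route(service, aggregator_count):
    """Pick the Aggregator index for a service with a stable hash (assumed rule)."""
    return zlib.crc32(service.encode("utf-8")) % aggregator_count


def dispatch(partial, aggregator_count):
    """Group one Receiver's partial results by target Aggregator index."""
    outbound = defaultdict(dict)
    for service, pair in partial.items():
        outbound[route(service, aggregator_count)][service] = pair
    return dict(outbound)


def aggregator_merge(partials):
    """Step 2: on one Aggregator, merge partial (sum, count) pairs from
    several Receivers and compute the average latency per service."""
    totals = defaultdict(lambda: [0, 0])
    for partial in partials:
        for service, (total, count) in partial.items():
            totals[service][0] += total
            totals[service][1] += count
    return {service: total / count for service, (total, count) in totals.items()}


# Usage: two Receivers see samples for the same service; the merge combines them.
node_a = receiver_aggregate([("order", 100), ("order", 300), ("pay", 50)])
node_b = receiver_aggregate([("order", 200)])
averages = aggregator_merge([node_a, node_b])
```

Because every Receiver routes a given service to the same Aggregator, the second step sees all partial results for that service and can produce one correct value. Sending partial sums and counts instead of raw samples is also what keeps traffic between the two levels small.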

The OAP server also responds to query requests from the SkyWalking UI: it queries the previously persisted data, builds the response, and returns it to the UI for display.

03 Storage

As the external storage for the OAP service, it is responsible for data storage and supports multiple storage types. You can use an existing storage system such as ElasticSearch or MySQL, or implement a custom one.

SkyWalking data can be stored in any of the persistence systems that are already implemented: ElasticSearch, MySQL, TiDB, InfluxDB, and H2. H2 is the default storage. It runs as an in-memory database, so the data is held in memory and not written to disk, and restarting the SkyWalking service causes data loss. In production, an ElasticSearch cluster is generally used as the back-end storage.

04 UI

The UI is responsible for visualizing and managing SkyWalking data; the front end and back end are separated. The UI encapsulates the user's query operations into GraphQL requests and submits them to the OAP backend to trigger the query. After the results are returned, the UI displays them so that users can view call-chain relationships, monitoring metrics, performance metrics, and so on.

From the above introduction to the subsystems that make up SkyWalking, we can see that the Agent collects link sampling data and passes it to OAP through gRPC, where it is analyzed and stored in the database. Finally, the UI displays the resulting statistical reports, service dependencies, and topology diagrams.

SkyWalking application expansion and performance tuning

An example of custom plug-in development: we developed a custom plug-in for one of our systems and deployed it to the plugins directory of the SkyWalking deployment package.

When a query interface is called from multiple threads, the sampling information for the method can be viewed in SkyWalking, as shown in Figure 1:


Figure 1 Sampling information of a query method

Click a query method link in Figure 1 to view detailed span information, as shown in Figure 2.


Figure 2 Span information

From the above information, we can clearly see the three tags we added: invoke start time, invoke end time, and inter-system query method execution time (ms).
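The logic behind these three tags can be sketched as follows. The actual plug-in is built against SkyWalking's Java agent plug-in mechanism and its code is not shown in this article, so this Python version is only a conceptual illustration: `SketchSpan` and `timed_span` are hypothetical names, not SkyWalking classes, and the tag keys simply mirror the ones listed above.

```python
import time
from contextlib import contextmanager


class SketchSpan:
    """Hypothetical stand-in for a tracing span; not a SkyWalking class."""

    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.tags = {}

    def tag(self, key, value):
        """Attach one key/value tag to the span, stored as a string."""
        self.tags[key] = str(value)


@contextmanager
def timed_span(operation_name, clock_ms=lambda: int(time.time() * 1000)):
    """Wrap a call and record start time, end time, and duration as tags."""
    span = SketchSpan(operation_name)
    start = clock_ms()
    span.tag("invoke start time", start)
    try:
        yield span
    finally:
        # Runs even if the wrapped call raises, so the span is always complete.
        end = clock_ms()
        span.tag("invoke end time", end)
        span.tag("execution time (ms)", end - start)


# Usage: wrap the intercepted query method.
with timed_span("queryAccount") as span:
    pass  # the real query call would go here
```

Writing the end time and duration in a `finally` block means slow calls that end in an exception still show their execution time in the span details.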

These results apply to system reconstruction projects whose architecture is characterized by many microservices and many call links. The results of four topics can be applied: parameter configuration inspection, observability technology, data migration, and simultaneous verification.

An example of performance tuning: to minimize the impact of the SkyWalking Agent on business performance testing and to expose the real performance bottlenecks of the business system, we tuned the SkyWalking Agent. By adjusting parameters such as sampling frequency and sampling quantity, we reduced the additional performance loss introduced by deploying the SkyWalking Agent.
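To show why limiting the sampling quantity reduces overhead, here is a minimal Python sketch of a sampler that traces at most a fixed number of requests per time window. It is an illustration under our own assumptions, not the Agent's code: `WindowSampler` is a hypothetical class, and the 3-second window is chosen to resemble the kind of per-window limit that the SkyWalking Java agent's sampling setting (`agent.sample_n_per_3_secs`, to the best of our knowledge) controls. The article does not state which parameter values were used.

```python
class WindowSampler:
    """Allows at most `limit` traces per fixed time window.
    A negative limit disables the cap and samples every request."""

    def __init__(self, limit, window_secs=3.0):
        self.limit = limit
        self.window_secs = window_secs
        self._window_start = None
        self._count = 0

    def try_sample(self, now):
        """Return True if the request arriving at time `now` (seconds) is traced."""
        if self.limit < 0:
            return True
        # Start a new window when none exists or the current one has expired.
        if self._window_start is None or now - self._window_start >= self.window_secs:
            self._window_start = now
            self._count = 0
        if self._count < self.limit:
            self._count += 1
            return True
        return False


# Usage: with a limit of 2, the third and fourth requests in the window are skipped.
sampler = WindowSampler(limit=2)
decisions = [sampler.try_sample(t) for t in (0.0, 0.5, 1.0, 2.9)]
```

Requests that are not sampled skip span creation and reporting, which is where most of the Agent's extra cost comes from under high concurrency. The trade-off is that fewer traces are available for analysis.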

Figure 3 compares performance test results for the same transaction under the same concurrency in three scenarios: without the SkyWalking Agent, with a standard SkyWalking Agent deployment (no performance tuning), and with a tuned SkyWalking Agent deployment. After tuning, performance improved compared with the standard deployment, and the performance loss relative to the no-Agent scenario was minimized.

Figure 3 Comparison of performance test results
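A comparison like this can be reduced to a single number per scenario: the throughput lost relative to the no-Agent baseline. The following Python sketch shows the arithmetic. The function name is hypothetical, and the figures in the usage lines are made-up placeholders, not the measurements behind Figure 3.

```python
def overhead_percent(baseline_tps, measured_tps):
    """Throughput lost relative to the no-Agent baseline, as a percentage."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (baseline_tps - measured_tps) * 100.0 / baseline_tps


# Usage with placeholder numbers (hypothetical, for illustration only).
standard_loss = overhead_percent(1000, 900)  # standard Agent deployment
tuned_loss = overhead_percent(1000, 970)     # tuned Agent deployment
```

Tuning is worthwhile when the tuned loss is clearly smaller than the standard one while the sampled data is still sufficient to locate bottlenecks.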


Origin blog.csdn.net/wx17343624830/article/details/132480946