As a business grows and a system is split into services, the call chain becomes more complex: a single front-end request may need to call many back-end services to complete. When the whole request becomes slow or unavailable, it is hard to tell which back-end service (or services) caused the problem. We then need a way to quickly locate the faulty point in the service chain so that the right fix can be applied. This is the problem distributed call tracing was born to solve.
The theoretical basis of distributed service tracing in the industry today mainly comes from the Google paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure". The most widely used open source implementation is Twitter's Zipkin. To make distributed tracing platform-independent and vendor-independent, the CNCF published OpenTracing, a standard for distributed service tracing. In China, Taobao's "Eagle Eye", JD.com's "Hydra", Dianping's "CAT", Sina's "Watchman", Vipshop's "Microscope", and Wowo.com's "Tracing" are all systems of this kind.
Spring Cloud Sleuth
Generally, a distributed service tracing system has three main parts: data collection, data storage, and data display. Depending on the size of the system, each part varies to some extent. For large-scale distributed systems, for example, storage may be split into real-time data, used for troubleshooting, and full historical data, used for system optimization. Data collection should be platform-independent and language-independent, support asynchronous calls (messages in a queue must be tracked to keep the call chain continuous), and remain minimally intrusive. Data display may further involve data mining and analysis. Although each part can get complicated, the basic principles are the same.
The unit of service tracing is the process from the moment a client request reaches the boundary of the traced system until the system returns a response to the client; this is called a "trace". Within each trace, several services are called. To record which services are called and how long each call takes, a call record, called a "span", is created for every service call, so a trace consists of several ordered spans. As the system serves requests, traces are continuously generated, and recording these traces with their spans lets you depict the service topology of the system. With the response time and success status carried in each span, you can find the abnormal service when a problem occurs; based on historical data, you can also see where performance is poor at the overall system level and locate targets for optimization.
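The trace/span model described above can be sketched in plain Java. This is an illustrative model only, not Sleuth's or Zipkin's actual API; the class and service names are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

// One span records a single service call and its duration.
class Span {
    final String service;
    final long startMillis;
    long endMillis;

    Span(String service, long startMillis) {
        this.service = service;
        this.startMillis = startMillis;
    }

    long durationMillis() {
        return endMillis - startMillis;
    }
}

// One trace is an ordered list of spans belonging to a single request.
class Trace {
    final String traceId;
    final List<Span> spans = new ArrayList<>();

    Trace(String traceId) {
        this.traceId = traceId;
    }

    Span startSpan(String service, long nowMillis) {
        Span span = new Span(service, nowMillis);
        spans.add(span);
        return span;
    }

    // The slowest span points at the first candidate for optimization.
    Span slowestSpan() {
        Span slowest = null;
        for (Span s : spans) {
            if (slowest == null || s.durationMillis() > slowest.durationMillis()) {
                slowest = s;
            }
        }
        return slowest;
    }
}

public class TraceDemo {
    public static void main(String[] args) {
        Trace trace = new Trace("2485ec27856c56f4");
        Span gateway = trace.startSpan("gateway", 0);
        Span producer = trace.startSpan("producer", 5);
        producer.endMillis = 120;   // producer call took 115 ms
        gateway.endMillis = 130;    // whole request took 130 ms

        System.out.println(trace.spans.size());          // 2
        System.out.println(trace.slowestSpan().service); // gateway
    }
}
```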
Spring Cloud Sleuth provides link tracing for calls between services. Through Sleuth you can clearly see which services a request has passed through and how long each service took to process it, which makes it easy to sort out the call relationships between microservices. Additionally, Sleuth can help us with:
- Latency analysis: with Sleuth you can easily see how long each sampled request took, and analyze which service calls are the most time-consuming;
- Visualizing errors: exceptions not caught by the program can be seen on the integrated Zipkin service interface;
- Link optimization: frequently called services can be identified and targeted with optimization measures.
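Concretely, once Sleuth is on the classpath it decorates each application log line with `[application name, trace id, span id, exportable]`, which is how calls are correlated across services. The line below is an illustrative example of that format, not actual output from this article's projects:

```
2017-06-09 10:01:02.123  INFO [spring-cloud-producer,5c9dbbfe6dbc3f0c,9f6e8f22f7c0b1a3,true] 9548 --- [nio-9000-exec-1] c.n.controller.HelloController : hello neo
```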
Spring Cloud Sleuth can be combined with Zipkin: Sleuth sends trace information to Zipkin, Zipkin's storage persists it, and the Zipkin UI displays the data.
Here is a conceptual diagram of Spring Cloud Sleuth:
Zipkin
Zipkin is a distributed tracing system open sourced by Twitter. It is dedicated to collecting timing data of services to troubleshoot latency problems in a microservice architecture, and covers data collection, storage, lookup, and presentation.
Each service reports its timing data to Zipkin, and Zipkin generates a dependency graph in the Zipkin UI based on the call relationships, showing how many traced requests pass through each service. Developers can easily collect and analyze the data through the web front-end, such as the processing time of each service for a user request, and thereby monitor the bottlenecks in the system.
Zipkin provides pluggable storage back ends: In-Memory, MySQL, Cassandra, and Elasticsearch. In the tests that follow we use In-Memory storage for simplicity; Elasticsearch is recommended for production.
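As an illustration of the pluggable storage, the standalone Zipkin server (as opposed to the embedded one built below) selects its back end through environment variables; the Elasticsearch address below is a placeholder for your own cluster:

```shell
# Run the standalone Zipkin server against Elasticsearch instead of In-Memory.
# STORAGE_TYPE and ES_HOSTS are Zipkin server environment variables;
# replace the host/port with your own Elasticsearch cluster.
STORAGE_TYPE=elasticsearch ES_HOSTS=http://localhost:9200 java -jar zipkin.jar
```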
Get started quickly
Create zipkin-server project
project dependencies
```xml
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-eureka</artifactId>
    </dependency>
    <dependency>
        <groupId>io.zipkin.java</groupId>
        <artifactId>zipkin-server</artifactId>
    </dependency>
    <dependency>
        <groupId>io.zipkin.java</groupId>
        <artifactId>zipkin-autoconfigure-ui</artifactId>
    </dependency>
</dependencies>
```
startup class
```java
@SpringBootApplication
@EnableEurekaClient
@EnableZipkinServer
public class ZipkinApplication {

    public static void main(String[] args) {
        SpringApplication.run(ZipkinApplication.class, args);
    }
}
```
The @EnableZipkinServer annotation enables the Zipkin server.
configuration file
```yaml
eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka/
server:
  port: 9000
spring:
  application:
    name: zipkin-server
```
After the configuration is complete, start the sample projects in order: spring-cloud-eureka, then zipkin-server. Then visit http://localhost:9000/zipkin/ and you will see the Zipkin UI.
Add Zipkin support to the projects
Add Zipkin support in the spring-cloud-producer and spring-cloud-zuul projects:
```xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
```
Once the Spring application detects the Sleuth and Zipkin dependencies on the classpath, it automatically injects trace information into HTTP requests made through RestTemplate and sends the trace data to the Zipkin server.
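One caveat worth noting: Sleuth instruments RestTemplate instances that are Spring-managed beans, so the RestTemplate should be declared as a bean rather than constructed inline at the call site. A minimal sketch (the class name is made up for the example):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestTemplateConfig {

    // Declaring RestTemplate as a bean lets Sleuth attach its tracing
    // interceptor, which injects trace/span ids into outgoing requests.
    @Bean
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }
}
```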
At the same time, add the following code to the configuration file:
```yaml
spring:
  zipkin:
    base-url: http://localhost:9000
  sleuth:
    sampler:
      percentage: 1.0
```
spring.zipkin.base-url specifies the address of the Zipkin server, and spring.sleuth.sampler.percentage sets the sampling ratio to 1.0, which means every request is sampled.
Spring Cloud Sleuth uses a Sampler strategy to control the sampling algorithm. The sampler does not prevent span ids from being generated, but it does control whether spans are exported and tagged with events. Sleuth's default sampling algorithm is reservoir sampling, implemented by the PercentageBasedSampler class, with a default sampling ratio of 0.1 (i.e. 10%). You can change it through spring.sleuth.sampler.percentage, with a value between 0.0 and 1.0, where 1.0 means everything is collected.
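If a fixed percentage is not enough, the Sampler can also be replaced with your own bean. For example, the Sleuth 1.x line (which matches the spring.sleuth.sampler.percentage property used in this article) ships an AlwaysSampler that exports every span; a sketch, with the configuration class name made up for the example:

```java
import org.springframework.cloud.sleuth.sampler.AlwaysSampler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SamplerConfig {

    // Overrides the default PercentageBasedSampler so every span is exported.
    @Bean
    public AlwaysSampler defaultSampler() {
        return new AlwaysSampler();
    }
}
```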
After adding zipkin to these two projects, start them in turn.
Verification
In this way we simulate the following scenario: an external request reaches the Zuul gateway, and the gateway calls the service exposed by spring-cloud-producer.
After all four projects are started, visit http://localhost:8888/producer/hello?name=neo in the browser twice, then open http://localhost:9000/zipkin/ and click the corresponding button to view the traces.
Click Find Traces and you will see the two records.
Click a record to open its detail page, which shows the time and order of each service call.
Click Dependency Analysis to see the call relationships between the projects.