Distributed Tracing with Sleuth and Zipkin

  As a business grows, systems become increasingly complex and get split into many services, so a single front-end request may ultimately require many back-end calls to complete. When the whole request becomes slow or unavailable, we cannot tell which back-end service is responsible, so we need a way to quickly locate the failing service and fix it. This is the problem that distributed tracing systems were born to solve.

  The theoretical foundation of modern distributed tracing comes mainly from Google's paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure". The most widely used open-source implementation is Twitter's Zipkin. To achieve platform-independent, vendor-agnostic distributed tracing, the CNCF released the OpenTracing standard. In China, Alibaba's "EagleEye", JD's "Hydra", Dianping's "CAT", Sina's "Watchman", and Vip.com's "Microscope" are all systems of this kind.

Spring Cloud Sleuth also provides us with a complete solution. In this chapter, we will describe in detail how to use Spring Cloud Sleuth + Zipkin to add distributed tracing capabilities to our microservice architecture.

Spring Cloud Sleuth

In general, a distributed service tracking system consists of three parts:

  • Data collection
  • Data storage
  • Data display

  Depending on the scale of the system, each part varies somewhat. For example, in a large-scale distributed system, data storage may be split into real-time data and full data: real-time data is used for troubleshooting, while the full data set is used for system optimization. Data collection must, besides supporting platform-independent and language-independent collection, also handle asynchronous calls (message queues must be traced to preserve the continuity of the call chain) while remaining minimally invasive. Data display may further involve data mining and analysis. Although each part can become very complex, the basic principles are similar.
  The basic unit of tracing is the trace: from the moment a client request reaches the boundary of the traced system until the response is returned to the client, the whole process is called a trace. Within each trace there are a number of service calls; at each call, a record is embedded that captures which service was called, how long it took, and so on — this record is called a span. In this way, an ordered set of spans forms a trace. As the system serves requests, traces are continuously generated, and the spans recorded in these traces can depict the topology of the service system. The response times and success/failure information carried by spans let us find abnormal services when problems occur; based on historical data, we can also analyze where performance is poor and locate optimization targets at the overall system level.
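To make the trace/span relationship concrete, here is a minimal, hypothetical model in Java. The class and field names are illustrative only (they are not Sleuth's or Zipkin's actual API): every span carries the shared traceId, its own spanId, its parent's spanId, and its timing.

```java
import java.util.List;

// A minimal, hypothetical model of the trace/span relationship described
// above; names are illustrative, not Sleuth's actual classes.
public class TraceModel {
    static class Span {
        final String traceId;      // shared by every span in the same trace
        final String spanId;       // unique to this call
        final String parentId;     // spanId of the caller; null for the root
        final String service;      // which service handled this call
        final long startMicros;
        final long durationMicros; // how long this call took

        Span(String traceId, String spanId, String parentId,
             String service, long startMicros, long durationMicros) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
            this.service = service;
            this.startMicros = startMicros;
            this.durationMicros = durationMicros;
        }
    }

    // An ordered list of spans sharing one traceId forms a trace; its total
    // duration runs from the earliest span start to the latest span end.
    static long traceDurationMicros(List<Span> trace) {
        long start = Long.MAX_VALUE, end = Long.MIN_VALUE;
        for (Span s : trace) {
            start = Math.min(start, s.startMicros);
            end = Math.max(end, s.startMicros + s.durationMicros);
        }
        return end - start;
    }

    public static void main(String[] args) {
        List<Span> trace = List.of(
            // Root span: the request entering the system boundary.
            new Span("t1", "s1", null, "gateway", 0, 500),
            // Child span: a downstream call made while handling the request.
            new Span("t1", "s2", "s1", "order-service", 100, 300));
        System.out.println(traceDurationMicros(trace)); // prints 500
    }
}
```

The parentId links are what let a UI such as Zipkin's reassemble the spans of one trace into a call tree.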

  Spring Cloud Sleuth provides tracing across service calls. With Sleuth, you can clearly see which services a request passes through and how long each service takes, making it easy to clarify the calling relationships among microservices. In addition, Sleuth can help us with:

  • Latency analysis: with Sleuth you can easily see the time consumed by each sampled request and analyze which service calls are slow;
  • Error visualization: uncaught exceptions in the program can be seen on the service interface once Zipkin is integrated;
  • Link optimization: for frequently called services, we can apply targeted optimization measures.

  Spring Cloud Sleuth can be combined with Zipkin: it sends tracing information to Zipkin, Zipkin stores that information (in memory by default), and the Zipkin UI displays the data.

This is a conceptual view of Spring Cloud Sleuth:

Zipkin

  Zipkin is based on Dapper, the distributed tracing system described in the paper Google published in 2010 about its internal use ( PDF address , translation address ), which recounts two years of Dapper's evolution, design, and operational experience inside Google. Based on that paper, Twitter developed its own distributed tracing system, Zipkin, and open-sourced it. Zipkin collects timing data from services to troubleshoot latency problems in microservice architectures, covering data collection, storage, search, and display.

  We can use it to collect tracing data for request chains on each server, and query that data through the REST API it provides. This enables monitoring of distributed systems, so that rising latency can be discovered in time and the root cause of performance bottlenecks can be found. Besides the API for developers, it also provides a convenient UI component that helps us intuitively search and analyze the details of request traces — for example, querying the processing time of each user request within a certain period.
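As an example of the REST API, trace data can be queried through Zipkin's /api/v2/traces endpoint. The sketch below only builds such a request with the JDK's HttpClient types, assuming a hypothetical server at localhost:9411; actually sending it would require a running Zipkin instance.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ZipkinQueryDemo {
    // Build a query against Zipkin's /api/v2/traces endpoint. The host
    // below is an assumption for illustration; we only construct the
    // request here, since sending it needs a running server.
    static HttpRequest tracesQuery(String serviceName, int limit) {
        URI uri = URI.create("http://localhost:9411/api/v2/traces"
                + "?serviceName=" + serviceName + "&limit=" + limit);
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest req = tracesQuery("trace-a", 10);
        System.out.println(req.uri());
        // prints http://localhost:9411/api/v2/traces?serviceName=trace-a&limit=10
    }
}
```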

  Zipkin provides pluggable data storage: In-Memory, MySQL, Cassandra, and Elasticsearch. In the tests that follow, In-Memory storage is used because it is convenient and direct; Elasticsearch is recommended for production.
The figure above shows Zipkin's architecture, which mainly consists of four core components:

  • Collector: the collector component. It processes the tracing information sent from external systems, converting it into Zipkin's internal Span format to support subsequent storage, analysis, display, and other functions.
  • Storage: the storage component. It handles the tracing information received by the collector. By default the information is stored in memory; this storage strategy can be changed so that tracing information is persisted to a database through another storage component.
  • RESTful API: the API component. It provides external access interfaces — for example, supplying tracing information to clients for display, or letting external systems access the data for monitoring.
  • Web UI: the UI component, an upper-layer application built on the API component. Through the UI, users can conveniently and intuitively query and analyze tracing information.

Differences between Zipkin and ELK

  ELK provides log management — collection, storage, search, and so on — but it lacks real-time tracing of service call chains. Zipkin fills exactly this gap.

Quick Start

  Zipkin has two sides: the Zipkin server and the Zipkin client, the client being the microservice application itself.
The client is configured with the server's URL; whenever a call between services occurs, it is observed by the Sleuth listener configured inside the microservice, which generates the corresponding Trace and Span information and sends it to the server.

There are two main ways to send this data:

  • via HTTP requests
  • via a message broker, such as RabbitMQ

Either way, we need:

  • A Eureka service registry; here we reuse the earlier eureka project as the registry.
  • A Zipkin server.
  • Two microservice applications, trace-a and trace-b; trace-a exposes a REST endpoint /trace-a which, when called, triggers a call to the trace-b application.

Zipkin Server

As for the Zipkin server: since Spring Boot 2.x, the project no longer recommends building and customizing it yourself; instead, a pre-built jar is provided for us to use. For details, see upgrade to Spring Boot 2.0 NoClassDefFoundError UndertowEmbeddedServletContainerFactory · Issue #1962 · openzipkin/zipkin · GitHub

Also, the former @EnableZipkinServer has been marked @Deprecated:

If you decide to make a custom server, you accept responsibility for troubleshooting your build or configuration problems, even if such problems are a reaction to a change made by the OpenZipkin maintainers. In other words, custom servers are possible, but not supported.

EnableZipkinServer.java: github.com/openzipkin/zipkin/blob/master/zipkin-server/src/main/java/zipkin/server/EnableZipkinServer.java

In short: modify the package at your own risk.

So the project provides a one-line quickstart script:

curl -sSL https://zipkin.io/quickstart.sh | bash -s
java -jar zipkin.jar

Or, with Docker, simply:

docker run -d -p 9411:9411 openzipkin/zipkin

Once started, visit http://IP:9411/zipkin/
The server is now up.

Microservice Applications

Sending via HTTP

Both services need the following dependencies:

<dependency>
	<groupId>org.springframework.cloud</groupId>
	<artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>

<!-- Sleuth service tracing dependency -->
<dependency>
	<groupId>org.springframework.cloud</groupId>
	<artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>

<!-- Zipkin dependency -->
<dependency>
	<groupId>org.springframework.cloud</groupId>
	<artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>

Both services also need the following configuration:

eureka.client.service-url.defaultZone=http://192.168.xxx.xxx:8761/eureka/,http://192.168.xxx.xxx:8762/eureka/

# Set the sampling rate to 1.0, i.e. sample everything. The default is 0.1
spring.sleuth.sampler.probability=1.0
# The address of the Zipkin server
spring.zipkin.base-url=http://106.15.120.126:9411/

  Spring Cloud Sleuth has a Sampler strategy; the sampling algorithm can be controlled through implementations of this interface. The sampler does not block the generation of span-related IDs, but it does affect exporting and operations that attach event tags. Sleuth's default sampling algorithm is a form of reservoir sampling, implemented by PercentageBasedSampler, with a default rate of 0.1 (i.e. 10%). We can change it via spring.sleuth.sampler.probability (spring.sleuth.sampler.percentage in older versions), with a value between 0.0 and 1.0, where 1.0 means sample everything.
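The rate-keeping idea behind a percentage-based sampler can be sketched as follows. This is an illustrative stand-in, not Sleuth's actual PercentageBasedSampler implementation: it keeps the ratio of sampled decisions to total decisions as close as possible to the configured rate.

```java
// A minimal, hypothetical percentage-based sampler illustrating the idea
// behind Sleuth's PercentageBasedSampler (not the actual implementation).
public class PercentageSampler {
    private final double rate; // target sampling rate, between 0.0 and 1.0
    private long counter = 0;  // total sampling decisions made so far
    private long sampled = 0;  // decisions that came out as "sample it"

    public PercentageSampler(double rate) {
        if (rate < 0.0 || rate > 1.0) {
            throw new IllegalArgumentException("rate must be in [0.0, 1.0]");
        }
        this.rate = rate;
    }

    // Sample whenever the observed ratio has fallen below the target rate;
    // this deterministically yields rate * n samples out of n requests.
    public synchronized boolean isSampled() {
        counter++;
        if ((double) sampled / counter < rate) {
            sampled++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PercentageSampler sampler = new PercentageSampler(0.1);
        int kept = 0;
        for (int i = 0; i < 100; i++) {
            if (sampler.isSampled()) kept++;
        }
        System.out.println(kept); // prints 10
    }
}
```

With rate 1.0 every request is exported, matching the spring.sleuth.sampler.probability=1.0 setting above.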

Sending via a Message Broker (RabbitMQ as an example)

Compared with the HTTP approach, both services add one more dependency:

<!-- dependency for Zipkin data collection via RabbitMQ -->
<dependency>
	<groupId>org.springframework.cloud</groupId>
	<artifactId>spring-cloud-stream-binder-rabbit</artifactId>
</dependency>

The configuration also changes: the main differences are the new RabbitMQ-related settings and the removal of spring.zipkin.base-url=http://106.15.120.126:9411/

eureka.client.service-url.defaultZone=http://192.168.xxx.xxx:8761/eureka/,http://192.168.xxx.xxx:8762/eureka/

# Set the sampling rate to 1.0, i.e. sample everything. The default is 0.1
spring.sleuth.sampler.probability=1.0
# Send collected data via RabbitMQ; can also be set to kafka or web
spring.zipkin.sender.type=rabbit
# The queue Zipkin uses; defaults to zipkin
spring.zipkin.rabbit.queue=zipkin

# RabbitMQ configuration
spring.rabbitmq.host=106.15.120.126
spring.rabbitmq.port=5672
spring.rabbitmq.username=admin
spring.rabbitmq.password=admin
spring.rabbitmq.virtual-host=/

Changes to Starting the Zipkin Server

  Since, as mentioned above, Zipkin no longer recommends that we customize the server side, zipkin-server can no longer be found in the latest Spring Cloud dependency management.

So, when using the official jar directly, how do we fetch trace information from RabbitMQ?

We can make Zipkin read from RabbitMQ via environment variables, like this:

RABBIT_ADDRESSES=localhost java -jar zipkin.jar

The configurable environment variables are listed below:

| Property | Environment variable | Description |
| --- | --- | --- |
| zipkin.collector.rabbitmq.concurrency | RABBIT_CONCURRENCY | Number of concurrent consumers; default 1 |
| zipkin.collector.rabbitmq.connection-timeout | RABBIT_CONNECTION_TIMEOUT | Connection timeout in milliseconds; default 60000, i.e. 1 minute |
| zipkin.collector.rabbitmq.queue | RABBIT_QUEUE | Queue from which span data is consumed; default zipkin |
| zipkin.collector.rabbitmq.uri | RABBIT_URI | A URI conforming to the RabbitMQ URI spec, e.g. amqp://user:pass@host:10000/vhost |
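Because the amqp scheme follows the generic URI syntax, the example URI from the table can be picked apart with the JDK's standard parser; this sketch just shows which pieces correspond to the individual RABBIT_* variables.

```java
import java.net.URI;

public class AmqpUriDemo {
    public static void main(String[] args) {
        // Parse the example RabbitMQ URI from the table above. Each part
        // corresponds to one of the individual RABBIT_* variables that a
        // full URI replaces.
        URI uri = URI.create("amqp://user:pass@host:10000/vhost");
        System.out.println("host  = " + uri.getHost());     // host  -> RABBIT_ADDRESSES
        System.out.println("port  = " + uri.getPort());     // 10000 -> RABBIT_ADDRESSES
        System.out.println("user  = " + uri.getUserInfo()); // user:pass -> RABBIT_USER / RABBIT_PASSWORD
        System.out.println("vhost = " + uri.getPath());     // /vhost -> RABBIT_VIRTUAL_HOST
    }
}
```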

If a URI is set, the following properties are ignored:

| Property | Environment variable | Description |
| --- | --- | --- |
| zipkin.collector.rabbitmq.addresses | RABBIT_ADDRESSES | Comma-separated list of RabbitMQ addresses, e.g. localhost:5672,localhost:5673 |
| zipkin.collector.rabbitmq.password | RABBIT_PASSWORD | Password for connecting to RabbitMQ; default guest |
| zipkin.collector.rabbitmq.username | RABBIT_USER | Username for connecting to RabbitMQ; default guest |
| zipkin.collector.rabbitmq.virtual-host | RABBIT_VIRTUAL_HOST | RabbitMQ virtual host to use; default / |
| zipkin.collector.rabbitmq.use-ssl | RABBIT_USE_SSL | Set to true to connect to RabbitMQ over SSL |

The contents of zipkin.jar's yml configuration file can be viewed here.

Persisting Zipkin Tracing Data to MySQL

By default, Zipkin stores data in memory. If the data volume grows too large, or the system runs long enough, memory will be exhausted, so the recorded data can be stored in MySQL instead.

Just add the corresponding environment variables when starting the Zipkin server.

| Property | Environment variable | Description |
| --- | --- | --- |
| zipkin.storage.type | STORAGE_TYPE | Storage type; default mem, i.e. in memory |
| zipkin.storage.mysql.db | MYSQL_DB | Database to use; default zipkin |
| zipkin.storage.mysql.username | MYSQL_USER | Database user |
| zipkin.storage.mysql.password | MYSQL_PASS | Database user's password |
| zipkin.storage.mysql.host | MYSQL_HOST | Database host; default localhost |
| zipkin.storage.mysql.port | MYSQL_TCP_PORT | Database port; default 3306 |
| zipkin.storage.mysql.max-active | MYSQL_MAX_CONNECTIONS | Maximum number of connections; default 10 |
| zipkin.storage.mysql.use-ssl | MYSQL_USE_SSL | Whether to use SSL; default false |

The official database scripts are available at the official address; the following is copied from the official GitHub.

--
-- Copyright 2015-2019 The OpenZipkin Authors
--
-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
-- in compliance with the License. You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software distributed under the License
-- is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
-- or implied. See the License for the specific language governing permissions and limitations under
-- the License.
--

CREATE TABLE IF NOT EXISTS zipkin_spans (
  `trace_id_high` BIGINT NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
  `trace_id` BIGINT NOT NULL,
  `id` BIGINT NOT NULL,
  `name` VARCHAR(255) NOT NULL,
  `remote_service_name` VARCHAR(255),
  `parent_id` BIGINT,
  `debug` BIT(1),
  `start_ts` BIGINT COMMENT 'Span.timestamp(): epoch micros used for endTs query and to implement TTL',
  `duration` BIGINT COMMENT 'Span.duration(): micros used for minDuration and maxDuration query',
  PRIMARY KEY (`trace_id_high`, `trace_id`, `id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;

ALTER TABLE zipkin_spans ADD INDEX(`trace_id_high`, `trace_id`) COMMENT 'for getTracesByIds';
ALTER TABLE zipkin_spans ADD INDEX(`name`) COMMENT 'for getTraces and getSpanNames';
ALTER TABLE zipkin_spans ADD INDEX(`remote_service_name`) COMMENT 'for getTraces and getRemoteServiceNames';
ALTER TABLE zipkin_spans ADD INDEX(`start_ts`) COMMENT 'for getTraces ordering and range';

CREATE TABLE IF NOT EXISTS zipkin_annotations (
  `trace_id_high` BIGINT NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
  `trace_id` BIGINT NOT NULL COMMENT 'coincides with zipkin_spans.trace_id',
  `span_id` BIGINT NOT NULL COMMENT 'coincides with zipkin_spans.id',
  `a_key` VARCHAR(255) NOT NULL COMMENT 'BinaryAnnotation.key or Annotation.value if type == -1',
  `a_value` BLOB COMMENT 'BinaryAnnotation.value(), which must be smaller than 64KB',
  `a_type` INT NOT NULL COMMENT 'BinaryAnnotation.type() or -1 if Annotation',
  `a_timestamp` BIGINT COMMENT 'Used to implement TTL; Annotation.timestamp or zipkin_spans.timestamp',
  `endpoint_ipv4` INT COMMENT 'Null when Binary/Annotation.endpoint is null',
  `endpoint_ipv6` BINARY(16) COMMENT 'Null when Binary/Annotation.endpoint is null, or no IPv6 address',
  `endpoint_port` SMALLINT COMMENT 'Null when Binary/Annotation.endpoint is null',
  `endpoint_service_name` VARCHAR(255) COMMENT 'Null when Binary/Annotation.endpoint is null'
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;

ALTER TABLE zipkin_annotations ADD UNIQUE KEY(`trace_id_high`, `trace_id`, `span_id`, `a_key`, `a_timestamp`) COMMENT 'Ignore insert on duplicate';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`, `span_id`) COMMENT 'for joining with zipkin_spans';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`) COMMENT 'for getTraces/ByIds';
ALTER TABLE zipkin_annotations ADD INDEX(`endpoint_service_name`) COMMENT 'for getTraces and getServiceNames';
ALTER TABLE zipkin_annotations ADD INDEX(`a_type`) COMMENT 'for getTraces and autocomplete values';
ALTER TABLE zipkin_annotations ADD INDEX(`a_key`) COMMENT 'for getTraces and autocomplete values';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id`, `span_id`, `a_key`) COMMENT 'for dependencies job';

CREATE TABLE IF NOT EXISTS zipkin_dependencies (
  `day` DATE NOT NULL,
  `parent` VARCHAR(255) NOT NULL,
  `child` VARCHAR(255) NOT NULL,
  `call_count` BIGINT,
  `error_count` BIGINT,
  PRIMARY KEY (`day`, `parent`, `child`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;

Verification

Access http://192.168.xxx.xxx:9411/zipkin/
If you use the RabbitMQ approach, visit the RabbitMQ web console and you will see a queue named zipkin.

Reference: https://windmt.com/2018/04/24/spring-cloud-12-sleuth-zipkin/



Origin blog.csdn.net/k_young1997/article/details/104239086