Distributed Tracing Systems (Part 1): An Introduction to Dapper

Reposted from: https://blog.csdn.net/hustspy1990/article/details/93385286

Overview
With the rise of distributed systems and microservices, a single user request may pass through many systems, and the interactions between services are very complex: an error anywhere can affect how the whole request is processed. Earlier monitoring systems usually knew only the health of a single system or whether a request succeeded or failed, and could not quickly locate the root cause of a failure. Complex distributed systems also face the following problems:

Difficult performance analysis: a service depends on many other services, which in turn depend on further services. If an interface suddenly becomes slower, the cause may not be the downstream service it calls directly but a service further downstream. How can we quickly locate the root cause of the added latency?
Difficult call-path mapping: requirements iterate quickly and the call relationships between systems change frequently, so it is hard to work out a clear topology of the call paths.
Difficult capacity planning: before a promotional event, capacity usually has to be added in advance to absorb traffic spikes. But different promotions drive traffic through different entry points of each system, so how can we accurately assess how traffic growth at one entry point affects the downstream systems?
To address these issues, Google built the distributed tracing system Dapper, and many Internet companies later launched their own distributed tracing systems based on Dapper's ideas.

The basic principle
The figure below shows a user request that triggers several RPC calls between systems. In this example, the user request first reaches front-end system A; A calls middle-tier systems B and C; C then calls two back-end systems, D and E. A distributed tracing system records basic information about this series of RPC calls: the call relationships, when each call happened, exception information, and some traffic tags.

To tie together all the calls triggered by one request, a global traceId is needed: it is passed along every call on the entire path and attached to every log record.
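
To make the traceId mechanism concrete, here is a minimal Python sketch of the example call chain (A calls B and C; C calls D and E). The SpanRecord type, the CALL_GRAPH table and the handle/serve_request functions are hypothetical illustrations, not Dapper's API. Ids are random 64-bit integers, approximating the globally unique 64-bit ids Dapper uses.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SpanRecord:
    trace_id: int             # shared by every call triggered by one request
    span_id: int              # identifies this particular call
    parent_id: Optional[int]  # None for the root span
    name: str

def new_id() -> int:
    # Random 64-bit value as a stand-in for a globally unique id.
    return random.getrandbits(64)

# Hypothetical call graph from the example: A calls B and C; C calls D and E.
CALL_GRAPH = {"A": ["B", "C"], "C": ["D", "E"]}

def handle(service: str, trace_id: int, parent_id: Optional[int],
           log: List[SpanRecord]) -> None:
    span_id = new_id()
    log.append(SpanRecord(trace_id, span_id, parent_id, service))
    for callee in CALL_GRAPH.get(service, []):
        # The traceId is passed down unchanged; the current span becomes the parent.
        handle(callee, trace_id, span_id, log)

def serve_request() -> List[SpanRecord]:
    log: List[SpanRecord] = []
    trace_id = new_id()  # generated once, at the entry point
    handle("A", trace_id, None, log)
    return log
```

Every record produced by one call to serve_request() carries the same trace_id, so a query by traceId returns the whole call chain, and the parent_id links rebuild the tree.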

Dapper's tracing model is a tree embedded in the RPC calls. The tree's nodes are the basic unit of the whole structure, and each node is a reference to a span; the edges between nodes represent the direct relationship between a span and its parent span. In the figure above, each span records the name of the interface being called, its parent id, and its own id. In Dapper these ids are globally unique 64-bit integers. In some other distributed tracing systems, the parent span id and the span id are typically combined into a single hierarchical string:

- 0
-- 0.1
--- 0.1.1
--- 0.1.2
-- 0.2
--- 0.2.1
Here spanId = 0.1 means the parent span is 0, and spanId = 0.1.1 means the parent span is 0.1.
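
A minimal sketch of how such hierarchical span ids could be generated and decoded, assuming each span numbers its own outgoing calls starting from 1. SpanContext and parent_of are hypothetical helpers, not taken from any particular tracing system.

```python
from typing import Optional

class SpanContext:
    """Hands out hierarchical span ids such as "0", "0.1", "0.1.1" (hypothetical helper)."""

    def __init__(self, span_id: str = "0") -> None:
        self.span_id = span_id
        self._calls_made = 0  # outgoing calls made so far from this span

    def next_child(self) -> "SpanContext":
        # The n-th downstream call made inside span "0.1" gets id "0.1.n".
        self._calls_made += 1
        return SpanContext(f"{self.span_id}.{self._calls_made}")

def parent_of(span_id: str) -> Optional[str]:
    # Drop the last component: "0.1.1" -> "0.1"; the root "0" has no parent.
    if "." not in span_id:
        return None
    return span_id.rsplit(".", 1)[0]
```

The design trade-off: the parent can be recovered from the id string alone, so no separate parent id field is needed, but ids grow with call depth. Dapper's fixed-size 64-bit ids instead require storing the parent id explicitly.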

Below is a detailed view of a span's internals: besides spanName and spanId, a span also records the start and end times on both the client side and the server side.
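
To show what the four client/server timestamps make possible, here is a minimal sketch. The field names and the millisecond unit are assumptions for illustration. Comparing the caller's view with the callee's view separates the server's own processing time from time spent on the network or in queues.

```python
from dataclasses import dataclass

@dataclass
class SpanTimes:
    # Hypothetical field names; timestamps in milliseconds.
    client_start: float  # client sends the request
    server_start: float  # server receives the request
    server_end: float    # server sends the response
    client_end: float    # client receives the response

    def client_duration(self) -> float:
        # Total latency as seen by the caller (client clock only).
        return self.client_end - self.client_start

    def server_duration(self) -> float:
        # Time the callee spent processing the request (server clock only).
        return self.server_end - self.server_start

    def network_and_queueing(self) -> float:
        # Time the caller waited that the server did not account for:
        # network transfer, serialization, queueing. Each duration uses a
        # single host's clock, so a fixed offset between the two clocks cancels out.
        return self.client_duration() - self.server_duration()
```

For example, SpanTimes(0.0, 2.0, 10.0, 13.0) gives a 13 ms call of which 8 ms was server processing and 5 ms was spent outside the server.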

Annotation
Dapper also lets developers add extra information to a span, either to support higher-level system monitoring or to attach debugging information. Annotations come in two forms: plain text and key-value pairs. In a distributed tracing system, traces can be queried not only by traceId but also by key=value expressions over these annotations. For example, in a trading business, recording the order number as a key/value annotation lets you find all the traces related to an order by its number, which is very helpful for troubleshooting.
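
A minimal in-memory sketch of both annotation forms and of looking up traces by a key=value expression. Span, TraceStore and the orderId key are hypothetical stand-ins; a real tracing backend would index annotations in a database rather than a Python dict.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Span:
    trace_id: str
    name: str
    notes: List[str] = field(default_factory=list)      # plain-text annotations
    tags: Dict[str, str] = field(default_factory=dict)  # key-value annotations

class TraceStore:
    """Hypothetical in-memory stand-in for a tracing backend."""

    def __init__(self) -> None:
        self.spans: List[Span] = []
        self._index: Dict[str, Set[str]] = defaultdict(set)  # "key=value" -> trace ids

    def add(self, span: Span) -> None:
        self.spans.append(span)
        for key, value in span.tags.items():
            self._index[f"{key}={value}"].add(span.trace_id)

    def by_trace_id(self, trace_id: str) -> List[Span]:
        return [s for s in self.spans if s.trace_id == trace_id]

    def find_traces(self, expr: str) -> Set[str]:
        # expr is a simple "key=value" string, e.g. "orderId=1001".
        key, _, value = expr.partition("=")
        return set(self._index.get(f"{key.strip()}={value.strip()}", set()))
```

With spans tagged orderId=1001 in two different traces, find_traces("orderId=1001") returns both trace ids, and by_trace_id then retrieves each full call chain.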

Instrumentation
How is trace information passed along during RPC calls, and how is the collected data reported? Dapper is known for being almost zero-intrusion for application developers. It relies mainly on modifying common component libraries, which requires using a unified internal RPC framework. The modifications are as follows:

Within a process, keep the trace context in a ThreadLocal.
When work is executed asynchronously on a thread pool, wrap the thread pool so that the ThreadLocal context is passed from the submitting (main) thread to the worker (child) thread. Callbacks are handled in a similar way.
For RPC communication, modify the RPC framework to pass the traceId and spanId along in the request Context.
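
The three modifications above can be sketched in Python, with threading.local playing the role of Java's ThreadLocal. TracingExecutor, inject, extract and the x-trace-id/x-span-id metadata keys are hypothetical names chosen for illustration; they are not the API of Dapper or of any RPC framework.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple

# Each thread sees its own independent value, like Java's ThreadLocal.
_local = threading.local()

def set_context(trace_id: str, span_id: str) -> None:
    _local.ctx = (trace_id, span_id)

def get_context() -> Optional[Tuple[str, str]]:
    return getattr(_local, "ctx", None)

class TracingExecutor:
    """Thread-pool wrapper: tasks run with the submitting thread's trace context."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, fn: Callable, *args, **kwargs) -> Future:
        captured = get_context()  # read on the submitting (parent) thread

        def run_with_context():
            previous = get_context()
            _local.ctx = captured  # install on the worker (child) thread
            try:
                return fn(*args, **kwargs)
            finally:
                _local.ctx = previous  # pool threads are reused, so restore

        return self._pool.submit(run_with_context)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)

# Hypothetical metadata keys for carrying the context across an RPC.
TRACE_KEY, SPAN_KEY = "x-trace-id", "x-span-id"

def inject(metadata: Dict[str, str]) -> Dict[str, str]:
    """Client side: copy the current context into outgoing request metadata."""
    ctx = get_context()
    if ctx is not None:
        metadata[TRACE_KEY], metadata[SPAN_KEY] = ctx
    return metadata

def extract(metadata: Dict[str, str]) -> None:
    """Server side: restore the caller's context before the handler runs.
    A real framework would also start a new child span at this point."""
    if TRACE_KEY in metadata:
        set_context(metadata[TRACE_KEY], metadata.get(SPAN_KEY, ""))
```

A task submitted through a plain ThreadPoolExecutor would see no context on a fresh worker thread; the wrapper captures it at submit time and restores the worker's previous value afterwards. Idiomatic Python code might use contextvars instead; threading.local is used here to mirror the ThreadLocal approach described above.
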
Sampling rate
Collecting and reporting trace data consumes some system resources. If a system has strict performance requirements, it may not be able to collect and report every request. Instead, set a sampling rate and capture only part of the data, which is still enough to help analyze system problems.
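
A minimal sketch of a sampling decision. The assumption here is that the decision is derived from a hash of the traceId, which is one common way to make every service touching the same trace reach the same choice without coordination; the choice of SHA-256 and the should_sample name are illustrative, not taken from Dapper.

```python
import hashlib

def should_sample(trace_id: str, rate: float) -> bool:
    """Return True if the trace with this id should be collected and reported."""
    if rate <= 0.0:
        return False
    if rate >= 1.0:
        return True
    # Map the traceId to a deterministic number in [0, 1): take 53 bits of a
    # SHA-256 digest so the division by 2**53 is exact in floating point.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)
    return bucket < rate
```

Because the bucket depends only on the traceId, raising the rate only adds traces: any trace sampled at 0.1 is also sampled at 0.5. An alternative design is to decide once at the entry point and propagate the decision in the trace context.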

References
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure (Chinese translation)
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure (original paper)
--------------------- 
Author: albon_arith 
Source: CSDN 
Original: https://blog.csdn.net/hustspy1990/article/details/93385286 
Disclaimer: This is an original article by the blogger; if you repost it, please include a link to the original post.

Origin blog.csdn.net/qq_36688928/article/details/93465795