Introduction to OpenTracing

Since Google published the Dapper paper, the distributed tracing products developed by major Internet companies and open-source communities have flourished. This diversity has also created a problem for users: the APIs of the various tracing products are incompatible, so switching from one product to another is very costly.

OpenTracing addresses this problem. It provides platform-neutral and vendor-neutral APIs that let developers add (or replace) a tracing system with little effort.

Introduction to Trace

A Trace represents the execution of a transaction, request, or process in a distributed system. In OpenTracing, a Trace is a directed acyclic graph (DAG) of Spans. A Span represents a logical unit of work in the system with a start time and a duration, and each Span has an operation name. Within a Trace, Spans are linked to one another by references.
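To make the DAG idea concrete, here is a minimal Python sketch that is not taken from any OpenTracing library: Spans are plain names, a hypothetical `parents` map records which Spans each Span references, and a topological sort both orders the Spans causally and rejects cyclic references, which a valid Trace cannot contain. The Span names are borrowed from the request example discussed in this section.

```python
from collections import defaultdict, deque


def causal_order(spans, parents):
    """Return Spans in causal order (parents first); raise ValueError on a cycle."""
    children = defaultdict(list)
    pending = {span: 0 for span in spans}  # number of unprocessed references per Span
    for child, refs in parents.items():
        for parent in refs:
            children[parent].append(child)
            pending[child] += 1
    ready = deque(span for span in spans if pending[span] == 0)  # root Span(s)
    order = []
    while ready:
        span = ready.popleft()
        order.append(span)
        for child in children[span]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(spans):
        raise ValueError("references form a cycle, so this is not a valid Trace")
    return order


spans = ["load balancer", "authorization", "billing", "resource"]
parents = {  # hypothetical: each call references the load balancer Span
    "authorization": ["load balancer"],
    "billing": ["load balancer"],
    "resource": ["load balancer"],
}
print(causal_order(spans, parents))
```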

The figure above shows the whole path of a client request through a distributed system. Although this kind of diagram is useful for seeing how components fit together, it does not show call timing, ordering, or whether calls run serially or in parallel, and it becomes cluttered as soon as the call relationships get more complex.

If the processing of this client request is treated as a Trace, then every call we care about, whether an HTTP call, an RPC call, a storage access, or a local method call, can become a Span, as shown in the following figure:

Each colored bar in the figure is a Span. The figure clearly shows that after the request reaches the back-end load balancer, it first calls the authorization service, then the billing service, and finally the resource service, in which the two operations container start-up and storage allocation run in parallel.
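The serial-versus-parallel distinction in this example can be read off Span timestamps. The sketch below assumes hypothetical millisecond timings and a hypothetical `parent` field for each Span; it treats two sibling Spans as parallel when their time intervals overlap.

```python
from itertools import combinations
from typing import NamedTuple


class SpanTiming(NamedTuple):
    name: str
    parent: str     # hypothetical parent Span name ("" for the root)
    start_ms: int   # hypothetical start time, ms since the request arrived
    end_ms: int


timeline = [
    SpanTiming("load balancer", "", 0, 95),
    SpanTiming("authorization", "load balancer", 5, 20),
    SpanTiming("billing", "load balancer", 21, 35),
    SpanTiming("resource", "load balancer", 36, 90),
    SpanTiming("container start-up", "resource", 40, 85),
    SpanTiming("storage allocation", "resource", 40, 70),
]


def parallel_siblings(spans):
    """Return pairs of Spans that share a parent and whose intervals overlap."""
    pairs = []
    for a, b in combinations(spans, 2):
        same_parent = a.parent == b.parent
        overlap = a.start_ms < b.end_ms and b.start_ms < a.end_ms
        if same_parent and overlap:
            pairs.append((a.name, b.name))
    return pairs


print(parallel_siblings(timeline))
```

With these timings, authorization, billing, and resource do not overlap, so they count as serial; only the two children of resource are reported as parallel.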

Introduction to Span

A Span represents a logical unit of work in the system with a start time and a duration. Spans establish logical causality by being nested or arranged in sequence.

Each Span can contain the following information:

  • Operation name: for example, the RPC service being called or the URL being accessed;

  • Start time;

  • End time;

  • Span Tags: a set of key-value pairs, where each key must be a string and each value can be a string, a Boolean, or a number;

  • Span Logs: a set of Span log events;

  • SpanContext: the global context information of the Trace;

  • References: the relationships between Spans, described in detail below.
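The fields listed above can be pictured as a simple record. The following Python dataclasses are an illustrative sketch, not the data model of any particular tracer; the class names, the millisecond timestamps, and the `duration_ms` helper are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Union

TagValue = Union[str, bool, int, float]  # tag keys are strings; values are str, bool, or numbers


class ReferenceType(Enum):
    CHILD_OF = "child_of"
    FOLLOWS_FROM = "follows_from"


@dataclass
class SpanContext:
    trace_id: str
    span_id: str
    baggage: Dict[str, str] = field(default_factory=dict)


@dataclass
class Reference:
    type: ReferenceType   # ChildOf or FollowsFrom
    context: SpanContext  # context of the referenced (parent) Span


@dataclass
class LogRecord:
    timestamp_ms: int
    fields: Dict[str, object]


@dataclass
class Span:
    operation_name: str                # e.g. an RPC method or a URL
    start_time_ms: int
    context: SpanContext
    end_time_ms: Optional[int] = None  # set when the Span finishes
    tags: Dict[str, TagValue] = field(default_factory=dict)
    logs: List[LogRecord] = field(default_factory=list)
    references: List[Reference] = field(default_factory=list)

    @property
    def duration_ms(self) -> Optional[int]:
        if self.end_time_ms is None:
            return None
        return self.end_time_ms - self.start_time_ms


root = Span("GET /orders", 1_000, SpanContext("trace-1", "span-1"))
query = Span("SELECT orders", 1_020, SpanContext("trace-1", "span-2"),
             references=[Reference(ReferenceType.CHILD_OF, root.context)])
query.end_time_ms = 1_320
print(query.duration_ms)  # 300
```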

Within a Trace, a Span can have causal relationships with one or more other Spans. OpenTracing currently defines two reference types, ChildOf and FollowsFrom. Both express a direct causal relationship between a child Span and its parent Span.

  • ChildOf relationship: a Span may be the child of a parent Span, which is a ChildOf relationship. The following situations form ChildOf relationships:

    • In an HTTP request, the Span produced by the called server is a ChildOf the Span produced by the calling client;

    • The Span for an SQL INSERT operation is a ChildOf the Span for the ORM's save method.

In each of these ChildOf relationships, the parent Span has to wait for the child Span to return: the child's execution time contributes to the parent's, and the parent depends on the child's result. Besides serial tasks, our logic also contains many parallel tasks, and their Spans are parallel too. In that case, the parent Span waits for all of its parallel child Spans to finish and can then merge their results.
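The waiting behavior of a parent with parallel children can be sketched with a thread pool. In this toy example, with hypothetical task names and sleep durations, the parent submits two child tasks, blocks on both results, and only then records its own end time, so its interval encloses both children.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_child(name, work_seconds, timings):
    """Hypothetical child task that records its own start and end time."""
    start = time.monotonic()
    time.sleep(work_seconds)  # simulated work
    timings[name] = (start, time.monotonic())
    return name


def parent_operation():
    timings = {}
    parent_start = time.monotonic()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(run_child, "container start-up", 0.05, timings),
            pool.submit(run_child, "storage allocation", 0.03, timings),
        ]
        results = [f.result() for f in futures]  # the parent waits for every child
    timings["parent"] = (parent_start, time.monotonic())
    return results, timings


results, timings = parent_operation()
latest_child_end = max(end for name, (_, end) in timings.items() if name != "parent")
assert timings["parent"][1] >= latest_child_end
```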

The following figure shows the Spans for the two ChildOf relationships above:

  • FollowsFrom relationship: in a distributed system, some upstream systems (parent nodes) do not depend on the results of downstream systems (child nodes) in any way; for example, an upstream system may send messages to a downstream system through a message queue. In this case, the child Span of the downstream system and the parent Span of the upstream system have a FollowsFrom relationship. The following figure shows some possible FollowsFrom relationships:
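A minimal sketch of the message-queue case, using Python's standard `queue` and `threading` modules: the upstream function enqueues a message and returns without waiting, and the downstream Span records a FollowsFrom reference to it. The `parent_span` field in the message is a hypothetical stand-in for a propagated SpanContext, and the record layout is invented for illustration.

```python
import queue
import threading
import time

records = {}        # Span name -> timing and reference info (hypothetical layout)
mq = queue.Queue()  # stands in for a real message queue


def publish_order(order_id):
    """Upstream Span: enqueue a message and return without waiting for a result."""
    start = time.monotonic()
    mq.put({"order_id": order_id, "parent_span": "publish_order"})
    records["publish_order"] = {"start": start, "end": time.monotonic(), "ref": None}


def handle_order():
    """Downstream Span: references the upstream Span through FollowsFrom."""
    msg = mq.get()
    start = time.monotonic()
    time.sleep(0.01)  # simulated processing
    records["handle_order"] = {
        "start": start,
        "end": time.monotonic(),
        "ref": ("FollowsFrom", msg["parent_span"]),
    }
    mq.task_done()


publish_order(42)  # the upstream call has already finished here
worker = threading.Thread(target=handle_order)
worker.start()
worker.join()
print(records["handle_order"]["ref"])
```

In this run the upstream Span finishes before the downstream Span even starts, which is exactly the situation FollowsFrom describes: causality without the parent waiting on the child.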

The following example Trace consists of 8 Spans, in which Span C is a ChildOf Span A and Span G FollowsFrom Span F:

Introduction to Logs

A Span can record multiple Log events, each of which requires a timestamp and can carry optional additional information. In the environment built above, the trace obtained by requesting http://localhost:8000/err records the exception stack through Logs. As the following figure shows, the Log contains not only the exception stack but also some descriptive key-value pairs:
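The following toy Span shows a Log event that captures an exception. It is a sketch, not a real tracer: the method name `log_kv` mirrors the OpenTracing Python API, and the field keys (`event`, `error.kind`, `message`, `stack`) follow the OpenTracing semantic conventions as commonly used, but the class itself and the `GET /err` operation name are hypothetical.

```python
import time
import traceback


class Span:
    """Toy Span that only supports timestamped Log events (hypothetical)."""

    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.logs = []

    def log_kv(self, fields, timestamp=None):
        # Every Log event carries a timestamp plus optional key-value fields.
        ts = timestamp if timestamp is not None else time.time()
        self.logs.append((ts, dict(fields)))
        return self


span = Span("GET /err")
try:
    raise ValueError("something went wrong")
except ValueError as exc:
    span.log_kv({
        "event": "error",
        "error.kind": type(exc).__name__,
        "message": str(exc),
        "stack": traceback.format_exc(),
    })

ts, fields = span.logs[0]
print(fields["error.kind"], fields["message"])
```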

Introduction to Tags

Each Span can have multiple tags in the form of key-value pairs. Unlike Logs, Tags have no timestamp; they simply add brief descriptive or supplementary information to the Span as a whole. The following figure shows the Tags from the previous example:
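A sketch of Tag semantics with a hypothetical toy `Span` class: keys must be strings, values must be strings, Booleans, or numbers, and no timestamp is stored, so setting the same key again simply overwrites the earlier value. The tag names `http.method`, `http.status_code`, and `error` follow common OpenTracing conventions; the values are made up.

```python
class Span:
    """Toy Span showing Tag semantics only (hypothetical, not a real tracer)."""

    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.tags = {}

    def set_tag(self, key, value):
        # Keys must be strings; values must be str, bool, int, or float.
        if not isinstance(key, str):
            raise TypeError("tag key must be a string")
        if not isinstance(value, (str, bool, int, float)):
            raise TypeError(f"unsupported tag value type: {type(value).__name__}")
        self.tags[key] = value  # no timestamp: a later value replaces an earlier one
        return self


span = Span("GET /err")
span.set_tag("http.method", "GET").set_tag("http.status_code", 500).set_tag("error", True)
print(span.tags)
```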

SpanContext and Baggage

SpanContext carries the state that must cross process boundaries. When a call is made to another process (a downstream system), global information such as the TraceId and the current SpanId, together with the Baggage, is encapsulated in the SpanContext and passed along.

Baggage is a collection of key-value pairs stored in the SpanContext. It is propagated along the trace: once a Baggage item is set on a Span, every Span that descends from it, including Spans in other processes, can read it.
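The following sketch shows how Baggage flows down a trace. It assumes a hypothetical ID scheme (`trace-1`, `span-N`); each new child context copies its parent's Baggage, so an item set near the root is visible to every descendant.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional

_ids = itertools.count(1)  # hypothetical span-ID generator


@dataclass
class SpanContext:
    trace_id: str
    span_id: str
    baggage: Dict[str, str] = field(default_factory=dict)


def start_span_context(parent: Optional[SpanContext] = None) -> SpanContext:
    """Create the context of a new Span; a child inherits a copy of the parent's Baggage."""
    span_id = f"span-{next(_ids)}"
    if parent is None:
        return SpanContext(trace_id="trace-1", span_id=span_id)
    return SpanContext(parent.trace_id, span_id, dict(parent.baggage))


root = start_span_context()
root.baggage["tenant"] = "acme"  # set once, near the root
child = start_span_context(root)
grandchild = start_span_context(child)
print(grandchild.trace_id, grandchild.baggage)  # trace-1 {'tenant': 'acme'}
```

Because each child receives a copy in this sketch, an item added on a child is not visible to its parent or to Spans that started earlier.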

Note that because Baggage is propagated across processes throughout the trace, its data has to be serialized and deserialized on every call. If too much data is stored in Baggage, serialization and deserialization take longer, which increases the RPC latency of the whole system and reduces throughput.
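A rough way to see this cost is to measure how many bytes Baggage adds to every call. The sketch below uses JSON purely as a stand-in encoding (real tracers define their own wire formats) and a hypothetical count of 10 cross-process calls per trace.

```python
import json


def serialized_size(baggage):
    """Bytes added to each cross-process call, with JSON as a stand-in encoding."""
    return len(json.dumps(baggage).encode("utf-8"))


small = {"tenant": "acme"}
large = {f"item{i}": "x" * 100 for i in range(50)}  # hypothetical oversized Baggage
calls = 10  # hypothetical number of cross-process calls in one trace

for name, baggage in (("small", small), ("large", large)):
    per_call = serialized_size(baggage)
    print(f"{name}: {per_call} bytes per call, {per_call * calls} bytes per trace")
```

Every one of those bytes is encoded by the caller and decoded by the callee on each hop, so the overhead grows with both the Baggage size and the depth of the call chain.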

Although Baggage, like Span Tags, is a collection of key-value pairs, the biggest difference between them is that Span Tags are never transmitted across processes, whereas Baggage is propagated along the whole trace. For this reason, OpenTracing defines two Tracer operations, Inject and Extract: Inject serializes a SpanContext (including its Baggage) into a carrier such as HTTP headers before a cross-process call, and Extract rebuilds the SpanContext from that carrier in the receiving process.
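Inject and Extract can be sketched with a plain dictionary standing in for HTTP headers (OpenTracing calls this kind of carrier a text map). The header names `x-trace-id` and `x-span-id` and the `x-baggage-` prefix are hypothetical; each real tracer defines its own format. Note that nothing from Span Tags goes into the carrier.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical header names; real tracers each define their own wire format.
TRACE_ID_HEADER = "x-trace-id"
SPAN_ID_HEADER = "x-span-id"
BAGGAGE_PREFIX = "x-baggage-"


@dataclass
class SpanContext:
    trace_id: str
    span_id: str
    baggage: Dict[str, str] = field(default_factory=dict)


def inject(ctx: SpanContext, carrier: Dict[str, str]) -> None:
    """Write the SpanContext (IDs and Baggage) into a text-map carrier."""
    carrier[TRACE_ID_HEADER] = ctx.trace_id
    carrier[SPAN_ID_HEADER] = ctx.span_id
    for key, value in ctx.baggage.items():
        carrier[BAGGAGE_PREFIX + key] = value


def extract(carrier: Dict[str, str]) -> SpanContext:
    """Rebuild the SpanContext on the receiving side of the call."""
    baggage = {
        key[len(BAGGAGE_PREFIX):]: value
        for key, value in carrier.items()
        if key.startswith(BAGGAGE_PREFIX)
    }
    return SpanContext(carrier[TRACE_ID_HEADER], carrier[SPAN_ID_HEADER], baggage)


# Client side: inject before sending the request.
headers: Dict[str, str] = {}
inject(SpanContext("trace-1", "span-7", {"tenant": "acme"}), headers)

# Server side: extract, then start a ChildOf Span that references the result.
remote_ctx = extract(headers)
print(remote_ctx)
```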

Core interface semantics

OpenTracing expects every implementing platform to model and build its tracer on the core concepts above. In addition, OpenTracing describes the semantics of the core interfaces to help developers implement the specification correctly.

  • Span interface

The Span interface must implement the following functions:

    • Get the associated SpanContext: return the SpanContext object associated with the Span.

    • Finish Span: mark a Span that has already started as complete.

    • Add Span Tag: add a Tag key-value pair to the Span.

    • Add Log: add a Log event to the Span.

    • Set Baggage Item: add a key-value pair to Baggage.

    • Get Baggage Item: look up an element of Baggage by its key.

  • SpanContext interface

The SpanContext interface must implement the following function. Users obtain a SpanContext instance either from a Span instance or through the Tracer's Extract operation.

    • Iterate over all key-value pairs in Baggage.

  • Tracer interface

The Tracer interface must implement the following functions:

    • Create Span: create a new Span.

    • Inject SpanContext: serialize a SpanContext, including its Baggage, into a carrier (such as HTTP headers) so that it can travel with a cross-process call.

    • Extract SpanContext: rebuild a SpanContext, including its Baggage, from a carrier received in a cross-process call.
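The three interfaces can be summarized as Python abstract base classes. This is an illustrative sketch: the method names loosely follow the `opentracing` Python package, but the signatures are simplified (that package's `inject` and `extract` also take a format argument, for example), and `baggage_items` is a hypothetical name for the SpanContext iteration function.

```python
from abc import ABC, abstractmethod
from typing import Iterator, Mapping, Optional, Tuple


class SpanContext(ABC):
    @abstractmethod
    def baggage_items(self) -> Iterator[Tuple[str, str]]:
        """Iterate over all key-value pairs in Baggage."""


class Span(ABC):
    @property
    @abstractmethod
    def context(self) -> SpanContext:
        """Return the associated SpanContext."""

    @abstractmethod
    def finish(self, finish_time: Optional[float] = None) -> None:
        """Mark a started Span as complete."""

    @abstractmethod
    def set_tag(self, key: str, value) -> "Span":
        """Add a Tag key-value pair."""

    @abstractmethod
    def log_kv(self, fields: Mapping[str, object],
               timestamp: Optional[float] = None) -> "Span":
        """Add a timestamped Log event."""

    @abstractmethod
    def set_baggage_item(self, key: str, value: str) -> "Span":
        """Add a key-value pair to Baggage."""

    @abstractmethod
    def get_baggage_item(self, key: str) -> Optional[str]:
        """Look up a Baggage value by key."""


class Tracer(ABC):
    @abstractmethod
    def start_span(self, operation_name: str, references=None) -> Span:
        """Create a new Span."""

    @abstractmethod
    def inject(self, span_context: SpanContext, carrier: dict) -> None:
        """Serialize span_context into carrier for a cross-process call."""

    @abstractmethod
    def extract(self, carrier: dict) -> Optional[SpanContext]:
        """Rebuild a SpanContext from carrier, or return None if absent."""
```

A concrete tracer must override every abstract method before Python will let it be instantiated, which mirrors the "must implement" wording of the interface descriptions.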

Origin blog.csdn.net/lewee0215/article/details/109621568