Build Telemetry for Distributed Services: OpenTracing in Practice

Official site: https://opentracing.io/docs/best-practices/

Best Practices

This page illustrates common use cases that developers need to deal with when they instrument their applications and libraries with the OpenTracing API.

Stepping Back: Who is OpenTracing for?

OpenTracing is a thin standardization layer that sits between application/library code and various systems that consume tracing and causality data. Here is a diagram:

   +-------------+  +---------+  +----------+  +------------+
   | Application |  | Library |  |   OSS    |  |  RPC/IPC   |
   |    Code     |  |  Code   |  | Services |  | Frameworks |
   +-------------+  +---------+  +----------+  +------------+
          |              |             |             |
          |              |             |             |
          v              v             v             v
     +-----------------------------------------------------+
     | · · · · · · · · · · OpenTracing · · · · · · · · · · |
     +-----------------------------------------------------+
       |               |                |               |
       |               |                |               |
       v               v                v               v
 +-----------+  +-------------+  +-------------+  +-----------+
 |  Tracing  |  |   Logging   |  |   Metrics   |  |  Tracing  |
 | System A  |  | Framework B |  | Framework C |  | System D  |
 +-----------+  +-------------+  +-------------+  +-----------+

Use cases

The table below lists some use cases for OpenTracing and describes them in detail:

Use case              Description
Application Code      Developers writing application code can use OpenTracing to describe causality, demarcate control flow, and add fine-grained logging information along the way.
Library Code          Libraries that take intermediate control of requests can integrate with OpenTracing for similar reasons. For instance, a web middleware library can use OpenTracing to create spans for request handling, or an ORM library can use OpenTracing to describe higher-level ORM semantics and measure execution for specific SQL queries.
OSS Services          Beyond embedded libraries, entire OSS services may adopt OpenTracing to integrate with distributed traces initiating in – or propagating to – other processes in a larger distributed system. For instance, an HTTP load balancer may use OpenTracing to decorate requests, or a distributed key:value store may use OpenTracing to explain the performance of reads and writes.
RPC/IPC Frameworks    Any subsystem tasked with crossing process boundaries may use OpenTracing to standardize the format of tracing state as it injects into and extracts from various wire formats and protocols.

All of the above should be able to use OpenTracing to describe and propagate distributed traces without knowledge of the underlying OpenTracing implementation.

OpenTracing priorities

Since there are many orders of magnitude more programmers and applications above the OpenTracing layer (rather than below it), the APIs and use cases prioritize ease-of-use in that direction. While there are certainly ample opportunities for helper libraries and other abstractions that save time and effort for OpenTracing implementors, the use cases in this document are restricted to callers (rather than callees) of OpenTracing APIs.

Motivating Use Cases

The sections below discuss some commonly encountered use cases in the OpenTracing ecosystem.

Tracing a Function

def top_level_function():
    span1 = tracer.start_span('top_level_function')
    try:
        ...  # business logic
    finally:
        span1.finish()

As a follow-up, suppose that as part of the business logic above we call another function, function2, that we also want to trace. To attach that function to the ongoing trace, we need a way to access span1. We discuss how this can be done later; for now, let's assume we have a helper function get_current_span for that:

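def function2():
    parent_span = get_current_span()
    # join the caller's trace if there is one; never start a new trace here
    span2 = tracer.start_span('function2', child_of=parent_span) \
        if parent_span else None
    try:
        ...  # business logic
    finally:
        if span2:
            span2.finish()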

We assume that, for whatever reason, the developer does not want to start a new trace in this function if one hasn’t been started by the caller already, so we account for get_current_span potentially returning None.

These two examples are intentionally naive. Usually developers do not want to pollute their business functions with tracing code and instead use other means, such as a function decorator in Python:

@traced_function
def top_level_function():
    ... # business logic
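
The text does not show how traced_function is implemented. One possible shape for such a decorator is sketched below. DemoSpan and DemoTracer are hypothetical stand-ins that only record what happens, so the example runs without a tracing backend; a real application would use the tracer supplied by its OpenTracing implementation.

```python
import functools


class DemoSpan:
    """Stand-in for an OpenTracing Span; records what happened to it."""

    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.tags = {}
        self.finished = False

    def set_tag(self, key, value):
        self.tags[key] = value

    def finish(self):
        self.finished = True


class DemoTracer:
    """Stand-in for an OpenTracing Tracer; keeps every span it starts."""

    def __init__(self):
        self.spans = []

    def start_span(self, operation_name):
        span = DemoSpan(operation_name)
        self.spans.append(span)
        return span


tracer = DemoTracer()


def traced_function(func):
    """Wrap func in a span named after it and always finish the span."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = tracer.start_span(func.__name__)
        try:
            return func(*args, **kwargs)
        except Exception:
            # mark the span as failed, then let the error propagate
            span.set_tag('error', True)
            raise
        finally:
            span.finish()

    return wrapper


@traced_function
def top_level_function():
    return 'done'  # business logic
```

The try/finally inside the wrapper mirrors the hand-written version above, so the span is finished whether the business logic returns or raises.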

Tracing Server Endpoints

When a server wants to trace execution of a request, it generally needs to go through these steps:

  1. Attempt to extract a SpanContext that’s been propagated alongside the incoming request (in case the trace has already been started by the client), or start a new trace if no such propagated SpanContext could be found.
  2. Store the newly created Span in some request context that is propagated throughout the application, either by application code, or by the RPC framework.
  3. Finally, close the Span using span.finish() when the server has finished processing the request.

Extracting a SpanContext from an Incoming Request

Let’s assume that we have an HTTP server, and the SpanContext is propagated from the client via HTTP headers, accessible via request.headers:

extracted_context = tracer.extract(
    format=opentracing.Format.HTTP_HEADERS,
    carrier=request.headers
)

Here we use the headers map as the carrier. The Tracer object knows which headers it needs to read in order to reconstruct the tracer state and any Baggage.
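
To make the idea of a carrier more concrete, here is a toy sketch of what a tracer might do internally when it writes a SpanContext into a headers map and reads it back. It is not the OpenTracing API: the header names, the DemoSpanContext type, and the inject and extract functions are all hypothetical, and every real tracer defines its own wire format.

```python
from collections import namedtuple

# Hypothetical header names; every real tracer defines its own.
TRACE_ID_HEADER = 'demo-trace-id'
SPAN_ID_HEADER = 'demo-span-id'
BAGGAGE_PREFIX = 'demo-baggage-'

DemoSpanContext = namedtuple('DemoSpanContext', 'trace_id span_id baggage')


def inject(span_context, carrier):
    """Write the span context into a dict-like carrier of headers."""
    carrier[TRACE_ID_HEADER] = span_context.trace_id
    carrier[SPAN_ID_HEADER] = span_context.span_id
    for key, value in span_context.baggage.items():
        carrier[BAGGAGE_PREFIX + key] = value


def extract(carrier):
    """Rebuild a span context from headers, or return None if absent."""
    # HTTP header names are case-insensitive, so normalize before lookup
    headers = {k.lower(): v for k, v in carrier.items()}
    if TRACE_ID_HEADER not in headers or SPAN_ID_HEADER not in headers:
        return None
    baggage = {
        k[len(BAGGAGE_PREFIX):]: v
        for k, v in headers.items()
        if k.startswith(BAGGAGE_PREFIX)
    }
    return DemoSpanContext(headers[TRACE_ID_HEADER],
                           headers[SPAN_ID_HEADER], baggage)
```

Because only the tracer knows these header names, application code can hand over the whole headers map without caring which entries carry tracing state.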

Continuing or Starting a Trace from an Incoming Request

The extracted_context object above can be None if the Tracer did not find relevant headers in the incoming request: presumably because the client did not send them. In this case the server needs to start a brand new trace.

extracted_context = tracer.extract(
    format=opentracing.Format.HTTP_HEADERS,
    carrier=request.headers
)
if extracted_context is None:
    # no propagated context: start a brand new trace
    span = tracer.start_span(operation_name=operation)
else:
    # continue the trace started by the client
    span = tracer.start_span(operation_name=operation,
                             child_of=extracted_context)
span.set_tag('http.method', request.method)
span.set_tag('http.url', request.full_url)

The set_tag calls are examples of recording additional information in the Span about the request.

The operation above refers to the name the server wants to give to the Span. For example, if the HTTP request was a POST against /save_user/123, the operation name can be set to post:/save_user/. The OpenTracing API does not dictate how applications name the spans.
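
Since naming is left to the application, one possible convention is sketched below. The operation_name helper is hypothetical, and it assumes that a trailing numeric path segment is an entity ID that should not become part of the span name.

```python
import re


def operation_name(method, path):
    """Build a span name such as 'post:/save_user/' from a request."""
    # drop a trailing numeric ID so requests for all users share one name
    generic_path = re.sub(r'\d+$', '', path)
    return '{}:{}'.format(method.lower(), generic_path)
```

Called as operation_name('POST', '/save_user/123'), this yields the post:/save_user/ name used in the example above. Paths with IDs in the middle, or with non-numeric IDs, would need a different rule.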

In-Process Request Context Propagation

Request context propagation refers to an application's ability to associate a certain context with the incoming request such that this context is accessible in all other layers of the application within the same process. It can be used to provide application layers with access to request-specific values such as the identity of the end user, authorization tokens, and the request's deadline. It can also be used for transporting the current tracing Span.

The implementation of request context propagation is outside the scope of the OpenTracing API, but it is worth describing here to better understand the following sections. There are two commonly used techniques of context propagation:

Implicit Propagation

In implicit propagation techniques, the context is stored in platform-specific storage that allows it to be retrieved from any place in the application. This approach is often used by RPC frameworks, which rely on mechanisms such as thread-local or continuation-local storage, or even global variables (in the case of single-threaded processes).

The downside of this approach is that it almost always has a performance penalty, and on platforms like Go that do not support thread-local storage, implicit propagation is nearly impossible to implement.
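
As an illustration of implicit propagation in Python, the get_current_span helper assumed earlier could be built on the standard contextvars module, which gives each thread and each asyncio task its own view of a variable. This is a minimal sketch, not part of the OpenTracing API; span_in_context is a hypothetical name.

```python
import contextvars

# Holds the active span for the current thread or asyncio task.
_current_span = contextvars.ContextVar('current_span', default=None)


def get_current_span():
    """Return the span stored for this execution context, or None."""
    return _current_span.get()


class span_in_context:
    """Context manager that makes `span` current for the enclosed block."""

    def __init__(self, span):
        self._span = span
        self._token = None

    def __enter__(self):
        self._token = _current_span.set(self._span)
        return self._span

    def __exit__(self, exc_type, exc, tb):
        # restore whatever span was current before this block
        _current_span.reset(self._token)
        return False
```

A server would typically wrap request handling in `with span_in_context(span):`, after which any function further down the call stack can call get_current_span() without the span being passed as an argument.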

Explicit Propagation

In explicit propagation techniques the application code is structured to pass around a certain context object:

func HandleHttp(w http.ResponseWriter, req *http.Request) {
    ctx := context.Background()
    ...
    BusinessFunction1(ctx, arg1, ...)
}

func BusinessFunction1(ctx context.Context, arg1...) {
    ...
    BusinessFunction2(ctx, arg1, ...)
}

func BusinessFunction2(ctx context.Context, arg1...) {
    parentSpan := opentracing.SpanFromContext(ctx)
    childSpan := opentracing.StartSpan(
        "...",
        opentracing.ChildOf(parentSpan.Context()),
        ...)
    ...
}

The downside of explicit context propagation is that it leaks what could be considered an infrastructure concern into the application code. This Go blog post provides an in-depth overview and justification of this approach.

Tracing Client Calls

When an application acts as an RPC client, it is expected to start a new tracing Span before making an outgoing request, and propagate the new Span along with that request. The following example shows how it can be done for an HTTP request.

def traced_request(request, operation, http_client):
    # retrieve current span from propagated request context
    parent_span = get_current_span()

    # start a new span to represent the RPC
    span = tracer.start_span(
        operation_name=operation,
        child_of=parent_span.context,
        tags={'http.url': request.full_url}
    )

    # propagate the Span via HTTP request headers
    tracer.inject(
        span.context,
        format=opentracing.Format.HTTP_HEADERS,
        carrier=request.headers)

    # define a callback where we can finish the span
    def on_done(future):
        if future.exception():
            span.log(event='rpc exception', payload=future.exception())
        else:
            # result() is only safe to call when there was no exception
            span.set_tag('http.status_code', future.result().status_code)
        span.finish()

    try:
        future = http_client.execute(request)
        future.add_done_callback(on_done)
        return future
    except Exception as e:
        span.log(event='general exception', payload=e)
        span.finish()
        raise

  • The get_current_span() function is not part of the OpenTracing API. It represents a utility method for retrieving the current Span from the request context, which is propagated implicitly (as is often the case in Python).
  • We assume the HTTP client is asynchronous, so it returns a Future, and we need to add an on-completion callback to be able to finish the current child Span.
  • If the HTTP client returns a future with an exception, we log the exception to the Span with the log method.
  • Because the HTTP client may throw an exception even before returning a Future, we use a try/except block to finish the Span in all circumstances, to ensure it is reported and to avoid leaking resources.
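
The completion callback described in the points above can be tried in isolation with the standard concurrent.futures.Future class. In this sketch DemoSpan, DemoResponse, and finish_span_when_done are hypothetical names, the `error` tag is used instead of a log call to keep the stand-in span small, and cancelled futures are not handled.

```python
from collections import namedtuple
from concurrent.futures import Future

DemoResponse = namedtuple('DemoResponse', 'status_code')


class DemoSpan:
    """Stand-in for an OpenTracing Span; records tags and completion."""

    def __init__(self):
        self.tags = {}
        self.finished = False

    def set_tag(self, key, value):
        self.tags[key] = value

    def finish(self):
        self.finished = True


def finish_span_when_done(span, future):
    """Finish `span` once `future` completes, whether it failed or not."""

    def on_done(fut):
        if fut.exception() is not None:
            span.set_tag('error', True)
        else:
            # result() is only read on the success path
            span.set_tag('http.status_code', fut.result().status_code)
        span.finish()

    future.add_done_callback(on_done)
    return future
```

The span stays open until the future resolves. Reading the result only on the success path matters because result() re-raises the stored exception on a failed future.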

Using Baggage / Distributed Context Propagation

The client and server examples above propagated the Span/Trace over the wire, including any Baggage. The client may use the Baggage to pass additional data to the server and any other downstream server it might call.

# client side
span.set_baggage_item('auth-token', '.....')

# server side (one or more levels down from the client)
token = span.get_baggage_item('auth-token')

Logging Events

Reposted from www.cnblogs.com/panpanwelcome/p/11694206.html