A Guide to Writing Observable Rust Applications

The observability of today's complex, dynamic systems is predicated on domain knowledge and, more importantly, on the unknown "unknowns" that arise from incomplete domain knowledge. In other words, it's the cases that fall between the cracks that surprise us, as the following quote illustrates:

In fact, it took us no more than an hour to figure out how to restore the network; several additional hours were required because it took us so long to gain control of the misbehaving IMPs and get them back to normal. A built-in software alarm system (assuming, of course, that it was not subject to false alarms) might have enabled us to restore the network more quickly, significantly reducing the duration of the outage. This is not to say that a better alarm and control system could ever be a replacement for careful study and design which attempts to properly distribute the utilization of important resources, but only that it is a necessary adjunct, to handle the cases that will inevitably fall between the cracks of even the most careful design. – (Rosen, RFC 789)

Essentially, observability is how we expose the behavior of a system (hopefully in some disciplined way) and understand that behavior.

In this article, we'll discuss the importance of observability and examine the basics of how to write observable Rust applications.

Why is observability important?

Due to the proliferation of microservice deployments and orchestration engines, our systems have become more complex, with large companies running thousands of microservices and even startups running hundreds.

The harsh reality of microservices is that they suddenly force every developer to become a cloud/distributed systems engineer, dealing with the inherent complexities of distributed systems. Specifically, partial failures, where the unavailability of one or more services may adversely affect the system in unknown ways. – (Meiklejohn et al., Service Level Fault Injection Testing)

In this age of complexity, implementing observability can go a long way toward building, troubleshooting, and benchmarking systems in the long run. Providing observability begins by collecting output data (telemetry and instrumentation) from our running systems, at an appropriate level of abstraction (typically organized around request paths) so that we can explore and dissect data patterns and find cross-correlations.

On paper, this sounds easy to achieve: we gather the three pillars (logs, metrics, and traces) and we're done. However, these three pillars are just bits in themselves; collecting the most useful bits, and analyzing the collection of bits together, is the complex part.

Forming the correct abstraction is the hard part. It can be very domain specific and relies on building a model of our system behavior that is open to change and ready for the unexpected. It involves developers having to become more involved in how to generate and diagnose events in their applications and systems.

Throwing log statements all over the place and collecting every possible metric loses long-term value and causes other problems. We need to expose and enrich meaningful outputs so that data correlation is possible.

This is a Rust article after all, and while Rust was built with safety, speed, and efficiency in mind, exposing system behavior was not one of its founding principles.

How can we make Rust applications more observable?

Starting from first principles, how do we instrument our code, collect meaningful traces, and derive data to help us explore unknown "unknowns"? If everything is event-driven, and we have traces that capture series of events/actions (including request/response cycles, database reads/writes, and/or cache misses), what's the trick to building an end-to-end observable Rust application from scratch, one that must communicate with the outside world? What are the building blocks?

Sadly, there isn't just one trick or silver bullet here, especially when writing Rust services, which leave a lot up to the developer. First, the only thing we can really rely on to understand and debug unknown "unknowns" is telemetry data, and we should make sure we surface meaningful, contextual telemetry data (e.g., correlatable fields like request_path, parent_span, trace_id, category, and subject). Second, we need a way to explore that output and correlate it across systems and services.

In this post, we'll focus primarily on gathering meaningful, contextual output data, but we'll also discuss how best to connect to platforms that provide further processing, analysis, and visualization. Fortunately, the core tooling is available to instrument Rust programs to collect structured event data and to process and emit traces for asynchronous and synchronous communication.

We'll focus on the most standard and flexible framework, Tracing, which is built around spans, events, and subscribers, and on how to take advantage of its composability and customizability.

However, even though we have an extensive framework like Tracing to help us write the foundation of observable services in Rust, meaningful telemetry doesn't come "out of the box" or "for free".

Getting the abstractions right in Rust is not as straightforward as it is in other languages. Instead, a robust application must be built on top of layered behaviors, all of which provide exemplary control for the informed developer, but can be cumbersome for the less experienced.

We'll decompose the problem space into a series of composable layers that act on four distinct units of behavior:

  • Store context information for future use

  • Augment structured logs with contextual information

  • Deriving metrics from instrumentation and span durations

  • OpenTelemetry interoperability for distributed tracing

Similar to how the original QuickCheck paper on property-based testing relied on users specifying properties and providing instances for user-defined types, building an end-to-end observable Rust service requires an understanding of how traces are generated, how data is specified and maintained, and what telemetry makes sense as your application grows. This is especially true when debugging and/or exploring inconsistencies, partial failures, and questionable performance characteristics.

Trace collection will drive everything in this article's examples, where spans and events will be the lenses through which we tie together a complete picture of known quantities. We'll have logs, but we'll treat them as structured events. We'll collect metrics, but they'll be automated via instrumentation and spans, and we'll export OpenTelemetry-compatible trace data to a distributed tracing platform like Jaeger.

Spans, events, and traces

Before getting into implementation details, let's start with some terms and concepts we need to be familiar with, such as spans, traces, and events.

Spans

A span represents an operation or unit of work that belongs to a trace and serves as the main building block of distributed tracing. For any given request, the initial span (one without a parent) is called the root span. It is typically represented as the end-to-end latency of an entire user request for a given distributed trace.

There can also be subsequent child spans, which can be nested under other different parent spans. The total execution time of a span includes the time spent in that span and the entire subtree represented by its children.
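As a minimal sketch (the span names and fields here are illustrative, not taken from the example application), creating a root span with a nested child span using the tracing macros might look like this:

use tracing::{info, info_span};

// A root span for an incoming request; entering it makes it the current span.
let root = info_span!("http.request", http.method = "POST", http.route = "/songs");
let _root_guard = root.enter();

// A child span nested under the root; its duration is included in the root's subtree.
let child = info_span!("db.query", db.system = "postgres");
{
    let _child_guard = child.enter();
    info!("executing query"); // an event recorded within the child span's context
}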

Here's an example of a parent span log for a new request, which has been intentionally condensed:

level=INFO span name="HTTP request" span=9008298766368774 parent_span=9008298766368773 span_event=new_span timestamp=2022-10-30T22:30:28.798564Z http.client_ip=127.0.0.1:61033 http.host=127.0.0.1:3030 http.method=POST http.route=/songs trace_id=b2b32ad7414392aedde4177572b3fea3

This span log contains important information and metadata such as the request path (http.route), timestamp (2022-10-30T22:30:28.798564Z), request method (http.method), and trace identifiers (span, parent_span, and trace_id). We'll use this information to demonstrate how to tie a trace together from start to finish.

Why is it called a span? Ben Sigelman, an author of Google's Dapper tracing infrastructure paper, considered these factors in A Brief History of "The Span": Hard to Love, Hard to Kill:

  • In the code itself, the API feels like a timer

  • When viewing a trace as a directed graph, the data structure looks like a node or vertex

  • In the context of structured, multi-process logging (side note: this is what distributed tracing is all about at the end of the day), one might think of spans as two events

  • Given a simple timing diagram, it's easy to call this concept a duration or window

Events

An event represents a single point in time at which something happened during the execution of some arbitrary program. In contrast to out-of-band, unstructured logging, we'll treat events as the core unit of ingestion, occurring within the context of a given span and structured with key-value fields (similar to the span log above). More precisely, these are called span events:

level=INFO msg="finished processing vendor request" subject=vendor.response category=http.response vendor.status=200 vendor.response_headers="{\"content-type\": \"application/json\", \"vary\": \"Accept-Encoding, User-Agent\", \"transfer-encoding\": \"chunked\"}" vendor.url=http://localhost:8080/.well-known/jwks.json vendor.request_path=/.well-known/jwks.json target="application::middleware::logging" location="src/middleware/logging.rs:354" timestamp=2022-10-31T02:45:30.683888Z

Our application can also have arbitrary structured log events that occur outside of the span context. For example, displaying configuration settings at startup or monitoring when caches are refreshed.
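As a small sketch (the span and field names here are illustrative), emitting a span event and an out-of-span event with the tracing macros might look like this:

use tracing::{info, info_span};

// An arbitrary structured event outside of any span context (e.g., at startup).
info!(config.port = 3030, "loaded configuration");

// A span event: recorded within the context of the surrounding span.
let span = info_span!("vendor.request", category = "vendor");
let _guard = span.enter();
info!(
    subject = "vendor.response",
    category = "http.response",
    vendor.status = 200,
    "finished processing vendor request"
);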

Traces

A trace is a collection of spans representing some workflow, such as a server request or a queue/stream processing step for an item. Essentially, a trace is a directed acyclic graph of spans, where edges connecting spans indicate a causal relationship between a span and its parent.

Here is an example trace visualized in the Jaeger UI:

If this application were part of a larger distributed trace, we would see it nested within a larger parent span.

Now, with these terms in hand, how do we start implementing the skeleton of an observability-ready Rust application?

Combining multiple tracing layers to build a subscriber

The Tracing framework is split up into different components (as crates). For our purposes, we'll focus on this set of Cargo.toml dependencies:

opentelemetry = { version = "0.17", features = ["rt-tokio", "trace"] }
opentelemetry-otlp = { version = "0.10", features = ["metrics", "tokio", "tonic", "tonic-build", "prost", "tls", "tls-roots"], default-features = false}
opentelemetry-semantic-conventions = "0.9"
tracing = "0.1"
tracing-appender = "0.2"
tracing-opentelemetry = "0.17"
tracing-subscriber = {version = "0.3", features = ["env-filter", "json", "registry"]}

The tracing_subscriber crate gives us the ability to compose tracing subscribers from smaller units of behavior (called layers) for collecting and enriching trace data.

The Subscriber itself is responsible for registering new spans when they are created (each with a span id), recording and attaching field values and follow-from annotations to spans, and filtering out spans and events.

When combined with a Subscriber, layers tap into hooks triggered throughout a span's lifecycle:

fn on_new_span(&self, _attrs: &Attributes<'_>, _id: &span::Id, _ctx: Context<'_, S>) {...} 
fn on_record(&self, _span: &Id, _values: &Record<'_>, _ctx: Context<'_, S>) { ... }
fn on_follows_from(&self, _span: &Id, _follows: &Id, _ctx: Context<'_, S>) { ... }
fn event_enabled(&self, _event: &Event<'_>, _ctx: Context<'_, S>) -> bool { ... }
fn on_event(&self, _event: &Event<'_>, _ctx: Context<'_, S>) { ... }
fn on_enter(&self, _id: &Id, _ctx: Context<'_, S>) { ... }
fn on_exit(&self, _id: &Id, _ctx: Context<'_, S>) { ... }
fn on_close(&self, _id: Id, _ctx: Context<'_, S>) { ... }

How are the layers organized in the code? Let's start with the setup_tracing method, where a Registry is created and composed with four layers using the with combinator:

fn setup_tracing(
    writer: tracing_appender::non_blocking::NonBlocking,
    settings_otel: &Otel,
) -> Result<()> {
    let tracer = init_tracer(settings_otel)?;
    let registry = tracing_subscriber::Registry::default()
        .with(StorageLayer.with_filter(LevelFilter::TRACE))
        .with(tracing_opentelemetry::layer()...                
        .with(LogFmtLayer::new(writer).with_target(true)...
        .with(MetricsLayer)...
        ); 
     ...

This setup_tracing function is usually called in main() in main.rs when the server is initialized. The storage layer itself provides zero output; instead, it acts as an information store for gathering contextual trace information to augment and extend the output of the other, downstream layers in the pipeline.

The with_filter method controls which spans and events are enabled for this layer, and LevelFilter::TRACE means we capture basically everything; it's the most verbose option.
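As a rough sketch of how this might be wired up (the Settings type and its fields are assumptions, not from the article's codebase):

fn main() -> anyhow::Result<()> {
    // Hypothetical settings loader providing the OTel exporter configuration.
    let settings = Settings::load()?;

    // A non-blocking writer for stdout; the guard must stay alive so
    // buffered logs are flushed on shutdown.
    let (writer, _guard) = tracing_appender::non_blocking(std::io::stdout());

    setup_tracing(writer, &settings.otel)?;

    // ... start the server
    Ok(())
}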

Let's examine each layer and see how it operates on the trace data it collects, hooking into the span lifecycle along the way. Customizing the behavior of each layer involves implementing the lifecycle hooks associated with the Layer trait, such as on_new_span below.

Along the way, we'll demonstrate how these units of behavior augment span and event log formatting, automatically derive some metrics, and send what we've gathered downstream to a distributed tracing platform like Jaeger, Honeycomb, or Datadog. We'll start with our StorageLayer, which provides contextual information from which other layers can benefit.

Store context information for future use

on_new_span

impl<S> Layer<S> for StorageLayer
where
    S: Subscriber + for<'span> LookupSpan<'span>,
{
    fn on_new_span(&self, attrs: &Attributes<'_>, id: &Id, ctx: Context<'_, S>) {
        let span = ctx.span(id).expect("Span not found");
        // We want to inherit the fields from the parent span, if there is one.
        let mut visitor = if let Some(parent_span) = span.parent() {
            let mut extensions = parent_span.extensions_mut();
            let mut inner = extensions
                .get_mut::<Storage>()
                .map(|v| v.to_owned())
                .unwrap_or_default();
​
            inner.values.insert(
                PARENT_SPAN, // "parent_span"
                Cow::from(parent_span.id().into_u64().to_string()),
            );
            inner
        } else {
            Storage::default()
        };
​
        let mut extensions = span.extensions_mut();
        attrs.record(&mut visitor);
        extensions.insert(visitor);
    }
...

When a new span is started (via on_new_span), for example, by a POST /songs request hitting the application's endpoint, our code checks whether we're already inside a parent span. Otherwise, it defaults to a newly created, empty Hashmap, which is what Storage::default() wraps under the hood.

For simplicity, we default to mapping keys to string references and values to copy-on-write (Cow) smart pointers around string references:

#[derive(Clone, Debug, Default)]
pub(crate) struct Storage<'a> {
    values: HashMap<&'a str, Cow<'a, str>>,
}

Storing data in a span's extensions persists fields across layers for the lifetime of the span, enabling us to mutably associate arbitrary data with a span or read immutably from persisted data (including our own data structures).


Many of these lifecycle hooks involve wrestling with extensions, which can be a bit verbose. The Registry is what actually collects and stores span data, which can then be accessed by any layer implementing LookupSpan.

Another piece of code to highlight is attrs.record(&mut visitor), which records field values of various types by visiting each one; this requires implementing the Visit trait:

// Just a sample of the implemented methods
impl Visit for Storage<'_> {
    /// Visit a signed 64-bit integer value.
    fn record_i64(&mut self, field: &Field, value: i64) {
        self.values
            .insert(field.name(), Cow::from(value.to_string()));
    }
    ... // elided for brevity
    fn record_debug(&mut self, field: &Field, value: &dyn fmt::Debug) {
        // Note: this is invoked via the `debug!` and `info!` macros
        let debug_formatted = format!("{:?}", value);
        self.values.insert(field.name(), Cow::from(debug_formatted));    
    }
...

Once we've recorded all the values for each type, the visitor stores them in the Hashmap, where they're available to downstream layers for future lifecycle triggers.

on_record

impl<S> Layer<S> for StorageLayer
where
    S: Subscriber + for<'span> LookupSpan<'span>,
{
... // elided for brevity
    fn on_record(&self, span: &Id, values: &Record<'_>, ctx: Context<'_, S>) {
        let span = ctx.span(span).expect("Span not found");
        let mut extensions = span.extensions_mut();
        let visitor = extensions
            .get_mut::<Storage>()
            .expect("Visitor not found on 'record'!");
        values.record(visitor);
    }
... // elided for brevity

As we proceed through each of the lifecycle triggers, we'll notice that the pattern is similar: we get a mutable handle to the storage in the span's extensions and record values as they arrive.

This hook notifies the layer that a span with the given identifier has recorded the given values, via calls like debug_span! or info_span!:

let span = info_span!(
    "vendor.cdbaby.task",
    subject = "vendor.cdbaby",
    category = "vendor"
);

on_event

impl<S> Layer<S> for StorageLayer
where
    S: Subscriber + for<'span> LookupSpan<'span>,
{
... // elided for brevity
    fn on_event(&self, event: &Event<'_>, ctx: Context<'_, S>) {
        ctx.lookup_current().map(|current_span| {
            let mut extensions = current_span.extensions_mut();
            extensions.get_mut::<Storage>().map(|visitor| {
                if event
                    .fields()
                    .any(|f| ON_EVENT_KEEP_FIELDS.contains(&f.name()))
                {
                    event.record(visitor);
                }
            })
        });
    }
... // elided for brevity

For our context storage layer, there usually isn't a need to hook into events (such as tracing::error! messages). However, this is valuable for storing information about event fields that we want to preserve, which may prove useful in another layer later in the pipeline.

One example is storing events attributed to errors so we can keep track of them in the metrics layer (ON_EVENT_KEEP_FIELDS is an array of fields tied to the error key).
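A sketch of what such a constant could look like (the name comes from the snippet above; the contents are an assumption):

// Only event fields listed here are copied into the span's storage.
const ON_EVENT_KEEP_FIELDS: [&str; 1] = ["error"];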

on_enter and on_close

impl<S> Layer<S> for StorageLayer
where
    S: Subscriber + for<'span> LookupSpan<'span>,
{
... // elided for brevity
    fn on_enter(&self, span: &Id, ctx: Context<'_, S>) {
        let span = ctx.span(span).expect("Span not found");
        let mut extensions = span.extensions_mut();
        if extensions.get_mut::<Instant>().is_none() {
            extensions.insert(Instant::now());
        }
    }

    fn on_close(&self, id: Id, ctx: Context<'_, S>) {
        let span = ctx.span(&id).expect("Span not found");
        let mut extensions = span.extensions_mut();
        let elapsed_milliseconds = extensions
            .get_mut::<Instant>()
            .map(|i| i.elapsed().as_millis())
            .unwrap_or(0);

        let visitor = extensions
            .get_mut::<Storage>()
            .expect("Visitor not found on 'record'");

        visitor.values.insert(
            LATENCY_FIELD, // "latency_ms"
            Cow::from(format!("{}", elapsed_milliseconds)),
        );
    }
... // elided for brevity

A span is essentially a marked interval of time, with a well-defined beginning and end. For a span's interval, we want to capture the elapsed time between entering a span with a given id (Instant::now) and when it is closed for a given operation.

Storing the latency of each span in its extensions enables other layers to automatically derive metrics, and it helps exploration when debugging the event logs for a given span id. Below, we can see the opening and closing of a vendor task/process span with id=452612587184455697, which took 18ms from start to finish:

level=INFO span_name=vendor.lastfm.task span=452612587184455697 parent_span=span=452612587184455696 span_event=new_span timestamp=2022-10-31T12:35:36.913335Z trace_id=c53cb20e4ab4fa42aa5836d26e974de2 http.client_ip=127.0.0.1:51029 subject=vendor.lastfm application.request_path=/songs http.method=POST category=vendor http.host=127.0.0.1:3030 http.route=/songs request_id=01GGQ0MJ94E24YYZ6FEXFPKVFP
level=INFO span_name=vendor.lastfm.task span=452612587184455697 parent_span=span=452612587184455696 span_event=close_span timestamp=2022-10-31T12:35:36.931975Z trace_id=c53cb20e4ab4fa42aa5836d26e974de2 latency_ms=18 http.client_ip=127.0.0.1:51029 subject=vendor.lastfm application.request_path=/songs http.method=POST category=vendor http.host=127.0.0.1:3030 http.route=/songs request_id=01GGQ0MJ94E24YYZ6FEXFPKVFP

Augment structured logs with contextual information

Now, we'll see how the stored data is leveraged for actual telemetry output by looking at the event log formatting layer:

.with(LogFmtLayer::new(writer).with_target(true)...

There are many examples of custom formatters to reference when writing custom layer and subscriber implementations:

  • A helpful online walkthrough showing how to build a custom JSON logger beyond the one provided by default in the tracing crate

  • Bunyan format

  • Embark Studios' logfmt

(Note that the log examples above use this same logfmt format, inspired by InfluxDB's implementation.)

We recommend using a published layer or library, or following the tutorials listed above to get into the details of generating the data in your preferred format.

For this article, we built our own custom formatter layer. We'll now revisit the span lifecycle, specifically span and event logs, this time taking advantage of our storage map.

on_new_span

impl<S, Wr, W> Layer<S> for LogFmtLayer<Wr, W>
where
    Wr: Write + 'static,
    W: for<'writer> MakeWriter<'writer> + 'static,
    S: Subscriber + for<'span> LookupSpan<'span>,
{
    fn on_new_span(&self, _attrs: &Attributes<'_>, id: &Id, ctx: Context<'_, S>) {
        let mut p = self.printer.write();
        let metadata = ctx.metadata(id).expect("Span missing metadata");
        p.write_level(metadata.level());
        p.write_span_name(metadata.name());
        p.write_span_id(id);
        p.write_span_event("new_span");
        p.write_timestamp();

        let span = ctx.span(id).expect("Span not found");
        let extensions = span.extensions();
        if let Some(visitor) = extensions.get::<Storage>() {
            for (key, value) in visitor.values() {               
                p.write_kv(
                    decorate_field_name(translate_field_name(key)),
                    value.to_string(),
                )
            }
        }
        p.write_newline();
    }
... // elided for brevity

The code above prints a formatted text representation of the span event using the MakeWriter trait. The calls to decorate_field_name and the printer's write methods execute specific formatting behaviors behind the scenes (in this case, logfmt again).

Going back to our earlier span log example, keys like level, span, and span_name are now more clearly set. The piece of code to call out here is the for (key, value) loop over the values read from the storage map, promoting information that we observed and collected in the previous layer.

We use this to provide context for augmenting structured log events in another layer. In other words, we're composing specific sub-behaviors on trace data through layers in order to build up a single subscriber for the whole trace. Field keys like http.route and http.host, for example, are lifted from this storage layer.
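The helpers used above aren't shown in the article's snippets; as a rough sketch (these implementations are assumptions), they might look something like this:

// Map internal field names onto the keys we want in the logfmt output.
fn translate_field_name(name: &str) -> &str {
    match name {
        "message" => "msg",
        other => other,
    }
}

// Apply whatever formatting the output requires (quoting, prefixing, etc.);
// logfmt keys are written bare, so this is close to a pass-through.
fn decorate_field_name(name: &str) -> String {
    name.to_string()
}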

on_event

impl<S, Wr, W> Layer<S> for LogFmtLayer<Wr, W>
where
    Wr: Write + 'static,
    W: for<'writer> MakeWriter<'writer> + 'static,
    S: Subscriber + for<'span> LookupSpan<'span>,
{
... // elided for brevity
    fn on_event(&self, event: &Event<'_>, ctx: Context<'_, S>) {
        let mut p = self.printer.write();
        p.write_level(event.metadata().level());
        event.record(&mut *p);
        //record source information
        p.write_source_info(event);
        p.write_timestamp();

        ctx.lookup_current().map(|current_span| {
            p.write_span_id(&current_span.id());
            let extensions = current_span.extensions();
            extensions.get::<Storage>().map(|visitor| {
                for (key, value) in visitor.values() {
                    if !ON_EVENT_SKIP_FIELDS.contains(key) {
                        p.write_kv(
                            decorate_field_name(translate_field_name(key)),
                            value.to_string(),
                        )
                    }
                }
            })
        });
        p.write_newline();
    }
... // elided for brevity

While somewhat tedious, the pattern for implementing these span lifecycle methods should be getting easier to follow. Field key-value pairs such as target and location come from the event's source information, giving us the target="application::middleware::logging" and location="src/middleware/logging.rs:354" we saw earlier. Keys like vendor.request_path and vendor.url are likewise pulled from the context store.

While it may take a bit more work to implement a formatting specification correctly, we can now see the fine-grained control and customization that the Tracing framework provides. This contextual information is how we're ultimately able to form correlations within a request's lifecycle.

Deriving metrics from instrumentation and span durations

Metrics, in and of themselves, are actually pretty poor for observability: metric cardinality, the number of unique combinations of metric names and dimension values, can easily be abused.

We've already shown how to derive structured logs from events. Metrics themselves should be derived from the events or spans that contain them.

We'll still need out-of-band metrics, such as those collected around a process (e.g., CPU usage, disk bytes written/read). But if we can already instrument code at the function level to determine when things happen, shouldn't some metrics just "fall out for free"? As mentioned, we have the tooling; we just need to bring it together.

Tracing provides an accessible #[instrument] attribute for annotating functions the user wants to instrument, which means creating, entering, and closing a span every time the annotated function executes. The Rust compiler itself makes heavy use of this annotated instrumentation throughout its codebase:

#[instrument(skip(self, op), level = "trace")]
pub(super) fn fully_perform_op<R: fmt::Debug, Op>(
    &mut self,
    locations: Locations,
    category: ConstraintCategory<'tcx>,
    op: Op) -> Fallible<R>

For our purposes, let's look at a simple asynchronous database function, save_event, instrumented with some very specific field definitions:

#[instrument(
    level = "info",
    name = "record.save_event",
    skip_all,
    fields(category="db", subject="aws_db", event_id = %event.event_id,
           event_type=%event.event_type, otel.kind="client", db.system="aws_db",
           metric_name="db_event", metric_label_event_table=%self.event_table_name,
           metric_label_event_type=%event.event_type),
        err(Display)
)]
pub async fn save_event(&self, event: &Event) -> anyhow::Result<()> {
    self.db_client
        .put_item()
        .table_name(&self.event_table_name)
        .set(Some(event))
        .send()
        .await...
}

Our instrumented function has metric-prefixed fields such as metric_name, metric_label_event_table, and metric_label_event_type. These keys correspond to the metric names and labels typically found in a Prometheus monitoring setup. We'll come back to these prefixed fields shortly. First, let's extend the MetricsLayer we originally set up with some additional filters.

Essentially, these filters do two things: 1) enable metric handling for all span/events at the trace log level or above (even though they may not be logged to stdout, depending on the configured log level); and 2) pass along spans from functions instrumented with the record. prefix, such as name = "record.save_event" above.
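A rough sketch of what those filters could look like when composing the registry (this is an assumption about the setup, not the article's exact code):

use tracing_subscriber::filter::{filter_fn, LevelFilter};
use tracing_subscriber::Layer;

let metrics_layer = MetricsLayer
    // 1) enable everything for this layer, regardless of the stdout log level
    .with_filter(LevelFilter::TRACE)
    // 2) only act on spans created by functions instrumented with the
    //    "record." prefix, e.g. name = "record.save_event"
    .with_filter(filter_fn(|metadata| {
        metadata.is_span() && metadata.name().starts_with("record")
    }));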

After that, all that remains is to return to our metrics layer implementation in order to perform the automatic metric derivation.

on_close

const PREFIX_LABEL: &str = "metric_label_";
const METRIC_NAME: &str = "metric_name";
const OK: &str = "ok";
const ERROR: &str = "error";
const LABEL: &str = "label";
const RESULT_LABEL: &str = "result";

impl<S> Layer<S> for MetricsLayer
where
    S: Subscriber + for<'span> LookupSpan<'span>,
{
    fn on_close(&self, id: Id, ctx: Context<'_, S>) {
        let span = ctx.span(&id).expect("Span not found");
        let mut extensions = span.extensions_mut();
        let elapsed_secs_f64 = extensions
            .get_mut::<Instant>()
            .map(|i| i.elapsed().as_secs_f64())
            .unwrap_or(0.0);
        if let Some(visitor) = extensions.get_mut::<Storage>() {
            let mut labels = vec![];
            for (key, value) in visitor.values() {
                if key.starts_with(PREFIX_LABEL) {
                    labels.push((
                        key.strip_prefix(PREFIX_LABEL).unwrap_or(LABEL),
                        value.to_string(),
                    ))
                }
            }
            ... // elided for brevity
            let name = visitor
                .values()
                .get(METRIC_NAME)
                .unwrap_or(&Cow::from(span_name))
                .to_string();
            if visitor.values().contains_key(ERROR) {
                labels.push((RESULT_LABEL, String::from(ERROR)))
            } else {
                labels.push((RESULT_LABEL, String::from(OK)))
            }
            ... // elided for brevity
            metrics::increment_counter!(format!("{}_total", name), &labels);
            metrics::histogram!(
                format!("{}_duration_seconds", name),
                elapsed_secs_f64,
                &labels
            );
            ... // elided for brevity

There are a lot of bits being pushed around in this example, some of which are elided. Nevertheless, in on_close we always have access to the end of a span's interval via elapsed_secs_f64, which drives our histogram calculation through the metrics::histogram! macro.

Note that we leverage the metrics-rs project here. Anyone could model this function the same way using another metrics library that provides counter and histogram support. From the storage map, we pull out all the metric_*-labeled keys and use them to generate labels for the automatically derived incremented counters and histograms.

Also, if we've stored an event that errored out, we can use that as part of the labels, distinguishing our derived metrics by ok/error results. Given any instrumented function, we can derive metrics from it with this same code behavior.

The output we'd see scraped from a Prometheus endpoint would show a counter that looks like this:

db_event_total{event_table="events",event_type="Song",result="ok",span_name="save_event"} 8
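For completeness, here's a minimal sketch of how such an endpoint could be exposed using the metrics-exporter-prometheus crate alongside metrics-rs (this wiring is an assumption, not part of the article's code):

use metrics_exporter_prometheus::PrometheusBuilder;

// Installs a global metrics recorder and serves a scrape endpoint
// that tools like Prometheus can poll.
PrometheusBuilder::new()
    .install()
    .expect("failed to install Prometheus metrics exporter");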

Instrumenting async closures and indirect span relationships

A question that comes up from time to time is how to instrument code that references spans with indirect, non-parent-child relationships, also called follows from references.

This applies to asynchronous operations that spawn requests to downstream services for side effects, or to processes that send data to a service bus, where the direct response or returned output has no bearing on the operation that spawned it.

For these cases, we can instrument asynchronous closures (or futures) directly by entering a given span (captured below with a follows_from reference) associated with our async future every time it is polled, and exiting it whenever the future is suspended, via .instrument(process_span):

// Start a span around the context process spawn
let process_span = debug_span!(
    parent: None,
    "process.async",
    subject = "songs.async",
    category = "songs"
);
process_span.follows_from(Span::current());

tokio::spawn(
    async move {
        match context.process().await {
            Ok(r) => debug!(song=?r, "successfully processed"),
            Err(e) => warn!(error=?e, "failed processing"),
        }
    }
    .instrument(process_span),
);

OpenTelemetry interoperability for distributed tracing

Much of observability's usefulness comes from the fact that most services today are actually made up of many microservices. We should all be thinking distributed.

If all kinds of services must interconnect across networks, vendors, clouds, and even edge-oriented or local-first peers, some standard and vendor-agnostic tooling should be employed. This is where OpenTelemetry (OTel) comes into play, and many well-known observability platforms are more than happy to ingest OTel-compliant telemetry data.

While there is a whole suite of open-source Rust tools that work within the OTel ecosystem, many well-known Rust web frameworks have yet to incorporate the OTel standard in a built-in way.

Popular frameworks like Actix and the Tokio-based axum rely on custom implementations and external libraries to provide integration (actix-web-opentelemetry and axum-tracing-opentelemetry, respectively). Third-party integration has been the most popular option by far, and while that promotes flexibility and user control, it can make things more difficult for those looking to add the integration almost seamlessly.

We won't go into the details of writing a custom implementation here, but canonical Tower-like HTTP middleware allows overriding the default implementation of span creation on a request (a sketch of what that can look like follows the list below). If implemented according to the spec, the following fields should be set on the span's metadata:

  • http.client_ip: IP address of the client

  • http.flavor: The protocol version used (HTTP/1.1, HTTP/2.0, etc.)

  • http.host: the value of the Host header

  • http.method: request method

  • http.route: the matching route

  • http.request_content_length: request content length

  • http.response_content_length: Response content length

  • http.scheme: the URI scheme used (HTTP or HTTPS)

  • http.status_code: response status code

  • http.target: the complete request target including path and query parameters

  • http.user_agent: the value of the User-Agent header

  • otel.kind: typically server; find more information here

  • otel.name: a name composed of http.method and http.route

  • otel.status_code: OK if the response is successful; ERROR if it is a 5xx

  • trace_id: an identifier for a trace, used across processes to group together all spans of a particular trace
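As a rough sketch of overriding span creation (assuming axum with tower-http's trace feature; the field values shown are illustrative and cover only part of the list above):

use axum::{body::Body, http::Request};
use tower_http::trace::TraceLayer;
use tracing::info_span;

let trace_layer = TraceLayer::new_for_http().make_span_with(|request: &Request<Body>| {
    info_span!(
        "HTTP request",
        http.method = %request.method(),
        // the raw path; a real implementation would use the matched route
        http.route = %request.uri().path(),
        http.flavor = ?request.version(),
        otel.kind = "server",
        // usually filled in later, e.g. extracted from incoming headers by a propagator
        trace_id = tracing::field::Empty,
    )
});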

Initializing the tracer

Tracing, via tracing-opentelemetry and rust-opentelemetry, exposes another layer we can use to compose our subscriber in order to add OTel contextual information to all spans, and to connect and emit those spans to an observability platform like Datadog or Honeycomb, or directly to a running Jaeger or Tempo instance, which can sample trace data for manageable consumption.

Initializing a Tracer to generate and manage spans is fairly straightforward:

pub fn init_tracer(settings: &Otel) -> Result<Tracer> {
    global::set_text_map_propagator(TraceContextPropagator::new());
​
    let resource = Resource::new(vec![
        otel_semcov::resource::SERVICE_NAME.string(PKG_NAME),
        otel_semcov::resource::SERVICE_VERSION.string(VERSION),
        otel_semcov::resource::TELEMETRY_SDK_LANGUAGE.string(LANG),
    ]);
​
    let api_token = MetadataValue::from_str(&settings.api_token)?;
    let endpoint = &settings.exporter_otlp_endpoint;
​
    let mut map = MetadataMap::with_capacity(1);
    map.insert("x-tracing-service-header", api_token);
​
    let trace = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(exporter(map, endpoint)?)
        .with_trace_config(sdk::trace::config().with_resource(resource))
        .install_batch(runtime::Tokio)
        .map_err(|e| anyhow!("failed to initialize tracer: {:#?}", e))?;
​
    Ok(trace)
}

Including it in our layer pipeline is also trivial. We can filter by level and use a dynamic filter to skip events we'd like to avoid in our traces:

.with(
    tracing_opentelemetry::layer()
        .with_tracer(tracer)
        .with_filter(LevelFilter::DEBUG)
        .with_filter(dynamic_filter_fn(|_metadata, ctx| {
            !ctx.lookup_current()
                // Exclude the rustls session "Connection" events
                // which don't have a parent span
                .map(|s| s.parent().is_none() && s.name() == "Connection")
                .unwrap_or_default()
        })),
)

With this pipeline initialized, all of our application traces can be picked up by a tool like Jaeger, as shown earlier in this article. Then, all that's left is data correlation, slicing, and dicing.

Conclusion

By composing these tracing layers, we can expose information about a system's behavior in a granular, fine-grained manner, while producing enough output and enough context to start making sense of that behavior. All this customization still comes at a price: it's not fully automatic, but the patterns are idiomatic, and there are plenty of open-source layers available for common use cases.

Above all, this post should help make it easier to experiment with collecting custom application interactions as traces, and demonstrate how far we can go in preparing our applications for the inevitable. This is just the beginning of our beautiful friendship with events and the context in which they occur, and hence with observability. How we debug and resolve issues over the long run will always be ongoing work.
