Dapper - distributed tracing system

Background

Contemporary Internet services are usually implemented as complex, large-scale distributed clusters. Internet applications are built from many different software modules, which may be developed by different teams, implemented in different programming languages, and deployed on thousands of servers across multiple data centers.

Questions:

  1. The overall request takes a long time. Which service is specifically slow?
  2. An error occurred during the request process. Which service specifically reported the error?
  3. What is the request volume of a certain service and what is the interface success rate?

For example, a front-end service may fan a web query out to hundreds of query servers, and the query may be sent on to multiple subsystems that process advertisements, perform spell checking, or search for special result types such as images, videos, or news. The results of each subsystem are filtered and merged into the final result, which is assembled on the page. We call this search model "universal search".

In total, such a universal search may involve thousands of servers and many different services. Users are highly sensitive to search latency, and inefficiency in any subsystem affects the final search time. An engineer who only knows that the query is abnormally slow has no way of knowing which service call is at fault, or why that call performs poorly.

First, the engineer may not be able to pinpoint exactly which services this universal search called, because new services, or even individual pieces of a service, may go online or be modified at any time, whether for user-facing features or for improvements such as performance or security authentication.

Second, the engineer cannot be expected to know all of the services participating in the search; each service may be developed or maintained by a different team.

Third, these services and servers may be in use by other clients at the same time, so the performance problem of this universal search may even be caused by another application.

When a system is decoupled into a large number of distributed microservices, monitoring, alerting, and fault localization all become difficult. Hence the need for tools that help us understand system behavior and analyze performance problems.

Thinking: How to troubleshoot system problems without any additional tools

When our system fails, we need to log in to each server and use Linux script tools such as grep / sed / awk to find the cause of the failure in the logs.

In the absence of a logging system, you first need to locate the server that handles the request. If this server has multiple instances deployed, you need to go to the log directory of each application instance to find the log file.

Each application instance will also set a log rolling strategy (such as generating a file every day), as well as a log compression and archiving strategy. After the data expires, the log will disappear permanently.

To sum up, we need a centralized log collection and retrieval system. That log system is ELK.

ELK / EFK log management system

ELK, which is the combination of Elasticsearch, Logstash, and Kibana, is an open source distributed log management solution.

  • Elasticsearch: responsible for log storage, retrieval, and analysis
  • Logstash: responsible for log collection and processing
  • Kibana: responsible for log visualization

In short, ELK can help us collect and store the logs of various services in real time, and provide us with a visual log retrieval page.

Kubernetes officially provides the EFK log collection solution. Logstash runs on the JVM, and its fatal problem is performance and resource consumption (the default heap size is 1 GB). Although its performance has improved significantly in recent years, it is still much slower than its replacements: merely starting Logstash with no logs to process consumes around 500 MB of memory. When a log collection component runs in every Pod, Logstash wastes system resources, whereas Fluentd is quite lightweight by comparison.

Shortcomings

The traditional ELK solution requires developers to print as many logs as possible while writing code, then collect and filter the log data related to the business logic from ES via key fields, and finally piece together the scene of the business execution.

There are the following pain points:

  • Log collection is cumbersome: although ES provides log retrieval, log data often lacks structured fields, making it hard to collect all relevant logs quickly and completely.
  • Log filtering is difficult: different business scenarios and business logic overlap, and logs printed by the overlapping logic interfere with one another, making it hard to filter out the correct associated logs.
  • Log analysis is time-consuming: the collected logs are just discrete pieces of data; one can only read the code and, following the logic, manually string the logs together to restore the execution scene as far as possible.

To sum up, as the complexity of business logic and systems increases, the traditional ELK solution becomes increasingly time-consuming and labor-intensive in terms of log collection, log filtering, and log analysis.

Thinking: Is there a better way to troubleshoot with discrete logs?

  1. Log concatenation: traceid
  2. Log clustering: Clustering Intrusion Detection Alarms to Support Root Cause Analysis [KLAUS JULISCH, 2002]

Klaus Julisch. Clustering intrusion detection alarms to support root cause analysis. ACM Transactions on Information and System Security, 6(4), 443–471. doi:10.1145/950191.950192

Dapper - distributed session tracking system

1. Overview

Almost all distributed session tracking system frameworks on the market are implemented based on the Google Dapper paper, and are generally similar.

Core idea: during the processing of a user's request, no matter how many subsystems the request is distributed to, or how many further subsystems those subsystems call, we track and record the information of each system and the calling relationships between systems, and finally collect the data and display it visually.

Design goals

  1. Low overhead: the impact of the tracing system on online services must be small enough. In some highly performance-sensitive services, even a small loss is easily noticed and may force the online service's deployment team to turn the tracing system off.
  2. Application-level transparency: application programmers should not need to know that the tracing system exists. A tracing system that depends on the active cooperation of application developers in order to work becomes too fragile and is frequently broken by bugs or omissions in the instrumentation code embedded in applications, which is exactly why it could not meet the requirement of "ubiquitous deployment". This matters especially in a fast-paced development environment.
  3. Scalability: it must be able to keep up with the growth of Google's services and clusters for at least the next few years.

2. Dapper’s distributed tracing

Figure 1: This path is initiated by the user's request X and traverses a simple service system. Nodes identified by letters represent different processes in the distributed system.

To associate all log entries with a given initiator and record all of the information, there are two kinds of solutions: black-box schemes and annotation-based monitoring schemes.

Black-box schemes assume there is no additional information to propagate beyond the log records themselves, and use statistical regression techniques to infer the relationships between them.

Annotation-based solutions rely on the application or middleware to explicitly tag a global ID to connect each record to the originator of the request.

Although black-box schemes are more portable than annotation schemes, they require more data to obtain sufficient accuracy because they rely on statistical inference.

The main disadvantage of annotation-based solutions is the need for code instrumentation. In Google's production environment, because all applications use the same threading model, control flow, and RPC system, the instrumentation can be restricted to a small set of common component libraries, making the monitoring system effectively transparent to application developers.

Dapper's tracing architecture looks like a tree structure embedded in RPC calls, but the core data model is not limited to any particular RPC framework; it can also trace other behaviors, such as SMTP sessions, HTTP requests, and external queries to SQL servers.

The Dapper tracing model uses trace, span, and annotation.

2.1 Trace and span

The meaning of a trace is fairly intuitive: it is a link (call chain), the path of one request through all the services it touches, which can be represented by a tree-like diagram such as the one below.

https://bigbully.github.io/Dapper-translation/images/img2.png

Figure 2: The causal and temporal relationships between five spans in a Dapper trace tree

Dapper records a name for each span, as well as each span's ID and parent ID, so that the relationships between the spans of a trace can be reconstructed. A span without a parent ID is called a root span. All spans of a trace hang off that trace and share its traceid. All of these IDs are globally unique 64-bit integers (e.g. generated with a snowflake algorithm). In a typical Dapper trace, each RPC corresponds to one span.

A span represents one call across services. The difference between its start time and end time is the time spent in the call that the span represents.

To put it simply, the trace strings the whole call chain together, and the spans express the calling relationships within that chain.

https://bigbully.github.io/Dapper-translation/images/img3.png

Figure 3: Detail of an individual span shown in Figure 2

If application developers choose to add their own annotations (for example, business data) to the trace, this information will be logged along with other span information.
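To make the data model above concrete, here is a minimal sketch of a span record in Go; the struct and field names are illustrative and do not correspond to Dapper's actual wire format.

package trace

import "time"

// Span is an illustrative span record; field names are mine, not Dapper's format.
type Span struct {
    TraceID      uint64    // shared by every span in the same trace
    SpanID       uint64    // globally unique 64-bit ID of this span
    ParentSpanID uint64    // zero for the root span
    Name         string    // human-readable span name, e.g. the RPC method
    StartTime    time.Time
    EndTime      time.Time
    Annotations  []string  // optional application-supplied annotations
}

// Duration is the time spent in the call represented by this span.
func (s Span) Duration() time.Duration {
    return s.EndTime.Sub(s.StartTime)
}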

2.2 Instrumentation points

Dapper relies on a small number of common component libraries and can therefore be instrumented at almost zero intrusive cost, as follows:

  • When a thread handles a traced control path, Dapper stores the trace context in thread-local storage. The trace context is a small, easily copied container holding the trace ID and span ID.
  • When computation is deferred or asynchronous, most Google developers use a common control-flow library to schedule callbacks through a thread pool or other executor. Dapper ensures that all such callbacks store their trace context, and that when a callback is invoked, its trace context is associated with the appropriate thread. In this way Dapper can use trace IDs and span IDs to reconstruct the path of asynchronous calls.
  • Almost all of Google's inter-process communication is built on an RPC framework with C++ and Java implementations. Tracing is built into that framework, which defines spans around all RPCs: the span ID and trace ID are sent from the client to the server. RPC-based systems like this are widely used at Google, making the framework an important instrumentation point. As non-RPC communication frameworks mature and find their own user bases, they are planned to be instrumented as well. (A Go analogue of this per-thread context propagation is sketched below.)
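Go has no ThreadLocal; the closest analogue to the mechanism described above is passing a context.Context along the call path. The sketch below is only an illustration under that assumption (type and function names are mine), not Dapper's implementation:

package trace

import "context"

// SpanContext is the small, easily copied container described above,
// holding just the trace ID and span ID.
type SpanContext struct {
    TraceID uint64
    SpanID  uint64
}

type ctxKey struct{}

// WithSpanContext stores the trace context in ctx, playing the role that
// thread-local storage plays in Dapper's C++/Java libraries.
func WithSpanContext(ctx context.Context, sc SpanContext) context.Context {
    return context.WithValue(ctx, ctxKey{}, sc)
}

// SpanContextFrom retrieves the trace context, if any, from ctx.
func SpanContextFrom(ctx context.Context) (SpanContext, bool) {
    sc, ok := ctx.Value(ctxKey{}).(SpanContext)
    return sc, ok
}

// An asynchronous callback keeps its caller's trace context simply by
// capturing ctx:
//
//     go func() { handleCallback(ctx) }()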

2.3 Sampling

Thinking: with so many requests, would collecting all of them cost too much?

Being low-impact was a key design goal for Dapper: if a tool is of unproven value but degrades performance, it is easy to understand why service operators would be reluctant to deploy it.

Certain kinds of web services are indeed very sensitive to the latency added by instrumentation. Therefore, besides keeping the overhead of Dapper's basic collection components as small as possible, the overhead must be limited further by recording only a small fraction of the very large number of requests.

Dapper's overhead in any given process is proportional to the number of traces that process samples per unit of time. The first production version of Dapper used a uniform sampling rate of 1/1024 across all processes at Google. This simple scheme works very well for high-throughput online services.

New Dapper users often worry that low sampling rates (often as low as 0.01% for high-throughput services) will hurt their analysis. For high-throughput services, however, if a notable event occurs once, it occurs thousands of times. Low-throughput services (perhaps dozens of requests per second rather than hundreds of thousands) can afford to trace every request; this is what motivates the use of adaptive sampling rates.

However, lower sampling rates and lower transmission loads may cause important events to be missed, while higher sampling rates require accepting a corresponding performance loss. The solution for such systems is to override the default sampling rate, but that requires manual intervention, which is exactly what Dapper tries to avoid. When deploying variable sampling, the sampling rate is therefore parameterized not as a uniform ratio but as an expected number of sampled traces per unit time. Under low traffic and low load the sampling rate is automatically raised, while under high traffic and high load it is lowered so that the overhead stays under control. The actual sampling rate used is recorded along with the trace itself, which allows accurate analysis of Dapper's trace data.
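As an illustration of the difference between the two strategies, here is a minimal sketch in Go of a uniform 1/1024-style sampler and a sampler parameterized by an expected number of traces per second; both types and their logic are simplified assumptions, not Dapper's code:

package trace

import (
    "math/rand"
    "sync"
    "time"
)

// UniformSampler keeps roughly one trace out of every Rate requests
// (Rate must be >= 1), e.g. Rate = 1024 as in Dapper's first production version.
type UniformSampler struct{ Rate int64 }

func (s UniformSampler) Sample() bool {
    return rand.Int63n(s.Rate) == 0
}

// AdaptiveSampler targets an expected number of sampled traces per second
// instead of a fixed ratio, so lightly loaded services sample a larger
// fraction of their traffic and heavily loaded services a smaller one.
type AdaptiveSampler struct {
    TargetPerSecond int64

    mu      sync.Mutex
    window  int64 // unix second of the current window
    sampled int64 // traces already sampled in this window
}

func (s *AdaptiveSampler) Sample() bool {
    s.mu.Lock()
    defer s.mu.Unlock()
    now := time.Now().Unix()
    if now != s.window {
        s.window, s.sampled = now, 0 // new one-second window
    }
    if s.sampled >= s.TargetPerSecond {
        return false
    }
    s.sampled++
    return true
}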

https://bigbully.github.io/Dapper-translation/images/table2.png

2.5 Trace collection

https://bigbully.github.io/Dapper-translation/images/img5.png

Figure 4: Overview of Dapper’s collection pipeline

Dapper's trace recording and collection pipeline is a three-stage process.

First, span data is written to (1) the local log file.

Dapper's daemon and collection components then pull this data from the production host (2).

Finally, the data is written to (3) Dapper's Bigtable repository. A trace is stored as a single Bigtable row, and each column corresponds to a span.

Dapper uses Bigtable as its data warehouse; other commonly used trace stores include Elasticsearch, HBase, and in-memory databases.
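The row-per-trace, column-per-span layout can be pictured with a nested map; the sketch below only illustrates the schema and is not Bigtable client code:

package collect

// EncodedSpan is a span serialized by the instrumented process and pulled
// from its local log file by the collection daemon.
type EncodedSpan []byte

// traceRows pictures the layout: one row per trace (keyed by trace ID),
// one column per span.
type traceRows map[uint64]map[uint64]EncodedSpan // traceID -> spanID -> span

func (r traceRows) add(traceID, spanID uint64, sp EncodedSpan) {
    row, ok := r[traceID]
    if !ok {
        row = make(map[uint64]EncodedSpan)
        r[traceID] = row
    }
    row[spanID] = sp
}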

3. Implementation ideas for intrusive session tracking in Go

The basic idea of call-chain interception in the OpenTelemetry Go SDK is AOP-style: using the decorator pattern, the core interfaces or components of the target package (such as net/http) are wrapped and replaced, adding span-related logic before and after the core call.

3.1 HttpServer Handler Span generation process

The core of an HTTP server is the http.Handler interface, so span generation and propagation can be handled by implementing an interceptor around the http.Handler interface.

// The http.Handler interface defined in package net/http:
package http

type Handler interface {
    // ServeHTTP handles a single HTTP request.
    ServeHTTP(ResponseWriter, *Request)
}

// Original, un-instrumented server:
//   http.ListenAndServe(":8090", http.DefaultServeMux)

// Instrumented server: wrap the handler with otelhttp.
import (
  "net/http"

  "go.opentelemetry.io/otel"
  "go.opentelemetry.io/otel/sdk/trace"
  "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

wrappedHttpHandler := otelhttp.NewHandler(http.DefaultServeMux, ...) // second argument: the operation name
http.ListenAndServe(":8090", wrappedHttpHandler)

// Pseudocode of what the wrapping handler's ServeHTTP does:
func (h *wrappedHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // extract TraceId/SpanId from the incoming headers into a SpanContext
    ctx := tracer.Extract(r.Context(), r.Header)
    // start a span for this request and record its start time
    ctx, span := tracer.Start(ctx, genOperation(r))
    // attach the new SpanContext to the request passed to the wrapped handler
    r = r.WithContext(ctx)
    // ... original ServeHTTP ...
    // record the end time and hand the span to the exporter
    span.End()
}

WrappedHttpHandler mainly implements the following logic (pseudocode):

  1. ctx := tracer.Extract(r.ctx, r.Header): extract the TraceId and SpanId from the request headers, construct a SpanContext object, and store it in ctx;
  2. ctx, span := tracer.Start(ctx, genOperation(r)): generate the Span that tracks the processing of the current request (the Span1 mentioned above) and record its start time. The SpanContext is read from ctx: SpanContext.TraceId becomes the TraceId of the current Span and SpanContext.SpanId becomes the current Span's ParentSpanId; the new Span is then written back into the returned ctx as a new SpanContext;
  3. r.WithContext(ctx): attach the newly generated SpanContext to the context of request r, so that the wrapped handler can read Span1's SpanId from r.Context() during processing and use it as its ParentSpanId, establishing the parent-child relationship between spans;
  4. span.End(): record the completion time, then hand the span to the exporter to report it to the server.

3.2 HttpClient request Span generation process

The key operation when an HTTP client sends a request is the http.RoundTripper interface; wrapping the RoundTripper originally used by the client allows spans to be generated.

// The http.RoundTripper interface defined in package net/http:
package http

type RoundTripper interface {
    // RoundTrip executes a single HTTP transaction.
    RoundTrip(*Request) (*Response, error)
}

// Instrumented client: wrap the default transport with otelhttp.
import (
  "net/http"

  "go.opentelemetry.io/otel"
  "go.opentelemetry.io/otel/sdk/trace"
  "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

wrappedTransport := otelhttp.NewTransport(http.DefaultTransport)
client := http.Client{Transport: wrappedTransport}

// Pseudocode of what the wrapping transport's RoundTrip does:
func (t *wrappedTransport) RoundTrip(r *http.Request) (*http.Response, error) {
    // start a span (Span2) for the outgoing request, as a child of the span in r.Context()
    ctx, span := tracer.Start(r.Context(), url)
    // write TraceId/SpanId from the SpanContext into the outgoing headers
    tracer.Inject(ctx, r.Header)
    // ... original RoundTrip ...
    // record the end time and hand the span to the exporter
    span.End()
}

As shown above, wrappedTransport mainly completes the following tasks (pseudocode):

  1. req, _ := http.NewRequestWithContext(r.ctx, "GET", url, nil): here the ctx of the http.Handler request from the previous step is passed on to the request that the httpclient is about to send, so that Span1's information can later be extracted from request.Context() to establish the association between spans;
  2. ctx, span := tracer.Start(r.Context(), url): after client.Do() is called, WrappedTransport.RoundTrip() is entered first, where a new Span (Span2) is generated to record the time spent on the httpclient request. As before, the Start method reads the SpanContext from r.Context() and uses its SpanId as the ParentSpanId of the current Span (Span2), establishing the nested relationship between the spans; at the same time, the SpanContext saved in the returned ctx carries the information of the newly generated Span (Span2);
  3. tracer.Inject(ctx, r.Header): this step writes the TraceId and SpanId of the current SpanContext into r.Header so that they travel with the HTTP request to serverB, where they are associated with the current Span;
  4. span.End(): after the httpclient request has been sent to serverB and the response received, mark the end of the current Span, set its EndTime, and submit it to the exporter to report to the server.

4. Transparent transmission of parameters

Data propagation falls into two categories by scenario: in-process propagation and cross-process propagation (cross-process tracing).

In-process propagation means the trace is passed within a single service to monitor the calls inside that service, which is fairly simple. The hardest part of a tracing system is keeping tracing working in a distributed application environment. Any tracing system needs to understand the causal relationships between multiple cross-process calls, whether they happen over RPC frameworks, publish-subscribe mechanisms, general message queues, HTTP calls, UDP transport, or other transports. Therefore, when the industry talks about tracing, it usually means distributed tracing across processes.

A core concept in trace propagation is the Carrier, which carries the span information of the trace. In an HTTP call scenario there is an HttpCarrier, and in an RPC call scenario an RpcCarrier, to carry the SpanContext. Through the Carrier, the trace can "transport" its link-tracking state from one process to another.

Link data must be serialized and deserialized for network transmission. The trace implements this through a Formatter interface responsible for serializing and deserializing the context. For example, an HttpCarrier usually has a corresponding HttpFormatter, so Inject delegates to the Formatter to serialize the SpanContext and write it into the Carrier.
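In the OpenTelemetry Go SDK mentioned earlier, the Carrier and Formatter roles are played by a TextMapCarrier and a TextMapPropagator. A minimal sketch of Inject and Extract over HTTP headers follows; the helper function names are mine:

package tracing

import (
    "context"
    "net/http"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
)

func init() {
    // Use the W3C Trace Context format as the global propagator (the "Formatter").
    otel.SetTextMapPropagator(propagation.TraceContext{})
}

// injectHeaders serializes the current SpanContext into the outgoing
// request headers (http.Header acts as the Carrier here).
func injectHeaders(ctx context.Context, req *http.Request) {
    otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))
}

// extractHeaders deserializes trace information from the incoming request
// headers into a new context.
func extractHeaders(r *http.Request) context.Context {
    return otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
}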

The W3C defines a standard for trace headers (Trace Context) to support link tracing over HTTP:

Trace Context
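For reference, the standard's traceparent header has the form version-trace-id-parent-id-trace-flags; the example below uses the illustrative IDs from the specification:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01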

5. Performance loss

The cost of a tracing system consists of two parts:

  1. The overhead of generating traces and collecting trace data in the monitored system, which degrades its performance.
  2. The resources required to store and analyze the trace data.

5.1 Loss of generating trace

The overhead of generating traces is the most critical part of Dapper's performance impact, since collection and analysis can more easily be turned off in an emergency. The most important sources of trace-generation overhead in the Dapper runtime are creating and destroying spans and annotations and recording them to local disk for subsequent collection.

Creating and destroying a root span takes 204 nanoseconds on average, while the same operations for other spans take 176 nanoseconds. The main difference is the cost of assigning a globally unique trace ID to the root span.

Writing to local disk is the most expensive operation in Dapper's runtime, but its visible cost is greatly reduced because log-file writes are coalesced and executed asynchronously with respect to the traced application.

5.2 Consumption of collecting traces

Reading out trace data can also interfere with the load being monitored.

The following table shows the worst-case CPU usage of Dapper's log-collecting daemon when tested under a higher-than-realistic load benchmark. In production, during trace data processing, the daemon never exceeded 0.3% single-core CPU usage and had only minimal memory usage (and heap fragmentation noise).

Dapper also restricts its daemon to the lowest priority offered by the kernel scheduler, to prevent CPU contention on heavily loaded servers.

Dapper is also a light consumer of bandwidth: transferring a span takes only 426 bytes on average. As a very small share of network activity, Dapper's data collection occupies only about 0.01% of the network resources in Google's production environment.

https://bigbully.github.io/Dapper-translation/images/table1.png

6. Shortcomings

The main function of distributed session tracking is to analyze the calling behavior of a distributed system; it cannot be applied well to tracing business logic. The figure below shows a tracking case from an audit business scenario. The business system provides an audit capability, and auditing an object involves two stages, "preliminary review" and "re-review" (both associated with the same taskId), so one complete audit calls the audit interface twice. As shown on the left of the figure, the complete audit scenario involves the execution of many pieces of "business logic", while distributed session tracking only generates the two call links on the right, one per RPC call, and has no way to accurately describe how the business logic of the audit scenario was executed. The problems are mainly the following:

https://p0.meituan.net/travelcube/8da3f1e203894e78788d6ca25055b70a348329.jpg

(1) Unable to track multiple calling links at the same time

Distributed session tracking only supports call tracing for a single request. When a business scenario involves multiple calls, multiple call links are produced; because call links are connected by traceId and different links are independent of each other, complete business tracking becomes harder.

For example, when troubleshooting business problems in the audit scenario, since the preliminary review and the re-review are different RPC requests, the two call links cannot be obtained together directly, and the mapping between the two traceIds usually has to be stored separately.

(2) Unable to accurately describe the full picture of business logic

The call link generated by distributed session tracking only contains the calls actually made by a single request. Calls that were not executed and purely local logic cannot be reflected in the link, so the full picture of the business logic cannot be described accurately.

For example, for the same audit interface, the preliminary-review link 1 contains a call to service b while the re-review link 2 does not. This is because the scenario contains "judgment logic", and that logic cannot be reflected in the call link; the code still has to be analyzed manually.

(3) Unable to focus on the logical execution of the current business system

Distributed session tracking covers all services, components, and machines that a single request flows through, not only the current business system but also many downstream services. When the internal logic of an interface is complex, the depth and complexity of the call link increase significantly, whereas business tracking only needs to focus on the logic executed by the current business system.

For example, the call link generated by the audit scenario involves the internal calls of many downstream services, which increases the complexity of troubleshooting the current business system.

Business-focused distributed link tracking system

1. Background

1.1 Business systems are becoming increasingly complex

With the rapid development of Internet products, the business environment and user needs are constantly changing, resulting in numerous and complex business requirements. Business systems need to support an increasingly wide range of business scenarios and cover more and more business logic, so the complexity of the system is also increasing rapidly. At the same time, the evolution of microservice architecture has also resulted in the implementation of business logic often requiring the cooperation between multiple services. In short, the increasing complexity of business systems has become a norm.

1.2 Challenges faced by business tracking

Business systems often face a variety of daily customer complaints and unexpected problems, and "business tracking" has become a key response method. Business tracking can be regarded as the on-site restoration process of a business execution. The original scene is restored through various records during execution, which can be used to analyze the execution of business logic and locate problems. It is an important part of the entire system construction.

As business logic becomes increasingly complex, the above solutions are increasingly unsuitable for current business systems.

The traditional ELK solution is a kind of lagging business tracking, which requires collecting and filtering the required logs from a large number of discrete logs afterwards, and manually conducting serial analysis of the logs. The process is inevitably time-consuming and labor-intensive.

The distributed session tracking solution completes the dynamic concatenation of links in real time while calls are being executed. However, because it is at the session level and only focuses on issues such as call relationships, it cannot be well applied to business tracking.

What is needed is an efficient solution focused on tracking business logic: using the business link as a carrier, it organizes and connects business execution logs efficiently, and supports restoring and visually viewing the business execution scene, thereby making it faster to locate problems. This is visualized full-link log tracking.

2. Visualized full-link log tracking

2.1 Design ideas

Visualized full-link log tracking moves the work up front: business logs are organized efficiently and concatenated dynamically while the business executes, as shown in the figure below. The discrete log data is organized according to the business logic and the execution scene is drawn as it happens, enabling efficient business tracking.

https://p1.meituan.net/travelcube/d5ef753204998aab83fa13d68f5f83b2169981.jpg

The new solution needs to answer two key questions: how to organize business logs efficiently, and how to dynamically concatenate business logs. The following will answer each question one by one.

Question 1: How to organize business logs efficiently?

To achieve efficient business tracking, the business logic must first be described accurately and completely, forming a panorama of the business logic; business tracking then restores, within this panorama, the scene of one execution from the log data produced during that execution.

The new solution abstracts business logic and defines business logic links. The following takes the "audit business scenario" as an example to illustrate the abstraction process of business logic links:

  • Logical node: the many pieces of logic in the business system can be split by business function into independent business logic units, i.e. logical nodes. A node can be a local method (the "judgment logic" node in the figure below) or a remotely called method such as an RPC method (the "Logical A" node in the figure below).
  • Logical link: the business system supports many business scenarios externally, and each scenario corresponds to a complete business process, which can be abstracted into a logical link composed of logical nodes. The logical link in Figure 5 below accurately and completely describes the "audit business scenario".

A business trace is the restoration of one particular execution of a logical link. The logical link describes the business-logic panorama completely and accurately, and at the same time serves as the carrier that enables efficient organization of business logs.

https://p1.meituan.net/travelcube/fe5e5df2b6b3bf340a259fda34b75b46156456.jpg

Question 2: How to dynamically concatenate business logs?

The log data produced while business logic executes is stored discretely. What must be achieved is to dynamically concatenate the logs of each logical node as the business logic executes, and thereby restore the complete business execution scene.

Since logical nodes interact with each other, and internally, through MQ or RPC, the distributed parameter pass-through capability provided by distributed session tracking is reused to dynamically concatenate business logs:

  • By continuously passing parameters along execution threads and network calls, the identifiers of the link and its nodes are transmitted without interruption while the business logic executes, achieving the coloring of the discrete logs.
  • Based on these identifiers, the colored discrete logs are dynamically attached to the executing nodes, gradually converging into a complete logical link, and ultimately enabling efficient organization and visual display of the business execution scene.

Unlike the distributed session tracking solution, when several distributed calls must be concatenated together, the new solution has to choose a common ID as the identifier based on the business logic. For example, the audit scenario involves two RPC calls; to ensure that the two executions are concatenated into the same logical link, the "task id" shared by the preliminary review and the re-review is chosen as the identifier, which fully concatenates the logical link of the audit scenario and restores its execution scene.

2.2 General solution

Having clarified the two basic issues of efficient organization and dynamic concatenation of logs, this article uses "Logical Link 1" of the business system in Figure 4 to describe the general solution in detail. The solution can be broken down into the following steps:

https://p0.meituan.net/travelcube/6e28103ad7cdb54832e5e7844b31a9dd34296.jpg

2.2.1 Link definition

The meaning of "link definition" is: using a specific language to statically describe a complete logical link. A link is usually composed of multiple logical nodes combined according to certain business rules; the business rules are the execution relationships between the logical nodes, including serial, parallel, and conditional branching.

A DSL (Domain Specific Language) is a computer language designed for a particular class of tasks. It can define the combination relationships (business rules) of a series of nodes (logical nodes) in JSON or XML. This solution therefore uses a DSL to describe logical links, taking a logical link from abstract definition to concrete implementation.

https://p1.meituan.net/travelcube/48936bcb952d2485e5daa948afc6999372963.jpg

Logical Link 1-DSL

  [
    {
      "nodeName": "A",
      "nodeType": "rpc"
    },
    {
      "nodeName": "Fork",
      "nodeType": "fork",
      "forkNodes": [
        [
          {
            "nodeName": "B",
            "nodeType": "rpc"
          }
        ],
        [
          {
            "nodeName": "C",
            "nodeType": "local"
          }
        ]
      ]
    },
    {
      "nodeName": "Join",
      "nodeType": "join",
      "joinOnList": [
        "B",
        "C"
      ]
    },
    {
      "nodeName": "D",
      "nodeType": "decision",
      "decisionCases": {
        "true": [
          {
            "nodeName": "E",
            "nodeType": "rpc"
          }
        ]
      },
      "defaultCase": [
        {
          "nodeName": "F",
          "nodeType": "rpc"
        }
      ]
    }
  ]
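If the business system were written in Go, the DSL above could be loaded with types like the following; this is only a sketch whose struct and function names are mine, with the JSON keys taken from the DSL itself:

package link

import "encoding/json"

// Node mirrors one element of the DSL above. Fields that do not apply to a
// given node type simply stay empty.
type Node struct {
    NodeName      string            `json:"nodeName"`
    NodeType      string            `json:"nodeType"` // rpc | local | fork | join | decision
    ForkNodes     [][]Node          `json:"forkNodes,omitempty"`
    JoinOnList    []string          `json:"joinOnList,omitempty"`
    DecisionCases map[string][]Node `json:"decisionCases,omitempty"`
    DefaultCase   []Node            `json:"defaultCase,omitempty"`
}

// ParseLink turns the JSON DSL into an in-memory logical-link definition.
func ParseLink(dsl []byte) ([]Node, error) {
    var nodes []Node
    err := json.Unmarshal(dsl, &nodes)
    return nodes, err
}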

2.2.2 Link coloring

The meaning of "link coloring" is: while the link executes, the concatenation identifier is transparently passed along, making it clear which link is executing and which node it has reached.

Link coloring consists of two steps:

  • Step 1: Determine the concatenation identifier. When a logical link is opened, its unique identifier can be determined, clarifying the link and the nodes to be executed subsequently.

    • Link unique identifier = business identifier + scenario identifier + execution identifier (the three together determine "one execution under a certain business scenario")
      • Business identifier: gives the link its business meaning, such as a "user id" or "activity id".
      • Scenario identifier: gives the link its scenario meaning, for example the current scenario is "logical link 1".
      • Execution identifier: gives the link its execution meaning. If only a single call is involved, the "traceId" can be used directly; if multiple calls are involved, the same "public id" shared by those calls is chosen based on the business logic.
    • Node unique identifier = link unique identifier + node name (the two together determine "a logical node in one execution under a certain business scenario")
      • Node name: the unique name of the node preset in the DSL, such as "A".

  • Step 2: Pass the concatenation identifier. While the logical link executes, the identifier is transparently passed along the complete distributed link, and the executed nodes are dynamically attached to the link, achieving the coloring of the link. For example, in "Logical Link 1" (a Go sketch of such an identifier follows this list):

    • When node "A" triggers execution, it starts passing the identifier to subsequent links and nodes; as the business process executes, the coloring of the entire link is gradually completed.
    • When the identifier reaches node "E", it means the result of conditional branch "D" was "true", and node "E" is dynamically attached to the executed link.
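The following Go sketch illustrates one possible shape of the concatenation identifier and how it could be carried through the execution context; it is only an illustration of the scheme above, not the platform's implementation:

package link

import (
    "context"
    "fmt"
)

// LinkMark is the concatenation identifier that colors one link execution;
// its three parts follow the scheme above (names are illustrative).
type LinkMark struct {
    BusinessID  string // e.g. a content id or task id
    ScenarioID  string // e.g. "logical link 1"
    ExecutionID string // traceId for a single call, a shared public id otherwise
}

// LinkID uniquely identifies one execution of one scenario for one business object.
func (m LinkMark) LinkID() string {
    return fmt.Sprintf("%s_%s_%s", m.BusinessID, m.ScenarioID, m.ExecutionID)
}

// NodeID uniquely identifies a logical node within that execution.
func (m LinkMark) NodeID(nodeName string) string {
    return m.LinkID() + "_" + nodeName
}

type markKey struct{}

// WithMark stores the mark in ctx so it can be passed transparently along the
// execution path and copied into MQ/RPC headers at process boundaries.
func WithMark(ctx context.Context, m LinkMark) context.Context {
    return context.WithValue(ctx, markKey{}, m)
}

// MarkFrom retrieves the mark, if any, for coloring the logs of the current node.
func MarkFrom(ctx context.Context) (LinkMark, bool) {
    m, ok := ctx.Value(markKey{}).(LinkMark)
    return m, ok
}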

2.2.3 Link reporting

The meaning of "link reporting" is: while the link executes, logs are reported organized by link, so that the business scene is preserved accurately.

https://p0.meituan.net/travelcube/1f1aec7b6a8ad083ace6d79f8a18d68587621.jpg

As shown in Figure 8 above, the reported log data includes node logs and business logs. Node logs draw the executed nodes in the link, recording each node's start, end, input, and output; business logs show how the specific business logic of a link node executed, recording any data that helps explain that logic, such as input and output parameters exchanged with upstream and downstream services, intermediate variables of complex logic, and exceptions thrown during execution.

2.2.4 Link storage

The meaning of "link storage" is: store the logs reported during link execution for later "scene restoration". Reported logs fall into three categories: link logs, node logs, and business logs:

  • Link log: for a single execution of the link, the basic link information extracted from the logs of the start and end nodes, including link type, link meta information, and link start/end time.
  • Node log: for a single execution of the link, the basic information of each executed node, including node name, node status, and node start/end time.
  • Business log: for a single execution of the link, the business log information recorded inside each executed node, including log level, log time, and log data.

The figure below shows the storage model: it contains link logs, node logs, business logs, and link metadata (configuration data), organized as a tree with the business identifier as the root node to support subsequent link queries.

https://p0.meituan.net/travelcube/de1585811ef98880d0daf8d674f9e0ce102979.jpg
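A rough sketch of this tree in Go-style structs is shown below; the field sets are abbreviated from the description above and are not the platform's real schema:

package link

import "time"

// BusinessLog is one business log line recorded inside a node.
type BusinessLog struct {
    Level string
    Time  time.Time
    Data  string
}

// NodeLog is the record of one executed node in one link execution.
type NodeLog struct {
    NodeName     string
    Status       string
    StartTime    time.Time
    EndTime      time.Time
    BusinessLogs []BusinessLog
}

// LinkLog is the record of one execution of a logical link.
type LinkLog struct {
    LinkType  string
    Meta      map[string]string // link meta information (configuration data)
    StartTime time.Time
    EndTime   time.Time
    Nodes     []NodeLog
}

// BusinessRecord is the root of the tree: one business identifier with all
// link executions recorded for it, which is what link queries start from.
type BusinessRecord struct {
    BusinessID string
    Links      []LinkLog
}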

3. Dianping content platform practice

3.1 Support reporting and storage of large data volume logs

The platform supports unified log collection, processing, and storage for many services, and can support log-tracking construction well under large data volumes.

https://p0.meituan.net/travelcube/5766c1ee95a334de30cb8af12f495643120017.jpg

Log collection: each application service reports logs asynchronously; the log_agent deployed on each machine collects them and forwards them uniformly to a Kafka channel. For the small number of services that cannot use log_agent, a relay application (shown in the figure) was built.

Log parsing: the collected logs flow from Kafka into Flink, where they are parsed and processed uniformly: classified and aggregated by log type, and parsed into link logs, node logs, and business logs.

Log storage: after parsing, the logs are stored according to the tree-shaped storage model. Based on an analysis of the storage requirements and the characteristics of each storage option, the Dianping content platform finally chose HBase.

Requirements analysis versus the advantages of the selected storage (HBase):

  • Requirements analysis:
    • OLTP workload: real-time reads and writes of logical links.
    • Large data volume: a huge number of records, which will keep growing.
    • Write-heavy, read-light: peak log-reporting QPS is high.
    • Simple business scenario: simple reads and writes meet the needs.
  • Selection advantages:
    • Storage characteristics: supports horizontal scaling and rapid capacity expansion.
    • Field-query characteristics: supports exact and prefix-match queries, with fast random access.
    • Economic cost: low storage-media cost.

Overall, the log reporting and storage architecture of log_agent + Kafka + Flink + HBase can well support complex business systems, naturally supports log reporting for many applications in distributed scenarios, and is also suitable for high-traffic data writing.

3.2. Achieve low-cost transformation of many back-end services

The Dianping content platform implemented a "custom log toolkit" (the TraceLogger toolkit in Figure 13 below), which hides the reporting details of link tracking and minimizes the transformation cost for the many services involved. The features of the TraceLogger toolkit include:

  • Imitates slf4j-api: the toolkit is implemented on top of the slf4j framework and exposes the same API as slf4j-api, so there is no learning cost for users.
  • Hides internal details: a series of link-log reporting steps, coloring, and other details are encapsulated internally, reducing the development cost for users.
    • Reporting decision:
      • Check the link identifier: when no identifier is present, full log reporting is performed so that no logs are lost.
      • Choose the reporting method: when an identifier is present, two reporting methods are supported, log files and RPC relay.
    • Log assembly: implements parameter placeholders, exception stack output, and similar functions, assembling the relevant data into Trace objects for unified collection and processing.
    • Exception reporting: actively reports exceptions through the ErrorAPI, compatible with the ErrorAppender of the original log reporting.
    • Log reporting: adapts to the Log4j2 logging framework to perform the final log reporting.

https://p0.meituan.net/travelcube/3b651ca2e4b1c2d4c8931b28f6465a5d133236.jpg

The following are usage examples of the TraceLogger toolkit for reporting business logs and node logs respectively; the overall transformation cost is low.

Business log reporting: no learning cost and essentially no transformation cost.

  // Before the change: original log reporting
  LOGGER.error("updatestructfailed, param:{}", GsonUtils.toJson(structRequest), e);
  // After the change: full-link log reporting
  TraceLogger.error("updatestructfailed, param:{}", GsonUtils.toJson(structRequest), e);

Node log reporting: supports both API and AOP reporting methods, flexible and low-cost.

public Response realTimeInputLink(long contentId) {
    // Link start: pass the concatenation identifier (business id + scenario id + execution id)
    TraceUtils.passLinkMark("contentId_type_uuid");
    // ...
    // Local call (report node logs via the API)
    TraceUtils.reportNode("contentStore", contentId, StatusEnums.RUNNING);
    Response structResp = contentStore(contentId);
    TraceUtils.reportNode("contentStore", structResp, StatusEnums.COMPLETED);
    // ...
    // Remote call
    Response processResp = picProcess(contentId);
    // ...
}

// Report node logs via AOP
@TraceNode(nodeName = "picProcess")
public Response picProcess(long contentId) {
    // Image-processing business logic
    // Report business log data
    TraceLogger.warn("picProcess failed, contentId:{}", contentId);
}

3.3 Results

Based on the above practices, the Dianping content platform has implemented visualized full-link log tracking, which can trace the execution of any piece of content across all business scenarios with one click and restore the execution scene through a visual link view. The tracking effect is shown in the figures below:

[Link query function] : Query all logical link executions of the content in real time based on the content ID, covering all business scenarios.

https://p1.meituan.net/travelcube/30f2e1cc5f51fd03699e077a5a6b534e260325.jpg

[Link display function] : Visually display the panoramic view of business logic through the link diagram, and simultaneously display the execution status of each node.

https://p1.meituan.net/travelcube/544ccd8f71df3d1252c88d5c030d9498245769.jpg

[Node details query function] : Supports displaying the details of any executed node, including node input, output, and key business logs during node execution.

https://p0.meituan.net/travelcube/2f9639ac74c0f41e501793f21a36688d312018.jpg

At present, the visualized full-link log tracking system has become a "troubleshooting tool" for the Dianping content platform, reducing troubleshooting time from hours to under five minutes. It is also a "testing aid": the visual concatenation and display of logs significantly improves the efficiency of RD self-testing and QA testing. Finally, the advantages of visualized full-link log tracking can be summarized as:

  • Low access cost : DSL configuration and simple log reporting modifications enable quick access.
  • Wide tracking range : All logical links of any piece of content can be tracked.
  • High efficiency of use : The management background supports visual query display of links and logs, which is simple and fast.

Origin blog.csdn.net/weixin_45177370/article/details/130195315