Understanding Link Tracing

Background

In an era of rampant microservices, service-oriented thinking has become second nature to programmers. However, most projects simply keep adding services without managing them properly, so when an interface goes wrong it is difficult to find the root cause within the tangled web of service calls, and the golden window for stopping the loss is missed.

Link tracing exists precisely to solve this problem. It can pinpoint faults within complex service call chains, and when newcomers join the team it lets them see clearly where the services they are responsible for sit in the overall picture.

In addition, if an interface's latency suddenly spikes, there is no need to query the latency of each service one by one; we can visually analyze performance bottlenecks across services, which makes it easy to scale capacity accurately and sensibly when traffic surges.

Link Tracing

"Link Trace" term was proposed in 2010, when Google released a paper Dapper, Google introduced the realization of the principle of self-development of distributed link tracking, also it describes how they are transparent to the application at low cost of.

In fact, Dapper started out as a standalone call-chain tracing system and gradually evolved into a monitoring platform; many tools then grew out of that platform, such as real-time alerting, overload protection, and metrics querying.

Besides Google's Dapper, there are other well-known products, such as Alibaba's EagleEye, Dianping's CAT, Twitter's Zipkin, Pinpoint from Naver (the parent company of the famous social networking app LINE), and the domestically open-sourced SkyWalking.

The basic implementation principle

If you want to know which part of an interface has a problem, you must first know which services the interface calls and in what order. If you string these services together, they look like a chain, so we call this the call chain.

To implement the call chain, each call must be given an identifier; sorting the calls by this identifier then makes the calling order clear. We name this identifier spanid.

In real scenarios, we need to know the call details of one particular request, so spanid alone is not enough. Each request must also carry a unique identifier, so that all of the request's service calls can be found by it. We name this identifier traceid.

With spanid we can now easily know the order in which services were invoked, but it cannot reflect the hierarchy of the calls. As the chart below shows, a call chain may invoke multiple services step by step, or invoke several services simultaneously.

So each call should also record who made it; we use parentid as the name of this identifier.
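The three identifiers above can be sketched as a small data structure. This is a minimal illustration, not a real tracing library; the field names and the dotted spanid scheme are assumptions made for clarity.

```python
from dataclasses import dataclass

@dataclass
class Span:
    trace_id: str   # unique per request; ties all calls of one request together
    span_id: str    # unique per call; sorting by it reveals the calling order
    parent_id: str  # span_id of the caller; empty for the root call

# A request hits service A, which calls B, which in turn calls C:
root = Span(trace_id="t1", span_id="1", parent_id="")
b = Span(trace_id="t1", span_id="1.1", parent_id="1")
c = Span(trace_id="t1", span_id="1.1.1", parent_id="1.1")

spans = [root, b, c]

# All calls of one request are recovered by filtering on trace_id,
# and the hierarchy is rebuilt by following parent_id.
children_of_root = [s for s in spans if s.parent_id == root.span_id]
```

Filtering on trace_id answers "what did this request touch?", while parent_id turns the flat list back into a tree.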

By now we know the calling order and the hierarchy, but when an interface fails we still cannot locate the faulty step. If a service has a problem, the invoked service will certainly take a long time to execute. To compute that elapsed time, the three identifiers above are not enough; we also need timestamps, and the timestamps can be fine-grained, down to the microsecond.

Recording only the timestamp when the call is initiated is not enough to compute the elapsed time; we must also record the timestamp when the service returns, since only with both a start and an end can we compute the difference. And since we are recording the return anyway, we should record the three identifiers above with it too, otherwise we cannot tell whose timestamp it is.

Although we can now compute the total time from service call to service return, this time includes both the service's execution time and the network latency, and sometimes we need to separate the two in order to optimize each specifically. So how do we compute the network latency? We can split the call-and-return process into the following four events.

  • Client Sent (cs for short): the client sends the call request to the server.

  • Server Received (sr for short): the server receives the client's call request.

  • Server Sent (ss for short): the server finishes processing and is ready to return the result to the client.

  • Client Received (cr for short): the client receives the server's response.

If we record a timestamp when each of these four events occurs, the elapsed times are easy to compute: sr minus cs is the network latency of the outbound call, ss minus sr is the service execution time, cr minus ss is the latency of the response, and cr minus cs is the total time of the entire service call.
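The four subtractions above can be checked with a toy example. The microsecond timestamps below are made up purely for illustration.

```python
# Hypothetical microsecond timestamps for the four events of one call.
cs = 1_000_000   # Client Sent
sr = 1_000_120   # Server Received
ss = 1_000_870   # Server Sent
cr = 1_001_000   # Client Received

request_latency  = sr - cs   # network latency of the outbound call
service_time     = ss - sr   # time the server spent processing
response_latency = cr - ss   # network latency of the response
total_time       = cr - cs   # full round trip as seen by the client

# The total is exactly the two network legs plus the server processing time.
assert total_time == request_latency + service_time + response_latency
```

This also shows why separating the events matters: here most of the 1000 µs round trip is server execution (750 µs), not the network.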

In fact, besides these parameters, a span block can record other information as well, such as the name of the calling service, the name of the called service, the return result, and the IP address. Finally, we merge all records that share the same spanid into one large span block, and a complete call chain is done. If you are interested, dig deeper into link tracing; I hope this article has been helpful.
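The final merge step can be sketched as grouping partial records by their identifiers. This is a simplified assumption of how a collector might work: the client side and the server side each report the timestamps they observed, and the record shapes and names here are illustrative.

```python
# Two partial records for the same span: one written by the client
# (cs/cr) and one by the server (sr/ss plus extra metadata).
client_record = {"trace_id": "t1", "span_id": "1.1",
                 "cs": 1_000_000, "cr": 1_001_000}
server_record = {"trace_id": "t1", "span_id": "1.1",
                 "sr": 1_000_120, "ss": 1_000_870,
                 "service": "order-service"}

def merge_spans(records):
    """Group records by (trace_id, span_id) and merge each group into one span."""
    spans = {}
    for r in records:
        key = (r["trace_id"], r["span_id"])
        spans.setdefault(key, {}).update(r)
    return spans

trace = merge_spans([client_record, server_record])
span = trace[("t1", "1.1")]
# The merged span holds all four timestamps, so every latency from the
# previous section can now be computed from this one block.
```

Grouping on (trace_id, span_id) rather than span_id alone is deliberate: spanids only need to be unique within a single request.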


Origin www.cnblogs.com/enochzzg/p/10987438.html