A distributed service tracing system consists of three parts:
- data collection
- data storage
- data presentation
Depending on the size of the system, the structure of each part varies. In a large-scale distributed system, for example, data storage can be split into real-time data and full data: real-time data is used for troubleshooting, and full data is used for system optimization. Data collection must work independently of platform and development language, must also cover asynchronous collection (message queues have to be traced so that the call chain stays continuous), and must be minimally invasive. Data presentation also involves data mining and analysis. Although each part can become very complex, the basic principles are similar.
The unit of tracing is the process that begins when a client's request arrives at the boundary of the traced system and ends when the system returns a response to the client. This unit is called a trace. Each trace involves calls to a number of services. To record which services were called and how long each call took, a call record is embedded at every service call; this record is called a span. An ordered set of spans therefore forms a trace. While the system serves external clients, requests and responses occur continuously and traces are generated continuously. Recording these traces together with their spans makes it possible to map the topology of the system's services. Each span carries its response time and information such as whether the request succeeded, so when a problem occurs the abnormal service can be located easily. Historical data can also be analyzed at the level of the whole system to find where performance is poor and to identify targets for optimization.
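The relationship between spans and traces can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the data format of any particular tracing system: the `Span` fields and the helper functions `topology` and `failed_services` are hypothetical names chosen for this example, and the service names and durations are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Span:
    """One call record embedded at a service call (hypothetical fields)."""
    trace_id: str              # shared by every span of the same request
    span_id: str               # unique within the trace
    parent_id: Optional[str]   # None for the span at the system boundary
    service: str               # service that handled this call
    duration_ms: float         # time the call took
    success: bool = True       # whether the call succeeded


def topology(spans: Iterable[Span]) -> Set[Tuple[str, str]]:
    """Derive caller -> callee service edges from parent/child span links."""
    spans = list(spans)
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    edges = set()
    for s in spans:
        parent = by_id.get((s.trace_id, s.parent_id))
        if parent is not None and parent.service != s.service:
            edges.add((parent.service, s.service))
    return edges


def failed_services(spans: Iterable[Span]) -> List[str]:
    """Return the sorted names of services with at least one failed call."""
    return sorted({s.service for s in spans if not s.success})


# One trace: a request enters at the gateway and fans down two levels.
example_spans = [
    Span("t1", "a", None, "gateway", 120.0),
    Span("t1", "b", "a", "orders", 90.0),
    Span("t1", "c", "b", "inventory", 70.0, success=False),
]

if __name__ == "__main__":
    print(topology(example_spans))
    print(failed_services(example_spans))
```

Here the parent link on each span is what lets the spans of one trace be ordered into a call chain, and aggregating those links over many traces yields the service topology. The `success` flag and `duration_ms` are the per-span details used to spot abnormal or slow services.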
Zipkin is a popular open-source distributed tracing system. It consists mainly of the following four parts: