golang pprof monitoring series (1): go trace statistics principles and usage

There is plenty of material online about using go tool trace, but when I was learning golang, much of it never made clear which methods produce the metrics shown in go tool trace, or over which intervals they are measured. So this article will not only introduce how to use go tool trace, but also analyze the principles behind its statistics.

golang version: go1.17.12

Let me briefly describe where go tool trace is useful. It plays a very important role when analyzing latency problems, because it records all kinds of latency-related events and measures their durations; it can even pinpoint the key code locations involved.

For a hands-on case study with trace, there is an earlier video of mine (a case of system latency analysis); feel free to watch it.

Next, let's take a brief look at how to use the trace facility in golang.

Using go trace

package main

import (
	_ "net/http/pprof"
	"os"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := trace.Start(f); err != nil {
		panic(err)
	}
	defer trace.Stop()
	......
}

Using trace is actually fairly easy: trace.Start enables trace's event sampling, and trace.Stop stops it. The sampled data is written to the output stream passed to trace.Start; here I pass in a file named trace.out as that output stream.

After sampling, you can use the go tool trace trace.out command to analyze the sampled file.
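By the way, since the sample code already blank-imports net/http/pprof, a trace can also be captured over HTTP instead of writing the file in code. Below is a minimal sketch of mine under that assumption; the port 6060 and the 5-second window are arbitrary choices:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers, including /debug/pprof/trace
)

func main() {
	// With the pprof handlers exposed, fetch a 5-second trace via:
	//   curl -o trace.out "http://localhost:6060/debug/pprof/trace?seconds=5"
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}

The downloaded file can be fed to go tool trace trace.out just the same.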

By default, the go tool trace command starts an http service on a random local port (you can pin the port with go tool trace -http=localhost:8080 trace.out), and the page is displayed as follows:

Next I will analyze, link by link, the statistics on that page and the principles behind them. Okay, the main event begins.

Introduction to Statistical Principles

Usually, when we monitor application services with prometheus, we mainly instrument the code with measurement points. The go runtime does the same: it instruments various events as the code runs, and the collected event data is later read back, sorted, and assembled into the trace monitoring graphs we see on the web page.

// src/runtime/trace.go:517
func traceEvent(ev byte, skip int, args ...uint64) {
	mp, pid, bufp := traceAcquireBuffer()
    .....
}

Whenever a monitored event needs to be recorded, the traceEvent method is called: ev is the event enumeration, skip is the number of stack frames to skip when capturing the stack, and args carries event-specific parameters for events that need them.

Inside traceEvent, the thread (M), goroutine (G), and run-queue (P) information at the moment the event occurs is also captured, and all of it is recorded into a buffer together with the event.
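To make that concrete, the logical content of one buffered event can be pictured as below. This is only an illustrative sketch of mine, not the real runtime source; the runtime actually encodes events into a compact byte stream:

// Illustrative sketch only: what one buffered trace event logically carries.
type rawEvent struct {
	typ  byte      // the event enumeration (one of the Ev* constants)
	ts   int64     // timestamp at which the event occurred
	p    int32     // the P (run queue) the event happened on
	g    uint64    // the goroutine the event happened on
	stk  []uintptr // stack frames, captured according to the skip argument
	args []uint64  // event-specific arguments
}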

// src/runtime/trace/trace.go:120 
func Start(w io.Writer) error {
	tracing.Lock()
	defer tracing.Unlock()
	if err := runtime.StartTrace(); err != nil {
		return err
	}
	go func() {
		for {
			data := runtime.ReadTrace()
			if data == nil {
				break
			}
			w.Write(data)
		}
	}()
	atomic.StoreInt32(&tracing.enabled, 1)
	return nil
}

When trace.Start is called, it also starts a goroutine that continuously reads the contents of that buffer and writes them to the output stream passed to trace.Start.

In the sample code, trace.Start receives the trace.out file as its output stream, so during sampling the runtime writes the collected event byte stream to trace.out. When trace.out is read back, each monitoring event is represented by a structure named Event.

// Event describes one event in the trace.
type Event struct {
	Off   int       // offset in input file (for debugging and error reporting)
	Type  byte      // one of Ev*
	seq   int64     // sequence number
	Ts    int64     // timestamp in nanoseconds
	P     int       // P on which the event happened (can be one of TimerP, NetpollP, SyscallP)
	G     uint64    // G on which the event happened
	StkID uint64    // unique stack ID
	Stk   []*Frame  // stack trace (can be empty)
	Args  [3]uint64 // event-type-specific arguments
	SArgs []string  // event-type-specific string args
	// linked event (can be nil), depends on event type:
	// for GCStart: the GCStop
	// for GCSTWStart: the GCSTWDone
	// for GCSweepStart: the GCSweepDone
	// for GoCreate: first GoStart of the created goroutine
	// for GoStart/GoStartLabel: the associated GoEnd, GoBlock or other blocking event
	// for GoSched/GoPreempt: the next GoStart
	// for GoBlock and other blocking events: the unblock event
	// for GoUnblock: the associated GoStart
	// for blocking GoSysCall: the associated GoSysExit
	// for GoSysExit: the next GoStart
	// for GCMarkAssistStart: the associated GCMarkAssistDone
	// for UserTaskCreate: the UserTaskEnd
	// for UserRegion: if the start region, the corresponding UserRegion end event
	Link *Event
}

Let's take a look at the information contained in an Event:
P is the run queue; when go runs a goroutine, it schedules the goroutine onto a P. G is the goroutine id, StkID is the unique stack id, Stk holds the stack frames, and Args and SArgs are parameters the event may carry.
Type is the event's enumeration field, Ts is the timestamp at which the event occurred, and Link points to another event associated with this one, used to compute the duration of the associated event pair.

Take computing system-call duration as an example: the system-call-related events are GoSysCall and GoSysExit, the system-call start and exit events respectively, so GoSysExit.Ts - GoSysCall.Ts is the time the system call took.

Special note: the event enumeration used inside the runtime is at src/runtime/trace.go:39, while the enumeration used when reading the file back is at src/internal/trace/parser.go:1028. Although there are two sets, the values are identical.

Obviously, the Link field is not set when the runtime records an event; it is filled in after the events read from trace.out are grouped by goroutine id. Within the same goroutine, GoSysExit.Ts - GoSysCall.Ts is that goroutine's system-call time.
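Once Link is filled in, computing a pair's duration is a subtraction. Here is a sketch of that idea written against the Event type above; it is my own illustration of what cmd/trace does (the parser lives in an internal package, so ordinary user code cannot import it directly):

// syscallDurations sums blocking-syscall time per goroutine, using the
// Link field that pairs each GoSysCall with its GoSysExit.
func syscallDurations(events []*Event) map[uint64]int64 {
	total := make(map[uint64]int64) // goroutine id -> nanoseconds in syscalls
	for _, ev := range events {
		if ev.Type == EvGoSysCall && ev.Link != nil {
			total[ev.G] += ev.Link.Ts - ev.Ts
		}
	}
	return total
}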

Next, let's go through the statistics on the trace page and the principles behind them, one by one.

View trace is a monitoring graph composed of the timeline of every event. In a production environment a huge number of events is generated every second, and I think looking at this graph directly is dizzying, so let's skip it and start from Goroutine analysis.

Goroutine analysis

The code that go tool trace ultimately runs lives under the go/src/cmd/trace package. Its main function starts an http service and registers a number of handler functions; when we click Goroutine analysis, we are actually requesting one of them.
The following code registers the goroutine handlers: clicking Goroutine analysis maps to the /goroutines route.

// src/cmd/trace/goroutines.go:22
func init() {
	http.HandleFunc("/goroutines", httpGoroutines)
	http.HandleFunc("/goroutine", httpGoroutine)
}

Let's click on Goroutine analysis


It opens a list of code locations. Each location in the list is the position at which the EvGoStart event (goroutine starts running) occurred, and N means that N goroutines ran at this location during the sampling period.

You may be wondering: how is it determined that these 10 goroutines execute the same piece of code? When the runtime records the EvGoStart event, it also records the stack frame, which contains the function name and the program counter (PC) value. On the Goroutine analysis page, goroutines are grouped by PC value. Below is the code snippet where the PC is assigned.

// src/internal/trace/goroutines.go:176
case EvGoStart, EvGoStartLabel:
	g := gs[ev.G]
	if g.PC == 0 {
		g.PC = ev.Stk[0].PC
		g.Name = ev.Stk[0].Fn
	}

Now let's click the first line's link, nfw/nfw_base/fw/routine.(*Worker).TryRun.func1 N=10. Clicking it maps to the /goroutine route (note: no trailing s), handled by its own handler function. Click as shown in the figure:

What you see now are the statistics for these 10 goroutines: system calls, blocking, scheduling latency, and gc.

Let's analyze them one by one from top to bottom:
Execution Time is the ratio of these 10 goroutines' execution time to the execution time of all goroutines.
Then come graphs for analyzing network wait time, lock block time, system-call block time, and scheduler wait time. These are sharp tools for analyzing latency and blocking problems. I won't dissect those graphs here; I believe there is plenty of material about them online.

Then look at the indicators in the table below:

Execution

is the time the goroutines spent executing during the sampling period.

The recording method is simple. Events are read from front to back along the timeline. Each time a goroutine-start event is read, the start timestamp is recorded (the timestamp is contained in the Event structure); when a pause event for that goroutine is read, the start timestamp is subtracted from the pause timestamp to obtain one short running interval, and that interval is added to the goroutine's total execution time.

A pause is triggered by lock blocking, system-call blocking, network waiting, or preemptive scheduling.
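In sketch form the bookkeeping looks like this; it is simplified from the event loop in src/internal/trace/goroutines.go, which handles many more event kinds:

switch ev.Type {
case EvGoStart:
	// Remember when this goroutine started running.
	g := gs[ev.G]
	g.lastStartTime = ev.Ts
case EvGoSysBlock, EvGoPreempt: // ...and the other pause events
	// Close the current running interval and accumulate it.
	g := gs[ev.G]
	g.ExecTime += ev.Ts - g.lastStartTime
	g.lastStartTime = 0
}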

Network wait

As the name implies, this is network wait time, and it is recorded much like Execution. First, the timestamp at which the goroutine starts waiting on the network is recorded. Since events are read along the timeline, when an unblock event is read, we check whether the goroutine's last event was a network wait; if so, the network-wait timestamp is subtracted from the current timestamp, and that interval is added to the goroutine's total network wait time.
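Schematically (again simplified from src/internal/trace/goroutines.go; the field names match the Sync block snippet shown below):

switch ev.Type {
case EvGoBlockNet:
	// The goroutine stops executing and begins waiting on the network.
	g := gs[ev.G]
	g.ExecTime += ev.Ts - g.lastStartTime
	g.lastStartTime = 0
	g.blockNetTime = ev.Ts
case EvGoUnblock:
	// If this goroutine was last blocked on the network, close the interval.
	g := gs[ev.G]
	if g.blockNetTime != 0 {
		g.IOTime += ev.Ts - g.blockNetTime
		g.blockNetTime = 0
	}
}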

Sync block, Blocking syscall, Scheduler wait

The calculation of these three durations is similar to the previous two, but note that the events that trigger them are different.

The Sync block duration covers blocking caused by sync.Mutex locks, channels, wait groups, and select statements; all of it is recorded here. Below is the relevant code snippet.

// src/internal/trace/goroutines.go:192
case EvGoBlockSend, EvGoBlockRecv, EvGoBlockSelect,
	EvGoBlockSync, EvGoBlockCond:
	g := gs[ev.G]
	g.ExecTime += ev.Ts - g.lastStartTime
	g.lastStartTime = 0
	g.blockSyncTime = ev.Ts

Blocking syscall is blocking caused by system calls.

Scheduler wait is the interval from when a goroutine becomes runnable to when it actually starts executing. Note that a runnable goroutine is placed in a P's run queue to wait to be scheduled; only after it is scheduled does its code actually run. This touches on golang's GPM model, which I won't expand on here.
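A simplified sketch of how that wait is accumulated, paraphrasing the bookkeeping in src/internal/trace/goroutines.go (goroutine creation is handled similarly to unblocking):

switch ev.Type {
case EvGoUnblock:
	// The goroutine becomes runnable and is queued on a P.
	g := gs[ev.G]
	g.blockSchedTime = ev.Ts
case EvGoStart:
	// It finally gets the CPU; everything in between was scheduler wait.
	g := gs[ev.G]
	if g.blockSchedTime != 0 {
		g.SchedWaitTime += ev.Ts - g.blockSchedTime
		g.blockSchedTime = 0
	}
	g.lastStartTime = ev.Ts
}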

The last two columns are the percentages of the total time taken up by GC; gc in golang is another topic I won't expand on here.

Various profile diagrams

Remember what sat under Goroutine analysis when we first looked at the web page generated from trace.out? A set of profile graphs for analyzing various kinds of delay. Their data comes from the same source as the per-goroutine wait metrics we just covered in Goroutine analysis, but here it is aggregated across all goroutines.

Network blocking profile (⬇)
Synchronization blocking profile (⬇)
Syscall blocking profile (⬇)
Scheduler latency profile (⬇)
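Each of these pages can also be exported in pprof format from the command line, which is convenient if you prefer go tool pprof's views; the supported profile types are net, sync, syscall, and sched (port 8081 below is an arbitrary choice of mine):

go tool trace -pprof=net trace.out > net.pprof
go tool pprof -http=:8081 net.pprof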

golang's trace tool also lets users define custom monitoring events. On the generated trace web page, User-defined tasks and User-defined regions record these user-defined events; their application will be covered in a later article.
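As a small taste, here is a minimal example using the public runtime/trace API (available since Go 1.11; the task and region names are arbitrary):

package main

import (
	"context"
	"os"
	"runtime/trace"
)

func main() {
	f, _ := os.Create("trace.out")
	defer f.Close()
	trace.Start(f)
	defer trace.Stop()

	// A task groups related work; it appears under User-defined tasks.
	ctx, task := trace.NewTask(context.Background(), "handleRequest")
	defer task.End()

	// A region marks a span within the task; it appears under User-defined regions.
	region := trace.StartRegion(ctx, "decode")
	// ... the work to be measured ...
	region.End()
}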

Minimum mutator utilization is a curve showing gc's impact on the program; I'll cover it in detail when I get the chance to write about gc.

Once again, for a hands-on case study with trace, see my earlier video (a case of system latency analysis); feel free to watch it.
