Goroutine leak

Overview

In Go, goroutines are very lightweight, and creating thousands of them is not a problem. Be aware, however, that if the number of goroutines keeps growing because they never exit, the resources they hold are never released and you will run into trouble. This article describes real-world scenarios in which goroutines leak and discusses how to fix them.

Cause Analysis

A goroutine leak usually has one of the following causes:

  • A goroutine blocks forever reading from or writing to a channel because the goroutine on the other end has exited, so it keeps holding its resources and can never exit
  • A goroutine enters an infinite loop, so its resources can never be released

When a goroutine terminates

A goroutine terminates in the following situations:

  • It has finished its work
  • An error occurs that it does not handle
  • Another goroutine tells it to terminate

Goroutine leaks in practice

Producer-consumer scenario

package main

import (
	"fmt"
	"math/rand"
	"os"
	"runtime"
	"time"
)

func main() {
	newRandStream := func() <-chan int {
		randStream := make(chan int)

		go func() {
			// These deferred calls never run: the loop below never exits.
			defer fmt.Println("newRandStream closure exited.")
			defer close(randStream)
			// Infinite loop: keep putting data into the channel until the send blocks.
			for {
				randStream <- rand.Int()
			}
		}()

		return randStream
	}

	randStream := newRandStream()
	fmt.Println("3 random ints:")

	// Consume only 3 values and then go do something else. The producer now
	// blocks on its next send; if main never deals with it, it has leaked.
	for i := 1; i <= 3; i++ {
		fmt.Printf("%d: %d\n", i, <-randStream)
	}

	fmt.Fprintf(os.Stderr, "%d\n", runtime.NumGoroutine())
	time.Sleep(10 * time.Second)
	fmt.Fprintf(os.Stderr, "%d\n", runtime.NumGoroutine())
}

The producer goroutine runs in an infinite loop, generating data continuously. The consumer, here the main goroutine, consumes only three values and then stops reading from the channel to do other work. The producer then tries to send the next value, but no goroutine receives it, so the producer blocks. If nobody ever reads from the channel again, the producer goroutine has leaked.

Solution
In general, to fix a leak caused by a channel, look at every place where a goroutine can block on a channel and ask whether the block is normal (the goroutine will eventually be woken up) or whether it may leave the goroutine with no chance ever to run again. In the latter case the goroutine leaks. So whenever you create a goroutine, you must decide how it will terminate.

The general fix is for the main goroutine, when it is finished, to notify the producer goroutine. Once notified, the producer does its cleanup: it exits, or does whatever work is needed to tidy up its environment.

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	newRandStream := func(done <-chan interface{}) <-chan int {
		randStream := make(chan int)

		go func() {
			defer fmt.Println("newRandStream closure exited.")
			defer close(randStream)

			for {
				select {
				case randStream <- rand.Int():
				case <-done: // notified: end this goroutine
					return
				}
			}
		}()

		return randStream
	}

	done := make(chan interface{})
	randStream := newRandStream(done)
	fmt.Println("3 random ints:")

	for i := 1; i <= 3; i++ {
		fmt.Printf("%d: %d\n", i, <-randStream)
	}

	// Tell the producer goroutine to end itself.
	// done <- struct{}{}
	close(done)
	// Simulate ongoing work
	time.Sleep(1 * time.Second)
}

In the code above, the producer goroutine is told through a channel that it should finish, which gives it a chance to clean up and keeps it from leaking. To deliver the notification you can send an empty struct on the channel, but the simpler way is to close the channel directly, as shown in the figure.
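Closing the channel is not only shorter; it also reaches every goroutine that waits on it, because a receive from a closed channel returns immediately, whereas one `done <- struct{}{}` wakes exactly one receiver. A small sketch, assuming a hypothetical `waitAll` helper in which several goroutines wait on the same `done` channel:

```go
package main

import (
	"fmt"
	"sync"
)

// waitAll starts n goroutines that block until done is closed, closes done
// once, waits for all of them to return, and reports how many exited.
func waitAll(n int) int {
	done := make(chan struct{})
	var wg sync.WaitGroup
	var mu sync.Mutex
	exited := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // a receive on a closed channel returns immediately
			mu.Lock()
			exited++
			mu.Unlock()
		}()
	}

	// One close wakes every receiver. A single `done <- struct{}{}` would
	// wake only one of them and leave the other n-1 blocked forever.
	close(done)
	wg.Wait()
	return exited
}

func main() {
	fmt.Println(waitAll(5)) // 5
}
```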

Master-worker scenario

In this scenario we usually split the job into several pieces and hand each piece to a child goroutine. If this is not handled carefully, goroutines can easily leak. Let's look at a practical example:

package main

import (
	"fmt"
	"runtime"
	"time"
)

// worker_adder adds up the numbers in s and sends the sum on c.
func worker_adder(s []int, c chan int) {
	sum := 0
	for _, v := range s {
		sum += v
	}
	// Send the sum back to the main goroutine.
	c <- sum // send sum to c
	fmt.Println("end")
}

func main() {
	s := []int{7, 2, 8, -9, 4, 0}

	c1 := make(chan int)
	c2 := make(chan int)

	// Spin up one goroutine per half of the slice.
	go worker_adder(s[:len(s)/2], c1)
	go worker_adder(s[len(s)/2:], c2)

	//x, y := <-c1, <-c2 // receive from c1 and c2
	x := <-c1 // only c1 is read, so the worker sending on c2 blocks forever
	// Print the value received from the channel.
	fmt.Println(x)

	fmt.Println(runtime.NumGoroutine())
	time.Sleep(10 * time.Second)
	fmt.Println(runtime.NumGoroutine())
}

In the code above, the main goroutine splits an array into two halves and hands each half to a worker goroutine to sum; the two workers return their results to the main goroutine through two channels. However, the main goroutine receives from only one channel, so the other worker blocks forever while writing to its channel and never gets a chance to finish. The problem becomes much more visible once this code runs inside a long-lived service:

HTTP server scenario

package main

import (
	"fmt"
	"log"
	"net/http"
	"runtime"
	"time"
)

// sumInt adds up the numbers in slice s and sends the sum on c.
func sumInt(s []int, c chan int) {
	sum := 0
	for _, v := range s {
		sum += v
	}
	c <- sum
}

// HTTP handler for /sum
func sumConcurrent2(w http.ResponseWriter, r *http.Request) {
	s := []int{7, 2, 8, -9, 4, 0}

	c1 := make(chan int)
	c2 := make(chan int)

	go sumInt(s[:len(s)/2], c1)
	go sumInt(s[len(s)/2:], c2)

	// c2 is deliberately never read here, so the goroutine writing to c2 blocks.
	x := <-c1

	// write the response.
	fmt.Fprint(w, x)
}

func main() {
	// Print the current number of goroutines once per second.
	statGoroutines := func() {
		for {
			time.Sleep(time.Second)
			total := runtime.NumGoroutine()
			fmt.Println(total)
		}
	}

	go statGoroutines()

	http.HandleFunc("/sum", sumConcurrent2)
	err := http.ListenAndServe(":8001", nil)
	if err != nil {
		log.Fatal("ListenAndServe: ", err)
	}
}

If you run the program above and open the following URL in your browser:

http://127.0.0.1:8001/sum

and keep refreshing the page so that the browser sends request after request, you will see output like the following:

2
2
5
6
7
8
9
10

This output is the number of goroutines in our HTTP server. With every request the count goes up, and it never comes back down, which shows that a goroutine leak has occurred.

Solution
One solution is to make sure that, no matter what, some goroutine reads every channel a worker writes to, so that no worker stays blocked. Change the code as follows:

...
	x, y := <-c1, <-c2

	// write the response.
	fmt.Fprint(w, x+y)
...

How to find and debug goroutine leaks

runtime

You can get the number of goroutines in a running service with the runtime.NumGoroutine() function. By watching how this number rises and falls over time, you can tell whether a goroutine leak has occurred.

...
	fmt.Fprintf(os.Stderr, "%d\n", runtime.NumGoroutine())
	time.Sleep(10 * time.Second) // wait a while, then see how the goroutine count has changed
	fmt.Fprintf(os.Stderr, "%d\n", runtime.NumGoroutine())
...

Using pprof to locate the leak

Once we have found a goroutine leak, we need to confirm where it comes from.

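package main

import (
	"log"
	"net/http"
	"runtime/debug"
	"runtime/pprof"
)

// getStackTraceHandler writes the stack of the goroutine serving this request
// (debug.Stack), followed by the stacks of all goroutines (debug=2 prints them
// in the same format as an unrecovered panic).
func getStackTraceHandler(w http.ResponseWriter, r *http.Request) {
	stack := debug.Stack()
	w.Write(stack)
	pprof.Lookup("goroutine").WriteTo(w, 2)
}

func main() {
	http.HandleFunc("/_stack", getStackTraceHandler)
	// Port reused from the earlier example; in a real service, register the
	// handler on the server you already run.
	log.Fatal(http.ListenAndServe(":8001", nil))
}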

Summary

Goroutine leaks are usually caused by goroutines blocked on a channel or stuck in an infinite loop, especially in long-running background services. When using channels and goroutines, keep the following in mind:

  • When you create a goroutine, work out in advance how and when it will end
  • When you use a channel, consider what the goroutine will do if the channel blocks
  • Watch out for common goroutine leak patterns, including the master-worker model and the producer-consumer model


Source: blog.csdn.net/u013474436/article/details/104630559