[Translation] Go: Understanding the Design of sync.Pool

ℹ️ This article is based on Go 1.12 and 1.13, and explains the evolution of sync/pool.go between the two versions.

The sync package provides a robust pool of reusable instances that reduces pressure on the garbage collector. Before adopting this package, it is important to benchmark your application with and without the pool. This matters because the pool can actually hurt performance if you do not understand how it works internally.

Limitations of the pool

Let's look at a very simple example that allocates a struct 10k times per iteration, to understand the pool in context:

package main

import (
   "sync"
   "testing"
)

type Small struct {
   a int
}

var pool = sync.Pool{
   New: func() interface{} { return new(Small) },
}

//go:noinline
func inc(s *Small) { s.a++ }

func BenchmarkWithoutPool(b *testing.B) {
   var s *Small
   for i := 0; i < b.N; i++ {
      for j := 0; j < 10000; j++ {
         s = &Small{a: 1}
         b.StopTimer()
         inc(s)
         b.StartTimer()
      }
   }
}

func BenchmarkWithPool(b *testing.B) {
   var s *Small
   for i := 0; i < b.N; i++ {
      for j := 0; j < 10000; j++ {
         s = pool.Get().(*Small)
         s.a = 1
         b.StopTimer()
         inc(s)
         b.StartTimer()
         pool.Put(s)
      }
   }
}

The two benchmarks above are identical except that one uses sync.Pool and the other does not:

name           time/op      alloc/op     allocs/op
WithoutPool-8  3.02ms ± 1%  160kB ± 0%   1.05kB ± 1%
WithPool-8     1.36ms ± 6%  1.05kB ± 0%  3.00 ± 0%

Since the loop runs 10k iterations, the benchmark without the pool performs 10k heap allocations, while the benchmark with the pool performs only 3. Those 3 allocations are made by the pool itself, yet only one instance of the struct is ever allocated. So far, so good: sync.Pool is faster and consumes less memory.

However, in a real application your instances are likely to be used for heavy work and will allocate a lot of heap memory. As the heap grows, the garbage collector will be triggered. We can force GC in our benchmarks with runtime.GC() to emulate this behavior (Translator's Note: add a runtime.GC() call to each iteration of the benchmark).
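Here is a hedged sketch of what the pooled benchmark looks like with the forced collection, following the translator's note (the exact placement of the runtime.GC() call is my own reading of that note, and the runtime package must also be imported):

func BenchmarkWithPool(b *testing.B) {
   var s *Small
   for i := 0; i < b.N; i++ {
      for j := 0; j < 10000; j++ {
         s = pool.Get().(*Small)
         s.a = 1
         b.StopTimer()
         inc(s)
         b.StartTimer()
         pool.Put(s)
      }
      // Force a collection on every iteration to simulate an
      // application under memory pressure; the pool is emptied
      // on each GC cycle.
      runtime.GC()
   }
}

With the forced collections, the results become: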

name           time/op        alloc/op        allocs/op
WithoutPool-8  993ms ± 1%    249kB ± 2%      10.9k ± 0%
WithPool-8     1.03s ± 4%    10.6MB ± 0%     31.0k ± 0%

We can now see that with GC in play the pool performs worse: both the allocation count and the memory usage are higher. Let's keep going to understand why.

Inner workflow of the pool

Digging into how the sync/pool.go package is initialized helps answer the question above:

func init() {
   runtime_registerPoolCleanup(poolCleanup)
}

It registers a method at runtime to clean up the pools. The GC triggers this method in runtime/mgc.go:

func gcStart(trigger gcTrigger) {
   [...]
   // clearpools before we start the GC
   clearpools()

This explains why performance is lower when the GC runs: the pools are cleaned every time a GC cycle starts (Translator's Note: a pooled object only survives between two GCs). The documentation also warns us:

Any item stored in the Pool may be removed automatically at any time without notification.
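A practical consequence of that warning (a minimal usage sketch of my own, not from the original article, assuming the bytes and sync packages are imported): callers must treat every Get as possibly returning either a recycled object with stale state or a brand-new one, and reset it themselves:

var bufPool = sync.Pool{
   New: func() interface{} { return new(bytes.Buffer) },
}

func handle(payload string) {
   buf := bufPool.Get().(*bytes.Buffer)
   buf.Reset() // may still hold data from a previous user
   buf.WriteString(payload)
   // ... use buf ...
   bufPool.Put(buf) // even after Put, the GC may discard it at any time
}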

Now, let's trace the workflow of how the pool is managed:

For each sync.Pool we create, Go generates an internal pool poolLocal attached to each processor (Translator's Note: the P of Go's GMP scheduling model; the pool is actually stored as a [P]poolLocal array). The structure has two fields: private and shared. The first can only be accessed by its owner (push and pop therefore need no lock), while the shared field can be read by any other processor and must be concurrency-safe. The pool is not a simple local cache: it can be used by any thread/goroutine in our application.
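For reference, here is roughly what that per-processor structure looks like in Go 1.12 (simplified from sync/pool.go; the comments follow the original source):

// Local per-P Pool appendix.
type poolLocalInternal struct {
   private interface{}   // Can be used only by the respective P.
   shared  []interface{} // Can be used by any P.
   Mutex                 // Protects shared.
}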

Go 1.13 improves access to shared, and also introduces a new cache that solves the problem of pool cleaning during GC.

The new lock-free pool and victim cache

Go 1.13 replaces the storage behind shared with poolChain, a doubly linked list; this change removes the lock and improves access to shared. Here is the new workflow for accessing shared:

With this new chained pool structure, each processor can push and pop at the head of its own shared queue, while other processors can only pop from the tail. The head of the queue can grow by allocating a new ring twice as large as the previous one, linked to it through the next/prev pointers. The initial ring has a default size of 8, so the second ring holds 16 items, the third 32, and so on.

Moreover, the poolLocal structure no longer needs a lock; the code can rely on atomic operations instead.
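For comparison, here is roughly what the new structures look like in Go 1.13 (simplified from sync/pool.go and sync/poolqueue.go; the comments are paraphrased from the source):

type poolLocalInternal struct {
   private interface{} // Can be used only by the respective P.
   shared  poolChain   // Local P can pushHead/popHead; any P can popTail.
}

// poolChain is a dynamically-sized, doubly linked list of ring buffers.
type poolChain struct {
   head *poolChainElt // only the owner P pushes and pops here
   tail *poolChainElt // other Ps pop from here
}

type poolChainElt struct {
   poolDequeue // a fixed-size, lock-free ring buffer
   next, prev *poolChainElt
}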

As for the new victim cache (Translator's Note: see the commit that introduced it; the victim cache was added to solve exactly the benchmark problem shown earlier), the strategy is quite simple. There are now two sets of pools: the active pools and the archived pools (Translator's Note: allPools and oldPools). When the GC runs, the victim caches from the previous cycle are dropped, each active pool's main cache is moved into its victim attribute, and the active set becomes the archived set:

// Drop victim caches from all pools.
for _, p := range oldPools {
   p.victim = nil
   p.victimSize = 0
}

// Move primary cache to victim cache.
for _, p := range allPools {
   p.victim = p.local
   p.victimSize = p.localSize
   p.local = nil
   p.localSize = 0
}

// The pools with non-empty primary caches now have non-empty
// victim caches, and no pools have primary caches.
oldPools, allPools = allPools, nil

With this strategy, thanks to the victim cache, the application now has one full GC cycle to create or collect new elements before the backup is lost. In the workflow shown earlier, the victim cache is queried at the end of the process, after the "shared" pools.
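To make that one-cycle backup observable, here is a small runnable example of my own (the behavior is not guaranteed by the API, but on Go 1.13+ it typically plays out as commented):

package main

import (
   "fmt"
   "runtime"
   "sync"
)

func main() {
   p := sync.Pool{New: func() interface{} { return "new" }}

   p.Put("pooled")
   runtime.GC()         // 1st GC: the item moves to the victim cache
   fmt.Println(p.Get()) // typically prints "pooled": it survived one GC

   p.Put("pooled")
   runtime.GC()
   runtime.GC()         // 2nd GC in a row: the victim cache is dropped
   fmt.Println(p.Get()) // typically prints "new": the item is gone
}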

Reproduced from: https://juejin.im/post/5d006254e51d45776031afe3
