Thorough understanding of Golang Map

After reading this article, you should be able to answer the following Golang map interview questions.

Interview questions

  1. The underlying implementation principle of map

  2. Why is iterating over the map unordered?

  3. How to implement in-order traversal of map?

  4. Why are Go maps not thread safe?

  5. How is a thread safe map implemented?

  6. sync.Map or the native map: which performs better, and why?

  7. Why is the load factor of Go map 6.5?

  8. What is the map expansion strategy?

Implementation principle

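In Go, a map value is a pointer to an hmap structure and occupies 8 bytes on 64-bit platforms. The underlying structure can be seen in the source file src/runtime/map.go (in Go releases before 1.24; Go 1.24 replaced this design with a Swiss-table implementation).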

The underlying structure of every map is an hmap, and an hmap holds an array of buckets, each of which is a bmap. Each bucket is the head of a linked list of overflow buckets. Next, let's take a closer look at the structures behind a map!

hmap structure

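// A header for a Go map.
type hmap struct {
    // Number of elements in the hash table; len(map) returns this field.
    count int
    // State flags (iterator, oldIterator, hashWriting, sameSizeGrow).
    flags uint8
    // log_2 of the number of buckets.
    // If B = 5, the buckets array has 2^5 = 32 buckets.
    B uint8
    // Approximate number of overflow buckets.
    noverflow uint16
    // Hash seed.
    hash0 uint32

    // Pointer to the buckets array, of size 2^B; nil if the map holds no elements.
    buckets unsafe.Pointer
    // During growth, points to the old buckets array, which is half the size
    // of the new one when the map doubles; nil when the map is not growing.
    oldbuckets unsafe.Pointer
    // Growth progress: buckets with an index below this value have been migrated.
    nevacuate uintptr

    // Designed to optimize GC scanning. Used when neither key nor value
    // contains pointers and both can be stored inline.
    // extra points to a mapextra.
    extra *mapextra
}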

bmap structure

bmap is what we usually call a "bucket". A bucket holds at most 8 keys. Keys land in the same bucket because the low B bits of their hash values are equal, so they belong to the same "class". Within a bucket, the top 8 bits of the key's hash value (the tophash) are used to quickly find which of the bucket's 8 slots holds the key.

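// A bucket for a Go map.
type bmap struct {
    // tophash is an array of length 8, one entry per slot in the bucket.
    // Each entry caches the top 8 bits of a key's hash, so a lookup can
    // quickly tell whether the key might be in this bmap.
    tophash [bucketCnt]uint8
}

// Constants defined by the runtime.
const (
    bucketCntBits = 3
    // A bucket holds at most 8 entries.
    bucketCnt = 1 << bucketCntBits
)

But this is only the surface structure (src/runtime/map.go, named hashmap.go in older releases). At compile time the compiler fleshes it out, dynamically creating a new structure:

type bmap struct {
    topbits  [8]uint8
    keys     [8]keytype
    values   [8]valuetype
    pad      uintptr
    overflow uintptr // link to the overflow bucket
}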
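
The way a hash value is split between "which bucket" and "which slot" can be sketched as follows. This is an illustration under assumptions, not the runtime code: the function names bucketIndex and topHash are made up, the hash is taken to be 64 bits wide, and minTopHash = 5 is the reserved-value threshold as I recall it from the runtime source.

```go
package main

import "fmt"

// minTopHash: smaller tophash values are assumed to be reserved by the
// runtime as slot-state markers, so real tophash values are bumped past them.
const minTopHash = 5

// bucketIndex picks a bucket from the low B bits of the hash.
func bucketIndex(hash uint64, B uint8) uint64 {
	return hash & (1<<B - 1)
}

// topHash takes the top 8 bits of a 64-bit hash, bumping reserved values.
func topHash(hash uint64) uint8 {
	top := uint8(hash >> 56)
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

func main() {
	const hash uint64 = 0xAB00000000000013 // made-up hash value
	// With B = 5 there are 32 buckets; the low 5 bits (0x13) select bucket 19.
	fmt.Println(bucketIndex(hash, 5), topHash(hash)) // 19 171
}
```

A lookup first compares the cached tophash bytes of the bucket and only compares full keys for the slots whose tophash matches.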

In memory, a bucket is laid out as the tophash array, followed by all 8 keys, then all 8 values, then the overflow pointer.

Note that the keys are stored together and the values are stored together, rather than interleaved as key/value/key/value/... The runtime source explains the advantage: in some cases the padding can be omitted, which saves memory.
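
The padding argument can be checked with a small experiment. The sketch below models a bucket for a map[int64]int8 in two ways; the type names interleaved and grouped are invented for this illustration, and the sizes are whatever unsafe.Sizeof reports on the platform at hand.

```go
package main

import (
	"fmt"
	"unsafe"
)

// interleaved stores each key next to its value: key/value/key/value/...
// Every int8 value is followed by padding so the next int64 key is aligned.
type interleaved struct {
	pairs [8]struct {
		k int64
		v int8
	}
}

// grouped stores all keys first, then all values, the way a bucket does.
type grouped struct {
	keys   [8]int64
	values [8]int8
}

// layoutSizes returns the sizes of the two layouts in bytes.
func layoutSizes() (uintptr, uintptr) {
	return unsafe.Sizeof(interleaved{}), unsafe.Sizeof(grouped{})
}

func main() {
	a, b := layoutSizes()
	fmt.Println("interleaved:", a, "grouped:", b)
}
```

On a typical 64-bit platform the interleaved layout needs 7 bytes of padding after every value, while the grouped layout packs the eight one-byte values back to back.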

mapextra structure

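When neither the key nor the value of a map contains pointers, and both are no larger than 128 bytes, bmap is marked as pointer-free, which spares the GC from scanning the whole hmap. However, bmap has an overflow field that is a pointer, which breaks the pointer-free assumption, so in this case the overflow pointers are moved into the extra field.

// mapextra holds fields that are not present on all maps.
type mapextra struct {
    // If neither key nor value contains pointers and both can be stored
    // inline (<= 128 bytes), the extra field of hmap is used to hold the
    // overflow buckets, so that the GC does not have to scan the whole map.
    // bmap.overflow is itself a pointer, however, so these overflow pointers
    // are kept in hmap.extra.overflow and hmap.extra.oldoverflow instead.
    // overflow holds the overflow buckets of hmap.buckets.
    // oldoverflow holds the overflow buckets of hmap.oldbuckets during growth.
    overflow    *[]*bmap
    oldoverflow *[]*bmap

    // nextOverflow points to a free overflow bucket.
    nextOverflow *bmap
}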

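Main features

Reference type

A map is a pointer to an underlying hmap, so it behaves as a reference type.

Golang has three commonly used high-level types: slice, map, and channel. All of them are reference types, so when one is passed as a function argument, the function may modify the original data.

Golang has no pass by reference, only passing of values and pointers. A map passed as a function argument is therefore still passed by value, but because the map value is a pointer to the storage that holds the elements, modifications made in the called function are visible to the caller, which gives the effect of pass by reference.

So when passing a map, if you want to modify the map's contents rather than the map itself, the function parameter does not need to be a pointer.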

import "testing"

func TestMapFn(t *testing.T) {
    m := map[string]int{"a": 1}
    t.Log(m, len(m))
    // map[a:1] 1
    mapAppend(m, "b", 2)
    t.Log(m, len(m))
    // map[a:1 b:2] 2
}

// mapAppend receives the map by value, yet the insertion is visible to the caller.
func mapAppend(m map[string]int, key string, val int) {
    m[key] = val
}
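
The distinction between modifying a map's contents and modifying the map itself can be sketched like this. The function names replace and replaceViaPointer are made up for the illustration.

```go
package main

import "fmt"

// replace assigns a new map to its local copy of the map value.
// The caller's variable still refers to the old map.
func replace(m map[string]int) {
	m = map[string]int{"x": 100}
	_ = m
}

// replaceViaPointer rebinds the caller's variable to a new map.
func replaceViaPointer(m *map[string]int) {
	*m = map[string]int{"x": 100}
}

func main() {
	m := map[string]int{"a": 1}
	replace(m)
	fmt.Println(m) // map[a:1]
	replaceViaPointer(&m)
	fmt.Println(m) // map[x:100]
}
```

A pointer parameter is only needed in the second case, where the function must make the caller's variable refer to a different map.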

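Shared storage

A map's underlying data structure points to the actual element storage through a pointer, so when two map variables refer to the same storage, a change made through one of them is visible through the other.

import "testing"

func TestMapShareMemory(t *testing.T) {
    m1 := map[string]int{}
    m2 := m1 // m2 refers to the same underlying storage as m1
    m1["a"] = 1
    t.Log(m1, len(m1))
    // map[a:1] 1
    t.Log(m2, len(m2))
    // map[a:1] 1
}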
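
If two independent maps are wanted instead, the entries have to be copied into a new map. A minimal sketch, with cloneMap as a hypothetical helper name:

```go
package main

import "fmt"

// cloneMap returns a new map with the same entries as src.
// Hypothetical helper; it makes a shallow copy of keys and values.
func cloneMap(src map[string]int) map[string]int {
	dst := make(map[string]int, len(src))
	for k, v := range src {
		dst[k] = v
	}
	return dst
}

func main() {
	m1 := map[string]int{"a": 1}
	m2 := cloneMap(m1)
	m1["b"] = 2
	fmt.Println(len(m1), len(m2)) // 2 1
}
```

After the copy, writes through m1 no longer show up in m2.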

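Random traversal order

Even when a map has not been modified, ranging over it several times may yield keys and values in a different order each time. The Go designers did this on purpose: the order is randomized on every range to remind developers that the implementation does not guarantee a stable traversal order, so do not rely on it.

A map is inherently unordered, and its traversal order is also randomized. To traverse a map in order, first sort its keys, then visit the map in key order.

import (
    "sort"
    "testing"
)

func TestMapRange(t *testing.T) {
    m := map[int]string{1: "a", 2: "b", 3: "c"}
    t.Log("first range:")
    // Unordered traversal by default.
    for i, v := range m {
        t.Logf("m[%v]=%v ", i, v)
    }
    t.Log("\nsecond range:")
    for i, v := range m {
        t.Logf("m[%v]=%v ", i, v)
    }

    // Ordered traversal.
    var sl []int
    // Collect the keys into a slice.
    for k := range m {
        sl = append(sl, k)
    }
    // Sort the slice.
    sort.Ints(sl)
    // Visiting the map in the order of the sorted keys gives an ordered traversal.
    for _, k := range sl {
        t.Log(k, m[k])
    }
}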
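
The randomization can be made visible by recording which key comes first over many range loops. This is a small experiment rather than a guarantee: the language does not promise any particular distribution, and distinctFirstKeys is a made-up helper name.

```go
package main

import "fmt"

// distinctFirstKeys ranges over m the given number of times and counts
// how many different keys showed up in first position.
func distinctFirstKeys(m map[int]string, rounds int) int {
	seen := make(map[int]bool)
	for i := 0; i < rounds; i++ {
		for k := range m {
			seen[k] = true
			break // only the first key of each traversal matters here
		}
	}
	return len(seen)
}

func main() {
	m := map[int]string{1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f", 7: "g", 8: "h"}
	fmt.Println("distinct first keys:", distinctFirstKeys(m, 1000))
}
```

With the current Go implementation the count is expected to be greater than 1, which shows that the starting point changes between traversals of an unmodified map.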

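Not thread safe

By default a map is not safe for concurrent use, for the following reason:

After a long discussion, the Go team decided that maps should fit the typical use case (no safe access from multiple goroutines needed) rather than make most programs pay the locking cost (performance) for the sake of a minority case (concurrent access), so concurrent access is not supported.

Scenario: two goroutines read and write the same map at the same time. The following program is very likely to crash with a fatal error such as: fatal error: concurrent map read and map write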

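package main

import (
    "fmt"
    "time"
)

func main() {
    m := make(map[int]int)
    go func() {
        // One goroutine writes to the map.
        for i := 0; i < 10000; i++ {
            m[i] = i
        }
    }()

    go func() {
        // Another goroutine reads from the map.
        for i := 0; i < 10000; i++ {
            fmt.Println(m[i])
        }
    }()

    // Keep main alive long enough for both goroutines to run.
    time.Sleep(time.Second * 20)
}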

To make a map safe for concurrent use, there are two approaches:

Method 1: Protect the map with a lock (sync.Mutex as in the benchmark below, or sync.RWMutex when reads dominate)

import (
    "sync"
    "testing"
)

func BenchmarkMapConcurrencySafeByMutex(b *testing.B) {
    var lock sync.Mutex // mutex guarding m
    m := make(map[int]int, 0)
    var wg sync.WaitGroup
    for i := 0; i < b.N; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            lock.Lock()
            defer lock.Unlock()
            m[i] = i
        }(i)
    }
    wg.Wait()
    b.Log(len(m), b.N)
}
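
For read-heavy workloads, the same idea can be written with sync.RWMutex so that readers do not block each other. The SafeMap type below is a hypothetical wrapper written for this illustration, not a standard library type.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeMap is a plain map guarded by a read-write lock (hypothetical type).
type SafeMap struct {
	mu sync.RWMutex
	m  map[int]int
}

// NewSafeMap returns an empty SafeMap ready for use.
func NewSafeMap() *SafeMap {
	return &SafeMap{m: make(map[int]int)}
}

// Get takes the read lock, so many readers can proceed at once.
func (s *SafeMap) Get(k int) (int, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[k]
	return v, ok
}

// Set takes the write lock, excluding readers and other writers.
func (s *SafeMap) Set(k, v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

// Len returns the number of entries under the read lock.
func (s *SafeMap) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.m)
}

func main() {
	s := NewSafeMap()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Set(i, i*i)
		}(i)
	}
	wg.Wait()
	v, ok := s.Get(9)
	fmt.Println(s.Len(), v, ok) // 100 81 true
}
```

Keeping the map and its lock inside one type means callers cannot touch the map without going through the locked methods.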

Method 2: Use sync.Map from the standard library

sync.Map separates reads from writes, trading space for time. Compared with map+RWMutex it adds an optimization: the read map can be accessed without locking and is always tried first. If an operation (insert, delete, update, lookup, or traversal) can be satisfied by the read map alone, the write map, whose reads and writes must hold a lock, is not touched. In some specific scenarios, lock contention is therefore much lower than with map+RWMutex. The sync package documentation names two such scenarios: an entry for a given key is written once but read many times, and multiple goroutines read, write, and overwrite entries for disjoint sets of keys.

import (
    "sync"
    "testing"
)

func BenchmarkMapConcurrencySafeBySyncMap(b *testing.B) {
    var m sync.Map
    var wg sync.WaitGroup
    for i := 0; i < b.N; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            m.Store(i, i)
        }(i)
    }
    wg.Wait()
    b.Log(b.N)
}

Hash collision

A map in Golang is a collection of key/value pairs. It is built on a hash table and resolves collisions by chaining. When a collision occurs, the runtime does not allocate a separate linked-list node for every key; instead it chains at the granularity of a bmap, and one bmap holds 8 key/value pairs. As for the hash function, at program startup the runtime checks whether the CPU supports AES; if it does, it uses the AES hash, otherwise it uses memhash.

Summary

  1. map is a reference type

  2. map traversal is unordered

  3. map is not thread safe

  4. map resolves hash collisions by chaining (the linked list method)

  5. Growing a map does not necessarily add space; it may only compact the existing memory

  6. Map migration is incremental: each assignment performs at least one step of the migration

  7. Deleting keys from a map can leave many empty key/value slots, which can lead to a migration. Avoid it where you can.
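
The two growth triggers behind points 5 to 7 can be sketched as two predicates. The constants and function names below follow the pre-1.24 runtime source as I recall it and should be treated as assumptions rather than a verbatim copy.

```go
package main

import "fmt"

const (
	bucketCnt     = 8  // slots per bucket
	loadFactorNum = 13 // 13/2 = 6.5 entries per bucket on average
	loadFactorDen = 2
)

// overLoadFactor reports whether count entries spread over 2^B buckets
// exceed the 6.5 load factor, which leads to a doubling of the bucket array.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt &&
		uint64(count) > loadFactorNum*((uint64(1)<<B)/loadFactorDen)
}

// tooManyOverflowBuckets reports whether there are about as many overflow
// buckets as regular buckets, which leads to a same-size growth that only
// repacks the entries.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<B
}

func main() {
	fmt.Println(overLoadFactor(208, 5))        // false: 208 == 6.5 * 32
	fmt.Println(overLoadFactor(209, 5))        // true
	fmt.Println(tooManyOverflowBuckets(32, 5)) // true
}
```

The second predicate is the case from point 5: the map holds few entries, but they are scattered across many overflow buckets, so the runtime migrates them into a bucket array of the same size.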

Origin juejin.im/post/7082735541438906399