[go]map

map的实现细节

// makehmap_small implements Go map creation for make(map[k]v) and
// make(map[k]v, hint) when hint is known to be at most bucketCnt
// at compile time and the map needs to be allocated on the heap.
func makemap_small() *hmap {
    h := new(hmap)
    h.hash0 = fastrand()
    return h
}

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
    .....

    // initialize Hmap
    if h == nil {
        h = (*hmap)(newobject(t.hmap))
    }
    h.hash0 = fastrand()

    // find size parameter which will hold the requested # of elements
    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B
    
    ......
    return h
}

//初始化: 最重要的肯定就是确定一开始要有多少个桶，初始的大小还是很重要的，还有一些别的初始化哈希种子等等，问题不大。

// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin) (map的大小.  len()函数就取的这个值)
    flags     uint8  // (map状态标识)
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items) 
                     // 可以最多容纳 6.5 * 2 ^ B 个元素，6.5为装载因子即:map长度=6.5*2^B
                     // B可以理解为map扩容了多少次

    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details 
                     // (溢出buckets的数量)
    hash0     uint32 // hash seed
                     // hash 种子

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
                              // 指向最大2^B个Buckets数组的指针. count==0时为nil.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
                              //指向扩容之前的buckets数组，并且容量是现在一半.不增长就为nil
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)
                              // 搬迁进度，小于nevacuate的已经搬迁
    extra *mapextra           // optional fields
}

buckets这个参数，它存储的是指向buckets数组的一个指针，当bucket(桶为0时)为nil。
我们可以理解为，hmap指向了一个空bucket数组，

并且当bucket数组需要扩容时，它会开辟一倍的内存空间，并且会渐进式的把原数组拷贝，即用到旧数组的时候就拷贝到新数组。

// A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}

// Go map 的 buckets结构
type bmap struct {
    // 每个元素hash值的高8位，如果tophash[0] < minTopHash，表示这个桶的搬迁状态
    tophash [bucketCnt]uint8
   // 第二个是8个key、8个value，但是我们不能直接看到；为了优化对齐，go采用了key放在一起，value放在一起的存储方式，
   // 第三个是溢出时，下一个溢出桶的地址
}

bucket（桶），每一个bucket最多放8个key和value，最后由一个overflow字段指向下一个bmap，
注意key、value、overflow字段都不显示定义，而是通过maptype计算偏移获取的。

bucket工作机制

第一部分: tophash 存储的是哈希函数算出的哈希值的高八位。
    是用来加快索引的。
    因为把高八位存储起来，这样不用完整比较key就能过滤掉不符合的key，
    1. 加快查询速度当一个哈希值的高8位和存储的高8位相符合，
    2. 再去比较完整的key值，进而取出value。

第二部分，存储的是key 和value，就是我们传入的key和value，
    注意，它的底层排列方式是，
        key全部放在一起，value全部放在一起。
        
        当key大于128字节时，bucket的key字段存储的会是指针，指向key的实际内容；
        value也是一样。

        这样排列好处是在key和value的长度不同的时候，可以消除padding带来的空间浪费。
        
        并且每个bucket最多存放8个键值对。

第三部分，存储的是当bucket溢出时，指向的下一个bucket的指针

bucket的设计细节

在golang map中出现冲突时，
    不是每一个key都申请一个结构通过链表串起来，而是以bmap为最小粒度挂载，
    一个bmap可以放8个key和value。
    
    这样减少对象数量，减轻管理内存的负担，利于gc。

如果插入时，bmap中key超过8，
    那么就会申请一个新的bmap（overflow bucket）挂在这个bmap的后面形成链表，
    优先用预分配的overflow bucket，
    如果预分配的用完了，那么就malloc一个挂上去。
    
    注意golang的map不会shrink，内存只会越用越多，overflow bucket中的key全删了也不会释放

参考

get

计算出key的hash
用最后的“B”位来确定在哪个桶（“B”就是前面说的那个，B为4，就有16个桶，0101用十进制表示为5，所以在5号桶）
根据key的前8位快速确定是在哪个格子（额外说明一下，在bmap中存放了每个key对应的tophash，是key的前8位）
最终还是需要比对key完整的hash是否匹配，如果匹配则获取对应value
如果都没有找到，就去下一个overflow找

map由2个数据结构来存储数据.
第一个数据结构是一个数组, 内部存储的是用于选择桶的搞八位值, 这个数组用于区分每个键值对要存在哪个桶里
第二个数据结构是一个字节数组, 用于存放键值对. 该字节数组先一次存储了桶里所有key,再一次存储桶里所有的值.(节约内存)

总结一下：通过后B位确定桶，通过前8位确定格子，循环遍历连着的所有桶全部找完为止。
那么为什么要有这个tophash呢？因为tophash可以快速确定key是否正确，你可以把它理解成一种缓存措施，如果前8位都不对了，后面就没有必要比较了。

map的实现细节

bucket工作机制

get

猜你喜欢