[Go] Implementation principle of Map in Go

Table of contents

1. Implementation principle

2. Source code analysis


1. Implementation principle

In Go, map is implemented as a hash table. It is a data structure for storing key-value pairs. The value can be of any type, while the key must be of a comparable type (one that supports ==), which rules out slices, maps, and functions. Here is how map works in Go:

  1. Hash function: When a key-value pair is inserted into a map, Go uses a hash function to convert the key into a hash value. The low-order bits of that hash select the bucket in the internal bucket array where the pair is stored.

  2. Buckets and overflow chains: The underlying data structure of a map is an array of buckets. Each bucket holds up to 8 key-value pairs. Keys whose hashes select the same bucket are stored in that bucket's slots. Once all 8 slots are full, an overflow bucket is allocated and linked to the bucket, forming a chain of buckets. Go does not use per-entry linked lists, and it does not convert long chains into balanced binary trees.

  3. Collision resolution: Different keys may produce hash values that map to the same bucket, which is a collision. Go resolves collisions by placing the colliding pairs in different slots of the same bucket and, when the bucket is full, in its overflow buckets. When searching, inserting, or deleting, the runtime walks the bucket and its overflow chain, first comparing a one-byte fragment of the hash stored per slot and then comparing the full keys.

  4. Dynamic expansion: When a key-value pair is inserted and the load factor of the hash table exceeds the threshold (an average of 6.5 entries per bucket), or there are too many overflow buckets, growth is triggered. When the load factor is exceeded, Go allocates a bucket array twice as large; existing pairs are then moved to the new buckets incrementally, a few buckets at a time during later writes, rather than all at once. This reduces collisions and maintains lookup performance.

To sum up, map in Go uses a hash table as its underlying implementation. The hash function maps each key to a bucket in the bucket array, and collisions are handled with fixed-size buckets chained through overflow pointers. The map provides efficient lookup, insertion, and deletion, and it grows automatically when needed.

2. Source code analysis

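The source code of map is located in src/runtime/map.go. The listings below show the classic bucket-based implementation used before Go 1.24, which replaced it with a Swiss-table design.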

The map stores its data in an array, and each array element is a bucket. The bucket type is shown in the following code. Each bucket can store 8 key-value (kv) pairs. Once all 8 slots in a bucket are occupied, the bucket points to a new bucket through its overflow pointer, forming a linked list of buckets. The structure of bmap may look puzzling at first, because it declares neither the kv storage nor the overflow pointer. These fields are not declared explicitly in the struct definition; the runtime accesses them through pointer arithmetic.

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}
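
To make the "hidden fields" idea concrete, here is a small sketch of reaching keys and elems through pointer arithmetic. It assumes a map[int64]int8-shaped bucket; demoBucket, keyAt, and elemAt are hypothetical names, and the fields are declared explicitly only so the computed addresses can be checked against the real ones.

```go
package main

import (
	"fmt"
	"unsafe"
)

const slots = 8

// demoBucket spells out, as real fields, what the runtime leaves implicit
// after tophash. It is a stand-in for illustration, not the runtime's bmap.
type demoBucket struct {
	tophash  [slots]uint8
	keys     [slots]int64
	elems    [slots]int8
	overflow *demoBucket
}

// dataOffset is where the keys start: right after the tophash array.
const dataOffset = unsafe.Offsetof(demoBucket{}.keys)

// keyAt computes the address of key i the way the runtime does:
// bucket base + dataOffset + i*keysize.
func keyAt(b *demoBucket, i uintptr) *int64 {
	off := dataOffset + i*unsafe.Sizeof(int64(0))
	return (*int64)(unsafe.Add(unsafe.Pointer(b), off))
}

// elemAt skips all 8 keys first, then indexes into the elems.
func elemAt(b *demoBucket, i uintptr) *int8 {
	off := dataOffset + slots*unsafe.Sizeof(int64(0)) + i*unsafe.Sizeof(int8(0))
	return (*int8)(unsafe.Add(unsafe.Pointer(b), off))
}

func main() {
	b := &demoBucket{}
	*keyAt(b, 3) = 42 // write through the computed pointers
	*elemAt(b, 3) = 7
	fmt.Println(b.keys[3], b.elems[3])
	fmt.Println(keyAt(b, 3) == &b.keys[3], elemAt(b, 3) == &b.elems[3])
}
```

The offsets here mirror the expressions dataOffset+i*keysize and dataOffset+bucketCnt*keysize+i*elemsize that appear in mapassign.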

From the code and comments above, we can see how the kv pairs are stored in a bucket. tophash is used to quickly check whether a key might be in the bucket: the one-byte tophash values are compared first, and the full key is compared only when the tophash matches.
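
The filtering effect of tophash can be sketched as follows. This is a simplified model under stated assumptions: the bucket type and find method are hypothetical, the tophash bytes are hand-picked rather than derived from a real hash, and 0 is used to mark an empty slot.

```go
package main

import "fmt"

const slots = 8

// bucket is a simplified stand-in with explicit key and elem arrays.
type bucket struct {
	tophash [slots]uint8 // 0 marks an empty slot in this sketch
	keys    [slots]string
	elems   [slots]int
}

// find returns the element for key and how many full key comparisons it took.
func (b *bucket) find(top uint8, key string) (elem int, ok bool, compares int) {
	for i := 0; i < slots; i++ {
		if b.tophash[i] != top {
			continue // cheap one-byte mismatch: skip the full key comparison
		}
		compares++
		if b.keys[i] == key {
			return b.elems[i], true, compares
		}
	}
	return 0, false, compares
}

func main() {
	b := &bucket{}
	b.tophash[0], b.keys[0], b.elems[0] = 0x11, "alpha", 1
	b.tophash[1], b.keys[1], b.elems[1] = 0x22, "beta", 2
	b.tophash[2], b.keys[2], b.elems[2] = 0x33, "gamma", 3
	fmt.Println(b.find(0x33, "gamma"))
}
```

In this example only one full string comparison is needed to find "gamma", because the other two slots are rejected by the one-byte check.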

Then there is the kv layout: why k1k2...v1v2... rather than k1v1, k2v2...? Consider the map[int64]int8 from the comment above. The key is an int64 (8 bytes) and the value is an int8 (1 byte), so the two have different sizes. In an interleaved k/v layout, memory alignment would pad each value with 7 bytes so that the next key stays 8-byte aligned, making every pair occupy 16 bytes. In the packed layout, the 8 values together occupy just 8 bytes, the size of a single int64. This shows how carefully Go's map layout is designed.

Finally, let's look at the overall memory layout of the map before reading the assignment source code. When a kv pair is stored in the map, a hash value is computed from the key. The low-order B bits of the hash locate the bucket's index in the array; because the array holds 2^B buckets, this is the same as taking the hash modulo the array length. The high eight bits of the hash are stored in the bucket's tophash array, which is used to quickly determine whether a key might be present, and the key and value themselves are stored at offsets computed by pointer arithmetic. When a bucket is full, it links to the next bucket through the overflow pointer.
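
How one hash value is split into a bucket index and a tophash byte can be sketched like this. The function names bucketIndex and topHash are illustrative, a 64-bit hash is assumed, and minTopHash = 5 reflects my reading of the runtime constant (smaller tophash values are reserved as slot-state markers).

```go
package main

import "fmt"

// minTopHash: tophash values below this are reserved for slot-state markers.
// Assumed here to be 5, matching the runtime constant as I understand it.
const minTopHash = 5

// bucketIndex keeps the low-order B bits of the hash (2^B buckets).
func bucketIndex(hash uint64, B uint8) uint64 {
	return hash & (uint64(1)<<B - 1)
}

// topHash takes the top byte of a 64-bit hash and shifts it out of the
// reserved range if necessary.
func topHash(hash uint64) uint8 {
	top := uint8(hash >> 56)
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

func main() {
	hash := uint64(0xAB00000000000013)
	fmt.Println(bucketIndex(hash, 4)) // low 4 bits of 0x13
	fmt.Printf("%#x\n", topHash(hash))
}
```

With B = 4 there are 16 buckets, so the low 4 bits of 0x13 select bucket 3, while the top byte 0xAB becomes the slot's tophash.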

The source code for map assignment in Go is as follows:

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	if h == nil {
		panic(plainError("assignment to entry in nil map"))
	}
	if raceenabled {
		callerpc := getcallerpc()
		pc := abi.FuncPCABIInternal(mapassign)
		racewritepc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled {
		msanread(key, t.key.size)
	}
	if asanenabled {
		asanread(key, t.key.size)
	}
	if h.flags&hashWriting != 0 {
		fatal("concurrent map writes")
	}
	hash := t.hasher(key, uintptr(h.hash0))

	// Set hashWriting after calling t.hasher, since t.hasher may panic,
	// in which case we have not actually done a write.
	h.flags ^= hashWriting

	if h.buckets == nil {
		h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
	}

again:
	bucket := hash & bucketMask(h.B)
	if h.growing() {
		growWork(t, h, bucket)
	}
	b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
	top := tophash(hash)

	var inserti *uint8
	var insertk unsafe.Pointer
	var elem unsafe.Pointer
bucketloop:
	for {
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if isEmpty(b.tophash[i]) && inserti == nil {
					inserti = &b.tophash[i]
					insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
					elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				}
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if !t.key.equal(key, k) {
				continue
			}
			// already have a mapping for key. Update it.
			if t.needkeyupdate() {
				typedmemmove(t.key, k, key)
			}
			elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
			goto done
		}
		ovf := b.overflow(t)
		if ovf == nil {
			break
		}
		b = ovf
	}

	// Did not find mapping for key. Allocate new cell & add entry.

	// If we hit the max load factor or we have too many overflow buckets,
	// and we're not already in the middle of growing, start growing.
	if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
		hashGrow(t, h)
		goto again // Growing the table invalidates everything, so try again
	}

	if inserti == nil {
		// The current bucket and all the overflow buckets connected to it are full, allocate a new one.
		newb := h.newoverflow(t, b)
		inserti = &newb.tophash[0]
		insertk = add(unsafe.Pointer(newb), dataOffset)
		elem = add(insertk, bucketCnt*uintptr(t.keysize))
	}

	// store new key/elem at insert position
	if t.indirectkey() {
		kmem := newobject(t.key)
		*(*unsafe.Pointer)(insertk) = kmem
		insertk = kmem
	}
	if t.indirectelem() {
		vmem := newobject(t.elem)
		*(*unsafe.Pointer)(elem) = vmem
	}
	typedmemmove(t.key, insertk, key)
	*inserti = top
	h.count++

done:
	if h.flags&hashWriting == 0 {
		fatal("concurrent map writes")
	}
	h.flags &^= hashWriting
	if t.indirectelem() {
		elem = *((*unsafe.Pointer)(elem))
	}
	return elem
}
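
Two branches of mapassign are directly visible from ordinary Go code: the nil-map panic at the top and the "already have a mapping for key" update path. The following sketch exercises both; assignToNilMap is a hypothetical helper name. The "concurrent map writes" check is not demonstrated because it is a fatal error that cannot be recovered.

```go
package main

import "fmt"

// assignToNilMap triggers the h == nil branch of mapassign and returns
// the recovered panic message.
func assignToNilMap() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	var m map[string]int // nil map: reads are allowed, writes panic
	m["a"] = 1
	return "no panic"
}

func main() {
	fmt.Println(assignToNilMap())

	m := make(map[string]int)
	m["a"] = 1 // key absent: a new slot is allocated and the count grows
	m["a"] = 2 // key present: the existing slot is updated, count unchanged
	fmt.Println(len(m), m["a"])
}
```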

Origin blog.csdn.net/fanjufei123456/article/details/132016865