On the face of it, the map traversal process should be relatively simple: traverse all buckets and the overflow buckets hanging behind them, then go through the cells of each bucket one by one. Each bucket contains 8 cells; for every cell that holds a key, take out the key and value, and the process is complete.
However, reality is not that simple. Remember the expansion process mentioned earlier? Expansion is not an atomic operation; it moves at most 2 buckets at a time. So once expansion is triggered, the map stays in an intermediate state for a long time: some buckets have been moved to their new homes, while others are still in their old place.
Therefore, if traversal happens during expansion, it has to deal with both old and new buckets, and that is where the difficulty lies.
Let me first write a simple code sample, pretending not to know which functions the traversal actually calls:
package main

import "fmt"

func main() {
	ageMp := make(map[string]int)
	ageMp["qcrao"] = 18

	for name, age := range ageMp {
		fmt.Println(name, age)
	}
}
Run the following command:
go tool compile -S main.go
This prints the assembly code. I won't explain it line by line here; the previous articles cover it in detail.
The key lines of assembly code are as follows:
// ......
0x0124 00292 (test16.go:9) CALL runtime.mapiterinit(SB)
// ......
0x01fb 00507 (test16.go:9) CALL runtime.mapiternext(SB)
0x0200 00512 (test16.go:9) MOVQ ""..autotmp_4+160(SP), AX
0x0208 00520 (test16.go:9) TESTQ AX, AX
0x020b 00523 (test16.go:9) JNE 302
// ......
This makes the call chain behind map iteration clear at a glance: the mapiterinit function is called first to initialize the iterator, and then the mapiternext function is called in a loop to iterate over the map.
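To make the shape of that loop concrete, here is a toy model of the init/next pattern. It is only a sketch: toyIter, toyIterInit, and toyIterNext are hypothetical names that iterate over a slice, not the runtime's real functions. The one thing it borrows from the assembly above is the loop condition, which tests whether the key pointer is still non-nil after each call.

```go
package main

import "fmt"

type pair struct {
	key   string
	value int
}

// toyIter is a hypothetical stand-in for hiter: key becomes nil once the
// iteration is exhausted, and that is the condition the loop tests.
type toyIter struct {
	items []pair
	pos   int
	key   *string
	value *int
}

// toyIterInit sets up the iterator and positions it on the first element.
func toyIterInit(items []pair, it *toyIter) {
	it.items = items
	it.pos = 0
	toyIterNext(it)
}

// toyIterNext advances to the next element, or clears key when done.
func toyIterNext(it *toyIter) {
	if it.pos >= len(it.items) {
		it.key, it.value = nil, nil
		return
	}
	it.key = &it.items[it.pos].key
	it.value = &it.items[it.pos].value
	it.pos++
}

// collect has the same shape as a lowered range loop: init once, then
// call next for as long as the key pointer is non-nil.
func collect(items []pair) []string {
	var out []string
	var it toyIter
	for toyIterInit(items, &it); it.key != nil; toyIterNext(&it) {
		out = append(out, fmt.Sprintf("%s=%d", *it.key, *it.value))
	}
	return out
}

func main() {
	fmt.Println(collect([]pair{{"qcrao", 18}, {"x", 7}})) // [qcrao=18 x=7]
}
```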
Iterator structure definition:
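type hiter struct {
	// pointer to the key
	key unsafe.Pointer
	// pointer to the value
	value unsafe.Pointer
	// map type, holding information such as the key size
	t *maptype
	// map header
	h *hmap
	// the bucket array pointed to at initialization
	buckets unsafe.Pointer
	// the bmap currently being traversed
	bptr *bmap
	overflow [2]*[]*bmap
	// number of the bucket where traversal starts
	startBucket uintptr
	// number of the cell where traversal starts (each bucket has 8 cells)
	offset uint8
	// whether traversal has already wrapped around to the beginning
	wrapped bool
	// the value of B
	B uint8
	// number of the current cell
	i uint8
	// the current bucket
	bucket uintptr
	// bucket that needs to be checked because of expansion
	checkBucket uintptr
}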
mapiterinit initializes and assigns values to the fields of the hiter structure.
As mentioned before, even when a hard-coded map is traversed, the results come out in a different order each time. Below we take a closer look at how this is implemented.
// generate a random number r
r := uintptr(fastrand())
if h.B > 31-bucketCntBits {
	r += uintptr(fastrand()) << 31
}
// which bucket to start traversing from
it.startBucket = r & (uintptr(1)<<h.B - 1)
// which cell of the bucket to start traversing from
it.offset = uint8(r >> h.B & (bucketCnt - 1))
For example, when B = 2, uintptr(1)<<h.B - 1 evaluates to 3, whose lower 8 bits are 0000 0011. ANDing r with it yields a bucket number in the range 0~3. bucketCnt - 1 equals 7, whose lower 8 bits are 0000 0111. Shifting r right by 2 bits and ANDing the result with 7 yields a cell number in the range 0~7.
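The two masking steps can be tried out in isolation. The following is a minimal sketch: startPosition is a hypothetical helper name, and the value of r is picked by hand to stand in for fastrand().

```go
package main

import "fmt"

const (
	bucketCntBits = 3                  // log2 of the number of cells per bucket
	bucketCnt     = 1 << bucketCntBits // 8 cells per bucket
)

// startPosition applies the same two masks as the snippet above: the low
// B bits of r pick the bucket, and the next 3 bits pick the cell.
func startPosition(r uintptr, B uint8) (startBucket uintptr, offset uint8) {
	startBucket = r & (uintptr(1)<<B - 1)
	offset = uint8(r >> B & (bucketCnt - 1))
	return startBucket, offset
}

func main() {
	// r = 0b1011 is an arbitrary value standing in for fastrand().
	bucket, cell := startPosition(0b1011, 2)
	fmt.Println(bucket, cell) // 3 2
}
```

With r = 0b1011 and B = 2, the low two bits (11) select bucket 3 and the next bits (10) select cell 2.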
Therefore, in the mapiternext function, the traversal starts from cell number it.offset of bucket it.startBucket and takes out keys and values until it comes back to the starting bucket, which completes the traversal.
The source code is relatively easy to follow, especially once you have worked through the commented code in the previous sections. So next I will walk through the entire traversal process with diagrams, which I hope makes it clear and easy to understand.
Suppose we have a map as shown in the figure below. Initially, B = 1 and there are two buckets. Later, expansion is triggered (never mind the expansion conditions here; it is just a setup), and B becomes 2. Moreover, the contents of old bucket No. 1 have already been moved to the new buckets: old No. 1 has split into new No. 1 and new No. 3. Old bucket No. 0 has not been moved yet. The old buckets hang off the *oldbuckets pointer, and the new buckets hang off the *buckets pointer.
At this time, we traverse this map. Assume that after initialization, startBucket = 3, offset = 2. Therefore, the starting point of the traversal will be cell No. 2 of bucket No. 3. The following picture is the state when the traversal starts:
The one marked in red indicates the starting position, and the bucket traversal order is: 3 -> 0 -> 1 -> 2.
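The wrap-around order can be reproduced with a few lines of Go. This is a simplified sketch: bucketOrder is a hypothetical helper, and it only models the bucket numbering together with a wrapped flag like the one in hiter, not the cells inside each bucket.

```go
package main

import "fmt"

// bucketOrder lists the order in which the new buckets are visited: start
// at startBucket, advance by one, wrap to 0 after the last bucket, and
// stop once the walk comes back around to the start.
func bucketOrder(startBucket uintptr, B uint8) []uintptr {
	n := uintptr(1) << B
	order := make([]uintptr, 0, n)
	bucket := startBucket
	wrapped := false
	for {
		if bucket == startBucket && wrapped {
			break // back at the starting bucket: iteration is done
		}
		order = append(order, bucket)
		bucket++
		if bucket == n {
			bucket = 0
			wrapped = true
		}
	}
	return order
}

func main() {
	fmt.Println(bucketOrder(3, 2)) // [3 0 1 2]
}
```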
Because bucket No. 3 corresponds to the old bucket No. 1, first check whether the old bucket No. 1 has been relocated. The judgment method is:
func evacuated(b *bmap) bool {
	h := b.tophash[0]
	return h > empty && h < minTopHash
}
If the value of b.tophash[0] is within the flag value range, that is, in the (0,4) interval, it means that it has been relocated.
empty = 0
evacuatedEmpty = 1
evacuatedX = 2
evacuatedY = 3
minTopHash = 4
In this example, the old bucket No. 1 has been moved. So its tophash[0] value is in the range of (0,4), so only the new bucket No. 3 needs to be traversed.
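Here is a self-contained version of that check for experimenting. It is a sketch under one assumption: the bucket type is a simplified stand-in for bmap that keeps only the tophash array, and the value 0x9c is just an arbitrary ordinary hash byte.

```go
package main

import "fmt"

// Flag values from the listing above. tophash values below minTopHash are
// reserved as markers and are never real hash bytes.
const (
	empty          = 0
	evacuatedEmpty = 1
	evacuatedX     = 2
	evacuatedY     = 3
	minTopHash     = 4
)

// bucket is a simplified stand-in for bmap (assumption: only tophash kept).
type bucket struct {
	tophash [8]uint8
}

// evacuated reports whether the bucket's first tophash slot holds one of
// the relocation markers, i.e. a value strictly between empty and minTopHash.
func evacuated(b *bucket) bool {
	h := b.tophash[0]
	return h > empty && h < minTopHash
}

func main() {
	moved := &bucket{tophash: [8]uint8{evacuatedX}}
	live := &bucket{tophash: [8]uint8{0x9c}}
	fmt.Println(evacuated(moved), evacuated(live)) // true false
}
```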
Traverse the cells of bucket No. 3 in sequence, and you will find the first non-empty key: element e. At this point, the mapiternext function returns, and our traversal result only has one element:
Since the returned key is not empty, the mapiternext function will continue to be called.
Continue to traverse from the last traversed place, and find element f and element g from the new overflow bucket No. 3.
The traversal result set also grows:
After traversing the new bucket No. 3, the traversal wraps around to the new bucket No. 0. New bucket No. 0 corresponds to the old bucket No. 0. A check shows that the old bucket No. 0 has not been relocated, so instead of traversing the new bucket No. 0 we traverse the old bucket No. 0. Does that mean taking out all the keys in the old bucket No. 0?
It's not that simple. Recall that the old bucket No. 0 will split into two buckets after relocation: new No. 0 and new No. 2. What we are traversing at this point is only the new bucket No. 0 (note that traversal always follows the *buckets pointer, that is, the so-called new buckets). Therefore, we only take out those keys in the old bucket No. 0 that will be allocated to the new bucket No. 0 after the split.
Therefore, the keys with lowbits == 00 enter the traversal result set:
Like the previous process, continue to traverse the new bucket No. 1 and find that the old bucket No. 1 has been moved. You only need to traverse the existing elements in the new bucket No. 1. The result set becomes:
Continue to traverse the new bucket No. 2, which comes from the old bucket No. 0, so we need the keys in the old bucket No. 0 that will split off into the new bucket No. 2, that is, the keys with lowbits == 10.
In this way, the traversal result set becomes:
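The filtering step for a not-yet-moved old bucket can be sketched as follows. The entry type, the keysFor helper, and the key names and hash values are all made up for illustration; the only thing taken from the walkthrough is the rule that a key is emitted when the low B bits of its hash equal the number of the new bucket being traversed.

```go
package main

import "fmt"

// entry pairs a key with its hash. The hashes below are invented for the
// example; the runtime computes them with the map's hash function.
type entry struct {
	key  string
	hash uintptr
}

// keysFor returns the keys of an unmoved old bucket that will land in new
// bucket checkBucket once the map has 1<<B buckets.
func keysFor(old []entry, checkBucket uintptr, B uint8) []string {
	mask := uintptr(1)<<B - 1
	var keys []string
	for _, e := range old {
		if e.hash&mask == checkBucket {
			keys = append(keys, e.key)
		}
	}
	return keys
}

func main() {
	oldBucket0 := []entry{
		{"k1", 0b100}, // low bits 00
		{"k2", 0b110}, // low bits 10
		{"k3", 0b010}, // low bits 10
		{"k4", 0b000}, // low bits 00
	}
	fmt.Println(keysFor(oldBucket0, 0, 2)) // [k1 k4]
	fmt.Println(keysFor(oldBucket0, 2, 2)) // [k2 k3]
}
```

Every key is emitted exactly once: either while visiting new bucket No. 0 or while visiting new bucket No. 2, never both.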
Finally, when the traversal continues to the new bucket No. 3, it is found that all buckets have been traversed and the entire iteration process is completed.
By the way, if you encounter a key such as math.NaN(), the processing is similar. The core question is still which bucket it falls into after the split; for such keys you just look at the lowest bit of its tophash. If the lowest bit of the tophash is 0, the key is assigned to the X part; if it is 1, it is assigned to the Y part. Based on this, decide whether to take out the key and put it in the traversal result set.
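The X/Y decision for such a key can be written down in one comparison. A NaN never compares equal to itself, so its destination cannot be derived by rehashing the key, which is why the stored tophash bit is used instead. The sketch below assumes a doubling expansion, so the X half holds bucket numbers below 1<<(B-1) and the Y half holds the rest; nanBelongsTo is a hypothetical helper name and the tophash bytes are arbitrary.

```go
package main

import "fmt"

// nanBelongsTo reports whether a NaN key with the given tophash byte should
// be emitted while traversing new bucket checkBucket of a map that now has
// 1<<B buckets. Assumes B >= 1 and that the map is doubling in size.
func nanBelongsTo(tophash uint8, checkBucket uintptr, B uint8) bool {
	half := checkBucket >> (B - 1) // 0 for the X half, 1 for the Y half
	return half == uintptr(tophash&1)
}

func main() {
	// Lowest tophash bit 0: the key goes to X (new bucket 0, not 2).
	fmt.Println(nanBelongsTo(0x9c, 0, 2), nanBelongsTo(0x9c, 2, 2)) // true false
	// Lowest tophash bit 1: the key goes to Y (new bucket 2, not 0).
	fmt.Println(nanBelongsTo(0x9d, 0, 2), nanBelongsTo(0x9d, 2, 2)) // false true
}
```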
The core of map traversal is understanding that when the capacity doubles, each old bucket splits into two new buckets. Traversal proceeds in the order of the new buckets' numbers; when the corresponding old bucket has not yet been moved, the keys that will later move to that new bucket must be found in the old bucket.
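To tie the walkthrough together, here is a toy simulation of the whole pass. It is a sketch, not the runtime's code: growingMap and its fields are hypothetical, overflow buckets and the in-bucket starting offset are ignored, and apart from e, f, and g the key names and all hash values are invented. The layout matches the example: B = 2, old bucket No. 1 already moved, old bucket No. 0 not yet moved.

```go
package main

import "fmt"

type entry struct {
	key  string
	hash uintptr
}

// growingMap is a toy model of a map in mid-expansion: oldBuckets has half
// as many buckets as newBuckets, and evacuated[i] records whether old
// bucket i has already been moved.
type growingMap struct {
	B          uint8
	oldBuckets [][]entry
	newBuckets [][]entry
	evacuated  []bool
}

// iterate visits the new buckets in wrap-around order from startBucket.
// If the matching old bucket has not been moved, it reads the old bucket
// and keeps only the keys destined for the new bucket being visited.
func (m *growingMap) iterate(startBucket uintptr) []string {
	n := uintptr(1) << m.B
	oldMask := n/2 - 1
	var out []string
	for i := uintptr(0); i < n; i++ {
		b := (startBucket + i) % n
		old := b & oldMask
		if m.evacuated[old] {
			for _, e := range m.newBuckets[b] {
				out = append(out, e.key)
			}
			continue
		}
		for _, e := range m.oldBuckets[old] {
			if e.hash&(n-1) == b {
				out = append(out, e.key)
			}
		}
	}
	return out
}

func main() {
	m := &growingMap{
		B: 2,
		oldBuckets: [][]entry{
			{{"a", 0b000}, {"b", 0b010}, {"c", 0b100}, {"d", 0b110}}, // old No. 0, not moved
			nil, // old No. 1, already moved
		},
		newBuckets: [][]entry{
			nil,            // new No. 0, still empty
			{{"h", 0b001}}, // new No. 1
			nil,            // new No. 2, still empty
			{{"e", 0b011}, {"f", 0b111}, {"g", 0b1011}}, // new No. 3
		},
		evacuated: []bool{false, true},
	}
	fmt.Println(m.iterate(3)) // [e f g a c h b d]
}
```

The output follows the bucket order 3 -> 0 -> 1 -> 2: e, f, g come from the new bucket No. 3; a and c are the lowbits == 00 keys read from the old bucket No. 0; h comes from the new bucket No. 1; b and d are the lowbits == 10 keys, again read from the old bucket No. 0.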