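Go has basic types, aggregate types, reference types, and interface types. Basic types include integers, floating-point numbers, booleans, and strings; aggregate types include arrays and structs; reference types include pointers, slices, maps, functions, and channels. This article covers the underlying representation and mechanics of some of the aggregate, reference, and interface types.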

Empty struct

An empty struct has a size of 0 bytes; its main use is to save memory.

To build a hash set, use the empty struct as the map's value type:

type hashSet map[string]struct{}

aHashSet := hashSet{
    "a": struct{}{},
}
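As a slightly fuller sketch of the same idea, the program below wraps the map in a small set type. The names StringSet, Add, Remove, and Contains are hypothetical helpers chosen for illustration, not part of any library.

```go
package main

import "fmt"

// StringSet is a hypothetical set type; the empty struct values take no space.
type StringSet map[string]struct{}

// Add inserts v; adding the same value twice keeps a single element.
func (s StringSet) Add(v string) { s[v] = struct{}{} }

// Remove deletes v if it is present.
func (s StringSet) Remove(v string) { delete(s, v) }

// Contains reports whether v is in the set.
func (s StringSet) Contains(v string) bool {
	_, ok := s[v]
	return ok
}

func main() {
	set := StringSet{}
	set.Add("a")
	set.Add("a")
	set.Add("b")
	set.Remove("b")
	fmt.Println(set.Contains("a"), set.Contains("b"), len(set)) // true false 1
}
```

Only the keys carry information here, so membership is simply "is the key present".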

When a channel should serve as a pure signal that carries no data:

myChannel := make(chan struct{})
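A common way to use such a signal-only channel is as a "done" notification. The sketch below assumes a hypothetical helper, runAndWait, that starts a goroutine and blocks until the goroutine closes the channel.

```go
package main

import "fmt"

// runAndWait runs work in a goroutine and waits for it to finish.
// Closing a chan struct{} signals completion without sending any data.
func runAndWait(work func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		work()
	}()
	<-done // unblocks once the goroutine closes the channel
}

func main() {
	result := 0
	runAndWait(func() { result = 42 })
	fmt.Println(result) // 42
}
```

The close happens before the receive returns, so the write to result is visible to main.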

package main

import (
    "fmt"
    "unsafe"
)

type K struct{}

func main() {
    c := K{}
    d := K{}
    fmt.Println(unsafe.Sizeof(c)) // 0
    fmt.Println(unsafe.Sizeof(d)) // 0
}

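When an empty struct stands alone, that is, it is not a field of another struct, pointers to such values typically point to the same address: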

fmt.Printf("%p", &c) // 0xf3a418
fmt.Printf("%p", &d) // 0xf3a418

This address is called zerobase:

// base address for all 0-byte allocations
var zerobase uintptr

The non-standalone case, for example:

type Test struct {
    kInstance K
    num       int
}

t := &Test{}
fmt.Printf("%p", &t.kInstance) // 0xc0000a6078
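To see what the non-standalone case means for layout, here is a small sketch. The struct names leading and trailing are made up for illustration, and the size behavior shown is what the standard gc toolchain does: an empty field in front adds nothing, while an empty field at the end causes padding so that its address stays inside the struct.

```go
package main

import (
	"fmt"
	"unsafe"
)

type K struct{}

// leading has the empty struct as its first field.
type leading struct {
	k   K
	num int
}

// trailing has the empty struct as its last field.
type trailing struct {
	num int
	k   K
}

// sizes returns the sizes of both structs and of a plain int.
func sizes() (lead, trail, word uintptr) {
	return unsafe.Sizeof(leading{}), unsafe.Sizeof(trailing{}), unsafe.Sizeof(int(0))
}

// leadingFieldAtStructAddress reports whether the empty first field
// has the same address as the struct that contains it.
func leadingFieldAtStructAddress() bool {
	t := &leading{}
	return unsafe.Pointer(&t.k) == unsafe.Pointer(t)
}

func main() {
	lead, trail, word := sizes()
	fmt.Println(lead == word, trail > word, leadingFieldAtStructAddress())
}
```

So inside a struct the empty field gets an address within its parent rather than zerobase, which matches the address printed for t.kInstance.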

Strings

In Go, operations on a string actually operate on the stringStruct struct:

type stringStruct struct {
    str unsafe.Pointer // points to the underlying byte array
    len int            // length of the byte array, not the number of characters
}

On a 64-bit platform the size of a string is 16 bytes, because unsafe.Pointer takes 8 bytes and int takes 8 bytes:

fmt.Println(unsafe.Sizeof("haha")) // 16

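For how struct sizes are calculated, see the notes on type memory alignment.

Note:

In Go, int8 is 1 byte, int16 is 2 bytes, int32 is 4 bytes, int64 is 8 bytes, and int is one machine word.
len() on a string returns the number of bytes, not the number of characters.
Indexing a string directly yields a byte.
Ranging over a string decodes it into rune values.

To cut a string by characters, first convert it to a rune slice and then slice it: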

s := "abcdefg"
fmt.Printf("%c", []rune(s)[:2])
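The difference between bytes and runes only shows up with non-ASCII text. The sketch below uses a mixed string to illustrate the notes above; firstRunes is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes returns the first n characters (runes) of s,
// or all of s if it has fewer than n characters.
func firstRunes(s string, n int) string {
	r := []rune(s)
	if n < 0 {
		n = 0
	}
	if n > len(r) {
		n = len(r)
	}
	return string(r[:n])
}

func main() {
	s := "Go语言"
	fmt.Println(len(s))                    // 8: byte count
	fmt.Println(utf8.RuneCountInString(s)) // 4: character count
	fmt.Println(s[2])                      // 232: one byte of a multi-byte character
	for i, r := range s {                  // i is the byte offset where each rune starts
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println()
	fmt.Println(firstRunes(s, 3)) // Go语
}
```

Slicing the string directly with s[:3] would cut the third character in the middle of its UTF-8 encoding, which is why the conversion to []rune comes first.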

Slices

In Go, operations on a slice actually operate on the slice struct:

type slice struct {
    array unsafe.Pointer
    len   int
    cap   int
}

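When a slice is created from an array, len() is the number of elements in the slice and cap() is the number of elements from the slice's start position to the end of the array arr:

arr := [6]int{1, 2, 3, 4, 5, 6}
mySlice := arr[1:3] // len 2, cap 5

fmt.Println(unsafe.Sizeof(mySlice)) // 24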
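Because the slice header only points into the array, the slice and the array share storage. The following sketch makes that visible; window is a hypothetical helper that returns the slice together with its length and capacity.

```go
package main

import "fmt"

// window returns arr[lo:hi] along with its length and capacity.
func window(arr *[6]int, lo, hi int) ([]int, int, int) {
	s := arr[lo:hi]
	return s, len(s), cap(s)
}

func main() {
	arr := [6]int{1, 2, 3, 4, 5, 6}
	s, l, c := window(&arr, 1, 3)
	fmt.Println(s, l, c) // [2 3] 2 5

	s[0] = 20        // writes through to arr[1]
	fmt.Println(arr) // [1 20 3 4 5 6]
}
```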

When a slice is created from a literal, the backing array is created first and then the slice struct:

slice1 := []int{1, 2, 3}

When a slice is created with make, the runtime function makeslice is called.

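Appending to a slice

If the slice has not yet filled its capacity, append places the new element right after the last one.
When the length equals the capacity, append calls runtime.growslice(), which allocates a new backing array (by default twice the old capacity), copies the elements, and then appends.
If the required capacity is more than twice the current capacity, the required capacity is used.
Otherwise, before Go 1.18, a capacity below 1024 is doubled and a capacity of 1024 or more grows by 25% at a time. Since Go 1.18 the threshold is 256, and above it the growth factor eases gradually from 2x toward 1.25x.
In either case the result is then rounded up to fit an allocator size class.
Appending is not safe for concurrent use; guard a shared slice with a lock.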
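The two append paths can be told apart by checking whether the backing array moved. This is a minimal sketch; appendReusesArray is a hypothetical helper, and the exact capacity after growth is deliberately not asserted because it depends on the Go version.

```go
package main

import "fmt"

// appendReusesArray reports whether appending v to s kept the same backing array.
func appendReusesArray(s []int, v int) bool {
	grown := append(s, v)
	// If the first elements share an address, no reallocation happened.
	return len(s) > 0 && &grown[0] == &s[0]
}

func main() {
	roomy := make([]int, 3, 8) // len 3, cap 8: there is room left
	full := make([]int, 3, 3)  // len == cap: append has to grow

	fmt.Println(appendReusesArray(roomy, 1)) // true
	fmt.Println(appendReusesArray(full, 1))  // false

	grown := append(full, 1)
	fmt.Println(cap(grown) >= 4) // true
}
```

The first case is also why two slices that share an array can overwrite each other's elements through append.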

Map

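Go's map is a hash table that resolves collisions by chaining. The structures below are from the bucket-based runtime implementation that predates the Swiss-table map introduced in Go 1.24.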

// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra // optional fields
}

The bucket data structure is:

// A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}

Each bucket holds three arrays: tophash, keys, and elems. tophash has bucketCnt entries, each recording the top 8 bits of the corresponding key's hash; the bucketCnt keys and the bucketCnt elems are stored in two separate arrays (keys, elems).

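If a bucket needs to hold more than bucketCnt entries, the overflow pointer at its end links to another bmap (an overflow bucket). Preallocated overflow buckets that are still free are tracked by hmap.extra.nextOverflow.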
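How one hash value serves both purposes can be sketched as follows. This is a simplified model that assumes a 64-bit hash: the low B bits select the bucket and the top 8 bits become the tophash entry. The function names bucketIndex and topHash are illustrative; the adjustment by minTopHash mirrors the runtime, which reserves small tophash values as state markers.

```go
package main

import "fmt"

// Values below minTopHash are reserved for bucket state markers.
const minTopHash = 5

// bucketIndex selects a bucket using the low B bits of the hash.
func bucketIndex(hash uint64, B uint8) uint64 {
	return hash & (uint64(1)<<B - 1)
}

// topHash returns the top 8 bits of the hash, moved out of the reserved range.
func topHash(hash uint64) uint8 {
	top := uint8(hash >> 56)
	if top < minTopHash {
		top += minTopHash
	}
	return top
}

func main() {
	hash := uint64(0xAB00000000000013)
	fmt.Println(bucketIndex(hash, 3), topHash(hash)) // 3 171
}
```

During a lookup the one-byte tophash comparison filters out most slots before any full key comparison is needed.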

When a map is created with make, runtime.makemap() is called internally:

// create a map with make
myMap := make(map[string]int, 10)

// in map.go
func makemap(t *maptype, hint int, h *hmap) *hmap {
    mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
    if overflow || mem > maxAlloc {
        hint = 0
    }

    // initialize the hmap
    if h == nil {
        h = new(hmap)
    }
    h.hash0 = fastrand()

    // compute B from hint; B determines the number of buckets (2^B)
    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B

    if h.B != 0 {
        var nextOverflow *bmap

        // allocate the array that holds the buckets
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        // record the preallocated overflow buckets
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
}
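The loop that derives B from hint can be tried in isolation. The sketch below re-creates overLoadFactor outside the runtime with the constants used by this map implementation (8 slots per bucket, load factor 6.5 written as 13/2); bucketBits is a hypothetical wrapper around the loop.

```go
package main

import "fmt"

const (
	bucketCnt     = 8  // slots per bucket
	loadFactorNum = 13 // load factor 6.5 expressed as 13/2
	loadFactorDen = 2
)

// overLoadFactor reports whether count items in 1<<B buckets exceed the load factor.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uint64(count) > loadFactorNum*((uint64(1)<<B)/loadFactorDen)
}

// bucketBits returns the smallest B for which hint items stay within the load factor.
func bucketBits(hint int) uint8 {
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	return B
}

func main() {
	for _, hint := range []int{8, 10, 100} {
		B := bucketBits(hint)
		fmt.Printf("hint=%d B=%d buckets=%d\n", hint, B, uint64(1)<<B)
	}
	// hint=8 B=0 buckets=1
	// hint=10 B=1 buckets=2
	// hint=100 B=4 buckets=16
}
```

A hint of 8 or less fits in a single bucket, so B stays 0 and makemap leaves the bucket array to be allocated lazily.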

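Map growth

A map grows when it becomes too full or accumulates too many overflow buckets.
runtime.mapassign() may trigger growth in two cases:
- the load factor exceeds 6.5
- there are too many overflow buckets (roughly as many as regular buckets)
There are two kinds of growth:
- same-size growth
- doubling growth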
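The two triggers and the two kinds of growth fit together as in the sketch below. The checks are modeled on the runtime's overLoadFactor and tooManyOverflowBuckets; growthKind is a hypothetical function that only names the decision, it does not perform any growth.

```go
package main

import "fmt"

const (
	bucketCnt     = 8
	loadFactorNum = 13 // load factor 6.5 expressed as 13/2
	loadFactorDen = 2
)

// overLoadFactor reports whether count items in 1<<B buckets exceed the load factor.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uint64(count) > loadFactorNum*((uint64(1)<<B)/loadFactorDen)
}

// tooManyOverflowBuckets reports whether there are about as many
// overflow buckets as regular buckets (the comparison is capped at 1<<15).
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= uint16(1)<<B
}

// growthKind names the growth an insert would trigger:
// "double", "same-size", or "none".
func growthKind(count int, B uint8, noverflow uint16) string {
	switch {
	case overLoadFactor(count+1, B):
		return "double"
	case tooManyOverflowBuckets(noverflow, B):
		return "same-size"
	default:
		return "none"
	}
}

func main() {
	fmt.Println(growthKind(13, 1, 0)) // double
	fmt.Println(growthKind(5, 1, 2))  // same-size
	fmt.Println(growthKind(5, 1, 0))  // none
}
```

Same-size growth applies when the map is sparse but fragmented, for example after many inserts followed by many deletes.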

Growth step 1:
runtime.hashGrow():

func hashGrow(t *maptype, h *hmap) {
    bigger := uint8(1)
    if !overLoadFactor(h.count+1, h.B) {
        bigger = 0
        h.flags |= sameSizeGrow
    }
    // 1.
    oldbuckets := h.buckets
    // 2.
    newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

    flags := h.flags &^ (iterator | oldIterator)
    if h.flags&iterator != 0 {
        flags |= oldIterator
    }
    h.B += bigger
    // 4.
    h.flags = flags
    // 1.
    h.oldbuckets = oldbuckets
    // 3.
    h.buckets = newbuckets
    h.nevacuate = 0
    h.noverflow = 0

    if h.extra != nil && h.extra.overflow != nil {
        if h.extra.oldoverflow != nil {
            throw("oldoverflow is not nil")
        }
        // 5.
        h.extra.oldoverflow = h.extra.overflow
        h.extra.overflow = nil
    }
    if nextOverflow != nil {
        if h.extra == nil {
            h.extra = new(mapextra)
        }
        h.extra.nextOverflow = nextOverflow
    }
}

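The numbered comments in hashGrow mark these steps:
1. oldbuckets points to the original bucket array
2. a new bucket array is created
3. buckets points to the new bucket array
4. the map is marked as growing
5. the overflow bucket information is updated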

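Growth step 2:

All data is migrated from the old buckets to the new buckets.
Evacuation is incremental.
Each time an operation touches an old bucket, that bucket's data is migrated to the new buckets.

Growth step 3:

Once every old bucket has been evacuated, oldbuckets is released for garbage collection.

Tip: growth does not always make the map bigger; same-size growth only compacts the entries.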

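A plain map is not safe for concurrent use; sync.Map can be used instead.
For example, goroutine A may be reading a bucket while goroutine B is evacuating it. The sync.Map struct is: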

// %GOROOT%/src/sync/map.go
type Map struct {
    mu    Mutex
    read  atomic.Pointer[readOnly]
    dirty map[any]*entry
    // incremented on each miss
    misses int
}

type readOnly struct {
    m map[any]*entry
    // whether keys have been added to dirty that are missing from m
    amended bool // true if the dirty map contains some key not in m.
}

type entry struct {
    p atomic.Pointer[any]
}

[Figure: how sync.Map stores its data]

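On a read or update, the map in read is consulted first. If the key is absent, that is a miss: if amended is true, dirty is consulted and misses is incremented by 1; otherwise the operation returns.
On an insert, the map in read is consulted first. If the key is absent it has to be added: the operation leaves m, locks mu, adds the key-value pair to dirty, and sets amended to true.
When misses reaches the length of dirty, dirty is promoted: the dirty map replaces the original m, the new dirty is set to nil, misses is reset to 0, and amended is reset to false. dirty is rebuilt the next time a key has to be added.
Before a promotion, deletion is straightforward: to delete key "c", find "c" and set the pointer inside its *entry to nil.
If dirty is promoted after "c" was deleted, "c" is not copied when dirty is rebuilt, and its nil is replaced by the expunged marker.

Basic sync.Map usage: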

var m sync.Map
// 1. store
m.Store("a", 1)

// 2. load
a, _ := m.Load("a")

// 3. iterate
m.Range(func(key, value interface{}) bool {
    k := key.(string)
    v := value.(int)
    fmt.Println(k, v)
    return true
})

// 4. delete
m.Delete("a")
a, ok := m.Load("a")
fmt.Println(a, ok) // <nil> false

// 5. load or store
m.LoadOrStore("b", 2)
b, _ := m.Load("b")
fmt.Println(b) // 2
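The point of sync.Map is that these calls may run from many goroutines at once. As a small sketch of that, the hypothetical function fillConcurrently below stores one key per goroutine without any extra locking and then counts the entries with Range.

```go
package main

import (
	"fmt"
	"sync"
)

// fillConcurrently stores keys 0..n-1 from n goroutines
// and returns how many entries the map holds afterwards.
func fillConcurrently(n int) int {
	var m sync.Map
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(k int) {
			defer wg.Done()
			m.Store(k, k*k) // safe without an external lock
		}(i)
	}
	wg.Wait()

	count := 0
	m.Range(func(_, _ any) bool {
		count++
		return true
	})
	return count
}

func main() {
	fmt.Println(fillConcurrently(100)) // 100
}
```

Doing the same with a plain map and no lock would be a data race and can abort the program with a concurrent map writes error.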

Interfaces

type Car interface {
    Drive()
}

type Truck struct{}

func (t Truck) Drive() {}

var t Car = Truck{}
t.Drive()

The underlying representation of an interface value t:

// %GOROOT%/src/runtime/runtime2.go
type iface struct {
    tab  *itab
    data unsafe.Pointer // points to Truck{}
}

type itab struct {
    inter *interfacetype // the interface type
    _type *_type         // the concrete type of the value held by the interface
    hash  uint32         // copy of _type.hash. Used for type switches.
    _     [4]byte
    fun   [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter.
}
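One practical consequence of the two-word layout is that an interface value is nil only when both words are empty. The sketch below redeclares Car and Truck to stay self-contained; wrappedIsNil is a hypothetical helper that stores a nil *Truck in a Car, which fills in the type word while the data word stays nil.

```go
package main

import "fmt"

type Car interface {
	Drive()
}

type Truck struct{}

func (t Truck) Drive() {}

// wrappedIsNil stores a nil *Truck in a Car and reports
// whether the interface value itself compares equal to nil.
func wrappedIsNil() bool {
	var p *Truck  // nil pointer
	var c Car = p // type word is set, data word is nil
	return c == nil
}

func main() {
	var empty Car // both words are nil
	fmt.Println(empty == nil, wrappedIsNil()) // true false
}
```

This is the usual reason a function that returns a nil pointer through an interface type appears to return a non-nil value.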

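The underlying representation of the empty interface interface{}:

type eface struct {
    _type *_type // no method table
    data  unsafe.Pointer
}

interface{} is typically used to accept a value of any type, which gives it a role similar to generics.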
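Since an empty interface still records the concrete type of its value, that type can be recovered at run time. A minimal sketch with a type switch follows; describe is a hypothetical function written for this example.

```go
package main

import "fmt"

// describe accepts any value and recovers its dynamic type with a type switch.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int %d", x)
	case string:
		return fmt.Sprintf("string %q", x)
	case nil:
		return "nil"
	default:
		return fmt.Sprintf("other %T", x)
	}
}

func main() {
	fmt.Println(describe(42))   // int 42
	fmt.Println(describe("go")) // string "go"
	fmt.Println(describe(nil))  // nil
	fmt.Println(describe(3.5))  // other float64
}
```

Unlike generics, the check happens at run time, so a mismatch is only caught when the code executes.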
