Mutex analysis

The implementation of locks generally depends on atomic operations and semaphores: atomic operations from the atomic package are used to acquire the lock in the fast path, while semaphores are used to block and wake up goroutines.
Mutex underlying structure:

type Mutex struct {
	state int32  // lock state: locked/woken/starving flag bits plus the waiter count
	sema  uint32 // semaphore used to park and wake waiting goroutines
}
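The state field packs several flags and a counter into a single word. For reference, the flag constants defined alongside Mutex in sync/mutex.go (the accompanying comments vary slightly between versions):

const (
	mutexLocked = 1 << iota // mutex is locked
	mutexWoken              // a woken waiter exists, so don't wake another
	mutexStarving           // mutex is in starvation mode
	mutexWaiterShift = iota // the remaining high bits count the waiting goroutines
)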

Locking:
Lock first attempts an atomic CAS (compare-and-swap). If the CAS fails, then depending on the situation the goroutine either spins and retries, or blocks and waits to be woken up holding the lock.

func (m *Mutex) Lock() {
	// Fast path: grab unlocked mutex.
	if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
		if race.Enabled {
			race.Acquire(unsafe.Pointer(m))
		}
		return
	}
	// Slow path (outlined so that the fast path can be inlined)
	m.lockSlow()
}

Unlocking:
Unlock clears the locked bit with an atomic add; if goroutines are still waiting, one of them is woken in the slow path.

func (m *Mutex) Unlock() {
	if race.Enabled {
		_ = m.state
		race.Release(unsafe.Pointer(m))
	}

	// Fast path: drop lock bit.
	new := atomic.AddInt32(&m.state, -mutexLocked)
	if new != 0 {
		// Outlined slow path to allow inlining the fast path.
		// To hide unlockSlow during tracing we skip one extra frame when tracing GoUnblock.
		m.unlockSlow(new)
	}
}
  1. Calling Unlock() on a mutex that is not locked causes a panic.
  2. Calling Lock() again after Lock() deadlocks (reentrancy is not supported); Unlock() must be called before locking again.
  3. The lock state is not tied to a goroutine: one goroutine may Lock and another may Unlock, as the sketch below demonstrates.
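A minimal sketch illustrating the three rules above (a hypothetical example, not part of the original post):

package main

import "sync"

func main() {
	var mu sync.Mutex
	// mu.Unlock() // rule 1: would panic with "sync: unlock of unlocked mutex"

	mu.Lock()
	done := make(chan struct{})
	go func() {
		mu.Unlock() // rule 3: a different goroutine may perform the Unlock
		close(done)
	}()
	<-done
	mu.Lock() // rule 2: locking again succeeds only after the Unlock above
	mu.Unlock()
}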

Spinning: CPU usage is high, but no context switch is needed, so spinning suits short waits.
Conditions for spinning (sketched below): the machine has more than one CPU core; at least one other P is running (GOMAXPROCS exceeds the number of idle Ps plus spinning Ms plus 1); the goroutine has spun fewer than the maximum of 4 times; the lock is not in starvation mode; and the current P's local run queue is empty.
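Most of these conditions are checked by sync_runtime_canSpin in runtime/proc.go (the starvation check lives in lockSlow itself). A lightly abridged version is shown here; the exact field accesses vary between Go versions:

//go:linkname sync_runtime_canSpin sync.runtime_canSpin
func sync_runtime_canSpin(i int) bool {
	// Spin at most active_spin (4) times, and only on a multicore machine
	// with at least one other running P and an empty local run queue.
	if i >= active_spin || ncpu <= 1 ||
		gomaxprocs <= sched.npidle.Load()+sched.nmspinning.Load()+1 {
		return false
	}
	if p := getg().m.p.ptr(); !runqempty(p) {
		return false
	}
	return true
}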

Semaphore:
The semaphore implements sleeping and waking of goroutines.

A semaphore supports two operations, P and V.
P(S): acquire a resource
1. Decrement the resource count: S = S - 1
2. Then check:
    if S < 0, enter the blocking queue and wait to be released;
    if S >= 0, return immediately.

V(S): release a resource
1. Increment the resource count: S = S + 1
2. Then check:
    if S > 0, return immediately;
    if S <= 0, a process is still waiting for the resource, so release the first waiting process in the blocking queue.
    
The semaphore operations in Go live in runtime/sema.go:
P operation: runtime_Semacquire
V operation: runtime_Semrelease
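As an aside, the same P/V semantics can be sketched in user code with a buffered channel serving as a counting semaphore; this is a common Go idiom and is unrelated to the runtime's own implementation below:

// sem is a counting semaphore with n resources: each token in the
// buffer represents a resource currently in use.
type sem chan struct{}

func newSem(n int) sem { return make(sem, n) }

// P acquires a resource, blocking while all n are taken.
func (s sem) P() { s <- struct{}{} }

// V releases a resource, unblocking one waiting P if any.
func (s sem) V() { <-s }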

Underlying implementation

The bottom layer is built on the runtime semaphore mechanism.
Data structure:
sema.go defines a global semtable array of 251 entries, each an anonymous struct. Each entry is padded out to a full cache line to avoid false sharing (unrelated variables sitting in the same cache line and being pulled into different CPUs' caches at the same time).

// Prime to not correlate with any user patterns.
// The table is a simple hash table; making the number of entries prime
// allows a simple hash function with an acceptably low probability of
// accidental collisions.
const semTabSize = 251

type semTable [semTabSize]struct {
	root semaRoot
	pad  [cpu.CacheLinePadSize - unsafe.Sizeof(semaRoot{})]byte
}
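The mapping from a semaphore address to its table slot is a simple shift-and-modulus hash. In recent Go versions it is the rootFor method on semTable (older versions use a semroot helper built around the same expression):

func (t *semTable) rootFor(addr *uint32) *semaRoot {
	return &t[(uintptr(unsafe.Pointer(addr))>>3)%semTabSize].root
}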

The semaRoot held by each element is the core of this data structure.

// A semaRoot holds a balanced tree of sudog with distinct addresses (s.elem).
// Each of those sudog may in turn point (through s.waitlink) to a list
// of other sudogs waiting on the same address.
// Operations on the inner lists of sudogs with the same address are all O(1);
// scanning the top-level semaRoot list is O(log n), where n is the number of
// distinct semaphore addresses with goroutines blocked on them.
// Async semaphore for sync.Mutex.
type semaRoot struct {
	lock  mutex
	treap *sudog        // root of the balanced tree
	nwait atomic.Uint32 // Number of waiters. Read w/o the lock.
}

The sudog structure is defined in runtime/runtime2.go:

type sudog struct {
	g *g

	next *sudog
	prev *sudog
	elem unsafe.Pointer // data element (may point to the stack)

	// The following fields are never accessed concurrently.
	// For channels, waitlink is only accessed by g.
	// For semaphores, all fields (including the ones above) are only
	// accessed while holding the semaRoot lock.

	acquiretime int64
	releasetime int64
	ticket      uint32

	// isSelect indicates that g is participating in a select, so
	// g.selectDone must be CAS'd to win the wake-up race.
	isSelect bool

	// success indicates whether communication over channel c succeeded:
	// true if the goroutine was woken because a value was delivered over c,
	// false if it was woken because c was closed.
	success bool

	parent   *sudog // semaRoot binary tree
	waitlink *sudog // g.waiting list or semaRoot
	waittail *sudog // semaRoot
	c        *hchan // channel
}

The next, prev, and parent fields form the balanced tree, while waitlink and waittail form the linked list of sudogs waiting on the same semaphore address.
Main entry points:

//go:linkname sync_runtime_Semacquire sync.runtime_Semacquire
func sync_runtime_Semacquire(addr *uint32) {
	semacquire1(addr, false, semaBlockProfile, 0, waitReasonSemacquire)
}

//go:linkname sync_runtime_Semrelease sync.runtime_Semrelease
func sync_runtime_Semrelease(addr *uint32, handoff bool, skipframes int) {
	semrelease1(addr, handoff, skipframes)
}

//go:linkname sync_runtime_SemacquireMutex sync.runtime_SemacquireMutex
func sync_runtime_SemacquireMutex(addr *uint32, lifo bool, skipframes int) {
	semacquire1(addr, lifo, semaBlockProfile|semaMutexProfile, skipframes, waitReasonSyncMutexLock)
}

semacquire1 implementation:
Get the current g and check that it is the g actually running on the current m.
cansemacquire loops over the semaphore value: if the value is 0 it returns false and we fall into the harder case; otherwise an atomic CAS performs *addr -= 1, which is equivalent to acquiring the semaphore, and semacquire1 returns immediately.
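The cansemacquire helper referenced here is short; reproduced from runtime/sema.go:

func cansemacquire(addr *uint32) bool {
	for {
		v := atomic.Load(addr)
		if v == 0 {
			return false
		}
		if atomic.Cas(addr, v, v-1) {
			return true
		}
	}
}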

func semacquire1(addr *uint32, lifo bool, profile semaProfileFlags, skipframes int, reason waitReason) {
	gp := getg()
	if gp != gp.m.curg {
		throw("semacquire not on the G stack")
	}

	// Easy case: the semaphore is > 0 and the CAS succeeded, so return immediately.
	if cansemacquire(addr) {
		return
	}

	// Harder case:
	//	increment waiter count
	//	try cansemacquire one more time, return if succeeded
	//	enqueue itself as a waiter
	//	sleep
	//	(waiter descriptor is dequeued by signaler)
	s := acquireSudog()            // get a sudog object
	root := semtable.rootFor(addr) // hash the semaphore address into a semtable slot
	t0 := int64(0)
	s.releasetime = 0
	s.acquiretime = 0
	s.ticket = 0
	if profile&semaBlockProfile != 0 && blockprofilerate > 0 {
		t0 = cputicks()
		s.releasetime = -1
	}
	if profile&semaMutexProfile != 0 && mutexprofilerate > 0 {
		if t0 == 0 {
			t0 = cputicks()
		}
		s.acquiretime = t0
	}
	for {
		lockWithRank(&root.lock, lockRankRoot)
		// Add ourselves to nwait to disable "easy case" in semrelease.
		root.nwait.Add(1)
		// Check cansemacquire to avoid missed wakeup.
		if cansemacquire(addr) {
			root.nwait.Add(-1)
			unlock(&root.lock)
			break
		}
		// Any semrelease after the cansemacquire knows we're waiting
		// (we set nwait above), so go to sleep.
		root.queue(addr, s, lifo)
		goparkunlock(&root.lock, reason, traceEvGoBlockSync, 4+skipframes)
		if s.ticket != 0 || cansemacquire(addr) {
			break
		}
	}
	if s.releasetime > 0 {
		blockevent(s.releasetime-t0, 3+skipframes)
	}
	releaseSudog(s)
}

acquireSudog is defined in runtime/proc.go:

//go:nosplit
func acquireSudog() *sudog {
	// Pin the current M to disable preemption.
	mp := acquirem()
	pp := mp.p.ptr()
	// The local sudog cache is empty, so pull a batch from the central cache.
	if len(pp.sudogcache) == 0 {
		lock(&sched.sudoglock)
		// First, try to grab a batch from the central cache,
		// filling the local cache up to half of its capacity.
		for len(pp.sudogcache) < cap(pp.sudogcache)/2 && sched.sudogcache != nil {
			s := sched.sudogcache
			sched.sudogcache = s.next
			s.next = nil
			pp.sudogcache = append(pp.sudogcache, s)
		}
		unlock(&sched.sudoglock)
		// If the central cache is empty, allocate a new one.
		if len(pp.sudogcache) == 0 {
			pp.sudogcache = append(pp.sudogcache, new(sudog))
		}
	}
	n := len(pp.sudogcache)
	s := pp.sudogcache[n-1]
	pp.sudogcache[n-1] = nil
	pp.sudogcache = pp.sudogcache[:n-1]
	if s.elem != nil {
		throw("acquireSudog: found s.elem != nil in cache")
	}

	// Re-enable preemption.
	releasem(mp)
	return s
}

sudog acquisition therefore uses a two-level cache: the per-P local sudogcache and the global sched.sudogcache. When the local cache is empty, a batch is pulled from the global cache; if the global cache is empty too, a new sudog is allocated.

Back in semacquire1, the harder case then:

increments nwait to disable the fast path in semrelease;

checks cansemacquire once more to avoid a missed wakeup, decrementing nwait and returning on success;

wraps the current g in the sudog and puts it on the wait queue.

// queue adds s to the blocked goroutines in semaRoot.
func (root *semaRoot) queue(addr *uint32, s *sudog, lifo bool) {
	s.g = getg()
	s.elem = unsafe.Pointer(addr)
	s.next = nil
	s.prev = nil

	var last *sudog
	pt := &root.treap
	for t := *pt; t != nil; t = *pt {
		if t.elem == unsafe.Pointer(addr) {
			// Already have addr in list.
			if lifo { // LIFO: put the new node at the head of the list
				// Substitute s in t's place in the treap.
				*pt = s
				s.ticket = t.ticket
				s.acquiretime = t.acquiretime
				s.parent = t.parent
				s.prev = t.prev
				s.next = t.next
				if s.prev != nil {
					s.prev.parent = s
				}
				if s.next != nil {
					s.next.parent = s
				}
				// Add t first in s's wait list.
				s.waitlink = t
				s.waittail = t.waittail
				if s.waittail == nil {
					s.waittail = t
				}
				t.parent = nil
				t.prev = nil
				t.next = nil
				t.waittail = nil
			} else {
				// Add s to the end of t's wait list.
				if t.waittail == nil {
					t.waitlink = s
				} else {
					t.waittail.waitlink = s
				}
				t.waittail = s
				s.waitlink = nil
			}
			return
		}
		last = t
		// Walk the tree, ordered by address.
		if uintptr(unsafe.Pointer(addr)) < uintptr(t.elem) {
			pt = &t.prev
		} else {
			pt = &t.next
		}
	}

	// Add s as a new leaf in the tree of unique addrs.
	// The balanced tree is a treap using ticket as the random heap priority.
	// That is, it is a binary tree ordered according to the elem addresses,
	// but then among the space of possible binary trees respecting those
	// addresses, it is kept balanced on average by maintaining a heap
	// ordering on ticket: s.ticket <= both s.prev.ticket and s.next.ticket.
	// https://en.wikipedia.org/wiki/Treap
	// https://faculty.washington.edu/aragon/pubs/rst89.pdf
	//
	// s.ticket is compared with zero in a couple of places, so set the
	// lowest bit. It will not affect the treap's quality noticeably.
	s.ticket = fastrand() | 1
	s.parent = last
	*pt = s

	// Rotate s up in the tree according to ticket (priority).
	for s.parent != nil && s.parent.ticket > s.ticket {
		if s.parent.prev == s {
			root.rotateRight(s.parent)
		} else {
			if s.parent.next != s {
				panic("semaRoot queue")
			}
			root.rotateLeft(s.parent)
		}
	}
}
 

The structure into which waiters are queued is a treap: treap = tree + heap, combining properties of both. The main idea is to give each node of a binary search tree a random weight (here the random ticket value) and then, via rotations that preserve the binary-search-tree ordering, rearrange the nodes so that the weights satisfy the heap property. Because the weights are random, the treap stays balanced on average under insertions and deletions and does not degenerate into a linked list.
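The rotations that queue and dequeue rely on are the standard treap rotations. For completeness, rotateLeft from runtime/sema.go is shown below (rotateRight is its mirror image):

// rotateLeft rotates the tree rooted at node x,
// turning (x a (y b c)) into (y (x a b) c).
func (root *semaRoot) rotateLeft(x *sudog) {
	// p -> (x a (y b c))
	p := x.parent
	y := x.next
	b := y.prev

	y.prev = x
	x.parent = y
	x.next = b
	if b != nil {
		b.parent = x
	}

	y.parent = p
	if p == nil {
		root.treap = y
	} else if p.prev == x {
		p.prev = y
	} else {
		if p.next != x {
			throw("semaRoot rotateLeft")
		}
		p.next = y
	}
}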

sudogs are allocated from a special pool; acquireSudog and releaseSudog allocate and release them.
Next is releaseSudog. To keep sudogs reusable, when a goroutine is woken its sudog must be recycled into the cache for later use.
As mentioned above, the two-level cache of P and sched is involved here: when a sudog is returned and the local cache is full, half of the local cache is moved back to the global cache.

//go:nosplit
func releaseSudog(s *sudog) {
	... ...
	gp := getg()
	if gp.param != nil {
		throw("runtime: releaseSudog with non-nil gp.param")
	}
	mp := acquirem() // disable preemption
	pp := mp.p.ptr()
	if len(pp.sudogcache) == cap(pp.sudogcache) {
		// Transfer half of the local cache to the central cache.
		var first, last *sudog
		for len(pp.sudogcache) > cap(pp.sudogcache)/2 {
			n := len(pp.sudogcache)
			p := pp.sudogcache[n-1]
			pp.sudogcache[n-1] = nil
			pp.sudogcache = pp.sudogcache[:n-1]
			if first == nil {
				first = p
			} else {
				last.next = p
			}
			last = p
		}
		lock(&sched.sudoglock)
		last.next = sched.sudogcache
		sched.sudogcache = first
		unlock(&sched.sudoglock)
	}
	pp.sudogcache = append(pp.sudogcache, s)
	releasem(mp) // re-enable preemption
}

Unlocking, once the critical section completes, goes through runtime_Semrelease:

The semaphore address is hashed into the table: &semtable[(uintptr(unsafe.Pointer(addr))>>3)%semTabSize].root yields the semaRoot.

The semaphore is atomically incremented by 1, so a goroutine blocked in semacquire1 may now pass cansemacquire.

root.nwait is read atomically; if it is 0 there is no blocked goroutine and semrelease1 returns at once. This check must happen after the atomic.Xadd, pairing with the nwait increment in semacquire1, to avoid missed wakeups.

root.nwait is checked again after taking the lock, returning directly if no goroutine is blocked.

Otherwise, a sudog waiting on this semaphore address is dequeued from the treap.

//go:linkname sync_runtime_Semrelease sync.runtime_Semrelease
func sync_runtime_Semrelease(addr *uint32, handoff bool, skipframes int) {
	semrelease1(addr, handoff, skipframes)
}

func semrelease1(addr *uint32, handoff bool, skipframes int) {
	root := semroot(addr)
	atomic.Xadd(addr, 1)

	// Easy case: no waiters?
	// This check must happen after the xadd, to avoid a missed wakeup
	// (see the loop in semacquire).
	if atomic.Load(&root.nwait) == 0 {
		return
	}

	// Harder case: search for a waiter and wake it.
	lockWithRank(&root.lock, lockRankRoot)
	if atomic.Load(&root.nwait) == 0 {
		// The count is already consumed by another goroutine,
		// so no need to wake up another goroutine.
		unlock(&root.lock)
		return
	}
	s, t0 := root.dequeue(addr) // find the first sudog queued on addr
	if s != nil {
		atomic.Xadd(&root.nwait, -1)
	}
	unlock(&root.lock)
	if s != nil { // may be slow, or even yield, so unlock first
		acquiretime := s.acquiretime
		if acquiretime != 0 {
			mutexevent(t0-acquiretime, 3+skipframes)
		}
		if s.ticket != 0 {
			throw("corrupted semaphore ticket")
		}
		if handoff && cansemacquire(addr) {
			s.ticket = 1
		}
		readyWithTime(s, 5+skipframes) // goready(s.g, 5): mark runnable, awaiting rescheduling
		if s.ticket == 1 && getg().m.locks == 0 {
			// Direct G handoff.
			// readyWithTime has added the waiter G as runnext in the current P;
			// we now call the scheduler so that we start running the waiter G
			// immediately. Note that the waiter inherits our time slice: this
			// is desirable to avoid having a highly contended semaphore hog
			// the P indefinitely. goyield is like Gosched, but it emits a
			// "preempted" trace event and, more importantly, puts the current
			// G on the local runq rather than the global one. We only do this
			// in the starving regime (handoff=true), as in the non-starving
			// case it is possible for a different waiter to acquire the
			// semaphore while we are yielding/scheduling, and this would be
			// wasteful. We wait to enter the starving regime before doing the
			// direct handoff of the ticket and the P.
			// See issue 33747 for discussion.
			goyield()
		}
	}
}

dequeue finds the first goroutine in the semaRoot blocked on the given semaphore addr. With the treap structure and the queue logic in mind, dequeue is relatively simple:

Search the treap for the sudog node with the given addr; if none exists, return nil, 0.

If found, inspect that node's wait list.

If the list holds more than one sudog, pop the head node, splice the next one into the tree in its place, and return the popped sudog.

If the list holds exactly one sudog, the treap node itself must be removed: rotate the target node down according to the ticket weights until it becomes a leaf, then delete it.

// If the sudog is being profiled (set up in semacquire1), dequeue returns the
// time at which the goroutine was woken as now; otherwise now is 0.
func (root *semaRoot) dequeue(addr *uint32) (found *sudog, now int64) {
	ps := &root.treap
	s := *ps
	for ; s != nil; s = *ps {
		if s.elem == unsafe.Pointer(addr) { // found the sudog queued on this address
			goto Found
		}
		if uintptr(unsafe.Pointer(addr)) < uintptr(s.elem) {
			ps = &s.prev
		} else {
			ps = &s.next
		}
	}
	return nil, 0

Found:
	now = int64(0)
	if s.acquiretime != 0 {
		now = cputicks()
	}
	if t := s.waitlink; t != nil {
		// Substitute t, also waiting on addr, for s in the tree of unique addrs.
		*ps = t
		t.ticket = s.ticket
		t.parent = s.parent
		t.prev = s.prev
		if t.prev != nil {
			t.prev.parent = t
		}
		t.next = s.next
		if t.next != nil {
			t.next.parent = t
		}
		if t.waitlink != nil {
			t.waittail = s.waittail
		} else {
			t.waittail = nil
		}
		t.acquiretime = now
		s.waitlink = nil
		s.waittail = nil
	} else { // s is the only sudog on this semaphore address
		// Rotate s down to a leaf of the tree for removal,
		// respecting the ticket priorities.
		for s.next != nil || s.prev != nil {
			if s.next == nil || s.prev != nil && s.prev.ticket < s.next.ticket {
				root.rotateRight(s)
			} else {
				root.rotateLeft(s)
			}
		}
		// s is now a leaf; remove it.
		if s.parent != nil {
			if s.parent.prev == s { // left child of its parent
				s.parent.prev = nil
			} else { // right child of its parent
				s.parent.next = nil
			}
		} else { // s was the only node in the treap
			root.treap = nil
		}
	}
	s.parent = nil
	s.elem = nil
	s.next = nil
	s.prev = nil
	s.ticket = 0
	return s, now
}

If the dequeued sudog is non-nil, root.nwait is atomically decremented and the lock is released, so other goroutines can proceed.

readyWithTime wakes the g held by the sudog and places it in the runnext slot of the current P's local run queue:

func readyWithTime(s *sudog, traceskip int) {
	if s.releasetime != 0 {
		s.releasetime = cputicks()
	}
	goready(s.g, traceskip)
}

func goready(gp *g, traceskip int) {
	systemstack(func() { // switch to the system stack
		ready(gp, traceskip, true)
	})
}

// Mark gp ready to run.
func ready(gp *g, traceskip int, next bool) {
	if trace.enabled {
		traceGoUnpark(gp, traceskip)
	}

	status := readgstatus(gp)

	// Mark runnable.
	_g_ := getg()
	mp := acquirem() // disable preemption
	if status&^_Gscan != _Gwaiting {
		dumpgstatus(gp)
		throw("bad g->status in ready")
	}

	// status is Gwaiting or Gscanwaiting, make Grunnable and put on runq
	casgstatus(gp, _Gwaiting, _Grunnable)
	// Put gp on P's local run queue; next=true here means the runnext slot
	// (the next position to execute) rather than the tail of the queue.
	runqput(_g_.m.p.ptr(), gp, next)
	wakep()
	releasem(mp) // re-enable preemption
}

In the starvation state, goyield() is called to give up the current time slice, which the waiting g inherits, avoiding endless competition for the semaphore. Because readyWithTime has already placed the waiting G in the runnext slot of P's local queue, the scheduler runs it immediately.

func goyield() {
	checkTimeouts()
	mcall(goyield_m)
}

func goyield_m(gp *g) {
	if trace.enabled {
		traceGoPreempt()
	}
	pp := gp.m.p.ptr()
	casgstatus(gp, _Grunning, _Grunnable) // give up the time slice
	dropg()
	runqput(pp, gp, false) // put the current g at the tail of P's local run queue
	schedule()             // invoke the scheduler
}

semacquire and semrelease are paired to implement simple sleep and wakeup primitives, solving resource contention in concurrent scenarios. By construction they run on two different Ms; call them m1 and m2.

  1. When g1 on m1 reaches semacquire1: if the fast-path cansemacquire succeeds, g1 has grabbed the lock and continues executing. If it fails and g1 still cannot grab the lock in the harder case, g1 enters goparkunlock, which puts it on the wait queue and lets m1 switch to run another g.

  2. When g2 on m2 later calls semrelease1, it puts the waiting g1 back onto a P's local run queue. If the lock is in starvation mode (handoff=true), the waiter inherits the current time slice and runs immediately; g1 then resumes inside semacquire1 and releases its sudog.
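A minimal, hypothetical program that exercises exactly this pairing: main parks inside Lock's semacquire, and the second goroutine's Unlock performs the semrelease that wakes it.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	mu.Lock() // taken via the CAS fast path

	go func() {
		time.Sleep(100 * time.Millisecond)
		mu.Unlock() // semrelease: dequeues and wakes the parked waiter
	}()

	mu.Lock() // fast path fails; main parks in semacquire until the Unlock
	fmt.Println("woken and lock acquired")
	mu.Unlock()
}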

Source: blog.csdn.net/weixin_56766616/article/details/129956072