A Detailed Look at the Preemptive Scheduling Triggered When a Goroutine Runs for Too Long

Original article: Goroutine运行时间过长而发生的抢占调度详解

This article focuses on two questions:

  1. The situations in which preemptive scheduling happens.

  2. The characteristics of the preemption triggered by a goroutine running for too long.

The sysmon system-monitor thread periodically (roughly every 10 milliseconds) calls retake to initiate preemption of goroutines.
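
Before digging into the code, here is a small experiment (my own, not from the original article) that makes the effect visible: with a single P, two CPU-bound goroutines still make progress together, because sysmon preempts whichever one has held the P for too long. The gaps each worker reports tend to be on the order of the 10ms time slice, though the exact numbers depend on the Go version and the machine.

package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// worker burns CPU for about 100ms and reports whenever it notices a long gap
// between two consecutive loop iterations, i.e. a period during which it had
// been switched off the only P.
func worker(name string, wg *sync.WaitGroup) {
    defer wg.Done()
    start := time.Now()
    last := start
    for time.Since(start) < 100*time.Millisecond {
        if gap := time.Since(last); gap > 5*time.Millisecond {
            fmt.Printf("%s was off-CPU for roughly %v\n", name, gap)
        }
        last = time.Now()
    }
}

func main() {
    // A single P, so the two workers can only make progress together if the
    // runtime preempts whichever of them has been running for too long.
    runtime.GOMAXPROCS(1)
    var wg sync.WaitGroup
    wg.Add(2)
    go worker("A", &wg)
    go worker("B", &wg)
    wg.Wait()
}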

Let's look at retake, at line 4376 of runtime/proc.go:

// forcePreemptNS is the time slice given to a G before it is
// preempted.
const forcePreemptNS = 10 * 1000 * 1000 // 10ms

func retake(now int64) uint32 {
    n := 0
    // Prevent allp slice changes. This lock will be completely
    // uncontended unless we're already stopping the world.
    lock(&allpLock)
    // We can't use a range loop over allp because we may
    // temporarily drop the allpLock. Hence, we need to re-fetch
    // allp each time around the loop.
    for i := 0; i < len(allp); i++ { // iterate over all Ps
        _p_ := allp[i]
        if _p_ == nil {
            // This can happen if procresize has grown
            // allp but not yet created new Ps.
            continue
        }
        // _p_.sysmontick is where the sysmon thread records the monitored P's
        // syscall time and running time.
        pd := &_p_.sysmontick
        s := _p_.status
        if s == _Psyscall { // the P is in a system call; check whether it needs to be retaken
            // Retake P from syscall if it's there for more than 1 sysmon tick (at least 20us).
            t := int64(_p_.syscalltick)
            if int64(pd.syscalltick) != t {
                pd.syscalltick = uint32(t)
                pd.syscallwhen = now
                continue
            }
            // On the one hand we don't want to retake Ps if there is no other work to do,
            // but on the other hand we want to retake them eventually
            // because they can prevent the sysmon thread from deep sleep.
            if runqempty(_p_) && atomic.Load(&sched.nmspinning)+atomic.Load(&sched.npidle) > 0 && pd.syscallwhen+10*1000*1000 > now {
                continue
            }
            // Drop allpLock so we can take sched.lock.
            unlock(&allpLock)
            // Need to decrement number of idle locked M's
            // (pretending that one more is running) before the CAS.
            // Otherwise the M from which we retake can exit the syscall,
            // increment nmidle and report deadlock.
            incidlelocked(-1)
            if atomic.Cas(&_p_.status, s, _Pidle) {
                if trace.enabled {
                    traceGoSysBlock(_p_)
                    traceProcStop(_p_)
                }
                n++
                _p_.syscalltick++
                handoffp(_p_)
            }
            incidlelocked(1)
            lock(&allpLock)
        } else if s == _Prunning { // the P is running; check whether it has been running too long
            // Preempt G if it's running for too long.
            // _p_.schedtick: the scheduler increments this on every scheduling round.
            t := int64(_p_.schedtick)
            if int64(pd.schedtick) != t {
                // sysmon has observed a new scheduling round, so reset its own
                // schedtick and schedwhen.
                pd.schedtick = uint32(t)
                pd.schedwhen = now
                continue
            }
            // pd.schedtick == t means no scheduling happened between pd.schedwhen
            // and now, so the same goroutine has been running the whole time;
            // check whether that exceeds 10 milliseconds.
            if pd.schedwhen+forcePreemptNS > now {
                // The goroutine has been running for less than 10ms since sysmon
                // first saw it running.
                continue
            }
            // It has been running continuously for more than 10ms: issue a
            // preemption request.
            preemptone(_p_)
        }
    }
    unlock(&allpLock)
    return uint32(n)
}

retake decides whether to initiate preemption based on which of two states the P is in:

  1. _Prunning means the corresponding goroutine is running; if it has been running for more than 10 milliseconds it needs to be preempted (a simplified model of this rule follows the list).

  2. _Psyscall means the corresponding goroutine is executing a system call in the kernel; here several conditions together determine whether the P needs to be retaken.
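
The _Prunning branch boils down to a small decision rule. Below is a simplified, self-contained model of it (my own illustration; the field and function names mirror the runtime, but this is not the runtime's code):

package main

import (
    "fmt"
    "time"
)

// forcePreemptNS mirrors the 10ms time slice used by retake.
const forcePreemptNS = 10 * time.Millisecond

// sysmontick mirrors the two fields of _p_.sysmontick used by the _Prunning branch.
type sysmontick struct {
    schedtick uint32
    schedwhen time.Time
}

// shouldPreempt models the _Prunning branch: preempt only if no new scheduling
// round has happened on the P since sysmon last looked, and the same goroutine
// has by now been running for at least forcePreemptNS.
func shouldPreempt(pd *sysmontick, schedtick uint32, now time.Time) bool {
    if pd.schedtick != schedtick {
        // A new scheduling round happened: remember it and start timing again.
        pd.schedtick = schedtick
        pd.schedwhen = now
        return false
    }
    return now.Sub(pd.schedwhen) >= forcePreemptNS
}

func main() {
    pd := &sysmontick{}
    now := time.Now()
    fmt.Println(shouldPreempt(pd, 1, now))                          // false: first observation
    fmt.Println(shouldPreempt(pd, 1, now.Add(5*time.Millisecond)))  // false: under 10ms so far
    fmt.Println(shouldPreempt(pd, 1, now.Add(12*time.Millisecond))) // true: same g for over 10ms
}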

If sysmon observes that a goroutine has been running continuously for more than 10 milliseconds, it calls preemptone to request preemption of that goroutine. Let's look at preemptone, at line 4465 of runtime/proc.go:

// Tell the goroutine running on processor P to stop.
// This function is purely best-effort. It can incorrectly fail to inform the
// goroutine. It can send inform the wrong goroutine. Even if it informs the
// correct goroutine, that goroutine might ignore the request if it is
// simultaneously executing newstack.
// No lock needs to be held.
// Returns true if preemption request was issued.
// The actual preemption will happen at some point in the future
// and will be indicated by the gp->status no longer being
// Grunning
func preemptone(_p_ *p) bool {
    mp := _p_.m.ptr()
    if mp == nil || mp == getg().m {
        return false
    }
    // gp is the goroutine being preempted
    gp := mp.curg
    if gp == nil || gp == mp.g0 {
        return false
    }

    gp.preempt = true // set the preemption flag

    // Every call in a go routine checks for stack overflow by
    // comparing the current stack pointer to gp->stackguard0.
    // Setting gp->stackguard0 to StackPreempt folds
    // preemption into the normal stack overflow check.
    // stackPreempt is the constant 0xfffffffffffffade, a very large number.
    gp.stackguard0 = stackPreempt // make the preempted goroutine handle the preemption request

    return true
}

preemptone only sets preempt = true in the g struct of the goroutine to be preempted and sets its stackguard0 to stackPreempt (a constant, 0xfffffffffffffade, a very large number), then returns; it does not force the preempted goroutine to stop running.
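
A quick sanity check on that constant (my own snippet, not from the original article): on 64-bit platforms the runtime derives stackPreempt from -1314 masked to pointer width (at least in the Go versions I have checked), which gives exactly 0xfffffffffffffade, far above any real stack address.

package main

import "fmt"

func main() {
    // The two's-complement bit pattern of -1314 as a 64-bit uintptr; this is
    // the value the runtime uses for stackPreempt on 64-bit platforms.
    const stackPreempt = ^uintptr(0) - 1313 // same bits as -1314
    fmt.Printf("%#x\n", stackPreempt)       // prints 0xfffffffffffffade on amd64
}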

The preemption flag set above is handled along the call chain morestack_noctxt() -> morestack() -> newstack().

Take this program as an example:

package main

import "fmt"

func sum(a, b int) int {
    a2 := a * a
    b2 := b * b
    c := a2 + b2
    fmt.Println(c)
    return c
}

func main() {
    sum(1, 2)
}

Disassembling main with gdb gives the following:

=> 0x0000000000486a80 <+0>:   mov    %fs:0xfffffffffffffff8,%rcx
   0x0000000000486a89 <+9>:   cmp    0x10(%rcx),%rsp
   0x0000000000486a8d <+13>:  jbe    0x486abd <main.main+61>
   0x0000000000486a8f <+15>:  sub    $0x20,%rsp
   0x0000000000486a93 <+19>:  mov    %rbp,0x18(%rsp)
   0x0000000000486a98 <+24>:  lea    0x18(%rsp),%rbp
   0x0000000000486a9d <+29>:  movq   $0x1,(%rsp)
   0x0000000000486aa5 <+37>:  movq   $0x2,0x8(%rsp)
   0x0000000000486aae <+46>:  callq  0x4869c0 <main.sum>
   0x0000000000486ab3 <+51>:  mov    0x18(%rsp),%rbp
   0x0000000000486ab8 <+56>:  add    $0x20,%rsp
   0x0000000000486abc <+60>:  retq
   0x0000000000486abd <+61>:  callq  0x44ece0 <runtime.morestack_noctxt>
   0x0000000000486ac2 <+66>:  jmp    0x486a80 <main.main>

The call to morestack_noctxt sits at the end of the function and is only reached via the jbe instruction. Look at the first three instructions:

0x0000000000486a80 <+0>:   mov    %fs:0xfffffffffffffff8,%rcx   # first instruction of main: rcx = g
0x0000000000486a89 <+9>:   cmp    0x10(%rcx),%rsp
0x0000000000486a8d <+13>:  jbe    0x486abd <main.main+61>

jbe is a conditional jump; whether it jumps depends on the result of the preceding instruction.

The first instruction of main reads the pointer to the currently running g from TLS (Go implements TLS via the fs register) into rcx. The second instruction's source operand uses indirect addressing: it reads the value stored at offset 16 from g and compares it with rsp.

First, the definition of the g struct:

type g struct {
    stack       stack
    stackguard0 uintptr
    stackguard1 uintptr
    ......
}

type stack struct {
    lo uintptr // 8 bytes
    hi uintptr // 8 bytes
}

g's stack field occupies 16 bytes (lo and hi are 8 bytes each), so offset 16 from the start of the g struct is stackguard0. The second instruction of main therefore compares the stack-pointer register rsp with stackguard0: if rsp is the smaller one, the current g's stack is nearly used up and at risk of overflowing, so it must be grown. Now suppose the main goroutine has had the preemption flag set: stackguard0 then holds stackPreempt, so rsp is far smaller than stackguard0, the jbe is taken, and execution jumps to 0x0000000000486abd, the call to morestack_noctxt. That call pushes the address of the instruction that follows it, 0x0000000000486ac2, onto the stack and then jumps to morestack_noctxt.
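
To make the branch condition concrete, here is a tiny model of the check in ordinary Go, using made-up example values (an analogy for amd64 only, not the real compiler-generated prologue):

package main

import "fmt"

// needMorestack models the prologue check: the jbe at main+13 is taken when
// rsp <= stackguard0 (an unsigned comparison), which happens either when the
// stack is nearly exhausted or when stackguard0 has been set to the huge
// stackPreempt value.
func needMorestack(rsp, stackguard0 uintptr) bool {
    return rsp <= stackguard0
}

func main() {
    const stackPreempt = 0xfffffffffffffade
    rsp := uintptr(0xc000046f80)   // hypothetical current stack pointer
    guard := uintptr(0xc000040350) // hypothetical normal stackguard0
    fmt.Println(needMorestack(rsp, guard))        // false: plenty of stack left
    fmt.Println(needMorestack(rsp, stackPreempt)) // true: preemption requested
}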

At this point the original article shows a diagram of rsp, the g struct, and the state of main's stack.

morestack_noctxt uses a JMP to jump straight to morestack rather than calling it, so rsp does not change.

morestack works much like the previously analyzed mcall: it first saves the scheduling context of the goroutine that called it (here, the main goroutine) into the sched field of that goroutine's g struct, and then switches to the current worker thread's g0 stack to run newstack.
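
For reference, g.sched is a gobuf. Below is a trimmed-down, self-contained view of the fields the assembly fills in (field names as in runtime/runtime2.go; types simplified to plain uintptr so the snippet compiles on its own, and the values in main are hypothetical):

package main

import "fmt"

// gobuf is a trimmed-down copy of the type behind g.sched. The real struct
// also has ctxt, ret and lr fields.
type gobuf struct {
    sp uintptr // saved stack pointer        -> g.sched.sp
    pc uintptr // saved instruction pointer  -> g.sched.pc
    g  uintptr // the g this context belongs to
    bp uintptr // saved frame pointer        -> g.sched.bp
}

func main() {
    // Hypothetical values matching the example: pc is the return address that
    // morestack saves, 0x0000000000486ac2.
    saved := gobuf{sp: 0xc000046f70, pc: 0x486ac2, bp: 0xc000046f88}
    fmt.Printf("%+v\n", saved)
}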

Let's look at morestack, at line 433 of runtime/asm_amd64.s:

// morestack but not preserving ctxt.
TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0
    MOVL  $0, DX
    JMP   runtime·morestack(SB)

// Called during function prolog when more stack is needed.
//
// The traceback routines see morestack on a g0 as being
// the top of a stack (for example, morestack calling newstack
// calling the scheduler calling newm calling gc), so we must
// record an argument size. For that purpose, it has no arguments.
TEXT runtime·morestack(SB),NOSPLIT,$0-0
    ......
    get_tls(CX)
    MOVQ  g(CX), SI  # SI = g (the g struct of the main goroutine)
    ......
    # The stack-pointer register SP currently points at the return address of
    # morestack_noctxt, so after the next instruction AX = 0x0000000000486ac2.
    MOVQ  0(SP), AX
    # The next two instructions set g.sched.pc and g.sched.g. In our example
    # g.sched.pc is set to 0x0000000000486ac2, i.e. the address execution
    # should resume at once morestack_noctxt is done.
    MOVQ  AX, (g_sched+gobuf_pc)(SI)  # g.sched.pc = 0x0000000000486ac2
    MOVQ  SI, (g_sched+gobuf_g)(SI)   # g.sched.g = g
    LEAQ  8(SP), AX  # the rsp main had before it called morestack_noctxt
    # The next three instructions set g.sched.sp, g.sched.bp and g.sched.ctxt.
    MOVQ  AX, (g_sched+gobuf_sp)(SI)
    MOVQ  BP, (g_sched+gobuf_bp)(SI)
    MOVQ  DX, (g_sched+gobuf_ctxt)(SI)
    # The instructions above saved g's context; now switch to g0:
    # switch to the g0 stack and set the g in TLS to g0.
    # Call newstack on m->g0's stack.
    MOVQ  m_g0(BX), BX
    MOVQ  BX, g(CX)  # set the g in TLS to g0
    # Load g0's saved stack pointer into the CPU's SP register to switch stacks.
    # Before this instruction the CPU is still using the calling g's stack;
    # after it, the CPU is using g0's stack.
    MOVQ  (g_sched+gobuf_sp)(BX), SP
    CALL  runtime·newstack(SB)
    CALL  runtime·abort(SB)  // crash if newstack returns
    RET

Before the switch to g0, the current goroutine's context is saved into the sched field of its g struct. The next time main is scheduled, the scheduler can restore g.sched.sp into the CPU's rsp to switch back to main's stack, then restore g.sched.pc into rip, after which the CPU continues at the instruction following the callq: 0x0000000000486ac2 <+66>: jmp 0x486a80 <main.main>. The original article illustrates the state at this point with a diagram.

Let's look at newstack, at line 899 of runtime/stack.go:

// Called from runtime·morestack when more stack is needed.
// Allocate larger stack and relocate to new stack.
// Stack growth is multiplicative, for constant amortized cost.
//
// g->atomicstatus will be Grunning or Gscanrunning upon entry.
// If the GC is trying to stop this g then it will set preemptscan to true.
//
// This must be nowritebarrierrec because it can be called as part of
// stack growth from other nowritebarrierrec functions, but the
// compiler doesn't check this.
//
//go:nowritebarrierrec
func newstack() {
    thisg := getg() // thisg = g0
    ......
    // Get g0.m.curg, i.e. the goroutine that needs to grow its stack or
    // respond to preemption; in our example gp = main goroutine.
    gp := thisg.m.curg
    ......
    // NOTE: stackguard0 may change underfoot, if another thread
    // is about to try to preempt gp. Read it just once and use that same
    // value now and below.
    // Check whether g.stackguard0 has been set to stackPreempt.
    preempt := atomic.Loaduintptr(&gp.stackguard0) == stackPreempt

    // Be conservative about where we preempt.
    // We are interested in preempting user Go code, not runtime code.
    // If we're holding locks, mallocing, or preemption is disabled, don't
    // preempt.
    // This check is very early in newstack so that even the status change
    // from Grunning to Gwaiting and back doesn't happen in this case.
    // That status change by itself can be viewed as a small preemption,
    // because the GC might change Gwaiting to Gscanwaiting, and then
    // this goroutine has to wait for the GC to finish before continuing.
    // If the GC is in some way dependent on this goroutine (for example,
    // it needs a lock held by the goroutine), that small preemption turns
    // into a real deadlock.
    if preempt {
        // Check the state of the goroutine being preempted.
        if thisg.m.locks != 0 || thisg.m.mallocing != 0 || thisg.m.preemptoff != "" || thisg.m.p.ptr().status != _Prunning {
            // Let the goroutine keep running for now.
            // gp->preempt is set, so it will be preempted next time.
            // Restore stackguard0 to its normal value: the preemption request
            // has been handled.
            gp.stackguard0 = gp.stack.lo + _StackGuard
            // Do not preempt: call gogo to keep running this g; there is no
            // need to call schedule to pick another goroutine.
            gogo(&gp.sched) // never return
        }
    }

    // The omitted code performs some other checks, which is why the same
    // condition is tested twice here.
    if preempt {
        if gp == thisg.m.g0 {
            throw("runtime: preempt g0")
        }
        if thisg.m.p == 0 && thisg.m.locks == 0 {
            throw("runtime: g is running but p is not")
        }
        ......
        // Respond to the preemption request.
        // Act like goroutine called runtime.Gosched.
        // Set gp's status; the omitted code changed it to _Gwaiting while
        // dealing with the GC.
        casgstatus(gp, _Gwaiting, _Grunning)
        // Call gopreempt_m to switch gp out.
        gopreempt_m(gp) // never return
    }
    ......
}

newstack does two things: it grows the stack, and it responds to the preemption request raised by sysmon. It first checks whether g.stackguard0 has been set to stackPreempt; if so, sysmon has found that this goroutine has run too long and has issued a preemption request. After some basic checks, if the current goroutine can indeed be preempted, gopreempt_m is called to complete the switch. Let's look at gopreempt_m, at line 2644 of runtime/proc.go:

func gopreempt_m(gp *g) {
    if trace.enabled {
        traceGoPreempt()
    }
    goschedImpl(gp)
}

gopreempt_m delegates the actual switch to goschedImpl.

goschedImpl first changes gp's status from _Grunning to _Grunnable, then calls dropg to detach gp from the current worker thread m, then puts gp on the global run queue to wait for the scheduler to pick it up again, and finally calls schedule to enter the next round of the scheduling loop.
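
The comment in newstack, "Act like goroutine called runtime.Gosched", is literal: a voluntary runtime.Gosched from user code ends up in the same goschedImpl tail. A minimal illustration (my own, not from the original article):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    runtime.GOMAXPROCS(1)
    done := make(chan struct{})
    go func() {
        fmt.Println("the other goroutine got the P")
        close(done)
    }()
    // Voluntary yield: runtime.Gosched ends up in goschedImpl, the same tail
    // gopreempt_m uses, so this g becomes _Grunnable, is detached from its M,
    // queued, and schedule() picks the next runnable g.
    runtime.Gosched()
    <-done
}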

From this walk-through we can see that Go's preemptive scheduling (in the version analyzed here) is conditional: sysmon is only responsible for setting the preemption flag on the goroutine to be preempted; the goroutine itself checks g.stackguard0 in its function prologue to decide whether to call morestack_noctxt, which eventually reaches newstack, where the preemption is actually carried out.
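
One consequence worth spelling out (my addition, not in the original article): because the check lives in the function prologue, a loop that never calls any function never reaches morestack_noctxt and, under the mechanism described here, can never be preempted. The classic reproducer, assuming a Go toolchain that relies solely on this cooperative mechanism (before signal-based asynchronous preemption was added in Go 1.14):

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(1)
    go func() {
        for {
            // No function calls in this loop, so there is no prologue and no
            // stackguard0 check: the cooperative mechanism described above
            // never gets a chance to preempt this goroutine.
        }
    }()
    time.Sleep(100 * time.Millisecond)
    // On Go versions that rely only on the cooperative mechanism (before
    // signal-based preemption was added in Go 1.14), this line is never
    // reached with GOMAXPROCS(1): the busy goroutine owns the only P forever.
    fmt.Println("done")
}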

The above is only my personal understanding and may not be entirely accurate; if it helps you, all the better.

That's it for this article. If you liked it, a like, a favorite and a follow would be much appreciated.

Everything above reflects my own understanding; if any of it infringes on your rights, please contact me and I will remove it.

 

 


Reposted from: blog.csdn.net/luyaran/article/details/121029983