A full analysis of the 11 GMP scheduling scenarios

Reprinted, with additions.

(1) Scene 1

P1 holds G1; after acquiring P1, M1 starts running G1. G1 creates G2 with go func(). For locality, G2 is added to P1's local queue first.
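The user-level side of this scenario is just ordinary Go code. Here is a minimal sketch (the function name `spawnG2` is mine; the placement of the new G on the current P's local run queue happens inside the runtime and is not visible from user code):

```go
package main

import (
	"fmt"
	"sync"
)

// spawnG2 plays the role of G1: it creates another goroutine (G2)
// with `go func()` and waits for it. The runtime enqueues the new G
// on the current P's local run queue first, for locality.
func spawnG2() string {
	var wg sync.WaitGroup
	result := ""
	wg.Add(1)
	go func() { // G2 is created here
		defer wg.Done()
		result = "G2 ran"
	}()
	wg.Wait() // G1 blocks until G2 finishes; WaitGroup orders the write
	return result
}

func main() {
	fmt.Println(spawnG2())
}
```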

(2) Scene 2

After G1 finishes running (function: goexit), the goroutine running on M1 switches to G0, which is responsible for switching coroutines during scheduling (function: schedule). G0 takes G2 from P1's local queue, switches from G0 to G2, and starts running G2 (function: execute). This realizes the reuse of thread M1.
Each M has a G0 that is responsible for scheduling the Gs in the local queue of the P it is bound to.

(3) Scene 3

Assume each P's local queue can hold only 3 Gs (in the real runtime the capacity is 256). G2 wants to create 6 Gs; the first 3 (G3, G4, G5) have already been added to P1's local queue, which is now full.

(4) Scene 4

When G2 creates G7, it finds that P1's local queue is full, so load balancing is needed: the first half of the Gs in P1's local queue, together with the newly created G, are transferred to the global queue.

(In the implementation it is not necessarily the new G that moves: if the new G is to be executed right after G2, it stays in the local queue, and an older G is moved to the global queue in its place.)

When these Gs are transferred to the global queue, their order is shuffled, so G3, G7, and G4 end up in the global queue.
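The overflow path above can be sketched with a toy model (the slice-based queues and the name `overflow` are my simplifications; the runtime's actual routine is `runqputslow`, which links the Gs into a batch and can shuffle their order):

```go
package main

import "fmt"

// overflow models scene 4: when a P's local run queue is full, the
// first half of the local queue plus the newly created G are moved
// to the global queue, and the rest stay local.
func overflow(local, global []string, newG string) (newLocal, newGlobal []string) {
	half := len(local)/2 + 1 // first half; 2 of the 3 Gs in this scenario
	moved := append([]string{}, local[:half]...)
	moved = append(moved, newG) // the new G moves with them
	newGlobal = append(global, moved...)
	newLocal = append([]string{}, local[half:]...)
	return newLocal, newGlobal
}

func main() {
	local := []string{"G3", "G4", "G5"} // P1's full local queue
	newLocal, global := overflow(local, nil, "G7")
	fmt.Println("local:", newLocal, "global:", global)
}
```

With the scenario's queues, G3, G4, and G7 move to the global queue and G5 stays on P1.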

(5) Scene 5

When G2 creates G8, P1's local queue is not full, so G8 will be added to P1's local queue.
G8 is added to P1's local queue because P1 is bound to M1 at this point and G2 is currently being executed by M1. A new G created by G2 is therefore placed first on the P bound to G2's M.

(6) Scene 6

Rule: when a G is created, the running G tries to wake another idle P+M combination to execute it (on the premise that an idle P exists).
Suppose G2 wakes up M2. M2 binds P2 and runs G0, but P2's local queue has no G, so M2 is now a spinning thread (a running thread with no G, constantly looking for one).

Spinning thread:
An M that is bound to a P whose local queue is empty and is running G0 is called a spinning thread.

Generally the spinning state is a brief, transitional one: the M will soon get a G from the global queue, or, when the global queue is empty, trigger the work-stealing mechanism. When there are no Gs anywhere, the program is about to end; the bindings are released and the Ps and Ms are destroyed.

When the P's local queue has Gs and the running G0 schedules one of them for execution, the spinning state ends and the M is said to be non-spinning.

(7) Scene 7

M2 tries to take a batch of Gs from the global queue ("GQ") and put them on P2's local queue (function: findrunnable()). The number of Gs M2 takes from the global queue follows this formula:

n = min(len(GQ)/GOMAXPROCS + 1, len(GQ)/2)

Take at least 1 G from the global queue, but don't move too many Gs to one P's local queue at a time; leave some for the other Ps. This is load balancing from the global queue to the P local queues.
Suppose our scenario has 4 Ps in total (GOMAXPROCS is set to 4, so at most 4 Ps can be used by Ms). By the formula, M2 moves only 1 G (namely G3) from the global queue to P2's local queue, then completes the switch from G0 to G3 and runs G3.
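The formula can be checked with a small helper (the function name `grabCount` is mine; in the runtime this computation lives in `globrunqget`):

```go
package main

import "fmt"

// grabCount applies the formula from scene 7:
//   n = min(len(GQ)/GOMAXPROCS + 1, len(GQ)/2)
// using integer division, as the runtime does.
func grabCount(globalLen, gomaxprocs int) int {
	n := globalLen/gomaxprocs + 1
	if half := globalLen / 2; n > half {
		n = half
	}
	return n
}

func main() {
	// Scene 7: 3 Gs in the global queue (G3, G7, G4), GOMAXPROCS = 4.
	fmt.Println(grabCount(3, 4)) // min(3/4+1, 3/2) = min(1, 1) = 1
}
```

With 3 Gs in the global queue and 4 Ps, the formula yields 1, which is why M2 takes only G3.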

(8) Scene 8

Suppose G2 has been running on M1 the whole time. After two rounds, M2 has taken G7 and G4 from the global queue to P2's local queue and finished running them. Both the global queue and P2's local queue are now empty.
With no G in the global queue, M must perform work stealing: steal half of the Gs from another P that has them, and put them in its own P's local queue. P2 takes half of the Gs from the tail of P1's local queue. In this example, half is just one G, G8; P2 puts it in its local queue and executes it.
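The stealing step can be sketched as a toy model (the name `stealHalf` and the slice queues are my simplifications of the runtime's `runqsteal`, which grabs from the victim's queue with atomic operations):

```go
package main

import "fmt"

// stealHalf models scene 8: a spinning M steals half of another P's
// local queue (rounding up, so a queue of one G loses that one G),
// taking the Gs from the tail as described above.
func stealHalf(victim []string) (stolen, remaining []string) {
	n := (len(victim) + 1) / 2 // half, rounded up
	stolen = append([]string{}, victim[len(victim)-n:]...)
	remaining = append([]string{}, victim[:len(victim)-n]...)
	return stolen, remaining
}

func main() {
	// Scene 8: P1's local queue holds only G8; P2 steals it.
	stolen, remaining := stealHalf([]string{"G8"})
	fmt.Println("stolen:", stolen, "left:", remaining)
}
```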

(9) Scene 9

G5 and G6 in P1's local queue have been stolen by other Ms and run to completion. Currently M1 and M2 are running G2 and G8 respectively, while M3 and M4 have no goroutines to run; they are in the spinning state, constantly looking for goroutines.
Why let M3 and M4 spin? Spinning is essentially running: the thread runs but executes no G, which wastes CPU. So why not destroy them to save CPU resources? Because creating and destroying threads also costs time, and we want an M ready to run a new goroutine the moment it is created. Destroying a thread and re-creating it later would add latency and reduce efficiency. Of course, too many spinning threads are also a waste of CPU, so the system allows at most GOMAXPROCS spinning threads (GOMAXPROCS = 4 in this example, hence 4 Ps in total); any extra threads with nothing to do are put to sleep.

(10) Scene 10

Assume that besides the spinning threads M3 and M4 there are also the idle threads M5 and M6 (not bound to any P; note that we can have at most 4 Ps here, so the number of Ms is usually >= the number of Ps, with most Ms contending for a P to run). G8 creates G9, then G8 makes a blocking system call. M2 and P2 are immediately unbound, and P2 makes the following decision: if P2's local queue has a G, or the global queue has a G, or there is a free M, P2 immediately wakes up one M and binds to it; otherwise P2 is added to the idle P list, waiting for an M to acquire it. In this scenario P2's local queue has G9, so it can bind to another idle thread, M5.
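P2's decision can be sketched as a toy function (the booleans and return strings are mine; the runtime's real logic is `handoffp` and is considerably more involved):

```go
package main

import "fmt"

// handoff models scene 10's decision after an M blocks in a syscall
// and releases its P: if there is runnable work anywhere, the P wakes
// an M and binds to it; otherwise the P goes on the idle list.
func handoff(localHasG, globalHasG bool) string {
	if localHasG || globalHasG {
		return "wake an M and bind it to this P"
	}
	return "put this P on the idle P list"
}

func main() {
	// Scene 10: P2's local queue holds G9, so P2 binds the idle M5.
	fmt.Println(handoff(true, false))
}
```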


(11) Scene 11

G8 creates G9, and this time G8 makes a non-blocking system call. M2 and P2 are unbound, but M2 remembers P2; then G8 and M2 enter the system-call state. When G8 and M2 exit the system call, M2 tries to reacquire the previously bound P2. If it cannot, it tries to get a free P. If there is none either, G8 is marked runnable and added to the global queue, and M2, with no P bound, goes to sleep (a long sleep, waiting to be recycled and destroyed by the GC).
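The return-from-syscall path can likewise be sketched (the names and strings are mine; the real logic is in the runtime's `exitsyscall`):

```go
package main

import "fmt"

// exitSyscall models scene 11's decision when a G and its M return
// from a non-blocking syscall: prefer the previously bound P, then
// any idle P; with neither, the G is parked on the global queue and
// the M sleeps without a P.
func exitSyscall(oldPFree, idlePExists bool) string {
	if oldPFree {
		return "reacquire the previous P and keep running"
	}
	if idlePExists {
		return "bind an idle P and keep running"
	}
	return "G goes to the global queue; M sleeps without a P"
}

func main() {
	fmt.Println(exitSyscall(false, true))
}
```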

Summary

In summary, the Go scheduler is very lightweight and simple, yet sufficient to support goroutine scheduling and give Go its powerful native concurrency. The essence of Go scheduling is distributing a large number of goroutines onto a small number of threads, and exploiting multi-core parallelism to achieve more powerful concurrency.


Origin: blog.csdn.net/csdniter/article/details/112028411