A Preliminary Study of Go Concurrency

1 Introduction

Here we will not go into the details of process and thread synchronization, the relationship between parallel and concurrent programs, or the overhead of multi-processing and multi-threading; instead, we cut straight to the concurrency threading model of the Go language and its internal implementation.

(1) Multi-threading model

The multithreading models describe the different ways in which user-level threads are mapped to kernel-level threads.

① Many-to-one model (M : 1)

Multiple user-level threads are mapped onto a single kernel-level thread, and thread management is done in user space.
In this mode, user-level threads are invisible (i.e., transparent) to the operating system.
Advantages:

    Thread context switches all occur in user space, avoiding switches into kernel mode, which benefits performance.

Shortcomings:

    All user threads are backed by a single kernel scheduling entity (one kernel thread), which means only one processor can be used at a time; this is unacceptable in a multiprocessor environment. In essence, user-level threads solve the concurrency problem but not the parallelism problem.

    If one thread enters kernel mode for an I/O operation and the kernel thread blocks waiting for data, all threads are blocked. User space can work around this with non-blocking I/O, but that still brings performance and complexity problems.

② One-to-one model (1:1)

Map each user-level thread to a kernel-level thread.
Each thread is scheduled independently by the kernel scheduler, so if one thread blocks, it does not affect other threads.

Advantages:

    With multi-core processor hardware, the kernel-level thread model supports true parallelism. When one thread blocks, other threads can continue executing, so concurrency is strong.

Shortcomings:

    Every user-level thread requires a corresponding kernel-level thread, so creating a thread is relatively expensive, which affects application performance.

③ Many-to-many model (M : N)

M user-level threads are multiplexed onto N kernel-level threads, combining the advantages of the previous two models.

This model requires the kernel thread scheduler and the user-space thread scheduler to cooperate. In essence, multiple user threads are bound to multiple kernel threads, so most thread context switches happen in user space, while the multiple kernel threads can make full use of processor resources.

(2) Scheduling implementation of the concurrency mechanism

Goroutines in Go implement the M:N threading model. Go's built-in scheduler lets each core of a multi-core CPU execute goroutines. The key to understanding how the goroutine mechanism works is understanding the implementation of the Go scheduler.
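As a rough illustration of the M:N model (a minimal sketch, not from the original text): the program below starts a large number of goroutines, which the runtime multiplexes onto a small number of OS threads; by default the number of Ps equals the number of logical CPUs.

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 10000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // trivial work; each goroutine costs a few KB of stack, not a whole OS thread
        }()
    }
    fmt.Println("goroutines right after spawning:", runtime.NumGoroutine())
    fmt.Println("logical CPUs (default number of Ps):", runtime.NumCPU())
    wg.Wait()
}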

① How the scheduler works

There are four important structures that support the entire scheduler implementation in the Go language: G, P, M, and the scheduler (Sched). The first three are defined in runtime.h, and the scheduler is defined in proc.c.

G is the core structure of the goroutine implementation. It contains the stack, the instruction pointer, and other information important for scheduling the goroutine, such as the channel it is blocked on.

The P structure is a Processor. Its main purpose is to execute goroutines, and it maintains a queue of goroutines, the runqueue. The Processor is the important part that gets us from N:1 scheduling to M:N scheduling.

The M structure is the Machine: a system thread managed by the operating system, on top of which goroutines run. M is a large structure that maintains the small-object memory cache (mcache), the currently executing goroutine, a random number generator, and much more.

The Sched structure is the scheduler; it maintains the queues that store Ms and Gs as well as some scheduler state.

Here we introduce the runtime.GOMAXPROCS function, which specifies the number of Ps rather than the number of Ms; this distinction deserves attention.
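A small sketch of this point (illustrative only): runtime.GOMAXPROCS sets the number of Ps and returns the previous setting; passing 0 only queries the current value without changing it.

package main

import (
    "fmt"
    "runtime"
)

func main() {
    prev := runtime.GOMAXPROCS(2) // at most 2 Ps execute Go code simultaneously
    fmt.Println("previous GOMAXPROCS:", prev)
    fmt.Println("current GOMAXPROCS:", runtime.GOMAXPROCS(0)) // an argument of 0 only queries the value
    fmt.Println("number of CPUs:", runtime.NumCPU())
}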

2. G

A G corresponds to a goroutine, that is, to an anonymous or named function that we want to execute concurrently with the go statement. The go statement submits a concurrent task to the Go runtime system, which then schedules and completes the task concurrently as requested.
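For example (a minimal sketch, not taken from the original text), both a named and an anonymous function can be submitted as Gs with the go statement:

package main

import (
    "fmt"
    "sync"
)

func greet(name string, wg *sync.WaitGroup) {
    defer wg.Done()
    fmt.Println("hello,", name)
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)

    go greet("named function", &wg) // a named function started as a goroutine

    go func() { // an anonymous function started as a goroutine
        defer wg.Done()
        fmt.Println("hello from an anonymous function")
    }()

    wg.Wait() // wait until both Gs have finished
}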

The following structure is defined in runtime.h:

struct G
{
    uintptr  stackguard;  // lower bound of the free space of the segmented stack
    uintptr  stackbase;   // base address of the segmented stack
    Gobuf    sched;       // saves the context when the goroutine is switched out
    uintptr  stack0;
    FuncVal* fnstart;     // function the goroutine runs
    void*    param;       // used to pass data; another goroutine sets param while this one sleeps, and this goroutine reads it when it wakes up
    int16    status;      // status: Gidle, Grunnable, Grunning, Gsyscall, Gwaiting, Gdead
    int64    goid;        // goroutine id
    G*       schedlink;
    M*       m;           // for debuggers, but offset not hard-coded
    M*       lockedm;     // G is locked and can only run on this M
    uintptr  gopc;        // pc of the go statement that created this goroutine
    ...
};
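To connect this with the "blocked channel" note above, here is a small illustrative example (an assumption-free use of standard channels; the state names in the comments only reflect the G statuses listed in the struct): a goroutine receiving from an empty channel is parked until another goroutine sends to it.

package main

import "fmt"

func main() {
    ch := make(chan string)
    done := make(chan struct{})

    go func() {
        msg := <-ch // this G parks on ch (conceptually Gwaiting) until a sender arrives
        fmt.Println("received:", msg)
        close(done)
    }()

    ch <- "wake up" // makes the waiting G runnable again
    <-done          // keep main alive until the print has happened
}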

3. P

P is what allows a G to run on an M. The Go runtime associates a P with, and dissociates it from, different Ms in a timely manner, so that the runnable Gs in the P can get running time when they need it.

As mentioned earlier, the number of Ps owned by a single Go program can be changed via runtime.GOMAXPROCS. Each P needs to be associated with an M so that its runnable Gs can be executed. When an M blocks in a system call, the runtime separates it from its associated P. If that P still has unrun Gs in its runnable queue, the runtime finds an idle M, or creates a new one, and associates it with the P so those Gs can run. (In this version of the runtime, the maximum number of Ps is 256.)

struct P
{
    Lock;
    uint32   status;     // Pidle, Prunning, Psyscall, Pgcstop, Pdead
    P*       link;
    uint32   schedtick;  // incremented each time a goroutine is scheduled
    M*       m;          // associated M (nil if idle)
    MCache*  mcache;

    G*       runq[256];  // runnable G queue
    int32    runqhead;
    int32    runqtail;

    // Available Gs (status == Gdead)
    G*       gfree;
    int32    gfreecnt;
    byte     pad[64];
};
Containers in P: the runnable G queue and the free G list.

4. M

An M represents a kernel thread. New Ms are created when there are not enough Ms to associate with Ps and run their runnable Gs.

When an M is created, it is added to the global M list. Next, its start function and the P to be associated with are set. Finally, the runtime creates a new kernel thread and associates it with the M, ready to run Gs.

struct M
{
    G*      g0;        // goroutine with the scheduling stack
    G*      gsignal;   // goroutine that handles signals
    void    (*mstartfn)(void);
    G*      curg;      // goroutine currently running on this M
    P*      p;         // associated P for executing Go code (nil if not executing Go code)
    P*      nextp;
    int32   id;
    int32   mallocing; // status flags
    int32   throwing;
    int32   gcing;
    int32   locks;
    int32   helpgc;    // non-zero means this M is helping with GC; the value is just a number
    bool    blockingsyscall;
    bool    spinning;
    Note    park;
    M*      alllink;   // links this M into allm
    M*      schedlink;
    MCache* mcache;
    G*      lockedg;   // G locked to this M
    M*      nextwaitm; // next M waiting for a lock
    GCStats gcstats;
    ...
};
g0 is a special goroutine with a scheduling stack. A normal goroutine's stack is a growable stack allocated on the heap, whereas g0's stack is the stack of the OS thread corresponding to the M.
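The lockedg/lockedm fields above can be observed from user code through runtime.LockOSThread, which pins the calling goroutine to its current M (a minimal, illustrative sketch; the "thread-affine work" is just a placeholder):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    done := make(chan struct{})
    go func() {
        runtime.LockOSThread() // this G may now only run on its current M
        defer runtime.UnlockOSThread()
        fmt.Println("doing thread-affine work (e.g. calling C code that relies on thread-local state)")
        close(done)
    }()
    <-done
}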

5. Scheduler

Scheduler containers: the scheduler's idle M list, idle P list, runnable G queue, and free G list.

Runtime system containers: global M-list, global P-list, and global G-list.

(1) Scheduling details

We represent Machines, Processors, and Goroutines with triangles, rectangles, and circles, respectively.

[Figure: triangles (M), rectangles (P), and circles (G)]

In the single-core scenario, all goroutines run in the same M system thread, and each M system thread maintains one Processor. At any time, a Processor runs only one goroutine, while the other goroutines wait in the runqueue. After a goroutine has run for its time slice, it gives up the context and goes back to the runqueue.

[Figure: goroutines rotating through a Processor's runqueue]
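A goroutine can also give up its slot voluntarily with runtime.Gosched. The following sketch (not from the original text) forces a single P and lets two goroutines take turns, each returning to the runqueue after every yield:

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    runtime.GOMAXPROCS(1) // a single P, as in the single-core scenario above

    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for j := 0; j < 3; j++ {
                fmt.Printf("goroutine %d, step %d\n", id, j)
                runtime.Gosched() // give up the P and go back to the runqueue
            }
        }(i)
    }
    wg.Wait()
}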

In the case of multi-core processors, in order to run goroutines, each M system thread will hold a Processor.

Under normal circumstances the scheduler follows the process above, but threads can block. Let's look at how goroutines are handled when a thread blocks or a runqueue runs out of work.

① Thread blocking

When the running goroutine blocks, for example in a system call, another system thread (M1) is created; the current M gives up its Processor, and the P moves to the new thread to keep running.
[Figure: P handed off to a new M during a syscall]
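The effect can be sketched in user code (a Unix-only, illustrative example; whether and when the runtime actually hands the P to another M depends on the platform and Go version): one goroutine blocks its M in a read(2) syscall while other goroutines keep running.

package main

import (
    "fmt"
    "sync"
    "syscall"
    "time"
)

func main() {
    fds := make([]int, 2)
    if err := syscall.Pipe(fds); err != nil { // Unix-only
        panic(err)
    }

    var wg sync.WaitGroup
    wg.Add(2)

    go func() {
        defer wg.Done()
        buf := make([]byte, 1)
        syscall.Read(fds[0], buf) // this M blocks in the syscall; its P can be given to another M
        fmt.Println("reader woke up")
    }()

    go func() {
        defer wg.Done()
        for i := 0; i < 3; i++ {
            fmt.Println("other goroutines keep running:", i)
            time.Sleep(10 * time.Millisecond)
        }
        syscall.Write(fds[1], []byte{1}) // unblock the reader
    }()

    wg.Wait()
}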

② Runqueue is exhausted

When one Processor's runqueue is empty, it has no goroutines to schedule, so it steals half of the goroutines from another Processor's runqueue.
[Figure: work stealing between Processors]
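A simplified, illustrative sketch of this "find a runnable G" logic follows. The names here (proc, findRunnable, the global queue parameter) are hypothetical; the real logic lives in the runtime's findrunnable and runqsteal functions and is considerably more involved.

package sched

type g struct{ id int64 }

type proc struct {
    runq []*g // local runnable queue
}

// findRunnable returns the next G for p: first from the local queue,
// then from the global queue, and finally by stealing half of another P's queue.
func (p *proc) findRunnable(all []*proc, global *[]*g) *g {
    // 1. Try the local run queue first.
    if len(p.runq) > 0 {
        next := p.runq[0]
        p.runq = p.runq[1:]
        return next
    }
    // 2. Then the global run queue.
    if len(*global) > 0 {
        next := (*global)[0]
        *global = (*global)[1:]
        return next
    }
    // 3. Finally, steal half of another P's run queue.
    for _, victim := range all {
        if victim == p || len(victim.runq) == 0 {
            continue
        }
        n := (len(victim.runq) + 1) / 2
        stolen := victim.runq[len(victim.runq)-n:]
        victim.runq = victim.runq[:len(victim.runq)-n]
        p.runq = append(p.runq, stolen...)
        return p.findRunnable(all, global) // now served from the local queue
    }
    return nil // nothing runnable anywhere; the real scheduler would park the M
}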


6. Summary

Go implements the M:N threading model with three core runtime structures: G (a goroutine), M (a kernel thread), and P (a logical processor that owns a runqueue of Gs). The scheduler associates Ps with Ms, hands a P off to another M when a goroutine blocks in a system call, and lets an idle P steal work from a busy one, so goroutines stay cheap while all processors are kept busy.

7. Reference

http://morsmachine.dk/go-scheduler
