Coroutines

Alibaba open-sourced coobjc two days ago, and it gained more than 2,000 stars within a few days. I looked through the source code, mainly at the coroutine implementation. Over the weekend I spent two days implementing part of it myself, referring to Go's predecessor libtask and cloudwu's coroutine library, and read some articles to organize my notes a little.

Coroutine

Coroutines are computer-program components that generalize subroutines for non-preemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations. Coroutines are well-suited for implementing familiar program components such as cooperative tasks, exceptions, event loops, iterators, infinite lists and pipes.

Process -> Thread -> Coroutine

Coroutines exist at the compiler/language level; processes (Process) and threads (Thread) exist at the operating-system level.

Processes and threads are scheduled preemptively by the OS: the scheduler saves the current context and later resumes computation from where it was suspended. The point of suspension is unpredictable; how many instructions run each time depends on the CPU time slice. When the allocated time slice expires, the OS forcibly suspends the task, and developers cannot control this precisely.

A coroutine is a lightweight user-mode thread with non-preemptive scheduling: switching away from the current coroutine is controlled by the coroutine itself. Coroutine frameworks are generally designed in a 1:N mode, where one thread serves as a container for multiple coroutines. So who switches between these coroutines? Coroutines voluntarily give up the CPU: each coroutine pool has a scheduler, and this scheduler is passive, meaning it never preempts on its own. When a coroutine finds it cannot make progress (for example, it is waiting asynchronously for network data that has not arrived yet), it notifies the scheduler. The scheduler then picks, according to a pre-designed scheduling algorithm, the coroutine that currently most needs the CPU, switches the CPU context to that coroutine, and hands over control, until that coroutine in turn has to wait or calls the API that voluntarily yields the CPU, triggering the next scheduling round.

Pros and cons

Advantages:

  • Coroutines are lighter: creation is cheaper and memory consumption is lower.
    Coroutines can be implemented entirely in user mode, and each coroutine is much smaller than a thread, so one process can accommodate a very large number of coroutines.
  • The cooperative user-mode scheduler reduces CPU context-switching overhead and improves the CPU cache hit rate.
    Compared with preemptive scheduling, cooperative scheduling switches contexts more cheaply and keeps caches hot. Compared with multithreading, the more threads there would be, the more obvious the coroutines' performance advantage. Process/thread switches must go through the kernel, while coroutine switches do not: they are implemented with user-mode stacks, which is lighter and faster. This is a great advantage in I/O-heavy programs; in a crawler, for example, opening hundreds of threads slows things down noticeably, while opening hundreds of coroutines does not.

However, coroutines give up the priority concept of native threads. A kernel scheduler always prioritizes I/O-bound tasks so they respond as quickly as possible; with coroutines, a long-running compute task can hurt the response latency of I/O tasks. If a coroutine in the thread is CPU-intensive and performs no I/O, it never triggers the scheduler, so the other coroutines in that thread cannot run. Programmers must avoid this situation themselves.

In addition, a single-threaded coroutine scheme cannot fundamentally avoid blocking: file operations and memory page faults, for example, still affect latency.

  • Less synchronous locking, better overall performance.
    Coroutine schemes are built on an event loop, which reduces how often synchronous locks are needed. If there is real contention, however, the critical section is still not protected automatically, so places that need protection must still take a coroutine lock.
  • You can write asynchronous code with synchronous thinking, that is, use synchronous logic while the coroutine scheduler drives the callbacks.
    Note that coroutines can greatly reduce the use of callbacks but cannot completely replace them; in purely event-driven programming, callbacks remain the better fit.

Disadvantages:

  • Coroutine code must not perform blocking operations, otherwise the entire thread blocks (coroutines live at the language level; threads and processes live at the operating-system level)
  • Special care is needed when using global variables and object references
  • Coroutines handle I/O-intensive programs efficiently, but CPU-intensive work is not their strength.
    As noted above, a CPU-intensive coroutine with no I/O never triggers the scheduler, so the other coroutines in its thread cannot run; the programmer must avoid this situation.

Applicable scenarios

  • High-performance computing: sacrificing fairness for throughput.
    The earliest success stories for coroutines came from high-performance computing, where cooperative scheduling can trade fairness for throughput compared with preemptive scheduling.
  • I/O-bound tasks.
    In I/O-intensive programs, I/O operations are much slower than CPU operations, so the CPU frequently waits on I/O. With synchronous I/O, the system switches threads so the operating system can do other work while the I/O is in flight. The code matches human thinking habits, but the large amount of thread switching wastes a lot of performance.

So asynchronous I/O was invented: a callback fires when the data arrives, which reduces the performance lost to thread switching. But the drawback is also large. The biggest problem is that it breaks humans' linear mode of thinking: you must cut a logically linear process into several fragments, where each fragment starts and ends on an asynchronous event. With some training you can adapt to this mode of thinking, but it is still an extra mental burden. Most popular programming languages are imperative, matching how people think, and a program normally reads as a roughly linear structure. Asynchronous callbacks break both the continuity of thought and the continuity of the program, forcing readers to spend more energy. These are extra maintenance costs for a software project, which is why many companies do not favor asynchronous-callback frameworks such as node.js or RxJava, even though these frameworks can improve a program's concurrency.

Coroutines solve this problem well. Write an I/O operation as a coroutine: when the I/O is triggered, the CPU is automatically handed to other coroutines, and coroutine switches are very cheap. By wrapping asynchronous I/O this way, coroutines preserve performance while keeping the code easy to write and read.

  • Generator-style streaming computation.
    This eliminates callback hell and uses a synchronous model to reduce development cost while keeping the benefits of more flexible control flow, such as sending three requests concurrently. Here, using the stack economically fully exploits the "lightweight" advantage.

ucontext

Coroutines generally have two types of implementation: stackless and stackful. The ucontext functions described below are the building blocks for a stackful implementation.

structure

 

```c
struct ucontext {
    /*
     * Keep the order of the first two fields. Also,
     * keep them the first two fields in the structure.
     * This way we can have a union with struct
     * sigcontext and ucontext_t. This allows us to
     * support them both at the same time.
     * note: the union is not defined, though.
     */
    sigset_t    uc_sigmask;       /* signals blocked in this context */
    mcontext_t  uc_mcontext;      /* machine-specific saved context: registers of the calling thread, etc. */

    struct __ucontext *uc_link;   /* context to resume when this context ends */
    stack_t     uc_stack;         /* stack used by this context */
    int         __spare__[8];
};
```

getcontext

 

```c
int getcontext(ucontext_t *ucp);
```

This function initializes the ucontext_t structure pointed to by ucp (used to save the context of the current execution state) and fills it with the currently active context.

setcontext

 

```c
int setcontext(const ucontext_t *ucp);
```

This function restores the user context pointed to by ucp. A successful call does not return. The context pointed to by ucp should have been produced by getcontext() or makecontext().

If the context was produced by getcontext(), execution continues as if returning from that getcontext() call.
If the context was produced by makecontext(), execution switches to the function given as the second argument of that makecontext() call. When that function returns, execution continues with the context pointed to by the uc_link field that was set before the makecontext() call; if uc_link is NULL, the thread exits.
On success, getcontext() returns 0 and setcontext() does not return. On error, both return -1 and set errno appropriately.

makecontext

 

```c
void makecontext(ucontext_t *ucp, void (*func)(void), int argc, ...);
```

This function modifies the context pointed to by ucp, which must first have been initialized by getcontext(). When this context is later restored with swapcontext() or setcontext(), execution switches to func, called with the arguments that follow argc in the makecontext() call. The application must ensure argc matches the number of arguments passed, and that they are all int-sized values; otherwise the behavior is undefined. Before calling makecontext(), the application must allocate a new stack for the context (ucp->uc_stack) and set ucp->uc_link, which decides which context is restored when func returns.

swapcontext

 

```c
int swapcontext(ucontext_t *restrict oucp, const ucontext_t *restrict ucp);
```

This function saves the current context into the structure pointed to by oucp and activates the context pointed to by ucp.
On successful completion, swapcontext() returns 0 (when the saved context is later resumed). Otherwise it returns -1 and sets errno appropriately.
swapcontext() may fail for the following reason:
ENOMEM — the ucp context does not have enough stack space left to complete the operation.

Practical notes on ucontext coroutines

getcontext, makecontext and swapcontext can be wrapped into a cooperative coroutine API similar to Lua's; the code must actively yield the CPU.
The coroutine stack is allocated on the heap with malloc. It is used just like the native stack on 64-bit systems: addresses grow downward. The size set in uc_stack.ss_size does not seem to have much practical effect: once in use, if the coroutine exceeds the allocated block it simply keeps growing toward lower heap addresses. This causes out-of-bounds use of heap memory; data allocated earlier on the heap gets corrupted, leading to all kinds of unpredictable behavior, and the eventual core dump gives no real clue to the cause.
Estimating the stack size needed by a coroutine function is hard: the local variables of every API called inside the coroutine are allocated on the coroutine's stack, and some are unpredictable — a third-party API, for example, may use very large locals. In practice you can allocate the stack with mmap and use mprotect to set a GUARD_PAGE on the allocated memory; a stack overflow then faults at a precisely known location, and you can adjust the stack size to allocate accordingly.


cloudwu coroutine

cloudwu's coroutine library (by Yun Feng) is a wrapper over ucontext.

schedule

 

```c
struct schedule {
    char stack[STACK_SIZE];  // shared stack, stored in the schedule itself
    ucontext_t main;         // ucontext_t: a structure recording context information
    int nco;                 // number of coroutines
    int cap;                 // capacity
    int running;             // id of the currently running coroutine
    struct coroutine **co;   // array of coroutine pointers
};
```

coroutine

 

```c
struct coroutine {
    coroutine_func func;   // function the coroutine runs
    void *ud;              // argument for func
    ucontext_t ctx;        // structure recording context information
    struct schedule *sch;  // back pointer to the schedule
    ptrdiff_t cap;         // capacity of the saved stack
    ptrdiff_t size;        // current size of the saved stack
    int status;
    char *stack;           // saved copy of this coroutine's stack
};
```

coroutine_new

 

```c
int coroutine_new(struct schedule *S, coroutine_func func, void *ud);
```

Creates a coroutine and adds it to the schedule's coroutine list; func is the function it executes and ud is the argument passed to func. Returns the id of the created coroutine within the schedule.

coroutine_yield

 

```c
void coroutine_yield(struct schedule *S);
```

Suspends the coroutine currently executing in scheduler S and switches back to the main function.

coroutine_resume

 

```c
void coroutine_resume(struct schedule *S, int id);
```

Resumes the coroutine with the given id in scheduler S.

coroutine_close

 

```c
void coroutine_close(struct schedule *S);
```

Closes the schedule and destroys all coroutines in it.



Author: Small Ryosuke
Link: https://www.jianshu.com/p/2782f8c49b2a
Source: Jianshu
Copyright belongs to the author. For commercial reproduction, please contact the author for authorization; for non-commercial reproduction, please indicate the source.



Origin blog.csdn.net/majianting/article/details/103895407