Interpretation of source code of libco coroutine library

Coroutines, also known as user-level threads, are scheduled at the application layer, which avoids the thread switches that occur when a thread blocks in a system call. There are many coroutine implementations today. Because WeChat makes heavy use of libco, a coroutine library developed in-house at Tencent, I chose the open-source libco to study and to learn the basic ideas behind coroutines.

1. Basic Principles

A coroutine can essentially be seen as a subroutine or function. A thread can host many coroutines, but only one of them runs at any given moment. A coroutine is switched off the thread either because it performs a blocking operation or because it voluntarily gives up the thread. For example, suppose there are 10 coroutines and the current thread is running coroutine 1, which then performs a blocking recv. The coroutine scheduler detects this, switches coroutine 1 out, and schedules coroutine 2 to run. Without a coroutine scheduler, coroutine 1 would block in the recv system call because the data has not arrived, and the thread would go to sleep. The operating system would then switch threads and schedule another thread. Thread switching is expensive, on the order of tens of microseconds (20 us in a colleague's test), so even if the newly scheduled thread also works on user tasks, those tasks still pay tens of microseconds for the switch. Switching between coroutines, by contrast, takes only a few hundred nanoseconds (0.35 us, or 350 ns, in the same colleague's test). This is where coroutines pay off.

The rest of this article walks through parts of the libco source code. The article "C++ open source coroutine library libco-principle and application.pdf" explains the principles of libco in depth without being dry. I highly recommend reading it first.

libco uses an asymmetric coroutine mechanism: if the current coroutine A switches to coroutine B, and B does not switch to any further coroutine, then execution returns to coroutine A once B finishes executing.
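The asymmetric behaviour can be tried out with a small sketch. It assumes a Linux/glibc system and uses the POSIX ucontext functions (getcontext, makecontext, swapcontext) in place of libco's own context-switch routine, so it is not libco's implementation. All sketch_co_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical names; single-threaded sketch built on ucontext, not libco. */
enum { SKETCH_STACK_SIZE = 64 * 1024 };

typedef struct sketch_co {
    ucontext_t ctx;                   /* saved context of this coroutine */
    ucontext_t caller;                /* context of whoever resumed it last */
    void (*fn)(struct sketch_co *);   /* user function run by the coroutine */
    char *stack;                      /* private stack taken from the heap */
    int finished;                     /* set to 1 once fn has returned */
    int value;                        /* scratch slot used by the example */
} sketch_co;

static sketch_co *sketch_current;     /* coroutine that is running right now */

/* Entry trampoline: run the user function, then fall back to the resumer. */
static void sketch_co_entry(void) {
    sketch_co *co = sketch_current;
    co->fn(co);
    co->finished = 1;
    /* Returning here activates uc_link, which is &co->caller. */
}

sketch_co *sketch_co_create(void (*fn)(sketch_co *)) {
    sketch_co *co = calloc(1, sizeof *co);
    if (co == NULL) abort();
    co->stack = malloc(SKETCH_STACK_SIZE);
    if (co->stack == NULL || getcontext(&co->ctx) != 0) abort();
    co->fn = fn;
    co->ctx.uc_stack.ss_sp = co->stack;
    co->ctx.uc_stack.ss_size = SKETCH_STACK_SIZE;
    co->ctx.uc_link = &co->caller;    /* where to go when the function ends */
    makecontext(&co->ctx, sketch_co_entry, 0);
    return co;
}

/* Switch to co; returns only when co yields or finishes. */
void sketch_co_resume(sketch_co *co) {
    assert(!co->finished);
    sketch_co *prev = sketch_current;
    sketch_current = co;
    swapcontext(&co->caller, &co->ctx);
    sketch_current = prev;
}

/* Give the CPU back to the coroutine (or main flow) that resumed us. */
void sketch_co_yield(void) {
    sketch_co *co = sketch_current;
    swapcontext(&co->ctx, &co->caller);
}

void sketch_co_destroy(sketch_co *co) {
    free(co->stack);
    free(co);
}

/* Example body: publish 1, 2, 3, yielding after each value. */
void sketch_counter(sketch_co *self) {
    for (int i = 1; i <= 3; i++) {
        self->value = i;
        sketch_co_yield();
    }
}
```

Creating a coroutine with sketch_co_create(sketch_counter) and calling sketch_co_resume() on it repeatedly hands control back to the caller after every yield, and once more when the function ends. The coroutine never chooses its successor, which is what "asymmetric" refers to.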

2. libco basic framework

The basic framework of libco is as follows (quoted from "C/C++ coroutine library libco: How WeChat Completes Asynchronous Transformation Beautifully"):

The coroutine interface layer implements the basic coroutine primitives. Simple interfaces such as co_create and co_resume create and resume coroutines. The co_cond_signal family of interfaces provides a coroutine-level semaphore that coroutines can use to synchronize and communicate with each other.

The system function Hook layer converts synchronous system APIs into asynchronous execution. For the common synchronous network interfaces, the Hook layer registers the network request as an asynchronous event and then waits for the event-driven layer to wake the coroutine up so that it can continue.

The event-driven layer implements a simple and efficient asynchronous network framework, including the events and timeout callbacks such a framework needs. For requests coming from the Hook layer, registering an event and receiving its callback are essentially the yield and resume of a coroutine.

This article gives readers a general understanding of the framework and principles of libco coroutines by explaining several main functions of the interface layer. The next article will explain how libco handles event loops.

Below we analyze several main coroutine functions one by one.

3. Main function source code analysis

co_create
First, let's look at the function that creates a coroutine. The source code is as follows:

```
int co_create( stCoRoutine_t **ppco,const stCoRoutineAttr_t *attr,pfn_co_routine_t pfn,void *arg )
{
	//initialize the coroutine environment of this thread on first use
	if( !co_get_curr_thread_env() )
	{
		co_init_curr_thread_env();
	}
	stCoRoutine_t *co = co_create_env( co_get_curr_thread_env(), attr, pfn,arg );
	*ppco = co;
	return 0;
}

void co_init_curr_thread_env()
{
	pid_t pid = GetPid();
	g_arrCoEnvPerThread[ pid ] = (stCoRoutineEnv_t*)calloc( 1,sizeof(stCoRoutineEnv_t) );
	stCoRoutineEnv_t *env = g_arrCoEnvPerThread[ pid ];

	env->iCallStackSize = 0;
	//the first coroutine of the environment is the main coroutine
	struct stCoRoutine_t *self = co_create_env( env, NULL, NULL,NULL );
	self->cIsMain = 1;

	env->pending_co = NULL;
	env->occupy_co = NULL;

	coctx_init( &self->ctx );

	//the main coroutine sits at the bottom of the call stack
	env->pCallStack[ env->iCallStackSize++ ] = self;

	stCoEpoll_t *ev = AllocEpoll();
	SetEpoll( env,ev );
}
```

The first line of co_create() checks whether the coroutine environment of the current thread has been initialized. If it has not, co_init_curr_thread_env() is called to initialize it. This allocates the environment g_arrCoEnvPerThread[ GetPid() ] and creates its first coroutine, which is stored at the bottom of env->pCallStack with its cIsMain flag set to 1. iCallStackSize records the number of coroutine layers, which is 1 at this point. The AllocEpoll() function initializes the two lists of the environment, pstActiveList and pstTimeoutList, which record the active coroutines and the timed-out coroutines respectively. The environment is initialized only once per thread. After initialization, co_create_env() is called to create the new coroutine. The env field of the new coroutine's structure always points to the current coroutine environment g_arrCoEnvPerThread[ GetPid() ]. Creating the coroutine does nothing else; in particular, it does not start running.
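The once-per-thread initialization can be sketched in isolation. The sketch below assumes C11 thread-local storage instead of the global array indexed by thread id that the listing uses, and every sketch_* name is hypothetical. It only mirrors the shape of the environment (a call stack whose bottom entry is the main coroutine) and does no context switching.

```c
#include <stdlib.h>

/* Hypothetical names; they only mirror the shape of the structures above. */
enum { SKETCH_MAX_DEPTH = 128 };   /* arbitrary bound for this sketch */

typedef struct sketch_env sketch_env;

typedef struct sketch_co {
    sketch_env *env;   /* every coroutine points back at its thread's env */
    int is_main;       /* 1 only for the implicit main coroutine */
} sketch_co;

struct sketch_env {
    sketch_co *call_stack[SKETCH_MAX_DEPTH];
    int call_stack_size;
};

/* Assumption: thread-local storage replaces the per-thread global array. */
static _Thread_local sketch_env *sketch_thread_env;

static sketch_co *sketch_create_in_env(sketch_env *env) {
    sketch_co *co = calloc(1, sizeof *co);
    if (co == NULL) abort();
    co->env = env;
    return co;
}

/* Return the environment of the calling thread, creating it on first use. */
sketch_env *sketch_get_env(void) {
    if (sketch_thread_env == NULL) {
        sketch_env *env = calloc(1, sizeof *env);
        if (env == NULL) abort();
        sketch_co *self = sketch_create_in_env(env);
        self->is_main = 1;
        /* The main coroutine becomes layer 1 of the call stack. */
        env->call_stack[env->call_stack_size++] = self;
        sketch_thread_env = env;
    }
    return sketch_thread_env;
}

/* Create a coroutine record; it is not started and not pushed on the stack. */
sketch_co *sketch_new_co(void) {
    return sketch_create_in_env(sketch_get_env());
}
```

Calling sketch_get_env() twice on the same thread returns the same pointer, and a coroutine made with sketch_new_co() shares that environment while the call stack still holds only the main coroutine.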

co_resume
```
void co_resume( stCoRoutine_t *co )
{
	stCoRoutineEnv_t *env = co->env;
	stCoRoutine_t *lpCurrRoutine = env->pCallStack[ env->iCallStackSize - 1 ];
	if( !co->cStart )
	{
		coctx_make( &co->ctx,(coctx_pfn_t)CoRoutineFunc,co,0 );
		co->cStart = 1;
	}
	env->pCallStack[ env->iCallStackSize++ ] = co;
	co_swap( lpCurrRoutine, co );
}
```
The co_resume() function switches to a coroutine, and it is also the function that starts a coroutine. Its first line gets the coroutine environment env of the current thread, and the second line gets the currently executing coroutine, that is, the coroutine that is about to be switched out. Next, it checks whether the target coroutine co has ever been started. If not, it prepares a context for co and sets the cStart field to 1. The context is prepared in the coctx_make() function, which assigns the function pointer CoRoutineFunc to regs[0] of co->ctx, so that a later context switch can jump to the address stored in regs[0] and execute from there. Once the context is ready, co is pushed onto the coroutine stack of the environment env, which marks co as the most recent coroutine. Note that this does not mean co appears only at the top of the coroutine stack; it may also be stored at other positions in the stack.
Finally, co_swap() is called. It switches the context to that of co, which then runs the function specified for it. The coroutine that was switched out stays suspended until co actively yields the CPU, at which point its execution resumes. Note that all of these coroutines run in the current thread, that is, they execute serially. After co_resume() is called, the execution context jumps into the code of co. Because co_swap() does not return until co gives up the CPU, and co may itself resume another coroutine, the call to co_swap() may take a long time to return.
```
void co_swap(stCoRoutine_t* curr, stCoRoutine_t* pending_co)
{
 	stCoRoutineEnv_t* env = co_get_curr_thread_env();

	//get curr stack sp
	char c;
	curr->stack_sp= &c;

	if (!pending_co->cIsShareStack)
	{
		env->pending_co = NULL;
		env->occupy_co = NULL;
	}
	else 
	{
		env->pending_co = pending_co;
		//get last occupy co on the same stack mem
		stCoRoutine_t* occupy_co = pending_co->stack_mem->occupy_co;
		//set pending co to occupy thest stack mem;
		pending_co->stack_mem->occupy_co = pending_co;

		env->occupy_co = occupy_co;
		if (occupy_co && occupy_co != pending_co)
		{
			save_stack_buffer(occupy_co);
		}
	}

	//swap context
	coctx_swap(&(curr->ctx),&(pending_co->ctx) );
	//stack buffer may be overwrite, so get again;
	stCoRoutineEnv_t* curr_env = co_get_curr_thread_env();
	stCoRoutine_t* update_occupy_co =  curr_env->occupy_co;
	stCoRoutine_t* update_pending_co = curr_env->pending_co;
	
	if (update_occupy_co && update_pending_co && update_occupy_co != update_pending_co)
	{
		//resume stack buffer
		if (update_pending_co->save_buffer && update_pending_co->save_size > 0)
		{
			memcpy(update_pending_co->stack_sp, update_pending_co->save_buffer, update_pending_co->save_size);
		}
	}
}
```
In the co_swap() function, since shared-stack mode is not in use here, pending_co->cIsShareStack is 0 and the if branch is executed. Then coctx_swap() is executed. It is written in assembly, and it jumps from the context of curr to the context of pending_co, which on first entry starts executing in CoRoutineFunc(). From this point the CPU of the current thread is executing the code of the pending_co coroutine, and the code below coctx_swap() does not run until pending_co actively gives up the CPU. When it does, update_occupy_co is NULL, so the following if statement is skipped. In effect there is no code below coctx_swap(), and co_swap() returns directly into the curr coroutine.
co_yield
co_yield() and co_yield_ct() do the same thing: both make the current coroutine give up the CPU.
```
void co_yield_env( stCoRoutineEnv_t *env )
{
	
	stCoRoutine_t *last = env->pCallStack[ env->iCallStackSize - 2 ];
	stCoRoutine_t *curr = env->pCallStack[ env->iCallStackSize - 1 ];

	env->iCallStackSize--;

	co_swap( curr, last);
}
```
The second line of co_yield_env() gets the currently executing coroutine, that is, the top of the coroutine stack of the current coroutine environment. The first line gets the entry just below the top, which is the coroutine last that was switched out most recently. This also shows that a libco coroutine can give up the CPU only to the coroutine that was switched out last. The final line calls co_swap(), which, as described earlier, continues execution in the context of the last coroutine, that is, it returns to the co_swap() call inside the earlier co_resume() and carries on from there.
When the coroutine's function returns normally, execution continues in CoRoutineFunc(), which sets the coroutine's cEnd to 1 to mark it as finished and then calls co_yield_env() once to yield the CPU and switch back to the coroutine that was switched out last.
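The bookkeeping that co_resume() and co_yield_env() perform on the coroutine stack can be modelled without any context switching. The sketch below tracks plain integer ids instead of stCoRoutine_t pointers; the model_* names and the depth limit are assumptions made for this illustration.

```c
#include <assert.h>

enum { MODEL_MAX_DEPTH = 128 };   /* arbitrary bound for this sketch */

typedef struct {
    int stack[MODEL_MAX_DEPTH];   /* coroutine ids; index 0 is the main one */
    int size;                     /* plays the role of iCallStackSize */
} model_env;

void model_init(model_env *env, int main_id) {
    env->size = 0;
    env->stack[env->size++] = main_id;
}

/* The running coroutine is always the top of the stack. */
int model_running(const model_env *env) {
    return env->stack[env->size - 1];
}

/* resume: push the target, which becomes the running coroutine. */
void model_resume(model_env *env, int target) {
    assert(env->size < MODEL_MAX_DEPTH);
    env->stack[env->size++] = target;
}

/* yield: pop the top and return the id that gets the CPU back. */
int model_yield(model_env *env) {
    assert(env->size >= 2);   /* the main coroutine has nobody to yield to */
    env->size--;
    return model_running(env);
}
```

Starting from main coroutine 0, resuming 1 and then 2 leaves 2 on top. A yield hands the CPU back to 1, never directly to 0, which matches the rule that a coroutine yields only to the one that was switched out last.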
One point confused me at first, and I suspected it would cause a stack overflow. After co_yield_env() enters co_swap() and calls coctx_swap() to switch to the context of the last coroutine, the local variables of co_swap() in the current coroutine are still live. It seemed that they would stay on the stack after the switch and never be released until control returned to the coroutine of the main function. This is a misunderstanding. Each coroutine runs on its own stack, a block of heap memory allocated with malloc when co_create() creates the coroutine, so whatever co_swap() leaves behind stays in the suspended coroutine's own stack and does not pile up on the stack of the coroutine being switched to. In addition, the CPU has many working registers and co_swap() has few local variables, so most of them are kept in general-purpose registers rather than in memory. coctx_swap() saves the CPU's general-purpose registers into regs[1] ~ regs[6] of the current coroutine's coctx_t structure, then loads regs[1] ~ regs[6] of the last coroutine's coctx_t structure into the registers, and with that the CPU's flow of execution switches to the last coroutine.
co_release
The co_release() function is relatively simple: it releases the coroutine's resources once the coroutine has ended.
```
void co_release( stCoRoutine_t *co )
{
	if( co->cEnd )
	{
		free( co );
	}
}
```

co_self
The co_self() function returns the currently executing coroutine, which is simply the coroutine at the top of the coroutine stack of the current thread's coroutine environment.
```
stCoRoutine_t *co_self()
{
	return GetCurrThreadCo();
}
stCoRoutine_t *GetCurrThreadCo( )
{
	stCoRoutineEnv_t *env = co_get_curr_thread_env();
	if( !env ) return 0;
	return GetCurrCo(env);
}
stCoRoutine_t *GetCurrCo( stCoRoutineEnv_t *env )
{
	return env->pCallStack[ env->iCallStackSize - 1 ];
}
```

co_enable_hook_sys
libco wraps system calls and adds a layer of hooks in front of them, for functions such as send/recv/condition_wait. With this layer of hooks, a system call that would block can switch to another coroutine instead of blocking the thread. The co_enable_hook_sys() function enables the hooks for the current coroutine. If the hooks are not enabled, the native system calls are used directly.
```
void co_enable_hook_sys()
{
	stCoRoutine_t *co = GetCurrThreadCo();
	if( co )
	{
		co->cEnableSysHook = 1;
	}
}
```
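The effect of the per-coroutine flag can be illustrated with a read wrapper. The sketch below is an assumption about the general idea, not libco's actual hook code: with the flag off it falls through to the native blocking read(), and with the flag on it tries a non-blocking read and merely counts the point where a real scheduler would register the descriptor with the event loop and yield. The hookdemo_* names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-coroutine state holding the hook flag. */
typedef struct {
    int enable_sys_hook;   /* plays the role of cEnableSysHook */
    int yield_count;       /* how many times a yield would have happened */
} hookdemo_co;

/* Read wrapper: native path when hooks are off, non-blocking path when on. */
ssize_t hookdemo_read(hookdemo_co *co, int fd, void *buf, size_t n) {
    if (co == NULL || !co->enable_sys_hook) {
        return read(fd, buf, n);          /* native call, may block */
    }

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) return -1;

    ssize_t got = read(fd, buf, n);
    int saved_errno = errno;

    fcntl(fd, F_SETFL, flags);            /* restore the caller's flags */
    errno = saved_errno;

    if (got == -1 && (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK)) {
        /* A real hook would register fd for readability and yield here,
           then retry the read after the event loop wakes the coroutine. */
        co->yield_count++;
    }
    return got;
}
```

On an empty pipe the hooked path returns -1 immediately and records one would-be yield instead of putting the thread to sleep. Once data has been written to the pipe, the same call returns the bytes without touching the counter.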
