1. The realization principle of the function stack

2. Coroutine function: CoRoutineFunc

3. Create a coroutine context environment: coctx_make

4. The context switch of the coroutine: co_swap

When co_resume is called to run a coroutine, two functions come into play: coctx_make (called only on the coroutine's first resume) and co_swap. The former creates the coroutine's context, and the latter switches contexts. These two functions are the key to implementing coroutines.

Before looking at how a coroutine's context is created and switched, let's first review how the function stack works.

1. The realization principle of the function stack

1.1 Function stack frame

When a program runs, Linux allocates memory for it. This memory includes the code segment, data segment, heap, stack, and so on. The heap grows from low addresses toward high addresses, while the stack grows from high addresses toward low addresses. The following figure shows the structure of a stack.

The figure details two stack frames, those of function A and function B. At the top of function A's stack frame is an area marked with an ellipsis. It holds the register values saved from the previous stack frame and the local variables that function A itself creates. Below it, parameter n down to parameter 1 are the arguments that function A passes to function B. So how does function B get at them? The answer is through registers.

During computation the CPU keeps many values in registers, and the number and roles of those registers differ between hardware architectures. On 32-bit x86, register %esp holds the stack pointer (the top of the current stack frame) and %ebp holds the frame pointer (the bottom of the current stack frame), so %esp and %ebp together delimit the current frame. Besides these two, there are general-purpose registers (%eax, %edx, etc.) that hold temporary values while the program runs. Executing pushl pushes a value onto the stack and decreases %esp by 4 bytes. Executing popl pops a value off the stack and increases %esp by 4 bytes. In general, %esp always points to the top of the stack.
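The pushl/popl rule can be made concrete with a minimal C model of a downward-growing stack. This is only a sketch under assumptions. The 16-word toy stack, its top address 0x1000, and the names stack_mem, slot, pushl and popl are all hypothetical, and a 32-bit integer stands in for each stack slot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy stack: 16 four-byte slots whose highest address is 0x1000.
   esp holds a byte address, as the real register does on 32-bit x86. */
#define STACK_TOP   0x1000u
#define STACK_WORDS 16u
#define STACK_BASE  (STACK_TOP - 4u * STACK_WORDS)

static uint32_t stack_mem[STACK_WORDS];
static uint32_t esp = STACK_TOP;   /* empty stack: esp sits at the high end */

/* Map a byte address inside the toy stack to its 4-byte slot. */
static uint32_t *slot(uint32_t addr) {
    assert(addr >= STACK_BASE && addr < STACK_TOP && addr % 4u == 0);
    return &stack_mem[(addr - STACK_BASE) / 4u];
}

/* pushl: first move esp down by 4 bytes, then store at the new top. */
static void pushl(uint32_t value) {
    esp -= 4u;
    *slot(esp) = value;
}

/* popl: first read the value at the top, then move esp up by 4 bytes. */
static uint32_t popl(void) {
    uint32_t value = *slot(esp);
    esp += 4u;
    return value;
}
```

After pushl(534) and pushl(1057), esp is 8 bytes below its starting value. Two popl calls then return 1057 and 534, in that order, and move esp back to where it started.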

With these register basics in place, we can see how function B obtains the arguments passed by function A. Taking %ebp as function B's frame pointer, parameter 1 is at %ebp + 8, parameter 2 is at %ebp + 12, and in general parameter n is at %ebp + 4 + 4 * n. In other words, the parameters are found relative to the frame pointer, and they are there because function A placed them in advance. In addition, the return address is stored below all the parameters, at %ebp + 4. It is the address of the instruction to execute after function B returns.
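These offsets can be written down as a tiny address calculator. This is a sketch assuming 32-bit cdecl with every argument exactly 4 bytes wide. Larger arguments such as doubles or structs would shift the offsets. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Inside the callee, after "pushl %ebp; movl %esp, %ebp":
     %ebp + 0  -> the caller's saved %ebp
     %ebp + 4  -> return address pushed by call
     %ebp + 8  -> parameter 1, %ebp + 12 -> parameter 2, ...
   Assumes every argument occupies exactly one 4-byte slot. */
static uint32_t saved_ebp_addr(uint32_t ebp)   { return ebp; }
static uint32_t return_addr_addr(uint32_t ebp) { return ebp + 4u; }

/* Address of parameter n (1-based): %ebp + 4 + 4 * n. */
static uint32_t param_addr(uint32_t ebp, unsigned n) {
    assert(n >= 1);
    return ebp + 4u + 4u * n;
}
```

For example, with %ebp = 0x1000, parameter 1 is at 0x1008, parameter 2 is at 0x100C, and the return address is at 0x1004.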

Now look at function B. At the top of its stack frame is the saved %ebp, which is function A's frame pointer. Because there is only one %ebp register, the old value must be saved when a new function's frame is pushed and restored when that frame is popped. Below the saved frame pointer are other saved registers and function B's own local variables. Next comes the argument build area: if function B is about to call another function, it prepares that call's arguments here. So function B's stack frame has the same structure as function A's.

1.2 Function call examples

int swap_add(int *xp, int *yp); // declared before caller uses it

int caller()
{
    int arg1 = 534;
    int arg2 = 1057;

    int sum = swap_add(&arg1, &arg2); // swaps arg1 and arg2, returns their sum
    int diff = arg1 - arg2;

    return sum * diff;
}

int swap_add(int *xp, int *yp)
{
    int x = *xp;
    int y = *yp;

    *xp = y;
    *yp = x;

    return x + y;
}
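Working through the values helps when reading the assembly below. swap_add returns 534 + 1057 = 1591 and swaps the two variables. As a result, diff is 1057 - 534 = 523, and caller returns 1591 * 523 = 832093. The sketch below recomputes this while recording the intermediate values. struct caller_trace and trace_caller are hypothetical names and are not part of the original example.

```c
#include <assert.h>

/* Intermediate values of the caller/swap_add example (hypothetical helper). */
struct caller_trace {
    int sum;    /* value returned by swap_add */
    int arg1;   /* arg1 after the swap */
    int arg2;   /* arg2 after the swap */
    int diff;   /* arg1 - arg2 after the swap */
    int result; /* sum * diff, i.e. caller's return value */
};

static int swap_add(int *xp, int *yp) {
    int x = *xp;
    int y = *yp;
    *xp = y;
    *yp = x;
    return x + y;
}

static struct caller_trace trace_caller(void) {
    struct caller_trace t;
    int arg1 = 534;
    int arg2 = 1057;
    t.sum = swap_add(&arg1, &arg2);    /* 534 + 1057 */
    t.arg1 = arg1;                     /* now holds 1057 */
    t.arg2 = arg2;                     /* now holds 534 */
    assert(t.arg1 == 1057 && t.arg2 == 534);  /* the swap really happened */
    t.diff = arg1 - arg2;
    t.result = t.sum * t.diff;
    return t;
}
```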

Next, let's analyze this program step by step, starting with the caller function. In the figure below, caller's assembly code is on the left and the function's call stack is on the right.

First look at the first three lines of code:

pushl %ebp      // save the old %ebp
movl %esp, %ebp // set %ebp to %esp
subl $24, %esp  // subtract 24 from %esp to allocate stack space

These three lines set up the stack frame. The first line saves the old %ebp, which is the frame pointer (stack-bottom pointer) of the function that called caller. At this point the new frame has not been allocated yet, but the slot holding the old %ebp becomes the bottom of the new frame. So the second line copies the value of the stack pointer %esp (which always points to the top of the stack) into %ebp. The third line moves %esp down by 24 bytes, which allocates stack space for caller. As the figure shows, this space holds caller's local variables arg1 and arg2 as well as the arguments passed to the next function. Some of it is unused padding for alignment. It does not affect our analysis and can be ignored.

Once the stack frame is set up, caller's body runs. caller first creates two local variables, arg1 and arg2. The corresponding assembly code is:

movl $534, -4(%ebp)
movl $1057, -8(%ebp)

Here -4(%ebp) denotes the address %ebp - 4, which is where arg1 sits in the figure. arg2 is at %ebp - 8. These two lines store 534 and 1057 at those two locations. Next come these lines:

leal -8(%ebp), %eax  // store the address %ebp - 8 (&arg2) in %eax
movl %eax, 4(%esp)   // store the value of %eax at %esp + 4
leal -4(%ebp), %eax  // store the address %ebp - 4 (&arg1) in %eax
movl %eax, (%esp)    // store the value of %eax at %esp

The first line stores the address %ebp - 8, which is the address of arg2, in %eax. The second line stores that address at %esp + 4, the slot labeled &arg2 in the figure. In other words, these two lines prepare the argument &arg2 for swap_add, and the next two lines prepare the argument &arg1 at %esp.

The next line is call swap_add, which calls the function swap_add. The call instruction pushes the return address onto the stack and sets the program counter (PC) to the start address of swap_add. The return address is the address of the code that runs after swap_add returns, namely the code for int diff = arg1 - arg2. Now let's step into swap_add. Its code and stack are shown in the following figure:

pushl %ebp      // save the old %ebp
movl %esp, %ebp // set %ebp to %esp
pushl %ebx      // save %ebx

The first three lines of swap_add's assembly are similar to caller's: they also save the old frame pointer and set up a new one. The difference is that swap_add keeps nothing extra on the stack and needs just one additional register, %ebx. Because %ebx is callee-saved, swap_add must restore it before returning. So instead of moving %esp down by a fixed amount, it only pushes the old value of %ebx.

movl 8(%ebp), %edx  // load the value at %ebp + 8 into %edx (xp = &arg1)
movl 12(%ebp), %ecx // load the value at %ebp + 12 into %ecx (yp = &arg2)

These two lines fetch the arguments &arg1 and &arg2 that caller placed on the stack. These are the addresses of arg1 and arg2 (xp and yp). The actual values of arg1 and arg2 are then loaded through these addresses.
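A little arithmetic confirms that the offsets used by caller and swap_add line up. The sketch below starts from a hypothetical value of caller's %ebp and follows each instruction's effect on %esp. It then asserts that swap_add's 8(%ebp) and 12(%ebp) land on the slots where caller stored &arg1 and &arg2. It assumes 32-bit x86 with 4-byte stack slots and models addresses as plain integers. trace_frames and struct frame_layout are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses touched by caller and swap_add, relative to caller's %ebp. */
struct frame_layout {
    uint32_t arg1_addr;      /* -4(%ebp) in caller */
    uint32_t arg2_addr;      /* -8(%ebp) in caller */
    uint32_t param1_slot;    /* (%esp) in caller: holds &arg1 */
    uint32_t param2_slot;    /* 4(%esp) in caller: holds &arg2 */
    uint32_t ret_addr_slot;  /* written by "call swap_add" */
    uint32_t callee_ebp;     /* %ebp inside swap_add */
};

static struct frame_layout trace_frames(uint32_t caller_ebp) {
    struct frame_layout f;
    uint32_t esp = caller_ebp - 24u;   /* subl $24, %esp */
    f.arg1_addr   = caller_ebp - 4u;   /* movl $534, -4(%ebp) */
    f.arg2_addr   = caller_ebp - 8u;   /* movl $1057, -8(%ebp) */
    f.param2_slot = esp + 4u;          /* movl %eax, 4(%esp) */
    f.param1_slot = esp;               /* movl %eax, (%esp) */
    esp -= 4u;                         /* call swap_add pushes the return address */
    f.ret_addr_slot = esp;
    esp -= 4u;                         /* pushl %ebp in swap_add */
    f.callee_ebp = esp;                /* movl %esp, %ebp in swap_add */
    /* swap_add reads its parameters at 8(%ebp) and 12(%ebp). */
    assert(f.callee_ebp + 8u == f.param1_slot);
    assert(f.callee_ebp + 12u == f.param2_slot);
    return f;
}
```

With caller's %ebp = 0x1000, the argument slots are at 0xFE8 (&arg1) and 0xFEC (&arg2). The return address is at 0xFE4, and swap_add's %ebp is 0xFE0, so 4(%ebp) inside swap_add is exactly the return-address slot.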

movl (%edx), %ebx   // x = *xp
movl (%ecx), %eax   // y = *yp
movl %eax, (%edx)   // *xp = y
movl %ebx, (%ecx)   // *yp = x

These four lines perform the actual swap (x = *xp, y = *yp, *xp = y, *yp = x). Now look at the remaining lines:

addl %ebx, %eax // %eax = x + y; the return value is kept in %eax
popl %ebx       // restore %ebx
popl %ebp       // restore caller's %ebp
ret

swap_add leaves its return value in %eax, and caller later reads it from that register. The last few lines of swap_add pop the saved values, restoring %ebx and %ebp to the values they had in caller. Finally, ret returns to caller: it pops the return address off the stack and sets the program counter (PC) to it.
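The C below maps swap_add to its instructions, one statement per instruction of a typical 32-bit GCC listing for this function. Naming the locals after registers is purely illustrative, and swap_add_by_registers is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* swap_add written one statement per instruction, locals named after registers. */
static int32_t swap_add_by_registers(int32_t *xp, int32_t *yp) {
    int32_t *edx, *ecx;
    int32_t ebx, eax;
    edx = xp;        /* movl 8(%ebp), %edx   : edx = xp */
    ecx = yp;        /* movl 12(%ebp), %ecx  : ecx = yp */
    ebx = *edx;      /* movl (%edx), %ebx    : x = *xp */
    eax = *ecx;      /* movl (%ecx), %eax    : y = *yp */
    *edx = eax;      /* movl %eax, (%edx)    : *xp = y */
    *ecx = ebx;      /* movl %ebx, (%ecx)    : *yp = x */
    eax += ebx;      /* addl %ebx, %eax      : x + y */
    return eax;      /* whatever is left in %eax is the return value */
}
```

This mapping shows why the function needs %ebx at all: x must survive in a register while y is loaded into %eax and written through xp.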
We are now back in caller, right after call swap_add. The next lines compute int diff = arg1 - arg2 and keep the result in %edx. The last step computes sum * diff, and the corresponding assembly code is imull %edx, %eax. It multiplies the values of %edx and %eax and stores the product in %eax. From the analysis above, %eax holds swap_add's return value (sum), so the product is computed from it and written back to %eax. That value is caller's return value, so whichever function called caller can also read it from this register. Finally, caller tears down its stack frame (restoring %esp and the saved %ebp, for example with a leave instruction) and executes ret to return to its own caller. This completes the analysis of how caller calls swap_add.

2. Coroutine function: CoRoutineFunc

static int CoRoutineFunc( stCoRoutine_t *co,void * )
{
	if( co->pfn )
	{
		co->pfn( co->arg );
	}
	co->cEnd = 1; // mark the coroutine as finished

	stCoRoutineEnv_t *env = co->env;

	co_yield_env( env );

	return 0;
}

// Called when the coroutine finishes: pop it off the thread's call stack and switch back to the coroutine below it
void co_yield_env( stCoRoutineEnv_t *env )
{
	
	stCoRoutine_t *last = env->pCallStack[ env->iCallStackSize - 2 ];
	stCoRoutine_t *curr = env->pCallStack[ env->iCallStackSize - 1 ];

	env->iCallStackSize--;

	co_swap( curr, last);
}
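The call-stack bookkeeping can be modelled with plain integers. This sketch assumes, as in libco's co_resume, that resuming a coroutine pushes it onto the thread's pCallStack before calling co_swap. It also assumes that index 0 holds the thread's main coroutine. struct toy_env, toy_resume and toy_yield are hypothetical names, and the from/to outputs stand in for the two arguments of co_swap.

```c
#include <assert.h>

#define MAX_DEPTH 128

/* Toy model of the per-thread call stack of coroutine ids. */
struct toy_env {
    int call_stack[MAX_DEPTH];  /* index 0: the main coroutine */
    int size;
};

/* Like co_resume: whoever is on top switches to co, and co is pushed. */
static void toy_resume(struct toy_env *env, int co, int *from, int *to) {
    assert(env->size >= 1 && env->size < MAX_DEPTH);
    *from = env->call_stack[env->size - 1];
    env->call_stack[env->size++] = co;
    *to = co;
}

/* Like co_yield_env: last = stack[size - 2], curr = stack[size - 1],
   pop one entry, then switch curr -> last. */
static void toy_yield(struct toy_env *env, int *from, int *to) {
    assert(env->size >= 2);
    int last = env->call_stack[env->size - 2];
    int curr = env->call_stack[env->size - 1];
    env->size--;
    *from = curr;
    *to = last;
}
```

Resuming coroutine 1 from main, and then coroutine 2 from coroutine 1, builds the stack [0, 1, 2]. Each yield then pops one entry and switches back to the entry beneath it, which is why co_yield_env reads pCallStack[iCallStackSize - 2].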

3. Create a coroutine context environment: coctx_make

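/* Struct that reserves stack space for the two arguments of the entry function pfn. Used only on 32-bit; on 64-bit the two arguments are passed directly in registers. */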
struct coctx_param_t
{
	const void *s1;
	const void *s2;
};

int coctx_make( coctx_t *ctx,coctx_pfn_t pfn,const void *s,const void *s1 )
{
	//make room for coctx_param
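    /*
    * The memory at ctx->ss_sp is allocated on the heap when the coroutine is created, and its addresses run from low to high,
    * while the stack grows toward lower addresses. So to use this region as a stack, we must start from its highest address, ss_sp + ss_size.
    */
	char *sp = ctx->ss_sp + ctx->ss_size - sizeof(coctx_param_t);
	sp = (char*)((unsigned long)sp & -16L); // round down to a 16-byte boundary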

    /* store the entry function's arguments on the new stack */
	coctx_param_t* param = (coctx_param_t*)sp ;
	param->s1 = s;
	param->s2 = s1;

	memset(ctx->regs, 0, sizeof(ctx->regs));

	ctx->regs[ kESP ] = (char*)(sp) - sizeof(void*); // initial stack pointer, one slot below the arguments; kESP = 7
	ctx->regs[ kEIP ] = (char*)pfn; // entry point (function pointer); kEIP = 0

	//------- ss_sp + ss_size
	//|padding | alignment area
	//|s2      |
	//|s1      |
	//|-------- <- original esp (the aligned sp)
	//|return address |
	//|return address |
	//|-------- <- sp (original esp - sizeof(void*) * 2)
	//|        |
	//--------- ss_sp

	return 0;
}
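The pointer arithmetic in coctx_make can be checked on plain integers. The sketch assumes a 32-bit target with 4-byte pointers, so coctx_param_t is 8 bytes. compute_initial_sp and struct initial_sp are hypothetical names. The sketch shows that the arguments end up 16-byte aligned and that the initial %esp sits one slot below them. So when pfn starts running, s1 and s2 are at 4(%esp) and 8(%esp), exactly where a cdecl function looks for its first two arguments, and the word at (%esp) occupies the position of pfn's return address.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 32-bit sizes: 4-byte pointers, so two pointers take 8 bytes. */
#define PTR_SIZE_32   4u
#define PARAM_SIZE_32 (2u * PTR_SIZE_32)

struct initial_sp {
    uintptr_t param_addr;  /* where s1 and s2 are stored */
    uintptr_t esp;         /* the value coctx_make puts in regs[kESP] */
};

static struct initial_sp compute_initial_sp(uintptr_t ss_sp, uintptr_t ss_size) {
    struct initial_sp r;
    uintptr_t sp = ss_sp + ss_size - PARAM_SIZE_32; /* room for the arguments at the top */
    sp &= ~(uintptr_t)15;                           /* same effect as & -16L: round down to 16 */
    assert(sp >= ss_sp);                            /* still inside the stack buffer */
    r.param_addr = sp;
    r.esp = sp - PTR_SIZE_32;                       /* one slot below: pfn's return-address slot */
    return r;
}
```

With ss_sp = 0x10004 and ss_size = 0x1000, the top is 0x11004. Subtracting 8 gives 0x10FFC, rounding down gives 0x10FF0 for the arguments, and the initial %esp is 0x10FEC.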

4. Context switch of the coroutine: co_swap

/* The coroutine about to give up the CPU is called the current coroutine; the one about to be switched in is called the pending coroutine. */
void co_swap(stCoRoutine_t* curr, stCoRoutine_t* pending_co)
{
 	stCoRoutineEnv_t* env = co_get_curr_thread_env();

	// A local variable at the start of the function gives the current stack-top pointer (sp)
	char c;
	curr->stack_sp= &c;

	if (!pending_co->cIsShareStack)
	{
		env->pending_co = NULL;
		env->ocupy_co = NULL;
	}
	else 
	{
		env->pending_co = pending_co;

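		/* find which coroutine currently occupies the shared stack */
		stCoRoutine_t* ocupy_co = pending_co->stack_mem->ocupy_co;

		/* make the incoming coroutine the occupant of the shared stack */
		pending_co->stack_mem->ocupy_co = pending_co;

		/* remember the coroutine being swapped out */
		env->ocupy_co = ocupy_co;

		/* save the swapped-out coroutine's stack contents into its own struct */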
		if (ocupy_co && ocupy_co != pending_co)
		{
			save_stack_buffer(ocupy_co);
		}
	}

	/* switch the coroutine contexts */
	coctx_swap(&(curr->ctx),&(pending_co->ctx) );

	// the stack buffer may have been overwritten, so read these again
	stCoRoutineEnv_t* curr_env = co_get_curr_thread_env();
	stCoRoutine_t* update_ocupy_co =  curr_env->ocupy_co;
	stCoRoutine_t* update_pending_co = curr_env->pending_co;
	
	if (update_ocupy_co && update_pending_co && update_ocupy_co != update_pending_co)
	{
		/* copy the stack contents saved in save_buffer back onto the shared stack */
		if (update_pending_co->save_buffer && update_pending_co->save_size > 0)
		{
			memcpy(update_pending_co->stack_sp, update_pending_co->save_buffer, update_pending_co->save_size);
		}
	}
}

/* save a coroutine's shared-stack contents into its own struct */
void save_stack_buffer(stCoRoutine_t* ocupy_co)
{
	///copy out
	stStackMem_t* stack_mem = ocupy_co->stack_mem;
	int len = stack_mem->stack_bp - ocupy_co->stack_sp;

	if (ocupy_co->save_buffer)
	{
		free(ocupy_co->save_buffer), ocupy_co->save_buffer = NULL;
	}

	ocupy_co->save_buffer = (char*)malloc(len); //malloc buf;
	ocupy_co->save_size = len;

	memcpy(ocupy_co->save_buffer, ocupy_co->stack_sp, len);
}

Once coctx_swap executes, the CPU starts running the pending coroutine's code. In other words, the statement executed after coctx_swap is not stCoRoutineEnv_t* curr_env = co_get_curr_thread_env(); but code in the pending coroutine. Pay special attention to this point. So when do the statements after coctx_swap run? Only when execution later switches back to this coroutine, for example when some other code calls co_resume on it. The rest is simple. When a coroutine on a shared stack is switched out, the contents of its stack are copied into a buffer. When it is switched back in, the buffer is copied back onto the stack. This is how a coroutine executes.
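The copy-out/copy-in idea behind shared stacks fits in a few lines. In this toy model, one small array plays the shared stack, and struct toy_co mirrors the stack_sp, save_buffer and save_size fields of stCoRoutine_t. toy_save follows save_stack_buffer, and toy_restore follows the memcpy after coctx_swap. All names here are hypothetical, and the model leaves out the register switch itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SHARED_SIZE 64

static char shared_stack[SHARED_SIZE];
static char *const stack_bp = shared_stack + SHARED_SIZE;  /* high end (stack bottom) */

struct toy_co {
    char *stack_sp;     /* lowest address in use while this coroutine ran */
    char *save_buffer;  /* private copy of its stack while switched out */
    size_t save_size;
};

/* Copy out [stack_sp, stack_bp), replacing any previous copy. */
static void toy_save(struct toy_co *co) {
    size_t len = (size_t)(stack_bp - co->stack_sp);
    free(co->save_buffer);
    co->save_buffer = NULL;
    co->save_size = 0;
    if (len == 0)
        return;
    co->save_buffer = malloc(len);
    assert(co->save_buffer != NULL);
    co->save_size = len;
    memcpy(co->save_buffer, co->stack_sp, len);
}

/* Copy the saved bytes back onto the shared stack before the coroutine continues. */
static void toy_restore(struct toy_co *co) {
    if (co->save_buffer && co->save_size > 0)
        memcpy(co->stack_sp, co->save_buffer, co->save_size);
}
```

If coroutine a's data occupies the top 8 bytes of the shared stack, toy_save copies those bytes out. Another coroutine can then overwrite the region, and toy_restore puts a's bytes back before a continues.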

4.1 Context switch: coctx_swap

.globl coctx_swap
#if !defined( __APPLE__ )
.type  coctx_swap, @function
#endif
coctx_swap:

#if defined(__i386__)
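	leal 4(%esp), %eax // %eax = %esp + 4, the address of the first argument (parm a)
	movl 4(%esp), %esp // %esp = *(%esp + 4): the first argument, &curr->ctx (regs is its first member)
	leal 32(%esp), %esp // %esp = %esp + 32; %esp now points just past regs[7] (&regs[7] + sizeof(void*))

    // Next, save all register values into the current coroutine's 8-entry regs array
	pushl %eax // regs[7] (kESP) = %esp as it will be after coctx_swap returns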
	pushl %ebp
	pushl %esi
	pushl %edi
	pushl %edx
	pushl %ecx
	pushl %ebx
	pushl -4(%eax) // regs[0] (kEIP) = return address of this coctx_swap call

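    // switch %esp to the pending coroutine's regs array
	movl 4(%eax), %esp // %esp = second argument (parm b), &pending_co->ctx, i.e. &regs[0]

    // pop the pending coroutine's saved register values from memory into the CPU registers
	popl %eax  // regs[0]: the address to return to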
	popl %ebx  
	popl %ecx
	popl %edx
	popl %edi
	popl %esi
	popl %ebp
	popl %esp  // regs[7]: the pending coroutine's stack pointer
	pushl %eax // push the return address onto the pending coroutine's stack

	xorl %eax, %eax
	ret
#elif defined(__x86_64__)
	// (x86-64 implementation omitted from this excerpt)
#endif
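The push sequence fills the regs array from the top down, and that order fixes each register's index. This sketch, using the hypothetical names push_order and slot_of, replays the eight pushes on an array of labels. Because %esp starts at &regs[0] + 32, the first push lands in regs[7] and the last in regs[0]. The result agrees with kESP = 7 and kEIP = 0 from coctx_make and puts %ebp at index 6. The popl sequence then reads the pending coroutine's array back in the same index order.

```c
#include <assert.h>
#include <string.h>

#define NREGS 8

/* Labels for the eight pushes, in the order they execute in the i386 coctx_swap. */
static const char *const push_order[NREGS] = {
    "esp",   /* pushl %eax: %esp as it will be after coctx_swap returns */
    "ebp",   /* pushl %ebp */
    "esi",   /* pushl %esi */
    "edi",   /* pushl %edi */
    "edx",   /* pushl %edx */
    "ecx",   /* pushl %ecx */
    "ebx",   /* pushl %ebx */
    "eip",   /* pushl -4(%eax): the return address */
};

/* Replay the pushes into regs[] and return the index holding `name`,
   or -1 if it is not one of the saved registers. */
static int slot_of(const char *name) {
    const char *regs[NREGS];
    int sp = NREGS;                      /* %esp = &regs[0] + 32, one past the end */
    for (int i = 0; i < NREGS; i++)
        regs[--sp] = push_order[i];      /* pushl: move down one slot, then store */
    assert(sp == 0);                     /* eight pushes fill the array exactly */
    for (int i = 0; i < NREGS; i++)
        if (strcmp(regs[i], name) == 0)
            return i;
    return -1;
}
```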

Libco source code reading (4): the context of the coroutine