Using setjmp and longjmp in C to implement exception handling and coroutines


This is Brother Dao's original article No. 017

I. Introduction

The C standard library has two powerful functions: setjmp and longjmp. Have you ever used them in your code? I asked several colleagues around me: some had never heard of these two functions, and others knew about them but had never used them.

As a knowledge point, these two functions are relatively simple, and a short sample program is enough to explain them. However, it pays to think outward from this point along several dimensions: relate and compare it with similar features in the same language, compare it with similar concepts in other programming languages, and then consider where it can be used and how others use it.

Today, let's talk about these two functions. Although they are rarely needed in ordinary programs, someday, when you have to handle an unusual control flow, they may give you surprisingly good results.

For example, we will compare setjmp/longjmp with the goto statement in terms of functionality, with the fork function in terms of return values, and with coroutines in Python/Lua in terms of usage scenarios.

II. Function syntax

1. Minimal example

Without further ado, let's look at the simplest possible example. It doesn't matter if you don't understand it yet; just get familiar with it:

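#include <stdio.h>
#include <setjmp.h>

int main()
{
    // A buffer that holds the saved execution context
    jmp_buf buf;
    printf("line1 \n");

    // Save the context at this point
    int ret = setjmp(buf);
    printf("ret = %d \n", ret);

    // Check the return value
    if (0 == ret)
    {
        // 0: setjmp returned from a direct call
        printf("line2 \n");

        // Jump back to the setjmp statement
        longjmp(buf, 1);
    }
    else
    {
        // Non-zero: we arrived here through a longjmp
        printf("line3 \n");
    }
    printf("line4 \n");
    return 0;
}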

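Output:

    line1
    ret = 0
    line2
    ret = 1
    line3
    line4

The execution sequence is as follows (don't worry if it isn't clear yet; come back to it after reading the explanation below): line1, setjmp returns 0, line2, longjmp, setjmp returns 1, line3, line4.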

2. Function description

First look at the signatures of these two functions:

int setjmp(jmp_buf env);
void longjmp(jmp_buf env, int value);

Both are declared in the header file setjmp.h. Wikipedia describes them as follows:

setjmp: Sets up the local jmp_buf buffer and initializes it for the jump. This routine saves the program's calling environment in the environment buffer specified by the env argument for later use by longjmp. If the return is from a direct invocation, setjmp returns 0. If the return is from a call to longjmp, setjmp returns a nonzero value.
longjmp: Restores the context of the environment buffer env that was saved by invocation of the setjmp routine in the same invocation of the program. Invoking longjmp from a nested signal handler is undefined. The value specified by value is passed from longjmp to setjmp. After longjmp is completed, program execution continues as if the corresponding invocation of setjmp had just returned. If the value passed to longjmp is 0, setjmp will behave as if it had returned 1; otherwise, it will behave as if it had returned value.

Let me explain the passage above in my own words:

setjmp function

  1. Function: saves the context at the moment it is called, mainly the values of certain registers;
  2. Parameter: the buffer in which the context is saved, which amounts to taking a snapshot of the current context;
  3. Return value: there are two cases. If setjmp is called directly, it returns 0; if execution arrives through a longjmp call, it returns a non-zero value. This can be compared with fork, the function that creates a process.

longjmp function

  1. Function: jumps to the context (snapshot) saved in the env buffer and continues execution from there;
  2. Parameters: env specifies which context (snapshot) to jump to; value tells setjmp what to return. In other words, when longjmp is called, value becomes the return value of setjmp;
  3. Return value: none. Once this function is called, execution jumps straight to code elsewhere and never comes back.

Summary: these two functions work together to implement non-local jumps in a program.

3. setjmp: save context information

We know that after C code is compiled into a binary file, it is loaded into memory for execution, and the CPU fetches instructions from the code segment one by one and executes them. The CPU has many registers that hold the current execution environment, such as the code segment register CS and the instruction pointer register IP, along with many others. We call this execution environment the context.

When the CPU fetches the next instruction, it locates that instruction through the CS and IP registers, as shown in the following figure:

A few additional notes:

  1. In the figure above, the code segment register CS is treated as a base address: CS points to the start of the code segment in memory, and IP holds the offset of the next instruction from that base. To fetch an instruction, the CPU only needs to add the values in these two registers to obtain the instruction's address;
  2. In fact, on the x86 platform the code segment register CS is not a base address but a selector. A table maintained by the operating system stores the real start address of the code segment, and CS only holds an index that points to an entry in that table. This touches on virtual memory;
  3. After an instruction is fetched, IP automatically advances to the start of the next instruction. How many bytes it advances depends on the length of the instruction just fetched.

The CPU is a big fool with no ideas of its own: it does whatever we tell it to do. Take instruction fetching: once we set the CS and IP registers, the CPU uses the values in these two registers to fetch instructions. If the registers hold wrong values, the CPU will still dutifully fetch instructions, and the program will crash during execution.

We can simply regard this register information as the context, and the CPU executes according to it. The C library therefore provides the setjmp function, which saves the current context in a buffer.

Why save it? So that we can later restore it and continue execution from this point.

A simpler analogy is a server snapshot. What is a snapshot for? When the server fails, you can revert to a known snapshot!

4. longjmp: implement jump

Speaking of jumps, the first thing that comes to mind is the goto statement. Many tutorials are critical of goto and advise avoiding it in your code. The intention is good: too many gotos make the execution order of the code hard to follow.

But if you read the Linux kernel source, you will find plenty of goto statements. Once again, it comes down to finding a balance between maintainability and execution efficiency.

Jumping changes the execution order of the program. A goto statement can only jump within a function; it cannot cross function boundaries.

Therefore, the C library provides the longjmp function to perform long-distance jumps, as its name suggests: it can jump across functions.

From the CPU's point of view, a jump simply means setting the registers that make up the context to a snapshot taken at some earlier moment. The setjmp function has already stored that context (snapshot) in a buffer, so to jump back there and continue executing, we just need to tell the CPU.

How do we tell the CPU? Simply copy the register values saved in the buffer back into the CPU's registers.

5. setjmp: return type and return value

In programs that need multiple processes, we often use the fork function to "hatch" a new process from the current one. The new process starts executing at the statement that follows the fork call.

The parent process also continues with the next statement after fork returns, so how do we tell the parent and the new process apart? fork provides a return value for exactly this purpose:

fork returns 0: this is the new (child) process;
fork returns a positive value: this is the original (parent) process, and the return value is the process ID of the new process. (If fork fails, it returns -1 in the parent and no new process is created.)

Similarly, setjmp has different kinds of return. "Kinds" may not be the most accurate word; think of it this way: there are two scenarios in which execution returns from setjmp:

  1. When setjmp is called directly: it returns 0. The purpose of the direct call is to save the context and create a snapshot.
  2. When execution arrives through longjmp: it returns a non-zero value, which is specified by the second parameter of longjmp.

Based on these two cases, we can branch accordingly. When returning through a longjmp, different non-zero values can be used for different situations. For readers with experience in scripting languages such as Python or Lua, does this remind you of yield/resume? From the outside, they behave the same way with respect to parameters and return values!

Summary: that covers the basic usage of setjmp/longjmp. I hope the explanation was clear enough. If you now go back to the sample code at the beginning of the article, it should make sense at a glance.

III. Using setjmp/longjmp to implement exception handling

Since the C library gives us this tool, there must be scenarios where it is useful. Some high-level languages (Java, C++) support exception handling directly at the syntax level, usually with try-catch statements, but in C you have to implement it yourself.

Let's demonstrate the simplest possible exception-handling model:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

typedef int     BOOL;
#define TRUE    1
#define FALSE   0

// Enumeration: error codes
typedef enum _ErrorCode_ {
    ERR_OK = 100,         // No error
    ERR_DIV_BY_ZERO = -1  // Division by zero
} ErrorCode;

// Buffer that holds the saved context
jmp_buf gExcptBuf;

// A function that may raise an exception
typedef int (*pf)(int, int);
int my_div(int a, int b)
{
    if (0 == b)
    {
        // Exception: jump back to the point saved before this function was called
        // The second argument is the exception code
        longjmp(gExcptBuf, ERR_DIV_BY_ZERO);
    }
    // No exception: return the correct result
    return a / b;
}

// Run a function that may raise an exception
int try(pf func, int a, int b)
{
    // Save the context; execution resumes here if an exception occurs
    int ret = setjmp(gExcptBuf);
    if (0 == ret)
    {
        // Call the function that may raise an exception
        func(a, b);
        // No exception occurred
        return ERR_OK;
    }
    else
    {
        // An exception occurred; ret holds the exception code
        return ret;
    }
}

int main()
{
    int ret = try(my_div, 8, 0);     // Raises an exception
    // int ret = try(my_div, 8, 2);  // Does not raise an exception
    if (ERR_OK == ret)
    {
        printf("try ok ! \n");
    }
    else
    {
        printf("try exception. error = %d \n", ret);
    }

    return 0;
}

The code needs no detailed explanation; the comments say it all. It is only illustrative: production code would need a more complete wrapper around it.
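
As one possible direction for such a wrapper, here is a hedged sketch that supports nested handlers by keeping a small stack of jmp_buf entries. The names TRY, CATCH, END_TRY, throw_error, g_frames and the error codes are all my own inventions, not part of any library. The sketch also assumes that the body of a TRY block never leaves through return, break or goto, since that would skip the bookkeeping.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRY_DEPTH 8

enum { ERR_DIV_BY_ZERO = -1, ERR_RETHROWN = -2 };  /* hypothetical codes */

static jmp_buf g_frames[MAX_TRY_DEPTH];  /* one saved context per active TRY */
static int g_depth = 0;                  /* number of active TRY blocks */
static int g_error = 0;                  /* code of the most recent error */

static void check_depth(void)
{
    if (g_depth >= MAX_TRY_DEPTH) {
        fprintf(stderr, "TRY blocks nested too deeply\n");
        exit(EXIT_FAILURE);
    }
}

/* Pop the innermost handler and jump to it. */
static void throw_error(int code)
{
    if (g_depth == 0) {
        fprintf(stderr, "uncaught error %d\n", code);
        exit(EXIT_FAILURE);
    }
    g_error = code;
    g_depth--;
    longjmp(g_frames[g_depth], 1);
}

#define TRY      check_depth(); if (setjmp(g_frames[g_depth]) == 0) { g_depth++;
#define CATCH    g_depth--; } else {
#define END_TRY  }

static int checked_div(int a, int b)
{
    if (b == 0)
        throw_error(ERR_DIV_BY_ZERO);
    return a / b;
}

int main(void)
{
    TRY
        printf("8 / 2 = %d\n", checked_div(8, 2));
        TRY
            printf("8 / 0 = %d\n", checked_div(8, 0));
        CATCH
            printf("inner catch: error %d\n", g_error);
            throw_error(ERR_RETHROWN);   /* pass a new error outward */
        END_TRY
        printf("not reached\n");
    CATCH
        printf("outer catch: error %d\n", g_error);
    END_TRY
    return 0;
}
```

The error code travels through a global variable rather than through the setjmp return value, which keeps setjmp inside a plain comparison in the if condition.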

One thing to note: setjmp/longjmp only change the program's flow of execution. If any application data needs to be rolled back, we have to do that ourselves.

IV. Using setjmp/longjmp to implement coroutines

1. What is a coroutine

In C programs, execution sequences that need to run concurrently are usually implemented with threads. So what is a coroutine?

The Wikipedia page on coroutines explains the concept in detail: it compares coroutines with threads and generators, and describes how various languages implement them.

Let's use the producer/consumer pattern to briefly illustrate the difference between coroutines and threads:

2. Producers and consumers in threads

  1. The producer and the consumer are two parallel execution sequences, usually run in two separate threads;
  2. While the producer is producing goods, the consumer waits (blocked). When production is finished, the producer signals the consumer through a semaphore to consume the goods;
  3. While the consumer is consuming goods, the producer waits (blocked). When consumption is finished, the consumer signals the producer through a semaphore to continue producing.

3. Producers and consumers in the coroutine

  1. The producer and the consumer run within the same execution sequence and take turns by jumping between each other;
  2. After producing the goods, the producer gives up the CPU and lets the consumer run;
  3. After consuming the goods, the consumer gives up the CPU and lets the producer run.

4. Implementation of coroutine in C language

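Here is the simplest possible model. It implements the coroutine mechanism with setjmp/longjmp; the main goal is to understand the execution order of a coroutine, so it does not address passing parameters or return values. Be aware that resume() jumps back into coroutine_function after yield() has already left it through longjmp. The C standard leaves this undefined, so the demo depends on the coroutine's stack frame happening to stay usable, and it may fail with other compilers, optimization levels, or hardened C libraries.

If you want to study coroutine implementations in C further, have a look at Duff's device, where goto and switch statements are used to implement branch jumps. The syntax looks odd but is legal.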

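#include <stdio.h>
#include <setjmp.h>
#include <unistd.h>

typedef int     BOOL;
#define TRUE    1
#define FALSE   0

// Data structure holding the contexts of the main routine and the coroutine
typedef struct _Context_ {
    jmp_buf mainBuf;
    jmp_buf coBuf;
} Context;

// Global context variable
Context gCtx;

// Resume: save the main routine's context, then jump into the coroutine
#define resume() \
    if (0 == setjmp(gCtx.mainBuf)) \
    { \
        longjmp(gCtx.coBuf, 1); \
    }

// Yield: save the coroutine's context, then jump back to the main routine
#define yield() \
    if (0 == setjmp(gCtx.coBuf)) \
    { \
        longjmp(gCtx.mainBuf, 1); \
    }

// The function that runs in the coroutine
void coroutine_function(void *arg)
{
    while (TRUE)  // Infinite loop
    {
        printf("\n*** coroutine: working \n");
        // Simulate a time-consuming operation
        for (int i = 0; i < 10; ++i)
        {
            fprintf(stderr, ".");
            usleep(1000 * 200);
        }
        printf("\n*** coroutine: suspend \n");

        // Give up the CPU
        yield();
    }
}

// Start a coroutine
// Parameter 1: func, the function to run in the coroutine
// Parameter 2: arg, the argument passed to func
typedef void (*pf)(void *);
BOOL start_coroutine(pf func, void *arg)
{
    // Save the main routine's jump point
    if (0 == setjmp(gCtx.mainBuf))
    {
        func(arg); // Call the function
        return TRUE;
    }

    return FALSE;
}

int main()
{
    // Start a coroutine
    start_coroutine(coroutine_function, NULL);

    while (TRUE) // Infinite loop
    {
        printf("\n=== main: working \n");

        // Simulate a time-consuming operation
        for (int i = 0; i < 10; ++i)
        {
            fprintf(stderr, ".");
            usleep(1000 * 200);
        }

        printf("\n=== main: suspend \n");

        // Give up the CPU and let the coroutine run
        // (this jumps back into a frame that yield() already left)
        resume();
    }

    return 0;
}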

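The printed output alternates between the two execution sequences indefinitely: the coroutine prints "*** coroutine: working", a row of dots, and "*** coroutine: suspend"; then main prints "=== main: working", a row of dots, and "=== main: suspend"; then the coroutine runs again, and so on.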

V. Summary

This article focused on the syntax and usage scenarios of setjmp/longjmp. In the right situations, they let you achieve twice the result with half the effort.

Of course, you can also use your imagination and build fancier features on top of these execution-flow jumps. Anything is possible!


[Original Statement]

Author: Brother Dao (official account: IOT物联网小镇)
Zhihu: Brother Dao
Bilibili: Brother Dao Shares
Juejin: Brother Dao Shares
CSDN: Brother Dao Shares


Reprint: reprints are welcome, but unless the author agrees otherwise, this statement must be retained and a link to the original must be given in the article.

Origin blog.csdn.net/u012296253/article/details/113543344