linux_timing race-pause function-sigsuspend function-asynchronous I/O-reentrant function-non-reentrant function

Continued from the previous article: linux_signal capture-signal function-sigaction function-sigaction structure

  Today I will share what I know about timing races (race conditions). A timing race is always tied to how the CPU schedules processes. Along the way we will look at two functions, pause and sigsuspend, and at what reentrant and non-reentrant functions are. Without further ado, let's dig into timing races:


Before getting to timing races, let's first introduce the pause function.

1. pause function

Function purpose:
  Calling this function makes the process suspend itself and wait for a signal. The calling process stays blocked (voluntarily giving up the CPU) until a signal is delivered to wake it up.
Header file:
  #include <unistd.h>
Function prototype:
  int pause(void);
Function parameters:
  None
Return value:
  pause only ever returns -1, with errno set to EINTR:
    ① If the signal's default action is to terminate the process, the process terminates and pause never gets a chance to return.
    ② If the signal's action is to ignore it, the process stays suspended and pause does not return.
    ③ If the signal is caught, then [after the signal handler has been called and has returned, pause returns -1] and
    errno is set to EINTR, which means "interrupted by a signal". Think about which other function we have met that only returns on error (hint: the exec family).
    ④ The signal that is meant to wake pause must not be blocked (masked). If it is blocked, it cannot wake pause up.
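
To make case ③ concrete, here is a minimal sketch that catches SIGALRM and checks what pause hands back. The names on_alarm and pause_returns_eintr are made up for this illustration, and the one-second timer is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Empty handler: its only job is to turn SIGALRM into a "caught" signal. */
static void on_alarm(int signo)
{
    (void)signo;
}

/* Hypothetical helper: returns 1 if pause() came back with -1 and
 * errno == EINTR after the handler ran, otherwise 0. */
int pause_returns_eintr(void)
{
    struct sigaction act, oldact;
    int rc, saved_errno;

    act.sa_handler = on_alarm;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (sigaction(SIGALRM, &act, &oldact) == -1)
        return 0;

    alarm(1);              /* SIGALRM arrives in about one second */
    errno = 0;
    rc = pause();          /* blocks until the handler has run and returned */
    saved_errno = errno;

    sigaction(SIGALRM, &oldact, NULL);   /* restore the previous disposition */
    return rc == -1 && saved_errno == EINTR;
}
```

Calling pause_returns_eintr() from main should block for about a second and then report that both conditions held.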

1.1. Example – use of the pause function:

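#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

/* Empty handler: catching SIGALRM is enough to make pause() return. */
void donothing(int signo)
{
    (void)signo;
}

unsigned int mysleep(unsigned int seconds)
{
    unsigned int ret;
    struct sigaction act, oldact;

    act.sa_handler = donothing;
    sigemptyset(&act.sa_mask);          /* clear the signal set */
    act.sa_flags = 0;
    /* register the signal handler */
    sigaction(SIGALRM, &act, &oldact);

    alarm(seconds);                     /* start a timer for the given number of seconds */
    pause();                            /* suspend until a signal arrives */

    ret = alarm(0);                     /* cancel the timer; returns the seconds left, if any */
    sigaction(SIGALRM, &oldact, NULL);  /* restore the previous SIGALRM disposition */
    return ret;
}

int main(void)
{
    mysleep(5);
    return 0;
}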

2. Timing race

  Timing race: because the order in which processes get to execute varies, the same program can produce different results when it is run multiple times.
Summary of race problems:
  Race conditions are closely related to system load and reflect the unreliability of signal timing. The heavier the system load, the less predictable the timing of signals becomes.
   The unreliability comes from how signals are implemented. Signals are implemented in software (they depend heavily on kernel scheduling and can be noticeably delayed): after each system call, or after interrupt handling completes, the kernel scans the pending signal set in the PCB to decide whether a signal should be handled. When the system is heavily loaded, these delays make timing races more likely to show up.
   This kind of unexpected situation can only be anticipated and actively avoided while writing the program; it cannot be made up for afterwards by other means such as debugging with gdb. And because the error occurs irregularly, it is very difficult to capture and reproduce later.
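
The pending signal set mentioned above can be observed directly. The following is a small sketch, not taken from the original article: it blocks SIGUSR1, raises it, and checks that the signal sits in the pending set until it is unblocked. The names on_usr1 and pending_then_delivered are hypothetical, and the sketch assumes a single-threaded process.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t handled = 0;

static void on_usr1(int signo)
{
    (void)signo;
    handled = 1;
}

/* Hypothetical helper: returns 1 if SIGUSR1 was pending while blocked
 * and its handler ran only after the signal was unblocked. */
int pending_then_delivered(void)
{
    struct sigaction act, oldact;
    sigset_t block, oldmask, pending;
    int was_pending, ran_while_blocked, ran_after_unblock;

    act.sa_handler = on_usr1;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGUSR1, &act, &oldact);

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &oldmask);

    handled = 0;
    raise(SIGUSR1);                      /* generated, but blocked: stays pending */
    sigpending(&pending);
    was_pending = sigismember(&pending, SIGUSR1);
    ran_while_blocked = handled;         /* handler has not run yet */

    sigprocmask(SIG_UNBLOCK, &block, NULL);  /* pending signal is delivered here */
    ran_after_unblock = handled;

    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    sigaction(SIGUSR1, &oldact, NULL);
    return was_pending == 1 && ran_while_blocked == 0 && ran_after_unblock == 1;
}
```

The gap between "generated" and "delivered" that this sketch creates on purpose is the same gap that scheduling delays create by accident.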

3. Timing race problem 1 - signal processing

  Timing races exist because the CPU runs each process for only a time slice at a time. A program you wrote may run thousands of times without any apparent problem and then suddenly fail once. When you go looking for the cause, you may search for a long time and find nothing. Such problems can be fatal yet are hard to find, and can only be avoided through experience gained in everyday coding.
  Take the example in 1.1: right after calling alarm, the process loses the CPU and the CPU runs other processes. What happens if those other processes run for longer than the timer interval? The sequence is:
    ① mysleep calls alarm and then loses the CPU before it reaches pause.
    ② The timer expires while the process is not running, and SIGALRM becomes pending.
    ③ When the process gets the CPU back, SIGALRM is delivered first and the handler runs.
    ④ Only then does the process call pause, but the SIGALRM it is waiting for has already been handled, so pause never returns.

  In this way, a program that was only supposed to sleep for 1 s ends up blocked forever. This can really happen, even if the probability is tiny, perhaps one in ten million.
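
The lost wakeup can be reproduced on demand. In the sketch below, which is an illustration and not the author's code, raise(SIGALRM) stands in for "the timer expired while the process was off the CPU", and a second alarm(1) acts as a watchdog so that the demo terminates instead of hanging. The names count_alarm and alarms_needed_to_wake are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static volatile sig_atomic_t alarms_seen = 0;

static void count_alarm(int signo)
{
    (void)signo;
    alarms_seen++;
}

/* Hypothetical helper: returns how many SIGALRMs had been handled by the
 * time pause() returned. */
int alarms_needed_to_wake(void)
{
    struct sigaction act, oldact;

    act.sa_handler = count_alarm;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGALRM, &act, &oldact);

    alarms_seen = 0;
    raise(SIGALRM);   /* stand-in: the timer fired before we reached pause() */
    alarm(1);         /* watchdog, only so that this demo terminates */
    pause();          /* the first SIGALRM is gone; only the watchdog wakes us */

    sigaction(SIGALRM, &oldact, NULL);
    return (int)alarms_seen;
}
```

The function reports 2: the first SIGALRM was consumed before pause was entered, so a second one was needed to wake it. Without the watchdog, as in example 1.1, pause would simply never return.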
  Imagine this kind of error in commercial code: the consequences could be devastating.
  Fortunately there is a solution for the case above: use the signal masking (blocking) mechanism. For that we need to talk about another function, sigsuspend.

3.1. Solve the timing problem 1-sigsuspend function

Function purpose:
  Temporarily replace the signal mask, then suspend and wait for a signal.
Header file:
  #include <signal.h>
Function prototype:
  int sigsuspend(const sigset_t *mask);
Function parameters:
  mask: the signal mask that is in effect only for the duration of this call; it temporarily replaces the process's current signal mask
Return value:
  sigsuspend always returns -1, with errno set to indicate the error (usually EINTR).
  EINTR: interrupted by a signal.

  You can steer the program's execution logic by blocking SIGALRM, but however you arrange it, the program may still lose the CPU between the two operations "unblock the signal" and "suspend to wait for the signal", unless these two steps are combined into a single "atomic operation". sigsuspend does exactly that. Where timing requirements are strict, sigsuspend should be used instead of pause.
  Atomic operation: here it means that replacing the signal mask and suspending happen as one indivisible step, so no signal can slip in between the two.
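
A short sketch of why the atomic step matters, under the assumption of a single-threaded process: the signal is deliberately sent "too early", while it is still blocked, and sigsuspend still picks it up. SIGUSR1 and the names on_usr1 and early_signal_is_not_lost are chosen only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/* Hypothetical helper: returns 1 if a signal sent before the wait began
 * was still delivered inside sigsuspend(). */
int early_signal_is_not_lost(void)
{
    struct sigaction act, oldact;
    sigset_t block, oldmask, waitmask;
    int ran_early, rc, saved_errno;

    act.sa_handler = on_usr1;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGUSR1, &act, &oldact);

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &oldmask);

    got_usr1 = 0;
    raise(SIGUSR1);               /* arrives early, stays pending while blocked */
    ran_early = got_usr1;         /* still 0: the handler has not run */

    waitmask = oldmask;
    sigdelset(&waitmask, SIGUSR1);
    errno = 0;
    rc = sigsuspend(&waitmask);   /* unblock and wait in one atomic step */
    saved_errno = errno;

    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    sigaction(SIGUSR1, &oldact, NULL);
    return ran_early == 0 && got_usr1 == 1 && rc == -1 && saved_errno == EINTR;
}
```

Had the code unblocked SIGUSR1 with sigprocmask and then called pause, the handler would have run at the unblock and pause would have waited for a signal that had already come and gone.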

3.2. Example - code to solve the timing race problem 1 in Example 1.1:

#include <unistd.h>
#include <signal.h>
#include <stdio.h>

void sig_alrm(int signo)
{
    (void)signo;
    /* nothing to do */
}

unsigned int mysleep(unsigned int nsecs)
{
    struct sigaction newact, oldact;
    sigset_t newmask, oldmask, suspmask;
    unsigned int unslept;

    /* Install an empty handler for SIGALRM. */
    newact.sa_handler = sig_alrm;
    sigemptyset(&newact.sa_mask);       /* clear the signal set */
    newact.sa_flags = 0;
    /* Register the handler; oldact keeps the previous disposition. */
    sigaction(SIGALRM, &newact, &oldact);

    /* Build the set of signals to block and block SIGALRM. */
    sigemptyset(&newmask);              /* clear the signal set */
    sigaddset(&newmask, SIGALRM);       /* add SIGALRM to the set */
    /* Block SIGALRM; oldmask keeps the previous signal mask. */
    sigprocmask(SIG_BLOCK, &newmask, &oldmask);

    /* Start an nsecs-second timer; SIGALRM is generated when it expires. */
    alarm(nsecs);

    /* Build the temporary mask used during sigsuspend:
     * the old mask with SIGALRM unblocked. */
    suspmask = oldmask;
    sigdelset(&suspmask, SIGALRM);      /* make sure SIGALRM is not blocked in suspmask */

    /* During the call, sigsuspend replaces the signal mask with suspmask,
     * which does not contain SIGALRM, and suspends in the same atomic step.
     * When a signal wakes it up and it returns, the previous mask is restored. */
    sigsuspend(&suspmask);

    unslept = alarm(0);
    /* Restore the previous SIGALRM disposition. */
    sigaction(SIGALRM, &oldact, NULL);

    /* Restore the previous signal mask, undoing the block on SIGALRM. */
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return unslept;
}

int main(void)
{
    while (1) {
        mysleep(2);
        printf("Two seconds passed\n");
    }
    return 0;
}

4. Timing race problem 2 - global variable asynchronous I/O

  Analyze the following program, in which a parent and a child process take turns counting.
  If the sleep(1) calls in the signal handlers are removed, the program runs into problems.
  What causes them?

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <stdlib.h>

int n = 0;
volatile sig_atomic_t flag = 0;     /* volatile so the wait loops re-read it */

void sys_err(char *str)
{
    perror(str);
    exit(1);
}

void do_sig_child(int num)
{
    (void)num;
    printf("I am child  %d\t%d\n", getpid(), n);
    n += 2;
    flag = 1;
    sleep(1);
}

void do_sig_parent(int num)
{
    (void)num;
    printf("I am parent %d\t%d\n", getpid(), n);
    n += 2;
    flag = 1;
    sleep(1);
}

int main(void)
{
    pid_t pid;
    struct sigaction act;

    if ((pid = fork()) < 0)
        sys_err("fork");
    else if (pid > 0) {
        n = 1;
        sleep(1);
        act.sa_handler = do_sig_parent;
        sigemptyset(&act.sa_mask);
        act.sa_flags = 0;
        sigaction(SIGUSR2, &act, NULL);     /* register own handler; the parent uses SIGUSR2 */
        do_sig_parent(0);
        while (1) {
            /* wait for signal */
            if (flag == 1) {                /* the parent has finished counting */
                kill(pid, SIGUSR1);
                flag = 0;                   /* mark that the signal has been sent to the child */
            }
        }
    } else if (pid == 0) {
        n = 2;
        act.sa_handler = do_sig_child;
        sigemptyset(&act.sa_mask);
        act.sa_flags = 0;
        sigaction(SIGUSR1, &act, NULL);

        while (1) {
            /* waiting for a signal */
            if (flag == 1) {
                kill(getppid(), SIGUSR2);
                /* Analysis: what happens if the parent's signal arrives right here,
                 * before flag has been reset, and do_sig_child runs first? */
                flag = 0;
            }
        }
    }
    return 0;
}

  In the example, the flag variable tracks the program's progress: flag set to 1 means counting is done, and flag set back to 0 means the signal has been sent to the other side.
   The problem lies here: in both parent and child, flag is supposed to be set to 0 immediately after the kill call to indicate that the signal has been sent. In between, however, the process may well be scheduled out by the kernel and lose the CPU, while the other side gets to run, sends its signal back, and triggers the handler, which modifies the global flag. The flag = 1 written by the handler is then overwritten by the delayed flag = 0, so the next signal is never sent and both processes stall.
  How to solve this problem?
  You can use the "lock" mechanism that will be shared in a later article: protect operations on global variables by locking and unlocking.
  For now, whenever we use global variables in a program, we should stay aware of the problems that asynchronous access to them can cause.
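
Until locks are covered, the signal mask from section 3 already offers one way out. The sketch below is a simplified single-process illustration, not the parent/child program itself: the process signals itself so that the interleaving is deterministic, and the names on_usr1, install_handler, notify_then_clear and notify_with_signal_blocked are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static volatile sig_atomic_t flag = 0;

static void on_usr1(int signo)
{
    (void)signo;
    flag = 1;                      /* "counting done", as in the article */
}

static void install_handler(void)
{
    struct sigaction act;

    act.sa_handler = on_usr1;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGUSR1, &act, NULL);
}

/* Racy order: the handler can run between kill() and flag = 0. */
int notify_then_clear(void)
{
    install_handler();
    flag = 0;
    kill(getpid(), SIGUSR1);       /* handler runs here and sets flag = 1 ... */
    flag = 0;                      /* ... and this overwrites it: update lost */
    return (int)flag;
}

/* Same steps with SIGUSR1 blocked around the critical section. */
int notify_with_signal_blocked(void)
{
    sigset_t block, oldmask;

    install_handler();
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);

    flag = 0;
    sigprocmask(SIG_BLOCK, &block, &oldmask);
    kill(getpid(), SIGUSR1);       /* stays pending while blocked */
    flag = 0;
    sigprocmask(SIG_SETMASK, &oldmask, NULL);  /* handler runs now: flag = 1 */
    return (int)flag;
}
```

In the racy version the handler's update is lost and the function reports 0; with the signal blocked, the handler runs only after flag = 0 and the function reports 1. In the parent/child program, simply moving flag = 0 in front of kill would close the same window.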

5. Timing race problem 3 - reentrant/non-reentrant functions

  A function is "re-entered" when, because of timing, it is called again while an earlier call to it is still executing (that call has not yet finished). Depending on how a function is implemented, it falls into one of two types: "reentrant function" or "non-reentrant function".

  Reentrant function: the function must not use global or static variables, and must not call malloc, free, etc.
  Non-reentrant function: the function uses global or static variables, calls malloc or free, or is a standard I/O function.
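
As a minimal illustration of the difference, consider two versions of a ticket counter. Both function names are invented for this example: next_ticket hides its state in a static variable, while next_ticket_r works only on state supplied by the caller.

```c
/* Non-reentrant: all callers share one hidden static variable, so a call
 * that interrupts another call works on the same state. */
int next_ticket(void)
{
    static int last = 0;

    last += 1;
    return last;
}

/* Reentrant: the state lives in storage supplied by the caller, so
 * overlapping calls on different counters cannot disturb each other. */
int next_ticket_r(int *last)
{
    *last += 1;
    return *last;
}
```

This is the same design choice that separates library pairs such as strtok and strtok_r: the _r variant takes the state as an argument instead of keeping it inside the function.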

Therefore, our signal handlers should be written as reentrant functions.
See man 7 signal for the reentrant (async-signal-safe) functions that a signal handler may call; on newer systems the list lives in man 7 signal-safety.
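
A handler written along these lines might look like the following sketch. It is one possible shape, not the only one: it touches only a volatile sig_atomic_t counter, prints with write (which is on the async-signal-safe list) instead of printf, and preserves errno. The names on_usr1 and deliver_once are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static volatile sig_atomic_t usr1_count = 0;

static void on_usr1(int signo)
{
    static const char msg[] = "got SIGUSR1\n";
    int saved_errno = errno;       /* write() may change errno */
    ssize_t written;

    (void)signo;
    usr1_count++;
    written = write(STDOUT_FILENO, msg, sizeof msg - 1);
    (void)written;                 /* nothing useful to do on failure here */
    errno = saved_errno;
}

/* Hypothetical helper: installs the handler, raises SIGUSR1 once and
 * returns how many times the handler ran (or -1 if sigaction failed). */
int deliver_once(void)
{
    struct sigaction act, oldact;

    act.sa_handler = on_usr1;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (sigaction(SIGUSR1, &act, &oldact) == -1)
        return -1;

    usr1_count = 0;
    raise(SIGUSR1);
    sigaction(SIGUSR1, &oldact, NULL);
    return (int)usr1_count;
}
```

By this standard, the handlers in section 4 are not safe, since they call printf.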

That is all for this time; I hope it is helpful to readers.


Origin blog.csdn.net/qq_44177918/article/details/130299093