Understanding the Linux Kernel: Signals

The role of signals

Signals are short messages that can be sent to a process or a group of processes. The two main purposes of using signals are:
1. Let the process know that a specific event has occurred.
2. Force the process to execute the signal handler in its own code.

The POSIX standard also introduces a new class of signals called real-time signals; in Linux their signal numbers range from 32 to 64. They differ significantly from regular signals because they are always queued, so that multiple signals sent are all received. Regular signals of the same type, by contrast, are not queued: if a regular signal is sent several times in a row, only one of them is delivered to the receiving process. Although the Linux kernel itself does not use real-time signals, it fully supports the POSIX standard through several specific system calls.

An important feature of signals is that they can be sent at any time to processes whose state is usually unpredictable. Signals sent to a process that is not currently running must be saved by the kernel until that process resumes execution. Blocking a signal (described later) delays its delivery until the signal is unblocked, which makes the gap between generating a signal and delivering it even harder to bound. Therefore, the kernel distinguishes two different phases of signal transmission:
1. Signal generation, in which the kernel updates a data structure of the target process to record that a new signal has been sent.
2. Signal delivery, in which the kernel forces the target process to react to the signal by changing its execution state, by starting the execution of a specific signal handler, or both.

Once the signal has been delivered, all information about the signal in the process descriptor is canceled. A signal that has been generated but not yet delivered is called a pending signal.

At any time, a process has at most one pending signal of a given type. Further signals of the same type sent to the same process are not queued; they are simply discarded.

However, real-time signals are different: there can be several pending signals of the same type. In general, a signal may remain pending for an unpredictable amount of time. The following factors must be considered:
1. Signals are usually delivered only to the currently running process (that is, to the current process).
2. Signals of a given type can be selectively blocked by the process.
3. While a process executes a signal handler, the corresponding signal is usually masked, that is, automatically blocked until the handler terminates. Another occurrence of the handled signal therefore cannot interrupt the handler, so the signal handling function does not need to be reentrant.

The kernel must:
1. Remember which signals each process blocks.
2. Check, whenever a process switches from kernel mode to user mode, whether a signal for that process has arrived.
3. Determine whether the signal can be ignored. This is the case when all of the following conditions are met:
3.1. The target process is not being traced by another process (the PT_PTRACED flag in the ptrace field of the process descriptor is equal to 0).
3.2. The signal is not blocked by the target process.
3.3. The signal is ignored by the target process.
4. Handle the signal, which may require switching the process to a signal handling function at any point during its execution and restoring the original execution context after the function returns.
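
The three conditions for ignoring a signal can be sketched as a small predicate. This is only an illustrative user-space model: the `task_model` structure, the `handler_kind` enumeration, and the function name are hypothetical stand-ins, not kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's handler values. */
enum handler_kind { HANDLER_DEFAULT, HANDLER_IGNORE, HANDLER_CAUGHT };

/* Hypothetical, simplified view of the fields the check looks at. */
struct task_model {
    bool traced;                   /* models the PT_PTRACED flag */
    unsigned long long blocked;    /* bit (sig - 1) set => sig is blocked */
    enum handler_kind action[64];  /* disposition of signal sig at index sig - 1 */
};

/* Returns true only when all three conditions for ignoring sig hold. */
bool sig_ignored_model(const struct task_model *t, int sig)
{
    if (t->traced)
        return false;   /* a traced process must be notified of the signal */
    if (t->blocked & (1ULL << (sig - 1)))
        return false;   /* a blocked signal stays pending instead */
    return t->action[sig - 1] == HANDLER_IGNORE;
}
```

The sketch covers only explicitly ignored signals; the implicit case (default action Ignore) is discussed with specific_send_sig_info().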

Actions performed before delivering the signal

A process responds to a signal in three ways:

1. Explicitly ignore the signal.
2. Perform the default action associated with the signal, which is one of the following:

Terminate
    The process is terminated.
Dump
    The process is terminated and, if possible, a core dump file containing the execution context of the process is created.
Ignore
    The signal is ignored.
Stop
    The process is stopped, that is, put in the TASK_STOPPED state.
Continue
    If the process is stopped (TASK_STOPPED), it is put into the TASK_RUNNING state.

3. Catch the signal by calling the corresponding signal handling function.

If a signal is received while a process is being traced, the kernel stops the process and notifies the tracing process by sending it a SIGCHLD signal. The tracing process can then resume execution of the traced process by means of a SIGCONT signal. The SIGKILL and SIGSTOP signals cannot be ignored, caught, or blocked, so their default actions must generally be performed.

POSIX signals and multi-threaded applications

The POSIX 1003.1 standard has some strict requirements for signal handling in multi-threaded applications:
1. Signal handlers must be shared among all threads of a multithreaded application; however, each thread must have its own mask of pending signals and its own mask of blocked signals. Pending means that the signal has been generated and is waiting to be handled. Blocked means that the signal, even if generated, is not handled until it is unblocked.

2. The POSIX library functions kill() and sigqueue() must send signals to the whole multithreaded application, not to a specific thread. The same holds for all signals generated by the kernel (such as SIGCHLD, SIGINT, or SIGQUIT).

3. Each signal sent to a multithreaded application is delivered to just one thread, which is arbitrarily chosen by the kernel among the threads that are not blocking that signal.

4. If a fatal signal is sent to a multithreaded application, the kernel kills all threads of the application, not just the thread to which the signal has been delivered.

There are two exceptions: it is not possible to send a signal to process 0 (swapper), and signals sent to process 1 (init) are always discarded unless they are caught. Therefore, process 0 never dies, while process 1 dies only when the init program terminates. A pending signal is private if it was sent to a specific process; it is shared if it was sent to the entire thread group.

Data structures related to signals


The blocked field stores the signals currently blocked (masked) by the process. It is a sigset_t array of bits, one for each signal type. Because no signal has number 0, the signal number equals the index of its bit in the sigset_t variable plus 1.
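
The "bit index plus 1" rule can be made concrete with a minimal user-space sketch. The `model_` names are hypothetical and only mimic the idea of the kernel's bit-array helpers; the sketch assumes 64 signals stored in an array of unsigned long words.

```c
#include <stdbool.h>

#define MODEL_NSIG       64   /* signals are numbered 1..64 */
#define MODEL_BPW        (8 * (int)sizeof(unsigned long))
#define MODEL_NSIG_WORDS ((MODEL_NSIG + MODEL_BPW - 1) / MODEL_BPW)

/* Hypothetical model of a signal set: one bit per signal type. */
typedef struct {
    unsigned long sig[MODEL_NSIG_WORDS];
} model_sigset_t;

/* Signal number signo is stored at bit index signo - 1. */
void model_sigaddset(model_sigset_t *set, int signo)
{
    int bit = signo - 1;
    set->sig[bit / MODEL_BPW] |= 1UL << (bit % MODEL_BPW);
}

void model_sigdelset(model_sigset_t *set, int signo)
{
    int bit = signo - 1;
    set->sig[bit / MODEL_BPW] &= ~(1UL << (bit % MODEL_BPW));
}

bool model_sigismember(const model_sigset_t *set, int signo)
{
    int bit = signo - 1;
    return (set->sig[bit / MODEL_BPW] >> (bit % MODEL_BPW)) & 1UL;
}
```

With this layout, signal 1 occupies bit 0 of the first word, and signal 64 occupies the last bit of the array regardless of the word size.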

Signal descriptor and signal handler descriptor

The signal descriptor is shared by all processes belonging to the same thread group. The fields related to signal processing in the signal descriptor are shown in Table 11-4:

sigaction data structure

sa_handler
    A pointer to the signal handler, or SIG_DFL, or SIG_IGN.
sa_flags
    A set of flags.
sa_mask
    The signals to be masked while the signal handler is running.

Pending signal queues

Several system calls generate signals that are sent to an entire thread group, such as kill() and rt_sigqueueinfo(), while others generate signals that are sent to a specific process, such as tkill() and tgkill(). The kernel associates two pending signal queues with each process:
1. The shared pending signal queue, which stores the pending signals of the entire thread group.
2. The private pending signal queue, which stores the pending signals of a specific (lightweight) process.

A pending signal queue is implemented by the sigpending data structure, which combines a list of queued signal entries with a bit mask of the pending signals.

siginfo_t is a 128-byte data structure that stores information about the occurrence of a specific signal. It contains the following fields:

si_signo
    The signal number.
si_errno
    The error code of the instruction that caused the signal to be raised, or 0 if there was no error.
si_code
    A code identifying who raised the signal (see Table 11-8).

Generating a signal

Many kernel functions generate signals: they update the descriptors of one or more processes as needed. They do not directly perform the second phase, signal delivery, but depending on the type of signal and the state of the target processes, they may wake up some processes and force them to receive the signal.

All functions in Table 11-9 end up calling the specific_send_sig_info() function.
All functions in Table 11-10 end up calling the group_send_sig_info() function.

specific_send_sig_info() function

The specific_send_sig_info() function sends a signal to a specific process. It acts on three parameters: the signal number sig; info, which is either the address of a siginfo_t table or one of the special values 0, 1, or 2; and t, the address of the descriptor of the target process. The function must be called with local interrupts disabled and the t->sighand->siglock spin lock already acquired. It performs the following steps:

1. Checks whether the process ignores the signal; if so, returns 0 (the signal is not generated). The signal is ignored when all three of the following conditions hold:
· The process is not being traced (the PT_PTRACED flag in t->ptrace is clear).
· The signal is not blocked (sigismember(&t->blocked, sig) returns 0).
· The signal is either explicitly ignored (the sa_handler field of t->sighand->action[sig-1] is equal to SIG_IGN) or implicitly ignored (the sa_handler field is equal to SIG_DFL and the signal is SIGCONT, SIGCHLD, SIGWINCH, or SIGURG).
2. Checks whether the signal is non-real-time (sig<32) and another occurrence of the same signal is already pending in the private pending signal queue of the process; if so, nothing has to be done, and the function returns 0.
3. Calls send_signal(sig, info, t, &t->pending) to add the signal to the set of pending signals of the process.
4. If send_signal() ends successfully and the signal is not blocked (sigismember(&t->blocked, sig) returns 0), calls signal_wake_up() to notify the process of the new pending signal.

signal_wake_up

The signal_wake_up() function, called by specific_send_sig_info() when the signal is not blocked, performs the following steps:

a. Sets the TIF_SIGPENDING flag in t->thread_info->flags.
b. Calls try_to_wake_up() to wake up the process if it is either in the TASK_INTERRUPTIBLE state, or in the TASK_STOPPED state and the signal is SIGKILL.
c. If try_to_wake_up() returns 0, the process was already runnable: in this case, checks whether the process is already running on another CPU and, if so, sends an interprocessor interrupt to that CPU to force a reschedule of the current process. Because each process checks for pending signals when returning from the scheduling function, the interprocessor interrupt ensures that the target process quickly notices the new pending signal.

Finally, specific_send_sig_info() performs its last step:
5. Returns 1 (the signal has been successfully generated).

send_signal() function

The send_signal() function inserts a new element into the pending signal queue:

static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
			struct sigpending *signals)

It adds the signal to the pending signal mask of the process. When needed, it also allocates and initializes a sigqueue element and appends it to the sigqueue linked list. Because the list is separate from the mask, a signal that is set in the mask can have several entries in the list. Each sigqueue element can also carry the data associated with that occurrence of the signal.
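
The interplay between the mask and the queue can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel's implementation: `model_pending`, `model_send()`, and `model_count()` are hypothetical names, the queue is a fixed-size array instead of a linked list, and the duplicate check that the kernel performs in the callers of send_signal() is folded into `model_send()` for brevity.

```c
#include <stddef.h>

#define MODEL_SIGRTMIN  32   /* signals below this number are regular */
#define MODEL_QUEUE_MAX 16   /* arbitrary capacity chosen for the sketch */

/* Hypothetical model of a pending signal queue: a mask plus a list. */
struct model_pending {
    unsigned long long mask;       /* bit (sig - 1) set => sig is pending */
    int queue[MODEL_QUEUE_MAX];    /* queued occurrences, oldest first */
    size_t len;
};

/* Returns 1 if a new occurrence was recorded, 0 if it was discarded. */
int model_send(struct model_pending *p, int sig)
{
    unsigned long long bit = 1ULL << (sig - 1);

    /* A regular signal that is already pending is not queued again. */
    if (sig < MODEL_SIGRTMIN && (p->mask & bit))
        return 0;
    if (p->len == MODEL_QUEUE_MAX)
        return 0;                  /* queue full: dropped in this sketch */
    p->queue[p->len++] = sig;
    p->mask |= bit;
    return 1;
}

/* Counts how many queued occurrences of sig are waiting. */
size_t model_count(const struct model_pending *p, int sig)
{
    size_t n = 0;
    for (size_t i = 0; i < p->len; i++)
        if (p->queue[i] == sig)
            n++;
    return n;
}
```

Sending a regular signal twice leaves a single queued occurrence, whereas sending a real-time signal twice leaves two, which is the behavioral difference described at the start of the article.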

group_send_sig_info() function

The group_send_sig_info() function sends a signal to an entire thread group. It acts on three parameters: the signal number sig; info, which is either the address of a siginfo_t table or one of the special values 0, 1, or 2 (as described in the previous section "specific_send_sig_info() function"); and p, the address of a process descriptor.

int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)

This function mainly performs the following steps:

1. Checks whether the parameter sig is correct:

if (sig < 0 || sig > 64)
	return -EINVAL;

2. If the signal is sent by a user-mode process, checks whether the operation is allowed. The signal is delivered only if at least one of the following conditions holds:
· The owner of the sending process has the appropriate capability (this usually means that the signal was issued by the system administrator).
· The signal is SIGCONT and the target process is in the same login session as the sending process.
· Both processes belong to the same user.
If the user-mode process is not allowed to send the signal, the function returns the value -EPERM.
3. If the value of the sig parameter is 0, the function does not generate any signal and returns immediately:

if (!sig || !p->sighand)
	return 0;

Because 0 is not a valid signal number, it is used to let the sending process check whether it has the privileges required to send a signal to the target thread group. The function also returns if the target process is being killed (which is detected by checking whether its signal handler descriptor has already been released).
4. Acquires the p->sighand->siglock spin lock and disables local interrupts.
5. Calls the handle_stop_signal() function, which checks for certain types of signals that might invalidate other pending signals of the target thread group.

The handle_stop_signal() function performs the following steps:

a. If the thread group is being killed (the SIGNAL_GROUP_EXIT flag in the flags field of the signal descriptor is set), the function returns.
b. If sig is a SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU signal, calls rm_from_queue() to remove the SIGCONT signal from the shared pending signal queue p->signal->shared_pending and from the private queues of all members of the thread group.
c. If sig is SIGCONT, calls rm_from_queue() to remove any SIGSTOP, SIGTSTP, SIGTTIN, and SIGTTOU signal from the shared pending signal queue p->signal->shared_pending; then removes the same signals from the private pending signal queues of the processes belonging to the thread group and wakes those processes up:

rm_from_queue(0x003c0000,&p->signal->shared_pending);
t = p;
do {
	rm_from_queue(0x003c0000,&t->pending);
	try_to_wake_up(t,TASK_STOPPED,0);
	t = next_thread(t);
} while(t != p);

The mask 0x003c0000 selects the four stop signals listed above. At each iteration, the next_thread macro returns the descriptor address of a different lightweight process in the thread group.
6. Checks whether the thread group ignores the signal; if so, returns 0 (success). The signal is ignored when the three conditions mentioned in the earlier section "The role of signals" are satisfied (see also step 1 in the earlier section "specific_send_sig_info() function").
7. Checks whether the signal is non-real-time and another occurrence of the same signal is already pending in the shared pending signal queue of the thread group; if so, nothing has to be done, and the function returns 0 (success):

if (sig < 32 && sigismember(&p->signal->shared_pending.signal, sig))
	return 0;

8. Calls send_signal() to add the signal to the shared pending signal queue. If send_signal() returns a nonzero error code, the function terminates and returns the same value.
9. Calls __group_complete_signal() to wake up one lightweight process in the thread group.
10. Release the p->sighand->siglock spin lock and turn on local interrupts.
11. Return 0 (success).
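
The constant 0x003c0000 passed to rm_from_queue() in step 5 follows directly from the rule that signal n occupies bit n - 1. The sketch below recomputes it; the signal numbers are the ones used on Linux/x86 and are hard-coded here as an assumption rather than taken from the system headers, and the function names are hypothetical.

```c
/* Signal numbers as used on Linux/x86 (assumed for this sketch). */
enum {
    X86_SIGSTOP = 19,
    X86_SIGTSTP = 20,
    X86_SIGTTIN = 21,
    X86_SIGTTOU = 22
};

/* Bit mask for one signal: signal sig occupies bit sig - 1. */
unsigned long model_sigmask(int sig)
{
    return 1UL << (sig - 1);
}

/* Mask that selects the four stop signals. */
unsigned long model_stop_mask(void)
{
    return model_sigmask(X86_SIGSTOP) | model_sigmask(X86_SIGTSTP) |
           model_sigmask(X86_SIGTTIN) | model_sigmask(X86_SIGTTOU);
}
```

Bits 18 through 21 are set, which gives 0x00040000 + 0x00080000 + 0x00100000 + 0x00200000 = 0x003c0000.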

__group_complete_signal

The __group_complete_signal() function scans the processes in the thread group, looking for a process that can receive the new signal. A process may be selected if it satisfies all of the following conditions:
· The process does not block the signal.
· The process is not in the EXIT_ZOMBIE, EXIT_DEAD, TASK_TRACED, or TASK_STOPPED state (as an exception, the process may be in the TASK_TRACED or TASK_STOPPED state if the signal is SIGKILL).
· The process is not being killed, that is, its PF_EXITING flag is not set.
· The process is either currently running on a CPU, or its TIF_SIGPENDING flag is not already set.

(In fact, there is no point in waking up a process that already has pending signals: in general, this wake-up has already been performed by the kernel control path that set the TIF_SIGPENDING flag. On the other hand, if a process is currently executing, it should be notified of the new pending signal.)

A thread group may contain many processes that satisfy these conditions. The function selects one of them as follows:
· If the process identified by p (the descriptor address passed as a parameter of group_send_sig_info()) satisfies all the conditions above and can thus receive the signal, the function selects it.
· Otherwise, the function searches for a suitable process by scanning the members of the thread group, starting from the process that received the last thread group signal (p->signal->curr_target).

If __group_complete_signal() succeeds in finding a suitable process, it sets up the delivery of the signal to the selected process. First, the function checks whether the signal is fatal; if so, it kills the whole thread group by sending SIGKILL signals to each lightweight process in the group. Otherwise, the function calls signal_wake_up() to notify the selected process that it has a new pending signal.
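
The selection rule can be sketched as a user-space model. Everything here is hypothetical scaffolding: the thread group is a plain array, `model_thread` only carries the fields the rule inspects, and SIGKILL is hard-coded as 9 (its Linux/x86 number) as an assumption.

```c
#include <stdbool.h>

#define MODEL_SIGKILL 9   /* Linux/x86 number, assumed for this sketch */

enum model_state { M_RUNNABLE, M_STOPPED, M_TRACED, M_ZOMBIE, M_DEAD };

/* Hypothetical per-thread view of the fields the rule inspects. */
struct model_thread {
    unsigned long long blocked;  /* bit (sig - 1) set => sig is blocked */
    enum model_state state;
    bool exiting;                /* models PF_EXITING */
    bool on_cpu;                 /* currently running on a CPU */
    bool sigpending;             /* models TIF_SIGPENDING */
};

/* True if the thread satisfies all four conditions for receiving sig. */
bool model_wants_signal(const struct model_thread *t, int sig)
{
    if (t->blocked & (1ULL << (sig - 1)))
        return false;
    if (t->exiting)
        return false;
    if (t->state == M_ZOMBIE || t->state == M_DEAD)
        return false;
    if ((t->state == M_STOPPED || t->state == M_TRACED) && sig != MODEL_SIGKILL)
        return false;
    return t->on_cpu || !t->sigpending;
}

/* Prefers thread p; otherwise scans the group starting at curr_target.
 * Returns the index of the chosen thread, or -1 if none qualifies. */
int model_pick_thread(const struct model_thread *group, int n,
                      int p, int curr_target, int sig)
{
    if (model_wants_signal(&group[p], sig))
        return p;
    for (int i = 0; i < n; i++) {
        int idx = (curr_target + i) % n;
        if (model_wants_signal(&group[idx], sig))
            return idx;
    }
    return -1;
}
```

When no thread qualifies, the signal simply stays in the shared pending queue until some thread becomes able to receive it.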

Delivering a signal

The kernel must ensure that the pending signals of a process are handled. It checks the value of the TIF_SIGPENDING flag of the process before allowing the process to resume execution in user mode. Thus, the kernel checks for pending signals every time it finishes handling an interrupt or an exception.

do_signal

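To handle the nonblocked pending signals, the kernel calls the do_signal() function, which receives two parameters: regs, the address of the stack area where the user-mode register contents of the current process are saved, and oldset, the address of a variable where the function is supposed to save the bit mask of blocked signals (NULL if there is no need to save it).

The do_signal() function is normally invoked only when the CPU is about to return to user mode. If an interrupt handler invokes do_signal() while the interrupted code was running in kernel mode, the function returns immediately:

if ((regs->xcs & 3) != 3)
	return 1;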

If the oldset parameter is NULL, the function initializes it with the address of the current->blocked field:

if  (!oldset)
	oldset =&current->blocked;

The core of the do_signal() function is a loop that repeatedly calls dequeue_signal() until no nonblocked pending signals are left in either the private or the shared pending signal queue. The return code of dequeue_signal() is stored in the signr local variable. If its value is 0, all pending signals have been handled and do_signal() can finish. As long as a nonzero value is returned, a pending signal is waiting to be handled, and dequeue_signal() is invoked again after do_signal() has handled the current signal.

dequeue_signal

The dequeue_signal() function first considers all signals in the private pending signal queue, starting with the lowest-numbered pending signal, and then the signals in the shared queue. It updates the data structures to indicate that the signal is no longer pending and returns its number.

Let us see how do_signal() handles each pending signal whose number is returned by dequeue_signal(). First, it checks whether the current receiving process is being monitored by some other process; if so, do_signal() calls do_notify_parent_cldstop() and schedule() to make the monitoring process aware of the signal handling. Then, do_signal() loads the ka local variable with the address of the k_sigaction data structure of the signal to be handled:

ka = &current->sig->action[signr-1];

Depending on the contents of ka, three kinds of actions may be performed: ignoring the signal, executing the default action, or executing a signal handler. If the delivered signal is explicitly ignored, do_signal() simply continues the loop and considers another pending signal.
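
The ordering rule of dequeue_signal() (private queue first, lowest signal number first, blocked signals skipped) can be sketched with plain bit masks. This is a simplified model with hypothetical names: it tracks only the pending masks and ignores the queued siginfo entries that the kernel also has to remove.

```c
/* Returns the lowest-numbered signal that is pending and not blocked,
 * or 0 if there is none. Signal sig is stored at bit sig - 1. */
int model_next_signal(unsigned long long pending, unsigned long long blocked)
{
    unsigned long long ready = pending & ~blocked;

    for (int sig = 1; sig <= 64; sig++)
        if (ready & (1ULL << (sig - 1)))
            return sig;
    return 0;
}

/* Takes one signal from the private mask if possible, otherwise from the
 * shared mask, clears its pending bit, and returns its number (0 if none). */
int model_dequeue(unsigned long long *private_pending,
                  unsigned long long *shared_pending,
                  unsigned long long blocked)
{
    int sig = model_next_signal(*private_pending, blocked);

    if (sig) {
        *private_pending &= ~(1ULL << (sig - 1));
        return sig;
    }
    sig = model_next_signal(*shared_pending, blocked);
    if (sig)
        *shared_pending &= ~(1ULL << (sig - 1));
    return sig;
}
```

Calling `model_dequeue()` in a loop until it returns 0 mirrors the main loop of do_signal(): blocked signals remain set in their mask and are picked up only after they are unblocked.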

Perform the default action of the signal

If ka->sa.sa_handler is equal to SIG_DFL, do_signal() must perform the default operation of the signal. The only exception is when the receiving process is init, in which case this signal is discarded:

if(current->pid == 1)
	continue;

If the receiving process is any other process, signals whose default action is Ignore are also easy to handle:

if (signr == SIGCONT || signr == SIGCHLD || signr == SIGWINCH || signr == SIGURG)
	continue;

A signal whose default action is Stop may stop all processes in the thread group. To do so, do_signal() sets their state to TASK_STOPPED and then calls the schedule() function:

if (signr == SIGSTOP || signr == SIGTSTP || signr == SIGTTIN || signr == SIGTTOU) {
	if(signr != SIGSTOP && is_orphaned_pgrp(current->signal->pgrp))
		continue;
	do_signal_stop(signr);
}

The difference between SIGSTOP and other signals is subtle: SIGSTOP always stops a thread group, while other signals only stop thread groups that are not in the "orphan process group".

do_signal_stop

The do_signal_stop() function checks whether current is the first process being stopped in the thread group; if so, it starts a "group stop": essentially, the function assigns a positive value to the group_stop_count field of the signal descriptor and wakes up all processes in the thread group. Each of these processes checks this field to recognize that a group stop is in progress, then changes its state to TASK_STOPPED and calls schedule(). The do_signal_stop() function also sends a SIGCHLD signal to the parent of the thread group leader, unless the parent has set the SA_NOCLDSTOP flag of SIGCHLD.

Signals whose default operation is Dump create a "dump" file in the process's working directory that lists the entire contents of the process's address space and CPU registers. After do_signal() creates the dump file, it kills the thread group.

The default action of the remaining 18 signals is Terminate, which simply kills the thread group. To kill the whole thread group, the function calls do_group_exit(), which performs a clean "group exit" procedure.

Catching a signal

If a handler has been established for the signal, the do_signal() function must enforce its execution. It does this by calling handle_signal():

handle_signal(signr, &info, ka, oldset, regs);
if (ka->sa.sa_flags & SA_ONESHOT)
	ka->sa.sa_handler = SIG_DFL;
return 1;

Executing a signal handler is a rather complex task because the contents of the stacks must be juggled carefully when switching between user mode and kernel mode. This section explains exactly what is involved.

The signal handler is a function defined by the user-mode process and included in the user-mode code segment. The handle_signal() function runs in kernel mode, while the signal handler runs in user mode. This means that before the current process resumes "normal" execution, it must first execute the user mode signal handler. In addition, when the kernel intends to resume normal execution of the process, the kernel-mode stack no longer contains the hardware context of the interrupted program, because the kernel-mode stack is cleared whenever there is a transition from kernel mode to user mode. Another complication is that the signal handler can invoke a system call, in which case, after executing the system call's service routine, control must return to the signal handler rather than to the normal code flow of the interrupted program.

The solution adopted by Linux is to copy the hardware context saved in the kernel mode stack to the user mode stack of the current process. The user-mode stack is also modified in such a way that when the signal handler terminates, the sigreturn() system call is automatically called to copy this hardware context back to the kernel-mode stack and restore the original contents of the user-mode stack.

Figure 11-2 illustrates the flow of execution of the functions involved in catching a signal.
1. A nonblocked signal is sent to a process.
2. When an interrupt or exception occurs, the process switches to kernel mode. Just before returning to user mode, the kernel executes the do_signal() function.
3. This function in turn handles the signal (by calling handle_signal()) and sets up the user-mode stack (by calling setup_frame() or setup_rt_frame()). When the process switches back to user mode, it starts executing the signal handler, because the start address of the handler was forced into the program counter.
4. When the handler terminates, the return code placed on the user-mode stack by setup_frame() or setup_rt_frame() is executed. This code calls the sigreturn() or rt_sigreturn() system call. The corresponding service routine copies the hardware context of the normal program from the user-mode stack to the kernel-mode stack and restores the user-mode stack to its original state (by calling restore_sigcontext()). When the system call terminates, the normal program can resume its execution.


setup_frame

To properly set up the user-mode stack of the process, the handle_signal() function calls either setup_frame() (for signals that do not require a siginfo_t table) or setup_rt_frame(). To choose between the two functions, the kernel checks the SA_SIGINFO flag in the sa_flags field of the sigaction table associated with the signal.

The setup_frame() function receives four parameters: the signal number sig, the address ka of the k_sigaction table associated with the signal, the address oldset of the bit mask of blocked signals, and the address regs of the kernel-mode stack area where the user-mode register contents are saved.

The setup_frame() function pushes onto the user-mode stack a data structure called a frame, which contains the information needed to handle the signal and to ensure that execution returns correctly once the handler finishes. A frame is a sigframe table (see Figure 11-3) whose fields include pretcode, sig, and sc.

The setup_frame() function first calls get_sigframe() to compute the first memory location of the frame. That location is usually in the user-mode stack, so the function returns the value (regs->esp - sizeof(struct sigframe)) & 0xfffffff8. Because stacks grow toward lower addresses, the start address of the frame is obtained by subtracting the frame size from the address of the current top of the stack and aligning the result to a multiple of 8.

The returned address is then verified by means of the access_ok macro. If it is valid, setup_frame() repeatedly calls __put_user() to fill all the fields of the frame. The pretcode field of the frame is initialized to &__kernel_sigreturn, the address of some glue code placed in the vsyscall page. Once this is done, the function modifies the regs area of the kernel-mode stack, which ensures that control is transferred to the signal handler when current resumes its execution in user mode:

regs->esp =(unsigned long)frame;
regs->eip =(unsigned long)ka->sa.sa_handler;
regs->eax =(unsigned long)sig;
regs->edx =regs->ecx = 0;
regs->xds = regs->xes = regs->xss=__USER_DS;
regs->xcs =__USER_CS;

Before returning, the setup_frame() function resets the segment registers saved on the kernel-mode stack to their default values. The information needed by the signal handler is now on top of the user-mode stack. The setup_rt_frame() function is similar to setup_frame(), but it puts on the user-mode stack an extended frame (stored in the rt_sigframe data structure) that also includes the content of the siginfo_t table associated with the signal. Moreover, this function sets the pretcode field so that it points to the __kernel_rt_sigreturn code in the vsyscall page.
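
The frame address computation performed by get_sigframe(), namely (regs->esp - sizeof(struct sigframe)) & 0xfffffff8, can be tried out in isolation. The sketch below is a user-space illustration only: the function name is hypothetical, and the frame size is passed as a parameter because the real size of struct sigframe is not given in this article.

```c
#include <stdint.h>

/* Returns the start address of a frame of frame_size bytes placed just
 * below the 32-bit stack pointer esp, rounded down to a multiple of 8. */
uint32_t model_frame_addr(uint32_t esp, uint32_t frame_size)
{
    /* The stack grows toward lower addresses, so the frame lies below esp;
     * clearing the three low-order bits aligns the address to 8 bytes. */
    return (esp - frame_size) & 0xfffffff8u;
}
```

For example, with esp equal to 0xbffff00c and a hypothetical frame size of 100 bytes, the subtraction gives 0xbfffefa8, which is already a multiple of 8; starting from 0xbffff00f instead gives 0xbfffefab, which the mask rounds down to the same address.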

Evaluating the signal flags

After setting up the user-mode stack, the handle_signal() function checks the values of the flags associated with the signal. If the signal does not have the SA_NODEFER flag set, the signals in the sa_mask field of the sigaction table must be blocked during the execution of the signal handler:

if(!(ka->sa.sa_flags& SA_NODEFER)){
	spin_lock_irq(&current->sighand->siglock);
	sigorsets(&current->blocked,&current->blocked,&ka->sa.sa_mask);
	sigaddset(&current->blocked,sig);
	recalc_sigpending(current);
	spin_unlock_irq(&current->sighand->siglock);
}

The recalc_sigpending() function checks whether the process has nonblocked pending signals and sets its TIF_SIGPENDING flag accordingly. Then, handle_signal() returns to do_signal(), which also returns immediately.
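
The effect of the code above on the blocked mask can be expressed as a pure function. This is a minimal sketch with a hypothetical name, operating on 64-bit masks in which signal sig occupies bit sig - 1; locking and the recalc_sigpending() call are left out.

```c
#include <stdbool.h>

/* Returns the blocked mask that is in effect while the handler for sig runs.
 * Mirrors the kernel snippet above: nothing changes when SA_NODEFER is set;
 * otherwise sa_mask and sig itself are added to the blocked signals. */
unsigned long long model_block_for_handler(unsigned long long blocked,
                                           unsigned long long sa_mask,
                                           int sig, bool nodefer)
{
    if (nodefer)
        return blocked;
    return blocked | sa_mask | (1ULL << (sig - 1));
}
```

Because the previous mask is saved in the frame pushed onto the user-mode stack, it can be restored unchanged when the handler returns.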

Start executing the signal handler

When do_signal() returns, the current process resumes its execution in user mode. Due to the preparation of setup_frame() as mentioned above, the eip register points to the first instruction of the signal handler, and esp points to the first memory location of the frame that has been pushed to the top of the user mode stack. Therefore, the signal handler is executed.

Terminate signal handler

When the signal handler terminates, the return address on top of the stack points to the code in the vsyscall page referenced by the pretcode field of the frame:

__kernel_sigreturn:
	popl %eax
	movl $__NR_sigreturn,%eax
	int $0x80

Therefore, the signal number (i.e., the sig field of the frame) is discarded from the stack, and the sigreturn() system call is called. The sys_sigreturn() function calculates the address of the data structure regs of type pt_regs, where pt_regs contains the hardware context of the user-mode process. From the value stored in the esp field, derive and check the address of the frame in the user mode stack:

frame =(struct sigframe *)(regs.esp - 8);
if (verify_area(VERIFY_READ, frame, sizeof(*frame))) {
	force_sig(SIGSEGV,current);
	return 0;
}

The function then copies the bit array of the signals that were blocked before the signal handler was invoked from the sc field of the frame to the blocked field of current. As a result, all signals that were masked for the execution of the signal handler are unblocked. The recalc_sigpending() function is then called. At this point, sys_sigreturn() must copy the process hardware context from the sc field of the frame to the kernel-mode stack and remove the frame from the user-mode stack. Both tasks are performed by calling the restore_sigcontext() function.

If the signal was sent by a system call such as rt_sigqueueinfo(), which requires a siginfo_t table associated with the signal, the mechanism is very similar. The pretcode field of the extended frame points to the __kernel_rt_sigreturn code in the vsyscall page, which in turn calls the rt_sigreturn() system call. The corresponding sys_rt_sigreturn() service routine copies the process hardware context from the extended frame to the kernel-mode stack and restores the original content of the user-mode stack by removing the extended frame from it.

System call re-execution

The kernel cannot always satisfy immediately the request issued by a system call. When this happens, the process that issued the system call is put in the TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE state. If the process is in the TASK_INTERRUPTIBLE state and some other process sends a signal to it, the kernel puts it in the TASK_RUNNING state without completing the system call. The signal is delivered to the process while switching back to user mode. When this happens, the system call service routine does not complete its job but returns an EINTR, ERESTARTNOHAND, ERESTART_RESTARTBLOCK, ERESTARTSYS, or ERESTARTNOINTR error code.

In practice, the only error code a user-mode process can get in this situation is EINTR, which means that the system call has not been completed. (The application programmer may check this code and decide whether to reissue the system call.) The remaining error codes are used internally by the kernel to specify whether the system call may be re-executed automatically after the signal handler terminates. Table 11-11 lists the error codes related to unfinished system calls and their impact for each of the three possible actions of the signal.

When delivering a signal, the kernel must be sure that the process really issued a system call before attempting to re-execute it. This is where the orig_eax field of the regs hardware context plays a critical role. Recall how this field is initialized when an interrupt or exception handler starts:
· For an interrupt, the field contains the IRQ number associated with the interrupt minus 256.
· For the 0x80 exception (and also for sysenter), the field contains the system call number.
· For other exceptions, the field contains the value -1.

Therefore, a nonnegative value in the orig_eax field means that the signal has woken up a TASK_INTERRUPTIBLE process that was sleeping in a system call. The service routine recognizes that the system call was interrupted and thus returns one of the previously mentioned error codes.

Reexecute a system call interrupted by an uncaught signal

If a signal is explicitly ignored, or if its default operation has been forced, do_signal() analyzes the system call's error code and determines whether to re-automate the unfinished operation as described in Table 11-11. System calls. If system call execution must be restarted, do_signal() modifies the regs hardware context so that when the process returns to user mode, eip points to the int $0x80 instruction or sysenter instruction, and eax contains the system call number:

if (regs->orig_eax >= 0) {
	if (regs->eax == -ERESTARTNOHAND || regs->eax == -ERESTARTSYS ||
	    regs->eax == -ERESTARTNOINTR) {
		regs->eax = regs->orig_eax;
		regs->eip -= 2;
	}
	if (regs->eax == -ERESTART_RESTARTBLOCK) {
		regs->eax = __NR_restart_syscall;
		regs->eip -= 2;
	}
}

The regs->eax field holds the return code of the system call service routine. Note that the int $0x80 and sysenter instructions are both two bytes long, so the function subtracts 2 from eip to make it point to the instruction that triggers the system call.

The ERESTART_RESTARTBLOCK error code is special because the eax register is loaded with the system call number of restart_syscall(). The user-mode process therefore does not re-execute the same system call that was interrupted by the signal. This error code is used only by time-related system calls that, when re-executed, should adjust their user-mode parameters. A typical example is the nanosleep() system call. Suppose the process calls nanosleep() to pause execution for 20 ms and a signal arrives 10 ms later. If the system call were re-executed as usual (without adjusting its user-mode parameters), the total delay would exceed 30 ms. Instead, the service routine of the nanosleep() system call stores in the restart_block field of the current thread_info structure the address of a special service routine to be used when re-executing, and returns -ERESTART_RESTARTBLOCK if it is interrupted.
The sys_restart_syscall() service routine simply executes this special nanosleep() service routine, which adjusts the delay to account for the time elapsed between the invocation of the original system call and its restart.
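
The delay adjustment can be illustrated with a small user-space sketch. This is not the kernel's code: remaining_delay() is a hypothetical helper, and the sketch assumes the restart routine knows the absolute expiry time of the original sleep and the current time.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: time still to sleep when a sleep that should
 * end at `expire` is restarted at time `now`. Returns zero if the
 * expiry time has already passed. */
struct timespec remaining_delay(struct timespec expire, struct timespec now)
{
    struct timespec rem = { 0, 0 };

    /* Both inputs must be normalized timespec values. */
    assert(expire.tv_nsec >= 0 && expire.tv_nsec < 1000000000L);
    assert(now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);

    if (now.tv_sec > expire.tv_sec ||
        (now.tv_sec == expire.tv_sec && now.tv_nsec >= expire.tv_nsec))
        return rem;                     /* nothing left to wait for */

    rem.tv_sec = expire.tv_sec - now.tv_sec;
    rem.tv_nsec = expire.tv_nsec - now.tv_nsec;
    if (rem.tv_nsec < 0) {              /* borrow one second */
        rem.tv_sec -= 1;
        rem.tv_nsec += 1000000000L;
    }
    return rem;
}
```

With the figures used above (a 20 ms sleep interrupted after 10 ms), the restarted sleep waits only for the remaining 10 ms rather than for another full 20 ms.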

Re-execute the system call for the caught signal

If the signal is caught, then handle_signal() analyzes the error code and possibly the SA_RESTART flag of the sigaction table to determine whether the outstanding system call must be re-executed:

if(regs->orig_eax >= 0){
	switch(regs->eax){
	case -ERESTART_RESTARTBLOCK:
	case -ERESTARTNOHAND:
		regs->eax =-EINTR;
		break;
	case -ERESTARTSYS:
		if(!(ka->sa.sa_flags & SA_RESTART)){
			regs->eax =-EINTR;
			break;
		}
	/* fallthrough */
	case -ERESTARTNOINTR:
		regs->eax = regs->orig_eax;
		regs->eip -= 2;
	}
}

If the system call must be restarted, handle_signal() continues execution exactly like do_signal(); otherwise, it returns an error code -EINTR to the user-mode process.
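
The two code fragments above (for uncaught and caught signals) can be condensed into a single decision function. The following is only a sketch: the enum names and restart_outcome() are hypothetical, and the enum values are stand-ins rather than the kernel's real error code values.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the error codes; not the kernel's values. */
enum syscall_err {
    ERR_EINTR,
    ERR_RESTARTSYS,
    ERR_RESTARTNOHAND,
    ERR_RESTART_RESTARTBLOCK,
    ERR_RESTARTNOINTR
};

enum outcome {
    RETURN_EINTR,                  /* user mode sees -EINTR */
    REEXECUTE_SAME_CALL,           /* eax = orig_eax, eip -= 2 */
    REEXECUTE_VIA_RESTART_SYSCALL  /* eax = __NR_restart_syscall, eip -= 2 */
};

/* caught == false models the do_signal() path (signal ignored or
 * default action); caught == true models the handle_signal() path. */
enum outcome restart_outcome(enum syscall_err err, bool caught, bool sa_restart)
{
    assert(err >= ERR_EINTR && err <= ERR_RESTARTNOINTR);

    if (err == ERR_EINTR)
        return RETURN_EINTR;

    if (!caught)
        return err == ERR_RESTART_RESTARTBLOCK
            ? REEXECUTE_VIA_RESTART_SYSCALL
            : REEXECUTE_SAME_CALL;

    switch (err) {
    case ERR_RESTARTNOINTR:
        return REEXECUTE_SAME_CALL;
    case ERR_RESTARTSYS:
        /* Restart only if the handler was installed with SA_RESTART. */
        return sa_restart ? REEXECUTE_SAME_CALL : RETURN_EINTR;
    default:                       /* RESTARTNOHAND, RESTART_RESTARTBLOCK */
        return RETURN_EINTR;
    }
}
```

For example, restart_outcome(ERR_RESTARTSYS, true, false) yields RETURN_EINTR, which is the case an application sees when a handler installed without SA_RESTART interrupts a blocking call.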

System calls related to signal handling

kill() system call

Generally, the kill(pid,sig) system call is used to send signals to ordinary processes or multi-threaded applications, and the corresponding service routine is the sys_kill() function. The integer parameter pid has several meanings depending on its value:
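pid > 0: the sig signal is sent to the thread group of the process whose PID is equal to pid.
pid = 0: the sig signal is sent to all thread groups of the processes in the same process group as the calling process.
pid = -1: the signal is sent to all processes, except swapper (PID 0), init (PID 1), and current.
pid < -1: the signal is sent to all thread groups of the processes in the process group -pid.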

The sys_kill() function builds the minimum siginfo_t table for the signal, and then calls the kill_something_info() function:

info.si_signo = sig;
info.si_errno = 0;
info.si_code = SI_USER;
info._sifields._kill._pid = current->tgid;
info._sifields._kill._uid = current->uid;
return kill_something_info(sig,&info,pid);

kill_something_info() in turn calls either kill_proc_info() (which sends the signal to a single thread group via group_send_sig_info()), or kill_pg_info() (which scans all processes in the target process group and calls send_sig_info() for each of them), or it repeatedly calls group_send_sig_info() for each process in the system (if pid equals -1).

The kill() system call can send any signal, even real-time signals numbered between 32 and 64. However, we have seen in the previous section "Generating Signals" that the kill() system call cannot ensure that a new element is added to the target process's pending signal queue, so multiple instances of pending signals may be lost. Real-time signals should be sent through the rt_sigqueueinfo() system call.
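
The loss of repeated standard signals can be observed from user space. The sketch below assumes a single-threaded Linux process; deliveries_after_two_sends() is a hypothetical helper, and sigqueue() is the C library function commonly used to reach rt_sigqueueinfo(). The signal is sent twice while blocked, then unblocked, and the handler invocations are counted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static volatile sig_atomic_t delivered;   /* incremented by the handler */

static void count_delivery(int sig)
{
    (void)sig;
    delivered++;
}

/* Send sig to the calling process twice while it is blocked, unblock
 * it, and return how many times the handler ran (-1 on error). */
int deliveries_after_two_sends(int sig, int use_sigqueue)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask;
    union sigval value = { .sival_int = 0 };
    int i;

    assert(sig > 0);
    sa.sa_handler = count_delivery;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(sig, &sa, &old_sa) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, sig);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) < 0)
        return -1;

    delivered = 0;
    for (i = 0; i < 2; i++) {
        if (use_sigqueue)
            sigqueue(getpid(), sig, value);
        else
            kill(getpid(), sig);
    }

    sigprocmask(SIG_UNBLOCK, &block, NULL);     /* pending signals arrive here */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);  /* restore the caller's mask */
    sigaction(sig, &old_sa, NULL);
    return delivered;
}
```

Calling deliveries_after_two_sends(SIGUSR1, 0) is expected to report a single delivery, because the second instance of a pending standard signal is discarded. Passing a real-time signal such as SIGRTMIN with use_sigqueue set shows the queued behavior instead.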

tkill() and tgkill() system calls

The tkill() and tgkill() system calls send a signal to a specific process in a thread group. The pthread_kill() function of every POSIX-compliant pthread library invokes either of them to send a signal to a specific lightweight process.

The tkill() system call expects two parameters: the PID (pid) of the process to be signaled and the signal number sig. The sys_tkill() service routine fills a siginfo table, gets the process descriptor address, performs permission checks, and calls specific_send_sig_info() to send the signal.

The tgkill() system call differs from tkill() in that it takes a third parameter: the thread group ID (tgid) of the thread group that includes the process to be signaled. The sys_tgkill() service routine performs exactly the same operations as sys_tkill(), but it also checks that the process being signaled actually belongs to the thread group tgid. This additional check closes a race condition that occurs when a signal is sent to a process that is being killed: if another multi-threaded application is creating lightweight processes fast enough, the PID may be reused and the signal could be delivered to the wrong process.
Because the thread group ID never changes during the lifetime of a multi-threaded application, the tgkill() system call solves this problem.
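
From user space, tgkill() can be reached through the generic syscall() wrapper. This is a minimal sketch assuming Linux; send_to_thread() and current_tid() are hypothetical helper names, not library functions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID (the PID of the lightweight process) of the caller. */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Send sig to thread tid, but only if tid belongs to thread group tgid.
 * Returns 0 on success, -1 with errno set on failure. A sig of 0 only
 * performs the existence and permission checks. */
int send_to_thread(pid_t tgid, pid_t tid, int sig)
{
    assert(sig >= 0);
    return (int)syscall(SYS_tgkill, tgid, tid, sig);
}
```

For instance, send_to_thread(getpid(), current_tid(), 0) checks that the calling thread can be signaled without actually sending anything. If tid has been recycled into another thread group, the call fails instead of signaling the wrong process.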

Change signal operations

The sigaction(sig, act, oact) system call allows the user to specify an action for a signal. Of course, if no signal action is defined, the kernel performs the default action associated with the delivered signal. The corresponding sys_sigaction() service routine acts on two parameters: the sig signal number and the act table of type old_sigaction, which specifies the new action. The third, optional output parameter oact can be used to obtain the previous action associated with the signal. (The old_sigaction data structure contains the same fields as the sigaction structure, but in a different order.) The function first checks the validity of the act address. It then fills the sa_handler, sa_flags, and sa_mask fields of a new_ka local variable of type k_sigaction with the corresponding fields of *act:

__get_user(new_ka.sa.sa_handler, &act->sa_handler);
__get_user(new_ka.sa.sa_flags, &act->sa_flags);
__get_user(mask, &act->sa_mask);
siginitset(&new_ka.sa.sa_mask,mask);

The function then calls do_sigaction() to copy the new new_ka table into the entry at position sig-1 of current->sig->action (the signal number is one greater than its position in the array, because there is no signal 0):

k = &current->sig->action[sig-1];
if (act) {
	*k = *act;
	/* SIGKILL and SIGSTOP can never be masked by a handler */
	sigdelsetmask(&k->sa.sa_mask, sigmask(SIGKILL) | sigmask(SIGSTOP));
	/* the new action ignores the signal: discard its pending instances */
	if (k->sa.sa_handler == SIG_IGN ||
	    (k->sa.sa_handler == SIG_DFL &&
	     (sig == SIGCONT || sig == SIGCHLD ||
	      sig == SIGWINCH || sig == SIGURG))) {
		rm_from_queue(sigmask(sig), &current->signal->shared_pending);
		t = current;
		do {
			/* flush the private queue of every thread in the group */
			rm_from_queue(sigmask(sig), &t->pending);
			recalc_sigpending_tsk(t);
			t = next_thread(t);
		} while (t != current);
	}
}

The POSIX standard stipulates that setting a signal action to SIG_IGN, or to SIG_DFL when the default action is "ignore", causes any pending signal of the same type to be discarded. Also note that, whatever set of blocked signals is requested for the signal handler, SIGKILL and SIGSTOP are never blocked. The sigaction() system call also allows the user to initialize the sa_flags field of the sigaction table. Table 11-6 (earlier in this chapter) lists the values allowed for this field and their meanings.
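
The discard rule can be checked from user space. The following sketch assumes a single-threaded Linux process; still_pending_after_ignore() is a hypothetical helper that makes a signal pending, switches its action to SIG_IGN, and reports whether the signal is still pending.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Returns 1 if sig is still pending after its action is set to
 * SIG_IGN, 0 if it was discarded, -1 on error. */
int still_pending_after_ignore(int sig)
{
    struct sigaction ignore, old_sa;
    sigset_t block, old_mask, pending;
    int result = -1;

    assert(sig > 0);
    sigemptyset(&block);
    sigaddset(&block, sig);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) < 0)
        return -1;

    raise(sig);                          /* blocked, so it stays pending */
    sigpending(&pending);
    if (sigismember(&pending, sig) == 1) {
        ignore.sa_handler = SIG_IGN;
        sigemptyset(&ignore.sa_mask);
        ignore.sa_flags = 0;
        if (sigaction(sig, &ignore, &old_sa) == 0) {
            sigpending(&pending);        /* look again after the change */
            result = sigismember(&pending, sig);
            sigaction(sig, &old_sa, NULL);
        }
    }
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return result;
}
```

On a system that follows the rule described above, still_pending_after_ignore(SIGUSR1) returns 0: installing SIG_IGN removed the pending instance.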

Check for pending blocked signals

The sigpending() system call allows a process to examine the set of pending blocked signals, that is, the signals that were raised while they were blocked. The corresponding service routine sys_sigpending() acts on a single parameter, set, which is the address of a user variable into which the bit array must be copied:

sigorsets(&pending, &current->pending.signal, &current->signal->shared_pending.signal);
sigandsets(&pending, &current->blocked, &pending);
copy_to_user(set, &pending, 4);
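
The bit manipulation performed by these three statements can be sketched on plain 32-bit masks (the 4 bytes copied to user space). The names sigbit() and pending_blocked() are hypothetical; the sketch assumes the usual layout in which signal number n is represented by bit n-1.

```c
#include <assert.h>
#include <stdint.h>

/* Bit that represents standard signal `sig` (1..32) in a mask. */
uint32_t sigbit(int sig)
{
    assert(sig >= 1 && sig <= 32);
    return UINT32_C(1) << (sig - 1);
}

/* Union of the private and shared pending sets, restricted to the
 * signals that are currently blocked. */
uint32_t pending_blocked(uint32_t private_pending, uint32_t shared_pending,
                         uint32_t blocked)
{
    return (private_pending | shared_pending) & blocked;
}
```

For example, if signal 10 is pending privately, signal 2 is pending for the whole thread group, and only signal 10 is blocked, the result contains just the bit for signal 10.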

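Modify the set of blocked signals

The sigprocmask() system call allows a process to modify the set of blocked signals. This system call applies only to regular (non-real-time) signals. The corresponding sys_sigprocmask() service routine acts on three parameters:
oset: a pointer in the process address space to the bit array where the previous bit mask must be stored.
set: a pointer in the process address space to the bit array containing the new bit mask.
how: a flag that may have one of the following values:
SIG_BLOCK: the *set bit mask specifies the signals that must be added to the bit mask of blocked signals.
SIG_UNBLOCK: the *set bit mask specifies the signals that must be removed from the bit mask of blocked signals.
SIG_SETMASK: the *set bit mask specifies the new bit mask of blocked signals.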

sys_sigprocmask() calls copy_from_user() to copy the value pointed to by the set parameter into the new_set local variable, and copies the bit mask of the currently blocked standard signals into the old_set local variable.
It then acts on these two variables according to the how flag:

if(copy_from_user(&new_set, set, sizeof(*set)))
	return -EFAULT;
new_set &= ~(sigmask(SIGKILL) | sigmask(SIGSTOP));
old_set = current->blocked.sig[0];
if(how == SIG_BLOCK)
	sigaddsetmask(&current->blocked, new_set);
else if(how == SIG_UNBLOCK)
	sigdelsetmask(&current->blocked, new_set);
else if(how == SIG_SETMASK)
	current->blocked.sig[0]= new_set;
else
	return -EINVAL;
recalc_sigpending(current);
if(oset && copy_to_user(oset, &old_set, sizeof(*oset)))
	return -EFAULT;
return 0;
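
The effect of stripping SIGKILL and SIGSTOP from the requested mask is visible from user space. This is a small sketch assuming a single-threaded Linux process; is_blocked_after_request() is a hypothetical helper that asks for a signal to be blocked and then reads the mask back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Ask the kernel to block sig, then report whether it really is in
 * the blocked set: 1 if blocked, 0 if not, -1 on error. */
int is_blocked_after_request(int sig)
{
    sigset_t request, old_mask, current_mask;
    int blocked;

    assert(sig > 0);
    sigemptyset(&request);
    sigaddset(&request, sig);
    if (sigprocmask(SIG_BLOCK, &request, &old_mask) < 0)
        return -1;

    /* A NULL set leaves the mask unchanged and only returns it. */
    if (sigprocmask(SIG_BLOCK, NULL, &current_mask) < 0)
        return -1;
    blocked = sigismember(&current_mask, sig);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);   /* restore the caller's mask */
    return blocked;
}
```

Requesting SIGUSR1 reports it as blocked, whereas requesting SIGKILL succeeds but leaves SIGKILL out of the blocked set, which matches the masking statement in the service routine above.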

Suspend the process

The sigsuspend() system call puts the process in the TASK_INTERRUPTIBLE state, after having blocked the standard signals specified by the bit mask array to which the mask parameter points. The process wakes up only when a nonignored, nonblocked signal is sent to it. The corresponding sys_sigsuspend() service routine executes these statements:

mask &= ~(sigmask(SIGKILL) | sigmask(SIGSTOP));
saveset = current->blocked;
siginitset(&current->blocked, mask);
recalc_sigpending(current);
regs->eax = -EINTR;
while (1) {
	current->state = TASK_INTERRUPTIBLE;
	schedule();
	if (do_signal(regs, &saveset))
		return -EINTR;
}

The schedule() function selects another process to run. When the process that issued the sigsuspend() system call is executed again, sys_sigsuspend() calls the do_signal() function to deliver the signal that woke up the process. If do_signal() returns 1, the signal was not ignored, so the system call terminates by returning the -EINTR error code.

The sigsuspend() system call may seem redundant, since the combined execution of sigprocmask() and sleep() apparently yields the same result. But this is not true: because process executions can be interleaved at any time, invoking one system call to perform action A and then another system call to perform action B is not equivalent to invoking a single system call that performs action A followed by action B.

In this particular case, sigprocmask() might unblock a signal that is then delivered before sleep() is invoked. If this happens, the process could remain in the TASK_INTERRUPTIBLE state forever, waiting for a signal that has already been delivered. The sigsuspend() system call, on the other hand, does not allow signals to be delivered after unblocking and before schedule() is invoked, because other processes cannot obtain the CPU during that time interval.
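
The usual user-space pattern built on sigsuspend() can be sketched as follows, assuming a single-threaded Linux process. wait_for_signal() is a hypothetical helper, and the raise() call merely stands in for a signal that arrives before the process starts waiting, which is exactly the case that a sigprocmask() plus sleep() sequence would miss.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t got_signal;   /* set by the handler */

static void note_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Block sig, let it become pending, then atomically unblock it and
 * wait with sigsuspend(). Returns 1 once the handler has run. */
int wait_for_signal(int sig)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;

    assert(sig > 0);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(sig, &sa, &old_sa) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, sig);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) < 0)
        return -1;

    got_signal = 0;
    raise(sig);                      /* stand-in for an early signal */

    wait_mask = old_mask;
    sigdelset(&wait_mask, sig);      /* mask used only while suspended */
    while (!got_signal)
        sigsuspend(&wait_mask);      /* returns -1/EINTR after the handler */

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(sig, &old_sa, NULL);
    return got_signal;
}
```

Because the signal stays blocked until sigsuspend() installs wait_mask and suspends the process in one step, the early signal is not lost and the loop terminates.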

System calls for real-time signals

Because the system calls mentioned previously only apply to standard signals, additional system calls must be introduced to allow user-mode processes to handle real-time signals. Several system calls for real-time signals (rt_sigaction(), rt_sigpending(), rt_sigprocmask(), and rt_sigsuspend()) are similar to those described previously and will not be discussed further. For the same reason, we will not further discuss the two system calls that handle real-time signal queues:
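rt_sigqueueinfo(): sends a real-time signal so that it is added to the shared pending signal queue of the destination process. It is usually invoked through the sigqueue() standard library function.
rt_sigtimedwait(): dequeues a blocked pending signal without delivering it and returns the signal number to the caller; if no blocked signal is pending, it suspends the current process for a fixed amount of time. It is usually invoked through the sigwaitinfo() and sigtimedwait() standard library functions.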