[Reading notes] Linux kernel design and implementation-process management

1. Process

A process is a program in the midst of execution (object code stored on some storage medium).

ps: A process is more than a section of executable code (also called the code section or text section). It usually also includes other resources, such as open files and pending signals.

A thread of execution, or thread for short, is the object of activity within a process. Each thread has its own program counter, process stack, and set of processor registers.

The object scheduled by the kernel is a thread, not a process.

ps: Linux implements threads in an unusual way: it does not distinguish between threads and processes. To Linux, a thread is just a special kind of process.

2. Process descriptor and task structure

The kernel stores the list of processes in a circular doubly linked list called the task list.
Each element in the list is a process descriptor of type struct task_struct, defined in <linux/sched.h>.
The process descriptor contains all the information about a specific process.

The data in the process descriptor fully describes an executing program: its open files, the process's address space, pending signals, the process's state, and so on.

2.1 Assigning process descriptors


2.2 Storage of process descriptors-PID

The kernel identifies each process by a unique process identification value, or PID.
The PID is a number represented by the opaque type pid_t, which is typically an int.
The maximum PID value is, in effect, the maximum number of processes that may exist concurrently on the system.
The default maximum is 32768 (set by the limit defined in <linux/threads.h>); the current value can be viewed through /proc/sys/kernel/pid_max.
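
As a small user-space illustration, the limit can be read from the file mentioned above. This is only a sketch: the function name read_pid_max is made up for this example, and it assumes a Linux system with /proc mounted.

```c
#include <stdio.h>

/* Returns the current PID limit, or -1 if it cannot be read
 * (for example, on a system without /proc mounted). */
long read_pid_max(void)
{
    long pid_max = -1;
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid_max) != 1)
        pid_max = -1;
    fclose(f);
    return pid_max;
}
```

Calling read_pid_max() from main() and printing the result shows whether the system still uses the default or has had the limit raised.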

2.3 Process status

The state field in the process descriptor describes the current state of the process. Every process is in exactly one of the following five states:

  1. TASK_RUNNING (running, R) - the process is runnable: it is either currently executing or waiting on a run queue to be executed.
  2. TASK_INTERRUPTIBLE (interruptible, S) - the process is sleeping (blocked), waiting for some condition to be met.
  3. TASK_UNINTERRUPTIBLE (uninterruptible, D) - identical to the interruptible state, except that the process does not wake up and become runnable when it receives a signal.
  4. __TASK_TRACED (traced, t) - the process is being traced by another process, such as a debugger using ptrace.
  5. __TASK_STOPPED (stopped, T) - the process has stopped executing; it is not running and is not eligible to run. This usually happens when it receives SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU. In addition, any signal received while the process is being debugged puts it into this state.
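
From user space, the state letter of a process can be observed in /proc/<pid>/stat. The following is a minimal sketch, assuming a Linux system with /proc mounted; proc_state is a hypothetical helper name, not a kernel or libc function.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the one-letter state of `pid` (R, S, D, T, t, Z, ...)
 * as reported in /proc/<pid>/stat, or '?' on failure. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    FILE *f;
    char *p;
    size_t n;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '?';
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name is parenthesized and may itself contain ')',
     * so find the last ')' and read the field that follows it. */
    p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

A process that inspects itself, as in proc_state(getpid()), is by definition executing at that moment, so it reports R.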


2.4 Set the current process state – set_task_state

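set_task_state(task, state);  /* equivalent to the line below, plus a memory barrier where one is needed (SMP) */
task->state = state;

ps:

set_current_state(state);  /* equivalent to the line below; see <linux/sched.h> */
set_task_state(current, state);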

2.5 Process context

Executable program code is an important part of a process.
The code is loaded from an executable file into the process's address space and executed there.
Programs normally execute in user space. When a program makes a system call or triggers an exception, it enters kernel space.
At that point, the kernel is said to be "executing on behalf of the process" and to be in process context.

2.6 Process family tree – all processes are descendants of the init process with PID 1

The relationship between processes is stored in the process descriptor.
Each task_struct contains a pointer to its parent's task_struct, called parent, and a list of its child processes, called children.

Q: How to get the process descriptor of the parent process?
A: struct task_struct *my_parent = current->parent;

Q: How to access the child process?
A:

struct task_struct *task;
struct list_head *list;

list_for_each(list, &current->children) {
	task = list_entry(list, struct task_struct, sibling);
	/* task now points to one of current's children */
}
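
The reason list_entry() is given the sibling member is that each child is linked into its parent's children list through its own sibling field. The sketch below imitates this in user space with simplified stand-ins for the kernel's list macros; struct mini_task and every function here are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's list primitives. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical, heavily reduced task structure. */
struct mini_task {
    int pid;
    struct mini_task *parent;
    struct list_head children;  /* head of this task's child list */
    struct list_head sibling;   /* link in the parent's child list */
};

static void mini_task_init(struct mini_task *t, int pid,
                           struct mini_task *parent)
{
    t->pid = pid;
    t->parent = parent;
    list_init(&t->children);
    list_init(&t->sibling);
    if (parent != NULL)
        list_add_tail(&t->sibling, &parent->children);
}

/* Adds up the PIDs of the direct children of `t`. */
int sum_child_pids(struct mini_task *t)
{
    struct list_head *pos;
    int sum = 0;

    list_for_each(pos, &t->children)
        sum += list_entry(pos, struct mini_task, sibling)->pid;
    return sum;
}

/* Builds init(1) -> {2, 3} and 2 -> {4}, then sums init's children. */
int demo_sum_child_pids(void)
{
    struct mini_task init, a, b, grandchild;

    mini_task_init(&init, 1, NULL);
    mini_task_init(&a, 2, &init);
    mini_task_init(&b, 3, &init);
    mini_task_init(&grandchild, 4, &a);
    return sum_child_pids(&init);   /* children 2 and 3 only */
}
```

Only direct children are visited: the grandchild hangs off its own parent's children list, not off init's.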

The process descriptor of the init process is statically allocated as init_task.
For example, the following loop walks up from the current process until it reaches init:

struct task_struct *task;

for (task = current; task != &init_task; task = task->parent)
	;
/* task now points to init */
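
A user-space approximation of the same upward walk follows the parent PID recorded in /proc/<pid>/stat. This is a sketch under the assumption that /proc is mounted and readable; parent_of and depth_below_init are names invented for the example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the parent PID of `pid` from /proc/<pid>/stat, or -1 on failure. */
static long parent_of(long pid)
{
    char path[64], buf[512];
    char state;
    long ppid;
    FILE *f;
    char *p;
    size_t n;

    snprintf(path, sizeof(path), "/proc/%ld/stat", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    p = strrchr(buf, ')');          /* skip the "pid (comm)" prefix */
    if (p == NULL || sscanf(p + 1, " %c %ld", &state, &ppid) != 2)
        return -1;
    return ppid;
}

/* Counts the hops from the calling process up to the root of its tree.
 * The walk stops at PID 1, or earlier if a parent is reported as 0
 * (not visible from here) or its stat file cannot be read. */
int depth_below_init(void)
{
    long pid = (long)getpid();
    int depth = 0;

    while (pid > 1) {
        long ppid = parent_of(pid);

        if (ppid <= 0)
            break;
        pid = ppid;
        depth++;
    }
    return depth;
}
```

Inside a container or PID namespace the walk may end before reaching the host's init, which is why the function treats a parent PID of 0 as the top of the visible tree.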

ps: The for_each_process(task) macro iterates over the entire task list (the circular doubly linked list), but on a system with many processes, iterating over every task is expensive.
So unless there is a good reason (or no other way), don't do it.
e.g.:

struct task_struct *task;

for_each_process(task) {
	/* print the name and PID of each task */
	printk("%s[%d]\n", task->comm, task->pid);
}
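
Outside the kernel, a rough equivalent is to scan the numeric directories under /proc. The sketch below prints the same "name[pid]" format; it assumes /proc is mounted and that /proc/<pid>/comm is available, and print_all_processes is a hypothetical name.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Prints "comm[pid]" for every process visible in /proc and returns
 * the number of processes printed, or -1 if /proc cannot be opened. */
int print_all_processes(void)
{
    DIR *dir = opendir("/proc");
    struct dirent *de;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        char path[300], comm[64];
        FILE *f;

        /* Process directories are the entries with numeric names. */
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;

        snprintf(path, sizeof(path), "/proc/%s/comm", de->d_name);
        f = fopen(path, "r");
        if (f == NULL)
            continue;               /* the process may already be gone */
        if (fgets(comm, sizeof(comm), f) != NULL) {
            comm[strcspn(comm, "\n")] = '\0';
            printf("%s[%s]\n", comm, de->d_name);
            count++;
        }
        fclose(f);
    }
    closedir(dir);
    return count;
}
```

Like the kernel macro, this touches every process on the system, so the same warning about cost applies.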

3. Process creation-fork and exec family functions

fork() creates a child process by copying the current process.
The exec() family of functions reads an executable file and loads it into the address space, where it starts running.
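
The usual pairing of the two calls can be sketched from user space as follows. The helper name run_in_child is made up for this example, and it assumes /bin/sh exists on the system.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that replaces itself with `/bin/sh -c <command>` and
 * returns the child's exit status, or -1 on error. */
int run_in_child(const char *command)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: exec only returns if it fails. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);
    }

    /* Parent: wait for the child and decode its exit status. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For instance, run_in_child("exit 3") yields 3: fork() produced the child, exec replaced its program image with the shell, and the parent collected the result.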

3.1 Copy-on-write – one of the reasons Linux can create processes quickly

With copy-on-write (COW), a resource is copied only when it needs to be written to; until then, it is shared read-only (that is, parent and child share a single copy).
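
COW itself is invisible to a program; what can be observed is the semantics it preserves: a write in the child never shows up in the parent. A minimal sketch of that observation, with hypothetical names:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int value = 1;

/* Returns the value the parent sees after the child has modified
 * its own copy, or -1 on error. */
int parent_value_after_child_write(void)
{
    int status;
    pid_t pid;

    value = 1;
    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the first write to the page gives the child
         * a private copy; the parent's page is left untouched. */
        value = 99;
        _exit(value == 99 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return value;   /* still the parent's own copy */
}
```

The parent still reads 1 even though the child stored 99 at the same virtual address.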

3.2 fork()

Linux implements fork() through the clone() system call.
The general call flow is:

fork()->clone()->do_fork()->copy_process()

ps: do_fork() is defined in kernel/fork.c (the location may differ between kernel versions).
The do_fork() function calls copy_process() and then starts the new process running.
copy_process() works as follows:

  1. Call dup_task_struct() to create a kernel stack, thread_info structure, and task_struct for the new process, with values identical to those of the current process. At this point, the child and parent process descriptors are exactly the same.
  2. Check that creating this child does not push the number of processes owned by the current user over that user's resource limit.
  3. The child now sets out to differentiate itself from its parent. Many members of the process descriptor are cleared or set to initial values. The members that are not inherited are mainly statistical information. Most of the data in task_struct remains unchanged.
  4. The child's state is set to TASK_UNINTERRUPTIBLE to ensure that it does not yet run.
  5. copy_process() calls copy_flags() to update the flags member of task_struct. The PF_SUPERPRIV flag, which indicates whether the process has used superuser privileges, is cleared. The PF_FORKNOEXEC flag, which indicates that the process has not yet called exec(), is set.
  6. Call alloc_pid() to assign an available PID to the new process.
  7. Depending on the flags passed to clone(), copy_process() either duplicates or shares open files, filesystem information, signal handlers, the process address space, and the namespace. These resources are normally shared between the threads of a given process; otherwise they are unique to each process, so they are copied here (COW).
  8. Finally, copy_process() cleans up and returns a pointer to the new child.

Control then returns to do_fork(). If copy_process() returned successfully, the newly created child is woken up and run.
The kernel deliberately tries to run the child first (although this does not always happen). The child usually calls an exec() function right away, which avoids copy-on-write overhead that would be incurred if the parent ran first and began writing to the address space.

3.3 vfork()

The vfork() system call has the same effect as fork(), except that the page table entries of the parent process are not copied.
Instead, the child executes as the sole thread in the parent's address space, and the parent is blocked until the child either exits or calls exec().
The child must not write to the address space.

ps: Ideally, systems would be better off without vfork(), and the kernel would not need to implement it.
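
A minimal user-space sketch of these rules: the child does nothing except terminate, and the parent only continues once the child is done. The function name vfork_exit_code is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates a child with vfork() that exits immediately with code 7.
 * Returns the exit code seen by the parent, or -1 on error. */
int vfork_exit_code(void)
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;

    if (pid == 0)
        _exit(7);   /* child: only _exit() or an exec function is safe */

    /* The parent resumes only after the child has exited (or exec'd). */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the child borrows the parent's address space, it must not modify variables or return from the function that called vfork(); in practice the child's branch is a single exec call followed by _exit() as a fallback.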

4. Thread implementation in Linux

Threads provide multiple flows of execution within the same program, all running in a shared memory address space.
Linux treats all threads as processes.
A thread is simply a process that shares certain resources with other processes. (Each thread has its own task_struct, so to the kernel it looks like an ordinary process that happens to share some resources, such as the address space, with other processes.)

4.1 Create a thread

The creation of threads is similar to the creation of ordinary processes, except that when calling clone (), you need to pass some parameter flags to indicate the resources that need to be shared:

clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);  /* much like calling fork(), except that parent and child share the address space, filesystem resources, file descriptors, and signal handlers */
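
The line above is the kernel-level form of the call. From user space, the glibc clone() wrapper takes a function and a stack for the child instead. The sketch below uses the same sharing flags to show that a write in the child is visible to the parent. It is an illustration only: it assumes Linux with glibc, the names child_fn, clone_shared_write, and CHILD_STACK_SIZE are invented for the example, and SIGCHLD is added so the child can be reaped with waitpid().

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Runs in the child; writes through a pointer into shared memory. */
static int child_fn(void *arg)
{
    *(int *)arg = 42;   /* visible to the parent because of CLONE_VM */
    return 0;
}

/* Returns the value written by the child, or -1 on error. */
int clone_shared_write(void)
{
    int value = 0;
    char *stack = malloc(CHILD_STACK_SIZE);
    pid_t pid;

    if (stack == NULL)
        return -1;

    /* The stack grows downward on common architectures, so the
     * wrapper is given a pointer to the top of the allocation. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
                &value);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    if (waitpid(pid, NULL, 0) != pid) {
        free(stack);
        return -1;
    }
    free(stack);
    return value;
}
```

Compare this with a plain fork(), where the child's write would land in its own private copy and the parent would still see 0.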

The flags passed to clone() determine the behavior of the newly created process and which resources the parent and child share.
They are listed in the following table (from <linux/sched.h>):

Flag                  Meaning
CLONE_FILES           Parent and child share open files
CLONE_FS              Parent and child share filesystem information
CLONE_IDLETASK        Set the PID to 0 (used only by idle processes)
CLONE_NEWNS           Create a new namespace for the child
CLONE_PARENT          The child has the same parent as its parent
CLONE_PTRACE          Continue tracing the child
CLONE_SETTID          Write the TID back to user space
CLONE_SETTLS          Create a new TLS (thread-local storage) for the child
CLONE_SIGHAND         Parent and child share signal handlers and blocked signals
CLONE_SYSVSEM         Parent and child share System V SEM_UNDO semantics
CLONE_THREAD          Parent and child are in the same thread group
CLONE_VFORK           vfork() was used; the parent sleeps until the child wakes it
CLONE_UNTRACED        Do not let the tracing process force CLONE_PTRACE on the child
CLONE_STOP            Start the process in the TASK_STOPPED state
CLONE_CHILD_CLEARTID  Clear the TID in the child
CLONE_CHILD_SETTID    Set the TID in the child
CLONE_PARENT_SETTID   Set the TID in the parent
CLONE_VM              Parent and child share the address space

ps:
The concept of the idle process:
Put simply, idle is a process whose PID is 0. It originates from the first process the system creates, which is also the only process not produced by fork(). On an SMP system, each processor has its own run queue, and each run queue has an idle process, so there are as many idle processes as there are processors. The system's idle time is really the "running time" of the idle processes. The idle process has pid == 0 and is init_task.

4.2 Kernel threads – standard processes that run independently in kernel space

The difference between kernel threads and ordinary processes is that kernel threads have no address space of their own (their mm pointer, which points to the address space, is NULL).
They run only in kernel space and never switch to user space.
Like ordinary processes, kernel threads can be scheduled and preempted.

5. Process termination – do_exit()

do_exit() is invoked by the exit() system call. It is defined in kernel/exit.c and does the following:

  1. Set the PF_EXITING flag in the flags member of task_struct (the book misprints this as tast_struct).
  2. Call del_timer_sync() to remove any kernel timers. When it returns, it is guaranteed that no timer is queued and no timer handler is running.
  3. If BSD process accounting is enabled, do_exit() calls acct_update_integrals() to write out accounting information.
  4. Call exit_mm() to release the mm_struct held by the process. If no other process is using this address space (that is, it is not shared), the kernel destroys it.
  5. Call exit_sem(). If the process is queued waiting for an IPC semaphore, it is dequeued here.
  6. Call exit_files() and exit_fs() to decrement the reference counts of the file descriptors and filesystem data, respectively. If a reference count drops to zero, no process is using the corresponding resource any longer and it can be released.
  7. Set the task's exit code, stored in the exit_code member of task_struct, to the code provided by exit() or by whatever kernel mechanism forced the termination. The exit code is stored here so that the parent can retrieve it later.
  8. Call exit_notify() to send signals to the parent, reparent any of the task's children to another thread in their thread group or to the init process, and set the task's exit state (stored in exit_state in task_struct) to EXIT_ZOMBIE.
  9. do_exit() calls schedule() to switch to a new process. Because a process in the EXIT_ZOMBIE state is never scheduled again, this is the last code the process executes; do_exit() never returns.

ps: At this point, all resources associated with the process have been released (assuming the process was their only user). The process is not runnable (in fact, it no longer has an address space to run in) and is in the EXIT_ZOMBIE exit state. The only memory it still occupies is its kernel stack, the thread_info structure, and the task_struct structure.
The process now exists solely to provide information to its parent. After the parent retrieves that information, or notifies the kernel that it is not interested, the remaining memory held by the process is freed and returned to the system.

5.1 Delete process descriptor-wait family function

The cleanup work required when a process exits and the removal of its process descriptor are performed separately.
The wait() family of functions is implemented via a single system call, wait4(). Its standard behavior is to suspend the calling process until one of its children exits, at which point the function returns the PID of that child.

When the process descriptor finally needs to be freed, release_task() is called. It works as follows:

  1. It calls __exit_signal(), which calls __unhash_process(), which in turn calls detach_pid() to remove the process from the pidhash and from the task list.
  2. __exit_signal() releases any remaining resources used by the now-dead process and finalizes statistics and bookkeeping.
  3. If the process was the last member of a thread group and the group leader is a zombie, release_task() notifies the zombie leader's parent.
  4. release_task() calls put_task_struct() to free the pages holding the process's kernel stack and thread_info structure, and to release the slab cache memory holding its task_struct.
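
The standard behavior described above can be sketched from user space with several children. The helper name reap_children is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks `n` children (child i exits with code i) and reaps them with
 * wait(). Returns the sum of the collected exit codes, or -1 on error. */
int reap_children(int n)
{
    int i, status, sum = 0;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(i);               /* child: exit with its index */
    }

    for (i = 0; i < n; i++) {
        /* Blocks until any one child has exited; returns its PID. */
        pid_t done = wait(&status);

        if (done < 0 || !WIFEXITED(status))
            return -1;
        sum += WEXITSTATUS(status);
    }
    return sum;
}
```

The children may be reaped in any order, which is why the sketch sums the exit codes rather than comparing them one by one.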

5.2 The dilemma caused by the orphan process

If a parent exits before its children, there must be a mechanism that gives those children a new parent; otherwise the orphaned processes would remain zombies forever after they exit, wasting resources.
The mechanism is to look for another thread in the current thread group to act as the new parent and, failing that, to make the init process the parent.
ps: The init process routinely calls wait() on its children and cleans up any zombies assigned to it.
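
Reparenting can be observed from user space: once the middle process of a three-generation chain exits, the grandchild sees a different parent PID. The sketch below is only an illustration; the function name orphan_gets_new_parent is invented, and the new parent may be init or another process the system has designated to adopt orphans.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if an orphaned grandchild observed a new parent, 0 if it
 * did not within about two seconds, and -1 on error. */
int orphan_gets_new_parent(void)
{
    int fds[2];
    char result = 0;
    pid_t child;
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;

    child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        pid_t original_parent = getpid();
        pid_t grandchild = fork();

        if (grandchild == 0) {
            struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms */
            int tries;

            close(fds[0]);
            /* Poll until the kernel has given us a new parent. */
            for (tries = 0; tries < 200; tries++) {
                if (getppid() != original_parent)
                    break;
                nanosleep(&pause, NULL);
            }
            result = (getppid() != original_parent);
            if (write(fds[1], &result, 1) != 1)
                _exit(1);
            _exit(0);
        }
        _exit(0);   /* the middle process exits, orphaning the grandchild */
    }

    close(fds[1]);
    waitpid(child, NULL, 0);            /* reap the middle process */
    n = read(fds[0], &result, 1);       /* wait for the grandchild's report */
    close(fds[0]);
    return n == 1 ? result : -1;
}
```

The grandchild is not a child of the caller, so it is the adopting process, not this program, that eventually reaps it.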

Origin blog.csdn.net/qq_23327993/article/details/105065705