Linux Kernel Subsystem--Process Management Analysis

Linux is a very dynamic system with ever-changing computing needs. It represents those needs through the common abstraction of the process. Processes can be short-lived (a command executed from the command line) or long-lived (a network service). For this reason, the general management of processes and their scheduling is very important.

In user space, a process is represented by a process identifier (PID). From the user's point of view, a PID is a numerical value that uniquely identifies a process. PIDs don't change during the lifetime of a process, but PIDs can be reused after a process terminates, so caching them isn't always ideal.

In user space, you can create processes in several ways. You can execute a program (which causes a new process to be created), or within a program, you can call the fork or exec system calls. A fork call causes a child process to be created, while an exec call replaces the current process context with a new program. I'll discuss each method to see how they work.
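As a rough user-space illustration of the fork-then-exec pattern described above, the sketch below creates a child with fork, replaces the child's program with execv, and waits for the result in the parent. The helper name run_command is hypothetical and exists only for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` in a child process and
 * return its exit status, or -1 if the child could not be created or did
 * not exit normally. */
int run_command(const char *path, char *const argv[])
{
    pid_t pid = fork();          /* duplicate the calling process */
    if (pid < 0)
        return -1;               /* fork failed, no child exists */

    if (pid == 0) {
        /* Child: replace the current process image with the new program. */
        execv(path, argv);
        _exit(127);              /* reached only if execv failed */
    }

    /* Parent: fork returned the child's PID, so wait for that child. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_command("/bin/sh", argv) with argv set to { "sh", "-c", "exit 3", NULL } should return the shell's exit status. Note that the PID stays the same across the exec call; only the program running in the process changes.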

In this post, I first show the kernel representation of processes and how they are managed within the kernel, then review the various ways processes are created and scheduled on one or more processors, and finally discuss what happens when they die.

process representation

In the Linux kernel, a process is represented by a fairly large structure called a task_struct. This structure contains all the necessary data to represent the process, and a lot of other data for accounting and maintaining relationships with other processes (parent and child processes). A complete description of task_struct is beyond the scope of this article, but a portion of task_struct is shown in Listing 1. This code contains the specific elements discussed in this article. Note that task_struct is located in ./linux/include/linux/sched.h.

/* Partial task_struct code */
struct task_struct {
    volatile long state;
    void *stack;
    unsigned int flags;

    int prio, static_prio;
    struct list_head tasks;
    struct mm_struct *mm, *active_mm;

    pid_t pid;
    pid_t tgid;

    struct task_struct *real_parent;
    char comm[TASK_COMM_LEN];
    struct thread_struct thread;
    struct files_struct *files;
    ...
};

In the code snippet above, you can see several items that you would expect, such as the execution state, a stack, a set of flags, the parent process, the thread of execution (of which there can be many), and open files. These are discussed later in this article, but a few are introduced briefly here. The state variable is a set of bits that indicate the state of a task. The most common states indicate that the process is running or in a run queue about to run (TASK_RUNNING), sleeping (TASK_INTERRUPTIBLE), sleeping and not wakeable by signals (TASK_UNINTERRUPTIBLE), stopped (TASK_STOPPED), or one of a few others. A complete list of these states can be found in ./linux/include/linux/sched.h.
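The state field itself is only visible inside the kernel, but user space can observe a one-letter summary of it through the /proc file system (for example R for running, S for interruptible sleep, D for uninterruptible sleep, T for stopped, Z for zombie). The sketch below is one way to read that letter; the helper name read_task_state is hypothetical, and the code assumes /proc is mounted.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter state of process `pid` as
 * reported in /proc/<pid>/stat, or '?' if it cannot be read. */
char read_task_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The line looks like "pid (comm) S ...". The command name may itself
     * contain spaces or ')', so search for the last closing parenthesis. */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

A process that asks for its own state with read_task_state(getpid()) is necessarily running at that moment, so it should see R.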

The flags word defines a large number of indicators, covering everything from whether the process is being created (PF_STARTING) or exiting (PF_EXITING) to whether the process is currently allocating memory (PF_MEMALLOC). The name of the executable (excluding the path) occupies the comm (command) field.

Each process is also given a priority (called static_prio), but the actual priority of a process is determined dynamically based on load and other factors. The lower the priority value, the higher its actual priority.
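To make the inverse relationship concrete, the sketch below mirrors the way 2.6-era kernels map a user-visible nice value (-20 to 19) onto static_prio, with values below 100 reserved for real-time tasks. The function names are hypothetical stand-ins for the kernel's NICE_TO_PRIO and PRIO_TO_NICE macros, and the exact definitions may differ between kernel versions.

```c
/* Priorities 0..99 are reserved for real-time tasks. */
#define MAX_RT_PRIO 100

/* Hypothetical user-space mirror of the kernel's NICE_TO_PRIO macro:
 * nice -20 maps to 100 (most favored), nice 19 maps to 139 (least). */
int nice_to_prio(int nice)
{
    return MAX_RT_PRIO + nice + 20;
}

/* Inverse mapping, mirroring PRIO_TO_NICE. */
int prio_to_nice(int prio)
{
    return prio - MAX_RT_PRIO - 20;
}
```

A process started with the default nice value of 0 therefore has a static_prio of 120, and lowering the nice value lowers static_prio, which raises the effective priority.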

The tasks field provides the linked-list capability. It contains a prev pointer (pointing to the previous task) and a next pointer (pointing to the next task).
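The list node is embedded inside the structure it links, and the enclosing structure is recovered from the node's address. The following user-space sketch imitates that idea with hypothetical names (list_node, demo_task, demo_next_task); it is not the kernel's list.h, only a small model of the same technique.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical task record with an embedded list node. */
struct demo_task {
    int pid;
    struct list_node tasks;
};

/* Recover the enclosing demo_task from a pointer to its `tasks` member. */
#define node_to_task(ptr) \
    ((struct demo_task *)((char *)(ptr) - offsetof(struct demo_task, tasks)))

/* A one-element circular list points at itself in both directions. */
void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

/* Link `node` in just before `head`, that is, at the tail of the ring. */
void list_add_tail_node(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Counterpart of next_task: follow the embedded next pointer. */
struct demo_task *demo_next_task(struct demo_task *t)
{
    return node_to_task(t->tasks.next);
}

/* Walk the ring once, starting and stopping at the anchor. */
int count_tasks(struct demo_task *anchor)
{
    int n = 0;
    struct demo_task *t = anchor;
    do {
        n++;
    } while ((t = demo_next_task(t)) != anchor);
    return n;
}
```

Because the ring has no head or tail, any element can serve as the starting point as long as the walk stops when it comes back around to it.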

The address space of a process is represented by the mm and active_mm fields. mm represents the memory descriptor of the process, while active_mm is the memory descriptor of the previous process (optimization to improve context switch time).

Finally, thread_struct identifies the stored state of the process. This element depends on the particular architecture on which Linux is running, and you can see an example of it in ./linux/include/asm-i386/processor.h. In this structure, you will find the storage for the process (hardware registers, program counter, and so on) when it is switched out of the executing context.

process management

Now, let's explore how to manage processes in Linux. In most cases, processes are created dynamically and represented by a dynamically allocated task_struct. One exception is the init process itself, which always exists and is represented by a statically allocated task_struct. You can see an example of this in ./linux/arch/i386/kernel/init_task.c.

All processes in Linux are collected in two different ways. The first is a hash table, which is hashed through the PID value; the second is a circular doubly linked list. Circular linked lists are great for iterating over lists of tasks. Since the linked list is circular, there is no head or tail; but since the init_task is always there, you can use it as an anchor for further iterations. Let's look at an example to walk through the current set of tasks.

The task list is not accessible from user space, but you can easily work around that by inserting code into the kernel as a module. The code snippet below shows a very simple program that iterates over the task list and provides a small amount of information about each task (name, pid, and parent name). Note that this module uses printk to emit its output. To see the output, view the /var/log/messages file with the cat utility (or follow it in real time with tail -f /var/log/messages). The next_task function is a macro in sched.h that simplifies iterating over the task list (it returns a task_struct reference to the next task).

/* Simple kernel module that emits task information (procsview.c) */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sched.h>

int init_module( void )
{
  /* Set up the anchor point */
  struct task_struct *task = &init_task;

  /* Walk through the task list, until we hit the init_task again */
  do {
    printk( KERN_INFO "*** %s [%d] parent %s\n",
            task->comm, task->pid, task->parent->comm );
  } while ( (task = next_task(task)) != &init_task );

  return 0;
}

void cleanup_module( void )
{
  return;
}

This module can be compiled using the Makefile shown below. After compiling, you can insert the kernel object with the insmod procsview.ko command and remove it with the rmmod procsview command.


obj-m += procsview.o

KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules

process creation

Now let's look at the process of creating a process from user space. The underlying mechanism for userspace tasks and kernel tasks is the same, since both ultimately rely on a function called do_fork to create new processes. In the case of creating a kernel thread, the kernel calls a function called kernel_thread (see ./linux/arch/i386/kernel/process.c), which performs some initialization and then calls do_fork.

A similar operation occurs for the creation of userspace processes. In user space, a program calls fork, which results in a system call to a kernel function named sys_fork (see ./linux/arch/i386/kernel/process.c). The functional relationship is shown in Figure 1.

Figure 1. Function hierarchy for process creation

From Figure 1, you can see that do_fork provides the basis for process creation. You can find the do_fork function (and the companion function copy_process) in ./linux/kernel/fork.c.
The do_fork function first calls alloc_pidmap to allocate a new PID. Next, do_fork checks whether a debugger is tracing the parent process. If so, the CLONE_PTRACE flag is set in clone_flags in preparation for forking. The do_fork function then calls copy_process, passing the flags, stack, registers, parent process, and newly allocated PID.

The copy_process function is where the new process is created as a copy of the parent process. This function does everything except start the process, which is handled later. The first step of copy_process is to verify the CLONE flags to make sure they are consistent. If not, an EINVAL error is returned. Next, the Linux Security Module (LSM) is consulted to see if the current task can create a new task. To learn more about LSM in a Security-Enhanced Linux (SELinux) environment, check out the resources section.
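The kind of consistency check performed on the CLONE flags can be sketched in user space as follows. The three rules shown are modeled on checks found in 2.6-era copy_process, but the function name check_clone_flags is hypothetical, the set of rules is not complete, and later kernels add more of them. The flag constants come from the kernel's exported <linux/sched.h> header.

```c
#include <errno.h>
#include <linux/sched.h>   /* CLONE_* flag definitions */

/* Hypothetical user-space mirror of the flag validation: returns 0 if the
 * combination is consistent, -EINVAL otherwise. */
int check_clone_flags(unsigned long clone_flags)
{
    /* A new mount namespace cannot share filesystem information
     * with the parent. */
    if ((clone_flags & (CLONE_NEWNS | CLONE_FS)) == (CLONE_NEWNS | CLONE_FS))
        return -EINVAL;

    /* Threads in the same thread group must share signal handlers. */
    if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
        return -EINVAL;

    /* Shared signal handlers require a shared address space. */
    if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
        return -EINVAL;

    return 0;
}
```

For example, CLONE_VM | CLONE_SIGHAND | CLONE_THREAD passes, while CLONE_THREAD on its own is rejected because a thread cannot have signal handlers separate from the rest of its group.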

Next, the dup_task_struct function (located in ./linux/kernel/fork.c) is called, which allocates a new task_struct and copies the current process's descriptor into it. After a new thread stack is set up, some state information is initialized and control returns to copy_process. Back in copy_process, some housekeeping is performed alongside several other limit and security checks, including a variety of initializations on the new task_struct. A sequence of copy functions is then invoked to copy individual aspects of the process: open file descriptors (copy_files), signal information (copy_sighand and copy_signal), process memory (copy_mm), and finally the thread (copy_thread).
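One visible consequence of this copying is that, after a plain fork, the child works on its own copy of the parent's memory. The user-space sketch below, with the hypothetical name parent_value_after_fork, has the child modify a variable and shows that the parent's copy is untouched.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demonstration: returns the parent's view of `value` after
 * the child has changed its own copy, or -1 on error. */
int parent_value_after_fork(void)
{
    int value = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        value = 42;      /* changes only the child's copy of the variable */
        _exit(0);
    }

    /* Wait until the child has finished before inspecting `value`. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return value;        /* still 1 in the parent */
}
```

The result would be different for a task created with a shared address space (CLONE_VM), which is how threads are built on the same mechanism.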

The new task is then assigned to a processor, with some additional checking based on the processors on which the process is allowed to execute (cpus_allowed). After the new process inherits the priority of the parent, a small amount of additional housekeeping is performed and control returns to do_fork. At this point, the new process exists but is not yet running. The do_fork function fixes this with a call to wake_up_new_task. This function, which you can find in ./linux/kernel/sched.c, initializes some of the scheduler housekeeping information, places the new process in a run queue, and then wakes it up for execution. Finally, on returning to do_fork, the PID value is returned to the caller and the process is complete.

process scheduling

While a process exists in Linux, it can potentially be scheduled through the Linux scheduler. Although the details are beyond the scope of this article, the Linux scheduler maintains a set of lists, one for each priority level, that contain task_struct references. Tasks are invoked through the schedule function (available in ./linux/kernel/sched.c), which determines the best process to run based on load and prior process execution history. You can learn more about the Linux version 2.6 scheduler in the resources section.
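The idea of one list per priority level can be modeled with a very small user-space sketch. Everything here is an assumption made for illustration: the names sched_task, demo_runqueue, rq_enqueue, and rq_pick_next are hypothetical, the 140 levels follow the 2.6 priority range, and the linear scan stands in for the faster lookup a real scheduler would use. It ignores time slices, load, and execution history entirely.

```c
#include <stddef.h>

#define NUM_PRIO 140   /* assumed number of priority levels */

/* Hypothetical, heavily simplified task record. */
struct sched_task {
    int pid;
    int prio;                    /* 0 is the highest priority */
    struct sched_task *next;
};

/* One FIFO queue per priority level. */
struct demo_runqueue {
    struct sched_task *head[NUM_PRIO];
    struct sched_task *tail[NUM_PRIO];
};

/* Append a task to the queue for its priority level. */
void rq_enqueue(struct demo_runqueue *rq, struct sched_task *t)
{
    t->next = NULL;
    if (rq->tail[t->prio])
        rq->tail[t->prio]->next = t;
    else
        rq->head[t->prio] = t;
    rq->tail[t->prio] = t;
}

/* Remove and return the first task of the best non-empty level,
 * or NULL if nothing is runnable. */
struct sched_task *rq_pick_next(struct demo_runqueue *rq)
{
    for (int prio = 0; prio < NUM_PRIO; prio++) {
        struct sched_task *t = rq->head[prio];
        if (t) {
            rq->head[prio] = t->next;
            if (!rq->head[prio])
                rq->tail[prio] = NULL;
            return t;
        }
    }
    return NULL;
}
```

With tasks queued at levels 120, 100, and 120, the task at level 100 is picked first, followed by the two level-120 tasks in the order they were queued.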

process destruction

Process destruction can be driven by several events: normal process termination, a signal, or a call to the exit function. However process exit is driven, the process ends through a call to the kernel function do_exit (available in ./linux/kernel/exit.c). This process is shown in Figure 2.

Figure 2. Function hierarchy of process destruction

The purpose behind do_exit is to remove all references to the current process from the operating system (for all resources that are not shared). The destruction process first indicates that the process is exiting by setting the PF_EXITING flag. Other parts of the kernel use this indication to avoid manipulating the process while it is being removed. The process is then detached from the various resources it acquired during its lifetime through a series of calls, from exit_mm (to remove memory pages) to exit_keys (which disposes of per-thread session and process security keys). The do_exit function performs various accounting for the disposal of the process, and then performs a series of notifications (for example, signaling the parent that the child is exiting) through a call to exit_notify. Finally, the process state is changed to PF_DEAD, and the schedule function is called to select a new process to execute. Note that if signaling is required to the parent (or the process is being traced), the task does not completely disappear. If no signaling is necessary, a call to release_task actually reclaims the memory that the process used.

Origin blog.csdn.net/hhhlizhao/article/details/131873226