Assignment 1: Analyzing the process model of Linux Kernel 2.6 from its source code

1. Introduction

This article analyzes the process model of Linux Kernel 2.6 in depth, based on its source code. The analysis covers the following topics:

  • What is a process (the concept of a process)
  • How the operating system organizes processes
  • How to transition between process states
  • How processes are scheduled
  • Views on the operating system process model

2. What is a process

Since we are going to analyze the process model of the operating system, we first need to understand what a process is.

First, here is the definition of a process given by Baidu:

A process is a running activity of a program in a computer on a data set, the basic unit of resource allocation and scheduling in the system, and the basis of the operating system structure. In the early computer structure of process-oriented design, the process is the basic execution entity of the program; in the contemporary computer structure of thread-oriented design, the process is the container of the thread. A program is a description of instructions, data, and their organization, and a process is the entity of a program.

Next, here is the definition of a process given by Wikipedia:

In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently.

After we have some basic understanding of the process, we can proceed to the next step of analysis.

3. How the operating system organizes processes

In Linux, a process is represented by the task_struct structure, which is defined in the header file /linux/include/linux/sched.h. Each instance of task_struct describes one process. task_struct has many members; the most important kinds of information it holds are listed below.

  • Identifier: A unique identifier associated with the process, which distinguishes it from all other processes.
  • State: The current state of the process. A process can be suspended, blocked, running, and so on, so a field records which state it is in.
  • Priority: When several processes are runnable, the order in which they execute depends on the priority recorded for each process.
  • Program counter: The address of the next instruction to be executed in the program.
  • Memory pointers: Pointers to the program code and to the data associated with the process.
  • Context data: The data held in the processor's registers while the process is executing.
  • I/O status information: Outstanding I/O requests, I/O devices allocated to the process, and the list of files the process is using.
  • Accounting information: The total processor time used, account numbers, and so on.

3.1 Process state (STATE)

In task_struct, the member that records the process state is declared as:

volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */

The volatile keyword tells the compiler not to optimize away accesses to this member. The value must be read from memory every time it is used, so the kernel always sees the current state of the process.

The possible values of state are defined in the header file /linux/include/linux/sched.h as follows:

/*
 * Task state bitmask. NOTE! These bits are also
 * encoded in fs/proc/array.c: get_task_state().
 *
 * We have two separate sets of flags: task->state
 * is about runnability, while task->exit_state are
 * about the task exiting. Confusing, but this way
 * modifying one set can't modify the other one by
 * mistake.
 */
#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2
#define TASK_STOPPED            4
#define TASK_TRACED             8

/* in tsk->exit_state */
#define EXIT_ZOMBIE             16
#define EXIT_DEAD               32

/* in tsk->state again */
#define TASK_NONINTERACTIVE     64
#define TASK_DEAD               128

According to the comment that follows the declaration of state, a value less than 0 means the process is unrunnable, a value of 0 means it is runnable (running or ready to run), and a value greater than 0 means it is stopped.

The following table lists the common values of state:

Value                       Description
0 (TASK_RUNNING)            The process is running or ready to run
1 (TASK_INTERRUPTIBLE)      The process is in an interruptible sleep and can be woken up by a signal
2 (TASK_UNINTERRUPTIBLE)    The process is in an uninterruptible sleep and cannot be woken up by a signal
4 (TASK_STOPPED)            The process is stopped
8 (TASK_TRACED)             The process is being traced (monitored) by another process
16 (EXIT_ZOMBIE)            Zombie state: the process has terminated, but its parent process has not yet collected the information about its termination
32 (EXIT_DEAD)              The process is dead; this is its final state
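
To make the table concrete, here is a minimal user-space sketch that maps a state value to a description. The constants are copied from the listing above; state_name() and is_runnable() are hypothetical helpers written for this article, not kernel functions.

```c
#include <assert.h>

/* Values copied from the sched.h listing above. */
#define TASK_RUNNING         0
#define TASK_INTERRUPTIBLE   1
#define TASK_UNINTERRUPTIBLE 2
#define TASK_STOPPED         4
#define TASK_TRACED          8
#define EXIT_ZOMBIE          16
#define EXIT_DEAD            32

/* Hypothetical helper: describe one of the values from the table. */
const char *state_name(long value)
{
    assert(value >= 0); /* negative values mean "unrunnable" and are not covered */
    switch (value) {
    case TASK_RUNNING:         return "running or ready to run";
    case TASK_INTERRUPTIBLE:   return "interruptible sleep";
    case TASK_UNINTERRUPTIBLE: return "uninterruptible sleep";
    case TASK_STOPPED:         return "stopped";
    case TASK_TRACED:          return "traced";
    case EXIT_ZOMBIE:          return "zombie";
    case EXIT_DEAD:            return "dead";
    default:                   return "unknown";
    }
}

/* Hypothetical helper: only TASK_RUNNING (0) means the task can be run. */
int is_runnable(long state)
{
    return state == TASK_RUNNING;
}
```

Note that TASK_RUNNING is 0, so it cannot be tested with a bitwise AND; it has to be compared for equality, as is_runnable() does.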

3.2 Process Identifier (PID)

pid_t pid; /* unique identifier of the process */
pid_t tgid; /* identifier of the thread group */

In Linux, all threads in a thread group share the PID of the thread group leader (the first lightweight process in the group), and this shared value is stored in the tgid member of each thread. Only the thread group leader has a pid equal to its tgid. Note that the getpid() system call returns the tgid of the current process, not its pid. (A thread is the smallest unit of execution, while a process is the basic unit of resource allocation.)
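
The relationship between pid and tgid can be illustrated with a small user-space model. The mini_task structure and the helper functions are hypothetical stand-ins invented for this sketch; only the rule "getpid() reports the tgid" comes from the text above.

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-in for task_struct. */
struct mini_task {
    int pid;  /* unique for every thread */
    int tgid; /* pid of the thread group leader */
};

/* Models what getpid() reports for a task: its tgid, not its pid. */
int model_getpid(const struct mini_task *t)
{
    assert(t != 0);
    return t->tgid;
}

/* Only the thread group leader has pid == tgid. */
int is_group_leader(const struct mini_task *t)
{
    assert(t != 0);
    return t->pid == t->tgid;
}

/* Creates another thread in the leader's group with its own pid. */
struct mini_task make_thread(const struct mini_task *leader, int new_pid)
{
    struct mini_task t;
    assert(is_group_leader(leader));
    t.pid = new_pid;
    t.tgid = leader->tgid;
    return t;
}
```

In this model, a leader with pid 100 and a second thread with pid 101 both report 100 from model_getpid(), which is why every thread of a multithreaded program appears to have the same process ID.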

3.3 Process Flags (FLAGS)

unsigned int flags; /* per process flags, defined below */

The flags member holds status information about the process that is separate from its run state. The kernel uses it to identify the current condition of the process and decide what to do next.

The possible values of the flags member are listed below. These macros all start with PF (Process Flag):

/*
 * Per process flags
 */
#define PF_ALIGNWARN    0x00000001  /* Print alignment warning msgs */
                    /* Not implemented yet, only for 486*/
#define PF_STARTING 0x00000002  /* being created */
#define PF_EXITING  0x00000004  /* getting shut down */
#define PF_EXITPIDONE   0x00000008  /* pi exit done on shut down */
#define PF_FORKNOEXEC   0x00000040  /* forked but didn't exec */
#define PF_SUPERPRIV    0x00000100  /* used super-user privileges */
#define PF_DUMPCORE 0x00000200  /* dumped core */
#define PF_SIGNALED 0x00000400  /* killed by a signal */
#define PF_MEMALLOC 0x00000800  /* Allocating memory */
#define PF_FLUSHER  0x00001000  /* responsible for disk writeback */
#define PF_USED_MATH    0x00002000  /* if unset the fpu must be initialized before use */
#define PF_NOFREEZE 0x00008000  /* this thread should not be frozen */
#define PF_FROZEN   0x00010000  /* frozen for system suspend */
#define PF_FSTRANS  0x00020000  /* inside a filesystem transaction */
#define PF_KSWAPD   0x00040000  /* I am kswapd */
#define PF_SWAPOFF  0x00080000  /* I am in swapoff */
#define PF_LESS_THROTTLE 0x00100000 /* Throttle me less: I clean memory */
#define PF_BORROWED_MM  0x00200000  /* I am a kthread doing use_mm */
#define PF_RANDOMIZE    0x00400000  /* randomize virtual address space */
#define PF_SWAPWRITE    0x00800000  /* Allowed to write to swap */
#define PF_SPREAD_PAGE  0x01000000  /* Spread page cache over cpuset */
#define PF_SPREAD_SLAB  0x02000000  /* Spread some slab caches over cpuset */
#define PF_MEMPOLICY    0x10000000  /* Non-default NUMA mempolicy */
#define PF_MUTEX_TESTER 0x20000000  /* Thread belongs to the rt mutex tester */
#define PF_FREEZER_SKIP 0x40000000  /* Freezer should not count it as freezeable */

3.4 Relationship between processes

/* 
 * pointers to (original) parent process, youngest child, younger sibling,
 * older sibling, respectively.  (p->father can be replaced with 
 * p->parent->pid)
 */
struct task_struct *real_parent; /* real parent process (when being debugged) */
struct task_struct *parent; /* parent process */
/*
 * children/sibling forms the list of my children plus the
 * tasks I'm ptracing.
 */
struct list_head children;  /* list of my children */
struct list_head sibling;   /* linkage in my parent's children list */
struct task_struct *group_leader;   /* threadgroup leader */

In Linux, all processes are related to one another, directly or indirectly. Every process has a parent process and may have zero or more child processes. Processes that share the same parent are siblings.

  • real_parent points to the process that created this process, or to the init process (PID 1) if the creating process no longer exists.
  • parent points to the parent process, which must be sent a signal when this process terminates. Its value is usually the same as real_parent.
  • children is the head of a linked list whose elements are all the child processes of this process.
  • sibling links this process into its parent's list of children, that is, the list of its siblings.
  • group_leader points to the leader of the thread group that this process belongs to.
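
The way children and sibling work together can be sketched in user space. The list_head layout below mirrors the kernel's doubly linked circular list, but mini_task and all of the functions are hypothetical simplifications written for this article.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Hypothetical reduced task: only the family-related members. */
struct mini_task {
    int pid;
    struct mini_task *parent;
    struct list_head children; /* head of this task's list of children */
    struct list_head sibling;  /* link in the parent's children list */
};

void task_init(struct mini_task *t, int pid)
{
    t->pid = pid;
    t->parent = NULL;
    t->children.next = t->children.prev = &t->children;
    t->sibling.next = t->sibling.prev = &t->sibling;
}

/* Links child->sibling at the tail of parent->children. */
void add_child(struct mini_task *parent, struct mini_task *child)
{
    struct list_head *head = &parent->children;
    assert(child->parent == NULL);
    child->parent = parent;
    child->sibling.prev = head->prev;
    child->sibling.next = head;
    head->prev->next = &child->sibling;
    head->prev = &child->sibling;
}

/* Recovers the task from its embedded sibling link (the idea behind list_entry). */
struct mini_task *task_of(struct list_head *link)
{
    return (struct mini_task *)((char *)link - offsetof(struct mini_task, sibling));
}

int count_children(const struct mini_task *parent)
{
    int n = 0;
    const struct list_head *p;
    for (p = parent->children.next; p != &parent->children; p = p->next)
        n++;
    return n;
}
```

The important point is that the parent's children head and each child's sibling member form one circular list, so walking parent->children visits the sibling link of every child.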

3.5 Process Scheduling

3.5.1 Priority

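    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
/*
    prio: holds the dynamic priority
    static_prio: holds the static priority, which can be changed with the nice system call
    normal_prio: its value depends on the static priority and the scheduling policy
    rt_priority: holds the real-time priority
*/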
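
As an illustration of how the nice value relates to static_prio, the sketch below reproduces the conversion as I recall it from the NICE_TO_PRIO and PRIO_TO_NICE macros in kernel/sched.c of the 2.6 series. The constants MAX_RT_PRIO and MAX_PRIO and the function names are not part of the excerpt above, so treat them as assumptions.

```c
#include <assert.h>

/* Assumed values: priorities 0..99 are real-time, 100..139 are normal. */
#define MAX_RT_PRIO 100
#define MAX_PRIO    (MAX_RT_PRIO + 40)

/* Maps a nice value (-20..19) to a static priority (100..139). */
int nice_to_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return MAX_RT_PRIO + nice + 20;
}

/* Inverse mapping, valid only for non-real-time priorities. */
int prio_to_nice(int prio)
{
    assert(prio >= MAX_RT_PRIO && prio < MAX_PRIO);
    return prio - MAX_RT_PRIO - 20;
}
```

Under these assumptions a default nice value of 0 gives a static priority of 120, and a smaller number means a higher priority.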

3.5.2 Scheduling Policy

unsigned int policy;
cpumask_t cpus_allowed;
/*
    policy: the scheduling policy of the process
    cpus_allowed: controls which processors the process may run on
*/

policy indicates the scheduling policy of the process. There are currently five policies:

/*
 * Scheduling policies
 */
#define SCHED_NORMAL    0 // scheduling by priority
#define SCHED_FIFO      1 // first-in, first-out scheduling
#define SCHED_RR        2 // round-robin (time-slice) scheduling
#define SCHED_BATCH     3 // for non-interactive, CPU-bound processes
#define SCHED_IDLE      5 // scheduling used when the system load is very low

The following list describes each policy and the scheduler class that implements it:

  • SCHED_NORMAL (scheduler class: CFS): Also called SCHED_OTHER. Used for normal processes and implemented by the CFS scheduler, as are SCHED_BATCH and SCHED_IDLE.
  • SCHED_FIFO (scheduler class: RT): First-in, first-out scheduling, a real-time policy. Among tasks of the same priority, the one that arrived first is served first, and a high-priority task can preempt a low-priority one.
  • SCHED_RR (scheduler class: RT): Round-robin scheduling, a real-time policy that uses time slices. When a task uses up its time slice, it is placed at the end of the queue for its priority, which keeps tasks of the same priority fair to each other. As with SCHED_FIFO, a high-priority task can preempt a low-priority one. Real-time tasks with different requirements can choose a policy with the sched_setscheduler() API.
  • SCHED_BATCH (scheduler class: CFS): A variant of SCHED_NORMAL that is optimized for throughput. It uses time sharing and allocates CPU time according to dynamic priority (which can be adjusted with the nice() API). Processes of this type have a lower priority than the two real-time types above, so when a real-time process is runnable, it is scheduled first.
  • SCHED_IDLE (scheduler class: CFS): The lowest priority. Processes of this type run only when the system is idle. Tasks that use spare computing resources, such as the search for extraterrestrial civilizations or protein structure analysis, suit this policy.
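
The mapping from policy to scheduler class described above can be written as a small lookup. The policy constants are copied from the listing; the enum, class_of_policy() and runs_before() are hypothetical names used only in this sketch.

```c
#include <assert.h>

/* Values copied from the listing above. */
#define SCHED_NORMAL 0
#define SCHED_FIFO   1
#define SCHED_RR     2
#define SCHED_BATCH  3
#define SCHED_IDLE   5

/* Hypothetical labels for the scheduler classes named in the text. */
enum class_kind { CLASS_CFS, CLASS_RT, CLASS_UNKNOWN };

enum class_kind class_of_policy(unsigned int policy)
{
    switch (policy) {
    case SCHED_NORMAL:
    case SCHED_BATCH:
    case SCHED_IDLE:
        return CLASS_CFS;
    case SCHED_FIFO:
    case SCHED_RR:
        return CLASS_RT;
    default:
        return CLASS_UNKNOWN;
    }
}

/* Nonzero if a task with policy a is always chosen before one with policy b,
 * based only on the rule that real-time tasks run before CFS tasks. */
int runs_before(unsigned int a, unsigned int b)
{
    assert(class_of_policy(a) != CLASS_UNKNOWN);
    assert(class_of_policy(b) != CLASS_UNKNOWN);
    return class_of_policy(a) == CLASS_RT && class_of_policy(b) == CLASS_CFS;
}
```

This captures only the class-level rule. Ordering within a class (by real-time priority or by CFS accounting) is not modeled here.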

3.6 The address space of a process

Every process owns resources, and the most important of these is its address space: each process has its own. In task_struct, the process address space is declared as follows:

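struct mm_struct *mm, *active_mm;

/*
    mm: the user-space memory descriptor owned by the process; for a kernel thread, mm is NULL
    active_mm: points to the memory descriptor that the process uses while it runs. For an ordinary process, the two pointers have the same value. A kernel thread has no process address space, so its tsk->mm field is NULL. The kernel still needs a memory descriptor to run with, so the active_mm member of a kernel thread is set to the active_mm of the process that ran before it.
*/

If the task that ran just before the current kernel thread was itself a kernel thread, its mm is NULL as well, so the current kernel thread takes over the active_mm that the previous thread had borrowed. Once a kernel thread is switched out, its own active_mm is set back to NULL.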
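
The hand-off of active_mm can be modeled in a few lines. This is a sketch based on my reading of the context-switch path in 2.6; mini_mm, mini_task and switch_mm_pointers() are hypothetical names, and reference counting and the actual page-table switch are left out.

```c
#include <assert.h>
#include <stddef.h>

struct mini_mm { int id; };

/* Hypothetical reduced task: only the two memory descriptor pointers. */
struct mini_task {
    struct mini_mm *mm;        /* NULL for a kernel thread */
    struct mini_mm *active_mm; /* descriptor in use while the task runs */
};

/* Models only the mm bookkeeping of a switch from prev to next. */
void switch_mm_pointers(struct mini_task *prev, struct mini_task *next)
{
    struct mini_mm *oldmm = prev->active_mm;
    assert(oldmm != NULL); /* a running task always has an active_mm */

    if (next->mm == NULL)
        next->active_mm = oldmm;    /* kernel thread: borrow the previous one */
    else
        next->active_mm = next->mm; /* user process: use its own */

    if (prev->mm == NULL)
        prev->active_mm = NULL;     /* kernel thread gives the borrowed mm back */
}
```

Switching from a user process to kernel thread k1 and then to kernel thread k2 leaves k2 running on the user process's descriptor, while k1 no longer holds a reference to it.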

The above is some analysis of how the operating system organizes processes. With these as the basis, we can proceed to the next step of analysis.

4. How to transition between process states

The definitions, values and meanings of the Linux process states (STATE) were analyzed in detail in section 3.1, so they are not repeated here.

The following is a diagram of how the various states of the process are converted to each other:

[Figure: transitions between the states of a Linux process (image source: online)]
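
Some of the transitions that such a diagram usually shows can also be written as a small state machine. This is a simplified sketch: the event names are hypothetical, only states stored in task->state are covered (the exit states live in exit_state and are left out), and corner cases such as signals that arrive during an uninterruptible sleep being deferred are reduced to "no change".

```c
#include <assert.h>

/* Values copied from section 3.1. */
#define TASK_RUNNING         0
#define TASK_INTERRUPTIBLE   1
#define TASK_UNINTERRUPTIBLE 2
#define TASK_STOPPED         4

/* Hypothetical events that trigger a transition. */
enum event {
    EV_SLEEP_INTERRUPTIBLE,   /* the task waits and allows signals */
    EV_SLEEP_UNINTERRUPTIBLE, /* the task waits and ignores signals */
    EV_WAKE_UP,               /* the awaited condition became true */
    EV_SIGNAL,                /* an ordinary signal arrived */
    EV_SIGSTOP,
    EV_SIGCONT
};

/* Returns the state after the event, or the same state if the event
 * does not apply to it. */
long next_state(long state, enum event ev)
{
    assert(state >= 0);
    switch (ev) {
    case EV_SLEEP_INTERRUPTIBLE:
        return state == TASK_RUNNING ? TASK_INTERRUPTIBLE : state;
    case EV_SLEEP_UNINTERRUPTIBLE:
        return state == TASK_RUNNING ? TASK_UNINTERRUPTIBLE : state;
    case EV_WAKE_UP:
        return (state == TASK_INTERRUPTIBLE || state == TASK_UNINTERRUPTIBLE)
                   ? TASK_RUNNING : state;
    case EV_SIGNAL:
        return state == TASK_INTERRUPTIBLE ? TASK_RUNNING : state;
    case EV_SIGSTOP:
        return (state == TASK_RUNNING || state == TASK_INTERRUPTIBLE)
                   ? TASK_STOPPED : state;
    case EV_SIGCONT:
        return state == TASK_STOPPED ? TASK_RUNNING : state;
    }
    return state;
}
```

The difference between the two sleep states shows up in EV_SIGNAL: it wakes an interruptible sleeper but leaves an uninterruptible one asleep.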

5. How processes are scheduled

5.1 Data Structures Related to Process Scheduling

Before understanding how processes are scheduled, we need to understand some data structures related to process scheduling.

5.1.1 runqueue

In the file /kernel/sched.c, the run queue is defined as struct rq. Each CPU has its own struct rq, which stores the basic information needed for scheduling, covering both real-time scheduling and CFS scheduling. struct rq is a very important data structure in Linux kernel 2.6. Some of its important fields are described below:

                            /*   a selection of fields, with comments   */
    // The spinlock of the runqueue. It must be held while operating on the runqueue. Because every CPU has its own runqueue, contention is greatly reduced
    spinlock_t lock; 
    
    // Records the earliest time at which a task in the active array used up its time slice
    unsigned long expired_timestamp; 
    
    // Total number of ready tasks on this CPU: the sum of the tasks in the active array and the expired array
    unsigned long nr_running; 
    
    // Number of process switches that have occurred on this CPU since it started running
    unsigned long long nr_switches; 
    
    // Number of tasks on this CPU that are in the uninterruptible state
    unsigned long nr_uninterruptible; 
    
    // The most important part of rq; it is analyzed in detail below
    struct prio_array *active, *expired, arrays[2];

5.1.2 Priority array (prio_array)

In Linux kernel 2.6, two arrays sorted by priority were added to rq: the active array and the expired array.

Both arrays have the type struct prio_array, which is defined in /kernel/sched.c as follows:

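struct prio_array {
    unsigned int nr_active; // number of tasks in this array
    DECLARE_BITMAP(bitmap, MAX_PRIO+1); /* include 1 bit for delimiter */
    /* Reserves MAX_PRIO + 1 bits. When a task of a given priority is in the TASK_RUNNING state, the bit for that priority is set to 1. To find the highest priority that needs to run, it is therefore enough to find which bit of the bitmap is set to 1 */
    
    struct list_head queue[MAX_PRIO]; // one list head per priority
};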

The active array is the queue of runnable processes from which the CPU selects the next process to execute. Every process in it still has time slice remaining, and the *active pointer always points to it.

The expired array stores the processes from the active array that have used up their time slices, and the *expired pointer always points to it.

Once an ordinary process in the active array uses up its time slice, the scheduler recalculates the time slice and priority of the process, removes it from the active array, and inserts it into the queue of the corresponding priority in the expired array.

When all tasks in the active array have used up their time slices, the scheduler only needs to swap the *active and *expired pointers to switch run queues.
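
The pointer swap can be demonstrated with a reduced model. To keep the sketch short, each priority level stores only a task count instead of a linked list; mini_prio_array, mini_rq and the functions are hypothetical, and the value 140 for MAX_PRIO is an assumption taken from my recollection of the 2.6 sources.

```c
#include <assert.h>

#define MAX_PRIO 140 /* assumed value */

/* Hypothetical reduced prio_array: a task count per priority level. */
struct mini_prio_array {
    unsigned int nr_active;
    unsigned int count[MAX_PRIO];
};

struct mini_rq {
    struct mini_prio_array *active, *expired, arrays[2];
};

void rq_init(struct mini_rq *rq)
{
    int a, p;
    for (a = 0; a < 2; a++) {
        rq->arrays[a].nr_active = 0;
        for (p = 0; p < MAX_PRIO; p++)
            rq->arrays[a].count[p] = 0;
    }
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

void enqueue(struct mini_prio_array *array, int prio)
{
    assert(prio >= 0 && prio < MAX_PRIO);
    array->count[prio]++;
    array->nr_active++;
}

/* A task at the given priority used up its time slice:
 * move it from the active array to the expired array. */
void expire_task(struct mini_rq *rq, int prio)
{
    assert(prio >= 0 && prio < MAX_PRIO);
    assert(rq->active->count[prio] > 0);
    rq->active->count[prio]--;
    rq->active->nr_active--;
    enqueue(rq->expired, prio);
}

/* Swaps the two pointers when the active array is empty.
 * Returns 1 if a swap happened, 0 otherwise. */
int switch_arrays_if_empty(struct mini_rq *rq)
{
    struct mini_prio_array *tmp;
    if (rq->active->nr_active != 0)
        return 0;
    tmp = rq->active;
    rq->active = rq->expired;
    rq->expired = tmp;
    return 1;
}
```

No task is copied during the swap. Exchanging two pointers takes the same time however many tasks are queued, which is one reason the scheduler can stay O(1).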

5.1.3 Scheduler main function (schedule())

The schedule() function is located in /kernel/sched.c and is one of the most important functions in the Linux kernel. It selects the next process that should run and performs the switch to it, which makes it the main executor of process scheduling.

5.2 Scheduling Algorithm (O(1) Algorithm)

5.2.1 Introduction to O(1) Algorithms

What is the O(1) algorithm? It is an algorithm that always selects the highest-priority process and starts running it in constant time, regardless of how many runnable processes there are in the system. This is why it is called the O(1) algorithm.

5.2.2 The principle of O(1) algorithm

Earlier we mentioned two arrays sorted by priority, the active array and the expired array. These two arrays are the key to implementing the O(1) algorithm.

Each time it runs, the O(1) scheduling algorithm selects the highest-priority process in the active array.

So how does the algorithm find the process with the highest priority? Recall the DECLARE_BITMAP(bitmap, MAX_PRIO+1); field of prio_array: this is where it comes into play (see the code comments for details). Finding the first bit of bitmap that is set to 1 gives idx, the highest priority that currently has a runnable task (this is done by sched_find_first_bit()). A lower index means a higher priority, so the first set bit always marks the highest one. The scheduler then looks up the process list (queue) that corresponds to idx. All processes in that queue are runnable and share the highest priority, and they are executed in turn.

This logic is implemented in the schedule() function. Its core code is as follows:

struct task_struct *prev, *next;
struct list_head *queue;
struct prio_array *array;
int idx;

prev = current;
array = rq->active;
idx = sched_find_first_bit(array->bitmap); // index of the first nonzero bit in the bitmap
queue = array->queue + idx; // head of the list for that priority
next = list_entry(queue->next, struct task_struct, run_list); // descriptor of the first process in that list
if (prev != next) // if the selected process is not the current one, switch context
    context_switch(rq, prev, next);
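
The selection step can be reproduced in user space to show why its cost does not depend on the number of tasks. This is a sketch under stated assumptions: MAX_PRIO is taken to be 140, the per-priority list is a simple singly linked FIFO instead of list_head, and every name below is hypothetical. The delimiter bit set in array_init() follows the "1 bit for delimiter" comment in prio_array.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_PRIO      140 /* assumed value */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((MAX_PRIO + 1 + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct mini_task { int pid; int prio; struct mini_task *next; };

struct mini_prio_array {
    unsigned int nr_active;
    unsigned long bitmap[BITMAP_WORDS];
    struct mini_task *queue[MAX_PRIO]; /* first task of each priority */
    struct mini_task *tail[MAX_PRIO];  /* last task of each priority */
};

void array_init(struct mini_prio_array *a)
{
    size_t i;
    a->nr_active = 0;
    for (i = 0; i < BITMAP_WORDS; i++)
        a->bitmap[i] = 0;
    for (i = 0; i < MAX_PRIO; i++)
        a->queue[i] = a->tail[i] = NULL;
    /* Delimiter bit at index MAX_PRIO: a search on an empty array stops here. */
    a->bitmap[MAX_PRIO / BITS_PER_WORD] |= 1UL << (MAX_PRIO % BITS_PER_WORD);
}

void enqueue(struct mini_prio_array *a, struct mini_task *t)
{
    assert(t->prio >= 0 && t->prio < MAX_PRIO);
    t->next = NULL;
    if (a->tail[t->prio])
        a->tail[t->prio]->next = t;
    else
        a->queue[t->prio] = t;
    a->tail[t->prio] = t;
    a->bitmap[t->prio / BITS_PER_WORD] |= 1UL << (t->prio % BITS_PER_WORD);
    a->nr_active++;
}

/* Scans a fixed number of words and bits, so the cost is bounded by
 * MAX_PRIO and independent of how many tasks are queued. */
int find_first_set(const unsigned long *bitmap)
{
    size_t w, b;
    for (w = 0; w < BITMAP_WORDS; w++) {
        if (bitmap[w] == 0)
            continue;
        for (b = 0; b < BITS_PER_WORD; b++)
            if (bitmap[w] & (1UL << b))
                return (int)(w * BITS_PER_WORD + b);
    }
    return -1;
}

/* Returns the first task of the highest priority, or NULL if none is queued. */
struct mini_task *pick_next(const struct mini_prio_array *a)
{
    int idx = find_first_set(a->bitmap);
    if (idx < 0 || idx >= MAX_PRIO)
        return NULL; /* only the delimiter bit is set */
    return a->queue[idx];
}
```

With tasks queued at priorities 120, 100 and 100, pick_next() returns the first task that was queued at priority 100. The kernel uses optimized bit-search instructions for this step rather than the plain loop shown here.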

6. Views on the OS process model

The O(1) algorithm solves a problem of Linux kernel 2.4, where finding the highest-priority process took too long (its time complexity was O(n)). It does the same job in O(1) time, which is already very good, but the algorithm still has shortcomings. For example, when the system runs many interactive processes (as a desktop system does), it does not perform well. Hopefully more experts will keep refining and improving this algorithm.
