Assignment 1: An in-depth source-code analysis of the process model (Linux kernel 2.6.32)

1 Introduction

This article analyzes the process model of the Linux 2.6.32 kernel, covering the concept, organization, state transitions, and scheduling of processes, as an aid to studying operating systems courses and Linux-related topics.
The source code of Linux kernel 2.6.32 can be downloaded from:
https://mirrors.edge.kernel.org/pub/linux/kernel/v2.6/linux-2.6.32.tar.gz

2. The concept of process

2.1 What is a process?

Before formally starting to analyze the process model through source code, we first need to figure out what a process is.
Wikipedia defines a process as follows:

A process is an instance of a computer program that is being executed.

We know that the system only understands binary files, so when we want it to perform some function, we start a binary file; that binary file is a program.
In a Linux system, each file carries permissions for three classes of users (u/g/o: the owner, other members of the owner's group, and everyone else), and each class can hold the permissions r/w/x (read/write/execute). Therefore, when different users execute the same program, the permissions granted by the system may differ, and the resulting processes may differ as well.

In general, a program is stored on disk and is triggered by the user. Once triggered, the program is loaded into memory and becomes a live entity, that is, a process. To manage this process, the operating system gives it the executor's permissions and attributes as parameters, along with the data the process needs, and finally assigns a PID so that the process can be identified.
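As a hedged userspace illustration of this program-to-process step (my own sketch, not kernel code), the following program loads the /bin/ls binary into a freshly created process and prints the PID the kernel assigned to it:

/* Sketch: turning a program on disk into a process.  fork() creates the new
 * task, execl() loads the binary, and getpid() shows the assigned PID. */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    pid_t child = fork();

    if (child == 0) {
        /* In the child: this process now has its own PID. */
        printf("child PID = %d, about to exec /bin/ls\n", getpid());
        execl("/bin/ls", "ls", "-l", (char *)NULL);
        _exit(1);               /* only reached if exec fails */
    }

    /* In the parent: wait for the child so it does not linger as a zombie. */
    waitpid(child, NULL, 0);
    return 0;
}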

2.2 The difference between process, program and thread

In process-oriented systems (such as early UNIX, and Linux 2.4 and earlier), a process is the basic execution entity of a program; in thread-oriented systems (such as most contemporary operating systems, including Linux 2.6 and later), the process itself is not the basic unit of execution but rather a container for threads.

Simply put: a process is the dynamic counterpart of a static program, many processes can be created from one program, and many threads can belong to one process.
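To make the many-to-one relation between threads and a process concrete, here is a small hedged sketch (my own example, not from the kernel source) in which two threads report the same PID but different thread handles:

/* Sketch: two threads live inside one process, so getpid() is identical
 * while pthread_self() differs.  Compile with: gcc demo.c -pthread */
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

static void *worker(void *arg)
{
    (void)arg;
    printf("thread %lu runs in process %d\n",
           (unsigned long)pthread_self(), getpid());
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("main thread also runs in process %d\n", getpid());
    return 0;
}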

3. Organization of the process

3.1 Process descriptor

A process descriptor is a structure that describes a process. Each process has exactly one such structure, and it contains all of the information about the process. This structure is called task_struct and is defined in the include/linux/sched.h header file of the source tree. Some of its code is excerpted below for analysis. (Note: since I run Linux inside a virtual machine, I used SecureCRT's SFTP server to transfer the relevant source files from the virtual machine to the client for convenient copying; readers who also run Linux in a virtual machine may find this approach useful.)

struct task_struct {
    volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
    ...
    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
    ...
    pid_t pid;
    ...
};

3.2 Process Identifier (PID)

As mentioned in the second part of the article, the system uniquely identifies each process by its PID. Use the following command to view the current process information of the system:

ps -l

In the resulting output, we can see three processes in the current system: bash, top, and ps. Here, bash is our terminal: every time a terminal is opened, a bash process is created, and in this example its PID is 7275. ps is the command we ran in that terminal to list processes, and its PID is 7287. Besides the PID there is also a PPID, the PID of the parent process, which here is 7275, the bash that executed the command. What is a parent process? Put simply: when we open a terminal we get a bash process, and we then use the interface this bash provides to run other commands, such as the ps just mentioned. These commands are assigned their own PIDs and become child processes, while the original bash environment is naturally the parent process.
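As a small hedged illustration (my own example, not from the kernel source), any process can query both values with getpid() and getppid():

/* Sketch: every process can query its own PID and its parent's PID.
 * Run from a shell, getppid() returns the PID of that bash process. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("my PID = %d\n", getpid());
    printf("parent = %d (the bash that launched me)\n", getppid());
    return 0;
}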

3.3 Process Priority (PRI)

The priority of a process is the basis on which the system decides, during scheduling, which process gets CPU resources: the higher the priority, the better the process competes for the CPU. In Linux, a priority is expressed as an integer, and the lower the value, the higher the priority. In the ps -l output from the previous section, each process has a priority (PRI), which is 80 by default.
In task_struct, several member variables are used to represent priority.

int prio,static_prio, normal_prio;
unsigned int rt_priority;
| Field | Description |
| --- | --- |
| prio | Dynamic priority |
| static_prio | Static priority, i.e. the priority assigned when the process starts |
| normal_prio | Normal priority, i.e. the priority computed from the process's static priority and its scheduling policy |
| rt_priority | Real-time priority |

Why are there so many priorities? In some situations the system needs to change a process's priority temporarily, but these changes should not persist, so the temporary value is kept in prio (this is also the priority the scheduler actually considers) without touching the static and normal priorities. Note that when we change a process's priority with the renice command, it is the static priority that is changed.
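For reference, the kernel maps nice values onto this priority scale with a fixed offset; the following is a paraphrased sketch of the NICE_TO_PRIO-style conversion applied when the static priority is set (treat the exact constants and placement as an assumption rather than a verbatim excerpt):

/* Paraphrased sketch: how a nice value maps onto the priority scale used by
 * static_prio (priorities 0..99 are reserved for real-time tasks, 100..139
 * correspond to nice -20..+19).  Not a verbatim kernel excerpt. */
#define MAX_RT_PRIO        100
#define NICE_TO_PRIO(nice) (MAX_RT_PRIO + (nice) + 20)
#define PRIO_TO_NICE(prio) ((prio) - MAX_RT_PRIO - 20)

/* Examples: nice -20 -> prio 100, nice 0 -> prio 120, nice +19 -> prio 139.
 * renice therefore ends up rewriting static_prio through this mapping.   */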

3.4 State of the process

The state of the process is defined in task_struct as follows:

struct task_struct {
    volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
    ...
};

In the sched.h header file, the possible values of the state variable are defined:

/*
 * Task state bitmask. NOTE! These bits are also
 * encoded in fs/proc/array.c: get_task_state().
 *
 * We have two separate sets of flags: task->state
 * is about runnability, while task->exit_state are
 * about the task exiting. Confusing, but this way
 * modifying one set can't modify the other one by
 * mistake.
 */
#define TASK_RUNNING        0
#define TASK_INTERRUPTIBLE  1
#define TASK_UNINTERRUPTIBLE    2
#define __TASK_STOPPED      4
#define __TASK_TRACED       8
/* in tsk->exit_state */
#define EXIT_ZOMBIE     16
#define EXIT_DEAD       32
/* in tsk->state again */
#define TASK_DEAD       64
#define TASK_WAKEKILL       128
#define TASK_WAKING     256

From the comments we can see that process states fall into two categories: task->state describes runnability, while task->exit_state describes how the task exits. The meanings of the constants defined above are as follows:

| Field | Description |
| --- | --- |
| TASK_RUNNING | Runnable state: the process is either running or ready to run. |
| TASK_INTERRUPTIBLE | Interruptible wait state: the process sleeps until a condition becomes true; it can also be woken by a signal. |
| TASK_UNINTERRUPTIBLE | Uninterruptible wait state: similar to the interruptible wait state, but the process cannot be woken by a signal. |
| __TASK_STOPPED | Stopped state: entered when the process receives a SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU signal. |
| __TASK_TRACED | Traced state: the process is being monitored by another process, such as a debugger. |
| EXIT_ZOMBIE | Zombie state: the process has terminated, but the parent has not yet collected its termination information. |
| EXIT_DEAD | The final state of a process that has been terminated and reaped. |
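These states are what ps ultimately reports as the single letters R/S/D/T/Z/X. As a rough, paraphrased sketch of the get_task_state() idea mentioned in the comment above (not a verbatim excerpt from fs/proc/array.c), the mapping from the bitmask to a letter looks like this:

/* Paraphrased sketch (not a verbatim excerpt of fs/proc/array.c) of how the
 * state bitmask is turned into the single letter shown by ps. */
static const char *task_state_letters[] = {
    "R (running)",        /* TASK_RUNNING         = 0  */
    "S (sleeping)",       /* TASK_INTERRUPTIBLE   = 1  */
    "D (disk sleep)",     /* TASK_UNINTERRUPTIBLE = 2  */
    "T (stopped)",        /* __TASK_STOPPED       = 4  */
    "T (tracing stop)",   /* __TASK_TRACED        = 8  */
    "Z (zombie)",         /* EXIT_ZOMBIE          = 16 */
    "X (dead)",           /* EXIT_DEAD            = 32 */
};

static const char *state_to_letter(unsigned long state_bits)
{
    const char **p = task_state_letters;

    /* The lowest set bit decides the reported state: keep shifting the
     * mask right and stepping through the table until the bit is consumed. */
    while (state_bits) {
        p++;
        state_bits >>= 1;
    }
    return *p;
}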
Putting the above parts together: the system uniquely identifies a process by its process identifier (PID), decides which processes should run now by their priority (PRI), and keeps processes ordered by their state. This is, in outline, how processes are organized.

4. Process state transitions


The transitions between process states can be summarized briefly as follows. When we create a task and run it, it is in the TASK_RUNNING state (running). The CPU keeps assigning time slices to the process; when a time slice is exhausted, the process moves to the ready state (still TASK_RUNNING) and waits for the CPU to assign it time again. If the process needs to perform an operation such as writing to a disk file, it enters the TASK_INTERRUPTIBLE state and waits for the operation to complete; when it does, the process receives an interrupt notification and returns to the running state.
When hardware conditions are not satisfied, for example when the process opens a device file and the driver begins probing the corresponding hardware, it enters the TASK_UNINTERRUPTIBLE state, which can only be woken in specific ways. When a process exits, it enters the zombie state and keeps its process descriptor so that the parent process can retrieve information from it. Once the parent has collected that information through the wait() system call, the process enters the EXIT_DEAD state and its life cycle ends.
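A hedged userspace illustration of the last two transitions (my own example, not kernel code): the child stays a zombie after exiting until the parent collects it with wait():

/* Sketch: the child becomes a zombie (state Z) after _exit() and only leaves
 * the process table once the parent reaps it with waitpid(). */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    pid_t child = fork();

    if (child == 0)
        _exit(0);                 /* child terminates immediately */

    printf("child %d has exited; run `ps -l` now and its state is Z\n", child);
    sleep(10);                    /* the child stays a zombie during this window */

    waitpid(child, NULL, 0);      /* parent reaps it; the zombie entry disappears */
    return 0;
}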

5. Process scheduling

5.1 About scheduling

Whether we run the ps command under Linux or open the Task Manager under Windows, we usually see many processes, which gives the illusion that the system is executing them all at the same time. In reality, most of the processes we see are asleep, and a CPU executes only one process at any given moment; this is exactly the problem the scheduler faces. The scheduler must share CPU time among processes, creating the illusion of parallel execution. It must not only share the CPU as fairly as possible, but also take different task priorities into account to provide a good user experience. The scheduling system contains at least two scheduling algorithms, one for real-time processes and one for ordinary processes. The original scheduler had O(n) complexity and traversed all tasks on every scheduling decision, which was inefficient. The O(1) scheduler was introduced later and greatly improved efficiency. Thanks to the collective wisdom of programmers around the world, since Linux 2.6 an even better scheduler has replaced the O(1) scheduler: the CFS scheduler, built on the idea of complete fairness and treating all processes uniformly. This part of the article focuses on the CFS scheduler, which schedules ordinary processes.

5.2 Priority and nice value

Before analyzing CFS, let me expand on the priorities and nice values mentioned earlier. Why do we need priorities at all? A CPU can execute on the order of billions of instructions per second, and through kernel scheduling each process takes turns running on the CPU, so within one second each process may get a larger or smaller share of execution. Without priorities, each process would be like a visitor queuing at an amusement park: after one turn you go to the back of the line and wait again. If the tasks ahead are long and tedious, urgent tasks cannot get timely feedback, and the user experience naturally suffers. This is why priorities are needed: to let urgent tasks run more often. Note that PRI is adjusted dynamically by the kernel and cannot be set directly by the user. To influence the execution order of a process, we use the nice value. In general, PRI and nice are related as follows:
PRI(new) = PRI(old) + nice
Nice ranges from -20 to 19 and defaults to 0. Note that when nice is negative, the priority value decreases, which means the priority rises: the smaller the nice value, the less the process "yields" and the higher its priority.
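As a small userspace illustration of this range (my own hedged example, not kernel code), a process can lower its own priority by raising its nice value with setpriority():

/* Sketch: set our own nice value to 10 (default is 0), lowering our priority.
 * Moving towards -20 instead would require root privileges. */
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
    printf("nice before: %d\n", getpriority(PRIO_PROCESS, 0));

    setpriority(PRIO_PROCESS, 0, 10);   /* be "more humble" */

    printf("nice after:  %d\n", getpriority(PRIO_PROCESS, 0));
    return 0;
}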

5.3 CFS algorithm

task_struct contains a member struct sched_entity se, the scheduler entity structure; the process scheduling algorithm really works by managing the se members of all processes.
Part of sched_entity is shown below:

struct sched_entity {
    struct load_weight  load;       /* for load-balancing */
    struct rb_node      run_node;
    struct list_head    group_node;
    unsigned int        on_rq;

    u64         exec_start;
    u64         sum_exec_runtime;
    u64         vruntime;
    ...
};

5.3.1 About vruntime

vruntime is an important concept in CFS. Simply put, vruntime is the running time of the process, but as a virtual running time weighted by priority rather than physical time. Time slices assigned by the CPU were mentioned in the section on state transitions; in CFS, time slices are no longer used, and instead the virtual running time determines whether a process should be scheduled at a given moment. So how is virtual time calculated? kernel/sched.c contains a function for handling clock ticks, shown below:

/* excerpt from kernel/sched.c: called on every timer tick */
void scheduler_tick(void)
{
    int cpu = smp_processor_id();
    struct rq *rq = cpu_rq(cpu);
    struct task_struct *curr = rq->curr;
    ...
    raw_spin_lock(&rq->lock);
    update_rq_clock(rq);
    update_cpu_load(rq);
    curr->sched_class->task_tick(rq, curr, 0);
    raw_spin_unlock(&rq->lock);
    ...
}

Here, curr->sched_class->task_tick(rq, curr, 0); dispatches to the scheduler class's tick handler, which for CFS is task_tick_fair(); its loop walks the scheduling entities (se) and calls entity_tick() for each one. The code is as follows:

for_each_sched_entity(se) {
        cfs_rq = cfs_rq_of(se);
        entity_tick(cfs_rq, se, queued);//Look at this line
    }

static void entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
{
    /*
     * Update run-time statistics of the 'current'.
     */
    update_curr(cfs_rq);    //Look at this line

#ifdef CONFIG_SCHED_HRTICK
    /*
     * queued ticks are scheduled to match the slice, so don't bother
     * validating it and just reschedule.
     */
    if (queued) {
        resched_task(rq_of(cfs_rq)->curr);
        return;
    }
    /*
     * don't let the period tick interfere with the hrtick preemption
     */
    if (!sched_feat(DOUBLE_TICK) &&
            hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
        return;
#endif

    if (cfs_rq->nr_running > 1 || !sched_feat(WAKEUP_PREEMPT))
        check_preempt_tick(cfs_rq, curr);
}

The key call is update_curr(cfs_rq), which updates the vruntime of the current entity on the ready queue. (Note: cfs_rq is the ready queue for ordinary processes; within sched_entity it is declared as follows.)

    struct cfs_rq       *cfs_rq;
    /* rq "owned" by this entity/group: */
static inline void __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr, unsigned long delta_exec)
{
    unsigned long delta_exec_weighted;

    schedstat_set(curr->exec_max, max((u64)delta_exec, curr->exec_max));

    curr->sum_exec_runtime += delta_exec;
    schedstat_add(cfs_rq, exec_clock, delta_exec);
    delta_exec_weighted = calc_delta_fair(delta_exec, curr);

    curr->vruntime += delta_exec_weighted;
    update_min_vruntime(cfs_rq); 
}

As we can see, __update_curr() first accumulates the actual running time into sum_exec_runtime, then calls calc_delta_fair(delta_exec, curr) to weight delta_exec (this article does not go deeper into how the weighting is performed), then adds the weighted result to vruntime, and finally updates the cached minimum virtual runtime of the cfs_rq queue via update_min_vruntime(). This gives a rough picture of how vruntime is produced.
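Although this article does not dig into calc_delta_fair(), the idea behind the weighting can be sketched as follows; this is a deliberate simplification, not the kernel's fixed-point implementation:

/* Deliberately simplified sketch of the idea behind calc_delta_fair():
 * the real runtime is scaled by NICE_0_LOAD / weight, so a heavier
 * (higher-priority) task accumulates vruntime more slowly and therefore
 * keeps being chosen by the scheduler. */
#define NICE_0_LOAD 1024                 /* load weight of a nice-0 task */

static unsigned long long sketch_delta_fair(unsigned long long delta_exec,
                                            unsigned long weight)
{
    return delta_exec * NICE_0_LOAD / weight;
}

/* Example, with delta_exec = 10 ms:
 *   nice  0 (weight 1024): vruntime grows by 10 ms
 *   nice -5 (weight 3121): vruntime grows by roughly 3.3 ms
 *   nice +5 (weight  335): vruntime grows by roughly 30.6 ms */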

5.3.2 About deciding the next process

CFS organizes the scheduling entities (se) in a red-black tree keyed by the virtual running time vruntime. CFS selects the process with the smallest vruntime as the next process to run, and that process is the leftmost leaf node of the red-black tree. The CFS scheduler uses pick_next_task_fair() to select the next process.

static struct task_struct *pick_next_task_fair(struct rq *rq)
{
    struct task_struct *p;
    struct cfs_rq *cfs_rq = &rq->cfs;
    struct sched_entity *se;

    if (!cfs_rq->nr_running)
        return NULL;

    do {
        se = pick_next_entity(cfs_rq);
        set_next_entity(cfs_rq, se); 
        cfs_rq = group_cfs_rq(se);
    } while (cfs_rq);

    p = task_of(se);
    hrtick_start_fair(rq, p);

    return p;
}

The function that actually selects the next entity is pick_next_entity(), called inside the do/while loop above:

/*
 * Pick the next process, keeping these things in mind, in this order:
 * 1) keep things fair between processes/task groups
 * 2) pick the "next" process, since someone really wants that to run
 * 3) pick the "last" process, for cache locality
 * 4) do not run the "skip" process, if something else is available
 */
static struct sched_entity *
pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
    struct sched_entity *left = __pick_first_entity(cfs_rq);
    struct sched_entity *se;

    /*
     * If curr is set we have to see if its left of the leftmost entity
     * still in the tree, provided there was anything in the tree at all.
     *
     */
    if (!left || (curr && entity_before(curr, left)))
    {
        left = curr;
    }

  
    se = left; /* ideally we run the leftmost entity */

   
    if (cfs_rq->skip == se)
    {
        struct sched_entity *second;

        if (se == curr) 
        {
            second = __pick_first_entity(cfs_rq);
        }
        else    
        {
           
            second = __pick_next_entity(se);
            if (!second || (curr && entity_before(curr, second)))
                second = curr;
        }

        if (second && wakeup_preempt_entity(second, left) < 1)
            se = second;
    }

    /*
     * Prefer last buddy, try to return the CPU to a preempted task.
     *
     * 
     */
    if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
        se = cfs_rq->last;

    /*
     * Someone really wants this to run. If it's not unfair, run it.
     */
    if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
        se = cfs_rq->next;

    clear_buddies(cfs_rq, se);

    return se;
}

This code implements selecting the leftmost node of the red-black tree (with the buddy heuristics applied on top); together with set_next_entity(), which removes the chosen entity from the tree and updates it, the next process has now been selected.
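For completeness, retrieving the leftmost entity itself is cheap because CFS caches it; the following is a paraphrased sketch of that lookup (names and placement are an approximation of the code in kernel/sched_fair.c, not a verbatim excerpt):

/* Paraphrased sketch: CFS caches the leftmost (smallest-vruntime) node of
 * the red-black tree, so finding the next candidate is just a pointer read
 * plus container_of() on the rb_node embedded in sched_entity. */
static struct sched_entity *sketch_pick_leftmost(struct cfs_rq *cfs_rq)
{
    struct rb_node *left = cfs_rq->rb_leftmost;  /* maintained on enqueue/dequeue */

    if (!left)
        return NULL;                             /* runqueue is empty */

    return rb_entry(left, struct sched_entity, run_node);
}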

6. Thoughts and opinions

Linux has never stopped changing since version 0.01. By now, countless programmers have contributed code to it, and what we see today is an extremely refined and mature system architecture. We cannot know what Linux will evolve into in the future, but today we can already see how CFS, with a simple vruntime mechanism, achieves what used to require O(n)-complexity algorithms, and one has to admire the hard work of the people behind this system. Of course, the "complete fairness" of CFS does not mean absolute equality of all processes; it is reflected in the virtual running time, whose update and growth rates differ from process to process. Whether efficiency and fairness can both be fully achieved is a question only time can answer.

7. References

1. Analysis of Linux process scheduling CFS algorithm implementation

