The first assignment: Linux 2.6.28 process model and CFS scheduler analysis


1. Summary

This article focuses on Linux kernel 2.6.28 and describes what a process is, how processes are organized, and how they are scheduled.

Linux kernel source code reference: https://elixir.bootlin.com/linux/v4.6/source/include/linux/types.h

2. What is a process

2.1 The concept of a process

A standard textbook definition of a process:

A process is one execution of a program, which has some independent function, on a particular data set; it is also the basic unit by which the operating system allocates resources and schedules work.

In short, a process is the instance the operating system creates in order to manage a running program.

A process consists of five parts:

  • Data structure P (used by the OS to manage the running program)
  • Code C of the running program, held in memory
  • Data D of the running program, held in memory
  • General-purpose register contents R of the running program
  • Program status word PSW (through which the OS controls the program's execution)

2.2 Visible processes

2.2.1 Processes on Windows

2.2.2 Processes on Ubuntu

3. How processes are organized

In the Linux kernel, each process is described by a task_struct, which ties together everything the kernel knows about it. The structure is defined in include/linux/sched.h, and its definition alone runs to about 400 lines.

3.1 Process ID

The PID types are defined in include/linux/pid.h:

enum pid_type
{
    PIDTYPE_PID,
    PIDTYPE_PGID,
    PIDTYPE_SID,
    PIDTYPE_MAX
};

Below we look in detail at the most important of these, the process identifier itself.

3.1.1 Process Identifier (PID)

In Linux, each process is assigned a unique process ID (PID), which identifies it within the system. A PID does not belong to a program permanently, however: running the same program at different times generally yields different PIDs, and once a process exits its PID can be reused. Whenever a new process is created with the fork or clone system call, the kernel allocates it a new, unique PID.

pid_t pid;

As shown above, the pid field of task_struct has type pid_t, which is ultimately a typedef for int, so a PID is simply an integer.

3.1.2 Scope of PID

include/linux/threads.h sets the default upper limit for PID values:

#define PID_MAX_DEFAULT (CONFIG_BASE_SMALL ? 0x1000 : 0x8000)

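So on a default configuration, PIDs are kept below 0x8000 = 32768, which in turn caps how many processes can exist at once; on CONFIG_BASE_SMALL builds the default is only 0x1000 = 4096. This is a default rather than a fixed limit: it can be changed at run time through /proc/sys/kernel/pid_max.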

3.1.3 Generation of PID

So where does a PID come from? The answer lies in kernel/pid.c:

static int alloc_pidmap(struct pid_namespace *pid_ns)
{
    int i, offset, max_scan, pid, last = pid_ns->last_pid;
    struct pidmap *map;

    pid = last + 1;
    if (pid >= pid_max)
        pid = RESERVED_PIDS;
    offset = pid & BITS_PER_PAGE_MASK;
    map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
    max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset;
    for (i = 0; i <= max_scan; ++i) {
        if (unlikely(!map->page)) {
            void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);
            /*
             * Free the page if someone raced with us
             * installing it:
             */
            spin_lock_irq(&pidmap_lock);
            if (map->page)
                kfree(page);
            else
                map->page = page;
            spin_unlock_irq(&pidmap_lock);
            if (unlikely(!map->page))
                break;
        }
        if (likely(atomic_read(&map->nr_free))) {
            do {
                if (!test_and_set_bit(offset, map->page)) {
                    atomic_dec(&map->nr_free);
                    pid_ns->last_pid = pid;
                    return pid;
                }
                offset = find_next_offset(map, offset);
                pid = mk_pid(pid_ns, map, offset);
            /*
             * find_next_offset() found a bit, the pid from it
             * is in-bounds, and if we fell back to the last
             * bitmap block and the final block was the same
             * as the starting point, pid is before last_pid.
             */
            } while (offset < BITS_PER_PAGE && pid < pid_max &&
                    (i != max_scan || pid < last ||
                        !((last+1) & BITS_PER_PAGE_MASK)));
        }
        if (map < &pid_ns->pidmap[(pid_max-1)/BITS_PER_PAGE]) {
            ++map;
            offset = 0;
        } else {
            map = &pid_ns->pidmap[0];
            offset = RESERVED_PIDS;
            if (unlikely(last == offset))
                break;
        }
        pid = mk_pid(pid_ns, map, offset);
    }
    return -1;
}

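alloc_pidmap allocates PIDs: starting just after last_pid, it scans the namespace's PID bitmap for a clear bit, wraps back to RESERVED_PIDS once it reaches pid_max, and claims the first free bit with test_and_set_bit. The matching function that recycles a PID, free_pidmap, is defined in the same file, kernel/pid.c; it simply clears the PID's bit and increments the page's free count: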

static void free_pidmap(struct upid *upid)
{
    int nr = upid->nr;
    struct pidmap *map = upid->ns->pidmap + nr / BITS_PER_PAGE;
    int offset = nr & BITS_PER_PAGE_MASK;

    clear_bit(offset, map->page);
    atomic_inc(&map->nr_free);
}

3.2 The state of the process

3.2.1 Process state definition

In Linux, there are 6 main process states:

Code  Name                          Description
R     TASK_RUNNING                  runnable (running or ready to run)
S     TASK_INTERRUPTIBLE            interruptible sleep
D     TASK_UNINTERRUPTIBLE          uninterruptible sleep
T     TASK_STOPPED / TASK_TRACED    stopped, or being traced
Z     TASK_DEAD - EXIT_ZOMBIE       exited; the process has become a zombie
X     TASK_DEAD - EXIT_DEAD         exited; the process is about to be destroyed

They are defined in include/linux/sched.h:

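#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2
#define __TASK_STOPPED          4
#define __TASK_TRACED           8
#define EXIT_ZOMBIE            16   /* in tsk->exit_state */
#define EXIT_DEAD              32   /* in tsk->exit_state */
#define TASK_DEAD              64
#define TASK_WAKEKILL         128
#define TASK_STOPPED           (TASK_WAKEKILL | __TASK_STOPPED)
#define TASK_TRACED            (TASK_WAKEKILL | __TASK_TRACED)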

  • Many operating system textbooks use RUNNING for a process that is currently executing on a CPU and READY for one that is runnable but has not yet been scheduled. Linux folds both into the single TASK_RUNNING state.
  • On a normally running machine, most processes are in the TASK_INTERRUPTIBLE state. This is natural: a sleeping process consumes no CPU time, yet it can be woken quickly when the event it is waiting for occurs.
  • Why are there two kinds of sleep? The main purpose of TASK_UNINTERRUPTIBLE is to avoid interrupting a process in the middle of an interaction with a device, which could leave the device or the kernel in an inconsistent state; a process in this state is therefore not woken by signals.
  • While a process is exiting, it is in the TASK_DEAD state. Most of its resources are reclaimed at this point, except for a few such as its task_struct, which remains so that the parent can collect the exit status. Because only this empty shell is left behind, the state is called ZOMBIE.

3.2.2 Process state transition

The following diagram provides a brief overview of the transitions of process states in the system:

Although there are six process states, every state transition is essentially a move into or out of TASK_RUNNING.

For example, when a process in TASK_INTERRUPTIBLE receives a termination signal, it does not change directly to TASK_DEAD: it is first woken up into TASK_RUNNING, and only then moves from TASK_RUNNING to TASK_DEAD. Conversely, a process in TASK_RUNNING can leave that state in only two ways: enter TASK_STOPPED or TASK_DEAD in response to a signal, or go to sleep, for example in TASK_INTERRUPTIBLE, by executing a blocking system call.

4. How processes are scheduled

4.1 CFS Scheduler

As the kernel evolved, the O(1) scheduler was replaced by CFS (the Completely Fair Scheduler), starting with Linux 2.6.23.

CFS uses vruntime (virtual runtime) to decide which process should run next. It is calculated as follows:

vruntime = actual runtime * NICE_0_LOAD / process weight

Here NICE_0_LOAD is the weight of a process with nice value 0, namely 1024. Each nice value maps to exactly one weight through the global array prio_to_weight in kernel/sched.c:

static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

But how much running time is a process given?

It is calculated as: time slice = scheduling period * process weight / sum of the weights of all runnable processes

The scheduling period is the span of time in which every process in the TASK_RUNNING state gets to run once.

In the ideal case, a process actually runs for exactly the time slice allocated to it, so combining the two equations gives:

vruntime = (scheduling period * process weight / sum of all weights) * 1024 / process weight = scheduling period * 1024 / sum of all weights

The formula shows that, in the ideal case, every process ends up with the same vruntime regardless of its weight. So if a process's vruntime is smaller than the others', it has not yet received the CPU time it deserves, and the scheduler should run it first.

This is the main idea behind CFS.

vruntime is stored in struct sched_entity, the scheduling entity, which is defined in include/linux/sched.h:

struct sched_entity {
    struct load_weight  load;       /* for load-balancing */
    struct rb_node      run_node;
    struct list_head    group_node;
    unsigned int        on_rq;
    u64         exec_start;
    u64         sum_exec_runtime;
    u64         vruntime;
    u64         prev_sum_exec_runtime;
    u64         last_wakeup;
    u64         avg_overlap;
#ifdef CONFIG_SCHEDSTATS
    u64         wait_start;
    u64         wait_max;
    u64         wait_count;
    u64         wait_sum;
    u64         sleep_start;
    u64         sleep_max;
    s64         sum_sleep_runtime;
    u64         block_start;
    u64         block_max;
    u64         exec_max;
    u64         slice_max;
    u64         nr_migrations;
    u64         nr_migrations_cold;
    u64         nr_failed_migrations_affine;
    u64         nr_failed_migrations_running;
    u64         nr_failed_migrations_hot;
    u64         nr_forced_migrations;
    u64         nr_forced2_migrations;
    u64         nr_wakeups;
    u64         nr_wakeups_sync;
    u64         nr_wakeups_migrate;
    u64         nr_wakeups_local;
    u64         nr_wakeups_remote;
    u64         nr_wakeups_affine;
    u64         nr_wakeups_affine_attempts;
    u64         nr_wakeups_passive;
    u64         nr_wakeups_idle;
#endif
#ifdef CONFIG_FAIR_GROUP_SCHED
    struct sched_entity *parent;
    /* rq on which this entity is (to be) queued: */
    struct cfs_rq       *cfs_rq;
    /* rq "owned" by this entity/group: */
    struct cfs_rq       *my_q;
#endif
};

4.2 Red-Black Trees

The sched_entity structures of runnable processes are organized in a red-black tree ordered by vruntime:


Processes with smaller vruntime values are stored further to the left of the tree, so the process with the smallest vruntime is always the leftmost node and can be found quickly.

5. Thoughts on the operating system process model

For a long time, operating systems have tried to define what fairness means. Should interactive processes always be favored? CFS gives its own answer: it no longer tries to identify interactive processes and instead treats all processes equally, just as its name, Completely Fair, promises. Its arrival cut short the reign of the well-known O(1) scheduler, and on the strength of its design CFS remained the default Linux scheduler across many subsequent kernel versions.

