The first assignment: In-depth Linux source code analysis of its process model

1. Process

1. The concept of process

(1) Process: A process is one execution of a program on a data set. It is the basic unit of resource allocation and scheduling in the system and the foundation of the operating system's structure.

(2) A process consists of a program, data, and a process control block (PCB). Creating a process really means building its PCB, and destroying a process means releasing that PCB. Throughout the life cycle of the process, the system manages and schedules it through the PCB.

2. View process status

(1) ps command (common option combinations: aux, -ef, -eFH, -eo, axo)

(2) Example

# ps aux : Display every process on the system, including those without a controlling terminal, in user-oriented format

# ps -ef : Display every process in full format

2. The organization of the process

In Linux, each process has its own task_struct structure. In version 2.4.0, the maximum number of processes the system can hold depends on the size of physical memory, so there may be thousands of processes. To manage this many processes, in all their different states, Linux organizes them in the following ways:

1. Hash table

A hash table supports fast lookup. The process hash table is declared in include/linux/sched.h as follows:

struct task_struct* pidhash[PIDHASH_SZ];

Linux uses the macro pid_hashfn() to convert a PID into an index into this table. pid_hashfn() spreads PIDs evenly across the hash table.

Given a PID, the function that looks up the corresponding PCB is as follows:

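static inline struct task_struct *find_task_by_pid(int pid)
{
    /* pick the bucket this PID hashes to */
    struct task_struct *p, **htable = &pidhash[pid_hashfn(pid)];

    /* walk the bucket's chain until the PID matches or the chain ends */
    for (p = *htable; p && p->pid != pid; p = p->pidhash_next)
        ;
    return p;
}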
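
To make the bucket-and-chain lookup concrete, here is a minimal user-space sketch. It is not kernel code: the demo_ names, the table size of 1024, and the exact shape of the hash function are assumptions chosen for illustration, and struct demo_task stands in for task_struct.

```c
#include <stddef.h>

#define DEMO_PIDHASH_SZ 1024   /* assumed table size; must be a power of two */

/* Hypothetical, cut-down stand-in for task_struct. */
struct demo_task {
    int pid;
    struct demo_task *pidhash_next;   /* next task in the same bucket */
};

static struct demo_task *demo_pidhash[DEMO_PIDHASH_SZ];

/* Fold the high bits into the low bits, then mask to the table size. */
static unsigned int demo_pid_hashfn(int pid)
{
    unsigned int x = (unsigned int)pid;
    return ((x >> 8) ^ x) & (DEMO_PIDHASH_SZ - 1);
}

/* Push a task onto the front of its bucket's chain. */
static void demo_hash_pid(struct demo_task *p)
{
    struct demo_task **bucket = &demo_pidhash[demo_pid_hashfn(p->pid)];
    p->pidhash_next = *bucket;
    *bucket = p;
}

/* Walk one bucket's chain; returns NULL if the PID is not present. */
static struct demo_task *demo_find_task_by_pid(int pid)
{
    struct demo_task *p = demo_pidhash[demo_pid_hashfn(pid)];
    while (p && p->pid != pid)
        p = p->pidhash_next;
    return p;
}
```

With this hash, PIDs 1 and 1029 land in the same bucket, so looking up either one exercises the chain walk rather than a direct hit.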

2. Process linked list

The processes are linked into a circular doubly linked list. In 2.6 kernels the link is the tasks member of task_struct:

struct task_struct {
    /* ... */
    struct list_head tasks;
    char comm[TASK_COMM_LEN];   /* name of the executable, excluding the path */
    /* ... */
};

In the 2.4 kernel, the prev_task and next_task members of each process's task_struct implement this list instead. The head and tail of the list is init_task (that is, process 0). This process is never destroyed and is statically allocated in the kernel data segment.

All processes can be traversed with the for_each_task macro:

#define for_each_task(p) \
    for (p = &init_task; (p = p->next_task) != &init_task; )
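
The traversal works because the list is circular: starting at the head and following next_task eventually returns to the head. The following user-space sketch shows the idea; the demo_ names and the demo_link_task helper are hypothetical and only mirror the structure described above.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for task_struct. */
struct demo_task {
    int pid;
    struct demo_task *next_task, *prev_task;
};

/* The list head (process 0): it links to itself while the list is empty. */
static struct demo_task demo_init_task = { 0, &demo_init_task, &demo_init_task };

/* Visit every task except the head itself. */
#define demo_for_each_task(p) \
    for ((p) = &demo_init_task; ((p) = (p)->next_task) != &demo_init_task; )

/* Link a task in just before the head, i.e. at the tail of the list. */
static void demo_link_task(struct demo_task *p)
{
    p->next_task = &demo_init_task;
    p->prev_task = demo_init_task.prev_task;
    demo_init_task.prev_task->next_task = p;
    demo_init_task.prev_task = p;
}

/* Count the tasks currently linked behind the head. */
static int demo_count_tasks(void)
{
    struct demo_task *p;
    int n = 0;

    demo_for_each_task(p)
        n++;
    return n;
}
```

Note that the loop never visits the head itself, which is why the macro starts at the head and advances before testing.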

3. Ready Queue

The doubly linked list of all runnable processes is called the ready queue. It is maintained through the run_list member of task_struct, a list_head holding the next and prev pointers:

struct task_struct {
    struct list_head run_list;
    /* ... */
};

The ready queue and its related operations are defined in kernel/sched.c:

/* head of the ready queue */
static LIST_HEAD(runqueue_head);

/* append a process to the tail of the ready queue */
static inline void add_to_runqueue(struct task_struct *p)
{
    list_add_tail(&p->run_list, &runqueue_head);
    nr_running++;
}

/* move a process that is already queued to the tail */
static inline void move_last_runqueue(struct task_struct *p)
{
    list_del(&p->run_list);
    list_add_tail(&p->run_list, &runqueue_head);
}
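
The two queue operations can be tried outside the kernel with a small stand-in for list_head. This is a sketch under assumptions: every demo_ name is hypothetical, and demo_task_of plays the role that the container_of-style list_entry macro plays in the kernel.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct demo_list_head { struct demo_list_head *next, *prev; };

/* Hypothetical, cut-down stand-in for task_struct. */
struct demo_task {
    int pid;
    struct demo_list_head run_list;
};

static struct demo_list_head demo_runqueue_head =
    { &demo_runqueue_head, &demo_runqueue_head };
static int demo_nr_running;

/* Insert entry just before head, which is the tail of a circular list. */
static void demo_list_add_tail(struct demo_list_head *entry,
                               struct demo_list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink entry from whatever list it is on. */
static void demo_list_del(struct demo_list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

static void demo_add_to_runqueue(struct demo_task *p)
{
    demo_list_add_tail(&p->run_list, &demo_runqueue_head);
    demo_nr_running++;
}

static void demo_move_last_runqueue(struct demo_task *p)
{
    demo_list_del(&p->run_list);
    demo_list_add_tail(&p->run_list, &demo_runqueue_head);
}

/* Recover the task that embeds a given run_list node. */
static struct demo_task *demo_task_of(struct demo_list_head *node)
{
    return (struct demo_task *)((char *)node - offsetof(struct demo_task, run_list));
}

/* First task in the queue, or NULL if the queue is empty. */
static struct demo_task *demo_first_runnable(void)
{
    if (demo_runqueue_head.next == &demo_runqueue_head)
        return NULL;
    return demo_task_of(demo_runqueue_head.next);
}
```

Moving a task to the tail leaves the count of runnable tasks unchanged, which is why only the add operation touches the counter.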

3. The state transition of the process

1. Three basic states

(1) Running state (running): The process is running on the processor.

(2) Ready state (ready): The process is able to run and is waiting for the system to allocate it a processor.

(3) Waiting state (blocked): The process cannot run yet; it is waiting for some event to complete.

2. Four state transitions

(1) Running state → waiting state: When a process requests a resource (such as a peripheral device) or waits for an event to occur (such as the completion of an I/O operation), it moves from the running state to the waiting state.

(2) Waiting state → ready state: When the event that the process is waiting for arrives (such as the end of an I/O operation or of an interrupt), the interrupt handler changes the state of that process from waiting to ready.

(3) Running state → ready state: A running process must give up the processor once its time slice is used up, so it moves from the running state to the ready state.

(4) Ready state → running state: When a ready process is scheduled, it is allotted a processor time slice and moves from the ready state to the running state.
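
The four transitions above are the only legal ones in the three-state model. A small sketch makes that explicit; the enum and function names are made up for illustration and do not correspond to kernel identifiers.

```c
#include <stdbool.h>

/* The three basic states; names are hypothetical. */
enum demo_state { DEMO_RUNNING, DEMO_READY, DEMO_WAITING };

/* Returns true only for the four transitions of the three-state model. */
static bool demo_transition_allowed(enum demo_state from, enum demo_state to)
{
    switch (from) {
    case DEMO_RUNNING:
        /* blocks on a resource or event, or its time slice runs out */
        return to == DEMO_WAITING || to == DEMO_READY;
    case DEMO_WAITING:
        /* the awaited event arrived */
        return to == DEMO_READY;
    case DEMO_READY:
        /* picked by the scheduler */
        return to == DEMO_RUNNING;
    }
    return false;
}
```

In particular, a waiting process cannot go straight to running, and a ready process cannot block: it has to be scheduled first.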

3. State transition diagram

(1) Three-state model

(2) Multi-state model

4. The scheduling of the process

1. Scheduling method

(1) Non-preemptive scheduling mode: Simple to implement and low in system overhead. It suits most batch systems, but it cannot be used in time-sharing systems or in most real-time systems.

(2) Preemptive scheduling mode: Preemption must follow certain principles, mainly the priority principle, the short-process-first principle, and the time-slice principle.

2. Scheduling algorithm

(1) First come, first served (FCFS): A non-preemptive algorithm. Each time, it selects from the backlog job queue the one or more jobs that entered the queue first and processes them.

Features: simple algorithm, low efficiency, good for long jobs, bad for short jobs.

(2) Shortest job first (SJF): The algorithm selects from the backlog queue the one or more jobs with the shortest estimated running time. The processor is not released until the job completes or blocks on an event.

Disadvantages: (1) It is unfair to long jobs and can cause "starvation". (2) The urgency of a job is not considered. (3) Because running times are only estimates, short jobs are not guaranteed to run first.

(3) Priority: can be either (1) non-preemptive or (2) preemptive; the priority itself can be either (1) static or (2) dynamic.

(4) Highest response ratio first: response ratio = (waiting time + processing time) / processing time = 1 + waiting time / processing time

(5) Round robin (time slice rotation)
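
The trade-off between the first two algorithms, and the response ratio formula, can be checked with a few lines of arithmetic. The sketch below assumes that all jobs arrive at time 0 and that burst times are known exactly, which is a simplification; the function names are hypothetical.

```c
#include <stdlib.h>

/* Average waiting time when jobs run back to back in the given order (FCFS). */
static double demo_avg_wait(const int *burst, int n)
{
    long elapsed = 0, total_wait = 0;

    for (int i = 0; i < n; i++) {
        total_wait += elapsed;   /* job i waits for everything before it */
        elapsed += burst[i];
    }
    return n > 0 ? (double)total_wait / n : 0.0;
}

static int demo_cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* With all jobs present at time 0, SJF is FCFS over the bursts sorted
 * in ascending order. Note: this sorts the caller's array in place. */
static double demo_avg_wait_sjf(int *burst, int n)
{
    qsort(burst, (size_t)n, sizeof burst[0], demo_cmp_int);
    return demo_avg_wait(burst, n);
}

/* Response ratio = 1 + waiting time / processing time. */
static double demo_response_ratio(double waiting, double service)
{
    return 1.0 + waiting / service;
}
```

For bursts of 24, 3 and 3 time units, FCFS gives an average wait of 17 while SJF gives 3, which shows why FCFS is bad for short jobs. After waiting 6 units, a job needing 3 units has response ratio 3.0 and a job needing 24 units has 1.25, so the short job goes first, but the long job's ratio keeps growing while it waits and it cannot starve.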

5. CFS scheduler

1. Design idea: allocate running time according to the weight of each process

2. Virtual runtime

vruntime = (scheduling period * process weight / total weight of all processes) * 1024 / process weight

= scheduling period * 1024 / total weight of all processes

The formula shows that, seen at a coarse scale, the vruntime of every process advances at the same rate, so vruntime can be used to choose which process runs next. The smaller a process's vruntime, the less CPU time it has received so far, which means it has been treated "unfairly", so it is the one chosen to run next. This selects processes fairly while still giving high-priority (heavier) processes more actual running time.
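
The cancellation in the formula is easy to verify numerically. The sketch below is an illustration, not kernel code: the function names are hypothetical, 1024 is taken as the weight of a default-priority process, and the weight of 3072 used in the usage note is a made-up value chosen so the integer arithmetic stays exact.

```c
#include <stdint.h>

#define DEMO_NICE_0_WEIGHT 1024ULL   /* weight of a default-priority process */

/* Real CPU time a process of the given weight gets in one scheduling period. */
static uint64_t demo_time_slice(uint64_t period_ns, uint64_t weight,
                                uint64_t total_weight)
{
    return period_ns * weight / total_weight;
}

/* Virtual time charged for delta_exec_ns of real CPU time. */
static uint64_t demo_vruntime_delta(uint64_t delta_exec_ns, uint64_t weight)
{
    return delta_exec_ns * DEMO_NICE_0_WEIGHT / weight;
}
```

With a 20 ms period and two processes of weight 1024 and 3072, the real slices are 5 ms and 15 ms, yet both are charged the same 5 ms of vruntime, which equals period * 1024 / total weight.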

3. Scheduling entities

The scheduling entity sched_entity represents one schedulable unit. When group scheduling is disabled, it can be treated as equivalent to a process.
Every task_struct contains a sched_entity, which stores the vruntime and weight of the process. All sched_entity structures are inserted into a red-black tree keyed by vruntime, and the leftmost node of the tree (the one with the smallest vruntime) is cached so that the process with the smallest vruntime can be picked quickly.
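
The cached-leftmost idea can be sketched without a full red-black tree. The code below uses a plain, unbalanced binary search tree keyed by vruntime, which is a deliberate simplification: it has no rebalancing and no removal, and all demo_ names are hypothetical. What it does show is how the cache is kept up to date during insertion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for sched_entity. */
struct demo_entity {
    uint64_t vruntime;
    struct demo_entity *left, *right;
};

/* Hypothetical stand-in for the per-queue tree state. */
struct demo_timeline {
    struct demo_entity *root;
    struct demo_entity *leftmost;   /* cached entity with the smallest vruntime */
};

/* Insert se, keyed by vruntime, and refresh the leftmost cache. */
static void demo_enqueue(struct demo_timeline *t, struct demo_entity *se)
{
    struct demo_entity **link = &t->root;
    int leftmost = 1;

    se->left = se->right = NULL;
    while (*link) {
        if (se->vruntime < (*link)->vruntime) {
            link = &(*link)->left;
        } else {
            link = &(*link)->right;
            leftmost = 0;   /* went right at least once: not the minimum */
        }
    }
    *link = se;
    if (leftmost)
        t->leftmost = se;   /* only ever went left: new smallest vruntime */
}

/* Picking the next entity is a pointer read, not a tree walk. */
static struct demo_entity *demo_pick_next(const struct demo_timeline *t)
{
    return t->leftmost;
}
```

A new entity becomes the cached minimum only if the descent never turned right, so keeping the cache current costs nothing beyond the insertion itself.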

The relationship diagram is as follows:

4. Main code

(1) Create a process

Initialization of CFS-related variables when the process is created:

void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)  
{  
    .....  
    if (!p->sched_class->task_new || !current->se.on_rq) {  
        activate_task(rq, p, 0);  
    } else {  
        /*
         * Let the scheduling class do new task startup
         * management (if any):
         */  
        p->sched_class->task_new(rq, p);  
        inc_nr_running(rq);  
    }  
    check_preempt_curr(rq, p, 0);  
    .....  
}

Linux creates a process through system calls such as fork, clone, or vfork, all of which eventually reach do_fork.
If CLONE_STOPPED is not set, do_fork then calls wake_up_new_task.

(2) Wake up the process

static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)  
{  
    int cpu, orig_cpu, this_cpu, success = 0;  
    unsigned long flags;  
    struct rq *rq;  
    rq = task_rq_lock(p, &flags);  
    if (p->se.on_rq)  
        goto out_running;  
    update_rq_clock(rq);  
    activate_task(rq, p, 1);  
    success = 1;  
out_running:  
    check_preempt_curr(rq, p, sync);  
    p->state = TASK_RUNNING;  
out:  
    current->se.last_wakeup = current->se.sum_exec_runtime;  
    task_rq_unlock(rq, &flags);  
    return success;  
}

update_rq_clock updates the clock of the run queue so that it stays synchronized with the system time.
The key step is activate_task, which adds the process to the red-black tree and makes some adjustments to its vruntime. check_preempt_curr then checks whether the woken process should preempt the current one, and sets the TIF_NEED_RESCHED flag if so.

(3) Process scheduling

asmlinkage void __sched schedule(void)  
{  
    struct task_struct *prev, *next;  
    unsigned long *switch_count;  
    struct rq *rq;  
    int cpu;  
need_resched:  
    preempt_disable(); // being preempted inside schedule() could cause problems, so disable preemption first
    cpu = smp_processor_id();  
    rq = cpu_rq(cpu);  
    rcu_qsctr_inc(cpu);  
    prev = rq->curr;  
    switch_count = &prev->nivcsw;  
    release_kernel_lock(prev);  
need_resched_nonpreemptible:  
    spin_lock_irq(&rq->lock);  
    update_rq_clock(rq);  
    clear_tsk_need_resched(prev); //Clear the bits that need to be scheduled  
    /* state == 0 is TASK_RUNNING; a non-zero state means prev is about to sleep
       and would normally be removed from the run queue. First check for a
       pending signal: if there is one and prev is in interruptible sleep, set
       it back to TASK_RUNNING instead. Otherwise deactivate_task dequeues prev
       and clears on_rq. */
    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {  
        if (unlikely(signal_pending_state(prev->state, prev)))  
            prev->state = TASK_RUNNING;  
        else  
            deactivate_task(rq, prev, 1);  
        switch_count = &prev->nvcsw;  
    }  
    if (unlikely(!rq->nr_running))  
        idle_balance(cpu, rq);  
    prev->sched_class->put_prev_task(rq, prev);  
    next = pick_next_task(rq, prev);  
    if (likely(prev != next)) {  
        sched_info_switch(prev, next);  
        rq->nr_switches++;  
        rq->curr = next;  
        ++*switch_count;  
        // complete process switch
        context_switch(rq, prev, next); /* unlocks the rq */  
        /*
         * the context switch might have flipped the stack from under
         * us, hence refresh the local variables.
         */  
        cpu = smp_processor_id();  
        rq = cpu_rq(cpu);  
    } else  
        spin_unlock_irq(&rq->lock);  
    if (unlikely(reacquire_kernel_lock(current) < 0))  
        goto need_resched_nonpreemptible;  
    preempt_enable_no_resched();  
    // the newly running process may itself have TIF_NEED_RESCHED set; if so, schedule again
    if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))  
        goto need_resched;  
}

(4) Clock interrupt

The clock interrupt is initialized in time_init_hook, and the interrupt function is timer_interrupt.

entity_tick function: Update the status information and check whether the preemption conditions are met.

static void  
entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)  
{  
    /*
     * Update run-time statistics of the 'current'.
     */  
    update_curr(cfs_rq);  
    //.... irrelevant code  
    if (cfs_rq->nr_running > 1 || !sched_feat(WAKEUP_PREEMPT))  
        check_preempt_tick(cfs_rq, curr);  
}
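
The excerpt does not show what check_preempt_tick tests. One plausible form of the check, sketched here as an assumption rather than as the kernel's actual code, compares how long the current entity has run since it was last picked against its weighted share of the scheduling period. All demo_ names are hypothetical, and the prev_sum_exec_runtime field is an assumed bookkeeping field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of sched_entity used by the check. */
struct demo_entity {
    uint64_t weight;
    uint64_t sum_exec_runtime;        /* total CPU time consumed, in ns */
    uint64_t prev_sum_exec_runtime;   /* assumed: total at the moment it was last picked */
};

/* Weighted share of one scheduling period. */
static uint64_t demo_ideal_slice(uint64_t period_ns, uint64_t weight,
                                 uint64_t total_weight)
{
    return period_ns * weight / total_weight;
}

/* Called on every tick: true if the current entity has used up its share. */
static bool demo_tick_should_resched(const struct demo_entity *curr,
                                     uint64_t period_ns, uint64_t total_weight)
{
    uint64_t ideal = demo_ideal_slice(period_ns, curr->weight, total_weight);
    uint64_t ran = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;

    return ran > ideal;
}
```

With a 20 ms period and two equally weighted entities, the ideal slice is 10 ms: an entity that has run 4 ms since being picked keeps the CPU, while one that has run 11 ms is marked for rescheduling.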

5. CFS summary

Another important feature of CFS is its fine scheduling granularity. In the schedulers before CFS, unless a process called a blocking function and gave up the CPU voluntarily, it was preempted only after its time slice or time quota was used up. CFS instead checks on every tick, and preempts the current process if it is no longer the leftmost node of the red-black tree. On heavily loaded servers, scheduling performance can be improved by tuning the scheduling granularity.

6. Views on the Linux process model

Compared with real-time processes, ordinary processes are harder to schedule: the scheduler cannot simply look at priority but must share the CPU fairly, otherwise processes starve and users see slow responses. Linux has therefore kept improving its scheduler throughout its development, looking for a strategy that schedules processes as fairly and as quickly as possible. CFS is the process scheduler adopted in Linux kernel version 2.6.23. Its core idea is "complete fairness": it treats all processes uniformly and schedules all of them fairly. Although CFS performs well and avoids many problems of the previous-generation O(1) scheduler, given Linux's pursuit of excellence, I believe a better scheduler will eventually replace CFS and meet even more requirements.

7. References

1. http://blog.51cto.com/xuding/1741861 Linux process management

2. https://blog.csdn.net/qwe6112071/article/details/70473905 Detailed explanation of the state and transition of the process of the operating system

3. https://blog.csdn.net/yusiguyuan/article/details/39404399 Linux Kernel CFS Process Scheduling Policy
