Assignment 1: An in-depth analysis of the Linux process model through its source code

One, processes

1. The concept of process

(1) Process: A process is one execution of a program on a particular data set. It is the basic unit of resource allocation and scheduling in the system, and the foundation of the operating system's structure.

(2) A process consists of a program, its data, and a process control block (PCB). Creating a process essentially means building its PCB, and destroying a process means releasing that PCB. Throughout the life of the process, the system manages and schedules it through the PCB.

2. View process status

(1) The ps command (common option combinations: aux, -ef, -eFH, -eo, axo)

(2) Examples

# ps aux : Display every process on the system, including those without a controlling terminal, in user-oriented format

# ps -ef : Display every process on the system in full format

 

Two, the organization of processes

        In Linux, each process has its own task_struct structure. In version 2.4.0, the maximum number of processes the system can hold depends on the size of physical memory, so there may be thousands of processes. To manage this many processes, in their various states, Linux organizes them in the following ways:

1. Hash table

A hash table supports fast lookups. The process hash table is declared in include/linux/sched.h as follows:

struct task_struct* pidhash[PIDHASH_SZ];

Linux uses the macro pid_hashfn() to convert a PID into an index into this table. pid_hashfn() spreads PIDs evenly across the hash table.

Given a PID, the following function finds the corresponding PCB:

static inline struct task_struct * find_task_by_pid(int pid){
    struct task_struct *p, **htable = &pidhash[pid_hashfn(pid)];
    /* Walk the bucket's chain until the PID matches or the chain ends */
    for(p = *htable; p && p->pid != pid; p = p->pidhash_next)
        ;
    return p;
}
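The lookup above can be exercised outside the kernel. The following is a minimal user-space sketch, not kernel code: struct task is a hypothetical stand-in for task_struct, PIDHASH_SZ is assumed to be 1024, and the body of pid_hashfn() is an assumption modeled on the 2.4-era macro. hash_pid() is an illustrative helper that chains tasks falling into the same bucket.

```c
#include <stddef.h>

#define PIDHASH_SZ 1024  /* assumed table size; must be a power of two */

/* Assumed hash: fold the high bits of the PID into the low bits. */
#define pid_hashfn(x) ((((x) >> 8) ^ (x)) & (PIDHASH_SZ - 1))

/* Hypothetical, stripped-down stand-in for task_struct. */
struct task {
    int pid;
    struct task *pidhash_next;  /* next task in the same bucket */
};

static struct task *pidhash[PIDHASH_SZ];

/* Push a task onto the front of its bucket's chain. */
static void hash_pid(struct task *t)
{
    struct task **bucket = &pidhash[pid_hashfn(t->pid)];

    t->pidhash_next = *bucket;
    *bucket = t;
}

/* Walk the bucket's chain until the PID matches or the chain ends. */
static struct task *find_task_by_pid(int pid)
{
    struct task *p = pidhash[pid_hashfn(pid)];

    while (p && p->pid != pid)
        p = p->pidhash_next;
    return p;
}
```

With this hash, PIDs 4 and 1024 land in the same bucket, so looking up either one has to follow the chain rather than stop at the first entry.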

2. Process linked list

All processes are linked into a circular doubly linked list, which is defined as follows:

 

struct task_struct{
    ...
    struct list_head tasks;
    char comm[TASK_COMM_LEN];//The name of the executable program, excluding the path
    ...
};

 

In the 2.4 kernel, the prev_task and next_task members of each process's task_struct implement this list; later kernels thread the same list through the struct list_head tasks member shown above. The list starts and ends at init_task (process 0), which is never destroyed and is statically allocated in the kernel's data segment.

The for_each_task macro walks every process on the list:

#define for_each_task(p) \
    for(p=&init_task;(p=p->next_task)!=&init_task;)
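To see why the loop terminates and why init_task itself is skipped, here is a small user-space sketch of the same traversal. struct task, link_task() and count_tasks() are hypothetical names introduced only for this illustration; only the macro follows the text above.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for task_struct. */
struct task {
    int pid;
    struct task *next_task, *prev_task;
};

/* The list head links to itself while the list is empty. */
static struct task init_task = { 0, &init_task, &init_task };

#define for_each_task(p) \
    for (p = &init_task; (p = p->next_task) != &init_task; )

/* Insert t just before init_task, i.e. at the tail of the circular list. */
static void link_task(struct task *t)
{
    t->next_task = &init_task;
    t->prev_task = init_task.prev_task;
    init_task.prev_task->next_task = t;
    init_task.prev_task = t;
}

/* Count every task except init_task, which the macro never visits. */
static int count_tasks(void)
{
    struct task *p;
    int n = 0;

    for_each_task(p)
        n++;
    return n;
}
```

Because the list is circular, the traversal needs no NULL check: it stops as soon as the cursor comes back around to init_task.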

3. Ready Queue

All runnable processes form a doubly linked list called the ready queue. It is maintained through the run_list member of task_struct, a list_head that holds two pointers (next and prev):

struct task_struct{
    struct list_head run_list;
    ...
};

The ready queue and its related operations are defined in kernel/sched.c:

static LIST_HEAD(runqueue_head);//Define the head of the ready queue as runqueue_head

static inline void add_to_runqueue(struct task_struct *p){
    list_add_tail(&p->run_list, &runqueue_head);
    nr_running++;
}

static inline void move_last_runqueue(struct task_struct *p){
    list_del(&p->run_list);
    list_add_tail(&p->run_list, &runqueue_head);
}
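The two operations above rely on the kernel's list_head helpers. The sketch below re-implements just enough of them in user space to show the queue behavior; it is an illustration under stated assumptions, not the kernel's list.h. struct task is a hypothetical stand-in for task_struct, and first_runnable() is a helper added for the example.

```c
#include <stddef.h>

/* Minimal re-implementation of a circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Recover the enclosing structure from a pointer to its list_head member. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

static void list_del(struct list_head *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
}

/* Hypothetical, stripped-down stand-in for task_struct. */
struct task { int pid; struct list_head run_list; };

static LIST_HEAD(runqueue_head);
static int nr_running;

static void add_to_runqueue(struct task *p)
{
    list_add_tail(&p->run_list, &runqueue_head);
    nr_running++;
}

/* Move p to the tail; the number of runnable tasks does not change. */
static void move_last_runqueue(struct task *p)
{
    list_del(&p->run_list);
    list_add_tail(&p->run_list, &runqueue_head);
}

/* Illustrative helper: the task at the front of the queue, or NULL if empty. */
static struct task *first_runnable(void)
{
    if (runqueue_head.next == &runqueue_head)
        return NULL;
    return list_entry(runqueue_head.next, struct task, run_list);
}
```

Embedding the list_head inside the task, rather than pointing from a list node to the task, is what lets one structure sit on several lists at once through different members.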

 

Three, process state transitions

1. Three basic states

(1) Running state (running): The process is executing on a processor.

(2) Ready state (ready): The process is able to run and is waiting for the system to assign it a processor.

(3) Waiting state (blocked): The process cannot run until some event completes.

2. Four state transitions

(1) Running state → waiting state: When a process requests a resource (such as a peripheral) or waits for an event (such as the completion of an I/O operation), it moves from the running state to the waiting state.

(2) Waiting state → ready state: When the event the process is waiting for occurs (such as the completion of an I/O operation or the arrival of an interrupt), the interrupt handler changes the state of that process from waiting to ready.

(3) Running state → ready state: A running process must give up the processor once its time slice is used up, so it moves from the running state to the ready state.

(4) Ready state → running state: When a ready process is selected by the scheduler, it is given the processor (a time slice is allocated to it) and moves from the ready state to the running state.
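The four transitions listed above can be written down as a small validity check. This is only an illustrative sketch of the three-state model; the enum and function names are hypothetical and do not correspond to kernel identifiers.

```c
#include <stdbool.h>

/* The three basic states of the textbook model. */
enum proc_state { PROC_RUNNING, PROC_READY, PROC_WAITING };

/* Returns true only for the four transitions described in the text. */
static bool transition_allowed(enum proc_state from, enum proc_state to)
{
    switch (from) {
    case PROC_RUNNING:
        /* blocks on an event, or uses up its time slice */
        return to == PROC_WAITING || to == PROC_READY;
    case PROC_WAITING:
        /* the awaited event has occurred */
        return to == PROC_READY;
    case PROC_READY:
        /* selected by the scheduler */
        return to == PROC_RUNNING;
    }
    return false;
}
```

Note that a waiting process never moves directly to running: it must first become ready and then be chosen by the scheduler.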

3. State transition diagram

(1) Three-state model

 

 

(2) Multi-state model

 

 

Four, process scheduling

1. Scheduling method

(1) Non-preemptive scheduling: simple to implement and low in system overhead. It suits most batch systems, but not time-sharing systems or most real-time systems.

(2) Preemptive scheduling: preemption must follow certain principles, chiefly the priority principle, the short-process-first principle, and the time-slice principle.

2. Scheduling algorithm

(1) First come, first served (FCFS): a non-preemptive algorithm. Each time, it selects the job or jobs that entered the backup job queue earliest and processes them.

Features: the algorithm is simple but inefficient; it favors long jobs and penalizes short ones.

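(2) Shortest job first (SJF): the algorithm selects from the backup queue the job or jobs with the shortest estimated run time. The processor is released only when the job completes or blocks on some event.

Disadvantages: (1) it penalizes long jobs and can cause starvation; (2) it ignores how urgent a job is; (3) because run times are only estimates, the shortest job is not guaranteed to run first.

(3) Priority scheduling: can be (1) non-preemptive or (2) preemptive; priorities can be (1) static or (2) dynamic.

(4) Highest response ratio first: response ratio = (waiting time + service time) / service time = 1 + waiting time / service time

(5) Round-robin (time-slice rotation)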
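The differences between these algorithms are easiest to see with numbers. The sketch below is a simplified model, not scheduler code: it assumes all jobs arrive at time 0, and every function name is hypothetical. It computes the average waiting time under FCFS and SJF, the response ratio from item (4), and per-job completion times under round-robin.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Average waiting time when jobs run to completion in array order. */
static double avg_wait_in_order(const int *burst, size_t n)
{
    long wait = 0, elapsed = 0;

    for (size_t i = 0; i < n; i++) {
        wait += elapsed;        /* job i waits for all jobs before it */
        elapsed += burst[i];
    }
    return n ? (double)wait / (double)n : 0.0;
}

/* FCFS: the arrival order is the service order. */
static double fcfs_avg_wait(const int *burst, size_t n)
{
    return avg_wait_in_order(burst, n);
}

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* SJF: shortest estimated job first. Returns -1.0 if allocation fails. */
static double sjf_avg_wait(const int *burst, size_t n)
{
    if (n == 0)
        return 0.0;
    int *sorted = malloc(n * sizeof *sorted);
    if (!sorted)
        return -1.0;
    memcpy(sorted, burst, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_int);
    double avg = avg_wait_in_order(sorted, n);
    free(sorted);
    return avg;
}

/* Response ratio = 1 + waiting time / service time (service must be > 0). */
static double response_ratio(int wait, int service)
{
    return 1.0 + (double)wait / (double)service;
}

/* Round-robin: remaining[i] is consumed; finish[i] gets the completion time. */
static void round_robin(int *remaining, int *finish, size_t n, int quantum)
{
    int now = 0;
    size_t left = 0;

    for (size_t i = 0; i < n; i++) {
        finish[i] = 0;
        if (remaining[i] > 0)
            left++;
    }
    if (quantum <= 0)
        return;                 /* a non-positive quantum never makes progress */
    while (left > 0) {
        for (size_t i = 0; i < n; i++) {
            if (remaining[i] <= 0)
                continue;
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            now += slice;
            remaining[i] -= slice;
            if (remaining[i] == 0) {
                finish[i] = now;
                left--;
            }
        }
    }
}
```

For jobs with run times 24, 3 and 3, FCFS gives waits of 0, 24 and 27 (average 17), while SJF reorders them to 3, 3, 24 and gives waits of 0, 3 and 6 (average 3). This is the sense in which FCFS is bad for short jobs that queue behind a long one.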

 

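Five, the CFS scheduler

1. Design idea: allocate running time to each process according to its weight

2. Virtual runtime

vruntime = (scheduling period * process weight / total weight of all processes) * 1024 / process weight

         = scheduling period * 1024 / total weight of all processes

The formula shows that, viewed at a macro level, the vruntime of every process advances at the same rate, so vruntime can be used to choose which process runs next. A smaller vruntime means the process has so far had less CPU time and has been treated "unfairly", so it is the next one to run. This selects processes fairly while still giving higher-priority (higher-weight) processes more running time.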
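A short numeric sketch makes the cancellation in the formula concrete. It is an illustration only: the function names are hypothetical, times are in nanoseconds by assumption, and NICE_0_LOAD is simply a name given here to the constant 1024 that appears in the formula.

```c
#include <stdint.h>

#define NICE_0_LOAD 1024  /* the constant 1024 from the formula */

/* Real run time granted to a task within one scheduling period. */
static uint64_t slice_ns(uint64_t period_ns, uint32_t weight,
                         uint32_t total_weight)
{
    return period_ns * weight / total_weight;
}

/* Virtual run time charged for delta_exec_ns of real run time. */
static uint64_t vruntime_delta(uint64_t delta_exec_ns, uint32_t weight)
{
    return delta_exec_ns * NICE_0_LOAD / weight;
}
```

With a 6 ms period and two tasks of weight 1024 and 2048, the tasks receive 2 ms and 4 ms of real time respectively, yet both are charged 2 ms of vruntime: the heavier task runs longer while its virtual clock advances at the same pace.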

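3. Scheduling entity

A scheduling entity, sched_entity, represents one schedulable unit; when group scheduling is disabled, it can be treated as equivalent to a process.
Every task_struct contains a sched_entity, which stores the process's vruntime and weight. All sched_entity instances are inserted into a red-black tree keyed by vruntime, and the leftmost node of the tree (the one with the smallest vruntime) is cached so that the process with the smallest vruntime can be picked quickly.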
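The idea of keeping entities ordered by vruntime and always picking the smallest can be shown without a full red-black tree. The sketch below deliberately substitutes a sorted singly linked list, whose head plays the role of the cached leftmost node; insertion is O(n) here, whereas the tree keeps it O(log n). struct entity, struct queue, enqueue() and pick_next() are hypothetical names, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for sched_entity. */
struct entity {
    uint64_t vruntime;
    struct entity *next;
};

/* The head of the list always has the smallest vruntime. */
struct queue {
    struct entity *leftmost;
};

/* Insert in ascending vruntime order; equal keys keep arrival order. */
static void enqueue(struct queue *q, struct entity *se)
{
    struct entity **link = &q->leftmost;

    while (*link && (*link)->vruntime <= se->vruntime)
        link = &(*link)->next;
    se->next = *link;
    *link = se;
}

/* Remove and return the entity with the smallest vruntime, or NULL. */
static struct entity *pick_next(struct queue *q)
{
    struct entity *se = q->leftmost;

    if (se) {
        q->leftmost = se->next;
        se->next = NULL;
    }
    return se;
}
```

Picking the next entity is a constant-time read of the cached minimum in both designs; the red-black tree matters for keeping insertion and removal cheap when many tasks are runnable.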

The relationships among these structures are shown in the following diagram:

 

4. Main code

(1) Creating a process

Initialization of CFS-related variables when a process is created:

 

void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)  
{  
    .....  
    if (!p->sched_class->task_new || !current->se.on_rq) {  
        activate_task(rq, p, 0);  
    } else {  
        /* 
         * Let the scheduling class do new task startup 
         * management (if any): 
         */  
        p->sched_class->task_new(rq, p);  
        inc_nr_running(rq);  
    }  
    check_preempt_curr(rq, p, 0);  
    .....  
}  

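Linux creates processes through system calls such as fork, clone, or vfork, all of which eventually reach do_fork.
If CLONE_STOPPED is not set, wake_up_new_task is then called.

(2) Waking up a process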

static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)  
{  
    int cpu, orig_cpu, this_cpu, success = 0;  
    unsigned long flags;  
    struct rq *rq;  
    rq = task_rq_lock(p, &flags);  
    if (p->se.on_rq)  
        goto out_running;  
    update_rq_clock(rq);  
    activate_task(rq, p, 1);  
    success = 1;  
out_running:  
    check_preempt_curr(rq, p, sync);  
    p->state = TASK_RUNNING;  
out:  
    current->se.last_wakeup = current->se.sum_exec_runtime;  
    task_rq_unlock(rq, &flags);  
    return success;  
}  

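update_rq_clock updates the run queue's clock so that it stays in sync with system time.
The key call is activate_task, which inserts the process into the red-black tree and makes some adjustments to its vruntime. check_preempt_curr then checks whether the preemption condition is met and, if so, sets the TIF_NEED_RESCHED flag.

(3) Process scheduling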

asmlinkage void __sched schedule(void)  
{  
    struct task_struct *prev, *next;  
    unsigned long *switch_count;  
    struct rq *rq;  
    int cpu;  
need_resched:  
    preempt_disable(); //Being preempted in here could cause problems, so disable preemption first
    cpu = smp_processor_id();  
    rq = cpu_rq(cpu);  
    rcu_qsctr_inc(cpu);  
    prev = rq->curr;  
    switch_count = &prev->nivcsw;  
    release_kernel_lock(prev);  
need_resched_nonpreemptible:  
    spin_lock_irq(&rq->lock);  
    update_rq_clock(rq);  
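    clear_tsk_need_resched(prev); //Clear the need-reschedule flag
    /*state==0 is TASK_RUNNING; a non-zero state means the task is about to sleep and would normally be removed from the run queue.
    First check for a pending signal: if there is one and the task is in interruptible sleep, make it runnable again.
    Otherwise deactivate_task removes the sleeping task from the queue and clears on_rq.*/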
    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {  
        if (unlikely(signal_pending_state(prev->state, prev)))  
            prev->state = TASK_RUNNING;  
        else  
            deactivate_task(rq, prev, 1);  
        switch_count = &prev->nvcsw;  
    }  
    if (unlikely(!rq->nr_running))  
        idle_balance(cpu, rq);  
    prev->sched_class->put_prev_task(rq, prev);  
    next = pick_next_task(rq, prev);  
    if (likely(prev != next)) {  
        sched_info_switch(prev, next);  
        rq->nr_switches++;  
        rq->curr = next;  
        ++*switch_count;  
        //Perform the context switch
        context_switch(rq, prev, next); /* unlocks the rq */  
        /* 
         * the context switch might have flipped the stack from under 
         * us, hence refresh the local variables. 
         */  
        cpu = smp_processor_id();  
        rq = cpu_rq(cpu);  
    } else  
        spin_unlock_irq(&rq->lock);  
    if (unlikely(reacquire_kernel_lock(current) < 0))  
        goto need_resched_nonpreemptible;  
    preempt_enable_no_resched();  
    //The new process may also have TIF_NEED_RESCHED set; if it needs rescheduling too, schedule once more
    if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))  
        goto need_resched;  
}  

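(4) Timer interrupt

The timer interrupt is initialized in time_init_hook, and its handler is timer_interrupt.

The entity_tick function updates state information and checks whether the preemption condition is met.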

static void  
entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)  
{  
    /* 
     * Update run-time statistics of the 'current'. 
     */  
    update_curr(cfs_rq);  
    //.... unrelated code omitted
    if (cfs_rq->nr_running > 1 || !sched_feat(WAKEUP_PREEMPT))  
        check_preempt_tick(cfs_rq, curr);  
}  

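5. CFS summary

    Another important feature of CFS is its fine scheduling granularity. In the schedulers that preceded CFS, apart from processes that entered scheduling voluntarily by calling a blocking function, a process was preempted only after it had used up its time slice or its own time quota. CFS instead checks on every tick and preempts the current process once it is no longer at the leftmost position of the red-black tree. On heavily loaded servers, tuning the scheduling granularity can yield better scheduling performance.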
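One plausible form of such a per-tick check is sketched below. It is a simplified model under stated assumptions, not the kernel's check_preempt_tick: it compares how long the current task has run since it was last picked against its weighted share of the scheduling period. All structure, field and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task accounting, in nanoseconds. */
struct entity {
    uint32_t weight;
    uint64_t sum_exec_ns;       /* total run time so far */
    uint64_t prev_sum_exec_ns;  /* sum_exec_ns when the task was last picked */
};

/* Called on every tick: has the current task outrun its share? */
static bool tick_should_preempt(const struct entity *curr,
                                uint64_t period_ns, uint32_t total_weight)
{
    /* Weighted share of one scheduling period. */
    uint64_t ideal = period_ns * curr->weight / total_weight;
    /* Time run since the task was last put on the CPU. */
    uint64_t ran = curr->sum_exec_ns - curr->prev_sum_exec_ns;

    return ran > ideal;
}
```

Because the check runs on every tick rather than only when a fixed time slice expires, the decision to preempt can follow changes in the number and weight of runnable tasks.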

 

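Six, thoughts on the Linux process model

    Scheduling normal processes is more troublesome than scheduling real-time processes, because it cannot simply look at priority: normal processes must share the CPU fairly, otherwise some processes starve and user response becomes slow. Linux has therefore kept improving its scheduler throughout its history, looking for a policy as close to perfect as possible that schedules processes both fairly and quickly. CFS is the process scheduler adopted starting with Linux kernel 2.6.23. Its core idea is "complete fairness": it treats all processes uniformly and schedules them fairly. Although CFS performs well and avoids many problems of its predecessor, the O(1) scheduler, given Linux's spirit of continual refinement I believe a better scheduler will eventually replace CFS and meet more needs.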

 

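Seven, references

1. http://blog.51cto.com/xuding/1741861 Linux process management

2. https://blog.csdn.net/qwe6112071/article/details/70473905 Process states and transitions in operating systems, explained in detail

3. https://blog.csdn.net/yusiguyuan/article/details/39404399 The CFS process scheduling policy in the Linux kernel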

 
