scheduler_tick in Detail

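scheduler_tick is one of the core functions of the scheduler. It is known as the periodic scheduler and is one of the mechanisms that drive the scheduler.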

event_handler() --> tick_handle_periodic() --> tick_periodic() --> update_process_times() --> scheduler_tick()

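The above is the call chain that leads to this function. It is driven by the timer interrupt and runs once on every CPU tick. One related piece of background is the kernel's time subsystem: in an SMP environment every CPU has its own tick device, and one of these tick devices is chosen as the global tick device. The global tick device maintains the system-wide jiffies and updates the system-wide statistics that are based on jiffies. The scheduler_tick described here, by contrast, is work that each CPU performs when its own tick arrives: every CPU has its own runqueue that manages its own runnable tasks, so the corresponding timer interrupt must also belong to that CPU.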

The sections below walk through what it does, following the order of the code.

void scheduler_tick(void)
{
    int cpu = smp_processor_id();
    struct rq *rq = cpu_rq(cpu);
    struct task_struct *curr = rq->curr;

    sched_clock_tick();  //----------(1)

    raw_spin_lock(&rq->lock);
    update_rq_clock(rq);     //---------(2)
    curr->sched_class->task_tick(rq, curr, 0); //-------(3)
    update_cpu_load_active(rq); //-----------(4)
    raw_spin_unlock(&rq->lock);

    perf_event_task_tick();

#ifdef CONFIG_SMP
    rq->idle_balance = idle_cpu(cpu);
    trigger_load_balance(rq);
#endif
    rq_last_tick_reset(rq);
}

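(1) Update the clock information used by the scheduler.
(2) Update the clock of the current runqueue, derived from the clock in (1).
(3) Invoke the task_tick callback of the scheduling class of the current task.
(4) Update the CPU load associated with this runqueue. The CPU load is updated from the runnable_load_avg computed in the cfs_rq; how runnable_load_avg itself is updated is covered in my other blog post on PELT.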

The key step in this function is the call:

  curr->sched_class->task_tick(rq, curr, 0);

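Note the argument it passes: the curr field of the current runqueue, that is, the task currently running. This invokes the task_tick callback of the scheduling class; the rest of this article uses the CFS scheduler as the example.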
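The callback dispatch described above can be illustrated with a small user-space sketch. Everything here is a simplified stand-in, not kernel code: the `*_sketch` types and functions are hypothetical names invented for this example, and the "tick" handlers only bump counters so that the dispatch is observable.

```c
/* Hypothetical, simplified stand-ins for struct rq, task_struct and sched_class. */
struct rq_sketch;
struct task_sketch;

struct sched_class_sketch {
    /* Same shape as the task_tick(rq, curr, queued) hook in the listing above. */
    void (*task_tick)(struct rq_sketch *rq, struct task_sketch *curr, int queued);
};

struct task_sketch {
    const struct sched_class_sketch *sched_class;
    unsigned long fair_ticks;   /* ticks handled by the "fair" class */
    unsigned long rt_ticks;     /* ticks handled by the "rt" class */
};

struct rq_sketch {
    struct task_sketch *curr;   /* task currently running on this CPU */
};

static void task_tick_fair_sketch(struct rq_sketch *rq, struct task_sketch *curr, int queued)
{
    (void)rq;
    (void)queued;
    curr->fair_ticks++;
}

static void task_tick_rt_sketch(struct rq_sketch *rq, struct task_sketch *curr, int queued)
{
    (void)rq;
    (void)queued;
    curr->rt_ticks++;
}

static const struct sched_class_sketch fair_class_sketch = { .task_tick = task_tick_fair_sketch };
static const struct sched_class_sketch rt_class_sketch = { .task_tick = task_tick_rt_sketch };

/* The tick path does not know which class it is calling; it follows the pointer. */
static void scheduler_tick_sketch(struct rq_sketch *rq)
{
    struct task_sketch *curr = rq->curr;

    curr->sched_class->task_tick(rq, curr, 0);
}
```

Calling scheduler_tick_sketch() on a runqueue whose curr belongs to fair_class_sketch increments only fair_ticks. This is the point of the indirection: the periodic tick code stays the same regardless of which scheduling class the running task uses.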

static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
{
    struct cfs_rq *cfs_rq;
    struct sched_entity *se = &curr->se;

    for_each_sched_entity(se) {
        cfs_rq = cfs_rq_of(se);
        entity_tick(cfs_rq, se, queued);  // (1)
    }

    if (numabalancing_enabled)
        task_tick_numa(rq, curr);

    update_rq_runnable_avg(rq, 1);  // (2)
}

Two steps matter here:
(1) Run the tick function of the scheduling entity to update its statistics and vruntime.
(2) Update the runqueue's avg statistics and load.
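The for_each_sched_entity loop in task_tick_fair deserves a note. As far as I understand it, with group scheduling enabled the macro follows the entity's parent pointers, so the tick is applied at every level of the hierarchy; without group scheduling it runs exactly once. The sketch below models only that walk. The struct, the macro and the function are hypothetical names for illustration, not the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scheduling entity with a parent link. */
struct entity_sketch {
    struct entity_sketch *parent;   /* NULL at the top level */
    unsigned int ticks;             /* number of ticks applied to this level */
};

/* Assumed shape of the walk: start at the task's entity and climb to the root. */
#define for_each_entity_sketch(se) \
    for (; (se) != NULL; (se) = (se)->parent)

/* Applies one tick to every level and returns how many levels were visited. */
static int tick_hierarchy_sketch(struct entity_sketch *se)
{
    int levels = 0;

    for_each_entity_sketch(se) {
        se->ticks++;    /* stands in for entity_tick(cfs_rq_of(se), se, queued) */
        levels++;
    }
    return levels;
}
```

For a task entity nested inside one group, which in turn sits under a top-level entity, tick_hierarchy_sketch() visits three levels and each level's counter goes up by one.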

 static void
 entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
 {
     /*
      * Update run-time statistics of the 'current'.
      */
     update_curr(cfs_rq);  // (1)
 
     /*
      * Ensure that runnable average is periodically updated.
      */
     update_entity_load_avg(curr, 1);  // (2)
     update_cfs_rq_blocked_load(cfs_rq, 1);
     update_cfs_shares(cfs_rq);
 
 #ifdef CONFIG_SCHED_HRTICK
     /*
      * queued ticks are scheduled to match the slice, so don't bother
      * validating it and just reschedule.
      */
     if (queued) {
         resched_curr(rq_of(cfs_rq));
         return;
     }
     /*
      * don't let the period tick interfere with the hrtick preemption
      */
     if (!sched_feat(DOUBLE_TICK) &&
             hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
         return;
 #endif
 
     if (cfs_rq->nr_running > 1)
         check_preempt_tick(cfs_rq, curr); // (3)
 }

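(1) update_curr updates the runtime information of the current scheduling entity, including the exec time (the actual execution time) and vruntime (the virtual time).
(2) Update the scheduling entity's average load so that the runqueue load calculation can use it later.
(3) check_preempt_tick decides whether a reschedule is needed in the current situation; it is the key function for scheduling decisions.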
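To make the relation between exec time and vruntime in step (1) concrete, here is a minimal sketch. It assumes the usual CFS rule that virtual time advances at the real rate scaled by a reference weight divided by the entity's weight, with 1024 as the reference weight of a nice-0 task. The struct, the function name and the weight 2048 used in the usage note are illustrative and do not come from the listing above.

```c
#include <stdint.h>

/* Assumed reference weight of a nice-0 task. */
#define NICE_0_WEIGHT_SKETCH 1024ULL

/* Hypothetical stand-in for the fields update_curr is described as maintaining. */
struct runtime_sketch {
    uint64_t weight;            /* load weight of the entity, must be non-zero */
    uint64_t sum_exec_runtime;  /* real execution time, in ns */
    uint64_t vruntime;          /* weighted virtual time, in ns */
};

/* Accounts delta_exec_ns of real run time to the entity. */
static void update_curr_sketch(struct runtime_sketch *se, uint64_t delta_exec_ns)
{
    se->sum_exec_runtime += delta_exec_ns;
    /* A heavier entity accumulates virtual time more slowly. */
    se->vruntime += delta_exec_ns * NICE_0_WEIGHT_SKETCH / se->weight;
}
```

With a weight of 1024, 4 ms of real time adds 4 ms of vruntime; with an illustrative weight of 2048, the same 4 ms adds only 2 ms. Since the entity with the smallest vruntime sits leftmost in the tree, the heavier entity ends up being chosen more often.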

 static void
 check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 {
     unsigned long ideal_runtime, delta_exec;
     struct sched_entity *se;
     s64 delta;
 
     ideal_runtime = sched_slice(cfs_rq, curr);
     delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
     if (delta_exec > ideal_runtime) {
         resched_curr(rq_of(cfs_rq));
         /*
          * The current task ran long enough, ensure it doesn't get
          * re-elected due to buddy favours.
          */
         clear_buddies(cfs_rq, curr);
         return;
     }
 
     /*
      * Ensure that a task that missed wakeup preemption by a
      * narrow margin doesn't have to wait for a full slice.
      * This also mitigates buddy induced latencies under load.
      */
     if (delta_exec < sysctl_sched_min_granularity)
         return;
 
     se = __pick_first_entity(cfs_rq);
     delta = curr->vruntime - se->vruntime;
 
     if (delta < 0)
         return;
 
     if (delta > ideal_runtime)
         resched_curr(rq_of(cfs_rq));
 }

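sched_slice computes the ideal time slice of the current task, expressed in real time. It is compared with delta_exec, the time the task has actually run since it was last picked. If the actual run time exceeds the ideal slice, a reschedule is needed: the reschedule flag is set and the function returns. Otherwise, the following conditions are checked:
1. If the current task has run for less than the minimum granularity required by the system (sysctl_sched_min_granularity, 0.75 ms), return without rescheduling.
2. Pick the scheduling entity at the leftmost node of the balanced binary tree and compare its vruntime with that of the current task. If the current task's vruntime is larger than the leftmost entity's and the difference exceeds the current ideal time slice, the reschedule flag is set here as well.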
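The decision described above can be condensed into one pure function, which makes the order of the checks easier to see. This is a user-space sketch, not the kernel function: should_resched_sketch and its parameters are hypothetical names, the inputs are passed in directly instead of being read from the cfs_rq, and the 0.75 ms constant is the value quoted in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* 0.75 ms in ns, the minimum granularity quoted in the text. */
#define MIN_GRANULARITY_NS_SKETCH 750000ULL

/*
 * Returns true when the current task should be marked for rescheduling.
 * ideal_runtime_ns stands in for the result of sched_slice(); delta_exec_ns
 * is the time the task has run since it was last picked.
 */
static bool should_resched_sketch(uint64_t ideal_runtime_ns,
                                  uint64_t delta_exec_ns,
                                  uint64_t curr_vruntime,
                                  uint64_t leftmost_vruntime)
{
    int64_t delta;

    /* The task has used up its slice. */
    if (delta_exec_ns > ideal_runtime_ns)
        return true;

    /* Condition 1: ran for less than the minimum granularity, keep running. */
    if (delta_exec_ns < MIN_GRANULARITY_NS_SKETCH)
        return false;

    /* Condition 2: compare against the leftmost entity's vruntime. */
    delta = (int64_t)(curr_vruntime - leftmost_vruntime);
    if (delta < 0)
        return false;

    return (uint64_t)delta > ideal_runtime_ns;
}
```

For example, with an ideal slice of 4 ms, a task that has run for 1 ms is still marked for rescheduling if its vruntime is 9 ms ahead of the leftmost entity, while a task that has run for only 0.5 ms is left alone no matter how far ahead it is.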
