In the previous article we analyzed the main CFS code, covering:

How the scheduler initializes a process when the process is created
How the process is added to the CFS run queue
How the next process to run is chosen once processes are on the CFS run queue

This article continues along the life cycle of a process: how it is preempted, how it goes to sleep, and how it is woken up.

scheduler_tick (periodic scheduling)

Periodic scheduling means that on every tick the Linux kernel updates the running time of the current process and decides whether the current process needs to be rescheduled.

The timer interrupt handler calls update_process_times, which in turn calls the scheduler's scheduler_tick function:

void scheduler_tick(void)
{
	int cpu = smp_processor_id();
	struct rq *rq = cpu_rq(cpu);
	struct task_struct *curr = rq->curr;
	struct rq_flags rf;

	sched_clock_tick();

	rq_lock(rq, &rf);

	update_rq_clock(rq);
	curr->sched_class->task_tick(rq, curr, 0);
	cpu_load_update_active(rq);
	calc_global_load_tick(rq);
	psi_task_tick(rq);

	rq_unlock(rq, &rf);

	perf_event_task_tick();

#ifdef CONFIG_SMP
	rq->idle_balance = idle_cpu(cpu);
	trigger_load_balance(rq);
#endif
}

scheduler_tick gets the run queue rq of the current CPU and calls the task_tick hook of the current process's scheduling class (sched_class). Here we only describe the CFS scheduling class, whose task_tick hook is task_tick_fair.

static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
{
	struct cfs_rq *cfs_rq;
	struct sched_entity *se = &curr->se;

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);
		entity_tick(cfs_rq, se, queued);
	}

	if (static_branch_unlikely(&sched_numa_balancing))
		task_tick_numa(rq, curr);

	update_misfit_status(curr, rq);
	update_overutilized_status(task_rq(curr));
}

task_tick_fair takes the scheduling entity se embedded in the current task_struct. For each level of the entity hierarchy it finds the CFS run queue the entity is queued on and calls entity_tick to do the actual work.
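With group scheduling (CONFIG_FAIR_GROUP_SCHED), for_each_sched_entity walks from the task's entity up through its parent group entities; without it, the loop runs exactly once. The sketch below models that walk in user space; struct se_model and the *_model functions are hypothetical, and entity_tick is reduced to a call counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a sched_entity with group scheduling enabled:
 * each entity points to the entity of the group that owns its cfs_rq. */
struct se_model {
	struct se_model *parent;   /* NULL at the top level */
	int ticks;                 /* hypothetical: how many times entity_tick ran */
};

/* Same shape as the kernel macro under CONFIG_FAIR_GROUP_SCHED. */
#define for_each_sched_entity(se) for (; (se); (se) = (se)->parent)

/* Stand-in for entity_tick(): just counts calls per level. */
static void entity_tick_model(struct se_model *se)
{
	se->ticks++;
}

/* Mirrors the loop in task_tick_fair(): tick the task's own entity,
 * then every group entity above it. Returns the number of levels visited. */
static int task_tick_fair_model(struct se_model *se)
{
	int levels = 0;

	assert(se != NULL);
	for_each_sched_entity(se) {
		entity_tick_model(se);
		levels++;
	}
	return levels;
}
```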

static void
entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
{
	/*
	 * Update run-time statistics of the 'current'.
	 */
	update_curr(cfs_rq);

	/*
	 * Ensure that runnable average is periodically updated.
	 */
	update_load_avg(cfs_rq, curr, UPDATE_TG);
	update_cfs_group(curr);


	if (cfs_rq->nr_running > 1)
		check_preempt_tick(cfs_rq, curr);
}

update_curr, analyzed in the previous article, updates the execution time and vruntime of the currently running entity, as well as the min_vruntime of the CFS run queue
update_load_avg updates the load of the scheduling entity and of the CFS run queue; it is described in detail in the article on load tracking
If the CFS run queue has more than one runnable entity (nr_running > 1), check_preempt_tick is called to decide whether the current process should be preempted.
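The vruntime bookkeeping done by update_curr can be sketched as follows. This is a simplified user-space model under stated assumptions: the weights are the unscaled ones (1024 for nice 0), the kernel's fixed-point inverse-weight multiplication is replaced by a plain division, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD 1024ULL   /* weight of a nice-0 task (unscaled) */

struct se_model {
	uint64_t weight;            /* load weight, e.g. 1024 for nice 0 */
	uint64_t exec_start;        /* clock value when accounting last ran, ns */
	uint64_t sum_exec_runtime;  /* total real run time, ns */
	uint64_t vruntime;          /* virtual run time, ns */
};

/* Simplified calc_delta_fair(): scale real time by NICE_0_LOAD / weight,
 * so heavier (higher-priority) entities accumulate vruntime more slowly. */
static uint64_t calc_delta_fair_model(uint64_t delta, const struct se_model *se)
{
	assert(se->weight > 0);
	return delta * NICE_0_LOAD / se->weight;
}

/* Simplified update_curr(): charge the time since exec_start to the entity
 * and advance min_vruntime monotonically. leftmost_vruntime is the vruntime
 * of the leftmost queued entity, or UINT64_MAX if the tree is empty. */
static void update_curr_model(struct se_model *curr, uint64_t now,
			      uint64_t leftmost_vruntime, uint64_t *min_vruntime)
{
	uint64_t delta_exec = now - curr->exec_start;
	uint64_t candidate;

	curr->exec_start = now;
	curr->sum_exec_runtime += delta_exec;
	curr->vruntime += calc_delta_fair_model(delta_exec, curr);

	candidate = curr->vruntime < leftmost_vruntime ? curr->vruntime
						       : leftmost_vruntime;
	if (candidate > *min_vruntime)   /* min_vruntime never goes backwards */
		*min_vruntime = candidate;
}
```

For example, an entity with twice the nice-0 weight that runs 3 ms real time is charged only 1.5 ms of vruntime.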

static void
check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
	unsigned long ideal_runtime, delta_exec;
	struct sched_entity *se;
	s64 delta;

	ideal_runtime = sched_slice(cfs_rq, curr);
	delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
	if (delta_exec > ideal_runtime) {
		resched_curr(rq_of(cfs_rq));
		/*
		 * The current task ran long enough, ensure it doesn't get
		 * re-elected due to buddy favours.
		 */
		clear_buddies(cfs_rq, curr);
		return;
	}

	/*
	 * Ensure that a task that missed wakeup preemption by a
	 * narrow margin doesn't have to wait for a full slice.
	 * This also mitigates buddy induced latencies under load.
	 */
	if (delta_exec < sysctl_sched_min_granularity)
		return;

	se = __pick_first_entity(cfs_rq);
	delta = curr->vruntime - se->vruntime;

	if (delta < 0)
		return;

	if (delta > ideal_runtime)
		resched_curr(rq_of(cfs_rq));
}

sched_slice returns the ideal running time of the current process within one scheduling period
sum_exec_runtime is the total time the process has executed; it is updated on every call to update_curr
prev_sum_exec_runtime is a snapshot of sum_exec_runtime taken when the process was last picked to run; it is set on the pick_next path (in set_next_entity)
delta_exec is therefore the time the process has actually run since it was last picked.
If delta_exec is greater than ideal_runtime, the process has used up its slice and must be rescheduled, so resched_curr sets the need_resched flag
If delta_exec is less than sysctl_sched_min_granularity, no preemption takes place; this value guarantees a minimum running time each time a process is scheduled
Otherwise, the leftmost scheduling entity se is taken from the CFS red-black tree and the vruntime of the current process is compared with the vruntime of se
If delta is negative, the current process's vruntime is still smaller than that of the leftmost entity, so it keeps running.
If delta is greater than ideal_runtime, the current process is ahead of the leftmost entity by more than its ideal slice, so it needs to be rescheduled.
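The decision made by check_preempt_tick, together with a flat version of sched_slice, can be modeled in user space as below. This is a sketch, not the kernel implementation: group scheduling is ignored, and the constants are assumed to be the unscaled defaults (6 ms latency, 0.75 ms minimum granularity, 8 tasks per latency period); on a real system they are scaled by CPU count and tunable. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed, unscaled defaults (ns). */
#define SCHED_LATENCY_NS         6000000ULL
#define SCHED_MIN_GRANULARITY_NS  750000ULL
#define SCHED_NR_LATENCY               8ULL

/* Same rule as __sched_period(): stretch the period when there are too many
 * runnable tasks to give each one the minimum granularity. */
static uint64_t sched_period_model(uint64_t nr_running)
{
	if (nr_running > SCHED_NR_LATENCY)
		return nr_running * SCHED_MIN_GRANULARITY_NS;
	return SCHED_LATENCY_NS;
}

/* Flat (no group scheduling) version of sched_slice(): the task's share of
 * the period is proportional to its weight. */
static uint64_t sched_slice_model(uint64_t nr_running, uint64_t weight,
				  uint64_t total_weight)
{
	assert(total_weight >= weight && total_weight > 0);
	return sched_period_model(nr_running) * weight / total_weight;
}

/* Decision logic of check_preempt_tick(); returns true where the kernel
 * would call resched_curr(). */
static bool check_preempt_tick_model(uint64_t ideal_runtime, uint64_t delta_exec,
				     int64_t curr_vruntime, int64_t leftmost_vruntime)
{
	int64_t delta;

	if (delta_exec > ideal_runtime)
		return true;                    /* slice used up */
	if (delta_exec < SCHED_MIN_GRANULARITY_NS)
		return false;                   /* guaranteed minimum run time */

	delta = curr_vruntime - leftmost_vruntime;
	if (delta < 0)
		return false;                   /* curr is still furthest behind */
	return (uint64_t)delta > ideal_runtime;
}
```

With three nice-0 tasks, each gets a 2 ms slice of the 6 ms period; a task that has run 1 ms but is 3 ms of vruntime ahead of the leftmost entity is still preempted by the last check.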

Process sleep

When a process has to give up the CPU because it is waiting for a resource, it schedules itself out voluntarily. For example, a process waiting for a serial port to finish sending data cannot make progress, so it yields the CPU to other processes and keeps the CPU fully utilized. A process that needs to sleep normally gives up the CPU by calling schedule():

asmlinkage __visible void __sched schedule(void)
{
	struct task_struct *tsk = current;

	sched_submit_work(tsk);
	do {
		preempt_disable();
		__schedule(false);
		sched_preempt_enable_no_resched();
	} while (need_resched());
}

static void __sched notrace __schedule(bool preempt)
{
	struct task_struct *prev;
	struct rq *rq;
	int cpu;

	cpu = smp_processor_id();
	rq = cpu_rq(cpu);
	prev = rq->curr;

	if (!preempt && prev->state) {
		if (signal_pending_state(prev->state, prev)) {
			prev->state = TASK_RUNNING;
		} else {
			deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
			prev->on_rq = 0;
			/* ... */
		}
	}

	/* ... */
}

When a process calls schedule(), __schedule is passed false, meaning that this call is not a preemption. As described in the earlier article on basic process concepts, a state of 0 means the process is running (TASK_RUNNING) and any other value means it is not runnable. So if prev->state is non-zero, the process is going to sleep: if a signal is pending that can interrupt this sleep, its state is simply reset to TASK_RUNNING and it stays on the run queue; otherwise deactivate_task removes it from rq.
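The branch at the top of __schedule can be modeled as a small decision function. This is a sketch under assumptions: signal_pending_state() is reduced to a boolean that only counts for TASK_INTERRUPTIBLE (TASK_KILLABLE and fatal signals are ignored), and the *_MODEL constants and struct task_model are hypothetical, although the values 0, 1 and 2 follow the kernel's TASK_RUNNING, TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE.

```c
#include <assert.h>
#include <stdbool.h>

/* 0 means running; any other value means "not runnable". */
#define TASK_RUNNING_MODEL          0
#define TASK_INTERRUPTIBLE_MODEL    1
#define TASK_UNINTERRUPTIBLE_MODEL  2

struct task_model {
	long state;
	bool signal_pending;  /* hypothetical stand-in for signal_pending_state() */
	bool on_rq;
};

/* Models the sleep branch of __schedule(): returns true when the task
 * is removed from the run queue (it goes to sleep). */
static bool schedule_sleep_branch_model(struct task_model *prev, bool preempt)
{
	if (preempt || prev->state == TASK_RUNNING_MODEL)
		return false;                     /* stays on the run queue */

	/* Simplification: only interruptible sleeps are aborted by a signal. */
	if (prev->signal_pending && prev->state == TASK_INTERRUPTIBLE_MODEL) {
		prev->state = TASK_RUNNING_MODEL; /* abort the sleep */
		return false;
	}

	prev->on_rq = false;                  /* deactivate_task() + on_rq = 0 */
	return true;
}
```

The preempt check matters: a task preempted right after setting a sleeping state must not be dequeued, otherwise it could sleep without anyone having decided it should.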

void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
{
	if (task_contributes_to_load(p))
		rq->nr_uninterruptible++;

	dequeue_task(rq, p, flags);
}

static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
{

	p->sched_class->dequeue_task(rq, p, flags);
}

Ultimately this calls the dequeue_task hook of the process's own scheduling class. Taking the CFS scheduling class as an example:

static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
	struct cfs_rq *cfs_rq;
	struct sched_entity *se = &p->se;
	int task_sleep = flags & DEQUEUE_SLEEP;

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);
		dequeue_entity(cfs_rq, se, flags);

		cfs_rq->h_nr_running--;
	}
}

dequeue_task_fair takes the scheduling entity of the process, finds the CFS run queue the entity belongs to, removes the entity from that queue with dequeue_entity, and decrements h_nr_running.

static void
dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
	/*
	 * Update run-time statistics of the 'current'.
	 */
	update_curr(cfs_rq);

	/*
	 * When dequeuing a sched_entity, we must:
	 *   - Update loads to have both entity and cfs_rq synced with now.
	 *   - Subtract its load from the cfs_rq->runnable_avg.
	 *   - Subtract its previous weight from cfs_rq->load.weight.
	 *   - For group entity, update its weight to reflect the new share
	 *     of its group cfs_rq.
	 */
	update_load_avg(cfs_rq, se, UPDATE_TG);
	dequeue_runnable_load_avg(cfs_rq, se);

	update_stats_dequeue(cfs_rq, se, flags);

	clear_buddies(cfs_rq, se);

	if (se != cfs_rq->curr)
		__dequeue_entity(cfs_rq, se);
	se->on_rq = 0;

	/* ... */
}

When a scheduling entity is removed from a CFS run queue, dequeue_entity has to:
Update the load of the scheduling entity and of the CFS run queue so that both are in sync with the current time
Subtract the entity's load from cfs_rq->runnable_avg
Subtract the entity's weight from cfs_rq->load.weight and, for a group entity, update its weight to reflect its group's new share
Call __dequeue_entity to remove the entity from the CFS red-black tree, but only if it is not the currently running entity (se != cfs_rq->curr), because CFS does not keep the running entity in the tree
Finally set on_rq to 0, indicating that the entity is no longer on the CFS run queue.
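The queue bookkeeping of dequeue_entity can be sketched in user space as follows. This is a simplified model, not kernel code: load-average updates are omitted, an unordered array stands in for the red-black tree (ordering does not matter for removal here), and the weight and count updates correspond to what the full kernel function does in account_entity_dequeue after the excerpt shown. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUED 16

struct se_model {
	unsigned long weight;
	bool on_rq;
};

/* Hypothetical cfs_rq: the array holds only the *waiting* entities; the
 * running one (curr) is kept aside, as CFS does not keep it in the tree. */
struct cfs_rq_model {
	struct se_model *tree[MAX_QUEUED];
	size_t nr_in_tree;
	struct se_model *curr;
	unsigned long load_weight;   /* sum of weights of all queued entities, curr included */
	unsigned int nr_running;
};

/* Stand-in for __dequeue_entity(): swap-remove from the unordered array. */
static void tree_remove(struct cfs_rq_model *cfs_rq, struct se_model *se)
{
	for (size_t i = 0; i < cfs_rq->nr_in_tree; i++) {
		if (cfs_rq->tree[i] == se) {
			cfs_rq->tree[i] = cfs_rq->tree[--cfs_rq->nr_in_tree];
			return;
		}
	}
	assert(!"entity not found in tree");
}

/* Bookkeeping part of dequeue_entity(). */
static void dequeue_entity_model(struct cfs_rq_model *cfs_rq, struct se_model *se)
{
	assert(se->on_rq && cfs_rq->nr_running > 0);

	if (se != cfs_rq->curr)             /* curr is not in the tree */
		tree_remove(cfs_rq, se);
	se->on_rq = false;

	cfs_rq->load_weight -= se->weight;  /* as in account_entity_dequeue() */
	cfs_rq->nr_running--;
}
```

Dequeuing the running entity therefore leaves the tree untouched but still reduces the queue's weight and count, which is why the remaining entities' slices grow afterwards.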

Wake up a process

At the end of fork, the new process is woken up through wake_up_new_task. The previous article covered how this function adds the process to the CFS run queue.

void wake_up_new_task(struct task_struct *p)
{
	/* ... */
	p->state = TASK_RUNNING;
	/* ... */
	activate_task(rq, p, ENQUEUE_NOCLOCK);
	p->on_rq = TASK_ON_RQ_QUEUED;

	check_preempt_curr(rq, p, WF_FORK);
	/* ... */
}

wake_up_new_task adds the process to the run queue with activate_task, then calls check_preempt_curr to check whether the newly woken process should preempt the current one. This matters because, for example, the woken process may be a higher-priority real-time process while the current process is a normal one.

void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
{
	const struct sched_class *class;

	if (p->sched_class == rq->curr->sched_class) {
		rq->curr->sched_class->check_preempt_curr(rq, p, flags);
	} else {
		for_each_class(class) {
			if (class == rq->curr->sched_class)
				break;
			if (class == p->sched_class) {
				resched_curr(rq);
				break;
			}
		}
	}

}

If the woken process and the currently running process belong to the same scheduling class, the decision is delegated to that class's check_preempt_curr callback; for CFS this callback is check_preempt_wakeup.
If they belong to different classes, for_each_class walks the scheduling classes from the highest priority to the lowest. If the woken process's class is reached first, for example when a real-time process wakes up while a normal process is running, resched_curr sets the need_resched flag on the current process, which is then switched out at the next scheduling point.
If the current process's class is reached first, i.e. it belongs to a higher-priority class than the woken process, nothing is done.
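The cross-class rule only depends on the order in which for_each_class visits the classes (stop, deadline, real-time, fair, idle). A minimal model, with a hypothetical enum in place of the linked sched_class structures, looks like this; same-class decisions are left to the class hook and are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Scheduling classes in the order for_each_class() visits them,
 * highest priority first. Hypothetical enum for illustration. */
enum sched_class_id { STOP_CLASS, DL_CLASS, RT_CLASS, FAIR_CLASS, IDLE_CLASS, NR_CLASSES };

/* Returns true when check_preempt_curr() would call resched_curr():
 * the woken task's class is met before the current task's class. */
static bool cross_class_preempts(enum sched_class_id curr, enum sched_class_id woken)
{
	assert(curr != woken);
	for (int c = STOP_CLASS; c < NR_CLASSES; c++) {
		if (c == (int)curr)
			return false;   /* current class is higher: do nothing */
		if (c == (int)woken)
			return true;    /* woken class is higher: resched_curr() */
	}
	assert(!"unreachable");
	return false;
}
```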

static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
	struct task_struct *curr = rq->curr;
	struct sched_entity *se = &curr->se, *pse = &p->se;

	/* ... */

	if (wakeup_preempt_entity(se, pse) == 1)
		resched_curr(rq);
}

check_preempt_wakeup calls wakeup_preempt_entity to decide whether the woken process should preempt the current one, and if so calls resched_curr to set the need_resched flag.


/*
 * Should 'se' preempt 'curr'.
 *
 *             |s1
 *        |s2
 *   |s3
 *         g
 *      |<--->|c
 *
 *  w(c, s1) = -1
 *  w(c, s2) =  0
 *  w(c, s3) =  1
 *
 */

static int
wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
{
	s64 gran, vdiff = curr->vruntime - se->vruntime;

	if (vdiff <= 0)
		return -1;

	gran = wakeup_gran(se);
	if (vdiff > gran)
		return 1;

	return 0;
}

The first parameter is the scheduling entity of the current process, and the second is the scheduling entity of the woken process
vdiff is the vruntime of the current entity minus the vruntime of the woken entity
If vdiff is less than or equal to 0, the current process's vruntime is not larger than the woken process's, so there is no preemption (returns -1)
wakeup_gran converts sysctl_sched_wakeup_granularity, a wall-clock duration, into the virtual time of the woken entity according to its weight
Only if the current process's vruntime exceeds the woken process's by more than gran is preemption requested (returns 1)
If the current process is ahead by less than gran, it is not preempted (returns 0); the comment above illustrates the three cases: s1 returns -1, s2 returns 0 and s3 returns 1
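The three outcomes of wakeup_preempt_entity can be reproduced with a small user-space model. It assumes the unscaled default wakeup granularity of 1 ms and a plain division in place of the kernel's fixed-point calc_delta_fair; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD 1024ULL
/* Assumed unscaled default of sysctl_sched_wakeup_granularity: 1 ms. */
#define SCHED_WAKEUP_GRANULARITY_NS 1000000ULL

/* wakeup_gran(): the wakeup granularity converted into the woken entity's
 * virtual time, using the woken entity's weight. */
static int64_t wakeup_gran_model(uint64_t woken_weight)
{
	assert(woken_weight > 0);
	return (int64_t)(SCHED_WAKEUP_GRANULARITY_NS * NICE_0_LOAD / woken_weight);
}

/* Same return convention as wakeup_preempt_entity():
 * -1: curr is not ahead, 0: ahead but within gran, 1: preempt. */
static int wakeup_preempt_entity_model(int64_t curr_vruntime, int64_t woken_vruntime,
				       uint64_t woken_weight)
{
	int64_t vdiff = curr_vruntime - woken_vruntime;

	if (vdiff <= 0)
		return -1;
	if (vdiff > wakeup_gran_model(woken_weight))
		return 1;
	return 0;
}
```

A heavier woken entity gets a smaller gran in virtual time, so it needs a smaller vruntime lead to preempt: with twice the nice-0 weight, a 0.6 ms lead is enough, while a nice-0 entity would need more than 1 ms.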

Summary

When a process is created by fork, sched_fork sets its scheduling class and priority and initializes its vruntime
The process is then added to the run queue. For CFS this means inserting it into the red-black tree with its vruntime as the key. Adding a process changes the load and weight of the whole run queue, so they have to be recalculated.
Once processes are on the run queue, the pick_next callback selects the next one to run; the CFS policy is to run the entity with the smallest vruntime, i.e. the leftmost node of the red-black tree
While the process runs, scheduler_tick periodically checks whether its running time has exceeded its ideal time; if it has, the process is marked for rescheduling
When the process has to wait for a system resource, it gives up the CPU through schedule() and is removed from the CFS run queue. Removing a process also changes the weight and load of the CFS run queue, so they have to be recalculated
When the resource becomes available, the process is woken up, and the kernel checks whether it should preempt the current process. If the woken process belongs to a higher-priority scheduling class, preemption happens; if both belong to the same class (CFS), preemption happens only when the current process's vruntime exceeds the woken process's by more than the wakeup granularity, in which case the need_resched flag is set.

Loopers

CFS scheduling main code analysis two