Overview

Each process has a unique description that is kept in memory and linked to other processes through several structures.

The scheduler's job is to share CPU time among processes, creating the illusion that they execute in parallel.

This task has two parts:

1. Scheduling policy;

2. Context switching.

Schedulers are classified by the scheduling policy they implement. This article focuses on the completely fair scheduler (CFS), which selects the process that has waited longest and gives it the CPU. The schematic diagram is as follows:

Regarding scheduling policy, there are several practical issues:

1. Processes have different priorities;

2. Processes must not be switched too often, because each switch has its own overhead.

Scheduling is triggered in two ways. One is direct: a process intends to sleep or gives up the CPU for some other reason. The other is periodic: a mechanism runs at a fixed frequency and checks whether a process switch is necessary.

The scheduler as a whole is divided into several components, summarized as follows:

The red box in the figure marks the core of the scheduler, called the generic scheduler. It interacts with two other components, the scheduler classes and the context switch:

1. The scheduler classes decide which process runs next. They are modular, and each class implements a different scheduling policy;

2. The context switch interacts closely with the CPU.

Implementation

The following describes how the scheduler is implemented in code. Since a lot of code is involved, only the simplest parts are covered here.

Entry points

The implementation of the scheduler is based on two functions: the periodic scheduler function and the main scheduler function.

The schedule() function is the main scheduler function:

/*
 * schedule() is the main scheduler function.
 */
asmlinkage void __sched schedule(void)

It implements direct scheduling. Many places in the kernel call it directly when the CPU should be handed to another process:

static void __lock_sock(struct sock *sk)
{
	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait_exclusive(&sk->sk_lock.wq, &wait,
					TASK_UNINTERRUPTIBLE);
		spin_unlock_bh(&sk->sk_lock.slock);
		schedule();
		spin_lock_bh(&sk->sk_lock.slock);
		if (!sock_owned_by_user(sk))
			break;
	}
	finish_wait(&sk->sk_lock.wq, &wait);
}

scheduler_tick() is the periodic scheduler function:

/*
 * This function gets called by the timer code, with HZ frequency.
 * We call it with interrupts disabled.
 *
 * It also gets called by the fork code, when changing the parent's
 * timeslices.
 */
void scheduler_tick(void)

It implements periodic scheduling and is called from the timer interrupt path:

/*
 * Called from the timer interrupt handler to charge one tick to the current
 * process.  user_tick is 1 if the tick is user time, 0 for system.
 */
void update_process_times(int user_tick)
{
	struct task_struct *p = current;
	int cpu = smp_processor_id();

	/* Note: this timer irq context must be accounted for as well. */
	account_process_tick(p, user_tick);
	run_local_timers();
	if (rcu_pending(cpu))
		rcu_check_callbacks(cpu, user_tick);
	scheduler_tick();
	run_posix_cpu_timers(p);
}

Related structures

The discussion of the scheduler implementation starts with the scheduling-related members of task_struct, the structure that describes a process:

	int prio, static_prio, normal_prio;
	struct list_head run_list;
	const struct sched_class *sched_class;
	struct sched_entity se;
	unsigned int policy;
	cpumask_t cpus_allowed;
	unsigned int time_slice;
	unsigned int rt_priority;

prio, static_prio, and normal_prio hold the priority of the process. static_prio is the static priority, assigned when the process is started; normal_prio is computed from the static priority and the scheduling policy; prio is the priority the scheduler actually considers. prio and normal_prio are dynamic priorities;

rt_priority is the priority of a real-time process;

sched_class is the scheduler class to which the process belongs;

se is a schedulable entity. The scheduler is not limited to scheduling processes; it can also schedule groups of processes, so it operates on schedulable entities rather than directly on processes. Because a sched_entity is embedded in every task_struct, each process is itself a schedulable entity;

policy holds the scheduling policy applied to the process, with the following values:

/*
 * Scheduling policies
 */
#define SCHED_NORMAL		0
#define SCHED_FIFO		1
#define SCHED_RR		2
#define SCHED_BATCH		3
/* SCHED_ISO: reserved but not implemented yet */
#define SCHED_IDLE		5

SCHED_NORMAL is used for ordinary processes and is handled by the completely fair scheduler;

SCHED_BATCH and SCHED_IDLE are also handled by the completely fair scheduler, but are intended for less important processes;

SCHED_RR and SCHED_FIFO implement soft real-time processes;

cpus_allowed is used on multiprocessor systems to restrict the CPUs on which the process may run;

run_list and time_slice are needed by the round-robin real-time scheduler and are not used by the completely fair scheduler;

Scheduler class

As mentioned earlier, scheduler classes are modular. Each one implements the following structure (include/linux/sched.h):

struct sched_class {
	const struct sched_class *next;

	void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
	void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
	void (*yield_task) (struct rq *rq);

	void (*check_preempt_curr) (struct rq *rq, struct task_struct *p);

	struct task_struct * (*pick_next_task) (struct rq *rq);
	void (*put_prev_task) (struct rq *rq, struct task_struct *p);

#ifdef CONFIG_SMP
	unsigned long (*load_balance) (struct rq *this_rq, int this_cpu,
			struct rq *busiest, unsigned long max_load_move,
			struct sched_domain *sd, enum cpu_idle_type idle,
			int *all_pinned, int *this_best_prio);

	int (*move_one_task) (struct rq *this_rq, int this_cpu,
			      struct rq *busiest, struct sched_domain *sd,
			      enum cpu_idle_type idle);
#endif

	void (*set_curr_task) (struct rq *rq);
	void (*task_tick) (struct rq *rq, struct task_struct *p);
	void (*task_new) (struct rq *rq, struct task_struct *p);
};

The scheduler for each policy implements this structure, for example the completely fair scheduler (kernel/sched_fair.c):

/*
 * All the scheduling class methods:
 */
static const struct sched_class fair_sched_class = {
	.next			= &idle_sched_class,
	.enqueue_task		= enqueue_task_fair,
	.dequeue_task		= dequeue_task_fair,
	.yield_task		= yield_task_fair,

	.check_preempt_curr	= check_preempt_wakeup,

	.pick_next_task		= pick_next_task_fair,
	.put_prev_task		= put_prev_task_fair,

#ifdef CONFIG_SMP
	.load_balance		= load_balance_fair,
	.move_one_task		= move_one_task_fair,
#endif

	.set_curr_task          = set_curr_task_fair,
	.task_tick		= task_tick_fair,
	.task_new		= task_new_fair,
};

The interface provided by the scheduler class is as follows:

enqueue_task: adds a process to the ready queue, for example when it changes from sleeping to runnable;

dequeue_task: removes a process from the ready queue;

yield_task: when a process wants to give up the processor voluntarily, it executes the sched_yield system call, which in turn calls this function;

check_preempt_curr: preempts the current process in favor of a newly woken process, if necessary;

pick_next_task: selects the next process to run;

put_prev_task: called before the currently running process is replaced by another one;

set_curr_task: called when the scheduling policy of a process changes;

task_tick: called by the periodic scheduler each time it is activated;

task_new: connects the fork system call to the scheduler, notifying the scheduler when a new process is created;

Ready queue

The main data structure used by the generic scheduler to manage active processes is called the ready queue, or run queue (the following figure also appeared earlier):

Its corresponding structure is as follows:

/*
 * This is the main, per-CPU runqueue data structure.
 *
 * Locking rule: those places that want to lock multiple runqueues
 * (such as the load balancing or the thread migration code), lock
 * acquire operations must be ordered by ascending &runqueue.
 */
struct rq {
	/* runqueue lock: */
	spinlock_t lock;

	/*
	 * nr_running and cpu_load should be in the same cacheline because
	 * remote CPUs use both these fields when doing load calculation.
	 */
	unsigned long nr_running;
	#define CPU_LOAD_IDX_MAX 5
	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
	unsigned char idle_at_tick;
#ifdef CONFIG_NO_HZ
	unsigned char in_nohz_recently;
#endif
	/* capture load from *all* tasks on this cpu: */
	struct load_weight load;
	unsigned long nr_load_updates;
	u64 nr_switches;

	struct cfs_rq cfs;
#ifdef CONFIG_FAIR_GROUP_SCHED
	/* list of leaf cfs_rq on this cpu: */
	struct list_head leaf_cfs_rq_list;
#endif
	struct rt_rq rt;

	/*
	 * This is part of a global counter where only the total sum
	 * over all CPUs matters. A task can increase this counter on
	 * one CPU and if it got migrated afterwards it may decrease
	 * it on another CPU. Always updated under the runqueue lock:
	 */
	unsigned long nr_uninterruptible;

	struct task_struct *curr, *idle;
	unsigned long next_balance;
	struct mm_struct *prev_mm;

	u64 clock, prev_clock_raw;
	s64 clock_max_delta;
	/* remainder omitted */

Each CPU has its own ready queue; all of them are held in the per-CPU runqueues variable:

static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);

Priority

Process priority is one of the factors the scheduler has to take into account.

The following table summarizes the results of process priority calculations for different types of processes:

In addition to priority, the importance of a process is also expressed by its load weight, which is stored in the schedulable entity:

/*
 * CFS stats for a schedulable entity (task, task-group etc)
 *
 * Current field usage histogram:
 *
 *     4 se->block_start
 *     4 se->run_node
 *     4 se->sleep_start
 *     6 se->load.weight
 */
struct sched_entity {
	struct load_weight	load;		/* for load-balancing */

The relationship between priority and load weight:

Context switch

After the kernel selects a new process, it must deal with the technical details related to multitasking. These details are collectively referred to as context switching.

It is performed by context_switch() (kernel/sched.c):

/*
 * context_switch - switch to the new MM and the new
 * thread's register state.
 */
static inline void
context_switch(struct rq *rq, struct task_struct *prev,
	       struct task_struct *next)

It is called from schedule().

context_switch() mainly calls two functions:

switch_mm(): switches the memory management context;

switch_to(): switches the processor registers and the kernel stack;

"In-depth Linux Kernel Architecture" Reading Notes 007-Scheduler