Linux O(n) scheduler

Earlier we covered a few points that the design of a scheduler needs to take into account; to review:

  1. Throughput (matters for CPU-bound processes)
  2. Response time (matters for I/O-bound processes)
  3. Fairness, so that every process gets a chance to run
  4. Power consumption, on mobile devices

Concepts introduced in the design of the Linux scheduler

  1. Ordinary and real-time processes are distinguished by priority: 0-99 denotes real-time processes, 100-139 ordinary processes (see the sketch just below this list)
  2. Real-time processes use one of two scheduling policies, SCHED_RR or SCHED_FIFO
  3. Ordinary processes use nice values to adjust their priority dynamically
  4. Processes that sleep often are given a priority boost, while processes that keep hogging the CPU have their priority lowered
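
As an illustration of item 1, the nice range lines up with the 100-139 band as follows. The NICE_TO_PRIO() form below is borrowed from later kernels (the 2.4 scheduler itself works with nice values directly), so treat it purely as a sketch of the numbering:

#define MAX_RT_PRIO		100	/* priorities 0..99 are reserved for real-time processes */
#define NICE_TO_PRIO(nice)	(MAX_RT_PRIO + (nice) + 20)	/* maps -20..19 onto 100..139 */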

In this section we will study the design of the early scheduling algorithms in Linux, starting with the earliest scheduler. The time complexity of this scheduler is O(n), so it is also called the O(n) scheduling algorithm. The kernel version used here is linux-2.4.19.

 

Implementation principle of the O(n) scheduler

O(n) refers to the time complexity of finding a suitable process to run. The scheduler defines a single run queue, runqueue; whenever a process becomes runnable, it is added to this queue, regardless of whether it is a real-time or an ordinary process. When a suitable process has to be picked, the queue must be traversed from head to tail, so the cost of finding the next process is O(n). As the number of processes in the run queue grows, the scheduler's efficiency drops noticeably.
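
When a process becomes runnable, the wake-up path links it into this single global list. In 2.4.19 the helper in kernel/sched.c looks roughly like this:

static inline void add_to_runqueue(struct task_struct * p)
{
	list_add(&p->run_list, &runqueue_head);
	nr_running++;
}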

The processes in the run queue are not kept in any order: real-time and ordinary processes are mixed together. When the scheduler needs to select the next process, it must traverse the whole list, compare the priority of each process, and run the one with the highest priority first. Naturally, ordinary processes only get to run once no real-time process is runnable.

struct task_struct structure

struct task_struct {

    long counter;		/* remaining time slice, in ticks */
    long nice;			/* static priority (nice value) */
    unsigned long policy;	/* scheduling policy */
    int processor;		/* CPU the process last ran on */

    unsigned long cpus_runnable, cpus_allowed;	/* SMP affinity masks */
};
  • counter is the process's time slice, i.e. how long the process may still run within the current scheduling cycle.
  • nice is the static priority of the process. Via the macro NICE_TO_TICKS, the nice value is converted into a corresponding time slice and stored in counter.
  • policy is the scheduling policy of the process: real-time processes use SCHED_RR or SCHED_FIFO, ordinary processes use SCHED_OTHER (see the sketch after this list).
    • SCHED_RR: processes of the same priority take turns round-robin; across different priorities, the higher priority is scheduled first.
    • SCHED_FIFO: processes of the same priority run first-come-first-served, i.e. a later process must wait until the earlier one finishes. Across different priorities, higher priority still wins: as long as a higher-priority real-time process has not finished, a lower-priority process cannot run.
  • processor is the processor the process is currently running on; it is used on SMP systems.
  • cpus_allowed is a mask of the CPUs the process is allowed to run on.
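
For reference, both the policy and the real-time priority can be set from user space with the standard sched_setscheduler() system call. A minimal sketch (the priority 50 is an arbitrary example value, and switching to SCHED_FIFO normally requires root privileges):

#include <stdio.h>
#include <sched.h>

int main(void)
{
	struct sched_param sp = { .sched_priority = 50 };	/* arbitrary example value */

	/* pid 0 means "the calling process" */
	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
		perror("sched_setscheduler");
		return 1;
	}
	return 0;
}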

Time slice calculation

The O(n) scheduler converts the nice value of a process into a time slice measured in ticks:

#if HZ < 200
#define TICK_SCALE(x)	((x) >> 2)
#elif HZ < 400
#define TICK_SCALE(x)	((x) >> 1)
#elif HZ < 800
#define TICK_SCALE(x)	(x)
#elif HZ < 1600
#define TICK_SCALE(x)	((x) << 1)
#else
#define TICK_SCALE(x)	((x) << 2)
#endif

#define NICE_TO_TICKS(nice)	(TICK_SCALE(20-(nice))+1)

The nice value ranges from -20 to +19; the smaller the value, the higher the priority. The default nice value of a process is 0, so the default static priority term (20 - nice) equals 20.

Taking HZ = 100, the time slice granted for several nice values works out as follows:

nice value        -20      -10        0      +10      +19
ticks (HZ=100)     11        8        6        3        1
time slice     110 ms    80 ms    60 ms    30 ms    10 ms
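
These numbers can be reproduced with a small user-space program that mirrors the kernel macros; a sketch, with HZ hard-coded to 100 to match the table above:

#include <stdio.h>

#define HZ			100		/* assumed tick rate */
#define TICK_SCALE(x)		((x) >> 2)	/* the HZ < 200 branch */
#define NICE_TO_TICKS(nice)	(TICK_SCALE(20-(nice))+1)

int main(void)
{
	int nice;

	for (nice = -20; nice <= 19; nice++)
		printf("nice %3d -> %2d ticks (%3d ms)\n",
		       nice, NICE_TO_TICKS(nice),
		       NICE_TO_TICKS(nice) * 1000 / HZ);
	return 0;
}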

Of course, these time slices are derived from the static priority alone; at run time, sleeping processes receive some compensation (see the time slice initialization below).

Core of the O(n) scheduler algorithm

Selecting the process with the highest priority from the run queue:

	next = idle_task(this_cpu);	/* default: fall back to the idle task */
	c = -1000;			/* goodness of the idle task */
	list_for_each(tmp, &runqueue_head) {
		p = list_entry(tmp, struct task_struct, run_list);
		if (can_schedule(p, this_cpu)) {
			int weight = goodness(p, this_cpu, prev->active_mm);
			if (weight > c)
				c = weight, next = p;
		}
	}

The scheduler simply walks the runqueue entry by entry; c tracks the best weight seen so far and next the corresponding process, with the idle task as the fallback. The can_schedule() function checks whether a process may run on this_cpu and only matters on SMP systems.

The core of the algorithm is the goodness() function, which assigns each candidate a weight so that the process with the highest priority can be found.

static inline int goodness(struct task_struct * p, int this_cpu, struct mm_struct *this_mm)
{
	int weight;

	/*
	 * Non-RT process - normal case first.
	 */
	if (p->policy == SCHED_OTHER) {

		/* first approximation: the remaining time slice */
		weight = p->counter;
		if (!weight)
			goto out;

#ifdef CONFIG_SMP
		/* Give a largish advantage to the same processor...   */
		/* (this is equivalent to penalizing other processors) */
		if (p->processor == this_cpu)
			weight += PROC_CHANGE_PENALTY;
#endif

		/* .. and a slight advantage to the current MM */
		if (p->mm == this_mm || !p->mm)
			weight += 1;
		weight += 20 - p->nice;
		goto out;
	}

	/* real-time process: always beats any ordinary process */
	weight = 1000 + p->rt_priority;
out:
	return weight;
}
  • The branch above handles ordinary processes, i.e. the scheduling policy SCHED_OTHER. The starting weight is the remaining time slice; if weight == 0, the process has no time slice left, so it is skipped immediately.
  • On SMP systems, if the process last ran on the current CPU, its weight is boosted by PROC_CHANGE_PENALTY to exploit the still-warm cache, which effectively penalizes processes coming from other CPUs.
  • If the process shares its mm_struct with the current process, or is a kernel thread (no mm at all), the weight gets a small +1 bonus, since no address-space switch is needed.
  • So in the normal case, the dynamic priority of an ordinary process = remaining time slice + static priority (20 - nice). For example, a freshly refilled nice=0 process at HZ=100 has counter = 6, giving a weight of 6 + 20 = 26 (plus 1 with a shared mm).

Real-time processes are handled simply and bluntly: the weight is 1000 plus the process's rt_priority. Ordinary weights never get anywhere near 1000, so any runnable real-time process always beats every ordinary process, and real-time processes of different rt_priority are ordered among themselves.

weight = 1000 + p->rt_priority;

 

Process time slice initialization

As time goes by, every runnable process may run out of its time slice. At that point, the time slices of all processes need to be reinitialized.

	/* Do we need to re-calculate counters? */
	if (unlikely(!c)) {
		struct task_struct *p;

		spin_unlock_irq(&runqueue_lock);
		read_lock(&tasklist_lock);
		for_each_task(p)
			p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
		read_unlock(&tasklist_lock);
		spin_lock_irq(&runqueue_lock);
		goto repeat_schedule;
	}

That is, when no runnable process with a remaining time slice can be found in the run queue (c == 0), the counter of every process in the system is recalculated. A sleeping process did not use up its slice, so half of its leftover counter is carried over on top of the fresh NICE_TO_TICKS value. Halving the remainder keeps a frequently sleeping, I/O-bound process from accumulating an ever-growing priority.
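
The halving also puts a hard ceiling on what a sleeper can accumulate: since each recalculation performs counter = counter/2 + NICE_TO_TICKS(nice), the counter can never reach 2 * NICE_TO_TICKS(nice). A small simulation makes this visible (a sketch, again assuming HZ = 100):

#include <stdio.h>

#define TICK_SCALE(x)		((x) >> 2)	/* the HZ < 200 branch */
#define NICE_TO_TICKS(nice)	(TICK_SCALE(20-(nice))+1)

int main(void)
{
	int counter = 0, round;

	/* a nice=0 process that sleeps through every recalculation */
	for (round = 1; round <= 8; round++) {
		counter = (counter >> 1) + NICE_TO_TICKS(0);
		printf("round %d: counter = %d ticks\n", round, counter);
	}
	/* settles at 11 ticks, just below 2 * NICE_TO_TICKS(0) = 12 */
	return 0;
}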

Time slice update

On every tick interrupt, the kernel updates the time slice of the current process.

void update_process_times(int user_tick)
{
	struct task_struct *p = current;
	int cpu = smp_processor_id(), system = user_tick ^ 1;

	update_one_process(p, user_tick, system, cpu);
	if (p->pid) {
		if (--p->counter <= 0) {
			p->counter = 0;
			p->need_resched = 1;
		}
		if (p->nice > 0)
			kstat.per_cpu_nice[cpu] += user_tick;
		else
			kstat.per_cpu_user[cpu] += user_tick;
		kstat.per_cpu_system[cpu] += system;
	} else if (local_bh_count(cpu) || local_irq_count(cpu) > 1)
		kstat.per_cpu_system[cpu] += system;
}

When each tick interrupt arrives, counter is decremented by 1. When counter reaches 0, the time slice has been used up and the need_resched flag is set. At the next scheduling point, the kernel checks whether the current process has this flag set and, if so, calls the scheduler.
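
For context, here is a simplified view of that scheduling point. The real check happens on the return path from interrupts and system calls in the architecture's entry.S, so this is conceptual C, not literal kernel code:

/* on the way back from an interrupt or system call */
if (current->need_resched)
	schedule();	/* pick the next process via goodness() */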

Problems faced by the O(n) scheduler

  • The time complexity is O(n). Performance is fine with few processes, but as the number of processes in the system grows, the time needed to select the next process grows with it. Recalculating the time slices of all processes when none are runnable is also quite expensive when the system has many processes.
  • Poor SMP scalability. Picking the next process requires locking the entire run queue with spin_lock_irq(&runqueue_lock); the more processes in the system, the longer the critical section, and the longer the other CPUs waste time spinning.
  • Weak real-time behavior: real-time and ordinary processes live in the same list, so selecting a real-time process still requires scanning the whole list, which makes real-time processes not very "real-time".
  • Wasted CPU resources: there is only one runqueue in the system, so when the number of runnable processes is smaller than the number of CPUs, the remaining CPUs sit almost idle.
  • Cache problems: as the number of processes shrinks, a process that originally ran on CPU1 may be picked up by CPU2, whose cache lines hold none of its data, hurting efficiency.
  • In short, the O(n) scheduler has plenty of problems, but problems exist to be solved. That is why Linux 2.6 introduced the O(1) scheduler.


Origin: blog.csdn.net/longwang155069/article/details/104428914