CFS Scheduler (Completely Fair Scheduler)

Earlier we covered the implementation principles of the O(n) and O(1) schedulers, together with the defects and problems each of them faces. Broadly speaking, the O(1) scheduler solved the problems the O(n) scheduler could not, and its behavior on servers was acceptable. But as mobile devices became widespread, stuttering and latency problems became increasingly visible, and that is what led to the emergence of the CFS scheduler.

 

In this section we focus on the implementation of the CFS scheduler. Before reading the CFS code, we first look at the principles behind it: once you understand where it came from and why it is designed the way it is, you have essentially mastered the CFS scheduler.

 

Introduction to CFS

The Completely Fair Scheduler (CFS) was first merged in Linux 2.6.23, released in 2007, and has been the default scheduler ever since. The kernel documentation file sched-design-CFS.txt gives a brief introduction to the CFS scheduler.

80% of CFS's design can be summed up in a single sentence: CFS basically models
an "ideal, precise multi-tasking CPU" on real hardware.

In other words, 80% of CFS's design can be summed up in one sentence: CFS models an "ideal, precise multi-tasking CPU" on real hardware.

So what does "ideal, precise multi-tasking CPU" mean, and how should we understand it? The kernel documentation explains it with an example:

"Ideal multi-tasking CPU" is a (non-existent  :-)) CPU that has 100% physical power and which can run each task at precise equal speed, in parallel, each at 1/nr_running speed.  For example: if there are 2 tasks running, then it runs
each at 50% physical power --- i.e., actually in parallel.

In other words, an "ideal multi-tasking CPU" runs every task in parallel at 1/nr_running of the CPU's speed, i.e. all runnable processes share the CPU equally and at the same time. For example, with two running processes, each gets 50% of the CPU.

As an example:

Two batch processes share a total of 10 ms of CPU time.

Actual situation: on real hardware the processes run one after the other, each for 5 ms, occupying 100% of the CPU while it runs.

Ideal situation: both processes run in parallel for the whole 10 ms, each at 50% of the CPU.

The ideal situation is the "ideal multi-tasking CPU" that CFS models.

The ideal situation is impossible on a single-core CPU (a CPU with only one processing core), because only one process can run at a time while the other waits. So how does CFS approximate completely fair scheduling on real hardware?

 

How CFS achieves complete fairness

In the O(n) and O(1) schedulers, the timeslice is assigned according to priority, and these time slices are fixed; for example, in the O(n) scheduler the time slice for nice 0 is about 60 ms. In the CFS scheduler there is no longer a fixed time slice. Instead, the time a process may run is calculated from the set of processes that are currently runnable in the system.

In the O(n) and O(1) schedulers, an ordinary process obtains its time slice from its nice value: the smaller the nice value (i.e. the higher the priority), the larger the time slice and the more opportunity to run. The CFS scheduler instead introduces the concept of weight. The nice value is converted into a corresponding weight, and the higher the priority, the greater the weight, which means more CPU time can be obtained.

                    CPU time share of a process = process weight / total weight of all runnable processes

CFS achieves fairness over a period of time: the share of CPU time a process receives is the ratio of its weight to the total weight of all runnable processes.

Example: 10 ms of total CPU time on a single-core CPU.

  • The priorities of the processes are the same:

If two processes have the same priority, their weights are the same and each gets 5 ms of CPU time; with 5 such processes, each gets 2 ms; with 10 processes, each gets 1 ms.

  • The priorities of the processes are different:

If the priorities differ, for example process A has nice 0 and process B has nice 1, then A has the higher priority and the larger weight. Suppose A has a weight of 6 and B a weight of 4: A then gets 6/10 of the CPU time (6 ms) and B gets 4/10 (4 ms).

In this way fairness is achieved: each process occupies a share of CPU time proportional to its own weight.
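To make this concrete, here is a minimal userspace sketch of the proportional split (the function name cpu_share_ms and the 10 ms period simply mirror the example above; this is illustrative code, not part of the kernel):

#include <stdio.h>

/* Illustrative only: split a fixed period among tasks in proportion
 * to their weights, which is what CFS does conceptually. */
static unsigned int cpu_share_ms(unsigned int period_ms,
                                 unsigned int weight,
                                 unsigned int total_weight)
{
        return period_ms * weight / total_weight;
}

int main(void)
{
        unsigned int period_ms = 10;
        unsigned int wa = 6, wb = 4;            /* weights of A and B */
        unsigned int total = wa + wb;

        printf("A gets %u ms, B gets %u ms\n",
               cpu_share_ms(period_ms, wa, total),
               cpu_share_ms(period_ms, wb, total));  /* A: 6 ms, B: 4 ms */
        return 0;
}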

Let's use an analogy from everyday life:

A company hands out year-end bonuses. The total bonus pool of a department (the CPU time) is fixed. To be "fair", the boss does not give everyone the same amount, because that would actually be unfair; instead the bonus is measured by each person's performance and attitude at work (the weight of the process). For example, Zhang works hard, often stays late and goes on business trips, so his year-end bonus (the CPU time the process receives) is large. Liu is often late and leaves early, so his year-end bonus is small. That is what real fairness looks like.

 

How the CFS scheduler selects processes

The goal of CFS is to achieve fairness over a period of time by dividing CPU time according to process weight: the greater the weight, the larger the share of CPU time, and the more often the process gets the CPU.

CFS selects the process to run by means of the process's virtual run time, vruntime, which is calculated as follows:

vruntime = (wall_time * nice_0_weight) / weight

Here wall_time is the actual (wall-clock) time the process has run, nice_0_weight is the weight corresponding to nice 0 (1024, called NICE_0_LOAD in the kernel), and weight is the weight of the process. In other words, vruntime is the actual running time scaled by the ratio of the nice-0 weight to the process's own weight.
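As a rough sketch of how this formula is applied per accounting interval (the real kernel does this with fixed-point arithmetic in calc_delta_fair(); the macro NICE_0_WEIGHT and the function vruntime_delta below are invented names for illustration):

/* Illustrative sketch, not the kernel implementation. */
#define NICE_0_WEIGHT 1024ULL

/* Convert an interval of real run time into virtual run time.
 * A heavier (higher-priority) task accumulates vruntime more slowly. */
static unsigned long long vruntime_delta(unsigned long long wall_time_ns,
                                         unsigned long weight)
{
        return wall_time_ns * NICE_0_WEIGHT / weight;
}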

/*
 * Nice levels are multiplicative, with a gentle 10% change for every
 * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
 * nice 1, it will get ~10% less CPU time than another CPU-bound task
 * that remained on nice 0.
 *
 * The "10% effect" is relative and cumulative: from _any_ nice level,
 * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
 * it's +10% CPU usage. (to achieve that we use a multiplier of 1.25.
 * If a task goes up by ~10% and another task goes down by ~10% then
 * the relative distance between them is ~25%.)
 */
const int sched_prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

This table maps nice values to weights. It is precomputed, so when the kernel needs a process's weight (for example to calculate vruntime), it simply indexes the table by nice value.

As the comment explains, raising the nice value by one level gives the task about 10% less CPU time, and lowering it by one level gives about 10% more; adjacent weights therefore differ by a factor of roughly 1.25.
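Since the table covers nice values from -20 to +19, converting a nice value into a weight is just an array lookup with an offset of 20. A tiny illustrative helper (the name weight_of_nice is made up, not a kernel function):

/* Illustrative helper: look up the weight for a nice value in -20..19.
 * sched_prio_to_weight[] is the table shown above, indexed from nice -20. */
static int weight_of_nice(int nice)
{
        return sched_prio_to_weight[nice + 20];   /* nice 0 -> index 20 -> 1024 */
}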

From the vruntime formula above, the virtual time of a nice-0 process is equal to its physical (wall-clock) time. The larger a process's weight, the more slowly its vruntime accumulates; the smaller the weight, the faster its vruntime accumulates.

The principle of CFS is to always select the process with the smallest vruntime to run next. Because a larger weight (higher priority) makes vruntime grow more slowly, such a process stays at the front of the queue longer and therefore ends up with more CPU time.

 

For example: over a period of 6 ms there are 3 processes. Process A has weight 1024, process B has weight 335, and process C has weight 3121. Suppose each has run for 2 ms of wall-clock time:

Process A: vruntime = (2 ms * 1024) / 1024 = 2 ms, CPU share = 1024 / (1024 + 335 + 3121) ≈ 22%

Process B: vruntime = (2 ms * 1024) / 335 ≈ 6.1 ms, CPU share = 335 / (1024 + 335 + 3121) ≈ 7%

Process C: vruntime = (2 ms * 1024) / 3121 ≈ 0.65 ms, CPU share = 3121 / (1024 + 335 + 3121) ≈ 70%

As can be seen

  1. The CPU shares differ widely because the three weights sit 5 nice levels apart in the table (3121 is nice -5, 1024 is nice 0, 335 is nice +5), and each nice step changes the CPU share by roughly 10%
  2. The greater the weight of a process, the greater the denominator in the formula, the smaller its vruntime, and the more likely it is to be picked the next time the scheduler runs
  3. The virtual time of a nice-0 process (weight 1024) is the same as its physical time
  4. Equivalently, the greater the weight, the more slowly the process's virtual clock advances
  5. The smaller the weight, the faster its virtual clock advances, so its vruntime grows quickly and it is picked less often
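The numbers above are easy to reproduce with a short stand-alone program (purely illustrative; the weights and the assumed 2 ms of wall-clock run time are taken from the example):

#include <stdio.h>

int main(void)
{
        const char *name[]     = { "A", "B", "C" };
        unsigned long weight[] = { 1024, 335, 3121 };   /* nice 0, +5, -5 */
        unsigned long long wall_ns = 2000000ULL;        /* each ran 2 ms */
        unsigned long total = 1024 + 335 + 3121;        /* 4480 */

        for (int i = 0; i < 3; i++) {
                unsigned long long vrt = wall_ns * 1024ULL / weight[i];
                printf("%s: vruntime = %.2f ms, CPU share = %.1f%%\n",
                       name[i], vrt / 1e6, 100.0 * weight[i] / total);
        }
        return 0;
}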

 

Scheduling period (sched_period)

As mentioned earlier, the CPU time a process gets is determined by the ratio of its weight to the total weight of all runnable processes in the system:

                    CPU time share of a process = process weight / total weight of all runnable processes

For example, two processes of the same priority sharing 10 ms get 5 ms each. As the number of runnable processes in the system grows, the CPU time each one receives becomes smaller and smaller, approaching zero. That leads to frequent context switching between processes, and most of the CPU time ends up being spent on context switches, reducing overall system efficiency.

To address this problem, CFS introduces a scheduling period, calculated as follows:

/*
 * The idea is to set a period in which each task runs once.
 *
 * When there are too many tasks (sched_nr_latency) we have to stretch
 * this period because otherwise the slices get too small.
 *
 * p = (nr <= nl) ? l : l*nr/nl
 */
static u64 __sched_period(unsigned long nr_running)
{
	if (unlikely(nr_running > sched_nr_latency))
		return nr_running * sysctl_sched_min_granularity;
	else
		return sysctl_sched_latency;
}

static unsigned int sched_nr_latency = 8;
unsigned int sysctl_sched_latency			= 6000000ULL;
unsigned int sysctl_sched_min_granularity			= 750000ULL;

As the comment says, the idea is to set a period within which every runnable task runs once. When the number of tasks in the system grows, the period has to be stretched, otherwise the slices would get too small.

When there are no more than 8 runnable processes, the scheduling period equals the scheduling latency, 6 ms. When there are more than 8, the period equals the number of processes times 0.75 ms. sysctl_sched_min_granularity can be understood as a guarantee that a process runs for at least 0.75 ms within a scheduling period.
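A small userspace sketch of the same calculation, together with the per-task slice it implies when all tasks have equal weight (the constants mirror the kernel defaults quoted above; the macro and function names are invented for illustration):

#include <stdio.h>

#define SCHED_NR_LATENCY        8
#define SCHED_LATENCY_NS        6000000ULL    /* 6 ms    */
#define SCHED_MIN_GRANULARITY   750000ULL     /* 0.75 ms */

/* Mirror of __sched_period(): stretch the period once there are
 * more than 8 runnable tasks so each slice stays at least 0.75 ms. */
static unsigned long long period_ns(unsigned long nr_running)
{
        if (nr_running > SCHED_NR_LATENCY)
                return nr_running * SCHED_MIN_GRANULARITY;
        return SCHED_LATENCY_NS;
}

int main(void)
{
        unsigned long nr[] = { 4, 8, 16 };

        for (int i = 0; i < 3; i++) {
                unsigned long long p = period_ns(nr[i]);
                printf("%lu tasks: period %.2f ms, equal-weight slice %.2f ms\n",
                       nr[i], p / 1e6, p / 1e6 / nr[i]);
        }
        return 0;   /* 4 -> 6/1.5 ms, 8 -> 6/0.75 ms, 16 -> 12/0.75 ms */
}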

 

CFS summary:

  • In the O(n) and O(1) schedulers, fixed time slices are assigned according to the nice value; in CFS there is no fixed time slice
  • In the CFS scheduler the weight of a process is derived from its static priority, and the weight represents the proportion of CPU time the process should receive
  • The virtual time vruntime of a process is calculated from its weight and its actual running time
  • When a process is added to the run queue, the scheduler adjusts its vruntime so that fairness is maintained
  • On every scheduling decision, CFS picks the process with the smallest virtual time in the run queue; as that process runs, its vruntime grows
  • When the next scheduling decision is made, the process that now has the smallest vruntime is chosen, and the previously running process is re-inserted into the run queue at a position determined by its vruntime
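To tie the summary together, here is a deliberately simplified sketch of the "always run the smallest vruntime" rule. The real CFS keeps scheduling entities in a red-black tree ordered by vruntime and picks the leftmost one; the struct and functions below (task_stub, pick_next, account) are invented, and a linear scan replaces the tree purely to keep the sketch short:

#include <stddef.h>

/* Invented, minimal stand-in for the scheduler's bookkeeping. */
struct task_stub {
        unsigned long long vruntime;
        unsigned long      weight;
};

/* Pick the runnable task with the smallest vruntime.
 * CFS does this via a red-black tree; a linear scan keeps the sketch short. */
static struct task_stub *pick_next(struct task_stub *rq, size_t n)
{
        struct task_stub *best = NULL;

        for (size_t i = 0; i < n; i++)
                if (!best || rq[i].vruntime < best->vruntime)
                        best = &rq[i];
        return best;
}

/* After the chosen task runs for delta_ns of real time, its vruntime
 * advances by delta_ns * 1024 / weight, so it sinks back into the queue. */
static void account(struct task_stub *t, unsigned long long delta_ns)
{
        t->vruntime += delta_ns * 1024ULL / t->weight;
}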