The wonder of the CFS scheduler: everything is a trade-off

I once wrote an article called "Still can't write an article about CFS". Back then I was content to quietly enjoy how harmonious CFS is, but it would be a pity not to write down a few fleeting insights. They are less insights than moments of understanding: when something suddenly clicks, it is best to write it down at once, otherwise after a long time it becomes hard to understand again.

The CFS scheduler was introduced in kernel 2.6.23. Its first implementation was naive and simple: it used fair_key to represent a queue of virtual clocks. Still, it was a working version of CFS. From 2.6.25 onward CFS became much simpler. The virtual-clock queue structure is gone, and its concept is folded directly into the vruntime field of the scheduling entity at the lower-left corner of the red-black tree. In effect the vruntime values of all scheduling entities chase one another (with group scheduling an entity is not necessarily a process, but from here on I will simply say process), and at some point they are normalized, that is, they reach the same value.

So how are different priorities expressed? This is the most wonderful part of CFS. CFS does not talk about priority at all; everything is expressed as a weight. For processes with different weights, the virtual clock advances at a different pace over the same stretch of real time, that is, their vruntime increments differ. Specifically, CFS uses an empirical factor of about 1.25: each time a process's nice value increases by 1, its weight is divided by roughly 1.25 (nice 0 has weight 1024, nice 1 has weight 820). Using one consistent reference makes the multiplication and division easier, and it also gives a simple linear relationship between a process's dynamic time slice and its weight. To work out how far each process's virtual clock advances, CFS uses a simple formula:

Pace of virtual clock (vruntime increment) = (actual run time) * (weight of a nice 0 process / weight of this process) [where the weight of a nice 0 process is 1024]
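To make the formula concrete, here is a minimal sketch in C. The name toy_vruntime_delta and the constant TOY_NICE_0_WEIGHT are hypothetical, and the plain integer division is a simplification; this is not the kernel's own arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NICE_0_WEIGHT 1024u  /* weight of a nice 0 process, as stated in the text */

/*
 * Virtual-clock advance for a process that ran for delta_exec_ns of real time.
 * Hypothetical helper: plain integer division, no fixed-point rounding tricks.
 */
uint64_t toy_vruntime_delta(uint64_t delta_exec_ns, uint32_t weight)
{
    assert(weight > 0);  /* a zero weight would divide by zero */
    return delta_exec_ns * TOY_NICE_0_WEIGHT / weight;
}
```

With this sketch a process of weight 2048 that runs for 1000 ns advances its virtual clock by 500, while a process of weight 512 advances by 2000: the heavier the process, the slower its virtual clock.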

This formula is very simple, and it gives CFS a value on which to base fair scheduling: the scheduler always picks the process with the smallest vruntime and runs it. For how long? This is another wonderful thing about CFS. I mean the implementation in kernels from 2.6.25 on; I cannot see this elegance in 2.6.23, because early CFS used a more complicated calculation. So how long does the process run? Again there is a formula:

Process run time = (system scheduling period) * (weight of this process / total weight of the run queue)
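The same formula as a minimal C sketch. The name toy_sched_slice is hypothetical and the arithmetic is plain integer math, so take it as an illustration of the proportion rather than the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of one scheduling period that a process receives.
 * period_ns:    length of the scheduling period in nanoseconds
 * weight:       weight of this process
 * total_weight: sum of the weights of all processes on the run queue
 */
uint64_t toy_sched_slice(uint64_t period_ns, uint32_t weight, uint64_t total_weight)
{
    assert(total_weight > 0 && weight <= total_weight);
    return period_ns * weight / total_weight;
}
```

For three processes with weights 1, 2 and 3 and an 18 ms period, the slices come out as 3 ms, 6 ms and 9 ms, which together add up to the whole period.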

The system scheduling period in the formula above is a span of real time, for example 20 ms on an ordinary desktop system (for 5 processes or fewer). What does that mean? It means the nr processes on the queue share these 20 ms fairly. Fair in what sense? By weight: the higher the weight, the larger the time slice, and the time slices of all processes add up to the scheduling period. What about more than five processes? Easy: stretch the scheduling period proportionally, by the ratio (actual number of processes / 5). All of this amounts to one thing: within each scheduling period every process is guaranteed to run. The end result is that at the end of every scheduling period the virtual clocks of all processes have caught up with one another, in other words all vruntime values are equal again, and the next round of time-slice division begins.

This solves a big problem, namely starvation. Under the earlier O(1) scheduler processes could starve. Why? First, starvation depended on the number of processes. Time slices were fixed, determined by priority and HZ, so with many processes even a high-priority process that had moved to the expired queue could starve. After all, the low-priority processes in the active queue had to be served too, otherwise they would starve, and that left the scheduler in a dilemma. Second, there was priority adjustment and the detection of interactive processes: an interactive process was not moved to the expired queue, which could starve the processes already waiting there. So the Linux scheduler simply added a starvation detection mechanism. That treats the symptom rather than the cause, and in any case the solution looks forced.

One more point: putting an interactive process back into the active queue is not necessarily what causes starvation. An interactive process actually takes part in scheduling very rarely. Most of the time it is waiting for the keyboard or the mouse, and however fast you type you are not even 1/1000 as fast as the CPU clock. So an interactive process can only be blamed if starvation happens to exist at the very moment it is requeued. Starvation detection lets the interactive process escape that blame, but the meaning of the detection formula is rather scary; I suspect that even its author would find it hard to explain clearly.

So who is at fault? The design of the O(1) scheduler. In essence the O(1) scheduler only has the advantage when picking a process. Once the number of processes grows and their behaviour becomes complicated, everything gets out of control. Picking, inserting, enqueueing and dequeueing are all first-class designs, with their time complexity reduced to O(1). But scheduling is not only about enqueueing, dequeueing and picking; it also has to guarantee fairness, and everything is a trade-off. Because the O(1) scheduler does so well at picking, enqueueing and dequeueing, it has to pay for it in fairness, which is why its fairness mechanisms are so complex and so out of harmony with the rest.

CFS solves all of this neatly. CFS relies on the ordering properties of the red-black tree, and the effect is excellent. It also takes care of processes that are not running: the virtual time of the running process always moves forward, so the virtual time of the waiting processes falls behind it, and the scheduler always picks the process that is furthest behind. The wonderful part is that all of this is settled within one scheduling period. CFS is a balance: its enqueue and dequeue operations are not as fast as those of the O(1) scheduler, but on the whole it is very harmonious and very efficient.
If the text is unclear, look at the following table:

Real clock   Process 1 (weight 1)   Process 2 (weight 2)   Process 3 (weight 3)
pace         virtual clock pace     virtual clock pace     virtual clock pace

1/6          2/6 *                  0                      0
2/6          2/6                    1/6 *                  0
3/6          2/6                    1/6                    1/9 *
4/6          2/6                    1/6                    2/9 *  (vruntime still minimal, so process 3 keeps running)
5/6          2/6                    2/6 *                  2/9
6/6          2/6                    2/6                    3/9 *
7/6          4/6 *                  2/6                    3/9
8/6          4/6                    3/6 *                  3/9
9/6          4/6                    3/6                    4/9 *
10/6         4/6                    3/6                    5/9 *
11/6         4/6                    4/6 *                  5/9
12/6         4/6                    4/6                    6/9 *
13/6         6/6 *                  4/6                    6/9
14/6         6/6                    5/6 *                  6/9
15/6         6/6                    5/6                    7/9 *
16/6         6/6                    5/6                    8/9 *
17/6         6/6                    6/6 *                  8/9
18/6         6/6                    6/6                    9/9 *

Chart 1. Example of the CFS principle (the running path, shown in bold in the original table, is marked with * here)

As the table shows, processes 1, 2 and 3 have weights 1, 2 and 3, and their virtual clocks return to the same value at the end of each real-time interval. The number of times each process runs is proportional to its weight. Here the virtual clock of process 2 advances at the same pace as the real clock, so process 2 serves as the reference.
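The table can be reproduced with a small simulation. The sketch below is only a toy under stated assumptions: the names (toy_proc, toy_pick_next, toy_simulate) are hypothetical, the virtual clock is kept in whole units of 1/18 so that no fractions are needed, and ties are broken in favour of the lowest index, which happens to match the order in the table.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SCALE 6u  /* hypothetical unit: one tick at weight w adds TOY_SCALE / w */

struct toy_proc {
    unsigned weight;    /* 1, 2 or 3 in the table */
    unsigned vruntime;  /* virtual clock, in units of 1/18 of the table's values */
    unsigned runs;      /* number of ticks in which this process was chosen */
};

/* Index of the process with the smallest vruntime; ties go to the lowest index. */
size_t toy_pick_next(const struct toy_proc *p, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (p[i].vruntime < p[best].vruntime)
            best = i;
    return best;
}

/* Pick the most backward process and run it for one tick, `ticks` times in a row. */
void toy_simulate(struct toy_proc *p, size_t n, unsigned ticks)
{
    assert(n > 0);
    for (unsigned t = 0; t < ticks; t++) {
        size_t i = toy_pick_next(p, n);
        assert(p[i].weight > 0 && TOY_SCALE % p[i].weight == 0);
        p[i].vruntime += TOY_SCALE / p[i].weight;
        p[i].runs++;
    }
}
```

Starting three processes with weights 1, 2 and 3 at vruntime 0 and simulating 18 ticks gives 3, 6 and 9 runs respectively, and all three virtual clocks end at 18 units, which corresponds to the last row of the table.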

Yes, CFS solves everything neatly. So how does CFS compensate a sleeping process? Note that CFS no longer distinguishes between I/O sleep and interactive sleep, because the scheduling period guarantees that every process gets to run within a bounded time. Under the O(1) scheduler interactive processes had to be compensated for fear of starving them, and a starved interactive process means slow response. None of that is needed now. Compensating a process that has slept is still worthwhile, though, because it may have more urgent work to do; traditional UNIX schedulers do the same. A more general reason is that after waking up one is usually refreshed and ought to get a bit more work done. Here is the Linux CFS code that compensates a process on wakeup:

if (sched_feat(NEW_FAIR_SLEEPERS)) {
        unsigned long thresh = sysctl_sched_latency;    // the scheduling period
        if (sched_feat(NORMALIZED_SLEEPER))
                thresh = calc_delta_fair(thresh, se);   // scale the compensation by weight to keep it fair
        vruntime -= thresh;                             // compensate the woken process
}
vruntime = max_vruntime(se->vruntime, vruntime);        // make sure the virtual clock never goes backwards

What does "keep it fair" in the comment above mean? It means that a process receives the same number of compensating runs whatever its weight: a process that has just woken up is guaranteed n extra runs, where n is the number of times a nice 0 process (weight 1024) runs within one scheduling period.

Why adjust the vruntime of a woken process at all, if the aim is simply to compensate it? Not so fast! If compensation were the only aim, the best approach would be to keep the vruntime the process had before it went to sleep. But if it slept for a long time, that vruntime is far smaller than the vruntime of every process currently on the run queue. The newly woken process would then chase the others desperately, and because CFS insists on fairness, all the other processes would wait for it. The other processes would starve while it occupies the CPU for a long time.

Things must not be taken to extremes. Since the process slept, compensation is due, but so is a penalty. The penalty is to take away part of the lead its old vruntime gives it by moving its vruntime forward. Not all of the lead is taken away, because the process still needs to get more done, so the number of times it runs is increased correspondingly; by the author's estimate the other processes are held up for at most about three of its runs. This is a trade-off, a compromise between starvation and reward.
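The rule can be sketched as a standalone function. This is a minimal sketch under assumptions, not the kernel code: the names toy_max_vruntime and toy_place_woken are hypothetical, and it assumes that the local vruntime in the excerpt starts out as the run queue's minimum vruntime, which the excerpt itself does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Later of two virtual times, compared through a signed difference so wraparound is tolerated. */
uint64_t toy_max_vruntime(uint64_t a, uint64_t b)
{
    return (int64_t)(b - a) > 0 ? b : a;
}

/*
 * vruntime for a process that has just woken up.
 * old_vruntime: the value it had when it went to sleep
 * min_vruntime: current minimum of the run queue (assumed starting point)
 * thresh:       the compensation, one scheduling period in the excerpt
 */
uint64_t toy_place_woken(uint64_t old_vruntime, uint64_t min_vruntime, uint64_t thresh)
{
    assert(thresh <= (uint64_t)INT64_MAX);            /* keep the signed comparison meaningful */
    uint64_t vruntime = min_vruntime - thresh;        /* reward: start slightly behind the queue */
    return toy_max_vruntime(old_vruntime, vruntime);  /* penalty: an older, smaller value is discarded */
}
```

With a queue minimum of 1000 and a compensation of 200, a process that fell asleep long ago at vruntime 100 is placed at 800 rather than 100, while a process that slept only briefly at 950 keeps its 950.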

The theory is simple and should be easy to follow. But when you read the code you find that the key used to insert a process into the red-black tree is not the process's vruntime but se->vruntime - cfs_rq->min_vruntime. How can that be explained? Look at the type of vruntime: it is unsigned (u64 in the kernel source). Now look at the type of the key: it is signed. A process's virtual time only ever increases, so it is never negative, but it does have an upper limit, the largest value the unsigned type can represent. When it overflows, it wraps around and starts again from 0. What happens then? The consequences are serious, because the order is turned upside down. The following unsigned char example illustrates the problem:

unsigned char a = 251, b = 254;
b += 5;
// now compare a and b

In the example above b has wrapped around, so a now appears far larger than b, whereas in reality b should be 8 larger than a. How do we recover the real result? Change the code as follows:

unsigned char a = 251, b = 254;
b += 5;
signed char c = a - 250, d = b - 250;
// now compare c and d
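The two fragments above can be combined into a small self-contained check. The helper names (toy_key8, toy_before8, toy_wrap_demo) are hypothetical, and the base 250 is simply the value used in the fragment; any base close to both values would do.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Key of an 8-bit counter value relative to a nearby base.
 * The subtraction is reduced modulo 256 and then reinterpreted as signed.
 * (Converting an out-of-range value to int8_t is implementation-defined
 * in older C standards; mainstream compilers wrap as expected.)
 */
int8_t toy_key8(uint8_t value, uint8_t base)
{
    return (int8_t)(uint8_t)(value - base);
}

/* Nonzero if a comes before b when both are measured from base. */
int toy_before8(uint8_t a, uint8_t b, uint8_t base)
{
    return toy_key8(a, base) < toy_key8(b, base);
}

/* The example from the text: b wraps from 254 to 3, yet still sorts after a. */
void toy_wrap_demo(void)
{
    uint8_t a = 251, b = 254;
    b += 5;                            /* 259 wraps to 3 */
    assert(a > b);                     /* the raw comparison gets the order wrong */
    assert(toy_before8(a, b, 250));    /* keys 1 and 9: the order is right again */
}
```

Here the key of a is 1 and the key of b is 9, so the difference of 8 survives the wraparound.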

Now the result is correct: c is 1 and d is 9, so d is larger by 8, which is the effect we want. But why does the kernel not handle overflow of the unsigned vruntime itself? Because the only job of vruntime is to advance the virtual clock. It has no other use, so an overflow there does no harm. Overflow does have to be handled when the red-black tree key is computed, however, so the minimum vruntime is subtracted. All process keys then cluster around the minimum vruntime, which makes them much easier to track. The role of min_vruntime is to deal with overflow on the run queue, and its main function is to track the lower-left (leftmost) process of the red-black tree at any time.

Now consider what happens when a new process is inserted. Put simply, the new arrival delays the progress of the whole CFS run queue. Where should the process be placed? What should its key in the red-black tree be? To let the new process run soon, it should be added near the lower left, but can it take the leftmost position itself? No, because it has no equal footing on which to displace the process that most deserves to run. The smallest vruntime currently in the tree, min_vruntime, already identifies the process that most deserves to run next, and it is also possible that the vruntime of the currently running process is min_vruntime. The solution is therefore to add some value to min_vruntime and use the result as the vruntime of the new process. What value? Clearly:

static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
        return calc_delta_fair(sched_slice(cfs_rq, se), se);
}

sched_slice computes how far the process se should advance the real clock in each run, and calc_delta_fair converts that into how far its virtual clock should advance. For process 2 in the table the result is 1/6, and for process 3 it is 1/9. In other words, the virtual clock advance up to this point is written off, and competition starts afresh with the new process included. When is "next"? It is the moment the currently running process completes its current stretch of virtual time; from then on the new process takes part in the competition. That is all there is to it, and from then on everything is at peace.

Look at how __update_curr updates the vruntime of each process: it calls calc_delta_fair on an interval of real time. In sched_vslice the interval passed in is sched_slice(cfs_rq, se). That is to say, the CFS scheduler treats the new process as if it had already run for sched_slice(cfs_rq, se), and sched_slice(cfs_rq, se) is the ideal duration of each run of this new process. The so-called pre-run commitment is to regard the process as having run for one sched_slice and only then insert it into the red-black tree to be weighed. The idea is that, to be fair, all processes share the same virtual clock. Unfairness arises only after a process has run for its sched_slice, and only when unfairness has actually arisen does the CFS scheduler compensate the process that most deserves it.

That is how things happen in normal operation as well. After a process has run for its sched_slice, a reschedule takes place. In schedule, put_prev_task is called before pick_next_task, and it enqueues the current process again with a key defined by the difference between its vruntime and the min_vruntime of cfs_rq. This is the weighing that follows unfairness. Put differently, before a process can be scheduled, that is, before it can be weighed, it must be enqueued, and it is only enqueued after it has run. So when a new process is inserted, the CFS scheduler considers it to have already run. For how long? For just long enough to let the CFS scheduler weigh it, which is sched_slice.
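The placement of a new process can be sketched as follows. This is a minimal sketch, assuming the new process is given min_vruntime plus one virtual slice as described above; the names toy_vslice, toy_place_new and TOY_NICE0 are hypothetical, and the scheduling period and total weight are passed in directly instead of being read from a run queue.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NICE0 1024u  /* weight of a nice 0 process */

/* Virtual length of one ideal run: the real slice scaled by TOY_NICE0 / weight. */
uint64_t toy_vslice(uint64_t period_ns, uint32_t weight, uint64_t total_weight)
{
    assert(weight > 0 && total_weight >= weight);
    uint64_t slice_ns = period_ns * weight / total_weight;  /* real time, the role of sched_slice */
    return slice_ns * TOY_NICE0 / weight;                   /* virtual time, the role of calc_delta_fair */
}

/* Starting vruntime of a new process: one virtual slice past the queue minimum. */
uint64_t toy_place_new(uint64_t min_vruntime, uint64_t period_ns,
                       uint32_t weight, uint64_t total_weight)
{
    return min_vruntime + toy_vslice(period_ns, weight, total_weight);
}
```

For example, with a 20 ms period, a new process of weight 1024 on a queue whose total weight is 2048 has a real slice of 10 ms, so it starts 10,000,000 ns of virtual time after min_vruntime.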

Now look at the starvation question in CFS. If vruntime were used directly as the key, CFS could in theory guarantee that no starvation occurs. But the key is computed as the difference between two values that both keep increasing. How can we be sure that a process whose vruntime is large does not end up with a small difference and jump the queue? It is precisely the timely update of min_vruntime and the normalization step that prevent this. Remember that at insertion time every comparison uses the same min_vruntime. Subtracting the same value from both sides leaves the ordering unchanged, so no starvation results, as in the following snippet:

while (*link) {
        parent = *link;
        entry = rb_entry(parent, struct sched_entity, run_node);
        if (key < entity_key(cfs_rq, entry)) {
...
}
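The ordering argument can be checked in 64 bits with a small sketch. The names toy_entity_key and toy_leftmost are hypothetical, and a linear scan over an array stands in for the red-black tree walk; the point is only that keys taken relative to one shared min_vruntime keep their order even when a vruntime has wrapped around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed key relative to the queue minimum, mirroring se->vruntime - cfs_rq->min_vruntime. */
int64_t toy_entity_key(uint64_t vruntime, uint64_t min_vruntime)
{
    return (int64_t)(vruntime - min_vruntime);
}

/* Index of the entity with the smallest key, i.e. the one a leftmost lookup would return. */
size_t toy_leftmost(const uint64_t *vruntime, size_t n, uint64_t min_vruntime)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (toy_entity_key(vruntime[i], min_vruntime) <
            toy_entity_key(vruntime[best], min_vruntime))
            best = i;
    return best;
}
```

Take min_vruntime = UINT64_MAX - 10, one entity at UINT64_MAX - 5 and another 20 units later, which wraps around to 14. Compared as raw values the second looks smaller, but their keys are 5 and 25, so the first entity is still the leftmost one.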

The scheduling period of the CFS scheduler is configurable. On a desktop system responsiveness matters more than raw performance, so a small scheduling period of 20 ms is appropriate; on a server the period can be raised to as much as 1 s.
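To tie the configured value back to the rule described earlier (20 ms shared by up to five processes, then stretched by the ratio of the process count to 5), here is a minimal sketch. The names toy_sched_period, TOY_LATENCY_NS and TOY_NR_LATENCY are hypothetical, and the constants are the desktop figures quoted in this article rather than values read from a running kernel.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LATENCY_NS 20000000ull  /* 20 ms base period from the text */
#define TOY_NR_LATENCY 5u           /* up to 5 processes share the base period */

/* Scheduling period for nr_running processes, stretched by nr_running / 5 past the threshold. */
uint64_t toy_sched_period(unsigned nr_running)
{
    assert(nr_running > 0);
    uint64_t period = TOY_LATENCY_NS;
    if (nr_running > TOY_NR_LATENCY)
        period = period * nr_running / TOY_NR_LATENCY;
    return period;
}
```

With these figures 3 or 5 processes share 20 ms, while 10 processes share 40 ms, so each process still gets a turn within one period.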

Origin blog.csdn.net/yiyeguzhou100/article/details/103935947