Operating System CPU Scheduling Algorithms

1. Linux scheduling algorithm

CFS (Completely Fair Scheduler) was introduced in the 2.6 kernel, specifically 2.6.23; starting from that version, the kernel uses CFS as its default scheduler. CFS does not compute priorities directly. Instead, it decides which process to schedule by tracking the CPU time each process has consumed (virtual CPU time, normalized by weight), thereby achieving so-called fairness.

  • Absolute fairness:
    CFS defines a new model. The basic idea is simple: it treats the CPU as a resource and records how much of that resource each process has consumed. When scheduling, the scheduler always picks the process that has consumed the least, hence the name "complete fairness". But absolute fairness is sometimes unfair, because some processes' work is more important than others', so we want to allocate CPU resources according to weights.

  • Relative fairness:
    To distinguish processes of different priorities, running time is allocated according to each process's weight. The formula for the running time allocated to a process is: running time = scheduling period * process weight / sum of the weights of all runnable processes.
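
To make the formula concrete, here is a tiny sketch in C (the 20 ms scheduling period and the three weights are made-up example values; they correspond to nice 0, 1, and 5 in the table shown further below):

/* slice.c: toy illustration of
 *   running time = scheduling period * process weight / sum of all weights */
#include <stdio.h>

int main(void) {
    const double period_ms = 20.0;            /* assumed scheduling period */
    const int weight[3] = { 1024, 820, 335 }; /* hypothetical task weights */
    int total = 0;
    for (int i = 0; i < 3; i++) total += weight[i];
    for (int i = 0; i < 3; i++)
        printf("task %d gets %.2f ms of the %.0f ms period\n",
               i, period_ms * weight[i] / total, period_ms);
    return 0;
}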

Implementation principle:
Each CPU has a run queue, cfs_rq, holding the processes in the ready state. Linux uses a red-black tree (for multi-core scheduling, each core actually has its own red-black tree) to record the vruntime of each process. When a scheduling decision is needed, the process with the smallest vruntime is picked from the red-black tree and run. Concretely, the conversion from actual running time to vruntime is:

vruntime = actual running time * 1024 / process weight.

Weight: each process has an integer static_prio representing the static priority set by the user, which corresponds to the nice value in the kernel. The weight is determined by the nice value: there is a one-to-one correspondence between a process's nice value and its weight, and the nice value can be adjusted dynamically at run time.

// There are 40 nice values; each step between adjacent nice values
// corresponds to roughly a 10% difference in CPU share.
static const int prio_to_weight[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};
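
Putting the table and the vruntime formula together, a minimal sketch (it reuses the prio_to_weight table above; the helper name is ours, not the kernel's):

/* delta_vruntime = delta_exec * 1024 / weight, where 1024 is the
 * weight of nice 0 and `weight` comes from the table above. */
#define NICE_0_WEIGHT 1024

static unsigned long calc_delta_vruntime(unsigned long delta_exec_ns, int nice)
{
    int weight = prio_to_weight[nice + 20]; /* nice -20..19 maps to index 0..39 */
    return delta_exec_ns * NICE_0_WEIGHT / weight;
}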

vruntime: vruntime is simply the actual running time standardized by weight. After this standardization, the resource consumption of each process can be compared directly through vruntime: a process with a relatively small vruntime has consumed relatively little CPU, and one with a large vruntime has consumed more. With the concept of vruntime, the scheduling algorithm becomes very simple: whichever process has the smaller vruntime has occupied the CPU for less time and has been treated "unfairly", so it runs next. In this way processes are selected fairly, while high-priority processes still get more running time. This is the main idea of CFS. It can also be understood this way: CFS lets the vruntime of each scheduling entity (a process here; under group scheduling, discussed later, it can also be a process group) chase the others. Each entity's vruntime grows at a different speed; the greater the weight, the slower it grows, and thus the more CPU execution time the entity obtains. In terms of implementation, Linux uses a red-black tree (one per core under multi-core scheduling) to record each process's vruntime; when scheduling is needed, the process with the smallest vruntime is taken from the tree and run.
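
A toy simulation of this "pick the smallest vruntime" loop (a two-element array stands in for the red-black tree; weights, tick length, and iteration count are invented for the example):

#include <stdio.h>

struct task { const char *name; long weight; long vruntime; long ran_ms; };

int main(void) {
    /* two tasks: nice 0 (weight 1024) and nice 5 (weight 335) */
    struct task t[2] = { { "A", 1024, 0, 0 }, { "B", 335, 0, 0 } };

    for (int tick = 0; tick < 1000; tick++) {      /* 1 ms per tick */
        struct task *next = t[0].vruntime <= t[1].vruntime ? &t[0] : &t[1];
        next->ran_ms += 1;
        next->vruntime += 1 * 1024 / next->weight; /* heavier weight -> slower growth */
    }
    /* A should end up with roughly 1024/(1024+335) of the CPU time */
    printf("A ran %ld ms, B ran %ld ms\n", t[0].ran_ms, t[1].ran_ms);
    return 0;
}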

  1. When a sleeping process is woken up, its vruntime is reset: it is given a certain compensation relative to the queue's min_vruntime, but not too much.

  2. The wake-up preemption feature of CFS is controlled by the WAKEUP_PREEMPT bit of sched_features. Since a sleeping process receives vruntime compensation when it wakes, it is very likely able to preempt the CPU at wake-up. This is the original intent of CFS: to guarantee the responsiveness of interactive processes, which sleep frequently while waiting for user input. Besides interactive processes, processes that sleep voluntarily are also compensated on wake-up. For example, processes that call sleep() or nanosleep() wake up periodically to perform a specific task. Such processes usually do not need fast response, but CFS does not distinguish them from interactive processes: they too receive vruntime compensation on every wake-up, which may cause other, more important application processes to be preempted and degrade overall performance. With wake-up preemption disabled, a newly woken process does not immediately preempt the running process but waits until the running process uses up its time slice.

  3. To avoid the overhead of overly frequent process switches, CFS sets a minimum time a process must occupy the CPU, sched_min_granularity_ns. A process running on the CPU cannot be switched off it before running for at least this long.

  4. On a multi-CPU system the load differs from CPU to CPU: some CPUs are busier, each CPU has its own run queue, and vruntime advances faster in some queues and slower in others. Comparing the min_vruntime values of the run queues shows the difference. If a process migrates from a CPU with a smaller min_vruntime (A) to a CPU with a larger min_vruntime (B), it gains an advantage: the vruntime of the processes in B's run queue is generally larger, so the migrated process would receive more CPU time. This is clearly unfair. CFS handles it as follows (see the sketch after this list):

    • When a process leaves a CPU's run queue (dequeue_entity), the queue's min_vruntime is subtracted from its vruntime;
    • When a process joins another CPU's run queue (enqueue_entity), that queue's min_vruntime is added to its vruntime.
  5. If a new process started with a vruntime of 0, far smaller than that of the old processes, it would keep the advantage of preempting the CPU for a long time and the old processes would starve, which is clearly unfair. CFS handles it as follows: each CPU's run queue cfs_rq maintains a min_vruntime field recording the minimum vruntime of all processes in the queue. The initial vruntime of a new process is set based on the min_vruntime of the run queue it lands in, keeping it within a reasonable gap from the old processes.

    The initial vruntime of a new process is also related to two parameters:
    sched_child_runs_first: specifies whether the child process runs before the parent after fork();
    the START_DEBIT bit of sched_features: specifies that the first run of a new process is delayed.
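
The adjustment from point 4 can be sketched as follows (toy types and names; the real dequeue_entity/enqueue_entity in the kernel do considerably more):

/* Store vruntime relative to the old queue on dequeue, and rebase it
 * onto the new queue on enqueue, so migration does not change fairness. */
struct toy_rq   { long min_vruntime; };
struct toy_task { long vruntime; };

void toy_dequeue(struct toy_task *p, struct toy_rq *rq) {
    p->vruntime -= rq->min_vruntime;   /* leaving: make vruntime relative */
}

void toy_enqueue(struct toy_task *p, struct toy_rq *rq) {
    p->vruntime += rq->min_vruntime;   /* joining: rebase onto new queue */
}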

2. Windows scheduling algorithm

The Windows scheduling unit is the thread. Windows uses preemptive scheduling based on dynamic priorities, combined with time-quantum adjustment.

  • Ready threads enter the queue corresponding to their priority
  • The system always selects the highest-priority ready thread to run
  • Threads of the same priority are scheduled round-robin by time slice
  • On a multi-CPU system, multiple threads are allowed to run in parallel

Priority: the Windows kernel uses 32 priority levels, represented by the numbers 0 to 31, to indicate the urgency of thread execution. By function they fall into 3 groups: 16 real-time levels (16~31), 15 variable levels (1~15), and 1 system level (0), reserved for the zero-page thread that clears memory pages.
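
For reference, user code influences these levels through the Win32 priority APIs; a minimal sketch (error handling omitted; the class/priority pair only selects a level, and the kernel still adjusts it dynamically):

#include <windows.h>
#include <stdio.h>

int main(void) {
    /* The process priority class and the thread priority together
       determine the 0-31 kernel priority level. */
    SetPriorityClass(GetCurrentProcess(), ABOVE_NORMAL_PRIORITY_CLASS);
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
    printf("class=0x%lx thread priority=%d\n",
           GetPriorityClass(GetCurrentProcess()),
           GetThreadPriority(GetCurrentThread()));
    return 0;
}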


Time quantum: a quantum is not a length of time but an integer number of quantum units. When a thread exhausts its quantum, if no other thread of the same priority is ready, Windows assigns the thread a new quantum and lets it continue to run.

Conditions that trigger thread scheduling:

  • A thread's priority changes
  • A thread changes its affinity (the set of processors it may run on)

In the following 5 situations, Windows raises the current priority of a thread:

  • An I/O operation completes
  • A wait on a semaphore or event ends
  • A thread in the foreground process completes a wait operation
  • A window thread is woken up by window activity
  • A thread has been in the ready state for longer than a certain time without running ("starvation")

Scheduling strategies: voluntary switch, preemption, and quantum exhaustion.

3. Load balancing of multi-core scheduling

When a program runs, does it run on only one CPU core, or alternately on several? How does the Linux kernel schedule processes across multiple cores? In fact, if you have not treated your process specially, the Linux kernel may run it on multiple CPU processors. This is the kernel's load balancing. Each processor has a runqueue holding the runnable processes on that processor. In a multi-processor kernel there are multiple runqueues, and if their sizes become very uneven, the kernel's load_balance function is triggered. This function moves processes from an overloaded CPU to one whose runqueue has relatively few elements.

When will load balancing occur?

1. When there is no runnable process in the runqueue on cpu1. This is easy to understand: cpu1 has nothing to do, so load_balance is called on cpu1; if it finds many processes waiting to run on cpu0, it takes the highest-priority runnable process from cpu0, puts it on its own runqueue, and starts executing it.

2. The first case does not cover the situation where no run queue ever becomes empty. For example, cpu0 always has 10 runnable processes while cpu1 has 1. Obviously the processes on cpu0 are being treated unfairly: the CPU time each one gets is much shorter. In that scenario load_balance would never be called by the first rule. So in fact, on every clock tick the kernel calls the scheduler_tick function, which does many things, such as decreasing the time slice of the currently executing process, and at the end it calls the rebalance_tick function. rebalance_tick determines how often load balancing is performed.

When the idle flag is SCHED_IDLE, the current CPU is idle and load_balance is called very frequently (every 1 or 2 clock ticks); otherwise the CPU is not idle and load_balance is called far less often (every 10~100 ms). The exact value depends on the intervals above. If you have not treated your process specially, the Linux kernel may run it on several CPU processors. But sometimes we want a process to always run on a particular CPU processor. Can that be done? The kernel provides system calls for this: sched_getaffinity returns the CPU mask currently used by the process, and sched_setaffinity sets the mask of CPU processors on which the process is allowed to execute.
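
The balancing idea itself can be pictured with a toy model (this mirrors none of the actual kernel code; it only shows "move tasks from the busiest runqueue to the lightest"):

#include <stdio.h>

#define NCPU 2

int main(void) {
    int rq_len[NCPU] = { 10, 1 };   /* runnable tasks per CPU's runqueue */

    int busiest = 0, idlest = 0;
    for (int i = 1; i < NCPU; i++) {
        if (rq_len[i] > rq_len[busiest]) busiest = i;
        if (rq_len[i] < rq_len[idlest])  idlest = i;
    }
    /* migrate tasks until the two queues are roughly even */
    while (rq_len[busiest] - rq_len[idlest] > 1) {
        rq_len[busiest]--;
        rq_len[idlest]++;
    }
    printf("after balancing: cpu0=%d cpu1=%d\n", rq_len[0], rq_len[1]);
    return 0;
}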

4. CPU scheduling affinity

In a multi-core CPU, each core has its own L1 and L2 caches, while the L3 cache is shared. If a process switches back and forth between cores, each core's cache hit rate for it suffers. Conversely, if the process always executes on one core no matter how it is scheduled, the hit rate of its data in that core's L1 and L2 caches improves significantly.

On Linux, the CPU_* family of macros together with sched_setaffinity() can be used to bind a process to a core.
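
A minimal example of those calls (Linux-specific; _GNU_SOURCE is required for the CPU_* macros; pid 0 means the calling process):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                          /* allow core 0 only */
    if (sched_setaffinity(0, sizeof set, &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    if (sched_getaffinity(0, sizeof set, &set) == 0)  /* read the mask back */
        printf("bound to core 0: %s\n", CPU_ISSET(0, &set) ? "yes" : "no");
    return 0;
}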


Source: blog.csdn.net/u014618114/article/details/107581668