The concept part only explains the CFS scheduler.

Highlights of the CFS scheduler

Basic concepts
Linux 2.6 is, in my opinion, a fairly polished version; the biggest difference between 2.4 and 2.6 lies in the introduction of the CFS scheduler.

Because CFS introduces priorities, CPU resource allocation (that is, the actual running time, runtime) is based on weight. The mapping between weight and the nice priority is shown below: a lower nice value (higher priority) corresponds to a larger weight.
static const int prio_to_weight[40] = {
/* -20 */ 88761, 71755, 56483, 46273, 36291,
/* -15 */ 29154, 23254, 18705, 14949, 11916,
/* -10 */ 9548, 7620, 6100, 4904, 3906,
/* -5 */ 3121, 2501, 1991, 1586, 1277,
/* 0 */ 1024, 820, 655, 526, 423,
/* 5 */ 335, 272, 215, 172, 137,
/* 10 */ 110, 87, 70, 56, 45,
/* 15 */ 36, 29, 23, 18, 15,
};
The actual running time runtime of scheduling entity i is calculated as

$$runtime_i = schedule\_period \times \frac{weight_i}{\sum_{j=1}^{n} weight_j}$$

where n is the number of runnable scheduling entities.
A virtual running time, vruntime, is defined as

$$vruntime = runtime \times \frac{1024}{weight_i}$$

where 1024 is the weight of a nice-0 process (NICE_0_LOAD).
Substituting the previous formula shows that each scheduling entity's vruntime does not depend on its own weight. Therefore, from a macro perspective, the vruntime of every scheduling entity should be the same within each scheduling period; this is the ideal state:

$$vruntime = schedule\_period \times \frac{1024}{\sum_{j=1}^{n} weight_j}$$
When a scheduling entity blocks or sleeps for some reason, it gives up its time slice, so its vruntime temporarily stops advancing, while the other scheduling entities obtain time slices and keep running, so their vruntime increases. This creates an imbalance, which is unfair, so at the next process switch the process with the smallest vruntime must be scheduled.
Why is it completely fair when high- and low-priority processes are assigned different runtime?

Because fairness is defined in terms of vruntime, not runtime: CFS ensures that every scheduling entity's vruntime stays equal, and whichever entity has the smallest vruntime is scheduled first. The higher-priority entity gets a larger runtime and the lower-priority one a smaller runtime, yet their vruntime is the same; in effect, the lower-priority entity's virtual clock simply ticks faster.
Some questions

Is the initial vruntime of a new process 0?
- When a child process is created, its vruntime is first initialized to the run queue's min_vruntime.
- If the START_DEBIT bit is set in sched_features, the child's vruntime is further increased beyond min_vruntime (a startup debit).
- After the child's vruntime is set, the sched_child_runs_first parameter is checked. If it is 1, the parent's and child's vruntime are compared; if the parent's vruntime is smaller, the two vruntimes are swapped. This guarantees the child process runs before the parent.
Does the vruntime of a sleeping process remain unchanged?

When the sleeping process is woken up, its vruntime is reset based on the run queue's min_vruntime: it is given a certain amount of compensation, but not too much.
Can the time slice occupied by a process be infinitesimally small?

No. CFS sets a minimum time a process must occupy the CPU, sched_min_granularity_ns; a process that has run on the CPU for less than this time cannot be preempted off it.
liuzixuan@10-60-73-159:~$ cat /proc/sys/kernel/sched_min_granularity_ns
1500000
Does a process's vruntime change when it is moved from one CPU to another?

When a process is removed from one CPU's run queue, that queue's min_vruntime is subtracted from its vruntime; when it joins another CPU's run queue, that queue's min_vruntime is added. This way processes remain relatively fair after migrating from one CPU to another.
What happens if vruntime keeps accumulating and eventually overflows?

The key in the red-black tree is not vruntime itself but vruntime - min_vruntime: subtracting the smallest vruntime keeps all processes clustered around it, so only relative values are ever compared, and wraparound is handled in the comparison function.
/* Wraparound-safe comparison: the signed difference is negative exactly
 * when `left` is before `right`, even after the unsigned counter wraps. */
static inline int less(u32 left, u32 right)
{
	return (s32)(left - right) < 0;
}
Source code and supplements
Scheduling classes and scheduling strategies
Linux scheduling classes:

- fair_sched_class: the CFS (completely fair) scheduler, which allocates CPU capacity by weight to give every process in the system maximum fairness
- idle_sched_class: each processor has one idle thread, namely thread 0
- rt_sched_class: the real-time scheduler, which maintains one queue per scheduling priority
struct sched_class {
const struct sched_class *next;
void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
void (*yield_task) (struct rq *rq);
void (*check_preempt_curr) (struct rq *rq, struct task_struct *p, int sync);
struct task_struct * (*pick_next_task) (struct rq *rq);
void (*put_prev_task) (struct rq *rq, struct task_struct *p);
#ifdef CONFIG_SMP
int (*select_task_rq)(struct task_struct *p, int sync);
unsigned long (*load_balance) (struct rq *this_rq, int this_cpu,
struct rq *busiest, unsigned long max_load_move,
struct sched_domain *sd, enum cpu_idle_type idle,
int *all_pinned, int *this_best_prio);
int (*move_one_task) (struct rq *this_rq, int this_cpu,
struct rq *busiest, struct sched_domain *sd,
enum cpu_idle_type idle);
void (*pre_schedule) (struct rq *this_rq, struct task_struct *task);
int (*needs_post_schedule) (struct rq *this_rq);
void (*post_schedule) (struct rq *this_rq);
void (*task_wake_up) (struct rq *this_rq, struct task_struct *task);
void (*set_cpus_allowed)(struct task_struct *p,
const struct cpumask *newmask);
void (*rq_online)(struct rq *rq);
void (*rq_offline)(struct rq *rq);
#endif
void (*set_curr_task) (struct rq *rq);
void (*task_tick) (struct rq *rq, struct task_struct *p, int queued);
void (*task_new) (struct rq *rq, struct task_struct *p);
void (*switched_from) (struct rq *this_rq, struct task_struct *task,
int running);
void (*switched_to) (struct rq *this_rq, struct task_struct *task,
int running);
void (*prio_changed) (struct rq *this_rq, struct task_struct *task,
int oldprio, int running);
#ifdef CONFIG_FAIR_GROUP_SCHED
void (*moved_group) (struct task_struct *p);
#endif
};
static const struct sched_class fair_sched_class; // completely fair (CFS) scheduling class
static const struct sched_class idle_sched_class; // idle scheduling class
static const struct sched_class rt_sched_class;   // real-time scheduling class
Linux scheduling policies decide when and how a new process is selected to run on the CPU:

- SCHED_NORMAL: scheduling policy for ordinary processes; the entity runs under the CFS scheduler
- SCHED_FIFO: real-time scheduling policy, first-in first-out
- SCHED_RR: real-time scheduling policy, round-robin time slices
- SCHED_BATCH: scheduling policy for ordinary batch processes; the entity runs under the CFS scheduler
- SCHED_IDLE: scheduling policy for ordinary processes run at the lowest priority under the CFS scheduler
#define SCHED_NORMAL 0
#define SCHED_FIFO 1
#define SCHED_RR 2
#define SCHED_BATCH 3
#define SCHED_IDLE 5
The scheduling policy of a given process can be obtained through the sched_getscheduler() system call.
Priority classification
Priority is generally divided into static priority and dynamic priority.

- Static priority: values 100 to 139 represent the static priority of an ordinary process. It measures how this process should be scheduled relative to the other ordinary processes in the system, and it essentially determines the process's base time slice.
- Dynamic priority: values 100 to 139 represent the dynamic priority of an ordinary process; this is the number the scheduler actually uses when selecting a new process to run.
Balancing rq in multiprocessor systems
A principle: a runnable process must never appear in two or more run queues at the same time.

Scheduling domain: a set of CPUs whose workload the kernel should keep balanced. Its layout resembles a tree: each scheduling domain is in turn divided into one or more groups, each group being a subset of the domain's CPUs. Workload balancing is always performed between groups within a scheduling domain.

All physical sched_domain descriptors in the system are placed in the per-CPU variable phys_domains:
static DEFINE_PER_CPU(struct static_sched_domain, phys_domains);
They are initialized in each machine's architecture directory, for example:

/* sched_domains SD_NODE_INIT for SGI IP27 machines */
#define SD_NODE_INIT (struct sched_domain) {		\
.parent = NULL, \
.child = NULL, \
.groups = NULL, \
.min_interval = 8, \
.max_interval = 32, \
.busy_factor = 32, \
.imbalance_pct = 125, \
.cache_nice_tries = 1, \
.flags = SD_LOAD_BALANCE \
| SD_BALANCE_EXEC \
| SD_WAKE_BALANCE, \
.last_balance = jiffies, \
.balance_interval = 1, \
.nr_balance_failed = 0, \
}