Analysis and Research of Linux 2.6.23 Process Model and Algorithm

1. Introduction

At present, the most popular operating system apart from Windows is the operating system based on Linux. For beginners studying operating systems, Linux has some unique advantages:

  • Linux is a free operating system with excellent performance.
  • Its source code is open, so users in different fields and at different levels can tailor the Linux kernel to their own application requirements.
  • It offers high reliability, good stability and good portability. Compared with other popular operating systems, Linux is very stable and reliable, which has been confirmed by many users.
  • Extensive auxiliary documentation.

Based on Linux 2.6.23, this paper studies and analyzes process organization, state transitions and the CFS (Completely Fair Scheduler) scheduling algorithm.

2. The organization of the process

The Linux kernel uses the task_struct data structure to hold all process-related data and structures. All kernel algorithms involving processes and programs are built around this data structure, which is one of the most important data structures in the kernel. It is defined in the kernel file include/linux/sched.h.

2.1 ID type of the process

To understand how the kernel organizes and manages process IDs, you must first know the types of process IDs:

  • PID : This is a number assigned to a process that uniquely identifies a process in its namespace in Linux, called the process ID number, or PID for short. Processes spawned when using the fork or clone system calls are assigned a new and unique PID value by the kernel.
  • TGID : A process created by calling clone with the CLONE_THREAD flag is a thread of the calling process; the two belong to the same thread group, whose ID is called the TGID. All processes in the same thread group have the same TGID; the TGID of the thread group leader equals its PID, and for a process that does not use threads, TGID and PID are also equal.
  • PGID : In addition, independent processes can form process groups (using the setpgrp system call). Process groups can simplify the operation of sending signals to all processes in the group. For example, processes connected by pipes are in the same process group. The process group ID is called PGID, and all processes in the process group have the same PGID, which is equal to the PID of the group leader.
  • SID : Several process groups can be combined into a session group (using the setsid system call), which can be used for terminal programming. All processes in the session group have the same SID.
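
As a quick illustration of these IDs, the following minimal user-space C program (a sketch with error handling omitted; older glibc has no gettid() wrapper, so the raw syscall is used) prints the IDs of the calling process:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>   /* SYS_gettid */

int main(void)
{
    pid_t pid  = getpid();                    /* for a thread, getpid() returns the TGID */
    pid_t tid  = (pid_t)syscall(SYS_gettid);  /* per-thread ID (the kernel-level PID)    */
    pid_t pgid = getpgid(0);                  /* process group ID of the calling process */
    pid_t sid  = getsid(0);                   /* session ID of the calling process       */

    printf("PID=%d TID=%d PGID=%d SID=%d\n", pid, tid, pgid, sid);
    return 0;
}
```

In a single-threaded process PID, TID and TGID are all equal; PGID and SID equal the PIDs of the process group leader and the session leader respectively.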

2.2 Hash table of process IDs

During system operation there may be hundreds or thousands of processes, so the efficiency of process lookup is particularly important. For example, when the system administrator runs kill 1024 to terminate the process with PID=1024, the kernel must quickly derive the corresponding process descriptor from that PID. To make this lookup fast, the kernel uses 4 hash tables, PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID and PIDTYPE_SID, to index process descriptors. Why 4? So that a process can be looked up by pid, tgid, pgrp or session. The four hash tables are:

  • PIDTYPE_PID — indexed by the PID of the process itself;
  • PIDTYPE_TGID — indexed by the PID (TGID) of the thread group leader;
  • PIDTYPE_PGID — indexed by the PID of the process group leader;
  • PIDTYPE_SID — indexed by the PID of the session leader.

In the kernel these four hash tables occupy 16 page frames in total, i.e. each hash table occupies 4 page frames and can hold 2048 entries; the addresses of the tables are kept in the pid_hash array. Taking the PID hash table as an example: up to 32767 PID values must be mapped onto 2048 entries, so the kernel applies a hash function to each assigned PID value, uses the result as the index of the corresponding entry, and strings the PIDs that hash to the same value together into a linked list, as follows:

When the kill 29384 command is issued, the kernel hashes 29384 and obtains 199, then uses 199 as the index into the PID hash table to find the corresponding list head, and searches that linked list for the process with PID=29384. The other three hash tables work in the same way.
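
The lookup can be modeled with the following sketch. This is not the kernel source: the table size of 2048 matches the text above, but the hash function is a generic multiplicative hash standing in for the kernel's pid_hashfn()/hash_long(), so the exact bucket index (199 in the example) depends on the real hash function.

```c
#include <stddef.h>

#define PIDHASH_SHIFT 11                       /* 2^11 = 2048 buckets per table         */
#define PIDHASH_SIZE  (1UL << PIDHASH_SHIFT)

struct task {                                  /* stand-in for the process descriptor   */
    int          pid;
    struct task *hash_next;                    /* chains descriptors whose PIDs collide */
};

static struct task *pid_hash[PIDHASH_SIZE];    /* one of the four hash tables           */

/* simplified multiplicative hash standing in for the kernel's hash_long() */
static unsigned long pid_hashfn(unsigned long nr)
{
    return (nr * 2654435761UL) & (PIDHASH_SIZE - 1);
}

/* e.g. kill 29384: hash the PID, then walk the collision chain of that bucket */
static struct task *find_task_by_pid(int nr)
{
    struct task *p = pid_hash[pid_hashfn((unsigned long)nr)];

    while (p != NULL && p->pid != nr)
        p = p->hash_next;
    return p;                                  /* NULL if no such process exists        */
}
```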

3. Process state transition

3.1 Linux process states:

TASK_RUNNING: ready or running; the process is ready to run, but does not necessarily occupy the CPU. Corresponds to process state R.

TASK_INTERRUPTIBLE: sleeping, but in a shallow sleep from which the process can respond to signals; typically the process went to sleep voluntarily. Corresponds to process state S.

TASK_UNINTERRUPTIBLE: deep sleep that does not respond to signals; the typical scenario is a process blocked while acquiring a semaphore. Corresponds to process state D.

TASK_ZOMBIE: zombie state; the process has exited or finished, but its parent does not know yet and has not reaped it. Corresponds to process state Z.

TASK_STOPPED: stopped or being debugged. Corresponds to process state T.

3.2 The transitions between process states are shown in the following figure:

3.3 Process scheduling timing:

Process scheduling causes process state transitions. From the figure above it can be seen that scheduling is triggered in the following situations: a process terminates or goes to sleep, actively giving up the CPU; a lightly sleeping process is woken up (for example by a signal) and is then selected by the CFS scheduler; a deeply sleeping process blocked on a semaphore is woken up when the semaphore is released; and, most commonly, interrupts and exceptions occur.

4. The scheduling of the process

4.1 Overview of System Scheduling

The process scheduling module is one of the key subsystems of Linux: it decides which process the system allocates resources such as the CPU to. If the CPU is the heart of the computer, then process scheduling is its soul. The essence of scheduling is resource allocation. The scheduler's main job is to keep the system's response time short and its throughput high, and to select the process most worth running from the ready processes and put it into execution; the scheduling algorithm is the key to optimizing the resource-allocation strategy. The quality of the scheduler therefore affects the overall performance of the OS, especially in the field of RTOS (real-time operating systems).

4.2 Improvement of Linux 2.6.23 process scheduling module

Thanks to the openness of the Linux system, the Linux kernel has been continuously developed and has been greatly improved and optimized in terms of reliability, scalability and performance, and its scheduling subsystem has been renewed from one kernel generation to the next, greatly improving the overall performance of the system. Linux 2.6.23 makes major changes to the process scheduling part. The main features of the scheduler in the new kernel cover three aspects:
(1) Modular scheduler interface. The Linux 2.6.23 kernel provides a process scheduler module manager, with which scheduling algorithms can be registered as modules; different processes can then be handled by different scheduling modules. In the Linux 2.6.23 kernel, two scheduling modules are implemented: the CFS module and the real-time scheduling module, implemented in kernel/sched_fair.c and kernel/sched_rt.c respectively.
(2) CFS scheduler. The CFS scheduler allocates CPU time to processes according to their demand and weight, which helps ensure that each process gets a fair share of the CPU.
(3) CFS group scheduling. Group scheduling enables groups of tasks (or users) to share the CPU fairly among groups. In the new Linux 2.6 process scheduling module, real-time process scheduling has not changed much compared to before, but in order to implement the new CFS (Completely Fair Scheduler), the scheduling-related parts of the kernel underwent major modifications. This paper mainly analyzes the theory behind the CFS scheduler in the Linux 2.6.23 kernel.

4.3 The basic principle of CFS scheduler
CFS is the abbreviation of "Completely Fair Scheduler", released in the Linux 2.6.23 kernel. According to its author Ingo Molnar: "Eighty percent of CFS's design can be summed up in one sentence: CFS models an ideal, precise multi-tasking CPU on real hardware." First of all, "fairness" never means "equality": when allocating processor resources, the processes in the system cannot simply all be treated the same, because they are not equally important. For example, many kernel threads deal with emergencies and should take precedence over ordinary user-space processes. Secondly, absolute fairness is impossible to achieve. CFS is concerned with fairness over relatively long spans of time; within each small time interval it may not look fair, due to fairness compensation, changes in system load, and limits of the implementation. So the "C" and the "F" in CFS are not absolute.

Suppose a uniprocessor system has three processes with identical parameters that are equally important. Under ideally fair conditions they should start and finish at the same time, but such a processor could only be built if time were multi-dimensional. In reality the start and execution of each process must be staggered, and to be fair the time a process spends waiting must later be made up when it occupies the processor. This answers the question of which index to use when selecting the next process to occupy the processor: simply pick the process that has waited longest for the processor, i.e. the one that is owed the most processor time.

When the system is fully loaded, each process can only receive its allotted share. Dividing the processor time simply by the number of processes, however, contradicts the discussion of fairness versus equality above; the weight of each process on the processor is a better index, because it can capture the different weights (importance) of the processes. Equally obviously, when the system is not fully loaded, the processor should not be left idle for "quota" reasons. For example, suppose two purely computational processes (which never sleep) with equal weights run on one processor; under fair conditions each should get 50% of the processor time. If one of them finishes earlier than the other, it is not desirable for the remaining process to keep using only 50% of the processor.

This requirement does not arise only in processor scheduling; it also appears in packet-switched networks and other fields. Such scheduling has a name, work-conserving scheduling, and VirtualClock is one method that combines ease of implementation with good performance. In this method, besides the real clock, the run queue of each processor maintains a virtual clock whose pace is inversely proportional to the total weight of the processes on that processor: when there are multiple processes in the system, the virtual clock slows down in proportion to the total weight, i.e. the virtual time unit lengthens proportionally. For example, if the system has two ready processes, one with weight 1 and the other with weight 2, the virtual time unit is 3 real time units, and per virtual time unit the two processes should receive 1/3 and 2/3 of the processor time respectively.

Each process also has its own virtual clock, whose pace is inversely proportional to its own weight. Continuing the example, the virtual clock unit of the first process equals the real time unit (1), while that of the second process is 1/2. At the end of each virtual time unit, if a process has been treated fairly, its virtual clock equals the virtual clock of the run queue. If the process's virtual clock lags behind the run queue's, the process needs to be compensated; if it runs ahead, the process should be paused. In the simplest cases, choosing the process with the smallest virtual clock usually achieves decent fairness. The virtual clock of the run queue is thus a ruler for judging fairness or unfairness, and this is the policy CFS adopts.
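
The bookkeeping just described can be condensed into a small sketch. This is a simplified model of the virtual-clock idea, not kernel code; NICE_0_WEIGHT is simply a reference weight chosen here for illustration.

```c
#define NICE_0_WEIGHT 1024ULL          /* reference weight used for illustration        */

struct vproc {
    unsigned long long weight;         /* importance (weight) of the process            */
    unsigned long long vclock;         /* per-process virtual clock                     */
};

struct vrq {
    unsigned long long total_weight;   /* sum of the weights of all ready processes     */
    unsigned long long fair_clock;     /* run-queue virtual clock: the fairness "ruler" */
};

/* Charge delta_exec real time units that p has just spent on the CPU. */
static void account_exec(struct vrq *rq, struct vproc *p, unsigned long long delta_exec)
{
    /* the process clock advances inversely to its own weight ...                       */
    p->vclock      += delta_exec * NICE_0_WEIGHT / p->weight;
    /* ... while the queue clock advances inversely to the total weight                 */
    rq->fair_clock += delta_exec * NICE_0_WEIGHT / rq->total_weight;
}

/* Fairness rule: among the ready processes, run the one whose vclock is smallest,
 * i.e. the one lagging furthest behind rq->fair_clock. */
```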
This clock is represented by the fair_clock member of the cfs_rq structure, and its progress is maintained in the function update_curr(). For the scheduling entities on each CPU, CFS uses a time-ordered red-black tree. This approach works well because: ① the red-black tree always stays approximately balanced; ② it is a binary search tree, so lookup operations take logarithmic time; in CFS hardly any lookup other than "leftmost node" is needed, and a pointer to the leftmost node is always cached; ③ for most operations the red-black tree runs in O(log n) time; this behaviour has measurable latency, but it matters little even for large numbers of processes; ④ the red-black tree can be embedded in existing data structures (internal storage), so it can be maintained without external allocations. The following analyzes the key parts of the CFS implementation in Linux 2.6.23, in order to understand in depth how CFS works.

4.3.1 Core data structure
1. Improved struct task_struct

The Linux 2.6.23 kernel still uses the task_struct data structure (defined in include/linux/sched.h) to represent a process; although threads have been optimized, a kernel thread is represented in the same way as a process. As the scheduler improved, the contents of task_struct were improved as well: new features such as interactive-process priority support and kernel-preemption support are reflected in task_struct. CFS removes the priority array (struct prio_array) and introduces scheduling entities and scheduling classes, defined by struct sched_entity and struct sched_class respectively. task_struct therefore contains the information of these two structures, which are analyzed below.
(1) struct sched_entity. This structure represents each schedulable object. Every process corresponds to a scheduling entity, which contains the complete information needed to schedule a single process or a process group. Its main fields are listed below (a simplified sketch follows the list):

  • Among them, wait_runtime is the time the scheduling entity has waited for the processor since entering the ready queue;
  • fair_key is the key with which the scheduling entity is inserted into the red-black tree;
  • load is the load (weight) of the current entity;
  • run_node is the linkage used to place the entity in the red-black tree;
  • on_rq indicates whether the entity is on the run queue;
  • exec_start is the time at which the current run started;
  • sum_exec_runtime is the total time the entity has executed;
  • in addition, there are related scheduling statistics and group-scheduling information.
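
A simplified, self-contained sketch of the fields just listed is shown below. The field names follow the text above; note that early CFS versions keyed the red-black tree with fair_key/wait_runtime, while the released 2.6.23 uses a vruntime field instead, and statistics and group-scheduling members are omitted here.

```c
#include <stdint.h>

typedef int64_t  s64;                 /* stand-ins for the kernel's fixed-width types */
typedef uint64_t u64;

struct load_weight { unsigned long weight, inv_weight; };
struct rb_node     { struct rb_node *parent, *left, *right; };   /* simplified */

struct sched_entity {
    s64                wait_runtime;      /* time spent waiting for the CPU            */
    u64                fair_key;          /* key for insertion into the red-black tree */
    struct load_weight load;              /* load (weight) of this entity              */
    struct rb_node     run_node;          /* linkage into the red-black tree           */
    unsigned int       on_rq;             /* is the entity on a run queue?             */
    u64                exec_start;        /* timestamp when the current run began      */
    u64                sum_exec_runtime;  /* total CPU time consumed so far            */
};
```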

(2) struct sched_class. This structure provides the interfaces needed by the Linux process scheduler. It follows an object-oriented approach and is the core data structure of the modular scheduler: scheduling classes are linked together in a list, and a new scheduling module can be added to the kernel simply by implementing these interfaces. Its members are described below (a simplified sketch follows the list):

  • Among them, next links to the next scheduling module (class);
  • enqueue_task() is called when a process becomes runnable; it inserts the scheduling entity (process) into the red-black tree and increments the nr_running variable;
  • dequeue_task() is called when a process leaves the runnable state; it removes the scheduling entity from the red-black tree and decrements the nr_running variable;
  • yield_task() dequeues and then re-enqueues the scheduling entity, unless the compat_yield sysctl is enabled, in which case it places the scheduling entity at the rightmost end of the red-black tree;
  • check_preempt_curr() checks whether the currently running process can be preempted; before actually preempting a running process, the CFS scheduler performs a fairness test. This drives wakeup preemption;
  • pick_next_task() selects the most suitable process to run next;
  • put_prev_task() takes the CPU away from the currently running process: if the current task has finished it is removed from the run queue, otherwise it is put back into the run queue to wait for the next scheduling decision;
  • load_balance() is the load-balancing function used on multiprocessors;
  • set_curr_task() is called when a process changes its scheduling class or its process group;
  • task_tick() is invoked from the periodic timer tick and may lead to a process switch. This drives runtime preemption;
  • task_new(): the core scheduler gives the scheduling module an opportunity to handle the start of a new process. The CFS module uses it for group scheduling, while the real-time scheduling module does not use this function.
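
The interface can be sketched as follows. Argument lists are abbreviated here; the real definition lives in include/linux/sched.h and, for example, load_balance() takes several additional parameters.

```c
struct rq;                 /* per-CPU run queue (opaque in this sketch)        */
struct task_struct;        /* process descriptor (opaque in this sketch)       */

struct sched_class {
    const struct sched_class *next;                        /* next registered class  */

    void (*enqueue_task)(struct rq *rq, struct task_struct *p, int wakeup);
    void (*dequeue_task)(struct rq *rq, struct task_struct *p, int sleep);
    void (*yield_task)(struct rq *rq);

    void (*check_preempt_curr)(struct rq *rq, struct task_struct *p);

    struct task_struct *(*pick_next_task)(struct rq *rq);  /* choose the next runner */
    void (*put_prev_task)(struct rq *rq, struct task_struct *p);

    unsigned long (*load_balance)(struct rq *this_rq, struct rq *busiest /* ... */);
    void (*set_curr_task)(struct rq *rq);
    void (*task_tick)(struct rq *rq, struct task_struct *p);
    void (*task_new)(struct rq *rq, struct task_struct *p);
};
```

CFS fills this table with enqueue_task_fair(), pick_next_task_fair() and related functions in kernel/sched_fair.c, while the real-time class provides its own implementations in kernel/sched_rt.c.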

4.3.2 Research on the Red-Black Tree Algorithm

A red-black tree is a variant of the balanced binary search tree. The height difference between its left and right subtrees may be greater than 1, so a red-black tree is not a strictly balanced binary tree (AVL tree); however, keeping it balanced is cheaper, and its average performance is better than that of an AVL tree. Besides a key and three pointers (parent, lchild and rchild), each node of a red-black tree has one more attribute: its color, red or black. As in most binary search trees, smaller keys are stored in the left subtree. In addition to all the properties of a binary search tree, a red-black tree satisfies the following five properties: ① every node is red or black; ② the root is black; ③ all leaves are black (the leaves are NIL nodes); ④ both children of every red node are black (no path from a leaf to the root contains two consecutive red nodes); ⑤ every path from any node to each of its leaves contains the same number of black nodes. Because a red-black tree is a specialization of the binary search tree, read-only operations are the same as on an ordinary binary search tree. Insertions and deletions, however, may violate the red-black properties, which then requires a series of adjustments of the affected nodes to restore them. The deletion and insertion algorithms for red-black tree nodes, given below, make heavy use of left rotation (Left-Rotate) and right rotation (Right-Rotate). Left-Rotate(x) moves node x one position down to the left and lets the original right child of x take its place; Right-Rotate(y) is the inverse of Left-Rotate(x). The operation is shown in Figure 2-1 (nodes x and y can appear anywhere in the tree, and the letters a, b and c denote arbitrary subtrees):
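
The left rotation can be written down directly from this description. The sketch below uses a simplified node type (not the kernel's rb_node); Right-Rotate is obtained by swapping "left" and "right" throughout.

```c
#include <stddef.h>

struct rbt_node {
    struct rbt_node *parent, *left, *right;
    int              color;                    /* 0 = red, 1 = black                  */
};

/* Left-Rotate(x): x moves down to the left, its right child y takes x's place,
 * and y's old left subtree (the "b" subtree in Figure 2-1) becomes x's right child. */
static void left_rotate(struct rbt_node **root, struct rbt_node *x)
{
    struct rbt_node *y = x->right;

    x->right = y->left;                        /* subtree b is re-attached under x    */
    if (y->left != NULL)
        y->left->parent = x;

    y->parent = x->parent;                     /* hook y into x's former position     */
    if (x->parent == NULL)
        *root = y;
    else if (x == x->parent->left)
        x->parent->left = y;
    else
        x->parent->right = y;

    y->left   = x;                             /* finally, x becomes y's left child   */
    x->parent = y;
}
```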

4.3.3 Proof of Algorithm Time Complexity

The time complexity of the red-black tree node insertion and deletion algorithm is proved below.

Proof: let h(v) be the height of the subtree rooted at node v, and let bh(v) be the number of black nodes on a path from v down to any leaf of that subtree (not counting v itself); bh(v) is called the black height of v.

Claim: the subtree rooted at any node v contains at least 2^(bh(v)) - 1 internal nodes.

Base case: if h(v) = 0 then v is a NIL leaf and bh(v) = 0, and indeed 2^(bh(v)) - 1 = 1 - 1 = 0.

Induction step: consider a node v with h(v) > 0; it is an internal node with two children. Each child has black height bh(v) or bh(v) - 1, depending on whether the child is red or black. By the induction hypothesis each child subtree contains at least 2^(bh(v)-1) - 1 internal nodes, so the subtree rooted at v contains at least 2*(2^(bh(v)-1) - 1) + 1 = 2^(bh(v)) - 1 internal nodes, which proves the claim.

By property ④ of the red-black tree, at least half of the nodes on any path from the root to a leaf are black, so the black height of the root is at least h(root)/2, and therefore n >= 2^(h(root)/2) - 1.

Rearranging gives h(root) <= 2log(n+1), i.e. the height of the tree is not greater than 2log(n+1). Since deletion, insertion and the other basic operations on a binary search tree of height h take O(h) time, and a red-black tree with n nodes is a binary search tree of height at most 2log(n+1), the time complexity of red-black tree insertion and deletion is not greater than O(2log(n+1)), which is O(log n).

4.3.4 Process Scheduling Strategies
The Linux 2.6.23 kernel provides four scheduling policies for different types of processes: SCHED_NORMAL, SCHED_BATCH, SCHED_FIFO and SCHED_RR.
(1) SCHED_NORMAL. This strategy is the default time-sharing scheduling strategy in the Linux2.6 kernel, which is used for the scheduling of common non-real-time processes. It inserts all processes into a red-black tree, continuously updates and maintains the red-black tree, and always selects the leftmost process of the red-black tree for scheduling.

(2) SCHED_BATCH. This policy is a scheduling policy unique to Linux, mainly used for scheduling computation-intensive non-real-time processes. It is very similar to SCHED_NORMAL; the difference is that processes at the SCHED_BATCH level are regarded by the system as computation-intensive non-real-time processes and receive a slight scheduling "penalty", so this policy is more suitable for non-interactive processes with low response-time requirements.

(3) SCHED_FIFO. This policy complies with the FIFO (first-in, first-out) scheduling rule of the POSIX.1b standard and is used for real-time process scheduling. In Linux 2.6, SCHED_FIFO processes are scheduled ahead of any SCHED_NORMAL/SCHED_BATCH processes. Once a SCHED_FIFO process is runnable, it keeps executing until it releases the CPU voluntarily or is preempted by another real-time process with a higher rt_priority.

(4) SCHED_RR. This policy complies with the RR (round-robin) scheduling rule of the POSIX.1b standard and is used for real-time process scheduling; processes execute in turn according to their time slices. It is similar to SCHED_FIFO except for the handling of time slices. The Linux 2.6.23 process scheduling module is thus divided into two parts: in the CFS module, non-real-time processes are ordered in a red-black tree and scheduled with the SCHED_NORMAL and SCHED_BATCH policies; in the real-time scheduling module, real-time processes are ordered in an O(1) manner and scheduled with the SCHED_FIFO and SCHED_RR policies.
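
From user space a policy is selected with the standard sched_setscheduler() call; the minimal example below (switching the calling process to SCHED_FIFO) needs root privileges, since the real-time policies are restricted.

```c
#include <stdio.h>
#include <sched.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 10 };   /* rt_priority in 1..99 */

    /* pid 0 means "the calling process"; SCHED_FIFO/SCHED_RR require privileges */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        perror("sched_setscheduler");
        return 1;
    }
    printf("now running under SCHED_FIFO\n");
    return 0;
}
```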

4.3.5 Process scheduling process
The timing of process scheduling in Linux 2.6.23 has not changed much compared with the past; it is mainly checked at the clock interrupt. When the clock interrupt occurs, the system calls scheduler_tick(), which invokes the tick handler of the scheduling class that the current process belongs to: task_tick_fair() for the CFS module and task_tick_rt() for the real-time module. task_tick_fair() first updates the run-queue information and then checks whether the current process should be preempted, by comparing its actual running time with its ideal running time; if the actual running time exceeds the ideal running time, the process must be rescheduled, so its TIF_NEED_RESCHED flag is set. The flag is checked when the kernel control path exits, and if it is set, schedule() is invoked. task_tick_rt() first checks whether the scheduling policy of the process is SCHED_RR; if so, it decrements the time slice of the current process, and when the time slice reaches 0 it re-assigns the time slice according to the real-time priority, places the process at the end of its queue if more than one process is runnable there, and sets the TIF_NEED_RESCHED flag of the current process.
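
The tick-time check for the CFS case can be modeled as follows. This is a simplified sketch of the comparison described above, not the kernel's code; need_resched here stands in for the TIF_NEED_RESCHED flag.

```c
struct tick_task {
    unsigned long long sum_exec_runtime;       /* total CPU time consumed so far     */
    unsigned long long prev_sum_exec_runtime;  /* value recorded when it was picked  */
    int                need_resched;           /* stand-in for TIF_NEED_RESCHED      */
};

/* If the current task has run longer than its ideal slice for this scheduling
 * period, mark it so that schedule() is invoked when the kernel path exits. */
static void check_preempt_tick(struct tick_task *curr, unsigned long long ideal_runtime)
{
    unsigned long long ran = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;

    if (ran > ideal_runtime)
        curr->need_resched = 1;
}
```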

The Linux 2.6.23 kernel still uses the schedule() function (implemented in kernel/sched.c) for scheduling, and its core flow has not changed much from before; the difference lies in how the next process is selected. In the current kernel, the system queries the registered scheduling classes in turn until one of them supplies a runnable process. This is done by calling the pick_next_task() hook of each sched_class; in the CFS module this is pick_next_task_fair(), whose implementation is very simple: it just takes the leftmost node of the red-black tree. Two important variables appear in schedule(): prev points to the current process, i.e. the one about to be switched out of the CPU, and next points to the next process, i.e. the one about to be switched onto the CPU. The main processing flow of schedule() is shown in Figure 2-4, and the class-iteration step can be sketched as follows:
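
This is a simplified model, not the kernel source; the real code also has a fast path that calls the CFS class directly when only fair-class tasks are runnable.

```c
#include <stddef.h>

struct rq;
struct task_struct;

struct sched_class {                                /* abbreviated, as in the sketch above */
    const struct sched_class *next;
    struct task_struct *(*pick_next_task)(struct rq *rq);
};

/* Walk the registered classes from the highest-priority one (real-time) downward
 * and return the first task offered; in the kernel the idle class at the end of
 * the list always supplies a task, so the loop never really returns NULL. */
static struct task_struct *pick_next(struct rq *rq, const struct sched_class *highest)
{
    const struct sched_class *class;

    for (class = highest; class != NULL; class = class->next) {
        struct task_struct *p = class->pick_next_task(rq);
        if (p != NULL)
            return p;                               /* CFS: the leftmost rbtree node */
    }
    return NULL;
}
```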

 

 

5. My views on the operating system process model

Summary:

Based on Linux 2.6.23, this paper has studied and analyzed process organization, state transitions and the CFS (Completely Fair Scheduler) scheduling algorithm. It introduced the role and status of the process scheduling module in the Linux 2.6 kernel and analyzed the scheduling improvements in Linux 2.6.23; it then examined the implementation principles of the CFS scheduler from the perspective of its data structures, the red-black tree algorithm and its complexity, and the process scheduling policies; finally, it summarized the scheduling policies and the scheduling flow of the process scheduling module in the Linux 2.6.23 kernel.

Writing this reminds me of the two courses I took with my teacher over the past year, digital logic and principles of computer organization. I once heard an example: by learning digital logic you learn the basic operations of a computer (AND, OR, NOT, and so on), laying the necessary foundation for later study of how computers think; by learning and mastering the principles of computer organization you come to understand those ideas and can design your own simple CPU (please allow me to call it a micro operating system). At that point you are like an architect who can build houses but not yet tall buildings. Now, studying the operating systems course and doing this exercise, I really feel the grandeur of the "skyscraper". While marveling at the amazing work of senior programmers, I am also deeply aware of my own shortcomings. Yet every building is built up brick by brick without exception. I believe that with my teacher's guidance and my own continuous study, I can learn this course well and lay a more solid foundation for mastering the ability to build such buildings.

