Homework 1: Analysis of Linux Processes

Before we analyze Linux's process model, we must first know what Linux is.

1. Overview of Linux

Linux is a Unix-like operating system based on POSIX. It is multi-user and multi-tasking, supports multithreading and multiple CPUs, offers stable performance, and is free to use and distribute.

Linux inherits Unix's network-centric design. It can run the major Unix tools, applications, and network protocols, supports both 32-bit and 64-bit hardware, and can be installed on a wide range of devices, including mobile phones, tablets, routers, desktop computers, and supercomputers.

Linux was originally written by Linus Torvalds while he was a student at the University of Helsinki in Finland, and it has since been designed and implemented by thousands of programmers around the world. Its goal is a Unix-compatible system that is not bound by the copyright restrictions of any commercial software and can be used freely around the world.

 

Now that we understand what Linux is, we need to know what a process is.

2. Process:

2.1 The concept of process:

A process is the foundation of the operating system's structure. It can be described in several equivalent ways: a program in execution; an instance of a program running on a computer; an entity that can be assigned to and executed by a processor; or a unit of activity characterized by a single sequential thread of execution, a current state, and an associated set of system resources. When an application is started, it is as if the application were placed into a container. Other things can be added to the container, such as the variable data the application needs at run time and the DLL files it references. When the same application is started a second time, the existing container is not emptied; the system finds a new process container to hold the second instance. The concept of a "process" was introduced to describe this dynamic execution of a program precisely.

After understanding the basic concept of a process, we need to look more closely at how processes actually work.

2.2 Representation of the process:

Within the Linux kernel, each process is represented by a structure called task_struct. This structure contains all the data needed to represent the process, along with a lot of other data used for statistics and for maintaining relationships with other processes. The structure is very large. Its fields can be grouped by function as follows:

  • Process state (State)
  • Process scheduling information (Scheduling Information)
  • Various identifiers (Identifiers)
  • Information about process communication (IPC: Inter-Process Communication)
  • Times and Timers
  • Process link information (Links)
  • File System Information (File System)
  • Virtual Memory Information (Virtual Memory)
  • Page management information (page)
  • Symmetric Multiprocessor (SMP) Information
  • Processor Specific Context
  • Other Information

A system runs many processes, so having a simple way to tell them apart is very important: it makes processes much easier to manage. Let's look at process identifiers next.

2.3 Process identifiers

2.3.1 The role of identifiers:

 A process identifier is a numerical value used to uniquely identify a process. Each process has a unique identifier, and the kernel uses this identifier to identify different processes. At the same time, the process identifier PID is also an interface provided by the kernel to the user program, and the user program issues commands to the process through the PID.

2.3.2 Characteristics of identifiers on Linux:

PIDs are stored in the pid_t type, which on Linux is a signed 32-bit integer, and they are normally assigned sequentially: the PID of a newly created process is usually the PID of the previously created process plus 1. However, to stay compatible with older Unix and Linux systems that used 16-bit types for PIDs, the default maximum PID on Linux is 32767. When the counter reaches this limit, the kernel wraps around and reuses PID numbers that are currently free. The limit can be inspected and changed through /proc/sys/kernel/pid_max.
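To make the wrap-around concrete, here is a minimal sketch in C. It is not the kernel's actual allocator; the names pid_table, alloc_pid and free_pid are made up for illustration, and the limit is the default described above.

```c
#include <stdbool.h>

#define PID_MAX_LIMIT 32768   /* valid PIDs are 1..32767, the default described above */

/* Toy allocator state (hypothetical): which PIDs are taken, and the last one handed out. */
struct pid_table {
    bool in_use[PID_MAX_LIMIT];
    int last_pid;
};

/* Return the next free PID after last_pid, wrapping around to 1 once
 * 32767 has been passed. Returns -1 if every PID is taken. */
int alloc_pid(struct pid_table *t)
{
    for (int i = 1; i < PID_MAX_LIMIT; i++) {
        /* Map last_pid + i back into the range 1..32767. */
        int candidate = (t->last_pid + i - 1) % (PID_MAX_LIMIT - 1) + 1;
        if (!t->in_use[candidate]) {
            t->in_use[candidate] = true;
            t->last_pid = candidate;
            return candidate;
        }
    }
    return -1;
}

/* Mark a PID as free again so a later wrap-around can reuse it. */
void free_pid(struct pid_table *t, int pid)
{
    if (pid > 0 && pid < PID_MAX_LIMIT)
        t->in_use[pid] = false;
}
```

Starting from a zeroed pid_table, successive calls to alloc_pid return 1, 2, 3 and so on; once 32767 has been handed out, the search continues from 1 and skips any PID that is still in use.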

3. How the operating system organizes processes

Source: http://www.bubuko.com/infodetail_83248.html

 Processes are organized through several relationships at once. Parent-child and sibling relationships are used to manage the creation and death of processes (for example, dealing with zombie processes). Thread-group relationships are used so that signals can be handled uniformly for the threads of one process. A hash table is used for fast global lookup. For the scheduler, run queues and wait queues are used.
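As a rough illustration of two of these relationships, the sketch below links tasks into a parent/child/sibling tree and also into a hash table keyed by PID. All names here (toy_task, toy_attach, toy_find_by_pid, PID_HASH_SIZE) are hypothetical, and the real task_struct uses the kernel's own list and hash helpers rather than raw pointers like these.

```c
#include <stddef.h>

#define PID_HASH_SIZE 64   /* hypothetical bucket count */

struct toy_task {
    int pid;
    struct toy_task *parent;        /* parent-child relationship */
    struct toy_task *first_child;   /* head of this task's child list */
    struct toy_task *next_sibling;  /* sibling relationship */
    struct toy_task *hash_next;     /* chain inside one hash bucket */
};

static struct toy_task *pid_hash[PID_HASH_SIZE];

/* Link a task under its parent and insert it into the PID hash table. */
void toy_attach(struct toy_task *t, struct toy_task *parent)
{
    unsigned bucket = (unsigned)t->pid % PID_HASH_SIZE;

    t->parent = parent;
    t->first_child = NULL;
    t->next_sibling = NULL;
    if (parent) {
        /* Push onto the front of the parent's child list. */
        t->next_sibling = parent->first_child;
        parent->first_child = t;
    }
    t->hash_next = pid_hash[bucket];
    pid_hash[bucket] = t;
}

/* Global lookup by PID: walk a single bucket instead of every task. */
struct toy_task *toy_find_by_pid(int pid)
{
    struct toy_task *t = pid_hash[(unsigned)pid % PID_HASH_SIZE];

    while (t && t->pid != pid)
        t = t->hash_next;
    return t;
}
```

The tree answers questions such as "who must reap this child", while the hash table answers "which task has PID n" without scanning every process.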

4. How the state of the process is converted

4.1 Status of the process:

Kernel representation      Meaning
TASK_RUNNING               runnable
TASK_INTERRUPTIBLE         interruptible wait state
TASK_UNINTERRUPTIBLE       uninterruptible wait state
TASK_ZOMBIE                dead
TASK_STOPPED               paused
TASK_SWAPPING              swap in / swap out

 

(1) Runnable state: A process in this state is either running or ready to run. The running process is the current process; a process that is ready to run can start executing as soon as it gets the CPU. The CPU is the only system resource these processes are waiting for.

(2) Waiting state: A process in this state is waiting for an event or a resource, and it must be on one of the system's wait queues (wait_queue). An interruptible wait can also be ended by a signal; an uninterruptible wait cannot.

(3) Suspended state: The process is temporarily stopped so that it can receive some special handling. A process usually enters this state when it receives a SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU signal. For example, a process being debugged is in this state.

(4) Zombie state: The process has terminated, but its parent has not yet executed the wait() system call, so the information of the terminated process has not been reclaimed. As the name implies, a process in this state is a dead process. It is effectively garbage in the system and must be cleaned up to release the resources it still occupies.
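The zombie state is easy to see from user space. Below is a small sketch in C using the standard fork() and waitpid() calls; the function name run_and_reap is made up for this example. Between the child's _exit() and the parent's waitpid(), the child is a zombie.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with exit_code, then reap it.
 * Returns the child's exit status, or -1 on error. */
int run_and_reap(int exit_code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;            /* fork failed */
    if (pid == 0)
        _exit(exit_code);     /* child: terminate; it stays a zombie until reaped */

    /* Parent: waitpid() collects the exit status and lets the kernel
     * release what is left of the terminated child. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the parent never called waitpid(), the child would remain in the zombie state for as long as the parent keeps running.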

 

We already know the various states of a process, so how does a process move between them? Let's take a look at a state transition diagram.

4.2 Process state transition diagram
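
The transitions between the states listed in 4.1 can also be written down as a small lookup table. The C sketch below is a simplification under my own assumptions: the enum names and the exact set of allowed transitions are illustrative, not taken from the kernel source.

```c
#include <stdbool.h>

enum toy_state {
    TOY_RUNNABLE,
    TOY_WAIT_INTERRUPTIBLE,
    TOY_WAIT_UNINTERRUPTIBLE,
    TOY_STOPPED,
    TOY_ZOMBIE,
    TOY_NUM_STATES
};

/* toy_allowed[from][to] is true when this sketch permits the transition. */
static const bool toy_allowed[TOY_NUM_STATES][TOY_NUM_STATES] = {
    /* A runnable process can block, be stopped by a signal, or exit. */
    [TOY_RUNNABLE] = {
        [TOY_WAIT_INTERRUPTIBLE]   = true,
        [TOY_WAIT_UNINTERRUPTIBLE] = true,
        [TOY_STOPPED]              = true,
        [TOY_ZOMBIE]               = true,
    },
    /* A waiting process becomes runnable when its event arrives. */
    [TOY_WAIT_INTERRUPTIBLE]   = { [TOY_RUNNABLE] = true },
    [TOY_WAIT_UNINTERRUPTIBLE] = { [TOY_RUNNABLE] = true },
    /* A stopped process becomes runnable again when it is continued. */
    [TOY_STOPPED]              = { [TOY_RUNNABLE] = true },
    /* TOY_ZOMBIE has no outgoing transitions: it can only be reaped. */
};

bool can_transition(enum toy_state from, enum toy_state to)
{
    return toy_allowed[from][to];
}
```

For example, can_transition(TOY_RUNNABLE, TOY_ZOMBIE) is true, while can_transition(TOY_ZOMBIE, TOY_RUNNABLE) is false.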

 

  After looking at the transitions between states, you should have a general picture of how a process behaves. The next question is how to schedule processes sensibly so that the system runs efficiently, so let's look at how processes are scheduled.

5. Process scheduling

5.1 What is process scheduling:

   The scheduler uses part of the process information to decide which process in the system most deserves to run, and combines it with each process's state to keep the system running fairly and efficiently. This information usually includes the class of the process (normal or real-time), the priority of the process, and so on.

   In plain terms, process scheduling means using this information, together with some rule, to decide which process runs and when. Organizing the execution of processes sensibly and efficiently is obviously very important. The rule is what we call a scheduling algorithm, which is described in detail below.

5.2 Scheduling Algorithm - CFS Scheduling Algorithm:

5.2.1 CFS principle:

CFS defines a new model: it maintains a virtual clock, vruntime, for each process in cfs_rq (the CFS run queue). While a process is executing, its vruntime keeps increasing as time passes (that is, with each tick). The vruntime of a process that is not executing stays unchanged. The key point is that the scheduler always picks the process with the smallest vruntime to run next. This is called "complete fairness". To distinguish processes of different priorities, the vruntime of a higher-priority process grows more slowly, so that it gets more opportunities to run.
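The "always pick the smallest vruntime" rule can be sketched in a few lines of C. This is only an illustration: the names vr_task and pick_next are hypothetical, and a linear scan over an array stands in for the red-black tree the kernel uses.

```c
#include <stddef.h>
#include <stdint.h>

struct vr_task {
    int pid;
    uint64_t vruntime;   /* virtual runtime, in arbitrary units */
};

/* Return the index of the task with the smallest vruntime, or -1 if the
 * queue is empty. A linear scan stands in for taking the leftmost node
 * of a red-black tree. */
int pick_next(const struct vr_task *rq, size_t n)
{
    size_t best = 0;

    if (n == 0)
        return -1;
    for (size_t i = 1; i < n; i++)
        if (rq[i].vruntime < rq[best].vruntime)
            best = i;
    return (int)best;
}
```

After the chosen task has run for a while, its vruntime is increased, so on the next call a different task may have the smallest value and be picked instead.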
      

5.2.2 CFS design ideas:

(Note: The design idea of this algorithm looks very simple, but some of its details are genuinely confusing. I can only give an overview here and will rewrite the details once I have studied them thoroughly. What follows is just my rough summary of an expert's explanation.)

  The idea can be summed up as assigning running time according to the weight of each process (how the weight is calculated is, of course, also very important).

So how is the time allocated to a process calculated? The formula is:

Running time allocated to a process = scheduling period * process weight / sum of all process weights
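
Here is the formula as a small C function with a worked example. The function name time_slice_ns and the numbers in the example (a 30 ms period and a second process of weight 2048) are assumptions chosen to make the arithmetic easy to follow.

```c
#include <stdint.h>

/* slice = period * weight / sum of all weights.
 * The multiplication is done first, in 64-bit arithmetic, so that the
 * integer division loses as little precision as possible. */
uint64_t time_slice_ns(uint64_t period_ns, uint64_t weight, uint64_t total_weight)
{
    if (total_weight == 0)
        return 0;   /* no runnable processes: nothing to allocate */
    return period_ns * weight / total_weight;
}
```

With a scheduling period of 30,000,000 ns and two processes of weight 1024 and 2048 (total 3072), the first gets 30,000,000 * 1024 / 3072 = 10,000,000 ns and the second gets 20,000,000 ns, so the heavier process receives twice the CPU time.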

I think the most important property of a scheduling algorithm is fairness, so where does the fairness of CFS show up?

In fact, fairness shows up in another quantity, the virtual runtime (vruntime). It records how long the process has run, but not directly: the actual running time is scaled up or down in proportion to the weight of the process.
Here is the formula for converting actual running time to vruntime:
vruntime = actual running time * 1024 / process weight

Here the expert points out that 1024 is the weight of a process whose nice value is 0; in the code it is NICE_0_LOAD. In other words, every process uses the nice-0 weight of 1024 as the baseline for how fast its own vruntime grows. Put simply, the idea of CFS is to keep the vruntime values of all scheduling entities chasing one another so that they stay balanced. Because each entity's vruntime grows at a different speed, an entity with a larger weight accumulates vruntime more slowly and therefore gets more CPU execution time.
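
The conversion can be checked with a few lines of C. The helper name vruntime_delta is hypothetical, and the weights 512 and 2048 are example values; only the constant 1024 for nice 0 comes from the text above.

```c
#include <stdint.h>

#define NICE_0_LOAD 1024   /* weight of a nice-0 process, as described above */

/* Convert a slice of actual running time into the amount by which
 * vruntime advances: delta * 1024 / weight. */
uint64_t vruntime_delta(uint64_t delta_exec_ns, uint64_t weight)
{
    if (weight == 0)
        return 0;   /* guard against division by zero in this sketch */
    return delta_exec_ns * NICE_0_LOAD / weight;
}
```

For 1,000,000 ns of real running time, a process of weight 1024 advances its vruntime by 1,000,000, a process of weight 2048 by only 500,000, and a process of weight 512 by 2,000,000. The heavier process therefore stays at the front of the queue longer.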

5.2.3 Data Structure of CFS

 

Changes to struct task_struct

CFS removes struct prio_array and introduces scheduling entities and scheduling classes, defined by struct sched_entity and struct sched_class respectively. As a result, task_struct now contains information about both of these structures:

Listing 1. Changes to the task_struct structure

struct task_struct { /* Defined in 2.6.23:/usr/include/linux/sched.h */
....
-   struct prio_array *array;
+  struct sched_entity se;
+  struct sched_class *sched_class;
   ....
   ....
};

struct sched_entity

This structure contains complete information for implementing scheduling of individual tasks or groups of tasks. It can be used to implement group scheduling. A scheduling entity may not be associated with a process.

Listing 2. sched_entity structure

struct sched_entity { /* Defined in 2.6.23:/usr/include/linux/sched.h */
 long wait_runtime;   /* Amount of time the entity must run to become completely */
                      /* fair and balanced.*/
 s64 fair_key;
 struct load_weight   load;         /* for load-balancing */
 struct rb_node run_node;            /* To be part of Red-black tree data structure */
 unsigned int on_rq; 
 ....
};

struct sched_class

The scheduler classes form a chain of modules that assist the kernel scheduler in its work. Each scheduler module needs to implement the set of functions suggested by struct sched_class.

Listing 3. sched_class structure
struct sched_class { /* Defined in 2.6.23:/usr/include/linux/sched.h */
      struct sched_class *next;
      void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
      void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
      void (*yield_task) (struct rq *rq, struct task_struct *p);
 
      void (*check_preempt_curr) (struct rq *rq, struct task_struct *p);
 
      struct task_struct * (*pick_next_task) (struct rq *rq);
      void (*put_prev_task) (struct rq *rq, struct task_struct *p);
 
      unsigned long (*load_balance) (struct rq *this_rq, int this_cpu,
                 struct rq *busiest,
                 unsigned long max_nr_move, unsigned long max_load_move,
                 struct sched_domain *sd, enum cpu_idle_type idle,
                 int *all_pinned, int *this_best_prio);
 
      void (*set_curr_task) (struct rq *rq);
      void (*task_tick) (struct rq *rq, struct task_struct *p);
      void (*task_new) (struct rq *rq, struct task_struct *p);
};

 

  • 1. enqueue_task: Called when a task becomes runnable. It puts the scheduling entity (process) into the red-black tree and increments the nr_running variable by 1.
  • 2. dequeue_task: Called when a task is no longer runnable. It removes the corresponding scheduling entity from the red-black tree and decrements the nr_running variable by 1.
  • 3. yield_task: When the compat_yield sysctl is turned off, this function is essentially a dequeue followed by an enqueue; when it is turned on, the function places the scheduling entity at the far right of the red-black tree.
  • 4. check_preempt_curr: Checks whether the currently running task should be preempted. The CFS scheduler module performs a fairness test before actually preempting the running task. This drives wakeup preemption.
  • 5. pick_next_task: Selects the most appropriate process to run next.
  • 6. load_balance: Each scheduler module implements two functions, load_balance_start() and load_balance_next(), which together form an iterator that is called from the module's load_balance routine. The kernel scheduler uses this method to balance the load of the processes managed by the scheduling module.
  • 7. set_curr_task: Called when a task changes its scheduling class or its task group.
  • 8. task_tick: Usually called from the timer tick function; it may cause a process switch. This drives runtime preemption.
  • 9. task_new: Gives the scheduling module an opportunity to manage the startup of new tasks. The CFS scheduling module uses it for group scheduling, while the scheduling module for real-time tasks does not use it.
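
To show how a chain of scheduling classes is used, here is a heavily simplified C sketch with only the next pointer and a pick_next_task hook. Everything prefixed with toy_ is hypothetical, the run queue is reduced to two PID fields, and the ordering of the real-time class before the fair class is my assumption about the chain.

```c
#include <stddef.h>

/* Hypothetical run queue: one candidate PID per class, -1 if none. */
struct toy_rq {
    int rt_pid;     /* next real-time task */
    int fair_pid;   /* next normal (CFS) task */
};

struct toy_sched_class {
    const struct toy_sched_class *next;         /* next class in the chain */
    int (*pick_next_task)(struct toy_rq *rq);   /* returns a PID or -1 */
};

static int toy_pick_rt(struct toy_rq *rq)   { return rq->rt_pid; }
static int toy_pick_fair(struct toy_rq *rq) { return rq->fair_pid; }

static const struct toy_sched_class toy_fair_class = { NULL, toy_pick_fair };
static const struct toy_sched_class toy_rt_class   = { &toy_fair_class, toy_pick_rt };

/* Walk the chain from the first class and take the first task offered.
 * Returns 0 (standing in for the idle task) when no class has work. */
int toy_pick_next(struct toy_rq *rq)
{
    const struct toy_sched_class *c;

    for (c = &toy_rt_class; c != NULL; c = c->next) {
        int pid = c->pick_next_task(rq);
        if (pid >= 0)
            return pid;
    }
    return 0;
}
```

The core scheduler never needs to know how a class chooses its task; it only calls through the function pointers, which is why new scheduling policies can be added as separate modules.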

6. Views on the operating system process model

          While exploring the process model, I was gradually won over by the Linux design. I don't know the process models of other systems, so I can't compare them. Based on what I have learned during this period, I will briefly share some immature views; after all, I still know very little about processes. First, if you want a good user experience, the process model really matters, and within the process model the scheduling algorithm matters even more. I think CFS scheduling is done very well. The Linux scheduling algorithm has also kept improving: the O(1) scheduler of the early 2.6 kernels was a classic, but the kernel scheduler was later changed to an algorithm built on a red-black tree. Scheduling is a complex problem, but its job is very clear: select the "appropriate process" to execute, and then execute it. The process model is more than the scheduling algorithm, though; the transitions between process states are also very important. I suspect the state model would be hard to change, and optimizing the scheduling algorithm is not a simple problem either. Personally, I cannot find anything wrong with the existing process model.

 

Links to related materials

https://blog.csdn.net/dyllove98/article/details/9281081

https://blog.csdn.net/zjf280441589/article/details/43339007

http://blog.sina.com.cn/s/blog_79e165ef0102wcvz.html

https://blog.csdn.net/fdssdfdsf/article/details/7894211

 
