Analysis of Process Model Based on Linux

About this blog:

    First, as a student, I am writing this blog to meet the requirements of a course; second, it is the result of my own study of the Linux process model. If there are any mistakes or omissions, please point them out, and I will gladly correct them.

 

1. Introduction to Linux

       As the title says, this blog analyzes the Linux process model. To understand it in depth, we first need to clarify some basic concepts, which form the basis for the later discussion.

1.1 The concept of Linux

        Linux is a family of Unix-like operating systems. It is a multi-user, multi-tasking operating system that supports multi-threading and multiple CPUs, and it follows POSIX and Unix conventions. It inherits the network-centric design philosophy of Unix and is a stable multi-user network operating system.

        There are many versions of Linux, but they all use the Linux kernel. Linux can be installed on a variety of computer hardware devices such as mobile phones, tablets, routers, desktops, supercomputers, and more.

        Linux was born on October 5, 1991 (the date of the official announcement). Its founder is Linus Torvalds, and more than 100 programmers later took part in writing and modifying the Linux kernel code. In March 1994, Linux 1.0 was released with 170,000 lines of code. At that time Linux was free of charge and freely distributable, and this has remained true ever since.

1.2 The difference between Linux and Unix

       This section is something of an aside. I am including it because, while learning the Linux operating system, I found that many people confuse Linux and Unix and treat them as the same thing. I knew they were different concepts, but I did not know exactly what the differences were. They are explained here so that readers can tell the two apart.

       ① As mentioned above, Linux is an operating system based on POSIX (Portable Operating System Interface), so it was born after POSIX, whereas Unix was born long before POSIX. Because of this difference in age, Unix was traditionally used mainly from the command line, while Linux distributions commonly ship with a graphical window environment, although both support command-line and graphical use.

       ② Unix is powerful and can be used on many different platforms. Linux is similar to it in this respect (and is often said to perform better), but the two are only similar: Linux does not derive from the source code of any version of Unix; it is a Unix-like system.

       ③ As mentioned before, Linux is free of charge and freely distributable, while commercial Unix systems generally have to be paid for. We can get many versions of Linux, and the application software developed for it, without spending money.

       There are many more differences between them, so I will not list them all here. For complete details, please refer to the links at the end of the blog.

2. Introduction to the process

2.1 The definition of a process

       The term "process" has two definitions, a narrow one and a broad one.

       In the narrow sense, a process is an instance of a running program. In the broad sense, a process is one execution of a program with some independent function over a particular data set. It is the basic unit of dynamic execution in the operating system. In a traditional operating system, the process is both the basic unit of resource allocation and the basic unit of execution.

2.2 The characteristics of a process

       Processes have some basic characteristics:

  • Dynamic: a process is, in essence, one execution of a program in a multiprogramming system; processes are created and destroyed dynamically;
  • Concurrency: any process can execute concurrently with other processes;
  • Independence: a process is a basic unit that can run independently, and it is also the independent unit to which the system allocates resources and which it schedules;
  • Asynchrony: because processes constrain one another, each process executes intermittently, that is, processes advance at their own independent and unpredictable speeds;
  • Structure: a process consists of three parts: the program segment, the data segment and the process control block.

       Several different processes can contain the same program: run on different data sets, one program forms different processes and can produce different results. The program itself, however, does not change during execution.

2.3 The difference between a process and a program or thread

2.3.1 The difference between process and program

       The fundamental difference between a process (Process) and a program (Program) is that a program is static and does nothing by itself. Once the processor starts to execute it, it becomes a process, which is dynamic.

       Due to the fundamental differences between the two, their functions are also quite different:

  • A program is an ordered collection of instructions and data, which can exist for a long time as a kind of software data and is permanent; while a process has a certain life cycle and is temporary.
  • A process describes concurrency more faithfully than a program can, because a program is static.
  • A process can create other processes, whereas a program cannot create other programs, because a program is static.
  • One program can correspond to multiple processes, as long as the same program runs on different data sets.
  • In traditional operating systems, programs cannot run independently, and the basic unit for resource allocation and independent operation is a process rather than a program.
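       The points above about one program yielding several processes, and about processes creating other processes, can be sketched with the POSIX fork() and waitpid() calls. This is a minimal sketch rather than anything from the original blog; the helper name spawn_and_wait is hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then
 * return the exit status seen by the parent, or -1 on error. */
int spawn_and_wait(int code)
{
    pid_t pid = fork();      /* one program, now two processes */
    if (pid < 0)
        return -1;           /* fork failed */
    if (pid == 0)
        _exit(code);         /* child: a separate process with its own lifetime */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;           /* waiting for the child failed */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

       Calling spawn_and_wait(7) from a main function runs the same program text in two processes; the parent receives 7 once the temporary child process has ended.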

2.3.2 The difference between process and thread

       A thread is a basic unit smaller than a process. Usually a process contains several threads, which share the resources owned by the process. Since the introduction of threads, the process has been regarded as the basic unit of resource allocation, and the thread as the basic unit of independent execution and scheduling. Because a thread is smaller and owns almost no system resources of its own, scheduling it costs less, so the degree of concurrency among programs can be raised more efficiently.
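       To illustrate threads using the resources of their process, here is a minimal sketch with POSIX threads. The names shared_counter, worker and run_two_threads are made up for this example, and the program may need to be linked with -pthread on some systems.

```c
#include <pthread.h>
#include <stddef.h>

static int shared_counter = 0;   /* belongs to the process, visible to all its threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the shared counter n times. */
static void *worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Hypothetical helper: run two threads and return the final counter, or -1 on error. */
int run_two_threads(int n)
{
    pthread_t a, b;
    shared_counter = 0;
    if (pthread_create(&a, NULL, worker, &n) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, &n) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

       Both threads update the same variable because they live in one address space; two separate processes would each have their own copy.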

3. The operating system organizes the process

       The structural characteristics above state that a process consists of three parts: the program segment, the data segment and the process control block. The way the operating system organizes processes is tied to these three parts.

3.1. Process control block (PCB)

       When a process is created, the system creates a PCB for the process. When the process is executed, the system knows the current state information of the process through its PCB in order to control and manage it; when the process ends, the system reclaims its PCB and the process dies. The operating system manages and controls the process through the PCB table. The PCB is not only a part of the process, but also the only sign of the existence of the process.

       PCB usually contains the following contents:
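       As a rough illustration only, a PCB might be sketched as the C structure below. Every field name here is an assumption chosen for the example; it is not the list from the original post, and it is not the Linux task_struct.

```c
#include <stddef.h>
#include <stdint.h>

enum proc_state { PROC_READY, PROC_RUNNING, PROC_BLOCKED };

/* Hypothetical, simplified PCB layout. */
struct pcb {
    int             pid;              /* process identifier */
    int             parent_pid;       /* identifier of the creating process */
    enum proc_state state;            /* current scheduling state */
    int             priority;         /* scheduling priority */
    uintptr_t       program_counter;  /* saved CPU context: next instruction */
    uintptr_t       registers[16];    /* saved CPU context: general registers */
    void           *program_segment;  /* where the code lives */
    void           *data_segment;     /* where the data lives */
    struct pcb     *next;             /* link field for PCB queues */
};

/* Fill in a fresh PCB; a new process starts in the ready state. */
void pcb_init(struct pcb *p, int pid, int parent_pid, int priority)
{
    *p = (struct pcb){0};             /* zero all fields, pointers become NULL */
    p->pid = pid;
    p->parent_pid = parent_pid;
    p->priority = priority;
    p->state = PROC_READY;
}
```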

3.2. Program segment

       A program segment is a program code segment that can be scheduled by the process scheduler to the CPU for execution.

       Note: a program can be shared by multiple processes, that is, multiple processes can run the same program.

3.3. Data segment

       The data segment of a process can be the original data processed by the program corresponding to the process, or the intermediate or final result generated when the program is executed.

3.4 How the process is organized

3.4.1. Linear mode

       This method is used when the number of processes is small. All PCBs are organized in one linear table, and the first address of the table is stored in a dedicated area of memory. The entire table has to be scanned for every lookup.
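       A minimal sketch of the linear method, assuming a fixed-size table; pcb_table, pcb_alloc and pcb_find are hypothetical names. The lookup shows why every search costs a scan of the table.

```c
#include <stddef.h>

#define MAX_PROCS 8

struct pcb {
    int pid;
    int in_use;   /* 0 = free slot, 1 = occupied */
};

/* The single linear table that holds every PCB. */
static struct pcb pcb_table[MAX_PROCS];

/* Take the first free slot for a new process; NULL if the table is full. */
struct pcb *pcb_alloc(int pid)
{
    for (size_t i = 0; i < MAX_PROCS; i++) {
        if (!pcb_table[i].in_use) {
            pcb_table[i].pid = pid;
            pcb_table[i].in_use = 1;
            return &pcb_table[i];
        }
    }
    return NULL;
}

/* Linear search: scans the table until the pid is found. */
struct pcb *pcb_find(int pid)
{
    for (size_t i = 0; i < MAX_PROCS; i++) {
        if (pcb_table[i].in_use && pcb_table[i].pid == pid)
            return &pcb_table[i];
    }
    return NULL;
}
```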

3.4.2. Link method

       PCBs in the same state are linked into a queue through the link field stored in each PCB, and the PCBs themselves are kept in a contiguous storage area.
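       The link method can be sketched as one FIFO queue per state, chained through a link field in each PCB. The type and function names below are assumptions made for this example.

```c
#include <stddef.h>

struct pcb {
    int         pid;
    struct pcb *next;   /* the link field inside the PCB */
};

/* One queue per state, for example a ready queue. */
struct pcb_queue {
    struct pcb *head;
    struct pcb *tail;
};

/* Append a PCB to the tail of a state queue. */
void queue_push(struct pcb_queue *q, struct pcb *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
}

/* Remove and return the PCB at the head, or NULL if the queue is empty. */
struct pcb *queue_pop(struct pcb_queue *q)
{
    struct pcb *p = q->head;
    if (!p)
        return NULL;
    q->head = p->next;
    if (!q->head)
        q->tail = NULL;
    p->next = NULL;
    return p;
}
```

       Moving a process from one state to another then amounts to popping its PCB from one queue and pushing it onto another, with no table scan.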

       The schematic diagram is as follows:

3.4.3. Index method

       An index table is set up for each process state. Each entry of an index table records the address, within the PCB table, of a PCB that is in the corresponding state, and the first address of each index table is in turn recorded in a dedicated memory unit.
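       A minimal sketch of the index method, assuming one fixed-size index table per state. All names are hypothetical, and only two states are modelled to keep the example short.

```c
#include <stddef.h>

#define MAX_PROCS 8

enum proc_state { ST_READY, ST_BLOCKED, ST_COUNT };

struct pcb {
    int pid;
};

/* An index table stores addresses of PCBs that share one state. */
struct index_table {
    struct pcb *slots[MAX_PROCS];
    size_t      count;
};

static struct pcb pcb_table[MAX_PROCS];            /* the PCB table itself */
static struct index_table index_tables[ST_COUNT];  /* one index table per state */

/* Record that PCB p is in state st; returns -1 if that index table is full. */
int index_add(enum proc_state st, struct pcb *p)
{
    struct index_table *t = &index_tables[st];
    if (t->count == MAX_PROCS)
        return -1;
    t->slots[t->count++] = p;
    return 0;
}
```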

       The schematic diagram is as follows:

4. Process state transition

       As mentioned earlier, a process executes intermittently, which means that it can be in one of several states. A process has three basic states:

(1) Ready state (Ready):

       The process has obtained the required resources other than the processor and is waiting for processor resources to be allocated; the process can be executed as long as the processor is allocated. Ready processes can be queued by multiple priorities.

(2) Running state (Running):

       The process occupies a processor; the number of processes in this state is less than or equal to the number of processors. When there is no other process to execute (for example, when all processes are blocked), the system's idle process is usually run automatically.

(3) Blocked state (Blocked):

       The process is waiting for some condition (such as the completion of an I/O operation or process synchronization) and cannot continue until the condition is met. Even if a processor were allocated to the process before that event occurs, it could not run.
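       The transitions allowed between these three basic states can be sketched as a small function. This is the textbook three-state model only, not kernel code, and the names are assumptions for this example.

```c
#include <stdbool.h>

enum proc_state { READY, RUNNING, BLOCKED };

/* Returns true if the textbook three-state model allows from -> to. */
bool can_transition(enum proc_state from, enum proc_state to)
{
    switch (from) {
    case READY:
        return to == RUNNING;                  /* the scheduler dispatches it */
    case RUNNING:
        return to == READY || to == BLOCKED;   /* preempted, or starts waiting */
    case BLOCKED:
        return to == READY;                    /* the awaited event occurred */
    }
    return false;
}
```

       Note that can_transition(BLOCKED, RUNNING) is false: a blocked process must first become ready and then wait to be given a processor.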

       Linux defines the following process states:

(1) TASK_RUNNING: ready state or running state, the process is ready and can run, but not necessarily occupying the CPU;

(2) TASK_INTERRUPTIBLE: sleep state; the process is in a light sleep and can respond to signals. This is generally the state of a process that goes to sleep voluntarily;

(3) TASK_UNINTERRUPTIBLE: sleep state; the process is in a deep sleep and does not respond to signals. A typical scenario is a process blocked while waiting to acquire a semaphore;

(4) TASK_ZOMBIE: zombie state; the process has exited or ended (newer kernels name this state EXIT_ZOMBIE);

(5) TASK_STOPPED: stopped state, for example while the process is being debugged.
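       These states are what tools such as ps report as one-letter codes. The mapping below is a user-space sketch; the enum is hypothetical and does not use the kernel's numeric values, while the letters are the ones documented for ps.

```c
/* Hypothetical enum for illustration; not the kernel's constants. */
enum linux_state_sketch {
    ST_RUNNING,          /* TASK_RUNNING */
    ST_INTERRUPTIBLE,    /* TASK_INTERRUPTIBLE */
    ST_UNINTERRUPTIBLE,  /* TASK_UNINTERRUPTIBLE */
    ST_ZOMBIE,           /* zombie */
    ST_STOPPED           /* stopped */
};

/* Map a state to the letter that ps shows in its STAT column. */
char state_letter(enum linux_state_sketch s)
{
    switch (s) {
    case ST_RUNNING:         return 'R';  /* running or runnable */
    case ST_INTERRUPTIBLE:   return 'S';  /* interruptible sleep */
    case ST_UNINTERRUPTIBLE: return 'D';  /* uninterruptible sleep */
    case ST_ZOMBIE:          return 'Z';  /* defunct process */
    case ST_STOPPED:         return 'T';  /* stopped */
    }
    return '?';
}
```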

        Schematic:

Image source: https://img-blog.csdn.net/20160827200931165?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQv/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center

5. Process scheduling

       Modern operating systems are multi-tasking operating systems. Although hardware now has more and more processor cores, there is still no guarantee that each process gets a core of its own. A management unit is therefore needed to schedule processes and to decide which process uses the CPU at the next moment. This management unit is the process scheduler.

 (1) The structure of the scheduler

       In the Linux kernel, the scheduler can be divided into two layers. The layer that is called directly on behalf of a process is called the generic scheduler, or core scheduler. It is a separate component and does not deal with the details of individual processes itself; instead, it manages processes through the specific scheduler classes of the second layer. Each process must belong to one scheduler class, and Linux implements different scheduler classes for different needs.

(2) Scheduler class

       A scheduler class framework is implemented in the Linux kernel, which defines the functions that the scheduler should implement, and each specific scheduler class must implement these functions.

       The scheduler class is defined as follows:

 struct sched_class 
 {
     const struct sched_class *next;
 
     void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags);
     void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags);
     void (*yield_task) (struct rq *rq);
     bool (*yield_to_task) (struct rq *rq, struct task_struct *p, bool preempt);
 
     void (*check_preempt_curr) (struct rq *rq, struct task_struct *p, int flags);
 
     struct task_struct * (*pick_next_task) (struct rq *rq);
     void (*put_prev_task) (struct rq *rq, struct task_struct *p);
 
 #ifdef CONFIG_SMP
     int  (*select_task_rq)(struct task_struct *p, int sd_flag, int flags);
     void (*migrate_task_rq)(struct task_struct *p, int next_cpu);
 
     void (*pre_schedule) (struct rq *this_rq, struct task_struct *task);
     void (*post_schedule) (struct rq *this_rq);
     void (*task_waking) (struct task_struct *task);
     void (*task_woken) (struct rq *this_rq, struct task_struct *task);
 
     void (*set_cpus_allowed)(struct task_struct *p,
                  const struct cpumask *newmask);
 
     void (*rq_online)(struct rq *rq);
     void (*rq_offline)(struct rq *rq);
 #endif
 
     void (*set_curr_task) (struct rq *rq);
     void (*task_tick) (struct rq *rq, struct task_struct *p, int queued);
     void (*task_fork) (struct task_struct *p);
 
     void (*switched_from) (struct rq *this_rq, struct task_struct *task);
     void (*switched_to) (struct rq *this_rq, struct task_struct *task);
     void (*prio_changed) (struct rq *this_rq, struct task_struct *task,
                  int oldprio);
 
     unsigned int (*get_rr_interval) (struct rq *rq,
                      struct task_struct *task);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
     void (*task_move_group) (struct task_struct *p, int on_rq);
 #endif
 };
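       How a core scheduler can work through such a table of function pointers can be sketched in user space. This is a deliberately reduced model: the structure sched_class_sketch, the two one-slot "queues" and all function names are assumptions for this example, and only pick_next_task is modelled.

```c
#include <stddef.h>

struct task {
    int id;
};

/* Reduced stand-in for struct sched_class: a link plus one operation. */
struct sched_class_sketch {
    const struct sched_class_sketch *next;    /* next class, lower priority */
    struct task *(*pick_next_task)(void);     /* class-specific choice */
};

/* Each class owns a one-slot "queue" to keep the sketch short. */
static struct task *rt_queue;
static struct task *fair_queue;

static struct task *rt_pick(void)
{
    struct task *t = rt_queue;
    rt_queue = NULL;
    return t;
}

static struct task *fair_pick(void)
{
    struct task *t = fair_queue;
    fair_queue = NULL;
    return t;
}

static const struct sched_class_sketch fair_class = { NULL, fair_pick };
static const struct sched_class_sketch rt_class = { &fair_class, rt_pick };

/* Core scheduler: ask each class in priority order until one has a task. */
struct task *core_pick_next(void)
{
    for (const struct sched_class_sketch *c = &rt_class; c; c = c->next) {
        struct task *t = c->pick_next_task();
        if (t)
            return t;
    }
    return NULL;
}
```

       The core loop never inspects how a class stores its tasks; it only calls through the function pointer, which is the role the framework described above gives to each scheduler class.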

 (3) Scheduling entity

       Linux can schedule not only a single process but also a group of processes, so it abstracts the object being scheduled into a scheduling entity. The scheduler operates directly on scheduling entities, and it obtains the process structure that an entity belongs to from the entity itself only when that structure is needed.
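       Getting from an embedded scheduling entity back to the structure that contains it can be sketched in user space with offsetof. The macro name container_of_sketch and the two small structures are assumptions for this example; they only mimic the idea of an entity embedded in a task structure.

```c
#include <stddef.h>

/* Recover a pointer to the enclosing structure from a pointer to one member. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct entity_sketch {
    unsigned long long vruntime;
};

struct task_sketch {
    int                  pid;
    struct entity_sketch se;   /* the scheduling entity lives inside the task */
};

/* Given only the entity, find the task that embeds it. */
struct task_sketch *task_of_sketch(struct entity_sketch *se)
{
    return container_of_sketch(se, struct task_sketch, se);
}
```

       With this layout the scheduler can keep queues of entities only and still reach the owning task when it has to.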

struct sched_entity 
{
    struct load_weight    load;        /* for load-balancing */
    struct rb_node        run_node;
    struct list_head    group_node;
    unsigned int        on_rq;

    u64            exec_start;
    u64            sum_exec_runtime;
    u64            vruntime;
    u64            prev_sum_exec_runtime;

    u64            nr_migrations;

#ifdef CONFIG_SCHEDSTATS
    struct sched_statistics statistics;
#endif

#ifdef CONFIG_FAIR_GROUP_SCHED
    struct sched_entity    *parent;
    /* rq on which this entity is (to be) queued: */
    struct cfs_rq        *cfs_rq;
    /* rq "owned" by this entity/group: */
    struct cfs_rq        *my_q;
#endif

#ifdef CONFIG_SMP
    /* Per-entity load-tracking */
    struct sched_avg    avg;
#endif
};

       Explanation:

       load holds the entity's load weight, which determines the entity's share of the total load of the queue. Computing load weights is one of the scheduler's main responsibilities, because the choice of the next process depends on this information. run_node is a red-black tree node, used to insert the entity into the red-black tree. on_rq indicates whether the entity is on the ready queue; a value of 1 means it is. A process is taken from the ready queue when it is scheduled, and it is added back to the ready queue when it gives up the CPU (in the case of normal scheduling, excluding sleeping and waiting).

       Among the time-related fields, exec_start records the time at which the process started running on the CPU, sum_exec_runtime records the total time the process has run on the CPU, and prev_sum_exec_runtime records how long the process had run before the current scheduling period. When the process is taken off the CPU, the value of sum_exec_runtime is saved to prev_sum_exec_runtime; sum_exec_runtime is not reset and keeps growing as the process runs on the CPU. vruntime records the time that has elapsed on the virtual clock while the process was executing.
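       The relation between real run time, load weight and vruntime can be sketched as follows. This is a simplification under stated assumptions: an array scan stands in for the red-black tree, the structure entity_sketch is hypothetical, and 1024 is taken as the reference weight of a default-priority task.

```c
#include <stddef.h>
#include <stdint.h>

#define NICE_0_WEIGHT 1024u   /* assumed reference weight of a nice-0 task */

struct entity_sketch {
    uint64_t vruntime;           /* virtual run time */
    uint64_t sum_exec_runtime;   /* total real run time */
    unsigned weight;             /* load weight, must be nonzero */
};

/* Charge delta_exec of real time; heavier entities age more slowly. */
void account_runtime(struct entity_sketch *se, uint64_t delta_exec)
{
    if (se->weight == 0)
        return;                  /* guard against division by zero */
    se->sum_exec_runtime += delta_exec;
    se->vruntime += delta_exec * NICE_0_WEIGHT / se->weight;
}

/* Pick the entity with the smallest vruntime (linear scan, not a tree). */
struct entity_sketch *pick_min_vruntime(struct entity_sketch *ents, size_t n)
{
    struct entity_sketch *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!best || ents[i].vruntime < best->vruntime)
            best = &ents[i];
    }
    return best;
}
```

       After the same amount of real run time, an entity with weight 2048 has accumulated half the vruntime of one with weight 1024, so it is picked again sooner; this is how the load weight influences which process runs next.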

6. Personal views on the operating system process model

       While learning the process model of the Linux operating system, I found that much of what I learned about processes was new to me and had not been covered in my earlier computer courses. Yet the invention of the process was undoubtedly crucial in the history of operating systems. If the operation of the whole operating system is likened to a grand stage play, then processes are the hard-working crew behind the scenes: ordinary people pay little attention to them and do not understand them, but they are always there, working silently. An operating system cannot run without processes, just as a stage play cannot be staged without the people behind the scenes.

       As time goes on, people's needs for computer functions become more and more diverse, and the algorithms inside computers become more and more complex. Process scheduling, as a key link, is therefore bound to be complex as well. The scheduling material shown in this blog is only the basics, a small part of a very large mechanism. I know that I have not yet learned the deeper aspects of the Linux process model, and I will keep studying to master more of it. I have also come to appreciate the great effort that world-renowned programmers have put into improving computer performance.

 

Finally, some links for further reading:

1. The difference between Linux and Unix: https://www.2cto.com/os/201109/104824.html

2. About Linux scheduling algorithm: https://www.cnblogs.com/ck1020/p/6089970.html

 
