Analysis of Process Management Based on Linux Operating System

1. What is a process

 1.1 Concept

  All the runnable software on a computer, usually including the operating system itself, is organized into a number of sequential processes, or just processes for short.

  A process is an instance of an executing program, including the current values of the program counter, registers, and variables. A process is an activity of some kind: it has a program, inputs, outputs, and a state. A single processor may be shared among several processes, with some scheduling algorithm used to decide when to stop working on one process and turn to serving another.

2. How processes are organized

 2.1 Process Control Block (PCB)

  The main active entity in a Linux system is a process.

  Each process runs a separate program and initially has a single thread of control. In other words, each process has its own program counter, which keeps track of the next instruction to be executed.

  Each process's attributes are kept in a data structure called the process control block (PCB), which can be understood as the collection of a process's attributes, created and managed by the operating system. Every process has a process control block in the kernel that maintains its process-related information. In the Linux kernel, the process control block is the task_struct structure.
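
  As a rough illustration, the following heavily simplified excerpt shows the kind of fields task_struct holds (the field names follow the 2.6-era kernel; the real structure contains many more members):

struct task_struct {
    volatile long state;          // process state (TASK_RUNNING, TASK_INTERRUPTIBLE, ...)
    pid_t pid;                    // process identifier
    int prio, static_prio;        // dynamic and static priority
    struct mm_struct *mm;         // memory (address space) descriptor
    struct task_struct *parent;   // parent process
    struct list_head children;    // list of this process's children
    char comm[16];                // name of the executable
    /* ... plus open files, signal handlers, scheduling information, and much more ... */
};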

 2.2 Process Identifier (PID)

  The process identifier fields are defined in task_struct:

pid_t pid;      // the id used to identify the process in the kernel
pid_t tgid;     // thread group id, used to implement the thread mechanism

// the kernel's internal representation of a PID
struct pid
{
    atomic_t count;                       // reference count
    unsigned int level;                   // depth of the PID namespace hierarchy this pid belongs to
    /* lists of tasks that use this pid */
    struct hlist_head tasks[PIDTYPE_MAX];
    struct rcu_head rcu;
    struct upid numbers[1];               // one upid per namespace level
};

   Each process has a unique identifier (PID), and the kernel uses this identifier to identify different processes. At the same time, the process identifier (PID) is also an interface provided by the kernel to the user program, and the user program issues commands to the process through the PID.

  PIDs are 32-bit unsigned integers and are assigned sequentially: the PID of a newly created process is normally the PID of the previously created process plus one. However, for compatibility with traditional Unix systems developed for 16-bit hardware platforms, the maximum PID number allowed on Linux is 32767 by default; once the kernel has created the 32768th process, it must start recycling lower, unused PID numbers. On 64-bit systems the maximum PID can be raised to 4194303.
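
  For example, a user program can obtain its own PID and use another process's PID to send it a signal. A minimal user-space sketch using standard POSIX calls (not tied to any particular kernel version):

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    printf("my pid: %d, parent pid: %d\n", (int)getpid(), (int)getppid());

    pid_t child = fork();            // create a new process; the child receives a fresh PID
    if (child == 0) {                // child: just wait until a signal arrives
        pause();
        return 0;
    }
    kill(child, SIGTERM);            // parent: issue a command to the child through its PID
    wait(NULL);                      // collect the child's exit status (otherwise it stays a zombie)
    return 0;
}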

3. Process status

 3.1 Process states

 #define TASK_RUNNING  

  1. Indicates that the process is either running or ready to run.

 #define TASK_INTERRUPTIBLE  

  2. Indicates that the process is blocked (sleeping); its state is set back to TASK_RUNNING only when the condition it is waiting for becomes true. It can be woken up both by signals and by wake_up().

 #define TASK_UNINTERRUPTIBLE 

  3. Indicates that the process is blocked (sleeping); its state is set back to TASK_RUNNING only when the condition it is waiting for becomes true. Unlike TASK_INTERRUPTIBLE, it cannot be woken up by signals, only by wake_up().

 #define TASK_STOPPED 

  4. Indicates that the process is stopped.

 #define TASK_TRACED  

  5. Indicates that the process is being traced by another process, such as a debugger (via ptrace).

 #define EXIT_ZOMBIE 

  6. Indicates that the process has terminated, but its parent process has not yet used a wait()-family system call to retrieve its termination status.

 #define EXIT_DEAD 

  7. Represents the final state of the process: the parent has retrieved the exit status and the process is being removed from the system.
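
  As a rough kernel-style sketch (simplified, not an exact excerpt from any kernel version; condition and my_wait_queue are illustrative names), this is how a code path typically puts a task to sleep in TASK_INTERRUPTIBLE and how another path wakes it back up to TASK_RUNNING:

DECLARE_WAIT_QUEUE_HEAD(my_wait_queue);
int condition = 0;

// sleeping side: the task stays in TASK_INTERRUPTIBLE until condition becomes true;
// because the sleep is interruptible, a signal can also wake it up early
wait_event_interruptible(my_wait_queue, condition);

// waking side: make the condition true and move the sleeping task back to TASK_RUNNING
condition = 1;
wake_up(&my_wait_queue);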

 3.2 State transition diagram

4. O(1) scheduling algorithm under Linux

  The Linux O(1) scheduler is a historically popular Linux scheduler. It gets its name from its ability to perform scheduling operations, such as selecting a task from the run queue or inserting a task into it, in constant time, independent of the total number of tasks in the system.

 4.1 O(1) Scheduler

  In the O(1) scheduler, the most important data structure is the run queue. The run queue describes the queue of runnable processes and is represented by the runqueue structure in the kernel source code.

struct runqueue 
{
    unsigned long nr_running;          // number of runnable tasks on this run queue
    task_t *curr;                      // the task currently running on this CPU
    prio_array_t *active, *expired;    // pointers to the active and expired priority arrays
    prio_array_t arrays[2];            // the two priority arrays themselves
};

 4.2 Priority array

  Another core data structure of the O(1) algorithm is the prio_array structure. It contains an array of queues indexed by dynamic priority: each entry is a linked list of the processes that currently have that priority.

#define MAX_USER_RT_PRIO    100
#define MAX_RT_PRIO         MAX_USER_RT_PRIO
#define MAX_PRIO            (MAX_RT_PRIO + 40)

typedef struct prio_array prio_array_t;
struct prio_array
{
    unsigned int nr_active;              // total number of runnable tasks in this array
    unsigned long bitmap[BITMAP_SIZE];   // one bit per priority level, set if that queue is non-empty
    struct list_head queue[MAX_PRIO];    // one task list per priority level (0..139)
};
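
  The bitmap is what makes the constant-time lookup possible: finding the next task to run only requires finding the first set bit and taking the head of the corresponding list. A rough sketch of that lookup (close in spirit to, but not copied from, the 2.6 kernel's schedule()):

int idx = sched_find_first_bit(array->bitmap);              // highest-priority non-empty queue
struct list_head *queue = array->queue + idx;               // list of tasks at that priority
task_t *next = list_entry(queue->next, task_t, run_list);   // its first task runs next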

 4.3 Static Priority and Dynamic Priority

  A process has two priorities: a static priority and a dynamic priority. The static priority is used to calculate the length of the process's time slice; the dynamic priority is used by the scheduler when it makes scheduling decisions, and each time it selects the runnable process with the highest dynamic priority.

Calculation of static priority:
The relationship between the nice value and the static priority is: static priority = 100 + nice + 20.
Since the nice value ranges from -20 to 19, the static priority of an ordinary process ranges from 100 to 139.
Calculation of dynamic priority:
dynamic priority = max(100, min(static priority - bonus + 5, 139))
where bonus (0 to 10) is derived from the process's average sleep time, so interactive processes get a priority boost.
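
  As a quick check of these formulas, here is a standalone sketch (ordinary user-space C, not kernel code; the function names are made up for this example):

#include <stdio.h>

// static priority = 100 + nice + 20, with nice in [-20, 19]
static int static_priority(int nice)
{
    return 100 + nice + 20;
}

// dynamic priority = max(100, min(static priority - bonus + 5, 139)), with bonus in [0, 10]
static int dynamic_priority(int static_prio, int bonus)
{
    int prio = static_prio - bonus + 5;
    if (prio < 100) prio = 100;
    if (prio > 139) prio = 139;
    return prio;
}

int main(void)
{
    printf("nice 0           -> static priority %d\n", static_priority(0));        // 120
    printf("nice -5, bonus 8 -> dynamic priority %d\n",
           dynamic_priority(static_priority(-5), 8));                              // 112
    return 0;
}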

 4.4 Time slice

  The O(1) algorithm uses an expired process array and an active process array to solve the O(n) complexity problem of the previous scheduling algorithm. The processes in the expired array have all used up their time slices, while the processes in the active array still have time remaining. When a process uses up its time slice, it is moved to the expired array, and a new time slice is calculated for it before it is moved. The O(1) scheduler therefore spreads the time-slice calculation out over time, instead of recalculating the time slices of all runnable processes at once as the previous algorithm did. When there are no processes left in the active array, all runnable processes have used up their time slices; at that point the two arrays are simply swapped, turning the expired processes back into active ones, and scheduling continues. Swapping the two arrays is just an exchange of pointers, so it takes constant time.

prio_array_t *array = rq->active;
if (array->nr_active == 0) {       // every task in the active array has used up its time slice
    rq->active = rq->expired;      // the expired array, with freshly computed time slices, becomes active
    rq->expired = array;           // the old, now empty, array will collect tasks as they expire
}

  The code above shows the swap between the two arrays. By spreading the time-slice calculation out over time and simply exchanging the expired and active process sets, the O(1) algorithm switches to a fresh set of runnable processes, each with a recomputed time slice, in constant time.

Calculation of the base time slice of a process (in milliseconds):
static priority < 120:   base time slice = max((140 - static priority) * 20, MIN_TIMESLICE)
static priority >= 120:  base time slice = max((140 - static priority) * 5, MIN_TIMESLICE)
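
  A small standalone sketch of this formula (the MIN_TIMESLICE value of 5 ms and the function name are assumptions made for this illustration):

#define MIN_TIMESLICE 5   // assumed minimum time slice, in milliseconds

// base time slice in milliseconds, computed from the static priority
static int base_timeslice(int static_prio)
{
    int slice = (static_prio < 120) ? (140 - static_prio) * 20
                                    : (140 - static_prio) * 5;
    return slice > MIN_TIMESLICE ? slice : MIN_TIMESLICE;
}

// e.g. static priority 100 (nice -20) -> 800 ms, 120 (nice 0) -> 100 ms, 139 (nice 19) -> 5 ms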

 4.5 The previous O(n) scheduling algorithm

  In the previous scheduler, at each process switch the kernel scanned every process on the ready queue, recalculated each one's priority, and then selected the process with the highest priority to run. Although this algorithm is simple to understand, the time spent selecting the highest-priority process cannot be ignored: the more runnable processes there are in the system, the longer it takes, which gives a time complexity of O(n).

// pseudocode 
for (each process in the system) {
    Recalculate the time slice;
    Recalculate the priority;
}

5. Views on the operating system process model

  The process is the core concept of the operating system: it is an abstraction of a running program. Everything else in an operating system revolves around the concept of a process.

  Conceptually, each process has its own virtual CPU. In reality, of course, the real CPU switches back and forth among the processes.

  Process management is the operating system's management of the CPU. To improve CPU utilization, multiprogramming is used, and with it comes the need to maintain and manage multiple processes. The operating system abstracts the hardware CPU and a set of related resources into the concept of a process; in that sense the process is something the operating system conjures "out of nothing". Application programmers then use the process mechanism directly so that their programs make efficient use of hardware resources.

6. References

  https://blog.csdn.net/a2796749/article/details/47101533

  Modern Operating Systems (Fourth Edition), Andrew S. Tanenbaum and Herbert Bos
