How does the "mysterious" Linux system record and describe a process? An analysis of the C source code

How does the Linux kernel record the resources of a process?

First of all, most of the Linux kernel is written in C, so to find out how the kernel records process resources you only need to read the relevant C code. The kernel uses the task_struct structure to describe the resources of a process. Part of its C definition is shown here:
[Figure: the task_struct structure is very long]

The task_struct structure is very long. In my copy of the Linux kernel source it occupies 280 lines, although that includes a lot of conditionally compiled code.

Because task_struct is so long, it is impractical to introduce every member here. If you are as curious as I am, a quick glance through the structure should turn up some familiar members, such as:
[Figure: familiar members of the task_struct structure]

From the C comments and the member names, you can see that task_struct contains information about the file system, the thread structure, and the files opened by the process, which matches the content of the previous article. Other members will come up in later articles, so I will not go into them here.

When creating a process, Linux allocates the task_struct structure through the slab allocator. Because the slab allocator keeps caches of same-sized objects and reuses freed ones, this avoids much of the overhead of repeated dynamic allocation and release and improves memory efficiency.

So after the task_struct structure is created, how does the kernel access it?

In the kernel source I have at hand, Linux also has a structure called thread_info. Its member task is a pointer to the task_struct structure. On the x86_64 platform, the relevant C code of thread_info looks like this:
[Figure: the task pointer in thread_info]

Linux usually keeps the thread_info structure at the bottom or top of the kernel stack, and the size of the kernel stack is usually known, so each process can easily find the thread_info structure from its own stack, and then find the task_struct structure.

To find the thread_info structure of the current process, you can call the current_thread_info() function. Its C code is as follows:

static inline struct thread_info *current_thread_info(void)
{
        register unsigned long sp asm ("sp");
        return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

current_thread_info() function

As you can see, current_thread_info() computes its result from the stack pointer: masking off the low bits of sp rounds it down to the start of the THREAD_SIZE-aligned kernel stack, which is where thread_info lives. Its implementation is therefore architecture-specific. The code above is only the ARM implementation, and the reader can look up the implementations for other platforms.

With that, the task_struct of the current process can be reached through current_thread_info()->task.

Process PID

The Linux kernel assigns each process a unique process identifier (PID) to tell processes apart. A PID is an integer, represented in the kernel's C source by the type pid_t (in practice an int). Run the ps command on the Linux command line to view the PIDs of processes, for example:
[Figure: viewing the PID of a process]

The task_struct structure records the PID of the process in its member pid. The related C code is as follows:
[Figure: the task_struct structure uses the member pid to record the PID value of the process]

In Linux, the maximum PID is adjustable. For compatibility with older versions of Unix and Linux, the default value of pid_max is 32768, so the largest PID handed out by default is 32767 (the largest value a signed 16-bit short int can hold). The current value can be viewed with the cat command:

# cat /proc/sys/kernel/pid_max
32768

The PID limit affects how the system operates. Because each PID is unique, pid_max effectively caps the number of processes that can exist at the same time. For ordinary desktop users the default is enough, but a large server may need more. In that case pid_max can be raised, for example by writing a new value to /proc/sys/kernel/pid_max as root.

Process status

We now know how the Linux kernel describes and records process resources, and how it tells processes apart. So what states can a process be in? You may have noticed that the first member of task_struct, state, records the state of the process. The possible states are defined by several macros in the C source code:
[Figure: the state of the process is defined by several macros in the C language source code]

A process in a Linux system must be in one of these five states. From top to bottom, the process is:

Running or ready to run
Sleeping but interruptible: it is woken early if it receives a signal
Sleeping and uninterruptible: it is not woken even if it receives a signal
Being traced by another process
Stopped

This also explains why the kill command sometimes cannot kill a process in the D state. Such a process is in uninterruptible sleep and does not respond to signals. The kill command works by sending a signal (SIGKILL in the case of kill -9), so it naturally cannot kill the process until the process leaves that state.

Parent process and child process

The parent and children of a process are also among its resources, so they too are recorded in the task_struct structure. The relevant C code is:
[Figure: task_struct members for the parent and child processes]

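So it is convenient to access the parent and child processes of the current process. Assuming children is declared as a struct list_head linked through each child's sibling member, as it is in 2.6 and later kernels, the code looks like this:

struct task_struct *p = current->parent;             /* parent process */
struct task_struct *c;
list_for_each_entry(c, &current->children, sibling)  /* visit each child */
        pr_info("child pid: %d\n", c->pid);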

If you think about it for a moment, you will see that the parent pointer and the children list in task_struct link all processes into a family tree. By following these links we can easily reach the parent process, the grandparent process and so on, as well as the children, grandchildren and so on. Keep in mind, however, that on a system with a large number of processes, repeatedly traversing all of them is very expensive.

Summary

This section first discussed how the Linux kernel records and describes process resources: what the kernel manages when it manages a process is really its task_struct structure. Then, using the C source code, we looked at how the kernel reaches the task_struct structure and how it tells processes apart. Finally, we discussed process states and the process family tree. As you can see, the Linux kernel source code is not so mysterious as to be incomprehensible.


Origin blog.csdn.net/qq_40989769/article/details/109312039