How fork() creates a process

1. Copy-on-write technology

When fork() creates a child process, it copies only the virtual address mappings to the child, that is, the addresses of the memory occupied by the parent, instead of actually allocating a new memory space for the child. Right after fork() finishes, parent and child point to the same physical memory, and the child's mapping between virtual and physical addresses is a copy of the parent's.
Only pages that are later modified by the parent or the child are treated differently. Take a page Z: if the child modifies its content, a new page Z' must be allocated for the child. Z' holds the modified content rather than the parent's original content, and the child can keep writing its own data to this page afterwards.

This mechanism is called the kernel's "copy-on-write" (COW).
The whole process is:
(1) When the child is created, the parent's mapping between virtual memory and physical memory (that is, its page table) is copied to the child, and the shared pages are marked read-only so that a later write triggers a page fault. A page fault is raised when address translation cannot complete: either the page is not in memory and must be brought in from disk, or, as here, the access violates the page's permissions.
(2) When the child or the parent writes to such a page, the fault handler performs the copy-on-write: it copies the original page to a new page, updates the writer's mapping to point to the copy, and makes the memory readable and writable again for parent and child.
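The effect of steps (1) and (2) can be observed from user space. The following is a minimal sketch, assuming a Linux/POSIX system; the function name is made up for this illustration. It shows only the visible result (the child's write stays private), not the page-table work inside the kernel.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that overwrites its copy of `value`, then return the value
 * the parent still sees. Returns -1 if fork() or waitpid() fails.
 */
int parent_value_after_child_write(void)
{
    volatile int value = 42;    /* volatile: force real loads and stores */
    int status;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the first write to the shared page is what makes the
         * kernel give the child its own private copy of that page. */
        value = 99;
        _exit(value == 99 ? 0 : 1);
    }

    /* Parent: wait for the child, then read our own, untouched copy. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return value;
}
```

Calling parent_value_after_child_write() from a main() function should give back the parent's original value, because the child's store went to its own page.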

Advantages of copy-on-write:

As shown above, creating a child process does not require copying every page of the parent or allocating memory for the child up front. Creating a child is therefore faster, and less physical memory is wasted.

vfork() and fork()

vfork() appeared after fork(). At that time fork() did not yet use copy-on-write: creating a child meant copying all the memory owned by the parent into newly allocated memory and then mapping it into the child, which was slow and wasted memory. vfork() was introduced to speed up process creation.
vfork() creates a child process without copying the parent's page table:
(1) The child shares the parent's address space completely (like a process inside a process). It runs in the parent's address space, so any data it modifies is also visible to the parent.
(2) Because the address space is shared, the stack is shared too, which easily causes problems. When the child returns from a function, the stack frame is popped and the parent's stack is affected as well. If the child unwinds the stack before it ends, the parent can no longer make function calls normally.
(3) After vfork() creates the child, the child is guaranteed to run first. The parent is blocked until the child gets its own address space (by calling exec()) or exits (by calling exit(), in practice _exit()).

Once fork() adopted copy-on-write, the only remaining advantage of vfork() was that it does not copy the parent's page table. The speed gain is small and vfork() is error-prone, so it is rarely used.
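To make point (3) concrete, here is a small sketch of the only pattern that is generally considered safe with vfork(): the child does nothing but call _exit() (or an exec function). It assumes a Linux/POSIX system, and the function name is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a child with vfork() and return its exit status, or -1 on error.
 * The child only calls _exit(), one of the few things that is safe to do
 * while it is borrowing the parent's address space and stack.
 */
int vfork_child_exit_status(int code)
{
    int status;

    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0)
        _exit(code);    /* child: must never return from this function */

    /* The parent resumes here only after the child has called _exit(). */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child deliberately avoids touching variables or returning, since both would modify the stack that the suspended parent is still going to use.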

2. Basic principle of fork()

From the copy-on-write mechanism it follows that the real cost of fork() is: copying the parent's page table (the mapping between virtual and physical addresses) plus creating a unique PCB for the child (in Linux, a task_struct structure). In fact, a kernel stack must also be created for every new process, as discussed later. Its cost can be counted together with that of the PCB, since the two are created together.
task_struct contains a pointer that leads to the process's page table.

Next, let's look at how page tables and multiple processes appear in memory.

Every process has its own page table. Page tables map virtual addresses to physical addresses. Different processes have different virtual address spaces, so their page tables differ too.
This is easy to see. Leaving aside the shared kernel space, every process has an independent address space that runs from the low address 0x0000 0000 up to the bottom of the stack at 0xC000 0000. If all processes used the same page table, identical logical addresses would map to identical physical addresses and the processes would conflict. The page table of each process must therefore be different, so that processes do not occupy the same region of physical memory at the same time.
Each process appears to own a complete address space (0x0000 0000 - 0xFFFF FFFF), or all of memory, but in practice it usually occupies only a small part of physical memory.

[Figures: processes a, b, and c each map a small part of their address space to physical memory. Source: Xiaolin coding]

This relies on the principle of locality. Even if processes a, b, and c each define 4GB of variables, those variables are not all used at once. By the principle of locality only a small part is used frequently, such as the variables a_i, b_i, and c_i in the figure. They are brought into memory when they are used, so three small pieces of memory are enough to run three processes at the same time. If every process had the same page table, the variable i of a, b, and c would all be stored in the same block of physical memory, causing a conflict!
The principle of locality lets physical memory hold multiple processes at once, but a CPU can run only one process at a time and switches between them by scheduling. The state (context) of the running process is held in the CPU registers. When the process is switched out, this context is copied from the registers into the corresponding fields of the process's PCB. This is what is meant by saving the context.
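The claim that each process has its own page table can be checked indirectly from user space. In the sketch below (Linux/POSIX assumed; all names are invented for the example), parent and child look at the same virtual address but find different values there, which is only possible if that address is translated differently in the two processes.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What one process sees: where `counter` lives and what it contains. */
struct view {
    uintptr_t addr;
    int value;
};

static int counter = 1;

/*
 * Returns 1 if parent and child see `counter` at the same virtual address
 * but with different contents, 0 if not, -1 on error.
 */
int same_address_different_value(void)
{
    int fds[2];
    struct view child = {0, 0};

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: change the variable, then report address and value. */
        close(fds[0]);
        counter = 2;
        child.addr = (uintptr_t)&counter;
        child.value = counter;
        ssize_t n = write(fds[1], &child, sizeof child);
        _exit(n == (ssize_t)sizeof child ? 0 : 1);
    }

    /* Parent: read the child's report and compare with our own view. */
    close(fds[1]);
    ssize_t n = read(fds[0], &child, sizeof child);
    close(fds[0]);
    if (waitpid(pid, NULL, 0) != pid || n != (ssize_t)sizeof child)
        return -1;

    return child.addr == (uintptr_t)&counter && child.value != counter;
}
```

The pipe is used only to carry the child's observation back to the parent; it plays no part in the address translation itself.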

So far we know that, besides the pointer to the page table, task_struct has fields that store the process's CPU context.
In addition, task_struct refers to an mm_struct structure that describes the memory segments of the process's virtual address space: stack, mapping area, heap, BSS segment, data segment, and code segment.
[Figure: layout of the virtual address space. Source: Xiaolin coding]

In addition to the above, there are some fields of task_struct as follows:
(1) task_struct must include the pid so that the process can be identified. Since threads are also described by task_struct, a second identifier is needed: tgid (tg stands for thread group). A child process has a different pid from its parent, and a child thread also has a different pid from the main thread (so pid here is better understood as a thread id than as a process id). The tgid of the threads, however, is the same: tgid marks different threads as belonging to the same process. When a new process is created, both its pid and its tgid naturally differ from the parent's.

While we are here, let's look at the difference between creating a process and creating a thread.

The difference between creating a process and creating a thread

In fact, both are created through the same system call.
To tell whether a process or a thread is being created, look at the pid and tgid mentioned above and at which resources are shared.
Each thread has its own unique pid. How does the kernel know which process a thread belongs to?
The answer is tgid. A process is a thread group, so all threads of a process have the same tgid.
Simply put: tgid marks which process a thread belongs to.
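On Linux, the shared entry point is visible from user space as the clone system call; whether a process or a thread results depends on the flags. The sketch below is an illustration under assumptions, not the way a C library implements fork(): it assumes x86-64 or another architecture where the raw system call takes the flags as its first argument, and the function name is hypothetical. With only SIGCHLD in the flags nothing is shared, so the call behaves like fork(); a thread library would additionally pass sharing flags such as CLONE_VM and CLONE_THREAD.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a child with the raw clone system call and return its exit
 * status, or -1 on error.
 * Assumption: the architecture passes the flags as the first argument of
 * the raw system call (true on x86-64 and aarch64).
 */
int clone_like_fork_exit_status(int code)
{
    int status;

    /* flags = SIGCHLD only, stack = 0: the child runs on a copy-on-write
     * duplicate of the parent's stack, just as after fork(). */
    long pid = syscall(SYS_clone, (unsigned long)SIGCHLD, 0UL, 0UL, 0UL, 0UL);
    if (pid < 0)
        return -1;

    if (pid == 0)
        _exit(code);    /* child: exit immediately */

    if (waitpid((pid_t)pid, &status, 0) != (pid_t)pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child exits at once because this bypasses the bookkeeping the C library normally does around fork().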

For a process with only one thread, the tgid is the process id, that is, the pid.
When a program starts running it has only the main thread, and the tgid of the main thread equals its pid. Threads created later inherit the tgid of the main thread.

So tgid is not always equal to pid: the two are equal only for the main thread, which includes the single-threaded case. Moreover, the process id that a user gets from the ps command is actually the tgid, not the kernel's pid field. It still distinguishes different processes, so there is no need to dwell on this.
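The relation between the two identifiers can be probed with getpid(), which returns the tgid, and the gettid system call, which returns the kernel's per-thread pid. This is a minimal sketch for Linux; the function name is made up, and the check runs in a freshly forked child because such a child is guaranteed to have exactly one thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child and let it check two things:
 *   - for its only thread, the kernel pid (gettid) equals the tgid (getpid)
 *   - its tgid differs from the parent's tgid
 * Returns 1 if both hold, 0 if not, -1 on error.
 */
int forked_child_has_own_tgid(void)
{
    pid_t parent_tgid = getpid();
    int status;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        pid_t tgid = getpid();                  /* what ps shows as the PID */
        pid_t tid = (pid_t)syscall(SYS_gettid); /* kernel per-thread pid */
        _exit(tid == tgid && tgid != parent_tgid ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

In a multi-threaded program the same two calls, made from a thread other than the main one, would return different numbers.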

(2) The files the process has opened and their file descriptors (fd), for example the sockets it has created (listening sockets, connected sockets), are recorded in the process. By default the file descriptor array has 1024 entries, so at most 1024 files can be open at once. This limit can be changed.
(3) Just as open files are recorded, the process also records which I/O devices it occupies, its CPU usage, and its network traffic usage.
(4) The current state of the process must also be recorded (ready, running, created, etc.).
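The limit on open files mentioned in (2) can be queried at run time. The sketch below assumes that this limit is the one exposed as RLIMIT_NOFILE (the text does not name it), and the function name is hypothetical. The value depends on the system configuration, so no particular number is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

/*
 * Returns the soft limit on the number of open file descriptors for the
 * calling process, LONG_MAX if it is unlimited or too large for a long,
 * or -1 on error.
 */
long open_file_soft_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur > (rlim_t)LONG_MAX)
        return LONG_MAX;
    return (long)lim.rlim_cur;
}
```

The same structure is passed to setrlimit() to raise or lower the soft limit up to the hard limit.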

task_struct is a rather huge structure; the fields described above are only a few of them.

3. The detailed steps of fork()

Next, let's analyze step by step what fork() does.
As mentioned earlier, creating a process and creating a thread both end up calling the kernel function do_fork(), so we go straight to do_fork().

Its main operations are:
(1) Call alloc_thread_info() to obtain an 8KB union (the size of two pages) that holds the new process's thread_info and kernel stack:

union thread_union {
    struct thread_info thread_info;  // thread_info
    unsigned long stack[THREAD_SIZE/sizeof(long)]; // kernel stack
};

(In a union, all members start at the same address, the start address of the union itself, and the size of the union is the size of its largest member.)
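Both properties of a union can be checked in ordinary C. The types below are simplified stand-ins with made-up names (toy_thread_info, toy_thread_union), not the kernel definitions; only the 8KB size is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_THREAD_SIZE 8192    /* 8KB, as in the text */

/* Simplified stand-in for the kernel's thread_info. */
struct toy_thread_info {
    void *task;
    int cpu;
};

union toy_thread_union {
    struct toy_thread_info thread_info;
    unsigned long stack[TOY_THREAD_SIZE / sizeof(long)];
};

/* Compile-time check: the union is as large as its largest member. */
_Static_assert(sizeof(union toy_thread_union) == TOY_THREAD_SIZE,
               "union must be exactly one stack area in size");

/* Returns 1 if both members start at the address of the union itself. */
int union_members_share_address(void)
{
    static union toy_thread_union u;

    return (void *)&u == (void *)&u.thread_info &&
           (void *)&u == (void *)u.stack;
}
```

Because the two members overlap, the low end of the stack array and thread_info occupy the same bytes, which is why the kernel stack must never grow all the way down to the start of the area.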

For each process, the kernel allocates a separate memory area that holds the kernel stack and a small process descriptor, the thread_info structure.

The kernel stack grows downward (from high addresses to low addresses) from the top of this memory area, while the thread_info structure sits at the start (the lowest address) of the area. The esp register holds the address of the top of the kernel stack. Therefore, right after the process switches from user mode to kernel mode, the kernel stack is empty and esp points to the end of this area.

The lowest address is the start address of the 8KB area, and the kernel stack grows from the high address toward it. Note that thread_info is not the PCB: it contains a pointer, struct task_struct *task, that points to the PCB. thread_info can be understood as the entry to task_struct, which is why it is called a small process descriptor. Placing task_struct itself here would take up a lot of space, because the task_struct structure is very large, so the kernel designers chose to reach task_struct indirectly rather than directly.

The thread_info structure code is as follows:

struct thread_info {
    struct task_struct  *task;      /* main process descriptor */
    struct exec_domain  *exec_domain;   /* execution domain */
    __u32           flags;      /* low-level flags */
    __u32           status;     /* thread synchronous flags */
    __u32           cpu;        /* current CPU */
    int         preempt_count;  /* 0 => preemptable, <0 => BUG */
    mm_segment_t        addr_limit;
    struct restart_block    restart_block;
    void __user     *sysenter_return;
#ifdef CONFIG_X86_32
    unsigned long           previous_esp;   /* ESP of the previous stack, in case of nested (IRQ) stacks */
    __u8            supervisor_stack[0];
#endif
    int         uaccess_err;
};

So why put thread_info and the kernel stack together?
The main reason is that the CPU's esp register holds the top of the kernel stack of the currently running process. From the value of esp the kernel can easily obtain the address of the thread_info structure of the running process, and from that the address of the current process descriptor, using the following two functions:

/* get the current stack pointer from esp */
register unsigned long current_stack_pointer asm("esp") __used;

/* get the thread_info structure of the current thread */
static inline struct thread_info *current_thread_info(void)
{
    return (struct thread_info *)
        (current_stack_pointer & ~(THREAD_SIZE - 1)); // ~ is bitwise NOT
}

In current_thread_info() above, current_stack_pointer is declared as a register variable bound to esp, so it yields the top address of the kernel stack. ANDing it with ~(THREAD_SIZE - 1) clears the low 13 bits (or 12 bits when THREAD_SIZE is 4096). The result is the start address of the memory area, which is exactly the address of the thread_info structure. Calling get_current() then returns task, the pointer to the task_struct:

static inline struct task_struct *get_current(void)
{
    return current_thread_info()->task;
}
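The masking trick can be tried in user space. The following sketch replaces the esp register with an arbitrary address inside a heap block that is aligned to its own size; every toy_ name is hypothetical and the structures are reduced to what the example needs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_THREAD_SIZE 8192UL  /* must be a power of two */

struct toy_task {
    int pid;
};

struct toy_thread_info {
    struct toy_task *task;
};

/* The same masking as current_thread_info(), applied to any address that
 * lies inside a TOY_THREAD_SIZE-aligned block. */
struct toy_thread_info *toy_thread_info_of(uintptr_t sp)
{
    return (struct toy_thread_info *)(sp & ~(TOY_THREAD_SIZE - 1));
}

/*
 * Build one aligned block with a toy_thread_info at its start, pretend the
 * stack pointer is somewhere near the top of the block, and recover the pid
 * through the mask. Returns the recovered pid, or -1 if allocation fails.
 */
int toy_current_pid(int pid)
{
    struct toy_task task = { pid };

    void *block = aligned_alloc(TOY_THREAD_SIZE, TOY_THREAD_SIZE);
    if (block == NULL)
        return -1;

    struct toy_thread_info *ti = block;
    ti->task = &task;

    uintptr_t fake_sp = (uintptr_t)block + TOY_THREAD_SIZE - 64;
    int found = toy_thread_info_of(fake_sp)->task->pid;

    free(block);
    return found;
}
```

The trick depends entirely on the block being aligned to its own size, which is why the example uses aligned_alloc() rather than malloc().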

(2) Obtain the pointer to the current process's PCB and copy the task_struct of the current (parent) process into the task_struct of the new process, in the memory just allocated.
This is done in copy_process(), in the lines shown next.
Note that copy_process() does not receive the parent's task_struct as a parameter. It obtains the parent inside the function and then copies it for the child:

struct task_struct *p;
p = dup_task_struct(current);

dup_task_struct() is passed current, the address of the task_struct of the current (parent) process. It takes a pointer to the task_struct of the original process, returns a pointer to the task_struct of the new process, and makes a complete copy of the parent's PCB.
In fact, the alloc_thread_info() mentioned in step (1) is called inside dup_task_struct(), which is itself called from copy_process(). So the order of steps given in this article for do_fork() is not the actual calling order of the functions but a logical order that abstracts away from the implementation.

dup_task_struct() uses two pointers and two corresponding macros:

struct task_struct *tsk;
struct thread_info *ti;

It first executes the alloc_task_struct macro, which allocates space for the child's process descriptor, assigns the start address of that memory to tsk, and checks that the allocation succeeded. It then executes the alloc_thread_info macro, which obtains a free memory area to hold the child's kernel stack and thread_info structure, assigns the start address of that area to ti, and again checks that the allocation succeeded.
In other words:
alloc_task_struct allocates the task_struct structure;
alloc_thread_info allocates the kernel stack and thread_info, and a pointer in thread_info then points to the task_struct.
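The two allocations and the linking between them can be modelled in a few lines of user-space C. This is a toy model under stated assumptions, not kernel code: malloc() and calloc() stand in for the two macros, the structures keep only a couple of fields, and all toy_ names are invented.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_THREAD_SIZE 8192

struct toy_task;

struct toy_thread_info {
    struct toy_task *task;      /* back pointer to the descriptor */
};

struct toy_task {
    int pid;
    int state;
    struct toy_thread_info *thread_info;
};

/* Toy version of dup_task_struct(): returns a copy of *orig that has its
 * own stack area, or NULL if either allocation fails. */
struct toy_task *toy_dup_task_struct(const struct toy_task *orig)
{
    struct toy_task *tsk = malloc(sizeof *tsk);     /* "alloc_task_struct" */
    if (tsk == NULL)
        return NULL;

    struct toy_thread_info *ti = calloc(1, TOY_THREAD_SIZE); /* "alloc_thread_info" */
    if (ti == NULL) {
        free(tsk);
        return NULL;
    }

    *tsk = *orig;               /* the child starts as an exact copy */
    tsk->thread_info = ti;      /* but with its own stack area */
    ti->task = tsk;             /* thread_info points back to the task */
    return tsk;
}

void toy_free_task(struct toy_task *tsk)
{
    if (tsk != NULL) {
        free(tsk->thread_info);
        free(tsk);
    }
}
```

At this stage the copy still carries the parent's pid; giving the child its own identity is the job of the later steps.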

(3) After the copy, the child is exactly the same as the parent. At this point the kernel checks whether the number of processes owned by the user exceeds the resource limit (in early kernels the global descriptor table, GDT, limited the maximum number of processes to 4090).

Up to this point the parent's information has been copied. From here on, the new child must be made different from the parent.

(4) Note that the child's state is first set to TASK_UNINTERRUPTIBLE and only later becomes TASK_RUNNING (the ready state). Some sources say the child is in the TASK_RUNNING state after fork; they simply omit the intermediate step. Once fork has fully completed, the child is indeed ready, with state TASK_RUNNING.

TASK_UNINTERRUPTIBLE: in this state the process cannot be woken by an external signal; it can only be woken by the kernel itself.

After the previous steps the child is still incomplete, so its state is set to uninterruptible sleep, which guarantees that this incomplete child will not be put to run right away.

(5) Call get_id() to obtain a valid PID for the new process.

(6) Update the remaining fields of the child, such as the fields describing kinship. Kinship can be set from the parent, but many other fields cannot simply be inherited.
The domain a process belongs to is commonly used to enforce security policy and prevent illegal access between processes. Put plainly, much information cannot be shared between processes. Even a child created from a parent does not share the following with it; it gets its own copy or a fresh state instead:

File descriptors (the child gets a copy of the parent's file descriptor table; the descriptors refer to the same files, but closing a descriptor in the child does not affect the parent's descriptor);
Signal handling (the child inherits the parent's signal dispositions and signal mask, but not its pending signals);
Environment variables;
The list of opened files.

This part is somewhat complicated. Some sources say the child does not inherit the parent's memory locks on text, data, and other memory. (Translator's note: locked memory means locked virtual memory pages; once locked, the kernel is not allowed to page them out.)
The memory itself is shared with the parent, and a new page is allocated only when the child or the parent writes to it, as described in the copy-on-write section.
I will expand this part later. What is known so far is that file locks, memory locks, pending signals, and timers are not inherited.
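The file descriptor behaviour described above is easy to verify. In this sketch (Linux/POSIX assumed, function name hypothetical) the child closes both of its copies of a pipe's descriptors, and the parent then shows that its own descriptors still work.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the parent's descriptors are still usable after the child
 * has closed its copies, 0 if not, -1 on error.
 */
int parent_fd_survives_child_close(void)
{
    int fds[2];
    char byte = 0;

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: close its own copies of both descriptors and exit. */
        close(fds[0]);
        close(fds[1]);
        _exit(0);
    }

    if (waitpid(pid, NULL, 0) != pid) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The child is gone and its copies are closed; ours must still work. */
    int ok = write(fds[1], "x", 1) == 1 &&
             read(fds[0], &byte, 1) == 1 &&
             byte == 'x';

    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

The descriptor tables are separate, but both tables refer to the same underlying open files, which is why data written by one process to a shared pipe can be read by the other.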

(7) Insert the newly created PCB into the process list to maintain the kinship between processes, for example the sibling list. Parent and child links form a process tree, while the children of one parent are chained in a linked list of siblings.

(8) Also insert the new PCB into the pidhash hash table. This hash table speeds up lookups, because searching a linked list is comparatively slow. There are in fact four hash tables, one for each of the four IDs of a process: besides pid there are tgid, pgid, and sid.
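The idea behind such a table can be sketched with a small chained hash table keyed by pid. Everything here is a simplified assumption for illustration: the table size, the hash function, and all toy_ names are invented and do not reflect the kernel's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PIDHASH_SIZE 64     /* power of two, so masking can replace modulo */

struct toy_task {
    int pid;
    struct toy_task *hash_next; /* next task in the same bucket */
};

static struct toy_task *toy_pidhash[TOY_PIDHASH_SIZE];

static unsigned toy_pid_hashfn(int pid)
{
    return (unsigned)pid & (TOY_PIDHASH_SIZE - 1);
}

/* Insert a task at the head of its bucket. */
void toy_hash_pid(struct toy_task *t)
{
    unsigned bucket = toy_pid_hashfn(t->pid);

    t->hash_next = toy_pidhash[bucket];
    toy_pidhash[bucket] = t;
}

/* Walk only one bucket instead of the whole process list. */
struct toy_task *toy_find_task_by_pid(int pid)
{
    struct toy_task *t;

    for (t = toy_pidhash[toy_pid_hashfn(pid)]; t != NULL; t = t->hash_next) {
        if (t->pid == pid)
            return t;
    }
    return NULL;
}
```

A lookup touches only the tasks whose pid falls into the same bucket, so its cost stays small even when the process list is long.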

(9) Set the state of the child's PCB to TASK_RUNNING and call wake_up_process() to insert the child into the ready queue, where it waits to be scheduled onto the CPU. "Queue" is only a historical name, not a real queue! The actual data structure may be a linked list or a red-black tree.

(10) Split the remaining time slice evenly between the parent and the child.

(11) Return the PID of the child and return to user mode. The parent finally reads this PID in user mode.
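Seen from user space, the last step means that fork() returns the child's PID in the parent and 0 in the child. A minimal sketch that checks this, assuming Linux/POSIX and using a hypothetical function name:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the value fork() returned in the parent equals the PID the
 * child reports for itself, 0 if not, -1 on error.
 */
int fork_return_matches_child_pid(void)
{
    int fds[2];
    pid_t reported = 0;

    if (pipe(fds) < 0)
        return -1;

    pid_t ret = fork();
    if (ret < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (ret == 0) {
        /* Child: fork() returned 0 here, so ask the kernel for our PID. */
        pid_t self = getpid();
        close(fds[0]);
        ssize_t n = write(fds[1], &self, sizeof self);
        _exit(n == (ssize_t)sizeof self ? 0 : 1);
    }

    /* Parent: ret is the PID handed back by step (11). */
    close(fds[1]);
    ssize_t n = read(fds[0], &reported, sizeof reported);
    close(fds[0]);
    if (waitpid(ret, NULL, 0) != ret || n != (ssize_t)sizeof reported)
        return -1;

    return ret == reported;
}
```

The differing return values are what let a single piece of code branch into a parent path and a child path right after the call.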

Origin blog.csdn.net/mrqiuwen/article/details/130441611