[Linux] Process creation with fork

First, the role of fork

We all know that fork can be used to create processes. To understand when we should actually reach for it, let's first look at its two typical uses.

Usage 1: A parent process wants to duplicate itself so that the parent and the child can execute different sections of code at the same time.
This is most common in network servers: the parent process waits for a service request from a client. When a request arrives, the parent calls fork and lets the child handle the request, while the parent goes back to waiting for the next request.
Usage 2: A process wants to execute a different program.
This usage is common in shells. In this case, the child process calls exec right after returning from fork. I will cover exec in later blog posts.
Note: In some operating systems, the two operations fork and exec are combined into a single operation called spawn. UNIX keeps them separate, because there are many cases where fork is useful without exec, and because the separation lets the child change its own attributes between the two calls, such as I/O redirection, user ID, and signal disposition.

Once we understand these uses, we know where fork is the right tool, and we know what we are using it for when we create a process.

Second, the features of fork()

1. The relationship between the parent and child processes

Here we need to be clear about two concepts: the parent process and the child process. The process created by fork is called the child process. Its function prototype is as follows:

#include <unistd.h>
pid_t fork(void);

Their relationship is shown below:
[Figure: relationship between the parent and child processes]
The following code can verify the relationship between the parent and child processes:

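#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>

int main(void)
{
	pid_t n = fork();
	assert(-1 != n);

	if(0 == n)
	{
		// Child: fork returned 0
		printf("Hello: mypid = %d,  myppid = %d\n", (int)getpid(), (int)getppid());
	}
	else
	{
		// Parent: fork returned the child's PID
		sleep(1);  // give the new process time to finish first
		printf("World: n = %d, mypid = %d\n", (int)n, (int)getpid());
	}

	exit(0);
}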

The results are as follows:
[Figure: program output]

2. The return values in the parent and child processes

The fork function is called once but returns twice. The only difference between the two returns is that the return value in the child is 0, while the return value in the parent is the process ID of the new child.
The parent receives the child's process ID because a process can have more than one child, and there is no function that lets a process obtain the process IDs of all its children.
The child receives 0 because a process has only one parent, so the child can always call getppid to obtain its parent's process ID.
Given the following program, what is the output?

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <string.h>

int main()
{
	pid_t n = fork();
	assert(-1 != n);

	if(0 == n)
	{
		printf("Hello\n");
	}
	else
	{
		printf("World\n");
	}
	
	exit(0);
}

The running results are as follows:
[Figure: program output]
Analysis: The result follows from the return value of fork, which returns twice from a single call. In the parent the return value is the child's process ID, which is non-zero, so the else branch prints World. In the child the return value is 0, so the if branch prints Hello. In this run, World was printed first.

3. The execution order of the parent and child processes

In general, it is uncertain whether the parent or the child executes first after fork. This depends on the scheduling algorithm used by the kernel. If the parent and child need to synchronize with each other, some form of interprocess communication is required. After fork is called, both the parent and the child continue from the point right after the fork call. The assignment statement is roughly divided into two instructions: a call to fork, and a mov that stores the return value (held in eax on x86) into pid. Both processes return from the call, each performs the mov with its own return value, and then each carries on with the code that follows. The program is shown in the following figure:
[Figure: the program and its execution flow after fork]
Because of this, the output order of the program is not deterministic: which of the parent and child runs first can vary from run to run. The following figure shows the different results of executing the program twice:
[Figure: output of two runs of the program]

Third, copy-on-write technology

To understand this technology more concretely, let's first work through some code examples.
The following two programs test whether global, local, and heap data are shared between the parent and child processes, that is, whether modifications made by the child have any effect on the parent.

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <string.h>

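int gdata1 = 10;          // global variable
static int gdata2 = 10;   // file-scope static variable

int main()
{
	int ldata1 = 10;          // local variable
	static int ldata2 = 10;   // local static variable

	int *ptr = (int *)malloc(sizeof(int));  // heap data
	*ptr = 10;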

	pid_t n = fork();
	assert(-1 != n);

	if(0 == n)
	{
		printf("child: %d, %d, %d, %d, %d\n", gdata1, gdata2, ldata1, ldata2, *ptr);

		gdata1 = 20;
		gdata2 = 20;
		ldata1 = 20;
		ldata2 = 20;
		*ptr = 20;

		printf("child: %d, %d, %d, %d, %d\n", gdata1, gdata2, ldata1, ldata2, *ptr);
	}
	else
	{
		sleep(2); // give the child time to modify and print the data
		printf("father: %d, %d, %d, %d, %d\n", gdata1, gdata2, ldata1, ldata2, *ptr);
	}


	exit(0);
}

The running result is:
[Figure: program output]

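#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>

int gdata1 = 10;
static int gdata2 = 10;

int main()
{
	int ldata1 = 10;
	static int ldata2 = 10;

	int *ptr = (int *)malloc(sizeof(int));
	*ptr = 10;

	pid_t n = fork();
	assert(-1 != n);

	if(0 == n)
	{
		// %p expects a void *, so every address is cast explicitly
		printf("child: %p, %p, %p, %p, %p\n", (void *)&gdata1, (void *)&gdata2, (void *)&ldata1, (void *)&ldata2, (void *)ptr);

		gdata1 = 20;
		gdata2 = 20;
		ldata1 = 20;
		ldata2 = 20;
		*ptr = 20;

		printf("child: %p, %p, %p, %p, %p\n", (void *)&gdata1, (void *)&gdata2, (void *)&ldata1, (void *)&ldata2, (void *)ptr);
	}
	else
	{
		sleep(2); // give the child time to modify the data and print
		printf("father: %p, %p, %p, %p, %p\n", (void *)&gdata1, (void *)&gdata2, (void *)&ldata1, (void *)&ldata2, (void *)ptr);
	}


	exit(0);
}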

Operation result:
[Figure: program output]
Looking carefully at the two programs above, we find that modifications made by the child to global, local, and heap data do not affect the parent's data, even though the addresses printed by the parent and the child are the same. The addresses are the same because what is printed here are logical (virtual) addresses, which only become physical addresses after translation through the page table. The operating system maintains a separate page table for each process, so once the child has modified the data, the same logical address maps to different physical memory in the two processes.

1. Concept

With the programs above as groundwork, we can introduce the concept of copy-on-write: fork does not make a complete copy of the parent's data segment, stack, and heap. Instead, these regions are shared by the parent and child, and the kernel changes their access permissions to read-only. If either process attempts to modify one of these regions, the kernel makes a copy of only the piece of memory being modified, typically one "page" in a virtual memory system.

2. Features

Copy-on-write has three characteristics. Let's look at them by observing how the data is actually copied. The code is as follows:

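#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <unistd.h>

int main()
{
	// size_t avoids the int overflow that size * 2 would cause
	size_t size = (size_t)1024 * 1024 * 1024;   // base size: 1G
	size_t chunk = (size_t)32 * 1024 * 1024;    // 32M written per step
	char *ptr = (char *)malloc(size * 2);       // request 2G in total
	assert(NULL != ptr);

	// walk through the requested space step by step
	int i = 0;
	for(; i < 32; ++i)
	{
		sleep(1);
		memset(ptr + i * chunk, 'a', chunk); // initialize 32M
	}

	pid_t n = fork();
	assert(-1 != n);

	if(0 == n)
	{
		// walk through the requested space again
		printf("child start\n");
		int i = 0;
		for(; i < 32; ++i)
		{
			sleep(1);
			memset(ptr + i * chunk, 'b', chunk); // modify the data, 32M at a time
		}
	}
	else
	{
		sleep(35);
	}
	free(ptr);
	exit(0);
}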

In the code above, we first request 2G of space with malloc and then write to it step by step in a loop. The question to think about is: does fork directly copy all of the parent's data space to the child?
First, before the program is executed, the CPU and swap partition usage of our system is as follows:
[Figure: CPU and swap usage before the program runs]
After fork is executed, the CPU and swap partition usage of the system is as follows:
[Figure: CPU and swap usage after fork]
We found that after fork is executed, the process does not suddenly occupy a large amount of memory. Memory and swap usage only increase slowly from their previous levels. From this we get the first feature of copy-on-write: space requested with malloc is not backed by physical memory as soon as malloc succeeds. Physical memory is allocated only when the user actually uses the space. A successful malloc call only hands the user virtual addresses in the heap.
The second feature: fork does not directly copy the parent's data space to the child. Space is allocated to the child only when the child modifies data in that data space.

After waiting for a while, the program ends, and the CPU and swap partition usage of the system is as follows:
[Figure: CPU and swap usage after the program ends]
We can see that usage drops step by step as the space is reclaimed, which gives us the third feature of copy-on-write: when the space is released, the physical memory is released directly.

Origin blog.csdn.net/qq_43412060/article/details/105442802