Linux process control (1) --- process creation and termination (copy-on-write, exit and _exit, etc.)

Table of contents

process creation

fork() function

How does a child process inherit data from the parent process

1. Copy separation at creation time

2. Copy-on-write★

process terminated

What does the OS do when a process terminates?

Common ways of process termination

The code runs and the result is correct

exit code★

The code runs to completion with incorrect results

Abnormal code termination

How to terminate a process with code

return

exit and _exit★


process creation

fork() function

I have already explained the usage of this function in detail in the previous article, so this article mainly looks at fork() from a more in-depth perspective.

The fork function is a very important function in Linux: it creates a new process from an existing one. The new process is the child process, and the original process is the parent process.

#include <unistd.h>
pid_t fork(void);
Return value: 0 in the child process, the child's process ID in the parent process, and -1 on error.

Now let's look at it from a more systematic perspective:

What does the operating system do when fork() creates a child process?

1. The starting point is that fork() creates a child process, so there is one more process in the system.

2. Next, recall what a process is:

Process = kernel data structures (OS) + the process's code and data (usually loaded from disk, that is, the result of loading a C/C++ program)

3. The OS then does the following for the child process:

        a. Allocates new memory blocks and kernel data structures for the child process

        b. Copies part of the parent's data structure content to the child process

        c. Adds the child process to the system process list

        d. fork() returns, and the scheduler starts scheduling

How does a child process inherit data from the parent process

When a child process is created, the kernel structures assigned to it must belong to the child alone, because processes are independent. In theory, the child should also have its own code and data.

In general, however, the child process does not go through a loading step of its own, so it has no code and data of its own and can only use the parent process's code and data.

But doesn't that contradict process independence? How can the child use the parent's code and data?

This has to be discussed case by case:

Code: it cannot be rewritten, only read, so parent and child can share it without any problem.

Data: it may be modified, so it must be separated.

So how is the data separated?

1. Copy separation at creation time

Let's look at the first approach: copy the data and separate it at the moment the child process is created. What is the problem with that?

1. Does the child process run immediately after it is created? That is the first question.

2. Even if it runs right away, will it access all of the data? That is the second.

3. Even if it accesses all of the data, are all of those accesses writes? Data that is only read does not need to be copied at all.

So the problem is that we may copy data the child process never uses, or data it uses but only reads.

The same space-saving idea shows up in an everyday example:

Take two pointers to string constants with the same content. When we print the addresses, we will usually find that they are the same.

Because the compiler knows that the content of a string constant cannot be modified, it can let later constants with the same content point directly to the first one.

This just shows that the compiler already knows how to save space when it compiles a program. A system interface that works directly with memory will pay even more attention to this.

Therefore, creating a child process does not need to copy data that will never be accessed or will only be read.

But this raises the question: which data is worth copying, and which data must be copied?

It must be data that the parent or child process will write to in the future.

However, in general even the OS cannot know in advance which memory will be written.

And even if it knew and copied that data in advance, would the copy be used immediately?

Not necessarily. There is a lot of writable data, and if space is handed out for all of it but not used right away, that space is wasted.

So the OS uses a technique called copy-on-write to separate the data of the parent and child processes.

2. Copy-on-write★

Building on the above, copy-on-write means that parent and child keep sharing data until one of them writes to it; only at that moment does the OS allocate new space and copy the data for the writer.

Why the OS uses copy-on-write to separate parent and child process data:

1. Memory is allocated only when it is actually needed, which makes efficient use of memory.

2. The OS cannot predict which memory will be accessed before the code is executed.

Here's another question:

Does the child process share the code that comes before the parent's fork() call?

Yes. The child process shares all of the parent's code, including the code before fork().

A picture illustrates this below. I have also described this picture in detail in the part on concurrent execution in the article on process concept and state.

That is, the EIP register holds the address of the next line of code to execute, and register contents like this are called context data.

When the child process is created from the parent, the parent's context data is copied to the child.

Although parent and child are scheduled separately afterwards, run different code, and each modify their own EIP, that no longer matters: the child's EIP starts out pointing at the code right after fork().

So although the child process starts running after fork(), that does not mean it cannot see the code before fork().

process terminated

What does the OS do when a process terminates?

We know that a program becomes a process when it runs, and that a process consists of its code and data plus kernel data structures.

Therefore, terminating a process means releasing the kernel data structures and the code and data that the process acquired. In essence, it releases system resources.

Common ways of process termination

The code runs and the result is correct

This case is very common. When we solve algorithm problems, for example on LeetCode, a correct solution is shown as passed after submission.

Or we write a program ourselves, and the final output matches what we expected.

You may have noticed that the main function has a return value, usually return 0. What does that return value mean, and why is it always 0?

In fact, the return value of main does not have to be 0; 0 is just what we usually write.

The value returned at the end of the main function is called the exit code of the process.

By convention, 0 means success: the process ran and its result is correct.

A non-zero value means the result is incorrect, which is described in detail later.

For example, we return 10 from main.

Then we use echo $? to print the exit code of the most recently executed command.

On the first echo $?, the exit code is the 10 we just returned.

The second time it is 0, because the previous echo $? was itself a command that executed successfully, so its exit code is 0.

So what is the point of main's return value?

It is returned to the parent process, which can use it to judge the result of the process's execution, or ignore it.

For example, suppose we write a summation program that returns 0 if the answer is correct and 1 if it is wrong.

When the summation logic is correct, the program returns 0.

When we get the summation logic wrong and fail to produce the correct result, it returns 1, indicating that the final result of the program is incorrect. That is the meaning of main's return value.

exit code★

Back to the non-zero exit codes just mentioned: there are many non-zero values, and each can identify a different cause of failure.

After the program ends, this makes it easy to pin down why it failed.

Linux also defines a set of error codes with descriptions. We can print them with the function strerror().

It returns a string describing the given error code.

We enter the code:

 

 Then output the result:

 

Printing the codes up to 100 shows that there are many error types.

Of course, we can use these codes and their meanings ourselves, but we can also design our own set of exit codes.

The code runs to completion with incorrect results

Ditto.

Abnormal code termination

When a program terminates abnormally, the exit code is meaningless. Generally the return statement that would have produced the exit code is never executed.

So why does the program crash?

How to terminate a process with code

return

First of all, we know that returning an exit code terminates the process. Of course, only a return statement in the main function terminates the process.

exit and _exit★

Let's look at the manual page first.

It says that exit makes the process terminate normally.

Its parameter, status, is the exit code.

Unlike a return from main, exit terminates the process immediately no matter where it is called.

See the following example:

 

If func() used return 200 instead, it would only return 200 to the variable a, and the program would not end.

With exit(200), the statement that prints world after func() is not executed, and the first echo $? shows exit code 200.

So what is _exit, and how is it different?

The manual page says a lot and is fairly abstract, so I will use an example and a picture to explain how it differs from exit.

First, in the example above, we change exit to _exit.

The result is no different from exit:

 

But we change the program to something like this:

 

Because the printf string has no '\n' to flush the buffer, the content stays in the buffer. We therefore expect the program to run for 1 second, print the content to the screen when it ends, and exit with code 111.

A static image cannot show the timing, but the output does appear after 1 second.

Now we change exit to _exit.

 

This time nothing is output.

The conclusion: exit flushes the contents of the buffer to the screen when the program ends, while _exit ends the process directly without flushing the buffer.

As shown in the following figure

In general, we recommend using exit.

Library functions wrap system interfaces: exit is the library function, and _exit is the system interface it calls internally.

Where exactly is the buffer we keep talking about? I will cover that later, but it is certainly not inside the operating system.

If the buffer were inside the operating system and maintained by it, _exit would still flush it. Since _exit does not, the buffer cannot be inside the operating system; it is provided by the C standard library.

So much for process creation and termination.

Origin blog.csdn.net/weixin_47257473/article/details/131784854