Linux process (3) --- in-depth understanding of process address space

Table of contents

Address space division and verification

Is the so-called address space memory?

A strange phenomenon (introduction of virtual addresses)

What is a process address space?

Is the memory we usually access physical memory?

In-depth understanding of regional division

Let's talk about strange phenomena

Why can a variable hold two different values at the same time in fork()

Expansion: When the program is not compiled and loaded into memory, is there an address inside the program?


Address space division and verification

When we wrote code before, we often defined variables of several kinds: static variables, global variables, ordinary local variables, and variables whose space is requested dynamically. Each of these has its own position in memory. You have probably seen a picture of this layout before.

Whether you have seen the picture or not, you must firmly remember where each area sits in the address space. From low addresses to high addresses: the code (text) area, initialized data, uninitialized data, the heap, the shared area, and the stack.

Note that the heap grows upward (toward higher addresses), while the stack grows downward (toward lower addresses).

Let's write the following code to verify this picture.

    #include <stdio.h>
    #include <stdlib.h>

    // Uninitialized global data
    int g_unInitVal;
    // Initialized global data (nonzero, so that it is placed in the initialized data area)
    int g_InitVal = 100;

    int main()
    {
      printf("text:%p\n", (void*)main);
      printf("init:%p\n", (void*)&g_InitVal);
      printf("Uninit:%p\n", (void*)&g_unInitVal);

      // Heap data
      char* p1 = (char*)malloc(sizeof(char));
      char* p2 = (char*)malloc(sizeof(char));
      char* p3 = (char*)malloc(sizeof(char));

      printf("heap1:%p\n", (void*)p1);
      printf("heap2:%p\n", (void*)p2);
      printf("heap3:%p\n", (void*)p3);

      // Stack data
      int a = 1, b = 2, c = 3;

      printf("stack:%p\n", (void*)&a);
      printf("stack:%p\n", (void*)&b);
      printf("stack:%p\n", (void*)&c);

      free(p1);
      free(p2);
      free(p3);
      return 0;
    }

We exit vim, compile with make, and run:

You can also see that there is a large gap between the addresses in the heap area and the addresses in the stack area. This part of the space is called the shared area, which will be explained later.

One more note: the conclusions above are only guaranteed to hold under Linux. Other operating systems may lay out the process address space differently, and this article discusses everything in terms of Linux.

Is the so-called address space memory?

Then there is a question: Is the space drawn above memory?

According to our understanding so far, whenever we run a program, its variables and code are kept in these areas of memory for later use.

But the answer is that the picture does not show real memory!

So what exactly is it? The address space is quite abstract, and it is the subject of the rest of this article. Explaining any one aspect in isolation drags in a lot of related knowledge, so we first set up a framework and then fill in the explanation step by step.

We observe the following phenomenon:

A strange phenomenon (introduction of virtual addresses)

We know that when a parent process calls fork() to create a child process, the child shares the parent's global variables, so a global variable has the same address in both processes.

Let's compile it with make and run it:

It is found that the addresses of the two global variables are indeed the same.

Now we make a slight change to the code: in the child process, we set g_val to 20, and then print the value in the parent and the child again.

We exit the editor, compile with make, and run:

This time something very strange happens: the global variable g_val has different values in the parent and the child.

What is strange is that the addresses are still the same. The same address is read at the same time, yet different values appear. How is this possible?

But we can draw one important conclusion from it:

The addresses of these variables are definitely not physical memory addresses. If they were physical addresses, reading the same address at the same time could not possibly return two different values.

What we see here is called a virtual address (linear address).

It is also worth saying that in almost every language that has the concept of an "address", that address is a virtual address, not a physical address.

So what is a virtual address? How should we understand it? Why is it designed this way? How does it relate to physical addresses? I will try to answer these questions with as many examples as possible.

What is a process address space?

Here is an example to help understand:

There is a rich man in America who owns 1 billion US dollars, and he has 3 illegitimate children: a, b, and c.

One day, he called a over and said: "Study hard, and if you achieve something, I will give you all of the 1 billion US dollars." a was delighted and studied even harder, striving for results in scientific research.

The next day, he called b over and said: "You are a businessman. If you can grow your business, my 1 billion dollars will be yours." b agreed and immediately went back to his business full of motivation and energy.

On the third day, he said the same thing to c.

By now, every son believes that the 1 billion dollars will be his, even though none of the money has actually reached him. The sons do not know about each other, and they all work harder. In effect, the rich man has drawn each son a big pie.

To earn the 1 billion, son a studied hard but needed some money for books and other resources. So one day he went to the rich man and said: "Dad, can you give me 100 US dollars first? I need to buy some resources." The rich man saw that the money was for a legitimate purpose, so he gave it to him. The same goes for b and c.

Even so, each son still believes he has a billion.

From a God's-eye view, we know that the three sons cannot all get the 1 billion. Each son only asks the rich man for a small amount at a time, and each remains convinced that his father has the full 1 billion. Even if one day a son asks for a lot, say 100 million, and the father refuses (just as a request for space can fail), the son will still believe that his father has the money and simply is not willing to hand it over yet.

If you understand this story, you understand the address space:

The rich father corresponds to the operating system. His sons (a, b, c) are 3 processes, and the 1 billion pie that the father drew for each son is the process address space.

The money that the rich man really owns is physical memory. He drew a pie for every son, so many sons means many pies, and these pies have to be managed. How? Describe first, organize later!

In the kernel, the address space is essentially a data structure, and each address space is associated with a specific process.

How should we understand that? It will be discussed later.

Is the memory we usually access physical memory?

We need to know: physical memory itself can be read and written at any time. It has no notion of being unreadable or unwritable.

What happens if processes access physical memory directly?

For example, a bug in process 1 produces a wild pointer that happens to point into the memory of process 2. Writing through that wild pointer in process 1 directly modifies the data or code of process 2.

Or suppose process 3 holds some private password data. Process 1 could simply point a pointer at that memory and read the password, which must never be allowed.

So direct access to physical memory has a fatal problem: it is extremely unsafe! It can also cause memory fragmentation problems.

Therefore, our computers do not let user programs access physical addresses directly.

To solve this kind of problem, modern computers work like this:

We know that when a process starts, the OS creates a task_struct structure that describes all the attributes of the process. The OS also creates a process address space for each process, and the addresses in this space are called virtual addresses. The system also has a mapping mechanism (the page table), which we will not go into for now, that maps virtual memory to physical memory.

The task_struct structure has a field that points to this virtual address space.

But what if the virtual address itself is illegal? Would it not simply be mapped to a physical address that is just as illegal?

In fact, when a process tries to access an illegal address, for example an unmapped memory area or a page it has no permission for, the MMU detects the error and raises an exception. This exception is called a page fault, which will be explained later.

For example, you get 500 yuan of pocket money during the Chinese New Year. It would be great to spend the 500 yuan directly, but you are young, and what if someone cheats you out of it? So your mother usually takes the pocket money and says she will give it to you when you need it. Later you say you want to buy a book, and your mother agrees and gives you the money. After a while you say you want to buy a toy, and your mother says: what toy? Spend your money in the right place! QAQ.

Here the mother plays a management role: she judges whether each of your requests is legitimate. Similarly, when an illegal virtual address is encountered, the MMU refuses the access during translation.

This indirectly protects physical memory!

In-depth understanding of regional division

So how do we understand the division into areas?

The essence of dividing an area is to define a start and an end for a range.

For example, when we were in elementary school, two deskmates might have an argument and draw the so-called "38th parallel" across the desk, so that each side has its own range with a start and an end.

What needs to be emphasized here is: every process has a process address space!

The address space is a kernel data structure, and it must at least record the division into areas.

We abstract each area into fields of a data structure.

The size of each area is not fixed. For example, the heap grows upward, so its size changes over time.

Changing a range essentially means adding to or subtracting from its start or end marker.

Under Linux, the virtual address space of a process is described by a structure called mm_struct.

We look directly at the Linux source code, find task_struct, and find mm_struct inside it. Because every process has a process address space, task_struct must hold it as one of its fields.

Let's see what's in there:

As you can see, this is the area division: mm_struct uses unsigned long fields to record the start and end of each range.

So the final process of a process accessing physical memory is as follows:

Each process has its own address space and its own (user-level) page table.

So what about multiple processes? What if their mappings to physical memory conflict?

As long as the page table of each process maps to different areas of physical memory, the processes cannot interfere with each other, which guarantees the independence of processes!

Let's talk about strange phenomena

Now let's look back at why the parent and child processes read different values from the global variable g_val at the same address:

Here is a picture to explain.

At first, before the child modifies g_val, the parent and the child each go through their own virtual address space and their own page table, and both end up accessing the same g_val in physical memory.

When the child wants to modify g_val, it goes through its own address space and page table as before, but this time the OS allocates new space in physical memory, copies the data there, applies the child's modification to the copy, and remaps the child's virtual address to the new space. This operation is called copy-on-write.

Why can a variable hold two different values at the same time in fork()

This also answers the question left over from the previous article: how can one variable hold two different values at the same time?

We know that fork() has two return values because it returns twice: once in the parent and once in the child.

Returning is essentially a write to the variable that receives the value (id).

That write triggers copy-on-write, so the parent and the child each have their own copy of the variable in physical memory, even though at the user level both refer to it by the same variable name (the same virtual address)!

Expansion: When the program is not compiled and loaded into memory, is there an address inside the program?

Let me give the answer first: yes. By the time the executable has been compiled, there are already addresses inside it.

Don't think of the address space as something that only the OS has to obey. The compiler obeys it too! When the compiler compiles the code, it already lays out the areas (code area, data area, and so on) and, following the same layout the Linux kernel uses, assigns an address to every variable and every line of code. Therefore, once the program is compiled, everything in it already has a virtual address!

We can understand this by drawing a picture.

First, on disk, the compiled code already has virtual addresses. For example, the first line of code is at 0x1, the second at 0x10, and the third at 0x100. For execution to flow from one line to the next, the first line must hold the virtual address of the second line, and the second line must hold the virtual address of the third. (This is a simplification: on real hardware the program counter advances by itself, and it is jump and call instructions that carry target addresses.) As follows:

Then, when the program runs, it is loaded into physical memory. Each line of code still holds the virtual address of the next line, and each line now also has a physical address of its own, for example 0xA, 0xAA, and 0xAAA.

At this point, the process can take a virtual address, look it up in the page table to find the corresponding physical address, and access the code there.

But the question is: where does the initial data of the address space and the page table come from?

The addresses assigned by the compiler are used to fill in the virtual address space, and these virtual addresses go on the left side of the page table. When the program is loaded into memory, each piece of code gets a physical address, which goes on the right side of the page table. In this way, each process can use a virtual address in its address space and the mapping to find the corresponding code in physical memory and execute it.

The CPU then starts executing. It is given the address of the first instruction, 0x1. Using the code area of the address space and the page table mapping, it finds the instruction for 0x1 in physical memory and reads it into the CPU. The address inside the instruction it has just read is the virtual address 0x10! After executing the 0x1 instruction, the CPU takes 0x10, goes through the virtual address space and the page table again, and reads the 0x10 instruction. Repeating this, the CPU executes all the instructions.

That's the whole process.

This is the end of the explanation of the process address space. In essence, it is a virtual space that is mapped to physical memory through the page table.

If you have any questions or don’t understand, please feel free to comment or private message~


Origin blog.csdn.net/weixin_47257473/article/details/131735218