Process address space under Linux (required for beginners)

Contents

1. Program address space

2. Process address space


1. Program address space

First, let's review the program address space in C/C++ through a picture:

Here is a brief introduction to these areas:

1. Heap area:

The heap is the region used for dynamic allocation. In a C program, memory in this region is allocated and released with malloc and free. As allocations proceed, the region grows from lower addresses toward higher addresses.

2. Stack area:

The stack holds the stack frames created by function calls while the program runs. It starts at the top of user space (just below the kernel memory area) and grows from high addresses toward low addresses as functions are called.

3. Code and data area:

The contents of the executable file are loaded into this area, which is divided into two parts. The low-address part holds the program code and read-only data and is read-only; the other part holds the executable's readable and writable data and is the read-write part.

4. Kernel memory area:

The kernel's virtual memory occupies a block at the highest addresses of the virtual address space. User processes cannot access this area.

The following program verifies the layout in the figure above by printing an address from each region:

#include <stdio.h>

int g_unval;        // uninitialized global
int g_val = 100;    // initialized global (non-zero, so it is not grouped with the uninitialized data)

int main()
{
    // address in the code area
    printf("code addr: %p\n", (void *)main);

    // address in the string-constant (read-only) area
    const char *str = "ddddddd";
    printf("string rdonly addr: %p\n", (const void *)str);

    // address in the uninitialized global data area
    printf("uninit addr: %p\n", (void *)&g_unval);

    // address in the initialized global data area
    printf("init addr: %p\n", (void *)&g_val);

    // addresses in the heap area
    int *p1 = new int(1);
    int *p2 = new int(2);
    int *p3 = new int(3);
    printf("heap addr: %p\n", (void *)p1);
    printf("heap addr: %p\n", (void *)p2);
    printf("heap addr: %p\n", (void *)p3);

    // addresses in the stack area
    int a1 = 10;
    int a2 = 20;
    int a3 = 30;
    printf("stack addr: %p\n", (void *)&a1);
    printf("stack addr: %p\n", (void *)&a2);
    printf("stack addr: %p\n", (void *)&a3);

    delete p1;
    delete p2;
    delete p3;
    return 0;
}

Running result:

The printed addresses are indeed laid out as in the figure.

Let's do an experiment:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int g_val = 10;

int main(void)
{
    pid_t id = fork();
    if (id < 0)
    {
        perror("fork");
        return 1;
    }
    else if (id == 0)
    {
        // child process
        while (1)
        {
            printf("I am a child pid:%d g_val:%d &g_val:%p\n", getpid(), g_val, (void *)&g_val);
            sleep(2);
        }
    }
    else
    {
        // parent process
        while (1)
        {
            printf("I am a parent pid:%d g_val:%d &g_val:%p\n", getpid(), g_val, (void *)&g_val);
            sleep(2);
        }
    }

    return 0;
}

Let's run this code:

The g_val and &g_val printed by the child and the parent are the same, which is no surprise: as we said before, parent and child share code and data as long as neither modifies them. Now let's look at another piece of code:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int g_val = 100;

int main(void)
{
    pid_t id = fork();
    if (id < 0)
    {
        perror("fork");
        return 1;
    }
    else if (id == 0)
    {
        // child process: modifies g_val on the third iteration
        int cnt = 0;
        while (1)
        {
            cnt++;
            if (cnt == 3)
            {
                g_val = 10;
                printf("child process modified g_val\n");
            }
            printf("I am a child pid:%d g_val:%d &g_val:%p\n", getpid(), g_val, (void *)&g_val);
            sleep(2);
        }
    }
    else
    {
        // parent process: only reads g_val
        while (1)
        {
            printf("I am a parent pid:%d g_val:%d &g_val:%p\n", getpid(), g_val, (void *)&g_val);
            sleep(2);
        }
    }

    return 0;
}

Let's run this program:

After the child modifies g_val, copy-on-write occurs, because processes are independent of each other. It makes sense that the child and the parent print different values, but surprisingly the addresses they print are the same. If this were a physical address, reading the same location could not yield two different values. So we can conclude:

This is definitely not a physical address. Under Linux, this kind of address is called a virtual address. All the addresses we see in C/C++ are virtual addresses! Physical addresses are invisible to the user and are managed by the OS.

2. Process address space

So what we earlier called 'the address space of the program' is not accurate; it should be the process address space. The process address space is a data structure created by the OS, called mm_struct, which records how the virtual address space is divided into regions. On a 32-bit platform the virtual addresses run from 0x00000000 to 0xFFFFFFFF. Just as each process has its own process control block (PCB), each process has its own process address space; that is, each process believes it has the resources to itself and owns the full 4GB of space (on 32-bit platforms). So the process address space is really a virtual address space, and a virtual address can be thought of as a position on this linear range once the space has been divided into regions.

Each process has its own virtual address space, and the OS translates virtual addresses to real physical addresses through a mapping.

This translation is done through the page table. Now we can explain the phenomenon above. A child process is created using the parent as a template, so the child gets its own virtual address space whose contents are basically the same as the parent's (only some data differ), and the child's page table mappings are the same as the parent's. When the child modifies the data, the OS interrupts the child, allocates new physical space, copies the data into it, and then lets the child modify the new copy. The physical address has changed but the virtual address has not: only the mapping from the virtual address to the physical address in the child has changed. So the identical addresses we saw earlier are identical virtual addresses.

Now we can understand how parent and child processes stay independent. Their code and data are shared, but as soon as either side tries to write, copy-on-write takes place and the mapping from the page table to physical memory is modified, so that parent and child each have their own data. This is how independence is achieved.

Why is it designed this way?

Reason one:

With a virtual address space, a process cannot access physical memory directly; an intermediate layer (the page table) is added, which makes memory management easier. Every process must reach physical memory through its virtual address space and page table. The operating system can intervene when a virtual address is translated to a physical address and check whether the access is legal, thereby protecting physical memory.

Reason two:

Requesting memory and actually using it are clearly separated in time. The virtual address space hides the underlying allocation work from the application, so the OS can manage memory and processes better.

Reason three:

With a virtual address space, each process believes it has memory to itself, and every process views memory in the same way. This greatly improves the efficiency of OS management.

For example, when the CPU executes a program, it first needs to find the program's starting position, that is, its entry address. With a virtual address space, it only needs to look at a fixed virtual address. Different processes have different mappings, so this fixed virtual address maps to different physical addresses in different processes, where that process's code and data are found. This lets the CPU locate the start of the program quickly.

Reason four:

The virtual address space makes addresses appear contiguous, which reduces the chance of invalid and out-of-bounds accesses.

Revisiting processes and process creation:

1. What is a process? A process is a program loaded into memory: its code and data, plus the data structures the OS creates for it (PCB (task_struct) + mm_struct (process address space) + page table). The corresponding mm_struct can be found through the PCB.


Origin blog.csdn.net/qq_56999918/article/details/123938638