Understanding the program address space

Program: Just a piece of code, saved in a file.

When the compiler builds a program into an executable file, it assigns an address to each instruction and each piece of data.

When the program is running, the instructions and data will be placed in the designated memory. The program only occupies memory when it is running, so the program address space is also called the process address space.
[Figure: layout of the memory space]

What if a running program accessed physical addresses directly?

  1. The program may fail to run. At compile time, variables and data are assigned fixed addresses, but if one of those addresses is already occupied at run time, the program cannot run. (The compiler has no way of knowing which memory will be in use.)
  2. Wild pointers. If a process accesses physical addresses directly, a wild pointer may modify the data of other processes.
  3. Low memory utilization. Each program needs a contiguous range of addresses, which wastes space to some extent.

Therefore, the OS provides virtual memory: each process works in a virtual address space that is mapped to physical memory. In C/C++, the addresses of variables and functions are all addresses in the virtual address space; physical addresses are invisible to the user and managed by the OS. The OS is responsible for mapping each virtual address to the corresponding physical address.

Without virtual memory, every program that runs is given one contiguous block of physical memory. If each program occupies a large block and many programs run together, some programs will not fit in memory. Memory allocated contiguously in this way is utilized very poorly.

Once processes use virtual memory, each process has its own virtual address space and still sees a contiguous range of addresses to use.
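
The difference between the two allocation styles can be sketched in a few lines of C. This is only an illustrative model, not how any real OS tracks memory: the function names are hypothetical, block sizes are counted in pages, and the second function assumes that a virtual address space can be backed by pages taken from any free block.

```c
#include <assert.h>
#include <stddef.h>

/* Contiguous allocation: the request must fit inside ONE free block. */
int fits_contiguous(const size_t *free_blocks, size_t n, size_t request)
{
    assert(free_blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (free_blocks[i] >= request)
            return 1;
    }
    return 0;
}

/* Assumed model of virtual memory: pages may come from any free block,
 * so only the total number of free pages matters. */
int fits_scattered(const size_t *free_blocks, size_t n, size_t request)
{
    assert(free_blocks != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += free_blocks[i];
    return total >= request;
}
```

With free blocks of 3, 2 and 4 pages, a request for 6 pages fails under contiguous allocation even though 9 pages are free in total, while the scattered model can satisfy it.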

Take a look at this code:

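    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    int global_val = 200;

    int main(void)
    {
        pid_t pid = fork(); /* create a child process */
        if (pid < 0)
        {
            printf("fork error\n");
            return 1;
        }
        else if (pid == 0)
        {
            /* child: print the value and its (virtual) address */
            printf("child:%d  %p\n", global_val, (void *)&global_val);
        }
        else
        {
            /* parent: print the same variable */
            printf("parent:%d   %p\n", global_val, (void *)&global_val);
        }
        return 0;
    }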

The output is:
[Figure: program output]
The child process and the parent process print the same value and the same address.

Make a small change to the code:

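    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    int global_val = 200;

    int main(void)
    {
        pid_t pid = fork(); /* create a child process */
        if (pid < 0)
        {
            printf("fork error\n");
            return 1;
        }
        else if (pid == 0)
        {
            /* child: modify the variable, then print it */
            global_val = 100;
            printf("child:%d  %p\n", global_val, (void *)&global_val);
        }
        else
        {
            /* parent: wait so that the child writes first, then print */
            sleep(3);
            printf("parent:%d   %p\n", global_val, (void *)&global_val);
        }
        return 0;
    }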

The output is:
[Figure: program output]
You can see that the variable in the child process has changed, while the variable in the parent process has not.

Why did the child's variable change while the parent's did not?
The child process is a copy of the parent process: it copies all of the parent's information. As long as the child has not changed any data, it uses the parent's data.

In the first program, neither the child nor the parent changes the variable, so both the addresses and the values are equal.

In the second program, the child changes the variable but the parent does not. The same virtual address is now mapped to different physical addresses, so the two processes print the same address but different values.

"The same" here means that the child process copies all of the parent's information: the process address space, the PCB, and so on.

[Figure: the data is copied when the child process changes it]

The second program involves copy-on-write: fork() in Linux is implemented using copy-on-write, a technique that delays or even avoids copying. The OS does not copy the entire process address space; instead, the parent and child share the same physical memory. Only when one of them writes to the data is that data copied, so that each process gets its own copy. Resources are copied only on a write. Until then, the shared data is read-only, which ensures that the parent and child share code while their data stays independent.
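
The effect described above can also be checked programmatically instead of by reading printed output. The following is a minimal sketch, assuming a POSIX system; `shared_val`, `child_value_after_write` and `parent_value` are hypothetical names, and the child reports the value it sees through its exit status, which limits the value to the range 0 to 255.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_val = 200; /* hypothetical global, plays the role of global_val */

/*
 * Fork a child that overwrites shared_val and returns the value the
 * child saw, passed back through its exit status. Returns -1 on error.
 */
int child_value_after_write(int new_val)
{
    assert(new_val >= 0 && new_val <= 255); /* exit status keeps only 8 bits */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        shared_val = new_val; /* the write affects only the child's copy */
        _exit(shared_val);    /* leave without flushing the parent's stdio buffers */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* The value as seen by the calling (parent) process. */
int parent_value(void)
{
    return shared_val;
}
```

Calling `child_value_after_write(100)` from the parent returns 100, while `parent_value()` still returns 200 afterwards. The sketch shows only the visible result (independent data); whether the kernel copied the page lazily is not observable from this code.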

Benefits of copy-on-write:

  1. It makes creating a child process more efficient.
  2. It saves resources.

So why does the OS use a virtual address space? In other words, what are its benefits?

  1. It improves physical memory utilization.
  2. It ensures independence between processes.

How are virtual addresses mapped to physical addresses?

Memory management schemes in operating systems:

  1. Segmentation: segment number + offset within the segment. [Figure: segmentation]

Segment table: the operating system uses it to record the segments that memory is divided into.
The segment number is used to look up the starting physical address of the segment, and the offset within the segment is added to it to obtain the physical address.
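
The lookup just described can be sketched as follows. The structure and function names are hypothetical, and the `limit` field with its bounds check is an added assumption (the text above only mentions the starting address); it is included because a segment table normally has to reject offsets that fall outside the segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One segment table entry (hypothetical layout). */
struct segment {
    uint32_t base;  /* starting physical address of the segment */
    uint32_t limit; /* segment length in bytes (assumed field) */
};

/*
 * Translate (segment number, offset) to a physical address.
 * Returns 0 and stores the result in *phys, or -1 if the segment
 * number or the offset is out of range.
 */
int seg_translate(const struct segment *table, uint32_t nsegs,
                  uint32_t seg, uint32_t offset, uint32_t *phys)
{
    assert(table != NULL && phys != NULL);
    if (seg >= nsegs || offset >= table[seg].limit)
        return -1;
    *phys = table[seg].base + offset; /* base + offset within the segment */
    return 0;
}
```

For a table whose segment 1 starts at 0x8000, the pair (1, 0x20) translates to 0x8020.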

  2. Paging: page number + offset within the page. The drawing here is relatively simple. [Figure: paging]
    The difference here is that we need to know the page size. A page is generally 4 KB.
    In a 32-bit OS with 4 GB of memory, there are (4 * 1024 * 1024 * 1024) / (4 * 1024) = 2^20 pages, and each page number corresponds to one page table entry.
    So there are 2^20 page table entries (page numbers) in total. Memory is divided into many small blocks.

The page number is used to look up the physical address of the page, and adding the offset within the page gives the physical address of the variable.
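
As a rough illustration of this arithmetic, the sketch below splits a 32-bit virtual address into a page number and an offset for 4 KB pages. The flat `page_table` array that maps a page number directly to a physical frame number is a simplifying assumption, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                        /* 4 KB pages: 2^12 bytes */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  (1u << (32u - PAGE_SHIFT)) /* 2^20 pages in 4 GB */

/* Upper 20 bits: which page the address falls in. */
uint32_t page_number(uint32_t vaddr)
{
    return vaddr >> PAGE_SHIFT;
}

/* Lower 12 bits: position inside that page. */
uint32_t page_offset(uint32_t vaddr)
{
    return vaddr & (PAGE_SIZE - 1u);
}

/* page_table[n] holds the physical frame number for page n (assumed layout). */
uint32_t page_translate(const uint32_t *page_table, uint32_t vaddr)
{
    assert(page_table != NULL);
    uint32_t frame = page_table[page_number(vaddr)];
    return (frame << PAGE_SHIFT) | page_offset(vaddr);
}
```

For the address 0x12345678, the page number is 0x12345 and the offset is 0x678. If page 2 is mapped to frame 9, the virtual address 0x2ABC translates to 0x9ABC.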

  3. Segmented paging: memory is managed in segments, and paging is used within each segment.
    First take the segment number and look it up in the segment table. The segment table stores the starting address of the page table for that segment, and that address is then used to find the page table.
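
The two-step lookup can be sketched like this. It is a simplified model under stated assumptions: each segment table entry holds a pointer to that segment's page table plus a page count (the count and the range checks are additions for safety), pages are 4 KB, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u               /* assumed 4 KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Segment table entry: where this segment's page table starts. */
struct seg_entry {
    const uint32_t *page_table; /* frame number for each page of the segment */
    uint32_t npages;            /* number of pages in the segment (assumed field) */
};

/*
 * Step 1: segment number -> page table.
 * Step 2: page number within the segment -> physical frame.
 * Returns 0 and stores the address in *phys, or -1 if out of range.
 */
int segpage_translate(const struct seg_entry *segs, uint32_t nsegs,
                      uint32_t seg, uint32_t offset, uint32_t *phys)
{
    assert(segs != NULL && phys != NULL);
    if (seg >= nsegs)
        return -1;

    uint32_t page = offset >> PAGE_SHIFT;
    if (page >= segs[seg].npages)
        return -1;

    uint32_t frame = segs[seg].page_table[page];
    *phys = (frame << PAGE_SHIFT) | (offset & (PAGE_SIZE - 1u));
    return 0;
}
```

If segment 0 has the page table {5, 6}, the offset 0x1010 inside that segment falls in page 1, which lives in frame 6, so the physical address is 0x6010.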

Current computers use segmented paging, although on modern systems such as x86-64 Linux the segments are essentially flat and paging does most of the work.

Virtual pages are cached in physical memory, and the page table records which pages are cached. As shown in the figure:
[Figure: page table mapping virtual pages to physical pages]
Page hit: VP2 is already cached in physical memory, so the access succeeds directly.
Cache miss (page fault): VP3 is not cached, so accessing it raises a page fault interrupt. The OS then copies VP3 from disk into PP3 in memory, updates PTE3, and returns.
VP3: virtual page 3.
PP3: physical page 3.
PTE3: page table entry 3. Its flag is 0 when the page is not cached (accessing it causes a page fault) and 1 when the page is cached.
[Figure: page table after the page fault]
After the page fault interrupt, the page fault handler chooses a victim page in physical memory and replaces it with the copy of VP3 from disk.
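
The hit and miss paths can be modelled with a tiny simulation. This is only a sketch: the table sizes are arbitrary, the names are hypothetical, the disk copy is left as a comment, and the victim is chosen round-robin purely to keep the example short (a real handler would use a replacement algorithm).

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VPAGES 8 /* virtual pages (arbitrary size for the sketch) */
#define NUM_FRAMES 4 /* physical pages */

struct pte {
    int valid; /* 1: cached in physical memory, 0: access causes a page fault */
    int frame; /* physical page number, meaningful only when valid is 1 */
};

struct vm {
    struct pte table[NUM_VPAGES];
    int frame_owner[NUM_FRAMES]; /* virtual page held by each frame, -1 if free */
    int next_victim;             /* round-robin victim choice (assumption) */
    int faults;                  /* number of page faults so far */
};

void vm_init(struct vm *vm)
{
    assert(vm != NULL);
    for (int i = 0; i < NUM_VPAGES; i++) {
        vm->table[i].valid = 0;
        vm->table[i].frame = -1;
    }
    for (int i = 0; i < NUM_FRAMES; i++)
        vm->frame_owner[i] = -1;
    vm->next_victim = 0;
    vm->faults = 0;
}

/* Access virtual page vp and return the physical page that holds it. */
int vm_access(struct vm *vm, int vp)
{
    assert(vm != NULL && vp >= 0 && vp < NUM_VPAGES);

    if (vm->table[vp].valid)
        return vm->table[vp].frame; /* page hit */

    vm->faults++; /* page fault */
    int frame = vm->next_victim;
    vm->next_victim = (vm->next_victim + 1) % NUM_FRAMES;

    int old = vm->frame_owner[frame];
    if (old >= 0)
        vm->table[old].valid = 0; /* evict the victim page */

    /* A real handler would copy the page from disk into the frame here. */
    vm->frame_owner[frame] = vp;
    vm->table[vp].valid = 1; /* update the page table entry */
    vm->table[vp].frame = frame;
    return frame;
}
```

After pages 0 to 3 have been touched (four faults), touching page 2 again is a hit, while touching page 4 faults, evicts page 0 and reuses its frame.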

The MMU uses page tables to map the virtual address space to the physical address space:
[Figure: address translation by the MMU]

So how is the victim page chosen?
By a page replacement algorithm:

  1. OPT: optimal replacement. It replaces the page that will never be used again, or that will not be used for the longest time in the future. It is only a theoretical algorithm, because future accesses cannot be known in advance.
  2. FIFO: first in, first out. It replaces the page that has been in memory the longest, which can raise the page fault rate because the oldest page may still be in frequent use.
  3. LRU: least recently used. It replaces the page that has gone unused for the longest time. (This is the algorithm generally used.)
  4. LFU: least frequently used. It replaces the page used least often over a period of time, on the assumption that such a page is also unlikely to be used in the future.
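
To compare two of the algorithms above, the sketch below counts page faults for FIFO and LRU on a sequence of page references. It is a teaching model rather than kernel code: the function names and the `MAX_FRAMES` bound are hypothetical, and LRU is implemented with a simple "last used" timestamp per frame.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16 /* upper bound on frames for this sketch */

/* Count page faults when the oldest loaded page is replaced first. */
int fifo_faults(const int *refs, int n, int nframes)
{
    assert(refs != NULL && nframes > 0 && nframes <= MAX_FRAMES);
    int frames[MAX_FRAMES];
    int used = 0;   /* frames filled so far */
    int next = 0;   /* index of the oldest page, replaced next */
    int faults = 0;

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++) {
            if (frames[j] == refs[i]) {
                hit = 1;
                break;
            }
        }
        if (hit)
            continue;

        faults++;
        if (used < nframes) {
            frames[used++] = refs[i];
        } else {
            frames[next] = refs[i];
            next = (next + 1) % nframes;
        }
    }
    return faults;
}

/* Count page faults when the least recently used page is replaced. */
int lru_faults(const int *refs, int n, int nframes)
{
    assert(refs != NULL && nframes > 0 && nframes <= MAX_FRAMES);
    int frames[MAX_FRAMES];
    int last_used[MAX_FRAMES]; /* index of the most recent reference per frame */
    int used = 0;
    int faults = 0;

    for (int i = 0; i < n; i++) {
        int slot = -1;
        for (int j = 0; j < used; j++) {
            if (frames[j] == refs[i]) {
                slot = j;
                break;
            }
        }
        if (slot < 0) {
            faults++;
            if (used < nframes) {
                slot = used++;
            } else {
                slot = 0; /* find the frame with the oldest reference */
                for (int j = 1; j < nframes; j++) {
                    if (last_used[j] < last_used[slot])
                        slot = j;
                }
            }
            frames[slot] = refs[i];
        }
        last_used[slot] = i;
    }
    return faults;
}
```

For the reference sequence 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 with 3 frames, `fifo_faults` returns 15 and `lru_faults` returns 12, which matches the point that FIFO tends to fault more often.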

Origin blog.csdn.net/w903414/article/details/108695586