Linux kernel research: an in-depth look at program loading and execution

Program loading and execution process

From the operating system's point of view, the most important characteristic of a process is that it has its own independent virtual address space.

Process creation:

When a program is started, the operating system creates a new process to run it. This involves three main steps:

1. Create an independent virtual address space. The mapping established in this step is the one from the virtual address space to physical memory.

Creating a virtual address space does not actually create any space; it creates the data structures the kernel needs to describe the address mappings. The Linux kernel describes a virtual address space with the mm_struct structure used for virtual memory management, and the process's PCB (task_struct) refers to its mm_struct. The kernel therefore first allocates two physical pages, 8 KB in total, to hold the PCB and the process's kernel stack. From the 2.6 kernel onward, the PCB itself is allocated from a cache (the slab allocator), and the place where the PCB used to sit holds a thread_info structure instead, one of whose members points to the PCB in the cache.
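As a rough illustration of the 2.6 layout described above, the sketch below shows how the start of an 8 KB kernel stack block, where thread_info sits, could be found from any address inside the block by masking off the low bits. The 8 KB size comes from the text; the 8 KB alignment, the example address, and the function name are assumptions made for illustration, and the arithmetic is written in Python rather than kernel C.

```python
THREAD_SIZE = 8 * 1024  # two 4 KB pages, the figure given in the text

def thread_info_base(stack_pointer: int) -> int:
    """Return the lowest address of the 8 KB block containing stack_pointer.

    Assumes the block is aligned on an 8 KB boundary, so clearing the low
    13 bits of any address inside it yields the start of the block.
    """
    return stack_pointer & ~(THREAD_SIZE - 1)

# Hypothetical kernel stack pointer somewhere inside one such block.
sp = 0xC12345F0
print(hex(thread_info_base(sp)))
```

Because the result depends only on the stack pointer, no separate lookup table is needed to get from the running kernel stack to thread_info, and from there to the PCB.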

At the same time, a page directory is allocated. The individual page mappings are not set up yet; the page fault handler fills them in later, as the program runs and page faults occur.

Kernel code before 2.4: [listing not preserved]

2.6 kernel code: [listing not preserved]

The steps above are carried out by the fork() system call.
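The effect of giving the new process its own address space can be observed from user space. The following is a minimal sketch, assuming a Unix-like system where Python's os.fork() is available; the function name and the pipe used to report the child's value are choices made for this example only.

```python
import os

def run_fork_demo():
    """Fork once; return (value seen by the child, value kept by the parent)."""
    value = 1                      # lives in the parent's address space
    read_fd, write_fd = os.pipe()
    pid = os.fork()                # the child gets its own address space
    if pid == 0:
        # Child: change its private copy and report it through the pipe.
        os.close(read_fd)
        value = 2
        os.write(write_fd, str(value).encode())
        os._exit(0)
    # Parent: read what the child saw, then reap the child.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_value, value

child_value, parent_value = run_fork_demo()
print("child saw", child_value, "parent still has", parent_value)
```

The child's assignment does not show up in the parent, because the two processes no longer share one virtual address space.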

2. Read the executable file header, and establish the mapping relationship between each segment of the executable file and each segment in the virtual address space.

This step establishes the mapping between the executable file and the virtual address space. The mapping is simply a data structure kept in the kernel and reachable from the PCB.

In the case described here, the ELF file has two LOAD segments. Each LOAD segment is formed by merging the ELF sections that share similar access permissions, for example read and execute for code, and read and write for data. This step maps the two LOAD segments, page by page, onto the code segment and the data segment of the virtual address space.
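To make the LOAD segments concrete, here is a minimal sketch that reads the program headers of an ELF image and lists the loadable ones. It assumes a 64-bit little-endian ELF file; the function name and the tuple layout it returns are choices made for this example, not part of any kernel interface.

```python
import struct

PT_LOAD = 1                      # program header type for loadable segments
PF_X, PF_W, PF_R = 1, 2, 4       # segment permission flag bits

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # ELF64 file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")           # ELF64 program header, 56 bytes

def load_segments(image: bytes):
    """Return (entry, [(vaddr, offset, filesz, memsz, perms), ...]) for PT_LOAD."""
    fields = EHDR.unpack_from(image, 0)
    ident, entry, phoff = fields[0], fields[4], fields[5]
    phentsize, phnum = fields[9], fields[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF image")
    segments = []
    for i in range(phnum):
        (p_type, p_flags, p_offset, p_vaddr,
         _paddr, p_filesz, p_memsz, _align) = PHDR.unpack_from(
            image, phoff + i * phentsize)
        if p_type == PT_LOAD:
            perms = "".join(c if p_flags & bit else "-"
                            for c, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X)))
            segments.append((p_vaddr, p_offset, p_filesz, p_memsz, perms))
    return entry, segments
```

Calling load_segments() on the bytes of an executable gives, for each LOAD segment, the virtual address it is mapped at, where it starts in the file, and its permissions. It also returns the entry address stored in the file header.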

Mapping the ELF file into the virtual address space in this way relies on the mmap and execve() system calls.
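The idea of mapping a file into an address space, rather than reading it in up front, can be tried from user space with Python's mmap module. This is only a sketch of the general mechanism, not of what execve() does internally; the temporary file stands in for an executable and the function name is made up for the example.

```python
import mmap
import os
import tempfile

def first_bytes_via_mmap(path: str) -> bytes:
    """Map a file read-only and return its first 4 bytes."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file into this process's address space.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[:4]

# Stand-in for an executable: a temporary file that starts with the ELF magic.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x7fELF" + bytes(4092))
magic = first_bytes_via_mmap(tmp.name)
os.unlink(tmp.name)
print(magic)
```

Once the file is mapped, its contents are accessed like ordinary memory, which is the same model the loader uses for the segments of an executable.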

When the program runs, the contents of the executable file must be loaded into memory before they can be used. Suppose, for example, that the process accesses a global variable that has not yet been loaded into memory. The access triggers a page fault, and the operating system reads the missing page that contains the global variable from disk into physical memory, then sets up the mapping between the virtual page and the physical page.

Virtual addresses are translated to physical addresses through multi-level page tables.
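A small sketch can show what "multi-level" means in practice. It assumes the classic 32-bit x86 layout with 4 KB pages and two levels of 1024 entries each; the dictionaries stand in for the page directory and page tables, and all names are illustrative.

```python
PAGE_SHIFT = 12      # 4 KB pages (assumed)
INDEX_BITS = 10      # 1024 entries per level (assumed, classic 32-bit x86)

def split_virtual_address(vaddr: int):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    table_index = (vaddr >> PAGE_SHIFT) & ((1 << INDEX_BITS) - 1)
    dir_index = (vaddr >> (PAGE_SHIFT + INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
    return dir_index, table_index, offset

def translate(page_directory: dict, vaddr: int):
    """Walk a toy two-level table; return the physical address or None on a miss."""
    dir_index, table_index, offset = split_virtual_address(vaddr)
    page_table = page_directory.get(dir_index)
    if page_table is None or table_index not in page_table:
        return None          # a real system would raise a page fault here
    return (page_table[table_index] << PAGE_SHIFT) | offset

# One mapped page: directory entry 32, table entry 72, physical frame 0x1234.
toy_directory = {32: {72: 0x1234}}
print(hex(translate(toy_directory, 0x08048123)))
```

A missing entry at either level corresponds to the situation in which the hardware reports a page fault and the kernel has to fill in the mapping.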

When the operating system handles a page fault, it needs to know where in the executable file the missing page is located. This is exactly what the mapping between the virtual address space and the executable file records.
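The lookup performed on a page fault can be sketched as follows. The Mapping class is a hypothetical, much simplified stand-in for the kernel's record of one file-backed region; the 4 KB page size and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PAGE_SIZE = 4096  # assumed page size

@dataclass
class Mapping:
    """One file-backed region of the virtual address space (illustrative)."""
    start: int        # first virtual address of the region (page-aligned)
    end: int          # one past the last virtual address
    file_offset: int  # offset in the executable that corresponds to `start`

def locate_page(mappings: List[Mapping], fault_addr: int) -> Optional[Tuple[int, int]]:
    """Return (page-aligned virtual address, file offset of that page), or None."""
    page_addr = fault_addr & ~(PAGE_SIZE - 1)
    for m in mappings:
        if m.start <= page_addr < m.end:
            return page_addr, m.file_offset + (page_addr - m.start)
    return None  # the address is not backed by the file: an invalid access

# Two regions, loosely modeled on a code segment and a data segment.
regions = [Mapping(0x400000, 0x402000, 0), Mapping(0x602000, 0x603000, 0x2000)]
print(locate_page(regions, 0x401234))
```

Given the faulting address, the handler rounds it down to a page boundary, finds the region that contains it, and derives the file offset from which the page should be read.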

Because loading an executable really amounts to mapping it into the virtual address space, the executable file is often called an image file.

 

3. Set the CPU's instruction pointer register to the entry point of the executable file (the entry address is stored in the ELF file) and start execution. Page faults occur repeatedly while the program runs. On each page fault, the required contents of the executable file are loaded into physical memory, and the mapping between the virtual page and the physical page is then established.

