Linux kernel: from executable file to process startup

References for this article: https://www.jianshu.com/p/84d96a6385b0
https://zhuanlan.zhihu.com/p/148426554?from_voters_page=true
https://www.cnblogs.com/qscfyuk/p/11697816.html
https://www.cnblogs.com/icecri/p/4438351.html

1. Concepts

1. Executable file
An executable file on Linux is an ELF file, and it is what the process loader loads. The file contains many sections, which fall into groups by permission: the code section is readable and executable, the data section and the BSS section are readable and writable, and the read-only data section is read-only. These are the parts of the program that are loaded when a process is created. Linux maps them in units of pages (physical memory is divided into page frames of the same size, generally 4096 bytes, i.e. 4 KB). A part that does not fill a whole page still occupies a whole page.
On 32-bit x86, a traditional (non-position-independent) ELF executable is loaded at virtual address 0x8048000 by default. The ELF header information comes first, so the actual entry point lies a little further on, at an address of the form 0x8048x00 whose exact value depends on the size of the headers; in the example it is 0x8048300. This is the address at which a process starts executing after it has loaded a new executable file.

For details about ELF files, see: https://www.cnblogs.com/qscfyuk/p/11697816.html
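As a minimal sketch of where the entry address comes from, the following Python reads the e_entry field of an ELF header. The helper name read_elf_entry and the sample header bytes are assumptions made for illustration; only the field offsets follow the ELF format.

```python
import struct

ELF_MAGIC = b"\x7fELF"

def read_elf_entry(header: bytes) -> int:
    """Return the e_entry field from the first bytes of an ELF file."""
    if header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    byte_order = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little endian
    # The 16-byte e_ident is followed by e_type (2 bytes), e_machine (2 bytes)
    # and e_version (4 bytes), so e_entry starts at offset 24.
    fmt = byte_order + ("Q" if is_64bit else "I")
    (entry,) = struct.unpack_from(fmt, header, 24)
    return entry

# Hypothetical 32-bit little-endian header whose entry point is 0x8048300.
sample = (ELF_MAGIC + bytes([1, 1, 1]) + bytes(9)
          + struct.pack("<HHII", 2, 3, 1, 0x8048300))
print(hex(read_elf_entry(sample)))  # 0x8048300
```

For a real file, passing the first 32 bytes read in binary mode to read_elf_entry should give the same address that readelf -h reports as the entry point.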

2. Process virtual address space
Each process has its own 4 GB virtual address space (on a 32-bit system). No process actually occupies that much space on disk or in memory; 4 GB is simply the range the process can address (on a 32-bit system, 0x00000000 to 0xFFFFFFFF).
Virtual memory is a memory-management technique. It lets an application believe it has one contiguous, complete address space, while in reality that space is usually backed by scattered fragments of physical memory, with some parts temporarily kept on disk and swapped in when needed. Compared with systems that do not use virtual memory, this makes large programs easier to write and uses physical memory more efficiently. When the processor reads or writes a memory location, it uses a virtual address, and its memory management unit (MMU) translates that virtual address into a physical address during the access.
Advantages:
Programmers do not need to worry about where data or programs are physically stored.
Programs can use a range of contiguous virtual addresses to access a large area that is not contiguous in physical memory. Users see contiguous addresses and need not care how the underlying physical addresses are arranged.
With virtual memory, a program can use more space than the physical memory actually available. When physical memory runs short, the operating system saves physical pages to a disk file, and data or code pages move between physical memory and disk as needed.
The virtual addresses used by different processes are isolated from each other, so users do not need to worry about affecting data at the memory addresses of other programs. The memory management module of the operating system maps virtual addresses to physical addresses.
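The isolation described in the last point can be observed from user space. The sketch below is an illustration only, with a hypothetical helper name (fork_isolation_demo): after a fork, parent and child see the same virtual address for a variable, yet a write in the child does not change the parent's value.

```python
import ctypes
import os

def fork_isolation_demo():
    """Return (parent_addr, child_addr, parent_value, child_value)."""
    counter = ctypes.c_int(1)                 # one int at some virtual address
    parent_addr = ctypes.addressof(counter)
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: same virtual address, but the write lands in a private copy.
        os.close(read_end)
        counter.value = 2
        report = f"{ctypes.addressof(counter)} {counter.value}"
        os.write(write_end, report.encode())
        os._exit(0)
    # Parent: collect the child's report, then check its own value.
    os.close(write_end)
    data = os.read(read_end, 64)
    os.close(read_end)
    os.waitpid(pid, 0)
    child_addr, child_value = (int(field) for field in data.split())
    return parent_addr, child_addr, counter.value, child_value

if __name__ == "__main__":
    p_addr, c_addr, p_val, c_val = fork_isolation_demo()
    print(hex(p_addr) == hex(c_addr), p_val, c_val)  # True 1 2
```

The two processes use identical virtual addresses, but the operating system maps them to different physical pages once the child writes.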
3. In memory, a process mainly occupies the following parts: the code segment, the data segment, BSS, the stack, the heap, and a parameter area. The contents of code and data correspond to contents of the executable file (BSS takes up no space in the file and is zero-filled). The loader does not copy their contents from the executable into memory; it only records their information (base address, length, and so on) in the memory descriptors that hang off the process control block (task_struct). When the CPU first accesses one of these addresses, a page fault occurs, and the operating system then copies the actual content from the executable file into physical memory.
The heap is allocated dynamically while the program runs, so the loader only records its start address in the process control block; actual pages in physical memory are allocated when the program performs a dynamic memory allocation during execution. The parameter area holds the environment variables and the command-line argument list when the new process is loaded. At load time the stack holds the pointers to the environment list and the command-line argument list, together with the number of command-line arguments.

Reference: https://blog.csdn.net/weixin_29058331/article/details/113368011

4. Page tables
See "Operating Systems: Internals and Design Principles".

2. Process creation

Insert picture description here

1. Create the task_struct process descriptor.
2. Allocate the mm_struct memory descriptor.
The memory descriptor represents the process address space, i.e. the 4 GB of virtual memory mentioned earlier. As the name suggests, the structure only describes memory; it does not actually set aside 4 GB of memory for the process.
3. Load the executable program.
The executable is divided into a code segment, a data segment, and other parts. When it is loaded, the system does not copy the actual contents of these segments from the file; it only records information about each one. For each segment it creates a vm_area_struct structure (a virtual memory area) whose vm_start and vm_end fields give the range of virtual memory the segment is mapped to. As shown in the figure above, each vm_area_struct points to one interval of the virtual memory area on the right.
On 32-bit x86, an ELF executable is loaded at virtual address 0x8048000 by default. The ELF header information comes first, so the actual entry point is at an address of the form 0x8048x00 that depends on the size of the headers.
This step establishes the mapping between virtual memory and the executable file.
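On a running Linux system, the virtual memory areas of a process are visible in /proc/<pid>/maps, one line per area. As a rough illustration, the sketch below parses such lines into start address, end address, permissions and backing file. The Region type, the parse_maps_line helper and the sample lines are all hypothetical and chosen to resemble the 32-bit layout described above.

```python
from typing import NamedTuple

class Region(NamedTuple):
    start: int   # comparable to vm_start
    end: int     # comparable to vm_end
    perms: str   # e.g. "r-xp" for code, "rw-p" for data
    path: str    # backing file, or a label such as [heap] or [stack]

def parse_maps_line(line: str) -> Region:
    """Parse one line in the format used by /proc/<pid>/maps."""
    fields = line.split(maxsplit=5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Region(start, end, fields[1], path)

# Hypothetical sample in the style of a 32-bit process.
SAMPLE = """\
08048000-08049000 r-xp 00000000 08:01 1234 /bin/demo
08049000-0804a000 rw-p 00000000 08:01 1234 /bin/demo
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
bffdf000-c0000000 rw-p 00000000 00:00 0 [stack]
"""

for line in SAMPLE.splitlines():
    region = parse_maps_line(line)
    print(f"{region.start:#010x}-{region.end:#010x} {region.perms} {region.path}")
```

Feeding the lines of /proc/self/maps to parse_maps_line shows the same kind of information for the current process.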

3. Process execution

After step two, the process has recorded the executable's information in its own structures. So how does the data to be executed get loaded into physical memory while the program runs?
1. Pages, page frames, and the page table
Virtual memory and physical memory are both managed by paging, i.e. divided into 4 KB areas. On the virtual side these areas are called pages, and on the physical side they are called page frames; the page table records which page frame each page is mapped to. The detailed principle is not covered here.
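To make the paging arithmetic concrete, here is a small sketch that assumes classic two-level 32-bit x86 paging with 4 KB pages (10 bits of page directory index, 10 bits of page table index, 12 bits of offset). The dictionary standing in for the page table and the frame number 0x1234 are invented for illustration.

```python
PAGE_SHIFT = 12                  # 4 KB pages
PAGE_SIZE = 1 << PAGE_SHIFT

def split_vaddr(vaddr: int):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    directory_index = vaddr >> 22
    table_index = (vaddr >> PAGE_SHIFT) & 0x3FF
    offset = vaddr & (PAGE_SIZE - 1)
    return directory_index, table_index, offset

def translate(page_table: dict, vaddr: int) -> int:
    """Look up the page frame for vaddr; a missing entry stands for a page fault."""
    page_number = vaddr >> PAGE_SHIFT
    if page_number not in page_table:
        raise LookupError(f"page fault at {vaddr:#x}")
    return (page_table[page_number] << PAGE_SHIFT) | (vaddr & (PAGE_SIZE - 1))

print(split_vaddr(0x08048300))                        # (32, 72, 768)
print(hex(translate({0x08048: 0x1234}, 0x08048300)))  # 0x1234300
```

The offset within the page is carried over unchanged; only the page number is replaced by a frame number.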
2. Running the process
When the process is scheduled, the system allocates some page frames to it and loads the corresponding part of the executable file into them, and the process starts executing. When execution reaches an address (the actual physical address is computed from the virtual address) for which no page frame holds the required content, for example because the part of the program already in memory has been executed and the next part is needed, a page fault is triggered. The system then loads the next part of the executable into a page frame and execution continues.
Page fault interrupt:
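Page faults can also be counted from user space. The following sketch assumes a Linux system where getrusage reports minor faults; it maps an anonymous region, writes one byte into each page, and returns the minor-fault counter before and after. The function name touch_pages is made up for this example, and the exact numbers depend on the kernel and the Python runtime, so no specific output is claimed.

```python
import mmap
import resource

def touch_pages(num_pages: int = 64):
    """Touch each page of a fresh mapping; return (faults_before, faults_after, touched)."""
    length = num_pages * mmap.PAGESIZE
    region = mmap.mmap(-1, length)        # reserves address space, no frames yet
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    for page in range(num_pages):
        offset = page * mmap.PAGESIZE
        region[offset:offset + 1] = b"\x01"   # first write to a page can fault it in
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    touched = sum(region[page * mmap.PAGESIZE] for page in range(num_pages))
    region.close()
    return before, after, touched

if __name__ == "__main__":
    before, after, touched = touch_pages()
    print(f"minor faults while touching {touched} pages: {after - before}")
```

On a typical Linux system the difference is expected to be close to the number of pages touched, which mirrors the behaviour described above: memory is only described at first, and physical frames are supplied when an address is actually accessed.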

Quoted from other articles

1) Enter ./<executable file name> at the shell prompt.
The shell parses the input; if it is not a shell built-in command, the shell treats it as a request to load an executable file. It calls fork to start creating a new process, which raises the 0x80 interrupt and is dispatched to sys_fork(). find_empty_process() is then called to obtain a free process number for the new process. (The function names in these steps are those of early kernels such as Linux 0.11.)
2) Find storage space for the management structure of the executable program.
To protect processes, the system keeps a dedicated management structure for each one, namely task_struct. The kernel calls get_free_page to obtain the page that holds the task_struct and the kernel stack; this page can only be in the kernel's linear address space.
3) The shell process copies its task_struct structure for the new process.
Once the task_struct has been copied, the new process has inherited all of the shell's management information. Because each process's task_struct must hold different information, the copy then has to be customized (during this time the process is set to the uninterruptible state so that it cannot be switched to). The customized fields mainly include the process number, the parent process, the time slice, and the TSS (task state segment, which saves and restores the register values of a process when it is switched out and in, so that switching preserves process protection). All of this is done by the function copy_process.
4) Copy the page tables for the new process and set the corresponding page directory entries.
copy_mem is called to set up segmentation for the new process (LDT): it updates the base addresses of the code segment and data segment, which determines the linear address space (the key points are the segment base address and the length limit). Paging, which builds on segmentation, is set up next.
5) Associate the new process with the global descriptor table (GDT).
The TSS and LDT of the new process are hooked into their designated slots in the GDT. (Note: the TSS and LDT are vital to process protection.)
6) Set the new process to the ready state.
7) Load the executable file.
On entering the do_execve function, the kernel reads the header of the executable file into memory and checks it, then loads the program (the program is loaded into memory on demand).
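Seen from user space, the steps above correspond to the familiar fork, exec and wait sequence that a shell performs. The sketch below is a simplified illustration and not how any particular shell is implemented; run_program is a hypothetical helper, and the Python interpreter stands in for the executable being launched.

```python
import os
import sys

def run_program(path: str, argv: list) -> int:
    """Fork a child, replace its image with the program at path, return its exit status."""
    pid = os.fork()                  # the kernel duplicates the calling process
    if pid == 0:
        try:
            os.execv(path, argv)     # on success this call never returns
        finally:
            os._exit(127)            # reached only if the exec failed
    _, status = os.waitpid(pid, 0)   # the parent waits, as a shell does
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

if __name__ == "__main__":
    code = run_program(sys.executable, [sys.executable, "-c", "raise SystemExit(7)"])
    print(code)  # 7
```

The exit status 127 for a failed exec follows the convention shells use for a command that cannot be run.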


Origin blog.csdn.net/chengcheng1024/article/details/114669992