Program execution (6/11)

Programs execute in one of two settings: under an operating system, or as bare-metal programs on hardware with no operating system. On Linux, an executable is an ELF file: besides the basic code and data segments, it carries a file header, a symbol table, and other information that assists loading and running. A program for a bare-metal environment is generally a BIN or HEX file, which contains essentially only the instructions and data to be placed in memory.

Although the two kinds of program differ in running environment and file format, the principle is the same: the instructions must be loaded to a specific location in memory, and that location is determined by the link address chosen when the executable was linked.

Program execution under operating system environment

When a computer with an operating system runs an application, it first runs a program called a loader. Using the software's installation path, the loader reads the executable file from non-volatile storage (ROM, flash, or disk) into memory, performs initialization and dynamic-library relocation, and finally jumps to the program's entry point. When you start an application from the Linux command line, a shell such as sh or bash plays the part of the loader: it has the program loaded into memory and wrapped in a process, which then takes part in the operating system's scheduling.

An executable file is made up of several sections, including the code segment, the data segment, and the BSS segment. The loader places the code and data segments at different locations in memory. The file header of the executable provides basic information such as the file type, the target platform, and the program's entry address. Before loading the program, the loader checks this header, and if the target platform does not match the current environment it reports an error.
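
The header check described above can be sketched in a few lines of C. This is a minimal illustration, not how any real loader is written: the helper name check_elf32_header and its return codes are made up for this example, and it only handles 32-bit little-endian ELF files. The field offsets (e_machine at byte 18, e_entry at byte 24, a 52-byte header) follow the ELF32 layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Machine IDs defined by the ELF specification. */
#define EM_386 3
#define EM_ARM 40

/* Read little-endian integers byte by byte, so the host byte order does not matter. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: validate a 32-bit little-endian ELF header.
 * Returns 0 and stores the entry address on success,
 * -1 if the buffer is not such an ELF file,
 * -2 if the target platform does not match expected_machine.
 */
int check_elf32_header(const unsigned char *buf, size_t len,
                       uint16_t expected_machine, uint32_t *entry)
{
    if (len < 52)                       /* an ELF32 file header is 52 bytes */
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;                      /* wrong magic number */
    if (buf[4] != 1 || buf[5] != 1)     /* ELFCLASS32, ELFDATA2LSB */
        return -1;
    if (read_le16(buf + 18) != expected_machine)
        return -2;                      /* e_machine: platform mismatch */
    *entry = read_le32(buf + 24);       /* e_entry: program entry address */
    return 0;
}
```

Called on the first 52 bytes of an ARM executable with EM_ARM as the expected machine, the helper would hand back the entry address. Called with EM_386 instead, it returns -2, which corresponds to the loader's "platform does not match" error.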

The executable also contains a segment header table, which readelf displays as the program headers. It records how the file is to be loaded into memory: which segments to load, the addresses to load them at, and related information. Because the loader depends on this table to load the program into memory, the table is mandatory in an executable file.

jiaming@jiaming-pc:~/Documents/CSDN_Project$ readelf -l a.out 

Elf file type is EXEC (Executable file)
Entry point 0x1030c
There are 9 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  EXIDX          0x000718 0x00010718 0x00010718 0x00008 0x00008 R   0x4
  PHDR           0x000034 0x00010034 0x00010034 0x00120 0x00120 R   0x4
  INTERP         0x000154 0x00010154 0x00010154 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.3]
  LOAD           0x000000 0x00010000 0x00010000 0x00724 0x00724 R E 0x10000
  LOAD           0x000f10 0x00020f10 0x00020f10 0x00118 0x0011c RW  0x10000
  DYNAMIC        0x000f18 0x00020f18 0x00020f18 0x000e8 0x000e8 RW  0x4
  NOTE           0x000168 0x00010168 0x00010168 0x00044 0x00044 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
  GNU_RELRO      0x000f10 0x00020f10 0x00020f10 0x000f0 0x000f0 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     .ARM.exidx 
   01     
   02     .interp 
   03     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .ARM.exidx .eh_frame 
   04     .init_array .fini_array .dynamic .got .data .bss 
   05     .dynamic 
   06     .note.gnu.build-id .note.ABI-tag 
   07     
   08     .init_array .fini_array .dynamic 
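
To make the listing more concrete, here is a hedged sketch of how one entry of the ELF32 program header table could be decoded from raw bytes. The struct and function names (phdr32, decode_phdr32, zero_fill_bytes) are invented for this example. The field order and the 32-byte entry size follow the ELF32 layout, which agrees with the listing: 9 entries of 32 bytes give the 0x120 bytes reported for PHDR.

```c
#include <stdint.h>

#define PT_LOAD 1   /* segment type: loadable */
#define PF_X    1   /* segment flags: execute, write, read */
#define PF_W    2
#define PF_R    4

/* One ELF32 program header entry, in file order. */
struct phdr32 {
    uint32_t type, offset, vaddr, paddr, filesz, memsz, flags, align;
};

static uint32_t phdr_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one 32-byte little-endian program header entry. */
struct phdr32 decode_phdr32(const unsigned char *p)
{
    struct phdr32 h;
    h.type   = phdr_le32(p + 0);
    h.offset = phdr_le32(p + 4);
    h.vaddr  = phdr_le32(p + 8);
    h.paddr  = phdr_le32(p + 12);
    h.filesz = phdr_le32(p + 16);
    h.memsz  = phdr_le32(p + 20);
    h.flags  = phdr_le32(p + 24);
    h.align  = phdr_le32(p + 28);
    return h;
}

/* Bytes that exist in memory but not in the file; the loader must zero them. */
uint32_t zero_fill_bytes(const struct phdr32 *h)
{
    return h->memsz - h->filesz;
}
```

For the second LOAD entry in the listing (FileSiz 0x118, MemSiz 0x11c), zero_fill_bytes gives 4. That is consistent with .bss sitting at the end of segment 04 and taking memory but no file space.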

Programs on Linux generally run as processes under the operating system's unified scheduling. To run a program, the shell usually forks a child process, which gets its own independent virtual address space, and the child then calls execve to load the program into that space. execve reads the executable's file header to find the program's entry address, establishes the mapping between the process's virtual address space and the executable file, and sets the PC to the entry address to start execution. The correspondence between a C program, the compiled executable file, and the process created when the executable runs is as follows:

[Figure: correspondence between a C source program, its executable file, and the running process]
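
The fork-then-execve sequence the shell performs can be sketched as follows. This is a simplified illustration using the POSIX calls fork, execve, and waitpid. The helper name run_program is an assumption of this example, and a real shell does much more (job control, redirection, PATH lookup).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the caller's environment, passed on to the new program */

/*
 * Hypothetical helper: run the executable at `path` in a child process
 * and return its exit status, or -1 if the child could not be created
 * or did not exit normally.
 */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();              /* child gets its own address space */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execve(path, argv, environ); /* replace the child's image with the program */
        _exit(127);                  /* reached only if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On success execve never returns, because the child's old code no longer exists. The _exit(127) line only runs when the kernel refuses to load the file.
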
The default link start address differs between toolchains. On 32-bit x86 Linux, GCC generally places the code segment starting at 0x08048000, while the ARM GCC cross-compiler generally uses 0x10000 as the link start address, as the first LOAD segment in the readelf output above shows. The data segment follows the code segment, starting at a 4KB-aligned address, and the BSS segment follows the data segment. The heap that the program uses through malloc() / free() starts at the first 4KB-aligned address after the BSS segment.
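
The "first 4KB-aligned address after" rule is plain arithmetic, sketched here under the assumption that the alignment is a power of two. The function name align_up is made up for this example. The sample input is the end of the second LOAD segment in the readelf listing (0x20f10 + 0x11c = 0x2102c). On a real system the kernel decides where the heap starts and may randomize it.

```c
#include <stdint.h>

/*
 * Round addr up to the next multiple of align.
 * align must be a power of two, for example 0x1000 for 4KB.
 */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

With these inputs, align_up(0x2102c, 0x1000) yields 0x22000, and an address that is already aligned is returned unchanged.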

For every running process, the Linux kernel keeps a task_struct structure, and these structures are linked through pointers into a list. The operating system manages, schedules, and runs processes using this list. The code and data segments of different processes live in different physical pages of memory, so the processes are independent of one another, and through context switching they take turns using the CPU to execute their own instructions. When several processes run concurrently on Linux, the correspondence between C source programs, executable files, processes, and physical memory is as follows:

[Figure: correspondence between C source programs, executable files, processes, and physical memory]
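
As a toy model of processes linked into a list and taking turns on the CPU, consider the sketch below. Everything in it is hypothetical: toy_task stands in for the far larger task_struct, and the real kernel scheduler does not simply walk the list in order.

```c
#include <stddef.h>

/* Toy stand-in for task_struct: just a pid and a link to the next task. */
struct toy_task {
    int pid;
    struct toy_task *next;
};

/* Link an array of n tasks (n >= 1) into a circular list. */
void toy_link(struct toy_task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tasks[i].next = &tasks[(i + 1) % n];
}

/* Toy "context switch": the task that gets the CPU after `current`. */
struct toy_task *toy_pick_next(struct toy_task *current)
{
    return current->next;
}
```

With three tasks linked this way, repeated calls to toy_pick_next cycle through them in order and then wrap around to the first, which is the simplest form of taking turns on the CPU.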

Program running in bare metal environment

On a bare-metal platform, nothing is in place to run a program when the system powers on. A third-party tool has to load the program into memory before it can run.

Many integrated development environments, such as ADS1.2, Keil, and RVDS, provide not only editing and compilation but also running, debugging, and flash programming. With ADS1.2, for example, you can communicate with the development board over the JTAG interface and download the BIN/HEX ARM executable built on the PC into the board's memory to run. When building the program, the link address can be set to match the board's actual RAM physical address through the Debug Setting option that ADS1.2 provides.

In an embedded Linux system, running the Linux kernel image is itself a case of running a program in a bare-metal environment. The kernel image is usually loaded from a Flash partition into memory and started by U-Boot, which thus plays the role of the loader in the Linux boot process.

Program entry main() function analysis

Once the loader has placed the instructions in memory, the program starts running. main() is conventionally regarded as the entry function of every program, but the default entry point is actually the _start symbol, not main. Starting at main is only a convention.

A good deal of initialization happens before main runs, such as setting up the stack pointer and initializing the contents of the data segment. Global variables that have no explicit initializer are zeroed: int variables are set to 0, Boolean variables to false, and pointers to NULL. Once the environment is ready, this code passes the user's arguments to main and finally jumps to main.
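
The zeroing of variables without an initializer is easy to observe from C itself. The sketch below uses made-up names (counter, ready, name, table, statics_start_zeroed) and relies only on the C rule that objects with static storage duration start out zero-initialized.

```c
#include <stdbool.h>
#include <stddef.h>

/* None of these has an initializer, so all of them start out as zero. */
static int   counter;
static bool  ready;
static char *name;
static int   table[4];

/* Returns 1 if every object above still holds its zero value, 0 otherwise. */
int statics_start_zeroed(void)
{
    if (counter != 0 || ready != false || name != NULL)
        return 0;
    for (size_t i = 0; i < 4; i++) {
        if (table[i] != 0)
            return 0;
    }
    return 1;
}
```

Calling statics_start_zeroed() before anything writes to these variables returns 1. A local variable without an initializer gets no such guarantee.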

This initialization code is added to the executable automatically by the toolchain when the program is built. It belongs to the C runtime (CRT). Besides implementing the standard functions required by the C language standard, such as printf, fopen, and fread, a compiler vendor also implements this initialization code, which performs a series of operations around the call to main:

  • Set up the basic stack and process environment that C code needs.
  • Load, initialize, clean up, and release dynamic libraries.
  • Pass the arguments argc and argv to main and call it.
  • After main returns, call exit to end the process.
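
The sequence in the list above can be mimicked in ordinary C, with the caveat that this is a simulation. Real startup code is written in assembly, sets the stack pointer (which portable C cannot express), and on Linux the kernel has usually populated the data segment and cleared BSS before _start runs. Copying and clearing by hand is closer to what bare-metal startup code typically does. All names here (toy_start, demo_main, data_image, data_area, bss_area) are invented for the example.

```c
#include <string.h>

typedef int (*main_fn)(int argc, char **argv);

/* Stand-ins for the stored initial values, the live data segment, and BSS. */
static const int data_image[2] = {10, 20};
static int data_area[2];
static int bss_area[4];

/* Example "user main": its return value depends on the prepared memory. */
int demo_main(int argc, char **argv)
{
    (void)argv;
    return argc + data_area[0] + bss_area[0];
}

/* Toy startup routine: prepare memory, then hand control to main. */
int toy_start(int argc, char **argv, main_fn user_main)
{
    memcpy(data_area, data_image, sizeof data_area); /* initialize the data segment */
    memset(bss_area, 0, sizeof bss_area);            /* clear BSS */
    return user_main(argc, argv);  /* a real CRT passes this value to exit() */
}
```

With argc equal to 2, toy_start returns 12: the argument count plus the initialized value 10 plus the zeroed BSS value 0.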

In the lib directory under the ARM cross-compiler's installation path you will find the object file crt1.o. It is built from the initialization code written in assembly and is part of the CRT. During linking, the linker combines crt1.o with the project's own object files to produce the final executable.

BSS segment

The compiler places uninitialized global variables and uninitialized static local variables in the BSS segment. The BSS segment occupies no storage in the executable file. Its purpose is to reduce the size of the executable and save disk space.

Although the BSS segment takes no space in the executable file, the loader allocates memory for it when the program is loaded and run. The size of the BSS segment is recorded in the section header table, and the address and size of each variable are recorded in the symbol table. Using the recorded size, the loader allocates a block of that size immediately after the data segment and clears it. Each uninitialized global or static variable then occupies the address within this block that was assigned to it at link time.
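
The allocate-and-clear step can be sketched as a copy followed by a zero fill. This is an illustration only: load_segment is a hypothetical name, and mem stands for memory the loader has already reserved. A real loader maps pages rather than copying into a buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Place one segment in memory: copy the filesz bytes that exist in the
 * file, then zero the remaining (memsz - filesz) bytes, which is where
 * BSS lives. Returns 0 on success, -1 if the sizes are inconsistent.
 */
int load_segment(unsigned char *mem, const unsigned char *file_bytes,
                 size_t filesz, size_t memsz)
{
    if (memsz < filesz)
        return -1;
    memcpy(mem, file_bytes, filesz);          /* data stored in the file */
    memset(mem + filesz, 0, memsz - filesz);  /* BSS: in memory only, cleared */
    return 0;
}
```

Because the zero-filled tail is described by two sizes rather than stored as bytes, a program with a large uninitialized array costs memory at run time but almost nothing on disk.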

Origin blog.csdn.net/weixin_39541632/article/details/132228915