Computer Organization | In-depth understanding of the ELF format and static linking

An in-depth look at how C code becomes machine code

Figure: From C code to machine code execution process. Stage 1: C code -> Compile -> Assemble -> Link -> executable file. Stage 2: Load (loader) -> memory -> the CPU reads instructions and data.

Broadly speaking, the process can be divided into two stages:

  1. The first stage consists of three steps, Compile, Assemble and Link, and produces an executable program.
  2. The second stage loads the executable file into memory through the loader; the CPU then reads instructions and data from memory and actually executes the program.

Phase 1: Compile, Assemble and Link

  1. Compile: A C compiler (such as GCC) translates the C source file (.c file) into an assembly file (.s file). The compiler performs lexical, syntax, and semantic analysis on the C code, generates intermediate code that represents the program's logic, and then emits assembly from it.
  2. Assemble: An assembler (such as the GNU assembler) converts the assembly file (.s file) into an object file (.o file), translating each assembly instruction into the corresponding machine code instruction.
  3. Link: A linker (such as the GNU linker) combines multiple object files (.o files) and the required library files into the final executable file. The linker resolves references to functions and global variables by matching each reference with its definition.

Phase 2: Load and Execute

  1. Load: The operating system's loader places the executable file at the appropriate location in memory. It allocates memory space and copies (or maps) the instructions, data, and other resources of the executable file to the corresponding memory addresses.
  2. Execute: Once the executable file is loaded into memory, the CPU reads instructions and data from memory and executes the instructions in order. It performs arithmetic operations, logical comparisons, memory accesses, and other operations as the instructions dictate, which is what finally realizes the program's functionality.
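
The Load step can be pictured with a small sketch. It is only a model under stated assumptions: memory is a plain byte array, and the struct toy_segment and the function toy_load_segment are hypothetical names invented for this illustration. A real loader works with the operating system's virtual memory rather than a simple copy. The zero-filled tail models data that needs space in memory but has no bytes stored in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one loadable piece of an executable. */
struct toy_segment {
    size_t addr;                 /* where the piece belongs in memory */
    const unsigned char *bytes;  /* contents stored in the file */
    size_t file_size;            /* number of bytes stored in the file */
    size_t mem_size;             /* size in memory, at least file_size */
};

/* Copies the segment into mem and zero-fills the part that has no
 * bytes in the file. Returns 0 on success, -1 if it does not fit. */
int toy_load_segment(unsigned char *mem, size_t mem_len,
                     const struct toy_segment *seg)
{
    assert(seg != NULL && seg->mem_size >= seg->file_size);

    if (seg->addr > mem_len || seg->mem_size > mem_len - seg->addr)
        return -1;

    memcpy(mem + seg->addr, seg->bytes, seg->file_size);
    memset(mem + seg->addr + seg->file_size, 0,
           seg->mem_size - seg->file_size);
    return 0;
}
```

After a successful call, the bytes starting at mem + seg->addr are what the CPU would fetch as instructions or data in the Execute step.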

In-depth understanding of the ELF format: an important role in the Linux system

What is ELF?

  • ELF stands for Executable and Linkable Format.

  • Linux uses ELF to store and organize the code and data of object files and executable files.
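
Every ELF file begins with the same four bytes: 0x7f followed by the letters E, L, F. As a minimal sketch, a check for that signature could look like this. The function name looks_like_elf is a hypothetical name chosen for the illustration, and a real tool would validate much more of the header.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer starts with the 4-byte ELF magic number
 * (0x7f 'E' 'L' 'F'), and 0 otherwise. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (buf == NULL || len < sizeof magic)
        return 0;
    return memcmp(buf, magic, sizeof magic) == 0;
}
```

Reading the first four bytes of a file and passing them to this function is enough to tell an ELF file apart from, say, a text file.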

ELF file structure

The main parts of an ELF file:

  1. .text Section: the code section (also called the instruction section), which holds the program's code and instructions;
  2. .data Section: the data section, which holds the initialized data defined in the program;
  3. .rel.text Section: the relocation table, which records which jump addresses in the current file are not yet known;
  4. .symtab Section: the symbol table, which works like an address book that maps the function names defined in the current file to their addresses.
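
The "address book" idea behind the symbol table can be sketched in a few lines. This is a simplified model, not the real ELF layout: struct toy_symbol, find_symbol, and example_table are hypothetical names for this illustration, and real symbol table entries carry more fields than a name and an offset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol table entry: a name and the offset
 * of its definition inside the file's .text section. */
struct toy_symbol {
    const char *name;
    unsigned long offset;
};

/* Example table for a file that defines one function at offset 0. */
static const struct toy_symbol example_table[] = {
    { "add", 0x0 },
};

/* Looks a name up in the table. Returns the entry, or NULL when the
 * symbol is not defined in this file and must come from another one. */
const struct toy_symbol *find_symbol(const struct toy_symbol *table,
                                     size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;
}
```

A NULL result corresponds to the situation the relocation table exists for: the name is used in this file, but its address has to be supplied later.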


Figure: Key structure of ELF file

The key role of ELF format in the compilation process

  1. Compile stage (Compile): With GCC on Linux, the object file that results from compiling a source file is stored in the ELF format.
  2. Assemble stage (Assemble): The assembler is the step that actually writes the ELF object file, storing the assembled machine instructions and data in its sections.
  3. Link stage (Link): Linking is where the ELF format does most of its work. The linker reads multiple object files and library files, performs symbol resolution and relocation based on the symbol references between them, and finally generates an executable file. ELF provides structures such as the section header table, symbol table, and relocation table to describe how the parts of the file and its symbols relate, which lets the linker resolve symbol references and apply relocations accurately.
  4. Load stage (Load): The ELF format tells the operating system (Operating System) how the executable file is laid out and what relocation it requires.
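
The ELF header itself records which of these roles a file plays. The sketch below reads the e_type field, a 16-bit value at byte offset 16 of the header, where 1 marks a relocatable object file, 2 an executable, and 3 a shared object. The function name elf_type and the TOY_ constants are hypothetical names for this illustration, and the sketch looks at nothing beyond the magic number, the byte-order flag, and e_type.

```c
#include <stddef.h>

/* Values of the e_type field, as defined by the ELF specification. */
enum {
    TOY_ET_REL  = 1,   /* relocatable object file (.o) */
    TOY_ET_EXEC = 2,   /* executable file */
    TOY_ET_DYN  = 3    /* shared object */
};

/* Returns the e_type field of the ELF header in buf, or -1 if buf is
 * too short, is not ELF, or declares an unknown byte order.
 * Byte 5 of the header says whether multi-byte fields are stored
 * little-endian (1) or big-endian (2). */
int elf_type(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 18)
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;

    if (buf[5] == 1)
        return buf[16] | (buf[17] << 8);
    if (buf[5] == 2)
        return (buf[16] << 8) | buf[17];
    return -1;
}
```

Given the first 18 bytes of an object file produced by the assembler, this should return TOY_ET_REL; the output of the linker reports one of the other two values.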

ELF running example

C code

The following two files, add_lib.c and link_example.c, work together to implement an addition function.

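// add_lib.c
int add(int a, int b)
{
    return a + b;
}

// link_example.c
#include <stdio.h>

// add() is defined in add_lib.c. The listings below come from a build
// without a prototype (implicit declaration); compilers that reject that
// need `int add(int a, int b);` declared here.
int main()
{
    int a = 10;
    int b = 5;
    int c = add(a, b);
    printf("c = %d\n", c);
}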

Compilation

Compiling add_lib.c and link_example.c produces two object files (Object File): add_lib.o and link_example.o.

Compile with gcc, then disassemble each object file with objdump:

$ gcc -g -c add_lib.c link_example.c
$ objdump -d -M intel -S add_lib.o
$ objdump -d -M intel -S link_example.o

The assembly code we get after compilation:

# Assembly code of add_lib.o

add_lib.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <add>:
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   89 7d fc                mov    DWORD PTR [rbp-0x4],edi
   7:   89 75 f8                mov    DWORD PTR [rbp-0x8],esi
   a:   8b 55 fc                mov    edx,DWORD PTR [rbp-0x4]
   d:   8b 45 f8                mov    eax,DWORD PTR [rbp-0x8]
  10:   01 d0                   add    eax,edx
  12:   5d                      pop    rbp
  13:   c3                      ret    
# Assembly code of link_example.o

link_example.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   48 83 ec 10             sub    rsp,0x10
   8:   c7 45 fc 0a 00 00 00    mov    DWORD PTR [rbp-0x4],0xa
   f:   c7 45 f8 05 00 00 00    mov    DWORD PTR [rbp-0x8],0x5
  16:   8b 55 f8                mov    edx,DWORD PTR [rbp-0x8]
  19:   8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
  1c:   89 d6                   mov    esi,edx
  1e:   89 c7                   mov    edi,eax
  20:   b8 00 00 00 00          mov    eax,0x0
  25:   e8 00 00 00 00          call   2a <main+0x2a>
  2a:   89 45 f4                mov    DWORD PTR [rbp-0xc],eax
  2d:   8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
  30:   89 c6                   mov    esi,eax
  32:   48 8d 3d 00 00 00 00    lea    rdi,[rip+0x0]        # 39 <main+0x39>
  39:   b8 00 00 00 00          mov    eax,0x0
  3e:   e8 00 00 00 00          call   43 <main+0x43>
  43:   b8 00 00 00 00          mov    eax,0x0
  48:   c9                      leave  
  49:   c3                      ret    
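
Both call instructions in link_example.o carry the operand 00 00 00 00, yet objdump prints `call 2a` and `call 43`. The reason is that the operand of an `e8` call is a displacement relative to the next instruction, so a zero placeholder points at the instruction right after the call. A minimal sketch of that calculation follows; call_target is a hypothetical helper name for this illustration.

```c
#include <stdint.h>

/* Computes the destination of an x86-64 near call (opcode 0xe8).
 * insn_addr is the address of the opcode byte, and disp points at the
 * four little-endian displacement bytes that follow it. The
 * displacement is relative to the next instruction (insn_addr + 5). */
uint64_t call_target(uint64_t insn_addr, const unsigned char disp[4])
{
    uint32_t raw = (uint32_t)disp[0]
                 | (uint32_t)disp[1] << 8
                 | (uint32_t)disp[2] << 16
                 | (uint32_t)disp[3] << 24;

    uint64_t rel = raw;
    if (raw & 0x80000000u)
        rel |= 0xffffffff00000000ull;   /* sign-extend to 64 bits */

    return insn_addr + 5 + rel;         /* wraps modulo 2^64 */
}
```

For the call at offset 0x25 with an all-zero operand, the function returns 0x2a, which matches the `call 2a <main+0x2a>` line in the listing. The real destination is not known until link time.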

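Link

If you produced assembly files (.s) instead of compiling straight to object files, assemble them into object files first. These two commands assemble; the actual linking happens in the next step:

$ gcc -c add_lib.s
$ gcc -c link_example.s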

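Executable code

Link the two object files into an executable file and run it:

$ gcc -o executable add_lib.o link_example.o
$ ./executable
c = 15 # the result is 15
  • Note: In the linked executable, the address that main jumps to when it calls add is no longer the address of the next instruction but the entry address of the add function. (The listing below was produced from a build whose output file was named link_example.)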

link_example:     file format elf64-x86-64
Disassembly of section .init:
...
Disassembly of section .plt:
...
Disassembly of section .plt.got:
...
Disassembly of section .text:
...

00000000000006b0 <add>:
 6b0:   55                      push   rbp
 6b1:   48 89 e5                mov    rbp,rsp
 6b4:   89 7d fc                mov    DWORD PTR [rbp-0x4],edi
 6b7:   89 75 f8                mov    DWORD PTR [rbp-0x8],esi
 6ba:   8b 55 fc                mov    edx,DWORD PTR [rbp-0x4]
 6bd:   8b 45 f8                mov    eax,DWORD PTR [rbp-0x8]
 6c0:   01 d0                   add    eax,edx
 6c2:   5d                      pop    rbp
 6c3:   c3                      ret    
00000000000006c4 <main>:
 6c4:   55                      push   rbp
 6c5:   48 89 e5                mov    rbp,rsp
 6c8:   48 83 ec 10             sub    rsp,0x10
 6cc:   c7 45 fc 0a 00 00 00    mov    DWORD PTR [rbp-0x4],0xa
 6d3:   c7 45 f8 05 00 00 00    mov    DWORD PTR [rbp-0x8],0x5
 6da:   8b 55 f8                mov    edx,DWORD PTR [rbp-0x8]
 6dd:   8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
 6e0:   89 d6                   mov    esi,edx
 6e2:   89 c7                   mov    edi,eax
 6e4:   b8 00 00 00 00          mov    eax,0x0
 6e9:   e8 c2 ff ff ff          call   6b0 <add>  # main now calls the entry address of add directly
 6ee:   89 45 f4                mov    DWORD PTR [rbp-0xc],eax
 6f1:   8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
 6f4:   89 c6                   mov    esi,eax
 6f6:   48 8d 3d 97 00 00 00    lea    rdi,[rip+0x97]        
 6fd:   b8 00 00 00 00          mov    eax,0x0
 702:   e8 59 fe ff ff          call   560 <printf@plt>
 707:   b8 00 00 00 00          mov    eax,0x0
 70c:   c9                      leave  
 70d:   c3                      ret    
 70e:   66 90                   xchg   ax,ax
...
Disassembly of section .fini:
...

The linker scans all input object files and collects the information from all of their symbol tables into one global symbol table. Then, guided by the relocation tables, it patches every instruction whose jump address was still unknown, using the addresses stored in the global symbol table. Finally, it merges the corresponding sections of all object files into the final executable code.
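
The patching step can be reproduced with the numbers from the listings above. In link_example.o the operand of the call to add sits at offset 0x26 of main; in the executable, main starts at 0x6c4 and add at 0x6b0. The sketch below is a simplified model of a single PC-relative fix-up, not the linker's actual implementation: struct toy_reloc and apply_call_reloc are hypothetical names, and real relocation records encode more information.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation record: "the 4 bytes at `offset` inside this
 * section must end up referring to the symbol called `name`". */
struct toy_reloc {
    size_t offset;
    const char *name;   /* informational only in this sketch */
};

/* Patches one PC-relative call operand.
 *   section      - bytes of the section being patched
 *   section_addr - address the linker assigned to the section's first byte
 *   rel          - which operand to patch
 *   target_addr  - final address of the symbol, taken from the global
 *                  symbol table
 * The stored value is the target minus the address of the next
 * instruction, which begins right after the 4-byte operand. */
void apply_call_reloc(unsigned char *section, uint64_t section_addr,
                      const struct toy_reloc *rel, uint64_t target_addr)
{
    uint64_t field_addr = section_addr + rel->offset;
    uint32_t disp = (uint32_t)(target_addr - (field_addr + 4));
    unsigned char *field = section + rel->offset;

    field[0] = (unsigned char)(disp & 0xff);
    field[1] = (unsigned char)((disp >> 8) & 0xff);
    field[2] = (unsigned char)((disp >> 16) & 0xff);
    field[3] = (unsigned char)((disp >> 24) & 0xff);
}
```

With section_addr 0x6c4, offset 0x26, and target 0x6b0, the operand lands at 0x6ea and the computed displacement is 0x6b0 - 0x6ee = -0x3e, stored as c2 ff ff ff. Those are the bytes shown at address 6e9 in the disassembly of the executable.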


Figure: Schematic diagram of the executable file generation process

Windows OS: PE

  • The executable file format on Windows is called PE (Portable Executable).
  • The loader on Linux can parse only the ELF format, not the PE format.

How can the formats be made compatible between Windows and Linux?

  • Wine, a well-known open source project on Linux, provides a loader that is compatible with the PE format, which lets us run Windows programs directly on Linux.
  • Windows also provides WSL (Windows Subsystem for Linux), which can parse and load files in the ELF format.
  • Although such tools make the executable file formats compatible, a program also depends on the dynamic link libraries, system calls, and other facilities that each operating system provides, so it still has to be adapted and tested for the specific platform. In other words, format compatibility is only the first step.


Origin blog.csdn.net/YuvalNoah/article/details/131183719