Detailed explanation of C language compilation and linking process

Source File

main.c
#include <stdio.h>

extern int data;
extern int add(int a,int b);

int a1;
int a2 = 0;
int a3 = 10;

static int b1;
static int b2 = 0;
static int b3 = 20;

int main()
{
    
    
	int c1;
	int c2 = 0;
	int c3 = 30;

	static int d1;
	static int d2 = 0;
	static int d3 = 40;

	c1 = data;
	c2 = add(a1,a2);

	while(1);

	return 0;
}
add.c
int data = 3;
int add(int a,int b)
{
    
    
	return a+b;
}

Two major processes: compilation and linking

1. Compilation process:


  1. Preprocessing (.i)

    • Process the preprocessing directives that start with #: #include, #define, #ifndef, #if, #else, etc.

    • Remove comments, add line markers and file name identifiers, etc.

    Command: gcc -E main.c -o main.i generates the .i file

  2. Compilation (.s)

    Compile the .i file to generate a .s assembly file

    Command: gcc -S main.i generates the .s file

  3. Assembly (.o)

    Translate the assembly file into a binary relocatable file, i.e. a .o file

    Command: gcc -c main.s generates the .o file
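
To make the preprocessing stage more concrete, here is a minimal sketch. The names SQUARE, USE_FAST_PATH, and pick_value are hypothetical and do not appear in the sources above. If this file is passed through gcc -E, the macro call should appear expanded and only the active branch of the #if should remain in the .i output.

```c
#include <assert.h>

/* Hypothetical macro: replaced textually during preprocessing (gcc -E). */
#define SQUARE(x) ((x) * (x))

/* Hypothetical switch: only one branch below survives preprocessing. */
#define USE_FAST_PATH 1

int pick_value(int n)
{
    assert(n >= 0 && n <= 1000);   /* keep n * n well inside the int range */
#if USE_FAST_PATH
    return SQUARE(n);              /* becomes ((n) * (n)) in the .i file */
#else
    return n + n;                  /* dropped by the preprocessor */
#endif
}
```

The compiler proper (the .i to .s step) never sees SQUARE or the #else branch; it only sees the text the preprocessor left behind.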

PS: The gcc command is just a driver that wraps several underlying programs. It calls them according to the options it is given:

  • Preprocessing and compilation are combined into one step by the program cc1, so a .s file can also be generated with the following command

    cc1 hello.c

    which is equivalent to gcc -S hello.c -o hello.s

  • The assembler, as

  • The linker, ld

Analyze binary relocatable files

main.c file

#include <stdio.h>

int a1;
int a2 = 0;
int a3 = 10;

static int b1;
static int b2 = 0;
static int b3 = 20;

int main(void)
{
    
    
	int c1;
	int c2 = 0;
	int c3 = 30;

	static int d1;
	static int d2 = 0;
	static int d3 = 40;

	return 0;
}

Compile command: build 32-bit .o files on a 64-bit machine

gcc -m32 -fno-PIC -c *.c

-m32 generates 32-bit output; -fno-PIC disables position-independent code, so the object file does not contain the extra sections that position-independent code generation adds (leaving only .text, .data, .bss, .comment, etc.)

1. Read the ELF file header
$ readelf -h main.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          268 (bytes into file)
  Flags:                             0x5000000, Version5 EABI
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         10
  Section header string table index: 7

(1) Magic number

Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00

(2) REL (relocatable file)

(3) Entry point address: 0x0

(4) Start of section headers: 268 (bytes into file)

(5) Header size: 52 (bytes)
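
The first bytes of the Magic line can be decoded by hand. The sketch below checks the four magic bytes and reads the class and data-encoding bytes that follow them; the function names are hypothetical, and the byte positions and values (index 4 for the class, index 5 for the encoding, 1 meaning 32-bit and little endian respectively) follow the ELF specification.

```c
#include <assert.h>
#include <stddef.h>

/* Indexes into e_ident, as laid out by the ELF specification. */
enum { IDX_CLASS = 4, IDX_DATA = 5 };

/* Returns 1 if the first four bytes are 0x7f 'E' 'L' 'F'. */
int has_elf_magic(const unsigned char *ident)
{
    assert(ident != NULL);
    return ident[0] == 0x7f && ident[1] == 'E' &&
           ident[2] == 'L'  && ident[3] == 'F';
}

/* Returns 32 or 64 for the file class, or 0 if the byte is invalid. */
int elf_class_bits(const unsigned char *ident)
{
    assert(ident != NULL);
    if (ident[IDX_CLASS] == 1) return 32;   /* ELFCLASS32 */
    if (ident[IDX_CLASS] == 2) return 64;   /* ELFCLASS64 */
    return 0;
}

/* Returns 1 for little endian (ELFDATA2LSB), 0 otherwise. */
int elf_is_little_endian(const unsigned char *ident)
{
    assert(ident != NULL);
    return ident[IDX_DATA] == 1;
}
```

Applied to 7f 45 4c 46 01 01 01 ..., these helpers report a 32-bit little-endian ELF file, which agrees with the Class and Data lines of the readelf output.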

2. Read the section header information of the ELF file (used for linking)
$ readelf -S main.o
There are 12 section headers, starting at offset 0x2ec:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 000044 00  AX  0   0  1
  [ 2] .rel.text         REL             00000000 00026c 000020 08   I  9   1  4
  [ 3] .data             PROGBITS        00000000 000078 00000c 00  WA  0   0  4
  [ 4] .bss              NOBITS          00000000 000084 000014 00  WA  0   0  4
  [ 5] .comment          PROGBITS        00000000 000084 00002a 01  MS  0   0  1
  [ 6] .note.GNU-stack   PROGBITS        00000000 0000ae 000000 00      0   0  1
  [ 7] .eh_frame         PROGBITS        00000000 0000b0 00003c 00   A  0   0  4
  [ 8] .rel.eh_frame     REL             00000000 00028c 000008 08   I  9   7  4
  [ 9] .symtab           SYMTAB          00000000 0000ec 000140 10     10  14  4
  [10] .strtab           STRTAB          00000000 00022c 000040 00      0   0  1
  [11] .shstrtab         STRTAB          00000000 000294 000057 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)

There are 12 section headers, and the section header table starts at file offset 0x2ec.

The table shows the file offset and size of each section.
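
As a rough cross-check of these numbers, the sketch below models one ELF32 section header entry as ten 32-bit fields (40 bytes, as the ELF specification lays it out) and computes where the header table ends. The struct and function names are hypothetical stand-ins for the definitions normally found in a system's elf.h.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the ten 32-bit fields of an ELF32 section header entry. */
struct shdr32 {
    uint32_t name, type, flags, addr, offset;
    uint32_t size, link, info, addralign, entsize;
};

/* File offset one past the end of the section header table. */
uint32_t shdr_table_end(uint32_t table_offset, uint32_t count)
{
    assert(count < 0xff00);   /* ordinary (non-extended) header counts only */
    return table_offset + count * (uint32_t)sizeof(struct shdr32);
}
```

With 12 headers starting at 0x2ec, the table would end at 0x4cc. The last section, .shstrtab, ends at 0x294 + 0x57 = 0x2eb, so the header table follows it directly.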

3. Print out the contents of the sections
~ $ objdump -s main.o

main.o:     file format elf32-i386

Contents of section .text:
 0000 8d4c2404 83e4f0ff 71fc5589 e55183ec  .L$.....q.U..Q..
 0010 14c745ec 00000000 c745f01e 000000a1  ..E......E......
 0020 00000000 8945f48b 15000000 00a10000  .....E..........
 0030 000083ec 085250e8 fcffffff 83c41089  .....RP.........
 0040 45ecebfe                             E...            
Contents of section .data:
 0000 0a000000 14000000 28000000           ........(...    
Contents of section .comment:
 0000 00474343 3a202855 62756e74 7520372e  .GCC: (Ubuntu 7.
 0010 352e302d 33756275 6e747531 7e31382e  5.0-3ubuntu1~18.
 0020 30342920 372e352e 3000               04) 7.5.0.      
Contents of section .eh_frame:
 0000 14000000 00000000 017a5200 017c0801  .........zR..|..
 0010 1b0c0404 88010000 20000000 1c000000  ........ .......
 0020 00000000 44000000 00440c01 00471005  ....D....D...G..
 0030 02750043 0f03757c 06000000           .u.C..u|....
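
The .data bytes in this dump are three little-endian 32-bit integers. A minimal sketch of decoding them follows; read_le32 and data_section are hypothetical names, and the byte values are copied from the objdump -s output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads one 32-bit little-endian value starting at bytes[0]. */
uint32_t read_le32(const unsigned char *bytes)
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* The 12 bytes of .data exactly as printed by objdump -s. */
static const unsigned char data_section[12] = {
    0x0a, 0x00, 0x00, 0x00,   /* offset 0: 10 */
    0x14, 0x00, 0x00, 0x00,   /* offset 4: 20 */
    0x28, 0x00, 0x00, 0x00    /* offset 8: 40 */
};
```

The decoded values 10, 20, and 40 are the initializers of a3, b3, and d3 in main.c, the only variables with non-zero initial values that have static storage.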
4. Read the symbol table of the .o file
~ $ objdump -t main.o
main.o:     file format elf32-little

SYMBOL TABLE:
00000000 l    df *ABS*	00000000 main.c
00000000 l    d  .text	00000000 .text
00000000 l    d  .data	00000000 .data
00000000 l    d  .bss	00000000 .bss
00000004 l     O .bss	00000004 b1
00000008 l     O .bss	00000004 b2
00000004 l     O .data	00000004 b3
00000008 l     O .data	00000004 d3.1881
0000000c l     O .bss	00000004 d2.1880
00000010 l     O .bss	00000004 d1.1879
00000000 l    d  .note.GNU-stack	00000000 .note.GNU-stack
00000000 l    d  .eh_frame	00000000 .eh_frame
00000000 l    d  .comment	00000000 .comment
00000004       O *COM*	00000004 a1
00000000 g     O .bss	00000004 a2
00000000 g     O .data	00000004 a3
00000000 g     F .text	00000044 main
00000000         *UND*	00000000 data
00000000         *UND*	00000000 add

The table shows which section each symbol belongs to and how many bytes it occupies. a1 is marked *COM*, which indicates a weak symbol: an uninitialized non-static global variable for which other files may define a variable of the same name.

The two symbols data and add are marked *UND*, meaning they are undefined: their definitions cannot be found in this file and will be looked up in the other files at link time.
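
The *UND*, *ABS*, and *COM* markers correspond to reserved values of the section index stored in each symbol table entry. The sketch below classifies a symbol by that index; the enum and function names are hypothetical, while the numeric values (0, 0xfff1, 0xfff2) are the ones the ELF specification assigns to SHN_UNDEF, SHN_ABS, and SHN_COMMON.

```c
#include <assert.h>
#include <stdint.h>

/* Reserved section indexes defined by the ELF specification. */
enum { SEC_UNDEF = 0, SEC_ABS = 0xfff1, SEC_COMMON = 0xfff2 };

enum symbol_home {
    HOME_UNDEFINED,   /* shown as *UND*: defined in another object file */
    HOME_ABSOLUTE,    /* shown as *ABS*: e.g. the main.c file symbol */
    HOME_COMMON,      /* shown as *COM*: placed by the linker later */
    HOME_SECTION      /* lives in a real section such as .data or .bss */
};

enum symbol_home classify_shndx(uint16_t shndx)
{
    switch (shndx) {
    case SEC_UNDEF:  return HOME_UNDEFINED;
    case SEC_ABS:    return HOME_ABSOLUTE;
    case SEC_COMMON: return HOME_COMMON;
    default:
        /* Other reserved indexes are not handled in this sketch. */
        assert(shndx < 0xff00);
        return HOME_SECTION;
    }
}
```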

5. Based on the section header information, draw the layout of the binary relocatable file (.o file)

Notice that the .bss section and the .comment section start at the same file offset (0x84). The arithmetic shows that the contents of .bss are not stored in the .o file at all, although the variables in .bss are still recorded in the symbol table.

In conclusion, the .bss section holds global variables and static local variables that are uninitialized or initialized to 0. Since their initial values are all 0, their contents do not need to be stored in the .o file, which saves space; they only need to be recorded in the symbol table, and the section header records the total size. When the executable is finally loaded and run, the .bss variables are given zero-filled storage in the virtual address space.
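
The offset arithmetic can be sketched as follows. The struct and function names are hypothetical; the type codes 1 and 8 are the values the ELF specification assigns to SHT_PROGBITS and SHT_NOBITS, and the offsets and sizes in the usage note come from the readelf -S listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TYPE_PROGBITS = 1, TYPE_NOBITS = 8 };   /* SHT_PROGBITS, SHT_NOBITS */

struct section_info {
    uint32_t type;     /* section type from the section header */
    uint32_t offset;   /* file offset from the section header */
    uint32_t size;     /* size the section needs in memory */
};

/* Bytes the section actually occupies inside the file. */
uint32_t bytes_in_file(const struct section_info *s)
{
    assert(s != NULL);
    return s->type == TYPE_NOBITS ? 0 : s->size;
}

/* File offset at which the next section's contents may begin. */
uint32_t next_free_offset(const struct section_info *s)
{
    return s->offset + bytes_in_file(s);
}
```

For .data (offset 0x78, size 0x0c) the next free offset is 0x84, where .bss nominally starts. For .bss (NOBITS, offset 0x84, size 0x14) it is still 0x84, which is why .comment begins at the same offset.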

2. Link process:


Commands for compiling and linking on a 64-bit x86 machine to produce 32-bit object files and a 32-bit executable

Compile:
	gcc -m32 -fno-PIC -c *.c
Link manually:
    ld -e main -melf_i386 *.o -o run
    
The following files are generated:
    $ ls
	add.c  add.o  main.c  main.o  run

PS:

-m32 generates 32-bit output;

-fno-PIC disables position-independent code, so the object files do not contain the extra sections that position-independent code generation adds (leaving only .text, .data, .bss, .comment, etc.)

-e specifies the program entry point; follow -e with a symbol name. Any function can serve as the entry, e.g. -e add makes the add function the entry point

-melf_i386 tells the linker to produce a 32-bit x86 executable


In essence, linking "glues" multiple object files together. What actually gets stitched together are the address references between object files, that is, references to function names and global variables.

The symbol table is a section of the .o file named .symtab. The following commands display it:

readelf -s main.o

objdump -t main.o

nm main.o

The symbol table contains the following kinds of entries; items 1 and 2 are the main focus:

    1. Global symbols defined in this object file, such as variable names and function names.
    2. Symbols referenced in this object file but not defined in it; these are generally called external symbols.
    3. Section names, such as ".text" and ".data".
    4. Local symbols, which are visible only inside the compilation unit. A debugger can use them to analyze the program or a core dump after a crash; the linker usually ignores them during linking.
$ objdump -t main.o

main.o:     file format elf32-i386

SYMBOL TABLE:
00000000 l    df *ABS*	00000000 main.c
00000000 l    d  .text	00000000 .text
00000000 l    d  .data	00000000 .data
00000000 l    d  .bss	00000000 .bss
00000004 l     O .bss	00000004 b1
00000008 l     O .bss	00000004 b2
00000004 l     O .data	00000004 b3
00000008 l     O .data	00000004 d3.1877
0000000c l     O .bss	00000004 d2.1876
00000010 l     O .bss	00000004 d1.1875
00000000 l    d  .note.GNU-stack	00000000 .note.GNU-stack
00000000 l    d  .eh_frame	00000000 .eh_frame
00000000 l    d  .comment	00000000 .comment
00000004       O *COM*	00000004 a1
00000000 g     O .bss	00000004 a2
00000000 g     O .data	00000004 a3
00000000 g     F .text	00000016 main
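
Each row of the symbol table is a fixed-size record. The sketch below models an ELF32 symbol entry and decodes the binding (the l/g column) and the type (the O/F column) from its info byte. The struct and helper names are hypothetical; the field layout and the numeric codes follow the ELF specification.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the layout of one ELF32 symbol table entry (16 bytes). */
struct sym32 {
    uint32_t name;    /* offset of the name in .strtab */
    uint32_t value;   /* offset within its section (in a .o file) */
    uint32_t size;    /* object or function size in bytes */
    uint8_t  info;    /* binding in the high nibble, type in the low nibble */
    uint8_t  other;
    uint16_t shndx;   /* index of the section the symbol lives in */
};

enum { SYM_LOCAL = 0, SYM_GLOBAL = 1, SYM_WEAK = 2 };   /* STB_* values */
enum { SYM_NOTYPE = 0, SYM_OBJECT = 1, SYM_FUNC = 2,
       SYM_SECTION = 3, SYM_FILE = 4 };                 /* STT_* values */

unsigned sym_bind(uint8_t info) { return info >> 4; }
unsigned sym_type(uint8_t info) { return info & 0x0f; }

/* Number of entries in a symbol table of the given byte size. */
uint32_t sym_count(uint32_t symtab_bytes)
{
    assert(symtab_bytes % sizeof(struct sym32) == 0);
    return symtab_bytes / (uint32_t)sizeof(struct sym32);
}
```

For example, the .symtab row of the earlier readelf -S listing reports size 0x140 with entry size 0x10, which gives 20 entries including the reserved null entry at index 0. An info byte of 0x12 decodes to a global function, the combination shown for main.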
1. Merge the sections of all .o files

[Figure: merging the sections of all .o files]

As shown in the figure above, the .text sections are merged with each other, as are the .data sections and the .bss sections. During merging, weak symbols must be turned into strong symbols (or be replaced by a strong symbol of the same name), which is why the size of the .bss section increases.

After linking, each section of the generated executable that is loaded at run time is assigned a virtual memory address.
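
A minimal sketch of the size arithmetic behind section merging follows. It simply appends each input section after the previous one, honoring the alignment; align_up and merged_size are hypothetical helpers, not the linker's actual algorithm. The sizes for add.o are assumptions, since the article does not list them: 0x04 for its .data (one int) and 0x0d for its .text, chosen to be consistent with the readelf -S run output shown later.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds value up to the next multiple of align (a power of two). */
uint32_t align_up(uint32_t value, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Size of an output section built by appending the inputs in order. */
uint32_t merged_size(const uint32_t *sizes, int count, uint32_t align)
{
    uint32_t total = 0;
    for (int i = 0; i < count; i++)
        total = align_up(total, align) + sizes[i];
    return total;
}
```

Under these assumptions, .data becomes 0x04 + 0x0c = 0x10 and .text becomes 0x0d + 0x44 = 0x51. For .bss, main.o contributes 0x14 and the common symbol a1 adds 4 more bytes, giving 0x18. All three results match the section sizes of run.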

2. Merge symbol tables, resolve symbols, and relocate

  • Merge symbol tables

    The symbol table of the executable is essentially the combination of the symbol tables of the individual .o files.

  • Symbol resolution

    Convert weak symbols (*COM*) into strong symbols

    Resolve symbols that are undefined in this file (*UND*) by finding their definitions in the other files

  • Relocation

    Assign a virtual memory address to each symbol. A symbol's address is the address of its section plus the symbol's offset within that section. The instructions and data that reference the symbol are then patched with that address.
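
The two relocation steps can be sketched as follows. symbol_address is the "section address plus offset" rule from the text. reloc_pc32 applies the formula S + A - P that the i386 ABI defines for PC-relative 32-bit relocations such as the operand of a call instruction. Both function names are hypothetical, and the addresses in the usage note are partly assumptions: main at 0x080480a1 comes from the entry point, but placing add at the start of .text (0x08048094) is inferred rather than shown in the article.

```c
#include <assert.h>
#include <stdint.h>

/* Final address = address of the output section + offset inside it. */
uint32_t symbol_address(uint32_t section_base, uint32_t offset)
{
    assert(offset <= UINT32_MAX - section_base);   /* no wrap-around */
    return section_base + offset;
}

/* Value stored for a PC-relative 32-bit relocation: S + A - P.
 * s = address of the target symbol,
 * a = addend already present in the field being patched,
 * p = address of the field being patched. */
uint32_t reloc_pc32(uint32_t s, int32_t a, uint32_t p)
{
    return s + (uint32_t)a - p;   /* unsigned arithmetic wraps modulo 2^32 */
}
```

In the .text dump of main.o, the bytes e8 fcffffff are a call whose operand sits at offset 0x38 and holds the addend -4. With the assumed addresses, P is 0x080480a1 + 0x38 = 0x080480d9, so the patched operand would be 0xffffffb7, that is, 0x49 bytes back from the instruction after the call.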

Executable file analysis

1. View the file header
$ readelf -h run
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x80480a1
  Start of program headers:          52 (bytes into file)
  Start of section headers:          4676 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         9
  Section header string table index: 8

Entry point address: 0x80480a1.

2. View section information
$ readelf -S run
There are 9 section headers, starting at offset 0x1244:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        08048094 000094 000051 00  AX  0   0  1
  [ 2] .eh_frame         PROGBITS        080480e8 0000e8 00005c 00   A  0   0  4
  [ 3] .data             PROGBITS        0804a000 001000 000010 00  WA  0   0  4
  [ 4] .bss              NOBITS          0804a010 001010 000018 00  WA  0   0  4
  [ 5] .comment          PROGBITS        00000000 001010 000029 01  MS  0   0  1
  [ 6] .symtab           SYMTAB          00000000 00103c 000170 10      7  14  4
  [ 7] .strtab           STRTAB          00000000 0011ac 000059 00      0   0  1
  [ 8] .shstrtab         STRTAB          00000000 001205 00003f 00      0   0  1

Each section that is loaded at run time (.text, .eh_frame, .data, .bss) now has a virtual address.

3. View program headers
$ readelf -l run

Elf file type is EXEC (Executable file)
Entry point 0x80480a1
There are 3 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0x00144 0x00144 R E 0x1000
  LOAD           0x001000 0x0804a000 0x0804a000 0x00010 0x00028 RW  0x1000
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10

 Section to Segment mapping:
  Segment Sections...
   00     .text .eh_frame 
   01     .data .bss 
   02

Binary relocatable files only have section headers; program headers appear in files that are meant to be loaded, such as executables. The program headers show the virtual address and alignment of each segment (0x1000 here, because one page is 4K).

Sections are merged into segments according to their attributes: read-only and executable (.text + .rodata), readable and writable (.data + .bss), etc.

Use readelf -l run to view the segments of the ELF file (used for loading).
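
The relationship between FileSiz and MemSiz in the second LOAD segment can be checked with a short sketch. The struct mirrors the eight 32-bit fields of an ELF32 program header (32 bytes, matching the "Size of program headers" line); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the eight 32-bit fields of an ELF32 program header. */
struct phdr32 {
    uint32_t type, offset, vaddr, paddr;
    uint32_t filesz, memsz, flags, align;
};

/* Bytes that must be zero-filled after the file-backed part (the .bss). */
uint32_t zero_fill_bytes(const struct phdr32 *ph)
{
    assert(ph != NULL && ph->memsz >= ph->filesz);
    return ph->memsz - ph->filesz;
}

/* A loadable segment's file offset and address agree modulo its alignment. */
int is_congruent(const struct phdr32 *ph)
{
    assert(ph != NULL && ph->align != 0);
    return ph->offset % ph->align == ph->vaddr % ph->align;
}
```

For the RW segment, MemSiz 0x28 minus FileSiz 0x10 leaves 0x18 bytes, exactly the size of .bss: the file supplies the 0x10 bytes of .data and the rest exists only in memory.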

PS: Because we linked the program ourselves and did not link the C library, the executable contains relatively few sections.

 * If you run gcc main.c -o main directly, the C library is linked by default, and the executable contains many more sections.

 * Executable files are loaded into a process by execve

 * An executable file can run because it specifies an entry address (here, main) and program headers (which specify the virtual addresses to load at)

 * The structure that describes a segment is called a program header; it tells the operating system how the ELF file should be mapped into the virtual address space of the process.
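
As a rough illustration of that mapping, the sketch below computes the page range a LOAD segment would cover and renders the permission bits the way readelf prints them. This is not the kernel's loader; the function names are hypothetical, and the flag values 1, 2, and 4 are the ones the ELF specification assigns to PF_X, PF_W, and PF_R.

```c
#include <assert.h>
#include <stdint.h>

enum { FLAG_X = 0x1, FLAG_W = 0x2, FLAG_R = 0x4 };   /* PF_X, PF_W, PF_R */

/* First address of the page range covering the segment. */
uint32_t map_start(uint32_t vaddr, uint32_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return vaddr & ~(page - 1);
}

/* One past the last address of the page range covering the segment. */
uint32_t map_end(uint32_t vaddr, uint32_t memsz, uint32_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (vaddr + memsz + page - 1) & ~(page - 1);
}

/* Writes e.g. "R E" or "RW " into out, mirroring readelf's Flg column. */
void flags_to_text(uint32_t flags, char out[4])
{
    out[0] = (flags & FLAG_R) ? 'R' : ' ';
    out[1] = (flags & FLAG_W) ? 'W' : ' ';
    out[2] = (flags & FLAG_X) ? 'E' : ' ';
    out[3] = '\0';
}
```

With 4K pages, the first LOAD segment of run (0x08048000, MemSiz 0x144) fits in the single page ending at 0x08049000, and the second (0x0804a000, MemSiz 0x28) in the page ending at 0x0804b000.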

Origin blog.csdn.net/HuangChen666/article/details/133493602