Compiling and linking: how is an executable file produced?

We write a C source file, but what happens between the source file and the executable program? What do compiling and linking actually mean? Curious about these questions, I looked up some material. The main reference is the book "Programmer's Self-Cultivation", together with some online blog posts.

On Windows you often only need to click Run or Debug to run a C program. This convenience hides the complexity of the mechanisms underneath, and I wanted to know what really happens behind the scenes.

The system used here is Ubuntu; the concepts also apply to Windows.

1. Compiling the source file: four stages

Suppose we write a very simple helloworld.c program:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello,World!\n");
    return 0;
}

We all know that running the command

gcc helloworld.c -o helloworld

compiles this file and produces an executable named helloworld. Running it then gives

./helloworld
Hello,World!

The file executes as expected, but what actually happened behind the scenes?

Note:

This article is not a rigorous treatment of compilation; it is just my own attempt to sort out how the process works.

1.1 Preprocessing

The preprocessing stage can be understood simply as handling the directives that begin with "#", for example:

#define, #include, #if, #elif, #else, #endif

The preprocessor handles each directive according to its meaning: macros defined with #define are expanded, and each #include is replaced by the entire contents of the included file.

You can run the command

gcc -E helloworld.c -o helloworld.i

to obtain the preprocessed file. Inspecting it shows that the preprocessor really has pulled in the contents of the file named by #include. The file also contains line-number information, so that errors reported later can be traced back to their location in the source.
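As a small illustration of what the preprocessor does, the sketch below uses one macro and one conditional block. The names GREETING, VERBOSE and greeting_length are made up for this example and are not part of helloworld.c. Running gcc -E on it should show GREETING replaced by the string literal and only one branch of the #if kept.

```c
#include <string.h>

/* Hypothetical macros, not part of helloworld.c. */
#define GREETING "Hello,World!"
#define VERBOSE 0

/* After preprocessing, GREETING below is already the literal "Hello,World!". */
size_t greeting_length(void)
{
#if VERBOSE
    /* This branch is discarded by the preprocessor because VERBOSE is 0. */
    return strlen("verbose: " GREETING);
#else
    return strlen(GREETING);
#endif
}
```

The file has no main, so it can be preprocessed with gcc -E or compiled with gcc -c on its own.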

1.2 Compilation

This step takes the *.i file produced by the previous step and compiles it into assembly code. You can run the command

gcc -S helloworld.i -o helloworld.s

to obtain the assembly file, part of which looks like this:

main:
    ...
    leaq    .LC0(%rip), %rcx
    call    puts
    ...

In main we only called printf, and the listing contains the corresponding function call (emitted here as a call to puts), so we can see that this stage generates an assembly file.
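The call to puts in the listing comes from a common GCC optimization: a printf whose format is a plain string ending in a newline, and whose return value is unused, can be emitted as puts. The sketch below shows the two forms side by side. The function names are my own, and whether a given compiler version performs the rewrite is an assumption you can check with gcc -S.

```c
#include <stdio.h>

/* A format string with no conversions that ends in '\n', result unused:
 * GCC may compile this call as puts("Hello,World!"). */
void greet_with_printf(void)
{
    printf("Hello,World!\n");
}

/* The equivalent call written by hand; puts appends the newline itself. */
void greet_with_puts(void)
{
    puts("Hello,World!");
}
```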

1.3 Assembly

This step assembles the assembly code from the previous step into machine code for the specific machine. You can run the command

gcc -c helloworld.s -o helloworld.o

The generated helloworld.o is called an object file. Let's examine the object file to help understand the linking process.

1.3.1 The structure of the object file

The previous step produced an object file, but it has not been linked yet, so a few symbols in it cannot yet be resolved. For example, we cannot tell where the actual definition of printf lives: the header file stdio.h only gives us its declaration, so we know how to call it, but at run time the actual code is needed. Where do we find it? Locating printf and writing its address into our program is the job of linking.

The file types we commonly deal with are

  1. Executable files (Executable File), such as .exe files on Windows or /bin/bash on Linux
  2. Shared object files (Shared Object File), such as .dll files on Windows or .so files on Linux
  3. Relocatable files (Relocatable File), such as the object file we generated above. "Relocatable" means that the addresses of some symbols in the program (function and variable names) have not been determined yet and must be relocated at the link stage

On Linux you can use the file command to see the specific format of a file. Let's run

$ file helloworld.o
helloworld.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
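The word "relocatable" printed by file comes from the e_type field in the ELF header. A minimal sketch of that mapping follows. The numeric values are the ones defined by the ELF specification, while the enum and function names are invented here rather than taken from <elf.h>.

```c
#include <stdint.h>

/* e_type values from the ELF specification (named ET_REL, ET_EXEC, ET_DYN there). */
enum elf_file_type {
    FILE_RELOCATABLE = 1,  /* .o object files           */
    FILE_EXECUTABLE  = 2,  /* fixed-address executables */
    FILE_SHARED      = 3   /* .so shared objects        */
};

/* Map an e_type value to a short description, similar in spirit to `file`. */
const char *describe_elf_type(uint16_t e_type)
{
    switch (e_type) {
    case FILE_RELOCATABLE: return "relocatable";
    case FILE_EXECUTABLE:  return "executable";
    case FILE_SHARED:      return "shared object";
    default:               return "other";
    }
}
```

Position-independent executables, which GCC on recent Ubuntu releases produces by default, also carry type 3, so this mapping is only a first approximation.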

So what exactly does an object file contain? First of all code, then data (the variables we define). In addition, the file contains a symbol table, which is the most important element for the linking discussed below.

Run the command

$ readelf -S helloworld.o

to see the section header table of the object file. For details about the section table, see the book "Programmer's Self-Cultivation".

There are 13 section headers, starting at offset 0x2d8:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       0000000000000022  0000000000000000  AX       0     0     1
  [ 2] .rela.text        RELA             0000000000000000  00000228
       0000000000000030  0000000000000018   I      10     1     8
  [ 3] .data             PROGBITS         0000000000000000  00000062
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .bss              NOBITS           0000000000000000  00000062
       0000000000000000  0000000000000000  WA       0     0     1
  [ 5] .rodata           PROGBITS         0000000000000000  00000062
       000000000000000d  0000000000000000   A       0     0     1
  [ 6] .comment          PROGBITS         0000000000000000  0000006f
       000000000000002c  0000000000000001  MS       0     0     1
  [ 7] .note.GNU-stack   PROGBITS         0000000000000000  0000009b
       0000000000000000  0000000000000000           0     0     1
  [ 8] .eh_frame         PROGBITS         0000000000000000  000000a0
       0000000000000038  0000000000000000   A       0     0     8
  [ 9] .rela.eh_frame    RELA             0000000000000000  00000258
       0000000000000018  0000000000000018   I      10     8     8
  [10] .symtab           SYMTAB           0000000000000000  000000d8
       0000000000000120  0000000000000018          11     9     8
  [11] .strtab           STRTAB           0000000000000000  000001f8
       000000000000002e  0000000000000000           0     0     1
  [12] .shstrtab         STRTAB           0000000000000000  00000270
       0000000000000061  0000000000000000           0     0     1

The entry we care about in the section table above is number 2: .rela.text, the relocation table. As mentioned earlier, the link stage has to relocate some of the symbols in a relocatable file, so we need to know which symbols require relocation, and .rela.text is where they are recorded.

The symbol table contains several kinds of symbols:

  1. Symbols defined in this file that can be referenced by other object files
  2. Symbols referenced in this file but not defined in it
  3. ...

Let's run the command

$ nm helloworld.o
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 T main
                 U puts

to see the symbol table of our object file. Besides _GLOBAL_OFFSET_TABLE_, we can see the two symbols main and puts. printf does not appear, most likely because the compiler replaced the call with puts during compilation.

Let's run another command to view the detailed symbol table:

$ readelf -s helloworld.o
Symbol table '.symtab' contains 12 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     ......
     9: 0000000000000000    34 FUNC    GLOBAL DEFAULT    1 main
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND puts

We see the two familiar symbols. Since main is defined in this file, its type is FUNC (function), and Ndx=1 places it in section 1, which is .text, the code section. puts is undefined, so its Ndx is UND (undefined). From the symbol table we can therefore tell which symbols are defined in this file and which ones need to be relocated.
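The Type, Bind and Ndx columns printed by readelf -s are decoded from fields of each symbol table entry. The following sketch shows that decoding for the st_info byte and the section index. The bit layout and constant values follow the ELF specification; the macro and helper names are hypothetical.

```c
#include <stdint.h>

/* Constants from the ELF specification. */
#define SYM_BIND_GLOBAL 1   /* STB_GLOBAL */
#define SYM_TYPE_NOTYPE 0   /* STT_NOTYPE */
#define SYM_TYPE_FUNC   2   /* STT_FUNC   */
#define SECTION_UNDEF   0   /* SHN_UNDEF, shown as UND by readelf */

/* The high four bits of st_info hold the binding, the low four bits the type. */
unsigned sym_bind(uint8_t st_info) { return st_info >> 4; }
unsigned sym_type(uint8_t st_info) { return st_info & 0x0f; }

/* A symbol whose section index is SHN_UNDEF must be supplied by another file. */
int sym_is_undefined(uint16_t st_shndx) { return st_shndx == SECTION_UNDEF; }
```

For main in the table above, st_info would be 0x12 (GLOBAL, FUNC) with section index 1; for puts it would be 0x10 (GLOBAL, NOTYPE) with section index 0.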

Now that we know the symbol table exists, let's look at the linking process in detail.

Suppose we have two files, a.c and b.c. The example comes from "Programmer's Self-Cultivation".

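/* a.c */
extern int shared;
/* swap is called without a prototype, as in the book; newer GCC and Clang
   releases reject this by default unless you first declare
   void swap(int *a, int *b); */
int main(){
    int a=100;
    swap(&a, &shared);
    return 0;
}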

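/* b.c */
int shared = 1; /* global by default, so other object files can reference it */

void swap(int *a, int *b){
    /* XOR swap as in the book; chaining it in one expression modifies *a
       twice without sequencing, so the C standard does not guarantee the result */
    *a ^= *b ^= *a ^= *b;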
}
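For readers who want to experiment with a variant that compiles cleanly on current compilers, here is a sketch of b.c that swaps through a temporary variable instead of the chained XOR, together with the prototype that a.c would need. This is a substitution of mine, not the book's code, and it will not reproduce the symbol sizes shown in the listings that follow.

```c
/* Prototype that a.c would need so the compiler can check the call. */
void swap(int *a, int *b);

int shared = 1;   /* global with external linkage, visible to other object files */

/* Exchange two ints through a temporary; also safe when a and b alias. */
void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```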

First, compile the two files with gcc

$ gcc -c a.c b.c

This gives us two files, a.o and b.o. Let's look at the symbol table of each

$ readelf -s a.o
Symbol table '.symtab' contains 13 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     ......
     8: 0000000000000000    81 FUNC    GLOBAL DEFAULT    1 main
     9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND shared
    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND swap
    
$ readelf -s b.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     ......
     8: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    2 shared
     9: 0000000000000000    75 FUNC    GLOBAL DEFAULT    1 swap

So a.o defines only one global symbol, main, while shared and swap are undefined there; in b.o, on the other hand, shared and swap are defined.

The link command we will use is

$ ld a.o b.o -e main -o ab
  • -e main sets main as the program's entry point
  • -o specifies the output file name

Then compare the addresses before and after address allocation

$ objdump -h a.o
a.o:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000051  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000091  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  ......

$ objdump -h b.o
b.o:     file format elf64-x86-64
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         0000004b  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000004  0000000000000000  0000000000000000  0000008c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  ......

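I tried running the command several times

$ ld a.o b.o -e main -o ab

but it fails with an error

a.o: In function `main':
a.c:(.text+0x4b): undefined reference to `__stack_chk_fail'

I was not sure why at first. The cause appears to be that GCC on Ubuntu enables stack protection by default, so main calls __stack_chk_fail, a function provided by the C library, which a bare ld invocation does not link in. Instead of working around that, I used the command

$ gcc a.o b.o -o ab

However, the generated file is not the same as the one in the book, as its section headers show

 Sections: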
 Idx Name          Size      VMA               LMA               File off  Algn
 ......
 13 .text         00000222  0000000000000560  0000000000000560  00000560  2**4
 ......
 22 .data         00000014  0000000000201000  0000000000201000  00001000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 23 .bss          00000004  0000000000201014  0000000000201014  00001014  2**0
                  ALLOC
 24 .comment      0000002b  0000000000000000  0000000000000000  00001014  2**0
                  CONTENTS, READONLY

Still, we can see that the VMA (virtual memory address) has now been assigned, whereas in a.o and b.o it was not.

What this step means is that linking merges the two object files into a single file in which every function has its own relative address, so each symbol can now be given a definite address.
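The address allocation step can be pictured with a toy calculation: place each input .text section one after another, starting from some base address. The sketch below does only that and ignores alignment padding and everything else a real linker handles; the function name is invented. Fed with the .text sizes of a.o and b.o (0x51 and 0x4b) and the address that main receives in the linked file (0x66a), it reproduces the address of swap.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of address allocation: sections are laid out back to back,
 * starting at `base`. Real linkers also honour alignment and other rules. */
void assign_addresses(uint64_t base, const uint64_t *sizes, size_t count,
                      uint64_t *addresses)
{
    uint64_t next = base;
    for (size_t i = 0; i < count; i++) {
        addresses[i] = next;   /* this section starts where the previous one ended */
        next += sizes[i];
    }
}
```

With base 0x66a and sizes {0x51, 0x4b}, the second address comes out as 0x6bb, which is where swap ends up in the symbol table of ab.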

Run the command

$ readelf -s ab

to view the symbol table. Only the relevant entries are listed here

Symbol table '.symtab' contains 66 entries:
    Num:    Value          Size Type    Bind   Vis      Ndx Name
    59: 000000000000066a    81 FUNC    GLOBAL DEFAULT   14 main

    62: 00000000000006bb    75 FUNC    GLOBAL DEFAULT   14 swap

    65: 0000000000201010     4 OBJECT  GLOBAL DEFAULT   23 shared

We can see that the relevant symbols have been given concrete addresses; in other words, the linking process is complete.

After the above process, we can run the following command to view the disassembly

$ objdump -d ab
 000000000000066a <main>:
 66a:   55                      push   %rbp
 66b:   48 89 e5                mov    %rsp,%rbp
 66e:   48 83 ec 10             sub    $0x10,%rsp
 672:   64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
 679:   00 00 
 67b:   48 89 45 f8             mov    %rax,-0x8(%rbp)
 67f:   31 c0                   xor    %eax,%eax
 681:   c7 45 f4 64 00 00 00    movl   $0x64,-0xc(%rbp)
 688:   48 8d 45 f4             lea    -0xc(%rbp),%rax
 68c:   48 8d 35 7d 09 20 00    lea    0x20097d(%rip),%rsi        # 201010 <shared>
 693:   48 89 c7                mov    %rax,%rdi
 696:   b8 00 00 00 00          mov    $0x0,%eax
 69b:   e8 1b 00 00 00          callq  6bb <swap> # <swap> 6bb
 6a0:   b8 00 00 00 00          mov    $0x0,%eax
 6a5:   48 8b 55 f8             mov    -0x8(%rbp),%rdx
 6a9:   64 48 33 14 25 28 00    xor    %fs:0x28,%rdx
 6b0:   00 00 
 6b2:   74 05                   je     6b9 <main+0x4f>
 6b4:   e8 87 fe ff ff          callq  540 <__stack_chk_fail@plt>
 6b9:   c9                      leaveq 
 6ba:   c3                      retq  

Notice that the addresses of swap and of the variable shared have been filled into the program correctly. For comparison, here is the program before linking

$ objdump -d a.o
a.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
   f:   00 00 
  11:   48 89 45 f8             mov    %rax,-0x8(%rbp)
  15:   31 c0                   xor    %eax,%eax
  17:   c7 45 f4 64 00 00 00    movl   $0x64,-0xc(%rbp)
  1e:   48 8d 45 f4             lea    -0xc(%rbp),%rax
  22:   48 8d 35 00 00 00 00    lea    0x0(%rip),%rsi        # 29 <main+0x29>
  29:   48 89 c7                mov    %rax,%rdi
  2c:   b8 00 00 00 00          mov    $0x0,%eax
  31:   e8 00 00 00 00          callq  36 <main+0x36>
  36:   b8 00 00 00 00          mov    $0x0,%eax
  3b:   48 8b 55 f8             mov    -0x8(%rbp),%rdx
  3f:   64 48 33 14 25 28 00    xor    %fs:0x28,%rdx
  46:   00 00 
  48:   74 05                   je     4f <main+0x4f>
  4a:   e8 00 00 00 00          callq  4f <main+0x4f>
  4f:   c9                      leaveq 
  50:   c3                      retq  

Note that the instructions at offsets 22 and 31 correspond to the reference to shared and the call to swap respectively. The second column shows the bytes of each instruction in hexadecimal, and the last four bytes of these two instructions hold the address operand. Here they are all 0, which shows that in a.o the actual addresses cannot be determined yet: the compiler temporarily fills in the placeholder 0x0, and the correct addresses are written only at the final link stage.

We can also run the command

$ objdump -r a.o
a.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000025 R_X86_64_PC32     shared-0x0000000000000004
0000000000000032 R_X86_64_PLT32    swap-0x0000000000000004
000000000000004b R_X86_64_PLT32    __stack_chk_fail-0x0000000000000004

Here OFFSET gives the position that needs to be relocated. Offsets 0x25 and 0x32 are the four-byte operand fields of the instructions at 0x22 and 0x31.
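The values the linker writes at these offsets can be reproduced by hand. For PC-relative relocations, the x86-64 ABI gives the stored value as S + A - P, where S is the address of the symbol, A is the addend (-4 in the records above) and P is the address of the field being patched. The helper below is a sketch with a hypothetical name. It also assumes that the PLT32 entries are resolved in the same way, taking the call target shown in the disassembly of ab as S.

```c
#include <stdint.h>

/* Value stored for a PC-relative 32-bit relocation: S + A - P.
 * S: address of the symbol (or call target)
 * A: addend from the relocation record
 * P: address of the 4-byte field being patched */
int32_t pc_relative_value(uint64_t S, int64_t A, uint64_t P)
{
    return (int32_t)((int64_t)S + A - (int64_t)P);
}
```

With main at 0x66a, the three records give 0x20097d, 0x1b and -0x179 (bytes 87 fe ff ff), the same operands that appear in the disassembly of ab.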

2. Summary

The book "Programmer's Self-Cultivation" explores these details in much greater depth; fully understanding and mastering all of it is quite hard.

Here I mainly want to summarize the part about linking. Roughly, the process is:

  1. The linker receives the input files
  2. It collects the section table of each input file and builds a global symbol table that contains all defined symbols
  3. For static linking, it merges the input files and allocates address space; after this step the addresses of all symbols are determined
  4. It then relocates every symbol in the input files that needs relocation to the correct address
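Step 2, and the failure of the bare ld command earlier, can both be pictured with a toy symbol resolver. The sketch below is a simplification with invented types and names: it takes the symbols of all input files and counts the references that no file defines.

```c
#include <stddef.h>
#include <string.h>

/* One symbol as seen in an input file: a name and whether the file defines it. */
struct toy_symbol {
    const char *name;
    int defined;      /* 1 if defined in its file, 0 if only referenced (UND) */
};

/* Return 1 if some entry defines `name`. */
static int is_defined(const struct toy_symbol *syms, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (syms[i].defined && strcmp(syms[i].name, name) == 0)
            return 1;
    return 0;
}

/* Count references that no input file satisfies; a real linker would report
 * each of these as an "undefined reference" error. */
size_t count_unresolved(const struct toy_symbol *syms, size_t count)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < count; i++)
        if (!syms[i].defined && !is_defined(syms, count, syms[i].name))
            unresolved++;
    return unresolved;
}
```

Given main, shared and swap from a.o and b.o, the count is 0; adding the undefined __stack_chk_fail reference from a.o makes it 1, which mirrors the error reported by ld.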

Origin www.cnblogs.com/cporoske/p/11653999.html