C++ from source code to executable (preprocessing, compilation, assembly, linking), using Linux as a reference

Refer to Axiu's study notes

test code

#include <iostream>
using namespace std;
#define PI 3.14
int main() {
    // test code
    cout << PI << endl;
    cout << "hello world" << endl;
    return 0;
}

preprocessing

processing content

  • Remove all "#define" directives and expand all macros.
  • Process all conditional compilation directives such as "#if", "#ifdef", "#elif", "#else" and "#endif".
  • Process each "#include" directive by inserting the contents of the included file in its place. This is recursive, because an included file may itself include other files.
  • Remove all comments, both "//" and "/* */".
  • Keep all #pragma directives, because the compiler needs them. For example, #pragma once prevents a header from being included more than once.
  • Add line markers with line numbers and file names (such as # 2 "main.cpp"), so that the compiler can generate line number information for debugging and can report line numbers in errors and warnings.

example

g++ main.cpp -E -o main.i

  • -E: stop after preprocessing
  • -o: specifies the output file
......
# 2 "main.cpp" 2

# 2 "main.cpp"
using namespace std;

int main(){
    
    

 cout<<3.14<<endl;
 cout<<"hello world"<<endl;
 return 0;
}

compile

Perform lexical analysis, syntax analysis, semantic analysis and optimization on the xxx.i or xxx.ii file produced by preprocessing, and generate the corresponding assembly code file.

  • Lexical analysis: The source code is fed into the scanner, which uses an algorithm similar to a finite state machine to split the character sequence into a series of tokens.
  • Syntax analysis: The parser analyzes the tokens produced by the scanner and builds a syntax tree, that is, a tree whose nodes are expressions.
  • Semantic analysis: The parser only checks expressions at the grammatical level; the semantic analyzer judges whether an expression is meaningful. What is analyzed here is static semantics, which can be determined at compile time, as opposed to dynamic semantics, which can only be determined at run time.
  • Optimization: A source-level optimization pass. The optimizer typically lowers the syntax tree to intermediate code and optimizes that, for example by evaluating constant expressions at compile time.
  • Target code generation: The code generator converts the intermediate code into target machine code, expressed as a sequence of assembly language instructions.
  • Target code optimization: The target code optimizer improves that machine code: it picks suitable addressing modes, replaces multiplications with shifts, removes redundant instructions, and so on.

example

g++ main.i -S -o main.s

  • -S: stop after compilation and emit assembly code, without assembling
        .file   "main.cpp"
        .text
        .section        .rodata
        .type   _ZStL19piecewise_construct, @object
        .size   _ZStL19piecewise_construct, 1
_ZStL19piecewise_construct:
        .zero   1
        .local  _ZStL8__ioinit
        .comm   _ZStL8__ioinit,1,1
.LC1:
        .string "hello world"
        .text
        .globl  main
        .type   main, @function
main:
.LFB1493:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movq    .LC0(%rip), %rax
        movq    %rax, -8(%rbp)
        movsd   -8(%rbp), %xmm0
        leaq    _ZSt4cout(%rip), %rdi
        call    _ZNSolsEd@PLT
        movq    %rax, %rdx
        movq    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_@GOTPCREL(%rip), %rax
        movq    %rax, %rsi
        movq    %rdx, %rdi
        call    _ZNSolsEPFRSoS_E@PLT
        leaq    .LC1(%rip), %rsi
        leaq    _ZSt4cout(%rip), %rdi
        call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@PLT
        movq    %rax, %rdx
        movq    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_@GOTPCREL(%rip), %rax
        movq    %rax, %rsi
        movq    %rdx, %rdi
        call    _ZNSolsEPFRSoS_E@PLT
        movl    $0, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE1493:
        .size   main, .-main
        .type   _Z41__static_initialization_and_destruction_0ii, @function
_Z41__static_initialization_and_destruction_0ii:
.LFB1983:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movl    %edi, -4(%rbp)
        movl    %esi, -8(%rbp)
        cmpl    $1, -4(%rbp)
        jne     .L5
        cmpl    $65535, -8(%rbp)
        jne     .L5
        leaq    _ZStL8__ioinit(%rip), %rdi
        call    _ZNSt8ios_base4InitC1Ev@PLT
        leaq    __dso_handle(%rip), %rdx
        leaq    _ZStL8__ioinit(%rip), %rsi
        movq    _ZNSt8ios_base4InitD1Ev@GOTPCREL(%rip), %rax
        movq    %rax, %rdi
        call    __cxa_atexit@PLT
.L5:
        nop
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE1983:
        .size   _Z41__static_initialization_and_destruction_0ii, .-_Z41__static_initialization_and_destruction_0ii
        .type   _GLOBAL__sub_I_main, @function
_GLOBAL__sub_I_main:
.LFB1984:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $65535, %esi
        movl    $1, %edi
        call    _Z41__static_initialization_and_destruction_0ii
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE1984:
        .size   _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
        .section        .init_array,"aw"
        .align 8
        .quad   _GLOBAL__sub_I_main
        .section        .rodata
        .align 8
.LC0:
        .long   1374389535
        .long   1074339512
        .hidden __dso_handle
        .ident  "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
        .section        .note.GNU-stack,"",@progbits

assembly

Convert the assembly code into instructions the machine can execute (a machine code file). Assembling is simpler than compiling: there is no complicated syntax or semantics to analyze and no instruction optimization to perform. The assembler (as) simply translates instructions one by one according to the mapping between assembly instructions and machine instructions. After assembly, an object file is generated, in almost the same format as an executable file: xxx.o on Linux, xxx.obj on Windows.

g++ main.s -c -o main.o

  • -c: assemble only, without linking

Link

Combine the object files generated from different source files, together with the libraries they use, into an executable program.

static link

When an executable is linked statically, the linker copies the functions and data that the program uses out of the libraries and combines them with the program's other modules to create the final executable. The linker mainly completes the following two tasks:

  • Symbol resolution: associate each symbol reference with a symbol definition. Each symbol corresponds to a function, a global variable or a static variable.
  • Relocation: the linker assigns a memory location to each symbol definition, then modifies all references to those symbols so that they point to that location.
disadvantage

Wasted space: every executable carries its own copy of the object files it depends on, so if several programs depend on the same object file, each of them contains a separate copy of it.

advantage

Runs fast because everything needed to execute the program is already present in the executable.

dynamic link

Split the program into relatively independent modules and link them together at run time instead of when the executable is built. Shared libraries use the .so format on Linux and .dll files on Windows. In memory, a single copy of a shared library's .text section (the compiled machine code) can be shared by different running processes.

disadvantage
  • Performance loss: linking has to be done every time the program is run, which adds some overhead.
  • If the required runtime library is not installed on a machine, the dynamically linked executable cannot run there.
advantage
  • Shared library: multiple running programs share the same copy.
  • Easy to update: just replace the old shared library file. The next time the program runs, the new version is automatically loaded into memory and linked.

Origin blog.csdn.net/qaaaaaaz/article/details/130786287