Refer to Axiu's study notes
test code
#include<iostream>
using namespace std;
#define PI 3.14
int main(){
// test code
cout<<PI<<endl;
cout<<"hello world"<<endl;
return 0;
}
preprocessing
What the preprocessor does
- Remove all #define directives and expand every macro they define.
- Process all conditional preprocessing directives, such as "#if", "#endif", "#ifdef", "#elif" and "#else".
- Process "#include" directives by inserting the contents of the included file in place of the directive. This is recursive, because an included file may itself include other files.
- Remove all comments, both "//" and "/* */".
- Keep the #pragma directives meant for the compiler, since the compiler still needs them. #pragma once, which prevents a header file from being included more than once, is handled by the preprocessor itself while it processes "#include".
- Add line markers (line numbers and file names) so that the compiler can generate line-number information for debugging and can report line numbers in compilation errors and warnings.
example
g++ main.cpp -E -o main.i
- -E: stop after preprocessing
- -o: name of the output file
......
# 2 "main.cpp" 2
# 2 "main.cpp"
using namespace std;
int main(){
cout<<3.14<<endl;
cout<<"hello world"<<endl;
return 0;
}
compile
The compiler performs lexical analysis, syntax analysis, semantic analysis and optimization on the xxx.i or xxx.ii file produced by preprocessing, and generates the corresponding assembly code file.
- Lexical analysis: the source code is fed into a scanner, which uses an algorithm similar to a finite state machine to split its character sequence into a series of tokens.
- Syntax analysis: the parser analyzes the tokens produced by the scanner and builds a syntax tree, a tree whose nodes are expressions.
- Semantic analysis: the parser only checks that expressions are grammatically well formed; the semantic analyzer checks whether they are meaningful, for example whether the operand types of an expression are compatible. It analyzes static semantics, which can be determined at compile time, as opposed to dynamic semantics, which can only be determined at run time.
- Source code optimization: the source code optimizer converts the syntax tree into intermediate code and optimizes it at the source level, for example by computing a constant expression such as 2 + 6 at compile time.
- Target code generation: the code generator converts the intermediate code into target machine code, a sequence of instructions written in assembly language.
- Target code optimization: the target code optimizer improves that code, for example by choosing suitable addressing modes, replacing multiplications with shifts, and deleting redundant instructions.
example
g++ main.i -S -o main.s
- -S: stop after compilation and write assembly code
.file "main.cpp"
.text
.section .rodata
.type _ZStL19piecewise_construct, @object
.size _ZStL19piecewise_construct, 1
_ZStL19piecewise_construct:
.zero 1
.local _ZStL8__ioinit
.comm _ZStL8__ioinit,1,1
.LC1:
.string "hello world"
.text
.globl main
.type main, @function
main:
.LFB1493:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movq .LC0(%rip), %rax
movq %rax, -8(%rbp)
movsd -8(%rbp), %xmm0
leaq _ZSt4cout(%rip), %rdi
call _ZNSolsEd@PLT
movq %rax, %rdx
movq _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_@GOTPCREL(%rip), %rax
movq %rax, %rsi
movq %rdx, %rdi
call _ZNSolsEPFRSoS_E@PLT
leaq .LC1(%rip), %rsi
leaq _ZSt4cout(%rip), %rdi
call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@PLT
movq %rax, %rdx
movq _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_@GOTPCREL(%rip), %rax
movq %rax, %rsi
movq %rdx, %rdi
call _ZNSolsEPFRSoS_E@PLT
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1493:
.size main, .-main
.type _Z41__static_initialization_and_destruction_0ii, @function
_Z41__static_initialization_and_destruction_0ii:
.LFB1983:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
cmpl $1, -4(%rbp)
jne .L5
cmpl $65535, -8(%rbp)
jne .L5
leaq _ZStL8__ioinit(%rip), %rdi
call _ZNSt8ios_base4InitC1Ev@PLT
leaq __dso_handle(%rip), %rdx
leaq _ZStL8__ioinit(%rip), %rsi
movq _ZNSt8ios_base4InitD1Ev@GOTPCREL(%rip), %rax
movq %rax, %rdi
call __cxa_atexit@PLT
.L5:
nop
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1983:
.size _Z41__static_initialization_and_destruction_0ii, .-_Z41__static_initialization_and_destruction_0ii
.type _GLOBAL__sub_I_main, @function
_GLOBAL__sub_I_main:
.LFB1984:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $65535, %esi
movl $1, %edi
call _Z41__static_initialization_and_destruction_0ii
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1984:
.size _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
.section .init_array,"aw"
.align 8
.quad _GLOBAL__sub_I_main
.section .rodata
.align 8
.LC0:
.long 1374389535
.long 1074339512
.hidden __dso_handle
.ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
.section .note.GNU-stack,"",@progbits
assemble
The assembler (as) converts assembly code into machine instructions and writes them to a machine-code file. Assembly is much simpler than compilation: there is no complex syntax or semantics to analyze and no instruction optimization to perform, and each assembly instruction is translated one by one according to a correspondence table between assembly instructions and machine instructions. The result is an object file, xxx.o on Linux or xxx.obj on Windows, whose format is almost the same as that of an executable file.
g++ main.s -c -o main.o
- -c: stop after assembling and produce an object file without linking
Link
The linker combines the object files generated from different source files, together with the libraries they use, into an executable program.
example
g++ main.o -o main
static link
With static linking, the linker copies the library functions and data that the program uses out of static libraries (.a files on Linux) and combines them with the program's other modules to create the final executable. The linker mainly performs two tasks:
- Symbol resolution: each symbol corresponds to a function, a global variable, or a static variable. The linker associates every symbol reference with exactly one symbol definition.
- Relocation: the linker associates each symbol definition with a memory location, then modifies every reference to that symbol so that it points to that location.
shortcoming
Wasted space: every executable carries its own copy of all the object files it depends on, so if several programs depend on the same object file, that object file is duplicated in each of them, both on disk and in memory when they run.
advantage
Runs fast because everything needed to execute the program is already present in the executable.
dynamic link
Split the program into relatively independent modules and link them together at run time rather than when the executable is built. Shared libraries are .so files on Linux and .dll files on Windows. In memory, a single copy of a shared library's .text section (its compiled machine code) can be shared by different running processes.
shortcoming
- Performance cost: linking happens every time the program is executed, so there is some loss in performance.
- If the required shared libraries are not installed on a computer, a dynamically linked executable cannot run on it.
advantage
- Shared library: multiple programs share the same copy of a library when they run.
- Easy to update: updating only requires replacing the old shared library file. The next time the program runs, the new version is automatically loaded into memory and linked.