An overview of compilation principles

1.1 Language Processor

Compiler and Interpreter

  • Category 1: Compiler
    • Translates a program written in one language (the source language) into an equivalent program written in another language (the target language).
    • If the target program is an executable machine-language program, the user can then run it to process input and produce output.
    • High efficiency: the compiled target program usually maps inputs to outputs much faster than an interpreter does.

  • Category 2: Interpreter
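    • No target program is produced by translation.
    • The interpreter directly executes the operations specified in the source program on the input supplied by the user.
    • Lower efficiency than a compiled program, but error diagnostics are usually better because the source program is executed statement by statement.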

  • Category 3: Hybrid Structures
    • Java source programs are first compiled into an intermediate form of bytecode.
    • The bytecode is interpreted and executed by a virtual machine.
    • Just-in-time (JIT) compilers translate the bytecode into machine language immediately before it runs, speeding up execution.

1.2 Compiler structure

  • Lexical analysis: reads the stream of characters that makes up the source program and groups the characters into meaningful sequences called lexemes.
    • For each lexeme, the analyzer produces a token of the form <token-name, attribute-value>
      • token-name: an abstract symbol used during syntax analysis
      • attribute-value: points to the symbol-table entry for this token
      • Example: position = initial + rate * 60 becomes <id,1> <=> <id,2> <+> <id,3> <*> <60>
        • Lexeme position: <id,1>
        • Lexeme =: <=>
        • Lexeme initial: <id,2>
        • Lexeme +: <+>
        • Lexeme rate: <id,3>
        • Lexeme *: <*>
        • Lexeme 60: <60>
  • Syntax analysis: builds a syntax tree from the tokens produced by the lexical analyzer.
  • Semantic analysis: uses the syntax tree and the symbol table to check that the source program is semantically consistent with the language definition.
    • Type checking: verifies that each operator has operands of a permitted type.
    • Automatic type conversion (coercion): for example, an integer operand is converted to floating point when the other operand of a binary arithmetic operator is floating point.
  • Intermediate code generation: translates the syntax tree into an explicit low-level, machine-like intermediate representation that is easy to produce and easy to translate into the target language.
  • Code optimization: a machine-independent phase that improves the intermediate code so that better target code results.
  • Code generation: takes the intermediate representation as input and maps it to the target language.
    • If the target language is machine code, a register or memory location must be chosen for each variable used by the program, and the intermediate instructions are translated into sequences of machine instructions that perform the same task.
    • A key part of code generation is assigning registers to hold variables judiciously.

  • Symbol table management:
    • The symbol table is a data structure that holds a record for each variable name, with fields for the attributes of the name.
      • It should let the compiler find the record for each name quickly and store or retrieve data in that record quickly.
  • Grouping phases into passes:
    • Front end and back end
      • Front-end phases: lexical analysis, syntax analysis, semantic analysis, intermediate code generation
      • Code optimization: an optional pass between the front end and the back end
      • Back-end phase: code generation
    • Some compiler collections are built around carefully designed intermediate representations that allow the front end for a particular language to be connected to the back end for a particular target machine.
      • Combining different front ends with the back end for one target machine yields compilers for different source languages on that machine.
      • Combining one front end with different back ends yields compilers for different target machines.
  • Compiler construction tools:
    • Parser generators: automatically produce a syntax analyzer from a grammatical description of a programming language.
    • Scanner generators: produce lexical analyzers from a regular-expression description of the tokens of a language.
    • Syntax-directed translation engines: produce collections of routines for walking a parse tree and generating intermediate code.
    • Code-generator generators: produce a code generator from a set of rules for translating each operation of the intermediate language into the machine language of the target machine.
    • Data-flow analysis engines: help gather information about how values are passed from one part of a program to every other part. Data-flow analysis is an important part of code optimization.
    • Compiler-construction toolkits: provide an integrated set of routines for building the various phases of a compiler.

1.3 The evolution of programming languages

Language development

  • By generation
    • First generation: machine languages
    • Second generation: assembly languages
    • Third generation: higher-level languages such as Fortran, Cobol, Lisp, C, C++, C#, Java
    • Fourth generation: languages designed for specific applications
      • Report generation: NOMAD
      • Database queries: SQL
      • Text typesetting: PostScript
    • Fifth generation: logic- and constraint-based languages such as Prolog and OPS5
  • By how the computation is specified:
    • Imperative (the program specifies how a computation is to be done): C, C++, C#, Java
    • Declarative (the program specifies what computation is to be done): ML, Haskell, Prolog

1.4 The science of compilers

  • Modeling in compiler design and implementation: the study of how to design the right mathematical models and choose the right algorithms.
  • The science of code optimization:
    • The optimization must be correct, that is, it must preserve the meaning of the compiled program.
    • The optimization must improve the performance of many programs.
    • The compilation time must be kept reasonable.
    • The engineering effort required must be manageable.

1.5 Application of Compilation Technology

  • Implementation of high-level programming languages:
    • High-level languages are easier to program in, but they give the programmer less control over low-level details such as memory and registers, so the target program tends to run less efficiently. Optimizing compilers are used to offset this cost.
  • Optimization for computer architecture:
    • Parallelism : Instruction-level parallelism is used in all modern microprocessors, and multiprocessors are increasingly popular
    • Memory hierarchy: if most of a program's memory accesses can be satisfied by the fastest level of the hierarchy, the program's average memory access time decreases.
  • Design of New Computer Architecture
    • RISC: Reduced Instruction-Set Computer
    • CISC: Complex Instruction-Set Computer
    • Specialized architectures: data-flow machines, vector machines, VLIW (very long instruction word) machines, SIMD (single instruction, multiple data) processor arrays, systolic arrays, shared-memory multiprocessors, and distributed-memory multiprocessors.
  • Program translation
    • Binary translation : translate the binary code of one machine into the binary code of another machine
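    • Hardware synthesis: hardware designs are written in hardware description languages such as Verilog and VHDL and then translated into circuits
    • Database query interpreters: SQL
    • Compiled simulation: compiling a design, instead of interpreting it, can make simulation run orders of magnitude faster than an interpreter-based approach (used for designs written in Verilog and VHDL)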
  • Software production tools:
    • Data flow analysis : can find errors on all possible execution paths, rather than only those paths executed by a combination of input data, as is the case when the program is tested.
    • Type checking: used to catch inconsistencies in programs
    • Bounds checking: checks that accesses to data, such as array elements, stay within bounds
    • Memory management tools: garbage collection is one example of the trade-off between efficiency on the one hand and ease of programming and software reliability on the other.

1.6 Programming language basics

  • The difference between static and dynamic:
    • Static policy (compile-time policy): the language uses a policy that allows the compiler to decide an issue at compile time.
      • Static scope (lexical scope): the scope of a declaration can be determined by looking only at the program text. C and Java use static scope.
    • Dynamic policy (run-time policy): a policy that only allows the decision to be made while the program is running.
      • Dynamic scope: as the program runs, the same use of x can refer to any of several different declarations of x.
  • Environment and state (scope)
    • Environment: a mapping from names (variable names) to storage locations. In C terms, it maps names to lvalues.
    • State: a mapping from storage locations to their values. In C terms, it maps lvalues to their corresponding rvalues.
      • Changes to the environment follow the language's scoping rules.
        • While f() runs, the environment is adjusted so that the name i refers to the location of the local variable i.

...
int i;  /* global i */
...
void f(..){
    int i;  /* local i */
    ...
    i = 3; /* use of local i */
    ...
}

...
 x = i+1;   /* use of global i */

  • Static scope and block structure
    • Block: C uses { and } to delimit a block; some other languages use begin and end instead.
      • A block is a kind of statement.
      • A block contains a sequence of declarations followed by a sequence of statements.
    • Static scope in C:
      • A C program consists of a sequence of top-level declarations of variables and functions.
      • Variables can be declared inside functions; these include local variables and parameters. The scope of each such declaration is limited to the function in which it appears.
      • The scope of a top-level declaration of the name x consists of the entire program that follows. If a function also declares x, the uses of x inside that function are no longer in the scope of the top-level declaration.
  • Explicit access control:
    • C++ and Java: the keywords public, private, and protected
  • Dynamic scope:
    • A use of the name x refers to the declaration of x in the most recently called procedure that has such a declaration and has not yet terminated.
      • Macro expansion in the C preprocessor
        • In the code below, every use of a is replaced by (x+1), so the x that is used depends on where a appears: the local x in b(), the global x in c().
      • Method resolution in object-oriented programming
        • Dynamic scope resolution is also essential for polymorphic procedures, that is, procedures that have two or more definitions for the same name, chosen according to the types of the arguments.
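#include <stdio.h>

#define a (x+1)   /* expanded textually, so x is looked up where a is used */
int x = 2;        /* global x */

void b(void) { int x = 1; printf("%d\n", a); }   /* uses b's local x: prints 2 */

void c(void) { printf("%d\n", a); }              /* uses the global x: prints 3 */

int main(void) { b(); c(); return 0; }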

  • Parameter passing mechanisms: how actual parameters are associated with formal parameters.
    • The two most common mechanisms are call-by-value and call-by-reference.
    • Call-by-value: the actual parameter is evaluated (or copied) and its value is placed in the formal parameter. All computations on the formal parameters by the called procedure are local to that procedure, and the actual parameters themselves do not change. The caller's data can still be changed by passing a pointer to it.
    • Call-by-reference: the address of the actual parameter is passed to the callee as the value of the corresponding formal parameter.
  • Aliases: two or more names refer to the same storage location, for example when two formal parameters are passed references to the same variable.
