1.1 Language Processor
Compiler and Interpreter
- Category 1: Compiler
- Translates a program written in one language (the source language) into an equivalent program written in another language (the target language).
- If the target program is an executable machine-language program, the user can then run it to process input and produce output.
- Efficient: the compiled target program usually runs much faster than an interpreter does.
- Category 2: Interpreter
- Does not produce a target program as a translation.
- Instead, the interpreter directly executes the operations specified in the source program on the input supplied by the user.
- Less efficient: usually slower than a compiled target program at mapping inputs to outputs.
- Category 3: Hybrid Structures
- Java source programs are first compiled into an intermediate form called bytecode.
- The bytecode is then interpreted by a virtual machine.
- Just-in-time compilers translate the bytecode into machine language immediately before it runs, speeding up execution.
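The hybrid approach can be pictured with a toy stack machine. The following is only a sketch: the opcodes OP_PUSH, OP_ADD, OP_MUL and OP_HALT and the function run_bytecode are invented for illustration and are not real Java bytecodes or a real virtual machine API.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SIZE 64

/* Hypothetical opcodes for illustration; these are not real JVM bytecodes. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Interpret a bytecode array and return the value on top of the stack
   when OP_HALT is reached. */
int run_bytecode(const int *code)
{
    int stack[STACK_SIZE];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next instruction */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:
            assert(sp < STACK_SIZE);
            stack[sp++] = code[pc++];   /* the operand follows the opcode */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

A front end would emit an array such as { OP_PUSH, 2, OP_PUSH, 3, OP_PUSH, 60, OP_MUL, OP_ADD, OP_HALT } for 2 + 3 * 60, and run_bytecode would then evaluate it. A just-in-time compiler would instead translate that array into machine instructions before running it.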
1.2 Compiler structure
- Lexical analysis : Reads the stream of characters that makes up the source program and groups them into meaningful sequences called lexemes.
- For each lexeme, the analyzer produces a token of the form: <token-name,attribute-value>
- token-name : an abstract symbol used during syntax analysis
- attribute-value : points to the symbol-table entry for this token
- position = initial + rate * 60 -- <id,1> <=> <id,2> <+> <id,3> <*> <60>
- Lexeme -- position : <id,1>
- Lexeme -- = : <=>
- Lexeme -- initial : <id,2>
- Lexeme -- + : <+>
- Lexeme -- rate : <id,3>
- Lexeme -- * : <*>
- Lexeme -- 60 : <60>
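The grouping of characters into lexemes can be sketched in C as below. This is a minimal illustration under stated assumptions: the names Token, TokenKind and tokenize are hypothetical, only identifiers, unsigned integers and single-character operators are recognized, and the symbol-table index that a real token would carry is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_ID, TOK_NUM, TOK_OP } TokenKind;

typedef struct {
    TokenKind kind;
    char lexeme[32];   /* the characters that make up the lexeme */
} Token;

/* Split src into tokens; returns the number of tokens written to out.
   Lexemes longer than 31 characters are split, which a real scanner
   would handle differently. */
size_t tokenize(const char *src, Token *out, size_t max_tokens)
{
    assert(src != NULL && out != NULL);
    size_t n = 0;
    while (*src != '\0' && n < max_tokens) {
        if (isspace((unsigned char)*src)) {   /* skip blanks between lexemes */
            src++;
            continue;
        }
        Token *t = &out[n++];
        size_t len = 0;
        if (isalpha((unsigned char)*src)) {
            t->kind = TOK_ID;
            while (isalnum((unsigned char)*src) && len < sizeof t->lexeme - 1)
                t->lexeme[len++] = *src++;
        } else if (isdigit((unsigned char)*src)) {
            t->kind = TOK_NUM;
            while (isdigit((unsigned char)*src) && len < sizeof t->lexeme - 1)
                t->lexeme[len++] = *src++;
        } else {
            t->kind = TOK_OP;                 /* any other single character */
            t->lexeme[len++] = *src++;
        }
        t->lexeme[len] = '\0';
    }
    return n;
}
```

Calling tokenize on "position = initial + rate * 60" yields seven tokens in the same order as the lexeme list above: three identifiers, three operators and one number.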
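- Syntax analysis : Uses the tokens produced by the lexical analyzer to build a syntax tree that shows the grammatical structure of the token stream.
- Semantic analysis : Uses the syntax tree and the symbol table to check that the source program is semantically consistent with the language definition.
- Type checking : checks that each operator has operands of matching types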
- Automatic type conversion (coercion) , dynamic typing
- Intermediate code generation : Translates the syntax tree into an explicit low-level or machine-like intermediate representation.
- Code optimization : Machine-independent optimization improves the intermediate code so that better target code results.
- Code generation : Takes the intermediate representation as input and maps it to the target language.
- If the target language is machine code, a register or memory location is chosen for each variable the program uses, and the intermediate instructions are translated into sequences of machine instructions that perform the same task.
- Registers must be assigned judiciously to hold the values of variables.
- Symbol table management :
- The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.
- It should let the compiler find the record for each name quickly and store or retrieve data from that record quickly.
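A symbol table along these lines might look as follows. This is a sketch only: the names Symbol, SymbolTable, symtab_lookup and symtab_intern are hypothetical, and the linear search is chosen for brevity. A production compiler would more likely use a hash table to get the fast lookup described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 64

/* One record per name; the fields hold the attributes of that name. */
typedef struct {
    char name[32];
    const char *type;   /* e.g. "float"; NULL until a declaration is seen */
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    int count;
} SymbolTable;

/* Return the 1-based index of name, or 0 if it is not in the table. */
int symtab_lookup(const SymbolTable *tab, const char *name)
{
    for (int i = 0; i < tab->count; i++)
        if (strcmp(tab->entries[i].name, name) == 0)
            return i + 1;
    return 0;
}

/* Return the index of name, creating a new record the first time it is seen. */
int symtab_intern(SymbolTable *tab, const char *name)
{
    int idx = symtab_lookup(tab, name);
    if (idx != 0)
        return idx;
    assert(tab->count < MAX_SYMBOLS);
    assert(strlen(name) < sizeof tab->entries[0].name);
    Symbol *s = &tab->entries[tab->count++];
    strcpy(s->name, name);
    s->type = NULL;
    return tab->count;
}
```

Interning position, initial and rate in that order gives the indices 1, 2 and 3 used in the tokens <id,1>, <id,2> and <id,3>.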
- Grouping phases into passes :
- Front end and back end
- Front-end phases : lexical analysis, syntax analysis, semantic analysis, intermediate code generation
- Code optimization may be an optional pass
- Back-end phase : code generation
- Some compiler collections are built around carefully designed intermediate representations that allow the front end for a particular language to interface with the back end for a particular target machine.
- Combining different front ends with the back end for one target machine yields compilers for different source languages on that machine.
- Combining one front end with back ends for different target machines yields compilers for different target machines.
- Compiler construction tools :
- Parser generators : automatically produce syntax analyzers from a grammatical description of a programming language.
- Scanner generators : produce lexical analyzers from a regular-expression description of the tokens of a language.
- Syntax-directed translation engines : produce collections of routines for walking a parse tree and generating intermediate code.
- Code-generator generators : produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language of a target machine.
- Data-flow analysis engines : facilitate gathering information about how values are transmitted from one part of a program to every other part. Data-flow analysis is a key part of code optimization.
- Compiler-construction toolkits : provide an integrated set of routines for constructing various phases of a compiler.
1.3 The Evolution of Programming Languages
language development
- By generation
- First generation: machine languages
- Second generation: assembly languages
- Third generation: Fortran, Cobol, Lisp, C, C++, C#, Java
- Fourth generation: languages designed for specific applications
- Report generation: NOMAD
- Database query: SQL
- Text Typesetting: Postscript
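- Fifth generation: logic- and constraint-based languages such as Prolog and OPS5
- By how the computation is specified :
- Imperative (the program says how a computation is to be done): C, C++, C#, Java
- Declarative (the program says what computation is to be done): ML, Haskell, Prolog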
1.4 The science of compilers
- Modeling in compiler design and implementation : the study of how to design the right mathematical models and choose the right algorithms.
- The science of code optimization :
- The optimization must be correct, that is, it must preserve the meaning of the compiled program
- The optimization must improve the performance of many programs
- The time required for optimization must be kept within a reasonable range
- The engineering work required must be manageable
1.5 Application of Compilation Technology
- Implementation of high-level programming languages:
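- High-level languages take low-level control over memory and registers away from the programmer, which is likely to cost performance: the compiled target program tends to be less efficient than hand-written low-level code. Low-level code, in turn, does not carry over to a different target machine. Optimizing compilers help reduce this overhead.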
- Optimization for computer architecture:
- Parallelism : Instruction-level parallelism is used in all modern microprocessors, and multiprocessors are increasingly popular
- Memory hierarchy : If most of a program's memory accesses can be satisfied by the fastest levels of the hierarchy, the program's average memory-access time decreases.
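The effect of the memory hierarchy shows up in something as simple as loop order. The two functions below are a sketch with hypothetical names and an arbitrary size N. Both compute the same sum, but the row-major version visits adjacent memory locations, so it typically makes better use of the cache. How large the difference is depends on the machine and the compiler.

```c
#include <stddef.h>

#define N 256

/* Row-major traversal: consecutive accesses touch adjacent memory. */
long sum_row_major(int m[N][N])
{
    long total = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            total += m[i][j];
    return total;
}

/* Column-major traversal: consecutive accesses are N ints apart. */
long sum_column_major(int m[N][N])
{
    long total = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            total += m[i][j];
    return total;
}
```

An optimizing compiler may perform this kind of loop interchange itself when it can prove that the result is unchanged.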
- Design of new computer architectures
- RISC : Reduced Instruction-Set Computer
- CISC : Complex Instruction-Set Computer
- Specialized architectures : data flow machines, vector machines, VLIW (very long instruction word) machines, SIMD (single instruction, multiple data) arrays of processors, systolic arrays, shared-memory multiprocessors, distributed-memory multiprocessors.
- Program translation
- Binary translation : translate the binary code of one machine into the binary code of another machine
- Hardware synthesis : Verilog and VHDL
- Database query interpreters : SQL
- Compiled simulation : compiling a design (for example one written in Verilog or VHDL) into machine code that simulates it can run orders of magnitude faster than an interpreter-based approach.
- Software production tools:
- Data-flow analysis : can find errors along all possible execution paths, not just those exercised by the input data used when the program is tested.
- Type checking : used to catch inconsistencies in programs
- Bounds checking : checks that accesses stay within the bounds of the data, for example array indexes
- Memory-management tools : garbage collection is another example of the trade-off between efficiency on one side and ease of programming and software reliability on the other.
1.6 Programming Language Basics
- The static/dynamic distinction :
- Static policy (compile-time policy) : the language uses a policy that lets the compiler decide an issue.
- Static scope : the scope of a declaration can be determined by looking only at the program text. C/Java
- Dynamic policy (run-time policy) : a policy that only allows the decision to be made while the program is running.
- Dynamic scope : as the program runs, the same use of x can refer to any of several different declarations of x.
- Environment and State (Scope)
- Environment : a mapping from names to locations in the store (variables); in C terms, from names to l-values
- State : a mapping from locations in the store to their values; in C terms, from l-values to their corresponding r-values
- Changes to the environment follow the language's scope rules.
- When f() is executing, the environment adjusts so that the name i refers to the local variable declared in f
...
int i; /* global i */
...
void f(..){
    int i; /* local i */
    ...
    i = 3; /* use of local i */
    ...
}
...
x = i+1; /* use of global i */
- Static scope and block structure
- Block : C uses { and } to delimit a block; some other languages use begin and end instead.
- A block is a statement
- A block contains a sequence of declarations followed by a sequence of statements.
- Static scope in C language :
- A C program consists of a sequence of top-level declarations of variables and functions.
- Functions may contain variable declarations, where variables include local variables and parameters. The scope of each such declaration is restricted to the function in which it appears.
- The scope of a top-level declaration of a name x extends over the rest of the program that follows it. If a function also declares x, uses of x inside that function are no longer in the scope of the top-level declaration.
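These rules can be checked with a small C fragment. The function names read_global_x and read_shadowed_x are made up for the illustration.

```c
int x = 10;                 /* top-level declaration of x */

int read_global_x(void)
{
    return x;               /* no local x here: refers to the top-level x */
}

int read_shadowed_x(void)
{
    int x = 20;             /* this declaration hides the top-level x */
    {
        int x = 30;         /* a nested block can hide the name again */
        (void)x;            /* only visible until the closing brace */
    }
    return x;               /* the function-level x, which holds 20 */
}
```

The compiler settles every use of x from the program text alone, which is what makes the scope static: read_global_x returns 10 and read_shadowed_x returns 20 regardless of who calls them.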
- Explicit access control :
- C++/JAVA:public、private、protected
- Dynamic scope :
- A use of a name x refers to the declaration of x in the most recently called procedure that declares x and has not yet terminated.
- Macro expansion in the C preprocessor
- In the code below, every use of a is replaced by (x+1), so x is resolved at the point where the macro is used: in b() it is the local x, in c() it is the global x.
- Method resolution in object-oriented programming
- Dynamic scope resolution is also essential for polymorphic procedures: procedures with two or more definitions for the same name, selected according to the types of the arguments.
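#include <stdio.h>
#define a (x+1)
int x = 2;
void b() { int x = 1; printf("%d\n", a); } /* a expands to (x+1) with the local x: prints 2 */
void c() { printf("%d\n", a); }            /* a expands to (x+1) with the global x: prints 3 */
int main() { b(); c(); return 0; }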
- Parameter passing mechanisms : how actual parameters are associated with formal parameters.
- Call-by-value and call-by-reference
- Call-by-value : the value of the actual parameter is copied into the formal parameter, so all computation on the formal parameter done by the called procedure is local to that procedure and the actual parameter is unchanged. In C, the caller's variable can still be changed by passing a pointer to it.
- Call-by-reference : the address of the actual parameter is passed to the callee as the value of the corresponding formal parameter.
- Aliasing : two or more names refer to the same location
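The three points above can be sketched in C, which passes every argument by value and uses pointers to get the effect of call-by-reference. The function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Call-by-value: the callee changes only its own copy of n. */
void increment_copy(int n)
{
    n = n + 1;
}

/* Passing a pointer by value lets the callee change the caller's variable. */
void increment_through_pointer(int *n)
{
    assert(n != NULL);
    *n = *n + 1;
}

/* If p and q are aliases for the same location, that location is
   incremented twice and the returned value reflects both updates. */
int add_one_to_both(int *p, int *q)
{
    assert(p != NULL && q != NULL);
    *p = *p + 1;
    *q = *q + 1;
    return *p;
}
```

With int v = 5, calling increment_copy(v) leaves v at 5, while increment_through_pointer(&v) makes it 6. With int a = 1, the aliased call add_one_to_both(&a, &a) returns 3, whereas passing two distinct variables that both hold 1 returns 2.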