Basics of Compilation Principles (Longshu Learning Record)


Language processors

A compiler is a program that reads a source program (written in some source language) and translates it into a target program (written in a target language). One of a compiler's important tasks is to report any errors in the source program that it finds during translation.

An interpreter is another common language processor. It does not generate a target program through translation; from the user's perspective, the interpreter directly uses the input provided by the user to perform the operations specified in the source program.

Compilers and interpreters are the two common forms of language processors.
The main difference: a compiler translates the source program into a target-language form that the computer can execute, and the software system that completes this process is called a compiler. An interpreter does not compile the source program into a target program; instead it interprets and executes the source statements one by one (combined with user input).

The preprocessor
In addition to the compiler, some other programs are also needed to create an executable target program. The preprocessor handles tasks such as gathering the multiple modules the programmer divided the program into when writing it, and expanding called macros into source-language form. (Summary: the preprocessor is responsible for gathering the source program together and converting macros into source-language statements.)

The assembler
The preprocessed source program is passed as input to the compiler. The compiler may produce an assembly-language program as its output, because assembly language is easier to output and debug. This assembly-language program is then processed by a program called an assembler, which produces relocatable machine code. (Summary: the assembler is responsible for processing assembly language and generating relocatable machine code.)

The linker
Large programs are often divided into multiple parts that are compiled separately. The relocatable machine code must therefore be linked with other relocatable object files and library files to form the code that actually runs on the machine. Code in one file may refer to a location in another file; the linker resolves these external memory addresses.

The loader
The loader puts all the executable object files into memory for execution.

The structure of a compiler

A compiler maps the source program to a semantically equivalent target-language program. This mapping process is roughly divided into two parts: analysis and synthesis.

Analysis

  • Breaks the source program into its component elements
  • Builds a grammatical structure on those elements and uses it to create an intermediate representation of the source program
  • Checks whether the source program follows the correct syntactic and semantic rules; if not, it must provide useful information so the user can correct the program
  • Collects information about the source program and stores it in a data structure called a symbol table
  • The symbol table is passed to the synthesis part together with the intermediate representation

Synthesis
Uses the intermediate representation and the symbol-table information obtained from analysis to construct the target program that the user expects.

The analysis part is commonly called the compiler front end, and the synthesis part the compiler back end.

source program -> front end -> optimizer -> back end -> machine code
The compilation process runs the (preprocessed) source program through this series of stages.

The GCC compilation process is equivalent to the following instructions:

gcc -S source_file -o compiled_file.s

Some compilers have a machine-independent optimization step between the front end and the back end. The purpose of this step is to perform transformations on the intermediate representation so that the back end can generate a better target program. The optimization step is optional: if code is generated directly from an intermediate representation that has not been optimized, the quality of the code suffers.

Front end

  • Lexical analysis
  • Syntax analysis
  • Semantic analysis

Back end

  • Intermediate code generation
  • Intermediate code optimization
  • Object code generation

Lexical analysis
Lexical analysis, also called scanning, reads the character stream of the source program and splits it into meaningful sequences called lexemes, producing lexical units (tokens) and entries in a symbol table.

Lexical scanning can be implemented with an algorithm similar to a finite state machine. The lex program does exactly this: it splits the input string into tokens according to lexical rules described in advance by the user. Because such a program exists, compiler developers do not need to build an independent lexical scanner for each compiler; they only change the lexical rules as needed.

Generally speaking, the language into which the token stream is then parsed is described by a context-free grammar.

The lexical analyzer generates, for each lexeme, a lexical unit (token) of the following form as output:
<token-name, attribute-value>

After the lexical units are generated, they are sent to the next step, syntax analysis.

In the lexical unit <token-name, attribute-value>: the first component, token-name, is an abstract symbol used during the syntax-analysis step, and the second component, attribute-value, points to the symbol-table entry for this lexical unit. The information in the symbol-table entry is used by the semantic analysis and code generation steps.
The symbol-table entry for an identifier stores information about that identifier, such as its name and type.
Numbers and string constants are stored in literal tables and the like for later use.
Whitespace separating lexemes is ignored by the lexical analyzer.
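As a sketch (not from the text), a tiny lexer in Python can illustrate how lexemes are split into <token-name, attribute-value> pairs, with identifiers pointing into a symbol table. The token names and lexical rules here are invented for illustration:

```python
import re

# Token specification: each token name paired with a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),          # whitespace separating lexemes is discarded
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source, symbol_table):
    """Split the input string into <token-name, attribute-value> pairs."""
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                      # whitespace is ignored
        if kind == "ID":
            # attribute-value points to this identifier's symbol-table entry
            index = symbol_table.setdefault(lexeme, len(symbol_table))
            tokens.append(("id", index))
        elif kind == "NUMBER":
            tokens.append(("number", float(lexeme)))
        else:
            tokens.append((kind.lower(), None))
    return tokens

table = {}
print(tokenize("position = initial + rate * 60", table))
```

A real scanner (such as one generated by lex) would also report characters that match no rule; this sketch silently skips them.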

Syntax analysis
Syntax analysis, also called parsing, uses the first component of each lexical unit to create a tree-shaped intermediate representation; this intermediate representation gives the grammatical structure of the token stream produced by lexical analysis. A common representation is the syntax tree.
Each internal node in the tree represents an operation, and the node's children represent the operands of that operation.

The whole analysis process uses the analysis methods of context-free grammars. Put simply, the syntax tree generated by the syntax analyzer is a tree whose nodes are expressions. (A context-free grammar is recursive in form and can be used to guide syntax analysis.) This can be implemented with a recursive descent algorithm.

The role of the syntax analyzer:

  • According to the grammatical structure, it builds a tree-shaped intermediate representation from the first component of each lexical unit, and outputs that intermediate representation.
  • Intermediate representation = token stream (the first component of each lexical unit) + grammatical structure.
  • The commonly used intermediate representation is the syntax tree.
  • Each internal node in the syntax tree represents an operation, and the node's children represent the operands of that operation.
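The recursive-descent idea above can be sketched as follows. The grammar and the tuple-based tree representation are illustrative assumptions, not taken from the text:

```python
# Grammar (context-free; its recursion guides the parser):
#   expr   -> term (('+'|'-') term)*
#   term   -> factor (('*'|'/') factor)*
#   factor -> NUMBER | '(' expr ')'
class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            node = (op, node, self.term())   # internal node = an operation
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.eat()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.eat()
        if tok == "(":
            node = self.expr()
            self.eat()                       # consume ')'
            return node
        return tok                           # a number leaf

# For 1 + 2 * 3, multiplication binds tighter: ('+', 1, ('*', 2, 3))
print(Parser([1, "+", 2, "*", 3]).expr())
```

Because `term` is called from inside `expr`, multiplication ends up deeper in the tree than addition, which is exactly how the grammar encodes operator precedence.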

Semantic analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check whether the source program is consistent with the semantics defined by the language. At the same time it collects type information and stores it in the syntax tree or the symbol table, for use in the subsequent intermediate code generation step.

An important part of semantic analysis is type checking. The compiler checks each operator to see whether its operands match. For example, many programming language definitions require that the subscript of an array be an integer; if a floating-point number is used as an array index, the compiler must report an error.

Programming languages may allow certain type conversions, called coercions (automatic type conversions). For example, a binary arithmetic operator may be applied to a pair of integers or a pair of floating-point numbers; if the operator is applied to a floating-point number and an integer, the compiler can convert (coerce) the integer into a floating-point number.
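A minimal sketch of type checking with int-to-float coercion over such a tree; the node layout (leaves `('num', value, type)`, internal nodes `(op, left, right)`) is an assumption made for illustration:

```python
# Leaves are ('num', value, type); internal nodes are (op, left, right).
def check(node):
    """Return the node's type, coercing int to float when types are mixed."""
    if node[0] == "num":
        return node[2]                       # 'int' or 'float'
    op, left, right = node
    lt, rt = check(left), check(right)
    if lt == rt:
        return lt                            # matching operands: no coercion
    if {lt, rt} == {"int", "float"}:
        return "float"                       # automatic coercion: int -> float
    raise TypeError(f"operands of '{op}' do not match: {lt} vs {rt}")

# 60 (an int) combined with a float operand yields a float result:
tree = ("*", ("num", "rate", "float"), ("num", 60, "int"))
print(check(tree))   # float
```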

Intermediate Code Generation
After syntax and semantic analysis of the source program are complete, many compilers generate an explicit low-level or machine-language-like intermediate representation. We can think of this representation as a program for some abstract machine. This intermediate representation should have two important properties:

  • It should be easy to generate.
  • It should be easy to translate into the language of the target machine.

One such intermediate representation is called three-address code. It consists of a sequence of assembly-language-like instructions, each with at most three operands. Each operand acts like a register.

  • Each three-address assignment instruction has at most one operator on its right side. These instructions therefore fix the order in which operations are performed; in source program 1.1, multiplication should be done before addition.
  • The compiler generates a temporary name to hold the value computed by each three-address instruction.
  • Some three-address instructions have fewer than three operands.
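These properties can be illustrated with a toy translator from a syntax tree to three-address code; the temporary-name scheme `t1, t2, …` follows the convention described above, while the function and tree layout are illustrative assumptions:

```python
def gen_tac(tree):
    """Translate a syntax tree (op, left, right) into three-address code."""
    code, counter = [], [0]

    def walk(node):
        if not isinstance(node, tuple):          # leaf: a name or a constant
            return str(node)
        op, left, right = node
        l, r = walk(left), walk(right)
        counter[0] += 1
        temp = f"t{counter[0]}"                  # fresh temporary for the result
        code.append(f"{temp} = {l} {op} {r}")    # one operator per instruction
        return temp

    return code, walk(tree)

# position = initial + rate * 60: the multiplication is emitted first
code, result = gen_tac(("+", "initial", ("*", "rate", 60)))
code.append(f"position = {result}")
print("\n".join(code))
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```

Because the deeper subtree is walked first, the instruction order itself records that multiplication happens before addition.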

Code Optimization
The machine-independent code optimization step attempts to improve the intermediate code so that better object code is generated. "Better" usually means faster, but there may be other goals, such as shorter object code or object code that consumes less energy.

Using a simple intermediate-code generation algorithm and then following it with a code optimization step is a reasonable way to generate good-quality object code.
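As an illustrative example of such a machine-independent pass (a sketch, not a real compiler's implementation), constant folding can evaluate three-address instructions whose operands are already known constants:

```python
import re

def fold_constants(code):
    """Machine-independent pass: evaluate instructions whose operands are
    both constants, and propagate the results into later instructions."""
    known = {}                               # temp name -> constant value
    out = []
    for line in code:
        target, expr = line.split(" = ", 1)
        # substitute already-known constants for temporaries
        for name, val in known.items():
            expr = re.sub(rf"\b{name}\b", str(val), expr)
        m = re.fullmatch(r"(\d+) ([+*]) (\d+)", expr)
        if m:
            a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
            known[target] = a + b if op == "+" else a * b
            continue                         # instruction folded away entirely
        out.append(f"{target} = {expr}")
    return out

print(fold_constants(["t1 = 2 * 30", "t2 = x + t1"]))
# ['t2 = x + 60']
```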

Code Generation
The code generator takes an intermediate representation of the source program as input and maps it to the target language. If the target language is machine code, a register or memory location must be selected for each variable used by the program. The intermediate instructions are then translated into sequences of machine instructions that accomplish the same task. A crucial aspect of code generation is the sensible allocation of registers to hold the values of variables.
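A very rough sketch of this mapping, assuming an invented three-operand instruction set (`ADDF`, `MULF`, `ST`) and a naive register-selection scheme that never reuses registers (real register allocation, and the loads of variables into registers, are omitted):

```python
def gen_target(tac):
    """Map three-address instructions to machine-like instructions,
    choosing a register for each variable and temporary."""
    asm, regs = [], {}

    def reg(name):                       # pick a register for each value
        if name not in regs:
            regs[name] = f"R{len(regs) + 1}"
        return regs[name]

    OPS = {"+": "ADDF", "*": "MULF"}
    for line in tac:
        target, expr = line.split(" = ")
        parts = expr.split()
        if len(parts) == 1:              # plain copy: store back to memory
            asm.append(f"ST {target}, {reg(parts[0])}")
        else:
            a, op, b = parts
            asm.append(f"{OPS[op]} {reg(target)}, {reg(a)}, {reg(b)}")
    return asm

for line in gen_target(["t1 = rate * 60", "t2 = initial + t1",
                        "position = t2"]):
    print(line)
```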

Symbol table management
One of the important functions of a compiler is to record the names of the variables used in the source program and to collect information about the various attributes of each name. These attributes provide information about a name's storage allocation, its type, its scope (where in the program the name's value may be used), and so on. For a procedure name, the information also includes the number and types of its parameters, the method of passing each parameter (for example, by value or by reference), and the return type.

The symbol table data structure creates a record entry for each variable name, and the fields of the record are the name's various attributes. This data structure should allow the compiler to quickly find the record for each name and to quickly store and retrieve the data in the record.
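A minimal sketch of such a structure, with one record per name and scopes chained to the enclosing table; the field names (`type`, `storage`, `params`) are illustrative:

```python
class SymbolTable:
    """One record per name; nested scopes chain to the enclosing table."""
    def __init__(self, parent=None):
        self.records = {}            # name -> attribute record
        self.parent = parent         # enclosing scope, or None at top level

    def insert(self, name, **attrs):
        self.records[name] = attrs   # e.g. type, storage, params, returns

    def lookup(self, name):
        table = self
        while table is not None:     # search outward through the scopes
            if name in table.records:
                return table.records[name]
            table = table.parent
        return None                  # name not declared in any visible scope

globals_ = SymbolTable()
globals_.insert("initrate", type="float", storage="static")
globals_.insert("area", type="function", params=[("r", "float")],
                returns="float", passing="by value")
inner = SymbolTable(parent=globals_)
print(inner.lookup("initrate"))      # found in the enclosing scope
```

A production compiler would use a hash table per scope (as here, via `dict`) so that both insertion and lookup are fast.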

Combining Multiple Steps into Passes
Within a particular implementation, the activities of multiple steps can be combined into one pass. Each pass reads an input file and produces an output file. For example, the front-end steps of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation can be combined into one pass. Code optimization can be an optional pass. There can then be a back-end pass that generates code for a specific target machine.

Some compiler collections are created around a carefully designed set of intermediate representations, which allow a language-specific front end to interface with a target-specific back end. Using such collections, we can combine different front ends with one target machine's back end to build compilers for different source languages on that machine; similarly, we can combine one front end with back ends for different target machines to create compilers for different targets.


Origin blog.csdn.net/u010523811/article/details/124266396