Engineering a Compiler reading notes (1)



Preamble:

A modern optimizing compiler draws on many techniques. It uses greedy heuristic search to explore large solution spaces, deterministic finite automata to recognize words in the input, fixed-point algorithms to reason about program behavior, and simple theorem provers and algebraic simplifiers to predict the values of expressions. Compilers use fast pattern-matching algorithms to map abstract computations to machine-level operations. They use linear Diophantine equations and Presburger arithmetic to analyze array subscripts. Compilers also rely on many classic algorithms and data structures, such as hash tables, graph algorithms, and sparse-set implementations.


Chapter 1: Overview of compilation

Introduction

|| A compiler is a computer program (a piece of software, like an OS) that translates a program written in one language into a program written in another language. Key concepts of this chapter: compiler, interpreter, automatic translation.

|| Conceptual roadmap:
To carry out this translation, a compiler must be able to do the following:

  • Understand the form and content of the input language (i.e., its syntax and semantics).
  • Understand the form and content of the output language (i.e., its syntax and semantics).
  • Apply a mapping scheme: the rules for mapping the source language to the target language.

|| From these requirements we get the structure of a compiler:

  • Front end: processes the source language
  • Back end: processes the target language
  • Intermediate form: connects the front end and the back end
  • Optimizer: analyzes and rewrites the intermediate form to improve it

|| About programming languages:
Function: We use a programming language to express a computation as a sequence of operations. A computer program is an abstract sequence of operations written in a programming language.
Features: 1. A programming language is a formal language designed to express computations precisely. 2. It does not allow ambiguity. 3. It usually offers a high level of abstraction.

|| If a compiler's output is still a human-oriented programming language rather than the assembly language of some computer, the compiler is called a source-to-source translator.

|| The difference between an interpreter and a compiler:

  • The input to a compiler is an executable specification, and its output is another executable specification
  • The input to an interpreter is an executable specification, and its output is the result of executing that specification
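
The contrast can be made concrete with a minimal Python sketch. The nested-tuple expression format and the function names `interpret` and `compile_to_postfix` are assumptions made for this illustration, not something from the book: the interpreter returns the result of the computation, while the compiler returns another description of the same computation (here, postfix text).

```python
# Expressions are nested tuples: ("num", 3), ("add", l, r), ("mul", l, r).
# This representation is an assumption made for the sketch.

def interpret(expr):
    """Interpreter: execute the specification and return its result."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    left, right = interpret(expr[1]), interpret(expr[2])
    return left + right if kind == "add" else left * right


def compile_to_postfix(expr):
    """Compiler: translate the specification into another one (postfix text)."""
    kind = expr[0]
    if kind == "num":
        return str(expr[1])
    op = "+" if kind == "add" else "*"
    return f"{compile_to_postfix(expr[1])} {compile_to_postfix(expr[2])} {op}"


program = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))
print(interpret(program))           # 14
print(compile_to_postfix(program))  # 2 3 4 * +
```

Both functions walk the same input; they differ only in what they hand back, which is the distinction drawn in the two bullets above.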


|| What interpreters and compilers have in common:

  • Both must analyze the input executable specification to determine whether it is valid
  • Both build an internal model that represents the structure and semantics of the input
  • Both must determine where to store values during execution

|| How some languages differ in their translation scheme:
APL and Scheme are more often implemented with interpreters than with compilers.
Java involves both compilation and interpretation (the steps below are a rough outline):

  1. Java source code is compiled into a form called bytecode
  2. The bytecode is executed on a Java virtual machine (JVM) for the target platform; the JVM is a bytecode interpreter
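
The two-step scheme can be imitated in miniature. The sketch below is not real JVM bytecode: the instruction names `PUSH`, `ADD` and `MUL`, the tuple format, and the functions `compile_to_bytecode` and `run` are all hypothetical, chosen only to show a compile step followed by an interpreting virtual machine.

```python
def compile_to_bytecode(expr):
    """Step 1: flatten a nested-tuple expression into stack-machine code."""
    kind = expr[0]
    if kind == "num":
        return [("PUSH", expr[1])]
    code = compile_to_bytecode(expr[1]) + compile_to_bytecode(expr[2])
    code.append(("ADD",) if kind == "add" else ("MUL",))
    return code


def run(bytecode):
    """Step 2: a toy virtual machine that interprets one instruction at a time."""
    stack = []
    for instr in bytecode:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "ADD" else left * right)
    return stack.pop()


source = ("mul", ("add", ("num", 1), ("num", 2)), ("num", 5))
bytecode = compile_to_bytecode(source)
print(bytecode)
print(run(bytecode))  # 15
```

The compiled list can be stored and run later on any machine that has the `run` loop, which is roughly the portability argument behind the bytecode approach.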

|| The basic principles of the compiler:
1: The compiler must preserve the meaning of the program being compiled. Preserving meaning is what keeps the program correct throughout compilation (no ambiguity may be introduced).
2: The compiler must improve the input program in some discernible way. For example, a source-to-source translator for C must produce code that is, in some measure, better than the input program, such as more usable or more general.


Compiler structure

|| Structure description:
The front end works to understand the source program and records the results of its analysis in IR form;
the optimizer works to improve the IR;
the back end maps the optimized IR onto the limited resources of the target machine.

|| The actual performance of the compiled code depends on how well the techniques used in the optimizer and in the back end interact; the two stages determine it jointly. (Good optimization plus strong mapping does not automatically give good compiled code; how the two fit together matters.)


Translation overview

In the front end:
|| Reason: The source code must be understood before it can be translated

|| Translation steps performed in the front end:

  1. Syntax check:

    || What is a grammar:
    A finite set of rules that defines a language is called a "grammar". We usually describe a sentence in terms of its parts of speech, so that a single grammar rule can describe many sentences.
    || The front end does this work in two separate passes, called the "scanner" (lexical analyzer) and the "parser", which determine whether the input code actually belongs to the set of valid programs defined by the grammar

    || Scanner (lexical analyzer):
    Identify and classify: identifies each word in a sentence and assigns it to its part of speech, representing each word as a pair (p, s), where p is the word's part of speech and s is its spelling.
    Role: turns the sentence into a stream of classified words

    || Parser (syntax analyzer):
    Performs parsing (syntax analysis): matches the stream of classified words against the grammar rules of the input language and tries to build a derivation.
    Role: determines whether the input stream is a sentence in the source language (a sentence allowed by its grammar).

    || Type checker:
    Performs type checking (string / integer ...) on well-formed sentences.
    Role: checks whether names in the input program are used with consistent types

  2. Intermediate representation:
    Generates the IR form of the code. Because many kinds of IR exist, the compiler writer has to choose among them.
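
The scanner and parser described above can be sketched for a deliberately tiny language. Everything here is an assumption made for illustration: the token categories (`number`, `name`, `plus`), the one-rule grammar `Expr -> Operand { plus Operand }`, and the function names `scan` and `parse`. The parser only answers yes or no; it does not build a derivation tree or emit IR.

```python
import re

# Each pattern defines one part of speech (token category); names are hypothetical.
TOKEN_SPEC = [("number", r"\d+"), ("name", r"[A-Za-z_]\w*"),
              ("plus", r"\+"), ("skip", r"\s+")]
TOKEN_RE = re.compile("|".join(f"(?P<{p}>{rx})" for p, rx in TOKEN_SPEC))


def scan(text):
    """Scanner: turn the character string into a list of (p, s) pairs."""
    pairs, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        if m.lastgroup != "skip":          # whitespace is not a word
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs


def parse(pairs):
    """Parser: accept  Expr -> Operand { plus Operand },
    Operand -> number | name.  True if the word stream is a sentence."""
    def is_operand(i):
        return i < len(pairs) and pairs[i][0] in ("number", "name")

    if not is_operand(0):
        return False
    i = 1
    while i < len(pairs):
        if pairs[i][0] != "plus" or not is_operand(i + 1):
            return False
        i += 2
    return True


print(scan("x + 42"))          # [('name', 'x'), ('plus', '+'), ('number', '42')]
print(parse(scan("x + 42")))   # True
print(parse(scan("x + + 2")))  # False
```

Note that the parser looks only at the part of speech `p` of each pair, never at the spelling `s`; that separation is what lets one grammar rule cover many sentences.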


In the optimizer:

|| Reason: The front end generates IR one statement at a time, in the order the statements appear in the source code, so the initial IR has to work in any surrounding context. At run time, however, the code executes in a more limited and predictable context.
Function: The optimizer analyzes the IR form of the code to discover facts about that context, and uses this knowledge to rewrite the code so that it computes the same answer more efficiently.

|| Translation steps that occur during optimization

  1. Analysis: determine where in the program optimization techniques can be applied safely and profitably
    Commonly used analysis techniques: data-flow analysis / dependence analysis

  2. Transformation: rewrite the analyzed code
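
The analysis/transformation split can be illustrated with constant propagation and folding on straight-line code. This is a minimal sketch, not the book's example: the four-field IR tuples `(dest, op, a, b)`, the restriction to `add` and `mul`, and the assumption that every name is assigned exactly once are all choices made for the illustration.

```python
def analyze_constants(ir):
    """Analysis: find names whose values are known at compile time.
    Assumes straight-line code in which each name is assigned once."""
    known = {}
    for dest, op, a, b in ir:
        a, b = known.get(a, a), known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = a + b if op == "add" else a * b
    return known


def rewrite(ir, known):
    """Transformation: replace computable operations with constant loads
    and substitute known values into the remaining operations."""
    out = []
    for dest, op, a, b in ir:
        if dest in known:
            out.append((dest, "const", known[dest], None))
        else:
            out.append((dest, op, known.get(a, a), known.get(b, b)))
    return out


ir = [("t1", "add", 2, 3),
      ("t2", "mul", "t1", 4),
      ("t3", "add", "x", "t2")]

facts = analyze_constants(ir)
print(facts)               # {'t1': 5, 't2': 20}
print(rewrite(ir, facts))
```

The first pass changes nothing and only collects facts; the second pass uses those facts to produce code that computes the same `t3` with one run-time addition instead of three operations.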

|| Examples: [figure not recovered]


In the back end:

|| Reason: The back end traverses the optimized IR code. For each IR operation it selects target-machine operations that implement it, decides which values can reside in registers and which must be kept in memory, inserts code to enforce those decisions, and chooses an efficient execution order.

|| Translation steps in the back end

  1. Instruction selection:
    Rewrites each IR operation as target-machine operations; this process is called instruction selection

  2. Register allocation (minimizing memory use):
    During instruction selection, the compiler deliberately ignores the fact that the target machine has a limited number of registers. This stage therefore rewrites the code so that it fits the registers actually available

  3. Instruction scheduling (minimizing time):
    Reorders the instructions to minimize the time wasted waiting for operands to become available
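
The first two back-end steps can be sketched on straight-line code. The opcode names in `OPCODES`, the `(opcode, src1, src2, dest)` tuple layout, and the functions `select_instructions` and `allocate_registers` are hypothetical; the allocator is a simplified scheme that frees a register after the last use of its value and gives up instead of spilling. Instruction scheduling is not shown.

```python
OPCODES = {"add": "ADD", "mul": "MULT"}  # hypothetical target opcodes


def select_instructions(ir):
    """Instruction selection: one target operation per IR operation,
    written over an unlimited supply of virtual registers (the IR names)."""
    return [(OPCODES[op], a, b, dest) for dest, op, a, b in ir]


def allocate_registers(code, k):
    """Register allocation for straight-line code with k physical registers.
    Assumes each name is assigned once. Raises when more than k values are
    live at the same time (a real allocator would spill to memory)."""
    last_use = {}
    for i, (_, a, b, _) in enumerate(code):
        last_use[a] = last_use[b] = i
    free = set(range(k))
    home = {}  # virtual register -> physical register number

    def reg(v):
        if v not in home:
            if not free:
                raise RuntimeError("more than k live values; a spill is needed")
            home[v] = min(free)
            free.remove(home[v])
        return f"r{home[v]}"

    out = []
    for i, (opcode, a, b, dest) in enumerate(code):
        ra, rb = reg(a), reg(b)
        for v in (a, b):                      # release operands that die here
            if last_use[v] == i and v in home:
                free.add(home.pop(v))
        out.append((opcode, ra, rb, reg(dest)))
    return out


ir = [("t1", "add", "a", "b"),
      ("t2", "mul", "t1", "c"),
      ("t3", "add", "t2", "t1")]
virtual = select_instructions(ir)
print(virtual)
print(allocate_registers(virtual, 2))
# [('ADD', 'r0', 'r1', 'r0'), ('MULT', 'r0', 'r1', 'r1'), ('ADD', 'r1', 'r0', 'r0')]
```

The selected code names six different values, yet two physical registers are enough because at most two of them are live at any point; with `k = 1` the sketch raises instead of inserting loads and stores.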
