Summary of Compilation Principles ("Compilation and Decompilation Technology")

Mind map

[Figure: mind map of the course topics]

Introduction

What is a compiler?

A program that translates a program written in a high-level language into an equivalent program in a lower-level language.

What are the stages of the compilation process?

Lexical analysis, syntax analysis, semantic analysis and intermediate code generation, intermediate code optimization, and target code generation.

Draw the structure diagram of the compiler.

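Source program
-> lexical analyzer
-> syntax analyzer
-> semantic analyzer + intermediate code generator
-> optimizer
-> target code generator
-> target program

The symbol table manager and the error handler stand beside this pipeline and interact with every phase.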

What is the difference between a compiler and an interpreter?

An interpreter either executes the source program directly while interpreting it, or first translates it into some intermediate representation and then executes that. Control stays in the interpreter.
A compiler translates the source program into a target-language program, which is then run on the machine. Control is in the target program.

Why use the concepts of pass and phase?

Pass: one scan of the source program or of its intermediate representation from beginning to end, performing the associated processing and producing a new intermediate result or the target program.
One pass may contain several phases, and one phase may be split across several passes.
The purpose is a better program structure: a clearer organization makes the compiler easier to read.

Lexical analysis

Token classes, and the input and output of lexical analysis

Classification: keywords (reserved words), identifiers, constants, operators, delimiters
Input: source program
Output: tokens (word symbols)

Word formation rules

Representation: token class + token attribute value.
The token class is usually represented by an integer code:

1. If a class contains only one token:
the class code alone identifies the token.
In general, each keyword, operator, and delimiter is its own class.

2. If a class contains several tokens:
each token is given as the class code plus attribute information.
Identifiers form a single class: the attribute is a pointer to the entry that stores the identifier's information.
Constants are classified by type: the attribute is the constant's value in standard binary form.
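The (class code, attribute) representation described above can be sketched as a small scanner. This is only an illustration: the class codes, the tiny token set, and the list used as a symbol table are all assumptions made for the example.

```python
import re

# Hypothetical class codes: one code per keyword, operator and delimiter,
# one shared code for all identifiers, one for integer constants.
KEYWORDS = {"if": 1, "while": 2}
OPERATORS = {"+": 10, "*": 11, "=": 12}
DELIMITERS = {"(": 20, ")": 21, ";": 22}
IDENT, INT_CONST = 30, 31

# Skip whitespace, then match a number, a word, or any other single character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")

def tokenize(source):
    """Return (tokens, symbols); each token is a (class_code, attribute) pair."""
    symbols = []  # toy symbol table: an identifier's attribute is its index here
    tokens = []
    for number, word, other in TOKEN_RE.findall(source):
        if number:
            tokens.append((INT_CONST, int(number)))  # attribute: binary value
        elif word in KEYWORDS:
            tokens.append((KEYWORDS[word], None))    # class code is enough
        elif word:
            if word not in symbols:
                symbols.append(word)
            tokens.append((IDENT, symbols.index(word)))
        elif other in OPERATORS:
            tokens.append((OPERATORS[other], None))
        elif other in DELIMITERS:
            tokens.append((DELIMITERS[other], None))
        else:
            raise ValueError(f"unexpected character {other!r}")
    return tokens, symbols

tokens, symbols = tokenize("while (x) x = x + 1;")
```

Keywords, operators, and delimiters carry no attribute because the class code already identifies them; identifiers and constants need one.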

The concepts of grammar and language and how each is derived from the other

Language: any subset L of the set of all strings over an alphabet is called a language over that alphabet; an element of the language is called a sentence.
Grammar: the formal rules used to describe the syntactic structure of a language.
Language of a grammar: the set of all terminal strings (sentences) that can be derived from the grammar's start symbol.

Leftmost and rightmost derivations, and finding the syntax tree

Leftmost derivation: replace the leftmost nonterminal at every step.
Rightmost derivation: replace the rightmost nonterminal at every step.

Syntax tree: every node is labeled with a symbol from Vt or Vn, the root is the start symbol S, interior nodes are nonterminals, and the leaves of the tree for a sentence are terminals (Vt).
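The difference between the two derivation orders can be seen on a toy grammar. The sketch below assumes the hypothetical grammar E -> E + E | id and applies the same productions once leftmost and once rightmost.

```python
# Hypothetical toy grammar: E -> E + E | id. E is the only nonterminal.
NONTERMINALS = {"E"}

def derive(start, steps, leftmost=True):
    """Apply each (lhs, rhs) production in `steps` to the leftmost (or
    rightmost) nonterminal and return every sentential form produced."""
    form = [start]
    forms = [" ".join(form)]
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"expected {lhs} at position {i}, found {form[i]}")
        form[i:i + 1] = rhs  # replace the chosen nonterminal by the right side
        forms.append(" ".join(form))
    return forms

steps = [("E", ["E", "+", "E"]), ("E", ["id"]), ("E", ["id"])]
left = derive("E", steps, leftmost=True)
right = derive("E", steps, leftmost=False)
```

Both derivations end in the same sentence `id + id` and correspond to the same syntax tree; only the order in which nonterminals are expanded differs.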

Ambiguity

Ambiguous grammar: some sentence of the grammar has two or more distinct syntax trees.
Ambiguous language: every grammar that generates the language is ambiguous.

Grammar classification:

See the mind map.
Type 0 (unrestricted), type 1 (context-sensitive), type 2 (context-free), type 3 (regular)

Regular expressions, regular sets:

For an alphabet, if a is a symbol of the alphabet, then a is a regular expression and {a} is a regular set.
The union, concatenation, and closure of regular sets are also regular sets.

Automata

Introduction: a model that reads the input symbols one by one and moves from state to state according to each symbol.

Construction of DFAs and NFAs

1. Regular expression -> DFA
2. NFA determinization
3. DFA minimization
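Step 2, NFA determinization, is commonly done with the subset construction. The sketch below is one minimal version; the dictionary encoding of the NFA (with `None` standing for an epsilon move) and the example automaton for (a|b)*ab are assumptions made for illustration.

```python
from collections import deque

def epsilon_closure(states, nfa):
    """All NFA states reachable from `states` using only epsilon (None) moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, None), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def determinize(nfa, start, accepting, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    d_start = epsilon_closure({start}, nfa)
    transitions, seen, queue = {}, {d_start}, deque([d_start])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            moved = {t for s in current for t in nfa.get((s, a), ())}
            if not moved:
                continue  # no transition on this symbol: implicit dead state
            target = epsilon_closure(moved, nfa)
            transitions[(current, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    d_accepting = {S for S in seen if S & accepting}
    return d_start, transitions, d_accepting

def accepts(dfa, text):
    """Run the DFA on `text`; a missing transition means rejection."""
    state, transitions, accepting = dfa
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:
            return False
    return state in accepting

# Hypothetical NFA for (a|b)*ab: state 0 loops on a and b, 0 -a-> 1 -b-> 2.
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa = determinize(nfa, start=0, accepting={2}, alphabet="ab")
```

Only subsets reachable from the start state are built, so the resulting DFA is usually much smaller than the full power set of NFA states.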

Syntax analysis

Input and output

Input: token stream
Output: parse tree

Top-down problems and solutions

Problems: left recursion, backtracking, false matches, failed analyses, not knowing where the error is, low efficiency, and high cost.

Solutions: rewrite left recursion as right recursion, factor out common left factors to eliminate backtracking, compute FIRST and FOLLOW sets, use an LL(1) grammar and a predictive parsing table.
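The first remedy, turning left recursion into right recursion, follows a fixed pattern for immediate left recursion: A -> A α | β becomes A -> β A' and A' -> α A' | ε. A minimal sketch, assuming productions are lists of symbols and the new nonterminal is named by appending a prime:

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A a1 | ... | b1 | ... as
    A -> b1 A' | ...  and  A' -> a1 A' | ... | epsilon.
    Each production is a list of symbols; [] stands for epsilon."""
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    others = [p for p in productions if not p or p[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}  # nothing to do
    new = nonterminal + "'"
    return {
        nonterminal: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[]],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
grammar = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
```

This handles only immediate left recursion in a single nonterminal; indirect left recursion needs the nonterminals to be ordered and substituted first.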

Bottom-up core issues

Identifying the reducible string (the handle) and reducing it.

Canonical reduction and rightmost derivation

Canonical reduction: the reverse of a rightmost derivation.
Sentential forms obtained by canonical (rightmost) derivation: canonical sentential forms.
Essence of canonical reduction: whenever a handle appears, reduce it.

Phrases, direct phrases, sentential forms, handles

Phrase: the string formed by the leaves of a subtree of the syntax tree.
Direct phrase: a phrase whose subtree has only two levels, the root and its children.
Sentential form: a string of symbols from Vt and Vn that can be derived from the start symbol.
Handle: the leftmost direct phrase.

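Prefix, viable prefix, valid item

Prefix: any leading substring of a string.
Viable prefix: a prefix of a canonical sentential form that does not extend past the right end of the handle.
Valid item: an item A -> β1·β2 is valid for the viable prefix αβ1 if there is a rightmost derivation S =>* αAw => αβ1β2w.
Canonical collection of item sets: a family of sets of items; linked by transitions on grammar symbols, the sets form a DFA that recognizes viable prefixes.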

Integrating semantic analysis into syntax analysis

Syntax-directed translation

Static semantic checking and intermediate code generation are carried out as part of syntax analysis.

Syntax-directed definition

Built on a context-free grammar: every grammar symbol (in Vt or Vn) is given a set of attributes, and every production is given a set of semantic rules.

Attribute grammar

A syntax-directed definition whose semantic rule functions have no side effects.
No side effects: the rules only compute attribute values.

S-attributed and L-attributed grammars

Synthesized attribute: a synthesized attribute of nonterminal A at parse-tree node N may be defined only from the attributes of N's children and of N itself.
Inherited attribute: an inherited attribute of nonterminal A at parse-tree node N may be defined only from the attributes of N's parent, N's siblings, and N itself.
S-attributed grammar: a syntax-directed definition that uses only synthesized attributes.
L-attributed grammar: a syntax-directed definition in which every attribute is either synthesized, or inherited with a left-to-right restriction: in a production A -> X1 X2 ... Xn, an inherited attribute of Xj may depend only on the inherited attributes of A and on the attributes of X1 ... Xj-1 to its left.
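Because synthesized attributes depend only on a node's children, an S-attributed definition can be evaluated in one bottom-up (post-order) walk of the parse tree. The sketch below assumes a hypothetical expression grammar (E -> E + T | T, T -> T * F | F, F -> digit) and a tuple encoding of parse-tree nodes chosen for the example.

```python
# Parse-tree nodes are (symbol, children); leaves are (symbol, lexeme).
# Hypothetical grammar: E -> E + T | T,  T -> T * F | F,  F -> digit.

def val(node):
    """Synthesized attribute `val`, computed only from the children's `val`."""
    symbol, body = node
    if symbol == "digit":
        return int(body)                      # lexical value from the scanner
    children = [c for c in body if c[0] not in ("+", "*")]
    values = [val(c) for c in children]       # post-order: children first
    if len(values) == 1:
        return values[0]                      # E -> T, T -> F, F -> digit
    if symbol == "E":
        return values[0] + values[1]          # E.val = E1.val + T.val
    if symbol == "T":
        return values[0] * values[1]          # T.val = T1.val * F.val
    raise ValueError(f"unexpected node {symbol}")

def F(d):
    """Helper that builds the subtree F -> digit."""
    return ("F", [("digit", str(d))])

# Parse tree of 2 + 3 * 4
tree = ("E", [
    ("E", [("T", [F(2)])]),
    ("+", "+"),
    ("T", [("T", [F(3)]), ("*", "*"), F(4)]),
])
result = val(tree)
```

The same evaluation order is what a bottom-up (LR) parser performs naturally when it computes attributes at each reduction.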

Translation scheme

Associate attributes (values) with the grammar symbols (Vt, Vn), enclose the semantic rules in braces "{}", and insert them into the right-hand side of the productions to describe the translation of the language construct.

Abstract syntax tree

Information that is not needed for translation is removed from the syntax tree, giving a more efficient intermediate representation of the source program. The transformed syntax tree is called an abstract syntax tree.

Intermediate language representation and syntax-directed translation examples

Why use an intermediate language

It facilitates machine-independent code optimization,
keeps the compiler front end consistent across target machines, which eases porting, and
makes the structure of the compiler logically simpler and clearer.

Intermediate language representation

Reverse Polish notation (postfix), triples, indirect triples, quadruples
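Two of these forms can be compared on one expression. The sketch below borrows Python's own `ast` module as a stand-in parser (an assumption made to keep the example short) and emits reverse Polish notation and quadruples of the form (op, arg1, arg2, result); the temporary names t1, t2, ... are illustrative.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_postfix(node):
    """Reverse Polish notation: operands first, operator last."""
    if isinstance(node, ast.Name):
        return [node.id]
    if isinstance(node, ast.BinOp):
        return to_postfix(node.left) + to_postfix(node.right) + [OPS[type(node.op)]]
    raise ValueError("unsupported expression")

def to_quadruples(node, quads):
    """Append (op, arg1, arg2, result) tuples to `quads` and return the
    name that holds the value of `node`."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp):
        left = to_quadruples(node.left, quads)
        right = to_quadruples(node.right, quads)
        temp = f"t{len(quads) + 1}"  # fresh temporary for this result
        quads.append((OPS[type(node.op)], left, right, temp))
        return temp
    raise ValueError("unsupported expression")

tree = ast.parse("a + b * c", mode="eval").body
postfix = to_postfix(tree)
quads = []
to_quadruples(tree, quads)
```

A triple would drop the result field and refer to an earlier instruction by its position instead of by a temporary name.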

Examples of statement translation

Translation of declaration statements
Translation of assignment statements
Translation of control statements

Symbol table

Symbol table composition

Name column
Information column: records the attributes of the name (type, kind, size, storage address)

Symbol table function

1. Record the attribute information of identifiers
2. Look up symbol attributes and check that each use of a symbol is semantically legal in its context
3. Serve as the basis for address allocation during target code generation
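The three functions above map onto a very small data structure. The sketch below is a single-scope table under simplifying assumptions: the field names, the byte sizes, and the sequential address allocation are all chosen for illustration.

```python
class SymbolTable:
    """Name column plus an information column (type, kind, size, address)."""

    def __init__(self):
        self.entries = {}
        self.next_address = 0  # next free offset, used for address allocation

    def declare(self, name, type_, kind, size):
        """Function 1: record the attributes of a newly declared identifier."""
        if name in self.entries:
            raise KeyError(f"{name!r} is already declared")
        info = {"type": type_, "kind": kind, "size": size,
                "address": self.next_address}
        self.entries[name] = info
        self.next_address += size  # function 3: basis for address allocation
        return info

    def lookup(self, name):
        """Function 2: fetch attributes and reject undeclared names."""
        if name not in self.entries:
            raise KeyError(f"{name!r} is used before declaration")
        return self.entries[name]

table = SymbolTable()
table.declare("i", "int", "variable", 4)
table.declare("x", "double", "variable", 8)
```

A table for a block-structured language would additionally keep one such mapping per scope and search from the innermost scope outward.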

Optimization

Purpose:

Produce more efficient code

Principles:

Equivalence, effectiveness, and cost-effectiveness

Commonly used techniques:

See the mind map

Basic block concept

A sequence of statements in a program that are executed one after another, with exactly one entry and one exit. The entry is the first statement and the exit is the last statement.
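Basic blocks are usually found by marking "leaders": the first instruction, every jump target, and every instruction that follows a jump. The sketch below assumes a toy three-address code in which jumps are written `goto N` or `if ... goto N`, with N a 0-based instruction index.

```python
def basic_blocks(code):
    """Split three-address code into basic blocks.
    Each instruction is a string; jumps look like 'goto N' or
    'if ... goto N', where N is a 0-based instruction index (an assumption)."""
    leaders = {0} if code else set()
    for i, instr in enumerate(code):
        if "goto" in instr:
            target = int(instr.split("goto")[1])
            leaders.add(target)        # a jump target starts a block
            if i + 1 < len(code):
                leaders.add(i + 1)     # so does the instruction after a jump
    starts = sorted(leaders)
    # Each block runs from one leader up to, but not including, the next.
    return [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]

code = [
    "i = 0",              # 0
    "if i >= n goto 5",   # 1
    "s = s + i",          # 2
    "i = i + 1",          # 3
    "goto 1",             # 4
    "return s",           # 5
]
blocks = basic_blocks(code)
```

Each resulting block can only be entered at its first instruction and left at its last, which is what makes local optimizations inside a block safe.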

Target code generation

Forms of target code

Absolute machine code, assembly language, relocatable machine-language modules

Characteristics of each form

Absolute machine code: all addresses have already been fixed.
Assembly language: must be assembled by an assembler to be converted into machine language.
Relocatable machine-language modules: must be linked with run-time library routines to be converted into executable machine code.

Target code execution

Activation record

A contiguous storage area that holds the dynamic information needed for one execution (activation) of a procedure

Storage allocation strategy

Static storage allocation
Dynamic storage allocation

Stack storage allocation

1. Static chain: each activation record holds a static link
2. Display: a table indexed by nesting level
Heap storage allocation

Allow data objects to be allocated and released freely


Origin blog.csdn.net/qq_42882717/article/details/112178588