Featured Foreign Videos: Introduction to Compiler Principles, Part 3

Syntax Analysis

The purpose of this stage is to find any syntax errors in the tokenized program. A syntax error is any content that breaks the rules of the language.

Analyzing the syntax of a program is like checking whether a simple sentence written in English is grammatically correct. Such a sentence is valid because it conforms to certain rules of grammar.
Before we can apply the grammar rules of a programming language to a tokenized program, the grammar of that particular language must have been defined in some way, and it needs to be written in a notation that everyone recognizes.

For programming languages, this definition is usually written in a notation for context-free grammars. BNF (Backus-Naur Form, also called Backus Normal Form) is widely used to describe context-free grammars.

In BNF, a separator (the vertical bar) separates the alternative forms that a construct can take. The same convention is commonly used when writing code.

In this particular language, a condition is composed of two expressions and a relational operator
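
As a rough illustration, a BNF-style grammar can be held as plain data and checked against a list of tokens. The grammar below is a hypothetical toy one (the grammar from the original figures is not reproduced here), and the names GRAMMAR, match and is_condition are made up for this sketch. Each rule maps to a list of alternatives, which is what the separator expresses in BNF.

```python
# Hypothetical toy grammar, written in BNF as:
#   <condition>  ::= <expression> <rel_op> <expression>
#   <expression> ::= IDENT | NUMBER
#   <rel_op>     ::= "<" | ">" | "=="
# Each rule maps to a list of alternatives; each alternative is a sequence of symbols.
GRAMMAR = {
    "condition": [["expression", "rel_op", "expression"]],
    "expression": [["IDENT"], ["NUMBER"]],
    "rel_op": [["<"], [">"], ["=="]],
}


def match(symbol, tokens, pos):
    """Return the position after matching `symbol` at tokens[pos], or None."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must equal the token type at this position.
        if pos < len(tokens) and tokens[pos] == symbol:
            return pos + 1
        return None
    for alternative in GRAMMAR[symbol]:
        cur = pos
        for part in alternative:
            cur = match(part, tokens, cur)
            if cur is None:
                break          # this alternative failed, try the next one
        else:
            return cur         # every part of the alternative matched
    return None


def is_condition(tokens):
    """True if the whole token list forms exactly one <condition>."""
    return match("condition", tokens, 0) == len(tokens)


print(is_condition(["IDENT", "<", "NUMBER"]))   # True
print(is_condition(["IDENT", "NUMBER"]))        # False
```

The matcher takes the first alternative that succeeds and never revisits that choice, which is enough for this tiny grammar but not for grammars in general.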

The parser for a specific programming language usually implements the relevant context-free grammar in the logic of its own code; the compiler does not use the BNF directly.

For a program written in a high-level language, its various constructs can be represented by the nodes and branches of a parse tree, which serves as an intermediate representation.

Behind the scenes, the parse tree is a temporary in-memory structure in which each node can be implemented as an instance of a class.
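
A minimal sketch of such a node class might look like the following. The class name Node and its fields (kind, text, children) are assumptions made for illustration, not taken from the video.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                      # grammar rule or token type
    text: str = ""                                 # source text, only set on leaf nodes
    children: list = field(default_factory=list)   # child nodes, empty for leaves

    def leaves(self):
        """Collect the source text of all leaf nodes, left to right."""
        if not self.children:
            return [self.text]
        result = []
        for child in self.children:
            result.extend(child.leaves())
        return result


# Hand-built tree for the condition "x < 10".
tree = Node("condition", children=[
    Node("expression", children=[Node("identifier", "x")]),
    Node("rel_op", children=[Node("operator", "<")]),
    Node("expression", children=[Node("number", "10")]),
])

print(" ".join(tree.leaves()))   # prints: x < 10
```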

The parser examines each token supplied by the lexical analyzer and determines what type of token it is. For each token, the routine that implements the corresponding grammar rule is called, and the token's information is then inserted at the appropriate position in the parse tree.

While constructing the parse tree, the parser checks whether the grammar of the language has been used correctly in the source code. For example, if the if keyword is not followed by a valid condition as defined by the grammar, or the end if keyword is missing, an error message is generated and compilation fails. The compiler also checks that identifiers are well formed (for example, a variable name should start with a letter) and that parentheses and quotation marks are used in pairs.

A so-called recursive descent parser builds the parse tree from the top down. A parse tree is also called a concrete syntax tree, because it contains almost all of the information provided by the lexical analyzer, together with the syntactic definition of each token recognized by the parser. If you look at the leaf nodes of the whole tree, you can see the fragments of our source code among all the grammar clutter.

Only limited checks are made during the construction of a parse tree.
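
The recursive descent idea can be sketched as one method per grammar rule, each of which either returns a subtree or reports a syntax error. Everything here is an assumption for illustration: the toy grammar in the comment, the whitespace-splitting stand-in for a lexical analyzer, and the names Parser, parse and leaves.

```python
# Hypothetical grammar (not taken from the original figures):
#   <if_statement> ::= "if" <condition> "then" <assignment> "end" "if"
#   <condition>    ::= <expression> <rel_op> <expression>
#   <assignment>   ::= <identifier> "=" <expression>
#   <expression>   ::= <identifier> | <number>
REL_OPS = {"<", ">", "==", "!="}
KEYWORDS = {"if", "then", "end"}


class Parser:
    def __init__(self, source):
        self.tokens = source.split()   # stand-in for a real lexical analyzer
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, word):
        if self.peek() != word:
            raise SyntaxError(f"expected '{word}' but found {self.peek()!r}")
        self.pos += 1
        return ("terminal", word)

    def identifier(self):
        tok = self.peek()
        # Identifiers must start with a letter and must not be keywords.
        if tok is None or tok in KEYWORDS or not tok[0].isalpha():
            raise SyntaxError(f"expected an identifier but found {tok!r}")
        self.pos += 1
        return ("identifier", tok)

    def expression(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
            return ("expression", [("number", tok)])
        return ("expression", [self.identifier()])

    def condition(self):
        left = self.expression()
        op = self.peek()
        if op not in REL_OPS:
            raise SyntaxError(f"expected a relational operator but found {op!r}")
        self.pos += 1
        return ("condition", [left, ("rel_op", op), self.expression()])

    def assignment(self):
        return ("assignment", [self.identifier(), self.expect("="), self.expression()])

    def if_statement(self):
        # List elements are evaluated left to right, mirroring the grammar rule.
        parts = [self.expect("if"), self.condition(), self.expect("then"),
                 self.assignment(), self.expect("end"), self.expect("if")]
        return ("if_statement", parts)


def parse(source):
    parser = Parser(source)
    tree = parser.if_statement()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r} after 'end if'")
    return tree


def leaves(node):
    """Return the source fragments stored in the leaf nodes, left to right."""
    kind, value = node
    if isinstance(value, str):
        return [value]
    return [leaf for child in value for leaf in leaves(child)]


tree = parse("if x < 10 then y = 1 end if")
print(" ".join(leaves(tree)))   # prints: if x < 10 then y = 1 end if
```

Calling parse on "if x then y = 1 end if" or on "if x < 10 then y = 1" raises SyntaxError in this sketch, which corresponds to the missing condition and missing end if cases described above.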

The process of analyzing the meaning of an English sentence is called semantic analysis.

Semantic Analysis

The memory addresses of local variables and parameters are expressed as offsets relative to the starting position of the stack used during program execution. In cases where the compiler cannot determine how much space will be needed, the memory must be allocated dynamically. This happens at run time, and dynamically allocated identifiers are usually given space in a memory area called the heap. Of course, where a variable ends up in physical memory depends on the operating system on which the program is running.

Because the symbol table is accessed and modified frequently during all stages of compilation, it needs to be implemented with a data structure that allows fast access. For this reason, the symbol table is usually implemented as a hash table: each symbol serves as the key for the hash function, and the symbol's attributes (for example a record, or the attributes of an object) are stored in the hash table. Some compilers instead use an ordered linear list to implement the symbol table, which can then be searched with binary search.
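
Both symbol table approaches can be sketched in a few lines. Python's built-in dict is a hash table, and the standard bisect module provides binary search over a sorted list. The attribute fields used here (type, scope, offset) and the class name SortedSymbolTable are assumptions for illustration only.

```python
import bisect

# Hash-table version: the symbol name is the key, its attributes are the value.
symbols = {}
symbols["count"] = {"type": "int", "scope": "local", "offset": 0}
symbols["total"] = {"type": "float", "scope": "local", "offset": 4}


def lookup(name):
    """Return the attributes of a symbol, or None if it is not declared."""
    return symbols.get(name)


# Ordered-list version: names are kept sorted so lookups can use binary search.
class SortedSymbolTable:
    def __init__(self):
        self.names = []   # sorted symbol names
        self.attrs = []   # attributes, kept in the same order as names

    def insert(self, name, attributes):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            self.attrs[i] = attributes          # redeclaration: overwrite
        else:
            self.names.insert(i, name)
            self.attrs.insert(i, attributes)

    def lookup(self, name):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.attrs[i]
        return None


table = SortedSymbolTable()
table.insert("total", {"type": "float"})
table.insert("count", {"type": "int"})
print(table.names)             # prints: ['count', 'total']
print(lookup("count")["type"]) # prints: int
```

The hash table gives lookups that take roughly constant time on average, while the sorted list needs a number of comparisons that grows with the logarithm of the table size and pays extra for each insertion.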

Reverse Polish notation: no parentheses are needed when compiling expressions
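
To illustrate why Reverse Polish notation needs no parentheses, here is a small sketch that converts an infix expression to Reverse Polish form and then evaluates it with a stack. The conversion uses Dijkstra's shunting-yard algorithm, which the text above does not mention; it is simply one common way to do it. The sketch assumes space-separated tokens, balanced parentheses and only the four basic operators.

```python
import operator

# Operator precedence and implementation; all four are left-associative.
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def to_rpn(tokens):
    """Convert a list of infix tokens to Reverse Polish order."""
    output, stack = [], []
    for tok in tokens:
        if tok in OPS:
            # Pop operators of equal or higher precedence before pushing this one.
            while stack and stack[-1] != "(" and OPS[stack[-1]][0] >= OPS[tok][0]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                      # discard the matching "("
        else:
            output.append(tok)               # operand
    while stack:
        output.append(stack.pop())
    return output


def eval_rpn(tokens):
    """Evaluate Reverse Polish tokens using a stack of operands."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok][1](left, right))
        else:
            stack.append(float(tok))
    return stack[0]


rpn = to_rpn("( 3 + 4 ) * 2".split())
print(" ".join(rpn))    # prints: 3 4 + 2 *
print(eval_rpn(rpn))    # prints: 14.0
```

The order of the operators in the output already encodes the grouping, so the evaluator never has to look at a parenthesis.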


Origin blog.csdn.net/weixin_44522477/article/details/112061578