[Compiler Principles Notes] Chapter 1: Introduction


Programming Languages and Compilation

The Origins of Compilation: Programming Languages

High-level language: A language that is closer to natural language or mathematical notation than machine code or assembly language. Examples include Fortran, Pascal, Java, C, C++, C#, and Python, all of which are highly abstracted programming languages.
Features: independent of any specific machine, portable, undemanding of the user, and easy to use and maintain.
Low-level language: includes bit code, machine language, and assembly language.
Features: tied to a specific machine and highly efficient, but complicated, tedious, time-consuming, and error-prone to use.
Assembly language: A symbolic abstraction of machine language, used as a low-level language on computers, microprocessors, microcontrollers, and other programmable devices. On different devices, assembly language corresponds to different machine instruction sets. An assembly language is specific to one computer architecture, unlike many high-level languages, which can be ported between platforms.
Machine language: The set of instructions, expressed in binary code, that a computer can recognize and execute directly.

Appearance of compilers: A program written in a high-level language cannot be executed directly by the computer. It must first be processed by a "translation program" that converts it into an equivalent machine language program, which the machine can then execute. Such a translation program is called a "compiler".
Executing a high-level language program on a computer takes two steps: ① translate the high-level language program into a machine language program; ② run the resulting machine language program to obtain the result.


Conversion of programming languages

Translation: Converting a source program written in one language into a program in another language (the target language program) without changing its semantics.
Translator: A general term for assemblers, compilers, and other conversion programs.
Source program: A program written in assembly language or a high-level language.
Object program: A program expressed in the target language.
Target language: It can be an "intermediate language" between the source language and machine language, the machine language of a particular machine, or the assembly language of a particular machine.
Two implementations of the translation process (different mechanisms, similar purpose):




Interpretation: Accept one statement of the high-level language as input, interpret it and direct the computer to execute it, obtain the result of that statement immediately, and then accept the next statement. In BASIC, for example, each statement is interpreted as soon as it is typed.
Process: The source program is the input, no target program is generated, and the program is executed while it is being interpreted. This approach is intuitive, structurally simple, and well suited to interactive use, but it is inefficient.
(Figure: the working process of an interpreter)

Compilation: The conversion from a high-level language to a low-level language, in which the whole high-level language program is translated at once. C and Pascal are generally compiled.
Conversion process:
Two-stage conversion: compile, then run. The compiler generates machine language directly.
Three-stage conversion: compile, assemble, then run. The compiler generates an assembly language program, which can be executed after it has been assembled.
(Figure: three-stage conversion)
Assembler: If the source program is written in assembly language and the translation program produces a machine language program, the translation program is called an assembler, and the process is called "assembly".
Compiler: If the source program is written in a high-level language and is translated into a target program, the translation program is called a compiler, and the process is called "compilation".
Assemblers and compilers are both translation programs; the main difference is what they process. Because the format of assembly language is simple and assembly instructions often correspond one-to-one to machine instructions, an assembler's translation work is much simpler than a compiler's.
Compile-interpret execution
Compiler generation:
writing the compiler directly in machine language;
writing the compiler in assembly language (the core part of a compiler is usually written in assembly language);
writing the compiler in a high-level language (the common approach).

Self-compilation (bootstrapping, the "snowball" method)
Compiler construction tools: LEX (lexical analyzer generator), YACC (parser generator that automatically builds LALR parsing tables)
Porting: moving a compiler for the same language between different types of machines.


PL/0 General Introduction (※Experimental)

Reference: Syntax description of PL/0 language

PL/0 Compiler Overall Structure

Organization as a T-diagram
PL/0 compiler: a one-pass compiler organized around the syntax and semantic analysis routines.
EBNF fragment of the PL/0 language: determines the word-formation and grammar rules.
Type, context constraint, and scoping rules: the semantic rules.

Pascal Language Features

The syntax is clear and the semantics are straightforward. Pascal is a typical nested-structure language: nested procedure definitions and calls are allowed, and names must be declared before use.
Link: [Reprint] Introduction to pascal language

PL/0 program sample

A PL/0 program consists of a declaration part and a statement part:
- The declaration part contains the constant and variable definitions and the procedure declarations (in the sample, the var line and procedure gcd).
- Procedures declared in the main program can themselves contain nested local procedures.
- The statement part is the entry point of the main program; execution of a PL/0 program starts here (in the sample, the final begin ... end. block).

var m,n,r,q;
{compute the greatest common divisor}
	procedure gcd;
	begin
		while r#0 do
			begin
				q:=m/n;
				r:=m-q*n;
				m:=n;
				n:=r;
			end;
	end;

begin
	read(m);
	read(n);
	if m<n then
	{for convenience, make sure m>=n}
		begin
			r:=m;
			m:=n;
			n:=r;
		end;
	begin
		r:=1;
		call gcd;
		write(m);
	end;
end.

P-code-like virtual machine

The PL/0 compiler is not a "real" compiler: the object code it generates is not real assembly code (CPU instructions). It needs a "CPU" that understands these instructions, namely a P-code virtual machine, which is the code interpreter part of the system.
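To make the idea of a virtual "CPU" concrete, here is a minimal sketch of a stack-machine interpreter in Python. The instruction names (LIT, LOD, STO, ADD, SUB, MUL, DIV, JMP, JPC, WRT, HLT) are a simplified, hypothetical set chosen for illustration; they are not the exact PL/0 P-code instruction set, and procedure calls and nesting levels are left out.

```python
def run(code, memory_size=8):
    """Interpret a list of (opcode, operand) pairs on a simple stack machine."""
    stack = []                   # evaluation stack
    memory = [0] * memory_size   # flat variable store (no procedure frames)
    output = []                  # values produced by WRT
    pc = 0                       # program counter
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "LIT":          # push a literal constant
            stack.append(arg)
        elif op == "LOD":        # push the variable stored at address arg
            stack.append(memory[arg])
        elif op == "STO":        # pop the top of stack into address arg
            memory[arg] = stack.pop()
        elif op in ("ADD", "SUB", "MUL", "DIV"):
            right = stack.pop()
            left = stack.pop()
            if op == "ADD":
                stack.append(left + right)
            elif op == "SUB":
                stack.append(left - right)
            elif op == "MUL":
                stack.append(left * right)
            else:
                stack.append(left // right)   # integer (floor) division
        elif op == "JMP":        # unconditional jump
            pc = arg
        elif op == "JPC":        # pop the top of stack and jump if it is zero
            if stack.pop() == 0:
                pc = arg
        elif op == "WRT":        # pop the top of stack and write it out
            output.append(stack.pop())
        elif op == "HLT":        # stop the machine
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return output


# Hand-written code for: x := 2 + 3 * 4; write(x)
program = [
    ("LIT", 2), ("LIT", 3), ("LIT", 4),
    ("MUL", None), ("ADD", None),
    ("STO", 0), ("LOD", 0), ("WRT", None), ("HLT", None),
]
print(run(program))
```

The compiler's only job is then to emit such an instruction list; the interpreter loop plays the role of the hardware.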

(I will stop here for now; I have only just learned this and do not fully understand it yet.)


Compiler overview

Similar to natural language translation.

Compiler's job

It is customary to divide the compilation process into five basic stages: lexical analysis, syntax analysis, semantic analysis and intermediate code generation, code optimization, and target program generation.

Lexical analysis

Scan the source program (a character sequence), recognize the words according to the lexical rules of the language, and output them in a fixed encoded form (a uniform representation for later stages to use).

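Word (token): the basic syntactic unit of the language, carrying real meaning.
Languages generally have four classes of words:
	Keywords: components of commands; the keywords or reserved words defined by the language (such as BEGIN, END, IF).
	Identifiers: user-defined function names, procedure names, variable names, and constant names.
	Constants: values that do not change during the whole run of the program.
	Delimiters:
			Operators: such as +, -, *, / ...
			Boundary symbols: separate identifiers, statements, and so on, such as semicolons and parentheses.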

Format after conversion: (class number, internal code)
Effective tools for describing lexical rules: regular expressions (which specify whether a word conforms to the lexical description) and finite automata (which perform the matching).
The conversion conventions are described in the section on table management.
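The (class number, internal code) output format can be illustrated with a small regular-expression-based scanner. This is only a sketch: the class numbers 1 to 4, the keyword list, and the choice to use the lexeme itself as the internal code are assumptions made for the example, not the encoding of any particular compiler.

```python
import re

# Assumed class numbers for the four classes of words.
KEYWORD, IDENT, NUMBER, DELIM = 1, 2, 3, 4

KEYWORDS = {"begin", "end", "if", "then", "while", "do",
            "var", "const", "procedure", "call", "read", "write"}

# One alternative per word class: number, identifier or keyword, delimiter.
TOKEN_RE = re.compile(
    r"\s*(?:(\d+)|([A-Za-z][A-Za-z0-9]*)|(:=|<=|>=|[-+*/#<>=;,.()]))"
)


def tokenize(source):
    """Return the words of `source` as (class number, internal code) pairs."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"illegal character near position {pos}")
        number, word, delim = match.groups()
        if number is not None:
            tokens.append((NUMBER, int(number)))
        elif word is not None:
            kind = KEYWORD if word.lower() in KEYWORDS else IDENT
            tokens.append((kind, word))
        else:
            tokens.append((DELIM, delim))
        pos = match.end()
    return tokens


print(tokenize("while r#0 do r := m - q * n;"))
```

A production scanner would usually store an index into the symbol or constant table as the internal code instead of the lexeme.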

Syntax analysis

According to the grammar rules (the grammar of the language, which specifies how words form grammatical units), combine the word symbols into grammatical units (phrases, clauses, statements, procedures, programs), analyze and identify the grammatical components such as expressions, declarations, statements, procedures, and functions, and check their syntactic correctness.
Representation of grammar rules:
BNF: A::=B|C (A is defined as B or C)
Syntax analysis methods: derivation and reduction, for example rightmost derivation and leftmost reduction. The analysis can also be represented by a syntax tree.
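As a rough illustration of how grammar rules turn a word sequence into a syntax tree, the sketch below parses arithmetic expressions top-down (the derivation side; a reduction-based parser would work bottom-up instead). The grammar and the nested-tuple tree format are assumptions chosen for the example: E ::= T {(+|-) T}, T ::= F {(*|/) F}, F ::= name | number | ( E ).

```python
def parse_expression(tokens):
    """Parse a list of token strings and return a syntax tree of nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expression():            # E ::= T {(+|-) T}
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():                  # T ::= F {(*|/) F}
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():                # F ::= name | number | ( E )
        token = advance()
        if token == "(":
            node = expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isalnum():
            return token
        raise SyntaxError(f"unexpected token: {token!r}")

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token: {peek()!r}")
    return tree


print(parse_expression(["a", "+", "b", "*", "c"]))
```

Each function corresponds to one grammar rule, so a syntax error shows up as a token that no rule can accept.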

Semantic analysis and intermediate code generation

Perform semantic analysis on each grammatical component that has been identified, and generate the corresponding intermediate code.
Intermediate code: A form of intermediate language between the source language and the target language, generated for the recognized grammatical categories (phrases, clauses, statements, procedures, programs).
Purpose of generating intermediate code: to make optimization easier and to make the compiler easier to port.
It is divided into two phases:
① Static semantic checking: once syntax analysis has recognized, say, an assignment statement, semantic analysis first checks its semantic correctness, for example whether the types within the expression and on both sides of the assignment sign are consistent.
② Intermediate code translation: generate intermediate code according to the semantics of the assignment statement. This replaces one language form with another and is the key step of translation. (The essence of translation: semantic equivalence.)
Form of the intermediate code: compiler writers can design it themselves; commonly used forms are quadruples, triples, and reverse Polish notation. Quadruples correspond to three-address instructions.
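A quadruple has the shape (operator, operand 1, operand 2, result). The following sketch shows one way to generate quadruples for an assignment statement from its syntax tree. The nested-tuple tree format, the temporary names T1, T2, ... and the "_" placeholder for an unused operand are assumptions made for this example.

```python
def gen_quadruples(target, tree):
    """Translate `target := tree` into (op, arg1, arg2, result) quadruples."""
    quads = []
    temp_count = 0

    def new_temp():
        nonlocal temp_count
        temp_count += 1
        return f"T{temp_count}"

    def walk(node):
        # A leaf (variable name or constant) is already an operand.
        if isinstance(node, str):
            return node
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        result = new_temp()      # each operation stores into a fresh temporary
        quads.append((op, left_name, right_name, result))
        return result

    value = walk(tree)
    quads.append((":=", value, "_", target))
    return quads


# x := a + b * c
for quad in gen_quadruples("x", ("+", "a", ("*", "b", "c"))):
    print(quad)
```

For x := a + b * c this yields the multiplication first, then the addition, then the final assignment, which matches the evaluation order the semantics require.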






Code optimization

Improve the quality of the target program. The main techniques are extracting common subexpressions, folding known constants, deleting useless statements, and loop optimization.
Principle: equivalence-preserving transformation.
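Two of these optimizations, folding known constants and extracting common subexpressions, can be sketched on a straight-line list of quadruples. The sketch assumes that constants are Python integers, names are strings, every arithmetic quadruple writes to a compiler temporary that is assigned only once, and no operand is reassigned inside the block. Deleting useless statements and loop optimization are not shown.

```python
import operator

# Operators that are safe to evaluate at compile time in this sketch.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def optimize(quads):
    """Fold constants and reuse common subexpressions in straight-line code."""
    values = {}   # temporary -> known constant or equivalent earlier name
    seen = {}     # (op, arg1, arg2) -> temporary that already holds the result
    out = []
    for op, arg1, arg2, result in quads:
        arg1 = values.get(arg1, arg1)
        arg2 = values.get(arg2, arg2)
        if op in FOLDABLE:
            if isinstance(arg1, int) and isinstance(arg2, int):
                # Constant folding: compute now, emit nothing.
                values[result] = FOLDABLE[op](arg1, arg2)
                continue
            key = (op, arg1, arg2)
            if key in seen:
                # Common subexpression: reuse the earlier temporary.
                values[result] = seen[key]
                continue
            seen[key] = result
        out.append((op, arg1, arg2, result))
    return out


quads = [
    ("*", 2, 3, "T1"),
    ("+", "a", "b", "T2"),
    ("*", "T1", "T2", "T3"),
    ("+", "a", "b", "T4"),
    ("-", "T3", "T4", "T5"),
    (":=", "T5", "_", "x"),
]
for quad in optimize(quads):
    print(quad)
```

The optimized list computes the same value for x with four quadruples instead of six, which is what "equivalence-preserving transformation" means in practice.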

Target program generation

The target program (a sequence of address instructions) is generated from the intermediate code.

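Forms of target code:
- Absolute instruction code: target code that can be executed immediately.
- Assembly instruction code: an assembly language program that must be assembled by an assembler before it can run.
- Relocatable instruction code: the target modules must first be linked and the positions of variables and constants in main memory determined; only after loading into memory does it become runnable absolute instruction code.

Different machines have different absolute instruction code.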
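As an illustration of the second form (assembly instruction code), the sketch below turns quadruples into instructions for a hypothetical single-accumulator machine. The mnemonics LOAD, STORE, ADD, SUB, MUL, and DIV are invented for the example and do not belong to any real instruction set; a real code generator would also allocate registers and avoid redundant loads and stores.

```python
# Assumed mapping from quadruple operators to mnemonics of the toy machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def gen_target(quads):
    """Emit accumulator-machine instructions for (op, arg1, arg2, result) quadruples."""
    lines = []
    for op, arg1, arg2, result in quads:
        lines.append(f"LOAD {arg1}")               # accumulator := arg1
        if op != ":=":
            lines.append(f"{OPCODES[op]} {arg2}")  # accumulator := accumulator op arg2
        lines.append(f"STORE {result}")            # result := accumulator
    return lines


quads = [("*", "b", "c", "T1"), ("+", "a", "T1", "T2"), (":=", "T2", "_", "x")]
print("\n".join(gen_target(quads)))
```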

Tables and table management

Function of tables: to record information about the source program and the state of the compilation process. Information from the source program and information produced during compilation is entered into the tables as it arises, and later stages of compilation look it up there.
The tables related to the first three stages of compilation are the symbol table, constant table, label table, subprogram entry table, and intermediate code table.

Symbol table management
Symbol table: records the constant names, variable names, array names, procedure names, and so on in the source program, together with their attributes, definitions, and references.
Constant table: one table per type of constant, recording the constant values.
Label table: records the definitions and uses of labels.
Subprogram entry table: records the nesting level of each procedure, the entry of its subprogram symbol table, and so on.
Intermediate code table: stores the generated intermediate code (quadruples in the example).
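The register-then-look-up pattern can be sketched with a small scoped symbol table. The class and method names (SymbolTable, declare, lookup, enter_scope, exit_scope) and the stored attributes are hypothetical choices for this example; real compilers record more information, such as addresses and types.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to its attribute record."""

    def __init__(self):
        self.scopes = [{}]       # scopes[0] is the main program

    def enter_scope(self):
        """Open a new scope when a procedure body begins."""
        self.scopes.append({})

    def exit_scope(self):
        """Discard the innermost scope when a procedure body ends."""
        self.scopes.pop()

    def declare(self, name, kind, **attrs):
        """Register a name in the current scope; duplicates are an error."""
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"duplicate declaration: {name}")
        entry = {"kind": kind, "level": len(self.scopes) - 1, **attrs}
        scope[name] = entry
        return entry

    def lookup(self, name):
        """Search from the innermost scope outward."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared identifier: {name}")


table = SymbolTable()
table.declare("m", "variable")
table.declare("gcd", "procedure")
table.enter_scope()                  # inside procedure gcd
table.declare("m", "constant", value=5)
print(table.lookup("m"))             # the inner declaration hides the outer one
table.exit_scope()
print(table.lookup("m"))
```

Searching from the innermost scope outward is what implements the "declare before use" and nesting rules of a block-structured language.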

Error handling

Diagnose errors in the source program and report their nature and location so that the user can correct the source program. This work is done by dedicated error-handling routines.
Error types:
Syntax errors: detected during the lexical analysis and syntax analysis stages.
Semantic errors: generally detected during the semantic analysis stage.
Logic errors: the compiler cannot detect them, so they are not handled at compile time. An example is an infinite loop (the loop condition is always true or the loop range is unbounded).


Compilation process

Pass (PASS)

Scanning the source program (or an intermediate form of it) from beginning to end and processing it to produce a new intermediate form or the target program is called a pass.
The difference between passes and basic stages

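Five basic stages: the work that must logically be done to translate a source program into a target program.
Pass: the number of scans needed to complete the work of those five basic stages.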

Multi-pass scanning saves memory space, improves the quality of the object code, and makes the logical structure of the compiler clearer, but compilation takes longer. Use as few passes as memory permits.
A compiler that completes the entire compilation in a single scan is called a one-pass compiler. In a one-pass compiler, lexical and syntax analysis are interleaved, so the logic is less clear.


Compiler construction

Elements: source language, target language, and compilation method (analogous to translating a foreign language).

Front end and back end

According to the functions of each part of the compiler, the compiler is divided into a front end and a back end.
Reason for the separation: when compilers are developed for several source languages and target languages, front ends and back ends can be combined flexibly, which reduces the workload and improves development efficiency.

The Importance of Intermediate Representations


Compiler's pre- and post-processors

Source program: multiple files, macro definitions and macro calls, include files.
Object program: generally assembly code or relocatable machine code.


Application of compilation technology

Syntax-directed structure editors
Program formatting tools
Software testing tools
Program comprehension tools
High-level language translation tools

Origin blog.csdn.net/qq_45973306/article/details/123152607