[Java study notes (110)] Front-end compilation: an analysis of the Javac compilation process

This article is published by the official account [Developing Pigeon]. Welcome to follow!


One. Front-end compilation

(One) Types of compilers

       There are three kinds of compilers in Java. The first is the front-end compiler, which converts .java files into Class files, such as the JDK's Javac. The second is the Java virtual machine's just-in-time compiler (JIT compiler), which converts bytecode into native machine code at runtime, such as the C1 and C2 compilers of the HotSpot virtual machine. The third is the static ahead-of-time compiler (AOT compiler), which compiles the program directly into binary code for the target machine's instruction set, such as the JDK's Jaotc.

       The front-end compiler applies almost no optimizations to the generated code. The Java virtual machine concentrates performance optimization in the runtime just-in-time compiler, so that Class files not generated by Javac can also benefit from those optimizations. The optimizations the front-end compiler does perform at compile time mainly improve the coding efficiency of developers.

(Two) Javac compiler

       The compilation process is divided into one preparation process and three processing processes.

1. Preparation process

       Initialize the plug-in annotation processor.


2. The process of parsing and filling the symbol table

(1) Lexical analysis and syntax analysis

       Lexical analysis is the process of transforming the character stream of the source code into a sequence of tokens. A single character is the smallest element when writing a program, whereas a token is the smallest element at compile time. For example, "int a = b + 2" contains 6 tokens (int, a, =, b, +, 2), none of which can be split further; the keyword int consists of three characters but is a single token.
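       The lexical analysis step can be sketched with a toy lexer. This is only an illustration under simplifying assumptions: the class name ToyLexer and its regular expression are made up for this example, and Javac's real scanner handles far more (keywords, comments, string literals, Unicode escapes, and so on).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy lexer for illustration only; not how Javac's scanner is implemented.
class ToyLexer {
    // One alternative per token class: identifiers/keywords, integer literals,
    // and a few single-character operators and separators.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+|[=+\\-*/;()]");

    // Turns a character stream into a list of tokens.
    // Characters that match no alternative (such as whitespace) are skipped by find().
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [int, a, =, b, +, 2]
        System.out.println(tokenize("int a = b + 2"));
    }
}
```

       The three characters of "int" come back as one token, which is the point made above: after this step the compiler works with tokens, not characters.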

       Syntax analysis is the process of constructing an abstract syntax tree from the token sequence. An abstract syntax tree (AST) is a tree representation of the syntactic structure of program code. Each node in the tree represents one syntactic construct in the code, such as a package, a type, or a modifier.

       Once lexical and syntax analysis have produced the syntax tree, the compiler no longer operates on the source character stream; all subsequent operations are based on the abstract syntax tree.
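       To get a feel for what the tree contains, the JDK's Compiler Tree API (package com.sun.source, module jdk.compiler) can parse a source string and walk the resulting AST. The following is a minimal sketch, assuming it runs on a full JDK rather than a bare runtime; the class name AstKinds and the helper methods are invented for this example.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Illustrative helper (hypothetical name): lists the kinds of AST nodes Javac builds.
class AstKinds {
    // Wraps a string as an in-memory source file so nothing has to exist on disk.
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Parses the code (lexical + syntax analysis only) and records every node kind
    // in the order the scanner visits the nodes.
    static List<Tree.Kind> kinds(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null if no compiler is available
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(source(className, code)));
        List<Tree.Kind> seen = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void scan(Tree node, Void unused) {
                        if (node != null) {
                            seen.add(node.getKind());
                        }
                        return super.scan(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(kinds("T", "class T { int f(int b) { int a = b + 2; return a; } }"));
    }
}
```

       The list starts with the compilation unit and includes nodes such as the variable declaration, the addition, and the integer literal, which correspond to "int a = b + 2" in the source.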

(2) Fill the symbol table

       The symbol table is a data structure made up of a set of symbol addresses and symbol information, which can be implemented with, for example, a hash table. The information registered in the table is used in different stages of compilation, for example for semantic checking and code generation in the semantic analysis stage.
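       The idea of a symbol table can be sketched with nested hash maps. This is a toy model under stated assumptions, not Javac's actual data structure: the class ToySymbolTable, its methods, and the choice to store types as plain strings are all invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy scoped symbol table: maps a symbol name to its declared type.
class ToySymbolTable {
    // Innermost scope sits at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    ToySymbolTable() {
        enterScope(); // outermost scope
    }

    void enterScope() {
        scopes.push(new HashMap<>());
    }

    void leaveScope() {
        scopes.pop();
    }

    // Registers a symbol in the current scope.
    // Returns false if the name is already declared in that scope.
    boolean declare(String name, String type) {
        return scopes.peek().putIfAbsent(name, type) == null;
    }

    // Looks a name up from the innermost scope outwards; returns null if undeclared.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToySymbolTable table = new ToySymbolTable();
        table.declare("a", "int");
        System.out.println(table.lookup("a")); // prints int
        System.out.println(table.lookup("b")); // prints null
    }
}
```

       A later stage such as semantic checking could then ask lookup() whether a name was declared before it is used, and with which type.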

3. The annotation processing process of the plug-in annotation processor

       Java has supported annotations since JDK 5, and JDK 6 added a standard API for "plug-in annotation processors" (JSR 269), which can process specific annotations in the code at compile time and thereby affect the working process of the front-end compiler. A plug-in annotation processor can be regarded as a plug-in of the compiler that can read, modify, and add any element in the abstract syntax tree. If these plug-ins modify the syntax tree during annotation processing, the compiler returns to the process of parsing and filling the symbol table and processes the code again, until no plug-in annotation processor modifies the syntax tree any more. Each such cycle is called a round.

       Developers can use this annotation processing API to influence the behavior of the compiler. Any element in the syntax tree, and even code comments, can be accessed in a plug-in, so plug-in annotation processors leave a lot of room for creativity. For example, Lombok, a coding efficiency tool, uses annotations to generate getter/setter methods automatically.
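       A minimal sketch of a plug-in annotation processor is shown below. It only reads the code: it collects the names of elements annotated with @Deprecated and counts how many rounds it was invoked in. Modifying the syntax tree the way Lombok does relies on compiler-internal classes and is outside this sketch. The class name DeprecatedLister and the runOn helper are assumptions made for this example; the javax.annotation.processing and javax.tools calls are the standard ones.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical read-only processor: records every element annotated with @Deprecated.
@SupportedAnnotationTypes("java.lang.Deprecated")
class DeprecatedLister extends AbstractProcessor {
    final List<String> found = new ArrayList<>();
    int rounds = 0;

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    // Called by the compiler once per round.
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        rounds++;
        for (TypeElement annotation : annotations) {
            for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                found.add(element.getSimpleName().toString());
            }
        }
        return false; // do not claim the annotation; other processors may still see it
    }

    // Runs the processor over an in-memory source file.
    // "-proc:only" stops after annotation processing, so no Class file is written.
    static DeprecatedLister runOn(String className, String code) {
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, null, List.of("-proc:only"), null, List.of(source));
        DeprecatedLister processor = new DeprecatedLister();
        task.setProcessors(List.of(processor));
        task.call();
        return processor;
    }

    public static void main(String[] args) {
        DeprecatedLister result = runOn("T", "class T { @Deprecated void old() {} void fresh() {} }");
        System.out.println(result.found + " in " + result.rounds + " round(s)");
    }
}
```

       In a normal build a processor is usually registered through the -processor option or a META-INF/services entry rather than setProcessors(); the programmatic form is used here only to keep the sketch in one file.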

4. Semantic analysis and bytecode generation process

(1) Overview of semantic analysis

       An abstract syntax tree can represent a source program whose structure is correct, but it cannot guarantee that the semantics of the source program are logical. Semantic analysis checks the context-dependent properties of the source program, such as type checking, control flow checking, and data flow checking. A semantic check verifies whether the code conforms to the language specification of a particular language, so it is closely tied to that language and to the specific context. The red underline error hints we often see while coding are the results of semantic analysis checks.

       The semantic analysis process is divided into attribute checking, and data and control flow analysis.

(2) Attribute checking

       Attribute checking verifies, among other things, whether a variable has been declared before use and whether the data types of a variable and the value assigned to it match. It also performs the code optimization called constant folding, for example folding the expression in "int a = 1 + 2" into the literal "3".
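       Constant folding can be observed from ordinary Java code, because the language specification requires compile-time constant strings to be interned. The sketch below is only a demonstration of that effect (the class name ConstantFoldingDemo is invented here); comparing strings with == is done on purpose and is not something to copy into real code.

```java
// Demonstrates the visible effect of constant folding on string constants.
class ConstantFoldingDemo {
    static boolean foldedAtCompileTime() {
        final String a = "Java"; // final + constant initializer: a constant variable
        final String b = "c";
        // a + b is a constant expression, so the compiler folds it into the
        // literal "Javac"; both sides then refer to the same interned string.
        return (a + b) == "Javac";
    }

    static boolean concatenatedAtRunTime() {
        String a = "Java"; // not declared final: not a constant variable
        String b = "c";
        // The concatenation happens at run time and creates a new String object,
        // so the reference comparison fails even though the contents are equal.
        return (a + b) == "Javac";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime());   // prints true
        System.out.println(concatenatedAtRunTime()); // prints false
    }
}
```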


(3) Data and control flow analysis

       Data and control flow analysis further verifies the contextual logic of the program, for example whether a local variable is assigned before use and whether every path through a method returns a value.
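       These checks can be triggered programmatically to see them in action. The following sketch, assuming a full JDK, feeds small in-memory sources to Javac through the Compiler Tree API and collects the reported errors; JavacTask.analyze() stops after analysis, so no Class file is produced. The class name FlowCheckDemo and its helper are made up for this example.

```java
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Illustrative helper (hypothetical name): returns the error messages Javac reports.
class FlowCheckDemo {
    static List<String> errors(String className, String code) {
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostics, null, null, List.of(source));
        try {
            task.analyze(); // runs through attribute checking and flow analysis, no code generation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> messages = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                messages.add(d.getMessage(Locale.ENGLISH));
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        // x is assigned on only one path before it is read.
        System.out.println(errors("A",
                "class A { int f(boolean b) { int x; if (b) { x = 1; } return x; } }"));
        // One path through the method falls off the end without returning.
        System.out.println(errors("B",
                "class B { int g(boolean b) { if (b) { return 1; } } }"));
        // Every path assigns x and returns a value: no errors.
        System.out.println(errors("C",
                "class C { int h(boolean b) { int x; if (b) { x = 1; } else { x = 2; } return x; } }"));
    }
}
```

       The first two sources are syntactically valid and pass type checking; it is only the flow analysis that rejects them.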

(4) Syntactic sugar

       Syntactic sugar is syntax added to a language that has no real effect on the compiled result or on what the language can do, but makes the language more convenient for developers and reduces the amount of code. Java is a "low-sugar" language, that is, a relatively verbose one. Syntactic sugar in Java includes generics, variable-length parameters, automatic boxing and unboxing, and so on. The Java virtual machine does not support this syntax directly at runtime; it is converted back into the basic underlying syntactic structures during compilation. This process is called desugaring.
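       The effect of desugaring can be approximated by writing the same method twice: once with generics, autoboxing, and variable-length parameters, and once by hand in roughly the form the compiler reduces it to. This is a hand-written approximation for illustration (the class name DesugarDemo is invented), not the exact output of Javac.

```java
import java.util.ArrayList;
import java.util.List;

class DesugarDemo {
    // Variable-length parameter: callers may pass any number of ints.
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Version with syntactic sugar.
    static int sugared() {
        List<Integer> list = new ArrayList<>();
        list.add(1);                        // autoboxing
        list.add(2);
        int first = list.get(0);            // generic return type + auto-unboxing
        return sum(first, list.get(1), 3);  // varargs call
    }

    // Roughly what remains after desugaring: raw types, explicit casts,
    // explicit boxing/unboxing calls, and an explicit array for the varargs.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int desugared() {
        List list = new ArrayList();
        list.add(Integer.valueOf(1));
        list.add(Integer.valueOf(2));
        int first = ((Integer) list.get(0)).intValue();
        return sum(new int[] { first, ((Integer) list.get(1)).intValue(), 3 });
    }

    public static void main(String[] args) {
        System.out.println(sugared());   // prints 6
        System.out.println(desugared()); // prints 6
    }
}
```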

(5) Bytecode generation

       Bytecode generation is the last stage of the Javac compilation process. The information produced in the previous steps (syntax tree, symbol table) is converted into bytecode instructions and written to disk, and a small amount of code is also added and transformed. For example, the instance constructor <init>() method and the class constructor <clinit>() method are added to the syntax tree at this stage. After the traversal and adjustment of the syntax tree are complete, the symbol table filled with all the required information is handed over to the ClassWriter class, whose writeClass() method outputs the bytecode and generates the final Class file.
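       The <init>() and <clinit>() methods are not visible in source code, but their effect is: static field initializers and static blocks end up in <clinit>(), while instance field initializers and instance blocks are placed in <init>() ahead of the constructor body. The small sketch below (the class name InitOrderDemo is invented for this example) makes the resulting execution order observable.

```java
// Records the order in which initialization code runs.
class InitOrderDemo {
    static final StringBuilder LOG = new StringBuilder();

    // These two end up in the class constructor <clinit>(), in textual order.
    static int staticField = log("static-field");
    static {
        log("static-block");
    }

    // These two end up in the instance constructor <init>(), before the constructor body.
    int instanceField = log("instance-field");
    {
        log("instance-block");
    }

    InitOrderDemo() {
        log("constructor");
    }

    static int log(String step) {
        LOG.append(step).append(';');
        return 0;
    }

    // Creates one instance and returns everything logged so far.
    static String run() {
        new InitOrderDemo();
        return LOG.toString();
    }

    public static void main(String[] args) {
        // Prints static-field;static-block;instance-field;instance-block;constructor;
        System.out.println(run());
    }
}
```

       Running "javap -c" on the compiled class shows the merged code inside the static initializer and the constructor.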

Origin blog.csdn.net/Mrwxxxx/article/details/112795785