Analysis of Java compiler optimization and runtime optimization technology

1. Java compiler optimization

1. Compilers in the Java world can be divided into three kinds: 
    1) Front-end compiler: converts *.java files into *.class files, for example Sun's Javac and the incremental compiler (ECJ) in Eclipse JDT. 
    2) JIT compiler: converts bytecode into machine code at run time, for example the C1 and C2 compilers of the HotSpot VM. 
    3) AOT compiler: a static ahead-of-time compiler that compiles *.java files directly into native machine code.

Javac compiler

  The Javac compiler itself is a program written in the Java language.

               

2. Parsing and filling the symbol table

Parsing is carried out by the parseFiles() method and consists of two processes: lexical analysis and syntax analysis.

  • Lexical analysis and syntax analysis 
    Lexical analysis converts the character stream of the source code into a set of tokens. A single character is the smallest element when writing a program, while a token is the smallest element during compilation. Keywords, variable names, literals, and operators can all become tokens. In the Javac source code, lexical analysis is implemented by the com.sun.tools.javac.parser.Scanner class. 
    Syntax analysis constructs an abstract syntax tree from the token sequence. An abstract syntax tree is a tree-like representation of the syntactic structure of program code. Each node of the tree represents one syntactic construct in the code: packages, types, modifiers, interfaces, return values, and even code comments can all be constructs. Syntax analysis is implemented by the com.sun.tools.javac.parser.Parser class, and the abstract syntax tree produced at this stage is represented by the com.sun.tools.javac.tree.JCTree class. After this step the compiler basically no longer operates on the source files; all subsequent operations are based on the abstract syntax tree.
  • Filling the symbol table 
    After the abstract syntax tree is complete, the next step is to fill the symbol table, which is done by the enterTrees() method. A symbol table is a table made up of a set of symbol addresses and symbol information, similar in form to the key-value pairs of a hash table. The information registered in the symbol table is used in different stages of compilation. For example, when addresses are assigned to symbol names, the symbol table is the basis for the assignment. Filling the table is implemented by the com.sun.tools.javac.comp.Enter class.
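
  To make the idea of lexical analysis concrete, here is a minimal sketch of a toy lexer. It is not the logic of com.sun.tools.javac.parser.Scanner; the class name ToyLexer and its simplified rules (identifiers, integer literals, single-character operators) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy lexer for illustration only; javac's real Scanner handles far more cases.
class ToyLexer {

    /** Splits a character stream into word, integer, and single-character tokens. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not a token itself
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < source.length() && Character.isJavaIdentifierPart(source.charAt(i))) {
                    i++;
                }
                tokens.add(source.substring(start, i)); // keyword or identifier
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add(source.substring(start, i)); // integer literal
            } else {
                tokens.add(String.valueOf(c)); // operator or separator
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int a = b + 2;")); // prints [int, a, =, b, +, 2, ;]
    }
}
```

  The fourteen characters of the statement become seven tokens, which is the unit the syntax analysis step then works with.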

Annotation Processor

  Since JDK 1.5, Java has supported annotations, which, like ordinary Java code, take effect at run time. 
  With the standard API for compile-time annotation processing (the pluggable annotation processing API, JSR 269, available since JDK 1.6), our code can influence the behavior of the compiler. Because a plug-in can access any element of the syntax tree, even code comments, annotation processors built on this API have a great deal of room to work with.
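
  As a rough sketch of what such a plug-in can look like, the processor below uses the standard javax.annotation.processing API to warn about type names that start with a lowercase letter. The class name TypeNameCheckSketch and the naming rule itself are hypothetical and chosen only for illustration.

```java
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

// Hypothetical processor for illustration. To be loaded by javac it would
// need to be declared public in its own source file.
@SupportedAnnotationTypes("*") // claim interest in all annotation types
class TypeNameCheckSketch extends AbstractProcessor {

    /** Illustrative rule (an assumption): type names should not start with a lowercase letter. */
    static boolean startsWithLowerCase(String name) {
        return !name.isEmpty() && Character.isLowerCase(name.charAt(0));
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        // Root elements are the top-level elements compiled in this round.
        for (Element root : roundEnv.getRootElements()) {
            boolean isType = root.getKind().isClass() || root.getKind().isInterface();
            String name = root.getSimpleName().toString();
            if (isType && startsWithLowerCase(name)) {
                processingEnv.getMessager().printMessage(
                        Diagnostic.Kind.WARNING,
                        "Type name '" + name + "' should start with an uppercase letter",
                        root);
            }
        }
        return false; // do not claim any annotations; other processors may still see them
    }
}
```

  Once compiled and made public, a processor like this could be passed to javac with the -processor option, and its warnings would appear alongside the compiler's own diagnostics.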

Semantic analysis and bytecode generation

  After syntax analysis, the compiler has an abstract syntax tree representation of the program code. The syntax tree can represent a structurally correct source program, but it cannot guarantee that the program is logically sound. The main task of semantic analysis is to perform context-sensitive checks on a structurally correct source program, such as type checking. 
  In the Javac compilation process, semantic analysis is divided into two steps: attribute checking, and data and control flow analysis. They are carried out by the attribute() and flow() methods respectively.

  • Attribute check 
    The attribute check verifies, for example, whether a variable has been declared before it is used and whether the data types of a variable and the value assigned to it match. An important part of this step is constant folding: an expression made up only of constants, such as 1 + 2, is replaced in the syntax tree by its computed value. 
    In the Javac source code, this step is implemented by the com.sun.tools.javac.comp.Attr and com.sun.tools.javac.comp.Check classes.
  • Data and control flow analysis 
    Data and control flow analysis further verifies the contextual logic of the program. It can detect, for example, whether a local variable is assigned before use, whether every path of a method returns a value, and whether all checked exceptions are handled correctly. Data and control flow analysis at compile time has basically the same purpose as data and control flow analysis at class loading time, but the scope of verification differs: some checks can only be performed at compile time, and others only at run time. For example, declaring a local variable final has no effect at run time; the immutability of the variable is guaranteed only by the compiler during compilation. In the Javac source code, the entry point of data and control flow analysis is the flow() method, and the actual work is done by the com.sun.tools.javac.comp.Flow class.
  • Desugaring 
    Syntactic sugar is syntax added to a programming language that has no effect on what the language can do but makes it more convenient for programmers to use. 
    Java is a "low-sugar language". Its most commonly used syntactic sugars are generics, variable-length parameters, automatic boxing/unboxing, and so on. The virtual machine does not support this syntax at run time; it is reduced to simple basic syntax structures at compile time, a process called desugaring. Desugaring is triggered by the desugar() method.
  • Bytecode generation 
    Bytecode generation is the last stage of the Javac compilation process and is carried out by the com.sun.tools.javac.jvm.Gen class. This stage does more than convert the information produced by the previous steps (syntax tree, symbol table) into bytecode and write it to disk; the compiler also does a small amount of code addition and transformation. 
    After the syntax tree has been traversed and adjusted, the symbol table filled with all the required information is handed to the com.sun.tools.javac.jvm.ClassWriter class, whose writeClass() method outputs the bytecode and generates the final Class file.
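
  Constant folding, mentioned above, can be observed indirectly from ordinary code. The sketch below relies on the fact that a string produced by a compile-time constant expression is shared with the identical literal, whereas a string concatenated at run time is a new object. The class and method names are hypothetical.

```java
// Illustrative sketch; comparing strings with == is done here on purpose
// to expose object identity, and is not recommended in normal code.
class ConstantFoldingSketch {

    static boolean foldedAtCompileTime() {
        // "a" + "b" is a constant expression, so javac folds it into the literal "ab".
        String folded = "a" + "b";
        return folded == "ab";
    }

    static boolean concatenatedAtRuntime() {
        String prefix = "a";          // not final, so prefix + "b" is not a constant expression
        String joined = prefix + "b"; // built at run time as a new String object
        return joined == "ab";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime());   // prints true
        System.out.println(concatenatedAtRuntime()); // prints false
    }
}
```

  Declaring prefix as final would turn the second expression into a constant expression as well, and the compiler would fold it in the same way.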

Java syntactic sugar

Generics and type erasure

  Generics were introduced in JDK 1.5. In essence they are an application of parameterized types: the data type being operated on is specified as a parameter. Such type parameters can be used when creating classes, interfaces, and methods, which are then called generic classes, generic interfaces, and generic methods respectively. 
  Unlike C#'s generics, Java's generics exist only in the source code. In the compiled bytecode they have been replaced by the raw type, and casts are inserted at the corresponding places. At run time, therefore, ArrayList<Integer> and ArrayList<String> are the same class, so generics are really a syntactic sugar of the Java language. This way of implementing generics is called type erasure, and generics implemented this way are called pseudo-generics. As a consequence, when List<String> and List<Integer> are used as parameter types, erasure makes the two method signatures identical, so methods that differ only in these parameters cannot be overloaded. It is worth noting that the Class file format is more permissive than the Java language here: two methods with the same name and the same parameter signature but different return types can legally coexist in one Class file. 
  Erasure only removes generic information from the bytecode in a method's Code attribute. The generic information is still kept in the metadata, which is what makes it possible to obtain parameterized types through reflection.
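
  Both halves of this behavior can be checked with a short sketch: the two parameterizations share one runtime class, yet the declared type argument of a field is still readable through reflection. The class ErasureSketch and its field names are hypothetical and exist only to have a generic declaration to inspect.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

class ErasureSketch {

    // Hypothetical field; only its declared generic type matters here.
    static List<String> names = new ArrayList<>();

    /** After erasure both parameterizations are backed by the same Class object. */
    static boolean sameRuntimeClass() {
        return new ArrayList<String>().getClass() == new ArrayList<Integer>().getClass();
    }

    /** Reads the type argument of the field declaration from the class metadata. */
    static Type declaredElementType() {
        try {
            Field field = ErasureSketch.class.getDeclaredField("names");
            ParameterizedType type = (ParameterizedType) field.getGenericType();
            return type.getActualTypeArguments()[0];
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());    // prints true
        System.out.println(declaredElementType()); // prints class java.lang.String
    }
}
```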

Autoboxing, unboxing, and traversal loops

  After compilation, automatic boxing and unboxing are turned into calls to the corresponding wrapping and unwrapping methods, such as Integer.valueOf() and Integer.intValue(). The traversal (for-each) loop is turned back into code that uses an iterator, which is why the class being traversed must implement the Iterable interface. 
  The "==" operator does not automatically unbox wrapper objects unless an arithmetic operation is involved, and the wrappers' equals() methods do not handle conversions between data types.
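
  The sketch below illustrates both points. sumDesugared() is a hand-written approximation of what the compiler produces for the loop in sumWithForEach(), not javac's exact output, and comparisons() shows how "==" and equals() behave on wrapper objects. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class BoxingSketch {

    static int sumWithForEach(List<Integer> values) {
        int sum = 0;
        for (int v : values) { // for-each loop with automatic unboxing
            sum += v;
        }
        return sum;
    }

    // Roughly what the loop above becomes after desugaring.
    static int sumDesugared(List<Integer> values) {
        int sum = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            int v = it.next().intValue(); // explicit unboxing call
            sum += v;
        }
        return sum;
    }

    static boolean[] comparisons() {
        Integer a = 1, b = 2, c = 3;
        Long g = 3L;
        return new boolean[] {
            c == (a + b),    // arithmetic forces unboxing: int 3 compared with int 3
            c.equals(a + b), // a + b is boxed back into an Integer equal to 3
            g == (a + b),    // both sides unboxed, the int is widened to long
            g.equals(a + b)  // argument is boxed as Integer, not Long, so no match
        };
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumWithForEach(values));         // prints 10
        System.out.println(sumDesugared(values));           // prints 10
        System.out.println(Arrays.toString(comparisons())); // prints [true, true, true, false]
    }
}
```

  Comparing two wrapper objects directly with "==", with no arithmetic involved, compares references; whether two equal Integer values are then the same object depends on the Integer cache, so such comparisons are best avoided.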

Conditional compilation

  The Java language can also perform conditional compilation, by using an if statement whose condition is a constant. Such an if statement differs from other Java code: its condition is evaluated during compilation, and the generated bytecode contains only the branch whose condition holds. 
  Conditional compilation is also a syntactic sugar of the Java language. Based on the true or false value of the Boolean constant, the compiler removes the code blocks in branches that cannot be taken. This is done in the desugaring phase.
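
  A minimal sketch of this pattern follows. The class name and the DEBUG flag are hypothetical; the point is that the guarded block depends on a compile-time constant.

```java
class ConditionalCompilationSketch {

    // Compile-time constant: the compiler knows its value while compiling describe().
    static final boolean DEBUG = false;

    static String describe() {
        StringBuilder sb = new StringBuilder("start");
        if (DEBUG) {
            // With DEBUG == false this block is expected to be dropped from the bytecode.
            sb.append(" debug");
        }
        sb.append(" end");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints start end
    }
}
```

  Disassembling the compiled class with javap -c is one way to check whether the guarded block was actually removed.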

  Java has quite a few other syntactic sugars, such as inner classes, enum classes, assert statements, switch support for enums and strings, declaring and closing resources in try statements, and more.

 

2. Runtime optimization

  Java programs are initially executed by the interpreter. When the virtual machine finds that a method or code block runs very frequently, it marks that code as "Hot Spot Code". To make hot code run faster, the virtual machine compiles it at run time into machine code for the local platform and applies optimizations at various levels. The compiler that does this is called a just-in-time compiler (JIT compiler).

  When memory is tightly constrained in the running environment (as in some embedded systems), interpreted execution can be used to save memory; otherwise, compiled execution can be used to improve efficiency.
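
  The following sketch gives the JIT compiler something to find. The class HotMethodSketch and its loop counts are arbitrary choices for illustration; sumOfSquares() is simply called often enough that HotSpot is likely to treat it as hot code.

```java
class HotMethodSketch {

    /** Small method that is called many times and so becomes a candidate for JIT compilation. */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int round = 0; round < 20_000; round++) {
            total += sumOfSquares(1_000); // repeated calls make this method hot
        }
        System.out.println(total);
    }
}
```

  On HotSpot, running this with -XX:+PrintCompilation should list the methods as they are compiled, while -Xint forces purely interpreted execution and -Xcomp favors compiled execution, which makes the trade-off described above easy to compare.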
