Lombok is often used, but do you know how it works? (Part 2)

In the previous article, "Lombok is often used, but do you know how it works?", we briefly introduced the annotation processor, a tool for processing annotations at compile time. There we only generated some new code of our own, which is different from what Lombok does: Lombok adds code, such as getters and setters, to the original classes themselves. So how does Lombok modify the content of an existing class? In this article we take a closer look at how Lombok works.

How javac works

Since we want to operate on classes at compile time, we need to understand what javac does to a Java program. javac itself is written in Java, so we can read its source code for a quick analysis. I won't cover how to download the source code or how to debug it here; instead, I recommend a well-written article: Javac source code debugging tutorial.

The compilation process is roughly divided into three stages:

Parse and fill symbol table
Annotation processing
Semantic analysis and bytecode generation

The interaction between these three stages is shown in the following figure.

Parse and fill symbol table

This stage consists of two steps, parsing and filling the symbol table, and parsing is itself divided into two steps: lexical analysis and syntax analysis.

Lexical Analysis and Syntax Analysis

Lexical analysis converts the character stream of the source code into a sequence of tokens. While a single character is the smallest element when writing a program, a token is the smallest element in the compilation process: keywords, variable names, literals, and operators can all be tokens. For example, the Java code int a = b+2 consists of 6 tokens: int, a, =, b, +, 2. Although the keyword int is made up of three characters, it is a single token and cannot be split.
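To make the idea of tokens concrete, here is a toy lexer. It is a hypothetical ToyLexer class, far simpler than javac's real scanner: it only recognizes identifiers and keywords, integer literals, and single-character operators, which is just enough to split int a = b+2 into the six tokens listed above.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyLexer {
    // Splits source text into tokens. Handles only identifiers/keywords,
    // integer literals and single-character operators; whitespace is skipped.
    public static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;                                     // separates tokens but is not one
            } else if (Character.isJavaIdentifierStart(ch)) {
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));     // keyword or identifier, e.g. "int", "a"
            } else if (Character.isDigit(ch)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));     // integer literal, e.g. "2"
            } else {
                tokens.add(String.valueOf(ch));          // single-character operator, e.g. "=", "+"
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int a = b+2")); // [int, a, =, b, +, 2]
    }
}
```

Note how "int" becomes one token even though it is three characters, while "b+2" without spaces still yields three separate tokens.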

Syntax analysis is the process of constructing an abstract syntax tree from the token sequence. The abstract syntax tree is a tree representation used to describe the syntactic structure of code. Each node of the tree represents a syntactic construct in the program, such as a package, type, modifier, operator, interface, or return value; even a code comment can be a syntactic construct.

In javac, the tree produced by syntax analysis is represented by the JCTree class, and we can look at which subclasses it has.

Let's write a class ourselves and observe how it is represented as a tree structure during compilation.

public class HelloJvm {

    private String a;
    private String b;

    public static void main(String[] args) {
        int c = 1+2;
        System.out.println(c);
        print();
    }

    private static void print(){

    }
}

Note the parts I marked with red lines: they are all subclasses of JCTree. JCCompilationUnit is the root node, and the elements that make up the class, such as the class declaration itself, its methods and its private fields, each become nodes of the tree (JCClassDecl, JCMethodDecl and JCVariableDecl respectively).
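The red-lined nodes are javac's internal JCTree classes, which implement the public Compiler Tree API interfaces in com.sun.source.tree, so the same structure can be observed without a debugger. The sketch below assumes a JDK (not a bare JRE), so that ToolProvider.getSystemJavaCompiler() returns a compiler; the class name TreeDump is made up for the example. It runs only the parsing step on the HelloJvm source and prints the kind of every node, indented by depth.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class TreeDump {
    static final String SOURCE = "public class HelloJvm {\n"
            + "    private String a;\n"
            + "    private String b;\n"
            + "    public static void main(String[] args) {\n"
            + "        int c = 1+2;\n"
            + "        System.out.println(c);\n"
            + "        print();\n"
            + "    }\n"
            + "    private static void print(){\n"
            + "    }\n"
            + "}\n";

    // Runs syntax analysis only (no type checking) and returns one indented line per tree node.
    public static List<String> dump(String code) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///HelloJvm.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, Collections.singletonList(file));
        List<String> lines = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Integer>() {
                    @Override
                    public Void scan(Tree node, Integer depth) {
                        if (node == null) return null;           // absent parts, e.g. no package
                        StringBuilder line = new StringBuilder();
                        for (int i = 0; i < depth; i++) line.append("  ");
                        lines.add(line.append(node.getKind()).toString());
                        return super.scan(node, depth + 1);      // children one level deeper
                    }
                }.scan(unit, 0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        dump(SOURCE).forEach(System.out::println);
    }
}
```

The first two lines are COMPILATION_UNIT and CLASS (the JCCompilationUnit and JCClassDecl nodes); the two fields, the args parameter and the local variable c appear as VARIABLE nodes, and main and print as METHOD nodes.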

Fill symbol table

> Filling the symbol table has little to do with how Lombok works, so a basic understanding is enough here.

After lexical analysis and syntax analysis are complete, the next step is filling the symbol table. The symbol table is a table made up of a set of symbol addresses and symbol information. You can think of it as key-value pairs in a hash table (although a symbol table is not necessarily implemented as a hash table; it can be an ordered symbol table, a tree-structured symbol table, a stack-structured symbol table, and so on). The information registered in the symbol table is used at different stages of compilation. During semantic analysis, its contents are used for semantic checks (such as checking whether each use of a name is consistent with its declaration) and for generating intermediate code. In the object code generation stage, the symbol table is the basis for address allocation when storage is assigned to symbol names.
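As a rough mental model of the key-value and stack-structured variants mentioned above, here is a toy scoped symbol table. The ScopedSymbolTable class and its methods are hypothetical; javac's own Scope classes are far more involved. Each scope is a hash map from a name to its declared type, and scopes are kept on a stack so that inner declarations shadow outer ones.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class ScopedSymbolTable {
    // Stack of scopes; the innermost scope is at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    public ScopedSymbolTable() { enterScope(); }              // outermost (global) scope

    public void enterScope() { scopes.push(new HashMap<>()); }

    public void exitScope() { scopes.pop(); }

    // Registers name -> type in the current scope; redeclaring in the same scope is an error.
    public void declare(String name, String type) {
        Map<String, String> current = scopes.peek();
        if (current.containsKey(name)) {
            throw new IllegalStateException("already defined: " + name);
        }
        current.put(name, type);
    }

    // Looks the name up from the innermost scope outwards; returns null if it was never declared.
    public String lookup(String name) {
        for (Map<String, String> scope : scopes) {           // ArrayDeque iterates from the head
            String type = scope.get(name);
            if (type != null) return type;
        }
        return null;
    }

    public static void main(String[] args) {
        ScopedSymbolTable table = new ScopedSymbolTable();
        table.declare("a", "int");
        table.enterScope();
        table.declare("a", "String");                // shadows the outer a
        System.out.println(table.lookup("a"));       // String
        table.exitScope();
        System.out.println(table.lookup("a"));       // int
        System.out.println(table.lookup("b"));       // null
    }
}
```

A semantic check such as "is this name declared before it is used?" then amounts to a lookup that must not return null.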

Annotation Processor

Once the first stage, parsing and filling the symbol table, is complete, the next stage is annotation processing. This stage is the key to how Lombok is implemented.

Starting with JDK 1.5, the Java language supports annotations. As originally designed, annotations, like ordinary Java code, only take effect at runtime. JDK 1.6 implemented the JSR-269 specification, which provides a standard API for plug-in annotation processors that process annotations at compile time. We can think of these processors as a set of compiler plug-ins: inside them, we can read, modify, and add any element of the abstract syntax tree.

If these plug-ins produce new input during annotation processing, for example by generating new source files, the compiler goes back to parsing and filling the symbol table and processes again, until the plug-in annotation processors produce nothing new. Each pass through this loop is called a Round.
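The round loop can be observed with a small standard (JSR-269) processor. This is a hypothetical sketch, not Lombok code, and it assumes a JDK: RoundCounter records roundEnv.processingOver() in every round and generates one extra source file, Generated.java, in the first round. Because that file is new input, javac parses it in a second round before running the final round, in which processingOver() is true.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class RoundDemo {
    // "*" = take part in every round, even when the sources contain no annotations.
    @SupportedAnnotationTypes("*")
    public static class RoundCounter extends AbstractProcessor {
        final List<Boolean> rounds = new ArrayList<>();     // processingOver() value of each round
        private boolean generated = false;

        @Override
        public SourceVersion getSupportedSourceVersion() { return SourceVersion.latestSupported(); }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            rounds.add(roundEnv.processingOver());
            if (!generated) {                                 // only in the first round
                generated = true;
                try (Writer w = processingEnv.getFiler().createSourceFile("Generated").openWriter()) {
                    w.write("public class Generated {}");    // new input -> one more round
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return false;                                     // claim nothing
        }
    }

    public static List<Boolean> run() {
        try {
            Path out = Files.createTempDirectory("rounds");
            JavaFileObject src = new SimpleJavaFileObject(
                    URI.create("string:///Hello.java"), JavaFileObject.Kind.SOURCE) {
                @Override
                public CharSequence getCharContent(boolean ignoreEncodingErrors) { return "public class Hello {}"; }
            };
            RoundCounter counter = new RoundCounter();
            JavaCompiler.CompilationTask task = ToolProvider.getSystemJavaCompiler().getTask(
                    null, null, null, Arrays.asList("-d", out.toString(), "-s", out.toString()),
                    null, Collections.singletonList(src));
            task.setProcessors(Collections.singletonList(counter));
            if (!task.call()) throw new IllegalStateException("compilation failed");
            return counter.rounds;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // [false, false, true]
    }
}
```

If the processor generated nothing, only two rounds would be recorded: the round for Hello.java and the final round.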

With a standard API for compile-time annotation processing, our code can influence the behavior of the compiler. Since any element of the syntax tree, even code comments, can be accessed from within a plug-in, plug-ins implemented as annotation processors have a lot of room to work with. With enough creativity, programmers can use plug-in annotation processors to achieve many things that would otherwise have to be done by hand in the code.
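As one small example of reading comments from a plug-in, the standard Elements utility exposes Javadoc comments through getDocComment; ordinary // comments are not available through this method. The sketch below is hypothetical (DocCommentDemo and DocReader are made-up names, and a JDK is assumed) and passes -proc:only so that javac runs annotation processing without writing class files.

```java
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class DocCommentDemo {
    // "*" = run even though the source contains no annotations at all.
    @SupportedAnnotationTypes("*")
    public static class DocReader extends AbstractProcessor {
        final Map<String, String> docs = new HashMap<>();    // class name -> Javadoc text

        @Override
        public SourceVersion getSupportedSourceVersion() { return SourceVersion.latestSupported(); }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            for (Element e : roundEnv.getRootElements()) {
                // getDocComment returns the Javadoc text of the element, or null if it has none.
                String doc = processingEnv.getElementUtils().getDocComment(e);
                if (doc != null) docs.put(e.getSimpleName().toString(), doc.trim());
            }
            return false;
        }
    }

    public static Map<String, String> read() {
        String code = "/** Says hello. */ class Hello {}\n// a plain comment\nclass Silent {}";
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Hello.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
        };
        DocReader reader = new DocReader();
        JavaCompiler.CompilationTask task = ToolProvider.getSystemJavaCompiler().getTask(
                null, null, null, Collections.singletonList("-proc:only"), null,
                Collections.singletonList(file));
        task.setProcessors(Collections.singletonList(reader));
        if (!task.call()) throw new IllegalStateException("compilation failed");
        return reader.docs;
    }

    public static void main(String[] args) {
        System.out.println(read());
    }
}
```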

Semantic Analysis and Bytecode Generation

After syntax analysis, the compiler has an abstract syntax tree representing the program code. The syntax tree can represent a correctly structured source program, but it cannot guarantee that the program is semantically logical. The main task of semantic analysis is to check context-sensitive properties of structurally correct source programs, such as types.

For example, suppose we have the following code:

int a = 1;
boolean b = false;
char c = 2;

Later in the code, we might have the following operation:

int d = b+c;

This code forms a structurally correct syntax tree, but the last statement is semantically wrong: the + operator cannot be applied to a boolean and a char. Therefore the code fails to compile.
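The difference between a structurally valid tree and a semantically valid program can be checked with the public JavacTask API: parse() runs only syntax analysis, while analyze() goes on to semantic analysis. In the sketch below (the SemanticCheck class is hypothetical, and a JDK is assumed), the snippet above is wrapped in a method; it parses without errors, but errors are reported once it is analyzed.

```java
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class SemanticCheck {
    static final String SOURCE = "class Demo { void m() {\n"
            + "  int a = 1; boolean b = false; char c = 2;\n"
            + "  int d = b + c;\n"                 // structurally fine, semantically invalid
            + "} }";

    // Returns {error count after parse(), error count after analyze()}.
    public static long[] countErrors() {
        DiagnosticCollector<JavaFileObject> diags = new DiagnosticCollector<>();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Demo.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) { return SOURCE; }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, diags, null, null, Collections.singletonList(file));
        try {
            task.parse();                                  // syntax analysis only
            long afterParse = errorCount(diags);
            task.analyze();                                // semantic analysis (type checking etc.)
            return new long[] {afterParse, errorCount(diags)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long errorCount(DiagnosticCollector<JavaFileObject> diags) {
        return diags.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR).count();
    }

    public static void main(String[] args) {
        long[] counts = countErrors();
        System.out.println("parse errors: " + counts[0] + ", analyze errors: " + counts[1]);
    }
}
```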

Implement a simple Lombok by yourself

Now that we understand the javac process, we can write a simple tool that adds code to existing classes. To keep it simple, we only generate setter methods. First, write a custom annotation:

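package aboutjava.annotion;

import java.lang.annotation.*;

@Retention(RetentionPolicy.SOURCE) // kept only in source code; discarded by the compiler
@Target(ElementType.TYPE) // applies to classes (types)
public @interface MySetter {
}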
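RetentionPolicy.SOURCE is what makes an annotation like this a pure compile-time marker: the compiler discards it, so it never reaches the class file and cannot be seen through reflection. The self-contained sketch below (the RetentionDemo class and its nested annotations are hypothetical) contrasts it with a RUNTIME annotation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class RetentionDemo {
    @Retention(RetentionPolicy.SOURCE)   // same policy as MySetter: dropped by javac
    @Target(ElementType.TYPE)
    @interface SourceOnly {}

    @Retention(RetentionPolicy.RUNTIME)  // kept in the class file and visible to reflection
    @Target(ElementType.TYPE)
    @interface KeptAtRuntime {}

    @SourceOnly
    @KeptAtRuntime
    static class Annotated {}

    public static boolean sourceOnlyVisible() {
        return Annotated.class.isAnnotationPresent(SourceOnly.class);
    }

    public static boolean runtimeVisible() {
        return Annotated.class.isAnnotationPresent(KeptAtRuntime.class);
    }

    public static void main(String[] args) {
        System.out.println(sourceOnlyVisible()); // false
        System.out.println(runtimeVisible());    // true
    }
}
```

This is why an annotation processor, which runs inside javac, is the natural place to act on @MySetter: by the time the program runs, the annotation no longer exists.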

Then write an annotation processor for this annotation:

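package aboutjava.annotion;

// Imports for the complete processor, including process() and the helper methods shown below.
// The com.sun.tools.javac classes are javac internals (in lib/tools.jar on JDK 8).
import com.sun.source.tree.Tree;
import com.sun.tools.javac.api.JavacTrees;
import com.sun.tools.javac.code.Flags;
import com.sun.tools.javac.code.Type;
import com.sun.tools.javac.processing.JavacProcessingEnvironment;
import com.sun.tools.javac.tree.JCTree;
import com.sun.tools.javac.tree.TreeMaker;
import com.sun.tools.javac.tree.TreeTranslator;
import com.sun.tools.javac.util.Context;
import com.sun.tools.javac.util.List;
import com.sun.tools.javac.util.ListBuffer;
import com.sun.tools.javac.util.Name;
import com.sun.tools.javac.util.Names;
import javax.annotation.processing.*;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;
import java.util.Set;

@SupportedSourceVersion(SourceVersion.RELEASE_8)
@SupportedAnnotationTypes("aboutjava.annotion.MySetter")
public class MySetterProcessor extends AbstractProcessor {

    private Messager messager;
    private JavacTrees javacTrees;
    private TreeMaker treeMaker;
    private Names names;

    /**
     * 1. Messager is mainly used to print log messages at compile time
     * 2. JavacTrees provides the abstract syntax trees to be processed
     * 3. TreeMaker wraps methods for creating AST nodes
     * 4. Names provides methods for creating identifiers
     */
    @Override
    public synchronized void init(ProcessingEnvironment processingEnv) {
        super.init(processingEnv);
        this.messager = processingEnv.getMessager();
        this.javacTrees = JavacTrees.instance(processingEnv);
        Context context = ((JavacProcessingEnvironment)processingEnv).getContext();
        this.treeMaker = TreeMaker.instance(context);
        this.names = Names.instance(context);
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        return false;
    }
}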

Note that in the init method we obtain information about the compilation environment and extract a few key objects from it:

JavacTrees: provides the abstract syntax trees to be processed
TreeMaker: wraps methods for creating and manipulating AST nodes
Names: provides methods for creating identifiers
Messager: mainly used for printing log messages during compilation
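Of these four, only Messager belongs to the standard JSR-269 API (javax.annotation.processing); JavacTrees, TreeMaker and Names are javac internals, which is why the processor above casts to JavacProcessingEnvironment. The standard part can be tried in isolation. The sketch below uses hypothetical names (MessagerDemo, NoteProcessor), assumes a JDK, and uses the built-in @Deprecated as a stand-in annotation: it prints a NOTE for every annotated class and captures it with a DiagnosticCollector.

```java
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.Messager;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class MessagerDemo {
    @SupportedAnnotationTypes("java.lang.Deprecated")
    public static class NoteProcessor extends AbstractProcessor {
        @Override
        public SourceVersion getSupportedSourceVersion() { return SourceVersion.latestSupported(); }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            Messager messager = processingEnv.getMessager();
            for (Element e : roundEnv.getElementsAnnotatedWith(Deprecated.class)) {
                // NOTE is informational; Diagnostic.Kind.ERROR would make the compilation fail.
                messager.printMessage(Diagnostic.Kind.NOTE, e.getSimpleName() + " has been processed", e);
            }
            return false;
        }
    }

    // Compiles one in-memory class (annotation processing only) and returns the NOTE texts.
    public static List<String> notes() {
        String code = "@Deprecated public class Old {}";
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Old.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
        };
        DiagnosticCollector<JavaFileObject> diags = new DiagnosticCollector<>();
        JavaCompiler.CompilationTask task = ToolProvider.getSystemJavaCompiler().getTask(
                null, null, diags, Collections.singletonList("-proc:only"), null,
                Collections.singletonList(file));
        task.setProcessors(Collections.singletonList(new NoteProcessor()));
        task.call();
        return diags.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.NOTE)
                .map(d -> d.getMessage(null))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(notes());
    }
}
```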

We then use these utilities to modify the existing abstract syntax tree. The main modification logic lives in the process method. Its return value tells javac whether this processor claims the annotations it was given: if it returns true, later processors are not asked to process those annotations. (A new round is triggered by newly generated source files, not by this return value.) The logic of the process method is mainly as follows:

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        Set<? extends Element> elementsAnnotatedWith = roundEnv.getElementsAnnotatedWith(MySetter.class);
        elementsAnnotatedWith.forEach(e -> {
            // Get the syntax tree of each class annotated with @MySetter
            JCTree tree = javacTrees.getTree(e);
            tree.accept(new TreeTranslator() {
                @Override
                public void visitClassDef(JCTree.JCClassDecl jcClassDecl) {
                    List<JCTree.JCVariableDecl> jcVariableDeclList = List.nil();
                    // Find all variables (fields) among the class members in the tree
                    for (JCTree jcTree : jcClassDecl.defs) {
                        if (jcTree.getKind().equals(Tree.Kind.VARIABLE)) {
                            JCTree.JCVariableDecl jcVariableDecl = (JCTree.JCVariableDecl) jcTree;
                            jcVariableDeclList = jcVariableDeclList.append(jcVariableDecl);
                        }
                    }
                    // Generate a setter method for each variable and add it to the class
                    jcVariableDeclList.forEach(jcVariableDecl -> {
                        messager.printMessage(Diagnostic.Kind.NOTE, jcVariableDecl.getName() + " has been processed");
                        jcClassDecl.defs = jcClassDecl.defs.prepend(makeSetterMethodDecl(jcVariableDecl));
                    });
                    super.visitClassDef(jcClassDecl);
                }
            });
        });
        // Claim @MySetter so that no later processor is asked to handle it
        return true;
    }
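The return true at the end of process claims @MySetter. To see what claiming means, the hypothetical sketch below (ClaimDemo, Recorder and the in-memory Marker annotation are made-up names; a JDK is assumed) registers two processors that both support Marker. When the first one returns true, the second never receives the annotation; when it returns false, the second one receives it too.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class ClaimDemo {
    @SupportedAnnotationTypes("Marker")
    public static class Recorder extends AbstractProcessor {
        private final boolean claim;
        final List<String> seen = new ArrayList<>();    // annotation names received over all rounds

        Recorder(boolean claim) { this.claim = claim; }

        @Override
        public SourceVersion getSupportedSourceVersion() { return SourceVersion.latestSupported(); }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            for (TypeElement a : annotations) seen.add(a.getQualifiedName().toString());
            return claim;                                 // true = these annotations are claimed
        }
    }

    // Runs two processors in order and returns what the second one received.
    public static List<String> secondSees(boolean firstClaims) {
        String code = "@interface Marker {}\n@Marker class Use {}";
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Use.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
        };
        Recorder first = new Recorder(firstClaims);
        Recorder second = new Recorder(false);
        JavaCompiler.CompilationTask task = ToolProvider.getSystemJavaCompiler().getTask(
                null, null, null, Collections.singletonList("-proc:only"), null,
                Collections.singletonList(file));
        task.setProcessors(Arrays.asList(first, second));
        if (!task.call()) throw new IllegalStateException("compilation failed");
        return second.seen;
    }

    public static void main(String[] args) {
        System.out.println(secondSees(true));   // []
        System.out.println(secondSees(false));  // [Marker]
    }
}
```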

It may look difficult, but the principle is fairly simple; it only seems hard to understand because we are not familiar with the API. The main steps are:

Find the classes marked with the @MySetter annotation and get their syntax trees
Traverse each syntax tree to find its field (variable) nodes
Build a setter method node for each field and add it to the syntax tree
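The first two steps only read the program, so they can also be written with the standard javax.lang.model API; only the third step needs javac internals such as TreeMaker. The sketch below is a read-only illustration with hypothetical names (SetterPlanner, Planner), assumes a JDK, and compiles an in-memory stand-in for TestMySetter with one private String name field and its own MySetter annotation in the default package. It finds the annotated classes, lists their fields with ElementFilter.fieldsIn, and reports the setter each field would get.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.lang.model.element.VariableElement;
import javax.lang.model.util.ElementFilter;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class SetterPlanner {
    @SupportedAnnotationTypes("MySetter")
    public static class Planner extends AbstractProcessor {
        final List<String> planned = new ArrayList<>();   // e.g. "TestMySetter.setName(java.lang.String)"

        @Override
        public SourceVersion getSupportedSourceVersion() { return SourceVersion.latestSupported(); }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            for (TypeElement annotation : annotations) {
                // Step 1: every class marked with the annotation
                for (Element cls : roundEnv.getElementsAnnotatedWith(annotation)) {
                    // Step 2: its fields (fieldsIn filters out methods and constructors)
                    for (VariableElement field : ElementFilter.fieldsIn(cls.getEnclosedElements())) {
                        String n = field.getSimpleName().toString();
                        planned.add(cls.getSimpleName() + ".set" + Character.toUpperCase(n.charAt(0))
                                + n.substring(1) + "(" + field.asType() + ")");
                    }
                }
            }
            return true;
        }
    }

    public static List<String> plan() {
        String code = "@interface MySetter {}\n@MySetter class TestMySetter { private String name; }";
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///TestMySetter.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
        };
        Planner planner = new Planner();
        JavaCompiler.CompilationTask task = ToolProvider.getSystemJavaCompiler().getTask(
                null, null, null, Collections.singletonList("-proc:only"), null,
                Collections.singletonList(file));
        task.setProcessors(Collections.singletonList(planner));
        if (!task.call()) throw new IllegalStateException("compilation failed");
        return planner.planned;
    }

    public static void main(String[] args) {
        System.out.println(plan()); // [TestMySetter.setName(java.lang.String)]
    }
}
```

The standard API cannot add the method itself; that is exactly the gap the TreeMaker-based code fills.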

To illustrate, we create a test class TestMySetter; the general structure of its syntax tree is shown in the figure below.

Our goal is to turn its syntax tree into the one shown in the following figure. Because the final bytecode is generated from the syntax tree, once we add a method node to the tree, the bytecode for that method is generated as well.

The code that generates the method node is as follows:


private JCTree.JCMethodDecl makeSetterMethodDecl(JCTree.JCVariableDecl jcVariableDecl){

    ListBuffer<JCTree.JCStatement> statements = new ListBuffer<>();
    // Build the assignment statement, e.g. this.a = a;
    JCTree.JCExpressionStatement aThis = makeAssignment(treeMaker.Select(treeMaker.Ident(names.fromString("this")), jcVariableDecl.getName()), treeMaker.Ident(jcVariableDecl.getName()));
    statements.append(aThis);
    JCTree.JCBlock block = treeMaker.Block(0, statements.toList());

    // Build the parameter, with the same name and type as the field
    JCTree.JCVariableDecl param = treeMaker.VarDef(treeMaker.Modifiers(Flags.PARAMETER), jcVariableDecl.getName(), jcVariableDecl.vartype, null);
    List<JCTree.JCVariableDecl> parameters = List.of(param);

    // Build the return type (void) and the public method itself
    JCTree.JCExpression methodType = treeMaker.Type(new Type.JCVoidType());
    return treeMaker.MethodDef(treeMaker.Modifiers(Flags.PUBLIC),getNewMethodName(jcVariableDecl.getName()),methodType,List.nil(),parameters,List.nil(),block,null);

}

private Name getNewMethodName(Name name){
    String s = name.toString();
    return names.fromString("set"+s.substring(0,1).toUpperCase()+s.substring(1,name.length()));
}

private JCTree.JCExpressionStatement makeAssignment(JCTree.JCExpression lhs, JCTree.JCExpression rhs) {
    return treeMaker.Exec(
            treeMaker.Assign(
                    lhs,
                    rhs
            )
    );
}
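The naming rule inside getNewMethodName is plain string manipulation, so it can be checked without javac at all. Here it is extracted into a standalone helper; the SetterNames class and setterName method are hypothetical names, and the helper takes a String instead of javac's Name.

```java
public class SetterNames {
    // Same rule as getNewMethodName: "set" + field name with its first letter upper-cased.
    public static String setterName(String field) {
        if (field.isEmpty()) throw new IllegalArgumentException("empty field name");
        return "set" + field.substring(0, 1).toUpperCase() + field.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(setterName("name"));  // setName
        System.out.println(setterName("a"));     // setA
    }
}
```

One design note: toUpperCase() without an argument uses the default locale, so under a Turkish locale a field named "id" would become "setİd"; toUpperCase(Locale.ROOT) avoids that.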

Finally, we run the following three commands:

javac -cp $JAVA_HOME/lib/tools.jar aboutjava/annotion/MySetter* -d .
javac -processor aboutjava.annotion.MySetterProcessor aboutjava/annotion/TestMySetter.java
javap -p aboutjava/annotion/TestMySetter.class

The output is as follows:

Compiled from "TestMySetter.java"
public class aboutjava.annotion.TestMySetter {
  private java.lang.String name;
  public void setName(java.lang.String);
  public aboutjava.annotion.TestMySetter();
}

As you can see, the setName method we wanted has been generated in the bytecode.

Code address

Summary

That roughly explains how Lombok works: at its core, it performs various operations on the abstract syntax tree. Compile time can be used for many other things as well, such as checking coding conventions. Here I only implemented generating setter methods; if you are interested, try writing code that generates getter methods the way Lombok does.
