"In-depth understanding of the Java virtual machine" reading notes (7)-virtual machine bytecode execution engine (below)

Table of Contents

1. Java dynamic type language support

1.1 MethodHandle

1.2 The difference between MethodHandle and Reflection

1.3 invokedynamic instruction

2. Stack-based bytecode interpretation execution engine

2.1 Stack-based and register-based

2.2 The stack-based interpreter execution process

3. Summary


1. Java dynamic type language support

The key feature of a dynamically typed language is that most type checking happens at run time rather than at compile time. Many languages fit this description, such as JavaScript and Python. In contrast, languages that perform type checking at compile time, such as C++ and Java, are the most commonly used statically typed languages.

For example, the following code:

obj.println("hello world");

Suppose this line is Java and the static type of the variable obj is java.io.PrintStream. Then the actual type of obj must be PrintStream or one of its subclasses for the code to be legal. Even if obj refers to an object that does have a legal println(String) method, the code cannot run if the object's class has no inheritance relationship with PrintStream, because type checking fails.

In JavaScript the same code behaves differently. Whatever the type of obj is, the call succeeds as long as that type defines a println(String) method.

The difference arises because Java generates a complete symbolic reference to the println(String) method at compile time and stores it in the class file as the operand of the method invocation instruction, for example:

invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V

This symbolic reference records the type in which the method is defined, the method name, the order and types of the parameters, and the return type. From it, the virtual machine can resolve a direct reference to the method. In dynamically typed languages such as JavaScript, the variable obj itself has no type; only the value of obj has a type. At compile time, at most the method name, parameters, and return value can be determined; the specific type in which the method is defined cannot.
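To make the contrast concrete, here is a minimal Java sketch. Console is a hypothetical class that has a println(String) method but no relationship to PrintStream, and callPrintln is an illustrative helper that uses reflection to look the method up by name at run time. This only roughly imitates what a dynamically typed language does; it is not how a JavaScript engine actually works.

```java
import java.lang.reflect.Method;

class DuckTypingDemo {
    // Hypothetical class: it has println(String) but is unrelated to java.io.PrintStream.
    static class Console {
        final StringBuilder log = new StringBuilder();

        public void println(String s) {
            log.append(s);
        }
    }

    // Look up println(String) by name on whatever class the value has at run time.
    static void callPrintln(Object obj, String text) {
        try {
            Method m = obj.getClass().getMethod("println", String.class);
            m.invoke(obj, text);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable println(String) on " + obj.getClass(), e);
        }
    }

    public static void main(String[] args) {
        // java.io.PrintStream ps = new Console();  // rejected by javac: incompatible types
        Console console = new Console();
        callPrintln(console, "hello world");         // succeeds: only the method shape matters
        System.out.println(console.log);
    }
}
```

The commented-out assignment is the static check described above: javac rejects it even though Console has a suitable method. The reflective call moves that check to run time.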

Statically typed languages determine types at compile time, so the compiler can provide rigorous type checking, which favors stability. Dynamically typed languages determine types at run time, which is more flexible; the same functionality can often be written more clearly and concisely than in a statically typed language, so the code looks less "bloated".

The first operand of each of the four method invocation instructions invokevirtual, invokespecial, invokestatic, and invokeinterface is a symbolic reference to the invoked method, and symbolic references are generated at compile time. A dynamically typed language, however, can only determine the receiver type at run time. The invokedynamic instruction was therefore added in JDK 1.7 to provide this support.

1.1 MethodHandle

Before JDK 1.7, the target method of a call could only be determined through symbolic references. JDK 1.7 added a mechanism for determining the target method dynamically, called MethodHandle. In essence, it looks up a method with a matching signature in a class and represents the result as a MethodHandle, through which the method can then be called. Code thus simulates the way the virtual machine dispatches and looks up methods, which gives programmers more freedom.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class Main {
    static class ClassA {
        public void println(String arg) {
            System.out.println(arg);
        }
    }

    public static void main(String[] args) throws Throwable {
        Object obj = new ClassA();
        getMethodHandle(obj).invoke("hello MethodHandle");
    }

    private static MethodHandle getMethodHandle(Object receiver) throws Exception {
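        // Define a MethodType.
        // The first argument (void.class) is the method's return type; the remaining arguments (String.class) are its parameter types
        MethodType methodType = MethodType.methodType(void.class, String.class);
        // Look up the method in the receiver's class,
        // then use bindTo to pass the receiver (that is, this) to it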
        return MethodHandles.lookup()
                .findVirtual(receiver.getClass(), "println", methodType)
                .bindTo(receiver);
    }
}

The sample code above uses findVirtual, which simulates the execution of the invokevirtual instruction. Other lookup methods include findStatic and findSpecial.
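As a companion to the findVirtual example, here is a small sketch of findStatic. The class name FindStaticDemo and the helper maxViaHandle are made up for illustration; Math.max(int, int) is used only because it is a convenient static method in the JDK.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class FindStaticDemo {
    // Calls Math.max(int, int) through a MethodHandle instead of a direct call.
    static int maxViaHandle(int a, int b) {
        try {
            // Return type first, then the parameter types: int max(int, int)
            MethodType type = MethodType.methodType(int.class, int.class, int.class);
            // A static method has no receiver, so there is nothing to bindTo
            MethodHandle max = MethodHandles.lookup().findStatic(Math.class, "max", type);
            // invokeExact requires the call site types to match the MethodType exactly
            return (int) max.invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(maxViaHandle(3, 7)); // prints 7
    }
}
```

Unlike the findVirtual case, no bindTo step is needed, which mirrors the fact that invokestatic takes no receiver.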

1.2 The difference between MethodHandle and Reflection

  • In essence, both Reflection and MethodHandle simulate method calls, but Reflection simulates calls at the Java code level, while MethodHandle simulates calls at the bytecode level. The three methods on MethodHandles.Lookup, findStatic(), findVirtual(), and findSpecial(), correspond to the access checks performed by the bytecode instructions invokestatic, invokevirtual and invokeinterface, and invokespecial. The Reflection API does not need to care about these low-level details.

  • Reflection is heavyweight. A java.lang.reflect.Method object contains the method signature, the descriptor, and Java-side representations of the attributes in the method attribute table, as well as runtime information such as access permissions. A MethodHandle contains only the information needed to execute the method, such as the method name and parameters, so it is relatively lightweight.

  • Because MethodHandle simulates the bytecode method invocation instructions, the optimizations the virtual machine applies to those instructions (such as method inlining) can in theory be supported for MethodHandle in a similar way, whereas calls made through Reflection cannot benefit from them.

  • Reflection is designed to serve only the Java language, while MethodHandle is designed to serve all languages on the Java virtual machine.
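The points above can be seen side by side in a short sketch that calls String.length() both ways. The class and helper names are illustrative only, and the sketch shows the API difference, not a performance comparison.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

class ReflectionVsHandleDemo {
    // Reflection: look the method up by name, then invoke with Object arguments.
    static int lengthViaReflection(String s) {
        try {
            Method m = String.class.getMethod("length");
            // The receiver is passed as Object and the int result comes back boxed
            return (Integer) m.invoke(s);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // MethodHandle: findVirtual mirrors invokevirtual; the handle's type is (String)int.
    static int lengthViaHandle(String s) {
        try {
            MethodHandle mh = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            // The receiver becomes the first argument; no boxing is involved with invokeExact
            return (int) mh.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaReflection("hello")); // prints 5
        System.out.println(lengthViaHandle("hello"));     // prints 5
    }
}
```

The Method object can also be queried for modifiers, annotations, and other metadata, while the MethodHandle exposes little beyond its MethodType, which matches the heavyweight versus lightweight distinction.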

1.3 invokedynamic instruction

In a sense, the invokedynamic instruction and the MethodHandle mechanism serve the same purpose. Both address the problem that the dispatch rules of the original four method invocation instructions are fixed inside the virtual machine, and both move the decision of how to find the target method from the virtual machine into user code, giving users more freedom. The two follow analogous ideas, but one is implemented with upper-level Java code and APIs, and the other with bytecode and the attributes and constants in the class file.

Every location that contains an invokedynamic instruction is called a "dynamic call site" (Dynamic Call Site). The first operand of this instruction is no longer a CONSTANT_Methodref_info constant representing a symbolic method reference, but a CONSTANT_InvokeDynamic_info constant newly added in JDK 1.7. Three pieces of information can be obtained from this constant: the bootstrap method, the method type, and the name. The bootstrap method has fixed parameters and returns a java.lang.invoke.CallSite object, which represents the target method call to be executed. Using CONSTANT_InvokeDynamic_info, the virtual machine can find and execute the bootstrap method, obtain a CallSite object, and finally call the target method.
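The bootstrap step can be imitated in plain Java. In the sketch below, BootstrapDemo, greet, and callThroughCallSite are hypothetical names, and the bootstrap method is called by hand; with a real invokedynamic instruction the virtual machine would make that call itself, using the information in the constant pool.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BootstrapDemo {
    // The target method that the call site will end up invoking.
    static String greet(String name) {
        return "hello " + name;
    }

    // Shape of a bootstrap method: lookup, name, and method type in; CallSite out.
    public static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(BootstrapDemo.class, name, type);
        // ConstantCallSite: the target is fixed once the call site is linked
        return new ConstantCallSite(target);
    }

    // Manually plays the role of the virtual machine at a dynamic call site.
    static String callThroughCallSite(String arg) {
        try {
            MethodType type = MethodType.methodType(String.class, String.class);
            CallSite site = bootstrap(MethodHandles.lookup(), "greet", type);
            return (String) site.dynamicInvoker().invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughCallSite("invokedynamic")); // prints hello invokedynamic
    }
}
```

The three arguments passed to bootstrap correspond to the three pieces of information described above: the lookup context, the name, and the method type.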

The biggest difference between invokedynamic and the previous four invoke* instructions is that its dispatch logic is determined by the programmer rather than by the virtual machine. The instruction was designed not for the Java language but for dynamic languages running on the Java virtual machine, so in JDK 1.7 the Java compiler javac could not generate bytecode containing invokedynamic. (Since JDK 8, javac does emit invokedynamic when compiling lambda expressions.)

At the end of this section, the book poses an interesting question: how can a subclass call a method of its grandparent class that the parent class has overridden?

import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class Main {
    static class GrandFather {
        void thinking() {
            System.out.println("i am grandfather");
        }
    }

    static class Father extends GrandFather {
        void thinking() {
            System.out.println("i am father");
        }
    }

    static class Son extends Father {
        void thinking() {
            // Fill in only this method body so that it calls GrandFather's thinking() and prints "i am grandfather"
        }
    }

    public static void main(String[] args) throws Throwable {
        Son son = new Son();
        son.thinking();
    }

}

Note: Of course, filling in new GrandFather().thinking(); is not allowed.

Before JDK 1.7 there was no way to do this. Dynamic dispatch by the invokevirtual instruction uses the actual type of the receiver, and this logic is fixed inside the virtual machine. Inside Son's thinking() method, there is no way to obtain an object reference whose actual type is GrandFather (unless we instantiate a new one). Since JDK 1.7, however, MethodHandle can be used:

// Also requires: import java.lang.reflect.Field;
static class Son extends Father {
    void thinking() {
        try {
            MethodType methodType = MethodType.methodType(void.class);
            // IMPL_LOOKUP is a non-public, JDK-internal static field of MethodHandles.Lookup;
            // newer JDKs may block this reflective access
            Field implLookup = MethodHandles.Lookup.class.getDeclaredField("IMPL_LOOKUP");
            implLookup.setAccessible(true);
            ((MethodHandles.Lookup) implLookup.get(null))
                    .findSpecial(GrandFather.class, "thinking", methodType, Father.class)
                    .invoke(this);
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}

We use MethodHandle to simulate the invokespecial instruction, look up the thinking method in GrandFather.class as we choose, and implement our own dispatch logic.

Note: The solution given in the book is:

MethodType methodType = MethodType.methodType(void.class);
MethodHandles.lookup().findSpecial(GrandFather.class, "thinking", methodType, this.getClass())
    .bindTo(this)
    .invoke();

However, I verified that this code does not produce the expected result under JDK 1.7 and 1.8. I found the IMPL_LOOKUP approach by reading the source code of Lookup. For details, see the comment section.

2. Stack-based bytecode interpretation execution engine

For Java, the javac compiler performs lexical analysis, then syntax analysis to build an abstract syntax tree, and then traverses the tree to generate a linear stream of bytecode instructions. This part happens outside the virtual machine; the interpreter is inside the virtual machine.

2.1 Stack-based and register-based

The instruction stream output by the Java compiler is essentially a stack-based instruction set architecture. Most of its instructions are zero-address instructions that rely on the operand stack to work. The alternative is a register-based instruction set architecture, whose instructions rely on registers to work.

If you want to calculate the result of "1+1", the stack-based instruction flow would look like this:

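iconst_1 // push the int constant 1 onto the stack
iconst_1 // push the int constant 1 onto the stack
iadd     // pop the top two ints, add them, and push the result
istore_0 // pop the top of the stack into slot 0 of the local variable table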

If it is based on registers, it might look like this:

mov eax,1 // set the eax register to 1
add eax,1 // add 1 to the eax register; the result stays in eax

The main advantage of a stack-based instruction set is portability. Registers are provided by the hardware, so programs that rely on them are constrained by that hardware. On the other hand, a stack-based instruction set generally needs more instructions than a register architecture to do the same work, and the stack is implemented in memory, so frequent stack access means frequent memory access. Relative to the processor, memory is always the bottleneck for execution speed. Because of both the instruction count and the memory accesses, a stack-based instruction set executes relatively slowly. The instruction sets of all mainstream physical machines are register-based.

2.2 The stack-based interpreter execution process

The sample code from the book is used here. For convenience, instead of drawing diagrams, I note the state of the operand stack and the local variable table in a comment after each instruction. The Java code is as follows:

public int calc() {
    int a = 100;
    int b = 200;
    int c = 300;
    return (a + b) * c;
}

After compiling, view the bytecode with javap. The changes to the operand stack and the local variable table are given in the comments; where the stack is listed, the top of the stack is on the right:

public int calc();
    descriptor: ()I
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=4, args_size=1
         0: bipush        100  // push 100. Stack: 100; locals: 0=this
         2: istore_1           // pop 100 into local variable slot 1. Stack: empty; locals: 0=this,1=100
         3: sipush        200  // push 200. Stack: 200; locals: 0=this,1=100
         6: istore_2           // pop 200 into local variable slot 2. Stack: empty; locals: 0=this,1=100,2=200
         7: sipush        300  // push 300. Stack: 300; locals: 0=this,1=100,2=200
        10: istore_3           // pop 300 into local variable slot 3. Stack: empty; locals: 0=this,1=100,2=200,3=300
        11: iload_1            // push the int in slot 1. Stack: 100; locals: 0=this,1=100,2=200,3=300
        12: iload_2            // push the int in slot 2. Stack: 100,200; locals: 0=this,1=100,2=200,3=300
        13: iadd               // pop the top two values, add them as ints, push the result. Stack: 300; locals: 0=this,1=100,2=200,3=300
        14: iload_3            // push the int in slot 3. Stack: 300,300; locals: 0=this,1=100,2=200,3=300
        15: imul               // pop the top two values, multiply them as ints, push the result. Stack: 90000; locals: 0=this,1=100,2=200,3=300
        16: ireturn            // end the method and return the int on top of the stack to the caller
      LineNumberTable:
        line 8: 0
        line 9: 3
        line 10: 7
        line 11: 11
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      17     0  this   Lcom/demo/Main;
            3      14     1     a   I
            7      10     2     b   I
           11       6     3     c   I

As shown above, this code needs an operand stack of depth 2 (the maximum depth reached while pushing and popping) and 4 slots of local variable space (this, a, b, c).
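The walkthrough above can be replayed with a toy interpreter. This is only a sketch of the conceptual model, not how HotSpot is implemented: instructions are plain strings, the shorthand forms such as istore_1 are written as "istore 1" to keep parsing trivial, and slot 0 (this) is left unused. All names in it are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class CalcInterpreterDemo {
    // The body of calc(), with istore_1 etc. spelled as "istore 1" (a simplification).
    static final String[] CALC = {
        "bipush 100", "istore 1",
        "sipush 200", "istore 2",
        "sipush 300", "istore 3",
        "iload 1", "iload 2", "iadd",
        "iload 3", "imul", "ireturn"
    };

    // Runs the instructions and returns {return value, maximum operand stack depth}.
    static int[] run(String[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int[] locals = new int[4]; // slot 0 would hold `this`; unused in this sketch
        int maxDepth = 0;
        for (String insn : code) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "bipush":
                case "sipush":
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "istore":
                    locals[Integer.parseInt(parts[1])] = stack.pop();
                    break;
                case "iload":
                    stack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "iadd":
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "imul":
                    stack.push(stack.pop() * stack.pop());
                    break;
                case "ireturn":
                    return new int[] {stack.pop(), maxDepth};
                default:
                    throw new IllegalArgumentException("unknown instruction: " + insn);
            }
            maxDepth = Math.max(maxDepth, stack.size());
        }
        throw new IllegalStateException("code ended without ireturn");
    }

    public static void main(String[] args) {
        int[] result = run(CALC);
        System.out.println("result=" + result[0] + ", max stack depth=" + result[1]);
    }
}
```

Running it yields a return value of 90000 and a maximum stack depth of 2, which agrees with the stack=2 value that javap reports.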

For the meaning of each instruction, you can refer to the official document: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html

The execution process above is only a conceptual model. The virtual machine optimizes execution to improve performance, so the actual process may differ greatly from the conceptual model. Both the interpreter and the just-in-time compiler in the virtual machine optimize the input bytecode. In the HotSpot virtual machine, for example, there are many non-standard bytecode instructions starting with "fast_" that merge and replace input bytecode to speed up interpretation, and the just-in-time compiler has even more varied optimizations (introduced in later chapters).

3. Summary

Much of the content above is based on the conceptual model of the Java virtual machine and differs somewhat from what actually happens, but this does not hinder our understanding of how the virtual machine works.

Origin blog.csdn.net/huangzhilin2015/article/details/114467776