"In-depth understanding of the Java virtual machine" reading notes (7)-virtual machine bytecode execution engine (on)

Table of Contents

 

Preface

1. Runtime stack frame structure

1.1 Local variable table

1.2 Operand stack

1.3 Dynamic linking

1.4 Method return address

1.5 Additional information

2. Determining the method to execute

2.1 Resolution

2.2 Dispatching

2.2.1 Static dispatch

2.2.2 Dynamic dispatch

2.2.3 Single dispatch and multiple dispatch

2.2.4 How the virtual machine implements dynamic dispatch


Preface

This chapter mainly describes how the virtual machine determines the version of the method to call and how to execute the method.

1. Runtime stack frame structure

1.1 Local variable table

The local variable table stores the method's parameters and the local variables defined inside the method. At compile time, the max_locals item in the Code attribute of the method table fixes the maximum size of the local variable table the method needs. Capacity is measured in variable slots, the smallest unit. The virtual machine specification does not state exactly how much memory a slot occupies; it only indicates that each slot must be able to hold one value of type boolean, byte, char, short, int, float, reference, or returnAddress. These 8 types all fit in 32 bits or less, but the slot length may vary with the processor, operating system, or virtual machine. If an implementation uses 64 bits of physical memory per slot, it must use alignment and padding so that the slot still behaves, from the outside, like a slot on a 32-bit virtual machine.

Note: The virtual machine specification does not fix the length of the reference type (it may occupy 32 or 64 bits), nor does it specify its structure. In general, a virtual machine implementation should be able to do at least two things through a reference:

  • Directly or indirectly find the starting address (or index) of the object's data in the Java heap

  • Directly or indirectly find the type information, stored in the method area, of the object's data type

The Java language defines only two 64-bit data types: long and double. For these, the virtual machine allocates two consecutive slots, aligned on the high-order side. Because the local variable table lives in the thread-private stack, it does not matter whether reading or writing two consecutive slots is atomic; no data race can arise. This differs from the "non-atomic treatment of long and double" for variables shared between threads, which can cause thread-safety problems.

The virtual machine accesses the local variable table by index. Indices start at 0 and must be smaller than the number of slots in the table. For a 32-bit value, index n refers to slot n. For a 64-bit value, index n means that slots n and n+1 are used together. When two adjacent slots jointly store one 64-bit value, accessing either of them on its own is not allowed in any form.

Note: For non-static (instance) methods, slot 0 of the local variable table is used by default to pass a reference to the object instance the method belongs to; inside the method, this implicit parameter is accessed through the "this" keyword. The remaining parameters follow in declaration order starting at slot 1, and the local variables defined in the method body come after them.
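The slot layout described above can be made concrete with a small class. This is only a sketch: the class and method names are made up for illustration, and the slot numbers in the comments are derived from the layout rules above (they can be checked with `javap -v` after compiling).

```java
class SlotLayout {
    private final long base = 10L;

    // Instance method: slot 0 = this, slot 1 = a, slots 2-3 = b (long),
    // slot 4 = c, slots 5-6 = total (long). max_locals is therefore 7.
    long sum(int a, long b, int c) {
        long total = base + a + b + c;
        return total;
    }

    // Static method: there is no "this", so slot 0 = x and slots 1-2 = y (double).
    static double scale(int x, double y) {
        return x * y;
    }

    public static void main(String[] args) {
        System.out.println(new SlotLayout().sum(1, 2L, 3)); // 16
        System.out.println(scale(2, 1.5));                  // 3.0
    }
}
```

Note how each long or double parameter shifts every later index by two instead of one.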

In addition, in order to save stack frame space, slots can be reused. If the value of the program counter has exceeded the scope of a variable, the slot corresponding to this variable can be used by other variables.

Viewed through the conceptual model, however, slot reuse can affect garbage collection. Suppose a local variable refers to a large object and then goes out of scope. The object is now useless and the GC should be able to reclaim it. But until the slot is actually reused by another variable, it still holds the reference, and since the local variable table is part of the GC Roots, the object cannot be reclaimed. If time-consuming code follows, the large object kept alive this way is a real burden, which is how the "recommendation" to manually assign null to such variables arose. The book's author points out that this advice rests only on the conceptual model of the bytecode execution engine. Interpreted execution usually stays close to that model, but JIT compilation is the main way the virtual machine executes code: the null assignment is eliminated by JIT optimization anyway, and in JIT-compiled code the GC can usually reclaim an object once the variable referring to it is out of scope. There is therefore no need to rely on this trick.
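A minimal sketch of the situation discussed above. The class name and the 16 MB size are arbitrary choices for illustration. Whether the array is actually reclaimed depends on the virtual machine, on the execution mode (interpreter or JIT), and on the fact that System.gc() is only a hint, so no GC output is claimed here; running with `-verbose:gc` lets you observe what your own VM does.

```java
class SlotReuseDemo {
    static int run() {
        {
            byte[] placeholder = new byte[16 * 1024 * 1024];
            placeholder[0] = 1; // touch the array so that it is really used
        }
        // placeholder is out of scope here, but in the conceptual model its
        // slot (slot 0 of this static method) still holds the reference.

        int a = 0; // javac reuses slot 0 for a, which overwrites the stale reference
        System.gc(); // a hint only; the VM may ignore it
        return a;
    }

    public static void main(String[] args) {
        run();
    }
}
```

Removing the `int a = 0;` line leaves the slot untouched, which is the case in which the conceptual model says the array stays reachable.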

Local variables differ from class variables: a local variable has no default value and cannot be used before it is assigned. Using an unassigned local variable is reported as an error at compile time. Even if the bytecode is generated by hand, bypassing the compiler's check, the problem is caught during the bytecode verification phase of class loading.
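The difference can be seen in a few lines. The names below are made up for illustration; the commented-out line is the one javac rejects.

```java
class InitDemo {
    static int classVar; // class variable: receives the default value 0 during class loading

    static int readClassVar() {
        return classVar;
    }

    static int readLocal() {
        int local;       // local variable: no default value
        // return local; // does not compile: variable local might not have been initialized
        local = 42;      // must be assigned before the first read
        return local;
    }

    public static void main(String[] args) {
        System.out.println(readClassVar()); // 0
        System.out.println(readLocal());    // 42
    }
}
```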

1.2 Operand stack

Like an ordinary stack data structure, the operand stack is last in, first out (LIFO). Its maximum depth is written into the max_stack item of the method's Code attribute at compile time and does not change afterwards. While the method executes, bytecode instructions continually push values onto the operand stack and pop values off it. The data types of the stack elements must strictly match what the bytecode instruction sequence expects; the compiler must guarantee this, and it is checked again during class verification (via StackMapTable). For example, when iadd performs integer addition, the two elements on top of the stack must both be of type int.
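To make the push/pop behaviour tangible, here is a toy interpreter for four int instructions. It is a teaching sketch, not how a real virtual machine is implemented: the string-based instruction format and the class name are assumptions made for this example, and real bytecode uses compact opcodes such as iload_0.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyOperandStack {
    // Runs a tiny instruction list against an operand stack and a local variable table.
    static int execute(String[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : code) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "iload":   // push locals[n] onto the stack
                    stack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "iadd":    // pop two ints, push their sum
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "istore":  // pop the top value into locals[n]
                    locals[Integer.parseInt(parts[1])] = stack.pop();
                    break;
                case "ireturn": // pop the top value and return it to the caller
                    return stack.pop();
                default:
                    throw new IllegalArgumentException("unknown instruction: " + insn);
            }
        }
        throw new IllegalStateException("no ireturn reached");
    }

    public static void main(String[] args) {
        // Roughly the sequence for: static int add(int a, int b) { int c = a + b; return c; }
        String[] code = {"iload 0", "iload 1", "iadd", "istore 2", "iload 2", "ireturn"};
        System.out.println(execute(code, new int[] {3, 4, 0})); // 7
    }
}
```

The stack never holds more than two values in this program, which is the number a compiler would record as max_stack for such a method.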

In the conceptual model, two stack frames are completely independent. Most virtual machine implementations, however, optimize by letting two frames overlap: part of the operand stack of the lower frame (the caller) overlaps with part of the local variable table of the upper frame (the callee), so that some data is shared during a method call and no extra copying of arguments is needed.

1.3 Dynamic linking

The constant pool of a class file contains many symbolic references, and the method invocation instructions in the bytecode take a symbolic reference to a method in the constant pool as their operand. Some of these symbolic references are converted to direct references during class loading or on first use; this conversion is called static resolution. The rest are converted to direct references at run time, each time they are executed; this part is called dynamic linking.

1.4 Method return address

A method can exit in two ways:

  • The execution engine encounters one of the return bytecode instructions. Whether there is a return value, and of which type, depends on the return instruction encountered. This way of exiting is called normal completion.
  • An exception occurs while the method executes and is not handled in the method body (no matching handler is found in the method's exception table). This way of exiting is called abrupt (exceptional) completion, and it produces no return value for the caller.

No matter how a method exits, execution must return to the place where the method was called before the program can continue. Generally, when a method exits normally, the caller's program counter value can serve as the return address, and it is stored in the stack frame. When a method exits through an exception, the return address is determined through the exception handler table instead.

1.5 Additional information

The virtual machine specification allows implementations to add information not described in the specification to the stack frame, such as debugging-related information. This part depends entirely on the specific virtual machine implementation.

2. Determining the method to execute

2.1 Resolution

Compiling to a class file does not include the linking step of traditional compilation. All method calls are stored in the class file as symbolic references only, not as the entry address (direct reference) of the method in the run-time memory layout.

Some symbolic references are converted to direct references during the resolution stage of class loading. This is possible only when the method has a determinable version before the program runs and that version cannot change at run time. The methods that meet this condition are mainly static methods and private methods. Neither kind can be overridden through inheritance or any other means, so both are suitable for resolution during class loading.

The Java virtual machine provides five method invocation bytecode instructions:

  • invokestatic: invokes a static method

  • invokespecial: invokes an instance constructor (<init>), a private method, or a superclass method (super.method(...))

  • invokevirtual: invokes a virtual method

  • invokeinterface: invokes an interface method; the object implementing the interface is determined at run time

  • invokedynamic: first resolves, at run time, the method referenced by the call site specifier, and then executes that method

Methods invoked by invokestatic and invokespecial can have their symbolic references resolved to direct references during the resolution stage of class loading. These are static methods, private methods, superclass methods, and instance constructors (<init>); they are called non-virtual methods. All other methods are called virtual methods.

Note:

1. A final method is invoked with invokevirtual, but because it cannot be overridden there is only one possible version, so it is also a non-virtual method.

2. A superclass method counts as non-virtual, invoked with invokespecial, only when it is called through super. If a subclass overrides a superclass method, the method in the subclass belongs to the subclass itself and has nothing to do with the superclass method.
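The following sketch puts one call of each kind into a single class. The type names (Greeter, Base, InvokeDemo) are made up for illustration, and the instruction named in each comment is what javac is expected to emit; check with `javap -c` on your own JDK. One known deviation from the list above: javac for JDK 11 and later typically emits invokevirtual rather than invokespecial for private instance methods called within the same class.

```java
import java.util.function.Supplier;

interface Greeter {
    String greet();
}

class Base implements Greeter {
    public String greet() { return "base"; }
}

class InvokeDemo extends Base {
    InvokeDemo() {
        super();                                  // invokespecial Base.<init>
    }

    static String helper() { return "static"; }

    private String secret() { return "private"; }

    @Override
    public String greet() {
        return "sub+" + super.greet();            // invokespecial Base.greet
    }

    static String run() {
        InvokeDemo d = new InvokeDemo();          // invokespecial InvokeDemo.<init>
        Greeter g = d;
        Supplier<String> s = () -> "lambda";      // invokedynamic creates the lambda object
        return String.join(",",
                helper(),                         // invokestatic
                d.secret(),                       // invokespecial (older javac) or invokevirtual (JDK 11+)
                d.greet(),                        // invokevirtual
                g.greet(),                        // invokeinterface
                s.get());                         // invokeinterface Supplier.get
    }

    public static void main(String[] args) {
        System.out.println(run()); // static,private,sub+base,sub+base,lambda
    }
}
```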

2.2 Dispatching

Besides resolution, there is another way to determine which version of a method to execute, and it is the one used for virtual methods: dispatch. Dispatch can be static or dynamic, and, depending on how many arguments it is based on, single or multiple. Combining the two pairs gives static single dispatch, static multiple dispatch, dynamic single dispatch, and dynamic multiple dispatch.

2.2.1 Static dispatch

The typical application of static dispatch is method overloading. English technical documents call it "Method Overload Resolution" (the book explains that Chinese-language materials generally translate this behavior as "static dispatch"). If class A extends class B, then in the statement B b = new A(); B is called the static type of variable b and A is called its actual type. The static type used at a call site can be changed, for example by a cast: the static type of b is B, but the static type of the expression (A) b is A. The static type of the variable itself (B) does not change, and the final static type is always known at compile time. The actual type can only be determined at run time; the compiler does not know what the actual type of an object will be.

When handling overloading, the virtual machine (more precisely, the compiler) uses the static type of the argument, not its actual type, as the basis for its decision, and as noted above the static type is known at compile time. The compiler therefore decides at compile time which overload to use according to the static type of the argument. After selecting the overload, it writes the symbolic reference to that method as the operand of the method invocation bytecode instruction. In the following sample code there are 3 overloads of say() (note that the receiver, main, is the same in every call):

public class Main {

    static class A {
    }

    static class B extends A {
    }

    static class C extends B {
    }

    public void say(A a) {
        System.out.println("A");
    }

    public void say(B b) {
        System.out.println("B");
    }

    public void say(C c) {
        System.out.println("C");
    }

    public static void main(String[] args) throws Exception {
        Main main = new Main();
        B os = new C();
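        main.say(os); // static type B, actual type C: the overload chosen is say(B b)
        main.say((A) os); // the cast makes the static type A, actual type C: the overload chosen is say(A a)
        main.say((C) os); // the cast makes the static type C, actual type C: the overload chosen is say(C c)
        // Output: B A C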
    }
}

In addition, although the compiler can determine the overload to call, in many cases that overload is not "unique"; the compiler can only pick the "most suitable" one. The main reason for this fuzziness is that a literal needs no declaration and therefore has no explicit static type; its static type can only be inferred through the rules of the language.

For example, suppose the method say(...) has 7 overloads: say(char arg), say(int arg), say(long arg), say(Character arg), say(Serializable arg), say(Object arg), say(char... arg). If the program calls say('a'), the literal 'a' is used directly without a declaration, so it has no explicit static type. Which overload does the compiler choose? The candidates are listed here in order of priority; each one is chosen only if all the earlier overloads are absent.

  • 'a' is first of all a char: matches say(char arg)

  • It can also represent the number 97 (its character code, as in ASCII): matches say(int arg)

  • After conversion to 97, it can be widened further to the long value 97L: matches say(long arg)

  • It can also be autoboxed to Character: matches say(Character arg)

  • Character implements the Serializable interface: matches say(Serializable arg). (Interfaces implemented directly or indirectly all have the same priority. If several overloads each accept a different one of these interfaces, the compiler reports the call as ambiguous and refuses to compile.)

  • Character also inherits from Object: matches say(Object arg). (With several ancestor classes, the search goes up the inheritance chain from bottom to top; the higher the class, the lower its priority.)

  • Finally, it can match the variable-length parameter: matches say(char... arg)

The process above is how the static dispatch target is selected at compile time. It is also the essence of how the Java language implements method overloading.
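The priority order can be checked by removing overloads step by step. In the sketch below each nested class (the names are made up for illustration) carries a smaller set of say(...) overloads, and every call passes the same literal 'a'.

```java
import java.io.Serializable;

class OverloadPriority {
    static class All {
        static String say(char arg)         { return "char"; }
        static String say(int arg)          { return "int"; }
        static String say(long arg)         { return "long"; }
        static String say(Character arg)    { return "Character"; }
        static String say(Serializable arg) { return "Serializable"; }
        static String say(Object arg)       { return "Object"; }
        static String say(char... arg)      { return "char..."; }
    }

    static class NoChar {          // say(char) removed
        static String say(int arg)          { return "int"; }
        static String say(long arg)         { return "long"; }
        static String say(Character arg)    { return "Character"; }
        static String say(Object arg)       { return "Object"; }
    }

    static class NoPrimitives {    // all primitive overloads removed
        static String say(Character arg)    { return "Character"; }
        static String say(Serializable arg) { return "Serializable"; }
        static String say(Object arg)       { return "Object"; }
        static String say(char... arg)      { return "char..."; }
    }

    static class NoCharacter {     // say(Character) removed as well
        static String say(Serializable arg) { return "Serializable"; }
        static String say(Object arg)       { return "Object"; }
        static String say(char... arg)      { return "char..."; }
    }

    static class ObjectOrVarargs { // only the two lowest-priority overloads left
        static String say(Object arg)       { return "Object"; }
        static String say(char... arg)      { return "char..."; }
    }

    public static void main(String[] args) {
        System.out.println(All.say('a'));             // char
        System.out.println(NoChar.say('a'));          // int
        System.out.println(NoPrimitives.say('a'));    // Character
        System.out.println(NoCharacter.say('a'));     // Serializable
        System.out.println(ObjectOrVarargs.say('a')); // Object
    }
}
```

All of these choices are made by javac; nothing about them is decided at run time.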

Note: Resolution and dispatch are not mutually exclusive. They select and determine the target method at different levels. For example, calls to static methods are resolved to direct references during class loading, yet static methods can also be overloaded, and choosing among those overloads is done through static dispatch.

2.2.2 Dynamic dispatch

The typical application of dynamic dispatch is method overriding. When a method is overridden, the Java virtual machine dispatches on the actual type of the receiver to choose the version to execute. Consider the following code:

public class Main {

    static class A {
        public void say() {
            System.out.println("A");
        }
    }

    static class B extends A {
        public void say() {
            System.out.println("B");
        }
    }

    static class C extends A {
        public void say() {
            System.out.println("C");
        }
    }

    public static void main(String[] args) throws Exception {
        A b = new B();
        A c = new C();
        b.say();
        c.say();
        // Output: B C
    }
}

After compilation, the calls b.say() and c.say() use the same invocation instruction (invokevirtual here) with the same operand (the symbolic reference to A.say()), yet the methods finally executed differ (B's version for one, C's for the other). The reason lies in the run-time lookup performed by the invokevirtual instruction:

  1. Find the actual type of the object referenced by the element on top of the operand stack, and call it M
  2. If type M declares a method whose descriptor and simple name match those in the constant pool entry, check access permission. If the check passes, return the direct reference to this method and end the lookup; if it fails, throw IllegalAccessError
  3. Otherwise, repeat step 2 for each superclass of M in turn, going up the inheritance hierarchy
  4. If no suitable method is found, throw AbstractMethodError

To execute b.say(), the virtual machine first pushes the reference to the b object onto the operand stack and then executes invokevirtual. The b object is called the receiver of the say() method. As step 1 shows, the actual type of the receiver is determined at run time, B in this case; c.say() works the same way with C. The A.say() symbolic reference in the two calls is therefore resolved to different direct references. This is the essence of method overriding in Java. Dispatch that chooses the method version at run time according to the actual type is called dynamic dispatch.
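The lookup steps can be imitated with reflection. This is only a sketch of the search order, under stated simplifications: it matches on name and parameter types rather than on the full descriptor, it skips the access check of step 2, and the class and method names are made up for illustration. A real virtual machine does not use reflection for this.

```java
import java.lang.reflect.Method;

class VirtualLookupSketch {
    static class A { public String say() { return "A"; } }
    static class B extends A { @Override public String say() { return "B"; } }
    static class D extends A { } // inherits say() without overriding it

    // Starts at the receiver's actual class (step 1) and walks up the superclass
    // chain (steps 2 and 3) until a class declaring the method is found.
    static Class<?> findDeclaringClass(Object receiver, String name, Class<?>... paramTypes) {
        for (Class<?> c = receiver.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod(name, paramTypes);
                return m.getDeclaringClass();
            } catch (NoSuchMethodException e) {
                // not declared in this class; continue with its superclass
            }
        }
        throw new AbstractMethodError(name); // step 4: nothing suitable found
    }

    public static void main(String[] args) {
        A b = new B();
        A d = new D();
        System.out.println(findDeclaringClass(b, "say").getSimpleName()); // B
        System.out.println(findDeclaringClass(d, "say").getSimpleName()); // A
    }
}
```

The static type of both variables is A, yet the search starts from B and D respectively, which is the point of step 1.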

2.2.3 Single dispatch and multiple dispatch

The receiver of a method and the method's parameters are collectively called the method's arguments, meaning here the quantities on which dispatch is based. Depending on how many of these the dispatch is based on, it is either single dispatch or multiple dispatch. Return to the static dispatch example in 2.2.1, in which the receiver main was the same in every call. The code is now adjusted:

public class Main {

    static class A {
    }

    static class B extends A {
    }

    static class C extends B {
    }

    public void say(A a) {
        System.out.println("A");
    }

    public void say(B b) {
        System.out.println("B");
    }

    public void say(C c) {
        System.out.println("C");
    }

    public static void main(String[] args) throws Exception {
        Main main = new Main();
        Main superMain = new Super();
        B os = new C();
        main.say(os);
        superMain.say((A) os);
        // Output: B S-A
    }
}

class Super extends Main {
    public void say(A a) {
        System.out.println("S-A");
    }

    public void say(B b) {
        System.out.println("S-B");
    }

    public void say(C c) {
        System.out.println("S-C");
    }
}

Consider the calls main.say(os) and superMain.say((A) os).

  • First, the compiler's choice at compile time, that is, the static dispatch process:

Here the target method is selected on two grounds: the static type of the receiver (Main or Super) and the static type of the method argument (A, B, or C). The static types of both receivers, main and superMain, are Main, and the static types of the arguments are B and A respectively. The two invokevirtual instructions generated therefore take as operands the constant pool symbolic references to Main.say(B) and Main.say(A). Because the selection is based on two arguments, static dispatch in the Java language is multiple dispatch.

  • Next, the virtual machine's choice at run time, that is, the dynamic dispatch process:

From Section 2.2.2 and the static dispatch result above, we know that when the invokevirtual instructions for main.say(os) and superMain.say((A) os) execute, the method signatures have already been fixed by static dispatch: say(B) and say(A) respectively. At this point the virtual machine does not care about the static or actual type of the argument. Only the actual type of the receiver (Main or Super) affects the choice of method version, so there is just one argument to base the selection on, and dynamic dispatch in the Java language is single dispatch.

Therefore, the Java language at present is a static multiple-dispatch, dynamic single-dispatch language.
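One consequence worth checking is that the actual type of an argument plays no role at run time. The sketch below uses made-up class names and returns strings instead of printing so that the result is easy to inspect.

```java
class DispatchDemo {
    static class A { }
    static class B extends A { }
    static class C extends B { }

    static class Base {
        String say(A a) { return "A"; }
        String say(B b) { return "B"; }
        String say(C c) { return "C"; }
    }

    static class Sub extends Base {
        @Override String say(A a) { return "S-A"; }
        @Override String say(B b) { return "S-B"; }
        @Override String say(C c) { return "S-C"; }
    }

    static String run() {
        Base receiver = new Sub(); // static type Base, actual type Sub
        B arg = new C();           // static type B, actual type C
        // Compile time: say(B) is chosen from the two static types (Base, B).
        // Run time: only the receiver's actual type (Sub) is consulted. The
        // argument's actual type (C) is ignored, so the result is not "S-C".
        return receiver.say(arg);
    }

    public static void main(String[] args) {
        System.out.println(run()); // S-B
    }
}
```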

2.2.4 How the virtual machine implements dynamic dispatch

Dynamic dispatch is a very frequent operation, and selecting a method version requires searching the class's method metadata at run time for a suitable target. For performance, the virtual machine therefore applies an optimization: it builds a virtual method table (vtable) for the class in the method area, and correspondingly an interface method table (itable) that is used when invokeinterface executes. The virtual method table stores the actual entry address of each method.

If a subclass does not override a method, the entry in the subclass's virtual method table is the same as the parent's entry for that method, and both point to the parent's implementation. If the subclass overrides the method, the entry in the subclass's table is replaced with the address of the subclass's implementation. With this redundant storage, finding the target method at run time no longer requires searching each superclass of the object in turn.

Methods with the same signature should also have the same index in the virtual method tables of parent and subclass. Then, when the type changes, only the table being consulted changes, and the required entry address can be fetched from the other table at the same index. The method table is generally initialized during the linking phase of class loading: after preparing the initial values of the class variables, the virtual machine also initializes the class's method table.
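The table layout can be modelled in ordinary Java. This is a toy model under stated assumptions: a list of Supplier objects stands in for entry addresses, and the index constants and method names are invented for the example. It shows the two properties described above, the shared entry for a method that is not overridden and the identical index in both tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class VtableSketch {
    static final int SAY = 0;  // hypothetical table index of say()
    static final int NAME = 1; // hypothetical table index of name()

    // Parent table: every entry points at the parent's implementation.
    static List<Supplier<String>> parentTable() {
        List<Supplier<String>> table = new ArrayList<>();
        table.add(() -> "Parent.say");  // index SAY
        table.add(() -> "Parent.name"); // index NAME
        return table;
    }

    // Child table: starts as a copy of the parent's entries; only the
    // overridden method's entry is replaced, at the same index.
    static List<Supplier<String>> childTable() {
        List<Supplier<String>> table = new ArrayList<>(parentTable());
        table.set(SAY, () -> "Child.say");
        return table;
    }

    // A virtual call becomes an indexed load from the receiver's table,
    // with no walk up the class hierarchy.
    static String invoke(List<Supplier<String>> table, int index) {
        return table.get(index).get();
    }

    public static void main(String[] args) {
        System.out.println(invoke(parentTable(), SAY));  // Parent.say
        System.out.println(invoke(childTable(), SAY));   // Child.say
        System.out.println(invoke(childTable(), NAME));  // Parent.name
    }
}
```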

Besides method tables, when conditions permit the virtual machine also uses two more aggressive optimizations to obtain higher performance: Inline Cache, and Guarded Inlining based on Class Hierarchy Analysis (CHA).

Origin blog.csdn.net/huangzhilin2015/article/details/114437682