Understanding the JVM in Depth, Part 5: The Virtual Machine Bytecode Execution Engine

1 Overview

The execution engine is one of the core components of the Java virtual machine. A physical machine's execution engine is built directly on the processor, hardware, instruction set and operating system. A virtual machine's execution engine, by contrast, is implemented in software. It can therefore define its own instruction set and engine structure, and it can execute instruction set formats that the hardware does not support directly.

Seen from the outside, the execution engines of all Java virtual machines look the same. The input is a bytecode file, the processing is equivalent to parsing and executing that bytecode, and the output is the execution result. This section explains the virtual machine's method invocation and bytecode execution from the perspective of the conceptual model.

2 Runtime stack frame structure

A stack frame is the data structure the virtual machine uses to support method invocation and method execution. It is the element of the virtual machine stack (Virtual Machine Stack), which is one of the virtual machine's runtime data areas.

A stack frame stores the method's local variable table, operand stack, dynamic linking information and method return address, among other things. Each method call, from invocation to completion, corresponds to one stack frame being pushed onto the virtual machine stack and later popped off it. The size of the local variable table and the maximum depth of the operand stack are fixed when the code is compiled, and they are written into the Code attribute of the method.
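The following sketch makes the push and pop of frames observable from Java code. It assumes Java 9 or later for the `StackWalker` API. The class and method names (`FrameDepth`, `depth`, `depthInCallee`, `measure`) are illustrative only. The absolute frame count depends on who calls `measure()`, so the sketch compares only the differences.

```java
class FrameDepth {
    // Number of frames on the current thread's stack, as reported by StackWalker (Java 9+).
    static long depth() {
        return StackWalker.getInstance().walk(frames -> frames.count());
    }

    // Calling this pushes one extra frame (its own) before depth() runs.
    static long depthInCallee() {
        return depth();
    }

    static long[] measure() {
        long here = depth();             // stack: depth <- measure <- ...
        long inCallee = depthInCallee(); // stack: depth <- depthInCallee <- measure <- ...
        long afterReturn = depth();      // depthInCallee's frame has been popped again
        return new long[] {here, inCallee, afterReturn};
    }

    public static void main(String[] args) {
        long[] d = measure();
        System.out.println(d[1] - d[0]); // 1: one extra frame while the callee is active
        System.out.println(d[2] - d[0]); // 0: that frame is gone once the callee returns
    }
}
```

StackWalker reports logical Java frames even when the JIT has inlined a call, so the differences stay at 1 and 0 whether or not the code is compiled.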

The conceptual structure of the stack frame is shown in the figure below:

Figure: Conceptual structure of a stack frame

2.1 Local variable table

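The local variable table is a set of storage slots for variable values. It holds the method parameters and the local variables defined inside the method.
Its capacity is measured in variable slots (Variable Slot), which are the smallest unit. The maximum number of slots a method needs is fixed at compile time and recorded in the max_locals item of the method's Code attribute.
A Slot can hold any data type of 32 bits or fewer: boolean, byte, char, short, int, float, reference, and returnAddress. The reference type represents a reference to an object instance. The returnAddress type points to the address of a bytecode instruction and was used by the old jsr/ret instructions. It is rarely seen today and can be ignored here.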

The Java language explicitly defines only two 64-bit data types, long and double. For these, the virtual machine allocates two consecutive Slots in a high-order-aligned manner.

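The virtual machine accesses the local variable table by index, and valid indexes run from 0 up to the maximum number of Slots minus one. For a 32-bit data type, index n means the nth Slot is used. For a 64-bit data type, Slots n and n+1 are both used.
When an instance (non-static) method is invoked, Slot 0 holds a reference to the object instance the method belongs to, which the method reaches through the keyword this. The method parameters then occupy Slots in declaration order starting at Slot 1, and the local variables defined in the method body come after them. In a static method, the parameters start at Slot 0.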
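The Slot rules above can be checked by computing a method's entry layout from its reflected parameter types. This is a minimal sketch that assumes only the rules stated in this section: `this` goes in Slot 0 for instance methods, and long/double take two Slots. It does not read max_locals or any other real class-file data. `SlotLayout`, `describe`, and `layoutOf` are hypothetical helpers.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class SlotLayout {
    // Hypothetical helper: lists which Slots hold the receiver and each parameter
    // on method entry, as "type@slot" or "type@first-second" for 64-bit types.
    static List<String> describe(Method m) {
        List<String> slots = new ArrayList<>();
        int next = 0;
        if (!Modifier.isStatic(m.getModifiers())) {
            slots.add("this@" + next); // instance methods: receiver in Slot 0
            next += 1;
        }
        for (Class<?> p : m.getParameterTypes()) {
            boolean wide = p == long.class || p == double.class; // 64-bit: two Slots
            String where = wide ? next + "-" + (next + 1) : String.valueOf(next);
            slots.add(p.getSimpleName() + "@" + where);
            next += wide ? 2 : 1;
        }
        return slots;
    }

    // Finds a method declared in this class by name and describes its layout.
    static List<String> layoutOf(String methodName) {
        for (Method m : SlotLayout.class.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                return describe(m);
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    // Sample methods whose entry layout we inspect.
    public void instanceMethod(int a, long b, Object c) {}
    public static void staticMethod(double d, boolean e) {}

    public static void main(String[] args) {
        System.out.println(layoutOf("instanceMethod")); // [this@0, int@1, long@2-3, Object@4]
        System.out.println(layoutOf("staticMethod"));   // [double@0-1, boolean@2]
    }
}
```

Locals declared inside the method body would continue from the next free Slot. Because Slots can be reused, as the next paragraph explains, the real max_locals can be smaller than a naive count.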

To save stack frame space, local variable Slots can be reused. A variable defined in the method body does not necessarily have a scope that covers the whole method body. Once the current bytecode PC counter value is beyond a variable's scope, that variable's Slot can be given to another variable. This design has side effects. For example, Slot reuse can in some cases directly affect the system's garbage collection behavior: a Slot that still holds a reference to an out-of-scope object keeps that object reachable until the Slot is overwritten.

2.2 Operand stack

The operand stack (Operand Stack), often simply called the operation stack, is a last-in, first-out (LIFO) stack. Like the local variable table, its maximum depth is fixed at compile time and recorded in the max_stack item of the Code attribute. When a method starts executing, its operand stack is empty. As the method runs, bytecode instructions write values to the operand stack and read values from it, which are the push and pop operations.

Figure: Operand stack

In the conceptual model, two stack frames of an active thread are completely independent of each other. Most virtual machine implementations optimize this, however. They let part of the caller frame's operand stack overlap with part of the callee frame's local variable table. This lets the two frames share some data during a method call, so the arguments do not have to be copied.

2.3 Dynamic linking

Each stack frame contains a reference, held in the runtime constant pool, to the method the frame belongs to. This reference is kept to support dynamic linking during method invocation.

Method invocation instructions in bytecode take a symbolic reference to a method in the constant pool as their argument. Some of these symbolic references are converted into direct references during class loading or the first time they are used, and this conversion is called static resolution. The rest are converted into direct references during each execution, and this part is called dynamic linking.

2.4 Method return address

Once a method starts executing, there are only two ways to exit it:

  • In the first, the execution engine encounters a bytecode instruction that returns from the method. A return value may be passed back to the caller, and the return instruction used determines whether there is one and what its type is. This exit is called normal method invocation completion (Normal Method Invocation Completion).

  • In the second, an exception occurs during execution and the method body does not handle it, meaning the method's exception handler table has no matching handler. This applies whether the exception was raised inside the virtual machine or by an athrow instruction. The method then exits, and this exit is called abrupt method invocation completion (Abrupt Method Invocation Completion).
    Note: This kind of exit does not produce any return value for the caller.

Whichever way a method exits, control must return to the point where the method was invoked before the program can continue. When the method returns, some information may need to be kept in the stack frame to help restore the caller's execution state. Generally, when a method exits normally, the value of the caller's PC counter can serve as the return address, and this value is likely to be saved in the stack frame. When a method exits abruptly, the exception handler table determines the return address, and this information is generally not saved in the stack frame.

Exiting a method is essentially popping the current stack frame. The operations performed on exit may therefore include:
- restoring the caller's local variable table and operand stack;
- pushing the return value, if any, onto the operand stack of the caller's frame;
- adjusting the PC counter to point to the instruction after the method invocation instruction.
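The two exit paths can be seen from the caller's side in a short program. The class and method names here (`MethodExit`, `normal`, `abrupt`, `call`) are illustrative. The comments describe what happens to the frames according to the model above. The caller never receives a value from `abrupt()`, and execution continues in the caller's handler.

```java
class MethodExit {
    // Completes normally: the ireturn instruction hands 42 back, and the value
    // ends up on the caller's operand stack.
    static int normal() {
        return 42;
    }

    // Completes abruptly: this method has no handler for the exception, so its
    // frame is popped without producing any return value.
    static int abrupt() {
        throw new IllegalStateException("no handler in this method");
    }

    // The caller's exception table has a matching handler, so execution resumes
    // in its catch block instead of after the call.
    static String call() {
        int x = normal();
        try {
            int y = abrupt(); // y is never assigned
            return "unreachable " + y;
        } catch (IllegalStateException e) {
            return "normal=" + x + ", abrupt threw: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(call()); // normal=42, abrupt threw: no handler in this method
    }
}
```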

2.5 Additional information

The virtual machine specification allows implementations to add custom information that the specification does not describe to the stack frame, such as debugging-related information.

3 Method invocation

The method invocation phase has one purpose: to determine which version of a method is called, that is, which method. It does not involve how the method runs internally. While a program is running, method invocation is the most common and most frequent operation.

Every method call stored in the Class file is only a symbolic reference. It is not the method's entry address in the actual runtime memory layout, which is the direct reference mentioned earlier. Depending on the method, the conversion to a direct reference happens either during class loading or only at runtime.

3.1 Resolution

Some methods are "known at compile time and immutable at runtime", mainly static methods and private methods. Neither kind can be overridden through inheritance or any other means. Their symbolic references are therefore converted into direct references (entry addresses) during the resolution phase of class loading. Invoking this kind of method is called resolution (Resolution).

The Java Virtual Machine provides five bytecode instructions for invoking methods:
  • invokestatic: invokes static methods.
  • invokespecial: invokes instance initializers (<init>), private methods, and superclass methods.
  • invokevirtual: invokes all virtual methods.
  • invokeinterface: invokes interface methods. The object that implements the interface is determined at runtime.
  • invokedynamic: first resolves at runtime the method referenced by the call site specifier, then executes it. The dispatch logic of the other four invocation instructions is fixed inside the Java virtual machine. The dispatch logic of invokedynamic is determined by a bootstrap method that the user sets up.
Methods called with invokestatic and invokespecial, together with final methods (which are called with invokevirtual), can all be resolved during class loading. They are called non-virtual methods.
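To connect the five instructions to ordinary source code, the class below marks each call with the instruction javac typically emits for it. The comments are a sketch. The exact bytecode can differ between compiler versions, and you can check it yourself with `javap -c InvokeKinds` and `javap -c 'InvokeKinds$Derived'`. Private methods are left out on purpose, because newer compilers (JDK 11 nestmates) may call some of them with invokevirtual. All class and method names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class InvokeKinds {
    static class Base {
        String greet() { return "base"; }
    }

    static class Derived extends Base {
        Derived() {
            super();                                 // invokespecial Base.<init>
        }

        @Override
        String greet() {
            return "derived/".concat(super.greet()); // super.greet(): invokespecial Base.greet
        }
    }

    static int twice(int x) {
        return 2 * x;
    }

    static String run() {
        int n = twice(21);                    // invokestatic InvokeKinds.twice
        Base b = new Derived();               // new, then invokespecial Derived.<init>
        List<String> out = new ArrayList<>(); // new, then invokespecial ArrayList.<init>
        out.add(String.valueOf(n));           // invokestatic String.valueOf, invokeinterface List.add
        out.add(b.greet());                   // invokevirtual Base.greet, dispatched to Derived.greet
        Supplier<String> s = () -> "lambda";  // invokedynamic creates the Supplier (Java 8+)
        out.add(s.get());                     // invokeinterface Supplier.get
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // [42, derived/base, lambda]
    }
}
```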

3.2 Dispatch

The dispatch process reveals some of the most basic manifestations of polymorphism, for example how "overloading" and "overriding" are implemented in the Java virtual machine.

1 Static dispatch

Every dispatch action that relies on the static type to locate the version of a method to execute is called static dispatch. Static dispatch happens at compile time, so the virtual machine does not actually perform it.

The most typical application of static dispatch is method overloading.

package jvm8_3_2;

public class StaticDispatch {
    static abstract class Human {

    }

    static class Man extends Human {

    }

    static class Woman extends Human {

    }

    public void sayhello(Human guy) {
        System.out.println("Human guy");

    }

    public void sayhello(Man guy) {
        System.out.println("Man guy");

    }

    public void sayhello(Woman guy) {
        System.out.println("Woman guy");
    }

    public static void main(String[] args) {
        Human man = new Man();
        Human woman = new Woman();
        StaticDispatch staticDispatch = new StaticDispatch();
        staticDispatch.sayhello(man);// Human guy
        staticDispatch.sayhello(woman);// Human guy
    }

}

Output:

Human guy

Human guy

Why does this happen?

In Human man = new Man();, Human is called the static type (Static Type) of the variable and Man is called its actual type (Actual Type).
The difference is that the compiler knows the static type, while the actual type is not determined until runtime. The static type can still be changed explicitly with a cast.
When choosing among overloads, the compiler decides by the static types of the arguments, not by their actual types. At compile time, javac therefore picks sayhello(Human) as the call target. It writes the symbolic reference to that method into the operands of the two invokevirtual instructions in main().
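Casting changes only the static type, which gives a quick way to confirm that the overload is picked at compile time. The sketch below is a variant of the StaticDispatch example. The overloads return strings instead of printing so that the choice can be checked, and the class name `StaticDispatchCast` is illustrative.

```java
class StaticDispatchCast {
    static abstract class Human {}
    static class Man extends Human {}
    static class Woman extends Human {}

    // Overloads return a label instead of printing, so the chosen version is easy to check.
    String sayhello(Human guy) { return "Human guy"; }
    String sayhello(Man guy) { return "Man guy"; }
    String sayhello(Woman guy) { return "Woman guy"; }

    static String demo() {
        Human man = new Man(); // static type Human, actual type Man
        StaticDispatchCast sd = new StaticDispatchCast();
        String byStaticType = sd.sayhello(man);  // overload chosen from static type Human
        String byCast = sd.sayhello((Man) man);  // the cast makes the static type Man
        return byStaticType + " / " + byCast;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Human guy / Man guy
    }
}
```

The object is the same Man instance in both calls. Only the compile-time type of the argument expression changed, and so did the overload.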

2 Dynamic dispatch

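Dynamic dispatch is the dispatch process that determines which version of a method to execute based on the actual type at runtime. Its most typical application is method overriding. It works because the invokevirtual instruction resolves its target at runtime, in four steps:
  1. It finds the actual type of the receiver object on top of the operand stack.
  2. It searches that type for a method whose simple name and descriptor match the symbolic reference, and checks access if one is found.
  3. If there is no match, it searches the superclasses from the bottom up.
  4. If there is still no match, it throws java.lang.AbstractMethodError.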

package jvm8_3_2;

public class DynamicDispatch {

    static abstract class Human {
        abstract void sayhello();
    }

    static class Man extends Human {

        @Override
        void sayhello() {
            System.out.println("man");
        }

    }

    static class Woman extends Human {

        @Override
        void sayhello() {
            System.out.println("woman");
        }

    }

    public static void main(String[] args) {
        Human man = new Man();
        Human woman = new Woman();
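        man.sayhello();   // actual type Man: prints "man"
        woman.sayhello(); // actual type Woman: prints "woman"
        man = new Woman();
        man.sayhello();   // static type still Human, actual type now Woman: prints "woman"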
    }

}

Output:

man

woman

woman

3 Single dispatch and multiple dispatch

A method's receiver and its arguments are together called the method's quantities. Dispatch is divided into single dispatch and multiple dispatch according to how many quantities it is based on. Single dispatch selects the target method based on one quantity. Multiple dispatch selects it based on more than one.

During static dispatch, Java selects the target method based on two things: the static type of the receiver and the static types of the method arguments. Because the choice depends on two quantities, static dispatch in Java is multiple dispatch.

By the time dynamic dispatch happens at runtime, the compiler has already fixed the signature of the target method, including its parameter types. The virtual machine only needs to determine the actual type of the method's receiver. Because the choice depends on only one quantity, dynamic dispatch in Java is single dispatch.
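Both stages can be seen in one example. Overloaded methods are selected by argument type at compile time, and overridden versions are selected by receiver type at runtime. The argument types `QQ` and `_360` and the classes `Father` and `Son` are hypothetical, and `MultiDispatchDemo` is an illustrative name.

```java
class MultiDispatchDemo {
    static class QQ {}
    static class _360 {}

    static class Father {
        String hardChoice(QQ arg) { return "father choose qq"; }
        String hardChoice(_360 arg) { return "father choose 360"; }
    }

    static class Son extends Father {
        @Override String hardChoice(QQ arg) { return "son choose qq"; }
        @Override String hardChoice(_360 arg) { return "son choose 360"; }
    }

    static String demo() {
        Father father = new Father();
        Father son = new Son();
        // Compile time (static, multiple dispatch): receiver static type Father and
        // argument type _360 select the symbolic reference Father.hardChoice(_360).
        // Runtime (dynamic, single dispatch): only the receiver's actual type, Father, matters.
        String first = father.hardChoice(new _360());
        // Compile time selects Father.hardChoice(QQ); at runtime the receiver's
        // actual type is Son, so Son's override runs.
        String second = son.hardChoice(new QQ());
        return first + "; " + second;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // father choose 360; son choose qq
    }
}
```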

Note: As of JDK 1.7, Java is still a language with static multiple dispatch and dynamic single dispatch. It may support dynamic multiple dispatch in the future.

4 Implementation of dynamic dispatch in the virtual machine

Dynamic dispatch is a very frequent action. Choosing a method version during dynamic dispatch requires searching the class's method metadata for a suitable target method. For performance reasons, virtual machine implementations usually avoid running such frequent searches directly and use an optimization instead.

The most common "stable optimization" is to create a virtual method table (Virtual Method Table, also called vtable) for the class in the method area. Correspondingly, invokeinterface uses an interface method table (Interface Method Table, also called itable). An index lookup in the virtual method table replaces the metadata search and improves performance. The principle is similar to C++'s virtual function table.

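The virtual method table stores the actual entry address of each method. If the subclass does not override a method, that method's entry in the subclass's table holds the same address as in the parent's table, pointing to the parent's implementation. If the subclass overrides it, the entry is replaced with the address of the subclass's implementation. A method with the same signature has the same index in the parent's and the subclass's tables. When the receiver's type changes, only the table being searched changes, not the index. The virtual method table is generally initialized during the linking phase of class loading, after the initial values of the class variables have been prepared.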
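The table-building rule described above fits in a toy model. This is a sketch under simplifying assumptions: each slot stores a signature and the name of the class that implements it, standing in for a real entry address. Only two Object methods are modelled. `VTableSketch`, `Entry`, and `build` are hypothetical names, not HotSpot structures.

```java
import java.util.ArrayList;
import java.util.List;

class VTableSketch {
    // One vtable slot: a method signature plus the class whose code the slot
    // points to (a stand-in for a real entry address).
    static final class Entry {
        final String signature;
        final String owner;

        Entry(String signature, String owner) {
            this.signature = signature;
            this.owner = owner;
        }

        @Override
        public String toString() {
            return signature + "->" + owner;
        }
    }

    // Builds a class's vtable from its parent's: inherited slots keep their index,
    // overridden slots are repointed at the subclass, and new methods are appended.
    static List<Entry> build(List<Entry> parent, String className, List<String> declared) {
        List<Entry> table = new ArrayList<>(parent);
        for (String sig : declared) {
            int index = -1;
            for (int i = 0; i < table.size(); i++) {
                if (table.get(i).signature.equals(sig)) {
                    index = i;
                    break;
                }
            }
            if (index >= 0) {
                table.set(index, new Entry(sig, className)); // override: same index, new target
            } else {
                table.add(new Entry(sig, className));        // new virtual method: next index
            }
        }
        return table;
    }

    static String demo() {
        List<Entry> object = build(new ArrayList<>(), "Object", List.of("hashCode()", "toString()"));
        List<Entry> father = build(object, "Father", List.of("hardChoice(QQ)", "hardChoice(_360)"));
        List<Entry> son = build(father, "Son", List.of("hardChoice(QQ)", "hardChoice(_360)"));
        // Dispatch becomes an index lookup: index 2 is hardChoice(QQ) in every table.
        return father.get(2) + " | " + son.get(2) + " | " + son.get(1);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // hardChoice(QQ)->Father | hardChoice(QQ)->Son | toString()->Object
    }
}
```

`son.get(1)` still points to Object because neither Father nor Son overrides `toString()`. This is the "same address as the parent" case described above.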

3.3 Support for dynamically typed languages

JDK 7 added the invokedynamic instruction to support dynamically typed languages on the JVM.

Statically vs. dynamically typed languages, and strongly vs. weakly typed languages:

  • Statically typed language:
    a language in which the data type of a variable is determined at compile time. Most statically typed languages require a variable's type to be declared before the variable is used.
    For example: C++, Java, Delphi, C#, etc.
  • Dynamically typed language:
    a language in which data types are determined at runtime. No type declaration is required before a variable is used, and a variable's type is usually the type of the value assigned to it.
    For example: PHP, ASP, Ruby, Python, Perl, ABAP, SQL, JavaScript, Unix Shell, etc.
  • Strongly typed language:
    a language that strictly enforces data types. A value is not treated as a different type unless it is explicitly converted. For example, if you define an integer variable a, the program cannot treat a as a string. Strongly typed languages are type-safe.
  • Weakly typed language:
    a language in which data types can be ignored or values are implicitly converted between types, the opposite of a strongly typed language. Strong typing may cost some speed or flexibility compared with weak typing, but its rigor helps avoid many errors.
    Note that the two distinctions are independent: static vs. dynamic is about when types are determined, strong vs. weak is about how strictly they are enforced. For example, Python is dynamically typed but strongly typed.
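Java itself stays statically typed. However, JDK 7 also added the java.lang.invoke package as part of the same work that introduced invokedynamic. It lets Java code choose a method at runtime by name and type instead of by a compile-time static type. The sketch below uses the real `MethodHandles.lookup().findVirtual` and `MethodType.methodType` APIs. `Duck`, `Robot`, and `callSpeak` are hypothetical names. The code does not emit invokedynamic itself. It only shows the kind of runtime method selection that invokedynamic call sites are built on.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleDemo {
    // Two unrelated classes that happen to share a method name and signature.
    static class Duck { public String speak(String who) { return who + ": quack"; } }
    static class Robot { public String speak(String who) { return who + ": beep"; } }

    // Looks up speak(String) on the receiver's actual runtime class, binds the
    // receiver, and invokes it; no common supertype is declared at compile time.
    static String callSpeak(Object receiver, String arg) {
        try {
            MethodHandle mh = MethodHandles.lookup().findVirtual(
                    receiver.getClass(), "speak", MethodType.methodType(String.class, String.class));
            return (String) mh.bindTo(receiver).invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static String demo() {
        return callSpeak(new Duck(), "duck") + ", " + callSpeak(new Robot(), "robot");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // duck: quack, robot: beep
    }
}
```

After `bindTo`, the handle's type is (String)String. The `(String)` cast and the String argument therefore match it exactly, as `invokeExact` requires.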

4 Stack-based bytecode interpretation execution engine

Having covered how the virtual machine invokes methods, we now look at how it executes the bytecode instructions inside a method.

4.1 Interpretation and execution

The Java language is often labeled an "interpreted" language. In the JDK 1.0 era, when Java was born, this label was fairly accurate. Once mainstream virtual machines included just-in-time compilers, however, only the virtual machine itself could tell whether the code in a Class file would be interpreted or compiled. Later, Java also gained compilers that generate native code directly (such as GCJ, the GNU Compiler for Java), and C/C++ gained interpreted implementations (such as CINT). Calling the Java language as a whole "interpreted" has therefore become almost meaningless. It only makes sense to talk about interpreted or compiled execution for a specific Java implementation and the operating mode of its execution engine.

Figure: Interpretation and execution

In Java, the javac compiler performs lexical and syntax analysis to build an abstract syntax tree. It then traverses the tree to generate a linear stream of bytecode instructions. This work happens outside the Java virtual machine, while the interpreter lives inside it, so the compilation of Java programs is a semi-independent implementation.

4.2 Stack-based instruction set and register-based instruction set

The instruction stream output by the Java compiler is essentially a stack-based instruction set architecture (Instruction Set Architecture, ISA), whose instructions work through the operand stack. The other common instruction set architecture is register-based, whose instructions work through registers.

So, what is the difference between the stack-based instruction set and the register-based instruction set?

As a simple example, consider computing 1+1 with each kind of instruction set. With a stack-based instruction set it looks like this:
iconst_1
iconst_1
iadd
istore_0

The two iconst_1 instructions push the constant 1 onto the stack twice. The iadd instruction then pops the top two values, adds them, and pushes the result back onto the top of the stack. Finally, istore_0 pops the top value and stores it in Slot 0 of the local variable table.
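The four steps above can be traced with a toy interpreter that keeps an explicit operand stack and local variable table. It is a sketch that supports only the three instructions in this example, each given as a string. It is not how HotSpot's interpreter is written, and `StackMachine.run` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackMachine {
    // Interprets iconst_1, iadd and istore_0 against an operand stack and a
    // local variable table of maxLocals int Slots; returns the final locals.
    static int[] run(String[] program, int maxLocals) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        int[] locals = new int[maxLocals];
        for (String insn : program) {
            switch (insn) {
                case "iconst_1":
                    operandStack.push(1);                // push the constant 1
                    break;
                case "iadd": {
                    int right = operandStack.pop();      // pop the top two values...
                    int left = operandStack.pop();
                    operandStack.push(left + right);     // ...and push their sum
                    break;
                }
                case "istore_0":
                    locals[0] = operandStack.pop();      // pop into Slot 0
                    break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        int[] locals = run(new String[] {"iconst_1", "iconst_1", "iadd", "istore_0"}, 1);
        System.out.println(locals[0]); // 2
    }
}
```

None of the instructions names where its operands live, because they all work on the top of the stack. This is why stack-based bytecode needs no knowledge of the hardware registers.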

With a register-based instruction set, the program might look like this:

mov eax, 1
add eax, 1

The mov instruction sets the EAX register to 1. The add instruction then adds 1 to it, and the result stays in EAX.
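For comparison with the stack version, here is an equally small model of the two-instruction register program. It is a sketch under stated assumptions. Registers are modelled as a map from name to int. Only `mov reg, imm` and `add reg, imm` are supported, and the text format is taken from the listing above. `RegisterMachine` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

class RegisterMachine {
    // Each instruction names its register and an immediate value explicitly,
    // so no operand stack is needed.
    static Map<String, Integer> run(String[] program) {
        Map<String, Integer> registers = new HashMap<>();
        for (String line : program) {
            String[] parts = line.split("[ ,]+"); // "mov eax, 1" -> [mov, eax, 1]
            String op = parts[0];
            String reg = parts[1];
            int imm = Integer.parseInt(parts[2]);
            switch (op) {
                case "mov":
                    registers.put(reg, imm);                                   // reg = imm
                    break;
                case "add":
                    registers.put(reg, registers.getOrDefault(reg, 0) + imm); // reg += imm
                    break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + op);
            }
        }
        return registers;
    }

    public static void main(String[] args) {
        System.out.println(run(new String[] {"mov eax, 1", "add eax, 1"}).get("eax")); // 2
    }
}
```

The register program needs two instructions where the stack program needed four. In exchange, every instruction is tied to a named register, which is the portability trade-off discussed next.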

The main advantage of a stack-based instruction set is portability. Registers are provided directly by the hardware, so a program that depends on them directly is inevitably constrained by that hardware.

Stack-based instruction sets have other advantages too, such as relatively compact code and a simpler compiler implementation.
Their main disadvantage is relatively slow execution. The same work usually takes more instructions than on a register architecture, and the operand stack lives in memory, which is slower to access than registers.

Summary

This section covered how the virtual machine finds the correct method to call when executing code, how it executes the bytecode inside a method, and the memory structures involved.


Origin blog.csdn.net/wr_java/article/details/115209048