In-depth Understanding of the Java Virtual Machine: Chapter 8, Virtual Machine Bytecode Execution Engine

Notes based on the book "In-depth Understanding of the Java Virtual Machine"

Overview

The execution engine of a physical machine is built directly on the processor, cache, instruction set, and operating system level

The execution engine of a virtual machine is implemented in software, so it can define its own instruction set and execution engine structure without being limited by physical conditions, and it can execute instruction set formats that the hardware does not directly support

When the execution engine executes bytecode, it has two options: interpreted execution (through an interpreter) and compiled execution (generating native code through a just-in-time compiler); some engines use both. Either way, the input and output are the same: the input is a bytecode binary stream, the processing is the equivalent process of parsing and executing the bytecode, and the output is the execution result

Runtime stack frame structure

The virtual machine uses the method as its most basic execution unit. A stack frame corresponds to one method and is an element of the virtual machine stack in the runtime data area of the virtual machine. The process of each method from the start of its invocation to the end of its execution corresponds to a stack frame being pushed onto and then popped off the virtual machine stack

The stack frame stores the local variable table, the operand stack, dynamic linking information, the method return address, and additional information

How much memory a stack frame needs is calculated when the source code is compiled and is written into the Code attribute of the method table. It is not affected by variable data while the program runs; it depends only on the program source code and the stack memory layout of the specific virtual machine implementation

From the point of view of a Java program, at any given moment all methods on the call stack of the same thread are in the executing state

From the point of view of the execution engine, only the method at the top of the stack in the active thread is actually running. Its frame is called the current stack frame, and its method is called the current method

Local variable table

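The local variable table is a storage space for a set of variable values, used to store method parameters and the local variables defined inside the method

When a Java program is compiled into a Class file, the maximum capacity of the local variable table that the method needs is determined and written into the max_locals item of the method's Code attribute

The capacity of the local variable table is measured in variable slots, its smallest unit. One variable slot can hold any data type of 32 bits or less (boolean, byte, char, short, int, float, reference, and returnAddress)

For the reference type, the virtual machine must be able to do at least two things through the reference:

	1.  Find, directly or indirectly, the starting address or index of the object's data in the Java heap
	2.  Find, directly or indirectly, the type information of the object's data type stored in the method area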

For 64-bit data types (long and double), the virtual machine allocates two consecutive variable slots, and the value is read and written as two separate 32-bit halves. Because the local variable table is built on the thread's stack and is thread-private, this causes no data races or thread-safety problems, regardless of whether reading and writing the two consecutive slots is atomic.

When a method is called, the virtual machine uses the local variable table to pass parameter values to the parameter variable list, that is, to transfer actual arguments to formal parameters. For an instance method, the slot at index 0 of the local variable table holds, by default, the reference to the object instance the method belongs to, that is, this. The parameters then occupy slots in order starting from index 1, and the local variables defined in the method body get the remaining slots
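To make the slot rules concrete, here is a small sketch that computes which slot each parameter starts in, given a JVM method descriptor. SlotLayout and layout are hypothetical names used only for illustration; the sketch assumes a well-formed descriptor and models only this and the parameters, not body-level locals or slot reuse.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: given a JVM method descriptor, compute where each parameter starts
// in the local variable table. Assumes a well-formed descriptor. Slot 0 holds
// "this" for instance methods; long (J) and double (D) take two slots; every
// other type, including any array or object reference, takes one.
public class SlotLayout {

    // Fills `starts` with each parameter's first slot and returns the number of
    // slots used by "this" plus the parameters.
    static int layout(String descriptor, boolean isStatic, List<Integer> starts) {
        int slot = isStatic ? 0 : 1;
        int i = descriptor.indexOf('(') + 1;
        while (descriptor.charAt(i) != ')') {
            starts.add(slot);
            char c = descriptor.charAt(i);
            boolean isArray = false;
            while (c == '[') {                  // array dimensions: still one reference
                isArray = true;
                c = descriptor.charAt(++i);
            }
            if (c == 'L') {
                i = descriptor.indexOf(';', i); // skip the class name up to ';'
            }
            slot += (!isArray && (c == 'J' || c == 'D')) ? 2 : 1;
            i++;
        }
        return slot;
    }

    public static void main(String[] args) {
        // Hypothetical instance method: void m(int a, long b, String c, double d)
        List<Integer> starts = new ArrayList<>();
        int total = layout("(IJLjava/lang/String;D)V", false, starts);
        System.out.println("parameter slots: " + starts + ", total: " + total);
        // parameter slots: [1, 2, 4, 5], total: 7
    }
}
```

For that hypothetical method, the long parameter spans slots 2 and 3, so the String reference starts at slot 4; any locals declared in the body would start at slot 7.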

Variable slots can be reused: once a variable goes out of scope, its slot can be assigned to another variable

Unlike class variables, which receive a default value in the preparation phase of class loading, local variables have no preparation phase. So if a local variable is defined but never assigned an initial value, it cannot be used; the compiler checks this and reports an error at compile time.

Operand stack

The maximum depth of the operand stack is also written at compile time into the max_stack item of the Code attribute

A 32-bit data type occupies 1 unit of stack capacity and a 64-bit data type occupies 2, and at no time does the depth of the operand stack exceed the max_stack value

Arithmetic is done by pushing the operands onto the top of the stack and then executing an arithmetic instruction, which takes its operands from there

For example, the iadd instruction requires that the top two elements of the operand stack be int values when it executes. It pops the two ints, adds them, and pushes the result back onto the stack.
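The following toy model sketches the two rules above: depth is counted in units (int = 1, long = 2) against a fixed max_stack limit, and iadd pops two ints and pushes their sum. OperandStackDemo and its methods are hypothetical; this is an illustration under those assumptions, not how a real JVM lays out frames.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy operand stack that counts depth in stack units (int = 1, long = 2)
// and refuses to grow past a max_stack limit fixed in advance, mirroring the
// rule that javac computes this limit at compile time. Hypothetical example.
public class OperandStackDemo {

    static final class OperandStack {
        private final Deque<Long> values = new ArrayDeque<>();    // stored values
        private final Deque<Integer> widths = new ArrayDeque<>(); // units per entry
        private final int maxStack;
        private int depth;    // current depth in units
        private int maxSeen;  // deepest depth reached so far

        OperandStack(int maxStack) {
            this.maxStack = maxStack;
        }

        private void push(long value, int width) {
            if (depth + width > maxStack) {
                throw new IllegalStateException("operand stack would exceed max_stack=" + maxStack);
            }
            values.push(value);
            widths.push(width);
            depth += width;
            maxSeen = Math.max(maxSeen, depth);
        }

        void pushInt(int v) { push(v, 1); }

        void pushLong(long v) { push(v, 2); }

        int popInt() {
            if (widths.isEmpty() || widths.peek() != 1) {
                throw new IllegalStateException("top of stack is not an int");
            }
            widths.pop();
            depth -= 1;
            return values.pop().intValue();
        }

        // iadd: pop two ints, push their (overflow-wrapping) sum.
        void iadd() {
            int b = popInt();
            int a = popInt();
            pushInt(a + b);
        }

        int depth() { return depth; }

        int maxSeen() { return maxSeen; }
    }

    public static void main(String[] args) {
        OperandStack stack = new OperandStack(2);  // as if max_stack = 2
        stack.pushInt(3);
        stack.pushInt(4);
        stack.iadd();
        System.out.println("result=" + stack.popInt() + ", deepest=" + stack.maxSeen()); // result=7, deepest=2

        stack.pushInt(1);
        try {
            stack.pushLong(5L);  // needs 2 more units, but only 1 is left
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // operand stack would exceed max_stack=2
        }
    }
}
```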

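Dynamic linking

Each stack frame contains a reference, into the runtime constant pool, to the method that the frame belongs to, in order to support dynamic linking during method invocation. Some symbolic references in the Class file's constant pool are converted into direct references during class loading or on first use (static resolution); the rest are converted each time they are executed at runtime (dynamic linking).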

Method return address

There are two ways to exit a method:

Normal completion: the execution engine encounters a bytecode instruction that returns from the method, and a return value may be passed to the calling method

Abrupt completion: an exception occurs during execution of the method and no matching handler is found in the method's exception table; in this case no return value is passed to the caller

After a method exits, execution must return to the point where the method was originally called. On a normal exit, the value of the calling method's PC counter can serve as the return address and is likely to be saved in the frame; on an abrupt exit, the return address is determined by the exception handler table, and the frame generally does not save this information

Exiting a method may involve: restoring the local variable table and operand stack of the calling method, pushing the return value (if any) onto the operand stack of the caller's frame, and adjusting the PC counter to point to the instruction after the method invocation instruction

Additional information

The virtual machine specification allows an implementation to add information that the specification does not describe to the stack frame, such as information for debugging and performance profiling

Dynamic linking, the method return address, and additional information are generally grouped together and called stack frame information

Method call

The only task of the method call stage is to determine which version of the called method is invoked; the execution inside the method is not involved yet

Every method call stored in the Class file is only a symbolic reference, not the entry address of the method in the actual runtime memory layout (i.e., a direct reference)

As a result, for some calls the direct reference of the target method can only be determined during class loading or even at runtime

Resolution

In the resolution phase of class loading, some of the symbolic references are converted into direct references. The precondition is that the method has a version that can be determined before the program actually runs, and that this version cannot change at runtime

Methods that are "knowable at compile time and immutable at runtime" fall mainly into two categories: static methods and private methods. The former are tied directly to the type, and the latter cannot be accessed from outside, so neither can be overridden with another version

Method invocation bytecode instructions

  • invokestatic is used to call static methods
  • invokespecial is used to call the instance constructor <init>() method, private methods, and methods of the parent class
  • invokevirtual is used to call all virtual methods
  • invokeinterface is used to call interface methods; the object that implements the interface is determined at runtime
  • invokedynamic first dynamically resolves, at runtime, the method referenced by the call site specifier, and then executes it

Any method that can be called by the invokestatic and invokespecial instructions has a unique calling version that can be determined in the resolution phase

The methods that meet this condition are static methods, private methods, instance constructors, and parent class methods. Methods modified by final are also included, although they are invoked with invokevirtual

For these five kinds of methods, the symbolic reference can be resolved into a direct reference when the class is loaded. They are collectively called non-virtual methods; all other methods are called virtual methods

Resolution is always a static process: the target is fully determined at compile time, and the symbolic references are converted into direct references in the resolution phase of class loading, without waiting until runtime
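The sketch below puts several of these call kinds side by side. InvokeKinds, Base, and Derived are hypothetical names; the comments give the instruction javac is expected to emit for each call, which you can check by running javap -c on the compiled InvokeKinds$Derived class. Private methods are left out on purpose, since the instruction javac uses for them has changed across JDK versions.

```java
// Which invoke* instruction each call compiles to is noted in the comments;
// verify with `javap -c 'InvokeKinds$Derived'` after compiling.
public class InvokeKinds {

    static class Base {
        String greet() { return "base"; }
    }

    static class Derived extends Base {
        Derived() {
            super();                   // invokespecial Base.<init>: instance constructor
        }

        static String util() { return "static"; }

        final String fixed() { return "final"; }

        @Override
        String greet() { return "derived"; }

        String callAll() {
            String a = util();         // invokestatic: static method
            String b = super.greet();  // invokespecial: parent class method
            String c = fixed();        // invokevirtual, but final => only one possible version
            String d = greet();        // invokevirtual: true virtual method, chosen at runtime
            return a + "," + b + "," + c + "," + d;
        }
    }

    public static void main(String[] args) {
        Derived d = new Derived();     // new + invokespecial Derived.<init>
        System.out.println(d.callAll()); // static,base,final,derived
    }
}
```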

Dispatch

Static dispatch

public class MainTest
{
    static abstract class Human
    {
    }
    static class Man extends Human
    {
    }
    static class Woman extends Human
    {
    }

    public void sayHello(Human guy)
    {
        System.out.println("Human");
    }
    public void sayHello(Man guy)
    {
        System.out.println("Man");
    }
    public void sayHello(Woman guy)
    {
        System.out.println("Woman");
    }

    public static void main(String[] args)
    {
        Human man = new Man();
        Human woman = new Woman();
        MainTest mainTest = new MainTest();
        // The overload is chosen at compile time from the static type Human
        mainTest.sayHello(man);
        mainTest.sayHello(woman);
        /*
        result:
            Human
            Human
         */
    }
}

In Human man = new Man();, Human is called the static type (or apparent type) of the variable, and Man is called its actual type (or runtime type)

The static type is known at compile time, while the actual type can change and is only determined at runtime

When choosing among overloads, the virtual machine (more precisely, the compiler) uses the static type of the argument rather than its actual type. In the compilation phase, the compiler decides which overloaded version to use based on the static types of the arguments, which is why both calls above print Human

All dispatch actions that rely on static types to decide which version of a method to execute are called static dispatch. Its most typical application is method overloading. Static dispatch happens at compile time, so it is not actually performed by the virtual machine

Although the compiler can determine the overloaded version, in many cases the matching overloaded version is not unique, and the compiler can only pick the relatively most suitable one
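A small sketch of "the most suitable version" in action. OverloadPick and pick are hypothetical names: a char argument matches both pick(int) and pick(long) through primitive widening, and the compiler prefers the more specific int; an argument whose static type is Object can only match pick(Object).

```java
// Overload selection uses only static types and, among several applicable
// overloads, picks the most specific one.
public class OverloadPick {
    static String pick(int x)    { return "int"; }
    static String pick(long x)   { return "long"; }
    static String pick(Object x) { return "Object"; }

    public static void main(String[] args) {
        System.out.println(pick('a'));  // int: char widens to int, preferred over long
        System.out.println(pick(1L));   // long: exact match
        Object o = "text";
        System.out.println(pick(o));    // Object: only the static type matters
    }
}
```

Removing pick(int) would make pick('a') resolve to pick(long); the choice is made entirely at compile time.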

Dynamic dispatch

It is closely related to overriding, the other important aspect of polymorphism

The version of the method to execute is chosen at runtime according to the actual type of the receiver
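A minimal sketch of dynamic dispatch, reusing the Human/Man/Woman names from the static dispatch example but with an overridden method instead of overloads (the class DynamicDispatch is hypothetical). Both variables have the static type Human, yet invokevirtual selects the override of the receiver's actual class.

```java
// Same static type, different actual types: the override of the runtime
// class is executed.
public class DynamicDispatch {
    static abstract class Human {
        abstract String sayHello();
    }

    static class Man extends Human {
        @Override
        String sayHello() { return "man says hello"; }
    }

    static class Woman extends Human {
        @Override
        String sayHello() { return "woman says hello"; }
    }

    public static void main(String[] args) {
        Human man = new Man();
        Human woman = new Woman();
        System.out.println(man.sayHello());   // man says hello
        System.out.println(woman.sayHello()); // woman says hello
        man = new Woman();                    // the actual type changes at runtime
        System.out.println(man.sayHello());   // woman says hello
    }
}
```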

Fields never participate in polymorphism
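A short sketch of why fields are not polymorphic (Father, Son, and money are illustrative names). A field access is bound using the static type of the expression, so a subclass field with the same name hides the parent's field instead of overriding it, while the method call still dispatches on the actual type.

```java
// Field access follows the static type; method calls follow the actual type.
public class FieldNoPolymorphism {
    static class Father {
        int money = 1;
        int showMoney() { return money; }            // Father.money
    }

    static class Son extends Father {
        int money = 2;                               // hides Father.money
        @Override
        int showMoney() { return money; }            // Son.money
    }

    public static void main(String[] args) {
        Father guy = new Son();
        System.out.println(guy.money);               // 1: chosen by static type Father
        System.out.println(guy.showMoney());         // 2: chosen by actual type Son
        System.out.println(((Son) guy).money);       // 2: static type is now Son
    }
}
```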

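Java today is a statically multiple-dispatch and dynamically single-dispatch language: at compile time, the choice depends on both the static type of the receiver and the static types of the arguments, while at runtime only the actual type of the receiver matters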
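The sketch below follows the shape of the book's QQ/360 example (class names are illustrative). The compile-time step picks a method signature from two factors, the receiver's static type and the argument's static type; the runtime step then considers only one factor, the receiver's actual type.

```java
// Static dispatch (compile time) uses two factors: receiver static type and
// argument static type. Dynamic dispatch (runtime) uses one: receiver actual type.
public class DispatchKinds {
    static class QQ {}
    static class _360 {}

    static class Father {
        String hardChoice(QQ arg)   { return "father choose qq"; }
        String hardChoice(_360 arg) { return "father choose 360"; }
    }

    static class Son extends Father {
        @Override
        String hardChoice(QQ arg)   { return "son choose qq"; }
        @Override
        String hardChoice(_360 arg) { return "son choose 360"; }
    }

    public static void main(String[] args) {
        Father father = new Father();
        Father son = new Son();
        // Compile time fixes the signature hardChoice(_360) from the argument type.
        System.out.println(father.hardChoice(new _360())); // father choose 360
        // Compile time fixes hardChoice(QQ); at runtime only the receiver's
        // actual type (Son) decides which override runs.
        System.out.println(son.hardChoice(new QQ()));      // son choose qq
    }
}
```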

The runtime resolution process of the invokevirtual instruction

  1. Find the actual type of the object referenced by the element at the top of the operand stack, denoted C
  2. If a method matching both the descriptor and the simple name in the constant is found in type C, check its access permissions. If the check passes, return the direct reference of this method and end the search; if it fails, throw java.lang.IllegalAccessError
  3. Otherwise, perform the search and verification of step 2 on each parent class of C in turn, from bottom to top, following the inheritance hierarchy
  4. If no suitable method is found in the end, throw java.lang.AbstractMethodError
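The four steps can be modelled with a toy class hierarchy. VirtualLookup, ClassModel, and MethodModel are hypothetical stand-ins for real class metadata, and the access check is reduced to a boolean flag; the point is only the bottom-to-top search and the two error cases.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the invokevirtual lookup steps. A method "key" is the simple
// name plus descriptor, e.g. "sayHello()V". All types here are hypothetical.
public class VirtualLookup {

    static final class MethodModel {
        final String owner;        // class that declares this implementation
        final String key;          // simple name + descriptor
        final boolean accessible;  // result of the (simplified) access check

        MethodModel(String owner, String key, boolean accessible) {
            this.owner = owner;
            this.key = key;
            this.accessible = accessible;
        }
    }

    static final class ClassModel {
        final String name;
        final ClassModel parent;   // null at the top of the hierarchy
        final Map<String, MethodModel> methods = new HashMap<>();

        ClassModel(String name, ClassModel parent) {
            this.name = name;
            this.parent = parent;
        }

        ClassModel declare(String key, boolean accessible) {
            methods.put(key, new MethodModel(name, key, accessible));
            return this;
        }
    }

    // Step 1 is the caller's job: actualType is the runtime class C of the receiver.
    static MethodModel resolve(ClassModel actualType, String key) {
        for (ClassModel c = actualType; c != null; c = c.parent) { // step 3: walk up
            MethodModel m = c.methods.get(key);                     // step 2: match
            if (m != null) {
                if (!m.accessible) {
                    throw new IllegalAccessError(c.name + "." + key);
                }
                return m;
            }
        }
        throw new AbstractMethodError(key);                         // step 4
    }

    public static void main(String[] args) {
        ClassModel human = new ClassModel("Human", null)
                .declare("toString()Ljava/lang/String;", true);
        ClassModel man = new ClassModel("Man", human).declare("sayHello()V", true);

        System.out.println(resolve(man, "sayHello()V").owner);                  // Man
        System.out.println(resolve(man, "toString()Ljava/lang/String;").owner); // Human
        try {
            resolve(man, "fly()V");
        } catch (AbstractMethodError e) {
            System.out.println("AbstractMethodError: " + e.getMessage());       // AbstractMethodError: fly()V
        }
    }
}
```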

How the virtual machine implements dynamic dispatch

A common optimization is to create a virtual method table (vtable) for each type in the method area, and to use vtable indexes instead of metadata lookups to improve performance

The virtual method table stores the actual entry address of each method. If a method is not overridden in the subclass, its entry in the subclass's virtual method table is the same as the entry for that method in the parent class, and both point to the parent class's implementation.

If the subclass overrides the method, the entry is replaced with the address of the subclass's implementation. Methods with the same signature have the same index in the parent and child tables, so when the receiver type changes, only the table being indexed changes.

The virtual method table is generally initialized in the linking phase of class loading: after preparing the initial values of the class variables, the virtual machine also initializes the virtual method table of the class.
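The table-building rule can be sketched with a LinkedHashMap whose iteration order stands in for the table index. VTableDemo, buildVTable, and the "Class.method" strings are hypothetical; real entries are entry addresses, and a real vtable for Object has more methods than the two shown.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy vtable: method signature -> "entry address", with iteration order
// playing the role of the table index. All names are hypothetical.
public class VTableDemo {

    static Map<String, String> buildVTable(Map<String, String> parent, String className,
                                           List<String> declared) {
        // Inherited entries keep the parent's order, so each keeps its index.
        Map<String, String> table = new LinkedHashMap<>(parent);
        for (String method : declared) {
            // put() on an existing key replaces the entry but keeps its position
            // (an override); a new key is appended at the end (a new method).
            table.put(method, className + "." + method);
        }
        return table;
    }

    static int indexOf(Map<String, String> table, String method) {
        int i = 0;
        for (String key : table.keySet()) {
            if (key.equals(method)) {
                return i;
            }
            i++;
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, String> object = buildVTable(Map.of(), "Object", List.of("toString", "hashCode"));
        Map<String, String> human = buildVTable(object, "Human", List.of("sayHello"));
        Map<String, String> man = buildVTable(human, "Man", List.of("sayHello"));

        System.out.println(human); // {toString=Object.toString, hashCode=Object.hashCode, sayHello=Human.sayHello}
        System.out.println(man);   // {toString=Object.toString, hashCode=Object.hashCode, sayHello=Man.sayHello}
        // Same index in both tables, so a call site can use the index directly.
        System.out.println(indexOf(human, "sayHello") == indexOf(man, "sayHello")); // true
    }
}
```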

In Java, instance methods are virtual by default unless they are final, private, or constructors

Dynamically typed language support

The invokedynamic instruction, added in JDK 7, was introduced to support dynamically typed languages

Dynamically typed language

The key feature: most of the type checking happens at runtime rather than at compile time

Runtime exception: as long as execution never reaches the offending line, no exception is raised

Linkage-time exception: even if the offending code sits on a branch that can never execute, an exception is still thrown when the class is loaded

Another core feature of dynamically typed languages: variables have no type; only the values of variables have types

Pros and cons

  1. Statically typed languages determine variable types at compile time, so the compiler can provide comprehensive and rigorous type checking. This helps stability and makes it easier for projects to grow to a larger scale
  2. Dynamically typed languages determine types only at runtime, which gives developers great flexibility and lets code be clearer and more concise, which often means higher development efficiency

java.lang.invoke package

The difference between MethodHandle and reflection

  • Reflection simulates a method call at the Java code level, while MethodHandle simulates a method call at the bytecode level
  • Reflection is heavyweight: a java.lang.reflect.Method object is a complete image of the method on the Java side and carries far more information than a MethodHandle, which holds only what is needed to execute the method and is lightweight
  • Reflection calls are hard to optimize, while MethodHandle calls can benefit from various optimizations (such as method inlining)
  • Reflection serves only the Java language, while MethodHandle is designed to serve all languages that run on the Java virtual machine
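For comparison, here is a sketch that calls String.toUpperCase() both ways using the standard java.lang.invoke and java.lang.reflect APIs (HandleVsReflection and its method names are hypothetical). The MethodHandle lookup is described by a MethodType, the return type plus parameter types, which is close in spirit to a bytecode method descriptor.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// The same call made through a MethodHandle and through reflection.
public class HandleVsReflection {

    static String viaMethodHandle(String s) throws Throwable {
        // findVirtual looks up an instance method, much like invokevirtual would.
        MethodHandle mh = MethodHandles.lookup()
                .findVirtual(String.class, "toUpperCase", MethodType.methodType(String.class));
        return (String) mh.invokeExact(s);   // call-site type must be exactly (String)String
    }

    static String viaReflection(String s) throws ReflectiveOperationException {
        Method m = String.class.getMethod("toUpperCase");  // the no-argument overload
        return (String) m.invoke(s);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(viaMethodHandle("abc"));  // ABC
        System.out.println(viaReflection("abc"));    // ABC
    }
}
```

Note that invokeExact fails with WrongMethodTypeException if the call site's types do not match the handle's type exactly, which is why the result is cast to String.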

Stack-based bytecode interpretation execution engine

This part discusses how the virtual machine executes the bytecode instructions in a method. There are two approaches: interpreted execution and compiled execution.

Interpreted execution


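Before most program code is turned into target code for a physical machine or into an instruction set a virtual machine can execute, it goes through lexical analysis and syntax analysis, which convert the source code into an abstract syntax tree. In Java, the Javac compiler performs lexical analysis, syntax analysis, and generation of the linear bytecode instruction stream, while the interpreter lives inside the virtual machine.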

Stack-based instruction set and register-based instruction set

The bytecode instruction stream produced by the Javac compiler essentially follows a stack-based instruction set architecture. Most of its instructions are zero-address instructions, which rely on the operand stack to do their work.

For example, computing 1 + 1 with a stack-based instruction set:

iconst_1
iconst_1
iadd
istore_0

The two iconst_1 instructions push the constant 1 onto the stack twice in succession. The iadd instruction then pops the two values at the top of the stack, adds them, and pushes the result back onto the top of the stack. Finally, istore_0 stores the value at the top of the stack into variable slot 0 of the local variable table
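To see the four instructions run, here is a toy interpreter with an operand stack and a local variable table. TinyInterpreter and run are hypothetical names, and instructions are plain strings here, whereas real bytecode is a byte array of opcodes decoded by the virtual machine.

```java
// Toy stack-based interpreter for iconst_1, iadd, and istore_0 only.
public class TinyInterpreter {

    static int[] run(String[] code, int maxStack, int maxLocals) {
        int[] stack = new int[maxStack];   // operand stack
        int sp = 0;                        // number of values on the operand stack
        int[] locals = new int[maxLocals]; // local variable table
        for (String insn : code) {
            switch (insn) {
                case "iconst_1":
                    stack[sp++] = 1;       // push the constant 1
                    break;
                case "iadd": {
                    int b = stack[--sp];   // pop the top two ints
                    int a = stack[--sp];
                    stack[sp++] = a + b;   // push their sum
                    break;
                }
                case "istore_0":
                    locals[0] = stack[--sp]; // pop into slot 0
                    break;
                default:
                    throw new IllegalArgumentException("unsupported: " + insn);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        String[] code = {"iconst_1", "iconst_1", "iadd", "istore_0"};
        int[] locals = run(code, 2, 1);    // max_stack = 2, max_locals = 1 suffice here
        System.out.println("slot 0 = " + locals[0]); // slot 0 = 2
    }
}
```

None of the instructions names a register or an explicit operand address, which is what "zero-address" means: every operand is implied by the stack position.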

Pros and cons

  • The main advantage of a stack-based instruction set is portability: user programs do not use registers directly, so the virtual machine implementation is free to keep the most frequently accessed data (the program counter, a top-of-stack cache, and so on) in registers to improve performance. The code is also compact, and the compiler is easier to implement
  • The main disadvantage is slightly slower execution: completing the same function usually takes more instructions, and frequent stack access means frequent memory access

Origin blog.csdn.net/weixin_42249196/article/details/108253734