Virtual machine bytecode execution engine

Article directory

Virtual machine bytecode execution engine
1. Overview
2. Runtime stack frame structure
3. Method call

1. Overview

The execution engine is one of the core components of the Java virtual machine. Because it is implemented in software, it can execute instruction set formats that the hardware does not directly support.
Depending on the virtual machine implementation, the execution engine may interpret bytecode, compile it to native code, or do both. Viewed from the outside, however, all execution engines look the same: the input is a binary bytecode stream and the output is the execution result.

2. Runtime stack frame structure

The Java virtual machine uses the method as its most basic unit of execution, and the stack frame is the data structure that supports method invocation and method execution. It is the element type of the virtual machine stack in the runtime data area. A stack frame stores the method's local variable table, operand stack, dynamic link, method return address, and some additional information. How much memory a stack frame needs is not affected by runtime data; it depends only on the program source and on the stack memory layout of the specific virtual machine implementation.
From the perspective of a Java program, every method on the call stack of a thread is executing at the same time. From the perspective of the execution engine, only the stack frame at the top of the stack is active. It is called the current stack frame, and the method it belongs to is called the current method. The stack frame structure is shown in Figure 8-1.
[Figure 8-1: stack frame structure]
(1) Local variable table
The local variable table stores the method's parameters and the local variables defined inside the method. When a Java program is compiled into a class file, the maximum size of the local variable table that the method needs is recorded in the max_locals item of the method's Code attribute.
The capacity of the local variable table is measured in variable slots. The size of a slot is not explicitly defined; the specification only says that each slot must be able to hold a value of type boolean, byte, char, short, int, float, reference, or returnAddress. Each of these 8 types fits in 32 bits, so a slot must be at least 32 bits wide, but it does not have to be exactly 32 bits.
Of the 8 types mentioned, the first 6 are similar to the Java types of the same name. reference is a reference to an object instance, and returnAddress points to the address of a bytecode instruction; this type is now rarely used. For 64-bit types such as long and double, the Java virtual machine allocates two consecutive variable slots using high-order alignment.
The Java virtual machine accesses the local variable table by index. For a 32-bit type, index N refers to slot N. For a 64-bit type, index N refers to slots N and N+1 together. The two adjacent slots that jointly hold a 64-bit value must not be accessed individually.
When a method is called, the virtual machine uses the local variable table to pass the argument values to the parameter list. For an instance method, slot 0 holds the reference to the receiver (this), and the parameters follow from slot 1.
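To make the slot indexing concrete, here is a minimal sketch. The class and method names are hypothetical, and the slot numbers in the comments assume the usual javac layout (parameters first, in declaration order). They can be checked by compiling with `javac -g` and inspecting the LocalVariableTable with `javap -v`.

```java
class SlotLayoutDemo {
    // Static method: there is no `this`, so parameters start at slot 0.
    //   a   -> slot 0        (int, one slot)
    //   b   -> slots 1 and 2 (long, two consecutive slots)
    //   c   -> slot 3        (int, one slot)
    //   sum -> slots 4 and 5 (long, two consecutive slots)
    static long sum(int a, long b, int c) {
        long sum = a + b + c;
        return sum;
    }

    // Instance method: slot 0 holds `this`, parameters start at slot 1.
    //   this -> slot 0
    //   a    -> slot 1
    //   b    -> slots 2 and 3
    long scale(int a, long b) {
        return a * b;
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 2L, 3));
        System.out.println(new SlotLayoutDemo().scale(4, 5L));
    }
}
```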
To save stack frame memory, the variable slots in the local variable table can be reused. If the PC counter has moved beyond the scope of a variable in the method body, the slot of that variable can be handed to another variable. In some cases, however, slot reuse affects garbage collection.
[Code 8-1, 8-2, and 8-3: three variants of code that create a placeholder object and then call System.gc()]

The space occupied by placeholder is not reclaimed in Code 8-1 or 8-2; only in Code 8-3 is it reclaimed. What decides whether placeholder can be reclaimed is whether a variable slot in the local variable table still holds a reference to it. In 8-1, placeholder is still in scope when System.gc() runs. In 8-2, placeholder is out of scope, but the slot it occupied has not been reused by another variable, so the stale reference is still there. In 8-3, placeholder is out of scope and int a = 0 reuses the slot it occupied. Manually assigning null to placeholder has the same effect, but after just-in-time compilation the null assignment is eliminated as a useless operation.
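The three listings are images in the original and are not reproduced in the text. A minimal reconstruction, assuming a byte array as the placeholder object (the class name, method names, and array size are illustrative choices), might look like this. Run it with `-verbose:gc` to watch the collector. The behavior described above applies to interpreted execution; whether the array is actually reclaimed depends on the virtual machine and on whether the code has been JIT-compiled, so no expected output is given.

```java
class SlotReuseDemo {
    // Assumed size for illustration; large enough to stand out in a GC log.
    static final int SIZE = 16 * 1024 * 1024;

    // Variant 1 (Code 8-1): placeholder is still in scope when gc() runs.
    static void variant1() {
        byte[] placeholder = new byte[SIZE];
        System.gc();
    }

    // Variant 2 (Code 8-2): placeholder is out of scope, but its slot
    // has not been overwritten by any other variable.
    static void variant2() {
        {
            byte[] placeholder = new byte[SIZE];
        }
        System.gc();
    }

    // Variant 3 (Code 8-3): the new variable `a` reuses the slot
    // that placeholder occupied, removing the last reference.
    static void variant3() {
        {
            byte[] placeholder = new byte[SIZE];
        }
        int a = 0;
        System.gc();
    }

    public static void main(String[] args) {
        variant1();
        variant2();
        variant3();
    }
}
```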
Unlike the class variables discussed earlier, local variables have no preparation phase. A class variable is given a system default value in the preparation phase and then its programmer-defined value in the initialization phase, so a class variable can be used even if the code never assigns it an initial value. A local variable that is declared but never assigned cannot be used at all.

(2) Operand stack
The operand stack resembles the local variable table in that its maximum depth is also fixed at compile time, in the max_stack item of the Code attribute. A 32-bit value occupies one unit of stack capacity, and a 64-bit value occupies two.
When a method executes, the operands of an operation are first pushed onto the stack. The operation instruction then pops them, performs the computation, and pushes the result back. The types of the elements on the stack must strictly match what the bytecode instruction expects. For example, when iadd executes, the two elements on top of the stack must both be of type int.
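As a small illustration of the push, operate, push-result cycle, consider the sketch below. The class name is hypothetical, and the bytecode in the comments is what javac typically emits for this method; it can be checked with `javap -c`.

```java
class OperandStackDemo {
    // Typical bytecode for this static method:
    //   iload_0   // push parameter a (slot 0) onto the operand stack
    //   iload_1   // push parameter b (slot 1) onto the operand stack
    //   iadd      // pop two ints, add them, push the int result
    //   ireturn   // pop the result and return it to the caller
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```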
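In the conceptual model, two stack frames are completely independent elements of the virtual machine stack. In most virtual machine implementations, however, adjacent stack frames are made to overlap, so that part of the caller's operand stack coincides with part of the callee's local variable table and arguments do not need to be copied.
[Figure: shared region between two overlapping stack frames]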
(3) Dynamic linking
Each stack frame contains a reference to the method it belongs to in the runtime constant pool. This reference is held to support dynamic linking during method invocation. Some symbolic references are converted into direct references during class loading or on first use; this is called static resolution. The rest are converted into direct references each time the code runs; this is called dynamic linking.

(4) Method return address
Once a method starts executing, there are only two ways to exit it. The first is to reach one of the method return bytecode instructions. This is called "normal method invocation completion". The second is to encounter an exception that is not handled inside the method body. This is called "abrupt method invocation completion".
Whichever way the method exits, execution must afterwards return to the place where the method was called.
Exiting a method is effectively the same as popping the current stack frame. The operations that may be performed on exit therefore include: restoring the caller's local variable table and operand stack, pushing the return value (if any) onto the caller's operand stack, and adjusting the PC counter to point to the instruction that follows the call.

(5) Additional information
In conceptual discussions, the dynamic link, the method return address, and any other additional information are generally grouped together and called stack frame information.

3. Method call

The method invocation stage does not involve running the method body; its only task is to determine which version of the method is called. In the class file, every method call is just a symbolic reference, not the entry address (direct reference) of the method in the runtime memory layout. As a result, for some calls the direct reference to the target method can only be determined during class loading or even at run time.
(1) Resolution
During the class loading phase, some symbolic references are converted into direct references. The precondition is that the method has a single, determinable version before the program runs. Calls to such methods are called resolution.
In Java, the methods that are "knowable at compile time and immutable at run time" are mainly static methods and private methods.
The bytecode instruction set provides different instructions for calling different kinds of method:
invokestatic: calls static methods
invokespecial: calls instance constructors (the <init>() method), private methods, and parent class methods
invokevirtual: calls virtual methods
invokeinterface: calls interface methods
invokedynamic: first resolves, at run time, the method referenced by the call site specifier, and then executes that method
Any method that can be called by invokestatic or invokespecial has a unique version that can be determined during resolution. Four kinds of method qualify: static methods, private methods, instance constructors, and parent class methods. Methods modified by final form a fifth kind: although they are called with invokevirtual, they cannot be overridden, so they also have only one version. For these five kinds of method, symbolic references are replaced with direct references during class loading. They are called "non-virtual methods", and all other methods are called "virtual methods".
[Code listing: a class with a static method sayHello(), and its javap output]
As shown in the listing, the static method sayHello() belongs only to the class that declares it, and no mechanism can override or hide it.
Inspecting the bytecode with the javap command confirms that sayHello() is indeed called through the invokestatic instruction.
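A minimal sketch of such a class is given here, since the original listing is an image. The class name and the returned string are assumptions. Compiling it and running `javap -verbose StaticResolution` should show an invokestatic instruction in main whose operand is the constant pool entry for sayHello().

```java
class StaticResolution {
    // A static method has exactly one version, fixed before the program runs.
    static String sayHello() {
        return "hello world";
    }

    public static void main(String[] args) {
        // Compiled to: invokestatic StaticResolution.sayHello
        System.out.println(StaticResolution.sayHello());
    }
}
```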

(2) Dispatch
Java is an object-oriented programming language with the three basic characteristics of object orientation: encapsulation, inheritance, and polymorphism. The dispatch process described in this section reveals how some of the most basic forms of polymorphism are implemented.
1. Static dispatch
The word "dispatch" itself has a dynamic connotation. In the book, the English name given for this behavior is "Method Overload Resolution", which suggests it belongs under the resolution of 8.2. However, many Chinese materials call this behavior static dispatch.
[Code listing: static dispatch example]
The output is:
hello, guy!
hello, guy!
For the variables man and woman, Human is the static type, and Man and Woman are the corresponding runtime types (actual types). The static type of a variable is known at compile time, but its actual type can only be determined at run time.
When choosing among overloads, the compiler goes by the static type of the arguments, not their actual type. The version to call is therefore selected at compile time. Overloading is the most typical application of static dispatch, and because it happens at compile time, the static dispatch action is not actually performed by the virtual machine.
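The original listing is an image. A minimal sketch consistent with the output shown above follows; the class names Human, Man, and Woman come from the text, while the other overload messages and the choice to return strings instead of printing inside the methods are assumptions.

```java
class StaticDispatch {
    static abstract class Human {}
    static class Man extends Human {}
    static class Woman extends Human {}

    // Three overloads; the compiler picks one from the argument's static type.
    String sayHello(Human guy) { return "hello, guy!"; }
    String sayHello(Man guy)   { return "hello, gentleman!"; }
    String sayHello(Woman guy) { return "hello, lady!"; }

    public static void main(String[] args) {
        Human man = new Man();      // static type Human, actual type Man
        Human woman = new Woman();  // static type Human, actual type Woman
        StaticDispatch sd = new StaticDispatch();
        System.out.println(sd.sayHello(man));    // hello, guy!
        System.out.println(sd.sayHello(woman));  // hello, guy!
    }
}
```

Casting the argument, for example sd.sayHello((Man) man), changes its static type and therefore selects a different overload.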
Overload matching does not require an exact match; automatic conversions are also considered, and they are tried in a fixed priority order. For the character 'a', for example, the parameter types are tried in the order char, int, long, float, double. If none of these overloads exists, 'a' is autoboxed and matched to Character. If the Character overload is also absent (commented out), the call matches Serializable or Comparable, the interfaces that Character implements. Failing that, it matches the parent class Object.
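The priority order can be observed with a small sketch. The class name and the set of overloads are assumptions; there is deliberately no char overload, so the call with 'a' falls through to the next candidate in the order described above.

```java
class OverloadPriority {
    static String sayHello(int arg)       { return "int"; }
    static String sayHello(long arg)      { return "long"; }
    static String sayHello(Character arg) { return "Character"; }
    static String sayHello(Object arg)    { return "Object"; }

    public static void main(String[] args) {
        // No char overload exists, so 'a' is widened to int,
        // which is preferred over long, boxing, and Object.
        System.out.println(sayHello('a'));                     // int
        // An argument that is already a Character needs no conversion.
        System.out.println(sayHello(Character.valueOf('a')));  // Character
    }
}
```

Commenting out the int overload would make the same call select long, and so on down the list.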

2. Dynamic dispatch
Static dispatch is closely tied to overloading, and dynamic dispatch is closely tied to overriding.
[Code listing: dynamic dispatch example]
The output is:
man say hello
woman say hello
woman say hello

Both variables in the code have the static type Human, and their actual types are Man and Woman. The variable man is later reassigned, so its actual type becomes Woman.
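The original listing is an image. A minimal sketch that matches the output and the description above might look like this; the choice to return strings from sayHello() instead of printing inside it is an assumption made to keep the example easy to check.

```java
class DynamicDispatch {
    static abstract class Human {
        abstract String sayHello();
    }
    static class Man extends Human {
        @Override String sayHello() { return "man say hello"; }
    }
    static class Woman extends Human {
        @Override String sayHello() { return "woman say hello"; }
    }

    public static void main(String[] args) {
        Human man = new Man();
        Human woman = new Woman();
        System.out.println(man.sayHello());    // man say hello
        System.out.println(woman.sayHello());  // woman say hello
        man = new Woman();                     // actual type changes, static type does not
        System.out.println(man.sayHello());    // woman say hello
    }
}
```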
[Figure: javap bytecode output for the main method]
The bytecode shows that the two calls use the same instruction (invokevirtual) with the same operand, yet the methods that finally execute are different. The explanation must therefore lie in how invokevirtual itself works.
The runtime resolution process of invokevirtual is roughly as follows:
1) Find the actual type of the object referenced by the first element on top of the operand stack, denoted C.
2) If type C contains a method whose descriptor and simple name match those in the constant, check access permission. If the check passes, return a direct reference to this method and end the search; if it fails, throw java.lang.IllegalAccessError.
3) Otherwise, repeat the search and check of step 2 for each parent class of C, from bottom to top along the inheritance hierarchy.
4) If no suitable method is found, throw java.lang.AbstractMethodError.
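So invokevirtual does not simply resolve the symbolic reference in the constant pool once; it selects the method version according to the actual type of the receiver. This process is the essence of method overriding in Java, and dispatching a method at run time according to the actual type is called dynamic dispatch.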
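The four steps above can be imitated with reflection. This is only an illustrative sketch, not how a virtual machine implements the lookup: the class and method names are hypothetical, the access check of step 2 is omitted, and parameter types stand in for the method descriptor.

```java
import java.lang.reflect.Method;

class VirtualLookupSketch {
    static class Human {
        public String sayHello() { return "human say hello"; }
        public String name() { return "human"; }
    }
    static class Woman extends Human {
        @Override public String sayHello() { return "woman say hello"; }
    }

    // Walk from the receiver's actual class up through its parents and
    // return the first method whose name and parameter types match.
    static Method resolve(Object receiver, String name, Class<?>... paramTypes) {
        for (Class<?> c = receiver.getClass(); c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredMethod(name, paramTypes); // step 2 (no access check)
            } catch (NoSuchMethodException e) {
                // Not declared in c: continue with the parent class (step 3).
            }
        }
        throw new AbstractMethodError(name); // step 4
    }

    public static void main(String[] args) {
        Human h = new Woman();
        System.out.println(resolve(h, "sayHello").getDeclaringClass().getSimpleName()); // Woman
        System.out.println(resolve(h, "name").getDeclaringClass().getSimpleName());     // Human
    }
}
```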
The root of polymorphism is the execution logic of the invokevirtual instruction, so it applies only to methods. Fields never go through invokevirtual, and therefore fields are not polymorphic. When a subclass declares a field with the same name as a field in its parent class, both fields exist in the object's memory, but the subclass's field hides the parent's field.
[Code listing: example showing that fields are not polymorphic]

Result analysis: When a Son is created, Son's constructor first implicitly calls Father's constructor. The showMeTheMoney() call inside Father's constructor is a virtual method call, and virtual calls go by the actual type, so Son::showMeTheMoney() is what runs. At that point Son's money field has not been initialized and is still 0, which produces the first line of output. Execution then continues in Son's constructor, which produces the second line. Finally, the main method accesses the money field through a variable whose static type is Father. Field access goes by the static type, so the value printed is Father's money, which is 2.
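The original listing is an image. The sketch below reconstructs it from the analysis above. The names Father, Son, money, and showMeTheMoney come from the text; the initial field values and the message wording are assumptions chosen so that the final value in Father is 2, as the analysis states.

```java
class FieldHasNoPolymorphism {
    static class Father {
        public int money = 1;
        public Father() {
            money = 2;
            showMeTheMoney(); // virtual call: runs Son's version for a Son object
        }
        public void showMeTheMoney() {
            System.out.println("I am Father, I have $" + money);
        }
    }

    static class Son extends Father {
        public int money = 3; // hides Father.money
        public Son() {
            money = 4;
            showMeTheMoney();
        }
        @Override
        public void showMeTheMoney() {
            System.out.println("I am Son, I have $" + money);
        }
    }

    public static void main(String[] args) {
        Father guy = new Son();
        // Field access uses the static type (Father), not the actual type.
        System.out.println("This guy has $" + guy.money);
    }
}
```

Under these assumptions the program prints "I am Son, I have $0", then "I am Son, I have $4", then "This guy has $2".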

3. Single dispatch and multiple dispatch
The receiver of a method and the parameters of a method are collectively the criteria on which method selection is based. Depending on how many of these criteria are used, dispatch is divided into single dispatch (one criterion) and multiple dispatch (more than one).
[Code listing: single dispatch and multiple dispatch example]

The selection made by the compiler at compile time is the static dispatch process. It is based on two things: first, whether the static type of the receiver is Father or Son; second, whether the parameter type is _360 or QQ. Because the selection uses two criteria, static dispatch in Java is multiple dispatch.
The selection made by the virtual machine at run time is the dynamic dispatch process. When "son.hardChoice(new QQ())" executes, the compiler has already fixed the method signature as hardChoice(QQ), so the parameter no longer influences the choice. The only thing that does is whether the actual type of the receiver is Father or Son. Because the selection uses a single criterion, dynamic dispatch in Java is single dispatch.
Therefore, the Java language is a static multiple-dispatch, dynamic single-dispatch language.
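The original listing is an image. A minimal sketch consistent with the description follows. The names Father, Son, hardChoice, QQ, and the son variable come from the text; the class name _360 (a legal Java identifier for "360"), the returned messages, and the first call in main are assumptions.

```java
class Dispatch {
    static class QQ {}
    static class _360 {}

    static class Father {
        public String hardChoice(QQ arg)   { return "father choose qq"; }
        public String hardChoice(_360 arg) { return "father choose 360"; }
    }

    static class Son extends Father {
        @Override public String hardChoice(QQ arg)   { return "son choose qq"; }
        @Override public String hardChoice(_360 arg) { return "son choose 360"; }
    }

    public static void main(String[] args) {
        Father father = new Father();
        Father son = new Son();
        // Compile time: receiver static type + argument type pick the signature.
        // Run time: only the receiver's actual type picks the implementation.
        System.out.println(father.hardChoice(new _360())); // father choose 360
        System.out.println(son.hardChoice(new QQ()));      // son choose qq
    }
}
```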

4. Implementation of virtual machine dynamic dispatch
Dynamic dispatch is performed very frequently, so real implementations generally do not search type metadata on every call. Instead, they build a virtual method table for each type and look up the target through its index in the table.
[Figure 8-3: virtual method table structure]
The virtual method table stores the actual entry address of each method. If a method is not overridden in the subclass, its entry in the subclass's table is the same as in the parent's table, and both point to the parent's implementation. As shown in Figure 8-3, Son overrides all of Father's methods, so no arrow from Son's table points to Father. Neither class overrides the methods inherited from Object, so both tables have arrows pointing to Object.
