JVM runtime stack frame structure

A stack frame is the data structure that supports method invocation and method execution in the virtual machine. It is the element of the virtual machine stack in the runtime data area. For each method, the span from the start of its invocation to the completion of its execution corresponds to one stack frame being pushed onto and then popped off the virtual machine stack.

The size of the local variable table and the maximum depth of the operand stack that a stack frame needs are determined when the program code is compiled and are written into the Code attribute of the method table. The amount of memory a stack frame requires is therefore not affected by variable data at run time; it depends only on the specific virtual machine implementation.

The chain of method calls in a thread can be very long, and many methods are in the executing state at the same time. For the execution engine, only the stack frame at the top of the active thread's stack is in effect. It is called the current stack frame (Current Stack Frame), and the method associated with it is called the current method (Current Method).
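As a rough illustration of the call chain, the sketch below inspects the stack trace of the current thread. The class and method names (FrameChainDemo, outer, inner) are made up for this example. A stack trace is only a debugging view of the frames, not the frames themselves, and a virtual machine is allowed to omit entries from it, so treat this as a sketch rather than a guarantee.

```java
public class FrameChainDemo {
    // outer calls inner: while inner runs, outer's frame is still on the
    // stack, but only inner's frame is the current stack frame.
    static String[] outer() {
        return inner();
    }

    static String[] inner() {
        StackTraceElement[] trace = new Throwable().getStackTrace();
        // On typical JVMs trace[0] is the method that created the Throwable
        // (the current method) and trace[1] is its caller.
        return new String[] { trace[0].getMethodName(), trace[1].getMethodName() };
    }

    public static void main(String[] args) {
        String[] names = outer();
        System.out.println("current: " + names[0] + ", caller: " + names[1]);
    }
}
```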

In the conceptual model, a typical stack frame is mainly composed of the local variable table (Local Variable Table), the operand stack (Operand Stack), dynamic linking (Dynamic Linking), the method return address (Return Address) and some additional information, as shown in the following figure:

Next, the structure of each of these parts of the stack frame is explained in turn.

1. Local variable table

The local variable table is a storage space for a set of variable values. It stores the method parameters and the local variables defined inside the method. The maximum size of the local variable table that a method needs is given by the max_locals item of the Code attribute in the method table of the Class file.

The variable slot (Variable Slot) is the smallest unit of the local variable table. A slot is not required to be exactly 32 bits, although 32 bits is enough to store most types of data. A Slot can hold a value of any of 8 types: boolean, byte, char, short, int, float, reference and returnAddress. A reference refers to an object instance; through it the virtual machine can find, directly or indirectly, the starting address of the object's data in the Java heap and the type information of the object's class in the method area. returnAddress points to the address of a bytecode instruction. For the 64-bit types long and double, the virtual machine allocates two consecutive Slots.

The virtual machine accesses the local variable table by index, and index values range from 0 to the maximum number of slots in the local variable table minus one. As noted above, the local variable table stores method parameters and local variables. When the method being invoked is a non-static method, the slot at index 0 holds by default the reference to the object instance the method belongs to, that is, the object referred to by the "this" keyword. The remaining parameters are assigned slots in order starting from index 1, and after the parameters the local variables defined inside the method are assigned slots in turn.
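The slot numbering rules above can be sketched in a few lines. The class SlotLayoutSketch and its methods are hypothetical helpers written for this article; they model the conceptual rules (slot 0 for "this" in instance methods, two slots for long and double) and are not part of any virtual machine API.

```java
import java.util.ArrayList;
import java.util.List;

public class SlotLayoutSketch {
    // Slots occupied by one value in the conceptual model:
    // long and double take two consecutive slots, everything else one.
    static int slotsFor(Class<?> type) {
        return (type == long.class || type == double.class) ? 2 : 1;
    }

    // Returns the starting slot index of each parameter.
    // For a non-static method, slot 0 is reserved for "this".
    static List<Integer> parameterSlots(boolean isStatic, Class<?>... paramTypes) {
        List<Integer> indexes = new ArrayList<>();
        int next = isStatic ? 0 : 1;
        for (Class<?> type : paramTypes) {
            indexes.add(next);
            next += slotsFor(type);
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Hypothetical instance method: void m(int a, long b, Object c)
        System.out.println(parameterSlots(false, int.class, long.class, Object.class));
        // The same signature declared static
        System.out.println(parameterSlots(true, int.class, long.class, Object.class));
    }
}
```

For the instance method the parameters start at slots 1, 2 and 4, because slot 0 holds "this" and the long occupies slots 2 and 3. For the static variant they start at 0, 1 and 3.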

To save stack frame space, Slots in the local variable table can be reused. After leaving the scope of some variables, the slots corresponding to these variables can be handed over to other variables for use. This mechanism sometimes affects garbage collection behavior.

Consider the following two pieces of code (run with the -verbose:gc parameter):

Code 1:

public class LocalVariableTable {
    public static void main(String[] args) {
        {
            byte[] placeholder = new byte[64 * 1024 * 1024];
        }
        System.gc();
    }
}

Output:

[GC (System.gc()) 68864K->66336K(125952K), 0.0028420 secs]

[Full GC (System.gc()) 66336K->66253K(125952K), 0.0135963 secs]

Code 2:

public class LocalVariableTable {
    public static void main(String[] args) {
        {
            byte[] placeholder = new byte[64 * 1024 * 1024];
        }
        int a = 0;
        System.gc();
    }
}

Output:

[GC (System.gc()) 68864K->66296K(125952K), 0.0018972 secs]

[Full GC (System.gc()) 66296K->717K(125952K), 0.0049899 secs]

Analysis: In both Code 1 and Code 2, placeholder is out of scope when System.gc() runs, so the 64 MB array it referred to could in principle be reclaimed. The results show that it is actually reclaimed only in Code 2. The reasons are as follows:

In Code 1, although placeholder has left its scope, nothing has read or written the local variable table since then, so the Slot it occupied has not been reused by another variable. The local variable table, which is part of the GC Roots, therefore still holds a reference to the array, and the array is not reclaimed. In Code 2, the variable a reuses placeholder's Slot, which overwrites the reference, so the memory the array occupied can be reclaimed.

The book "Practical Java" lists "objects that are no longer used should be manually set to null" as a recommended coding rule, and the example above shows that this is not an entirely meaningless operation. However, assigning null should not be relied on too heavily, mainly for two reasons:

(1) From a coding point of view, controlling when a variable can be reclaimed through an appropriate variable scope is the most elegant solution.

(2) From the perspective of execution, using null assignment to help memory reclamation relies on the conceptual model of the bytecode execution engine, and the conceptual model may be completely different from what actually executes. Interpreted execution usually stays close to the conceptual model, but native code produced by the JIT compiler is the main way the virtual machine executes code. After JIT compilation and optimization the null assignment is typically eliminated altogether, so at that point writing it is meaningless.
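For reference, a null-assignment variant in the style of Code 1 and Code 2 might look like the sketch below. The class name NullAssignmentSketch and the run method are made up for this illustration. No GC log is shown because the result depends on the virtual machine, the collector and whether the code is interpreted or JIT-compiled.

```java
public class NullAssignmentSketch {
    // Allocates an array, clears the local variable by hand, then requests a GC.
    static void run(int sizeInBytes) {
        byte[] placeholder = new byte[sizeInBytes];
        // In the conceptual model this overwrites the Slot, so the local
        // variable table no longer refers to the array when System.gc() runs.
        placeholder = null;
        System.gc(); // only a request; the virtual machine may ignore it
    }

    public static void main(String[] args) {
        // Same array size as Code 1 and Code 2
        run(64 * 1024 * 1024);
    }
}
```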

The difference between class variables and local variables:

Unlike class variables, local variables have no "preparation stage". A class variable is assigned an initial value twice: once in the preparation stage, when it receives the system default value, and once in the initialization stage, when it receives the value defined by the programmer. So even if the programmer assigns no value to a class variable in the initialization stage, the class variable still has a definite initial value. Local variables are different: a local variable that is declared but not assigned an initial value cannot be used. Fortunately, the compiler checks for this and reports it.
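The difference is easy to see in a small example. The class InitialValueDemo and its members are hypothetical names chosen for this sketch.

```java
public class InitialValueDemo {
    // Class variable: receives the zero value in the preparation stage,
    // so reading it without an explicit assignment is legal.
    static int classCounter;

    static int readClassCounter() {
        return classCounter;
    }

    static int readLocal() {
        int local;
        // return local;  // rejected by javac: the variable might not have been initialized
        local = 42;        // a local variable must be assigned before it is read
        return local;
    }

    public static void main(String[] args) {
        System.out.println(readClassCounter()); // prints 0
        System.out.println(readLocal());        // prints 42
    }
}
```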

2. Operand stack

The Operand Stack, also known as the Operation Stack, is a last-in, first-out stack. The max_stack item in the Code attribute of the Class file specifies its maximum depth during execution. The interpreter of the Java virtual machine is called a "stack-based execution engine", and the stack in that name is the operand stack.

Arithmetic during method execution and parameter passing when calling other methods are both performed through the operand stack.
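A two-argument addition makes the role of the operand stack concrete. The class OperandStackDemo is a made-up name for this sketch; the bytecode in the comment is what javap -c -v typically prints for such a method when compiled with javac, and other compilers may emit something different.

```java
public class OperandStackDemo {
    // Typical javap -c -v output for this method:
    //   stack=2, locals=2, args_size=2
    //   0: iload_0    // push the int in local slot 0 (a) onto the operand stack
    //   1: iload_1    // push the int in local slot 1 (b)
    //   2: iadd       // pop both ints, push their sum
    //   3: ireturn    // pop the sum and return it to the caller
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2)); // prints 3
    }
}
```

Here stack=2 is the max_stack value: the operand stack never holds more than the two ints pushed before iadd.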

In the conceptual model, two stack frames that belong to different methods are completely independent of each other. Most virtual machine implementations, however, optimize this so that the two frames partially overlap: part of the operand stack of the lower frame (the caller) overlaps part of the local variable table of the upper frame (the callee). Some data is then shared during a method call, and parameters do not need to be copied.

3. Dynamic linking

Each stack frame contains a reference to the method to which the stack frame belongs in the runtime constant pool. This reference is held to support Dynamic Linking during method invocation.

The constant pool of a Class file stores a large number of symbolic references, and the method invocation instructions in bytecode take a symbolic reference to a method in the constant pool as their operand. Some of these symbolic references are converted into direct references during class loading or the first time they are used; this conversion is called static resolution. The rest are converted into direct references each time they are executed at run time; this part is called dynamic linking.
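The contrast can be sketched with one static call and one virtual call. The classes LinkingDemo, Animal and Dog are hypothetical names for this illustration; the instruction names in the comments are the ones javac normally emits for these two kinds of call.

```java
public class LinkingDemo {
    static class Animal {
        String sound() { return "..."; }
        static String kind() { return "animal"; }
    }

    static class Dog extends Animal {
        @Override
        String sound() { return "woof"; }
    }

    // Compiled to invokestatic: the target method is fixed,
    // so the symbolic reference can be resolved once.
    static String staticCall() {
        return Animal.kind();
    }

    // Compiled to invokevirtual: the method that actually runs depends
    // on the runtime type of the receiver, so it is selected at run time.
    static String virtualCall(Animal a) {
        return a.sound();
    }

    public static void main(String[] args) {
        System.out.println(staticCall());
        System.out.println(virtualCall(new Dog()));
        System.out.println(virtualCall(new Animal()));
    }
}
```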

4. Method return address

Once a method has started executing, there are only two ways for it to exit:

(1) Execution reaches a bytecode return instruction. The return value, if there is one, is passed to the calling method. This way of exiting is called Normal Method Invocation Completion. Generally speaking, the value of the caller's PC counter can be used as the return address.

(2) An exception occurs during execution and is not handled inside the method body, which causes the method to exit. In this case no return value is produced for the caller. This way of exiting is called Abrupt Method Invocation Completion. The return address is then determined through the exception handler table.

When the method returns, there are 3 possible actions:

(1) Restore the local variable table and operand stack of the upper-level method;

(2) Push the return value (if any) onto the operand stack of the caller's stack frame;

(3) Adjust the value of the PC counter to point to one instruction after the method call instruction.
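Both ways of exiting can be observed from the caller's side. In the sketch below, the class CompletionDemo and its methods are made-up names. The divide method either completes normally with a value or completes abruptly with an ArithmeticException that it does not handle itself.

```java
public class CompletionDemo {
    // Normal completion: a return instruction passes the quotient to the caller.
    // Abrupt completion: when b is 0 the division throws ArithmeticException,
    // divide has no handler for it, and no return value is produced.
    static int divide(int a, int b) {
        return a / b;
    }

    static String callDivide(int a, int b) {
        try {
            int result = divide(a, b);
            return "normal: " + result;
        } catch (ArithmeticException e) {
            // Control arrives here through the exception handler table
            // of this (calling) method, not through a return address.
            return "abrupt: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(callDivide(6, 3)); // prints normal: 2
        System.out.println(callDivide(1, 0)); // prints abrupt: ArithmeticException
    }
}
```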

5. Additional Information

The virtual machine specification does not say what additional information a virtual machine implementation may include in a stack frame; this part depends entirely on the specific implementation. In practice, dynamic linking, the method return address and the additional information are generally grouped together and called stack frame information.