"In-Depth Understanding of the Java Virtual Machine" (Second Edition) Study Notes 1: JVM Memory Division

Runtime Data Area

Let's use a picture to describe how the JVM divides its memory:

[Figure: JVM runtime data areas]

PS: I drew it myself, it’s unavoidable to be ugly…

Program Counter Register

The Program Counter Register is a small memory space that can be seen as a line number indicator of the bytecode executed by the current thread.

Features:

  1. The thread is private, and the counters between each thread do not affect each other;
  2. It is the only area in the Java Virtual Machine specification that does not specify any OutOfMemoryError conditions.

Usage: If the thread is executing a Java method, the counter records the address of the virtual machine bytecode instruction currently being executed; if the thread is executing a Native method, the counter value is empty (Undefined).

Purpose: To restore the correct execution position after thread switching.

Virtual Machine Stack

The virtual machine stack describes the memory model of Java method execution. Its basic internal unit is the stack frame, and each stack frame corresponds to one method invocation. The process of a method from invocation to completion corresponds to a stack frame being pushed onto the virtual machine stack and later popped off it.

Features:

  1. Thread private, life cycle is the same as thread

The virtual machine stack can raise two kinds of errors:

  1. Stack overflow (StackOverflowError): the stack depth requested by the thread exceeds the depth allowed by the virtual machine; this mostly happens with deep or unbounded recursion;
  2. Out of memory (OutOfMemoryError): the stack can be expanded dynamically, but not enough memory can be obtained during expansion.
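The first case is easy to provoke. The following is a minimal sketch (the class and method names are made up for illustration) that recurses without a termination condition, catches the resulting StackOverflowError, and reports how many frames were pushed. The depth reached depends on the JVM, the frame size, and the -Xss setting, so no particular number should be expected.

```java
final class StackDepthSketch {
    // Number of frames pushed so far by recurse().
    private static int depth;

    // Every call pushes one more stack frame; there is no exit condition on purpose.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the thread's stack is exhausted and returns the depth reached. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The requested stack depth exceeded what the virtual machine allows.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed at depth " + measureDepth());
    }
}
```

Catching StackOverflowError is done here only to observe the limit; production code should normally fix the recursion instead.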

Stack Frame

A stack frame is a data structure used to support method invocation and method execution by a virtual machine, and is a stack element of a virtual machine stack.

[Figure: structure of a stack frame]

As shown in the figure above, a stack frame contains the local variable table, the operand stack, dynamic linking information, the method return address, and some additional information.

In an active thread, only the stack frame at the top of the stack is valid. It is called the current stack frame (Current Stack Frame), and the method associated with it is called the current method (Current Method). All bytecode instructions executed by the execution engine operate on the current stack frame.
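The frame ordering can be observed from Java code through a stack trace. This is only a sketch with hypothetical method names (outer, middle, inner): it captures the trace inside the innermost method and returns the names of the top three frames. The first element is the method that is current at the moment of capture.

```java
import java.util.ArrayList;
import java.util.List;

final class CurrentFrameSketch {
    static List<String> outer() { return middle(); }

    static List<String> middle() { return inner(); }

    // While inner() runs, its frame is the current stack frame of this thread.
    static List<String> inner() {
        StackTraceElement[] frames = new Throwable().getStackTrace();
        List<String> names = new ArrayList<>();
        // Element 0 is the top of the stack; callers follow in order.
        for (int i = 0; i < 3 && i < frames.length; i++) {
            names.add(frames[i].getMethodName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(outer()); // prints [inner, middle, outer]
    }
}
```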

Local Variable Table

The Local Variable Table is a set of variable value storage spaces used to store method parameters and local variables defined inside the method.

The maximum capacity of the local variable table is written into the max_locals data item of the Code attribute during compilation (the size is determined during compilation, and there is no overflow).

The basic unit of the local variable table

The capacity of the local variable table is measured in variable slots (Variable Slot, Slot for short). Each Slot can store one value of type boolean, byte, char, short, int, float, reference (a reference to an object instance) or returnAddress, all of which occupy 32 bits or less in Java.

I find this sentence from the book puzzling: "As long as it is guaranteed that, even if a 64-bit virtual machine uses 64 bits of physical memory to implement a Slot, the virtual machine still uses alignment and padding to make the Slot look the same as in a 32-bit virtual machine."
Does this mean that an int occupies 64 bits in a 64-bit JVM, with the remaining 32 bits being nothing but padding?
I haven't found an answer to this question online, so I'll leave it open for now.

For 64-bit data types, the virtual machine allocates two consecutive Slots, with the high-order half first. Java has only two explicitly 64-bit data types, long and double (the reference type may be 32-bit or 64-bit). This leads to a problem known as the non-atomic treatment of long and double, which we will leave for later.

I also have doubts about the book's statement that "the reference type may be 32-bit or 64-bit": how can one tell whether a reference is 32-bit or 64-bit?
I haven't found an answer to this yet either.

Use of local variable table

The virtual machine accesses the local variable table by index, and the index range starts at 0 and goes up to the maximum number of Slots in the table. When a variable of a 32-bit data type is accessed, index n refers to the nth Slot. When a variable of a 64-bit data type is accessed, both the nth and the (n+1)th Slot are used together.

For two adjacent Slots that jointly store one 64-bit value, accessing either of them on its own is not allowed in any way!

When a method is executed, the virtual machine uses the local variable table to pass the argument values to the parameter list. If an instance method (non-static method) is executed, the Slot at index 0 of the local variable table is used by default to pass a reference to the object instance the method belongs to; inside the method this implicit parameter is accessed through the this keyword. The remaining parameters are arranged in parameter-list order and occupy local variable Slots starting from index 1. After the parameters have been allocated, the remaining Slots are assigned according to the order and scope of the variables defined in the method body.

What if it is a class method (static method)? What is stored in the Slot at index 0 of the local variable table?
A class method has no owning object instance, so no Slot is reserved for this: the parameters start at index 0 (or the first local variable does, if the method has no parameters). This explains, at the virtual machine level, why the this keyword cannot be used in a class method.
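The slot numbering can be made concrete with a small sketch. The comments on the two methods follow the layout rules of the Java Virtual Machine Specification (receiver first for instance methods, long and double taking two Slots). The helper parameterSlots is a hypothetical name introduced here; it only counts the Slots taken by the receiver and the parameters via reflection and does not read the actual class file.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class SlotLayoutSketch {
    // Instance method: slot 0 = this, slot 1 = a, slots 2-3 = b (long), slot 4 = c.
    long instanceSum(int a, long b, int c) { return a + b + c; }

    // Static method: no this, so slot 0 = a, slots 1-2 = b (long), slot 3 = c.
    static long staticSum(int a, long b, int c) { return a + b + c; }

    /** Counts the Slots used by the receiver (if any) and the parameters of m. */
    static int parameterSlots(Method m) {
        int slots = Modifier.isStatic(m.getModifiers()) ? 0 : 1;
        for (Class<?> type : m.getParameterTypes()) {
            // long and double are the only types that take two Slots.
            slots += (type == long.class || type == double.class) ? 2 : 1;
        }
        return slots;
    }

    /** Convenience lookup for the two (int, long, int) methods declared above. */
    static int parameterSlots(String methodName) {
        try {
            return parameterSlots(SlotLayoutSketch.class
                    .getDeclaredMethod(methodName, int.class, long.class, int.class));
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("instanceSum: " + parameterSlots("instanceSum")); // 5
        System.out.println("staticSum:   " + parameterSlots("staticSum"));   // 4
    }
}
```

The real layout of a compiled method can be inspected with javap -v, which prints locals and args_size for each method.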

Slot reuse mechanism

To save stack frame space as much as possible, the Slots in the local variable table can be reused. The scope of a variable defined in the method body does not necessarily cover the entire method body. Once the bytecode PC counter has moved past the scope of a variable, the Slot belonging to that variable can be handed over to other variables.

Side effects of slot reuse: In some cases, the reuse of slots will affect garbage collection.

Example:

/**
 * @author 小关同学
 * @create 2021/9/26
 */
public class Test1 {
    public static void main(String[] args) throws Exception {
        {
            byte[] placeholder = new byte[64 * 1024 * 1024];
        }
        // placeholder is out of scope here, but no other variable has reused its Slot yet
        System.gc();
    }
}

Memory data:
[GC 69468K->66368K(249344K), 0.0012025 secs]
[Full GC 66368K->66091K(249344K), 0.0098479 secs]

We can see that the memory referenced by placeholder has not been reclaimed. The Slot originally occupied by placeholder has not been reused by any other variable, so the local variable table, which is part of the GC Roots, still holds a reference to the array.

After modification:

/**
 * @author 小关同学
 * @create 2021/9/26
 */
public class Test1 {
    public static void main(String[] args) throws Exception {
        {
            byte[] placeholder = new byte[64 * 1024 * 1024];
        }
        // i reuses the Slot that placeholder occupied, dropping the last reference to the array
        int i = 0;
        System.gc();
    }
}

Memory data:
[GC 69468K->66320K(249344K), 0.0011144 secs]
[Full GC 66320K->555K(249344K), 0.0100487 secs]

This time the memory is clearly reclaimed: the variable i reuses the Slot originally occupied by placeholder, so nothing in the local variable table references the array any more.

Operand Stack

The Operand Stack, also called the Operation Stack, is a Last In First Out (LIFO) stack.

Like the local variable table, the maximum depth of the operand stack is written into the max_stack data item of the Code attribute during compilation (the size is determined at compile time, so it cannot overflow).

The stack capacity occupied by 32-bit data types is 1, and the capacity occupied by 64-bit data types is 2.

When a method starts executing, its operand stack is empty. During execution, various bytecode instructions write values to and read values from the operand stack, that is, they perform push and pop operations.

The execution engine of the Java virtual machine is called a "stack-based execution engine", and the "stack" referred to is the operand stack.
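To make "stack-based" concrete, here is a toy interpreter sketch. It is not how any real JVM is implemented; it only borrows the mnemonics iload, istore, iadd and imul, and the class name and text-based instruction format are invented for illustration. It evaluates (a + b) * c by moving values between a local variable array and an operand stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class OperandStackSketch {
    /** Runs a tiny instruction list against the given local variable array. */
    static void execute(String[] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>(); // the operand stack, empty at method entry
        for (String insn : program) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "iload":  // push a local variable onto the operand stack
                    stack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "istore": // pop the top value into a local variable
                    locals[Integer.parseInt(parts[1])] = stack.pop();
                    break;
                case "iadd":   // pop two ints, push their sum
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "imul":   // pop two ints, push their product
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unknown instruction: " + insn);
            }
        }
    }

    /** Computes (a + b) * c using locals 0..2 as inputs and local 3 as the result. */
    static int compute(int a, int b, int c) {
        int[] locals = {a, b, c, 0};
        String[] program = {"iload 0", "iload 1", "iadd", "iload 2", "imul", "istore 3"};
        execute(program, locals);
        return locals[3];
    }

    public static void main(String[] args) {
        System.out.println(compute(1, 2, 3)); // prints 9
    }
}
```

For this program the stack never holds more than two values, which is the kind of bound a compiler records as the maximum stack depth.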

The data types of the elements on the operand stack must strictly match the sequence of bytecode instructions. The compiler must guarantee this when compiling the program code, and it is verified again by data flow analysis during the class verification stage.

Modern virtual machines generally apply some optimizations to the operand stack, as shown in the following figure:
[Figure: two stack frames sharing an overlapping region]
As shown in the figure above, the virtual machine lets two stack frames partially overlap, so that part of the operand stack of the lower frame coincides with part of the local variable table of the upper frame (this mainly shows up in parameter passing). Part of the data can then be shared during a method call, and the parameters do not need to be copied separately.

Dynamic Linking

Dynamic linking is mainly about method references in the runtime constant pool. Each stack frame contains a reference to the method it belongs to in the runtime constant pool, and this reference is held to support dynamic linking during method calls.

Dynamic linking is closely related to method invocation, we will talk about method invocation later.

Method Return Address (Return Address)

After a method starts executing, there are only two ways to exit the method.
The first is that the execution engine encounters any one of the method-return bytecode instructions. This way of exiting is called Normal Method Invocation Completion.
The second is that an exception occurs during execution and is not handled inside the method (no matching exception handler is found in this method's exception table), which causes the method to exit. This way of exiting is called Abrupt Method Invocation Completion. A method that exits through abrupt completion does not produce any return value for its caller.

No matter which exit method is adopted, after the method exits, it needs to return to the location where the method was called before the program can continue to execute.
When the method exits normally, the value of the caller's PC counter can be used as the return address, and this counter value may be saved in the stack frame.
When the method exits abnormally, the return address is determined by the exception handler table, and this part of the information is generally not saved in the stack frame.
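The two exit paths can be seen from the caller's side in a short sketch (all names here are made up for illustration). normal() completes through a return instruction and hands back a value; abrupt() has no handler for its own exception, so its frame is discarded without a return value and control continues in the caller's handler.

```java
final class MethodExitSketch {
    // Normal completion: a return instruction passes 42 back to the caller.
    static int normal() {
        return 42;
    }

    // Abrupt completion: no matching handler inside this method, so it exits without a value.
    static int abrupt() {
        throw new IllegalStateException("no handler in this method");
    }

    // The caller resumes after the call on normal completion, or in its own handler otherwise.
    static String call(boolean fail) {
        try {
            int value = fail ? abrupt() : normal();
            return "returned " + value;
        } catch (IllegalStateException e) {
            return "caught: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(call(false)); // prints returned 42
        System.out.println(call(true));  // prints caught: no handler in this method
    }
}
```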

Difference between method return address and PC counter:

  • The program counter records where the current thread has got to, so that execution can resume at the right place when the processor switches back and forth between threads;
  • The method return address records where the caller should continue executing once the current method has finished.

Essentially: the exit of the method is the process of popping the current stack frame from the stack. At this time, it is necessary to restore the local variable table and operand stack of the upper-level method, push the return value to the operand stack of the caller's stack frame, set the PC counter value, etc., so that the caller method can continue to execute.

Additional Information

This part of the information depends on the specific implementation of the virtual machine, such as debugging related information.

Native Method Stack

The native method stack plays a role similar to that of the virtual machine stack. The difference is that the virtual machine stack serves Java methods, while the native method stack serves the Native methods used by the virtual machine.

Like the virtual machine stack, the native method stack also throws StackOverflowError and OutOfMemoryError exceptions.

Java Heap

The Java heap is the largest piece of memory managed by the virtual machine and is created when the virtual machine starts. Its sole purpose is to store object instances, and almost all object instances allocate memory here.

Note the "almost" above. Originally all object instances really were allocated on the heap, but with the development of JIT compilers and the maturing of escape analysis, optimizations such as stack allocation and scalar replacement have changed this subtly, and it is no longer absolute that every object is allocated on the heap.
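A rough sketch of what "escaping" means, using hypothetical class and method names. In distanceSquared the Point is never stored, returned, or passed anywhere else, so it is the kind of object a JIT compiler may be able to replace with plain scalars; in escaping the object is returned to the caller, so it has to exist as a real object. Whether the allocation is actually eliminated depends on the JVM, its version, and its flags, and this code cannot observe it.

```java
final class EscapeSketch {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // p does not escape this method: a candidate for scalar replacement.
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // The returned Point escapes to the caller, so it must be a real object.
    static Point escaping(int x, int y) {
        return new Point(x, y);
    }

    public static void main(String[] args) {
        System.out.println(distanceSquared(3, 4)); // prints 25
        System.out.println(escaping(1, 2).x);      // prints 1
    }
}
```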

Features: shared by all threads.

The Java heap is the main area managed by the garbage collector. Since current garbage collectors basically use generational collection algorithms, the Java heap can, from the perspective of memory reclamation, be subdivided into the young generation and the old generation; more finely, into Eden space, From Survivor space, To Survivor space, and so on. From the perspective of memory allocation, the thread-shared Java heap may be divided into multiple thread-private allocation buffers (Thread Local Allocation Buffer, TLAB).

The Java heap can be in a physically discontinuous memory space, as long as it is logically contiguous.

Method Area

The Method Area stores data such as class information that has been loaded by the virtual machine, constants, static variables, and code compiled by the just-in-time compiler.

Features: shared by all threads.

On the HotSpot virtual machine, the method area is also called the "Permanent Generation", because the HotSpot design team extended generational GC to the method area so that the garbage collector can manage this part of memory the same way it manages the Java heap (which no longer looks like a good idea). This saved them from writing dedicated memory management code for the method area (it feels a bit lazy).

Note: The concept of a permanent generation does not exist on other virtual machines. It is unique to the HotSpot virtual machine.

Memory reclamation in the method area mainly targets the constant pool and the unloading of types. Generally speaking, the reclamation results in this area are not very satisfying, especially for type unloading, where the conditions are quite strict.

When the method area cannot meet the memory allocation requirements, an OutOfMemoryError exception will be thrown.

Runtime Constant Pool

The Runtime Constant Pool is part of the method area. One item of information in a Class file is the Constant Pool Table, which stores the various literals and symbolic references generated at compile time. Literals are close to the concept of constants at the Java language level, such as text strings and constant values declared as final. Symbolic references are a concept from compiler theory and include the following three kinds of constants:

  • Fully qualified names of classes and interfaces
  • Field names and descriptors
  • Method names and descriptors

This part of the content is stored in the runtime constant pool of the method area after the class is loaded. In general, in addition to saving the symbolic references described in the Class file, the translated direct references are also stored in the runtime constant pool.

Another important feature of the runtime constant pool, compared with the Class file constant pool, is that it is dynamic. The Java language does not require constants to be generated only at compile time; new constants can also be put into the pool at runtime, for example through the intern() method of the String class.
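A minimal sketch of that runtime behaviour (the class name is made up). A string built at runtime is a separate object from the literal with the same contents, while intern() returns the pooled instance, which is the same object the literal refers to.

```java
final class InternSketch {
    /** Returns { built == literal, built.intern() == literal, built.equals(literal) }. */
    static boolean[] compare() {
        String literal = "jvm";                                 // pooled when the class is loaded
        String built = new String(new char[] {'j', 'v', 'm'});  // a new object created at runtime
        return new boolean[] {
            built == literal,           // different objects
            built.intern() == literal,  // intern() hands back the pooled instance
            built.equals(literal)       // same contents
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints false true true
    }
}
```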

When the constant pool can no longer apply for memory, an OutOfMemoryError exception will be thrown.

Direct Memory

Direct Memory is not a memory area defined in the Java Virtual Machine Specification. JDK 1.4 added the NIO (New Input/Output) classes, which introduce an I/O model based on Channels and Buffers. NIO can allocate off-heap memory directly through native libraries and then operate on that memory through a DirectByteBuffer object stored in the Java heap, which acts as a reference to it. This avoids copying data back and forth between the Java heap and the native heap and improves performance in some scenarios.
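As a small sketch of the API involved (the class and method names are made up), ByteBuffer.allocateDirect returns a buffer whose contents live outside the Java heap, while the buffer object itself is an ordinary heap object. The example writes a long into such a buffer and reads it back.

```java
import java.nio.ByteBuffer;

final class DirectBufferSketch {
    /** Writes value into a freshly allocated direct buffer and reads it back. */
    static long roundTrip(long value) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(Long.BYTES); // off-heap storage
        buffer.putLong(value);
        buffer.flip(); // switch from writing to reading
        return buffer.getLong();
    }

    public static void main(String[] args) {
        System.out.println(ByteBuffer.allocateDirect(16).isDirect()); // prints true
        System.out.println(ByteBuffer.allocate(16).isDirect());       // prints false
        System.out.println(roundTrip(123456789L));                    // prints 123456789
    }
}
```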

Although this part of the area uses the local direct memory, an OutOfMemoryError exception will also occur.

Reclamation of direct memory

Note that off-heap memory is not directly controlled by the JVM. It can only be reclaimed after the corresponding DirectByteBuffer has been reclaimed, and a Young GC only reclaims unreachable DirectByteBuffer objects in the young generation together with their direct memory. If most of these objects have been promoted to the old generation, the DirectByteBuffer objects and their associated off-heap memory can only be fully reclaimed when a Full GC occurs. The reclamation of off-heap memory therefore depends on Full GC.

  • Full GC generally occurs when the old generation is collected or when code calls System.gc. Relying on old generation collection to trigger Full GC and thereby reclaim off-heap memory is obviously too unpredictable: if the old generation is never collected, the off-heap memory is never reclaimed, and the machine's physical memory may slowly be exhausted. To avoid this, you can set the maximum direct memory size with the parameter -XX:MaxDirectMemorySize. When usage reaches this threshold, System.gc is called to perform a Full GC, which makes off-heap memory reclamation controllable. The catch is that this reclamation relies on the call to System.gc, and the JVM parameter -XX:+DisableExplicitGC turns System.gc into an empty function that never triggers a Full GC. So when using NIO-based frameworks such as Netty, check whether this parameter could cause a direct memory leak.

  • If -XX:MaxDirectMemorySize is not specified, then according to directMemory = Runtime.getRuntime().maxMemory(), the maximum direct memory is roughly the same as the maximum heap size.

PS: You can also go to my personal blog to see more content
Personal blog address: Xiaoguan classmate's blog

Origin blog.csdn.net/weixin_45784666/article/details/120584553