How is memory allocated when the program is running?

Many people divide Java memory into heap memory (Heap) and stack memory (Stack). This division reflects, to some extent, that these are the two memory areas Java engineers care about most. However, it is not entirely accurate.

Java's memory layout is actually far more complicated: while executing a Java program, the Java virtual machine divides the memory it manages into several different data areas. The process of a HelloWorld.java file being loaded into memory by the JVM is as follows:

  1. The HelloWorld.java file first needs to be compiled by a compiler to generate the HelloWorld.class bytecode file.

  2. When a Java program accesses the HelloWorld class, HelloWorld.class must be loaded into JVM memory by a ClassLoader (class loader).

  3. The memory in the JVM is divided into several data areas, mainly: the program counter, the virtual machine stack, the native method stack, the heap, and the method area.
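
Step 2 above can be tried out directly. The following is a minimal sketch, assuming a standard JDK; the class name ClassLoadingSketch and the helper load are made up for illustration. It asks the system class loader to load a class by name and shows which loader ended up defining it.

```java
// Hypothetical demo class; the names here are assumptions, not part of the article.
class ClassLoadingSketch {

    // Asks the system class loader to load a class by its binary name.
    static Class<?> load(String binaryName) {
        try {
            return ClassLoader.getSystemClassLoader().loadClass(binaryName);
        } catch (ClassNotFoundException e) {
            // Wrapped so callers do not have to deal with a checked exception.
            throw new IllegalArgumentException("No such class: " + binaryName, e);
        }
    }

    public static void main(String[] args) {
        Class<?> listClass = load("java.util.ArrayList");
        System.out.println("Loaded: " + listClass.getName());
        // Core library classes are defined by the bootstrap loader, which is reported as null.
        System.out.println("Defined by: " + listClass.getClassLoader());
        // Application classes are normally defined by the system (application) class loader.
        System.out.println("This class defined by: " + ClassLoadingSketch.class.getClassLoader());
    }
}
```

The system class loader delegates to its parents first, which is why a core class such as java.util.ArrayList reports a null (bootstrap) loader even though the request went through the system loader.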

1.1 Program Counter Register

Java programs are multi-threaded, and the CPU allocates execution time slices among the threads. When a thread is suspended, the position its code has reached must be recorded, so that when the CPU resumes the thread it knows which instruction to continue from. This is what the program counter is for.

The "program counter" is a small memory space in the virtual machine, which is mainly used to record the execution position of the current thread.

Each thread records the position it has reached in the method it is currently executing. When the CPU switches back to a thread, execution continues from the position recorded by that thread's program counter.

In fact, besides resuming a thread after a switch, other familiar operations such as branches, loops, jumps, and exception handling also rely on this counter.

There are a few more things to note about the program counter:

  1. The Java Virtual Machine Specification defines no OutOfMemoryError condition for the program counter area (presumably because none is needed).

  2. It is thread-private: each thread has its own program counter, which is created when the thread is created and destroyed when the thread ends.

  3. When a thread is executing a Java method, the counter records the address of the bytecode instruction being executed. If the thread is executing a native method, the counter's value is undefined.

1.2 Virtual machine stack

The virtual machine stack is also thread-private, and its life cycle is the same as the thread's. The Java Virtual Machine Specification defines two exception conditions for this area:

  1. StackOverflowError: thrown when the stack depth requested by a thread exceeds the depth the virtual machine allows.

  2. OutOfMemoryError: thrown when the virtual machine stack can be expanded dynamically but not enough memory can be obtained for the expansion.

While learning about the Java virtual machine, we often see this sentence:

The JVM executes code with a stack-based interpreter, while the DVM (Dalvik virtual machine) uses a register-based interpreter.

The "stack-based" in the above sentence refers to the virtual machine stack, or more precisely to the operand stack inside each of its stack frames. The virtual machine stack was designed to describe the memory model of Java method execution: each time a method is executed, the JVM creates a stack frame in the virtual machine stack. Next, let's see what a stack frame is.

Stack frame

A stack frame (Stack Frame) is a data structure that supports method invocation and method execution in the virtual machine. Each time a thread invokes a method, a stack frame is created for that invocation.

We can understand it this way: a thread's stack contains multiple stack frames, and each stack frame contains a local variable table, an operand stack, a dynamic link, a return address, and so on.
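
The frames on a thread's stack can be observed from Java code through a stack trace. The following is a small sketch (the class and method names are invented for illustration): three methods call each other, and the innermost one lists the frames that are currently on the stack.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names are assumptions made for this sketch.
class StackFrameSketch {

    // Returns the method names of the frames currently on this thread's stack,
    // innermost first, limited to frames that belong to this class.
    static List<String> inner() {
        List<String> names = new ArrayList<>();
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            if (frame.getClassName().equals(StackFrameSketch.class.getName())) {
                names.add(frame.getMethodName());
            }
        }
        return names;
    }

    static List<String> middle() {
        return inner();
    }

    static List<String> outer() {
        return middle();
    }

    public static void main(String[] args) {
        // One frame per active invocation: inner, middle, outer, then main.
        System.out.println(outer());
    }
}
```

Each name in the list corresponds to one stack frame that had been created but not yet destroyed at the moment the trace was taken.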

Local variable table

The local variable table is the storage space for variable values. The arguments passed when a method is called and the local variables created inside the method are all stored in the local variable table. When a Java source file is compiled into a class file, the max_locals item in the method's Code attribute records the maximum capacity of the local variable table that the method needs. Take the following code:

public static int add(int k) {
    int i = 1;
    int j = 2;
    return i + j + k;
}

After decompiling with javap -v, the following bytecode instructions are obtained:

public static int add(int);
  descriptor: (I)I
  flags: ACC_PUBLIC, ACC_STATIC
  Code:
    stack=2, locals=3, args_size=1
      0: iconst_1
      1: istore_1
      2: iconst_2
      3: istore_2
      4: iload_1
      5: iload_2
      6: iadd
      7: iload_0
      8: iadd
      9: ireturn

Here locals=3 means that the length of the local variable table is 3. In other words, the length is already fixed at compile time: index 0 holds the parameter k (add is static, so there is no this reference), and indexes 1 and 2 hold the local variables i and j.
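
The args_size value in the javap output follows a simple counting rule, sketched below. The helper argSlots is hypothetical and only models the rule; it relies on two facts from the JVM specification rather than from this article: long and double values occupy two consecutive slots, and an instance method uses slot 0 for this.

```java
// Hypothetical helper for illustration; it does not read real class files.
class SlotCountSketch {

    // Number of local variable slots taken by a method's arguments:
    // long and double take two slots, every other type takes one,
    // and an instance method reserves slot 0 for `this`.
    static int argSlots(boolean isStatic, Class<?>... parameterTypes) {
        int slots = isStatic ? 0 : 1;
        for (Class<?> type : parameterTypes) {
            slots += (type == long.class || type == double.class) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // static int add(int k): k occupies slot 0, so one slot in total.
        System.out.println(argSlots(true, int.class));
        // An instance method with no parameters still uses one slot for `this`.
        System.out.println(argSlots(false));
        // A static method taking (long, int): the long takes two slots.
        System.out.println(argSlots(true, long.class, int.class));
    }
}
```

Local variables declared in the method body are assigned slots after the arguments by the same rule, which is how add(int k) ends up with locals=3.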

Note: The system does not assign initial values to local variables (instance variables and class variables are assigned initial values), which means there is no preparation phase for them as there is for class variables. This point will be covered in detail in the later lesson on class initialization.
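
The difference is easy to see in code. In this sketch (class and field names are assumptions), the fields are never assigned explicitly yet can be read, while the local variable must be assigned before the compiler allows it to be read.

```java
// Hypothetical demo class; names are made up for this sketch.
class DefaultValueSketch {
    static int classCounter;   // class variable: receives the default value 0
    int instanceCounter;       // instance variable: receives the default value 0
    String instanceName;       // reference-typed fields default to null

    static int localMustBeAssigned() {
        int local;
        // Reading `local` at this point would not compile:
        // "variable local might not have been initialized".
        local = 42;
        return local;
    }

    public static void main(String[] args) {
        DefaultValueSketch sketch = new DefaultValueSketch();
        System.out.println(classCounter);
        System.out.println(sketch.instanceCounter);
        System.out.println(sketch.instanceName);
        System.out.println(localMustBeAssigned());
    }
}
```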

Operand stack

The operand stack (Operand Stack), also often called the operation stack, is a last-in, first-out (LIFO) stack.

Like the size of the local variable table, the maximum depth of the operand stack is written at compile time, into the max_stack item of the method's Code attribute. Elements on the stack can be of any Java data type, including long and double.

When a method starts executing, its operand stack is empty. During execution, bytecode instructions push values onto the operand stack and pop values from it (for example, the iadd instruction pops the two elements at the top of the operand stack, adds them, and pushes the result back onto the stack).

Dynamic link

Each stack frame carries a dynamic link, whose purpose is to support dynamic linking (Dynamic Linking) during method invocation.

In a class file, a call to another method is stored as a symbolic reference. Before the call can be made, the symbolic reference has to be converted into a direct reference to the memory address where the method is located. At run time these symbolic references are kept in the runtime constant pool, which is part of the method area.

In the Java virtual machine stack, each stack frame contains a reference to the runtime constant pool of the class that the frame's method belongs to. Holding this reference is what makes dynamic linking during method invocation possible. The specific process will be covered in the later lesson on bytecode execution.
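
One visible consequence of resolving calls at run time is virtual method dispatch. The following is only a rough illustration, not a view into the JVM's internals, and all names in it are invented: the compiled code of describe() names Animal.sound() symbolically, and the method body that actually runs is chosen from the runtime class of the receiver.

```java
// Hypothetical demo classes; names are assumptions made for this sketch.
class DynamicLinkSketch {

    static class Animal {
        String sound() {
            return "generic sound";
        }
    }

    static class Dog extends Animal {
        @Override
        String sound() {
            return "woof";
        }
    }

    // The call below is compiled against Animal.sound();
    // which implementation runs depends on the object passed in.
    static String describe(Animal animal) {
        return animal.sound();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Animal()));
        System.out.println(describe(new Dog()));
    }
}
```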

Return address

After a method starts executing, there are only two ways to exit the method:

  • Normal exit: the code in the method completes normally, or a return bytecode instruction (such as ireturn) is executed, and the method exits without throwing an exception.

  • Abnormal exit: an exception occurs during method execution and is not handled inside the method body, causing the method to exit.

No matter how the current method exits, execution has to return to the place where the method was called so that the program can continue. The "return address" in the stack frame is used to help restore the execution state of the calling method.

Generally speaking, when a method exits normally, the caller's PC value can be used as the return address, and this value is likely to be saved in the stack frame. When a method exits abnormally, the return address is determined by the exception handler table, and this information is generally not saved in the stack frame.
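
The two exit paths look like this from the caller's side. This is a minimal sketch with invented names: divide() exits normally in the first call, so the caller resumes right after the call, and exits abnormally in the second, so control moves to the caller's exception handler instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are assumptions made for this sketch.
class MethodExitSketch {

    static int divide(int a, int b) {
        return a / b; // throws ArithmeticException when b is 0
    }

    // Records what the caller observes for each way divide() can exit.
    static List<String> callBoth() {
        List<String> events = new ArrayList<>();
        int result = divide(6, 3);          // normal exit: execution resumes here
        events.add("returned " + result);
        try {
            divide(1, 0);                   // abnormal exit: no value comes back
            events.add("not reached");
        } catch (ArithmeticException e) {
            // Control lands in the handler rather than after the call.
            events.add("caught " + e.getClass().getSimpleName());
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(callBoth());
    }
}
```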

Examples

Let's use a simple add() method to demonstrate. The code is as follows:

public int add() {
    int i = 1;
    int j = 2;
    int result = i + j;
    return result + 10;
}

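We often use the javap command to view the bytecode instructions of a class. For the add() method above, javap produces the following bytecode instructions:

0: iconst_1    (push the constant 1 onto the top of the operand stack)
1: istore_1    (pop the top of the operand stack and store it at index 1 of the local variable table)
2: iconst_2    (push the constant 2 onto the top of the operand stack)
3: istore_2    (pop the top of the operand stack and store it at index 2 of the local variable table)
4: iload_1     (push the value at index 1 of the local variable table onto the operand stack)
5: iload_2     (push the value at index 2 of the local variable table onto the operand stack)
6: iadd        (pop the top two values of the operand stack, add them, and push the sum)
7: istore_3    (pop the top of the operand stack and store it at index 3 of the local variable table)
8: iload_3     (push the value at index 3 of the local variable table onto the operand stack)
9: bipush 10   (push the constant 10 onto the top of the operand stack)
11: iadd       (pop the top two values of the operand stack, add them, and push the sum)
12: ireturn    (end: return the int at the top of the operand stack)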

The bytecode above also shows that the local variable table and the operand stack cooperate during code execution to carry out each operation. Next, let's walk through the state of the virtual machine stack as these instructions execute.

First, let's go over what each instruction means:

  • iconst and bipush both push a constant onto the top of the operand stack. The difference is that an iconst instruction is used for int values from -1 to 5, while bipush is used for other values from -128 to 127.

  • istore pops the top element of the operand stack and stores it at an index of the local variable table. For example, istore_3 stores the top element at index 3. The shorthand forms only exist for indexes 0 to 3; higher indexes use istore with an explicit index, such as istore 5.

  • iload loads the value at an index of the local variable table onto the top of the operand stack. For example, iload_2 pushes the value at index 2 of the local variable table onto the operand stack.

  • iadd performs an addition: it pops the two elements at the top of the operand stack, adds them, and pushes the result back onto the top of the stack.

First, when Add.java is compiled into Add.class, the size of the local variable table and the maximum depth of the operand stack in the stack frame are already fully determined and written into the Code attribute of the method. The add() method declares 3 local variables and, because it is an instance method, index 0 holds the this reference, so the local variable table has 4 slots (indexes 0 to 3). The operand stack is empty at this point.

So when execution has just entered the add method, the operand stack is empty and indexes 1 to 3 of the local variable table have not been assigned yet.

iconst_1 pushes the constant 1 onto the top of the operand stack, so the operand stack now holds 1.

istore_1 pops the element at the top of the operand stack and stores it at index 1 of the local variable table.

The operand stack is now empty again, and the popped element 1 is saved in the local variable table.

iconst_2 pushes the constant 2 onto the top of the operand stack.

istore_2 pops the element at the top of the operand stack and stores it at index 2 of the local variable table. The operand stack is empty again, and the local variable table holds 1 at index 1 and 2 at index 2.

Next come two iload instructions, iload_1 and iload_2, which push the elements at index 1 and index 2 of the local variable table onto the operand stack. The operand stack now holds 1 and 2, with 2 on top.

Next, iadd is executed. It pops the two elements at the top of the stack (1 and 2), adds them, and pushes the result back, so the operand stack now holds 3.

istore_3 pops the top element of the operand stack and stores it at index 3 of the local variable table, so index 3 now holds 3 and the operand stack is empty.

iload_3 pushes the element at index 3 of the local variable table onto the top of the operand stack, which now holds 3.

bipush 10 pushes the constant 10 onto the operand stack, which now holds 3 and 10.

iadd is executed again. The top two elements are 3 and 10, so after execution the operand stack holds 13.

Finally, the ireturn instruction is executed, and the element 13 at the top of the operand stack is returned to the calling method. The add() method has now finished executing, and its local variable table and operand stack are destroyed along with its stack frame.
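
The walkthrough above can be replayed in ordinary Java. The following is a toy model rather than a real interpreter: an int array stands in for the local variable table, an ArrayDeque stands in for the operand stack, and each statement mirrors one bytecode instruction of add(). The class name AddBytecodeSketch is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of the add() bytecode; not how a JVM is implemented.
class AddBytecodeSketch {

    static int run() {
        int[] locals = new int[4];                  // index 0 would hold `this`; unused here
        Deque<Integer> stack = new ArrayDeque<>();  // operand stack, top is the head

        stack.push(1);                              //  0: iconst_1
        locals[1] = stack.pop();                    //  1: istore_1
        stack.push(2);                              //  2: iconst_2
        locals[2] = stack.pop();                    //  3: istore_2
        stack.push(locals[1]);                      //  4: iload_1
        stack.push(locals[2]);                      //  5: iload_2
        stack.push(stack.pop() + stack.pop());      //  6: iadd
        locals[3] = stack.pop();                    //  7: istore_3
        stack.push(locals[3]);                      //  8: iload_3
        stack.push(10);                             //  9: bipush 10
        stack.push(stack.pop() + stack.pop());      // 11: iadd
        return stack.pop();                         // 12: ireturn
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 13
    }
}
```

The stack never holds more than two values at once, which matches the maximum operand stack depth the compiler records for this method.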

1.3 Native method stack

The native method stack is basically the same as the virtual machine stack described above, except that it serves native methods. If your development work involves JNI, you will deal with the native method stack more often. Some virtual machine implementations (such as HotSpot) combine the two into one.

1.4 Heap

The Java heap (Heap) is the largest piece of memory managed by the JVM. Its sole purpose is to store object instances, and almost all object instances are allocated here. It is therefore the main area managed by the Java garbage collector (GC) and is sometimes called the "GC heap" (the heap's GC mechanism will be covered in detail in later lessons). It is also a memory area shared by all threads, so if an object allocated in this area is accessed by multiple threads, thread safety must be considered.
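
As a small illustration of the thread safety point, the sketch below (names are assumptions) lets two threads update a single heap object. It uses AtomicInteger from the standard library so that no increment is lost; with a plain int field and no synchronization, the final count could come out lower.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are made up for this sketch.
class SharedHeapSketch {

    // Two threads increment one shared heap object; returns the final count.
    static int countWithTwoThreads(int incrementsPerThread) {
        AtomicInteger shared = new AtomicInteger(); // one object, visible to both threads
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                shared.incrementAndGet();           // atomic read-modify-write
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(10_000));
    }
}
```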

According to how long objects have lived, the memory in the heap can be divided into the young generation (Young) and the old generation (Old), and the young generation is further divided into the Eden and Survivor areas.

These areas store objects with different life cycles. A different garbage collection algorithm can therefore be used for each area, which makes garbage collection more targeted and more efficient.

1.5 Method area

The method area (Method Area) is also a runtime data area specified in the JVM specification. It mainly stores data about classes that the JVM has loaded: class information (version, fields, methods, interfaces), constants, static variables, and code compiled by the just-in-time (JIT) compiler. Like the heap, this area is shared by all threads.

Note: Many developers confuse the method area with the "permanent generation" (PermGen, also called the permanent area), so here is a comparison of the two concepts:

  • The method area is an area stipulated in the JVM specification, not an actual implementation. Do not confuse the specification with the implementation; different JVM vendors may implement the "method area" differently.

  • HotSpot used the "permanent generation" (or Perm area) to implement the method area up to and including JDK 1.7. Starting with JDK 1.8, the permanent generation was removed and replaced by an implementation called "metaspace" (Metaspace).

To sum it up:

  • The method area belongs to the specification level: it specifies what data should be stored in this area.

  • The permanent generation and metaspace are different implementations of the method area; they belong to the implementation level.
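
One way to see which implementation a running JVM uses is to ask it for its memory pools. The sketch below uses the standard java.lang.management API; the class name MemoryPoolSketch and the helper poolNames are assumptions. The pool names depend on the JVM vendor, version, and garbage collector, so treat the output as informational only; on a HotSpot JDK 8 or later the non-heap list typically includes Metaspace.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are made up for this sketch.
class MemoryPoolSketch {

    // Names of the memory pools of the given type reported by the running JVM.
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Heap pools usually correspond to the generations (Eden, Survivor, Old).
        System.out.println("Heap pools: " + poolNames(MemoryType.HEAP));
        // Non-heap pools cover the method area implementation and compiled code.
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```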

1.6 Reproducing the exceptions

StackOverflowError stack overflow exception

Recursive calls are a common cause of StackOverflowError. Consider a method, here named method, that calls itself recursively with no termination condition. Running such code produces a StackOverflowError.

The reason is that every call to method creates a new stack frame in the virtual machine stack. Because the call is recursive, method never returns and its stack frames are never destroyed, so a StackOverflowError is inevitable. Recursion should therefore be used with great care, and always with a termination condition that is actually reached.
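
A minimal sketch of this scenario follows. The name method comes from the text; the class name, the depth counter, and the overflowDepth helper are additions for illustration. The error is caught here only so the sketch can report how many frames fit on the stack; real code should fix the recursion rather than catch the error.

```java
// Hypothetical demo class; only the name `method` is taken from the text.
class StackOverflowSketch {
    static int depth;

    // Calls itself with no termination condition, so stack frames pile up
    // until the virtual machine stack is exhausted.
    static void method() {
        depth++;
        method();
    }

    // Triggers the overflow and returns how deep the recursion got.
    static int overflowDepth() {
        depth = 0;
        try {
            method();
        } catch (StackOverflowError e) {
            // Caught only for demonstration purposes.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed at depth " + overflowDepth());
    }
}
```

The depth reached varies between runs and machines, since it depends on the thread's stack size and on how large each frame is.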

OutOfMemoryError memory overflow exception

In theory, OutOfMemoryError can occur in the virtual machine stack, the heap, and the method area. In real projects, however, it mostly occurs in the heap. A typical example is an infinite loop that keeps adding new HeapError objects to an ArrayList.

This keeps occupying memory in the heap. When the heap runs out of memory, an OutOfMemoryError is inevitably thrown, which is the memory overflow exception.

Xms and Xmx are virtual machine startup parameters that set the initial and maximum heap size. They will be covered in detail in the next lesson on garbage collection.
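
The heap scenario can be sketched as follows. The class name HeapError comes from the text; everything else is an assumption: the 1 MB payload, the fill helper, and in particular the maxObjects cap, which is added so the sketch can also be run without exhausting memory. Passing Long.MAX_VALUE gives the unbounded loop the text describes.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the payload size and the cap are assumptions, not from the article.
class HeapError {
    private final byte[] payload = new byte[1024 * 1024]; // about 1 MB per instance

    // Adds HeapError objects to a list until the heap is exhausted or the cap
    // is reached, and returns how many objects were created.
    static long fill(long maxObjects) {
        List<HeapError> list = new ArrayList<>();
        long created = 0;
        try {
            while (created < maxObjects) {
                list.add(new HeapError());
                created++;
            }
        } catch (OutOfMemoryError e) {
            list.clear(); // release the objects before building the message
            System.out.println("OutOfMemoryError after " + created + " objects");
        }
        return created;
    }

    public static void main(String[] args) {
        // Defaults to a small cap; pass a large number to actually exhaust the heap.
        long cap = args.length > 0 ? Long.parseLong(args[0]) : 16;
        System.out.println("Created " + fill(cap) + " objects");
    }
}
```

To reproduce the error quickly, one option is a deliberately small heap, for example java -Xms20m -Xmx20m HeapError 1000. The 20 MB figure is an arbitrary choice for this sketch.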

Summary

For the JVM's runtime memory layout, one thing must always be kept in mind: the five areas described above are all defined by the Java Virtual Machine Specification. These rules only describe what each area is responsible for, what kind of data it stores, how exceptions are handled, whether it may be shared between threads, and so on. Do not mistake them for the "concrete implementation" of a virtual machine. There are many concrete virtual machine implementations, such as Sun's HotSpot, JRockit, IBM J9, and the Android Dalvik and ART that we are very familiar with. Each of them implements the five runtime data areas above in its own way.

Finally, to summarize the content of this lesson: the JVM's runtime memory structure contains two "stacks" and one "heap", namely the Java virtual machine stack, the native method stack, and the "GC heap", plus the method area. There is also the program counter, but developers hardly ever work with it directly, so it is not a key topic. Only the heap and the method area are data areas shared by threads; the other areas are thread-private. The program counter is the only area for which the Java Virtual Machine Specification specifies no OutOfMemoryError condition.


Origin blog.csdn.net/gqg_guan/article/details/132366070