[Java][JVM] Structure Principle and Runtime Data Area

1. Overview of Java Virtual Machine

The Java technology system officially defined by Oracle mainly includes the following parts:

  • Java programming language
  • Java virtual machines for various platforms
  • Class file format
  • Java API library
  • Third-party Java class library

The Java programming language, the Java virtual machine, and the Java API library are collectively referred to as the JDK (Java Development Kit), the minimum environment needed for Java program development. The Java SE API subset of the Java API together with the Java virtual machine is referred to as the JRE (Java Runtime Environment), the standard environment for running Java programs.
The Java virtual machine is therefore central: it is the cornerstone of the entire Java platform and the platform on which compiled Java code runs. You can think of the Java virtual machine as an abstract computer with its own instruction set and a number of runtime data areas.

1.1 Java virtual machine family

Many readers may assume that the Java virtual machine is a single virtual machine, or that the term simply means Oracle's HotSpot virtual machine. In fact there is a whole family. From the Sun Classic VM shipped with JDK 1.0, released by Sun in 1996, to today, many virtual machines have appeared and disappeared. Only the relatively mainstream ones that are still in use are briefly introduced here.

HotSpot VM
HotSpot VM is the virtual machine shipped with Oracle JDK and OpenJDK, and it is the most mainstream and widely used Java virtual machine. Unless stated otherwise, most technical articles about the Java virtual machine are describing HotSpot VM. HotSpot VM was not originally developed by Sun but by a small company, Longview Technologies, which Sun acquired in 1997; Sun was in turn acquired by Oracle (the deal was announced in 2009).
J9 VM
J9 VM is developed by IBM and is currently its main Java virtual machine. Its market positioning is close to that of HotSpot VM: a general-purpose virtual machine designed to span servers, desktop applications, and embedded devices. Its performance is roughly on par with HotSpot VM.
Zing VM
Zing VM is based on Oracle's HotSpot VM and improves many details that affect latency. Its three biggest selling points are:

  • 1. Low latency: the "pauseless" C4 GC keeps GC pauses below 10 ms and supports Java heaps of up to 1 TB.
  • 2. Fast warm-up after startup.
  • 3. Manageability: Zing Vision, a zero-overhead monitoring tool integrated into the JVM, can be left on at all times in production.

1.2 Java virtual machine execution process

When we execute a Java program, what is its execution flow? As shown below.

[Figure: Java program execution flow]

From the above figure, we can see that the Java virtual machine is not tied to the Java language. It is bound only to a specific binary format: the Class file.

2. Java virtual machine structure

The architecture mentioned here refers to the abstract behavior of the Java virtual machine, rather than specific implementations such as HotSpot VM. According to the Java virtual machine specification, the abstract Java virtual machine is shown in the figure below.

[Figure: Java virtual machine architecture]

2.1 Class file format

A Java source file is compiled into a Class file, a binary format that does not depend on any specific hardware or operating system. Each Class file holds the definition of exactly one class or interface, but a class or interface does not have to be defined in a file; for example, it can be generated directly by a class loader.

The file structure of ClassFile is shown below.

ClassFile {
    u4 magic;                                      // Magic number, fixed at 0xCAFEBABE; identifies a Class file the Java virtual machine can process
    u2 minor_version;                              // Minor version number
    u2 major_version;                              // Major version number
    u2 constant_pool_count;                        // Constant pool counter
    cp_info constant_pool[constant_pool_count-1];  // Constant pool
    u2 access_flags;                               // Access flags at the class or interface level
    u2 this_class;                                 // Class index
    u2 super_class;                                // Parent class index
    u2 interfaces_count;                           // Interface counter
    u2 interfaces[interfaces_count];               // Interface table
    u2 fields_count;                               // Field counter
    field_info fields[fields_count];               // Field table
    u2 methods_count;                              // Method counter
    method_info methods[methods_count];            // Method table
    u2 attributes_count;                           // Attribute counter
    attribute_info attributes[attributes_count];   // Attribute table
}
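To make the first three fields of this structure concrete, here is a minimal sketch that reads magic, minor_version, and major_version from a stream. The class name ClassHeader and the hand-written 8-byte header in main are illustrative assumptions, not part of any real tool; the sketch relies only on the fact that Class files are big-endian, which is also what DataInputStream reads.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassHeader {
    /** Reads magic, minor_version and major_version from the start of a Class file stream. */
    public static int[] read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();            // u4 magic
        int minor = data.readUnsignedShort();  // u2 minor_version
        int major = data.readUnsignedShort();  // u2 major_version
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a Class file: bad magic number");
        }
        return new int[] { magic, minor, major };
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical header bytes: magic, minor version 0, major version 52 (Java 8)
        byte[] header = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52 };
        int[] h = read(new ByteArrayInputStream(header));
        System.out.printf("magic=0x%X minor=%d major=%d%n", h[0], h[1], h[2]);
    }
}
```

The same read method could be pointed at a FileInputStream for a real .class file; parsing the constant pool and the later tables needs considerably more code and is left out here.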

2.2 Class loader subsystem

The class loader subsystem finds Class files and loads them into the Java virtual machine through several class loaders. The Java virtual machine has two kinds of loader: system loaders and user-defined loaders. There are three system loaders:

  • Bootstrap Class Loader: implemented in C/C++ and used to load the system classes the Java virtual machine needs at run time, which live in the {JRE_HOME}/lib directory. The Java virtual machine starts up by having the bootstrap class loader create an initial class. Because this loader is implemented in platform-specific C/C++ code, Java code cannot access it directly, although we can query whether a given class was loaded by it. The bootstrap class loader does not inherit from java.lang.ClassLoader.
  • Extensions Class Loader: used to load Java extension classes, which are generally placed in the {JRE_HOME}/lib/ext/ directory and provide functionality beyond the system classes.
  • Application Class Loader: used to load user code; it is the entry point for user code. The application class loader treats the extensions class loader as its parent. When asked to load a class, it first delegates to the extensions class loader, which in turn first asks the bootstrap class loader. If a parent loads the class successfully, the resulting Class instance is returned directly; only if every parent fails does the application class loader try to load the class itself.
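The parent chain described above can be inspected from Java code. The following is a small sketch; the class name LoaderChain and the helper depth are made up for illustration. The loader names printed vary by JDK version (on JDK 9 and later the parent of the application class loader is called the platform class loader rather than the extensions class loader), so no specific names are assumed here.

```java
public class LoaderChain {
    /** Counts the loaders from the given one up to, but not including, the bootstrap loader. */
    static int depth(ClassLoader loader) {
        int n = 0;
        while (loader != null) {      // null stands for the bootstrap class loader
            n++;
            loader = loader.getParent();
        }
        return n;
    }

    public static void main(String[] args) {
        // Walk from the application class loader up through its parents.
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        while (loader != null) {
            System.out.println(loader);
            loader = loader.getParent();
        }
        // Classes loaded by the bootstrap class loader report null as their loader.
        System.out.println(String.class.getClassLoader()); // prints "null"
    }
}
```

That java.lang.String reports null is how Java code can tell that a class was loaded by the bootstrap class loader, even though the loader itself is not reachable as an object.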

A user-defined loader is a class loader implemented by extending the java.lang.ClassLoader class.
Besides loading Class files into the Java virtual machine, the class loader subsystem is also responsible for verifying the correctness of the loaded classes, allocating and initializing memory for class variables, and helping to resolve symbolic references. These actions are performed in the following order (resolution may be deferred until a symbolic reference is first used):

1. Loading: find and load the Class file.
2. Linking: verification, preparation, and resolution.

  • Verification: ensures the correctness of the loaded type.
  • Preparation: allocates memory for the static fields of the class and initializes them with default values.
  • Resolution: converts symbolic references in the runtime constant pool into direct references.

3. Initialization: sets the class variables to their proper initial values.
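The difference between preparation (default values) and initialization (declared values) can be observed directly. In this sketch, with the illustrative class name InitOrder, a static initializer reads another static field through a method before that field's own initializer has run, so it still sees the default value assigned during preparation.

```java
public class InitOrder {
    // Static initializers run in textual order during initialization.
    static int seenEarly = peek();  // runs first, before `value` has been assigned 42
    static int value = 42;

    // Reads `value`; when called from the first initializer it still holds the default 0.
    static int peek() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(seenEarly + " " + value); // prints "0 42"
    }
}
```

Note that `value` is deliberately not final: a static final int with a constant initializer would be treated as a compile-time constant and the effect would not be visible.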

2.3 Data type

The data types of the Java virtual machine are similar to those of the Java language and fall into two categories: primitive types and reference types. The Java virtual machine expects the compiler to do as much type checking as possible at compile time, so that the virtual machine has little type checking left to do at run time.

2.4 Runtime data area

Many people divide Java's memory simply into heap memory and stack memory. This division is too coarse; the actual layout of Java's memory areas is considerably more complex.
While executing a Java program, the Java virtual machine divides the memory it manages into several data areas. According to the "Java Virtual Machine Specification (Java SE 7 Edition)", these data areas are the program counter, the Java virtual machine stack, the native method stack, the Java heap, and the method area. They are introduced one by one below.

2.4.1 Program Counter

To keep a program executing continuously, the processor needs some way to determine the address of the next instruction, and the program counter plays this role.
The Program Counter Register, also called the PC register, is a small area of memory. In the conceptual model of the virtual machine, the bytecode interpreter selects the next bytecode instruction to execute by changing the value of the program counter. Multithreading in the Java virtual machine is implemented by switching between threads and allotting each a share of processor time, so at any given moment a processor executes the instructions of only one thread. To resume at the correct position after a thread switch, each thread has its own independent program counter; the program counter is therefore thread-private. If the thread is executing a method that is not a Native method, the program counter holds the address of the bytecode instruction being executed; if it is executing a Native method, the value of the program counter is undefined. The program counter is the only data area for which the Java virtual machine specification does not define any OutOfMemoryError condition.

2.4.2 Java virtual machine stack

Each Java virtual machine thread has its own thread-private Java virtual machine stack (Java Virtual Machine Stack), which is created together with the thread and shares its life cycle. The Java virtual machine stack holds the state of the Java method calls made by the thread, including local variables, parameters, return values, and intermediate results of operations. A Java virtual machine stack consists of multiple stack frames, and each stack frame stores information such as the local variable table, the operand stack, the dynamic link, and the method exit. When a thread calls a Java method, the virtual machine pushes a new stack frame onto the thread's Java stack; when the method finishes, that stack frame is popped. What is usually called "stack memory" (Stack) refers to the Java virtual machine stack.

By the time the program code has been compiled, the size of the local variable table and the maximum depth of the operand stack needed by a stack frame are fully determined and written into the Code attribute of the method table. The amount of memory a stack frame needs is therefore not affected by variable data at run time; it depends only on the specific virtual machine implementation.


Two exceptions are defined in the Java Virtual Machine Specification:

  • If the stack depth requested by a thread exceeds the maximum depth allowed by the Java virtual machine, the Java virtual machine throws a StackOverflowError.
  • If the Java virtual machine stack can be expanded dynamically (the specification allows both fixed-size and dynamically expandable stacks) but not enough memory can be obtained during expansion, or there is not enough memory to create the Java virtual machine stack for a new thread, the Java virtual machine throws an OutOfMemoryError.

The two situations overlap: when stack space cannot be allocated, "the memory is too small" and "the used stack space is too large" are essentially two descriptions of the same thing. With a single thread, whether the stack frames are too large or the virtual machine stack is too small, the virtual machine throws StackOverflowError rather than OutOfMemoryError when stack space cannot be allocated. With many threads, OutOfMemoryError can be thrown instead, when creating the stacks for new threads exhausts the available memory.
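The single-threaded case is easy to reproduce. The sketch below, with the illustrative class name StackDepth, recurses without a termination condition until the virtual machine throws StackOverflowError, then reports how many frames were pushed. The depth reached depends on the virtual machine, the frame size, and the configured stack size (on HotSpot, the -Xss option), so no particular number should be expected.

```java
public class StackDepth {
    private static int depth = 0;

    private static void recurse() {
        depth++;
        recurse(); // each call pushes one more stack frame
    }

    /** Recurses until the stack is exhausted and returns the depth reached. */
    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Expected: the thread's stack cannot hold another frame.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames pushed before StackOverflowError: " + measure());
    }
}
```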

 

The function and data structure of each part of the information stored in the stack frame are described in detail below. 

   1. Local variable table

   The local variable table is a set of storage slots for variable values, used to hold the method parameters and the local variables defined inside the method. The data stored in it are the primitive types, object references (reference), and returnAddress values (which point to the address of a bytecode instruction) known at compile time. The memory required by the local variable table is determined during compilation: when the Java program is compiled into a Class file, the maximum capacity of the local variable table is fixed. On entry to a method, the amount of local variable space it needs in its frame is completely determined, and the size of the local variable table does not change while the method runs.

    The capacity of the local variable table is measured in variable slots (Slot), its smallest unit. The virtual machine specification does not state exactly how much memory a Slot occupies (this may vary with the processor, operating system, or virtual machine). A Slot can hold one value of a type no wider than 32 bits: boolean, byte, char, short, int, float, reference, or returnAddress. reference is the type of an object reference; returnAddress serves the jsr, jsr_w, and ret bytecode instructions and points to the address of a bytecode instruction. For the 64-bit types (long and double), the virtual machine allocates two consecutive Slots in a high-order-aligned manner.

    The virtual machine accesses the local variable table by index. Index values range from 0 to the maximum number of Slots in the local variable table minus one. For a variable of a 32-bit type, index n refers to the nth Slot; for a 64-bit variable, index n refers to the nth and (n+1)th Slots together.

    When a method is executed, the virtual machine uses the local variable table to pass the argument values to the parameter variables. For an instance method (non-static), the Slot at index 0 holds by default a reference to the object instance the method belongs to; inside the method this implicit parameter is accessed through the keyword "this". The remaining parameters follow in the order of the parameter list, occupying Slots from index 1 onward. Once the parameters have been allocated, the remaining Slots are assigned to the variables defined in the method body according to their order and scope.

    Slots in the local variable table are reusable. The scope of a variable defined in the method body does not necessarily cover the whole method body; once the bytecode PC counter has moved past the scope of a variable, the Slot of that variable can be handed to other variables. This design does more than save space: in some cases Slot reuse directly affects the garbage collection behavior of the system, because an object still referenced from a Slot that has not been overwritten cannot be collected.
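    As an illustration of these rules, the comments in the sketch below annotate which Slots a compiler would typically assign to each variable of an instance method. The class name SlotLayout is made up, and the slot numbers are what the rules above predict rather than guaranteed output; the actual assignment for a compiled class can be checked in the LocalVariableTable printed by javap -v (for classes compiled with debug information).

```java
public class SlotLayout {
    // Instance method, so by the rules above a compiler would typically assign:
    //   slot 0     -> this
    //   slot 1     -> a      (int, one Slot)
    //   slots 2-3  -> b      (long, two consecutive Slots)
    //   slots 4-5  -> total  (double, two consecutive Slots)
    double sum(int a, long b) {
        double total = a + b;
        {
            int scratch = 1;   // would typically take slot 6
            total += scratch;
        }
        int other = 2;         // scratch is out of scope, so its Slot may be reused here
        return total + other;
    }

    public static void main(String[] args) {
        System.out.println(new SlotLayout().sum(1, 2L)); // prints "6.0"
    }
}
```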

    2. Operand stack

    The operand stack is often called the operation stack, and its maximum depth is also determined at compile time. A 32-bit value occupies 1 unit of stack capacity and a 64-bit value occupies 2. When a method starts executing, its operand stack is empty. During execution, various bytecode instructions (for example arithmetic and assignment instructions) write values to and read values from the operand stack, that is, they perform push and pop operations.

    The interpreting execution engine of the Java virtual machine is called a "stack-based execution engine", and the "stack" in question is the operand stack. This is why the Java virtual machine is said to be stack-based, in contrast to Android's virtual machine, which is register-based.

    The main advantage of a stack-based instruction set is portability; its main disadvantage is relatively slow execution. Because registers are provided directly by the hardware, the main advantage of a register-based instruction set is fast execution, and its disadvantage is poor portability.
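    To show what "stack-based" means in practice, here is a toy sketch that evaluates an integer addition the way a stack-based engine would: push both operands, pop them, push the result. The class OperandStackDemo is a simulation written for this article using java.util.ArrayDeque; it is not how any real virtual machine is implemented, and the instruction names in the comments are only loose analogies.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OperandStackDemo {
    /** Evaluates a + b with explicit push and pop operations on a simulated operand stack. */
    static int add(int a, int b) {
        Deque<Integer> operandStack = new ArrayDeque<>(); // empty when the method starts
        operandStack.push(a);                 // roughly what iload or iconst does
        operandStack.push(b);
        int right = operandStack.pop();       // iadd pops two ints...
        int left = operandStack.pop();
        operandStack.push(left + right);      // ...and pushes their sum
        return operandStack.pop();            // roughly what ireturn does
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2)); // prints "3"
    }
}
```

    A register-based engine would instead name its operands explicitly (for example "add r1, r2 into r0"), which needs fewer instructions but ties the instruction set more closely to a register model.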

    3. Dynamic linking

    Each stack frame contains a reference, into the runtime constant pool (in the method area, described later), to the method the frame belongs to. This reference is held to support dynamic linking during method invocation. The constant pool of a Class file contains a large number of symbolic references, and the method invocation instructions in the bytecode take a symbolic reference to a method in the constant pool as their operand. Some of these symbolic references are converted into direct references during class loading or on first use (for example, references to static or final members); this conversion is called static resolution. The rest are converted into direct references each time they are used at run time; this part is called dynamic linking.

    4. Method return address

    Once a method has started executing, there are two ways to leave it: the execution engine encounters one of the method return bytecode instructions, or an exception occurs that is not handled in the method body. Whichever way the method exits, execution must return to the place where the method was called before the program can continue. On return, some information may need to be kept in the stack frame to help restore the execution state of the calling method. Generally speaking, when a method exits normally, the value of the caller's PC counter can serve as the return address, and this counter value is likely to be saved in the stack frame. When a method exits abnormally, the return address is determined by the exception handler, and this information is generally not saved in the stack frame.

    Exiting a method is in effect the same as popping the current stack frame, so the operations that may be performed on exit are: restoring the local variable table and operand stack of the calling method, pushing the return value (if any) onto the operand stack of the caller's stack frame, and adjusting the PC counter to point to the instruction after the method invocation instruction.

 

2.4.3 Native method stack

A Java virtual machine implementation may need conventional stacks, commonly called C stacks, to support native methods. These C stacks are the Native Method Stack. It is similar to the Java virtual machine stack, except that it serves Native methods. If a Java virtual machine does not support Native methods and does not itself rely on C stacks, it does not need to provide native method stacks. The Java virtual machine specification places no mandatory requirements on the language or data structures used for the native method stack, so a Java virtual machine is free to implement it as it sees fit. For example, HotSpot VM combines the native method stack and the Java virtual machine stack into one.
Like the Java virtual machine stack, the native method stack can also throw StackOverflowError and OutOfMemoryError.

2.4.4 Java Heap

The Java heap (Java Heap) is a runtime memory area shared by all threads. It is used to store object instances, and almost all object instances are allocated here. Objects stored in the Java heap are managed by the garbage collector; they do not need to be, and cannot be, destroyed explicitly. From the perspective of memory reclamation, the Java heap can be roughly divided into the new generation and the old generation. From the perspective of memory allocation, the Java heap may be divided into multiple thread-private allocation buffers. However it is divided, what the Java heap stores does not change; the divisions exist only to reclaim or allocate memory faster.
The capacity of the Java heap can be fixed or dynamically expandable. The memory used by the Java heap does not need to be physically contiguous, only logically contiguous.
One exception is defined in the Java Virtual Machine Specification:

  • If there is not enough memory in the heap to complete the instance allocation, and the heap cannot be expanded, an OutOfMemoryError exception will be thrown.
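The current and maximum size of the heap can be queried through java.lang.Runtime. The sketch below, with the illustrative class name HeapInfo, prints the three figures that API exposes; the values depend entirely on the machine and on the heap settings the virtual machine was started with, so none are shown here.

```java
public class HeapInfo {
    /** Returns {maxMemory, totalMemory, freeMemory} in bytes, as reported by Runtime. */
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] s = snapshot();
        System.out.println("max heap the JVM will try to use (MB): " + s[0] / mb);
        System.out.println("heap currently reserved (MB):          " + s[1] / mb);
        System.out.println("free space within reserved heap (MB):  " + s[2] / mb);
    }
}
```

When the reserved size is below the maximum, the heap can still grow; once it cannot grow further and an allocation does not fit, the OutOfMemoryError described above is thrown.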

 

The Java heap can be divided into two areas: the new generation and the old generation. The new generation is further divided into an Eden area and two Survivor areas, named From and To to tell them apart. By default the ratio of the new generation to the old generation is 1:2, so the new generation occupies 1/3 of the heap and the old generation 2/3, but this ratio can be modified. The two generations are introduced in turn below.

1. [New generation]

The new generation is divided into three areas, one Eden area and two Survivor areas, in the ratio 8:1:1. This ratio can also be modified. Normally objects are allocated in the Eden area of the new generation; in a few cases they may be allocated directly in the old generation. At any time the Java virtual machine uses Eden and one of the Survivor areas (From). During a Minor GC, the objects still alive in Eden and From are copied in one pass to the other Survivor area (To), using the copying algorithm, and then Eden and the From area just used are cleared. Objects that have survived into the Survivor area at this point have their age set to 1. Each further GC they survive in a Survivor area increases their age by 1, and when the age reaches a threshold (15 by default) they are moved to the old generation.

During a GC of the new generation, the other Survivor area may not have enough room for all the objects that survived the collection. Such objects enter the old generation directly through the allocation guarantee mechanism.

To sum up:

1. Minor GC is the garbage collection that takes place in the new generation, and it uses the copying algorithm.

2. At most 90% of the new generation (Eden plus one Survivor area) is in use at any time; it mainly holds newly created objects.

3. After each Minor GC, the Eden area and one Survivor area are emptied.
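The ratios above translate into sizes with simple arithmetic. The sketch below computes them for a given heap size; the class name GenerationSizes and the method split are illustrative. The parameters mirror the meaning of the HotSpot options -XX:NewRatio (old generation size divided by new generation size) and -XX:SurvivorRatio (Eden size divided by one Survivor size), but a real virtual machine also applies alignment and adaptive sizing, so treat the result as an approximation.

```java
public class GenerationSizes {
    /** Returns {young, old, eden, survivor} in the same unit as heap. */
    static long[] split(long heap, int newRatio, int survivorRatio) {
        long young = heap / (newRatio + 1);           // old : young = newRatio : 1
        long old = heap - young;
        long survivor = young / (survivorRatio + 2);  // eden : survivor = survivorRatio : 1, two Survivors
        long eden = young - 2 * survivor;
        return new long[] { young, old, eden, survivor };
    }

    public static void main(String[] args) {
        // A 300 MB heap with the 1:2 and 8:1:1 ratios described in the text
        long[] s = split(300, 2, 8);
        System.out.println("young=" + s[0] + " old=" + s[1] + " eden=" + s[2] + " survivor=" + s[3]);
        // prints "young=100 old=200 eden=80 survivor=10"
    }
}
```

With these numbers, Eden plus one Survivor area is 90 MB of the 100 MB new generation, which is where the 90% figure in the summary comes from.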

 

2. [Old generation]

The old generation stores objects with long life cycles. Larger objects (those that need a large contiguous block of memory) are stored directly in the old generation, as are objects promoted from the Survivor areas of the new generation.

The old generation is collected by Full GC, which uses a mark-sweep algorithm. Full GC runs less often than Minor GC, and a Full GC takes longer than a Minor GC.

To sum up:

1. The old generation is collected by Full GC, which uses the mark-sweep algorithm.

 

2.4.5 Method area

The method area (Method Area) is a runtime memory area shared by all threads. It stores the structural information of the classes loaded by the Java virtual machine, including the runtime constant pool, field and method information, static variables, and other data. The method area is logically part of the Java heap. It does not need to be physically contiguous, and an implementation may choose not to perform garbage collection in it. The method area is not the same thing as the permanent generation; the two are often equated only because HotSpot VM used the permanent generation to implement the method area (up to JDK 7; JDK 8 replaced it with Metaspace). Other Java virtual machines, such as J9 and JRockit, have no concept of a permanent generation.
One exception is defined in the Java Virtual Machine Specification:

  • If the memory space of the method area does not meet the memory allocation requirements, the Java virtual machine throws an OutOfMemoryError exception.

Runtime constant pool
The runtime constant pool (Runtime Constant Pool) is part of the method area. In section 2.1 Class file format, we saw that a Class file contains not only information such as the version, interfaces, fields, and methods of the class, but also the constant pool, which stores the literals and symbolic references generated at compile time. After the class is loaded, these contents are stored in the runtime constant pool in the method area. The runtime constant pool can be understood as the runtime representation of the constant pool of a class or interface.
One exception is defined in the Java virtual machine specification:
When a class or interface is created, if the memory required to construct the runtime constant pool exceeds what the method area can provide, the Java virtual machine throws an OutOfMemoryError.

 

Direct Memory

 

Direct memory is not part of the runtime data areas of the virtual machine, nor is it a memory area defined in the Java virtual machine specification. It is allocated directly from the operating system, so it is not limited by the Java heap size, but it is still limited by the total memory of the machine and the address space of the processor, so it too can cause an OutOfMemoryError. The NIO mechanism introduced in JDK 1.4 is a new I/O approach based on channels and buffers. It can allocate direct memory from the operating system, that is, memory outside the heap, which can improve performance in some scenarios because it avoids copying data back and forth between the Java heap and the native heap. For detailed use of NIO, please refer to the related articles about NIO in my Java network programming series.
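As a minimal illustration of off-heap allocation through NIO, the sketch below contrasts a heap buffer with a direct buffer using java.nio.ByteBuffer. The class name DirectBufferDemo and the helper roundTrip are made up for this example; it only shows allocation and basic access, not the channel I/O where direct buffers pay off.

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    /** Writes a value into a freshly allocated direct buffer and reads it back. */
    static int roundTrip(int value) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // memory outside the Java heap
        direct.putInt(0, value);
        return direct.getInt(0);
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(1024);        // backed by a byte[] on the Java heap
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024); // backed by direct memory
        System.out.println(onHeap.isDirect());   // prints "false"
        System.out.println(offHeap.isDirect());  // prints "true"
        System.out.println(roundTrip(42));       // prints "42"
    }
}
```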

Origin blog.csdn.net/xfb1989/article/details/110047271