Summary of JVM core knowledge

Introduction to JVM

The Java Virtual Machine (JVM) is a virtual machine that executes Java bytecode. It hides the details of the underlying operating system and hardware, so a Java program only needs to be compiled once into bytecode that targets the JVM and can then run unmodified on any platform that has a JVM implementation. The JVM itself is software executed by the CPU, and it runs compiled Java code, whether an applet or a standalone application.

The following is an architectural diagram of the JVM:

architecture.jpeg

The JVM is divided into three main subsystems:

  • Class Loader (ClassLoader)
  • Runtime data area (memory area)
  • Execution Engine

Class Loader (ClassLoader):

The Java class loader is the part of the JVM responsible for loading class files into memory and generating the corresponding Java class objects. It loads class files dynamically while the application runs, parses their bytecode, and turns it into executable Java classes. Class loading has three main stages:

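  • Loading: The stage in which class files are located and read in. Three built-in class loaders take part:

    • Bootstrap class loader
    • Extension class loader (called the platform class loader since JDK 9)
    • Application class loader

    These class loaders follow the parent delegation model: a loader first delegates a request to its parent and only tries to load the class itself if the parent cannot find it.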

  • Linking: The linking process associates loaded classes with other classes in the JVM. Linking is divided into three phases:

    • Verification: Verify that the bytecode of the class complies with the Java Virtual Machine Specification and ensure the security of the class.
    • Preparation: Allocate memory space for static variables of the class and initialize default values.
    • Resolution: Convert symbolic references to direct references, that is, resolve references to classes, methods, and fields into actual memory addresses.
  • Initialization: The initialization stage performs the actual initialization of the class by executing its static initializer blocks and static variable assignments. The compiler gathers these into the class initialization method <clinit>, which the JVM runs in this stage; instance constructors are not called here.
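
The loader hierarchy and the lazy timing of the initialization stage can be observed from ordinary Java code. The following is a minimal sketch, not a definitive test of any particular JVM: the class names ClassLoadingDemo and Lazy and the helper methods are hypothetical, and the loader class names printed depend on the JDK version.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassLoadingDemo {
    // Records the order of events so that class initialization can be observed.
    static final List<String> events = new ArrayList<>();

    // Hypothetical helper class: its static initializer runs only on first active use.
    static class Lazy {
        static int value = 42;              // assigned during the initialization stage
        static {
            events.add("Lazy initialized");
        }
    }

    // Walks from a class's defining loader up through its parents.
    // A null loader is how the API represents the bootstrap class loader.
    static List<String> loaderChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = type.getClassLoader(); cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap");
        return chain;
    }

    // Reads Lazy.value between two markers to show that initialization is lazy.
    static List<String> initOrder() {
        events.add("before first use");
        int v = Lazy.value;                 // first active use triggers static initialization
        events.add("after first use, value=" + v);
        return events;
    }

    public static void main(String[] args) {
        System.out.println("String:           " + loaderChain(String.class));
        System.out.println("ClassLoadingDemo: " + loaderChain(ClassLoadingDemo.class));
        System.out.println(initOrder());
    }
}
```

Core classes such as String report only the bootstrap loader, while an application class reports a chain that ends at the bootstrap loader. The "Lazy initialized" event appears after "before first use", which shows that the static initializer runs on first use rather than at program start.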

Runtime Data Area

The runtime data area is the set of memory regions the JVM uses to store and manage data while a Java application runs. It is divided into five parts: the program counter (Program Counter Register), the virtual machine stack (VM Stack), the native method stack (Native Method Stack), the heap (Heap), and the method area (Method Area). The JVM allocates and releases memory in these areas automatically while the program runs, and each area has its own memory management mechanism to keep the program efficient and stable.

The program counter, virtual machine stack, and native method stack are thread-private: they are created when a thread starts and destroyed when it ends. The heap and the method area are shared by all threads and exist for the lifetime of the virtual machine process.

jvm runtime memory.jpg

The program counter (Program Counter Register) is a small memory space that holds the address of the JVM bytecode instruction the current thread is executing.

The virtual machine stack (JVM Stack) stores primitive local variables (such as int and long) and object references. Java method execution is modeled with stack frames: each method invocation creates a stack frame (Stack Frame) that holds the local variable table, operand stack, dynamic link, method return address, and related information.

The native method stack (Native Method Stack) is similar to the virtual machine stack, but it serves the native methods used by the JVM.

The heap (Heap) is the largest memory space managed by the JVM. It stores Java object instances and is shared by all threads. It is also the main area the garbage collector works on.

The method area (Method Area) stores loaded class information, constants, static variables, and other such data. It is shared by all threads.
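
The split between thread-private and thread-shared areas can be illustrated with a small sketch. The class RuntimeAreasDemo and its methods are hypothetical names chosen for this example; the comments describe where each piece of data conceptually lives, not a guarantee about any specific JVM's layout.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RuntimeAreasDemo {
    // Static field: belongs to the class; the AtomicInteger it refers to lives on the shared heap.
    static final AtomicInteger shared = new AtomicInteger();

    // Each call gets its own stack frame, so 'local' is private to the calling thread.
    static int work(int iterations) {
        int local = 0;
        for (int i = 0; i < iterations; i++) {
            local++;                    // touches only this thread's frame
            shared.incrementAndGet();   // touches a heap object visible to all threads
        }
        return local;
    }

    // Runs work() on two threads and returns the final value of the shared counter.
    static int runTwoThreads(int iterations) {
        shared.set(0);
        Thread t1 = new Thread(() -> work(iterations));
        Thread t2 = new Thread(() -> work(iterations));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println("shared counter = " + runTwoThreads(1000));
    }
}
```

Each thread counts to 1000 in its own local variable without interference, while the shared counter ends at 2000 because both threads update the same heap object.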

Execution Engine

The execution engine executes the bytecode from ".class" files. It reads the bytecode instruction by instruction and executes it using the data in the runtime memory areas. The execution engine can be divided into three parts:

  • Interpreter (Interpreter): Interprets and executes the bytecode one instruction at a time. Its disadvantage is that a method called many times is interpreted again on every call.

  • Just-in-time compiler (Just-In-Time Compiler, JIT): Makes up for the interpreter's inefficiency. It compiles frequently executed ("hot") bytecode into native code, so repeated calls to the same method run the compiled native code directly instead of being interpreted again.

  • Garbage collector (Garbage Collector): Reclaims objects that are no longer referenced.
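
Whether a JIT compiler is present, and how much time it has spent compiling, can be queried through the standard java.lang.management API. The sketch below assumes a JVM that exposes the compilation MXBean; the class name JitDemo and its methods are hypothetical.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitDemo {
    // A small method that becomes "hot" when it is called many times.
    static long sumTo(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Names the JIT compiler, or reports that the JVM has none.
    static String jitName() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        return jit == null ? "none (interpreter only)" : jit.getName();
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 20_000; i++) {
            total += sumTo(1_000);      // repeated calls make sumTo a compilation candidate
        }
        System.out.println("result = " + total);
        System.out.println("JIT compiler: " + jitName());

        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println("time spent compiling (ms): " + jit.getTotalCompilationTime());
        }
    }
}
```

On HotSpot, running the same program with -Xint forces interpreter-only execution, and -XX:+PrintCompilation logs each method as it is compiled, which makes the difference between the two modes easy to compare.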

Java Native Interface (Java Native Interface, JNI):

The interface through which the JVM interacts with native method libraries. It lets Java code call, and be called by, native libraries written in C or C++, which may be hardware-specific.

Native method libraries (Native Method Libraries):

The collection of native libraries (C, C++) required by the execution engine.

Detailed Explanation of Runtime Data Area

Program counter

The program counter (Program Counter Register) is a small memory space that can be seen as a line-number indicator for the bytecode being executed by the current thread. The bytecode interpreter selects the next instruction to execute by changing the value of this counter, and basic functions such as branching, looping, jumping, exception handling, and thread resumption all rely on it. Because the JVM implements multithreading by switching between threads and allocating processor time to them in turn, at any given moment a processor (or one core of a multi-core processor) executes instructions from only one thread. To resume at the correct position after a thread switch, each thread needs its own program counter. The counters of different threads are stored independently and do not affect each other; memory of this kind is called "thread-private" memory.

If the thread is executing a Java method, the counter records the address of the bytecode instruction being executed; if the thread is executing a native method, the counter value is undefined. This is the only memory area for which the Java Virtual Machine Specification does not specify any OutOfMemoryError condition.

Java virtual machine stack

Like the program counter, the Java virtual machine stack (Java Virtual Machine Stacks) is thread-private, and its life cycle is the same as the thread's. The virtual machine stack describes the memory model of Java method execution: each method invocation creates a stack frame (Stack Frame) that stores the local variable table, operand stack, dynamic link, method return address, and other information. The span from a method's invocation to its completion corresponds to one stack frame being pushed onto the virtual machine stack and later popped off it.

Virtual machine stack(1).jpg

The local variable table stores the primitive data types (boolean, byte, char, short, int, float, long, double), object references (the reference type, which is not the same as the object itself: it may be a pointer to the object's starting address, or it may point to a handle representing the object or to some other location associated with it), and the returnAddress type (which points to the address of a bytecode instruction).

The operand stack is a temporary storage area used when executing bytecode instructions. For example, when performing arithmetic operations, the operand stack is used to store operands and receive results.
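
The interplay of local variable table, operand stack, and stack frames can be seen in a very small example. In the sketch below, the bytecode in the comment is what javac typically emits for this method (it can be checked with javap -c); the class StackFrameDemo and its methods are hypothetical names for illustration.

```java
public class StackFrameDemo {
    // javac typically compiles this body to four instructions:
    //   iload_0   push local variable 0 (a) onto the operand stack
    //   iload_1   push local variable 1 (b) onto the operand stack
    //   iadd      pop both values and push a + b
    //   ireturn   pop the result and return it to the caller's frame
    static int add(int a, int b) {
        return a + b;
    }

    static int depth;

    // Each call pushes a new frame; with no base case the stack eventually runs out.
    static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many calls were made before StackOverflowError was thrown.
    static int maxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Frames are popped as the error propagates back up to this handler.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("add(2, 3) = " + add(2, 3));
        System.out.println("frames before overflow: " + maxDepth());
    }
}
```

The depth reached before the overflow varies between runs and machines; on HotSpot it grows or shrinks with the thread stack size set by -Xss.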

Native method stack

The native method stack (Native Method Stack) plays a role very similar to that of the virtual machine stack. The difference is that the virtual machine stack serves the Java methods (that is, bytecode) executed by the virtual machine, while the native method stack serves the native methods it uses. Native methods exist for several reasons:

  • Interacting outside of the Java environment: Sometimes a Java application needs to interact with an environment outside of Java, which is the main reason native methods exist.
  • Interacting with the operating system: The JVM supports the Java language and its runtime library, but it still relies on the underlying system, and parts of the JVM itself are written in C. Native methods let Java code interact with the underlying system on which the JRE is implemented.
  • Sun's Java: Sun's interpreter is implemented in C, so it can interact with the outside world like any ordinary C program. Most of the JRE is implemented in Java, and it also reaches the outside world through native methods. For example, the setPriority() method of the class java.lang.Thread is implemented in Java, but it calls the class's native method setPriority0(), which is implemented in C and built into the JVM.

The virtual machine specification does not mandate the language, usage, or data structures of the native method stack, so a virtual machine is free to implement it as it sees fit. Some virtual machines (such as the Sun HotSpot virtual machine) even combine the native method stack and the virtual machine stack into one. Like the virtual machine stack, the native method stack can throw StackOverflowError and OutOfMemoryError.
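
Which methods of a class are implemented natively can be checked with reflection. The following sketch assumes an OpenJDK-style class library, where System.arraycopy is declared native; the class NativeMethodDemo and its helpers are hypothetical names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeMethodDemo {
    // True if the named method is declared with the 'native' modifier.
    static boolean isNative(Class<?> type, String name, Class<?>... params) {
        try {
            Method m = type.getDeclaredMethod(name, params);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Counts the native methods a class declares.
    static long countNative(Class<?> type) {
        long n = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("System.arraycopy native? " + isNative(System.class, "arraycopy",
                Object.class, int.class, Object.class, int.class, int.class));
        System.out.println("String.length native?    " + isNative(String.class, "length"));
        System.out.println("native methods declared by Thread: " + countNative(Thread.class));
    }
}
```

The count reported for Thread differs between JDK versions, so it is printed rather than compared against a fixed number.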

Java heap

For most applications, the Java heap (Java Heap) is the largest piece of memory managed by the Java virtual machine. It is shared by all threads and created when the virtual machine starts. Its sole purpose is to store object instances, and almost all object instances are allocated here. The Java Virtual Machine Specification describes it this way: all object instances and arrays must be allocated on the heap. However, as JIT compilers have developed and escape analysis has matured, optimizations such as stack allocation and scalar replacement have introduced some subtle changes, and the rule that every object is allocated on the heap has gradually become less absolute.

The Java heap is the main area managed by the garbage collector, so it is often called the "GC heap" (Garbage Collected Heap). To collect garbage efficiently, the virtual machine logically divides memory into three areas (the only reason for generations is to optimize GC performance):

  • Young generation: new objects and objects that have not yet reached a certain age live here.
  • Old generation (tenured area): holds objects that have been in use for a long time. The old generation is usually given more memory than the young generation.
  • Metaspace (called the permanent generation before JDK 1.8): holds class metadata rather than ordinary object instances. Before JDK 1.8 the permanent generation occupied JVM-managed memory; since JDK 1.8, Metaspace uses native (physical) memory directly.

JVM memory model.png

Young Generation

The young generation is where all new objects are created. When it fills up, garbage collection is performed; this collection is called Minor GC. The young generation is divided into three parts: Eden memory and two survivor areas (Survivor memory, called from/to or S0/S1), with a default ratio of 8:1:1.
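
The 8:1:1 ratio translates into concrete sizes with simple arithmetic. The sketch below assumes the ratio means that Eden is 8 times the size of one survivor space, which is how HotSpot's -XX:SurvivorRatio=8 is defined; the class YoungGenSizes and its split method are hypothetical, and a real JVM may round the sizes for alignment.

```java
public class YoungGenSizes {
    // Splits a young generation of youngBytes into {eden, survivor0, survivor1}
    // when Eden is 'ratio' times the size of one survivor space.
    static long[] split(long youngBytes, int ratio) {
        long survivor = youngBytes / (ratio + 2);   // 'ratio' parts Eden plus 2 survivor parts
        long eden = youngBytes - 2 * survivor;
        return new long[] { eden, survivor, survivor };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] parts = split(100 * mb, 8);
        System.out.println("Eden      = " + parts[0] / mb + " MB");
        System.out.println("Survivor0 = " + parts[1] / mb + " MB");
        System.out.println("Survivor1 = " + parts[2] / mb + " MB");
    }
}
```

For a 100 MB young generation this gives 80 MB for Eden and 10 MB for each survivor space. Only one survivor space holds objects at a time, so about 90% of the young generation is usable for allocation.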

Old Generation

The old generation contains objects that have survived many rounds of Minor GC. Garbage collection usually runs here when the old generation is full. Old generation collection is called Major GC and usually takes longer.

Large objects (objects that need a large amount of contiguous memory) go directly into the old generation. This avoids copying large amounts of memory between the Eden area and the two Survivor areas.

Method area

Both the permanent generation before JDK 8 and the Metaspace from JDK 8 onward can be regarded as implementations of the method area defined in the Java Virtual Machine Specification.

Although the specification describes the method area as a logical part of the heap, it has the alias Non-Heap precisely to distinguish it from the Java heap.
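
The heap and non-heap areas of a running JVM can be listed through the standard java.lang.management API. This is a minimal sketch: the class MemoryPoolsDemo and its poolNames method are hypothetical, and the pool names printed depend on the JVM and the garbage collector in use (for example, names containing "Eden Space", "Survivor Space", "Old Gen", or "Metaspace" on HotSpot).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class MemoryPoolsDemo {
    // Collects the names of all memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```

On a JDK 8 or later HotSpot JVM, Metaspace is expected to show up among the non-heap pools, which matches the Non-Heap alias of the method area.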

Runtime constant pool

The runtime constant pool (Runtime Constant Pool) is part of the method area. Besides the class version, fields, methods, interfaces, and other descriptive information, a Class file also contains a constant pool table (Constant Pool Table), which stores the literals and symbolic references generated at compile time. After the class is loaded, this content is stored in the runtime constant pool in the method area.

The runtime constant pool is shared by all threads and is safe to read concurrently. Each thread has its own stack, whose frames hold local variable tables and operand stacks, while the objects they reference live on the heap. When a thread needs a constant from the runtime constant pool, the JVM loads the constant value (or a reference to it) onto the current frame's operand stack, from where it can be used or stored in a local variable.
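
String literals are the most visible kind of constant that comes from the constant pool. The sketch below relies on the Java Language Specification rule that identical string literals refer to the same interned String instance; the class ConstantPoolDemo and its methods are hypothetical names.

```java
public class ConstantPoolDemo {
    // Two identical literals resolve to the same interned String object.
    static boolean literalsShareOneObject() {
        String a = "jvm";
        String b = "jvm";
        return a == b;
    }

    // 'new String' always creates a separate object on the heap.
    static boolean newStringIsDistinct() {
        String literal = "jvm";
        String copy = new String("jvm");
        return copy != literal;
    }

    // intern() maps any equal string back to the pooled instance.
    static boolean internReturnsPooled() {
        String copy = new String("jvm");
        return copy.intern() == "jvm";
    }

    public static void main(String[] args) {
        System.out.println("literals share one object: " + literalsShareOneObject());
        System.out.println("new String is distinct:    " + newStringIsDistinct());
        System.out.println("intern returns pooled:     " + internReturnsPooled());
    }
}
```

All three checks are true. Reference comparison with == is used here only to expose object identity; application code should compare strings with equals().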

Why remove the PermGen

The HotSpot team had both external and internal reasons for removing the permanent generation. For the external reasons, see the Motivation section of JEP 122:

This is part of the JRockit and Hotspot convergence effort. JRockit customers do not need to configure the permanent generation (since JRockit does not have a permanent generation) and are accustomed to not configuring the permanent generation.

In short, removing the permanent generation was part of the effort to merge HotSpot with JRockit, whose users never had a permanent generation to configure.

As for the internal reasons, the size of the permanent generation was bounded by the two parameters -XX:PermSize and -XX:MaxPermSize, which in turn were constrained by the memory configured for the JVM, so the permanent generation could run out of memory during use. For this reason the permanent generation was removed completely in Java 8 and replaced by Metaspace.

Direct memory

Direct memory (Direct Memory) is not part of the virtual machine's runtime data area, nor is it a memory area defined in the Java Virtual Machine Specification. It is used frequently, however, and it can also cause an OutOfMemoryError.

NIO introduced a channel-based, buffered I/O model. It can allocate memory outside the Java heap through native methods and then operate on that memory directly through a DirectByteBuffer object stored in the heap, which acts as a reference to it. Data in the external memory no longer has to be copied into the heap first, which makes data handling more efficient.

The size of direct memory is not bounded by the Java heap size (HotSpot can cap it with -XX:MaxDirectMemorySize), but it is still memory, so an OutOfMemoryError is thrown when it runs out.
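
From application code, direct memory is normally reached through java.nio.ByteBuffer. The following sketch contrasts a heap buffer with a direct buffer; the class DirectMemoryDemo and its roundTrip helper are hypothetical names.

```java
import java.nio.ByteBuffer;

public class DirectMemoryDemo {
    // Writes a long into the buffer and reads it back.
    static long roundTrip(ByteBuffer buf, long value) {
        buf.clear();            // reset position and limit
        buf.putLong(value);     // write 8 bytes
        buf.flip();             // switch from writing to reading
        return buf.getLong();
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(64);          // backed by a byte[] on the Java heap
        ByteBuffer direct = ByteBuffer.allocateDirect(64);  // backed by memory outside the heap

        System.out.println("heap buffer direct?   " + heap.isDirect());
        System.out.println("direct buffer direct? " + direct.isDirect());
        System.out.println("round trip via direct buffer: " + roundTrip(direct, 42L));
    }
}
```

Both buffers expose the same API, so code written against ByteBuffer does not need to know which kind it was given. Only the small buffer object itself lives on the heap in the direct case.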

The relationship between stack, heap and method area

epub_603120_75.jpeg

Escape analysis

Escape analysis (Escape Analysis) is an advanced optimization technique in the Java virtual machine. It is a cross-function, global data-flow analysis that can effectively reduce synchronization overhead and heap allocation pressure in Java programs. Through escape analysis, the HotSpot compiler can determine the scope in which a new object's reference is used and decide whether the object needs to be allocated on the heap.

  • An object defined in a method may be referenced by an external method, for example by being passed elsewhere as a call argument. This is called method escape.

  • An object may also be assigned to a class variable or to an instance variable that other threads can access. This is called thread escape.

  • Using escape analysis, the compiler can optimize the code as follows:

    • Synchronization elision: If an object is found to be accessible from only one thread, operations on it need not be synchronized.
    • Converting heap allocation to stack allocation: If an object is allocated in a subroutine and no pointer to it ever escapes, the object may be a candidate for stack allocation instead of heap allocation.
    • Object splitting or scalar replacement: Some objects can be accessed without existing as one contiguous memory structure, so part (or all) of the object need not be stored in memory and can be kept in CPU registers instead.
 
 

```java
// The StringBuffer is returned to the caller, so it escapes this method.
public static StringBuffer createStringBuffer(String s1, String s2) {
    StringBuffer s = new StringBuffer();
    s.append(s1);
    s.append(s2);
    return s;
}
```

s is a local variable of the method, but the code above returns it directly. The StringBuffer object can therefore be modified by other methods, so its scope is no longer limited to the method: even though s is a local variable, the object escapes to the outside of the method. This is called method escape.

The object may also be accessed by other threads, for example if it is assigned to a class variable or to an instance variable that other threads can access. This is called thread escape.

  • If the JIT compiler finds through escape analysis that an object does not escape its method, the heap allocation may be optimized into a stack allocation.
  • JVM parameters: -XX:+DoEscapeAnalysis enables escape analysis, -XX:-DoEscapeAnalysis disables it.
  • Escape analysis has been enabled by default since JDK 6u23, so it is on by default in JDK 1.7 and later.
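
The difference between objects that escape and objects that do not can be made concrete with a few small methods. This is a sketch of the patterns involved, not proof of what the JIT does: whether a given allocation is actually eliminated depends on the JVM, its flags, and how hot the code is. The class EscapeDemo, the Point type, and the leaked field are hypothetical names.

```java
public class EscapeDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // 'p' never leaves this method, so it is a candidate for scalar replacement:
    // the JIT may keep x and y in registers and skip the heap allocation.
    static int noEscape(int x, int y) {
        Point p = new Point(x, y);
        return p.x + p.y;
    }

    // Hypothetical static field used only to force an escape.
    static Point leaked;

    // Storing the reference in a static field makes the object reachable from
    // other methods and threads, so it must live on the heap.
    static int escapes(int x, int y) {
        Point p = new Point(x, y);
        leaked = p;
        return p.x + p.y;
    }

    // The lock object is visible to this thread only, so the JIT may elide the lock.
    static int localLock(int v) {
        Object lock = new Object();
        synchronized (lock) {
            return v + 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(noEscape(1, 2));
        System.out.println(escapes(1, 2));
        System.out.println(localLock(1));
    }
}
```

All three methods return the same results whether or not the optimization is applied. Running a hot loop over noEscape with -XX:-DoEscapeAnalysis and comparing GC activity is one way to see the effect in practice.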

TLAB

  • TLAB stands for Thread Local Allocation Buffer. It is a thread-private allocation area carved out of Eden and is enabled by default (though this depends on the virtual machine implementation).
  • The heap is shared globally, and several threads may request heap space at the same time, so every object allocation has to be synchronized (the virtual machine uses CAS with retry on failure to keep the update atomic), which lowers efficiency.
  • TLABs avoid this contention. When allocating memory for objects, each thread uses its own TLAB, which avoids thread synchronization and makes object allocation more efficient.
  • Not every object can be allocated in a TLAB. If TLAB allocation fails, the JVM falls back to a synchronized allocation to keep the operation atomic.
  • -XX:+UseTLAB enables TLAB; -XX:TLABSize=<size> sets the TLAB size.
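
The idea behind TLAB can be sketched as a simulation: a shared Eden whose top pointer is advanced with CAS, and per-thread buffers that allocate by bumping a plain pointer. This is a conceptual model only and not HotSpot's implementation; the classes TlabSketch, Eden, and Tlab, and the use of plain long values as addresses, are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

public class TlabSketch {
    // Simulated shared Eden: 'top' is the next free address, advanced with CAS.
    static final class Eden {
        private final AtomicLong top = new AtomicLong(0);
        private final long end;

        Eden(long size) {
            this.end = size;
        }

        // Reserves 'size' bytes; returns the start address, or -1 if Eden is full.
        long reserve(long size) {
            while (true) {
                long start = top.get();
                if (start + size > end) {
                    return -1;
                }
                if (top.compareAndSet(start, start + size)) {
                    return start;       // CAS succeeded
                }
                // CAS failed because another thread moved 'top': retry
            }
        }
    }

    // Simulated per-thread buffer: plain pointer bump, no synchronization needed.
    static final class Tlab {
        private long top;
        private final long end;

        Tlab(long start, long size) {
            this.top = start;
            this.end = start + size;
        }

        // Returns the address of the new object, or -1 if it does not fit.
        long allocate(long size) {
            if (top + size > end) {
                return -1;
            }
            long addr = top;
            top += size;
            return addr;
        }
    }

    // Carves a new TLAB out of Eden with a single CAS-protected reservation.
    static Tlab newTlab(Eden eden, long size) {
        long start = eden.reserve(size);
        if (start < 0) {
            throw new IllegalStateException("Eden is full");
        }
        return new Tlab(start, size);
    }

    // Allocation path: try the thread's own TLAB first, then fall back to shared Eden.
    static long allocate(Tlab tlab, Eden eden, long size) {
        long addr = tlab.allocate(size);
        return addr >= 0 ? addr : eden.reserve(size);
    }

    public static void main(String[] args) {
        Eden eden = new Eden(1024);
        Tlab a = newTlab(eden, 256);    // would belong to thread A
        Tlab b = newTlab(eden, 256);    // would belong to thread B
        System.out.println(allocate(a, eden, 16));   // inside A's TLAB
        System.out.println(allocate(b, eden, 16));   // inside B's TLAB
        System.out.println(allocate(a, eden, 512));  // too big for the TLAB: shared Eden
    }
}
```

Small allocations touch only the thread's own buffer, so the CAS loop runs once per TLAB refill or oversized object instead of once per object. A real JVM also retires and replaces exhausted TLABs, which this sketch leaves out.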

Object allocation process

With escape analysis and TLAB in mind, let's look at the object allocation process:

Object allocation process (1).jpg

  • Use escape analysis to decide whether the object can be allocated on the stack
    • If so, the object is allocated in the VM Stack by scalar replacement. It is destroyed automatically when the method call ends or the thread is destroyed, without GC involvement.
  • Determine whether it is a large object
    • If so, allocate it directly in the Old Generation on the heap. If the object becomes garbage, it is reclaimed by an old generation GC.
  • Determine whether it can be allocated in a TLAB
    • If so, allocate it in the TLAB, inside the Eden area of the heap.
    • Otherwise, allocate it in the Eden area of the heap outside the TLAB.
  • If Eden is full when a new object is created, Minor GC is triggered: objects in Eden that are no longer referenced are destroyed, and the new object is then placed in Eden. Note that a full Survivor area does not trigger Minor GC; Minor GC is triggered when Eden fills up, and the Survivor area is cleaned up along the way.
  • Move the remaining objects in Eden to the Survivor0 area
  • When garbage collection is triggered again, the survivors of the previous collection that were placed in Survivor0 are moved to Survivor1 if they are still not reclaimed
  • After another garbage collection, the survivors are moved back to Survivor0, and so on
  • By default this repeats for 15 cycles; an object that survives more than 15 times is moved to the old generation. The count can be set with the JVM parameter -XX:MaxTenuringThreshold=N
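
The decision order above and the aging of survivors can be written down as a small simulation. This is a sketch under stated assumptions, not the JVM's actual logic: the class AllocationFlowSketch, the Place enum, and every parameter (escapes, largeThreshold, tlabFree) are hypothetical, and the exact age comparison used for promotion in a real collector may differ by one.

```java
import java.util.ArrayList;
import java.util.List;

public class AllocationFlowSketch {
    enum Place { STACK, OLD_GENERATION, TLAB, EDEN }

    // Mirrors the decision order of the list above; all inputs are hypothetical.
    static Place choose(boolean escapes, long size, long largeThreshold, long tlabFree) {
        if (!escapes) {
            return Place.STACK;             // scalar replacement, no GC involved
        }
        if (size >= largeThreshold) {
            return Place.OLD_GENERATION;    // large objects skip the young generation
        }
        if (size <= tlabFree) {
            return Place.TLAB;              // fits in the thread's buffer inside Eden
        }
        return Place.EDEN;                  // shared Eden, outside the TLAB
    }

    // Traces where a long-lived object sits after each Minor GC until it is promoted.
    static List<String> trace(int maxTenuringThreshold) {
        List<String> places = new ArrayList<>();
        places.add("Eden");
        int age = 0;
        int survivor = 0;
        while (true) {
            age++;                          // survived one more Minor GC
            if (age >= maxTenuringThreshold) {
                places.add("Old");
                return places;
            }
            places.add("Survivor" + survivor);
            survivor = 1 - survivor;        // survivors alternate between S0 and S1
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(false, 16, 1024, 256));
        System.out.println(choose(true, 4096, 1024, 256));
        System.out.println(choose(true, 16, 1024, 256));
        System.out.println(choose(true, 512, 1024, 256));
        System.out.println(trace(3));
    }
}
```

With a threshold of 3 the trace is Eden, Survivor0, Survivor1, Old; with the default of 15 the object bounces between the two survivor spaces fourteen times before it is promoted.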

Origin blog.csdn.net/wdj_yyds/article/details/131961935