Java basics: the JVM explained

JVM architecture

The JVM mainly consists of two subsystems and two components:

The class loader subsystem (loads .class files);

The execution engine subsystem (executes bytecode and invokes native methods);

The runtime data area component (method area, heap, Java stacks, program counter registers, native method stacks);

The native interface component.

  • Class loader subsystem: loads the contents of a class file into the method area of the runtime data area, given its fully qualified class name (such as java.lang.Object); a small sketch follows this list.

  • Execution engine subsystem: executes the instructions in loaded classes. The bytecode of a method is a sequence of Java virtual machine instructions, each consisting of a single-byte opcode followed by zero or more operands. The execution engine fetches an opcode, fetches its operands if it has any, performs the action they specify, and then fetches the next opcode; this continues until the thread completes. The execution engine is the core of any JVM implementation; in other words, the difference in quality between Sun's JDK and IBM's JDK largely comes down to the quality of their respective execution engine implementations.

  • Native interface component: interacts with native libraries and serves as the interface to other programming languages. Most methods declared as native in Java can be found under jdk/src/<platform>/native. The code there may be shared (platform-neutral) or platform-specific, and the directory is organized by package name just like the Java source tree. Keep in mind, though, that these native methods belong to the class library, not to the JVM itself.

  • Runtime data area component: the virtual machine defines several runtime data areas used while a program runs. Some are created when the virtual machine starts and destroyed when it exits, such as the heap and the method area; others correspond one-to-one with threads and are created and destroyed as those threads start and end, such as the stacks and the program counter registers.
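As a concrete illustration of loading by fully qualified name, here is a minimal sketch (class and variable names are purely illustrative) that asks the class loader subsystem to load java.lang.Object and prints which class loader defined each class:

```java
public class ClassLoaderDemo {
    public static void main(String[] args) throws ClassNotFoundException {
        // Ask the class loader subsystem to load a class by its fully
        // qualified name; the resulting metadata lives in the method area.
        Class<?> clazz = Class.forName("java.lang.Object");

        // java.lang.Object is loaded by the bootstrap loader,
        // so getClassLoader() returns null for it.
        System.out.println(clazz.getName() + " loaded by: " + clazz.getClassLoader());
        System.out.println(ClassLoaderDemo.class.getName()
                + " loaded by: " + ClassLoaderDemo.class.getClassLoader());
    }
}
```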

Java memory model (JMM)

The Java memory model is the Java language's specification for reading and writing shared variables (more precisely, for the memory operations that correspond to shared variables) under multi-threaded concurrency. It mainly addresses visibility and atomicity across threads and resolves conflicts when multiple threads operate on shared variables.

The JMM (Java Memory Model) is itself an abstract concept, not something that physically exists. It describes a set of rules or specifications that define how each variable in a program (including instance fields, static fields, and the elements that make up array objects) is accessed.

The JMM defines an abstract relationship between threads and main memory: shared variables are stored in main memory (Main Memory), each thread has a private local memory (Local Memory), and that local memory holds the thread's copies of the shared variables it reads and writes.

Three major characteristics of the JMM memory model
1. Atomicity
synchronized mutex locks are used to guarantee the atomicity of operations.
2. Visibility:
volatile: a write to a volatile variable forces its value (together with other variables written before it) to be flushed from working memory to main memory, and a read forces it to be reloaded from main memory (see the sketch after this list).
synchronized: before performing the unlock operation on a variable, the thread must synchronize the variable's value back to main memory.
final: once a field modified by the final keyword has been initialized in the constructor, and there is no this escape (no other thread accessing the half-initialized object through its this reference), other threads can see the value of the final field.
3. Ordering
source code -> compiler optimization reordering -> instruction-level parallel reordering -> memory system reordering -> instructions finally executed.
Reordering does not affect the result of single-threaded execution, but it can affect the correctness of multi-threaded concurrent execution.
The processor must respect data dependencies when reordering, but in a multi-threaded environment threads execute interleaved, and because of compiler and processor reordering there is no guarantee that the variables used by two threads remain consistent.
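A minimal sketch of the visibility guarantee that volatile provides (all names are illustrative); without the volatile modifier on the flag, the reader thread might keep using its cached copy and never observe the main thread's update:

```java
public class VisibilityDemo {
    // Without volatile, the reader thread may cache 'running' locally
    // and spin forever; volatile forces reads/writes to go through main memory.
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) {
                // busy-wait until the main thread clears the flag
            }
            System.out.println("Reader observed running == false, exiting.");
        });
        reader.start();

        Thread.sleep(1000);   // give the reader time to start spinning
        running = false;      // visible to the reader because the field is volatile
        reader.join();
    }
}
```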

The JMM defines an abstract relationship between threads and main memory:

The shared variables between threads are stored in main memory (Main Memory), and
each thread has a private local memory (Local Memory). Local memory is an abstract concept of the JMM and does not really exist; it covers caches, write buffers, registers, and other hardware and compiler optimizations. A thread's local memory holds its copies of the shared variables it reads and writes.
At a lower level, main memory corresponds to the hardware memory, and to obtain better speed the virtual machine and hardware may keep the working memory in registers and caches first.
In the Java memory model, a thread's working memory is an abstract description of the CPU's registers and caches, whereas the JVM's static storage model (the JVM memory layout) is merely a physical division of memory; it is limited to memory, and specifically to JVM memory.

Heap allocation:

 

JVM runtime data area

1 Method area (Method Area)
The method area (Method Area), like the Java heap, is a memory area shared by all threads. It stores data such as type information, constants, static variables, and the code cache compiled by the just-in-time compiler for classes that the virtual machine has loaded. Although the "Java Virtual Machine Specification" describes the method area as a logical part of the heap, it also has the alias "Non-Heap" to distinguish it from the Java heap.

Runtime constant pool

The runtime constant pool (Runtime Constant Pool) is part of the method area. In addition to the class version, fields, methods, interfaces, and other descriptive information, the Class file also contains a constant pool table (Constant Pool Table), which stores the various literals and symbolic references generated at compile time. After the class is loaded, this part is stored in the runtime constant pool within the method area.

The Java virtual machine has strict rules for the format of every part of a Class file (the constant pool included): each byte must hold the data the specification prescribes before the file can be recognized, loaded, and executed by the virtual machine. For the runtime constant pool, however, the "Java Virtual Machine Specification" gives no detailed requirements, so virtual machines from different vendors can implement this memory area according to their own needs. Generally speaking, though, besides the symbolic references described in the Class file, the direct references translated from those symbolic references are also stored in the runtime constant pool [1].

Another important feature of the runtime constant pool, compared with the Class file constant pool, is that it is dynamic. The Java language does not require constants to be produced only at compile time: content that was not preset in the Class file's constant pool can also enter the pool, and new constants may be placed into the runtime constant pool in the method area while the program runs. The feature developers exploit most often for this is the intern() method of the String class.
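A small illustrative sketch of that dynamic behaviour via String.intern() (where the interned strings physically live differs between JDK 6 and JDK 7+, so treat the output as indicative rather than a statement about pool placement):

```java
public class InternDemo {
    public static void main(String[] args) {
        // "hello" is placed in the constant pool when the class is loaded.
        String literal = "hello";

        // new String(...) creates a distinct object on the heap.
        String constructed = new String("hello");

        System.out.println(literal == constructed);           // false: different objects
        System.out.println(literal == constructed.intern());  // true: intern() returns the pooled reference
    }
}
```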

Since the runtime constant pool is part of the method area, it is naturally limited by the memory available to the method area. When the constant pool can no longer obtain memory, an OutOfMemoryError is thrown.

2 Java heap (Heap)
For Java applications, the Java heap (Java Heap) is the largest block of memory managed by the virtual machine. It is a memory area shared by all threads and is created when the virtual machine starts. The sole purpose of this area is to hold object instances, and "almost" all object instances in the Java world are allocated here. The "Java Virtual Machine Specification" describes it as: "all object instances and arrays should be allocated on the heap". The "almost" is there because, from an implementation standpoint, the Java language shows signs that support for value types may appear in the future, and even today advances in just-in-time compilation, especially the increasingly powerful escape analysis, have enabled optimizations such as on-stack allocation and scalar replacement. Some subtle changes have quietly taken place, so it is no longer absolutely true that every Java object instance is allocated on the heap.

The Java heap is the memory area managed by the garbage collector, so some materials also call it the "GC heap". From the perspective of reclaiming memory, since most modern garbage collectors are designed around generational collection theory, terms such as "new generation", "old generation", "permanent generation", "Eden space", "From Survivor space", and "To Survivor space" often appear in discussions of the Java heap. These divisions are merely common characteristics or design styles of certain garbage collectors, not an inherent memory layout of any particular Java virtual machine implementation, let alone a further subdivision of the Java heap mandated by the "Java Virtual Machine Specification". From the perspective of memory allocation, the Java heap shared by all threads can be carved into multiple thread-private allocation buffers (Thread Local Allocation Buffer, TLAB) to improve the efficiency of object allocation. However, from whatever angle and however it is divided, this does not change what the Java heap stores: every region can only hold object instances, and subdividing the heap serves only to reclaim memory better or allocate it faster.

According to the "Java Virtual Machine Specification", the Java heap may occupy physically discontinuous memory as long as it is logically contiguous, much as we use disk space to store files without requiring every file to be stored contiguously. For large objects (typically arrays), however, most virtual machine implementations are likely to require contiguous memory for the sake of simple implementation and efficient storage.

The Java heap can be implemented with either a fixed size or an expandable size, but current mainstream Java virtual machines implement it as expandable (set via the -Xmx and -Xms parameters). If there is no memory left in the Java heap to complete an instance allocation and the heap can no longer expand, the Java virtual machine throws an OutOfMemoryError.
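A hedged sketch that provokes this OutOfMemoryError by keeping every allocation reachable so the collector can reclaim nothing; the -Xms/-Xmx values in the comment are arbitrary examples:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Run with a deliberately small heap, e.g.:  java -Xms16m -Xmx16m HeapOomDemo
 * Because every allocation stays reachable through the list, the GC cannot
 * reclaim anything, and the JVM eventually throws
 * java.lang.OutOfMemoryError: Java heap space.
 */
public class HeapOomDemo {
    public static void main(String[] args) {
        List<byte[]> pinned = new ArrayList<>();
        while (true) {
            pinned.add(new byte[1024 * 1024]); // allocate 1 MB per iteration
        }
    }
}
```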

3 Virtual machine stack
The memory space each thread needs in order to run is called its virtual machine stack;
the stack is thread-private, and its life cycle matches that of the thread;
each stack is made up of multiple stack frames, each corresponding to the memory used by one method call;
each thread has only one active stack frame at a time, corresponding to the method currently executing.
The virtual machine stack describes the memory model of Java method execution: each time a method is executed, a stack frame (Stack Frame) is created to store the local variable table, operand stack, dynamic link, method return address, and other information. From invocation to completion, each method corresponds to one stack frame being pushed onto and later popped off the virtual machine stack.

StackOverflowError: the stack depth requested by the thread exceeds the depth allowed by the virtual machine.

OutOfMemoryError: thrown if the virtual machine stack can be expanded dynamically but cannot obtain enough memory when expanding.
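A minimal sketch that triggers StackOverflowError through unbounded recursion; the depth reached before the overflow depends on the frame size and on the stack-size setting (-Xss):

```java
public class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes a new stack frame; with no termination condition the
    // virtual machine stack eventually exceeds its allowed depth.
    private static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The depth reached varies with -Xss and with the size of each frame.
            System.out.println("StackOverflowError at depth " + depth);
        }
    }
}
```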

 

4 Program counter
The program counter (Program Counter Register) is a small memory space that can be regarded as the line-number indicator of the bytecode being executed by the current thread: it points to the next instruction to be executed, and the execution engine reads that instruction next. More precisely, a thread executes by having the bytecode interpreter change the value of this counter to select the next bytecode instruction, which is what keeps the thread executing correctly.

It holds the address of the instruction currently being executed; once that instruction finishes, the program counter is updated to the next instruction.

To ensure that the correct execution position can be restored after a thread switch (context switch), each thread has its own program counter; the counters of different threads do not affect each other and are stored independently. In other words, the program counter is thread-private memory.

5 Native method stack
Native method stacks (Native Method Stacks) are very similar to virtual machine stacks; they serve the native methods used by the virtual machine.

The "Java Virtual Machine Specification" places no mandatory requirements on the language, usage, or data structures of the methods in the native method stack.

Therefore a specific virtual machine is free to implement it as it sees fit, and some Java virtual machines (such as the HotSpot VM) simply merge the native method stack and the virtual machine stack into one. Like the virtual machine stack, the native method stack throws StackOverflowError when the stack depth overflows and OutOfMemoryError when stack expansion fails.
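A small illustrative sketch that uses reflection to confirm that some familiar JDK methods are declared native and therefore run on the native method stack rather than being written in bytecode:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeMethodDemo {
    public static void main(String[] args) throws NoSuchMethodException {
        // Object.hashCode() and System.currentTimeMillis() are implemented
        // in the JDK's native class library, not in Java bytecode.
        Method hashCode = Object.class.getMethod("hashCode");
        Method currentTimeMillis = System.class.getMethod("currentTimeMillis");

        System.out.println("hashCode is native: "
                + Modifier.isNative(hashCode.getModifiers()));
        System.out.println("currentTimeMillis is native: "
                + Modifier.isNative(currentTimeMillis.getModifiers()));
    }
}
```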

6 Direct memory
Direct memory (Direct Memory) is not part of the virtual machine's runtime data area, nor is it a memory area defined in the "Java Virtual Machine Specification". But this memory is used frequently and can also cause an OutOfMemoryError, so we cover it here as well.

JDK 1.4 introduced the NIO (New Input/Output) classes and with them an I/O model based on channels (Channel) and buffers (Buffer). NIO can use native function libraries to allocate off-heap memory directly and then operate on it through a DirectByteBuffer object stored in the Java heap, which acts as a reference to that memory. This can significantly improve performance in some scenarios because it avoids copying data back and forth between the Java heap and the native heap.

Obviously, direct memory allocation is not limited by the size of the Java heap, but since it is still memory it is limited by the total memory of the machine (physical memory plus swap partition or paging file) and by the processor's address space. When configuring virtual machine parameters, server administrators usually set -Xmx and similar options according to the actual memory but often overlook direct memory, so the sum of all memory areas ends up exceeding the physical limit (at both the physical and operating-system level), which leads to an OutOfMemoryError during dynamic expansion.
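A brief sketch of allocating direct memory through NIO; the buffer size and the -XX:MaxDirectMemorySize value in the comment are arbitrary examples:

```java
import java.nio.ByteBuffer;

/**
 * Direct memory can be capped separately from the heap, e.g.:
 *   java -XX:MaxDirectMemorySize=64m DirectMemoryDemo
 * (the 64m value is an arbitrary example)
 */
public class DirectMemoryDemo {
    public static void main(String[] args) {
        // Allocates 32 MB outside the Java heap; only the small
        // DirectByteBuffer object referencing it lives on the heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(32 * 1024 * 1024);

        direct.putInt(0, 42);   // write straight through to native memory
        System.out.println("isDirect = " + direct.isDirect());
        System.out.println("value    = " + direct.getInt(0));
    }
}
```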

Execution engine
Although not all Java virtual machines adopt an architecture in which an interpreter and a compiler coexist, the current mainstream commercial Java virtual machines, such as HotSpot and OpenJ9, contain both an interpreter and a compiler.

The HotSpot virtual machine has two (or three) built-in just-in-time compilers. Two of them have existed for a long time and are known as the "Client Compiler" and the "Server Compiler", or C1 and C2 for short (C2 is also called the Opto compiler in some materials and in the JDK source code). The third is the Graal compiler, which first appeared in JDK 10 and whose long-term goal is to replace C2.

Garbage collection (GC)

Garbage collection cannot be performed manually; you can only issue a hint and wait for the JVM to reclaim memory automatically.
The JVM performs GC in the heap (Heap) and in the method area. The three heap regions (the new generation, the survivor spaces, and the old generation) are not collected as one unit; most collection takes place in the new generation.
A light GC (minor GC) targets only the new generation and occasionally touches the survivor space (when the new generation fills up). A heavy GC (full GC) frees memory across the whole heap.

Common garbage collection algorithms
1. Reference counting algorithm
The principle: each time an object gains a reference its count is incremented, and each time a reference is removed its count is decremented; during garbage collection only objects whose count is 0 are reclaimed. The fatal weakness of this algorithm is that it cannot handle circular references, as the sketch below illustrates.
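A minimal sketch of the circular-reference case: under pure reference counting the two nodes below would never be reclaimed, whereas HotSpot's reachability analysis from GC roots collects them once the local references are dropped (all names are illustrative):

```java
public class CircularReferenceDemo {
    static class Node {
        Node other;                       // field that can form a cycle
        byte[] payload = new byte[1024];  // give each object some weight
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        a.other = b;   // a references b
        b.other = a;   // b references a: reference counts would never reach zero

        a = null;      // drop the only external references; with reachability
        b = null;      // analysis both objects become unreachable garbage

        System.gc();   // only a hint: the JVM may or may not collect immediately
    }
}
```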

2. Copying algorithm
This algorithm divides the memory into two equal regions and only ever uses one of them at a time. During garbage collection it traverses the region in use, copies the live objects to the other region, and reclaims the unused objects. Because only live objects are processed each time, the copying cost is relatively small, and the memory can be compacted as part of the copy.
Advantages: no fragmentation problems.
Disadvantages: requires twice the memory space, which is wasteful.

3. Mark-sweep algorithm
This algorithm runs in two phases. The first phase starts from the root references and marks the live objects in use; the second phase traverses the entire heap and sweeps away the unmarked objects.
Advantages: no memory space is wasted.
Disadvantages: the entire application must be paused, and memory fragmentation is produced.

4. Mark-compact algorithm
This algorithm combines the advantages of the mark-sweep and copying algorithms. It also has two phases: the first marks all live objects starting from the root nodes; the second traverses the entire heap, clears the unmarked objects, and "compacts" the surviving objects toward one end of the heap, arranging them in order.
This algorithm avoids the fragmentation problem of mark-sweep and also avoids the space overhead of the copying algorithm.

Generational collection strategy

1. Most newly created objects are placed in the Eden space.
2. When the Eden space fills up for the first time, a Minor GC (light GC) is triggered: the garbage objects in Eden are reclaimed, the surviving objects are copied to S0, and S1 stays empty.
3. The next time Eden fills up, garbage collection runs again; this time all garbage objects in Eden and S0 are cleared, the survivors are copied to S1, and S0 becomes empty.
4. After surviving a number of switches between S0 and S1 (15 by default), objects are promoted to the old generation.
5. A Full GC is triggered when the old generation fills up.
Minor GC uses the copying algorithm; it is triggered when the young generation runs low on space, and compared with a full collection its pauses are shorter. Full GC generally uses the mark-compact algorithm and is triggered when the old generation heap space fills up. You can use the System.gc() method to explicitly request a full collection (see the sketch below), but a full collection is very time-consuming.
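A short illustrative sketch around System.gc(); it is only a request, and the JVM is free to ignore it (explicit GC can even be disabled entirely with -XX:+DisableExplicitGC), so the printed numbers are indicative only:

```java
public class ExplicitGcDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long beforeUsed = rt.totalMemory() - rt.freeMemory();

        // Create a burst of short-lived garbage in the young generation.
        for (int i = 0; i < 10_000; i++) {
            byte[] junk = new byte[1024];
        }

        // Request (not force) a full collection; with -XX:+DisableExplicitGC
        // this call becomes a no-op.
        System.gc();

        long afterUsed = rt.totalMemory() - rt.freeMemory();
        System.out.println("used before: " + beforeUsed + " bytes");
        System.out.println("used after : " + afterUsed + " bytes");
    }
}
```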








Origin blog.csdn.net/hongyucai/article/details/130970444