[Java] JVM execution process, class loading process and garbage collection mechanism

JVM is the Java virtual machine, and Java programs run in the JVM.

JVM execution process

Before a program can run, the Java source code must be compiled into bytecode (.class files). The JVM first uses a class loader (ClassLoader) to load the bytecode into the runtime data area (Runtime Data Area) in memory. Bytecode is an instruction set defined by the JVM specification and cannot be handed directly to the underlying operating system, so an execution engine (Execution Engine) is needed to translate it into native machine instructions that the CPU executes. Along the way, the native method interface (Native Interface) is used to call code written in other languages so that the whole program can do its work.

execution engine

Convert Java bytecode into CPU instructions.

native method interface

Call APIs of different systems to achieve different functions.

runtime data area

method area

The method area stores class metadata, which can be thought of as the templates from which objects are created. The "Java Virtual Machine Specification" calls this area the "method area". In the HotSpot virtual machine it was implemented as the permanent generation (PermGen) up to JDK 7 and as Metaspace from JDK 8. The runtime constant pool is part of the method area and stores literals and symbolic references.

Changes in JDK 1.8 metaspace
1. In HotSpot, the most widely used JVM today, the JDK 8 metaspace lives in native memory, so its size is no longer bounded by the JVM maximum heap setting; by default it is limited only by the native memory available.
2. The string constant pool lives in the heap. (HotSpot moved it out of PermGen in JDK 7, and it remains in the heap in JDK 8.)

heap

The heap stores the object instances created with new. The heap and the method area are both shared between threads: any thread can create objects, and to do so it obtains the object's template from the method area; the objects created by every thread are placed in the heap.

Virtual machine stack (thread private)

The stack mainly records the calling relationship between methods. Each thread has its own Java virtual machine stack. Every time a method is called, a stack frame is pushed onto the thread's stack; when the method finishes, its stack frame is popped. If calls nest too deeply, for example in runaway recursion, the stack runs out of space and a StackOverflowError is thrown.

Each stack frame contains four parts:

  1. Local variable table: stores the primitive values (the 8 primitive data types) and object references known at compile time. The size of the local variable table is determined at compile time: when a method is entered, the amount of local variable space its frame needs is already fixed and does not change during execution. Simply put, it stores method parameters and local variables.
  2. Operand stack: each method gets its own last-in, first-out operand stack, which bytecode instructions use for intermediate values.
  3. Dynamic link: a reference to the method this frame belongs to in the runtime constant pool.
  4. Method return address: the position in the caller at which execution resumes after the method returns (the caller's program counter value).
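The stack overflow described above is easy to reproduce. The following is a minimal sketch, not code from the original article: the class name StackDepthDemo and its methods are made up for illustration. It recurses without a base case and counts how many frames were pushed before the JVM threw StackOverflowError.

```java
// Hypothetical demo class, introduced only for illustration.
final class StackDepthDemo {
    private static int depth;

    // No base case: every call pushes one more stack frame.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before the stack overflowed.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time control arrives here the frames have been popped again.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Overflowed after " + measureDepth() + " frames");
    }
}
```

The number printed varies between runs and machines, because it depends on the frame size and on the thread stack size, which the -Xss parameter controls.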

What is thread privateness?
The JVM implements multithreading by switching between threads and allocating processor time to each in turn, so at any given moment a processor (or one core of a multi-core processor) executes instructions from only one thread. To resume at the correct position after a thread switch, each thread needs its own program counter; the counters of different threads are stored independently and do not affect one another. We call this kind of area "thread private" memory.

Native method stack (thread private)

It works the same way as the Java virtual machine stack, except that it records the calls of native methods.

program counter (thread private)

The program counter records the instruction that the current thread has reached. It is a small piece of memory that can be regarded as a line number indicator for the bytecode executed by the current thread. If the thread is executing a Java method, the counter holds the address of the bytecode instruction being executed; if it is executing a native method, the counter value is undefined.

heap overflow problem

The Java heap is used to store object instances. If objects keep being created and stay reachable, the heap eventually reaches its maximum capacity and an OutOfMemoryError is thrown.
Demonstrating heap overflow:
Set the JVM parameters -Xms (initial, i.e. minimum, heap size) and -Xmx (maximum heap size) to small values, for example -Xms20m -Xmx20m, so that the error appears quickly.

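import java.util.ArrayList;
import java.util.List;

public class HeapDemo {

    static class OOMObject {
    }

    public static void main(String[] args) {
        List<OOMObject> list = new ArrayList<>();

        // Keep adding elements to the list. The list keeps every object
        // reachable, so the collector can never free them.
        while (true) {
            list.add(new OOMObject());
        }
    }
}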

When the message "java.lang.OutOfMemoryError: Java heap space" appears, it tells us clearly that the OOM occurred on the heap, which is full at that point. If the objects really are all needed, increase the heap size (by adjusting the -Xmx parameter) to avoid this error.
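Before tuning the heap it helps to check what limits the running JVM actually has. The sketch below is an illustration (the class name HeapInfo and the helper toMb are assumptions, not from the original article); it uses the standard java.lang.Runtime methods to print the heap limits.

```java
// Hypothetical helper class, introduced only for illustration.
final class HeapInfo {
    static long toMb(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the maximum heap the JVM will try to use (-Xmx).
        System.out.println("max heap:          " + toMb(rt.maxMemory()) + " MB");
        // totalMemory() is the heap currently reserved by the JVM.
        System.out.println("current heap:      " + toMb(rt.totalMemory()) + " MB");
        // freeMemory() is the unused part of the current heap.
        System.out.println("free in that heap: " + toMb(rt.freeMemory()) + " MB");
    }
}
```

Running it once with and once without an explicit -Xmx value shows whether the setting was picked up.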

class loading

The process of class loading

For a class, its life cycle runs through the following stages: loading, linking (verification, preparation, resolution), initialization, using, and unloading.

loading

Loading reads the .class file. In this stage the JVM:
1) Obtains the binary byte stream that defines the class by its fully qualified name.
2) Converts the static storage structure represented by this byte stream into a runtime data structure in the method area.
3) Generates a java.lang.Class object representing the class in memory, which serves as the access entry for the class's data in the method area.

linking

verification

The purpose of this stage is to ensure that the information contained in the byte stream of the Class file complies with all the constraint requirements of the "Java Virtual Machine Specification", and to ensure that the information will not endanger the security of the virtual machine itself after being run as code.

preparation

The preparation stage formally allocates memory for the variables defined in the class (that is, static variables, the ones modified by static) and sets their initial values to the type's default. For example, given the line public static int value = 123; the preparation stage sets value to 0, not 123; the assignment of 123 happens later, in the initialization stage.
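The default value assigned before any of the class's own code runs can be observed from Java code. This is a small sketch under the assumption of a made-up class PrepareDemo: a static field declared before value reads value through a method, so it sees the default 0 rather than 123.

```java
// Hypothetical demo class, introduced only for illustration.
final class PrepareDemo {
    // Static initializers run in textual order, so this line runs first.
    // At that moment `value` still holds the default 0.
    static int seenEarly = peek();

    // The assignment of 123 only happens when this line is reached.
    static int value = 123;

    static int peek() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println("seenEarly = " + seenEarly); // prints seenEarly = 0
        System.out.println("value = " + value);         // prints value = 123
    }
}
```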

resolution

The resolution stage is the process in which the Java virtual machine replaces the symbolic references in the constant pool with direct references, that is, with actual pointers or offsets to the target class, field, or method.

initialization

In the initialization stage, the Java virtual machine actually starts executing the Java code written in the class, handing control over to the application. Initialization is the process of executing the class constructor method <clinit>(), which the compiler generates from the static variable assignments and static blocks of the class.
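Initialization does not happen when the program starts but when a class is first actively used, for example when one of its non-constant static fields is read. The following sketch (the class names InitDemo and Lazy are assumptions made for illustration) records the order of events to show this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, introduced only for illustration.
final class InitDemo {
    static final List<String> EVENTS = new ArrayList<>();

    static class Lazy {
        static int value = 42;
        // Static block: runs as part of Lazy's class initialization.
        static {
            EVENTS.add("Lazy initialized");
        }
    }

    static List<String> run() {
        EVENTS.add("before first use");
        int v = Lazy.value; // first active use triggers Lazy's initialization
        EVENTS.add("read " + v);
        return EVENTS;
    }

    public static void main(String[] args) {
        // prints [before first use, Lazy initialized, read 42]
        System.out.println(run());
    }
}
```

Note that value is deliberately not final: reading a compile-time constant (static final int) would not trigger initialization, because the compiler inlines the constant at the use site.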

Parent Delegation Mechanism

When a class loader receives a class loading request, it does not try to load the class itself first but delegates the request to its parent class loader. Every level of class loader does the same, so every loading request eventually reaches the top-level bootstrap class loader. Only when the parent loader reports that it cannot complete the request (it did not find the class in its search scope) does the child loader try to load the class itself.
1. BootStrap: the bootstrap class loader; loads the Java core class library from the JDK's lib directory, that is, $JAVA_HOME/lib;
2. ExtClassLoader: the extension class loader; loads classes in the lib/ext directory;
3. AppClassLoader: the application class loader; loads classes on the application's classpath;
4. Custom loader: a class loader written to suit your own needs.
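The loader hierarchy can be inspected at run time by following getParent() upward. The sketch below is an illustration only (LoaderChain and chainOf are made-up names); it relies on the standard behavior that the bootstrap loader is represented as null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, introduced only for illustration.
final class LoaderChain {
    // Walks from the given loader up through its parents.
    static List<String> chainOf(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        // The bootstrap loader has no Java object; the API returns null for it.
        names.add("bootstrap (represented as null)");
        return names;
    }

    public static void main(String[] args) {
        // Loader chain of an application class.
        for (String name : chainOf(LoaderChain.class.getClassLoader())) {
            System.out.println(name);
        }
        // Core classes such as String are loaded by the bootstrap loader.
        System.out.println("String loader: " + String.class.getClassLoader());
    }
}
```

The class names printed depend on the JDK version: on JDK 9 and later the extension class loader has been replaced by a platform class loader, so the middle entry differs from what JDK 8 shows.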

garbage collection

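public class GCDemo {

    // Minimal placeholder class so that the example compiles
    static class Student {
    }

    public static void main(String[] args) {
        test();
    }

    private static void test() {
        // The Student object is only referenced from this stack frame
        Student student = new Student();
        System.out.println(student);
    }
}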

In the example above, once test() has finished executing, the Student object is never used again, so such an unused object is treated as garbage and collected. How does the JVM decide that this object is garbage?

Judgment algorithm for dead objects

reference counting algorithm

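Add a reference counter to each object. Whenever a reference to the object is created, the counter is incremented by 1; whenever a reference becomes invalid, it is decremented by 1. An object whose counter is 0 can no longer be used, that is, the object is "dead".
However, mainstream JVMs do not use reference counting to manage memory, mainly because it cannot solve the problem of objects that reference each other in a cycle.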

public class GCDemo01 {

    public Object instance = null;
    private static final int _1MB = 1024 * 1024;
    // Takes up some memory so that a collection is easy to spot in the GC log
    private byte[] bigSize = new byte[2 * _1MB];

    public static void testGC() {
        GCDemo01 test1 = new GCDemo01();
        GCDemo01 test2 = new GCDemo01();
        // The two objects reference each other, forming a cycle
        test1.instance = test2;
        test2.instance = test1;
        test1 = null;
        test2 = null;
        // Request a garbage collection (the JVM treats this as a hint)
        System.gc();
    }

    public static void main(String[] args) {
        testGC();
    }
}

For example, in the code above, after test1 = null and test2 = null the two objects can no longer be accessed, yet each is still referenced by the other's instance field. Under reference counting, their counts would never drop to zero, so they could never be collected.
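To make the counting concrete, here is a toy simulation of what a reference-counting collector would see for the two objects above. It is only a sketch: the JVM keeps no such counters, and the class RefCountDemo with its hand-maintained refCount field is an assumption made for illustration.

```java
import java.util.Arrays;

// Hypothetical toy model, introduced only for illustration.
final class RefCountDemo {
    static final class Node {
        int refCount;   // maintained by hand; the JVM does not do this
        Node instance;
    }

    // Replays the steps of the example and returns both final counts.
    static int[] simulate() {
        Node a = new Node();
        a.refCount++;            // referenced by local variable test1
        Node b = new Node();
        b.refCount++;            // referenced by local variable test2

        a.instance = b;
        b.refCount++;            // test1.instance = test2
        b.instance = a;
        a.refCount++;            // test2.instance = test1

        a.refCount--;            // test1 = null
        b.refCount--;            // test2 = null
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        // prints [1, 1]: neither count reaches 0 although both objects are unreachable
        System.out.println(Arrays.toString(simulate()));
    }
}
```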

Reachability Analysis Algorithm

Take a set of objects called "GC Roots" as the starting points and search downward from these nodes; the paths traversed by the search are called "reference chains". When no reference chain connects an object to the GC Roots (that is, the object is unreachable from the GC Roots), the object is proven to be unusable. Java uses this "reachability analysis" to determine whether an object is alive.
In the Java language, the objects that can serve as GC Roots include the following:
1. Objects referenced in the virtual machine stack (from the local variable table of a stack frame);
2. Objects referenced by static class fields in the method area;
3. Objects referenced by constants in the method area;
4. Objects referenced by JNI (native methods) in the native method stack.
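Reachability analysis is a graph traversal starting from the roots. The sketch below models the heap as a map from object names to the names they reference; all names (ReachabilityDemo, sampleHeap, the objects A to D) are invented for illustration and say nothing about how HotSpot implements marking internally.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, introduced only for illustration.
final class ReachabilityDemo {
    // Marks every object reachable from the roots by following references.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) { // true only the first time obj is seen
                pending.addAll(refs.getOrDefault(obj, Collections.<String>emptyList()));
            }
        }
        return marked;
    }

    static Map<String, List<String>> sampleHeap() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("A", Arrays.asList("B"));
        refs.put("B", Collections.<String>emptyList());
        refs.put("C", Arrays.asList("D")); // C and D reference each other,
        refs.put("D", Arrays.asList("C")); // but no root leads to them
        return refs;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(sampleHeap(), Arrays.asList("A"));
        // prints [A, B]: the cycle C <-> D is unreachable and therefore garbage
        System.out.println(new TreeSet<>(live));
    }
}
```

Unlike reference counting, the traversal never visits C or D, so the cycle between them does not keep them alive.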

This shows what "references" are for: beyond using a reference to find an object, we can also use references to decide whether an object is dead. For this reason, in JDK 1.2 Java expanded the concept of a reference and divided references into four types: strong, soft, weak, and phantom. The four are listed in descending order of strength.

1. Strong reference: an ordinary reference such as Student student = new Student(). As long as a strong reference exists, the object is not collected; it is collected by normal GC once it is judged to be dead;
2. Soft reference: describes objects that are still useful but not essential. They are collected when the system is running short of memory;
3. Weak reference: also describes non-essential objects, but weaker still. A weakly referenced object is collected at the next garbage collection, whether or not memory is short;
4. Phantom reference: does not affect the object's lifetime at all; its only purpose is to receive a notification when the object is collected.
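The difference between the reference types can be tried out with the classes in java.lang.ref. The following is a minimal sketch (the class name ReferenceDemo and the helper weakTo are assumptions). Because System.gc() is only a request, the values printed after the collection are typical rather than guaranteed, so no expected output is given for them.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Hypothetical demo class, introduced only for illustration.
final class ReferenceDemo {
    static WeakReference<Object> weakTo(Object target) {
        return new WeakReference<>(target);
    }

    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = weakTo(strong);
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024]);

        // While a strong reference exists, the weak reference still resolves.
        System.out.println("weak resolves while strongly held: " + (weak.get() == strong));

        strong = null; // drop the only strong reference
        System.gc();   // a request; the JVM is free to ignore it

        // A weakly referenced object is usually gone after a collection,
        // a softly referenced one usually survives while memory is plentiful.
        System.out.println("weak cleared after GC: " + (weak.get() == null));
        System.out.println("soft cleared after GC: " + (soft.get() == null));
    }
}
```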

garbage collection process

With the methods above, dead objects in the heap can be identified, and garbage collection can then be performed. First, the structure of the heap: it is divided into a new generation (an Eden area plus two Survivor areas, From/S0 and To/S1) and an old generation.
HotSpot's default ratio of new generation to old generation is 1:2, and the ratio of the Eden area to one Survivor area in the new generation is 8:1, that is, Eden:Survivor From(S0):Survivor To(S1) = 8:1:1. New objects are normally allocated in the Eden area. At any time the usable space of the new generation is 90% of its capacity (Eden plus one Survivor area); the remaining 10% (the other Survivor area) is kept free to receive the objects that survive a collection.
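The ratios translate into sizes by simple arithmetic. The sketch below works through a 300 MB heap; the class GenerationSizes is made up for illustration, and its two ratio parameters mirror the meaning of HotSpot's -XX:NewRatio (old:new) and -XX:SurvivorRatio (Eden:one Survivor) settings. A real JVM rounds and aligns these sizes, and some collectors resize the areas at run time, so treat the result as an approximation.

```java
import java.util.Arrays;

// Hypothetical helper class, introduced only for illustration.
final class GenerationSizes {
    // Returns {new generation, old generation, Eden, one Survivor} in MB.
    static long[] split(long heapMb, int newRatio, int survivorRatio) {
        long young = heapMb / (newRatio + 1);        // new : old = 1 : newRatio
        long old = heapMb - young;
        long survivor = young / (survivorRatio + 2); // Eden : S0 : S1 = ratio : 1 : 1
        long eden = young - 2 * survivor;
        return new long[] { young, old, eden, survivor };
    }

    public static void main(String[] args) {
        // prints [100, 200, 80, 10]
        System.out.println(Arrays.toString(split(300, 2, 8)));
    }
}
```

With these numbers, 90 MB of the 100 MB new generation (Eden plus one Survivor area) is usable for allocation at any time, which is the 90% figure.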

The collection process is as follows:
1. When the Eden area is full, the first Minor GC is triggered and the surviving objects are copied to the Survivor From area. When Eden fills up and triggers a Minor GC again, both the Eden area and the From area are collected; the survivors are copied to the To area, and Eden and From are cleared.
2. At the next Minor GC, the Eden and To areas are collected, the surviving objects are copied to the From area, and Eden and To are cleared.
3. Some objects are thus copied back and forth between the From and To areas. After 15 such copies (the default, set by the JVM parameter -XX:MaxTenuringThreshold), objects that are still alive are moved to the old generation.

Young generation: newly created objects generally go into the new generation;
Old generation: large objects are allocated directly in the old generation, and objects that have survived N garbage collections (15 by default) move from the new generation to the old generation.

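The GC of the new generation is called Minor GC. The GC of the old generation is called Major GC; a Full GC collects the whole heap, new and old generation together, although the two terms are often used interchangeably.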

Garbage collection suspends the application threads for at least part of its work; this is called STW (Stop The World). A series of garbage collection algorithms exists to scan memory efficiently and shorten these pauses.

mark-sweep algorithm

The "mark-sweep" algorithm is the most basic collection algorithm. It is divided into two phases, "mark" and "sweep": first all the objects that need to be collected are marked, and after marking is complete all the marked objects are collected together.
The "mark-sweep" algorithm has two main shortcomings:
1. Efficiency problem: neither the mark phase nor the sweep phase is very efficient;
2. Space problem: sweeping leaves behind a large number of discontiguous memory fragments. When the program later needs to allocate a large object, it may fail to find enough contiguous memory and have to trigger another garbage collection early.

copy algorithm

The "copy" algorithm was introduced to solve the efficiency problem of "mark-sweep". It divides the available memory into two halves of equal size and uses only one of them at a time. When that half needs to be collected, the surviving objects are copied to the other half, and the used half is then cleared in one go. The advantage is that a whole half is reclaimed each time, so allocation does not have to deal with complications such as memory fragmentation: it only needs to move the heap-top pointer and allocate sequentially. HotSpot uses this algorithm in the new generation, with the S0 and S1 areas taking turns as the copy target.

mark-compact algorithm

The "mark-compact" algorithm is mainly used in the old generation. Its marking phase is the same as in "mark-sweep", but instead of clearing the collectable objects directly, it moves all surviving objects toward one end and then clears the memory beyond the end boundary.
The disadvantage is the extra step of moving objects after marking; the advantage is that the free memory ends up as one large contiguous block.

In the new generation, most objects die at each garbage collection and only a few survive, so the copy algorithm is used. In the old generation, the survival rate is high and there is no extra space to guarantee the copies, so the "mark-compact" algorithm is used.
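The difference between sweeping and compacting shows up in how the free space is laid out afterwards. In the toy model below the heap is a string with one character per slot and '.' for a free slot; the class CompactDemo and its methods are invented for illustration and are far simpler than a real collector.

```java
// Hypothetical toy model, introduced only for illustration.
final class CompactDemo {
    // Sweep: free every slot whose object is not live, leaving holes in place.
    static String sweep(String heap, String live) {
        StringBuilder out = new StringBuilder();
        for (char c : heap.toCharArray()) {
            out.append(live.indexOf(c) >= 0 ? c : '.');
        }
        return out.toString();
    }

    // Compact: slide the surviving objects to one end of the heap.
    static String compact(String swept) {
        StringBuilder out = new StringBuilder();
        for (char c : swept.toCharArray()) {
            if (c != '.') {
                out.append(c);
            }
        }
        while (out.length() < swept.length()) {
            out.append('.');
        }
        return out.toString();
    }

    // Length of the longest run of free slots, i.e. the largest object that fits.
    static int largestFreeRun(String heap) {
        int best = 0;
        int run = 0;
        for (char c : heap.toCharArray()) {
            run = (c == '.') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String swept = sweep("ABCDEFGH", "ACEG");
        String compacted = compact(swept);
        System.out.println(swept + " largest free run: " + largestFreeRun(swept));         // A.C.E.G. largest free run: 1
        System.out.println(compacted + " largest free run: " + largestFreeRun(compacted)); // ACEG.... largest free run: 4
    }
}
```

Both layouts have four free slots, but only the compacted heap can hold an object that needs more than one contiguous slot.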

garbage collector

A garbage collection algorithm is the methodology of memory reclamation, and a garbage collector is its concrete implementation. The garbage collector keeps a program running normally over a long time: it clears the dead objects the program no longer uses (garbage objects) so that new objects can always be allocated memory. The continuing evolution of garbage collectors has aimed mainly at reducing STW pauses.
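Which collectors a running JVM actually uses can be queried through the standard java.lang.management API. The sketch below is an illustration (the class name CollectorNames is an assumption); the names it prints depend on the JDK version and on which collector was selected at startup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, introduced only for illustration.
final class CollectorNames {
    // Names of the collectors active in this JVM.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated collection time in milliseconds so far.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

Starting the program with different collector options, for example -XX:+UseSerialGC or -XX:+UseG1GC, changes the list.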

Serial

The Serial collector is the most basic and oldest collector. It is a single-threaded collector, but "single-threaded" does not only mean that it uses one CPU or one collection thread to do the garbage collection work; more importantly, while it collects garbage it must suspend all other worker threads until collection finishes.

ParNew

ParNew is the multi-threaded, parallel version of Serial. It scans memory with multiple threads to improve garbage collection efficiency and reduce STW time.

Parallel Scavenge

The Parallel Scavenge collector is a new generation collector. Like ParNew, it uses the copy algorithm and collects in parallel with multiple threads.
What sets it apart is its adaptive GC tuning strategy:

The Parallel Scavenge collector has the parameter -XX:+UseAdaptiveSizePolicy. When it is turned on, there is no need to specify details such as the size of the new generation, the ratio of Eden to Survivor areas, or the age at which objects are promoted to the old generation. The virtual machine collects performance monitoring information as the system runs and adjusts these parameters dynamically to provide the most suitable pause time or the maximum throughput.

Serial Old

Serial Old is the old generation version of the Serial collector. It is also single-threaded and uses the mark-compact algorithm.

Parallel Old

Parallel Old is the old generation version of the Parallel Scavenge collector. It uses multiple threads and the "mark-compact" algorithm.

CMS

CMS (Concurrent Mark Sweep) is a concurrent GC for the old generation. Unlike the previous collectors, it does most of its marking while the application keeps running, using a tri-color marking algorithm, and it reclaims memory with mark-sweep. Its operation is more complicated than that of the previous collectors; the whole process is divided into 4 steps: initial mark (CMS initial mark), concurrent mark (CMS concurrent mark), remark (CMS remark), and concurrent sweep (CMS concurrent sweep).

G1

The G1 (Garbage First) garbage collector is intended for large heaps. It no longer divides memory into one contiguous new generation and one contiguous old generation as the previous collectors do. Instead, the heap is divided into many equally sized regions, and garbage collection works on regions in parallel to improve efficiency.
Each region may belong to the Eden, Survivor, or Tenured (old generation) memory area, or be unused.
The G1 garbage collector also adds a new kind of memory area, the Humongous (large object) area, which is mainly used to store large objects, that is, objects whose size exceeds 50% of the size of a region.

1. Young generation
In the G1 garbage collector, the young generation is collected with the copy algorithm: the surviving objects in the Eden and Survivor regions are copied to new Survivor regions.

2. Old generation
For the old generation, G1's collection is likewise divided into 4 stages, basically the same as in the CMS garbage collector.

The life of an object
I am an ordinary Java object. I was born in the Eden area, where I also saw my little brother, who looks very much like me, and we played there for a long time. One day the Eden area became too crowded, so I was forced to move to the "From" area (S0 area) of the Survivor area. From then on I drifted, sometimes in Survivor's "From" area (S0 area), sometimes in Survivor's "To" area (S1 area), with no permanent residence. When I turned 18, my father said that I was an adult and it was time to make my way in society, so I went to the old generation. There are many people in the old generation, and they are all quite old. I also got to know a lot of people here. I lived in the old generation for many years (one year per GC) and was then collected.


Keep going~

Origin blog.csdn.net/qq_43243800/article/details/131724230