[jvm series-09] The underlying principles and algorithms of garbage collection and the basic use of JProfiler

JVM series index

Content and link address:
[1] Getting to know virtual machines and the Java virtual machine https://blog.csdn.net/zhenghuishengq/article/details/129544460
[2] The JVM class loading subsystem and the basic use of jclasslib https://blog.csdn.net/zhenghuishengq/article/details/129610963
[3] Runtime data areas, thread-private part: the virtual machine stack, program counter, and native method stack https://blog.csdn.net/zhenghuishengq/article/details/129684076
[4] Runtime data areas, shared part: the heap and escape analysis https://blog.csdn.net/zhenghuishengq/article/details/129796509
[5] Runtime data areas, shared part: the method area and constant pool https://blog.csdn.net/zhenghuishengq/article/details/129958466
[6] Object instantiation, memory layout, and access positioning https://blog.csdn.net/zhenghuishengq/article/details/130057210
[7] Execution engine, interpreter, and JIT compiler https://blog.csdn.net/zhenghuishengq/article/details/130088553
[8] Mastering the underlying mechanism of String https://blog.csdn.net/zhenghuishengq/article/details/130154453
[9] The underlying principles and algorithms of garbage collection and the basic use of JProfiler https://blog.csdn.net/zhenghuishengq/article/details/130261481

1. The underlying principle of garbage collection

1. Overview of Garbage Collection

1.1, What is garbage

Garbage collection is not an invention of the Java language. As early as 1960, Lisp became the first language to use dynamic memory allocation and garbage collection. Garbage collection is nevertheless one of Java's signature capabilities, and it greatly improves development efficiency. Any garbage collector has to answer three questions: which memory needs to be reclaimed, when to reclaim it, and how to reclaim it.


Garbage: an object that no pointer in the running program refers to. Such an object is garbage that needs to be reclaimed. For example, when an object is created inside a method, the variable that refers to it is stored in a stack frame on the virtual machine stack. When the method returns, its stack frame is popped and destroyed, and the local variable that referenced the object is destroyed with it. If nothing else refers to the object at that point, it becomes garbage and waits to be reclaimed.

1.2, Why GC is needed

First, if garbage is never collected, memory will sooner or later be used up. For a large system, it is hard to keep the application running normally without GC.

If garbage in memory is not collected in time, the space occupied by these objects stays reserved until the application ends. That space cannot be used by other objects, and a memory overflow may eventually occur.

1.3, Java garbage collection mechanism

With automatic memory management in Java, developers do not need to allocate and reclaim memory by hand. This reduces the risk of memory leaks and memory overflows, frees developers from heavy memory management work, and lets them focus on business development.

However, automatic memory management is like a black box. Relying on it too much weakens a developer's ability to locate and solve problems when the program runs out of memory. Therefore, when memory overflows and memory leaks need troubleshooting, or when garbage collection becomes the bottleneck that keeps the system from reaching higher concurrency, garbage collection has to be monitored and tuned.

Garbage collection mainly targets the Java heap. In terms of frequency, the young generation is collected often, the old generation is collected less often, and the method area (permanent generation or metaspace) is rarely collected.

2. Garbage collection algorithm

A garbage collection algorithm does two main things: it finds garbage, and it removes garbage.

2.1, Garbage marking stage

The garbage marking stage finds garbage by judging whether each object is still alive. Almost all object instances are stored in the heap. Before the GC reclaims anything, it has to distinguish living objects from dead ones. Only objects marked as dead are released during garbage collection. This process is called the garbage marking stage.

There are two main ways to determine whether an object is alive: the reference counting algorithm and the reachability analysis algorithm.

2.1.1, Reference counting algorithm

This algorithm keeps a reference counter for each object. For an object A, the counter increases by 1 whenever something references A and decreases by 1 whenever such a reference goes away. When the counter reaches 0, A is no longer referenced by anything, so it is marked and can be reclaimed.

The advantages are that the implementation is relatively simple, garbage objects are easy to identify, and it is relatively efficient. The disadvantages are that every object needs a separate field for the counter, which costs space, and that the counter has to be incremented and decremented, which costs time. Most importantly, reference counting cannot deal with circular references, so Java did not choose this algorithm.

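import java.util.ArrayList;
import java.util.List;

public class A {

    // Reference to another A instance, used to build a cycle
    public A a = null;
    List<A> list = new ArrayList<>();

    public static void main(String[] args) {
        A objectA = new A();
        A objectB = new A();
        // The two objects reference each other, so both counters are incremented
        // and, under reference counting, neither object could ever be reclaimed
        objectA.a = objectB;
        objectB.a = objectA;
    }
}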
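To make the counter behaviour concrete, here is a minimal sketch that models reference counting by hand. The names `RefCountDemo`, `Counted`, `retain`, and `release` are hypothetical and exist only for this illustration; the JVM exposes no such API and does not work this way internally.

```java
// Toy model of reference counting; an illustration only, not JVM internals.
class RefCountDemo {
    // Hypothetical object that carries its own reference counter.
    static final class Counted {
        int refCount = 0;   // number of references currently pointing here
        Counted field;      // one outgoing reference to another object
    }

    // Record a new reference to target (counter + 1).
    static Counted retain(Counted target) {
        target.refCount++;
        return target;
    }

    // Record that one reference to target went away (counter - 1).
    static void release(Counted target) {
        target.refCount--;
    }

    // Builds a two-object cycle, drops the local references,
    // and returns both counters.
    static int[] countersAfterDroppingLocals() {
        Counted a = retain(new Counted());   // local variable a: count 1
        Counted b = retain(new Counted());   // local variable b: count 1
        a.field = retain(b);                 // b is now referenced twice
        b.field = retain(a);                 // a is now referenced twice

        // Simulate the method returning: both local variables disappear.
        release(a);
        release(b);
        // The cycle keeps both counters above zero, so a pure
        // reference-counting collector would never reclaim either object.
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countersAfterDroppingLocals();
        System.out.println(counts[0] + " " + counts[1]); // prints "1 1"
    }
}
```

Neither counter ever reaches 0 even though the program can no longer reach either object, which is exactly the circular reference problem described above.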

2.1.2, Reachability Analysis Algorithm

Compared with reference counting, the reachability analysis algorithm is also simple and efficient to execute, and more importantly it handles circular references correctly, which prevents the memory leaks they would otherwise cause.

[Figure: reachability analysis starting from the GC Roots; objects 1 to 4 are reachable, objects 6 to 8 are not]

As the figure above shows, the reachability analysis algorithm starts from a set of root objects (GC Roots) and searches from top to bottom to find which objects are reachable from that set. Every surviving object in memory is connected to the root set directly or indirectly: obj1 is connected to the GC Roots directly, while objects 2, 3, and 4 are connected indirectly. The path along which the search travels is called a reference chain. If no reference chain leads to an object, as with objects 6, 7, and 8, the object is unreachable, which means it is dead and can be marked as garbage.

In Java, the following are commonly used as GC Roots: method parameters and local variables, static fields, references in the string constant pool, objects held by synchronization locks, the Class objects of the primitive data types, exception objects, the local code cache, and so on.

When reachability analysis is used to decide whether memory can be reclaimed, the analysis must run on a snapshot that guarantees consistency: the reference relationships between objects must not change while the analysis is in progress, otherwise the accuracy of the result cannot be guaranteed. This snapshot requirement is exactly why STW (Stop The World) pauses are unavoidable.
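The search along reference chains described in this section can be sketched as a plain graph traversal. This is a simplified model, not HotSpot's implementation: `ReachabilityDemo`, `Node`, and `reachableFrom` are hypothetical names, and the "heap" is just a handful of Java objects linked by lists.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilityDemo {
    // Hypothetical stand-in for a heap object: a name plus outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Follows reference chains from the roots and returns every node reached.
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {          // first visit: mark it alive
                pending.addAll(current.refs);   // then follow its references
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node obj1 = new Node("obj1");
        Node obj2 = new Node("obj2");
        Node obj6 = new Node("obj6");
        Node obj7 = new Node("obj7");
        obj1.refs.add(obj2);   // root -> obj1 -> obj2
        obj6.refs.add(obj7);   // obj6 and obj7 reference each other,
        obj7.refs.add(obj6);   // but no root leads to them

        Set<Node> alive = reachableFrom(List.of(obj1));
        System.out.println(alive.contains(obj2)); // true
        System.out.println(alive.contains(obj6)); // false
    }
}
```

Because liveness is decided by whether a chain from the roots exists, the obj6/obj7 cycle is correctly treated as garbage, unlike under reference counting.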

3, Viewing GC Roots with JProfiler

There are several tools for viewing GC Roots, such as MAT. MAT is tied to Eclipse, so it is not used here. This article uses JProfiler instead: search for the JProfiler plugin in IDEA, install it, and then restart IDEA.


Besides the plugin, it is best to also install the standalone JProfiler application (an .exe installer). You can refer to a post by Huang Ying, which includes a free JProfiler download (https://blog.csdn.net/weixin_42311968/article/details/120726106). After downloading, click through the installer. I am using the free 10-day trial for now.

After the installation is complete, go back to IDEA and click the JProfiler icon in the upper right corner.


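In the dialog that appears, enter the path of the JProfiler executable, which is in the bin directory under the installation path from the previous step, then save. Clicking the icon again will then launch JProfiler directly.

//For example, my installation directory is on drive D, so I just locate JProfiler.exe under this path
D:\environment\jprofiler\jprofiler11\bin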

The next step is also covered in that post: change the JVM exit action setting as shown in the figure below.

[Figure: JProfiler settings dialog with the JVM exit action option]

After clicking OK, the following screen appears, and the tool is ready to use.

[Figure: JProfiler screen shown after clicking OK]

After a while, the following screen appears. Its contents update dynamically.

[Figure: JProfiler screen with dynamically updating data]

All memory usage can be monitored in All Objects under Live memory, which is similar to the JVisualVM feature covered earlier in this series. The earlier example of checking what exists in the string constant pool can be repeated here.

[Figure: All Objects view under Live memory]

If an OOM occurs, you can inspect the object information in the Heap Walker view to determine which objects caused the OOM and whether any large objects are present.

4. Garbage collection related algorithms

The sections above describe two ways to find garbage. The next step is to remove the garbage that was found and release the memory occupied by useless objects, so that enough memory is available for allocating new objects. In the JVM there are three common garbage collection algorithms: the mark-sweep algorithm, the copying algorithm, and the mark-compact algorithm.

4.1, Mark-Sweep Algorithm

When the available memory in the heap is exhausted, the entire program is paused, which is known as STW (Stop The World). The pause prevents new garbage from appearing while marking and sweeping are in progress.

There are two phases of work here: marking, then sweeping. Marking starts from the root nodes and marks every object that is referenced, that is, every reachable object. Sweeping is a linear traversal of the heap from beginning to end; any object that is not marked as reachable is reclaimed.

[Figure: mark-sweep on a heap; green is surviving objects, black is garbage, white is free space]

In the figure above, green represents surviving objects, black represents garbage objects, and white represents free space. The collector starts from the root nodes and checks whether each object is reachable to decide whether it needs to be reclaimed, and finally reclaims the black objects. This completes the mark-sweep algorithm.

The advantage of this algorithm is that it is simple and easy to understand. The disadvantages are also obvious. Its efficiency is low. The whole application has to be paused during GC (STW), so the user experience is poor. Most importantly, it produces a lot of memory fragmentation, so a free list has to be maintained internally. Sweeping does not actually erase the object; it records the address of the reclaimed object in the free list. When a new object is allocated later, the allocator takes a block that is large enough from the free list and overwrites the old contents with the new object.
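The two phases and the free list can be sketched on a toy heap. This is a simplified model under several assumptions: the heap is a fixed array of equally sized slots, references are slot indexes, and `MarkSweepDemo`, `Cell`, `allocate`, `mark`, and `sweep` are hypothetical names. Unlike the description above, the sketch nulls out a swept slot to keep the example easy to inspect.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class MarkSweepDemo {
    // Hypothetical heap cell: references other cells by slot index.
    static final class Cell {
        final List<Integer> refs = new ArrayList<>();
        boolean marked;
    }

    final Cell[] heap;                                  // slot i is null when free
    final Deque<Integer> freeList = new ArrayDeque<>(); // indexes of reusable slots

    MarkSweepDemo(int size) {
        heap = new Cell[size];
        for (int i = 0; i < size; i++) freeList.add(i);
    }

    // Allocation takes a slot from the free list.
    int allocate() {
        Integer slot = freeList.poll();
        if (slot == null) throw new IllegalStateException("heap exhausted");
        heap[slot] = new Cell();
        return slot;
    }

    // Phase 1: mark everything reachable from the roots.
    void mark(List<Integer> roots) {
        Deque<Integer> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Cell cell = heap[pending.pop()];
            if (cell != null && !cell.marked) {
                cell.marked = true;
                pending.addAll(cell.refs);
            }
        }
    }

    // Phase 2: linear scan; unmarked cells go back onto the free list.
    void sweep() {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            if (heap[i].marked) {
                heap[i].marked = false;   // reset for the next cycle
            } else {
                heap[i] = null;           // simplification: drop the garbage cell
                freeList.add(i);          // and record the slot as reusable
            }
        }
    }

    public static void main(String[] args) {
        MarkSweepDemo gc = new MarkSweepDemo(4);
        int a = gc.allocate(), b = gc.allocate(), c = gc.allocate();
        gc.heap[a].refs.add(c);           // a -> c, while b is garbage
        gc.mark(List.of(a));
        gc.sweep();
        System.out.println(gc.freeList);  // [3, 1]
    }
}
```

After the sweep the free slots are 3 and 1, which are not adjacent. That scattering is the fragmentation the free list has to keep track of.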

4.2, Copying Algorithm

The copying algorithm was created to address the defects of the mark-sweep algorithm, such as memory fragmentation.

The core idea is to divide the memory into two halves and use only one of them at a time. During garbage collection, the surviving objects in the half being used are copied to the unused half, everything in the half being used is then cleared, and the roles of the two halves are exchanged. This completes the garbage collection.

[Figure: copying algorithm; surviving objects are copied from the half in use to the idle half]

The copy here copies complete objects. The S0 and S1 survivor spaces in the young generation implement this algorithm, which is a classic trade of space for time. Because the surviving objects end up contiguous, later allocations can use the bump-the-pointer (pointer collision) technique directly.

The advantages of the copying algorithm are that it has no separate mark and sweep phases, it is simple to implement and fast to run, and it solves the fragmentation problem.

The disadvantages of the copying algorithm are that it needs twice the memory space, and that if many objects survive, many objects have to be copied, which hurts efficiency.
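A minimal sketch of the two-space idea follows. It assumes the set of surviving objects is handed in directly, whereas a real collector discovers survivors by tracing from the GC Roots; `CopyingDemo`, `allocate`, and `collect` are hypothetical names, and objects are modelled as strings.

```java
import java.util.Arrays;
import java.util.Set;

class CopyingDemo {
    String[] from = new String[4];   // the half currently in use
    String[] to = new String[4];     // the idle half of the same size
    int top = 0;                     // bump pointer: next free index in 'from'

    // Bump-the-pointer allocation: place the object and advance the pointer.
    void allocate(String obj) {
        if (top == from.length) throw new IllegalStateException("space full");
        from[top++] = obj;
    }

    // Copy survivors into the idle half, wipe the old half, swap roles.
    void collect(Set<String> live) {
        int newTop = 0;
        for (int i = 0; i < top; i++) {
            if (live.contains(from[i])) {
                to[newTop++] = from[i];   // survivors end up contiguous
            }
        }
        Arrays.fill(from, null);          // everything left behind is garbage
        String[] tmp = from;
        from = to;
        to = tmp;
        top = newTop;
    }

    public static void main(String[] args) {
        CopyingDemo heap = new CopyingDemo();
        heap.allocate("a");
        heap.allocate("b");
        heap.allocate("c");
        heap.collect(Set.of("a", "c"));                 // "b" is garbage
        System.out.println(Arrays.toString(heap.from)); // [a, c, null, null]
        System.out.println(heap.top);                   // 2
    }
}
```

The cost of `collect` grows with the number of survivors rather than the amount of garbage, and half of the total memory is always idle, which matches the trade-offs listed above.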

4.3, Mark-Compact Algorithm

Although the copying algorithm is efficient, it is best suited to the young generation, where most objects die soon after they are created, so few objects need to be copied and efficiency is barely affected. It is not suitable for the old generation, where most objects stay alive: copying them all would be expensive and would seriously hurt efficiency.

The mark-sweep algorithm produces fragmentation, so it is not a good fit for the old generation either, because a large object may not fit into any of the fragments. This led to a new algorithm, the mark-compact algorithm. It is an optimization built on top of mark-sweep that solves the memory fragmentation problem.

[Figure: mark-compact; surviving objects are arranged in order at one end of memory]

The core idea is to mark every object reachable from the root nodes, then arrange all surviving objects in order at one end of memory, and finally clear the garbage that is not referenced. The end result of mark-compact is equivalent to running mark-sweep and then defragmenting memory, so it is also called the mark-sweep-compact algorithm. This algorithm does not need a free list to record fragments.

The advantages of the mark-compact algorithm are that it solves the fragmentation problem of mark-sweep and the doubled memory cost of the copying algorithm.

The disadvantages of the mark-compact algorithm are that it is less efficient than the copying algorithm, references to an object have to be updated when the object is moved, and the move requires STW.
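The compaction step can be sketched as sliding survivors toward the start of an array. As with the copying sketch, the live set is passed in as an assumption instead of being computed by a marking pass, and `MarkCompactDemo` and `compact` are hypothetical names. Updating references to moved objects, one of the costs named above, is not modelled here.

```java
import java.util.Arrays;
import java.util.Set;

class MarkCompactDemo {
    // Slides surviving objects to the start of the heap and returns the new top.
    static int compact(String[] heap, Set<String> live) {
        int next = 0;                       // next free index in the compacted prefix
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live.contains(heap[i])) {
                heap[next++] = heap[i];     // move the survivor down, keeping order
            }
        }
        Arrays.fill(heap, next, heap.length, null); // clear everything past the survivors
        return next;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", null, null };
        int top = compact(heap, Set.of("a", "c", "d"));   // "b" is garbage
        System.out.println(Arrays.toString(heap)); // [a, c, d, null, null, null]
        System.out.println(top);                   // 3
    }
}
```

No second memory area is needed and the free space ends up as one contiguous block, so allocation can again bump a pointer from the returned `top` without a free list.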

4.4, Summary of the Three Algorithms

In terms of speed, the copying algorithm is the fastest, but it wastes a lot of memory. Taking into account the three indicators of speed, space overhead, and whether objects need to be moved, the mark-compact algorithm is the most balanced, but its efficiency is not satisfactory: it has one more marking phase than the copying algorithm and one more compaction phase than the mark-sweep algorithm.

5. Concepts related to garbage collection

5.1, Understanding of System.gc()

By default, calling System.gc() explicitly triggers a Full GC, which collects the old generation and the young generation at the same time and tries to release the memory occupied by discarded objects.

However, System.gc() comes with a disclaimer: the call is only a request. There is no guarantee that the garbage collector will respond, or how quickly. It may ignore the request, or it may act on it only after a long delay, so the objects you wanted cleared may be reclaimed late, and relying on the call will not reliably prevent an OOM.

Garbage collection normally runs automatically and does not need to be triggered by hand, which would be too much trouble. When benchmarking performance, however, it can be useful to call System.gc() between runs.

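/**
 * @author zhenghuisheng
 * @date : 2023/4/19
 */
public class Test {

    public static void main(String[] args) {
        // Create an object and keep no reference to it,
        // so that it is immediately eligible for collection
        new Test();
        // Ask the JVM to perform garbage collection (a request, not a guarantee)
        System.gc();
        // Ask the JVM to run finalize() on objects pending finalization
        //System.runFinalization();
    }

    // Called by the garbage collector if and when it reclaims this object
    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println("Garbage collection was triggered");
    }
}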

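As the code above shows, after System.gc() the message is sometimes printed and sometimes not, because the collection is only requested. System.runFinalization() does not force a Full GC either: it asks the JVM to run the finalize() methods of objects that are already pending finalization. Note that finalize() has been deprecated since Java 9, so this code is only suitable for experiments.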

5.2, memory overflow and memory leak

5.2.1, Out of Memory (OOM)

GC technology has been continuously improved, so under normal circumstances OOM does not occur unless the memory used by the application grows so fast that garbage collection cannot keep up with consumption. The GC collects the various generations, and a Full GC is attempted before an OOM is raised. That Full GC usually reclaims a large amount of memory; if memory is still insufficient afterwards, the OOM occurs. The Java documentation explains OutOfMemoryError as follows: there is no free memory, and the garbage collector cannot provide more.

Memory may be insufficient because the heap size configured for the Java virtual machine is too small, because a large number of large objects have been created and are still referenced so they cannot be reclaimed, or because a single object is larger than the maximum heap size.

5.2.2, Memory leaks

A memory leak means that objects are no longer used by the program but the GC cannot reclaim them. For example, some static objects have a very long life cycle, and if they hold references to objects that are used only once, those objects can never be reclaimed. Resource objects that are never closed, such as MySQL connections, cause the same kind of leak.
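The static-object case can be sketched in a few lines. `LeakDemo`, `CACHE`, and `handleRequest` are hypothetical names chosen for this illustration; the point is only that a long-lived static collection keeps short-lived objects reachable.

```java
import java.util.ArrayList;
import java.util.List;

class LeakDemo {
    // A static collection lives as long as its class, so everything it
    // references stays reachable from the GC Roots.
    static final List<byte[]> CACHE = new ArrayList<>();

    // Each call needs its buffer only briefly but never removes it again.
    static void handleRequest() {
        byte[] buffer = new byte[1024];   // only needed during this call
        CACHE.add(buffer);                // the leak: still reachable afterwards
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequest();
        }
        // None of the 1000 buffers can be collected while CACHE references them.
        System.out.println(CACHE.size()); // 1000
    }
}
```

If `handleRequest` kept being called, the heap would eventually fill up with buffers nobody uses, and the leak would end in an OutOfMemoryError.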

5.3, Stop The World

Abbreviated as STW, it means that when a GC event is triggered, the application pauses: all user threads are suspended and nothing responds, which feels like the program has frozen. The pause is needed because the analysis has to run on a consistent snapshot. If object references kept changing during the analysis, the accuracy of the final result could not be guaranteed.

Application threads interrupted by STW resume after the GC finishes. However, frequent interruptions make the application feel sluggish and hurt the user experience, so STW is a main focus of later optimization. STW is initiated and completed automatically in the background, and it stops all of the user's normal threads without the user being aware of it. For this reason, System.gc() should be used sparingly in development, because it can easily trigger an STW pause.

5.4, References

Some objects should stay in memory while there is enough space but may be discarded if memory is still tight after garbage collection. To describe such objects, references are divided into four kinds of decreasing strength: strong, soft, weak, and phantom.

Strong Reference: the ordinary reference assignment that is ubiquitous in code, such as assigning an object created with new to a variable. As long as the strong reference still exists, the garbage collector will never reclaim the referenced object. Strongly referenced objects are reachable from the GC Roots, and strong references are also one of the main causes of memory leaks.

StringBuffer sbu = new StringBuffer("zhenghuisheng");

Soft Reference (SoftReference): just before the system would throw an out-of-memory error, softly referenced objects are included in the scope of a second collection. If there is not enough space after the first collection (which reclaims objects unreachable from the GC Roots), the second collection reclaims these objects; if memory is still insufficient after that, an out-of-memory error is thrown. Caches are a typical use of soft references.

//Requires: import java.lang.ref.SoftReference;
//Declare a strong reference
Object obj = new Object();
//Create a soft reference to the same object
SoftReference<Object> sf = new SoftReference<>(obj);
//Drop the strong reference so that only the soft reference remains
obj = null;

Weak Reference (WeakReference): an object that is referenced only by weak references survives only until the next garbage collection. When the garbage collector runs, the object is reclaimed regardless of whether memory is sufficient. Weak references can also be used for caches.

//Requires: import java.lang.ref.WeakReference;
//Declare a strong reference
Object obj = new Object();
//Create a weak reference to the same object
WeakReference<Object> sf = new WeakReference<>(obj);
//Drop the strong reference so that only the weak reference remains
obj = null;

Phantom Reference (PhantomReference): whether an object has a phantom reference has no effect on its lifetime, and it is impossible to obtain the object through a phantom reference. The only purpose of associating a phantom reference with an object is to receive a system notification when the object is reclaimed by the collector.

//Requires: import java.lang.ref.PhantomReference; import java.lang.ref.ReferenceQueue;
//Declare a strong reference
Object obj = new Object();
//Reference queue that receives the notification
ReferenceQueue<Object> referenceQueue = new ReferenceQueue<>();
//Create a phantom reference registered with the queue
PhantomReference<Object> sf = new PhantomReference<>(obj, referenceQueue);
//Drop the strong reference
obj = null;

To sum up: strongly referenced objects are never reclaimed, softly referenced objects are reclaimed when memory is insufficient, weakly referenced objects are reclaimed as soon as the collector finds them, and phantom references are used to track object reclamation.
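The differences can be observed with the java.lang.ref classes directly. The sketch below separates what is deterministic (how each reference resolves while a strong reference still exists) from what depends on the collector (what happens after System.gc()); `ReferenceDemo` and `resolveWhileStronglyHeld` are hypothetical names, and no output is claimed for the GC-dependent lines.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceDemo {
    // Returns {soft resolves, weak resolves, phantom resolves}
    // while the target is still strongly referenced.
    static boolean[] resolveWhileStronglyHeld() {
        Object target = new Object();                        // strong reference
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(target);
        WeakReference<Object> weak = new WeakReference<>(target);
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);
        return new boolean[] {
            soft.get() == target,     // soft references hand out the referent
            weak.get() == target,     // so do weak references
            phantom.get() != null     // phantom references never do
        };
    }

    public static void main(String[] args) {
        boolean[] r = resolveWhileStronglyHeld();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true true false

        Object target = new Object();
        WeakReference<Object> weak = new WeakReference<>(target);
        target = null;   // drop the strong reference
        System.gc();     // only a request; the JVM may or may not collect now
        // Often null at this point, but that is not guaranteed.
        System.out.println("weak after gc: " + weak.get());
    }
}
```

PhantomReference.get() always returns null by design, which is why a ReferenceQueue is needed to learn that the object was reclaimed.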

If you reprint, please attach the reprint link address: https://blog.csdn.net/zhenghuishengq/article/details/130261481
