JVM in Detail (4) | Garbage Collection Algorithms

The previous three articles are available on the homepage.

Starting with this article, let's talk about garbage collection. It is a big topic and a very important part of application performance tuning. Being good at diagnosing JVM GC problems will not only make you stand out at work, it will also make interviewers more likely to favor you. In this GC series I will cover the common garbage collection algorithms, which include the basic algorithms as well as the more complex incremental and generational algorithms. I will then cover how the common garbage collectors work and how to tune them, focusing on CMS, ParNew, G1, and ZGC. I will also take you step by step through reading GC logs, summarize tuning approaches by application type and common scenario, and introduce several troubleshooting tools. Some of this material, such as the working process of the garbage collectors, is fairly difficult. If you have any questions, leave a comment below the article; as long as I see it, I will reply.

GC involves many details, and I will explain them thoroughly. My goal is to make you effective in practice, so in the GC articles that follow I will often show the tools I use to troubleshoot problems and the troubleshooting process itself. The articles will inevitably be longer as a result.

This chapter starts with the basic garbage collection algorithms, which are a prerequisite for understanding everything that follows.

1. Basis for judging whether an object can be recycled

There are two ways to determine whether an object can be recycled. Let’s talk about them respectively.

1. Reference counting method

This is a very old algorithm. When an object is allocated in heap memory, a little extra space is allocated with it to hold a counter. Each time a new reference points to the object, the counter is incremented by 1; each time a reference to the object is set to null or redirected to another object, the counter is decremented by 1. When the counter reaches 0, the object is reclaimed automatically.
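The counter bookkeeping described above can be sketched in a few lines of Java. This is only a toy model under stated assumptions: the names `RefCountDemo`, `Obj`, `retain`, `release` and `setField` are hypothetical, each object has a single reference field, and none of this is how the JVM manages memory. The `main` method also shows the weak spot of the approach: two objects that reference each other never reach a count of 0.

```java
// Hypothetical toy model of reference counting; not how the JVM manages memory.
final class RefCountDemo {
    static final class Obj {
        final String name;
        int refCount;      // the extra counter stored alongside the object
        Obj field;         // a single outgoing reference, to keep the sketch small
        boolean freed;
        Obj(String name) { this.name = name; }
    }

    // A new reference (for example a local variable) starts pointing at obj.
    static void retain(Obj obj) { obj.refCount++; }

    // A reference to obj goes away; reclaim the object once the counter reaches 0.
    static void release(Obj obj) {
        if (--obj.refCount == 0) {
            obj.freed = true;
            Obj child = obj.field;
            obj.field = null;
            if (child != null) release(child);   // the freed object drops its own reference
        }
    }

    // holder.field = target, adjusting the counters of the new and the old target.
    static void setField(Obj holder, Obj target) {
        if (target != null) target.refCount++;   // increment first, safe for self-assignment
        Obj old = holder.field;
        holder.field = target;
        if (old != null) release(old);
    }

    public static void main(String[] args) {
        Obj x = new Obj("x"), y = new Obj("y");
        retain(x); retain(y);              // two local variables reference x and y
        setField(x, y); setField(y, x);    // x and y now reference each other
        release(x); release(y);            // the local variables go away
        System.out.println(x.freed + " " + y.freed);   // prints "false false"
    }
}
```

Both counters stay at 1 after the local references are gone, so neither object is ever reclaimed without an additional cycle-detection step.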

2. Reachability analysis method

The reachability analysis method uses the root set (GCRoot) as its starting point. Starting from these nodes, it searches along the reference relationships; the path traveled is called a reference chain. When an object is not connected to any GCRoot by any reference chain, the object is unreachable and can be recycled. This is the method the JVM uses.
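As a rough illustration, the search from the roots can be written as an ordinary graph traversal. The sketch below assumes a simplified object model: `ReachabilityDemo`, `Node` and `reachable` are hypothetical names, and real collectors work on raw heap memory rather than on Java collections. Anything that does not end up in the returned set is unreachable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy object graph; a sketch of the idea, not JVM internals.
final class ReachabilityDemo {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();   // outgoing references
        Node(String name) { this.name = name; }
    }

    // Returns every object that can be reached from the roots by following references.
    static Set<Node> reachable(Collection<Node> roots) {
        Set<Node> seen = new LinkedHashSet<>();
        Deque<Node> work = new ArrayDeque<>();
        for (Node root : roots) {
            if (seen.add(root)) work.add(root);
        }
        while (!work.isEmpty()) {
            Node current = work.poll();
            for (Node ref : current.refs) {
                if (seen.add(ref)) work.add(ref);    // first visit: follow it later
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node root = new Node("root"), a = new Node("a"), b = new Node("b");
        Node x = new Node("x"), y = new Node("y");
        root.refs.add(a); a.refs.add(b);   // root -> a -> b
        x.refs.add(y); y.refs.add(x);      // x and y only reference each other
        Set<Node> live = reachable(List.of(root));
        System.out.println(live.contains(b) + " " + live.contains(x));   // prints "true false"
    }
}
```

Note that the mutually referencing pair `x` and `y` is simply never visited, so a cycle causes no special difficulty here.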

Fundamentally, neither algorithm is strictly better than the other; the difference is in how widely each is used, and reachability analysis is used more. The likely reasons are: 1. Reference counting has difficulty handling circular references. There is a Recycler algorithm that can solve the circular reference problem (so don't say in your next interview that it can't be solved), but it introduces unpredictable performance overhead. 2. Updating reference counts in a multi-threaded environment is expensive, because every update has to be done atomically.

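Terminology: a GCRoot is a starting point for reachability analysis. In Java, the following can serve as a GCRoot:
1. Objects referenced from the virtual machine stack (the local variable table in a stack frame)
2. Objects referenced by static class fields in the method area
3. Objects referenced by constants in the method area
4. Objects referenced by JNI (native methods) in the native method stack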

The two methods above are the most basic ways to judge whether an object can be recycled, and many algorithms, such as the marking algorithms, are built on them. Now let's look at the specific garbage collection algorithms.

2. Basic garbage collection algorithm

What we will introduce here are three basic garbage collection algorithms. These are the basis for understanding the working principle of the garbage collector later.

(1) Mark and clear (mark-sweep) algorithm

This algorithm is divided into two phases: marking phase and clearing phase. The task of the marking phase is to mark all objects that need to be recycled, and the clearing phase is to recover the space occupied by the marked objects. The process is as follows:

We can also see from the figure that the mark and clear algorithm has a serious memory fragmentation problem, which will lead to low memory allocation efficiency when creating objects, and too many memory fragments will also lead to additional garbage collection actions.
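The fragmentation effect can be reproduced with a toy heap. In the sketch below the heap is an `int[]` in which each cell stores the id of the object occupying it (0 means free), and the result of the marking phase is passed in as the set of surviving ids. The names `MarkSweepDemo`, `sweep`, `totalFree` and `largestFreeRun` are hypothetical and exist only for this illustration.

```java
import java.util.Set;

// Toy heap: heap[i] is the id of the object occupying cell i, or 0 if the cell is free.
final class MarkSweepDemo {
    // Clearing phase: free every cell whose object was not marked as surviving.
    static void sweep(int[] heap, Set<Integer> liveIds) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0 && !liveIds.contains(heap[i])) {
                heap[i] = 0;
            }
        }
    }

    static int totalFree(int[] heap) {
        int free = 0;
        for (int cell : heap) {
            if (cell == 0) free++;
        }
        return free;
    }

    // Longest contiguous run of free cells, i.e. the largest object that still fits.
    static int largestFreeRun(int[] heap) {
        int best = 0, run = 0;
        for (int cell : heap) {
            run = (cell == 0) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 1, 2, 2, 3, 3, 4, 4};   // four objects, two cells each
        sweep(heap, Set.of(1, 3));                // objects 2 and 4 are garbage
        // prints "4 free cells, largest run 2"
        System.out.println(totalFree(heap) + " free cells, largest run " + largestFreeRun(heap));
    }
}
```

Four cells are free after the sweep, yet an object that needs three contiguous cells cannot be allocated, which is exactly the situation that forces an extra collection.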

(2) Mark compression (mark-compact) algorithm

This algorithm solves the memory fragmentation problem described above. After marking, the surviving objects are moved to one end of memory, and the space beyond them is then cleared. The process is as follows:
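A minimal sketch of the moving step, using a toy heap in which each `int` cell stores the id of the object occupying it and 0 means free. `MarkCompactDemo` and `compact` are hypothetical names, and the set of surviving ids stands in for the result of the marking phase. The method slides survivors to the low end of the heap and returns each survivor's new start address, which is the information a collector needs in order to update references.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy heap: heap[i] is the id of the object occupying cell i, or 0 if the cell is free.
final class MarkCompactDemo {
    // Slides surviving objects to the start of the heap.
    // Returns object id -> new start address, needed to fix up references afterwards.
    static Map<Integer, Integer> compact(int[] heap, Set<Integer> liveIds) {
        Map<Integer, Integer> newAddress = new LinkedHashMap<>();
        int free = 0;                                // next cell to fill; never ahead of i
        for (int i = 0; i < heap.length; i++) {
            int id = heap[i];
            if (id != 0 && liveIds.contains(id)) {
                newAddress.putIfAbsent(id, free);    // the first cell gives the new address
                heap[free++] = id;
            }
        }
        Arrays.fill(heap, free, heap.length, 0);     // everything past the survivors is free
        return newAddress;
    }

    public static void main(String[] args) {
        int[] heap = {1, 1, 2, 2, 3, 3, 4, 4};
        Map<Integer, Integer> moved = compact(heap, Set.of(1, 3));
        System.out.println(Arrays.toString(heap));   // prints "[1, 1, 3, 3, 0, 0, 0, 0]"
        System.out.println(moved);                   // prints "{1=0, 3=2}"
    }
}
```

After compaction all free cells form one contiguous block, so allocation can go back to simply bumping a pointer.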

Let's hold off on the advantages and disadvantages of this algorithm and look at one more algorithm first.

(3) Mark copy (copying) algorithm

This algorithm divides memory into two blocks and uses only one of them at a time. Marking happens only in the block currently in use. All surviving objects are then copied to the other block, and the previously used block is cleared in one pass. The process is as follows:
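The two-block scheme can be sketched as follows. The class `CopyingDemo`, its fields `from` and `to`, and the method `collect` are hypothetical names for this illustration; cells hold object ids with 0 meaning free, and the set of surviving ids stands in for the marking result.

```java
import java.util.Arrays;
import java.util.Set;

// Toy two-block heap: only `from` is used for allocation, `to` stays empty between collections.
final class CopyingDemo {
    int[] from;
    int[] to;

    CopyingDemo(int[] initialContents) {
        from = initialContents.clone();
        to = new int[initialContents.length];    // the half that is "wasted" while idle
    }

    // Copies survivors into the empty block, clears the old block, then swaps the roles.
    // Returns the index of the first free cell in the new active block.
    int collect(Set<Integer> liveIds) {
        int free = 0;
        for (int id : from) {
            if (id != 0 && liveIds.contains(id)) {
                to[free++] = id;                 // survivors end up packed together
            }
        }
        Arrays.fill(from, 0);                    // the whole old block is released at once
        int[] tmp = from;
        from = to;
        to = tmp;
        return free;
    }

    public static void main(String[] args) {
        CopyingDemo heap = new CopyingDemo(new int[]{1, 1, 2, 2, 3, 3, 4, 4});
        int free = heap.collect(Set.of(1, 3));
        // prints "[1, 1, 3, 3, 0, 0, 0, 0] next free cell: 4"
        System.out.println(Arrays.toString(heap.from) + " next free cell: " + free);
    }
}
```

The cost of a collection is proportional to the number of survivors rather than to the size of the block, which is why this approach pays off when few objects survive.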

Let's look at the mark compression and mark copy algorithms together. Both solve the memory fragmentation problem, but the mark copy algorithm wastes half of the memory, so its memory utilization is low. Does that mean the mark compression algorithm suits every scenario? Obviously not. If many objects in the memory region survive, mark compression has to move a large number of objects and update the references to them, which cuts into the execution time available to user threads. (The mark copy algorithm has the same problem when many objects survive.)

3. Improvements to the garbage collection algorithm

The two garbage collection algorithms introduced below will improve the shortcomings of the basic algorithm such as memory fragmentation, long pause times, and low space utilization.

(1) Generational algorithm

The generational algorithm is based on the hypothesis that most objects die young, shortly after they are created. This hypothesis has been confirmed in practice and is not hard to picture: in most OLTP applications, an object is recycled soon after it is created. The generational algorithm sorts objects into several generations and uses a different GC algorithm for each. Newly created objects are called new generation objects, and GC performed on them is called new generation GC (minor GC). Objects that reach a certain age are called old generation objects, and GC performed on them is called old generation GC (major GC). Moving a new generation object into the old generation is called promotion. More generations are not always better, though; overall, two or three generations work best.

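This introduces two new terms, OLTP and OLAP, which are two different types of application. OLAP applications are generally data-analysis applications, while OLTP applications are ordinary real-time business applications. JVM tuning differs between the two. Just remember the terms for now; we will cover tuning in detail later.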

Old generation GC is not executed until the old generation space has been filled by objects promoted during new generation GC, so it runs less often than new generation GC. Using generational garbage collection therefore reduces the time spent on GC.
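Aging and promotion can be sketched with two lists. Everything here is an assumption made for illustration: the names `GenerationalDemo`, `Obj`, `allocate` and `minorGc`, the promotion threshold of 2, and the `isLive` predicate that stands in for a real marking phase.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Toy generational heap: objects age in the new generation and are promoted when old enough.
final class GenerationalDemo {
    static final int PROMOTION_AGE = 2;          // hypothetical threshold for this sketch

    static final class Obj {
        final String name;
        int age;                                 // number of minor GCs survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj obj = new Obj(name);
        young.add(obj);                          // every new object starts in the new generation
        return obj;
    }

    // Minor GC: only the new generation is examined; the old generation is left alone.
    void minorGc(Predicate<Obj> isLive) {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj obj = it.next();
            if (!isLive.test(obj)) {
                it.remove();                     // died young: reclaimed right away
            } else if (++obj.age >= PROMOTION_AGE) {
                it.remove();
                old.add(obj);                    // promotion to the old generation
            }
        }
    }

    public static void main(String[] args) {
        GenerationalDemo heap = new GenerationalDemo();
        heap.allocate("session");
        heap.allocate("tmp1");
        heap.minorGc(o -> o.name.equals("session"));   // tmp1 dies, session reaches age 1
        heap.allocate("tmp2");
        heap.minorGc(o -> o.name.equals("session"));   // tmp2 dies, session is promoted
        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size());   // prints "young=0 old=1"
    }
}
```

The short-lived objects never reach the old generation at all, which is why the old generation fills slowly and needs to be collected less often.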

(2) Incremental algorithm

The incremental algorithm mainly controls STW (stop the world, i.e. application pause) time through concurrency: the garbage collection threads and the user threads run concurrently. This is possible because the JVM uses reachability analysis, and a marking algorithm based on reachability analysis is generally divided into several phases, not all of which need to stop the user threads. The phases that do not need to stop the user threads can simply run concurrently with them. (If the incremental idea is not clear yet, don't worry; it will make more sense when we discuss the garbage collectors later.)

4. Problems and solutions caused by using the improved garbage collection algorithm

These improvements come at a cost, because algorithms of this kind are essentially a trade-off between time and space: to run faster, some memory must be sacrificed, and to use less memory, some speed must be given up.

(1) The object reference relationship changes during the marking process

Let's first look at the marking process in a marking algorithm. It can be summarized abstractly with the three-color algorithm, which divides objects into three categories according to how far marking has progressed. White: the object has not been marked by the garbage collector. Gray: the object itself has been marked, but its member variables have not. Black: the object has been marked, and so have all of its member variables.

At the beginning of GC, all objects are white (state 1). Scanning starts from the GCRoot: A in the figure is marked gray and E is marked black (state 2). Scanning then continues from the gray objects (black objects are not scanned again): the gray object A is marked black and the B it references is marked gray (state 3), and the scan keeps repeating from the gray objects. In the end, only D is never scanned, so D is garbage and is reclaimed during the clearing phase.
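The walk-through above can be sketched as a small marking loop over the same objects, where the roots reference A and E, A references B, B references C, and nothing references D. The names `TriColorDemo`, `Node`, `mark` and `shade` are hypothetical, and the order in which gray objects are processed is a simplification, so the intermediate states may differ slightly from the figure while the final colors are the same.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

// Toy three-color marking; a sketch of the idea, not JVM internals.
final class TriColorDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        final String name;
        Color color = Color.WHITE;               // every object starts white
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // White -> gray: the object is marked, but its fields still need scanning.
    static void shade(Node node, Deque<Node> gray) {
        if (node.color == Color.WHITE) {
            node.color = Color.GRAY;
            gray.add(node);
        }
    }

    static void mark(Collection<Node> roots) {
        Deque<Node> gray = new ArrayDeque<>();
        for (Node root : roots) shade(root, gray);
        while (!gray.isEmpty()) {
            Node current = gray.poll();
            for (Node ref : current.refs) shade(ref, gray);
            current.color = Color.BLACK;         // all fields scanned: gray -> black
        }
    }

    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        Node d = new Node("D"), e = new Node("E");
        a.refs.add(b);
        b.refs.add(c);
        mark(List.of(a, e));
        for (Node n : List.of(a, b, c, d, e)) {
            System.out.println(n.name + " " + n.color);   // only D is still WHITE
        }
    }
}
```

Whatever is still white when the gray set is empty is garbage.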

Question 1: Suppose that in state 3, just as the JVM is about to take the next marking step, the reference from A to B is removed. The next step still starts from B, marking B black and C gray, so both B and C end up marked. Yet it is easy to see from the figure that once A no longer references B, B and C are unreachable from the GCRoot. Because they have been marked, B and C cannot be recycled in this round of collection, so they become floating garbage.

Question 2: Another situation arises at the end of state 2. Before the JVM takes the next step, E starts referencing B and the reference from A to B is removed. B and C will then never be scanned, because scanning continues only from gray objects and E is already black. B and C are treated as garbage and reclaimed even though they are still reachable, which is very serious because it affects the correctness of the program. To handle reference changes like these, the JVM introduces read and write barriers. The scenario in which they come into play is as follows.

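The code for the scenario in Question 2 looks like this:
void test(A a, E e) {
    B b = a.b;     // read the reference to B out of A
    a.b = null;    // remove the reference from A to B
    e.b = b;       // the already-black E now references B
}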

Specifically: (1) Write barrier: when e.b = b executes, mark the e object or the b object gray so that it is scanned in a later pass. (2) Read barrier: when a.b is read into B b, immediately mark the object that a.b refers to gray, which keeps the scan accurate. The write-barrier approach of recording the newly added reference is also called incremental update.
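To make the write barrier concrete, the sketch below replays the scenario from Question 2 twice, once without and once with a barrier. It is a simplified model under stated assumptions: `WriteBarrierDemo`, `writeField`, `markStep` and `replay` are hypothetical names, every object has a single reference field `b`, and the barrier chooses to mark the newly referenced object gray (the text allows marking either e or b). A real JVM emits the barrier as part of compiled code rather than as a method call.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy replay of Question 2 with an incremental-update style write barrier.
final class WriteBarrierDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        final String name;
        Color color = Color.WHITE;
        Node b;                                  // single reference field, like a.b and e.b
        Node(String name) { this.name = name; }
    }

    final Deque<Node> grayQueue = new ArrayDeque<>();
    final boolean barrierEnabled;

    WriteBarrierDemo(boolean barrierEnabled) { this.barrierEnabled = barrierEnabled; }

    void shade(Node node) {
        if (node.color == Color.WHITE) {
            node.color = Color.GRAY;
            grayQueue.add(node);
        }
    }

    // Every reference store goes through here, standing in for the barrier.
    void writeField(Node holder, Node value) {
        holder.b = value;
        if (barrierEnabled && value != null && holder.color == Color.BLACK) {
            shade(value);                        // a black object gained a new reference
        }
    }

    // One marking step: scan a gray object's field, then turn the object black.
    void markStep() {
        Node node = grayQueue.poll();
        if (node == null) return;
        if (node.b != null) shade(node.b);
        node.color = Color.BLACK;
    }

    // Replays Question 2 and returns B's final color.
    static Color replay(boolean barrierEnabled) {
        WriteBarrierDemo gc = new WriteBarrierDemo(barrierEnabled);
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), e = new Node("E");
        a.b = b;                                 // initial graph: A -> B -> C, E references nothing
        b.b = c;
        gc.shade(e);                             // both E and A are referenced from the roots
        gc.shade(a);
        gc.markStep();                           // E turns black, A is still gray (state 2)
        Node tmp = a.b;                          // user thread: B b = a.b
        gc.writeField(a, null);                  //              a.b = null
        gc.writeField(e, tmp);                   //              e.b = b
        while (!gc.grayQueue.isEmpty()) gc.markStep();
        return b.color;
    }

    public static void main(String[] args) {
        System.out.println("without barrier: B is " + replay(false));   // WHITE, would be wrongly reclaimed
        System.out.println("with barrier:    B is " + replay(true));    // BLACK, correctly kept
    }
}
```

Without the barrier B and C stay white although E still references B; with it, the store into the black object E puts B back on the gray queue and both are marked.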

(2) Cross-generational references

Generations have a cost as well. An object in the new generation may be referenced not only from a GCRoot but also from an object in the old generation, as shown in the figure. To know whether B can be recycled, you would have to scan the old generation, but then generations would lose their point. The JVM must be able to collect the new generation safely without scanning the old generation; otherwise it would be better not to have generations at all.

In the figure above, we can treat C as a GCRoot so that every scan also starts from C, or mark C gray (with a write barrier). To treat C as a GCRoot, the JVM records the old generation object in a collection called the Remembered Set whenever a cross-generation reference is created, and then uses the objects in the Remembered Set as GCRoots during scanning. This avoids having to scan the entire old generation.
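The bookkeeping can be sketched as follows. This is a simplified model, assuming that each object carries a flag saying which generation it lives in; `RememberedSetDemo`, `writeRef` and `liveYoung` are hypothetical names, and real collectors typically track memory regions rather than individual objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy Remembered Set: old objects that reference the new generation act as extra roots.
final class RememberedSetDemo {
    static final class Obj {
        final String name;
        final boolean old;                       // true if the object lives in the old generation
        final List<Obj> refs = new ArrayList<>();
        Obj(String name, boolean old) { this.name = name; this.old = old; }
    }

    final Set<Obj> rememberedSet = new LinkedHashSet<>();

    // Write barrier: record old objects that gain a reference into the new generation.
    void writeRef(Obj holder, Obj target) {
        holder.refs.add(target);
        if (holder.old && !target.old) rememberedSet.add(holder);
    }

    // Marking for a new generation GC: start from the roots plus the Remembered Set.
    Set<Obj> liveYoung(Collection<Obj> roots) {
        Set<Obj> live = new LinkedHashSet<>();
        Deque<Obj> work = new ArrayDeque<>();
        for (Obj root : roots) visit(root, live, work);
        for (Obj holder : rememberedSet) {
            for (Obj target : holder.refs) visit(target, live, work);
        }
        while (!work.isEmpty()) {
            for (Obj target : work.poll().refs) visit(target, live, work);
        }
        return live;
    }

    private static void visit(Obj obj, Set<Obj> live, Deque<Obj> work) {
        if (!obj.old && live.add(obj)) work.add(obj);   // old objects are never traversed here
    }

    public static void main(String[] args) {
        RememberedSetDemo heap = new RememberedSetDemo();
        Obj c = new Obj("C", true);                     // old generation object
        Obj a = new Obj("A", false), b = new Obj("B", false), d = new Obj("D", false);
        heap.writeRef(c, b);                            // cross-generation reference C -> B
        Set<Obj> live = heap.liveYoung(List.of(a));     // A is referenced from a root
        System.out.println(live.contains(b) + " " + live.contains(d));   // prints "true false"
    }
}
```

B survives because C was recorded when the reference was written, and no other old generation object had to be examined.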


Origin blog.csdn.net/weixin_54542328/article/details/134875380