[JVM] (3) In-depth understanding of JVM garbage collection mechanism (GC)


Foreword

The JVM's garbage collection (GC) mechanism is one of Java's most important features. It automatically reclaims memory that is no longer in use while the program runs, which removes the need for manual memory management and helps avoid memory leaks. Its design and implementation are crucial to the efficiency and stability of Java programs, so a competent programmer needs a solid understanding of it in order to design and write efficient code.

1. How to determine that an object is dead

In the JVM's garbage collection mechanism, determining whether an object is dead is essential, because only dead objects can be reclaimed by the garbage collector to release the memory they occupy. The two commonly used approaches are the reference counting algorithm and the reachability analysis algorithm.

1.1 Reference counting algorithm

The reference counting algorithm is a simple garbage collection algorithm. Its basic idea is to maintain a counter for each object that records how many references currently point to it. When the counter drops to zero, nothing references the object any more, so it is considered dead and can be reclaimed.

Reference counting is simple to implement and the check is cheap, so it works well in many cases. For example, CPython uses it as its primary memory management mechanism (supplemented by a separate cycle detector). Its biggest problem is circular references.

For example, the following code demonstrates a circular reference:

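class Test {
    // Reference to another Test instance
    public Test test;
}

public class Demo {
    public static void main(String[] args) {
        Test test1 = new Test();
        Test test2 = new Test();
        // Make the two objects refer to each other
        test1.test = test2;
        test2.test = test1;
        // Drop the external references; the objects now only refer to each other
        test1 = null;
        test2 = null;
    }
}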

The code above is a simple case of circular reference. The field test in class Test is a reference type that points to another Test object. In the main function, two Test objects are created, test1 and test2, and each is then assigned to the other's test field.

If reference counting is used, test1 and test2 refer to each other, so after the local variables are set to null each object's counter is still 1 rather than zero. According to the principle of the algorithm, even though the program can no longer use these two objects, their counters never reach zero, so they are never reclaimed by the garbage collector.
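
To make the counting concrete, here is a minimal sketch that replays the steps of Demo.main with hand-maintained counters. The Node class and its refCount field are hypothetical stand-ins for illustration only; the JVM does not keep such counters.

```java
// Toy model of reference counting; this is NOT how the JVM manages memory.
class RefCountDemo {
    // Hypothetical node type with a manually maintained counter.
    static class Node {
        int refCount = 0;   // number of references currently pointing at this node
        Node field;         // plays the role of Test.test
    }

    // Replays the steps of Demo.main and returns the two final counters.
    static int[] simulate() {
        Node a = new Node(); a.refCount++;   // test1 = new Test()
        Node b = new Node(); b.refCount++;   // test2 = new Test()
        a.field = b; b.refCount++;           // test1.test = test2
        b.field = a; a.refCount++;           // test2.test = test1
        a.refCount--;                        // test1 = null
        b.refCount--;                        // test2 = null
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = simulate();
        // Both counters stay at 1, so a pure reference counter never frees the pair.
        System.out.println(counts[0] + " " + counts[1]);   // prints "1 1"
    }
}
```

Each counter ends at 1 because the only remaining reference to each object comes from the other object in the cycle.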

1.2 Reachability analysis algorithm

To solve the circular reference problem, the JVM uses the more sophisticated reachability analysis algorithm. Its basic idea is to take a set of objects called GC Roots as starting points and search downward along the chains of object references to determine whether each object is reachable.

What reachability means:

  • An object is reachable if it can be reached from a GC Roots object through a chain of references.
  • If an object is unreachable, that is, no chain of references connects it to any GC Roots object, it is judged to be dead and can be reclaimed by the garbage collector.

GC Roots in the JVM include:

  • Class objects of loaded classes;
  • Objects referenced by static variables of classes;
  • Objects referenced from the stack frames of active threads (for example, local variables and parameters);
  • Objects referenced through JNI (Java Native Interface) in the native method stack.

Through the reachability analysis algorithm, the JVM can accurately judge the life cycle of the object, avoid the circular reference problem of the reference counting algorithm, and effectively perform garbage collection.
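
The following sketch illustrates the idea on a tiny object graph. It is a simplified model, not the JVM's implementation: the Obj class, its single ref field and the reachable method are assumptions made for this example. It shows that two objects that only reference each other are not found when the search starts from the roots.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy reachability analysis over a hand-built object graph.
class ReachabilityDemo {
    // Hypothetical object with one outgoing reference, like Test.test.
    static class Obj {
        final String name;
        Obj ref;
        Obj(String name) { this.name = name; }
    }

    // Returns every object reachable from the given (non-null) roots.
    static Set<Obj> reachable(List<Obj> roots) {
        Set<Obj> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            // Visit each object once and follow its reference if it has one.
            if (seen.add(o) && o.ref != null) {
                pending.push(o.ref);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Obj root = new Obj("root");   // stands in for a GC root, e.g. a static field
        Obj live = new Obj("live");
        root.ref = live;

        Obj a = new Obj("a");         // a and b only reference each other
        Obj b = new Obj("b");
        a.ref = b;
        b.ref = a;

        Set<Obj> alive = reachable(Collections.singletonList(root));
        System.out.println(alive.contains(live));   // true
        System.out.println(alive.contains(a));      // false: the cycle is unreachable
    }
}
```

Because the traversal never visits a or b, a collector based on reachability would treat both as dead even though each still holds a reference to the other.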

2. Garbage collection algorithm

The garbage collection algorithm is an important part of the JVM's garbage collection mechanism. It reclaims objects that are no longer used, releases memory, and so helps keep the program efficient and stable. Common garbage collection algorithms in the JVM include the mark-sweep algorithm, the copying algorithm, the mark-compact algorithm and the generational algorithm.

2.1 Mark-Sweep Algorithm

The mark-sweep algorithm is one of the most basic garbage collection algorithms. It runs in two phases: a mark phase and a sweep phase.

  1. Mark phase: Starting from a set of objects called GC Roots, traverse all reachable objects and mark them as live, meaning they will not be reclaimed.
  2. Sweep phase: Traverse the entire heap, treat unmarked objects as garbage, and reclaim their memory in place. The reclaimed memory forms discontinuous free blocks, which can cause memory fragmentation.
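
The sweep phase and the fragmentation it leaves behind can be sketched on a toy heap. This is only an illustration under strong assumptions: the heap is modelled as an array of fixed-size slots, the mark bits are given as input, and the names sweep and largestFreeRun are made up for the example.

```java
import java.util.Arrays;

// Toy model only: a real heap holds variable-sized objects, not fixed slots.
class MarkSweepDemo {
    // Sweep phase: frees every slot the mark phase did not reach.
    // heap[i] is an object name (null = free); marked[i] is its mark bit.
    static String[] sweep(String[] heap, boolean[] marked) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!marked[i]) {
                result[i] = null;   // reclaimed in place; survivors do not move
            }
        }
        return result;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F" };
        boolean[] marked = { true, false, true, false, false, true };
        String[] after = sweep(heap, marked);
        System.out.println(Arrays.toString(after));   // [A, null, C, null, null, F]
        System.out.println(largestFreeRun(after));    // 2, although 3 slots are free
    }
}
```

Three slots are free after the sweep, but an allocation that needs three adjacent slots would still fail, which is the fragmentation problem in miniature.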

2.2 Copying Algorithm

The copying algorithm was designed to solve the memory fragmentation problem of the mark-sweep algorithm. It divides the heap into two equally sized regions and uses only one of them at a time. Its process has three phases: marking, copying and role swapping.

  1. Mark phase: As in the mark-sweep algorithm, start from the GC Roots objects, traverse all reachable objects, and mark the live ones.
  2. Copy phase: Copy all live objects from one region to the other. The copied objects are laid out contiguously, so there is no memory fragmentation.
  3. Role swap: After the copy is completed, the two regions swap roles. The region that previously held the objects becomes the new free region, and the previously free region becomes the new working region.
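
Here is a minimal sketch of the copy and role-swap steps, using the same kind of toy slot array. The method copySurvivors and the array-based "spaces" are assumptions for illustration; a real collector copies objects of different sizes and also updates every reference to them.

```java
import java.util.Arrays;

// Toy semispace copy: the mark bits are given, the objects are just names.
class CopyingDemo {
    // Copies the marked objects of fromSpace to the start of a fresh to-space.
    static String[] copySurvivors(String[] fromSpace, boolean[] marked) {
        String[] toSpace = new String[fromSpace.length];
        int next = 0;   // bump pointer: next free slot in to-space
        for (int i = 0; i < fromSpace.length; i++) {
            if (marked[i]) {
                toSpace[next++] = fromSpace[i];
            }
        }
        return toSpace;
    }

    public static void main(String[] args) {
        String[] from = { "A", "B", "C", "D", "E", "F" };
        boolean[] marked = { true, false, true, false, false, true };

        String[] to = copySurvivors(from, marked);
        Arrays.fill(from, null);   // the whole old region is free again

        // Role swap: "to" is now the working region, "from" the free region.
        System.out.println(Arrays.toString(to));     // [A, C, F, null, null, null]
        System.out.println(Arrays.toString(from));   // [null, null, null, null, null, null]
    }
}
```

The survivors end up packed at the start of the new region, so the free space behind them is one contiguous block. The price is that half of the memory is always unused.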

Today's commercial virtual machines, including HotSpot, all use this collection algorithm to reclaim objects in the young generation.

  • About 98% of the objects in the young generation are short-lived ("born in the morning, dead by evening"), so the memory does not need to be split 1 : 1. Instead, the young generation is divided into one larger Eden space and two smaller Survivor spaces.
  • At any time, Eden and one of the two Survivor spaces are in use (one Survivor space is called From (S0) and the other To (S1)).
  • During collection, the surviving objects in Eden and the in-use Survivor space are copied to the other Survivor space in one pass, and then Eden and the Survivor space that was just in use are cleared.
  • When the Survivor space is not large enough, other memory (such as the old generation) is relied on as an allocation guarantee.

About HotSpot:

In HotSpot, the default Eden to Survivor size ratio is 8 : 1, that is, Eden : Survivor From : Survivor To = 8 : 1 : 1. The memory available for allocation in the young generation is therefore 90% of its total capacity (Eden plus one Survivor space), and the remaining 10% is reserved to hold the objects that survive a collection.

HotSpot is the most widely used implementation of the Java Virtual Machine (JVM) on the Java platform. It is developed by Oracle (formerly Sun Microsystems) as part of the Java Development Kit (JDK), and it is the default JVM implementation in OpenJDK.
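
As a quick worked example of the 8 : 1 : 1 split, the sketch below divides a hypothetical 100 MB young generation. The split method is made up for this illustration; HotSpot exposes the ratio through the -XX:SurvivorRatio option, but its real sizing also involves alignment and adaptive resizing, which are ignored here.

```java
// Arithmetic sketch of Eden : S0 : S1 = ratio : 1 : 1; not HotSpot's sizing code.
class SurvivorRatioDemo {
    // Returns { eden, survivor0, survivor1 } in bytes for the given young generation size.
    static long[] split(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2);   // one of ratio + 2 equal parts
        long eden = youngBytes - 2 * survivor;              // Eden takes the rest
        return new long[] { eden, survivor, survivor };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] parts = split(100 * mb, 8);
        System.out.println("Eden=" + parts[0] / mb + " MB, each Survivor=" + parts[1] / mb
                + " MB, usable=" + (parts[0] + parts[1]) / mb + " MB");
        // prints: Eden=80 MB, each Survivor=10 MB, usable=90 MB
    }
}
```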

The copying algorithm as implemented in HotSpot works as follows:

  1. When Eden fills up, the first Minor GC is triggered and the surviving objects are copied to the Survivor From space. The next time a Minor GC is triggered, Eden and the From space are both scanned and collected; objects that are still alive are copied directly to the To space, and Eden and From are cleared.
  2. On the following Minor GC, Eden and the To space are collected, the surviving objects are copied to the From space, and Eden and To are cleared.
  3. Some objects are copied back and forth between the From and To spaces up to 15 times (the limit is set by the JVM parameter MaxTenuringThreshold, which defaults to 15); if they are still alive after that, they are moved to the old generation.
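
The back-and-forth copying can be traced with a small simulation. This is a simplified model of the rule described above, not HotSpot's actual logic (which can also promote objects earlier, for example when a Survivor space fills up). The trace method and the labels S0 and S1 are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Follows one object that survives every Minor GC and records where it lives.
class TenuringDemo {
    static List<String> trace(int maxTenuringThreshold) {
        List<String> locations = new ArrayList<>();
        locations.add("Eden");      // objects start in Eden with age 0
        int age = 0;
        boolean inS0 = false;
        while (age < maxTenuringThreshold) {
            age++;                  // survived one more Minor GC
            inS0 = !inS0;           // copied to the other Survivor space
            locations.add(inS0 ? "S0" : "S1");
        }
        locations.add("Old");       // age reached the threshold: promoted
        return locations;
    }

    public static void main(String[] args) {
        System.out.println(trace(3));           // [Eden, S0, S1, S0, Old]
        System.out.println(trace(15).size());   // 17: Eden, 15 Survivor copies, Old
    }
}
```

A small threshold of 3 is used in the first call only to keep the printed trace short.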

2.3 Mark-Compact Algorithm

When the object survival rate is high, the copying algorithm has to perform many copy operations and becomes less efficient. For this reason, the copying algorithm is generally not used in the old generation.

The mark-compact algorithm was proposed to suit the characteristics of the old generation. Its marking process is the same as in the mark-sweep algorithm, but the next step does not reclaim the dead objects in place. Instead, all surviving objects are moved toward one end of the memory, and then everything beyond the boundary of the last surviving object is cleared.

[Figure: flow chart of the mark-compact algorithm]
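
The compaction step can be sketched on the same kind of toy heap used for illustration elsewhere: an array of fixed-size slots with the mark bits given as input. The compact method is a made-up name, and a real collector would also have to update every reference to the objects it moves.

```java
import java.util.Arrays;

// Toy sliding compaction within a single region (no second space needed).
class MarkCompactDemo {
    static String[] compact(String[] heap, boolean[] marked) {
        String[] result = heap.clone();
        int free = 0;   // boundary: next slot at the "live" end
        for (int i = 0; i < result.length; i++) {
            if (marked[i]) {
                result[free++] = result[i];   // slide the survivor toward index 0
            }
        }
        // Everything beyond the boundary is cleared in one go.
        Arrays.fill(result, free, result.length, null);
        return result;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F" };
        boolean[] marked = { true, false, true, false, false, true };
        System.out.println(Arrays.toString(compact(heap, marked)));
        // prints [A, C, F, null, null, null]
    }
}
```

The result looks like the output of a copying collector, but it is produced inside one region, so no memory has to be held in reserve.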

2.5 Generational Algorithm

The three algorithms above each have drawbacks:

  1. Efficiency issues: the mark-sweep and mark-compact algorithms need to traverse the entire heap space, which can make garbage collection slow.

  2. Memory fragmentation problem: the mark-sweep algorithm produces memory fragmentation during its sweep phase, which wastes memory space and leaves the memory layout discontinuous.

The generational algorithm addresses these problems. It is a strategy that combines several garbage collection algorithms: the heap is divided into regions, and a different collection strategy is applied to each region in order to achieve better overall results.

Design idea of the generational algorithm:
In Java programs, object lifetimes vary widely. Most newly created objects become garbage very quickly, while some objects may live for a long time. Based on this observation, the heap is divided into generations that handle objects with different lifetimes, so that the efficiency and performance of garbage collection can be better optimized.

The generational algorithm usually divides the heap memory into the following generations:

  1. Young Generation: Newly created objects are usually allocated in the young generation. It is collected with the copying algorithm, because these objects have short lifetimes and most of them quickly become garbage.

  2. Old Generation: Objects that are still alive after multiple GCs are moved to the old generation. It is collected with the mark-sweep or mark-compact algorithm, because these objects have long lifetimes and produce relatively little garbage.

  3. Permanent Generation: The permanent generation stores class metadata, constants and similar information. Since Java 8 it has been replaced by Metaspace, which lives in native memory rather than in the heap.

Through the generational algorithm, different generations adopt different garbage collection strategies, which can perform finer garbage collection for objects with different life cycles, avoid traversing the entire heap space, and improve garbage collection efficiency. This design enables the garbage collector to dynamically adjust the recycling strategy according to the running status of the application, so as to better adapt to different application scenarios.
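
One way to see these regions on a running JVM is the standard java.lang.management API. The sketch below simply lists the memory pools the current JVM reports; the class and method names MemoryPoolsDemo and poolNames are made up for the example, and the pool names printed depend on the JVM version and the collector in use, so no particular output is assumed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Lists the memory pools (for example Eden, Survivor, old generation, Metaspace)
// that the running JVM exposes. Names vary between collectors.
class MemoryPoolsDemo {
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // getType() distinguishes heap pools from non-heap pools.
            names.add(pool.getName() + " (" + pool.getType() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : poolNames()) {
            System.out.println(name);
        }
    }
}
```

On a typical HotSpot JVM the list contains separate pools for Eden, Survivor and the old generation among the heap pools, and Metaspace among the non-heap pools.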

2.6 Minor GC and Major GC

In the JVM, garbage collection (GC) can be divided into two types according to the region being collected: Minor GC and Major GC (often also called Full GC, although strictly speaking a Full GC collects the entire heap).

Minor GC

Minor GC is garbage collection of the young generation. The young generation is the part of the Java heap that holds newly created objects. Such objects usually have short lifetimes, so the young generation is collected with the copying algorithm.

Minor GC works as follows:

  1. Mark phase: Starting from the GC Roots objects, mark all objects that are still alive in the young generation.

  2. Copy phase: All surviving objects are copied from Eden and the in-use Survivor space to the other Survivor space.

  3. Role swap: After the copy is completed, Eden and the Survivor space that was in use are cleared, and the two Survivor spaces swap roles: the one that received the survivors becomes the in-use space, and the other becomes the free space.

The purpose of Minor GC is to clean up garbage objects in the young generation so that it can allocate space for new objects, while keeping its free space contiguous and avoiding memory fragmentation.

Major GC

Major GC is garbage collection of the old generation, which stores long-lived objects. Because objects in the old generation live for a long time, collecting them with the copying algorithm could incur a large copying cost, so the mark-sweep or mark-compact algorithm is usually used instead.

Major GC works as follows:

  1. Mark phase: Starting from the GC Roots objects, mark all objects that are still alive in the old generation.

  2. Sweep or compact phase: Perform the collection according to the algorithm in use. With the mark-sweep algorithm, unmarked garbage objects are reclaimed in place; with the mark-compact algorithm, live objects are moved together and the unmarked garbage objects are then cleared.

The purpose of Major GC is to clean up garbage objects in the old generation so that they do not occupy too much memory, and (when the mark-compact algorithm is used) to keep the old generation's space contiguous and avoid fragmentation.
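
To observe collections on a running JVM, the standard GarbageCollectorMXBean interface reports how often each collector has run. The sketch below allocates a lot of short-lived arrays and then prints the counters. The names GcCountDemo, totalCollections and churn are made up for the example, and the collector names and counts depend on the JVM, its heap settings and the collector in use, so no specific output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reads GC counters before and after creating a lot of short-lived garbage.
class GcCountDemo {
    // Sum of the collection counts of all collectors the JVM reports.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Allocates many arrays that become garbage immediately.
    static long churn(int rounds) {
        long checksum = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] garbage = new byte[64 * 1024];
            garbage[0] = (byte) i;
            checksum += garbage[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        churn(20_000);   // roughly 1.3 GB of short-lived arrays in total
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("collections during churn: " + (totalCollections() - before));
    }
}
```

With a generational collector, the counter of the young-generation collector would normally be the one that grows in this test, since the arrays die long before they could be promoted.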

Origin blog.csdn.net/qq_61635026/article/details/132048520