Garbage Collection Algorithms Explained: Java Virtual Machine



Garbage collection algorithm

Once the JVM has created objects, those determined to be "dead" must be reclaimed. Based on how they decide whether an object is dead, garbage collection algorithms fall into two families: reference counting GC and tracing GC. As we saw earlier, most JVMs do not use reference counting to decide whether an object is dead, so the JVM relies mainly on tracing garbage collection. Because tracing collectors are designed around the theory of generational collection, we first need to understand the generational hypothesis.

1. Generational collection theory

Most objects in the JVM are short-lived ("born in the morning, dead by nightfall"). An object that has survived many garbage collections, by contrast, is likely to be hard to kill. The JVM therefore divides heap objects into two categories: young-generation objects and old-generation objects. It splits the heap into two areas to store them, the Young Generation and the Old Generation. Because the two areas have different characteristics, the GC can collect each of them separately. However, treating heap memory simply as two independent parts causes a problem.
For example, suppose we run a Minor GC that targets only the young generation. Objects in the young generation may be referenced by objects in the old generation (cross-generational references, which are rare). In that case, in addition to the fixed GC Roots, the collector would have to traverse the entire old generation to keep the reachability analysis correct. Such references also keep young objects alive: because old-generation objects rarely die, a young object referenced from the old generation becomes hard to reclaim as well.

The solution is to build a global data structure in the young generation, called a Remembered Set. It divides the old generation into several blocks and records which blocks contain cross-generational references. During subsequent Minor GCs, only the objects in those blocks of the old generation are added to the reachability analysis, instead of the whole old generation.
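A common way to implement such a remembered set is a card table, which is the form HotSpot's classic collectors use. The old generation is cut into fixed-size "cards", and a write barrier marks a card dirty when a reference is stored into it. The minimal Java sketch below is only a toy model under that assumption. The class name, the 512-byte card size and the `recordOldToYoungStore` hook are all hypothetical. To keep things simple, it dirties a card only for old-to-young stores.

```java
import java.util.ArrayList;
import java.util.List;

// Toy card table: the old generation is an address range split into fixed-size cards.
// All names and sizes are hypothetical, chosen only for illustration.
class CardTableDemo {
    static final int CARD_SIZE = 512;   // bytes covered by one card (assumed)
    final boolean[] dirty;              // one flag per card of the old generation

    CardTableDemo(int oldGenBytes) {
        dirty = new boolean[(oldGenBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    // Simulated write barrier: called when a field at oldAddr (in the old generation)
    // is updated to point at a young-generation object.
    void recordOldToYoungStore(int oldAddr) {
        dirty[oldAddr / CARD_SIZE] = true;
    }

    // During a Minor GC only the dirty cards are scanned for extra roots,
    // instead of the whole old generation.
    List<Integer> cardsToScan() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        CardTableDemo table = new CardTableDemo(8 * CARD_SIZE); // old generation of 8 cards
        table.recordOldToYoungStore(100);   // card 0
        table.recordOldToYoungStore(120);   // card 0 again, already dirty
        table.recordOldToYoungStore(3000);  // card 5
        System.out.println("Cards to scan: " + table.cardsToScan()); // [0, 5]
    }
}
```

A Minor GC would then treat the objects inside cards 0 and 5 as extra roots. It scans two cards rather than all eight.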

Accordingly, GC can be classified by the area it collects:

  • Minor GC / Young GC: collects only the young generation.
  • Major GC / Old GC: collects only the old generation. Currently only the CMS collector behaves this way.
  • Mixed GC: collects the entire young generation and part of the old generation. Currently only the G1 collector behaves this way.
  • Full GC: collects the entire Java heap and the method area.

2. Mark-Sweep algorithm

The Mark-Sweep algorithm is the earliest and most basic GC algorithm. As the name suggests, it works in two phases, Mark and Sweep. First it marks all objects that need to be reclaimed, and then it reclaims all the marked objects in one pass. The algorithm is simple and easy to understand, but it has several drawbacks:

  • Low efficiency in most cases: most objects in the Java heap are short-lived and need to be reclaimed, so a large amount of marking and sweeping is required. The more objects there are to reclaim, the lower the efficiency.
  • Memory fragmentation: after a collection, the heap is left with a large number of discontinuous free blocks (fragments). Later, when a large object needs to be allocated, the total free space in the heap may be sufficient. Even so, no single contiguous block may be large enough to hold it.
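To make the two phases and the fragmentation problem concrete, here is a minimal Java sketch of Mark-Sweep on a toy heap of slots. The heap model, the first-fit `allocate` method and all names are assumptions made for illustration. Roots are passed in explicitly, whereas a real JVM marks from its GC Roots using far more elaborate structures.

```java
import java.util.*;

// Toy heap: an array of slots; each object occupies `size` contiguous slots.
class MarkSweepDemo {
    static class Obj {
        final String name;
        final int start, size;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name, int start, int size) { this.name = name; this.start = start; this.size = size; }
    }

    final Obj[] heap;                          // heap[i] = object occupying slot i, or null if free
    final List<Obj> objects = new ArrayList<>();

    MarkSweepDemo(int slots) { heap = new Obj[slots]; }

    // First-fit allocation: needs one contiguous run of free slots.
    Obj allocate(String name, int size) {
        int run = 0;
        for (int i = 0; i < heap.length; i++) {
            run = (heap[i] == null) ? run + 1 : 0;
            if (run == size) {
                Obj o = new Obj(name, i - size + 1, size);
                for (int j = o.start; j <= i; j++) heap[j] = o;
                objects.add(o);
                return o;
            }
        }
        return null;                           // no contiguous free run is large enough
    }

    // Mark phase: traverse everything reachable from the roots.
    void mark(List<Obj> roots) {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;
            o.marked = true;
            stack.addAll(o.refs);
        }
    }

    // Sweep phase: free unmarked objects in place (nothing moves), reset marks on survivors.
    void sweep() {
        Iterator<Obj> it = objects.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!o.marked) {
                for (int j = o.start; j < o.start + o.size; j++) heap[j] = null;
                it.remove();
            } else {
                o.marked = false;
            }
        }
    }

    int freeSlots() {
        int n = 0;
        for (Obj s : heap) if (s == null) n++;
        return n;
    }

    public static void main(String[] args) {
        MarkSweepDemo h = new MarkSweepDemo(8);
        Obj a = h.allocate("A", 2);            // slots 0-1
        h.allocate("B", 2);                    // slots 2-3, garbage
        Obj c = h.allocate("C", 2);            // slots 4-5
        h.allocate("D", 2);                    // slots 6-7, garbage
        a.refs.add(c);                         // root A keeps C alive
        h.mark(Arrays.asList(a));
        h.sweep();
        System.out.println("free slots: " + h.freeSlots());                        // 4
        System.out.println("3-slot object fits? " + (h.allocate("E", 3) != null)); // false
    }
}
```

After the sweep, 4 of the 8 slots are free. Still, a 3-slot object cannot be allocated, because the free space is split into two separate 2-slot holes.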

(Figure: schematic diagram of the Mark-Sweep algorithm)

3. Mark-Copy algorithm: suitable for the young generation

The Mark-Copy algorithm is often simply called the copying algorithm.

3.1 Initial version

To overcome Mark-Sweep's fragmentation problem, researchers proposed another algorithm, Semispace Copying. It divides the memory area into two equal halves: one half is used for allocating objects, and the other half is reserved for storing surviving objects.


Concretely, when the half in use is exhausted, the objects that survive GC are copied to the other half and laid out contiguously, so no fragmentation is left behind. The used half is then cleared in one step and the two halves swap roles. If most objects in memory are going to be reclaimed, only a small fraction of them needs to be copied.
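Here is a minimal Java sketch of semispace copying. It assumes a toy model in which each half is a `List` whose index stands for an address. It uses Cheney's breadth-first scheme, a classic way to implement copying collection, not necessarily what any particular JVM does. `SemispaceDemo` and its methods are hypothetical names.

```java
import java.util.*;

// Toy semispace collector: each half is a list whose index plays the role of an address.
class SemispaceDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    List<Obj> fromSpace = new ArrayList<>();   // half currently used for allocation
    List<Obj> toSpace = new ArrayList<>();     // reserved half, empty between collections

    Obj allocate(String name) {
        Obj o = new Obj(name);
        fromSpace.add(o);
        return o;
    }

    // Cheney-style copy: survivors are appended to to-space contiguously,
    // and to-space itself serves as the breadth-first work queue.
    List<Obj> collect(List<Obj> roots) {
        Map<Obj, Obj> forwarded = new IdentityHashMap<>();   // old object -> its copy
        List<Obj> newRoots = new ArrayList<>();
        for (Obj r : roots) newRoots.add(copy(r, forwarded));
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            for (int i = 0; i < o.refs.size(); i++) {
                o.refs.set(i, copy(o.refs.get(i), forwarded)); // redirect reference into to-space
            }
        }
        // Whatever is left in from-space is garbage: clear that half at once and swap roles.
        fromSpace.clear();
        List<Obj> tmp = fromSpace;
        fromSpace = toSpace;
        toSpace = tmp;
        return newRoots;
    }

    private Obj copy(Obj o, Map<Obj, Obj> forwarded) {
        Obj c = forwarded.get(o);
        if (c == null) {                       // first visit: copy and leave a forwarding entry
            c = new Obj(o.name);
            c.refs.addAll(o.refs);             // still points into from-space until scanned
            forwarded.put(o, c);
            toSpace.add(c);
        }
        return c;
    }

    public static void main(String[] args) {
        SemispaceDemo heap = new SemispaceDemo();
        Obj a = heap.allocate("A");
        heap.allocate("B");                    // garbage
        Obj c = heap.allocate("C");
        heap.allocate("D");                    // garbage
        a.refs.add(c);
        heap.collect(Arrays.asList(a));
        StringBuilder sb = new StringBuilder();
        for (Obj o : heap.fromSpace) sb.append(o.name);
        System.out.println("live objects after GC: " + sb);   // AC
    }
}
```

Only A and C are copied. B and D disappear simply because the old half is cleared, so the cost of a collection depends on the number of live objects rather than on the size of the space.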

This makes the algorithm well suited to the young generation. Its fatal drawback, however, is wasted space: 50% of the memory can never be used for allocation. Can this be improved?

3.2 Optimization

Building on semispace copying and on the characteristics of the young generation, researchers proposed a more refined strategy known as "Appel-style collection". How does it improve things? Most young-generation objects die quickly and are soon reclaimed. Is it really necessary, then, to reserve half of the space for the few that survive?

Appel-style collection therefore no longer splits the young generation crudely into two equal halves. Instead, it divides it into one large Eden space (80%) and two small Survivor spaces (10% each, 20% in total). Only Eden and one Survivor space are used for allocation. When garbage collection occurs, the objects still alive in Eden and in the Survivor space in use are copied in one pass into the other Survivor space. Eden and the used Survivor space are then cleared. As a result, only one Survivor space (10%) is wasted.
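The space arithmetic can be checked with a few lines of Java. HotSpot exposes this split through the `-XX:SurvivorRatio` option, which sets the ratio of Eden to one Survivor space. A value of 8 gives the 80%/10%/10% layout described above. The helper class below is a hypothetical illustration that ignores alignment and adaptive resizing.

```java
// Hypothetical helper: splits a young generation into Eden and two Survivor spaces,
// given an Eden:Survivor ratio in the style of HotSpot's -XX:SurvivorRatio.
class YoungGenSizing {
    // Eden gets `survivorRatio` parts, each Survivor space gets 1 part.
    static long survivorSize(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    static long edenSize(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorSize(youngBytes, survivorRatio);
    }

    // Fraction usable for allocation: Eden plus one Survivor space.
    static double usableFraction(int survivorRatio) {
        return (survivorRatio + 1.0) / (survivorRatio + 2.0);
    }

    public static void main(String[] args) {
        final long MiB = 1024L * 1024;
        long young = 100 * MiB;            // assume a 100 MiB young generation
        int ratio = 8;                     // Eden:Survivor = 8:1, i.e. 80% / 10% / 10%
        System.out.println("Eden:     " + edenSize(young, ratio) / MiB + " MiB");          // 80 MiB
        System.out.println("Survivor: " + survivorSize(young, ratio) / MiB + " MiB each"); // 10 MiB each
        System.out.println("Usable:   " + Math.round(usableFraction(ratio) * 100) + "%"); // 90%
    }
}
```

In this formula a ratio of 0 degenerates to the semispace layout, where only 50% of the space is usable.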

(Figure: Appel-style collection with Eden and two Survivor spaces)

This can still cause a problem, however. IBM research found that 98% of young-generation objects do not survive their first garbage collection. Even so, nobody can guarantee that every collection leaves no more than 10% of objects alive. What if the surviving objects do not fit into the Survivor space? Appel-style collection includes a safety net for this case. When a Survivor space is not large enough to hold the objects that survive a Minor GC, other memory areas provide an allocation guarantee (Handle Promotion). In practice, this is usually the old generation.
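The following minimal Java sketch models one Appel-style Minor GC with the allocation guarantee. It is a toy under stated assumptions:

- Sizes are abstract units.
- Liveness is given by a `live` flag instead of being computed by marking.
- Objects that do not fit into the empty Survivor space are promoted straight to a list standing in for the old generation.

Real collectors also promote objects by age, which this sketch leaves out. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy Appel-style young generation: Eden plus two Survivor spaces, sizes in abstract units.
// Liveness is supplied via the `live` flag instead of being computed by marking.
class AppelDemo {
    static class Obj {
        final String name;
        final int size;
        final boolean live;
        Obj(String name, int size, boolean live) { this.name = name; this.size = size; this.live = live; }
    }

    final int survivorCapacity;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>();   // Survivor space currently in use
    List<Obj> toSurvivor = new ArrayList<>();     // reserved, empty Survivor space
    final List<Obj> oldGen = new ArrayList<>();   // target of the allocation guarantee

    AppelDemo(int survivorCapacity) { this.survivorCapacity = survivorCapacity; }

    void minorGc() {
        int used = 0;
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (Obj o : candidates) {
            if (!o.live) continue;                    // dead objects are simply not copied
            if (used + o.size <= survivorCapacity) {
                toSurvivor.add(o);                    // copy into the empty Survivor space
                used += o.size;
            } else {
                oldGen.add(o);                        // Survivor full: handle promotion to old generation
            }
        }
        eden.clear();                                 // Eden and the used Survivor are wiped wholesale
        fromSurvivor.clear();
        List<Obj> tmp = fromSurvivor;                 // swap the roles of the two Survivor spaces
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }

    static String names(List<Obj> objs) {
        StringBuilder sb = new StringBuilder();
        for (Obj o : objs) sb.append(o.name);
        return sb.toString();
    }

    public static void main(String[] args) {
        AppelDemo young = new AppelDemo(10);          // each Survivor holds 10 units (assumed)
        young.eden.add(new Obj("A", 6, true));
        young.eden.add(new Obj("B", 30, false));
        young.eden.add(new Obj("C", 6, true));        // A + C = 12 > 10, so C is promoted
        young.eden.add(new Obj("D", 20, false));
        young.minorGc();
        System.out.println("survivor: " + names(young.fromSurvivor) + ", old: " + names(young.oldGen));
        // survivor: A, old: C
    }
}
```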

4. Mark-Compact algorithm: suitable for the old generation

The Mark-Copy algorithm suits the young generation, where very few objects survive. It no longer fits the old generation, where most objects survive, for two reasons. Copying nearly all of them would be expensive, and avoiding the 50% waste would require yet another area to guarantee allocation. In 1974, researchers proposed an algorithm aimed specifically at the old generation: Mark-Compact.

Mark-Compact uses the same marking phase as Mark-Sweep, but it does not reclaim dead objects in place. Instead, it moves all surviving objects toward one end of the memory space. It then clears all memory beyond the boundary of the surviving objects.

(Figure: schematic diagram of the Mark-Compact algorithm)
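Here is a minimal Java sketch of sliding compaction on a toy heap. It assumes objects live at integer addresses and refer to each other by address. It follows the classic three-pass "Lisp-2" scheme (compute new addresses, update references, move objects). This is one textbook way to implement Mark-Compact, not the specific method of any JVM, and the names are hypothetical. Roots are held as ordinary Java references, so only addresses stored inside heap objects need rewriting.

```java
import java.util.*;

// Toy sliding Mark-Compact (Lisp-2 style) over a heap kept as a list of objects in address order.
class MarkCompactDemo {
    static class Obj {
        final String name;
        final int size;
        int address;                               // current start address
        int forward;                               // new address computed in pass 1
        boolean marked;
        final List<Integer> refs = new ArrayList<>();  // references stored as addresses
        Obj(String name, int size, int address) { this.name = name; this.size = size; this.address = address; }
    }

    final List<Obj> heap = new ArrayList<>();      // objects sorted by address
    int top = 0;                                   // next free address (bump pointer)

    Obj allocate(String name, int size) {
        Obj o = new Obj(name, size, top);
        top += size;
        heap.add(o);
        return o;
    }

    Obj at(int address) {
        for (Obj o : heap) if (o.address == address) return o;
        throw new IllegalArgumentException("no object at " + address);
    }

    // Same marking phase as Mark-Sweep.
    void markFrom(List<Obj> roots) {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;
            o.marked = true;
            for (int addr : o.refs) stack.push(at(addr));
        }
    }

    void compact() {
        // Pass 1: compute each survivor's new address by sliding it toward address 0.
        int free = 0;
        for (Obj o : heap) {
            if (o.marked) { o.forward = free; free += o.size; }
        }
        // Pass 2: rewrite every reference held by a survivor to its target's new address.
        for (Obj o : heap) {
            if (!o.marked) continue;
            for (int i = 0; i < o.refs.size(); i++) o.refs.set(i, at(o.refs.get(i)).forward);
        }
        // Pass 3: drop the dead, move survivors (order preserved), reset marks.
        heap.removeIf(o -> !o.marked);
        for (Obj o : heap) { o.address = o.forward; o.marked = false; }
        top = free;                                // all space after the survivors is one free block
    }

    public static void main(String[] args) {
        MarkCompactDemo h = new MarkCompactDemo();
        Obj a = h.allocate("A", 2);                // address 0
        h.allocate("B", 3);                        // address 2, garbage
        Obj c = h.allocate("C", 1);                // address 5
        h.allocate("D", 2);                        // address 6, garbage
        Obj e = h.allocate("E", 2);                // address 8
        a.refs.add(c.address);
        a.refs.add(e.address);
        h.markFrom(Arrays.asList(a));
        h.compact();
        // Prints A@0 refs=[2, 3], C@2 refs=[], E@3 refs=[], then top = 5
        for (Obj o : h.heap) System.out.println(o.name + "@" + o.address + " refs=" + o.refs);
        System.out.println("top = " + h.top);
    }
}
```

Pass 2 is where the cost of moving shows up: every reference to a moved object must be found and rewritten.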

The essential difference between Mark-Sweep and Mark-Compact is that the former is a non-moving collection algorithm, while the latter moves objects. Whether to move surviving objects is a decision with both benefits and risks. Moving objects, especially the large number of live objects in the old generation, is an extremely heavy operation. Every reference to them must also be updated, and the work typically requires pausing the application. Not moving them, on the other hand, leaves fragmented memory that makes allocation harder.

4.1 Another approach

There is also a compromise ("muddling through") solution. The virtual machine uses Mark-Sweep most of the time and tolerates memory fragmentation for a while. Once fragmentation becomes severe enough to affect object allocation, it runs Mark-Compact once to defragment the space. The CMS collector mentioned earlier works this way.
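The decision rule behind this compromise can be sketched as a small Java policy. It is a hypothetical simplification. `freeBlocks` holds the sizes of the contiguous free regions left after sweeping. Compaction is triggered only when no single region fits the request even though the total free space would. A real collector would try other measures, such as a full GC or heap expansion, rather than report OUT_OF_MEMORY directly.

```java
// Hypothetical policy sketch: sweep normally, compact only when fragmentation
// blocks an allocation that the total free space could satisfy.
class HybridPolicy {
    enum Action { ALLOCATE, COMPACT_THEN_ALLOCATE, OUT_OF_MEMORY }

    // freeBlocks: sizes of the contiguous free regions left after a sweep.
    static Action decide(int[] freeBlocks, int request) {
        int total = 0, largest = 0;
        for (int b : freeBlocks) {
            total += b;
            largest = Math.max(largest, b);
        }
        if (largest >= request) return Action.ALLOCATE;             // fragmentation is tolerable
        if (total >= request) return Action.COMPACT_THEN_ALLOCATE; // enough space, just not contiguous
        return Action.OUT_OF_MEMORY;                                // compaction alone would not help
    }

    public static void main(String[] args) {
        int[] free = {2, 2, 1};                 // 5 units free, split into three pieces
        System.out.println(decide(free, 2));    // ALLOCATE
        System.out.println(decide(free, 4));    // COMPACT_THEN_ALLOCATE
        System.out.println(decide(free, 6));    // OUT_OF_MEMORY
    }
}
```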


Source: blog.csdn.net/qq_44823898/article/details/109843429