[JVM] Garbage Collection Algorithms

Generational Collection Theory

Most garbage collectors in today's commercial virtual machines are designed around the theory of "Generational Collection" [1]. This theory is essentially a rule of thumb built on two generational hypotheses:

  1. Weak Generational Hypothesis: Most objects are ephemeral.

  2. Strong Generational Hypothesis: The more garbage collections an object has survived, the less likely it is to die.

Applying this theory, designers of today's commercial Java virtual machines generally divide the Java heap into at least two regions: the Young Generation and the Old Generation [2]. As the names suggest, in the young generation a large number of objects are found dead at every collection, and the few objects that survive each collection are gradually promoted to the old generation.
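The promotion behavior just described can be modeled in a few lines. The following Java sketch is a toy, not how HotSpot is implemented. The `alive` flag stands in for real reachability analysis, the old generation is never collected, and the class names and the `TENURING_THRESHOLD` value are hypothetical. HotSpot exposes a comparable limit through `-XX:MaxTenuringThreshold`. Each young collection discards dead objects, ages the survivors, and promotes any survivor whose age reaches the threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of age-based promotion (hypothetical names; not HotSpot's implementation).
class PromotionDemo {
    // Assumed value for illustration; HotSpot's comparable limit is -XX:MaxTenuringThreshold.
    static final int TENURING_THRESHOLD = 3;

    static final class Obj {
        final String name;
        boolean alive = true; // stand-in for "still reachable from the roots"
        int age = 0;          // number of young collections survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    // New objects are always allocated in the young generation.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);
        return o;
    }

    // Young collection: discard dead objects, age the survivors,
    // and promote any survivor whose age reaches the threshold.
    void minorGc() {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!o.alive) continue;           // weak hypothesis: most objects end up here
            o.age++;
            if (o.age >= TENURING_THRESHOLD) {
                old.add(o);                   // promoted to the old generation
            } else {
                survivors.add(o);
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    public static void main(String[] args) {
        PromotionDemo heap = new PromotionDemo();
        heap.allocate("cache");               // long-lived object
        for (int i = 0; i < 4; i++) {
            Obj tmp = heap.allocate("tmp" + i);
            tmp.alive = false;                // short-lived: dead before the next collection
            heap.minorGc();
        }
        // prints "young=0 old=1": every tmp died young, cache was promoted after 3 collections
        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size());
    }
}
```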

Objects do not stay isolated within their own generation, however. An object in the old generation may reference an object in the young generation. When collecting only the young generation, the collector would then also have to examine the old generation to be sure which young objects are still referenced, and that hurts performance. Generational collection therefore relies on a third assumption:

  3. Intergenerational Reference Hypothesis: Cross-generational references are very rare compared with same-generation references.

This hypothesis follows implicitly from the first two: two objects that reference each other tend to live or die together. For example, if an old-generation object references a young-generation object, that reference keeps the young object alive through collections until it is promoted to the old generation as well. At that point the reference is no longer cross-generational.

Here are some common terms used in generational collection:

  • Partial collection (Partial GC): garbage collection that does not target the entire Java heap. It is further divided into:

    • Young generation collection (Minor GC/Young GC): garbage collection that targets only the young generation.

    • Old generation collection (Major GC/Old GC): garbage collection that targets only the old generation. Currently, only the CMS collector collects the old generation on its own.

    • Mixed collection (Mixed GC): garbage collection that targets the entire young generation plus part of the old generation. Currently, only the G1 collector does this.

  • Whole heap collection (Full GC): garbage collection of the entire Java heap and the method area.

Mark-Sweep Algorithm

The mark-sweep algorithm is a classic garbage collection algorithm. It finds objects that are no longer reachable and frees them to reclaim memory. It consists of two main phases: mark and sweep.

Here's how the mark-and-sweep algorithm works:

  1. Mark phase:
    • Starting from the GC roots (in the JVM, for example, local variables in thread stacks and static fields), traverse the object graph and mark every object reachable from them.
    • Marked objects are usually flagged with a mark bit or added to a "marked" set. This indicates that they are live, that is, still referenced.
  2. Sweep phase:
    • The garbage collector scans the entire heap and finds every unmarked object. These objects are no longer referenced.
    • Unmarked objects are treated as garbage, and the collector records their memory as free space for future allocations.
    • After sweeping, only the marked objects remain in the heap, and the memory of the unmarked objects has been released.
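The two phases above can be sketched as a toy collector over a simulated heap. This is a minimal illustration under assumed simplifications, not JVM code. The heap is a fixed array of slots (`null` means free), an object's references are slot indices, a list of slot indices stands in for the GC roots, and `MarkSweepDemo` and its methods are hypothetical names. The sketch uses the variant that marks live objects and sweeps the unmarked ones.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy mark-sweep collector over a simulated heap (hypothetical names, not JVM code).
class MarkSweepDemo {
    static final class Obj {
        final List<Integer> refs = new ArrayList<>(); // slot indices this object references
        boolean marked = false;                        // the "mark bit"
    }

    final Obj[] heap;                                // a null slot is free memory
    final List<Integer> roots = new ArrayList<>();   // stand-in for GC roots

    MarkSweepDemo(int slots) { heap = new Obj[slots]; }

    // First-fit allocation into any free slot.
    int allocate() {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) { heap[i] = new Obj(); return i; }
        }
        throw new IllegalStateException("out of memory");
    }

    // Mark phase: depth-first traversal from the roots, setting mark bits.
    void mark() {
        Deque<Integer> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = heap[stack.pop()];
            if (o.marked) continue;           // already visited (also handles cycles)
            o.marked = true;
            for (int r : o.refs) stack.push(r);
        }
    }

    // Sweep phase: scan every slot, free unmarked objects, clear marks on survivors.
    int sweep() {
        int freed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            if (heap[i].marked) {
                heap[i].marked = false;       // reset for the next collection
            } else {
                heap[i] = null;               // slot becomes free space, in place
                freed++;
            }
        }
        return freed;
    }

    int collect() {
        mark();
        return sweep();
    }

    public static void main(String[] args) {
        MarkSweepDemo gc = new MarkSweepDemo(8);
        int a = gc.allocate(), b = gc.allocate(), c = gc.allocate(), d = gc.allocate();
        gc.heap[a].refs.add(b);   // a -> b
        gc.heap[c].refs.add(d);   // c -> d, but nothing references c
        gc.roots.add(a);          // only a is a root
        System.out.println("freed " + gc.collect()); // prints "freed 2" (c and d)
    }
}
```

Because `sweep` frees slots in place, survivors never move. The freed slots stay scattered between them, and that is where mark-sweep's fragmentation comes from.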

Note that marking can work in either direction. The collector can mark every object that needs to be reclaimed and then reclaim all marked objects. Alternatively, it can mark the surviving objects and reclaim everything left unmarked, which is the variant described above.

The following is a schematic diagram of the mark-sweep algorithm:

[Figure: schematic diagram of the mark-sweep algorithm]

The mark-sweep algorithm is simple and never moves objects, but it has several drawbacks:

  • Fragmentation: Sweeping leaves discontiguous free blocks behind, which fragments memory. When a large object needs to be allocated later, the collector may be unable to find a contiguous block big enough, even though there is enough free memory in total. This can force another garbage collection earlier than necessary.
  • Unstable efficiency: Marking visits every reachable object and sweeping scans the entire heap, so the cost of both phases grows with the number of objects. When the heap holds a large number of objects and most of them need to be reclaimed, collection becomes slow.
  • Pauses (stop-the-world): In its basic form, the algorithm suspends all application threads while it runs, because the object graph must not change while reachability is being determined. These pauses stall the application during garbage collection and affect the user experience.
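To make the fragmentation problem concrete, here is a small Java sketch. It uses a hypothetical free list of holes left behind by a sweep and a first-fit search, which is one common allocation policy assumed here for illustration. The total free space exceeds the request, yet no single hole is large enough.

```java
import java.util.List;

// Toy illustration of fragmentation after a sweep (hypothetical names).
class FragmentationDemo {
    // A free hole left behind by sweeping: `size` units starting at `start`.
    static final class Block {
        final int start, size;
        Block(int start, int size) { this.start = start; this.size = size; }
    }

    // First-fit: return the start of the first hole that can hold `request` units, or -1.
    static int firstFit(List<Block> freeList, int request) {
        for (Block b : freeList) {
            if (b.size >= request) return b.start;
        }
        return -1;
    }

    static int totalFree(List<Block> freeList) {
        int sum = 0;
        for (Block b : freeList) sum += b.size;
        return sum;
    }

    public static void main(String[] args) {
        // Holes of 2, 3 and 2 units scattered across the heap.
        List<Block> freeList = List.of(new Block(0, 2), new Block(5, 3), new Block(10, 2));
        int request = 4; // a "large" object
        System.out.println("total free = " + totalFree(freeList)
                + ", first fit for " + request + " = " + firstFit(freeList, request));
        // prints "total free = 7, first fit for 4 = -1"
    }
}
```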

Mark-Copy Algorithm

The mark-copy algorithm is a garbage collection algorithm that solves the memory fragmentation problem of mark-sweep. It is mainly used to collect the Young Generation. The work is split into a mark phase and a copy phase, after which the source area is cleaned up in one step.

Here's how the mark-and-copy algorithm works:

  1. Mark phase:

    • Starting from the GC roots, the collector traverses the object graph and marks every object reachable from them. These objects are considered live.
    • Marked objects are usually flagged with a mark bit or added to a "marked" set. This indicates that they are live, that is, still referenced.
  2. Copy phase:

    • The garbage collector copies every live object from the area currently in use (often called the "From" space) into a reserved empty area (the "To" space). In HotSpot's young generation, live objects in Eden and one Survivor space are copied into the other Survivor space.
    • The copied objects are laid out contiguously in the "To" space with no fragmentation, and references to them are updated to their new locations.
  3. Clean-up phase:

    • All live objects have now been copied out, so whatever remains in the "From" area is garbage. The entire area can be emptied at once and reused for future allocations, and the two areas swap roles for the next collection.
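The three phases above can be sketched as a toy two-space collector in Java. It is a simplified illustration, not HotSpot's implementation. The spaces are arrays of slots, references are slot indices, and the names are hypothetical. Moving an object is modeled by moving the Java reference, whereas a real collector copies the object's bytes. Real copying collectors often fuse marking and copying into a single traversal (Cheney's algorithm). The phases are kept separate here to match the description.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy mark-copy collector with two equal spaces (hypothetical names, not HotSpot code).
class MarkCopyDemo {
    static final class Obj {
        final String name;
        final List<Integer> refs = new ArrayList<>(); // slot indices in the current space
        Obj(String name) { this.name = name; }
    }

    Obj[] from;                                       // space used for allocation
    Obj[] to;                                         // reserved, kept empty between GCs
    int top = 0;                                      // next free slot in "From"
    final List<Integer> roots = new ArrayList<>();

    MarkCopyDemo(int spaceSize) {
        from = new Obj[spaceSize];
        to = new Obj[spaceSize];
    }

    int allocate(String name) {
        if (top == from.length) throw new IllegalStateException("From space is full");
        from[top] = new Obj(name);
        return top++;
    }

    void collect() {
        // 1. Mark: find every slot reachable from the roots.
        boolean[] live = new boolean[from.length];
        Deque<Integer> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            int i = stack.pop();
            if (live[i]) continue;
            live[i] = true;
            for (int r : from[i].refs) stack.push(r);
        }
        // 2. Copy: pack live objects into "To" starting at slot 0,
        //    remembering where each one moved (old slot -> new slot).
        Map<Integer, Integer> forward = new HashMap<>();
        int next = 0;
        for (int i = 0; i < top; i++) {
            if (live[i]) {
                to[next] = from[i];
                forward.put(i, next++);
            }
        }
        // Update references and roots to the new locations.
        for (int i = 0; i < next; i++) to[i].refs.replaceAll(forward::get);
        roots.replaceAll(forward::get);
        // 3. Clean-up: everything left in "From" is garbage; clear it in one go
        //    and swap the roles of the two spaces.
        Arrays.fill(from, null);
        Obj[] tmp = from;
        from = to;
        to = tmp;
        top = next;
    }

    public static void main(String[] args) {
        MarkCopyDemo gc = new MarkCopyDemo(8);
        int a = gc.allocate("a"), junk = gc.allocate("junk"), b = gc.allocate("b");
        gc.from[a].refs.add(b);   // a -> b; "junk" is unreachable
        gc.roots.add(a);
        gc.collect();
        // prints "a b top=2": survivors now sit contiguously at the start of the new "From"
        System.out.println(gc.from[0].name + " " + gc.from[1].name + " top=" + gc.top);
    }
}
```

The `to` array sits empty between collections. That idle half is the space cost of this algorithm.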

The schematic diagram is as follows:

[Figure: schematic diagram of the mark-copy algorithm]

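The main advantage of the mark-copy algorithm is that it eliminates memory fragmentation. The copied objects are packed together and the "From" area is emptied in full, which reduces the risk of running out of contiguous space when allocating large objects. It has two drawbacks. First, it wastes space: the "To" area must be kept empty, so only part of the memory is available for allocation at any time (half of it in the simplest two-space scheme). Second, it must copy every live object, which becomes expensive when many objects survive. This is why it suits the young generation, where most objects die at every collection.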

The mark-copy algorithm is mainly used in the young generation, while the old generation usually uses other algorithms, such as mark-sweep or mark-compact. Modern Java virtual machines combine these algorithms into a generational garbage collection strategy to make memory management efficient.

Mark-Compact Algorithm

The mark-compact algorithm is a garbage collection algorithm usually used to reclaim memory in the Old Generation. It improves on mark-sweep, mainly by eliminating the memory fragmentation that mark-sweep can cause.

Its mark phase is the same as in mark-sweep. The difference is that it does not sweep unmarked objects in place. It adds a compact phase instead:

Compact phase:

  • The garbage collector moves all live (marked) objects toward one end of the heap so that they are laid out contiguously, and updates every reference to a moved object. All memory beyond the last live object is garbage and is released in one step.
  • After compaction, live objects are packed together with no fragmentation, which improves memory utilization. The cost is that moving live objects and updating references to them is expensive, and application threads usually have to be paused while this happens.
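A toy sliding compactor illustrates the compact phase. This Java sketch assumes a simulated heap of slots with slot-index references and uses hypothetical names. It follows the classic three-pass scheme associated with the Lisp 2 compactor: compute new addresses, update references, then move objects. That scheme is one way to implement compaction, not the only one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy sliding mark-compact collector (hypothetical names, not JVM code).
class MarkCompactDemo {
    static final class Obj {
        final String name;
        final List<Integer> refs = new ArrayList<>(); // slot indices this object references
        boolean marked = false;
        int forward = -1;                              // destination slot during compaction
        Obj(String name) { this.name = name; }
    }

    final Obj[] heap;                                // a null slot is free memory
    final List<Integer> roots = new ArrayList<>();

    MarkCompactDemo(int slots) { heap = new Obj[slots]; }

    int allocate(String name) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) { heap[i] = new Obj(name); return i; }
        }
        throw new IllegalStateException("out of memory");
    }

    // Returns the first free slot after compaction; every slot from there on is free.
    int collect() {
        // Mark phase: identical to mark-sweep.
        Deque<Integer> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = heap[stack.pop()];
            if (o.marked) continue;
            o.marked = true;
            for (int r : o.refs) stack.push(r);
        }
        // Pass 1: give each live object a new slot, preserving address order.
        int free = 0;
        for (Obj o : heap) {
            if (o != null && o.marked) o.forward = free++;
        }
        // Pass 2: rewrite references and roots to point at the new slots.
        for (Obj o : heap) {
            if (o != null && o.marked) o.refs.replaceAll(r -> heap[r].forward);
        }
        roots.replaceAll(r -> heap[r].forward);
        // Pass 3: slide live objects toward slot 0. Since forward <= i, the target
        // slot is either this one, already vacated, or occupied by garbage.
        for (int i = 0; i < heap.length; i++) {
            Obj o = heap[i];
            if (o != null && o.marked) {
                heap[i] = null;
                heap[o.forward] = o;
                o.marked = false;
                o.forward = -1;
            }
        }
        // Release everything past the last live object in one step.
        Arrays.fill(heap, free, heap.length, null);
        return free;
    }

    public static void main(String[] args) {
        MarkCompactDemo gc = new MarkCompactDemo(6);
        int a = gc.allocate("a"), dead = gc.allocate("dead"), b = gc.allocate("b");
        gc.heap[a].refs.add(b);   // a -> b; "dead" is unreachable
        gc.roots.add(a);
        int free = gc.collect();
        // prints "a b free from slot 2": b slid from slot 2 into the hole at slot 1
        System.out.println(gc.heap[0].name + " " + gc.heap[1].name + " free from slot " + free);
    }
}
```

Pass 2 is the reason compaction is costly. Every reference to a moved object must be found and rewritten, and the application must not observe the heap halfway through.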

The schematic diagram is as follows:

[Figure: schematic diagram of the mark-compact algorithm]


Origin blog.csdn.net/m0_51545690/article/details/132676846