The basic process and common algorithms of Java garbage collection

Table of contents

1. Basic overview

2. Garbage classification

Basic background

Examples illustrating the role of each reference type

Strong Reference

Soft Reference

Weak Reference

Phantom Reference

3. Garbage search

When to search for garbage

How garbage is found

4. Garbage Cleaning

Introduction to commonly used algorithms

Mark-Sweep

Mark-Copy

Mark-Compact

Generational collection algorithm

Problem background

Generational area description

Generational garbage collection algorithm execution process

References, books and links


1. Basic overview

When a Java program is running, objects are dynamically allocated in heap memory. As the program runs, some objects may no longer be referenced and become garbage. Garbage collection refers to cleaning up these garbage objects when the program is running to free up memory space for new objects.

The basic process of Java garbage collection can be divided into the following three steps:

  1. Garbage Classification: The garbage collector first determines which objects are garbage and which are still alive. Typically, it traverses the object graph starting from a set of GC roots (such as local variables in virtual machine stack frames, references held in native method stacks, and class static fields in the method area) and marks every reachable object as alive. Unmarked objects are treated as garbage.
  2. Garbage Tracing: The garbage collector needs to locate all garbage objects before it can clean them up. The two common ways of finding garbage are reference counting and reachability analysis (see section 3).
  3. Garbage Collection: The garbage collector reclaims the memory occupied by garbage objects. Common cleaning algorithms include mark-sweep, mark-copy, mark-compact, and generational collection. Cleaning may pause the application, and different garbage collectors shorten these pauses in different ways to improve application performance and reliability.

Note that different garbage collectors may use different algorithms and strategies. For each application scenario, it is therefore necessary to choose a suitable garbage collector and tune its parameters to get the best garbage collection results.

2. Garbage classification

Basic background

Garbage classification is the process of dividing the objects in the heap into living objects and garbage objects. It has no direct relationship with reference types such as strong, soft, weak, and phantom references.

In the garbage classification phase, the JVM starts from a set of root objects, follows the references between objects, and marks every object it can reach. Marked objects are alive; unmarked objects are treated as garbage and can be reclaimed in the later stages of garbage collection.

Reference types such as strong, soft, weak, and phantom references control the life cycle of objects during garbage collection: they tell the garbage collector which objects may be reclaimed and which may not.

Examples illustrating the role of each reference type

Strong Reference

Strong reference is the most common reference type and the default one. As long as an object is reachable through a strong reference, the garbage collector will not reclaim it. When memory runs out, the JVM would rather throw an OutOfMemoryError than reclaim strongly referenced objects. Example code for strong references:

```java
Object obj = new Object(); // strong reference
```
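A minimal sketch that makes this guarantee observable, assuming Java 9+ (for Reference.reachabilityFence). The WeakReference serves only as a probe: while obj is strongly reachable, the probe cannot be cleared, even after System.gc(). StrongRefDemo and survivesGc are hypothetical names.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

// Hypothetical demo: a weak reference is used only as a probe to check whether
// a strongly referenced object survives an explicit GC request.
class StrongRefDemo {
    static boolean survivesGc() {
        Object obj = new Object();                        // strong reference
        WeakReference<Object> probe = new WeakReference<>(obj);
        System.gc();                                      // only a hint to the JVM
        boolean alive = probe.get() != null;              // still reachable through obj
        Reference.reachabilityFence(obj);                 // keep obj strongly reachable up to here
        return alive;
    }

    public static void main(String[] args) {
        System.out.println("survived GC: " + survivesGc()); // survived GC: true
    }
}
```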

Soft Reference

Soft references describe objects that are useful but not essential. Objects reachable only through soft references are reclaimed only when memory is running low (at the latest, before the JVM throws an OutOfMemoryError). Soft references can be used to implement memory-sensitive caches, such as web page caches or image caches. Example code for soft references:

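```java
import java.lang.ref.SoftReference;

Object obj = new Object();
SoftReference<Object> softRef = new SoftReference<>(obj); // soft reference
obj = null; // obj no longer holds a strong reference; only the soft reference remains
```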
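One way the memory-sensitive cache idea could look, as a hedged sketch: SoftCache and its loader function are hypothetical names, not a JDK API. Values are reachable from the cache only through SoftReference objects, so the JVM may clear them when memory runs low; a cleared entry is simply recomputed on the next lookup.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical memory-sensitive cache built on SoftReference.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader;   // recomputes a value on a miss

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();   // null if never cached or already cleared
        if (value == null) {
            value = loader.apply(key);                  // e.g. re-decode an image
            map.put(key, new SoftReference<>(value));
        }
        return value;                                   // the caller now holds a strong reference
    }

    public static void main(String[] args) {
        SoftCache<String, byte[]> images = new SoftCache<>(name -> new byte[1024]);
        System.out.println(images.get("logo.png").length); // 1024
    }
}
```

A production cache would also remove map entries whose references have been cleared (for example by registering them with a ReferenceQueue); this sketch leaves them in place for brevity.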

Weak Reference

Weak references describe non-essential objects. Objects reachable only through weak references survive only until the next garbage collection: when the garbage collector runs, it reclaims them regardless of whether memory is sufficient. Example code for weak references:

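```java
import java.lang.ref.WeakReference;

Object obj = new Object();
WeakReference<Object> weakRef = new WeakReference<>(obj); // weak reference
obj = null; // obj no longer holds a strong reference; only the weak reference remains
```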
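The sketch below (WeakRefDemo is a hypothetical name) drops the last strong reference and then asks for a collection. Because System.gc() is only a request, the printed result is typically, but not guaranteed to be, true on HotSpot.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo: once the last strong reference is dropped, the object is
// only weakly reachable and is normally cleared at the next collection,
// whatever the amount of free memory.
class WeakRefDemo {
    static boolean clearedAfterGc(int maxAttempts) {
        Object obj = new Object();
        WeakReference<Object> weakRef = new WeakReference<>(obj);
        obj = null;                                   // only the weak reference remains
        for (int i = 0; i < maxAttempts && weakRef.get() != null; i++) {
            System.gc();                              // a request, not a guarantee
        }
        return weakRef.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("cleared: " + clearedAfterGc(5)); // typically true on HotSpot
    }
}
```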

Phantom Reference

Phantom references are the weakest reference type. An object reachable only through a phantom reference can be reclaimed at any time, just as if it had no references at all, and PhantomReference.get() always returns null. Phantom references are mainly used to track when an object is garbage collected: once the collector determines that the referent is phantom reachable, the phantom reference is placed in its ReferenceQueue, and the program can receive the notification through that queue. Example code for phantom references:

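```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

Object obj = new Object();
ReferenceQueue<Object> queue = new ReferenceQueue<>();
PhantomReference<Object> phantomRef = new PhantomReference<>(obj, queue); // phantom reference
obj = null; // obj no longer holds a strong reference; only the phantom reference remains
```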
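A small sketch of the two properties described above, assuming Java 9+ (for reachabilityFence); PhantomRefDemo and its methods are hypothetical names. get() on a phantom reference always returns null, and the program learns about the collection by waiting on the ReferenceQueue. ReferenceQueue.remove(timeout) blocks until a reference is enqueued or the timeout expires, so whether enqueuedAfterGc returns true depends on a collection actually running.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical demo of PhantomReference semantics.
class PhantomRefDemo {
    // get() on a PhantomReference returns null even while the referent is alive.
    static boolean getAlwaysNull() {
        Object obj = new Object();
        PhantomReference<Object> ref = new PhantomReference<>(obj, new ReferenceQueue<>());
        boolean isNull = ref.get() == null;
        Reference.reachabilityFence(obj);      // keep obj strongly reachable until here
        return isNull;
    }

    // Drops the only strong reference, requests a GC and waits for the notification.
    static boolean enqueuedAfterGc(long timeoutMillis) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object obj = new Object();
        PhantomReference<Object> ref = new PhantomReference<>(obj, queue);
        obj = null;                            // now only phantom reachable
        System.gc();                           // a request; the JVM may ignore it
        try {
            Reference<?> polled = queue.remove(timeoutMillis); // null on timeout
            return polled == ref;              // ref itself must stay reachable to be enqueued
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("get() is null: " + getAlwaysNull());       // get() is null: true
        System.out.println("enqueued: " + enqueuedAfterGc(1000));      // typically true on HotSpot
    }
}
```

Note that the PhantomReference object itself must remain strongly reachable; if it were collected first, nothing would ever be enqueued.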

In short, different reference types let us control object life cycles more flexibly, so that objects are reclaimed by the garbage collector neither too early nor too late.

3. Garbage search

When to search for garbage

Different garbage collectors have different strategies. The following are just examples:

  • There is not enough space when allocating a new object or loading a class.
  • Usage of the old generation or permanent generation reaches a configured threshold (for CMS, for example -XX:CMSInitiatingOccupancyFraction=60 and -XX:CMSInitiatingPermOccupancyFraction=60).
  • System.gc() is called (this is only a request, which the JVM may ignore).
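To observe the System.gc() trigger from inside a program, the standard java.lang.management API exposes per-collector counters. A minimal sketch (GcTriggerDemo is a hypothetical name); the collector names printed depend on the JVM version and the selected collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo: read collection counters before and after System.gc().
class GcTriggerDemo {
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if the count is undefined
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName());
        }
        long before = totalCollections();
        System.gc();                                 // ignored e.g. with -XX:+DisableExplicitGC
        long after = totalCollections();
        System.out.println("collections before=" + before + ", after=" + after);
    }
}
```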

How garbage is found

There are two methods for finding garbage: reference counting and reachability analysis.

Reference counting: a simple garbage collection algorithm. The idea is to attach a reference counter to each object. Whenever a new reference to the object is created, the counter is incremented by 1; when a reference goes away, it is decremented by 1. When the counter reaches 0, the object is no longer referenced and can be reclaimed. However, reference counting cannot handle circular references: when objects form a cycle, their counters never drop to 0, even if the program no longer uses any of them.
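A toy model of reference counting that makes the circular-reference problem concrete. This is an illustration with hypothetical names, not how HotSpot collects garbage (HotSpot uses reachability analysis).

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference counting: a node is freed only when its count drops to 0.
class RefCountDemo {
    static final class Node {
        final String name;
        int refCount = 0;
        final List<Node> fields = new ArrayList<>();   // outgoing references
        Node(String name) { this.name = name; }
    }

    static void addRef(Node from, Node to) {   // models "from.field = to"
        from.fields.add(to);
        to.refCount++;
    }

    static void addRoot(Node n) { n.refCount++; }      // e.g. a local variable
    static void dropRoot(Node n) { release(n); }

    // Decrement; at 0 the node is freed and releases everything it references.
    static void release(Node n) {
        if (--n.refCount == 0) {
            for (Node child : n.fields) release(child);
            n.fields.clear();
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b");
        addRoot(a);
        addRoot(b);
        addRef(a, b);
        addRef(b, a);                      // cycle: a <-> b
        dropRoot(a);
        dropRoot(b);                       // the program no longer uses either node
        System.out.println(a.refCount + " " + b.refCount); // 1 1 (never freed)
    }
}
```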

Reachability analysis: the approach used by mainstream modern garbage collectors, including the JVM's. The idea is to start from a set of objects called "GC roots" (such as references in stack frames and class static fields in the method area) and follow the chains of references: reachable objects are considered alive, and unreachable objects are considered garbage and can be reclaimed. Circular references are handled correctly, because a cycle with no reference chain to any root is unreachable as a whole.
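The same kind of cycle, now handled by reachability analysis: a depth-first traversal from the roots marks every reachable object, and everything unmarked is garbage. The heap is modelled as a map from an object id to the ids it references; ReachabilityDemo and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy reachability analysis over an adjacency-map "heap".
class ReachabilityDemo {
    static Set<String> markLive(Map<String, List<String>> heap, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (!marked.add(obj)) continue;            // already marked
            for (String ref : heap.getOrDefault(obj, List.of())) {
                work.push(ref);                        // follow each outgoing reference
            }
        }
        return marked;
    }

    static Set<String> garbage(Map<String, List<String>> heap, List<String> roots) {
        Set<String> dead = new HashSet<>(heap.keySet());
        dead.removeAll(markLive(heap, roots));         // unmarked objects are garbage
        return dead;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = Map.of(
                "A", List.of("B"),
                "B", List.of(),
                "C", List.of("D"),   // C and D reference each other...
                "D", List.of("C"));  // ...but no root reaches them
        System.out.println(garbage(heap, List.of("A"))); // C and D, in unspecified order
    }
}
```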

4. Garbage Cleaning

Introduction to commonly used algorithms

Mark-Sweep

GC is divided into two phases: mark and sweep.

First, mark all objects that can be reclaimed; then, after marking is complete, reclaim all marked objects in one pass.

The drawback is that sweeping leaves discontinuous memory fragments. With too much fragmentation, the program may fail to find a large enough contiguous block when it needs to allocate a large object, and another GC has to be triggered early.
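A toy sketch of the sweep step and the fragmentation it leaves behind. The heap is an int array in which each cell holds an object id (0 means free), and the set of marked ids stands in for the result of the mark phase; MarkSweepDemo and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Set;

// Toy mark-sweep: unmarked cells are freed in place, nothing is moved.
class MarkSweepDemo {
    static void sweep(int[] heap, Set<Integer> marked) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0 && !marked.contains(heap[i])) heap[i] = 0; // reclaim in place
        }
    }

    static int totalFree(int[] heap) {
        int free = 0;
        for (int cell : heap) if (cell == 0) free++;
        return free;
    }

    static int largestFreeRun(int[] heap) {   // biggest contiguous block available
        int best = 0, run = 0;
        for (int cell : heap) {
            run = (cell == 0) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 1, 2, 3, 3, 4, 5, 5};        // objects 1..5 packed back to back
        sweep(heap, Set.of(1, 3, 5));                 // objects 2 and 4 were not marked
        System.out.println(Arrays.toString(heap));    // [1, 1, 0, 3, 3, 0, 5, 5]
        System.out.println(totalFree(heap) + " free, largest run " + largestFreeRun(heap)); // 2 free, largest run 1
    }
}
```

After sweeping, two cells are free but they are not adjacent, so a two-cell object still cannot be allocated.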

Mark-Copy

Divide the memory into two equal halves and use only one at a time. When that half is used up, copy the surviving objects to the other half and then clear the used half all at once. Each collection thus reclaims an entire half without having to deal with memory fragmentation, which is simple and efficient.
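A minimal semi-space sketch under simplifying assumptions: objects are unit-sized strings, capacity is counted per half, and the live set passed in stands in for the mark phase. SemiSpaceDemo is a hypothetical name. Allocation only touches the active half; a collection copies the survivors into the idle half and swaps the two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy semi-space (two-half) copying collector.
class SemiSpaceDemo {
    private List<String> from = new ArrayList<>();  // half currently in use
    private List<String> to = new ArrayList<>();    // idle half, empty between GCs
    private final int capacity;                     // capacity of each half

    SemiSpaceDemo(int capacity) { this.capacity = capacity; }

    // Returns false if the object does not fit even after a collection.
    boolean allocate(String obj, Set<String> live) {
        if (from.size() == capacity) collect(live);
        if (from.size() == capacity) return false;
        from.add(obj);
        return true;
    }

    void collect(Set<String> live) {
        for (String obj : from) {
            if (live.contains(obj)) to.add(obj);    // copy survivors, packed together
        }
        from.clear();                               // the whole used half is freed at once
        List<String> tmp = from;                    // swap the roles of the two halves
        from = to;
        to = tmp;
    }

    List<String> fromSpace() { return from; }

    public static void main(String[] args) {
        SemiSpaceDemo heap = new SemiSpaceDemo(3);
        Set<String> live = Set.of("a", "c");
        for (String obj : List.of("a", "b", "c", "d")) heap.allocate(obj, live);
        System.out.println(heap.fromSpace());       // [a, c, d]
    }
}
```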

The drawback is that it requires twice the memory space, since one half is always idle. One optimization is to split the space into an Eden area and two survivor areas, as follows:

By default, the ratio of Eden to the two survivor areas is 8:1:1. Only Eden and one of the survivor areas are in use at any time. After marking, the surviving objects are copied into the other, unused survivor area (some older objects are promoted to the old generation).

Compared with the ordinary two-space mark-copy algorithm, this wastes only 10% of the memory space. It works because, in most cases, very few objects survive a young GC.
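The 10% figure is plain arithmetic. The sketch below assumes the 8:1:1 split corresponds to HotSpot's -XX:SurvivorRatio=8 (Eden is eight times the size of one survivor space) and uses a made-up 100 MB young generation; SurvivorRatioDemo is a hypothetical name.

```java
// Arithmetic sketch of an Eden:S0:S1 = 8:1:1 split.
class SurvivorRatioDemo {
    // Returns {eden, s0, s1} in bytes for a young generation of youngBytes.
    static long[] split(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2);   // ratio parts: eden + 2 survivors
        long eden = youngBytes - 2 * survivor;
        return new long[] {eden, survivor, survivor};
    }

    public static void main(String[] args) {
        long young = 100L * 1024 * 1024;            // hypothetical 100 MB young generation
        long[] s = split(young, 8);
        long usable = s[0] + s[1];                  // Eden + one survivor are in use at a time
        System.out.printf("eden=%d survivor=%d usable=%.0f%%%n",
                s[0], s[1], 100.0 * usable / young); // eden=83886080 survivor=10485760 usable=90%
    }
}
```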

Mark-Compact

Mark-compact is also divided into two stages. First, mark the surviving objects; then move them all toward one end of the memory and clear everything beyond that boundary.

This avoids both the fragmentation problem of mark-sweep and the space overhead of mark-copy.
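A toy sliding-compaction sketch on an int-array heap (each cell holds an object id, 0 means free); the set of marked ids stands in for the mark phase, and MarkCompactDemo is a hypothetical name. Live cells slide toward index 0 in their original order, and everything past the new boundary is cleared. A real collector must also update every reference to a moved object; the toy heap has no pointers, so that step is omitted.

```java
import java.util.Arrays;
import java.util.Set;

// Toy sliding mark-compact.
class MarkCompactDemo {
    // Compacts in place and returns the new boundary (first free index).
    static int compact(int[] heap, Set<Integer> marked) {
        int dest = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0 && marked.contains(heap[i])) {
                heap[dest++] = heap[i];            // slide the live cell down (dest <= i)
            }
        }
        Arrays.fill(heap, dest, heap.length, 0);   // clean up memory beyond the boundary
        return dest;
    }

    public static void main(String[] args) {
        int[] heap = {1, 1, 2, 3, 3, 4, 5, 5};
        int boundary = compact(heap, Set.of(1, 3, 5));
        System.out.println(Arrays.toString(heap) + " boundary=" + boundary);
        // [1, 1, 3, 3, 5, 5, 0, 0] boundary=6
    }
}
```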

Generally, only a small number of objects survive a GC in the young generation, so the copying algorithm is used there: the collection completes at the small cost of copying the surviving objects.

In the old generation, the survival rate of objects is high, so the mark-copy algorithm would spend a lot of time copying data and waste space. Mark-sweep or mark-compact is therefore used instead.

A common approach is to use mark-sweep first and switch to mark-compact when fragmentation becomes high.
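The "sweep first, compact when fragmented" idea can be expressed as a small decision function. This is a hypothetical heuristic for illustration, not the rule used by any particular collector: fragmentation is measured as 1 minus (largest free block / total free space), and the 0.5 threshold is arbitrary.

```java
// Hypothetical old-generation policy: sweep by default, compact when fragmented.
class OldGenPolicy {
    enum Action { SWEEP, COMPACT }

    // 0.0 = all free space is one block; close to 1.0 = free space is scattered.
    static double fragmentation(int totalFree, int largestFreeRun) {
        return totalFree == 0 ? 0.0 : 1.0 - (double) largestFreeRun / totalFree;
    }

    static Action choose(int totalFree, int largestFreeRun, double threshold) {
        return fragmentation(totalFree, largestFreeRun) > threshold ? Action.COMPACT : Action.SWEEP;
    }

    public static void main(String[] args) {
        System.out.println(choose(100, 90, 0.5)); // SWEEP: free space is mostly one block
        System.out.println(choose(100, 10, 0.5)); // COMPACT: free space is scattered
    }
}
```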

Generational collection algorithm

Problem background

As shown above, none of the basic garbage collection algorithms is a silver bullet: each has different characteristics, and none fits every scenario. Analysis of a large number of real-world workloads on modern JVMs shows that objects in JVM memory fall roughly into two categories. Objects of the first kind have very short life cycles, such as local variables and temporary objects. Objects of the second kind live for a long time, such as the Connection objects held by long-lived database connections in a user application.

In the above figure, the vertical axis is JVM memory usage and the horizontal axis is time. The figure shows that most objects are extremely short-lived and few survive a GC. This observation gave rise to the idea of generations. In JDK 7, the HotSpot virtual machine divides memory into three main areas: the Young Generation, the Old Generation, and the Permanent Generation.

Generational area description

The main areas are as follows:

Young generation: The young generation consists of the Eden area and the Survivor area, which is further divided into S0 and S1. This area is smaller than the old generation, its objects are short-lived, and GC runs frequently, so the mark-copy algorithm is typically used here.

Old generation: The old generation is larger overall; its objects are long-lived with a high survival rate, and it is collected infrequently. It is therefore better suited to the mark-compact algorithm.

Permanent generation: HotSpot's implementation of the method area (the two terms are often used interchangeably). It stores metadata for classes and interfaces and, before JDK 7 moved them to the heap, interned strings. It was replaced by Metaspace in JDK 8.

Metaspace: Introduced in JDK 8 to replace the permanent generation. The method area is now implemented in Metaspace, which resides in native memory rather than in the Java heap.
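To see which of these regions a given JVM actually reports, the standard java.lang.management API lists its memory pools. A minimal sketch (MemoryPoolsDemo is a hypothetical name). Pool names depend on the JDK version and the collector in use; on a HotSpot JDK 8 or later you would typically see Eden, Survivor and Old Gen heap pools alongside a non-heap Metaspace pool, and no permanent generation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: list the memory pools (heap and non-heap) of the running JVM.
class MemoryPoolsDemo {
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName() + " (" + pool.getType() + ")"); // type: heap or non-heap
        }
        return names;
    }

    public static void main(String[] args) {
        poolNames().forEach(System.out::println);
    }
}
```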

Generational garbage collection algorithm execution process

  • Initial state: The object is allocated in the Eden area, and the S0 and S1 areas are almost empty.

  • As the program runs, more and more objects are allocated in the Eden area.

  • When Eden is full and cannot satisfy a new allocation, a Minor GC (i.e. Young GC) occurs. Unreachable garbage objects are identified first, the reachable objects are then copied to the S0 area, and the unreachable objects are cleaned up, leaving Eden empty. This is the mark-copy algorithm: live objects are marked and copied, and Eden is then cleared as a whole.

  • When Eden fills up again, another Minor GC is triggered. As before, marking comes first. At this point, Eden and S0 may both contain garbage, while S1 is empty. The live objects in Eden and S0 are copied directly to S1, and then Eden and S0 are cleared. After this round of Minor GC, Eden and S0 are empty.

  • As the program keeps running, Eden fills up again and the Minor GC process repeats. This time S0 is the empty area, so S0 and S1 swap roles: the surviving objects in Eden and S1 are moved to S0, and then Eden and S1 are cleared. After this round, those two areas are empty.

  • Although most objects die quickly, some survive for a long time. Copying these objects back and forth between S0 and S1 adds overhead and reduces GC efficiency, which is why object promotion was introduced.

  • While an object lives in the young generation (Eden, S0, S1), its age increases by one each time it survives a Minor GC and is copied to another area. Once its age reaches a threshold (configurable in HotSpot with -XX:MaxTenuringThreshold), the object is promoted to the old generation.

  • If the old generation also fills up, a Major GC (i.e. Full GC) occurs. Because the old generation usually holds many objects, the mark-compact algorithm takes a long time and causes a noticeable stop-the-world (STW) pause. Most applications therefore try hard to reduce or avoid whatever triggers Full GC.
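The young-generation cycle described above can be simulated with a toy model. This is a sketch under heavily simplified assumptions, not HotSpot's implementation: each object carries a hypothetical lifetime (how many Minor GCs it stays reachable for), Eden capacity is counted in objects, tenuringThreshold plays the role of -XX:MaxTenuringThreshold, and Full GC is not modelled. GenerationalDemo and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy generational model: Eden, two survivor spaces that swap roles, and an old generation.
class GenerationalDemo {
    static final class Obj {
        final String name;
        int gcsToLive;   // Minor GCs this object remains reachable for (hypothetical)
        int age = 0;     // Minor GCs survived so far
        Obj(String name, int gcsToLive) { this.name = name; this.gcsToLive = gcsToLive; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();    // survivor space in use (S0 or S1)
    List<Obj> to = new ArrayList<>();      // empty survivor space
    final List<Obj> old = new ArrayList<>();
    final int edenCapacity;
    final int tenuringThreshold;

    GenerationalDemo(int edenCapacity, int tenuringThreshold) {
        this.edenCapacity = edenCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    void allocate(Obj o) {
        if (eden.size() == edenCapacity) minorGc();    // Eden is full
        eden.add(o);
    }

    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (o.gcsToLive <= 0) continue;             // unreachable: left behind and reclaimed
            o.gcsToLive--;
            o.age++;                                    // survived one more Minor GC
            if (o.age >= tenuringThreshold) old.add(o); // promotion to the old generation
            else to.add(o);                             // copy into the empty survivor space
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from;                           // S0 and S1 swap roles
        from = to;
        to = tmp;
    }

    static List<String> names(List<Obj> objs) {
        List<String> out = new ArrayList<>();
        for (Obj o : objs) out.add(o.name);
        return out;
    }

    public static void main(String[] args) {
        GenerationalDemo heap = new GenerationalDemo(2, 2);
        heap.allocate(new Obj("a", 0));   // dies before the first Minor GC
        heap.allocate(new Obj("b", 5));   // long-lived
        heap.allocate(new Obj("c", 5));   // triggers Minor GC #1
        heap.allocate(new Obj("d", 0));
        heap.allocate(new Obj("e", 0));   // triggers Minor GC #2: b reaches age 2
        System.out.println("eden=" + names(heap.eden) + " survivor=" + names(heap.from)
                + " old=" + names(heap.old)); // eden=[e] survivor=[c] old=[b]
    }
}
```

With this allocation sequence, a dies before the first Minor GC, b survives two Minor GCs and is promoted, and c is left in the survivor space.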

References, books and links

1. "The operating mechanism and principle of the JVM classic garbage collector", Kang Zhixing's Blog

2. "In-depth understanding of Java virtual machine"

3. "Algorithm and Implementation of Garbage Collection"

 


Source: blog.csdn.net/xiaofeng10330111/article/details/130456197