JVM (8)-Garbage Collection Algorithm and Garbage Collector

1. Overview

The book In-Depth Understanding of the Java Virtual Machine says: "Between Java and C++ there is a high wall built from dynamic memory allocation and garbage collection. People outside the wall want to get in, while people inside want to get out."

Java already automates dynamic memory allocation and reclamation. But when we need to troubleshoot memory overflows and memory leaks, or when garbage collection becomes the bottleneck that keeps a system from reaching higher concurrency, we have to monitor and tune these "automated" mechanisms ourselves.

1.1 What is garbage

Garbage is any object that is no longer referenced by any pointer in the running program; such objects need to be reclaimed. If garbage is not cleaned up in time, the memory it occupies stays reserved until the application ends and cannot be used by other objects, which can eventually cause a memory overflow.

1.2 Memory overflow or memory leak

  • Memory overflow: the program requests memory, but not enough is available, so an OutOfMemoryError (OOM) is thrown. For example, a plate can only hold 5 apples; forcing a 6th apple onto it causes an overflow.
  • Memory leak: the program cannot release memory that it has already requested. A single leak may go unnoticed, but leaks accumulate and eventually lead to serious memory problems. For example, if the key to a cabinet is locked inside the cabinet, the contents can no longer be reached, yet the cabinet still takes up space.
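The leak case can be illustrated with a minimal sketch. The class and method names below (LeakSketch, handleRequestLeaky) are hypothetical and exist only for this illustration: a static list keeps every buffer reachable, so the collector can never reclaim them, even though the program no longer needs them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names are illustrative only.
class LeakSketch {
    // A static collection stays reachable for the life of the class,
    // so everything added here stays alive until it is explicitly removed.
    private static final List<byte[]> CACHE = new ArrayList<>();

    // "Handles a request" but never releases its buffer: a memory leak.
    static void handleRequestLeaky(int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        CACHE.add(buffer); // reference is kept forever
    }

    static int retainedBuffers() {
        return CACHE.size();
    }

    // Dropping the references makes the buffers collectable again.
    static void release() {
        CACHE.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            handleRequestLeaky(1024);
        }
        System.out.println("buffers still reachable: " + retainedBuffers()); // 100
        release();
        System.out.println("after release: " + retainedBuffers());           // 0
    }
}
```

If handleRequestLeaky were called without bound, the retained buffers would eventually fill the heap and the leak would end in a memory overflow.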

1.3 Garbage collection area

Garbage collection focuses on the heap. In terms of frequency:

  • Young area: collected frequently
  • Old area: collected less often
  • Perm area (the permanent generation, replaced by Metaspace in JDK 8): rarely collected

2. Determine whether the object is garbage

2.1 Reference counting method

Add a reference counter to each object. Whenever a new reference points to the object, the counter is increased by 1; whenever a reference becomes invalid, the counter is decreased by 1. An object whose counter is 0 is garbage.

Java does not use this method, because reference counting cannot handle circular references: objects that reference each other never reach a count of 0, even when nothing else can reach them. For example, look at the following code:

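public class ReferenceCountingGC {

    public ReferenceCountingGC instance = null;
    // Occupies some memory so that reclamation is visible in the GC log
    private byte[] bytes = new byte[1024 * 1024];

    public static void main(String[] args) {
        ReferenceCountingGC A = new ReferenceCountingGC();
        ReferenceCountingGC B = new ReferenceCountingGC();
        // The two objects reference each other, forming a cycle
        A.instance = B;
        B.instance = A;
        // Drop the only outside references; the cycle is now unreachable
        A = null;
        B = null;
        System.gc();    // Request a GC manually
    }
}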

Run it with the -XX:+PrintGCDetails parameter to print GC information:

image-20201225115237310

The log shows that the two objects are still reclaimed despite the circular reference, so the JVM is clearly not relying on reference counting to decide whether an object is alive.
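To see why a pure reference-counting scheme would get stuck on such a cycle, here is a toy simulation. It is only a sketch: the Node class and its refCount field are invented for illustration and have nothing to do with how HotSpot manages objects.

```java
// Toy model of reference counting; this is NOT how HotSpot manages memory.
class RefCountSketch {
    static final class Node {
        final String name;
        int refCount;   // number of references currently pointing at this node
        Node instance;  // outgoing reference, like the field in the example above

        Node(String name) {
            this.name = name;
        }
    }

    // Point from.instance at to, keeping the counters up to date.
    static void link(Node from, Node to) {
        if (from.instance != null) {
            from.instance.refCount--;
        }
        from.instance = to;
        if (to != null) {
            to.refCount++;
        }
    }

    // Builds the A <-> B cycle, drops both local references, returns the counters.
    static int[] countsAfterDroppingLocals() {
        Node a = new Node("A");
        Node b = new Node("B");
        a.refCount++; // local variable A
        b.refCount++; // local variable B
        link(a, b);   // A.instance = B
        link(b, a);   // B.instance = A
        a.refCount--; // A = null
        b.refCount--; // B = null
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingLocals();
        // Neither counter reaches zero, so a counting collector would never free the pair.
        System.out.println("A=" + counts[0] + ", B=" + counts[1]); // A=1, B=1
    }
}
```

Both counters stay at 1 because each object is still referenced by the other, which is exactly the case a counting collector cannot reclaim.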

2.2 Root search algorithm

The basic idea of this algorithm is to take a set of objects called "GC Roots" as starting points and search downward from these nodes. The path traversed by the search is called a reference chain. When no reference chain connects an object to any GC Root (in graph-theory terms, the object is unreachable from the GC Roots), the object is considered unusable.

image-20201225115850736

For example, in the above figure, object5, object6, and object7 are judged as unreachable objects and considered garbage.

The objects that can serve as GC Roots include the following:

  • Objects referenced from the virtual machine stack (the local variables in stack frames)
  • Objects referenced by static class fields in the method area
  • Objects referenced by constants in the method area
  • Objects referenced from the native method stack (JNI)
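The root search can be sketched as a plain graph traversal. The example below is a simplified model, not JVM code: objects are represented by names, the reference graph is a hypothetical one loosely resembling the figure, and object1 stands in for an object held directly by a GC Root.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy reachability analysis over a graph of object names (illustrative only).
class ReachabilitySketch {

    // Returns the objects that no reference chain connects to any root.
    static Set<String> unreachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (marked.add(current)) {
                // First visit: follow every outgoing reference of this object
                pending.addAll(refs.getOrDefault(current, Collections.<String>emptyList()));
            }
        }
        Set<String> garbage = new TreeSet<>(refs.keySet());
        garbage.removeAll(marked);
        return garbage;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("object1", Arrays.asList("object2", "object3"));
        refs.put("object2", Arrays.asList("object4"));
        refs.put("object3", Collections.<String>emptyList());
        refs.put("object4", Collections.<String>emptyList());
        // object5 references object6 and object7, but nothing reachable references object5
        refs.put("object5", Arrays.asList("object6", "object7"));
        refs.put("object6", Collections.<String>emptyList());
        refs.put("object7", Collections.<String>emptyList());

        System.out.println(unreachable(refs, Arrays.asList("object1")));
        // prints [object5, object6, object7]
    }
}
```

Because the traversal starts from the roots rather than counting incoming references, objects that only reference each other are still reported as garbage.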

2.3 Finalization mechanism

The Java language provides an object finalization mechanism that lets developers run custom logic before an object is destroyed. When the garbage collector finds that an object is no longer referenced, and the object overrides finalize() and that method has not been called yet, the collector calls finalize() before reclaiming the object.

If the object is judged to need finalize(), it is placed in a queue named F-Queue. Later, a low-priority Finalizer thread created automatically by the virtual machine executes the method. This can resurrect the object, because finalize() can reattach the object to a GC Root reference chain.

public class FinalizeEscapeGC {

    private static FinalizeEscapeGC finalizeEscapeGC = null;

    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println("finalize method invoke");
        // Re-establish a reference from a static field: the object is resurrected
        FinalizeEscapeGC.finalizeEscapeGC = this;
    }

    public static void main(String[] args) throws InterruptedException {
        finalizeEscapeGC = new FinalizeEscapeGC();
        finalizeEscapeGC = null;
        System.gc();
        // Sleep to give the low-priority Finalizer thread time to run
        Thread.sleep(500);
        if (finalizeEscapeGC == null) {
            System.out.println("finalizeEscapeGC is not alive");
        } else {
            System.out.println("finalizeEscapeGC is still alive");
        }
        // finalize() is only called once, so this time the object cannot save itself
        finalizeEscapeGC = null;
        System.gc();
        Thread.sleep(500);
        if (finalizeEscapeGC == null) {
            System.out.println("finalizeEscapeGC is not alive");
        } else {
            System.out.println("finalizeEscapeGC is still alive");
        }
    }
}

image-20201225134425587

Note: the finalize() method of any object is called at most once by the virtual machine!

3. Garbage collection algorithms

3.1 Mark-sweep algorithm

The algorithm has two stages: marking and sweeping. The marking stage uses the reachability analysis described above to determine which objects are garbage. Once marking is complete, all marked objects are reclaimed in a single pass, as shown in the figure below.

image-20201225135314771

Advantages: It is the basis of the later algorithms, and it is relatively simple to implement.

Disadvantages:

  • Time: neither marking nor sweeping is very efficient
  • Space: sweeping leaves behind a large number of discontinuous memory fragments. With too much fragmentation, a large object may fail to find enough contiguous space even though the total free memory is sufficient
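The fragmentation problem can be made concrete with a toy heap. This is only a sketch under simplifying assumptions: the heap is an array of equally sized slots, null means a free slot, and the marked array is assumed to come from a prior marking stage.

```java
import java.util.Arrays;

// Toy mark-sweep over an array of slots; null means the slot is free.
class MarkSweepSketch {

    // Sweep: free every slot whose object was not marked as live.
    static String[] sweep(String[] heap, boolean[] marked) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!marked[i]) {
                result[i] = null;
            }
        }
        return result;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "b", "c", "d", "e", "f", "g", "h"};
        boolean[] marked = {true, false, true, false, true, false, true, false};

        String[] swept = sweep(heap, marked);
        System.out.println(Arrays.toString(swept));
        // prints [a, null, c, null, e, null, g, null]
        System.out.println("largest contiguous free run: " + largestFreeRun(swept));
        // prints largest contiguous free run: 1
    }
}
```

Four slots are free after the sweep, yet an object that needs two adjacent slots cannot be placed, which is the space problem described above.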

3.2 Copying algorithm

The available memory is divided into two halves of equal size, and only one half is used at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleaned up in one go. Each collection therefore reclaims an entire half, and allocation does not have to deal with complications such as memory fragmentation.

image-20201225155317466

Advantages:

  • No separate mark and sweep passes; simple to implement and efficient to run
  • Surviving objects are copied into contiguous space, so there is no fragmentation problem

Disadvantages:

  • Only half of the memory is usable at any time
  • If many objects survive, a large number of objects have to be copied

The copying algorithm is generally used to collect the young generation.
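A minimal sketch of the copying idea, again on a toy heap of equally sized slots. The from/to naming and the live array are assumptions made for illustration; a real collector discovers live objects by tracing from the GC Roots while it copies.

```java
import java.util.Arrays;

// Toy semi-space copy: live objects move from the "from" half to the "to" half.
class CopyingSketch {

    // Copies live objects into a fresh space, packed at the start.
    static String[] copyLive(String[] from, boolean[] live) {
        String[] to = new String[from.length];
        int next = 0; // next free slot in the target space
        for (int i = 0; i < from.length; i++) {
            if (from[i] != null && live[i]) {
                to[next++] = from[i];
            }
        }
        return to;
    }

    public static void main(String[] args) {
        String[] from = {"a", "b", "c", "d", "e", "f", "g", "h"};
        boolean[] live = {true, false, true, false, true, false, true, false};

        String[] to = copyLive(from, live);
        Arrays.fill(from, null); // the old half is wiped in one go

        System.out.println(Arrays.toString(to));
        // prints [a, c, e, g, null, null, null, null]
    }
}
```

The survivors end up contiguous, so the free space is one block and new objects can be allocated by simply advancing a pointer. The cost is that the second array has to exist, which is the "half of the memory" disadvantage.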

3.3 Mark-compact algorithm

The first step is the same as in the mark-sweep algorithm. The next step, however, does not reclaim the garbage objects in place; instead it moves the surviving objects toward one end of the memory and then clears everything beyond the boundary of that end.

image-20201225155903016

Advantages:

  • Eliminates the scattered free memory left by the mark-sweep algorithm. To allocate memory for a new object, the JVM only needs to keep track of a single starting address.
  • Eliminates the cost of halving the memory in the copying algorithm

Disadvantages:

  • In terms of efficiency, the mark-compact algorithm is slower than the copying algorithm.
  • When an object is moved, every reference that points to it must be updated to the new address.
  • User threads must be suspended for the entire move, that is, a stop-the-world (STW) pause.
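The compaction step and the reference updates it requires can be sketched as follows. This is a simplified model with equally sized slots and a hypothetical forwarding table; real collectors work with addresses and object headers rather than array indices.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy mark-compact: slide live objects to the start of the heap in place.
class MarkCompactSketch {

    // Pass 1: compute the new slot (forwarding address) of every live object.
    static Map<Integer, Integer> forwardingAddresses(boolean[] live) {
        Map<Integer, Integer> forwarding = new LinkedHashMap<>();
        int next = 0;
        for (int i = 0; i < live.length; i++) {
            if (live[i]) {
                forwarding.put(i, next++);
            }
        }
        return forwarding;
    }

    // Pass 2: move the objects, then clear everything past the boundary.
    static void compact(String[] heap, Map<Integer, Integer> forwarding) {
        // Entries are in ascending order and new <= old, so in-place moves are safe
        for (Map.Entry<Integer, Integer> e : forwarding.entrySet()) {
            heap[e.getValue()] = heap[e.getKey()];
        }
        Arrays.fill(heap, forwarding.size(), heap.length, null);
    }

    public static void main(String[] args) {
        String[] heap = {"a", "b", "c", "d", "e", "f", "g", "h"};
        boolean[] live = {true, false, true, false, true, false, true, false};
        int referenceToE = 4; // some other object "points at" slot 4

        Map<Integer, Integer> forwarding = forwardingAddresses(live);
        compact(heap, forwarding);
        referenceToE = forwarding.get(referenceToE); // the reference must be updated too

        System.out.println(Arrays.toString(heap));
        // prints [a, c, e, g, null, null, null, null]
        System.out.println("e now lives in slot " + referenceToE);
        // prints e now lives in slot 2
    }
}
```

The extra pass over the references is what makes this approach slower than copying, and it is also why the application has to be paused while objects move.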

3.4 Generational collection algorithm

No algorithm is best in every situation; each of the previous algorithms has its own advantages and disadvantages. The generational collection algorithm combines them.

Different objects have different life cycles, so objects at different stages of life can be handled by different collection algorithms. The Java heap is divided into a young generation and an old generation so that each can use the algorithm that suits its characteristics, which improves collection efficiency.

Young generation: compared with the old generation, the area is relatively small, objects are short-lived, the survival rate is low, and collections are frequent. The young generation therefore uses the copying algorithm, and HotSpot's design with two survivor spaces alleviates the copying algorithm's low memory utilization.

Old generation: the memory area is large, objects are long-lived, and the survival rate is high. It generally uses mark-sweep, mark-compact, or a hybrid of the two; different collectors implement this differently.
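To check how a running JVM actually splits its heap into generations, the standard java.lang.management API can list the heap memory pools. This is a minimal sketch; the class name HeapPoolsSketch is made up for illustration, and the pool names printed depend on the collector in use (with the JDK 8 defaults they are typically "PS Eden Space", "PS Survivor Space" and "PS Old Gen").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Lists the heap memory pools of the JVM this program runs in.
class HeapPoolsSketch {

    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Skip non-heap pools such as Metaspace or the code cache
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Output varies with the JVM version and the selected collector
        for (String name : heapPoolNames()) {
            System.out.println(name);
        }
    }
}
```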

4. Garbage collectors

Garbage collection algorithms are ideas, and concrete garbage collectors put those ideas into practice. As the JDK has evolved, garbage collector technology has kept evolving too. This article only covers the classic collectors up to and including JDK 8; collectors introduced in later versions are covered together with the new features of the corresponding version.

4.1 Four types

JDK 8 has 7 garbage collectors (unless stated otherwise, everything below refers to JDK 1.8), which can be divided into 4 types:

  • Serial Garbage Collector (Serial)
  • Parallel Garbage Collector (Parallel)
  • Concurrent Garbage Collector (CMS)
  • G1 garbage collector

4.1.1 Serial Garbage Collector (Serial)

  • Uses a single thread for garbage collection
  • Exclusive garbage collection

While the serial collector runs, all threads of the Java application are suspended (stop-the-world, STW) until the collection completes, which can make for a poor user experience.

4.1.2 Parallel Garbage Collector (Parallel)

  • Uses multiple threads for garbage collection
  • Exclusive garbage collection

The parallel garbage collector is mostly used on multi-core CPUs, where it can exploit the hardware's parallelism. Like the serial collector, it stops the Java application while it works, but its pauses are shorter than the serial collector's. On a single-core CPU, however, it may perform worse than the serial collector.

4.1.3 Concurrent Garbage Collector (CMS)

CMS stands for Concurrent Mark Sweep. As the name suggests, it uses the mark-sweep algorithm. While the CMS collector is working, the Java application is generally allowed to keep running, although some specific steps of CMS still require the application to stop. On the whole, though, CMS is not exclusive.

4.1.4 G1 Garbage Collector

The G1 collector is designed as a server-side garbage collector and is expected to outperform the CMS collector in both throughput and pause control. Unlike CMS, G1 is based on the mark-compact algorithm, so it does not produce space fragmentation and needs no exclusive defragmentation pass after a collection. G1 can also control pauses very precisely.

4.2 Classic Garbage Collector

By the area they work on, the seven garbage collectors can be divided into young-generation collectors, old-generation collectors, and the G1 collector.

  • Young-generation collectors:

    • Serial collector
    • ParNew collector
    • Parallel Scavenge collector
  • Old-generation collectors:

    • CMS collector
    • Serial Old collector
    • Parallel Old collector
  • G1 collector

image-20210101200625872

The figure shows the seven collectors and the generations they work on. A line between two collectors means that they can be used together. The area in which a collector is drawn indicates whether it is a young-generation or an old-generation collector.
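Which pair of collectors a given JVM is actually running with can be queried at runtime through the standard GarbageCollectorMXBean interface. The sketch below assumes nothing beyond that API; the class name ActiveCollectorsSketch is illustrative, and the bean names are JVM-specific labels (with the JDK 8 defaults they are typically "PS Scavenge" and "PS MarkSweep").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Prints the collectors active in the current JVM and how much work they have done.
class ActiveCollectorsSketch {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "not available"
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

Running the same program with different -XX:+Use...GC parameters is a quick way to confirm which combination a parameter selects.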

4.2.1 Serial collector (new generation)

The Serial collector is a single-threaded garbage collector. While it collects, all other worker threads must be suspended until the collection is finished (a stop-the-world pause, STW), as shown in the figure below.

image-20210101201740761

The Serial collector is the most basic and oldest garbage collector. It is simple and, with no thread-interaction overhead, efficient for single-threaded collection. It is still the default young-generation collector when the HotSpot virtual machine runs in client mode.

Note that this applies to client mode only; most machines today run in server mode.

image-20210101202150970

You can also enable the Serial collector explicitly with the following JVM parameter:

  • -XX:+UseSerialGC

Note that because the JVM uses generational collection, -XX:+UseSerialGC selects the Serial collector for the young generation and, by default, pairs it with the Serial Old collector for the old generation.

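// JVM options: -Xms10m -Xmx10m -XX:+PrintGCDetails -XX:+PrintCommandLineFlags -XX:+UseSerialGC
// SerialGCDemo is a placeholder class name
public class SerialGCDemo {
    public static void main(String[] args) throws InterruptedException {
        System.out.println("*************");
        // 50 MB cannot fit in the 10 MB heap, so the allocation fails with
        // OutOfMemoryError; the GC log shows which collectors were used
        byte[] bytes = new byte[50 * 1024 * 1024];
    }
}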

image-20210101203745282

4.2.2 ParNew collector (new generation)

The ParNew collector is essentially a multi-threaded, parallel version of the Serial collector. Its most common use is to work alongside the CMS collector in the old generation. Otherwise it behaves like the Serial collector: it also has to suspend all other worker threads while collecting. It is the default young-generation collector for many Java virtual machines running in server mode.

image-20210101204306911

Enabling parameter:

  • -XX:+UseParNewGC

image-20210101205405651

With this parameter, the young generation uses the ParNew collector and the old generation uses the Serial Old collector. The console also prints a warning in red, as shown in the figure above. It says that the ParNew + Serial Old combination will be removed in a future release, which matches the collector combination diagram shown earlier. From JDK 9 on, the ParNew + Serial Old combination is no longer supported and the -XX:+UseParNewGC parameter was withdrawn.

image-20210101210154229

4.2.3 Parallel Scavenge collector (new generation)

The Parallel Scavenge collector is similar to ParNew: it is also a young-generation collector, uses the copying algorithm, and collects in parallel with multiple threads. The difference is its goal: Parallel Scavenge aims to achieve a controllable throughput, where throughput is defined as follows:

Throughput = time to run user code / (time to run user code + time to run garbage collection)

image-20210102102122700

The adaptive sizing strategy of the Parallel Scavenge collector is another important difference from ParNew: the virtual machine collects performance monitoring data while the system runs and dynamically adjusts its parameters to provide the most suitable pause time or the maximum throughput.

Enabling parameter:

  • -XX:+UseParallelGC

In a JDK 1.8 server environment, the default collectors are Parallel Scavenge for the young generation and Parallel Old for the old generation.

image-20210102103230141

The figure above shows the output without the -XX:+UseParallelGC parameter. The collectors used by default are Parallel Scavenge + Parallel Old.

The Parallel Scavenge collector has some tuning parameters that steer its adaptive adjustment:

  • -XX:MaxGCPauseMillis: sets the maximum garbage collection pause time. The value must be greater than 0 ms. A smaller value does not make garbage collection faster overall: shorter pauses are bought at the expense of throughput and young-generation space.

  • -XX:GCTimeRatio: sets the throughput target. The value N is an integer greater than 0 and less than 100 and expresses the ratio of application time to garbage collection time, so garbage collection may take at most 1/(1+N) of the total time. For example, with a value of 19, garbage collection may take at most 5% of the total time (1/(1+19)); the default value is 99, which allows at most 1% (1/(1+99)).
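The throughput definition and the GCTimeRatio arithmetic above can be checked with a few lines of code. This is just a worked calculation; the class and method names are invented for the example and the timings are made-up numbers.

```java
import java.util.Locale;

// Worked arithmetic for the throughput formula and -XX:GCTimeRatio.
class ThroughputSketch {

    // Throughput = user time / (user time + GC time)
    static double throughput(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    // Largest share of total time that GC may take for a given GCTimeRatio value
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 99 ms of user code and 1 ms of GC out of every 100 ms
        System.out.println(String.format(Locale.ROOT, "throughput = %.2f", throughput(99, 1)));
        // prints throughput = 0.99
        System.out.println(String.format(Locale.ROOT, "GCTimeRatio=19 -> %.2f", maxGcFraction(19)));
        // prints GCTimeRatio=19 -> 0.05
        System.out.println(String.format(Locale.ROOT, "GCTimeRatio=99 -> %.2f", maxGcFraction(99)));
        // prints GCTimeRatio=99 -> 0.01
    }
}
```

A GCTimeRatio of 99 therefore corresponds to a throughput target of 99%, and 19 to a target of 95%.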

4.2.4 Serial Old collector (old generation)

The Serial Old collector is the old-generation version of the Serial collector. It is also single-threaded and uses the mark-compact algorithm. Its main purpose is likewise to serve the HotSpot virtual machine in client mode.

In server mode it has two further uses: in JDK 5 and earlier it is paired with the Parallel Scavenge collector, and it serves as the fallback for the CMS collector when a Concurrent Mode Failure occurs during concurrent collection.

image-20210102110113135

It is enabled with the same parameter as the Serial collector.

4.2.5 Parallel Old collector (old generation)

Parallel Old is the old-generation version of the Parallel Scavenge collector. It collects in parallel with multiple threads and is based on the mark-compact algorithm.

image-20210102110628693

4.2.6 CMS collector (old generation)

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause. A large share of today's Java applications run on the server side of Internet websites or browser-based B/S systems. Such applications care most about response time and want pauses to be as short as possible to give users a good interactive experience. The CMS collector suits these needs well.

A CMS collection has 4 stages:

  • Initial mark: stop-the-world. It only marks the objects directly reachable from the GC Roots (the first step of reachability analysis).
  • Concurrent mark: starting from the objects marked initially, traverses the entire object graph. This stage takes a long time but does not pause user threads.
  • Remark: corrects the marks of objects whose references changed because user threads kept running during concurrent marking. This stage is stop-the-world.
  • Concurrent sweep: reclaims the objects judged dead during marking. This stage also runs concurrently with user threads.

image-20210102115404020

As the four stages show, the most time-consuming stages of CMS run concurrently with user threads, so on the whole the CMS collection process is concurrent with the application.

Enabling parameter:

  • -XX:+UseConcMarkSweepGC

image-20210102120526147

The output shows that when -XX:+UseConcMarkSweepGC is set, the JVM uses the ParNew + CMS combination.

Disadvantages:

  • The CMS collector is very sensitive to CPU resources. During the concurrent stages it does not pause user threads, but it takes up some CPU threads, which slows the application down and lowers total throughput. To mitigate this, the virtual machine offers a CMS variant called the incremental concurrent collector (Incremental Concurrent Mark Sweep, i-CMS). It borrows the idea that PC operating systems used in the single-core era, when preemptive multitasking simulated parallel execution: during concurrent marking and sweeping, the collector threads and user threads take turns running, which minimizes the time the collector threads monopolize resources. The whole collection takes longer, but the impact on the user program is smaller; the slowdown lasts longer but is less pronounced.

  • The CMS collector cannot handle floating garbage. While CMS runs its concurrent stages, user threads keep producing new garbage, and that garbage can only be cleaned up in the next collection.

  • The CMS collector is based on the mark-sweep algorithm, so it produces a lot of space fragmentation.

4.2.7 G1 Garbage Collector

The G1 garbage collector is a milestone compared with the previous six collectors. For those earlier collectors:

  • The young generation and the old generation are separate, contiguous memory blocks
  • Young-generation collection uses the copying algorithm
  • Old-generation collection must scan the entire old generation

The G1 garbage collector is a server-oriented collector designed to replace the CMS collector. It keeps the advantages of CMS and improves on it in the following respects:

  • G1 does not produce a lot of memory fragmentation
  • G1's STW pauses are more controllable: it adds a prediction mechanism for pause times, and users can specify the expected pause time

The earlier collectors divide a contiguous memory space into the young generation, the old generation, and the metaspace, so the memory allocated to each is contiguous.

image-20210103192314051

In G1, however, the storage of each generation is not contiguous. G1 divides the heap into regions of equal size, each of which occupies a contiguous range of memory addresses. Each generation consists of a number of such regions that need not be adjacent, and any region can hold either young-generation or old-generation objects.

image-20210103192601750

In addition, some regions in the above figure are marked with H, which stands for Humongous, which means that these regions store huge objects.

A G1 collection has 4 steps:

  • Initial mark: only marks the objects directly reachable from the GC Roots; user threads must be paused

  • Concurrent mark: starting from the GC Roots, performs reachability analysis on the heap, recursively scanning the whole object graph to find the objects to reclaim. This stage takes a long time but runs concurrently with the user program. After the scan, the objects recorded by SATB (snapshot-at-the-beginning) whose references changed during concurrent marking must be reprocessed

  • Final mark: corrects the marks of objects that changed during concurrent marking because the program kept running; user threads must be paused

  • Screening and reclaiming: ranks the regions by reclaim value and cost, then reclaims the most valuable regions that fit within the expected pause time

image-20210103193616808
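The screening and reclaiming step can be pictured as choosing regions under a pause-time budget. The sketch below is a deliberately simplified greedy model, not G1's actual heuristics: the Region class, its cost estimates, and the choose method are all hypothetical, and the numbers are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of picking regions to collect within a pause budget (not G1's real logic).
class CollectionSetSketch {

    static final class Region {
        final String name;
        final int reclaimableKb;  // garbage that collecting this region would free
        final double costMillis;  // estimated pause time to collect this region

        Region(String name, int reclaimableKb, double costMillis) {
            this.name = name;
            this.reclaimableKb = reclaimableKb;
            this.costMillis = costMillis;
        }

        double efficiency() {
            return reclaimableKb / costMillis; // KB freed per millisecond of pause
        }
    }

    // Greedy pick: most garbage per millisecond first, while the budget allows.
    static List<String> choose(List<Region> regions, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));
        List<String> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMillis <= pauseBudgetMillis) {
                chosen.add(r.name);
                spent += r.costMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("R1", 900, 3),  // 300 KB/ms
                new Region("R2", 200, 2),  // 100 KB/ms
                new Region("R3", 800, 4),  // 200 KB/ms
                new Region("R4", 100, 5)); //  20 KB/ms
        System.out.println(choose(regions, 8)); // prints [R1, R3]
    }
}
```

With an 8 ms budget the two most rewarding regions are collected and the rest are left for a later cycle, which is how region-based collection trades completeness for predictable pauses.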

Enabling parameter:

  • -XX:+UseG1GC

image-20210103194024541

Origin blog.csdn.net/weixin_44706647/article/details/115193435