Talking about garbage collection in the Java virtual machine

I recently took the "In-depth Dismantling of the Java Virtual Machine" course on Geek Time, so I wrote up my study notes and expanded on the related content with some thoughts of my own.

1. How do we tell whether an object is alive or dead?

There are two ways to identify dead objects: reference counting and reachability analysis.

1.1 Reference counting method

Implementation: Add a reference counter to each object that counts the references pointing to it. Once an object's reference counter drops to 0, the object is dead and can be reclaimed.

Disadvantages:
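1. It needs extra space to store the counter, and the counter must be updated on every reference assignment, which is tedious.

2. It has a major flaw: it cannot handle circular references. If objects reference each other in a cycle and nothing else references them, their counters never drop to 0, so they are never reclaimed even though they are dead.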
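To make the cycle problem concrete, here is a minimal sketch of reference counting over a toy object graph. The class RefCounting and the methods addRef, release, and setField are hypothetical names for illustration only; the JVM's mainstream collectors do not work this way. After both local references are dropped, the two objects still point at each other, so each counter stays at 1 and neither is freed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference-counting heap; all names here are hypothetical, for illustration only.
class RefCounting {
    static final List<String> freed = new ArrayList<>();

    static class RcObject {
        final String name;
        int refCount = 0;       // number of references pointing at this object
        RcObject field;         // a single outgoing reference, for simplicity

        RcObject(String name) { this.name = name; }
    }

    // Called whenever a new reference to obj is created.
    static void addRef(RcObject obj) {
        if (obj != null) obj.refCount++;
    }

    // Called whenever a reference to obj is dropped.
    static void release(RcObject obj) {
        if (obj == null) return;
        if (--obj.refCount == 0) {
            freed.add(obj.name);          // counter hit 0: the object is dead
            RcObject child = obj.field;
            obj.field = null;
            release(child);               // dropping its outgoing reference may free more
        }
    }

    // Store a reference into obj.field, keeping the counters up to date.
    static void setField(RcObject obj, RcObject value) {
        addRef(value);
        release(obj.field);
        obj.field = value;
    }

    public static void main(String[] args) {
        RcObject a = new RcObject("a");
        RcObject b = new RcObject("b");
        addRef(a);               // local variable -> a
        addRef(b);               // local variable -> b
        setField(a, b);          // a.field -> b
        setField(b, a);          // b.field -> a, closing the cycle
        release(a);              // drop the local reference to a
        release(b);              // drop the local reference to b
        // Both counters are still 1 because of the cycle, so nothing was freed.
        System.out.println("freed = " + freed + ", a=" + a.refCount + ", b=" + b.refCount);
        // prints: freed = [], a=1, b=1
    }
}
```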

1.2 Reachability analysis

Implementation: This is the approach used by mainstream garbage collectors. The essence of the algorithm is to take a set of GC Roots as the initial live set, then, starting from that set, explore every object the set can reference and add it to the set. This process is also called marking. In the end, any object that was never reached is dead and can be reclaimed.

GC Roots include (but are not limited to) the following:

Local variables in the Java method stack frame;

Static variables of the loaded class;

JNI handles;

Java threads that have been started and have not yet terminated.
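Reachability analysis is essentially a graph traversal from the GC Roots. The sketch below assumes a toy heap where each object is a node with a list of outgoing references and the roots are simply handed in as a list; the class ReachabilityMark and its methods are hypothetical. Unlike reference counting, it correctly finds that a cycle no root can reach is dead.

```java
import java.util.*;

// Toy mark phase over an explicit object graph; names are hypothetical, not a JVM API.
class ReachabilityMark {
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Start from the GC Roots (the initial live set) and add everything reachable.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = new HashSet<>(roots);
        Deque<Node> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            for (Node child : n.refs) {
                if (live.add(child)) {      // first time this object is reached
                    worklist.push(child);
                }
            }
        }
        return live;
    }

    // Every object in the heap that was not marked is dead.
    static List<String> dead(List<Node> heap, List<Node> roots) {
        Set<Node> live = mark(roots);
        List<String> result = new ArrayList<>();
        for (Node n : heap) if (!live.contains(n)) result.add(n.name);
        return result;
    }

    public static void main(String[] args) {
        Node root = new Node("root"), a = new Node("a"), b = new Node("b"), c = new Node("c");
        root.refs.add(a);
        b.refs.add(c);          // b and c form a cycle that no root reaches
        c.refs.add(b);
        List<Node> heap = List.of(root, a, b, c);
        System.out.println(dead(heap, List.of(root)));   // [b, c]
    }
}
```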

Disadvantages: In a multi-threaded environment, other threads may update references in objects that have already been visited while marking is still in progress. This can cause false positives (a reference is set to null, so an object already marked as live is actually dead) or false negatives (a reference is changed to point to an object that has not been visited yet, so a live object is never marked). False positives are harmless: the Java virtual machine at most loses a chance to collect some garbage. False negatives are much more serious, because the garbage collector may reclaim the memory of objects that are still referenced. If a reclaimed object is later accessed through the original reference, the Java virtual machine is likely to crash outright.
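The false-negative case can be replayed sequentially to make it deterministic. The sketch below assumes a hypothetical heap with two root objects r1 and r2 and one object c: the marker scans r1, then the application moves the only reference to c from the not-yet-scanned r2 into the already-scanned r1, and finally the marker scans r2. The names are made up for illustration; real concurrent collectors prevent this with write barriers, and the sketch only shows the problem.

```java
import java.util.*;

// Hypothetical toy model of one bad interleaving between a marking thread and an
// application (mutator) thread, replayed in a single thread for determinism.
class LostObjectDemo {
    static class Obj {
        final String name;
        Obj ref;                     // one reference field
        Obj(String name) { this.name = name; }
    }

    static Set<String> run() {
        Obj r1 = new Obj("r1"), r2 = new Obj("r2"), c = new Obj("c");
        r2.ref = c;                  // initially c is reachable only through r2

        Set<String> marked = new HashSet<>();
        // Marker scans root r1: its field is still null, so nothing else is marked.
        marked.add(r1.name);
        if (r1.ref != null) marked.add(r1.ref.name);

        // Mutator runs in between: stores c into the already-scanned r1
        // and removes it from the not-yet-scanned r2.
        r1.ref = c;
        r2.ref = null;

        // Marker scans root r2: its field is now null.
        marked.add(r2.name);
        if (r2.ref != null) marked.add(r2.ref.name);

        // c is still reachable (r1.ref == c) but was never marked: a false negative.
        return marked;
    }

    public static void main(String[] args) {
        System.out.println(run().contains("c"));   // false
    }
}
```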

2. How to solve this problem?

The usual solution is Stop-the-world: pause the other threads so that nothing can change the object graph while garbage collection runs. Stop-the-world in the Java virtual machine is implemented through the safepoint mechanism. When the Java virtual machine receives a Stop-the-world request, it waits for all threads to reach a safepoint before allowing the thread that requested Stop-the-world to do its exclusive work. I won't go through the implementation details one by one. In summary, safepoint checks are placed at specific points such as method returns (and, in JIT-compiled code, loop back edges). When a thread performs a safepoint check and finds that it has to stop for garbage collection, it blocks first. The checks are not performed after every bytecode mainly because of the overhead.
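The idea of cooperative stopping can be sketched with plain Java monitors. This is a minimal model under stated assumptions, not how HotSpot implements safepoints internally: the class ToySafepoint and its methods are hypothetical, worker threads call poll() once per loop iteration (standing in for a back-edge check), and the requesting thread waits until every worker is blocked before doing its exclusive work.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy cooperative safepoint built on Java monitors. Names are hypothetical.
class ToySafepoint {
    private final Object lock = new Object();
    private volatile boolean safepointRequested = false;
    private volatile boolean running = true;
    private int threadsAtSafepoint = 0;               // guarded by lock

    // Safepoint check, e.g. at a method return or a loop back edge.
    void poll() throws InterruptedException {
        if (!safepointRequested) return;              // fast path: a single volatile read
        synchronized (lock) {
            threadsAtSafepoint++;
            lock.notifyAll();                         // tell the requester this thread arrived
            while (safepointRequested) lock.wait();   // block until the pause is over
            threadsAtSafepoint--;
        }
    }

    // Stop-the-world: wait until every worker is blocked at a safepoint,
    // run the exclusive work, then let the workers resume.
    void stopTheWorld(int workerCount, Runnable exclusiveWork) throws InterruptedException {
        synchronized (lock) {
            safepointRequested = true;
            while (threadsAtSafepoint < workerCount) lock.wait();
            try {
                exclusiveWork.run();
            } finally {
                safepointRequested = false;
                lock.notifyAll();
            }
        }
    }

    // Starts workers, performs one pause, and reports whether all workers stayed frozen during it.
    static boolean demo(int workers) {
        ToySafepoint sp = new ToySafepoint();
        AtomicLong progress = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (sp.running) {
                        progress.incrementAndGet();   // stands in for application code
                        sp.poll();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        boolean[] frozen = {false};
        try {
            sp.stopTheWorld(workers, () -> {
                long before = progress.get();
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                frozen[0] = progress.get() == before;  // no worker ran during the pause
            });
            sp.running = false;
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            sp.running = false;
        }
        return frozen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(4));   // true
    }
}
```

The fast path in poll() is a single volatile read, which mirrors the point above: a check that runs this often has to be cheap, and placing it after every instruction would still cost too much.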

3. How to perform garbage collection?

Once all the surviving objects are marked, the dead objects can be reclaimed. The mainstream basic reclamation methods fall into three types.

3.1 Sweep

Implementation: Mark the memory occupied by dead objects as free and record it in a free list. When a new object needs to be created, the memory management module searches the free list for free memory and allocates it to the new object.

Disadvantages:

1) It causes memory fragmentation. Because each object in the Java virtual machine heap must occupy a contiguous block of memory, in the extreme case there may be enough free memory in total, yet no single free block is large enough for an allocation.

2) Allocation is inefficient. If the free memory is one contiguous space, memory can be allocated by bumping a pointer. With a free list, however, the Java virtual machine has to visit the list's entries one by one to find a free block large enough for the new object.
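Both drawbacks can be seen in a small model. The sketch below assumes a heap measured in abstract "units" with objects listed in address order; the class SweepFreeList and its methods are hypothetical. Sweeping turns dead objects into free-list blocks (merging neighbours), and first-fit allocation walks the list. With two live objects sitting between two dead ones, 4 units are free but a 3-unit allocation still fails.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sweep + free-list allocator over a heap of "units"; all names are hypothetical.
class SweepFreeList {
    record Obj(int start, int size, boolean live) {}
    record Block(int start, int size) {}

    // Sweep: turn the space of every dead object into a free-list entry,
    // merging neighbours so adjacent dead objects form one block.
    static List<Block> sweep(List<Obj> heapInAddressOrder) {
        List<Block> free = new ArrayList<>();
        for (Obj o : heapInAddressOrder) {
            if (o.live()) continue;
            int last = free.size() - 1;
            if (last >= 0 && free.get(last).start() + free.get(last).size() == o.start()) {
                Block prev = free.remove(last);
                free.add(new Block(prev.start(), prev.size() + o.size()));
            } else {
                free.add(new Block(o.start(), o.size()));
            }
        }
        return free;
    }

    // First-fit allocation: walk the free list entry by entry.
    // Returns the start address, or -1 if no single block is large enough.
    static int allocate(List<Block> free, int size) {
        for (int i = 0; i < free.size(); i++) {
            Block b = free.get(i);
            if (b.size() >= size) {
                if (b.size() == size) free.remove(i);
                else free.set(i, new Block(b.start() + size, b.size() - size));
                return b.start();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Obj> heap = List.of(
            new Obj(0, 2, false), new Obj(2, 2, true),
            new Obj(4, 2, false), new Obj(6, 2, true));
        List<Block> free = sweep(heap);
        int totalFree = free.stream().mapToInt(Block::size).sum();
        System.out.println(totalFree);          // 4 units free in total...
        System.out.println(allocate(free, 3));  // ...but -1: no contiguous 3-unit block
    }
}
```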

3.2 Compact

Implementation: Move the surviving objects to the start of the memory region, leaving one contiguous block of free memory. This solves the memory fragmentation problem.

Disadvantages: Compaction has a large performance overhead: many objects have to be moved (and the references to them updated), which consumes a lot of resources and time.
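The cost of compaction comes from touching every live object more than once. The following is a minimal sliding-compaction sketch in the style of the classic three-pass Lisp2 algorithm; that choice of algorithm, the class SlidingCompact, and its fields are assumptions for illustration, and updating the GC Roots is omitted. References are stored as addresses so that the reference-rewriting pass is visible.

```java
import java.util.*;

// Toy sliding compaction (three passes, Lisp2 style); all names are hypothetical.
class SlidingCompact {
    static class Obj {
        int addr;                 // current address
        final int size;
        final boolean live;
        final int[] refs;         // addresses of the objects this one references
        Obj(int addr, int size, boolean live, int... refs) {
            this.addr = addr; this.size = size; this.live = live; this.refs = refs;
        }
    }

    // Returns the new allocation top: everything above it is one free block.
    static int compact(List<Obj> heapInAddressOrder) {
        // Pass 1: compute forwarding addresses by sliding live objects down.
        Map<Integer, Integer> forward = new HashMap<>();
        int free = 0;
        for (Obj o : heapInAddressOrder) {
            if (o.live) { forward.put(o.addr, free); free += o.size; }
        }
        // Pass 2: rewrite every reference held by a live object (live objects only point to live ones).
        for (Obj o : heapInAddressOrder) {
            if (!o.live) continue;
            for (int i = 0; i < o.refs.length; i++) o.refs[i] = forward.get(o.refs[i]);
        }
        // Pass 3: move the objects (in this model, just update their addresses).
        for (Obj o : heapInAddressOrder) {
            if (o.live) o.addr = forward.get(o.addr);
        }
        return free;
    }

    public static void main(String[] args) {
        Obj b = new Obj(2, 3, true, 7);    // b references d
        Obj d = new Obj(7, 1, true, 2);    // d references b
        List<Obj> heap = List.of(new Obj(0, 2, false), b, new Obj(5, 2, false), d);
        int top = compact(heap);
        System.out.println(b.addr + " " + d.addr + " " + b.refs[0] + " " + top);   // 0 3 3 4
        int newObject = top;               // after compaction, allocation is a pointer bump
        top += 2;
        System.out.println(newObject + " " + top);   // 4 6
    }
}
```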

3.3 Copy

Implementation: Divide the memory area into two equal halves, tracked by two pointers, from and to, and allocate memory only from the half the from pointer refers to. When garbage collection occurs, the surviving objects are copied into the half the to pointer refers to, and then the from and to pointers are swapped. Like compaction, copying solves the memory fragmentation problem.

Disadvantages: Space utilization is poor: only half of the memory can be used for allocation, and the other half is always empty.
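The from/to scheme can be sketched as a Cheney-style breadth-first copy, where the to-space itself serves as the work queue. This is a toy model under assumptions: the class SemispaceCopy is hypothetical, each half is represented as a Java list instead of raw memory, and a forwardedTo field plays the role of the forwarding pointer left behind in from-space.

```java
import java.util.*;

// Toy semispace copying collector (Cheney-style breadth-first copy); names hypothetical.
class SemispaceCopy {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj forwardedTo;               // set once the object has been copied
        Obj(String name) { this.name = name; }
    }

    List<Obj> from = new ArrayList<>();   // the half currently used for allocation
    List<Obj> to = new ArrayList<>();     // the half that stays empty between collections

    Obj allocate(String name) {
        Obj o = new Obj(name);
        from.add(o);
        return o;
    }

    private Obj copy(Obj o) {
        if (o.forwardedTo == null) {      // not copied yet: copy it and leave a forwarding pointer
            Obj c = new Obj(o.name);
            c.refs.addAll(o.refs);        // still points into from-space; fixed during the scan
            to.add(c);
            o.forwardedTo = c;
        }
        return o.forwardedTo;
    }

    // Copies everything reachable from the roots into to-space, then swaps from and to.
    List<Obj> collect(List<Obj> roots) {
        List<Obj> newRoots = new ArrayList<>();
        for (Obj r : roots) newRoots.add(copy(r));
        for (int scan = 0; scan < to.size(); scan++) {   // to-space doubles as the BFS queue
            Obj c = to.get(scan);
            for (int i = 0; i < c.refs.size(); i++) c.refs.set(i, copy(c.refs.get(i)));
        }
        List<Obj> tmp = from;             // swap the roles of the two halves
        from = to;
        to = tmp;
        to.clear();                       // the old from-space is now entirely free
        return newRoots;
    }

    public static void main(String[] args) {
        SemispaceCopy heap = new SemispaceCopy();
        Obj a = heap.allocate("a"), b = heap.allocate("b");
        heap.allocate("garbage");
        a.refs.add(b);
        b.refs.add(a);                    // cycles are handled by the forwarding pointers
        heap.collect(List.of(a));
        System.out.println(heap.from.size());   // 2: only a and b survived
    }
}
```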

4. Introduction to JVM Garbage Collection

The Java virtual machine can use different collection algorithms for different generations. For the young generation, the assumption is that most Java objects live only briefly, so a fast collection algorithm can be run frequently to reclaim most garbage while it is still in the young generation. For the old generation, the assumption is that most garbage has already been collected in the young generation and that old-generation objects will very likely stay alive. When an old-generation collection is actually triggered, either this assumption was wrong or the heap space has been exhausted.

5. Java virtual machine heap division

The Java virtual machine divides the heap into a young generation and an old generation. The young generation is further divided into an Eden space and two Survivor spaces of the same size, as shown below:

[Figure: heap layout with the Eden space, two Survivor spaces, and the old generation]

By default, the Java virtual machine uses a dynamic allocation policy (corresponding to the JVM parameter -XX:+UsePSAdaptiveSurvivorSizePolicy) that dynamically adjusts the ratio between the Eden space and the Survivor spaces based on the object allocation rate and Survivor space usage.

You can also use -XX:SurvivorRatio to fix this ratio (the size of Eden relative to one Survivor space). Note, however, that one of the Survivor spaces is always empty, so the lower the ratio, the more heap space is wasted.
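The waste can be worked out directly. With a SurvivorRatio of R, the young generation is split into R parts of Eden plus one part for each Survivor space, so each Survivor space takes 1/(R+2) of it, and that fraction is always empty. The helper below is a hypothetical sketch that ignores the alignment and rounding a real JVM applies.

```java
// Sketch of how -XX:SurvivorRatio splits the young generation (ratio = Eden : one Survivor).
// Hypothetical helper; real JVMs also round sizes to alignment boundaries.
class SurvivorSizing {
    // Returns {eden, eachSurvivor} in MB for a young generation of youngMb megabytes.
    static long[] split(long youngMb, int survivorRatio) {
        long survivor = youngMb / (survivorRatio + 2);   // Eden = R parts, each Survivor = 1 part
        long eden = youngMb - 2 * survivor;
        return new long[] {eden, survivor};
    }

    // Fraction of the young generation that is always empty (one Survivor space).
    static double wastedFraction(int survivorRatio) {
        return 1.0 / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long[] s = split(100, 8);
        System.out.println(s[0] + " " + s[1]);        // 80 10
        System.out.println(wastedFraction(8));        // 0.1
        System.out.println(wastedFraction(2));        // 0.25
    }
}
```

Lowering the ratio from 8 to 2 raises the always-empty share of the young generation from 10% to 25%.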

6. The mechanism of garbage collection

6.1 Minor GC

Minor GC is triggered when the Eden space is full. How does it use the mark-copy algorithm? See the flow chart below:

[Figure: Minor GC flow chart]

Each decision in the flow chart can actually be configured through parameters, which are omitted here.
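Since the flow chart itself is not reproduced here, the following is a simplified sketch of the usual decisions: live objects from Eden and the "from" Survivor space are copied into the empty "to" Survivor space with their age incremented, and an object is promoted to the old generation when it is old enough or the Survivor space is full. The class MinorGcSketch, its fields, and the numbers used are hypothetical; tenuringThreshold only plays the role of the real -XX:MaxTenuringThreshold flag, and the live flag stands in for the result of marking.

```java
import java.util.*;

// Simplified Minor GC sketch: the decisions reduced to an age threshold and a
// Survivor capacity check. Names and numbers are hypothetical.
class MinorGcSketch {
    static class Obj {
        final String name;
        final int size;
        int age;                 // number of Minor GCs survived
        final boolean live;      // stands in for the result of marking
        Obj(String name, int size, boolean live) { this.name = name; this.size = size; this.live = live; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>();
    List<Obj> toSurvivor = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    final int survivorCapacity;
    final int tenuringThreshold;     // plays the role of -XX:MaxTenuringThreshold

    MinorGcSketch(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    void minorGc() {
        int used = 0;
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (Obj o : candidates) {
            if (!o.live) continue;                   // dead objects are simply left behind
            o.age++;
            if (o.age >= tenuringThreshold || used + o.size > survivorCapacity) {
                old.add(o);                          // promote: old enough, or Survivor is full
            } else {
                toSurvivor.add(o);                   // copy into the empty Survivor space
                used += o.size;
            }
        }
        eden.clear();                                // Eden and the from-Survivor are now empty
        fromSurvivor.clear();
        List<Obj> tmp = fromSurvivor;                // swap the two Survivor spaces
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }

    public static void main(String[] args) {
        MinorGcSketch gc = new MinorGcSketch(10, 2);
        gc.eden.add(new Obj("a", 4, true));
        gc.eden.add(new Obj("b", 4, false));
        gc.minorGc();   // a is copied to a Survivor space with age 1, b is reclaimed
        gc.minorGc();   // a reaches age 2 and is promoted to the old generation
        System.out.println(gc.old.size() + " " + gc.fromSurvivor.size());   // 1 0
    }
}
```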

Benefits of Minor GC:

1) Most objects in Eden are already dead by the time of the collection, so only the few surviving objects need to be marked and copied, which makes the collection very efficient.

2) It does not need to collect the entire heap; it cleans only the young generation, which is efficient.

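Problem:

Objects in the old generation may reference objects in the young generation. This means that when marking live objects, we would also need to scan the old generation: any reference from an old-generation object to a young-generation object must be treated as a GC Root. Scanning the whole old generation on every Minor GC would cancel out the benefit of collecting only the young generation, so HotSpot uses a card table to record which parts of the old generation may contain such references, and a Minor GC scans only those parts.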
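A card table can be sketched as one byte per fixed-size chunk of the old generation, set by a write barrier whenever a reference field in that chunk is stored to. The model below is an assumption-laden toy: the class CardTableSketch and its methods are hypothetical, 512-byte cards match HotSpot's usual card size, but the "1 means dirty" encoding and the integer addresses are made up for readability.

```java
import java.util.*;

// Toy card table for old-to-young references; names and encoding are hypothetical.
class CardTableSketch {
    static final int CARD_SHIFT = 9;                 // 2^9 = 512-byte cards
    final byte[] cards;                              // one byte per card: 1 = dirty, 0 = clean

    CardTableSketch(int oldGenBytes) {
        cards = new byte[(oldGenBytes + (1 << CARD_SHIFT) - 1) >> CARD_SHIFT];
    }

    // Write barrier: runs after every reference store into an old-generation object.
    void postWriteBarrier(int fieldAddress) {
        cards[fieldAddress >> CARD_SHIFT] = 1;
    }

    // At Minor GC, only objects on dirty cards are scanned for references into the young generation.
    List<Integer> dirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) if (cards[i] == 1) dirty.add(i);
        return dirty;
    }

    public static void main(String[] args) {
        CardTableSketch ct = new CardTableSketch(4096);   // 8 cards
        ct.postWriteBarrier(100);      // card 0
        ct.postWriteBarrier(1500);     // card 2
        System.out.println(ct.dirtyCards());   // [0, 2]
    }
}
```

The design trade-off is that every reference store pays a small cost (one byte write) so that each Minor GC can skip the clean parts of the old generation.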

6.2 Full GC

When it is triggered: when objects are promoted to the old generation and the old generation does not have enough space; or when the permanent generation runs out of space while class metadata is being created there. Since JDK 8, however, the permanent generation has been replaced by Metaspace, which lives in native memory, so this kind of shortage is less common.

Process: Different garbage collectors use different algorithms. In outline: an object needs to be allocated in the old generation but there is not enough memory, so the collector reclaims the unreachable objects. If there is still not enough memory after the collection to hold the object, an OutOfMemoryError is thrown.

A Full GC is usually a serious warning sign for a program. Pay close attention to, and prevent, the behaviors that trigger Full GCs, so that the system keeps running normally.

7. Garbage Collector

Common garbage collectors: Serial GC, ParNew GC, Parallel GC, CMS GC, G1 GC
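To check which of these collectors a running JVM is actually using, the standard java.lang.management API lists one GarbageCollectorMXBean per collector. The bean names depend on the JVM version and the flags used (for example -XX:+UseSerialGC, -XX:+UseParallelGC, or -XX:+UseG1GC; the CMS and ParNew flags have been removed in newer JDKs). With G1 the names typically include "G1 Young Generation" and "G1 Old Generation". The class name ShowCollectors is just for this sketch.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the collectors used by the running JVM via the standard management API.
class ShowCollectors {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Running it as, say, `java -XX:+UseSerialGC ShowCollectors` versus `java -XX:+UseG1GC ShowCollectors` shows different collector names.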

7.1 Serial GC

It is the oldest garbage collector. "Serial" refers to the fact that its collection work is single-threaded, and during garbage collection it enters the infamous "Stop-The-World" state. On the other hand, its single-threaded design means a streamlined GC implementation that needs no complex data structures and initializes simply, which is why it has long been the default option for the JVM in Client mode.

7.2 ParNew GC

It is a young-generation collector, essentially a multi-threaded version of Serial GC. Its most common use is alongside CMS GC, which collects the old generation.

7.3 Parallel GC

It was the default GC for the JVM in server mode (until G1 took over in JDK 9) and is also known as the throughput-first GC. Its algorithm is similar to Serial GC, although the implementation is much more complex. Its distinguishing feature is that both young-generation and old-generation collections run in parallel on multiple threads, which is more efficient in typical server environments.

7.4 CMS GC

It is based on the Mark-Sweep algorithm, and its design goal is to minimize pause time, which is very important for response-time-sensitive applications such as web applications. Many systems still use CMS GC today. However, the mark-sweep algorithm used by CMS suffers from memory fragmentation, so after long periods of operation a Full GC is hard to avoid, and it causes long pauses. In addition, because CMS emphasizes concurrency, it uses more CPU and competes with application threads for it. CMS was marked as deprecated in JDK 9.

7.5 G1 GC

G1 GC is a GC implementation that balances throughput and pause time, and it has been the default GC since Oracle JDK 9. With G1 you can directly set a pause-time goal. Compared with CMS GC, G1 may not match CMS's pause times in the best case, but it is much better in the worst case. So in many cases G1 GC is a good choice of garbage collector for servers, and the rest of this section introduces G1 GC in detail.

7.5.1 Basic concepts of G1

Region: In G1, the memory of each generation is not contiguous. Each generation consists of a number of equally sized Regions that need not be adjacent, and each Region occupies a contiguous range of virtual memory addresses.

As shown below:

[Figure: G1 heap divided into equally sized Regions of different types, including Humongous (H) Regions]

Besides differing from the traditional layout, G1's memory structure has one extra kind of Region, marked H in the figure. H stands for Humongous, which literally means huge, and these Regions store huge objects: when an object's size exceeds a threshold (in HotSpot, half the Region size), it is treated as humongous. In traditional collectors, such large objects would often be allocated directly in the old generation.
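Because every Region is a contiguous, equally sized address range, finding the Region that holds an address is a single division, and a humongous object simply needs enough consecutive Regions. The sketch below assumes a hypothetical heap base address and a power-of-two Region size (like the value of -XX:G1HeapRegionSize); the class RegionSketch and its methods are illustrative only.

```java
// Toy G1-style Region arithmetic; the heap base, sizes and names are hypothetical.
class RegionSketch {
    final long heapBase;
    final long regionSize;          // assumed to be a power of two, like -XX:G1HeapRegionSize

    RegionSketch(long heapBase, long regionSize) {
        if (Long.bitCount(regionSize) != 1) {
            throw new IllegalArgumentException("region size must be a power of two");
        }
        this.heapBase = heapBase;
        this.regionSize = regionSize;
    }

    // Each Region is one contiguous address range, so the index is a simple division.
    long regionIndex(long address) {
        return (address - heapBase) / regionSize;
    }

    // Objects larger than half a Region are treated as humongous.
    boolean isHumongous(long objectBytes) {
        return objectBytes > regionSize / 2;
    }

    // A humongous object occupies this many consecutive Regions.
    long regionsNeeded(long objectBytes) {
        return (objectBytes + regionSize - 1) / regionSize;
    }

    public static void main(String[] args) {
        RegionSketch r = new RegionSketch(0x10000000L, 1L << 20);         // 1 MB Regions
        System.out.println(r.regionIndex(0x10000000L + 3 * (1L << 20) + 5)); // 3
        System.out.println(r.isHumongous(600_000));                         // true
        System.out.println(r.regionsNeeded(2_500_000));                     // 3
    }
}
```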

RSet: Short for Remembered Set; it records all external references into this Region. Each Region maintains its own RSet. The advantage of this approach is that reclaiming a Region does not require a full heap scan: you only need to check its RSet to find external references, and these references serve as part of the roots for the initial mark.

Card: The JVM divides memory into fixed-size cards (512 bytes each in HotSpot), somewhat like pages in physical memory.

Card Table: If a thread modifies a reference inside a Region, the RSet must be notified so it can update its records. To achieve this, the G1 collector uses a structure called the Card Table (CT). Each card uses one byte to record whether it has been modified, and the card table is the collection of these bytes.

The relationship between Region, Card, Card Table, and Rset is as follows:

[Figure: relationship between Region, Card, Card Table, and RSet]
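The relationship between the four structures can be modeled in a few lines: a reference store that crosses Regions dirties the card containing the field and records that card in the target Region's RSet, so collecting a Region means scanning only the cards listed in its RSet. This is a simplified, hypothetical model (the class RememberedSetSketch, 1 MB Regions, and the "1 = dirty" byte are assumptions); in real G1 the barrier only enqueues dirty cards and concurrent refinement threads update the RSets later, whereas here the update is immediate.

```java
import java.util.*;

// Toy model of G1 Regions, cards, the card table and RSets; sizes and names are hypothetical.
class RememberedSetSketch {
    static final int REGION_SIZE = 1 << 20;     // 1 MB Regions
    static final int CARD_SIZE = 512;

    final byte[] cardTable;                                  // one byte per card, 1 = dirty
    final List<Set<Integer>> rsets = new ArrayList<>();      // per Region: cards pointing into it

    RememberedSetSketch(int regionCount) {
        cardTable = new byte[regionCount * (REGION_SIZE / CARD_SIZE)];
        for (int i = 0; i < regionCount; i++) rsets.add(new HashSet<>());
    }

    static int regionOf(int address) { return address / REGION_SIZE; }
    static int cardOf(int address) { return address / CARD_SIZE; }

    // Post-write barrier for "the field at fieldAddr now references the object at targetAddr".
    void onReferenceStore(int fieldAddr, int targetAddr) {
        if (regionOf(fieldAddr) == regionOf(targetAddr)) return;   // same-Region refs are not recorded
        int card = cardOf(fieldAddr);
        cardTable[card] = 1;                                       // mark the card as modified
        rsets.get(regionOf(targetAddr)).add(card);                 // remember it in the target's RSet
    }

    // Collecting a Region only requires scanning the cards in its RSet, not the whole heap.
    Set<Integer> cardsToScanFor(int region) {
        return rsets.get(region);
    }

    public static void main(String[] args) {
        RememberedSetSketch heap = new RememberedSetSketch(4);
        heap.onReferenceStore(1000, 2 * REGION_SIZE + 64);   // Region 0 -> Region 2
        heap.onReferenceStore(3000, 100);                    // stays inside Region 0: ignored
        System.out.println(heap.cardsToScanFor(2));          // [1]
    }
}
```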

7.5.2 G1 GC modes

G1 has three main collection modes: Young GC, Mixed GC, and Full GC.
When Eden runs out of memory, a Young GC occurs. When the old generation's share of the total heap reaches a threshold, a Mixed GC occurs. Because G1's collection work runs concurrently with the application, if Mixed GC cannot reclaim memory as fast as the application allocates it, G1 falls back to a Full GC. Up to JDK 9 this Full GC was single-threaded, like Serial GC; JDK 10 made it parallel.

Let's look at Mixed GC in more detail.

  1. What is Mixed GC?
    A Mixed GC reclaims all young-generation Regions plus some old-generation Regions. How much of the old generation is collected is governed by -XX:MaxGCPauseMillis, which specifies the target pause time of a G1 collection; the default is 200 ms. This is only a target. The strength of G1 is its Pause Prediction Model, which selectively picks Regions to try to meet the pause target.

  2. What are the trigger conditions for Mixed GC?
    Mixed GC is also triggered according to some parameters. For example, -XX:InitiatingHeapOccupancyPercent specifies the old generation's occupancy as a percentage of the total heap size; the default is 45%. When this threshold is reached, a Mixed GC is triggered.

  3. What is the process of Mixed GC?
    Mixed GC can be divided into two phases: global concurrent marking and evacuation.
    1) Global concurrent marking, which consists of the following five steps:
    Initial mark: marks the objects directly reachable from the GC Roots. This step piggybacks on the pause of a Young GC, so it has no separate pause of its own.
    Root region scanning: scans the Survivor Regions left by that Young GC for references into the old generation. It runs concurrently with the application and must finish before the next Young GC starts.
    Concurrent marking: marks objects throughout the heap starting from the GC Roots. The marking threads run concurrently with the application threads and collect liveness information for each Region.
    Final mark: a short pause that finishes marking by handling the objects whose references changed during the concurrent marking step.
    Cleanup: any Region found to contain no live objects is reclaimed as a whole and returned to the list of free Regions.
    2) Evacuation (copying live objects): this phase is fully stop-the-world. It copies the live objects of some Regions into empty Regions (in parallel) and then reclaims the space of the original Regions. Evacuation can freely pick any number of Regions to collect together; they form the collection set (CSet). Which Regions go into the CSet is decided by the pause prediction model mentioned above. This phase does not evacuate every live object: it picks only a few Regions with the highest payoff, so the cost of the pause can be kept within a certain range.

  4. What are the effects of Mixed GC?
    Mixed GC reclaims dead objects in the old generation early, which keeps them from piling up until they trigger a Full GC and the long program pause that comes with it.
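Two of the Mixed GC decisions described above can be sketched together: whether old-generation occupancy has crossed the InitiatingHeapOccupancyPercent threshold, and which old Regions to add to the CSet within the pause budget. This is a hypothetical greedy model, not G1's real pause prediction model: each candidate Region is assumed to come with a known amount of garbage and a predicted evacuation time, the young Regions' predicted time is subtracted from the pause goal first, and old Regions are taken in order of garbage reclaimed per millisecond while they still fit. The class MixedGcSketch and all numbers are illustrative.

```java
import java.util.*;

// Toy version of two Mixed GC decisions; the cost model and names are hypothetical.
class MixedGcSketch {
    record Region(int id, long garbageBytes, double predictedMs) {}

    // Start marking once old-generation occupancy reaches the threshold,
    // like -XX:InitiatingHeapOccupancyPercent (default 45).
    static boolean shouldStartMarking(long oldBytes, long heapBytes, int ihopPercent) {
        return oldBytes * 100 >= (long) ihopPercent * heapBytes;
    }

    // Pick old Regions with the best garbage-per-millisecond payoff while the
    // predicted pause stays within the budget left after the young Regions.
    static List<Integer> chooseOldRegions(List<Region> oldCandidates, double youngMs, double pauseGoalMs) {
        List<Region> sorted = new ArrayList<>(oldCandidates);
        sorted.sort(Comparator.comparingDouble((Region r) -> r.garbageBytes() / r.predictedMs()).reversed());
        List<Integer> cset = new ArrayList<>();
        double budget = pauseGoalMs - youngMs;        // all young Regions are always collected
        for (Region r : sorted) {
            if (r.predictedMs() > budget) continue;   // would exceed the pause goal: skip it
            cset.add(r.id());
            budget -= r.predictedMs();
        }
        return cset;
    }

    public static void main(String[] args) {
        System.out.println(shouldStartMarking(46, 100, 45));   // true
        List<Region> old = List.of(
            new Region(1, 900, 30), new Region(2, 400, 40),
            new Region(3, 800, 20), new Region(4, 100, 50));
        System.out.println(chooseOldRegions(old, 120, 200));   // [3, 1]
    }
}
```

With a 200 ms goal and 120 ms predicted for the young Regions, only 80 ms remain, so the two most profitable Regions fit and the rest wait for a later Mixed GC.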

Many thanks to the following articles, which I learned from:

1. What is garbage collection?

2. Some key technologies of Java Hotspot G1 GC

3. G1 from entry to giving up

4. Detailed explanation of G1 garbage collection

5. Geek Time course: "In-depth Dismantling of the Java Virtual Machine"


Source: blog.csdn.net/vipshop_fin_dev/article/details/108039608