[In-depth understanding of JVM] G1 garbage collector

The G1 collector is a server-side garbage collector aimed at machines with multi-core processors and large memory. It was officially supported starting with JDK 7u4 and became the default collector in JDK 9. It tries to keep GC pauses as short as possible while still delivering high throughput. G1 is designed for applications that:

  • Can run concurrently with application threads, like the CMS collector
  • Need free space compacted without lengthy GC pauses
  • Need more predictable GC pause durations
  • Do not want to sacrifice a lot of throughput
  • Do not require a much larger Java heap

In the long term, G1 is intended to replace CMS. Compared with CMS, two differences make G1 a better GC solution.

  • First, G1 compacts free memory. It allocates from regions instead of fine-grained free lists, which reduces memory fragmentation.
  • Second, G1's stop-the-world (STW) pauses are more controllable. G1 adds a prediction mechanism for pause times, and the user can specify a desired pause-time target.

1. Why learn G1

G1 (Garbage-First collector), a newer-generation garbage collector of the JVM, addresses the Concurrent Mode Failure problem in CMS, minimizes pauses when handling very large heaps, and compacts memory as it collects, which reduces fragmentation. G1 delivers relatively high throughput and short pauses when the heap is large, and it became the default collector in Java 9. Replacing CMS is only a matter of time.

2. G1's GC principle

In G1, the heap is divided into heap regions of equal size, typically about 2,000 of them, and each region is a contiguous range of virtual memory. Each region in use carries exactly one generation tag (eden, survivor, or old). Logically, the eden regions make up the Eden space, the survivor regions make up the Survivor space, and the old regions make up the old generation.

The command-line parameter -XX:NewRatio=n sets the ratio of the old generation to the new generation; the default is 2, that is, old:new = 2:1. -XX:SurvivorRatio=n sets the ratio of Eden to a Survivor space; the default is 8.
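As a rough worked example of what the two ratios imply, the following sketch computes generation sizes from a heap size. The class and method names are made up for illustration, the two-survivor-space layout is an assumption borrowed from the classic collectors, and a real JVM rounds to alignment boundaries (G1 in particular normally resizes the young generation adaptively), so treat the numbers as indicative only.

```java
public class GenerationSizingSketch {
    /** Young generation size implied by NewRatio (old : young = newRatio : 1). */
    public static long youngSize(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    /**
     * Size of ONE survivor space implied by SurvivorRatio
     * (eden : survivor = survivorRatio : 1, assuming two survivor spaces).
     */
    public static long survivorSize(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long heapMb = 3000;                       // hypothetical heap size in MB
        long young = youngSize(heapMb, 2);        // default NewRatio=2
        long survivor = survivorSize(young, 8);   // default SurvivorRatio=8
        long eden = young - 2 * survivor;
        // prints: old=2000 young=1000 eden=800 survivor=100
        System.out.println("old=" + (heapMb - young) + " young=" + young
                + " eden=" + eden + " survivor=" + survivor);
    }
}
```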

G1 operates in a way similar to CMS during GC. A concurrent global marking phase determines the liveness of objects in the heap. When concurrent marking completes, G1 knows which regions are mostly empty (contain the most reclaimable objects) and reclaims those regions first, which usually frees a large amount of space. This is why the collector is called Garbage-First (G1).

G1 concentrates its collection and compaction activity on the regions of the heap that are likely to be full of reclaimable objects (that is, garbage). It uses a pause prediction model to meet a user-defined pause-time target and selects the number of regions to collect based on that target.

Note that G1 is not a real-time collector. It meets the configured pause-time target with high probability, but not with absolute certainty. Based on data from previous collections, G1 estimates how many regions can be collected within the user-specified target time. The collector therefore has a reasonably accurate model of the cost of collecting a region, and it uses this model to decide which regions, and how many, to collect within the pause-time target.
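The selection idea can be sketched as a greedy choice: rank candidate regions by reclaimable bytes per predicted millisecond and take regions while the predicted total stays within the pause target. This is only an illustration under simplifying assumptions; the Region class, its fields, and the greedy rule are hypothetical and do not reproduce the HotSpot policy code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CollectionSetSketch {
    /** A candidate region; a real collector would estimate these figures from past GCs. */
    public static final class Region {
        final String name;
        final long reclaimableBytes;   // garbage expected to be freed
        final double predictedCostMs;  // predicted time to collect the region

        public Region(String name, long reclaimableBytes, double predictedCostMs) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.predictedCostMs = predictedCostMs;
        }

        double efficiency() {
            return reclaimableBytes / predictedCostMs;
        }
    }

    /** Returns the names of the regions chosen for collection, most efficient first. */
    public static List<String> choose(List<Region> candidates, double pauseTargetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        Comparator<Region> byEfficiency = Comparator.comparingDouble(Region::efficiency);
        sorted.sort(byEfficiency.reversed());

        List<String> chosen = new ArrayList<>();
        double spentMs = 0;
        for (Region r : sorted) {
            // Skip any region whose predicted cost would exceed the pause target.
            if (spentMs + r.predictedCostMs <= pauseTargetMs) {
                chosen.add(r.name);
                spentMs += r.predictedCostMs;
            }
        }
        return chosen;
    }
}
```

With candidates A (900 bytes, 30 ms), B (100 bytes, 40 ms), C (600 bytes, 15 ms), and D (800 bytes, 80 ms) and a 50 ms target, the sketch picks C and then A, because D and B no longer fit into the remaining budget.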

2.1 Region in G1

Every Region in G1 has the same fixed size. The size can be set with the parameter -XX:G1HeapRegionSize; the value ranges from 1M to 32M and must be a power of 2. If it is not set, G1 determines it automatically from the heap size.

Decision logic:

size = (minimum heap size + maximum heap size) / TARGET_REGION_NUMBER (2048); size is then rounded to the nearest power of 2 and clamped to the range [1M, 32M].
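The decision logic above can be sketched in a few lines of Java. This is a minimal sketch of the formula as stated in this article, with one assumption made explicit: the power of 2 is obtained by rounding down. The class name is hypothetical, and the actual HotSpot ergonomics differ in detail between JDK versions.

```java
public class RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;
    static final long TARGET_REGION_NUMBER = 2048;

    /** Region size derived from the heap bounds, following the formula in the text. */
    public static long regionSize(long minHeapBytes, long maxHeapBytes) {
        long size = (minHeapBytes + maxHeapBytes) / TARGET_REGION_NUMBER;
        // Assumption: round DOWN to a power of two (highest set bit of size).
        long powerOfTwo = Long.highestOneBit(size);
        // Clamp to the allowed range [1M, 32M].
        return Math.max(1 * MB, Math.min(32 * MB, powerOfTwo));
    }

    public static void main(String[] args) {
        System.out.println(regionSize(4 * GB, 4 * GB) / MB + "M");     // 4M
        System.out.println(regionSize(1 * GB, 5 * GB) / MB + "M");     // 3M rounds down to 2M
        System.out.println(regionSize(256 * MB, 256 * MB) / MB + "M"); // clamped up to 1M
        System.out.println(regionSize(64 * GB, 64 * GB) / MB + "M");   // clamped down to 32M
    }
}
```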

The memory structure of G1 differs considerably from the traditional division of memory. G1 divides the heap into multiple Regions of equal size (between 1M and 32M, as described above). Each generation is only a logical set of Regions, and the Regions that belong to one generation do not have to be adjacent in memory. Each Region is marked as E, S, O, or H, representing Eden, Survivor, Old, and Humongous respectively. E and S belong to the young generation; O and H belong to the old generation.
[Figure: schematic diagram of the G1 heap, with Regions marked E, S, O, and H]

 

  • H stands for Humongous, which literally denotes huge objects (hereinafter H objects). An object whose size is greater than or equal to half the size of a Region is treated as a humongous object. H objects are allocated directly in the old generation, which avoids copying large objects during GC. If the heap cannot accommodate an H object, a GC is triggered.
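The humongous threshold is simple arithmetic, sketched below. The "greater than or equal to half a Region" test follows the wording of this article; the second helper reflects the general G1 behaviour that an object larger than one Region occupies several contiguous Regions, which is added here as background rather than taken from the text. All names are hypothetical.

```java
public class HumongousSketch {
    static final long MB = 1024L * 1024L;

    /** True if the object counts as humongous for the given region size. */
    public static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    /** Number of whole regions needed to hold the object (ceiling division). */
    public static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = 2 * MB;
        System.out.println(isHumongous(1 * MB, region));      // true: exactly half a region
        System.out.println(isHumongous(1 * MB - 1, region));  // false: just below half
        System.out.println(regionsNeeded(5 * MB, region));    // 3
    }
}
```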

2.2 Cross-generation references

During a Young GC, objects in the Young area may be referenced by objects in the Old area. This is the problem of cross-generation references. To avoid scanning the entire old generation during a Young GC, G1 introduced the concepts of the Card Table and the Remembered Set. The basic idea is to trade space for time. These two data structures exist specifically to handle references from the Old area into the Young area. References from the Young area into the Old area do not need to be recorded separately, because objects in the Young area change so frequently that recording them would waste space.

  • RSet: short for Remembered Set, used to record all references from outside a Region into that Region. Each Region maintains one RSet.
  • Card: the JVM divides memory into fixed-size Cards, analogous to pages in physical memory.

RS (Remembered Set) is an abstract concept: a set that records pointers from the part of the heap that is not being collected into the part that is being collected.
In traditional generational garbage collection algorithms, the RS records pointers between generations. In the G1 collector, the RS records pointers from other Regions into a given Region, so each Region has its own RS. This bookkeeping brings a major benefit: when reclaiming a Region, there is no need to scan the whole heap. Checking the Region's RS is enough to find external references, and these references serve as one of the root sources for the initial mark.

Consequently, when a thread modifies a reference inside a Region, the RS must be notified so that it can update its records. To achieve this, the G1 collector introduces another structure, the CT (Card Table). Each Region is divided into several fixed-size Cards, and each Card uses one byte to record whether it has been modified. The card table is the collection of these bytes. If the RS is understood as a conceptual model, the CT can be regarded as one implementation of it.
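A card table of this kind can be sketched as a plain byte array indexed by "address shifted right by the card size". The sketch below assumes 512-byte cards (the size mentioned later in this article) and uses arbitrary values for clean and dirty; the class and method names are hypothetical and the code only illustrates the indexing, not HotSpot's actual barrier.

```java
public class CardTableSketch {
    static final int CARD_SHIFT = 9;   // 2^9 = 512-byte cards (assumed card size)
    static final byte DIRTY = 1;       // arbitrary marker; 0 (the array default) means clean

    private final byte[] cards;

    public CardTableSketch(long heapBytes) {
        long cardCount = (heapBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT;
        cards = new byte[(int) cardCount];
    }

    /** Index of the card that covers the given offset into the heap. */
    public static int cardIndex(long heapOffset) {
        return (int) (heapOffset >>> CARD_SHIFT);
    }

    /** Stand-in for a write barrier: called when a reference field at heapOffset is updated. */
    public void recordWrite(long heapOffset) {
        cards[cardIndex(heapOffset)] = DIRTY;
    }

    public boolean isDirty(int index) {
        return cards[index] == DIRTY;
    }

    /** Number of cards a later scan would have to visit. */
    public int dirtyCount() {
        int n = 0;
        for (byte b : cards) {
            if (b == DIRTY) n++;
        }
        return n;
    }
}
```

Two writes at offsets 0 and 511 fall on the same card, so they cost one dirty card rather than two; that coarse granularity is what keeps the table small.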

Modifying the RS also raises concurrency problems. Because a Region may be modified by multiple threads at the same time, those threads also modify its RS concurrently. To avoid such conflicts, the G1 collector further divides the RS into multiple hash tables, and each thread updates its own hash table. Logically, the RS is the union of these hash tables. A hash table is one of the usual ways to implement an RS, and it has a major advantage: it removes duplicates. The size of the RS is therefore proportional to the number of modified pointers; without deduplication, it would be proportional to the number of write operations.

RS is drawn with a dashed outline in the figure because it is not a separate data structure distinct from the Card Table; it denotes RS as a conceptual model, and the Card Table is one implementation of it. The maintenance of the RSet structure is covered in a separate article, so I won't go into depth here.

2.3 SATB

SATB (snapshot-at-the-beginning) is a technique originally used in real-time garbage collectors. The G1 collector uses it to record a snapshot of the live objects at the start of the marking phase ("logically takes a snapshot of the set of live objects in the heap at the start of marking cycle"). During the concurrent marking phase, however, the application may modify references, for example by deleting an existing reference. As a result, the set of live objects when concurrent marking ends may no longer match the snapshot. G1 solves this problem with a write barrier that is active during concurrent marking: whenever a reference is updated, G1 writes the value before the modification into a log buffer (null old values are filtered out) and scans these buffers in the final marking phase to repair the snapshot.
First, let us introduce the tri-color marking algorithm.

  • Black: a root object, or an object that has been scanned together with all of its child objects
  • Gray: the object itself has been scanned, but its child objects have not been scanned yet
  • White: an object that has not been scanned. After all objects have been scanned, the objects that remain white are unreachable, that is, garbage.

(1) Starting from the GC roots, the tri-color marking scan proceeds as follows:

  • Before the GC scans C, the colors are as follows: [figure omitted]

  • During the concurrent marking phase, the application thread changes the reference relationships:
A.c=C
B.c=null

The following result is obtained: [figure omitted]

  • The scan result in the remark phase is as follows: [figure omitted]

In this case, C will be collected as garbage. The live objects in the snapshot were originally A, B, and C, but now only A and B remain. The integrity of the snapshot has been destroyed, which is clearly unacceptable.
G1 solves this problem with a pre-write barrier. Simply put, during the concurrent marking phase, whenever a reference relationship changes, the pre-write barrier records the old reference in a queue, which is called satb_mark_queue in the JVM source code. The queue is scanned in the remark phase. In this way, the object pointed to by the old reference is marked, and its descendants are marked recursively, so no object is missed and the integrity of the snapshot is guaranteed.
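The A/B/C scenario can be replayed in a small single-threaded simulation. The sketch assumes the starting state that makes the example fail: A is already black, B is gray, and C is white and reachable only through B.c. Everything here (class names, the single field c, the queue) is a hypothetical model of the idea, not the HotSpot implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SatbSketch {
    public enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        final String name;
        Color color = Color.WHITE;
        Obj c;                       // single reference field, as in A.c and B.c
        Obj(String name) { this.name = name; }
    }

    private final Deque<Obj> markStack = new ArrayDeque<>();
    private final Deque<Obj> satbQueue = new ArrayDeque<>();
    private final boolean barrierEnabled;

    SatbSketch(boolean barrierEnabled) { this.barrierEnabled = barrierEnabled; }

    /** Mutator write with a pre-write barrier: log the OLD value before overwriting it. */
    void write(Obj holder, Obj newValue) {
        if (barrierEnabled && holder.c != null) {   // null old values are filtered out
            satbQueue.add(holder.c);
        }
        holder.c = newValue;
    }

    private void shade(Obj o) {
        if (o != null && o.color == Color.WHITE) {
            o.color = Color.GRAY;
            markStack.push(o);
        }
    }

    /** Concurrent marking step: scan gray objects until the stack is empty. */
    void drain() {
        while (!markStack.isEmpty()) {
            Obj o = markStack.pop();
            shade(o.c);
            o.color = Color.BLACK;
        }
    }

    /** Remark: mark everything reachable from the logged old references. */
    void remark() {
        while (!satbQueue.isEmpty()) {
            shade(satbQueue.poll());
        }
        drain();
    }

    /** Replays the example and returns the final color of C. */
    public static Color colorOfCAfterMarking(boolean barrierEnabled) {
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C");
        b.c = c;                                 // snapshot: B references C
        SatbSketch gc = new SatbSketch(barrierEnabled);
        a.color = Color.BLACK;                   // A already scanned
        b.color = Color.GRAY;                    // B waiting on the mark stack
        gc.markStack.push(b);
        gc.write(a, c);                          // A.c = C
        gc.write(b, null);                       // B.c = null
        gc.drain();
        gc.remark();
        return c.color;
    }

    public static void main(String[] args) {
        System.out.println(colorOfCAfterMarking(false)); // WHITE: C would be lost
        System.out.println(colorOfCAfterMarking(true));  // BLACK: C is kept alive
    }
}
```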

3. G1's GC modes

3.1 Young GC

Young GC reclaims all Regions of the young generation. It is triggered when the E area can no longer allocate new objects. Live objects in the E area are moved to the S area; when the S area does not have enough space, objects from the E area are promoted directly to the O area. Live data already in the S area is moved to a new S area, and objects in the S area that have reached a certain age are promoted to the O area.
[Figure: schematic diagram of the Young GC process]
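The movement rules of a Young GC can be sketched as a per-object decision: a live object is either copied to the Survivor area with its age increased, or promoted to the Old area because it is old enough or because the Survivor area is full. The sketch measures Survivor capacity in objects rather than bytes, and tenuringThreshold and survivorCapacity are hypothetical parameters chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class YoungGcSketch {
    /** Outcome of one young collection over the live young objects. */
    public static final class Result {
        public final List<Integer> survivorAges = new ArrayList<>(); // ages of objects kept in S
        public int promoted = 0;                                     // objects moved to O
    }

    /**
     * @param liveAges          ages of the live objects (0 for objects still in Eden)
     * @param tenuringThreshold age at which an object is promoted to the Old area
     * @param survivorCapacity  how many objects the new Survivor area can hold
     */
    public static Result evacuate(List<Integer> liveAges, int tenuringThreshold, int survivorCapacity) {
        Result result = new Result();
        for (int age : liveAges) {
            int newAge = age + 1;   // the object survived one more collection
            boolean oldEnough = newAge >= tenuringThreshold;
            boolean survivorFull = result.survivorAges.size() >= survivorCapacity;
            if (oldEnough || survivorFull) {
                result.promoted++;
            } else {
                result.survivorAges.add(newAge);
            }
        }
        return result;
    }
}
```

For live ages [0, 0, 3, 0], a threshold of 4, and room for two survivors, the first two Eden objects land in the Survivor area with age 1, the age-3 object is promoted by age, and the last Eden object is promoted because the Survivor area is full.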

(1) Does YGC need to scan the entire old generation?

We know that determining whether an object is alive starts from the GC Roots: objects reachable from the GC Roots are alive. A YGC does not collect objects in the old generation, which means the old-generation objects must be treated as part of the GC Roots. However, scanning the entire old generation would be time-consuming and would inevitably hurt the performance of the whole GC. CMS therefore uses a Card Table, which records references from the old generation to the new generation. The Card Table is a contiguous byte[] array, and scanning it costs far less than scanning the entire old generation.

G1 borrows this idea but adopts a new data structure, the Remembered Set, or RSet for short. An RSet records which objects in other Regions reference objects in this Region, so it is a points-into structure (who references my objects). The Card Table, by contrast, is a points-out structure (whose objects do I reference), and each Card covers a fixed range of the heap (usually 512 bytes).

G1's RSet is built on top of the Card Table: each Region records which other Regions hold pointers into it and marks which Cards those pointers fall in. The RSet is in effect a hash table: the key is the starting address of another Region, and the value is a set whose elements are Card Table indexes. Each Region has its own RSet.

How does the RSet assist GC? During a YGC, only the RSets of the young-generation Regions need to be used as part of the root set. These RSets record old->young cross-generation references, so scanning the entire old generation is avoided. During a mixed GC, old->old references are recorded in the RSets of the old-generation Regions, and young->old references are found by scanning all young-generation Regions, so again there is no need to scan every old-generation Region. The RSet therefore greatly reduces the GC workload.

In short, YGC in G1 does not need to scan the entire old generation; scanning the RSets is enough to know which young-generation objects are referenced from the old generation.
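The hash-table shape of an RSet described above can be sketched with a map from referencing Region to a set of card indexes. As a simplification, the key here is a Region index rather than a Region start address, and the class and method names are hypothetical. Using a set as the value gives the deduplication for free: recording the same card twice does not grow the RSet.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Points-into structure owned by ONE region: who references my objects? */
public class RememberedSetSketch {
    // key: index of the region that contains the referencing field
    // value: card indexes in that region that hold pointers into the owner region
    private final Map<Integer, Set<Integer>> cardsByRegion = new HashMap<>();

    /** Records that a card in another region contains a reference into the owner region. */
    public void addReference(int fromRegion, int fromCard) {
        cardsByRegion.computeIfAbsent(fromRegion, k -> new TreeSet<>()).add(fromCard);
    }

    /** Cards of one referencing region that a GC would scan as roots. */
    public Set<Integer> cardsFrom(int fromRegion) {
        return cardsByRegion.getOrDefault(fromRegion, Collections.<Integer>emptySet());
    }

    /** Total number of cards to scan instead of the whole old generation. */
    public int cardCount() {
        int n = 0;
        for (Set<Integer> cards : cardsByRegion.values()) {
            n += cards.size();
        }
        return n;
    }
}
```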

3.2 Mixed GC

A Mixed GC in G1 selects all Regions of the young generation plus a number of old-generation Regions that global concurrent marking has identified as highly profitable to collect. Within the cost target specified by the user, it picks as many high-yield old-generation Regions as possible. The memory reclaimed by a Mixed GC is therefore the young generation plus part of the old generation.

Before introducing Mixed GC, we need to understand global concurrent marking, because old-generation collection depends on it.
1. Global concurrent marking
Global concurrent marking can be subdivided into the following steps:

  • Initial mark (STW). Initial Mark is an STW event that marks the objects directly reachable from the GC Roots and pushes their fields onto the marking stack for later scanning. G1 records mark information in an external bitmap instead of using the mark bits in the mark word of the object header. Because it requires STW, Initial Mark usually piggybacks on the STW pause of a YGC; it starts global concurrent marking, which is logically independent of the YGC.
  • Root Region Scanning. Starting from the objects in the Survivor area, this step marks the old-generation objects they reference and pushes their fields onto the marking stack for later scanning. Unlike Initial Mark, Root Region Scanning does not require STW and runs concurrently with the application. It must complete before the next YGC can start.
  • Concurrent Marking. No STW is required. References are repeatedly taken from the scan stack, and the object graph of the entire heap is traversed recursively. Each scanned object is marked and its fields are pushed onto the scan stack; this repeats until the scan stack is empty. References recorded by the SATB write barrier are scanned as well. Concurrent Marking can be interrupted by a YGC.
  • Final mark (Remark, STW). After concurrent marking completes, each Java thread still has some references recorded by the SATB write barrier that have not been processed; this stage processes them. Weak reference processing is also performed here. Note that this pause differs fundamentally from the CMS remark: it only needs to scan the SATB buffers, whereas the CMS remark must rescan the dirty cards in the mod-union table plus the entire root set. At that point the entire young gen (whether its objects are dead or alive) is treated as part of the root set, so the CMS remark can be very slow.
  • Cleanup (partly STW). The STW part counts the Regions that contain live objects and those that contain none (Empty Regions) and updates the RSets. The concurrent part adds the Empty Regions to the queue of allocatable Regions.
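The marking and cleanup steps above can be sketched with an external bitmap and an explicit mark stack. The sketch is single-threaded and models objects as integer ids with a fixed number of objects per Region, both of which are simplifications made for illustration; the names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

public class MarkingSketch {
    /**
     * Marks everything reachable from the roots.
     * refs[i] lists the ids of the objects referenced by object i.
     * The mark bits live in a separate bitmap, not in the objects themselves.
     */
    public static BitSet mark(int[][] refs, int[] roots) {
        BitSet bitmap = new BitSet(refs.length);
        Deque<Integer> markStack = new ArrayDeque<>();
        for (int root : roots) {            // initial mark: objects directly reachable from roots
            if (!bitmap.get(root)) {
                bitmap.set(root);
                markStack.push(root);
            }
        }
        while (!markStack.isEmpty()) {      // marking: drain the stack until it is empty
            int obj = markStack.pop();
            for (int child : refs[obj]) {
                if (!bitmap.get(child)) {
                    bitmap.set(child);
                    markStack.push(child);
                }
            }
        }
        return bitmap;
    }

    /** Cleanup: count live objects per region; a count of 0 means the region can be freed whole. */
    public static int[] liveCountPerRegion(BitSet bitmap, int objectCount, int objectsPerRegion) {
        int regions = (objectCount + objectsPerRegion - 1) / objectsPerRegion;
        int[] live = new int[regions];
        for (int i = bitmap.nextSetBit(0); i >= 0; i = bitmap.nextSetBit(i + 1)) {
            live[i / objectsPerRegion]++;
        }
        return live;
    }
}
```

With six objects, two per Region, references 0->1, 1->4, 2->3, and object 0 as the only root, the live counts come out as 2, 0, and 1, so the middle Region holds no live objects and can be returned to the free list without copying anything.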

After global concurrent marking, the collector knows which Regions contain live objects. Regions that are completely reclaimable (no live objects) are collected and added to the allocatable Region queue, which reclaims that memory directly. For Regions that do contain live objects, G1 uses its statistical model to pick the Regions with the highest yield whose cost does not exceed the user-specified limit. The set of Regions selected for collection is called the collection set, or CSet for short.

  • The CSet of a Mixed GC contains all Regions in the young gen plus a number of old gen Regions that global concurrent marking has identified as highly profitable to collect.
  • The CSet of a YGC contains all Regions in the young gen. The cost of a young GC is controlled by controlling the number of young gen Regions.
  • Both YGC and Mixed GC copy and clean up with multiple threads, and the whole process is STW. G1 achieves low latency because the area it reclaims is chosen precisely and kept small.

2. Copying live objects (Evacuation)
The Evacuation phase is fully stop-the-world. It copies the live objects of a set of Regions into empty Regions (in parallel) and then reclaims the space of the original Regions. The Evacuation phase can freely choose any number of Regions to collect independently; together they form the collection set (CSet). Which Regions go into the CSet depends on the pause prediction model mentioned above. This stage does not evacuate all live objects; it evacuates only a small number of high-yield Regions, so the cost of the pause stays controllable (within a certain range).
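Evacuation can be sketched as "copy the live objects out, then free the whole source Region". The sketch below models a Region as a list of objects with a precomputed live flag and runs on a single thread, whereas the real collector copies in parallel; all names are hypothetical.

```java
import java.util.List;

public class EvacuationSketch {
    public static final class Obj {
        final String name;
        final boolean live;   // result of marking, assumed to be known already

        public Obj(String name, boolean live) {
            this.name = name;
            this.live = live;
        }
    }

    /**
     * Copies the live objects of every region in the collection set into the
     * target region, then empties the source regions.
     *
     * @return the number of source regions that were freed
     */
    public static int evacuate(List<List<Obj>> collectionSet, List<Obj> target) {
        int freed = 0;
        for (List<Obj> region : collectionSet) {
            for (Obj o : region) {
                if (o.live) {
                    target.add(o);   // survivors end up compacted in the target region
                }
            }
            region.clear();          // the whole source region becomes free space
            freed++;
        }
        return freed;
    }
}
```

Because only live objects are copied, the cost of evacuating a Region grows with the amount of live data in it, which is why Regions that are mostly garbage are the cheapest and most rewarding to collect.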

[Figure: schematic diagram of the Mixed GC cleaning process]

3.3 Full GC

Most of G1's garbage collection work runs concurrently with the application. When Mixed GC cannot keep up with the rate at which the application allocates memory, G1 falls back to a Full GC, which is performed serially (single-threaded) in the JDK versions discussed here. A Full GC causes a long STW pause and should be avoided as much as possible.
There are two typical causes of a G1 Full GC:

  1. There is not enough to-space to hold promoted objects during Evacuation;
  2. Space runs out before concurrent processing completes.

4. Working principle

At the highest level, the collector side of G1 consists of two major parts:

  • Global concurrent marking
  • Copying live objects (evacuation)

These two parts can be executed relatively independently.

4.1 Global concurrent marking

Global concurrent marking is SATB-based concurrent marking. It is divided into the following stages:

  • Initial marking: pause phase. Scans the root set, marks all objects directly reachable from it, and pushes their fields onto the marking stack for later scanning. G1 records mark information in an external bitmap instead of using the mark bits in the mark word of the object header. In generational G1, initial marking borrows the pause of a young GC, so there is no additional, separate pause.
  • Concurrent marking: concurrent phase. References are repeatedly taken from the scan stack, and the object graph of the entire heap is traversed recursively. Each scanned object is marked and its fields are pushed onto the scan stack; this repeats until the scan stack is empty. References recorded by the SATB write barrier are scanned as well.
  • Final marking (also called remarking in the implementation): pause phase. After concurrent marking completes, each Java thread still has some references recorded by the SATB write barrier that have not been processed; this stage processes them. Weak reference processing is also performed here.
    Note that this pause differs fundamentally from the CMS remark: it only needs to scan the SATB buffers, whereas the CMS remark must rescan the dirty cards in the mod-union table plus the entire root set. At that point the entire young gen (whether its objects are dead or alive) is treated as part of the root set, so the CMS remark can be very slow.
  • Cleanup: pause phase. Takes inventory and resets the marking state. This stage resembles the sweep stage of mark-sweep, but instead of sweeping actual objects on the heap, it counts how many objects in each Region are marked live in the marking bitmap. If a Region with no live objects at all is found, it is reclaimed as a whole and returned to the list of allocatable Regions.

In generational G1, there are two sub-modes for selecting the CSet, corresponding to young GC and mixed GC:

  • Young GC: selects all Regions in the young gen. The cost of a young GC is controlled by controlling the number of young gen Regions.
  • Mixed GC: selects all Regions in the young gen plus a number of old gen Regions that global concurrent marking statistics show to be highly profitable to collect. As many high-yield old gen Regions as possible are chosen within the user-specified cost target.

 

related articles

  1. Garbage-First (G1) garbage collector 

 

 

 

Origin blog.csdn.net/qq_41893274/article/details/113901839