In-depth analysis of G1 garbage collector and performance optimization

This article describes in detail how to configure the G1 garbage collector, how to tune its performance, and how to analyze and evaluate GC performance.

0. Introduction to G1

G1 stands for Garbage First Garbage Collector, a server-style garbage collector built into the HotSpot JVM.
G1 is generational and splits the GC work into multiple concurrent and parallel phases, so that pauses are broken up into shorter ones. This gives it low latency while maintaining good throughput.
G1 triggers a collection whenever it decides one is worthwhile, and it gives priority to the regions that contain the least live data.
Less live data means more garbage, which is where the name Garbage First comes from.

The garbage collector is essentially a memory management tool. The G1 algorithm mainly implements automatic memory management in the following ways:

  • [Generation] New objects are allocated in the young generation, and objects that reach a certain age are promoted to the old generation.
  • [Concurrency] All live objects in the old generation are traversed during the concurrent marking phase. HotSpot triggers a marking cycle whenever total Java heap occupancy exceeds a threshold.
  • [Compaction] Live objects are compacted by parallel copying, which frees up memory.

In GC terminology, parallel means that multiple GC threads work together, and concurrent means that GC threads run at the same time as application threads.

This article first briefly introduces how to configure G1 parameters, and then introduces how to analyze and evaluate GC performance.
To do any GC tuning, you need at least a basic understanding of Java's garbage collection mechanism.

G1 is an incremental, generational garbage collector. What does incremental mean here?
G1 divides the heap into many small blocks of equal size, called regions.
When the JVM starts, the size of each region is determined from the heap configuration. A region is between 1 MB and 32 MB, and the total number of regions generally does not exceed 2048.

In G1, eden, survivor, and the old generation are all logical concepts. Each is made up of regions, and those regions do not need to be contiguous.

https://img-blog.csdn.net/20170205235146220

You can set a parameter to specify the desired maximum pause time, and G1 will try to meet this soft real-time goal.
During young-only collections, G1 dynamically adjusts the size of the young generation (eden + survivor) to meet the pause time goal.
During mixed collections, G1 adjusts the number of old generation regions collected in each GC, based on the total number of regions to be collected, the percentage of live objects in each region, and the percentage of heap that is allowed to be wasted.

G1 compacts the heap by incremental parallel copying: it copies the live objects in the collection set (CSet) into new regions. The CSet is the set of regions to be collected in a given GC.
The goal is to reclaim as much heap memory as possible, starting with the regions that contain the most garbage, while still trying to meet the pause time goal.

G1 maintains a separate remembered set (RSet) for each region. The RSet tracks and records references from other regions into this region.
With this region layout and the per-region RSets, G1 can collect garbage incrementally and in parallel without traversing the entire heap.
Scanning a region's RSet is enough to find the cross-region references that point into it, so the region can then be collected.
G1 uses a post-write barrier to record modifications to the heap and to keep the RSets up to date.
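
The idea of a per-region RSet fed by a post-write barrier can be sketched with a toy model. This is only an illustration under simplifying assumptions (1 MB regions, plain numeric addresses, one hash set per region); the class and method names are made up for this example and do not reflect HotSpot's actual data structures.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of per-region remembered sets; not HotSpot's real implementation. */
final class RememberedSetSketch {
    static final long REGION_SIZE = 1L << 20; // assumption: 1 MB regions

    // For each region index: the set of OTHER regions that hold references into it.
    private final Map<Long, Set<Long>> rsets = new HashMap<>();

    static long regionOf(long address) {
        return address / REGION_SIZE;
    }

    /** Hypothetical post-write barrier, run after a reference field has been written. */
    void postWriteBarrier(long fieldAddress, long targetAddress) {
        long from = regionOf(fieldAddress);
        long to = regionOf(targetAddress);
        if (from != to) { // only cross-region references are recorded
            rsets.computeIfAbsent(to, k -> new HashSet<>()).add(from);
        }
    }

    /** Regions that must be scanned for incoming references when collecting the given region. */
    Set<Long> referrersOf(long region) {
        return rsets.getOrDefault(region, Collections.emptySet());
    }

    public static void main(String[] args) {
        RememberedSetSketch heap = new RememberedSetSketch();
        heap.postWriteBarrier(5 * REGION_SIZE + 64, 2 * REGION_SIZE + 128); // region 5 -> region 2
        heap.postWriteBarrier(2 * REGION_SIZE + 8, 2 * REGION_SIZE + 256);  // same region, ignored
        System.out.println(heap.referrersOf(2)); // prints [5]
    }
}
```

To collect region 2 in this model, only region 5 has to be scanned for incoming references, not the whole heap.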

1. Introduction to the garbage collection phases

In addition to the STW evacuation pauses of young-only and mixed collections, G1 also runs marking cycles, which consist of several parallel and concurrent sub-phases.
G1 uses the Snapshot-At-The-Beginning (SATB) algorithm, which takes a logical snapshot of the live objects in the heap at the start of a marking cycle.
The set of live objects then consists of the objects in that snapshot plus the objects allocated since marking started.
G1's marking algorithm uses a pre-write barrier to record and mark the objects that logically belong to the snapshot.
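
A minimal sketch of what a pre-write barrier does for SATB: while marking is active, the value about to be overwritten is logged so that it is still treated as live. The `Node` type, the `markingActive` flag, and the buffer are hypothetical names chosen for this example; HotSpot's real barrier is generated machine code and works differently in detail.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of an SATB pre-write barrier; names are illustrative only. */
final class SatbSketch {
    static final class Node {
        Node ref;           // a single reference field
        final String name;
        Node(String name) { this.name = name; }
    }

    boolean markingActive;                           // true while a marking cycle is running
    final Deque<Node> satbBuffer = new ArrayDeque<>();

    /** Stores newValue into holder.ref, logging the overwritten value first. */
    void writeRef(Node holder, Node newValue) {
        Node old = holder.ref;
        if (markingActive && old != null) {
            satbBuffer.add(old); // old value belonged to the snapshot, so keep it alive
        }
        holder.ref = newValue;
    }

    public static void main(String[] args) {
        SatbSketch heap = new SatbSketch();
        Node a = new Node("a");
        Node b = new Node("b");
        a.ref = b;                // state at the start of marking: a -> b
        heap.markingActive = true;
        heap.writeRef(a, null);   // the application drops the only reference to b
        System.out.println(heap.satbBuffer.peek().name); // prints b
    }
}
```

Even though `b` became unreachable during marking, it was part of the snapshot, so the marker will still treat it as live in this cycle.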

2. Garbage collection in pure young generation mode

G1 satisfies most memory allocation requests from eden regions.
During a young-only collection, G1 collects the eden regions and the survivor regions left by the previous GC,
and copies (evacuates) the live objects into new regions. The destination depends on the age of each object:
an object that has reached a certain GC age is promoted to the old generation; otherwise it is copied to a survivor region.
These survivor regions are then added to the CSet of the next young-only or mixed collection.
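
The age-based choice of destination can be written down in a few lines. This sketch assumes a single fixed tenuring threshold passed in as a parameter; the value 15 is only an example, and the real collector adapts the threshold and considers survivor space as well.

```java
/** Toy sketch of choosing an evacuation destination during a young collection. */
final class EvacuationSketch {
    enum Destination { SURVIVOR, OLD }

    /**
     * @param age               number of collections the object has survived so far
     * @param tenuringThreshold age at which objects are promoted (assumed input)
     */
    static Destination destinationFor(int age, int tenuringThreshold) {
        return age >= tenuringThreshold ? Destination.OLD : Destination.SURVIVOR;
    }

    public static void main(String[] args) {
        int threshold = 15; // illustrative value only
        System.out.println(destinationFor(1, threshold));  // prints SURVIVOR
        System.out.println(destinationFor(15, threshold)); // prints OLD
    }
}
```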

3. Mixed-mode garbage collection

After a concurrent marking cycle completes, G1 switches from young-only mode to mixed mode.
In a mixed collection, G1 adds a selection of old generation regions to the collection set, which always also contains all eden and survivor regions.
How many old regions are added each time, and which parameters control this, is discussed later.
After several mixed collections have processed enough old regions, G1 switches back to young-only mode until the next concurrent marking cycle completes.

4. Phases of the marking cycle

The marking cycle of G1 consists of the following phases:

  • [Initial mark phase]: marks the GC roots. This phase is usually piggybacked on a regular young collection.
  • [Root region scanning phase]: scans the regions that hold the roots identified in the initial mark phase for references into the old generation, and marks the referenced objects. This phase runs concurrently with application threads (no STW pause) and must complete before the next young collection starts.
  • [Concurrent marking phase]: traverses the entire heap to find all reachable (live) objects. This phase runs concurrently with application threads and may be interrupted by young collections.
  • [Remark phase]: an STW pause that completes marking. G1 drains the SATB buffers, traces live objects that have not been visited yet, and performs reference processing.
  • [Cleanup phase]: the final sub-phase. G1 does its accounting and RSet scrubbing in an STW pause. During accounting, completely free regions are identified, as are the candidate regions for mixed collections. Part of the cleanup phase is concurrent, namely when free regions are reset and returned to the free list.
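
As a compact summary of the list above, the phases can be written as an enum tagged with whether they stop the application. The enum is just a reading aid for this article, not a HotSpot type.

```java
/** The marking-cycle phases listed above, tagged by whether they pause the application. */
enum MarkingPhase {
    INITIAL_MARK(true),          // piggybacked on a young collection pause
    ROOT_REGION_SCANNING(false), // concurrent
    CONCURRENT_MARKING(false),   // concurrent, may be interrupted by young collections
    REMARK(true),                // stop-the-world
    CLEANUP(true);               // stop-the-world, with a concurrent tail

    final boolean stopTheWorld;

    MarkingPhase(boolean stopTheWorld) {
        this.stopTheWorld = stopTheWorld;
    }

    public static void main(String[] args) {
        for (MarkingPhase p : values()) {
            System.out.println(p + (p.stopTheWorld ? " (STW)" : " (concurrent)"));
        }
    }
}
```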

5. Common parameters and default values

G1 is an adaptive garbage collector. Most of its parameters have sensible defaults, and it generally runs efficiently without much configuration.
The commonly used parameters and their default values are listed below. If you have special requirements, you can adjust these JVM startup parameters to meet specific performance targets.

-XX:G1HeapRegionSize=n

Sets the size of a G1 region. The value must be a power of 2, and the allowed range is 1 MB to 32 MB. The default is derived from the initial heap size (-Xms) and the maximum heap size (-Xmx) so that the heap is divided into about 2048 regions.
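
The sizing rule described above can be approximated as follows. This is a sketch under assumptions: it averages -Xms and -Xmx, aims for 2048 regions, rounds down to a power of 2, and clamps to the 1 MB to 32 MB range. The exact heuristic differs between JDK versions, so treat the result as an estimate only.

```java
/** Approximates how a default region size could be derived from the heap size. */
final class RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGIONS = 2048;

    /** Averages -Xms and -Xmx (in bytes), aims for about 2048 regions, clamps to 1..32 MB. */
    static long defaultRegionSize(long initialHeapBytes, long maxHeapBytes) {
        long averageHeap = (initialHeapBytes + maxHeapBytes) / 2;
        long candidate = Math.max(averageHeap / TARGET_REGIONS, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(candidate); // round down to a power of 2
        return Math.min(powerOfTwo, MAX_REGION);
    }

    public static void main(String[] args) {
        System.out.println(defaultRegionSize(4096 * MB, 4096 * MB) / MB);     // prints 2
        System.out.println(defaultRegionSize(512 * MB, 512 * MB) / MB);       // prints 1
        System.out.println(defaultRegionSize(131072 * MB, 131072 * MB) / MB); // prints 32
    }
}
```

With -Xms4g -Xmx4g this gives 2 MB regions, which means any object larger than 1 MB would be treated as humongous.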

-XX:MaxGCPauseMillis=200

The desired maximum pause time. The default is 200 milliseconds. G1 does not adjust this value automatically; it stays at whatever was set at startup.

-XX:G1NewSizePercent=5

Sets the minimum size of the young generation as a percentage of the heap. The default value 5 means that at least 5% of the heap is used for the young generation.
This setting replaces -XX:DefaultMinNewGenPercent.
This is an experimental parameter and may change in later versions.

-XX:G1MaxNewSizePercent=60

Sets the maximum size of the young generation as a percentage of the heap. The default value 60 means that at most 60% of the heap is used for the young generation.
This setting replaces -XX:DefaultMaxNewGenPercent.
This is an experimental parameter and may change in later versions.

-XX:ParallelGCThreads=n

Sets the number of parallel worker threads used in STW phases.

  • If the number of logical processors is 8 or fewer, n defaults to the number of logical processors.
  • If the number of logical processors is greater than 8, n defaults to approximately 5/8 of the number of logical processors, plus 3.
  • On large SPARC systems, n defaults to approximately 5/16 of the number of logical processors.
  • In most cases, use the default value.
  • One exception is running an older JDK version inside a Docker container. For a case study, see: JVM Troubleshooting and Analysis Part II (case combat).

-XX:ConcGCThreads=n

Sets the number of GC threads used for concurrent marking. The default is approximately 1/4 of the value of ParallelGCThreads.
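
The default thread counts described for ParallelGCThreads and ConcGCThreads can be estimated with a small helper. This sketch covers only the general (non-SPARC) case and uses plain integer arithmetic; the exact rounding inside the JVM is an assumption here.

```java
/** Sketch of the default GC thread counts described above (non-SPARC case). */
final class GcThreadDefaultsSketch {
    /** Up to 8 processors: one thread each; beyond that: 5/8 of the processors plus 3. */
    static int parallelGcThreads(int logicalProcessors) {
        if (logicalProcessors <= 8) {
            return logicalProcessors;
        }
        return logicalProcessors * 5 / 8 + 3;
    }

    /** Roughly a quarter of the parallel threads, never fewer than one. */
    static int concGcThreads(int parallelGcThreads) {
        return Math.max(1, parallelGcThreads / 4);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = parallelGcThreads(cpus);
        System.out.println("ParallelGCThreads ~ " + parallel);
        System.out.println("ConcGCThreads ~ " + concGcThreads(parallel));
    }
}
```

For a 16-processor machine this estimates 13 parallel threads and 3 concurrent marking threads.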

-XX:InitiatingHeapOccupancyPercent=45

Sets the threshold that triggers a marking cycle, expressed as a percentage of Java heap occupancy. The default is 45% of the entire Java heap.
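
To get a feeling for how close an application is to this threshold, heap occupancy can be read through the standard java.lang.management API. The sketch below compares it with 45%; using the committed heap size as the denominator is an assumption made for this example, and the JVM's own bookkeeping may differ.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Compares current heap occupancy with an IHOP-style threshold. */
final class IhopSketch {
    /** True when used/capacity, as a percentage, is above the threshold. */
    static boolean exceedsThreshold(long usedBytes, long capacityBytes, int thresholdPercent) {
        return usedBytes * 100 > capacityBytes * thresholdPercent;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // Assumption: the committed size stands in for "the entire Java heap".
        boolean above = exceedsThreshold(heap.getUsed(), heap.getCommitted(), 45);
        System.out.println("used=" + heap.getUsed() + " committed=" + heap.getCommitted()
                + " above 45%: " + above);
    }
}
```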

-XX:G1MixedGCLiveThresholdPercent=65

Sets the occupancy threshold that decides whether an old generation region is included in a mixed collection: only regions whose live data is at most this percentage are candidates. The default is 65%.
This setting replaces -XX:G1OldCSetRegionLiveThresholdPercent.
This is an experimental parameter and may change in later versions.

-XX:G1HeapWastePercent=10

Sets the tolerable percentage of heap memory waste.
If the reclaimable heap memory ratio is less than this threshold ratio, HotSpot will not start mixed-mode GC.
The default value is 10%.

-XX:G1MixedGCCountTarget=8

Sets the target number of mixed collections to run after a marking cycle in order to collect the old regions whose live data is at most G1MixedGCLiveThresholdPercent.
The default is 8. The actual number of mixed collections is usually at or below this target.

-XX:G1OldCSetRegionThresholdPercent=10

Sets the upper limit on the number of old regions collected in a single mixed collection. The default is 10% of the Java heap.
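
How the four mixed-collection parameters interact can be illustrated with a heavily simplified selection routine. It uses the default values listed in this section, represents each old region only by its live-data percentage, and ignores pause-time prediction entirely, so it is a sketch of the policy as described here and not G1's actual algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy sketch of choosing old regions for one mixed collection. */
final class MixedGcSketch {
    static final int LIVE_THRESHOLD_PERCENT = 65;     // G1MixedGCLiveThresholdPercent
    static final int HEAP_WASTE_PERCENT = 10;         // G1HeapWastePercent
    static final int MIXED_GC_COUNT_TARGET = 8;       // G1MixedGCCountTarget
    static final int OLD_CSET_THRESHOLD_PERCENT = 10; // G1OldCSetRegionThresholdPercent

    /**
     * @param oldRegionLivePercents live-data percentage of each old region (0..100)
     * @param totalHeapRegions      number of regions in the whole heap
     * @return live percentages of the old regions picked for the next mixed collection
     */
    static List<Integer> selectOldRegions(List<Integer> oldRegionLivePercents, int totalHeapRegions) {
        List<Integer> candidates = new ArrayList<>();
        long reclaimable = 0; // total garbage, in units of "percent of one region"
        for (int live : oldRegionLivePercents) {
            if (live <= LIVE_THRESHOLD_PERCENT) {
                candidates.add(live);
                reclaimable += 100 - live;
            }
        }
        // No mixed collection when the reclaimable share of the heap is below the waste limit.
        if (reclaimable < (long) HEAP_WASTE_PERCENT * totalHeapRegions) {
            return new ArrayList<>();
        }
        candidates.sort(Comparator.naturalOrder()); // least live data first: garbage first
        int perGc = (candidates.size() + MIXED_GC_COUNT_TARGET - 1) / MIXED_GC_COUNT_TARGET;
        int upperLimit = Math.max(1, totalHeapRegions * OLD_CSET_THRESHOLD_PERCENT / 100);
        int count = Math.min(Math.min(perGc, upperLimit), candidates.size());
        return new ArrayList<>(candidates.subList(0, count));
    }

    public static void main(String[] args) {
        List<Integer> old = Arrays.asList(10, 20, 30, 90);
        System.out.println(selectOldRegions(old, 20));  // prints [10]
        System.out.println(selectOldRegions(old, 100)); // prints []
    }
}
```

In the second call the same garbage amounts to less than 10% of a 100-region heap, so no mixed collection is started at all.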

-XX:G1ReservePercent=10

Sets the percentage of the heap to keep free as a reserve, which reduces the risk of to-space overflow. The default is 10%.
Although this is a percentage, it maps to a concrete amount of memory, so when you increase or decrease it, adjust the total size of the Java heap by the same amount.

6. How to unlock experimental JVM parameters

An experimental JVM parameter can be changed only after experimental options have been unlocked.
To do this, put -XX:+UnlockExperimentalVMOptions on the command line before the experimental parameters. For example:

java -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=10 -XX:G1MaxNewSizePercent=75 -jar G1test.jar
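
The ordering rule above is easy to get wrong in long startup scripts, so a small check can help. The sketch below walks a list of JVM arguments and reports the first experimental flag that appears before the unlock flag. The set of flag names contains only the options this article labels as experimental; it is not a complete list, and the class is a hypothetical helper, not part of the JDK.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Checks that experimental flags appear only after the unlock flag. */
final class ExperimentalFlagCheck {
    static final String UNLOCK = "-XX:+UnlockExperimentalVMOptions";
    // Only the options this article labels as experimental; not a complete list.
    static final Set<String> EXPERIMENTAL = new HashSet<>(Arrays.asList(
            "G1NewSizePercent", "G1MaxNewSizePercent", "G1MixedGCLiveThresholdPercent"));

    /** Returns the first experimental flag not preceded by the unlock flag, or null. */
    static String firstLockedFlag(List<String> jvmArgs) {
        boolean unlocked = false;
        for (String arg : jvmArgs) {
            if (arg.equals(UNLOCK)) {
                unlocked = true;
            } else if (arg.startsWith("-XX:") && !unlocked) {
                String name = arg.substring(4).split("=", 2)[0];
                if (EXPERIMENTAL.contains(name)) {
                    return arg;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList(UNLOCK, "-XX:G1NewSizePercent=10");
        List<String> bad = Arrays.asList("-XX:G1NewSizePercent=10", UNLOCK);
        System.out.println(firstLockedFlag(good)); // prints null
        System.out.println(firstLockedFlag(bad));  // prints -XX:G1NewSizePercent=10
    }
}
```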

7. Best Practices and Recommendations

Before tuning the G1 parameters, there are a few things to keep in mind:

  • Do not set the young generation size: Do not use options such as -Xmn or -XX:NewRatio to fix the size of the young generation. A fixed young generation size overrides the maximum pause time goal, so the loss outweighs the gain.
  • Desired maximum pause time: Whichever garbage collector you tune, there is a trade-off between latency and throughput.
    G1 is an incremental collector with evenly distributed pauses, so its CPU overhead is relatively high. G1's throughput goal is for application threads to get at least 90% of CPU time under high load, with the overhead of GC threads kept below 10%.
    By comparison, HotSpot's built-in high-throughput collector can be tuned so that application threads get 99% of the time, that is, less than 1% GC overhead.
    Therefore, when evaluating G1 for throughput, relax the pause time goal. A very small pause time goal means accepting a larger GC overhead, which reduces throughput. When evaluating G1 for latency, set the desired soft real-time pause time goal and G1 will do its best to meet it, at some cost in throughput.
  • For most server-side applications, CPU load stays below 50%, so a little extra CPU spent on GC matters little because there is plenty of headroom. GC pause time deserves more attention, because it directly affects response latency.
  • Mixed-mode GC: When tuning mixed collections, try the following options. See the previous sections for details:
    • -XX:InitiatingHeapOccupancyPercent: sets the threshold that triggers a marking cycle.
    • -XX:G1MixedGCLiveThresholdPercent and -XX:G1HeapWastePercent: adjust the mixed collection policy.
    • -XX:G1MixedGCCountTarget and -XX:G1OldCSetRegionThresholdPercent: adjust the share of old generation regions in the CSet.
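
The 90% / 10% throughput goal mentioned above can be checked roughly from inside a running application with the standard GarbageCollectorMXBean API. Note the assumption: getCollectionTime() reports accumulated elapsed collection time, so dividing it by JVM uptime only approximates GC overhead and is not a CPU-time measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Rough estimate of GC overhead as a share of JVM uptime. */
final class GcOverheadSketch {
    static double overheadPercent(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC overhead: %.2f%% (goal in the text: below 10%%)%n",
                overheadPercent(gcMillis, uptime));
    }
}
```

For serious analysis, GC logs remain the better source; this is only a quick sanity check.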

8. To-space overflow and exhaustion messages in the GC log

If the GC log contains to-space overflow or to-space exhausted, G1 does not have enough memory for the survivor regions, for the objects that need to be promoted, or for both. At that point the Java heap has usually reached its maximum size and cannot grow any further. For example:

924.897: [GC pause (G1 Evacuation Pause) (mixed) (to-space exhausted), 0.1957310 secs]

or this:

924.897: [GC pause (G1 Evacuation Pause) (mixed) (to-space overflow), 0.1957310 secs]

To fix such issues, you can try the following adjustments:

  • Increase the value of -XX:G1ReservePercent to enlarge the reserved to-space. The total heap size generally needs to be increased accordingly.
  • Lower -XX:InitiatingHeapOccupancyPercent to trigger marking cycles earlier.
  • Increase the value of -XX:ConcGCThreads to use more concurrent marking threads.

For specific information on these options, please refer to the previous description.
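
To find out how often this happens, the log can be scanned for the two markers shown above. The sketch below counts matching lines in a list of strings; it assumes the log format of the two sample lines in this section and would need adapting for other logging formats.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Counts GC log lines that report to-space exhaustion or overflow. */
final class ToSpaceLogScan {
    private static final Pattern TO_SPACE =
            Pattern.compile("\\(to-space (exhausted|overflow)\\)");

    static long countToSpaceFailures(List<String> logLines) {
        return logLines.stream().filter(line -> TO_SPACE.matcher(line).find()).count();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "924.897: [GC pause (G1 Evacuation Pause) (mixed) (to-space exhausted), 0.1957310 secs]",
                "924.897: [GC pause (G1 Evacuation Pause) (mixed) (to-space overflow), 0.1957310 secs]",
                "some unrelated line");
        System.out.println(countToSpaceFailures(lines)); // prints 2
    }
}
```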

9. Memory allocation for humongous objects

If an object is larger than half of a region, G1 treats it as a humongous object, for example a very large array or String.
Such objects are allocated directly in humongous regions of the old generation. A humongous region is a set of regions that are contiguous in the virtual address space: StartsHumongous marks the first region, and ContinuesHumongous marks the regions that follow it.

Before allocating a humongous region, G1 checks whether the threshold for starting a marking cycle has been reached, and starts a concurrent marking cycle if necessary.

Humongous objects that are no longer in use are freed in the cleanup phase at the end of a marking cycle and during a Full GC.

To avoid the cost of copying, evacuation pauses never move or compact humongous objects. Only a Full GC compacts humongous objects, and it does so in place.

Because each set of StartsHumongous and ContinuesHumongous regions holds exactly one humongous object, the space after the end of the object in the last region is always wasted.
If an object is only slightly larger than N regions, the unused remainder of the last region is effectively memory fragmentation.

If the GC log shows many concurrent cycles triggered by humongous allocations and heavy fragmentation in the old generation, increase -XX:G1HeapRegionSize so that the objects that used to be humongous no longer are (that is, they become smaller than 50% of a region) and go through the regular allocation path instead.
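
The humongous rule and the resulting waste are simple arithmetic, sketched below. The object and region sizes are example values; the helper names are made up for this illustration.

```java
/** Arithmetic behind humongous allocation: threshold, regions needed, and wasted space. */
final class HumongousSketch {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    /** An object is humongous when it is larger than half a region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of contiguous regions a humongous object occupies (rounded up). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    /** Unused space at the end of the last region. */
    static long wastedBytes(long objectBytes, long regionBytes) {
        return regionsNeeded(objectBytes, regionBytes) * regionBytes - objectBytes;
    }

    public static void main(String[] args) {
        long object = 2 * MB + KB; // just over two 1 MB regions
        System.out.println(regionsNeeded(object, MB));    // prints 3
        System.out.println(wastedBytes(object, MB) / KB); // prints 1023
        System.out.println(isHumongous(object, 8 * MB));  // prints false
    }
}
```

With 1 MB regions the object wastes almost a full region; with 8 MB regions it is no longer humongous and is allocated like any other object.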

10. Summary

G1 is an incremental garbage collector that works both in parallel and concurrently and divides the heap into many regions. Compared with other GC implementations, it provides more predictable pause times.
Its incremental nature lets G1 handle larger heaps while still keeping response times reasonable in the worst case.

G1 is adaptive. In general, only three tuning parameters need to be set:

  • The desired maximum pause time, for example -XX:MaxGCPauseMillis=50
  • The maximum heap size, for example -Xmx4g
  • The minimum heap size, for example -Xms4g

About the Author

Monica Beckwith, a core member of the Oracle Technical Working Group, is the performance lead for the Garbage First Garbage Collector under the Java HotSpot VM project.
She has more than 10 years of experience in the field of performance and architecture.
Prior to Oracle and Sun Microsystems, Monica was responsible for performance tuning at Spansion Inc.
Monica has worked with many Java-based benchmarks to find performance improvements for the Java HotSpot VM.

Origin blog.csdn.net/renfufei/article/details/108476781