Detailed explanation of JVM garbage collector

1. Garbage collector

If collection algorithms are the methodology of memory reclamation, then garbage collectors are its practitioners. The "Java Virtual Machine Specification" does not prescribe how a garbage collector should be implemented, so the collectors shipped by different vendors and in different versions of a virtual machine may differ considerably. Virtual machines generally also provide parameters that let users combine the collectors used for each memory generation according to the characteristics and requirements of their own applications.

[Figure: the seven HotSpot collectors, placed by generation, with lines joining the collectors that can be paired]

The figure shows seven collectors that act on different generations. A line between two collectors means that they can be used together, and the area in which a collector sits indicates whether it is a new generation collector or an old generation collector.

1. Serial collector

The Serial collector is the most basic and oldest collector. It used to be (before JDK1.3.1) the only choice for the new generation collector of the HotSpot virtual machine.

As the name suggests, the Serial collector is a single-threaded collector. "Single-threaded" here does not only mean that it uses a single processor or a single collection thread to do the garbage collection work; more importantly, it means that while it is collecting, all other worker threads must be suspended until the collection finishes. Although the Serial collector is old, it is still recommended on the client side because it is simple and efficient and has the smallest additional memory footprint of all the collectors.

"Stop The World" may sound cool, but the work is started and completed automatically by the virtual machine in the background: all of the user's normal working threads are stopped without the user being able to know about it or control it.

Corresponding parameters:

  • -XX:+UseSerialGC

Use the serial collectors. With this parameter both the new generation and the old generation are collected serially: the new generation uses the copying algorithm, and the old generation uses the mark-compact algorithm. Once the collector starts running, all application threads stop until it finishes. It is enabled by default in Client mode and disabled by default in other modes.

  • -XX:SurvivorRatio

-XX:SurvivorRatio=6 sets the ratio of the Eden area to one Survivor area to 6:1. The new generation is then divided into 8 parts: Eden takes 6 of them (6/8, that is, 3/4), each Survivor takes 1/8, and the two Survivors together take 1/4.
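The SurvivorRatio arithmetic can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, the 80 MB young generation is an assumed figure, and a real JVM also aligns the resulting sizes, which is ignored here.

```java
// Hypothetical helper for illustration; not part of any JVM API.
final class SurvivorRatioSketch {
    /** Eden size when the young generation is split Eden : S0 : S1 = ratio : 1 : 1. */
    static long edenBytes(long youngGenBytes, int survivorRatio) {
        return youngGenBytes * survivorRatio / (survivorRatio + 2);
    }

    /** Size of one Survivor space under the same split. */
    static long survivorBytes(long youngGenBytes, int survivorRatio) {
        return youngGenBytes / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long young = 80 * mb; // assumed young generation size
        System.out.println("Eden:     " + edenBytes(young, 6) / mb + " MB");
        System.out.println("Survivor: " + survivorBytes(young, 6) / mb + " MB each");
    }
}
```

With a ratio of 6 the two methods return 6/8 and 1/8 of the young generation, matching the fractions in the text.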

  • -XX:PretenureSizeThreshold

-XX:PretenureSizeThreshold=10485760 (1024 × 1024 × 10 bytes, that is, 10 MB): objects larger than this value are allocated directly in the old generation. The default value is 0, which means that every object, however large, is first allocated in Eden.

  • -XX:HandlePromotionFailure

-XX:+HandlePromotionFailure / -XX:-HandlePromotionFailure is the switch for the space allocation guarantee:

Before a Minor GC occurs, the virtual machine first checks whether the largest contiguous free space in the old generation is greater than the total space of all objects in the new generation.

If it is greater, the Minor GC is performed. If it is smaller, the virtual machine checks whether the -XX:HandlePromotionFailure setting allows the guarantee to fail; if it does not, a Full GC is performed directly. If failure is allowed, the virtual machine goes on to check whether the largest contiguous free space in the old generation is greater than the average size of the objects promoted to the old generation. If it is greater, a Minor GC is attempted (and a Full GC is triggered if the attempt fails); if it is smaller, a Full GC is performed.

Note: After JDK 6 Update 24, the -XX:HandlePromotionFailure parameter no longer affects the space allocation guarantee policy of the virtual machine.
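The space allocation guarantee check described above can be written as a small decision function. This is a sketch of the rule as stated in the text, not HotSpot code; the class, enum, and parameter names are hypothetical.

```java
// Illustrative only: names and types are hypothetical, not HotSpot internals.
final class PromotionGuaranteeSketch {
    enum Decision { MINOR_GC, RISKY_MINOR_GC, FULL_GC }

    static Decision decide(long oldGenMaxContiguousFree,
                           long youngGenObjectBytes,
                           long avgPromotedBytes,
                           boolean handlePromotionFailure) {
        if (oldGenMaxContiguousFree > youngGenObjectBytes) {
            return Decision.MINOR_GC;        // even if every object survives, it fits
        }
        if (!handlePromotionFailure) {
            return Decision.FULL_GC;         // guarantee failure is not allowed
        }
        if (oldGenMaxContiguousFree > avgPromotedBytes) {
            return Decision.RISKY_MINOR_GC;  // attempt it; a failed attempt ends in a Full GC
        }
        return Decision.FULL_GC;
    }

    public static void main(String[] args) {
        System.out.println(decide(500, 300, 50, false)); // MINOR_GC
        System.out.println(decide(200, 300, 50, true));  // RISKY_MINOR_GC
        System.out.println(decide(200, 300, 50, false)); // FULL_GC
        System.out.println(decide(40, 300, 50, true));   // FULL_GC
    }
}
```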

2. ParNew collector

The ParNew collector is essentially a multi-threaded, parallel version of the Serial collector. Apart from using multiple threads for garbage collection at the same time, its remaining behavior, including all the control parameters available to the Serial collector, the collection algorithm, Stop The World, the object allocation rules, and the reclamation strategy, is exactly the same as that of the Serial collector.

Any discussion of ParNew has to mention CMS first:

When JDK 5 was released, HotSpot launched an epoch-making garbage collector, the CMS collector, which for the first time let garbage collection threads work (essentially) concurrently with user threads. CMS is activated with -XX:+UseConcMarkSweepGC, and once it is activated the CMS + ParNew combination is used for garbage collection by default. Starting with JDK 9, the switch for the ParNew collector, -XX:+UseParNewGC, has also been cancelled. It can be said that ParNew is the first garbage collector to leave the stage in HotSpot.

Corresponding parameters:

  • -XX:+UseParNewGC

"Par" stands for parallel: the ParNew collector is a multi-threaded version of the Serial collector. With this parameter the new generation is collected in parallel, while the old generation is still collected serially. The new generation still uses the copying algorithm. The collector pays off on multi-core CPUs; on a single-core CPU the serial collector is recommended.
In detailed GC logs, ParNew indicates that the ParNew collector is in use. Disabled by default.

Parallel and concurrent are both technical terms from concurrent programming. In the context of garbage collectors they can be understood as follows:

  • Parallel: describes the relationship between multiple garbage collector threads, meaning that several such threads are working together at the same time. The user threads are usually in a waiting state by default while this happens.
  • Concurrent: describes the relationship between garbage collector threads and user threads, meaning that both are running at the same time. Since the user threads are not frozen, the program can still respond to service requests, but because the garbage collector threads take up part of the system's resources, the application's throughput is affected to some extent.

3. Parallel Scavenge collector

The Parallel Scavenge collector is also a new generation collector. It is likewise based on the mark-copy algorithm and is a multi-threaded collector that can collect in parallel.

Many features of Parallel Scavenge look similar to ParNew on the surface. What sets the Parallel Scavenge collector apart is that its focus differs from that of other collectors:

  • Collectors such as CMS focus on shortening the pause time of user threads during garbage collection as much as possible. The shorter the pause, the better the collector suits programs that interact with users or must guarantee service response quality, since good response speed improves the user experience;
  • The goal of the Parallel Scavenge collector is to achieve a controllable throughput (Throughput).

Throughput here refers to the ratio of the time the processor spends running user code to the total time the processor consumes.

That is: throughput = time running user code / (time running user code + time running garbage collection)

High throughput makes the most efficient use of processor resources and completes the program's computation as soon as possible. It is mainly suitable for background computation and analysis tasks that do not need much interaction.

Parallel Scavenge collector related parameters:

  • -XX:+UseParallelGC

The new generation uses the Parallel Scavenge collector, and the old generation uses the serial collector (on JDK 7u4 and later, this flag also enables Parallel Old for the old generation). Parallel Scavenge resembles the ParNew collector in many respects, but its purpose is to achieve a controllable throughput. It is enabled by default in Server mode and disabled by default in other modes.

  • -XX:MaxGCPauseMillis parameter

The value of this parameter is a number of milliseconds greater than 0, and the collector will do its best to keep the time spent on memory reclamation within the value set by the user.
But do not assume that a smaller value makes the system's garbage collection faster. The shorter pause time is paid for with throughput and new generation space: to meet it, the system shrinks the new generation. Collecting a 300 MB new generation is certainly faster than collecting a 500 MB one, but it also makes collections more frequent. Where a collection used to occur every 10 seconds with a 100 millisecond pause, it now occurs every 5 seconds with a 70 millisecond pause. Pause times did drop, but so did throughput.

  • -XX:GCTimeRatio parameter
    The value of this parameter should be an integer greater than 0 and less than 100. It sets the ratio of user code time to garbage collection time: with a value of N, garbage collection may take at most 1/(1+N) of the total time.
    If this parameter is set to 19, the maximum allowed garbage collection time is 5% of the total time (i.e. 1/(1+19)). The default value is 99, which allows at most 1% (i.e. 1/(1+99)) of the time for garbage collection.

  • -XX:+UseAdaptiveSizePolicy parameter
    This is a switch parameter. When it is enabled, there is no need to specify details such as the size of the new generation (-Xmn), the ratio of Eden to the Survivor areas (-XX:SurvivorRatio), or the size of objects promoted directly to the old generation (-XX:PretenureSizeThreshold) by hand.
    The virtual machine collects performance monitoring information as the system runs and adjusts these parameters dynamically to provide the most suitable pause time or the maximum throughput.
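The throughput formula, the MaxGCPauseMillis trade-off, and the GCTimeRatio arithmetic from the parameters above can be reproduced in a short sketch. The class and method names are hypothetical, and "a collection every 10 seconds" is read here as 10 seconds of user code followed by one pause, which is an assumption about the example rather than something the JVM defines.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of any JVM API.
final class ThroughputSketch {
    /** throughput = user code time / (user code time + GC time) */
    static double throughput(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    /** Largest share of total time that -XX:GCTimeRatio=n leaves for GC: 1 / (1 + n). */
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // Larger young generation: fewer, longer pauses.
        System.out.printf(Locale.ROOT, "10 s + 100 ms pause: %.2f%%%n", 100 * throughput(10_000, 100));
        // Smaller young generation: shorter pauses, but twice as often.
        System.out.printf(Locale.ROOT, " 5 s +  70 ms pause: %.2f%%%n", 100 * throughput(5_000, 70));
        System.out.printf(Locale.ROOT, "GCTimeRatio=19: GC share up to %.0f%%%n", 100 * maxGcFraction(19));
        System.out.printf(Locale.ROOT, "GCTimeRatio=99: GC share up to %.0f%%%n", 100 * maxGcFraction(99));
    }
}
```

The second configuration has the shorter pause but the lower throughput, which is the trade-off the MaxGCPauseMillis description warns about.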

4. Serial Old Collector

Serial Old is the old generation version of the Serial collector. It is also a single-threaded collector, and it uses the mark-compact algorithm.

The main purpose of this collector is likewise to be used by the HotSpot virtual machine in client mode. You will sometimes see the name PS MarkSweep: that default implementation is really just a thin wrapper, and the code underneath that does the actual mark-sweep-compact work is shared with Serial Old.
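Names such as PS MarkSweep are what a running JVM reports for its collectors. A minimal way to see them, assuming the standard java.lang.management API is available, is to list the GarbageCollectorMXBean names; the class and method names below are made up for illustration, and the printed names depend on the JVM version and the collector flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class CollectorNamesSketch {
    /** Names of the collectors the current JVM reports through its management API. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run with different -XX:+Use...GC flags to compare the reported names.
        System.out.println(collectorNames());
    }
}
```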

In Server mode, it has two main uses:
(1) In JDK 5 and earlier, it is paired with the Parallel Scavenge collector (the Parallel Old collector arrived in JDK 6);
(2) It is the fallback for the CMS collector, used when a Concurrent Mode Failure occurs during concurrent collection.

5. Parallel Old Collector

Parallel Old is the old generation version of the Parallel Scavenge collector. It supports multi-threaded parallel collection and is based on the mark-compact algorithm.

This collector was not available until JDK 6. Before that, the new generation Parallel Scavenge collector was in a rather awkward position: if the new generation used Parallel Scavenge, the old generation had no choice other than the Serial Old collector, because other well-performing old generation collectors, such as CMS, cannot work with it.

The single thread of the Serial Old collector was the bottleneck of the whole collection system. Only with the arrival of the Parallel Old collector did the "throughput-first" collectors get a combination worthy of the name. When throughput matters or processor resources are scarce, the combination of Parallel Scavenge and Parallel Old can be considered first.

-XX:+UseParallelOldGC: specifies that the Parallel Old collector is used.

6. CMS collector

The CMS (Concurrent Mark Sweep) collector is a collector whose goal is the shortest possible collection pause time.

At present, a large share of Java applications run on the server side of Internet websites or browser-based B/S systems. Such applications usually care most about service response speed and want system pauses to be as short as possible to give users a good interactive experience.

The CMS collector suits the needs of this type of application very well. As the name (which includes "Mark Sweep") shows, the CMS collector is based on the mark-sweep algorithm. Its operation is more complicated than that of the previous collectors, and the whole process is divided into four steps:

  • 1) Initial mark (CMS initial mark)
  • 2) Concurrent mark (CMS concurrent mark)
  • 3) Remark (CMS remark)
  • 4) Concurrent sweep (CMS concurrent sweep)

The initial mark and remark steps still require "Stop The World".

  • The initial mark phase only marks the objects that GC Roots can reach directly, so it is very fast;
  • The concurrent mark phase traverses the entire object graph starting from the objects directly reachable from GC Roots. It takes a long time, but user threads do not need to be paused and can run concurrently with the garbage collection threads;
  • The remark phase corrects the marking records of the objects whose marks changed because the user program kept running during concurrent marking. Its pause is usually slightly longer than that of the initial mark phase, but far shorter than the duration of the concurrent mark phase;
  • The concurrent sweep phase cleans up and deletes the objects that the marking phases judged to be dead. Since surviving objects do not need to be moved, this phase can also run concurrently with user threads.

Since the garbage collector threads can work alongside the user threads during the concurrent mark and concurrent sweep phases, which take the longest in the whole process, the memory reclamation of the CMS collector can on the whole be regarded as executing concurrently with the user threads.

CMS is an excellent collector, and its main advantages are already reflected in its name: concurrent collection and low pauses.

The CMS collector was the HotSpot virtual machine's first successful attempt at low pauses, but it is far from perfect and has at least the following three obvious shortcomings:

  • 1) First, it is very sensitive to processor resources

During the concurrent phases it does not pause user threads, but it does slow the application down and reduce total throughput, because it takes up part of the processor's computing power.

The default number of collection threads started by CMS is (number of processor cores + 3) / 4. In other words, with four or more processor cores, the garbage collection threads take no more than about 25% of the processor's computing resources during concurrent collection, and that share falls as the number of cores grows. With fewer than four cores, however, the impact of CMS on user programs may become very large.

  • 2) Second, CMS cannot handle "floating garbage", and a "Concurrent Mode Failure" may occur, leading to another full "Stop The World" collection

During the concurrent marking and concurrent sweeping phases of CMS, user threads are still running, so new garbage objects naturally continue to be produced. Garbage that appears after the marking process has ended cannot be disposed of by CMS in the current collection and has to wait for the next one. This garbage is called "floating garbage".

Also, because user threads keep running during the collection, enough memory has to be reserved for them to use. The CMS collector therefore cannot wait until the old generation is almost completely full before collecting, as other collectors do; part of the space must be reserved for the program to use during concurrent collection.

Under the default settings of JDK 5, the CMS collector is activated when 68% of the old generation is in use, which is a conservative setting. In JDK 6, the activation threshold of the CMS collector was raised to 92%. But this makes another risk more likely: if the memory reserved while CMS is running cannot satisfy the program's new allocations, a "Concurrent Mode Failure" occurs, and the virtual machine has to start its fallback plan: freeze the user threads ("Stop The World") and temporarily use the Serial Old collector to collect the old generation again, which makes the pause very long.

  • 3) Finally, CMS uses the "mark-sweep" algorithm and does not move surviving objects, which produces a large amount of memory fragmentation

Too much fragmentation makes large-object allocation troublesome. Often plenty of space remains in the old generation, yet no contiguous block large enough for the current object can be found, and a Full GC has to be triggered early. When the CMS collector has to perform a Full GC, it merges and compacts the memory fragments by default.

Corresponding configuration parameters:

  • -XX:+UseConcMarkSweepGC

Enable CMS: -XX:+UseConcMarkSweepGC

  • -XX:ParallelCMSThreads

The default number of collection threads started by CMS is (ParallelGCThreads + 3) / 4, where ParallelGCThreads is the number of parallel collection threads in the young generation. To set it explicitly, use for example -XX:ParallelCMSThreads=20.

  • -XX:+UseCMSCompactAtFullCollection and -XX:CMSFullGCsBeforeCompaction=10 (enabled by default, obsolete as of JDK 9)

CMS does not defragment the heap, so to prevent Full GCs caused by heap fragmentation, the option that merges fragments is enabled: -XX:+UseCMSCompactAtFullCollection (enabled by default, obsolete as of JDK 9). Enabling this option affects performance to some extent, which can be tuned by configuring an appropriate -XX:CMSFullGCsBeforeCompaction=10 (obsolete as of JDK 9).

CMSFullGCsBeforeCompaction specifies how many Full GCs are executed, counted from the last CMS concurrent GC, before fragments are merged. The default is 0, that is, under the default configuration fragments are merged every time CMS cannot keep up and falls back to a Full GC.

If you set CMSFullGCsBeforeCompaction to 10, fragments are merged only once every 10 real Full GCs.

  • -XX:+CMSParallelRemarkEnabled

To shorten the second pause, enable parallel remark: -XX:+CMSParallelRemarkEnabled
If the remark still takes too long, you can enable the -XX:+CMSScavengeBeforeRemark option to force a minor GC before the remark. This reduces the remark pause, but another minor GC will also start immediately after the remark.

  • To avoid Full GCs caused by a full Perm area (the permanent generation of JDK 7 and earlier), it is recommended to let CMS reclaim the Perm area:

-XX:+CMSPermGenSweepingEnabled
-XX:+CMSClassUnloadingEnabled

  • -XX:CMSInitiatingOccupancyFraction

In JDK 5, CMS collection starts by default when the tenured generation is 68% full.
In JDK 6, the default activation threshold of the CMS collector was raised to 92%. If necessary, you can adjust this value:
-XX:CMSInitiatingOccupancyFraction=80 makes the CMS collection start once the tenured generation is 80% full.

  • -XX:ParallelGCThreads

The default number of parallel collection threads in the young generation is (cpu <= 8) ? cpu : 3 + ((cpu * 5) / 8). To reduce the number of threads, set it with -XX:ParallelGCThreads=N.
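The two default thread-count formulas quoted above, for ParallelGCThreads and for the CMS collection threads, are easy to tabulate. The sketch below simply evaluates the formulas as given in the text with integer arithmetic; the class and method names are hypothetical, and the actual defaults of a particular JVM build may differ.

```java
// Evaluates the formulas quoted in the text; not HotSpot code.
final class GcThreadDefaultsSketch {
    /** (cpu <= 8) ? cpu : 3 + ((cpu * 5) / 8) */
    static int parallelGcThreads(int cpus) {
        return (cpus <= 8) ? cpus : 3 + (cpus * 5) / 8;
    }

    /** (ParallelGCThreads + 3) / 4 */
    static int cmsThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {2, 4, 8, 16, 32}) {
            int parallel = parallelGcThreads(cpus);
            System.out.println(cpus + " CPUs: ParallelGCThreads=" + parallel
                    + ", CMS threads=" + cmsThreads(parallel));
        }
    }
}
```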

7. G1 collector

The Garbage First (G1 for short) collector is a milestone in the history of garbage collector technology. It pioneered the design idea of collectors oriented toward partial collection and the Region-based memory layout. G1 is a garbage collector aimed mainly at server-side applications.

With the release of JDK 9, G1 replaced the combination of Parallel Scavenge and Parallel Old as the default garbage collector in server mode, while CMS was demoted to a deprecated collector.

For all collectors before G1, including CMS, the target of a collection is either the entire new generation (Minor GC), the entire old generation (Major GC), or the entire Java heap (Full GC). G1 broke out of this cage: it can assemble a collection set (Collection Set, generally abbreviated CSet) from any part of the heap and reclaim it. The criterion is no longer which generation a piece of memory belongs to, but which piece holds the most garbage and yields the greatest reclamation benefit. This is the Mixed GC mode of the G1 collector.

The Region-based heap memory layout pioneered by G1 is the key to its ability to achieve this goal.

Although G1 is still designed according to the theory of generational collection, its heap memory layout is very different from other collectors:

  • The traditional GC collector divides the continuous memory space into the new generation, the old generation and the permanent generation

  • In G1, the storage of each generation is not contiguous: each generation uses n non-contiguous Regions of the same size, and each Region occupies a contiguous range of virtual memory addresses.

Each Region can act as the Eden space of the new generation, the Survivor space, or the old generation space according to the needs.

The collector can adopt different strategies to deal with Regions that play different roles, so that whether it is a newly created object or an old object that has survived for a period of time and survived multiple collections, it can obtain good collection results.

There is also a special kind of Region, the Humongous Region, dedicated to storing large objects.

G1 treats any object whose size exceeds half the capacity of a Region as a large object.

The size of each Region can be set with the parameter -XX:G1HeapRegionSize. The value ranges from 1 MB to 32 MB and should be a power of 2.

Very large objects that exceed the capacity of an entire Region are stored in N contiguous Humongous Regions. Most of G1's behavior treats Humongous Regions as part of the old generation.
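The large-object rule can be sketched in a few lines. The class and method names below are hypothetical, and the sketch ignores object headers and alignment, which a real JVM has to account for.

```java
// Illustrative only; not G1's actual allocation code.
final class HumongousSketch {
    /** An object is treated as humongous when it exceeds half of a Region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of contiguous Regions needed to hold the object (ceiling division). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = 1024L * 1024L; // assume -XX:G1HeapRegionSize=1m
        System.out.println(isHumongous(600L * 1024, region));    // true: more than half a Region
        System.out.println(isHumongous(512L * 1024, region));    // false: exactly half does not exceed it
        System.out.println(regionsNeeded(2_500_000L, region));   // 3
    }
}
```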

Although G1 still retains the concept of the young generation and the old generation, the new generation and the old generation are no longer fixed. They are a dynamic collection of a series of regions (not necessarily continuous).

G1 can build a predictable pause-time model because it treats the Region as the smallest unit of a single collection: the memory collected each time is an integer multiple of the Region size, so G1 can plan its work and avoid garbage collection over the entire Java heap.

More specifically, the G1 collector tracks the "value" of the garbage accumulated in each Region, where value is determined by the amount of space a collection would reclaim and the empirical time it would take, and maintains a priority list in the background. On each collection, according to the pause time the user allows (specified with -XX:MaxGCPauseMillis, 200 milliseconds by default), it first handles the Regions with the highest reclamation value. This is the origin of the name "Garbage First". Dividing memory into Regions and reclaiming them by priority ensures that the G1 collector achieves the highest possible collection efficiency within a limited time.
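The "highest value first, within the pause budget" idea can be illustrated with a greedy selection. This is a deliberately simplified sketch and not G1's actual policy: the Region statistics, the definition of value as reclaimable bytes per predicted millisecond, and all names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class GarbageFirstSketch {
    /** Hypothetical per-Region statistics. */
    static final class Region {
        final String name;
        final long reclaimableBytes;   // space a collection would free
        final double predictedMillis;  // empirical cost of collecting it

        Region(String name, long reclaimableBytes, double predictedMillis) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.predictedMillis = predictedMillis;
        }

        double value() {
            return reclaimableBytes / predictedMillis;
        }
    }

    /** Pick Regions in descending value order while the predicted pause stays within budget. */
    static List<Region> chooseCollectionSet(List<Region> candidates, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(candidates);
        Comparator<Region> byValue = Comparator.comparingDouble(Region::value);
        sorted.sort(byValue.reversed());

        List<Region> cset = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.predictedMillis <= pauseBudgetMillis) {
                cset.add(r);
                spent += r.predictedMillis;
            }
        }
        return cset;
    }

    public static void main(String[] args) {
        List<Region> candidates = new ArrayList<>();
        candidates.add(new Region("A", 100, 10));
        candidates.add(new Region("B", 300, 20));
        candidates.add(new Region("C", 50, 50));
        for (Region r : chooseCollectionSet(candidates, 30)) {
            System.out.println(r.name);
        }
    }
}
```

With a 30 ms budget the sketch picks B and then A, and leaves C for a later collection because it is expensive and frees little.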

G1 provides two GC modes:

  • Young GC
  • Mixed GC

7.1 G1 Young GC

A Young GC is triggered when the Eden space is exhausted. The data in the Eden space is moved to the Survivor space; if the Survivor space is not large enough, part of the data in the Eden space is promoted directly to the old generation space.

The data in the Survivor area is moved to a new Survivor area, and some of it is also promoted to the old generation space.

In the end the Eden space is empty, the GC stops working, and the application threads continue to execute.

There is a problem to consider here. If only the new generation is collected, objects in the Young area may still be referenced from the Old area. This is the problem of cross-generational references.

Scanning the entire heap to find such references would take a lot of time, so G1 introduced the concepts of the RSet (Remembered Set) and the card table.

The basic idea is to trade space for time.

7.1.1 Remembered Set

Young generation GCs happen very frequently. Generally speaking, the process goes like this:

First enumerate the root nodes. A root node may be in the new generation or in the old generation.

Since we only want to collect the new generation (in other words, we do not want to collect the old generation), there is no need to run a full reachability analysis from the GC Roots located in the old generation. The problem is that a GC Root in the old generation may indeed reference an object in the new generation. That object must not be reclaimed; it is a live object.

The key question is how to identify such objects quickly. A full scan of the old generation is clearly not economical.

In fact, the virtual machine records reference relationships between objects of different ages while the program runs.

In the example above, the relationship "an object in the old generation references an object in the new generation" is recorded in a dedicated space beside the new generation at the moment the reference is created. This is the Remembered Set: it records which objects in the new generation are referenced from the old generation.

Therefore "the GC Roots of the new generation" + "the contents of the Remembered Set" are the real GC Roots when the new generation is collected. Reachability analysis and garbage collection of the new generation can then be based on them.

The G1 collector breaks the whole into parts, dividing one large memory area into many Regions.

The problem is that objects in one Region inevitably reference objects in another Region. To collect garbage in units of Regions, the G1 collector also uses the Remembered Set technique. Each Region in G1 has a corresponding Remembered Set, which records the references from objects in other Regions to the objects in this Region. During memory reclamation, adding the Remembered Set to the enumeration range of the GC roots ensures that the whole heap need not be scanned and that nothing is missed.

G1 GC only relies on RSet in two scenarios:

  • References from the old generation to the young generation: G1 GC maintains a pointer from the old generation to the young generation, which is stored in the RSet of the young generation.
  • References from the old generation to the old generation: the pointer from the old generation to the old generation is stored in the RSet of the old generation.

7.1.2 Card Table

If there are many references between the old generation and the new generation and every one of them has to be recorded in the RSet, the RSet takes up a lot of space and the loss outweighs the gain. To solve this problem, G1 introduces another concept, the card table (Card Table).

The card table is a collection of bits. Each bit indicates whether any object in a certain sub-area of the old generation (such a sub-area is called a card; in G1 a card is 512 bytes) holds a reference to a new generation object. A new generation GC then does not need to spend a lot of time scanning old generation objects to check the references of each one; it scans the card table first. Only when a card's bit is 1 do the old generation objects in that card need to be scanned; a card marked 0 cannot contain references to the new generation.
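The mapping from an address to its card and the "mark the card when a reference is stored" step can be sketched as follows. This models the description above with a BitSet and plain long addresses; the class, its methods, and the simulated heap base are all assumptions for illustration, not the JVM's own data structure.

```java
import java.util.BitSet;

final class CardTableSketch {
    static final int CARD_SHIFT = 9; // 2^9 = 512 bytes per card

    private final long heapBase;
    private final BitSet dirty;

    CardTableSketch(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        this.dirty = new BitSet((int) (heapBytes >>> CARD_SHIFT));
    }

    /** Index of the card that covers the given address. */
    int cardIndex(long address) {
        return (int) ((address - heapBase) >>> CARD_SHIFT);
    }

    /** Conceptual write barrier: called when a reference to a young object is stored at fieldAddress. */
    void markDirty(long fieldAddress) {
        dirty.set(cardIndex(fieldAddress));
    }

    /** True when the card covering the address must be scanned during a young collection. */
    boolean isDirty(long address) {
        return dirty.get(cardIndex(address));
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(0x1000L, 1L << 20);
        table.markDirty(0x1000L + 600);                   // falls into card 1
        System.out.println(table.isDirty(0x1000L + 512)); // true: same card
        System.out.println(table.isDirty(0x1000L));       // false: card 0 is clean
    }
}
```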

Broadly speaking, an RSet is a hash table: each key is the start address of another Region, and each value is a set whose elements are card table indexes.
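That shape maps directly onto a map of sets. The following is a minimal sketch of one Region's RSet under that description; the class and method names are hypothetical, and G1's real implementation uses more compact, specialized structures.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RememberedSetSketch {
    // Key: start address of another Region that references this one.
    // Value: indexes of the cards in that Region that hold such references.
    private final Map<Long, Set<Integer>> cardsByRegion = new HashMap<>();

    /** Record that a card in the Region starting at fromRegionStart points into this Region. */
    void record(long fromRegionStart, int cardIndex) {
        cardsByRegion.computeIfAbsent(fromRegionStart, k -> new HashSet<>()).add(cardIndex);
    }

    /** Cards to scan in the given Region when this Region is collected. */
    Set<Integer> cardsFrom(long fromRegionStart) {
        return cardsByRegion.getOrDefault(fromRegionStart, Collections.emptySet());
    }

    public static void main(String[] args) {
        RememberedSetSketch rset = new RememberedSetSketch();
        rset.record(0x100000L, 3);
        rset.record(0x100000L, 3); // the same card is stored only once
        rset.record(0x100000L, 7);
        System.out.println(rset.cardsFrom(0x100000L).size());    // 2
        System.out.println(rset.cardsFrom(0x200000L).isEmpty()); // true
    }
}
```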

7.2 G1 Mixed GC

A Mixed GC performs a normal new generation collection and also reclaims some of the old generation Regions marked by the background scanning threads. It consists of 4 steps:

  • 1. Initial mark (STW): at this stage, G1 GC marks the roots.
  • 2. Concurrent marking: G1 GC looks for reachable (live) objects throughout the heap.
  • 3. Final mark (Remark, STW): completes the marking cycle.
  • 4. Evacuation (STW): identifies all free Regions, organizes the heap Regions, identifies the old generation Regions with high reclamation value for mixed collection, and sorts out the RSets.

7.2.1 Three-color marking algorithm

Concurrent marking is inseparable from SATB (Snapshot At The Beginning). To explain SATB, we must first talk about the three-color marking algorithm.

The three-color marking algorithm is a useful way to describe a tracing collector, and it can be used to reason about the correctness of the collector's concurrent marking.

First, we divide objects into three types:

  • Black: a root object, or an object that has been scanned together with all of its child references; it is known to be alive.
  • Gray: the object itself has been scanned, but its child objects have not been scanned yet, that is, the references held in the object's fields have not been followed.
  • White: an object that has not been scanned. After all objects have been scanned, the objects that are still white are unreachable, that is, garbage.

When the GC starts, it scans objects as follows:

The root object is colored black and its child objects are colored gray.

[Figure: the root object colored black and its children colored gray]

The result after the GC's concurrent scan is as follows:

[Figure: object colors after the concurrent scan]

During the concurrent marking phase, the application thread changes the reference relationship:

A.c=C

[Figure: the object graph after the application thread executes A.c=C]

The scan result in the remark phase is as follows:

[Figure: scan result in the remark phase]

In this case C will be collected as garbage.

The surviving objects of the snapshot were originally A and B, but the objects actually alive are now A, B, and C. The integrity of the snapshot is broken, so this result is clearly wrong.

G1 uses a pre-write barrier to solve this problem. Simply put, when a reference changes during the concurrent marking phase, the pre-write barrier records the change in a queue, which is called satb_mark_queue in the JVM source code.

In the remark phase this queue is scanned. The object that the old reference pointed to is marked, and its descendants are marked recursively, so no object is missed and the integrity of the snapshot is guaranteed.
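The lost-object problem and the pre-write barrier fix can be simulated with a toy marker. Everything here is a simplified model with hypothetical names, not G1's implementation. The sketch also assumes that, besides A.c = C, the application clears B's reference to C while B is still gray; that second write is an assumption added to make C unreachable from any unscanned object, which is the situation in which C would be lost.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SatbSketch {
    static final class Obj {
        final String name;
        final Map<String, Obj> fields = new HashMap<>();
        Obj(String name) { this.name = name; }
    }

    private final boolean barrierEnabled;
    private final Set<Obj> marked = new HashSet<>();             // gray or black objects
    private final Deque<Obj> gray = new ArrayDeque<>();          // marked, fields not yet scanned
    private final Deque<Obj> satbMarkQueue = new ArrayDeque<>(); // overwritten references

    SatbSketch(boolean barrierEnabled) { this.barrierEnabled = barrierEnabled; }

    /** White to gray. */
    void shade(Obj o) {
        if (o != null && marked.add(o)) gray.add(o);
    }

    /** Scan one gray object, turning it black. Returns false when no gray object is left. */
    boolean scanOne() {
        Obj o = gray.poll();
        if (o == null) return false;
        o.fields.values().forEach(this::shade);
        return true;
    }

    /** Reference write with a pre-write barrier that remembers the overwritten value. */
    void write(Obj holder, String field, Obj newValue) {
        Obj old = holder.fields.get(field);
        if (barrierEnabled && old != null) satbMarkQueue.add(old);
        if (newValue == null) holder.fields.remove(field);
        else holder.fields.put(field, newValue);
    }

    /** Remark: drain the SATB queue and finish marking from the recorded objects. */
    void remark() {
        Obj o;
        while ((o = satbMarkQueue.poll()) != null) shade(o);
        while (scanOne()) { }
    }

    static boolean cIsMarked(boolean barrierEnabled) {
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C");
        a.fields.put("b", b);
        b.fields.put("c", c);
        SatbSketch gc = new SatbSketch(barrierEnabled);
        gc.shade(a);
        gc.scanOne();            // A is black, B is gray, C is white
        gc.write(a, "c", c);     // application: A.c = C
        gc.write(b, "c", null);  // application (assumed): B.c = null
        while (gc.scanOne()) { } // rest of concurrent marking
        gc.remark();
        return gc.marked.contains(c);
    }

    public static void main(String[] args) {
        System.out.println("without barrier, C marked: " + cIsMarked(false));
        System.out.println("with barrier,    C marked: " + cIsMarked(true));
    }
}
```

Without the barrier C stays white although A still references it; with the barrier the overwritten reference to C lands in the queue and C is marked during remark.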

The incremental update design of CMS makes it necessary to re-scan all thread stacks and the entire young gen as the root during the remark phase; the SATB design of G1 only needs to scan the remaining satb_mark_queue during the remark phase, which solves the long-term remark phase of the CMS garbage collector Potential risks of STW.

SATB records the objects that were alive when marking started, that is, the object snapshot at that moment. Some of them may become garbage later; this is called floating garbage, and it can only be reclaimed in the next collection. During the GC, newly allocated objects are treated as alive, and the remaining unreachable objects are dead.

How to know which objects are newly allocated after GC starts?

In each Region, newly allocated objects are tracked through two top-at-mark-start (TAMS) pointers, prevTAMS and nextTAMS. The schematic diagram is as follows:

insert image description here

Here top is the current allocation pointer of the Region:

  • [bottom, top) is the part of the Region currently in use, and [top, end) is the unused space still available for allocation.
  • (1) [bottom, prevTAMS): objects in this part were marked by the (n-1)th round of concurrent marking.
  • (2) [prevTAMS, nextTAMS): objects in this part are implicitly alive in the (n-1)th round of concurrent marking.
  • (3) [nextTAMS, top): objects in this part are implicitly alive in the nth round of concurrent marking.
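The "implicitly alive" rule for the current round can be sketched as follows. This is a minimal model that treats addresses as plain offsets and the mark bitmap as a set; TamsSketch and its members are hypothetical names, not G1's real data structures.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy Region with a top-at-mark-start pointer; addresses are plain offsets. */
final class TamsSketch {
    final long bottom;
    final long end;
    long top;       // current allocation pointer
    long nextTams;  // value of top when the current (nth) marking round started
    final Set<Long> nextBitmap = new HashSet<>(); // objects marked so far in this round

    TamsSketch(long bottom, long end) {
        this.bottom = bottom;
        this.end = end;
        this.top = bottom;
    }

    /** Bump-pointer allocation inside the Region. */
    long allocate(long size) {
        if (top + size > end) {
            throw new IllegalStateException("region full");
        }
        long addr = top;
        top += size;
        return addr;
    }

    /** Marking start: remember where allocation stood at that moment. */
    void startMarking() {
        nextTams = top;
        nextBitmap.clear();
    }

    /** Objects at or above nextTAMS were allocated during marking, so they count as alive. */
    boolean isLiveInCurrentMarking(long addr) {
        return addr >= nextTams || nextBitmap.contains(addr);
    }
}
```

An object allocated after startMarking() never needs a mark bit: its address alone, compared with nextTams, is enough to treat it as alive.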

7.2.2 Evacuation

The Evacuation phase is fully stop-the-world. It copies the live objects in some Regions to empty Regions (using parallel copying) and then reclaims the space of the original Regions. Evacuation can pick any number of Regions and collect them independently; the selected Regions make up the collection set (CSet). Which Regions go into the CSet depends on the pause time the user allows (set with -XX:MaxGCPauseMillis, default 200 milliseconds). This phase does not evacuate every Region that contains live objects; it only picks a small number of Regions with high reclamation value, so the cost of the pause can be controlled (within limits).

insert image description here
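The CSet selection described above can be approximated with a greedy sketch: rank Regions by how much garbage they free per millisecond of predicted copying, then take Regions until the pause budget is used up. The Region fields, the efficiency() measure, and chooseCSet are assumptions for illustration; the real G1 prediction model is more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Greedy CSet selection under a pause budget; a simplification, not G1's real policy. */
final class CSetSketch {
    static final class Region {
        final String id;
        final long reclaimableBytes;   // garbage freed by evacuating this Region
        final double predictedCopyMs;  // predicted cost of copying its live objects

        Region(String id, long reclaimableBytes, double predictedCopyMs) {
            this.id = id;
            this.reclaimableBytes = reclaimableBytes;
            this.predictedCopyMs = predictedCopyMs;
        }

        /** Reclamation value: bytes freed per millisecond of pause. */
        double efficiency() {
            return reclaimableBytes / predictedCopyMs;
        }
    }

    static List<Region> chooseCSet(List<Region> candidates, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        // Highest reclamation value first.
        sorted.sort((x, y) -> Double.compare(y.efficiency(), x.efficiency()));

        List<Region> cset = new ArrayList<>();
        double spentMs = 0;
        for (Region r : sorted) {
            if (spentMs + r.predictedCopyMs > pauseBudgetMs) {
                break; // the next Region would exceed the allowed pause
            }
            cset.add(r);
            spentMs += r.predictedCopyMs;
        }
        return cset;
    }
}
```

Here pauseBudgetMs plays the role of the -XX:MaxGCPauseMillis target: a smaller budget simply means fewer Regions are evacuated per pause.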

7.2.3 Full GC

G1 does much of its collection work concurrently with the application. When Mixed GC cannot keep up with the rate at which the application allocates memory, G1 falls back to a Full GC, which is carried out as a single-threaded, Serial-style collection. A Full GC can cause a long STW pause and should be avoided as far as possible.

A G1 Full GC usually has one of two causes:

  • During Evacuation, there is not enough to-space to hold promoted objects.
  • The heap is exhausted before the concurrent cycle completes.

The core configuration parameters below are described for JDK 8.

G1 GC provides many command-line options. Some are Boolean: "+" enables the option and "-" disables it. The others take a value and need no Boolean prefix.

  • -XX:+UseG1GC enable G1 garbage collector

  • -XX:G1HeapRegionSize=nM (with unit)

This option is specific to G1. By default the Region size is derived from the heap size so that the heap holds roughly 2048 Regions; it can also be set explicitly to 1MB, 2MB, 4MB, 8MB, 16MB, or 32MB, six sizes in all. A larger Region size helps with large objects.

As mentioned earlier, large objects are not managed and allocated the same way as ordinary objects. With a larger Region size, some large objects that used to take the special path can be handled on the ordinary path.

It is like airport security. Pilots and flight attendants use a dedicated lane. If some passengers were also treated specially and sent to dedicated lanes, more dedicated lanes would have to be opened and the ordinary lanes reduced accordingly. A larger Region size works the other way: it reduces the number of objects that need the dedicated lane.

Conversely, if the Region size is set too small, G1 becomes less flexible, which causes allocation problems for every generation.
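The effect of the Region size on large objects can be shown with a small sketch. It rests on two assumptions that should be checked against your JDK: that G1 aims for about 2048 Regions and rounds the size down to a power of two between 1MB and 32MB, and that an object larger than half a Region is treated as a large (Humongous) object. RegionSizeSketch and its methods are hypothetical names.

```java
/** Rough model of G1 Region sizing; assumptions are stated in the text above. */
final class RegionSizeSketch {
    static final long MB = 1024L * 1024L;

    /** Heap size divided by about 2048, rounded down to a power of two, clamped to [1MB, 32MB]. */
    static long defaultRegionSize(long heapBytes) {
        long target = Math.max(heapBytes / 2048, MB);
        long powerOfTwo = Long.highestOneBit(target); // largest power of two <= target
        return Math.min(powerOfTwo, 32 * MB);
    }

    /** Assumed rule: an object bigger than half a Region takes the special (Humongous) path. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }
}
```

Under these assumptions a 3MB array is Humongous with 4MB Regions but an ordinary object with 8MB Regions, which is the effect the airport analogy describes.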

  • -XX:MaxGCPauseMillis=200 Set target value for maximum pause time. The default is 200 milliseconds.

  • -XX:G1NewSizePercent=5

Sets the minimum size of the young generation as a percentage of the heap. The default is 5% of the Java heap. This is an experimental flag.

It must be unlocked first with -XX:+UnlockExperimentalVMOptions.

  • -XX:G1MaxNewSizePercent=60

Sets the maximum size of the young generation as a percentage of the heap. The default is 60% of the Java heap. This is an experimental flag.

It must be unlocked first with -XX:+UnlockExperimentalVMOptions.

  • -XX:ParallelGCThreads=n

Sets the number of STW worker threads. With up to 8 logical processors, n equals the number of logical processors.

If there are more than 8 logical processors, n is set to approximately 5/8 of the logical processors.

This works in most cases, except on larger SPARC systems, where n is about 5/16 of the logical processors.

  • -XX:ConcGCThreads=n

Sets the number of concurrent marking threads. n is set to approximately 1/4 of the number of parallel garbage collection threads (ParallelGCThreads).
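The two defaults can be written as simple arithmetic. This sketch assumes one common reading of "approximately 5/8": keep the first 8 processors and add 5/8 of those beyond 8. It also assumes the 1/4 rule is rounded and never goes below 1. Treat both formulas and the GcThreadsSketch name as illustrative, and check the actual values on your JVM.

```java
/** Heuristic thread counts; the formulas are assumptions, not a specification. */
final class GcThreadsSketch {
    /** Up to 8 processors: one thread each. Beyond 8: add 5/8 of the extra processors. */
    static int parallelGcThreads(int logicalProcessors) {
        if (logicalProcessors <= 8) {
            return logicalProcessors;
        }
        return 8 + (logicalProcessors - 8) * 5 / 8;
    }

    /** About a quarter of the parallel threads, but at least one. */
    static int concGcThreads(int parallelThreads) {
        return Math.max(1, (parallelThreads + 2) / 4);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = parallelGcThreads(cpus);
        System.out.println("logical processors = " + cpus);
        System.out.println("ParallelGCThreads ~ " + parallel);
        System.out.println("ConcGCThreads     ~ " + concGcThreads(parallel));
    }
}
```

Running main on the target machine gives a quick estimate to compare with the values the JVM actually chose.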

  • -XX:InitiatingHeapOccupancyPercent=45

This option decides when an old generation collection is started: after a young generation GC ends, G1 checks whether the space still occupied has reached the threshold, by default 45% of the entire Java heap.

  • -XX:G1MixedGCLiveThresholdPercent=85

During the concurrent marking phase, G1 identifies the old Regions worth reclaiming and marks them as candidate old Regions, so that they can enter the CSet and be reclaimed during Mixed GC.

G1MixedGCLiveThresholdPercent controls this: a Region becomes a candidate only when its proportion of live data does not exceed the threshold, which defaults to 85%.

During the concurrent marking phase, the proportion of live data in each Region is recalculated. Regions with a high proportion of live data are relatively expensive to reclaim, and they are flagged as expensive Regions.

If many such Regions enter the CSet during Mixed GC, the Mixed GC pause may become too long.

To tell these Regions apart, G1 flags them separately. During Mixed GC the candidate old Regions are reclaimed first; if the cost allows, G1 then tries to reclaim expensive Regions.
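The candidate filter can be sketched in a few lines. The live percentages are made-up inputs and MixedCandidateSketch is a hypothetical name; the sketch only illustrates the threshold rule, not G1's bookkeeping.

```java
import java.util.ArrayList;
import java.util.List;

/** Picks candidate old Regions by live-data percentage; illustrative only. */
final class MixedCandidateSketch {
    /**
     * Returns the indexes of Regions whose live percentage does not exceed the threshold,
     * ordered from least live data (cheapest to evacuate) to most.
     */
    static List<Integer> candidates(int[] livePercentByRegion, int liveThresholdPercent) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < livePercentByRegion.length; i++) {
            if (livePercentByRegion[i] <= liveThresholdPercent) {
                result.add(i);
            }
        }
        result.sort((x, y) -> Integer.compare(livePercentByRegion[x], livePercentByRegion[y]));
        return result;
    }
}
```

With live percentages {10, 90, 85, 50} and the default threshold of 85, Region 1 is left out as too expensive, and the rest are ordered 0, 3, 2.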

  • -XX:G1HeapWastePercent=5

Indicates the tolerable percentage of wasted heap space. If the recyclable percentage is less than the set percentage, the JVM will not start a mixed garbage collection cycle.

  • -XX:G1MixedGCCountTarget=8

Reclaiming an old generation Region generally takes slightly longer than reclaiming a young generation Region. This option sets the target number of mixed GCs to run after one concurrent marking cycle; the default is 8. A relatively large value lets G1 spread old generation Region reclamation over more collections.

If a single mixed GC pauses for a long time, it has too much to do, so you can raise this value to shorten each pause.

If the value is too large, however, the concurrent cycle has to wait correspondingly longer for the mixed GCs to finish.

  • -XX:G1OldCSetRegionThresholdPercent=10

Sets the upper limit on the old Regions that can enter the CSet (that is, be collected) in each mixed garbage collection. This is an experimental flag.

The default is 10% of the Java heap. If the value is too large, each Mixed GC has to collect more old Regions, which lengthens the pause.

This value caps the number of old Regions that can be reclaimed in each Mixed GC.
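How G1HeapWastePercent, G1MixedGCCountTarget, and G1OldCSetRegionThresholdPercent interact can be sketched as three small calculations. The rounding choices and the MixedGcPlanSketch name are assumptions for illustration; the JVM's exact arithmetic may differ.

```java
/** Back-of-the-envelope planning for mixed GCs; rounding rules are assumptions. */
final class MixedGcPlanSketch {
    /** A mixed cycle is worth starting only if enough of the heap is reclaimable. */
    static boolean worthStartingMixedGc(long reclaimableBytes, long heapBytes, int heapWastePercent) {
        return reclaimableBytes * 100 >= heapBytes * heapWastePercent;
    }

    /** Lower bound per mixed GC: spread the candidates over the target number of GCs. */
    static int minOldRegionsPerGc(int candidateRegions, int mixedGcCountTarget) {
        return (candidateRegions + mixedGcCountTarget - 1) / mixedGcCountTarget; // round up
    }

    /** Upper bound per mixed GC: a percentage of all Regions in the heap. */
    static int maxOldRegionsPerGc(int totalHeapRegions, int oldCSetThresholdPercent) {
        return (totalHeapRegions * oldCSetThresholdPercent + 99) / 100; // round up
    }
}
```

With 20 candidate Regions, a count target of 8, and 2048 Regions in the heap, each mixed GC would take at least 3 and at most 205 old Regions under these assumptions; the pause time target then decides where in that range it lands.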

  • -XX:G1ReservePercent=10

Sets the percentage of the heap kept free as a reserve for the to-space ("target space"), which lowers the risk that promoted objects fail to be allocated. The default is 10%. If you raise this value, increase the total heap size by the same amount.

  • -XX:+UnlockExperimentalVMOptions

To change the value of an experimental flag, it must first be unlocked. Set this parameter explicitly.

2. Low-latency garbage collector (understand)

1. Shenandoah collector

Shenandoah was originally a new collector project independently developed by RedHat. In 2014, RedHat contributed Shenandoah to OpenJDK and promoted it to become one of the official features of OpenJDK 12, but OracleJDK does not support it.

Shenandoah also uses a Region-based heap layout, also has Humongous Regions for large objects, and by default also reclaims the Regions with the highest reclamation value first. But it differs from G1 in at least three clear ways in how it manages the heap.

  • 1. The most important difference is support for concurrent compaction. G1's evacuation phase can run on multiple threads in parallel, but not concurrently with user threads. The core feature of Shenandoah is that its evacuation and cleanup run on multiple threads and also concurrently with user threads.

  • 2. Shenandoah (currently) does not use generational collection by default, so there are no dedicated young generation or old generation Regions.

  • 3. Shenandoah uses a different data structure from G1 to record cross-Region references: the "connection matrix".

insert image description here
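The matrix can be pictured as a two-dimensional table. The sketch below assumes the usual description: if an object in Region N references an object in Region M, the cell in row N, column M is set. ConnectionMatrixSketch and its methods are hypothetical names, not Shenandoah's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy cross-Region reference table; not Shenandoah's actual implementation. */
final class ConnectionMatrixSketch {
    private final boolean[][] matrix; // matrix[from][to] == true: Region "from" points into Region "to"

    ConnectionMatrixSketch(int regionCount) {
        matrix = new boolean[regionCount][regionCount];
    }

    /** Called when an object in fromRegion stores a reference to an object in toRegion. */
    void recordReference(int fromRegion, int toRegion) {
        if (fromRegion != toRegion) {
            matrix[fromRegion][toRegion] = true;
        }
    }

    /** Regions that must be examined when targetRegion is collected. */
    List<Integer> regionsPointingInto(int targetRegion) {
        List<Integer> result = new ArrayList<>();
        for (int from = 0; from < matrix.length; from++) {
            if (matrix[from][targetRegion]) {
                result.add(from);
            }
        }
        return result;
    }
}
```

Reading one column of the table answers "which Regions point into this one" without a per-Region remembered set.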

For the specific working process of Shenandoah, please refer to the Shenandoah garbage collector paper published by RedHat in 2016.

Paper address: https://www.researchgate.net/publication/306112816_Shenandoah_An_open-source_concurrent_compacting_garbage_collector_for_OpenJDK

Here is a comparison of the actual application performance published by RedHat in 2016:

insert image description here

2. ZGC collector

ZGC (the "Z" is not an abbreviation of any technical term; the collector is simply called the Z Garbage Collector) is an experimental low-latency garbage collector added in JDK 11 and developed by Oracle. In 2018, Oracle contributed ZGC to OpenJDK, and it was included in the release of OpenJDK 11.

The goals of ZGC and Shenandoah are highly similar, but the implementation technology is very different. Let’s briefly understand the technical characteristics of ZGC:

  • 1) Start with the memory layout. Like Shenandoah and G1, ZGC uses a Region-based heap layout. Unlike them, ZGC's Regions (called Page or ZPage in some official materials) are dynamic: they can be created and destroyed dynamically, and their capacity can vary. On x64 hardware, a ZGC Region comes in one of three capacities: large, medium, or small:

    • Small Region: the capacity is fixed at 2MB, and it holds small objects of less than 256KB.
    • Medium Region: the capacity is fixed at 32MB, and it holds objects of at least 256KB but less than 4MB.
    • Large Region: the capacity is not fixed and can change dynamically, but it must be an integer multiple of 2MB. It holds large objects of 4MB or more.

Each Large Region holds only one large object. This also means that, despite the name, a Large Region may actually be smaller than a Medium Region; its capacity can be as low as 4MB.
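The size classes above map directly onto a small lookup. ZPageSketch, Kind, kindFor, and capacityFor are hypothetical names; the thresholds are the ones listed in this section.

```java
/** Maps an object size to the ZGC Region kind and capacity described above. */
final class ZPageSketch {
    enum Kind { SMALL, MEDIUM, LARGE }

    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    static Kind kindFor(long objectBytes) {
        if (objectBytes < 256 * KB) {
            return Kind.SMALL;
        }
        if (objectBytes < 4 * MB) {
            return Kind.MEDIUM;
        }
        return Kind.LARGE;
    }

    /** Capacity of the Region that would hold an object of this size. */
    static long capacityFor(long objectBytes) {
        switch (kindFor(objectBytes)) {
            case SMALL:
                return 2 * MB;
            case MEDIUM:
                return 32 * MB;
            default:
                // Large Regions hold a single object, rounded up to a multiple of 2MB.
                return (objectBytes + 2 * MB - 1) / (2 * MB) * (2 * MB);
        }
    }
}
```

A 5MB object, for example, lands in a Large Region of 6MB, which is smaller than any Medium Region.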

  • 2) Like Shenandoah, ZGC's core feature is concurrent compaction, but it is implemented differently: ZGC relies on a key technique called colored pointers.

Finally, let's talk about performance:

In pause time tests, ZGC's strong suit, it opens up a gap of two orders of magnitude over Parallel Scavenge and G1.

Whether it is the average pause, the 95th, 99th, or 99.9th percentile pause, or the maximum pause, ZGC keeps it within ten milliseconds with ease. When it is charted next to the other two collectors, whose pauses run to hundreds or thousands of milliseconds, ZGC's bars are almost invisible:

insert image description here

Related parameters:

Activate ZGC: -XX:+UnlockExperimentalVMOptions -XX:+UseZGC

3. Epsilon (ε) collector

While increasingly complex and advanced garbage collectors such as G1, Shenandoah, and ZGC keep appearing, a new collector heading the opposite way also showed up in the feature list of JDK 11: Epsilon. Its "selling point" is that it cannot collect garbage; it is a collector that "does no work".

The Epsilon collector was proposed by RedHat in JEP 318, which describes it as a no-op collector. In fact, as long as the Java virtual machine is to work at all, a garbage collector cannot be truly "no-op". The reason is that the name "garbage collector" does not cover all of its responsibilities; a more fitting name would be "automatic memory management subsystem".

Besides collecting garbage, a garbage collector is also responsible for managing and laying out the heap, allocating objects, and cooperating with the interpreter, the compiler, and the monitoring subsystem. Heap management and object allocation at the very least are required for the Java virtual machine to run, so even a collector with minimal functionality must implement them.

Starting with JDK 10, to decouple the garbage collector from the interpreter, compiler, monitoring, and other subsystems of the Java virtual machine, RedHat proposed a unified garbage collector interface in JEP 304. Epsilon serves as a validation of, and a reference implementation for, this interface. It also meets the needs of performance tests and stress tests that must strip away the influence of the garbage collector.

Epsilon, which cannot collect garbage, is still useful in real production environments. For a long time the Java technology stack has focused on long-running, large-scale enterprise and server-side applications, while the trend toward short-lived, small-scale services has become more and more obvious. Compared with up-and-coming languages such as Golang, Java does have some inherent shortcomings here, and its share is gradually declining. Traditional Java uses a lot of memory, starts slowly in containers, and takes time for just-in-time compilation to warm up. None of this matters much for large applications, but it fits poorly with short-lived, small-scale services. To cope with the new trend, recent JDK versions have gradually added support for ahead-of-time compilation and application class data sharing.

Epsilon has a similar goal. If an application only needs to run for a few minutes or even a few seconds, then as long as the Java virtual machine allocates memory correctly and the application exits before the heap is exhausted, Epsilon, with its very low runtime overhead and no collection behavior at all, is clearly a very suitable choice.

Related parameters:

-XX:+UnlockExperimentalVMOptions
-XX:+UseEpsilonGC

3. Choose the right garbage collector

The HotSpot virtual machine provides a wide variety of garbage collectors. Too many choices make it difficult to decide. It is obviously impossible to choose the most advanced ones to meet all application scenarios. Let's discuss how to choose a suitable garbage collector.

1. The answer to this question is mainly influenced by the following three factors:

1) What is the main focus of the application?

If it is a data analysis or scientific computing task whose goal is to produce results as soon as possible, throughput is the main focus.

If it is a client/server application, pause times directly affect service quality and can even cause transactions to time out, so latency is the main focus.

If it is a client application or an embedded application, the memory footprint of garbage collection cannot be ignored.

2) What infrastructure does the application run on?

For example: the hardware specification, and whether the system architecture is x86-32/64, SPARC, or ARM/AArch64;

the number of processors and the amount of memory allocated;

whether the operating system is Linux, Solaris, or Windows.

3) What is the distribution of the JDK used? What is the version number?

Is it ZingJDK/Azul, OracleJDK, OpenJDK, OpenJ9, or another vendor's distribution?

Which version of the Java Virtual Machine Specification does the JDK correspond to?

Generally speaking, the choice of the collector is considered from the above points.

2. Give an example

Suppose you have to choose a garbage collector for a B/S system that serves users directly. Latency is generally the main concern for this kind of application, so:

If you have a sufficient budget but little tuning experience, a proprietary hardware or software solution with commercial technical support is a good choice. The Vega system that Azul used to promote and the Zing VM that it mainly promotes now fall into this category, and they let you use the legendary C4 collector.

If you lack the budget for a commercial solution but can control the hardware and software, can use recent versions, and care a great deal about latency, ZGC is worth trying.

If you are worried about the stability of a collector that is still experimental, or the application must run on Windows, ZGC is out of the question; try Shenandoah.

If you are taking over a legacy system whose software, hardware, and JDK version are relatively dated, decide by heap size. For heaps of about 4GB up to 6GB, CMS generally copes well; for larger heaps, look at G1 first.

Of course, the analysis above is theory only. In practice you cannot settle this on paper: testing against the actual behavior of your system is the ultimate basis for choosing a collector.
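One simple way to start such testing is to read the collection counters the JVM already exposes through the standard java.lang.management API. The sketch below only reads those counters; the GcStatsSketch class and the output format are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Prints the collectors in use and their accumulated counts and times. */
final class GcStatsSketch {
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount/getCollectionTime return -1 if the value is undefined.
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Calling describeCollectors() at the end of a representative workload, once per collector under test, gives a first rough comparison before turning to GC logs for pause details.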

The above information comes from the book "In-depth Understanding of the Java Virtual Machine: JVM Advanced Features and Best Practices (3rd Edition)" by Zhou Zhiming; these are my study notes on it.

– If you are hungry for knowledge, be humble if you are foolish.

Origin blog.csdn.net/qq_42402854/article/details/130109393