Java Virtual Machine in Plain Language (5): The G1 Garbage Collector

This article is a set of notes from the course "Java Virtual Machine in Plain Language"; it will be removed on request if it infringes any rights. Course: Java Virtual Machine in Plain Language

1 Problems with the CMS garbage collector

The following scenario is extreme, but it often happens.

During a Minor GC, objects that no longer fit in the Survivor space can only be promoted to the old generation. If the old generation is too fragmented to take them, a concurrent mode failure occurs, and the JVM has to fall back to the Serial Old garbage collector. This is the promotion failed problem, which is even more serious than a concurrent mode failure.

A simple Major GC can thus turn into the most time-consuming kind of collection, a Full GC. Worst of all, the pause time is unpredictable.

Is there a way to fix the pause time first and then work backwards to decide how much to collect? There is: the G1 garbage collector. It does not insist on cleaning up all the garbage every time; it only tries to do what it judges to be right.

We tell G1 that pauses within any one-second window should not exceed 10 ms, which sets a goal for it. G1 does its best to meet this goal: it estimates roughly which areas to collect this time and completes the collection incrementally.

This is also a parameter you effectively have to set when using the G1 garbage collector: -XX:MaxGCPauseMillis=10
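
To check which pause target a running JVM was actually started with, the following minimal sketch reads the JVM arguments through the standard java.lang.management API. The class name PauseTargetCheck and the helper pauseTargetMillis are hypothetical names chosen for illustration; the sketch only looks at flags passed on the command line and does not report defaults chosen by the JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.OptionalLong;

final class PauseTargetCheck {
    private static final String FLAG = "-XX:MaxGCPauseMillis=";

    // Returns the pause target found in the given JVM arguments, if any.
    // If the flag is repeated, the last occurrence is kept.
    static OptionalLong pauseTargetMillis(List<String> jvmArgs) {
        OptionalLong result = OptionalLong.empty();
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                result = OptionalLong.of(Long.parseLong(arg.substring(FLAG.length())));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with (not including the main class arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Pause target: " + pauseTargetMillis(jvmArgs));
        // Names of the collectors in use, as reported by the JVM itself.
        ManagementFactory.getGarbageCollectorMXBeans()
                .forEach(gc -> System.out.println("Collector: " + gc.getName()));
    }
}
```

For example, it could be started with `java -XX:+UseG1GC -XX:MaxGCPauseMillis=10 PauseTargetCheck.java` on a JDK that supports launching single source files.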

2 What is the G1 garbage collector?

The G1 garbage collector was designed to replace the CMS garbage collector.

G1 divides the heap a little differently from other garbage collectors. The other collectors collect an entire generation at a time, so the collection time is naturally hard to control. G1 cuts the heap into many pieces and treats each piece as a small target; a partial target is easy to reach.

So does G1 still distinguish between the young generation and the old generation?

[Figure: G1 heap layout]
As the figure shows, G1 still has the concepts of Eden and Survivor spaces, but they are not contiguous in memory; each is made up of many small pieces.

Each small piece has a fixed size and is called a Region (a small heap region). A Region may be an Eden region, a Survivor region, or an Old region. So in G1 the young generation and the old generation are purely logical concepts.

Every Region has the same size, which is a power of 2 between 1 MB and 32 MB.
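
As a rough illustration of that constraint, the sketch below picks a Region size for a given heap size. It assumes the collector aims for about 2048 Regions; that target, the constant names, and the class RegionSizeSketch are assumptions made for this example rather than something stated in the text, and the real JVM ergonomics may differ between versions.

```java
final class RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGION_COUNT = 2048; // assumption for illustration

    // Heap size divided by the target count, rounded down to a power of 2,
    // then clamped to the 1 MB .. 32 MB range described in the text.
    static long regionSizeFor(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGION_COUNT, 1);
        long pow2 = Long.highestOneBit(raw);
        return Math.min(MAX_REGION, Math.max(MIN_REGION, pow2));
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * MB; // an 8 GB heap
        System.out.println(regionSizeFor(heap) / MB + " MB per Region");
    }
}
```

The Region size can also be set explicitly with -XX:G1HeapRegionSize, in which case no estimate like this is needed.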

But what if an object is too large to fit in a single Region? Note the large yellow area in the figure: it is called a Humongous Region, and objects larger than 50% of the Region size are allocated there.
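
The 50% rule can be written down in a few lines. This is a minimal sketch with hypothetical method names; the idea that a very large object occupies several contiguous Regions is an assumption added here for illustration and is not spelled out in the text.

```java
final class HumongousSketch {
    // An object counts as humongous when it is larger than half a Region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Number of Regions needed to hold the object (ceiling division).
    // Assumption: a humongous object spans whole, contiguous Regions.
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = 1024L * 1024L;      // 1 MB Region
        long bigArray = 2_621_440L;       // a 2.5 MB array
        System.out.println("humongous: " + isHumongous(bigArray, region));
        System.out.println("regions needed: " + regionsNeeded(bigArray, region));
    }
}
```

One practical consequence is that a larger Region size raises the humongous threshold, which can matter for applications that allocate many medium-sized arrays.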

So which Regions are actually collected when a collection runs? Are they picked at random? Certainly not. In fact, the Regions that contain the most garbage are collected first. This is where the name G1 (Garbage-First GC) comes from.
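
The "most garbage first" idea, combined with a pause budget, can be sketched as a greedy selection. The Region class, its fields, and the per-Region cost estimate below are hypothetical; the real collector uses a much richer prediction model, so this is only a simplified illustration of the principle.

```java
import java.util.ArrayList;
import java.util.List;

final class GarbageFirstSketch {
    // Hypothetical per-Region statistics for illustration only.
    static final class Region {
        final int id;
        final long garbageBytes;
        final double predictedCostMs;

        Region(int id, long garbageBytes, double predictedCostMs) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.predictedCostMs = predictedCostMs;
        }
    }

    // Visit Regions from most garbage to least and keep every Region
    // that still fits into the remaining pause budget.
    static List<Region> chooseCollectionSet(List<Region> candidates, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Long.compare(b.garbageBytes, a.garbageBytes));
        List<Region> chosen = new ArrayList<>();
        double spentMs = 0;
        for (Region r : sorted) {
            if (spentMs + r.predictedCostMs <= pauseBudgetMs) {
                chosen.add(r);
                spentMs += r.predictedCostMs;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> candidates = new ArrayList<>();
        candidates.add(new Region(1, 10, 4.0));
        candidates.add(new Region(2, 30, 5.0));
        candidates.add(new Region(3, 20, 4.0));
        for (Region r : chooseCollectionSet(candidates, 10.0)) {
            System.out.println("collect region " + r.id);
        }
    }
}
```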

3 The G1 garbage collection process

Logically, G1 has a young generation and an old generation, but the ratio between them is not so "fixed": to meet the specified MaxGCPauseMillis, G1 adjusts the ratio automatically.

G1 collection falls mainly into three categories:

  1. Young generation collection, also called Minor GC. The process is similar to the one described earlier, and it is triggered when the Eden space is full
  2. Old generation collection. Strictly speaking this is not a collection but a "concurrent marking" process that cleans up a few objects along the way
  3. The real cleanup happens in "mixed mode", which cleans up not only the young generation but also some of the old generation Regions

RSet

An RSet is a data structure that trades space for time.

Earlier we mentioned a data structure called the card table (CardTable), which solves the problem of cross-generational references. The RSet serves a similar purpose. Its full name is Remembered Set, and it records and maintains the object references between Regions.

But the RSet differs from the card table in some ways. The card table is a points-out structure (which objects do I reference). The RSet records references from objects in other Regions to objects in this Region, so it is a points-into structure (who references my objects), somewhat like an inverted index.

You can think of an RSet as a hash table: the key is the address of a referencing Region, and the value is the set of card pages in that Region whose objects reference this one.
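
That hash-table view can be sketched directly. The class below models the RSet of one Region with a map from the referencing Region's index to a set of card indices; the use of Region indices instead of addresses, the 512-byte card size, and all names are assumptions made for this illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RSetSketch {
    static final int CARD_SIZE = 512; // bytes per card; assumption for illustration

    // key: index of a Region that references this one
    // value: cards inside that Region which hold references into this one
    private final Map<Integer, Set<Integer>> cardsByRegion = new HashMap<>();

    // Record that some object at the given offset in fromRegion points into this Region.
    void recordReference(int fromRegion, long offsetInFromRegion) {
        int card = (int) (offsetInFromRegion / CARD_SIZE);
        cardsByRegion.computeIfAbsent(fromRegion, k -> new TreeSet<>()).add(card);
    }

    // Cards that must be scanned in fromRegion when this Region is collected.
    Set<Integer> cardsToScan(int fromRegion) {
        return cardsByRegion.getOrDefault(fromRegion, Collections.emptySet());
    }

    public static void main(String[] args) {
        RSetSketch rset = new RSetSketch();
        rset.recordReference(7, 0);
        rset.recordReference(7, 511);   // same card as offset 0
        rset.recordReference(7, 1024);
        System.out.println("cards to scan in region 7: " + rset.cardsToScan(7));
    }
}
```

Storing cards rather than individual objects keeps the structure coarse: the collector scans a small, fixed-size slice of memory per entry instead of tracking every reference exactly.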

With this data structure, collecting a Region does not require scanning the whole heap for references into it. This is what makes partial collection possible.

For a young generation Region, the RSet holds only references from the old generation. A young generation collection covers all young generation Regions anyway, so recording references between them would be superfluous. As a result, the RSet of a young generation Region may well be empty.

For an old generation Region, the RSet likewise holds only references from the old generation. This is because the young generation is always collected before the old generation is. By that time Eden is empty and the Survivor regions are scanned during the collection, so there is no need to record references coming from the young generation.

RSets usually take up a lot of space, about 5% or more. Besides the space, the computational overhead is also considerable.

The collection process in detail

G1 also has the concept of a CSet. Its full name is Collection Set: the set of Regions that a GC is going to collect. During the GC, all live data in the CSet is moved (evacuated).

Young generation collection

Young generation collection is an STW (stop-the-world) process. It uses the RSet to track cross-generational references and collects all young generation Regions in one go.

When the JVM starts, G1 prepares the Eden regions first. As the program runs it keeps creating objects in Eden, and once all Eden regions are full, G1 starts a young generation collection.

Young generation collection consists of the following phases:

  1. Scan roots: the roots can be seen as the GC Roots described earlier, plus the external references recorded in the RSets
  2. Update RS: process the card pages in the dirty card queue and update the RSets. After this phase, the RSets accurately reflect the references from the old generation into the memory segments being collected. It can be seen as a supplement to the first step
  3. Process RS: find the objects in Eden that are referenced by old generation objects; Eden objects referenced in this way are considered live
  4. Copy objects: yes, the collection algorithm is still a copying algorithm. In this phase the object tree is traversed, and live objects in the Eden Regions are copied to empty Survivor Regions. As with other collectors, this step includes object aging and promotion
  5. Process references: handle Soft, Weak, Phantom, Final, JNI Weak references, and so on. This ends the collection
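
The copy step with aging and promotion can be sketched as follows. This is a toy model under stated assumptions: reachability is given as a precomputed flag standing in for the result of steps 1 to 3, the tenuringThreshold parameter and all class names are hypothetical, and no real memory is moved.

```java
import java.util.ArrayList;
import java.util.List;

final class YoungCopySketch {
    static final class Obj {
        final String name;
        final boolean reachable; // stands in for the outcome of steps 1 to 3
        int age;

        Obj(String name, int age, boolean reachable) {
            this.name = name;
            this.age = age;
            this.reachable = reachable;
        }
    }

    static final class Result {
        final List<Obj> survivor = new ArrayList<>();
        final List<Obj> promoted = new ArrayList<>();
    }

    // Live objects get one collection older; old enough ones are promoted,
    // the rest are copied to a Survivor Region. Garbage is simply left behind.
    static Result evacuate(List<Obj> youngObjects, int tenuringThreshold) {
        Result result = new Result();
        for (Obj o : youngObjects) {
            if (!o.reachable) {
                continue;
            }
            o.age++;
            if (o.age >= tenuringThreshold) {
                result.promoted.add(o);
            } else {
                result.survivor.add(o);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Obj> young = new ArrayList<>();
        young.add(new Obj("a", 0, true));
        young.add(new Obj("b", 2, true));
        young.add(new Obj("c", 0, false));
        Result r = evacuate(young, 3);
        System.out.println("survivor=" + r.survivor.size() + ", promoted=" + r.promoted.size());
    }
}
```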

Concurrent marking

When the occupancy of the whole heap reaches a certain percentage (45% by default), the concurrent marking phase starts. This percentage can be adjusted with the -XX:InitiatingHeapOccupancyPercent parameter.
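
The trigger condition itself is a simple comparison. The sketch below expresses it with integer arithmetic; the method name is hypothetical, and the check is a simplification of whatever bookkeeping the JVM really does (newer JVM versions may, for instance, adapt the threshold at run time).

```java
final class IhopSketch {
    // True when used heap has reached the given percentage of capacity.
    static boolean shouldStartConcurrentMark(long usedBytes, long capacityBytes, int ihopPercent) {
        return usedBytes * 100 >= capacityBytes * ihopPercent;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long capacity = rt.maxMemory();
        // 45 is the default value mentioned in the text.
        System.out.println("would start marking: " + shouldStartConcurrentMark(used, capacity, 45));
    }
}
```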

Concurrent marking exists to provide marking information for Mixed GC; it is not a mandatory part of every GC. The marking procedure is as follows:

  1. Initial mark: this phase shares the pause of a Minor GC, because the root scan can be reused. It is STW, but usually very short
  2. Root region scan
  3. Concurrent mark: this phase marks the objects in the heap, starting from the GC Roots. The marking threads run concurrently with the application threads and collect liveness information for each Region
  4. Remark: similar to CMS, and also STW. It marks the objects that changed during the concurrent mark phase
  5. Cleanup: this phase is mostly concurrent (strictly speaking, it includes a brief STW step for accounting). If a Region turns out to contain nothing but garbage, it is reclaimed immediately at this stage. Regions that are only partly garbage are not processed right away; they are collected in the Mixed GC phase

SATB

What if objects are created or changed during the concurrent mark phase? Correctness in that case is guaranteed by the SATB algorithm.

SATB stands for Snapshot At The Beginning; it guarantees the correctness of the concurrent mark phase.
[Figure: SATB snapshot of a Region, with the TAMS and top pointers]
This is a snapshot. Several pointers logically divide the Region into segments. As the figure shows, objects allocated during concurrent marking are placed between next TAMS and top.
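
The two halves of the idea can be sketched in a few lines: objects allocated between next TAMS and top are treated as live without being marked, and references that are overwritten during marking are remembered so that the snapshot stays reachable. The pre-write barrier shown here is the commonly described mechanism for SATB and is an assumption added for illustration, as are all class, field, and method names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SatbSketch {
    static final class Node {
        final String name;
        Node field;

        Node(String name) {
            this.name = name;
        }
    }

    private final Deque<Node> satbQueue = new ArrayDeque<>();
    private boolean markingActive;

    // Objects allocated after marking began sit in [nextTams, top)
    // and are considered live for this marking cycle.
    static boolean allocatedDuringMarking(long address, long nextTams, long top) {
        return address >= nextTams && address < top;
    }

    void startMarking() {
        markingActive = true;
    }

    // Pre-write barrier: before a reference is overwritten during marking,
    // remember the old value so the marker can still visit it later.
    void writeField(Node holder, Node newValue) {
        if (markingActive && holder.field != null) {
            satbQueue.add(holder.field);
        }
        holder.field = newValue;
    }

    int pendingOldValues() {
        return satbQueue.size();
    }

    public static void main(String[] args) {
        SatbSketch heap = new SatbSketch();
        Node holder = new Node("holder");
        holder.field = new Node("old");
        heap.startMarking();
        heap.writeField(holder, null); // "old" is logged before the reference disappears
        System.out.println("logged old values: " + heap.pendingOldValues());
        System.out.println("implicitly live: " + allocatedDuringMarking(150, 100, 200));
    }
}
```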

Mixed collection (Mixed GC)

Being able to reclaim entire old generation Regions during the concurrent phase is the best case. A mixed collection not only cleans up the young generation but also adds some old generation Regions to the CSet.

After the concurrent marking phase, we know the proportion of garbage in the old generation. After a Minor GC, if this proportion is judged to have reached a certain threshold, the next collection is a Mixed GC. The threshold is set with the -XX:G1HeapWastePercent parameter (5% of the heap size by default). If the reclaimable garbage is below this threshold, a collection would spend a lot of time yet reclaim very little memory, so this parameter can also be used to adjust how often Mixed GC runs.

There is also the G1MixedGCCountTarget parameter, which controls the maximum number of Mixed GCs performed after one concurrent marking cycle.
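
How the two parameters interact can be sketched as two small calculations: whether a Mixed GC is worth running at all, and roughly how many old generation Regions each Mixed GC would need to take so that the work is spread over the target number of collections. The method names are hypothetical, and the value 8 used for G1MixedGCCountTarget in the example is an assumption about its default.

```java
final class MixedGcSketch {
    // A Mixed GC is worthwhile only when the reclaimable garbage
    // exceeds the configured waste percentage of the heap.
    static boolean worthMixedGc(long reclaimableBytes, long heapBytes, int heapWastePercent) {
        return reclaimableBytes * 100 > heapBytes * heapWastePercent;
    }

    // Old Regions per Mixed GC so that all candidates are handled
    // within mixedGcCountTarget collections (ceiling division).
    static int regionsPerMixedGc(int candidateOldRegions, int mixedGcCountTarget) {
        return (candidateOldRegions + mixedGcCountTarget - 1) / mixedGcCountTarget;
    }

    public static void main(String[] args) {
        long heap = 1000;
        System.out.println("6% reclaimable: " + worthMixedGc(60, heap, 5));
        System.out.println("regions per mixed GC: " + regionsPerMixedGc(20, 8));
    }
}
```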

4 ZGC

Possible problems with G1

After our systems were switched to the G1 garbage collector, serious GC problems in production became very rare, thanks to G1's prediction model and its innovative region-based design. But the prediction model does fail at times, and G1 does not always behave as we expect, especially when you set a demanding target for it.

In addition, if the application is very tight on memory and partial collection never frees enough, so that the whole heap has to be collected every time, then G1 does no less work than the other collectors, and because of the complexity of its own algorithm it may even perform worse than they do.

ZGC features

  1. Pause time does not exceed 10 ms
  2. Pause time does not grow with the size of the heap (however large the heap, pauses stay at 10 ms or less)
  3. It supports heaps from a few hundred MB up to several TB (4 TB maximum)

ZGC even removes the logical young and old generations; the heap is divided only into pages. Every GC compacts pages, so there is no fragmentation problem. ZGC is also NUMA-aware, which improves memory access speed. Unlike traditional collection algorithms, ZGC stores information directly in the reference pointer of an object to identify the object's state, which is why it can only be used on 64-bit machines.
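
The idea of keeping state in the pointer itself can be illustrated with plain bit operations on a 64-bit value. In the sketch below the low 42 bits hold the object offset (2^42 bytes is 4 TB, matching the limit above) and the next four bits act as state flags. The flag names and exact bit positions are modeled on descriptions of early ZGC versions and should be read as assumptions for illustration, not as the definitive layout.

```java
final class ColoredPointerSketch {
    static final int OFFSET_BITS = 42;                        // 2^42 bytes = 4 TB
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;

    // State flags stored above the offset bits (names are assumptions).
    static final long MARKED0 = 1L << OFFSET_BITS;
    static final long MARKED1 = 1L << (OFFSET_BITS + 1);
    static final long REMAPPED = 1L << (OFFSET_BITS + 2);
    static final long FINALIZABLE = 1L << (OFFSET_BITS + 3);

    // Combine an object offset with one state flag.
    static long color(long offset, long stateBit) {
        return (offset & OFFSET_MASK) | stateBit;
    }

    // Strip the state flags to recover the plain offset.
    static long offsetOf(long pointer) {
        return pointer & OFFSET_MASK;
    }

    static boolean has(long pointer, long stateBit) {
        return (pointer & stateBit) != 0;
    }

    public static void main(String[] args) {
        long p = color(0x1234L, REMAPPED);
        System.out.println("offset = 0x" + Long.toHexString(offsetOf(p)));
        System.out.println("remapped = " + has(p, REMAPPED));
        System.out.println("marked0 = " + has(p, MARKED0));
    }
}
```

Because the state travels with every reference, the collector can tell from the pointer alone whether an object has already been marked or relocated, which is also why such a scheme needs the spare bits of a 64-bit address.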

ZGC is still rarely used in production at present. Even where it is used, it is only available on the Linux platform. It will take some time before it becomes widespread.

Origin blog.csdn.net/Geffin/article/details/104731478