Talking about JVM from the principle (3): Detailed explanation of modern garbage collectors Shenandoah and ZGC | JD Cloud technical team

Author: JD Technology Kang Zhixing

Shenandoah

The word Shenandoah comes from a Native American language. In the 1840s, a well-known sea shanty circulated widely among sailors. It told the story of a young, wealthy trader who fell in love with the daughter of the chief Shenandoah. A river in western Virginia in the United States was later named after it, which is why the Chinese translation of Shenandoah is "Lover's Ferry".

Shenandoah first appeared in OpenJDK 12 and was developed by Red Hat, mainly to solve the long pauses that earlier garbage collectors suffer when processing large heaps.

G1 targets pauses on the order of 100 milliseconds; Shenandoah's design goal is to keep pauses under 10ms regardless of heap size. The design is aggressive, and many of its choices favor low pause times over high throughput.

"Successor to G1"

Shenandoah is a garbage collector in OpenJDK. Compared with ZGC in Oracle JDK, Shenandoah is more like a successor to G1: it resembles G1 in many respects and even shares part of its code.

Overall, there are three main differences between Shenandoah and G1:

1. G1's evacuation (copying) phase requires a stop-the-world (STW) pause, and this pause accounts for more than 80% of G1's overall pause time. Shenandoah performs evacuation concurrently.

2. Shenandoah no longer distinguishes between the young generation and the old generation.

3. Shenandoah uses a connection matrix instead of the card table in G1.

For a detailed introduction to G1, please read the previous article: Talking about JVM from the principle (2): From serial collector to partition collection creator G1

Connection Matrix

Each Region in G1 has to maintain a card table, which consumes computing resources and occupies a very large memory space. Shenandoah uses a connection matrix to optimize this problem.

The connection matrix can be simply understood as a two-dimensional table. If there is an object in Region A pointing to an object in Region B, then mark row A and column B of the table.

For example, Region 1 points to Region 3, Region 4 points to Region 2, and Region 3 points to Region 5:

[Figure: connection matrix]

Compared with G1's remembered set, the connection matrix is coarser-grained: it records references at the level of whole Regions, so more memory has to be scanned. Because collection now runs concurrently with the application, Shenandoah accepts this throughput cost in exchange for the matrix's lower maintenance overhead.
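
The two-dimensional table described above can be sketched in a few lines of Java. This is only an illustration of the idea, not Shenandoah's actual data structure; the class name `ConnectionMatrixSketch` and its methods are made up for this example, which records the three references from the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not Shenandoah's real implementation.
class ConnectionMatrixSketch {
    // connected[from][to] is true if some object in Region `from`
    // references an object in Region `to`.
    private final boolean[][] connected;

    ConnectionMatrixSketch(int regionCount) {
        connected = new boolean[regionCount][regionCount];
    }

    // Mark row `from`, column `to`.
    void recordReference(int from, int to) {
        connected[from][to] = true;
    }

    // Regions that must be scanned to find references into `target`.
    List<Integer> regionsPointingInto(int target) {
        List<Integer> result = new ArrayList<>();
        for (int from = 0; from < connected.length; from++) {
            if (connected[from][target]) {
                result.add(from);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ConnectionMatrixSketch matrix = new ConnectionMatrixSketch(6); // Regions 0..5
        matrix.recordReference(1, 3);
        matrix.recordReference(4, 2);
        matrix.recordReference(3, 5);
        System.out.println(matrix.regionsPointingInto(3)); // prints [1]
    }
}
```

Note the coarse granularity: the matrix only says that Region 1 must be scanned when Region 3 is collected, not which object or card inside Region 1 holds the reference.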

Forwarding pointer

Performance benefits of forwarding pointers

To collect concurrently, the collector must gradually copy live objects into empty Regions while user threads are running. During this process, two copies of an object, old and new, exist in the heap at the same time. So how does a user thread get to the new copy?

Previously, a memory protection trap was usually set on the old object's memory. Accessing the old object raised a trap, which transferred control to a preset exception handler, and the code in that handler forwarded the access to the new copy.

A trap interrupts the currently executing thread so that the handler can take over the CPU. This usually involves the operating system, which means a switch from user mode to kernel mode, and that is very expensive.

So Rodney A. Brooks proposed using a forwarding pointer to reach the new object through the old one: add a new reference field in front of the object header. The field points to the object itself while the object has not been moved, and points to the new copy once one has been created. Every access to an object must then read the forwarding pointer first to see where it points. This also costs one extra memory access per object access, but that cost is far lower than a memory protection trap.

[Figure: Brooks forwarding pointers]
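
As a rough model of the indirection, the sketch below gives every object a `forwardee` field that initially refers to the object itself. The classes `BrooksObject` and `BrooksPointerDemo` are hypothetical and only mimic the idea in plain Java; in the JVM the extra word sits in front of the object header and the indirection is emitted by the JIT compiler, not written by hand.

```java
// Hypothetical model of an object with a Brooks-style forwarding pointer.
class BrooksObject {
    // Points to this object until a copy exists, then points to the copy.
    BrooksObject forwardee = this;
    int payload;

    BrooksObject(int payload) {
        this.payload = payload;
    }

    // Every access goes through the forwarding pointer first.
    static BrooksObject resolve(BrooksObject ref) {
        return ref.forwardee;
    }
}

class BrooksPointerDemo {
    public static void main(String[] args) {
        BrooksObject old = new BrooksObject(42);
        // Not moved yet: the forwarding pointer leads back to the object itself.
        System.out.println(BrooksObject.resolve(old) == old); // prints true

        // The collector copies the object and redirects the old copy.
        BrooksObject copy = new BrooksObject(old.payload);
        old.forwardee = copy;

        // A write through the stale reference now lands in the new copy.
        BrooksObject.resolve(old).payload = 43;
        System.out.println(copy.payload); // prints 43
    }
}
```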

The problem with forwarding pointers

There are two main problems with forwarding pointers: thread safety when the pointer is modified, and the performance cost of high-frequency access.

1. The forwarding pointer lives in the object itself, so updating the pointer and updating the object's fields can race. For example, the collector may finish copying an object, and a user thread may then write to the old copy before the forwarding pointer has been switched to the new one; the two copies are now inconsistent. Shenandoah uses CAS (compare-and-swap) operations to make the pointer update safe.

2. Forwarding has to cover every kind of object access, including reads, writes, and locking, so both read barriers and write barriers are required. Reads in particular are far more frequent than writes, so the overhead of barriers on such a hot path is significant. Shenandoah addressed this in JDK 13 by switching its barrier model to a reference access barrier (Load Reference Barrier): only reads and writes of reference-typed fields are intercepted, and accesses to primitive-typed fields are left alone, which removes the barrier from a large share of object accesses.
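
The CAS step in point 1 can be illustrated with `java.util.concurrent.atomic.AtomicReference`. This is a minimal sketch under the assumption that any thread needing the new copy (collector or user thread) copies speculatively and then tries to install its copy; the class `CasForwardingSketch` and its `evacuate` method are invented for the example and are not Shenandoah code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: installing a forwarding pointer with compare-and-set.
class CasForwardingSketch {
    static final class Obj {
        final AtomicReference<Obj> forwardee = new AtomicReference<>();
        volatile int value;

        Obj(int value) {
            this.value = value;
            forwardee.set(this); // an unmoved object forwards to itself
        }
    }

    // Returns the single agreed-upon new copy of `from`.
    static Obj evacuate(Obj from) {
        Obj current = from.forwardee.get();
        if (current != from) {
            return current; // someone already installed a copy
        }
        Obj copy = new Obj(from.value); // speculative copy
        // Exactly one thread wins the CAS; its copy becomes the canonical one.
        if (from.forwardee.compareAndSet(from, copy)) {
            return copy;
        }
        return from.forwardee.get(); // lost the race: discard our copy
    }

    public static void main(String[] args) {
        Obj old = new Obj(7);
        Obj first = evacuate(old);
        Obj second = evacuate(old);
        System.out.println(first == second); // prints true
        System.out.println(first.value);     // prints 7
    }
}
```

Because the loser of the race throws its copy away and adopts the winner's, all threads end up writing to the same new object.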

Shenandoah's running steps

[Figure: Shenandoah steps]

1. Initial mark (Init Mark) [STW] [same as G1]

Mark the objects directly reachable from GC Roots.

2. Concurrent Marking [same as G1]

Traverse the object graph and mark all reachable objects.

3. Final Mark (Final Mark) [STW] [same as G1]

Process the remaining SATB (snapshot-at-the-beginning) buffers, determine which Regions offer the highest collection value, and group those Regions into the collection set.

4. Concurrent Cleanup

Reclaim all Regions that contain no live objects (such a Region is called an Immediate Garbage Region).

5. Concurrent Evacuation

Copy the live objects in the collection set into unused Regions. Because the copying runs concurrently, two copies of the same object can exist in the heap at the same time, which raises read/write consistency problems. Shenandoah solves this with forwarding pointers, which redirect accesses from the old copy to the new one. This is also the biggest difference between Shenandoah and other collectors.

6. Init Update References [STW]

After concurrent evacuation, all references to old objects must be corrected to point to the new copies. This phase does no real work itself; it only sets up a synchronization point to ensure that all the concurrent evacuation work above has finished.

7. Concurrent Update References

Walk the heap linearly in physical address order and update the references to objects that were copied during concurrent evacuation.

8. Final Update References [STW]

After the references in the heap have been updated, correct the references held in GC Roots.

9. Concurrent Cleanup

At this point every Region in the collection set has become an Immediate Garbage Region. Run concurrent cleanup again to reclaim them all.
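
For a quick overview, the nine steps can be written down as an enum that records which of them pause the application. The enum `ShenandoahPhaseSketch` is purely a summary device made up for this article; it does not correspond to any type in HotSpot.

```java
import java.util.Arrays;

// Hypothetical summary of the nine phases listed above; not a HotSpot type.
enum ShenandoahPhaseSketch {
    INIT_MARK(true),
    CONCURRENT_MARK(false),
    FINAL_MARK(true),
    CONCURRENT_CLEANUP(false),
    CONCURRENT_EVACUATION(false),
    INIT_UPDATE_REFS(true),
    CONCURRENT_UPDATE_REFS(false),
    FINAL_UPDATE_REFS(true),
    FINAL_CONCURRENT_CLEANUP(false);

    final boolean stopTheWorld;

    ShenandoahPhaseSketch(boolean stopTheWorld) {
        this.stopTheWorld = stopTheWorld;
    }

    // Number of phases that stop user threads.
    static long pauseCount() {
        return Arrays.stream(values()).filter(p -> p.stopTheWorld).count();
    }

    public static void main(String[] args) {
        for (ShenandoahPhaseSketch p : values()) {
            String kind = p.stopTheWorld ? " [STW]" : " [concurrent]";
            System.out.println((p.ordinal() + 1) + ". " + p + kind);
        }
        System.out.println("STW phases: " + pauseCount());
    }
}
```

Only four of the nine phases are pauses, and each of them does a small, bounded amount of work; the phases whose cost grows with the heap all run concurrently.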

ZGC

ZGC was developed by Oracle, introduced as an experimental feature in JDK 11, and declared production-ready in JDK 15. Its design set three major goals:

1. Support TB level memory

2. The pause is controlled within 10ms and does not increase with the increase of the heap size

3. The impact on program throughput is less than 15%

As the JDK has evolved, ZGC in JDK 16 and later keeps pauses under 1 millisecond and supports heap sizes from 8MB to 16TB.

Memory layout of ZGC

Like G1, ZGC uses a Region-based heap layout. The difference is that ZGC's Regions (officially called Pages; the concept is the same as G1's Region) can be created and destroyed dynamically, and their capacity can vary as well.

ZGC's Regions come in three sizes:

1. A small Region has a fixed capacity of 2MB and stores objects smaller than 256KB.

2. A medium Region has a fixed capacity of 32MB and stores objects of at least 256KB but less than 4MB.

3. A large Region has a capacity that is an integer multiple of 2MB and stores objects of 4MB or more. Each large Region holds only one large object. Because large objects are expensive to copy, they are not relocated.

[Figure: ZGC regions]
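
The size rules above translate directly into a small classification function. The following is a sketch using only the thresholds quoted in this section; the class `ZPageSizeSketch` and its method names are hypothetical, and real ZGC may choose its thresholds differently depending on version and heap size.

```java
// Hypothetical sketch of the Region (Page) size rules listed above.
class ZPageSizeSketch {
    static final long KB = 1024;
    static final long MB = 1024 * KB;

    enum PageType { SMALL, MEDIUM, LARGE }

    // Which kind of Region an object of the given size is allocated in.
    static PageType pageTypeFor(long objectBytes) {
        if (objectBytes < 256 * KB) {
            return PageType.SMALL;
        }
        if (objectBytes < 4 * MB) {
            return PageType.MEDIUM;
        }
        return PageType.LARGE;
    }

    // Capacity of the Region that would hold the object.
    static long pageCapacityFor(long objectBytes) {
        switch (pageTypeFor(objectBytes)) {
            case SMALL:
                return 2 * MB;
            case MEDIUM:
                return 32 * MB;
            default:
                // Large Regions hold one object, rounded up to a multiple of 2MB.
                long granule = 2 * MB;
                return ((objectBytes + granule - 1) / granule) * granule;
        }
    }

    public static void main(String[] args) {
        System.out.println(pageTypeFor(100 * KB));          // prints SMALL
        System.out.println(pageTypeFor(1 * MB));            // prints MEDIUM
        System.out.println(pageCapacityFor(5 * MB) / MB);   // prints 6
    }
}
```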

Relocation set

In G1, the collection set holds all the Regions that G1 needs to scan. ZGC, in order to avoid maintaining card tables, scans all Regions during marking. If the live objects in a Region are found to need relocating, that Region is put into the relocation set.

Put simply, if GC is divided into two main stages, marking and reclaiming, then G1's collection set decides which Regions get marked, while ZGC's relocation set decides which Regions get reclaimed.

Colored pointer

Like Shenandoah, ZGC also collects concurrently. The difference is that Shenandoah relies on forwarding pointers, while ZGC relies on colored pointers.

Tri-color marking is not really a property of the object itself; it is a property of references, since whether an object is alive is determined by the references to it. The garbage collectors in the HotSpot virtual machine record marks in different places: some in the object header, some in a separate data structure. ZGC records them directly in the pointer.

A pointer on a 64-bit machine is 64 bits wide, but under Linux the upper 18 bits cannot be used for addressing. Of the remaining 46 bits, ZGC takes 4 to assist the GC, and the other 42 bits can address at most 4TB of memory, which is generally more than enough.

Specifically, ZGC adds 4 flag bits to the pointer: Finalizable, Remapped, Marked 0, and Marked 1.

The source code comments are as follows:

 6                 4 4 4  4 4                                             0
 3                 7 6 5  2 1                                             0
+-------------------+-+----+-----------------------------------------------+
|00000000 00000000 0|0|1111|11 11111111 11111111 11111111 11111111 11111111|
+-------------------+-+----+-----------------------------------------------+
|                   | |    |
|                   | |    * 41-0 Object Offset (42-bits, 4TB address space)
|                   | |
|                   | * 45-42 Metadata Bits (4-bits)  0001 = Marked0
|                   |                                 0010 = Marked1
|                   |                                 0100 = Remapped
|                   |                                 1000 = Finalizable
|                   |
|                   * 46-46 Unused (1-bit, always zero)
|
* 63-47 Fixed (17-bits, always zero)


The Finalizable bit indicates that the object is reachable only through a finalizer (its finalize() method). Remapped, Marked 0, and Marked 1 serve as the marking colors (Marked 0 and Marked 1 are referred to below as M0 and M1).

Why both M0 and M1?

After marking completes, ZGC can start the next garbage collection cycle without waiting for object pointers to be remapped. In other words, two consecutive collection cycles overlap, so two mark bits are needed to tell adjacent cycles apart, and M0 and M1 are used alternately.
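
The bit layout in the source comment can be reproduced with a few masks on a Java `long`. The constants below follow the layout shown above (offset in bits 0 to 41, metadata in bits 42 to 45); the class `ColoredPointerSketch` and its helper methods are hypothetical names used only for this illustration.

```java
// Hypothetical helper mirroring the bit layout in the source comment above.
class ColoredPointerSketch {
    static final long OFFSET_MASK = (1L << 42) - 1; // bits 0-41: object offset
    static final long MARKED0     = 1L << 42;       // metadata 0001
    static final long MARKED1     = 1L << 43;       // metadata 0010
    static final long REMAPPED    = 1L << 44;       // metadata 0100
    static final long FINALIZABLE = 1L << 45;       // metadata 1000
    static final long METADATA_MASK = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    // Replace whatever color the pointer has with the given one.
    static long withColor(long pointer, long colorBit) {
        return (pointer & OFFSET_MASK) | colorBit;
    }

    // Strip the metadata bits to recover the object offset.
    static long offsetOf(long pointer) {
        return pointer & OFFSET_MASK;
    }

    static boolean hasColor(long pointer, long colorBit) {
        return (pointer & METADATA_MASK) == colorBit;
    }

    // M0 and M1 alternate from one collection cycle to the next.
    static long nextMarkBit(long currentMarkBit) {
        return currentMarkBit == MARKED0 ? MARKED1 : MARKED0;
    }

    public static void main(String[] args) {
        long markBit = MARKED0;                          // first cycle uses M0
        long pointer = withColor(0x1000L, REMAPPED);     // initial state
        pointer = withColor(pointer, markBit);           // marked in cycle 1
        System.out.println(hasColor(pointer, MARKED0));  // prints true
        System.out.println(offsetOf(pointer));           // prints 4096
        markBit = nextMarkBit(markBit);                  // cycle 2 uses M1
        System.out.println(markBit == MARKED1);          // prints true
    }
}
```

Changing the color never changes the offset, which is what lets the collector recolor a reference without touching the object it refers to.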

The role of colored pointers in the GC process

We use the three colors of red, blue and yellow to represent the three marking states:

1. When the first marking cycle begins, all pointers are in the Remapped state.

2. Starting from the GC Roots, the collector traverses the object graph and marks live objects as M0.

3. After marking completes, concurrent relocation begins. The goal is to move the three live objects A, B, and C to a new Region.

Throughout the marking process, newly allocated objects, such as object D, are marked M0 directly.

After an object is copied, its pointer can be changed from M0 to Remapped, and the mapping from the old object to the new one is saved in the forwarding table.

4. If the program accesses object C at this point, a read barrier is triggered: it rewrites the original reference to the address of the new copy of C and forwards the access there. The forwarding-table record stays in place for other references that still hold the old address.

This behavior is called "self-healing" of the pointer.

In fact, were it not for object D, the old Page could be reclaimed as soon as all live objects had been moved out in the previous step; pointers and forwarding tables are enough to redirect every access to the new Page.

5. The concurrent remapping phase corrects all remaining references and deletes the records in the forwarding table.

6. When the next concurrent marking begins, the previous garbage collection cycle has not fully finished, so pointers in the Remapped state are marked M1 to distinguish them from the previous cycle's live-object marks.

As this shows, while the collector runs concurrently ZGC relies on the read barrier to forward accesses correctly. And because colored pointers are updated lazily, each stale reference is fixed once and then costs nothing extra, which is much faster than Shenandoah, where every access has to go through the forwarding pointer.
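
The self-healing step can be modeled with a map as the forwarding table and a mutable slot as the reference being loaded. This is a simplified sketch: the class `SelfHealingSketch`, the `Slot` type, and the `slowPathHits` counter are all invented for the example, and the "good" color is assumed to be Remapped, as it would be during relocation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a self-healing read barrier; not ZGC source code.
class SelfHealingSketch {
    static final long OFFSET_MASK = (1L << 42) - 1;
    static final long MARKED0 = 1L << 42;
    static final long REMAPPED = 1L << 44;

    // A field or variable that holds a colored pointer.
    static final class Slot {
        long pointer;

        Slot(long pointer) {
            this.pointer = pointer;
        }
    }

    // Forwarding table: old object offset -> new object offset.
    final Map<Long, Long> forwardingTable = new HashMap<>();
    // Assumption: during relocation, Remapped is the color that needs no work.
    final long goodColor = REMAPPED;
    int slowPathHits = 0;

    // Read barrier: returns the current offset of the referenced object.
    long load(Slot slot) {
        long pointer = slot.pointer;
        if ((pointer & goodColor) != 0) {
            return pointer & OFFSET_MASK; // fast path: pointer is already good
        }
        slowPathHits++;
        long oldOffset = pointer & OFFSET_MASK;
        // Objects that were not relocated simply keep their offset.
        long newOffset = forwardingTable.getOrDefault(oldOffset, oldOffset);
        slot.pointer = newOffset | goodColor; // self-heal the stale reference
        return newOffset;
    }

    public static void main(String[] args) {
        SelfHealingSketch gc = new SelfHealingSketch();
        gc.forwardingTable.put(0x1000L, 0x8000L);   // object C was moved
        Slot refToC = new Slot(0x1000L | MARKED0);  // stale, M0-colored reference

        System.out.println(gc.load(refToC));    // prints 32768 (slow path, heals)
        System.out.println(gc.load(refToC));    // prints 32768 (fast path)
        System.out.println(gc.slowPathHits);    // prints 1
    }
}
```

The second load takes the fast path because the slot itself was rewritten, which is the difference from a forwarding pointer that has to be followed on every access.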

Three advantages of colored pointers

1. Thanks to the "self-healing" that colored pointers provide, a Page can be reclaimed as soon as its live objects have been moved out, without waiting for every reference into the Page to be corrected.

2. ZGC does not need write barriers at all, for two reasons: with colored pointers the mark state lives in the reference, so the object body never has to be updated; and there are no generations, so there are no cross-generation references to record.

3. Colored pointers still have room to grow: the 18 unused high bits leave plenty of space for future extensions.

Colored pointers have one inherent problem: the operating system and the processor do not natively support a program redefining bits inside a pointer.

Various memory maps

Colored pointers are only defined by the JVM, and may not be supported by the operating system or processor. In order to solve this problem, ZGC uses virtual memory mapping technology on the Linux/x86-64 platform.

On this platform, ZGC maps the same physical memory into three virtual address ranges, one each for Remapped, Marked 0, and Marked 1. Each object is therefore reachable at three virtual addresses, and which of the three a pointer uses encodes its color.
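
With the bit layout shown earlier, the three virtual addresses of an object are just its offset with one of the three color bits set, which places the three views at 4TB, 8TB, and 16TB. The arithmetic can be checked with the sketch below; the class `MultiMappingSketch` is hypothetical, and the code only computes addresses, it does not perform any actual memory mapping.

```java
// Hypothetical arithmetic sketch of the three address views; no real mmap here.
class MultiMappingSketch {
    static final long TB = 1L << 40;
    static final long OFFSET_MASK = (1L << 42) - 1;
    static final long MARKED0  = 1L << 42; // view starting at 4TB
    static final long MARKED1  = 1L << 43; // view starting at 8TB
    static final long REMAPPED = 1L << 44; // view starting at 16TB

    // The three virtual addresses that all refer to the same object offset.
    static long[] viewsOf(long offset) {
        return new long[] { offset | MARKED0, offset | MARKED1, offset | REMAPPED };
    }

    public static void main(String[] args) {
        long offset = 0x1000L;
        for (long address : viewsOf(offset)) {
            // Every view resolves to the same offset in the heap.
            System.out.println((address / TB) + "TB view -> offset " + (address & OFFSET_MASK));
        }
    }
}
```

Because all three ranges are backed by the same physical memory, the processor can dereference a colored pointer directly; no masking instruction is needed before the load.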

Generations

ZGC does not have generations. This is not a deliberate technical trade-off; it was decided by the amount of implementation work involved. As a result, overall GC efficiency still has considerable room for improvement.

Read barrier

ZGC uses a read barrier to perform the "self-healing" of pointers. Because ZGC currently has no generations and scans all Regions instead of maintaining card tables, it needs no write barrier, which is a major performance advantage.

NUMA

When multiple CPU cores access memory at the same time, they contend for it. Modern CPUs integrate the memory controller into the processor, so each processor has its own local memory.

Under a NUMA architecture, ZGC preferentially allocates objects in the local memory of the processor that runs the requesting thread, which avoids contention for memory.

Before ZGC, only Parallel Scavenge supported NUMA-aware memory allocation.

ZGC operation steps

ZGC, like Shenandoah, runs almost all of its phases concurrently with user threads. It also has STW phases such as initial mark and remark; they work the same way as described earlier and are not repeated here. The four concurrent phases are highlighted below:

Concurrent mark

The concurrent marking phase is the same as in G1: it traverses the object graph for reachability analysis. The difference is that ZGC records the mark in the colored pointer.

Concurrent prepare for relocate

In this phase, ZGC scans all Regions and works out which ones contain live objects that need to be moved to new Regions; those Regions are put into the relocation set.

In addition, class unloading and weak reference processing, which ZGC supports since JDK 12, also happen in this phase.
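
The article does not say how ZGC decides that a Region's live objects "need" to move. Purely as an assumption for illustration, the sketch below selects Regions whose share of garbage is at or above a caller-supplied threshold and skips Regions with no live objects at all, since those can be reclaimed without copying. The class `RelocationSetSketch`, the `Page` type, and the threshold parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical selection policy; ZGC's real criteria are not described here.
class RelocationSetSketch {
    static final class Page {
        final String name;
        final long capacity;
        final long liveBytes;

        Page(String name, long capacity, long liveBytes) {
            this.name = name;
            this.capacity = capacity;
            this.liveBytes = liveBytes;
        }

        double garbageRatio() {
            return 1.0 - (double) liveBytes / capacity;
        }
    }

    // Assumed rule: relocate Pages that still hold live objects but are mostly garbage.
    static List<Page> selectRelocationSet(List<Page> pages, double minGarbageRatio) {
        List<Page> relocationSet = new ArrayList<>();
        for (Page page : pages) {
            if (page.liveBytes > 0 && page.garbageRatio() >= minGarbageRatio) {
                relocationSet.add(page);
            }
        }
        return relocationSet;
    }

    public static void main(String[] args) {
        long twoMb = 2L * 1024 * 1024;
        List<Page> pages = new ArrayList<>();
        pages.add(new Page("A", twoMb, 0));             // empty: reclaim directly
        pages.add(new Page("B", twoMb, twoMb / 4));     // 75% garbage
        pages.add(new Page("C", twoMb, twoMb * 3 / 4)); // 25% garbage
        for (Page page : selectRelocationSet(pages, 0.5)) {
            System.out.println(page.name); // prints B
        }
    }
}
```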

Concurrent relocate

In this phase, ZGC copies the live objects in the relocation set's Regions to new Regions, and maintains a forwarding table for each Region in the relocation set that records the mapping from old objects to new ones.

If a user thread concurrently accesses an object that is being relocated, the mark on the pointer reveals that the object is in the relocation set. The read barrier intercepts the access, forwards it using the forwarding table, and updates the reference's value.

ZGC calls this behavior self-healing. With this design, only the first access through a stale pointer pays the forwarding cost, which is much cheaper than Shenandoah's forwarding pointer, where every access pays it.

Another advantage is that once all the objects in a Region have been copied, the Region can be reclaimed immediately, as long as its forwarding table is kept.

Concurrent remap

The task of the last phase is to correct all the stale pointers and release the forwarding tables.

This phase is not urgent, so ZGC folds concurrent remapping into the concurrent marking phase of the next garbage collection cycle; both need to traverse all objects anyway.

Summary

Modern garbage collectors push concurrency to the extreme in pursuit of low pauses. Shenandoah builds on G1 with many optimizations that make the evacuation phase concurrent, while ZGC goes straight to advanced techniques such as colored pointers and NUMA awareness. The aim is to let Java developers focus on using objects to make their programs work well and leave the rest to the GC, so that all we have to do is enjoy the experience modern GC technology brings.

