JVM from Getting Started to Giving Up: the ZGC garbage collector

The Z Garbage Collector, also known as ZGC, is a scalable, low-latency garbage collector. It was introduced as an experimental feature in JDK 11 and became production-ready in JDK 15. It is designed to meet the following goals:

  • Maximum pause time below 1 ms (the target was 10 ms before JDK 16 and is below 1 ms from JDK 16 onward).
  • Pause times do not increase with heap, live-set, or root-set size.
  • Support for heap sizes from 8 MB to 16 TB.

ZGC has the following characteristics:

  • Concurrent
  • Region-based
  • Compacting
  • NUMA-aware
  • Uses colored pointers
  • Uses load barriers

At its core, ZGC is a concurrent garbage collector, which means that all the heavy lifting is done while Java threads continue to execute. This greatly limits the impact of garbage collection on application response times.

ZGC features

ZGC is a garbage collector with a Region-based memory layout and (for now) no generations. It uses read barriers, colored pointers, and memory multi-mapping to implement a concurrent mark-compact algorithm, and its primary goal is low latency.

memory layout

ZGC has no concept of generation

Let's start with the memory layout of ZGC. Like Shenandoah and G1, ZGC uses a Region-based heap layout, but unlike them, ZGC's Regions are dynamic: they are created and destroyed dynamically, and their capacity can vary. On the x64 platform, ZGC Regions come in three size classes, small, medium, and large (as shown in the figure below):

  • Small Region:  The capacity is fixed at 2 MB. It holds objects smaller than 256 KB.
  • Medium Region:  The capacity is fixed at 32 MB. It holds objects of at least 256 KB but smaller than 4 MB.
  • Large Region:  The capacity is not fixed and can change dynamically, but it must be an integer multiple of 2 MB. It holds large objects of 4 MB or more.
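
The size thresholds above can be expressed as a small lookup. The following is only a sketch: the class and method names are hypothetical, and rounding a Large Region up to the smallest sufficient multiple of 2 MB is an assumption (the text only says the capacity must be a multiple of 2 MB).

```java
// Hypothetical helper that maps an object size to the Region class described above.
final class RegionSizing {
    enum RegionType { SMALL, MEDIUM, LARGE }

    static final long KB = 1024L;
    static final long MB = 1024L * KB;
    static final long SMALL_OBJECT_LIMIT = 256 * KB;  // objects below this go to Small Regions
    static final long MEDIUM_OBJECT_LIMIT = 4 * MB;   // objects below this go to Medium Regions
    static final long LARGE_GRANULE = 2 * MB;         // Large Regions are multiples of 2 MB

    static RegionType classify(long objectSize) {
        if (objectSize < SMALL_OBJECT_LIMIT) return RegionType.SMALL;
        if (objectSize < MEDIUM_OBJECT_LIMIT) return RegionType.MEDIUM;
        return RegionType.LARGE;
    }

    // Capacity of a Region that would hold an object of the given size.
    static long regionCapacity(long objectSize) {
        switch (classify(objectSize)) {
            case SMALL:
                return 2 * MB;
            case MEDIUM:
                return 32 * MB;
            default:
                // Assumption: round up to the next multiple of 2 MB.
                return ((objectSize + LARGE_GRANULE - 1) / LARGE_GRANULE) * LARGE_GRANULE;
        }
    }

    public static void main(String[] args) {
        long[] sizes = {2048, 256 * KB, 5 * MB};
        for (long size : sizes) {
            System.out.println(size + " bytes -> " + classify(size)
                    + ", region capacity " + regionCapacity(size) + " bytes");
        }
    }
}
```

Note that the boundaries are half-open: an object of exactly 256 KB is a medium object, and one of exactly 4 MB is a large object.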

NUMA-aware

NUMA is best understood in contrast with UMA. UMA stands for Uniform Memory Access and NUMA for Non-Uniform Memory Access. Under UMA there is a single pool of memory that all CPUs must access, so they compete for the memory bus. Competition requires locking, locking costs efficiency, and the more CPU cores there are, the fiercer the competition. Under NUMA, each CPU has its own block of memory, the one physically closest to it on the motherboard. Each CPU accesses its local memory first, so efficiency naturally improves.

NUMA architectures are very common in medium and large servers as a high-performance solution, especially with respect to system latency. ZGC automatically detects a NUMA architecture and takes full advantage of it.

Colored Pointer

The colored pointer, shown in the figure, is one of ZGC's core designs. Earlier garbage collectors store GC information in the object header, whereas ZGC stores it in the pointer: the mark information is recorded directly on the reference to the object.

Each object reference is a 64-bit pointer, and these 64 bits are divided as follows:

  • 18 bits: Reserved for future use.
  • 1 bit: Finalizable flag. This bit is related to concurrent reference processing and indicates that the object can only be reached through a finalizer (finalize() is an empty method on the Object base class; if a class overrides it, the method is called before the object is reclaimed, and at most once).
  • 1 bit: Remapped flag. When this bit is set, the reference does not point into the relocation set (the set of Regions that need to be collected).
  • 1 bit: Marked1 flag.
  • 1 bit: Marked0 flag. Together with Marked1 above, it marks objects to assist the GC.
  • 42 bits: the address of the object (so up to 2^42 bytes = 4 TB of memory can be addressed).
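
To make the bit layout concrete, here is a minimal sketch that packs and unpacks such a pointer with bit masks. The class is hypothetical, and the exact bit positions (address in bits 0 to 41, then Marked0, Marked1, Remapped, Finalizable) are an assumption that follows the order of the list above; this is an illustration, not HotSpot code.

```java
// Hypothetical model of a 64-bit colored pointer stored in a Java long.
final class ColoredPointer {
    static final int ADDRESS_BITS = 42;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1; // low 42 bits
    static final long MARKED0     = 1L << 42;
    static final long MARKED1     = 1L << 43;
    static final long REMAPPED    = 1L << 44;
    static final long FINALIZABLE = 1L << 45;
    // Bits 46..63 (18 bits) are left unused.

    // Strips the color bits and returns the plain object address.
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    // Tests whether a given flag bit is set on the pointer.
    static boolean has(long pointer, long flag) {
        return (pointer & flag) != 0;
    }

    // Builds a pointer to the given address carrying exactly one color flag.
    static long color(long address, long flag) {
        return (address & ADDRESS_MASK) | flag;
    }

    public static void main(String[] args) {
        long p = color(0x1234L, REMAPPED);
        System.out.println("address   = 0x" + Long.toHexString(address(p)));
        System.out.println("remapped  = " + has(p, REMAPPED));
        System.out.println("marked0   = " + has(p, MARKED0));
        System.out.println("max bytes = " + (ADDRESS_MASK + 1));
    }
}
```

Because the flags live in the reference itself, the collector can read an object's state without touching the object's memory.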

Why are there two marks?

At the beginning of each GC cycle, the mark bit in use is swapped, which invalidates the mark state set during the previous cycle, so all references become unmarked. GC cycle 1 uses mark0, so by the end of the cycle all reference marks are 01. GC cycle 2 uses mark1 in the same way, so all reference marks become 10.
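
The alternation can be sketched as follows. This is a simplified, hypothetical model (the class name and bit positions are assumptions): a reference counts as marked only if it carries the mark bit of the current cycle, so flipping the bit at cycle start makes every old mark stale without touching any pointer.

```java
// Hypothetical sketch of alternating between two mark bits across GC cycles.
final class MarkBitCycle {
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long MARK_MASK = MARKED0 | MARKED1;

    // Starts at MARKED1 so that the first cycle flips to MARKED0.
    private long currentMark = MARKED1;

    // Swaps the mark bit in use; all marks from the previous cycle become stale.
    void startCycle() {
        currentMark = (currentMark == MARKED0) ? MARKED1 : MARKED0;
    }

    // Returns the pointer recolored with the current cycle's mark bit.
    long mark(long pointer) {
        return (pointer & ~MARK_MASK) | currentMark;
    }

    // A pointer is marked only if it carries the current cycle's bit.
    boolean isMarked(long pointer) {
        return (pointer & MARK_MASK) == currentMark;
    }

    public static void main(String[] args) {
        MarkBitCycle gc = new MarkBitCycle();
        gc.startCycle();                      // cycle 1 uses mark0
        long p = gc.mark(0x1000L);
        System.out.println("cycle 1 marked: " + gc.isMarked(p));
        gc.startCycle();                      // cycle 2 uses mark1
        System.out.println("cycle 2 marked before remarking: " + gc.isMarked(p));
    }
}
```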

Can ZGC use compressed pointers?

No. Pointer compression shrinks a reference to 32 bits. With 8-byte object alignment, 32 bits can address at most 2^35 bytes, so the maximum heap is 32 GB, whereas ZGC's pointers already use 42 address bits.

Three advantages of colored pointers

  1. Once all surviving objects in a Region have been moved out (copied), the Region can be released and reused immediately, because a forwarding table still records the mapping from old addresses to new ones. In theory, as long as a single Region is free, ZGC can complete a garbage collection.
  2. Colored pointers give references a "self-healing" ability. This removes the need for write barriers (such as those used for incremental update or snapshot-at-the-beginning in tri-color marking); a single read barrier is enough, which reduces the number of memory barriers used.
  3. Colored pointers are highly extensible: the 18 unused bits leave room for future features.

multi-mapping addressing

The translation from virtual memory to physical memory can be implemented at the hardware level, the operating system level, or the software level. On Linux, ZGC uses multi-mapping (Multi-Mapping) to map several different virtual memory addresses to the same physical memory address, a many-to-one mapping. As a result, the address space ZGC sees in virtual memory is larger than the actual heap capacity. If the flag bits in a colored pointer are treated as selecting an address segment, then as long as these different segments are all mapped to the same physical memory, the colored pointer can be used directly for addressing after the multi-mapping translation, as shown in the figure below.

Multi-mapping may bring some side benefits, such as making large objects easier to copy, but at its root ZGC's multi-mapping is merely a by-product of the colored-pointer design, not something built specifically to provide other features.
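
The many-to-one idea can be simulated in a few lines. This is only a toy model under stated assumptions: a byte array stands in for physical memory, the hypothetical translate method stands in for the page tables that the operating system sets up, and the flag positions are the same assumed ones as in the colored-pointer layout. Real ZGC relies on OS-level memory mapping, not on masking in software.

```java
// Toy simulation: three virtual "views" (one per color bit) alias the same physical bytes.
final class MultiMappingDemo {
    static final long ADDRESS_MASK = (1L << 42) - 1;
    static final long MARKED0  = 1L << 42;
    static final long MARKED1  = 1L << 43;
    static final long REMAPPED = 1L << 44;

    private final byte[] physical; // stands in for physical memory

    MultiMappingDemo(int size) {
        physical = new byte[size];
    }

    // Models the mapping: every colored view of an address lands on the same offset.
    private int translate(long virtualAddress) {
        return Math.toIntExact(virtualAddress & ADDRESS_MASK);
    }

    void write(long virtualAddress, byte value) {
        physical[translate(virtualAddress)] = value;
    }

    byte read(long virtualAddress) {
        return physical[translate(virtualAddress)];
    }

    public static void main(String[] args) {
        MultiMappingDemo memory = new MultiMappingDemo(64);
        memory.write(MARKED0 | 8, (byte) 42);             // write through the Marked0 view
        System.out.println(memory.read(MARKED1 | 8));     // read through the Marked1 view
        System.out.println(memory.read(REMAPPED | 8));    // read through the Remapped view
    }
}
```

Whichever color a pointer carries, dereferencing it reaches the same object, which is why the flag bits never have to be stripped before use.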

read barrier

ZGC uses read barriers to correct stale references. Because ZGC collects by copying and compacting, the program may well access an object after the object has been moved but before the pointer to it has been updated. When an application thread loads an object reference concurrently with the GC, ZGC reads the pointer and checks its Remapped flag. If the flag shows that the object lies in a Region being cleaned up in this cycle, the object's memory address changes, so the barrier replaces the stale reference with the new address and then returns it.

In this way, read barriers solve the problem of reading objects during a concurrent GC.

Object o = obj.fieldA;    // Loading an object reference from heap
// <load barrier needed here>
Object p = o;             // No barrier, not a load from heap
o.doSomething();          // No barrier, not a load from heap
int i = obj.fieldB;       // No barrier, not an object reference
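
What happens at the barrier can be sketched as a fast-path check plus a slow path. The following is a simplified, hypothetical model and not the JIT-generated barrier: the field is an AtomicLong holding a colored pointer, a set Remapped bit is assumed to mean "reference is good", and the slowPath function (which would look up the new address) is supplied by the caller.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

// Hypothetical sketch of a load barrier applied when a reference is loaded from a heap field.
final class LoadBarrierSketch {
    static final long ADDRESS_MASK = (1L << 42) - 1;
    static final long REMAPPED = 1L << 44;

    // Loads a reference from the field and repairs the field if the reference is stale.
    static long load(AtomicLong field, LongUnaryOperator slowPath) {
        long pointer = field.get();
        // Fast path: null, or already marked as pointing to the current location.
        if (pointer == 0 || (pointer & REMAPPED) != 0) {
            return pointer;
        }
        // Slow path: compute the corrected reference.
        long healed = slowPath.applyAsLong(pointer);
        // Write the corrected reference back so later loads take the fast path.
        // If the CAS fails, another thread changed the field first; that is fine.
        field.compareAndSet(pointer, healed);
        return healed;
    }

    public static void main(String[] args) {
        AtomicLong field = new AtomicLong(0x2000L); // stale: Remapped bit not set
        // Assumption for the demo: the object moved 0x100 bytes further up.
        LongUnaryOperator relocate = p -> ((p & ADDRESS_MASK) + 0x100) | REMAPPED;
        long first = load(field, relocate);
        long second = load(field, relocate);
        System.out.println(Long.toHexString(first & ADDRESS_MASK));
        System.out.println(first == second);
    }
}
```

Only loads of references from the heap go through this check, which matches the snippet above: copying a local reference or reading a primitive field needs no barrier.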

The load barrier lowers the throughput of applications running with ZGC. According to official test data, the additional overhead is about 4%.

ZGC working process

The operation process of ZGC can be divided into the following four stages:

[Figure: ZGC processing process]

Concurrent Mark: As in G1 and Shenandoah, concurrent marking traverses the object graph to perform reachability analysis. It is preceded and followed by short pauses similar to the initial mark and final mark of G1 and Shenandoah (ZGC uses different names for them), and these pauses serve similar purposes. Unlike G1 and Shenandoah, ZGC marks the pointer rather than the object: the marking phase updates the Marked0 and Marked1 flags in the colored pointer.

Concurrent Prepare for Relocate: This stage determines, according to specific criteria, which Regions will be cleaned up in this collection and groups them into the Relocation Set. The relocation set differs from G1's collection set: ZGC does not divide the heap into Regions in order to do benefit-prioritized incremental collection as G1 does. On the contrary, ZGC scans all Regions in every collection, trading the cost of a wider scan for not having to maintain G1's remembered sets. The relocation set therefore only determines which Regions will have their surviving objects copied elsewhere and then be released; it does not mean that collection is restricted to the Regions in the set, because marking covers the whole heap. In addition, class unloading (supported by ZGC since JDK 12) and weak reference processing are also completed in this phase.
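
The text does not spell out the selection criteria, so the following sketch uses an assumed one purely for illustration: pick Regions whose live ratio is at or below a threshold, and relocate the emptiest first. The Region class, the threshold, and the ordering are all hypothetical and not taken from ZGC's source.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of choosing a relocation set after marking has measured live bytes.
final class RelocationSetSketch {
    static final class Region {
        final String name;
        final long capacity;   // bytes
        final long liveBytes;  // bytes found live by the marking phase

        Region(String name, long capacity, long liveBytes) {
            this.name = name;
            this.capacity = capacity;
            this.liveBytes = liveBytes;
        }

        double liveRatio() {
            return (double) liveBytes / capacity;
        }
    }

    // Assumed criterion: a Region qualifies when few of its bytes are still live,
    // because copying them out is cheap and frees the whole Region.
    static List<Region> select(List<Region> allRegions, double maxLiveRatio) {
        List<Region> relocationSet = new ArrayList<>();
        for (Region region : allRegions) {
            if (region.liveRatio() <= maxLiveRatio) {
                relocationSet.add(region);
            }
        }
        // Cheapest Regions (fewest live bytes) first.
        relocationSet.sort(Comparator.comparingLong((Region r) -> r.liveBytes));
        return relocationSet;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("A", 2 * mb, 100 * 1024L));
        regions.add(new Region("B", 2 * mb, 1900 * 1024L));
        regions.add(new Region("C", 2 * mb, 0));
        for (Region region : select(regions, 0.25)) {
            System.out.println(region.name + " live=" + region.liveBytes);
        }
    }
}
```

Every Region is examined here, which mirrors the point above that ZGC scans the whole heap rather than maintaining remembered sets.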

Concurrent Relocate: Relocation is the core stage of ZGC. In this stage the surviving objects in the relocation set are copied to new Regions, and a forwarding table (Forward Table) is maintained for each Region in the relocation set, recording the mapping from old objects to new objects. Thanks to colored pointers, ZGC can tell from the reference alone whether an object is in the relocation set. If a user thread concurrently accesses an object in the relocation set, the access is intercepted by the preset memory barrier, which immediately forwards the access to the newly copied object according to the Region's forwarding table and at the same time updates the reference so that it points directly to the new object. ZGC calls this behavior the pointer's "self-healing" (Self-Healing) capability.

The advantage is that only the first access to an old object goes through forwarding, so it is slow only once. Shenandoah's Brooks forwarding pointer, by contrast, is a fixed overhead paid on every object access, so it is slow every time. ZGC therefore puts a lower runtime load on user programs than Shenandoah. Another direct benefit of colored pointers is that once the surviving objects of a Region in the relocation set have been copied, the Region can be released immediately and reused for new allocations (although its forwarding table must be kept and cannot be freed yet). Even if many stale pointers to these objects remain in the heap, it does not matter, because they heal themselves the first time they are used.
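
The "slow only once" behavior can be illustrated with a tiny model of a forwarding table. All names are hypothetical, a one-element array stands in for a reference field, and the map lookup stands in for the color check that real ZGC performs on the pointer itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a forwarding table with self-healing references.
final class ForwardingSketch {
    // Forwarding table of one relocated Region: old address -> new address.
    private final Map<Long, Long> forwardingTable = new HashMap<>();
    int forwardedAccesses = 0; // how many accesses had to go through forwarding

    // Called when the GC copies an object out of the Region.
    void recordRelocation(long oldAddress, long newAddress) {
        forwardingTable.put(oldAddress, newAddress);
    }

    // Resolves the reference stored in slot[0], healing it if it is stale.
    long resolve(long[] slot) {
        Long newAddress = forwardingTable.get(slot[0]);
        if (newAddress == null) {
            return slot[0];            // reference is already up to date
        }
        forwardedAccesses++;
        slot[0] = newAddress;          // self-heal: later accesses skip forwarding
        return newAddress;
    }

    public static void main(String[] args) {
        ForwardingSketch region = new ForwardingSketch();
        region.recordRelocation(0x1000L, 0x9000L);
        long[] field = {0x1000L};      // stale reference left somewhere in the heap
        System.out.println(Long.toHexString(region.resolve(field)));
        System.out.println(Long.toHexString(region.resolve(field)));
        System.out.println("forwarded accesses: " + region.forwardedAccesses);
    }
}
```

The old Region's memory can be reused as soon as copying finishes; only the table has to survive until every stale reference has been healed or remapped.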

Concurrent Remap: Remapping fixes every reference in the heap that still points to an old object in the relocation set. In terms of its goal, this is the same as Shenandoah's concurrent reference update phase, but ZGC's concurrent remapping is not an "urgent" task: as mentioned earlier, even a stale reference can heal itself, at the cost of one extra forwarding and correction the first time it is used. The main purpose of remapping is to avoid that slowdown (with the side benefit that the forwarding tables can be freed once cleanup is complete), so it is not very urgent. ZGC therefore cleverly merges the work of the concurrent remap phase into the concurrent mark phase of the next garbage collection cycle. Both have to traverse all objects anyway, so merging them saves the cost of one traversal of the object graph. Once all pointers have been corrected, the forwarding tables that recorded the mapping between old and new objects can be released.

ZGC core parameters

ZGC trigger timing

ZGC triggers a GC in the following scenarios:

  • Timer trigger:  Disabled by default; it can be configured through the ZCollectionInterval parameter. GC log keyword: "Timer".
  • Warm-up trigger:  Fires at most three times, when heap usage reaches 10%, 20%, and 30%. Its main purpose is to measure how long a GC takes, providing data for the other triggers. GC log keyword: "Warmup".
  • Allocation rate trigger:  Based on normal-distribution statistics, ZGC estimates the maximum allocation rate at the 99% level and the point in time at which memory would be exhausted at that rate, then triggers a GC before exhaustion (time until exhaustion, minus the maximum duration of one GC, minus one GC check interval). GC log keyword: "Allocation Rate".
  • Proactive trigger:  (Enabled by default; it can be configured through the ZProactive parameter.) When the heap has grown by 10% since the last GC, or more than 5 minutes have passed since the last GC, the time since the last GC is compared against a limit based on the maximum duration of one GC, and a GC is triggered if the limit is exceeded. GC log keyword: "Proactive".
  • Metadata allocation trigger:  Caused by insufficient metaspace. GC log keyword: "Metadata GC Threshold".
  • Direct trigger:  Caused by an explicit call to System.gc() in code. GC log keyword: "System.gc()".
  • Allocation stall trigger:  Garbage cannot be reclaimed fast enough and fills the entire heap, causing some threads to block. GC log keyword: "Allocation Stall".
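
The arithmetic behind the allocation rate trigger can be sketched as below. This is a simplified illustration with hypothetical names: it takes the estimated maximum allocation rate as an input and leaves out the normal-distribution estimation that produces it.

```java
// Hypothetical sketch of the allocation rate rule: start a GC early enough
// that it can finish before the heap runs out.
final class AllocationRateTrigger {
    static boolean shouldTrigger(long freeBytes,
                                 double maxAllocRateBytesPerSec,
                                 double maxGcDurationSec,
                                 double checkIntervalSec) {
        if (maxAllocRateBytesPerSec <= 0) {
            return false; // nothing is being allocated, memory will not run out
        }
        // Time until the heap is exhausted at the estimated maximum rate.
        double timeUntilExhaustion = freeBytes / maxAllocRateBytesPerSec;
        // Subtract the time one GC may take and one check interval,
        // since the rule is only evaluated once per interval.
        double timeUntilGc = timeUntilExhaustion - maxGcDurationSec - checkIntervalSec;
        return timeUntilGc <= 0;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 1000 MB free at 100 MB/s leaves 10 s; a 2 s GC has plenty of time.
        System.out.println(shouldTrigger(1000 * mb, 100.0 * mb, 2.0, 0.1));
        // 200 MB free at 100 MB/s leaves 2 s; a 2 s GC must start now.
        System.out.println(shouldTrigger(200 * mb, 100.0 * mb, 2.0, 0.1));
    }
}
```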

ZGC log analysis

Below we analyze the ZGC log of a simple program. The code and the analysis follow.

sample code

Here is a simple piece of code:

import java.util.ArrayList;
import java.util.List;

/**
 * VM Args:-XX:+UseZGC -Xmx8m -Xlog:gc*
 */
public class HeapOOM {

    public static void main(String[] args) {
        List<byte[]> list = new ArrayList<>();
        while (true) {
            list.add(new byte[2048]);
        }
    }
}

GC log analysis

The GC log is as follows (the runtime environment is JDK 17). Each line in the GC log records information about a step of the GC process. The key information is as follows:

  • Start:  The GC starts, and the reason it was triggered is shown. The trigger in the figure above is the adaptive algorithm.
  • Phase-Pause Mark Start:  Initial mark; stops the world (STW).
  • Phase-Pause Mark End:  Final mark (remark); STW.
  • Phase-Pause Relocate Start:  Initial relocation; STW.

Heap information: Records how the heap changes before and after Mark and Relocate during the GC. High and Low record the maximum and minimum values. We generally watch the Used value under High: if it reaches 100%, memory allocation must have stalled during the GC, and the GC trigger timing needs to be adjusted so that the GC starts earlier or runs faster.

GC statistics: Garbage collection statistics can be printed periodically, covering the last 10 seconds, the last 10 minutes, the last 10 hours, and the whole time since startup. These statistics help troubleshoot and locate anomalies.

ZGC summary

This article mainly describes the characteristics and working process of ZGC at a conceptual level.

At present, most Internet companies still use JDK 8 or JDK 11, mostly with the ParNew + CMS combination or G1.

As front-line Java developers, we should stay enthusiastic and attentive toward new technologies in order to keep an edge in a fiercely competitive field.

Origin blog.csdn.net/Trouvailless/article/details/124273204