Comparison of CMS, G1, and ZGC for garbage collection

ZGC (The Z Garbage Collector) is a low-latency garbage collector introduced in JDK 11. Its design goals include:

  • Pause times do not exceed 10ms;
  • Pause times do not grow with the size of the heap or of the live object set;
  • Heaps from 8MB up to 4TB are supported (16TB support is planned).

From these design goals, it follows that ZGC is suited to memory management for large-memory, low-latency services.

Features include:

  • Region-based memory layout
  • No generations for the time being
  • Colored pointers and read barriers used to implement a concurrent mark-compact algorithm
  • Low latency as the primary goal.

ZGC became production-ready in JDK 15, and JDK 17 is the first long-term support (LTS) release to ship the mature, production-ready ZGC.

1. Mark-sweep, copying, and mark-compact

1. Mark-Sweep

The Mark-Sweep algorithm is the conceptual foundation of modern garbage collection algorithms.

  • Marking phase: starting from the root nodes, mark every reachable object. Objects left unmarked are unreferenced garbage.
  • Sweep phase: reclaim all unmarked objects.

Disadvantages:

  • Efficiency: neither marking nor sweeping is particularly efficient.
  • Space: sweeping leaves behind many discontiguous memory fragments. With too much fragmentation, a later large allocation may fail to find enough contiguous memory and trigger another garbage collection early.

Note: what does "sweep" actually do?

Sweeping does not really empty the memory. It records the addresses of the reclaimed objects in a free list. The next time a new object needs to be allocated, the allocator checks whether a freed slot is large enough and, if so, stores the object there.
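The two phases and the free list described above can be sketched on a toy heap. This is a minimal illustration, not how HotSpot implements it: the class `MarkSweepSketch`, its fixed-size slot array, and the equal-sized slots are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy mark-sweep heap (hypothetical): slots hold objects, a free list holds reusable slot indices.
class MarkSweepSketch {
    static final class Obj {
        final List<Integer> refs = new ArrayList<>(); // slot indices this object points to
        boolean marked;
    }

    final Obj[] slots;
    final Deque<Integer> freeList = new ArrayDeque<>();

    MarkSweepSketch(int capacity) {
        slots = new Obj[capacity];
        for (int i = 0; i < capacity; i++) freeList.add(i);
    }

    // Allocation reuses a slot from the free list; returns -1 when no slot is free.
    int allocate() {
        Integer slot = freeList.poll();
        if (slot == null) return -1;
        slots[slot] = new Obj();
        return slot;
    }

    // Marking phase: worklist traversal of everything reachable from the roots.
    void mark(List<Integer> roots) {
        Deque<Integer> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = slots[work.pop()];
            if (o == null || o.marked) continue;
            o.marked = true;
            work.addAll(o.refs);
        }
    }

    // Sweep phase: unmarked slots are recorded on the free list; marks are reset.
    // The slot is nulled only so this toy heap can tell free slots from used ones.
    int sweep() {
        int freed = 0;
        for (int i = 0; i < slots.length; i++) {
            Obj o = slots[i];
            if (o == null) continue;
            if (o.marked) {
                o.marked = false;
            } else {
                slots[i] = null;
                freeList.add(i);
                freed++;
            }
        }
        return freed;
    }
}
```

Because every slot has the same size here, the sketch cannot show fragmentation; with variable-sized objects the free list would contain holes of different sizes, which is exactly the space problem listed above.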

2. Copying

  • Divide the original memory space into two pieces, and only use one piece at a time
  • During garbage collection, the surviving objects in the memory being used are copied to the unused memory block, and then all objects in the memory block being used are cleared.
  • Swap the roles of the two memory blocks to complete garbage collection.

Compared with mark-sweep, the copying algorithm is a relatively efficient way to reclaim memory, with some caveats:

  • It is not suitable where many objects survive, such as the old generation.
  • There is no memory fragmentation to handle: allocation just bumps the heap's top pointer and proceeds in order, which is simple to implement and fast to run.
  • The price is that usable memory is cut to half of the original size, which is too high a cost.

In theory, the copying algorithm needs no separate marking phase. Starting from the GC roots, it copies each live object as soon as it encounters it, and copying is complete once every object reachable from the GC roots has been visited.
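The "copy on first encounter" idea can be sketched as follows. This is a simplified illustration under stated assumptions: `CopyingSketch`, `Obj`, and the use of a Java list as the to-space are all hypothetical, and a forwarding map stands in for the forwarding pointers a real collector would store in object headers.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy copying collector (hypothetical): no mark phase, objects are copied when first reached.
class CopyingSketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Copies everything reachable from the roots into toSpace and returns the new roots.
    static List<Obj> collect(List<Obj> roots, List<Obj> toSpace) {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>(); // old object -> its copy
        List<Obj> newRoots = new ArrayList<>();
        for (Obj r : roots) newRoots.add(copy(r, forwarding, toSpace));
        // Cheney-style scan: toSpace doubles as the work queue.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj copied = toSpace.get(scan);
            for (int i = 0; i < copied.refs.size(); i++) {
                copied.refs.set(i, copy(copied.refs.get(i), forwarding, toSpace));
            }
        }
        return newRoots;
    }

    // Returns the existing copy if the object was already moved, otherwise copies it now.
    static Obj copy(Obj old, Map<Obj, Obj> forwarding, List<Obj> toSpace) {
        Obj existing = forwarding.get(old);
        if (existing != null) return existing;
        Obj fresh = new Obj(old.name);
        fresh.refs.addAll(old.refs); // still old references; fixed when this copy is scanned
        forwarding.put(old, fresh);
        toSpace.add(fresh);
        return fresh;
    }
}
```

Unreachable objects are never visited, so the cost is proportional to the number of survivors, which is why the approach works well when few objects survive.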

3. Mark-Compact

The copying algorithm performs more copy operations when the object survival rate is high, so its efficiency drops. More importantly, if you do not want to waste 50% of the space, you need extra space as an allocation guarantee to handle the extreme case where 100% of the objects in the used memory survive. For these reasons the algorithm cannot be used directly in the old generation.

The "Mark-Compact" algorithm is tailored to the characteristics of the old generation. Its marking phase is the same as in "Mark-Sweep", but instead of reclaiming dead objects in place, it moves all surviving objects toward one end of the memory and then reclaims everything beyond the end boundary.
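The compaction step can be sketched on an array that stands in for a contiguous memory area. This is a minimal illustration: `CompactSketch` and its parameters are hypothetical, marking is assumed to have already happened, and the reference fix-up that a real collector must perform after moving objects is left out.

```java
import java.util.Arrays;

// Toy sliding compaction (hypothetical): survivors move toward index 0.
class CompactSketch {
    // Slides marked cells toward the low end and returns the new allocation boundary.
    static int compact(String[] heap, boolean[] marked) {
        int top = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[top++] = heap[i]; // top <= i, so no survivor is overwritten
            }
        }
        // Everything past the boundary is reclaimed in one step.
        for (int i = top; i < heap.length; i++) heap[i] = null;
        Arrays.fill(marked, false); // reset marks for the next cycle
        return top;
    }
}
```

After compaction the free space is one contiguous block starting at the returned boundary, so allocation can again be a simple pointer bump.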

The young-generation collection used with CMS (Young GC), G1, and ZGC are all based on the mark-copy algorithm, but differences in how each implements it lead to huge performance differences.

4. Generational collection

Modern virtual machines generally base their garbage collection on generational collection theory.

Generational collection is not a separate algorithm but a collection strategy. It combines the three algorithms above: memory is divided into several blocks according to object lifetime, and each block is collected with the algorithm that best suits its characteristics.

Today's commercial virtual machines use the copying algorithm to collect the young generation. Research by IBM shows that 98% of objects in the young generation die shortly after they are created, so there is no need to split the memory space 1:1.

The young generation of the heap is divided into one larger Eden space and two smaller Survivor spaces. At any time, Eden and one Survivor space (From Survivor) are in use. During collection, the surviving objects in Eden and From Survivor are copied in one pass to the other Survivor space (To Survivor), and then Eden and the just-used From Survivor space are cleaned up.

HotSpot's default ratio is Eden : From Survivor : To Survivor = 8:1:1, so the usable memory in the young generation is 90% (80% + 10%) of its total capacity, and only 10% is "wasted".
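The 8:1:1 arithmetic can be checked with a few lines of code. This is only a sketch of the ratio calculation: `YoungGenLayout` is a hypothetical helper, the `survivorRatio` parameter mirrors the meaning of HotSpot's `-XX:SurvivorRatio` flag (Eden size divided by one Survivor size), and real JVMs additionally apply alignment and adaptive sizing that are ignored here.

```java
// Hypothetical helper: splits a young generation as eden : s0 : s1 = ratio : 1 : 1.
class YoungGenLayout {
    // Returns { edenBytes, survivorBytes, usableBytes } for the given young generation size.
    static long[] layout(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2); // one of the two survivor spaces
        long eden = youngBytes - 2 * survivor;
        long usable = eden + survivor;                    // Eden plus the From Survivor space
        return new long[] { eden, survivor, usable };
    }

    public static void main(String[] args) {
        long[] sizes = layout(1000L * 1024 * 1024, 8);
        System.out.println("eden=" + sizes[0] + " survivor=" + sizes[1] + " usable=" + sizes[2]);
    }
}
```

With a ratio of 8, one tenth of the young generation is held back as the empty To Survivor space, which is the 10% "waste" mentioned above.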

2. Review of CMS and G1

1. CMS collector

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause. It achieves this because its GC worker threads can run concurrently with user threads while it is working.

The CMS collector only collects the old generation and is based on the mark-sweep algorithm. Its operation is divided into 4 steps:

  • Initial mark (CMS initial mark)
  • Concurrent mark (CMS concurrent mark)
  • Remark (CMS remark)
  • Concurrent sweep (CMS concurrent sweep)

Of these, initial mark and remark still require stop-the-world (STW) pauses. Initial mark only marks the objects directly reachable from the GC Roots, so it is very fast. Concurrent mark is the GC Roots tracing process. Remark corrects the mark records of objects whose reachability changed because the user program kept running during concurrent marking; its pause is generally slightly longer than the initial mark pause but much shorter than the concurrent mark phase.

CMS splits the collection cycle into pipeline-like stages so that the time-consuming work runs concurrently with application threads. Only the work that truly requires STW is separated out, scheduled at the right moments, and kept short. As a result, the whole collection cycle has just two short pauses (initial mark and remark), which makes collection approximately concurrent.

Advantages of CMS collector: concurrent collection, low pause.

CMS Collector Disadvantages:

  • The CMS collector is very sensitive to CPU resources.
  • The CMS collector cannot handle floating garbage (Floating Garbage).
  • The CMS collector is based on the mark-sweep algorithm and therefore inherits its drawbacks, such as memory fragmentation.
  • Pause times are unpredictable.

The fundamental reason the CMS collector can run concurrently is that it is built on the mark-sweep algorithm and breaks the algorithm into fine-grained steps. As described earlier, mark-sweep produces a large amount of memory fragmentation, which is unacceptable for the young generation, so there is no CMS version of a young-generation collector.

2. G1 collector

G1 has been the JVM's default garbage collector since JDK 9. Its defining characteristic is that it reduces pauses while maintaining a high collection rate.

G1 removes the physical division of the heap into young and old generations, but it is still a generational collector. It divides the heap into a number of areas called Regions, shown as small squares in the figure below. Some Regions serve as the young generation, some as the old generation, and some are set aside for humongous objects.

Like CMS, G1 traverses all objects and marks the references between them. After reclaiming dead objects, it copies and moves Regions to consolidate fragmented space.

The G1 recycling process is as follows.

  • Young generation collection in G1 uses the copying algorithm and runs in parallel; the whole collection is STW.
  • When G1 collects the old generation, it also collects the young generation at the same time. This is divided into four main stages:
      • Initial marking, which marks the root objects; this stage is STW;
      • Concurrent marking, which runs concurrently with user threads;
      • Final marking, which completes the tri-color marking cycle;
      • Copying/cleanup, which reclaims the Regions with the most reclaimable space first. This "garbage first" policy is where the name G1 comes from.

  

G1 cleans up incrementally, reclaiming only a subset of Regions in each collection rather than all of them, which keeps each GC pause from growing too long.
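The "garbage first" selection of a subset of Regions can be sketched as a greedy choice under a pause budget. This is an illustrative simplification and not G1's actual heuristics: `GarbageFirstSketch`, the `Region` fields, and the per-region cost estimates are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: pick the regions with the most reclaimable space that fit a pause budget.
class GarbageFirstSketch {
    static final class Region {
        final int id;
        final long reclaimableBytes;  // garbage that collecting this region would free
        final double estimatedCostMs; // assumed estimate of the time needed to evacuate it
        Region(int id, long reclaimableBytes, double estimatedCostMs) {
            this.id = id;
            this.reclaimableBytes = reclaimableBytes;
            this.estimatedCostMs = estimatedCostMs;
        }
    }

    static List<Region> chooseCollectionSet(List<Region> candidates, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        // Most garbage first.
        sorted.sort(Comparator.comparingLong((Region r) -> r.reclaimableBytes).reversed());
        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.estimatedCostMs > pauseBudgetMs) continue; // would exceed the pause target
            chosen.add(r);
            spent += r.estimatedCostMs;
        }
        return chosen;
    }
}
```

Regions that do not fit into the current budget are simply left for a later collection, which is what makes the cleanup incremental.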

G1's generations are logical rather than physical. The points to remember are its collection process and which stages pause the application.

In addition, G1 lets you set the Region size through JVM parameters, from 1MB to 32MB, as well as the expected maximum GC pause time, among other settings.
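As a small illustration of these settings, the sketch below shows a launch line using the HotSpot flags `-XX:G1HeapRegionSize` and `-XX:MaxGCPauseMillis` and prints the collectors the running JVM reports. The flag values are examples rather than recommendations, and `G1FlagsSketch` with its `isValidRegionSizeMb` helper is a hypothetical name; the helper encodes the rule that a Region size must be a power of two within the 1MB to 32MB range.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Example launch (values are illustrative only):
//   java -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:MaxGCPauseMillis=200 G1FlagsSketch
class G1FlagsSketch {
    // A region size is accepted only if it is a power of two between 1MB and 32MB.
    static boolean isValidRegionSizeMb(int mb) {
        return mb >= 1 && mb <= 32 && Integer.bitCount(mb) == 1;
    }

    public static void main(String[] args) {
        // Lists the collectors active in this JVM, which confirms which GC the flags selected.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " collections=" + gc.getCollectionCount());
        }
    }
}
```

The pause time setting is a target that G1 tries to meet, not a hard guarantee.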

3. The core principle of ZGC

Comparison of ZGC and other GC algorithms, from RednaxelaFX:

The core idea of this concurrent algorithm is:

  • In the marking phase, it marks pointers (recording whether each pointer in the GC heap has been marked) rather than objects (recording whether each object has been marked). This is very different from traditional GC algorithms that apply tri-color marking to objects, although the two are equivalent in terms of convergence: eventually all objects and all pointers are traversed.
  • During marking and object relocation, every pointer read from a reference-type field of an object in the GC heap passes through a "Loaded Value Barrier" (LVB). This is a read barrier that does different things in different phases. In the simplest terms, during marking it marks the pointer and "fixes" the pointer in the heap to the newly marked value; during relocation it updates the pointer just read to the object's new address and "fixes" the original field in the heap to that address. So even if the GC moves an object, the read barrier detects and corrects the pointer, and application code always holds an up-to-date, valid pointer. The GC and the application therefore never need stop-the-world, the coarsest-grained form of synchronization, to stay in sync.
  • A very important property of the LVB is that it is "self-healing": if a pointer in the heap is still in the "not yet updated" state, it is updated in place the first time it passes through the LVB, so if the same field is accessed again in the same GC cycle, no further correction is needed. The performance overhead (throughput drop) caused by the LVB is therefore short-lived, unlike the Brooks indirection pointer used by Shenandoah GC, which stays slow all the time.
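The self-healing behaviour can be sketched with a field modeled as an `AtomicLong` holding a colored pointer. This is a conceptual model, not ZGC's implementation: `LoadBarrierSketch`, the `forwarding` map, the `slowPathHits` counter, and the chosen bit positions are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a self-healing load barrier over colored pointers.
class LoadBarrierSketch {
    static final long ADDRESS_MASK = (1L << 42) - 1; // low 42 bits hold the address
    static final long M0 = 1L << 42;
    static final long M1 = 1L << 43;
    static final long REMAPPED = 1L << 44;

    long goodColor = REMAPPED;                          // color expected in the current phase
    final Map<Long, Long> forwarding = new HashMap<>(); // old address -> new address (assumed table)
    int slowPathHits = 0;                               // counts how often healing was needed

    // Called on every reference load from a heap field.
    long load(AtomicLong field) {
        long ptr = field.get();
        if (ptr == 0 || (ptr & goodColor) != 0) return ptr; // fast path: pointer already good
        slowPathHits++;
        long address = ptr & ADDRESS_MASK;
        long healed = forwarding.getOrDefault(address, address) | goodColor;
        field.compareAndSet(ptr, healed); // self-heal: write the corrected pointer back in place
        return healed;
    }
}
```

The second load of the same field takes the fast path, which is the point of self-healing: the slow path is paid at most once per field per phase.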

1. More comprehensive concurrency

Like ParNew (the young-generation collector paired with CMS) and G1, ZGC uses the mark-copy algorithm, but with significant improvements: ZGC runs almost entirely concurrently during the marking, transfer, and relocation stages. This is the most important reason ZGC meets its goal of pause times under 10ms.

The ZGC garbage collection cycle is shown in the following figure:

ZGC has only three STW phases: initial marking, re-marking, and initial transfer.

Initial marking and initial transfer each only need to scan all GC Roots, so their processing time is proportional to the number of GC Roots and is generally very short. The STW time of the re-marking stage is also very short, at most 1ms; if it exceeds 1ms, ZGC re-enters the concurrent marking phase. In other words, almost all ZGC pauses depend only on the size of the GC Roots set, and pause times do not grow with the size of the heap or of the live object set. By contrast, G1's transfer phase is entirely STW, and its pause time grows with the size of the surviving objects.

2. Pointer coloring and read barriers

ZGC uses colored pointers and read barriers to solve the problem of accessing objects correctly while they are being transferred, which makes concurrent transfer possible.

The general principle is as follows. "Concurrent" in concurrent transfer means that application threads keep accessing objects while GC threads are transferring them.

If an object has been transferred but its address has not yet been updated, an application thread may access the old address and cause an error.

In ZGC, an application thread that accesses an object triggers the read barrier. If the barrier finds that the object has been moved, it updates the pointer just read to the object's new address, so the application thread always accesses the object at its new address.

So how does the JVM determine that an object has been moved? It uses the address in the object reference itself, that is, the colored pointer.

The technical details of colored pointers and read barriers are described below.

(1) Colored pointers

Pointer coloring is a technique for storing information in the pointer itself.

ZGC only supports 64-bit systems. It divides the 64-bit virtual address space into multiple subspaces, as shown in the following figure:

Here, [0, 4TB) corresponds to the Java heap, [4TB, 8TB) is called the M0 address space, [8TB, 12TB) is called the M1 address space, [12TB, 16TB) is reserved and unused, and [16TB, 20TB) is called the Remapped space.

When an application creates an object, it first requests a virtual address in the heap space, but that virtual address is not mapped to a real physical address. At the same time, ZGC requests a virtual address for the object in each of the M0, M1, and Remapped address spaces. These three virtual addresses map to the same physical address, but only one of the three spaces is valid at any given time. ZGC sets up three virtual address spaces because it trades space for time to reduce GC pauses. The "space" in this trade is virtual address space, not real physical memory. The switching process among the three spaces is described in detail in a later section.

Corresponding to this address space layout, ZGC actually uses only bits 0~41 of the 64-bit address for the object address, while bits 42~45 store metadata and bits 46~63 are fixed at 0.

ZGC stores object liveness information in bits 42~45, which is completely different from traditional garbage collectors, which keep that information in the object header.
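The relationship between the metadata bits and the address ranges listed earlier can be made concrete with a few bit operations: setting bit 42 places an address in [4TB, 8TB), bit 43 in [8TB, 12TB), and bit 44 in [16TB, 20TB). The sketch below is an illustration only; `ColoredPointerSketch` and its method names are hypothetical, and the fourth metadata bit (bit 45) is not modeled.

```java
// Hypothetical helpers for a 42-bit address plus metadata bits 42..45.
class ColoredPointerSketch {
    static final int OFFSET_BITS = 42;
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1; // bits 0..41: object address
    static final long MARKED0 = 1L << 42;                    // base of the M0 view (4TB)
    static final long MARKED1 = 1L << 43;                    // base of the M1 view (8TB)
    static final long REMAPPED = 1L << 44;                   // base of the Remapped view (16TB)
    static final long METADATA_MASK = 0xFL << OFFSET_BITS;   // bits 42..45

    static long offset(long ptr) { return ptr & OFFSET_MASK; }

    static long metadata(long ptr) { return ptr & METADATA_MASK; }

    // Keeps the address bits and replaces the color.
    static long recolor(long ptr, long color) { return offset(ptr) | color; }

    static String view(long ptr) {
        long meta = metadata(ptr);
        if (meta == MARKED0) return "M0";
        if (meta == MARKED1) return "M1";
        if (meta == REMAPPED) return "Remapped";
        return "unknown";
    }
}
```

Recoloring changes only the high bits, so all three views of a pointer share the same low 42 bits, which matches the description of three virtual addresses mapping to one physical address.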

(2) Read barrier

A read barrier is a technique by which the JVM inserts a small piece of code into the application code. This code is executed when an application thread reads an object reference from the heap. Note that this code is only triggered by "reading an object reference from the heap".

Example of a read barrier:

Object o = obj.FieldA;   // Reads a reference from the heap, so a barrier is needed
// <Load barrier> (inserted by the JVM at this point)
Object p = o;            // No barrier needed: the reference is not read from the heap
o.dosomething();         // No barrier needed: the reference is not read from the heap
int i = obj.FieldB;      // No barrier needed: this is not an object reference

Purpose of the read barrier code in ZGC: during object marking and transfer, it checks whether the object's reference address satisfies the expected condition and takes the corresponding action.

3. Demonstration of ZGC concurrent processing

Next, we walk through how the address view switches during one ZGC garbage collection cycle:

  • Initialization: after ZGC initializes, the address view of the entire memory space is set to Remapped. The program runs normally and allocates objects in memory; once certain conditions are met, garbage collection starts and the marking phase begins.
  • Concurrent marking phase: the first time the marking phase is entered, the view is M0. If an object is accessed by a GC marking thread or an application thread, its address view is changed from Remapped to M0. So when marking ends, each object's address is in either the M0 view or the Remapped view. If it is in the M0 view, the object is live; if it is in the Remapped view, the object is not live.
  • Concurrent transfer phase: after marking ends, the transfer phase begins and the address view is set to Remapped again. If an object is accessed by a GC transfer thread or an application thread, its address view is changed from M0 to Remapped.

The marking phase actually has two address views, M0 and M1, while the process above uses only one. There are two so that the previous marking cycle can be told apart from the current one: when the concurrent marking phase is entered for the second time, the address view is set to M1 instead of M0.
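The view switching across cycles can be summarized as a small state machine. This sketch only models the global view described above; `AddressViewSketch` and its method names are hypothetical, and per-object pointer updates are not modeled.

```java
// Hypothetical state machine for the global address view across GC cycles.
class AddressViewSketch {
    enum View { REMAPPED, M0, M1 }

    View current = View.REMAPPED; // view right after initialization
    View lastMark = View.M1;      // chosen so that the first marking cycle uses M0

    // Entering concurrent marking alternates between M0 and M1.
    View startMarking() {
        lastMark = (lastMark == View.M0) ? View.M1 : View.M0;
        current = lastMark;
        return current;
    }

    // Entering concurrent transfer always switches back to Remapped.
    View startTransfer() {
        current = View.REMAPPED;
        return current;
    }

    // After marking ends, a pointer in the current cycle's mark view denotes a live object.
    boolean isLive(View pointerView) {
        return pointerView == lastMark;
    }
}
```

Alternating the mark view means a pointer still colored with the previous cycle's view is recognizably stale, which is how marks from two consecutive cycles are kept apart.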

Colored pointers and read barriers are used not only in the concurrent transfer phase but also in the concurrent marking phase. To mark an object, a traditional garbage collector must perform a memory access and store the liveness information in the object header. In ZGC, it only needs to set bits 42~45 of the pointer, and because that is a register access, it is faster than accessing memory.

4. What are the disadvantages of ZGC

Although ZGC is among the newest GC technologies, it is not necessarily the best choice in every case.

ZGC has a clear advantage only in specific situations, such as huge heaps and extremely strict pause requirements. In practice, most applications (especially on the server side) have no serious problem on either front and care more about GC throughput and efficiency.

GC technology has not advanced dramatically in recent years; in other words, there is no silver bullet. Every advantage is bought at the expense of something else, and ZGC is a clear example. Its official goal is a throughput loss of no more than 15% compared with G1, which means it cannot match G1 in throughput, let alone a fully STW collector.

When ZGC is running, the following problems can be observed:

  1. Low single-generation GC throughput: the most notable problem is that the Concurrent Mark phase has to mark the whole heap, which takes a long time, so collection may not keep up with the object allocation rate:
      • An allocation stall (Allocation Stall) occurs and a new ZGC cycle has to be started. During that ZGC cycle, all application threads must be suspended;
      • In the worst case an OOM occurs: if there is still not enough free space during the Concurrent Relocate stage, an OOM is thrown;
  2. GC threads running concurrently with the application drive up CPU usage;
  3. Because ZGC uses colored pointers, it does not support UseCompressedOops (ShenandoahGC, by contrast, does), which hurts performance on small heaps (below 32GB) to some extent;
      • However, since JDK 15, UseCompressedClassPointers can still be enabled when UseCompressedOops is turned off, which mitigates this to some extent;
  4. Object allocation stalls. Besides the ZGC pause phases, allocation is also affected by the following factors:
      • Page Cache Flush slows allocation: ZGC divides the heap into pages of different sizes (the counterpart of G1's Regions), namely small, medium, and large pages, and objects of different sizes are allocated to different page types. If the allocation rate of the various object sizes is unstable (for example, the number of medium-sized objects suddenly increases), large or small pages have to be converted into medium pages, which is time-consuming. This has been eased since ZGC became production-ready in JDK 15;
      • Only a single medium page: with many application threads, if several threads allocate medium-sized objects at the same time and the current medium page does not have enough free space, they all request a new medium page at once. Undoing the redundant allocations delays allocation and may also trigger the Page Cache Flush described above;
  5. RSS is particularly high, up to 3 times Xmx, which is caused by ZGC's multi-mapping mechanism.

Origin blog.csdn.net/leread/article/details/129899146