A step-by-step guide to the CMS and G1 garbage collectors

Why the new generation and the old generation use different garbage collection algorithms

In the new generation, each collection finds that most objects have died and only a few survive, so the copying algorithm is used: a collection costs only as much as copying the small number of survivors. In the old generation, objects have a high survival rate and there is no extra space to act as an allocation guarantee, so the "mark-sweep" or "mark-compact" algorithm is needed.
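To make the cost argument concrete, here is a minimal sketch of a copying collection. The `Obj` type and the `evacuate` method are hypothetical names invented for illustration, not HotSpot code. The point it shows is that only reachable objects are ever visited, so the work grows with the number of survivors rather than with the size of the space.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class CopyingSketch {
    /** Hypothetical heap object: a name plus outgoing references. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /**
     * Traces from the roots and copies every reachable object into "to" space.
     * Returns the forwarding table (old object -> copy). Dead objects are never
     * visited, so the work done is proportional to the number of survivors.
     */
    static Map<Obj, Obj> evacuate(List<Obj> roots) {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>();
        Deque<Obj> pending = new ArrayDeque<>();
        for (Obj root : roots) {
            if (forwarding.putIfAbsent(root, new Obj(root.name)) == null) pending.add(root);
        }
        while (!pending.isEmpty()) {
            Obj old = pending.poll();
            Obj copy = forwarding.get(old);
            for (Obj child : old.refs) {
                if (forwarding.putIfAbsent(child, new Obj(child.name)) == null) pending.add(child);
                copy.refs.add(forwarding.get(child)); // the copy points at copies
            }
        }
        return forwarding;
    }

    public static void main(String[] args) {
        List<Obj> eden = new ArrayList<>();
        for (int i = 0; i < 1000; i++) eden.add(new Obj("o" + i));
        eden.get(0).refs.add(eden.get(1)); // o0 -> o1
        eden.get(1).refs.add(eden.get(2)); // o1 -> o2
        Map<Obj, Obj> copied = evacuate(List.of(eden.get(0)));
        System.out.println("copied " + copied.size() + " of " + eden.size() + " objects");
    }
}
```

With a single root that reaches two other objects, only three of the thousand objects are copied; the rest are reclaimed simply by discarding the old space.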

 

The relationship between garbage collectors

image.png

The upper part of the figure shows the new-generation collectors and the lower part shows the old-generation collectors. Collectors joined by a line can be used together. There is no best garbage collector, only the one that best fits the workload.

 

Serial/Serial Old

The oldest garbage collector, present since the JVM was first released. It is single-threaded and exclusive (it stops all user threads while it works), suits single-CPU machines, and is generally used in client mode.

It is only suitable for heaps of a few dozen megabytes up to one or two hundred megabytes, where the pause time can be kept to about 100 ms. Beyond that size, collection becomes very slow, so today this collector is of marginal value. (In practice it mostly serves as the fallback for CMS.)

image.png

At run time, a JVM parameter selects the serial collector for both the new generation and the old generation.
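One way to check which collectors a running JVM actually uses is the standard `java.lang.management` API. The sketch below only lists the collector names; the class name `ActiveCollectors` is made up for this example. Starting the JVM with `-XX:+UseSerialGC` should typically report the serial pair (on HotSpot these usually appear as `Copy` and `MarkSweepCompact`), but the exact names depend on the JVM version and vendor.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    /** Returns the names of the collectors the current JVM registered. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run with different -XX:+Use...GC flags and compare the output.
        names().forEach(System.out::println);
    }
}
```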

 

Parallel Scavenge/Parallel Old

To improve efficiency, the JVM began to use multi-threaded garbage collection starting with JDK 1.3. These collectors focus on throughput: high throughput uses CPU time efficiently so that the program's computation finishes as soon as possible. They mainly suit background jobs with little interaction, that is, CPU-intensive tasks. (Aside: a task is CPU-intensive when the CPU spends most of its time computing, and IO-intensive when the CPU spends most of its time waiting for IO reads and writes. IO-intensive work is mostly bound by the disk, so it is worth learning the difference between mechanical hard drives and solid-state drives.)

Throughput: the ratio of the time the CPU spends running user code to the total CPU time consumed, that is, throughput = time running user code / (time running user code + garbage collection time). For example, if the virtual machine runs for a total of 100 minutes and garbage collection takes 1 minute of that, the throughput is 99%. These collectors shorten STW by brute force (more GC threads) rather than by a fundamentally better design, but the results are good. (An analogy: Mom asks me to study for 8 hours today. I study for half an hour and then play on my phone for an hour and a half. When the 8 hours are up I tell her "Mom, I'm finished!", but my actual throughput is very low. In short, throughput is like concentration.)
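The formula can be checked with the numbers from the text (99 minutes of user code, 1 minute of garbage collection). The class and method names here are illustrative only.

```java
class Throughput {
    /** throughput = user time / (user time + GC time) */
    static double of(double userMinutes, double gcMinutes) {
        return userMinutes / (userMinutes + gcMinutes);
    }

    public static void main(String[] args) {
        // 100 minutes in total, 1 of them spent in garbage collection
        System.out.printf("throughput = %.0f%%%n", of(99, 1) * 100);
    }
}
```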

This collector is suitable for heaps from about a hundred megabytes to several gigabytes.

image.png

At run time, JVM parameters select the parallel (multi-threaded) collectors for the new generation and the old generation.

 

-XX:MaxGCPauseMillis

This parameter sets a target for the maximum STW pause time. Setting it to a small value does not make garbage collection faster, though. The pause time is shortened at the expense of throughput and new-generation space: the collector shrinks the new generation, so each collection has less to do and each STW is shorter, but collections are triggered more often. Every collection also has a start-up cost, so more frequent collection lowers throughput. For the parallel collectors, therefore, this value can be left at its default. The parameter mainly serves the G1 garbage collector.
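The trade-off can be illustrated with a toy model. Everything in it is an assumption made up for this sketch: a fixed start-up cost per collection, a cost that grows linearly with the size of the new generation, and a constant allocation rate. It is not how the JVM computes anything; it only shows why a smaller new generation gives shorter pauses but lower throughput.

```java
class PauseVersusThroughput {
    // All three constants are assumed values for the toy model.
    static final double FIXED_MS = 2.0;           // start-up cost of each collection
    static final double MS_PER_MB = 0.1;          // cost per MB of new generation
    static final double ALLOC_MB_PER_SEC = 100.0; // allocation rate of the application

    /** Length of one pause for a given new-generation size. */
    static double pauseMs(double youngMb) {
        return FIXED_MS + MS_PER_MB * youngMb;
    }

    /** How often the new generation fills up and must be collected. */
    static double collectionsPerSecond(double youngMb) {
        return ALLOC_MB_PER_SEC / youngMb;
    }

    /** Fraction of each second left for user code. */
    static double throughput(double youngMb) {
        double gcMsPerSecond = pauseMs(youngMb) * collectionsPerSecond(youngMb);
        return 1.0 - gcMsPerSecond / 1000.0;
    }

    public static void main(String[] args) {
        for (double youngMb : new double[] {500, 50}) {
            System.out.printf("young=%.0f MB pause=%.1f ms throughput=%.4f%n",
                    youngMb, pauseMs(youngMb), throughput(youngMb));
        }
    }
}
```

In this model the fixed cost is paid more often as the new generation shrinks, which is exactly the "each start-up also takes time" effect described above.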

 

-XX:+UseAdaptiveSizePolicy

This parameter is on by default. When it is enabled, there is no need to set by hand detailed parameters such as the size of the new generation, the ratio of Eden to Survivor, or the size of objects promoted to the old generation. The virtual machine collects performance monitoring information from the running system and adjusts these parameters dynamically to achieve the most appropriate pause time or the maximum throughput.

 

ParNew

A multi-threaded new-generation collector designed to work with CMS. CMS itself collects only the old generation, and the new-generation collectors it can be paired with are Serial and ParNew. ParNew is basically the same as Parallel Scavenge: multi-threaded, suited to multiple CPUs, and with shorter pauses than Serial.

 

CMS (Concurrent Mark Sweep)

Note: only the workflow of the old-generation CMS collector is covered here.

image.png

The CMS garbage collector aims for the shortest possible pause time. A large share of Java applications run on the server side of Internet sites or B/S systems. Such applications care most about response speed and want the shortest system pauses so that users get a better experience. The CMS collector is based on the "mark-sweep" algorithm. Its operation is more complicated than that of the collectors above and is divided into four steps.

The four steps of CMS garbage collection

Initial mark

Short and fast, with STW: only the objects that GC Roots reference directly are marked. (This fixes the starting points of the trace for this cycle. GC Roots and garbage that appear after the initial mark are left for the next garbage collection.)

Concurrent mark

Long, and runs at the same time as the user threads. Starting from the objects marked in the initial mark, it marks every object reachable from GC Roots. Because this takes a relatively long time, it is done concurrently.

Remark

Short, with STW. It corrects the marks of objects whose references changed because user threads kept running during the concurrent mark. The pause is slightly longer than the initial mark but much shorter than the concurrent mark. Many readers find this confusing at first. (I did not understand it the first time, and I am not sure I fully understand it now.) First, what happens when user threads and GC threads run concurrently? A GC thread walks the object graph sequentially while marking. If the references of an object it has already marked change afterwards, it cannot go back and handle that during the marking pass.

There are two cases. In the first, a user thread drops the last reference to an object that the GC thread has already marked as alive, so the object is now actually dead. The GC thread does not deal with it in this cycle; the object lives a little longer and is reclaimed in the next garbage collection. This is what we call floating garbage.

In the second case, the GC thread has not marked an object and would treat it as garbage, but the object is then referenced again by another object that the GC thread has already determined to be alive. By reachability analysis the object is not garbage, yet the GC thread, which simply continues its concurrent marking, never marks it. Such an object could be reclaimed while still in use. The remark phase exists to avoid this problem.

In short, concurrent marking gets 90% or even 99% of the live objects right. Remarking is the doctor confirming once more that the patient is really dead before cremation: it marks as alive the "resurrected" objects that concurrent marking missed. How this is done is covered in a later article on the missed-mark problem in tri-color marking and its solutions.
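The second case can be sketched as follows. This is a deliberately simplified model, not the CMS implementation: the `Node` type, the `writeRef` barrier and the `dirty` queue are hypothetical stand-ins for the bookkeeping a real collector does. The idea shown is that a store into an already marked object is recorded during concurrent marking, and the remark pause rescans the recorded objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RemarkSketch {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Holders whose references changed after they were marked. */
    private final Deque<Node> dirty = new ArrayDeque<>();

    /** Marks everything reachable from start (stands in for the concurrent mark pass). */
    void mark(Node start) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.marked) continue;
            n.marked = true;
            for (Node child : n.refs) stack.push(child);
        }
    }

    /** Write barrier: a user thread stores a reference while marking is running. */
    void writeRef(Node holder, Node target) {
        holder.refs.add(target);
        if (holder.marked) dirty.add(holder); // the marker will not revisit holder by itself
    }

    /** Remark (STW): rescan the recorded holders so nothing live is missed. */
    void remark() {
        while (!dirty.isEmpty()) {
            for (Node child : dirty.poll().refs) mark(child);
        }
    }

    public static void main(String[] args) {
        RemarkSketch gc = new RemarkSketch();
        Node root = new Node("root"), a = new Node("a"), b = new Node("b");
        root.refs.add(a);
        gc.mark(root);     // root and a are marked, b is not
        gc.writeRef(a, b); // a user thread makes b reachable again
        System.out.println("b marked before remark: " + b.marked);
        gc.remark();
        System.out.println("b marked after remark:  " + b.marked);
    }
}
```

Without the remark step, `b` would stay unmarked and be swept even though `a` now references it.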

Concurrent sweep

The concurrent mark and concurrent sweep phases take the longest in the whole cycle, and in both the GC threads work alongside the user threads. So overall, the memory reclamation of the CMS collector runs concurrently with the user threads.

CMS lets this relatively long GC work run concurrently with user threads, and it is precisely this that earned it its reputation as a collector focused on low pause times. It greatly reduces STW time, but what is the price?

Disadvantages

CPU sensitive

CMS is very sensitive to processor resources because marking and sweeping run concurrently with the application (user threads + GC threads). The impact on the application is greater when there are fewer cores than threads. During concurrent marking and concurrent sweeping, the extra GC threads give the CPU more work. When the number of runnable threads exceeds the number of CPU cores, the CPU switches context frequently to keep them all running, and context switching inevitably slows the program down.

 

Floating garbage

While CMS sweeps concurrently, user threads keep producing garbage. This garbage appears after marking has finished, so CMS cannot deal with it in the current collection and must leave it for the next collection to mark and sweep. This garbage is called "floating garbage".

The official descriptions of floating garbage are not always easy to follow. The reason floating garbage is hard to collect is that user threads and GC threads run concurrently in the concurrent marking phase. The GC thread may confirm at one moment that an object is reachable from the roots, and a user thread may break that reference chain the next moment. The GC thread cannot easily go back and reclassify the object as garbage. That is floating garbage. It has little effect in practice, so it is simply left for the next garbage collection.
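A small simulation, using made-up `Node`, `mark` and `sweep` helpers rather than anything from the JVM, shows an object surviving one cycle as floating garbage and being reclaimed in the next.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class FloatingGarbageSketch {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Marks everything reachable from root. */
    static void mark(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.marked) continue;
            n.marked = true;
            for (Node child : n.refs) stack.push(child);
        }
    }

    /** Keeps marked objects, drops the rest, and clears the marks for the next cycle. */
    static List<Node> sweep(List<Node> heap) {
        List<Node> survivors = new ArrayList<>();
        for (Node n : heap) {
            if (n.marked) {
                n.marked = false;
                survivors.add(n);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        Node root = new Node("root"), a = new Node("a");
        root.refs.add(a);
        List<Node> heap = new ArrayList<>(List.of(root, a));

        mark(root);        // a is confirmed reachable
        root.refs.clear(); // a user thread then drops the reference
        heap = sweep(heap);
        System.out.println("after cycle 1: " + heap.size() + " objects"); // a floats

        mark(root);
        heap = sweep(heap);
        System.out.println("after cycle 2: " + heap.size() + " objects"); // a is gone
    }
}
```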

Reserved space

Because user threads run concurrently with GC threads, they keep allocating objects while the GC is marking, so some memory must be reserved for them. This also means that CMS cannot, like other old-generation collectors, wait until the old generation is almost full before triggering a collection. In JDK 1.6, the default old-generation occupancy threshold that starts a CMS cycle is 92%.

If the reserved space is not enough, a Concurrent Mode Failure occurs. The virtual machine then temporarily falls back to Serial Old in place of CMS: it pauses the user threads and runs the single-threaded old-generation collector. (CMS never works with the multi-threaded old-generation collector. It is always one or the other.)
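The two conditions above can be written down as simple predicates. This is only a sketch of the decision logic as described in the text; the method names are invented, and the real JVM uses more elaborate heuristics. On HotSpot the threshold itself is tuned with `-XX:CMSInitiatingOccupancyFraction`.

```java
class CmsTriggerSketch {
    /** Start a concurrent cycle once old-generation occupancy reaches the threshold. */
    static boolean shouldStartConcurrentCycle(long usedBytes, long capacityBytes, int thresholdPercent) {
        return usedBytes * 100 >= capacityBytes * thresholdPercent;
    }

    /** The reserve was too small: objects need more room than is left while CMS is still running. */
    static boolean concurrentModeFailure(long freeBytes, long bytesNeeded) {
        return bytesNeeded > freeBytes;
    }

    public static void main(String[] args) {
        long capacity = 1000L * 1024 * 1024; // assumed 1000 MB old generation
        long used = 930L * 1024 * 1024;
        System.out.println("start CMS cycle: " + shouldStartConcurrentCycle(used, capacity, 92));
        System.out.println("fall back to Serial Old: "
                + concurrentModeFailure(capacity - used, 100L * 1024 * 1024));
    }
}
```

A lower threshold leaves more reserve but starts collections earlier and more often; a higher one risks the Serial Old fallback.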

 

Memory fragmentation

The mark-sweep algorithm that CMS uses produces memory fragments.

As the program runs longer and longer, the old generation becomes more and more fragmented, and Full GC happens more and more often. This clearly works against the long-term stable operation of a server.

Fragmentation makes it hard to allocate large objects, which need contiguous space. To address this, CMS provides the parameter -XX:+UseCMSCompactAtFullCollection, which is enabled by default. When a large object cannot be allocated, the memory is defragmented with the mark-compact algorithm.

image.png

The compaction is not done by CMS itself but by the most primitive collector, Serial Old. Because CMS is typically used on servers with large memory (tens of gigabytes), the efficiency of Serial Old there can be imagined. Compaction that can take hours is unacceptable to users. For that reason, many early CMS deployments had a rule that the server must be restarted every day or every few days.
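Why fragmentation hurts large allocations can be seen in a tiny model where the heap is an array of equally sized slots (`true` means occupied). The model and its method names are assumptions for illustration only. Half of the heap is free, yet no two free slots are adjacent, so a two-slot object cannot be placed until the live objects are compacted.

```java
class FragmentationSketch {
    /** Number of free slots in total. */
    static int totalFree(boolean[] used) {
        int free = 0;
        for (boolean u : used) if (!u) free++;
        return free;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(boolean[] used) {
        int best = 0, run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    /** Mark-compact in miniature: slide all live slots to the front. */
    static boolean[] compact(boolean[] used) {
        boolean[] out = new boolean[used.length];
        int live = used.length - totalFree(used);
        for (int i = 0; i < live; i++) out[i] = true;
        return out;
    }

    public static void main(String[] args) {
        boolean[] heap = {true, false, true, false, true, false, true, false};
        System.out.println("free slots: " + totalFree(heap)
                + ", largest contiguous run: " + largestFreeRun(heap));
        System.out.println("largest run after compaction: " + largestFreeRun(compact(heap)));
    }
}
```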

  

CMS summary

Overall, CMS is a very representative concurrent garbage collector for the JVM. But its mark-sweep algorithm causes serious memory fragmentation. When fragmentation noticeably hurts performance, CMS does compact the memory with the mark-compact algorithm during a Full GC, but Full GC is itself something we try to avoid (it is time-consuming and can cause stalls that affect the user experience). When a large object meets a fragmented heap, a Full GC is likely to be triggered early. This is why CMS has never been the default collector in any JDK version.

Extension: why not use the mark-compact algorithm?

Because CMS removes garbage concurrently. Mark-compact moves objects and so changes their addresses, which would invalidate references held by running user threads and break the application.

This collector is suitable for heaps from a few gigabytes to about 20 GB and can serve as a server-side garbage collector.

So how can garbage collection on a large-memory server stay stable over the long term and still give a good user experience?

 

G1 (Garbage First)

Design ideas

As JVM heaps grow, STW time has become an urgent problem for the JVM. With the traditional generational model, STW remains unpredictable. (Garbage is always cleared one whole wave after another.)

To make STW time predictable, a change in thinking is needed first. G1 breaks the whole into parts: it divides the heap into multiple independent regions (Regions) of equal size. Each region can act, as needed, as Eden or Survivor space of the new generation, as old-generation space, or as a Humongous region for large objects. The collector applies different strategies to regions playing different roles, so that it collects well both newly created objects and old objects that have survived several collections.

In an interview, just mentioning the idea of "breaking the whole into parts" tells the interviewer that you know the subject.

Region

A Region may be Eden, Survivor, or old-generation space, or it may be a special Humongous region dedicated to large objects. G1 treats any object larger than half a Region as a large object. Very large objects that exceed the capacity of a whole Region are stored in N contiguous Humongous Regions. In most cases G1 treats Humongous Regions as part of the old generation when collecting.
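The sizing rule can be expressed in two small functions. The names are illustrative, and the 2 MB region size used in `main` is just an assumed value.

```java
class HumongousSketch {
    /** An object is "large" when it exceeds half of a Region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Contiguous Regions needed to hold the object (rounded up). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        long region = 2 * mb; // assumed Region size
        System.out.println("1 MB object humongous? " + isHumongous(1 * mb, region));
        System.out.println("1 MB + 1 byte humongous? " + isHumongous(1 * mb + 1, region));
        System.out.println("Regions for a 5 MB object: " + regionsNeeded(5 * mb, region));
    }
}
```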

image.png

Parameter settings

Enabling G1

-XX:+UseG1GC

Region size

-XX:G1HeapRegionSize

It is generally recommended to increase this value gradually. With larger Regions, garbage stays around longer and the interval between GCs grows, but each GC also takes longer.

 

Maximum GC pause time

-XX:MaxGCPauseMillis

This parameter is important. As mentioned earlier, it can also be set for the multi-threaded Parallel Scavenge/Parallel Old collectors, but there it is basically ineffective or even counterproductive, because those collectors cannot control GC time in any real sense. If the maximum pause time is reduced, all they can do is shrink the new generation. Collecting 3 MB is certainly faster than collecting 5 MB, but the drawback is obvious: GC becomes more frequent, which lowers throughput.

G1 was created to make this parameter controllable. The key to G1's control over pause time is that it tracks the reclamation value of every Region: how much of each Region is garbage and how long reclaiming it would take. It then collects as much garbage as possible in the limited time, keeping the STW time within the range specified by the developer.

As shown in the figure below, G1 tries to keep the total time of the three STW pauses within the MaxGCPauseMillis we set. A well-chosen MaxGCPauseMillis is short enough that, in theory, users do not notice any stall. This keeps the server stable over the long term and removes the limit that heap size used to place on garbage collection.
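The selection idea can be sketched as a greedy choice: rank Regions by how much garbage they free per millisecond of predicted work, then take Regions in that order while they still fit in the pause budget. This is a simplified illustration under assumed inputs, not G1's actual prediction model; the `Region` type, its fields and the numbers in `main` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CollectionSetSketch {
    static final class Region {
        final String id;
        final long garbageBytes;  // how much would be freed
        final double predictedMs; // how long reclaiming it is expected to take
        Region(String id, long garbageBytes, double predictedMs) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.predictedMs = predictedMs;
        }
        double efficiency() { return garbageBytes / predictedMs; }
    }

    /** Picks the most rewarding Regions that fit into the pause budget. */
    static List<Region> choose(List<Region> candidates, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));
        List<Region> chosen = new ArrayList<>();
        double spentMs = 0;
        for (Region r : sorted) {
            if (spentMs + r.predictedMs <= pauseBudgetMs) {
                chosen.add(r);
                spentMs += r.predictedMs;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region("A", 80, 10), new Region("B", 90, 30),
                new Region("C", 20, 5), new Region("D", 50, 10));
        for (Region r : choose(regions, 25)) System.out.println(r.id);
    }
}
```

Region B holds the most garbage in absolute terms but is left out because it would blow the 25 ms budget, which is the "garbage first, within the time limit" behaviour the text describes.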

Working process

image.png

The running process of G1 can be roughly divided into 4 steps:

Initial mark

Only the objects that GC Roots reference directly are marked, and the TAMS pointer is updated so that user threads running concurrently in the next phase can correctly allocate new objects in the available Regions. This phase requires STW, but it is very short and is done as part of a Minor GC, so it adds no extra pause to the G1 collector.

What is TAMS?

This passage is my own understanding. Whether we use mark-sweep or mark-compact, the traversal of the object graph must be performed on a consistent snapshot. This snapshot is similar in spirit to the SATB snapshot: its role is to fix, at the initial mark, the whole object graph that this cycle has to traverse. GC Roots and objects added while the GC is running are therefore outside the scope of this GC and are all left for the next one. Separating the data this way also makes it easier to see, in tri-color marking, why a missed mark requires both that the reference chain from a gray object to a white object is broken and that the white object is newly referenced by an already scanned black object, while breaking a chain from a white object to a white object and attaching it to a black object does not count. (I wonder whether other readers have racked their brains over this as I did; books and other blogs take it for granted.)

With that background, TAMS (Top at Mark Start) is how G1 excludes objects newly allocated by user threads during the concurrent marking of the current GC. G1 gives each Region two pointers called TAMS and sets aside part of the Region for objects allocated during the concurrent cycle. Objects allocated above these pointers are assumed to be alive and are not included in the scope of this GC.
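A minimal sketch of the idea, using a single TAMS pointer per Region for brevity (the text mentions two) and a hypothetical `Region` class with plain `long` addresses: whatever is allocated after marking starts lies at or above TAMS and is treated as live without being traced.

```java
class TamsSketch {
    static final class Region {
        long top;  // next free address; allocation bumps this pointer
        long tams; // value of top when marking started

        Region(long bottom) {
            this.top = bottom;
            this.tams = bottom;
        }

        /** Bump-pointer allocation; returns the address of the new object. */
        long allocate(long size) {
            long address = top;
            top += size;
            return address;
        }

        /** Initial mark: remember where allocation stood. */
        void startMarking() {
            tams = top;
        }

        /** Objects allocated after marking started are assumed live for this cycle. */
        boolean implicitlyLive(long address) {
            return address >= tams;
        }
    }

    public static void main(String[] args) {
        Region region = new Region(0);
        long before = region.allocate(16);
        region.startMarking();
        long during = region.allocate(16);
        System.out.println("allocated before marking, implicitly live? " + region.implicitlyLive(before));
        System.out.println("allocated during marking, implicitly live? " + region.implicitlyLive(during));
    }
}
```

Objects below TAMS still have to be proven live by marking; only the ones above it get the free pass.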

Concurrent mark

Starting from GC Roots, G1 analyzes the reachability of objects in the heap, recursively scanning the whole object graph to find the objects to reclaim. This phase takes a long time but runs concurrently with the user program. After the scan, objects whose references changed during the concurrent phase still have to be dealt with. (These fall into floating garbage and missed marks. Floating garbage is described in this article; missed marks are covered in the next one.)

Final mark (STW)

G1 uses SATB (snapshot-at-the-beginning) to handle the missed-mark problem.
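The gist of SATB can be sketched with a pre-write barrier: while marking is active, the value about to be overwritten is logged, and the final mark treats every logged object as live because it was reachable when the snapshot was taken. The code below is a heavily simplified assumption (one reference field per object, a single queue, hypothetical names) and not G1's real barrier.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SatbSketch {
    static final class Node {
        Node ref;       // a single reference field, for brevity
        boolean marked;
    }

    private final Deque<Node> satbQueue = new ArrayDeque<>();
    boolean markingActive;

    /** Pre-write barrier: log the old value before it is overwritten. */
    void writeRef(Node holder, Node newValue) {
        if (markingActive && holder.ref != null) satbQueue.add(holder.ref);
        holder.ref = newValue;
    }

    /** Marks n and everything reachable through its ref chain. */
    static void mark(Node n) {
        while (n != null && !n.marked) {
            n.marked = true;
            n = n.ref;
        }
    }

    /** Final mark (STW): everything logged was live at the snapshot, so keep it. */
    void finalMark() {
        while (!satbQueue.isEmpty()) mark(satbQueue.poll());
        markingActive = false;
    }

    public static void main(String[] args) {
        SatbSketch gc = new SatbSketch();
        Node black = new Node(), gray = new Node(), white = new Node();
        gray.ref = white;
        black.marked = true; // already fully scanned
        gray.marked = true;  // marked, but its field has not been scanned yet
        gc.markingActive = true;

        gc.writeRef(black, white); // the black object now references white
        gc.writeRef(gray, null);   // the gray -> white link is cut; white is logged
        // The concurrent marker later scans gray, sees null, and never reaches white.

        gc.finalMark();
        System.out.println("white survives: " + white.marked);
    }
}
```

The price of this approach is some extra floating garbage: an object that was logged but has truly become unreachable still survives the current cycle.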

Screening and recycling

Other collectors can reclaim concurrently, and in principle G1 could too. So why stop the world for reclamation? The key word is screening. With the Region design, the reclamation value and cost of every Region are sorted. Based on the expected STW time we set, a set of Regions is selected for collection. The surviving objects in the selected Regions are then copied to empty Regions, and finally the old Regions are cleared in their entirety.

Look closely: isn't this the copying algorithm plus the mark-compact algorithm?

Because this moves objects, it must be done under STW, with multiple GC threads working in parallel.

Features

Parallelism and concurrency

G1 takes full advantage of multi-CPU, multi-core hardware, using multiple CPUs to shorten the STW time. In the concurrent marking phase, user threads run concurrently with GC threads.

Generational collection

Like other collectors, G1 keeps the concept of generations. Viewed as a whole, G1 is a collector based on the "mark-compact" algorithm; viewed locally (between two Regions), it is based on the "copying" algorithm. Either way, G1 does not produce memory fragmentation while running and leaves regular, usable memory after a collection, which helps programs that run for a long time. Allocating a large object will not trigger an early Full GC for lack of contiguous memory.

Pause-time target

-XX:MaxGCPauseMillis specifies the target maximum pause time, and G1 works out by itself which set of Regions to reclaim. (G1 does not try to collect all garbage in one go; it collects what it can within the specified time. Collections may become more frequent, but they are barely noticeable because the STW time of each one is controlled.)

Usage scenarios

The larger the heap, the better G1 does. Generally speaking, the break-even point between CMS and G1 is a heap of 6 to 8 GB. Above that, G1 is the better choice.

Why CPU sensitivity is not listed among the features

Compared with CMS, G1 has no concurrent sweep step. (A sweep can run concurrently because it does not modify reference relationships.) G1 is therefore less sensitive to CPU resources.

 

Stop The World

Look at the first-generation collectors (Serial/Serial Old). For both the new generation and the old generation, they suspend all user threads and run the GC thread until the garbage collection is complete. This pause is called "Stop The World" (STW). It gives users a very bad experience. When the Eden area of the new generation is full, a garbage collection is required. If it takes an hour to fill Eden once, the program may stop responding for five minutes at that point. If an old-generation collection is triggered over time (by default the old generation is twice the size of the new generation), we might as well take a break between classes. This is one of the reasons why the performance of the early JVM was criticized by the C/C++ community. The JVM team has therefore worked hard to eliminate STW or reduce its duration.


Origin blog.csdn.net/weixin_47184173/article/details/109635083