[JVM Basics] - Garbage Collector

1. What is a garbage collector

As the name suggests, a garbage collector reclaims objects in memory that have been judged to be no longer in use. During its scan, however, the collector actually looks for and marks the objects that are still alive. Once all surviving objects have been found, the unmarked objects are reclaimed together.

A garbage collector has three jobs:

  1. Allocate memory: the design of the garbage collection algorithm often determines the memory model and how memory is allocated.
  2. Ensure that live objects are not affected by garbage collection.
  3. Reclaim garbage.
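
The three jobs above can be sketched with a toy heap. This is a minimal illustration, not how a real JVM works: ToyHeap, Obj, allocate and collect are hypothetical names invented for the example, and the "heap" is just a list of Java objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy heap: every name here is hypothetical and only illustrates mark-then-reclaim.
final class ToyHeap {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> objects = new ArrayList<>(); // everything allocated so far
    final List<Obj> roots = new ArrayList<>();   // stand-in for GC Roots

    // Job 1: allocation goes through the collector, which owns the heap.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        objects.add(o);
        return o;
    }

    // Jobs 2 and 3: mark everything reachable from the roots, then reclaim
    // the unmarked rest. Returns the names of the reclaimed objects.
    List<String> collect() {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;
            o.marked = true;
            pending.addAll(o.refs);
        }
        List<String> reclaimed = new ArrayList<>();
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : objects) {
            if (o.marked) {
                o.marked = false; // reset the mark for the next cycle
                survivors.add(o);
            } else {
                reclaimed.add(o.name);
            }
        }
        objects.clear();
        objects.addAll(survivors);
        return reclaimed;
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Obj a = heap.allocate("a");
        Obj b = heap.allocate("b");
        heap.allocate("c");       // never referenced by anything
        a.refs.add(b);
        heap.roots.add(a);
        System.out.println(heap.collect()); // prints [c]
    }
}
```

Note that the collector never searches for garbage directly: "c" is reclaimed only because the marking pass did not reach it.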

2. Classification of garbage collectors

From JDK 1 to JDK 13, a total of 10 garbage collectors have appeared:
[Figure: the garbage collectors, with a line connecting each pair that can be used together]
As the figure shows, some garbage collectors are connected, which means that they can be used together.
Each garbage collector has its own characteristics. There is no best garbage collector, only the most suitable one: choose a collector according to its characteristics, and pair collectors where that improves efficiency.
Before looking at the individual collectors, one group of concepts is easy to confuse:

The difference between parallel, serial, and concurrent


A simple analogy helps to tell serial and parallel apart:
One day Xiao Ming goes to the bank to cancel a bank card that he has not used for a long time. Because it is Saturday, many people are at the bank, and everyone takes a number from the queuing system for the business they need.


So what is serial?
It is 12 noon and most of the bank's staff have gone out for lunch, so the bank keeps only one duty window open and everyone has to wait for that window to call their number. Until the person in front has finished, the next person has to keep waiting.


So what is parallel?
After lunch the staff return and the bank opens the other windows. Several windows can now call numbers and handle business at the same time. Even if the person in front of you has not finished, you are called as soon as another window is free. That is parallel.


Finally, what is concurrent?
When your turn comes, you suddenly remember that you also want to apply for a co-branded provident fund card, so you ask the clerk whether the new card can be handled together with the cancellation. A clerk who is not proficient replies that he cannot work on the application while the cancellation is still in progress, so you have to wait until the cancellation is complete. This clerk does not support concurrency: he handles one task at a time and cannot interleave different tasks by time-sharing.
A skilled clerk, on the other hand, can work on the new card application while your cancellation is being processed, switching between the two tasks in an orderly way so that both make progress over the same period. This clerk supports concurrency.
The analogy is not exact, but it captures the rough difference between the three concepts. In GC terms: a serial collector uses one GC thread, a parallel collector uses several GC threads at the same time, and a concurrent collector lets GC threads run alongside user threads.
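
The serial and parallel halves of the analogy can be sketched with a thread pool, where customers are tasks and windows are threads. This is only an illustration: BankWindows and serve are hypothetical names, and the sketch does not model concurrency (one clerk interleaving two tasks).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical "bank" sketch: customers are tasks, windows are threads.
final class BankWindows {
    // Serves every customer with the given number of open windows and
    // returns how many customers were served.
    static int serve(int customers, int windows) {
        ExecutorService pool = Executors.newFixedThreadPool(windows);
        try {
            Callable<Integer> business = () -> 1; // one unit of business
            List<Future<Integer>> tickets = new ArrayList<>();
            for (int i = 0; i < customers; i++) {
                tickets.add(pool.submit(business));
            }
            int served = 0;
            for (Future<Integer> ticket : tickets) {
                served += ticket.get(); // wait until this customer is done
            }
            return served;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(serve(10, 1)); // serial: one window, prints 10
        System.out.println(serve(10, 4)); // parallel: four windows, prints 10
    }
}
```

Both calls serve the same ten customers. The only difference is how many of them can be at a window at the same moment.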

1. Serial (Serial Collector)

Features:
1. Serial uses only one CPU or one GC thread for garbage collection and suspends all other worker threads while it collects. It is the default collector in client mode.
2. It is implemented with the copying algorithm.
3. It is the most efficient collector on a single-core CPU and suits client mode: the application's memory is small and it does not create many objects, so collections are short and stopping the business threads causes no noticeable pause.
4. With only one GC thread, it avoids the overhead of thread switching.
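
To check which collectors a JVM is actually running with, the standard java.lang.management API can be queried. The sketch below is a small helper, with ActiveCollectors as a hypothetical class name; the reported collector names are JVM-specific, so no particular output is assumed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {
    // Names of the collectors that the running JVM reports through JMX.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Try it with the Serial collector, for example:
        //   java -XX:+UseSerialGC ActiveCollectors.java
        System.out.println(names());
    }
}
```

Running the same program with different collector flags is a quick way to confirm that a flag took effect.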

2. ParNew

ParNew is the multi-threaded version of Serial.

Features:
1. ParNew uses multiple GC threads to clean up garbage in parallel. All business threads still have to be stopped during collection, but because the work is spread over several threads it is more efficient than Serial.
2. Because it is multi-threaded, it suits server environments with several CPUs. On a single CPU it performs worse than Serial because of the extra overhead of switching between threads. By default, the number of GC threads equals the number of CPU cores.
3. It is implemented with the copying algorithm.

3. Parallel Scavenge

1. Parallel Scavenge is a parallel, multi-threaded collector for the young generation. It aims to maximize CPU throughput, that is, to finish a given amount of work in as little total time as possible, so it suits background jobs that need little user interaction.

Throughput is the ratio of the time spent running user code to the total CPU time:

Throughput = time running user code / (time running user code + GC time)

The higher the throughput, the smaller the share of time spent on GC and the more CPU time is left for user code.
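
As a worked example of the formula, the sketch below computes the throughput for 99 minutes of user code and 1 minute of GC. Throughput and of are hypothetical names used only for this illustration.

```java
final class Throughput {
    // Throughput = user code time / (user code time + GC time)
    static double of(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    public static void main(String[] args) {
        double userMillis = 99 * 60_000; // 99 minutes of user code
        double gcMillis = 1 * 60_000;    // 1 minute of GC
        System.out.println(of(userMillis, gcMillis)); // prints 0.99
    }
}
```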

2. It is implemented with the copying algorithm.

(1) Two ways to reduce the pause time

1. Use multiple GC threads in a multi-CPU environment. This shortens each garbage collection and therefore the stop-the-world (STW) time of user threads.
2. Run GC threads concurrently with user threads. Concurrency here means that user threads and GC threads take turns running, which shortens each individual pause and makes pauses less noticeable to the user. Switching between threads adds overhead, however, so the total time spent on garbage collection and user code increases.

(2) Parameters provided by Parallel Scavenge

-XX:GCTimeRatio
sets the throughput target as the ratio of application time to GC time. With a value of N, the collector aims to spend no more than 1/(1+N) of the total time on GC; the default of 99 allows 1%.
-XX:MaxGCPauseMillis
sets the target for the maximum GC pause time, in milliseconds. It is a goal the collector tries to meet, not a guarantee.

The Parallel collector sizes the young generation based on this value. The smaller the value, the smaller the young generation, so that each collection finishes sooner; the price is that collections happen more often, which lowers throughput.

-XX:+UseAdaptiveSizePolicy
turns on the adaptive GC sizing policy (this is a difference from ParNew). We only need to set the maximum heap size (-Xmx) and either MaxGCPauseMillis or GCTimeRatio; the collector then automatically adjusts the size of the young generation, the Eden-to-Survivor ratio, and the age at which objects are promoted to the old generation, so as to get as close as possible to the targets we set.
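
The relationship between the GC time ratio parameter and throughput is easy to get backwards, so here is a small arithmetic sketch. It assumes the HotSpot convention that a value of N means a GC time to application time target of 1 : N; GcTimeRatio and maxGcFraction are hypothetical names, and App in the comment stands for your own main class.

```java
final class GcTimeRatio {
    // With a ratio value of N the target is GC time : application time = 1 : N,
    // so GC may take at most 1 / (1 + N) of the total time.
    static double maxGcFraction(int n) {
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        // Example command line (App is a placeholder):
        //   java -XX:+UseParallelGC -XX:GCTimeRatio=19 -XX:MaxGCPauseMillis=200 -Xmx1g App
        System.out.println(maxGcFraction(99)); // prints 0.01, i.e. 1% of time in GC
        System.out.println(maxGcFraction(19)); // prints 0.05, i.e. 5% of time in GC
    }
}
```

A larger value therefore asks for less GC time, not more.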

4. Serial Old

Serial Old is the old-generation version of Serial. Both are single-threaded collectors, that is, only one GC thread runs during garbage collection, so they suit client applications. The main difference is that Serial Old works on the old generation and is implemented with the mark-compact algorithm.

5. Parallel Old

Parallel Old is the old-generation version of the Parallel Scavenge (PS) collector. The two are generally used together when the goal is CPU throughput.
During garbage collection, multiple GC threads run in parallel while all user threads are suspended. Parallel Old uses the mark-compact algorithm.

6. CMS (Concurrent Mark Sweep)

CMS works on the old generation and aims for the shortest possible pause time. It is implemented with the mark-sweep algorithm. A collection runs in four steps:

1. Initial mark
Stops all user threads (STW) and uses a single initial-marking thread to mark the objects directly associated with GC Roots.
2. Concurrent mark
Runs multiple marking threads in parallel with each other and concurrently with user threads. This step performs the reachability analysis starting from the objects found in the initial mark, and it is slow.
3. Remark
Stops user threads again and uses multiple threads in parallel to correct the marks of objects whose references changed during the concurrent mark.
4. Concurrent sweep
Uses one sweeping thread that runs concurrently with business threads to reclaim the unreachable objects. This step is also time-consuming.


Features of CMS

  • Low throughput.
    GC threads run concurrently with user threads during collection, so they compete with the application for CPU and add thread-switching overhead. CPU throughput is therefore lower than with a collector that stops all business threads while it works.
  • Unable to process floating garbage.
    Floating garbage may be produced while a collection is in progress. When there is too much of it, GC may be triggered frequently.

Floating garbage: during the concurrent phases, a user thread may drop its last reference to an object after the GC thread has already marked that object as live. The object is now unreachable, but the current cycle cannot reclaim it; it has to wait for the next GC. Such objects are called floating garbage.
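
The timing behind floating garbage can be sketched as follows. The example models only the order of events (mark, then a reference is dropped, then sweep), not real concurrency, and FloatingGarbage, mark and sweep are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: models only the ordering of "mark" and "sweep".
final class FloatingGarbage {
    final Set<String> heap = new HashSet<>();   // allocated objects
    final Set<String> roots = new HashSet<>();  // objects the program still references
    private final Set<String> marked = new HashSet<>();

    // Marking phase: remember everything that is referenced right now.
    void mark() {
        marked.clear();
        marked.addAll(roots);
    }

    // Sweep phase: reclaim whatever was not marked; returns the reclaimed objects.
    Set<String> sweep() {
        Set<String> reclaimed = new HashSet<>(heap);
        reclaimed.removeAll(marked);
        heap.removeAll(reclaimed);
        return reclaimed;
    }

    public static void main(String[] args) {
        FloatingGarbage gc = new FloatingGarbage();
        gc.heap.add("a");
        gc.heap.add("b");
        gc.roots.add("a");
        gc.roots.add("b");

        gc.mark();                      // both a and b are marked live
        gc.roots.remove("b");           // the user thread drops b after marking
        System.out.println(gc.sweep()); // prints []  (b is floating garbage)

        gc.mark();                      // next cycle: b is no longer referenced
        System.out.println(gc.sweep()); // prints [b]
    }
}
```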

  • Memory fragmentation. CMS is implemented with the mark-sweep algorithm, which does not move live objects, so it leaves memory fragmentation.

You can use the -XX:+UseCMSCompactAtFullCollection parameter to compact memory after each full GC (FGC). You can also set -XX:CMSFullGCsBeforeCompaction to tell CMS how many full GCs to run between compactions.
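
Why compaction matters can be shown with a toy heap of equal-sized slots. This is a simplified sketch, not CMS's actual data structures; Fragmentation, largestFreeRun and compact are hypothetical names.

```java
import java.util.Arrays;

final class Fragmentation {
    // Largest run of consecutive free slots; a slot is free when its entry is false.
    static int largestFreeRun(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // Compaction: slide every used slot to the front of the heap.
    static boolean[] compact(boolean[] used) {
        int live = 0;
        for (boolean u : used) {
            if (u) live++;
        }
        boolean[] out = new boolean[used.length];
        Arrays.fill(out, 0, live, true);
        return out;
    }

    public static void main(String[] args) {
        // After a mark-sweep cycle: live (true) and free (false) slots alternate.
        boolean[] heap = {true, false, true, false, true, false, true, false};
        System.out.println(largestFreeRun(heap));          // prints 1
        System.out.println(largestFreeRun(compact(heap))); // prints 4
    }
}
```

Four slots are free in both cases, but before compaction an object that needs two adjacent slots cannot be allocated.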

7. G1 (Garbage-First)

G1 can be seen as an enhanced version of CMS. Its algorithm flow is similar to that of CMS, with the following differences:
1. G1 is implemented with the mark-compact algorithm, so each GC leaves contiguous free space.
2. G1 is still generational, but its memory model is very different. The heap is divided into many regions, and G1 maintains a list of them. When a GC is needed, G1 first evaluates the recovery value of each region and then reclaims the regions with the greatest value first, so as to get the most out of each collection.
[Figure: the G1 heap divided into regions, including Humongous regions]
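
The idea of reclaiming the most valuable regions first can be sketched as a greedy selection under a pause-time budget. This is a simplified illustration and not HotSpot's actual policy: RegionSelection, Region, value and choose are hypothetical names, and the garbage sizes and costs are made-up numbers.

```java
import java.util.ArrayList;
import java.util.List;

final class RegionSelection {
    // Hypothetical region summary: reclaimable bytes and estimated collection cost.
    static final class Region {
        final String id;
        final long garbageBytes;
        final double costMillis;

        Region(String id, long garbageBytes, double costMillis) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.costMillis = costMillis;
        }

        // Recovery value: bytes reclaimed per millisecond of pause.
        double value() {
            return garbageBytes / costMillis;
        }
    }

    // Greedy choice: take the most valuable regions first and skip any region
    // that would push the pause past the budget.
    static List<String> choose(List<Region> regions, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Double.compare(b.value(), a.value()));
        List<String> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMillis > pauseBudgetMillis) continue;
            spent += r.costMillis;
            chosen.add(r.id);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region("A", 800, 4.0),  // value 200
                new Region("B", 300, 1.0),  // value 300
                new Region("C", 900, 9.0),  // value 100
                new Region("D", 100, 2.0)); // value 50
        System.out.println(choose(regions, 6.0)); // prints [B, A]
    }
}
```

Region C holds the most garbage, but it is too expensive for a 6 ms budget, so the selection prefers B and A.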

(1) The recovery process of G1

  • Initial mark:
    Marks the objects directly associated with GC Roots. All business threads are stopped (STW) and only one initial-marking thread runs. This step is very fast.
  • Concurrent mark:
    Performs the full reachability analysis, using a concurrent marking thread to mark live objects. The thread runs concurrently with user threads, and this step takes longer.
  • Final mark:
    Stops all business threads (STW) and runs multiple final-marking threads to process the reference changes that business threads made during the concurrent mark.
  • Screening and recycling:
    Reclaims useless objects. This step is also STW and runs multiple screening and recycling threads.

(2) Features of G1

  1. The problems that concurrency causes in CMS also exist in G1, but G1 avoids the memory fragmentation caused by the mark-sweep algorithm.
  2. G1 is soft real-time: it tries to complete garbage collection within a pause time specified by the user.
  3. As shown above, G1 has a special kind of region called Humongous.

What is Humongous: if an object occupies 50% or more of a region, G1 treats it as a humongous object. Such objects would be allocated to the old generation by default, but a short-lived humongous object there would hurt G1's collection efficiency. To solve this, G1 sets aside Humongous (H) regions to store humongous objects. When one H region cannot hold an object, G1 looks for a run of contiguous H regions to store it, and sometimes it has to start a Full GC to find one.
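
The 50% rule and the need for contiguous regions come down to simple arithmetic. The sketch below assumes a region size of 1 MB purely for illustration; Humongous, isHumongous and regionsNeeded are hypothetical names.

```java
final class Humongous {
    // An object counts as humongous when it takes up at least half a region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes * 2 >= regionBytes;
    }

    // Number of contiguous regions needed to hold the object (rounded up).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = 1L << 20; // assumed region size: 1 MB
        System.out.println(isHumongous(512 * 1024, region));   // prints true
        System.out.println(isHumongous(100 * 1024, region));   // prints false
        System.out.println(regionsNeeded(2_500_000L, region)); // prints 3
    }
}
```

The last line shows why a large object can force a search for several adjacent free regions.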

8. Summary

The seven collectors above are used in different scenarios. Serial, ParNew, and Parallel Scavenge (PS) work on the young generation and are implemented with the copying algorithm. Serial suits single-core clients, and ParNew suits multi-core servers. PS is similar to ParNew but focuses more on throughput; in addition, PS cannot be used with CMS.
Serial Old, Parallel Old, and CMS collect the old generation. Serial Old and Parallel Old use the mark-compact algorithm, while CMS uses the mark-sweep algorithm, so CMS produces memory fragmentation.
G1 is a collector that spans both the young and the old generation. What sets it apart from the generational layout of the collectors above is its region-based model: the heap is divided into regions, and each region can act as young or old generation. The mark-compact algorithm adopted by G1 does not produce memory fragmentation.


Origin blog.csdn.net/xiaoai1994/article/details/109644578