An article to thoroughly understand GC

Foreword

Java is easier to learn than many other programming languages, and a large part of that is due to the JVM's automatic memory management.
Developers working in C have "ownership" of every object, and greater power means greater responsibility: they must shepherd each object from birth to death and manually free its memory once it is no longer needed, otherwise a memory leak occurs. For Java developers, the JVM's automatic memory management removes this headache and makes memory leaks and memory overflows far less likely. GC lets developers focus on the program itself without worrying about when memory is allocated, when it is freed, and how it is freed.

1. JVM runtime data area

Before talking about GC, it helps to understand the JVM memory model: how the JVM lays out memory and which areas GC mainly works on.

[Figure: the JVM runtime data areas]
As shown in the figure, at runtime the JVM divides memory into five areas. The method area and the heap are created when the JVM starts and are shared by all threads. The virtual machine stack, the native method stack, and the program counter are created along with each thread and are destroyed when the thread finishes.

1.1 Program Counter

The program counter (Program Counter Register) is a very small memory space, almost negligible.
It can be regarded as the line-number indicator of the bytecode being executed by the thread; it points to the next instruction the current thread should execute. Basic control flow such as conditional branches, loops, jumps, and exception handling all depends on the program counter.

For a single CPU core, only one thread can run at any given moment. When a thread's CPU time slice is used up, the thread is suspended and waits for the OS to assign it a new time slice before continuing. How does the thread know where it left off? Through the program counter, which is why each thread maintains its own private program counter.

If the thread is executing a Java method, the counter records the address of the current JVM bytecode instruction; if it is executing a native method, the counter value is undefined.

The program counter is the only memory area for which the JVM specification does not define any OutOfMemoryError condition, which means an OOM exception cannot occur here, and GC does not reclaim this area.

1.2 Virtual machine stack

The virtual machine stack (Java Virtual Machine Stacks) is also private to the thread, and has the same life cycle as the thread.

The virtual machine stack describes the memory model of Java method execution. When the JVM executes a method, it first creates a stack frame (Stack Frame) to store the local variable table, operand stack, dynamic link, method return address, and other information. After the frame is created it is pushed onto the stack, and it is popped off when the method finishes.

The execution of a method corresponds exactly to its stack frame being pushed and, later, popped.

The local variable table mainly stores the primitive data types, object references, and returnAddress types known at compile time. The memory the local variable table requires is determined at compile time, and its size does not change at run time.

The JVM specification defines two exceptions for the virtual machine stack:

  • StackOverflowError
    Thrown when the stack depth requested by a thread exceeds the depth allowed by the JVM. Stack capacity is limited, so if the frames a thread pushes exceed that limit, a StackOverflowError is thrown; unbounded method recursion is the classic example (see the sketch below).
  • OutOfMemoryError
    The virtual machine stack may be dynamically expanded; if enough memory cannot be obtained during expansion, an OOM exception is thrown.
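A minimal illustration of the first case: unbounded recursion keeps pushing stack frames until the limit is exceeded. The class and method names are made up for the example; running with a smaller stack (for example -Xss256k) makes the error appear sooner.

public class StackOverflowDemo {

    private static int depth = 0;

    private static void recurse() {
        depth++;       // each call pushes one more stack frame
        recurse();     // no termination condition, so frames pile up
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError at depth " + depth);
        }
    }
}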

1.3 Native method stack

The native method stack (Native Method Stack) is also thread-private and is very similar to the virtual machine stack.
The difference is that the virtual machine stack serves the execution of Java methods, while the native method stack serves the execution of native methods.

Like the virtual machine stack, the native method stack is also specified to throw the StackOverflowError and OutOfMemoryError exceptions.

1.4 Heap

The Java heap (Java Heap) is shared by all threads. It is generally the largest memory area managed by the JVM and the main area managed by the garbage collector.

The Java heap is created when the JVM starts, and its purpose is to store object instances.
Almost all objects are created on the heap, but with the development of JIT compilers and the maturing of escape analysis, stack allocation and scalar replacement have made "all objects are allocated on the heap" less absolute.
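A small illustration of the idea: in the sketch below the Point object never escapes its method, so with escape analysis enabled (-XX:+DoEscapeAnalysis, on by default in modern HotSpot) the JIT may apply scalar replacement and avoid the heap allocation altogether. The class is made up for the example.

public class EscapeDemo {

    // A tiny value holder used only inside distanceSquared().
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point never escapes this method (it is not returned or stored
    // in a field), so the JIT may replace it with two local ints.
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += distanceSquared(i, i + 1);
        }
        System.out.println(sum);
    }
}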

Because it is the main area managed by GC, it is also called the "GC heap".
To make collection more efficient, the Java heap is further divided as follows:

[Figure: division of the Java heap]

1.5 Method area

The method area (Method Area), like the Java heap, is also a memory area shared by threads.
It mainly stores class information loaded by the JVM, constants, static variables, code compiled by the just-in-time compiler, and other data.
It is also known as Non-Heap, a name chosen to distinguish it from the Java heap.

The JVM specification places relatively loose constraints on the method area; a JVM is not even required to garbage-collect it. In older JDKs, HotSpot implemented the method area with the permanent generation, which is why the method area is often called PermGen.

Using the permanent generation to implement the method area turned out to be a poor idea, as it easily leads to memory overflow. Starting with JDK 7 the permanent generation began to be dismantled: the string constant pool was moved out of it. In JDK 8 the permanent generation was removed entirely and replaced by Metaspace.

2. GC overview

Garbage Collection is abbreviated "GC". Its history is far longer than that of the Java language itself: Lisp, born at MIT in 1960, was the first language to use dynamic memory allocation and garbage collection.

To implement automatic garbage collection, three questions have to be answered first: which objects need to be recycled, when to recycle them, and how to recycle them.

The five major memory areas of the JVM were introduced earlier. The program counter occupies a negligible amount of memory, can never overflow, and does not need to be collected by GC. The virtual machine stack and the native method stack live and die with their thread: stack frames are pushed and popped in an orderly way as methods run, and the memory of each frame is essentially fixed at compile time. Allocation and reclamation in these two areas is therefore deterministic, and there is no need to think about how to reclaim them.

The method area is different. How many implementing classes does an interface have? How much memory does each class occupy? Classes can even be created dynamically at run time, so GC does need to reclaim the method area.

The same is true for the Java heap. Almost all object instances live on the heap, and how many instances a class will create is only known at run time. Allocation and reclamation of this memory is dynamic, and it is the area GC focuses on.

2.1 Which objects need to be recycled

The first step toward automatic garbage collection is to determine which objects can be recycled. Broadly there are two approaches, the reference counting algorithm and the reachability analysis algorithm, and almost all commercial JVMs use the latter.

2.1.1 Reference counting algorithm

Attach a reference counter to each object: increment it every time a reference to the object is created, and decrement it every time a reference is dropped. When the counter reaches 0, the object is no longer referenced and can be recycled.

Although reference counting (Reference Counting) costs some extra memory, it is simple and efficient in principle and works well in most cases. It has one serious drawback, however: it cannot handle circular references.

For example, a linked list should become reclaimable as soon as nothing outside it references it, but because the reference counters of the elements inside the list never drop to 0, it can never be reclaimed, resulting in a memory leak.
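A minimal sketch of the problem (class names made up for the example): two objects reference each other, so under pure reference counting neither counter can ever reach 0, even after the outside world drops both of them. HotSpot does not use reference counting, so in practice these objects are collected.

public class CircularReferenceDemo {

    static class Node {
        Node next;                              // may point back to another Node
        byte[] payload = new byte[1024 * 1024]; // make the leak visible
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        a.next = b;   // a and b now reference each other
        b.next = a;

        a = null;     // drop the external references; with reference counting
        b = null;     // both counters stay at 1 and the pair is never reclaimed
        System.gc();  // a reachability-based GC (HotSpot) can reclaim them
    }
}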

2.1.2 Reachability Analysis Algorithm

At present, mainstream commercial JVMs use reachability analysis to determine whether objects can be recycled.

[Figure: reachability analysis; objects not connected to GC Roots by any reference chain are recyclable]
The basic idea of the algorithm is:

Use a set of root objects called "GC Roots" as the starting nodes. From these nodes, search downward along reference relationships; the path traversed is called a "reference chain". If no reference chain connects an object to GC Roots, the object is unreachable and can be recycled.

"Object reachability" means there is a direct or indirect reference relationship between two objects.
"Root reachability" or "GC Roots reachability" means there is a direct or indirect reference relationship between an object and GC Roots.

Objects that can be used as GC Roots are as follows:
[Figure: objects that can serve as GC Roots]
Reachability analysis means the JVM first enumerates the root nodes, finding the objects that must stay alive for the program to keep running, and then searches outward from these roots. Objects connected by a direct or indirect reference chain survive; objects with no such chain are recycled.
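A small illustration (names made up for the example): a static field is a typical GC Root, so whatever it references stays reachable; once the field is cleared, the object becomes root-unreachable and is eligible for collection.

public class ReachabilityDemo {

    static class Cache {
        byte[] data = new byte[8 * 1024 * 1024];
    }

    // A static field is a typical GC Root.
    static Cache cache = new Cache();

    public static void main(String[] args) {
        // Reachable: GC Root (static field) -> Cache -> data
        System.out.println("cache is reachable: " + (cache != null));

        cache = null;  // break the only reference chain from a GC Root
        System.gc();   // the Cache instance is now unreachable and may be reclaimed
    }
}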

2.2 When to Recycle

The JVM divides memory into five areas, and different GCs collect different areas. GCs generally fall into the following categories:

  • Minor GC
    Also called "Young GC" or light GC; it collects only the young generation.
  • Major GC
    Also called "Old GC"; it collects only the old generation.
  • Mixed GC
    Collects the young generation plus part of the old generation; only some garbage collectors support it.
  • Full GC
    Full-heap GC, the heaviest kind: it collects the entire Java heap and the method area and takes the longest.

When is a GC triggered, and which type is triggered? Different garbage collectors decide differently, and you can also influence the JVM's decisions through parameters.

Generally, the young generation triggers a GC when the Eden area is exhausted. The old generation cannot work that way, because some concurrent collectors keep running while they clean, which means the program is still creating objects and allocating memory; the old generation therefore has to provide a "space allocation guarantee". Objects that do not fit in the young generation are placed in the old generation, and if the old generation reclaims space more slowly than objects are created, the allocation guarantee fails and the JVM has to trigger a Full GC to obtain more usable memory.

2.3 How to recycle

After identifying the objects to be recycled, the next question is how to recycle them.
Which approach is more efficient? Does memory need to be compacted afterwards to avoid fragmentation? To answer these questions, GC algorithms fall roughly into three categories:

  1. Mark-sweep algorithm
  2. Mark-copy algorithm
  3. Mark-compact algorithm

The details of each algorithm are introduced below.

3. GC recovery algorithm

The JVM divides the heap into generations. Objects in different generations have different characteristics, and applying a different GC algorithm to each generation improves GC efficiency.

3.1 Generational Collection Theory

At present, most JVM garbage collectors follow the "generational collection" theory, which is based on three hypotheses.

3.1.1 Weak generational hypothesis

The vast majority of objects are short-lived: they die soon after they are created.

Think about the programs we write: most of the time we create an object just to perform some business computation, and once the result is obtained the object is useless and can be recycled.
Another example: a client requests a list of data; after the server queries the database and converts the result to JSON for the front end, the objects in that list can be recycled.
Objects like these are said to "live and die quickly".

3.1.2 Strong generational hypothesis

The more GC cycles an object has survived, the harder it is to reclaim.

This hypothesis is purely statistical: an object that has survived many GCs can be assumed to survive the next one too, so there is no need to try to reclaim it frequently. Moving it to the old generation reduces how often it is examined and lets GC concentrate on the young generation, where collection pays off more.

3.1.3 Cross-generational reference hypothesis

Cross-generational references are rare compared with references within the same generation.

This is an implicit inference drawn from the first two hypotheses: two objects that reference each other tend to live and die together.
For example, if a young-generation object is referenced from the old generation, the old-generation object is hard to kill, so that reference keeps the young object alive through collections until it grows old enough to be promoted to the old generation, at which point the cross-generational reference disappears.

3.2 Resolving cross-generational references

Cross-generational references are rare but possible. If the entire old generation had to be scanned just to find the few cross-generational references, the overhead of each young-generation GC would be too high and pause times would become unacceptable. If cross-generational references were simply ignored, live young-generation objects would be collected by mistake and the program would break.

3.2.1 Remembered Set

The JVM solves this problem with the remembered set. By maintaining a remembered-set data structure for the young generation, it avoids adding the whole old generation to the GC Roots scanning scope when collecting the young generation, reducing GC overhead.

A remembered set is an abstract data structure that records pointers from the "non-collected area" into the "collected area". Put plainly, it marks which young-generation objects are referenced from the old generation. A remembered set can record at three levels of precision:

  1. Word precision: each record covers one machine word, i.e. the processor's addressing width.
  2. Object precision: each record covers one object and says whether any of its fields holds a cross-generational pointer.
  3. Card precision: each record covers a memory region and says whether any object in that region holds a cross-generational reference.

Word precision and object precision are too fine-grained and cost a lot of memory to maintain, so many JVMs use card precision, better known as the "card table". The card table is one implementation of a remembered set, and currently the most common one; it defines the remembered set's recording precision, its mapping to memory, and so on.

HotSpot implements the card table as a byte array. It divides the heap into a series of memory regions whose size is a power of two, called "card pages". HotSpot uses 2 to the 9th power, i.e. 512 bytes, per card page. Each element of the byte array corresponds to one card page; if any object in a card page contains a cross-generational reference, the JVM marks that card page as "dirty". The GC then only has to scan the memory regions corresponding to dirty cards instead of the whole heap.
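A rough sketch of that mapping, under the 512-byte card page assumption described above (the class and field names are illustrative, not HotSpot source):

class CardTableSketch {

    static final int CARD_SHIFT = 9;   // 2^9 = 512-byte card pages
    static final byte DIRTY = 1;

    final byte[] cards;                // one byte per card page
    final long heapBase;               // start address of the heap

    CardTableSketch(long heapBase, long heapSize) {
        this.heapBase = heapBase;
        this.cards = new byte[(int) (heapSize >>> CARD_SHIFT)];
    }

    // Called when a field at 'fieldAddress' receives a cross-generational reference.
    void markDirty(long fieldAddress) {
        int cardIndex = (int) ((fieldAddress - heapBase) >>> CARD_SHIFT);
        cards[cardIndex] = DIRTY;
    }

    // During a young GC, only the 512-byte regions of dirty cards are scanned.
    boolean mustScan(int cardIndex) {
        return cards[cardIndex] == DIRTY;
    }
}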

The structure of the card table is shown in the figure below:
[Figure: card table structure; one byte per 512-byte card page]

3.2.2 Write barrier

The card table is just a data structure that marks which memory regions contain cross-generational references. How does the JVM maintain it? When does a card page become dirty?

HotSpot maintains the card table through a "write barrier". The JVM intercepts every "object field assignment"; much like aspect-oriented programming (AOP), it can inject work before and after the assignment. Work injected before the assignment is called the "pre-write barrier", and work injected after it is called the "post-write barrier". In pseudocode:

void setField(Object o) {
    before();        // pre-write barrier
    this.field = o;
    after();         // post-write barrier
}

With the write barrier enabled, the JVM generates the corresponding instructions for every reference assignment. As soon as a field of an old-generation object is made to point to a young-generation object, HotSpot sets the corresponding card table element to dirty.

Do not confuse this "write barrier" with the memory barriers used to prevent instruction reordering in concurrent programming.

Besides the cost of the write barrier itself, the card table also suffers from "false sharing" under high concurrency. Modern CPU caches operate in units of "cache lines"; on Intel CPUs a cache line is typically 64 bytes. When multiple threads modify independent variables that happen to sit in the same cache line, they keep invalidating each other's cache lines for no good reason, forcing threads to frequently reload data and hurting performance.

A cache line is 64 bytes and each card page covers 512 bytes, so one cache line of card-table entries maps to 64 × 512 bytes = 32 KB of heap. If the objects updated by different threads fall within the same 32 KB, the threads end up writing card-table entries that share a cache line, and performance suffers. To avoid this, HotSpot can mark a card dirty only when it is not already marked. This adds a check but avoids false sharing; enable it with:

-XX:+UseCondCardMark
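In pseudocode, conditional card marking amounts to the following (a sketch extending the card-table example above, not HotSpot source):

class ConditionalCardMarkSketch {

    static final int CARD_SHIFT = 9;
    static final byte DIRTY = 1;
    static final byte[] cards = new byte[1 << 20]; // card table for an illustrative 512 MB heap
    static final long heapBase = 0L;               // illustrative heap base address

    // Post-write barrier executed after every reference store.
    static void postWriteBarrier(long fieldAddress) {
        int cardIndex = (int) ((fieldAddress - heapBase) >>> CARD_SHIFT);
        // Unconditional marking would simply do: cards[cardIndex] = DIRTY;
        // With -XX:+UseCondCardMark the store happens only when needed, so
        // threads whose cards share a cache line stop invalidating it on every write.
        if (cards[cardIndex] != DIRTY) {
            cards[cardIndex] = DIRTY;
        }
    }
}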

3.3 Mark-sweep

The mark-sweep algorithm has two phases: marking and sweeping.

The collector first marks the objects to be recycled and then sweeps them all at once after marking finishes. It can equally well mark the surviving objects and then sweep everything unmarked; which way is better depends on the ratio of live to dead objects in memory.

Shortcomings:

  • Unstable execution efficiency
    The cost of marking and sweeping grows as the number of objects in the Java heap grows.
  • Memory fragmentation
    After sweeping, memory is left with a large number of discontiguous fragments, which makes it harder to allocate memory for new objects later.

[Figure: mark-sweep; fragmentation remains after sweeping]

3.4 Mark copy

To solve the fragmentation problem of the mark-sweep algorithm, the mark-copy algorithm was introduced.

The mark-copy algorithm divides memory into two areas and uses only one at a time. During garbage collection, live objects are first marked, then copied to the other area, after which the current area is cleared in one go.

Its disadvantage is that if many objects survive, a lot of copying is needed; and since only half the memory is usable at any time, the waste is considerable.
[Figure: mark-copy; live objects are copied to the other half, then the used half is cleared]

Since the vast majority of objects die in their first GC, very few objects actually need to be copied, so the two areas do not need to be split 1:1.
HotSpot's default ratio of Eden to Survivor space is 8:1:1, i.e. Eden takes 80%, the From Survivor space 10%, and the To Survivor space 10%. The usable memory of the young generation is Eden plus one Survivor space, i.e. 90%; the remaining 10% (the other Survivor space) is reserved as the copy target.

If so many objects survive a Minor GC that they exceed one Survivor space, the allocation guarantee (Handle Promotion) kicks in and the overflowing objects are placed directly into the old generation.
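For reference, the young-generation layout can be tuned with standard HotSpot flags; a possible (purely illustrative) configuration is shown below, where -Xmn sets the young-generation size, -XX:SurvivorRatio=8 keeps the default 8:1:1 split (Eden to one Survivor is 8:1), and -XX:MaxTenuringThreshold caps the age at which survivors are promoted. The sizes and app.jar are placeholders.

java -Xms1g -Xmx1g -Xmn256m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=15 -jar app.jar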

3.5 Mark-compact

Besides the heavy copying when many objects survive, the mark-copy algorithm also needs extra memory for its allocation guarantee, so it is generally not used for the old generation.

Objects that live in the old generation are generally objects that have survived many GCs; by the "strong generational hypothesis" they are hard to reclaim. The mark-compact algorithm was introduced to match these survival characteristics.

The marking phase of mark-compact is identical to that of mark-sweep, but instead of sweeping the dead objects in place, mark-compact moves all surviving objects toward one end of the memory area and then frees the memory beyond the boundary in one go.

[Figure: mark-compact; live objects are moved to one end and the remaining space is freed]
Compared with mark-sweep, the biggest difference of mark-compact is that live objects must be moved.
Moving live objects during GC has both advantages and disadvantages.

Disadvantage:
By the "strong generational hypothesis", a large number of objects usually survive an old-generation GC. Moving them means updating every reference to them, which is expensive and requires suspending all user threads; the program pauses, and the JVM calls this pause Stop The World (STW).

Advantage:
After objects are moved and memory is compacted, there are no large numbers of discontiguous fragments, which makes later allocation of memory for objects easier.

Both moving and not moving have trade-offs. Moving makes reclamation complicated but allocation simple; not moving makes reclamation simple but allocation complicated. Considering overall program throughput, moving objects is usually more cost-effective, because allocation happens far more often than reclamation.

There is also a hybrid solution: do not move objects most of the time and use mark-sweep, and only switch to mark-compact when fragmentation starts to affect the allocation of large objects.

4. Garbage collector

There are many JVMs that implement the Java Virtual Machine Specification, and each platform offers several garbage collectors; a single article cannot cover them all, and developers do not need to know every one. Taking HotSpot as an example, the mainstream collectors fall into the following categories:

[Figure: classification of HotSpot garbage collectors]
Serial: single-threaded collection; user threads are paused.
Parallel: multi-threaded collection; user threads are paused.
Concurrent: user threads and GC threads run at the same time.

As mentioned earlier, most JVM garbage collectors follow generational collection, and different collectors reclaim different memory areas; in most cases the JVM needs two collectors working together. The dotted lines in the figure below indicate which collectors can be paired.
[Figure: pairings of young-generation and old-generation collectors]

4.1 New Generation Collector

4.1.1 Serial

The most basic and oldest garbage collector. It uses the mark-copy algorithm, starts a single thread to do the collection, and suspends all user threads (STW) while collecting.
[Figure: Serial collector; a single GC thread collects while user threads are paused]
Use the -XX:+UseSerialGC parameter to enable the Serial collector. Because it collects with a single thread, its applicable scenarios are very limited:

  1. The application is very lightweight, and the heap space is less than 100 MB.
  2. The server CPU resources are tight.

4.1.2 Parallel Scavenge

A multi-threaded young generation collector using the mark-copy algorithm.

[Figure: Parallel Scavenge; multiple GC threads collect while user threads are paused]
It is enabled with -XX:+UseParallelGC. Parallel Scavenge is characterized by its focus on system throughput, and it provides two parameters to let users control it:

  • -XX:MaxGCPauseMillis: sets the maximum garbage-collection pause time (a positive integer). Parallel Scavenge works toward this target, but if the value is set too small it cannot be guaranteed. If the user demands very short pauses, Parallel Scavenge tends to shrink the heap, because a smaller heap is collected faster than a larger one; that, however, triggers GC more often and lowers overall throughput.
  • -XX:GCTimeRatio: sets the throughput target as an integer between 0 and 100. If GCTimeRatio is n, the collector will spend no more than 1/(1+n) of total time on garbage collection. For example, n = 19 allows at most 1/(1+19) = 5% of time in GC; the default is 99, i.e. at most 1% of time is spent on garbage collection.

Parallel Scavenge is the default young-generation collector of JDK 8 and is throughput-oriented. Users can set the maximum GC pause time and the throughput target with -XX:MaxGCPauseMillis and -XX:GCTimeRatio, but the two goals conflict: a smaller pause time means the GC has to collect more frequently, which increases total GC time and lowers throughput.
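A possible launch configuration for a throughput-oriented service on JDK 8 (the jar name and values are only illustrative):

java -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxGCPauseMillis=200 -XX:GCTimeRatio=99 -jar app.jar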

4.1.3 ParNew

ParNew is another multi-threaded young-generation collector that uses the mark-copy algorithm. Its collection strategy, algorithm, and parameters are the same as Serial's; it simply goes from one thread to many. It was created purely to pair with the CMS collector: CMS is an old-generation collector, but Parallel Scavenge cannot work with it, and Serial collects serially and is too slow, hence ParNew.

It is enabled with -XX:+UseParNewGC, but this flag was removed in releases after JDK 9: JDK 9 defaults to G1, CMS was deprecated, and since ParNew existed only to pair with CMS, it lost its purpose.

4.2 Old Generation Collector

4.2.1 Serial Old

Serial Old uses the mark-compact algorithm and, like Serial, is a single-threaded, exclusive old-generation collector. The old generation is usually larger than the young generation, and mark-compact has to move objects to avoid fragmentation, so old-generation collection takes longer than young-generation collection.

As the earliest old-generation garbage collector, Serial Old has one more advantage: it can pair with most young-generation collectors, and it also serves as the fallback collector when CMS suffers a concurrent mode failure.

It is enabled with -XX:+UseSerialGC, which makes both the young and the old generation use serial collectors. As with Serial, unless the application is very lightweight or CPU resources are very scarce, this collector is not recommended.

4.2.2 Parallel Old

Parallel Old is a multi-threaded, exclusive old-generation collector. Like Parallel Scavenge, it is throughput-oriented; it was created to pair with Parallel Scavenge.

Parallel Old uses the mark-compact algorithm and is enabled with -XX:+UseParallelOldGC; -XX:ParallelGCThreads=n sets the number of GC threads used during collection. It is also the default old-generation collector of JDK 8.

4.2.3 CMS

CMS (Concurrent Mark Sweep) is a milestone garbage collector. Why? Because before it, GC threads and user threads could not work at the same time; even Parallel Scavenge merely collects with multiple threads in parallel, and the whole GC still has to suspend user threads, i.e. Stop The World. The consequence is that a Java program periodically freezes for a while, hurting responsiveness, which is unacceptable for server-side programs.

Why must user threads be suspended during GC?
First, if user threads keep running, new garbage keeps being produced and the collection can never finish.
Second, running user threads inevitably change the reference relationships between objects, which leads to two problems: over-marking and missed marking.

1. Over-marking: an object was not garbage when marking started, but during the GC the user thread dropped the references to it, making it unreachable from GC Roots, i.e. garbage. This case is relatively harmless: it merely produces some floating garbage, which the next GC will clean up.

2. Missed marking: an object has not been marked yet and looks like garbage, but during the GC the user thread points a live reference at it again. If the GC reclaims it anyway, the program breaks.

To achieve concurrent collection, CMS is much more complicated than the collectors introduced above. Its GC cycle can be roughly divided into four phases:
[Figure: the four phases of a CMS collection: initial mark, concurrent mark, remark, concurrent sweep]

1. Initial mark

The initial mark only marks the objects directly reachable from GC Roots, so it is very fast. It does trigger STW, but the pause is short and does not grow with the heap, so it is controllable and the brief pause is usually negligible.

2. Concurrent marking

Concurrent marking traverses the object graph in depth, starting from the objects found in the initial mark. This takes much longer, and the marking time grows with the heap. Fortunately it does not trigger STW: user threads keep running and the program keeps responding, though performance dips a little because the GC threads consume CPU and system resources. CMS is therefore sensitive to the processor: by default it starts (number of CPU cores + 3) / 4 GC threads, so with more than 4 cores the GC threads take less than 25% of the CPU; with fewer than 4 cores the impact on the program becomes very noticeable and its performance drops significantly.

3. Remark

Because user threads keep running during concurrent marking, they may change the references between objects in the meantime. Two things can happen: objects that could not be reclaimed become reclaimable, and objects that could have been reclaimed become live again. To deal with both, CMS suspends the user threads once more and performs a remark.

4. Concurrent sweep

Once remarking is complete, the sweep can run concurrently. This phase also takes a long time and its cost grows with the heap, but it does not require STW: user threads run normally and the program does not freeze. As during concurrent marking, however, the GC threads still consume some CPU and system resources and reduce the program's performance.

CMS pioneered concurrent collection and made it possible for user threads and GC threads to work at the same time, but its drawbacks are also obvious:

1. Sensitive to the processor

During concurrent marking and sweeping, CMS does not trigger STW, but the GC threads doing the work consume CPU resources, lowering program performance and slowing responses. With many cores this matters less; when CPU resources are scarce, the GC threads hurt the program badly.

2. Floating garbage

During the concurrent sweep the user threads are still running, and the garbage they create in that window is called "floating garbage". Floating garbage cannot be cleaned up in the current GC and has to wait for the next one.

3. Concurrent mode failure

Because of floating garbage, CMS must reserve some space for the garbage produced while it runs; it cannot wait until the old generation is completely full before collecting, the way Serial Old does. In JDK 5, CMS starts when 68% of the old generation is used, leaving 32% for floating garbage, a rather conservative setting; if the old generation does not grow that fast in practice, the threshold can be raised with -XX:CMSInitiatingOccupancyFraction. In JDK 6 the default trigger threshold was raised to 92%, leaving only 8% reserved for floating garbage.
If the reserved memory cannot hold the floating garbage, a "concurrent mode failure" occurs: the JVM has to fall back to its backup plan and run the Serial Old collector on the old generation, and the pause becomes much longer.

4. Memory Fragmentation

Because CMS uses the mark-sweep algorithm, the heap ends up with a lot of fragmentation after each collection. Too much fragmentation causes many problems, one being that large objects become hard to allocate: there is plenty of free heap in total, but no contiguous region big enough, so a Full GC has to be triggered and the pause grows.
For this case CMS offers a remedy: with the -XX:CMSFullGCsBeforeCompaction parameter, after CMS has triggered the given number of Full GCs because of fragmentation, the memory is compacted before the next Full GC. This parameter was deprecated in JDK 9.
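For reference, a typical CMS configuration on JDK 8 might look like the line below; the values are only examples, and -XX:+UseCMSInitiatingOccupancyOnly makes CMS honor the configured occupancy threshold instead of its own heuristics.

java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -jar app.jar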

4.2.3.1 Three-color marking algorithm

Having introduced the CMS garbage collector, we need to understand how its GC threads can work alongside user threads.

The JVM decides whether an object can be reclaimed mostly by the "reachability analysis" algorithm:

traverse from GC Roots; reachable objects survive, unreachable ones are reclaimed.

CMS marks objects in three colors:
[Figure: the three colors: white, not yet visited; gray, visited but with references not fully scanned; black, visited with all references scanned]
The marking process is roughly as follows (a sketch in code follows the list):

  1. Initially every object is white, meaning it has not been visited.
  2. The objects directly reachable from GC Roots are colored gray.
  3. Take a gray object, scan all of its references, color the object itself black, and color the referenced objects gray.
  4. Repeat step 3 until no gray objects remain.
  5. In the end, black objects survive and white objects are reclaimed.
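A minimal, single-threaded sketch of this process (the Obj class and its fields are made up for the example; real collectors work on heap words, not Java objects):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TriColorSketch {

    enum Color { WHITE, GRAY, BLACK }

    static class Obj {
        Color color = Color.WHITE;          // 1. everything starts white
        List<Obj> refs = new ArrayList<>();
    }

    static void mark(List<Obj> gcRoots) {
        Deque<Obj> grayQueue = new ArrayDeque<>();

        // 2. objects directly reachable from GC Roots become gray
        for (Obj root : gcRoots) {
            root.color = Color.GRAY;
            grayQueue.add(root);
        }

        // 3 & 4. process gray objects until none remain
        while (!grayQueue.isEmpty()) {
            Obj obj = grayQueue.poll();
            for (Obj ref : obj.refs) {
                if (ref.color == Color.WHITE) {
                    ref.color = Color.GRAY;
                    grayQueue.add(ref);
                }
            }
            obj.color = Color.BLACK;        // all references scanned
        }
        // 5. black objects survive; anything still white is garbage
    }
}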

This process is correct only if no other thread changes the reference relationships between objects while it runs. During concurrent marking, however, user threads keep running, so over-marks and missed marks can occur.

Over-marking
Suppose the GC is already traversing object B, and at that moment the user thread executes A.B = null, cutting the reference from A to B.

[Figure: A.B = null is executed while the GC is traversing B]
After A.B = null is executed, B, D, and E could all be reclaimed, but because B has already turned gray it is still treated as live and traversal continues.
The result is that this round of GC does not reclaim B, D, and E; they become floating garbage and are collected in the next GC.
This case can also be handled with a write barrier: add a barrier on the write to A.B, record that the reference to B was cut, and re-examine those records during remark, marking them white again.

Missed marking
Suppose the GC thread has traversed as far as B when the user thread performs the following operations:

B.D = null; // the reference from B to D is cut off
A.xx = D;   // a reference from A to D is established

[Figure: the reference from B to D is cut and a reference from A to D is established during marking]
The reference from B to D is severed and a reference from A to D is established.
The GC thread then continues: B no longer references D, and although A now references D, A has already been marked black and will not be traversed again. D therefore stays white and is finally reclaimed as garbage.
Clearly a missed mark is far more serious than an over-mark: floating garbage just waits for the next GC, whereas reclaiming an object that is still in use causes program errors.

A missed mark occurs only when both of the following conditions hold:

  1. Every reference from a gray object to the white object has been broken.
  2. At least one reference from a black object to the white object has been established.

Breaking either condition is enough to prevent missed marks.

Snapshot At The Beginning (SATB) and incremental update
SATB breaks the first condition: when a reference from a gray object to a white object is about to be deleted, that reference is recorded; when the concurrent scan finishes, the gray objects in those records are taken as roots and scanned again. It is as if the scan were performed against a snapshot of the object graph taken at the moment marking started, no matter which references were deleted afterwards.
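A minimal sketch of an SATB pre-write barrier (G1 takes this approach; the class and names are illustrative, not JVM source): before a reference field is overwritten, the old value is recorded so the deleted reference can be rescanned later.

class SatbSketch {

    private Object field;

    // references that existed when marking started, queued for rescanning
    static final java.util.Queue<Object> SATB_QUEUE = new java.util.ArrayDeque<>();

    void setField(Object newValue) {
        preWriteBarrier(this.field); // record the old value BEFORE overwriting it
        this.field = newValue;
    }

    private static void preWriteBarrier(Object oldValue) {
        if (oldValue != null) {
            SATB_QUEUE.add(oldValue); // rescanned later, as if the snapshot still held
        }
    }
}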

Incremental update breaks the second condition: when a reference from a black object to a white object is established, the new reference is recorded, and after the scan finishes, the black objects in those records are taken as roots and rescanned. In effect, as soon as a black object gains a reference to a white object, it turns gray again.

The approach CMS adopts is a write barrier plus incremental update, breaking the second condition.

Whenever a reference from a black object to a white object is established, the write barrier records that reference, and after the concurrent scan the recorded black objects are rescanned as roots.

The pseudo code is roughly as follows:

class A {

    private D d;

    public void setD(D d) {
        writeBarrier(d); // insert a write barrier
        this.d = d;
    }

    private void writeBarrier(D d) {
        // record the A -> D reference so it can be rescanned later
    }
}

4.3 Hybrid Collector

4.3.1 G1

G1 stands for "Garbage First". It became officially supported in JDK 7 and the default collector in JDK 9; it was designed to replace the CMS collector.

Since it replaces CMS, G1 is of course also a concurrent and parallel collector: user threads and GC threads can work at the same time, and its focus is likewise on application response time.

One of G1's biggest changes is that generations are only logical; physically the heap is no longer split into contiguous generations. Instead, the whole Java heap is divided into many equally sized, independent Regions, and each Region can serve as Eden space, Survivor space, or old-generation space as needed. G1 applies different strategies to Regions playing different roles.

All garbage collectors before G1 had a collection scope of either the whole young generation (Minor GC), the whole old generation (Major GC), or the whole Java heap (Full GC). G1 breaks out of this cage: it can assemble a Collection Set (CSet) from any set of Regions in the heap. The criterion is no longer which generation a Region belongs to but which Regions hold the most garbage and yield the highest collection value; those are collected first, which is where the name "Garbage First" comes from.

Although G1 retains the concept of generations, the young and old generations are no longer fixed, contiguous memory areas; each is just a collection of Regions, and their sizes are adjusted dynamically after every GC. The reason G1 can keep GC pauses within a target and build a predictable pause-time model is that it treats the Region as the smallest unit of collection: the memory reclaimed each time is an integral number of Regions, which avoids having to collect the whole Java heap at once.

G1 tracks the amount of garbage in each Region, computes each Region's collection value, and maintains a priority list in the background. Within the pause time the user allows, it collects the Regions with the most garbage first, which is how it reclaims as much memory as possible in a limited amount of time.

The entire G1 collection cycle can be roughly divided into the following stages:

  1. Young GC: when Eden is exhausted, a young-generation GC collects the Eden and Survivor Regions. Afterwards Eden is empty, at least one Survivor Region keeps the surviving objects, and the rest are either cleaned up or promoted to the old generation. The size of the young generation may be adjusted during this process.

  2. Concurrent marking cycle
    2.1 Initial mark: marks only the objects directly reachable from GC Roots; it piggybacks on a young GC and causes STW.
    2.2 Root region scan: the young GC triggered at initial mark empties Eden and moves survivors into Survivor Regions; the old-generation objects directly reachable from those Survivor Regions must now be scanned and marked. This step runs concurrently.
    2.3 Concurrent mark: similar to CMS, scans the whole heap for surviving objects and marks them, without triggering STW.
    2.4 Remark: triggers STW and fixes up references changed by user threads during concurrent marking.
    2.5 Exclusive cleanup: triggers STW, computes each Region's collection value, sorts the Regions, and identifies those eligible for mixed collection.
    2.6 Concurrent cleanup: identifies and frees completely empty Regions, without causing pauses.

  3. Mixed collection: the concurrent cleanup phase does reclaim some space, but the proportion is small. By then, however, G1 knows the collection value of every Region, so during mixed collection it preferentially collects the Regions with the most garbage. These Regions may belong to either the young or the old generation, hence "mixed" collection. Surviving objects in the collected Regions are moved to other Regions, which also avoids memory fragmentation.

Like CMS, G1 runs concurrently with user threads, which keep allocating memory; if reclamation cannot keep up with allocation, G1 too has to fall back to a Full GC when necessary to obtain more usable memory.

Use -XX:+UseG1GC to enable the G1 collector and -XX:MaxGCPauseMillis to set the target maximum pause time; G1 works toward that target. If pauses exceed it, G1 adjusts the young-to-old generation ratio, the heap size, the promotion age, and a series of other parameters to try to meet the goal.
-XX:ParallelGCThreads sets the number of GC threads used during parallel phases,
and -XX:InitiatingHeapOccupancyPercent specifies at what overall heap occupancy the concurrent marking cycle starts; the default is 45.
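Putting those flags together, a possible (purely illustrative) G1 configuration could look like this:

java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=45 -XX:ParallelGCThreads=8 -jar app.jar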

4.3.2 Future-oriented ZGC

ZGC is a low-latency garbage collector introduced in JDK 11. Its goal is to keep GC pauses within roughly ten milliseconds for any heap size, while affecting throughput as little as possible.

ZGC targets very large heaps, supporting up to 4 TB of heap space, and like G1 it uses a Region-based memory layout.

One of ZGC's most distinctive features is the use of colored pointers (Colored Pointer) to mark objects. Previously, when the JVM needed to store extra data used only by the GC or the JVM itself (GC age, biased-lock thread ID, hash code, and so on), it put that data in the object header. ZGC instead records marking information directly in the reference pointer to the object.

What is a colored pointer, and why can the pointer itself store data?
On a 64-bit system the theoretically addressable memory is 2 to the 64th power bytes, i.e. 16 EB. In practice nowhere near that much memory is used, so for performance and cost reasons both CPUs and operating systems impose their own limits: the AMD64 architecture supports only a 52-bit (4 PB) physical address space, Linux supports a 46-bit (64 TB) physical address space, and Windows supports a 44-bit (16 TB) physical address space.

Under Linux the high 18 bits of a 64-bit pointer cannot be used for addressing, and the remaining 46 bits can address at most 64 TB, which already far exceeds current server needs. ZGC therefore borrows from this 46-bit pointer width: it takes the top 4 of those bits to store four flags. From these flags the JVM can tell, directly from the pointer, the tri-color marking state of the referenced object, whether the object has entered the relocation set (i.e. has been moved), and whether it is reachable only through the finalize() method. This leaves the JVM 42 usable address bits, so the maximum heap ZGC can manage is 2 to the 42nd power bytes, i.e. 4 TB.
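A rough sketch of the idea, assuming the layout just described (42 address bits plus 4 flag bits in bits 42 to 45; the exact bit positions and flag names here are illustrative, not ZGC's internal constants):

final class ColoredPointerSketch {

    // Low 42 bits: the object's address (up to 4 TB of heap).
    static final long ADDRESS_MASK = (1L << 42) - 1;

    // Four metadata flags in bits 42..45 (illustrative positions).
    static final long MARKED0     = 1L << 42;
    static final long MARKED1     = 1L << 43;
    static final long REMAPPED    = 1L << 44;
    static final long FINALIZABLE = 1L << 45;

    static long address(long coloredPointer) {
        return coloredPointer & ADDRESS_MASK;     // strip the color bits
    }

    static boolean isRemapped(long coloredPointer) {
        return (coloredPointer & REMAPPED) != 0;  // already points at the relocated object
    }

    static long color(long pointer, long flag) {
        return (pointer & ADDRESS_MASK) | flag;   // set exactly one color flag
    }
}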

[Figure: ZGC colored pointer layout; 42 address bits plus 4 flag bits]
At the time of writing, ZGC is still experimental and there is not much material available; I will expand this section later.

Source: blog.csdn.net/adaizzz/article/details/130013037