JVM Garbage Collection (GC)

1. Why learn the garbage collection mechanism

In daily development we rarely think about reclaiming and releasing objects, because the JVM does this work for us. It is still very important for developers to understand the garbage collection mechanism: although the JVM reduces our workload, a poorly chosen or poorly tuned garbage collection setup often becomes a bottleneck in system performance.

2. What does GC do

  • What memory needs to be reclaimed?
  • When is it reclaimed?
  • How is it reclaimed?

3. How to determine which objects have died

1. Reference Counting

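  In Java, objects are only ever manipulated through references. Reference counting therefore keeps a counter for each object, incremented when a reference to it is created and decremented when a reference is dropped; an object whose count reaches zero can be reclaimed. Its weakness is that objects which reference each other in a cycle never reach zero.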
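The counting idea can be sketched with a toy model. This is only an illustration under stated assumptions: `RcObject` and `RefCountDemo` are hypothetical names, the counters are adjusted by hand, and nothing here reflects how a real JVM tracks objects. It shows that two objects pointing at each other keep a non-zero count after every outside reference is gone.

```java
// Toy reference counting; RcObject is a hypothetical class, not a JVM API.
class RcObject {
    final String name;
    int refCount = 0;   // number of references currently pointing at this object
    RcObject field;     // a single outgoing reference, enough for the demo

    RcObject(String name) { this.name = name; }

    // Point this object's field at target, adjusting both counters.
    void setField(RcObject target) {
        if (field != null) field.refCount--;
        field = target;
        if (target != null) target.refCount++;
    }

    boolean isCollectable() { return refCount == 0; }
}

class RefCountDemo {
    // Builds a two-object cycle, drops the external references,
    // and returns the remaining counts {a, b}.
    static int[] cycleCounts() {
        RcObject a = new RcObject("a");
        RcObject b = new RcObject("b");
        a.refCount++;    // a local variable refers to a
        b.refCount++;    // a local variable refers to b
        a.setField(b);   // a -> b
        b.setField(a);   // b -> a
        a.refCount--;    // the local variable for a goes away
        b.refCount--;    // the local variable for b goes away
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        // Both counts stay at 1, so neither object ever looks collectable.
        System.out.println("a=" + counts[0] + ", b=" + counts[1]);  // a=1, b=1
    }
}
```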

2. Root search algorithm (GC Roots Tracing)

  This approach avoids the circular-reference problem of reference counting. The search starts from a set of GC Roots objects and follows references outward; an object is unreachable if no path leads to it from any GC Root.
  An unreachable object is not necessarily reclaimed right away: for example, it may make itself reachable again in its finalize() method.
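A minimal sketch of the root search on a toy object graph follows. The graph representation (object names mapped to the names they reference) and the class `ReachabilityDemo` are assumptions made for illustration, not JVM internals. Note that the cycle between `c` and `d` is correctly found unreachable.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy reachability analysis; the graph is described by object names.
class ReachabilityDemo {
    // Returns every object reachable from the given roots.
    static Set<String> reachable(Map<String, List<String>> graph, List<String> roots) {
        Set<String> marked = new HashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String next : graph.getOrDefault(current, List.of())) {
                if (marked.add(next)) {   // add() returns false if already marked
                    pending.push(next);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        // c and d reference each other, but nothing reachable from the root points at them.
        Map<String, List<String>> graph = Map.of(
            "root", List.of("a"),
            "a", List.of("b"),
            "c", List.of("d"),
            "d", List.of("c"));
        Set<String> live = reachable(graph, List.of("root"));
        System.out.println(live.contains("b"));  // true
        System.out.println(live.contains("c"));  // false
    }
}
```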

4. Garbage collection algorithm

1. Mark-Sweep

  The algorithm has two phases, mark and sweep.
  Mark : Mark all objects that need to be reclaimed.
  Sweep : Reclaim the space occupied by the marked objects.
  Disadvantages : Memory becomes badly fragmented, so a large object may fail to find a contiguous free block even when enough memory is free in total.
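The fragmentation problem can be seen in a toy heap of equal-sized slots. This is a sketch under simplifying assumptions: the class `MarkSweepDemo` is hypothetical, every object occupies exactly one slot, and the sketch marks the survivors and sweeps everything else, which frees the same slots as marking the dead objects.

```java
// Toy heap of fixed-size slots; true means the slot holds an object.
class MarkSweepDemo {
    // Sweep: keep a slot occupied only if its object was marked live.
    static boolean[] sweep(boolean[] occupied, boolean[] live) {
        boolean[] after = new boolean[occupied.length];
        for (int i = 0; i < occupied.length; i++) {
            after[i] = occupied[i] && live[i];
        }
        return after;
    }

    static int totalFree(boolean[] heap) {
        int free = 0;
        for (boolean used : heap) {
            if (!used) free++;
        }
        return free;
    }

    // Longest run of adjacent free slots, i.e. the largest object that still fits.
    static int largestFreeRun(boolean[] heap) {
        int best = 0;
        int run = 0;
        for (boolean used : heap) {
            run = used ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] occupied = { true, true, true, true, true, true, true, true };
        boolean[] live     = { true, false, true, false, true, false, true, false };
        boolean[] after = sweep(occupied, live);
        System.out.println(totalFree(after));       // 4
        System.out.println(largestFreeRun(after));  // 1
    }
}
```

Half of the heap is free after the sweep, yet an object that needs two adjacent slots cannot be placed.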

2. Copying algorithm (Copying)

  An algorithm proposed to fix the memory fragmentation of the Mark-Sweep algorithm.
  Memory is divided into two blocks of equal size, and only one of them is used at a time. When that block is full, the surviving objects are copied to the other block and the used block is cleared in one step.
  Advantages : Simple to implement, efficient to run, and not prone to fragmentation.
  Disadvantages : Usable memory is cut to half of the original. The more objects survive, the more must be copied, so the efficiency of the Copying algorithm drops sharply when the survival rate is high.
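A minimal sketch of one copying collection, assuming a toy heap in which each slot holds one named object and `null` marks an empty slot. The class `CopyingDemo` and the way liveness is passed in as a set are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Set;

class CopyingDemo {
    // Copies the live objects of fromSpace into a fresh block of the same size,
    // packed from index 0, then clears fromSpace.
    static String[] copyLive(String[] fromSpace, Set<String> live) {
        String[] toSpace = new String[fromSpace.length];
        int next = 0;
        for (String obj : fromSpace) {
            if (obj != null && live.contains(obj)) {
                toSpace[next++] = obj;   // survivors end up contiguous
            }
        }
        Arrays.fill(fromSpace, null);    // the used block is cleared in one step
        return toSpace;
    }

    public static void main(String[] args) {
        String[] from = { "a", "b", "c", "d" };
        String[] to = copyLive(from, Set.of("a", "c"));
        System.out.println(Arrays.toString(to));    // [a, c, null, null]
        System.out.println(Arrays.toString(from));  // [null, null, null, null]
    }
}
```

The cost of the collection is proportional to the number of survivors copied, which is why the approach suits regions where few objects survive.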

3. Mark-Compact algorithm (Mark-Compact)

  The marking phase is the same as in the Mark-Sweep algorithm. After marking, live objects are moved toward one end of memory, and everything beyond the boundary of the live objects is then cleared.
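The compaction step can be sketched on the same kind of toy heap. The class `MarkCompactDemo` is hypothetical, and the marking result is simply passed in as a set of live object names.

```java
import java.util.Arrays;
import java.util.Set;

class MarkCompactDemo {
    // Slides live objects toward index 0 in place, clears everything past them,
    // and returns the boundary (index of the first free slot).
    static int compact(String[] heap, Set<String> live) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live.contains(heap[i])) {
                heap[free++] = heap[i];
            }
        }
        Arrays.fill(heap, free, heap.length, null);  // clear beyond the boundary
        return free;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e" };
        int boundary = compact(heap, Set.of("b", "d", "e"));
        System.out.println(Arrays.toString(heap));  // [b, d, e, null, null]
        System.out.println(boundary);               // 3
    }
}
```

Unlike the copying algorithm, no second block is needed, and the free space ends up in one contiguous run.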

4. Generational Collecting

  The approach taken by most JVMs today.
  Core idea : Divide memory into different regions according to how long objects live. Generally, the GC heap is divided into the old generation (Tenured/Old Generation) and the new generation (Young Generation).

  • Characteristics of the old generation : Only a small number of objects need to be reclaimed in each garbage collection.
  • Characteristics of the new generation : A large amount of garbage needs to be reclaimed in each garbage collection.
    Therefore, a different algorithm can be chosen for each generation.

5. Incremental collection algorithm

  With the collection algorithms above, the application is in a Stop-the-World state during every garbage collection: all application threads are suspended and wait for the collection to finish. If a collection takes too long, the application stays suspended for a long time, which seriously affects user experience or system stability. Incremental collection addresses this by alternating the garbage collection thread with the application threads. Each time, the garbage collection thread collects only a small area of memory and then hands control back to the user threads, and this repeats until the collection is complete. Because application code keeps running between these short collection steps, pause times are reduced; however, the cost of thread switching and context switching raises the total cost of garbage collection and lowers system throughput.

1) New Generation and Copying Algorithm

  At present, most JVMs use the copying algorithm for the new generation, because most objects there are reclaimed in every collection, so only a few survivors have to be copied. The new generation is usually not split 1:1, however.
  Generally, the new generation is divided into a larger Eden space and two smaller Survivor spaces (From Space, To Space). At any time the Eden space and one of the Survivor spaces are in use; during a collection, the surviving objects in those two spaces are copied to the other Survivor space.

2) Old Generation and Mark-Compact Algorithm

  The old generation reclaims only a small number of objects each time, so the Mark-Compact algorithm is used.

  • The permanent generation (Permanent Generation), which belongs to the method area, stores class data, constants, method descriptions, and so on. Collection of the permanent generation mainly reclaims obsolete constants and useless classes.
  • Objects are mainly allocated in the Eden Space of the new generation and in From Space (the Survivor space that currently holds objects); in a few cases they are allocated directly in the old generation.
  • When the Eden Space and From Space of the new generation run out of room, a GC occurs. After the GC, the surviving objects in Eden Space and From Space are moved to To Space, and Eden Space and From Space are then cleared.
  • If To Space does not have enough room for an object, the object is stored in the old generation.
  • After the GC, From Space and To Space swap roles: Eden Space and the former To Space are now in use, and the cycle repeats.
  • Each time an object survives a GC in the Survivor area, its age increases by 1. By default, objects whose age reaches 15 are moved to the old generation.
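The aging and promotion rules in the list above can be sketched as a single simulated collection. This is a toy model: `MinorGcDemo` and `Result` are hypothetical names, each object is reduced to its age, and To Space capacity is counted in objects rather than bytes. The threshold of 15 is the default named above (in HotSpot it is configured with `-XX:MaxTenuringThreshold`).

```java
import java.util.ArrayList;
import java.util.List;

class MinorGcDemo {
    static final int TENURING_THRESHOLD = 15;  // default age from the text

    static final class Result {
        final List<Integer> toSpace = new ArrayList<>();  // ages of objects now in To Space
        final List<Integer> oldGen = new ArrayList<>();   // ages of promoted objects
    }

    // survivorAges: ages of the live objects in Eden (age 0) and From Space.
    // toCapacity: how many objects To Space can hold in this toy model.
    static Result minorGc(List<Integer> survivorAges, int toCapacity) {
        Result result = new Result();
        for (int age : survivorAges) {
            int newAge = age + 1;  // the object survived one more collection
            if (newAge >= TENURING_THRESHOLD || result.toSpace.size() >= toCapacity) {
                result.oldGen.add(newAge);   // old enough, or To Space is full
            } else {
                result.toSpace.add(newAge);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = minorGc(List.of(0, 14, 3, 1), 2);
        System.out.println(r.toSpace);  // [1, 4]
        System.out.println(r.oldGen);   // [15, 2]
    }
}
```

In the example, the object of age 14 is promoted because it reaches the threshold, and the last object is promoted only because To Space is already full.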

Four reference types in Java

Strong reference

Soft reference

Weak reference

Phantom reference
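A short sketch of how the four reference types are created with the standard `java.lang.ref` classes. The class `ReferenceTypesDemo` is only an illustrative name; the comments summarize the usual behavior of each type. The checks are made while a strong reference is still held, so they do not depend on when a collection happens.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceTypesDemo {
    // Returns {soft still resolves, weak still resolves, phantom get() is null}.
    static boolean[] observe() {
        // Strong reference: the object is never collected while this is reachable.
        Object strong = new Object();
        // Soft reference: cleared only when the JVM is running low on memory.
        SoftReference<Object> soft = new SoftReference<>(strong);
        // Weak reference: cleared at the next GC once no strong reference remains.
        WeakReference<Object> weak = new WeakReference<>(strong);
        // Phantom reference: get() always returns null; the reference is enqueued
        // on the queue after the object has been found unreachable.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        return new boolean[] {
            soft.get() == strong,
            weak.get() == strong,
            phantom.get() == null
        };
    }

    public static void main(String[] args) {
        boolean[] seen = observe();
        System.out.println(seen[0] + " " + seen[1] + " " + seen[2]);  // true true true
    }
}
```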

5. GC Generational Collection Algorithm VS Partition Collection Algorithm

1. Generational collection algorithm

  Current mainstream JVMs use the "generational collection" (Generational Collection) algorithm, which divides memory into several blocks according to object lifetime, such as the new generation, old generation, and permanent generation in the JVM. The most suitable GC algorithm can then be chosen for each generation.

1) New Generation - Copying Algorithm

  In every garbage collection a large number of objects are found dead and only a few survive, so the copying algorithm is used: the collection costs only the copying of the few survivors.

2) Old Generation - Mark-Sweep or Mark-Compact Algorithm

  Because objects have a high survival rate and there is no extra space to guarantee their allocation, the "mark-sweep" or "mark-compact" algorithm must be used. No copying into another space is needed; memory is freed in place.

2. Partition collection algorithm

  The partitioning algorithm divides the entire heap into separate contiguous small spaces, each of which is used and reclaimed independently. The advantage is that the collector can control how many small spaces are reclaimed at one time: according to the target pause time, it reclaims a suitable number of small regions (rather than the entire heap) in each collection, thereby reducing the pause caused by a single GC.

6. GC Garbage Collector

  The Java heap is divided into the new generation and the old generation. The new generation mainly uses the copying and mark-sweep garbage collection algorithms; the old generation mainly uses the mark-compact garbage collection algorithm. Accordingly, the Java virtual machine provides several different garbage collectors.
(Figure, image not available: the garbage collectors and the lines connecting those that can be paired. Source: Garbage Collector)

We can draw several conclusions from the above figure:

  • New generation collectors : Serial, ParNew, Parallel Scavenge;
    Old generation collectors : Serial Old (MSC), Parallel Old, CMS;
    Whole-heap collector : G1
  • A line between two garbage collectors indicates that they can be used together, giving the following combinations:
    Serial/Serial Old, Serial/CMS, ParNew/Serial Old, ParNew/CMS, Parallel Scavenge/Serial Old, Parallel Scavenge/Parallel Old, G1;
  • Serial collectors : Serial, Serial Old
    Parallel collectors : Parallel Scavenge, Parallel Old
    Concurrent collectors : CMS, G1
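To see which collectors a running JVM is actually using, the standard `java.lang.management` API can be queried. This is a small sketch; the class name `ActiveCollectors` is arbitrary, and the bean names printed depend on the JVM vendor, version, and the collector selected at startup, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    // Names of the garbage collectors registered in this JVM.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated collection time in milliseconds.
            System.out.println(gc.getName()
                + ": count=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Running the same program with different HotSpot selection flags such as `-XX:+UseSerialGC`, `-XX:+UseParallelGC`, or `-XX:+UseG1GC` typically shows a different pair of collector names.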

Serial garbage collector (single thread, copy algorithm)

  Serial is the most basic garbage collector and uses the copying algorithm. It is a single-threaded collector: it not only uses a single CPU or thread to perform garbage collection, but must also suspend all other worker threads until the collection ends.
  Although it must suspend all other worker threads while collecting, the Serial collector is simple and efficient. In a constrained single-CPU environment it has no thread interaction overhead and achieves the highest single-threaded collection efficiency. For this reason, Serial is still the default new generation collector for a Java virtual machine running in Client mode.

ParNew Garbage Collector (Serial+Multithreading)

  The ParNew garbage collector is essentially a multi-threaded version of the Serial collector. It also uses the copying algorithm, and apart from collecting with multiple threads it behaves exactly like the Serial collector: all other worker threads must be suspended while ParNew is collecting.
  By default, ParNew starts as many collection threads as there are CPUs; the number of collector threads can be set with the -XX:ParallelGCThreads parameter.
  Although ParNew is almost identical to the Serial collector except for multithreading, it is the default new generation collector for many Java virtual machines running in Server mode.

Parallel Scavenge Collector

  The Parallel Scavenge collector is also a new generation collector. It uses the copying algorithm and is multi-threaded. Whereas the collectors introduced earlier focus on minimizing the pause time of user threads during garbage collection, Parallel Scavenge aims at a controllable throughput (Throughput), defined as the share of CPU time spent running user code: throughput = time running user code / (time running user code + garbage collection time). High throughput uses CPU time most efficiently and completes the program's computation as soon as possible, so it mainly suits background tasks without much interaction. The adaptive adjustment strategy is another important difference between the Parallel Scavenge collector and the ParNew collector.

  • The following two parameters can be used for precise control:
      -XX:MaxGCPauseMillis sets the maximum garbage collection pause time
      -XX:GCTimeRatio sets the throughput target as a ratio: with a value of N, garbage collection should take at most 1/(1+N) of the total time
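The throughput formula above, and the way `-XX:GCTimeRatio` maps to a share of time spent in garbage collection, can be worked through in a few lines. The class and method names are illustrative; the 1/(1+N) relation is the one documented for HotSpot's parallel collector.

```java
class ThroughputDemo {
    // throughput = user code time / (user code time + garbage collection time)
    static double throughput(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    // Share of total time that GC may take for -XX:GCTimeRatio=N, i.e. 1 / (1 + N).
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 99 ms of user code and 1 ms of GC give 99% throughput.
        System.out.println(throughput(99, 1));  // 0.99
        // GCTimeRatio=19 targets at most 5% of total time in GC.
        System.out.println(maxGcFraction(19));  // 0.05
        // GCTimeRatio=99 targets at most 1% of total time in GC.
        System.out.println(maxGcFraction(99));  // 0.01
    }
}
```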

Serial Old Collector

  Serial Old is the old generation version of the Serial collector. It is also a single-threaded collector and uses the mark-compact algorithm. It is the default old generation collector for a Java virtual machine running in Client mode.
In Server mode, it has two main purposes:

  • Used together with the new generation Parallel Scavenge collector in versions prior to JDK 1.5;
  • As the fallback for the CMS collector in the old generation.

  (Figure, image not available: the garbage collection process of the new generation Serial collector and the old generation Serial Old collector.)

Parallel Old Collector

  Parallel Old is the old generation version of the Parallel Scavenge collector, using multithreading and the "mark-compact" algorithm.
  1. Acts on the old generation
  2. Multi-threaded
  3. Uses the mark-compact algorithm
  In addition to these features, the key point is that it can be paired with the new generation collector Parallel Scavenge to maximize throughput.

CMS collector

  CMS stands for Concurrent Mark Sweep. As the name implies, it is concurrent and uses the mark-sweep algorithm. This collector is also called the Concurrent Low Pause Collector.
  It was a landmark garbage collector, because it truly lets the garbage collection threads and the user threads work (mostly) at the same time. With the Serial collector's Stop The World, you cannot throw garbage on the floor while your mother is cleaning the room; with CMS, you can keep throwing garbage while she cleans.
  1. Acts on the old generation
  2. Multi-threaded
  3. Uses the mark-sweep algorithm
  The whole process is divided into the following 4 steps:
  1. Initial mark (CMS initial mark): marks only the objects directly referenced by GC Roots. This is very fast, but requires "Stop The World".
  2. Concurrent mark (CMS concurrent mark): performs GC Roots Tracing, that is, starts from the objects marked in the initial mark phase and recursively marks everything reachable from them.
  3. Remark (CMS remark): corrects the mark records of objects whose marks changed because the user program kept running during concurrent marking. This requires "Stop The World"; the pause is generally longer than the initial mark but much shorter than concurrent marking.
  4. Concurrent sweep (CMS concurrent sweep): reclaims the objects that the marking phases found to be dead.
  The most time-consuming steps are the second (concurrent mark) and the fourth (concurrent sweep), and in both of them the collector threads run together with the user threads. Overall, therefore, CMS garbage collection executes concurrently with the user threads.
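The four steps and their pause behavior can be summarized as a small data model. This is purely descriptive: `CmsPhase` and `CmsPhasesDemo` are hypothetical names and do not correspond to any JVM API.

```java
import java.util.ArrayList;
import java.util.List;

// The four CMS steps, each tagged with whether it stops user threads.
enum CmsPhase {
    INITIAL_MARK(true),
    CONCURRENT_MARK(false),
    REMARK(true),
    CONCURRENT_SWEEP(false);

    final boolean stopTheWorld;

    CmsPhase(boolean stopTheWorld) {
        this.stopTheWorld = stopTheWorld;
    }
}

class CmsPhasesDemo {
    // Phases during which user threads are paused.
    static List<CmsPhase> pausingPhases() {
        List<CmsPhase> result = new ArrayList<>();
        for (CmsPhase phase : CmsPhase.values()) {
            if (phase.stopTheWorld) {
                result.add(phase);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (CmsPhase phase : CmsPhase.values()) {
            System.out.println(phase + ": "
                + (phase.stopTheWorld ? "pauses user threads" : "runs alongside user threads"));
        }
        System.out.println(pausingPhases());  // [INITIAL_MARK, REMARK]
    }
}
```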
  Disadvantages:
  ① Sensitive to CPU resources
  The concurrent phases occupy part of the CPU, which slows down the application and reduces total throughput.
  ② Produces floating garbage
  Because user threads are still running during the concurrent sweep phase, the garbage they generate cannot be handled in the current collection and has to wait for the next GC. This garbage is called "floating garbage".
  ③ Produces memory fragmentation
  Because the algorithm is mark-sweep, the heap becomes fragmented.

G1 collector

  G1 is at the forefront of collector technology. It aims for low-pause memory reclamation while keeping throughput high. It was introduced in JDK 7 and has been the default garbage collector since JDK 9.
  Unlike the collectors introduced earlier, G1 does not split the heap into one contiguous new generation and one contiguous old generation; it acts on the whole heap. It divides the entire Java heap into multiple independent regions (Region) of fixed size and tracks how much garbage has accumulated in each. It maintains a priority list in the background and, within the allowed collection time, collects the regions with the most garbage first, which lets G1 achieve the highest collection efficiency within a limited time.
  Compared with the CMS collector described earlier, it has several significant improvements:
  ① It uses the mark-compact algorithm, so it does not produce memory fragmentation.
  ② It can control the pause time precisely: users can explicitly specify that, within a time slice of M milliseconds, no more than N milliseconds are spent on garbage collection.
  ③ It acts on the entire Java heap: G1 is a collector for the whole heap rather than for a single generation.
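The idea of collecting the regions with the most garbage first, within an allowed pause time, can be sketched as a greedy selection. This is a simplified illustration and not G1's actual policy: the `Region` fields, the per-region cost estimates, and the greedy rule are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RegionSelectionDemo {
    // Hypothetical region summary: id, reclaimable bytes, estimated collection cost.
    static final class Region {
        final int id;
        final long garbageBytes;
        final double collectMillis;

        Region(int id, long garbageBytes, double collectMillis) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.collectMillis = collectMillis;
        }
    }

    // Greedy choice: most garbage first, skipping any region that would
    // push the total estimated cost over the pause budget.
    static List<Integer> choose(List<Region> regions, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingLong((Region r) -> r.garbageBytes).reversed());
        List<Integer> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.collectMillis <= pauseBudgetMillis) {
                chosen.add(r.id);
                spent += r.collectMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
            new Region(0, 100, 4.0),
            new Region(1, 900, 6.0),
            new Region(2, 500, 5.0),
            new Region(3, 50, 1.0));
        // With a 10 ms budget: region 1 (6 ms) fits, region 2 would exceed it,
        // region 0 (4 ms) fills the budget exactly, region 3 no longer fits.
        System.out.println(choose(regions, 10.0));  // [1, 0]
    }
}
```

In HotSpot, the pause goal itself is set with `-XX:MaxGCPauseMillis` when G1 is enabled via `-XX:+UseG1GC`.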

Origin blog.csdn.net/qq_43010602/article/details/112239317