Java's garbage collection mechanism explained in detail: from beginner to expert, and if you still can't learn it, come and hit me!

First of all, we need to know which memory needs to be reclaimed.

Which memory needs to be reclaimed

Among the parts of the Java runtime data area, the program counter, the virtual machine stack, and the native method stack are born and die with their thread, so their memory is reclaimed naturally when a method returns or a thread ends. The heap and the method area are different, because they carry significant uncertainty. Different implementation classes of an interface may need different amounts of memory, and different conditional branches of a method may also need different amounts. Only at runtime can we know which objects the program will create and how many, so the allocation and reclamation of this memory is dynamic.

The garbage collector is concerned with how to manage this memory in the heap and the method area. When we talk about memory allocation and reclamation, we usually mean only this part.

Heap: what counts as garbage

Reference counting algorithm:

Attach a reference counter to each object:

  • Whenever a new reference to the object is created, the counter is incremented by one;
  • Whenever a reference to it becomes invalid, the counter is decremented by one.

An object whose counter is zero at any moment can no longer be used.
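To make the counting rule concrete, here is a minimal Java sketch of a single reference-counted object. The class CountedObject and its methods addRef and release are hypothetical illustrations; the JVM neither exposes nor uses anything like them.

```java
/** Toy reference counter (hypothetical model; HotSpot does not track objects this way). */
class CountedObject {
    private int refCount = 0;

    /** A new reference now points at this object. */
    void addRef() {
        refCount++;
    }

    /** A reference to this object has gone away. */
    void release() {
        if (refCount == 0) {
            throw new IllegalStateException("count is already zero");
        }
        refCount--;
    }

    /** Zero means nothing refers to the object, so it can be reclaimed. */
    boolean isGarbage() {
        return refCount == 0;
    }

    int count() {
        return refCount;
    }
}
```

Two references followed by one release leave the count at 1; the second release brings it to 0 and the object becomes reclaimable.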

However, in the Java world, at least the mainstream Java virtual machines do not use reference counting to manage memory. One reason is that simple reference counting can hardly handle circular references between objects.

To give a simple example: objects objA and objB both have a field named instance, and we assign objA.instance = objB and objB.instance = objA. Apart from that, the two objects have no other references, so in fact neither can ever be accessed again. But because they refer to each other, their reference counts are not zero, and the reference counting algorithm cannot reclaim them.
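The objA/objB scenario can be replayed in a toy reference-counting model. RcNode and setInstance are hypothetical names; the sketch only simulates what a pure reference counter would do and is not how any JVM behaves.

```java
/** Hypothetical reference-counted node mirroring the objA/objB example. */
class RcNode {
    int refCount;
    RcNode instance; // the field from the example

    /** Stores target into this.instance, adjusting counts as a reference-counting runtime would. */
    void setInstance(RcNode target) {
        if (target != null) target.refCount++;     // new reference to target
        if (instance != null) instance.refCount--; // old reference is overwritten
        instance = target;
    }

    /** Builds the cycle, drops both "local variables", and returns the counts left behind. */
    static int[] cycleDemo() {
        RcNode a = new RcNode();
        RcNode b = new RcNode();
        a.refCount++;     // local variable objA points at a
        b.refCount++;     // local variable objB points at b
        a.setInstance(b); // objA.instance = objB, so b.refCount == 2
        b.setInstance(a); // objB.instance = objA, so a.refCount == 2
        a.refCount--;     // objA = null
        b.refCount--;     // objB = null
        // In the modeled program nothing reaches a or b any more (the Java locals a/b are
        // kept only to read the counts), yet each count is still 1, so they are never freed.
        return new int[] {a.refCount, b.refCount};
    }
}
```

cycleDemo() returns {1, 1}: neither count ever reaches zero even though the modeled program can no longer reach either object, which is exactly the leak described above.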

Reachability analysis algorithm:

The memory management subsystems of today's mainstream commercial programming languages use the reachability analysis algorithm to determine whether an object is alive.

The basic idea of this algorithm is to use a set of root objects called "GC Roots" as the starting nodes. From these nodes, the collector searches downward along reference relationships, and the path traveled during the search is called a "reference chain" (Reference Chain). If no reference chain connects an object to the GC Roots, or in graph-theory terms the object is unreachable from the GC Roots, then the object can no longer be used.

As shown in the figure below, although object 5, object 6, and object 7 are related to each other, they are not reachable from the GC Roots, so they are judged to be reclaimable:

[Figure: objects 5, 6 and 7 reference one another but have no reference chain to the GC Roots]
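Reachability analysis itself is a graph traversal. The sketch below marks everything reachable from a set of roots using an explicit stack and reports the rest as garbage. Integer ids stand in for heap objects, and the links among objects 1 to 4 are an assumed layout; only the 5-6-7 cycle comes from the figure description.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Reachability analysis over a toy object graph (ids stand in for heap objects). */
class ReachabilityDemo {
    /** Returns the ids that no reference chain from the GC Roots reaches. */
    static Set<Integer> unreachable(Map<Integer, List<Integer>> graph, Collection<Integer> roots) {
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            int id = work.pop();
            if (!marked.add(id)) continue; // already visited
            for (int ref : graph.getOrDefault(id, List.of())) work.push(ref);
        }
        Set<Integer> garbage = new TreeSet<>(graph.keySet());
        garbage.removeAll(marked);         // whatever was never marked is unreachable
        return garbage;
    }

    public static void main(String[] args) {
        // 5, 6 and 7 reference each other in a cycle, but no root leads to them.
        Map<Integer, List<Integer>> graph = Map.of(
                1, List.of(2, 3),
                2, List.of(),
                3, List.of(4),
                4, List.of(),
                5, List.of(6),
                6, List.of(7),
                7, List.of(5));
        System.out.println(unreachable(graph, List.of(1))); // [5, 6, 7]
    }
}
```

Unlike reference counting, the cycle does not protect objects 5, 6 and 7: they are never marked, so all three are reported.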

Which objects can act as GC Roots

In the Java technology stack, the objects that are fixed members of the GC Roots include the following:

  • Objects referenced in the virtual machine stack (the local variable table in each stack frame), such as the parameters, local variables, and temporaries used by the methods currently on each thread's call stack.
  • Objects referenced by class static fields in the method area, such as a static variable of reference type in a Java class.
  • Objects referenced by constants in the method area, such as references in the string constant pool (String Table).
  • Objects referenced by JNI (what we usually call native methods) in the native method stack.
  • References internal to the Java virtual machine, such as the Class objects corresponding to the primitive types, some resident exception objects (such as NullPointerException and OutOfMemoryError), and the system class loader.
  • All objects held by synchronization locks (the synchronized keyword).
  • JMXBeans that reflect the internal state of the Java virtual machine, callbacks registered in JVMTI, local code caches, and so on.
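Two of the root kinds above can be observed from ordinary Java code with java.lang.ref.WeakReference, which does not keep its referent alive by itself. This is a sketch assuming Java 9 or later (for Reference.reachabilityFence); the class and method names are made up for illustration. Because System.gc() is only a hint, the sketch checks only the deterministic direction: an object held by a static field or by a live local variable must survive. Whether an unrooted object is actually cleared after System.gc() is not guaranteed, so that case is deliberately not asserted.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class GcRootsDemo {
    // GC root kind: an object referenced by a class static field.
    static Object staticField = new Object();

    /** True if the object held by the static field is still alive after a GC request. */
    static boolean staticRootKeepsObjectAlive() {
        WeakReference<Object> weak = new WeakReference<>(staticField); // does not keep it alive
        System.gc(); // only a hint, but it can never free a rooted object
        return weak.get() != null;
    }

    /** Same check for an object referenced from a local variable of the running method. */
    static boolean localVariableRootKeepsObjectAlive() {
        Object local = new Object(); // GC root kind: local variable in this stack frame
        WeakReference<Object> weak = new WeakReference<>(local);
        System.gc();
        boolean alive = weak.get() != null;
        Reference.reachabilityFence(local); // keeps `local` strongly reachable up to this point
        return alive;
    }
}
```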

Method area: what counts as garbage

Garbage collection in the method area mainly reclaims two things: obsolete constants and types that are no longer used.

Reclaiming obsolete constants is very similar to reclaiming objects in the Java heap.

Take the reclamation of a literal in the constant pool as an example:
Suppose the string "java" has entered the constant pool, but no string object in the running system has the value "java". In other words, no string object references the "java" constant in the constant pool, and nothing else in the virtual machine references this literal either. If memory reclamation happens at this point and the garbage collector judges it necessary, the system will clean the "java" constant out of the constant pool. Symbolic references to other classes (interfaces), methods, and fields in the constant pool are handled similarly.

Determining whether a constant is "obsolete" is relatively simple: just check whether anything still references it. The conditions for deciding that a type is a "class that is no longer used" are much stricter.

All three of the following conditions must hold at the same time:

  • All instances of the class have been reclaimed, that is, the Java heap contains no instances of the class or of any of its subclasses.
  • The class loader that loaded the class has been reclaimed. Unless the scenario is carefully designed to allow class loaders to be replaced, as in OSGi or JSP reloading, this condition is usually hard to meet.
  • The java.lang.Class object for the class is not referenced anywhere, so the class's methods cannot be accessed through reflection from anywhere.

How to reclaim garbage

For the derivation of the generational collection theory and the garbage collection algorithms, see another blog post of mine: https://yangyongli.blog.csdn.net/article/details/126473167

Through the continuous evolution described in that article, GC has gradually arrived at the form we use today:

Summary of Garbage Collection Algorithms

Mark-sweep (mark-clear) algorithm (used for the old generation, but now largely abandoned)

This is the earliest and most basic algorithm: first mark the objects to be reclaimed, then, once marking is complete, reclaim all marked objects in one pass.

Disadvantages:
  1. Unstable execution efficiency. As the number of objects grows, the amount of marking and sweeping grows too, and efficiency drops.
  2. Fragmented memory. When reclaimed and retained objects are interleaved, the free space is broken into fragments. Even if the total free space is sufficient, a large new object may not find a contiguous block, so space is wasted.
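A toy model makes the fragmentation problem visible. Here the heap is an array of cells holding object ids (0 means free), the marking result is passed in as a set of live ids, and the sweep frees dead cells in place without moving anything. The cell layout is a hypothetical simplification, not HotSpot's heap format; Set.of needs Java 9 or later.

```java
import java.util.Arrays;
import java.util.Set;

/** Toy mark-sweep over a heap of cells (hypothetical model). */
class MarkSweepDemo {
    /** Sweep phase: every cell whose object id is not marked live becomes free (0). */
    static int[] sweep(int[] heap, Set<Integer> marked) {
        int[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != 0 && !marked.contains(result[i])) {
                result[i] = 0; // reclaimed in place; live objects are NOT moved
            }
        }
        return result;
    }

    static int totalFree(int[] heap) {
        int free = 0;
        for (int cell : heap) if (cell == 0) free++;
        return free;
    }

    /** Longest run of free cells, i.e. the largest object that could still be allocated. */
    static int largestFreeBlock(int[] heap) {
        int best = 0, run = 0;
        for (int cell : heap) {
            run = (cell == 0) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        // Objects 1..4 are interleaved; 1 and 3 are live, 2 and 4 are garbage.
        int[] heap = {1, 1, 2, 2, 3, 3, 4, 4};
        int[] after = sweep(heap, Set.of(1, 3));
        System.out.println(Arrays.toString(after)); // [1, 1, 0, 0, 3, 3, 0, 0]
        System.out.println(totalFree(after) + " free cells, largest block " + largestFreeBlock(after));
    }
}
```

After sweeping, 4 cells are free but the largest contiguous block is only 2, so an object needing 3 cells cannot be allocated even though there is "enough" space in total.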

Mark-copy algorithm (commonly used in the new generation now)

Its principle is to divide the memory into two halves and use only one at a time. When that half is used up, the surviving objects are copied into the other half, and the used half is cleared in one go. If most objects are reclaimed, only a small fraction needs to be copied, so the algorithm suits the new generation (it is fast and efficient).

Disadvantage: the cost is obvious, since half of the space is wasted.
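The copying idea can be sketched with Cheney's classic breadth-first scheme, named here because it is one well-known way to implement semispace copying, not necessarily what any particular JVM does. The Obj class and collect method are hypothetical. A set of already-copied ids stands in for the forwarding pointers a real collector would install, and references stay as ids instead of being rewritten to new addresses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy semispace copying collector in the style of Cheney's algorithm (illustrative model). */
class SemispaceCopyDemo {
    /** A heap object: an id plus the ids of the objects it references. */
    static final class Obj {
        final int id;
        final List<Integer> refs;

        Obj(int id, List<Integer> refs) {
            this.id = id;
            this.refs = refs;
        }
    }

    /**
     * Copies everything reachable from the roots out of fromSpace into a fresh toSpace.
     * toSpace doubles as the work queue: it is scanned front to back while being appended to.
     * Anything not copied is garbage and disappears when fromSpace is discarded.
     */
    static List<Obj> collect(Map<Integer, Obj> fromSpace, List<Integer> roots) {
        List<Obj> toSpace = new ArrayList<>();
        Set<Integer> forwarded = new HashSet<>(); // stands in for forwarding pointers
        for (int root : roots) copy(root, fromSpace, toSpace, forwarded);
        for (int scan = 0; scan < toSpace.size(); scan++) {
            for (int ref : toSpace.get(scan).refs) copy(ref, fromSpace, toSpace, forwarded);
        }
        return toSpace; // survivors are contiguous; the whole old half can be reused at once
    }

    private static void copy(int id, Map<Integer, Obj> from, List<Obj> to, Set<Integer> forwarded) {
        if (forwarded.add(id)) to.add(from.get(id)); // copy each object exactly once
    }
}
```

With roots [1] and a heap where 1 references 2 while 3 and 4 reference each other, only objects 1 and 2 are copied; the 3/4 cycle is simply left behind in from-space, and the cost of the collection depends only on the number of survivors.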

Later, exploiting the fact that most objects die young, a better strategy than splitting the space in half was derived: "Appel-style" collection. It divides the new generation into one larger Eden space and two smaller Survivor spaces. Allocation uses Eden plus one Survivor; at collection time the surviving objects are copied into the other Survivor, and Eden and the used Survivor are then cleared in one go. The HotSpot virtual machine's default ratio is Eden:Survivor = 8:1, so 90% of the new generation (Eden's 80% plus one Survivor's 10%) is usable each time, which greatly reduces wasted space. A problem remains, though: suppose the new generation has 100MB in total and 20MB of objects survive a collection. Clearly the 10MB Survivor space cannot hold them all, and nobody can guarantee this will never happen. For this case, Appel-style collection has an "escape hatch": the memory allocation guarantee.

Memory allocation guarantee: the surviving objects that the Survivor space cannot hold are placed directly into the old generation. In the example above, the survivors that do not fit into the 10MB Survivor space go straight into the old generation, a special case of promotion to the old generation.
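The 8:1:1 arithmetic and the overflow case can be checked with a few lines of Java. The ratio parameter plays the role of HotSpot's -XX:SurvivorRatio (Eden size divided by the size of one Survivor, default 8); everything else is a simplified, hypothetical model that promotes only the part that does not fit and ignores age-based promotion.

```java
/** Hypothetical sizing model for an Eden + two Survivor new generation (sizes in MB). */
class YoungGenSizing {
    /** Eden size when Eden:Survivor = ratio:1 and there are two Survivors. */
    static long edenMb(long youngMb, int ratio) {
        return youngMb * ratio / (ratio + 2);
    }

    /** Size of one Survivor space. */
    static long survivorMb(long youngMb, int ratio) {
        return youngMb / (ratio + 2);
    }

    /** Usable per cycle: Eden plus one Survivor (the other Survivor stays empty). */
    static long usableMb(long youngMb, int ratio) {
        return edenMb(youngMb, ratio) + survivorMb(youngMb, ratio);
    }

    /** Survivors that do not fit into the empty Survivor and go to the old generation. */
    static long promotedMb(long survivingMb, long youngMb, int ratio) {
        return Math.max(0, survivingMb - survivorMb(youngMb, ratio));
    }

    public static void main(String[] args) {
        System.out.println(edenMb(100, 8) + " " + survivorMb(100, 8) + " " + usableMb(100, 8)); // 80 10 90
        System.out.println(promotedMb(20, 100, 8)); // 10
    }
}
```

For a 100MB new generation this gives Eden 80MB, each Survivor 10MB and 90MB usable; if 20MB survive, 10MB have to be promoted through the allocation guarantee.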

Mark-compact (mark-collation) algorithm (commonly used in the old generation today)

Unlike mark-sweep, after the surviving objects are marked, the collector does not sweep in place. Instead it moves all surviving objects toward one end of the space and then clears the memory beyond that boundary, which solves the problem of space fragmentation.

Given the nature of the old generation, only a small fraction of objects is reclaimed each time: a large number of live objects remain and must be moved, and every reference to them must be updated, which puts a heavy load on the system. Moreover, these moves must be done while user threads are paused. This is the "Stop The World" we often hear about.
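For contrast with mark-sweep, the same kind of toy heap (an array of cells holding object ids, 0 meaning free) can be compacted instead. This hypothetical sketch slides live objects toward index 0 in their original order and returns a forwarding table from old index to new index, which is the information a real collector needs to update every reference to a moved object, the costly part described above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy sliding mark-compact over a heap of cells (hypothetical model). */
class MarkCompactDemo {
    /**
     * Slides every marked (live) cell toward the start of the heap, keeping order, and
     * returns oldIndex -> newIndex, the table used to fix up references afterwards.
     */
    static Map<Integer, Integer> compact(int[] heap, Set<Integer> marked) {
        Map<Integer, Integer> forwarding = new LinkedHashMap<>();
        int dest = 0; // next destination cell; never ahead of the scan position
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0 && marked.contains(heap[i])) {
                forwarding.put(i, dest);
                heap[dest++] = heap[i];
            }
        }
        Arrays.fill(heap, dest, heap.length, 0); // everything past the boundary is one free block
        return forwarding;
    }

    public static void main(String[] args) {
        int[] heap = {1, 1, 2, 2, 3, 3, 4, 4};
        Map<Integer, Integer> fwd = compact(heap, Set.of(1, 3));
        System.out.println(Arrays.toString(heap) + " forwarding " + fwd);
        // [1, 1, 3, 3, 0, 0, 0, 0] forwarding {0=0, 1=1, 4=2, 5=3}
    }
}
```

Starting from [1, 1, 2, 2, 3, 3, 4, 4] with objects 1 and 3 live, the heap ends up with the same 4 free cells that mark-sweep would leave, but now as one contiguous block.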

There is also a compromise approach: use mark-sweep most of the time and tolerate some fragmentation, and only when fragmentation becomes bad enough to affect object allocation, run mark-compact once to obtain contiguous space. The CMS collector, which is based on mark-sweep, takes this approach.


Now let's look at how the JVM actually performs collection.

Types of JVM GC

The common GC types in the JVM are Minor GC, Major GC, and Full GC.

In the HotSpot VM implementation, GC is divided into two kinds according to the area collected:

  • One is partial collection (Partial GC)
  • One is full heap collection (Full GC)

Partial collection (Partial GC): garbage collection that does not cover the entire Java heap. It is divided into:

  • New generation collection (Minor GC/Young GC): garbage collection of the new generation only
  • Old generation collection (Major GC/Old GC): garbage collection of the old generation only (note that the term "Major GC" is used inconsistently; some sources use it to mean Full GC)
  • Mixed collection (Mixed GC): garbage collection of the entire new generation plus part of the old generation; currently only G1 GC does this

Full heap collection (Full GC): garbage collection of the entire Java heap and the method area

Note: when the JVM performs GC, it does not collect all areas (new generation, old generation, method area) together every time. Most of the time, only the new generation is collected.
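To see which collectors a running HotSpot JVM actually uses, and how often each has run, the standard java.lang.management API can be queried. The class name is ours; the MXBean calls are from the JDK. The names reported depend on the selected collector (G1, for example, reports separate young and old generation collectors), so any particular output is environment-specific.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcBeansDemo {
    /** Names of the collectors exposed by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount()/getCollectionTime() return -1 if the JVM does not track them
            System.out.printf("%s: %d collections, %d ms, pools=%s%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime(),
                    String.join(",", gc.getMemoryPoolNames()));
        }
    }
}
```

The memory pools listed for each collector show which areas (Eden, Survivor, old generation) that collector is responsible for.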

GC trigger mechanism

Young generation GC (Minor GC) trigger mechanism

Trigger mechanism:

  • Minor GC is triggered when the young generation runs out of space. Here, running out of space means that the Eden area is full; a full Survivor area does not trigger GC. (Every Minor GC cleans up the memory of the young generation.)

Because most Java objects die young, Minor GC is very frequent and generally fast. Minor GC causes STW (Stop The World): it suspends the other user threads, which resume running only after the collection completes.

Old generation GC (Major GC/Full GC) trigger mechanism

This is GC that happens in the old generation. When objects disappear from the old generation, we say that a "Major GC" or "Full GC" has happened.

A Major GC is usually accompanied by at least one Minor GC (but not always: the collection strategy of the Parallel Scavenge collector includes a policy that chooses to run a Major GC directly).

Trigger mechanism:

  • When the old generation runs short of space, the JVM first tries to trigger a Minor GC; if space is still insufficient afterwards, it triggers a Major GC.

PS:
Major GC is generally more than 10 times slower than Minor GC, and its STW pause is longer.
If memory is still insufficient after a Major GC, an OOM (OutOfMemoryError) is reported.

Full GC trigger mechanism

Full GC is triggered in the following five situations:

  • Calling System.gc(): this suggests that the system run a Full GC, but the JVM is not obliged to do so
  • Insufficient space in the old generation
  • Insufficient space in the method area
  • After Minor GCs, the average size of the objects promoted into the old generation is larger than the old generation's available memory
  • When objects are copied from Eden and the From space to the To space, an object larger than the To space's available memory is transferred to the old generation, and the old generation's available memory is smaller than the object (and if there is still not enough room after the GC? Then, naturally, an OOM error is thrown)

In addition, pay special attention: Full GC should be avoided as far as possible in development and tuning.

Why do you need to divide the Java heap into generations?

Research shows that different objects have different lifetimes, and 70% to 99% of objects are temporary.

In fact, the heap could perfectly well be left undivided; the only reason for generations is to optimize GC performance. Without generations, all objects would live in one area, and every GC would have to traverse all of them while user threads are suspended, which is very costly. Since most objects die young, why not separate out the long-lived ones and concentrate on reclaiming the dead ones? In short, areas where objects die easily are collected frequently, and areas where objects rarely die are collected less often.

This rests on the generational collection theory. For details, see my other blog post: https://blog.csdn.net/weixin_45525272/article/details/126473167

What does a complete GC process in the JVM look like?

Newly created objects are generally allocated in the new generation. A commonly used new-generation collector is ParNew, which divides the new generation into Eden and two Survivor areas in an 8:1:1 ratio. At some moment, the objects we create fill Eden completely, and the object that does so is the last one the new generation can take. At this point, a Minor GC is triggered.

Before the actual Minor GC, the JVM first checks whether the total size of the objects in the new generation is larger or smaller than the remaining space in the old generation.

Why do this check?
The reason is simple: if the Survivor area cannot hold the objects that remain after the Minor GC, those objects will enter the old generation, so the JVM checks in advance whether the old generation has enough room.

So there are two situations:

  1. If the old generation's remaining space is larger than the total size of the objects in the new generation, the Minor GC runs directly. Even if the Survivor area is not enough afterwards, the old generation is certain to have room.

  2. If the old generation's remaining space is smaller than the total size of the objects in the new generation, the JVM checks whether the "old generation space allocation guarantee" rule is enabled, that is, whether -XX:+HandlePromotionFailure is set. (After JDK 6 Update 24, HotSpot ignores this flag and behaves as if the guarantee were always enabled.)

    The guarantee rule is: if the old generation's remaining space is larger than the average size of the objects promoted to it by previous Minor GCs, the Minor GC is allowed.
    The reasoning is probabilistic: if the old generation could absorb the survivors before, it will probably manage this time too.

  • Under the guarantee rule there are again two situations:
    • The old generation's remaining space is larger than that average: the Minor GC runs
    • The old generation's remaining space is smaller than that average: a Full GC runs first to free up the old generation, and then the check is repeated
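The pre-Minor-GC check reads naturally as a decision function. This is a hypothetical model with sizes in MB, not HotSpot source: guaranteeEnabled stands for the old -XX:+HandlePromotionFailure switch, and avgPromoted for the average promotion size the guarantee rule compares against. Using strict "greater than" comparisons is an arbitrary choice of the sketch.

```java
/** Hypothetical model of the check performed before a Minor GC (sizes in MB). */
class MinorGcPrecheck {
    enum Decision { MINOR_GC, RISKY_MINOR_GC, FULL_GC }

    static Decision decide(long oldFree, long youngUsed, long avgPromoted, boolean guaranteeEnabled) {
        if (oldFree > youngUsed) {
            return Decision.MINOR_GC;       // even if everything survives, the old gen can take it
        }
        if (guaranteeEnabled && oldFree > avgPromoted) {
            return Decision.RISKY_MINOR_GC; // probably fine, judging by past promotions
        }
        return Decision.FULL_GC;            // free old-generation space first, then check again
    }
}
```

The RISKY_MINOR_GC branch is the case where the guarantee can still fail, which leads to the outcomes listed next.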

Enabling the old generation space allocation guarantee only makes it highly likely that the objects left after the Minor GC will fit into the old generation; there can of course be exceptions. After the Minor GC, there are three possible situations:

  1. The objects left after the Minor GC fit into the Survivor area: everyone is happy, and the GC ends
  2. The objects left after the Minor GC do not fit into the Survivor area and enter the old generation, which can hold them: fine, the GC ends
  3. The objects left after the Minor GC do not fit into the Survivor area, and the old generation cannot hold them either: the only option is a Full GC

All of the above are cases where GC succeeds. In the following three situations, GC fails and an OOM is reported:

  1. Right after the Full GC in case 3 above, the old generation still cannot hold the remaining objects, so the result is OOM
  2. The old generation allocation guarantee is not enabled, and after a Full GC the old generation still cannot hold the remaining objects: OOM
  3. The old generation allocation guarantee is enabled but fails, and after a Full GC the old generation still cannot hold the remaining objects: OOM as well
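The success and failure paths after the Minor GC can be modeled the same way. Again this is a hypothetical sketch with sizes in MB. It assumes that only the survivors overflowing the Survivor area are promoted, and it takes as input how much old-generation space a Full GC would free, which a real JVM only learns by actually running one.

```java
/** Hypothetical model of what happens after a Minor GC (sizes in MB). */
class AfterMinorGc {
    enum Outcome { DONE_IN_SURVIVOR, DONE_PROMOTED, DONE_AFTER_FULL_GC, OOM }

    /**
     * survivors: live objects left after the Minor GC; survivorFree: room in the empty Survivor;
     * oldFree: old-generation free space now; oldFreeAfterFullGc: free space after a Full GC.
     */
    static Outcome outcome(long survivors, long survivorFree, long oldFree, long oldFreeAfterFullGc) {
        if (survivors <= survivorFree) {
            return Outcome.DONE_IN_SURVIVOR;                  // case 1
        }
        long overflow = survivors - survivorFree;             // must be promoted
        if (overflow <= oldFree) {
            return Outcome.DONE_PROMOTED;                     // case 2
        }
        // case 3: run a Full GC and try again; if it still does not fit, OutOfMemoryError
        return overflow <= oldFreeAfterFullGc ? Outcome.DONE_AFTER_FULL_GC : Outcome.OOM;
    }
}
```

For example, with 20MB surviving, a 10MB Survivor and 5MB free in the old generation, the result is DONE_AFTER_FULL_GC if the Full GC frees at least 10MB and OOM otherwise.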

Notes on JVM GC:

What does Full GC cause?

Full GC "stops the world": the user's application is suspended for the entire GC. So during development we should keep Full GCs as rare as possible.

When does the JVM trigger GC?

To summarize the process above:
when Eden is exhausted, the Java virtual machine triggers a Minor GC to collect the garbage in the new generation, and the surviving objects are moved to the Survivor area. Put simply, a Minor GC is triggered when the new generation's Eden area is full.

In the serial GC, if the old generation's remaining memory is smaller than the average size of the objects previously promoted from the young generation to the old generation, a Full GC is performed. In concurrent collectors such as CMS, the old generation's memory usage is checked periodically, and an old-generation collection is started when usage exceeds a certain percentage (for CMS, set with -XX:CMSInitiatingOccupancyFraction).

How to reduce the number of FullGC?

The following measures can be taken to reduce the number of Full GCs:

  1. Increase the size of the method area;
  2. Increase the size of the old generation;
  3. Reduce the size of the new generation;
  4. Prohibit calls to System.gc();
  5. Use the mark-compact algorithm to keep as much large contiguous memory available as possible;
  6. Track down useless large objects in your code.

Why can't the old generation use mark-copy?

Because the objects retained in the old generation rarely die, and mark-copy has to perform far more copying when the survival rate is high, its efficiency drops, so this algorithm cannot be used directly in the old generation.

Why is the new generation divided into Eden and Survivor?

Most current commercial Java virtual machines use the "mark-copy" algorithm to collect the new generation. In its early form, the algorithm used a "semispace copying" mechanism: it divides the available memory by capacity into two equal halves and uses only one at a time. When that half is used up, the surviving objects are copied into the other half, and the used half is cleared in one go. This is simple to implement and efficient to run, but its flaw is also obvious: the price of this copying algorithm is shrinking the usable memory to half, which wastes far too much space. Splitting the new generation into one large Eden and two small Survivor areas cuts that waste to 10% under the default 8:1:1 ratio.

Why are there two Survivor regions?

The biggest advantage of having two Survivor areas is that it solves memory fragmentation.

Suppose first that there is only one Survivor area. After a Minor GC, Eden is cleared and the surviving objects are placed in Survivor, while some objects already in Survivor may also need to be cleared. The question is: how do we clear them?

In this scenario we could only use mark-sweep, and we know that the biggest problem of mark-sweep is memory fragmentation. In an area like the new generation, where objects die all the time, mark-sweep would inevitably cause severe fragmentation.

With two Survivor areas, each Minor GC copies the surviving objects in Eden and the From area into the To area, after which Eden and From can be cleared. At the second Minor GC, the roles of From and To are swapped: the surviving objects in Eden and the To area are copied into the From area, and so on.

The biggest advantage of this mechanism is that throughout the process one Survivor space is always empty, and the non-empty one is free of fragments. So why not split Survivor into more blocks, say three, four, or five? Clearly, the finer the split, the smaller each block, and the more easily a Survivor block fills up. Two Survivor areas is probably the best trade-off.
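The From/To ping-pong can be simulated with plain lists. This hypothetical sketch copies live objects from Eden and From into To, clears both, then swaps the names, so between collections To is always empty. Objects that do not fit into To are appended to an old-generation list, standing in for the allocation guarantee; object ages and the tenuring threshold are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical simulation of Eden plus two Survivor spaces. */
class TwoSurvivorDemo {
    final List<String> eden = new ArrayList<>();
    final List<String> oldGen = new ArrayList<>();
    List<String> from = new ArrayList<>(); // Survivor currently holding objects
    List<String> to = new ArrayList<>();   // Survivor that is empty between collections
    final int survivorCapacity;

    TwoSurvivorDemo(int survivorCapacity) {
        this.survivorCapacity = survivorCapacity;
    }

    /** Copies live objects from Eden and From into To, clears both, then swaps the roles. */
    void minorGc(Set<String> live) {
        for (List<String> space : List.of(eden, from)) {
            for (String obj : space) {
                if (!live.contains(obj)) continue;  // dead objects are simply not copied
                if (to.size() < survivorCapacity) {
                    to.add(obj);
                } else {
                    oldGen.add(obj);                // To is full: promoted via the guarantee
                }
            }
        }
        eden.clear();
        from.clear();
        List<String> tmp = from;                    // swap: To becomes the new From,
        from = to;                                  // and the emptied From becomes the new To
        to = tmp;
    }
}
```

Filling Eden with a, b, c and collecting with a and c live leaves From = [a, c] and To empty. After new objects d and e are allocated and a second collection runs with a and e live, From = [e, a] and To is empty again, so no fragments ever build up inside a Survivor.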

Why do the new generation and the old generation use different recycling algorithms?

If most of the objects in an area die young and rarely survive a garbage collection, put them together and focus each time on keeping the small number of survivors instead of marking the large number of objects that will be collected; that way a large amount of space can be reclaimed at relatively low cost.

If the rest are hard-to-kill objects, put them together, and the virtual machine can collect that area less often, balancing the time cost of garbage collection against the effective use of memory.

The new generation matches the first case, so mark-copy efficiently removes a large amount of garbage while keeping a small number of survivors. In the old generation, only a small number of objects are reclaimed each time, and to avoid space fragmentation, mark-compact is the best choice.


Source: blog.csdn.net/weixin_45525272/article/details/126469398