Happy 1024 Programmers' Day: Java garbage collection mechanism

Java garbage collection mechanism

Before any garbage can be collected, the most important question is: what counts as garbage?

Think of daily life: if something will never be used again, it can be called garbage. The same is true in Java. If an object can no longer be referenced, it is garbage and should be reclaimed.

Based on this idea, reference counting is a natural way to identify garbage. Each object keeps a counter: when a new reference to the object is created, the counter is incremented by one, and when a reference is removed, it is decremented by one. An object whose reference count is zero is garbage. This method is generally called the "reference counting method".
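The counting rule can be sketched in a few lines of Java. This is only a toy model: the `RefCounted` class and its `retain`/`release` methods are hypothetical names made up for illustration, and a real JVM does not expose any such counter.

```java
// Toy model only: a real JVM does not expose a per-object counter like this.
final class RefCounted {
    private final String name;
    private int refCount = 0;

    RefCounted(String name) {
        this.name = name;
    }

    /** Called when a new reference to this object is created. */
    void retain() {
        refCount++;
    }

    /** Called when an existing reference to this object is dropped. */
    void release() {
        if (refCount == 0) {
            throw new IllegalStateException(name + " has no reference left to drop");
        }
        refCount--;
    }

    int refCount() {
        return refCount;
    }

    /** Under reference counting, a count of zero means the object is garbage. */
    boolean isGarbage() {
        return refCount == 0;
    }
}

class RefCountDemo {
    public static void main(String[] args) {
        RefCounted config = new RefCounted("config");
        config.retain();                         // one variable now refers to it
        System.out.println(config.isGarbage());  // false
        config.release();                        // that variable is cleared
        System.out.println(config.isGarbage());  // true
    }
}
```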

Although this method is simple, it has a fatal problem: circular references.

Suppose A refers to B, B refers to C, and C refers to A, so each of them has a reference count of 1. No other object refers to these three; they only refer to each other. As far as garbage is concerned, all three are unreachable from the rest of the program and should be collected, yet none of their reference counts is zero. This is the circular reference problem of reference counting.
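The A, B, C cycle can be reproduced with a small toy model. The `CountedNode` class and its `pointTo` method are hypothetical, and the counts are maintained by hand purely to show why a counter alone never reaches zero here.

```java
// Hypothetical node that tracks how many other nodes point at it.
final class CountedNode {
    final String name;
    int refCount = 0;
    CountedNode next; // the single outgoing reference of this node

    CountedNode(String name) {
        this.name = name;
    }

    /** Re-points this node at target, adjusting both reference counts. */
    void pointTo(CountedNode target) {
        if (next != null) {
            next.refCount--;
        }
        next = target;
        if (target != null) {
            target.refCount++;
        }
    }
}

class CycleDemo {
    /** Builds A -> B -> C -> A. No reference from outside the cycle is counted. */
    static CountedNode[] buildCycle() {
        CountedNode a = new CountedNode("A");
        CountedNode b = new CountedNode("B");
        CountedNode c = new CountedNode("C");
        a.pointTo(b);
        b.pointTo(c);
        c.pointTo(a);
        return new CountedNode[] {a, b, c};
    }

    public static void main(String[] args) {
        // Every node prints refCount=1, so a pure counter would never free them.
        for (CountedNode n : buildCycle()) {
            System.out.println(n.name + " refCount=" + n.refCount);
        }
    }
}
```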

Today's Java virtual machines use the GC Root Tracing algorithm to identify garbage objects. The general process is this: starting from the GC Roots, every reachable object is live, and every unreachable object is garbage.
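A minimal sketch of this tracing idea follows, assuming a toy object graph in which each `HeapObj` (a hypothetical class) holds a list of outgoing references. It is not how a real JVM represents its heap, but it shows that a cycle with no path from a root is simply never visited.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical heap object: a name plus its outgoing references.
final class HeapObj {
    final String name;
    final List<HeapObj> refs = new ArrayList<>();

    HeapObj(String name) {
        this.name = name;
    }
}

class Reachability {
    /** Returns every object reachable from the given roots (the live set). */
    static Set<HeapObj> reachableFrom(Collection<HeapObj> roots) {
        // Identity-based set: two distinct objects are never treated as equal.
        Set<HeapObj> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<HeapObj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            HeapObj current = pending.pop();
            if (live.add(current)) {          // first visit only
                pending.addAll(current.refs); // follow outgoing references
            }
        }
        return live;
    }

    public static void main(String[] args) {
        HeapObj root = new HeapObj("root");
        HeapObj x = new HeapObj("x");
        root.refs.add(x);

        // A -> B -> C -> A, with no path from the root.
        HeapObj a = new HeapObj("A");
        HeapObj b = new HeapObj("B");
        HeapObj c = new HeapObj("C");
        a.refs.add(b);
        b.refs.add(c);
        c.refs.add(a);

        Set<HeapObj> live = reachableFrom(Arrays.asList(root));
        System.out.println(live.contains(x)); // true
        System.out.println(live.contains(a)); // false: the cycle is garbage
    }
}
```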

As you can see, the most important thing here is the set of GC Roots. GC Root is in fact a collection of live references. This collection differs from ordinary collections of objects: its members are specially selected and usually include:

All currently loaded Java classes, reference-type static variables of Java classes, reference-type constants in the runtime constant pools of Java classes, references from some static data structures of the VM to objects in the GC heap, and so on. Simply put, GC Root is a carefully selected set of live references that are guaranteed to survive, so the objects reachable from these references naturally survive as well.

How to do garbage collection?

At this point, we understand what garbage is and how the JVM identifies garbage objects. So once garbage objects have been identified, how does the JVM actually reclaim them? That is what we discuss next.

Simply put, there are three garbage collection algorithms: the mark-sweep algorithm, the copying algorithm, and the mark-compact algorithm.

Mark-sweep algorithm. As the name suggests, it has two phases: a marking phase and a sweeping phase. A feasible implementation is to mark, during the marking phase, all objects reachable from the GC Roots. At that point, every unmarked object is garbage. Then, in the sweeping phase, all unmarked objects are cleared. The biggest problem with the mark-sweep algorithm is space fragmentation: the freed memory is left as scattered, non-contiguous holes. Allocating in fragmented memory, especially allocating large objects, is less efficient than allocating in contiguous memory.
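The two phases and the resulting fragmentation can be sketched on a toy heap. Here the heap is assumed to be a plain array of slots and `Cell` is a hypothetical object type with a mark bit; real collectors work on raw memory, not Java arrays.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical heap cell with a mark bit and outgoing references.
final class Cell {
    final String name;
    final List<Cell> refs = new ArrayList<>();
    boolean marked;

    Cell(String name) {
        this.name = name;
    }
}

class MarkSweep {
    /** Marking phase: mark everything reachable from one root. */
    static void mark(Cell root) {
        Deque<Cell> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Cell current = stack.pop();
            if (current.marked) {
                continue;
            }
            current.marked = true;
            for (Cell ref : current.refs) {
                stack.push(ref);
            }
        }
    }

    /** Runs both phases and returns how many slots the sweep freed. */
    static int collect(Cell[] heap, List<Cell> roots) {
        for (Cell root : roots) {
            mark(root);
        }
        int freed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) {
                continue;
            }
            if (heap[i].marked) {
                heap[i].marked = false; // reset for the next collection
            } else {
                heap[i] = null;         // sweeping phase: free the slot in place
                freed++;
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        Cell a = new Cell("a"), g1 = new Cell("g1"), b = new Cell("b");
        Cell g2 = new Cell("g2"), c = new Cell("c");
        a.refs.add(b);
        b.refs.add(c);
        Cell[] heap = {a, g1, b, g2, c};

        System.out.println(collect(heap, Arrays.asList(a))); // 2
        // Slots 1 and 3 are now free but not adjacent: two free slots exist,
        // yet no contiguous run of two slots is available.
    }
}
```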

Copying algorithm. The core idea of the copying algorithm is to divide the memory space into two blocks and use only one of them at a time. During garbage collection, the surviving objects in the block in use are copied to the unused block. All objects in the block that was in use are then cleared, and the two blocks swap roles, which completes the collection. The disadvantage of this algorithm is that only half of the memory is usable at any time, which is a huge waste of memory space.
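The two-block scheme described above might look like the following sketch. Objects are modeled as plain strings, and the set of survivors is passed in by the caller; both are simplifying assumptions, as is the `SemiSpace` class name. A real collector discovers the survivors itself by tracing from the GC Roots.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SemiSpace {
    private String[] active;   // the block currently used for allocation
    private String[] reserve;  // the block kept empty until the next collection
    private int top = 0;       // next free index in the active block

    SemiSpace(int blockSize) {
        active = new String[blockSize];
        reserve = new String[blockSize];
    }

    /** Bump allocation: returns false when the active block is full. */
    boolean allocate(String obj) {
        if (top == active.length) {
            return false;
        }
        active[top++] = obj;
        return true;
    }

    /** Copies survivors to the reserve block, clears the old block, swaps roles. */
    void collect(Set<String> survivors) {
        int next = 0;
        for (int i = 0; i < top; i++) {
            if (survivors.contains(active[i])) {
                reserve[next++] = active[i];
            }
        }
        Arrays.fill(active, null);
        String[] tmp = active;
        active = reserve;
        reserve = tmp;
        top = next; // survivors sit contiguously at the start: no fragmentation
    }

    int used() {
        return top;
    }

    String[] contents() {
        return Arrays.copyOf(active, top);
    }

    public static void main(String[] args) {
        SemiSpace heap = new SemiSpace(4);
        for (String s : new String[] {"a", "b", "c", "d"}) {
            heap.allocate(s);
        }
        heap.collect(new HashSet<>(Arrays.asList("a", "c")));
        System.out.println(Arrays.toString(heap.contents())); // [a, c]
    }
}
```

Note that the heap holds two blocks of four slots each, yet at most four objects can ever be allocated at once, which is the halving cost the text mentions.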

Mark-compact algorithm. The mark-compact algorithm can be seen as an optimized version of the mark-sweep algorithm. It also goes through two phases: a marking phase and a compaction phase. During the marking phase, all objects reachable from the GC Roots are marked. In the compaction phase, all surviving objects are moved to one end of the memory, and then all space beyond the boundary is cleared.
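The second phase can be sketched as a simple slide over a toy heap. As a simplification, the heap is an array of strings and the result of the marking phase is passed in as a set; the `Compactor` class and the `compact` method are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class Compactor {
    /**
     * Slides every live entry toward index 0, clears everything past the
     * boundary, and returns the boundary (the first free index).
     */
    static int compact(String[] heap, Set<String> live) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live.contains(heap[i])) {
                heap[boundary++] = heap[i]; // boundary <= i, so nothing unread is overwritten
            }
        }
        Arrays.fill(heap, boundary, heap.length, null);
        return boundary;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "g1", "b", "g2", "c"};
        Set<String> live = new HashSet<>(Arrays.asList("a", "b", "c"));
        int boundary = compact(heap, live);
        System.out.println(boundary);              // 3
        System.out.println(Arrays.toString(heap)); // [a, b, c, null, null]
    }
}
```

Unlike the sweep-only approach, the free space ends up as one contiguous run after the boundary.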

Comparing these three algorithms, we can find that they all have their own advantages and disadvantages.

Although the mark-sweep algorithm produces memory fragmentation, it does not move objects, so it is better suited to situations where many objects survive. Although the copying algorithm halves the usable memory and has to move surviving objects, it leaves no fragmentation after a collection, so it is better suited to situations where relatively few objects survive. The mark-compact algorithm is an optimized version of the mark-sweep algorithm that avoids space fragmentation, at the cost of moving the surviving objects.

Generational thinking

Just imagine: if we used any one of these algorithms alone, the overall garbage collection efficiency would not be very good. The designers of the JVM thought so too, so real garbage collectors adopt a generational algorithm.

The so-called generational algorithm uses different garbage collection algorithms for different areas of JVM memory. For example, the new generation has few surviving objects, so the copying algorithm suits it better: only a small number of objects need to be copied to complete a collection, and no memory fragmentation is left behind. For areas such as the old generation, where many objects survive, the mark-compact or mark-sweep algorithm is more suitable, because it avoids copying a large number of objects.

Imagine that, instead of the generational algorithm, the copying algorithm were used in the old generation. In extreme cases, the survival rate of objects in the old generation can reach 100%, so every one of those objects would have to be copied to another memory area, which is a huge amount of work.
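A rough way to see the difference is to count how many objects a copying collector would have to move, since its work is proportional to the number of survivors. The figures below (one million objects, a 2% survival rate in the new generation) are illustrative assumptions, and `CopyCost` is a hypothetical helper.

```java
class CopyCost {
    /** A copying collector moves only survivors, so its work scales with them. */
    static long objectsToCopy(long totalObjects, int survivalPercent) {
        if (survivalPercent < 0 || survivalPercent > 100) {
            throw new IllegalArgumentException("survivalPercent must be in 0..100");
        }
        return totalObjects * survivalPercent / 100;
    }

    public static void main(String[] args) {
        long total = 1_000_000L; // assumed object count, for illustration only
        System.out.println(objectsToCopy(total, 2));   // 20000   (new generation, assumed 2%)
        System.out.println(objectsToCopy(total, 100)); // 1000000 (old generation, extreme case)
    }
}
```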

Here we look more closely at the garbage collection algorithm used in the new generation. As noted above, the new generation has few surviving objects and suits the copying algorithm. The simplest implementation of the copying algorithm uses half of the memory and keeps the other half in reserve. In an actual JVM, however, the new generation is not divided into two equal halves. Instead, it is divided into three areas: the Eden area, the from area, and the to area. So why does the JVM adopt this layout instead of splitting the memory 50/50?

To answer this question, we first need to understand the characteristics of new generation objects. According to research by IBM, 98% of objects in the new generation are short-lived: they die soon after they are created. There is therefore no need to divide the memory space in a 1:1 ratio. In the HotSpot virtual machine, the new generation is divided into one larger Eden space and two smaller Survivor spaces, with a size ratio of 8:1:1. During a collection, the surviving objects in Eden and in the Survivor space currently in use are copied in one pass to the other Survivor space, and then Eden and the Survivor space that was just in use are cleared.
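The collection step just described can be sketched as follows. This is a simplified model under stated assumptions: objects are strings, the set of survivors is supplied by the caller, and space capacities are ignored, so the case where the survivors do not fit into the target Survivor space is left out. The `YoungGen` class and `minorGc` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class YoungGen {
    final List<String> eden = new ArrayList<>();
    List<String> from = new ArrayList<>(); // Survivor space currently in use
    List<String> to = new ArrayList<>();   // empty Survivor space held in reserve

    /** One new-generation collection under the simplifications stated above. */
    void minorGc(Set<String> survivors) {
        // Copy live objects from Eden and the in-use Survivor space in one pass.
        for (String obj : eden) {
            if (survivors.contains(obj)) {
                to.add(obj);
            }
        }
        for (String obj : from) {
            if (survivors.contains(obj)) {
                to.add(obj);
            }
        }
        // Clear Eden and the Survivor space that was just in use.
        eden.clear();
        from.clear();
        // Swap roles: the filled space becomes "from" for the next cycle.
        List<String> tmp = from;
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        YoungGen young = new YoungGen();
        young.eden.addAll(Arrays.asList("a", "b", "c"));
        young.from.addAll(Arrays.asList("s1", "s2"));
        young.minorGc(new HashSet<>(Arrays.asList("b", "s1")));
        System.out.println(young.eden); // []
        System.out.println(young.from); // [b, s1]
        System.out.println(young.to);   // []
    }
}
```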

In this way, 90% of the new generation is usable and only 10% is held in reserve. If the memory were divided into two equal halves, the utilization would be only 50%, so the 8:1:1 layout nearly doubles the usable space.
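The arithmetic behind these figures is short enough to check directly. In the sketch below, `usableFraction` is a hypothetical helper: with an Eden-to-Survivor ratio of r:1:1, Eden plus one Survivor space is usable out of r + 2 parts in total.

```java
class YoungGenLayout {
    /** Usable share of the new generation for an Eden:Survivor:Survivor ratio of r:1:1. */
    static double usableFraction(int edenToSurvivorRatio) {
        if (edenToSurvivorRatio < 1) {
            throw new IllegalArgumentException("ratio must be at least 1");
        }
        // Eden plus one Survivor space is in use; the other Survivor space is reserved.
        return (edenToSurvivorRatio + 1.0) / (edenToSurvivorRatio + 2.0);
    }

    public static void main(String[] args) {
        double eightOneOne = usableFraction(8); // 9 parts usable out of 10
        double evenSplit = 0.5;                 // plain two-block copying
        System.out.printf("8:1:1 usable: %.0f%%%n", eightOneOne * 100);      // 8:1:1 usable: 90%
        System.out.printf("gain over 50/50: %.1fx%n", eightOneOne / evenSplit); // gain over 50/50: 1.8x
    }
}
```

In HotSpot this ratio corresponds to the `-XX:SurvivorRatio` option, whose value is the size of Eden relative to a single Survivor space.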

Partition thinking

The generational idea divides objects into two parts (new generation and old generation) according to how long they live. The JVM also uses a partitioning idea, which divides the entire heap into many small contiguous regions.

Each region is used independently and collected independently. The advantage of this approach is that the collector can control how many regions are collected at a time, and so can better control the GC pause time.
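One way to picture "controlling how many regions are collected at a time" is a selection step that picks a bounded number of regions per collection. The policy below (take the regions with the most garbage first) is an assumption chosen for illustration, not something the text specifies, and `Region` and `RegionSelector` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical heap region with a known amount of reclaimable space.
final class Region {
    final int id;
    final int garbageBytes;

    Region(int id, int garbageBytes) {
        this.id = id;
        this.garbageBytes = garbageBytes;
    }
}

class RegionSelector {
    /** Picks at most maxRegions regions, preferring those with the most garbage. */
    static List<Region> chooseRegions(List<Region> regions, int maxRegions) {
        List<Region> byGarbage = new ArrayList<>(regions);
        byGarbage.sort(Comparator.comparingInt((Region r) -> r.garbageBytes).reversed());
        int count = Math.max(0, Math.min(maxRegions, byGarbage.size()));
        return new ArrayList<>(byGarbage.subList(0, count));
    }

    public static void main(String[] args) {
        List<Region> heap = Arrays.asList(
                new Region(0, 10), new Region(1, 90), new Region(2, 50), new Region(3, 0));
        // Capping the number of regions per cycle bounds the work done in one pause.
        for (Region r : chooseRegions(heap, 2)) {
            System.out.println("collect region " + r.id);
        }
        // Prints region 1, then region 2.
    }
}
```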

At this point, we have covered the basics of JVM garbage collection: what garbage is, how garbage is identified, how garbage is reclaimed, and the two important ideas behind garbage collection, generational thinking and partitioning thinking. With this context, we have an overall picture of garbage collection. In the following chapters, we'll dive into the details.

Origin blog.csdn.net/Turniper/article/details/120937657