Quickly understand the Java garbage collection mechanism

When it comes to garbage collection (GC), many people naturally think of Java. In Java, programmers do not need to manage dynamic memory allocation and garbage collection themselves; the JVM handles both. As the name suggests, garbage collection releases the space occupied by garbage. So which objects does Java consider "garbage"? Once objects have been identified as garbage, what strategies are used to reclaim their space? And which garbage collectors are typical in today's commercial virtual machines? This article discusses these questions one by one.

If anything here is wrong, please point it out; corrections are very welcome.

1. How to determine that an object is "garbage"

This section starts with the most basic question: how is an object determined to be "garbage"? The garbage collector's job is to reclaim the space occupied by garbage objects so that new objects can use it, so it needs a method for deciding which objects can be reclaimed.

In Java, objects can only be manipulated through references. An obvious approach is therefore reference counting: each object keeps a count of the references pointing to it, incremented when a new reference is created and decremented when one is dropped. When the count reaches zero, no reference is associated with the object, so it can no longer be used anywhere, and it becomes collectable.

Reference counting is simple to implement and efficient, but it cannot handle circular references, so Java does not use it (CPython, by contrast, does use reference counting, backed by a separate cycle detector). Consider the following code:

public class Main {

    public static void main(String[] args) {
        MyObject object1 = new MyObject();
        MyObject object2 = new MyObject();

        // The two objects now refer to each other, forming a cycle.
        object1.object = object2;
        object2.object = object1;

        // Drop the only external references to both objects.
        object1 = null;
        object2 = null;
    }
}

class MyObject {
    public Object object = null;
}

The last two statements set object1 and object2 to null, so the two objects can no longer be reached from the program. However, because they still refer to each other, each reference count stays at 1 rather than 0, and a pure reference-counting collector would never reclaim them.
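To make the counting concrete, here is a toy model of what a reference-counting runtime would do with the Main example above. The Node class and the setField/release helpers are hypothetical names for this illustration only; the JVM itself does not count references.

```java
// Toy model of reference counting; this is NOT how the JVM manages memory.
class RefCountDemo {
    static final class Node {
        int refCount = 0; // number of references currently pointing at this node
        Node field;       // the single reference field, like MyObject.object
    }

    // Models "holder.field = target;" in a reference-counting runtime.
    static void setField(Node holder, Node target) {
        if (target != null) target.refCount++;           // new reference to target
        if (holder.field != null) release(holder.field); // old referent loses one reference
        holder.field = target;
    }

    // Drops one reference; when the count hits zero the node is "freed",
    // which in turn drops the reference held by its own field.
    static void release(Node n) {
        n.refCount--;
        if (n.refCount == 0 && n.field != null) {
            Node child = n.field;
            n.field = null;
            release(child);
        }
    }

    // Replays the Main example and returns the final counts of the two objects.
    static int[] simulateCycle() {
        Node a = new Node();
        Node b = new Node();
        a.refCount++;   // MyObject object1 = new MyObject();
        b.refCount++;   // MyObject object2 = new MyObject();
        setField(a, b); // object1.object = object2;
        setField(b, a); // object2.object = object1;
        release(a);     // object1 = null;
        release(b);     // object2 = null;
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = simulateCycle();
        // prints "object1 count = 1, object2 count = 1": neither count reaches zero
        System.out.println("object1 count = " + counts[0] + ", object2 count = " + counts[1]);
    }
}
```

Both counts end at 1, so a collector that only frees objects whose count is zero would leak this pair forever.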

To solve this problem, Java uses reachability analysis. The idea is to start from a set of objects called "GC Roots" (for example, objects referenced by local variables on thread stacks, by static fields, or by JNI code) and follow references outward. An object with no path from any GC Root is unreachable. Note, however, that an unreachable object is not necessarily reclaimed right away: it goes through at least two marking passes. If its class overrides finalize() and that method has not yet been run, the object gets one chance to run it between the two passes; if finalize() makes the object reachable again, it escapes collection. Otherwise it is reclaimed.

I have not yet studied the exact mechanics of reachability analysis in detail; if any reader knows more, please feel free to share.
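The core idea of reachability analysis can be sketched as a graph traversal: mark everything reachable from the roots, and whatever is left unmarked is unreachable. This is a minimal sketch over a hypothetical object graph of string labels (requires Java 9+ for List.of/Set.of); it ignores finalization and everything else a real JVM does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy reachability analysis over a hypothetical object graph.
class ReachabilityDemo {
    // Marks every object reachable from the roots by following references.
    static Set<String> reachableFrom(Set<String> roots, Map<String, List<String>> refs) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (!marked.add(obj)) continue; // already visited
            for (String target : refs.getOrDefault(obj, List.of())) {
                pending.push(target);
            }
        }
        return marked;
    }

    // Objects not marked during the traversal are unreachable.
    static Set<String> unreachable(Set<String> heap, Set<String> roots,
                                   Map<String, List<String>> refs) {
        Set<String> result = new TreeSet<>(heap);
        result.removeAll(reachableFrom(roots, refs));
        return result;
    }

    // Graph: root -> A -> B, plus the object1 <-> object2 cycle from the Main example.
    static Set<String> demo() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", List.of("A"));
        refs.put("A", List.of("B"));
        refs.put("object1", List.of("object2"));
        refs.put("object2", List.of("object1"));
        Set<String> heap = Set.of("A", "B", "object1", "object2");
        return unreachable(heap, Set.of("root"), refs);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [object1, object2]
    }
}
```

Unlike reference counting, the cycle is found to be unreachable because no path leads to it from the root, no matter how the two objects refer to each other.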

Let's look at an example:

Object aobj = new Object();
Object bobj = new Object();
Object cobj = new Object();
aobj = bobj;
aobj = cobj;
cobj = null;
aobj = null;

Which statements make an object collectable? The seventh statement, aobj = null;, does: at that point the object originally created for cobj loses its last reference. The fourth statement, aobj = bobj;, does too, because nothing refers any longer to the object originally created for aobj.
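One way to check the answer is to replay the statements while tracking which object each variable points to. In this sketch the labels "A", "B" and "C" (hypothetical names) stand for the objects created for aobj, bobj and cobj; since those objects hold no references to each other, "no variable points to it" is the same as "unreachable" here.

```java
import java.util.ArrayList;
import java.util.List;

// Replays the aobj/bobj/cobj example with labels instead of real objects.
class StatementTrace {
    static String aobj, bobj, cobj; // which object each local variable currently points to

    // Objects that no variable points to any more.
    static List<String> unreachable() {
        List<String> result = new ArrayList<>();
        for (String obj : List.of("A", "B", "C")) {
            if (!obj.equals(aobj) && !obj.equals(bobj) && !obj.equals(cobj)) {
                result.add(obj);
            }
        }
        return result;
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        aobj = "A"; bobj = "B"; cobj = "C"; // the three new Object() statements
        aobj = bobj; log.add("aobj = bobj; -> " + unreachable());
        aobj = cobj; log.add("aobj = cobj; -> " + unreachable());
        cobj = null; log.add("cobj = null; -> " + unreachable());
        aobj = null; log.add("aobj = null; -> " + unreachable());
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The log shows [A] after the fourth statement and [A, C] after the last one, matching the analysis above.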

Look at another example:

import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

String str = new String("hello");
SoftReference<String> sr = new SoftReference<String>(new String("java"));
WeakReference<String> wr = new WeakReference<String>(new String("world"));

Which of these three statements make a String object collectable? The second and the third. The String reachable only through the SoftReference becomes collectable when memory runs low, and the String reachable only through the WeakReference is collectable at the next garbage collection regardless of how much memory is available. The first String is held by the strong reference str and is not collectable.
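The difference between strong, soft and weak reachability can be observed with a small program. This is a sketch: System.gc() is only a request that the JVM may ignore, so only the first check (a weak reference to an object that is still strongly reachable is never cleared) is guaranteed; the other outputs are what typically happens, not a promise. The class and method names are illustrative.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceDemo {
    // Held in a static field, so this String stays strongly reachable.
    static String strong = new String("hello");
    static WeakReference<String> weakToStrong = new WeakReference<>(strong);

    // Guaranteed true: the referent is still strongly reachable via `strong`.
    static boolean weakToStronglyReachableSurvives() {
        System.gc(); // only a request; the JVM may ignore it
        return weakToStrong.get() != null;
    }

    // Requests up to `attempts` GCs; returns whether the weak-only referent was cleared.
    static boolean weakOnlyClearedAfterGc(int attempts) throws InterruptedException {
        WeakReference<String> wr = new WeakReference<>(new String("world"));
        for (int i = 0; i < attempts && wr.get() != null; i++) {
            System.gc();
            Thread.sleep(10);
        }
        return wr.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        SoftReference<String> sr = new SoftReference<>(new String("java"));
        System.out.println("strongly held referent survives: " + weakToStronglyReachableSurvives());
        System.out.println("weak-only referent cleared: " + weakOnlyClearedAfterGc(5));
        // Soft referents are normally kept until memory runs low, so this usually prints "java".
        System.out.println("soft referent: " + sr.get());
    }
}
```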

Finally, let's summarize the common situations in which an object becomes collectable:

(1) A reference is explicitly set to null, or a reference that points to an object is reassigned to another object, leaving the original object with no references, as in the following code:

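Object obj = new Object();
obj = null;                 // the first object is now unreferenced
Object obj1 = new Object();
Object obj2 = new Object();
obj1 = obj2;                // the object originally assigned to obj1 is now unreferenced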

(2) An object referenced only by a local variable, once that variable goes out of scope, as in the following code:

void fun() {
    // .....
    for (int i = 0; i < 10; i++) {
        // The object created in each iteration is unreachable once that iteration ends.
        Object obj = new Object();
        System.out.println(obj.getClass());
    }
}
(3) An object that is associated only with weak references, such as:
WeakReference<String> wr = new WeakReference<String>(new String("world"));

2. Typical garbage collection algorithms

Once it is known which objects are garbage, the collector has to reclaim them, which raises the question of how to do so efficiently. The Java Virtual Machine Specification does not prescribe how a garbage collector must be implemented, so different vendors' virtual machines implement it in different ways. Only the core ideas of several common garbage collection algorithms are discussed here.

1. Mark-Sweep algorithm

This is the most basic garbage collection algorithm: its idea is the simplest and it is the easiest to implement. It has two phases, mark and sweep. The mark phase marks every object that needs to be reclaimed, and the sweep phase reclaims the space those marked objects occupy. The process is shown in the figure below:

As the figure shows, mark-sweep is easy to implement, but it has a serious drawback: it tends to fragment memory. With too much fragmentation, a later request to allocate a large object may fail to find a large enough contiguous block, which triggers another garbage collection earlier than necessary.
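The fragmentation problem is easy to reproduce in a toy model. In this sketch the heap is a char array where each letter is one cell of a (hypothetical) object and '.' is a free cell; the set of live objects is simply given rather than computed by a real marking traversal.

```java
import java.util.Set;

// Toy Mark-Sweep over a char "heap"; letters are objects, '.' is free.
class MarkSweepDemo {
    static char[] markSweep(char[] heap, Set<Character> live) {
        // Mark phase: flag every cell that belongs to an object to be reclaimed.
        boolean[] marked = new boolean[heap.length];
        for (int i = 0; i < heap.length; i++) {
            marked[i] = heap[i] != '.' && !live.contains(heap[i]);
        }
        // Sweep phase: free the flagged cells in place; survivors do not move.
        char[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (marked[i]) result[i] = '.';
        }
        return result;
    }

    // Longest run of free cells, i.e. the largest object that still fits.
    static int largestFreeBlock(char[] heap) {
        int best = 0, run = 0;
        for (char c : heap) {
            run = (c == '.') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        char[] after = markSweep("AABBCCDDEE".toCharArray(), Set.of('A', 'C', 'E'));
        System.out.println(new String(after));       // AA..CC..EE
        System.out.println(largestFreeBlock(after)); // 2, although 4 cells are free
    }
}
```

Four cells are free after the sweep, but an object needing three contiguous cells still does not fit.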

2. Copying algorithm

The Copying algorithm was proposed to address the shortcomings of Mark-Sweep. It divides the available memory into two halves of equal size and uses only one half at a time. When that half is full, the surviving objects are copied into the other half, and the used half is then cleared in one go, so memory does not become fragmented. The process is shown in the figure below:

The algorithm is simple to implement, runs efficiently and avoids fragmentation, but it pays a high price in memory: only half of the memory is usable at any time.

The efficiency of the Copying algorithm also depends heavily on the number of surviving objects: the work done is proportional to how much has to be copied, so when many objects survive, its efficiency drops sharply.
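Using the same toy char-array heap as a semispace, copying can be sketched as packing the live cells into an empty space of the same size. Both the cost (one copy per surviving cell) and the memory overhead (a second, idle space of equal size) are visible; this is an illustration, not how HotSpot lays out memory.

```java
import java.util.Arrays;
import java.util.Set;

// Toy semispace copying collector; letters are objects, '.' is free.
class CopyingDemo {
    // Copies live cells from the full semispace into an empty one, packed at the start.
    static char[] collect(char[] fromSpace, Set<Character> live) {
        char[] toSpace = new char[fromSpace.length];
        Arrays.fill(toSpace, '.');
        int top = 0; // next free cell in to-space
        for (char cell : fromSpace) {
            if (cell != '.' && live.contains(cell)) {
                toSpace[top++] = cell; // one copy per surviving cell
            }
        }
        return toSpace; // fromSpace can now be treated as entirely free
    }

    // Work is proportional to the number of surviving cells, not the heap size.
    static int cellsCopied(char[] fromSpace, Set<Character> live) {
        int n = 0;
        for (char cell : fromSpace) {
            if (cell != '.' && live.contains(cell)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        char[] fromSpace = "AABBCCDDEE".toCharArray(); // another 10 cells sit idle as to-space
        Set<Character> live = Set.of('A', 'C', 'E');
        System.out.println(new String(collect(fromSpace, live))); // AACCEE....
        System.out.println(cellsCopied(fromSpace, live));         // 6
    }
}
```

The free space ends up contiguous, but the collector needed twice the memory, and with more survivors it would copy proportionally more.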

3. Mark-Compact algorithm

The Mark-Compact algorithm was proposed to overcome the shortcomings of the Copying algorithm and make full use of memory. Its mark phase is the same as in Mark-Sweep, but instead of reclaiming the collectable objects in place, it moves all surviving objects toward one end of memory and then clears everything beyond that boundary. The process is shown in the figure below:
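On the same toy char-array heap, mark-compact marks the objects to reclaim, slides the survivors toward the start in their original order, and frees everything past the boundary, all within a single space. This sketch omits a step a real collector cannot skip: updating every reference to point at the objects' new addresses.

```java
import java.util.Arrays;
import java.util.Set;

// Toy Mark-Compact over a char "heap"; letters are objects, '.' is free.
class MarkCompactDemo {
    static char[] markCompact(char[] heap, Set<Character> live) {
        // Mark phase (same as Mark-Sweep): flag cells of objects to be reclaimed.
        boolean[] marked = new boolean[heap.length];
        for (int i = 0; i < heap.length; i++) {
            marked[i] = heap[i] != '.' && !live.contains(heap[i]);
        }
        // Compact phase: slide surviving cells toward the start, keeping their order.
        char[] result = heap.clone();
        int boundary = 0;
        for (int i = 0; i < result.length; i++) {
            if (result[i] != '.' && !marked[i]) result[boundary++] = result[i];
        }
        // Everything beyond the boundary is reclaimed in one go.
        Arrays.fill(result, boundary, result.length, '.');
        return result;
    }

    public static void main(String[] args) {
        char[] heap = "AABBCCDDEE".toCharArray();
        System.out.println(new String(markCompact(heap, Set.of('A', 'C', 'E')))); // AACCEE....
    }
}
```

The result matches the copying collector's, but no second space was needed.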

4. Generational Collection algorithm

Generational collection is the approach used by most JVM garbage collectors today. Its core idea is to divide memory into regions according to how long objects live. Usually the heap is divided into the old generation (Tenured Generation) and the young generation (Young Generation). In the old generation, only a small number of objects need to be reclaimed at each collection, while in the young generation a large number of objects are reclaimed each time, so each generation can use the collection algorithm that suits it best.

Most collectors currently use the Copying algorithm for the young generation: since each young-generation collection reclaims most objects, only a few survivors need to be copied. In practice, however, the young generation is not split 1:1. Instead it is divided into one larger Eden space and two smaller Survivor spaces (in HotSpot the default Eden-to-Survivor ratio is 8:1, set by -XX:SurvivorRatio). Eden and one Survivor space are in use at any time; during a collection, the surviving objects in Eden and that Survivor space are copied to the other Survivor space, and then Eden and the Survivor space just used are cleared.
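The arithmetic behind the uneven split is short. Assuming HotSpot's default SurvivorRatio of 8 (Eden : Survivor : Survivor = 8 : 1 : 1) and an example 10 MB young generation, the sketch below computes the space sizes; real HotSpot sizing also applies alignment, so actual numbers can differ slightly.

```java
// Arithmetic for an Eden : Survivor : Survivor split of ratio : 1 : 1.
// The 10 MB young generation is only an example value.
class YoungGenSizing {
    static long survivorKb(long youngKb, int survivorRatio) {
        return youngKb / (survivorRatio + 2); // two Survivor spaces of one share each
    }

    static long edenKb(long youngKb, int survivorRatio) {
        return youngKb - 2 * survivorKb(youngKb, survivorRatio);
    }

    // Space usable at any one time: Eden plus one Survivor (the other stays empty).
    static long usableKb(long youngKb, int survivorRatio) {
        return edenKb(youngKb, survivorRatio) + survivorKb(youngKb, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 10 * 1024; // 10 MB young generation, in KB
        int ratio = 8;
        System.out.println("Eden: " + edenKb(young, ratio) + " KB");              // 8192 KB
        System.out.println("Each Survivor: " + survivorKb(young, ratio) + " KB"); // 1024 KB
        System.out.println("Usable: " + usableKb(young, ratio) + " KB ("
                + 100 * usableKb(young, ratio) / young + "%)");                // 9216 KB (90%)
    }
}
```

With this split, 90% of the young generation is usable at once, instead of the 50% a plain 1:1 copying scheme would give.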

Because only a small number of objects are reclaimed in the old generation each time, the Mark-Compact algorithm is generally used there.

Note that there is another generation outside the heap, the permanent generation (Permanent Generation), which stores classes, constants, method descriptions and so on. Collection in the permanent generation mainly reclaims two things: discarded constants and unused classes. (In JDK 8 and later, the permanent generation was removed and replaced by Metaspace.)

3. Typical garbage collectors

Garbage collection algorithms are the theory behind memory reclamation, and garbage collectors are its concrete implementation. The following introduces several garbage collectors provided by the HotSpot (JDK 7) virtual machine. Users can combine the collectors used for each generation according to their needs.

1. Serial/Serial Old

The Serial/Serial Old collector is the most basic and oldest collector. It is single-threaded, and all user threads must be paused while it collects garbage. The Serial collector works on the young generation using the Copying algorithm; the Serial Old collector works on the old generation using the Mark-Compact algorithm. Its advantage is a simple, efficient implementation; its drawback is the pauses it imposes on the application.

2. ParNew

The ParNew collector is a multi-threaded version of the Serial collector, which uses multiple threads for garbage collection.

3. Parallel Scavenge

The Parallel Scavenge collector is a multi-threaded (parallel) collector for the young generation that uses the Copying algorithm. Like ParNew, it still pauses user threads while collecting; "parallel" here means that several GC threads work at the same time. What sets it apart from the previous two collectors is its goal: a controllable throughput, that is, a controllable share of CPU time spent running user code.

4. Parallel Old

Parallel Old is the old-generation version of the Parallel Scavenge collector (a parallel collector); it uses multiple threads and the Mark-Compact algorithm.

5. CMS

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause. It is a concurrent collector, meaning most of its work runs alongside user threads, and it uses the Mark-Sweep algorithm.

6. G1

The G1 collector is the most advanced of the collectors covered here. It is a server-oriented collector that makes full use of multi-CPU, multi-core environments. It is both parallel and concurrent, and it can build a predictable pause-time model. (Since JDK 9, G1 has been HotSpot's default collector.)
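To see which collectors a running JVM actually uses, one option is the standard management API. This sketch lists the registered GarbageCollectorMXBeans; the reported names depend on the collector and JDK version (for example, the Parallel collector typically reports "PS Scavenge" and "PS MarkSweep", and G1 reports names such as "G1 Young Generation"). Launch flags like -XX:+UseSerialGC, -XX:+UseParallelGC or -XX:+UseG1GC change the output.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ShowCollectors {
    // Names of the collectors registered in this JVM (one per managed generation or phase).
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```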

Let me add something about memory allocation:


Objects are generally allocated on the heap. New objects are mainly allocated in Eden Space in the young generation; in a few cases they are allocated directly in the old generation. When Eden Space does not have enough room, a GC is triggered. During this GC, the surviving objects in Eden Space and From Space are copied to To Space, and then Eden Space and From Space are cleared. If To Space is not large enough to hold a surviving object, that object is moved to the old generation instead. After the GC, From Space and To Space swap roles, so the next GC copies survivors in the opposite direction, and the cycle repeats. Each time an object survives a GC in the Survivor area, its age increases by 1; by default, an object that reaches age 15 is moved to the old generation.
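The cycle described above can be sketched as a small simulation. This is a simplified model, not HotSpot's implementation: Survivor capacity is counted in objects, liveness is passed in explicitly, and the demo uses a promotion age of 3 instead of the default 15 so that promotion shows up quickly. All class, method and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model of young-generation collection with Eden, From, To and an old generation.
class MinorGcDemo {
    static final class Obj {
        final String id;
        int age = 0; // number of GCs survived
        Obj(String id) { this.id = id; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    final int survivorCapacity;  // objects one Survivor space can hold (hypothetical unit)
    final int tenuringThreshold; // age at which an object is promoted

    MinorGcDemo(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    void allocate(String id) {
        eden.add(new Obj(id)); // new objects start in Eden
    }

    // One GC: copy live objects from Eden and From into To, or promote them.
    void minorGc(Set<String> live) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!live.contains(o.id)) continue; // dead objects are simply not copied
            o.age++;
            if (o.age >= tenuringThreshold || to.size() >= survivorCapacity) {
                old.add(o); // old enough, or To Space is full
            } else {
                to.add(o);
            }
        }
        eden.clear();
        from.clear();
        List<Obj> swap = from; // From and To swap roles for the next GC
        from = to;
        to = swap;
    }

    // Where an object currently lives: "eden", "survivor", "old" or "gone".
    String where(String id) {
        if (contains(eden, id)) return "eden";
        if (contains(from, id)) return "survivor";
        if (contains(old, id)) return "old";
        return "gone";
    }

    private static boolean contains(List<Obj> space, String id) {
        for (Obj o : space) if (o.id.equals(id)) return true;
        return false;
    }

    public static void main(String[] args) {
        MinorGcDemo heap = new MinorGcDemo(2, 3);
        heap.allocate("a"); heap.allocate("b"); heap.allocate("c");
        heap.minorGc(Set.of("a", "b", "c")); // a, b -> survivor; c overflows -> old
        heap.minorGc(Set.of("a"));           // b dies; a survives with age 2
        heap.minorGc(Set.of("a"));           // a reaches age 3 -> old
        for (String id : List.of("a", "b", "c")) {
            System.out.println(id + ": " + heap.where(id));
        }
    }
}
```

Object c shows the overflow rule (promoted early because To Space was full), while a shows promotion by age.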

Generally speaking, large objects are allocated directly in the old generation. Large objects are objects that need a large amount of contiguous memory; the most common kind is a large array, such as:
byte[] data = new byte[4 * 1024 * 1024];

This usually allocates storage space directly in the old generation.

Of course, these allocation rules are not fixed: they depend on the combination of garbage collectors in use and on the JVM's parameters.


Origin: blog.csdn.net/p1830095583/article/details/115255816