[JVM] Garbage collection mechanism

 Hello, hello, hello everyone~ I am your old friend: Protect Xiao Zhouღ  


Today I bring you the garbage collection mechanism of the JVM (Java Virtual Machine). What does collection mean? How is the memory to be reclaimed identified (reference counting, reachability analysis)? How is the space released (mark-sweep, the copying algorithm, mark-compact, generational collection)? Let's take a look~


This post is part of the blogger's column: JavaEE_Protect Xiao Zhouღ's blog-CSDN blog

It is suitable for programming beginners. Interested friends can subscribe and view other "JavaEE basics".

Stay tuned for more excitement: Protect Xiao Zhouღ *★,°*:.☆( ̄▽ ̄)/$:*.°★* '


Java runtime memory is divided into several areas. The program counter, the virtual machine stack, and the native method stack are tied to the life cycle of their thread: when a method returns or the thread ends, that memory is reclaimed along with it, so these three areas need no separate collection.
Almost all instance objects are stored in the Java heap. Before collecting the heap, the garbage collector must first determine which objects are still alive and which have "died". A dead object can be understood simply as one that the program can no longer use.

Determining whether an object is still alive is therefore the first step of garbage collection.


1. Garbage collection

Garbage collection (GC) releases memory automatically on the programmer's behalf. In C, memory allocated dynamically with malloc() must be released manually by calling free() once it is no longer used. What is the harm if it is not released? Memory is finite. A running program requests large amounts of memory, and if data that is no longer used is never released, that dead data keeps occupying space. As more and more of it accumulates, memory gradually runs out, allocations start to fail, and the program crashes or the whole system goes down. Then all you can do is restart~

All in all, memory leaks are the lifelong enemy of C/C++ programmers.

But the blogger studied Java. Later programming languages such as Java introduced garbage collection to address this problem, and it greatly reduces memory leaks, unless your program keeps allocating in a loop or in recursion with no termination condition.

The release of memory is a complicated matter:

The approach of C/C++ is to let the programmer decide when to release memory, so correctness depends on the programmer's skill.

Java lets the JVM decide automatically using a set of strategies. This is more reliable, but it comes at some cost in performance.


The JVM has several memory areas. When we talk about garbage collection, which area is actually being reclaimed?

The program counter is a small value that stores the address of the instruction currently being executed (or the next one to be executed). It is destroyed when its thread is destroyed.

The Java virtual machine stack is the memory area the JVM uses to execute Java methods. It consists of stack frames, each of which holds a method's parameters, local variables, return value, and other information. A frame is released when its method returns, and the whole stack is released when the thread ends.

The native method stack plays the same role for native methods, that is, methods implemented in languages such as C/C++. Its size can be configured through JVM parameters. If the stack space is insufficient, a stack overflow error (StackOverflowError) is thrown.

The method area is a memory area in the Java Virtual Machine (JVM) that stores class structure information, constants, static variables, JIT-compiled code, and similar data. Like the heap, it is shared by all threads, whereas each thread has its own stacks and its own program counter.

Heap: this is where the objects created from classes (instances) are stored. The heap is also the main area for garbage collection, and GC can be thought of as releasing memory in units of objects.
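To make the description of the virtual machine stack a little more concrete, here is a minimal sketch. Every method call pushes a frame, so recursion with no termination condition eventually exhausts the stack and the JVM throws StackOverflowError. The class and method names are made up for illustration, and the depth reached depends on the JVM and its stack size setting, so no particular number should be expected.

```java
// Illustrative only: counts how many frames fit on the stack before it overflows.
class StackDepthDemo {
    static int depth = 0;

    // Each call pushes one more frame; there is deliberately no termination condition.
    static void recurse() {
        depth++;
        recurse();
    }

    // Runs the unbounded recursion and reports how deep it got.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack is exhausted; unwinding the frames has released them again.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measureDepth());
    }
}
```

Note that nothing here needs the garbage collector: the frames disappear as the calls unwind, which is why the stack areas are not the target of GC.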


Garbage collection has two main stages, and these are the strategies we care about:
1. Determining what is garbage

2. Deleting the garbage


2. Determine the garbage

Garbage collection mainly focuses on objects in the heap. If an object will never be used again, it can be considered garbage.

In Java, an object can only be used through a reference.

If no reference points to an object, the object can never be used again and can be considered garbage.

If an object is no longer needed but some reference still holds on to it, it is not considered garbage.

In other words, Java decides whether an object is garbage simply by checking whether any reference still points to it.

Java is conservative in identifying garbage. Failing to release an object promptly is a minor problem, but accidentally killing an object that is still in use may leave the program unable to continue.
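The case of an object that is "no longer needed but still referenced" can be sketched as follows. This is only an illustration: CACHE and handleRequest are hypothetical names, and the point is that a static field keeps every buffer referenced, so the collector must treat all of them as alive even though the program never reads them again.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a long-lived collection that keeps unneeded objects referenced.
class LingeringReferenceDemo {
    // Hypothetical cache; because it is static, everything it holds stays referenced.
    static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest() {
        byte[] buffer = new byte[1024]; // only needed during this call
        CACHE.add(buffer);              // but never removed, so it is never garbage
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            handleRequest();
        }
        System.out.println(CACHE.size()); // 3
    }
}
```

Removing the entry from the list (or not storing it at all) is what would make each buffer eligible for collection.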

How to determine whether an object has a reference to it?

Here are two strategies to explain to you:  1. Reference counting 2. Reachability analysis


2.1 Reference counting

Reference counting: allocate a little extra space for each object to hold an integer that records how many references point to it. When the count drops to 0, the object is garbage. Java does not use this strategy. Python does use reference counting.

But reference counting has two disadvantages:

1. Extra space is needed for the counter.

2. Circular references break the counting logic.

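For example:

class Test {
    public Test n;
}

public class Main {
    public static void main(String[] args) {
        Test a = new Test();  // the Test object referenced by a now has counter = 1
        Test b = new Test();  // the Test object referenced by b now has counter = 1

        a.n = b; // b's object is now also referenced by field n of a's object: counter + 1 = 2

        b.n = a; // a's object is now also referenced by field n of b's object: counter + 1 = 2
    }
}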

If the references a and b are destroyed at this point, the counter of each Test object drops by 1, but the field n of the other object still holds a reference. Neither counter reaches 0, so neither object can be treated as garbage, yet neither object can be reached by the program any more. This is the circular reference problem.
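The counting described here can be replayed by hand in a small simulation. This is a toy model, not how the JVM works: the refCount field and the setField helper are invented for illustration, and the counters are adjusted manually to mimic what a reference-counting runtime would do.

```java
// Toy model of reference counting; the JVM does NOT manage memory this way.
class RefCountDemo {
    // A hypothetical heap object that records how many references point at it.
    static class Node {
        final String name;
        int refCount = 0;
        Node n; // mirrors the field n of Test
        Node(String name) { this.name = name; }
    }

    // Point holder.n at target and increase the target's counter.
    static void setField(Node holder, Node target) {
        holder.n = target;
        target.refCount++;
    }

    // Returns both counters after the local variables a and b have gone away.
    static int[] countsAfterDroppingLocals() {
        Node a = new Node("A"); a.refCount++; // local variable a: counter = 1
        Node b = new Node("B"); b.refCount++; // local variable b: counter = 1
        setField(a, b);                       // B: counter = 2
        setField(b, a);                       // A: counter = 2
        // Simulate the locals being destroyed: each counter drops by one.
        a.refCount--;
        b.refCount--;
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingLocals();
        System.out.println("A=" + counts[0] + ", B=" + counts[1]); // A=1, B=1
    }
}
```

Both counters are stuck at 1 purely because the two objects point at each other, which is exactly the situation a pure reference counter cannot clean up.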

Java does not use reference counting as a strategy to determine garbage, but uses reachability analysis~

A tracing garbage collector avoids this problem. It follows the reference relationships between objects, traverses the reachable objects starting from root objects, and treats the unreachable ones as garbage to be reclaimed. This handles circular references and reclaims unused memory automatically at an appropriate time, reducing the burden of manual memory management.


2.2 Reachability Analysis (Java Strategy)

Reachability analysis treats the reference relationships between objects as a tree-like structure (more precisely, a graph) and traverses it from a set of special starting points. An object that the traversal can reach is "reachable". An object it cannot reach is "unreachable" and is handled as garbage.

With the above strategy, the garbage collector can solve the problem of circular references.

Key point of reachability analysis: the traversal needs starting points (often called GC roots), and the following can serve as one.

1. Local variables (references) on the stack.

2. Objects referenced from the constant pool.

3. Objects referenced by static members in the method area.

Reachability analysis: starting from each starting point, check which other objects the current object refers to, follow those references, and keep going until every accessible object has been visited. Each visited object is marked "reachable" along the way. Whatever remains is "unreachable" and can be considered garbage.
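The traversal can be sketched as an ordinary graph walk. This is a simplified model under stated assumptions: Obj, mark, and findGarbage are hypothetical names, the "heap" is just a list, and a real collector works on raw memory rather than on Java collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of reachability analysis over a small object graph.
class ReachabilityDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        Obj(String name) { this.name = name; }
    }

    // Mark phase: collect every object that can be reached from the roots.
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> reachable = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (reachable.add(current)) {     // true only on the first visit
                pending.addAll(current.refs); // follow the references
            }
        }
        return reachable;
    }

    // Builds a tiny heap and returns the names of the unreachable objects.
    static List<String> findGarbage() {
        Obj root = new Obj("root");
        Obj x = new Obj("x");
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        root.refs.add(x);
        a.refs.add(b);
        b.refs.add(a); // a and b form a cycle that no root points to

        List<Obj> heap = List.of(root, x, a, b);
        Set<Obj> reachable = mark(List.of(root));

        List<String> garbage = new ArrayList<>();
        for (Obj o : heap) {
            if (!reachable.contains(o)) {
                garbage.add(o.name);
            }
        }
        return garbage;
    }

    public static void main(String[] args) {
        System.out.println(findGarbage()); // [a, b]
    }
}
```

The cycle between a and b is reported as garbage because the walk never arrives there, no matter how the two objects reference each other.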

Disadvantages of reachability analysis: 

1. Judging whether an object is garbage requires a traversal from the starting points, which takes time, so an object that has become garbage may not be discovered immediately.

2. Reachability analysis has to follow references step by step. If the reference relationships in the running code change during this process, the change may not be noticed in time. To complete the marking accurately, the other business threads therefore have to pause their work (the stop-the-world, or STW, problem), which is also the biggest shortcoming of the Java garbage collection mechanism.


3. Release “garbage” objects

From the explanation above, we know that Java identifies "garbage" through reachability analysis. Once the garbage is identified, it has to be disposed of. There are three typical strategies for this:

3.1 Mark-sweep

With mark-sweep, the objects identified as garbage are released directly where they are. The obvious disadvantage is memory fragmentation. An allocation requests one contiguous block of storage, but the free memory is now made up of scattered, independent pieces. With free space scattered like this, even if 1 GB is free in total, a single request for 200 MB may fail.
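The fragmentation problem can be illustrated with a toy heap in which each array slot stands for one unit of memory. This is only a sketch: the slot model and the method names are assumptions made for the example.

```java
// Toy model: a heap of equal-sized slots, true = in use, false = free.
class FragmentationDemo {
    // Total number of free slots.
    static int totalFree(boolean[] used) {
        int free = 0;
        for (boolean u : used) {
            if (!u) free++;
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeBlock(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        // State after a sweep that freed every other slot.
        boolean[] heap = { true, false, true, false, true, false, true, false };
        System.out.println(totalFree(heap));        // 4
        System.out.println(largestFreeBlock(heap)); // 1
    }
}
```

Four slots are free in total, yet a request for two contiguous slots cannot be satisfied, which is the 1 GB versus 200 MB situation in miniature.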


3.2 Copying algorithm

Copying algorithm: divide the memory space into two halves and use only one half at a time. During collection, copy the non-garbage objects to the other half, then release the entire old half in one go.

The copying algorithm solves the problem of memory fragmentation, but its shortcomings are also obvious:
1. Only half of the memory is usable at a time, so memory utilization is relatively low.

2. If few objects are garbage and most have to be retained, the cost of copying the retained objects to the other half is relatively high.
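A minimal sketch of the copying step, assuming the two halves are modelled as two arrays and liveness has already been decided (the live array and the method name copyLive are hypothetical):

```java
import java.util.Arrays;

// Toy model of a copying collector with two equal halves.
class CopyingDemo {
    // Copies the live objects from one half into the start of the other half,
    // then releases the old half all at once. Returns the number of objects copied.
    static int copyLive(String[] from, boolean[] live, String[] to) {
        int next = 0;
        for (int i = 0; i < from.length; i++) {
            if (from[i] != null && live[i]) {
                to[next++] = from[i]; // survivors end up packed together
            }
        }
        Arrays.fill(from, null); // the whole old half is released in one step
        return next;
    }

    public static void main(String[] args) {
        String[] from = { "A", "B", "C", "D" };
        boolean[] live = { true, false, false, true };
        String[] to = new String[4];
        int copied = copyLive(from, live, to);
        System.out.println(copied + " " + Arrays.toString(to)); // 2 [A, D, null, null]
    }
}
```

The survivors are contiguous in the new half, so there is no fragmentation, but the work done grows with the number of survivors, which is the second shortcoming listed above.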


3.3 Mark-compact

This is similar to deleting an element from the middle of an ArrayList (sequential list): to keep the data contiguous, the elements behind it are moved forward.

Mark-compact solves the problem of memory fragmentation and improves memory utilization. However, moving the surviving objects after every collection is also expensive.
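The sliding step can be sketched in the same toy model used for an array-backed list. As before, the heap is an array of slots, liveness is assumed to be known already, and compact is a hypothetical name.

```java
import java.util.Arrays;

// Toy model of the compaction step of mark-compact.
class CompactDemo {
    // Slides the live objects toward index 0, keeping their order,
    // and returns the index of the first free slot afterwards.
    static int compact(String[] heap, boolean[] live) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live[i]) {
                heap[next++] = heap[i]; // move the survivor forward
            }
        }
        Arrays.fill(heap, next, heap.length, null); // everything after is free
        return next;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E" };
        boolean[] live = { true, false, true, false, true };
        int firstFree = compact(heap, live);
        System.out.println(firstFree + " " + Arrays.toString(heap)); // 3 [A, C, E, null, null]
    }
}
```

Unlike the copying algorithm, the whole space stays usable, and the free memory ends up as one contiguous block at the end.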


To summarize the three strategies: mark-sweep produces memory fragmentation; the copying algorithm uses memory poorly and performs a large number of copies when much data has to be retained; mark-compact fixes both of these problems, but it also moves a large amount of data.

Therefore, the JVM combines the three ideas above in its own approach: generational collection.

3.4 Generational collection

Generational collection gives each object an "age" that describes how long the object has existed.

A newly created object is considered to be 0 years old. Each time it survives a round of reachability analysis, its age increases by one. The age is used to tell apart objects with different survival times.

Different collection strategies can then be applied to objects of different age groups.

 1. Newly created objects are placed in the Eden area. When a GC scan (reachability analysis) reaches the Eden area, most objects are eliminated in their first round of GC, while still in the young generation. The main collection strategy used here is the copying algorithm.

 2. Objects in the Eden area that survive the first round of GC are copied to the survivor area by the copying algorithm. The survivor area is divided into two parts of equal size, and only one of them is in use at a time. After each round of GC, the surviving objects are copied to the other, unused survivor space, and everything left behind is cleared. The two spaces alternate in this way.

 3. When an object in the survivor area has survived several rounds of GC and its age reaches a certain threshold, it is copied to the old generation. Objects that enter the old generation are very unlikely to become garbage, so GC runs less frequently there. If an object in the old generation is found to be garbage, it is reclaimed with a mark-based strategy (mark-sweep or mark-compact).

 4. As a special case, a very large object goes directly into the old generation, because copying large objects is relatively expensive.
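The aging and promotion rules can be sketched as a small simulation. This is a deliberately simplified model: Eden and the two survivor spaces are merged into a single young list, reachability is a flag set by hand, and PROMOTION_AGE is a made-up threshold rather than the JVM's real setting.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of generational collection: aging in the young generation, then promotion.
class GenerationalDemo {
    static final int PROMOTION_AGE = 3; // hypothetical threshold for this sketch

    static class Obj {
        final String name;
        int age = 0;              // number of collections survived
        boolean reachable = true; // set by hand instead of running a real analysis
        Obj(String name) { this.name = name; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    // New objects always start in the young generation.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);
        return o;
    }

    // One round of young-generation GC.
    void minorGc() {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!o.reachable) {
                it.remove();          // garbage: dropped
            } else if (++o.age >= PROMOTION_AGE) {
                it.remove();          // old enough: promoted
                old.add(o);
            }
        }
    }

    public static void main(String[] args) {
        GenerationalDemo heap = new GenerationalDemo();
        heap.allocate("longLived");
        Obj shortLived = heap.allocate("shortLived");
        shortLived.reachable = false;
        for (int i = 0; i < 3; i++) {
            heap.minorGc();
        }
        System.out.println(heap.young.size() + " " + heap.old.size()); // 0 1
    }
}
```

The short-lived object disappears in the first round, while the long-lived one is scanned only three times before it moves to the old list and stops costing work in every young collection.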


Okay, that is everything the blogger has to share on the [JVM] garbage collection mechanism. This is only a simple conceptual overview. I hope it is helpful to everyone, and if anything is wrong, you are welcome to criticize and correct me.

Thank you to everyone who has read this article, please stay tuned for more exciting content: Protect Xiao Zhouღ *★,°*:.☆( ̄▽ ̄)/$:*.°★* 

When I meet you, all the stars fall on my head ...


Origin blog.csdn.net/weixin_67603503/article/details/132707236