Overview of the JVM garbage collection mechanism

1. Partitions of JVM memory

 

The JVM's memory is divided into five major areas: the program counter, the virtual machine stack, the native method stack, the heap, and the method area.

Of these, the program counter, the virtual machine stack, and the native method stack are created and destroyed together with their thread. Memory allocation and reclamation in these areas are therefore deterministic and need little attention: when a method returns or a thread ends, the memory is reclaimed naturally. The Java heap and the method area are different. Allocation and reclamation there are dynamic, and this is the memory the garbage collector has to focus on. Before the collector reclaims anything in the heap or the method area, it must first determine which objects can be reclaimed and which cannot be reclaimed yet. That requires an algorithm for deciding whether an object is still alive.

2. Algorithms the JVM uses to determine whether an object is alive

Reference counting algorithm

Analysis of Algorithms

Reference counting is one of the earliest garbage collection strategies. In this approach, each object instance in the heap carries a reference counter. When an object is created and assigned to a variable, its counter is set to 1. Whenever another variable is assigned a reference to the object, the counter is incremented (after a = b, the counter of the instance referenced by b goes up by 1). When a reference to the instance goes out of scope or is set to a new value, the counter is decremented by 1. Any instance whose counter reaches 0 can be garbage collected, and when it is collected, the counter of every instance it references is decremented by 1 in turn.

Advantage:

A reference counting collector runs very quickly and is interleaved with the execution of the program. This suits real-time environments where the program must not be paused for long.

Disadvantage:

It cannot detect circular references. If a parent object holds a reference to a child object and the child in turn refers back to the parent, their reference counts can never reach 0.

Code example

Feeling bored? Here is a piece of code to calm the nerves.

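public class ReferenceFindTest {
    // Minimal holder class; the original listing used MyObject without defining it.
    static class MyObject {
        MyObject object;
    }

    public static void main(String[] args) {
        MyObject object1 = new MyObject();
        MyObject object2 = new MyObject();
        // The two objects reference each other.
        object1.object = object2;
        object2.object = object1;
        // Drop the only external references; the pair is now unreachable.
        object1 = null;
        object2 = null;
    }
}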

This code illustrates why the reference counting algorithm cannot handle circular references. The last two statements set object1 and object2 to null, so the two objects they pointed to can no longer be accessed. But because the objects still refer to each other, their reference counters never drop to 0, and a collector that relied on reference counting alone would never reclaim them. This is why mainstream JVMs use reachability analysis instead.
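To make the counters concrete, here is a toy model that replays the same scenario with explicit counts. It is only a sketch: the Node class and the retain/release helpers are hypothetical names invented for this illustration, and none of this reflects how a real JVM manages memory.

```java
// Toy model of reference counting; Node, retain and release are
// hypothetical names used only for this illustration.
class ReferenceCountingDemo {
    static class Node {
        int refCount = 0;   // number of references currently pointing at this node
        Node field;         // plays the role of MyObject.object
    }

    // Called whenever a new reference to target is created.
    static Node retain(Node target) {
        if (target != null) {
            target.refCount++;
        }
        return target;
    }

    // Called whenever a reference to target goes away.
    static void release(Node target) {
        if (target == null) {
            return;
        }
        target.refCount--;
        if (target.refCount == 0) {
            // Reclaiming an object also drops the references it holds.
            Node child = target.field;
            target.field = null;
            release(child);
        }
    }

    // Replays the ReferenceFindTest scenario and returns both final counters.
    static int[] runCycleScenario() {
        Node a = new Node();
        Node b = new Node();
        retain(a);              // object1 = new MyObject()
        retain(b);              // object2 = new MyObject()
        a.field = retain(b);    // object1.object = object2
        b.field = retain(a);    // object2.object = object1
        release(a);             // object1 = null
        release(b);             // object2 = null
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = runCycleScenario();
        System.out.println("object1 count = " + counts[0] + ", object2 count = " + counts[1]);
        // prints: object1 count = 1, object2 count = 1
    }
}
```

Both counters end at 1 even though nothing outside the pair can reach either object, which is exactly the leak described above.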

Reachability analysis algorithm

The reachability analysis algorithm comes from graph theory in discrete mathematics. All reference relationships in the program are treated as a graph. Starting from a set of nodes called GC Roots, the collector follows the references to the nodes they point to, then follows the references held by those nodes, and so on. Once every reference has been followed, the nodes that were never visited are considered unreferenced, that is, useless. Useless nodes are judged to be reclaimable objects.

In the Java language, the objects that can be used as GC Roots include the following:

  • Objects referenced from the virtual machine stack (the local variable table in a stack frame);
  • Objects referenced by static class fields in the method area;
  • Objects referenced by constants in the method area;
  • Objects referenced by JNI (native methods) in the native method stack.
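The search from GC Roots can be sketched as an ordinary graph traversal. In this minimal sketch the object names, the map-based reference graph and the reachable helper are all assumptions made for illustration; a real collector walks actual object headers and fields, not strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachabilityDemo {
    // Hypothetical reference graph: x and y point at each other,
    // but nothing reachable from "root" points at them.
    static Map<String, List<String>> sampleGraph() {
        return Map.of(
            "root", List.of("a"),
            "a", List.of("b"),
            "x", List.of("y"),
            "y", List.of("x"));
    }

    // Returns every object reachable from the roots by following references.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> live = new TreeSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            for (String target : refs.getOrDefault(current, List.of())) {
                if (live.add(target)) {     // first visit: keep searching from here
                    pending.add(target);
                }
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(sampleGraph(), List.of("root"));
        Set<String> garbage = new TreeSet<>(List.of("root", "a", "b", "x", "y"));
        garbage.removeAll(live);
        System.out.println("live = " + live + ", garbage = " + garbage);
        // prints: live = [a, b, root], garbage = [x, y]
    }
}
```

Note that x and y still reference each other, yet both end up in the garbage set. Circular references are no obstacle for this approach.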

References in Java

Whether we count the references to an object with the reference counting algorithm or check whether its reference chain is reachable with the reachability analysis algorithm, deciding whether an object is alive always comes down to "references". In the Java language, references are divided into four types: strong references, soft references, weak references, and phantom references. Their strength decreases in that order.

Strong references are ubiquitous in program code, for example Object obj = new Object(). As long as a strong reference exists, the garbage collector will never reclaim the referenced object.

Soft references describe objects that are useful but not essential. Before the system throws an out-of-memory error, objects that are only softly referenced are included in the scope of a second collection. If that collection still does not free enough memory, the out-of-memory error is thrown.

Weak references also describe non-essential objects, but they are weaker than soft references. An object that is only weakly referenced survives only until the next garbage collection. When the collector runs, such objects are reclaimed regardless of whether memory is currently sufficient.

Phantom references, also translated as virtual or ghost references (whoever picked the name certainly had a flair for the dramatic), are the weakest kind. Whether an object has a phantom reference has no effect on its lifetime, and an object instance cannot be obtained through a phantom reference. Its only purpose is to deliver a system notification when the object is reclaimed by the collector.

Don't be intimidated by these concepts, and don't worry, we have not gone off topic. Going any deeper would get hard to explain. The four concepts are listed here to make one point: both the reference counting algorithm and the reachability analysis algorithm judge liveness in terms of strong references.
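The four kinds map onto classes in java.lang.ref. The following sketch only shows the parts that behave the same on every run; when a soft or weak reference is actually cleared depends on the collector, so that part is deliberately left out. The class name and the probe helper are made up for the example.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceTypesDemo {
    // Returns { softSees, weakSees, phantomSees } while a strong reference exists.
    static boolean[] probe() {
        Object strong = new Object();                        // strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        // A phantom reference must be registered with a queue; the queue is
        // where the notification arrives after the object is reclaimed.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        boolean softSees = soft.get() == strong;       // object still strongly held
        boolean weakSees = weak.get() == strong;       // object still strongly held
        boolean phantomSees = phantom.get() != null;   // get() always returns null
        return new boolean[] { softSees, weakSees, phantomSees };
    }

    public static void main(String[] args) {
        boolean[] r = probe();
        System.out.println("soft=" + r[0] + " weak=" + r[1] + " phantom=" + r[2]);
        // prints: soft=true weak=true phantom=false
    }
}
```

Once the last strong reference is dropped, weak.get() may start returning null after a later collection, but the exact moment is up to the JVM.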

The last struggle before the object dies (is recycled)

Even an object found unreachable by reachability analysis is not yet doomed to die. For the moment it is merely on "probation". To truly declare an object dead, it must be marked at least twice. First marking: if reachability analysis finds that the object has no reference chain connecting it to GC Roots, it is marked for the first time and then filtered. The filter condition is whether the object needs to have its finalize() method executed. If the object does not override finalize(), or finalize() has already been called on it, execution is considered unnecessary and the object is reclaimed. Second marking: an object that does need finalize() gets the chance to run it. If it does not re-establish a link to the reference chain inside finalize(), it is marked a second time, and an object marked the second time is actually reclaimed. If it does re-attach itself to the reference chain in finalize(), it escapes this collection.
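Real finalize() behaviour depends on GC timing and is hard to demonstrate reliably, so the following is a toy simulation of the two-stage decision rather than actual JVM code. Every name in it (Obj, overridesFinalize, rescuesItself, collect) is hypothetical, and it compresses into a single pass what a real JVM spreads over more than one collection cycle.

```java
import java.util.ArrayList;
import java.util.List;

class TwoStageMarkingDemo {
    // Hypothetical stand-in for a heap object.
    static class Obj {
        final String name;
        final boolean overridesFinalize;
        final boolean rescuesItself;        // finalize() re-links it to the reference chain
        boolean finalizeAlreadyRun = false;
        boolean reachable = false;

        Obj(String name, boolean overridesFinalize, boolean rescuesItself) {
            this.name = name;
            this.overridesFinalize = overridesFinalize;
            this.rescuesItself = rescuesItself;
        }
    }

    // Simulates one collection and returns the names of reclaimed objects.
    static List<String> collect(List<Obj> heap) {
        List<String> reclaimed = new ArrayList<>();
        List<Obj> needFinalize = new ArrayList<>();
        for (Obj o : heap) {
            if (o.reachable) {
                continue;                               // alive, nothing to do
            }
            // First mark and filter: is finalize() worth running?
            if (o.overridesFinalize && !o.finalizeAlreadyRun) {
                needFinalize.add(o);
            } else {
                reclaimed.add(o.name);
            }
        }
        for (Obj o : needFinalize) {
            o.finalizeAlreadyRun = true;                // finalize() runs at most once
            if (o.rescuesItself) {
                o.reachable = true;                     // re-attached to the reference chain
            }
        }
        // Second mark: still unreachable after finalize() means reclaimed.
        for (Obj o : needFinalize) {
            if (!o.reachable) {
                reclaimed.add(o.name);
            }
        }
        heap.removeIf(o -> reclaimed.contains(o.name));
        return reclaimed;
    }

    public static void main(String[] args) {
        Obj survivor = new Obj("lazarus", true, true);
        List<Obj> heap = new ArrayList<>(List.of(
            new Obj("plain", false, false),
            new Obj("stubborn", true, false),
            survivor));
        System.out.println(collect(heap));   // prints: [plain, stubborn]
        survivor.reachable = false;          // the rescuing reference is dropped again
        System.out.println(collect(heap));   // prints: [lazarus]
    }
}
```

The second call shows why the rescue works only once: finalize() has already run, so the filter no longer gives the object another chance.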

How to determine whether recycling is needed in the method area

Deciding whether content in the method area should be reclaimed works differently. What is reclaimed there is mainly discarded constants and useless classes. Discarded constants can be judged by reference reachability, just like objects. A useless class, however, must satisfy all three of the following conditions:

  • All instances of the class have been reclaimed, that is, no instance of the class exists in the Java heap;
  • The ClassLoader that loaded the class has been reclaimed;
  • The java.lang.Class object for the class is not referenced anywhere, so the methods of the class cannot be accessed through reflection anywhere.

3. Commonly used garbage collection algorithms

Mark-and-sweep algorithm

The mark-sweep algorithm starts from the root set (GC Roots) and marks the surviving objects, then scans the entire space and reclaims every unmarked object, as shown in the figure below. It does not need to move objects; it only processes the dead ones, which makes it very efficient when many objects survive. However, because dead objects are reclaimed in place, it causes memory fragmentation.

[Figure: mark-sweep algorithm]
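The fragmentation is easy to see in a toy heap. In this sketch the heap is just an array of named slots and the marked set is assumed to be the result of an earlier mark phase; both are simplifications for illustration.

```java
import java.util.Arrays;
import java.util.Set;

class MarkSweepDemo {
    // Returns a copy of the heap after sweeping: slots whose object was not
    // marked become free (null); survivors stay exactly where they were.
    static String[] sweep(String[] heap, Set<String> marked) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !marked.contains(result[i])) {
                result[i] = null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F" };
        Set<String> marked = Set.of("A", "C", "F");   // assumed output of the mark phase
        System.out.println(Arrays.toString(sweep(heap, marked)));
        // prints: [A, null, C, null, null, F]
    }
}
```

Three slots are free afterwards, but they are split into holes of one and two slots, so an object needing three contiguous slots would not fit.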

Copying algorithm

The copying algorithm was proposed to overcome the overhead of handles and to solve the problem of memory fragmentation. It starts by dividing the heap into an object area and one or more free areas. The program allocates space for objects from the object area. When the object area is full, a collector based on the copying algorithm scans for live objects starting from the root set (GC Roots) and copies each live object to a free area, so that no free holes remain between the memory occupied by live objects. The free area then becomes the object area, the original object area becomes the free area, and the program allocates memory in the new object area.
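One way to picture the copy step is the sketch below, which follows references from the roots and appends each live object to the free area in the order it is discovered. The string-named objects and the map of references are assumptions for illustration, and updating the references themselves to point at the new copies is omitted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CopyingDemo {
    // Copies every object reachable from the roots into a fresh area.
    // Objects that are never reached are simply left behind.
    static List<String> copyLive(Map<String, List<String>> refs, List<String> roots) {
        List<String> toSpace = new ArrayList<>();    // filled contiguously, no holes
        Set<String> alreadyCopied = new HashSet<>();
        for (String root : roots) {
            if (alreadyCopied.add(root)) {
                toSpace.add(root);
            }
        }
        // Scan the copied objects in order and copy whatever they reference.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            for (String child : refs.getOrDefault(toSpace.get(scan), List.of())) {
                if (alreadyCopied.add(child)) {
                    toSpace.add(child);
                }
            }
        }
        return toSpace;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("d"),
            "x", List.of("y"));                      // x and y are garbage
        System.out.println(copyLive(refs, List.of("a")));
        // prints: [a, b, c, d]
    }
}
```

The work done is proportional to the number of live objects, which is why this approach pays off when few objects survive.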

Mark-compact algorithm

The mark-compact algorithm marks objects in the same way as the mark-sweep algorithm, but the cleanup differs. After the space occupied by dead objects is reclaimed, all surviving objects are moved toward the free space at the left end, and the pointers to them are updated accordingly. Because mark-compact moves objects on top of what mark-sweep does, it costs more, but it solves the problem of memory fragmentation. The figure below shows the process:

[Figure: mark-compact algorithm]
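A minimal sketch of the compaction step, again on a toy array heap with an assumed marked set. The returned map stands in for the pointer updates: it records the new slot of every object that survived.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class MarkCompactDemo {
    // Slides marked objects toward index 0 in place and returns their new slots.
    static Map<String, Integer> compact(String[] heap, Set<String> marked) {
        Map<String, Integer> newAddress = new LinkedHashMap<>();
        int free = 0;                                // next free slot at the left end
        for (int i = 0; i < heap.length; i++) {
            String obj = heap[i];
            if (obj != null && marked.contains(obj)) {
                heap[free] = obj;                    // move the survivor left
                newAddress.put(obj, free);           // remember where it went
                free++;
            }
        }
        Arrays.fill(heap, free, heap.length, null);  // everything after is one free block
        return newAddress;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F" };
        Map<String, Integer> moved = compact(heap, Set.of("A", "C", "F"));
        System.out.println(Arrays.toString(heap) + " " + moved);
        // prints: [A, C, F, null, null, null] {A=0, C=1, F=2}
    }
}
```

Compared with the mark-sweep result, the free space is now one contiguous block, at the price of moving objects and fixing up their addresses.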

Generational collection algorithm

The generational collection algorithm is what most JVM garbage collectors currently use. Its core idea is to divide memory into several regions according to the lifetime of objects. Typically the heap is divided into the old generation (Tenured Generation) and the young generation (Young Generation), and outside the heap there is one more generation, the permanent generation (Permanent Generation). The old generation is characterized by only a few objects needing reclamation in each collection, while in the young generation a large number of objects need reclamation each time. The most suitable collection algorithm can then be chosen for each generation according to its characteristics.

Recycling algorithm of the Young Generation

  • All newly created objects are first placed in the young generation. The goal of the young generation is to collect short-lived objects as quickly as possible.
  • The young generation is divided into one Eden area and two survivor areas (survivor0 and survivor1) in a ratio of 8:1:1. Most objects are created in Eden. During a collection, the surviving objects in Eden are first copied to survivor0 and Eden is cleared. When survivor0 also fills up, the surviving objects in Eden and survivor0 are copied to survivor1, and Eden and survivor0 are cleared. survivor0 is now empty, so the two survivor areas swap roles. In other words, one survivor area is always kept empty, and the cycle repeats.
  • When survivor1 is too small to hold the surviving objects from Eden and survivor0, the surviving objects are stored directly in the old generation. If the old generation is also full, a Full GC is triggered, which collects both the young generation and the old generation.
  • A GC that occurs in the young generation is also called a Minor GC. Minor GCs occur fairly frequently (and are not necessarily triggered only when Eden is full).
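The survivor swap and the promotion rules can be played through in a toy model. Everything here is an assumption made for illustration: capacities are counted in objects rather than bytes, the live set is passed in by hand, and SURVIVOR_CAPACITY and MAX_AGE are arbitrary small values chosen so that both promotion paths show up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MinorGcDemo {
    static final int SURVIVOR_CAPACITY = 2;   // assumed toy capacity, in objects
    static final int MAX_AGE = 3;             // assumed promotion threshold

    List<String> eden = new ArrayList<>();
    List<String> from = new ArrayList<>();    // survivor area currently in use
    List<String> to = new ArrayList<>();      // survivor area kept empty
    List<String> old = new ArrayList<>();
    Map<String, Integer> age = new HashMap<>();

    void allocate(String name) {
        eden.add(name);
        age.put(name, 0);
    }

    // Copies live objects out of eden and the used survivor area.
    void minorGc(Set<String> live) {
        List<String> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (String obj : candidates) {
            if (!live.contains(obj)) {
                age.remove(obj);                       // dead objects are not copied
                continue;
            }
            int newAge = age.merge(obj, 1, Integer::sum);
            if (newAge >= MAX_AGE || to.size() >= SURVIVOR_CAPACITY) {
                old.add(obj);                          // too old, or survivor area full
            } else {
                to.add(obj);
            }
        }
        eden.clear();
        from.clear();
        List<String> tmp = from;                       // swap the survivor roles
        from = to;
        to = tmp;
    }

    static MinorGcDemo runScenario() {
        MinorGcDemo gc = new MinorGcDemo();
        gc.allocate("a");
        gc.allocate("b");
        gc.allocate("c");
        gc.minorGc(Set.of("a", "b"));        // c dies young
        gc.allocate("d");
        gc.minorGc(Set.of("a", "b", "d"));   // survivor overflows, b is promoted
        gc.minorGc(Set.of("a", "d"));        // a reaches MAX_AGE and is promoted
        return gc;
    }

    public static void main(String[] args) {
        MinorGcDemo gc = runScenario();
        System.out.println("old = " + gc.old + ", survivor = " + gc.from);
        // prints: old = [b, a], survivor = [d]
    }
}
```

In the run above, b reaches the old generation because the survivor area overflowed, while a gets there by surviving enough collections.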

Recycling algorithm of the Old Generation

  • Objects that still survive after N garbage collections in the young generation are moved to the old generation. The old generation can therefore be regarded as holding long-lived objects.
  • The old generation is also much larger than the young generation (the ratio of young to old is roughly 1:2). When the old generation fills up, a Major GC, that is, a Full GC, is triggered. Full GCs are relatively infrequent, objects in the old generation live relatively long, and their survival rate is high.
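The two ratios quoted so far (young to old roughly 1:2, Eden to each survivor area 8:1:1) translate into sizes as in the sketch below. The parameter names are borrowed from HotSpot's -XX:NewRatio and -XX:SurvivorRatio options, but the arithmetic is only an approximation: a real JVM aligns the sizes and may resize the areas at run time.

```java
class GenerationSizingDemo {
    // Returns { old, eden, survivor } for a heap of the given size.
    // newRatio is old:young, survivorRatio is eden:one survivor area.
    static long[] split(long heapSize, int newRatio, int survivorRatio) {
        long young = heapSize / (newRatio + 1);
        long old = heapSize - young;
        long survivor = young / (survivorRatio + 2);   // there are two survivor areas
        long eden = young - 2 * survivor;
        return new long[] { old, eden, survivor };
    }

    public static void main(String[] args) {
        long[] mb = split(300, 2, 8);   // a 300 MB heap, sizes in MB
        System.out.println("old=" + mb[0] + " eden=" + mb[1] + " survivor=" + mb[2] + " (x2)");
        // prints: old=200 eden=80 survivor=10 (x2)
    }
}
```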

Recycling algorithm of the Permanent Generation

The permanent generation stores static data such as Java classes and methods. It has little impact on garbage collection, but some applications, Hibernate for example, generate or load classes dynamically. In that case a fairly large permanent generation has to be configured to hold the classes added at run time. The permanent generation is also called the method area; in HotSpot it was replaced by Metaspace as of JDK 8.
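To see which memory areas a running JVM actually has, the standard java.lang.management API can list them. The pool names are not standardized and depend on the JVM and the selected collector, so this sketch asserts nothing about specific names; on a HotSpot JVM from JDK 8 onwards one would typically expect a pool called Metaspace rather than a permanent generation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolsDemo {
    // Collects "name (type)" for every memory pool the JVM reports.
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName() + " (" + pool.getType() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        // Output differs between JVMs and collectors, so none is shown here.
        for (String name : poolNames()) {
            System.out.println(name);
        }
    }
}
```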

 


Origin blog.csdn.net/weixin_57763462/article/details/133105375