JVM garbage collection

Garbage collection mainly targets the heap and the method area. The program counter, the virtual machine stack, and the native method stack are thread-private: they exist only for the lifetime of their thread and disappear when the thread ends, so these three areas do not need to be garbage collected.

Determining whether an object can be reclaimed

1. Reference counting algorithm

Add a reference counter to each object. Whenever a new reference to the object is created, the counter is incremented by 1; whenever a reference becomes invalid, the counter is decremented by 1. An object whose reference count is 0 can be reclaimed.

When two objects reference each other in a cycle, their reference counters never drop to 0, so they can never be reclaimed. Because of circular references, the Java virtual machine does not use the reference counting algorithm.

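public class Test {

	public Object instance = null;

	public static void main(String[] args) {
		Test a = new Test();
		Test b = new Test();
		// The two objects reference each other through their instance fields.
		a.instance = b;
		b.instance = a;
		// Drop the only references held from outside the cycle.
		a = null;
		b = null;
	}
}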

In the code above, the objects referenced by a and b hold references to each other. After a and b are set to null, the two objects still reference each other, so their reference counts never reach 0 and, under reference counting, the two Test objects could not be reclaimed.
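The effect of the cycle on the counters can be replayed with a toy model. This is only a sketch for illustration: the RefCountDemo and Counted classes and the refCount field are hypothetical, and the counters are adjusted by hand, which is not how HotSpot manages memory.

```java
// Toy model of reference counting. HotSpot does not work this way;
// the Counted class and its refCount field are hypothetical.
class RefCountDemo {
    static final class Counted {
        int refCount = 0;     // number of references currently pointing here
        Counted instance;     // plays the role of Test.instance
    }

    // Store target in holder.instance and keep both counters up to date.
    static void assign(Counted holder, Counted target) {
        if (holder.instance != null) {
            holder.instance.refCount--;
        }
        holder.instance = target;
        if (target != null) {
            target.refCount++;
        }
    }

    // Replays the steps of Test.main and returns the final counts of both objects.
    static int[] simulateCycle() {
        Counted a = new Counted();
        Counted b = new Counted();
        a.refCount++;         // local variable a refers to the first object
        b.refCount++;         // local variable b refers to the second object
        assign(a, b);         // a.instance = b
        assign(b, a);         // b.instance = a
        a.refCount--;         // a = null
        b.refCount--;         // b = null
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = simulateCycle();
        System.out.println("counts after a = null; b = null: " + counts[0] + ", " + counts[1]);
    }
}
```

Both counters end at 1 rather than 0, which is why a pure reference counting collector would keep the two objects alive.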

2. Reachability analysis algorithm

Starting from a set of GC Roots, follow references to search the object graph. Every reachable object is alive; unreachable objects can be reclaimed.

The Java virtual machine uses this algorithm to determine whether an object can be reclaimed. GC Roots include the following:

  • Objects referenced from the local variable table in the virtual machine stack.
  • Objects referenced by JNI (that is, native methods) in the native method stack.
  • Objects referenced by static fields of classes in the method area, such as reference-type static variables of a Java class.
  • Objects referenced by constants in the method area, such as references in the string constant pool.
  • References held inside the Java virtual machine itself, such as the Class objects for the primitive types, some resident exception objects, and the system class loader.
  • All objects held as locks by synchronized (the synchronized keyword).
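As a rough illustration of reachability analysis, the sketch below marks everything reachable from a list of roots in a hand-built object graph. The ReachabilityDemo and Node classes are hypothetical stand-ins; a real virtual machine walks actual object references, not a list of named nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy reachability analysis over a hand-built graph (hypothetical model).
class ReachabilityDemo {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references

        Node(String name) {
            this.name = name;
        }
    }

    // Returns every node that can be reached from the given roots.
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // root -> x -> y is reachable; a <-> b is a cycle with no path from the root.
    static List<String> example() {
        Node root = new Node("root");
        Node x = new Node("x");
        Node y = new Node("y");
        Node a = new Node("a");
        Node b = new Node("b");
        root.refs.add(x);
        x.refs.add(y);
        a.refs.add(b);
        b.refs.add(a);

        List<String> names = new ArrayList<>();
        for (Node n : reachableFrom(Arrays.asList(root))) {
            names.add(n.name);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints [root, x, y]
    }
}
```

The cycle a <-> b is never marked because no path leads to it from a root, so, unlike with reference counting, it would be treated as garbage.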

3. Recycling of the method area

In HotSpot up to JDK 1.7 the method area is implemented by the permanent generation, and a collection there reclaims far less than a collection of the young generation, so collecting the method area is not very cost-effective. What it reclaims is mainly obsolete constants and unloaded classes.

To avoid running out of memory, the virtual machine needs to be able to unload classes in scenarios that make heavy use of reflection and dynamic proxies. The conditions for unloading a class are strict: all three of the following must hold, and even then the class is not guaranteed to be unloaded:

  • All instances of the class have been reclaimed, so no instance of the class remains in the heap.
  • The ClassLoader that loaded the class has been reclaimed.
  • The Class object for the class is not referenced anywhere, so the class's methods cannot be reached through reflection.

4. finalize()

finalize() resembles a C++ destructor and is used to release external resources. However, try-finally and similar constructs do this job better. The method is expensive to run, its timing is highly uncertain, and the order in which it is called across objects is not guaranteed, so it is best not to use it.

When an object is about to be reclaimed and its finalize() method needs to be executed, the method can make the object referenced again and thereby rescue it. This self-rescue works only once: if an object has already saved itself in finalize(), the method is not called again when the object is reclaimed later.

Reference types

Whether the number of references is counted by the reference counting algorithm or reachability is determined by the reachability analysis algorithm, deciding whether an object can be reclaimed always depends on references.

Java provides four reference types with different strengths.

1. Strong references

An object that is strongly referenced will not be reclaimed. Creating an object with new and assigning it to a variable creates a strong reference.

Object obj = new Object();

2. Soft references

Objects associated with soft references will only be reclaimed when there is not enough memory. Use the SoftReference class to create soft references.

Object obj = new Object();
SoftReference<Object> sf = new SoftReference<Object>(obj);
obj = null; // leave the object reachable only through the soft reference

3. Weak references

An object that is only weakly referenced will be reclaimed at the next garbage collection, so it survives only until the next collection happens. Use the WeakReference class to create weak references.

Object obj = new Object();
WeakReference<Object> wf = new WeakReference<Object>(obj);
obj = null;

4. Phantom references

Also known as a "ghost reference". Whether an object has a phantom reference has no effect on its lifetime, and an object instance cannot be obtained through a phantom reference. The only purpose of attaching a phantom reference to an object is to receive a system notification when the object is reclaimed. Use the PhantomReference class to create phantom references.

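Object obj = new Object();
// A phantom reference created with a null queue is never enqueued, so no notification would arrive.
ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
PhantomReference<Object> pf = new PhantomReference<Object>(obj, queue);
obj = null;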
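The differences between the reference types that can be observed deterministically are small, and the sketch below sticks to those. It uses only the java.lang.ref classes named above; the ReferenceTypesDemo class and its method name are made up for this example. What happens after a collection depends on the collector and on memory pressure, so main prints that result without promising a value.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceTypesDemo {
    // Checks what each reference type returns while the referent is still strongly reachable.
    static boolean checkWhileStronglyReachable() {
        Object obj = new Object();                        // strong reference
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(obj);
        WeakReference<Object> weak = new WeakReference<>(obj);
        PhantomReference<Object> phantom = new PhantomReference<>(obj, queue);

        boolean nothingEnqueued = queue.poll() == null;   // referent is alive, so no notification yet
        boolean softSeesObject = soft.get() == obj;       // soft reference still returns the referent
        boolean weakSeesObject = weak.get() == obj;       // weak reference still returns the referent
        boolean phantomHidesObject = phantom.get() == null; // get() on a phantom reference is always null
        return nothingEnqueued && softSeesObject && weakSeesObject && phantomHidesObject;
    }

    public static void main(String[] args) {
        System.out.println("deterministic checks hold: " + checkWhileStronglyReachable());

        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc(); // only a request; the referent is usually, but not necessarily, cleared
        System.out.println("weak referent after System.gc(): " + weak.get());
    }
}
```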

Garbage collection algorithms

Partial collection (Partial GC): a garbage collection that does not target the entire Java heap. It is divided into:

  • Young generation collection (Minor GC / Young GC): a garbage collection that targets only the young generation.
  • Old generation collection (Major GC / Old GC): a garbage collection that targets only the old generation. Currently only the CMS collector collects the old generation on its own. (The term Major GC is ambiguous: different materials use it for either an old generation collection or a whole-heap collection, so the meaning has to be determined from context.)
  • Mixed collection (Mixed GC): a garbage collection that targets the entire young generation and part of the old generation. Currently only the G1 collector behaves this way.

Full GC: a garbage collection of the entire Java heap and the method area.

1. Mark-sweep algorithm

The earliest and most basic garbage collection algorithm is the mark-sweep algorithm. As the name suggests, it has two phases, marking and sweeping: first all objects that need to be reclaimed are marked, and after marking completes all marked objects are reclaimed together. It can also work the other way round, marking the surviving objects and reclaiming every unmarked object. Marking is the process of deciding whether an object is garbage, which is done with the reachability analysis algorithm.

Disadvantages:

  • Execution efficiency is unstable. If the Java heap contains a large number of objects and most of them need to be reclaimed, a large number of mark and sweep operations must be performed, so the efficiency of both phases drops as the number of objects grows.
  • Sweeping leaves behind a large number of non-contiguous memory fragments. With too much fragmentation, the program may later fail to find enough contiguous memory for a larger object and has to trigger another garbage collection early.

Figure: execution process of the mark-sweep algorithm.
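The fragmentation problem can be made concrete with a toy heap of fixed-size slots. This is a simplified sketch: the MarkSweepDemo class is hypothetical, every object occupies exactly one slot, and the mark bits are supplied by hand instead of coming from reachability analysis.

```java
// Toy mark-sweep over a heap of equal-sized slots (hypothetical model).
class MarkSweepDemo {
    // Sweep phase: frees every allocated slot that was not marked and returns the number freed.
    static int sweep(boolean[] allocated, boolean[] marked) {
        int freed = 0;
        for (int i = 0; i < allocated.length; i++) {
            if (allocated[i] && !marked[i]) {
                allocated[i] = false;
                freed++;
            }
            marked[i] = false; // reset the mark bit for the next cycle
        }
        return freed;
    }

    // Length of the longest run of free slots, i.e. the largest object that still fits.
    static int largestFreeRun(boolean[] allocated) {
        int best = 0;
        int run = 0;
        for (boolean used : allocated) {
            run = used ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // A full heap in which every second object survived marking.
    static int[] example() {
        boolean[] allocated = { true, true, true, true, true, true, true, true };
        boolean[] marked = { true, false, true, false, true, false, true, false };
        int freed = sweep(allocated, marked);
        return new int[] { freed, largestFreeRun(allocated) };
    }

    public static void main(String[] args) {
        int[] result = example();
        System.out.println("freed slots: " + result[0] + ", largest contiguous run: " + result[1]);
    }
}
```

Four slots are freed, but the largest contiguous run is a single slot, so an object that needs two adjacent slots still cannot be allocated.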

2. Mark-copy algorithm

The mark-copy algorithm is often simply called the copying algorithm. To address the low efficiency of mark-sweep when there are many reclaimable objects, it divides memory into two blocks of equal size and uses only one of them at a time. When that block is used up, the surviving objects are copied to the other block and the used block is then cleaned up in one go. Because an entire half is reclaimed each time, memory allocation does not have to deal with fragmentation. The main disadvantage is that only half of the memory is usable.

Figure: execution process of the mark-copy algorithm.

Current commercial virtual machines use this algorithm to collect the young generation. Instead of two equal halves, they use one larger Eden space and two smaller Survivor spaces, and at any time use Eden and one of the Survivors. During a collection, all surviving objects in Eden and the used Survivor are copied to the other Survivor, and then Eden and the used Survivor are cleaned up.

In the HotSpot virtual machine the default size ratio of Eden to each Survivor is 8:1, so 90% of the young generation is usable at any time. If more than 10% of the objects survive a collection, one Survivor is not enough. In that case the old generation acts as an allocation guarantee: its space is borrowed to hold the objects that do not fit.
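The 90% figure follows from the 8:1:1 split: Eden plus one Survivor is 9 of the 10 parts. The small sketch below works through the arithmetic for a 100 MB young generation. The SurvivorRatioDemo class and its method names are made up for this example; the survivorRatio parameter mirrors the meaning of HotSpot's -XX:SurvivorRatio flag (Eden size relative to one Survivor).

```java
// Arithmetic for a young generation split as Eden : Survivor : Survivor = ratio : 1 : 1.
class SurvivorRatioDemo {
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes * survivorRatio / (survivorRatio + 2);
    }

    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // Fraction of the young generation usable at any time: Eden plus one Survivor.
    static double usableFraction(int survivorRatio) {
        return (survivorRatio + 1) / (double) (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long young = 100L * 1024 * 1024; // assumed 100 MB young generation
        System.out.println("Eden:     " + edenBytes(young, 8) / (1024 * 1024) + " MB");
        System.out.println("Survivor: " + survivorBytes(young, 8) / (1024 * 1024) + " MB each");
        System.out.println("Usable:   " + usableFraction(8));
    }
}
```

With a ratio of 8 this gives 80 MB for Eden and 10 MB for each Survivor, so a collection that leaves more than 10 MB of survivors has to fall back on the old generation.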

3. Mark-compact algorithm

Move all surviving objects toward one end of the memory, and then directly clean up the memory beyond the boundary.

Figure: execution process of the mark-compact algorithm.

Advantage: it does not produce memory fragmentation.
Disadvantage: a large number of objects have to be moved, so processing is relatively slow. Moving objects also requires the user application to be paused for the whole operation. The original virtual machine designers vividly described such a pause as "Stop The World".
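A minimal sketch of the compaction step, assuming a toy heap in which each array element is one object and the mark bits are given: live entries slide toward index 0 and everything past the boundary is cleared. The MarkCompactDemo class is hypothetical; a real collector also has to update every reference to a moved object, which is omitted here.

```java
import java.util.Arrays;

// Toy sliding compaction over an array-based heap (hypothetical model).
class MarkCompactDemo {
    // Moves marked entries to the front, clears the rest, and returns the boundary index.
    static int compact(String[] heap, boolean[] marked) {
        int free = 0; // next slot at the compacted end
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[free++] = heap[i];
            }
        }
        Arrays.fill(heap, free, heap.length, null); // reclaim everything past the boundary
        return free;
    }

    static String example() {
        String[] heap = { "A", "B", "C", "D", "E", "F" };
        boolean[] marked = { true, false, true, false, false, true };
        int boundary = compact(heap, marked);
        return boundary + ":" + Arrays.toString(heap);
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints 3:[A, C, F, null, null, null]
    }
}
```

After compaction the three free slots form one contiguous block, at the price of moving C and F.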

4. Generational collection

Current commercial virtual machines use generational collection: memory is divided into several blocks according to object lifetime, and each block is collected with a suitable algorithm. Generally the heap is divided into the young generation and the old generation.
Young generation: mark-copy algorithm
Old generation: mark-sweep or mark-compact algorithm

Garbage collectors

Figure: the 7 garbage collectors in the HotSpot virtual machine. A line connecting two collectors indicates that they can be used together.

  • Single-threaded and multi-threaded: a single-threaded garbage collector uses only one thread, while a multi-threaded one uses several.
  • Serial and parallel: serial means that the garbage collector and the user program run alternately, so the user program must be paused during garbage collection; parallel here means that the garbage collector and the user program run at the same time (what collector names such as CMS call "concurrent"). Except for CMS and G1, all collectors run serially.

1. Serial collector

Serial means that the collector runs in a serial manner. It is a single-threaded collector and uses only one thread for garbage collection.

Its advantages are simplicity and efficiency. In a single-CPU environment it has no thread interaction overhead, so it achieves the highest single-threaded collection efficiency. It is the default young generation collector in Client mode, because memory is generally not very large in that scenario. The pause for collecting one or two hundred megabytes of garbage can be kept within about a hundred milliseconds, and as long as collections are not too frequent, this pause is acceptable.

2. ParNew collector

It is the multi-threaded version of the Serial collector.

It is the default young generation collector in Server mode. Apart from performance, the main reason is that, besides the Serial collector, it is the only one that can work with the CMS collector.

3. Parallel Scavenge collector

Like ParNew, it is a multi-threaded collector.

Other collectors aim to minimize the time user threads are paused during garbage collection, whereas its goal is to reach a controllable throughput, so it is called a "throughput first" collector. Throughput here is the ratio of the CPU time spent running user code to the total CPU time (total time = time running user code + time running garbage collection).
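The throughput definition is simple arithmetic, sketched below. The ThroughputDemo class and its method names are made up for this example. The second method assumes the usual reading of HotSpot's -XX:GCTimeRatio flag for this collector, where a value of N asks for a GC time to user time ratio of 1:N, that is, at most 1 / (1 + N) of the total time in GC.

```java
// Throughput = user time / (user time + GC time).
class ThroughputDemo {
    static double throughput(double userMs, double gcMs) {
        return userMs / (userMs + gcMs);
    }

    // Largest share of total time that GC may take for a given GCTimeRatio value N.
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 99 minutes of user code and 1 minute of GC, expressed in milliseconds.
        System.out.println("throughput: " + throughput(99 * 60_000.0, 60_000.0));
        System.out.println("max GC share for ratio 99: " + maxGcFraction(99));
    }
}
```

For example, 99 minutes of user code and 1 minute of garbage collection give a throughput of 99%.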

The shorter the pause, the better suited the collector is to programs that interact with users, since good responsiveness improves the user experience. High throughput uses CPU time efficiently and finishes the program's computation as early as possible, which suits background tasks that need little interaction.

Shorter pauses are bought at the cost of throughput and young generation space: a smaller young generation is collected more often, which lowers throughput.

The GC adaptive sizing strategy (GC Ergonomics) can be turned on with a switch parameter (-XX:+UseAdaptiveSizePolicy). With it there is no need to specify manually the size of the young generation (-Xmn), the ratio of Eden to the Survivor areas, or the promotion age. The virtual machine collects performance monitoring data while the system runs and adjusts these parameters dynamically to provide the most suitable pause time or the maximum throughput.

4. Serial Old collector

It is the old generation version of the Serial collector and is likewise intended for the virtual machine in Client scenarios. In Server scenarios it has two main uses:

  • Working with the Parallel Scavenge collector in JDK 1.5 and earlier (before Parallel Old existed).
  • Serving as the fallback for the CMS collector when a Concurrent Mode Failure occurs during concurrent collection.

5. Parallel Old collector

It is the old generation version of the Parallel Scavenge collector. It collects with multiple threads in parallel and is based on the mark-compact algorithm.

Where throughput matters and CPU resources are scarce, the combination of Parallel Scavenge and Parallel Old deserves first consideration.

6. CMS collector

CMS stands for Concurrent Mark Sweep; Mark Sweep refers to the mark-sweep algorithm.

It works in four phases:

  • Initial mark: marks only the objects directly reachable from GC Roots. It is very fast and requires a pause.
  • Concurrent mark: traverses the whole object graph starting from the objects directly associated with GC Roots. It takes the longest time of the whole collection and does not require a pause.
  • Remark: corrects the marks of objects that changed because the user program kept running during concurrent marking. It requires a pause.
  • Concurrent sweep: removes the objects that the marking phases judged to be dead. It does not require a pause.

During concurrent marking and concurrent sweeping, the two most time-consuming phases, the collector threads work alongside the user threads without pausing them.

It has the following disadvantages:

  • Low throughput: short pauses are paid for with throughput, so the CPU is not used to full effect.
  • Floating garbage cannot be handled, and a Concurrent Mode Failure may occur. Floating garbage is the garbage that user threads produce while the concurrent sweep phase is running; it can only be reclaimed in the next GC. Because of floating garbage, part of the memory must be kept in reserve, so CMS cannot wait until the old generation is almost full before collecting, as other collectors do. If the reserved memory is not enough to hold the floating garbage, a Concurrent Mode Failure occurs and the virtual machine temporarily falls back to Serial Old in place of CMS.
  • Space fragmentation caused by the mark-sweep algorithm: the old generation often has free space left but no contiguous block large enough for the current object, so a Full GC has to be triggered early. To address this, the CMS collector provides the switch parameter -XX:+UseCMSCompactAtFullCollection (enabled by default, deprecated since JDK 9), which merges and compacts the fragmented memory (mark-compact algorithm) whenever CMS has to perform a Full GC. Compaction must move surviving objects and (before Shenandoah and ZGC) cannot run concurrently, so it lengthens the pause. Another parameter, -XX:CMSFullGCsBeforeCompaction (also deprecated since JDK 9), therefore tells the virtual machine to compact only after a given number of Full GCs without compaction. (The default value is 0, which means that every Full GC compacts.)

7. Garbage First collector

G1 (Garbage-First) is a garbage collector for server-side applications that performs well on machines with multiple CPUs and large memory.

The HotSpot development team gave it the mission of replacing the CMS collector in the future. The heap is divided into the young generation and the old generation; other collectors work on either the whole young generation or the whole old generation, whereas G1 can collect the young generation and the old generation together.
G1 divides the heap into multiple independent regions of equal size, and the young generation and the old generation are no longer physically separated.

By introducing Regions, the original memory space is split into many small spaces, each of which can be collected separately. This division is very flexible and makes a predictable pause time model possible. G1 records, for each Region, the time a collection takes and the space it frees (both values come from past collections) and maintains a priority list. In each collection, within the allowed collection time, it collects the Regions with the highest value first.
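The idea of a priority list under a pause budget can be sketched as a greedy selection. Everything in this example is an assumption made for illustration: the RegionSelectionDemo and Region classes, the value measure (bytes freed per millisecond), and the greedy strategy itself. It is not G1's actual selection logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of value-based Region selection; not G1's real data structures.
class RegionSelectionDemo {
    static final class Region {
        final String name;
        final long reclaimableBytes;   // space a collection is expected to free
        final double collectMs;        // time a collection is expected to take

        Region(String name, long reclaimableBytes, double collectMs) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.collectMs = collectMs;
        }

        // Assumed value measure: bytes freed per millisecond of pause.
        double value() {
            return reclaimableBytes / collectMs;
        }
    }

    // Greedily picks the most valuable Regions that still fit into the pause budget.
    static List<String> choose(List<Region> regions, double pauseBudgetMs) {
        List<Region> byValue = new ArrayList<>(regions);
        byValue.sort((x, y) -> Double.compare(y.value(), x.value())); // highest value first
        List<String> chosen = new ArrayList<>();
        double spentMs = 0;
        for (Region r : byValue) {
            if (spentMs + r.collectMs <= pauseBudgetMs) {
                chosen.add(r.name);
                spentMs += r.collectMs;
            }
        }
        return chosen;
    }

    static List<String> example() {
        List<Region> regions = Arrays.asList(
                new Region("R1", 800, 4.0),   // value 200
                new Region("R2", 900, 3.0),   // value 300
                new Region("R3", 100, 5.0),   // value 20
                new Region("R4", 500, 2.0));  // value 250
        return choose(regions, 6.0);          // 6 ms pause budget
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints [R2, R4]
    }
}
```

With a 6 ms budget the two most valuable Regions (R2 and R4, 5 ms in total) are chosen, and R1 and R3 are left for a later collection.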

Each Region has a Remembered Set, which records the Regions containing the objects that reference objects in this Region. With Remembered Sets, a full-heap scan can be avoided during reachability analysis.

Leaving aside the work of maintaining the Remembered Sets, the operation of the G1 collector can be roughly divided into the following steps:

  • Initial mark
  • Concurrent mark
  • Final mark: corrects the marks that changed because the user program kept running during concurrent marking. The virtual machine records the object changes made during that time in each thread's Remembered Set Logs, and the final mark phase merges the data in the Remembered Set Logs into the Remembered Set. Threads must be paused in this phase, but the work can be done in parallel.
  • Screening and reclaiming: first sorts the Regions by reclamation value and cost, then draws up a collection plan based on the GC pause time the user expects. This phase could in fact run concurrently with the user program, but because only some of the Regions are reclaimed and the time is user-controllable, pausing the user threads greatly improves collection efficiency.

It has the following characteristics:

  • Space compaction: viewed as a whole G1 is based on the mark-compact algorithm, and viewed locally (between two Regions) it is based on the copying algorithm, so it produces no memory fragmentation while running.
  • Predictable pauses: users can specify explicitly that within any time slice of M milliseconds, no more than N milliseconds may be spent on GC.

Memory allocation and recovery strategy

Minor GC and Full GC

  • Minor GC: collects the young generation. Because young generation objects are short-lived, Minor GC runs frequently and is generally fast.
  • Full GC: collects both the old and the young generation. Old generation objects are long-lived, so Full GC runs rarely and is much slower than Minor GC.

Memory allocation strategy

1. Objects are allocated in Eden first

In most cases, objects are allocated in Eden in the young generation. When Eden does not have enough space, a Minor GC is started.

2. Large objects go directly into the old generation

Large objects are objects that need a large amount of contiguous memory; the most typical ones are very long strings and arrays. Large objects often trigger garbage collection early in order to free enough contiguous space for them.
-XX:PretenureSizeThreshold: objects larger than this value are allocated directly in the old generation, which avoids a large amount of memory copying between Eden and Survivor.

3. Long-lived objects enter the old generation

Each object has an age counter. An object is born in Eden; if it survives a Minor GC, it is moved to a Survivor and its age is increased by 1. Every further Minor GC it survives in the Survivor area increases its age by 1 again. When the age reaches a certain threshold (15 by default), the object is moved to the old generation. -XX:MaxTenuringThreshold sets this age threshold.

4. Dynamic object age determination

The virtual machine does not always require an object's age to reach MaxTenuringThreshold before promoting it to the old generation. If the total size of all objects in the Survivor space at or below a certain age exceeds half of the Survivor space, objects at or above that age can enter the old generation directly, without waiting for the age set by MaxTenuringThreshold.
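The rule can be read as a running sum over an age histogram. The sketch below is a simplified reading of the paragraph above, not HotSpot's implementation; the TenuringThresholdDemo class, the method, and the bytesByAge array (index = age, index 0 unused) are hypothetical.

```java
// Sketch of dynamic age determination over a per-age size histogram.
class TenuringThresholdDemo {
    // bytesByAge[age] is the total size of Survivor objects of that age (index 0 is unused).
    // Returns the age at or above which objects are promoted.
    static int threshold(long[] bytesByAge, long survivorCapacity, int maxTenuringThreshold) {
        long total = 0;
        for (int age = 1; age < bytesByAge.length; age++) {
            total += bytesByAge[age];              // size of all objects of this age or younger
            if (total > survivorCapacity / 2) {
                return Math.min(age, maxTenuringThreshold);
            }
        }
        return maxTenuringThreshold;               // half of Survivor is never exceeded
    }

    public static void main(String[] args) {
        long[] crowded = { 0, 30, 25, 10 };        // ages 1 and 2 together exceed half of 100
        long[] sparse = { 0, 10, 10, 10 };
        System.out.println(threshold(crowded, 100, 15)); // prints 2
        System.out.println(threshold(sparse, 100, 15));  // prints 15
    }
}
```

In the first case objects of age 2 and above are promoted early; in the second the configured threshold of 15 still applies.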

5. Space allocation guarantee

Before a Minor GC, the virtual machine first checks whether the largest contiguous free space in the old generation is larger than the total size of all objects in the young generation. If so, the Minor GC is guaranteed to be safe.

If not, the virtual machine checks whether the HandlePromotionFailure setting allows the guarantee to fail. If it does, the virtual machine goes on to check whether the largest contiguous free space in the old generation is larger than the average size of the objects promoted to the old generation in previous collections. If it is larger, a Minor GC is attempted, although it is risky; if it is smaller, or HandlePromotionFailure does not allow the risk, a Full GC is performed instead.
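The decision described above is a short chain of comparisons, sketched here as a pure function. The PromotionGuaranteeDemo class, the Decision enum, and the parameter names are hypothetical; the sketch only restates the checks in the text and does not reproduce HotSpot's code.

```java
// Sketch of the space allocation guarantee check performed before a Minor GC.
class PromotionGuaranteeDemo {
    enum Decision { MINOR_GC, RISKY_MINOR_GC, FULL_GC }

    static Decision decide(long oldMaxContiguousFree, long youngUsed,
                           long avgPromoted, boolean handlePromotionFailure) {
        if (oldMaxContiguousFree > youngUsed) {
            return Decision.MINOR_GC;        // even if everything survives, it fits
        }
        if (handlePromotionFailure && oldMaxContiguousFree > avgPromoted) {
            return Decision.RISKY_MINOR_GC;  // fits on average, but may still fail
        }
        return Decision.FULL_GC;             // risk not allowed or too little space
    }

    public static void main(String[] args) {
        System.out.println(decide(100, 80, 10, false)); // MINOR_GC
        System.out.println(decide(50, 80, 10, true));   // RISKY_MINOR_GC
        System.out.println(decide(50, 80, 10, false));  // FULL_GC
        System.out.println(decide(5, 80, 10, true));    // FULL_GC
    }
}
```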

Full GC trigger conditions

The trigger condition for Minor GC is very simple: a Minor GC is triggered when Eden is full. Full GC is more complicated and is triggered under the following conditions:

1. Calling System.gc()

This call only suggests that the virtual machine perform a Full GC; the virtual machine does not necessarily do so. Using this method is not recommended; let the virtual machine manage memory instead.

2. Insufficient space in the old generation

Common causes of insufficient space in the old generation are the cases described above: large objects entering the old generation directly and long-lived objects being promoted to it.
To avoid Full GCs caused this way, try not to create overly large objects and arrays. You can also enlarge the young generation with the -Xmn virtual machine parameter so that as many objects as possible are reclaimed in the young generation and never enter the old generation, or raise the promotion age with -XX:MaxTenuringThreshold so that objects survive longer in the young generation.

3. Space allocation guarantee failure

A Minor GC that uses the copying algorithm needs the memory of the old generation as a guarantee. If the guarantee fails, a Full GC is executed. For details, see item 5 under "Memory allocation strategy" above.

4. Insufficient permanent generation space in JDK 1.7 and earlier

In JDK 1.7 and earlier, the method area of the HotSpot virtual machine is implemented by the permanent generation, which stores class information, constants, static variables, and other data. When the system loads many classes, reflects on many classes, or calls many methods, the permanent generation may fill up, and a Full GC is executed unless CMS GC is configured. If the space still cannot be reclaimed after the Full GC, the virtual machine throws java.lang.OutOfMemoryError. To avoid Full GCs caused this way, enlarge the permanent generation or switch to CMS GC.

5. Concurrent Mode Failure

If objects need to be placed in the old generation while a CMS GC is running and the old generation does not have enough space (possibly a temporary shortage caused by too much floating garbage during the GC), a Concurrent Mode Failure error is reported and a Full GC is triggered.

Origin blog.csdn.net/qq_46122005/article/details/112908999