"In-depth understanding of the Java virtual machine" reading notes Chapter 3 garbage collector and memory allocation strategy


Overview

Garbage collection (GC) was neither invented by Java nor first proposed by it, but Java brought it into the mainstream. GC frees programmers from managing memory by hand so that they can spend more energy on development. In most cases we do not need to care about GC, but when GC becomes the bottleneck that keeps a system from reaching higher concurrency, we must monitor and tune this automated mechanism.
In the JVM, the program counter, the virtual machine stack, and the native method stack are created and destroyed together with their thread, and the size of each stack frame is basically determined once the class structure is known. When a method returns or the thread ends, this memory is therefore reclaimed naturally.
The method area and the heap, however, involve much more uncertainty, so performing GC on these two areas is not as easy.
To perform GC, we must first answer three questions:
1. Which memory needs to be reclaimed
2. When to reclaim it
3. How to reclaim it

Which memory needs to be reclaimed: determining whether an object is alive

Almost all object instances in Java are stored in the heap. Before the garbage collector collects the heap, it must first determine which objects are still alive and which are dead (no longer referenced by anything).

Reference counting algorithm

Add a reference counter to each object. Whenever something references the object, the counter is incremented by 1; when a reference becomes invalid, the counter is decremented by 1. When the counter reaches 0, the object can no longer be used and is considered garbage.
Advantages and disadvantages of the reference counting algorithm:
Advantages: simple principle, easy implementation, and efficient judgment.
Disadvantages: circular references keep the counters above 0 forever, which leads to memory leaks. Handling them correctly requires a lot of extra processing.
Smart pointers can work around the circular reference problem.
When object A strongly references B, B may only reference A weakly. Each object then has two reference counters, a strong one and a weak one. When the strong counter reaches 0 (regardless of whether the weak counter is 0), the object is judged to be garbage. However, dangling (wild) pointers may appear this way.
Reference counting is generally not used in mainstream JVMs.
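
To make the circular reference problem concrete, here is a minimal sketch of a hand-rolled reference counter. It is a toy model, not how any JVM manages memory; the class RefCountDemo and its helpers are hypothetical names introduced only for this illustration.

```java
/** Toy reference counting (hypothetical names; no JVM works like this). */
class RefCountDemo {
    static final class Obj {
        final String name;
        int refCount = 0;   // number of references currently pointing at this object
        Obj field;          // a single outgoing reference
        Obj(String name) { this.name = name; }
    }

    /** Point holder.field at target and adjust both counters. */
    static void setField(Obj holder, Obj target) {
        if (target != null) target.refCount++;
        if (holder.field != null) holder.field.refCount--;
        holder.field = target;
    }

    /** Returns the counters of A and B after every external reference is gone. */
    static int[] cycleCounts() {
        Obj a = new Obj("A");
        Obj b = new Obj("B");
        a.refCount++;       // a local variable refers to A
        b.refCount++;       // a local variable refers to B
        setField(a, b);     // A -> B
        setField(b, a);     // B -> A
        a.refCount--;       // the local variables go out of scope
        b.refCount--;
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        // Both counters stay at 1, so a pure reference counter never frees A or B.
        System.out.println("A=" + counts[0] + ", B=" + counts[1]); // A=1, B=1
    }
}
```

Neither object is reachable from the program any more, yet neither counter drops to 0. This is the leak described above.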

Reachability analysis algorithm

The basic idea of the reachability analysis algorithm is to take a set of root objects called GC Roots as the starting nodes and search downward along reference relationships. The path traversed by the search is called a reference chain. If no reference chain connects an object to the GC Roots (the object is unreachable from them), the object is garbage.
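
The search can be sketched as a plain graph traversal. The example below is only an illustration under simplifying assumptions: objects are represented by strings, references by an adjacency map, and ReachabilityDemo is a hypothetical name. It also shows that two objects which only reference each other are still found to be garbage.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilityDemo {
    /** Returns every node reachable from the roots by following reference edges. */
    static Set<String> reachable(Map<String, List<String>> edges, List<String> roots) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String current = work.poll();
            for (String next : edges.getOrDefault(current, List.of())) {
                if (seen.add(next)) work.add(next);   // visit each node once
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("root", List.of("a"));
        edges.put("a", List.of("b"));
        edges.put("c", List.of("d"));   // c and d reference each other,
        edges.put("d", List.of("c"));   // but nothing reachable points at them
        Set<String> live = reachable(edges, List.of("root"));
        System.out.println(live.contains("b")); // true
        System.out.println(live.contains("c")); // false
    }
}
```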
In Java, the objects that can always serve as GC Roots include the following:
1. Objects referenced from the virtual machine stack (the local variable tables in its stack frames).
2. Objects referenced by static fields of classes.
3. Objects referenced by constants (such as strings referenced from the string constant pool).
4. Objects referenced from the native method stack.
5. Objects referenced inside the JVM, such as the Class objects of the primitive types, some resident exception objects, and the system class loader.
6. Objects held by synchronization locks (synchronized).
7. JMXBeans that reflect the internal state of the JVM, callbacks registered in JVMTI (whatever those are), the native code cache, and so on.

The third item says that strings referenced from the string constant pool are also GC Roots. So when do these strings themselves get collected?

Of course, these fixed GC Roots are not the whole story. With generational collection and partial collection, only part of the heap is collected, so some other objects must be added to the GC Roots as well, because objects in the collected region may be referenced by objects in other regions.

References

Both the reachability analysis algorithm and the reference counting algorithm depend on the concept of a reference. Originally Java had only one kind of reference: if the value stored in a reference-typed variable is the starting address of another piece of memory, that variable is said to hold a reference to that memory/object. Later the concept was extended into four kinds: strong references, soft references, weak references, and phantom references.

Strong reference

A strong reference is the original kind of reference in Java, for example Object obj = new Object(). As long as a strong reference exists, the object will not be collected as garbage.

Soft reference

Objects that are only softly reachable are added to the collection range just before the system would throw an out-of-memory error, and a second collection is performed for them. If there is still not enough memory after that collection, the error is thrown. Soft references are well suited to implementing caches. Use the SoftReference class to create soft references.

Weak reference

Objects that are only weakly reachable survive only until the next GC. Use the WeakReference class to create weak references.

Phantom reference

A phantom reference has no effect at all on the lifetime of its referent. Its only purpose is to deliver a system notification when the object is reclaimed by the collector. Use the PhantomReference class to create phantom references.
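
The three reference classes live in java.lang.ref. The sketch below only shows how they are created and what get() returns while a strong reference still exists; the class name ReferenceKindsDemo is hypothetical. Whether the weak referent is actually cleared after System.gc() depends on the JVM, so the last line prints the result without promising a value.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    /** Returns {soft sees referent, weak sees referent, phantom.get() is null}. */
    static boolean[] whileStronglyHeld() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        // A phantom reference must be registered with a queue; its get() always returns null.
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return new boolean[] {
            soft.get() == strong,
            weak.get() == strong,
            phantom.get() == null
        };
    }

    public static void main(String[] args) {
        boolean[] r = whileStronglyHeld();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true true true

        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc(); // only a request; the JVM decides whether and when to collect
        System.out.println("weak referent after gc request: " + weak.get());
    }
}
```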

The life and death of an object

An object goes through at most two rounds of marking before it is actually reclaimed. The first happens during reachability analysis: if no reference chain reaches the object, it is marked as garbage. The JVM then decides whether the finalize method needs to be executed for the marked object. If the object does not override finalize, or the JVM has already called its finalize method once, finalize is not executed and the object is reclaimed directly. If finalize is overridden and has not been executed yet, the object is placed in a queue called F-Queue, and a low-priority thread started by the JVM executes the finalize methods of the queued objects. The JVM only guarantees that finalize will be triggered; it does not promise to wait for it to finish. (If it waited, a single slow finalize method would badly hurt GC efficiency.) Later the collector marks the objects in F-Queue a second time. An object that has re-attached itself to a reference chain inside finalize is removed from the set of objects to be reclaimed.

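public class Test18 {

    // Static field through which finalize() can re-attach the object to a reference chain
    static Test18 save = null;

    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println(this);
        save = this; // rescue: the object becomes reachable again
        System.out.println(this);
    }

    public static void main(String[] args) throws InterruptedException {
        Test18 t18 = new Test18();
        t18 = null;
        System.gc();
        // finalize() runs on a low-priority thread; pausing for 1 s would make sure it has executed.
        // The pause is commented out here, so the check below usually runs too early.
        //Thread.sleep(1000);
        if (save == null) {
            System.out.println("Resurrection failed");
        } else {
            System.out.println("Resurrection succeeded");
        }
    }
}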


public class Test18 {

    // Static field through which finalize() can re-attach the object to a reference chain
    static Test18 save = null;

    @Override
    protected void finalize() throws Throwable {
        super.finalize();
        System.out.println(this);
        save = this; // rescue: the object becomes reachable again
        System.out.println(this);
    }

    public static void main(String[] args) throws InterruptedException {
        Test18 t18 = new Test18();
        t18 = null;
        System.gc();
        // finalize() runs on a low-priority thread; pause for 1 s to make sure it has executed
        Thread.sleep(1000);
        if (save == null) {
            System.out.println("Resurrection failed");
        } else {
            System.out.println("Resurrection succeeded");
        }
    }
}

Although finalize can be used to rescue objects and to release resources, using it is not recommended.

Reclaiming the method area

Garbage collection in the method area mainly covers two things: constants and types.
Reclaiming constants is very similar to reclaiming objects in the heap. If no String object references a constant's literal, and nothing else in the virtual machine references that literal, then when a GC occurs and the JVM judges it necessary, the constant is removed from the constant pool. Symbolic references to other classes (interfaces), methods, and fields in the constant pool are handled similarly.
Judging whether a type is no longer used is much more troublesome. All three of the following conditions must be met:
1. All instances of the class, including instances of any derived subclasses, have been reclaimed.
2. The class loader that loaded the class has been reclaimed. This condition is hard to meet except in carefully designed scenarios with replaceable class loaders, such as OSGi or JSP reloading.
3. The Class object of the class is not referenced anywhere, so the class cannot be accessed through reflection.

Generational collection theory

Generational collection theory is followed by most commercial virtual machines. It is essentially a set of empirical rules that match the behavior of most programs, and it rests on three generational hypotheses.
1. Weak generational hypothesis: the vast majority of objects die young.
2. Strong generational hypothesis: the more garbage collections an object has survived, the harder it is for it to die.
3. Cross-generational reference hypothesis: compared with references within the same generation, cross-generational references account for only a small fraction.
In generational collection, the Java heap is generally divided into two parts, the new generation and the old generation. Most objects in the new generation do not survive a single GC, while objects in the old generation are generally hard to reclaim. If each collection targets only the area where most objects die and only a few survive, it is much more efficient, because the collector only has to concentrate on retaining the few survivors instead of marking the many objects to be reclaimed. Objects that are hard to kill are moved to the old generation, which is collected less often. In this way both the time overhead of garbage collection and the effective use of memory are taken into account.
(Only the CMS collector has a collection that targets the old generation alone.)
To handle cross-generational references, a global data structure called the remembered set ("memory set" in some translations) can be added to the new generation. It divides the old generation into several small blocks and indicates which blocks contain cross-generational references. When a Minor GC (Young GC) is performed, the objects in those blocks are added to the GC Roots as well.

Mark-sweep algorithm

The mark-sweep algorithm is the earliest and most basic garbage collection algorithm. As the name implies, it has two phases, marking and sweeping. Starting from the GC Roots, the collector marks all objects that need to be reclaimed and then reclaims all marked objects in one pass. It can also work the other way around: mark the live objects and then reclaim everything that is not marked.
This algorithm has two disadvantages.
1. It causes memory fragmentation.
2. Its execution efficiency is unstable. If the heap contains a large number of objects and most of them must be reclaimed, a large number of mark and sweep actions are needed, so both phases slow down as the number of objects grows.
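
The fragmentation problem can be seen in a toy model. The sketch below assumes a heap of eight one-slot objects and takes the result of the mark phase as given; MarkSweepDemo and the '.' marker for a free slot are conventions made up for this illustration.

```java
class MarkSweepDemo {
    /** Sweep phase: free every slot whose object was not marked. */
    static char[] sweep(char[] heap, boolean[] marked) {
        char[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!marked[i]) result[i] = '.';   // '.' stands for a free slot
        }
        return result;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(char[] heap) {
        int best = 0;
        int run = 0;
        for (char slot : heap) {
            run = (slot == '.') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        char[] heap = "ABCDEFGH".toCharArray();
        // Assume the mark phase found A, C, E and G reachable.
        boolean[] marked = { true, false, true, false, true, false, true, false };
        char[] after = sweep(heap, marked);
        System.out.println(new String(after));     // A.C.E.G.
        System.out.println(largestFreeRun(after)); // 1
    }
}
```

Four slots are free after the sweep, but no two of them are adjacent, so an object that needs two slots cannot be allocated.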

Mark-copy algorithm

The mark-copy algorithm is usually just called the copying algorithm. It divides the available memory into two halves and uses only one of them at a time. During GC, the surviving objects are copied to the other half and the original half is then cleaned up in one go. When few objects survive, this algorithm is very efficient, but when many survive, the copying overhead is high. Its biggest drawback is that half of the memory is wasted. The new generation uses a refined version of this algorithm: it is divided into one Eden space and two Survivor spaces, and only Eden and one Survivor space are used for allocation at any time (the default ratio is 8:1:1).
If more than 10% of the objects survive a collection, they do not fit into the remaining Survivor space. The new generation therefore relies on an allocation guarantee mechanism: the part that does not fit goes directly to the old generation.
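
The 8:1:1 split works out as follows. This is a back-of-the-envelope sketch, assuming a young generation of 100 MB and a ratio parameter in the style of HotSpot's -XX:SurvivorRatio (Eden divided by one Survivor space); the class and method names are hypothetical, and a real JVM also applies alignment rules that are ignored here.

```java
class YoungGenLayoutDemo {
    /** Returns {eden, oneSurvivor, usable} in bytes for a young generation of the given size. */
    static long[] layout(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2);  // two Survivor spaces plus Eden
        long eden = youngBytes - 2 * survivor;
        long usable = eden + survivor;                     // one Survivor space is always kept empty
        return new long[] { eden, survivor, usable };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] sizes = layout(100 * mb, 8);
        System.out.println("eden     = " + sizes[0] / mb + " MB"); // eden     = 80 MB
        System.out.println("survivor = " + sizes[1] / mb + " MB"); // survivor = 10 MB
        System.out.println("usable   = " + sizes[2] / mb + " MB"); // usable   = 90 MB
    }
}
```

Only 10% of the new generation is left unused, compared with 50% for the plain two-halves scheme.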

Mark-compact algorithm

Although the copying algorithm solves the fragmentation problem of mark-sweep, it wastes half of the memory and performs poorly when many objects survive, so it is not suitable for the old generation. The mark-compact algorithm is another refinement of mark-sweep. The marking phase is the same, but instead of reclaiming the dead objects in place, the collector moves all live objects toward one end of the memory and then frees everything beyond the boundary. Because objects are moved, every pointer that refers to them must be updated. Mark-compact favors throughput, while mark-sweep favors low latency. (This matches Parallel Old using mark-compact and CMS using mark-sweep: the former focuses on throughput, the latter on latency.)
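
A toy version of the compaction step, assuming a heap of eight one-slot objects of which four were marked live. MarkCompactDemo and the forwarding table layout are made up for this sketch; the table is what a collector would use to update the pointers mentioned above.

```java
import java.util.Arrays;

class MarkCompactDemo {
    /** For each slot, the index its object moves to, or -1 if the object is dead. */
    static int[] forwardingTable(boolean[] marked) {
        int[] forward = new int[marked.length];
        int next = 0;
        for (int i = 0; i < marked.length; i++) {
            forward[i] = marked[i] ? next++ : -1;
        }
        return forward;
    }

    /** Slide the live objects to the low end of the heap, keeping their order. */
    static char[] compact(char[] heap, boolean[] marked) {
        int[] forward = forwardingTable(marked);
        char[] result = new char[heap.length];
        Arrays.fill(result, '.');                       // '.' stands for a free slot
        for (int i = 0; i < heap.length; i++) {
            if (forward[i] >= 0) result[forward[i]] = heap[i];
        }
        return result;
    }

    public static void main(String[] args) {
        char[] heap = "ABCDEFGH".toCharArray();
        boolean[] marked = { true, false, true, false, true, false, true, false };
        System.out.println(new String(compact(heap, marked)));        // ACEG....
        System.out.println(Arrays.toString(forwardingTable(marked))); // [0, -1, 1, -1, 2, -1, 3, -1]
    }
}
```

After compaction the four free slots form one contiguous block, at the price of moving objects and rewriting every reference to them.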

Algorithm implementation details of the HotSpot virtual machine

Root node enumeration

Besides the memory reclamation itself, root node enumeration is a major cause of STW (stop-the-world) pauses. The nodes that always serve as GC Roots are found mainly in global references (such as constants and static fields) and in execution contexts (such as the local variable tables in stack frames). Although the targets are clear, finding them efficiently is not easy: simply scanning everything would take a lot of time. HotSpot uses a data structure called OopMap for this purpose. Once a class has been loaded, HotSpot works out what type of data sits at each offset inside an object. During just-in-time compilation, it also records at specific locations which stack slots and registers hold references. The garbage collector can then read this information directly when it scans.

Safepoint

Many instructions can change reference relationships, that is, the content of an OopMap, and generating an OopMap for every one of them is not feasible. OopMaps are therefore only generated at certain specific locations, called safepoints. Safepoints are chosen where instruction sequences are reused, such as method calls, loop jumps, and exception jumps.
During GC, all threads then only need to stop at a safepoint. How can this be achieved?
1. Preemptive suspension. When the garbage collector needs to pause the threads, the system first suspends all of them and then checks whether each one is at a safepoint. Threads that are not are resumed and suspended again a little later, until all threads are at a safepoint. Almost no virtual machine uses this approach.
2. Voluntary suspension. When the garbage collector needs to pause the threads, the system only sets a flag. Each thread polls the flag when it reaches a safepoint and suspends itself there if the flag is set. Besides safepoints, the flag is also polled wherever memory is allocated.
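
The voluntary approach can be imitated in ordinary Java with a volatile flag that a worker polls at the end of each loop iteration. This is only an analogy under stated assumptions: a real JVM implements the poll in generated machine code, and SafepointPollDemo with its fields is a hypothetical construction.

```java
import java.util.concurrent.CountDownLatch;

class SafepointPollDemo {
    private volatile boolean suspendRequested = false;               // the flag set by the "collector"
    private final CountDownLatch reachedSafepoint = new CountDownLatch(1);
    private final CountDownLatch resume = new CountDownLatch(1);

    private void workerLoop() {
        while (true) {
            // ... one iteration of ordinary user code would run here ...
            if (suspendRequested) {                                  // poll at the loop jump (the safepoint)
                reachedSafepoint.countDown();                        // report arrival
                try {
                    resume.await();                                  // stay suspended until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return; // simplification: a real thread would carry on with its work
            }
        }
    }

    /** Returns true if the worker was parked at its safepoint while the "collector" ran. */
    static boolean demo() {
        SafepointPollDemo vm = new SafepointPollDemo();
        Thread worker = new Thread(vm::workerLoop);
        worker.start();
        try {
            vm.suspendRequested = true;           // the collector only sets the flag
            vm.reachedSafepoint.await();          // and waits for the thread to stop itself
            boolean parked = worker.isAlive();    // root enumeration would happen here
            vm.resume.countDown();
            worker.join();
            return parked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```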

Safe region

A safe region can be seen as a stretched-out safepoint. When a thread cannot respond to the virtual machine's suspension request (for example, while it is sleeping or blocked), it cannot walk to a safepoint and suspend itself. A safe region is a section of code in which reference relationships do not change. When a thread enters a safe region, it marks itself as being inside one, and when the garbage collector wants to pause the threads, it does not need to wait for threads that have declared themselves to be in a safe region. When a thread is about to leave the safe region, it checks whether root node enumeration has finished. If not, it must wait until the enumeration is complete.

Remembered set and card table

Generational garbage collection inevitably runs into the problem of cross-generational references. A remembered set ("memory set" in some translations) solves this problem.
A remembered set is an abstract data structure that records pointers from the non-collected area into the collected area. Ignoring efficiency and cost, the simplest implementation is an array of all objects in the non-collected area that contain cross-generational references.
However, the collector only needs the remembered set to tell it whether a given non-collected area contains pointers into the collected area. It does not need every detail of those cross-generational pointers.
A remembered set can therefore use a coarser recording granularity. The options are:
1. Word precision: each record is precise to one machine word, and that word contains a cross-generational pointer.
2. Object precision: each record is precise to one object, and that object contains a cross-generational pointer.
3. Card precision: each record is precise to one block of memory, and that block contains a cross-generational pointer.

Card precision means implementing the remembered set with a structure called a card table. This is currently the most common implementation.
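
A card table can be pictured as a small array with one entry per block of heap memory. The sketch below is an assumption-laden toy: addresses are plain ints, the card size of 512 bytes (a shift of 9) is the value commonly cited for HotSpot and is not stated in these notes, and CardTableDemo and onReferenceWrite are hypothetical names.

```java
class CardTableDemo {
    static final int CARD_SHIFT = 9;          // assumed card size: 2^9 = 512 bytes
    private final boolean[] dirty;            // one entry per card page

    CardTableDemo(int heapBytes) {
        int cards = (heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT;
        dirty = new boolean[cards];
    }

    /** Called whenever a reference field at the given heap address is written. */
    void onReferenceWrite(int fieldAddress) {
        dirty[fieldAddress >>> CARD_SHIFT] = true;   // mark the whole card page dirty
    }

    boolean isDirty(int cardIndex) {
        return dirty[cardIndex];
    }

    int cardCount() {
        return dirty.length;
    }

    public static void main(String[] args) {
        CardTableDemo table = new CardTableDemo(4096);
        table.onReferenceWrite(1030);                 // address 1030 lies in card 2 (1024..1535)
        System.out.println(table.cardCount());        // 8
        System.out.println(table.isDirty(2));         // true
        System.out.println(table.isDirty(1));         // false
    }
}
```

During a collection, only the memory covered by dirty cards has to be scanned for cross-generational pointers, not the whole non-collected area.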

Concurrent reachability analysis

[Figure: object disappearance during concurrent marking]
The figure above illustrates the object disappearance problem that can occur during concurrent marking.
An object disappears only if both of the following conditions hold:
1. The mutator inserts one or more new references from a black object to the white object.
2. The mutator deletes all direct and indirect references from gray objects to that white object.
Black: the object has been visited and all of its references have been scanned.
Gray: the object has been visited, but at least one of its references has not been scanned yet.
White: the object has not been visited by the collector. If it is still white when the analysis ends, it is unreachable.
There are two ways to avoid this problem.
1. Incremental update: when a black object gains a new reference to a white object, record that black object, and after the concurrent scan finishes, rescan starting from the recorded black objects. In effect, a black object turns gray again as soon as a reference to a white object is inserted into it.
2. Snapshot at the beginning (SATB): when a gray object is about to delete a reference to a white object, record the reference being deleted, and after the concurrent scan finishes, rescan starting from the recorded gray objects. In effect, the search follows the snapshot of the object graph taken at the moment the scan began, whether or not references are deleted later.
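
The disappearance scenario and the incremental update fix can be replayed on three objects. This is a single-threaded sketch with the mutator's steps interleaved by hand; TriColorDemo, its Node class, and the dirtyBlacks list are hypothetical and stand in for a real collector's mark stack and write barrier.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TriColorDemo {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Color color = Color.WHITE;
        Node(String name) { this.name = name; }
    }

    /** Scan every gray node: shade its white children gray, then turn it black. */
    static void drain(Deque<Node> gray) {
        while (!gray.isEmpty()) {
            Node n = gray.poll();
            for (Node child : n.refs) {
                if (child.color == Color.WHITE) {
                    child.color = Color.GRAY;
                    gray.add(child);
                }
            }
            n.color = Color.BLACK;
        }
    }

    /** Replays the scenario and returns the final color of C. */
    static Color run(boolean incrementalUpdate) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        a.refs.add(b);
        b.refs.add(c);

        // Collector so far: A has been fully scanned, B was discovered but not scanned yet.
        a.color = Color.BLACK;
        b.color = Color.GRAY;
        Deque<Node> gray = new ArrayDeque<>(List.of(b));

        // Mutator, running "concurrently": condition 1 and condition 2 from the text.
        List<Node> dirtyBlacks = new ArrayList<>();
        a.refs.add(c);                               // black A -> white C inserted
        if (incrementalUpdate) dirtyBlacks.add(a);   // write barrier records the black object
        b.refs.remove(c);                            // gray B -> white C deleted

        drain(gray);                                 // concurrent marking finishes

        for (Node n : dirtyBlacks) {                 // rescan the recorded black objects
            n.color = Color.GRAY;
            gray.add(n);
        }
        drain(gray);
        return c.color;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // WHITE
        System.out.println(run(true));  // BLACK
    }
}
```

Without the barrier, C stays white although A references it, so a live object would be reclaimed. With incremental update, A is rescanned and C is marked.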

Classic garbage collector

[Figure: the classic garbage collectors and the combinations they support]
A line between two collectors means they can be used together. A line labeled JDK 9 means that the combination is no longer supported from JDK 9 on.

Serial collector

Serial is a single-threaded collector for the new generation. "Single-threaded" means not only that it uses a single processor or a single collection thread to do its work, but more importantly that it must stop all worker threads (STW) while it collects. The Serial collector is simple and efficient (compared with the single-threaded mode of other collectors) and is suitable for environments with limited memory resources. It is generally used in client mode.

ParNew collector

ParNew is essentially the multi-threaded version of Serial: it collects garbage with multiple threads in parallel, and otherwise the two are almost identical. Both ParNew (always) and Serial (before JDK 9) can be combined with CMS (Concurrent Mark Sweep). CMS was the first garbage collector to support concurrent collection, meaning that the collection threads and the user threads (basically) work at the same time.

Parallel Scavenge collector

Like ParNew, the Parallel Scavenge collector is a new-generation collector based on the mark-copy algorithm that collects in parallel with multiple threads. What makes Parallel Scavenge special is its focus on throughput: its goal is to reach a controllable throughput, and only within that constraint does it shorten pause times as far as possible.
The Parallel Scavenge collector has an adaptive tuning mode. In this mode the virtual machine collects performance monitoring data about the running system and dynamically adjusts its sizing parameters to provide the most suitable pause time or the highest throughput.
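
Throughput here means the share of total run time spent in user code rather than in GC. The small sketch below works through that arithmetic. The notes do not name any flags, so the link to HotSpot's -XX:GCTimeRatio option (which expresses the target as a GC share of 1/(1+N)) is added here as background, and ThroughputDemo is a hypothetical name.

```java
class ThroughputDemo {
    /** Throughput = time running user code / (time running user code + time in GC). */
    static double throughput(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    /** Largest share of total time GC may take for a ratio of N, that is 1 / (1 + N). */
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println(throughput(99, 1));   // 0.99
        System.out.println(maxGcFraction(99));   // 0.01
        System.out.println(maxGcFraction(19));   // 0.05
    }
}
```

A ratio of 19, for example, allows GC to take at most 5% of the total time.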

Serial Old collector

The Serial Old collector is the old-generation version of the Serial collector. It is also single-threaded and uses the mark-compact algorithm. It is generally used in client mode.
In server mode it is typically used in two situations:
1. As the fallback for CMS
2. Together with Parallel Scavenge

Parallel Old collector

The Parallel Old collector is the old-generation version of Parallel Scavenge. It uses the mark-compact algorithm. Used together, the two give the best "throughput first" results.

CMS collector

CMS is an old-generation collector whose goal is the shortest possible collection pause.
CMS uses the mark-sweep algorithm. Its implementation is somewhat more complicated than that of the collectors above and consists of four steps:
1. Initial mark: only marks the objects directly reachable from the GC Roots. STW
2. Concurrent mark: traverses the whole object graph concurrently with the user threads.
3. Remark: revisits (using incremental update) the objects whose references changed during concurrent marking. STW
4. Concurrent sweep: reclaims the dead objects concurrently. Because mark-sweep is used, live objects do not need to be moved.
CMS is known as a concurrent low-pause collector. Excellent as it is, it still has three obvious shortcomings.
1. The CMS collection threads run alongside the worker threads and take up some of the threads (or processor capacity), which slows the application down.
2. CMS cannot handle floating garbage, that is, garbage created by worker threads after marking while the concurrent collection is still running. Because collection runs concurrently, part of the old generation must also be kept free for the running program. If that reserved space turns out to be insufficient during a collection, a fallback is triggered: Serial Old performs a Full GC.
3. Because mark-sweep is used, memory fragmentation occurs. (Compaction can optionally be performed during a Full GC.)

Garbage First Collector (G1)

The G1 collector is a milestone in the history of garbage collector technology. Since G1, the design goal of state-of-the-art collectors has shifted to collecting fast enough to keep up with the application's memory allocation rate, rather than cleaning the entire Java heap in one pass.
G1 breaks the heap up into many independent Regions of equal size. It no longer insists on a fixed size and a fixed number of generational areas: each Region can act as Eden space, Survivor space, or old-generation space as needed, and the collector applies different strategies to Regions in different roles. There is also a special kind of Region called Humongous (H), which stores large objects: any object larger than half a Region counts as large. G1 treats H Regions as part of the old generation most of the time. Viewed as a whole, G1 is based on the mark-compact algorithm; viewed locally, between two Regions, it is based on the mark-copy algorithm. Unlike CMS, it therefore leaves no memory fragmentation.
G1 can build a predictable pause time model because the Region is its smallest unit of collection: the memory collected each time is always a whole number of Regions, which avoids collecting the entire heap. More specifically, G1 tracks the value of the garbage accumulated in each Region, where value means the amount of space that would be reclaimed weighed against the time the collection is expected to take, and maintains a priority list in the background. On each collection, according to the pause time the user allows, it processes the Regions with the greatest value first. This gives G1 the highest possible collection efficiency within a limited time.
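
The "most valuable Regions first, within the pause budget" idea can be sketched as a simple greedy selection. This is not G1's actual prediction model: the value formula (reclaimable space divided by estimated cost), the numbers, and the names CollectionSetDemo, Region, and choose are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CollectionSetDemo {
    static final class Region {
        final String name;
        final long reclaimableMb;    // space that collecting this Region would free
        final double costMillis;     // estimated time needed to collect it
        Region(String name, long reclaimableMb, double costMillis) {
            this.name = name;
            this.reclaimableMb = reclaimableMb;
            this.costMillis = costMillis;
        }
        double value() { return reclaimableMb / costMillis; }
    }

    /** Picks Regions in descending order of value while the pause budget lasts. */
    static List<String> choose(List<Region> regions, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((x, y) -> Double.compare(y.value(), x.value()));
        List<String> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMillis <= pauseBudgetMillis) {
                chosen.add(r.name);
                spent += r.costMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
            new Region("R1", 8, 4.0),   // value 2.0
            new Region("R2", 6, 2.0),   // value 3.0
            new Region("R3", 3, 3.0),   // value 1.0
            new Region("R4", 4, 5.0));  // value 0.8
        System.out.println(choose(regions, 7.0)); // [R2, R1]
    }
}
```

With a budget of 7 ms, R2 and R1 are collected (6 ms in total, 14 MB freed), and the less valuable Regions are left for a later collection.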
The garbage collection process of G1
1. Initial marking: only marks the objects directly reachable from the GC Roots and updates the TAMS (Top at Mark Start) pointer, so that user threads running concurrently in the next phase can allocate new objects correctly in the available Regions. STW
2. Concurrent marking: starting from the objects marked in step 1, traverses the rest of the object graph concurrently with the user threads.
3. Final marking: processes the references recorded by the snapshot at the beginning (SATB) during step 2. STW
4. Live data counting and evacuation: updates the statistics of the Regions, ranks them by reclamation value and cost, and selects the Regions that form the collection set according to the pause time the user specified. The surviving objects are copied to an empty Region and the old Regions are then cleared. STW
G1 also has the problem of cross-Region references, and it is more severe than the cross-generational case. Each Region maintains its own remembered set, which records the pointers from other Regions into it and marks which card pages those pointers fall in.
G1 also faces the object disappearance problem during concurrent marking. Its solution is the snapshot at the beginning (SATB).

Supplement:
1. If the total size of all objects of the same age in the Survivor space exceeds half of that space, all objects of that age or older go directly into the old generation.
2. A size threshold can be set with a parameter; when an object to be allocated is larger than this threshold, it is created directly in the old generation.
3. Each thread has a private TLAB (thread-local allocation buffer) in the Eden space, which lets threads allocate objects concurrently.
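
Rule 1 can be written down as a small calculation. The sketch follows the rule exactly as worded above; a real JVM's calculation may differ in detail. DynamicAgeDemo, the sizeByAge array layout, and the fallback age of 15 are assumptions made for this illustration.

```java
class DynamicAgeDemo {
    /**
     * sizeByAge[i] is the total size of Survivor objects of age i (index 0 is unused).
     * Returns the first age whose objects occupy more than half of the Survivor space,
     * or maxAge if no age qualifies.
     */
    static int tenuringThreshold(long[] sizeByAge, long survivorCapacity, int maxAge) {
        for (int age = 1; age < sizeByAge.length && age <= maxAge; age++) {
            if (sizeByAge[age] > survivorCapacity / 2) {
                return age;   // objects of this age or older are promoted
            }
        }
        return maxAge;
    }

    public static void main(String[] args) {
        long[] sizeByAge = { 0, 20, 60, 10 };   // sizes for ages 1, 2 and 3
        System.out.println(tenuringThreshold(sizeByAge, 100, 15));                    // 2
        System.out.println(tenuringThreshold(new long[] { 0, 20, 30, 10 }, 100, 15)); // 15
    }
}
```

In the first case the objects of age 2 alone take up 60 of 100 units, so everything of age 2 or older moves to the old generation without waiting for the maximum age.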


Origin blog.csdn.net/qq_30033509/article/details/114876536