The garbage collection mechanism Java interviewers like to ask about, analyzed by an Alibaba P7 engineer

Preface

  • The JVM memory model includes three parts:
    • Heap (the Java heap accessible to Java code, plus the method area used by the JVM itself)
    • Stack (the virtual machine stack serving Java methods, and the native method stack serving native methods)
    • Program counter, which ensures that each thread can resume execution at the correct position in a multithreaded environment

As mentioned, the Java heap is the main area for garbage collection, so it is also called the GC heap. The method area is sometimes loosely called the permanent generation. In general, the heap (the Java heap plus the method area) is the main target of garbage collection, the Java heap in particular.

The automatic memory management advocated in the Java technology system ultimately comes down to automatically solving two problems: allocating memory to objects and reclaiming the memory allocated to them. Both problems concern the heap area of the Java memory model. On the allocation side, the author's blog post "JVM Memory Model Overview" has already explained how the available space is divided and the thread-safety issues involved. This article goes further and gives the memory allocation rules in conjunction with garbage collection strategies. Garbage collection is a significant feature of the Java language: it greatly reduces the risk of memory leaks and helps memory to be used effectively, so that Java programmers rarely need to manage memory by hand when writing programs. The issues the Java garbage collection mechanism has to consider are complex. This article explains three core issues:

  • What memory needs to be recycled? (Two classic algorithms for deciding whether an object can be recycled: reference counting and reachability analysis)

  • When will it be recycled? (The collection timing of the young, old, and permanent generations of the heap; Minor GC and Full GC)

  • How is it recycled? (Three classic garbage collection algorithms (mark-sweep, copying, and mark-compact), the generational collection algorithm, and seven garbage collectors)

Before discussing the Java garbage collection mechanism, we should first remember one term: Stop-the-World. Stop-the-World means that the JVM suspends the execution of the application because of GC, and it can occur with any GC algorithm. When it occurs, all threads except those required for GC wait until the GC task completes. In practice, GC tuning often means reducing the time spent in Stop-the-World pauses, so that the system achieves high throughput and low pause times.

P.S.: A memory leak means that memory is not reclaimed after it is no longer used. In the general case, leaving complex data structures aside, a memory leak in Java shows up as a memory object whose life cycle exceeds the length of time the program needs it.

How to determine whether an object can be recycled?

Reference counting algorithm: determine the number of references to an object

  • The reference counting algorithm determines whether the object can be recycled by judging the number of references to the object.

The reference counting algorithm is an early garbage collection strategy. In this method, each object instance in the heap has a reference count.

  • When an object is created and the instance is assigned to a reference variable, its reference count is set to 1.
  • When any other variable is assigned a reference to this object, the count is increased by 1 (for a = b, the counter of the instance referenced by b is increased by 1).
  • When a reference to an object instance goes out of scope or is set to a new value, the count is decreased by 1.
  • In particular, when an object instance is garbage collected, the reference counter of every instance it references is decremented by 1. Any object instance with a reference count of 0 can be garbage collected.

A reference counting collector runs very quickly, and its work is interleaved with the running program, which benefits real-time environments where the program must not be interrupted for long. However, it has difficulty with circular references between objects. As shown in the program and diagram below, the reference counts of objA and objB can never reach 0, so these two objects can never be recycled.

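public class ReferenceCountingGC {

    public Object instance = null;

    public static void testGC() {

        ReferenceCountingGC objA = new ReferenceCountingGC();
        ReferenceCountingGC objB = new ReferenceCountingGC();

        // The two objects reference each other in a cycle, so under reference
        // counting the counts of objA and objB can never drop to 0
        objB.instance = objA;
        objA.instance = objB;

        // Drop the only external references to both objects
        objA = null;
        objB = null;

        // Request a collection (only a hint to the JVM)
        System.gc();
    }
}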

The last two assignments in the code above set objA and objB to null, which means the objects they pointed to can no longer be accessed. But because the two objects refer to each other, their reference counters are not 0, so a garbage collector based purely on reference counting would never recycle them.
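To make the counting concrete, here is a minimal sketch that replays the example with explicit counters. The Node class, the setField helper, and the cycleCounts method are hypothetical names invented for this illustration; a real JVM does not expose reference counts, and HotSpot does not use reference counting.

```java
/** Toy model of reference counting; an illustration only, not how HotSpot manages memory. */
final class RefCountDemo {

    /** A simulated heap object that carries its own reference count. */
    static final class Node {
        final String name;
        int refCount = 0;   // number of references currently pointing at this node
        Node field = null;  // a single outgoing reference, like "instance" in the example

        Node(String name) {
            this.name = name;
        }
    }

    /** Simulates "holder.field = target", adjusting the counts of the old and new targets. */
    static void setField(Node holder, Node target) {
        if (holder.field != null) {
            holder.field.refCount--;
        }
        holder.field = target;
        if (target != null) {
            target.refCount++;
        }
    }

    /** Replays the example and returns {count of objA, count of objB} at the end. */
    static int[] cycleCounts() {
        Node objA = new Node("objA");
        objA.refCount++;        // referenced by the local variable objA
        Node objB = new Node("objB");
        objB.refCount++;        // referenced by the local variable objB

        setField(objB, objA);   // objB.instance = objA
        setField(objA, objB);   // objA.instance = objB

        objA.refCount--;        // simulates objA = null
        objB.refCount--;        // simulates objB = null
        return new int[] { objA.refCount, objB.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        System.out.println("objA=" + counts[0] + ", objB=" + counts[1]); // prints objA=1, objB=1
    }
}
```

Both counters stay at 1 even though nothing outside the cycle can reach the two objects, which is exactly the weakness described above.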

Reachability analysis algorithm: Determine whether the reference chain of the object is reachable

  • The reachability analysis algorithm determines whether the object can be recycled by judging whether the reference chain of the object is reachable.

The reachability analysis algorithm comes from graph theory in discrete mathematics. The program treats all reference relationships as a graph and uses a set of objects called "GC Roots" as starting points. It searches downward from these nodes, and the path a search traverses is called a reference chain. When no reference chain connects an object to the GC Roots (in graph-theory terms, the object is unreachable from the GC Roots), the object is proven unusable, as shown in the following figure. In Java, the objects that can serve as GC Roots include the following:

  • Objects referenced in the virtual machine stack (local variable table in the stack frame);

  • Objects referenced by static properties of the class in the method area;

  • Objects referenced by constants in the method area;

  • Objects referenced by Native methods in the native method stack;
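The search from the GC Roots can be sketched as an ordinary graph traversal. The code below is a simplified model under stated assumptions: objects are represented by hypothetical string names, references by an adjacency map, and a single node called "root" stands in for the whole root set. It is not how a real collector walks the heap.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy reachability analysis over a graph of named objects. */
final class ReachabilityDemo {

    /** Returns every object reachable from the roots by following reference edges (breadth-first). */
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : refs.getOrDefault(current, List.of())) {
                if (seen.add(next)) {   // add() returns true only for objects not visited before
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    /** A sample graph: root -> obj1 -> obj2, plus an isolated cycle objA <-> objB. */
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", List.of("obj1"));
        refs.put("obj1", List.of("obj2"));
        refs.put("objA", List.of("objB"));
        refs.put("objB", List.of("objA"));
        return refs;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(sampleGraph(), List.of("root"));
        System.out.println("obj2 reachable? " + live.contains("obj2")); // true
        System.out.println("objA reachable? " + live.contains("objA")); // false
    }
}
```

Unlike reference counting, the mutually referencing objA and objB are found to be unreachable here, so a collector based on reachability analysis can recycle them.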

Garbage collection algorithm

Mark-sweep algorithm

The mark-sweep algorithm has two stages: mark and sweep. The algorithm first scans from the root set and marks the surviving objects. Once marking is complete, it scans the entire space and reclaims the unmarked objects, as shown in the following figure.

The mark-sweep algorithm has two main shortcomings:

Efficiency problem: neither the marking nor the sweeping process is very efficient;

Space problem: the mark-sweep algorithm does not move objects and only deals with non-surviving ones, so sweeping leaves a large number of discontinuous memory fragments. With too much fragmentation, when the program later needs to allocate a large object, it may fail to find enough contiguous memory and have to trigger another garbage collection early.
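The fragmentation problem can be seen in a small simulation. In this sketch the heap is a hypothetical array of equal-sized slots, and the result of the mark phase is simply passed in as a boolean array; the class and method names are made up for illustration.

```java
/** Toy sweep phase over a heap of equal-sized slots, to show fragmentation. */
final class MarkSweepDemo {

    /** Sweep phase: frees every occupied slot that was not marked; nothing is moved. */
    static boolean[] sweep(boolean[] occupied, boolean[] marked) {
        boolean[] after = new boolean[occupied.length];
        for (int i = 0; i < occupied.length; i++) {
            after[i] = occupied[i] && marked[i];
        }
        return after;
    }

    /** Total number of free slots. */
    static int totalFree(boolean[] occupied) {
        int free = 0;
        for (boolean slot : occupied) {
            if (!slot) {
                free++;
            }
        }
        return free;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(boolean[] occupied) {
        int best = 0;
        int run = 0;
        for (boolean slot : occupied) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] occupied = { true, true, true, true, true, true, true, true };
        boolean[] marked   = { true, false, true, false, true, false, true, false };
        boolean[] after = sweep(occupied, marked);
        System.out.println("free slots: " + totalFree(after));                       // 4
        System.out.println("largest contiguous free run: " + largestFreeRun(after)); // 1
    }
}
```

Half of the heap is free after the sweep, yet an object that needs two adjacent slots still cannot be placed.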

Copying algorithm

The copying algorithm divides the available memory into two blocks of equal capacity and uses only one of them at a time. When that block is used up, the surviving objects are copied to the other block, and the used block is then cleaned up in one pass. This algorithm suits scenarios where the object survival rate is low, such as the new generation. Since a whole half is reclaimed each time, allocation does not have to deal with memory fragmentation or other complications: moving the pointer at the top of the heap and allocating memory in order is enough, which is simple to implement and efficient to run (see the accompanying diagram).
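A rough sketch of the copy step follows. It assumes a hypothetical heap in which each half is an int array, a value of 0 means an empty slot, and liveness has already been decided elsewhere; real collectors also have to update every reference to a moved object, which is omitted here.

```java
import java.util.Arrays;

/** Toy copying collection between two equal-sized halves. */
final class CopyingDemo {

    /**
     * Copies the live objects from fromSpace to the start of toSpace, then clears
     * fromSpace in one pass. Returns the new allocation (bump) pointer in toSpace.
     */
    static int evacuate(int[] fromSpace, boolean[] live, int[] toSpace) {
        int top = 0;                       // next free index in toSpace
        for (int i = 0; i < fromSpace.length; i++) {
            if (live[i]) {
                toSpace[top++] = fromSpace[i];
            }
        }
        Arrays.fill(fromSpace, 0);         // the whole used half is reclaimed at once
        return top;
    }

    public static void main(String[] args) {
        int[] from = { 11, 12, 13, 14, 15, 16 };
        boolean[] live = { false, true, false, false, true, false };
        int[] to = new int[from.length];

        int top = evacuate(from, live, to);
        System.out.println(Arrays.toString(to)); // [12, 15, 0, 0, 0, 0]
        System.out.println("next allocation at index " + top); // 2
    }
}
```

After the copy, survivors sit contiguously at the start of the new half, so the next allocation only needs to advance the pointer.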

In fact, commercial virtual machines today use this algorithm to reclaim the new generation. Research has found that only about 10% of the objects in the new generation survive each collection, so few objects need to be copied and the efficiency is good. As introduced in the blog post "JVM Memory Model Overview", in practice the new generation is divided into a larger Eden space and two smaller Survivor spaces (as shown in the figure below). Eden and one of the Survivor spaces are in use at any time. During collection, the surviving objects in Eden and that Survivor space are copied in one pass to the other Survivor space, and Eden and the Survivor space just used are then cleaned up. The default ratio of Eden to a single Survivor space in the HotSpot virtual machine is 8:1, which means the usable memory in the new generation is 90% (80% + 10%) of the whole new generation capacity, and only 10% of the memory is "wasted".
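The 8:1 arithmetic can be checked with a few lines of code. This is a back-of-the-envelope sketch: the class and method names are hypothetical, the ratio parameter plays the role of HotSpot's -XX:SurvivorRatio option, and a real JVM additionally rounds the sizes for alignment.

```java
/** Eden/Survivor sizing arithmetic for a young generation split as ratio:1:1. */
final class YoungGenLayout {

    /** Returns {edenSize, singleSurvivorSize} for a given young generation size. */
    static long[] sizes(long youngSize, int survivorRatio) {
        long survivor = youngSize / (survivorRatio + 2); // Eden counts as "ratio" parts, each Survivor as 1
        long eden = youngSize - 2 * survivor;
        return new long[] { eden, survivor };
    }

    /** Fraction of the young generation usable at any time: Eden plus one Survivor. */
    static double usableFraction(int survivorRatio) {
        return (survivorRatio + 1.0) / (survivorRatio + 2.0);
    }

    public static void main(String[] args) {
        long[] s = sizes(100, 8); // for example, a 100 MB young generation
        System.out.println("eden=" + s[0] + " MB, each survivor=" + s[1] + " MB"); // eden=80 MB, each survivor=10 MB
        System.out.println("usable fraction=" + usableFraction(8));
    }
}
```

With a ratio of 8 the usable fraction is 9/10, matching the 90% figure above.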

Mark-compact algorithm

When the object survival rate is high, the copying algorithm has to perform more copy operations, and its efficiency drops. More importantly, if you do not want to waste 50% of the space, you need extra space as an allocation guarantee to handle the extreme case where all objects in the used memory are alive, so this algorithm is generally not used directly in the old generation. The marking process of the mark-compact algorithm is the same as in the mark-sweep algorithm, but the subsequent step is not to clean up the reclaimable objects directly. Instead, all surviving objects are moved to one end, and the memory beyond the end boundary is then cleaned up directly, similar to disk defragmentation. This algorithm suits scenarios with a high object survival rate (the old generation), and its working principle is shown in the figure below.

The most significant difference between the mark-compact algorithm and the mark-sweep algorithm is that mark-sweep does not move objects and only processes the non-surviving ones, while mark-compact moves all surviving objects to one end and then processes the non-surviving ones, so it does not produce memory fragmentation (see the accompanying schematic diagram).
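The compaction step can be sketched on a slot-based heap as well. As before, this is a toy model: the heap is a hypothetical int array in which 0 means an empty slot, the marks are supplied as input, and the reference updating that a real collector must perform after moving objects is left out.

```java
import java.util.Arrays;

/** Toy compaction phase: slide marked objects to one end of the heap. */
final class MarkCompactDemo {

    /**
     * Slides every marked object toward index 0, preserving order, and clears
     * everything beyond the boundary. Returns the boundary (number of live slots).
     */
    static int compact(int[] heap, boolean[] marked) {
        int free = 0;                          // next destination slot, always <= i
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0 && marked[i]) {
                heap[free++] = heap[i];
            }
        }
        for (int i = free; i < heap.length; i++) {
            heap[i] = 0;                       // memory beyond the boundary is reclaimed
        }
        return free;
    }

    public static void main(String[] args) {
        int[] heap = { 1, 2, 3, 4, 5, 6, 7, 8 };
        boolean[] marked = { true, false, true, false, true, false, true, false };
        int boundary = compact(heap, marked);
        System.out.println(Arrays.toString(heap)); // [1, 3, 5, 7, 0, 0, 0, 0]
        System.out.println("boundary=" + boundary); // boundary=4
    }
}
```

The four free slots now form one contiguous block, in contrast to the scattered free slots that a sweep alone leaves behind.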

Generational collection algorithm

In a large system, the more objects and method variables are created, the more objects there are in heap memory. Analyzing one by one whether each object should be recycled would inevitably be inefficient.
  The generational collection algorithm is based on the fact that different objects have different life cycles. Objects with different life cycles are placed in different areas of the heap, and collecting each area with a different strategy can improve the execution efficiency of the JVM.
  Modern commercial virtual machines use generational collection: the new generation has a low object survival rate and uses the copying algorithm; the old generation has a high survival rate and uses the mark-sweep or mark-compact algorithm. Java heap memory can generally be divided into three parts: young generation, old generation, and permanent generation, as shown in the following figure:
  

Young Generation

The goal of the new generation is to collect short-lived objects as quickly as possible. Generally, all newly created objects are first placed in the new generation.
  The new generation is divided into an Eden area and two Survivor areas (survivor0 and survivor1) in a ratio of 8:1:1, and most objects are created in the Eden area. During garbage collection, the surviving objects in Eden are first copied to survivor0, and Eden is then emptied. When survivor0 is also full, the surviving objects in Eden and survivor0 are copied to survivor1, and Eden and survivor0 are then cleared. At this point survivor0 is empty, and the roles of survivor0 and survivor1 are exchanged (that is, Eden and survivor1 will be scanned during the next collection), so that one Survivor area is always kept empty, and so on. In particular, when survivor1 is not large enough to hold the surviving objects from Eden and survivor0, those objects are stored directly in the old generation.
  If the old generation is also full, a Full GC is triggered, that is, both the new generation and the old generation are collected. Note that a GC that occurs in the new generation is also called a Minor GC. Minor GC occurs frequently and is not necessarily triggered only when the Eden area is full.

Old Generation

The old generation stores objects with a long life cycle. As described above, objects that are still alive after N garbage collections in the new generation are placed in the old generation.
  In addition, the old generation is much larger than the new generation (the new-to-old ratio is approximately 1:2). When the old generation is full, a Major GC (Full GC) is triggered. Objects in the old generation live longer, so Full GC occurs relatively infrequently.

Permanent Generation

The permanent generation is mainly used to store static data, such as Java classes and methods.
  The permanent generation has no significant impact on garbage collection, but some applications dynamically generate or load classes, for example through reflection, dynamic proxies, CGLib, and other bytecode frameworks. In that case a relatively large permanent generation needs to be configured to hold the classes added at run time. (HotSpot removed the permanent generation in JDK 8 and replaced it with Metaspace.)

Summary

Because objects are handled by generation, the area and timing of garbage collection also differ. There are two types of garbage collection, Minor GC and Full GC.

Minor GC:

Collects the young generation without affecting the old generation. Because most new-generation Java objects die quickly, Minor GC is very frequent. Fast, efficient algorithms are generally used here so that garbage collection completes as soon as possible.

Full GC:

Also called Major GC, it collects the entire heap, including the young, old, and permanent generations. Because Full GC has to reclaim the entire heap, it is slower than Minor GC, so the number of Full GCs should be reduced as much as possible. Causes of Full GC include: the old generation is full, the permanent generation (Perm) is full, System.gc() is called explicitly, and so on.
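One way to observe how often collections actually happen is the standard java.lang.management API. The sketch below lists the collectors registered in the running JVM with their collection counts and accumulated times. The GcStats class and describeCollectors method are names chosen for this example; which collector names appear, and how they map to young and old generation collections, depends on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Prints the garbage collectors of the running JVM and how often they have run. */
final class GcStats {

    /** One line per registered collector: name, collection count, accumulated time in ms. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

Typically the collector responsible for the young generation reports a much higher count than the one responsible for the old generation, in line with the frequencies described above.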

Garbage collector

If garbage collection algorithms are the methodology of memory reclamation, then garbage collectors are its concrete implementation. The following figure shows seven collectors that work on different generations. The collectors for the new generation are Serial, ParNew, and Parallel Scavenge; the collectors for the old generation are Serial Old, Parallel Old, and CMS; and the G1 collector reclaims the entire Java heap. A line between two collectors means that they can be used together.
  

Serial collector (copying algorithm):

  • A single-threaded new-generation collector: both marking and cleaning are single-threaded. Its advantage is that it is simple and efficient;

Serial Old collector (mark-compact algorithm):

  • A single-threaded old-generation collector, the old-generation version of the Serial collector;

ParNew collector (copying algorithm):

  • A parallel new-generation collector, in effect a multi-threaded version of the Serial collector, which performs better than Serial in a multi-core CPU environment;

Parallel Scavenge collector (copying algorithm):

  • A parallel new-generation collector that pursues high throughput and efficient use of the CPU. Throughput = user thread time / (user thread time + GC thread time). High throughput uses CPU time efficiently to finish the program's computation as soon as possible, which suits background applications and other scenarios with low interactivity requirements;
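As a quick worked example of the throughput formula, the helper below computes it directly. The class name, method name, and sample numbers are illustrative only and do not come from any measurement.

```java
/** Throughput = user thread time / (user thread time + GC thread time). */
final class Throughput {

    /** Both arguments must use the same time unit; the result is a fraction between 0 and 1. */
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    public static void main(String[] args) {
        // Hypothetical run: 99 minutes of application work and 1 minute of GC.
        System.out.println(throughput(99, 1)); // 0.99
    }
}
```

In this made-up run, 99 minutes of user code against 1 minute of garbage collection gives a throughput of 99%.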

Parallel Old collector (mark-compact algorithm):

  • A parallel old-generation collector that puts throughput first, the old-generation version of the Parallel Scavenge collector;

CMS (Concurrent Mark Sweep) collector (mark-sweep algorithm):

  • A concurrent old-generation collector whose goal is the shortest collection pause time. It is characterized by high concurrency and low pauses.

G1 (Garbage First) collector (mark-compact algorithm):

  • A parallel collector for the whole Java heap. G1 is a new collector provided with JDK 1.7. It is based on the "mark-compact" algorithm, which means it does not produce memory fragmentation. In addition, an important difference from the previous collectors is that G1 collects across the entire Java heap (both the new generation and the old generation), while the first six collectors are each limited to the new generation or the old generation.

Memory allocation and recovery strategy

The automatic memory management advocated in the Java technology system ultimately comes down to automatically solving two problems: allocating memory to objects and reclaiming the memory allocated to them. Generally speaking, objects are mainly allocated in the Eden area of the new generation. If thread-local allocation buffers (TLAB) are enabled, each thread allocates preferentially in its own TLAB. In a few cases an object may be allocated directly in the old generation. In general, the memory allocation rules are not fixed. The details depend on which combination of garbage collectors is in use and on the memory-related parameters of the virtual machine.

Objects are allocated in Eden first. When the Eden area does not have enough space for an allocation, the virtual machine initiates a Minor GC.

Current commercial virtual machines generally use the copying algorithm to reclaim the new generation. The memory is divided into a larger Eden space and two smaller Survivor spaces, and Eden and one of the Survivor spaces are in use at any time. During garbage collection, the surviving objects in Eden and that Survivor space are copied in one pass to the other Survivor space, and Eden and the previous Survivor space are then cleaned up. (The default ratio of Eden to Survivor in the HotSpot virtual machine is 8:1.) When the Survivor space is not large enough, the old generation is relied on as an allocation guarantee.

Large objects go directly into the old generation.

Large objects are Java objects that require a large amount of contiguous memory. The most typical large objects are very long strings and arrays.

Long-lived objects enter the old generation.

After an object has survived a certain number of Minor GCs in the young generation (15 by default), it is promoted to the old generation.

Dynamic object age determination.

To adapt better to the memory conditions of different programs, the virtual machine does not always require an object's age to reach MaxTenuringThreshold before promoting it to the old generation. If the total size of all objects of the same age in the Survivor space is greater than half of the Survivor space, objects whose age is greater than or equal to that age can enter the old generation directly, without waiting for the age required by MaxTenuringThreshold.
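The rule can be sketched as a small function. This is a literal reading of the sentence above rather than HotSpot's actual code: the TenuringDemo class, the effectiveThreshold method, and the sizeByAge array are all hypothetical, and a real JVM may compute the threshold differently (for example, by accumulating sizes across ages).

```java
/** Toy version of dynamic age determination, following the rule as stated in the text. */
final class TenuringDemo {

    /**
     * sizeByAge[a] is the total size of Survivor objects of age a (index 0 is unused).
     * Returns the age at which objects are promoted: the first age whose objects occupy
     * more than half of the Survivor space, or maxTenuringThreshold if there is none.
     */
    static int effectiveThreshold(long[] sizeByAge, long survivorCapacity, int maxTenuringThreshold) {
        long half = survivorCapacity / 2;
        for (int age = 1; age < sizeByAge.length && age < maxTenuringThreshold; age++) {
            if (sizeByAge[age] > half) {
                return age;
            }
        }
        return maxTenuringThreshold;
    }

    public static void main(String[] args) {
        long survivorCapacity = 1000;
        // Age-2 objects occupy 600 of 1000 units, more than half of the Survivor space.
        System.out.println(effectiveThreshold(new long[] { 0, 100, 600, 50 }, survivorCapacity, 15)); // 2
        // No single age exceeds half, so the configured threshold applies.
        System.out.println(effectiveThreshold(new long[] { 0, 100, 200, 50 }, survivorCapacity, 15)); // 15
    }
}
```

In the first case, objects of age 2 and above would be promoted right away instead of waiting until age 15.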

Note that Java's garbage collection mechanism is a capability provided by the Java virtual machine, which dynamically reclaims, at irregular intervals, the memory occupied by objects that no longer have any references. In other words, the garbage collector reclaims the memory occupied by unreferenced objects rather than the objects themselves.


Origin blog.csdn.net/weixin_47277170/article/details/108013942