Understanding the JVM garbage collection (GC) mechanism

JVM garbage collection

Why garbage collection: In C, you have to allocate memory through code when you need it and release it manually after use. Java instead runs programs on a virtual machine, so that we can concentrate on business logic when writing code and leave the allocation and release of memory to the JVM. The code runs on the JVM, the JVM identifies which objects are garbage through an algorithm, and then automatically releases the unused space through a garbage collection algorithm. C and C++ are like a manual transmission; Java is like an automatic transmission.
GC: Garbage Collection. In this article, garbage collection of the young generation is called GC (Young GC), and garbage collection of the old generation is called Full GC.
What is garbage
When a program is running, it needs to allocate space in memory to store data. After this space has been used, it may never be used again. Once no reference points to it, the program can no longer reach the contents of that space, which lingers like a ghost in memory. The data still physically exists in memory, and this is garbage. If a Java program only allocated space and never released it, useless data would occupy more and more memory, the memory actually available to the program would shrink bit by bit, and eventually the program would run out of memory (OutOfMemoryError).
Below, deleting a node from a linked list is taken as an example; the deleted node is garbage.

import lombok.Data;

public class Application {
    public static void main(String[] args) {
        // Create three nodes
        Node n1 = new Node("张三");
        Node n2 = new Node("李四");
        Node n3 = new Node("王五");

        // Link the three nodes together: n1->n2->n3
        n1.next = n2;
        n2.next = n3;

        // Traverse the linked list
        Node node = n1;
        while (node != null) {
            System.out.println(node.data);
            node = node.next;
        }

        // Delete the middle node
        n1.next = n3;
        n3.pre = n1;
        n2 = null;

        // Traverse the linked list again
        node = n1;
        while (node != null) {
            System.out.println(node.data);
            node = node.next;
        }
    }
}
// @Data (Lombok) generates the getters and setters used in later examples
@Data
class Node {
    Node pre;
    Node next;
    Object data;
    public Node(Object data) {
        this.data = data;
    }
}

Thinking: Is the n2 node now useless? If so, why is it still taking up space?
Original linked list: n1 -> n2 -> n3

After deleting the n2 node: n1 -> n3. We already know that n2 is garbage, but unlike C, Java code cannot release memory manually. Instead, the JVM recognizes the garbage and releases it for us.

At this point the JVM determines through an algorithm (reachability analysis) that the object n2 pointed to is garbage, and eventually reclaims the memory it occupies.

The area and scope of garbage collection

As the saying goes: the stack runs, the heap stores, and the heap takes up the most space. GC is responsible for collecting garbage in the method area and the heap, but mainly in the heap. The other areas (such as the thread stacks and the program counter) are small and are released together with their thread or stack frame, so they do not need garbage collection.

Since the heap is the main place for garbage collection, let's set aside the detailed problems of garbage collection for now and first introduce the logical structure of the heap and how objects move through it.

Generational collection

Garbage collection runs many times, and the same memory area may not be garbage at the first collection but become garbage by the second. To find garbage efficiently, remove it efficiently, and affect the normal operation of the JVM as little as possible, the GC logically divides the heap space into three generations (a logical division only; physically it is all memory): the young generation, the old generation, and the metaspace (the permanent generation up to JDK 7). The JVM manages each of these areas in a way suited to its characteristics; this is generational management. For example: GC (Young GC) is performed in the young generation, and Full GC is performed in the old generation.
There are two main ideas behind generational collection:

Most objects die young
The more garbage collections an object has survived, the less likely it is to become garbage

Next, I will describe how these areas cooperate during actual garbage collection and why they are divided in this way.
Objects are placed in the Eden area when they are first created. As more and more objects are created, Eden eventually has no room for new ones, and a Young GC is triggered. The first Young GC scans the Eden area and the survivor from area (which is still empty at this point) for garbage, copies everything that is not garbage into the survivor to area, and then clears the Eden area and the from area. Every object that survives a GC has its age increased by 1.
After the first GC, the heap looks like this: the Eden area is empty, and the to area holds a small number of surviving objects with age 1. The two survivor areas then swap roles: the to area becomes the from area, and the old from area becomes the new, empty to area. Since the GC released space, newly created objects are again placed in Eden. When Eden fills up again, the second GC is triggered (note that one survivor area, the to area, is always empty at this point). The GC finds the objects in Eden and the from area that are not garbage, copies them into the empty to area, and increases their age by 1 again. Space has been released, new objects go into Eden, and the cycle repeats.
Because the age of every surviving object increases by 1 each time, after at least 15 GCs some objects reach the age of 15 (the default threshold, which can be changed with a JVM parameter such as -XX:MaxTenuringThreshold). Having survived 15 GCs, such objects are considered unlikely to become garbage in the future, and copying them back and forth each time costs time and effort, so they are moved into the old generation. A Young GC does not collect the old generation.
In summary, the process in the young generation is copy (+1) -> clear -> swap. Objects that are not garbage are copied to the empty survivor area (say area 0, which is the to area at this time) and their age is increased by 1; then Eden and the survivor area that was in use before the copy (say area 1, the from area) are cleared; finally the from and to areas swap roles, so that area 0 is now the from area and area 1 the to area. They swap again at the next GC.
Now turn to the objects in the old generation, whose age has reached 15. A GC in the young generation does not collect this area, but new objects may be promoted into it after each GC. As GCs keep happening and objects keep arriving, the JVM predicts in advance whether the old generation is about to fill up (average size promoted per GC > remaining space in the old generation). When it predicts that the old generation may not have enough room next time, it triggers a Full GC, which scans the old generation for garbage and clears it to release memory.
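The copy, clear, swap cycle and the promotion at age 15 can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class YoungGenSim, its lists, and the explicit "live" set passed to each GC are invented for the example and do not reflect how HotSpot is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Toy model of the young generation; names and structure are illustrative only.
class YoungGenSim {
    static final int MAX_AGE = 15; // promotion threshold used in the text

    static class Obj {
        final String name;
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();
    List<Obj> old = new ArrayList<>();

    // New objects always start in Eden.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    // One Young GC: copy live objects from Eden and "from" into "to" (age + 1),
    // promote those that reach MAX_AGE, clear Eden and "from", then swap roles.
    void youngGc(Set<Obj> live) {
        List<Obj> scanned = new ArrayList<>(eden);
        scanned.addAll(from);
        for (Obj o : scanned) {
            if (!live.contains(o)) continue; // garbage is simply not copied
            o.age++;
            if (o.age >= MAX_AGE) old.add(o); else to.add(o);
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from; from = to; to = tmp; // the new "to" is empty again
    }

    public static void main(String[] args) {
        YoungGenSim heap = new YoungGenSim();
        Obj keep = heap.allocate("long-lived");
        heap.allocate("temporary");
        for (int i = 0; i < MAX_AGE; i++) {
            heap.youngGc(Collections.singleton(keep));
        }
        System.out.println("objects in old generation: " + heap.old.size());
    }
}
```

In this sketch the temporary object disappears at the first GC because it is never copied, while the long-lived object bounces between the two survivor lists until its age reaches the threshold.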

insert image description here
(This figure is just a general flow, ignoring many details.)
Next, we will introduce the metaspace.
In JDK 7 and earlier, what is now the metaspace was the permanent generation. It is the implementation of the method area, so it is also called non-heap. Logically, however, it was treated as part of the heap (the heap was divided into the young generation, the old generation, and the permanent generation), and physically it used the same block of memory. It was bound to the old generation: whichever of the two filled up, garbage was cleared from both the old generation and the permanent generation. This way no separate collection code had to be written for the permanent generation; the old generation's code was reused. The permanent generation mainly stores class information, ordinary constants, static constants, code compiled by the compiler, and so on. For example, every object is created from some class; the template of that class lives in the permanent generation, and the class pointer in the object header points to that class. In JDK 7, the string constant pool was moved to the heap. A problem remained, though: because the data stored in the permanent generation is different in nature from the data in the heap, it is hard to choose an appropriate size for the permanent generation. Therefore, in JDK 8 the permanent generation was moved into native (direct) memory and renamed Metaspace, separating it from the old generation both logically and physically.
The metaspace thus uses native memory. By default its maximum size is limited only by the available native memory (an upper limit can also be configured), so class information can be loaded as the actual situation requires. It is collected on its own schedule instead of following the old generation. The metaspace replaces the permanent generation as the implementation of the method area specification.
The biggest difference between the permanent generation and the metaspace is that the metaspace uses native memory, so its size does not need to be fixed in advance.
Note:

All newly created objects are placed in the Eden area, but when scanning for garbage, the Eden area and the non-empty survivor area are scanned together.
The survivor from area and the survivor to area are always the same size.
Not every object has to reach the age of 15 before entering the old generation. For example, some large objects are placed in the old generation directly, and when the objects surviving from the Eden area and the from area take up more than half of the to area, some of them are promoted early.
Several conditions trigger a Full GC in the old generation:
- System.gc() is called in code (rarely done in practice)
- Insufficient space in the old generation
- The space allocation guarantee fails. Before a GC, the average size promoted from the young generation to the old generation is calculated; when it exceeds the remaining space of the old generation, a Full GC is performed.
- The metaspace exceeds its threshold. By default the metaspace is limited only by native memory, but a Full GC is performed when its total size exceeds the configured threshold.
The STW (stop-the-world) pause caused by a Full GC is long (often more than 10 times that of a Young GC), so the idea of GC tuning is either to reduce the number of Full GCs or to reduce the STW time of each Full GC.
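The space allocation guarantee condition from the list above amounts to a simple comparison. The following sketch only illustrates that comparison; the class PromotionGuarantee, the method needsFullGc, and the sample numbers are hypothetical and are not JVM APIs.

```java
import java.util.Arrays;

// Illustrative check only; the real JVM keeps these statistics internally.
class PromotionGuarantee {
    // Returns true when the average amount promoted by past Young GCs
    // is larger than the free space left in the old generation.
    static boolean needsFullGc(long[] promotedBytesHistory, long oldGenFreeBytes) {
        double average = Arrays.stream(promotedBytesHistory).average().orElse(0);
        return average > oldGenFreeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        long[] history = {8 * mb, 12 * mb, 10 * mb}; // average promoted: 10 MB
        System.out.println(needsFullGc(history, 16 * mb)); // false: 16 MB free is enough
        System.out.println(needsFullGc(history, 6 * mb));  // true: only 6 MB free
    }
}
```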

Thinking: Since the objects found not to be garbage in the Eden area and one survivor area are placed into the other survivor area, are the roles of the two survivor areas fixed?
No. After each copy there is a swap, and whichever area is empty is the to area. The young generation uses the copying algorithm: the objects that are not garbage are found and moved to a destination, and which area serves as the destination is not fixed. Suppose it were fixed, with survivor area 1 always the from area and survivor area 0 always the to area. The first GC would copy the non-garbage objects into the to area. The second GC would then have to scan the Eden area and the to area, and the destination could no longer be the to area; it would have to be the from area. Because the survivor area itself also needs to be garbage collected, there must always be an empty to area, and the from and to areas must be able to swap roles logically.
Thinking: Why does the young generation copy the useful objects instead of directly clearing the useless ones? Wouldn't that make the survivor areas unnecessary and save space?
According to statistics, about 98% of objects are temporary, which means that most of the Eden area is garbage. Picking the few useful objects out of the garbage heap is more efficient than clearing the garbage piece by piece, so the young generation copies the useful part. This copying does require extra space. Objects in the old generation, having survived 15 GCs, have a low probability of becoming garbage, so there it is enough to clear the useless part in place.
Thinking: When is GC triggered? (GC here usually means Young GC)
GC is triggered when the Eden area is full. Objects are stored in the Eden area when they are created; the survivor areas never receive newly created objects directly.

The garbage collection process has been briefly introduced above. Looking more closely, how does the garbage collector, whether for the young generation or the old generation, identify which objects are garbage?

How to identify garbage

Whatever the algorithm, the core task is to find the objects that nothing in the program references any more. Without a reference an object can never be used again, so it can be reclaimed.

reference counting

Each time a new reference to an object in the heap is created, its reference count is incremented by 1; each time a reference is dropped, the count is decremented by 1. If the count of an object reaches 0, nothing references it any more and it is judged to be garbage.
For example: a Node object is referenced by node1, so its reference count is 1 -> node1 and node2 both reference it, the count is 2 -> node1 drops its reference, the count is 1 -> node2 drops its reference, the count is 0, and the object is garbage.

public class Application {
    public static void main(String[] args) {
        Node node1 = new Node("张三"); // count = 1
        Node node2 = node1;            // count = 2
        System.out.println(node2);
        node1 = null;                  // count = 1
        System.out.println(node2);
        node2 = null;                  // count = 0, the object is garbage
    }
}

But there is a problem with this approach: the circular reference problem

public class Application {
    public static void main(String[] args) {
        Node node1 = new Node("张三");
        Node node2 = new Node("李四");
        node1.setData(node2); // 张三 references 李四
        node2.setData(node1); // 李四 references 张三
        node1 = null;
        node2 = null;         // the two objects now reference only each other
    }
}

Take the Node named "Zhang San" as an example: node1 references it, the count is 1 -> the data field of "Li Si" references it, the count is 2 -> node1 is set to null, the count is 1. Since node2 is also null by now, this Node can no longer be used by the program and is already garbage, but its count is 1 rather than 0, so it is not recognized as garbage. This is the circular reference problem of the reference counting method.
Disadvantages of reference counting:

Each object must maintain a reference counter, which costs performance
It is hard to deal with circular references

Because of these shortcomings, the reference counting method is basically not used now; the reachability analysis method described next is used instead.
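To make the circular reference problem concrete, here is a small hand-written reference counter. It is a sketch for illustration only: the class RcNode and its addRef/release helpers are made up for this example, and the JVM does not manage objects this way.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference-counted object; counts are maintained by hand.
class RcNode {
    final String name;
    int refCount = 0;
    RcNode data; // outgoing reference, like Node.data in the text

    RcNode(String name) { this.name = name; }

    static void addRef(RcNode n) {
        if (n != null) n.refCount++;
    }

    // Drop one reference; when the count reaches 0 the object is "freed"
    // and the reference it holds is released in turn.
    static void release(RcNode n, List<String> freed) {
        if (n == null) return;
        if (--n.refCount == 0) {
            freed.add(n.name);
            release(n.data, freed);
            n.data = null;
        }
    }

    // Returns the names of the objects that end up freed.
    static List<String> demo(boolean cycle) {
        List<String> freed = new ArrayList<>();
        RcNode a = new RcNode("Zhang San"); addRef(a); // node1 -> a
        RcNode b = new RcNode("Li Si");     addRef(b); // node2 -> b
        a.data = b; addRef(b);                          // a.data -> b
        if (cycle) { b.data = a; addRef(a); }           // b.data -> a closes the cycle
        release(a, freed); // node1 = null
        release(b, freed); // node2 = null
        return freed;
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // [Zhang San, Li Si]
        System.out.println(demo(true));  // []  (both counts stay at 1)
    }
}
```

Without the cycle both objects are freed; with it, each object keeps the other's count at 1, so neither is ever recognized as garbage.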

accessibility analysis

Starting from the objects that can serve as GC roots, follow the reference relationships and traverse the reference chains. Every object that is not reached by this traversal is garbage: nothing in the program can use it any more. It is like a bunch of grapes: start from the stem and follow the vine, and every grape you can reach belongs to the bunch. Only the grapes that have fallen off are never reached, and those are the garbage.
Take again the example that the reference counting method above cannot solve.

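public class Application {
    public static void main(String[] args) {
        Node node1 = new Node("张三");
        Node node2 = new Node("李四");
        Node node3 = new Node("王五");
        new Node("赵六");      // never referenced by any variable
        node1.setData(node2);
        node2.setData(node1);
        node1 = node3;         // node1 and node3 now both point to 王五
        node2 = null;          // 张三 and 李四 are only referenced by each other
    }
}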

The resulting object graph is: node1 and node3 both point to "Wang Wu"; "Zhang San" and "Li Si" reference only each other; nothing references "Zhao Liu".

With reachability analysis, the local variables node1, node2, and node3 act as GC roots (node2 is null and leads nowhere), and the traversal of reference relationships starts from them. In the end, the three nodes named Zhang San, Li Si, and Zhao Liu are never reached, so they are judged to be garbage and will be reclaimed.

Since the traversal starts from the set of GC roots, which objects can serve as GC roots? There are four types:

Objects referenced by local variables on the stack
Objects referenced by static properties in the method area
Objects referenced by constants in the method area
Objects referenced by JNI (native methods) in the native method stack

In the above example, all GC roots are of type 1.
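The traversal from GC roots can be sketched as a plain graph search. The code below is a simplified model, not the JVM's implementation: the class Reachability, its Obj type, and the explicit list standing in for the heap are assumptions made for this example. It reproduces the four-node example above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Reachability {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        Obj(String name) { this.name = name; }
    }

    // Mark phase: everything reachable from the roots is live.
    static Set<Obj> reachable(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (marked.add(o)) pending.addAll(o.refs); // visit each object once
        }
        return marked;
    }

    // Rebuilds the example from the text and returns the names of the garbage objects.
    static List<String> garbageNames() {
        Obj zhang = new Obj("Zhang San");
        Obj li = new Obj("Li Si");
        Obj wang = new Obj("Wang Wu");
        Obj zhao = new Obj("Zhao Liu");
        zhang.refs.add(li); // circular reference between the two
        li.refs.add(zhang);
        List<Obj> heap = Arrays.asList(zhang, li, wang, zhao);
        List<Obj> roots = Arrays.asList(wang); // node1 and node3 point to Wang Wu; node2 is null

        Set<Obj> live = reachable(roots);
        List<String> garbage = new ArrayList<>();
        for (Obj o : heap) {
            if (!live.contains(o)) garbage.add(o.name);
        }
        return garbage;
    }

    public static void main(String[] args) {
        System.out.println(garbageNames()); // [Zhang San, Li Si, Zhao Liu]
    }
}
```

Unlike reference counting, the circular pair is found to be garbage here, because neither object can be reached from a root.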

Objects reachable from GC roots are not guaranteed to survive: whether they are treated as garbage depends on the memory situation at that time and on the type of reference. Objects with no reference relationship at all, however, are always treated as garbage.
Garbage is therefore determined by traversing references from the GC roots, which in turn brings in the types of reference.

Four types of reference relationships: strong, soft, weak, phantom

References between objects can be divided into four types: strong references, soft references, weak references, and phantom references. The references we normally use are all strong references; the other three are used for special purposes.
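As a quick orientation, the sketch below creates one reference of each kind to the same object using the standard java.lang.ref classes. The class name ReferenceKinds and the probe method are made up for this example. It only checks what get() returns while a strong reference is still held, which does not depend on when the GC runs.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKinds {
    // Reports what get() yields for each kind while the object is still strongly referenced.
    static boolean[] probe() {
        Object strong = new Object();                              // strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);  // soft reference
        WeakReference<Object> weak = new WeakReference<>(strong);  // weak reference
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue); // phantom reference
        return new boolean[] {
            soft.get() == strong,   // the object is strongly reachable, so it is still there
            weak.get() == strong,   // same reason
            phantom.get() == null   // PhantomReference.get() always returns null
        };
    }

    public static void main(String[] args) {
        boolean[] r = probe();
        System.out.println("soft sees object:    " + r[0]);
        System.out.println("weak sees object:    " + r[1]);
        System.out.println("phantom get is null: " + r[2]);
    }
}
```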

Supplement:

The JVM triggers a GC when space in the Eden area or in the old generation is insufficient. We can also request a GC manually in code with System.gc(); this is uncommon and usually only done in testing. Note that a GC requested this way is a Full GC.
By default, the initial (minimum) heap size is 1/64 of the machine's total memory and the maximum heap size is 1/4 of it. The JVM starts with the minimum size and grows the heap when memory runs short, until the maximum is reached. If that is still not enough, an OOM error (OutOfMemoryError) is thrown.
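As a worked example of the 1/64 and 1/4 rule stated above, assume a machine with 16 GB of physical memory. The class DefaultHeapSize and its methods are hypothetical helpers for the arithmetic only; actual defaults depend on the JVM version and platform.

```java
// Arithmetic sketch of the default heap sizing rule described in the text.
class DefaultHeapSize {
    static long initialHeapMb(long physicalMb) {
        return physicalMb / 64; // initial (minimum) heap: 1/64 of physical memory
    }

    static long maxHeapMb(long physicalMb) {
        return physicalMb / 4;  // maximum heap: 1/4 of physical memory
    }

    public static void main(String[] args) {
        long physicalMb = 16 * 1024; // 16 GB expressed in MB
        System.out.println(initialHeapMb(physicalMb) + "M"); // 256M
        System.out.println(maxHeapMb(physicalMb) + "M");     // 4096M
    }
}
```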
Verification:
The test machine has 16 GB of physical memory.

The total can also be obtained through code:

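import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

public class Application {
    public static void main(String[] args) {
        // The com.sun.management subinterface exposes physical memory figures
        OperatingSystemMXBean mem = (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        // Total physical memory, in bytes
        long totalMemorySize = mem.getTotalPhysicalMemorySize();
        System.out.println(totalMemorySize / 1024 / 1024 + "M");
    }
}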

View the initial (minimum) heap size and the maximum heap size of the JVM

public class Application {
    public static void main(String[] args) {
        // Current heap size; right after startup this is the initial (minimum) size
        System.out.println(Runtime.getRuntime().totalMemory() / 1024 / 1024 + "M");
        // Maximum size the heap may grow to
        System.out.println(Runtime.getRuntime().maxMemory() / 1024 / 1024 + "M");
    }
}

The results roughly satisfy: default maximum heap size = total memory / 4, default minimum heap size = total memory / 64.
These defaults can be changed in IDEA (on a server, through command-line options when starting the jar). To prevent the heap from fluctuating in size, the maximum and minimum heap sizes are usually set to the same value.
For example: -Xms5m sets the minimum heap size to 5 MB, and -Xmx5m sets the maximum heap size to 5 MB.
In IDEA, these go into the VM options of the run configuration; rerunning the code above should then report sizes close to the configured values.

strong reference

The references we normally use are strong references; the other kinds have to be created explicitly in code. For example:

public class Application {
    public static void main(String[] args) {
        Node node1 = new Node("张三");
        Node node2 = new Node("李四");
        Node node3 = node1;
    }
}

node1 and node3 refer to the Node named Zhang San, and node2 refers to the Node named Li Si. All three are strong references.
The characteristic of a strong reference is that, as long as the object is reachable from GC roots through strong references, it is never garbage collected. Even when memory is insufficient it is not reclaimed; an OutOfMemoryError is thrown instead.

Soft reference SoftReference

When memory is insufficient, an object that is only softly referenced is garbage collected, and the soft reference no longer returns it.
Test idea: run the same code in different environments (sufficient memory and insufficient memory) and check whether an object that has only a soft reference is garbage collected.

import java.lang.ref.SoftReference;
import lombok.AllArgsConstructor;
import lombok.Data;

public class Application {
    public static void main(String[] args) {
        User u1 = new User("张三"); // strong reference
        SoftReference<User> softReferenceU1 = new SoftReference<>(u1); // soft reference to the same object

        System.out.println(u1);                    // read the object through the strong reference
        System.out.println(softReferenceU1.get()); // read the object through the soft reference

        u1 = null; // User("张三") has lost its strong reference; only the soft reference remains

        System.gc(); // request a Full GC

        System.out.println(softReferenceU1.get()); // check whether User("张三") has been reclaimed
    }
}
@Data
@AllArgsConstructor
class User {
    private String username;
}

With sufficient memory

Result: the last line still prints the User object.
From this it can be seen that when memory is sufficient, User(name="张三") on the heap is not garbage collected, because a soft reference still points to it.

With insufficient memory: set the JVM parameters so that both the maximum and minimum heap size are 5 MB

Because memory is insufficient, the program fails with an error directly. The code above is therefore modified so that the statement that exhausts memory is wrapped in a try block, and the result is printed in finally.

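import java.lang.ref.SoftReference;

public class Application {
    public static void main(String[] args) {
        User u1 = new User("张三"); // strong reference
        SoftReference<User> softReferenceU1 = new SoftReference<>(u1); // soft reference to the same object

        System.out.println(u1);                    // read the object through the strong reference
        System.out.println(softReferenceU1.get()); // read the object through the soft reference

        u1 = null; // User("张三") has lost its strong reference; only the soft reference remains

        try {
            // Allocate 10 MB at once so that the 5 MB heap runs out; this checks whether
            // an object that is only softly referenced is reclaimed.
            // A primitive byte[] is used so that the array really is 10 MB.
            byte[] load = new byte[1024 * 1024 * 10];
        } catch (OutOfMemoryError e) {
            // Expected: OutOfMemoryError is an Error, so catching Exception would not catch it
        } finally {
            System.out.println(softReferenceU1.get()); // check whether User("张三") has been reclaimed
        }
    }
}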

Result: the last line prints null. The soft reference is cleared before the OutOfMemoryError is thrown.

Weak reference WeakReference

As long as a GC is triggered, the weak reference is cleared (an object that is only weakly referenced is treated as garbage).
Without GC

import java.lang.ref.WeakReference;
import lombok.AllArgsConstructor;
import lombok.Data;

public class Application {
    public static void main(String[] args) {
        User u1 = new User("张三"); // strong reference
        WeakReference<User> weakReferenceU1 = new WeakReference<>(u1); // weak reference to the same object

        System.out.println(u1);                    // read the object through the strong reference
        System.out.println(weakReferenceU1.get()); // read the object through the weak reference

        u1 = null; // User("张三") has lost its strong reference; only the weak reference remains

        System.out.println(weakReferenceU1.get());
    }
}
@Data
@AllArgsConstructor
class User {
    private String username;
}

Result: as long as no GC has happened, the object can still be obtained through the weak reference.

With GC: if System.gc() is called before the last print, weakReferenceU1.get() typically returns null, because the object was only weakly referenced.

Thinking: Why should u1 be set to null in the code?
u1 is a strong reference. The characteristic of a strong reference is that the object it points to is never treated as garbage, under any circumstances. If the strong reference is not dropped when testing soft, weak, and phantom references, the effect cannot be observed.

WeakHashMap

When we use a HashMap, the key is held by a strong reference. For example: I create User u1 = new User("Zhang San") and want to attach an extra value to the u1 object, so I put u1 into a HashMap as the key. When u1 is no longer needed (and the attached value should disappear with it), I set u1 = null to release it, but the HashMap still holds a strong reference to the User object, so the object cannot be released as garbage.
The object therefore survives garbage collection, which wastes memory and can eventually cause an OOM.
A WeakHashMap instead holds its keys through weak references. It suits the case where you want to use an object without keeping it alive. When u1 is no longer needed and its strong reference is dropped, the object is collected at the next GC, and the reference held by the WeakHashMap is ignored.

import java.util.WeakHashMap;

public class Application {
    public static void main(String[] args) {
        WeakHashMap<User, String> weakHashMap = new WeakHashMap<>();
        User u1 = new User("张三");
        User u2 = new User("李四");
        weakHashMap.put(u1, "v1");
        weakHashMap.put(u2, "v2");
        u1 = null; // User("张三") is now only weakly referenced by the WeakHashMap key and is reclaimed at the next GC
        System.out.println(weakHashMap);
        System.gc();
        System.out.println(weakHashMap); // check whether the object referenced only by the WeakHashMap was released
    }
}

Result: before the GC the map contains both entries; after the GC the entry for User("张三") is typically gone and only the entry for User("李四") remains.
However, when a User object is put into a WeakHashMap as a value and its strong reference u1 is then dropped, the object still exists after a GC, so the value part of a WeakHashMap is held by a strong reference.

Thinking: After the object pointed to by a key has been reclaimed, what is returned when the value is looked up through that key in the WeakHashMap?

Thinking: A WeakHashMap key may be reclaimed because it is only weakly referenced. Once the key object has no strong reference and is reclaimed, what happens to the corresponding value in the WeakHashMap?
Thinking: Is WeakHashMap<Object,Object> similar to HashMap<Reference<Object>,Object>?

Supplement: the ThreadLocalMap source code uses this same idea of weak references to address the ThreadLocal memory leak problem.
The ThreadLocal is used as a weakly referenced key, so when the ThreadLocal is no longer referenced elsewhere, the data stored for it in the map can also be released.

Usage scenarios of soft references and weak references

Soft references and weak references are usually used together with strong references. A soft reference fits the case where, after the strong reference is dropped, you would still like to use the object as long as memory permits. A weak reference fits the case where, after the strong reference is dropped, the object is no longer needed. Weak references are usually combined with a collection: removing entries from the collection by hand is troublesome, so releasing them is left to the GC.

For example: using soft references to build a cache
When reading images, loading from disk on every use hurts performance, while loading everything at once may exhaust memory. How can efficiency be improved without running out of memory? The answer is to use memory as fully as possible and release the images loaded into memory only when space runs short and an OOM is about to occur. As long as there is enough memory, the images in memory can be used all the time. HashMap<String, SoftReference<byte[]>> hashMap = new HashMap<>()
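The image cache idea can be sketched as follows. This is a minimal illustration, not a production cache: the class SoftImageCache, the loader function standing in for a disk read, and the loads counter are all assumptions introduced for the example.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal soft-reference cache: entries may be cleared by the GC when memory runs short.
class SoftImageCache {
    private final Map<String, SoftReference<byte[]>> cache = new HashMap<>();
    private final Function<String, byte[]> loader; // hypothetical stand-in for reading from disk
    int loads = 0;                                 // how many times the loader was called

    SoftImageCache(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    byte[] get(String path) {
        SoftReference<byte[]> ref = cache.get(path);
        byte[] image = (ref == null) ? null : ref.get();
        if (image == null) { // never cached, or already cleared by the GC
            image = loader.apply(path);
            loads++;
            cache.put(path, new SoftReference<>(image));
        }
        return image;
    }

    public static void main(String[] args) {
        SoftImageCache images = new SoftImageCache(path -> new byte[1024]);
        byte[] first = images.get("a.png");
        byte[] second = images.get("a.png"); // served from the cache while memory allows
        System.out.println(first == second);
        System.out.println("loads: " + images.loads);
    }
}
```

A caller must always be prepared for a reload, because the GC is free to clear any entry whose image is no longer strongly referenced elsewhere.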

Phantom references and reference queues

Phantom references are meant to be used together with a reference queue; without a reference queue a phantom reference is meaningless.

reference queue

When the garbage collector determines that the referenced object can be reclaimed, the reference is put into the reference queue. Code listening on this queue can then perform cleanup work for the object that is being destroyed. Soft references, weak references, and phantom references can all be given a reference queue at construction; a phantom reference must be given one when it is constructed.
The following uses a weak reference together with a reference queue:

import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.TimeUnit;

public class Application {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> referenceQueue = new ReferenceQueue<>();
        User u1 = new User("张三");
        WeakReference<Object> weakReference = new WeakReference<>(u1, referenceQueue);
        u1 = null;
        new Thread(() -> {
            // Busy-wait until the GC enqueues the cleared reference
            while (referenceQueue.poll() == null) {
            }
            System.out.println("object destroyed");
        }).start();

        System.gc();
        TimeUnit.SECONDS.sleep(1);
    }
}

This approach is rarely used.

Phantom reference PhantomReference

Phantom means ghost or illusion. The referenced object cannot be obtained through a phantom reference (get() always returns null). Its only use is in combination with a reference queue: the reference is added to the queue when the referenced object is reclaimed, so that cleanup work can be done at the other end of the queue. A phantom reference has no other effect on the object and is rarely used.

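import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.concurrent.TimeUnit;

public class Application {
    public static void main(String[] args) throws InterruptedException {
        User user = new User("张三");
        ReferenceQueue<Object> referenceQueue = new ReferenceQueue<>();
        // A reference queue must be specified when the phantom reference is created
        PhantomReference<User> reference = new PhantomReference<>(user, referenceQueue);
        System.out.println(reference.get()); // try to obtain the object through the phantom reference
        user = null;
        new Thread(() -> {
            // Busy-wait until the GC enqueues the phantom reference
            while (referenceQueue.poll() == null) {
            }
            System.out.println("object referenced by the phantom reference destroyed");
        }).start();
        System.gc();
        TimeUnit.SECONDS.sleep(1);
    }
}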

Result: reference.get() prints null, and after the GC the listening thread prints its message.

Thinking: Can the other reference types also be added to a reference queue before the object is destroyed?
They can, although this is rarely done. Phantom references have only this one function, which is why they are introduced separately here.

Summary: whether an object will be reclaimed depends only on how many references to it exist and what kinds of references they are.

How to identify garbage has been introduced above. The next question is how to deal with garbage once it has been found.

garbage collection algorithm

reference count

If the count is 0, the object is garbage. The count is incremented by 1 when a reference is created and decremented by 1 when a reference becomes invalid.
This algorithm goes together with the reference counting method of identifying garbage. Because that method suffers from the circular reference problem and is rarely used, the reference counting algorithm is not used either.

Copy Copying

Copy the non-garbage part of area A to area B, and then clear area A. This method suits the young generation of the heap, because there the objects that are not garbage are a small minority, so relatively little has to be moved. The parts of the Eden area and the survivor from area that are not garbage are copied to the survivor to area, and then the Eden area and the from area are cleared.
This method is efficient. Copying starts at the beginning of a completely empty to area, and clearing wipes the whole Eden area and from area, so there is no memory fragmentation.
Disadvantages of this approach:

Additional space is required: the survivor to area must be the same size as the from area, so half of the survivor space is unused at any time
In an extreme case, for example when 100% of the Eden area is not garbage, copying the useful part is time-consuming and overflows the survivor to area. The method is therefore suited to areas with a low survival rate (by default, Eden area : survivor from area : survivor to area = 8 : 1 : 1)
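To see what the 8 : 1 : 1 split means in numbers, the following sketch divides a young generation of a given size. The class SurvivorRatio and its split method are hypothetical helpers for the arithmetic; the 100 MB figure is just an example.

```java
// Arithmetic sketch of the Eden : from : to = ratio : 1 : 1 layout.
class SurvivorRatio {
    // Returns {eden, from, to} sizes for a young generation of the given size.
    static long[] split(long youngSize, int ratio) {
        long survivor = youngSize / (ratio + 2); // each survivor area gets one share
        long eden = youngSize - 2 * survivor;    // Eden gets the remaining shares
        return new long[] {eden, survivor, survivor};
    }

    public static void main(String[] args) {
        long[] parts = split(100, 8); // a 100 MB young generation
        System.out.println("Eden: " + parts[0] + " MB"); // 80 MB
        System.out.println("from: " + parts[1] + " MB"); // 10 MB
        System.out.println("to:   " + parts[2] + " MB"); // 10 MB
        // Only Eden plus one survivor area hold objects at any time: 90 MB of 100 MB
        System.out.println("usable: " + (parts[0] + parts[1]) + " MB");
    }
}
```

With this layout only one survivor area, 10% of the young generation, sits idle, instead of the 50% that a plain two-halves copying scheme would waste.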

Mark-Sweep

Mark-sweep is divided into two parts: 1. Starting from all GC roots, traverse and mark the objects that are still reachable; everything left unmarked is garbage. 2. Traverse the entire heap and clear the garbage. STW (stop the world) is required, suspending the entire application during both marking and clearing. It is used in the old generation, because most objects there have already survived many young-generation GCs (up to 15 by default), so the probability that they become garbage is relatively small.

It has two major disadvantages:

It is inefficient and requires pausing the entire application.
Clearing leaves memory discontinuous, producing a lot of memory fragmentation. The JVM then has to maintain a free list of memory, which is extra overhead. Moreover, allocating large objects such as arrays (large objects go to the old generation) needs a large contiguous block, which is hard to find in fragmented memory.
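
The two phases and the resulting fragmentation can be sketched on a tiny simulated heap. MarkSweepDemo, Obj and exampleHeap are names invented for this illustration; the heap is just an array of slots, and a freed slot becomes null.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class MarkSweepDemo {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) { this.name = name; }

        @Override public String toString() { return name; }
    }

    // Phase 1: mark every object reachable from the roots.
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;
            o.marked = true;
            for (Obj r : o.refs) pending.push(r);
        }
    }

    // Phase 2: walk the whole heap, free unmarked slots, reset marks on survivors.
    static int sweep(Obj[] heap) {
        int freed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            if (heap[i].marked) {
                heap[i].marked = false;
            } else {
                heap[i] = null;
                freed++;
            }
        }
        return freed;
    }

    // Builds root -> a -> b -> c plus an unreachable cycle x <-> y, interleaved in the heap.
    static Obj[] exampleHeap(List<Obj> rootsOut) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        Obj x = new Obj("x"), y = new Obj("y");
        a.refs.add(b);
        b.refs.add(c);
        x.refs.add(y);
        y.refs.add(x);
        rootsOut.add(a);
        return new Obj[] { a, x, b, y, c };
    }

    public static void main(String[] args) {
        List<Obj> roots = new ArrayList<>();
        Obj[] heap = exampleHeap(roots);
        mark(roots);
        int freed = sweep(heap);
        System.out.println("freed=" + freed + " heap=" + Arrays.toString(heap));
    }
}
```

The unreachable cycle x <-> y is collected, which reference counting could not do, but the two freed slots are not adjacent: that is the fragmentation described above.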

Mark-Compact

Mark-compact is divided into three parts: 1. Starting from all GC roots, traverse and mark the objects that are still reachable; everything left unmarked is garbage. 2. Traverse the entire heap and clear the garbage. 3. Compact memory by moving the surviving objects together, which removes fragmentation. Compared with mark-sweep there is one extra compaction step. It solves the memory fragmentation problem but brings a new one: compaction costs time and CPU.
Both mark-sweep and mark-compact are used in the old generation, and the two are often combined: after several mark-sweep collections, one mark-compact collection is performed. This avoids the time and performance cost of compacting on every collection while keeping memory fragmentation within a reasonable range.
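
The compaction step on its own might look like the sketch below, assuming marking and clearing have already left holes (null slots) in a simulated heap. CompactDemo and compact are hypothetical names, and the returned map stands in for the bookkeeping a real collector needs in order to update references to moved objects.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class CompactDemo {

    // Slides the surviving (non-null) slots to the front of the heap and
    // returns a forwarding table: old index -> new index.
    static Map<Integer, Integer> compact(String[] heap) {
        Map<Integer, Integer> forwarding = new LinkedHashMap<>();
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) continue;
            forwarding.put(i, next);
            if (i != next) {
                heap[next] = heap[i];
                heap[i] = null;
            }
            next++;
        }
        return forwarding;
    }

    public static void main(String[] args) {
        String[] heap = { "a", null, "b", null, "c" };
        Map<Integer, Integer> forwarding = compact(heap);
        System.out.println("heap=" + Arrays.toString(heap));
        System.out.println("forwarding=" + forwarding);
    }
}
```

After compaction the free space is one contiguous block at the end, but objects b and c now live at different indices, so every reference to them has to be updated. That extra work is the time and CPU cost mentioned above.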

Each of these three algorithms has its own advantages and disadvantages. There is no perfect algorithm; the appropriate one should be chosen according to the scenario.

The algorithms above are ideas for solving garbage collection; the concrete tools that implement them are the garbage collectors below.

Types of Garbage Collectors

Garbage collectors are the tools that actually implement these algorithms. By the way they collect, they are generally divided into five categories.

serial collector

Only one thread collects, and all user threads are suspended while garbage is collected. Suitable for single-threaded scenarios. For example, during class only one cleaning lady comes in to clean up; if you want to continue the class, you have to wait until she finishes. Because all user threads must pause until this single thread completes the GC, efficiency is low.
The specific implementations are Serial (for the young generation) and Serial Old (for the old generation).

parallel collector

Compared with the serial collector, collection is no longer done by a single thread; multiple threads take part together. The amount of work is unchanged, but with several people working instead of one it finishes sooner, which reduces the waiting time of user threads.
The specific implementations are ParNew (for the young generation), Parallel Scavenge (for the young generation), and Parallel Old (for the old generation).

concurrent collector

User threads and collection threads can work at the same time (there are still pauses, but they are short). Its biggest feature is the absence of long pauses, which suits scenarios with heavy user interaction: users certainly don't want sudden long pauses while interacting, and a concurrent garbage collector reduces response time.
The specific implementation is CMS (for the old generation).

G1

ZGC

Thinking: What is STW?
STW: Stop The World means that during garbage collection all user threads are suspended, which shows up as a freeze.
Thinking: Does the copying algorithm also cause STW? Why is STW needed during GC?
All garbage collectors cause STW; the only difference is how long it lasts. When garbage is finally confirmed, consistency must be ensured: if the program kept running, the reference relationships would keep changing and the analysis result would be inaccurate. If copying ran without a pause, objects created during that period would not be marked as live, so instead of being moved to the survivor area they would be cleared; objects that are still strongly referenced would be destroyed and the program would go wrong. Moreover, the copying and mark-compact algorithms move objects, so the addresses of referenced objects change; pausing prevents the references from becoming inconsistent.
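
The point about reference relationships changing in the middle of the analysis can be made concrete with a deterministic, single-threaded simulation. LostObjectDemo and its methods are invented for this sketch; the "mutator" is a Runnable standing in for a user thread that runs once while marking is in progress.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class LostObjectDemo {

    static final class Obj {
        final String name;
        Obj ref;          // a single outgoing reference keeps the model small
        boolean marked;

        Obj(String name) { this.name = name; }
    }

    // Marks from `root`. After `pauseAfter` objects have been scanned, the mutator
    // runs once, simulating a user thread that was not stopped during marking.
    static void markWithInterleaving(Obj root, int pauseAfter, Runnable mutator) {
        Deque<Obj> pending = new ArrayDeque<>();
        pending.push(root);
        int scanned = 0;
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;
            o.marked = true;
            if (o.ref != null) pending.push(o.ref);
            if (++scanned == pauseAfter && mutator != null) mutator.run();
        }
    }

    // Starts with a -> b -> c and reports whether c ends up marked as live.
    static boolean isCMarked(boolean stopTheWorld) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.ref = b;
        b.ref = c;
        // The user thread moves the reference to c from b to a, so c stays reachable.
        Runnable mutator = () -> { a.ref = c; b.ref = null; };
        markWithInterleaving(a, 1, stopTheWorld ? null : mutator);
        return c.marked;
    }

    public static void main(String[] args) {
        System.out.println("with pause:    c marked = " + isCMarked(true));
        System.out.println("without pause: c marked = " + isCMarked(false));
    }
}
```

Without the pause, a has already been scanned when the user thread hands it the reference to c, and b no longer points to c by the time b is scanned. The object c is still reachable but never marked, so a sweep would wrongly destroy it.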

Examples of common garbage collectors

new generation

Serial

Only one thread uses the copying algorithm in the young generation to copy all surviving objects in the Eden area and the survivor From area to the survivor To area. STW is required during GC. It can work with the Serial Old garbage collector.

Parallel Scavenge

Starts multiple threads that use the copying algorithm in the young generation to copy all surviving objects in the Eden area and the survivor From area to the survivor To area. STW is required during GC. Because multiple threads work together, the STW time may be shorter.
Throughput can be tuned with two parameters: -XX:MaxGCPauseMillis, which sets a target for the maximum GC pause time, and -XX:GCTimeRatio, which directly controls the throughput. Throughput = program running time / (program running time + GC time). Improving throughput increases CPU utilization, but it has nothing to do with response speed and does not necessarily improve user experience.

Can work with the Serial Old or Parallel Old garbage collector; it cannot be paired with CMS.
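
The throughput formula above can be written out as a small helper. ThroughputDemo, throughput and gcShareGoal are hypothetical names; the MXBean calls come from the standard java.lang.management package. The sketch relies on -XX:GCTimeRatio=N meaning a goal of 1 / (1 + N) of total time spent in GC, and the "measured" number is only a rough estimate, since the collection time reported by a collector is approximate and is not necessarily all pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class ThroughputDemo {

    // Throughput = program running time / (program running time + GC time).
    static double throughput(long appMillis, long gcMillis) {
        return (double) appMillis / (appMillis + gcMillis);
    }

    // Share of total time allowed for GC with -XX:GCTimeRatio=n, i.e. 1 / (1 + n).
    static double gcShareGoal(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // -1 when the collector does not report it
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + t + " ms");
            if (t > 0) gcMillis += t;
        }
        // Uptime already includes GC time, so subtract it to get application time.
        System.out.println("estimated throughput: " + throughput(uptime - gcMillis, gcMillis));
        System.out.println("GC share goal for GCTimeRatio=19: " + gcShareGoal(19));
    }
}
```

For example, 99 ms of application work plus 1 ms of GC gives a throughput of 0.99, and GCTimeRatio=19 corresponds to a goal of spending at most 5% of the time in GC.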

ParNew

It also performs GC with multiple threads and differs only slightly from Parallel Scavenge (throughput cannot be set). It can work with the CMS garbage collector.

old generation

Serial Old

Garbage is collected by a single thread, and all user threads are suspended during collection. It is similar to Serial in the young generation above, but in the old generation the mark-compact algorithm is used, which does not cause memory fragmentation.

Parallel Old

Garbage is collected by multiple parallel threads, similar to Parallel Scavenge in the young generation, but in the old generation the mark-compact algorithm is used, which does not cause memory fragmentation.

CMS

The full name of CMS is Concurrent Mark Sweep, that is, concurrent mark-and-sweep.

CMS four stages

Initial mark: mark the objects directly referenced by the GC roots (the root nodes). Changes to the root nodes must be paused during this stage, so it is STW, but because there are relatively few GC roots, the STW time is short.
Concurrent mark: since the root nodes were already handled in the initial stage, this stage only needs to trace the references downward from them; objects that cannot be reached are garbage. It runs concurrently with user threads. There are usually far more nodes below the roots than roots themselves, so this stage is relatively time-consuming, but because it runs alongside user threads there is no STW. The disadvantage is that this stage consumes more CPU.
Remark: because user threads keep running while garbage is being scanned, the reference relationships may change in the meantime. This stage suspends all user threads for a final check and re-marks whatever changed during concurrent marking, so that nothing is missed because of concurrency. Although this stage is STW, it is short because only a few changes need to be processed.
Concurrent sweep: all garbage has been marked by now, and what remains is to clear it (using the mark-sweep algorithm). The sweeping threads can run alongside user threads.

The biggest feature of CMS is its short STW: the system freezes only briefly, which makes it suitable for applications that interact with users.
But CMS has two major disadvantages:

During concurrent sweeping the user threads are not suspended, so new data may enter the old generation (for example, large objects placed directly in the old generation). Collection therefore cannot wait until the old generation is full; it must start while a certain amount of space is still reserved, for example 10%. However, if the new data entering the old generation during the concurrent collection exceeds the reserved 10%, the old generation runs out of space, a Concurrent Mode Failure is triggered, and CMS degenerates into the Serial Old garbage collector: all user threads are suspended and garbage is collected by a single thread, causing serious lag. Too large a reserved space causes frequent Full GC, while too small a reserved space causes CMS collections to fail.
CMS uses the mark-sweep algorithm, so it generates memory fragmentation. When there are too many fragments, Serial Old is triggered and the single-threaded mark-compact algorithm is used to clear and defragment.
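
The reserved-space trade-off in the first disadvantage is simple arithmetic, sketched below. CmsHeadroomDemo and its methods are made-up names, sizes are in arbitrary units, and the model is deliberately pessimistic: it ignores the space that the sweep frees while the cycle is running. In HotSpot versions that still ship CMS, the starting occupancy is what the -XX:CMSInitiatingOccupancyFraction flag adjusts.

```java
class CmsHeadroomDemo {

    // Free space left in the old generation when a concurrent cycle starts
    // at the given occupancy percentage.
    static long headroom(long oldGenSize, int startOccupancyPercent) {
        return oldGenSize * (100 - startOccupancyPercent) / 100;
    }

    // True if everything that enters the old generation during the cycle fits
    // in the headroom; false models a Concurrent Mode Failure.
    static boolean cycleCompletes(long oldGenSize, int startOccupancyPercent,
                                  long enteredDuringCycle) {
        return enteredDuringCycle <= headroom(oldGenSize, startOccupancyPercent);
    }

    public static void main(String[] args) {
        long oldGen = 1000;  // arbitrary units, e.g. MB
        System.out.println("reserved at 90%: " + headroom(oldGen, 90));
        System.out.println("80 units arrive:  completes = " + cycleCompletes(oldGen, 90, 80));
        System.out.println("150 units arrive: completes = " + cycleCompletes(oldGen, 90, 150));
    }
}
```

Starting the cycle earlier, say at 70% occupancy, leaves more headroom and makes failure less likely, at the price of running collections more often.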

Thinking: Why doesn't CMS use the mark-compact algorithm to solve the fragmentation problem?
If the mark-compact algorithm were used, object addresses would change during concurrent cleanup, which would require suspending all user threads.

young generation + old generation

G1

Recommend reading this blog

The configuration and use of the garbage collectors, some JVM parameters and commands, and JVM tuning on Linux will be introduced in subsequent blogs. If you find an error, you are welcome to point it out and discuss it.