JVM Garbage Collection Series: Garbage Collection Algorithms

Preface

I'm guilty of putting this article off for a week. The reason for the recent gap in updates is that I have been planning the overall layout of my next series of articles, on "Microservices and Cloud Native". I have to say there is no end to what there is to learn in Java. Especially now that microservices have become mainstream, programmers are increasingly stratified, the bar for architects has reached a historically daunting height, and business developers are becoming ever more replaceable (uncomfortable~~~).

Introduction

This post introduces several common garbage collection algorithms used in the HotSpot virtual machine: mark-sweep, copying, mark-compact, and others.

Reference book: "In-depth understanding of Java Virtual Machine"

My personal Java knowledge-sharing project: Gitee address

My personal Java knowledge-sharing project: GitHub address

Mark-Sweep

The most basic collection algorithm is "Mark-Sweep". As its name suggests, the algorithm is divided into two phases, "mark" and "sweep":

  • Mark: first mark all the objects that need to be reclaimed (or, equivalently, mark the live ones). How reclaimable objects are identified was covered in the previous article.
  • Sweep: after marking completes, reclaim all the marked objects in one pass.

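The JVM first determines which objects can be reclaimed using reachability analysis from the GC Roots (in practice the traversal marks the reachable, live objects, and everything left unmarked is garbage), and then clears those objects. "Clearing" does not mean zeroing the memory: the addresses of the freed blocks are recorded in a free list so that later allocations can reuse them. The whole process is carried out by the virtual machine's daemon threads.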
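
To make the two phases concrete, here is a minimal Java sketch over a toy object graph. It assumes a simplified model (the classes HeapObject and MarkSweep are hypothetical, not HotSpot code) and follows the common variant in which the mark bit records the reachable objects and the sweep reclaims everything left unmarked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy heap object: outgoing references plus a mark bit (hypothetical model, not HotSpot's layout).
class HeapObject {
    final String name;
    final List<HeapObject> refs = new ArrayList<>();
    boolean marked;

    HeapObject(String name) { this.name = name; }
}

class MarkSweep {
    // Mark phase: trace everything reachable from the GC roots and set its mark bit.
    static void mark(List<HeapObject> roots) {
        Deque<HeapObject> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            HeapObject obj = stack.pop();
            if (obj.marked) continue;           // already visited (handles cycles)
            obj.marked = true;
            for (HeapObject child : obj.refs) {
                if (!child.marked) stack.push(child);
            }
        }
    }

    // Sweep phase: walk the WHOLE heap; unmarked objects are garbage, marked ones are reset for the next cycle.
    static List<HeapObject> sweep(List<HeapObject> heap) {
        List<HeapObject> reclaimed = new ArrayList<>();
        for (HeapObject obj : heap) {
            if (obj.marked) {
                obj.marked = false;
            } else {
                reclaimed.add(obj);
            }
        }
        heap.removeAll(reclaimed);              // stands in for adding the blocks to a free list
        return reclaimed;
    }

    public static void main(String[] args) {
        HeapObject a = new HeapObject("a");
        HeapObject b = new HeapObject("b");
        HeapObject c = new HeapObject("c");
        HeapObject d = new HeapObject("d");
        a.refs.add(b);                          // a -> b
        c.refs.add(d);                          // c <-> d form a cycle that no root reaches
        d.refs.add(c);
        List<HeapObject> heap = new ArrayList<>(List.of(a, b, c, d));
        mark(List.of(a));                       // a is the only GC root
        List<HeapObject> garbage = sweep(heap);
        System.out.println("reclaimed " + garbage.size() + ", live " + heap.size()); // reclaimed 2, live 2
    }
}
```

Note that mark() only touches reachable objects, while sweep() walks every object in the heap, and the unreachable cycle c <-> d is still reclaimed.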

Mark-sweep is called the most basic collection algorithm because the algorithms that followed are built on its idea and improve on its shortcomings. Those shortcomings are fairly obvious:

  1. Efficiency is low: when the heap holds many objects, both marking and sweeping do a lot of work, and the entire application must be stopped during collection, which hurts the user experience.
  2. The freed memory is not contiguous, which causes memory fragmentation, so a free list has to be maintained. With too much fragmentation, when the program later needs to allocate a large object, the JVM may be unable to find a contiguous block big enough and has to trigger another garbage collection earlier than necessary.
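
The fragmentation problem in point 2 can be illustrated with a hypothetical first-fit free-list allocator (FreeBlock and FreeListHeap are made-up names, and real collectors use more elaborate free-list policies). Twelve units are free in total, yet an 8-unit request fails because no single hole is large enough.

```java
import java.util.ArrayList;
import java.util.List;

// One free hole in a toy heap, covering addresses [start, start + size).
class FreeBlock {
    final int start;
    final int size;
    FreeBlock(int start, int size) { this.start = start; this.size = size; }
}

// Hypothetical free-list allocator of the kind mark-sweep leaves behind (first-fit policy assumed).
class FreeListHeap {
    final List<FreeBlock> freeList = new ArrayList<>();

    // Returns the start address of the allocated block, or -1 if no single hole is large enough.
    int allocate(int size) {
        for (int i = 0; i < freeList.size(); i++) {
            FreeBlock hole = freeList.get(i);
            if (hole.size >= size) {
                if (hole.size == size) {
                    freeList.remove(i);          // hole used up exactly
                } else {                         // shrink the hole from the front
                    freeList.set(i, new FreeBlock(hole.start + size, hole.size - size));
                }
                return hole.start;
            }
        }
        return -1;                               // a real VM would have to trigger another GC here
    }

    int totalFree() {
        int sum = 0;
        for (FreeBlock hole : freeList) sum += hole.size;
        return sum;
    }

    public static void main(String[] args) {
        FreeListHeap heap = new FreeListHeap();
        // After a sweep: three non-adjacent 4-unit holes, 12 units free in total.
        heap.freeList.add(new FreeBlock(0, 4));
        heap.freeList.add(new FreeBlock(10, 4));
        heap.freeList.add(new FreeBlock(20, 4));
        System.out.println("free = " + heap.totalFree());        // 12
        System.out.println("allocate(8) = " + heap.allocate(8)); // -1: no contiguous 8-unit hole
        System.out.println("allocate(3) = " + heap.allocate(3)); // 0
    }
}
```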

Copying Algorithm

To address the efficiency problem, the "Copying" algorithm was introduced. It divides the available memory into two halves of equal capacity and uses only one of them at a time (in the heap, these correspond to the two Survivor spaces introduced in the earlier article on the heap). When the half in use fills up, the surviving objects are copied to the other half, and the used half is then cleared in one go.
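
Below is a minimal sketch of a semispace copy in the style of Cheney's breadth-first algorithm, assuming a toy object model (CopyNode and SemiSpaceCopy are hypothetical names). The forwarding table ensures each live object is copied exactly once, and objects that are never reached are simply left behind in from-space.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy object for the copying example (hypothetical model).
class CopyNode {
    final String name;
    final List<CopyNode> refs = new ArrayList<>();
    CopyNode(String name) { this.name = name; }
}

class SemiSpaceCopy {
    // Copies everything reachable from `roots` into a new to-space (Cheney-style, breadth-first)
    // and rewrites `roots` to point at the copies. Anything not copied is left behind as garbage,
    // so the whole from-space can be cleared at once afterwards.
    static List<CopyNode> collect(List<CopyNode> roots) {
        List<CopyNode> toSpace = new ArrayList<>();
        Map<CopyNode, CopyNode> forwarding = new IdentityHashMap<>(); // old object -> its copy
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), toSpace, forwarding));
        }
        // `scan` chases the end of to-space: each copied object's fields still hold
        // from-space pointers until it is scanned here.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            CopyNode obj = toSpace.get(scan);
            for (int i = 0; i < obj.refs.size(); i++) {
                obj.refs.set(i, copy(obj.refs.get(i), toSpace, forwarding));
            }
        }
        return toSpace;
    }

    // Copies `old` once; later calls return the same copy via the forwarding table.
    private static CopyNode copy(CopyNode old, List<CopyNode> toSpace, Map<CopyNode, CopyNode> forwarding) {
        CopyNode copied = forwarding.get(old);
        if (copied == null) {
            copied = new CopyNode(old.name);
            copied.refs.addAll(old.refs);        // still from-space pointers, fixed during the scan
            forwarding.put(old, copied);
            toSpace.add(copied);                 // appended at the end: to-space stays contiguous
        }
        return copied;
    }

    public static void main(String[] args) {
        CopyNode a = new CopyNode("a");
        CopyNode b = new CopyNode("b");
        CopyNode dead = new CopyNode("dead");
        a.refs.add(b);
        b.refs.add(a);                           // cycle: the forwarding table copies each object once
        dead.refs.add(a);                        // points into live data but is itself unreachable
        List<CopyNode> roots = new ArrayList<>(List.of(a));
        List<CopyNode> toSpace = collect(roots);
        System.out.println("copied " + toSpace.size() + " live objects"); // copied 2 live objects
    }
}
```

The work done is proportional to the number of live objects copied; dead objects are never touched, which is why copying suits regions where most objects die young.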


Today's commercial virtual machines all use this algorithm to collect the young generation. Research by IBM shows that 98% of young-generation objects "die young" (they become garbage shortly after being created), so there is no need to split memory 1:1. Instead, memory is divided into one larger Eden space and two smaller Survivor spaces, and each time Eden plus one of the Survivor spaces is in use [1]. During collection, the objects still alive in Eden and that Survivor are copied into the other Survivor space in one pass, and then Eden and the Survivor just used are cleared. HotSpot's default Eden-to-Survivor ratio is 8:1, so the usable space in the young generation is 90% (80% + 10%) of its total capacity and only 10% is "wasted". Of course, the 98% figure only holds in typical scenarios; there is no guarantee that no more than 10% of objects survive each collection. When the other Survivor space is not big enough, other memory (here, the old generation) has to provide an allocation guarantee (Handle Promotion).
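
The 8:1 arithmetic can be checked with a few lines of Java. The split method below is a hypothetical helper that mirrors the meaning of HotSpot's -XX:SurvivorRatio option (Eden size divided by one Survivor size); it ignores the alignment rounding a real JVM applies, and the 10 MB young generation is an assumed example.

```java
class YoungGenLayout {
    // survivorRatio = Eden size / one Survivor size, so the young gen splits as ratio : 1 : 1.
    // Returns {eden, survivor0, survivor1} in bytes (no alignment rounding, unlike a real JVM).
    static long[] split(long youngGenBytes, int survivorRatio) {
        long survivor = youngGenBytes / (survivorRatio + 2);
        long eden = youngGenBytes - 2 * survivor;
        return new long[] {eden, survivor, survivor};
    }

    public static void main(String[] args) {
        long young = 10L * 1024 * 1024;       // assumed 10 MB young generation
        long[] parts = split(young, 8);
        long usable = parts[0] + parts[1];    // Eden + one Survivor are in use at any time
        System.out.printf("Eden=%d S0=%d S1=%d usable=%.0f%%%n",
                parts[0], parts[1], parts[2], 100.0 * usable / young);
        // prints "Eden=8388608 S0=1048576 S1=1048576 usable=90%"
    }
}
```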

The allocation guarantee is like borrowing money from a bank. If we have a good credit record and repay on time in 98% of cases, the bank may assume by default that we will repay the next loan on time and in full. It only needs a guarantor who agrees that, if we cannot repay, the money can be deducted from the guarantor's account; with that, the bank sees no risk. The memory allocation guarantee works the same way: if the other Survivor space does not have enough room for the objects that survived the last young-generation collection, those objects go directly into the old generation through the allocation guarantee mechanism.
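
The allocation guarantee can be sketched as a toy minor GC. In this hypothetical model (YoungObject and MinorGc are made-up names, liveness is given up front instead of computed, and sizes are abstract units), live objects from Eden and the "from" Survivor are copied into the empty "to" Survivor, and anything that does not fit goes straight to the old generation. Real HotSpot also promotes objects based on their age, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy young-generation object; `live` stands in for the result of reachability analysis.
class YoungObject {
    final String name;
    final int size;
    final boolean live;
    YoungObject(String name, int size, boolean live) {
        this.name = name; this.size = size; this.live = live;
    }
}

// Hypothetical minor GC: Eden + "from" Survivor are copied into the empty "to" Survivor;
// whatever does not fit is promoted to the old generation (the allocation guarantee).
class MinorGc {
    final List<YoungObject> eden = new ArrayList<>();
    List<YoungObject> fromSurvivor = new ArrayList<>();
    final List<YoungObject> oldGen = new ArrayList<>();
    final int survivorCapacity;

    MinorGc(int survivorCapacity) { this.survivorCapacity = survivorCapacity; }

    void collect() {
        List<YoungObject> toSurvivor = new ArrayList<>();
        int used = 0;
        List<YoungObject> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (YoungObject obj : candidates) {
            if (!obj.live) continue;                  // garbage is simply not copied
            if (used + obj.size <= survivorCapacity) {
                toSurvivor.add(obj);                  // fits: copy into the "to" Survivor
                used += obj.size;
            } else {
                oldGen.add(obj);                      // Survivor full: the old generation takes it
            }
        }
        eden.clear();                                 // Eden and the old "from" Survivor are wiped at once
        fromSurvivor = toSurvivor;                    // the two Survivors swap roles for the next GC
    }

    public static void main(String[] args) {
        MinorGc gc = new MinorGc(10);                 // Survivor capacity: 10 units (assumed)
        gc.eden.add(new YoungObject("A", 6, true));
        gc.eden.add(new YoungObject("B", 5, true));   // 6 + 5 > 10, so B cannot fit
        gc.eden.add(new YoungObject("C", 20, false)); // dead, never copied
        gc.collect();
        System.out.println("survivor=" + gc.fromSurvivor.size() + " old=" + gc.oldGen.size()); // survivor=1 old=1
    }
}
```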

Advantages and disadvantages of the copying algorithm:

  • Advantages: there are no separate mark and sweep phases (live objects are traced and copied in the same pass), so the implementation is simple and runs efficiently. After copying, the occupied space is contiguous, so there is no fragmentation problem.
  • Disadvantages: the drawback is also obvious: it needs twice the memory space. In addition, for a collector such as G1, whose heap is split into a large number of regions, copying objects to new locations means the GC has to maintain the references between objects in different regions, which is costly in both memory usage and time.

Mark-Compact

When the object survival rate is high, the copying algorithm has to perform many copy operations and becomes inefficient. More importantly, unless you are willing to waste 50% of the space, you need extra space as an allocation guarantee to cope with the extreme case in which 100% of the objects in the used memory are alive, so the copying algorithm generally cannot be chosen directly for the old generation. Mark-Compact was proposed to solve this problem. Its marking phase is the same as in mark-sweep, but instead of cleaning up the reclaimable objects in place, it moves all surviving objects toward one end of the space and then directly clears the memory beyond that boundary.
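
The compaction step can be sketched as a sliding (Lisp-2 style) compactor over a toy heap in which every object occupies one slot and references are slot indices. Cell and MarkCompact are hypothetical names and the model is deliberately simplified: it marks, computes each live object's new address, fixes up all references, then moves the objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy heap cell: every object fills exactly one slot and references other objects by slot index.
class Cell {
    final String name;
    final int[] refs;      // addresses (slot indices) of referenced objects
    boolean marked;
    int forward = -1;      // new address computed before moving

    Cell(String name, int... refs) { this.name = name; this.refs = refs; }
}

class MarkCompact {
    // Compacts `heap` in place and rewrites `roots` (also slot indices).
    // Returns the first free slot after compaction.
    static int collect(Cell[] heap, int[] roots) {
        // 1. Mark: same as in mark-sweep.
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) stack.push(r);
        while (!stack.isEmpty()) {
            Cell cell = heap[stack.pop()];
            if (cell.marked) continue;
            cell.marked = true;
            for (int ref : cell.refs) stack.push(ref);
        }
        // 2. Compute new addresses: live objects slide toward slot 0, keeping their order.
        int free = 0;
        for (Cell cell : heap) {
            if (cell != null && cell.marked) cell.forward = free++;
        }
        // 3. Update references held by roots and by live objects.
        for (int i = 0; i < roots.length; i++) roots[i] = heap[roots[i]].forward;
        for (Cell cell : heap) {
            if (cell != null && cell.marked) {
                for (int j = 0; j < cell.refs.length; j++) cell.refs[j] = heap[cell.refs[j]].forward;
            }
        }
        // 4. Move live objects and clear everything else. forward <= i, so a move never
        //    overwrites a live object that has not been moved yet.
        for (int i = 0; i < heap.length; i++) {
            Cell cell = heap[i];
            if (cell == null) continue;
            heap[i] = null;
            if (cell.marked) {
                heap[cell.forward] = cell;
                cell.marked = false;
                cell.forward = -1;
            }
        }
        return free;
    }

    public static void main(String[] args) {
        Cell[] heap = {
            new Cell("A", 3),   // slot 0: A -> B
            null,               // slot 1: already free
            new Cell("D", 0),   // slot 2: garbage (nothing points to D)
            new Cell("B", 0),   // slot 3: B -> A
            null,               // slot 4
            new Cell("C")       // slot 5: C, a root
        };
        int[] roots = {0, 5};
        int top = collect(heap, roots);
        // Live objects now occupy slots 0..top-1 in their original order.
        System.out.println("top=" + top + " roots=" + roots[0] + "," + roots[1]); // top=3 roots=0,2
    }
}
```

Step 3 is where the extra cost of moving objects shows up: every reference to a moved object has to be rewritten.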
Pros and cons of this algorithm:

  • Advantages: it eliminates the scattered free memory left by the mark-sweep algorithm. When memory is allocated for a new object, the JVM only needs to keep the start address of the free memory. It also avoids the high cost of halving the usable memory in the copying algorithm.
  • Disadvantages: in terms of efficiency, mark-compact is slower than copying. When an object is moved, every reference to it from other objects must be updated to the new address, and the user application must be paused for the entire moving process, i.e. Stop The World (STW).
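
With the live objects packed at one end, allocating a new object only needs a pointer to the start of the free area, often called bump-the-pointer allocation. A minimal sketch, assuming a hypothetical BumpAllocator class and abstract size units:

```java
// Hypothetical bump-the-pointer allocator over a compacted region:
// [0, top) holds live data, [top, capacity) is one contiguous free block.
class BumpAllocator {
    private final int capacity;
    private int top;

    BumpAllocator(int capacity, int top) {
        this.capacity = capacity;
        this.top = top;
    }

    // Allocation is a bounds check plus a pointer increment; returns -1 when the region is full.
    int allocate(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        if (size > capacity - top) return -1;   // written this way to avoid int overflow
        int address = top;
        top += size;
        return address;
    }

    public static void main(String[] args) {
        BumpAllocator alloc = new BumpAllocator(10, 3);  // e.g. 3 live slots left after compaction
        System.out.println(alloc.allocate(4));  // 3
        System.out.println(alloc.allocate(2));  // 7
        System.out.println(alloc.allocate(2));  // -1: only 1 slot left
    }
}
```

Compared with the free-list search that mark-sweep needs, this is a constant-time operation, which is the practical payoff of compaction.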

Generational Collection Algorithm

None of the algorithms above can completely replace the others; each has its own advantages and characteristics.

This is why the generational collection algorithm came into being. It is based on the fact that different objects have different life cycles, so objects with different life cycles can be collected in different ways to improve efficiency. The Java heap is generally divided into a young generation and an old generation, so that each generation can use the collection algorithm that suits its characteristics and garbage collection becomes more efficient.

In HotSpot, based on the concept of generations, the collection algorithms used by the GC must take into account the characteristics of both the young generation and the old generation.

  • Young generation (Young Gen)
    • Characteristics of the young generation: the area is smaller than the old generation, objects are short-lived with a low survival rate, and collections are frequent.
    • In this case, the copying algorithm reclaims memory the fastest. Its cost depends only on the size of the surviving objects, which makes it well suited to young-generation collection. The copying algorithm's poor memory utilization is mitigated in HotSpot by the two-Survivor design.
  • Old generation (Tenured Gen)
    • Characteristics of the old generation: the area is larger, objects are long-lived with a high survival rate, and collections are less frequent than in the young generation.
    • Here a large number of objects survive, so the copying algorithm is clearly unsuitable. The old generation is generally collected with a mix of mark-sweep and mark-compact.
      • The cost of the Mark phase is proportional to the number of surviving objects.
      • The cost of the Sweep phase is positively correlated with the size of the managed area.
      • The cost of the Compact phase is proportional to the amount of data in the surviving objects.

Incremental Collection Algorithm

Processing all the garbage in one go would pause the system for a long time, so the garbage collection thread and the application thread can instead take turns. Each time, the garbage collection thread collects only a small area of memory and then switches back to the application thread, repeating until the collection is complete.

  • Advantages: by properly handling conflicts between threads, incremental collection lets the garbage collection thread complete its marking, sweeping, or copying work in stages. Because application code keeps running intermittently during collection, system pause times are reduced.
  • Disadvantages: the cost of thread switching and context switching raises the total cost of garbage collection, which lowers system throughput.
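
The interleaving can be sketched as incremental marking with a work budget per slice. IncNode and IncrementalMarker are hypothetical names, and the sketch assumes the object graph does not change between slices. A real incremental collector must cope with the application modifying references while marking is paused (the "conflicts between threads" mentioned above), typically with a write barrier, which is omitted here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy object for the incremental example (hypothetical model).
class IncNode {
    final List<IncNode> refs = new ArrayList<>();
    boolean marked;
}

class IncrementalMarker {
    // Objects already marked whose children have not been scanned yet ("grey" objects).
    private final Deque<IncNode> grey = new ArrayDeque<>();

    IncrementalMarker(List<IncNode> roots) {
        for (IncNode root : roots) shade(root);
    }

    private void shade(IncNode node) {
        if (!node.marked) {
            node.marked = true;
            grey.push(node);
        }
    }

    // One GC slice: scan at most `budget` grey objects, then return control to the application.
    // Returns true once marking is complete.
    boolean step(int budget) {
        for (int done = 0; done < budget && !grey.isEmpty(); done++) {
            IncNode node = grey.pop();
            for (IncNode child : node.refs) shade(child);
        }
        return grey.isEmpty();
    }

    public static void main(String[] args) {
        List<IncNode> chain = new ArrayList<>();
        for (int i = 0; i < 10; i++) chain.add(new IncNode());
        for (int i = 0; i < 9; i++) chain.get(i).refs.add(chain.get(i + 1));
        IncrementalMarker marker = new IncrementalMarker(List.of(chain.get(0)));
        int slices = 0;
        boolean finished = false;
        while (!finished) {
            finished = marker.step(3);   // one short GC slice
            slices++;
            // ...the application thread would run here between slices...
        }
        System.out.println("marking finished after " + slices + " slices"); // 4 slices
    }
}
```

Each slice is short, but the same total marking work is now split into several pieces with a switch between each, which is where the throughput cost comes from.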

Finally, a summary of the pros and cons of the three common algorithms:

| | Mark-Sweep | Mark-Compact | Copying |
|---|---|---|---|
| Speed | Medium | Slowest | Fastest |
| Space overhead | Low (but accumulates fragmentation) | Low (no fragmentation) | Typically 2x the size of the live objects (no fragmentation) |
| Moves objects | No | Yes | Yes |


Origin blog.csdn.net/a_ittle_pan/article/details/127038570