Detailed explanation of garbage collection algorithm of JVM


Existing Garbage Collection Algorithms

Classification

According to how they determine whether an object is garbage, garbage collection algorithms fall into two categories:

1. Reference-counting garbage collection (garbage is identified by a per-object reference counter). Alias: direct garbage collection.
2. Tracing garbage collection (garbage is identified by reachability from the GC Roots). Alias: indirect garbage collection.
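The difference can be sketched with a toy reachability trace in Java; the `Node` class and the graph below are illustrative only, not JVM internals. Two objects that reference each other in a cycle keep each other's reference count above zero forever, yet a trace starting from the GC Roots never reaches them, so a tracing collector still reclaims them:

```java
import java.util.*;

public class TracingDemo {
    // Toy object graph node; names are illustrative, not JVM internals.
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Depth-first "mark" traversal starting from the GC roots:
    // everything reached is live, everything else is garbage.
    static Set<String> mark(List<Node> roots) {
        Set<String> live = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (live.add(n.name)) stack.addAll(n.refs);
        }
        return live;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node a = new Node("a");
        root.refs.add(a);                       // root -> a: reachable

        // b and c form a cycle that nothing reaches: their reference
        // counts never drop to zero, yet both are garbage.
        Node b = new Node("b"), c = new Node("c");
        b.refs.add(c);
        c.refs.add(b);

        Set<String> live = mark(List.of(root));
        System.out.println(live.contains("a")); // reachable
        System.out.println(live.contains("b")); // unreachable despite the cycle
    }
}
```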

Mainstream virtual machines use the second approach, tracing garbage collection, so this article explains the algorithms of that family.

Garbage Collector Design Principles

The design rests on two generational hypotheses:

1. Most objects cannot survive the first garbage collection

2. Objects that have already survived multiple garbage collections are unlikely to become garbage.

The garbage collector therefore divides heap memory into different regions and places objects into them according to their generational age (the number of garbage collections they have survived):

For objects with a low generational age (the first kind), the collector should mark the surviving objects rather than the garbage, because by the first hypothesis most of these objects are short-lived garbage and marking them individually would be wasted work.

For objects with a higher generational age (the second kind), it is the garbage objects that should be marked, because by the second hypothesis the proportion of garbage among these objects is very small; the collection frequency for this region can also be reduced.

Once the heap is divided into regions, the garbage collector can reclaim just part of the heap at a time, and a different algorithm can be used in each region.

Generally speaking, the heap is divided into at least two regions: the young generation and the old generation. The young generation holds objects of the first kind, and the old generation holds objects of the second kind.
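On HotSpot this young/old split can be sized explicitly with the standard command-line flags; a sketch (the jar name is hypothetical):

```shell
# Standard HotSpot sizing flags: a fixed 512 MB heap, of which 128 MB
# is the young generation; the remaining 384 MB is the old generation.
java -Xms512m -Xmx512m -Xmn128m -jar app.jar
```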

Note: this design looks perfect, but if an object in the old generation references an object in the young generation, then when the young generation is collected, besides traversing the GC Roots, the entire old generation would also have to be traversed to be sure that no young-generation object is still referenced. Traversing the whole old generation is obviously inefficient, so a solution (the remembered set / card table) is adopted instead; interested readers can see the end of this blog post.

Mark-Sweep Algorithm

The earliest garbage collection algorithm; the algorithms that appeared later all evolved to address its shortcomings.

Two phases: 1. mark, 2. sweep. The objects to be reclaimed are marked and then all marked objects are reclaimed in one pass; alternatively, the surviving objects can be marked and everything unmarked reclaimed in one pass.

1. Mark: how an object is determined to be garbage was explained in the previous blog post; those garbage objects are then marked.

2. Sweep: all marked objects are reclaimed in one pass.
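The two phases can be sketched over a toy heap in Java (all names are illustrative; a real collector works on raw memory, not a Java list):

```java
import java.util.*;

public class MarkSweepDemo {
    // Toy heap cell; 'marked' is set during the mark phase.
    static class Cell {
        final int id;
        final List<Cell> refs = new ArrayList<>();
        boolean marked;
        Cell(int id) { this.id = id; }
    }

    public static void main(String[] args) {
        List<Cell> heap = new ArrayList<>();
        for (int i = 0; i < 5; i++) heap.add(new Cell(i));
        heap.get(0).refs.add(heap.get(1));        // 0 -> 1
        heap.get(1).refs.add(heap.get(3));        // 1 -> 3
        List<Cell> roots = List.of(heap.get(0));  // cell 0 is a GC root

        // Phase 1: mark everything reachable from the roots.
        Deque<Cell> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Cell c = stack.pop();
            if (!c.marked) { c.marked = true; stack.addAll(c.refs); }
        }

        // Phase 2: sweep -- drop every unmarked cell, leave survivors
        // where they are (which is what causes fragmentation).
        heap.removeIf(c -> !c.marked);

        heap.forEach(c -> System.out.println(c.id));
    }
}
```

Cells 2 and 4 are unreachable and removed; cells 0, 1, 3 survive in place.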

Disadvantages

1. When most of the objects in the heap are garbage, marking and sweeping become inefficient, and efficiency keeps dropping as the number of garbage objects in memory grows.

2. Memory fragmentation: because reclaimed memory is not contiguous, a large number of fragments remain after sweeping. When a large object is allocated and no sufficiently large contiguous block can be found to hold it, a GC is triggered early.

Mark-Copy Algorithm

It is implemented with a "semi-space copying" scheme: only one half of the memory is used at a time. When that half is exhausted, the surviving objects are copied to the other half and the just-used half is cleared; when the other half fills up, the previously cleared half is used again, back and forth.

This solves the fragmentation problem of mark-sweep, because at each GC the used half is cleared entirely and only the surviving objects are copied into the other half.
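The semi-space scheme can be sketched with two arrays standing in for the two halves (illustrative only; `null` marks a garbage slot):

```java
import java.util.*;

public class CopyingDemo {
    public static void main(String[] args) {
        // from-space holds the objects; null marks a dead (garbage) slot.
        String[] from = { "a", null, "b", null, "c" };
        String[] to   = new String[from.length];

        // Copy survivors contiguously into to-space: because they are
        // laid out one after another, no fragmentation remains.
        int next = 0;
        for (String obj : from) {
            if (obj != null) to[next++] = obj;
        }

        // Swap roles: to-space becomes the active half, from-space is wiped.
        Arrays.fill(from, null);

        System.out.println(Arrays.toString(Arrays.copyOf(to, next)));
    }
}
```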


The "Appel-style" Collection Strategy

Andrew Appel proposed the "Appel-style" collection strategy based on the first of the generational hypotheses above.

In the common case, about 98% of objects are cleared in their first GC. As an optimization, the young generation is therefore divided into one Eden space and two Survivor spaces, with an Eden : Survivor ratio of 8 : 1 (8 : 1 : 1 overall). Each round uses 90% of the memory (Eden plus one Survivor) and wastes only 10%; since most objects die quickly, there is no need to reserve half the space for survivors.
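The 8 : 1 : 1 arithmetic as a quick check (`SurvivorRatio=8` is the HotSpot default that produces this split):

```java
public class AppelRatio {
    public static void main(String[] args) {
        int eden = 8, survivor = 1;          // Eden : S0 : S1 = 8 : 1 : 1
        int total = eden + 2 * survivor;     // 10 parts in all
        int usable = eden + survivor;        // Eden + the one active Survivor
        System.out.println(100 * usable / total + "%");
    }
}
```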

But with only 10% of the memory reserved for survivors, the surviving objects may not all fit into a Survivor space. In that case the old generation acts as an allocation guarantee: objects that do not fit are placed into the old generation.

Disadvantages

1. When the survival rate of objects in memory is high, copying a large number of surviving objects makes the algorithm inefficient.
2. Unless severe memory waste is acceptable, extra space is needed as an allocation guarantee.

Mark-Compact Algorithm

The mark-copy algorithm described above targets collections in the young generation and is not suitable for the old generation: the survival rate there is high, and extra memory would be needed as a guarantee whenever the survivors do not fit.

Hence the mark-compact algorithm. Like mark-copy, it marks the surviving objects; unlike mark-copy, which copies survivors to another block of memory and then clears the previously used block, mark-compact moves the surviving objects toward one end of the same block and then clears the memory beyond the boundary.
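The compaction step can be sketched as an in-place slide toward one end (a real collector must also rewrite every reference to the moved objects, which this toy version elides):

```java
import java.util.*;

public class CompactDemo {
    public static void main(String[] args) {
        // A single heap array; null = garbage slot left by the mark phase.
        String[] heap = { null, "x", null, "y", "z", null };

        // Slide live objects to the low end of the same block...
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) heap[free++] = heap[i];
        }
        // ...then clear everything beyond the new boundary.
        for (int i = free; i < heap.length; i++) heap[i] = null;

        System.out.println(Arrays.toString(heap));
    }
}
```

After compaction the live objects are contiguous and the free region is one contiguous block, so no fragmentation remains.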

Disadvantages

Moving surviving objects: after a GC in the old generation, most objects survive; moving all of them and updating every reference to them is a fairly time-consuming operation. Moreover, updating references requires pausing the user threads so that they never access objects through stale references. This pause is called STW, "Stop the World".

In fact, STW is required not only when mark-compact moves objects and updates references: the mark phases of mark-sweep and mark-copy also require STW, just for a shorter time.


Summary

Using the mark-compact algorithm means objects must be moved and their references updated during GC, so memory reclamation is more complicated.

Using the mark-sweep algorithm means memory fragmentation.

Using the mark-copy algorithm means low memory utilization.

"Throughput": The sum of the efficiency of the evaluator (the thread using the garbage collector, which is also the user thread) and the garbage collector.

Because memory allocation and access happen far more frequently than garbage collection (fragmentation makes allocation and access more expensive on every operation, while collection, though it requires a time-consuming STW pause, is comparatively rare), the overall throughput of mark-compact is actually better than that of mark-sweep.

Parallel Scavenge, a throughput-focused collector, is based on the mark-compact algorithm, while CMS, a collector focused on short STW pauses, uses the mark-sweep algorithm. In fact, CMS combines the two algorithms: most of the time it uses mark-sweep, and only when an allocation fails because fragmentation has become particularly severe does it fall back to a mark-compact collection.


Origin juejin.im/post/7084419733474246693