JVM Quest: garbage collection algorithms

This series consists of reading notes based mainly on the book "In-Depth Understanding of the Java Virtual Machine: JVM Advanced Features and Best Practices, 2nd Edition".

Garbage collection algorithms

The main garbage collection algorithms are mark-sweep, copying, mark-compact, and generational collection. These notes do not go into how each one is implemented and only introduce the design ideas behind them.

Mark-sweep algorithm

The most basic collection algorithm is mark-sweep (Mark-Sweep). As its name suggests, it has two phases, "mark" and "sweep": first all objects that need to be reclaimed are marked, and once marking is complete all marked objects are reclaimed in one pass. The marking process is the process, described in the previous article, of deciding whether an object is "dead": the reference counting algorithm or the reachability analysis algorithm determines whether an object should be reclaimed.

Mark-sweep is called the most basic algorithm because the other algorithms were all derived by improving on its shortcomings. It has two main drawbacks. One is efficiency: neither the marking phase nor the sweeping phase is very efficient. The other is space: sweeping leaves behind a large number of non-contiguous memory fragments, and with too much fragmentation a later allocation of a large object may fail to find a big enough contiguous block and have to trigger another garbage collection early.

The execution of the mark-sweep algorithm is shown in the figure below:

image
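To make the two phases concrete, here is a minimal sketch of mark-sweep over a simulated heap. The class and method names (MarkSweepDemo, Obj, mark, sweep) are made up for illustration and say nothing about how a real JVM lays out memory. The sketch marks the reachable objects and sweeps the unmarked ones, which has the same effect as marking the dead objects directly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy mark-sweep over a simulated heap (hypothetical names, illustration only). */
class MarkSweepDemo {
    /** A simulated heap object with outgoing references and a mark bit. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Mark phase: flag every object reachable from the roots. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;      // already visited, also handles cycles
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Sweep phase: remove unmarked objects and reset marks on the survivors. */
    static List<String> sweep(List<Obj> heap) {
        List<String> reclaimed = new ArrayList<>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;        // ready for the next collection
            } else {
                reclaimed.add(o.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    /** One full collection; returns the names of the reclaimed objects. */
    static List<String> collect(List<Obj> heap, List<Obj> roots) {
        mark(roots);
        return sweep(heap);
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);                   // a -> b, reachable from the root a
        c.refs.add(d);                   // c <-> d form a cycle that no root reaches
        d.refs.add(c);
        List<Obj> heap = new ArrayList<>(List.of(a, b, c, d));
        System.out.println(collect(heap, List.of(a))); // prints [c, d]
    }
}
```

Note that the sweep only removes entries and never moves the survivors, which is exactly where the fragmentation problem described above comes from.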

Copying algorithm

The copying (Copying) algorithm was created to solve the efficiency problem of mark-sweep. It divides the available memory into two halves of equal size and uses only one of them at a time. When that half runs out, the objects that are still alive are copied to the other half, and the used half is then cleaned up in one go. Each collection therefore reclaims an entire half, and allocation does not have to worry about fragmentation: it only needs to move the heap-top pointer and allocate sequentially, which is simple and efficient. The price is that the usable memory shrinks to half of the original, which is costly.

The execution of the copying algorithm is shown in the figure below:

image
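The following is a small sketch of the idea, assuming a simulated heap measured in abstract units. SemispaceDemo, Obj and the live flag (a stand-in for real reachability analysis) are hypothetical names, not part of any JVM API. It shows bump-pointer allocation in the active half and a collection that packs the survivors into the other half.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy semispace (copying) collector; all names are illustrative assumptions. */
class SemispaceDemo {
    static final class Obj {
        final int size;
        int address;            // offset inside the currently active half
        boolean live = true;    // stand-in for "still reachable"
        Obj(int size) { this.size = size; }
    }

    final int halfSize;                        // only half of the memory is usable
    List<Obj> activeHalf = new ArrayList<>();
    int top = 0;                               // bump pointer into the active half

    SemispaceDemo(int totalSize) { this.halfSize = totalSize / 2; }

    /** Bump-pointer allocation; collects once if the active half is full. */
    Obj allocate(int size) {
        if (top + size > halfSize) collect();
        if (top + size > halfSize) return null; // still no room after collecting
        Obj o = new Obj(size);
        o.address = top;
        top += size;
        activeHalf.add(o);
        return o;
    }

    /** Copy survivors into the empty half, packed from offset 0, then swap roles. */
    void collect() {
        List<Obj> otherHalf = new ArrayList<>();
        int newTop = 0;
        for (Obj o : activeHalf) {
            if (!o.live) continue;             // dead objects are simply left behind
            o.address = newTop;
            newTop += o.size;
            otherHalf.add(o);
        }
        activeHalf = otherHalf;                // the old half is now entirely free
        top = newTop;
    }

    public static void main(String[] args) {
        SemispaceDemo heap = new SemispaceDemo(100); // two halves of 50 units
        Obj a = heap.allocate(20);
        Obj b = heap.allocate(20);
        b.live = false;                        // b becomes garbage
        Obj c = heap.allocate(20);             // does not fit, triggers a collection
        System.out.println(a.address + " " + c.address + " " + heap.top); // prints 0 20 40
    }
}
```

After the collection the survivors are contiguous, so the next allocation is again a simple pointer bump.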

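Many modern commercial virtual machines use this algorithm to collect the young generation. Studies have shown that 98% of the objects in the young generation "are born in the morning and die by evening", so there is no need to split the memory 1:1. Instead, memory is divided into one larger Eden space and two smaller Survivor spaces, and at any time Eden and one of the Survivor spaces are in use. During a collection, the objects still alive in Eden and in that Survivor space are copied in one go to the other Survivor space, and then Eden and the Survivor space that was just in use are cleared.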

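In the HotSpot virtual machine the ratio of Eden to a Survivor space is 8:1, that is, the young generation is split into three parts in the ratio 8:1:1. At any time 90% of the young generation is usable and only 10% is "wasted".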
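The arithmetic behind the 8:1:1 split can be sketched as follows. YoungGenLayout and its methods are hypothetical helpers. The ratio parameter mirrors the meaning of HotSpot's -XX:SurvivorRatio option (Eden size divided by the size of one Survivor space), but a real JVM also applies alignment, so actual sizes can differ slightly.

```java
/** Back-of-the-envelope young generation layout (illustrative helper, not a JVM API). */
class YoungGenLayout {
    /** Returns {eden, survivor0, survivor1} in bytes for the given Eden:Survivor ratio. */
    static long[] split(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2); // ratio parts Eden + 2 Survivor parts
        long eden = youngBytes - 2 * survivor;
        return new long[] { eden, survivor, survivor };
    }

    /** Fraction of the young generation usable at once: Eden plus one Survivor space. */
    static double usableFraction(int survivorRatio) {
        return (survivorRatio + 1.0) / (survivorRatio + 2.0);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] parts = split(100 * mb, 8);
        System.out.println(parts[0] / mb + " " + parts[1] / mb + " " + parts[2] / mb); // prints 80 10 10
        System.out.println(usableFraction(8)); // prints 0.9
    }
}
```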

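Of course, 98% of objects being reclaimable is only a figure for typical scenarios. There is no guarantee that no more than 10% of the objects survive each collection, so when the Survivor space is not large enough, other memory (the old generation) has to provide an allocation guarantee. The allocation guarantee means that when the other Survivor space does not have enough room for the objects that survived a young generation collection, those objects go directly into the old generation through the allocation guarantee mechanism.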
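A toy model of that fallback, under heavily simplified assumptions: survivors are handled one by one, those that still fit go to the Survivor space, and the rest are promoted to the old generation. PromotionDemo, Result and minorGc are hypothetical names, and HotSpot's real promotion policy (object ages, guarantee checks before the collection) is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the allocation guarantee; names and policy are simplifying assumptions. */
class PromotionDemo {
    /** Where the survivors of one simulated young collection ended up (sizes only). */
    static final class Result {
        final List<Integer> toSurvivor = new ArrayList<>();
        final List<Integer> toOldGen = new ArrayList<>();
    }

    /** Places each surviving object in the Survivor space if it fits, else in the old generation. */
    static Result minorGc(List<Integer> survivingSizes, int survivorCapacity) {
        Result r = new Result();
        int used = 0;
        for (int size : survivingSizes) {
            if (used + size <= survivorCapacity) {
                used += size;
                r.toSurvivor.add(size);
            } else {
                r.toOldGen.add(size);   // the old generation acts as the guarantor
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = minorGc(List.of(4, 3, 5, 2), 10);
        System.out.println(r.toSurvivor + " " + r.toOldGen); // prints [4, 3, 2] [5]
    }
}
```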

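Mark-compact algorithm

When the object survival rate is high, the copying algorithm has to perform a great many copy operations and its efficiency drops. Moreover, if you do not want to waste 50% of the space, extra space is needed as an allocation guarantee, so this algorithm is generally not used in the old generation.

The mark-compact (Mark-Compact) algorithm was designed around the characteristics of the old generation. Its marking phase is the same as in mark-sweep, but the next step does not sweep the reclaimable objects directly. Instead, all surviving objects are moved toward one end, and the memory beyond the boundary is then cleared.

The execution of the mark-compact algorithm is shown in the figure below: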

image
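The compaction step can be sketched on a simulated heap of slots, assuming the mark phase has already run. MarkCompactDemo and compact are illustrative names. A real collector would also have to update every reference that points at a moved object, which this sketch leaves out.

```java
import java.util.Arrays;

/** Toy sliding compaction; assumes marking is already done (illustrative names). */
class MarkCompactDemo {
    /**
     * heap[i] holds an object label, or null if the slot is free; marked[i] is its mark bit.
     * Slides marked objects toward index 0, clears the rest, and returns the new boundary.
     */
    static int compact(String[] heap, boolean[] marked) {
        int free = 0;                                // next slot to fill at the "live" end
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[free++] = heap[i];              // free <= i, so nothing live is overwritten
            }
        }
        Arrays.fill(heap, free, heap.length, null);  // everything past the boundary is free
        Arrays.fill(marked, false);                  // reset marks for the next cycle
        return free;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", null, "D", "E" };
        boolean[] marked = { true, false, true, false, false, true };
        int boundary = compact(heap, marked);
        System.out.println(Arrays.toString(heap) + " " + boundary);
        // prints [A, C, E, null, null, null] 3
    }
}
```

Unlike the mark-sweep sketch, the free space ends up as one contiguous block, and unlike the copying sketch, no second half of memory is reserved.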

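Generational collection algorithm

The "generational collection" (Generational Collection) algorithm divides memory into several regions according to how long objects live. Usually the Java heap is divided into the young generation and the old generation, so that each generation can use the collection algorithm that suits its characteristics. In JDK 1.7 and earlier there was also a permanent generation, but it was removed in JDK 1.8.

In the young generation, large numbers of objects die in every collection and only a few survive, so the copying algorithm is used: the collection only costs the copying of a small number of surviving objects. In the old generation the survival rate is high and there is no extra space to act as an allocation guarantee, so it has to be collected with the mark-sweep or mark-compact algorithm.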

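Efficient execution of the collection algorithms

The sections above introduced the mainstream garbage collection algorithms. Each of them has to decide which objects are "alive" and which are "dead" in order to know what to reclaim. The reference counting algorithm and the reachability analysis algorithm described in the previous article are the basis for this "judgment" of objects. When implementing these algorithms, a virtual machine also has to pay strict attention to their execution efficiency in order to run efficiently.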

GC Roots

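Reachability analysis follows reference chains starting from the GC Roots nodes. The nodes that can serve as GC Roots are mainly found in global references (for example constants or static class fields) and in execution contexts (for example the local variable tables in stack frames).

The time sensitivity of reachability analysis also shows up in GC pauses, because the analysis has to be carried out on a snapshot that guarantees consistency. Consistency here means that during the analysis the whole execution system appears frozen at a single point in time, and object reference relationships must not keep changing while the analysis is running. This is an important reason why all executing threads have to be paused while GC is in progress. Such a GC pause is called Stop-The-World.

Today's mainstream Java virtual machines all use exact GC, which means the virtual machine knows what type of data sits at a given location in memory. When execution is paused for GC, it therefore does not have to check every location without exception: the virtual machine has a way to find out directly where object references are stored. In the HotSpot implementation this is done with a group of data structures called OopMap. Once a class has been loaded, HotSpot works out what type of data is at which offset within an object, and during JIT compilation it also records, at specific positions, which locations on the stack and in registers hold references. The GC can then read this information directly while scanning.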
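The idea of "knowing where the references are" can be illustrated with a toy map that records, for one stack frame, which slots hold references. This is only a sketch of the concept: ToyOopMap, the BitSet encoding and the long-valued slots are assumptions for illustration and do not reflect HotSpot's actual OopMap layout.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Conceptual stand-in for an oop map: one bit per frame slot, set if the slot is a reference. */
class ToyOopMap {
    /** Returns only the slot values that the map identifies as references. */
    static List<Long> referencesIn(long[] frameSlots, BitSet referenceSlots) {
        List<Long> refs = new ArrayList<>();
        // Visit only the set bits; plain data slots are never inspected.
        for (int i = referenceSlots.nextSetBit(0); i >= 0 && i < frameSlots.length;
                 i = referenceSlots.nextSetBit(i + 1)) {
            refs.add(frameSlots[i]);
        }
        return refs;
    }

    public static void main(String[] args) {
        long[] frame = { 42L, 0x1000L, 7L, 0x2000L }; // slots 1 and 3 hold (fake) addresses
        BitSet map = new BitSet();
        map.set(1);
        map.set(3);
        System.out.println(referencesIn(frame, map)); // prints [4096, 8192]
    }
}
```

Without such a map, a collector would have to treat every slot as a possible reference, since the value 42 in slot 0 cannot be told apart from an address just by looking at it.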

Safepoint

A running program cannot stop just anywhere to start a GC. It can only pause when it reaches a safepoint (SafePoint), because reference relationships are recorded in an OopMap only at safepoint positions. Safepoints are chosen by the criterion "does this point have the characteristic of letting the program run for a long time", and the characteristic of long-running execution is the reuse of instruction sequences, for example method calls, loop jumps, and exception jumps. Only instructions with these functions produce a SafePoint.

Another problem with safepoints is how to get every thread to run to its nearest safepoint and pause there when a GC occurs. There are two approaches: preemptive interruption and active (voluntary) interruption. With preemptive interruption, all threads are interrupted first when a GC occurs, and any thread that turns out to have been interrupted somewhere other than a safepoint is resumed so that it can run on to a safepoint. With active interruption, the GC does not operate on the threads directly when it needs to interrupt them. It simply sets a flag, each thread actively polls this flag, and a thread suspends itself when it finds that the flag is true. Active interruption is the approach used more often.
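Active interruption can be imitated in plain Java with a volatile flag that worker threads poll at a chosen point, here the end of each loop iteration. This is only an analogy built from ordinary threads. SafepointPollDemo, poll and run are hypothetical names, and the real JVM implements its polls in generated machine code, not like this.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

/** Flag-polling analogy for active interruption at safepoints (illustrative only). */
class SafepointPollDemo {
    private final Object lock = new Object();
    private final AtomicLong work = new AtomicLong();   // counts simulated mutator work
    private volatile boolean safepointRequested;        // the flag set by the "GC"
    private volatile boolean done;

    /** Called by workers at their safepoint: park here while a pause is requested. */
    private void poll(CountDownLatch arrived) throws InterruptedException {
        if (!safepointRequested) return;
        arrived.countDown();                            // report "I have stopped"
        synchronized (lock) {
            while (safepointRequested) lock.wait();
        }
    }

    /** Runs one simulated pause; returns true if no worker made progress during it. */
    boolean run(int workers) {
        CountDownLatch arrived = new CountDownLatch(workers);
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (!done) {
                        work.incrementAndGet();         // simulated application work
                        poll(arrived);                  // loop back-edge acts as the safepoint
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        try {
            safepointRequested = true;                  // ask every thread to stop
            arrived.await();                            // wait until all have parked
            long before = work.get();
            Thread.sleep(20);                           // pretend to collect
            long after = work.get();
            done = true;
            synchronized (lock) {
                safepointRequested = false;             // end of the pause
                lock.notifyAll();
            }
            for (Thread t : threads) t.join();
            return before == after;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new SafepointPollDemo().run(4)); // prints true
    }
}
```

The "GC" thread never touches the workers directly. It only flips the flag and waits for each worker to report in, which is the essence of the active approach.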

Safe region

The safepoint mechanism seems to solve the problem completely, but it does not. It guarantees that a running program will reach a safepoint where GC can begin within a reasonably short time, but what about a program that is not running? "Not running" here means that the thread has not been given CPU time. A typical example is a thread in the Sleep or Blocked state: it cannot respond to the JVM's interrupt request and cannot run on to a safepoint to suspend itself. This case has to be handled with a safe region (Safe Region).

A safe region is a stretch of code in which reference relationships do not change, so it is safe to start a GC anywhere within it. A Safe Region can be seen as an extended SafePoint.
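A rough sketch of the protocol with ordinary Java threads: the worker announces that it is entering a safe region before it blocks, and when it wants to leave it must wait if a collection is still in progress. SafeRegionDemo and its methods are hypothetical, and the "GC" here waits for the worker to enter the region only to make the event order deterministic for the demo.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Safe-region analogy built from ordinary threads (illustrative names and protocol). */
class SafeRegionDemo {
    private final Object lock = new Object();
    private boolean inSafeRegion;    // guarded by lock
    private boolean gcInProgress;    // guarded by lock
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    private void enterSafeRegion() {
        synchronized (lock) { inSafeRegion = true; lock.notifyAll(); }
    }

    /** A thread may only leave the region once no collection is running. */
    private void leaveSafeRegion() throws InterruptedException {
        synchronized (lock) {
            while (gcInProgress) lock.wait();
            inSafeRegion = false;
        }
    }

    /** Runs one scenario and returns the recorded event order. */
    List<String> run() {
        CountDownLatch wakeUp = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                enterSafeRegion();
                wakeUp.await();          // stands in for sleeping or blocking; no references change
                leaveSafeRegion();       // blocks if the collection has not finished yet
                events.add("worker-resumed");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            worker.start();
            synchronized (lock) {
                while (!inSafeRegion) lock.wait();
                gcInProgress = true;     // GC may start: the worker is in a safe region
            }
            events.add("gc-start");
            wakeUp.countDown();          // the worker wakes up in the middle of the collection
            Thread.sleep(20);            // pretend to collect
            events.add("gc-end");
            synchronized (lock) { gcInProgress = false; lock.notifyAll(); }
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(new SafeRegionDemo().run()); // prints [gc-start, gc-end, worker-resumed]
    }
}
```

Even though the worker wakes up while the collection is running, it cannot resume application work until the collection has ended, so the GC never sees references change underneath it.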


Origin www.cnblogs.com/cellei/p/12131331.html