Generational GC and Card Table

Generational garbage collection

Most modern garbage-collected languages use generational garbage collection. The approach rests on the following observations:

  1. Almost all objects are short-lived: they survive no more than a few garbage collection cycles.
  2. When a GC runs, usually more than 90% of the objects have been created since the previous GC.
  3. If an object survives many garbage collection cycles, a non-generational GC still marks it again on every cycle, which is wasted work.

Bytes allocated

Generational garbage collection algorithm

The objects a running program creates can be divided into generations in various ways. Usually the division is based on how long each object has survived.

Memory in the younger generations is usually collected more often, and each collection causes only a short pause. Memory in the older generations is collected less often, but each collection can cause a long pause.

Assume the program's memory is divided into two generations: the young generation Gen0 and the old generation Gen1. New objects are allocated in Gen0.

gen-1

There are four objects in memory. Next, we release one of the references:

gen-2

If a GC is triggered at this point, the collector finds that two objects in Gen0 are unreachable and reclaims them.

gen-3

After several rounds of GC, the two remaining objects are still alive, so the collector promotes them to Gen1, the old generation.

gen-4
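One common way to decide when to promote is to count how many Gen0 collections an object has survived. The sketch below assumes that policy; the Obj type, the on_survive function, and the threshold of 3 are hypothetical and are not taken from any particular runtime.

```cpp
#include <cassert>

// Hypothetical object header: a generation tag plus a survival counter.
struct Obj {
    int gen = 0;   // 0 = Gen0 (young), 1 = Gen1 (old)
    int age = 0;   // number of Gen0 collections survived so far
};

constexpr int kPromoteAge = 3;  // assumed promotion threshold

// Called for every object that is still reachable after a Gen0 collection.
void on_survive(Obj& o) {
    assert(o.gen == 0);            // only young objects are aged
    if (++o.age >= kPromoteAge) {
        o.gen = 1;                 // promote to the old generation
    }
}
```

With this threshold, an object stays in Gen0 after its first two collections and moves to Gen1 on the third.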

Now suppose the program allocates many more objects as it runs. When the next GC starts, the objects in memory look like this:

gen-5

If we collect only Gen0 (the young generation) and trace from the roots without scanning Gen1, we find only the objects marked ✓, because they are directly reachable from the roots. The object the red arrow points to is reachable only indirectly, through a Gen1 object, so the scan misses it.

Scanning the entire memory space instead would be very expensive, because Gen1 (the old generation) contains far more objects than Gen0.

To avoid scanning the whole Gen1 region, generational collectors keep a record of the Gen1 objects that hold references to Gen0 objects. When Gen0 is collected, these objects are treated as additional roots. This is a typical space-for-time trade-off. The implementation is illustrated below:

gen-6
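As a rough sketch of that idea, the C++ below marks Gen0 starting from the roots and then from a remembered list of Gen1 objects. The Obj type, the function names, and the way the remembered list is passed in are assumptions made for illustration, not a real runtime's implementation.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical heap object: a generation tag plus outgoing references.
struct Obj {
    int gen = 0;              // 0 = Gen0 (young), 1 = Gen1 (old)
    std::vector<Obj*> refs;   // objects this one points to
};

// Mark every Gen0 object reachable from `start`, never descending into Gen1.
static void mark_young(Obj* start, std::unordered_set<Obj*>& live) {
    std::vector<Obj*> stack{start};
    while (!stack.empty()) {
        Obj* o = stack.back();
        stack.pop_back();
        if (o == nullptr || o->gen != 0) continue;  // skip old objects
        if (!live.insert(o).second) continue;       // already marked
        for (Obj* r : o->refs) stack.push_back(r);
    }
}

// Gen0 marking: trace from the roots, then treat each remembered Gen1
// object as an extra root by tracing the references it holds.
std::unordered_set<Obj*> mark_gen0(const std::vector<Obj*>& roots,
                                   const std::vector<Obj*>& remembered) {
    std::unordered_set<Obj*> live;
    for (Obj* r : roots) mark_young(r, live);
    for (Obj* old : remembered) {
        assert(old != nullptr && old->gen == 1);    // only Gen1 objects are remembered
        for (Obj* r : old->refs) mark_young(r, live);
    }
    return live;
}
```

If a Gen0 object is referenced only from a Gen1 object, calling mark_gen0 with an empty remembered list misses it, while passing that Gen1 object in the list keeps it alive.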

Card Table

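The old generation is usually much larger than the young generation and holds a very large number of objects. If we recorded these objects by storing references to them, the data structure could take up a great deal of space. Raising the program's overall memory footprint by 30%-40% just to speed up GC a little is a questionable trade. The data structure that records the old-generation objects therefore has two characteristics: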

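  1. It must occupy very little memory.
  2. Old-generation objects that hold references to young-generation objects are usually few, so the mapping is sparse.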

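Based on these two points, almost all generational algorithms use a Card Table to record which old-generation objects hold references to the young generation.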

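The most space-efficient data structure for storing a binary state is a bitmap, and the Card Table is a variant of a bitmap. It uses one bit to represent a region of old-generation memory. If that region contains an object that holds a reference to the young generation, the bit is set to 1 (the region is marked dirty). This gives us a deliberately imprecise result: we know which regions contain old-generation objects that hold references to young-generation objects. During garbage collection, we only need to walk the objects in those regions once to find out exactly which objects we need.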
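A minimal sketch of such a bitmap might look like the following. The CardTable class, its method names, and the 4 KB card size are assumptions chosen for illustration; addresses are modelled as byte offsets from the start of the old generation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed card size: each bit covers 4 KB of old-generation memory.
constexpr std::size_t kCardSize = 4096;

class CardTable {
public:
    explicit CardTable(std::size_t old_gen_bytes)
        : num_cards_((old_gen_bytes + kCardSize - 1) / kCardSize),
          bits_((num_cards_ + 7) / 8, std::uint8_t{0}) {}

    // Mark the card that contains `offset` as dirty.
    void mark_dirty(std::size_t offset) {
        std::size_t card = offset / kCardSize;
        assert(card < num_cards_);
        bits_[card / 8] |= static_cast<std::uint8_t>(1u << (card % 8));
    }

    bool is_dirty(std::size_t card) const {
        assert(card < num_cards_);
        return ((bits_[card / 8] >> (card % 8)) & 1u) != 0;
    }

    // Indices of all dirty cards; a collector would walk only the
    // objects that live inside these cards.
    std::vector<std::size_t> dirty_cards() const {
        std::vector<std::size_t> out;
        for (std::size_t c = 0; c < num_cards_; ++c) {
            if (is_dirty(c)) out.push_back(c);
        }
        return out;
    }

    // Memory used by the bitmap itself.
    std::size_t table_bytes() const { return bits_.size(); }

private:
    std::size_t num_cards_;            // declared first, so initialized first
    std::vector<std::uint8_t> bits_;   // one bit per card
};
```

The table only says which cards to look at. Finding the exact objects and references inside a dirty card is left to the collector's scan of that card.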

cardtable

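As the figure above shows, one bit in the Card Table represents 4 KB of actual memory. It follows from the principle that the larger the region each bit maps to, the coarser the information it carries, and the smaller the region, the more memory the Card Table itself takes up. In the HotSpot JVM, each Card Table entry covers 512 bytes of memory (HotSpot stores each entry as a byte rather than a single bit).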
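The size trade-off can be checked with a little arithmetic. The sketch below assumes a 1 GiB old generation, which is a figure chosen purely for illustration, and the table_bytes helper is hypothetical.

```cpp
#include <cstdint>

// Bytes needed for a card table that spends `bits_per_card` bits on each card.
constexpr std::uint64_t table_bytes(std::uint64_t heap_bytes,
                                    std::uint64_t card_bytes,
                                    std::uint64_t bits_per_card) {
    return ((heap_bytes + card_bytes - 1) / card_bytes * bits_per_card + 7) / 8;
}

constexpr std::uint64_t kGiB = 1ull << 30;  // assumed old-generation size

// 4 KB cards, one bit each: 262,144 cards, 32 KiB of table.
static_assert(table_bytes(kGiB, 4096, 1) == 32 * 1024, "4 KB cards, 1 bit");
// 512-byte cards, one bit each: 2,097,152 cards, 256 KiB of table.
static_assert(table_bytes(kGiB, 512, 1) == 256 * 1024, "512 B cards, 1 bit");
// 512-byte cards, one byte each: 2 MiB of table.
static_assert(table_bytes(kGiB, 512, 8) == 2 * 1024 * 1024, "512 B cards, 1 byte");
```

Shrinking the card from 4 KB to 512 bytes makes the table eight times larger, but each dirty entry then points at a much smaller region to scan.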

Write Barrier

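From the way the Card Table works, it follows that whenever a reference is created or changed, the Card Table may need to be updated. For this reason, garbage-collected high-level languages usually run some extra logic before and after an assignment statement (one containing =). This AOP-like mechanism is called a Write Barrier. For example: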

a.val = b;

When the JVM executes the statement above, what actually runs inside the virtual machine is:

void oop_field_store(oop* field, oop value) {
  pre_write_barrier(field);          // extra work before the store
  *field = value;                    // the actual store
  post_write_barrier(field, value);  // extra work after the store
}

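In this code, pre_write_barrier() and post_write_barrier() are both write barriers. The JVM uses them to perform extra work on the reference or the object. Other high-level languages such as C# and JavaScript have similar mechanisms.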
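To connect the write barrier back to the Card Table, here is a toy sketch of a post-write barrier that dirties a card when a store creates an old-to-young reference. Everything in it is an assumption for illustration: the ToyHeap type, the address layout (offsets below kOldGenEnd are old generation, the rest are young), the 512-byte card size, and the use of one byte per card for simplicity. It is not how any specific VM implements its barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr std::size_t kCardSize  = 512;                   // assumed card size
constexpr std::size_t kOldGenEnd = std::size_t{1} << 16;  // [0, kOldGenEnd) is old

struct ToyHeap {
    // One byte per old-generation card: 0 = clean, 1 = dirty.
    std::vector<std::uint8_t> cards =
        std::vector<std::uint8_t>(kOldGenEnd / kCardSize, std::uint8_t{0});
    // Reference fields, keyed by field address, holding the referent's address.
    std::unordered_map<std::size_t, std::size_t> fields;

    static bool in_old(std::size_t addr) { return addr < kOldGenEnd; }

    // Post-write barrier: if an old-generation field now points into the
    // young generation, mark the card containing that field as dirty.
    void post_write_barrier(std::size_t field_addr, std::size_t value_addr) {
        if (in_old(field_addr) && !in_old(value_addr)) {
            cards[field_addr / kCardSize] = 1;
        }
    }

    // Same shape as oop_field_store: the store, followed by the barrier.
    void store_ref(std::size_t field_addr, std::size_t value_addr) {
        fields[field_addr] = value_addr;
        post_write_barrier(field_addr, value_addr);
    }

    bool is_dirty(std::size_t addr) const {
        assert(in_old(addr));  // only old-generation addresses have cards
        return cards[addr / kCardSize] != 0;
    }
};
```

Only the store from an old-generation field to a young-generation object touches the table. Old-to-old and young-to-young stores leave every card clean in this sketch.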

Summary

Generational garbage collection is an optimization that computer scientists derived from how GC algorithms behave in real programs. To avoid traversing the entire memory space once it has been split into generations, high-level languages use the Card Table and Write Barrier, trading space for time to make collection of the young generation efficient. The result is a very efficient garbage collection algorithm.

Original: Big Box  Generation GC and CardTable



Origin www.cnblogs.com/wangziqiang123/p/11618365.html