C#: An In-Depth Analysis of the GC Mechanism

Source: https://www.cnblogs.com/nele/p/5673215.html

GC Past and Present

  Although this article uses .NET as its target for discussing GC, the concept of GC is far from new. As early as 1958, the Lisp language implemented by the Turing Award winner John McCarthy already provided GC; this was the first appearance of GC. Lisp programmers held that memory management is too important to be left to the programmer.

  In the years that followed, however, Lisp did not take off, and languages with manual memory management prevailed, with C as their representative. From the same premise, different people drew the opposite conclusion: C programmers held that memory management is too important to be left to the system, and mocked Lisp programs as being as slow as a turtle. Indeed, in an era when every byte had to be counted carefully, the speed of GC and its heavy use of system resources were unacceptable to many people. Later, in 1984, David Ungar's Smalltalk implementation used the generational garbage collection technique for the first time (this technique is discussed below), but Smalltalk never became very widely used.

  It was not until the mid-1990s that GC stepped onto the stage of history as a protagonist, thanks to the rise of Java; today's GC is no longer what it once was. Java uses a VM (Virtual Machine) mechanism: programs run under the management of the VM, and that management naturally includes GC. .NET followed in the early 2000s and takes an approach similar to Java's, with management handled by the CLR (Common Language Runtime). The arrival of these two camps brought developers into the era of development on virtual platforms, and GC has drawn more and more attention since.

  Why use GC? Or, put another way, why use automatic memory management? There are several reasons:

  1. It raises the level of abstraction of software development;

  2. Programmers can focus on the actual problem without being distracted by memory management;

  3. Module interfaces can be made clearer, reducing the coupling between modules;

  4. It greatly reduces memory bugs caused by human mismanagement;

  5. It makes memory management more efficient.

  Overall, GC frees programmers from complex memory issues, thereby improving the speed, quality and safety of software development.

  What is GC

  GC, as the name suggests, is garbage collection, here of course only in terms of memory. The Garbage Collector (also abbreviated GC where no confusion arises) starts from the application's roots and traverses all objects dynamically allocated on the heap [2], determining by whether they are referenced which objects are dead and which are still needed. An object that is no longer referenced by the application's roots or by another object is dead; it is so-called garbage and needs to be reclaimed. This is the working principle of GC. Several algorithms implement this principle; the more common ones are Reference Counting, Mark Sweep, Copy Collection and so on. The mainstream virtual systems .NET CLR, Java VM and Rotor all use the Mark Sweep algorithm.
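  The traversal described above can be sketched on a toy object graph. This is only an illustration under stated assumptions: `Node`, `ToyMarkSweep` and the five-object heap are hypothetical names invented here, not CLR data structures, and a real collector works on raw memory rather than on a `List`.

```csharp
using System;
using System.Collections.Generic;

// Toy heap object: a name plus outgoing references (hypothetical type, not a CLR structure).
public sealed class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();
    public bool Marked { get; set; }
    public Node(string name) { Name = name; }
}

public static class ToyMarkSweep
{
    // Mark phase: flag every object reachable from the roots.
    public static void Mark(IEnumerable<Node> roots)
    {
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (current.Marked) continue;   // already visited; this also terminates on cycles
            current.Marked = true;
            foreach (Node child in current.References) pending.Push(child);
        }
    }

    // Sweep phase: remove unmarked objects from the heap and reset marks for the next cycle.
    public static List<Node> Sweep(List<Node> heap)
    {
        List<Node> garbage = heap.FindAll(n => !n.Marked);
        heap.RemoveAll(n => !n.Marked);
        foreach (Node survivor in heap) survivor.Marked = false;
        return garbage;
    }

    // Builds root -> a -> b plus an unreachable cycle c <-> d, then collects.
    public static List<string> Demo()
    {
        var root = new Node("root");
        var a = new Node("a");
        var b = new Node("b");
        var c = new Node("c");
        var d = new Node("d");
        root.References.Add(a);
        a.References.Add(b);
        c.References.Add(d);
        d.References.Add(c);
        var heap = new List<Node> { root, a, b, c, d };
        Mark(new[] { root });
        return Sweep(heap).ConvertAll(n => n.Name);
    }
}
```

  In this sketch `ToyMarkSweep.Demo()` reports `c` and `d` as garbage even though they reference each other, which is the property that distinguishes tracing from reference counting.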

  One, the Mark-Compact algorithm

  Put simply, the .NET GC algorithm can be viewed as a Mark-Compact algorithm. Phase 1, Mark-Sweep: assume that every object on the heap can be reclaimed, then find the objects that cannot be reclaimed and mark them; in the end, the objects on the heap that carry no mark are the ones that can be reclaimed. Phase 2, Compact: after objects are reclaimed, the free space on the heap becomes discontinuous, so the surviving objects are moved until they are again laid out contiguously from the heap's base address, similar to defragmenting a disk.

  After reclamation and compaction, the heap can keep using its usual allocation method, i.e. a single pointer records the start address of the next allocation. The main processing steps are: suspend threads → determine roots → build the reachable object graph → reclaim objects → compact the heap → fix up pointers. Roots can be understood as follows: the reference relationships among heap objects are complex (cross references, circular references) and form a complex graph; roots are the various entry points that the CLR can find outside the heap.
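  A minimal sketch of the last two steps (compaction and pointer fix-up), together with pointer-bump allocation, is given here. It assumes a toy heap of fixed-size slots in which each object holds at most one reference; `ToyHeap` and its members are hypothetical names, and liveness is set by hand instead of by a real mark phase.

```csharp
using System;

// Toy heap of fixed-size slots (hypothetical model; real CLR objects vary in size).
public sealed class ToyHeap
{
    private readonly string[] names;
    private readonly int[] refs;    // slot index this object references, or -1 for none
    private readonly bool[] live;   // normally set by a mark phase; here set by hand

    public int Next { get; private set; }   // bump pointer: index of the next free slot

    public ToyHeap(int capacity)
    {
        names = new string[capacity];
        refs = new int[capacity];
        live = new bool[capacity];
    }

    // Allocation only advances the pointer.
    public int Allocate(string name, int reference = -1)
    {
        if (Next == names.Length) throw new InvalidOperationException("toy heap is full");
        names[Next] = name;
        refs[Next] = reference;
        live[Next] = true;
        return Next++;
    }

    public void MarkDead(int slot) => live[slot] = false;
    public string NameAt(int slot) => names[slot];
    public int RefAt(int slot) => refs[slot];

    // Slides survivors toward the base address, then repairs references and roots.
    public void Compact(int[] roots)
    {
        var forward = new int[Next];   // old slot -> new slot, or -1 if reclaimed
        int free = 0;
        for (int old = 0; old < Next; old++)
        {
            if (!live[old]) { forward[old] = -1; continue; }
            forward[old] = free;
            names[free] = names[old];
            refs[free] = refs[old];    // still an old index; fixed below
            live[free] = true;
            free++;
        }
        // Pointer fix-up inside the heap.
        for (int slot = 0; slot < free; slot++)
            if (refs[slot] >= 0) refs[slot] = forward[refs[slot]];
        // Pointer fix-up for the roots (stack, registers, statics in a real runtime).
        for (int i = 0; i < roots.Length; i++)
            roots[i] = forward[roots[i]];
        // Slots past the survivors are free again.
        for (int slot = free; slot < Next; slot++) { live[slot] = false; refs[slot] = -1; }
        Next = free;
    }
}
```

  For example, after allocating `a`, `tmp`, `b` (referencing `a`), `tmp2` and `c` (referencing `b`), marking the two temporaries dead and compacting with `c` as the only root, the survivors occupy slots 0 to 2 and the next allocation lands in slot 3.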

  The areas the GC searches for roots include global objects, static variables, local objects, function call parameters, object pointers currently held in CPU registers (as well as the finalization queue), and so on. They fall mainly into two types: static variables that have been initialized, and objects still in use by threads (stack + CPU registers). Reachable objects are objects that can be reached from the roots by following reference relationships. For example, if object A, a local variable of the currently executing function, is a root object, and one of its member variables references object B, then B is a reachable object. Starting from the roots, the graph of reachable objects can be built; the remaining objects are unreachable and can be reclaimed.

  Pointer fix-up: because compaction moves heap objects and their addresses change, all reference pointers must be repaired, including pointers on the stack, in CPU registers, and in other objects on the heap. Debug and release execution modes differ slightly here: in release mode, an object that is not referenced by subsequent code is already unreachable, whereas in debug mode such objects become unreachable only after the current function finishes, so that the contents of local objects can be inspected while debugging. Managed objects passed to COM+ also become roots and carry a reference counter for compatibility with the COM+ memory management mechanism; when the reference counter reaches 0, these objects can become candidates for reclamation. Pinned objects are objects that cannot be moved once allocated, e.g. objects passed to unmanaged code (or pinned with the fixed keyword). The GC cannot update reference pointers held by unmanaged code during pointer fix-up, so moving such objects would cause errors. Pinned objects can fragment the heap, but in most cases objects passed to unmanaged code should be reclaimable by the time a GC runs.
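  As an illustration of pinning, the following sketch pins a managed array with `GCHandle`, reads it through its raw address, and unpins it again. `PinningSketch` and `ReadFirstByteThroughPointer` are hypothetical names; `Marshal.ReadByte` merely stands in for the unmanaged code that would normally receive the pointer.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningSketch
{
    // Pins a managed buffer, reads it back through its raw address, then unpins it.
    public static byte ReadFirstByteThroughPointer(byte[] buffer)
    {
        if (buffer == null || buffer.Length == 0)
            throw new ArgumentException("buffer must contain at least one byte", nameof(buffer));

        // While the pinned handle exists, the GC does not move the array during compaction.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();   // stable address that could be handed to native code
            return Marshal.ReadByte(address);
        }
        finally
        {
            handle.Free();   // unpin promptly; long-lived pins can fragment the heap
        }
    }
}
```

  Keeping the pin inside a `try`/`finally` keeps its lifetime as short as possible, which limits the fragmentation mentioned above.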

  Two, the Generational algorithm

  Programs may use hundreds of MB or several GB of memory, and running a GC over a memory region that large is costly. The generational algorithm rests on certain statistical observations and improves GC performance noticeably. Objects are divided by lifetime into new and old. Following what the statistics show, different collection strategies and algorithms can be applied to the new and old regions: collection effort is concentrated on the new region, so that within a short time interval, over a small memory region and at low cost, the large number of recently discarded local objects on the execution path are reclaimed promptly. The assumptions behind the generational algorithm are:

  1. A large number of newly created objects have relatively short lifetimes, while older objects live longer;

  2. Collecting part of the memory is faster than collecting all of it;

  3. Newly created objects are usually strongly related to one another. Objects are allocated contiguously on the heap, and strong locality helps improve the CPU cache hit rate.

  .NET divides the heap into three generational regions: Gen 0, Gen 1 and Gen 2.

  With the heap divided into three generational regions, there are correspondingly three kinds of GC: # Gen 0 collections, # Gen 1 collections and # Gen 2 collections. If the Gen 0 heap reaches a threshold, a generation 0 GC is triggered; objects that survive it move into Gen 1. If the Gen 1 heap reaches a threshold, a generation 1 GC is performed, which collects the Gen 0 and Gen 1 heaps together; surviving objects move into Gen 2.

  A generation 2 GC collects the Gen 0, Gen 1 and Gen 2 heaps together. Gen 0 and Gen 1 are relatively small; together the two generations are always kept at around 16 MB. The size of Gen 2 is determined by the application and may reach several GB. Generation 0 and generation 1 GCs are therefore very cheap, while a generation 2 GC, called a Full GC, is usually expensive. As a rough estimate, generation 0 and generation 1 GCs should complete in a few milliseconds to a few tens of milliseconds, whereas with a large Gen 2 heap a Full GC may take several seconds. Broadly speaking, over the lifetime of a .NET application the frequencies of generation 2, generation 1 and generation 0 GCs should be roughly 1:10:100.
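  The generations can be observed from code through the `System.GC` class. The sketch below is a small probe, not a benchmark; `GenerationSketch` and its method names are made up for illustration, and the exact generation an object reports after a collection depends on the runtime and its GC configuration.

```csharp
using System;

public static class GenerationSketch
{
    // Returns { generation when new, generation after one collection, highest generation }.
    public static int[] Observe()
    {
        object fresh = new object();
        int before = GC.GetGeneration(fresh);   // a newly allocated small object starts in generation 0
        GC.Collect();                           // survivors of a collection are normally promoted
        int after = GC.GetGeneration(fresh);
        GC.KeepAlive(fresh);                    // keeps the object rooted across the collection
        return new[] { before, after, GC.MaxGeneration };
    }

    // Number of collections so far for generations 0, 1 and 2.
    public static int[] CollectionCounts()
    {
        return new[] { GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2) };
    }
}
```

  Logging `CollectionCounts()` periodically in a long-running application is one simple way to compare the observed collection frequencies with the rough 1:10:100 ratio quoted above.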

  Three, Finalization Queue and Freachable Queue

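  These two queues are related to the Finalize method provided by .NET objects. They do not store the objects themselves but a set of pointers to objects. When a program uses the new operator to allocate space on the Managed Heap, the GC analyzes the object; if the object has a Finalize method, a pointer to it is added to the Finalization Queue.

  Once a GC starts, the Mark phase identifies what is garbage. The garbage is then searched: if an object in the garbage is pointed to by a pointer in the Finalization Queue, that object is separated from the garbage and the pointer to it is moved to the Freachable Queue. This process is called Resurrection: an object that was already dead is brought back to life. Why revive it? Because its Finalize method has not yet been executed, so it cannot be allowed to die. The Freachable Queue normally does nothing, but once a pointer is added to it, it triggers execution of the Finalize method of the object pointed to and then removes the pointer from the queue; at that point the object can die quietly.

  The System.GC class of the .NET Framework provides two methods for controlling Finalize: ReRegisterForFinalize and SuppressFinalize. The former asks the system to run the object's Finalize method; the latter asks the system not to run it. ReRegisterForFinalize simply adds the pointer to the object back into the Finalization Queue. This leads to an interesting phenomenon: because objects in the Finalization Queue can be resurrected, calling ReRegisterForFinalize inside an object's Finalize method creates an object on the heap that never dies, reborn like a phoenix every time it dies.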
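  The effect of these two methods can be probed with a small experiment. The sketch below is hedged: `Phoenix` and `FinalizationSketch` are hypothetical names, the re-registration is capped by an assumed `maxRuns` parameter so that the object does eventually die, and because finalization timing is up to the runtime the counts it returns are upper bounds rather than guaranteed values.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical type that counts how often its finalizer runs.
public sealed class Phoenix
{
    private readonly int[] counter;   // counter[0] = number of finalizer runs so far
    private readonly int maxRuns;     // stop re-registering after this many runs

    public Phoenix(int[] counter, int maxRuns)
    {
        this.counter = counter;
        this.maxRuns = maxRuns;
    }

    ~Phoenix()
    {
        int runs = Interlocked.Increment(ref counter[0]);
        // Re-registering adds the object to the finalization queue again, so the
        // finalizer runs once more the next time the object is found unreachable.
        if (runs < maxRuns) GC.ReRegisterForFinalize(this);
    }
}

public static class FinalizationSketch
{
    // Creating the object in a separate, non-inlined method keeps no local reference alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void CreateAndDrop(int[] counter, int maxRuns, bool suppress)
    {
        var phoenix = new Phoenix(counter, maxRuns);
        if (suppress) GC.SuppressFinalize(phoenix);   // asks the runtime not to call the finalizer
    }

    // Returns how many times the finalizer ran within the given number of GC cycles.
    public static int Run(int maxRuns, bool suppress, int cycles)
    {
        var counter = new int[1];
        CreateAndDrop(counter, maxRuns, suppress);
        for (int i = 0; i < cycles; i++)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
        return Volatile.Read(ref counter[0]);
    }
}
```

  With `suppress` set to true the finalizer does not run at all; with `maxRuns` greater than 1 the object is typically finalized once per collection cycle until the cap is reached, which is the phoenix behavior described above.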

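  Managed resources:

  All types in .NET derive (directly or indirectly) from the System.Object type.

  Types in the CTS fall into two broad categories: reference types (also called managed types), which are allocated on the heap, and value types, which, when used as local variables, are allocated on the stack. As shown in the figure:

[figure]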

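  Value types live on the stack, first in, last out. The lifetimes of value-type variables are strictly ordered, which ensures that a value-type variable releases its resources before it leaves scope. This is simpler and more efficient than reference types. The stack allocates memory from high addresses toward low addresses.

  Reference types are allocated on the Managed Heap. Declaring a variable stores the variable on the stack; when an object is created with new, the address of the object is stored in that variable. The managed heap works the other way around and allocates memory from low addresses toward high addresses, as shown in the figure:

[figure]

  More than 80% of the resources in .NET are managed resources.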
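  The practical consequence of the value type / reference type split is visible in assignment semantics. The following minimal sketch uses two hypothetical types, `PointValue` and `PointRef`, to show that assigning a struct copies the value while assigning a class instance copies only the reference.

```csharp
using System;

public struct PointValue { public int X; }       // value type
public sealed class PointRef { public int X; }   // reference type

public static class CopySemantics
{
    // Returns { v1.X, r1.X } after modifying the copies v2 and r2.
    public static int[] Demo()
    {
        var v1 = new PointValue { X = 1 };
        PointValue v2 = v1;   // copies the whole value
        v2.X = 2;             // v1 is unaffected

        var r1 = new PointRef { X = 1 };
        PointRef r2 = r1;     // copies only the reference (the object's address)
        r2.X = 2;             // r1 sees the change: both variables refer to one heap object

        return new[] { v1.X, r1.X };
    }
}
```

  `CopySemantics.Demo()` yields 1 for the struct and 2 for the class, because `r1` and `r2` refer to the same object on the managed heap.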

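  Unmanaged resources:

  ApplicationContext, Brush, Component, ComponentDesigner, Container, Context, Cursor, FileStream, Font, Icon, Image, Matrix, Object, OdbcDataReader, OleDBDataReader, Pen, Regex, Socket, StreamWriter, Timer, Tooltip, file handles, GDI resources, database connections and other such resources. You may not have noticed many of these when using them!

  The .NET GC mechanism has two problems:

  First, the GC cannot release every kind of resource. It cannot automatically release unmanaged resources.

  Second, the GC is not real-time, which creates bottlenecks and uncertainty in system performance.

  This is why the IDisposable interface exists. IDisposable defines the Dispose method, which programmers call explicitly to release unmanaged resources. The using statement simplifies this kind of resource management.

  Example: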

using System.Data.SqlClient;

/// <summary>
/// Executes a SQL statement and returns the number of rows affected.
/// </summary>
/// <param name="SQLString">The SQL statement</param>
/// <returns>The number of rows affected</returns>
public static int ExecuteSql(string SQLString)
{
    // connectionString is a field defined elsewhere in the containing class.
    // Each using block disposes its object when control leaves the block,
    // on normal return and on exception alike; disposing the connection
    // also closes it, so no explicit catch/finally cleanup is needed.
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        using (SqlCommand cmd = new SqlCommand(SQLString, connection))
        {
            connection.Open();
            int rows = cmd.ExecuteNonQuery();
            return rows;
        }
    }
}
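  To make the disposal order of nested using blocks visible without a database, here is a small self-contained sketch. `TracedResource` and `UsingSketch` are hypothetical names introduced for illustration; the resource only records its lifecycle in a list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that records its lifecycle in a shared log.
public sealed class TracedResource : IDisposable
{
    private readonly string name;
    private readonly List<string> log;

    public TracedResource(string name, List<string> log)
    {
        this.name = name;
        this.log = log;
        log.Add("open " + name);
    }

    public void Dispose()
    {
        log.Add("dispose " + name);
    }
}

public static class UsingSketch
{
    // Runs two nested using blocks, optionally failing inside them, and returns the log.
    public static List<string> Run(bool fail)
    {
        var log = new List<string>();
        try
        {
            using (var outer = new TracedResource("connection", log))
            using (var inner = new TracedResource("command", log))
            {
                if (fail) throw new InvalidOperationException("simulated failure");
                log.Add("work done");
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");
        }
        return log;
    }
}
```

  In both the normal and the failing case the inner resource is disposed before the outer one, and in the failing case both are disposed before the exception handler body runs.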

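  When you release unmanaged objects with the Dispose method, you should call GC.SuppressFinalize. If the object is in the finalization queue, GC.SuppressFinalize prevents the GC from calling its Finalize method, because calling Finalize costs some performance. If your Dispose method has already cleaned up the unmanaged resources, there is no need for the GC to call the object's Finalize method again (MSDN). The MSDN code is attached below for reference.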

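using System;
using System.ComponentModel;

public class BaseResource : IDisposable
{
    // Pointer to an external unmanaged resource.
    private IntPtr handle;
    // Other managed resource this class uses.
    private Component Components;
    // Tracks whether Dispose has been called.
    private bool disposed = false;

    // Constructor.
    public BaseResource()
    {
        // Insert appropriate constructor code here.
    }

    // Implements IDisposable.
    // Do not make this method virtual;
    // a derived class should not be able to override it.
    public void Dispose()
    {
        Dispose(true);
        // Take this object off the finalization queue so that
        // the finalizer code does not run a second time.
        GC.SuppressFinalize(this);
    }

    // Dispose(bool disposing) executes in two distinct scenarios.
    // If disposing equals true, the method has been called directly
    // or indirectly by user code. Managed and unmanaged resources can be released.
    // If disposing equals false, the method has been called from inside the finalizer;
    // do not reference other objects, only unmanaged resources can be released.
    protected virtual void Dispose(bool disposing)
    {
        // Check whether Dispose has already been called.
        if (!this.disposed)
        {
            // If disposing equals true, release managed and unmanaged resources.
            if (disposing)
            {
                // Release managed resources (Components may never have been assigned).
                if (Components != null)
                {
                    Components.Dispose();
                }
            }
            // Release unmanaged resources. If disposing is false,
            // only the following code is executed.
            if (handle != IntPtr.Zero)
            {
                CloseHandle(handle);
                handle = IntPtr.Zero;
            }
            // Note that this is not thread safe.
            // Another thread could start disposing the object
            // after the managed resources are released,
            // but before the disposed flag is set to true.
            // If thread safety is required, the client must implement it.
        }
        disposed = true;
    }

    // Use interop to call the method
    // that cleans up the unmanaged resource (Windows only).
    [System.Runtime.InteropServices.DllImport("Kernel32")]
    private extern static Boolean CloseHandle(IntPtr handle);

    // Use C# destructor syntax for the finalizer code.
    // It runs only if the Dispose method was not called.
    // It gives the base class the opportunity to finalize;
    // do not provide destructors in types derived from this class.
    ~BaseResource()
    {
        // Do not duplicate the cleanup code here.
        // Calling Dispose(false) is best for reliability and maintainability.
        Dispose(false);
    }

    // Dispose may be called multiple times,
    // but other members throw an exception once the object has been disposed.
    // Whenever you do something with this class,
    // check to see if it has been disposed.
    public void DoSomething()
    {
        if (this.disposed)
        {
            // ObjectDisposedException requires the object name as an argument.
            throw new ObjectDisposedException(GetType().Name);
        }
    }

    // Do not make this method virtual.
    // A derived class should not be allowed to override it.
    public void Close()
    {
        // Calls the parameterless Dispose method.
        Dispose();
    }

    public static void Main()
    {
        // Insert code here to create
        // and use a BaseResource object.
    }
}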
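  The caller-visible behavior of this pattern (Dispose may be called repeatedly, other members throw afterwards) can be exercised with a small cross-platform sketch. It is not the MSDN sample: `NativeBuffer` and `DisposeUsage` are hypothetical names, and memory from `Marshal.AllocHGlobal` is assumed as the unmanaged resource in place of a Windows handle.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a block of unmanaged memory.
public class NativeBuffer : IDisposable
{
    private IntPtr memory;
    private bool disposed;

    public NativeBuffer(int size)
    {
        memory = Marshal.AllocHGlobal(size);   // unmanaged allocation the GC knows nothing about
    }

    public void WriteByte(byte value)
    {
        if (disposed) throw new ObjectDisposedException(nameof(NativeBuffer));
        Marshal.WriteByte(memory, value);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);   // cleanup is done; the finalizer no longer needs to run
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;        // makes repeated calls harmless
        // This type holds no managed resources, so 'disposing' changes nothing here.
        Marshal.FreeHGlobal(memory);
        memory = IntPtr.Zero;
        disposed = true;
    }

    // Safety net in case the caller forgets to call Dispose.
    ~NativeBuffer()
    {
        Dispose(false);
    }
}

public static class DisposeUsage
{
    // Returns true if using the buffer after disposal throws ObjectDisposedException.
    public static bool ThrowsAfterDispose()
    {
        var buffer = new NativeBuffer(16);
        buffer.WriteByte(1);
        buffer.Dispose();
        buffer.Dispose();            // second call is a no-op
        try
        {
            buffer.WriteByte(2);
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

  In production code a `SafeHandle`-derived type is often preferred for owning raw handles; the sketch keeps the explicit finalizer only to mirror the pattern discussed in this section.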

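  The GC.Collect() method

  Purpose: forces a garbage collection.

  Methods of GC:

  Name                                 Description
  Collect()                            Forces an immediate garbage collection of all generations.
  Collect(Int32)                       Forces an immediate garbage collection from generation 0 through the specified generation.
  Collect(Int32, GCCollectionMode)     Forces a garbage collection from generation 0 through the specified generation, at a time specified by the GCCollectionMode value.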
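  The three overloads listed above can be called as in the following sketch. `CollectSketch` and its method names are hypothetical; forcing collections like this is meant for experiments and diagnostics, since the runtime normally chooses better moments on its own.

```csharp
using System;

public static class CollectSketch
{
    // Forces a generation 0 collection and returns the gen 0 collection count before and after.
    public static int[] ForceGen0()
    {
        int before = GC.CollectionCount(0);
        GC.Collect(0, GCCollectionMode.Forced);   // collect generation 0 right now
        int after = GC.CollectionCount(0);
        return new[] { before, after };
    }

    // Calls each overload once.
    public static void AllOverloads()
    {
        GC.Collect();                               // all generations, immediately
        GC.Collect(1);                              // generations 0 through 1, immediately
        GC.Collect(2, GCCollectionMode.Optimized);  // the runtime decides whether collecting now is worthwhile
    }
}
```

  `GCCollectionMode.Optimized` leaves the final decision to the runtime, which makes it the gentler choice when a hint is all that is needed.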

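  GC notes:

  1. The GC manages memory only. Unmanaged resources such as file handles, GDI resources and database connections still have to be managed by the user.

  2. Circular references, mesh structures and the like become simple to implement. The GC's mark-compact algorithm detects these relationships effectively and removes a whole mesh structure once it is no longer referenced.

  3. The GC determines whether an object can be reached by other objects by traversing from the program's root objects, rather than by a reference-counting scheme like the one in COM.

  4. The GC runs in a separate thread to delete memory that is no longer referenced.

  5. The GC compacts the managed heap each time it runs.

  6. You are responsible for releasing unmanaged resources. Defining a Finalizer in the type ensures that the resources get released.

  7. An object's Finalizer runs at some indeterminate time after the object is no longer referenced. Note that, unlike in C++, the destructor does not run immediately when the object goes out of its lifetime.

  8. Using a Finalizer has a performance cost. Objects that need finalization are not removed immediately; their Finalizer has to run first. The Finalizer is not called on the thread that performs the GC. The GC puts every object whose Finalizer must run into a queue and starts another thread to run all those Finalizers, while the GC thread goes on deleting the other objects to be reclaimed. The memory of the objects whose Finalizers have run is reclaimed only in the next GC cycle.

  9. The .NET GC uses the concept of "generations" to optimize performance. Generations help the GC identify more quickly the objects most likely to be garbage. Objects created since the last garbage collection are generation 0 objects. Objects that have survived one GC cycle are generation 1 objects. Objects that have survived two or more GC cycles are generation 2 objects. The purpose of generations is to distinguish local variables from objects that must stay alive for the lifetime of the application. Most generation 0 objects are local variables. Member variables and global variables quickly become generation 1 objects and eventually generation 2 objects.

  10. The GC applies different checking strategies to different generations of objects to optimize performance. Every GC cycle checks generation 0 objects. Roughly 1 in 10 GC cycles checks generation 0 and generation 1 objects. Roughly 1 in 100 GC cycles checks all objects. Consider the cost of finalization again: an object that needs finalization may stay in memory for 9 more GC cycles than one that does not. If it still has not been finalized by then, it becomes a generation 2 object and stays in memory even longer.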


Origin www.cnblogs.com/qiupiaohujie/p/11960624.html