Unity Basics, Part 10: The C# Garbage Collection Mechanism (GC)

1. What is GC

As the name suggests, GC stands for garbage collection, and here it concerns memory only. Starting from the application's roots, the garbage collector (also abbreviated GC where this causes no confusion) traverses all objects that the application has dynamically allocated on the heap and decides, by checking whether each object is still referenced, which objects are dead and which are still needed. Objects that can no longer be reached from the application's roots, directly or through other objects, are dead objects, the so-called garbage, and must be reclaimed. That is the working principle of GC. Several algorithms implement this principle; the common ones are Reference Counting, Mark Sweep, and Copy Collection. The mainstream virtual machines, namely the .NET CLR (Common Language Runtime), the Java VM, and Rotor, all use the Mark Sweep algorithm.

2. The Mark-Compact algorithm

For simplicity, .NET's GC algorithm can be thought of as a Mark-Compact algorithm.

  1. Mark phase (mark-sweep): first assume that every object in the heap can be reclaimed, then find the objects that cannot be reclaimed and mark them; in the end, every unmarked object in the heap can be reclaimed.
  2. Compact phase: after objects are reclaimed, the free heap space is no longer contiguous, so the surviving objects are moved until they are laid out contiguously again starting from the heap base address, much like defragmenting a disk.

Once the heap has been reclaimed and compacted, the previous allocation method can continue to be used: a single pointer records the address at which the next heap allocation starts. The main processing steps are: suspend threads → determine roots → build the reachable-object graph → reclaim objects → compact the heap → repair pointers. Roots can be understood as follows: the reference relationships between heap objects are intricate (cross-references, circular references) and form a complex graph, and the roots are the various entry points into that graph that the CLR can find outside the heap.
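The compaction and pointer-repair steps can be made concrete with a toy model. The sketch below is only an illustration, not how the CLR is implemented: `ToyHeap`, `Compact`, and `CompactDemo` are hypothetical names, the heap is a plain array, and an "address" is simply an array index.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: "addresses" are array indices, not real pointers.
public sealed class ToyHeap
{
    private readonly string[] _slots;   // null means the slot is free
    private int _next;                  // bump pointer: address of the next allocation

    public ToyHeap(int capacity) { _slots = new string[capacity]; }

    public int NextFree { get { return _next; } }

    public string Read(int address) { return _slots[address]; }

    // Allocation only advances the single pointer.
    public int Allocate(string payload)
    {
        if (_next >= _slots.Length) throw new InvalidOperationException("heap full");
        _slots[_next] = payload;
        return _next++;
    }

    // Slides every live object toward address 0 and returns an
    // old-address -> new-address table for the pointer-repair step.
    public Dictionary<int, int> Compact(ISet<int> live)
    {
        var forwarding = new Dictionary<int, int>();
        int target = 0;
        for (int address = 0; address < _next; address++)
        {
            if (!live.Contains(address)) continue;   // garbage: skip it
            _slots[target] = _slots[address];
            forwarding[address] = target;
            target++;
        }
        for (int address = target; address < _next; address++) _slots[address] = null;
        _next = target;                               // allocation resumes after the survivors
        return forwarding;
    }
}

public static class CompactDemo
{
    public static void Main()
    {
        var heap = new ToyHeap(8);
        int a = heap.Allocate("A");   // address 0
        heap.Allocate("B");           // address 1, never referenced again
        int c = heap.Allocate("C");   // address 2

        var roots = new List<int> { a, c };
        Dictionary<int, int> forwarding = heap.Compact(new HashSet<int>(roots));

        // Pointer repair: every stored address is replaced by its new value.
        for (int i = 0; i < roots.Count; i++) roots[i] = forwarding[roots[i]];

        Console.WriteLine(heap.Read(roots[1]));   // C (now at address 1)
        Console.WriteLine(heap.NextFree);         // 2
    }
}
```

After compaction the survivors sit contiguously at the base of the array, so the same single-pointer allocation keeps working.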

The GC looks for roots in global objects, static variables, local objects, function call arguments, object pointers held in the current CPU registers, the finalization queue, and so on. They fall into two main groups: static variables that have been initialized, and objects still in use by threads (stack and CPU registers). Reachable objects are objects that can be reached from the roots by following object references. For example, if local variable A of the currently executing function is a root and one of its member variables refers to object B, then B is reachable. Starting from the roots, the GC builds the graph of reachable objects; all remaining objects are unreachable and can be reclaimed.
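The reachability rule can be sketched as an ordinary graph traversal. This is a toy illustration with hypothetical names (`Node`, `MarkDemo.Mark`), not the CLR's actual marking code; it shows that a cycle of objects is still garbage when no root leads to it.

```csharp
using System;
using System.Collections.Generic;

public sealed class Node
{
    public readonly string Name;
    public readonly List<Node> References = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class MarkDemo
{
    // Returns every node reachable from the given roots.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var marked = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!marked.Add(current)) continue;   // already marked: this is what makes cycles safe
            foreach (Node child in current.References) pending.Push(child);
        }
        return marked;
    }

    public static void Main()
    {
        var a = new Node("A");
        var b = new Node("B");
        var c = new Node("C");
        var d = new Node("D");
        a.References.Add(b);   // root A refers to B
        c.References.Add(d);   // C and D refer to each other,
        d.References.Add(c);   // but no root refers to either of them

        HashSet<Node> live = Mark(new[] { a });
        foreach (Node n in new[] { a, b, c, d })
            Console.WriteLine(n.Name + ": " + (live.Contains(n) ? "reachable" : "garbage"));
        // A: reachable, B: reachable, C: garbage, D: garbage
    }
}
```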

Pointer repair is needed because compaction moves heap objects and so changes their addresses. Every reference to a moved object has to be updated, including pointers on the stack, in CPU registers, and in other heap objects. Debug and release builds behave slightly differently. In release mode, an object that is not referenced by any subsequent code is already unreachable; in debug mode, such an object does not become unreachable until the current function has finished executing, so that the contents of local objects can still be inspected while debugging. Managed objects passed to COM+ also become roots and carry a reference counter for compatibility with COM+'s memory management; when the counter reaches 0, these objects may become eligible for collection. Pinned objects are objects that cannot be moved after allocation, for example objects passed to unmanaged code (or pinned with the fixed keyword). The GC cannot update reference pointers held by unmanaged code during pointer repair, so moving such objects would cause errors. Pinned objects can fragment the heap, but in most cases objects passed to unmanaged code should still be reclaimable when a GC occurs.
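As a minimal sketch of pinning, the example below uses `GCHandle` from `System.Runtime.InteropServices` to keep an array at a fixed address for as long as the handle is allocated. `PinDemo` is a hypothetical name, and `Marshal.WriteByte` merely stands in for the native code that would normally receive the address.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinDemo
{
    public static void Main()
    {
        byte[] buffer = new byte[256];

        // While the handle exists, the GC will not move the array during compaction.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();

            // Stand-in for unmanaged code writing through the raw address.
            Marshal.WriteByte(address, 0, 42);
            Console.WriteLine(buffer[0]);   // 42
        }
        finally
        {
            // Unpin as early as possible: long-lived pins fragment the heap.
            handle.Free();
        }
    }
}
```

Keeping the pinned region short limits the fragmentation described above.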

3. Generational algorithm 

A program may use hundreds of megabytes or several gigabytes of memory, and running a GC over a memory region of that size is very expensive. The generational algorithm rests on a statistical basis and improves GC performance noticeably. Objects are divided into new and old according to their lifetimes. Following the statistical distribution of object lifetimes, different collection strategies and algorithms can be applied to the new and old regions, with more effort spent on the new region, so that the many recently discarded local objects on the execution path are reclaimed promptly and cheaply, at short intervals and within a small memory region. The generational algorithm rests on the following assumptions:

  1. Most newly created objects have short lifetimes, while older objects have longer lifetimes;

  2. Collecting part of the memory is faster than collecting all of it;

  3. Newly created objects are usually strongly related to one another. Heap allocations are contiguous, and this locality helps improve the CPU cache hit rate.

.NET divides the heap into three generations: Gen 0, Gen 1, and Gen 2.

Because the heap is divided into 3 generations, there are 3 corresponding kinds of GC: # Gen 0 collections, # Gen 1 collections, and # Gen 2 collections. When the Gen 0 heap reaches its threshold, a generation 0 GC is triggered, and the objects in Gen 0 that survive it are promoted to Gen 1. When Gen 1 reaches its threshold, a generation 1 GC runs; it collects the Gen 0 and Gen 1 heaps together, and the survivors are promoted to Gen 2.

A generation 2 GC collects the Gen 0, Gen 1, and Gen 2 heaps together. Gen 0 and Gen 1 are relatively small, and their combined size stays at roughly 16 MB; the size of Gen 2 is determined by the application and may reach several GB. Generation 0 and generation 1 GCs are therefore cheap, whereas a generation 2 GC, called a full GC, is usually expensive. As a rough estimate, generation 0 and generation 1 GCs should complete within a few milliseconds to a few tens of milliseconds, while a full GC over a large Gen 2 heap may take several seconds. In general, over the lifetime of a .NET application, the frequencies of generation 2, generation 1, and generation 0 GCs should be roughly 1:10:100.
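Promotion between generations can be observed with `GC.GetGeneration`. The sketch below (`GenerationDemo` is a hypothetical name) keeps one object alive across two forced collections and prints its generation each time. On the desktop CLR the object is typically promoted step by step, but the exact numbers depend on the runtime and GC settings, and a non-generational collector such as Unity's default Boehm-based one can behave differently, so no output is shown.

```csharp
using System;

public static class GenerationDemo
{
    public static void Main()
    {
        object survivor = new object();

        Console.WriteLine("highest generation supported: " + GC.MaxGeneration);
        Console.WriteLine("after allocation:  gen " + GC.GetGeneration(survivor));

        GC.Collect(0);   // collect generation 0 only
        Console.WriteLine("after a gen 0 GC:  gen " + GC.GetGeneration(survivor));

        GC.Collect(1);   // collect generations 0 and 1
        Console.WriteLine("after a gen 1 GC:  gen " + GC.GetGeneration(survivor));

        // Collection counters per generation since the process started.
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
            Console.WriteLine("gen " + gen + " collections so far: " + GC.CollectionCount(gen));

        GC.KeepAlive(survivor);   // keep the object reachable up to this point
    }
}
```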

4. Finalization Queue and Freachable Queue

These two queues are tied to the Finalize method that .NET objects can provide. Neither queue stores actual objects; each stores a set of pointers to objects. When the program uses the new operator to allocate space on the managed heap, the GC inspects the object, and if the object has a Finalize method, a pointer to it is added to the Finalization Queue.

  When a GC starts, the Mark phase identifies which objects are garbage. The GC then searches the garbage: if it finds an object that a pointer in the Finalization Queue refers to, it separates that object from the garbage and moves the pointer to the Freachable Queue. This process is called resurrection: a dead object is brought back to life. Why revive it? Because its Finalize method has not yet run, so it cannot be allowed to die. The Freachable Queue normally does nothing, but as soon as a pointer is added to it, the Finalize method of the referenced object is triggered; the pointer is then removed from the queue, and the object can die quietly.

  The System.GC class of the .NET Framework provides two methods for controlling Finalize: ReRegisterForFinalize and SuppressFinalize. The former asks the system to call the object's Finalize method; the latter asks the system not to call it. ReRegisterForFinalize in effect adds the pointer to the object back to the Finalization Queue. This has an interesting consequence, because objects in the Finalization Queue can be resurrected: if an object's Finalize method calls ReRegisterForFinalize, the result is an object on the heap that never dies, like a phoenix reborn from the ashes each time it dies.
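The "phoenix" effect can be sketched as follows. `Phoenix` and `ResurrectionDemo` are hypothetical names, and the cap of two finalizer runs is an addition of this sketch so that the demo object can eventually die. When finalizers actually run depends on the runtime and build configuration (debug builds in particular can keep objects alive longer), so the printed counts are not guaranteed and no output is shown.

```csharp
using System;
using System.Runtime.CompilerServices;

public sealed class Phoenix
{
    public static Phoenix Resurrected;   // a static field acts as a root
    public static int FinalizerRuns;

    ~Phoenix()
    {
        FinalizerRuns++;
        if (FinalizerRuns < 2)
        {
            Resurrected = this;               // the object is reachable again
            GC.ReRegisterForFinalize(this);   // and its finalizer will run once more
        }
    }
}

public static class ResurrectionDemo
{
    // A separate, non-inlined method so that no local variable in Main
    // keeps the object alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void AllocateAndDrop()
    {
        new Phoenix();
    }

    public static void Main()
    {
        AllocateAndDrop();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine("finalizer runs: " + Phoenix.FinalizerRuns);
        Console.WriteLine("resurrected:    " + (Phoenix.Resurrected != null));

        Phoenix.Resurrected = null;   // drop the root again
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine("finalizer runs: " + Phoenix.FinalizerRuns);
    }
}
```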

5. Managed resources

All types in .NET derive, directly or indirectly, from the System.Object type.

  The types in the CTS fall into two categories: reference types (also called managed types), which are allocated on the heap, and value types, which are allocated on the stack when they are local variables (a value-type field of a reference-type object is stored inside that object on the heap). [Figure: CTS type classification]

  Value types on the stack follow first in, last out order, so the lifetimes of value-type variables are strictly ordered, which guarantees that a value-type variable releases its resources before it goes out of scope. This is simpler and more efficient than for reference types. The stack allocates memory from high addresses toward low addresses.

  Reference types are allocated on the managed heap. Declaring a variable stores the variable itself on the stack; when an object is created with new, the address of the object is stored in that variable. In contrast to the stack, the managed heap allocates memory from low addresses toward high addresses. [Figure: stack and managed heap layout]
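The practical difference between the two categories shows up in assignment: copying a value-type variable copies the data, whereas copying a reference-type variable copies only the address of the heap object. A minimal sketch, with `PointValue`, `PointRef`, and `CopySemanticsDemo` as hypothetical names:

```csharp
using System;

public struct PointValue { public int X; }   // value type
public class PointRef { public int X; }      // reference type

public static class CopySemanticsDemo
{
    public static void Main()
    {
        PointValue v1 = new PointValue { X = 1 };
        PointValue v2 = v1;   // copies the whole value
        v2.X = 99;            // v1 is unaffected

        PointRef r1 = new PointRef { X = 1 };
        PointRef r2 = r1;     // copies only the reference
        r2.X = 99;            // r1 refers to the same heap object

        Console.WriteLine(v1.X);   // 1
        Console.WriteLine(r1.X);   // 99
    }
}
```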

 

More than 80% of resources in .NET are managed resources.

6. Unmanaged resources:

   ApplicationContext, Brush, Component, ComponentDesigner, Container, Context, Cursor, FileStream, Font, Icon, Image, Matrix, Object, OdbcDataReader, OleDBDataReader, Pen, Regex, Socket, StreamWriter, Timer, Tooltip, file handles, GDI resources, database connections, and similar resources. Many people do not notice this when using them!

7. The .NET GC mechanism has two problems:

  First, the GC does not release all resources: it cannot automatically release unmanaged resources.

  Second, the GC is not real-time, which can cause bottlenecks and unpredictability in system performance.

  The IDisposable interface exists for this reason. It defines the Dispose method, which programmers call explicitly to release unmanaged resources. The using statement simplifies this resource management.

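/// <summary>
/// Executes a SQL statement and returns the number of rows affected.
/// </summary>
/// <param name="SQLString">The SQL statement to execute.</param>
/// <returns>The number of rows affected.</returns>
public static int ExecuteSql(string SQLString)
{
    // connectionString is assumed to be a field defined elsewhere in the enclosing class.
    // Each using statement disposes its object (closing the connection) even when an
    // exception is thrown, so no explicit Close() or Dispose() call is needed, and any
    // SqlException propagates to the caller with its original stack trace.
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        using (SqlCommand cmd = new SqlCommand(SQLString, connection))
        {
            connection.Open();
            return cmd.ExecuteNonQuery();
        }
    }
}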
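That a using statement calls Dispose even when its body throws can be checked without a database. The sketch below uses a hypothetical `TrackedResource` class that only counts Dispose calls; it is an illustration of the language guarantee, not part of the SQL helper above.

```csharp
using System;

public sealed class TrackedResource : IDisposable
{
    public static int DisposeCalls;

    public void Dispose()
    {
        DisposeCalls++;   // a real type would release its resource here
    }
}

public static class UsingDemo
{
    public static void Run()
    {
        try
        {
            using (var resource = new TrackedResource())
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Dispose has already run by the time the handler executes.
            Console.WriteLine(TrackedResource.DisposeCalls);   // 1
        }
    }

    public static void Main()
    {
        Run();
    }
}
```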

  When you release unmanaged resources in the Dispose method, you should also call GC.SuppressFinalize. If the object is in the finalization queue, GC.SuppressFinalize prevents the GC from calling its Finalize method, which matters because calling Finalize costs some performance. If your Dispose method has already cleaned up the object's resources, there is no need for the GC to call the object's Finalize method (MSDN). The MSDN code is attached below for reference.

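using System;
 
namespace usingDemo
{
    public class FinalizeDisposeBase : IDisposable
    {
        // Tracks whether the object has already been disposed
        private bool _disposed = false;
        // Finalizer (the Finalize method)
        ~FinalizeDisposeBase()
        {
            Dispose(false);
        }
 
        /// <summary>
        /// Implements the Dispose method of IDisposable.
        /// </summary>
        public void Dispose()
        {
            Dispose(true);
            // Tell the GC that this object's Finalize method no longer needs to be called
            GC.SuppressFinalize(this);
        }
 
        /// <summary>
        /// Does the actual cleanup work.
        /// Declared virtual so that subclasses can override it when necessary.
        /// </summary>
        /// <param name="isDisposing">True when called from Dispose(), false when called from the finalizer.</param>
        protected virtual void Dispose(bool isDisposing)
        {
            // Do nothing if the object has already been disposed
            if(_disposed)
            {
                return;
            }
            if(isDisposing)
            {
                // Release managed resources here
                // Runs only when the user calls Dispose
            }
            // Release unmanaged resources here
            // Mark the object as disposed
            _disposed = true;
        }
    }
 
    public sealed class FinalizeDispose:FinalizeDisposeBase
    {
        private bool _mydisposed = false;
        protected override void Dispose(bool isDisposing)
        {
            // Ensure resources are released only once
            if (_mydisposed)
            {
                return;
            }
            if(isDisposing)
            {
                // Release managed resources declared in this type here
            }
            // Release unmanaged resources declared in this type here
            // Call the base class's Dispose method to release the base class's resources
            base.Dispose(isDisposing);
            // Set the subclass's flag
            _mydisposed = true;
        }
        static void Main()
        {
 
        }
    }
}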

7. GC.Collect() method

  Purpose: forces a garbage collection.

  Overloads of GC.Collect:

  • Collect(): Forces an immediate garbage collection of all generations.
  • Collect(Int32): Forces an immediate garbage collection from generation 0 through the specified generation.
  • Collect(Int32, GCCollectionMode): Forces a garbage collection from generation 0 through the specified generation, at a time specified by a GCCollectionMode value.
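A minimal sketch of the three overloads follows, with `CollectDemo` as a hypothetical name. The counter and memory figures it prints vary from run to run and between runtimes, so no output is shown; with `GCCollectionMode.Optimized` the runtime itself decides whether collecting is worthwhile.

```csharp
using System;

public static class CollectDemo
{
    public static void Main()
    {
        int before = GC.CollectionCount(0);

        GC.Collect();                                               // all generations, immediately
        GC.Collect(0);                                              // generation 0 only
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Optimized);   // the runtime may skip this one

        int after = GC.CollectionCount(0);
        Console.WriteLine("gen 0 collections during the demo: " + (after - before));
        Console.WriteLine("managed bytes currently allocated: " + GC.GetTotalMemory(false));
    }
}
```

Forcing collections is rarely needed in production code; the calls are mostly useful for measurements and tests like this one.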

8. GC notes:

  • The GC manages memory only; unmanaged resources such as file handles, GDI resources, and database connections must still be managed by the user.
  • Circular references, mesh-like structures, and so on are easy to implement: the GC's mark-compact algorithm detects these relationships and reclaims an entire object network once it is no longer referenced.
  • The GC determines whether an object can still be accessed by traversing from the program's root objects, rather than by reference counting as in COM.
  • Depending on the GC mode, the GC may run on a separate thread to remove memory that is no longer referenced.
  • The GC usually compacts the managed heap when it runs.
  • Releasing unmanaged resources is your responsibility. Defining a finalizer on the type helps ensure that a resource is eventually released.
  • An object's finalizer runs at an indeterminate time after the object is no longer referenced. Note that, unlike in C++, the destructor does not run immediately when the object's lifetime ends.
  • Finalizers have a performance cost. An object that needs finalization is not removed immediately: its finalizer must run first, and it is called on a thread other than the one performing the GC. The GC puts every object whose finalizer must run into a queue, and a separate thread executes those finalizers while the GC thread goes on removing other objects. The memory of the finalized objects is reclaimed in the next GC cycle.
  • The .NET GC uses the concept of "generations" to optimize performance. Generations help the GC identify more quickly the objects that are most likely to be garbage. Objects created since the last garbage collection are generation 0 objects. Objects that have survived one GC cycle are generation 1 objects. Objects that have survived two or more GC cycles are generation 2 objects. The purpose of generations is to separate local variables from objects that must live for the whole lifetime of the application. Most generation 0 objects are local variables. Member variables and global variables quickly become generation 1 objects and eventually generation 2 objects.
  • The GC applies different inspection strategies to different generations to optimize performance. Generation 0 objects are checked in every GC cycle. About 1 in 10 GC cycles checks both generation 0 and generation 1 objects. About 1 in 100 GC cycles checks all objects. Now reconsider the cost of finalization: an object that requires finalization may stay in memory for 9 more GC cycles than one that does not. If it has still not been finalized by then, it becomes a generation 2 object and stays in memory even longer.

9. Copy Collection Algorithm

Principle
To address the efficiency problems of the mark-sweep algorithm, a collection algorithm called "copying" was introduced. It divides the available memory into two halves of equal capacity and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleaned up in a single step. An entire half is thus reclaimed each time, and allocation does not have to deal with complications such as memory fragmentation: it only needs to advance the top-of-heap pointer and allocate in order, which is simple to implement and efficient to run.
Disadvantages
Only half of the memory is usable at any one time (in practice a ratio other than 1:1 can be used, such as the 8:1:1 split of HotSpot's young generation).
[Figure: execution process of the copying algorithm]
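The copying process can be sketched with a toy semi-space heap. This is an illustration only, not the implementation used by HotSpot or any real runtime: `Cell`, `SemiSpaceHeap`, and `CopyDemo` are hypothetical names, addresses are array indices, and the two halves are two equally sized arrays.

```csharp
using System;
using System.Collections.Generic;

// Toy object: a name plus the addresses of the objects it references.
public sealed class Cell
{
    public readonly string Name;
    public readonly int[] Fields;
    public Cell(string name, params int[] fields) { Name = name; Fields = fields; }
}

public sealed class SemiSpaceHeap
{
    private Cell[] _from;   // the half currently used for allocation
    private Cell[] _to;     // the idle half
    private int _top;       // bump pointer into _from

    public SemiSpaceHeap(int halfSize)
    {
        _from = new Cell[halfSize];
        _to = new Cell[halfSize];
    }

    public int Used { get { return _top; } }
    public Cell Read(int address) { return _from[address]; }

    public int Allocate(string name, params int[] fields)
    {
        if (_top >= _from.Length) throw new InvalidOperationException("half full: collect first");
        _from[_top] = new Cell(name, fields);
        return _top++;
    }

    // Copies everything reachable from the roots into the idle half,
    // rewrites the roots in place, then swaps the two halves.
    public void Collect(int[] roots)
    {
        var forwarding = new Dictionary<int, int>();
        int free = 0;

        for (int i = 0; i < roots.Length; i++)
            roots[i] = Copy(roots[i], forwarding, ref free);

        // Scan the copies; anything they reference is copied in turn.
        for (int scan = 0; scan < free; scan++)
        {
            int[] fields = _to[scan].Fields;
            for (int f = 0; f < fields.Length; f++)
                fields[f] = Copy(fields[f], forwarding, ref free);
        }

        Array.Clear(_from, 0, _from.Length);   // the old half is discarded in one step
        Cell[] swap = _from; _from = _to; _to = swap;
        _top = free;
    }

    private int Copy(int oldAddress, Dictionary<int, int> forwarding, ref int free)
    {
        int newAddress;
        if (forwarding.TryGetValue(oldAddress, out newAddress)) return newAddress;   // already copied
        Cell source = _from[oldAddress];
        _to[free] = new Cell(source.Name, (int[])source.Fields.Clone());
        forwarding[oldAddress] = free;
        return free++;
    }
}

public static class CopyDemo
{
    public static void Main()
    {
        var heap = new SemiSpaceHeap(4);
        int b = heap.Allocate("B");      // address 0
        heap.Allocate("junk");           // address 1, never referenced
        int a = heap.Allocate("A", b);   // address 2, A references B

        int[] roots = { a };
        heap.Collect(roots);

        Cell liveA = heap.Read(roots[0]);
        Console.WriteLine(liveA.Name);                        // A
        Console.WriteLine(heap.Read(liveA.Fields[0]).Name);   // B
        Console.WriteLine(heap.Used);                         // 2
    }
}
```

Only the two live objects are copied; the unreferenced one is never touched, which is why the cost of a copying collection depends on the number of survivors rather than on the amount of garbage.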
 

References

C# Technical Talk on Garbage Collection Mechanism (GC) (Transfer) - nele - 博客园

Take you to understand in one minute: garbage collection replication algorithm_b_eethoven's blog-CSDN blog_garbage collection replication algorithm

Origin blog.csdn.net/u013617851/article/details/124002702