[C# Study Notes] Memory Management

Official documentation

Automatic memory management

Automatic memory management is one of the services provided by the CLR during managed execution. The common language runtime's garbage collector manages the allocation and deallocation of memory for applications. For developers, this means developing managed applications without having to write code that performs memory management tasks. Automatic memory management solves common problems, such as forgetting to free an object and causing a memory leak, or trying to access the memory of a freed object.

Allocate memory

In general, the memory used by a C# program is divided into four areas (a small sketch after the list shows where each kind of data ends up):

  • Global data area: stores global variables, static data, and constants
  • Code area: stores the program's code
  • Stack area: stores local variables, parameters, return values, return addresses, and other data needed for method calls
  • Heap area: the free store, used for dynamically allocated objects
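
A minimal sketch, with made-up names, of where the different kinds of data described above end up:

```csharp
// Illustrative only: the names are invented, and the exact layout is an
// implementation detail of the CLR rather than a language guarantee.
class Program
{
    const int MaxItems = 100;           // constant: global data area
    static int s_counter;               // static field: global data area

    static void Main()
    {
        int localValue = 42;            // local value type: stack
        var buffer = new int[MaxItems]; // the array object itself: heap
                                        // (the reference 'buffer' lives on the stack)
        s_counter = localValue + buffer.Length;
        System.Console.WriteLine(s_counter);
    }
}
```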

In a process, the two major areas in which code executes are the stack and the heap. On the stack, data is pushed and popped in strict order as methods are called and return, each thread maintains its own thread stack, and stack addresses grow from high to low. (Parts of this section draw on the articles "Understanding C#'s Heap, Stack, Value types and reference types", "An in-depth understanding of the Heap and Stack in C#", and "Detailed explanation of C# garbage collection mechanism".)

Most programmers are already quite familiar with how code executes on the stack; the heap is a different matter. When a new process is initialized, the runtime reserves a contiguous region of address space for the process. This reserved address space is called the managed heap.

The managed heap is dedicated to C#'s automatic memory management and is managed by the CLR.

The managed heap maintains a pointer to the address where the next object will be allocated. Initially, this pointer is set to the base address of the managed heap. Unlike the stack, the heap is built from the bottom up, so all free space sits above the used space. The biggest difference from the stack is that heap objects can be accessed and released freely, in any order.
[Figure: the heap as a messy pile]
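
To make the "bump the pointer" idea concrete, here is a toy sketch; it is not the CLR's allocator, just an illustration of advancing a pointer inside a pre-reserved block:

```csharp
// Toy bump allocator: not the CLR's implementation, only an illustration of
// "allocation = advance a pointer" within a pre-reserved region.
class BumpAllocator
{
    private readonly byte[] _region;   // stands in for the reserved address space
    private int _next;                 // "pointer" to the next free offset

    public BumpAllocator(int capacity) => _region = new byte[capacity];

    // Returns the offset of the newly "allocated" block, or -1 if the region is full.
    public int Allocate(int size)
    {
        if (_next + size > _region.Length)
            return -1;                 // a real GC would collect or grow the heap here
        int start = _next;
        _next += size;                 // bump the pointer past the new object
        return start;
    }
}
```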

As mentioned before, there are two kinds of types in C#. The first is value types, which are stored directly on the stack: assigning one value-type variable to another makes a complete copy, which is also stored on the stack. Operations on the stack are fast, and stack memory allocation and release are simple.

Reference types work differently. On the one hand, the object itself is stored on the heap, where access is more flexible; on the other hand, a variable holding the heap address is kept on the stack. To access a reference type, we simply follow the address stored on the stack. The heap cannot be navigated directly, because its data layout is too irregular: without the reference on the stack, we have no idea where the object sits in the heap.
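
A small example contrasting the two behaviors (the type names are made up for illustration): assigning a value type copies the whole value, while assigning a reference type copies only the reference to the same heap object.

```csharp
struct PointValue { public int X; }   // value type: stored inline / on the stack for locals
class PointRef    { public int X; }   // reference type: the object lives on the heap

class Demo
{
    static void Main()
    {
        var a = new PointValue { X = 1 };
        var b = a;                     // full copy of the value
        b.X = 99;
        System.Console.WriteLine(a.X); // 1: 'a' is unaffected

        var c = new PointRef { X = 1 };
        var d = c;                     // copies the reference, not the object
        d.X = 99;
        System.Console.WriteLine(c.X); // 99: both references point to the same heap object
    }
}
```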

But using the heap causes problems of its own. On the stack, variables pushed during execution are released automatically as the program runs. The heap, by contrast, can be accessed at will: first, if the programmer forgets to release objects that are no longer used, memory is wasted; second, even when heap objects are released, the freed memory leaves holes, and if the remaining contiguous free blocks are too small to use, memory fragmentation occurs. That is why C# uses a managed heap, with the GC mechanism managing it automatically for us.

When an application creates the first reference type, memory is allocated for the type at the base address of the managed heap.
When the application creates the next object, the garbage collector allocates memory for it in the address space immediately following the first object.
The garbage collector will continue to allocate space for new objects in this manner as long as the address space is available.

Allocating memory from the managed heap is faster than allocating unmanaged memory. Since the runtime allocates memory for the object by adding a value to the pointer, this is almost as fast as allocating memory from the stack.
In addition, because consecutively allocated new objects are stored contiguously in the managed heap, applications can access these objects quickly.
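
One rough way to observe the heap growing as consecutive objects are allocated is GC.GetTotalMemory; the exact numbers are only indicative and depend on the runtime:

```csharp
using System;

class AllocationDemo
{
    static void Main()
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);

        // Allocate a batch of small objects; each one is placed right after the
        // previous one in the managed heap by bumping the allocation pointer.
        var objects = new object[10_000];
        for (int i = 0; i < objects.Length; i++)
            objects[i] = new byte[32];

        long after = GC.GetTotalMemory(forceFullCollection: false);
        Console.WriteLine($"Approximate bytes allocated: {after - before}");
    }
}
```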

Summary: the stack mainly stores local variables and method-call information and is faster to work with, while the heap mainly stores dynamically allocated objects and is slower, but supports longer lifetimes and object sharing. For small, short-lived data, consider the stack; for larger objects, or objects that must live for a long time, use the heap.


Free memory

Let us understand how the GC mechanism releases memory.

.NET's garbage collector manages an application's memory allocation and deallocation. Whenever an object is created, the common language runtime allocates memory for the object from the managed heap. As long as there is address space in the managed heap, the runtime will continue to allocate space for new objects. However, memory is not unlimited. The garbage collector must eventually perform a garbage collection to free some memory. The garbage collector's optimization engine determines the best time to perform a collection based on the allocations performed. When performing a collection, the garbage collector examines the managed heap for objects that are no longer used by the application and then performs the necessary operations to reclaim their memory.

Garbage collection occurs when one of the following conditions is met:

  • The system has low physical memory. This is detected either through a low-memory notification from the operating system or through low memory as indicated by the host.

  • The memory used by allocated objects on the managed heap exceeds the acceptable threshold. This threshold is continuously adjusted as the process runs.

  • The GC.Collect method is called. In almost all cases you do not have to call this method, because the garbage collector runs continuously; it is mainly used for special situations and testing, as in the short example below.
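
For completeness, a manually triggered collection looks like this; as noted above, it is mainly for diagnostics and testing rather than production code:

```csharp
using System;

class CollectDemo
{
    static void Main()
    {
        // Forcing a collection is almost never needed; shown here only for testing.
        GC.Collect();                      // collect all generations
        GC.WaitForPendingFinalizers();     // let queued finalizers run
        GC.Collect();                      // collect objects freed by those finalizers

        Console.WriteLine($"Gen 0 collections so far: {GC.CollectionCount(0)}");
    }
}
```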

GC

Overall, C#'s GC and Lua's GC are actually similar.

Reference tracking: The first step of a GC is to determine, via reference tracking, which objects are still referenced by live references. The GC starts from the roots, which include local variables on thread stacks and static variables, and then follows the chains of references from those roots to find all reachable objects.

Root object marking: During the reference tracking process, the GC will mark all reachable objects as live objects. These live objects will be retained and will not be cleaned up.

Mark and sweep algorithm

Object marking: Next, the GC traverses the objects in the heap, starting from those reachable from the roots, and marks them as live. The GC marks objects with an algorithm called "mark and sweep": first, every object in the heap is treated as garbage by default; then the referenced objects are found by tracing from the roots and are marked; finally, all unmarked (garbage) memory is released or flagged as free.

Compaction phase: After the garbage is swept, free memory in the heap becomes scattered. To avoid memory fragmentation, all surviving objects are moved down toward the base of the heap.
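
To make the mark-and-sweep idea concrete, here is a heavily simplified toy sketch in C#; it is not how the CLR's collector is implemented, it only illustrates "mark everything reachable from the roots, then sweep the rest":

```csharp
using System.Collections.Generic;

// Each node stands in for a heap object that may reference other objects.
class Node
{
    public List<Node> References { get; } = new List<Node>();
    public bool Marked;
}

class ToyCollector
{
    // Mark phase: everything reachable from the roots is flagged as live.
    static void Mark(IEnumerable<Node> roots)
    {
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            var node = pending.Pop();
            if (node.Marked) continue;
            node.Marked = true;
            foreach (var child in node.References)
                pending.Push(child);
        }
    }

    // Sweep phase: unmarked nodes are dropped ("collected"); marked ones survive.
    static List<Node> Sweep(List<Node> heap)
    {
        var survivors = new List<Node>();
        foreach (var node in heap)
        {
            if (node.Marked)
            {
                node.Marked = false;   // reset for the next collection
                survivors.Add(node);
            }
        }
        return survivors;
    }

    public static List<Node> Collect(List<Node> heap, IEnumerable<Node> roots)
    {
        Mark(roots);
        return Sweep(heap);
    }
}
```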


Generational algorithm

The generational algorithm is based on statistics: roughly speaking, the longer an object has already been alive, the more heavily it is used, the less likely it is to die soon, and the less often it needs to be examined for collection.

Assumptions of the generational algorithm:
1. Most newly created objects have short lifetimes, while older objects tend to live longer.
2. Collecting part of the heap is faster than collecting the entire heap.
3. Newly created objects are usually strongly related to one another, and since heap allocations are contiguous, this locality helps improve the CPU cache hit rate.

.NET divides the heap into three generational areas: Gen 0, Gen 1, and Gen 2. The mark-and-sweep algorithm is used to remove garbage on each pass, and each generation has its own kind of collection: Gen 0 collections, Gen 1 collections, and Gen 2 collections.

When memory in the Gen 0 area reaches its threshold, a Gen 0 collection is triggered, and surviving objects are promoted into Gen 1.

When memory in the Gen 1 area reaches its threshold, a Gen 1 collection is triggered: surviving Gen 0 objects are promoted into Gen 1, and surviving Gen 1 objects are promoted into Gen 2.

When memory in the Gen 2 area reaches its threshold, a Gen 2 collection is triggered: surviving Gen 0 objects are promoted into Gen 1, surviving Gen 1 objects are promoted into Gen 2, and surviving Gen 2 objects stay where they are.

Gen 0 and Gen 1 are relatively small, and their combined size usually stays around 16 MB; the size of Gen 2 is determined by the application and may reach several GB. As a result, the cost of Gen 0 and Gen 1 collections is very low, while a Gen 2 collection, called a full GC, is usually very expensive. As a rough estimate, a Gen 0 or Gen 1 collection should complete within a few milliseconds to a few tens of milliseconds, whereas a full GC may take several seconds when the Gen 2 heap is large. Generally speaking, the relative frequency of Gen 2, Gen 1, and Gen 0 collections while a .NET application is running should be roughly 1:10:100.
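
The generation an object currently lives in, and how many collections of each generation have happened, can be inspected with GC.GetGeneration and GC.CollectionCount (GC.Collect is used below purely for demonstration):

```csharp
using System;

class GenerationDemo
{
    static void Main()
    {
        var obj = new object();
        Console.WriteLine($"Freshly allocated object is in Gen {GC.GetGeneration(obj)}"); // usually 0

        GC.Collect();   // force a collection (testing only); survivors are promoted
        Console.WriteLine($"After one collection it is in Gen {GC.GetGeneration(obj)}");  // typically 1

        Console.WriteLine($"Highest generation: {GC.MaxGeneration}");                     // 2
        Console.WriteLine($"Gen 0 / 1 / 2 collections: " +
            $"{GC.CollectionCount(0)} / {GC.CollectionCount(1)} / {GC.CollectionCount(2)}");
    }
}
```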

Like Lua, C# calls a finalizer when an object is cleaned up by the GC (if the object defines one).
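
In C#, the finalizer is written with the ~ClassName() syntax; the runtime calls it at some unspecified point after the object becomes unreachable (the class below is purely illustrative):

```csharp
using System;

class FileHandleWrapper
{
    // Finalizer: runs at some unspecified time after the object becomes unreachable,
    // on the finalizer thread. It should only release unmanaged state.
    ~FileHandleWrapper()
    {
        Console.WriteLine("Finalizer called: release the unmanaged handle here.");
    }
}
```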

Large objects and small objects

The GC separates objects into small objects and large objects. When an object is large, some of its properties matter more than they do for small objects. For example, compacting a large object (that is, copying it to another location on the heap) is quite expensive. Therefore, the garbage collector places large objects on the large object heap (LOH).

An object is considered a large object if its size is greater than or equal to 85,000 bytes. This threshold was chosen based on performance tuning. When an allocation request is 85,000 bytes or larger, the runtime allocates it on the large object heap. Large objects belong to generation 2, because they can only be collected during a generation 2 collection.

When the CLR loads, the GC allocates two initial heap segments: one for small objects (the small object heap, or SOH) and one for large objects (the large object heap, or LOH). On the SOH, objects that survive a collection are promoted to the next generation. LOH objects, by contrast, already belong to generation 2 and are never promoted further. Moreover, the large object heap is not compacted, so if a free fragment left behind after collection is smaller than 85,000 bytes, it may never be usable again for the lifetime of the program.
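
The 85,000-byte threshold can be observed directly: an array at or above that size reports generation 2 as soon as it is allocated, while a smaller one starts in generation 0 (a small sketch; exact behavior may vary slightly across runtimes):

```csharp
using System;

class LohDemo
{
    static void Main()
    {
        var small = new byte[1_000];    // well under 85,000 bytes: small object heap
        var large = new byte[85_000];   // at the threshold: large object heap

        Console.WriteLine($"small array generation: {GC.GetGeneration(small)}"); // typically 0
        Console.WriteLine($"large array generation: {GC.GetGeneration(large)}"); // 2 (LOH)
    }
}
```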


.NET’s GC mechanism has two problems:

First, the GC cannot release every kind of resource: it cannot automatically release unmanaged resources.

Second, GC is not real-time, which will cause bottlenecks and uncertainties in system performance.

This is why the IDisposable interface exists. IDisposable defines the Dispose method, which programmers call explicitly to release unmanaged resources, and the using statement simplifies this resource management.
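
A minimal sketch of the Dispose pattern together with the using statement; the wrapped resource here is hypothetical, and production code would usually wrap a SafeHandle instead:

```csharp
using System;

// Minimal sketch of the Dispose pattern for a class holding a hypothetical
// unmanaged resource.
class ResourceHolder : IDisposable
{
    private bool _disposed;

    public void Dispose()
    {
        if (_disposed) return;
        // ... release the unmanaged resource here ...
        _disposed = true;
        GC.SuppressFinalize(this);  // a finalizer (if any) no longer needs to run
    }
}

class Program
{
    static void Main()
    {
        // 'using' guarantees Dispose is called even if an exception is thrown.
        using (var holder = new ResourceHolder())
        {
            // work with the resource
        }
    }
}
```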

Origin blog.csdn.net/milu_ELK/article/details/132105268