Classic garbage collection (GC) algorithms

Garbage Collection (GC)

1. What is garbage

In C, memory obtained with malloc must be released manually with free.
Garbage is created when you forget to free memory that the program no longer needs.

2. Why do we need GC?

Programmers often forget to free memory (memory leak)

It is easy to free the same memory more than once (double free)

Classic GC algorithm

1. Some basic concepts

We think of memory as a directed graph
Each block of data is a node in the directed graph
Each pointer is an edge in the directed graph

Root node: a location that holds a pointer into the heap but is not itself in the heap (for example: registers, locations on the stack, global variables, etc.)
Reachable node: there is a path from some root node to this node
Unreachable node: garbage, which needs to be reclaimed
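
The graph view above can be sketched in a few lines of Python. This is only an illustration: the heap is modelled as a dictionary from block name to the blocks it points to, and the names heap, roots and reachable are made up for this example.

```python
from collections import deque

def reachable(heap, roots):
    """Return the set of block names reachable from the roots.

    heap  : dict mapping block name -> list of block names it points to (edges)
    roots : block names referenced from outside the heap (stack, globals, ...)
    """
    seen = set()
    queue = deque(r for r in roots if r in heap)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(heap[node])   # follow every outgoing pointer
    return seen

# Hypothetical heap: A -> B -> C, while D and E only point at each other.
heap = {"A": ["B"], "B": ["C"], "C": [], "D": ["E"], "E": ["D"]}
roots = ["A"]                      # e.g. a stack variable holding a pointer to A

live = reachable(heap, roots)
garbage = set(heap) - live
print(sorted(live))                # ['A', 'B', 'C']
print(sorted(garbage))             # ['D', 'E']
```

D and E still point at each other, but no path from a root reaches them, so both count as garbage.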

2. Mark and Sweep

It can be built on top of an existing malloc/free package

Question 1: When to start executing the algorithm?

Start when space runs out (keep allocating memory with malloc until a request cannot be satisfied)

Question 2: What does the algorithm do once space has run out?

Mark: starting from the root nodes, set the mark bit of every reachable node
Sweep: scan all nodes and free those that are not marked

Question 3: Where is the mark bit stored?

Use an extra bit in the header of each block
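
One possible way to keep such a bit in the header is sketched below. The layout is an assumption made for illustration, not something the text above specifies: block sizes are taken to be multiples of 8, so the low three bits of the size word are free to hold flags, with bit 0 as the allocated bit and bit 1 as the mark bit.

```python
ALLOC_BIT = 0x1   # assumed: bit 0 of the header means "block is allocated"
MARK_BIT = 0x2    # assumed: bit 1 of the header is the GC mark bit
FLAG_MASK = 0x7   # sizes are assumed to be multiples of 8, so the low 3 bits are spare

def make_header(size, allocated=True):
    assert size % 8 == 0, "this sketch assumes 8-byte aligned block sizes"
    return size | (ALLOC_BIT if allocated else 0)

def block_size(header):
    return header & ~FLAG_MASK     # strip the flag bits to recover the size

def set_mark(header):
    return header | MARK_BIT

def clear_mark(header):
    return header & ~MARK_BIT

def is_marked(header):
    return bool(header & MARK_BIT)

h = make_header(32)
h = set_mark(h)
print(is_marked(h), block_size(h))   # True 32
h = clear_mark(h)
print(is_marked(h), block_size(h))   # False 32
```

Setting or clearing the mark never disturbs the stored size, which is why the sweep phase can still step from block to block.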

Specific interpretation: [figure: mark-and-sweep example, image not included]

Pseudocode:

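void GC(){
    HaltAllProcessing();                   // pause the program while the collector runs
    ObjectCollection roots=GetRoots();     // find all root nodes
    for(int i=0;i<roots.Count();i++){
        Mark(roots[i]);                    // mark everything reachable from this root
    }
    Sweep();                               // free every block that was not marked
}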

It consists of three phases:

Collect all root nodes

Mark all reachable nodes starting from the roots

Finally, sweep

Phase 1: Collect all root nodes

The runtime system needs to give the GC some way to obtain the list of root nodes
For example: .NET maintains these root nodes and provides the GC with an API to collect them

Phase 2: Mark all reachable nodes starting from the roots
Pseudocode:

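void mark(ptr p){
    if(!is_ptr(p)) return;      // p is not a pointer: do nothing
    if(markBitSet(p)) return;   // p is already marked: return (this also stops cycles from looping forever)
    setMarkBit(p);              // mark p
    for(i=0;i<length(p);i++){
        // walk the block and follow every word that may point to another node
        mark(p[i]);
    }
    return;
}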

Note: I haven't yet understood how to judge whether p is a pointer; I will write that part up once I understand it.

Phase 3: Sweep
Pseudocode:

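void sweep(ptr p,ptr end){
    while (p<end){
        if (markBitSet(p)){
            clearMarkBit(p);   // marked (live): clear the mark bit, ready for the next collection
        }else if (allocateBitSet(p)){
            free(p);           // not marked but allocated: garbage, so free this block
        }
        p+=length(p);          // add the block length to reach the next block, so the whole heap is traversed
    }
}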
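
The three phases can be put together in a small runnable Python model. It is a sketch under simplifying assumptions, not a real allocator: the heap is just a list of Block objects, the mark bit is a boolean field, and "freeing" a block means dropping it from the list. The names Block, mark, sweep and collect are chosen for this example.

```python
class Block:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing pointers to other blocks
        self.marked = False   # stands in for the mark bit in the header

def mark(block):
    if block is None or block.marked:
        return                # nothing to do, or already visited
    block.marked = True
    for child in block.refs:
        mark(child)

def sweep(heap):
    """Return (surviving blocks, names of freed blocks) and reset mark bits."""
    survivors, freed = [], []
    for block in heap:
        if block.marked:
            block.marked = False      # ready for the next collection
            survivors.append(block)
        else:
            freed.append(block.name)  # unmarked, so it is garbage
    return survivors, freed

def collect(heap, roots):
    for r in roots:
        mark(r)
    return sweep(heap)

a, b, c, d = (Block(n) for n in "abcd")
a.refs.append(b)          # a -> b
c.refs.append(d)          # c and d form a cycle that no root can reach
d.refs.append(c)
heap = [a, b, c, d]

heap, freed = collect(heap, roots=[a])
print([blk.name for blk in heap])   # ['a', 'b']
print(freed)                        # ['c', 'd']
```

The cycle between c and d is collected, because marking is driven by reachability from the roots rather than by counting references.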

Advantages and disadvantages
Advantages:

You don't have to write your own code to free memory. When the heap is full, the collector runs automatically.
It finds all the memory that should be freed, because everything unreachable from the roots is left unmarked

Disadvantages:

While this GC algorithm runs, the rest of the program is paused, so there may be a sudden drop in performance.
As shown in the figure above: it easily causes memory fragmentation, because blocks are freed in place and the free space ends up scattered between live blocks

3. Copying collection

Basic idea: use 2 heaps

One heap is used when the program is running
One heap is used only during GC

Steps performed by the GC algorithm:

Traverse the reachable data from the root nodes

Copy reachable data from from-space (the heap used while the program is running) to to-space (the heap used during GC)
Note: Unreachable nodes stay in from-space

Swap the two heaps (that is, the original from-space becomes to-space, and the original to-space becomes the new from-space)
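
The steps above can be sketched as follows. This is a simplified model in the style of a breadth-first copying collector; the Obj class, the forward field (a forwarding pointer that records where an object was copied to) and copy_collect are names assumed for the example, and the two heaps are plain Python lists.

```python
class Obj:
    def __init__(self, name, refs=None):
        self.name = name
        self.refs = refs if refs is not None else []
        self.forward = None   # forwarding pointer, set once the object has been copied

def copy_collect(roots):
    """Copy everything reachable from roots into a fresh to-space.

    Returns (new_roots, to_space); unreachable objects are never touched.
    """
    to_space = []

    def copy(obj):
        if obj.forward is None:
            clone = Obj(obj.name, list(obj.refs))   # refs still point into from-space
            obj.forward = clone
            to_space.append(clone)
        return obj.forward

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):                     # scan the copies in order
        clone = to_space[scan]
        clone.refs = [copy(child) for child in clone.refs]   # redirect into to-space
        scan += 1
    return new_roots, to_space

x, y, z, junk = Obj("x"), Obj("y"), Obj("z"), Obj("junk")
x.refs = [y, z]
z.refs = [x]          # a reachable cycle
junk.refs = [x]       # junk points at live data but nothing points at junk
from_space = [x, y, z, junk]

roots, to_space = copy_collect([x])
from_space, to_space = to_space, []   # swap roles; the old from-space is dropped as a whole
print([o.name for o in from_space])   # ['x', 'y', 'z']
```

The live objects end up packed next to each other in the new heap, so this scheme compacts as a side effect and avoids the fragmentation seen with mark and sweep.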

Specific interpretation: [figure: copying collection example, image not included]

Advantages and disadvantages
Advantages:

This algorithm runs faster than mark and sweep, because it makes only one pass over the reachable data and never has to visit the garbage.

Disadvantages:

The disadvantage is also obvious: only half of the heap space is in use at any time (a typical trade of space for time)

4. Reference Counting

The basic idea:

Keep track of the number of pointers to each object
When the reference count is 0, the object is unreachable garbage

Specific interpretation: [figure: reference counting example, image not included]

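Advantages and disadvantages:
Advantages:

This algorithm is incremental: the bookkeeping is done as part of ordinary pointer operations (allocation, assignment, and so on), and an object can be reclaimed as soon as its count drops to 0.

Disadvantages:

Unreachable cyclic structures (for example, a circular list) cannot be detected, because every node in the cycle keeps a count of at least 1
High cost of counting: every pointer update needs a reference count increment or decrement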
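
A toy model makes both the immediate reclamation and the cycle problem visible. It is only a sketch: the Heap class, its count and fields tables, and the method names are assumptions for illustration, and objects are identified by plain strings.

```python
class Heap:
    def __init__(self):
        self.count = {}    # object name -> reference count
        self.fields = {}   # object name -> names of the objects it points to
        self.freed = []    # order in which objects were reclaimed

    def new(self, name):
        self.count[name] = 0
        self.fields[name] = []

    def add_ref(self, target):
        self.count[target] += 1

    def drop_ref(self, target):
        self.count[target] -= 1
        if self.count[target] == 0:
            # Reclaim right away and release everything this object pointed to.
            for child in self.fields.pop(target):
                self.drop_ref(child)
            del self.count[target]
            self.freed.append(target)

    def point(self, src, dst):
        self.fields[src].append(dst)
        self.add_ref(dst)

h = Heap()
for n in ("a", "b", "p", "q"):
    h.new(n)

h.add_ref("a")        # root -> a
h.point("a", "b")     # a -> b
h.add_ref("p")        # root -> p
h.point("p", "q")     # p -> q
h.point("q", "p")     # q -> p closes a cycle

h.drop_ref("a")       # the root lets go of the chain
h.drop_ref("p")       # the root lets go of the cycle

print(h.freed)        # ['b', 'a']
print(h.count)        # {'p': 1, 'q': 1}
```

The chain is reclaimed the moment its last reference disappears, while p and q keep each other's count at 1 forever even though nothing can reach them.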

Current applications

Java does not use reference counting (mainstream JVMs use tracing collectors)
Python (CPython) uses reference counting, supplemented by a periodic cycle detector
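
The Python behaviour can be observed directly with the standard gc and weakref modules. The snippet below assumes CPython; other Python implementations manage memory differently. The Node class and the variable names are just for the demonstration.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

gc.disable()                  # keep the periodic detector from running during the demo
gc.collect()                  # start from a clean state

a, b = Node(), Node()
a.other, b.other = b, a       # a and b reference each other
probe = weakref.ref(a)        # a weak reference does not keep the object alive

del a, b                      # unreachable now, but each count is still 1
alive_before = probe() is not None
gc.collect()                  # run the cycle detector by hand
alive_after = probe() is not None
gc.enable()

print(alive_before)           # True: reference counting alone cannot free the cycle
print(alive_after)            # False: the cycle detector reclaimed it
```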

5. Generational GC

According to empirical observations:

If an object has stayed reachable for a long time, it is likely to stay that way
In most languages: most objects die young

Conclusion

We can save work by scanning young generation objects often and rarely scanning old generation objects

Darwin's theory of evolution: New species are always the easiest to weed out.

Specific interpretation: [figure: generational GC example, image not included]

The basic idea

Allocate objects to different generations G0, G1...
G0 contains the newest generation of objects, which are the most likely to become garbage
G0 is scanned more often than G1, so efficiency is greatly improved
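
A two-generation version of this idea can be sketched as below. Everything here is an assumption made to keep the example short: the names GenHeap, minor_gc and major_gc, the policy of promoting an object after surviving a single G0 scan, and above all the way old-to-young pointers are handled. Real collectors track those pointers with a remembered set or write barrier; this sketch simply treats every pointer stored in a G1 object as a root.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []

def trace(starts, allowed=None):
    """Ids of objects reachable from starts; with allowed set, never leave that id set."""
    seen, stack = set(), list(starts)
    while stack:
        o = stack.pop()
        if id(o) in seen or (allowed is not None and id(o) not in allowed):
            continue
        seen.add(id(o))
        stack.extend(o.refs)
    return seen

class GenHeap:
    def __init__(self):
        self.g0, self.g1 = [], []            # young and old generations
        self.g0_scans = self.g1_scans = 0

    def alloc(self, name):
        o = Obj(name)
        self.g0.append(o)                    # new objects always start in G0
        return o

    def minor_gc(self, roots):
        """Collect G0 only; survivors are promoted to G1."""
        young = {id(o) for o in self.g0}
        # Simplification: every pointer held by an old object counts as a root.
        starts = list(roots) + [c for old in self.g1 for c in old.refs]
        live = trace(starts, allowed=young)
        self.g1.extend(o for o in self.g0 if id(o) in live)
        self.g0 = []
        self.g0_scans += 1

    def major_gc(self, roots):
        """Collect both generations."""
        live = trace(roots)
        self.g1 = [o for o in self.g0 + self.g1 if id(o) in live]
        self.g0 = []
        self.g0_scans += 1
        self.g1_scans += 1

heap = GenHeap()
keep = heap.alloc("config")                  # long-lived object
for round_no in range(1, 7):
    for i in range(3):
        heap.alloc(f"tmp{round_no}.{i}")     # short-lived temporaries
    if round_no % 3 == 0:
        heap.major_gc([keep])
    else:
        heap.minor_gc([keep])

print([o.name for o in heap.g1])             # ['config']
print(heap.g0_scans, heap.g1_scans)          # 6 2
```

The temporaries all die in G0, so most collections only look at a handful of young objects, and the long-lived object is examined again only on the two full collections.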

Summary

Reference counting is a common solution to the problems of explicit memory management. The code that increments and decrements the counts on every assignment is often one reason for slow programs. Reference counting is not a complete solution anyway, since circular references are never deleted.
Garbage collection will only run when memory becomes tight. When memory is ample, the program will run at full speed and will not spend any time freeing memory.
Modern garbage collectors are much more advanced than the slow garbage collectors of the past. Generational, copying collectors largely overcome the inefficiencies of the earlier mark-and-sweep algorithms.
Modern garbage collectors do heap compaction. Heap compaction will reduce the number of pages referenced by the program, which means that memory accesses will have a higher hit rate and less swapping.
Programs that use garbage collection do not crash due to the accumulation of memory leaks. Programs that use GC have more long-term stability. Programs that use garbage collection have fewer hard-to-find pointer errors. This is because there are no dangling pointers to freed memory. Because there is no explicit memory management code, there can be no corresponding errors.
Programs that employ garbage collection are faster to develop and debug, because no explicit release code is developed, debugged, tested, or maintained.

Garbage collection is not a magic bullet. It has the following shortcomings:

It is unpredictable when a collection will run, so the program may pause unexpectedly.
There is no upper bound on the time a collection takes. Although in practice it is usually fast, there is no guarantee of this.
All threads except the collector are stopped while the collection is in progress.
The garbage collector may keep some memory that should have been reclaimed. In practice this is not a big problem, for two reasons: programs that free memory explicitly usually leak some of it, which eventually makes them use far more memory; and explicit deallocators usually put freed memory back into their own internal pool instead of returning it to the operating system.
Garbage collection should be implemented as a basic operating system kernel service. But because this is not the case, programs that use garbage collection are forced to carry their garbage collection implementation around with them. Although this implementation can be made into a shared DLL, it is still part of the program.