"Garbage Collection Algorithm Handbook The Art of Automatic Memory Management" - Comparison of Reference Counting and Garbage Collectors (Notes)

5. Reference Counting

In reference counting, an object's liveness can be determined directly as references to it are created or deleted. Unlike a tracing collector, there is no need to first traverse the heap to find all live objects and then deduce that every unvisited object is garbage.

The reference counting algorithm relies on a very simple invariant:

  • An object may be alive if and only if the number of references pointing to it is greater than zero.

In the reference counting algorithm, each object needs an associated reference count, usually stored in a slot in the object's header.

Algorithm 5.1 shows the simplest implementation of reference counting: it increments or decrements an object's reference count whenever a reference to the object is created or deleted.

  • The Write method increments the reference count of the new target object and decrements that of the old target, even for updates of local variables. We also assume that before a method returns, the mutator sets all references in its local variables to null.

  • The addReference method increments an object's reference count; correspondingly, deleteReference decrements it.

Note that the reference count updates must be ordered increment-first, decrement-second (lines 9~10 in Algorithm 5.1); otherwise, when the new target is the same as the old one, that is, src[i] == ref, the object may be reclaimed prematurely. Once an object's reference count drops to zero (line 20 in Algorithm 5.1), it can be reclaimed, and the reference counts of all its children are decremented at the same time, which may trigger their recursive reclamation.

[Algorithm 5.1: simple reference counting (figure not included)]
The Write method in Algorithm 5.1 is an example of a write barrier: the compiler emits a short code sequence around the actual pointer write operation.
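
As an illustration, here is a minimal C++ sketch of such a barrier, under the assumption of a toy Object layout with an rc header field and a slots vector of pointer fields (names and layout are ours, not the book's):

```cpp
#include <cstddef>
#include <vector>

struct Object {
    int rc = 0;                     // reference count, kept in the header
    std::vector<Object*> slots;     // the object's pointer fields
};

void deleteReference(Object* obj);

void addReference(Object* obj) {
    if (obj != nullptr) obj->rc++;
}

// Freeing an object drops its references to its children, which may
// recursively free them in turn.
void freeObject(Object* obj) {
    for (Object* child : obj->slots) deleteReference(child);
    delete obj;                     // stands in for returning obj to the heap
}

void deleteReference(Object* obj) {
    if (obj != nullptr && --obj->rc == 0) freeObject(obj);
}

// Write barrier: increment the new target before decrementing the old one.
// The reverse order would free the target prematurely when src->slots[i]
// and ref are the same object.
void write(Object* src, std::size_t i, Object* ref) {
    addReference(ref);
    deleteReference(src->slots[i]);
    src->slots[i] = ref;
}
```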

Reference counting work may also run concurrently with the mutator:

  • either immediately, as part of the mutator's reference counting operations, or asynchronously in another thread.

Collectors may also process objects in different regions of the heap at different frequencies, as generational collectors do. In these cases the mutator must execute additional barrier operations to keep the collection algorithm correct.

5.1 Advantages and disadvantages of reference counting

Advantages

  1. The memory management costs of reference counting are distributed throughout the program's execution, and an object can be reclaimed as soon as it becomes garbage (though we will see later that this is not always an advantage). Reference counting can therefore keep operating on a heap that is nearly full, without the reserved headroom a tracing collector requires.

  2. Reference counting operates directly on the sources and targets of pointers, so its locality is no worse than that of the application it serves. When the application can determine that an object is unshared, it can modify the object in place without first making a copy.

  3. Reference counting can be implemented without runtime-system support; in particular, the program's roots need not be identified. Even when the system is only partially available, reference counting can still reclaim some memory, which is very useful in distributed systems.

Disadvantages

  1. Reference counting imposes extra time overhead on the mutator.

  2. To avoid the premature reclamation that multi-threaded races can cause, increments and decrements of reference counts and the loads and stores of the pointers they accompany must together be atomic; making only the count updates atomic is not enough. Smart-pointer libraries that provide reference counting therefore demand care from their callers: developers must avoid racy updates of pointer slots, or undefined behavior may result (see the sketch after this list).

  3. In simple reference counting, even a read-only operation requires a write to memory (to update the count). Likewise, overwriting a pointer field requires a read and a write on the object the field previously referred to. These writes "pollute" the cache and can cause extra memory contention.

  4. Reference counting cannot reclaim cyclic data structures (structures that refer back to themselves). Even when such a structure becomes an island in the object graph (that is, unreachable as a whole), the reference counts of its constituent objects never drop to zero. Yet cyclic structures are quite common (doubly linked lists, trees whose children point back to the root, and so on), although their frequency varies widely from program to program.

  5. In the worst case, an object's reference count may equal the total number of objects in the heap, so the count field must in principle be as wide as a pointer field, a full slot. Given that objects in object-oriented languages are typically small (Java objects usually run 20 to 64 bytes), this space overhead is very expensive.

  6. Reference counting can still cause pauses: deleting the last reference to the root of a large pointer structure triggers recursive deletion of every descendant node.
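
The race caveat in point 2 can be seen with C++'s reference counted smart pointer: std::shared_ptr updates its control-block count atomically, yet that alone does not make a shared slot race-free. A small sketch (the safe variant assumes C++20's std::atomic<std::shared_ptr>):

```cpp
#include <atomic>
#include <memory>
#include <thread>

// Unsafe pattern: the control-block count updates of std::shared_ptr are
// atomic, but concurrent load and store of the *same* shared_ptr object
// are a data race -- a reader can copy a pointer whose count the writer
// is in the middle of releasing:
//   std::shared_ptr<int> slot;   // do not share bare across threads
//
// Safe pattern (C++20): the pointer load/store and the count update
// happen as a single atomic operation on the slot.
std::atomic<std::shared_ptr<int>> slot{std::make_shared<int>(1)};

int main() {
    std::thread writer([] { slot.store(std::make_shared<int>(2)); });
    std::thread reader([] { std::shared_ptr<int> local = slot.load(); });
    writer.join();
    reader.join();
    return 0;
}
```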

5.2 Improving efficiency

The efficiency of the reference counting algorithm can be improved in two ways:

  1. Reduce the number of barrier operations
  2. Replace expensive synchronous operations with cheaper asynchronous operations.
  • Deferral: deferred reference counting trades the promptness of fine-grained reclamation (reclaiming objects the moment they become garbage) for efficiency. It postpones the identification of some garbage objects to a collection phase at the end of an epoch, thereby avoiding some barrier operations.

  • Coalescing: many reference counting operations are transient and, in effect, unnecessary. Developers can remove some by hand, and in special cases a compiler can too, but a more general method is to track an object's state only at the beginning and end of an epoch. Within an epoch, coalesced reference counting records only the first modification of each object; later modifications of the same object are ignored.

  • Buffering: buffered reference counting also delays the identification of garbage. Unlike deferral or coalescing, however, it buffers all reference count increments and decrements for later processing, and only the collector thread may actually modify counts. Buffered reference counting is about when count changes are applied, not whether they are needed.

5.3 Deferred reference counting

  1. Compared with simple tracing collection, reference counting operations impose a relatively high overhead on the mutator.

  2. Reference count changes must be atomic and must remain consistent with the corresponding pointer changes.

  3. A write modifies both the old and the new target objects, which can pollute the cache with data that will not be reused soon.

  4. Manually removing redundant reference counting operations is error-prone; compiler optimization has proven an effective solution to this problem.

For these reasons, most high-performance reference counting systems use a deferred reference counting strategy.

Most pointer loads place the pointer in a local or temporary variable, that is, a register or stack slot. Under deferral, only when the mutator writes a pointer into a heap object does it adjust the target object's reference count.

Figure 5.1 shows an abstract view of deferred reference counting: count changes caused by operations on heap objects are applied immediately, while those caused by operations on stacks or registers are deferred.

This of course comes at a cost:

  • If reference counting operations on local variables are ignored, counts are no longer accurate, so it is no longer safe to reclaim an object as soon as its count reaches zero.

  • To guarantee that all garbage is eventually collected, deferred reference counting must introduce periodic stop-the-world pauses to correct the counts; fortunately, these pauses are usually shorter than those of tracing collectors such as mark-sweep.

Deferral makes counts inaccurate because references from local variables are not reflected in an object's count (local variables create and drop references so frequently that counting them immediately would be wasteful). When collection is required, objects whose recorded count is zero (the entries of the zero count table) must therefore be checked for reachability from the roots before they can be reclaimed.

[Figure 5.1: deferred reference counting (figure not included)]

  1. In Algorithm 5.2, the Read operation the mutator uses to load an object is the simple, unbarriered one introduced in Chapter 1, and writing a reference into a root is also unbarriered (line 14 of Algorithm 5.2). Writing a reference into a heap object, however, requires a barrier, and in that case the new target's count must be incremented immediately (line 17 of Algorithm 5.2).

  2. When an object's reference count reaches zero, the write barrier adds it to the zero count table (ZCT) rather than freeing it immediately (line 26 of Algorithm 5.2), because the program may still hold a stack reference to the object that is not reflected in its count.

The ZCT can be implemented in various ways, for example as a bitmap or a hash table. Conceptually it holds the objects whose reference count is zero but which may still be live. When the mutator writes a reference to a ZCT object into a heap object, the object can be removed from the ZCT, since its count must then be positive (line 19 of Algorithm 5.2); this also helps bound the table's size.

  1. Collection becomes necessary when available heap memory is exhausted (for example, when an allocation fails). The collector suspends all mutator threads and checks whether the counts of the objects in the ZCT are really zero: a ZCT object is live only if it is referenced by one or more roots.

The simplest way to identify the live objects in the ZCT is to scan the objects the roots point to and increment their counts (line 29 of Algorithm 5.2). Afterwards, every root-referenced object necessarily has a positive count, and the objects whose counts are still zero are garbage.

  2. To reclaim that garbage, the heap can be swept in a mark-sweep style (as in Algorithm 2.3), finding and freeing every "unmarked" object with a zero count; alternatively, scanning only the ZCT achieves the same effect, processing and freeing its entries in the manner of Algorithm 5.1.

  3. Finally, the "marking" must be undone: the roots are scanned again and the counts of their targets decremented back to their original values. Any object whose count thereby returns to zero is re-entered into the ZCT.

Deferred reference counting thus eliminates the count-maintenance overhead of the mutator's operations on local variables.

Earlier studies showed that deferral can remove 80% or more of the pointer-operation overhead; counting its locality benefits, its advantage on modern hardware should be greater still.

However, count operations on object pointer fields cannot be deferred: they must be performed immediately, and atomically.
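
A minimal sketch of the idea in C++, assuming a toy Object layout, a global ZCT, and a roots() hook standing in for stack and register scanning (all names are illustrative, not Algorithm 5.2's own code):

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

struct Object {
    int rc = 0;                       // counts heap references only
    std::vector<Object*> slots;       // pointer fields
};

std::unordered_set<Object*> zct;      // zero count table
std::vector<Object*>& roots() {       // stand-in for stack/register scanning
    static std::vector<Object*> r;
    return r;
}

// Write barrier for heap slots only; local-variable updates are unbarriered.
void writeHeap(Object* src, std::size_t i, Object* ref) {
    if (ref != nullptr) {
        ref->rc++;
        zct.erase(ref);               // count now positive: leave the ZCT
    }
    Object* old = src->slots[i];
    if (old != nullptr && --old->rc == 0)
        zct.insert(old);              // defer: a stack reference may remain
    src->slots[i] = ref;
}

// Stop-the-world collection: reconcile the ZCT against the roots.
void collect() {
    for (Object* r : roots()) r->rc++;            // 1. count the roots in
    std::vector<Object*> work(zct.begin(), zct.end());
    while (!work.empty()) {                       // 2. sweep true zero counts
        Object* obj = work.back(); work.pop_back();
        if (obj->rc != 0) continue;               //    root-referenced: live
        zct.erase(obj);
        for (Object* child : obj->slots)
            if (child != nullptr && --child->rc == 0)
                work.push_back(child);
        delete obj;                               //    stands in for free()
    }
    for (Object* r : roots())                     // 3. undo the root increments
        if (--r->rc == 0) zct.insert(r);
}
```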

[Algorithm 5.2: deferred reference counting (figure not included)]

5.4 Coalesced reference counting

Deferred reference counting removes the counting overhead of operations on local variables, but when the mutator stores a reference into a heap object, the counting overhead remains.

Levanoni and Petrank observed that, for any object field and any epoch, the collector needs only the field's state at the beginning and end of the epoch; the counting operations in between can be ignored, so many intermediate states coalesce into two.

For example, suppose a pointer field f of object X initially refers to object O0, and during an epoch the field is successively updated to O1, O2, ..., On. The naive count updates would be:

  inc(O1), dec(O0); inc(O2), dec(O1); ...; inc(On), dec(On-1)

The pairs of operations on each intermediate object cancel out, so only inc(On) and dec(O0) need be applied.

The method of Levanoni and Petrank works as follows.

  1. In each epoch, the write barrier copies an object's state to a thread-local journal before the object is modified for the first time.

  2. When the mutator updates a pointer field of an object not yet modified in the current epoch, Algorithm 5.3 captures the operation and records the object's address and the values of all its pointer fields in the thread-local update buffer (line 5 of Algorithm 5.3), marking the object dirty.

  3. In the Log method, to avoid adding the same object to the thread-local log repeatedly, the algorithm first appends the original values of the object's pointer fields to the log (line 11 of Algorithm 5.3), commits the entry only if src is still clean (appendAndCommit), and then advances the log's internal cursor (line 13 of Algorithm 5.3). The object is marked dirty by writing the address of its log entry into its header field, which also distinguishes a dirty header from an ordinary pointer field.

Note that even if a race causes entries for the same object to appear in several thread-local buffers, the algorithm guarantees that every such entry holds the same information, so it does not matter which thread's buffer the entry recorded in the object's header lives in. Depending on the processor's memory consistency model, the write barrier may well need no synchronization at all.

[Algorithm 5.3: the coalescing write barrier (figure not included)]
At the start of a collection cycle, Algorithm 5.4 first stops every mutator thread, then merges each thread's update buffer into the collector's log, and finally hands each thread a fresh update buffer.

  1. As noted above, races can leave entries for one object in several threads' buffers, so the collector must ensure each dirty object is processed only once: processReferenceCounts checks whether an object is dirty before touching any counts.

  2. For an object marked dirty, the collector first clears the dirty flag so the object cannot be processed twice, then increments the counts of all the children it has at collection time, and finally decrements the counts of all the children it had before its first modification in the epoch.

  3. The object's previous children can be read directly from the log, and its current children from the object itself (the log holds a reference to the object). The algorithm can also prefetch objects or count fields in both the increment loop and the decrement loop.
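
Here is a compact C++ sketch of the scheme, assuming a toy Object whose header word doubles as the dirty flag, with inc and dec supplied by the surrounding collector (names are illustrative; the real Algorithms 5.3 and 5.4 differ in detail):

```cpp
#include <cstddef>
#include <vector>

struct Object;

struct LogEntry {
    Object* obj;                      // the modified object
    std::vector<Object*> oldSlots;    // its pointer fields before modification
};

struct Object {
    LogEntry* logEntry = nullptr;     // header word: non-null means dirty
    std::vector<Object*> slots;
};

thread_local std::vector<LogEntry*> updateBuffer;   // thread-local journal

// Log src's pre-modification state the first time it is written this epoch.
void log(Object* src) {
    if (src->logEntry != nullptr) return;             // already dirty
    LogEntry* entry = new LogEntry{src, src->slots};  // snapshot original fields
    updateBuffer.push_back(entry);                    // appendAndCommit, roughly
    src->logEntry = entry;                            // mark dirty via the header
}

// Coalescing write barrier: no count updates at all on this fast path.
void write(Object* src, std::size_t i, Object* ref) {
    log(src);
    src->slots[i] = ref;
}

// Collector side, run after merging all threads' buffers into one log:
// each dirty object is processed exactly once.
void processReferenceCounts(std::vector<LogEntry*>& collectorLog,
                            void (*inc)(Object*), void (*dec)(Object*)) {
    for (LogEntry* e : collectorLog) {
        if (e->obj->logEntry != nullptr) {            // first entry for this object
            e->obj->logEntry = nullptr;               // clear the dirty mark
            for (Object* cur : e->obj->slots) if (cur) inc(cur);  // current children
            for (Object* old : e->oldSlots)   if (old) dec(old);  // previous children
        }
        delete e;                                     // duplicates are just dropped
    }
    collectorLog.clear();
}
```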

[Algorithm 5.4 and Figure 5.2 (figures not included)]

We take Figure 5.2 as an example to walk through coalesced reference counting.

Suppose one of object A's pointer fields is changed from object C to some other object and finally to object D within an epoch. At the end of the epoch, the original values of A's two pointer fields (B and C) are in the collector's log (left side of Figure 5.2), so the collector increments the counts of B and D (A's current children) and decrements the counts of B and C (its previous children).

Since the field of A that points to B was never modified, B's count is unchanged on balance. Combining deferred with coalesced reference counting removes the cost of most reference counting operations from the mutator.

In particular, these two techniques free the mutator thread from expensive synchronization. The benefits have a price, though. Collection again requires a pause, if likely a shorter one than a tracing collector's; reclamation is less prompt (garbage is reclaimed only at the end of an epoch); and the log buffers and the ZCT cost extra space. Moreover, under coalescing, even a pointer slot that is never modified may still oblige the collector to increment and decrement its target's count once.


5.5 Cyclic reference counting

In a cyclic data structure, every internal object has a reference count of at least 1, so cyclic garbage cannot be reclaimed by reference counts alone.

Cyclic data structures, such as doubly linked lists and circular buffers, are common in applications and runtime systems alike. Object-relational mapping systems may require a database and its tables to refer to each other.

Some real-world structures are naturally cyclic, such as roads in geographic information systems, and lazy functional languages often use cycles to express recursion. Researchers have proposed a variety of strategies for collecting cycles under reference counting; we introduce a few of them.

The simplest strategy is to back up reference counting with occasional tracing collection.

This approach assumes that most objects do not belong to cyclic structures, so reference counting reclaims most garbage promptly while tracing handles the remaining cycles. The scheme simply reduces how often tracing collections run.

Many researchers have suggested distinguishing the pointers that can close cycles from the rest.

They call ordinary references strong, and cycle-closing references weak.

If strong references are never allowed to form a cycle, the graph of strong references can be handled by a standard reference counting algorithm. Brownbridge's algorithm is probably the best known. In brief, every object carries both a strong and a weak reference count; on each write, the barrier checks the strength of the pointer and of the target, and turns any reference that could close a cycle into a weak one. To maintain the invariant that all reachable objects are strongly reachable while strong references form no cycles, the mutator may also have to flip the strength of pointers when references are deleted.

This algorithm, however, is not safe and can reclaim objects prematurely; see Salkild's counterexample for details.

Of the reference counting algorithms that can handle cyclic data structures, the most widely adopted is trial deletion.

Rather than tracing the whole graph of live objects with a backup collector, trial deletion concentrates on the local subgraphs where deleted references may have created cyclic garbage. Under reference counting:

  • inside a cyclic garbage structure, every object's reference count is due entirely to pointers among the structure's own members;
  • cyclic garbage can arise only when deleting a reference to an object leaves that object's count above zero.

Partial tracing exploits these two observations: it traces the subgraph reachable from an object suspected of being garbage.

For each reference traversed, the algorithm performs a trial deletion, temporarily decrementing the target's count and thus cancelling the contribution of internal pointers. If, when the trace completes, an object's count is still non-zero, the object must be referenced from outside the subgraph, and it and its transitive closure are therefore not garbage.

Trial deletion, in short: decrement counts along the internal references, then test each count for zero; objects that reach zero become candidates, possibly kept alive only by the cycle. Undoing a trial deletion is simply adding the 1 back.

The Recycler algorithm collects reference counted cycles concurrently. Algorithm 5.5 shows only the simpler, synchronous version; the asynchronous version is discussed in Chapter 15.

Collection of cyclic structures proceeds in three phases:

  1. The collector traces the subgraph from an object that may belong to a garbage cycle, removing the counts due to internal pointers as it goes (the markCandidates method). Objects traversed are colored grey.

  2. It then checks every object in the subgraph. If an object's count is non-zero, the object must be referenced from outside the subgraph, and the trial deletions of the first phase must be undone for it (the scan method). Live grey objects are re-colored black; the remaining grey objects are colored white.

  3. Every object still white in the subgraph is garbage, and the algorithm reclaims it (the collectCandidates method).

[Algorithm 5.5: the synchronous Recycler (figure not included)]

The synchronous Recycler distinguishes objects with five colors (the fifth, green, is introduced below):

  • black: live
  • white: garbage
  • grey: possibly a member of a garbage cycle
  • purple: possibly the root of a garbage cycle (a candidate)

After a reference deletion, two cases arise:

  1. If deleting a reference leaves the object's count above zero, a garbage cycle may have been created, so Algorithm 5.5 colors the object purple and adds it to the set of cycle candidates (line 22 of Algorithm 5.5).

  2. If deleting a reference drops the object's count to zero, the object is certainly garbage: the release method colors it black and processes its children recursively. If the object is not in the candidates set, release frees it at once; if it is, its reclamation is postponed to the markCandidates phase. In Figure 5.3a, deleting a reference to object A leaves A's count non-zero, so A is added to the candidates set.

Steps

  1. The markCandidates method first bounds the set of objects that might be cyclic garbage and cancels the effect of internal references on their counts.

This phase examines each object in the candidates set. If the object is still purple (that is, no new reference to it has appeared since it was added), its transitive closure is marked grey; otherwise the object is removed from the set.

If the object is black with a zero count, it is reclaimed immediately. As it traverses, markGrey decrements the target's count for every reference it follows.

Thus in Figure 5.3b the subgraph rooted at A has been colored grey, and the counts contributed by the subgraph's internal references have been cancelled.

  2. The algorithm then scans the garbage candidates and their grey transitive closures to find which objects have external references.
  • If an object's count is non-zero, it must be referenced from outside the grey subgraph; in that case scanBlack compensates for markGrey's decrements, re-incrementing the object's count and coloring it black;
  • if an object's count is zero, it is colored white and the scan continues into its children.

Note that a white object cannot yet be equated with garbage, because a scanBlack traversal starting from another node may visit it again. In Figure 5.3b, objects Y and Z have zero counts but are still reachable through the outside object X. When scan reaches X and finds its count non-zero, it calls scanBlack to repair the counts throughout X's grey transitive closure.

The final state of the subgraph is shown in Figure 5.3c.

  3. The collectWhite method reclaims the white (garbage) objects.

It empties the candidates set and frees every white object it traverses, first re-coloring it black (to guard against revisits) and recursing through its children.

Note that collectWhite does not process children that are still in the candidates set; they are handled by later iterations of the loop inside collectCandidates. A sketch of the three phases follows.
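
The following C++ sketch assembles the phases in the style of Algorithm 5.5, under a toy Object layout and a global candidates set (recursive, synchronous, and simplified: the release-time details and the green shortcut described below are elided):

```cpp
#include <unordered_set>
#include <vector>

enum class Color { Black, Grey, White, Purple };

struct Object {
    int rc = 0;
    Color color = Color::Black;
    std::vector<Object*> slots;
};

std::unordered_set<Object*> candidates;   // possible roots of garbage cycles

// Phase 1 helper: trial-delete the counts due to internal references.
void markGrey(Object* s) {
    if (s->color == Color::Grey) return;
    s->color = Color::Grey;
    for (Object* t : s->slots)
        if (t) { t->rc--; markGrey(t); }
}

// Undo trial deletions throughout a subgraph proven live.
void scanBlack(Object* s) {
    s->color = Color::Black;
    for (Object* t : s->slots)
        if (t) { t->rc++; if (t->color != Color::Black) scanBlack(t); }
}

// Phase 2: externally referenced objects revert to black; the rest turn white.
void scan(Object* s) {
    if (s->color != Color::Grey) return;
    if (s->rc > 0) { scanBlack(s); return; }
    s->color = Color::White;
    for (Object* t : s->slots) if (t) scan(t);
}

// Phase 3: anything still white is cyclic garbage; buffered children wait
// for their own turn in the loop below.
void collectWhite(Object* s) {
    if (s->color != Color::White || candidates.count(s)) return;
    s->color = Color::Black;                 // guard against revisiting
    for (Object* t : s->slots) if (t) collectWhite(t);
    delete s;                                // stands in for free(s)
}

void collectCandidates() {
    // markCandidates: keep only objects that are still purple.
    for (auto it = candidates.begin(); it != candidates.end(); ) {
        Object* s = *it;
        if (s->color == Color::Purple) { markGrey(s); ++it; }
        else {
            it = candidates.erase(it);
            if (s->color == Color::Black && s->rc == 0) delete s;
        }
    }
    for (Object* s : candidates) scan(s);    // scan: repair external refs
    while (!candidates.empty()) {            // collectWhite, one root at a time
        Object* s = *candidates.begin();
        candidates.erase(candidates.begin());
        collectWhite(s);
    }
}

// Mutator side: a decrement that leaves rc > 0 may have isolated a cycle.
void deleteReference(Object* obj) {
    if (!obj) return;
    if (--obj->rc == 0) {
        // release(): free now unless buffered as a candidate (elided here)
    } else {
        obj->color = Color::Purple;
        candidates.insert(obj);
    }
}
```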

[Figure 5.3: trial deletion over a subgraph (figure not included)]

Treating certain kinds of objects specially can further improve performance; examples include objects that contain no pointers and objects that can never be members of a cyclic structure.

The Recycler colors such objects green instead of black when allocating them, never adds them to the candidates set, and never traces them.

Bacon and Rajan found that this cuts the size of the candidates set by an order of magnitude. Figure 5.4 shows the complete object state transitions of the synchronous Recycler, including the green nodes.
[Figure 5.4: object state transitions of the synchronous Recycler (figure not included)]

5.6 Limited-field reference counting

The space an object's reference count occupies in its header also deserves attention. In theory an object may be referenced by every other object in the heap, so the count field should be as wide as a pointer field; for small objects, that overhead is too expensive.

In practice, most objects' counts stay small unless deliberately driven high. Moreover, most objects are not shared: once the sole pointer to such an object is deleted, its space can be reused immediately. This property lets functional languages update objects such as arrays in place instead of modifying a fresh copy.

If an upper bound on the counts is known in advance, a smaller field suffices; many programs, however, contain a few widely shared (popular) objects.

The count field can still be kept small if a fallback mechanism handles the occasional overflow: once an object's count reaches the maximum representable value, it becomes a sticky reference count, and subsequent pointer operations no longer change it.

The most extreme option is a single bit, concentrating the benefits of reference counting on unshared objects; the bit can be kept in the object or in the pointer.

A corollary of limited-field reference counting:

  • Once an object's count saturates, reference counting alone can no longer reclaim it, and a backup tracing collector is needed. While tracing, the collector can restore every object's count to its true value (whether or not the count had saturated).

Wise observed that mark-compact and copying collectors, suitably modified, can likewise restore uniqueness information. In all cases the backup tracing collector also reclaims cyclic garbage. A small sketch of a saturating ("sticky") count follows.
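
A tiny sketch of a saturating count, with a hypothetical 2-bit header field (real systems choose field widths and fallback policies differently):

```cpp
#include <cstdint>

// Illustrative header with a 2-bit reference count: values 0..2 are exact,
// and 3 means "sticky" (saturated). The width here is hypothetical.
struct Header {
    std::uint8_t rc : 2;
    static constexpr std::uint8_t kSticky = 3;
};

void incRef(Header& h) {
    if (h.rc != Header::kSticky) h.rc = h.rc + 1;   // saturate, never wrap
}

// Assumes a counted reference exists, so rc >= 1 on entry.
void decRef(Header& h) {
    if (h.rc == Header::kSticky) return;  // stuck: only the tracer can reclaim
    if (--h.rc == 0) {
        // reclaim immediately, or defer via a ZCT
    }
}
```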

6. Comparison of Garbage Collectors

6.1 Throughput

For many users the chief concern is the program's overall throughput. It is the primary metric for batch programs, and often for web servers.

For the former, short pauses are acceptable; for the latter, pauses tend to be masked by system or network latency. Making each collection fast matters, but a faster collector does not automatically make the program as a whole faster.

In a well-configured system, garbage collection should take only a small fraction of total execution time, and if a faster collector imposes extra overhead on mutator operations, it may well lengthen the application's overall execution time.

The overhead on the mutator can be explicit, such as reference counting's read and write barriers, but implicit factors can also affect mutator performance:

  • if a copying collector rearranges objects badly, it can reduce the mutator's cache friendliness;

  • a reference count decrement will most likely have to touch a "colder" object.

In any case it is important to avoid synchronized operations, yet reference count updates must be synchronized to avoid "lost" updates. Deferred and coalesced reference counting can eliminate the cost of this synchronization.


Some compare collection algorithms by their algorithmic complexity:

  • Mark-sweep collection must account for both phases, tracing (marking) and sweeping, and the sweep visits every object, live and dead;
  • the complexity of a copying collector depends only on the tracing phase, and tracing visits only the live objects.

On this basis alone it is easy to conclude, wrongly, that mark-sweep collection is more expensive than copying collection.

Considering tracing alone, mark-sweep needs far fewer instructions to visit an object than a copying collector does. Locality, too, has a large effect on collection performance.

We saw in Section 2.6 that prefetching can compensate for cache misses; but whether a copying collector can exploit prefetching while keeping the benefits of depth-first copying has no entirely satisfactory answer.

In every tracing collector, the cost of chasing pointers usually dominates. A copying collector performs best when the fraction of live objects in the heap is small, but mark-sweep with lazy sweeping achieves its best performance in the same scenario.

6.2 Pause time

Another common concern is the pauses garbage collection imposes on the program. Minimizing pause time matters for interactive programs and is a key requirement for transaction servers, where long pauses let transactions back up.

  • The tracing collectors introduced so far all impose a stop-the-world pause: the collector suspends the mutator threads until collection completes.

  • The attraction of reference counting is that it spreads the cost of collection across the program's execution, avoiding stop-the-world pauses. But as the previous chapter showed, in high-performance reference counting systems this advantage is not absolute:

    • When the last reference to a large pointer structure is removed, recursive count adjustments and object freeing may follow.
    • Fortunately, adjusting the counts of garbage objects cannot race with other threads, though it may still cause contention on the cache lines the objects occupy.
    • More serious is that the two most effective optimizations of reference counting, deferral and coalescing, reintroduce a stop-the-world pause in which the objects in the zero count table are reclaimed.

6.3 Memory space

Memory footprint matters when physical memory is small, when the application is very large, or when it must scale. Every garbage collection algorithm carries space overheads, of several kinds.

Some algorithms add a field to every object, such as a reference count.

A semispace copying collector needs extra heap space as a copy reserve; to be safe, the reserve must be able to hold all currently allocated objects, unless a fallback exists (such as a mark-compact pass).

Non-moving collectors face fragmentation, which reduces how much of the heap is usable. And although a collector's metadata lives outside the heap, its size is not negligible.

Tracing collectors may need mark stacks, mark bitmaps, or more elaborate structures. All non-compacting collectors, explicit memory managers included, need space for bookkeeping structures such as segregated free-lists.

Finally, tracing collectors and deferred reference counting both need heap headroom for garbage to accumulate in; without it, collections become so frequent that performance thrashes.


In garbage-collected systems this headroom usually amounts to 30%~200% of the application's minimum memory requirement, and sometimes 300%.

Many systems can also grow the heap on demand to avoid thrashing. For a garbage-collected application to match the performance of an explicitly managed heap, it typically needs 3 to 6 times as much memory.

Simple reference counting reclaims an object the moment it is detached from the graph of live objects. Besides keeping garbage from accumulating in the heap, such promptness has other potential advantages:

  • freed space is usually reallocated within a short time, which helps cache performance;
  • in some scenarios the compiler can detect the moment an object becomes garbage and reuse its space immediately, without handing it back to the memory manager.

An ideal garbage collector would be complete (every dead object is eventually reclaimed) and also prompt (every dead object is reclaimed in the collection cycle in which it dies).

The basic tracing collectors of the previous chapters achieve this, but at the cost of visiting every live object in every collection.

For performance, modern high-performance collectors usually give up promptness, letting some garbage "float" from one collection cycle to the next.

Reference counting, for its part, has a completeness problem: it cannot reclaim cyclic garbage without the help of tracing.

6.4 Implementation of the collector

Implementing a garbage collection algorithm correctly is not easy, and implementing a concurrent collection algorithm correctly is harder still.

The interface between the collector and the compiler is critical. A collector bug is likely to surface long after its cause (perhaps several collection cycles later), usually as the mutator dereferencing an invalid reference, so a collector's robustness is as important as its speed.

The collector is a performance-critical system component. Good software engineering practices such as modularity and componentization should guide its implementation, keeping the code maintainable.

One advantage of a simple tracing collector:

  • its interface with the mutator is simple: the collector is invoked only when the allocator runs out of memory.

The main complexity of this interface lies in determining the roots of collection, the references held in global variables, registers, and stack slots.

It should be emphasized that copying and compacting collectors are much more complex to build than non-moving collectors:

  • a moving collector must find every root precisely and update every reference to each object it moves;
  • a non-moving collector need only find at least one reference to each live object, and never changes pointer values.

So-called conservative collectors can collect without precise knowledge of mutator stacks or object layouts, using intelligent (but safe, conservative) guesses about whether a value is a pointer.

Because a non-moving collector does not update references, a value misidentified as a reference is never modified; the only risk is a space leak. See Jones for a fuller discussion of conservative collectors.


Reference counting must be tightly coupled to the mutator, which is both an advantage and a disadvantage:

  • the advantage is that reference counting can be packaged as a library, so developers can decide which objects it manages and which are managed by hand;

  • the disadvantage is that the coupling imposes processing costs on the mutator, on which the correctness of the counting depends.

In any modern language with dynamic memory allocation, the memory manager has a decisive effect on performance. Its key operations are allocation, mutator updates (including read and write barriers), and the garbage collector's inner loops.

The code implementing these critical operations should be inlined, while taking care not to bloat the code.

Inlining

Take inline functions as an example. A function call must set up a stack frame, pass arguments, and transfer control, all of which takes time, yet some functions are called very frequently and have very short bodies.

Inlining substitutes the function body at the call site at compile time, much as if the steps of the method had been written out in place before compiling.

C++ supports function inlining to improve execution speed: when a function is inlined, the compiler places a copy of its code at every call site. Any change to an inline function then forces recompilation of all its clients, because the compiler must substitute the new code everywhere; otherwise stale copies would keep running.

Take macros as an example. In C, macro code can improve execution speed: the preprocessor replaces each call with a copy of the macro body, avoiding the argument pushes, CALL instructions, parameter returns, and return sequences of an assembled call.

The biggest drawback of macros is that they are error-prone: copying macro code often produces unexpected side effects. In C++ they have a further drawback: a macro cannot access a class's private members.

This copying is also what makes the size of the compiled output unpredictable, in other words, code bloat.
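
A classic illustration of the macro pitfall mentioned above, with a hypothetical SQUARE macro; the duplicated argument is exactly the kind of "unexpected side effect" at issue:

```cpp
#include <iostream>

// The argument is textually duplicated, so its side effects run more than
// once (and the two unsequenced increments below are undefined behavior).
#define SQUARE_MACRO(x) ((x) * (x))

// The inline function evaluates its argument exactly once.
inline int square(int x) { return x * x; }

int main() {
    int b = 3;
    std::cout << square(b++) << '\n';   // prints 9; b is incremented once
    // std::cout << SQUARE_MACRO(b++); // expands to ((b++) * (b++)) -- avoid!
    return 0;
}
```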

If the processor's instruction cache is large enough and the bloat small enough (Steenkiste suggested under 30% for older machines with small caches), code bloat has a negligible effect on performance.

The code sequence executed in the common case (the "fast path") should be kept short enough to inline, while the rarely executed "slow path" can be an out-of-line procedure call [Blackburn and McKinley, 2002]. The compiler's output is also crucial, so inspect the generated assembly. Cache behavior, too, has a significant effect on performance.
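
For instance, an allocation fast path in the bump-pointer style might look as follows; the names and layout are hypothetical, but the shape (a short inlinable test-and-bump, with the rare refill out of line) is the point:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical bump-pointer allocator: the common-case fast path is a few
// instructions and is meant to be inlined at every allocation site, while
// the rare slow path (buffer exhausted; refill or collect) stays out of line.
struct Allocator {
    std::uintptr_t free;    // next free address
    std::uintptr_t limit;   // end of the current allocation buffer

    void* allocSlowPath(std::size_t size);   // out of line on purpose

    inline void* alloc(std::size_t size) {   // fast path
        std::uintptr_t result = free;
        std::uintptr_t next = result + size;
        if (next > limit)                     // rare: take the slow path
            return allocSlowPath(size);
        free = next;                          // bump the pointer
        return reinterpret_cast<void*>(result);
    }
};

// Stub: a real implementation would refill the buffer or trigger collection.
void* Allocator::allocSlowPath(std::size_t) { return nullptr; }
```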

6.5 Adaptive systems

Commercial systems usually let users choose among garbage collectors, each with a series of tunable parameters, and the choosing and tuning often bewilders users. To complicate matters, the parameters of a given collector are not independent of one another.

Some researchers have suggested that systems should adapt to the applications they serve. The Java runtime built by Soman et al. can switch collectors at run time according to the available heap size, on either of two bases:

  1. offline profiling that determines the best collector for the program at its current heap size;
  2. switching driven by the ratio of the program's current usage to the maximum available space.

Singer et al. [2007a] applied machine learning to certain static features of a program to predict the collector best suited to it (requiring only a single profiling run).

Sun's Ergonomics tuning system adjusts the HotSpot collector's available heap space to meet the throughput and maximum pause time the user requests.

For developers, the best and perhaps only advice we can offer is: understand the behavior of the application you are building and the size and lifetime distributions of the objects it uses, then experiment with different collectors and choose the one that fits best.

The experiments should use realistic data sets: artificial, "toy" benchmarks can be actively misleading.

6.6 Unified Garbage Collection Theory

The previous chapters introduced two different types of garbage collection strategies:

  1. Direct collection, that is, reference counting
  2. Indirect collection, that is, tracing collection.

Bacon et al. found a deep duality between these two strategies and proposed an abstract framework, a unified theory of garbage collection, that makes the similarities and differences between collectors precise.

6.6.1 Garbage Collection Abstraction

The framework below uses only simple abstract data structures, whose concrete realizations may vary. Garbage collection is expressed as a fixed-point computation that assigns a reference count ρ(n) to every node n.

Counted references come from the roots and from nodes whose own counts are non-zero:

  ρ(n) = |[ (r, n) ∈ Roots ]| + |[ (m, n) ∈ Edges : ρ(m) > 0 ]|    (6.1)

(Both collections are multisets: an object is counted once per reference to it.)

From the reference counting perspective, nodes with non-zero counts should be retained and the rest reclaimed. The counts need not be exact, but they must be at least a safe approximation of the true values.

The abstract algorithm computes the counts using a work list W of objects awaiting processing and terminates when W is empty. In what follows, W is a multiset, since the same reference may be added to it several times by different operations.

6.6.2 Tracing Garbage Collection

The unified theory recasts tracing collection as a form of reference counting.

Algorithm 6.1 shows the abstraction of tracing collection:

  1. Tracing starts from the roots and (re)establishes a non-zero reference count for every reachable node.
  2. At the end of each collection cycle, sweepTracing resets all nodes' counts to zero, and New likewise initializes the counts of fresh objects to zero.
  3. In collectTracing, the algorithm first calls rootsTracing to build the initial work list W, then hands it to scanTracing.

[Algorithm 6.1: abstract tracing garbage collection (figure not included)]

As expected, the collector traces the whole object graph to find every object reachable from the roots:

  • scanTracing traces from each node in the work list and increments the count of every node it reaches, thereby rebuilding all reference counts (recall from Section 5.6 that a tracing collector can repair sticky reference counts).
  • When a reachable node src is found for the first time (its count rises from 0 to 1; line 10 of Algorithm 6.1), the collector scans its pointer fields and adds their targets to the work list W, recursively covering all of src's out-edges.
  • When the while loop terminates, every live node has been found, and each carries a non-zero count equal to the number of references to it.
  • sweepTracing then frees all useless nodes and clears every count for the next cycle. Note that tracing collectors in practice represent the count with a single bit, the mark bit, which records whether the object was ever reached; the mark bit is a coarse approximation of the reference count.

The tracing collector computes the least fixed-point solution of equation 6.1: each object's count takes the smallest value that satisfies the equation.

We can read the algorithm through the tricolor abstraction of Section 2.2. In Algorithm 6.1, objects with zero counts are white and objects with non-zero counts are black; an object passes from white through grey to black between being first noticed and being fully scanned. The abstract tracing algorithm thus partitions the nodes into two sets: black for reachable objects and white for garbage. A sketch in this style follows.
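
A small C++ rendering of this abstraction, with rc standing for ρ(n) (a sketch in the spirit of Algorithm 6.1, not its exact code):

```cpp
#include <deque>
#include <vector>

struct Node {
    int rc = 0;                     // rho(n); a mark bit in practice
    std::vector<Node*> children;
};

// Abstract tracing collection: tracing "recomputes" reference counts,
// incrementing rho(dst) once per edge traversed.
void scanTracing(std::deque<Node*>& work) {     // W, initialized from the roots
    while (!work.empty()) {
        Node* src = work.front();
        work.pop_front();
        if (src->rc++ == 0)                     // first visit: scan its fields
            for (Node* dst : src->children)
                if (dst) work.push_back(dst);
    }
}

// sweepTracing: nodes still at zero are unreachable; survivors' counts are
// reset to zero, ready for the next collection cycle.
void sweepTracing(std::vector<Node*>& heap) {
    for (Node* n : heap) {
        if (n->rc == 0) { /* free(n) */ }
        else n->rc = 0;
    }
}
```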

6.6.3 Reference Counting Garbage Collection

To bring out the parallels between reference counting and tracing, the abstract reference counting collector of Algorithm 6.2 has the mutator's inc and dec methods write the count operations into a buffer instead of applying them at once. Buffering count updates is also useful for multithreaded applications.

The buffering resembles the coalesced reference counting of Section 5.4. In Algorithm 6.2 the collection work proper happens in collectCounting: applyIncrements performs the deferred increments (I), and scanCounting performs the deferred decrements (D).

When the mutator assigns with Write, storing a new target dst into the field src[i], both the increment of the new target's count (inc(dst)) and the decrement of the old target's count (dec(src[i])) are written to the buffer.

  1. At the start of collection, the collector first applies every buffered increment; at this point an object's count may exceed its true value.

  2. Next, scanCounting works through the list, decrementing the count of each object it meets. If a count reaches zero in this phase, the object is garbage, and its children are added to the work list.

  3. Finally, sweepCounting frees all the garbage nodes.

The tracing algorithm and the reference counting algorithm have the same overall shape and differ only in details. Both include a scanning phase: tracing's scanTracing applies count increments, while reference counting's scanCounting applies decrements. Both recursively scan objects whose counts reach zero, and both free garbage nodes in a sweeping phase. The skeletons of Algorithm 6.1 and of the first 31 lines of Algorithm 6.2 are essentially alike. Deferred reference counting, which defers the count operations on root references, also fits the framework (see Algorithm 6.3). A sketch of the buffered scheme follows.
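
A matching C++ sketch of the buffered abstraction, again in the spirit of Algorithm 6.2 rather than its exact code:

```cpp
#include <cstddef>
#include <deque>
#include <vector>

struct Node {
    int rc = 0;
    std::vector<Node*> children;
};

std::deque<Node*> increments;   // buffered inc(dst) operations (I)
std::deque<Node*> decrements;   // buffered dec(src[i]) operations (D)

// Mutator write barrier: buffer both count changes instead of applying them.
void write(Node* src, std::size_t i, Node* dst) {
    if (dst) increments.push_back(dst);
    if (src->children[i]) decrements.push_back(src->children[i]);
    src->children[i] = dst;
}

// Collector: apply the increments first, so no count is ever transiently
// too low, then propagate the decrements recursively.
void collectCounting() {
    for (Node* n : increments) n->rc++;         // applyIncrements
    increments.clear();
    while (!decrements.empty()) {               // scanCounting
        Node* src = decrements.front();
        decrements.pop_front();
        if (--src->rc == 0)                     // dead: children lose a reference
            for (Node* dst : src->children)
                if (dst) decrements.push_back(dst);
        // sweepCounting would free the zero-count nodes afterwards
    }
}
```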

As earlier chapters noted, cycles in the object graph make the computation of reference counts problematic. Figure 6.1 shows a simple object graph in which two objects form a cycle. If object A's count is zero, then object B's count is also zero (only nodes with non-zero counts contribute to the counts of their targets). But since the counts of A and B depend on each other, we face a chicken-and-egg problem: we could just as well preset A's count to 1, which gives the referenced object B a count of 1 too.

[Figure 6.1: a cycle of two objects (figure not included)]
Since a general fixed-point equation admits many situations, it may have many solutions. For the graph of Figure 6.1, with Nodes = {A, B} and Roots = {}, equation 6.1 has two solutions:

the least fixed point ρ(A) = ρ(B) = 0 and the greatest fixed point ρ(A) = ρ(B) = 1. A tracing collector computes the least fixed point; a reference counting collector computes the greatest, which is why it cannot (by counts alone) reclaim cyclic garbage. The two solutions differ exactly in the set of objects reachable only from cyclic garbage. As Section 5.5 described, partial tracing lets reference counting reclaim cycles; in essence it starts from the greatest fixed-point solution and shrinks the retained set down to the least.

[Algorithms 6.2 and 6.3 (figures not included)]

Appendix

[1] Richard Jones, Antony Hosking, Eliot Moss. The Garbage Collection Handbook: The Art of Automatic Memory Management. Chinese translation by Wang Yaguang and Xue Di.
