Python virtual machine collection (3) - garbage collection algorithm (3)

Optimization: reuse fields to save memory

To save memory, the two linked-list pointers in every object with GC support are reused for several purposes. This is a common optimization known as "fat pointers" or "tagged pointers": pointers that carry additional data "folded" into the pointer, that is, stored inline in the data representing the address, taking advantage of certain properties of memory addressing. This is possible because most architectures align certain types of data to the size of the data, often a word or a multiple of it. The alignment leaves a few of the least significant bits of the pointer unused, and these bits can hold tags or other information, most often as a bit field (each bit is a separate tag), as long as the code that uses the pointer masks these bits out before accessing memory. For example, on a 32-bit architecture (for both address and word size), a word is 32 bits = 4 bytes, so word-aligned addresses are always multiples of 4 and end in 00, leaving the last 2 bits available; on a 64-bit architecture, a word is 64 bits = 8 bytes, so word-aligned addresses end in 000, leaving the last 3 bits available.
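
The bit arithmetic described above can be sketched in Python, using plain integers in place of real addresses. This is only an illustration: the names ALIGNMENT, TAG_MASK, tag, untag and flags_of are made up for this example and are not CPython APIs, and 8-byte alignment (a 64-bit word) is assumed.

```python
ALIGNMENT = 8                 # assumed: 64-bit words, so 8-byte alignment
TAG_MASK = ALIGNMENT - 1      # 0b111: the three low bits that alignment leaves free


def tag(address, flags):
    """Fold small flags into the unused low bits of an aligned address."""
    if address % ALIGNMENT != 0:
        raise ValueError("address is not word-aligned")
    if not 0 <= flags <= TAG_MASK:
        raise ValueError("flags do not fit in the free bits")
    return address | flags


def untag(tagged):
    """Mask the flag bits out to recover the real address."""
    return tagged & ~TAG_MASK


def flags_of(tagged):
    """Extract only the flag bits."""
    return tagged & TAG_MASK


address = 0x7F00_1000         # hypothetical aligned address
tagged = tag(address, 0b101)
print(hex(tagged), hex(untag(tagged)), bin(flags_of(tagged)))
```

The tagged value must never be used as an address directly; every access goes through untag() first, which is the masking step the paragraph above refers to.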

The CPython GC uses two fat pointers, which correspond to the extra fields of PyGC_Head discussed in the memory layout and object structure section.

A "tagged" or "fat" pointer cannot be dereferenced directly, because the extra information must be stripped off before the real memory address is obtained. Special care is needed with functions that manipulate the linked lists directly, since these functions normally assume that the pointers inside the lists are in a consistent state.

The two fields are used as follows:

  • The _gc_prev field is normally used as the "previous" pointer to maintain the doubly linked list, but its lowest two bits hold the flags PREV_MASK_COLLECTING and _PyGC_PREV_MASK_FINALIZED. Between collections, the only flag that can be present is _PyGC_PREV_MASK_FINALIZED, which indicates whether the object has already been finalized. During collections, in addition to the two flags, _gc_prev is temporarily used to store a copy of the reference count (gc_refs), and the GC linked list becomes a singly linked list until _gc_prev is restored.
  • The _gc_next field is used as the "next" pointer to maintain the doubly linked list, but during collection its lowest bit holds the NEXT_MASK_UNREACHABLE flag, which indicates whether the object is tentatively unreachable during the cycle detection algorithm. This is a drawback of using only doubly linked lists to implement partitions: while most of the needed operations are constant-time, there is no efficient way to determine which partition an object is currently in. Instead, when that is needed, ad hoc tricks (such as the NEXT_MASK_UNREACHABLE flag) are employed.
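
How _gc_prev can serve both as a pointer with two flag bits and, during a collection, as temporary storage for gc_refs can be sketched with integers. This is a simplified model, not CPython's code: the constant values (flag bits 1 and 2, a shift of 2), the helper names and the 64-bit word size are assumptions chosen for illustration.

```python
WORD_MASK = (1 << 64) - 1            # assumed 64-bit word
PREV_MASK_FINALIZED = 0b01           # assumed bit positions for the two flags
PREV_MASK_COLLECTING = 0b10
PREV_SHIFT = 2                       # the two lowest bits are reserved for flags
FLAG_BITS = (1 << PREV_SHIFT) - 1


def get_prev(gc_prev):
    """Strip the flag bits to get the 'previous' address."""
    return gc_prev & ~FLAG_BITS


def set_prev(gc_prev, address):
    """Store a (4-byte aligned) address while keeping the flags."""
    return (gc_prev & FLAG_BITS) | address


def set_refs(gc_prev, refs):
    """Overwrite the pointer part with a reference count copy, keeping the flags."""
    return ((refs << PREV_SHIFT) & WORD_MASK) | (gc_prev & FLAG_BITS)


def get_refs(gc_prev):
    return gc_prev >> PREV_SHIFT


prev_address = 0x5000                # hypothetical address of the previous node
field = set_prev(PREV_MASK_FINALIZED, prev_address)    # state between collections

# A collection starts: the back link is replaced by a copy of the refcount.
field = set_refs(field | PREV_MASK_COLLECTING, 3)
refs_during = get_refs(field)
flags_during = field & FLAG_BITS

# The collection ends: clear COLLECTING and restore the back link.
field = set_prev(field & ~PREV_MASK_COLLECTING, prev_address)
print(refs_during, bin(flags_during), hex(get_prev(field)))
```

While the reference count copy occupies the field, the back link is gone, which is why the list can only be walked forwards until _gc_prev is restored.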

Garbage Collector C API

Python's support for detecting and collecting garbage that involves circular references requires support from object types that are "containers" for other objects, which may themselves be containers. Types that do not store references to other objects, or that only store references to atomic types (such as numbers or strings), do not need to provide any explicit support for garbage collection.

To create a container type, the tp_flags field of the type object must include Py_TPFLAGS_HAVE_GC, and the type must provide an implementation of the tp_traverse handler. If instances of the type are mutable, a tp_clear implementation must also be provided.

Py_TPFLAGS_HAVE_GC

Objects of types with this flag set must conform to the rules documented here. For convenience, these objects will be referred to as container objects.

Constructors for container types must conform to two rules:

  • The memory for the object must be allocated using PyObject_GC_New() or PyObject_GC_NewVar().
  • Once all the fields that may contain references to other containers are initialized, it must call PyObject_GC_Track().

TYPE* PyObject_GC_New(TYPE, PyTypeObject *type)

Similar to PyObject_New(), but for container objects with the Py_TPFLAGS_HAVE_GC flag set.

TYPE* PyObject_GC_NewVar(TYPE, PyTypeObject *type, Py_ssize_t size)

Similar to PyObject_NewVar(), but for container objects with the Py_TPFLAGS_HAVE_GC flag set.

TYPE* PyObject_GC_Resize(TYPE, PyVarObject *op, Py_ssize_t newsize)

Resizes an object allocated by PyObject_NewVar(). Returns the resized object, or NULL on failure. op must not be tracked by the collector yet.

void PyObject_GC_Track(PyObject *op)

Adds the object op to the set of container objects tracked by the collector. The collector can run at unexpected times, so objects must be valid while they are being tracked. This function should be called once all the fields followed by the tp_traverse handler become valid, usually near the end of the constructor.

Likewise, the deallocator for the object must conform to a similar pair of rules:

  • Before fields that refer to other containers are invalidated, PyObject_GC_UnTrack() must be called.
  • The object's memory must be deallocated using PyObject_GC_Del().

void PyObject_GC_Del(void *op)

Releases memory allocated to an object using PyObject_GC_New() or PyObject_GC_NewVar().

void PyObject_GC_UnTrack(void *op)

Removes the object op from the set of container objects tracked by the collector. Note that PyObject_GC_Track() can be called again on this object, adding it back to the set of tracked objects. The deallocator (tp_dealloc handler) should call this function for the object before any fields used by the tp_traverse handler become invalid.

Changed in version 3.8: The _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() macros have been removed from the public C API.

The tp_traverse handler accepts a function parameter of this type:

int (*visitproc)(PyObject *object, void *arg)

The type of the visitor function passed to the tp_traverse handler. The function should be called with the object to traverse as object and the third parameter of the tp_traverse handler as arg. The Python core uses several visitor functions to implement cyclic garbage detection; users are not expected to need to write their own visitor functions.

A tp_traverse handler must have the following type:

int (*traverseproc)(PyObject *self, visitproc visit, void *arg)

The traversal function for a container object. Implementations must call the visit function for each object directly contained by self, with the parameters to visit being the contained object and the arg value passed to the handler. The visit function must not be called with a NULL object argument. If visit returns a non-zero value, that value should be returned immediately.

To simplify writing tp_traverse handlers, the Py_VISIT() macro is provided. In order to use this macro, the tp_traverse implementation must name its arguments exactly visit and arg:

void Py_VISIT(PyObject *o)

If o is not NULL, calls the visit callback with arguments o and arg. If visit returns a non-zero value, that value is returned. Using this macro, a tp_traverse handler looks like this:

static int
my_traverse(Noddy *self, visitproc visit, void *arg)
{
    Py_VISIT(self->foo);
    Py_VISIT(self->bar);
    return 0;
}
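
From Python, the effect of a type's tp_traverse handler can be observed with gc.get_referents(), which returns the objects visited by the C-level tp_traverse of its arguments. A minimal sketch, with a built-in list standing in for the container; the variable names are only illustrative.

```python
import gc

foo = []                      # two objects held by the container
bar = {}
container = [foo, bar]

# gc.get_referents() runs the list's tp_traverse with a visitor that
# records every object passed to it.
referents = gc.get_referents(container)

print(any(r is foo for r in referents), any(r is bar for r in referents))
```

Identity checks are used instead of comparing against a fixed list, because the order in which a handler visits its contents is an implementation detail.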

The tp_clear handler must be of the inquiry type, or NULL if the object is immutable.

int (*inquiry)(PyObject *self)

Remove references that may have created a reference cycle. Immutable objects do not have to define this method, since they can never directly create reference cycles. Note that the object must still be valid after calling this method (don't just call Py_DECREF() on a reference). The collector will call this method if it detects that this object is involved in a reference cycle.
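
The result of the collector breaking a cycle can be observed from Python. In this sketch, two instances of a hypothetical Node class refer to each other, so reference counting alone can never free them; a weak reference is used to watch one of them. Automatic collection is paused so that the explicit gc.collect() call is the one that finds the cycle.

```python
import gc
import weakref


class Node:
    """Hypothetical mutable container used only for this demonstration."""

    def __init__(self):
        self.other = None


def collect_cycle():
    was_enabled = gc.isenabled()
    gc.disable()                      # keep automatic collections out of the way
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a       # a <-> b: a reference cycle
        probe = weakref.ref(a)
        del a, b                      # unreachable now, but refcounts stay above zero
        alive_before = probe() is not None
        gc.collect()                  # the cycle detector finds and breaks the cycle
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


alive_before, alive_after = collect_cycle()
print(alive_before, alive_after)
```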

Optimization: delay tracking containers

Certain types of containers cannot participate in a reference cycle, so they do not need to be tracked by the garbage collector. Untracking these objects reduces the cost of garbage collection. However, determining which objects may be untracked is not free, and that cost must be weighed against the benefit for garbage collection. There are two possible strategies for when to untrack a container:

  • When the container is created.
  • When the container is examined by the garbage collector.

As a general rule, instances of atomic types are not tracked, and instances of non-atomic types (containers, user-defined objects...) are. However, some type-specific optimizations can be applied to reduce the garbage collector footprint of simple instances. Some examples of native types that benefit from delayed tracking:

  • Tuples containing only immutable objects (integers, strings, etc., and recursively tuples of immutable objects) do not need to be tracked. The interpreter creates a large number of tuples, many of which will not survive until garbage collection, so it is not worthwhile to untrack eligible tuples at creation time. Instead, all tuples except the empty tuple are tracked when created. During garbage collection, the collector determines whether any surviving tuples can be untracked. A tuple can be untracked if none of its contents are tracked. Tuples are examined for untracking in all garbage collection cycles, and untracking a tuple may take more than one cycle.
  • Dictionaries containing only immutable objects also do not need to be tracked. Dictionaries are untracked when created. If a tracked item is inserted into a dictionary (either as a key or as a value), the dictionary becomes tracked. During a full garbage collection (all generations), the collector untracks any dictionary whose contents are not tracked.
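
The delayed untracking of tuples can be observed with gc.is_tracked(). A minimal sketch: the tuple is built at run time (so that it is not a compile-time constant), and its state is checked before and after a full collection. The state before the collection is printed rather than assumed, since it may vary between CPython versions; the function name is only illustrative.

```python
import gc


def tuple_tracking_states():
    t = tuple([1, "a"])        # built at run time from untracked (atomic) items
    before = gc.is_tracked(t)
    gc.collect()               # full collection: eligible tuples get untracked
    after = gc.is_tracked(t)
    return before, after


before, after = tuple_tracking_states()
print("tracked before collection:", before)
print("tracked after collection:", after)
```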

The garbage collector module provides the Python function is_tracked(obj), which returns the current tracking status of an object. Subsequent garbage collections may change the tracking status of the object.

>>> import gc
>>> gc.is_tracked(0)
False
>>> gc.is_tracked("a")
False
>>> gc.is_tracked([])
True
>>> gc.is_tracked({})
False
>>> gc.is_tracked({"a": 1})
False
>>> gc.is_tracked({"a": []})
True

Origin blog.csdn.net/AI_LX/article/details/128737383