[Python road] Garbage collection & memory management

First, the Python source code

1. Prepare Source

Download Python Source: https://www.python.org/ftp/python/3.8.0/Python-3.8.0.tgz

Unpack the archive:

 

Our main concern is the ".h" files in the Include directory and the ".c" files in the Objects directory.

From the file types in Include and Objects we can see that the Python interpreter is written in C.

 

2. object.h

The Include folder holds all the ".h" files.

In C, header files mainly hold macros, function declarations, structure declarations, global variables, and so on.

 

In Python all classes inherit from object, so we can look at object.h to see how this is implemented in C.

Let's first look at a small portion of object.h:

#define _PyObject_HEAD_EXTRA \
    struct _object *_ob_next; \
    struct _object *_ob_prev;

typedef struct _object {
    // links the object into the doubly linked list refchain
    _PyObject_HEAD_EXTRA
    // reference count
    Py_ssize_t ob_refcnt;
    // data type
    struct _typeobject *ob_type;
} PyObject;

typedef struct {
    PyObject ob_base;
    // for variable-size types: the number of elements currently held
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

From the code above we can see that the only difference between the two structures, PyObject and PyVarObject, is that PyVarObject has an extra ob_size field, which records how many elements the object holds (for example, the number of items in a list or dict).

These two structures serve as the headers for different data types (any piece of data defined in Python carries one of these headers):

PyObject: float

PyVarObject: list, dict, tuple, set, int, str, bool

Since Python's int has no length limit, its underlying implementation resembles str's, so int also belongs to the PyVarObject camp. bool in Python is really just 0 and 1, i.e. an int, so it belongs to the PyVarObject camp as well.
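We can see the difference from Python itself with sys.getsizeof: an int grows as it needs more digits (stored in the variable-length part counted by ob_size), while a float is always the same size. The exact byte counts below are just what a 64-bit CPython 3.8 typically reports and may differ on your build:

import sys

print(sys.getsizeof(1))          # e.g. 28 bytes
print(sys.getsizeof(10 ** 30))   # larger: more digits in the variable part
print(sys.getsizeof(10 ** 300))  # larger still

print(sys.getsizeof(0.3))        # e.g. 24 bytes
print(sys.getsizeof(3.0e300))    # same size: the value is a single C double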

 

3. floatobject.h

typedef struct {
    PyObject_HEAD      // the common PyObject header from object.h
    double ob_fval;    // the stored value
} PyFloatObject;

Taking float as an example, we can see that creating a piece of float data actually creates an instance of the PyFloatObject structure.

A PyFloatObject contains a PyObject_HEAD (this is the PyObject from object.h) and a double named ob_fval; this double variable holds the value we store.

 

Let's look at what the source code does for some everyday Python operations:

1) Python defines a variable v = 0.3:

Source process:

    a. Allocate memory (of size sizeof(PyFloatObject))

    b. Initialize:

      ob_fval = 0.3

      ob_type = float

      ob_refcnt = 1

    c. Add the object to the doubly linked list refchain

2) Python executes name = v:

Source process:

    ob_refcnt += 1

3) Python executes del v:

Source process:

    ob_refcnt -= 1

4) Python executes:

def func(arg): 
    print(arg)

func(name)

Source process:

    When the call frame is created: ob_refcnt += 1

    When the frame is destroyed at the end of the call: ob_refcnt -= 1

5) Python executes del name:

Source process:

    ob_refcnt -= 1

 

In each of these operations, every time ob_refcnt -= 1 is executed, the interpreter checks whether ob_refcnt has dropped to 0. If it is 0, the object is treated as garbage and, in principle, should be reclaimed by the GC; but see Section II for what actually happens.
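We can watch ob_refcnt change from Python itself with sys.getrefcount, a minimal sketch of the walkthrough above (getrefcount always reports one more than you might expect, because passing the object in as an argument temporarily adds a reference; the value printed inside the function call is higher still because the call machinery holds extra temporary references):

import sys

v = float("0.3")              # a fresh float object, ob_refcnt = 1
print(sys.getrefcount(v))     # 2: v itself plus getrefcount's temporary argument

name = v                      # ob_refcnt += 1
print(sys.getrefcount(v))     # 3

del v                         # ob_refcnt -= 1
print(sys.getrefcount(name))  # back to 2


def func(arg):
    # while the call frame is alive, it holds additional references of its own
    print(sys.getrefcount(arg))


func(name)                    # frame created: count goes up; frame destroyed: back down
print(sys.getrefcount(name))  # 2 again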

 

Second, the caching mechanism

As described in Section I, once every reference to a float variable has been deleted, its reference count drops to 0 and, in principle, the GC should reclaim it.

1. The free_list cache

But the interpreter reasons that users will frequently define float variables again, so instead of freeing the object it takes the PyFloatObject off the refchain list and puts it onto a separate singly linked list. This singly linked list is the cache, called free_list.

Let's verify this:

>>> v = 8.9
>>> name = v
>>> del v
>>> id(name)
1706304905888
>>> del name
>>> xx = 9.0
>>> id(xx)
1706304905888
>>>

You can see that name's id is 1706304905888; after name is deleted, the newly created float variable xx ends up with the same id, 1706304905888. This confirms the caching mechanism.

 

Why use a cache (free_list)?

  Because releasing memory and allocating it again both take time. If the freed object's space is put into the cache instead, then when a new float variable is defined its address can be taken straight from the cache, re-initialized, and the new value assigned to ob_fval.

 

2. The maximum length of free_list

Note that this singly linked list (free_list) is specific to PyFloatObject, and it has a maximum length of 100. You can see the definition in floatobject.c:

#ifndef PyFloat_MAXFREELIST
// the maximum length of free_list
#define PyFloat_MAXFREELIST    100
#endif
// numfree records how many objects are currently in free_list
static int numfree = 0;
// head pointer of free_list
static PyFloatObject *free_list = NULL;

For example, if 1000 float variables have their reference counts drop to 0 at the same time, only 100 of them can be put onto free_list; the remaining 900 are actually reclaimed.

 

For float the maximum length of free_list is 100; other data types may use different maximums.

For example, the maximum length of list's free_list is 80:

#ifndef PyList_MAXFREELIST
#define PyList_MAXFREELIST 80
#endif
static PyListObject *free_list[PyList_MAXFREELIST];
static int numfree = 0;

For dict it is also 80:

#ifndef PyDict_MAXFREELIST
#define PyDict_MAXFREELIST 80
#endif
static PyDictObject *free_list[PyDict_MAXFREELIST];
static int numfree = 0;
static PyDictKeysObject *keys_free_list[PyDict_MAXFREELIST];
static int numfreekeys = 0;

 

3. Other optimization mechanisms

Not every data type uses the free_list caching mechanism; int, for example, uses a small-integer pool optimization:

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif

 

Third, the garbage collection mechanism

Python's GC follows this principle:

  Reference counting is the primary mechanism, supplemented by mark-and-sweep and generational collection.

1. Reference counting (covered above, omitted here)

2. Circular References

Circular references generally occur between container-class objects such as lists and dicts, which can be nested inside one another. For example:

a = [1, 2]
b = [4, 5]
# a references b and b references a, forming a reference cycle
a.append(b)
b.append(a)

# a's reference count drops to 1 (b's list still references it)
del a
# b's reference count drops to 1 as well, but neither object can be reached
# from any variable any more, so a memory leak has formed
del b

In this situation a memory leak occurs, and mark-and-sweep is needed to solve the circular-reference problem.

 

3. Mark and sweep

Python places container-class objects onto a separate doubly linked list (distinct from refchain) and scans it periodically.

Reference: https://www.cnblogs.com/saolv/p/8411993.html

# First group: a circular reference
a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)
del a

# Second group: another circular reference
c = [4, 5]
d = [5, 6]
c.append(d)
d.append(c)
del c
del d
# At this point the object originally referenced by a and the objects referenced by c and d
# each have a reference count of 1, while the object referenced by b has a reference count of 2
e = [7, 8]
del e

Now let's explain mark-and-sweep. After the code above has run, our intention is that the objects referenced by c, d, and e should be cleared, while the objects referenced by a and b should be kept. However, the objects referenced by c and d still have non-zero reference counts, so plain reference counting only clears e; the objects referenced by c and d stay in memory.

  Suppose the scheduled scan time now arrives and mark-and-sweep comes into play. Its job is to look at the objects behind the four variables a, b, c, and d, work out that only c and d really need to be cleaned up, and keep a and b.

  First it splits the objects into two sets: a root (survivor) set and an unreachable (dead) set. It then copies each object's reference count and, working on the copies, removes the contributions made by reference cycles.

  Removing a cycle: suppose there are two objects A and B. Starting from A, because A holds a reference to B, B's copied count is decremented by 1; following that reference to B, because B holds a reference back to A, A's copied count is likewise decremented by 1. In this way the circular references between the objects are cancelled out of the copies.

  Once the removal is done, a's copied count is 0, b's copied count is 1 (from the variable b itself), and the copied counts of c and d are both 0. Objects whose copied count is non-zero go into the survivor set; objects whose copied count is 0 go into the dead set. If the algorithm stopped here it would wrongly condemn a: a is still reachable through b, and if the object a refers to were cleared, could b still be used? Clearly not. So the survivor set is examined a second time, analyzing every object in it. Here only b is in the survivor set, so only b is analyzed; since b must survive, the elements inside b must survive too, and inside b the object originally referenced by a is found, so it is pulled back out of the dead set. After these two passes, everything still left in the dead set is reclaimed, while the root objects stay alive. The object referenced by b still has a reference count of 2, and the object originally referenced by a still has a reference count of 1.
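We can watch the cycle detector do exactly this from Python, using the gc module. A minimal sketch (the Node class is just an illustrative container; weakref lets us observe whether the object is still alive without adding a real reference):

import gc
import weakref

class Node:
    # a container object that can take part in a reference cycle
    pass

a = Node()
b = Node()
a.partner = b            # a references b
b.partner = a            # b references a: a cycle

probe = weakref.ref(a)   # observes a's object without keeping it alive

del a
del b
print(probe() is None)   # False: both reference counts are still 1, nothing freed

gc.collect()             # run the cycle detector (the mark-and-sweep style pass)
print(probe() is None)   # True: the unreachable cycle has been reclaimed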

 

Objects that survive the scan are moved onto another linked list. There are three such linked lists in total, representing three generations.

 

4. Generational collection

Generational collection means that the container-class objects are maintained on three linked lists, one per generation. Roughly speaking, the bottom (youngest) generation's list is scanned ten times for every single scan of the list above it.

This is purely a performance measure: it minimizes the number of objects that have to be scanned.

The idea is that objects which have stayed alive without problems are moved up to the higher generations, so they are scanned less often.
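The three generations and their scan ratios are visible from Python through the gc module; a small sketch (the threshold values in the comments are CPython's usual defaults and can be changed with gc.set_threshold):

import gc

# (threshold0, threshold1, threshold2): generation 0 is collected once the number of
# newly tracked container objects exceeds threshold0; generation 1 is collected after
# every 10 collections of generation 0; generation 2 after every 10 collections of generation 1.
print(gc.get_threshold())   # typically (700, 10, 10)

# The current counters that are compared against the thresholds above.
print(gc.get_count())

gc.collect(0)               # collect only the youngest generation; survivors are promoted
print(gc.get_count())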

So, in Python's memory management, four linked lists are maintained in total: the refchain list, which manages ordinary data types such as float, plus the three generation lists, which together manage the container-class data types.

 

 

Reference blog: https://www.cnblogs.com/wupeiqi/articles/11507404.html
