[Python] Python test development stack: memory management (part two) - garbage collection

In the previous article (Python memory management - reference counting), we introduced the reference counting mechanism that Python uses to manage memory efficiently. This article covers Python garbage collection, whose main strategy is reference counting, with mark-and-sweep and generational collection as secondary strategies. (Readers familiar with Java may recognize the latter two, since the JVM uses similar techniques.)

Reference counting garbage collection

Picking up the reference counting scenarios from the previous article, let's see how Python collects garbage by reference counting. The name says most of it: when an object's reference count drops to zero, nothing uses the object anymore, so it has become useless "garbage", and CPython reclaims it as soon as the count reaches zero.

Let's look at some scenarios in which an object's reference count increases or decreases:

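# coding=utf-8
"""
~~~~~~~~~~~~~~~~~
 @Author:xuanke
 @contact: [email protected]
 @date: 2019-11-29 19:52
 @function: verify the scenarios that increase and decrease reference counts
"""
import sys


def ref_method(value):
    # While the function runs, the argument holds extra references
    print(sys.getrefcount(value))
    print("I called {}".format(value))
    print('Method finished')


def ref_count():
    # Scenarios that increase the reference count
    print('Testing reference count increases')
    a = 'ABC'
    print(sys.getrefcount(a))
    b = a                      # a second name for the same object
    print(sys.getrefcount(a))
    ref_method(a)              # passed as a function argument
    print(sys.getrefcount(a))
    c = [1, a, 'abc']          # stored in a container
    print(sys.getrefcount(a))

    # Scenarios that decrease the reference count
    print('Testing reference count decreases')
    del b                      # a name is deleted
    print(sys.getrefcount(a))
    c.remove(a)                # removed from the container
    print(sys.getrefcount(a))
    del c                      # the container itself is deleted
    print(sys.getrefcount(a))
    a = 783                    # the name is rebound to another object
    print(sys.getrefcount(a))


if __name__ == '__main__':
    ref_count()

Results are as follows:

Testing reference count increases
7
8
10
I called ABC
Method finished
8
9
Testing reference count decreases
8
7
7
4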

From the above results, we can conclude the following (the absolute numbers depend on the interpreter version; what matters is how they change):

Scenarios that increase the reference count:

  • An object is created and assigned to a variable, e.g. a = 'ABC'
  • One variable is assigned to another, so both point to the same object, e.g. b = a
  • A variable is passed to a function as an argument, e.g. ref_method(a). As mentioned in the previous article, the call to getrefcount itself also adds to the reference count.
  • The object is put into a container object (list, tuple, dictionary), e.g. c = [1, a, 'abc']

Scenarios that decrease the reference count:

  • A variable goes out of scope, e.g. when a function finishes executing. In the results above, you may have noticed that the count is the same before and after the call to ref_method: once the function returns, the references it held are released. Printing the count inside the function shows the temporary increase.
  • A variable referencing the object is explicitly deleted, e.g. del a or del b. Note that after del a, calling sys.getrefcount(a) raises a NameError.
  • The object is removed from a container object, e.g. c.remove(a)
  • The whole container is destroyed, e.g. del c
  • A variable is rebound to another object, e.g. a = 783. The variable no longer points to the old object but to a new one, so the count it reports changes (unless the two objects happen to have the same reference count).

Reference counting tells us immediately whether an object can be reclaimed, but it has two drawbacks:

  • Extra space is needed to store the reference count of every object.
  • It cannot handle circular references. A circular reference arises when, for example, object A references object B and object B references object A. Neither reference count can ever drop to 0, so neither object can be reclaimed.
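The circular reference problem can be observed directly. The following is a minimal sketch: the Node class and the make_cycle helper are hypothetical names made up for this illustration, and a weak reference is used to watch the object without keeping it alive.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.partner = None


def make_cycle():
    a = Node("A")
    b = Node("B")
    a.partner = b   # A references B
    b.partner = a   # B references A
    # A weak reference lets us observe A without adding a strong reference.
    return weakref.ref(a)


gc.disable()                          # switch off the cyclic collector for the demo
probe = make_cycle()                  # the names a and b are gone, the cycle remains
alive_before = probe() is not None    # reference counting alone cannot free the pair

gc.collect()                          # run the cyclic collector by hand
alive_after = probe() is not None     # the cycle has now been reclaimed
gc.enable()

print(alive_before, alive_after)      # True False
```

With the cyclic collector disabled, the two objects survive even though no variable can reach them, which is exactly the gap that the secondary strategies are meant to close.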

Mark-and-sweep garbage collection

To handle the circular references that reference counting cannot resolve, Python uses a mark-and-sweep garbage collection algorithm, which works in two steps:

  • Mark: traverse all objects; if an object is reachable, meaning something still references it, mark it as reachable.
  • Sweep: traverse all objects again; any object that was not marked as reachable is reclaimed.

Note that only container objects can take part in a circular reference in Python: lists, dictionaries, instances of user-defined classes, tuples, and so on. Simple data types such as numbers and strings cannot form a circular reference, so the mark-and-sweep algorithm does not consider them.
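Which objects the cyclic collector considers can be checked with gc.is_tracked(). The short sketch below compares two simple values with two containers; the Point class is a hypothetical placeholder for any user-defined class.

```python
import gc


class Point:
    """Hypothetical user-defined class for the illustration."""


tracked = {
    "int": gc.is_tracked(42),           # simple value, cannot form a cycle
    "str": gc.is_tracked("abc"),        # simple value, cannot form a cycle
    "list": gc.is_tracked([1, 2]),      # container, may take part in a cycle
    "instance": gc.is_tracked(Point()), # instance of a user-defined class
}
print(tracked)
```

Dictionaries and tuples are left out on purpose, because CPython may decide lazily whether to track them depending on their contents.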

To make the mark-and-sweep process easier to follow, here are a few pictures found online:

1.png

The first figure shows the initial state. Besides ref_count, each object in the picture also has a gc_ref value. gc_ref exists to work around the limitation of reference counting: it is a copy of ref_count, so its initial value equals ref_count. When the collector starts traversing all objects and finds that link1 references link2, it decrements link2's gc_ref by 1, and so on for every reference, which yields the second figure.

2.png

In the second figure, the gc_ref values of link2, link3, and link4 have all dropped to 0. When the garbage collector scans all the objects again, such objects are marked GC_TENTATIVELY_UNREACHABLE and moved to the Unreachable list. You may wonder why link2 has not been moved to the Unreachable list in the picture. In theory it should be moved there as well, as the third figure shows:

3.png

If, during this scan, the garbage collector finds an object whose gc_ref is not 0, it marks the object GC_REACHABLE, meaning it is still referenced from outside. This is the case for link1, as shown below.

4.png

Besides marking link1 as reachable, the garbage collector also follows every reference starting from that node. For example, link2 and link3 can be reached from link1, but link3 has already been placed in the Unreachable list, so link3 must be moved back to the Object to Scan list to show that it is still reachable. The end result is shown below; only link4 can be reclaimed:

5.png
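The gc_ref bookkeeping described with the figures can be imitated in a few lines. This is a toy sketch, not CPython's actual implementation: the find_garbage function, the dictionaries, and the object graph (link1 held by a variable, link1 -> link2 -> link3 -> link1, and link4 referencing only itself) are all assumptions made for the illustration and do not reproduce the pictures exactly.

```python
def find_garbage(ref_count, edges):
    """Return the names of objects that are kept alive only by cycles.

    ref_count: name -> total reference count (internal plus external)
    edges:     name -> list of names that the object references
    """
    # Step 1: gc_ref starts out as a copy of ref_count.
    gc_ref = dict(ref_count)

    # Step 2: subtract every reference that comes from another tracked object.
    for source, targets in edges.items():
        for target in targets:
            gc_ref[target] -= 1

    # Step 3: whatever is still above zero is referenced from outside.
    reachable = {name for name, count in gc_ref.items() if count > 0}
    unreachable = set(gc_ref) - reachable   # tentatively unreachable

    # Step 4: move back everything that a reachable object can still reach.
    to_scan = list(reachable)
    while to_scan:
        current = to_scan.pop()
        for target in edges.get(current, []):
            if target in unreachable:
                unreachable.remove(target)
                reachable.add(target)
                to_scan.append(target)

    return unreachable


edges = {
    "link1": ["link2"],
    "link2": ["link3"],
    "link3": ["link1"],
    "link4": ["link4"],
}
# link1 has one extra reference from a variable outside the graph.
ref_count = {"link1": 2, "link2": 1, "link3": 1, "link4": 1}

garbage = find_garbage(ref_count, edges)
print(garbage)   # {'link4'}
```

As in the walkthrough, link2 and link3 are first set aside as tentatively unreachable and then rescued because link1 leads to them, so only link4 is left to reclaim.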

Mark-and-sweep solves the circular reference problem, but its drawback is just as obvious: the garbage collector has to make two passes over the Python objects, and during each scan the interpreter pauses all other work and only resumes once the scan has finished. It is like a librarian cleaning the library: the library closes during cleaning and reopens to students only after the cleaning is done.

Generational garbage collection

Since garbage collection pauses the whole application, is there a better optimization? The answer is yes. Objects in the Python interpreter do not all live equally long:

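  • Objects that live for a long time (or forever). They are unlikely to be garbage, so they can be scanned less often.
  • Temporary or short-lived objects. They easily become garbage, so they have to be scanned frequently.
  • Objects that fall between these two cases. They can be scanned as the situation requires.

Separating objects this way saves time on each scan (not every object has to be scanned), which speeds up garbage collection.

Corresponding to the three kinds of objects listed above, Python keeps three object generations (0, 1, 2), which are in fact three linked lists. Every newly created object starts in generation zero. If it survives one round of GC scanning, it is moved to generation one, where scans happen less often. If it survives another round of GC, it is moved to generation two, where scans happen even less often.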
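The promotion of survivors from one generation to the next can be sketched with three plain lists. This is a toy model under stated assumptions, not CPython's implementation: ToyGenerationalHeap, allocate, collect, and the is_alive predicate (a stand-in for the real reachability check) are hypothetical names, and the model assumes that collecting a generation also scans all younger generations.

```python
class ToyGenerationalHeap:
    """Three lists standing in for generations 0, 1 and 2."""

    def __init__(self):
        self.generations = [[], [], []]

    def allocate(self, obj):
        # Every new object starts in generation 0.
        self.generations[0].append(obj)

    def collect(self, generation, is_alive):
        """Scan the given generation and all younger ones.

        Survivors are promoted to the next older generation (generation 2
        keeps its survivors). Returns the number of objects freed.
        """
        scanned = []
        for index in range(generation + 1):
            scanned.extend(self.generations[index])
            self.generations[index] = []
        survivors = [obj for obj in scanned if is_alive(obj)]
        target = min(generation + 1, 2)
        self.generations[target].extend(survivors)
        return len(scanned) - len(survivors)


heap = ToyGenerationalHeap()
heap.allocate("config")      # long-lived object
heap.allocate("temp")        # short-lived object
alive = {"config"}

freed_first = heap.collect(0, lambda obj: obj in alive)    # frees "temp"
freed_second = heap.collect(1, lambda obj: obj in alive)   # nothing to free
print(freed_first, freed_second, heap.generations)
```

After two rounds the long-lived object sits in generation 2, where it will only be looked at when the oldest generation is collected.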

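When Python triggers a garbage collection scan

The Python interpreter runs garbage collection only when a certain condition is met: the number of object allocations minus the number of deallocations (objects whose reference count dropped to 0) exceeds a threshold. We can inspect this threshold with functions that Python provides.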

import gc


def threshold_gc():
    # Read the current thresholds
    print(gc.get_threshold())
    # The thresholds can be changed
    gc.set_threshold(800, 10, 10)
    print(gc.get_threshold())

# Output
(700, 10, 10)
(800, 10, 10)

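The values in the output above mean the following:

  • 700 is the threshold at which garbage collection (of generation 0) starts.
  • The two 10s relate to generational collection (as described above, Python has three object generations: 0, 1, 2). The first 10 means that one scan of generation 1 is performed for every 10 scans of generation 0.
  • The last 10 means that one scan of generation 2 is performed for every 10 scans of generation 1.

You can also call set_threshold() yourself to tune how often garbage collection runs. For example, set_threshold(700, 10, 5) scans generation 2 more frequently.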
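To see the thresholds at work, the collector can be asked to report each run through gc.callbacks. The sketch below is only an illustration: the record function, the runs counter, and the deliberately low threshold values are assumptions chosen so that automatic collections happen quickly, and the exact counts will differ between interpreter versions.

```python
import gc
from collections import Counter

runs = Counter()   # generation -> number of automatic collections observed


def record(phase, info):
    # The collector calls every callback with phase "start" and then "stop".
    if phase == "start":
        runs[info["generation"]] += 1


old_threshold = gc.get_threshold()
gc.enable()
gc.callbacks.append(record)
gc.set_threshold(100, 2, 2)           # low values so the demo triggers quickly

keep = [[] for _ in range(10_000)]    # 10,000 tracked containers, none freed

gc.callbacks.remove(record)
gc.set_threshold(*old_threshold)      # restore the previous settings
print(dict(runs))
```

Because the lists are all kept alive, allocations keep outnumbering deallocations, so the young generation is collected again and again while older generations are visited less often.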

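The gc module has a few more interesting functions worth knowing about (see the official documentation for the rest):

import gc


def gc_method():
    # Enable automatic garbage collection
    gc.enable()
    # Disable automatic garbage collection
    gc.disable()
    # Run a collection manually; an argument selects the generation,
    # and no argument means a full collection
    gc.collect()
    # Set the debugging flags, mostly used to detect memory leaks
    gc.set_debug(gc.DEBUG_LEAK)
    # Return the list of objects that refer to the objects passed as arguments
    gc.get_referrers()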
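gc.get_referrers() becomes more informative when it is given an actual object. In the following small sketch, payload and holder are hypothetical names chosen for the illustration; it also shows gc.collect() restricted to one generation.

```python
import gc

payload = ["some", "data"]       # the object we want to investigate
holder = {"cached": payload}     # a container that references it

# Ask the collector which tracked objects refer to payload.
referrers = gc.get_referrers(payload)
found_holder = any(ref is holder for ref in referrers)
print(found_holder)   # True

# Collect only the youngest generation; the return value is the number
# of unreachable objects that were found.
found = gc.collect(0)
print(isinstance(found, int))   # True
```

The referrer list usually also contains the namespace in which payload is defined, which is why the check looks for holder by identity instead of comparing the whole list.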

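Extra: Python's layered memory structure

In Python, the memory management mechanism is abstracted into a layered structure. The following description of the layers is taken from the comments in obmalloc.c in the source code of CPython, the Python interpreter: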

/*
    Object-specific allocators
    _____   ______   ______       ________
   [ int ] [ dict ] [ list ] ... [ string ]       Python core         |
+3 | <----- Object-specific memory -----> | <-- Non-object memory --> |
    _______________________________       |                           |
   [   Python's object allocator   ]      |                           |
+2 | ####### Object memory ####### | <------ Internal buffers ------> |
    ______________________________________________________________    |
   [          Python's raw memory allocator (PyMem_ API)          ]   |
+1 | <----- Python memory (under PyMem manager's control) ------> |   |
    __________________________________________________________________
   [    Underlying general-purpose allocator (ex: C library malloc)   ]
 0 | <------ Virtual memory allocated for the python process -------> |
   =========================================================================
    _______________________________________________________________________
   [                OS-specific Virtual Memory Manager (VMM)               ]
-1 | <--- Kernel dynamic storage allocation & management (page-based) ---> |
    __________________________________   __________________________________
   [                                  ] [                                  ]
-2 | <-- Physical memory: ROM/RAM --> | | <-- Secondary storage (swap) --> |

*/
  • Layer -2 is the physical memory layer.
  • Layer -1 is the operating system's virtual memory manager.
  • Layer 0 is the C library's allocation and release layer (malloc, free, and related functions). When a request is larger than 256 bytes (512 bytes since Python 3.3), it is passed to malloc at layer 0.
  • Layers 1 and 2 are Python-level memory allocators (the memory pool). Requests at or below that size limit are handled by these two layers. They use three levels of memory structure, arena > pool > block: an arena has a fixed size of 256 KB, a pool has a fixed size of 4 KB, and a block is a multiple of 8 bytes, the smallest one that satisfies the request.
  • Layer 3 holds the object-specific allocators for the Python objects we use every day, such as lists, dictionaries, and tuples.

The fundamental purpose of this layered design is to improve Python's execution performance. Without the layers, every small allocation would call malloc and free, which consumes system resources and causes performance problems. With the layers in place, layers 1 and 2 act as a memory pool, and requests are routed to different layers depending on their size, which reduces the number of malloc calls.
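The size-based routing can be sketched as a small function. This is an illustration only: choose_allocator is a hypothetical helper, and the constants assume 8-byte block granularity and a 512-byte small-request limit, values that have differed between CPython versions.

```python
ALIGNMENT = 8                   # assumed block granularity in bytes
SMALL_REQUEST_THRESHOLD = 512   # assumed small-request limit in bytes


def choose_allocator(nbytes):
    """Return (allocator, block_size) for a positive request of nbytes."""
    if nbytes > SMALL_REQUEST_THRESHOLD:
        # Large requests bypass the memory pool and go to the C library.
        return ("malloc", nbytes)
    # Small requests are rounded up to the next multiple of ALIGNMENT.
    block_size = (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT
    return ("pool", block_size)


for request in (1, 20, 512, 513):
    print(request, choose_allocator(request))
```

Rounding requests up to a few fixed block sizes is what lets a pool hand out and reuse blocks of one size without asking the operating system each time.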

Summary

This article described the three garbage collection strategies in Python, as well as Python's layered memory management. These are fairly advanced Python topics, but they should help you understand how Python manages memory. If an interviewer asks you about "Python's garbage collection mechanism" during a job search, being able to explain the content of this article is definitely a plus.

Origin www.cnblogs.com/zhouliweiblog/p/11968650.html