Python basics review: the garbage collection mechanism in Python

1. Introduction

While a Python program runs, it needs space in memory to store the temporary variables generated at runtime; once the computation finishes, the results are written to persistent storage. If the amount of data is too large, the program may run out of memory (OOM) and be terminated by the operating system.

A memory leak means that, because of a design error, the code loses control of a block of memory after allocating it. The program can then no longer release memory it has stopped using, and that memory is wasted.

We only need to remember the most critical sentence:

Garbage collection in Python is based on reference counting, supplemented by two further mechanisms: mark-sweep and generational collection.

2. Reference counting

Everything in Python is an object, so every variable we see is essentially a pointer to an object.
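To make the "pointer to an object" idea concrete, here is a minimal sketch. The names first, second and copied are illustrative only and do not come from the examples in this post. Binding a second name to a list does not copy it; both names point at the same object.

```python
first = [1, 2, 3]
second = first              # binds a second name to the same list; nothing is copied

second.append(4)            # mutate the object through one name
same_object = first is second

copied = list(first)        # an explicit copy creates a new, independent object
copied.append(5)

print(same_object)          # True
print(first)                # [1, 2, 3, 4]
print(copied)               # [1, 2, 3, 4, 5]
```

Because names are only references, what matters for garbage collection is how many references point at an object, not how many objects were "declared".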

So, how do we judge whether this object needs to be recycled?

When an object's reference count (the number of pointers to it) drops to 0, nothing refers to the object any more, so it becomes garbage and needs to be reclaimed.
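The rule above can be observed directly with a small sketch. It assumes CPython, where an object is reclaimed as soon as its count reaches 0. Payload and events are hypothetical names introduced for this illustration; weakref.finalize registers a callback that runs when the object is reclaimed.

```python
import weakref

class Payload:
    """Hypothetical stand-in object; plain lists do not support weak references."""

events = []

obj = Payload()
# Register a callback that runs when the object is reclaimed.
# The finalizer holds only a weak reference, so it does not keep obj alive.
weakref.finalize(obj, events.append, "reclaimed")

alias = obj      # a second reference to the same object
del obj          # one reference removed; alias still points at the object
events_after_first_del = list(events)

del alias        # last reference removed; the count drops to 0
events_after_second_del = list(events)

print(events_after_first_del)    # [] in CPython
print(events_after_second_del)   # ['reclaimed'] in CPython
```

The callback fires only after the second del, that is, only once no reference to the object remains.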

2.1 Example 1: a is a local variable

import os
# psutil: an open-source library for retrieving system information
import psutil

# Show how much memory the current Python process is using
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)

    info = p.memory_full_info()
    memory = info.uss / 1024. / 1024
    print('{} memory used:{} MB'.format(hint, memory))

def func():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    show_memory_info('after created ')

func()
show_memory_info('finished')
# Output
initial memory used:6.1796875 MB
after created  memory used:398.26953125 MB
finished memory used:10.515625 MB

After func() is called and the list a is created, memory usage climbs to nearly 400 MB; after the function call ends, memory returns to roughly its earlier level.

The list a declared inside the function is a local variable. After the function returns, the local variable's reference is dropped. The reference count of the list that a pointed to is now 0, so Python garbage-collects it and the large block of memory it occupied is released.

2.2 Example 2: a is a global variable

So, what happens when we declare a as a global variable?

import os
# psutil: an open-source library for retrieving system information
import psutil

# Show how much memory the current Python process is using
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)

    info = p.memory_full_info()
    memory = info.uss / 1024. / 1024
    print('{} memory used:{} MB'.format(hint, memory))

def func():
    show_memory_info('initial')
    global a  # declare a as a global variable
    a = [i for i in range(10000000)]
    show_memory_info('after created ')

func()
show_memory_info('finished')
# Output
initial memory used:6.16796875 MB
after created  memory used:398.28515625 MB
finished memory used:398.28515625 MB

After we declare a as a global variable, the reference to the list still exists even after the function returns. Python's garbage collection therefore does not reclaim the list, and it continues to occupy memory.

2.3 Example 3: a as the return value

import os
# psutil: an open-source library for retrieving system information
import psutil

# Show how much memory the current Python process is using
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)

    info = p.memory_full_info()
    memory = info.uss / 1024. / 1024
    print('{} memory used:{} MB'.format(hint, memory))

def func():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    show_memory_info('after created ')
    return a

a = func()
show_memory_info('finished')
# Output
initial memory used:6.1640625 MB
after created  memory used:398.2578125 MB
finished memory used:398.2578125 MB

If the list is used as the return value, it is still referenced by a in a = func(), so the garbage collection mechanism does not reclaim it and the memory remains occupied.
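The difference between a purely local object and a returned one can also be checked without psutil. The following is a minimal sketch that assumes CPython; Payload, local_only and returned are hypothetical names used only here. A weak reference lets us observe an object without keeping it alive.

```python
import weakref

class Payload:
    """Hypothetical stand-in for the large list; lists cannot be weakly referenced."""

def local_only():
    p = Payload()
    return weakref.ref(p)        # a weak reference does not keep p alive

def returned():
    p = Payload()
    return p, weakref.ref(p)     # the caller also receives a strong reference

ref_local = local_only()
kept, ref_returned = returned()

local_reclaimed = ref_local() is None     # the local object died when the function returned
returned_alive = ref_returned() is kept   # the returned object is still referenced by kept
print(local_reclaimed, returned_alive)
```

In CPython both values should be True: the object that was only ever a local variable is gone, while the returned one survives for as long as kept refers to it.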

2.4 Reference counting principle

import sys

a = []
# Two references: one from a, one from getrefcount
print(sys.getrefcount(a))
def func(a):
    # Four references: a, Python's function call stack, the function parameter, and getrefcount
    print(sys.getrefcount(a))
func(a)
# Two references again: one from a, one from getrefcount; the call to func no longer exists
print(sys.getrefcount(a))
# Output
2
4
2

The sys.getrefcount() function shows the number of references to an object.

Note: the call to getrefcount() itself adds one reference to the count.

Note: when a function call occurs, two additional references appear: one from the function call stack and one from the function parameter. These counts are an implementation detail of CPython, so the exact numbers can differ between interpreter versions.

import sys
a = []
# Two references: one from a, the other from sys.getrefcount()
print('Reference count: {}'.format(sys.getrefcount(a)))

b = a
# Three references: the previous two plus one from b = a; sys.getrefcount() is counted only once
print('Reference count: {}'.format(sys.getrefcount(a)))

c = b # four
d = b # five
e = b # six
f = b # seven
g = d # eight: d refers to the same object as b
print('Reference count: {}'.format(sys.getrefcount(a)))
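Since the absolute value returned by sys.getrefcount() depends on the interpreter, it can be more robust to look at how the count changes. The sketch below is an illustration under that assumption, with the hypothetical names base and holder; it measures each step relative to a starting value.

```python
import sys

a = []
base = sys.getrefcount(a)          # absolute value depends on the interpreter

b = a                              # a new name for the same object: +1
after_alias = sys.getrefcount(a) - base

holder = [a, a]                    # every slot in a container counts: +2
after_container = sys.getrefcount(a) - base

del b                              # removing the name: -1
holder.clear()                     # emptying the container: -2
after_cleanup = sys.getrefcount(a) - base

print(after_alias, after_container, after_cleanup)
```

Each new name or container slot raises the count by one, and removing it lowers the count again, so the last difference returns to 0.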

2.5 Manual garbage collection

Garbage collection in Python is much simpler than releasing memory with free in C, but what if we need to release memory manually? How do we do it?

First call del a to delete the reference to the object, then call gc.collect() to start garbage collection. The code is as follows:

import gc
import os
import psutil

def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_full_info()
    memory = info.uss / 1024. / 1024
    print('{} memory used:{} MB'.format(hint, memory))

show_memory_info('initial')
a = [i for i in range(10000000)]
show_memory_info('after created')
del a
gc.collect()
show_memory_info('finished')
print(a)
# Output
initial memory used:6.16015625 MB
after created memory used:398.21875 MB
finished memory used:10.46875 MB
Traceback (most recent call last):
  File "/Users/gray/Desktop/test.py", line 18, in <module>
    print(a)
NameError: name 'a' is not defined

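This is manual collection in Python: afterwards the memory has been reclaimed and the name a has been deleted, which is why print(a) fails with a NameError saying that a is not defined. Note that the list in this example is not part of a reference cycle, so in CPython reference counting already frees it when del a removes the last reference; gc.collect() matters for the circular references discussed in the next section.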

3. Circular references

We know that when an object's reference count reaches 0, Python reclaims it. But if two objects refer to each other, will they still be garbage collected?

Let's start with an example:

import gc
import os
import psutil

def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_full_info()
    memory = info.uss / 1024. / 1024
    print('{} memory used:{} MB'.format(hint, memory))

def func():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    b = [i for i in range(10000000)]
    show_memory_info('after a,b created')
    a.append(b)
    b.append(a)

func()
show_memory_info('finished')
# Output
initial memory used:6.21484375 MB
after a,b created memory used:660.42578125 MB
finished memory used:660.15625 MB

Because a and b refer to each other, the memory is still occupied after the function call ends, even though both are local variables and, from the program's point of view, the names a and b no longer exist.

In code this simple, the circular reference is easy to spot, but in complex engineering code a reference cycle may be hard to find.

Python can handle this kind of circular reference as well: we again call gc.collect() to start garbage collection manually.

import gc
import os
import psutil

def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_full_info()
    memory = info.uss / 1024. / 1024
    print('{} memory used:{} MB'.format(hint, memory))

def func():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    b = [i for i in range(10000000)]
    show_memory_info('after a,b created')
    a.append(b)
    b.append(a)

func()
gc.collect()
show_memory_info('finished')
# Output
initial memory used:6.15234375 MB
after a,b created memory used:784.7109375 MB
finished memory used:10.74609375 MB

Manual garbage collection works here, so garbage collection in Python is not as weak as the previous example might suggest.
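The same behavior can be reproduced on a small scale without measuring process memory. This is a minimal sketch, assuming CPython; Node and make_cycle are hypothetical names for the illustration. Automatic collection is switched off during the demo so that only the explicit gc.collect() call can break the cycle.

```python
import gc
import weakref

class Node:
    """Hypothetical container object that can point at a partner."""
    def __init__(self):
        self.partner = None

def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a -> b -> a
    return weakref.ref(a)         # lets us observe a without keeping it alive

gc.disable()                      # stop automatic collection during the demo
try:
    ref = make_cycle()
    alive_before = ref() is not None   # the counts never reached 0
    found = gc.collect()               # returns the number of unreachable objects found
    alive_after = ref() is not None
finally:
    gc.enable()                   # re-enable automatic collection afterwards

print(alive_before, found >= 2, alive_after)
```

Before the collection the object is still alive although no name refers to it; afterwards the weak reference is dead, which shows that the cycle was reclaimed by the collector rather than by reference counting.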

4. Python garbage collection

To deal with circular references, Python uses two dedicated mechanisms: mark-sweep and generational collection.

4.1 Mark-sweep

Consider a directed graph. Start traversing from a node and mark every node you pass through; when the traversal is over, all nodes that remain unmarked are called unreachable nodes. These nodes obviously serve no purpose, so we need to garbage-collect them.

Of course, traversing the entire graph every time would be a huge waste of performance. So in Python's garbage collection implementation, mark-sweep maintains its data structure as a doubly linked list and considers only container objects (only container objects can produce circular references).
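The traversal idea in this section can be sketched as a toy model. This is not CPython's actual implementation: the graph is a plain dictionary, and find_unreachable and the node names are hypothetical. It only illustrates marking from a root and treating everything unmarked as garbage.

```python
def find_unreachable(graph, roots):
    """Mark every node reachable from roots; return the nodes left unmarked."""
    marked = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)                     # mark phase
        stack.extend(graph.get(node, ()))    # follow outgoing references
    return set(graph) - marked               # candidates for the sweep phase

# Each key is an object; each value lists the objects it refers to.
graph = {
    "root": ["x"],
    "x": ["y"],
    "y": [],
    "a": ["b"],   # a and b refer to each other
    "b": ["a"],   # but nothing reachable from root refers to them
}

unreachable = find_unreachable(graph, ["root"])
print(sorted(unreachable))   # ['a', 'b']
```

The pair a and b keeps each other's reference count above 0, yet neither can be reached from the root, which is exactly the case reference counting alone cannot handle.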

4.2 Generational collection

Python divides all objects into three generations. A newly created object belongs to generation 0; after each garbage collection, the surviving objects are moved from their current generation to the next one. The threshold that triggers garbage collection can be set separately for each generation. When the number of new objects minus the number of deleted objects tracked by the collector reaches a generation's threshold, that generation is garbage collected.

The idea behind generational collection is that newer objects are more likely to be garbage, while objects that have already survived for a long time are more likely to keep surviving. This approach saves a lot of computation and improves Python's performance.
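The thresholds mentioned in this section can be inspected and adjusted through the standard gc module. The following is a small sketch rather than a tuning recommendation: the increment of 1000 is an arbitrary value chosen for illustration, and the default thresholds vary between Python versions, so none are shown here.

```python
import gc

thresholds = gc.get_threshold()   # (threshold0, threshold1, threshold2)
counts = gc.get_count()           # current collection counts, one per generation
print('thresholds:', thresholds)
print('counts:', counts)

# Raise the generation-0 threshold so that automatic collection runs less often,
# then restore the original settings.
gc.set_threshold(thresholds[0] + 1000)
raised = gc.get_threshold()[0]

gc.set_threshold(*thresholds)
restored = gc.get_threshold()[0]
print('raised to:', raised, 'restored to:', restored)
```

A higher generation-0 threshold trades memory for fewer collection passes; whether that helps depends on the workload, so it is worth measuring before changing the defaults.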

For follow-up updates to this post, please follow my personal blog: Stardust Blog

Origin blog.csdn.net/u011130655/article/details/113019063