[python advanced] Garbage collection 2

Foreword

In the previous article, [python advanced] Garbage collection 1, we covered garbage collection (GC) in Ruby and Python, circular data structures and reference counting in Python, and the GC thresholds in Python. This section continues with some applications of the gc module and points to watch out for.

1. Garbage collection mechanism

Garbage collection in Python is mainly based on reference counting, supplemented by generational collection.

1. Cases that increase the reference count by 1

  • The object is created, e.g. a=23
  • The object is referenced by another name, e.g. b=a
  • The object is passed as an argument to a function, e.g. func(a)
  • The object is stored as an element in a container, e.g. list1=[a,a]
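The cases above can be checked with sys.getrefcount. This is a minimal sketch: the Node class and the variable names are made up for illustration, and a plain object is used instead of 23 because CPython caches small integers, which makes their counts misleading. Only the differences between readings matter, since the absolute value depends on the interpreter.

```python
import sys

class Node:
    """Hypothetical plain object used only for counting references."""

obj = Node()
base = sys.getrefcount(obj)          # baseline reading for a freshly created object

alias = obj                          # a second name refers to the same object
after_alias = sys.getrefcount(obj)

container = [obj, obj]               # the object is stored twice in a list
after_container = sys.getrefcount(obj)

print(after_alias - base)            # 1
print(after_container - base)        # 3
```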

2. Cases that decrease the reference count by 1

  • A name bound to the object is explicitly deleted, e.g. del a
  • A name bound to the object is rebound to a new object, e.g. a=24
  • The object leaves its scope, e.g. when the func function finishes executing, its local variables are released (global variables are not)
  • The container holding the object is destroyed, or the object is removed from the container
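The decrements can be observed the same way. A small sketch, assuming CPython; Node and use are hypothetical names, and again only the differences between readings are meaningful.

```python
import sys

class Node:
    """Hypothetical plain object used only for counting references."""

def use(o):
    local = o            # extra reference that exists only while use() runs
    return None

obj = Node()
alias = obj
box = [obj]
start = sys.getrefcount(obj)

del alias                             # a name is explicitly deleted
after_del = sys.getrefcount(obj)

box.pop()                             # the object is removed from its container
after_pop = sys.getrefcount(obj)

use(obj)                              # locals are released when the function returns
after_call = sys.getrefcount(obj)

print(start - after_del, start - after_pop, start - after_call)   # 1 2 2
```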

3. View the reference count of an object

In [1]: import sys

In [2]: a = "hello world"

In [3]: sys.getrefcount(a)
Out[3]: 2

This shows the reference count of the object bound to a. The value is 1 higher than you might expect, because passing a as an argument to sys.getrefcount itself temporarily adds one reference.

2. Circular references cause memory leaks

The weakness of reference counting is that it cannot reclaim circular references.

import gc

class ClassA():
    def __init__(self):
        print('object born,id:%s' % str(hex(id(self))))

def f2():
    # Loops forever on purpose: each pass leaks one c1 <-> c2 cycle
    while True:
        c1 = ClassA()
        c2 = ClassA()
        c1.t = c2
        c2.t = c1
        del c1
        del c2

# Turn off Python's cyclic gc
gc.disable()
f2()

When f2() is executed, the memory occupied by the process will continue to increase.

  • After c1 and c2 are created, each of the two objects has a reference count of 1. After executing c1.t=c2 and c2.t=c1, each count becomes 2.
  • After del c1, the reference count of the first object drops to 1. Since it is not 0, the first object is not destroyed, so it still holds its reference to the second object, whose count stays at 2. After del c2, the second object's count likewise drops to 1, so both objects are left with a reference count of 1.
  • Both objects are now unreachable and ought to be destroyed, but because of the circular reference their counts never reach 0. With the gc disabled nothing reclaims them, which leads to a memory leak.
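A finite variant of the same experiment is sketched below. It assumes CPython and uses a weak reference as a probe, because a weak reference does not keep its target alive. Node, make_cycle and probe are illustrative names, not part of the original example.

```python
import gc
import weakref

class Node:
    pass

def make_cycle():
    a = Node()
    b = Node()
    a.t = b
    b.t = a
    return weakref.ref(a)          # observe a without adding a strong reference

gc.disable()                       # reference counting only, as in f2()
probe = make_cycle()
alive_without_gc = probe() is not None     # the cycle is unreachable but still in memory

collected = gc.collect()           # an explicit collection still works while gc is disabled
alive_after_collect = probe() is not None
gc.enable()

print(alive_without_gc, alive_after_collect)   # True False
```

The value of collected depends on the Python version, because the instance dicts are counted separately on some versions.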

3. Garbage collection

import gc
class ClassA():
    def __init__(self):
        print('object born,id:%s'%str(hex(id(self))))
    #def __del__(self):
    #    print('object del,id:%s'%str(hex(id(self))))

def f3():
    print("-----0------")
    #print(gc.collect())
    c1 = ClassA()
    c2 = ClassA()
    c1.t = c2
    c2.t = c1
    print("-----1------")
    del c1
    del c2
    print("-----2------")
    print(gc.garbage)
    print("-----3------")
    print(gc.collect())  # explicitly run garbage collection
    print("-----4------")
    print(gc.garbage)
    print("-----5------")

if __name__ == '__main__':
    gc.set_debug(gc.DEBUG_LEAK)  # enable debug logging for the gc module
    f3()

The python3 result is as follows:

-----0------
object born,id:0x7fcd059190f0
object born,id:0x7fcd05919240
-----1------
-----2------
[]
-----3------
gc: collectable <ClassA 0x7fcd059190f0>
gc: collectable <ClassA 0x7fcd05919240>
gc: collectable <dict 0x7fcd05989d48>
gc: collectable <dict 0x7fcd058f24c8>
4
-----4------
[<__main__.ClassA object at 0x7fcd059190f0>, <__main__.ClassA object at 0x7fcd05919240>, {'t': <__main__.ClassA object at 0x7fcd05919240>}, {'t': <__main__.ClassA object at 0x7fcd059190f0>}]
-----5------
gc: collectable <module 0x7fcd059715e8>
gc: collectable <dict 0x7fcd0597af08>
gc: collectable <builtin_function_or_method 0x7fcd0596fdc8>
...

Notes:

  • Because gc.DEBUG_LEAK includes gc.DEBUG_SAVEALL, the unreachable objects found by the collector are saved in the gc.garbage list instead of being freed
  • gc.collect() returns the number of unreachable objects found; the 4 here is the two objects plus their corresponding dicts
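The role of the debug flag can be isolated with a short sketch. It sets only gc.DEBUG_SAVEALL, one of the flags combined in gc.DEBUG_LEAK, so nothing is logged to stderr. Node, make_cycle and nodes_in_garbage are hypothetical helper names.

```python
import gc

class Node:
    pass

def make_cycle():
    a, b = Node(), Node()
    a.t, b.t = b, a      # a <-> b cycle, unreachable once the function returns

def nodes_in_garbage():
    return sum(isinstance(o, Node) for o in gc.garbage)

make_cycle()
gc.collect()
without_flag = nodes_in_garbage()     # the cycle is freed, nothing is saved

gc.set_debug(gc.DEBUG_SAVEALL)        # the flag that fills gc.garbage
make_cycle()
gc.collect()
with_flag = nodes_in_garbage()        # both Node objects are kept in gc.garbage

gc.set_debug(0)                       # restore the default behaviour
gc.garbage.clear()

print(without_flag, with_flag)        # 0 2
```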

Garbage collection is triggered in three situations:

  1. When gc.collect() is called
  2. When a counter of the gc module reaches its threshold
  3. When the program exits
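The second trigger can be observed without calling gc.collect() at all. The sketch below assumes a default CPython build; the low threshold values and the names Node, make_cycle, probe and filler are chosen only for the demonstration.

```python
import gc
import weakref

class Node:
    pass

def make_cycle():
    a, b = Node(), Node()
    a.t, b.t = b, a
    return weakref.ref(a)              # observe a without keeping it alive

old_threshold = gc.get_threshold()
gc.enable()
gc.set_threshold(100, 10, 10)          # low thresholds so automatic collection runs soon

probe = make_cycle()
filler = [[] for _ in range(50_000)]   # many live container objects push the counter up

freed_automatically = probe() is None  # no gc.collect() call was made
gc.set_threshold(*old_threshold)       # put the original thresholds back

print(freed_automatically)
```

The objects in filler must stay alive: an object that is allocated and freed immediately raises and then lowers the counter, so the threshold would never be reached.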

4. Analysis of common functions of gc module

The gc module provides an interface for developers to set options for garbage collection. As mentioned above, a defect of using the reference counting method to manage memory is circular references, and one of the main functions of the gc module is to solve the problem of circular references.

Commonly used functions:

1. gc.set_debug(flags) Sets the debug flags of the gc module; generally set to gc.DEBUG_LEAK
2. gc.collect([generation]) Explicitly runs garbage collection. The optional argument selects the generations: 0 checks only first-generation objects, 1 checks first- and second-generation objects, and 2 checks all three generations. If no argument is passed, a full collection is run, which is the same as passing 2. Returns the number of unreachable objects found
3. gc.get_threshold() Returns the thresholds that control how often the gc module collects automatically
4. gc.set_threshold(threshold0[, threshold1[, threshold2]]) Sets the thresholds for automatic garbage collection
5. gc.get_count() Returns the current automatic garbage collection counters as a tuple of length 3
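A brief tour of these calls, as a sketch rather than a recommended configuration. The threshold values 500, 5, 5 are arbitrary, and the exact tuples printed differ between Python versions, so no expected output is shown.

```python
import gc

saved = gc.get_threshold()          # current thresholds, a tuple of length 3
gc.set_threshold(500, 5, 5)         # check the youngest generation more often
current = gc.get_threshold()

counts = gc.get_count()             # three counters, one per generation
found_young = gc.collect(0)         # check only the youngest generation
found_full = gc.collect()           # full collection, same as gc.collect(2)

gc.set_threshold(*saved)            # put the original thresholds back
print(current, counts, found_young, found_full)
```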

Automatic garbage collection mechanism of gc module

Automatic garbage collection is on by default; gc.isenabled() returns True while it is active. Import the gc module to inspect or change this.
The main function of this mechanism is to discover and deal with unreachable garbage objects.
Garbage collection = garbage checking + garbage reclamation
Python uses generational collection and divides objects into three generations. A newly created object is placed in the first generation. If the object survives a first-generation garbage check, it is moved to the second generation. Likewise, if it survives a second-generation garbage check, it is moved to the third generation.
The gc module keeps three counters, which gc.get_count() returns as a tuple of length 3.
For example, in (488, 3, 0), 488 is the number of gc-tracked objects allocated minus the number freed since the last first-generation garbage check. Note that it counts allocations, not reference count increases. E.g.:

print(gc.get_count())  # e.g. (590, 8, 0)
a = ClassA()
print(gc.get_count())  # one more allocation, e.g. (591, 8, 0)
del a
print(gc.get_count())  # back to e.g. (590, 8, 0)
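The difference between allocations and references is easier to see with many objects and with automatic collection paused. This is a sketch assuming a default (non free-threaded) CPython build; the exact numbers vary slightly because the interpreter allocates a few helper objects of its own.

```python
import gc

class ClassA:
    pass

gc.collect()                         # start from freshly reset counters
gc.disable()                         # pause automatic collection so nothing resets them
before = gc.get_count()[0]
objs = [ClassA() for _ in range(100)]    # 100 new gc-tracked objects, all kept alive
after_alloc = gc.get_count()[0]
extra_refs = [objs[0]] * 1000        # 1000 more references, but only one new list object
after_refs = gc.get_count()[0]
gc.enable()

print(after_alloc - before)          # roughly 100
print(after_refs - after_alloc)      # close to 0
```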

The 3 is the number of first-generation garbage checks since the last second-generation check. Similarly, the 0 is the number of second-generation garbage checks since the last third-generation check.
The gc module also has automatic collection thresholds, a tuple of length 3 returned by gc.get_threshold(), such as (700, 10, 10). Each time a counter increases, the gc module checks whether it has reached its threshold; if so, it runs the garbage check for the corresponding generation and resets the counter.
For example, assuming the threshold is (700, 10, 10):

  • When the counter increases from (699, 3, 0) to (700, 3, 0), the gc module runs gc.collect(0), that is, it checks first-generation objects for garbage and resets the counter
  • When the counter increases from (699, 9, 0) to (700, 9, 0), the gc module runs gc.collect(1), that is, it checks first- and second-generation objects for garbage and resets the counter
  • When the counter increases from (699, 9, 9) to (700, 9, 9), the gc module runs gc.collect(2), that is, it checks first-, second- and third-generation objects for garbage and resets the counter
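Which generations are actually checked can be recorded with gc.callbacks, a list of functions the collector calls at the start and end of every collection. A minimal sketch: the thresholds (100, 5, 5) and the names record and filler are assumptions made for the demonstration, and the generation numbers reported depend on the Python version.

```python
import gc

collected_generations = []

def record(phase, info):
    # Called by the collector; phase is "start" or "stop",
    # info["generation"] is the oldest generation being checked.
    if phase == "start":
        collected_generations.append(info["generation"])

saved = gc.get_threshold()
gc.enable()
gc.set_threshold(100, 5, 5)
gc.callbacks.append(record)
try:
    filler = [[] for _ in range(20_000)]   # live container objects keep pushing the counter up
finally:
    gc.callbacks.remove(record)
    gc.set_threshold(*saved)

# Print how many collections were started for each generation number
for generation in sorted(set(collected_generations)):
    print(generation, collected_generations.count(generation))
```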

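Important point

Before Python 3.4, the one thing the gc module could not handle was circular references between objects whose classes define a __del__ method; such objects were left in gc.garbage. Since Python 3.4 (PEP 442) these cycles are collected as well, as the python3 output below shows, but the order in which the __del__ methods run is not defined, so it is still best to avoid defining __del__ in a project.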

import gc
class ClassA():
    pass
    #def __del__(self):
    #    print('object born,id:%s'%str(hex(id(self))))

gc.set_debug(gc.DEBUG_LEAK)
a = ClassA()
b = ClassA()
a.next = b
b.prev = a
print("--1--")
print(gc.collect())
print("--2--")
del a
print("--3--")
del b
print("--3-1--")
print(gc.collect())
print("--4--")

The results are as follows:

--1--
0
--2--
--3--
--3-1--
gc: collectable <ClassA 0x7f599dc690f0>
gc: collectable <ClassA 0x7f599dc69160>
gc: collectable <dict 0x7f599dcdcd48>
gc: collectable <dict 0x7f599dcdcdc8>
4
--4--
gc: collectable <module 0x7f599dcc45e8>
gc: collectable <dict 0x7f599dccdf08>
gc: collectable <builtin_function_or_method 0x7f599dcc2dc8>
...

If the __del__ method is uncommented, the result is:

--1--
0
--2--
--3--
--3-1--
gc: collectable <ClassA 0x7fb236853128>
gc: collectable <ClassA 0x7fb236853160>
gc: collectable <dict 0x7fb2368c5d48>
gc: collectable <dict 0x7fb2368c5ec8>
object born,id:0x7fb236853128
object born,id:0x7fb236853160
4
--4--
gc: collectable <module 0x7fb2368ad5e8>
gc: collectable <dict 0x7fb2368b6f08>
gc: collectable <builtin_function_or_method 0x7fb2368abdc8>
...
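The second result can be reproduced without debug logging. The sketch below assumes Python 3.4 or later, where PEP 442 allows cycles with finalizers to be collected; Resource, finalized and make_cycle are illustrative names.

```python
import gc

finalized = []

class Resource:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        finalized.append(self.name)    # record that the finalizer ran

def make_cycle():
    a, b = Resource("a"), Resource("b")
    a.next = b
    b.prev = a

make_cycle()
gc.collect()

# Resource objects left in gc.garbage would mean the cycle was not reclaimed
leftover = [o for o in gc.garbage if isinstance(o, Resource)]
print(sorted(finalized), leftover)     # ['a', 'b'] []
```

The names are sorted before printing because the order in which the two finalizers run is not guaranteed.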

 
