Python memory management and garbage collection algorithm explained

This article explains Python's memory management and garbage collection algorithm. It covers the traditional garbage collection mechanism, how Python's collector works, the finalizer problem, and related topics. It should be a useful reference for readers who want to understand the subject.
Summary

The collector handles reference cycles involving lists, tuples, instances, classes, dictionaries, and functions. Instances with a __del__ method are handled in a sound manner. Adding GC support to new types is easy. Python with GC enabled is binary compatible with regular Python.

Generational collection is working (currently three generations). The overhead measured by pybench is about four percent. Virtually all extension modules should continue to work unchanged (I had to modify the new and cPickle modules in the standard distribution). A new module called gc can be used to trigger a collection immediately and to set debugging options.
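As a rough illustration of the gc module mentioned above, the sketch below forces a collection and uses one debugging option. It assumes a modern CPython, where gc.DEBUG_SAVEALL keeps every unreachable object in gc.garbage instead of freeing it; the variable names are only for illustration.

```python
import gc

gc.collect()                      # start from a clean state
gc.set_debug(gc.DEBUG_SAVEALL)    # keep unreachable objects in gc.garbage

cycle = []
cycle.append(cycle)               # the list now refers to itself
del cycle                         # unreachable, but its reference count is still 1

found = gc.collect()              # force an immediate full collection
saved_lists = [obj for obj in gc.garbage if isinstance(obj, list)]

gc.set_debug(0)                   # restore normal behaviour
gc.garbage.clear()                # drop the references saved for inspection

print("unreachable objects found:", found)
print("lists saved for inspection:", len(saved_lists))
```

With the debugging flag cleared again, later collections free unreachable objects as usual.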

The collector should be portable across platforms. The patched version of Python passes all regression tests and runs Grail, Idle, and Sketch without any problems.

Portable garbage collection has been included in Python since version 2.0 and is enabled by default. Enjoy!

Why do we need garbage collection?

The current version of Python uses reference counting to manage allocated memory. Each Python object has a reference count that indicates how many objects point to it. When the reference count reaches zero, the object is released. Reference counting works well for most programs. However, it has one fundamental flaw, which is caused by reference cycles. The simplest example of a cycle is an object that refers to itself. For example:

>>> l = []
>>> l.append(l)
>>> del l

The reference count of the list created here is now one. However, since it is no longer accessible from Python and can never be used again, it should be considered garbage. With reference counting alone, as in the current version of Python, this list will never be released.
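The same situation can be reproduced in a script. This is a minimal sketch that assumes a Python with the gc module available: automatic collection is switched off so that only reference counting is at work, and the collector is then run by hand to pick up the orphaned list.

```python
import gc

gc.disable()          # rely on reference counting only
gc.collect()          # clear out any garbage that already exists

l = []
l.append(l)
del l                 # the list survives: it still holds a reference to itself

freed = gc.collect()  # run the cycle collector by hand
gc.enable()

print("unreachable objects found:", freed)
```

The value returned by gc.collect() is the number of unreachable objects it found, so it is at least one here.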

Circular references are generally not good programming practice and can almost always be avoided. However, sometimes it is hard to avoid creating one, and sometimes the programmer is not even aware of the problem. For long-running programs such as servers, this is especially troubling. People do not want their servers to run out of memory because reference counting cannot release inaccessible objects that are caught in cycles. For large programs, it can be hard to find out how the cycles are being created.

What is "traditional" garbage collection?

Traditional garbage collection (such as mark-and-sweep or stop-and-copy) generally works as follows:

1. Find the root objects of the system. Root objects are things like the global environment (such as the __main__ module in Python) and objects on the stack.
2. Search for all objects that can be reached from these objects. These objects are "alive".
3. Free all other objects.

Unfortunately, this approach cannot be used with the current version of Python. Because of the way extension modules work, Python cannot completely determine the root set. If the root set cannot be determined accurately, we risk freeing objects that are still referenced. Even if extension modules were designed differently, there is no portable way to find the objects currently on the C stack. Furthermore, reference counting provides benefits that Python programmers have come to expect: locality of memory references and predictable finalizer semantics. The best we can do is find a scheme that keeps reference counting but can also free reference cycles.
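For comparison, the three steps of the traditional approach can be sketched on a toy object graph. This is only an illustration, not how any real collector is implemented: the heap dictionary, the object names, and the mark and sweep helpers are all hypothetical.

```python
def mark(roots, references):
    """Return the set of object names reachable from the roots."""
    live = set()
    pending = list(roots)
    while pending:
        name = pending.pop()
        if name in live:
            continue
        live.add(name)
        pending.extend(references.get(name, ()))
    return live


def sweep(references, live):
    """Return the names of the objects a collector would free."""
    return sorted(name for name in references if name not in live)


# Hypothetical heap: each key is an object, each value lists what it references.
heap = {
    "main": ["config", "cache"],
    "config": [],
    "cache": ["config"],
    "a": ["b"],   # a and b form a cycle that nothing else refers to
    "b": ["a"],
}

live = mark(["main"], heap)      # steps 1 and 2: start at the roots and search
garbage = sweep(heap, live)      # step 3: everything else can be freed
print(garbage)
```

The sketch only works because the root set ("main") is known exactly, which is the piece of information Python cannot reliably obtain.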

How does this method work?

Conceptually, this approach does the opposite of traditional garbage collection. It tries to find all unreachable objects, rather than all reachable ones. This is much safer, because if the algorithm fails we are at least no worse off than without garbage collection (apart from the wasted time and space).

Because we are still using reference counting, the garbage collector only needs to find reference cycles. Reference counting takes care of all other garbage. First, we observe that reference cycles can only be created by container objects. A container object is an object that can hold references to other objects. In Python, lists, dictionaries, instances, classes, and tuples are all examples of container objects. Integers and strings are not containers. With this observation, we realize that non-container objects can be ignored for the purposes of garbage collection. This is a useful optimization, because objects such as integers and strings should be lightweight.
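In a modern CPython, the distinction between containers and non-containers can be observed with gc.is_tracked(), which reports whether the collector is keeping track of an object. The snippet below is a small sketch; the sample values are arbitrary, and which objects are tracked is a CPython implementation detail.

```python
import gc

samples = [42, "text", [], [1, 2]]
for value in samples:
    kind = "tracked" if gc.is_tracked(value) else "not tracked"
    print(f"{value!r:10} {kind}")
```

Integers and strings are reported as not tracked, while lists are always tracked because they can take part in a cycle.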

The idea now is to keep track of all container objects. There are several ways to do this, but the best is a doubly linked list whose pointer fields are stored inside the object structures themselves. This allows objects to be inserted into and removed from the set quickly, with no extra memory allocation. When a container is created it is inserted into the set, and when it is deleted it is removed from the set.
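The bookkeeping described above can be modelled in a few lines. The real collector does this in C inside the object header; the sketch below is a Python model, and the names Container, ContainerSet, gc_prev, and gc_next are chosen for illustration only.

```python
class Container:
    """Toy container object with the two link fields stored in the object."""

    def __init__(self, name):
        self.name = name
        self.gc_prev = None
        self.gc_next = None


class ContainerSet:
    """Circular doubly linked list with a sentinel head node."""

    def __init__(self):
        self.head = Container("<head>")
        self.head.gc_prev = self.head
        self.head.gc_next = self.head

    def insert(self, obj):
        # Link obj in just before the sentinel, i.e. at the end: O(1).
        last = self.head.gc_prev
        last.gc_next = obj
        obj.gc_prev = last
        obj.gc_next = self.head
        self.head.gc_prev = obj

    def remove(self, obj):
        # Unlink obj using only its own two pointers: O(1), no searching.
        obj.gc_prev.gc_next = obj.gc_next
        obj.gc_next.gc_prev = obj.gc_prev
        obj.gc_prev = obj.gc_next = None

    def names(self):
        result = []
        node = self.head.gc_next
        while node is not self.head:
            result.append(node.name)
            node = node.gc_next
        return result


tracked = ContainerSet()
a, b, c = Container("a"), Container("b"), Container("c")
for obj in (a, b, c):
    tracked.insert(obj)       # done when a container is created
tracked.remove(b)             # done when a container is deleted
print(tracked.names())
```

Because the links live in the objects themselves, neither operation allocates memory or walks the list.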

Now that we can get at all the container objects, how do we find reference cycles? First, we add another field to container objects, in addition to the two link pointers. We call this field gc_refs. We can then find reference cycles with the following steps:

1. For each container object, set gc_refs to the object's reference count.
2. For each container object, find the container objects it references and decrement their gc_refs fields by one.
3. Every container object whose gc_refs is still greater than zero is referenced from outside the set of container objects. We cannot free these objects, so we move them to a different set.
4. Objects referenced by the objects that were moved cannot be freed either. We move them, and everything reachable from them, out of the current set as well.
5. The objects left in the current set are referenced only by other objects in that set (that is, they cannot be reached from Python and are garbage). We can now free these objects.
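The steps above can be simulated on a handful of toy objects. This is a sketch under stated assumptions rather than the collector's actual code: Obj, link, and find_garbage are hypothetical helpers, reference counts are maintained by hand, and a gc_refs value above zero after the subtraction step is taken to mean that a reference from outside the set exists.

```python
class Obj:
    """Toy container: 'refs' lists the containers it references."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.refcount = 0   # maintained by hand in this model
        self.gc_refs = 0


def link(src, dst):
    """Make src reference dst and update dst's reference count."""
    src.refs.append(dst)
    dst.refcount += 1


def find_garbage(containers):
    # Step 1: copy each reference count into gc_refs.
    for obj in containers:
        obj.gc_refs = obj.refcount
    # Step 2: subtract the references that come from inside the set.
    for obj in containers:
        for target in obj.refs:
            target.gc_refs -= 1
    # Step 3: anything still above zero is referenced from outside.
    reachable = [obj for obj in containers if obj.gc_refs > 0]
    kept = {id(obj) for obj in reachable}
    # Step 4: whatever those objects can reach must be kept as well.
    pending = list(reachable)
    while pending:
        obj = pending.pop()
        for target in obj.refs:
            if id(target) not in kept:
                kept.add(id(target))
                pending.append(target)
    # Step 5: the rest is referenced only from within the set.
    return [obj.name for obj in containers if id(obj) not in kept]


a, b, c, d = (Obj(name) for name in "abcd")
link(a, b)
link(b, a)          # a <-> b: an isolated cycle
link(c, d)
link(d, c)          # c <-> d: a cycle that is still in use
c.refcount += 1     # models one reference from outside, such as a variable

print(find_garbage([a, b, c, d]))
```

Only the a/b cycle is reported: d ends step 2 with gc_refs equal to zero, but it is rescued in step 4 because it is reachable from c.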

Finalizer problem

Our grand plan has one problem: finalizers. In Python, finalizers are the __del__ methods of instances. With reference counting, finalizers work well. When an object's reference count drops to zero, its finalizer is called just before the object is released. For programmers, this is straightforward and easy to understand.

With garbage collection, calling finalizers becomes a tricky problem, especially when reference cycles are involved. If two objects in a cycle both have finalizers, what should be done? Which one is called first? After the first finalizer has been called, the object cannot be released, because the second finalizer could still access it.

Because there is no good solution to this problem, cycles that are referenced by objects with finalizers are not released. Instead, these objects are added to a global list of uncollectable garbage. It should always be possible to rewrite a program so that this problem is avoided. As a last resort, the program can read this global list and break the reference cycles in a way that makes sense for the application.
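One way to avoid the problem is to break the cycle explicitly, so that plain reference counting runs the finalizers. The sketch below uses a hypothetical Node class and disables the collector to show that it is not needed in this case. Note that CPython 3.4 and later can collect cycles containing finalizers, so the restriction described above applies to older interpreters.

```python
import gc


class Node:
    finalized = []          # records the names of objects whose finalizer ran

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        Node.finalized.append(self.name)


gc.disable()                # show that reference counting alone is enough here
a, b = Node("a"), Node("b")
a.other, b.other = b, a     # two finalizable objects in one cycle

a.other = None              # break the cycle by hand before dropping the names
del a, b
gc.enable()

print(Node.finalized)
```

Both finalizers run as soon as the last names are deleted, without any help from the cycle collector.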

At what price?

As the saying goes, there is no such thing as a free lunch. However, this form of garbage collection is relatively cheap. One of the biggest costs is the three extra words of memory needed for each container object. There is also the overhead of maintaining the set of containers. With the current version of the collector, this overhead amounts to a slowdown of about four percent according to pybench.

The garbage collector currently keeps track of three generations of objects. By adjusting the parameters, the time spent in garbage collection can be made as small as you like. For some applications, it may make sense to turn off automatic garbage collection and call the collector explicitly at run time. However, when running pybench with the default collection parameters, the time spent in garbage collection does not appear to be large. Obviously, applications that allocate a large number of container objects will spend more time in garbage collection.
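The tuning described above is exposed through the gc module. The following is a minimal sketch of turning automatic collection off around an allocation-heavy step and collecting explicitly afterwards; build_table is a hypothetical workload, and the threshold values it prints depend on the Python version.

```python
import gc

thresholds = gc.get_threshold()     # one threshold per generation
print("thresholds:", thresholds)


def build_table(rows):
    """Hypothetical allocation-heavy step that creates many containers."""
    return [{"row": i, "cells": [i, i + 1]} for i in range(rows)]


was_enabled = gc.isenabled()
gc.disable()                        # no automatic collections during the build
try:
    table = build_table(10_000)
finally:
    if was_enabled:
        gc.enable()

gc.collect()                        # collect explicitly at a convenient moment
gc.set_threshold(*thresholds)       # thresholds can be adjusted the same way
```

Whether this helps depends on the application, so it is worth measuring before and after changing any of these settings.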

The current patch adds a new configure option to enable the garbage collector. Python with the garbage collector is binary compatible with standard Python. If the option is turned off, the Python interpreter is not affected at all.

How can I use it?

Just download the current version of Python. The garbage collector has been included since version 2.0 and is enabled by default. If you are using Python 1.5.2, there is an older version of the patch that may work. If you are on Windows, you can download a replacement python15.dll.

Boehm-Demers conservative garbage collection

This patch modifies Python 1.5.2 to use the Boehm-Demers conservative garbage collector. You have to apply the patch yourself. Reference counting is still used. The garbage collector only releases memory that reference counting has not released (that is, reference cycles). This should give the best performance. To apply the patch:

$ cd Python-1.5.2
$ patch -p1 < ../gc-malloc-cleanup.diff
$ patch -p1 < ../gc-boehm.diff
$ autoconf
$ ./configure --with-gc

This assumes that libgc.a is installed so that the -lgc link option works (/usr/local/lib should do). If you do not have this library, download and install it before compiling.

Currently, this patch has only been tested on Linux. It may also work on other Unix machines. On my Linux machine, the GC version of Python passes all regression tests.

Summary

That is all for this article on Python memory management and garbage collection algorithms.


Origin blog.csdn.net/haoxun11/article/details/105057229