The Road to Learning Python -- Memory Management

1. Python memory management basics

origin_dict = {'a': 1, 'b': [1, 2, 3, 4]}
origin_dict_copy = {}
print "origin_dict:", origin_dict
# Both keys store a reference to the same dict; nothing is copied here.
origin_dict_copy['key1'] = origin_dict
origin_dict_copy['key2'] = origin_dict
print "origin_dict_copy:", origin_dict_copy
print "change origin_dict_copy"
# Mutating through key1 is also visible through key2 and origin_dict.
origin_dict_copy['key1']['a'] = 100
print "origin_dict_copy:", origin_dict_copy
origin_dict_copy['key1']['b'] = "hello"
print "origin_dict_copy:", origin_dict_copy
# Rebinding key1 to a new dict leaves origin_dict untouched.
origin_dict_copy['key1'] = {'a': 2, 'b': [2, 1, 4, 5, 3]}
print "origin_dict_copy:", origin_dict_copy
print "origin_dict:", origin_dict

Output:

origin_dict: {'a': 1, 'b': [1, 2, 3, 4]}
origin_dict_copy: {'key2': {'a': 1, 'b': [1, 2, 3, 4]}, 'key1': {'a': 1, 'b': [1, 2, 3, 4]}}
change origin_dict_copy
origin_dict_copy: {'key2': {'a': 100, 'b': [1, 2, 3, 4]}, 'key1': {'a': 100, 'b': [1, 2, 3, 4]}}
origin_dict_copy: {'key2': {'a': 100, 'b': 'hello'}, 'key1': {'a': 100, 'b': 'hello'}}
origin_dict_copy: {'key2': {'a': 100, 'b': 'hello'}, 'key1': {'a': 2, 'b': [2, 1, 4, 5, 3]}}
origin_dict: {'a': 100, 'b': 'hello'}

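Analysis:

Because of Python's reference semantics, assigning an object to a variable (or storing it under a dictionary key) only creates a reference to that object. The values stored under 'key1' and 'key2' are therefore not copies at all: both refer to the very same dictionary as origin_dict, so a change made through either of them changes the original as well. Only the final assignment, which rebinds 'key1' to a brand-new dictionary, leaves origin_dict untouched.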
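To get a dictionary that really is independent of the original, the standard copy module can be used. The following is a minimal sketch, written in Python 3 syntax (the listing above is Python 2); the names alias, shallow and deep are chosen here for illustration only.

```python
import copy

origin = {'a': 1, 'b': [1, 2, 3, 4]}

alias = origin                  # plain assignment: a second reference, no copy
shallow = copy.copy(origin)     # new outer dict, but the inner list is shared
deep = copy.deepcopy(origin)    # new outer dict and a new inner list

alias['a'] = 100                # visible through origin
shallow['b'].append(5)          # mutates the list shared with origin
deep['b'].append(6)             # touches only the deep copy

print(origin)   # {'a': 100, 'b': [1, 2, 3, 4, 5]}
print(shallow)  # {'a': 1, 'b': [1, 2, 3, 4, 5]}
print(deep)     # {'a': 1, 'b': [1, 2, 3, 4, 6]}
```

A shallow copy is enough when the values are immutable; nested mutable values such as the list under 'b' need deepcopy to be fully separated.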

2. A deeper look at Python memory management

Consider the following statement:

a = 1

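Python is a dynamically typed language in which objects and references are separate. The integer 1 is an object, and a is a reference. The assignment statement makes the reference a point to the object 1.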
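A small sketch of what this separation means in practice, in Python 3 syntax: assigning to a name again rebinds the name and never modifies the object it used to point to. The variable same_before is introduced here only for the demonstration.

```python
a = 1
b = a                        # b now refers to the same int object as a
same_before = a is b

a = 2                        # rebinds the name a; the object 1 is not modified
print(a, b)                  # 2 1
print(same_before, a is b)   # True False
```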

【References and objects】

The built-in function id() returns an object's identity, which in CPython is its memory address.

# number
a = 1
b = 1
print "addr of a is:", hex(id(a))
print "addr of b is:", hex(id(b))
print "a is b ? :", (a is b)
# string
c = "abcdefg"
d = "abcdefg"
print "addr of c is:", hex(id(c))
print "addr of d is:", hex(id(d))
print "c is d ? :", (c is d)
# list
e = [1, 2, 3]
f = [1, 2, 3]
print "addr of e is:", hex(id(e))
print "addr of f is:", hex(id(f))
print "e is f ? :", (e is f)
# tuple
g = (1, 2, 3)
h = (1, 2, 3)
print "addr of g is:", hex(id(g))
print "addr of h is:", hex(id(h))
print "g is h ? :", (g is h)
# dict
i = {'key1': 1, 'key2': 2}
j = {'key1': 1, 'key2': 2}
print "addr of i is:", hex(id(i))
print "addr of j is:", hex(id(j))
print "i is j ? :", (i is j)

Output:

addr of a is: 0x1d70438L
addr of b is: 0x1d70438L
a is b ? : True
addr of c is: 0x224e5a8L
addr of d is: 0x224e5a8L
c is d ? : True
addr of e is: 0x2142f48L
addr of f is: 0x21eb2c8L
e is f ? : False
addr of g is: 0x1d9d1f8L
addr of h is: 0x20e5f30L
g is h ? : False
addr of i is: 0x1d91378L
addr of j is: 0x21f2488L
i is j ? : False

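Python caches small integers (in CPython, those from -5 to 256) and some short strings so that the objects can be reused. When we create several references equal to 1, all of them actually point to the same object. Lists and dicts, by contrast, are mutable, so every literal builds a new object, and in this run the two tuples are separate objects as well. Because this caching is an implementation detail, compare values with == and use is only when identity is what you mean.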
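The difference between equality and identity can be sketched as follows, in Python 3 syntax. The integer result relies on CPython's small-integer cache and is not guaranteed by the language; sys.intern() is the documented way to request a shared string object explicitly. Values are built at run time with int() and join() so that the compiler cannot merge equal literals.

```python
import sys

# Lists: equal values, but each literal builds a separate object.
e = [1, 2, 3]
f = [1, 2, 3]
print(e == f, e is f)   # True False

# Small integers built at run time still come from CPython's cache.
x = int("100")
y = int("100")
print(x == y, x is y)   # True True on CPython

# Explicit interning returns one shared object for equal strings.
s = sys.intern("".join(["memory", "_", "management"]))
t = sys.intern("".join(["memory", "_", "management"]))
print(s is t)           # True
```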

from sys import getrefcount
a = [1, 2, 3]
print(getrefcount(a))
b = a
print(getrefcount(b))
del a
print(getrefcount(b))

Output:

2
3
2

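Analysis:

In Python, every object stores the total number of references that point to it, i.e. its reference count. Note that when a reference is passed to getrefcount() as an argument, the argument itself creates a temporary reference, so the result is 1 higher than you might expect. After del a, the count reported for the list through b drops by 1.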
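Names are not the only things that hold references; containers do too. The sketch below (Python 3 syntax, with illustrative variable names) looks only at differences between counts, since the absolute numbers returned by getrefcount() depend on the interpreter version and on where the code runs.

```python
import sys

data = [1, 2, 3]
base = sys.getrefcount(data)          # includes the temporary argument reference

alias = data                          # a second name adds one reference
with_alias = sys.getrefcount(data)

holder = {'k': data}                  # storing it in a container adds another
in_container = sys.getrefcount(data)

del holder['k']                       # removing the entry drops that reference
del alias                             # deleting the name drops the other
after_cleanup = sys.getrefcount(data)

print(with_alias - base)      # 1
print(in_container - base)    # 2
print(after_cleanup - base)   # 0
```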

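【Garbage collection】

As a Python program creates more and more objects, they take up more and more memory. You need not worry too much about Python's figure, though: at a suitable moment it will obediently "go on a diet" and start garbage collection, clearing out objects that are no longer useful. Many languages have a garbage collection mechanism, Java and Ruby among them. The final goal is always a slim figure, but the diet plans of different languages differ considerably.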

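In principle, when the reference count of an object drops to 0, no reference points to it any more and the object becomes garbage to be reclaimed. For example, a newly created object is assigned to a reference, and its reference count becomes 1. If that reference is deleted, the count becomes 0 and the object can be reclaimed; CPython does so immediately, at the moment the count reaches 0.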
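One way to observe this is with a weak reference, which lets us check whether an object still exists without keeping it alive. This is a sketch in Python 3 syntax under the assumption that it runs on CPython; Node is a hypothetical helper class, needed because plain lists cannot be weakly referenced.

```python
import weakref

class Node(object):
    """Hypothetical helper class used only for this demonstration."""
    pass

obj = Node()
probe = weakref.ref(obj)      # a weak reference does not raise the count
alive_before = probe() is obj
print(alive_before)           # True

del obj                       # the last strong reference is gone
alive_after = probe() is not None
print(alive_after)            # False on CPython: the object was freed at once
```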

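However, dieting is expensive and laborious. While garbage collection is running, Python cannot do anything else, so frequent collections would greatly reduce its efficiency. If there are not many objects in memory, there is no need to collect all the time. Python therefore starts garbage collection automatically only under certain conditions. This automatic pass is the collector exposed by the gc module, which looks for garbage that reference counting alone cannot free, such as reference cycles. While it runs, Python keeps track of the number of object allocations and object deallocations, and collection starts only when the difference between the two exceeds a threshold.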

We can inspect this threshold with the get_threshold() function of the gc module:

import gc
print(gc.get_threshold())

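Result:

This returns (700, 10, 10). The two 10s are thresholds related to generational collection, which is discussed below. 700 is the threshold at which garbage collection starts. The thresholds can be changed with the set_threshold() function of gc.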
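A minimal sketch of reading and changing the thresholds, in Python 3 syntax. The value 1000 is an arbitrary choice for illustration, and the defaults printed on the first line can differ between Python versions, so no expected output is shown for them.

```python
import gc

original = gc.get_threshold()
print(original)                  # the defaults of the running interpreter

# Raise only the first threshold so that collection starts less often.
gc.set_threshold(1000, original[1], original[2])
raised = gc.get_threshold()
print(raised[0])                 # 1000

# Restore the previous settings so the rest of the program is unaffected.
gc.set_threshold(*original)
restored = gc.get_threshold()
```

Raising the first threshold trades a little extra memory for fewer collection pauses; whether that helps depends on the workload.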

We can also start garbage collection manually by calling gc.collect().
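A manual collection is easiest to see with a reference cycle, where two objects keep each other's count above zero even after every name is gone. The sketch below is in Python 3 syntax; Node and its partner attribute are hypothetical, and automatic collection is switched off during the demonstration so that it cannot interfere.

```python
import gc
import weakref

class Node(object):
    """Hypothetical helper class used only for this demonstration."""
    pass

was_enabled = gc.isenabled()
gc.disable()                          # keep automatic collection out of the way

a = Node()
b = Node()
a.partner = b                         # a refers to b ...
b.partner = a                         # ... and b refers back to a: a cycle
probe = weakref.ref(a)

del a, b                              # no names left, but the cycle remains
alive_before = probe() is not None

found = gc.collect()                  # number of unreachable objects found
alive_after = probe() is not None

if was_enabled:
    gc.enable()

print(alive_before, alive_after)      # True False
```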

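【Generational collection】

Python also uses a generational collection strategy. Its basic assumption is that the longer an object has survived, the less likely it is to become garbage later in the program. Our programs tend to create large numbers of objects; many of them appear and disappear quickly, while some stay in use for a long time. Out of both trust and efficiency, we assume that such "long-lived" objects are still useful and scan them less often during garbage collection.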

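Python divides all objects into three generations: 0, 1 and 2. Every newly created object belongs to generation 0. When an object survives a garbage collection of its generation, it is moved to the next generation. Whenever garbage collection starts, it always scans all generation 0 objects. Once generation 0 has been collected a certain number of times, a scan of generations 0 and 1 is started. Once generation 1 has in turn been collected a certain number of times, generations 0, 1 and 2, that is, all objects, are scanned.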
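The gc module exposes the generations directly. The following sketch (Python 3 syntax) reads the per-generation counters with gc.get_count() and collects a single generation by passing its number to gc.collect(); the actual numbers depend on what the interpreter has allocated so far, so no expected output is shown.

```python
import gc

counts = gc.get_count()        # current counters for generations 0, 1 and 2
print(counts)

found_young = gc.collect(0)    # scan generation 0 only
found_all = gc.collect()       # with no argument: a full collection
print(found_young, found_all)  # unreachable objects found in each pass
```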

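These two counts are the two 10s in the (700, 10, 10) returned by get_threshold() above. In other words, every 10 generation 0 collections are accompanied by 1 generation 1 collection, and only every 10 generation 1 collections are accompanied by 1 generation 2 collection.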
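The arithmetic can be checked with a toy model. The function below is a hypothetical helper, not part of the gc module, and it only counts collections the way the paragraph above describes them; the real collector's bookkeeping is more involved.

```python
def simulate_collections(gen0_runs, thresholds=(700, 10, 10)):
    """Count collections per generation in a simplified model of the text."""
    _, every_gen1, every_gen2 = thresholds
    runs = {0: 0, 1: 0, 2: 0}
    for n in range(1, gen0_runs + 1):
        runs[0] += 1                              # every pass scans generation 0
        if n % every_gen1 == 0:
            runs[1] += 1                          # every 10th pass adds generation 1
            if n % (every_gen1 * every_gen2) == 0:
                runs[2] += 1                      # every 100th pass adds generation 2
    return runs

result = simulate_collections(100)
print(result)   # {0: 100, 1: 10, 2: 1}
```

Under this model, 100 generation 0 collections lead to only one scan of the oldest generation, which is the point of treating long-lived objects separately.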