Hashable and immutable in Python

Hashing is the process of converting a large amount of data into a much smaller amount (typically a single integer) in a repeatable way, so that it can be looked up in a table in constant time (O(1)) on average. This matters for high-performance algorithms and data structures.

Immutability is the idea that an object will not change in some important way after it has been created, especially in any way that might change its hash value.

The two ideas are related because objects used as hash keys must typically be immutable so that their hash value does not change. If the hash were allowed to change, the object's location in a data structure such as a hash table would change as well, and the whole purpose of hashing for efficiency would be defeated.
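A minimal sketch of what goes wrong when a key's hash changes after insertion. The class `Key` is hypothetical and deliberately hashes on a mutable attribute, which real code should avoid; the failed lookup shown is what CPython's dict does in this situation.

```python
class Key:
    """Hypothetical key type that (unwisely) hashes on a mutable attribute."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Key) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


k = Key(1)
table = {k: "stored"}
found_before = k in table                       # True

k.value = 2                                     # mutation changes hash(k)
found_after = k in table                        # False in CPython: wrong bucket
still_inside = any(key is k for key in table)   # True: the entry was never removed

print(found_before, found_after, still_inside)
```

The entry is still physically in the dict, but it can no longer be reached by hashing, which is exactly the situation immutability of keys is meant to prevent.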


 hashable 

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
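As an illustration of the rule that equal objects must have equal hashes, here is a small sketch of a user-defined class that implements both methods consistently. The class `Point` and its attribute names are hypothetical, chosen only for this example.

```python
class Point:
    """Hypothetical value class: equal points must produce equal hashes."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self._x, self._y))


p1 = Point(1, 2)
p2 = Point(1, 2)

print(p1 == p2)                # True
print(hash(p1) == hash(p2))    # True
print(len({p1, p2}))           # 1: the set treats them as the same member
print({p1: "a"}[p2])           # a: an equal object finds the same dict entry
```

Hashing the same tuple of fields that `__eq__` compares is a common way to keep the two methods in agreement.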


Most of Python's immutable built-in objects are hashable (for example int, str, and tuple), while mutable containers (such as lists or dictionaries) are not. An immutable container such as a tuple is hashable only if all of its elements are hashable. Instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().
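A quick way to check which built-in objects are hashable is to call hash() and catch the TypeError raised for unhashable ones. The helper `is_hashable` below is a hypothetical name used only for this sketch.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable(42))              # True
print(is_hashable("text"))          # True
print(is_hashable((1, 2, 3)))       # True
print(is_hashable(frozenset({1})))  # True
print(is_hashable([1, 2, 3]))       # False
print(is_hashable({"k": 1}))        # False
print(is_hashable({1, 2}))          # False
print(is_hashable((1, [2, 3])))     # False: a tuple holding a list
```

The last line shows that a tuple is only as hashable as the elements it contains.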


The hash value of a built-in hashable object can be inspected with hash() or with its __hash__() method, for example hash(a) or a.__hash__(), as follows:

 

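a = 1
print(hash(a))  # equivalent to a.__hash__()
b = 1
print(hash(b))
print(id(a))
print(id(b))

Output (the id values differ between runs and machines):

1
1
1549492384
1549492384

a and b have the same hash value and the same id.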

c = 1.2
print(hash(c))
d = 1.2
print(hash(d))
print(id(c))
print(id(d))

Output:

461168601842738689
461168601842738689
93758832
93758688

c and d have the same hash value but different ids.

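Why do a and b share both their hash value and their id, while c and d share a hash value but not an id?

The interpreter applies a small optimization to small integers and some short strings: it allocates only one object per value (CPython caches the integers -5 through 256), so a and b above have the same id. The floats c and d do not qualify for this optimization, so each was given its own object and the ids differ. The hash, by contrast, is computed from the value, so equal values hash equally no matter how many objects hold them. Object identity for equal literals is an implementation detail and should not be relied on.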
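The difference between value-based hashing and object identity can be sketched as follows. The floats are built with float() at run time so that, in CPython, they are two separate objects; this identity behavior is an implementation detail, not a language guarantee.

```python
c = float("1.2")
d = float("1.2")   # built at run time, so c and d are separate objects in CPython

print(c == d)               # True
print(hash(c) == hash(d))   # True: the hash is computed from the value
print(c is d)               # False in CPython: two distinct objects

# Numbers that compare equal hash equally, even across types.
print(hash(1) == hash(1.0) == hash(True))   # True
```

Because 1, 1.0, and True compare equal and hash equally, they all address the same dictionary key.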

class A:
    pass

a = A()
b = A()
print(a)
print(b)
print(hash(a))
print(hash(b))
print(id(a))
print(id(b))

Output:

<__main__.A object at 0x0000000005C76630>
<__main__.A object at 0x0000000005C765F8>
6059619
-9223372036848716193
96953904
96953848

a and b differ in both hash value and id: each instance is a separate object, and its default hash is derived from its id.
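The default behavior of user-defined classes can be sketched like this. The class names `Plain` and `OnlyEq` are hypothetical. The second class shows a related rule of Python 3: a class that defines __eq__() without __hash__() has its __hash__ set to None and becomes unhashable.

```python
class Plain:
    """No __eq__ or __hash__: inherits identity-based defaults from object."""


class OnlyEq:
    """Defines __eq__ but not __hash__, so Python sets __hash__ to None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, OnlyEq) and self.value == other.value


x = Plain()
y = Plain()
print(x == x, x == y)        # True False
print(len({x, y}))           # 2: distinct instances are distinct set members

print(OnlyEq.__hash__ is None)   # True
try:
    hash(OnlyEq(1))
except TypeError as exc:
    print("unhashable:", exc)
```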


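Mutable is also described as mutable in place. "In place" means that no new object has to be allocated in memory, so every operation is carried out on the original object. The data types that are mutable in place include list, dict, and set. An example: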

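a = [1, 2, 3]
print(a)
print(id(a))
a.append(4)
print(a)
print(id(a))
a[0] = 0
print(a)
print(id(a))

Output:

[1, 2, 3]
96884360
[1, 2, 3, 4]
96884360
[0, 2, 3, 4]
96884360

The content of a changes but its id does not: the append and the item assignment are performed on the original object, and no new object is allocated for a.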
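One consequence of in-place mutation is that every name bound to the same list sees the change. The sketch below contrasts in-place operations with an operation that builds a new list; the variable name `alias` is just for illustration.

```python
a = [1, 2, 3]
alias = a                 # a second name for the same list object

a.append(4)               # in-place change, visible through both names
print(alias)              # [1, 2, 3, 4]
print(alias is a)         # True

a += [5]                  # list += extends the existing object in place
print(alias)              # [1, 2, 3, 4, 5]
print(alias is a)         # True

a = a + [6]               # + builds a new list and rebinds the name a
print(a)                  # [1, 2, 3, 4, 5, 6]
print(alias)              # [1, 2, 3, 4, 5]
print(alias is a)         # False
```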

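Immutable is also described as not mutable in place: any "change" to such an object requires a new object in memory. The data types that are not mutable in place include int, float, complex, str, and tuple. An example: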

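a = 1
print(a)
print(id(a))
a = 2
print(a)
print(id(a))

Output:

1
1549492384
2
1549492416

As soon as the value of a changes, its id changes too. The int object 1 is not modified; the name a is simply rebound to a different object. That is what "not mutable in place" means.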
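The same behavior can be sketched for tuples and strings: attempts to modify them in place fail, and operations that appear to change them actually produce new objects. The variable names are illustrative only.

```python
t = (1, 2, 3)
original = t

try:
    t[0] = 0                  # tuples do not support item assignment
except TypeError as exc:
    print("TypeError:", exc)

t += (4,)                     # builds a new tuple and rebinds t
print(t)                      # (1, 2, 3, 4)
print(original)               # (1, 2, 3)
print(t is original)          # False

s = "abc"
upper = s.upper()             # str methods return new strings
print(s, upper)               # abc ABC
```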

Only objects that are mutable in place provide mutating operations such as add() and append().
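This can be checked with hasattr() on a few built-in types. The sketch below looks only at three method names picked for illustration; it is not an exhaustive list of mutating methods.

```python
found = {}
for cls in (list, tuple, set, frozenset):
    # Record which of the sampled mutating methods each type provides.
    found[cls.__name__] = [
        name for name in ("append", "add", "remove") if hasattr(cls, name)
    ]
    print(cls.__name__, found[cls.__name__])

# list ['append', 'remove']
# tuple []
# set ['add', 'remove']
# frozenset []
```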

References:

https://blog.csdn.net/qq_25730711/article/details/53487350

https://www.cnblogs.com/chenzhaosu/articles/3506790.html

Reposted from www.cnblogs.com/salane/p/9289623.html