[Programming Basics] Python built-in module collections usage notes

collections is a module in the Python standard library. It provides additional container data types that complement Python's built-in types such as list, tuple, and dict. The main classes in the collections module are:

  • namedtuple: Named tuple, a tuple subclass whose elements can also be accessed by field name.
  • deque: A double-ended queue that can efficiently perform insertion and deletion operations on both ends.
  • Counter: Counter, used to count the number of occurrences of elements in an iterable object.
  • defaultdict: Default dictionary, similar to an ordinary dictionary, but returns a default value when accessing a non-existent key.
  • OrderedDict: Ordered dictionary, a dict subclass that remembers insertion order and provides methods for rearranging it.
  • ChainMap: A class that combines multiple dictionaries or maps.
  • UserList: A wrapper class for lists, used to create custom lists.
  • UserString: A wrapper class for strings, used to create custom strings.
  • UserDict: Wrapper class for dictionaries, used to create custom dictionaries.

This article introduces the basic usage of these classes so that you can make better use of the collections module when processing different kinds of data. For more detail, see the collections page of the official Python documentation.

1 namedtuple

A namedtuple is similar to a tuple, but you can specify a name for each element so that its elements can be referenced using their field names rather than relying solely on positional indexes.

The following code shows how to use namedtuple:

from collections import namedtuple

# Define a namedtuple type called Person with two fields, name and age
Person = namedtuple('Person', ['name', 'age'])
# Person = namedtuple('Person', 'name age')  # equivalent: field names as one space-separated string

# Create a Person instance
person1 = Person('Alice', 17)

# Access fields by name
print(person1.name) # Alice
print(person1.age) # 17

# Fields can also be accessed by index
print(person1[0]) # Alice
print(person1[1]) # 17

# namedtuple fields are immutable and cannot be assigned to directly
# person1.name = 'Bob'  # this line would raise AttributeError

# _replace returns a new named tuple with the given fields replaced
person2 = person1._replace(name='Bob')
print(person2) # Person(name='Bob', age=17)
# Print the field names
print(person2._fields) # ('name', 'age')
Alice
17
Alice
17
Person(name='Bob', age=17)
('name', 'age')
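
Besides _replace and _fields, namedtuple classes also offer _make and _asdict, and the factory accepts a defaults argument (Python 3.7 and later). The following is a minimal sketch of these helpers; the Point type and its fields are invented here purely for illustration.

```python
from collections import namedtuple

# Hypothetical type for illustration; defaults apply to the rightmost fields
Point = namedtuple('Point', ['x', 'y', 'z'], defaults=[0])

p1 = Point(1, 2)             # z falls back to its default
p2 = Point._make([3, 4, 5])  # build an instance from any iterable

print(p1)            # Point(x=1, y=2, z=0)
print(p2._asdict())  # {'x': 3, 'y': 4, 'z': 5} (a plain dict on Python 3.8+)

# Instances unpack like ordinary tuples
x, y, z = p2
print(x + y + z)     # 12
```

_asdict is a convenient bridge when a later step really does need a dictionary, for example to add keys dynamically.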

As the code above shows, a namedtuple resembles a dictionary in that values can be looked up by name, but the two differ considerably in implementation and use, so choose between them according to your needs. If the set of fields is fixed, their order matters, and memory efficiency is important, a namedtuple is a good fit. If you need to add, delete, and modify key-value pairs dynamically, or rely on the methods that dictionaries provide, a dictionary is the better choice. Compared with a dictionary, the advantages and disadvantages of namedtuple are as follows:

Advantages of namedtuple:

  • Fast access: a namedtuple stores its fields in a tuple, so access by integer index is very fast. Access by field name is also cheap, although it is not guaranteed to beat a dictionary lookup on every Python version.
  • High memory efficiency: a namedtuple uses the compact memory layout of a tuple and has no per-instance dictionary, so it takes less memory than a dictionary holding the same data.
  • Fixed field order: the order of the fields is specified when the type is defined and cannot change afterwards. This is useful for operations that depend on field order.

Disadvantages of namedtuple:

  • Immutability: namedtuple fields cannot be modified once the instance is created, whereas a dictionary lets you add, delete, and modify key-value pairs at any time.
  • Less flexible: dictionaries provide more built-in methods, such as get, update, and setdefault. A namedtuple is deliberately simpler and does not have these.
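
The immutability point can be seen directly: assigning to a field raises AttributeError, while _replace leaves the original untouched and returns a new tuple. The sketch below reuses the Person type from the earlier example; the ages dictionary is only there to illustrate a side benefit of immutability.

```python
from collections import namedtuple

Person = namedtuple('Person', ['name', 'age'])
alice = Person('Alice', 17)

try:
    alice.age = 18  # fields are read-only
except AttributeError as exc:
    error = type(exc).__name__
    print('cannot assign:', error)  # cannot assign: AttributeError

older = alice._replace(age=18)
print(alice)  # Person(name='Alice', age=17)
print(older)  # Person(name='Alice', age=18)

# Instances are hashable (when their fields are), so they can serve as dict keys
ages = {alice: 'original', older: 'updated'}
print(ages[Person('Alice', 18)])  # updated
```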

The following code compares the memory used by a namedtuple and an ordinary dictionary. Note that sys.getsizeof measures only the container itself, not the objects it refers to, and the exact numbers vary between Python versions:

import sys
from collections import namedtuple

# Create a dictionary
person_dict = {'age': 32, 'name': 'John Doe'}
print('Size of person_dict:', sys.getsizeof(person_dict))

# Convert the dictionary to a namedtuple
Person = namedtuple('Person', ['age', 'name'])
person_tuple = Person(**person_dict)
print('Size of person_tuple:', sys.getsizeof(person_tuple))
Size of person_dict: 248
Size of person_tuple: 72

2 deque

A deque (double-ended queue) combines the properties of a queue and a stack: elements can be added and removed quickly at both ends. A deque is similar to a list, but appending and popping at either end takes O(1) time, whereas inserting or removing at the front of a list takes O(n) time because the remaining elements must be shifted. The following code shows the use of deque.

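from collections import deque

# Create an empty deque
my_deque = deque()
# Create a deque with initial elements
my_deque = deque([1, 2, 3])
# Create a deque with a maximum length; once full, adding an item discards one from the opposite end
my_deque = deque([1, 2, 3], maxlen=5)

# Append an element on the right
my_deque.append(1)
# Append an element on the left
my_deque.appendleft(2)

# Remove and return the rightmost element
right_element = my_deque.pop()
# Remove and return the leftmost element
left_element = my_deque.popleft()

# Print all elements currently in the deque
print(my_deque) # deque([1, 2, 3], maxlen=5)
# Print the first element
print(my_deque[0]) # 1

# deque does not support slicing; convert it to a list first
# print(my_deque[:-1])
print(list(my_deque)[:-1]) # [1, 2]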
deque([1, 2, 3], maxlen=5)
1
[1, 2]
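
The maxlen behaviour is easiest to see once the deque is actually full. The sketch below keeps a sliding window of the most recent values; the readings are made-up numbers used only for illustration.

```python
from collections import deque

# Keep only the 3 most recent readings (hypothetical sensor values)
recent = deque(maxlen=3)
for reading in [10, 20, 30, 40, 50]:
    recent.append(reading)  # once full, the oldest item drops off the left

print(recent)                     # deque([30, 40, 50], maxlen=3)
print(sum(recent) / len(recent))  # 40.0

# appendleft on a full deque discards from the right end instead
recent.appendleft(0)
print(recent)                     # deque([0, 30, 40], maxlen=3)
```

A bounded deque is a simple way to keep "the last N items" without trimming a list by hand.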

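A deque can also be extended from any iterable, such as a list or a string, with extend and extendleft. Note that extendleft adds the items one at a time on the left, so they end up in reverse order: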

from collections import deque

# Create an empty deque
my_deque = deque()

# Add the elements of a list on the right with extend
my_deque.extend([1, 2, 3])
print(my_deque) # deque([1, 2, 3])

# Add the characters of a string on the left with extendleft
my_deque.extendleft("Hello")
print(my_deque) # deque(['o', 'l', 'l', 'e', 'H', 1, 2, 3])
deque([1, 2, 3])
deque(['o', 'l', 'l', 'e', 'H', 1, 2, 3])

Some other commonly used deque methods are shown below:

from collections import deque

# Create an empty deque
my_deque = deque()
# Extend on the left with "Hello": the string is split into characters, added one at a time
my_deque.extendleft("Hello")

# Print the length of the deque
print(len(my_deque)) # 5
# Count how many times "l" occurs in the deque
print(my_deque.count("l")) # 2

# Insert the string "123" at the left end (index 0)
my_deque.insert(0, "123")
print(my_deque) # deque(['123', 'o', 'l', 'l', 'e', 'H'])

# Rotate two steps to the right: the two rightmost elements move to the left end
# A negative argument rotates to the left instead
my_deque.rotate(2)
print(my_deque) # deque(['e', 'H', '123', 'o', 'l', 'l'])

# Reverse the deque in place
my_deque.reverse()
print(my_deque) # deque(['l', 'l', 'o', '123', 'H', 'e'])

# Remove all elements from the deque
my_deque.clear()
print(my_deque) # deque([])
5
2
deque(['123', 'o', 'l', 'l', 'e', 'H'])
deque(['e', 'H', '123', 'o', 'l', 'l'])
deque(['l', 'l', 'o', '123', 'H', 'e'])
deque([])
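
The earlier point that a deque outperforms a list for operations at the left end can be checked with a rough timing sketch. The size N and the helper function names are arbitrary choices for illustration, and the measured times depend on the machine, so no expected output is shown.

```python
from collections import deque
from timeit import timeit

N = 20_000  # arbitrary size chosen for illustration

def fill_list_left():
    items = []
    for i in range(N):
        items.insert(0, i)   # shifts every existing element: O(n) per call
    return items

def fill_deque_left():
    items = deque()
    for i in range(N):
        items.appendleft(i)  # O(1) per call
    return items

list_time = timeit(fill_list_left, number=1)
deque_time = timeit(fill_deque_left, number=1)
print(f'list.insert(0, x): {list_time:.4f} s')
print(f'deque.appendleft:  {deque_time:.4f} s')
```

Both functions build the same sequence, so any difference in the printed times comes from the container rather than from the work being done.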

3 Counter

Counter is used to count the number of occurrences of elements in iterable objects. These iterable objects can be lists, strings, tuples, etc.

The following code shows the use of Counter.

from collections import Counter

# Create a Counter that counts the elements of a list
print(Counter(['a','c','d','d','b','c','a'])) # Counter({'a': 2, 'c': 2, 'd': 2, 'b': 1})

# Create a Counter that counts the characters of a string
print(Counter('aabbacdd')) # Counter({'a': 3, 'b': 2, 'd': 2, 'c': 1})

# Keep the Counter in a variable for further use
string_count = Counter('aabbacdd')

# Convert the Counter to a dict and print each key-value pair
for char, count in dict(string_count).items():
    print(char, count)

# Iterate over the Counter's items directly and print each pair as a tuple
for item in string_count.items():
    print(item)
Counter({'a': 2, 'c': 2, 'd': 2, 'b': 1})
Counter({'a': 3, 'b': 2, 'd': 2, 'c': 1})
a 3
b 2
c 1
d 2
('a', 3)
('b', 2)
('c', 1)
('d', 2)

To count the occurrences of words and the occurrences of individual characters in a string separately, the code is as follows:

from collections import Counter

line = '你好 世界 你好 !'

# Split the string on whitespace into a list of words
list_of_words = line.split()
# Count how many times each word occurs
word_count = Counter(list_of_words)
# Print each word and its count
print(word_count) # Counter({'你好': 2, '世界': 1, '!': 1})

line = '你好 世界 你好 !'

# Count how many times each character occurs (spaces included)
string_count = Counter(line)
# Print each character and its count
print(string_count) # Counter({' ': 3, '你': 2, '好': 2, '世': 1, '界': 1, '!': 1})
Counter({'你好': 2, '世界': 1, '!': 1})
Counter({' ': 3, '你': 2, '好': 2, '世': 1, '界': 1, '!': 1})

Commonly used Counter methods are shown below:

from collections import Counter

# Create a Counter that counts how often each element occurs
word_count = Counter(['a', 'c', 'd', 'd', 'b', 'c', 'a'])

# Print the two most common elements
print(word_count.most_common(2))  # [('a', 2), ('c', 2)]
# Without an argument, all elements and their counts are listed
print(word_count.most_common())  # [('a', 2), ('c', 2), ('d', 2), ('b', 1)]

# elements() returns an iterator; the address shown differs on every run
print(word_count.elements()) # <itertools.chain object at 0x...>
# Convert the iterator to a list and print it
print(list(word_count.elements())) # ['a', 'a', 'c', 'c', 'd', 'd', 'b']
# Print the elements in sorted order
print(sorted(word_count.elements())) # ['a', 'a', 'b', 'c', 'c', 'd', 'd']
# Sorting the Counter itself sorts its keys
print(sorted(word_count)) # ['a', 'b', 'c', 'd']
# Print the Counter's keys (the elements)
print(word_count.keys()) # dict_keys(['a', 'c', 'd', 'b'])
# Print the Counter's values (the counts)
print(word_count.values()) # dict_values([2, 2, 2, 1])
[('a', 2), ('c', 2)]
[('a', 2), ('c', 2), ('d', 2), ('b', 1)]
<itertools.chain object at 0x7efe4809fcd0>
['a', 'a', 'c', 'c', 'd', 'd', 'b']
['a', 'a', 'b', 'c', 'c', 'd', 'd']
['a', 'b', 'c', 'd']
dict_keys(['a', 'c', 'd', 'b'])
dict_values([2, 2, 2, 1])

To operate on individual elements of a Counter, the code is as follows:

from collections import Counter

# Create a Counter that counts the elements of a list
word_count = Counter(['a', 'c', 'd', 'd', 'b', 'c', 'a'])

# Print the count for the letter "c"
print(word_count["c"]) # 2

# Update the Counter with more elements; their counts are added to the existing ones
word_count.update(['b', 'e'])
print(word_count) # Counter({'a': 2, 'c': 2, 'd': 2, 'b': 2, 'e': 1})

# Delete the element "e" from the Counter
del word_count["e"]
print(word_count) # Counter({'a': 2, 'c': 2, 'd': 2, 'b': 2})

# Increase the count for the letter "f" by 3
word_count['f'] += 3
print(word_count) # Counter({'f': 3, 'a': 2, 'c': 2, 'd': 2, 'b': 2})

# Intersection of two Counters (minimum of each count)
print(Counter('abc') & Counter('bde')) # Counter({'b': 1})
# Union of two Counters (maximum of each count)
print(Counter('abc') | Counter('bde')) # Counter({'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1})
2
Counter({'a': 2, 'c': 2, 'd': 2, 'b': 2, 'e': 1})
Counter({'a': 2, 'c': 2, 'd': 2, 'b': 2})
Counter({'f': 3, 'a': 2, 'c': 2, 'd': 2, 'b': 2})
Counter({'b': 1})
Counter({'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1})
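
In addition to & and |, Counter objects support + and -, and a missing key reads as 0 rather than raising KeyError. The sketch below illustrates this with made-up stock figures; the variable names are hypothetical.

```python
from collections import Counter

stock = Counter(apple=5, pear=2)
sold = Counter(apple=3, pear=4)

# A missing key counts as 0 and is not added to the Counter
print(stock['kiwi'])  # 0

# + adds counts; - subtracts and drops results that are zero or negative
print(stock + sold)   # Counter({'apple': 8, 'pear': 6})
print(stock - sold)   # Counter({'apple': 2})

# subtract() works in place and keeps negative counts
remaining = stock.copy()
remaining.subtract(sold)
print(remaining)      # Counter({'apple': 2, 'pear': -2})
```

Use the - operator when negative counts make no sense, and subtract() when a deficit should stay visible.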

4 defaultdict, OrderedDict

4.1 defaultdict

defaultdict is a subclass of dict. When you access a key that does not exist, it does not raise a KeyError exception. Instead it calls the factory function given at construction (int in the example below, which returns 0), stores the result under that key, and returns it. The usage of defaultdict is as follows:

from collections import defaultdict

# Create a defaultdict d whose default value is 0
d = defaultdict(int)

# Print the value for key 'a'; the key does not exist, so the default 0 is stored and returned
print(d['a']) # 0

# Set key 'b' to 2
d['b'] = 2
print(d) # defaultdict(<class 'int'>, {'a': 0, 'b': 2})

# Add 1 to the value for key 'c'
d['c'] += 1
print(d) # defaultdict(<class 'int'>, {'a': 0, 'b': 2, 'c': 1})
0
defaultdict(<class 'int'>, {'a': 0, 'b': 2})
defaultdict(<class 'int'>, {'a': 0, 'b': 2, 'c': 1})
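
A common use of defaultdict is grouping, with list as the factory. The sketch below also shows that only indexing with [] creates a missing key; the records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical data: (department, employee) pairs
records = [('sales', 'Ann'), ('dev', 'Bo'), ('sales', 'Cy'), ('dev', 'Di')]

by_dept = defaultdict(list)
for dept, name in records:
    by_dept[dept].append(name)  # a missing key starts out as an empty list

print(dict(by_dept))  # {'sales': ['Ann', 'Cy'], 'dev': ['Bo', 'Di']}

# Reading a missing key with [] inserts it; `in` and .get() do not
print('hr' in by_dept)    # False
print(by_dept.get('hr'))  # None
print(by_dept['hr'])      # []
print('hr' in by_dept)    # True
```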

4.2 OrderedDict

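OrderedDict is used much like an ordinary dictionary. Since Python 3.7 ordinary dictionaries also preserve insertion order, so the main differences are that OrderedDict provides methods for rearranging entries, such as move_to_end and popitem(last=False), and that comparing two OrderedDict objects takes order into account.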

from collections import OrderedDict

# Create an empty ordered dictionary; iteration follows the order in which items were added
order_dict = OrderedDict()

# Add key-value pairs
order_dict['apple'] = 3
order_dict['banana'] = 2
order_dict['orange'] = 5
print(order_dict) # OrderedDict([('apple', 3), ('banana', 2), ('orange', 5)])

# Create an ordinary dictionary named fruits
fruits = {'banana': 2, 'apple': 3, 'orange': 5}

# Sort the dictionary by key and convert it to an ordered dictionary
order_dict = OrderedDict(sorted(fruits.items(), key=lambda x: x[0]))
print(order_dict) # OrderedDict([('apple', 3), ('banana', 2), ('orange', 5)])

# Sort the dictionary by value and convert it to an ordered dictionary
order_dict = OrderedDict(sorted(fruits.items(), key=lambda x: x[1]))
print(order_dict) # OrderedDict([('banana', 2), ('apple', 3), ('orange', 5)])

# Sort the dictionary by key length and convert it to an ordered dictionary
order_dict = OrderedDict(sorted(fruits.items(), key=lambda x: len(x[0])))
print(order_dict) # OrderedDict([('apple', 3), ('banana', 2), ('orange', 5)])
OrderedDict([('apple', 3), ('banana', 2), ('orange', 5)])
OrderedDict([('apple', 3), ('banana', 2), ('orange', 5)])
OrderedDict([('banana', 2), ('apple', 3), ('orange', 5)])
OrderedDict([('apple', 3), ('banana', 2), ('orange', 5)])
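
Where OrderedDict still goes beyond a plain dict is in reordering entries with move_to_end and popitem(last=False). As a rough sketch of how these can be used, here is a tiny least-recently-used cache; the LRUCache class is a hypothetical helper written for this example, not something provided by collections.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny least-recently-used cache (hypothetical helper, not a library class)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

cache = LRUCache(capacity=2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')     # 'a' becomes the most recently used entry
cache.put('c', 3)  # evicts 'b'
print(list(cache.items))  # ['a', 'c']
```

For real caching of function results, functools.lru_cache in the standard library is usually the simpler choice.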

5 ChainMap

ChainMap groups several dictionaries or other mappings into a single view so that they can be operated on as a whole. The underlying mappings are referenced, not copied. The specific usage is as follows:

from collections import ChainMap

employee1 = {'John': '001', 'Mary': '002', 'David': '003'}
employee2 = {'Lisa': '004', 'Michael': '005', 'Sarah': '006'}
employee3 = {'Peter': '007', 'Emily': '008', 'Ryan': '009'}

# Create a ChainMap object
combined_employees = ChainMap(employee1, employee2, employee3)

# Print the list of underlying dictionaries, in the order they were passed in
print(combined_employees.maps)

# Print all keys; iteration starts from the last mapping and works back to the first
print(list(combined_employees.keys()))

# Print all values, in the same order as the keys
print(list(combined_employees.values()))
[{'John': '001', 'Mary': '002', 'David': '003'}, {'Lisa': '004', 'Michael': '005', 'Sarah': '006'}, {'Peter': '007', 'Emily': '008', 'Ryan': '009'}]
['Peter', 'Emily', 'Ryan', 'Lisa', 'Michael', 'Sarah', 'John', 'Mary', 'David']
['007', '008', '009', '004', '005', '006', '001', '002', '003']

If the same key appears in more than one mapping, lookups search the mappings in the order in which they were passed to ChainMap, and the first mapping that contains the key wins. In the example below, John appears in both dictionaries, and the value from the first dictionary is used:

from collections import ChainMap

# The key John appears in both dictionaries
employee1 = {'John': '001', 'Mary': '002'}
employee2 = {'Lisa': '004', 'John': '005'}

# Create a ChainMap object
combined_employees = ChainMap(employee1, employee2)

print(combined_employees.maps)
print(list(combined_employees.keys()))
print(list(combined_employees.values()))
[{'John': '001', 'Mary': '002'}, {'Lisa': '004', 'John': '005'}]
['Lisa', 'John', 'Mary']
['004', '001', '002']
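
The same lookup rule makes ChainMap handy for layering overrides on top of defaults. The sketch below also shows that writes only affect the first mapping and that later changes to a source dictionary remain visible; the settings names are made up for illustration.

```python
from collections import ChainMap

# Hypothetical settings: user overrides layered on top of defaults
defaults = {'theme': 'light', 'font_size': 12}
overrides = {'theme': 'dark'}
settings = ChainMap(overrides, defaults)

print(settings['theme'])      # dark (found in the first mapping)
print(settings['font_size'])  # 12 (falls through to defaults)

# Writes and deletions only touch the first mapping
settings['font_size'] = 14
print(overrides)  # {'theme': 'dark', 'font_size': 14}
print(defaults)   # {'theme': 'light', 'font_size': 12}

# ChainMap stores references, so later changes to a source dict are visible
defaults['language'] = 'en'
print(settings['language'])   # en
```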

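After creating a ChainMap object, you can add another dictionary with new_child. It returns a new ChainMap with the new dictionary placed first, so its entries take priority over the existing ones: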

from collections import ChainMap

employee1 = {'John': '001', 'Mary': '002', 'David': '003'}
employee2 = {'Mary': '004', 'Michael': '005', 'Sarah': '006'}
employee3 = {'Peter': '007', 'Emily': '008', 'Ryan': '009'}

combined_employees = ChainMap(employee1, employee2, employee3)

# Create the dictionary employee4, which holds more employee numbers
employee4 = {'Jack': '010', 'Halr': '011'}
# Use new_child to put employee4 in front of the existing mappings
combined_employees = combined_employees.new_child(employee4)

print(combined_employees)
ChainMap({'Jack': '010', 'Halr': '011'}, {'John': '001', 'Mary': '002', 'David': '003'}, {'Mary': '004', 'Michael': '005', 'Sarah': '006'}, {'Peter': '007', 'Emily': '008', 'Ryan': '009'})
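
Because new_child puts its mapping in front, it can model nested scopes, and the parents property steps back out again. A minimal sketch, with scope names chosen only for illustration:

```python
from collections import ChainMap

global_scope = ChainMap({'x': 1, 'y': 2})

# new_child() with no argument pushes an empty dict in front
local_scope = global_scope.new_child()
local_scope['x'] = 100  # shadows the outer 'x' without changing it

print(local_scope['x'])   # 100
print(local_scope['y'])   # 2
print(global_scope['x'])  # 1

# parents is a new ChainMap without the first mapping
print(local_scope.parents)  # ChainMap({'x': 1, 'y': 2})
```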

6 UserList, UserString, UserDict

6.1 UserList

UserList is a wrapper class for list, used to create a custom list class. As shown below, UserList can be operated like a normal list:

from collections import UserList

# Create an ordinary Python list
my_list = [13, 4, 1, 5, 7]

# Pass the list to the UserList constructor to create a wrapper object
# Its contents are copied into an ordinary list available as the data attribute
user_list = UserList(my_list)

# Print the UserList object
print(user_list) # [13, 4, 1, 5, 7]

# Print the underlying Python list
print(user_list.data) # [13, 4, 1, 5, 7]

print(user_list[:-1]) # [13, 4, 1, 5]
[13, 4, 1, 5, 7]
[13, 4, 1, 5, 7]
[13, 4, 1, 5]

The advantage of UserList is that you can create a subclass that inherits from UserList to customize various methods of the list. Here is a simple example that overrides the append method:

from collections import UserList

class MyList(UserList):
    def __init__(self, initialdata=None):
        super().__init__(initialdata)

    def append(self, item):
        # Print a message whenever an element is appended
        print("Appending", item)
        super().append(item)


# Create a MyList object and append an element
my_list = MyList([1, 2, 3])
my_list.append(4)
print(my_list) # [1, 2, 3, 4]
Appending 4
[1, 2, 3, 4]
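
The same pattern can be used to validate items. The sketch below defines a hypothetical IntList class that rejects non-integers in append and insert; other entry points such as the constructor and extend are left unchecked to keep it short. It also shows that concatenation and, on Python 3.8 and later, slicing return the subclass rather than a plain list.

```python
from collections import UserList

class IntList(UserList):
    """List that only accepts integers (hypothetical example class)."""

    def _check(self, item):
        if not isinstance(item, int):
            raise TypeError(f'expected int, got {type(item).__name__}')
        return item

    def append(self, item):
        super().append(self._check(item))

    def insert(self, index, item):
        super().insert(index, self._check(item))

numbers = IntList([1, 2, 3])
numbers.append(4)

try:
    numbers.append('five')
except TypeError as exc:
    print(exc)  # expected int, got str

# Concatenation and slicing return the subclass, not a plain list
print(type(numbers + [5]).__name__)  # IntList
print(type(numbers[:2]).__name__)    # IntList
```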

6.2 UserString

UserString is used to create custom string classes. By subclassing UserString you can add your own methods alongside the usual string methods. The underlying str is kept in the data attribute, so assigning a new value to data changes the contents of the object in place. For example:

from collections import UserString

# Custom class user_string, which inherits from UserString
class user_string(UserString):

    # append adds text to the end of the string
    def append(self, new):
        self.data = self.data + new

    # remove deletes every occurrence of the given substring
    def remove(self, s):
        self.data = self.data.replace(s, "")

text = 'dog cat lion elephant'

animals = user_string(text)
animals.append("monkey")

for word in ['cat', 'elephant']:
    animals.remove(word)
print(animals) # dog  lion monkey
dog  lion monkey

6.3 UserDict

UserDict is a wrapper class for dictionaries, used to create custom dictionary classes. By inheriting from UserDict, you can create a custom dictionary object. For example:

from collections import UserDict

class MyDict(UserDict):
    def __init__(self, initialdata=None):
        super().__init__(initialdata)

    def __setitem__(self, key, value):
        # Convert every key to upper case when a key-value pair is set
        super().__setitem__(key.upper(), value)

# Create a custom dictionary object
my_dict = MyDict()

# Add key-value pairs
my_dict['name'] = 'Alice'
my_dict['age'] = 25

# Print the dictionary contents
print(my_dict) # {'NAME': 'Alice', 'AGE': 25}

{'NAME': 'Alice', 'AGE': 25}
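
One reason to prefer UserDict over subclassing dict directly is that the built-in dict constructor and update method bypass an overridden __setitem__, whereas UserDict routes them through it. The comparison below is a sketch based on CPython behaviour; both class names are hypothetical.

```python
from collections import UserDict

class UpperDict(dict):
    """Subclass of the built-in dict (hypothetical example class)."""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

class UpperUserDict(UserDict):
    """Same override, built on UserDict."""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

plain = UpperDict({'name': 'Alice'})
plain.update(age=25)
print(plain)    # {'name': 'Alice', 'age': 25}, the override was bypassed

wrapped = UpperUserDict({'name': 'Alice'})
wrapped.update(age=25)
print(wrapped)  # {'NAME': 'Alice', 'AGE': 25}, the override was used
```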

Origin blog.csdn.net/LuohenYJ/article/details/132697134