Dictionaries and Sets

1. Handling missing keys with setdefault

import sys
import re

WORD_RE = re.compile(r'\w+')   # raw string: avoids the invalid "\w" escape warning

index = {}

print(sys.argv)

# Example 3-2
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
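            # finditer yields match objects such as <re.Match object; span=(0, 4), match='User'>
            # (shown as _sre.SRE_Match before Python 3.7); each one carries both the matched text
            # and its position: match.start() and match.end() give the start and end offsets
            word = match.group()
            # match.group() returns the matched text, e.g. User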
            column_no = match.start() + 1
            location = (line_no, column_no)

            # The conventional approach: fetch the list (or a new empty one), append, store it back
            occurrences = index.get(word, [])
            occurrences.append(location)
            index[word] = occurrences

for word in sorted(index, key=str.upper):   # iterate over the keys, sorted case-insensitively
    print(word, index[word])

print("-----------------------")

# Example 3-4: handling missing keys with setdefault
index2 = {}

with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # Handle missing keys with setdefault
            index2.setdefault(word, []).append(location)
            # setdefault: use the existing value if the key is present, otherwise insert the default.
            # Get the list of occurrences for word, or set it to [] if not found;
            # setdefault returns the value, so it can be updated without requiring a second search.

for word in sorted(index2, key=str.upper):
    print(word, index2[word])

# Sample output (depends on the input file passed as sys.argv[1]):
# flasgger [(3, 6), (4, 6)]
# flask [(2, 6)]
# Flask [(2, 19)]
# from [(2, 1), (3, 1), (4, 1)]
# import [(1, 1), (2, 12), (3, 15), (4, 21)]
# jsonify [(2, 26)]
# random [(1, 8)]
# request [(2, 35)]
# Swagger [(3, 22)]
# swag_from [(4, 28)]
# utils [(4, 15)]

"""
The result of this line ...
    my_dict.setdefault(key, []).append(new_value)
... is the same as running ...
    if key not in my_dict:
        my_dict[key] = []
    my_dict[key].append(new_value)
... except that the latter code performs at least two searches for key (three if it is not found), while setdefault
does it all with a single lookup.
"""

2. Mappings with Flexible Key Lookup

2.1 defaultdict: Another Take on Missing Keys

Example code:

import re
import sys
import collections

WORD_RE = re.compile(r'\w+')   # raw string: avoids the invalid "\w" escape warning

index = collections.defaultdict(list)

with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)

            # defaultdict: a missing word automatically gets a new empty list
            index[word].append(location)

for word in sorted(index, key=str.upper):
    print(word, index[word])

# Output (same input file as above):
# flasgger [(3, 6), (4, 6)]
# flask [(2, 6)]
# Flask [(2, 19)]
# from [(2, 1), (3, 1), (4, 1)]
# import [(1, 1), (2, 12), (3, 15), (4, 21)]
# jsonify [(2, 26)]
# random [(1, 8)]
# request [(2, 35)]
# Swagger [(3, 22)]
# swag_from [(4, 28)]
# utils [(4, 15)]

"""
How defaultdict works:
    When instantiating a defaultdict, you provide a callable that is used to produce a default value whenever
    __getitem__ is passed a nonexistent key argument.
    For example, given an empty defaultdict created as dd = defaultdict(list), if 'new_key' is not in dd, the 
    expression dd['new_key'] does the following steps:
        1. Calls list() to create a new list.
        2. Inserts the list into dd using 'new_key' as key.
        3. Returns a reference to that list.
        
The callable that produces the default values is held in an instance attribute called default_factory.
If no default_factory is provided, the usual KeyError is raised for missing keys.

The default_factory of a defaultdict is only invoked to provide default values for __getitem__ calls, and not for the
other methods. For example, if dd is a defaultdict, and k is a missing key, dd[k] will call the default_factory to 
create a default value, but dd.get(k) still returns None.

The mechanism that makes defaultdict work by calling default_factory is actually the __missing__ special method, a
feature supported by all standard mapping types.
"""

2.2 The __missing__ Method

Example code:

""" StrKeyDict0 converts nonstring keys to str on lookup """


class StrKeyDict0(dict):

    def __missing__(self, key):
        # Without this check, a missing str key would make self[str(key)] call
        # __missing__ again with the same key, recursing forever.
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

    def get(self, key, default=None):
        """
        The get method delegates to __getitem__ by using the self[key] notation; that gives the opportunity for
        our __missing__ to act.
        :param key:
        :param default:
        :return:
        """
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        # We cannot write `key in self` here: self is the StrKeyDict0 instance (a dict),
        # and `k in d` calls __contains__, so it would recurse into this method forever.
        return key in self.keys() or str(key) in self.keys()

# A better way to create a user-defined mapping type is to subclass collections.UserDict instead of dict.
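A sketch of the UserDict approach mentioned in the comment, closely modeled on the StrKeyDict example from Fluent Python (treat it as an illustration, not a canonical implementation). UserDict keeps its contents in a plain dict attribute named data, so the methods can consult self.data directly instead of risking recursion, and converting keys in __setitem__ means every stored key is a str.

```python
import collections


class StrKeyDict(collections.UserDict):
    """Stores every key as a str, so lookups work with str or non-str keys."""

    def __missing__(self, key):
        if isinstance(key, str):      # a missing str key is really missing
            raise KeyError(key)
        return self[str(key)]         # retry the lookup with the str form

    def __contains__(self, key):
        # self.data is the underlying plain dict, and all its keys are str
        return str(key) in self.data

    def __setitem__(self, key, item):
        # convert keys on insertion (also used by update() and the constructor)
        self.data[str(key)] = item


d = StrKeyDict([(2, 'two'), ('4', 'four')])
print(d['2'])            # two
print(d[4])              # four
print(2 in d)            # True
print(d.get(1, 'N/A'))   # N/A
print(d.data)            # {'2': 'two', '4': 'four'}
```

Compared with StrKeyDict0, no custom get is needed here: the get inherited through UserDict ends up going through __contains__ or __getitem__, both of which already handle non-str keys.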


"""
Underlying the way mappings deal with missing keys is the aptly named __missing__ method. This method is not defined in
the base dict class, but dict is aware of it: if you subclass dict and provide a __missing__ method, the standard 
dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError.

The __missing__ method is called only by __getitem__ (i.e., for the d[k] operator). The presence of a __missing__ method
has no effect on the behavior of other methods that look up keys, such as get or __contains__ .
"""

Summary: there are three ways to handle keys that are missing from a dict: 1. setdefault  2. collections.defaultdict  3. the __missing__ method
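The three approaches can be put side by side. This sketch builds the same word index three ways from hypothetical in-memory lines (instead of a file named in sys.argv[1]); the helper locations and the class ListDict are illustrative names, not from the original.

```python
import collections
import re

WORD_RE = re.compile(r'\w+')

# Hypothetical sample text, used instead of reading a file
LINES = ['import random', 'from random import choice']


def locations(lines):
    """Yield (word, (line_no, column_no)) pairs, like the scripts above."""
    for line_no, line in enumerate(lines, 1):
        for match in WORD_RE.finditer(line):
            yield match.group(), (line_no, match.start() + 1)


# 1. setdefault on a plain dict
index1 = {}
for word, location in locations(LINES):
    index1.setdefault(word, []).append(location)

# 2. collections.defaultdict
index2 = collections.defaultdict(list)
for word, location in locations(LINES):
    index2[word].append(location)


# 3. a dict subclass with __missing__ (hypothetical name ListDict)
class ListDict(dict):
    def __missing__(self, key):
        value = self[key] = []   # store a new list under the key, then return it
        return value


index3 = ListDict()
for word, location in locations(LINES):
    index3[word].append(location)

print(index1 == index2 == index3)  # True
print(index1['random'])            # [(1, 8), (2, 6)]
```

Unlike the DefaultZero example, this __missing__ inserts the new list before returning it; without that step, the appended locations would be lost.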

Reposted from www.cnblogs.com/neozheng/p/12180100.html