[Python tips] It is time to replace dict with defaultdict and Counter

Disclaimer: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/lc013/article/details/91813812

When we use a dict, we usually have to check whether a key exists: if it does not, we set a default value, and if it does, we do something else. That pattern takes several lines of code. Is there a more compact way to write it that does not sacrifice readability? After all, as programmers we all want code that is clean, usable, and efficient.

Today I came across an article whose author shows that using defaultdict and Counter in place of a plain dictionary can produce more concise and more readable code. This post is a loose translation of that article, followed by a brief introduction to the two data types.

Original article:

https://towardsdatascience.com/python-pro-tip-start-using-python-defaultdict-and-counter-in-place-of-dictionary-d1922513f747

For more on dictionaries, see my earlier post on Python basics (part 2: basic syntax and variable types).

Contents:

  • Counter and defaultdict
  • Why use defaultdict?
  • defaultdict definition and use
  • The definition and use of Counter

Learning a programming language is not hard. Whenever I pick up a new one, I work through the following topics in order, and after that it is easy to start writing code in the new language:

  • Data types and operators: +, -, int, float, str
  • Conditional statements: if, else, case, switch
  • Loops: for, while
  • Data structures: list, array, dict, hashmap
  • Function definitions

But being able to write code and being able to write elegant, efficient code are two different things, and every language has its own idioms.

As a result, newcomers to a language tend to write more code than necessary. A Java developer who has just learned Python, for example, might sum a list of numbers like this:

x=[1,2,3,4,5]
sum_x = 0
for i in range(len(x)):
    sum_x+=x[i]

A Python veteran would simply write:

sum_x = sum(x)

So I am starting a series of posts called "Python Shorts" that introduces simple, useful concepts Python provides, along with tips and usage examples. The goal of the series is efficient, readable code.

Counter and defaultdict

The example here counts how many times each word occurs in a text, say "Hamlet". How should we do it?

Python offers several ways to do the same thing, but only one of them strikes me as elegant. First, the naive implementation with the built-in dict type.

The code is as follows:

# count the number of word occurrences in a piece of text
text = "I need to count the number of word occurrences in a piece of text. How could I do that? " \
       "Python provides us with multiple ways to do the same thing. But only one way I find beautiful."

word_count_dict = {}
for w in text.split(" "):
    if w in word_count_dict:
        word_count_dict[w] += 1
    else:
        word_count_dict[w] = 1
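
Before reaching for the collections module, it is worth noting that dict.get() can already fold the if/else into one line. A minimal sketch, using a made-up sentence rather than the text above:

```python
# Word count with a plain dict, using dict.get() to supply the default.
text = "the quick fox jumps over the lazy dog the end"  # sample text (assumption)

word_count_dict = {}
for w in text.split(" "):
    # get() returns 0 when w has not been seen yet, so no if/else is needed
    word_count_dict[w] = word_count_dict.get(w, 0) + 1

print(word_count_dict["the"])  # 3
```

This still repeats the key on both sides of the assignment, which is the repetition the collections types remove.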

We can use defaultdict to cut the number of lines:

from collections import defaultdict
word_count_dict = defaultdict(int)
for w in text.split(" "):
    word_count_dict[w] += 1

Counter works as well:

from collections import Counter
word_count_dict = Counter()
for w in text.split(" "):
    word_count_dict[w] += 1

Counter also offers an even more concise form:

word_counter = Counter(text.split(" "))

Counter is, as the name says, a counter: it tallies how many times each element occurs in a given object. That means we can also ask for the most frequent words:

print('most common word: ', word_count_dict.most_common(10))

Output (the order of words with equal counts may differ between Python versions):

most common word:  [('I', 3), ('the', 2), ('of', 2), ('do', 2), ('to', 2), ('multiple', 1), ('in', 1), ('way', 1), ('us', 1), ('occurrences', 1)]
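
One thing split(" ") glosses over is that punctuation stays attached to the words and case is preserved, so "text." and "text" would be counted separately. Here is a rough sketch of normalizing first. The sample sentence and the regular expression are my own assumptions about what counts as a word, not part of the original article:

```python
import re
from collections import Counter

text = "I need to count words. How could I do that? I count them."  # sample text (assumption)

# split(" ") keeps punctuation attached, so "words." and "that?" become keys
raw = Counter(text.split(" "))

# Assumption: a word is a run of letters or apostrophes, compared case-insensitively
words = re.findall(r"[a-z']+", text.lower())
clean = Counter(words)

print("words." in raw, "words" in raw)  # True False
print(clean["i"], clean["count"])       # 3 2
```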

A few more examples:

# Count Characters
print(Counter('abccccccddddd'))  
# Count List elements
print(Counter([1, 2, 3, 4, 5, 1, 2]))  

Output:

Counter({'c': 6, 'd': 5, 'a': 1, 'b': 1})
Counter({1: 2, 2: 2, 3: 1, 4: 1, 5: 1})

Why use defaultdict?

Since Counter is so convenient, can we just use Counter for everything? Of course not. Counter exists for counting, so its values are meant to be integers. If the values we need are strings, lists, or tuples, Counter is no longer the right tool.

This is where defaultdict comes in handy. Its biggest difference from dict is that it supplies a default value even when the key does not exist. For example:

s = [('color', 'blue'), ('color', 'orange'), ('color', 'yellow'), ('fruit', 'banana'), ('fruit', 'orange'),
     ('fruit', 'banana')]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)
print(d)  

Output:

defaultdict(<class 'list'>, {'color': ['blue', 'orange', 'yellow'], 'fruit': ['banana', 'orange', 'banana']})

Here every missing key starts out with an empty list as its value. If we pass set instead:

s = [('color', 'blue'), ('color', 'orange'), ('color', 'yellow'), ('fruit', 'banana'), ('fruit', 'orange'),
     ('fruit', 'banana')]
d = defaultdict(set)
for k, v in s:
    d[k].add(v)
print(d)

Output:

defaultdict(<class 'set'>, {'color': {'blue', 'yellow', 'orange'}, 'fruit': {'banana', 'orange'}})

Note that lists and sets add elements with different methods: list.append() for a list and set.add() for a set.
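
One detail about the default value is easy to miss: it is created and stored the first time a missing key is looked up with square brackets, while get() and the in operator do not trigger it. A small sketch to illustrate, with key names made up for the example:

```python
from collections import defaultdict

plain = {}
try:
    plain["missing"]
except KeyError:
    print("a plain dict raises KeyError")

d = defaultdict(int)
value = d["missing"]       # calls int(), which returns 0, and stores the result
print(value, dict(d))      # 0 {'missing': 0}

# get() and `in` do not create the default
d2 = defaultdict(list)
print(d2.get("x"), "x" in d2, len(d2))  # None False 0
```

So merely reading d[key] is enough to grow a defaultdict, which matters if you later iterate over its keys.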


The rest of this post adds the definitions and methods of these two data types, based mainly on the official documentation.

defaultdict definition and use

The official documentation describes defaultdict as follows:

class collections.defaultdict([default_factory[, …]])

It returns a new dictionary-like object. defaultdict is a subclass of the built-in dict class. It overrides one method and adds one writable instance variable. The remaining functionality is the same as for dict and is not repeated here.

The first argument provides the initial value of the default_factory attribute; it defaults to None. All remaining arguments are treated as if they were passed to the dict constructor, including keyword arguments.
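
To make the two kinds of arguments concrete, here is a short sketch. The key names and values are invented for illustration:

```python
from collections import defaultdict

# Any zero-argument callable can serve as default_factory
unknown = defaultdict(lambda: "n/a")
print(unknown["nickname"])          # n/a

# The remaining arguments are handed to the dict constructor
stock = defaultdict(int, apples=3, pears=1)
stock["kiwis"] += 2
print(dict(stock))                  # {'apples': 3, 'pears': 1, 'kiwis': 2}

# With default_factory left as None, a missing key raises KeyError
strict = defaultdict()
try:
    strict["x"]
    raised = False
except KeyError:
    raised = True
print(raised)                       # True
```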

dict has a setdefault() method that achieves the same thing with fairly simple code:

s = [('color', 'blue'), ('color', 'orange'), ('color', 'yellow'), ('fruit', 'banana'), ('fruit', 'orange'),('fruit', 'banana')]
a = dict()
for k, v in s:
    a.setdefault(k, []).append(v)
print(a)

However, the official documentation notes that the defaultdict approach is simpler and faster than the equivalent one using dict.setdefault().
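
The "one method" that defaultdict overrides is __missing__, and the "one writable instance variable" is default_factory. The sketch below mimics that behavior with a plain dict subclass. MyDefaultDict is a hypothetical, simplified stand-in for illustration only, not the real implementation:

```python
from collections import defaultdict

class MyDefaultDict(dict):
    """Hypothetical, simplified stand-in for defaultdict (illustration only)."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory  # the writable attribute

    def __missing__(self, key):
        # dict.__getitem__ calls this hook when the key is absent
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

mine = MyDefaultDict(list)
mine["color"].append("blue")

real = defaultdict(list)
real["color"].append("blue")

print(mine == real)  # True
```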

The definition and use of Counter

The Chinese edition of the official documentation describes it as:

class collections.Counter([iterable-or-mapping])

A Counter is a dict subclass for counting hashable objects. It is a collection in which elements are stored as dictionary keys and their counts are stored as the values. Counts may be any integer value, including zero and negative numbers. The Counter class is similar to the bags or multisets of other languages.
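
A few lines are enough to see these properties in action. This is a small sketch with made-up data:

```python
from collections import Counter

c = Counter(["red", "blue", "red"])
print(c["red"])      # 2
print(c["green"])    # 0, a missing element counts as zero instead of raising KeyError
print("green" in c)  # False, the lookup above did not add a key

c["blue"] -= 3       # counts may drop to zero or below
print(c["blue"])     # -2
```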

Note that the argument passed to Counter does not itself have to be hashable. It is an iterable or a mapping, and a list, for example, is not hashable; only the elements being counted must be. Whether an object is hashable comes down to whether its type implements the __hash__ method:

a = (2, 1)
a.__hash__()

Output (the exact value depends on the Python version and build):

3713082714465905806

For a list:

b=[1,2]
b.__hash__()

Error:

TypeError: 'NoneType' object is not callable

As mentioned in an earlier post, you can also call the built-in hash() function to check whether a value can be hashed. The hashable built-in types are generally the immutable ones.
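
A check based on hash() could be wrapped up as follows. is_hashable is a hypothetical helper name of my own, not a standard library function:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((2, 1)))               # True
print(is_hashable([1, 2]))               # False
print(is_hashable((1, [2])))             # False, a tuple holding a list is not hashable
print(is_hashable(frozenset({1, 2})))    # True
```

The third case shows why "immutable" is only a rule of thumb: a tuple is immutable, but it is hashable only if everything inside it is.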

A Counter can also be initialized with keyword arguments:

c = Counter(cats=4, dogs=8)
print(c)

Output:

Counter({'dogs': 8, 'cats': 4})

Besides most_common(), introduced above, Counter has these methods:

  • elements(): returns an iterator that repeats each element as many times as its count. Before Python 3.7 the order is arbitrary; from 3.7 on, elements come out in the order first encountered. Elements whose count is less than 1 are ignored. For example:
c = Counter(a=4, b=2, c=0, d=-2)
sorted(c.elements())
# ['a', 'a', 'a', 'a', 'b', 'b']
  • subtract(): subtracts the counts of another iterable or mapping. Both the inputs and the resulting counts may be zero or negative:
c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)
c.subtract(d)
print(c)
# Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
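
The counterpart of subtract() is update(), which adds counts instead of replacing values the way dict.update() does. A brief sketch with invented data:

```python
from collections import Counter

c = Counter(a=4, b=2)
c.update("aab")          # count elements from an iterable: a +2, b +1
c.update({"b": 10})      # add counts from a mapping: b +10
print(c)                 # Counter({'b': 13, 'a': 6})
```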

In addition, the following patterns are commonly used:

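# total of all counts
sum(c.values())
# reset all counts
c.clear()
# list the unique elements
list(c)
# convert to a set
set(c)
# convert to a regular dict
dict(c)
# (element, count) pairs
c.items()
# build a Counter from a list of (element, count) pairs
Counter(dict(list_of_pairs))
# the n least common elements
c.most_common()[:-n-1:-1]
# keep only the positive counts
+c
# keep only the negative counts, with the sign flipped
-c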
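
The listing above is a collection of patterns rather than a script, since n and list_of_pairs are left undefined. Here is a runnable sketch of some of them, with values I picked for illustration and assuming Python 3.7 or later for the ordering:

```python
from collections import Counter

c = Counter(a=3, b=1, c=0, d=-2)
n = 2                                   # hypothetical choice of n

print(sum(c.values()))                  # 2
print(list(c))                          # ['a', 'b', 'c', 'd']
print(c.most_common()[:-n-1:-1])        # [('d', -2), ('c', 0)]
print(+c)                               # Counter({'a': 3, 'b': 1})
print(-c)                               # Counter({'d': 2})

list_of_pairs = [("x", 2), ("y", 5)]    # hypothetical data
pairs_counter = Counter(dict(list_of_pairs))
print(pairs_counter)                    # Counter({'y': 5, 'x': 2})
```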

Counter also supports the operators +, -, & and |, each with its own meaning:

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)
# addition: c[x] + d[x]
print(c + d)    # Counter({'a': 4, 'b': 3})
# subtraction, keeping only positive counts
print(c - d)    # Counter({'a': 2})
# intersection: min(c[x], d[x])
print(c & d)    # Counter({'a': 1, 'b': 1})
# union: max(c[x], d[x])
print(c | d)    # Counter({'a': 3, 'b': 2})
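
Because subtraction drops zero and negative counts, these operators make multiset checks very short. As one possible use, the sketch below tests whether a word can be spelled from a pool of letters. can_spell is a hypothetical helper of my own, not part of the original article:

```python
from collections import Counter

def can_spell(word, letters):
    """Hypothetical helper: True if `letters` contains every character of `word`."""
    # Subtraction keeps only positive counts, so anything left over is missing
    missing = Counter(word) - Counter(letters)
    return not missing

print(can_spell("banana", "aaabnnxyz"))  # True
print(can_spell("banana", "abn"))        # False
```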


Summary

If you need to count occurrences, of words for instance, Counter is a good choice: it is simple and highly readable. If the values you need to store are not integers but all share one type, such as lists, then defaultdict is a better choice than a plain dict.

Finally, the code examples in this article have been uploaded to GitHub:

https://github.com/ccc013/Python_Notes/blob/master/Python_tips/defaultdict_and_counter.ipynb
