Xiaobai's Python Study Notes (9): A Deep Dive into itertools, Packed with Useful Content (Part 1)

Foreword

Today I'd like to share my own experience learning itertools. itertools is a library that ships with Python and contains a variety of very practical functions; even a basic understanding of it can greatly improve your productivity.

First, a word about the reference material: I used the official Python 3.7 documentation, itertools — Functions creating iterators for efficient looping. Those who are interested can go read it. Unfortunately there is no Chinese version, which I have to complain about: there are Japanese and Korean translations, so why hasn't a Chinese one kept up?

Getting down to business: I think itertools is the coolest thing in Python 3!

If you have never heard of it, you are missing out on one of the greatest hidden treasures of the Python 3 standard library. Yes, I have already moved on from the collections module I shared just recently in Xiaobai's Python Study Notes (7): The Magical Treasure Chest Collections. After all, as the saying goes, men are fickle.

There are many excellent online resources for learning the functions in the itertools module, but I think the official documentation is always a good starting point. This article is essentially organized around it.

My overall feeling after studying it: a rough understanding of the functions itertools contains is enough; there is no need to obsess over each one. The real power lies in combining these functions to create code that is fast, extremely memory-efficient, beautiful, and elegant.

In this long article I will walk through my learning process as comprehensively as I can, down to every detail. Before we start, friends who are not yet quite sure what iterators and generators are may want to do a bit of background reading on them first.

The Magic of itertools

Sit tight and hold on, the train is about to depart. According to the official documentation's definition:

This module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML. Each has been recast in a form suitable for Python.

Roughly translated: this module implements a number of iterator building blocks, inspired by constructs from APL, Haskell, and SML, each recast in a form suitable for Python, and they can improve efficiency, and so on.

What this mainly means is that the functions in itertools "operate" on iterators to produce more complex iterators. For example, consider the built-in zip() function, which takes any number of iterables as arguments and returns an iterator over tuples of their corresponding elements:


print(list(zip([1, 2, 3], ['a', 'b', 'c'])))
Out:[(1, 'a'), (2, 'b'), (3, 'c')]


So how exactly does zip work here?

Like any other list, [1, 2, 3] and ['a', 'b', 'c'] are iterable, which means they can return their elements one at a time. Technically, any Python object that implements either of the following methods:

  • .__iter__()

  • or .__getitem__()

is iterable. If you have any doubts about this, go back to the iterator and generator background mentioned in the foreword.
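
For instance, here is a tiny made-up class (Countdown is my own illustrative example, not something from the docs) that becomes iterable simply by implementing .__iter__():

class Countdown:
    """A made-up example: implementing .__iter__() is enough to make an object iterable."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1

print(list(Countdown(3)))              # [3, 2, 1]
print(list(zip(Countdown(3), 'ab')))   # [(3, 'a'), (2, 'b')]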

As for the iter() built-in function: when called on a list or any other iterable x, it returns an iterator object for x:

iter([1, 2, 3, 4])  
iter((1,2,3,4))
iter({'a':1,'b':2})

Out:<list_iterator object at 0x00000229E1D6B940>
     <tuple_iterator object at 0x00000229E3879A90>
     <dict_keyiterator object at 0x00000229E1D6E818>

Indeed, zip() works by calling iter() on each of its arguments, then advancing each resulting iterator with next() and aggregating the values into a tuple on every round; those tuples are what you get when you iterate over zip()'s return value.
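
To make this concrete, here is a rough model of zip() in plain Python (my own sketch of the idea, not CPython's actual implementation):

def my_zip(*iterables):
    """A rough sketch of the idea behind zip(); not the real implementation."""
    iterators = [iter(obj) for obj in iterables]   # call iter() on every argument
    while True:
        result = []
        for it in iterators:
            try:
                result.append(next(it))            # advance each iterator with next()
            except StopIteration:
                return                             # stop when the shortest one is exhausted
        yield tuple(result)                        # aggregate one element from each

print(list(my_zip([1, 2, 3], ['a', 'b', 'c'])))    # [(1, 'a'), (2, 'b'), (3, 'c')]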

If you think back to an earlier note, Xiaobai's Python Study Notes (5): A Summary of map, filter, reduce and zip, which introduced the built-in map() function, it is in essence also an operation on iterators. In its simplest form, it applies a single-argument function to each element of an iterable, one at a time:

  • Template: map(func, sequence)
list(map(len, ['xiaobai', 'at', 'paris']))
Out: [7, 2, 5]

Looking at the map template, it is not hard to see how it works: map() calls iter() on sequence, advances that iterator with next() until it is exhausted, and applies func to the value returned by next() at each step. In the example above, len() is called on each element of ['xiaobai', 'at', 'paris'], yielding an iterator over the lengths of the elements in the list.
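
The same idea can be sketched in a few lines (my own illustrative model of the single-iterable form; the real map() also accepts multiple iterables):

def my_map(func, sequence):
    """A rough sketch of the single-iterable form of map()."""
    it = iter(sequence)           # map() starts by calling iter() on the sequence
    while True:
        try:
            value = next(it)      # advance the iterator with next()...
        except StopIteration:
            return                # ...until it is exhausted
        yield func(value)         # and apply func at each step

print(list(my_map(len, ['xiaobai', 'at', 'paris'])))   # [7, 2, 5]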

Because iterators are themselves iterable, zip() and map() can be combined to produce iterators over combinations of elements from multiple iterables. For example, the following sums the corresponding elements of two lists:

a = [1, 2, 3]
b = [4, 5, 6]
list(map(sum, zip(a,b)))

Out: [5, 7, 9]


This example illustrates what is meant by saying that itertools is built for an "iterator algebra". We can think of itertools as a set of building blocks that can be combined to form specialized "data pipelines", just like the summation example above.

In fact, in Python 3, if you have ever used map() or zip(), you have already been using this iterator algebra, because both functions return iterators!

The benefits of this so-called "iterator algebra" inside itertools are twofold:

  1. Improved memory efficiency (thanks to lazy evaluation)
  2. Faster execution

If you have doubts about either of these benefits, don't worry; let's analyze a concrete scenario:

Suppose we have a list and a positive integer n, and we want to write a function that splits the list into groups of length n. For simplicity, assume that the length of the input list is divisible by n. For example, if the input is [1, 2, 3, 4, 5, 6] and n = 2, the function should return [(1, 2), (3, 4), (5, 6)].

Our first thought might be a solution like this:

def naive_grouper(lst, n):
    num_groups = len(lst) // n
    return [tuple(lst[i*n:(i+1)*n]) for i in range(num_groups)]

A quick test shows the result is correct:

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
naive_grouper(nums, 2)
Out: [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]


But here is the question: what happens if we try to pass in a list containing 100 million elements? We would need a huge amount of memory! And even with enough memory, the program would hang for quite a while before the final result appears.

If we use iterators from itertools instead, the situation improves dramatically:

def better_grouper(lst, n):
    iters = [iter(lst)] * n
    return zip(*iters)

This little method packs in a lot of information, so let's unpack it. The expression [iter(lst)] * n creates a list of n references to the same iterator:

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
iters = [iter(nums)] * 2
list(id(itr) for itr in iters)    # the ids are identical: we created n references to one iterator

Out: [1623329389256, 1623329389256]


Next, on each iteration zip(*iters) returns a tuple of corresponding elements from the iterators in iters. When the element 1 is taken from the "first" iterator, the "second" iterator now starts at 2, because it is just another reference to the "first" iterator and has therefore already been advanced one step. So the first tuple zip() produces is (1, 2).

At this point the so-called "second" iterator in iters is positioned at 3, so when zip() pulls 3 from the "first" iterator, it gets 4 from the "second" and produces the tuple (3, 4). The process continues until zip() finally produces (9, 10) and both "iterators" in iters are exhausted.

Note: the "first" and "second" iterators here are really the same object, since the ids never change!
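
You can watch this shared-iterator behaviour directly by driving it with next() yourself (a small illustrative snippet):

nums = [1, 2, 3, 4, 5, 6]
iters = [iter(nums)] * 2
first, second = iters      # two names, but one and the same iterator object

print(next(first))    # 1  -- advances the shared iterator
print(next(second))   # 2  -- "second" picks up where "first" left off
print(next(first))    # 3
print(next(second))   # 4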

Finally, we can check that the result is the same:

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list(better_grouper(nums, 2))

Out: [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

I did a comparison test here and found a big difference in memory consumption between the two; the iter() + zip() combination also ran more than 500 times faster. If you are interested, try it yourself by changing nums to range(100000000).
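
If you don't want to wait on a 100-million-element run, here is a rough way to see the memory difference with tracemalloc on a smaller input (my own illustrative sketch, not a rigorous benchmark; exact numbers vary by machine and Python version):

import tracemalloc

def naive_grouper(lst, n):
    num_groups = len(lst) // n
    return [tuple(lst[i*n:(i+1)*n]) for i in range(num_groups)]

def better_grouper(lst, n):
    iters = [iter(lst)] * n
    return zip(*iters)

nums = range(1_000_000)

tracemalloc.start()
naive_grouper(nums, 10)                    # builds every tuple up front
print('naive  peak:', tracemalloc.get_traced_memory()[1], 'bytes')
tracemalloc.stop()

tracemalloc.start()
for _ in better_grouper(nums, 10):         # consumes the iterator lazily, one tuple at a time
    pass
print('better peak:', tracemalloc.get_traced_memory()[1], 'bytes')
tracemalloc.stop()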

Now let's take another look at the better_grouper(lst, n) we just wrote. It is not hard to spot an obvious drawback: if we pass in a list whose length is not divisible by n, there is a noticeable problem when it runs:

>>> nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list(better_grouper(nums, 4))
[(1, 2, 3, 4), (5, 6, 7, 8)]

Elements 9 and 10 are missing from the grouped output. This happens because zip() stops aggregating elements as soon as the shortest iterator passed to it is exhausted, and here we don't want to lose any elements. The solution is itertools.zip_longest(): it accepts any number of iterables plus a fillvalue keyword argument, which defaults to None. Let's look at a simple example:

>>> import itertools as it
>>> x = [1, 2, 3, 4, 5]
>>> y = ['a', 'b', 'c']

>>> list(zip(x, y))                     # zip always stops once the shortest iterable is exhausted
[(1, 'a'), (2, 'b'), (3, 'c')]

>>> list(it.zip_longest(x, y))          
[(1, 'a'), (2, 'b'), (3, 'c'), (4, None), (5, None)]


This example shows the difference between zip() and zip_longest() very clearly, and now we can optimize the better_grouper method:

import itertools as it


def grouper(lst, n, fillvalue=None):
    iters = [iter(lst)] * n
    return it.zip_longest(*iters, fillvalue=fillvalue)  # the default is already None
    

Now let's test the optimized version:

>>> nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> print(list(grouper(nums, 4)))
[(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, None, None)]

Pretty good! And you may not have realized it: what we just went through, step by step, is exactly how the grouper recipe from the itertools documentation is built!

Now let's look at how the grouper recipe is actually written in the official documentation. It is essentially the same as what we wrote, except that the parameter is a generic iterable and the list of iterator references is unpacked into zip_longest() with *args.

Finally, a satisfying round of tests:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
    
test_list = list(grouper("ABCDEFG",3))
test_tuple = tuple(grouper(range(0,7),2,'Null'))
test_dict = dict(zip(test_list,test_tuple))



Output:

print(test_list)
Out:[('A', 'B', 'C'), ('D', 'E', 'F'), ('G', None, None)]
print(test_tuple)
Out:((0, 1), (2, 3), (4, 5), (6, 'Null'))
print(test_dict)
Out: {('A', 'B', 'C'): (0, 1), ('D', 'E', 'F'): (2, 3), ('G', None, None): (4, 5)}

Brute-Force Solving

First, a bit of background on the concept: a brute-force algorithm, simply put, solves a problem by enumerating every possible case, or by otherwise grinding through it without any clever computational tricks. After reading this broad definition, the first thing that came to my mind was actually Wang Pangzi (Fatty Wang) from Daomu Biji (The Grave Robbers' Chronicles).

If you have read the books, you will know that Fatty Wang is a devoted practitioner of brute force: whenever the group runs into trouble, his "enumeration method" is brute-force problem solving. The scene that impressed me most is in the Cloud Top Heavenly Palace, where the group is trapped in a chamber full of jewels and cannot get out; Fatty Wang enumerates and rules out all the possibilities and lands directly on the final answer, that it is the "ghosts" at work.

PS: A salute here to the author, Nanpai Sanshu, and to all the story "pits" he never filled in.

Enough digressing; back to reality. We often run into classic problems like this one:

You have three $20 bills, five $10 bills, two $5 bills, and five $1 bills. In how many different ways can you make exactly $100?

To brute-force this problem, we just lay out all the possible combinations and then find the ones that add up to $100. First, let's create a list containing all the bills we have in hand:

bills = [20, 20, 20, 10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1, 1]


This is where itertools helps us. itertools.combinations() accepts two arguments:

  • an input iterable
  • a positive integer n

and eventually produces an iterator over all n-element combinations of the elements in the input.

import  itertools as it

bills = [20, 20, 20, 10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1, 1]

result = list(it.combinations(bills, 3))
print(len(result))  # 455 combinations
print(result)

Out: 455
     [(20, 20, 20), (20, 20, 10), (20, 20, 10), ... ]

The little high-school math I still remember tells me that this is a "choose 3 out of 15" problem, in other words C(15, 3).
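
A quick sanity check with math.comb (available since Python 3.8):

import math
print(math.comb(15, 3))   # 455 -- "15 choose 3", matching len(result) above

Good. Now that we have all the combinations, we only need to pick out the ones whose total is exactly 100, and the problem is solved: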

makes_100 = []
for n in range(1, len(bills) + 1):
    for combination in it.combinations(bills, n):
        if sum(combination) == 100:
            makes_100.append(combination)

The result obtained this way contains duplicate combinations, so we can simply filter out the duplicates with a set at the end and arrive at the answer:

import  itertools as it

bills = [20, 20, 20, 10, 10, 10, 10, 10, 5, 5, 1, 1, 1, 1, 1]

makes_100 = []
for n in range(1, len(bills) + 1):
    for combination in it.combinations(bills, n):
        if sum(combination) == 100:
            makes_100.append(combination)

print(set(makes_100))

Out:{(20, 20, 10, 10, 10, 10, 10, 5, 1, 1, 1, 1, 1),
      (20, 20, 10, 10, 10, 10, 10, 5, 5),
     (20, 20, 20, 10, 10, 10, 5, 1, 1, 1, 1, 1),
     (20, 20, 20, 10, 10, 10, 5, 5),
     (20, 20, 20, 10, 10, 10, 10)}
     

So in the end we find there are five ways in total. Now let's change the question; asked a different way, it becomes a completely different problem:

Now suppose you want to break a $100 bill into change, and you can use any number of $50, $20, $10, $5, and $1 bills. How many ways are there?

In this case we don't have a fixed pool of bills, so we need a way to generate all possible combinations using any number of each bill. For this we need the itertools.combinations_with_replacement() function.

Like combinations(), it accepts an input iterable and a positive integer n, and returns an iterator over n-tuples drawn from the input. The difference is that combinations_with_replacement() allows elements to repeat within the tuples it returns. Here is a small example:

>>> list(it.combinations_with_replacement([1, 2], 2))   # an element may also be paired with itself
[(1, 1), (1, 2), (2, 2)]


Compare that with itertools.combinations():

>>> list(it.combinations([1, 2], 2))   # an element may not be paired with itself
[(1, 2)]

So for the new problem, the solution looks like this:

bills = [50, 20, 10, 5, 1]
makes_100 = []
for n in range(1, 101):
    for combination in it.combinations_with_replacement(bills, n):
        if sum(combination) == 100:
            makes_100.append(combination)

This time we don't need to deduplicate the final result, because combinations_with_replacement() does not generate duplicate combinations:

>>> len(makes_100)
343

If you run this yourself, you will notice that producing the output takes quite a while. That's because it has to work through 96,560,645 combinations! This is the price of brute-force solving.
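
If you'd rather not wait, you can verify that count with a little combinatorics instead of enumeration: the number of ways to choose n bills from 5 denominations with repetition allowed is C(5 + n - 1, n), so summing over n = 1 to 100 gives the total (math.comb again requires Python 3.8+):

import math

total = sum(math.comb(5 + n - 1, n) for n in range(1, 101))
print(total)   # 96560645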

Another "brute-force" function is itertools.permutations(), which accepts a single iterable and generates all possible permutations (rearrangements) of its elements:

>>> list(it.permutations(['a', 'b', 'c']))
[('a', 'b', 'c'), ('a', 'c', 'b'), ('b', 'a', 'c'),
 ('b', 'c', 'a'), ('c', 'a', 'b'), ('c', 'b', 'a')]

Any iterable of three elements (such as a list) has six permutations, and the number of permutations grows very quickly as the iterable gets longer. In fact, an iterable of length n has n! permutations.
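
A quick check of that claim (a small example of my own):

import math
from itertools import permutations

print(len(list(permutations('abc'))))      # 6, i.e. 3!
print(len(list(permutations(range(8)))))   # 40320
print(math.factorial(8))                   # 40320, i.e. 8!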

This phenomenon, where a small amount of input produces a huge number of results, is known as combinatorial explosion, and we need to keep it in mind when using combinations(), combinations_with_replacement(), and permutations().

To be honest, brute-force algorithms are usually best avoided, but sometimes we may have to use one (for example, when the correctness of the algorithm is essential, or when every possible result must be considered).

Summary

Due to limited space, I will stop here for now. In this article we mainly developed an in-depth understanding of the basic principles behind the following functions:

  • map()
  • zip()
  • itertools.combinations()
  • itertools.combinations_with_replacement()
  • itertools.permutations()

In the next article I will first finish summarizing the last three, and then continue sharing all the other magical things inside itertools.

Reproduced from: https://juejin.im/post/5d075227f265da1bad570620


Origin: blog.csdn.net/weixin_33696106/article/details/93182267