Python Notes: An Introduction to the itertools Library

1. What is the itertools library

itertools is a module in Python's standard library dedicated to handling iteration efficiently.

Personally, the four functions I use most often are:

repeat()
accumulate()
permutations()
combinations()

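As for the other functions, frankly I have hardly used any of them, but since I'm writing this up anyway, I'll cover them all in one go.

One last note: the official itertools documentation is really well written, and I strongly recommend reading it directly. This post mostly just mentions what each function does, whereas the official documentation also gives roughly equivalent Python implementations, which help a lot in understanding the functions in depth.

So if you really want to understand how each function is implemented, I strongly, strongly recommend going straight to the official documentation!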

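2. Overview of the itertools functions

At the time of writing, itertools provided 19 functions (counting chain.from_iterable separately; newer Python versions add a few more, such as pairwise), which I group into three broad categories:

Element iteration
Permutations and combinations
Other functions

Let's go through them one by one.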

1. Element iteration

1. count

A counter function: simply an infinite counter. Its behavior can be understood through the following Python code:

def count(start=0, step=1):
    # yields start, start+step, start+2*step, ... forever
    while True:
        yield start
        start += step
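
As a quick usage sketch (the variable names are my own, just for illustration): since count never terminates on its own, it is usually paired with something that does stop, such as islice or zip.

```python
from itertools import count, islice

# take the first five values of the progression 10, 13, 16, ...
first_five = list(islice(count(10, 3), 5))
print(first_five)  # [10, 13, 16, 19, 22]

# zip stops at the shorter input, so count() works as a running index
labeled = list(zip(count(1), "abc"))
print(labeled)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```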

2. cycle

An iterator that cycles through the elements over and over. Its behavior can be understood through the following Python code:

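def cycle(iterable):
    saved = []  # remember elements, since an iterator can only be consumed once
    for elem in iterable:
        yield elem
        saved.append(elem)
    while saved:  # an empty input ends immediately instead of spinning forever
        for elem in saved:
            yield elem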
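
A small usage sketch, with made-up data: like count, cycle is infinite, so I cap it with islice here.

```python
from itertools import cycle, islice

# cycle never stops by itself, so take only the first seven items
colors = list(islice(cycle(["red", "green", "blue"]), 7))
print(colors)
# ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']
```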

3. repeat

A function that yields the same element over and over. It behaves the same as the following implementation:

def repeat(object, times=None):
    if times is None:
        # no limit given: repeat forever
        while True:
            yield object
    else:
        for _ in range(times):
            yield object
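
Here is a usage sketch of the two modes (the names are mine). The unbounded form is handy as a constant argument stream for map, which stops as soon as its shortest input runs out.

```python
from itertools import repeat

# bounded: exactly four copies
zeros = list(repeat(0, 4))
print(zeros)  # [0, 0, 0, 0]

# unbounded: supplies the exponent 2 for every element of range(5)
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```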

2. Permutations and combinations

1. product

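product computes the Cartesian product: given a series of candidate pools, it yields every tuple that takes one element from each pool.

The official documentation gives the following sample Python implementation: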

def product(*args, repeat=1):
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)

The code above is self-explanatory, so I won't say more here.
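
For completeness, a short usage sketch with toy inputs of my own. The last line shows the nested loops that product stands in for.

```python
from itertools import product

pairs = list(product("AB", [1, 2]))
print(pairs)  # [('A', 1), ('A', 2), ('B', 1), ('B', 2)]

# repeat=3 uses the same pool three times: all 3-bit patterns
bits = ["".join(map(str, p)) for p in product(range(2), repeat=3)]
print(bits)  # ['000', '001', '010', '011', '100', '101', '110', '111']

# the equivalent nested loops
nested = [(x, y) for x in "AB" for y in [1, 2]]
```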

2. permutations

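permutations(iterable, r=None) handles the permutation half of permutation-and-combination problems.

Suppose iterable contains n elements; the output is then all $A_n^r = \frac{n!}{(n-r)!}$ ordered arrangements of r elements (r defaults to n).

For a Python implementation, see the one given in the official documentation.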
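
A small sanity check of the $A_n^r$ count, using a toy input of my own:

```python
from itertools import permutations
from math import factorial

perms = list(permutations("abc", 2))
print(perms)
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]

n, r = 3, 2
expected = factorial(n) // factorial(n - r)  # A_3^2 = 6
print(len(perms) == expected)  # True
```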

3. combinations

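combinations(iterable, r) handles the combination half of permutation-and-combination problems. Unlike permutations, r is required here.

Suppose iterable contains n elements; the output is then all $C_n^r = \frac{n!}{r!(n-r)!}$ ways of choosing r elements, where order does not matter and each tuple lists its elements in input order.

For a Python implementation, see the one given in the official documentation.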
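
A matching sanity check for the $C_n^r$ count, again with a toy input of my own. It assumes Python 3.8 or later, where math.comb is available.

```python
from itertools import combinations
from math import comb

combos = list(combinations("abcd", 2))
print(combos)
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]

print(len(combos) == comb(4, 2))  # True, C_4^2 = 6
```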

4. combinations_with_replacement

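This function is a bit unusual: it behaves like combinations, except that an element may be chosen more than once. Put differently, the candidates for the (i+1)-th position are all elements from the i-th chosen element onward, that element included.

It becomes clearer alongside a Python implementation: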

def combinations_with_replacement(iterable, r):
    pool = tuple(iterable)  # accept any iterable, not just sequences
    n = len(pool)
    res = []
    def dp(idx, r, history):
        if r == 0:
            res.append(tuple(history))
            return
        # start from idx (not idx+1) so the same element may be picked again
        for i in range(idx, n):
            dp(i, r-1, history + [pool[i]])
    dp(0, r, [])
    for it in res:
        yield it

print(list(combinations_with_replacement("abc", 2)))
# [('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]

3. Other functions

1. accumulate

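A cumulative-sum function, similar to cumsum in MATLAB; the only difference is that it returns an iterator.

The code is probably intuitive enough on its own: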

def accumulate(iterable):
    tot = 0
    for it in iterable:
        tot += it
        yield tot

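Of course, the real function is a bit more involved (it also accepts an optional two-argument function func and, since Python 3.8, an initial value), but as an illustration of what it does, the code above says it all.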
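
To give a feel for the extra parameters of the real accumulate, here is a sketch with sample data of my own. The initial keyword assumes Python 3.8 or later.

```python
from itertools import accumulate
import operator

data = [3, 1, 4, 1, 5]

running_sum = list(accumulate(data))
print(running_sum)  # [3, 4, 8, 9, 14]

# any two-argument function can replace addition
running_max = list(accumulate(data, max))
print(running_max)  # [3, 3, 4, 4, 5]

running_product = list(accumulate(data, operator.mul))
print(running_product)  # [3, 3, 12, 12, 60]

# initial is emitted first, so the output has one extra element
with_initial = list(accumulate(data, initial=100))
print(with_initial)  # [100, 103, 104, 108, 109, 114]
```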

2. chain & chain.from_iterable

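Functionally these two are identical and differ only in how the arguments are passed: chain takes the iterables as separate positional arguments, while chain.from_iterable takes a single iterable of iterables. Both flatten a series of iterables into their individual elements and return them through an iterator.

Again, the explanatory code is given directly: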

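def chain(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for iterable in iterables:
        for elem in iterable:
            yield elem

# stands in for chain.from_iterable; a dotted name is not valid after def
def chain_from_iterable(iterables):
    # chain_from_iterable(['ABC', 'DEF']) --> A B C D E F
    for iterable in iterables:
        for elem in iterable:
            yield elem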
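
A usage sketch of both calling styles, with example data of my own. In my experience chain.from_iterable is the usual choice for flattening a list of lists by one level.

```python
from itertools import chain

# separate positional arguments
flat_a = list(chain("AB", [1, 2], (True,)))
print(flat_a)  # ['A', 'B', 1, 2, True]

# a single iterable of iterables
nested = [[1, 2], [3], [], [4, 5]]
flat_b = list(chain.from_iterable(nested))
print(flat_b)  # [1, 2, 3, 4, 5]
```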

3. compress

compress acts as a filter: it yields the elements of data whose corresponding entry in the selectors sequence is truthy.

It is clearer in code:

def compress(data, selectors):
    for d, s in zip(data, selectors):
        if s:
            yield d
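
A quick usage sketch with made-up data:

```python
from itertools import compress

data = "ABCDEF"
selectors = [1, 0, 1, 0, 1, 1]

# keep only the positions whose selector is truthy
picked = list(compress(data, selectors))
print(picked)  # ['A', 'C', 'E', 'F']
```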

4. dropwhile

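dropwhile's name is reminiscent of a do-while loop, but it works the other way round: it discards elements for as long as the predicate holds, then yields, as an iterator, every element from the first one that fails the predicate onward. After that first failure the predicate is not checked again.

Again, in code: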

def dropwhile(predicate, iterable):
    drop = True
    for it in iterable:
        if drop and not predicate(it):
            drop = False
        if not drop:
            yield it
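
A usage sketch with sample values of my own. Note that 3 survives even though it is below the threshold, because it comes after the first element that failed the test.

```python
from itertools import dropwhile

values = [1, 4, 6, 3, 8]

# drop the leading run of small values, keep everything after it
rest = list(dropwhile(lambda x: x < 5, values))
print(rest)  # [6, 3, 8]
```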

5. filterfalse

filterfalse is a filtering function: it returns an iterator that yields the elements for which the predicate evaluates to false.

Expressed in code:

def filterfalse(predicate, iterable):
    for x in iterable:
        if not predicate(x):
            yield x
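
For comparison, here is a sketch that runs filterfalse next to the built-in filter with the same predicate (the example is my own). Between them, the two results split the input.

```python
from itertools import filterfalse

def is_even(x):
    return x % 2 == 0

evens = list(filter(is_even, range(10)))
odds = list(filterfalse(is_even, range(10)))
print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]
```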

6. groupby

groupby bundles runs of consecutive elements that share the same key into groups, and yields each run as a (key, iterator) pair.

Explained in code:

def groupby(iterable, key_func=None):
    items = list(iterable)  # simplification: the real groupby is lazy
    keys = items if key_func is None else [key_func(x) for x in items]
    n = len(items)
    i = 0
    while i < n:
        j = i+1
        # extend the run while the key (not the raw element) stays the same
        while j < n and keys[j] == keys[i]:
            j += 1
        yield (keys[i], iter(items[i:j]))
        i = j
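
One thing that is easy to trip over: groupby only merges adjacent elements, so equal keys that are not next to each other end up in separate groups. The sketch below (word list and variable names are my own) shows the difference sorting by the same key makes.

```python
from itertools import groupby

words = ["apple", "avocado", "banana", "apricot", "blueberry"]

def first_letter(w):
    return w[0]

# without sorting: 'a' and 'b' each show up twice
raw = [(k, list(g)) for k, g in groupby(words, key=first_letter)]
print(raw)
# [('a', ['apple', 'avocado']), ('b', ['banana']), ('a', ['apricot']), ('b', ['blueberry'])]

# sorting by the same key first yields one group per key
grouped = [(k, list(g)) for k, g in groupby(sorted(words, key=first_letter), key=first_letter)]
print(grouped)
# [('a', ['apple', 'avocado', 'apricot']), ('b', ['banana', 'blueberry'])]
```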

7. islice

islice is a slicing function for iterables: it selects elements the way list slicing does. It can be called in two ways:

# Form 1: islice(iterable, stop)
def islice(iterable, stop):
    # simplification: slicing needs a sequence; the real islice accepts any iterable
    for x in iterable[:stop]:
        yield x

# Form 2: islice(iterable, start, stop, step); stop=None means "until the end"
def islice(iterable, start, stop, step=1):
    for x in iterable[start:stop:step]:
        yield x
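
As far as I can tell, the real value of islice is that it works on things that cannot be sliced with brackets, such as generators and infinite iterators. A sketch with an example of my own:

```python
from itertools import count, islice

# an infinite generator of squares; squares_gen[2:10:3] would raise TypeError
squares_gen = (x * x for x in count())

# positions 2, 5 and 8 of the stream
picked = list(islice(squares_gen, 2, 10, 3))
print(picked)  # [4, 25, 64]

# the single-argument form takes a prefix
head = list(islice("ABCDEFG", 3))
print(head)  # ['A', 'B', 'C']
```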

8. starmap

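starmap is a variant of map for the case where the arguments are already grouped into tuples: each element of the iterable is unpacked with * and passed to the function. Its behavior is as follows: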

def starmap(function, iterable):
    for args in iterable:
        yield function(*args)
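
A usage sketch with sample pairs of my own, shown next to the equivalent map call, which needs the arguments split into separate sequences instead:

```python
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]

# each tuple is unpacked into pow(base, exp)
powers = list(starmap(pow, pairs))
print(powers)  # [32, 9, 1000]

# the same result with map requires transposing the pairs first
same = list(map(pow, *zip(*pairs)))
```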

9. takewhile

takewhile is the exact opposite of dropwhile: it keeps taking elements until the predicate stops holding.

def takewhile(predicate, iterable):
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break
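
A usage sketch with sample values of my own. Everything after the first failing element is ignored, including later elements that would pass the test.

```python
from itertools import takewhile

values = [1, 4, 6, 3, 8]

# stop at 6, the first value that is not below 5; the later 3 is never reached
head = list(takewhile(lambda x: x < 5, values))
print(head)  # [1, 4]
```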

10. tee

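Frankly, I couldn't see what tee is good for. Its behavior can be seen fairly directly from the Python code below (it returns n independent iterators over a single input), but I really couldn't think of a use for it, and I've never needed it anyway...

If any reader knows, feel free to reply in the comments.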

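def tee(iterable, n=2):
    # simplification: buffer everything up front; the real tee buffers lazily.
    # Calling iter(iterable) n times would share state when given a one-shot iterator.
    items = list(iterable)
    return tuple(iter(items) for _ in range(n))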
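
For what it's worth, one situation where tee seems to help is when the input is a one-shot iterator (a generator, a file, a network stream) that has to be walked by two consumers at once. The sketch below pairs each element with its successor. The helper pairwise is written here purely for illustration (Python 3.10 and later ship itertools.pairwise), and the data is made up.

```python
from itertools import tee

def pairwise(iterable):
    # illustrative helper: two independent views of the same stream
    a, b = tee(iterable)
    next(b, None)  # advance the second view by one element
    return zip(a, b)

readings = iter([1, 4, 9, 16])  # a one-shot iterator, cannot be rewound

diffs = [y - x for x, y in pairwise(readings)]
print(diffs)  # [3, 5, 7]
```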

11. zip_longest

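zip_longest zips several iterables together, padding the shorter ones with fillvalue up to the length of the longest, and yields the resulting tuples through an iterator.

Explained in code: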

def zip_longest(*iterables, fillvalue=None):
    # simplification: materialize the inputs so len() and indexing work
    seqs = [list(it) for it in iterables]
    def get_element(seq, idx):
        return fillvalue if idx >= len(seq) else seq[idx]

    n = max((len(seq) for seq in seqs), default=0)
    for i in range(n):
        yield tuple(get_element(seq, i) for seq in seqs)
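
A usage sketch with toy inputs of my own, next to the built-in zip, which truncates to the shortest input instead:

```python
from itertools import zip_longest

padded = list(zip_longest("ABC", [1], fillvalue="-"))
print(padded)  # [('A', 1), ('B', '-'), ('C', '-')]

truncated = list(zip("ABC", [1]))
print(truncated)  # [('A', 1)]
```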

3. References

https://docs.python.org/3/library/itertools.html