Understanding iterators, generators, yield, and iterables

Original: https://foofish.net/iterators-vs-generators.html
This article is based on a blog post by the author of RQ; the original is Iterables vs. Iterators vs. Generators. I wrote it as a reference translation based on my own understanding, so it is not a strict translation of the original. I recommend reading the original as well, and corrections are welcome.

When learning Python's data structures, beginners are easily confused by a cluster of concepts that tend to get mixed together: container, iterable, iterator, generator, and list / set / dict comprehension. In this article I try to sort out these concepts and the relationships between them.

Container (container)

A container is a data structure that groups multiple elements together. Its elements can be retrieved one by one through iteration, and the in and not in keywords can be used to test whether an element is contained in it. Such data structures usually keep all their elements in memory (with some exceptions, such as iterator and generator objects, which do not). Common container objects in Python include:

  • list, deque, ...
  • set, frozenset, ...
  • dict, defaultdict, OrderedDict, Counter, ...
  • tuple, namedtuple, ...
  • str

Containers are easy to understand, because you can think of one as a box, a house, or a cupboard that you can put anything into. Technically, an object can be considered a container when it can be asked whether it contains a certain element. Lists, sets, and tuples are all containers:

>>> assert 1 in [1, 2, 3]      # lists
>>> assert 4 not in [1, 2, 3]
>>> assert 1 in {1, 2, 3}      # sets
>>> assert 4 not in {1, 2, 3}
>>> assert 1 in (1, 2, 3)      # tuples
>>> assert 4 not in (1, 2, 3)

A membership test on a dict checks its keys:

>>> d = {1: 'foo', 2: 'bar', 3: 'qux'}
>>> assert 1 in d
>>> assert 'foo' not in d  # 'foo' is a value, not a key, so it is not an element of the dict

A membership test on a string checks for a substring:

>>> s = 'foobar'
>>> assert 'b' in s
>>> assert 'x' not in s
>>> assert 'foo' in s 

Although the vast majority of containers provide some way to get at every element they hold, that ability comes not from the container itself but from its being an iterable. Not all containers are iterable, though. A Bloom filter, for example, can be asked whether it contains an element, but it cannot hand back each of its values, because it never stores the elements at all: it maps each one through hash functions to positions in an array and stores only those.
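To make the distinction concrete, here is a minimal sketch of a container that answers membership questions but cannot be iterated. TinyBloomFilter, its default sizes, and the salted SHA-256 hashing scheme are all assumptions made up for this illustration; a real Bloom filter would be tuned far more carefully.

```python
import hashlib


class TinyBloomFilter:
    """Hypothetical, simplified Bloom filter: supports `in`, but not iteration."""

    def __init__(self, size=64, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        # Only bit flags are stored; the added elements themselves are never kept.
        self.bits = [False] * size

    def _positions(self, item):
        # Derive several array positions from salted SHA-256 digests of the item.
        positions = []
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            positions.append(int(digest, 16) % self.size)
        return positions

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # Can report false positives, but never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))


bloom = TinyBloomFilter()
bloom.add("foo")
print("foo" in bloom)  # an item that was added is always reported as present

try:
    iter(bloom)  # no __iter__ (and no __getitem__), so this fails
except TypeError as exc:
    print("not iterable:", exc)
```

Because the class defines __contains__ but not __iter__, the in keyword works while a for loop over it would raise TypeError.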

Iterables (iterable)

As just mentioned, many containers are iterables, but many other objects are iterable too, such as open files, sockets, and so on. Any object that can return an iterator can be called an iterable. That may sound a little confusing, but never mind; look at an example:

>>> x = [1, 2, 3]
>>> y = iter(x)
>>> z = iter(x)
>>> next(y)
1
>>> next(y)
2
>>> next(z)
1
>>> type(x)
<class 'list'>
>>> type(y)
<class 'list_iterator'>

Here x is an iterable. Iterable, like container, is a general term rather than a specific data type: a list is an iterable, a dict is an iterable, and a set is an iterable. y and z are two independent iterators. Each iterator holds internal state that records the current position, so that the next call can fetch the right element. Iterators have concrete types of their own, such as list_iterator and set_iterator. An iterable implements the __iter__ method, which returns an iterator object.
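The same protocol can be followed by a user-defined class. The sketch below assumes a hypothetical Team class that wraps a list and delegates __iter__ to it; it is only meant to show that each call to iter() yields a fresh iterator with its own position.

```python
class Team:
    """Hypothetical iterable: every iter() call returns a new, independent iterator."""

    def __init__(self, members):
        self.members = list(members)

    def __iter__(self):
        # Delegate to the list, which hands back a fresh list_iterator each time.
        return iter(self.members)


team = Team(["ann", "bob", "cid"])
first = iter(team)
second = iter(team)

print(next(first), next(first))  # ann bob
print(next(second))              # ann (second keeps its own position)
print(list(team))                # ['ann', 'bob', 'cid'] (the iterable itself is unaffected)
```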

When you run the code:

x = [1, 2, 3]
for elem in x:
    ...

Disassembling this code shows what actually happens. The interpreter explicitly executes the GET_ITER instruction, which is equivalent to calling iter(x), and the FOR_ITER instruction, which is equivalent to calling next() repeatedly to fetch the next element from the iterator. You cannot see the next() call itself in the bytecode, because the interpreter has optimized it into that single instruction. The listing below comes from an older Python 3 release; newer versions emit slightly different instructions.

>>> import dis
>>> x = [1, 2, 3]
>>> dis.dis('for _ in x: pass')
  1           0 SETUP_LOOP              14 (to 17)
              3 LOAD_NAME                0 (x)
              6 GET_ITER
        >>    7 FOR_ITER                 6 (to 16)
             10 STORE_NAME               1 (_)
             13 JUMP_ABSOLUTE            7
        >>   16 POP_BLOCK
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE
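In plain Python, the work done by GET_ITER and FOR_ITER can be approximated by hand. This is only a sketch of the equivalent logic, not what the interpreter literally runs; the seen list is a name introduced here to collect the results.

```python
x = [1, 2, 3]
seen = []

iterator = iter(x)              # roughly what GET_ITER does
while True:
    try:
        elem = next(iterator)   # roughly what FOR_ITER does on each pass
    except StopIteration:
        break                   # iterator exhausted: leave the loop
    seen.append(elem)           # the body of the for loop

print(seen)  # [1, 2, 3]
```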

Iterator (iterator)

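So what is an iterator? It is a stateful object that returns the next value in the container each time you call next() on it. Any object that implements the __iter__ and __next__() methods (next() in Python 2) is an iterator. __iter__ returns the iterator itself, and __next__ returns the next value in the container, raising a StopIteration exception when there are no more elements. How exactly they are implemented does not matter.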

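So an iterator works like a factory of values: it hands you a value every time you ask for the next one. There are many examples of iterators; the functions in itertools, for instance, all return iterator objects.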

Generating an infinite sequence:

>>> from itertools import count
>>> counter = count(start=13)
>>> next(counter)
13
>>> next(counter)
14

Generating an infinite sequence from a finite one:

>>> from itertools import cycle
>>> colors = cycle(['red', 'white', 'blue'])
>>> next(colors)
'red'
>>> next(colors)
'white'
>>> next(colors)
'blue'
>>> next(colors)
'red'

Generating a finite sequence from an infinite one:

>>> from itertools import islice
>>> colors = cycle(['red', 'white', 'blue'])  # infinite
>>> limited = islice(colors, 0, 4)            # finite
>>> for x in limited:                         
...     print(x)
red
white
blue
red

To get a more intuitive feel for what happens inside an iterator, let's define one ourselves, using the Fibonacci sequence as an example:

class Fib:
    def __init__(self):
        self.prev = 0
        self.curr = 1

    def __iter__(self):
        return self

    def __next__(self):
        value = self.curr
        self.curr += self.prev
        self.prev = value
        return value

>>> f = Fib()
>>> list(islice(f, 0, 10))
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

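Fib is both an iterable (because it implements the __iter__ method) and an iterator (because it implements the __next__ method). The instance variables prev and curr are used to maintain the iterator's internal state. Each call to next() does two things: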

  1. Modify the state for the next call to next()
  2. Produce the return value for the current call

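An iterator is like a lazy factory: it produces a value only when someone asks for one, and while nobody is calling it, it sits idle and waits for the next call.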
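Fib never runs out of values, so it never needs to raise StopIteration. As a complement, here is a small sketch of a finite iterator; Countdown is a hypothetical class introduced only to show how exhaustion is signalled.

```python
class Countdown:
    """Hypothetical finite iterator that counts from start down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # an iterator returns itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # tells for loops, list(), etc. that we are done
        value = self.current     # result for this call
        self.current -= 1        # state for the next call
        return value


c = Countdown(3)
print(list(c))  # [3, 2, 1]
print(list(c))  # [] (the same iterator is now exhausted)
```

Because __iter__ returns self, iterating a second time does not start over; a fresh Countdown is needed for that.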

Generator (generator)

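Generators are one of the most attractive features of the Python language. A generator is really a special kind of iterator, only a more elegant one. You no longer need to write __iter__() and __next__() methods as in the class above; all it takes is the yield keyword. A generator is always an iterator (the reverse is not true), so every generator also produces its values lazily. Here is the Fibonacci example implemented with a generator: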

def fib():
    prev, curr = 0, 1
    while True:
        yield curr
        prev, curr = curr, curr + prev

>>> f = fib()
>>> list(islice(f, 0, 10))
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

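fib is an ordinary Python function. What makes it special is that its body uses yield instead of returning a value with return, so calling it returns a generator object. When f = fib() is executed, a generator object is returned and none of the code in the function body runs yet; the body only starts executing when next is called, explicitly or implicitly.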
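The delayed execution is easy to observe by recording when each part of the body runs. In this sketch, traced and the events list are names made up for the demonstration.

```python
events = []


def traced():
    events.append("body started")
    yield 1
    events.append("resumed after first yield")
    yield 2


gen = traced()
after_create = list(events)   # [] because calling the function ran none of the body

first = next(gen)             # runs the body up to the first yield
after_first = list(events)    # ['body started']

second = next(gen)            # resumes right after the first yield
after_second = list(events)   # ['body started', 'resumed after first yield']

print(after_create, first, after_first, second, after_second)
```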

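Generators are a very powerful programming construct in Python. They let you write streaming code with fewer intermediate variables, they can save memory and CPU compared with container objects, and they usually need less code to achieve the same result. You can start refactoring your code right away. Whenever you see something like: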

def something():
    result = []
    for ... in ...:
        result.append(x)
    return result

you can replace it with a generator function:

def iter_something():
    for ... in ...:
        yield x
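As a concrete instance of this refactoring, the sketch below uses two made-up functions, even_squares_list and iter_even_squares, that compute the same values; only the second produces them lazily.

```python
def even_squares_list(numbers):
    # Builds the whole result in memory before returning it.
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result


def iter_even_squares(numbers):
    # Yields each value as it is computed; nothing is accumulated.
    for n in numbers:
        if n % 2 == 0:
            yield n * n


data = range(10)
print(even_squares_list(data))        # [0, 4, 16, 36, 64]
print(list(iter_even_squares(data)))  # [0, 4, 16, 36, 64]
```

One thing to keep in mind when refactoring this way: callers that index the result or iterate over it more than once need a list, so they have to wrap the generator in list().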

Generator expression (generator expression)

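A generator expression is the generator version of a list comprehension. It looks like a list comprehension, but it returns a generator object rather than a list object.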

>>> a = (x*x for x in range(10))
>>> a
<generator object <genexpr> at 0x401f08>
>>> sum(a)
285
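Two practical consequences are worth seeing side by side: the generator object stays small no matter how many values it will produce, and it can be consumed only once. The sketch below relies on sys.getsizeof, whose exact numbers are specific to CPython and vary by version, so only the comparison is shown.

```python
import sys

squares_list = [x * x for x in range(1000)]  # all 1000 values exist right now
squares_gen = (x * x for x in range(1000))   # values are produced on demand

# On CPython the list object is much larger than the generator object.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))

total_first = sum(squares_gen)   # consumes the generator
total_second = sum(squares_gen)  # nothing left, so the sum is 0

print(total_first == sum(squares_list), total_second)
```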

Summary

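  • A container is a collection of elements; str, list, set, and dict objects can all be regarded as containers. Most containers can be iterated over (for example, in a for statement), which is why they are called iterables; other objects, such as open files, are iterable as well.
  • An iterable implements the __iter__ method, which returns an iterator object.
  • An iterator holds internal state that records what the next iteration should return. It implements the __next__ and __iter__ methods. An iterator does not load all elements into memory at once; it produces each result only when it is needed.
  • A generator is a special kind of iterator that produces its values with yield rather than return.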

Reference: https://docs.python.org/2/library/stdtypes.html#iterator-types



Original by MARSGGBO

2019-7-17



Origin www.cnblogs.com/marsggbo/p/11203768.html