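Python Iterators and Generators (1) -- Python Cookbook

Manually Consuming an Iterator

You want to process every item in an iterable, but without using a for loop.

To iterate manually, call next() and catch the StopIteration exception in your code. For example, the following reads all the lines of a file by hand: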

>>> def manual_iter():
...   with open('/etc/passwd') as f:
...       try:
...         while True:
...           line = next(f)
...           print(line,end=' ')
...       except StopIteration:
...         pass
... 
>>> manual_iter()
root:x:0:0:root:/root:/bin/bash
 bin:x:1:1:bin:/bin:/sbin/nologin
 daemon:x:2:2:daemon:/sbin:/sbin/nologin
 adm:x:3:4:adm:/var/adm:/sbin/nologin
 lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
 sync:x:5:0:sync:/sbin:/bin/sync
 shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
 halt:x:7:0:halt:/sbin:/sbin/halt
 mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
 operator:x:11:0:operator:/root:/sbin/nologin
 games:x:12:100:games:/usr/games:/sbin/nologin
 ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
 nobody:x:99:99:Nobody:/:/sbin/nologin
 systemd-bus-proxy:x:999:998:systemd Bus Proxy:/:/sbin/nologin
 systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
 dbus:x:81:81:System message bus:/:/sbin/nologin
 polkitd:x:998:997:User for polkitd:/:/sbin/nologin
 tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
 sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
 postfix:x:89:89::/var/spool/postfix:/sbin/nologin
 chrony:x:997:995::/var/lib/chrony:/sbin/nologin
 ntp:x:38:38::/etc/ntp:/sbin/nologin
 nscd:x:28:28:NSCD Daemon:/:/sbin/nologin
 tcpdump:x:72:72::/:/sbin/nologin
 admin:x:1000:1000::/home/admin:/bin/bash
 pythonx:x:1001:1001::/home/pythonx:/bin/bash

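Way 2: pass a default value to next() so that it returns None at the end of the file instead of raising StopIteration:
>>> with open('/etc/passwd') as f:
...    while True:
...      line = next(f, None)
...      if line is None:
...        break
...      print(line,end='')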
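
next() accepts an optional second argument, returned once the iterator is exhausted, and that default need not be None. A minimal sketch, assuming a hypothetical helper named collect_manually and an in-memory list standing in for the file; a unique sentinel object is used so that None values in the data do not end the loop early:

```python
_SENTINEL = object()  # unique marker; cannot collide with real items

def collect_manually(iterable):
    """Walk an iterable with next() and a default instead of a for loop."""
    it = iter(iterable)              # obtain the iterator explicitly
    items = []
    while True:
        item = next(it, _SENTINEL)   # returns _SENTINEL when exhausted
        if item is _SENTINEL:
            break
        items.append(item)
    return items

result = collect_manually(['a', None, 'b'])
print(result)
```

With a plain None default, the None in the middle of the list would have been mistaken for the end of the data.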

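Delegating Iteration

You have built a custom container object that internally holds a list, tuple, or some other iterable. You want iteration to work directly on your new container.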

#! /usr/bin/python3

class Node:
  def __init__(self,value):
    self._value = value
    self._children = []  # start with no children

  def __repr__(self):
    return  'Node({!r})'.format(self._value)


  def add_child(self,node):
    self._children.append(node)

  def __iter__(self):
    return iter(self._children)

if __name__ == '__main__':
    root = Node(0)
    child1= Node(1)
    child2=Node(2)
    root.add_child(child1)
    root.add_child(child2)
    for ch in root:
      print(ch)
**************
[root@izwz9eitqs320brxl6owssz ~]# ./iterator1.py 
Node(1)
Node(2)

The __iter__() method simply forwards the iteration request to the internal _children attribute.
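
Because __iter__() hands back the iterator of the underlying list, anything that relies on iteration works on the container too, not only for loops. A small sketch, reusing a trimmed copy of the Node class above:

```python
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return 'Node({!r})'.format(self._value)

    def add_child(self, node):
        self._children.append(node)

    def __iter__(self):
        # Delegate to the list's own iterator
        return iter(self._children)

root = Node(0)
child1, child2 = Node(1), Node(2)
root.add_child(child1)
root.add_child(child2)

children = list(root)        # list() goes through __iter__()
first, second = root         # so does tuple unpacking
print(children, child1 in root)
```

The membership test works as well, since Python falls back to iteration when a class defines no __contains__() method.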

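Creating New Iteration Patterns with Generators

If you want to implement a new kind of iteration pattern, define it with a generator function. Here is a generator that produces floating-point numbers over a range: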

>>> def frange(start,stop,increment):
...    x=start
...    while x<stop:
...     yield x
...     x += increment
... 
>>> for n in frange(6,15,2):
...    print(n)
... 
6
8
10
12
14
>>> list(range(1,100,2))
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99]

>>> def frange1(start,stop,increment):
...    x=start
...    while x<stop:
...       print(x)
...       x += increment
---- without a generator (no yield, so frange1 is an ordinary function that prints and returns None)

>>> for n in frange1(6,15,2):
...       print(n)
... 
6
8
10
12
14
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable

A single yield statement is all it takes to turn a function into a generator. Unlike a normal function, a generator only runs in response to iteration.

>>> def countdown(n):
...    print('Starting to count from',n)
...    while n > 0:
...      yield n
...      n -= 1
...    print('Done1')
>>> c = countdown(5)
>>> c
<generator object countdown at 0x7ff4a4bba410>
>>> next(c)
Starting to count from 5
5
>>> next(c)
4
>>> next(c)
3
>>> next(c)
2
>>> next(c)
1
>>> next(c)
Done1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

>>> for i in countdown(5):
...    print(i)
... 
Starting to count from 5
5
4
3
2
1
Done1
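
A generator object is also single-pass: once exhausted, it produces nothing more. A brief sketch reusing frange() from above with fractional steps (the values are chosen to be exactly representable in binary floating point, so the comparison is exact):

```python
def frange(start, stop, increment):
    x = start
    while x < stop:
        yield x
        x += increment

gen = frange(0, 1, 0.25)
values = list(gen)           # iteration drives the function body
print(values)                # [0, 0.25, 0.5, 0.75]
leftover = list(gen)         # the same generator is now exhausted
print(leftover)              # []
```

To iterate again, call frange() again to get a fresh generator.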

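Implementing the Iterator Protocol

Suppose you use a Node class to represent a tree structure, and you want to implement a generator that traverses the nodes depth first. Here is an example: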

[root@izwz9eitqs320brxl6owssz ~]# vi iterator2.py
#!/usr/bin/python3

class Node:
  def __init__(self,value):
    self._value = value
    self._children = []

  def __repr__(self):
    return 'Node({!r})'.format(self._value)

  def add_child(self,node):
    self._children.append(node)

  def __iter__(self):
     return iter(self._children)
   # iterate over the _children list

  def depth_first(self):
     yield self
     for c in self:
       yield from c.depth_first()


if __name__ == '__main__':
   root = Node(0)
   child1 = Node(1)
   child2 = Node(2)
   root.add_child(child1)
   root.add_child(child2)
   child1.add_child(Node(3))
   child1.add_child(Node(4))
   child2.add_child(Node(5))
   for i  in root.depth_first():
       print(i)
**************************
[root@izwz9eitqs320brxl6owssz ~]# ./iterator2.py 
Node(0)
Node(1)
Node(3)
Node(4)
Node(2)
Node(5)

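The depth_first() method is simple and direct. It first yields the node itself, then iterates over each child and yields the items produced by that child's depth_first() method (using the yield from statement).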
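
For plain iteration, yield from is shorthand for looping over the sub-generator and re-yielding each item. A sketch on nested lists rather than Node objects, with the hypothetical function names flatten and flatten_loop:

```python
def flatten(items):
    """Depth-first walk over nested lists using yield from."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

def flatten_loop(items):
    """Same traversal written with an explicit inner loop."""
    for item in items:
        if isinstance(item, list):
            for sub in flatten_loop(item):
                yield sub
        else:
            yield item

nested = [0, [1, [3, 4]], [2, [5]]]
print(list(flatten(nested)))
```

Both functions produce the same sequence; yield from just removes the inner loop.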

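Python's iterator protocol requires __iter__() to return a special iterator object that implements a __next__() method and signals completion with a StopIteration exception. Implementing all of this by hand is often tedious. To illustrate, here is how depth_first() can be reimplemented with an associated iterator class: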

[root@izwz9eitqs320brxl6owssz ~]# vi iterator3.py
#! /usr/bin/python3

class Node2:
  def __init__(self,value):
    self._value = value
    self._children = [] 

  def __repr__(self):
    return 'Node({!r})'.format(self._value)

  def add_child(self,node):
     self._children.append(node)

  def __iter__(self):
    return iter(self._children)

  def depth_first(self):
    return DepthFirstIterator(self)

class DepthFirstIterator:
  '''
  Depth-first traversal
  '''

  def __init__(self,start_node):
    self._node = start_node
    self._children_iter = None
    self._child_iter = None
  def __iter__(self):
     return self

  def __next__(self):
    # Return myself if just started; create an iterator for children
    if self._children_iter is None:
      self._children_iter = iter(self._node)
      return self._node
    # If processing a child, return its next item
    elif self._child_iter:
      try:
        nextchild = next(self._child_iter)
        return nextchild
      except StopIteration:
        self._child_iter = None
        return next(self)
    # Advance to the next child and start its iteration
    else:
      self._child_iter = next(self._children_iter).depth_first()
      return next(self)

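It is tedious to write because the iterator has to maintain a lot of state during the iteration. Frankly, nobody wants to write such convoluted code. Define your iterator as a generator and the problem goes away.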
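
For comparison, a class-based iterator can also keep an explicit stack instead of nested child iterators. This is a different technique from the class above, sketched under the assumption that nodes expose their children through iteration; StackDepthFirstIterator and the trimmed Node class are illustrative names only:

```python
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return 'Node({!r})'.format(self._value)

    def add_child(self, node):
        self._children.append(node)

    def __iter__(self):
        return iter(self._children)


class StackDepthFirstIterator:
    """Depth-first (pre-order) traversal driven by an explicit stack."""

    def __init__(self, start_node):
        self._stack = [start_node]

    def __iter__(self):
        return self

    def __next__(self):
        if not self._stack:
            raise StopIteration
        node = self._stack.pop()
        # Push children in reverse so the leftmost child is visited first
        self._stack.extend(reversed(list(node)))
        return node

root = Node(0)
child1, child2 = Node(1), Node(2)
root.add_child(child1)
root.add_child(child2)
child1.add_child(Node(3))
child1.add_child(Node(4))
child2.add_child(Node(5))
order = [repr(n) for n in StackDepthFirstIterator(root)]
print(order)
```

Even in this form the state has to be managed by hand, which the generator version avoids entirely.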

Iterating in Reverse

Use the built-in reversed() function.

>>> a = [1,2,3,4,5]
>>> for i in reversed(a):
...   print(i)
... 
5
4
3
2
1

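Drawback: if the object has neither a known size nor a __reversed__() method, it must first be converted to a list, which consumes memory.
You can support reverse iteration on a custom class by implementing the __reversed__() method.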

#! /usr/bin/python3
class Countdown:
  def __init__(self,start):
     self.start = start

  def __iter__(self):
     n = self.start
     while n > 0:
       yield n
       n -= 1

  def __reversed__(self):
     n = 1
     while n <= self.start:
         yield n
         n += 1

for rr in reversed(Countdown(30)):
   print(rr,',',end='')

for rr in Countdown(30):
   print(rr , ',',end='')
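
reversed() needs either a sequence (one with __len__() and __getitem__()) or an object with a __reversed__() method. A short sketch of what happens with a plain generator, and of the list-conversion workaround with its memory cost; the countdown function here is only for illustration:

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

try:
    reversed(countdown(3))
    error = None
except TypeError as exc:
    error = exc          # a generator has no length and no __reversed__()

# Workaround: materialize first (holds every item in memory at once)
result = list(reversed(list(countdown(3))))
print(result)            # [1, 2, 3]
```

A class with its own __reversed__() method, like Countdown above, avoids building that intermediate list.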

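Defining Generator Functions with Extra State

If you want a generator to expose extra state to the user, remember that you can simply implement it as a class and put the generator function code in the __iter__() method.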

[root@izwz9eitqs320brxl6owssz ~]# vi outsidestategenerater.py 
#! /usr/bin/python3

from collections import deque

class linehistory:
  def __init__(self,lines,histlen=3):
    self.lines = lines
    self.history = deque(maxlen=histlen)


  def __iter__(self):
    for lineno,line in enumerate(self.lines,1):
      self.history.append((lineno,line))
      yield line

  def clear(self):
    self.history.clear()

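To use this class, treat it like a normal generator function. However, since it creates an instance, you can access its internal attributes, such as the history attribute or the clear() method.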

with open('/etc/passwd') as f:
  lines = linehistory(f)
  for line in lines:
   if 'python' in line:
     for lineno,hline in lines.history:
       print('{}:{}'.format(lineno,hline),end='')
*****************       
[root@izwz9eitqs320brxl6owssz ~]# ./outsidestategenerater.py 
24:tcpdump:x:72:72::/:/sbin/nologin
25:admin:x:1000:1000::/home/admin:/bin/bash
26:pythonx:x:1001:1001::/home/pythonx:/bin/bash

One subtlety to note: if you iterate without a for loop, you have to call iter() first.

>>> from collections import deque
>>> class linehistory:
...       def __init__(self, lines, histlen=3):
...         self.lines = lines
...         self.history = deque(maxlen=histlen)
...       def __iter__(self):
...         for lineno, line in enumerate(self.lines, 1):
...               self.history.append((lineno, line))
...               yield line
...       def clear(self):
...           self.history.clear()
... 
>>> f = open('/etc/passwd')
>>> lines = linehistory(f)
>>> next(lines)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'linehistory' object is not an iterator
>>> it = iter(lines)
>>> next(it)
'root:x:0:0:root:/root:/bin/bash\n'
>>> next(it)
'bin:x:1:1:bin:/bin:/sbin/nologin\n'
>>> next(it)
'daemon:x:2:2:daemon:/sbin:/sbin/nologin\n'
>>> next(it)
'adm:x:3:4:adm:/var/adm:/sbin/nologin\n'
>>> next(it)
'lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin\n'
>>> next(it)
'sync:x:5:0:sync:/sbin:/bin/sync\n'
>>> next(it)
'shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown\n'
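
The same behaviour can be checked without /etc/passwd. A sketch using an in-memory list of made-up lines (the sample data and the histlen=2 setting are assumptions), showing that the history attribute stays readable while the iterator obtained from iter() is advanced by hand:

```python
from collections import deque

class linehistory:
    def __init__(self, lines, histlen=3):
        self.lines = lines
        self.history = deque(maxlen=histlen)

    def __iter__(self):
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line

    def clear(self):
        self.history.clear()

lines = linehistory(['a\n', 'b\n', 'c\n', 'd\n'], histlen=2)
it = iter(lines)                 # required before calling next()
next(it)
next(it)
third = next(it)
snapshot = list(lines.history)   # only the last two lines are kept
print(snapshot)
lines.clear()                    # the extra method is available too
```

The deque's maxlen drops the oldest entry automatically, so the history never grows beyond histlen items.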

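Taking a Slice of an Iterator

You want to take a slice of the data produced by an iterator, but the normal slicing operator does not work.
The itertools.islice() function is well suited to slicing iterators and generators.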

>>> def count(n):
...   while True:
...     yield n
...     n += 1
... 
>>> c = count(0)
>>> c[10:20]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable
>>> #Now using islice()
... import itertools
>>> for x in itertools.islice(c,10,20):
...    print(x,end='')
... 
10111213141516171819>>>

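Iterators and generators cannot be sliced in the normal way, because their length is not known in advance (and they do not implement indexing). islice() returns an iterator that produces the requested items: it consumes and discards everything up to the slice's start index, then yields items one by one until the end index is reached.
It is important to emphasize that islice() consumes data from the iterator passed to it. Iterators cannot be rewound, so if you need to go back over the data later, put it into a list first.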
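
A short sketch of that consumption, using the count() generator from above: after items 10 to 19 are sliced out, the next value taken from the same generator is 20, and the skipped items 0 to 9 are gone.

```python
import itertools

def count(n):
    while True:
        yield n
        n += 1

c = count(0)
first_slice = list(itertools.islice(c, 10, 20))   # discards 0..9, yields 10..19
after = next(c)                                   # the generator has moved on
print(first_slice, after)

# To revisit data, materialize a bounded portion into a list first
saved = list(itertools.islice(count(0), 20))
print(saved[10:20] == first_slice)
```

Since count() is infinite, the list conversion is itself bounded with islice(); calling list() on the bare generator would never finish.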

Reference Books