Python: Iterables and Iterators

I. Iterables

Anyone with basic Python experience knows that a for loop works on any sequence type in Python, including lists, tuples, and strings, as shown below:

>>> import time
>>> ite_1 = ['风', 111, time.time()]
>>> ite_2 = ('hello', 222, (2221, 'test'))
>>> ite_3 = 'abcdefg'
>>> for i in ite_1:    print(i)
风
111
1598796165.6438172

In fact, tools such as for loops, list comprehensions, in membership tests, and the built-in map function all work on any iterable object.

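An object is considered iterable if it is either a physically stored sequence or an object that produces one result at a time in the context of an iteration tool (for example, a for loop). In short, iterables include both physical sequences and virtual sequences that are computed on demand.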

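More precisely, an iterable is an object that supports the iter call. In most cases this means the object defines __iter__(); iter() also accepts objects that define only __getitem__() with integer indexes starting at 0.
You can use dir(obj) to list everything an object provides and check whether __iter__ is among its methods: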

>>> dir(ite_2)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']
>>> iter(ite_3)
<str_iterator object at 0x00000202354EF308>
>>> iter_obj = ite_3.__iter__()
>>> iter_obj
<str_iterator object at 0x00000202354A9208>

>>> test = 2333 
>>> dir(test)	# no __iter__ in the list
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
>>> iter_obj = test.__iter__()	# so an int is not iterable
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    iter_obj = test.__iter__()
AttributeError: 'int' object has no attribute '__iter__'
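
Because dir() only shows whether __iter__ is present, calling iter() and watching for TypeError is usually a more dependable check. The following is a minimal sketch; the helper is_iterable and the class Squares are hypothetical names introduced here for illustration, not part of the examples above.

```python
def is_iterable(obj):
    """Return True if iter(obj) succeeds (hypothetical helper)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class Squares:
    """Hypothetical class that defines __getitem__ but no __iter__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * index


print(is_iterable('abcdefg'))   # True
print(is_iterable(2333))        # False
print(is_iterable(Squares()))   # True: iter() falls back to __getitem__
print(list(Squares()))          # [0, 1, 4]
```

Squares has no __iter__ in its dir() listing, yet a for loop still accepts it, which is why testing with iter() is the safer habit.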

II. Iterators

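Python 3.X also provides a built-in function next, which automatically calls an object's __next__() and completes what Python calls the iteration protocol:
Every object with a __next__() method advances to its next result each time it is called, and when it reaches the end of the series of results, next() raises a StopIteration exception. Such objects are called iterators in Python.
In other words, an iterator is an object (the one that iter returns for the iterable passed to it) that supports the next(I) call.
Here is a file named test.py: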

import time

ite_1 = ['风', 111, time.time()]
ite_2 = ('hello', 222, (2221, 'test'))
ite_3 = 'abcdefg'

for i in ite_1:
    print(i)

Getting a file iterator:

>>> f = open('C:\\Users\\PC\\Desktop\\test.py', encoding='utf-8', mode='r')
>>> dir(f)
['_CHUNK_SIZE', '__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_finalizing', 'buffer', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'line_buffering', 'mode', 'name', 'newlines', 'read', 'readable', 'readline', 'readlines', 'reconfigure', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'write_through', 'writelines'] 
# It has both __iter__ and __next__
>>> f.__next__()
'import time\n'
>>> f.__next__()
'\n'
>>> f.__next__()
"ite_1 = ['风', 111, time.time()]\n"
>>> f.__next__()
"ite_2 = ('hello', 222, (2221, 'test'))\n"
>>> f.__next__()
"ite_3 = 'abcdefg'\n"
>>> f.__next__()
'\n'
>>> f.__next__()
'for i in ite_1:\n'
>>> f.__next__()
'    print(i)\n'
>>> f.__next__()
Traceback (most recent call last):
  File "<pyshell#25>", line 1, in <module>
    f.__next__()
StopIteration

Iterators in Python also have an __iter__() method, which returns the iterator itself, so an iterator is iterable too.

>>> f = open('C:\\Users\\PC\\Desktop\\test.py', encoding='utf-8', mode='r')
>>> iter(f)
<_io.TextIOWrapper name='C:\\Users\\PC\\Desktop\\test.py' mode='r' encoding='utf-8'>
>>> iter_obj = f.__iter__()
>>> iter_obj
<_io.TextIOWrapper name='C:\\Users\\PC\\Desktop\\test.py' mode='r' encoding='utf-8'>
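
To see both methods in one place, here is a minimal sketch of a hand-written iterator. The class name Countdown is hypothetical and chosen only for illustration; it defines __next__() to produce values and __iter__() to return itself, just as the file object does.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal the end of the series of results.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


c = Countdown(3)
print(iter(c) is c)   # True
print(next(c))        # 3
print(list(c))        # [2, 1]
print(list(c))        # []  (already exhausted)
```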

III. Two Objects and Two Steps in Iteration

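Internally, every iteration tool first passes the iterable to the built-in iter function, which calls __iter__() and returns an iterator. That iterator has a __next__() method; the iteration tool calls it automatically through the built-in next function to fetch the iterable's contents one item at a time, and it catches the StopIteration exception to know when to stop. This is the complete iteration protocol, and it is also the link between iterables and iterators.
An example that mimics what a for loop does internally with a list: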

>>> L = [1, 2, 3]
>>> iter(L) is L
False
>>> L.__next__()
Traceback (most recent call last):
  File "<pyshell#74>", line 1, in <module>
    L.__next__()
AttributeError: 'list' object has no attribute '__next__'
>>> I = iter(L)  # built-in types such as list and dict supply their own iterator objects
>>> I.__next__()
1
>>> I.__next__()
2
>>> I.__next__()
3
>>> I.__next__()
Traceback (most recent call last):
  File "<pyshell#47>", line 1, in <module>
    I.__next__()
StopIteration
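
The two steps can also be written out as an explicit loop. The sketch below is an approximation of what a for loop does, not the interpreter's actual implementation; the list collected is a hypothetical name used only to record the items visited.

```python
L = [1, 2, 3]
collected = []

I = iter(L)              # step 1: obtain an iterator from the iterable
while True:
    try:
        x = next(I)      # step 2: fetch the next item
    except StopIteration:
        break            # the iterator is exhausted, so leave the loop
    collected.append(x)
    print(x)
```

This prints 1, 2, and 3 on separate lines, the same as for x in L: print(x).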

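The main differences between iterables and iterators are:

Whether values can only be fetched in order
Whether the contents can be read explicitly
Their role in the iteration process
The flexibility of the available operations
Whether multiple iterations are supported
Whether the data stays resident in memory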
…
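
One of these differences, support for multiple iterations, is easy to observe directly. The sketch below uses a list as the iterable; the variable names are hypothetical and chosen for illustration.

```python
L = [1, 2, 3]

# A list hands out a fresh iterator on every iter() call,
# so it can be traversed any number of times.
first_pass = list(L)
second_pass = list(L)

# An iterator is a single-pass object: once exhausted, it stays empty.
I = iter(L)
iter_first = list(I)
iter_second = list(I)

print(first_pass, second_pass)   # [1, 2, 3] [1, 2, 3]
print(iter_first, iter_second)   # [1, 2, 3] []

# Two iterators over the same list keep independent positions.
a, b = iter(L), iter(L)
next(a)
print(next(a), next(b))          # 2 1
```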

IV. Comprehensions

1. Introduction:

>>> L = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> for i in range(len(L)):	L[i] += 10
>>> L
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

>>> L = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> L = [x+10 for x in L]
>>> L
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
# Equivalent to:
>>> L = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> res = []
>>> for x in L: res.append(x + 10)
>>> res
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

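List comprehensions need less code and usually run faster than the equivalent for loop with append, because the iteration is carried out inside the interpreter rather than by explicit Python statements. The advantage tends to matter most on larger data sets, although the exact gain depends on the code and the Python version.
Whenever we need to apply an operation to every item in a sequence, a list comprehension is worth considering.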
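
The speed difference is best measured rather than assumed. Below is a minimal sketch using the standard timeit module; the function names, the data size, and the repeat count are arbitrary choices made for illustration, and the timings will differ from machine to machine.

```python
import timeit

data = list(range(10_000))


def with_loop():
    # Manual loop that appends one item at a time.
    res = []
    for x in data:
        res.append(x + 10)
    return res


def with_comprehension():
    # The same work as a single list comprehension.
    return [x + 10 for x in data]


# Both approaches must produce the same list.
assert with_loop() == with_comprehension()

loop_time = timeit.timeit(with_loop, number=200)
comp_time = timeit.timeit(with_comprehension, number=200)
print(f'for loop:      {loop_time:.4f} s')
print(f'comprehension: {comp_time:.4f} s')
```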
Example: stripping the line-ending control characters from each line

f = open('C:\\Users\\PC\\Desktop\\test.py', encoding='utf-8', mode='r')
lines = f.readlines()
print(lines)
['import time\n', '\n', "ite_1 = ['风', 111, time.time()]\n", "ite_2 = ('hello', 222, (2221, 'test'))\n", "ite_3 = 'abcdefg'\n", '\n', 'for i in ite_1:\n', '    print(i)\n']


Remove them by running the string rstrip method on each item:
lines = [line.rstrip() for line in open('C:\\Users\\PC\\Desktop\\test.py', encoding='utf-8', mode='r')]
print(lines)
['import time', '', "ite_1 = ['风', 111, time.time()]", "ite_2 = ('hello', 222, (2221, 'test'))", "ite_3 = 'abcdefg'", '', 'for i in ite_1:', '    print(i)']

Other uses:
lines = [('_' in line, line.replace('_', '-')) for line in open('C:\\Users\\PC\\Desktop\\test.py', encoding='utf-8', mode='r')]
print(lines)
[(False, 'import time\n'), (False, '\n'), (True, "ite-1 = ['风', 111, time.time()]\n"), (True, "ite-2 = ('hello', 222, (2221, 'test'))\n"), (True, "ite-3 = 'abcdefg'\n"), (False, '\n'), (True, 'for i in ite-1:\n'), (False, '    print(i)\n')]
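
The comprehensions above open the file without closing it explicitly. One common alternative is to wrap the comprehension in a with statement, sketched below. The temporary file and its three lines are assumptions made so the example is self-contained; they stand in for test.py.

```python
import os
import tempfile

# Create a small throwaway file so the sketch does not depend on test.py.
with tempfile.NamedTemporaryFile('w', suffix='.py', encoding='utf-8',
                                 delete=False) as tmp:
    tmp.write("import time\n\nite_3 = 'abcdefg'\n")
    path = tmp.name

# The with block closes the file as soon as the comprehension finishes.
with open(path, encoding='utf-8', mode='r') as f:
    lines = [line.rstrip() for line in f]

print(lines)      # ['import time', '', "ite_3 = 'abcdefg'"]
print(f.closed)   # True

os.remove(path)   # clean up the temporary file
```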

Of course, dictionaries and sets support comprehensions as well:

Quickly merging two lists into one dictionary
list1 = ['name', 'age', 'gender']
list2 = ['Tom', 20, 'man']
res_dict = {list1[i]: list2[i] for i in range(len(list1))}
print(res_dict)
{'name': 'Tom', 'age': 20, 'gender': 'man'}

Extracting the entries whose count meets a threshold
counts = {'MBP': 268, 'HP': 125, 'DELL': 201, 'Lenovo': 199, 'acer': 99}
count1 = {key: value for key, value in counts.items() if value >= 200}
print(count1)  # {'MBP': 268, 'DELL': 201}
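
Sets are mentioned above but not demonstrated, so here is a short sketch. It also shows zip as an alternative to indexing when pairing two lists; the name popular is hypothetical, while the other names reuse the data from the dictionary examples.

```python
list1 = ['name', 'age', 'gender']
list2 = ['Tom', 20, 'man']

# zip pairs the items directly, so no index arithmetic is needed.
res_dict = {key: value for key, value in zip(list1, list2)}
print(res_dict)   # {'name': 'Tom', 'age': 20, 'gender': 'man'}

# A set comprehension uses braces with a single expression instead of key: value.
counts = {'MBP': 268, 'HP': 125, 'DELL': 201, 'Lenovo': 199, 'acer': 99}
popular = {key.lower() for key, value in counts.items() if value >= 200}
print(sorted(popular))   # ['dell', 'mbp']
```

The set is printed through sorted() because sets have no guaranteed display order.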

2. Nested if:

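The for loop nested in a comprehension expression can have an associated if clause, which filters out the items for which the test is not true.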

Getting all lines that start with 'it'
lines = [line for line in open('C:\\Users\\PC\\Desktop\\test.py', encoding='utf-8', mode='r') if line[:2] == 'it']
print(lines)

["ite_1 = ['风', 111, time.time()]\n", "ite_2 = ('hello', 222, (2221, 'test'))\n", "ite_3 = 'abcdefg'\n"]

Counting the non-blank lines in the file
lens = len([line for line in open('C:\\Users\\PC\\Desktop\\test.py', encoding='utf-8', mode='r') if line.strip() != ''])
print(lens)

6

3. Nested for:

The full syntax allows any number of for clauses, and each for clause may have an optional associated if clause.

lens = [x + y for x in 'abc' for y in 'def']
print(lens)

['ad', 'ae', 'af', 'bd', 'be', 'bf', 'cd', 'ce', 'cf']
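
To illustrate for clauses that each carry their own if clause, the sketch below extends the example with two filters and compares it against the equivalent nested statements. The filter conditions and the names pairs and expanded are arbitrary choices made for illustration.

```python
# Comprehension with two for clauses, each with its own if clause.
pairs = [x + y
         for x in 'abc' if x != 'b'
         for y in 'def' if y != 'e']

# The same logic written as nested statements, in the same left-to-right order.
expanded = []
for x in 'abc':
    if x != 'b':
        for y in 'def':
            if y != 'e':
                expanded.append(x + y)

print(pairs)              # ['ad', 'af', 'cd', 'cf']
print(pairs == expanded)  # True
```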