Detailed explanation of Iterable, Iterator, Generator in Python
This article draws mainly on a video by Gao Tian, a programmer on Bilibili:
This article discusses iterables, iterators and generators in Python. Most Python programmers have heard of these three concepts, but may lack a deep understanding of them.
for loop
lst = [1, 2, 3]
for item in lst:
    # do something
    pass
The for loop is one of the most common constructs in Python. Even beginners who have studied Python for half a day can use it proficiently, because its semantics are easy to understand: for item in lst takes the items out of lst one by one and processes them. For a list, whose elements are arranged in order, this is very natural. But what about dictionaries, which are not indexed by position? Or more complex objects such as files? In Python all of these can be traversed with a for loop. How is this done? What is going on behind the for loop?
In fact, what happens behind the for loop (at a certain level of abstraction) is not complicated, and understanding it helps a great deal in understanding iterables, iterators and generators in Python.
The action behind the for loop
The action behind the for loop can be regarded as two steps:
- First call iter on lst to get an iterator:
    iterator = iter(lst)
- Then repeatedly call next on the iterator to take out the elements:
    item0 = next(iterator)
    item1 = next(iterator)
    # ...
  until a StopIteration exception is raised.
Here iter and next are two built-in functions of Python. They act on an iterable and on an iterator respectively: iter obtains an iterator from an iterable, and next fetches elements one by one from an iterator. In our example, lst is the iterable and iterator is the iterator it produces.
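The two steps above can be sketched as an explicit loop. This is only an approximation of what the interpreter does, and manual_for is a hypothetical helper name introduced for illustration:

```python
def manual_for(iterable):
    """Collect items in the order a for loop would visit them."""
    seen = []
    iterator = iter(iterable)        # step 1: obtain an iterator
    while True:
        try:
            item = next(iterator)    # step 2: fetch the next element
        except StopIteration:
            break                    # the iterator is exhausted
        seen.append(item)            # stands in for the loop body
    return seen


result_list = manual_for([1, 2, 3])
result_dict = manual_for({"a": 1, "b": 2})
print(result_list)  # [1, 2, 3]
print(result_dict)  # ['a', 'b'] (iterating a dict yields its keys)
```

The same helper works for a list and for a dictionary because it relies only on iter and next, not on indexing.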
So, back to our original question: how does the for loop traverse complex data structures such as dictionaries and file objects? How does Python know how to get the next element of the data structure?
This involves the magic methods __iter__ / __getitem__ and __next__, which must be implemented by iterables and iterators respectively.
Iterable objects
We have just mentioned that in for xxx in yyy, yyy must be an iterable. We can get an iterator by passing an iterable to the iter function. So how does iter get an iterator from an iterable? The answer is a magic method implemented on the iterable: __iter__ or __getitem__.
Taking __iter__ as an example, the method needs to return an iterator.
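The __getitem__ route is less common, so here is a minimal sketch of it. Squares is a hypothetical class made up for this illustration. When an object has no __iter__, iter falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised:

```python
class Squares:
    """Hypothetical iterable that defines only __getitem__, not __iter__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)   # signals the end of the sequence
        return index * index


squares = list(Squares(4))
print(squares)  # [0, 1, 4, 9]
```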
Iterators
After getting an iterator through iter, we can pass it to the next function to take out its elements one at a time. How are the elements extracted? That relies on the __next__ method implemented by the iterator object.
It should be noted that the official Python documentation states that an iterator should itself be iterable, that is, it must also implement the __iter__ method. This ensures that if we explicitly call iter on an iterable and get an iterator, that iterator can in turn be traversed by a for loop or passed to iter again. For example, in this situation:
lst = [1, 2, 3]
ite = iter(lst)
next(ite)
for item in ite:
    # do sth
    pass
If the iterator itself is not iterable, putting it into a for loop raises a TypeError, because it does not implement the __iter__ method. Of course, the __iter__ method we need to implement to make an iterator iterable is usually very simple. In most cases it only needs to return the iterator itself. That is:
def __iter__(self):
    return self
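To see the failure mode concretely, the following sketch uses a hypothetical BareCounter class that implements __next__ but deliberately omits __iter__. Calling next on it works, while a for loop does not:

```python
class BareCounter:
    """Hypothetical iterator with __next__ but no __iter__."""

    def __init__(self, limit):
        self.i = 0
        self.limit = limit

    def __next__(self):
        if self.i >= self.limit:
            raise StopIteration
        self.i += 1
        return self.i


counter = BareCounter(2)
first = next(counter)        # works: next only needs __next__

try:
    for item in counter:     # fails: the for loop calls iter(counter) first
        pass
    error = None
except TypeError as exc:
    error = exc

print(first)                 # 1
print(type(error).__name__)  # TypeError
```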
So far, we have seen which magic methods iterables and iterators need to implement, as well as how the two differ and how they are connected.
Iterator example
Let's take a linked list as an example and implement its iterator and its iterable:
class NodeIter:
    def __init__(self, node):
        self.curr_node = node

    def __next__(self):
        if self.curr_node is None:
            raise StopIteration
        node, self.curr_node = self.curr_node, self.curr_node.next
        return node

    def __iter__(self):
        return self

class Node:
    def __init__(self, name):
        self.name = name
        self.next = None

    def __iter__(self):
        return NodeIter(self)
Here, Node is an iterable: it can be traversed in a for loop, or its iterator can be obtained directly with iter. Its corresponding iterator is NodeIter, which takes out elements according to the __next__ method it implements until there are none left, and then raises StopIteration.
Note that to ensure the iterator NodeIter is also iterable, we implement the __iter__ method for it as well, which simply returns itself.
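A short usage sketch may make the difference between the iterable and its iterator more tangible. The two classes are repeated so that the snippet runs on its own. The variable names (head, it and so on) are illustrative only:

```python
class NodeIter:
    def __init__(self, node):
        self.curr_node = node

    def __next__(self):
        if self.curr_node is None:
            raise StopIteration
        node, self.curr_node = self.curr_node, self.curr_node.next
        return node

    def __iter__(self):
        return self


class Node:
    def __init__(self, name):
        self.name = name
        self.next = None

    def __iter__(self):
        return NodeIter(self)


head = Node("a")
head.next = Node("b")

# The iterable hands out a fresh NodeIter on every traversal.
first_pass = [n.name for n in head]
second_pass = [n.name for n in head]

# An iterator is consumed once; iter(it) returns the same object.
it = iter(head)
consumed = [n.name for n in it]
leftover = [n.name for n in it]

print(first_pass, second_pass)  # ['a', 'b'] ['a', 'b']
print(consumed, leftover)       # ['a', 'b'] []
```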
Generators
Generators may be a piece of syntax that many Python beginners are relatively unfamiliar with. In fact, a generator is a special kind of iterator.
from collections.abc import Iterator

def gen(num):
    while num > 0:
        yield num
        num -= 1
    return

g = gen(5)
print(isinstance(g, Iterator))
first = next(g)
print(first)
print('in for loop: ')
for i in g:
    print(i)
# Output:
# True
# 5
# in for loop:
# 4
# 3
# 2
# 1
The code above is an example of traversing a generator, in which gen is called a generator function and g is called a generator object. It can be used with next, in a for loop, and in the other ways introduced in the iterator section, because a generator is also a kind of iterator.
The following mainly introduces the differences between generators and general iterators.
It is easy to see that the generator function contains the yield keyword. Note that we also deliberately wrote a return statement. In an ordinary function, gen would obviously return None. However, when the Python interpreter sees that a function contains the yield keyword, it marks the function as a generator function. When a generator function is called, it does not run its body and return a value; instead it returns a generator object (g in this case).
When the generator object is passed to next, the body of its generator function actually runs. It runs until it reaches a yield statement and then returns the value after yield. The statements after the yield have not been executed yet, and this call to next has already returned, so they do not run at this point. The generator function has effectively been paused; the next time next is called, it resumes from that yield statement. This is why num in our example is decremented by one on each iteration.
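The pause-and-resume behaviour can be observed with a small sketch. The log list is an addition for illustration only and is not part of the original example:

```python
log = []   # records which parts of the generator body have run


def gen(num):
    log.append("body started")
    while num > 0:
        yield num
        log.append(f"resumed after yielding {num}")
        num -= 1


g = gen(2)
log_after_call = list(log)     # calling gen() runs none of the body

first = next(g)                # runs up to the first yield
log_after_first = list(log)

second = next(g)               # resumes right after that yield
log_after_second = list(log)

print(log_after_call)    # []
print(log_after_first)   # ['body started']
print(log_after_second)  # ['body started', 'resumed after yielding 2']
```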
As num keeps decreasing, the function eventually leaves the while loop and executes return. In a generator function, the return statement is equivalent to raising StopIteration in an iterator. Note that whether the return in the generator function returns None or some other value, next never returns that value; next only returns values produced by yield. If you really need the return value of the generator function, you have to catch the StopIteration exception and read the return value from its value attribute.
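A minimal sketch of retrieving the return value follows. The string "done" is an arbitrary value chosen for this illustration:

```python
def gen(num):
    while num > 0:
        yield num
        num -= 1
    return "done"   # arbitrary return value for the demonstration


g = gen(2)
values = [next(g), next(g)]    # only yielded values come back from next

try:
    next(g)                    # the generator body reaches return here
    returned = None
except StopIteration as stop:
    returned = stop.value      # the return value travels on the exception

print(values)    # [2, 1]
print(returned)  # done
```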
From the consumer's point of view, generators, being a special kind of iterator, are used in almost the same way as ordinary iterators. In terms of implementation, ordinary iterators save the current iteration state in instance attributes, whereas a generator's iteration state is kept in the stack frame of the function, that is, in the suspended running state of the function. Generators are usually more concise to implement than ordinary iterators; compare the generator example below with the iterator example above.
We said that generators are used in almost the same way as ordinary iterators. So what is the difference? Here we introduce an advanced usage of generators: send. Like next, the send method resumes the generator function and returns the next yielded value, but it also passes its argument into the generator as the value of the yield expression (yield xxx) at which the generator was paused. Inside the generator function, the value of the yield expression can be received and processed. This allows us to change the internal state of a generator while iterating over it by passing in values through send, and so to interact with the generator.
def gen(num):
    while num > 0:
        tmp = yield num
        if tmp is not None:
            num = tmp
        num -= 1

g = gen(5)
first = next(g)  # first = g.send(None)
print(f"first: {first}")
print(f"send: {g.send(10)}")
for i in g:
    print(i)
# Output:
# first: 5
# send: 9
# 8
# 7
# 6
# 5
# 4
# 3
# 2
# 1
Calling next(g) directly is equivalent to g.send(None). And if the generator function does not assign the value of the yield expression to a variable and act on it, any value passed to send is simply discarded. In that case, whatever xxx is in g.send(xxx), the call is equivalent to calling next directly.
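As a small check of this last point, the sketch below uses a hypothetical countdown generator that never captures the value of its yield expression, so whatever is sent has no effect:

```python
def countdown(num):
    while num > 0:
        yield num        # the value of the yield expression is not captured
        num -= 1


g = countdown(3)
a = next(g)              # start the generator
b = g.send("ignored")    # the sent value is discarded
c = g.send(None)         # same effect as next(g)

print(a, b, c)  # 3 2 1
```

Note that the generator is started with next before anything other than None is sent to it. Sending a non-None value to a generator that has not started yet raises a TypeError.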
Generator example
Again, take the implementation of the linked list as an example. Previously, we implemented the traversal through the NodeIter class, and Node, the Iterable, returned a NodeIter, the Iterator, when iter was called on it.
Here, we implement Node's __iter__ method directly as a generator function. When iter is called on a Node, a generator object is returned. We know that a generator object is a special kind of iterator, so of course it can be traversed normally. In this way, the generator gives us a more concise traversal of the linked list Node. Moreover, this is completely transparent to the user: the way it is called and traversed is exactly the same as with the previous NodeIter implementation.
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None

    def __iter__(self):
        node = self
        while node is not None:
            yield node
            node = node.next

node1 = Node('node1')
node2 = Node('node2')
node3 = Node('node3')
node1.next = node2
node2.next = node3
for node in node1:
    print(node.name)