Do you really understand Python iterators and generators?

Introduction
Suppose we have a list a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and need to add 1 to each element. How can we do that?
Way 1.

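for i in range(len(a)):
    a[i] += 1  # assign through the index; "for i in a: i += 1" only rebinds i and leaves a unchanged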

Way 2.

a = list(map(lambda x: x + 1, a))  # in Python 3, map() returns a lazy iterator, so wrap it in list()

Way 3.

a = [i + 1 for i in a]

The list comprehension in Way 3 builds the whole list at once. A list is held entirely in memory, so its size is limited by the memory available: a list with 1 million elements takes up a lot of space, and if we only need the first few elements, the space used by the rest is wasted.
So, if the elements can be computed by some algorithm, can we compute each one as the loop reaches it? Then there is no need to build the complete list, which saves a lot of space. In Python, this compute-as-you-loop mechanism is called a generator.
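
To make the memory argument concrete, here is a minimal sketch that compares a fully built list with a generator over the same values. The size n and the variable names are illustrative assumptions, and the exact byte counts depend on the Python build.

```python
import sys

n = 1_000_000  # illustrative size, matching the "1 million elements" mentioned above

full_list = [i + 1 for i in range(n)]  # every element is computed and stored up front
lazy_gen = (i + 1 for i in range(n))   # nothing is computed yet

# getsizeof reports the container's own size, not the integers it refers to,
# so the list's real footprint is even larger than the number shown.
print(sys.getsizeof(full_list))  # several megabytes on 64-bit CPython
print(sys.getsizeof(lazy_gen))   # small, and the same for any n

# Only the elements actually requested are ever computed.
first_three = [next(lazy_gen) for _ in range(3)]
print(first_three)  # [1, 2, 3]
```

The generator stores only its current position and the rule for producing the next value, which is why its size does not grow with n.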

Concept introduction

  • Generator
    A generator is a special kind of routine that controls the iterative behavior of a loop. In Python, a generator is a type of iterator created by a function that uses yield to produce its values. Execution pauses at each yield and resumes when next() or send() is called on the generator. In layman's terms, a generator resembles a function whose return value is an array. It accepts parameters and can be called, but where an ordinary function returns an array containing all the results at once, a generator produces one value at a time. This greatly reduces memory use (we often do not need all the results immediately, so returning them all at once wastes memory) and lets the caller start processing the first few values right away. So a generator looks like a function but behaves like an iterator.
  • Iterator
    An iterator is an object that steps through a sequence of values. It implements the __next__() method, which returns the next item while items remain and raises StopIteration once they run out.
    The following kinds of objects can be used directly in a for loop:
    (1) Collection data types: list, dict, tuple, set, str, etc.
    (2) Generators, including generator expressions and generator functions that use yield

The objects above, which can be used directly in a for loop, are collectively called iterable objects (Iterable). isinstance() can be used to check whether an object is Iterable.

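from collections.abc import Iterable  # the alias in collections itself was removed in Python 3.10

print(isinstance([], Iterable))  # list
print(isinstance({}, Iterable))  # dict
print(isinstance('python', Iterable))  # str
print(isinstance((x for x in range(10)), Iterable))  # generator expression
print(isinstance(100, Iterable))  # int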

output:

True
True
True
True
False

Generators in Python

There are several ways to create a generator in Python. The simplest is to change the [ ] around the list comprehension in Way 3 above to ( ), which gives a generator expression.

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list_a = [x + 1 for x in a]
print("list_a =", list_a)
generator_a = (x + 1 for x in a)
print("generator_a =", generator_a)

output:

list_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
generator_a = <generator object <genexpr> at 0x00000230BEF5D048>

So what is the difference between list_a and generator_a? On the surface, only [ ] versus ( ), but the results differ. The first prints as a list (because it is a list comprehension), while the second prints as <generator object <genexpr> at 0x00000230BEF5D048>. So how do we print each element of generator_a?
If you want to print the elements of generator_a one by one, you can get the next element with the next() function.

print(next(generator_a))
print(next(generator_a))
print(next(generator_a))
print(next(generator_a))
print(next(generator_a))
print(next(generator_a))
print(next(generator_a))
print(next(generator_a))
print(next(generator_a))
print(next(generator_a))
print(next(generator_a))  # 11th call: the generator is exhausted, so this raises StopIteration

output:

1
2
3
4
5
6
7
8
9
10
Traceback (most recent call last):
  File "generator_blog.py", line 42, in <module>
    print(next(generator_a))
StopIteration

As you can see, a generator stores the algorithm rather than the values. Each call to next(generator_a) computes the next element, and once there are no more elements, StopIteration is raised. Calling next() over and over like this is a bad habit. The right way is to use a for loop, because a generator is also an iterable object (Iterable):

generator_a = (x + 1 for x in a)  # create it again: the next() calls above used up the earlier one
for i in generator_a:
    print(i)

output:

1
2
3
4
5
6
7
8
9
10
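
A generator can be consumed only once, which is easy to trip over when mixing next() calls and for loops. The sketch below illustrates this with an assumed variable name gen; it is an illustration of the behavior, not part of the original example.

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
gen = (x + 1 for x in a)

first_pass = list(gen)   # consumes every element
second_pass = list(gen)  # the generator is already exhausted

print(first_pass)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(second_pass)  # []

# To iterate again, build a new generator from the same expression.
gen = (x + 1 for x in a)
total = sum(gen)
print(total)  # 55
```

An exhausted generator does not raise an error inside a for loop or list(); it simply produces nothing, so a silently empty result is the usual symptom.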

So once we have a generator, we rarely call next() directly; we iterate with a for loop and do not need to handle StopIteration ourselves. Generators are very powerful. If the algorithm is too complicated to write as a for loop in the style of a list comprehension, it can be written as a function instead. Here is an example.

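def dfs(floors):
    # Yields the first `floors` Fibonacci numbers: 1, 1, 2, 3, 5, ...
    n = 0
    a, b = 0, 1
    while n < floors:
        yield b  # pause here and hand b to the caller
        a, b = b, a + b
        n += 1
    return 'Mission Complete'  # carried by the StopIteration raised when the generator ends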

a = dfs(10)
print(a)
print(a.__next__())
print(a.__next__())
print(a.__next__())
print(a.__next__())
print(a.__next__())

output:

<generator object dfs at 0x0000023A21A34FC0>
1
1
2
3
5

This shows the generator's execution flow: the body runs only when __next__() is called and pauses at the yield statement, and the next __next__() call resumes from where it paused. Each step builds on the previous one and produces only as many values as we ask for, instead of handing back all the results at once, which keeps memory use low.
As mentioned above, we rarely call next() to fetch results one by one. Because a generator is an iterable object, we use it in a for loop.

for i in dfs(6):
	print(i)

output:

1
1
2
3
5
8
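
The for loop never shows the string that the generator function returns. A minimal sketch of where that value goes, using a hypothetical fib_gen function that mirrors dfs: the return value is attached to the StopIteration exception as its value attribute.

```python
def fib_gen(count):
    """Hypothetical stand-in for dfs above: yields `count` Fibonacci numbers."""
    a, b = 0, 1
    for _ in range(count):
        yield b
        a, b = b, a + b
    return 'Mission Complete'

gen = fib_gen(3)
values = []
result = None
while True:
    try:
        values.append(next(gen))
    except StopIteration as stop:
        result = stop.value  # the generator's return value travels on the exception
        break

print(values)  # [1, 1, 2]
print(result)  # Mission Complete
```

A for loop catches StopIteration internally and discards this value, so manual iteration like this is needed whenever the return value matters.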

A lot has been said above about Python generators. To sum up, Python provides two basic ways to create generators.

  • Generator function
    This is the approach used in the dfs example above. A generator function produces a sequence of values over time. An ordinary function returns a value and exits, but a generator function suspends itself and later resumes where it left off. The yield keyword suspends the function, returns a value to the caller, and preserves enough state for the function to continue. Generators are closely tied to the iteration protocol: an iterator has a __next__() method that either returns the next item or raises StopIteration to end the iteration.
# Once a function contains yield, calling it (function name + ()) returns a generator
# return in a generator ends the generator; the pending next() call raises StopIteration
# next() wakes the generator up and resumes execution
# send() wakes the generator up, resumes execution, and passes a value into it
'''Generator'''
 
def create_counter(n):
    print("create_counter")
    while True:
        yield n
        print("increment n")
        n +=1
 
gen = create_counter(2)
print(gen)
print(next(gen))
print(next(gen))

output:

<generator object create_counter at 0x0000023A1694A938>
create_counter
2
increment n
3

From the order of the output results, we can understand the working mechanism of yield .
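
The comments above mention send() but the example only uses next(). As a small sketch, the hypothetical running_total generator below shows how a value passed to send() becomes the result of the paused yield expression inside the generator.

```python
def running_total():
    """Hypothetical example: keeps a running sum of the values sent in."""
    total = 0
    while True:
        received = yield total  # pause; resume with whatever send() passes in
        if received is not None:
            total += received

acc = running_total()
# The first next() runs the body up to the first yield, which is required
# before send() can deliver a value. A later next() behaves like send(None).
outputs = [next(acc), acc.send(5), acc.send(10), next(acc)]
print(outputs)  # [0, 5, 15, 15]
acc.close()     # stop the infinite generator explicitly
```

Calling send() with a non-None value on a generator that has not started yet raises TypeError, which is why the first call here is next().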

  • Generator expression
    This is the approach we used with the list comprehension above. Generator expressions combine iteration with list comprehension syntax: they look like list comprehensions but use parentheses instead of square brackets.
    A generator can be written either as a generator function or as a generator expression, and both support automatic iteration (a for loop) and manual iteration (next()). A generator supports only one active iteration, because the generator's iterator is the generator itself.
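
The "one active iteration" point can be checked directly. In this sketch (variable names are illustrative), iter() on a generator returns the generator itself, while iter() on a list returns a new, independent iterator each time.

```python
squares_list = [x * x for x in range(3)]
squares_gen = (x * x for x in range(3))

# A generator is its own iterator; a list hands out a new iterator each time.
gen_is_own_iterator = iter(squares_gen) is squares_gen
list_shares_iterator = iter(squares_list) is iter(squares_list)
print(gen_is_own_iterator)   # True
print(list_shares_iterator)  # False

# Two "iterators" over the same generator share one position.
g1, g2 = iter(squares_gen), iter(squares_gen)
shared = [next(g1), next(g2), next(g1)]
print(shared)  # [0, 1, 4]

# Two iterators over the same list advance independently.
l1, l2 = iter(squares_list), iter(squares_list)
independent = [next(l1), next(l2), next(l1)]
print(independent)  # [0, 0, 1]
```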

Iterators in Python

As we said above, generators are Iterator objects. Although list, dict, and str are Iterable, they are not Iterators; they can be converted into Iterators with the iter() function.

from collections.abc import Iterator

print(isinstance(iter([]), Iterator))
print(isinstance(iter('abc'), Iterator))

output:

True
True

Why are list, dict, str and similar types Iterable but not Iterator? Because a Python Iterator represents a data stream: it can be passed to next() to return the next item again and again, until StopIteration is raised when no data remains. The stream can be viewed as an ordered sequence whose length we cannot know in advance; we can only compute the next item on demand through next(). An Iterator is therefore lazy: each item is computed only when it is requested.
An Iterator can even represent an infinite data stream, such as all natural numbers, which a list could never store.
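
As an illustration of an infinite data stream, here is a minimal class-based iterator. The class name NaturalNumbers and the choice to start at 0 are assumptions made for this sketch; itertools.islice is used to take a finite slice of the stream.

```python
from itertools import islice

class NaturalNumbers:
    """Hypothetical iterator over 0, 1, 2, ... with no end."""

    def __init__(self):
        self.current = 0

    def __iter__(self):
        return self  # an iterator returns itself from __iter__

    def __next__(self):
        value = self.current
        self.current += 1
        return value  # never raises StopIteration, so the stream is infinite

numbers = NaturalNumbers()
first_five = list(islice(numbers, 5))
sixth = next(numbers)
print(first_five)  # [0, 1, 2, 3, 4]
print(sixth)       # 5: the iterator remembers its position
```

Only one integer of state is stored however far the iteration goes. Passing numbers straight to list() would never finish, which is why a bounded slice is taken here.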

Summary:

All objects that can be used in a for loop are Iterable;
all objects that can be passed to next() are Iterators, which represent a lazily computed sequence;
collection data types such as list, dict, and str are Iterable but not Iterator, though an Iterator can be obtained from them with iter().

In Python 3, a for loop essentially works by calling iter() once and then calling next() repeatedly. Example:

for x in [1, 2, 3, 4, 5]:
    pass

This loop is equivalent to the following program:

# First, get an Iterator object:
it = iter([1, 2, 3, 4, 5])
# Loop:
while True:
    try:
        # Get the next value:
        x = next(it)
    except StopIteration:
        # Exit the loop when StopIteration is raised
        break

Finally, a summary of yield

  • In the usual for...in... loop, in is followed by an iterable object such as a list; strings and files are iterable as well. It can be a = [1,2,3] or a = [x*x for x in range(3)]. The drawback of a list is obvious: all the data sits in memory, so a large amount of data consumes a lot of memory;
  • A generator is iterable, but it can only be iterated over once, because each value is produced only when it is requested, as in a = (x*x for x in range(3)). Note the parentheses here instead of square brackets: this is the comprehension-style way of creating a generator;
  • The key to a generator being iterable is that it has a __next__() method; iteration works by calling it repeatedly until StopIteration is raised;
  • A function containing yield is no longer an ordinary function but a generator function; calling it returns a generator that can be used for iteration;
  • yield is a keyword similar to return: when an iteration reaches yield, the value to the right of yield is returned. The next iteration resumes from the code right after that yield;
  • In short, yield returns a value the way return does, but it also remembers where it returned from, and the next iteration starts from that position.

Origin blog.csdn.net/Just_do_myself/article/details/118569039