Magical iterators and generators in Python




Given strings s and t, determine whether s is a subsequence of t.

 

A subsequence of a string is a new string formed by deleting zero or more characters from the original string without changing the relative order of the remaining characters. For example, "ace" is a subsequence of "abcde", but "aec" is not.

The conventional solution is a greedy algorithm with two pointers, one at the start of each string. We sweep through the second string t; whenever the current character of t equals the character under the first pointer, the first pointer moves forward. If the first pointer moves past the last character of s, every character of s has been matched in order and the answer is True; otherwise it is False.

Written out in the usual way, this takes about eight lines:

def is_subsequence(s: str, t: str) -> bool:
    n, m = len(s), len(t)
    i = j = 0
    while i < n and j < m:
        if s[i] == t[j]:
            i += 1
        j += 1
    return i == n

print(is_subsequence("ace", "abcde"))
print(is_subsequence("aec", "abcde"))

But if we use iterators and generators, the amount of code will be greatly reduced:

def is_subsequence(s: str, t: str) -> bool:
    t = iter(t)
    return all(i in t for i in s)
 
print(is_subsequence("ace", "abcde"))
print(is_subsequence("aec", "abcde"))

Both versions print the same result:

True
False

If you are not familiar with how iterators and generators work in Python, though, the short version probably looks confusing.

Don't worry: this article walks through Python iterators and generators step by step.

Contents of this article

  • Iterators and iterable objects
  • List comprehensions and generator expressions
  • Generator functions
  • The relationship between iterators and generators
  • Detailed explanation of using a generator to judge a subsequence
  • Summary

Iterators and iterable objects

Everything in Python is an object. A class is the abstraction of an object, and a container is a collection of objects.

Lists (list: [0, 1, 2]), tuples (tuple: (0, 1, 2)), dictionaries (dict: {0: 0, 1: 1, 2: 2}) and sets (set: set([0, 1, 2])) are all containers. A container can be thought of as a unit that holds multiple elements together; different containers differ in their internal data structure.

All of these built-in containers are iterable objects (iterables):

from collections.abc import Iterable
params = [
    1234,
    '1234',
    [1, 2, 3, 4],
    set([1, 2, 3, 4]),
    {1: 1, 2: 2, 3: 3, 4: 4},
    (1, 2, 3, 4)
]

for param in params:
    print(f'Is {param!r} an iterable? ', isinstance(param, Iterable))

Output:

Is 1234 an iterable?  False
Is '1234' an iterable?  True
Is [1, 2, 3, 4] an iterable?  True
Is {1, 2, 3, 4} an iterable?  True
Is {1: 1, 2: 2, 3: 3, 4: 4} an iterable?  True
Is (1, 2, 3, 4) an iterable?  True

As the output shows, all of these containers are iterable, and so is a string; only the plain integer is not.
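
Another way to see the same thing is to ask iter() directly, since it raises TypeError for an object that cannot be iterated. The sketch below is only an illustration; can_iterate is a helper name made up for this example, not part of the standard library.

```python
def can_iterate(obj):
    """Hypothetical helper: True if iter() accepts obj, False otherwise."""
    try:
        iter(obj)
    except TypeError:
        # iter() raises TypeError for objects that are not iterable
        return False
    return True


print(can_iterate(1234))          # False
print(can_iterate("1234"))        # True
print(can_iterate([1, 2, 3, 4]))  # True
```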

Passing an iterable to the iter() function returns an iterator. The iterator is itself an iterable object:

from collections.abc import Iterable, Iterator
params = [
    '1234',
    [1, 2, 3, 4],
    set([1, 2, 3, 4]),
    {1: 1, 2: 2, 3: 3, 4: 4},
    (1, 2, 3, 4)
]

for param in params:
    param = iter(param)
    print("----------")
    print(f'Is {param} an iterable? ', isinstance(param, Iterable))
    print(f'Is {param} an iterator? ', isinstance(param, Iterator))

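Running this prints True for both checks on every object: whatever iter() returns is an iterator, and every iterator is also an iterable.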

This means that an iterator can also return an iterator for itself, for example:

l = [1, 2, 3, 4]
for i in iter(l):
    print(i, end=",")
print()
for i in iter(iter(l)):
    print(i, end=",")

Output:

1,2,3,4,
1,2,3,4,

An iterator provides a __next__ method. Calling it either returns the next object in the container or raises a StopIteration exception:

l = [1, 2, 3, 4]
l = iter(l)
while True:
    print(l.__next__())

result:

1
2
3
4
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-16-e106f3a7bd73> in <module>()
      2 l = iter(l)
      3 while True:
----> 4     print(l.__next__())

StopIteration: 

Of course, l.__next__() above is better written as next(l); the built-in next() function simply calls the __next__() method of the object passed to it.
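
As a small illustration of this, the sketch below calls the built-in next() directly. It also uses the optional second argument of next(), a default that is returned instead of raising StopIteration; the variable names and the "done" placeholder are chosen only for this example.

```python
numbers = iter([1, 2])

first = next(numbers)         # the usual spelling
second = numbers.__next__()   # works too, but next(numbers) is preferred

# The iterator is now exhausted; passing a default avoids StopIteration.
third = next(numbers, "done")

print(first, second, third)   # 1 2 done
```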

In fact, this for loop:

l = [1, 2, 3, 4]
for i in l:
    print(i)

is essentially equivalent to:

l = [1, 2, 3, 4]
l_iter = iter(l)
while True:
    try:
        i = next(l_iter)
    except StopIteration:
        break
    print(i)

The for ... in statement performs these steps implicitly.
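
To make the protocol concrete, here is a minimal sketch of a hand-written iterator. The Countdown class is hypothetical and exists only for this example; it implements the two methods a for loop relies on, __iter__ and __next__.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            # Signals the for loop (or any other consumer) to stop.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


result = [n for n in Countdown(3)]
print(result)  # [3, 2, 1]
```

Because Countdown defines both methods, it works anywhere an iterable is expected, including for loops, list() and sum().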

List comprehensions and generator expressions

A list comprehension is a simple but powerful built-in Python construct for creating lists.

print([x * x for x in range(1, 11)])
print([x * x for x in range(1, 11) if x % 2 == 0])

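# Two nested loops produce every combination of the two sequences:
print([m + n for m in 'ABC' for n in 'XYZ'])
print([str(x)+str(y) for x in range(1,6) for y in range(11,16)])

# A for clause can bind two or more variables at once; for example,
# dict.items() lets you iterate over key and value together: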
d = {'x': 'A', 'y': 'B', 'z': 'C' }
print([k + '=' + v for k, v in d.items()])

result:


[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
[4, 16, 36, 64, 100]
['AX', 'AY', 'AZ', 'BX', 'BY', 'BZ', 'CX', 'CY', 'CZ']
['111', '112', '113', '114', '115', '211', '212', '213', '214', '215', '311', '312', '313', '314', '315', '411', '412', '413', '414', '415', '511', '512', '513', '514', '515']
['x=A', 'y=B', 'z=C']

A list comprehension creates the whole list at once. Memory is finite, however, so the size of a list is limited. Creating a list of 1 million elements takes up a lot of space, and if we only need the first few elements, the space occupied by the rest is wasted.

So if the elements can be computed by some rule, can we compute each one only when the loop reaches it? That removes the need to build the complete list and saves a great deal of space. In Python, this compute-as-you-loop mechanism is called a generator.

To create a generator, just change the [] of a list comprehension to (). The result is called a generator expression:

g = (x * x for x in range(10))

A generator stores the algorithm rather than the values. Each call to next(g) computes the next element of g. Once the last element has been produced and there are no more, next(g) raises StopIteration.
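
The short sketch below steps through a small generator expression by hand to show this behaviour. It uses range(3) instead of range(10) only to keep the output short.

```python
g = (x * x for x in range(3))

print(next(g))  # 0
print(next(g))  # 1
print(next(g))  # 4

# A fourth call has nothing left to compute.
try:
    next(g)
    exhausted = False
except StopIteration:
    exhausted = True

print(exhausted)  # True
```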

An example shows the advantage of a generator expression over a list comprehension. First, define a helper that reports the current memory usage (psutil is a third-party package and must be installed separately):

import os
import psutil

# Show the memory used by the current Python process
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)

    info = p.memory_full_info()
    memory = info.uss / 1024. / 1024
    print(f'{hint} memory used: {memory} MB')

Test the list comprehension:

def test_iterator():
    show_memory_info('initing iterator')
    list_1 = [i for i in range(100000000)]
    show_memory_info('after iterator initiated')
    print(sum(list_1))
    show_memory_info('after sum called')

# %time is an IPython/Jupyter magic command
%time test_iterator()

result:

initing iterator memory used: 48.69140625 MB
after iterator initiated memory used: 3936.2890625 MB
4999999950000000
after sum called memory used: 3936.29296875 MB
Wall time: 9.39 s

Test the generator expression:

def test_generator():
    show_memory_info('initing generator')
    list_2 = (i for i in range(100000000))
    show_memory_info('after generator initiated')
    print(sum(list_2))
    show_memory_info('after sum called')

# %time is an IPython/Jupyter magic command
%time test_generator()

result:

initing generator memory used: 48.8515625 MB
after generator initiated memory used: 48.85546875 MB
4999999950000000
after sum called memory used: 49.11328125 MB
Wall time: 7.95 s

Building the list is simple: [i for i in range(100000000)] creates a list of 100 million elements. Every element is kept in memory once it has been generated, and as the output above shows, together they take up a huge amount of memory (close to 4 GB here). If there is not enough memory, an out-of-memory (OOM) error occurs.

However, we do not need to hold all of those values in memory at once. To sum the elements, we only need to know each element at the moment it is added, and can discard it afterwards. This is what the generator version does, which is why its memory usage barely changes.
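
If psutil is not available, a rougher comparison can be made with the standard library alone. The sketch below swaps in sys.getsizeof, which is a different measurement from the process-level numbers above: it reports only the size of the list object itself (its array of references), not the integers it points to. It is still enough to show that the generator object stays tiny no matter how many values it will produce. The size n is reduced to one million here so the example runs quickly.

```python
import sys

n = 1_000_000
as_list = [i for i in range(n)]
as_gen = (i for i in range(n))

# Size of the container objects themselves, in bytes.
list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(list_size > gen_size)  # True

# Both produce the same sum; the generator never holds all values at once.
total_list = sum(as_list)
total_gen = sum(as_gen)
print(total_list == total_gen)  # True
```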

Generator functions

When the rule for computing elements is too complicated to express as a for loop inside a comprehension, you can use a function instead.

Take the famous Fibonacci sequence: apart from the first and second numbers, every number is the sum of the two numbers before it:

1, 1, 2, 3, 5, 8, 13, 21, 34, ...

The Fibonacci sequence cannot easily be written as a list comprehension, but printing it with a function is easy:

def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        print(b)
        a, b = b, a + b
        n = n + 1

fib(6)

Print result:

1
1
2
3
5
8

The function above is only one step away from a generator. Just change print(b) to yield b, and fib becomes a generator function:

def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1

This is another way to define a generator, in addition to generator expressions.

If a function definition contains the yield keyword, it is no longer an ordinary function but a generator function. Calling it does not run the body; it returns a generator object:

fib(6)

result:

<generator object fib at 0x0000000005F04A98>

Like any other iterator, a generator can be traversed with a for loop:

for i in fib(6):
    print(i)

Print result:

1
1
2
3
5
8
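
Because each value is computed only when it is requested, a generator does not even need an end. The sketch below is a hypothetical unbounded variant of fib (the name fib_forever is made up for this example), combined with itertools.islice from the standard library to take just the first six values.

```python
from itertools import islice


def fib_forever():
    """Hypothetical unbounded variant of fib(): never stops on its own."""
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b


# islice stops asking after six values, so the infinite loop is never a problem.
first_six = list(islice(fib_forever(), 6))
print(first_six)  # [1, 1, 2, 3, 5, 8]
```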

The hardest part to understand is that a generator function's execution flow differs from that of an ordinary function. An ordinary function runs from top to bottom and returns when it reaches a return statement or the end of its body. A generator function runs each time next() is called, pauses and hands back a value when it reaches a yield statement, and on the next call resumes from the yield where it last paused.

As a simple example, define a generator to return the numbers 1, 3, and 5 in turn:

def odd():
    print('step 1')
    yield 1
    print('step 2')
    yield 3
    print('step 3')
    yield 5

To use the generator function, first call it to create a generator object, then call next() repeatedly to obtain each value:

o = odd()
while True:
    print(next(o))

result:

step 1
1
step 2
3
step 3
5
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-7-554c5fb505f8> in <module>()
      1 o = odd()
      2 while True:
----> 3     print(next(o))

StopIteration: 

As you can see, odd is not an ordinary function but a generator function. Execution pauses at each yield and resumes from there on the next call. After three yields there is nothing left to yield, so the fourth call to next() raises StopIteration.

In a generator function, reaching a return statement ends the generator (falling off the end of the function body implicitly executes return None). The generator then raises StopIteration, which is what ends a for loop over it.
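
A minimal sketch of an early return inside a generator function follows. The function yield_until_negative is a hypothetical example, not something from the article's code: it stops producing values as soon as it sees a negative number.

```python
def yield_until_negative(values):
    """Hypothetical generator: yields values until a negative number appears."""
    for v in values:
        if v < 0:
            # return ends the generator; the consumer sees StopIteration.
            return
        yield v


collected = list(yield_until_negative([3, 1, -1, 7]))
print(collected)  # [3, 1]
```

The 7 is never reached, because the return on -1 ends the generator before the loop gets there.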

The relationship between iterators and generators

In fact, a generator is a special kind of iterator: every generator is an iterator, but not every iterator is a generator. Both return the next object through next(), and both remember the position they have reached.

For example:

l = [1, 2, 3, 4]
l_iter = iter(l)

The iterator above can be thought of as this generator expression:

l = [1, 2, 3, 4]
l_iter = (i for i in l)

It can also be thought of as this generator function:

l = [1, 2, 3, 4]

def func_generator(l):
    for i in l:
        yield i

l_iter = func_generator(l)
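
The three objects behave the same when iterated, but only two of them are actually generators. The sketch below checks this with the Iterator and Generator abstract base classes from collections.abc; the dictionary keys are just labels chosen for this example.

```python
from collections.abc import Generator, Iterator

l = [1, 2, 3, 4]


def func_generator(items):
    for i in items:
        yield i


candidates = {
    "iter(l)": iter(l),
    "generator expression": (i for i in l),
    "generator function": func_generator(l),
}

# For each object record (is it an iterator?, is it a generator?).
checks = {
    name: (isinstance(obj, Iterator), isinstance(obj, Generator))
    for name, obj in candidates.items()
}

for name, result in checks.items():
    print(name, result)
```

All three are iterators, but the plain list iterator returned by iter(l) is not a generator.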

Detailed explanation of using generator to judge subsequence

With this background, the code at the beginning of the article should be much easier to read. Let's go back to it and analyze it in detail:

def is_subsequence(s: str, t: str) -> bool:
    t = iter(t)
    return all(i in t for i in s)
 
print(is_subsequence("ace", "abcde"))
print(is_subsequence("aec", "abcde"))

First, t = iter(t) can be thought of as producing a generator:

t = (i for i in t)

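And i in t, when t is an iterator, behaves roughly like the following helper (contains is a name used here only for illustration). It keeps pulling values from t until one matches, and returns False if t runs out first:

def contains(t, i):
    for val in t:          # each step consumes one value from t
        if val == i:
            return True    # stop right after the match
    return False           # t was exhausted without a match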

A quick test:

t = "abcde"
t = (i for i in t)
print('a' in t)
print('c' in t)
print(next(t))

result:

True
True
d

The last line prints 'd', the value that comes right after the matched 'c': the two membership tests consumed everything up to and including 'c'.

So let's test again, this time looking for 'b' after 'c' has already been matched:

t = "abcde"
t = (i for i in t)
print('a' in t)
print('c' in t)
print('b' in t)

result:

True
True
False
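
The reason for the final False is easier to see by looking at what is left in the iterator after a successful search. The sketch below is a small illustration using the same string; the variable names are chosen only for this example.

```python
t = iter("abcde")

found_c = "c" in t     # consumes 'a', 'b' and 'c'
remaining = list(t)    # whatever the search did not consume
print(found_c, remaining)  # True ['d', 'e']

found_b = "b" in t     # t is exhausted now, so nothing can match
print(found_b)         # False
```

Since 'b' was consumed while searching for 'c', a later search for 'b' can never succeed. This is exactly the ordering constraint that a subsequence check needs.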

We can therefore use the iterator to check, one character at a time, whether each element of the candidate subsequence can still be found:

t = iter("abcde")
print([i in t for i in "aec"])

result:

[True, True, False]

The all() function can determine whether all conditions are met:

print(all([True, True, False]), all([True, True, True]))

result:

False True

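Note that the original code writes all(i in t for i in s), not the list-comprehension form all([i in t for i in s]). Without the square brackets, all() is applied to a generator expression, so each i in t test is evaluated only when all() asks for it, and all() stops at the first False.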
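
To observe this early exit, the sketch below wraps the membership test in a small function that records which characters were actually tested. Both is_subsequence_traced and check are hypothetical names introduced for this illustration; the logic is otherwise the same as is_subsequence.

```python
def is_subsequence_traced(s, t):
    """Hypothetical traced variant: also returns the characters of s that were tested."""
    t = iter(t)
    tested = []

    def check(ch):
        tested.append(ch)   # remember that this character was tested
        return ch in t      # same membership test as in is_subsequence

    # all() pulls from the generator expression and stops at the first False.
    return all(check(ch) for ch in s), tested


print(is_subsequence_traced("ace", "abcde"))   # (True, ['a', 'c', 'e'])
print(is_subsequence_traced("axce", "abcde"))  # (False, ['a', 'x'])
```

In the second call, 'c' and 'e' are never tested, because the search for 'x' already failed.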

Summary

At this point we have solved the problem quite elegantly. Be careful, though: think twice before using this trick in production code, because your team lead and colleagues may not be familiar with how generators behave, and even with detailed comments the code may be hard for them to follow. The conventional version is often the better choice there. After today, I hope you understand generators better than most.

This article covered four kinds of objects: containers, iterables, iterators, and generators:

  • A container is an iterable, and calling iter() on an iterable returns an iterator.
  • An iterator returns its next element through the next() function, which is what makes traversal possible.
  • A generator is a special kind of iterator (but an iterator is not necessarily a generator).


Origin blog.csdn.net/pythonxuexi123/article/details/112893871