Python series: iterators and generators

Many Python programmers confuse the concepts and uses of iterators and generators and can't tell the two apart. Today, let's talk about these two concepts.

Iterator

Iterator Pattern

The Iterator is a design pattern that provides a way to access the elements of an aggregate object sequentially without exposing its internal implementation. It is a lazy way of getting data: we don't need to load all the data into memory at once, which avoids trouble when the data set is too large to fit in memory.
A typical application scenario: reading a large file and analyzing the keywords on each line.

The simplest form of the iterator pattern, expressed as an interface, contains two methods:

  1. next() returns the next element
  2. hasNext() returns whether there is a next element

An object that implements these two methods is an iterator.
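To make the pattern concrete, here is a minimal sketch of an object that exposes the two methods above. The class name ListIterator and the method names has_next() and next_item() are hypothetical, chosen only for illustration; this is the generic design pattern, not the way Python itself spells iteration.

```python
class ListIterator:
    """Hypothetical iterator-pattern object over a list (not Python's protocol)."""

    def __init__(self, items):
        self._items = items  # the aggregate whose internals we hide
        self._pos = 0        # position of the next element to return

    def has_next(self):
        # True while there are elements left to visit
        return self._pos < len(self._items)

    def next_item(self):
        # Return the current element and advance the cursor
        item = self._items[self._pos]
        self._pos += 1
        return item


it = ListIterator(["a", "b", "c"])
while it.has_next():
    print(it.next_item())
```

The caller never touches the underlying list; it only asks "is there more?" and "give me the next one".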

Iterators in Python

Python programmers often overlook the difference between an iterator and an iterable object.

In fact, it is important to tell the two apart clearly.

Iterable Object

An iterable is an object that can return its data elements one at a time.

For example:

In [1]: a = [1, 2, 3, 4, 5]

In [2]: for i in a:
   ...:     print(i)
   ...:
1
2
3
4
5

In [3]: b = {"first":1, "second":2, "third":3}

In [4]: for i in b:
   ...:     print(i)
   ...:
first
second
third

The code above prints every element of the list and every key of the dict by iterating over them (since Python 3.7, dicts iterate in insertion order). That is why lists and dicts are called iterables (not iterators).

In Python, all collections are iterable. Within the language, iteration underpins the following operations:

  • for loop
  • Traverse files and directories
  • List comprehensions, dictionary comprehensions, and set comprehensions
  • Tuple unpacking
  • When calling a function, use * to unpack the arguments
  • Building and extending collection types
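A few of the operations listed above, applied to the same iterable, can be sketched as follows. The helper add3() is hypothetical and exists only to demonstrate * unpacking.

```python
def add3(a, b, c):
    """Hypothetical helper used to show * unpacking."""
    return a + b + c


data = (1, 2, 3)  # any iterable would do

squares = [x * x for x in data]      # list comprehension
lookup = {x: str(x) for x in data}   # dict comprehension
first, second, third = data          # tuple unpacking
total = add3(*data)                  # * unpacks the iterable into positional arguments
as_set = set(data)                   # building a collection from an iterable
extended = [0]
extended.extend(data)                # extending a collection with an iterable

print(squares, total, extended)      # [1, 4, 9] 6 [0, 1, 2, 3]
```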

As you can see, iteration plays an important role in many places in Python.

Why sequences are iterable

This relies on the built-in function iter(). Whenever the interpreter needs to iterate over an object x, it calls iter(x) to obtain an iterator and then iterates with that.

The built-in iter() function does the following:

  1. Checks whether the object implements __iter__; if it does, calls it to obtain an iterator.
  2. If __iter__ is not implemented but __getitem__ is, Python creates an iterator that tries to fetch the elements in order, starting at index 0.
  3. If neither method is available, Python raises a TypeError, usually with the message "'X' object is not iterable".

In [8]: x = 2
In [9]: iter(x)
-----------------------------------------------------
TypeError           Traceback (most recent call last)
<ipython-input-9-128770259dbe> in <module>()
----> 1 iter(x)

TypeError: 'int' object is not iterable
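Because iter() applies exactly these rules, calling it is a reliable way to check whether an object is iterable. The helper name is_iterable below is hypothetical; it simply wraps iter() in a try/except.

```python
def is_iterable(obj):
    """Return True if iter(obj) succeeds (hypothetical helper)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(is_iterable([1, 2, 3]))  # True
print(is_iterable("abc"))      # True
print(is_iterable(2))          # False
```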

Standard sequences all implement __getitem__, but they also implement __iter__, and your own classes should do the same. The __getitem__ fallback exists for backward compatibility and may be deprecated in the future.
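The __getitem__ fallback from step 2 can be seen in a small sketch. The class Squares is hypothetical: it defines only __getitem__ and signals the end of its data with IndexError. Note that collections.abc.Iterable only looks for __iter__, so it does not recognize such a class even though iter() accepts it.

```python
from collections.abc import Iterable


class Squares:
    """Hypothetical sequence-like class that defines only __getitem__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            # IndexError tells the fallback iterator that the data has run out
            raise IndexError(index)
        return index * index


sq = Squares(4)
print(list(sq))                   # [0, 1, 4, 9]  (iter() used the __getitem__ fallback)
print(hasattr(sq, "__iter__"))    # False
print(isinstance(sq, Iterable))   # False: the ABC only checks for __iter__
```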

How to implement iterable objects

How do you make your own objects iterable, and how do you write an iterator yourself? It's actually very simple.

For an object to be iterable, it must meet at least one of the following two requirements (see above for why):

  1. It has a __getitem__ method that accepts an index parameter
  2. It has an __iter__ method that returns an iterator

Example:

#!/usr/bin/env python


class MyIterableObject():

    def __init__(self, s):
        self.seq = s.split(' ')

    def __getitem__(self, index):
        return self.seq[index]

    def __iter__(self):
        return MyIterator(self.seq)  # MyIterator is implemented further below


if __name__ == '__main__':

    mio = MyIterableObject("a b c d e f g")

    for i in mio:
        print(i)

Iterator

Once you have obtained an iterator with the iter() function, you can use it to retrieve the object's data.

Call next() to get the elements one at a time. After all elements have been retrieved, calling next() again raises a StopIteration exception.

For example:

In [13]: a = [1, 2, 3, 4, 5]
In [14]: i = iter(a)
In [15]: while True:
    ...:     print(next(i))
    ...:
1
2
3
4
5
-----------------------------------------------------
StopIteration       Traceback (most recent call last)
<ipython-input-15-ac43f8f9aeeb> in <module>()
      1 while True:
----> 2     print(next(i))
      3

StopIteration:
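If you would rather not handle StopIteration yourself, the built-in next() also accepts a default value, which it returns instead of raising once the iterator is exhausted:

```python
a = [1, 2]
i = iter(a)

print(next(i, None))  # 1
print(next(i, None))  # 2
print(next(i, None))  # None: the default is returned instead of raising StopIteration
```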

Python iterators are deliberately simple: they do not support operations such as rewinding to the beginning. Once an iterator has been used up, the only way to read the data from the start again is to create a new iterator.
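A short sketch of this one-shot behavior: the same iterator yields nothing the second time, while a fresh iterator over the same list starts over.

```python
a = [1, 2, 3]
i = iter(a)

print(list(i))  # [1, 2, 3]  consumes the iterator
print(list(i))  # []         already exhausted; there is no rewind

j = iter(a)     # a fresh iterator starts from the beginning again
print(list(j))  # [1, 2, 3]
```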

How to implement an iterator

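A standard Python iterator needs to implement two methods:

  1. __iter__ returns the iterator itself
  2. next() returns the next element of the data set, and raises StopIteration when there is none left

TIPS:
In Python 3 the next() method was renamed __next__. The built-in next() function works in both versions, but Python 3 only calls __next__, so a class that defines only next() is not an iterator there. If you still need Python 2 compatibility, define __next__ and add the alias next = __next__.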

example:


class MyIterator():

    def __init__(self, s):
        
        self.seq = s
        self.len = len(self.seq)
        self.index = 0

    def __iter__(self):
        return self

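    def __next__(self):
        # Python 3 calls __next__ (Python 2 called next)
        try:
            n = self.seq[self.index]
        except IndexError:
            raise StopIteration  # no more elements

        self.index += 1

        return n

    next = __next__  # alias so the class also works under Python 2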
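Putting the two pieces together, here is a condensed, self-contained sketch of MyIterableObject and MyIterator that runs under Python 3 (it defines __next__). It also shows why the split between iterable and iterator matters: every for loop calls __iter__ again and gets a fresh iterator, so the iterable can be traversed more than once.

```python
class MyIterator:
    """Iterator over a sequence, following the protocol described above."""

    def __init__(self, s):
        self.seq = s
        self.index = 0

    def __iter__(self):
        return self  # an iterator returns itself

    def __next__(self):
        try:
            n = self.seq[self.index]
        except IndexError:
            raise StopIteration  # no more elements
        self.index += 1
        return n


class MyIterableObject:
    """Iterable: each call to __iter__ hands out a fresh MyIterator."""

    def __init__(self, s):
        self.seq = s.split(' ')

    def __getitem__(self, index):
        return self.seq[index]

    def __iter__(self):
        return MyIterator(self.seq)


mio = MyIterableObject("a b c")
print(list(mio))  # ['a', 'b', 'c']
print(list(mio))  # ['a', 'b', 'c'] again, because each pass gets a new iterator
```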

One thing to note here: the iterator pattern described at the beginning needs a method that reports whether there is a next element (hasNext()); in Python, that role is played by the StopIteration exception instead. When using an iterator directly, we can catch this exception ourselves, and the built-in for ... in loop catches it for us automatically.
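A simplified sketch of what a for loop does behind the scenes: get an iterator with iter(), call next() repeatedly, and stop when StopIteration appears. The function name manual_for is hypothetical; the real loop is implemented inside the interpreter, not like this.

```python
def manual_for(iterable, action):
    """Roughly what `for item in iterable: action(item)` does (simplified sketch)."""
    it = iter(iterable)          # 1. get an iterator
    while True:
        try:
            item = next(it)      # 2. ask for the next element
        except StopIteration:    # 3. the "no next element" signal
            break
        action(item)


seen = []
manual_for("abc", seen.append)
print(seen)  # ['a', 'b', 'c']
```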

Generator

When people talk about generators, they usually mean one of two things:

  1. Generator function: any function whose body uses the yield keyword becomes a generator function
  2. Generator object: the object returned by calling a generator function. It is a special kind of iterator: it wraps the body of the generator function and implements both __iter__ and next (__next__ in Python 3), so it conforms to the iterator protocol.
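The distinction between the two can be checked directly with the standard inspect and types modules. The function count_up is a hypothetical example.

```python
import inspect
import types


def count_up(n):
    """A generator function: its body contains yield."""
    for i in range(n):
        yield i


gen = count_up(3)  # calling it returns a generator object; the body has not run yet

print(inspect.isgeneratorfunction(count_up))  # True
print(isinstance(gen, types.GeneratorType))   # True
print(iter(gen) is gen)                       # True: __iter__ returns itself
print(next(gen), next(gen), next(gen))        # 0 1 2
```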

What is the biggest difference between generators and iterators?

The main difference is how the values are produced. When you use an iterator such as MyIterator above, all the elements being iterated over must already exist. With a generator, the values do not need to exist in advance: each one can be computed (generated) while the code runs.

For example, the following generator produces an arithmetic sequence:

def arithmetic_progression(base, dif, count):
    for n in range(count):
        yield base + dif * n


if __name__ == '__main__':

    for i in arithmetic_progression(1, 3, 10):
        print(i)

This arithmetic sequence is never stored anywhere: each term is computed when yield executes during iteration.
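Because each term is computed only on request, the count parameter is not even strictly necessary. The sketch below is a hypothetical unbounded variant (the name arithmetic_progression_forever is ours), sliced with itertools.islice. Building a list from it directly would never finish, but taking the first few terms lazily works fine.

```python
from itertools import islice


def arithmetic_progression_forever(base, dif):
    """Hypothetical unbounded variant: yields base, base + dif, base + 2*dif, ..."""
    n = 0
    while True:
        yield base + dif * n
        n += 1


terms = arithmetic_progression_forever(1, 3)
print(list(islice(terms, 5)))  # [1, 4, 7, 10, 13]
```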

This is possible thanks to the yield keyword. yield suspends the function, hands a value back to the caller, and resumes where it left off the next time a value is requested. The execution flow is:

  1. Call next() on the generator object (calling the generator function itself only creates that object)
  2. The function body runs until it reaches yield, returns the value, and is suspended
  3. Steps 1 and 2 repeat until all values have been returned
  4. Calling next() once more raises StopIteration

We can verify this in code:

In [21]: def test():
    ...:     yield 1
    ...:     yield 2
    ...:     yield 3
    ...:

In [22]: gen = test()

In [23]: next(gen)
Out[23]: 1

In [24]: next(gen)
Out[24]: 2

In [25]: next(gen)
Out[25]: 3

In [26]: next(gen)
-----------------------------------------------------
StopIteration       Traceback (most recent call last)
<ipython-input-26-8a6233884a6c> in <module>()
----> 1 next(gen)

StopIteration:
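To see the suspension itself, here is a small sketch with a hypothetical generator traced() that records which parts of its body have run. Nothing executes when the function is called; each next() runs the body only up to the following yield.

```python
log = []


def traced():
    """Records which parts of the body have run so far."""
    log.append("start")
    yield 1
    log.append("resumed after 1")
    yield 2
    log.append("finished")


gen = traced()
print(log)                # []  calling the function only creates the generator
print(next(gen))          # 1
print(log)                # ['start']
print(next(gen))          # 2
print(log)                # ['start', 'resumed after 1']
print(next(gen, "done"))  # done  the body ran to the end, so the default is returned
print(log)                # ['start', 'resumed after 1', 'finished']
```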

Use generators instead of iterators

Now let's rewrite the iterator-based MyIterableObject above using a generator:

class MyGenerator():

    def __init__(self, s):
        self.seq = s.split(' ')

    def __iter__(self):
        for s in self.seq:
            yield s

The code is much shorter: we no longer need to write an iterator class ourselves, because yield creates the generator object (an iterator) for us.
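A usage sketch of MyGenerator: since __iter__ is now a generator function, every call to iter() returns a brand-new generator object, so the object can still be iterated more than once.

```python
class MyGenerator:
    def __init__(self, s):
        self.seq = s.split(' ')

    def __iter__(self):
        # Because this body contains yield, each call returns a new generator object
        for s in self.seq:
            yield s


mg = MyGenerator("a b c")
print(list(mg))              # ['a', 'b', 'c']
print(list(mg))              # ['a', 'b', 'c']  a fresh generator each time
print(iter(mg) is iter(mg))  # False
```

In Python 3.3 and later, the loop inside __iter__ could also be written as yield from self.seq.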

Iterator toolset (itertools)

Generators are already simple to use, but Python, a language built to save you time, goes one step further and packages many of them for you.

Python comes with many built-in functions that produce values lazily, such as os.walk for traversing directories, and tools like map and enumerate.
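In Python 3, map and enumerate return lazy iterators rather than lists (strictly speaking they are built-in classes, not generator functions, but they behave the same way from the caller's side). The sketch below uses a hypothetical shout() function that records its calls, to show that map does no work until next() asks for a value.

```python
calls = []


def shout(word):
    calls.append(word)  # record when map actually calls the function
    return word.upper()


m = map(shout, ["a", "b", "c"])
print(calls)         # []  nothing computed yet
print(next(m))       # A
print(calls)         # ['a']
print(iter(m) is m)  # True: map returns an iterator

e = enumerate(["x", "y"])
print(list(e))       # [(0, 'x'), (1, 'y')]
```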

Python also ships with a standard library module called itertools, which contains 19 generator functions (the exact number varies between Python versions) that can be combined to perform all kinds of tasks.
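A few itertools functions, alone and combined, as a quick sketch. count() is infinite, so it is always paired with something that stops it, such as islice() or takewhile().

```python
from itertools import chain, count, islice, takewhile

# count() is an infinite iterator; islice() takes a finite slice of it
first_evens = list(islice(count(0, 2), 5))

# chain() walks several iterables one after another
combined = list(chain("ab", [1, 2]))

# Combining tools: squares of 1, 2, 3, ... while they stay below 30
small_squares = list(takewhile(lambda x: x < 30, (n * n for n in count(1))))

print(first_evens)    # [0, 2, 4, 6, 8]
print(combined)       # ['a', 'b', 1, 2]
print(small_squares)  # [1, 4, 9, 16, 25]
```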

Summary

That covers the difference between iterators and generators. Neither is hard to understand on its own; the confusion comes from a few closely related concepts. Once you understand those concepts, you can tell the two apart clearly.


Author and source: reposkeeper. Shared under the CC BY-SA 4.0 Creative Commons license.
