Python iterators and generators

Iterators and Generators

1. Iterator

  An iterator can be thought of as a special kind of cursor: an abstract description of the operations involved in traversing a series of values. The iterator protocol requires that an object provide a next() or __next__() method, and a call to this method can do only one of two things: return the next item in the iteration, or raise a StopIteration exception to terminate the iteration. The for loop is essentially built on this protocol, which is why it can loop over all kinds of iterable objects (strings, lists, tuples, dictionaries, file objects, etc.). That raises a question: strictly speaking, only an object that implements the protocol is an iterator, yet when we define a string, list, tuple, dictionary, or file object we never write a next method for it. Judged by that standard, these objects do not follow the iterator protocol, so why do we usually still treat them as iterable objects?

  Python provides a way to turn an object of such a data type into an iterator: call its __iter__() method (or pass it to the built-in iter() function). The object returned by that call has a __next__() method. The simple example below shows this: we call __iter__() on a string and then call __next__() on the resulting iterator.

        string = "hello world"
        myiter = string.__iter__()
        print(myiter)
        print(myiter.__next__())   # output: h
        print(myiter.__next__())   # output: e
        print(myiter.__next__())   # output: l
        print(myiter.__next__())   # output: l
        print(myiter.__next__())   # output: o
        print(myiter.__next__())   # output: (a space)
        print(myiter.__next__())   # output: w
        print(myiter.__next__())   # output: o
        print(myiter.__next__())   # output: r
        print(myiter.__next__())   # output: l
        print(myiter.__next__())   # output: d
        print(myiter.__next__())   # error: StopIteration exception

 

  It turns out that the __next__() method really does walk through the characters one by one, and that a StopIteration exception is raised once the traversal runs past the end. The experiment also shows that the for loop mechanism has nothing to do with indexing. Strings, lists, tuples, dictionaries, and file objects do not themselves implement the iterator protocol, yet they can still be treated as iterable objects, because the for loop provides a single, uniform way of traversing them based on that protocol: before the traversal starts, it calls the object's __iter__() method to turn it into an iterator, then repeatedly calls __next__() on that iterator, until it catches a StopIteration exception and terminates the loop. This also explains why a for loop over a dictionary yields the keys rather than the values: after the dictionary has been converted into an iterator, each call to __next__() returns the next key. Traversing a file works the same way. The advantage of this style of iteration is that it saves memory: values are retrieved only when they are actually needed.
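  As a rough illustration of that mechanism, the sketch below (the variable names are just for illustration) emulates what a for loop does with iter() and next():

        items = {"a": 1, "b": 2}     # any iterable: string, list, dict, file object ...
        it = iter(items)             # step 1: the for loop obtains an iterator
        while True:
            try:
                key = next(it)       # step 2: equivalent to it.__next__()
            except StopIteration:    # step 3: the for loop catches this and stops quietly
                break
            print(key)               # for a dictionary, __next__() returns the keys: a, b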

 

 

2. Generator

  A generator is built on top of iterators: it implements the iterator protocol automatically (objects of other data types have to obtain an iterator by calling the built-in __iter__() method themselves), so a generator already has a __next__() method and there is no need to call __iter__() on it first. Python provides two different ways to create generators, which are introduced in detail below. Of course, limited by the author's own knowledge, this cannot cover every aspect of generators.
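  As a quick check of this claim, a small sketch using the collections.abc module (for illustration only) shows that a generator is already an iterator, while a list is merely iterable:

        from collections.abc import Iterable, Iterator

        gen = (x * x for x in range(3))
        print(isinstance(gen, Iterator))        # True: a generator is already an iterator
        print(isinstance([1, 2, 3], Iterator))  # False: a list does not have __next__()
        print(isinstance([1, 2, 3], Iterable))  # True: but it does have __iter__()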

 

1. Generator function:

  A generator function is defined like an ordinary function, except that the yield keyword is used instead of the return keyword to hand back results. yield returns one result at a time and, between results, suspends the state of the function so that execution later continues from where it left off. In effect, yield wraps the function in the iterator protocol for us: its first role is similar to return, and its second role is to preserve the function's running state. How is that state preserved? We place yield statements at the points whose state the function needs to keep, and then use the __next__() method to run the following parts of the function, one step at a time, as needed.

        def test():
            yield "Yahoo"

        # call
        g = test()
        print(g)                 # output: <generator object test at 0x000001B64793F7D8>
        print(g.__next__())      # output: Yahoo
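
  To make the "suspended state" point concrete, here is a small additional sketch (not part of the original example) with several yield statements; each call to __next__() resumes the function right after the previous yield:

        def steps():
            print("before the first yield")
            yield 1
            print("between the yields")     # runs only on the second __next__() call
            yield 2
            print("after the last yield")   # runs on the third call, then StopIteration is raised

        s = steps()
        print(s.__next__())    # prints "before the first yield", then 1
        print(s.__next__())    # prints "between the yields", then 2
        # s.__next__()         # would print "after the last yield" and raise StopIteration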
        

 

2. Generator expression:

   A generator expression is similar to a list comprehension. The generator returns result objects on demand, rather than building the whole list of results at once. The differences from a list comprehension are that a generator expression uses a pair of parentheses instead of square brackets, and that it is more memory-efficient. Why? Because a generator expression is built on an iterator, and the advantage of iteration is that it saves memory: values are retrieved only when they are needed. This also shows how much attention Python's creator pays to memory management, much as one would in C. Many of Python's built-in functions work with iterators, such as map() and sum(). Generators are used very widely; the simple sum() function is used below to illustrate them.

        num = ("number: %s" % i for i in range(10))
        print(num.__next__())   # output: number: 0
        print(num.__next__())   # output: number: 1
        print(num.__next__())   # output: number: 2
        print(num.__next__())   # output: number: 3
        print(num.__next__())   # output: number: 4
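
  As a rough check of the memory claim (the exact sizes depend on the Python version and platform, so treat the numbers in the comments as assumptions), sys.getsizeof() can compare a list comprehension with the equivalent generator expression:

        import sys

        as_list = ["number: %s" % i for i in range(100000)]   # all results built up front
        as_gen  = ("number: %s" % i for i in range(100000))   # results produced on demand
        print(sys.getsizeof(as_list))   # hundreds of kilobytes for the list object alone
        print(sys.getsizeof(as_gen))    # a small constant size, on the order of 100 bytes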
        
        
        
        

  Lists can be very memory-intensive, so we can often use a generator instead. Many of Python's built-in functions accept an iterable argument. When we build a generator expression as a value in its own right, we usually wrap the construction in a pair of parentheses, but when it is the sole argument of such a built-in function the extra parentheses can be omitted. From the above we know that these built-in functions simply perform a for loop over the iterable argument, so whether or not the extra parentheses are written makes no difference. The outputs of the following program lines are all equivalent.

        print(sum([1,2,3,4,5,6,7,8,9]))        # use a list
        print(sum((i for i in range(10))))     # use a generator expression, with parentheses
        print(sum(i for i in range(10)))       # the extra parentheses can be omitted
        print(sum(range(10)))                  # equivalent replacement

 

 

3. Summary of the generator

  A generator function looks almost the same as a normal function. The differences: a generator function uses the yield keyword to return a value, while a regular function uses return; yield suspends the state of the function, retaining enough information to later pick up where it left off; and Python automatically treats the result as an iterator in any iterative context (such as many built-in functions). Because generators automatically implement the iterator protocol, we can call their __next__() method, and when there is no value left to return the generator automatically raises a StopIteration exception. Another key point about generators is that they can only be traversed once. Finally, besides a generator function, a parenthesized generator expression can also produce a generator.

    # Simple generator function
    def test():
        for i in range(10):
            yield i


    t = test()                 # call the generator function
    for i in t:                # the first traversal
        print(i)               # prints 0 through 9, one per line
    tt = (i for i in t)        # the second traversal
    print(list(tt))            # output: []
        
    

 

    # Parenthesized generator expression
    def test():
        for i in range(10):
            yield i


    t = test()             # call the generator function
    t1 = (i for i in t)
    t2 = (i for i in t1)
    print(list(t1))        # output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(list(t2))        # output: []
    """
    There is a common misunderstanding here. In fact, neither t1 nor t2 holds any values to begin with.
    Only when the built-in list() function is applied, and list() starts calling __next__() internally,
    does the traversal of t1 begin and values get produced. Once that traversal is complete, and because
    a generator can only be traversed once, all the values have been consumed by t1 and t2 gets nothing,
    so t2 produces an empty list.
    """

 

4. Advantages of generators

  One benefit of generators is lazy computation: they return one result at a time. In other words, they do not produce all of the results at once, which keeps the memory overhead small and is very useful for processing large amounts of data. Another benefit of generators is improved code readability; distilling code without sacrificing readability is essential to programming in Python. Below, the knowledge above is applied to simulate single-threaded concurrent processing, that is, the producer-consumer model. The code is as follows:

    import time

    def consumer(name):
        print("[%s] Ready to eat dumplings" % name)
        while True:
            dumpling = yield            # pause here until a value is sent in
            time.sleep(1)
            print("[%s] ate %s dumplings" % (name, dumpling))

    def producer():
        p1 = consumer("Lily")
        p2 = consumer("Jiony")
        p1.__next__()                   # advance each consumer to its first yield
        p2.__next__()
        for i in range(10):
            time.sleep(1)
            p1.send(i)                  # resumes the consumer; i becomes the value of yield
            p2.send(i)

    # call the producer
    producer()
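
  For readers unfamiliar with send(), here is a tiny standalone sketch (separate from the example above) of how it interacts with a yield expression:

    def echo():
        while True:
            received = yield        # execution pauses here
            print("got:", received)

    e = echo()
    e.__next__()                    # run up to the first yield ("priming" the generator)
    e.send("hello")                 # resumes the generator; the yield expression evaluates to "hello"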

 
