Python learning--3.1 slice, iteration, generator, iterator

slice

>>> L = ['Michael', 'Sarah', 'Tracy', 'Bob', 'Jack']

How do we take the first three elements?

The naive way:

>>> [L[0], L[1], L[2]]
['Michael', 'Sarah', 'Tracy']

Python provides the Slice operator, which greatly simplifies this operation.

>>> L[0:3]
['Michael', 'Sarah', 'Tracy']

L[0:3] takes elements from index 0 up to, but not including, index 3, that is, indices 0, 1, and 2: exactly 3 elements. If the first index is 0, it can be omitted, so L[:3] gives the same result.

Similarly, since Python supports L[-1] to take the last element, it also supports slicing with negative indices. Try:

>>> L[-2:]
['Bob', 'Jack']
>>> L[-2:-1]
['Bob']

The following examples use a list of the integers from 0 to 99:

>>> L = list(range(100))

The first 10 numbers, taking every second one:

>>> L[:10:2]
[0, 2, 4, 6, 8]

All numbers, taking every fifth one:

>>> L[::5]
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]

A tuple behaves like a list except that it is immutable. Tuples can therefore also be sliced, and the result is still a tuple:

>>> (0, 1, 2, 3, 4, 5)[:3]
(0, 1, 2)

A string 'xxx' can also be treated as a sequence in which each element is a character. Strings can therefore also be sliced, and the result is still a string:

>>> 'ABCDEFG'[:3]
'ABC'
>>> 'ABCDEFG'[::2]
'ACEG'
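As a quick recap of the slicing rules above, here is a minimal sketch. The variable names (nums, head, and so on) are made up for illustration and do not come from the examples above. It shows an omitted start, a negative start, a step, and that the result keeps the type of the sequence being sliced.

```python
# Hypothetical sample data, chosen only for this illustration.
nums = list(range(10))          # [0, 1, ..., 9]

head = nums[:3]                 # start omitted, same as nums[0:3]
tail = nums[-2:]                # the last two elements
evens = nums[::2]               # every second element
copy_of_nums = nums[:]          # both bounds omitted: a new list with the same items

print(head)                     # [0, 1, 2]
print(tail)                     # [8, 9]
print(evens)                    # [0, 2, 4, 6, 8]
print(copy_of_nums is nums)     # False: slicing builds a new list

# The result has the same type as the sequence being sliced.
print(type((0, 1, 2)[:2]).__name__)   # tuple
print(type('ABC'[:2]).__name__)       # str
```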

iteration

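Given a list or tuple, we can traverse it with a for loop. This traversal is called iteration.
In Python, iteration is done with for ... in, and it works on any iterable object, not only on a list or tuple. A dict, for example, has no indices but can still be iterated: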

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> for key in d:
...     print(key)
...
a
b
c

Since Python 3.7, a dict iterates in insertion order. In earlier versions the order was not guaranteed, so the keys could come out in a different order, such as a, c, b.

By default, iterating over a dict yields its keys. To iterate over the values, use for value in d.values(); to iterate over keys and values together, use for k, v in d.items().
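The three ways of iterating over a dict can be sketched side by side. The dict name scores and the result lists are hypothetical, and the order shown in the comments assumes Python 3.7 or later, where a dict keeps insertion order.

```python
# Hypothetical dict used only for this illustration.
scores = {'a': 1, 'b': 2, 'c': 3}

keys, values, pairs = [], [], []

for key in scores:                  # a plain loop yields the keys
    keys.append(key)

for value in scores.values():       # values() yields the values
    values.append(value)

for k, v in scores.items():         # items() yields (key, value) pairs
    pairs.append((k, v))

print(keys)     # ['a', 'b', 'c']
print(values)   # [1, 2, 3]
print(pairs)    # [('a', 1), ('b', 2), ('c', 3)]
```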
Strings are iterable too, so they can also be used in a for loop:

>>> for ch in 'ABC':
...     print(ch)
...
A
B
C

So, when we use a for loop, as long as it acts on an iterable object, the for loop will work fine, and we don't really care if the object is a list or some other data type.

So how can we tell whether an object is iterable? Use the Iterable type from the collections.abc module (importing it from collections directly no longer works since Python 3.10):

>>> from collections.abc import Iterable
>>> isinstance('abc', Iterable) # is str iterable?
True
>>> isinstance([1,2,3], Iterable) # is list iterable?
True
>>> isinstance(123, Iterable) # is an integer iterable?
False
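The isinstance check recognizes objects that define __iter__. Another approach, shown here only as a sketch, is to call iter() and see whether it raises TypeError; this also covers objects that are iterable only through __getitem__. The helper name is_iterable is hypothetical.

```python
def is_iterable(obj):
    """Return True if iter(obj) succeeds (hypothetical helper)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(is_iterable('abc'))      # True
print(is_iterable([1, 2, 3]))  # True
print(is_iterable(123))        # False
```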

One last question: what if you want an index-based loop over a list, as in Java? Python's built-in enumerate function turns a list into index-element pairs, so that both the index and the element itself are available in the for loop:

>>> for i, value in enumerate(['A', 'B', 'C']):
...     print(i, value)
...
0 A
1 B
2 C
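enumerate also accepts an optional start argument, which is handy when the numbering should begin at 1 rather than 0. A minimal sketch, with the list name names and the result list numbered made up for illustration:

```python
names = ['A', 'B', 'C']   # same letters as in the example above

numbered = []
for position, value in enumerate(names, start=1):   # count from 1 instead of 0
    numbered.append((position, value))

print(numbered)   # [(1, 'A'), (2, 'B'), (3, 'C')]
```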

The for loop above binds two variables at once, which is very common in Python. For example:

>>> for x, y in [(1, 1), (2, 4), (3, 9)]:
...     print(x, y)
...
1 1
2 4
3 9

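A small exercise: use iteration to find the minimum and maximum of a list and return them as a tuple.

def findMinAndMax(L):
    # An empty list has neither a minimum nor a maximum.
    if len(L) == 0:
        return (None, None)
    # Start from the first element; these names avoid shadowing the built-in min() and max().
    smallest = L[0]
    largest = L[0]
    for i in L:
        if i > largest:
            largest = i
        if i < smallest:
            smallest = i
    return (smallest, largest)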

# Tests
if findMinAndMax([]) != (None, None):
    print('Test 1 failed!')
elif findMinAndMax([7]) != (7, 7):
    print('Test 2 failed!')
elif findMinAndMax([7, 1]) != (1, 7):
    print('Test 3 failed!')
elif findMinAndMax([7, 1, 3, 9, 5]) != (1, 9):
    print('Test 4 failed!')
else:
    print('All tests passed!')
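For comparison only, the same result can be obtained with the built-in min() and max() functions. This is a sketch rather than a replacement for the exercise, whose point is to practice iteration; the function name find_min_and_max_builtin is hypothetical.

```python
def find_min_and_max_builtin(values):
    """Return (minimum, maximum) of values, or (None, None) if it is empty."""
    if not values:
        return (None, None)
    return (min(values), max(values))


print(find_min_and_max_builtin([]))               # (None, None)
print(find_min_and_max_builtin([7]))              # (7, 7)
print(find_min_and_max_builtin([7, 1, 3, 9, 5]))  # (1, 9)
```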

list comprehension

List comprehensions are a simple but powerful built-in Python feature for creating lists.

For example, to generate the list [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] use list(range(1, 11)):

>>> list(range(1, 11))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

But what if you want to generate [1x1, 2x2, 3x3, …, 10x10]? The first method is to loop:

>>> L = []
>>> for x in range(1, 11):
...     L.append(x * x)
...
>>> L
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

But the loop is cumbersome. A list comprehension replaces it with a single line that generates the same list:

>>> [x * x for x in range(1, 11)]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

When writing a list comprehension, put the expression that produces each element (here x * x) first, followed by the for loop. After writing a few, the syntax quickly becomes familiar.

You can also add an if condition after the for loop, for example to keep only the squares of even numbers:

>>> [x * x for x in range(1, 11) if x % 2 == 0]
[4, 16, 36, 64, 100]

A two-layer loop is also possible, which generates every combination of the two sequences:

>>> [m + n for m in 'ABC' for n in 'XYZ']
['AX', 'AY', 'AZ', 'BX', 'BY', 'BZ', 'CX', 'CY', 'CZ']

Loops with three or more layers are rarely used.
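To make the order of the two for clauses concrete, the two-layer comprehension above can be unrolled into ordinary nested loops. The names pairs_loop and pairs_comp are made up for this sketch.

```python
pairs_loop = []
for m in 'ABC':            # outer loop: the first "for" in the comprehension
    for n in 'XYZ':        # inner loop: the second "for"
        pairs_loop.append(m + n)

pairs_comp = [m + n for m in 'ABC' for n in 'XYZ']

print(pairs_loop == pairs_comp)   # True
print(pairs_comp[:4])             # ['AX', 'AY', 'AZ', 'BX']
```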

Using list comprehensions, you can write very concise code. For example, listing all files and directory names in the current directory takes one line of code:

>>> import os # import the os module; modules are covered later
>>> [d for d in os.listdir('.')] # os.listdir lists files and directories
['.emacs.d', '.ssh', '.Trash', 'Adlm', 'Applications', 'Desktop', 'Documents', 'Downloads', 'Library', 'Movies', 'Music', 'Pictures', 'Public', 'VirtualBox VMs', 'Workspace', 'XCode']

A for loop can bind two or more variables at once. For example, a dict's items() method lets you iterate over keys and values together:

>>> d = {'x': 'A', 'y': 'B', 'z': 'C' }
>>> for k, v in d.items():
...     print(k, '=', v)
...
x = A
y = B
z = C

Therefore, a list comprehension can also use two variables to generate a list:

>>> d = {'x': 'A', 'y': 'B', 'z': 'C' }
>>> [k + '=' + v for k, v in d.items()]
['x=A', 'y=B', 'z=C']

Finally, convert all the strings in a list to lowercase:

>>> L = ['Hello', 'World', 'IBM', 'Apple']
>>> [s.lower() for s in L]
['hello', 'world', 'ibm', 'apple']
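If the list also contains items that are not strings, calling s.lower() on them raises AttributeError. One way to handle that, sketched here with a made-up list named mixed, is to combine the comprehension with an if condition that uses isinstance:

```python
# Hypothetical input mixing strings with other types.
mixed = ['Hello', 'World', 18, 'Apple', None]

# Keep only the str items, then lowercase them.
lowered = [s.lower() for s in mixed if isinstance(s, str)]

print(lowered)   # ['hello', 'world', 'apple']
```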

generator

A list comprehension creates the whole list at once. Memory is limited, however, so the size of a list is limited too. Moreover, a list with 1 million elements takes up a lot of storage, and if we only need the first few elements, most of that space is wasted.
If the elements can be computed by some algorithm, we can instead compute each subsequent element during the loop, which saves a lot of space because the complete list never has to be created. In Python, this compute-as-you-loop mechanism is called a generator.
There are several ways to create a generator. The simplest is to change the [] of a list comprehension to (), which creates a generator:

>>> L = [x * x for x in range(10)]
>>> L
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> g = (x * x for x in range(10))
>>> g
<generator object <genexpr> at 0x1022ef630>

The difference between creating L and g is only the outermost [] and (), L is a list, and g is a generator.
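The memory difference can be observed with sys.getsizeof. This is a rough sketch: getsizeof reports only the size of the container object itself, and the exact byte counts vary between Python versions, so only the comparison is shown. The names squares_list and squares_gen are made up.

```python
import sys

squares_list = [x * x for x in range(1_000_000)]   # all elements exist now
squares_gen = (x * x for x in range(1_000_000))    # nothing computed yet

# The list holds a million references; the generator holds only its state.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))   # True

# Both still produce the same values when consumed.
print(sum(squares_gen) == sum(squares_list))                      # True
```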

We can print out each element of the list directly, but how do we print out each element of the generator?

If you want to print them out one by one, you can get the next return value of the generator through the next() function:

>>> next(g)
0
>>> next(g)
1
>>> next(g)
4
>>> next(g)
9
>>> next(g)
16
>>> next(g)
25
>>> next(g)
36
>>> next(g)
49
>>> next(g)
64
>>> next(g)
81
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The generator stores the algorithm. Each call to next(g) computes the next element of g, until the last element has been produced. When there are no more elements, a StopIteration exception is raised.
Calling next(g) over and over like this is tedious. The proper way is to use a for loop, because a generator is also an iterable object:

>>> g = (x * x for x in range(10))
>>> for n in g:
...     print(n)
... 
0
1
4
9
16
25
36
49
64
81

So after creating a generator, we rarely call next() directly; we iterate over it with a for loop and do not need to worry about StopIteration.
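One consequence worth keeping in mind is that a generator can be consumed only once: after a full pass, a second pass yields nothing. A small sketch, with the names first_pass and second_pass made up:

```python
g = (x * x for x in range(3))

first_pass = list(g)    # consumes every element of the generator
second_pass = list(g)   # the generator is already exhausted

print(first_pass)    # [0, 1, 4]
print(second_pass)   # []
```

To iterate again, create a new generator.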

Generators are very powerful. If the algorithm is too complex to express with a for loop in the style of a list comprehension, it can be implemented with a function instead.
For example, in the famous Fibonacci sequence, every number except the first two is the sum of the two numbers before it:

1, 1, 2, 3, 5, 8, 13, 21, 34, …

The sequence cannot be written as a list comprehension, but printing it with a function is easy:

def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        print(b)
        a, b = b, a + b
        n = n + 1
    return 'done'

This function is only one step away from a generator. To turn fib into a generator, just change print(b) to yield b:

def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1
    return 'done'

This is another way to define a generator. If a function definition contains the yield keyword, it is no longer an ordinary function but a generator function, and calling it returns a generator:

>>> f = fib(6)
>>> f
<generator object fib at 0x104feaaa0>

The hardest part to understand is that a generator's execution flow differs from a function's. A function runs sequentially and returns when it reaches a return statement or its last line. A generator function runs each time next() is called, pauses and returns a value at the yield statement, and on the next call resumes from the yield where it last stopped.
As before, once the function has become a generator we rarely use next() to get the next value; we iterate with a for loop directly:

>>> for n in fib(6):
...     print(n)
...
1
1
2
3
5
8
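A for loop discards the value given to return in a generator function. To see it, one option is to call next() manually and catch StopIteration, whose value attribute carries the return value. The sketch below repeats the fib generator from above; the names collected and return_value are made up.

```python
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1
    return 'done'


g = fib(3)
collected = []
while True:
    try:
        collected.append(next(g))    # runs the body up to the next yield
    except StopIteration as e:
        return_value = e.value       # the value passed to "return"
        break

print(collected)      # [1, 1, 2]
print(return_value)   # done
```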

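A quick try: a generator that yields successively longer lists, starting from [1] and adding one more number of the sequence 1, 2, 3, 5, 8, ... each time:

def fib(max):
    S = [1]
    n = 1
    while n < max:              # n starts at 1, so max - 1 lists are yielded
        yield S
        L = S                   # the list yielded in the previous round
        S = [L[0]]              # every list starts with 1
        b = 1
        for i in L:
            S.append(i + b)     # each item of L plus the item before it (1 for the first item)
            b = i
        n = n + 1
    return 'done'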

aa = fib(10)

for a in aa:
    print(a)

[1]
[1, 2]
[1, 2, 3]
[1, 2, 3, 5]
[1, 2, 3, 5, 8]
[1, 2, 3, 5, 8, 13]
[1, 2, 3, 5, 8, 13, 21]
[1, 2, 3, 5, 8, 13, 21, 34]
[1, 2, 3, 5, 8, 13, 21, 34, 55]

iterator

We already know that the following kinds of data can be used directly in a for loop:

One is the collection data types, such as list, tuple, dict, set, and str;

The other is generators, including generator expressions and generator functions that use yield.

Objects that can be used directly in a for loop are collectively called iterable objects: Iterable.

You can use isinstance() to determine whether an object is an Iterable object:

>>> from collections.abc import Iterable
>>> isinstance([], Iterable)
True
>>> isinstance({}, Iterable)
True
>>> isinstance('abc', Iterable)
True
>>> isinstance((x for x in range(10)), Iterable)
True
>>> isinstance(100, Iterable)
False

A generator can be used in a for loop, and it can also be passed to next() repeatedly to return the next value each time, until StopIteration is finally raised to signal that no further value is available.

An object that can be called by the next() function and continuously returns the next value is called an iterator: Iterator.

You can use isinstance() to determine whether an object is an Iterator object:

>>> from collections.abc import Iterator
>>> isinstance((x for x in range(10)), Iterator)
True
>>> isinstance([], Iterator)
False
>>> isinstance({}, Iterator)
False
>>> isinstance('abc', Iterator)
False

Generators are Iterator objects, but although list, dict, and str are Iterables, they are not Iterators.

To turn iterables such as list, dict, str, etc. into Iterators, you can use the iter() function:

>>> isinstance(iter([]), Iterator)
True
>>> isinstance(iter('abc'), Iterator)
True
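The link between iter(), next(), and the for loop can be made explicit: a for loop behaves essentially like obtaining an Iterator with iter() and calling next() until StopIteration is raised. A minimal sketch, with the names items, seen_by_for, and seen_by_next made up:

```python
items = [1, 2, 3, 4, 5]

# The usual for loop.
seen_by_for = []
for x in items:
    seen_by_for.append(x)

# Roughly the same loop written out by hand.
it = iter(items)               # get an Iterator from the list
seen_by_next = []
while True:
    try:
        x = next(it)           # fetch the next value
    except StopIteration:
        break                  # no more values: leave the loop
    seen_by_next.append(x)

print(seen_by_for == seen_by_next)   # True
```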

You may ask, why are data types such as list, dict, str, etc. not Iterators?

This is because a Python Iterator represents a data stream. An Iterator can be passed to next() repeatedly to return the next item, until StopIteration is raised when no data is left. The stream can be seen as an ordered sequence whose length we cannot know in advance; we can only compute the next item on demand by calling next(). The computation of an Iterator is therefore lazy: an item is computed only when it is about to be returned.

An Iterator can even represent an infinite stream of data, such as all the natural numbers, whereas a list can never store all of them.
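As an illustration of an infinite stream, the standard library's itertools.count produces integers without end, and itertools.islice takes only as many as needed. This is just one possible sketch; the names naturals, first_five, and following are made up.

```python
from itertools import count, islice

naturals = count(0)                       # 0, 1, 2, ... with no end
first_five = list(islice(naturals, 5))    # compute only the first five
following = next(naturals)                # the stream continues where it stopped

print(first_five)   # [0, 1, 2, 3, 4]
print(following)    # 5
```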

Summary
Every object that can be used in a for loop is an Iterable;

Every object that can be passed to next() is an Iterator, which represents a lazily computed sequence;

Collection data types such as list, dict, and str are Iterable but not Iterator; an Iterator can be obtained from them with the iter() function.