Comparing Python's list comprehensions and generator expressions

overview

List comprehensions and generator expressions in Python look very similar, but they behave quite differently. Here is a comparison.

list comprehension

A list comprehension is a common technique that condenses what would otherwise need a for loop and if/else statements into a single expression, and it produces a list object:

even = [e for e in range(10) if e % 2 == 0]

I won't go into the details here, since most Python users are already familiar with this syntax.

One thing to note is that a list comprehension is not lazily evaluated: every element is computed as soon as the statement runs (eager evaluation), so building a list with many elements can be slow. For example, here are timings for three list comprehensions in IPython:

In [1]: %timeit even = [e for e in range(100000) if e % 2 == 0]
5.5 ms ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [2]: %timeit even = [e for e in range(1000000) if e % 2 == 0]
58.9 ms ± 440 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [3]: %timeit even = [e for e in range(100000000) if e % 2 == 0]
5.65 s ± 26.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As the number of elements increases, the execution time of the list comprehension grows roughly in proportion, and so does the memory it occupies.

Sometimes we define a very large number of elements but never use all of them. For example, after several steps, only the first 10 elements of the list may be needed, or only the elements that meet some condition. In such cases, eager evaluation wastes time and memory creating elements that are never used, so there is clearly room for improvement.

generator expression

A generator expression solves these problems. Its elements are produced lazily, only when they are requested, which avoids the up-front memory and time overhead. No matter how many elements a generator expression can yield, creating it takes constant time, because no element is computed at that point. The cost of computing each element is paid later, during iteration.

So what is the syntax of a generator expression? It is very simple. Just replace the square brackets of the list comprehension with parentheses:

even_gen = (e for e in range(10) if e % 2 == 0)

Note that its type is a generator type:

type(even_gen)
# generator
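To make the memory difference concrete, here is a minimal sketch that compares the size of the two objects with sys.getsizeof. The element count n is an arbitrary choice for illustration, and the exact byte counts depend on the Python version and platform, so none are quoted here.

```python
import sys

n = 1_000_000  # arbitrary size chosen for this illustration

# The list comprehension materializes every element up front.
even_list = [e for e in range(n) if e % 2 == 0]

# The generator expression only stores its own iteration state.
even_gen = (e for e in range(n) if e % 2 == 0)

# sys.getsizeof reports the container itself, not the int objects it
# refers to, so the real footprint of the list is larger than shown.
list_bytes = sys.getsizeof(even_list)
gen_bytes = sys.getsizeof(even_gen)

print(f"list:      {list_bytes} bytes for {len(even_list)} elements")
print(f"generator: {gen_bytes} bytes")
```

The size of the generator object should stay the same if n is changed, while the size of the list grows with it.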

Timings for creating generator expressions:

In [1]: %timeit even_gen = (e for e in range(100000) if e % 2 == 0)
376 ns ± 2.61 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [2]: %timeit even_gen = (e for e in range(10000000) if e % 2 == 0)
382 ns ± 1.63 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [3]: %timeit even_gen = (e for e in range(1000000000) if e % 2 == 0)
384 ns ± 2.85 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

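As the number of elements increases, the creation time stays essentially the same, and it is far lower than the time a list comprehension takes. Keep in mind that these numbers measure only the creation of the generator; iterating over all of its elements still takes time proportional to their number.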
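The following sketch shows where the work actually happens. It counts how many elements get tested in each case; the helper is_even and the calls list are hypothetical names introduced only for this illustration.

```python
calls = []

def is_even(e):
    # Record every element that actually gets tested.
    calls.append(e)
    return e % 2 == 0

# Eager: all 1000 elements are tested before the assignment finishes.
even_list = [e for e in range(1000) if is_even(e)]
calls_after_list = len(calls)

calls.clear()

# Lazy: creating the generator tests nothing at all.
even_gen = (e for e in range(1000) if is_even(e))
calls_after_creation = len(calls)

# Pull only the first three even numbers (0, 2 and 4).
first_three = [next(even_gen), next(even_gen), next(even_gen)]
calls_after_three = len(calls)

print(calls_after_list, calls_after_creation, calls_after_three)
# 1000 0 5
```

Only the elements 0 through 4 are tested to obtain the first three results, so the generator saves work whenever iteration stops early.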

Choosing which to use

So does this mean a generator expression can always replace a list comprehension? Not necessarily. A list comprehension produces a list, which supports many convenient operations (such as slicing), while a generator does not:

In [17]: even = [e for e in range(10) if e % 2 == 0]

In [18]: even[:3]
Out[18]: [0, 2, 4]

In [19]: even_gen = (e for e in range(10) if e % 2 == 0)

In [20]: even_gen[:3]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [20], in <cell line: 1>()
----> 1 even_gen[:3]

TypeError: 'generator' object is not subscriptable
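If only a slice of a generator is needed, one possible workaround is itertools.islice from the standard library, which takes items from any iterable without subscripting it. A minimal sketch:

```python
from itertools import islice

even_gen = (e for e in range(10) if e % 2 == 0)

# islice(iterable, stop) yields at most `stop` items, like even[:3] on a list.
first_three = list(islice(even_gen, 3))
print(first_three)  # [0, 2, 4]

# The generator keeps its position, so the remaining items are still available.
rest = list(even_gen)
print(rest)  # [6, 8]
```

Unlike list slicing, islice advances the underlying generator, and it does not support negative indices.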

There is also a crucial difference between the two: a generator expression can only be iterated once, while the list produced by a list comprehension can be iterated many times. For example:

In [22]: even_gen = (e for e in range(10) if e % 2 == 0)

In [23]: for e in even_gen:
    ...:     print(e)
    ...:
0
2
4
6
8

In [24]: for e in even_gen:
    ...:     print(e)
    ...:

When the generator is iterated a second time, it yields nothing: all of its elements were consumed by the first iteration. The list produced by a list comprehension, by contrast, has the same contents on every iteration:

In [25]: even = [e for e in range(10) if e % 2 == 0]

In [26]: for e in even:
    ...:     print(e)
    ...:
0
2
4
6
8

In [27]: for e in even:
    ...:     print(e)
    ...:
0
2
4
6
8
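When several passes over lazily produced values are needed, two common options are to materialize the values into a list once, or to rebuild the generator for each pass. The sketch below illustrates both; make_even_gen is a hypothetical helper name, not something from the original article.

```python
def make_even_gen(limit):
    # Hypothetical helper: returns a brand-new generator on every call.
    return (e for e in range(limit) if e % 2 == 0)

even_gen = make_even_gen(10)
first_pass = list(even_gen)   # consumes the generator
second_pass = list(even_gen)  # already exhausted, so this is empty

# Option 1: materialize once and reuse the list as often as needed.
even = list(make_even_gen(10))

# Option 2: rebuild the generator for every pass.
total = sum(make_even_gen(10))
largest = max(make_even_gen(10))

print(first_pass, second_pass)  # [0, 2, 4, 6, 8] []
print(even, total, largest)     # [0, 2, 4, 6, 8] 20 8
```

Option 1 trades memory for convenience, while option 2 recomputes the elements on every pass.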

In summary, the recommendations are as follows:

  • If you need to iterate multiple times, use a list comprehension.
  • If the sequence is very large or infinite, use a generator expression.
  • In other scenarios, either works. Pick one based on the situation, and if you run into a speed or convenience problem, try the other.
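As an illustration of the infinite case, here is a small sketch that assumes itertools.count as the unbounded source and an arbitrary cutoff of 200. The equivalent list comprehension over count() would never finish.

```python
from itertools import count, takewhile

# count() yields 0, 1, 2, ... forever, so this generator never runs out.
even_squares = (n * n for n in count() if n % 2 == 0)

# takewhile stops pulling values at the first one that fails the test.
small = list(takewhile(lambda s: s < 200, even_squares))
print(small)  # [0, 4, 16, 36, 64, 100, 144, 196]
```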

Origin blog.csdn.net/weixin_73136678/article/details/128939402