In addition to the multithreading module `threading`, Python provides the simpler and easier-to-use `concurrent.futures` module. It offers the `ThreadPoolExecutor` and `ProcessPoolExecutor` classes, which are convenient to use and make programs look more concise. I personally think it is one of the modules most worth learning and using, as it covers most day-to-day multi-threading scenarios.
This article walks through the `concurrent.futures` module with several examples.
Environment of this article
- Python 3.7
thread pool executor
Let's start with `ThreadPoolExecutor`. As the name suggests, it is an executor that runs submitted tasks in a pool of threads. In the following example, we create a `ThreadPoolExecutor` that runs at most 5 threads in parallel, and pass `say_hello_to` together with each argument to the executor by calling `submit`:
```python
from concurrent.futures import ThreadPoolExecutor

def say_hello_to(name):
    print(name)

names = ['John', 'Ben', 'Bill', 'Alex', 'Jenny']

with ThreadPoolExecutor(max_workers=5) as executor:
    for n in names:
        executor.submit(say_hello_to, n)
```
The execution result of the above example:
```
John
Ben
Bill
Alex
Jenny
```
If you run the example above several times, you may occasionally see the strings jumbled together, similar to the output below. This happens because multiple threads try to print at the same time; it is not a mysterious problem, and a later example in this article will address it.
```
John
BenBill
Alex
Jenny
```
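Before getting to that example, here is a quick sketch of the other common workaround: guard the `print` call with a lock so that only one thread writes at a time. The lock name `print_lock` is my own choice; any shared `threading.Lock` works.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

print_lock = Lock()  # shared by all worker threads

def say_hello_to(name):
    # only one thread may print at a time, so lines never interleave
    with print_lock:
        print(name)

names = ['John', 'Ben', 'Bill', 'Alex', 'Jenny']

with ThreadPoolExecutor(max_workers=5) as executor:
    for n in names:
        executor.submit(say_hello_to, n)
```

The trade-off is that the threads serialize on the lock while printing, which is fine for logging but defeats the purpose if the locked section is the bulk of the work.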
future object
Next, let's talk about a very important role in the `concurrent.futures` module: `Future`. When `submit` is called, what it returns is not the result of the function being executed in the thread, but an instance of `Future`. This instance is a proxy for the execution result, so we can query the state of the task through methods such as `done`, `running`, and `cancelled`. Once the task has entered the done state, we can call `result` to get the result.
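As a minimal sketch of inspecting a `Future` by hand (the `slow_hello` helper is my own, slowed down so the state change is observable):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_hello(name):
    # hypothetical helper that takes a moment to finish
    time.sleep(0.1)
    return f'Hi, {name}'

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(slow_hello, 'John')
    print(future.done())      # most likely False while the task still sleeps
    result = future.result()  # blocks until the result is ready
    print(result)
    print(future.done())      # True once the result is available
```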
Python also provides a simpler way, `as_completed`, to check the status, so you can write less code.
So the previous example can be further modified as follows:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def say_hello_to(name):
    return f'Hi, {name}'

names = ['John', 'Ben', 'Bill', 'Alex', 'Jenny']

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = []
    for n in names:
        future = executor.submit(say_hello_to, n)
        print(type(future))
        futures.append(future)

    for future in as_completed(futures):
        print(future.result())
```
In the above example, each `future` instance returned by `submit` is appended to the `futures` list. Then `as_completed(futures)` yields each `future` as it finishes, and `result()` retrieves the result so it can be printed.
Its execution results are as follows:
```
<class 'concurrent.futures._base.Future'>
<class 'concurrent.futures._base.Future'>
<class 'concurrent.futures._base.Future'>
<class 'concurrent.futures._base.Future'>
<class 'concurrent.futures._base.Future'>
Hi, Jenny
Hi, Bill
Hi, Ben
Hi, John
Hi, Alex
```
Since the printing has been moved out of the threads, this also fixes the problem of printed strings sticking together.
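One more detail worth knowing about `Future`: if the submitted function raises, the exception is captured and re-raised when `result()` is called, or it can be inspected without re-raising via `exception()`. A small sketch, with a deliberately failing function of my own:

```python
from concurrent.futures import ThreadPoolExecutor

def fail(name):
    # hypothetical function that always fails, for demonstration
    raise ValueError(f'no greeting for {name}')

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(fail, 'John')
    try:
        future.result()        # re-raises the worker's exception here
    except ValueError as e:
        print(e)
    print(future.exception())  # or inspect it without re-raising
```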
Besides using `submit()` to obtain `Future` instances and then checking their status and collecting results one by one, you can also use the `map()` method to get the threads' execution results directly, as in the following example:
```python
from concurrent.futures import ThreadPoolExecutor

def say_hello_to(name):
    # busy-wait a little to simulate some work
    for i in range(100000):
        pass
    return f'Hi, {name}'

names = ['John', 'Ben', 'Bill', 'Alex', 'Jenny']

with ThreadPoolExecutor(max_workers=5) as executor:
    results = executor.map(say_hello_to, names)
    for r in results:
        print(r)
```
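Note one difference from `as_completed`: `map()` yields results in the order of the inputs, regardless of which thread finishes first. A quick check:

```python
from concurrent.futures import ThreadPoolExecutor

def say_hello_to(name):
    return f'Hi, {name}'

names = ['John', 'Ben', 'Bill', 'Alex', 'Jenny']

with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(say_hello_to, names))

# map() preserves the input order, no matter which thread finished first
print(results)
```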
process pool executor
`ProcessPoolExecutor` is used in exactly the same way as `ThreadPoolExecutor`, so you can basically choose between them according to your needs.
However, it is worth noting that since Python 3.5, `map()` has an additional `chunksize` parameter, and this parameter is only effective for `ProcessPoolExecutor`. It can improve the performance of `ProcessPoolExecutor` when processing long iterables.
When using `ProcessPoolExecutor`, this method chops the iterable into a number of chunks which it submits to the pool as separate tasks. The (approximate) size of these chunks can be specified by setting `chunksize` to a positive integer. For very long iterables, using a large value for `chunksize` can significantly improve performance compared to the default size of 1. For `ThreadPoolExecutor`, `chunksize` has no effect.
We can make the `names` list from the previous example 1000 times longer and then test the performance with different `chunksize` settings:
```python
from concurrent.futures import ProcessPoolExecutor

def say_hello_to(name):
    return f'Hi, {name}'

names = ['John', 'Ben', 'Bill', 'Alex', 'Jenny'] * 1000

with ProcessPoolExecutor(max_workers=4) as executor:
    results = executor.map(say_hello_to, names)
```
The following uses `%timeit` in Jupyter to measure its performance:
```python
%timeit with ProcessPoolExecutor(max_workers=4) as executor: executor.map(say_hello_to, names, chunksize=6)
```
The measurements show that as `chunksize` increases, the average execution time gets shorter and shorter. But the gains are not unlimited: beyond a certain value the speedup starts to level off, so choosing a `chunksize` still takes some thought.
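Outside Jupyter, a rough equivalent of the `%timeit` comparison can be sketched with `time.perf_counter`; the chunk sizes tried below are arbitrary, and the `__main__` guard is needed because `ProcessPoolExecutor` re-imports the module in child processes on some platforms:

```python
import time
from concurrent.futures import ProcessPoolExecutor

def say_hello_to(name):
    return f'Hi, {name}'

names = ['John', 'Ben', 'Bill', 'Alex', 'Jenny'] * 1000

if __name__ == '__main__':
    for chunksize in (1, 10, 100):
        start = time.perf_counter()
        with ProcessPoolExecutor(max_workers=4) as executor:
            # list() forces all results to be collected before timing stops
            list(executor.map(say_hello_to, names, chunksize=chunksize))
        elapsed = time.perf_counter() - start
        print(f'chunksize={chunksize}: {elapsed:.3f}s')
```

Timing a single run like this is noisier than `%timeit`, which repeats the measurement and averages, so treat the numbers as a rough comparison only.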