This article shows how to use the Python multiprocessing module to run code in parallel.
Introduction to Python multiprocessing
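Generally, programs deal with two types of tasks:
- I/O-bound tasks: a task that spends most of its time on input/output operations is called an I/O-bound task. Typical examples are reading from files, writing to files, connecting to databases, and making network requests. For I/O-bound tasks, you can use multithreading to speed them up.
- CPU-bound tasks: a task that spends most of its time computing on the CPU is called a CPU-bound task. Examples include number crunching, image resizing, and video streaming. To speed up a program with many CPU-bound tasks, you can use multiprocessing.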
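As a rough illustration of the first bullet, the sketch below simulates I/O-bound work with `time.sleep()` and runs it in a thread pool. The names `fake_io_task` and `run_with_threads` are made up for this example, and the sleep is only a stand-in for real file or network waits, so treat the timing as indicative rather than as a benchmark.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(delay=0.2):
    """Hypothetical stand-in for an I/O call: the thread just waits."""
    time.sleep(delay)
    return delay


def run_with_threads(n_tasks=4, delay=0.2):
    """Run n_tasks simulated I/O tasks in a thread pool; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as executor:
        # Consume the iterator so every task has finished before timing stops
        list(executor.map(fake_io_task, [delay] * n_tasks))
    return time.perf_counter() - start


if __name__ == '__main__':
    elapsed = run_with_threads()
    print(f'4 simulated I/O tasks took {elapsed:.2f} second(s) with threads')
```

Because the threads wait at the same time, the total should stay close to a single delay rather than the sum of all delays.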
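Multiprocessing allows two or more processors to work on two or more different parts of a program at the same time. In Python, you use the multiprocessing module to implement multiprocessing.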
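How much a program can gain from multiprocessing depends in part on how many CPUs the machine has. A minimal way to check, assuming the standard `os.cpu_count()` call is adequate for your platform (`available_cores` is a hypothetical helper name):

```python
import os


def available_cores():
    """Return the number of logical CPUs reported by the OS (at least 1)."""
    # os.cpu_count() may return None when the count cannot be determined
    return os.cpu_count() or 1


if __name__ == '__main__':
    print(f'This machine reports {available_cores()} logical CPU(s)')
```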
Python multiprocessing example
See the following program:

    import time


    def task(n=100_000_000):
        # CPU-bound busy loop: count n down to zero
        while n:
            n -= 1


    if __name__ == '__main__':
        start = time.perf_counter()
        # Run the task twice, one call after the other
        task()
        task()
        finish = time.perf_counter()
        print(f'It took {finish - start:.2f} second(s) to finish')

Output:

    It took 7.64 second(s) to finish

Using the multiprocessing module
The following program improves the code above by using the multiprocessing module:

    import time
    import multiprocessing


    def task(n=100_000_000):
        # CPU-bound busy loop: count n down to zero
        while n:
            n -= 1


    if __name__ == '__main__':
        start = time.perf_counter()

        # Create one process per call to task()
        p1 = multiprocessing.Process(target=task)
        p2 = multiprocessing.Process(target=task)

        # Start both processes so they run at the same time
        p1.start()
        p2.start()

        # Wait for both processes to finish
        p1.join()
        p2.join()

        finish = time.perf_counter()
        print(f'It took {finish - start:.2f} second(s) to finish')

Output:

    It took 4.29 second(s) to finish

The running time drops considerably, from 7.64 seconds to 4.29 seconds.
First, import the multiprocessing module:

    import multiprocessing

Second, create two processes and pass the task function to each of them:

    p1 = multiprocessing.Process(target=task)
    p2 = multiprocessing.Process(target=task)

Note that the Process() constructor returns a new Process object.
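To see what the constructor gives you before anything runs, here is a small sketch. It uses the `args` and `name` parameters of `Process`; the helper `make_process` and the label `'countdown-worker'` are hypothetical names chosen only for illustration.

```python
import multiprocessing


def task(n=100_000_000):
    # CPU-bound busy loop: count n down to zero
    while n:
        n -= 1


def make_process(n):
    """Hypothetical helper: build (but do not start) a process that runs task(n)."""
    # args is passed positionally to task(); name is an optional label
    return multiprocessing.Process(target=task, args=(n,), name='countdown-worker')


if __name__ == '__main__':
    p = make_process(1_000)
    print(p.name)        # the label chosen above
    print(p.is_alive())  # False: the process has not been started yet
    print(p.pid)         # None until start() is called
```

Creating the object does not run anything; the operating system process only exists once `start()` is called.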
Third, call the start() method of each Process object to start the process:

    p1.start()
    p2.start()

Finally, wait for the processes to complete by calling the join() method:

    p1.join()
    p2.join()

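Putting start() and join() together, the following sketch follows one process through its life cycle. It assumes a short sleeping task (`short_task` and `run_and_report` are hypothetical names, not part of the program above) and relies on the `is_alive()` method and the `exitcode` attribute of `Process`.

```python
import multiprocessing
import time


def short_task(seconds=0.2):
    """Hypothetical stand-in for real work: just wait a moment."""
    time.sleep(seconds)


def run_and_report(seconds=0.2):
    """Start one process, wait for it, and report its state along the way."""
    p = multiprocessing.Process(target=short_task, args=(seconds,))
    p.start()
    alive_after_start = p.is_alive()  # normally True while the child still sleeps
    p.join()                          # block until the child has exited
    # After join(), the process is no longer alive and exitcode is set;
    # exitcode is 0 when the target function returned normally
    return alive_after_start, p.is_alive(), p.exitcode


if __name__ == '__main__':
    print(run_and_report())
```

Checking `exitcode` after `join()` is one way to notice that a child process failed, since a non-zero value indicates it did not finish normally.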
Practical Python multiprocessing example
We'll use the multiprocessing module to resize high-resolution images.
First, install the Pillow image processing library:

    pip install Pillow

Second, develop a program that creates thumbnails of the pictures in the images folder and saves them to the thumbs folder:

    import time
    import os
    from PIL import Image, ImageFilter

    filenames = [
        'images/1.jpg',
        'images/2.jpg',
        'images/3.jpg',
        'images/4.jpg',
        'images/5.jpg',
    ]


    def create_thumbnail(filename, size=(50, 50), thumb_dir='thumbs'):
        # Open the image
        img = Image.open(filename)
        # Apply a Gaussian blur
        img = img.filter(ImageFilter.GaussianBlur())
        # Shrink the image in place so it fits within size
        img.thumbnail(size)
        # Save it under thumb_dir with the original file name
        img.save(f'{thumb_dir}/{os.path.basename(filename)}')
        print(f'{filename} was processed...')


    if __name__ == '__main__':
        start = time.perf_counter()
        # Process the images one after another
        for filename in filenames:
            create_thumbnail(filename)
        finish = time.perf_counter()
        print(f'It took {finish - start:.2f} second(s) to finish')

On our computer, it took about 1.28 seconds to complete:

    images/1.jpg was processed...
    images/2.jpg was processed...
    images/3.jpg was processed...
    images/4.jpg was processed...
    images/5.jpg was processed...
    It took 1.28 second(s) to finish

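The program above assumes that the thumbs folder already exists; saving into a missing folder fails with a FileNotFoundError. One way to guard against that, sketched here with a hypothetical helper named `ensure_dir`, is to create the folder before processing any image:

```python
import os


def ensure_dir(path='thumbs'):
    """Create the output folder if it is missing and return its path."""
    # exist_ok=True makes the call safe to repeat across runs
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == '__main__':
    ensure_dir('thumbs')
    print('Output folder is ready')
```

Calling it once at the top of the main block, before the loop over the file names, would be enough.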
Third, modify the program to use multiprocessing. Each process creates a thumbnail for one picture:

    import time
    import os
    import multiprocessing
    from PIL import Image, ImageFilter

    filenames = [
        'images/1.jpg',
        'images/2.jpg',
        'images/3.jpg',
        'images/4.jpg',
        'images/5.jpg',
    ]


    def create_thumbnail(filename, size=(50, 50), thumb_dir='thumbs'):
        # Open the image
        img = Image.open(filename)
        # Apply a Gaussian blur
        img = img.filter(ImageFilter.GaussianBlur())
        # Shrink the image in place so it fits within size
        img.thumbnail(size)
        # Save it under thumb_dir with the original file name
        img.save(f'{thumb_dir}/{os.path.basename(filename)}')
        print(f'{filename} was processed...')


    if __name__ == '__main__':
        start = time.perf_counter()

        # Create one process per image
        processes = [
            multiprocessing.Process(target=create_thumbnail, args=(filename,))
            for filename in filenames
        ]

        # Start the processes
        for process in processes:
            process.start()

        # Wait for completion
        for process in processes:
            process.join()

        finish = time.perf_counter()
        print(f'It took {finish - start:.2f} second(s) to finish')

Output:

    images/5.jpg was processed...
    images/4.jpg was processed...
    images/1.jpg was processed...
    images/3.jpg was processed...
    images/2.jpg was processed...
    It took 0.82 second(s) to finish

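In this case, the output shows that the program processed the pictures faster: 0.82 seconds instead of 1.28 seconds. The pictures also finish in a different order from the input list, because the processes run concurrently.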
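Starting one process per image works for five files but may not scale well to hundreds. One possible alternative, not used in the program above, is `multiprocessing.Pool`, which reuses a fixed set of worker processes. The sketch below assumes a CPU-bound stand-in function, `count_down`, in place of `create_thumbnail` so that it runs without Pillow or image files; the workloads are arbitrary values chosen for illustration.

```python
import multiprocessing


def count_down(n):
    """Hypothetical CPU-bound stand-in: count n down to zero, return the input."""
    start = n
    while n:
        n -= 1
    return start


if __name__ == '__main__':
    workloads = [200_000, 300_000, 400_000, 500_000, 600_000]
    # With no argument, Pool picks a default worker count based on available CPUs
    with multiprocessing.Pool() as pool:
        # map() distributes the inputs among the workers and
        # returns the results in the same order as the inputs
        results = pool.map(count_down, workloads)
    print(results)
```

With a pool, the number of running processes stays bounded no matter how many items are in the list, at the cost of a little more setup than plain `Process` objects.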
Summary
- Use Python multiprocessing to run code in parallel to handle CPU-bound tasks.