Python: Combining multiprocessing and Asyncio to improve performance


Introduction

Because of the GIL, using multiple threads to perform CPU-intensive tasks has never been a viable option in Python. With multi-core CPUs now ubiquitous, Python offers a multiprocessing solution for CPU-intensive work. But even so, there are some problems with using the multi-process APIs directly.

Before we start [1], here is a small piece of code to help with the demonstration:

import time
from multiprocessing import Process


def sum_to_num(final_num: int) -> int:
    start = time.monotonic()

    result = 0
    for i in range(0, final_num + 1):
        result += i

    print(f"The method with {final_num} completed in {time.monotonic() - start:.2f} second(s).")
    return result

This function takes one argument and sums all the integers from 0 up to it, printing its execution time and returning the result.
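For comparison, here is a minimal serial baseline sketch (an illustration added here, not from the original article): running the same two calls one after another in a single process, where the total runtime is simply the sum of both.

if __name__ == "__main__":
    # Serial baseline: the second call cannot start until the first finishes.
    sum_to_num(200_000_000)
    sum_to_num(50_000_000)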

Problems with multiple processes

def main():
    # We initialize the two processes with two parameters, from largest to smallest
    process_a = Process(target=sum_to_num, args=(200_000_000,))
    process_b = Process(target=sum_to_num, args=(50_000_000,))

    # And then let them start executing
    process_a.start()
    process_b.start()

    # Note that join is blocking, so we wait for each process sequentially
    start_a = time.monotonic()
    process_a.join()
    print(f"Process_a completed in {time.monotonic() - start_a:.2f} seconds")

    # By the time process_a's join returns, process_b has already finished,
    # so its own join returns immediately and the timer reads roughly 0 seconds.
    start_b = time.monotonic()
    process_b.join()
    print(f"Process_b completed in {time.monotonic() - start_b:.2f} seconds")

As the code shows, we directly create and start multiple processes, calling each process's start and join methods. However, there are a few problems here:

  1. The join method cannot return the result of the task's execution (a common workaround is sketched below).
  2. The join method blocks the main process, and the joins execute sequentially.

Even if a later task finishes faster than an earlier one, we still have to wait for the earlier join to return, as shown in the following diagram:

[Diagram: process_a's join blocks even though process_b has already finished]
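As for the first problem, since join cannot hand back the task's result, a common workaround is to pass it through a multiprocessing.Queue. A minimal sketch (the wrapper function here is hypothetical, not part of the original code):

from multiprocessing import Process, Queue


def sum_to_num_queued(final_num, queue):
    # Wrap sum_to_num and push its result onto the queue,
    # since join itself cannot return a value.
    queue.put(sum_to_num(final_num))


def main():
    queue = Queue()
    process = Process(target=sum_to_num_queued, args=(50_000_000, queue))
    process.start()
    result = queue.get()  # blocks until the child puts its result
    process.join()
    print(f"Got {result} from the child process.")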

Problems using the pool

If we use multiprocessing.Pool, there are also some problems:

from multiprocessing import Pool


def main():
    with Pool() as pool:
        result_a = pool.apply(sum_to_num, args=(200_000_000,))
        result_b = pool.apply(sum_to_num, args=(50_000_000,))

        print(f"sum_to_num with 200_000_000 got a result of {result_a}.")
        print(f"sum_to_num with 50_000_000 got a result of {result_b}.")

As the code shows, Pool's apply method is synchronous: each apply call must finish before the next one can start.

[Diagram: apply runs the two tasks one after another]

Of course, we can use the apply_async method to create tasks asynchronously. But then we need the get method to fetch the results, and get blocks, which brings us back to the problem of the join method:

def main():
    with Pool() as pool:
        result_a = pool.apply_async(sum_to_num, args=(200_000_000,))
        result_b = pool.apply_async(sum_to_num, args=(50_000_000,))

        print(f"sum_to_num with 200_000_000 got a result of {result_a.get()}.")
        print(f"sum_to_num with 50_000_000 got a result of {result_b.get()}.")
[Diagram: apply_async submits both tasks, but get still blocks in submission order]
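Alternatively, apply_async accepts a callback that fires in the parent as each task completes, so we never block on get at all. A minimal sketch of that alternative (not the route this article takes):

def main():
    with Pool() as pool:
        for number in [200_000_000, 50_000_000]:
            # The callback is invoked as soon as each task completes,
            # so we do not block on get in submission order.
            pool.apply_async(
                sum_to_num,
                args=(number,),
                callback=lambda result: print(f"Got a result of {result}."),
            )
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for all outstanding tasks to finish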

Problems with using ProcessPoolExecutor directly

So, what if we use concurrent.futures.ProcessPoolExecutor to execute our CPU-bound tasks?

from concurrent.futures import ProcessPoolExecutor


def main():
    with ProcessPoolExecutor() as executor:
        numbers = [200_000_000, 50_000_000]
        for result in executor.map(sum_to_num, numbers):
            print(f"sum_to_num got a result which is {result}.")

As you can see, the code looks great, and the iteration reads just like asyncio.as_completed. But look at the results: they are still fetched in startup order. This is quite different from asyncio.as_completed, which yields results in the order they complete:

[Diagram: executor.map returns results in startup order]
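To be fair, concurrent.futures can also yield results in completion order if we combine submit with its own as_completed function; a minimal sketch:

from concurrent.futures import ProcessPoolExecutor, as_completed


def main():
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(sum_to_num, number)
                   for number in [200_000_000, 50_000_000]]
        # as_completed yields each future the moment it finishes,
        # so the smaller task's result comes back first.
        for future in as_completed(futures):
            print(f"sum_to_num got a result which is {future.result()}.")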

Fixing it with asyncio's run_in_executor

Fortunately, just as we can use asyncio to handle IO-bound tasks, its run_in_executor method lets us invoke multiprocessing tasks in the same way as asyncio ones. This not only unifies the APIs for concurrency and parallelism, it also solves the various problems we ran into above:

import asyncio
from concurrent.futures import ProcessPoolExecutor


async def main():
    loop = asyncio.get_running_loop()
    tasks = []

    with ProcessPoolExecutor() as executor:
        for number in [200_000_000, 50_000_000]:
            tasks.append(loop.run_in_executor(executor, sum_to_num, number))

        # Or we could simply use asyncio.gather(*tasks)
        for done in asyncio.as_completed(tasks):
            result = await done
            print(f"sum_to_num got a result which is {result}")
[Diagram: as_completed yields the faster task's result first]
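For completeness, here is a sketch of how the coroutine above would be launched, along with the gather-based variant mentioned in the code comment (the function name main_gather is made up for illustration):

async def main_gather():
    # Variant using asyncio.gather: results come back in submission order,
    # but both tasks still run in parallel across processes.
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as executor:
        tasks = [loop.run_in_executor(executor, sum_to_num, number)
                 for number in [200_000_000, 50_000_000]]
        print(await asyncio.gather(*tasks))


if __name__ == "__main__":
    asyncio.run(main())  # or asyncio.run(main_gather())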

Since the example code in the previous article only simulated the concurrent calls we would make, many readers still needed help applying it in real code after studying it. So, now that we understand why we need to run CPU-bound parallel tasks with asyncio, a follow-up will use a real-world example to explain how to handle IO-bound and CPU-bound tasks with asyncio at the same time, and show the efficiency asyncio brings to our code.

Reference

[1] Source: https://towardsdatascience.com/combining-multiprocessing-and-asyncio-in-python-for-performance-boosts-15496ffe96b

