concurrent.futures is a high-level library that operates at the "task" level, meaning you no longer need to manage synchronization or the threads/processes yourself. A Future is an extension of the producer-consumer model: the producer does not care when the consumer finishes processing the data, nor about the result of that processing. You simply create a thread/process pool with a given max_workers count, then submit tasks and collect the results. Another advantage: when using the threading and multiprocessing modules directly, frequently creating and destroying threads or processes is expensive, whereas concurrent.futures maintains its own thread/process pool, trading space for time.
concurrent.futures provides two executor classes: concurrent.futures.ThreadPoolExecutor (a thread pool), usually used for I/O-bound workloads, and concurrent.futures.ProcessPoolExecutor (a process pool), usually used for CPU-bound workloads. This split comes from Python's GIL: multiple threads in one process can only use a single CPU core, which I won't elaborate on here. The two classes share the same interface.
The commonly used methods of ThreadPoolExecutor/ProcessPoolExecutor are:
1. When constructing a ThreadPoolExecutor/ProcessPoolExecutor instance, pass the max_workers argument to set the maximum number of threads/processes that can run concurrently in the pool.
2. submit(fn, *args, **kwargs) submits a task (a callable plus its arguments) to the pool and returns a Future, a handle to the task (much like a file handle). Note that submit() does not block; it returns immediately.
3. done() reports whether the task has finished.
4. cancel() attempts to cancel a submitted task; if the task is already running in the pool, it cannot be cancelled.
5. result() returns the task's return value. A look at the source shows this method blocks until the result is available.
6. wait(fs, timeout=None, return_when=ALL_COMPLETED) takes three arguments: fs is the sequence of futures to wait on; timeout is the maximum time to wait, after which wait() returns even if some tasks are unfinished; return_when sets the condition for returning, defaulting to ALL_COMPLETED (return only after all tasks have finished).
7. map(fn, *iterables, timeout=None, chunksize=1): the first argument fn is the function each task runs; the second argument is an iterable of inputs; timeout behaves like wait()'s timeout, but since map() returns the tasks' results, a TimeoutError is raised if the tasks run longer than timeout.
8. as_completed(fs, timeout=None) yields the tasks' results one by one as they finish. From its docstring:
An iterator over the given futures that yields each as it completes.
Args:
fs: The sequence of Futures (possibly created by different Executors) to
iterate over.
timeout: The maximum number of seconds to wait. If None, then there
is no limit on the wait time.
Returns:
An iterator that yields the given Futures as they complete (finished or
cancelled). If any given Futures are duplicated, they will be returned
once.
Raises:
TimeoutError: If the entire result iterator could not be generated
before the given timeout.
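A minimal sketch tying these calls together (the square function and its 0.1 s sleep are placeholder work, not part of the benchmark below):

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def square(x):
    time.sleep(0.1)  # simulate some work
    return x * x

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns immediately with a Future as the task handle.
    futures = [executor.submit(square, n) for n in range(4)]
    # wait() blocks until the return_when condition is satisfied.
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    # result() blocks until this particular task has finished.
    print(futures[0].result())  # 0
    print(futures[0].done())    # True
```

cancel() would only succeed on a future that is still queued, e.g. one of the last two tasks here while both workers are busy.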
Below we compare the efficiency of ThreadPoolExecutor and ProcessPoolExecutor on a CPU-bound workload:
import time
from concurrent.futures import ThreadPoolExecutor, as_completed, ProcessPoolExecutor


def get_fib(num):
    if num < 3:
        return 1
    return get_fib(num - 1) + get_fib(num - 2)


def run_thread_pool(workers, fib_num):
    start_time = time.time()
    with ThreadPoolExecutor(workers) as thread_executor:
        tasks = [thread_executor.submit(get_fib, num) for num in range(fib_num)]
        results = [task.result() for task in as_completed(tasks)]
        print(results)
    print("ThreadPoolExecutor spend time: {}s".format(time.time() - start_time))


def run_process_pool(workers, fib_num):
    start_time = time.time()
    with ProcessPoolExecutor(workers) as process_executor:
        tasks = [process_executor.submit(get_fib, num) for num in range(fib_num)]
        results = [task.result() for task in as_completed(tasks)]
        print(results)
    print("ProcessPoolExecutor spend time: {}s".format(time.time() - start_time))


if __name__ == '__main__':
    # run_thread_pool(6, 38)
    run_process_pool(6, 38)
The results:
[5, 2, 1, 1, 3, 1, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 28657, 17711, 121393, 75025, 46368, 317811, 196418, 514229, 1346269, 832040, 2178309, 3524578, 9227465, 5702887, 14930352, 24157817]
ThreadPoolExecutor spend time: 24.460843086242676s
[1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 144, 89, 377, 610, 987, 233, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 75025, 46368, 121393, 196418, 514229, 317811, 1346269, 832040, 2178309, 5702887, 3524578, 9227465, 14930352, 24157817]
ProcessPoolExecutor spend time: 15.908910274505615s
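Note that the result lists above are out of order: as_completed yields futures in completion order, not submission order. Executor.map, by contrast, returns results in the order of the input iterable. A minimal sketch reusing get_fib:

```python
from concurrent.futures import ThreadPoolExecutor

def get_fib(num):
    if num < 3:
        return 1
    return get_fib(num - 1) + get_fib(num - 2)

with ThreadPoolExecutor(max_workers=6) as executor:
    # map() yields results in input order, regardless of
    # which task happens to finish first.
    ordered = list(executor.map(get_fib, range(1, 11)))
print(ordered)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```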