Python concurrent.futures: thread pools and process pools

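        concurrent.futures is a high-level library that operates only at the "task" level, which means you no longer have to deal with synchronization or manage threads and processes yourself. A Future can be seen as an extension of the producer-consumer model. In the plain producer-consumer model, the producer does not care when the consumer finishes processing the data, nor what the result of that processing is; a Future adds a handle through which the submitter can check on the task and fetch its result later. You only need to create a thread or process pool with a given "max_workers", then submit tasks and collect the results. Another benefit: when threading and multiprocessing are used directly for multi-threaded or multi-process work, frequently creating and destroying threads or processes is expensive, whereas concurrent.futures keeps its own thread/process pool and reuses the workers, trading space for time.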
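The "create a pool, submit tasks, collect results" workflow described above can be sketched as follows. This is a minimal illustration, not code from the original post; the square function and the pool size of 3 are assumptions chosen only for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Hypothetical task, used only for illustration.
    return x * x

# The pool owns its worker threads; the caller only submits tasks
# and collects results.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(square, n) for n in range(5)]
    # Reading the futures in submission order keeps the results ordered.
    results = [f.result() for f in futures]

print(results)
```

Leaving the with block shuts the pool down and waits for outstanding tasks, so no explicit thread management is needed.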

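        concurrent.futures provides two executor classes: concurrent.futures.ThreadPoolExecutor (a thread pool), typically used for I/O-bound work, and concurrent.futures.ProcessPoolExecutor (a process pool), typically used for CPU-bound work. The split comes from Python's GIL (global interpreter lock): in CPython only one thread executes Python bytecode at a time, so multiple threads cannot use more than one CPU for Python code. That topic is not covered further here. The two classes are used in the same way.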
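A rough sketch of why a thread pool suits I/O-bound work: while a thread is blocked waiting, other threads can run. Here time.sleep stands in for a blocking I/O call, and the fake_io function and the 0.2 s delay are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    # Stand-in for a blocking I/O call; a sleeping thread releases the GIL.
    time.sleep(delay)
    return delay

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    # The four 0.2 s waits overlap instead of running back to back.
    results = list(executor.map(fake_io, [0.2] * 4))
elapsed = time.perf_counter() - start

print(results)
print("elapsed: {:.2f}s".format(elapsed))
```

Run sequentially, the four calls would take about 0.8 s; with four worker threads the total should be close to a single 0.2 s wait. For CPU-bound functions the threads would not overlap in this way, which is where ProcessPoolExecutor comes in.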

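        The commonly used methods and functions for ThreadPoolExecutor/ProcessPoolExecutor are listed below. Items 3 to 5 are methods of the Future object returned by submit(); wait() and as_completed() are module-level functions in concurrent.futures.

1. When constructing a ThreadPoolExecutor/ProcessPoolExecutor, pass the max_workers argument to set the maximum number of threads (or processes) that can run in the pool at the same time.

2. submit(self, fn, *args, **kwargs) submits a task (a function and its arguments) to the pool and returns a handle for that task, a Future object (comparable to a file handle or a drawing handle). Note that submit() does not block; it returns immediately.

3. done() reports whether the task has finished.

4. cancel() attempts to cancel a submitted task. If the task is already running in the pool, it cannot be cancelled.

5. result() returns the task's return value. The implementation shows that this method blocks until the task has finished.

6. wait(fs, timeout=None, return_when=ALL_COMPLETED) takes three arguments: fs is the sequence of tasks to wait for; timeout is the maximum time to wait, after which wait() returns even if some tasks have not finished; return_when is the condition under which wait() returns, ALL_COMPLETED by default, meaning it returns once every task has finished.

7. map(self, fn, *iterables, timeout=None, chunksize=1): fn is the function to run; *iterables accepts one or more iterables that supply its arguments; timeout has the same meaning as in wait(), but because map() returns the results of the tasks, a TimeoutError is raised if a result is not available within timeout.

8. as_completed(fs, timeout=None) returns an iterator that yields each task as soon as it completes, so results can be collected in completion order. Its docstring reads: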
    An iterator over the given futures that yields each as it completes.
    
    Args:
        fs: The sequence of Futures (possibly created by different Executors) to
            iterate over.
        timeout: The maximum number of seconds to wait. If None, then there
            is no limit on the wait time.
    
    Returns:
        An iterator that yields the given Futures as they complete (finished or
        cancelled). If any given Futures are duplicated, they will be returned
        once.
    
    Raises:
        TimeoutError: If the entire result iterator could not be generated
            before the given timeout.
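
The methods listed above can be tried together in one short script. This is a sketch under stated assumptions rather than code from the original post: slow_double is a hypothetical task, and the delays are arbitrary values chosen so that the completion order is predictable.

```python
import time
from concurrent.futures import (
    FIRST_COMPLETED,
    ThreadPoolExecutor,
    as_completed,
    wait,
)

def slow_double(x, delay):
    # Hypothetical task: sleep to simulate work, then return 2 * x.
    time.sleep(delay)
    return 2 * x

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns a Future right away; the task runs in the background.
    fast = executor.submit(slow_double, 1, 0.1)
    slow = executor.submit(slow_double, 2, 0.5)

    # wait() returns two sets: finished futures and unfinished ones.
    done, not_done = wait([fast, slow], return_when=FIRST_COMPLETED)
    fast_finished_first = fast in done and slow in not_done

    # result() blocks until the task has finished; done() is then True.
    slow_result = slow.result()
    slow_is_done = slow.done()

    # as_completed() yields futures in completion order.
    futures = [executor.submit(slow_double, 10, 0.3),
               executor.submit(slow_double, 20, 0.1)]
    completion_order = [f.result() for f in as_completed(futures)]

    # map() returns results in input order, whatever the completion order.
    mapped = list(executor.map(slow_double, [1, 2, 3], [0.3, 0.2, 0.1]))

# cancel() only succeeds while a task is still waiting in the queue.
with ThreadPoolExecutor(max_workers=1) as executor:
    running = executor.submit(slow_double, 1, 0.3)
    queued = executor.submit(slow_double, 2, 0.0)
    queued_cancelled = queued.cancel()    # still pending, so this succeeds
    time.sleep(0.1)                       # give the first task time to start
    running_cancelled = running.cancel()  # already running, so this fails

print(slow_result, completion_order, mapped)
print(queued_cancelled, running_cancelled)
```

The second pool uses a single worker on purpose: the second task has to wait in the queue, which is the only state in which cancel() can succeed.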

        The following compares the efficiency of ThreadPoolExecutor and ProcessPoolExecutor in a CPU-bound scenario:

import time
from concurrent.futures import ThreadPoolExecutor, as_completed, ProcessPoolExecutor

def get_fib(num):
    # Naive recursive Fibonacci, deliberately slow so the work is CPU-bound.
    if num < 3:
        return 1
    return get_fib(num - 1) + get_fib(num - 2)

def run_thread_pool(workers, fib_num):
    start_time = time.time()
    with ThreadPoolExecutor(workers) as thread_executor:
        # Submit one task per input; each submit() returns a Future immediately.
        tasks = [thread_executor.submit(get_fib, num) for num in range(fib_num)]
        # Collect results in completion order, which may differ from submission order.
        results = [task.result() for task in as_completed(tasks)]
        print(results)
        print("ThreadPoolExecutor spend time: {}s".format(time.time() - start_time))

def run_process_pool(workers, fib_num):
    start_time = time.time()
    with ProcessPoolExecutor(workers) as process_executor:
        # Same pattern as above; only the executor class changes.
        tasks = [process_executor.submit(get_fib, num) for num in range(fib_num)]
        results = [task.result() for task in as_completed(tasks)]
        print(results)
        print("ProcessPoolExecutor spend time: {}s".format(time.time() - start_time))

if __name__ == '__main__':
    # Run one benchmark at a time; swap the comments to time the thread pool.
    #run_thread_pool(6, 38)
    run_process_pool(6, 38)

The results are as follows:

[5, 2, 1, 1, 3, 1, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 28657, 17711, 121393, 75025, 46368, 317811, 196418, 514229, 1346269, 832040, 2178309, 3524578, 9227465, 5702887, 14930352, 24157817]
ThreadPoolExecutor spend time: 24.460843086242676s


[1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 144, 89, 377, 610, 987, 233, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 75025, 46368, 121393, 196418, 514229, 317811, 1346269, 832040, 2178309, 5702887, 3524578, 9227465, 14930352, 24157817]
ProcessPoolExecutor spend time: 15.908910274505615s
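
Both result lists above are slightly out of order because as_completed() yields tasks in completion order, not submission order. One common way to keep track of which input produced which result is to store each Future in a dict keyed to its input. The sketch below assumes a small input range (1 to 10) and a thread pool of 4 workers so that it runs quickly; these values are not taken from the benchmark.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def get_fib(num):
    # Same naive recursive Fibonacci as in the benchmark.
    if num < 3:
        return 1
    return get_fib(num - 1) + get_fib(num - 2)

with ThreadPoolExecutor(max_workers=4) as executor:
    # Remember which input each Future belongs to.
    future_to_num = {executor.submit(get_fib, n): n for n in range(1, 11)}
    fib_by_num = {}
    for future in as_completed(future_to_num):
        fib_by_num[future_to_num[future]] = future.result()

# Rebuild the list in input order, regardless of completion order.
ordered = [fib_by_num[n] for n in sorted(fib_by_num)]
print(ordered)
```

If only ordered results are needed, executor.map(get_fib, range(1, 11)) would give the same list without the bookkeeping, at the cost of not seeing early results as they arrive.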

Reposted from blog.csdn.net/Gordennizaicunzai/article/details/104380281