concurrent.futures is a high-level library that operates at the "task" level, meaning you no longer need to manage synchronization or the threads/processes yourself. A Future is an extension of the producer-consumer model: the producer does not care when the consumer finishes processing the data, nor about the result of that processing. You simply create a thread/process pool with a given max_workers count, then submit tasks and collect the results. Another advantage: when using the threading and multiprocessing modules directly, frequently creating and destroying threads or processes is expensive, whereas concurrent.futures maintains its own thread/process pool, trading space for time.
concurrent.futures provides two executor classes: concurrent.futures.ThreadPoolExecutor (a thread pool), usually used for I/O-bound workloads, and concurrent.futures.ProcessPoolExecutor (a process pool), usually used for CPU-bound workloads. This split comes from Python's GIL: multiple threads in one process can only use a single CPU core, which I won't elaborate on here. The two classes share the same interface.
The commonly used methods of ThreadPoolExecutor/ProcessPoolExecutor are:
1. When constructing a ThreadPoolExecutor/ProcessPoolExecutor instance, pass the max_workers argument to set the maximum number of threads/processes that can run concurrently in the pool.
2. submit(fn, *args, **kwargs) submits a task (a callable plus its arguments) to the pool and returns a Future, a handle to the task (much like a file handle). Note that submit() does not block; it returns immediately.
3. done() reports whether the task has finished.
4. cancel() attempts to cancel a submitted task; if the task is already running in the pool, it cannot be cancelled.
5. result() returns the task's return value. A look at the source shows this method blocks until the result is available.
6. wait(fs, timeout=None, return_when=ALL_COMPLETED) takes three arguments: fs is the sequence of futures to wait on; timeout is the maximum time to wait, after which wait() returns even if some tasks are unfinished; return_when sets the condition for returning, defaulting to ALL_COMPLETED (return only after all tasks have finished).
7. map(fn, *iterables, timeout=None, chunksize=1): the first argument fn is the function each task runs; the second argument is an iterable of inputs; timeout behaves like wait()'s timeout, but since map() returns the tasks' results, a TimeoutError is raised if the tasks run longer than timeout.
8. as_completed(fs, timeout=None) yields the tasks' results one by one as they finish. From its docstring:
An iterator over the given futures that yields each as it completes.
Args:
fs: The sequence of Futures (possibly created by different Executors) to
iterate over.
timeout: The maximum number of seconds to wait. If None, then there
is no limit on the wait time.
Returns:
An iterator that yields the given Futures as they complete (finished or
cancelled). If any given Futures are duplicated, they will be returned
once.
Raises:
TimeoutError: If the entire result iterator could not be generated
before the given timeout.
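A minimal sketch tying these calls together (the square function and its 0.1 s sleep are placeholder work, not part of the benchmark below):

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def square(x):
    time.sleep(0.1)  # simulate some work
    return x * x

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns immediately with a Future as the task handle.
    futures = [executor.submit(square, n) for n in range(4)]
    # wait() blocks until the return_when condition is satisfied.
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    # result() blocks until this particular task has finished.
    print(futures[0].result())  # 0
    print(futures[0].done())    # True
```

cancel() would only succeed on a future that is still queued, e.g. one of the last two tasks here while both workers are busy.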
Below we compare the efficiency of ThreadPoolExecutor and ProcessPoolExecutor on a CPU-bound workload:
import time
from concurrent.futures import ThreadPoolExecutor, as_completed, ProcessPoolExecutor


def get_fib(num):
    if num < 3:
        return 1
    return get_fib(num - 1) + get_fib(num - 2)


def run_thread_pool(workers, fib_num):
    start_time = time.time()
    with ThreadPoolExecutor(workers) as thread_executor:
        tasks = [thread_executor.submit(get_fib, num) for num in range(fib_num)]
        results = [task.result() for task in as_completed(tasks)]
        print(results)
    print("ThreadPoolExecutor spend time: {}s".format(time.time() - start_time))


def run_process_pool(workers, fib_num):
    start_time = time.time()
    with ProcessPoolExecutor(workers) as process_executor:
        tasks = [process_executor.submit(get_fib, num) for num in range(fib_num)]
        results = [task.result() for task in as_completed(tasks)]
        print(results)
    print("ProcessPoolExecutor spend time: {}s".format(time.time() - start_time))


if __name__ == '__main__':
    # run_thread_pool(6, 38)
    run_process_pool(6, 38)
The results:
[5, 2, 1, 1, 3, 1, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 28657, 17711, 121393, 75025, 46368, 317811, 196418, 514229, 1346269, 832040, 2178309, 3524578, 9227465, 5702887, 14930352, 24157817]
ThreadPoolExecutor spend time: 24.460843086242676s
[1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 144, 89, 377, 610, 987, 233, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 75025, 46368, 121393, 196418, 514229, 317811, 1346269, 832040, 2178309, 5702887, 3524578, 9227465, 14930352, 24157817]
ProcessPoolExecutor spend time: 15.908910274505615s
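Note that the result lists above are out of order: as_completed yields futures in completion order, not submission order. Executor.map, by contrast, returns results in the order of the input iterable. A minimal sketch reusing get_fib:

```python
from concurrent.futures import ThreadPoolExecutor

def get_fib(num):
    if num < 3:
        return 1
    return get_fib(num - 1) + get_fib(num - 2)

with ThreadPoolExecutor(max_workers=6) as executor:
    # map() yields results in input order, regardless of
    # which task happens to finish first.
    ordered = list(executor.map(get_fib, range(1, 11)))
print(ordered)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```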