ThreadPoolExecutor (thread pool) and ProcessPoolExecutor (process pool)

ThreadPoolExecutor (thread pool)

1. Why do we need a thread pool? Suppose you create 20 threads but only allow 3 of them to run at a time: all 20 threads still have to be created and destroyed, and creating a thread consumes system resources. The idea of a thread pool is this: each thread is assigned one task, the remaining tasks wait in a queue, and when a thread finishes its task, the queue hands it the next one.

2. The standard-library module concurrent.futures provides two classes, ProcessPoolExecutor and ThreadPoolExecutor, which are a further abstraction over threading and multiprocessing (the focus here is the thread pool). Besides scheduling threads for us automatically, they let us:

<ul>
    <li>the main thread can query the state of a given thread (or task) and get its return value</li>
    <li>when a thread finishes, the main thread knows immediately</li>
    <li>the coding interfaces for multithreading and multiprocessing are identical</li>
</ul>

Basic usage

# -*- coding:utf-8 -*-
from concurrent.futures import ThreadPoolExecutor
import time

# The times parameter simulates the network request time
def get_html(times):
    print("get page {}s finished".format(times))
    return times

executor = ThreadPoolExecutor(max_workers=2)
# submit() hands the function to the thread pool; it returns immediately without blocking
task1 = executor.submit(get_html, 3)
task2 = executor.submit(get_html, 2)
# done() tells whether a task has finished
print(task1.done())
# cancel() cancels a task; it only succeeds if the task has not started running yet
print(task2.cancel())
print(task1.done())
# result() fetches the task's return value
print(task1.result())

# Output:
# get page 3s finished
# get page 2s finished
# True
# False
# True
# 3
  • When constructing a ThreadPoolExecutor, pass the max_workers parameter to set the maximum number of threads that run at the same time.
  • Use submit() to hand a task (function name and arguments) to the thread pool; it returns a handle for that task (similar to a file handle). Note that submit() does not block; it returns immediately.
  • With the handle returned by submit(), use the done() method to check whether the task has completed.
  • Use the result() method to get the task's return value; looking at the internal code, this method blocks.
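The points above can be exercised directly on a Future. A minimal sketch (the slow_square helper, the 0.5 s sleep, and the 0.1 s timeout are illustrative) showing that result() blocks and also accepts a timeout:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def slow_square(n):
    time.sleep(0.5)            # simulate work
    return n * n

executor = ThreadPoolExecutor(max_workers=1)
task = executor.submit(slow_square, 4)

print(task.done())             # False: the task is still sleeping
try:
    task.result(timeout=0.1)   # too short -> raises TimeoutError
except TimeoutError:
    print("timed out")
print(task.result())           # blocks until done, then prints 16
print(task.done())             # True
executor.shutdown()
```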

as_completed

The methods above let us check whether a task has finished, but we don't always want to poll for that in the main thread. Often we would rather be notified as each task finishes and fetch its result then, instead of checking every task one by one. For this there is the as_completed method, which yields the results of all tasks as they complete.

# -*- coding:utf-8 -*-
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

# The times parameter simulates the network request time
def get_html(times):
    time.sleep(times)
    print("get page {}s finished".format(times))
    return times

executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]
all_task = [executor.submit(get_html, url) for url in urls]
for future in as_completed(all_task):
    data = future.result()
    print("in main:get page {}s success".format(data))

# Output:
# get page 2s finished
# in main:get page 2s success
# get page 3s finished
# in main:get page 3s success
# get page 4s finished
# in main:get page 4s success

map

Besides as_completed, you can also use the executor's map method. It is a little different: with map there is no need to call submit first. Like the map function in the Python standard library, it applies the same function to every element of a sequence. The code below runs get_html on each element of urls, distributing the calls across the thread pool. Notice how the result differs from the as_completed run above: the output order matches the order of the urls list. Even though the 2s task finishes first, the 3s task's completion is printed in the main thread before the 2s task's.

# -*- coding:utf-8 -*-
from concurrent.futures import ThreadPoolExecutor
import time

# The times parameter simulates the network request time
def get_html(times):
    time.sleep(times)
    print("get page {}s finished".format(times))
    return times

executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]
for data in executor.map(get_html, urls):
    print("in main:get page {}s success".format(data))

# Output:
# get page 2s finished
# get page 3s finished
# in main:get page 3s success
# in main:get page 2s success
# get page 4s finished
# in main:get page 4s success

wait

The wait method lets the main thread block until a given condition is met. It accepts three parameters: the sequence of tasks to wait for, a timeout, and the wait condition. The condition parameter return_when defaults to ALL_COMPLETED, meaning wait until all tasks finish. As the output shows, "main" is indeed printed only after every task has completed. The condition can instead be set to FIRST_COMPLETED, which stops waiting as soon as the first task finishes.

# -*- coding:utf-8 -*-
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED, FIRST_COMPLETED
import time

# The times parameter simulates the network request time
def get_html(times):
    time.sleep(times)
    print("get page {}s finished".format(times))
    return times

executor = ThreadPoolExecutor(max_workers=2)
urls = [3, 2, 4]
all_task = [executor.submit(get_html, url) for url in urls]
wait(all_task,return_when=ALL_COMPLETED)
print("main")
# Output:
# get page 2s finished
# get page 3s finished
# get page 4s finished
# main
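FIRST_COMPLETED, mentioned above, works the same way; a small sketch (the sleep times are scaled down here so it runs quickly) where wait returns as soon as the fastest task is done:

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import time

def get_html(times):
    time.sleep(times)
    return times

executor = ThreadPoolExecutor(max_workers=2)
all_task = [executor.submit(get_html, t) for t in [0.3, 0.2, 0.4]]

# Returns as soon as the first future finishes; the others keep running.
done, not_done = wait(all_task, return_when=FIRST_COMPLETED)
print("finished first:", [f.result() for f in done])
executor.shutdown()  # still waits for the remaining tasks to finish
```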

ProcessPoolExecutor

Method one, synchronous invocation: submit a task, wait in place for it to finish, and take its return value before executing the next line of code. This makes the tasks run serially.

# -*- coding:utf-8 -*-
# Method one, synchronous invocation: submit a task, wait in place for it to finish,
# and take its return value before executing the next line; the tasks run serially

# The two ways of submitting tasks to a process pool

import datetime
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from threading import current_thread
import time, random, os
import requests

def task(name):
    print('%s %s is running' % (name, os.getpid()))
    #print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

if __name__ == '__main__':
    p = ProcessPoolExecutor(4)  # number of processes in the pool
    for i in range(10):
        # synchronous invocation: submit and wait for the value
        obj = p.submit(task, "process pid:")  # (function, args); positional or keyword arguments
        res = obj.result()
    p.shutdown(wait=True)  # close the pool to new tasks and wait for queued tasks to finish
    print("main")

Asynchronous submission

def task(name):
    print("%s %s is running" % (name, os.getpid()))
    time.sleep(random.randint(1, 3))

if __name__ == '__main__':
    p = ProcessPoolExecutor(4)  # number of processes in the pool
    for i in range(10):
        # asynchronous invocation: submit only, do not wait for the value
        p.submit(task, 'process pid:')  # (function, args); positional or keyword arguments
    p.shutdown(wait=True)
    print('main')

Synchronous invocation with a process pool

Synchronous call — advantage: decoupled; disadvantage: slow

def get(url):
    print('%s GET %s' % (os.getpid(), url))
    time.sleep(3)
    response = requests.get(url)
    if response.status_code == 200:
        res = response.text
    else:
        res = "download failed"
    return res

def parse(res):
    time.sleep(1)
    print("%s parse result length: %s" % (os.getpid(), len(res)))

if __name__ == "__main__":
    urls = [
        'https://www.baidu.com',
        'https://www.sina.com.cn',
        'https://www.tmall.com',
        'https://www.jd.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://www.baidu.com',
        'https://www.baidu.com',
        'https://www.baidu.com',

    ]
    p=ProcessPoolExecutor(9)
    l=[]
    start = time.time()
    for url in urls:
        future = p.submit(get,url)
        l.append(future)
    p.shutdown(wait=True)

    for future in l:
        parse(future.result())
    print('elapsed:', time.time() - start)
    # elapsed: 13.209137678146362

Asynchronous invocation

Asynchronous invocation — advantage: fast; disadvantage: coupled (get calls parse directly)

def get(url):
    print('%s GET %s' % (os.getpid(), url))
    time.sleep(3)
    response = requests.get(url)
    if response.status_code == 200:
        res = response.text
    else:
        res = "download failed"
    parse(res)

def parse(res):
    time.sleep(1)
    print('%s parse result length: %s' % (os.getpid(), len(res)))

if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.sina.com.cn',
        'https://www.tmall.com',
        'https://www.jd.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://www.baidu.com',
        'https://www.baidu.com',
        'https://www.baidu.com',

    ]

    p = ProcessPoolExecutor(9)
    start = time.time()
    for url in urls:
        future = p.submit(get,url)
    p.shutdown(wait=True)
    print("elapsed", time.time() - start)  # elapsed 6.293345212936401

2-1-4 Process pool, asynchronous call + callback function: solves the coupling, but slow

def get(url):
    print('%s GET %s' % (os.getpid(), url))
    time.sleep(3)
    response = requests.get(url)
    if response.status_code == 200:
        res = response.text
    else:
        res = 'download failed'
    return res

def parse(future):
    time.sleep(1)
    # the callback receives a Future object; call result() to get the return value
    res = future.result()
    print('%s parse result length: %s' % (os.getpid(), len(res)))



if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.sina.com.cn',
        'https://www.tmall.com',
        'https://www.jd.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://www.baidu.com',
        'https://www.baidu.com',
        'https://www.baidu.com',
    ]
    p = ProcessPoolExecutor(9)
    start = time.time()
    for url in urls:
        future = p.submit(get,url)
        # add_done_callback registers parse as a callback; parse receives the
        # Future object, whose result() is the task's return value,
        # so the call is effectively parse(future)
        future.add_done_callback(parse)

    p.shutdown(wait=True)
    print("elapsed", time.time() - start)  # elapsed 33.79998469352722

Thread pool + asynchronous callback: mainly used for IO-intensive work. In a thread pool, whichever thread becomes free picks up the next operation to execute.

def get(url):
    print("%s GET %s" % (current_thread().name, url))
    time.sleep(3)
    response = requests.get(url)
    if response.status_code == 200:
        res = response.text
    else:
        res = "download failed"
    return res

def parse(future):
    time.sleep(1)
    res = future.result()
    print("%s parse result length: %s" % (current_thread().name, len(res)))

if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.sina.com.cn',
        'https://www.tmall.com',
        'https://www.jd.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://www.baidu.com',
        'https://www.baidu.com',
        'https://www.baidu.com',
    ]
    p = ThreadPoolExecutor(4)
    start = time.time()
    for url in urls:
        future = p.submit(get,url)
        future.add_done_callback(parse)
    p.shutdown(wait=True)
    print("main", current_thread().name)
    print("elapsed", time.time() - start)  # elapsed 32.52604126930237

1. Even with multiple threads, context switching involves the CPU (saving and restoring state), which has a cost.

2. A process consumes more resources than a thread. A process is like a factory: the people inside it share the factory's common resources. A process has only one main thread by default; opening a program creates a process, and what executes inside it are threads. Threads let a process put several "workers" on the job at the same time.

3. In CPython, threads are constrained by the GIL, the Global Interpreter Lock, which prevents more than one thread from being scheduled on the CPU at once.

4. CPU-intensive work is suited to multiprocessing.

5. Thread: the smallest unit of work on a computer.

6. Process: has a main thread by default, and can host multiple threads at the same time.

7. Coroutine: one thread of one process doing multiple tasks; a "micro-thread".

8. GIL (Global Interpreter Lock): guarantees that only one thread at a time is scheduled on the CPU.

From: https://www.jianshu.com/p/b9b3d66aa0be

From: https://blog.csdn.net/qq_33961117/article/details/82587873

Origin www.cnblogs.com/venvive/p/11601190.html