Synchronous and asynchronous calls + the callback function

Key points to remember: asynchronous calls and the callback function

Process pool + callback: the callback function is executed by the main process.
Thread pool + callback: the callback function is executed by whichever thread is idle (for example, with four threads and ten tasks: the first round of four tasks finishes and their results are handed to threads that have become idle; the same happens in the second round; in the third round only two tasks remain, so the idle threads can already be processing results alongside them; when all tasks are done, every result has been processed as it arrived, which improves efficiency).

Whether or not the callback function returns data, its return value is always discarded (effectively None). Passing a function name to add_done_callback registers that function, and obj (the Future) is passed to it implicitly.
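A minimal sketch of this behaviour (the task and callback names here are illustrative): the callback receives the Future object itself, and its own return value is discarded.

```python
from concurrent.futures import ThreadPoolExecutor

results = []

def task(i):
    return i * 2

def parse(obj):
    # The callback is handed the Future object ("obj"), not the task's
    # return value; call obj.result() to get the value.
    results.append(obj.result())

pool = ThreadPoolExecutor(2)
for i in range(3):
    fut = pool.submit(task, i)
    fut.add_done_callback(parse)  # parse's own return value is ignored
pool.shutdown(wait=True)
print(sorted(results))
```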

1. Concept

1. From the execution point of view

  1. Blocking: while running, the program encounters IO and hangs, and the operating system forcibly takes the CPU away.
  2. Non-blocking: the program does not encounter IO, or it does encounter IO but uses some means to keep the CPU running its code.

2. From the task-submission point of view

  1. Synchronous: submit a task, wait until that task runs to completion (it may encounter IO) and returns a value, then submit the next task.
  2. Asynchronous: submit all the tasks at once, then simply execute the next line of code; this is more efficient.

3. What is a web crawler?

  1. Use code to simulate a browser and perform the browser's workflow to obtain a pile of page source code.
  2. Clean the source code to extract the data you want.
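The two submission styles under point 2 can be sketched as follows (the task function and the sleep are illustrative stand-ins for IO-bound work):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(i):
    time.sleep(0.1)  # simulated IO
    return i

pool = ThreadPoolExecutor(3)

# Synchronous style: wait for each task's result before submitting the next.
start = time.time()
sync_results = [pool.submit(task, i).result() for i in range(3)]
sync_elapsed = time.time() - start

# Asynchronous style: submit all tasks first, collect the results afterwards.
start = time.time()
futures = [pool.submit(task, i) for i in range(3)]
async_results = [f.result() for f in futures]
async_elapsed = time.time() - start

pool.shutdown(wait=True)
print(sync_results, async_results)
print(sync_elapsed > async_elapsed)  # the three sleeps overlap in the async style
```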

2. Synchronous calls

from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f'{os.getpid()} starting task')
    time.sleep(random.randint(1,3))
    print(f'\033[1;35;0m{os.getpid()} task finished\033[0m')
    return i  # this is the value obj.result() returns

if __name__ == '__main__':
    pool = ProcessPoolExecutor(4)
    for i in range(10):
        obj = pool.submit(task, i)  # arguments after the function are passed to task
        # obj is a dynamic object reflecting the task's current state:
        # it may be running, pending (ready/blocked), or finished
        print(f'task result object: {obj}')
        print(f'task result: {obj.result()}')
        # obj.result() blocks: it waits for this task to finish and return
        # its result before the next task is submitted
    pool.shutdown(wait=True)
    # shutdown: makes the main process wait until all the child processes in
    # the pool have finished their tasks before continuing; similar to join
    print('==main==')
obj is a dynamic object that reflects the task's current state: it may be running, pending (ready/blocked), or finished.
obj.result() must wait until the task has finished and returned its result before the next task is executed.
shutdown: makes the main process wait until every child process in the pool has finished its task before continuing; similar to join.
shutdown: no new tasks may be submitted until the pool has finished all of its current tasks.
A task is implemented by a function; when the task completes, its return value is the function's return value.
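The Future's state can also be inspected directly with done(); a small sketch (using a thread pool here, so it runs without the __main__ guard):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(i):
    time.sleep(0.2)  # simulated IO
    return i

pool = ThreadPoolExecutor(1)
obj = pool.submit(task, 7)
before = obj.done()   # False: the task is still queued or running
val = obj.result()    # blocks until the task finishes and returns its value
after = obj.done()    # True: the future has completed
pool.shutdown(wait=True)
print(before, val, after)
```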

3. Asynchronous calls

from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f'{os.getpid()} starting task')
    time.sleep(random.randint(1,3))
    print(f'\033[1;35;0m{os.getpid()} task finished\033[0m')
    return i

if __name__ == '__main__':
    # asynchronous call: from the task-submission point of view
    pool = ProcessPoolExecutor(4)  # set the number of processes; only four pids in total
    for i in range(10):  # submit 10 tasks at once
        pool.submit(task, i)
    pool.shutdown(wait=True)

4. A simple explanation of web crawlers

1. Install the third-party module requests (installation instructions can be found on cnblogs)

import requests
ret = requests.get('http://www.taobao.com')
if ret.status_code == 200:  # standard check that the request succeeded
    print(ret.text)  # the page source
    print(len(ret.text))  # sometimes printing ret.text directly raises an error (an encoding issue); the code itself still works

2. Version one

    1. Asynchronously submit the tasks so they execute concurrently, but collect all the return values together at the end. (Inefficient: results are not obtained in real time.)
    2. Analysing the results is a serial process, which hurts efficiency:
         for res in obj_list:
            print(parse(res.result()))
from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import requests

def task(url):
    """Simulates fetching multiple page sources; there is definitely IO
    (network latency while crawling)"""
    ret = requests.get(url)
    if ret.status_code == 200:
        return ret.text  # the page source

def parse(content):
    """Simulates analysing the data; usually no IO"""
    return len(content)

if __name__ == '__main__':
    url_list = ['http://www.taobao.com',
                'http://www.baidu.com',
                'http://www.aqiyi.com',
                'http://www.youku.com',
                'https://gitee.com/',
                'https://www.cnblogs.com/jin-xin/articles/10067177.html']
    pool = ProcessPoolExecutor(4)  # four workers, so the fetches actually overlap
    obj_list = []
    for url in url_list:
        obj = pool.submit(task, url)  # note: submit returns a Future, not ret.text
        obj_list.append(obj)
    print(obj_list)
    pool.shutdown(wait=True)
    for res in obj_list:
        print(parse(res.result()))  # res.result() is ret.text; the results are analysed serially
    print('==main==')
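One standard-library alternative to the serial loop above is concurrent.futures.as_completed, which yields each Future as soon as it finishes, so results can be parsed in real time; a sketch (the sleep-based task stands in for network IO):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def task(n):
    time.sleep(0.05 * n)  # simulated network delay
    return 'x' * n        # stands in for the page source

def parse(content):
    return len(content)

pool = ThreadPoolExecutor(3)
futures = [pool.submit(task, n) for n in (3, 1, 2)]

# as_completed yields futures in completion order, so fast "pages"
# are parsed while slow ones are still "downloading".
lengths = [parse(f.result()) for f in as_completed(futures)]
pool.shutdown(wait=True)
print(lengths)
```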

3. Version two

  Set up a thread pool with 4 threads and asynchronously submit 10 tasks; each task fetches a page source and analyses the data. The tasks execute concurrently and all results are displayed at the end.
  This increases coupling (fetching and parsing are now tied together in one task).
  Concurrent tasks give the biggest benefit when they are IO-bound.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import requests

def task(url):
    """Simulates fetching multiple page sources; there is definitely IO
    (network latency while crawling)"""
    ret = requests.get(url)
    if ret.status_code == 200:
        return parse(ret.text)

def parse(content):
    """Simulates analysing the data; usually no IO"""
    return len(content)

if __name__ == '__main__':
    url_list = [
            'http://www.baidu.com',
            'http://www.JD.com',
            'http://www.JD.com',
            'http://www.JD.com',
            'http://www.taobao.com',
            'https://www.cnblogs.com/jin-xin/articles/7459977.html',
            'https://www.luffycity.com/',
            'https://www.cnblogs.com/jin-xin/articles/9811379.html',
            'https://www.cnblogs.com/jin-xin/articles/11245654.html',
            'https://www.sina.com.cn/',
        ]
    pool = ThreadPoolExecutor(4)  # a pool of 4 threads, as described above
    obj_list = []
    for url in url_list:
        obj = pool.submit(task, url)  # obj receives submit's return value, a dynamic Future object
        obj_list.append(obj)
    pool.shutdown(wait=True)
    for res in obj_list:
        print(res.result())  # res.result() receives task's return value

4. Version three: asynchronous calls + the callback function

  1. Collecting all the task results only after every task has finished is not what I want; I want to recover each result in real time. The tasks execute concurrently, and each task should handle only the IO-blocking part rather than taking on extra work.

  2. Key point:

  3. Process pool + callback: the callback function is executed by the main process.
    Thread pool + callback: the callback function is executed by an idle thread.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import requests

def task(url):
    '''Simulates fetching multiple page sources; there is definitely IO'''
    ret = requests.get(url)
    if ret.status_code == 200:
        return ret.text

def parse(obj):  # the Future (obj) is passed implicitly; no explicit argument is supplied
    '''Simulates analysing the data; usually no IO'''
    print(len(obj.result()))

if __name__ == '__main__':
    url_list = [
            'http://www.baidu.com',
            'http://www.JD.com',
            'http://www.JD.com',
            'http://www.JD.com',
            'http://www.taobao.com',
            'https://www.cnblogs.com/jin-xin/articles/7459977.html',
            'https://www.luffycity.com/',
            'https://www.cnblogs.com/jin-xin/articles/9811379.html',
            'https://www.cnblogs.com/jin-xin/articles/11245654.html',
            'https://www.sina.com.cn/',
        ]
    pool = ThreadPoolExecutor(4)

    for url in url_list:
        obj = pool.submit(task, url)
        obj.add_done_callback(parse)
        # whether or not the callback returns data, its return value is discarded (None);
        # add_done_callback(parse) registers the function, and obj is passed to it implicitly
    pool.shutdown(wait=True)

Origin www.cnblogs.com/lvweihe/p/11415215.html