Blocking, non-blocking, synchronous and asynchronous

Blocking and non-blocking

A process runs in one of three states: running, ready, or blocked.

Blocking and non-blocking (from the angle of the running program):

Blocking: the program is running, hits IO, and hangs; the CPU is taken away from it.

Non-blocking: the program either never hits IO, or it does hit IO but by some means keeps the CPU running the program instead of hanging.
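A minimal sketch (not from the original post) that makes the difference concrete with a pair of connected sockets; setblocking(False) is the "some means" that keeps the CPU on our program:

import socket
import time

a, b = socket.socketpair()  # two connected sockets

# Blocking mode (the default): b.recv(1024) here would hang until data
# arrives, and the CPU would be taken away from this process.

b.setblocking(False)        # switch b to non-blocking mode
try:
    b.recv(1024)            # no data yet, but we do not hang
except BlockingIOError:
    print('no data yet, the program keeps running')

a.send(b'hi')
time.sleep(0.1)             # give the kernel a moment to deliver the bytes
print(b.recv(1024))         # data is ready now, recv returns at once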

From the angle of submitting tasks:

Synchronous: submit one task, then wait from the moment it starts running until it finishes (there may be IO in between) and returns a value, before submitting the next task.

Asynchronous: submit several tasks at once, then simply go on and execute the next line of code without waiting.

Synchronous calls and asynchronous calls

Synchronous call:

from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f'{os.getpid()} starts task')
    time.sleep(random.randint(1,3))
    print(f'{os.getpid()} ends task')
    return i
if __name__ == '__main__':

    # synchronous call
    pool = ProcessPoolExecutor()
    for i in range(10):
        obj = pool.submit(task,i)
        # obj is a dynamic object (a Future) reflecting the current state of the task: it may be running, pending (ready/blocked), or already finished.
        # obj.result() must wait until this task has finished and returned its result before the next task is executed.
        print(f'task result: {obj.result()}')  # the result comes back after the process finishes the task

    pool.shutdown(wait=True)
    # shutdown: make the main process wait until every child process in the pool has finished its task before moving on. Somewhat similar to join.
    # shutdown: no new tasks may be submitted before the pool has finished all of its current tasks.
    # a task is implemented by a function; when the task is done, its return value is the function's return value.
    print('=== main')
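As a usage note (not from the original post): the with statement calls pool.shutdown(wait=True) automatically on exit, so the pattern above can also be written as:

from concurrent.futures import ProcessPoolExecutor

def task(i):
    return i

if __name__ == '__main__':
    # the with statement calls pool.shutdown(wait=True) on exit
    with ProcessPoolExecutor() as pool:
        objs = [pool.submit(task, i) for i in range(10)]
    print([obj.result() for obj in objs])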

Asynchronous call:

from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f'{os.getpid()} starts task')
    time.sleep(random.randint(1,3))
    print(f'{os.getpid()} ends task')
    return i
if __name__ == '__main__':

    # asynchronous call
    pool = ProcessPoolExecutor()
    for i in range(10):
        pool.submit(task,i)  # the problem of getting the return value of an asynchronous call is not solved yet

    pool.shutdown(wait=True)
    print('=== main')

Two ways to get results from an asynchronous call:

Mode 1: collect the results in one batch: after all the tasks are done, walk through the dynamic (Future) objects and take each function's return value.

from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f'{os.getpid()} starts task')
    time.sleep(random.randint(1,3))
    print(f'{os.getpid()} ends task')
    return i

if __name__ == '__main__':

    # asynchronous call
    pool = ProcessPoolExecutor()
    l1 = []
    for i in range(10):
        obj = pool.submit(task,i)
        l1.append(obj)

    pool.shutdown(wait=True)
    print(l1)
    for i in l1:
        print(i.result())
    print('=== main')
# Output:
12708 starts task
8632 starts task
1848 starts task
14544 starts task
10704 starts task
18776 starts task
18480 starts task
18548 starts task
13916 starts task
17144 starts task
1848 ends task
14544 ends task
18548 ends task
8632 ends task
10704 ends task
18480 ends task
13916 ends task
17144 ends task
12708 ends task
18776 ends task
[<Future at 0x232b4a377b8 state=finished returned int>, <Future at 0x232b4a82c88 state=finished returned int>, <Future at 0x232b4a82d30 state=finished returned int>, <Future at 0x232b4a82dd8 state=finished returned int>, <Future at 0x232b4a82e80 state=finished returned int>, <Future at 0x232b4a82f28 state=finished returned int>, <Future at 0x232b4a8d048 state=finished returned int>, <Future at 0x232b4a8d128 state=finished returned int>, <Future at 0x232b4a8d208 state=finished returned int>, <Future at 0x232b4a8d2e8 state=finished returned int>]
0
1
2
3
4
5
6
7
8
9
=== main

Mode 2: asynchronous call + callback function

The requests module:

How a browser works: the browser sends a request to the server; the server verifies the request and, if it is correct, returns a file to your browser; the browser receives the file and renders the code inside it into the pretty page you see.

Web crawler principle:

1. Use code to simulate a browser and walk through the browser's workflow to obtain a pile of source code.

2. Clean the source code to extract the data I want.

import requests
ret = requests.get('http://www.baidu.com')
if ret.status_code == 200:
    print(ret.text)
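A minimal sketch of step 2 (data cleansing), assuming all we want is the page title; a real crawler would normally use an HTML parser rather than a regular expression:

import re
import requests

ret = requests.get('http://www.baidu.com')
if ret.status_code == 200:
    ret.encoding = 'utf-8'  # assumption: the page is served as utf-8
    match = re.search(r'<title>(.*?)</title>', ret.text)
    if match:
        print(match.group(1))  # the cleaned-out piece of data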

Three versions on the way to the callback function:

Version 1:

from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import requests

def task(url):
    content = requests.get(url)
    return content.text

def parse(obj):
    return len(obj.result())

if __name__ == '__main__':
    pool = ThreadPoolExecutor(4)
    url_list = ['http://www.JD.com','http://www.JD.com', 'https://home.cnblogs.com/u/lifangzheng/',
         'https://wizardforcel.gitbooks.io/gopl-zh/content/ch0/ch0-01.html', 'https://www.pypypy.cn/#/',
         'https://www.liaoxuefeng.com/', 'https://home.cnblogs.com/u/lifangzheng/',
         'https://home.cnblogs.com/u/lifangzheng/', 'https://gitee.com/clover16', 'https://gitee.com/clover16']
    obj_list = []
    for url in url_list:
        obj = pool.submit(task,url)
        obj_list.append(obj)

    pool.shutdown(wait=True)
    for res in obj_list:
        print(parse(res))  # parse expects the Future object, not the text
# Two flaws of version 1:
#    1. 10 tasks are issued asynchronously and run concurrently, but all of their return values are collected in one batch at the end (inefficient; results cannot be obtained in real time).
#    2. Analysing the results is serial, which hurts efficiency.
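The first flaw can also be fixed with concurrent.futures.as_completed, which hands back each Future as soon as it finishes; a minimal sketch (not one of the post's three versions):

from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

def task(url):
    return requests.get(url).text

if __name__ == '__main__':
    pool = ThreadPoolExecutor(4)
    url_list = ['http://www.JD.com', 'https://gitee.com/clover16']
    obj_list = [pool.submit(task, url) for url in url_list]
    # as_completed yields each Future in completion order, not submit order,
    # so every result is processed as soon as it is ready.
    for obj in as_completed(obj_list):
        print(len(obj.result()))
    pool.shutdown(wait=True)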

Version 2 (improves on the second flaw of version 1):

from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import requests

def task(url):
    content = requests.get(url)
    return parse(content.text)

def parse(text):
    return len(text)  # nested call: analysing the result during concurrent execution increases coupling between the functions
# tasks run concurrently; each task = fetching a page's source code + analysing the data. This works best when the task is IO-bound.
if __name__ == '__main__':
    pool = ThreadPoolExecutor(4)
    url_list = ['http://www.JD.com','http://www.JD.com', 'https://home.cnblogs.com/u/lifangzheng/',
         'https://wizardforcel.gitbooks.io/gopl-zh/content/ch0/ch0-01.html', 'https://www.pypypy.cn/#/',
         'https://www.liaoxuefeng.com/', 'https://home.cnblogs.com/u/lifangzheng/',
         'https://home.cnblogs.com/u/lifangzheng/', 'https://gitee.com/clover16', 'https://gitee.com/clover16']
    obj_list = []
    for url in url_list:
        obj = pool.submit(task,url)
        obj_list.append(obj)

    pool.shutdown(wait=True)
    for res in obj_list:
        print(res.result())  # the result was already parsed inside task

Version 3:

from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import requests

def task(url):
    content = requests.get(url)
    return content.text

def parse(obj):
    print(len(obj.result()))

if __name__ == '__main__':
    pool = ThreadPoolExecutor(4)
    url_list = ['http://www.JD.com','http://www.JD.com', 'https://home.cnblogs.com/u/lifangzheng/',
         'https://wizardforcel.gitbooks.io/gopl-zh/content/ch0/ch0-01.html', 'https://www.pypypy.cn/#/',
         'https://www.liaoxuefeng.com/', 'https://home.cnblogs.com/u/lifangzheng/',
         'https://home.cnblogs.com/u/lifangzheng/', 'https://gitee.com/clover16', 'https://gitee.com/clover16']
    for url in url_list:
        obj = pool.submit(task,url)
        obj.add_done_callback(parse)  # add_done_callback itself returns None
        # once a task finishes, a free thread in the pool executes the callback

    pool.shutdown(wait=True)  # wait for all tasks and their callbacks to finish

PS: handle the IO-bound part with asynchronous calls; handle the non-IO processing in the callback.

How do asynchronous calls and callbacks relate?

The asynchronous call is the side that issues the tasks; the callback function is the side that receives each task's result in turn, as it completes, and does the further processing.

A small difference between process pool + callback and thread pool + callback:

Process pool + callback: the callback function is executed by the main process. Thread pool + callback: the callback function is executed by an idle thread.
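A minimal sketch, not from the original post, that makes this difference observable by printing where each callback runs:

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os
import threading

def task(i):
    return i

def callback(obj):
    print(f'callback pid={os.getpid()} thread={threading.current_thread().name} result={obj.result()}')

if __name__ == '__main__':
    print(f'main pid={os.getpid()}')
    with ProcessPoolExecutor(2) as pool:   # callback runs back in the main process
        pool.submit(task, 1).add_done_callback(callback)
    with ThreadPoolExecutor(2) as pool:    # callback runs in a pool worker thread
        pool.submit(task, 2).add_done_callback(callback)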

Origin www.cnblogs.com/lifangzheng/p/11415016.html