The Road to a Million-Yuan Python Salary -- Concurrent Programming with Multithreading (Part 3)

1. Blocking, non-blocking, synchronous, asynchronous

A process has three running states: running, ready, and blocked.

From the perspective of execution:

Blocking: while the process is running it encounters IO; the process is suspended and the CPU is switched away.

Non-blocking: the process either never encounters IO, or it does encounter IO but by some means forces the CPU to keep running the process.

From the perspective of submitting tasks:

Synchronous: submit one task and wait in place from the moment it starts running until it finishes (it may contain IO); only after it returns a value do I submit the next task.

Asynchronous: submit multiple tasks at once, then go straight on to the next line of code.

How are the return values collected?

Example: assigning tasks to three teachers:

Synchronous: first I tell teacher one to finish the task of writing a book, then I wait in place; when he finishes two days later and tells me, I leave and assign the next task........

Asynchronous: I hand the three tasks to the three teachers at once and go about my own business; when all three teachers finish, they let me know.

2. Synchronous calls, asynchronous calls

  1. Synchronous call:
# Synchronous call
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f"{os.getpid()} start task")
    time.sleep(random.randint(1,3))
    print(f"{os.getpid()} task done")
    return i


if __name__ == '__main__':

    # synchronous call
    pool = ProcessPoolExecutor()
    for i in range(10):
        obj = pool.submit(task,i)
        # obj is a future object; it reflects the task's current state: it may be
        # running, ready/blocked, or already finished.
        # obj.result() blocks until this task finishes and returns its result;
        # only then is the next task submitted.
        print(f"task result: {obj.result()}")

    pool.shutdown(wait=True)

    # shutdown: make the main process wait until every child in the pool has
    # finished its task before continuing; somewhat like join.
    # shutdown: no new tasks may be added before the pool finishes all current tasks.
    # A task is implemented by a function; when the task completes, its return
    # value is the function's return value.
    print('===main===')
  2. Asynchronous call:
# Asynchronous call
# How do we receive the return value of an asynchronous call? Not solved yet.

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f"{os.getpid()} start task")
    time.sleep(random.randint(1,3))
    print(f"{os.getpid()} task done")
    return i


if __name__ == '__main__':

    # asynchronous call
    pool = ProcessPoolExecutor()
    for i in range(10):
        pool.submit(task,i)

    pool.shutdown(wait=True)

    # shutdown: make the main process wait until every child in the pool has
    # finished its task before continuing; somewhat like join.
    # shutdown: no new tasks may be added before the pool finishes all current tasks.
    # A task is implemented by a function; when the task completes, its return
    # value is the function's return value.
    print('===main===')
  3. How to collect asynchronous results:

    • Method one:

      Unified collection (collect all results in one batch)

      # Asynchronous call
      # Method one: asynchronous calls; collect all results in one batch at the end.
      
      from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
      import time
      import random
      import os
      
      def task(i):
          print(f"{os.getpid()} start task")
          time.sleep(random.randint(1,3))
          print(f"{os.getpid()} task done")
          return i
      
      
      if __name__ == '__main__':
      
          # asynchronous call
          pool = ProcessPoolExecutor()
          lst = []
          for i in range(10):
              obj = pool.submit(task,i)
              lst.append(obj)
      
          pool.shutdown(wait=True)
      
          # shutdown: make the main process wait until every child in the pool has
          # finished its task before continuing; somewhat like join.
          # shutdown: no new tasks may be added before the pool finishes all current tasks.
          # A task is implemented by a function; when the task completes, its return
          # value is the function's return value.
          print(lst)
          for i in lst:
              print(i.result())
          print('===main===')
          # Unified collection: you cannot get the return value of any finished task
          # right away; you must wait until all tasks end and collect them together.
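As a hedged aside (not part of the original post): the standard library also offers `concurrent.futures.as_completed`, which yields each future as soon as it finishes, so results can be collected in completion order instead of one batch at the end. The `task` below is a stand-in that sleeps instead of doing real IO:

```python
# Sketch: collecting results in completion order with as_completed.
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def task(i):
    time.sleep(0.1 * (5 - i))  # task 0 is slowest, task 4 is fastest
    return i

pool = ThreadPoolExecutor(5)
futures = [pool.submit(task, i) for i in range(5)]

# as_completed yields each future the moment it finishes,
# so fast tasks are not stuck waiting behind slow ones.
results = [f.result() for f in as_completed(futures)]
pool.shutdown(wait=True)
print(results)  # completion order, not submission order
```

Because all five threads start at the same time and the fastest task sleeps the least, the list comes back roughly reversed relative to submission order.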

3. Asynchronous call + callback function

First, a brief introduction to web crawlers:

How a browser works: it sends a request to a server; the server validates your request and, if it is valid, returns a file to your browser; the browser receives the file and renders the code inside into the beautiful pages you see.

What is a web crawler?

  1. Use code to simulate a browser and perform the browser's workflow, obtaining a pile of source code.
  2. Clean the source code to extract the data I want.
import requests
response = requests.get("http://www.baidu.com")
if response.status_code == 200:
    print(response.text)

Version one:

# Version one:
from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
import time
import random
import os
import requests

def task(url):
    """Simulates crawling the source code of multiple pages; there is always IO (network latency)"""
    response = requests.get(url)
    if response.status_code == 200:
        return response.text

    
def parse(content):
    """Simulates analysing the data; usually no IO"""
    return len(content)


if __name__ == '__main__':
    # Serial version: takes too long, not advisable
    # ret = task("http://www.baidu.com")
    # print(parse(ret))
    # ret = task('http://www.JD.com')
    # print(parse(ret))
    #
    # ret = task('http://www.taobao.com')
    # print(parse(ret))
    #
    # ret = task('https://www.cnblogs.com/jin-xin/articles/7459977.html')
    # print(parse(ret))



    # Start a thread pool and execute the tasks concurrently
    url_list = [
                'http://www.baidu.com',
                'http://www.JD.com',
                'http://www.JD.com',
                'http://www.JD.com',
                'http://www.taobao.com',
                'https://www.cnblogs.com/jin-xin/articles/7459977.html',
                'https://www.luffycity.com/',
                'https://www.cnblogs.com/jin-xin/articles/9811379.html',
                'https://www.cnblogs.com/jin-xin/articles/11245654.html',
                'https://www.sina.com.cn/',
    ]
    pool = ThreadPoolExecutor(4)
    obj_list = []
    for url in url_list:
        obj = pool.submit(task,url)
        obj_list.append(obj)


    pool.shutdown(wait=True)
    for res in obj_list:
        print(parse(res.result()))


# Problems with version one:
# 1. Multiple tasks are launched asynchronously and run concurrently, but all the
#    return values are collected in one batch at the end (inefficient; results are
#    not available in real time).
# 2. The analysis step runs serially, which hurts efficiency.

Version two:

# Version two:
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time
import random
import os
import requests


def task(url):
    """Simulates crawling the source code of multiple pages; there is always IO (network latency)"""
    response = requests.get(url)
    if response.status_code == 200:
        return parse(response.text)


def parse(content):
    """Simulates analysing the data; usually no IO"""
    return len(content)
    # print(len(content))


if __name__ == '__main__':

    # Start a thread pool and execute the tasks concurrently
    url_list = [
        'http://www.baidu.com',
        'http://www.JD.com',
        'http://www.JD.com',
        'http://www.JD.com',
        'http://www.taobao.com',
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.luffycity.com/',
        'https://www.cnblogs.com/jin-xin/articles/9811379.html',
        'https://www.cnblogs.com/jin-xin/articles/11245654.html',
        'https://www.sina.com.cn/',
    ]
    pool = ThreadPoolExecutor(4)
    obj_list = []
    for url in url_list:
        obj = pool.submit(task, url)
        obj_list.append(obj)

    pool.shutdown(wait=True)
    for res in obj_list:
        print(res.result())

Two ways to solve these problems:

  1. Open another thread pool (or process pool) to handle the analysis concurrently, but opening another pool adds overhead.

  2. Expand the original task.

    Version one:

    • The thread pool is set to four threads; 10 tasks are launched asynchronously, each fetching a page's source code over the network, and they execute concurrently. Finally a list is used to collect the results of all 10 tasks in one batch, and the source code is analysed serially.

    Version two:

    • The thread pool is set to four threads; 10 tasks are launched asynchronously, each task fetches a page's source code over the network and analyses the data, executed concurrently (or in parallel).
    • Finally, all the results are displayed.
    • Drawback: increased coupling.
    • Concurrently executed tasks are best when they are blocking-IO tasks; that is where concurrency helps the most.

Version three:

You can bind a function to each process of a process pool or each thread of a thread pool; the function is triggered automatically when the process or thread finishes its task, and it receives the task's return value (wrapped in a future object) as its parameter. Such a function is called a callback function.

# Version three:
# Collect the results of all tasks via asynchronous calls, and collect them in real time.
# Execute the tasks concurrently; each task only handles the blocking IO part and
# takes on no extra work.
# Asynchronous call + callback function


from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time
import random
import os
import requests


def task(url):
    """Simulates crawling the source code of multiple pages; there is always IO (network latency)"""
    response = requests.get(url)
    if response.status_code == 200:
        return response.text


def parse(obj):
    """Simulates analysing the data; usually no IO.
    obj is a future object; obj.result() gives the task's return value."""
    print(len(obj.result()))


if __name__ == '__main__':

    # Start a thread pool and execute the tasks concurrently
    url_list = [
        'http://www.baidu.com',
        'http://www.JD.com',
        'http://www.JD.com',
        'http://www.JD.com',
        'http://www.taobao.com',
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.luffycity.com/',
        'https://www.cnblogs.com/jin-xin/articles/9811379.html',
        'https://www.cnblogs.com/jin-xin/articles/11245654.html',
        'https://www.sina.com.cn/',
    ]
    pool = ThreadPoolExecutor(4)
    for url in url_list:
        obj = pool.submit(task, url)
        obj.add_done_callback(parse)
    """
    The thread pool is set to 4 threads; 10 tasks are launched asynchronously,
    each fetching a page's source code, executed concurrently.
    When a task finishes, the parse (analysis) work is handed to an idle thread,
    and the thread that fetched the source code goes on to handle other tasks.
    Process pool + callback: the callback is executed by the main process.
    Thread pool + callback: the callback is executed by an idle thread.
    """

Q:

Are asynchronous calls and callbacks the same thing?

A: No, they are not the same thing.

Asynchronous stands at the point of view of publishing tasks: submit multiple tasks at once, then go straight to the next line of code.

The callback function stands at the point of view of receiving results: each task's result is received in the order it completes and is processed further.

Callback function: you can bind a function to each process of a process pool or each thread of a thread pool; this function is triggered automatically when the process or thread finishes its task, and it receives the task's return value as its parameter. This is what we call the callback function.

Asynchronous + callback:

The asynchronous calls handle the IO-type work; the callback handles the non-IO-type work.

We can put the time-consuming (blocking) tasks into a process pool and then specify a callback function (executed by the main process), so that when the main process runs the callback it is spared the IO step and gets the task's result directly.
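As a minimal sketch of the mechanism (the names here are illustrative, not from the original post): with a thread pool, `add_done_callback` fires a callback per task as each task completes, and the callback receives the future object, not the raw return value.

```python
# Sketch: thread pool + callback; the callback fires per task, in completion order.
from concurrent.futures import ThreadPoolExecutor
import threading

results = []

def task(i):
    return i * 2  # stands in for a blocking-IO task such as fetching a page

def parse(fut):
    # the callback receives the future; .result() gives the task's return value
    results.append((fut.result(), threading.current_thread().name))

with ThreadPoolExecutor(2) as pool:
    for i in range(4):
        pool.submit(task, i).add_done_callback(parse)
# leaving the with-block waits for all tasks (and their callbacks) to finish

print(sorted(r[0] for r in results))  # → [0, 2, 4, 6]
```

Recording the thread name in `results` lets you observe that the callbacks here run inside pool threads; with a `ProcessPoolExecutor` they would instead run in the parent process.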


Origin www.cnblogs.com/zhangchaoyin/p/11416023.html