How to improve the efficiency of a Python crawler

Two ways to speed up a crawler:

  • Thread pool
    • from multiprocessing.dummy import Pool
    • map(func, alist): applies the callback func to every element of alist; the calls are distributed over the pool's worker threads, so the elements are processed asynchronously.
  • Single-threaded asynchronous multi-task coroutines (asyncio)
    • special function
    • coroutine
    • task object
      • binding a callback to the task object
    • event loop


import requests
import time
from multiprocessing.dummy import Pool
start = time.time()
pool = Pool(3)
urls = [
    'http://127.0.0.1:5000/index',
    'http://127.0.0.1:5000/index',
    'http://127.0.0.1:5000/index'
]
# simulates a (time-consuming) network request
def req(url):
    return requests.get(url).text

page_text_list = pool.map(req,urls)
print(page_text_list)
print('total time:', time.time() - start)


['hello bobo!!!', 'hello bobo!!!', 'hello bobo!!!']
total time: 2.1126856803894043
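
The three URLs above point at a local test server that the post itself does not show. Here is a minimal sketch of such a server, assuming Flask and an artificial ~2 second delay per request; the route contents are guesses based on the printed output, not part of the original post:

```python
# Hypothetical local test server assumed by the examples in this post.
# Each route sleeps ~2 seconds to simulate a slow network response.
from flask import Flask
import time

app = Flask(__name__)

@app.route('/index')
def index():
    time.sleep(2)                      # simulate a slow response
    return 'hello bobo!!!'

# routes used by the asyncio examples further down
@app.route('/bobo')
@app.route('/jay')
@app.route('/tom')
def detail():
    time.sleep(2)
    return '<div class="tang">some page content</div>'

if __name__ == '__main__':
    app.run(port=5000)
```

With a pool of 3 threads the three 2-second requests overlap, which is why the total time above is about 2 seconds rather than 6.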

Single-threaded asynchronous multi-task coroutines: asyncio

1. A first look at asyncio

import asyncio
from time import sleep

# special function
async def get_request(url):
    print('downloading:', url)
    sleep(2)
    print('download finished:', url)

    return 'page_text'

# definition of the callback function (an ordinary function)
def parse(task):
    # the parameter is the task object
    print('i am callback!!!', task.result())

# call the special function (this only creates a coroutine object)
c = get_request('www.1.com')

# create a task object
task = asyncio.ensure_future(c)
# bind a callback function to the task object
task.add_done_callback(parse)

# create an event loop object
loop = asyncio.get_event_loop()
# register the task object with the loop and start the loop
loop.run_until_complete(task)  # let the loop run a single task

Explanation:

- Special function:
  - a function definition decorated with the async keyword
  - what makes it special:
    - calling a special function returns a coroutine object
    - the statements inside the function are not executed at the moment it is called

- Coroutine:
  - an object; a coroutine corresponds to a special function and represents a specific group of operations.

- Task object:
  - a higher-level coroutine (a further wrapper around a coroutine), so task object == coroutine == special function
  - binding a callback:
    - task.add_done_callback(parse)
      - the callback receives one parameter: the task object it is bound to
      - task.result() returns the return value of the special function behind the task object

- Event loop object:
  - create an event loop object
  - register the task objects with it and start the loop
  - effect: the loop executes all task objects registered with it asynchronously

- Suspend:
  - means giving up the CPU (handing control back to the event loop).
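
To make "suspend" concrete, here is a small variation of the example above (my own sketch, not from the original post): replacing the blocking sleep(2) with await asyncio.sleep(2) lets each coroutine give up the CPU while it waits, so several tasks make progress inside a single thread.

```python
import asyncio
import time

# special function: awaiting asyncio.sleep suspends the coroutine and
# hands the CPU back to the event loop until the 2-second wait is over
async def get_request(url):
    print('downloading:', url)
    await asyncio.sleep(2)        # time.sleep(2) here would block the whole loop
    print('download finished:', url)
    return 'page_text'

def parse(task):
    print('i am callback!!!', task.result())

start = time.time()
loop = asyncio.get_event_loop()

tasks = []
for url in ['www.1.com', 'www.2.com', 'www.3.com']:
    task = asyncio.ensure_future(get_request(url))
    task.add_done_callback(parse)
    tasks.append(task)

loop.run_until_complete(asyncio.wait(tasks))
print('total time:', time.time() - start)   # roughly 2 s for all three tasks
```

The version in the next section swaps asyncio.sleep for a real requests call and loses this benefit, because requests cannot suspend.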

2. Multi-task asynchronous crawler:

import asyncio
import requests
import time
from bs4 import BeautifulSoup
# collect all the URLs to be requested into one list
urls = ['http://127.0.0.1:5000/bobo','http://127.0.0.1:5000/jay','http://127.0.0.1:5000/tom']
start = time.time()

async def get_request(url):
    # the requests module does not support async, which breaks the entire asynchronous effect
    page_text = requests.get(url).text
    return page_text

def parse(task):
    page_text = task.result()
    soup = BeautifulSoup(page_text,'lxml')
    data = soup.find('div',class_="tang").text
    print(data)
tasks = []
for url in urls:
    c = get_request(url)
    task = asyncio.ensure_future(c)
    task.add_done_callback(parse)
    tasks.append(task)

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

print('total time:', time.time() - start)
  • [Emphasis] Inside the special function, do not use any module that does not support async; if such a module appears, it breaks the entire asynchronous effect!!!

  • requests definitely does not support async, so the three requests above still run one after another (roughly the sum of the individual response times) instead of concurrently.

  • aiohttp is a network request module that does support async.

    • Environment installation: pip install aiohttp

    • Coding steps:

      • General architecture (before adding async/await):

        with aiohttp.ClientSession() as s:
            # s.get(url, headers=headers, params=params, proxy="http://ip:port")
            with s.get(url) as response:
                # response.read() for binary content (the counterpart of requests' .content)
                page_text = response.text()
                return page_text

      • Additional details:
        • add async in front of every with
        • add await in front of every blocking operation

        ```python
        async with aiohttp.ClientSession() as s:
            # s.get(url, headers=headers, params=params, proxy="http://ip:port")
            async with await s.get(url) as response:
                # response.read() for binary content (the counterpart of requests' .content)
                page_text = await response.text()
                return page_text
        ```
    • Code implementation:

      import asyncio
      import aiohttp
      import time
      from bs4 import BeautifulSoup
      # collect all the URLs to be requested into one list
      urls = ['http://127.0.0.1:5000/bobo','http://127.0.0.1:5000/jay','http://127.0.0.1:5000/tom']
      start = time.time()
      
      async def get_request(url):
          async with aiohttp.ClientSession() as s:
              # s.get(url, headers=headers, params=params, proxy="http://ip:port")
              async with await s.get(url) as response:
                  # response.read() for binary content (the counterpart of requests' .content)
                  page_text = await response.text()
                  return page_text
      
      def parse(task):
          page_text = task.result()
          soup = BeautifulSoup(page_text,'lxml')
          data = soup.find('div',class_="tang").text
          print(data)
      tasks = []
      for url in urls:
          c = get_request(url)
          task = asyncio.ensure_future(c)
          task.add_done_callback(parse)
          tasks.append(task)
      
      loop = asyncio.get_event_loop()
      loop.run_until_complete(asyncio.wait(tasks))
      
      print('total time:', time.time() - start)
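
Not part of the original post: on Python 3.7+ the same crawl is more commonly driven with asyncio.run and asyncio.gather, which create and close the event loop for you. A minimal sketch against the same hypothetical local URLs:

```python
import asyncio
import time
import aiohttp
from bs4 import BeautifulSoup

urls = ['http://127.0.0.1:5000/bobo', 'http://127.0.0.1:5000/jay', 'http://127.0.0.1:5000/tom']

async def get_request(s, url):
    # the session is shared by all requests instead of re-created per URL
    async with s.get(url) as response:
        return await response.text()

def parse(page_text):
    soup = BeautifulSoup(page_text, 'lxml')
    print(soup.find('div', class_="tang").text)

async def main():
    async with aiohttp.ClientSession() as s:
        # gather runs the coroutines concurrently and returns results in request order
        pages = await asyncio.gather(*(get_request(s, url) for url in urls))
    for page_text in pages:
        parse(page_text)

start = time.time()
asyncio.run(main())
print('total time:', time.time() - start)
```

Reusing one ClientSession for all requests is the usual choice for a real crawler; the original code above creates a session per request, which also works but adds a little overhead.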


Origin www.cnblogs.com/zhufanyu/p/11998925.html