Understanding Asynchronous Programming in Python in One Article

This article takes an in-depth look at how asyncio achieves concurrency within a single thread of a single process, and explains why asynchronous code cannot replace synchronous code in every respect.

Some examples

The first example

Suppose you need to cook rice in a rice cooker, do laundry in a washing machine, and call a friend to come over for dinner. The rice cooker takes 30 minutes to cook the rice, the washing machine takes 40 minutes to wash the clothes, and your friend takes 50 minutes to reach your house. Does that mean these three things cost you 30 + 40 + 50 = 120 minutes?

In reality, you only need about 50 minutes:

  1. Call your friend and ask him to set off now
  2. Put the clothes in the washing machine and switch it on
  3. Wash the rice, put it in the rice cooker, and switch it on

After that, all you have to do is wait.
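The same arithmetic can be sketched with asyncio. This is only an illustration: the chore() helper and the MINUTE scale factor are made up for this sketch, and asyncio.sleep stands in for the time spent waiting on a machine or a friend.

```python
import asyncio
import time

# One "minute" of the story is scaled down to 0.01 s so the sketch runs quickly.
MINUTE = 0.01


async def chore(name, minutes):
    # asyncio.sleep stands in for time spent waiting on a machine or a friend.
    await asyncio.sleep(minutes * MINUTE)
    return name


async def main():
    start = time.perf_counter()
    # Start all three chores, then wait for every one of them to finish.
    finished = await asyncio.gather(
        chore('friend arrives', 50),
        chore('laundry', 40),
        chore('rice', 30),
    )
    elapsed_minutes = (time.perf_counter() - start) / MINUTE
    return finished, elapsed_minutes


finished, elapsed_minutes = asyncio.run(main())
print(finished)
print(f'about {elapsed_minutes:.0f} simulated minutes')
```

The total should come out close to the longest chore (50 simulated minutes), not the sum of all three.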

The second example

Now you need to complete a language paper, a math paper, and an English paper, and each takes one hour. So you need 1 + 1 + 1 = 3 hours to finish them all. Nobody can help you, so you cannot finish the three papers in less than 3 hours.

The third example

Now you need to cook rice in a rice cooker, do laundry in a washing machine, and complete a math paper. The rice cooker takes 30 minutes to cook the rice, the washing machine takes 40 minutes to wash the clothes, and the paper takes 1 hour to complete.

But you do not need 30 + 40 + 60 = 130 minutes. You only need about 70 minutes:

  1. Put the clothes in the washing machine and switch it on
  2. Wash the rice, put it in the rice cooker, and switch it on
  3. Start working on the paper

What can be asynchronous and what cannot

In the first example, cooking, doing laundry, and waiting for a friend have something in common: each appears to take a long time, but the time that actually requires a person's attention is short. Washing the rice and switching on the rice cooker takes 5 minutes; loading the washing machine and switching it on takes 2 minutes; calling your friend takes 1 minute. For most of the remaining time nothing needs to be done, and you can simply wait.

In the second example, each paper occupies you completely. There is no waiting time; you must finish one paper before starting the next.

These two examples correspond to two kinds of programs: I/O-intensive programs and compute-intensive programs.

Requesting a URL with requests, querying a remote database, and reading or writing local files are all I/O operations. What these operations have in common is waiting.

Take requesting a URL with requests as an example. Initiating the request may take only 0.01 seconds, and then the program is stuck waiting for the site to respond. The request travels over the network to the server, the server queries its database, and the server sends the data back over the network to your computer. Only after requests has received the data does the program continue.

A lot of time is wasted waiting for the site to return data. If we could make use of this waiting time, we could initiate more requests. This is why asynchronous requests are useful.

But for code that performs a lot of computation, the CPU is busy the whole time and there is no waiting, so there is no waiting time that could be used for other things.

So: asynchronous programming applies only to code involving I/O operations, not to non-I/O operations.
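A minimal sketch of this difference, assuming asyncio.sleep as a stand-in for an I/O wait and a busy loop as a stand-in for heavy computation. The names io_task, cpu_task, and timed are hypothetical helpers for this illustration only.

```python
import asyncio
import time


async def io_task(seconds):
    # Waiting: the event loop is free to run other tasks in the meantime.
    await asyncio.sleep(seconds)


async def cpu_task(seconds):
    # Busy loop standing in for heavy computation: it never yields to the event loop.
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


async def timed(make_task, count=3, seconds=0.2):
    # Run `count` copies of the task as one batch and measure the wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(make_task(seconds) for _ in range(count)))
    return time.perf_counter() - start


io_elapsed = asyncio.run(timed(io_task))
cpu_elapsed = asyncio.run(timed(cpu_task))
print(f'I/O-style tasks: {io_elapsed:.2f}s')
print(f'CPU-style tasks: {cpu_elapsed:.2f}s')
```

The three waiting tasks should overlap and finish in roughly the time of one, while the three busy tasks run one after another and take roughly three times as long.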

Python's asynchronous code

The everyday examples above may leave you with a misunderstanding: "I control the code, so I can make it asynchronous wherever I want and keep it synchronous wherever I don't." For example, some people may wish they could write code the way the following pseudo-code describes:

request https://baidu.com, and while the site is responding:
    a = 1 + 1
    b = 2 + 2
    c = 3 + 3
take the returned data and do other things

It is like plugging in the rice cooker: while waiting for the rice to cook, you can read, make a phone call, watch TV, or do whatever you like.

This pseudo-code is intuitive to write, but Python code cannot be written this way.

Let's use some real code to illustrate the problem.

First, we set up a website: when we request http://127.0.0.1:8000/sleep/<num>, the site waits num seconds before returning. For example, requesting http://127.0.0.1:8000/sleep/3 means the site waits 3 seconds before it responds. The figure shows it in operation.
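The article does not show the code of this test site. One possible stand-in, using only the standard library, might look like the sketch below. The SleepHandler and make_server names are assumptions for this illustration; the JSON shape follows the responses shown later in the article.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SleepHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect paths of the form /sleep/<num>
        parts = self.path.strip('/').split('/')
        if len(parts) != 2 or parts[0] != 'sleep' or not parts[1].isdigit():
            self.send_error(404)
            return
        seconds = int(parts[1])
        time.sleep(seconds)  # blocks only the thread handling this request
        body = json.dumps({'success': True, 'time': seconds}).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=8000):
    # One thread per request, so slow requests do not hold up the others.
    return ThreadingHTTPServer(('127.0.0.1', port), SleepHandler)


# To serve requests: make_server().serve_forever()
```

A threaded server is used here so that the site itself can handle several slow requests at once; otherwise the server, not the client, would serialize them.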

Now we use aiohttp to send three requests, which take 1 second, 2 seconds, and 3 seconds to return respectively:

import aiohttp
import asyncio
import time


async def request(sleep_time):
    async with aiohttp.ClientSession() as client:
        resp = await client.get(f'http://127.0.0.1:8000/sleep/{sleep_time}')
        resp_json = await resp.json()
        print(resp_json)


async def main():
    start = time.perf_counter()
    await request(1)  # main() is suspended here until the first request returns
    a = 1 + 1
    b = 2 + 2
    print('Can execution reach here while the first request is waiting?')
    await request(2)
    print('Can execution reach here while the second request is waiting?')
    await request(3)

    end = time.perf_counter()
    print(f'Total time: {end - start}')


asyncio.run(main())

The result is shown in the figure below.

Line 15 of the code (await request(1)) starts the 1-second request, so line 15 has to wait 1 second before the data comes back. Lines 16, 17 and 18 are simple assignments and a print call, whose combined running time is clearly far less than 1 second, so in theory the output we see should be:

Can execution reach here while the first request is waiting?
Can execution reach here while the second request is waiting?
{'success': True, 'time': 1}
{'success': True, 'time': 2}
{'success': True, 'time': 3}
Total time: 3.018130547

But what we actually see is this: the program runs to line 15 (await request(1)) and waits for the site to return; only then does it run lines 16, 17 and 18. Next it runs line 19 (await request(2)) and waits 2 seconds for that request to complete, then runs line 20, and finally line 21 (await request(3)). The three requests are issued serially and take 6 seconds in total.

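The program does not behave the way we expected. It does not use the I/O waiting time to start new requests; instead, it waits for each request to finish before sending the next one.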

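Where is the problem?

The problem is that, in Python's asynchronous code, the developer cannot directly manage the switching between requests.

With the await statement, the developer tells asyncio that the function after it can be waited on asynchronously. Note that it can be waited on; whether anything else actually runs during that wait is decided by Python itself, through the asyncio event loop.

Whether you send a network request or read and write the disk, Python knows that it is an I/O operation. Only when Python finds that the current operation really is an I/O operation will it make use of the I/O waiting time.

So in Python's asynchronous programming, what the developer can do is hand Python all the operations that can run asynchronously, batch by batch. Python then coordinates and schedules this batch of tasks by itself and makes full use of the waiting time. The developer has no authority to decide directly how these I/O operations are scheduled.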
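A small sketch of this hand-over, under the assumption that asyncio.sleep stands in for an I/O wait. The worker() coroutine and the events list are hypothetical names used only to record the order in which things happen.

```python
import asyncio

events = []


async def worker(name):
    events.append(f'{name} started')
    await asyncio.sleep(0.01)  # stand-in for an I/O wait
    events.append(f'{name} finished')


async def main():
    tasks = [asyncio.create_task(worker(name)) for name in ('A', 'B')]
    # Nothing has run yet: the tasks are only scheduled on the event loop.
    events.append('main: tasks created')
    # Awaiting hands control to the event loop, which now starts both workers.
    await asyncio.gather(*tasks)
    events.append('main: all done')


asyncio.run(main())
print(events)
```

The recorded order shows that the workers start only after main() reaches an await, and that both have started before either one finishes: the developer supplies the batch, and the event loop decides when each task runs.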

So we need to make some changes to the code above:

import aiohttp
import asyncio
import time


async def request(sleep_time):
    async with aiohttp.ClientSession() as client:
        resp = await client.get(f'http://127.0.0.1:8000/sleep/{sleep_time}')
        resp_json = await resp.json()
        print(resp_json)


async def main():
    start = time.perf_counter()
    # Wrap each coroutine in a task so the event loop can schedule all three together
    tasks_list = [
        asyncio.create_task(request(1)),
        asyncio.create_task(request(2)),
        asyncio.create_task(request(3)),
    ]
    await asyncio.gather(*tasks_list)
    end = time.perf_counter()
    print(f'Total time: {end - start}')


asyncio.run(main())

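The result is shown in the figure below.

As you can see, the run now takes about 3 seconds, which shows that the three requests really did make use of one another's waiting time.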

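We use asyncio.create_task() to turn the coroutines into asynchronous tasks and put these tasks into a list. Once a batch of tasks has been assembled, we submit them all at once to asyncio.gather(). Python then schedules this batch of asynchronous tasks automatically, making full use of their waiting time to start new requests.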
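To compare the two styles without running the test site, here is a sketch in which fake_request() is a hypothetical stand-in for the aiohttp call: it waits sleep_time tenths of a second (scaled down from whole seconds) and returns a dict shaped like the site's response.

```python
import asyncio
import time


async def fake_request(sleep_time):
    # Stand-in for the aiohttp call: waits sleep_time tenths of a second.
    await asyncio.sleep(sleep_time / 10)
    return {'success': True, 'time': sleep_time}


async def one_by_one():
    # Each await finishes before the next request is even created.
    return [await fake_request(n) for n in (1, 2, 3)]


async def as_batch():
    # All three tasks are handed to the event loop before we wait on any of them.
    tasks = [asyncio.create_task(fake_request(n)) for n in (1, 2, 3)]
    return await asyncio.gather(*tasks)


def timed(coro_func):
    start = time.perf_counter()
    results = asyncio.run(coro_func())
    return results, time.perf_counter() - start


serial_results, serial_time = timed(one_by_one)
batch_results, batch_time = timed(as_batch)
print(f'one by one: {serial_time:.2f}s, as a batch: {batch_time:.2f}s')
```

Both versions return the same results in the same order, since asyncio.gather() returns results in the order the tasks were passed in; only the elapsed time differs.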

When writing Scrapy spiders, we often have code like the following:

...
yield scrapy.Request(url, callback=self.parse)
next_url = url + '&page=2'
yield scrapy.Request(next_url, callback=self.parse)

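It looks as though we first "request" url, then use that request's waiting time to execute next_url = url + '&page=2', and then send another request.

But in fact, inside Scrapy, executing yield scrapy.Request merely puts a request object into Scrapy's request queue, and execution then continues with next_url = url + '&page=2'.

Once the request object is in the queue, no HTTP request has actually been sent yet. Only after a certain number of request objects have accumulated, or after some time has passed, does Scrapy's downloader schedule this batch of request objects and send the HTTP requests together. When a request returns, Scrapy assembles the returned HTML into a Response object and passes it to the callback function for further processing.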
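The queue-then-dispatch idea described above can be modeled in a few lines. This is a toy model only, not how Scrapy is implemented: FakeRequest, download(), and crawl() are hypothetical names, and asyncio.sleep stands in for the HTTP round trip.

```python
import asyncio
from collections import deque


class FakeRequest:
    """Hypothetical stand-in for a request object; not Scrapy's class."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


log = []


def parse(response):
    log.append(f'parsed {response}')


def start_requests(url):
    # Each yield only hands a request object to the caller; nothing is sent here.
    yield FakeRequest(url, callback=parse)
    next_url = url + '&page=2'
    log.append('built next_url')
    yield FakeRequest(next_url, callback=parse)


async def download(request):
    await asyncio.sleep(0.01)  # stand-in for the HTTP round trip
    request.callback(f'response for {request.url}')


async def crawl(url):
    queue = deque(start_requests(url))  # drain the generator into a queue first
    log.append(f'queued {len(queue)} requests')
    # Only now are the queued requests dispatched, all as one batch.
    await asyncio.gather(*(download(request) for request in queue))


asyncio.run(crawl('https://example.com/list?x=1'))
print(log)
```

In this model, next_url is built before any "download" begins, which mirrors the point that yielding a request object is not the same as sending the request.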

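To sum up, for asynchronous programming in Python you first need to assemble a batch of asynchronous tasks and then submit them together to asyncio, letting it schedule the batch for you. You cannot, as in JavaScript, manually and directly control which code runs while an asynchronous request is waiting.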

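Calling synchronous functions in asynchronous code

You can call synchronous functions inside an asynchronous function. However, if the synchronous function being called is time-consuming, it will block the other asynchronous functions. For example, print is a synchronous function, but because it takes very little time it does not block asynchronous tasks.

Now let's write a recursive function that computes the nth term of the Fibonacci sequence and call it from an asynchronous function: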

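def sync_calc_fib(n):
    # Naive recursion: deliberately slow, used here as a long-running synchronous function
    if n in [1, 2]:
        return 1
    return sync_calc_fib(n - 1) + sync_calc_fib(n - 2)


async def calc_fib(n):
    result = sync_calc_fib(n)  # runs synchronously; no task switch can happen here
    print(f'Term {n} computed, result: {result}')
    return result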

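As is well known, computing the nth term of the Fibonacci sequence recursively is very slow. If we compute the 36th term, we can see that it takes about 5 seconds: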
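To get a feel for how quickly the cost grows, the function can be timed for a few smaller values of n. The time_fib() helper is a hypothetical addition for this sketch, and the actual timings depend on the machine.

```python
import time


def sync_calc_fib(n):
    if n in [1, 2]:
        return 1
    return sync_calc_fib(n - 1) + sync_calc_fib(n - 2)


def time_fib(n):
    start = time.perf_counter()
    result = sync_calc_fib(n)
    return result, time.perf_counter() - start


# Smaller n than the article's 36 so the sketch finishes quickly;
# each increase of n by 1 multiplies the work by roughly 1.6.
for n in (20, 24, 28):
    result, elapsed = time_fib(n)
    print(f'fib({n}) = {result}, took {elapsed:.3f}s')
```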

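What happens if we put computing the Fibonacci sequence (a CPU-intensive task) together with requesting a website (an I/O-intensive task)?

Let's look at the result: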

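As you can see, the total is about 8 seconds: computing the 36th term of the Fibonacci sequence takes 5 seconds, and the remaining three network requests take 3 seconds, so 8 seconds in all.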
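The code behind this measurement is not reproduced in the text. A scaled-down sketch of the same experiment, assuming asyncio.sleep as a stand-in for the network requests and a smaller n, might look like this; fake_request() and the delay measurement are additions for illustration.

```python
import asyncio
import time


def sync_calc_fib(n):
    if n in [1, 2]:
        return 1
    return sync_calc_fib(n - 1) + sync_calc_fib(n - 2)


async def calc_fib(n):
    # No await inside: once started, this holds the event loop until it returns.
    return sync_calc_fib(n)


async def fake_request(seconds):
    started = time.perf_counter()
    await asyncio.sleep(seconds)  # stand-in for a network request
    return started


async def main(n=30):
    start = time.perf_counter()
    tasks = [asyncio.create_task(calc_fib(n))]
    tasks += [asyncio.create_task(fake_request(0.1 * i)) for i in (1, 2, 3)]
    fib_result, *request_starts = await asyncio.gather(*tasks)
    total = time.perf_counter() - start
    # How long the first request had to wait before it could even begin.
    delay = request_starts[0] - start
    return fib_result, delay, total


fib_result, delay, total = asyncio.run(main())
print(f'requests were held back {delay:.2f}s; total {total:.2f}s')
```

The total should be roughly the Fibonacci time plus the longest simulated request, matching the 5 + 3 pattern described above.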

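This shows that when an asynchronous function (calc_fib) calls a very time-consuming synchronous function (sync_calc_fib), all the asynchronous tasks in the batch are blocked; only after the synchronous function has finished can the other asynchronous functions be scheduled normally. This is why using time.sleep is not recommended in asynchronous programming.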
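The effect of time.sleep can be checked directly. The sketch below compares it with asyncio.sleep and with asyncio.to_thread (available from Python 3.9), which is one common way to keep a blocking call off the event loop. The three wrapper coroutines and run_three() are hypothetical names for this illustration.

```python
import asyncio
import time


async def blocking_wait(seconds):
    time.sleep(seconds)  # blocks the whole event loop


async def cooperative_wait(seconds):
    await asyncio.sleep(seconds)  # yields to the event loop while waiting


async def offloaded_wait(seconds):
    # Runs the blocking call in a worker thread (Python 3.9+).
    await asyncio.to_thread(time.sleep, seconds)


async def run_three(wait, seconds=0.2):
    start = time.perf_counter()
    await asyncio.gather(*(wait(seconds) for _ in range(3)))
    return time.perf_counter() - start


timings = {
    wait.__name__: asyncio.run(run_three(wait))
    for wait in (blocking_wait, cooperative_wait, offloaded_wait)
}
for name, elapsed in timings.items():
    print(f'{name}: {elapsed:.2f}s')
```

Note that a worker thread helps with blocking waits such as time.sleep; for pure-Python computation like sync_calc_fib, threads are limited by the interpreter lock, so a process pool is the more usual choice.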


Origin www.cnblogs.com/wanghuaqiang/p/12357238.html