A complete guide to python asynchronous IO

Original address: https://flyingbyte.cc/post/async_io/

Asynchronous IO is an important concurrent programming paradigm in Python, and it evolved rapidly from Python 3.4 to 3.7. We already have multi-threading, multi-processing, and other techniques for handling concurrent workloads, so what new capabilities and benefits does asynchronous IO bring? This article answers that question. By reading it, you will learn about:

  • Asynchronous IO: a language-independent programming paradigm that is already supported by many programming languages
  • async/await: two new Python keywords used to define coroutines
  • asyncio: the Python package that provides the foundation for running and managing coroutines

Coroutines, a special kind of generator in Python, are the core and foundation of Python's support for asynchronous IO; we will dig into them in this article. Glossary: in this article I will use the following terms.

  • Asynchronous IO: the language-agnostic asynchronous IO paradigm
  • asyncio: the specific Python package that implements it
  • Event loop: the loop that handles and schedules all events and tasks
Before starting our learning journey, we must first create an independent Python environment and install the Python libraries required in this article.

Create a Python environment

The Python sample code in this article requires Python 3.7 or above, plus the aiohttp and aiofiles packages.

python3.7 -m venv ./py37async
source ./py37async/bin/activate
pip install --upgrade pip aiohttp aiofiles

Is the installation complete? Now start the journey of Python asynchronous IO, let's rock!

Asynchronous IO overview

Async IO is a bit less well known than its tried-and-true cousins, multiprocessing and threading. This section gives you a fuller picture of what async IO is and how it fits into the surrounding landscape.

Background on asynchronous IO

Concurrency and parallelism are large topics that cover a lot of ground and often confuse people. Although this article focuses on asynchronous IO and its implementation in Python, it is worth taking a moment to compare asynchronous IO with the alternatives so you get a more complete picture.

Parallelism means performing multiple operations at the same time. Multiprocessing is one way to achieve parallelism: tasks are spread over a computer's CPU cores so they literally run at the same time. Multiprocessing is well suited to CPU-bound tasks such as numerical computation.

The concept of concurrency is broader than parallelism. It means that multiple tasks can run in an overlapping manner. Please note the difference here. Concurrency is not equal to parallelism.

Threading is a concurrent execution model: multiple threads take turns executing tasks, and a single process can contain multiple threads. Because of the GIL, Python's relationship with threads is complicated, and this article will not go into the details. Interested readers can refer to: https://realpython.com/python-gil/

Threading is suited to IO-bound tasks. A CPU-bound task keeps the CPU busy with long, rarely interrupted stretches of computation, while an IO-bound task spends most of its time waiting for IO to complete.

In summary, concurrency includes both multi-process (suitable for computing-intensive tasks) and multi-threading (suitable for IO-intensive tasks). Multi-processing is a type of parallel model, and the parallel model is a subset of the concurrent model. Python's standard library supports both models.

Now we're adding a new member to the family. Over the past few years, a new concurrency paradigm: asynchronous IO, has been introduced to Python through the standard library asyncio and the new keywords async/await. Asynchronous IO is not a newly invented concept, it existed in some other languages ​​like Go, C# and Scala before it was introduced into Python.

Python's documentation describes asyncio as a library for writing concurrent code. However, asynchronous IO is neither multithreading nor multiprocessing; it is not built on either of them.

In fact, asynchronous IO is a single-process, single-threaded implementation: it uses cooperative multitasking (don't worry if the term is unfamiliar now; you will understand it well by the end of this article). In other words, asynchronous IO gives a feeling of concurrency despite using a single thread in a single process. Coroutines (the heart of asynchronous IO) can be scheduled concurrently, but they are not inherently executed in parallel.

That is to say, asynchronous IO is a paradigm of concurrent programming (multiple tasks can be executed overlappingly), but not parallel (multiple tasks are executed simultaneously). It behaves more like multithreading than multiprocessing, but it's actually quite different from both. Multiprocessing, multithreading and asynchronous IO are three different ways of concurrent programming.

You may have heard the word "asynchronous" elsewhere. To better understand what it means, here are two of its properties:

1. An asynchronous routine can pause itself while it waits for a long-running operation to return its result, so that the CPU can run other routines in the meantime.
2. Through the mechanism above, asynchronous code makes concurrent execution possible; in other words, asynchronous code gives the look and feel of concurrency.

Here's a diagram to put it all together; the white terms represent concepts, and the green terms represent the ways in which they are implemented or effected:

[figure: concurrency and parallelism]

My discussion of parallel programming models stops here. The main focus of this article is asynchronous IO, how to use it, and its rapidly evolving APIs. If you want a deeper comparison of the threading, multiprocessing, and asynchronous IO models, read Jim Anderson's detailed overview of concurrency in Python.

Asynchronous IO explained
Asynchronous IO may seem counterintuitive and contradictory. How can concurrent code run on a single thread and a single CPU core? Miguel Grinberg explained it clearly and concisely at PyCon 2017, and I'll paraphrase part of his talk here:

Chess master Judit Polgár hosts a chess exhibition in which she plays multiple amateur players. She has two ways of conducting the exhibition: synchronously and asynchronously.

Assumptions:
There are 24 opponents
Judit takes 5 seconds to make each chess move
Each opponent takes 55 seconds to make a move
A game averages 60 moves in total (30 moves per player)

Synchronous version: Judit plays one game at a time, never two at once, until each game is complete. Each game takes (55 + 5) * 30 = 1800 seconds, that is, 30 minutes. The whole exhibition takes 24 * 30 = 720 minutes, or 12 hours.

Asynchronous version: Judit moves from table to table, making one move at each. After each move she leaves the table and lets her opponent think and make their next move while waiting for her to return. One pass across all 24 games takes Judit 24 * 5 = 120 seconds, or 2 minutes, which is enough time for every opponent to complete their own move. The entire exhibition is now cut down to 120 * 30 = 3600 seconds, or just 1 hour.

There is only one Judit Polgár, she has only two hands, and she can make only one move at a time. But playing asynchronously cuts the exhibition time from 12 hours down to 1. Cooperative multitasking is therefore a fancy way of saying that a program's event loop communicates with multiple tasks so that each one runs at the optimal time.

IO calls take a long time compared with everything else. The difference between synchronous and asynchronous IO is this: with synchronous IO, a function that makes an IO call does not return while it waits, and other functions are blocked from running; with asynchronous IO, the call yields control immediately so that other functions can run, and when the IO completes, the calling function resumes from where it was interrupted.

Asynchronous IO is not easy.
I've heard people say, "Use async IO when you can; use threading when you must." It's true that long-running multithreaded code is difficult to write correctly and prone to errors, and asynchronous IO avoids some of the mistakes that are easy to make with a multithreaded design.

But that's not to say that writing asynchronous IO programs in Python is easy. Be warned: asynchronous IO programming can be difficult even once you understand its internal mechanisms. Python's asynchronous model is built around concepts such as callbacks, events, transports, protocols, and futures, and the terminology alone can be intimidating. The fact that the API has kept changing makes it no easier.

Fortunately, the asyncio library has matured: most of its features are no longer provisional, its documentation has been substantially overhauled, and there are now some high-quality resources to help you learn it.

The asyncio library and the async/await keywords

Now that you have some grasp of asynchronous IO as a concept, let's explore how Python supports it. The asyncio library, introduced in Python 3.4, and the two keywords async/await, added in Python 3.5, serve different purposes but work together to help you declare, define, execute, and manage asynchronous code.

async/await syntax and native Coroutines
Be careful with what you read online about Python asynchronous IO, because the asynchronous IO API changed drastically between Python 3.4 and 3.7. Some old patterns are no longer used, and features that were initially unsupported have since been added. Much of what you can find online, including this article, goes out of date quickly.

At the heart of asynchronous IO are coroutines. A coroutine is a specialized kind of Python generator function. Let's start with the most basic definition: a coroutine is a function that can suspend its execution before reaching return, and it can indirectly hand the CPU over to other coroutines for some time.

We'll dig into how traditional generator functions evolve into coroutines later. Now let's go through some examples to see how coroutines work.

The first example is an asynchronous IO program. Although it is very short, it has demonstrated the core functions of asynchronous IO:

#!/usr/bin/env python3
# countasync.py

import asyncio

async def count():
    print("One")
    await asyncio.sleep(1)
    print("Two")

async def main():
    await asyncio.gather(count(), count(), count())

if __name__ == "__main__":
    import time
    s = time.perf_counter()
    asyncio.run(main())
    elapsed = time.perf_counter() - s
    print(f"{
      
      __file__} executed in {
      
      elapsed:0.2f} seconds.")

When executing this program, notice how its output differs from a synchronous function using 'def' and 'time.sleep()':

$ python3 countasync.py
One
One
One
Two
Two
Two
countasync.py executed in 1.01 seconds.

The heart of asynchronous IO is the order in which the code executes. In the example above, the coroutine count() is driven by a single event loop (or coordinator). When each task reaches

await asyncio.sleep(1)

the function suspends itself and temporarily hands control back to the event loop, in effect telling it: "I need to sleep for 1 second; let other tasks run in the meantime, and schedule me again after 1 second."

A synchronous version of this code:

#!/usr/bin/env python3
# countsync.py

import time

def count():
    print("One")
    time.sleep(1)
    print("Two")

def main():
    for _ in range(3):
        count()

if __name__ == "__main__":
    s = time.perf_counter()
    main()
    elapsed = time.perf_counter() - s
    print(f"{
      
      __file__} executed in {
      
      elapsed:0.2f} seconds.")

Compared with the previous asynchronous version, the execution order of the code is subtly but essentially different:

$ python3 countsync.py
One
Two
One
Two
One
Two
countsync.py executed in 3.01 seconds.

time.sleep() and asyncio.sleep() may themselves seem unremarkable; they stand in for any time-consuming call that involves waiting. (The most trivial thing you can wait on is a sleep() that does nothing.) The difference is that time.sleep() represents any blocking call, while asyncio.sleep() represents a call that also takes a while to complete but does not block.

As you'll see in the next section, the benefit of awaiting something such as asyncio.sleep() is that the surrounding function temporarily cedes control to another task that is ready to run immediately. In contrast, time.sleep() and other blocking calls are incompatible with asynchronous Python code, because they stop everything in their tracks, including the event loop, for the duration of the block.
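To make that contrast concrete, here is a minimal sketch (not part of the article's example files) that runs three tasks with a non-blocking sleep and then three with a blocking one; the first batch finishes in about 1 second, the second in about 3:

import asyncio
import time

async def good():
    await asyncio.sleep(1)   # yields to the event loop; the other tasks can run

async def bad():
    time.sleep(1)            # blocks the whole thread, event loop included

async def main():
    start = time.perf_counter()
    await asyncio.gather(good(), good(), good())
    print(f"three good(): {time.perf_counter() - start:0.2f}s")  # ~1.0 s

    start = time.perf_counter()
    await asyncio.gather(bad(), bad(), bad())
    print(f"three bad():  {time.perf_counter() - start:0.2f}s")  # ~3.0 s

asyncio.run(main())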

Rules for asynchronous IO
Now we can give a more formal definition of async, await, and the coroutines they create. This section is a little dense, but mastering async/await is essential for a deep understanding of asynchronous IO, so feel free to skip it and come back when you need it:

The keyword async def introduces either a native coroutine or an asynchronous generator. The expressions async for and async with are also valid; you will see them later.

A coroutine function may contain expressions built on the keyword await. When execution reaches an await, the coroutine suspends and hands control back to the event loop. For example, if g() contains 'await f()', then g() must be a coroutine, that is, defined with 'async def'. When Python reaches that line, await tells the event loop to suspend the execution of g() until whatever f() is waiting on has returned; in the meantime, other tasks that are ready to run may do so.

Translated into code, that second rule looks roughly like this; try to see what it means:

async def g():
    # Pause g() here; let other schedulable tasks run, and resume once f() returns
    r = await f()
    return r

Here are some rules about when and how you can and cannot use async/await. They are useful whether you are still getting familiar with the syntax or already know how to use it:

A function that you introduce with async def is a coroutine function. It may use await, return, or yield, but all of these are optional; declaring async def noop(): pass is valid.

    Using await and/or return creates a coroutine function. To call a coroutine function, you must await it to get its result.

    It is less common (and only recently legal in Python) to use yield in an async def block. This creates an asynchronous generator, which you iterate over with async for. Forget about async generators for the time being and focus on getting down the syntax for coroutine functions, which use await and/or return.

    Anything defined with async def may not use yield from, which raises a SyntaxError.

Just as it is a SyntaxError to use yield outside of a def function, it is a SyntaxError to use await outside of an async def coroutine:
    SyntaxError: 'await' outside async function.

The following code snippets summarize the rules above:

async def f(x):
    y = await z(x)  # OK - both `await` and `return` are allowed in coroutines
    return y

async def g(x):
    yield x  # OK - using yield makes this an asynchronous generator

async def m(x):
    yield from gen(x)  # No - `yield from` is not allowed inside a coroutine

def m(x):
    y = await z(x)  # No - `await` must appear inside an `async def` coroutine
    return y

Finally, when you use await f(), f() must be an awaitable object. That by itself is not a very helpful definition, is it? For now, just know that an awaitable object is either:

1. another coroutine, or
2. an object that defines an .__await__() method whose return value is an iterator.

In practice, when writing code you mostly only need to worry about the first case.
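For completeness, here is a minimal, purely illustrative sketch of the second kind of awaitable: a hand-written class (hypothetical, not from this article) whose .__await__() returns an iterator:

import asyncio

class Ticket:
    """A hand-rolled awaitable: .__await__() must return an iterator."""
    def __init__(self, value):
        self.value = value

    def __await__(self):
        # Delegate to a real coroutine's iterator, then produce a result.
        yield from asyncio.sleep(0.1).__await__()
        return self.value

async def main():
    print(await Ticket(42))  # prints 42 after a short pause

asyncio.run(main())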

The thing you await is almost always a coroutine itself. Recall that besides using async def, a coroutine can also be defined by decorating an ordinary function with @asyncio.coroutine; the result is a generator-based coroutine. That approach became obsolete once the async/await syntax was introduced in Python 3.5.

These two methods are essentially equivalent, and both create an awaitable object, but the first is based on generators, while the second is a native coroutine:

import asyncio

@asyncio.coroutine
def py34_coro():
    """Generator-based coroutine, old-style syntax"""
    yield from stuff()

async def py35_coro():
    """Native coroutine, modern syntax"""
    await stuff()

When writing code, prefer native coroutines: they declare a coroutine explicitly, whereas the generator-based syntax does so implicitly, and generator-based coroutines are scheduled for removal in Python 3.10.

In the rest of this article, generator-based coroutines will be mentioned only for the purpose of explanation. The whole point of introducing the async/await syntax was to make coroutines a standalone feature of Python, clearly separated from generators.

Don't bother with generator-based coroutines, which are outdated and whose rules are incompatible with async/await syntax.

Before moving on to the next topic, let's look at a few more example programs.

The following program shows how asynchronous IO cuts down on waiting: makerandom() is a coroutine that keeps generating random integers in the range [0, 10] until one of them exceeds a threshold, sleeping for a while after each attempt. We want to run several instances of this coroutine without needing one to finish before the next can start. The code largely follows the pattern of the two previous programs, with minor changes:

#!/usr/bin/env python3
# rand.py

import asyncio
import random

# ANSI colors
c = (
    "\033[0m",   # End of color
    "\033[36m",  # Cyan
    "\033[91m",  # Red
    "\033[35m",  # Magenta
)

async def makerandom(idx: int, threshold: int = 6) -> int:
    print(c[idx + 1] + f"Initiated makerandom({idx}).")
    i = random.randint(0, 10)
    while i <= threshold:
        print(c[idx + 1] + f"makerandom({idx}) == {i} too low; retrying.")
        await asyncio.sleep(idx + 1)
        i = random.randint(0, 10)
    print(c[idx + 1] + f"---> Finished: makerandom({idx}) == {i}" + c[0])
    return i

async def main():
    res = await asyncio.gather(*(makerandom(i, 10 - i - 1) for i in range(3)))
    return res

if __name__ == "__main__":
    random.seed(444)
    r1, r2, r3 = asyncio.run(main())
    print()
    print(f"r1: {r1}, r2: {r2}, r3: {r3}")

The colorized output shows each task's execution pattern much more clearly than I could describe it:

[figure: rand.py execution]

This program uses one primary coroutine, makerandom(), and runs it three times with different parameters. The pattern is very common: define a handful of small, modular coroutines plus one wrapper coroutine that chains them together; main() then gathers tasks by mapping the central coroutine across an iterable or pool and collects their results.

In this miniature example, the pool is range(3). In the fuller program presented later, the pool will be a set of URLs: main() will map the central coroutine over that set, and for each URL the coroutine will open a connection, send a request, and parse the response, with all of the URLs handled concurrently.

Back in the current program, asyncio.sleep() is used to mimic an IO-bound program waiting for an IO operation to complete. In a chat application, for example, two clients send and receive messages to and from each other, and each send or receive must wait for the other side before the IO operation returns.

Asynchronous IO design pattern
This section begins to introduce the unique design pattern of asynchronous IO.

Calling Coroutines in series
As introduced earlier, a coroutine is an awaitable object, so another coroutine can await it: coroutine -> await -> coroutine. This lets us break a program into smaller, manageable, reusable coroutines that can be chained together.

# -*- coding: utf-8 -*-
#!/usr/bin/env python3
# chained.py

import asyncio
import random
import time

# ANSI colors
c = {
    "end"  : "\033[0m",   # End of color
    "张三" : "\033[36m",  # Cyan
    "李四" : "\033[91m",  # Red
    "王五" : "\033[35m",  # Magenta
}

async def part1(n: str) -> int:
    i = random.randint(0, 10)
    print(c[n] + f"({n}) 开始做{i}个面包.")
    await asyncio.sleep(i)
    print(c[n] + f"({n}) 做好了{i}个面包.")
    return i

async def part2(n: str, arg: int) -> int:
    i = random.randint(0, 10)
    print(c[n] + f"({n}) 开始吃面包.")
    await asyncio.sleep(i)
    print(c[n] + f"({n}) 花了{i}分钟吃了{arg}个面包." + c["end"])
    return i

async def chain(n: str) -> None:
    start = time.perf_counter()
    p1 = await part1(n)
    p2 = await part2(n, p1)
    end = time.perf_counter() - start
    print(f"-->{n}的工作 => 做面包用了{p1}分钟, 吃面包用了{p2}分钟.")

async def main(*args):
    await asyncio.gather(*(chain(n) for n in args))

if __name__ == "__main__":
    random.seed(12)
    args = ["张三", "李四", "王五"]
    start = time.perf_counter()
    asyncio.run(main(*args))
    end = time.perf_counter() - start
    print(f"Program finished in {end:0.2f} seconds.")

Both part1() and part2() are coroutines. They are awaited in sequence by the coroutine chain(), and each of them in turn awaits asyncio.sleep(). When part1() reaches 'await asyncio.sleep(i)', it suspends and returns control to the event loop, which schedules whatever else is ready to run. When the sleep is over, part1() resumes from where it paused, returns its result to chain(), and part2() starts executing.

$python3 ./chained.py
(张三) 开始做7个面包.
(李四) 开始做4个面包.
(王五) 开始做10个面包.
(李四) 做好了4个面包.
(李四) 开始吃面包.
(张三) 做好了7个面包.
(张三) 开始吃面包.
(王五) 做好了10个面包.
(王五) 开始吃面包.
(李四) 花了8分钟吃了4个面包.
-->李四的工作 => 做面包用了4分钟, 吃面包用了8分钟.
(王五) 花了5分钟吃了10个面包.
-->王五的工作 => 做面包用了10分钟, 吃面包用了5分钟.
(张三) 花了10分钟吃了7个面包.
-->张三的工作 => 做面包用了7分钟, 吃面包用了10分钟.
Program finished in 17.01 seconds.

Finally, the total running time of the program is equal to the time spent by the longest running task.

Using Queues
The asyncio library provides a queue data structure. Our sample programs so far have not required this structure. In chained.py, each task is composed of a series of coroutines executed serially.

Asynchronous IO also has another common design: a number of mutually independent producers add items to a queue. Each producer may add any number of items at staggered, unpredictable times, while a group of consumers pulls items off the queue as they appear, greedily and without waiting for any other signal.

In this design pattern, consumers and producers are independent of each other. The consumer does not know the number of producers, nor does it know in advance how many items will be put on the queue by the producer.

Each producer and consumer takes a varying amount of time to put items on, or take them off, the queue. The queue acts as a channel that lets producers and consumers communicate without ever talking to each other directly.

Note: queues are widely used in multithreaded Python because they are thread-safe, but with asynchronous IO you don't need to worry about thread safety (there is only one thread, unless you combine asynchronous IO with multithreading, which is beyond the scope of this article).

The sample programs in this section use queues as the transport medium for producers and consumers without direct contact with each other.

A synchronous version of this program would look quite dismal: a group of producers add items to the queue one at a time, and only after all producers finish do the consumers start pulling items off, one at a time. This design introduces a huge amount of latency: items sit idly in the queue instead of being picked up and processed as soon as they appear.

The code below, asyncq.py, is an asynchronous version built on asyncio.Queue. Three of the queue's methods are coroutines: await q.put() adds an item to the queue and, if the queue is full, waits for a free slot before returning; await q.get() removes an item from the queue and, if the queue is empty, waits for one to appear; await q.join() blocks until every item that was put on the queue has been retrieved and processed.

Below is the complete code.

#!/usr/bin/env python3
# asyncq.py

import asyncio
import itertools as it
import os
import random
import time

# Helper function: asynchronously generate a random item
async def makeitem(size: int = 5) -> str:
    return os.urandom(size).hex()

# Helper function: asynchronously sleep for a random amount of time
async def randsleep(a: int = 1, b: int = 5, caller=None) -> None:
    i = random.randint(0, 10)
    if caller:
        print(f"{caller} sleeping for {i} seconds.")
    await asyncio.sleep(i)

async def produce(name: int, q: asyncio.Queue) -> None:
    n = random.randint(0, 10)
    for _ in it.repeat(None, n):  # Synchronous loop for each single producer
        await randsleep(caller=f"Producer {name}")
        i = await makeitem()
        t = time.perf_counter()
        await q.put((i, t))
        print(f"Producer {name} added <{i}> to queue.")

async def consume(name: int, q: asyncio.Queue) -> None:
    while True:
        await randsleep(caller=f"Consumer {name}")
        i, t = await q.get()
        now = time.perf_counter()
        print(f"Consumer {name} got element <{i}>"
              f" in {now-t:0.5f} seconds.")
        q.task_done()

async def main(nprod: int, ncon: int):
    q = asyncio.Queue()
    producers = [asyncio.create_task(produce(n, q)) for n in range(nprod)]
    consumers = [asyncio.create_task(consume(n, q)) for n in range(ncon)]
    await asyncio.gather(*producers)
    await q.join()  # Implicitly awaits consumers, too
    for c in consumers:
        c.cancel()

if __name__ == "__main__":
    import argparse
    random.seed(444)
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--nprod", type=int, default=5)
    parser.add_argument("-c", "--ncon", type=int, default=10)
    ns = parser.parse_args()
    start = time.perf_counter()
    asyncio.run(main(**ns.__dict__))
    elapsed = time.perf_counter() - start
    print(f"Program completed in {elapsed:0.5f} seconds.")

The first two functions are helpers: the first asynchronously produces a random hex string used as an item, and the second sleeps asynchronously for a random amount of time. Each producer puts a random number of items (up to 10) on the queue; each item is a tuple (i, t), where i is the random string and t is the time at which the item was put on the queue. When a consumer pulls an item off, it uses that timestamp to calculate how long the item sat in the queue.

Please don't forget that asyncio.sleep() here is only used to simulate functions that will block.

The following is the running result of two producers and five consumers.

$ python3 asyncq.py -p 2 -c 5
Producer 0 sleeping for 3 seconds.
Producer 1 sleeping for 3 seconds.
Consumer 0 sleeping for 4 seconds.
Consumer 1 sleeping for 3 seconds.
Consumer 2 sleeping for 3 seconds.
Consumer 3 sleeping for 5 seconds.
Consumer 4 sleeping for 4 seconds.
Producer 0 added <377b1e8f82> to queue.
Producer 0 sleeping for 5 seconds.
Producer 1 added <413b8802f8> to queue.
Consumer 1 got element <377b1e8f82> in 0.00013 seconds.
Consumer 1 sleeping for 3 seconds.
Consumer 2 got element <413b8802f8> in 0.00009 seconds.
Consumer 2 sleeping for 4 seconds.
Producer 0 added <06c055b3ab> to queue.
Producer 0 sleeping for 1 seconds.
Consumer 0 got element <06c055b3ab> in 0.00021 seconds.
Consumer 0 sleeping for 4 seconds.
Producer 0 added <17a8613276> to queue.
Consumer 4 got element <17a8613276> in 0.00022 seconds.
Consumer 4 sleeping for 5 seconds.
Program completed in 9.00954 seconds.

In this run, every item was pulled off the queue and processed in a few ten-thousandths of a second. A delay in that number can come from two places:

  • Standard, largely unavoidable runtime overhead
  • Situations where all consumers happen to be sleeping when an item shows up in the queue

With regard to the second case, it is fortunately perfectly normal to scale to a large number of consumers; try running the script with the number of consumers set to 1000 and see what happens. The key point is that, in theory, the producers and consumers could be controlled by different users on different systems, with the queue serving as the central conduit, much like Kafka.

Well, here you have learned a lot about asynchronous IO, how to use async/await to define coroutine, and read three related sample codes. If you want to continue to learn more about how the coroutine mechanism is implemented in Python, please turn to the next chapter.

Python's asynchronous IO is rooted in generators.
You saw an example of an old-style, generator-based coroutine earlier. Although that way of defining coroutines has been superseded by native coroutines, it is still worth studying in order to understand coroutines deeply.

import asyncio

@asyncio.coroutine
def py34_coro():
    """Generator-based coroutine"""
    # Prefer native coroutines, but be aware that this style exists
    s = yield from stuff()
    return s

async def py35_coro():
    """Native coroutine, modern syntax"""
    s = await stuff()
    return s

async def stuff():
    return 0x10, 0x20, 0x30
Try it yourself: what happens if you call either of these functions directly, without awaiting it?

>>> py35_coro()
<coroutine object py35_coro at 0x10126dcc8>

Was that what you expected? Calling a coroutine function directly simply returns a coroutine object (this is, in fact, the kind of object that sits behind await; we make such objects awaitable).

Question: Can you remember another function feature in Python that looks similar to this: when you call it, instead of executing the internal code, it returns an object?

If your answer was a generator, congratulations: you have grasped the essence of coroutines. Under the hood, a coroutine is an enhanced generator function.

>>> def gen():
...     yield 1
...     yield 2
...
>>> g = gen()  # Calling a generator function returns a generator object
>>> g  # Nothing happens yet - a generator object is an iterator; advance it with next() / .__next__()
<generator object gen at 0x1012705e8>
>>> next(g)
1
>>> next(g)
2

In fact, whether you declare a native coroutine with 'async def' or use the old-style '@asyncio.coroutine' decorator, both are built on generator functions. Technically, 'await' is closer in behavior to 'yield from' than to 'yield' (but keep in mind that 'yield from x()' is just syntactic sugar for 'for i in x(): yield i').

Generator functions can serve as the basis of asynchronous IO because a generator can return at a certain point and later be re-entered at that same point. When a generator function hits a 'yield', it returns there and cedes control of the CPU to its caller; the next time it is driven, it continues from that point. For example, if a generator function contains a loop with a 'yield' inside it, the loop pauses at that point and resumes the subsequent code from there the next time the generator is advanced.

We illustrate this with an example.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from itertools import cycle

# Generator function
def endless():
    """Yields 9, 8, 7, 6, 9, 8, 7, 6, ... forever"""
    # Each time execution reaches 'yield', one value is handed to the caller
    # and the generator gives up control until it is advanced again
    for i in cycle((9, 8, 7, 6)):
        yield i
    # Equivalent: yield from cycle((9, 8, 7, 6))

# Calling the generator function returns a generator object
# (compare this with calling a coroutine function)
e = endless()
total = 0
# Each iteration runs the generator until it hits 'yield';
# the next iteration resumes from that exact point
for i in e:
    if total < 40:
        print(i, end=" ")
        total += i
    else:
        print()
        # Pause execution. We can resume later.
        break

# Prints: 9 8 7 6 9 8

# Resume from where we left off: yields (6, 9, 8)
next(e), next(e), next(e)

The keyword 'await' behaves in a similar way: it marks a point where the coroutine suspends and hands execution back (to the event loop), which then schedules other coroutines that are ready to run immediately. Suspended means the coroutine temporarily gives up execution, not that it exits or finishes. Remember that 'yield', 'yield from', and 'await' all mark a break point in a generator's execution.

This is the essential difference between an ordinary function and a generator. An ordinary function is all-or-nothing: it either hasn't started or it runs until it hits a 'return' and hands the return value back to its caller. A generator, by contrast, hands a value back at every 'yield' while also saving its current state (the paused position, the values of local variables, and so on); the next time next() is called, it continues from that saved pause point.

Generators have one more feature that helps make asynchronous IO possible: you can send a value into a generator object through its send() method. This is what lets generator (and coroutine) objects call into one another without blocking. The feature is only used in the underlying implementation of coroutines; you will almost never need to call it directly.
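As a quick, standalone illustration of .send() (you will not need this when using asyncio; the generator below is purely hypothetical):

def echo():
    """A generator that receives values via .send() and reports them."""
    while True:
        received = yield           # pause here until a value is sent in
        print(f"got {received!r}")

g = echo()
next(g)           # prime the generator: run it up to the first yield
g.send("hello")   # resumes at the yield and prints: got 'hello'
g.send(42)        # prints: got 42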

If you are interested in the details, start from PEP 342, which introduced coroutines via enhanced generators. Brett Cannon's "How the Heck Does Async-Await Work in Python" and David Beazley's "Curious Course on Coroutines and Concurrency" are also good material for an in-depth study of the coroutine machinery.

To condense that material into a few words: coroutines work through a special and somewhat unconventional mechanism; their result ends up as an attribute of the exception object raised when their send() method runs them to completion. There are some further tangled connections between these pieces, but knowing them won't really help you use the language, so we won't pursue the topic further.

Putting all of this together, here is a summary of what you need to know about generators as coroutines:

1. Coroutines are special generators that take advantage of several generator features.
2. Old-style, generator-based coroutines use 'yield from' to wait for a coroutine's result; the modern native syntax simply replaces 'yield from' with 'await' for waiting on a coroutine's result. 'await' is analogous to 'yield from', and recognizing that connection should help you understand how coroutines work.
3. 'await' marks a break point at the place where it is used. The coroutine suspends there, temporarily gives up the CPU, and resumes from that point the next time it is scheduled.

Other features: asynchronous generators and asynchronous comprehensions

In addition to the async/await keywords, Python also introduced 'async for' to iterate over an asynchronous iterator. The purpose of an asynchronous iterator is to allow asynchronous code to be called at each stage of the iteration.

A natural extension of this is the asynchronous generator. Recall that in a native coroutine you can use await, return, or yield to suspend or finish the coroutine. Using 'yield' inside a coroutine became possible in Python 3.6 (via PEP 525), and a function that uses both 'await' and 'yield' in its body is an asynchronous generator:

import asyncio

async def mygen(u: int = 10):
    """Asynchronously yield powers of 2."""
    i = 0
    while i < u:
        yield 2 ** i
        i += 1
        await asyncio.sleep(0.1)
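For example, a coroutine can consume the asynchronous generator above with async for; a short usage sketch (the name consume() is mine, not from the article):

async def consume():
    # Each pass through the loop awaits the generator, so other tasks can
    # run during the asyncio.sleep(0.1) between values.
    async for power in mygen(5):
        print(power)

asyncio.run(consume())
# Prints 1, 2, 4, 8, 16, roughly one value every 0.1 seconds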

Last but not least, Python allows asynchronous comprehensions using 'async for'. Like its synchronous cousin, this is largely syntactic sugar:

async def main():
    # This does *not* introduce concurrent execution
    # It is meant to show syntax only
    g = [i async for i in mygen()]
    f = [j async for j in mygen() if not (j // 3 % 5)]
    return g, f

g, f = asyncio.run(main())
print(g)
#[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
print(f)
#[1, 2, 16, 32, 256, 512]

There is a subtle but important point: neither asynchronous generators nor asynchronous comprehensions make the iteration itself concurrent. All they do is let the enclosing coroutine give up the CPU whenever the loop reaches an asynchronous point, so that other coroutines can run.

In other words, asynchronous iterators and asynchronous generators are not designed to map a function over a sequence concurrently. They exist only so that the coroutine containing them can be paused to let other tasks run. The 'async for' and 'async with' statements are needed only because using a plain 'for' or 'with' would break the await-based cooperation inside the enclosing coroutine. Grasping this distinction between asynchrony and concurrency is key.

What is an event loop, and what does asyncio.run() do?
An event loop is, at its core, not much more than an infinite loop, something like:

while True:
    print("in loop")
    time.sleep(5)

Of course, a real event loop does more than print a line. Its job is similar to that of a scheduler in an operating system kernel: it tracks the state of every coroutine, looks for another coroutine to schedule whenever the currently running one goes idle, and wakes an idle coroutine up when the resource it has been waiting on becomes available.

Currently, all management of the event loop is done in a single function:

asyncio.run(main())  # Python 3.7+

asyncio.run() was introduced in Python 3.7 and is responsible for creating an event loop, running all tasks until they complete, and then ending the event loop.

There is also a relatively verbose way to handle event loops, using get_event_loop(). A typical pattern is:

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()

You will certainly see get_event_loop() in older example programs, but unless you have a specific reason to manage the event loop at a finer-grained level, asyncio.run() should suffice for most programs.

If you do need to interact with the event loop from within a Python program, the loop is a good old-fashioned Python object: it supports introspection with loop.is_running() and loop.is_closed(), and you can manipulate it when needed, for example by handing it a plain callback to schedule.
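A small, illustrative sketch of that kind of interaction (the callback and its name are hypothetical):

import asyncio

def hello(name):
    print(f"Hello, {name}!")

async def main():
    loop = asyncio.get_running_loop()
    print(loop.is_running())        # True: the loop is currently driving main()
    loop.call_soon(hello, "world")  # schedule a plain (non-async) callback
    await asyncio.sleep(0)          # yield once so the loop can run the callback

asyncio.run(main())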

Regarding the event loop, it is more important to understand some of its internal mechanisms. Here are some things worth taking the time to learn about it:

The coroutine itself doesn't do much unless it is bound to an event loop

While this was shown in our previous explanation of generators, it's worth emphasizing it again here. If you have a main coroutine that awaits other coroutines, then just calling this coroutine will have no effect:

>>> import asyncio

>>> async def main():
...     print("Hello ...")
...     await asyncio.sleep(1)
...     print("World!")

>>> routine = main()
>>> routine
<coroutine object main at 0x1027a6150>

Remember to use asyncio.run() to schedule the main coroutine, executing it in the event loop:

>>> asyncio.run(routine)
Hello ...
World!

(Any sub-coroutines that main() awaits get executed as well. There is usually no need to call main() separately just to obtain a coroutine object; simply pass main() as the argument to asyncio.run(), and the main coroutine, along with all the awaited sub-coroutines it calls in turn, will be scheduled and executed by the event loop.)

In general, an asynchronous IO event loop runs in a single thread on a single CPU core, and usually that is enough. Event loops can also be distributed across multiple cores; see John Reese's talk on the subject,
the link is given at the end of this article.

The event loop is pluggable. That is, you could write your own event loop implementation if you really wanted to. A good example is uvloop, which implements an event loop in Cython.
A pluggable event loop means you can use any implementation of the event loop, independently of the structure of your coroutines. In fact, asyncio itself ships with two implementations: the default one is based on the selectors module, and the other is specific to Windows.
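For instance, opting into uvloop typically looks roughly like this (assuming uvloop has been installed with pip; this snippet is a sketch, not part of the article's examples):

import asyncio
import uvloop

async def main():
    await asyncio.sleep(0)
    print("running on uvloop")

uvloop.install()     # make uvloop the default event loop policy
asyncio.run(main())  # asyncio.run() now creates a uvloop event loop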

A Complete Example: Asynchronous HTTP Request
Congratulations, you have made it this far and learned a lot of new material; now it's time for the fun and painless part. In this section, you'll build a web-scraping URL collector, areq.py, using aiohttp, a blazingly fast asynchronous HTTP client/server framework (we only need the client side). A tool like this can be used to map connections between a cluster of sites, with the links forming a directed graph.

Note: you may be wondering why Python's requests library isn't compatible with asynchronous IO. The reason is that requests is built on top of urllib3, which in turn uses Python's http and socket modules.

By default, socket operations are blocking. This means you cannot write 'await requests.get(url)', because requests.get() is not an awaitable object. By contrast, almost everything in aiohttp is awaitable, including session.request() and response.text(). So don't use requests to write asynchronous code.

The program's architecture is this:

Read a list of URLs from a local file, urls.txt

Send GET requests to those URLs and decode the responses; if a request fails, that URL is not processed further

Search the returned HTML of each response for the URLs contained in href tags

Write the resulting URLs to a local file, foundurls.txt

Do all of the above as asynchronously and concurrently as possible (using aiohttp for the HTTP requests and aiofiles for the file writes; both are good examples of libraries built for the asynchronous IO model).

Below is the content of the urls.txt file. It's not a huge file, and contains some unreachable sites:

$ cat urls.txt
https://regex101.com/
https://docs.python.org/3/this-url-will-404.html
https://www.nytimes.com/guides/
https://www.mediamatters.org/
https://1.1.1.1/
https://www.politico.com/tipsheets/morning-money
https://www.bloomberg.com/markets/economics
https://www.ietf.org/rfc/rfc2616.txt

The second URL there should return a 404, which needs to be handled gracefully. A more practical version of this program would also have to deal with messier situations, such as server disconnections and endless redirects.

All requests should be included in a session, so that the internal connection pool of the session can be reused.

Let's take a complete look at the program first, and then analyze it step by step:

# -*- coding: utf-8 -*-
#!/usr/bin/env python3
# areq.py

"""Asynchronously fetch the HTTP links embedded in HTML pages."""

import asyncio
import logging
import re
import sys
from typing import IO
import urllib.error
import urllib.parse

import aiofiles
import aiohttp
from aiohttp import ClientSession

logging.basicConfig(
    format="%(asctime)s %(levelname)s:%(name)s: %(message)s",
    level=logging.DEBUG,
    datefmt="%H:%M:%S",
    stream=sys.stderr,
)
logger = logging.getLogger("areq")
logging.getLogger("chardet.charsetprober").disabled = True

HREF_RE = re.compile(r'href="(.*?)"')

async def fetch_html(url: str, session: ClientSession, **kwargs) -> str:
    """Fetch an HTML page via an HTTP GET request.

    kwargs are passed through to `session.request()`.
    """

    resp = await session.request(method="GET", url=url, **kwargs)
    resp.raise_for_status()
    logger.info("Got response [%s] for URL: %s", resp.status, url)
    html = await resp.text()
    return html

async def parse(url: str, session: ClientSession, **kwargs) -> set:
    """Find the hrefs embedded in the HTML returned by `url`."""
    found = set()
    try:
        html = await fetch_html(url=url, session=session, **kwargs)
    except (
        aiohttp.ClientError,
        aiohttp.http_exceptions.HttpProcessingError,
    ) as e:
        logger.error(
            "aiohttp exception for %s [%s]: %s",
            url,
            getattr(e, "status", None),
            getattr(e, "message", None),
        )
        return found
    except Exception as e:
        logger.exception(
            "Non-aiohttp exception occurred:  %s", getattr(e, "__dict__", {})
        )
        return found
    else:
        for link in HREF_RE.findall(html):
            try:
                abslink = urllib.parse.urljoin(url, link)
            except (urllib.error.URLError, ValueError):
                logger.exception("Error parsing URL: %s", link)
                pass
            else:
                found.add(abslink)
        logger.info("Found %d links for %s", len(found), url)
        return found

async def write_one(file: IO, url: str, **kwargs) -> None:
    """Crawl one URL and write the hrefs found in its HTML to a file."""
    res = await parse(url=url, **kwargs)
    if not res:
        return None
    async with aiofiles.open(file, "a") as f:
        for p in res:
            await f.write(f"{url}\t{p}\n")
        logger.info("Wrote results for source URL: %s", url)

async def bulk_crawl_and_write(file: IO, urls: set, **kwargs) -> None:
    """Crawl and write multiple URLs asynchronously and concurrently."""
    async with ClientSession() as session:
        tasks = []
        for url in urls:
            tasks.append(
                write_one(file=file, url=url, session=session, **kwargs)
            )
        await asyncio.gather(*tasks)

if __name__ == "__main__":
    import pathlib
    import sys

    assert sys.version_info >= (3, 7), "Script requires Python 3.7+."
    here = pathlib.Path(__file__).parent

    # Open the urls file and read the URLs into a set
    with open(here.joinpath("urls.txt")) as infile:
        urls = set(map(str.strip, infile))

    # Create the output file for the extracted URLs
    outpath = here.joinpath("foundurls.txt")
    with open(outpath, "w") as outfile:
        outfile.write("source_url\tparsed_url\n")

    # Run the asynchronous crawl
    asyncio.run(bulk_crawl_and_write(file=outpath, urls=urls))

This program is more practical and more complex than any of the ones we've written before, so let's break it down.

The constant HREF_RE is a regular expression that isolates what we are ultimately searching for: href tags within the HTML.

>>> HREF_RE.search('Go to <a href="https://realpython.com/">Real Python</a>')
<re.Match object; span=(15, 45), match='href="https://realpython.com/"'>

The coroutine fetch_html() wraps sending a GET request to the server and reading the returned HTML. It sends the request, awaits the server's response, and immediately raises an exception if the returned status is not 200 OK:

resp = await session.request(method="GET", url=url, **kwargs)
resp.raise_for_status()

If the status is OK, fetch_html() returns the content of the HTML page as a string. It is worth noting that the exception is not handled in this function, it is passed to the calling function and handled there:

html = await resp.text()

We await session.request() and resp.text() because they are awaitable objects. The request/response cycle would otherwise be the long-tailed, time-hogging portion of the program, but with asynchronous IO other tasks can run during that wait, for example parsing HTML that has already been fetched or writing already-parsed URLs to the output file.

Once fetch_html() returns, the chain of coroutine calls continues: parse() extracts the href tags from the HTML that fetch_html() returned, makes sure each one is valid, and converts it to an absolute path.

Admittedly, the second part of parse() is blocking, but it consists of a quick regex match, and in this particular case the synchronous code should be fast and unobtrusive. Remember, though, that any line inside a coroutine blocks other coroutines until it yields, awaits, or returns. If the parsing were a heavier, CPU-bound operation, you should consider running that part with loop.run_in_executor() in a separate thread or process.
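A rough sketch of what that offloading could look like; the helper names here are hypothetical and not part of areq.py:

import asyncio

def cpu_heavy_parse(html: str) -> set:
    # Stand-in for an expensive, purely synchronous parsing step
    return {line for line in html.splitlines() if "href" in line}

async def parse_offloaded(html: str) -> set:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor; for genuinely
    # CPU-bound work a concurrent.futures.ProcessPoolExecutor can be passed instead.
    return await loop.run_in_executor(None, cpu_heavy_parse, html)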

Next, write_one() takes a file path and a URL. It awaits parse() first, and once parse() returns the set of parsed URLs, it asynchronously writes them, together with their source URL, to the file.

Finally, bulk_crawl_and_write() is the main entry in the program's coroutine call chain. It uses a session, and each URL read from urls.txt is a separate task.

Here are some knowledge points to pay attention to:

By default, ClientSession allows at most 100 open connections. To change that, pass an instance of aiohttp.TCPConnector to ClientSession; you can also set limits on a per-host basis.

You can set connection timeouts for sessions or for each individual request.

This script also uses async with, which works with an asynchronous context manager. Asynchronous context managers behave just like their synchronous counterparts, except that .__enter__() and .__exit__() are replaced by .__aenter__() and .__aexit__(). As you might guess, async with can only be used inside a coroutine defined with async def. A small sketch combining these session options appears below.
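Here is that sketch; the limit and timeout values are arbitrary, chosen only for illustration:

import aiohttp

async def fetch(url: str) -> str:
    connector = aiohttp.TCPConnector(limit=100, limit_per_host=10)  # connection caps
    timeout = aiohttp.ClientTimeout(total=10)                       # seconds for the whole request
    async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.text()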

Please refer to the references section later in this article to continue to learn this part of the knowledge in depth.

Running areq.py against these URLs, all of the fetching, parsing, and saving finished in under a second. Here is the result:

$ python3 areq.py
21:33:22 DEBUG:asyncio: Using selector: KqueueSelector
21:33:22 INFO:areq: Got response [200] for URL: https://www.mediamatters.org/
21:33:22 INFO:areq: Found 115 links for https://www.mediamatters.org/
21:33:22 INFO:areq: Got response [200] for URL: https://www.nytimes.com/guides/
21:33:22 INFO:areq: Got response [200] for URL: https://www.politico.com/tipsheets/morning-money
21:33:22 INFO:areq: Got response [200] for URL: https://www.ietf.org/rfc/rfc2616.txt
21:33:22 ERROR:areq: aiohttp exception for https://docs.python.org/3/this-url-will-404.html [404]: Not Found
21:33:22 INFO:areq: Found 120 links for https://www.nytimes.com/guides/
21:33:22 INFO:areq: Found 143 links for https://www.politico.com/tipsheets/morning-money
21:33:22 INFO:areq: Wrote results for source URL: https://www.mediamatters.org/
21:33:22 INFO:areq: Found 0 links for https://www.ietf.org/rfc/rfc2616.txt
21:33:22 INFO:areq: Got response [200] for URL: https://1.1.1.1/
21:33:22 INFO:areq: Wrote results for source URL: https://www.nytimes.com/guides/
21:33:22 INFO:areq: Wrote results for source URL: https://www.politico.com/tipsheets/morning-money
21:33:22 INFO:areq: Got response [200] for URL: https://www.bloomberg.com/markets/economics
21:33:22 INFO:areq: Found 3 links for https://www.bloomberg.com/markets/economics
21:33:22 INFO:areq: Wrote results for source URL: https://www.bloomberg.com/markets/economics
21:33:23 INFO:areq: Found 36 links for https://1.1.1.1/
21:33:23 INFO:areq: Got response [200] for URL: https://regex101.com/
21:33:23 INFO:areq: Found 23 links for https://regex101.com/
21:33:23 INFO:areq: Wrote results for source URL: https://regex101.com/
21:33:23 INFO:areq: Wrote results for source URL: https://1.1.1.1/

Looks pretty good, doesn't it? You can sanity-check the run by counting the lines of foundurls.txt; when I tested it, the result was:

$ wc -l foundurls.txt
     626 foundurls.txt

$ head -n 3 foundurls.txt
source_url  parsed_url
https://www.bloomberg.com/markets/economics https://www.bloomberg.com/feedback
https://www.bloomberg.com/markets/economics https://www.bloomberg.com/notices/tos

Where to go next: you could make the crawl recursive, use aio-redis to keep track of which URLs have already been crawled so they are not requested twice, and use the networkx library to visualize the resulting graph of connections.

Be gentle: firing 1,000 concurrent requests at a small website is very, very bad behavior. There are ways to limit how many requests you issue at once, for example with the Semaphore provided by asyncio. If you ignore this warning, you will mostly collect a pile of timeout errors and only end up hurting your own program.
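A minimal sketch of such throttling with asyncio.Semaphore (the cap of 10 and the function names are illustrative, not part of areq.py):

import asyncio
from aiohttp import ClientSession

async def fetch_politely(url: str, session: ClientSession, sem: asyncio.Semaphore) -> str:
    async with sem:  # at most 10 coroutines may hold the semaphore at once
        resp = await session.request(method="GET", url=url)
        return await resp.text()

async def crawl(urls):
    sem = asyncio.Semaphore(10)  # cap concurrent requests at 10
    async with ClientSession() as session:
        return await asyncio.gather(
            *(fetch_politely(u, session, sem) for u in urls)
        )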

Asynchronous IO in context
Now that you've seen a healthy amount of code, let's talk about when you should use asynchronous IO, when you should choose a different concurrency model, and why.

When to choose asynchronous IO?

This article is not a formal comparison of asynchronous IO, threading, and multiprocessing, but it is worth discussing when asynchronous IO is likely the best choice of the three.

Asynchronous IO and multiprocessing are not really in competition; in fact they can be used together. If you have several CPU-bound tasks, for example the grid searches used in scikit-learn or keras, multiprocessing is the obvious choice.

Simply putting async in front of functions that make blocking calls is not a good idea; it will only slow your program down. But, as mentioned before, there are places where asynchronous IO and multiprocessing can be combined.

The comparison between asynchronous IO and threading is more direct. I mentioned at the start of this article that "threads are hard." The full story is that even when threading seems easy to implement, it can still lead to bugs that are notoriously hard to track down, due to race conditions, memory usage, and more.

Threading also tends to scale less well than asynchronous IO, because threads are a limited system resource. Creating thousands of threads will fail on many machines, and I don't recommend trying it; creating thousands of asynchronous IO tasks, however, is entirely feasible.

Asynchronous IO shines when you have multiple IO-bound tasks that would otherwise spend most of their time blocked on waiting, for example:

  • Network IO, whether your program is the server or the client
  • Serverless designs, such as a peer-to-peer, multi-user network like a group chat room
  • Read/write operations where you want to mimic a "fire and forget" style without holding locks on whatever you are reading and writing

The biggest reason not to use asynchronous IO is that it requires support from the libraries you depend on. If you want asynchronous writes to a particular database, you not only need a Python wrapper for that database, you need one that supports the async/await syntax. And always keep in mind that a coroutine that makes synchronous blocking calls blocks every other task from running.

The appendix at the end of this article contains a brief list of codebases that support the async/await syntax.

Which asynchronous IO library should I use?

Although this article uses one asynchronous IO library, asyncio, you could use other libraries that wrap asynchronous IO instead. Regarding the state of these libraries, Nathaniel J. Smith puts it this way:

"In the near future, asyncio is likely to be reduced to the same situation as other Python standard libraries that have practically little use, such as urllib2.

Actually what I want to point out is that precisely because asyncio is so successful, it faces the situation that it was the best way to use it when it was designed; but since then, many good features introduced by asyncio, For example, async/await provides inspiration for later developers, while asyncio is tired of its previous success and slowly becomes obsolete.

Among the other libraries that provide asynchronous IO, curio and trio are the two best known. Personally, I think that for a medium-sized program with a relatively straightforward design, using asyncio directly is enough, and it avoids adding dependencies beyond the Python standard library.

In any case, do take a look at curio and trio; you may find they feel easier and more intuitive to you, and many asyncio concepts carry over to other asynchronous IO libraries as well.
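To give a flavor of the alternatives, here is a minimal trio sketch (assuming trio is installed; it is illustrative only and not equivalent to any asyncio example above):

import trio

async def child(name, delay):
    await trio.sleep(delay)
    print(f"{name} finished after {delay}s")

async def main():
    # A nursery supervises a group of concurrently running tasks.
    async with trio.open_nursery() as nursery:
        nursery.start_soon(child, "a", 1)
        nursery.start_soon(child, "b", 2)

trio.run(main)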

Miscellaneous
Next we will discuss some miscellaneous things about the asyncio library and async/await syntax. These knowledge points are not easy to put in the previous chapters, but they are still very important for building and understanding a complete program.

Other asyncio high-level functions
In addition to asyncio.run(), you will come across other high-level functions in the asyncio library, such as asyncio.create_task() and asyncio.gather().

You can use create_task() together with run() to schedule the execution of a coroutine object:

import asyncio

async def coro(seq) -> list:
    """Simulate 'IO' by waiting for the largest value in seq."""
    await asyncio.sleep(max(seq))
    return list(reversed(seq))

async def main():
    # With only one task in this example, create_task() is optional;
    # `await coro([3, 2, 1])` would work just as well
    t = asyncio.create_task(coro([3, 2, 1]))  # Python 3.7 or newer
    await t
    print(f't: type {type(t)}')
    print(f't done: {t.done()}')

t = asyncio.run(main())
# t: type <class '_asyncio.Task'>
# t done: True

There is a subtlety in this example: if 'await t' is not called in main(), the coroutine coro may still be running when main() finishes. Because asyncio.run(main()) effectively calls loop.run_until_complete(main()), the event loop only cares about when main() finishes; it does not know that other tasks were created inside main(), nor when the coroutine coro will finish. So if coro has not finished when main() returns, the event loop shuts down and the pending task is terminated. You can get the list of tasks still pending on the event loop through asyncio.Task.all_tasks().
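
Here is a short sketch of that failure mode (my own example; the names slow() and main() are illustrative, and it uses the module-level asyncio.all_tasks() available in Python 3.7+): a task is created but never awaited, so when main() returns, asyncio.run() cancels it and its print never happens.

import asyncio

async def slow():
    await asyncio.sleep(1)
    print('slow finished')  # never printed: asyncio.run() cancels the pending task

async def main():
    asyncio.create_task(slow())   # scheduled, but never awaited
    await asyncio.sleep(0)        # yield once so the task gets a chance to start
    # asyncio.all_tasks() lists the tasks that are not yet done:
    # here, the still-pending 'slow' task plus the running main() task
    print('pending tasks:', len(asyncio.all_tasks()))

asyncio.run(main())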

Note: asyncio.create_task() was introduced in Python 3.7. In Python 3.6 or lower, asyncio.ensure_future() can be used.

Another option is asyncio.gather(), which wraps multiple coroutines (or futures) into a single future object. If you pass several awaitables to 'await asyncio.gather()', you wait until all of them have finished. (This is somewhat like queue.join() in our earlier example.) The return value of gather() is a list containing the results of all its inputs, in the same order.

>>> import time
>>> async def main():
...     t = asyncio.create_task(coro([3, 2, 1]))
...     t2 = asyncio.create_task(coro([10, 5, 0]))  # Python 3.7+
...     print('Start:', time.strftime('%X'))
...     a = await asyncio.gather(t, t2)
...     print('End:', time.strftime('%X'))  # Should be 10 seconds
...     print(f'Both tasks done: {all((t.done(), t2.done()))}')
...     return a
...
>>> a = asyncio.run(main())
Start: 16:20:11
End: 16:20:21
Both tasks done: True
>>> a
[[1, 2, 3], [0, 5, 10]]

You may have noticed that gather() waits for the results of all the coroutines you pass to it. Alternatively, you can loop over asyncio.as_completed() to get results in the order in which the tasks complete. Like gather(), it takes a collection of awaitables, but it returns an iterator that yields each result as the corresponding task finishes. In the following example, the result of coro([3, 2, 1]) is available before that of coro([10, 5, 0]), which is not the case with gather().

>>> async def main():
...     t = asyncio.create_task(coro([3, 2, 1]))
...     t2 = asyncio.create_task(coro([10, 5, 0]))
...     print('Start:', time.strftime('%X'))
...     for res in asyncio.as_completed((t, t2)):
...         compl = await res
...         print(f'res: {compl} completed at {time.strftime("%X")}')
...     print('End:', time.strftime('%X'))
...     print(f'Both tasks done: {all((t.done(), t2.done()))}')
...
>>> a = asyncio.run(main())
Start: 09:49:07
res: [1, 2, 3] completed at 09:49:10
res: [0, 5, 10] completed at 09:49:17
End: 09:49:17
Both tasks done: True

Finally, you will also come across asyncio.ensure_future(). You should rarely need it: it is a lower-level API whose functionality has largely been superseded by the create_task() function introduced earlier.

Precedence of await
Although they behave similarly in some respects, the keyword await has much higher precedence than yield. This means that when await is used in place of yield from, parentheses are unnecessary in many situations. See the examples of await expressions in PEP 492 for more details.
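
As a tiny illustration (my own example, not from PEP 492 itself), the higher precedence of await lets it appear directly inside larger expressions without extra parentheses:

import asyncio

async def double(x: int) -> int:
    await asyncio.sleep(0)
    return x * 2

async def main():
    # `await` binds more tightly than `+`, so no parentheses are needed here;
    # equivalent `yield from`-based code would typically require them
    total = await double(3) + await double(4)
    print(total)  # 14

asyncio.run(main())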

Summary
Now you have learned to use async/await and the libraries built on top of them. Here is what you should have learned:

1. Asynchronous IO is a language-agnostic model, and a way to achieve concurrency by letting coroutines communicate with one another indirectly.
2. Python introduced the new keywords async and await, used to mark and define coroutines.
3. asyncio is the Python package that provides support for running and managing coroutines.

Related resources

Python version specifics
Asynchronous IO in Python is still evolving rapidly, and new features keep appearing. Here is a summary of how support for asyncio has evolved across Python versions:

3.3: Introduced the yield from expression, which allows for generator delegation.

3.4: asyncio was introduced into the Python standard library, with its API marked as provisional (not yet stable).

3.5: async and await became part of the Python grammar, used to define and await coroutines, but they were not yet reserved keywords; you could still define functions or variables named async and await.

3.6: Introduced asynchronous generators and asynchronous comprehensions. The asyncio API was declared stable.

3.7: async and await became reserved keywords. Defining coroutines with the old-style @asyncio.coroutine decorator is discouraged. asyncio.run() and a number of other features were introduced.

If you want to use asyncio safely and take advantage of asyncio.run(), it is recommended to upgrade to Python 3.7 or above.
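
For reference, here is a minimal sketch of my own (nothing beyond the standard library is assumed) contrasting the deprecated generator-based coroutine style with the modern async def syntax and asyncio.run():

import asyncio

# Old generator-based style (pre-3.5; deprecated and removed in Python 3.11):
#
# @asyncio.coroutine
# def old_style():
#     yield from asyncio.sleep(1)

# Modern native coroutine (Python 3.5+), run with asyncio.run() on 3.7+:
async def new_style() -> str:
    await asyncio.sleep(1)
    return 'done'

print(asyncio.run(new_style()))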

Further reading

Here are some additional resources:

  • Real Python: Speed up your Python Program with Concurrency

  • Real Python: What is the Python Global Interpreter Lock?

  • CPython: The asyncio package source

  • Python docs: Data model > Coroutines

  • TalkPython: Async Techniques and Examples in Python

  • Brett Cannon: How the Heck Does Async-Await Work in Python 3.5?

  • PYMOTW: asyncio

  • A. Jesse Jiryu Davis and Guido van Rossum: A Web Crawler With asyncio Coroutines

  • Andy Pearce: The State of Python Coroutines: yield from

  • Nathaniel J. Smith: Some Thoughts on Asynchronous API Design in a Post-async/await World

  • Armin Ronacher: I don’t understand Python’s Asyncio

  • Andy Balaam: series on asyncio (4 posts)

  • Stack Overflow: Python asyncio.semaphore in async-await function


  • Yeray Diaz: AsyncIO for the Working Python Developer

  • Yeray Diaz: Asyncio Coroutine Patterns: Beyond await

The motivation behind the language changes is explained in detail in some of the What's New in Python documents:

  • What’s New in Python 3.3 (yield from and PEP 380)

  • What's New in Python 3.6 (PEP 525 & 530)

David Beazley's related articles:

  • Generator: Tricks for Systems Programmers

  • A Curious Course on Coroutines and Concurrency

  • Generators: The Final Frontier

Video resources on YouTube:

  • John Reese - Thinking Outside the GIL with AsyncIO and Multiprocessing - PyCon 2018

  • Keynote David Beazley - Topics of Interest (Python Asyncio)

  • David Beazley - Python Concurrency From the Ground Up: LIVE! - PyCon 2015

  • Raymond Hettinger, Keynote on Concurrency, PyBay 2017

  • Thinking about Concurrency, Raymond Hettinger, Python core developer

  • Miguel Grinberg: Asynchronous Python for the Complete Beginner - PyCon 2017

  • Yury Selivanov: async/await and asyncio in Python 3.6 and beyond - PyCon 2017

  • Fear and Awaiting in Async: A Savage Journey to the Heart of the Coroutine Dream

  • What Is Async, How Does It Work, and When Should I Use It? (PyCon APAC 2014)

Async/await compatible libraries:

aio-libs:

- aiohttp: Asynchronous HTTP client/server framework
- aioredis: Async IO Redis support
- aiopg: Async IO PostgreSQL support
- aiomcache: Async IO memcached client
- aiokafka: Async IO Kafka client
- aiozmq: Async IO ZeroMQ support
- aiojobs: Jobs scheduler for managing background tasks
- async_lru: Simple LRU cache for async IO

magicstack:

 - uvloop: Ultra fast async IO event loop
 - asyncpg: (Also very fast) async IO PostgreSQL support

other:

- trio: Friendlier asyncio intended to showcase a radically simpler design
- aiofiles: Async file IO
- asks: Async requests-like http library
- asyncio-redis: Async IO Redis support
- aioprocessing: Integrates multiprocessing module with asyncio
- umongo: Async IO MongoDB client
- unsync: Unsynchronize asyncio
- aiostream: Like itertools, but async

This article also has a companion video course on Real Python; you can watch it and work through the accompanying exercises to deepen your understanding: Hands-On Python 3 Concurrency With the asyncio Module
