Python coroutines: asyncio

asyncio is Python's asynchronous I/O library for writing concurrent code with coroutines. It suits IO-bound workloads that need heavy concurrency, such as web crawlers and file reading and writing.

asyncio was introduced in Python 3.4. Over several iterations its features and syntactic sugar have improved to varying degrees, which also means the API differs across Python versions and can look a bit messy. Back then I learned just enough to get by and took some detours. Here I sort out how to use asyncio in Python 3.7+ and in Python 3.6, so it can be used properly going forward.

1. Coroutine and asyncio

Coroutines, also known as micro-threads, are not managed by the operating system kernel but are controlled entirely by the program. Switching between coroutines is cheap, so they offer higher performance.

A coroutine can be compared to a subroutine. The difference is that during execution a coroutine can suspend its current state, switch to another coroutine, and resume where it left off when appropriate. Switching between coroutines involves no system calls or blocking calls; it is scheduled entirely by the coroutine scheduler.

Python implements coroutines through asyncio and the async/await syntax. The async keyword defines a coroutine function:

async def work():
    pass

Inside a coroutine you can do everything an ordinary function can; the key addition is the await syntax, which waits for another coroutine to finish. It suspends the current coroutine until the awaited one produces a result, then resumes execution:

import asyncio

async def work():
    await asyncio.sleep(1)
    print('continue')

asyncio.sleep() is a built-in coroutine function of the asyncio package; here it simulates a time-consuming IO operation. When execution of the coroutine above reaches that line, the current coroutine is suspended and other coroutines run until the sleep ends. With multiple coroutine tasks, this switching lets their IO operations proceed in parallel.
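A minimal sketch of this overlap, using asyncio.gather (covered in section 2 below) to run two coroutines concurrently; the names here are illustrative:

```python
import asyncio
import time

async def work():
    await asyncio.sleep(0.1)   # suspend here; the loop switches to other coroutines

async def main():
    start = time.perf_counter()
    await asyncio.gather(work(), work())   # the two sleeps overlap
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f'elapsed: {elapsed:.2f}s')   # about 0.1s rather than 0.2s
```

If the two sleeps ran one after the other, the total would be about 0.2 s; because each await hands control back to the loop, they overlap and the total stays near 0.1 s.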

Note that calling a coroutine function does not actually run it; it returns a coroutine object. To really run a coroutine, you must add it to an event loop. The official recommendation is that an asyncio program should have one main entry coroutine that manages all the other coroutine tasks:

async def main():
    await work()
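To see that calling a coroutine function only builds an object, a small illustrative check:

```python
import asyncio

async def work():
    return 42

coro = work()                    # nothing has run yet; this is a coroutine object
tname = type(coro).__name__      # 'coroutine'
result = asyncio.run(coro)       # the event loop drives it to completion
print(tname, result)             # coroutine 42
```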

In Python 3.7+, running this asyncio program takes a single line: asyncio.run(main()). In Python 3.6 you must obtain the event loop manually and add the coroutine task to it:

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()

The event loop is a circular queue in which coroutines are scheduled for execution. Once a coroutine is added to the loop, any coroutines it creates are automatically added to the same event loop.

In fact, a coroutine object is not run directly; it is wrapped in a Task for execution. In most cases asyncio does the wrapping for us, but we can also wrap a coroutine into a Task ourselves to get more control over it. Note that creating a task requires a running event loop in the current thread, otherwise a RuntimeError is raised; this is one reason the official docs recommend using a main entry coroutine. If you create a task outside the main entry coroutine, you need to obtain the event loop and use the low-level loop.create_task() method, and there must be a loop running within the main entry coroutine. After a task is created it has a state, so you can check its running status, view its result, cancel it, and so on:

async def main():
    task = asyncio.create_task(work())
    print(task)
    await task
    print(task)

# ---- Output ----
<Task pending name='Task-2' coro=<work() running at d:\tmp\code\asy.py:5>>
<Task finished name='Task-2' coro=<work() done, defined at d:\tmp\code\asy.py:5> result=None>
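Beyond inspecting state, a Task can be cancelled. Awaiting a cancelled task raises asyncio.CancelledError; a small sketch (the sleeping work() coroutine here is illustrative):

```python
import asyncio

async def work():
    await asyncio.sleep(1)

async def main():
    task = asyncio.create_task(work())
    await asyncio.sleep(0)       # yield once so the task gets scheduled
    task.cancel()                # request cancellation
    try:
        await task               # raises asyncio.CancelledError here
    except asyncio.CancelledError:
        pass
    return task.cancelled()

cancelled = asyncio.run(main())
print(cancelled)   # True
```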

asyncio.create_task() is a high-level API added in Python 3.7. In Python 3.6 you need the lower-level asyncio.ensure_future() to create a Future. A Future is also an object that manages the running state of a coroutine and is not essentially different from a Task.
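For completeness, a Python 3.6-style version of the task example might look like this (a sketch; ensure_future() still works on newer versions, which is how it is run here):

```python
import asyncio

async def work():
    await asyncio.sleep(0.1)
    return 1

async def main():
    # Python 3.6 has no asyncio.create_task(); ensure_future() wraps the
    # coroutine in a Task scheduled on the running loop
    task = asyncio.ensure_future(work())
    return await task

result = asyncio.run(main())   # on 3.6, use the loop.run_until_complete() pattern shown earlier
print(result)   # 1
```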

2. Concurrent coroutines

A program containing a series of concurrent coroutines is usually written like this (Python 3.7+):

import asyncio
import time


async def work(num: int):
    '''
    A worker coroutine: takes a number and returns it incremented by 1.
    '''
    print(f'working {num} ...')
    await asyncio.sleep(1)    # simulate a time-consuming IO operation
    print(f'{num} -> {num+1} done')
    return num + 1


async def main():
    '''
    Main coroutine: create a series of concurrent coroutines and run them.
    '''
    # task queue
    tasks = [work(num) for num in range(0, 5)]
    # run the queued coroutines concurrently and wait for the results
    results = await asyncio.gather(*tasks)
    print(results)


if __name__ == "__main__":
    asyncio.run(main())

The key to running multiple coroutine tasks concurrently is asyncio.gather(*tasks): it accepts multiple coroutine tasks and adds them to the event loop, returning a list of results once all tasks are complete. We do not wrap the Tasks manually here because gather wraps them automatically.

There is another way to run coroutines concurrently, asyncio.wait(tasks). The differences are:

  • gather is higher-level than wait and can group tasks; generally gather is preferred:
tasks1 = [work(num) for num in range(0, 5)]
tasks2 = [work(num) for num in range(5, 10)]
group1 = asyncio.gather(*tasks1)
group2 = asyncio.gather(*tasks2)
results1, results2 = await asyncio.gather(group1, group2)
print(results1, results2)
  • When more customized handling of tasks is required, wait can be used:
# Since Python 3.8, passing bare coroutine objects to wait() is deprecated; Tasks must be created manually
tasks = [asyncio.create_task(work(num)) for num in range(0, 5)]
done, pending = await asyncio.wait(tasks)
for task in tasks:
    if task in done:
        print(task.result())
for p in pending:
    p.cancel()
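One common customization is wait()'s timeout parameter: tasks that are not finished in time come back in the pending set instead of blocking forever. An illustrative sketch (the delays are made up):

```python
import asyncio

async def work(delay: float):
    await asyncio.sleep(delay)
    return delay

async def main():
    tasks = [asyncio.create_task(work(d)) for d in (0.05, 1.0)]
    # wait at most 0.2s; the slow task is returned in `pending`
    done, pending = await asyncio.wait(tasks, timeout=0.2)
    for p in pending:
        p.cancel()              # clean up unfinished tasks
    return len(done), len(pending)

n_done, n_pending = asyncio.run(main())
print(n_done, n_pending)   # 1 1
```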

3. Tips

  • The expression after await must be an awaitable object. There are three kinds of awaitables: coroutines, Tasks, and Futures. Normally there is no need to create Future objects directly in application-level code.
  • Synchronous code inside an asyncio program does not raise an error, but it defeats the purpose of concurrency. Take network requests: the requests library only supports synchronous calls, so after one request is sent, no other request can start until the response arrives. Even under asyncio, trying to switch to another coroutine after sending such a request still blocks, so there is no speedup when fetching multiple pages concurrently. For that you need a request library with async support, such as aiohttp.
  • For more detail on asyncio, see the official documentation.
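The second tip above can be demonstrated with the stdlib alone: a blocking time.sleep() inside a coroutine stalls the whole event loop, while asyncio.sleep() suspends cooperatively (for real HTTP requests, aiohttp plays the role asyncio.sleep() plays here):

```python
import asyncio
import time

async def blocking():
    time.sleep(0.1)            # synchronous call: blocks the whole event loop

async def cooperative():
    await asyncio.sleep(0.1)   # asynchronous: suspends so other coroutines run

async def timed(coro_func):
    start = time.perf_counter()
    await asyncio.gather(coro_func(), coro_func())
    return time.perf_counter() - start

sync_time = asyncio.run(timed(blocking))      # ~0.2s: the sleeps run back to back
async_time = asyncio.run(timed(cooperative))  # ~0.1s: the sleeps overlap
print(f'{sync_time:.2f}s vs {async_time:.2f}s')
```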


Origin blog.csdn.net/zzh2910/article/details/108319093