Python asynchronous programming: coroutines

overview

Why use coroutines

In a multi-threaded program, thread switching is decided by the operating system and cannot be controlled by the programmer. Multithreading suits scenarios where the threads are unrelated: no ordering between them, no mutual references, zero coupling. Coroutines are a programming pattern for the opposite case: tightly coupled pieces of code that run within one thread, where the programmer decides when each piece runs and the pieces can influence one another.

Within a single thread, however it is designed, code executes sequentially and must block when it encounters IO. Coroutines made that statement false. One thread can hold multiple coroutines, like multiple subtasks in one workshop. When one coroutine blocks on IO, the thread switches to another coroutine, and the program itself decides where to resume. This also avoids the overhead of creating threads and switching between them.

The characteristics and principles of coroutines

What a coroutine does: while executing function A, execution can be paused at a suspension point to run function B, and later paused again to resume function A, switching back and forth freely. This is not a function call (there is no call statement). The whole process looks like multithreading, but only one thread is executing the coroutines.

When multitasking with threads, a thread switch involves much more than saving and restoring the CPU context. For efficiency, the operating system keeps caches and other data for each thread, and it has to restore that data on every switch, so thread switching is expensive. Switching between coroutines only changes a small amount of context inside the program, so a program can tolerate a very high switch rate, reportedly millions of switches per second.

Does this principle look similar to the yield keyword covered in an earlier article? It should: Python coroutines evolved from yield-based generators.
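The switching described above can be sketched with plain generators. This is only an illustration of the principle, not how asyncio or gevent is implemented; worker and run_round_robin are hypothetical names introduced for this example.

```python
from collections import deque


def worker(name, steps, log):
    """A generator that records one step, then hands control back."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # pause here; the scheduler decides who runs next


def run_round_robin(gens):
    """Hypothetical minimal scheduler: resume each generator in turn."""
    queue = deque(gens)
    while queue:
        gen = queue.popleft()
        try:
            next(gen)          # run the generator up to its next yield
            queue.append(gen)  # it is not finished, so queue it again
        except StopIteration:
            pass               # this generator is done; drop it


log = []
run_round_robin([worker("A", 2, log), worker("B", 2, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1']
```

Both workers make progress in one thread, and every switch happens at a yield that the code itself chose.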

Advantages and disadvantages of coroutines

Advantages:

  1. Coroutines handle high concurrency within a single thread, which also saves resources (a group of coroutines is, in essence, one thread).
  2. Switching between coroutines is cheap. It happens at the program level, with no thread context switch, and the operating system is completely unaware of it. This makes coroutines lightweight, makes transferring control flow convenient, and simplifies the programming model.
  3. Concurrency is achieved within a single thread, so the CPU time of that thread is used as fully as possible.
  4. High concurrency, high scalability, and low cost: one CPU can support tens of thousands of coroutines without trouble, which makes coroutines well suited to highly concurrent workloads.
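As a rough illustration of the last point, the sketch below starts ten thousand coroutines in a single thread with asyncio (introduced later in this article). The names tiny and main and the count of 10,000 are arbitrary choices for the example, not figures from a benchmark.

```python
import asyncio


async def tiny(i):
    await asyncio.sleep(0.01)  # every coroutine waits concurrently
    return i


async def main(n):
    # gather schedules all n coroutines on the one event loop thread
    results = await asyncio.gather(*(tiny(i) for i in range(n)))
    return sum(results)


total = asyncio.run(main(10_000))
print(total)  # 49995000
```

All the waits overlap, so the run takes far less time than ten thousand sequential 10 ms sleeps would.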

Disadvantages:

  1. Coroutines cannot use multiple cores on their own. They are single-threaded by nature, so they cannot run on several cores of a CPU at the same time. To run on multiple cores, coroutines have to be combined with multiple processes.
  2. Once coroutines are introduced, every IO operation in the thread must be detected so that a switch happens when IO is encountered; none may be missed. If a single task makes a blocking call, the whole thread blocks, and the other tasks cannot run even if they are ready to compute.
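The second disadvantage can be observed directly. The sketch below, using asyncio (introduced later in this article), runs two coroutines side by side twice: once with a blocking time.sleep and once with the cooperative asyncio.sleep. The function names and the 0.2 second delay are assumptions made for the example.

```python
import asyncio
import time


async def blocking_task():
    time.sleep(0.2)            # blocks the whole thread; nothing else can run


async def cooperative_task():
    await asyncio.sleep(0.2)   # suspends only this coroutine


async def timed(factory):
    """Run two coroutines from the same factory and measure the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(factory(), factory())
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_task))
cooperative_elapsed = asyncio.run(timed(cooperative_task))
print(f"blocking: {blocking_elapsed:.2f}s, cooperative: {cooperative_elapsed:.2f}s")
```

The blocking pair runs one after the other (about twice the delay), while the cooperative pair overlaps (about one delay).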

gevent implements coroutines

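gevent is a third-party library that implements coroutines in Python. gevent is built on greenlet, which provides lightweight coroutines that switch explicitly (conceptually similar to yield, although greenlet is implemented as a C extension rather than on top of generators).

gevent switches between coroutines automatically when one of them blocks.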

See an example below:

import gevent
 
 
def f1():
    for i in range(1, 6):
        print('f1', i)
        gevent.sleep(0)
 
 
def f2():
    for i in range(6, 11):
        print('f2', i)
        gevent.sleep(0)
 
# Create the greenlet (coroutine) objects
g1 = gevent.spawn(f1)
g2 = gevent.spawn(f2)
# Wait for all the greenlets to finish, similar to joining threads
gevent.joinall([g1, g2])

asyncio coroutine decorator

The asyncio module appeared in Python 3.4, and coroutine functions had to be marked with the asyncio.coroutine decorator. Before that, a function containing a yield from statement could be called either a generator function or a coroutine function. To make the distinction explicit, only functions using the asyncio.coroutine decorator counted as real coroutine functions. Note that this decorator was deprecated in Python 3.8 and removed in Python 3.11, so the examples in this section that use it only run on older versions; current code uses async def instead.

Tasks and the event loop

coroutine

A function decorated with asyncio.coroutine is called a coroutine function. Calling it does not execute the function body immediately; it returns a coroutine object. In other words, the result of calling a coroutine function is a coroutine object (note that this "result" is not the function's return value). The coroutine object needs to be wrapped in a task and handed to the event loop, which then runs it.
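A minimal sketch of this behavior follows. It assumes the modern async def syntax and asyncio.run (Python 3.7+) rather than the asyncio.coroutine decorator, since the decorator no longer exists in current Python.

```python
import asyncio
import inspect


async def do_some_work():
    return "done"


coro = do_some_work()                            # the body has not started running yet
is_coroutine_object = inspect.iscoroutine(coro)  # True: we only hold a coroutine object
result = asyncio.run(coro)                       # the event loop drives it to completion
print(is_coroutine_object, result)  # True done
```

The return value only becomes available once an event loop has actually run the coroutine object.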

task

A task is created with a coroutine object as its argument. The task is a further wrapper around the coroutine object and tracks the various states of its execution.

event_loop (event loop)

Compare threads to workshops in a factory, and coroutines to the machines inside one workshop. In a plain threaded program, when one machine in a workshop is working, the other machines in the same workshop have to wait until it stops, although machines in other workshops can run at the same time, which already improves efficiency. In a coroutine program, the machines within one workshop can take turns making progress, and starting a machine, pausing it, delaying its start, and stopping it are all under the program's control.

The event loop controls how tasks run; that is, it is the caller of the tasks.

import time
import asyncio


def main():
    start = time.time()

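    @asyncio.coroutine  # create a coroutine function with the decorator (removed in Python 3.11)
    def do_some_work():
        print("start work")
        time.sleep(1)  # simulate an IO operation
        print("work completed")

    # Create the event loop. Each thread can have only one event loop. get_event_loop returns
    # the current event loop and, on the Python versions this example targets, creates one
    # if the thread does not have one yet.
    loop = asyncio.get_event_loop()
    coroutine = do_some_work()  # call the coroutine function to get a coroutine object
    # Hand the coroutine object to the event loop, which controls its execution.
    # run_until_complete blocks until the task is finished.
    # When a coroutine object is passed to run_until_complete, the loop automatically wraps it
    # in a task. Passing several tasks to the event loop is covered later.
    loop.run_until_complete(coroutine)

    end = time.time()
    print("elapsed %ds" % (end - start))

main()

# start work
# work completed
# elapsed 1s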

task status

Tasks can be created with the event loop's create_task method or with asyncio.ensure_future. In both cases the argument is a coroutine object (asyncio.ensure_future also accepts an existing future or task).

A task is an instance of the asyncio.Task class. Why wrap a coroutine object in a task? Because asyncio.Task does some work for us, including priming (starting) the coroutine and handling exceptions raised while the coroutine runs.

The task object's _state attribute holds the current status of the task. The examples here show two of its values, PENDING and FINISHED (a cancelled task has the state CANCELLED). Because _state is a private attribute, it is used below for illustration only.

import time
import asyncio


def main():
    start = time.time()

    @asyncio.coroutine
    def do_some_work():
        print("start work")
        time.sleep(1)
        print("work completed")

    loop = asyncio.get_event_loop()
    task = loop.create_task(do_some_work())  # create the task
    print("task is instance of asyncio.Task:", isinstance(task, asyncio.Task))
    print("task state:", task._state)
    loop.run_until_complete(task)
    print("task state:", task._state)

    end = time.time()
    print("elapsed %ds" % (end - start))

main()

# task is instance of asyncio.Task: True
# task state: PENDING
# start work
# work completed
# task state: FINISHED
# elapsed 1s
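The same check can be made without touching the private _state attribute. The sketch below assumes Python 3.7+ (asyncio.run, asyncio.create_task, async def) and uses the public done method; the return value 42 is arbitrary.

```python
import asyncio


async def do_some_work():
    await asyncio.sleep(0.01)
    return 42


async def main():
    task = asyncio.create_task(do_some_work())
    before = task.done()   # False: the task is still pending
    value = await task     # wait for the task to finish
    after = task.done()    # True: the task has finished
    return before, after, value


states = asyncio.run(main())
print(states)  # (False, True, 42)
```

task.done() and task.cancelled() are part of the documented interface, so they are the safer choice outside of demonstrations.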

async/await native coroutine

In Python 3.5, the async / await keywords are added to define coroutine functions. These two keywords are a combination, which is equivalent to the asyncio.coroutine decorator and the yield from statement. Since then, coroutines and generators have been completely separated.

Callbacks

With the async / await keywords in place, we continue with the basic functionality of the asyncio module.

If a coroutine performs an IO operation (which is almost always the case), we often want to be notified once it has finished so that the next processing step can start. This can be done by adding a callback to a future object. What is a future object? For our purposes a task is a future, because asyncio.Task is a subclass of asyncio.Future, so callbacks can be added to task objects. The last parameter of the callback is the future or task object, through which the return value of the coroutine can be obtained. If the callback needs additional parameters, they can be bound in advance with functools.partial.

In short, the code that needs to be piggybacked on after a task completes can be placed in a callback function. Modify the previous program as follows:

import time
import asyncio
from functools import partial


def main():
    start = time.time()

    async def do_some_work():
        print("start work")
        time.sleep(1)
        print("work completed")

    # Callback function: code that should run once the coroutine has finished goes here
    def callback(name, task):  # the last parameter must be the task or future
        print('[callback] Hello {}'.format(name))
        print('[callback] coroutine state: {}'.format(task._state))

    loop = asyncio.get_event_loop()
    task = loop.create_task(do_some_work())
    task.add_done_callback(partial(callback, "task"))  # add the callback
    loop.run_until_complete(task)

    end = time.time()
    print("elapsed %ds" % (end - start))

main()

# start work
# work completed
# [callback] Hello task
# [callback] coroutine state: FINISHED
# elapsed 1s

Note: the task object's add_done_callback method registers a callback. Its argument must be the callback function itself; the method has no way to pass extra arguments to the callback. functools.partial solves this: pass the callback and the arguments to bind to partial, and it returns a partial function that can be given to task.add_done_callback.
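The callback can also read the coroutine's return value through the future it receives. A minimal sketch, assuming Python 3.7+; on_done, compute, and the results list are hypothetical names for this example.

```python
import asyncio
from functools import partial

results = []


def on_done(label, future):
    # The finished future is always passed as the last positional argument.
    results.append((label, future.result()))


async def compute():
    await asyncio.sleep(0)
    return 7


async def main():
    task = asyncio.create_task(compute())
    task.add_done_callback(partial(on_done, "compute"))  # bind the extra argument
    await task
    await asyncio.sleep(0)  # give the loop one more turn so the callback has run


asyncio.run(main())
print(results)  # [('compute', 7)]
```

Callbacks are scheduled on the event loop rather than called inline, which is why the sketch yields once more before finishing.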

gather

In real projects there are usually several coroutines, each wrapped in a task, running in the loop at the same time. To hand multiple coroutines to the loop together, use asyncio.gather. The task's result method returns the return value of the corresponding coroutine function.

The await keyword followed by a coroutine object is equivalent to the yield from statement of Python 3.4.

Use asyncio.sleep instead of time.sleep here. Calling asyncio.sleep returns a coroutine object, and awaiting it is a suspension point. The two differ in what they block: asyncio.sleep suspends only the current coroutine (here, do_some_work) and lets the event loop run other tasks, while time.sleep blocks the entire thread. Inside a coroutine, asyncio.sleep must therefore be used.

import time
import asyncio


def main():
    start = time.time()

    async def do_some_work(name, t):
        print(f"{name} start work")
        await asyncio.sleep(t)
        print(f"{name} work completed")
        return name

    loop = asyncio.get_event_loop()
    task1 = loop.create_task(do_some_work("coroutine 1", 3))
    task2 = loop.create_task(do_some_work("coroutine 2", 1))
    gather = asyncio.gather(task1, task2)

    # asyncio.gather takes the task objects as arguments and returns a single future
    # that collects them. The argument order determines the order of the results
    # (and, when bare coroutines are passed, the order in which they are scheduled).
    loop.run_until_complete(gather)
    print("task1 result:", task1.result())
    print("task2 result:", task2.result())

    # In most cases there is no need to call the task's result method, because
    # run_until_complete returns the result of the future it was given
    # (for gather, the list of the coroutines' return values).
    # result = loop.run_until_complete(gather)
    # print(result)

    end = time.time()
    print("elapsed %ds" % (end - start))

main()

# coroutine 1 start work
# coroutine 2 start work
# coroutine 2 work completed
# coroutine 1 work completed
# task1 result: coroutine 1
# task2 result: coroutine 2
# elapsed 3s

The code above already has the structure of an asynchronous program. Inside the event loop, the two coroutines run alternately until both finish. The coroutine part of the program runs as follows:

-> task1 runs first

-> It prints its "start work" message

-> It reaches asyncio.sleep and is suspended

-> The CPU is released and execution moves to task2

-> task2 prints its "start work" message

-> It reaches asyncio.sleep and is suspended as well

-> No other coroutine is ready to run, so the loop can only wait for a sleep to end

-> task2 sleeps for less time, so it resumes first, after 1 second, and prints its "work completed" message

-> After another 2 seconds, task1, which slept for 3 seconds, resumes and prints its "work completed" message

-> Both tasks are now complete, and the event loop stops

-> The results of the two tasks are printed

-> The program's running time is printed

-> The program ends

Supplementary notes:

1. In most cases there is no need to call the task's add_done_callback method. Because a coroutine can be paused and resumed, the code that would go into the callback can simply be written after the await statement.

2. In most cases there is no need to call the task's result method to obtain the coroutine's return value, because run_until_complete returns the result of the future it was given (for asyncio.gather, a list of the coroutines' return values):

import time
import asyncio


def main():
    start = time.time()

    async def do_some_work(name, t):
        print(f"{name} start work")
        await asyncio.sleep(t)
        print(f"{name} work completed")
        return name

    loop = asyncio.get_event_loop()
    task1 = loop.create_task(do_some_work("coroutine 1", 3))
    task2 = loop.create_task(do_some_work("coroutine 2", 1))
    gather = asyncio.gather(task1, task2)

    # run_until_complete returns the result of the future it was given;
    # for gather this is the list of the coroutines' return values.
    result = loop.run_until_complete(gather)
    print(result)

    end = time.time()
    print("elapsed %ds" % (end - start))

main()

# coroutine 1 start work
# coroutine 2 start work
# coroutine 2 work completed
# coroutine 1 work completed
# ['coroutine 1', 'coroutine 2']
# elapsed 3s

3. The event loop has a stop method to stop the loop and a close method to close the loop. None of the above examples call the loop.close method, and there seems to be no problem. So should we call loop.close? Simply put, as long as the loop is not closed, the run_until_complete method can be run again, and it cannot be run after it is closed. Some people will suggest to call loop.close to completely clean up the loop object to prevent misuse. In fact, it is not necessary in most cases.

4. The asyncio module provides two ways to collect tasks, asyncio.gather and asyncio.wait. Both take a group of coroutine tasks and return something the event loop can run. The former has been used above. The latter differs in that it returns the tasks themselves, split into a done set and a pending set, so the execution status of each task (PENDING or FINISHED) can be inspected. asyncio.wait is useful for special requirements such as canceling tasks under certain conditions.
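A minimal sketch of that last use case, assuming Python 3.7+: asyncio.wait is given a timeout, and whatever is still pending afterwards is canceled. The task names and delays are arbitrary values chosen for the example.

```python
import asyncio


async def work(name, delay):
    await asyncio.sleep(delay)
    return name


async def main():
    fast = asyncio.create_task(work("fast", 0.01))
    slow = asyncio.create_task(work("slow", 5))
    # wait returns two sets of tasks: those that finished and those that did not
    done, pending = await asyncio.wait({fast, slow}, timeout=0.5)
    for task in pending:
        task.cancel()  # cancel whatever did not finish in time
    # let the cancelled tasks wind down before the loop closes
    await asyncio.gather(*pending, return_exceptions=True)
    return sorted(t.result() for t in done), len(pending)


outcome = asyncio.run(main())
print(outcome)  # (['fast'], 1)
```

asyncio.gather would instead have waited the full 5 seconds for both results.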

cancel task

While the event loop is running, that is, after it has started and before it stops, a task can be canceled manually. Note that only a task in the PENDING state can be canceled; a task in the FINISHED state has already completed and cannot be canceled. The following examples illustrate this.

loop cancel

# Stop the tasks

import asyncio

async def work(id, t):
    print('Working...')
    await asyncio.sleep(t)
    print('Work {} done'.format(id))

def main():
    loop = asyncio.get_event_loop()
    coroutines = [work(i, i) for i in range(1, 4)]
    try:
        loop.run_until_complete(asyncio.gather(*coroutines))
    except KeyboardInterrupt:
        loop.stop()    # stop the event loop (unfinished tasks are abandoned, not cancelled)
    finally:
        loop.close()   # close the event loop

if __name__ == '__main__':
    main()

# Working...
# Working...
# Working...
# Work 1 done
# ^C%

task cancel

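The task's cancel method can also cancel a task, and asyncio.Task.all_tasks returns all the tasks in the event loop. (asyncio.Task.all_tasks was removed in Python 3.9. On current versions use asyncio.all_tasks(loop), which returns only the tasks that are not yet done.)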

import asyncio


async def work(id, t):
    print('Working...')
    await asyncio.sleep(t)
    print('Work {} done'.format(id))

def main():
    loop = asyncio.get_event_loop()
    coroutines = [work(i, i) for i in range(1, 4)]
    # While the program is running, Ctrl + C raises a KeyboardInterrupt exception
    try:
        loop.run_until_complete(asyncio.gather(*coroutines))
    except KeyboardInterrupt:
        print()
        # Each thread can have only one event loop; this method returns the set of all tasks in it.
        # A task's state here is either PENDING or FINISHED.
        # (asyncio.Task.all_tasks was removed in Python 3.9; see asyncio.all_tasks.)
        tasks = asyncio.Task.all_tasks()
        for i in tasks:
            print('cancel task: {}'.format(i))
            # The task's cancel method cancels an unfinished task. It returns True on success
            # and False for a task that has already finished.
            print('cancel status: {}'.format(i.cancel()))
    finally:
        loop.close()

if __name__ == '__main__':
    main()

# Working...
# Working...
# Working...
# Work 1 done
# ^C
# cancel task: <Task finished coro=<work() done, defined at a.py:5> result=None>
# cancel status: False
# cancel task: <Task pending coro=<work() running at a.py:7> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x102cd8a38>()]> cb=[gather.<locals>._done_callback() at /usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/tasks.py:664]>
# cancel status: True
# cancel task: <Task pending coro=<work() running at a.py:7> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x102cd8a98>()]> cb=[gather.<locals>._done_callback() at /usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/tasks.py:664]>
# cancel status: True
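Cancellation can also be tried without pressing Ctrl + C. The sketch below, assuming Python 3.7+, cancels a pending task from another coroutine and shows where the canceled coroutine gets a chance to clean up; the 10 second sleep is an arbitrary stand-in for long-running work.

```python
import asyncio


async def work():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Cleanup would go here; re-raise so the task is marked as cancelled.
        raise


async def main():
    task = asyncio.create_task(work())
    await asyncio.sleep(0)     # let the task start and reach its await
    requested = task.cancel()  # True: the task was still pending
    try:
        await task
    except asyncio.CancelledError:
        pass                   # awaiting a cancelled task raises CancelledError
    return requested, task.cancelled()


flags = asyncio.run(main())
print(flags)  # (True, True)
```

cancel only requests cancellation; the task is actually cancelled when CancelledError is raised inside it at its current await.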

scheduled tasks

Arranging the order in which tasks and futures run in the event loop, that is, which coroutine runs first and which task the CPU turns to when one blocks on IO, is a routine need in asynchronous programming. In the multitasking programs above, the execution order was arranged by asyncio.ensure_future / loop.create_task and asyncio.gather. This section introduces other loop methods.

loop.run_forever

The event loop's run_until_complete method runs the loop and stops it automatically once the given task has completed. The run_forever method runs the loop indefinitely; the program has to call loop.stop itself to stop it.

import asyncio

async def work(loop, t):
    print('start')
    await asyncio.sleep(t)  # simulate an IO operation
    print('after {}s stop'.format(t))
    loop.stop()             # stop the event loop; it can be run again after stop

loop = asyncio.get_event_loop()             # create the event loop
task = asyncio.ensure_future(work(loop, 1)) # create the task; it joins the event loop automatically
loop.run_forever()  # run the event loop indefinitely until loop.stop is called
loop.close()

# start
# after 1s stop

The above is a single-task event loop. The loop is passed as a parameter to the coroutine function to create a coroutine, and the loop.stop method is executed inside the coroutine to stop the event loop. The following is the multitasking event loop, using the callback function to execute loop.stop to stop the event loop:

import time
import asyncio
import functools

def loop_stop(loop, future):    # the last parameter must be the future / task
    loop.stop()                 # stop the event loop; it can be run again after stop

async def work(t):              # coroutine function
    print('start')
    await asyncio.sleep(t)      # simulate an IO operation
    print('after {}s stop'.format(t))

def main():
    loop = asyncio.get_event_loop()
    # Create a task collector from any number of coroutines; the collector itself is a future object
    tasks = asyncio.gather(work(1), work(2))
    # Add a callback with the collector's add_done_callback method.
    # The callback runs automatically once all the tasks have completed.
    # Note that the argument of add_done_callback is the callback function itself;
    # functools.partial is used here to bind loop as an argument.
    tasks.add_done_callback(functools.partial(loop_stop, loop))
    loop.run_forever()  # run the event loop indefinitely until loop.stop is called
    loop.close()        # close the event loop

if __name__ == '__main__':
    start = time.time()
    main()
    end = time.time()
    print('elapsed: {:.4f}s'.format(end - start))

# start
# start
# after 1s stop
# after 2s stop
# elapsed: 2.0021s

The loop.run_until_complete method itself is also implemented by calling the loop.run_forever method, and then calling the loop.stop method through the callback function.
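That relationship can be sketched in a few lines. The helper run_until_done below is a hypothetical, simplified stand-in written for this example; the real run_until_complete handles more cases, such as exceptions and a loop that is stopped early.

```python
import asyncio


def run_until_done(loop, coro):
    """Simplified stand-in: run_forever plus a done callback that calls loop.stop."""
    task = loop.create_task(coro)
    task.add_done_callback(lambda fut: loop.stop())  # stop the loop when the task ends
    loop.run_forever()                               # returns after loop.stop takes effect
    return task.result()


async def add(a, b):
    await asyncio.sleep(0)
    return a + b


loop = asyncio.new_event_loop()
try:
    total = run_until_done(loop, add(2, 3))
finally:
    loop.close()
print(total)  # 5
```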

call_soon

The call_soon method of the event loop can add ordinary functions as tasks to the event loop and immediately schedule the execution order of the tasks.

import asyncio


def hello(name):          # ordinary function
    print('[hello] Hello, {}'.format(name))


async def work(t, name):  # coroutine function
    print('[work ] start', name)
    await asyncio.sleep(t)
    print('[work ] {} after {}s stop'.format(name, t))


def main():
    loop = asyncio.get_event_loop()
    # Add a task to the event loop
    asyncio.ensure_future(work(1, 'A'))     # runs 1st
    # call_soon schedules an ordinary function in the event loop, in order.
    # Its first argument is the function; the function's arguments follow.
    loop.call_soon(hello, 'Tom')            # runs 2nd
    # Add another task to the event loop
    loop.create_task(work(2, 'B'))          # runs 3rd
    # Start the event loop (blocking) and add one more task along the way
    loop.run_until_complete(work(3, 'C'))   # runs 4th


if __name__ == '__main__':
    main()

# [work ] start A
# [hello] Hello, Tom
# [work ] start B
# [work ] start C
# [work ] A after 1s stop
# [work ] B after 2s stop
# [work ] C after 3s stop

call_later

This method is the same as loop.call_soon, which can put ordinary functions as tasks in the event loop. The difference is that this method can be executed with a delay, and the first parameter is the delay time.

import asyncio
import functools


def hello(name):            # ordinary function
    print('[hello]  Hello, {}'.format(name))


async def work(t, name):    # coroutine function
    print('[work{}]  start'.format(name))
    await asyncio.sleep(t)
    print('[work{}]  stop'.format(name))


def main():
    loop = asyncio.get_event_loop()
    asyncio.ensure_future(work(1, 'A'))         # task 1: runs immediately, sleeps 1 second
    loop.call_later(1.2, hello, 'Tom')          # task 2: runs after a 1.2 second delay
    loop.call_soon(hello, 'Kitty')              # task 3: runs immediately
    task4 = loop.create_task(work(2, 'B'))      # task 4: runs immediately, sleeps 2 seconds
    loop.call_later(1, hello, 'Jerry')          # task 5: runs after a 1 second delay
    loop.run_until_complete(task4)


if __name__ == '__main__':
    main()

# [workA]  start
# [hello]  Hello, Kitty
# [workB]  start
# [hello]  Hello, Jerry
# [workA]  stop
# [hello]  Hello, Tom
# [workB]  stop

call_at

  • call_soon executes immediately, call_later executes after a delay, and call_at executes at a specified time
  • loop.time is the event loop's internal clock method; it returns the current time as a float

import asyncio
import functools


def hello(name):            # ordinary function
    print('[hello]  Hello, {}'.format(name))


async def work(t, name):    # coroutine function
    print('[work{}]  start'.format(name))
    await asyncio.sleep(t)
    print('[work{}]  stop'.format(name))


def main():
    loop = asyncio.get_event_loop()
    start = loop.time()  # current time on the event loop's internal clock
    asyncio.ensure_future(work(1, 'A'))  # task 1
    # loop.call_later(1.2, hello, 'Tom')
    # The commented-out line above is equivalent to the line below
    loop.call_at(start + 1.2, hello, 'Tom')  # task 2
    loop.call_soon(hello, 'Kitty')  # task 3
    task4 = loop.create_task(work(2, 'B'))  # task 4
    # loop.call_later(1, hello, 'Jerry')
    # The commented-out line above is equivalent to the line below
    loop.call_at(start + 1, hello, 'Jerry')  # task 5

    loop.run_until_complete(task4)


if __name__ == '__main__':
    main()

# [workA]  start
# [hello]  Hello, Kitty
# [workB]  start
# [hello]  Hello, Jerry
# [workA]  stop
# [hello]  Hello, Tom
# [workB]  stop

All three call_xxx methods schedule an ordinary function to run in the event loop. call_soon returns an asyncio.Handle, while call_later and call_at return an asyncio.TimerHandle (a subclass of Handle). Note that these handles are not coroutine tasks and cannot be passed to loop.run_until_complete.
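What a handle is for can be shown in a short sketch: it lets a scheduled call be canceled before it fires. The example assumes Python 3.7+ (asyncio.run, asyncio.get_running_loop); the delays and labels are arbitrary.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    order = []
    loop.call_later(0.1, order.append, "later")            # fires after about 0.1 s
    loop.call_at(loop.time() + 0.05, order.append, "at")   # fires at an absolute loop time
    loop.call_soon(order.append, "soon")                   # fires on the next loop iteration
    handle = loop.call_later(0.05, order.append, "cancelled")
    handle.cancel()           # a cancelled handle never runs its callback
    await asyncio.sleep(0.3)  # keep the loop running long enough for the timers
    return order


order = asyncio.run(main())
print(order)  # ['soon', 'at', 'later']
```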

coroutine lock

Taken literally, asyncio.Lock would be called an asynchronous IO lock. It is called a coroutine lock here because it is normally used inside coroutines. It locks a block of code inside a coroutine until that block has finished running, and then unlocks it. The standard usage is to create the lock's context with async with and put the code block inside it.

with is the ordinary context manager keyword, and async with is the asynchronous context manager keyword.
Objects used with the with keyword must have __enter__ and __exit__ methods.
Objects used with the async with keyword must have __aenter__ and __aexit__ methods.
async with automatically runs the lock's __aenter__ method, which calls the lock's acquire method to lock it.
At the end of the statement block it automatically runs __aexit__, which calls the lock's release method to unlock it. As with with, this replaces a try ... finally statement.

import asyncio


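l = []
lock = asyncio.Lock()   # coroutine lock

async def work(name):
    print('lalalalalalalala')     # printed to show what lies outside the lock's scope
    # Take the lock here: the first call to reach this block locks it,
    # and it is unlocked when the block ends. Until then the block cannot be entered again.
    # Any other task that calls this coroutine function while the lock is held
    # is suspended at this line until the lock is released.
    async with lock:
        print('{} start'.format(name))  # printed on entering the locked block
        if 'x' in l:                    # if the check succeeds,
            return name                 # return at once and end the coroutine
        await asyncio.sleep(0); print('----------')  # sleep 0 seconds to switch coroutines
        l.append('x')
        print('{} end'.format(name))
        return name

async def one():
    name = await work('one')
    print('{} ok'.format(name))

async def two():
    name = await work('two')
    print('{} ok'.format(name))

def main():
    loop = asyncio.get_event_loop()
    # asyncio.gather is used here because passing bare coroutines to asyncio.wait
    # is no longer allowed as of Python 3.11
    tasks = asyncio.gather(one(), two())
    loop.run_until_complete(tasks)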

if __name__ == '__main__':
    main()
    print(l)

# lalalalalalalala
# one start
# lalalalalalalala
# ----------
# one end
# one ok
# two start
# two ok
# ['x']
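For comparison, the sketch below spells out roughly what async with lock does, using explicit acquire and release calls inside try ... finally. It assumes Python 3.7+; critical and the log list are hypothetical names for this example.

```python
import asyncio


async def critical(lock, log, name):
    await lock.acquire()        # what __aenter__ does for a lock
    try:
        log.append(f"{name} in")
        await asyncio.sleep(0)  # yield control; others still cannot enter
        log.append(f"{name} out")
    finally:
        lock.release()          # what __aexit__ does, even if an error occurred


async def main():
    lock = asyncio.Lock()
    log = []
    await asyncio.gather(critical(lock, log, "one"), critical(lock, log, "two"))
    return log


log = asyncio.run(main())
print(log)  # ['one in', 'one out', 'two in', 'two out']
```

Even though the first coroutine yields inside the locked block, the second cannot enter until the lock is released, so the two sections never interleave.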

Origin blog.csdn.net/qq_43745578/article/details/129862804