Getting Started with Python Asynchronous Programming

1. What is asynchronous?

To talk about the asynchronous model, we first have to mention the familiar synchronous model; the two are relative concepts.

In the synchronous model, a program executes strictly in sequence. When it reaches an operation that has to wait for an external resource (sending or receiving network data, reading or writing a file), it enters a blocked state and continues only once the resource is ready. The asynchronous model, by contrast, is non-blocking: the program keeps executing other code while it waits for the external resource.

Python added asynchronous programming support to the standard library in version 3.4 with the asyncio module; the async and await keywords followed in 3.5. Within a single thread, an event loop schedules multiple coroutines and switches between slices of their code, which removes much of the performance cost of blocking and avoids the extra overhead of thread switching.

Within a thread, users can create asynchronous tasks (such as coroutines) and hand them over to the event loop for unified scheduling and management. A coroutine is a unit of code execution one level smaller than a thread. Only one coroutine in an event loop runs at any moment. When a coroutine waits for an external resource, the thread is not blocked; instead the coroutine gives up control so that the event loop can run other coroutines. That is the basic picture of Python's asynchronous model; the specific operations are described in detail in Section 3.
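
To make the scheduling idea concrete, here is a minimal sketch of two coroutines sharing one thread. The names worker and main are illustrative only, and asyncio.sleep(0) is used purely as a convenient way to hand control back to the event loop.

```python
import asyncio
import threading


async def worker(name: str, log: list[str]) -> None:
    # Every worker runs on the same thread; the event loop switches between
    # them each time one of them reaches an await that suspends.
    for step in range(2):
        log.append(f"{name} step {step} on {threading.current_thread().name}")
        await asyncio.sleep(0)  # hand control back to the event loop


async def main() -> list[str]:
    log: list[str] = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

The log alternates between A and B, and every entry reports the same thread name, which is the behaviour described above: one thread, one running coroutine at a time, with switches at the await points.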

2. Why asynchronous?

Before learning how to program asynchronously in Python, it is worth asking: why choose the asynchronous model for your code at all?

First, the performance advantage. For the CPython interpreter we use every day, the GIL (global interpreter lock) is an unavoidable topic (it is covered in detail in the next post, on Python multithreading). Because of the GIL, only one thread executes Python bytecode at any moment, which greatly limits concurrent performance. When a thread blocks, it releases the GIL and other threads get to run. At first glance this looks very similar to the way coroutines execute, as described above.

The process is indeed very similar: when a coroutine runs into an I/O operation, execution switches to another coroutine. In the multi-threaded case, however, the overhead of blocking and thread switching is much higher than the cost of coroutine scheduling, and a coroutine occupies far less memory than a thread. With coroutines, the handover between slices of code therefore happens inside one thread at very little cost. In I/O-heavy applications this difference can translate into considerable performance gains.

Second, the programming experience. Within a single event loop, developers largely no longer have to think about lock contention and release, deadlocks, thread synchronization and similar issues, so asynchronous programming is a comparatively simpler and more intuitive model.
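
As a rough illustration of the performance point, the sketch below starts a large number of coroutines that each wait on simulated I/O and times the whole batch. The helper names fake_io and run_many are made up for this example, and asyncio.sleep() merely stands in for a real network or disk wait.

```python
import asyncio
import time


async def fake_io(delay: float) -> float:
    # asyncio.sleep() stands in for a network or disk wait.
    await asyncio.sleep(delay)
    return delay


async def run_many(count: int, delay: float) -> float:
    # Start `count` waits at once and measure the wall-clock time for all of them.
    start = time.perf_counter()
    await asyncio.gather(*(fake_io(delay) for _ in range(count)))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(run_many(1000, 0.1))
    print(f"1000 waits of 0.1 s finished in {elapsed:.2f} s")
```

Because the waits overlap inside one thread, the total time should stay close to a single delay rather than the sum of all of them; the exact figure depends on the machine.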

3. How to be asynchronous?

This section focuses on asyncio, the asynchronous I/O library in the Python standard library. Other libraries also support asynchronous programming, such as Twisted and Tornado; interested readers can explore them afterwards.

3.1 Coroutines

Coroutines provide the most basic support for Python's asynchronous programming model. Consider the following sample program:

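import asyncio

async def func():
    print('Hello World, Asynchronous.')
    return 114514

async def main():
    res = await func()
    print(f'main func executed.{res}')

>>> func()  # Calling a coroutine function returns a coroutine object.
<coroutine object func at 0x000001D9C01C59C0>

>>> await func()  # await runs the coroutine object and yields its real return value (at the prompt this needs the asyncio REPL, python -m asyncio).
Hello World, Asynchronous.
114514

>>> asyncio.run(main())  # Creates a new event loop and runs the coroutine object in it (regular script or REPL).
Hello World, Asynchronous.
main func executed.114514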

We declared two coroutine functions with the async keyword. Calling a coroutine function returns a coroutine object. The await keyword runs a coroutine object and yields its real return value. Note that await can only be used inside a coroutine function, so a function that needs to await another coroutine must itself be a coroutine function. Here main is a coroutine function, so it can await func().

We mentioned earlier that users can hand coroutines over to the event loop for scheduling. await works within that arrangement: it is only valid inside a coroutine that an event loop is already driving. Awaiting a coroutine object runs it as part of the current task, and whenever it has to wait, control returns to the event loop, which can then run other tasks. await itself never creates an event loop; something like asyncio.run() has to start one first.

asyncio.run() also executes a coroutine through the event loop. The difference is that asyncio.run() is meant to be the entry point of a program: it always creates a new event loop, so it cannot be called while another event loop is running in the current thread. Doing so raises RuntimeError.
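
The restriction on asyncio.run() can be observed with a small experiment. This is only a sketch with made-up function names (inner, outer); it calls asyncio.run() from inside a coroutine that is already being driven by an event loop and falls back to a plain await when that fails.

```python
import asyncio


async def inner() -> str:
    return "inner done"


async def outer() -> str:
    coro = inner()
    try:
        # An event loop is already running here, so asyncio.run() refuses to start.
        asyncio.run(coro)
    except RuntimeError as exc:
        print(f"asyncio.run() failed: {exc}")
        # asyncio.run() never started the coroutine, so await it directly instead.
        return await coro
    return "not reached in this sketch"


if __name__ == "__main__":
    print(asyncio.run(outer()))
```

Inside a coroutine, await is the right tool; asyncio.run() belongs at the outermost, synchronous level of the program.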

3.2 Awaitables

Objects that can be used with the await keyword are called awaitables, and such objects can also be accepted and executed by the event loop. Although we can define a custom awaitable by implementing the __await__ method, this is generally not recommended. In most cases we use the three kinds of awaitables that Python provides: Coroutine, Task and Future.
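
As a quick check of the three kinds of awaitables, the following sketch creates one of each and asks inspect.isawaitable() about them. The function names are illustrative, and the Future's result is set by hand only to keep the example short.

```python
import asyncio
import inspect


async def answer() -> int:
    return 42


async def main() -> tuple[dict[str, bool], list[int]]:
    loop = asyncio.get_running_loop()
    coro = answer()                       # a coroutine object
    task = asyncio.create_task(answer())  # a Task wrapping a coroutine
    future = loop.create_future()         # a bare Future
    future.set_result(42)                 # normally done by low-level code

    kinds = {
        "coroutine": inspect.isawaitable(coro),
        "task": inspect.isawaitable(task),
        "future": inspect.isawaitable(future),
        "plain int": inspect.isawaitable(42),
    }
    # Await all three so that nothing is left pending or never awaited.
    results = [await coro, await task, await future]
    return kinds, results


if __name__ == "__main__":
    print(asyncio.run(main()))
```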

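Wrapping a Coroutine object in a Task gives richer scheduling capabilities. The following is a sample program:

import asyncio

async def func():
    print('func executed.')

async def main():
    # Wrap the func() coroutine object in a Task through asyncio (needs a running event loop).
    task1 = asyncio.create_task(func(), name='task 01')

    # Equivalent: create the Task through the running event loop.
    loop = asyncio.get_running_loop()
    task2 = loop.create_task(func(), name='task 02')

    await asyncio.gather(task1, task2)

asyncio.run(main())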

A Coroutine object has to be handed to the event loop after it is created before it runs. A Task, by contrast, is created through the event loop, which means that as soon as create_task returns, the wrapped coroutine is already scheduled and starts running the next time the event loop gets control.

Future is a lower-level awaitable that represents the eventual result of an asynchronous operation, including work that has not finished yet. Task inherits the properties and methods of Future and, together with Coroutine, makes up the higher-level asynchronous API. Generally speaking, creating Future objects directly in application code is not recommended; Future will be discussed in a later article.
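
For readers curious what "the result of an asynchronous operation" looks like in practice, here is a small sketch of a bare Future. It is not a recommended application pattern; loop.call_later() simply plays the role of the low-level code that would normally fill in the result.

```python
import asyncio


async def main() -> str:
    loop = asyncio.get_running_loop()
    future = loop.create_future()  # a Future with no result yet

    # Ask the loop to fill in the result a little later; this stands in for
    # callback-based code such as a protocol or driver.
    loop.call_later(0.05, future.set_result, "ready")

    print(future.done())   # the result has not been set yet
    result = await future  # suspends main() until set_result() is called
    print(future.done())
    return result


if __name__ == "__main__":
    print(asyncio.run(main()))
```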

3.3 Task creation and cancellation

Task is an important tool in Python asynchronous programming. Through a Task we can learn the current state of a coroutine and exercise a certain degree of control over its scheduling.

3.3.1 Create

As the code in 3.2 shows, a task can be created through the asyncio library or directly through the event loop; the two are essentially the same. One thing to watch: when calling create_task, be sure to keep a reference to the returned Task. The event loop only keeps a weak reference to a Task it receives, so if the return value is not kept, the task may be garbage collected at any time, whether or not its coroutine has finished.
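
One common way to keep such references for fire-and-forget work is to park each Task in a set and remove it when it finishes. The sketch below follows that pattern; the names background_tasks, spawn and job are illustrative and not part of asyncio.

```python
import asyncio

background_tasks: set[asyncio.Task] = set()


async def job(n: int, results: list[int]) -> None:
    await asyncio.sleep(0.01)
    results.append(n)


def spawn(coro) -> asyncio.Task:
    task = asyncio.create_task(coro)
    background_tasks.add(task)                        # strong reference keeps the task alive
    task.add_done_callback(background_tasks.discard)  # drop the reference once it finishes
    return task


async def main() -> list[int]:
    results: list[int] = []
    for n in range(3):
        spawn(job(n, results))
    # Wait for everything that is still registered.
    await asyncio.gather(*background_tasks)
    return sorted(results)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The done callback keeps the set from growing without bound, so the same helper can be reused for the lifetime of the program.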

3.3.2 Cancellation

The coroutine wrapped by a task is scheduled for execution by the event loop. The event loop runs only one coroutine at a time; when a coroutine gives up control, the event loop decides which coroutine to run next. Before a coroutine has finished, we can cancel it with the Task.cancel() method, as the following code shows.

import asyncio

# Define a coroutine function.
async def func():
    count = 0
    while True:
        await asyncio.sleep(1)
        count += 1
        print(f'awake {count} time(s).')

async def main():
    # Create the tasks (create_task needs a running event loop).
    task1 = asyncio.create_task(func(), name="task1")
    task2 = asyncio.create_task(func(), name="task2")

    task1.cancel()  # Cancel task1; task2 keeps running.
    await asyncio.sleep(3)  # Let task2 run for a while.

asyncio.run(main())

cancel() works by adding 1 to the target task's cancellation request count (tracked since Python 3.11). On the next cycle of the event loop, if the task still has a pending cancellation request, a CancelledError is thrown into the coroutine to stop it from running any further.

So before the CancelledError is actually thrown into the coroutine, we have a chance to withdraw the cancellation request with the uncancel() method, which subtracts 1 from the cancellation request count. Note that this relies on the behaviour of Python 3.13 and later, where a pending request is rescinded once the count drops back to zero; on 3.11 and 3.12 the CancelledError is still delivered. In the above example, we can:

# Before handing control back to the event loop:
task1.cancel()    # cancel_count += 1, curr = 1
task1.cancel()    # cancel_count += 1, curr = 2
task1.uncancel()  # cancel_count -= 1, curr = 1
task1.uncancel()  # cancel_count -= 1, curr = 0
# On the next event loop cycle the cancellation request count is 0, so task1 is not cancelled (Python 3.13+).

Of course, even when CancelledError really is thrown into the coroutine, we can catch it with a try... except... statement and keep the coroutine from being interrupted. The cancellation request count is not reset by this, however: Task.cancelling() still reports it, and uncancel() has to be called if the task should no longer be treated as being cancelled.
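
In everyday code the more common use of that try... except... is cleanup followed by re-raising, so that the task still ends up cancelled. A minimal sketch, with illustrative names (worker, events):

```python
import asyncio


async def worker(events: list[str]) -> None:
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        events.append("cleanup")  # release resources here
        raise                     # re-raise so that the task ends as cancelled


async def main() -> tuple[list[str], bool]:
    events: list[str] = []
    task = asyncio.create_task(worker(events))
    await asyncio.sleep(0)  # let the worker start and reach its sleep
    task.cancel()           # request cancellation
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancelled")
    return events, task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Swallowing the exception instead of re-raising it would let worker() finish normally, and task.cancelled() would then report False.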

Alternatively, we can wrap the Task with asyncio.shield(), which also protects the task from being cancelled.

task = asyncio.create_task(func())
await asyncio.shield(task)

# The line above is equivalent to the one below, except that cancelling the
# coroutine that contains it does not cancel the shielded task.
await func()
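
The effect of shield() is easiest to see when the outer coroutine is cancelled while the inner task keeps going. The following is a sketch under that scenario; save and handler are hypothetical names.

```python
import asyncio


async def save(results: list[str]) -> None:
    await asyncio.sleep(0.05)
    results.append("saved")


async def handler(inner: asyncio.Task) -> None:
    # Cancelling handler() cancels the shield wrapper, not `inner`.
    await asyncio.shield(inner)


async def main() -> tuple[list[str], bool, bool]:
    results: list[str] = []
    inner = asyncio.create_task(save(results))
    outer = asyncio.create_task(handler(inner))
    await asyncio.sleep(0)  # let both tasks start
    outer.cancel()
    try:
        await outer
    except asyncio.CancelledError:
        pass
    await inner  # the shielded task is still running and completes normally
    return results, outer.cancelled(), inner.cancelled()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Note that main() keeps its own reference to inner; a shielded task that nobody references is subject to the same garbage-collection caveat as any other task.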

In addition, for tasks that run normally, Task.done() tells us whether the task has finished and Task.result() returns its result. If the result is not available yet, result() raises InvalidStateError; if the task was cancelled, it raises CancelledError.
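
A short sketch of these calls in sequence, using an illustrative coroutine named compute:

```python
import asyncio


async def compute() -> int:
    await asyncio.sleep(0.01)
    return 7


async def main() -> list[str]:
    log: list[str] = []
    task = asyncio.create_task(compute())

    log.append(f"done before await: {task.done()}")
    try:
        task.result()  # too early, the task has not finished
    except asyncio.InvalidStateError:
        log.append("result not ready")

    await task
    log.append(f"done after await: {task.done()}, result: {task.result()}")
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```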

3.4 Sleep, timeout, wait

3.4.1 Sleep

asyncio.sleep() is a function we use often. It puts a coroutine to sleep for the number of seconds we specify, much like time.sleep() in the synchronous model. The difference is that asyncio.sleep() does not block the thread; it makes the current coroutine give up control instead.

This is very useful. As mentioned above, the event loop can only run one coroutine at a time. While a coroutine is running normally, it does not give up control unless it reaches an await (or yield) that actually suspends, such as an I/O wait, and in the meantime no other coroutine task gets to run. With asyncio.sleep(), even when the number of seconds is set to 0, the coroutine gives up control immediately and waits for the event loop to schedule it again.
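
The sketch below tries to show the starvation effect and the sleep(0) remedy side by side. crunch, ticker and measure are made-up names, and sum(range(1000)) is only a stand-in for a slice of CPU work.

```python
import asyncio


async def crunch(steps: int, cooperative: bool) -> None:
    for _ in range(steps):
        sum(range(1000))            # stand-in for a slice of CPU work
        if cooperative:
            await asyncio.sleep(0)  # yield to the event loop between slices


async def ticker(ticks: list[int]) -> None:
    # A background task that records every chance it gets to run.
    while True:
        ticks.append(1)
        await asyncio.sleep(0)


async def measure(cooperative: bool) -> int:
    ticks: list[int] = []
    tick_task = asyncio.create_task(ticker(ticks))
    await crunch(100, cooperative)
    tick_task.cancel()
    return len(ticks)


if __name__ == "__main__":
    print("without sleep(0):", asyncio.run(measure(False)))
    print("with sleep(0):   ", asyncio.run(measure(True)))
```

Without the sleep(0) calls the ticker never gets a turn while crunch() is running, because awaiting a coroutine that never suspends does not return control to the event loop.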

3.4.2 Timeout

Asynchronous work such as network requests and other asynchronous I/O can take too long and hold up the coroutine that is waiting for it. To guard against this, we can use the timeout mechanism to confine the execution of a coroutine to a time limit.

asyncio.timeout() (Python 3.11+) is an asynchronous context manager that sets a timeout. If the code inside the context runs past the deadline, TimeoutError is raised. The argument to asyncio.timeout() is a delay in seconds, or None for no timeout. If we do not know the right deadline when entering the context, we can pass None at first, then read the loop's current time with the event loop's time() method and set the deadline with reschedule(). Unlike the constructor argument, the value given to reschedule() is not an interval but an absolute time on the event loop's clock.

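import asyncio

async def main():
    try:
        # Enter the asynchronous timeout context manager with no deadline yet.
        async with asyncio.timeout(None) as cm:
            # Read the event loop's current time and compute the deadline.
            new_deadline = asyncio.get_running_loop().time() + 10
            # Reschedule the timeout.
            cm.reschedule(new_deadline)
            # Run the coroutine (long_running_task is a placeholder).
            await long_running_task()
    except TimeoutError:
        # Catch the timeout so that main() can carry on.
        pass
    # expired() reports whether the timeout was triggered.
    if cm.expired():
        print("Timed out.")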

The time used here comes from the event loop's monotonic clock, which loop.time() returns. If the deadline passed to reschedule() is earlier than loop.time(), the timeout is triggered on the next iteration of the event loop.

When a timeout occurs, the current task is cancelled, and the resulting CancelledError is caught inside the context manager and re-raised as TimeoutError, so TimeoutError can only be caught outside the context manager.

3.4.3 Waiting

Besides the blunt timeout mechanism, we can also wait on a batch of tasks with asyncio.wait(). When the timeout expires it returns two sets, the completed tasks and the unfinished ones, so that we can handle them separately.

import asyncio

async def func(delay: int):
    await asyncio.sleep(delay)

async def main():
    # Timeout 5.5 s: tasks sleeping 1 to 5 s finish, 6 to 10 s do not (exactly 5 would race the 5 s task).
    tasks = [asyncio.create_task(func(i)) for i in range(1, 11)]
    done, pending = await asyncio.wait(tasks, timeout=5.5)
    print(len(done), len(pending))  # 5 5

asyncio.run(main())

Besides a timeout, other return conditions can be set through the return_when parameter. The following three constants can be passed in:

Value             Description
FIRST_COMPLETED   Return as soon as one task completes.
FIRST_EXCEPTION   Return as soon as a task raises an exception; if none does, equivalent to ALL_COMPLETED.
ALL_COMPLETED     Return when all tasks have finished or been cancelled.

Applying each of the three rules to the example above gives:

# Inside a coroutine, with func() as defined above:
done, pending = await asyncio.wait([asyncio.create_task(func(i)) for i in range(1, 11)], return_when=asyncio.FIRST_COMPLETED)
# FIRST_COMPLETED  1 done, 9 pending
# FIRST_EXCEPTION  10 done, 0 pending
# ALL_COMPLETED    10 done, 0 pending
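
Since none of the tasks in that example raises, FIRST_EXCEPTION behaves like ALL_COMPLETED there. The sketch below adds a deliberately failing task to show the early return; ok, boom and the task names are illustrative.

```python
import asyncio


async def ok(delay: float) -> float:
    await asyncio.sleep(delay)
    return delay


async def boom(delay: float) -> None:
    await asyncio.sleep(delay)
    raise ValueError("simulated failure")


async def main() -> tuple[int, int, list[str]]:
    tasks = [
        asyncio.create_task(ok(0.01), name="fast"),
        asyncio.create_task(boom(0.05), name="failing"),
        asyncio.create_task(ok(5), name="slow"),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    failed = [t.get_name() for t in done if t.exception() is not None]
    for t in pending:
        t.cancel()  # wait() does not cancel unfinished tasks for us
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending), failed


if __name__ == "__main__":
    print(asyncio.run(main()))
```

wait() returns as soon as the failing task raises, with the fast task already in done and the slow one still pending; cleaning up the pending set is left to the caller.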

3.5 Thread Assignment

Many blocking operations in Python now have asynchronous counterparts that developers can use. But what if a coroutine function has to call a synchronous method for which there is no such counterpart? Once it is called, the thread the coroutine runs on is blocked outright, and the event loop and coroutine efficiency discussed earlier go out of the window.

To keep the important thread that runs the coroutines from being blocked, we can compromise: asyncio.to_thread() wraps the synchronous function in a coroutine and runs it in a separate thread.

import time
import asyncio

# Define a synchronous function.
def blocking_io():
    print(f"{time.strftime('%X')} blocking started")
    # time.sleep() stands in for any synchronous call, such as I/O, a network request or a file read/write.
    time.sleep(1)
    print(f"{time.strftime('%X')} blocking finished")

async def main():
    print(f"Coroutines started at {time.strftime('%X')}")
    await asyncio.gather(
        # asyncio.to_thread wraps a synchronous function and returns a coroutine.
        asyncio.to_thread(blocking_io),
        asyncio.sleep(1))

    print(f"All coroutines finished at {time.strftime('%X')}")

asyncio.run(main())
# Output
#>>> Coroutines started at 22:02:22
#>>> 22:02:22 blocking started
#>>> 22:02:23 blocking finished
#>>> All coroutines finished at 22:02:23

With to_thread(), any *args and **kwargs supplied are passed straight to the synchronous function in the other thread. Context variables are normally not shared between threads, but to_thread() propagates the current contextvars.Context to the new thread, so the function sees the caller's context variables when it runs there.

The output shows that the thread running the coroutines is not blocked: the program takes 1 s in total and finishes as scheduled. Note that because of the GIL, to_thread() mainly improves the performance of I/O-bound synchronous functions. For CPU-bound synchronous functions, only one thread can execute Python code at a time, so moving the blocking call to another thread brings no obvious benefit.
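
The argument passing and context propagation mentioned above can be checked with a short sketch. request_id and blocking_lookup are hypothetical names introduced only for this example.

```python
import asyncio
import contextvars
import threading

request_id: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="unset")


def blocking_lookup(key: str) -> tuple[str, str, int]:
    # Runs in a worker thread, yet sees the caller's context variable.
    return key, request_id.get(), threading.get_ident()


async def main() -> tuple[str, str, bool]:
    request_id.set("req-42")
    caller_thread = threading.get_ident()
    # Positional and keyword arguments after the function are forwarded to it.
    key, seen_id, worker_thread = await asyncio.to_thread(blocking_lookup, "user:1")
    return key, seen_id, worker_thread != caller_thread


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The function receives its argument, reads the value set by the caller, and reports a different thread identifier from the one running the event loop.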

Besides creating a thread on the fly, if our program is itself multi-threaded, we can use asyncio.run_coroutine_threadsafe() to submit a coroutine to the event loop running in a specific thread, which allows coroutines to be scheduled flexibly across multiple threads.

# loop is a running event loop that belongs to another thread;
# timeout is the number of seconds we are willing to wait.
future = asyncio.run_coroutine_threadsafe(asyncio.sleep(10), loop)
# The return value is a concurrent.futures.Future, which we can handle much like a Task.
try:
    # Fetch the result, waiting up to timeout seconds if it is not ready yet.
    result = future.result(timeout)
except TimeoutError:
    # A timeout raises TimeoutError; at this point we can cancel the coroutine manually.
    print('The coroutine timed out, cancelling it manually...')
    future.cancel()
except Exception as exc:
    print(f'The coroutine raised another exception: {exc!r}')
else:
    # If all went well, print the result.
    print(f'The coroutine returned {result!r}.')
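
For completeness, here is one way the "event loop in another thread" might be set up so that the snippet above has something to talk to. This is a sketch, not the only arrangement; add and demo are illustrative names.

```python
import asyncio
import threading


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0.01)
    return a + b


def demo() -> int:
    loop = asyncio.new_event_loop()
    # Run the loop forever in a background thread.
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    try:
        # Submit from this (different) thread; the result is a concurrent.futures.Future.
        future = asyncio.run_coroutine_threadsafe(add(1, 2), loop)
        return future.result(timeout=5)
    finally:
        loop.call_soon_threadsafe(loop.stop)  # stop the loop from outside its thread
        thread.join()
        loop.close()


if __name__ == "__main__":
    print(demo())
```

The calling thread blocks in future.result() while the coroutine itself runs on the background thread's event loop.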

That is all for this article. My knowledge is limited, so if anything here is wrong, please feel free to point it out.

