Python Basics Review: Python Coroutines

1. Introduction

In the blog post <Python Basics Review: Multi-process and Multi-threading in Python>, we did not cover how to implement Python coroutines through generators.

A coroutine is one way to achieve concurrent programming. Of course, multi-process and multi-thread approaches also solve concurrency, but when the number of clients connected to a server at the same time reaches a certain magnitude, process context switching consumes a lot of resources, and threads cannot withstand such huge pressure either. At that point we need a scheduler to schedule tasks, saving the overhead of starting threads, managing threads, and synchronization locks in multithreading. Nginx can maintain low resource usage, low consumption, and high performance under high concurrency precisely because of its scheduler (for example, a polling algorithm).

In Python 2, it was common to implement coroutines with generators. Python 3.7 and later provide a new approach based on asyncio and async/await. Given that it is already the year 2020[::-1] (laughs), we will start from Python's new features and the new-style coroutines.

2. Implementation of coroutines

2.1 Example 1: Crawler

[Image: the synchronous version of the crawler, timed in a Jupyter notebook]
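The original image cannot be recovered here; below is a minimal sketch of the synchronous version it showed, reconstructed from the surrounding text (assuming the same url_N naming as the async version further down, where the trailing number is the sleep time):

import time

def get_page(url):
    # simulate fetching a page: the trailing number of the url is the sleep time
    print('acquire page {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    time.sleep(sleep_time)
    print('ok {}'.format(url))

def main(urls):
    for url in urls:
        get_page(url)

%time main(['url_1','url_2','url_3','url_4'])
# Wall time: 10 s  (1 + 2 + 3 + 4 seconds, run one after another)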

%time is syntactic sugar from the ipython interpreter (used in jupyter notebook) for measuring how long a statement takes to run.

The 4 tasks take 10 seconds in total. Next, we use coroutines to run them concurrently and improve efficiency.

import asyncio

async def get_page(url):
    print('acquire page {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('ok {}'.format(url))

async def main(urls):
    for url in urls:
        await get_page(url)

asyncio.run(main(['url_1','url_2','url_3','url_4']))
# output
acquire page url_1
ok url_1
acquire page url_2
ok url_2
acquire page url_3
ok url_3
acquire page url_4
ok url_4
Wall time: 10 s

Since Python 3.7, writing asynchronous programs with coroutines has been very simple. Most of the magic methods coroutines use are included in the asyncio library. We only need to declare asynchronous functions with the async modifier and then call them with await.

2.2 Let's sort out the ideas

First, in the example we import the package with import asyncio, then declare get_page() and main() as asynchronous functions with async. When we call an asynchronous function, we get a coroutine object.
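What does a coroutine object look like? A quick check (assuming the get_page defined above; the address in the comment is illustrative):

coro = get_page('url_1')
print(coro)  # <coroutine object get_page at 0x...>
# note: merely calling the function runs nothing; Python even warns
# "coroutine 'get_page' was never awaited" if it is never scheduled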

After declaring asynchronous functions, we need to call them. There are three common ways to execute a coroutine:

  1. We can call it with await. Executing await works just like a normal Python function call: the program blocks here, enters the called coroutine function, and comes back once it finishes, which is exactly what await means. await asyncio.sleep(sleep_time) means pausing here for a few seconds; await get_page(url) means executing the get_page() function.

  2. We can also use asyncio.create_task() to create tasks. A later blog post may cover concurrent programming in detail, so we will skip it here.

  3. Finally, execution is triggered by asyncio.run. This function makes it very simple to run a coroutine without having to manage the event loop yourself; its usage follows the example in the source code (see also the boilerplate sketch after the example below).

    Example:
    
        async def main():
            await asyncio.sleep(1)
            print('hello')
    
        asyncio.run(main())
    
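For reference, a sketch of roughly what asyncio.run() wraps for you; this is the standard pre-3.7 event-loop boilerplate, not code from the original post:

import asyncio

async def main():
    await asyncio.sleep(1)
    print('hello')

# obtain an event loop, drive the coroutine to completion, then close the loop
loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()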

We find that the running time is still 10 seconds. What is going on here? await is a synchronous call: the next get_page(url) will not be triggered until the current call ends, which is equivalent to writing synchronous code with an asynchronous interface.

Here, we use asyncio.create_task() to create tasks so the calls actually run concurrently.

import asyncio

async def get_page(url):
    print('acquire page {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('ok {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(get_page(url)) for url in urls]
    for task in tasks:
        await task

asyncio.run(main(['url_1','url_2','url_3','url_4']))
# output
acquire page url_1
acquire page url_2
acquire page url_3
acquire page url_4
ok url_1
ok url_2
ok url_3
ok url_4
Wall time: 3.66 s

Comparing the outputs, the four tasks are clearly created almost at the same time. A task is scheduled for execution soon after it is created, and creating it does not block here; we then wait for all tasks to finish with for task in tasks: await task.

Obviously, compared with multithreading, coroutine code is clearer and can be understood at a glance. For tasks, there is actually another way of writing this. Let's take a look:

import asyncio

async def get_page(url):
    print('acquire page {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('ok {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(get_page(url)) for url in urls]
    await asyncio.gather(*tasks)  # an unpacking operation

asyncio.run(main(['url_1','url_2','url_3','url_4']))
# output
acquire page url_1
acquire page url_2
acquire page url_3
acquire page url_4
ok url_1
ok url_2
ok url_3
ok url_4
Wall time: 3.66 s

2.3 Summary

Compared with the previous code, there is one extra unpacking operation: *tasks turns the list into positional function arguments, while **tasks would turn a dictionary into keyword function arguments.
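A small illustration of the two unpacking forms, in plain Python and independent of asyncio:

def show(a, b, c=0):
    print(a, b, c)

args = [1, 2]
kwargs = {'c': 3}
show(*args, **kwargs)  # equivalent to show(1, 2, c=3), prints: 1 2 3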

Compared with creating coroutines via yield in Python 2, the asyncio.create_task(), asyncio.run(), and await provided since Python 3.7 are easier to read and understand than the old interface. We no longer need to focus on the internal implementation and can pay more attention to the code itself (reading it feels like reading numpy and pytorch, hahaha).
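For contrast, a minimal sketch of that old generator style (plain Python generators driven by hand; illustrative, not code from the original post):

def printer():
    # a generator acting as a coroutine: it suspends at each yield
    while True:
        val = yield  # pauses here until a value is sent in
        print('got: {}'.format(val))

p = printer()
next(p)          # "prime" the generator, advancing it to the first yield
p.send('url_1')  # resumes it; prints: got: url_1
p.send('url_2')  # prints: got: url_2
p.close()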

3. The underlying implementation of coroutines

3.1 Example 2

import asyncio

async def work_1():
    print('work 1 start ')
    await asyncio.sleep(1)
    print('work 1 is done!')

async def work_2():
    print('work 2 start ')
    await asyncio.sleep(2)
    print('work 2 is done')

async def main():
    print('before await ')
    await work_1()
    print('awaited work_1')
    await work_2()
    print('awaited work_2')

asyncio.run(main())
# output
before await 
work 1 start 
work 1 is done!
awaited work_1
work 2 start 
work 2 is done
awaited work_2

3.2 Example 3

import asyncio

async def work_1():
    print('work 1 start ')
    await asyncio.sleep(1)
    print('work 1 is done!')

async def work_2():
    print('work 2 start ')
    await asyncio.sleep(2)
    print('work 2 is done')

async def main():
    task1 = asyncio.create_task(work_1())
    task2 = asyncio.create_task(work_2())
    print('before await ')
    await task1
    print('awaited work 1')
    await task2
    print('awaited work 2')

asyncio.run(main())
# output
before await 
work 1 start 
work 2 start 
work 1 is done!
awaited work 1
work 2 is done
awaited work 2

Why is the order of execution different between Example 2 and Example 3? Let's walk through Example 3:

  1. asyncio.run(main()) means the program enters the main() function and the event loop starts;
  2. task1 and task2 are created and enter the event loop to wait; then print('before await ') runs;
  3. await task1 executes: the main task voluntarily yields control, and the event loop starts to schedule work_1;
  4. work_1 starts executing, runs print('work 1 start '), then hits await asyncio.sleep(1) and yields from the current task; the event loop starts to schedule work_2;
  5. work_2 starts executing, runs print('work 2 start '), then hits await asyncio.sleep(2) and yields from the current task;
  6. Everything above takes perhaps 1ms~10ms or even less; from this point the event loop has nothing ready to run and simply waits;
  7. One second later, work_1's sleep ends; the event loop hands control back to task1, which outputs work 1 is done!, completes, and exits the event loop;
  8. await task1 completes; the event loop hands control back to the main task, which outputs awaited work 1 and then waits at await task2;
  9. Two seconds after the start, work_2's sleep ends; the event loop hands control back to task2, which outputs work 2 is done, completes, and exits the event loop;
  10. The main task outputs awaited work 2; all coroutine tasks are finished and the event loop ends.

3.3 Timing out a task

Suppose we build a crawler in Python: what should we do if something goes wrong while crawling a task, or a task simply runs too long? The easiest remedy is to cancel it after a timeout. How do we do that?

import asyncio

async def work_1():
    await asyncio.sleep(1)
    return 1

async def work_2():
    await asyncio.sleep(2)
    return 2 / 0

async def work_3():
    await asyncio.sleep(3)
    return 3

async def main():
    task_1 = asyncio.create_task(work_1())
    task_2 = asyncio.create_task(work_2())
    task_3 = asyncio.create_task(work_3())

    await asyncio.sleep(2)
    task_3.cancel()

    res = await asyncio.gather(task_1, task_2, task_3, return_exceptions=True)
    print(res)

asyncio.run(main())
# output
[1, ZeroDivisionError('division by zero'), CancelledError()]

In the above example, work_1 works normally, an error occurs in work_2, and work_3 takes too long so we cancel it. All of this information is returned into res and printed out because we set return_exceptions=True. If it is not set to True, the first exception propagates and we must catch it ourselves, otherwise execution cannot continue; a sketch follows.
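A sketch of that alternative, based on asyncio's documented behavior rather than on the original post: without return_exceptions=True, the first exception propagates out of gather, so the caller must catch it:

async def main():
    task_1 = asyncio.create_task(work_1())
    task_2 = asyncio.create_task(work_2())
    task_3 = asyncio.create_task(work_3())

    await asyncio.sleep(2)
    task_3.cancel()

    try:
        res = await asyncio.gather(task_1, task_2, task_3)
        print(res)
    except ZeroDivisionError as e:
        # work_2's error surfaces here and the other results are lost;
        # depending on scheduling, task_3's CancelledError could surface instead
        print('caught:', e)

For a plain timeout, the standard library also offers asyncio.wait_for(aw, timeout), which cancels the awaited task and raises asyncio.TimeoutError once the timeout expires.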


3.4 The producer-consumer model
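Coroutines also fit the classic producer-consumer pattern. In the example below, asyncio.Queue serves as the buffer between the two sides: producers put values in with await queue.put(val), consumers take them out with await queue.get(), and since both operations are awaitable the event loop can switch between tasks while they wait.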

import asyncio
import random

async def consumer(queue,id):
    while True:
        val = await queue.get()
        print('{} get a val: {} '.format(id,val))
        await asyncio.sleep(1)

async def producer(queue,id):
    for i in range(5):
        val = random.randint(1,10)
        await queue.put(val)
        print('{} put a val : {}'.format(id,val))
        await asyncio.sleep(1)

async def main():
    queue = asyncio.Queue()

    consumer_1 = asyncio.create_task(consumer(queue,'consumer_1'))
    consumer_2 = asyncio.create_task(consumer(queue,'consumer_2'))

    producer_1 = asyncio.create_task(producer(queue,'producer_1'))
    producer_2 = asyncio.create_task(producer(queue,'producer_2'))

    await asyncio.sleep(10)
    consumer_1.cancel()
    consumer_2.cancel()

    await asyncio.gather(consumer_1,consumer_2,producer_1,producer_2,return_exceptions=True)

asyncio.run(main())
# output
producer_1 put a val : 1
producer_2 put a val : 1
consumer_1 get a val: 1 
consumer_2 get a val: 1 
producer_1 put a val : 2
producer_2 put a val : 2
consumer_1 get a val: 2 
consumer_2 get a val: 2 
producer_1 put a val : 6
producer_2 put a val : 10
consumer_1 get a val: 6 
consumer_2 get a val: 10 
producer_1 put a val : 8
producer_2 put a val : 2
consumer_1 get a val: 8 
consumer_2 get a val: 2 
producer_1 put a val : 9
producer_2 put a val : 1
consumer_1 get a val: 9 
consumer_2 get a val: 1 
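Note that the consumers loop forever in while True, so they never finish on their own; that is why main() cancels them after 10 seconds, and return_exceptions=True in gather absorbs the resulting CancelledError instead of raising it.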

4. Summary

  • The differences between coroutines and multithreading: ① coroutines run in a single thread; ② with coroutines, the user decides when to hand over control and switch to the next task
  • Since Python 3.7, writing coroutines is simpler. Combining async/await with create_task from the asyncio library, small- and medium-scale concurrent programming poses no pressure.
  • To use coroutines well, knowing when to pause and wait for I/O and when to execute through to the end, you need the concept of an event loop

For follow-up updates to this blog post, please follow my personal blog: Stardust Blog

Origin blog.csdn.net/u011130655/article/details/113018970