1. Introduction
In the blog post <Basic Python Knowledge: Multi-processing and Multi-threading in Python>, we did not cover how to implement Python coroutines with generators.
A coroutine is one way to achieve concurrent programming. Multi-processing and multi-threading also solve concurrency, of course, but when the number of clients connected to a server reaches a certain magnitude, process context switching consumes a lot of resources, and threads cannot withstand such huge pressure either. At that point we need a scheduler to schedule tasks, saving the overhead of starting threads, managing threads, synchronization locks, and so on. Nginx maintains low resource usage, low consumption, and high performance under high concurrency precisely because of its scheduler (for example, a polling algorithm).
In Python, implementing coroutines with generators was common in Python 2. Python 3.7 and later provide new approaches based on asyncio and async/await. Given that it is now the year 2020[::-1] (laughs), we will start from Python's new features and the new style of coroutines.
2. Implementation of the coroutine
2.1 Example 1: Crawler
`%time` is a magic command of the IPython interpreter (as used in Jupyter Notebook), used to measure the running time of a statement.
The 4 tasks take 10 seconds in total. Next, we use coroutines to run them concurrently and improve efficiency.
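For reference, the synchronous baseline measured above is not shown in this excerpt; it would look roughly like this sketch (an assumption reconstructed from the asynchronous version below, where the URL suffix encodes the sleep time):

```python
import time

def get_page(url):
    # simulate fetching a page: the url suffix encodes the "download" time
    print('acquire page {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    time.sleep(sleep_time)
    print('ok {}'.format(url))

def main(urls):
    for url in urls:
        get_page(url)

main(['url_1', 'url_2', 'url_3', 'url_4'])  # runs sequentially: 1+2+3+4 = 10 seconds
```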
```python
import asyncio

async def get_page(url):
    print('acquire page {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('ok {}'.format(url))

async def main(urls):
    for url in urls:
        await get_page(url)

asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
```

```
# Output
acquire page url_1
ok url_1
acquire page url_2
ok url_2
acquire page url_3
ok url_3
acquire page url_4
ok url_4
Wall time: 10 s
```
Since Python 3.7, writing asynchronous programs with coroutines is very simple. Most of what we need is included in the `asyncio` library. We only need to declare asynchronous functions with the `async` modifier and call them with `await`.
2.2 Let's sort out our ideas:
First, in the example, we import the package with `import asyncio`, then declare `get_page()` and `main()` as asynchronous functions with `async`. When we call an asynchronous function, we get a coroutine object.
After declaring asynchronous functions, we need to execute them. There are 3 common ways to run a coroutine:
- We can call it with `await`. The effect is the same as a normal Python call: the program blocks here, enters the called coroutine function, and returns after it finishes; that is the meaning of `await`. `await asyncio.sleep(sleep_time)` means pause here for a few seconds, and `await get_page(url)` means execute the `get_page()` function.
- We can also use `asyncio.create_task()` to create tasks. A later blog post may cover concurrent programming in detail, so we will skip that here.
- Finally, `asyncio.run` triggers execution. This function makes it very simple to run a coroutine without paying attention to the event loop inside. Usage follows the example in the documentation:

```python
async def main():
    await asyncio.sleep(1)
    print('hello')

asyncio.run(main())
```
We find that the running time is still 10 seconds. What is going on here? `await` is a synchronous call: the next `get_page(url)` will not start until the current one finishes, so this is equivalent to writing synchronous code with an asynchronous interface. Next, we use `asyncio.create_task()` to create tasks and make the calls truly asynchronous.
```python
import asyncio

async def get_page(url):
    print('acquire page {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('ok {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(get_page(url)) for url in urls]
    for task in tasks:
        await task

asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
```

```
# Output
acquire page url_1
acquire page url_2
acquire page url_3
acquire page url_4
ok url_1
ok url_2
ok url_3
ok url_4
Wall time: 3.66 s
```
Comparing the outputs, the four tasks are created almost at the same time. Tasks are scheduled for execution soon after creation, and creating them does not block; `for task in tasks: await task` then simply waits until all tasks have finished.
Clearly, compared to multithreading, coroutine code is clearer and easier to follow at a glance. For tasks, there is actually another way to write this. Let's take a look:
```python
import asyncio

async def get_page(url):
    print('acquire page {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('ok {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(get_page(url)) for url in urls]
    await asyncio.gather(*tasks)  # an unpacking operation

asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
```

```
# Output
acquire page url_1
acquire page url_2
acquire page url_3
acquire page url_4
ok url_1
ok url_2
ok url_3
ok url_4
Wall time: 3.66 s
```
2.3 Summary
Compared with the previous code, there is one extra unpacking operation: `*tasks` turns the list into positional function arguments (similarly, `**` would turn a dictionary into keyword arguments).
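The unpacking behavior itself is plain Python and easy to check with an ordinary function (a toy example, not from the post):

```python
def add(a, b, c):
    return a + b + c

args = [1, 2, 3]
kwargs = {'a': 1, 'b': 2, 'c': 3}

print(add(*args))     # * unpacks the list into positional arguments -> 6
print(add(**kwargs))  # ** unpacks the dict into keyword arguments -> 6
```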
Compared with creating coroutines with `yield` in Python 2, the `asyncio.create_task()`, `asyncio.run()`, and `await` interfaces provided since Python 3.7 are easier to read and understand: we no longer need to care about the internal implementation and can focus on the code itself (the more you read it, the more it feels like `numpy` and `pytorch`, haha).
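For comparison, here is a minimal sketch of the old generator style (my own illustration, still runnable in Python 3): a generator acts as a crude coroutine by receiving values through `send()`, and the caller has to prime and drive it by hand:

```python
def collector():
    # a generator-based "coroutine": it receives values via send()
    received = []
    while True:
        val = yield
        if val is None:
            return received  # raises StopIteration carrying the result
        received.append(val)

gen = collector()
next(gen)        # prime the generator: advance to the first yield
gen.send('a')
gen.send('b')
try:
    gen.send(None)
except StopIteration as e:
    print(e.value)  # the collected values: ['a', 'b']
```

All of this manual priming and driving is exactly what `asyncio.run()` and `await` now hide from us.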
3. The underlying implementation of the coroutine
3.1 Example 2
```python
import asyncio

async def work_1():
    print('work 1 start ')
    await asyncio.sleep(1)
    print('work 1 is done!')

async def work_2():
    print('work 2 start ')
    await asyncio.sleep(2)
    print('work 2 is done')

async def main():
    print('before await ')
    await work_1()
    print('awaited work_1')
    await work_2()
    print('awaited work_2')

asyncio.run(main())
```

```
# Output
before await
work 1 start
work 1 is done!
awaited work_1
work 2 start
work 2 is done
awaited work_2
```
3.2 Example 3
```python
import asyncio

async def work_1():
    print('work 1 start ')
    await asyncio.sleep(1)
    print('work 1 is done!')

async def work_2():
    print('work 2 start ')
    await asyncio.sleep(2)
    print('work 2 is done')

async def main():
    task1 = asyncio.create_task(work_1())
    task2 = asyncio.create_task(work_2())
    print('before await ')
    await task1
    print('awaited work 1')
    await task2
    print('awaited work 2')

asyncio.run(main())
```

```
# Output
before await
work 1 start
work 2 start
work 1 is done!
awaited work 1
work 2 is done
awaited work 2
```
Why is the order of execution different between Example 2 and Example 3? Let's walk through Example 3 step by step:
- `asyncio.run(main())`: the program enters the `main()` function and the event loop starts;
- task1 and task2 are created and enter the event loop, waiting to run; then `print('before await ')` executes;
- `await task1` runs: the main task yields control, and the event scheduler starts scheduling work_1;
- work_1 starts executing: it runs `print('work 1 start ')`, then `await asyncio.sleep(1)` yields control away from the current task, and the scheduler starts scheduling work_2;
- work_2 starts executing: it runs `print('work 2 start ')`, then `await asyncio.sleep(2)` yields control away from the current task;
- all of the above takes perhaps 1 ms to 10 ms, or even less; from this point the event scheduler has nothing to run and waits;
- one second later, work_1's sleep ends, and the scheduler hands control back to task1, which outputs `work 1 is done!` and exits the event loop. When `await task1` completes, the scheduler passes control back to the main task, which outputs `awaited work 1` and then waits at `await task2`;
- two seconds later, work_2's sleep ends, and the scheduler hands control back to task2, which outputs `work 2 is done` and exits the event loop;
- the main task outputs `awaited work 2`; all coroutine tasks are complete, and the event loop ends.
3.3 Timeout task
Suppose we build a crawler in Python: what should we do when a task goes wrong, or runs for too long? The simplest answer is to cancel it on error or on timeout. How?
```python
import asyncio

async def work_1():
    await asyncio.sleep(1)
    return 1

async def work_2():
    await asyncio.sleep(2)
    return 2 / 0

async def work_3():
    await asyncio.sleep(3)
    return 3

async def main():
    task_1 = asyncio.create_task(work_1())
    task_2 = asyncio.create_task(work_2())
    task_3 = asyncio.create_task(work_3())
    await asyncio.sleep(2)
    task_3.cancel()
    res = await asyncio.gather(task_1, task_2, task_3, return_exceptions=True)
    print(res)

asyncio.run(main())
```

```
# Output
[1, ZeroDivisionError('division by zero'), CancelledError()]
```
In the example above, work_1 runs normally, work_2 raises an error, and work_3 takes too long so we cancel it. All outcomes are gathered into `res` and printed because we set `return_exceptions=True`. If it were not set to `True`, the first exception would propagate out of `gather`, and unless we caught it, execution could not continue.
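Besides manually calling `cancel()` after a sleep, the standard library also offers `asyncio.wait_for`, which cancels an awaitable for you when a timeout expires. A separate sketch (not part of the original example):

```python
import asyncio

async def slow_job():
    await asyncio.sleep(3)
    return 'done'

async def main():
    try:
        # wait_for cancels slow_job() if it does not finish within 1 second
        return await asyncio.wait_for(slow_job(), timeout=1)
    except asyncio.TimeoutError:
        return 'timed out'

print(asyncio.run(main()))  # timed out
```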
3.4 Producer-consumer model
```python
import asyncio
import random

async def consumer(queue, id):
    while True:
        val = await queue.get()
        print('{} get a val: {} '.format(id, val))
        await asyncio.sleep(1)

async def producer(queue, id):
    for i in range(5):
        val = random.randint(1, 10)
        await queue.put(val)
        print('{} put a val : {}'.format(id, val))
        await asyncio.sleep(1)

async def main():
    queue = asyncio.Queue()
    consumer_1 = asyncio.create_task(consumer(queue, 'consumer_1'))
    consumer_2 = asyncio.create_task(consumer(queue, 'consumer_2'))
    producer_1 = asyncio.create_task(producer(queue, 'producer_1'))
    producer_2 = asyncio.create_task(producer(queue, 'producer_2'))
    await asyncio.sleep(10)
    consumer_1.cancel()
    consumer_2.cancel()
    await asyncio.gather(consumer_1, consumer_2, producer_1, producer_2, return_exceptions=True)

asyncio.run(main())
```
```
# Output
producer_1 put a val : 1
producer_2 put a val : 1
consumer_1 get a val: 1
consumer_2 get a val: 1
producer_1 put a val : 2
producer_2 put a val : 2
consumer_1 get a val: 2
consumer_2 get a val: 2
producer_1 put a val : 6
producer_2 put a val : 10
consumer_1 get a val: 6
consumer_2 get a val: 10
producer_1 put a val : 8
producer_2 put a val : 2
consumer_1 get a val: 8
consumer_2 get a val: 2
producer_1 put a val : 9
producer_2 put a val : 1
consumer_1 get a val: 9
consumer_2 get a val: 1
```
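Cancelling the consumers after a fixed 10-second sleep is a bit arbitrary. One common variation (a sketch of my own with a single producer and consumer, not the author's code) uses `Queue.task_done()` and `Queue.join()` so the consumer is cancelled exactly when the queue is drained:

```python
import asyncio

results = []

async def consumer(queue, id):
    while True:
        val = await queue.get()
        print('{} get a val: {}'.format(id, val))
        results.append(val)
        queue.task_done()  # mark this item as processed

async def producer(queue, id):
    for i in range(3):
        await queue.put(i)
        print('{} put a val : {}'.format(id, i))

async def main():
    queue = asyncio.Queue()
    c = asyncio.create_task(consumer(queue, 'consumer_1'))
    await producer(queue, 'producer_1')
    await queue.join()  # blocks until every item has been task_done()'d
    c.cancel()          # the endless consumer can now be stopped safely
    await asyncio.gather(c, return_exceptions=True)

asyncio.run(main())
print(results)  # [0, 1, 2]
```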
4. Summary
- The difference between coroutines and multithreading: ① a coroutine runs in a single thread; ② with coroutines, the user decides when to hand over control and switch to the next task.
- Since Python 3.7, writing coroutines is simpler: the `asyncio` library combined with `async`/`await` and `create_task` handles small and medium-scale concurrent programming without pressure.
- To use coroutines well — knowing when to pause waiting for I/O and when to execute to the end — you need the concept of an event loop.
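To make the "event loop" concept concrete: before `asyncio.run()` existed, you managed the loop yourself. A minimal sketch of that older boilerplate:

```python
import asyncio

async def hello():
    await asyncio.sleep(0.1)
    return 'hello'

# the boilerplate that asyncio.run() now hides:
loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(hello())
    print(result)  # hello
finally:
    loop.close()
```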
For follow-up updates to this blog post, please follow my personal blog: Stardust Blog