The previous chapter ended with generators, which also played an important role in Python 2 as the way to implement coroutines. So what exactly is a coroutine?
Coroutine
A coroutine is one way to achieve concurrency. When concurrency comes up, many people think of the multi-threaded / multi-process model, one of the classic solutions to concurrency problems. In the early days of the Internet, multi-threading / multi-processing played a pivotal role in concurrent servers.
But as the Internet grew, many services ran into the C10K bottleneck: once the number of client connections to a server reaches 10,000, a lot of code crashes, because process context switching consumes enormous resources and threads cannot withstand that much pressure. At this point NGINX, built on an event loop, made its debut.
An event loop starts a unified scheduler and lets the scheduler decide which task runs at any moment, which saves the overhead that multiple threads incur in thread startup, thread management, and synchronization locks. NGINX of that era could keep resource consumption low and performance high under heavy concurrency, and supported more concurrent connections than Apache.
Later came the famous term callback hell; anyone who has written much JavaScript knows what it means. People were happy to discover that coroutines inherit the advantages of the event loop perfectly, while the async / await syntactic sugar lets execution efficiency and readability coexist. So coroutines were gradually discovered and embraced by more people, and more and more developers began trying Node.js for backend development.
Back to Python: using generators to implement coroutines is the old approach from the Python 2 era. Python 3.7 provides a method based on asyncio and async / await. In this lesson we abandon the generator approach and implement coroutines with this newer usage.
Starting with a crawler
We will not dwell on what a crawler does; let's look at the code directly.
```python
import time

def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    time.sleep(sleep_time)
    print('OK {}'.format(url))

def main(urls):
    for url in urls:
        crawl_page(url)

main(['url_1', 'url_2', 'url_3', 'url_4'])
```
The code above crawls four pages in sequence, and each page takes 1-4 seconds to crawl (we use time.sleep to simulate the time spent fetching data). The total running time of the whole program is
1 + 2 + 3 + 4 = 10 seconds
The program spends essentially all of its time waiting. Can we optimize that? A very simple idea emerges: make these operations concurrent. Let's see how to write that.
```python
import time
import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    for url in urls:
        await crawl_page(url)

time_start = time.perf_counter()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
print('total cost {}s'.format(time.perf_counter() - time_start))
```
This code shows a simple way to write an asynchronous program with coroutines.
import asyncio
This library contains most of the tools we need to work with coroutines.
The async qualifier declares an asynchronous function, so both functions here have become asynchronous functions. Calling an asynchronous function gives us a coroutine object (coroutine object), not an executed result.
We then run it with await. The result of awaiting is the same as normal Python execution: the program blocks here, enters the called function, runs it to completion, and then returns to continue. That is what await literally means.
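A quick sketch to make this concrete (greet is a made-up function name, not part of the chapter's crawler): calling an async function only builds a coroutine object; nothing runs until it is awaited or handed to asyncio.run.

```python
import asyncio

async def greet():
    # An async function; calling it does NOT run the body.
    return 'hello'

coro = greet()                  # merely creates a coroutine object
print(type(coro).__name__)     # coroutine

result = asyncio.run(greet())  # actually drives a coroutine to completion
print(result)                  # hello

coro.close()                   # silence the "never awaited" warning on the unused object
```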
Next let's look at the ways to execute a coroutine. Three are commonly used:
1. Invoke it through await, as in the code above: await asyncio.sleep(sleep_time) simulates the crawler waiting for data, and await crawl_page(url) executes the crawl_page() function.
2. Create a task with the asyncio.create_task() method. We will cover the specifics shortly; just note it for now.
3. Use asyncio.run to trigger execution. This function is new in Python 3.7; it makes the coroutine programming interface very simple, letting us ignore the questions of how the event loop is defined and how to use it (we will touch on that below). A good programming convention is to use asyncio.run(main()) as the main entry point of the program and to call asyncio.run only once.
With that, let's run the code above and see what conclusion we get!
```
########## output ##########
crawling url_1
OK url_1
crawling url_2
OK url_2
crawling url_3
OK url_3
crawling url_4
OK url_4
total cost 10.0089496s
```
Why is it still 10 s? Right: as we said above, await is a synchronous call. Therefore crawl_page() will not trigger the next call until the current call finishes. The effect of this code is exactly the same as before; it amounts to writing synchronous code with an asynchronous interface.
So what should we do?
It is actually very simple, and it is the next thing we want to talk about: tasks (Task). As usual, let's explain with the code below.
```python
import time
import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    for task in tasks:
        await task

time_start = time.perf_counter()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
print('total cost {}s'.format(time.perf_counter() - time_start))
```
As you can see, once we have a coroutine object we can use asyncio.create_task() to create a task. A task is scheduled to run as soon as it is created, so the code is no longer blocked at the point of creation. We therefore create all the tasks first, and only then loop over them with await, which waits until every task has finished.
As a result, the running time is no longer the same:
```
########## output ##########
crawling url_1
crawling url_2
crawling url_3
crawling url_4
OK url_1
OK url_2
OK url_3
OK url_4
total cost 4.0060087s
```
The total running time now equals the running time of the slowest crawler.
In fact, there is another way to execute tasks:
```python
import time
import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    await asyncio.gather(*tasks)

time_start = time.perf_counter()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
print('total cost {}s'.format(time.perf_counter() - time_start))
```
Here *tasks unpacks the list, turning its items into the function's positional arguments; correspondingly, **dict unpacks a dictionary into the function's keyword arguments.
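As a small illustration of that unpacking rule (request, args, and kwargs are made-up names for this sketch, not part of asyncio):

```python
def request(url, timeout, retries=1):
    return '{} timeout={} retries={}'.format(url, timeout, retries)

args = ['url_1', 5]       # a list unpacked with * becomes positional arguments
kwargs = {'retries': 3}   # a dict unpacked with ** becomes keyword arguments

line = request(*args, **kwargs)
print(line)  # url_1 timeout=5 retries=3
```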
That covers the general usage of coroutines. Now, what if we needed to crawl tens of thousands of pages? Compared with the alternatives, isn't the coroutine style much clearer?
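For a large batch of pages, one possible sketch (the URLs are simulated and asyncio.sleep stands in for real network I/O) hands the whole batch to asyncio.gather at once:

```python
import asyncio

async def crawl_page(url):
    # Simulated network I/O; a real crawler would call an HTTP client here.
    await asyncio.sleep(0.01)
    return 'OK {}'.format(url)

async def main(urls):
    # gather schedules every coroutine concurrently and
    # returns their results in the same order as the input.
    return await asyncio.gather(*(crawl_page(u) for u in urls))

urls = ['url_{}'.format(i) for i in range(100)]
results = asyncio.run(main(urls))
print(len(results))  # 100
```

For genuinely huge batches against real servers you would also cap concurrency, for example with an asyncio.Semaphore, rather than opening every connection at once.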
Decrypting how coroutines run
Now let's dig deeper and see how the underlying code works while a coroutine is running. Again, two pieces of code:
```python
import asyncio
import time

async def work_1():
    print('work_1 start')
    await asyncio.sleep(1)
    print('work_1 done')

async def work_2():
    print('work_2 start')
    await asyncio.sleep(2)
    print('work_2 done')

async def main():
    print('before await')
    await work_1()
    print('awaited work_1')
    await work_2()
    print('awaited work_2')

start_time = time.perf_counter()
asyncio.run(main())
print('total cost:{}s'.format(time.perf_counter() - start_time))
```

```
########## output ##########
before await
work_1 start
work_1 done
awaited work_1
work_2 start
work_2 done
awaited work_2
total cost:3.0037941s
```
Code segment 2
```python
import asyncio
import time

async def work_1():
    print('work_1 start')
    await asyncio.sleep(1)
    print('work_1 done')

async def work_2():
    print('work_2 start')
    await asyncio.sleep(2)
    print('work_2 done')

async def main():
    task1 = asyncio.create_task(work_1())
    task2 = asyncio.create_task(work_2())
    print('before await')
    await task1
    print('awaited work_1')
    await task2
    print('awaited work_2')

start_time = time.perf_counter()
asyncio.run(main())
print('total cost:{}s'.format(time.perf_counter() - start_time))
```

```
########## output ##########
before await
work_1 start
work_2 start
work_1 done
awaited work_1
work_2 done
awaited work_2
total cost:2.0024394s
```
Let's walk through the whole process in detail, for a deeper understanding of the specific differences between coroutines and threads:
1. asyncio.run(main()) enters the main() function and opens the event loop;
2. task1 and task2 are created and enter the event loop, waiting to run; execution reaches the first print, which outputs the string 'before await';
3. await task1 is executed; the user chooses to yield control from the current main task, and the event scheduler begins scheduling work_1;
4. work_1 starts running; it first prints 'work_1 start', then yields at its await, and the event scheduler begins scheduling work_2;
5. work_2 starts running; it prints 'work_2 start', then yields at its await;
6. Everything up to this point takes on the order of a millisecond or even less; from this moment the event scheduler pauses scheduling;
7. After 1 s, work_1's sleep completes; the event scheduler passes control back to task1, which prints 'work_1 done'; the task finishes and exits the event loop;
8. await task1 completes; the event scheduler passes control to the main task, which prints 'awaited work_1', then continues waiting at await task2;
9. After another second (2 s in total), work_2's sleep completes; the event scheduler passes control to task2, which prints 'work_2 done'; the task exits the event loop;
10. The main task prints 'awaited work_2'; the coroutine task ends, and the event loop closes.
We have covered the basic usage of coroutines, but some scenarios need extra conditions: for example, limiting how long a coroutine task may run and cancelling it if it exceeds the limit, or handling an error that occurs while a coroutine is running. Let's look at the code.
```python
import asyncio
import time

async def work_1():
    await asyncio.sleep(1)
    return 1

async def work_2():
    await asyncio.sleep(2)
    return 2 / 0  # divide by zero to manufacture an error here

async def work_3():
    await asyncio.sleep(10)
    return 3

async def main():
    task_1 = asyncio.create_task(work_1())
    task_2 = asyncio.create_task(work_2())
    task_3 = asyncio.create_task(work_3())
    await asyncio.sleep(3)
    task_3.cancel()
    res = await asyncio.gather(task_1, task_2, task_3, return_exceptions=True)
    print(res)

start_time = time.perf_counter()
asyncio.run(main())
print('total cost:{}s'.format(time.perf_counter() - start_time))
```

```
########## output ##########
[1, ZeroDivisionError('division by zero'), CancelledError()]
total cost:3.00382s
```
As you can see, work_1 runs normally, work_2 hits an error while running, and work_3, whose execution takes too long, is cancelled. All of this information is reflected in the final result res.
One doubt remains: here we deliberately planted the error in the return statement, but what if an error occurs elsewhere in the program? Let's change work_2 a bit and see the result.
```python
async def work_2():
    l = [1, 2, 3]
    l[4]  # IndexError planted mid-function rather than in the return
    await asyncio.sleep(2)
    return 2
```

```
########## output ##########
[1, IndexError('list index out of range'), CancelledError()]
total cost:3.0029349s
```
So it is not only errors in the return statement that get returned: as long as the program raises an error anywhere, the error is returned to the main task. But we must pass return_exceptions=True; otherwise the error is thrown up to the execution layer, where it has to be caught with try/except, which in turn means all the tasks that have not yet finished will be cancelled.
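A minimal sketch of that alternative path (ok and boom are made-up names): without return_exceptions=True, the first exception propagates out of gather and must be caught with try/except:

```python
import asyncio

async def ok():
    await asyncio.sleep(0)
    return 1

async def boom():
    await asyncio.sleep(0)
    return 1 / 0  # raises ZeroDivisionError

async def main():
    try:
        # Without return_exceptions=True, the first error propagates here.
        await asyncio.gather(ok(), boom())
    except ZeroDivisionError as e:
        return 'caught: {}'.format(e)

msg = asyncio.run(main())
print(msg)  # caught: division by zero
```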
At this point we can see that whatever threads can achieve, coroutines can achieve too. Let's apply the knowledge above and build a producer-consumer model with coroutines.
```python
import asyncio
import random

async def consumer(queue, id):
    while True:
        val = await queue.get()
        print('{} get a val:{}.'.format(id, val))
        await asyncio.sleep(1)

async def producer(queue, id):
    for i in range(5):
        val = random.randint(4, 20)
        await queue.put(val)
        print('{} put a val:{}.'.format(id, val))
        await asyncio.sleep(1)

async def main():
    queue = asyncio.Queue()
    consumer_1 = asyncio.create_task(consumer(queue, 'consumer_1'))
    # consumer_2 = asyncio.create_task(consumer(queue, 'consumer_2'))
    producer_1 = asyncio.create_task(producer(queue, 'producer_1'))
    # producer_2 = asyncio.create_task(producer(queue, 'producer_2'))
    await asyncio.sleep(10)
    consumer_1.cancel()
    producer_1.cancel()
    await asyncio.gather(consumer_1, producer_1, return_exceptions=True)

asyncio.run(main())
```
We define one producer and one consumer; the main task launches both and cancels them after 10 s (in fact the producer's for loop only runs five times, so it produces just five elements).
The sleep in the main task fixes how long the main task runs; regardless of whether there are still tasks pending, the cancel calls at the end shut them down.
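A variant worth knowing (my own sketch, not the chapter's code): instead of a fixed 10 s sleep, queue.join() together with task_done() lets the main task exit as soon as every produced item has been consumed:

```python
import asyncio

async def producer(queue, n):
    for i in range(n):
        await queue.put(i)

async def consumer(queue, seen):
    while True:
        val = await queue.get()
        seen.append(val)
        queue.task_done()  # tells queue.join() this item has been handled

async def main():
    queue = asyncio.Queue()
    seen = []
    worker = asyncio.create_task(consumer(queue, seen))
    await producer(queue, 5)
    await queue.join()   # returns once every put item got a task_done()
    worker.cancel()      # the consumer loops forever, so cancel it explicitly
    return seen

seen = asyncio.run(main())
print(seen)  # [0, 1, 2, 3, 4]
```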
Hands-on practice
Finally, let's do today's hands-on exercise with a complete crawler.
Take this page: https://movie.douban.com/cinema/later/xian/ . It lists films soon to be released in Xi'an. How would we use Python to get the names, release dates, and posters of these films?
This is something we will keep improving later.
Summary
That wraps up today's content. At some length, we went from a simple crawler to a real one, and along the way covered the relatively new methods and concepts of Python coroutines. To review:
Coroutines differ from multithreading in two main points: 1. coroutines are single-threaded; 2. with coroutines, the user decides where to hand control over to the next task.
Coroutine code is simpler and clearer; the combination of the async / await syntax with create_task handles small and medium-scale concurrency needs without pressure.
When writing coroutine programs, keep a clear mental picture of the event loop: know when the program needs to pause and wait for I/O, and when it needs to run many things together.
Finally, remember: on a project, use whatever model achieves the best result, rather than deciding a technology is impressive and then inventing conditions to use it. In one sentence:
Technology serves engineering, and engineering is, much of the time, a complicated compromise among time, resources, and manpower.
Finally, something to think about:
How would you implement a callback with coroutines?
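One possible direction, as a hint (work and on_done are made-up names for this sketch): asyncio tasks support add_done_callback, which attaches a plain function that fires when the task completes:

```python
import asyncio

async def work():
    await asyncio.sleep(0)
    return 42

def on_done(task):
    # The callback receives the finished Task and can read its result.
    print('callback got:', task.result())

async def main():
    task = asyncio.create_task(work())
    task.add_done_callback(on_done)
    return await task

value = asyncio.run(main())
print(value)  # 42
```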