The previous chapter ended with generators, which also played an important role in Python 2 as the way to implement coroutines. So what exactly is a coroutine?
Coroutine
A coroutine is one way to achieve concurrency. When concurrency comes up, many people think of the multi-threaded / multi-process model, one of the classic solutions to concurrency problems. In the early days of the Internet, multi-threading / multi-processing played a pivotal role in concurrent servers.
But as the Internet grew, many services ran into the C10K bottleneck: once the number of client connections to a server reaches 10,000, a lot of code crashes, because process context switching consumes enormous resources and threads cannot withstand that much pressure. At this point NGINX, built on an event loop, made its debut.
An event loop starts a unified scheduler and lets the scheduler decide which task runs at any moment, which saves the overhead that multiple threads incur in thread startup, thread management, and synchronization locks. NGINX of that era could keep resource consumption low and performance high under heavy concurrency, and supported more concurrent connections than Apache.
Later came the famous term callback hell; anyone who has written much JavaScript knows what it means. People were happy to discover that coroutines inherit the advantages of the event loop perfectly, while the async / await syntactic sugar lets execution efficiency and readability coexist. So coroutines were gradually discovered and embraced by more people, and more and more developers began trying Node.js for backend development.
Back to Python: using generators to implement coroutines is the old approach from the Python 2 era. Python 3.7 provides a method based on asyncio and async / await. In this lesson we abandon the generator approach and implement coroutines with this newer usage.
Starting with a crawler
We will not dwell on what a crawler does; let's look at the code directly.
```python
import time

def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    time.sleep(sleep_time)
    print('OK {}'.format(url))

def main(urls):
    for url in urls:
        crawl_page(url)

main(['url_1', 'url_2', 'url_3', 'url_4'])
```
The code above crawls four pages in sequence, and each page takes 1-4 seconds to crawl (we use time.sleep to simulate the time spent fetching data). The total running time of the whole program is
1 + 2 + 3 + 4 = 10 seconds
The program spends essentially all of its time waiting. Can we optimize that? A very simple idea emerges: make these operations concurrent. Let's see how to write that.
```python
import time
import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    for url in urls:
        await crawl_page(url)

time_start = time.perf_counter()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
print('total cost {}s'.format(time.perf_counter() - time_start))
```
This code shows a simple way to write an asynchronous program with coroutines.
import asyncio
This library contains most of the tools we need to work with coroutines.
The async qualifier declares an asynchronous function, so both functions here have become asynchronous functions. Calling an asynchronous function gives us a coroutine object (coroutine object), not an executed result.
We then run it with await. The result of awaiting is the same as normal Python execution: the program blocks here, enters the called function, runs it to completion, and then returns to continue. That is what await literally means.
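A quick sketch to make this concrete (greet is a made-up function name, not part of the chapter's crawler): calling an async function only builds a coroutine object; nothing runs until it is awaited or handed to asyncio.run.

```python
import asyncio

async def greet():
    # An async function; calling it does NOT run the body.
    return 'hello'

coro = greet()                  # merely creates a coroutine object
print(type(coro).__name__)     # coroutine

result = asyncio.run(greet())  # actually drives a coroutine to completion
print(result)                  # hello

coro.close()                   # silence the "never awaited" warning on the unused object
```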
Next let's look at the ways to execute a coroutine. Three are commonly used:
1. Invoke it through await, as in the code above: await asyncio.sleep(sleep_time) simulates the crawler waiting for data, and await crawl_page(url) executes the crawl_page() function.
2. Create a task with the asyncio.create_task() method. We will cover the specifics shortly; just note it for now.
3. Use asyncio.run to trigger execution. This function is new in Python 3.7; it makes the coroutine programming interface very simple, letting us ignore the questions of how the event loop is defined and how to use it (we will touch on that below). A good programming convention is to use asyncio.run(main()) as the main entry point of the program and to call asyncio.run only once.
With that, let's run the code above and see what conclusion we get!
```
########## output ##########
crawling url_1
OK url_1
crawling url_2
OK url_2
crawling url_3
OK url_3
crawling url_4
OK url_4
total cost 10.0089496s
```
Why is it still 10 s? Right: as we said above, await is a synchronous call. Therefore crawl_page() will not trigger the next call until the current call finishes. The effect of this code is exactly the same as before; it amounts to writing synchronous code with an asynchronous interface.
So what should we do?
It is actually very simple, and it is the next thing we want to talk about: tasks (Task). As usual, let's explain with the code below.
```python
import time
import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    for task in tasks:
        await task

time_start = time.perf_counter()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
print('total cost {}s'.format(time.perf_counter() - time_start))
```
As you can see, once we have a coroutine object we can use asyncio.create_task() to create a task. A task is scheduled to run as soon as it is created, so the code is no longer blocked at the point of creation. We therefore create all the tasks first, and only then loop over them with await, which waits until every task has finished.
As a result, the running time is no longer the same:
```
########## output ##########
crawling url_1
crawling url_2
crawling url_3
crawling url_4
OK url_1
OK url_2
OK url_3
OK url_4
total cost 4.0060087s
```
The total running time now equals the running time of the slowest crawler.
In fact, there is another way to execute tasks:
```python
import time
import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    await asyncio.gather(*tasks)

time_start = time.perf_counter()
asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
print('total cost {}s'.format(time.perf_counter() - time_start))
```
Here *tasks unpacks the list, turning its items into the function's positional arguments; correspondingly, **dict unpacks a dictionary into the function's keyword arguments.
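As a small illustration of that unpacking rule (request, args, and kwargs are made-up names for this sketch, not part of asyncio):

```python
def request(url, timeout, retries=1):
    return '{} timeout={} retries={}'.format(url, timeout, retries)

args = ['url_1', 5]       # a list unpacked with * becomes positional arguments
kwargs = {'retries': 3}   # a dict unpacked with ** becomes keyword arguments

line = request(*args, **kwargs)
print(line)  # url_1 timeout=5 retries=3
```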
That covers the general usage of coroutines. Now, what if we needed to crawl tens of thousands of pages? Compared with the alternatives, isn't the coroutine style much clearer?
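For a large batch of pages, one possible sketch (the URLs are simulated and asyncio.sleep stands in for real network I/O) hands the whole batch to asyncio.gather at once:

```python
import asyncio

async def crawl_page(url):
    # Simulated network I/O; a real crawler would call an HTTP client here.
    await asyncio.sleep(0.01)
    return 'OK {}'.format(url)

async def main(urls):
    # gather schedules every coroutine concurrently and
    # returns their results in the same order as the input.
    return await asyncio.gather(*(crawl_page(u) for u in urls))

urls = ['url_{}'.format(i) for i in range(100)]
results = asyncio.run(main(urls))
print(len(results))  # 100
```

For genuinely huge batches against real servers you would also cap concurrency, for example with an asyncio.Semaphore, rather than opening every connection at once.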
Decrypting how coroutines run
Now let's dig deeper and see how the underlying code works while a coroutine is running. Again, two pieces of code:
```python
import asyncio
import time

async def work_1():
    print('work_1 start')
    await asyncio.sleep(1)
    print('work_1 done')

async def work_2():
    print('work_2 start')
    await asyncio.sleep(2)
    print('work_2 done')

async def main():
    print('before await')
    await work_1()
    print('awaited work_1')
    await work_2()
    print('awaited work_2')

start_time = time.perf_counter()
asyncio.run(main())
print('total cost:{}s'.format(time.perf_counter() - start_time))
```

```
########## output ##########
before await
work_1 start
work_1 done
awaited work_1
work_2 start
work_2 done
awaited work_2
total cost:3.0037941s
```
Code segment 2
```python
import asyncio
import time

async def work_1():
    print('work_1 start')
    await asyncio.sleep(1)
    print('work_1 done')

async def work_2():
    print('work_2 start')
    await asyncio.sleep(2)
    print('work_2 done')

async def main():
    task1 = asyncio.create_task(work_1())
    task2 = asyncio.create_task(work_2())
    print('before await')
    await task1
    print('awaited work_1')
    await task2
    print('awaited work_2')

start_time = time.perf_counter()
asyncio.run(main())
print('total cost:{}s'.format(time.perf_counter() - start_time))
```

```
########## output ##########
before await
work_1 start
work_2 start
work_1 done
awaited work_1
work_2 done
awaited work_2
total cost:2.0024394s
```
Let's walk through the whole process in detail, for a deeper understanding of the specific differences between coroutines and threads:
1. asyncio.run(main()) enters the main() function and opens the event loop;
2. task1 and task2 are created and enter the event loop, waiting to run; execution reaches the first print, which outputs the string 'before await';
3. await task1 is executed; the user chooses to yield control from the current main task, and the event scheduler begins scheduling work_1;
4. work_1 starts running; it first prints 'work_1 start', then yields at its await, and the event scheduler begins scheduling work_2;
5. work_2 starts running; it prints 'work_2 start', then yields at its await;
6. Everything up to this point takes on the order of a millisecond or even less; from this moment the event scheduler pauses scheduling;
7. After 1 s, work_1's sleep completes; the event scheduler passes control back to task1, which prints 'work_1 done'; the task finishes and exits the event loop;
8. await task1 completes; the event scheduler passes control to the main task, which prints 'awaited work_1', then continues waiting at await task2;
9. After another second (2 s in total), work_2's sleep completes; the event scheduler passes control to task2, which prints 'work_2 done'; the task exits the event loop;
10. The main task prints 'awaited work_2'; the coroutine task ends, and the event loop closes.
We have covered the basic usage of coroutines, but some scenarios need extra conditions: for example, limiting how long a coroutine task may run and cancelling it if it exceeds the limit, or handling an error that occurs while a coroutine is running. Let's look at the code.
```python
import asyncio
import time

async def work_1():
    await asyncio.sleep(1)
    return 1

async def work_2():
    await asyncio.sleep(2)
    return 2 / 0  # divide by zero to manufacture an error here

async def work_3():
    await asyncio.sleep(10)
    return 3

async def main():
    task_1 = asyncio.create_task(work_1())
    task_2 = asyncio.create_task(work_2())
    task_3 = asyncio.create_task(work_3())
    await asyncio.sleep(3)
    task_3.cancel()
    res = await asyncio.gather(task_1, task_2, task_3, return_exceptions=True)
    print(res)

start_time = time.perf_counter()
asyncio.run(main())
print('total cost:{}s'.format(time.perf_counter() - start_time))
```

```
########## output ##########
[1, ZeroDivisionError('division by zero'), CancelledError()]
total cost:3.00382s
```
As you can see, work_1 runs normally, work_2 hits an error while running, and work_3, whose execution takes too long, is cancelled. All of this information is reflected in the final result res.
One doubt remains: here we deliberately planted the error in the return statement, but what if an error occurs elsewhere in the program? Let's change work_2 a bit and see the result.
```python
async def work_2():
    l = [1, 2, 3]
    l[4]  # IndexError planted mid-function rather than in the return
    await asyncio.sleep(2)
    return 2
```

```
########## output ##########
[1, IndexError('list index out of range'), CancelledError()]
total cost:3.0029349s
```
So it is not only errors in the return statement that get returned: as long as the program raises an error anywhere, the error is returned to the main task. But we must pass return_exceptions=True; otherwise the error is thrown up to the execution layer, where it has to be caught with try/except, which in turn means all the tasks that have not yet finished will be cancelled.
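A minimal sketch of that alternative path (ok and boom are made-up names): without return_exceptions=True, the first exception propagates out of gather and must be caught with try/except:

```python
import asyncio

async def ok():
    await asyncio.sleep(0)
    return 1

async def boom():
    await asyncio.sleep(0)
    return 1 / 0  # raises ZeroDivisionError

async def main():
    try:
        # Without return_exceptions=True, the first error propagates here.
        await asyncio.gather(ok(), boom())
    except ZeroDivisionError as e:
        return 'caught: {}'.format(e)

msg = asyncio.run(main())
print(msg)  # caught: division by zero
```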
At this point we can see that whatever threads can achieve, coroutines can achieve too. Let's apply the knowledge above and build a producer-consumer model with coroutines.
```python
import asyncio
import random

async def consumer(queue, id):
    while True:
        val = await queue.get()
        print('{} get a val:{}.'.format(id, val))
        await asyncio.sleep(1)

async def producer(queue, id):
    for i in range(5):
        val = random.randint(4, 20)
        await queue.put(val)
        print('{} put a val:{}.'.format(id, val))
        await asyncio.sleep(1)

async def main():
    queue = asyncio.Queue()
    consumer_1 = asyncio.create_task(consumer(queue, 'consumer_1'))
    # consumer_2 = asyncio.create_task(consumer(queue, 'consumer_2'))
    producer_1 = asyncio.create_task(producer(queue, 'producer_1'))
    # producer_2 = asyncio.create_task(producer(queue, 'producer_2'))
    await asyncio.sleep(10)
    consumer_1.cancel()
    producer_1.cancel()
    await asyncio.gather(consumer_1, producer_1, return_exceptions=True)

asyncio.run(main())
```
We define one producer and one consumer; the main task launches both and cancels them after 10 s (in fact the producer's for loop only runs five times, so it produces just five elements).
The sleep in the main task fixes how long the main task runs; regardless of whether there are still tasks pending, the cancel calls at the end shut them down.
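A variant worth knowing (my own sketch, not the chapter's code): instead of a fixed 10 s sleep, queue.join() together with task_done() lets the main task exit as soon as every produced item has been consumed:

```python
import asyncio

async def producer(queue, n):
    for i in range(n):
        await queue.put(i)

async def consumer(queue, seen):
    while True:
        val = await queue.get()
        seen.append(val)
        queue.task_done()  # tells queue.join() this item has been handled

async def main():
    queue = asyncio.Queue()
    seen = []
    worker = asyncio.create_task(consumer(queue, seen))
    await producer(queue, 5)
    await queue.join()   # returns once every put item got a task_done()
    worker.cancel()      # the consumer loops forever, so cancel it explicitly
    return seen

seen = asyncio.run(main())
print(seen)  # [0, 1, 2, 3, 4]
```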
Hands-on practice
Finally, let's do today's hands-on exercise with a complete crawler.
Take this page: https://movie.douban.com/cinema/later/xian/ . It lists films soon to be released in Xi'an. How would we use Python to get the names, release dates, and posters of these films?
This is something we will keep improving later.
Summary
That wraps up today's content. At some length, we went from a simple crawler to a real one, and along the way covered the relatively new methods and concepts of Python coroutines. To review:
Coroutines differ from multithreading in two main points: 1. coroutines are single-threaded; 2. with coroutines, the user decides where to hand control over to the next task.
Coroutine code is simpler and clearer; the combination of the async / await syntax with create_task handles small and medium-scale concurrency needs without pressure.
When writing coroutine programs, keep a clear mental picture of the event loop: know when the program needs to pause and wait for I/O, and when it needs to run many things together.
Finally, remember: on a project, use whatever model achieves the best result, rather than deciding a technology is impressive and then inventing conditions to use it. In one sentence:
Technology serves engineering, and engineering is, much of the time, a complicated compromise among time, resources, and manpower.
Finally, something to think about:
How would you implement a callback with coroutines?
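One possible direction, as a hint (work and on_done are made-up names for this sketch): asyncio tasks support add_done_callback, which attaches a plain function that fires when the task completes:

```python
import asyncio

async def work():
    await asyncio.sleep(0)
    return 42

def on_done(task):
    # The callback receives the finished Task and can read its result.
    print('callback got:', task.result())

async def main():
    task = asyncio.create_task(work())
    task.add_done_callback(on_done)
    return await task

value = asyncio.run(main())
print(value)  # 42
```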