Python coroutines: yield, greenlet, gevent, asyncio

Process

A process is the smallest unit of resource allocation. Each process has its own independent memory space and state, including register contents, heap, stack, data segment, code segment, virtual memory, file handles, IO status and signal information. Switching between processes is therefore relatively expensive. In return, processes are well isolated and stable, and are usually not affected by other processes.

Inter-process communication includes pipes, message queues, semaphores, shared memory, sockets, etc.
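
As a small illustration of pipe-based inter-process communication, the sketch below starts a child Python process and exchanges text with it over its standard input and output pipes. The child program and the message are made up for this example.

```python
import subprocess
import sys

# Hypothetical child program: read all of stdin and write it back upper-cased.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

# subprocess.run connects pipes to the child's stdin and stdout.
result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="hello from parent",
    capture_output=True,
    text=True,
    check=True,
)
reply = result.stdout
print(reply)
```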

Thread

A thread is the smallest unit of system scheduling. It only needs to save a small amount of state of its own, such as its stack and registers. A process must contain at least one thread. Switching between threads costs much less than switching between processes, but threads are less isolated and less stable: a thread is easily affected by its process and by the other threads in it.

Because all threads of a process share the same memory, threads communicate directly through shared memory, that is, through globally defined variables. Threads therefore usually need locks to implement synchronization and mutual exclusion.
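
A minimal sketch of threads communicating through a shared global variable protected by a lock. The names counter, counter_lock and add_many are chosen for this example only.

```python
import threading

counter = 0                       # global variable shared by all threads
counter_lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        # the lock makes the read-modify-write on counter mutually exclusive
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```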

Coroutine

Both processes and threads are scheduled by the operating system. Although switching threads is cheaper than switching processes, frequent switching still seriously hurts performance.

The operating system usually switches in three situations:

  1. The program has been running for a long time (its time slice is used up)
  2. A program with higher priority preempts it
  3. The program is blocked

Many network applications accept a large number of requests at the same time. Each request needs very little computation; most of the time is spent on IO, above all network IO. This leads to frequent IO blocking and thread switching, which seriously affects performance.

Coroutines address this performance problem for programs whose main cost is IO in high-concurrency scenarios.

Multiple coroutines can run in one thread. When a coroutine makes a call that would block on IO, it uses asynchronous IO instead, so the operating system is not triggered to switch threads, and another coroutine continues to execute. Because the switch happens inside a single thread, its overhead is very small and performance improves greatly.
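
The idea can be sketched with the standard asyncio library, which is covered at the end of this article. Here asyncio.sleep stands in for a network call, and fake_request is a hypothetical name. All 50 coroutines run on one thread, and the total time is close to a single wait rather than 50 waits.

```python
import asyncio
import threading
import time

thread_ids = set()

async def fake_request(i):
    # asyncio.sleep stands in for network IO; while this coroutine waits,
    # the event loop runs the other coroutines on the same thread
    await asyncio.sleep(0.05)
    thread_ids.add(threading.get_ident())
    return i

async def main():
    return await asyncio.gather(*(fake_request(i) for i in range(50)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(len(results), "requests,", len(thread_ids), "thread,", round(elapsed, 2), "seconds")
```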

Note that the benefit of coroutines is most visible under a large amount of concurrent IO, because only then is some asynchronous IO likely to be ready at any moment. If there is little IO, for example one request every 10 minutes, then after starting an asynchronous operation the program still has to wait for that IO to become ready, which still causes a thread switch.

Note that in this model a coroutine switches in only one case: an IO call.

This capability has to be implemented by a framework or runtime. It is transparent to the operating system, and ideally also transparent to the application, which reduces the development workload.

In Go, this capability is native: the language itself implements it and supports it
at the syntax level. In Python, it is provided by packages such as gevent.

The following mainly talks about Python coroutines

yield

yield is used to write generators, for example:

def f(max): 
    n = 1
    while n <= max: 
        yield n*n
        n = n + 1
        
for i in f(5):
    print(i)

Without yield, the function f would have to return a list, and if max is very large, a lot of memory is needed to hold it. With yield, calling f(5) does not run the function body but returns a generator, which is an iterator. Each time the for statement asks for a value, the generator runs until it reaches yield, returns n * n and pauses. The next time the for statement asks for a value, the generator continues from n = n + 1. No matter how large max is, memory usage stays constant.
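
To make the memory claim concrete, the sketch below compares the size of a list of squares with the size of a generator object producing the same values. Note that sys.getsizeof reports only the container itself (for the list, its array of references), so the real gap is even larger. The name squares is chosen for this example.

```python
import sys

def squares(limit):
    n = 1
    while n <= limit:
        yield n * n
        n = n + 1

as_list = [n * n for n in range(1, 100001)]   # every value is stored at once
as_generator = squares(100000)                # only the paused function state

list_size = sys.getsizeof(as_list)
generator_size = sys.getsizeof(as_generator)
print(list_size, generator_size)

# both produce the same values
total = sum(as_generator)
```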

Another example:

def f():
    n = 1
    print("f function with yield inside")
    while True:
        msg = yield n
        print("msg: ", msg)
        n = n + 1

iter = f()
print("before invoke next")
print("receive: ", next(iter))
print("after invoke next")
print("receive: ", next(iter))

The output is:

before invoke next
f function with yield inside
receive:  1
after invoke next
msg:  None
receive:  2

As can be seen, calling iter = f() prints nothing: the body of f() is not executed, and a generator is returned instead. When next(iter) is called (next is a Python built-in function), f() starts running, but only up to yield n, where it pauses and returns n as the result. Even the assignment to msg has not happened yet. On the following next call, execution continues from the assignment to msg until yield is reached again. Because next sends nothing in, msg is None. If the generator has finished, next raises a StopIteration exception.
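
The generator above loops forever, so it never finishes. A short sketch with a finite generator (countdown is a name chosen for this example) shows the StopIteration exception, which also carries the generator's return value:

```python
def countdown(n):
    while n > 0:
        yield n
        n = n - 1
    return "done"   # becomes the value attribute of StopIteration

gen = countdown(2)
first = next(gen)
second = next(gen)

try:
    next(gen)       # the generator body has finished
except StopIteration as exc:
    final_value = exc.value

print(first, second, final_value)  # 2 1 done
```

A for loop catches StopIteration automatically, which is why the first example needed no exception handling.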

The next example:

def f():
    n = 1
    print("f function with yield inside")
    while True:
        msg = yield n
        print("msg: ", msg)
        n = n + 1

iter = f()
print("before invoke next")
print("receive: ", next(iter))
print("after invoke next")
print("receive: ", iter.send("from outside"))

Here the second next call is replaced with a call to the generator's send method. The output is:

before invoke next
f function with yield inside
receive:  1
after invoke next
msg:  from outside
receive:  2

The only difference from the previous example is that the printed msg is no longer None but the argument passed to send. Like next, send resumes the generator, but it also makes its argument the result of the yield expression, which is then assigned to msg.
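
A common use of send is a generator that keeps state between calls. The following sketch (running_average is a name chosen for this example) receives numbers through send and hands back the average so far. The generator must first be advanced to its first yield with next, because a value other than None cannot be sent to a generator that has not started.

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # hand back the current average, then wait for the next value
        value = yield average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime the generator: run up to the first yield
a1 = avg.send(10)
a2 = avg.send(20)
a3 = avg.send(60)
print(a1, a2, a3)    # 10.0 15.0 30.0
```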

The following uses yield to simulate coroutines

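def f_0():
    n = 5
    while n >= 0:
        print('[f_0] ' + str(n))
        yield  # hand control back to the scheduling loop
        n = n - 1

def f_1():
    m = 3
    while m >= 0:
        print('[f_1] ' + str(m))
        yield  # hand control back to the scheduling loop
        m = m - 1

iter_list = [f_0(), f_1()]
while True:
    # iterate over a copy so finished generators can be removed safely
    for it in list(iter_list):
        try:
            next(it)
        except StopIteration:
            iter_list.remove(it)

    if len(iter_list) == 0:
        break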

The result is

[f_0] 5
[f_1] 3
[f_0] 4
[f_1] 2
[f_0] 3
[f_1] 1
[f_0] 2
[f_1] 0
[f_0] 1
[f_0] 0

As can be seen, execution switches back and forth between the two functions, but the code is cumbersome to write.
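
Part of the boilerplate can be factored into a reusable scheduler. The following is one possible sketch using a queue; run_round_robin and worker are hypothetical names, not part of any library.

```python
from collections import deque

def run_round_robin(generators):
    """Resume each generator in turn until all of them have finished."""
    queue = deque(generators)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue           # finished: do not put it back
        queue.append(task)     # not finished: schedule it again

log = []

def worker(name, count):
    for i in range(count, -1, -1):
        log.append(name + " " + str(i))
        yield                  # give the other workers a turn

run_round_robin([worker("a", 2), worker("b", 1)])
print(log)  # ['a 2', 'b 1', 'a 1', 'b 0', 'a 0']
```

Even so, every function still has to place its own yield statements by hand.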

greenlet

greenlet is a C extension library that implements coroutines natively at a low level.

from greenlet import greenlet

def f_0():
    n = 5
    while n >= 0:
        print('[f_0] ' + str(n))
        parent_greenlet.switch()
        n = n - 1

def f_1():
    m = 3
    while m >= 0:
        print('[f_1] ' + str(m))
        parent_greenlet.switch()
        m = m - 1

def parent():
    while True:
        # iterate over a copy so dead greenlets can be removed safely
        for task in list(greenlet_list):
            task.switch()
            if task.dead:
                greenlet_list.remove(task)
        if len(greenlet_list) == 0:
            break

parent_greenlet = greenlet(parent)
greenlet_list = [greenlet(f_0, parent_greenlet), greenlet(f_1, parent_greenlet)]
parent_greenlet.switch()

The output is:

[f_0] 5
[f_1] 3
[f_0] 4
[f_1] 2
[f_0] 3
[f_1] 1
[f_0] 2
[f_1] 0
[f_0] 1
[f_0] 0

switch can also pass values. Depending on the state of the target greenlet, the values are passed either as arguments to its function (when it starts for the first time) or as the return value of the switch call at which it was suspended.

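from greenlet import greenlet

def test1(x, y):
    # start gr2 with x+y as its argument; z receives the value gr2 switches back with
    z = gr2.switch(x+y)
    print(z)

def test2(u):
    print(u)
    gr1.switch(42)
    print("end")  # never reached, see below

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch("hello", " world")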

The output is:

hello world
42

As can be seen, end is not printed. No parent was specified, so both greenlets default to the main greenlet as their parent. When test1 finishes, control returns to main, and gr2 is never resumed. If a parent is specified, control returns to that parent when the greenlet finishes.

gevent

greenlet code is also rather complicated to write, and greenlet only implements coroutines. It does not detect IO operations and switch on them. In fact, pure computation gains nothing from coroutine switching. Only when a program switches during highly concurrent IO operations does its performance improve greatly.

gevent is built on greenlet and adds many optimizations, including the use of event-notification mechanisms such as epoll on Linux, to improve the performance of highly concurrent IO. For example, when a greenlet needs to perform network IO, the operation is registered for asynchronous monitoring and execution switches to another greenlet. When the IO completes, execution switches back at an appropriate time and continues. Under heavy IO the program therefore keeps running instead of waiting, while also avoiding the overhead of thread switching.

import gevent

def f_0(param):
    n = param
    while n >= 0:
        print('[f_0] ' + str(n))
        gevent.sleep(0.1)
        n = n - 1

def f_1(param):
    m = param
    while m >= 0:
        print('[f_1] ' + str(m))
        gevent.sleep(0.1)
        m = m - 1

g1 = gevent.spawn(f_0, 5)
g2 = gevent.spawn(f_1, 3)
gevent.joinall([g1, g2])

The output is:

[f_0] 5
[f_1] 3
[f_0] 4
[f_1] 2
[f_0] 3
[f_1] 1
[f_0] 2
[f_1] 0
[f_0] 1
[f_0] 0

As can be seen, the code is concise and clear. Compared with an ordinary program, the only change is that time.sleep() is replaced with gevent.sleep(), which lets gevent switch coroutines at the point where the code would otherwise block.

It can actually be simpler

from gevent import monkey

# patch the standard library as early as possible, before other modules use it
monkey.patch_all()

import time
import gevent

def f_0(param):
    n = param
    while n >= 0:
        print('[f_0] ' + str(n))
        time.sleep(0.1)
        n = n - 1

def f_1(param):
    m = param
    while m >= 0:
        print('[f_1] ' + str(m))
        time.sleep(0.1)
        m = m - 1

g1 = gevent.spawn(f_0, 5)
g2 = gevent.spawn(f_1, 3)
gevent.joinall([g1, g2])

monkey.patch_all() patches many blocking operations in the standard library, such as time.sleep and the socket calls underlying HTTP requests, so that they run asynchronously and trigger coroutine switches. Existing functions can then be used directly without modification. For developers the coroutines are transparent: the code needs no special changes, and gevent takes care of the rest.
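
The general technique behind this, replacing a module attribute at runtime so that unchanged code picks up the replacement, can be sketched with the standard library alone. This is only an illustration of monkey patching as such, not of how gevent implements it; legacy_task and recording_sleep are hypothetical names.

```python
import time

recorded = []

def legacy_task():
    # existing code that calls time.sleep through the module attribute
    time.sleep(0.5)
    return "finished"

def recording_sleep(seconds):
    # hypothetical replacement: record the request instead of blocking
    recorded.append(seconds)

original_sleep = time.sleep
time.sleep = recording_sleep         # the monkey patch
try:
    result = legacy_task()           # unchanged code now uses the replacement
finally:
    time.sleep = original_sleep      # always restore the original

print(result, recorded)  # finished [0.5]
```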

asyncio

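The asyncio library was added to the Python standard library in Python 3.4, the async and await keywords were introduced in Python 3.5, and in Python 3.6 the asyncio API was declared stable rather than provisional.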

The most important parts are the async and await keywords.

async declares a function as asynchronous, meaning it can be suspended.

await marks a point where the function may be suspended. await can only be followed by an awaitable: a coroutine, or an object with an __await__ method.
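
As a sketch of the second kind of awaitable, the class below defines __await__ and can therefore appear after await. Pause is a hypothetical class written for this example; it delegates the actual waiting to asyncio.sleep.

```python
import asyncio

class Pause:
    """Hypothetical awaitable: waits for `delay` seconds, then produces `value`."""

    def __init__(self, delay, value):
        self.delay = delay
        self.value = value

    def __await__(self):
        # delegate the suspension to the awaitable returned by asyncio.sleep
        yield from asyncio.sleep(self.delay).__await__()
        return self.value

async def main():
    return await Pause(0.01, "resumed")

outcome = asyncio.run(main())
print(outcome)  # resumed
```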

import asyncio

async def f_0(param):
    n = param
    while n >= 0:
        print('[f_0] ' + str(n))
        await asyncio.sleep(0.1)
        n = n - 1

async def f_1(param):
    m = param
    while m >= 0:
        print('[f_1] ' + str(m))
        await asyncio.sleep(0.1)
        m = m - 1

async def main():
    # run both coroutines concurrently and wait until both have finished
    await asyncio.gather(f_0(5), f_1(3))

# asyncio.run creates the event loop, runs main() and closes the loop
asyncio.run(main())

The output is:

[f_0] 5
[f_1] 3
[f_0] 4
[f_1] 2
[f_0] 3
[f_1] 1
[f_0] 2
[f_1] 0
[f_0] 1
[f_0] 0

Another example:

import asyncio
import aiohttp

async def request(session, url):
    async with session.get(url) as response:
        return await response.read()

async def fetch(url):
    await asyncio.sleep(1)
    async with aiohttp.ClientSession() as session:
        html = await request(session, url)
        print(html)

url_list = [
    "http://www.qq.com",
    "http://www.jianshu.com",
    "http://www.cnblogs.com"
]

async def main():
    # fetch all URLs concurrently and wait until every fetch has finished
    await asyncio.gather(*(fetch(url) for url in url_list))

asyncio.run(main())

As can be seen, functions must be declared with async to support asynchronous calls, and await marks the
places where they may be suspended. If the expression after await is not awaitable, an error is raised,
so dedicated asynchronous functions or classes (such as aiohttp here) must be used.
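
A small sketch of that error, assuming Python 3.9 or later for asyncio.to_thread. The function blocking_io is a made-up stand-in for an ordinary synchronous call. Awaiting its plain return value raises TypeError, while asyncio.to_thread runs the same function in a worker thread and returns something that can be awaited.

```python
import asyncio
import time

def blocking_io():
    time.sleep(0.05)          # ordinary blocking call, not awaitable
    return "blocking result"

async def main():
    error_type = None
    try:
        await blocking_io()   # the returned str is not awaitable
    except TypeError as exc:
        error_type = type(exc).__name__
    # run the blocking function in a worker thread and await its result
    value = await asyncio.to_thread(blocking_io)
    return error_type, value

error_type, value = asyncio.run(main())
print(error_type, value)  # TypeError blocking result
```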

In comparison, gevent can be transparent to the program:
an ordinary synchronous program can be made asynchronous through gevent without any modification.

However, gevent is a third-party package, while asyncio is part of the Python standard library and is supported at the syntax level.




Origin www.cnblogs.com/moonlight-lin/p/12732813.html