A detailed tutorial on Python coroutines

1. The concept of coroutine

Coroutine: concurrency within a single thread, also known as a microthread or fiber. The English name is Coroutine.

In one sentence: a coroutine is a lightweight, user-mode thread; that is, a coroutine is controlled and scheduled by the user program itself.

A CPU running one task will switch to another task in two situations (the switch is forced by the operating system):

One situation is that the task blocks; the
other is that the task has been computing for too long, or a higher-priority task preempts it.

A coroutine ultimately runs on a thread. Previously, the switching of thread tasks was controlled by the operating system, which switches automatically on I/O. The purpose of coroutines is to avoid the overhead of operating-system switching (saving and restoring registers, stacks, and so on) by controlling the switching of tasks inside our own program.

In effect, the coroutine says to the CPython interpreter: you think you are so clever with your GIL lock? Fine, I will hand you a single thread to execute, sparing you the cost of thread switching; I will do the switching myself, which is much faster than your switching and avoids a lot of overhead.

Within a single thread, a program inevitably performs I/O. But if we can control multiple tasks within that one thread at the user-program level (not the operating-system level), then whenever one task blocks on I/O we can switch to another task and keep computing. This keeps the thread in the ready state as much as possible, that is, executable by the CPU at any moment. In effect we hide our own I/O from the operating system, fooling it into seeing a thread that appears to compute all the time with little I/O, so that it allocates more CPU execution time to our thread.

The essence of a coroutine: within a single thread, user code switches to another task whenever the current one blocks on I/O, improving efficiency. To achieve this, we need a mechanism that satisfies both of the following conditions at the same time:

1. It can switch between multiple tasks, saving each task's state before switching so that, on resumption, the task continues from where it paused.

2. As a complement to point 1: it can detect I/O operations, and only switches when an I/O operation is encountered.
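Condition 1 is exactly what Python generators already provide: a generator pauses at `yield`, keeps its local state, and resumes from the pause point. A minimal stdlib sketch (the names are illustrative, not from the original):

```python
def task():
    """A generator pauses at each yield and resumes from the same spot,
    preserving its local state between switches."""
    x = 0
    while True:
        x += 1
        yield x  # pause here; the local variable x survives the switch

t = task()
print(next(t))  # 1 -- runs up to the first yield
print(next(t))  # 2 -- resumes after the yield; x was preserved
```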

2. Ways to implement coroutines

1. greenlet ☆☆☆

greenlet is one of the oldest ways to implement coroutines in Python.

pip3 install greenlet
from greenlet import greenlet

def eat(name):
    print('%s eat 1' % name)  # 2
    g2.switch('taibai')  # 3
    print('%s eat 2' % name)  # 6
    g2.switch()  # 7

def play(name):
    print('%s play 1' % name)  # 4
    g1.switch()  # 5
    print('%s play 2' % name)  # 8

g1 = greenlet(eat)
g2 = greenlet(play)

g1.switch('taibai')  # arguments can be passed on the first switch; later switches do not need them  # 1

2. yield ☆

Technically this is a kind of coroutine, but on its own it is of little practical use.

def func1():
    yield 1
    yield from func2()  # note the call: delegate to the generator returned by func2()
    yield 2

def func2():
    yield 3
    yield 4

f1 = func1()
for item in f1:
    print(item)
"""
1
3
4
2
"""

3. The gevent module ☆☆☆☆ (you can also skip straight to 4)

gevent is a third-party library that makes it easy to write concurrent synchronous or asynchronous programs. The main pattern used in gevent is the Greenlet, a lightweight coroutine provided to Python as a C extension module. Greenlets all run inside the operating-system process of the main program, but they are scheduled cooperatively.

i. Installation

pip3 install gevent

ii. Usage

g1 = gevent.spawn(func, 1, 2, 3, x=4, y=5)
# Create a coroutine object g1. The first argument to spawn is the function,
# e.g. eat; it can be followed by any number of positional or keyword arguments,
# all of which are passed to that function. spawn submits the task asynchronously.

g2 = gevent.spawn(func2)

g1.join()  # wait for g1 to finish

g2.join()  # wait for g2 to finish. When testing, some people find that g2 runs even without this second join. True: the coroutine switched to it for you. But if g2's task takes a long time and you omit the join, the program will not wait for the rest of g2 to finish.


# Or combine the two steps into one:
gevent.joinall([g1, g2])

g1.value  # get func1's return value

gevent switches tasks automatically when it encounters IO blocking:

import gevent

def eat(name):
    print('%s eat 1' % name)
    gevent.sleep(2)
    print('%s eat 2' % name)

def play(name):
    print('%s play 1' % name)
    gevent.sleep(1)
    print('%s play 2' % name)


g1 = gevent.spawn(eat, 'egon')
g2 = gevent.spawn(play, name='egon')
g1.join()
g2.join()
# or gevent.joinall([g1, g2])
print('main')

In the example above, gevent.sleep(2) simulates an IO block that gevent can recognize.

time.sleep(2) and other blocking calls, however, cannot be recognized by gevent directly. You need the following lines to patch them so they become recognizable:

from gevent import monkey
monkey.patch_all()  # must come before the things being patched, e.g. before the time and socket modules

Or simply remember: to use gevent, put from gevent import monkey; monkey.patch_all() at the top of the file:

from gevent import monkey

monkey.patch_all()  # must be at the very top; every blocking call after this line can be recognized

import gevent  # 直接导入即可
import time

def eat():
    print('eat food 1')
    time.sleep(2)  # with monkey patched, the sleep from the time module is recognized too
    print('eat food 2')

def play():
    print('play 1')
    time.sleep(1)  # gevent switches back and forth until each IO wait ends; the switching is now done by gevent for us, no longer by the operating system beyond our control
    print('play 2')

g1 = gevent.spawn(eat)
g2 = gevent.spawn(play)
gevent.joinall([g1, g2])
print('main')

We can use threading.current_thread().getName() to inspect g1 and g2; the result is DummyThread-n, i.e. fake, virtual threads: in fact everything runs inside a single real thread.

Task switching between processes and threads is done by the operating system itself; you cannot control it.

A coroutine switches via your own program code, so you can control it yourself. Only when the program encounters an IO operation that the coroutine module can recognize does it switch tasks to achieve a concurrency effect; if the program performs no IO at all, execution is essentially serial.
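The last point deserves a demonstration. The sketch below uses the standard library's asyncio (so it runs without installing gevent), but the rule is the same for any coroutine framework: a blocking call the framework cannot recognize, here time.sleep, prevents any switching, while a recognizable await lets other tasks run. The function names are illustrative:

```python
import asyncio
import time

async def polite(log):
    log.append("polite start")
    await asyncio.sleep(0.1)  # recognizable IO: yields control to the event loop
    log.append("polite end")

async def rude(log):
    log.append("rude start")
    time.sleep(0.2)           # blocking call: the loop cannot switch away
    log.append("rude end")

async def main():
    log = []
    await asyncio.gather(rude(log), polite(log))
    return log

print(asyncio.run(main()))
# rude runs to completion before polite ever gets a chance to start
```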

iii. Summary

gevent was once a magic weapon, but the sword has aged: the standard library now ships its own coroutine tooling, and this approach is on its way to being displaced by asyncio.

4. asyncio ☆☆☆☆☆☆

Introduced in Python 3.4 as a built-in module: nothing to install, just make sure your interpreter version is at least 3.4. (Note: the @asyncio.coroutine decorator used below was deprecated in Python 3.8 and removed in 3.11; prefer the async/await form shown afterwards.)

i. Straight to the code:

import asyncio


@asyncio.coroutine  # marks this as no longer an ordinary function: it has been upgraded into one that can run asynchronously
def func1():
    print(1)
    yield from asyncio.sleep(2)  # simulated IO; replace with real IO in production
    print(2)


@asyncio.coroutine
def func2():
    print(3)
    yield from asyncio.sleep(2)
    print(4)

# put the tasks into the task pool
tasks = [
    asyncio.ensure_future(func1()),
    asyncio.ensure_future(func2()),
]

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

ii. The async & await keywords

These only appeared in Python 3.5; make sure your version meets the requirement.

Essentially the same as the usage above; only the keywords change.

import asyncio


async def func1():  # async replaces the decorator
    print(1)
    await asyncio.sleep(2)  # wait on IO
    print(2)


async def func2():
    print(3)
    await asyncio.sleep(2) 
    print(4)


tasks = [
    asyncio.ensure_future(func1()),  # Future objects are fairly low-level; Future is the base class of Task
    asyncio.ensure_future(func2()),
]

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
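A common companion to asyncio.wait is asyncio.gather, which returns the results directly, in the order the awaitables were passed rather than completion order; and because the sleeps overlap, total wall time is roughly the longest sleep, not the sum. A sketch with illustrative names and shortened delays:

```python
import asyncio
import time

async def work(name, delay):
    await asyncio.sleep(delay)  # simulated IO; the loop runs other tasks meanwhile
    return name

async def main():
    # gather preserves argument order even though "fast" finishes first
    return await asyncio.gather(work("slow", 0.2), work("fast", 0.1))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # ['slow', 'fast']
# elapsed is roughly 0.2s (the longest sleep), not 0.3s: the sleeps overlapped
```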

iii. The run method and Task objects

These only appeared in Python 3.7; make sure your version meets the requirement.

The run method wraps up:

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

This is the most common way to use coroutines:

import asyncio


async def func1():  # async replaces the decorator
    print(1)
    await asyncio.sleep(2)  # wait on IO
    print(2)
    return "func1"


async def func2():
    print(3)
    await asyncio.sleep(2)
    print(4)
    return "func2"


async def main():
    tasks = [
        asyncio.create_task(func1()),  # must be created inside an async function: outside one there is no running event loop, so creating the Task raises an error
        asyncio.create_task(func2()),
    ]
    done, pending = await asyncio.wait(tasks, timeout=None)  # done: the finished tasks (carrying results); pending: the unfinished ones
    for i in done:
        print(i.result())
    print(pending)


# run it
asyncio.run(main())
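When timeout is a finite number instead of None, wait splits the tasks: whatever finished in time lands in done, the rest in pending (which you should normally cancel). A sketch with illustrative names and delays:

```python
import asyncio

async def work(name, delay):
    await asyncio.sleep(delay)  # simulated IO
    return name

async def main():
    tasks = [
        asyncio.create_task(work("quick", 0.1)),
        asyncio.create_task(work("slow", 5)),
    ]
    # after 0.5s, "quick" is done and "slow" is still pending
    done, pending = await asyncio.wait(tasks, timeout=0.5)
    for t in pending:
        t.cancel()  # clean up the unfinished task
    return {t.result() for t in done}, len(pending)

print(asyncio.run(main()))  # ({'quick'}, 1)
```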

If you want to keep the tasks list outside the async function, the code needs to change. (Note that passing bare coroutine objects to asyncio.wait was deprecated in Python 3.8 and removed in 3.11, so this variant only works on older interpreters.)

import asyncio


async def func1():  # async replaces the decorator
    print(1)
    await asyncio.sleep(2)  # wait on IO
    print(2)
    return "func1"


async def func2():
    print(3)
    await asyncio.sleep(2)
    print(4)
    return "func2"


tasks = [
    func1(),  # Task objects cannot be used here, because no event loop has been created yet
    func2(),
]
# the wait method wraps the coroutine functions into Task objects automatically
done, pending = asyncio.run(asyncio.wait(tasks, timeout=None))  # done: finished tasks; pending: unfinished ones
for i in done:
    print(i.result())
print(pending)

iv. Some libraries, such as requests, do not support asyncio syntax

Suppose we use the asyncio module to implement an asynchronous crawler:

import asyncio
import requests

urls = [
    "http://www.smilenow.top",
    "http://www.baidu.com",
    "http://www.163.com"
]


async def get_cont(url):
    print("preparing to download:", url)
    htm = requests.get(url=url).text  # blocking call: never yields to the event loop
    print(url, "finished downloading")
    return url


tasks = [get_cont(url) for url in urls]

asyncio.run(asyncio.wait(tasks, timeout=None))

Result:

preparing to download: http://www.baidu.com
http://www.baidu.com finished downloading
preparing to download: http://www.163.com
http://www.163.com finished downloading
preparing to download: http://www.smilenow.top
http://www.smilenow.top finished downloading

What? This is not asynchronous at all! What to do? Hand the blocking calls to a thread pool instead!

import asyncio
import requests

urls = [
    "http://www.smilenow.top",
    "http://www.baidu.com",
    "http://www.163.com"
]


async def get_cont(url):
    print("preparing to download:", url)
    loop = asyncio.get_event_loop()
    future = loop.run_in_executor(None, requests.get, url)  # the blocking call now runs in a worker thread from the pool
    await future
    print(url, "finished downloading")
    return url


tasks = [get_cont(url) for url in urls]

asyncio.run(asyncio.wait(tasks, timeout=None))
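The run_in_executor trick can be demonstrated with the standard library alone: below, three blocking time.sleep calls (standing in for requests.get) finish in roughly the time of one, because each runs in a separate thread from the default pool. The function name blocking_job is illustrative:

```python
import asyncio
import time

def blocking_job(delay):
    time.sleep(delay)  # stands in for requests.get or any other blocking call
    return delay

async def main():
    loop = asyncio.get_running_loop()
    # Each blocking call runs in the default thread pool, so they overlap
    futures = [loop.run_in_executor(None, blocking_job, 0.3) for _ in range(3)]
    return await asyncio.gather(*futures)

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # [0.3, 0.3, 0.3]
# elapsed is roughly 0.3s, not 0.9s: the three sleeps ran concurrently
```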

v. Asynchronous Redis with asyncio (aioredis)

Install the Redis module that supports async:

pip install aioredis
import aioredis
import asyncio
 
class Redis:
    _redis = None
 
    async def get_redis_pool(self, *args, **kwargs):
        if not self._redis:
            self._redis = await aioredis.create_redis_pool(*args, **kwargs)
        return self._redis
 
    async def close(self):
        if self._redis:
            self._redis.close()
            await self._redis.wait_closed()
 
 
async def get_value(key):
    redis = Redis()
    r = await redis.get_redis_pool(('127.0.0.1', 6379), db=7, encoding='utf-8')
    value = await r.get(key)
    print(f'{key!r}: {value!r}')
    await redis.close()         
 
if __name__ == '__main__':
    asyncio.run(get_value('key'))  # need python3.7

vi. Asynchronous MySQL with aiomysql

Installation:

pip install aiomysql
import asyncio
import aiomysql


async def execute():
    conn = await aiomysql.connect(host='localhost', port=3306, user="root", password='123', db='my')
    cur = await conn.cursor()
    await cur.execute("select * from user")
    result = await cur.fetchall()
    print(result)
    await cur.close()
    await conn.close()


asyncio.run(execute())

vii. Still not fast enough? uvloop lets the speed fly!!!

uvloop makes asyncio faster. According to its own benchmarks, it is at least twice as fast as nodejs, gevent, and any other Python asynchronous framework, and the performance of asyncio on uvloop approaches that of Go programs.

It is a module favored by the major frameworks.

installation:

pip install uvloop
import asyncio
import uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

# write normal asyncio code below


Origin blog.csdn.net/qq_40837794/article/details/109708891