1. What is a coroutine
A coroutine, also known as a micro-thread, is a way of achieving concurrency within a single thread. In one sentence: a coroutine is a lightweight user-mode thread whose scheduling is controlled by the user program itself.
There are two things to note about this:
1. Python threads are kernel-level threads, that is, their scheduling is controlled by the operating system (if a thread encounters IO or runs for too long, it is forced to give up the CPU so that another thread can run)
2. Once coroutines are started inside a single thread, switching on IO is controlled at the application level (not by the operating system), which improves efficiency (switching that is not triggered by IO does not improve efficiency)
So using coroutines has the following advantages:
1. Coroutine switching happens at the program level, so its overhead is smaller and the operating system is completely unaware of it, which makes it more lightweight
2. Concurrency can be achieved within a single thread, making the most of the CPU
Its disadvantages are:
1. A coroutine is essentially single-threaded and cannot use multiple cores. To use them, a program can start multiple processes, each process can start multiple threads, and each thread can start coroutines
2. Because coroutines run in a single thread, once one coroutine blocks, the entire thread is blocked
To summarize, coroutines have the following features:
1. Concurrency must be implemented within a single thread
2. Shared data can be modified without locking
3. The user program itself saves the context stacks of the multiple control flows
Additional: a coroutine automatically switches to another coroutine when it encounters an IO operation. Neither yield nor greenlet can detect IO, so the gevent module (based on the select mechanism) is used for this.
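The features above can be illustrated with plain generators, without any third-party library. The following is only a minimal sketch, not how greenlet or gevent is implemented; the names worker, run_round_robin and log are made up for this example. Each generator keeps its own local state between switches, the switching is done by ordinary Python code, and the shared list is modified without a lock because only one task runs at a time.

```python
from collections import deque

log = []  # shared data, modified by every task without a lock


def worker(name, steps):
    """A task that gives up control voluntarily after each step."""
    for i in range(1, steps + 1):
        log.append('%s step %d' % (name, i))
        yield  # switch point: control goes back to the scheduler


def run_round_robin(tasks):
    """Run generator-based tasks in turn inside a single thread."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # resume the task until its next yield
        except StopIteration:
            continue            # the task has finished, drop it
        queue.append(task)      # not finished, put it at the back


run_round_robin([worker('eat', 2), worker('play', 3)])
print(log)
```

The steps of the two tasks end up interleaved in log. Note that this kind of switching cannot detect IO: a blocking call inside worker would still stall the whole thread.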
2. How to implement coroutines
1. Using greenlet
It needs to be installed before use: pip3 install greenlet
from greenlet import greenlet

def eat(name):
    print('%s eat 1' % name)
    g2.switch('egon')
    print('%s eat 2' % name)
    g2.switch()

def play(name):
    print('%s play 1' % name)
    g1.switch()
    print('%s play 2' % name)

g1 = greenlet(eat)
g2 = greenlet(play)
g1.switch('egon')  # Arguments can be passed on the first switch; later switches do not need them
Pure switching (when there is no IO and no repeated allocation of memory) actually reduces the execution speed of the program
# Sequential execution
import time

def f1():
    res = 1
    for i in range(100000000):
        res += i

def f2():
    res = 1
    for i in range(100000000):
        res *= i

start = time.time()
f1()
f2()
stop = time.time()
print('run time is %s' % (stop - start))  # 10.985628366470337

# Switching
from greenlet import greenlet
import time

def f1():
    res = 1
    for i in range(100000000):
        res += i
        g2.switch()

def f2():
    res = 1
    for i in range(100000000):
        res *= i
        g1.switch()

start = time.time()
g1 = greenlet(f1)
g2 = greenlet(f2)
g1.switch()
stop = time.time()
print('run time is %s' % (stop - start))  # 52.763017892837524
greenlet only provides a convenient way to switch. If the task it has switched to encounters IO, it blocks in place; greenlet cannot switch automatically on IO to improve efficiency.
The tasks running in a single thread (20 of them, say) usually contain both computation and blocking operations. When task 1 blocks, we can use that blocking time to execute task 2, and so on. This is what the gevent module is for.
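The idea of using one task's blocking time to run another can be sketched with generators and a timer queue. This is only an illustration under simplifying assumptions: the only kind of blocking is a sleep, and run, eat, play and events are names invented here. gevent itself relies on an event loop and handles real IO such as sockets.

```python
import heapq
import time

events = []  # order in which the tasks make progress, kept for inspection


def eat(name):
    events.append('%s eat 1' % name)
    yield 0.2  # pretend to block for 0.2 seconds
    events.append('%s eat 2' % name)


def play(name):
    events.append('%s play 1' % name)
    yield 0.1  # pretend to block for 0.1 seconds
    events.append('%s play 2' % name)


def run(tasks):
    """Resume each task once its simulated blocking period is over."""
    # Heap entries: (wake-up time, sequence number, task)
    ready = [(time.monotonic(), seq, task) for seq, task in enumerate(tasks)]
    heapq.heapify(ready)
    seq = len(ready)
    while ready:
        wake_at, _, task = heapq.heappop(ready)
        delay = wake_at - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # nothing is runnable yet, so really wait
        try:
            blocked_for = next(task)  # run until the task "blocks"
        except StopIteration:
            continue  # the task has finished
        heapq.heappush(ready, (time.monotonic() + blocked_for, seq, task))
        seq += 1


start = time.monotonic()
run([eat('egon'), play('egon')])
elapsed = time.monotonic() - start
print(events)
print('run time is %.2f' % elapsed)
```

While eat is waiting, play gets to run and finish, so the total run time should be close to the longest single wait rather than the sum of both waits.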
2. Using gevent
First install Gevent: pip3 install gevent
- gevent is a third-party library that makes it easy to write concurrent synchronous or asynchronous programs. The main pattern used in gevent is the Greenlet, a lightweight coroutine provided to Python as a C extension module. Greenlets all run inside the operating system process of the main program, but they are scheduled cooperatively.
# Usage
g1 = gevent.spawn(func, 1, 2, 3, x=4, y=5)  # Creates a coroutine object g1. The first argument to spawn is the function name, such as eat; it can be followed by multiple arguments, positional or keyword, all of which are passed to that function
g2 = gevent.spawn(func2)
g1.join()  # Wait for g1 to end
g2.join()  # Wait for g2 to end
# Or combine the two steps above into one: gevent.joinall([g1, g2])
g1.value  # Get the return value of func
- Automatically switch tasks when encountering IO blocking
import gevent

def eat(name):
    print('%s eat 1' % name)
    gevent.sleep(2)
    print('%s eat 2' % name)

def play(name):
    print('%s play 1' % name)
    gevent.sleep(1)
    print('%s play 2' % name)

g1 = gevent.spawn(eat, 'egon')
g2 = gevent.spawn(play, name='egon')
g1.join()
g2.join()
# Or gevent.joinall([g1, g2])
print('main')
In the example above, gevent.sleep(2) simulates IO blocking that gevent can recognize.
gevent cannot directly recognize time.sleep(2) or other blocking calls. To make them recognizable, apply a patch with the following line of code:
from gevent import monkey;monkey.patch_all() must be placed before the modules being patched, such as time and socket, are imported
Or simply remember: to use gevent, put from gevent import monkey;monkey.patch_all() at the beginning of the file
from gevent import monkey;monkey.patch_all()
import gevent
import time

def eat():
    print('eat food 1')
    time.sleep(2)
    print('eat food 2')

def play():
    print('play 1')
    time.sleep(1)
    print('play 2')

g1 = gevent.spawn(eat)
g2 = gevent.spawn(play)
gevent.joinall([g1, g2])
print('main')
3. Synchronous and asynchronous execution with gevent
from gevent import spawn, joinall, monkey;monkey.patch_all()
import time

def task(pid):
    """
    Some non-deterministic task
    """
    time.sleep(0.5)
    print('Task %s done' % pid)

def synchronous():
    for i in range(10):
        task(i)

def asynchronous():
    g_l = [spawn(task, i) for i in range(10)]
    joinall(g_l)

if __name__ == '__main__':
    print('Synchronous:')
    synchronous()
    print('Asynchronous:')
    asynchronous()

The important part of the program above is wrapping the task function in gevent.spawn, which runs it in a greenlet. The list of initialized greenlets is stored in g_l and passed to gevent.joinall, which blocks the current flow of execution and runs all the given greenlets. Execution only continues after all greenlets have finished.
4. Coroutine application: crawler
from gevent import monkey;monkey.patch_all()
import gevent
import requests
import time

def get_page(url):
    print('GET: %s' % url)
    response = requests.get(url)
    if response.status_code == 200:
        print('%d bytes received from %s' % (len(response.text), url))

start_time = time.time()
gevent.joinall([
    gevent.spawn(get_page, 'https://www.python.org/'),
    gevent.spawn(get_page, 'https://www.yahoo.com/'),
    gevent.spawn(get_page, 'https://github.com/'),
])
stop_time = time.time()
print('run time is %s' % (stop_time - start_time))