Chapter 7 | Concurrent Programming | Coroutines

 

coroutine 

Suppose five tasks run concurrently inside a single thread. A single thread cannot achieve parallelism; concurrency only means that the tasks appear to run at the same time, and in essence it is switching back and forth between tasks while saving their state.

Achieving concurrency in a single thread, that is, switching plus saving state, is exactly the job of a coroutine.

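While the CPU is running a task, it switches away to run another task in two situations (the switch is forced by the operating system): either the task blocks, or the task has been computing for too long or a higher-priority program preempts it.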

  The second kind of switch does not improve efficiency; it only lets the CPU share its time fairly so that all tasks appear to run "at the same time". If the tasks are purely computational, this switching actually reduces efficiency. We can verify this with yield.
yield is itself a way to save the running state of a task within a single thread:
1 yield can save state. Its state saving is very similar to the operating system's saving of thread state, but yield is controlled at the code level and is more lightweight.
2 send can pass the result of one function to another, which switches between tasks within a single thread.
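
As a minimal sketch of the two points above (the name running_total is made up for this illustration), the generator below keeps its local variable alive between calls, and send() both passes a value in and resumes it:

```python
def running_total():
    """Generator that keeps a running total between calls to send()."""
    total = 0
    while True:
        # Execution pauses here; `total` survives until the next send().
        value = yield total
        total += value


gen = running_total()
next(gen)              # prime the generator: run up to the first yield
first = gen.send(10)   # resumes after the yield with value = 10
second = gen.send(5)   # the local variable `total` was preserved
print(first, second)   # 10 15
```

The call to next() is needed once before the first send(), because a freshly created generator has not yet reached a yield that could receive a value.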

  Switching in the first case: when task one encounters I/O, switch to task two, so that the time task one spends blocked is used to carry out task two's computation. This is where the efficiency gain comes from.

yield by itself cannot switch on I/O: a blocking call inside a generator still blocks the whole thread.
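
A small sketch of this limitation, with time.sleep standing in for a blocking I/O call (the task names and the 0.1 second delays are arbitrary choices for the illustration). Both tasks are generators, yet the two sleeps run one after the other, so the total time is roughly the sum of the delays:

```python
import time


def task(name, delay, log):
    log.append(f"{name} start")
    time.sleep(delay)   # blocking call: the whole thread waits here
    yield               # the switch only happens after the sleep returns
    log.append(f"{name} end")


log = []
start = time.perf_counter()
tasks = [task("A", 0.1, log), task("B", 0.1, log)]
for t in tasks:
    next(t)             # run each task up to its yield
for t in tasks:
    next(t, None)       # let each task run to completion
elapsed = time.perf_counter() - start
print(log, round(elapsed, 2))
```

Nothing switches to task B while task A sleeps, because yield is only reached once the blocking call has returned.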



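ps: When introducing process theory we mentioned the three execution states of a process. Since the thread is the actual unit of execution, those states can also be understood as the three states of a thread.
In a single thread we cannot avoid I/O operations in our program. But if our own program (at the user level, not the operating-system level) can make the tasks in that thread switch to another task's computation whenever one task blocks on I/O,
then the thread stays in the ready state as much as possible, that is, in a state where the CPU can run it at any time. In effect we hide our own I/O at the user level as much as possible and thereby mislead the operating system
into seeing a thread that seems to be computing all the time with little I/O, so that it allocates more CPU time to our thread.
The essence of a coroutine is that, within a single thread, the user program itself switches to another task when one task blocks on I/O, in order to improve efficiency. To implement this we need a solution that satisfies both of the following conditions:
  1. It can control switching among multiple tasks, saving a task's state before switching so that the task can later resume from where it paused.
  2. As a supplement to 1: it can detect I/O operations and switch only when an I/O operation is encountered.
A coroutine is a lightweight user-mode thread, that is, its scheduling is controlled by the user program itself.
1. Python threads are kernel-level: their scheduling is controlled by the operating system (if a thread encounters I/O or runs for too long, it is forced to give up the CPU so that another thread can run).
2. When coroutines run inside a single thread, switching on I/O is controlled at the application level (not by the operating system) to improve efficiency (note: switching that is not triggered by I/O does nothing for efficiency).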
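
To make "scheduling controlled by the user program" concrete, here is a minimal sketch of a round-robin scheduler built on plain generators. The names task and run_round_robin are hypothetical, and the sketch covers only the first condition (switching and saving state), not I/O detection:

```python
from collections import deque


def task(name, steps, log):
    """A task that voluntarily gives up control after each step."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # switch point chosen by the program, not by the OS


def run_round_robin(tasks):
    """Run generator-based tasks in one thread until all have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the task up to its next yield
        except StopIteration:
            continue               # task finished; drop it
        ready.append(current)      # otherwise queue it to run again


log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
```

The operating system sees only one thread doing work; the order in which A and B run is decided entirely by the loop in run_round_robin.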

The advantages are as follows:

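1. Switching between coroutines has lower overhead: it is a program-level switch that the operating system is completely unaware of, so it is more lightweight.
2. Concurrency can be achieved within a single thread, making maximum use of the CPU.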

The disadvantages are as follows:

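1. A coroutine runs in a single thread and so cannot use multiple cores by itself. A program can instead start multiple processes, with multiple threads in each process and coroutines in each thread.
2. Coroutines live in a single thread, so once a coroutine blocks, the whole thread is blocked.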

To summarize, the features of coroutines are:

  1. Concurrency is implemented within a single thread
  2. Shared data can be modified without locks
  3. The user program keeps its own context stacks for the multiple control flows
  4. Additionally: a coroutine switches to other coroutines automatically when it encounters an I/O operation (yield and greenlet cannot detect I/O, so the gevent module, which uses a select mechanism, is used for this)
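
The idea behind point 4 can be sketched with the standard library alone. This is not gevent's actual implementation, only an illustration of select-style I/O detection; the functions reader, writer and run are hypothetical. A task yields a socket when it needs to wait for data, and the scheduler resumes it only once that socket is readable:

```python
import selectors
import socket
from collections import deque


def reader(sock, log):
    log.append("reader: waiting for data")
    yield sock                    # ask to be resumed once sock is readable
    log.append("reader: got " + sock.recv(1024).decode())


def writer(sock, log):
    log.append("writer: computing")
    yield                         # plain switch point, no I/O to wait for
    sock.sendall(b"ping")
    log.append("writer: sent ping")


def run(tasks):
    """Hypothetical scheduler: one thread, tasks waiting on I/O are parked."""
    selector = selectors.DefaultSelector()
    ready = deque(tasks)
    waiting = 0
    while ready or waiting:
        if not ready:
            # Every remaining task waits on I/O: block until a socket is readable.
            for key, _ in selector.select():
                selector.unregister(key.fileobj)
                ready.append(key.data)
                waiting -= 1
            continue
        current = ready.popleft()
        try:
            wanted = next(current)
        except StopIteration:
            continue                      # task finished
        if wanted is None:
            ready.append(current)         # ordinary switch: run again later
        else:
            selector.register(wanted, selectors.EVENT_READ, data=current)
            waiting += 1
    selector.close()


log = []
left, right = socket.socketpair()
run([reader(left, log), writer(right, log)])
left.close()
right.close()
print(log)
```

While the reader is parked, the writer gets the thread; the reader continues only after data has arrived, so its recv call does not block.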
 
# Concurrent execution based on yield
import time

def producer():
    g = consumer()
    next(g)                      # prime the generator
    for i in range(10000000):
        g.send(i)                # switch to consumer and pass it a value

def consumer():
    while True:
        res = yield              # pause here until the next send()

start_time = time.time()
producer()
stop_time = time.time()
print(stop_time - start_time)    # computation time + switching time

#serial
import time
def producer():
    res=[]
    for i in range(10000000):
        res.append(i)
    return res

def consumer(res):
    pass

start_time=time.time()
res=producer()
consumer(res)
stop_time=time.time()
print(stop_time - start_time)  # computation time only, no switching
Print:
1.8201038837432861
1.897108793258667

greenlet module

greenlet is only slightly more convenient than yield.

If we have 20 tasks in a single thread, switching between them with yield generators is cumbersome (each generator must first be created and primed, and then send must be called on it). With the greenlet module, switching directly between these 20 tasks becomes very simple.

from greenlet import greenlet
import time

def eat(name):
    print('%s eat 1' % name)
    time.sleep(10)      # greenlet does not detect I/O: the thread blocks here for 10 seconds
    g2.switch('egon')   # first switch to play: pass its argument
    print('%s eat 2' % name)
    g2.switch()         # switch to play again

def play(name):
    print('%s play 1' % name)
    g1.switch()         # switch back to eat
    print('%s play 2' % name)

g1 = greenlet(eat)
g2 = greenlet(play)

g1.switch('egon')  # pass the argument on the first switch

Print:
egon eat 1
egon play 1
egon eat 2
egon play 2

greenlet only provides a more convenient way to switch than generators. If the task it switches to encounters I/O, it blocks in place; greenlet still does not solve the problem of switching automatically on I/O to improve efficiency.

The code of these 20 tasks in a single thread usually contains both computation and blocking operations. When task 1 blocks, we could use the blocking time to run task 2, and so on. Efficiency improves this way, and this is what the gevent module is for.

gevent module

In essence gevent wraps the greenlet module. It can detect I/O and automatically switch to another task when I/O is encountered, which helps us improve efficiency.

from gevent import monkey; monkey.patch_all()  # patch the blocking calls below (time.sleep, socket, ...) so that gevent can recognize them
import gevent
import time

def eat(name):
    print('%s eat 1' % name)
    time.sleep(3)  # on I/O, gevent immediately switches to another task
    print('%s eat 2' % name)

def play(name):
    print('%s play 1' % name)
    time.sleep(4)
    print('%s play 2' % name)

start_time = time.time()
g1 = gevent.spawn(eat, 'egon')  # submit the task asynchronously
g2 = gevent.spawn(play, 'alex')

g1.join()  # wait for the task to finish
g2.join()
stop_time = time.time()
print(stop_time - start_time)

Print:
egon eat 1
alex play 1
egon eat 2
alex play 2
4.0012288093566895

gevent submits tasks asynchronously

from gevent import monkey;monkey.patch_all()
import gevent
import time

def eat(name):
    print('%s eat 1' % name)
    time.sleep(3)
    print('%s eat 2' % name)

def play(name):
    print('%s play 1' % name)
    time.sleep(4)
    print('%s play 2' % name)

g1=gevent.spawn(eat,'egon')
g2=gevent.spawn(play,'alex')

# time.sleep(5)

# g1.join()
# g2.join()

gevent.joinall([g1,g2]) #equivalent to the above two lines of code

Print:
egon eat 1
alex play 1
egon eat 2
alex play 2

Implementation of concurrent socket communication based on gevent module

A single thread handles the I/O of multiple connections.

# Server, based on gevent
from gevent import monkey, spawn; monkey.patch_all()
from socket import *

def communicate(conn):
    while True:
        try:
            data = conn.recv(1024)
            if not data: break
            conn.send(data.upper())
        except ConnectionResetError:
            break
    conn.close()

def server(ip, port):
    server = socket(AF_INET, SOCK_STREAM)
    server.bind((ip, port))
    server.listen(5)

    while True:
        conn, addr = server.accept()
        spawn(communicate, conn)  # only submits the coroutine; it does not run at this point
    server.close()  # never reached: the loop above runs forever

if __name__ == '__main__':
    g = spawn(server, '127.0.0.1', 8090)
    g.join()

# Client

from socket import *
from threading import Thread, current_thread

def client():
    client = socket(AF_INET, SOCK_STREAM)
    client.connect(('127.0.0.1', 8090))

    while True:
        client.send(('%s hello' % current_thread().name).encode('utf-8'))
        data = client.recv(1024)
        print(data.decode('utf-8'))
    client.close()  # never reached: the loop above runs forever

if __name__ == '__main__':
    # Start 500 client threads to put concurrent load on the single-threaded server
    for i in range(500):
        t = Thread(target=client)
        t.start()

 
