About the producer-consumer model

 

What is the producer-consumer model?

In software development we often run into this kind of scenario:
some modules are responsible for producing data, and that data is processed by other modules (a module here may be a function, a thread, a process, and so on). The module that produces the data is called the producer, and the module that processes the data is called the consumer. The buffer between the producer and the consumer acts as a warehouse: the producer delivers goods to the warehouse, and the consumer takes goods out of it. Together these parts make up the producer-consumer model.

The structure diagram is as follows:

[Figure: producer → warehouse (buffer) → consumer]

To make this easier to understand, let's use the example of sending a letter. Suppose you want to send a letter; the general process is as follows:
 1. You write the letter - equivalent to the producer producing data
 2. You put the letter into the mailbox - equivalent to the producer putting data into the buffer
 3. The postman takes the letter out of the mailbox and handles it - equivalent to the consumer taking data out of the buffer and processing it

Advantages of the producer-consumer model

  • Decoupling
    Suppose the producer and the consumer are two threads. If the producer calls a method of the consumer directly, the producer depends on the consumer (i.e. they are coupled), and any future change to the consumer's code may force changes to the producer's code. If instead both depend only on a buffer and not directly on each other, the coupling is reduced accordingly.

For example, consider dropping a letter off at the post office. Without a mailbox (the buffer), you would have to hand the letter to the postman directly. Some readers will say: isn't handing it to the postman easy enough? It is not, because you first have to know who the postman is before you can give him the letter, which creates a dependency between you and the postman (equivalent to strong coupling between producer and consumer). If the postman is replaced someday, you have to get to know the new one (equivalent to a change in the consumer forcing a change in the producer's code). The mailbox, by contrast, is relatively fixed, so the cost of depending on it is low (equivalent to weak coupling to the buffer).

  • Concurrency
    Since the producer and the consumer run concurrently and independently, communicating only through the buffer, the producer just drops data into the buffer and moves on to producing the next item, while the consumer just takes data out of the buffer. Neither is blocked by the other's processing speed.

Continuing the example: without the mailbox, you would have to wait at the post office until the postman returns and hand the letter to him, unable to do anything else in the meantime (the producer is blocked). Alternatively, the postman would have to go door to door asking who has a letter to send (equivalent to the consumer polling).

  • Tolerance of uneven production and consumption rates
    When the producer generates data faster than the consumer can process it, the unprocessed data is held temporarily in the buffer and worked through gradually. Data is not lost because of the consumer's limited throughput, nor is the producer's output held up.

Take the letter example once more. Suppose the postman can carry only 1,000 letters at a time, but around Valentine's Day (or Christmas) more than 1,000 greeting cards need to be sent. This is where the mailbox, acting as a buffer, comes in handy: the letters the postman cannot take on this trip stay in the mailbox until he comes by next time.
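
In Python this bounded "mailbox" maps naturally onto a queue.Queue with a maxsize. The snippet below is only a minimal sketch of that idea (the names mailbox and letter are made up for illustration):

import queue

# A bounded "mailbox" that holds at most 1,000 letters.
# put() blocks when the mailbox is full, so a fast producer is throttled
# instead of losing data; get() blocks while the mailbox is empty.
mailbox = queue.Queue(maxsize=1000)

mailbox.put("greeting card")   # the producer drops a letter in
letter = mailbox.get()         # the consumer takes it out later
print(letter)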

With the introduction above, the idea behind the producer-consumer model should now be clear.

Multithreaded programming in Python

Before implementing the producer-consumer model, let's learn about multi-threaded programming in Python.
A thread is the unit of execution supported directly by the operating system, and high-level languages usually have built-in multithreading support; Python is no exception. Python threads are real POSIX threads, not simulated ones.
Python's standard library provides two modules: _thread and threading. _thread is the low-level module, and threading is a higher-level module that wraps _thread. In most cases we only need the higher-level threading module.

Let's first look at a piece of code that implements multithreading in Python.

import time, threading

# Thread code: the task runs inside its own thread
class TaskThread(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self, name=name)

    def run(self):
        print('thread %s is running...' % self.getName())

        for i in range(6):
            print('thread %s >>> %s' % (self.getName(), i))
            time.sleep(1)

        print('thread %s finished.' % self.getName())

taskthread = TaskThread('TaskThread')
taskthread.start()
taskthread.join()

The following is the execution result of the program:

thread TaskThread is running...
thread TaskThread >>> 0
thread TaskThread >>> 1
thread TaskThread >>> 2
thread TaskThread >>> 3
thread TaskThread >>> 4
thread TaskThread >>> 5
thread TaskThread finished.

The TaskThread class inherits from the Thread class in the threading module. The name parameter of the constructor sets the thread's name, and the actual work is done by overriding the run method of the base class.
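
Subclassing Thread is not the only option: threading.Thread also accepts a target callable with args, which is often enough for simple jobs. A minimal sketch of the same task written that way (the task function here is made up for illustration):

import time, threading

def task(name, count):
    # The same job as TaskThread.run, written as a plain function.
    print('thread %s is running...' % name)
    for i in range(count):
        print('thread %s >>> %s' % (name, i))
        time.sleep(1)
    print('thread %s finished.' % name)

t = threading.Thread(target=task, name='TaskThread', args=('TaskThread', 6))
t.start()
t.join()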

After a brief familiarity with Python threads, let's implement a producer-consumer model.

from queue import Queue
import random, threading, time

# Producer class
class Producer(threading.Thread):
    def __init__(self, name, queue):
        threading.Thread.__init__(self, name=name)
        self.data = queue

    def run(self):
        for i in range(5):
            print("%s is producing %d to the queue!" % (self.getName(), i))
            self.data.put(i)
            time.sleep(random.randrange(10)/5)
        print("%s finished!" % self.getName())

# Consumer class
class Consumer(threading.Thread):
    def __init__(self, name, queue):
        threading.Thread.__init__(self, name=name)
        self.data = queue
    def run(self):
        for i in range(5):
            val = self.data.get()
            print("%s is consuming. %d in the queue is consumed!" % (self.getName(),val))
            time.sleep(random.randrange(10))
        print("%s finished!" % self.getName())

def main():
    queue = Queue()
    producer = Producer('Producer', queue)
    consumer = Consumer('Consumer', queue)

    producer.start()
    consumer.start()

    producer.join()
    consumer.join()
    print('All threads finished!')

if __name__ == '__main__':
    main()

The execution result may be as follows:

Producer is producing 0 to the queue!
Consumer is consuming. 0 in the queue is consumed!
Producer is producing 1 to the queue!
Producer is producing 2 to the queue!
Consumer is consuming. 1 in the queue is consumed!
Consumer is consuming. 2 in the queue is consumed!
Producer is producing 3 to the queue!
Producer is producing 4 to the queue!
Producer finished!
Consumer is consuming. 3 in the queue is consumed!
Consumer is consuming. 4 in the queue is consumed!
Consumer finished!
All threads finished!

Because threads are scheduled preemptively, the printed output may not match the above exactly.
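
If you also want the main thread to wait until every produced item has actually been processed (rather than only waiting for the worker threads to exit), queue.Queue offers task_done() and join() for exactly that. A minimal sketch of the pattern, kept separate from the example above:

import threading, queue

q = queue.Queue()

def consumer():
    while True:
        item = q.get()
        print('consumed', item)
        q.task_done()          # mark this item as fully processed

threading.Thread(target=consumer, daemon=True).start()

for i in range(5):             # the producer side
    q.put(i)

q.join()                       # blocks until every put() has a matching task_done()
print('All items processed!')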

 

In Python, the producer-consumer model can be implemented in roughly two ways (as far as my own experience goes). One combines multiple processes or multiple threads with a cache (buffer space); the caching method is not unique, so pick whatever suits your own business. The other uses yield (also known as the single-threaded approach). A few examples follow to illustrate:

Coroutine version (the example above was a thread plus queue implementation):

def producer(c):
    # The producer makes an item, then yields control to the consumer
    c.send(None)    # first call c.send(None) to start the generator
    n = 0
    while n < 5:
        n = n + 1
        print("[Producer] Producing %s..." % n)
        r = c.send(n)   # once something is produced, switch to the consumer via c.send(n)
        print("[Producer] Consumer return: %s" % r)
    c.close()

def consumer():
    r = ''
    while True:
        n = yield r
        if not n:
            return
        print('[Consumer] Consuming %s...' % n)
        r = '200 OK'

c = consumer()  # create the generator object
producer(c)     # pass the generator object into producer()

Code interpretation:

   1. c = consumer() does not call the function consumer(); it creates a generator object.
   2. producer(c) passes that generator object into the producer() function.
   3. Execution then reaches c.send(None), which is equivalent to c.__next__():
      control jumps into consumer(), runs until the first yield, and returns the result ''.
   4. Control comes back to the line after c.send(None) and continues with n = 0.
      Since n < 5 holds, the loop is entered, n becomes 1, and print("[Producer] Producing %s..." % n)
      prints the first line of output: [Producer] Producing 1...
      Execution continues with r = c.send(n), where n = 1, and control switches into consumer().
      Because a generator remembers where the last yield paused, n = yield r in consumer()
      now receives the sent value, i.e. n = 1. The test if not n: is false,
      so print('[Consumer] Consuming %s...' % n) prints the second line of output: [Consumer] Consuming 1...
      Control then returns to print("[Producer] Consumer return: %s" % r) in producer(c),
      which prints the third line of output: [Producer] Consumer return: 200 OK
      (the bare generator sketch after this list isolates this send/yield hand-off).
   5. The later rounds follow the same steps, until the producer decides to stop producing
      and closes the consumer with c.close(), which ends the whole run.
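
To isolate the send/yield hand-off described in step 4, here is a bare-bones sketch of the generator protocol on its own (the echo generator is made up purely for illustration):

def echo():
    r = 'ready'
    while True:
        n = yield r            # pause here; send(value) resumes with n = value
        r = 'got %s' % n

g = echo()
print(g.send(None))            # prime the generator: runs to the first yield, prints 'ready'
print(g.send(1))               # resumes at the yield with n = 1, prints 'got 1'
print(g.send(2))               # n = 2, prints 'got 2'
g.close()                      # raises GeneratorExit inside the generator, ending it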

Process plus queue:

from multiprocessing import Process, Queue
import time, random, os


def consumer(q):
    while True:
        res = q.get()
        if res is None: break  # stop when the end signal is received
        time.sleep(random.randint(1, 3))
        print('\033[45m%s ate %s\033[0m' % (os.getpid(), res))


def producer(name, q):
    for i in range(2):
        time.sleep(random.randint(1, 3))
        res = '%s%s' % (name, i)
        q.put(res)
        print('\033[44m%s produced %s\033[0m' % (os.getpid(), res))


if __name__ == '__main__':
    q = Queue()
    # Producers: the cooks
    p1 = Process(target=producer, args=('baozi', q))
    p2 = Process(target=producer, args=('bone', q))
    p3 = Process(target=producer, args=('swill', q))

    # Consumers: the eaters
    c1 = Process(target=consumer, args=(q,))
    c2 = Process(target=consumer, args=(q,))

    # start everything
    p1.start()
    p2.start()
    p3.start()
    c1.start()
    c2.start()

    p1.join()  # the end signals must only be sent after all producers have finished
    p2.join()
    p3.join()
    q.put(None)  # one None end signal per consumer
    q.put(None)  # second end signal
    print('Main process done')
# one end signal per consumer is needed - rather clumsy
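
One way around sending an end signal per consumer (assuming you are willing to make the consumers daemon processes) is multiprocessing.JoinableQueue, whose task_done()/join() pair lets the producers wait until every item they put has been handled. This is only a sketch of that alternative, not the original author's code:

from multiprocessing import Process, JoinableQueue
import time, random, os

def consumer(q):
    while True:
        res = q.get()
        time.sleep(random.randint(1, 3))
        print('%s ate %s' % (os.getpid(), res))
        q.task_done()                      # tell the queue this item is fully handled

def producer(name, q):
    for i in range(2):
        time.sleep(random.randint(1, 3))
        res = '%s%s' % (name, i)
        q.put(res)
        print('%s produced %s' % (os.getpid(), res))
    q.join()                               # wait until everything in the queue has been marked done

if __name__ == '__main__':
    q = JoinableQueue()
    producers = [Process(target=producer, args=(name, q)) for name in ('baozi', 'bone')]
    consumers = [Process(target=consumer, args=(q,)) for _ in range(2)]
    for c in consumers:
        c.daemon = True                    # daemon consumers exit together with the main process
        c.start()
    for p in producers:
        p.start()
    for p in producers:
        p.join()                           # returns only after the producers' q.join() calls return
    print('Main process done')             # no None end signals needed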

Notes on the Queue operations:

q.put() inserts an item into the queue. put() also takes two optional parameters, block and timeout. If block is True (the default) and timeout is a positive number, the call blocks for at most timeout seconds waiting for a free slot and raises queue.Full if it times out. If block is False and the queue is already full, queue.Full is raised immediately.
q.get() reads one item from the queue and removes it. get() likewise takes the optional block and timeout parameters. If block is True (the default) and timeout is a positive number, queue.Empty is raised when no item is obtained within the wait time. If block is False there are two cases: if an item is available it is returned immediately; otherwise the queue is empty and queue.Empty is raised immediately.

q.get_nowait(): same as q.get(False)
q.put_nowait(item): same as q.put(item, False)

q.empty(): returns True if the queue is empty at the moment of the call; the result is unreliable, since an item may be added to the queue while True is being returned.
q.full(): returns True if the queue is full at the moment of the call; the result is unreliable, since an item may be taken from the queue while True is being returned.
q.qsize(): returns the number of items currently in the queue; the result is unreliable for the same reasons as q.empty() and q.full().
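
The same block/timeout semantics also apply to the thread-safe queue.Queue. Below is a short sketch of the non-blocking variants described above, using queue.Queue so the outcome is deterministic:

import queue

q = queue.Queue(maxsize=1)
q.put_nowait('only item')       # same as q.put('only item', block=False)

try:
    q.put_nowait('overflow')    # the queue is full, so queue.Full is raised immediately
except queue.Full:
    print('queue is full')

print(q.get_nowait())           # same as q.get(block=False); returns 'only item'

try:
    q.get_nowait()              # the queue is now empty, so queue.Empty is raised immediately
except queue.Empty:
    print('queue is empty')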

Summary

This article has implemented a simple producer-consumer model in Python. Because Python's queue module already provides thread synchronization, the article does not go into multithreading issues such as locks, synchronization, and deadlock.

PS: Careful readers may have noticed that the thread-based example above uses only a single producer thread and a single consumer thread, so why not use more threads? It is not that you cannot, but multithreading brings many safety issues of its own. For more on thread safety, see: https://blog.csdn.net/weixin_43790276/article/details/91069959

Reference and thanks:

https://www.cnblogs.com/earon/p/9601075.html

https://blog.csdn.net/weixin_42471384/article/details/82625657

https://blog.csdn.net/darkdragonking/article/details/89208124

 
