[Best practice] Double buffering in the producer-consumer model

[What this article covers]

    This article introduces an efficient solution for the producer-consumer model when a large amount of data has to flow between the producer and the consumer.

 

[Introducing the problem]

1. Problem scenario

    Among design patterns, the producer-consumer pattern surely ranks near the top: in actual development you need it again and again.

 

    Books on design patterns usually explain the producer-consumer pattern only from an abstract perspective; when you face a real programming problem, you still have to analyze that specific problem in detail.

    Here is a usage scenario; you can stop and think about how you would solve it:

    You need to implement a logging library. Every thread calls the library's API functions to write log messages, and all log messages must be persisted to the local file system.

 

    Let's analyze what problems need to be solved:

(1) Multiple threads call the logging library's API, so the API functions must be thread-safe.

(2) The rate at which log messages arrive cannot be predicted, so both peaks and troughs have to be handled.

(3) Log messages must be persisted to the local file system, which involves file operations, and file I/O is extremely slow compared with the CPU.

(4) Each log message should not be written to the file immediately; it should be cached in memory and flushed to the file once the accumulated data reaches a certain size (flushing periodically should of course also be considered). A minimal sketch of such an API follows this list.
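    To make these requirements concrete, here is a minimal sketch of what the library's public surface might look like. The names (`Logger`, `append`, `flush`) are assumptions for illustration only, not an API defined in this article:

```cpp
#include <string>

// Hypothetical public interface of the logging library (names assumed).
// Every method must be safe to call from multiple threads.
class Logger {
public:
    // Called by any thread; must be thread-safe and cheap, because the
    // caller should never wait for disk I/O (requirements 1, 3 and 4).
    void append(const std::string& line);

    // Force buffered messages to disk; also called periodically by a
    // background thread (requirement 4).
    void flush();
};
```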

 

2. Solutions

    Let's go back to the producer-consumer pattern. When books introduce this pattern, they generally describe a synchronous mode, namely:

        After producing a piece of data, the producer notifies the consumer and then waits for that data to be "consumed";

        After receiving the producer's notification, the consumer "consumes" the data and then notifies the producer to continue producing.

    Production and consumption execute strictly in alternation, which is why I call it synchronous mode; a minimal sketch of this mode is shown below.
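    As an illustration, here is a minimal sketch of this synchronous mode, assuming a single one-slot hand-off guarded by a condition variable (the names are mine, used only for illustration):

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <string>
#include <thread>

// One-slot "synchronous mode": the producer hands over a single item and
// waits until the consumer has taken it before producing the next one.
std::mutex m;
std::condition_variable cv;
std::optional<std::string> slot;   // empty means "consumed, produce the next one"

void producer() {
    for (int i = 0; i < 3; ++i) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !slot.has_value(); });  // wait until consumed
        slot = "log line " + std::to_string(i);           // produce one item
        cv.notify_all();                                  // notify the consumer
    }
}

void consumer() {
    for (int i = 0; i < 3; ++i) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return slot.has_value(); });   // wait for data
        std::cout << *slot << '\n';                       // "consume" the data
        slot.reset();
        cv.notify_all();                                  // let the producer continue
    }
}

int main() {
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
}
```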

    However, the log system described above obviously cannot use the synchronous mode, because neither the log generation rate nor the file writing rate can be estimated: in one scenario logs may be generated very quickly, in another extremely slowly.

 

    For a requirement like this, the speed of the producer (generating logs) and the speed of the consumer (writing logs to the file) do not match, so they clearly should run on different threads. At this point, did you immediately think of buffering the data with a message queue to solve the speed mismatch? Like this:

    Producer: insert log messages at the head of the queue (enqueue)

    Consumer: read log messages from the tail of the queue (dequeue)

 

    Of course, since the two sides are different threads operating on the same queue, you also realize that a lock (mutex) is needed to protect the message queue. It looks like this:
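    As a minimal sketch of this arrangement, here is a queue protected by a single mutex (a simplified illustration, not a production implementation; blocking when the queue is empty is omitted):

```cpp
#include <mutex>
#include <queue>
#include <string>

// One shared queue protected by one mutex: every enqueue (producer side)
// and every dequeue (consumer side) must take the same lock.
class LogQueue {
public:
    void push(std::string line) {                  // producer: enqueue
        std::lock_guard<std::mutex> lock(mtx_);
        queue_.push(std::move(line));
    }

    bool pop(std::string& line) {                  // consumer: dequeue
        std::lock_guard<std::mutex> lock(mtx_);
        if (queue_.empty()) return false;
        line = std::move(queue_.front());
        queue_.pop();
        return true;
    }

private:
    std::mutex mtx_;
    std::queue<std::string> queue_;
};
```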

 

    It looks perfect, but there is a problem: every time the consumer reads one log message from the queue it writes it to the file system, and that file write is very time-consuming. Fetching data from the queue this frequently, and taking the lock every time, inevitably hurts the producer's efficiency, because the producer also has to lock the queue to insert a log message; if the queue happens to be locked by the consumer at that moment, the producer can only wait sadly~~ This greatly reduces the overall throughput of the log system.

 

3. Use double buffering

    Since the consumer's file writing is relatively slow and must not hold back the producer's writes, we can use two message queues to hold, separately, the log messages being written and the log messages being read; this is the so-called "double buffering" technique. Pictured, it looks like this:

 

 

    Note that in the figure above, the memory used to cache the logs does not have to be a message queue: log messages are usually strings, so a contiguous block of heap or stack memory is enough, which is why the figure labels the two regions "Buffer 1" and "Buffer 2".

    

    In this model, the producer writes log messages into Buffer 1, while the consumer reads log messages from Buffer 2. No matter how slow the consumer's file writing is, it no longer affects the producer's log generation. A sketch of this layout is shown below.
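    Here is a minimal sketch of such a double-buffer layout, assuming plain `std::string` buffers and a single mutex (the class name and members are mine, used only for illustration):

```cpp
#include <fstream>
#include <mutex>
#include <string>

// Two buffers: producers append to the "write" buffer, while the consumer
// only ever touches the "flush" buffer, so the slow file I/O never blocks
// the threads that are generating log messages.
class DoubleBuffer {
public:
    void append(const std::string& line) {   // producer side: cheap in-memory append
        std::lock_guard<std::mutex> lock(mtx_);
        write_buf_ += line;
        write_buf_ += '\n';
    }

    // Consumer-side hand-over; two alternative implementations are
    // sketched in section 4 below.
    void copy_to_flush();                     // naive copy version
    void flush_to(std::ofstream& file);       // swap version

private:
    std::mutex mtx_;          // protects write_buf_ and the hand-over
    std::string write_buf_;   // "Buffer 1": currently being filled
    std::string flush_buf_;   // "Buffer 2": currently being written to the file
};
```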

    

    At this point you will surely ask: Buffer 1 and Buffer 2 are two independent memory regions, so how do we get the contents of Buffer 1 "over to" Buffer 2 once Buffer 1 is full?

    Good question; this is also the key to achieving high throughput in the log system!

 

4. Buffer Exchange

    The most intuitive idea is to copy the contents of Buffer 1 into Buffer 2 with memcpy or a similar function at some chosen moment (for example, when Buffer 1 is full and Buffer 2 is empty, or on a timer). Of course, both buffers must be locked while the data is copied, and the copy or move must be completed inside the critical section as quickly as possible so that it disturbs producers and consumers as little as possible. But if the amount of data is large, this move is still quite time-consuming.
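    Continuing the hypothetical `DoubleBuffer` sketch above, the naive hand-over would look roughly like this; note that the bulk copy runs inside the critical section, so producers are blocked for a time proportional to the amount of buffered data:

```cpp
// Naive hand-over: copy everything while holding the lock.
// The cost of the copy grows with the number of buffered bytes,
// and producers cannot append for that entire duration.
void DoubleBuffer::copy_to_flush() {
    std::lock_guard<std::mutex> lock(mtx_);
    flush_buf_.append(write_buf_);   // bulk copy of all buffered data
    write_buf_.clear();
}
```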

 

    Think about it again: what we actually need is not a real move operation, but simply a place for the producer to put the data it generates and a place for the consumer to read data from. Any approach that achieves this goal is fine.

 

    A better way is to exchange the addresses of the two buffers directly. We only need to re-point the pointer the producer writes through from Buffer 1 to Buffer 2, and re-point the pointer the consumer reads through from Buffer 2 to Buffer 1, and the two buffers are effectively swapped.

    Before the swap: the producer writes logs into Buffer 1, and the consumer reads logs from Buffer 2.

    After the swap: the producer writes logs into Buffer 2, and the consumer reads logs from Buffer 1.

 

    The swap operation itself still needs to lock both buffers, but all the critical section does is exchange which buffer each pointer refers to, so it executes very quickly.

    At the language level: in C this means exchanging two pointer-sized addresses (4 bytes on a 32-bit platform, 8 bytes on a 64-bit one); in C++ the containers' swap member functions can be used, as sketched below.
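    Here is a sketch of the swap-based hand-over, again continuing the hypothetical `DoubleBuffer` above and assuming a single consumer thread: only the buffers' internal pointers are exchanged inside the critical section, and the slow file write happens after the lock is released:

```cpp
// Swap-based hand-over: the critical section only exchanges the two
// buffers' internal pointers (std::string::swap is O(1)); the slow
// file write then runs outside the lock, so producers keep appending.
void DoubleBuffer::flush_to(std::ofstream& file) {
    {
        std::lock_guard<std::mutex> lock(mtx_);
        write_buf_.swap(flush_buf_);   // "exchange the two addresses"
    }                                  // lock released here
    file << flush_buf_;                // slow I/O; only the consumer touches flush_buf_
    file.flush();
    flush_buf_.clear();                // empty it, ready for the next swap
}
```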

    A picture makes this easier to understand:

 

    In the figure, the left side shows the state before the swap and the right side shows the state after it. As you can see, the producer and the consumer operate on different buffers at all times, so they never interfere with each other, and the goal of exchanging the contents quickly is also achieved.

    In an actual test, a log system implemented with this double-buffering technique achieved a throughput far higher than that of many open-source logging libraries. If you are interested, you can benchmark it yourself.

 

【Summary】

    At this point, what I wanted to express is basically covered.

    You may still have some questions, such as: when exactly should the buffers be swapped? What should be done when the write buffer is full? Those belong to another topic.

    In this real-world scenario, the double-buffering technique handles the asynchronous operation and the speed mismatch between producer and consumer, and improves the overall throughput of the log system.

 

[Wrapping up]

I hope this article has been of some help to you.

Comments and reposts are, of course, welcome~~~

I will keep sharing best practices from embedded development; welcome to follow the WeChat public account: IOT Internet of Things Town
