Essay-How does the message queue thread pool model ensure that messages are not lost when restarting

Background

I came across an interesting post on Maimai today.

The gist of the post: when using Kafka with multiple partitions, how can we increase consumption capacity? And if we use a thread pool to do so, how do we make sure messages are not lost on restart?

The post really asks two things. The first is how to increase consumption capacity; the second is how to keep messages from being lost once we choose a thread pool.

Let me first explain where these two problems come from. Many message queues have a concept called a partition, and partitions are the key to increasing consumption capacity. Consumers read from partitions, and within a consumer group each partition can be held by only one consumer.
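The one-consumer-per-partition rule can be sketched with a toy assignment function. This is only an illustration under a simplifying assumption (plain round-robin); the class and method names are hypothetical, and real brokers use their own assignment strategies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any Kafka or RocketMQ API.
class PartitionAssignment {
    // Round-robin: partition p is owned by consumer (p % consumers).
    // Each partition ends up with exactly one owner.
    static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> owned = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            owned.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            owned.get(p % consumers).add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, 2)); // [[0, 2], [1, 3]]
        System.out.println(assign(2, 3)); // [[0], [1], []]
    }
}
```

The second call shows why adding consumers alone is not enough: with two partitions and three consumers, one consumer has nothing to read, so partitions have to grow along with the consumers.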

It is a bit like queuing at a bank: the more queues there are, the shorter the wait. Of course, messages can also be processed asynchronously, for example by throwing them all into a thread pool for execution. This leads to the second question the poster raised. First, let's look at why synchronous consumption does not lose messages.

With a synchronous model, we ack the offset only after we have consumed the message. If we restart before the ack succeeds, that part of the data is simply consumed again. But how do we ack when consuming with a thread pool? Suppose we hand messages 10, 11, and 12 to the thread pool and 12 finishes first. Do we ack 13? If we do and then restart, Kafka will assume that 10 and 11 have already been processed, and those messages are lost. This is what confused the poster.
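To make the failure concrete, here is a minimal simulation of the naive rule "ack the offset after whichever message finishes". It does not use the Kafka client; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical simulation, not Kafka client code.
class NaiveAckDemo {
    // Naive rule: when a message finishes, ack its offset + 1 right away.
    static long naiveAck(long finishedOffset) {
        return finishedOffset + 1;
    }

    // Messages that were fetched, never finished, and sit below the acked
    // offset: after a restart they would not be delivered again.
    static List<Long> lostOnRestart(List<Long> fetched, Set<Long> finished, long acked) {
        List<Long> lost = new ArrayList<>();
        for (long offset : fetched) {
            if (!finished.contains(offset) && offset < acked) {
                lost.add(offset);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        List<Long> fetched = List.of(10L, 11L, 12L);
        long acked = naiveAck(12L); // 12 happens to finish first
        System.out.println(acked);  // 13
        System.out.println(lostOnRestart(fetched, Set.of(12L), acked)); // [10, 11]
    }
}
```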

Netizens' answers

Let's take a look at some answers from netizens:

Netizen A:

This netizen's answer essentially still uses a thread pool. The original poster replied as well, but the answer does not solve the thread pool problem.

Netizen B:

This approach is like the bank queue: the more queues there are, the faster messages get processed. It is indeed one of the solutions to the first problem.

Netizen C:

Answers of this kind mainly address the second problem. By maintaining the offset externally, for example by storing it in a database, we can always find the correct offset to resume from. This is relatively complicated, though: using an MQ now also requires a database. If my service uses an MQ but has no database at all, I have to apply for one separately.

Netizen D:

Another point of view is that better-written code consumes faster, so consumption capacity rises naturally. This is a very important point that is often overlooked. When consumption is slow, people tend to start tuning the middleware and ignore their own code.

After reading through all these replies, I did not find an answer that really satisfied me. Here are some of my own ideas.

My thoughts

On the first question, how to increase consumption capacity, the options can be summarized in three ways:

  1. If the number of consumption threads on each consumer machine is fixed, we can add consumer machines and partitions, much like a bank opening more windows.
  2. If the machines and partitions are fixed, adding consumption threads is the better option. For sequential consumption, however, adding threads does not help, because each partition is consumed by a single thread; only the first approach applies.
  3. Improve the performance of your own consumption code. If a bank teller works much more efficiently, the whole queue moves faster.

For the second question, how to avoid message loss with the thread pool model, I recommend the approach used in RocketMQ. As noted above, saving offsets in a database is complicated and performs relatively poorly. RocketMQ instead uses a TreeMap to do the job of that database:
private final TreeMap<Long, MessageExt> msgTreeMap = new TreeMap<Long, MessageExt>();

The key of the TreeMap is the offset of each message, and the value is the message itself. TreeMap is implemented as a red-black tree, so the minimum and maximum keys can be found quickly. Whenever messages have been processed, they are removed from msgTreeMap:

public long removeMessage(final List<MessageExt> msgs) {
    // -1 is returned when the map was already empty on entry
    long result = -1;
    final long now = System.currentTimeMillis();
    try {
        this.lockTreeMap.writeLock().lockInterruptibly();
        this.lastConsumeTimestamp = now;
        try {
            if (!msgTreeMap.isEmpty()) {
                // Default: if this call empties the map, everything up to
                // queueOffsetMax has been consumed
                result = this.queueOffsetMax + 1;
                int removedCnt = 0;
                for (MessageExt msg : msgs) {
                    MessageExt prev = msgTreeMap.remove(msg.getQueueOffset());
                    if (prev != null) {
                        // Kept negative so it can be added to the counter below
                        removedCnt--;
                        msgSize.addAndGet(0 - msg.getBody().length);
                    }
                }
                msgCount.addAndGet(removedCnt);

                // Messages still pending: report the smallest remaining offset
                if (!msgTreeMap.isEmpty()) {
                    result = msgTreeMap.firstKey();
                }
            }
        } finally {
            this.lockTreeMap.writeLock().unlock();
        }
    } catch (Throwable t) {
        log.error("removeMessage exception", t);
    }
    return result;
}

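The removeMessage method removes the messages that have been consumed and returns the offset to report. If unprocessed messages remain, the result is msgTreeMap.firstKey(), the smallest offset still being processed; if the map has been emptied, it is queueOffsetMax + 1. This is the value we ack to the message queue server. Back to our problem: with messages 10, 11, and 12, if 12 finishes first, the acked offset is still 10, so after a restart 10 and 11 are delivered again instead of being lost. The price is that 12 may be consumed twice, so the consumption logic should be idempotent.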
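The same idea can be sketched as a small standalone class. This is a simplified illustration, not RocketMQ's code: OffsetTracker, add, and complete are hypothetical names, the payload is a plain String, and synchronized stands in for the read-write lock used above.

```java
import java.util.TreeMap;

// Hypothetical, simplified tracker inspired by the TreeMap approach above.
class OffsetTracker {
    // offset -> payload, for every message handed to the thread pool
    private final TreeMap<Long, String> inFlight = new TreeMap<>();
    private long maxOffset = -1;

    // Called before a message is submitted to the thread pool.
    synchronized void add(long offset, String payload) {
        inFlight.put(offset, payload);
        maxOffset = Math.max(maxOffset, offset);
    }

    // Called by a worker when a message is done. Returns the offset that is
    // safe to ack: the smallest one still in flight, or maxOffset + 1 when
    // nothing is pending.
    synchronized long complete(long offset) {
        inFlight.remove(offset);
        return inFlight.isEmpty() ? maxOffset + 1 : inFlight.firstKey();
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker();
        for (long o = 10; o <= 12; o++) {
            tracker.add(o, "msg-" + o);
        }
        System.out.println(tracker.complete(12)); // 10: 10 and 11 still pending
        System.out.println(tracker.complete(10)); // 11
        System.out.println(tracker.complete(11)); // 13: everything is done
    }
}
```

Whatever order the workers finish in, the acked offset never moves past a message that is still being processed.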

Finally

This was only a brief introduction to improving a message queue's consumption capacity. If you are interested in message queues, you can read some of my previous articles:

  • Kafka you must know
  • RocketMQ you should know
  • In-depth understanding of the use, principle and optimization of RocketMQ common messages and sequential messages
  • In-depth analysis of how to implement transaction messages.

If you think this article is helpful to you, your attention and forwarding are my greatest support.

Origin blog.51cto.com/14980978/2544571