Do you really know how to implement a delay queue?

Author: xiewang, Operations Development Engineer, Tencent IEG

Preface

The delay queue is a technical solution we regularly encounter and need in day-to-day development. While working on a business requirement some time ago, I ran into a scenario that called for a delayed message queue, so I investigated a series of different delay queue implementations. This article summarizes that investigation and shares it with you.


Delay queue definition

First of all, I believe everyone is familiar with the queue data structure: it is first-in, first-out (FIFO). The elements in an ordinary queue are ordered, and the element that entered the queue first is taken out and consumed first.

The biggest difference between a delay queue and an ordinary queue is the delay attribute. An ordinary queue processes elements in the order they were enqueued, while each element in a delay queue is given a delay time when it is enqueued, indicating that it should only be processed after that amount of time has passed. In a sense, the structure of a delay queue is less like a queue and more like an ordered heap weighted by time.
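Incidentally, the JDK already ships exactly such a structure: java.util.concurrent.DelayQueue, an unbounded blocking queue from which an element can only be taken once its delay has expired. The minimal sketch below (the DelayedTask class and the millisecond delays are made up for illustration) shows elements coming out in due-time order rather than insertion order:

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// A task that becomes available `delayMs` milliseconds after creation.
class DelayedTask implements Delayed {
    final String name;
    final long fireTime; // absolute due time in milliseconds

    DelayedTask(String name, long delayMs) {
        this.name = name;
        this.fireTime = System.currentTimeMillis() + delayMs;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(fireTime - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                            other.getDelay(TimeUnit.MILLISECONDS));
    }
}

public class DelayQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        DelayQueue<DelayedTask> queue = new DelayQueue<>();
        // Enqueue out of order: take() hands tasks out by due time, not insertion order.
        queue.put(new DelayedTask("task-B", 200));
        queue.put(new DelayedTask("task-A", 100));
        System.out.println(queue.take().name); // blocks until due; task-A comes out first
        System.out.println(queue.take().name); // then task-B
    }
}
```

This in-process queue is exactly the "ordered heap weighted by time" described above, but it lives in a single JVM's memory; the rest of this article is about solutions that survive restarts and scale across machines.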

Application scenarios

The scenario I ran into while developing a business requirement was this: users can subscribe to different WeChat or QQ template messages in a mini program, and product colleagues can create a message push plan on the mini program's management console; when the specified point in time arrives, the message is pushed to every user subscribed to that template.

If the feature only had to serve a single mini program, a scheduled task, or even manually triggered execution, would be the quickest and most convenient way to meet the requirement. But since we want to abstract a message subscription module as a service for all businesses to use, a general system solution is needed, and that is where the delay queue comes in.

Beyond the requirement I described above, delay queues actually have very broad application scenarios, for example:

  1. A newly created order will be automatically cancelled if the user fails to pay within 15 minutes.

  2. A company meeting reservation system notifies all users who booked a meeting half an hour before the meeting starts.

  3. If a security ticket is not handled within 24 hours, a company WeChat group is automatically created to remind the person responsible.

  4. After a user places a takeout order, the delivery rider is reminded 10 minutes before the delivery deadline that the order is about to time out.

For scenarios where the data volume is relatively small and the timeliness requirements are not strict, a fairly simple approach is to poll the database, for example scanning all the rows every second and processing whatever has expired. If I were the developer of an internal company meeting reservation system, I might well use this solution, because the data volume of the whole system would not be large, and the difference between a reminder 30 minutes before the meeting and one 29 minutes before is negligible.

But when the volume of data is large and the real-time requirements are strict, say Taobao automatically cancelling every new order not paid within 15 minutes, at a scale of millions or even tens of millions of orders, polling the database will get you beaten up by your boss, and if not by your boss, then by your ops colleagues.

In this kind of scenario we need today's protagonist: the delay queue. Delay queues give us an efficient way to process a large number of messages that need to be delayed. So without further ado, let's look at several common delay queue implementations and their respective strengths and weaknesses.

Implementation plan

Redis ZSet

We know that Redis has an ordered-set data structure, ZSet. Every element in a ZSet has a corresponding score, and all elements in a ZSet are sorted by their score.

We can therefore implement a delay queue with Redis's ZSet through the following operations:

  1. Enqueue operation: ZADD KEY timestamp task. We add a task to the ZSet with the timestamp at which it should be processed as its score. The time complexity of Redis's ZADD is O(logN), where N is the number of elements in the ZSet, so the enqueue operation is relatively efficient.

  2. A process periodically (for example, every second) queries the ZSet for the element with the smallest score using ZRANGEBYSCORE. The concrete command is: ZRANGEBYSCORE KEY -inf +inf LIMIT 0 1 WITHSCORES. There are two possible outcomes:

    a. The score returned is less than or equal to the current timestamp, meaning the task is due, so it is dispatched for asynchronous processing;

    b. The score returned is greater than the current timestamp. Since the query fetched the element with the smallest score, none of the tasks in the ZSet is due yet, so the process sleeps for one second and then queries again;

    Likewise, the time complexity of ZRANGEBYSCORE is O(logN + M), where N is the number of elements in the ZSet and M is the number of elements returned, so the periodic polling operation is also relatively efficient.
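To make the two operations concrete, here is a minimal in-memory sketch of the same logic. The ConcurrentSkipListMap stands in for the ZSet (score → member) so the example is self-contained; against a real Redis you would issue the ZADD and ZRANGEBYSCORE commands above through a client library instead, and the class and method names here are purely illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// In-memory stand-in for the Redis ZSet: keys are scores (due timestamps in ms),
// values are tasks. For simplicity this sketch allows one task per timestamp,
// whereas a real ZSet can hold many members with equal scores.
public class ZSetDelayQueue {
    private final ConcurrentSkipListMap<Long, String> zset = new ConcurrentSkipListMap<>();

    // Enqueue (ZADD key dueAtMillis task): the due timestamp is the score.
    public void offer(String task, long dueAtMillis) {
        zset.put(dueAtMillis, task);
    }

    // Poll (ZRANGEBYSCORE key -inf +inf LIMIT 0 1 WITHSCORES): fetch the element
    // with the smallest score and return it only if it is already due.
    public String poll(long nowMillis) {
        Map.Entry<Long, String> first = zset.firstEntry();
        if (first == null || first.getKey() > nowMillis) {
            return null; // nothing due yet -> the caller sleeps a second and retries
        }
        zset.remove(first.getKey());
        return first.getValue();
    }
}
```

A worker process would call poll in a loop roughly once per second, dispatching each returned task for asynchronous processing, exactly the two-branch check described in step 2.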

Below is a Redis-based delay queue back-end architecture found on the Internet. It applies a series of optimizations to the basic Redis ZSet implementation, making the whole system more stable and robust, able to cope with high-concurrency scenarios, and more scalable; it is a very good architecture design. The overall architecture diagram is as follows:

Its core design ideas:

  1. Delayed message tasks are routed to different Redis keys through a hash algorithm, which brings two major advantages:

    a. It avoids the slowdown of enqueue and query operations that occurs when a single key stores a large number of delayed messages (both operations have O(logN) time complexity).

    b. The system scales horizontally much better. When the data volume surges, we can quickly scale the whole system by increasing the number of Redis keys to absorb the growth.

  2. Each Redis key has a corresponding worker process, called an event process, which polls the key using the ZRANGEBYSCORE method described in step 2 above to check whether there are delayed messages to be processed.

  3. Event processes are only responsible for dispatching messages; the actual business logic is handled asynchronously through an additional message queue. The benefits of this are also obvious:

    a. On the one hand, because the event process only dispatches messages, it processes them very quickly, and messages are unlikely to pile up because of complex business logic.

    b. On the other hand, with an additional message queue in place, message processing scales better: we can expand the processing capacity of the whole system by increasing the number of consumer processes.

  4. Event processes are deployed as single processes with leader election through Zookeeper, to avoid messages piling up in a Redis key after an event process goes down. Once the current leader goes down, Zookeeper automatically elects a new leader to process the messages in the Redis key.

From the discussion above we can see that implementing a delay queue with Redis ZSet is an intuitive solution that can be built quickly. We can rely on Redis's own persistence mechanisms for durability and use Redis Cluster to support high concurrency and high availability, so it is a good option for a delay queue implementation.

RabbitMQ

RabbitMQ itself does not directly support delay queues, but we can combine RabbitMQ's TTL and dead-letter queue features to achieve the effect of one. Let us first look at RabbitMQ's dead-letter queue and TTL features.

Dead letter queue

The dead-letter queue is essentially a message handling mechanism of RabbitMQ. While RabbitMQ is producing and consuming messages, a message becomes a "dead letter" in the following situations:

  1. The message is rejected (basic.reject / basic.nack) with requeue=false, so it will not be redelivered

  2. The message is not consumed in time, i.e. its TTL has expired

  3. The message queue has reached its maximum length

Once a message becomes a dead letter, it is redelivered to the dead-letter exchange (Dead-Letter-Exchange), which forwards it to the corresponding dead-letter queue according to the binding rules; by listening on that queue, the message can then be consumed again.

Message time to live TTL

TTL (Time-To-Live) is an advanced feature of RabbitMQ that specifies the maximum lifetime of a message in milliseconds. If a message is not consumed within its TTL, it becomes a dead letter and enters the dead-letter queue described above.

There are two ways to set a message's TTL. One is to set a TTL for the entire queue when declaring it: every message entering the queue gets the same expiration time, and once a message expires it is immediately discarded into the dead-letter queue. The reference code is as follows:

// Declare a queue whose messages all expire after 6 seconds
Map<String, Object> args = new HashMap<String, Object>();
args.put("x-message-ttl", 6000);
channel.queueDeclare(queueName, durable, exclusive, autoDelete, args);

This method is more suitable when the delay time of the delay queue is a fixed value.

The other way is to set the TTL on an individual message. In the reference code below, the message is given an expiration time of 6 seconds:

// Publish a single message with a 6-second expiration
AMQP.BasicProperties.Builder builder = new AMQP.BasicProperties.Builder();
builder.expiration("6000"); // TTL in milliseconds, passed as a string
AMQP.BasicProperties properties = builder.build();
channel.basicPublish(exchangeName, routingKey, mandatory, properties, "msg content".getBytes());

If different messages need different delay times, the per-queue TTL above cannot meet the need, and the per-message TTL must be used instead.

However, note that with per-message TTL, a message may not "die" on time, because RabbitMQ only checks whether the message at the head of the queue has expired. For example, if the first message is given a TTL of 20s and the second a TTL of 10s, RabbitMQ will wait for the first message to expire before letting the second one expire.

The solution to this problem is also very simple: install the RabbitMQ community plugin rabbitmq-delayed-message-exchange, listed at:

https://www.rabbitmq.com/community-plugins.html

After installing this plugin, all messages can expire according to their individually set delay.

RabbitMQ implements delay queue

Well, now that we have covered RabbitMQ's dead-letter queue and TTL features, we are only one step away from implementing a delay queue.

Sharp-eyed readers may have noticed: isn't the TTL exactly the delay time of a message in a delay queue? If we set a message's TTL to the delay we want, publish it to an ordinary RabbitMQ queue, and never consume it, then once the TTL elapses the message is automatically delivered to the dead-letter queue. By having consumer processes consume the dead-letter queue in real time, we achieve exactly the effect of a delay queue.

The overall process of using RabbitMQ to implement the delay queue can be intuitively seen from the following figure:

Using RabbitMQ to implement a delay queue lets us take good advantage of RabbitMQ's features, such as reliable message sending, reliable message delivery, and the dead-letter queue, to guarantee that messages are consumed at least once and that messages that fail to be processed correctly are not discarded. In addition, RabbitMQ clustering handles single points of failure well: the delay queue will not become unavailable, nor will messages be lost, just because a single node goes down.

TimeWheel

The TimeWheel (time wheel) algorithm is an ingenious and efficient way to implement a delay queue, and it is used in frameworks such as Netty, Zookeeper, and Kafka.

Time wheel

As shown in the figure above, the time wheel is a circular queue that stores delayed messages. It is backed by an array, which can be traversed efficiently. Each element of the circular queue corresponds to a list of delayed tasks; this list is a doubly linked circular list, and each node in it represents a delayed task to be executed.

The time wheel will have a dial pointer to indicate the time currently pointed to by the time wheel. As time goes by, the pointer will continue to advance and process the delayed task list at the corresponding position.

Add a delayed task

Since the size of the time wheel is fixed, and each element of the wheel is a doubly linked circular list, we can add a delayed task to the time wheel in O(1) time.

As shown in the figure below, suppose we have such a time wheel and the dial pointer currently points to time 2. To add a new task with a delay of 3 seconds, we can quickly compute its position in the wheel as slot 5 and append it to the end of the task list at slot 5.
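The enqueue step just described can be sketched in a few lines. This is a single-level wheel with a one-second tick whose names are purely illustrative, not Kafka's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A minimal single-level time wheel: `wheelSize` slots, each holding a task list.
public class SimpleTimeWheel {
    private final int wheelSize;
    private final Deque<String>[] slots;
    private int pointer = 0; // the slot the dial pointer currently points to

    @SuppressWarnings("unchecked")
    public SimpleTimeWheel(int wheelSize) {
        this.wheelSize = wheelSize;
        this.slots = new Deque[wheelSize];
        for (int i = 0; i < wheelSize; i++) slots[i] = new ArrayDeque<>();
    }

    // O(1) enqueue: compute the target slot and append to its task list.
    // With the pointer at 2 and a 3-second delay, the task lands in slot (2+3) % 12 = 5.
    public int add(String task, int delaySeconds) {
        int slot = (pointer + delaySeconds) % wheelSize;
        slots[slot].addLast(task);
        return slot;
    }

    // Advance the pointer one tick and hand back the tasks that are now due.
    public Deque<String> tick() {
        pointer = (pointer + 1) % wheelSize;
        Deque<String> due = slots[pointer];
        slots[pointer] = new ArrayDeque<>();
        return due;
    }
}
```

The modulo in add is what makes the wheel "circular": a task whose due time wraps past the end of the array simply lands back at the front.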


Multi-layer time wheel

So far so good, but careful readers may have noticed that the time wheel above has a fixed size: it only covers 12 seconds. What if we have a task that needs to be delayed by 200 seconds? Should we simply enlarge the whole wheel? That is clearly undesirable: we would have to maintain a very, very large time wheel, the memory cost would be unacceptable, and once the underlying array grows large, addressing becomes less efficient and performance suffers.

For this reason, Kafka introduced the concept of a multi-layer time wheel. The idea is very similar to the hour, minute, and second hands of a mechanical watch: when the second hand alone cannot express the current time, the minute hand is combined with the second hand to express it. Similarly, when a task's due time exceeds the range represented by the current time wheel, it is promoted to the wheel one layer up, as shown in the following figure:

The first-layer time wheel covers a range of 0-12 seconds in total. Each slot of the second-layer wheel covers the span of the entire first-layer wheel, i.e. 12 seconds, so the second-layer wheel as a whole covers 12 * 12 = 144 seconds. By the same logic, the third-layer wheel covers 1728 seconds, the fourth layer 20736 seconds, and so on.

For example, suppose we need to add a message with a delay of 200 seconds. It exceeds the range of the first-layer wheel, so we look one layer up: 200 / 12 ≈ 17, which also exceeds the 12 slots of the second-layer wheel, so we go up again and place the task in slot 200 / 144 ≈ 2 of the third-layer wheel.
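The level-by-level promotion above can be computed directly. Here is a sketch under the article's assumptions (12 slots per level and a one-second base tick, so level n spans 12^n seconds); slot numbers in the code are 0-indexed, so the 200-second task lands in slot 1 of level 3, i.e. the wheel's second grid:

```java
// Compute which wheel level a delay lands on, and in which slot of that level,
// for hierarchical wheels with `wheelSize` slots per level and a 1-second base tick.
public class WheelLevel {
    // Level 1 spans wheelSize seconds, level 2 spans wheelSize^2 seconds, and so on;
    // climb until the delay fits inside the current level's span.
    public static int levelFor(long delaySeconds, int wheelSize) {
        int level = 1;
        long span = wheelSize;
        while (delaySeconds >= span) {
            level++;
            span *= wheelSize;
        }
        return level;
    }

    // Each slot at a level spans wheelSize^(level-1) seconds; divide to get the slot.
    public static int slotFor(long delaySeconds, int wheelSize) {
        int level = levelFor(delaySeconds, wheelSize);
        long secondsPerSlot = 1;
        for (int i = 1; i < level; i++) secondsPerSlot *= wheelSize;
        return (int) (delaySeconds / secondsPerSlot);
    }
}
```

When the higher-level slot eventually fires, the task is reinserted into a lower wheel with its remaining delay, which is how the hierarchy keeps per-task precision at the base tick.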

The core flow of Kafka's time wheel algorithm for adding delayed tasks and driving the wheel forward is shown below. A bucket is the delayed task list in a time wheel slot, and the DelayQueue that Kafka introduces solves the inefficiency of advancing the wheel when most buckets are empty:

A delay queue implemented with a time wheel can support triggering a large number of tasks efficiently. Moreover, in Kafka's time wheel implementation, a DelayQueue is introduced to drive the advancement of the wheel, while adding and removing delayed tasks stays inside the wheel itself. This design greatly improves the efficiency of the whole delay queue.

Summary

Delay queues are widely used in our daily development. This article introduced three different solutions for implementing a delay queue, each with its own characteristics. The Redis implementation is the easiest to understand and the quickest to build, but Redis is, after all, memory-based; even with its persistence options, data loss is still possible. The RabbitMQ implementation, thanks to features such as reliable message sending, reliable delivery, and the dead-letter queue, guarantees that messages are consumed at least once and that messages that fail to be processed correctly are not discarded, so its reliability is better assured. Finally, Kafka's time wheel algorithm is, in my personal view, the hardest of the three to understand, but also a very ingenious implementation. I hope the content above gives you some ideas when implementing a delay queue of your own.

Welcome to follow our WeChat video channel: Tencent Programmer

Origin blog.csdn.net/Tencent_TEG/article/details/109192502