If someone asks you how to implement distributed delayed messages, send them this article

1. Background

The previous article introduced the overall architecture and principles of RocketMQ; if you are interested, you can read it. In the delayed-message part of that article, I wrote that the open source version of RocketMQ only provides 18 fixed delay levels. That makes the feature of limited use in the open source version, whereas RocketMQ on Alibaba Cloud supports delays of any length, at second-level granularity, within 40 days. Sure enough, some features are only available if you pay. You might think of switching to another open source message queue, but many of them, such as RabbitMQ and Kafka, do not support delayed messages natively either and can only achieve the effect through workarounds. Why do so many not implement this feature? Is it because it is technically difficult? Let's analyze how delayed messages can be implemented.

2. Local delay

Before implementing delayed messages for a distributed message queue, let's think about how we usually implement delays inside our own applications. In Java, we can do it in the following ways:

  • ScheduledThreadPoolExecutor: ScheduledThreadPoolExecutor extends ThreadPoolExecutor. When we submit a task, it is first placed in a DelayedWorkQueue, a priority queue ordered by expiration time. This priority queue is a heap, so each submission costs O(logN). When a task is fetched, it is taken from the top of the heap, that is, the task with the earliest expiration time. Because it extends ThreadPoolExecutor, ScheduledThreadPoolExecutor can run delayed tasks in parallel on multiple threads.

  • Timer: Timer is also built on a priority queue, but it is not a thread pool. It is a standalone class that runs all of its tasks on a single thread.
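As a minimal sketch of the first option, the snippet below schedules three tasks out of order on a ScheduledThreadPoolExecutor and records the order in which they fire. The class name LocalDelayDemo and the delays are made up for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only: LocalDelayDemo is a hypothetical name, not part of any library.
class LocalDelayDemo {

    // Schedules three tasks out of order and returns the order in which they ran.
    static List<String> runDemo() {
        List<String> fired = new CopyOnWriteArrayList<>();
        // Two worker threads, so delayed tasks may run in parallel.
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(2);
        pool.schedule(() -> { fired.add("300ms"); }, 300, TimeUnit.MILLISECONDS);
        pool.schedule(() -> { fired.add("100ms"); }, 100, TimeUnit.MILLISECONDS);
        pool.schedule(() -> { fired.add("200ms"); }, 200, TimeUnit.MILLISECONDS);
        // With the default policy, already-scheduled delayed tasks still run after shutdown().
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The tasks fire in order of their delay, not in order of submission, because the internal queue always hands out the task with the earliest expiration time first.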

3. Distributed message queue delay

Implementing a local delay is relatively simple, since we can use what Java already provides. So what are the difficulties in implementing it for a distributed message queue?

Many readers' first thought will be: can we reuse the local approach and implement the delayed tasks of a distributed message queue with ScheduledThreadPoolExecutor or Timer? This is possible, provided that the message volume is small. But a distributed message queue is usually enterprise-level middleware with a very large amount of data, so a purely in-memory solution will not work. The following solutions address this problem.

3.1 Database

The database is usually the first approach that comes to mind. We can create a table like the following:

CREATE TABLE `delay_message` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `execute_time` bigint(16) DEFAULT NULL COMMENT 'execution time, in milliseconds',
  `body` varchar(4096) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'message body',
  PRIMARY KEY (`id`),
  KEY `time_index` (`execute_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

In this table, execute_time holds the time at which the message should actually be executed, and we build an index on it. Then, in our message service, we start a scheduled task that periodically scans the database for messages that are due and executes them. The process is shown in the following figure:

Using the database is a fairly primitive approach. Before the concept of delayed messages existed, features such as expiring an order after a certain number of minutes were usually built this way. It is usually limited to a single business. It does not work as enterprise-level middleware, because MySQL has to maintain its B-tree secondary index on every write, and that overhead makes writes slower and slower as the table grows. So this solution is usually not considered.
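The scan-and-execute loop can be sketched roughly as follows. To keep the example self-contained, a TreeMap stands in for the table and its index on the execution time; a real implementation would run SQL through JDBC or an ORM instead. DelayTableScanner and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: an in-memory stand-in for the delay table, not a real database client.
class DelayTableScanner {
    // execution time (ms) -> bodies of the messages due at that time
    private final TreeMap<Long, List<String>> table = new TreeMap<>();

    // Plays the role of an INSERT into the delay table.
    void insert(long executeTimeMs, String body) {
        table.computeIfAbsent(executeTimeMs, k -> new ArrayList<>()).add(body);
    }

    // Plays the role of "select every row whose execution time <= now, ordered by
    // execution time", followed by deleting the rows that were picked up.
    List<String> scanDue(long nowMs) {
        List<String> due = new ArrayList<>();
        Map<Long, List<String>> head = table.headMap(nowMs, true);
        for (List<String> bodies : head.values()) {
            due.addAll(bodies);
        }
        head.clear(); // the view is backed by the map, so this removes the scanned rows
        return due;
    }

    // Number of distinct execution times still waiting.
    int pending() {
        return table.size();
    }
}
```

In a service, a scheduled task would call scanDue(System.currentTimeMillis()) at a fixed interval and hand the returned messages to the consumers. The scan interval bounds how late a message can be delivered.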

RocksDB/LevelDB

As mentioned earlier, the open source version of RocketMQ only implements 18 levels of delayed messages, but many companies have built their own delayed messages with arbitrary delay times on top of RocketMQ. Meituan wraps RocketMQ and uses LevelDB to implement delayed messages, and Didi's open source DDMQ uses RocksDB to wrap the delayed-message part of RocketMQ.

The principle is basically the same as with MySQL, as shown in the following figure:

  • Step 1: When a message is sent to DDMQ, a broker layer distributes it, because there are several message queues behind it (Kafka, RocketMQ, etc.). If it is a delayed message, it is written to RocksDB storage.
  • Step 2: A scheduled task polls the storage and forwards the due data to the RocketMQ cluster.
  • Step 3: Consumers consume the messages.

RocksDB is a database too, so why is it more suitable than MySQL? Because RocksDB is built on an LSM tree, which suits write-heavy workloads and therefore fits the message queue scenario better. That is why Didi and Meituan chose this kind of store as the storage medium for their delayed messages.
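For a sorted key-value store to be scanned by due time, the keys have to sort by time. One common way to get that, shown here purely as an assumption for illustration and not as the actual key layout of DDMQ or Meituan's wrapper, is to prefix each key with the execution time as 8 big-endian bytes, so that bytewise key order (the default ordering in RocksDB and LevelDB) matches time order for non-negative timestamps. TimeOrderedKeys is a hypothetical helper.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

// Hypothetical key layout: [8-byte big-endian execution time][message id bytes].
class TimeOrderedKeys {

    static byte[] key(long executeTimeMs, String messageId) {
        byte[] id = messageId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + id.length)
                .putLong(executeTimeMs) // ByteBuffer writes big-endian by default
                .put(id)
                .array();
    }

    // Unsigned lexicographic byte order, mimicking a bytewise key comparator.
    static final Comparator<byte[]> BYTEWISE = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Reads the execution time back from the first 8 bytes of a key.
    static long executeTimeOf(byte[] key) {
        return ByteBuffer.wrap(key).getLong();
    }
}
```

With keys like these, the polling task only needs to iterate from the smallest key up to the key for the current time, which is a sequential range scan.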

3.2 Time Wheel + Disk Storage

Before talking about the time wheel, let's go back to ScheduledThreadPoolExecutor and Timer, which we used for local delays. Both are built on a priority queue. A priority queue is essentially a heap, and inserting into a heap costs O(logN). Suppose memory were unlimited and we stored delayed messages in a priority queue: as the number of messages grew, inserting a message would become slower and slower. So how can we keep insertion efficiency from dropping as the number of messages grows? The answer is the time wheel.

What is a time wheel? We can simply think of it as a multidimensional array. Many frameworks use a time wheel instead of Timer for timed tasks. For example, the local cache Caffeine, which I covered in an earlier article, uses a two-level time wheel, that is, a two-dimensional array. The first dimension represents a coarser time unit such as seconds, minutes, hours or days, and the second dimension represents a finer interval within that unit, such as a particular interval within a second. Each slot timeWheel[i][j] holds a linked list of nodes. Caffeine uses the time wheel to record the entries that expire at a given time and then processes them.

Because the time wheel is an array, inserting into it is O(1). That solves the efficiency problem, but memory is still not unlimited, so how can we use a time wheel? The answer is, of course, the disk. Time wheel + disk storage has been implemented in Qunar's open source QMQ. For ease of description, I will map it onto RocketMQ's structure. The implementation diagram is as follows:
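To make the idea concrete, here is a minimal single-level time wheel. It is a simplification and not Caffeine's implementation: instead of a second array dimension it keeps a "remaining rounds" counter on each node. SimpleTimeWheel and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Simplified single-level wheel; each slot holds a linked list of nodes.
class SimpleTimeWheel {
    private static final class Node {
        final String payload;
        int remainingRounds; // full turns of the wheel left before this node expires

        Node(String payload, int remainingRounds) {
            this.payload = payload;
            this.remainingRounds = remainingRounds;
        }
    }

    private final List<LinkedList<Node>> buckets = new ArrayList<>();
    private final int size;
    private long currentTick = 0;

    SimpleTimeWheel(int size) {
        this.size = size;
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // O(1) insertion: compute the slot index and append to its list.
    void add(String payload, int delayTicks) {
        if (delayTicks < 1) {
            throw new IllegalArgumentException("delayTicks must be >= 1");
        }
        int slot = (int) ((currentTick + delayTicks) % size);
        int rounds = (delayTicks - 1) / size;
        buckets.get(slot).add(new Node(payload, rounds));
    }

    // Advances the wheel by one tick and returns the payloads that expire on it.
    List<String> tick() {
        currentTick++;
        List<String> expired = new ArrayList<>();
        Iterator<Node> it = buckets.get((int) (currentTick % size)).iterator();
        while (it.hasNext()) {
            Node node = it.next();
            if (node.remainingRounds == 0) {
                expired.add(node.payload);
                it.remove();
            } else {
                node.remainingRounds--;
            }
        }
        return expired;
    }
}
```

A driver thread would call tick() once per time unit. Insertion cost does not depend on how many entries are already in the wheel, which is the property the priority queue lacks.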

  • Step 1: The producer delivers the delayed message to the CommitLog. At this point the topic is quietly replaced with a delay topic, which makes the following steps possible.
  • Step 2: A Reput task in the background periodically pulls the messages that belong to the delay topic.
  • Step 3: Determine whether the message falls within the current time wheel's range. If not, go to Step 4. If it does, put the message directly into the time wheel.
  • Step 4: Find the scheduleLog that the message belongs to and write the message into it. By default the scheduleLog is divided into one-hour segments, which can be adjusted to suit the business.
  • Step 5: The time wheel periodically preloads the scheduleLog of the next time period into memory.
  • Step 6: When a message's time arrives, its original topic is restored and it is delivered to the CommitLog again. If the delivery succeeds, it is recorded in the dispatchLog. The record is needed because the time wheel lives in memory and does not know how far it has progressed: if the process crashes in the last second of a time wheel's span, all the data for that time wheel has to be reloaded, and the dispatchLog is used to filter out messages that have already been delivered.
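Steps 3 to 6 can be sketched as a small routing class. This is not QMQ's code: it assumes the wheel covers the current one-hour segment, and it uses plain in-memory collections as stand-ins for the time wheel, the scheduleLog files and the dispatchLog. DelayRouter and all of its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// In-memory stand-ins only; the real structures are files on disk.
class DelayRouter {
    static final long SEGMENT_MS = 60 * 60 * 1000L; // one-hour segments, as in Step 4

    final List<String> wheel = new ArrayList<>();                 // stand-in for the time wheel
    final Map<Long, List<String>> scheduleLogs = new TreeMap<>(); // segment start -> message ids
    final Set<String> dispatched = new HashSet<>();               // stand-in for the dispatchLog

    // Start of the segment that contains the given time.
    static long segmentStart(long timeMs) {
        return timeMs - (timeMs % SEGMENT_MS);
    }

    // Steps 3 and 4: inside the wheel's range goes to the wheel, otherwise to a scheduleLog.
    void route(String messageId, long executeTimeMs, long nowMs) {
        long wheelEnd = segmentStart(nowMs) + SEGMENT_MS; // assumed wheel range
        if (executeTimeMs < wheelEnd) {
            wheel.add(messageId);
        } else {
            scheduleLogs.computeIfAbsent(segmentStart(executeTimeMs), k -> new ArrayList<>())
                    .add(messageId);
        }
    }

    // Step 6: remember that a message has already been delivered.
    void markDispatched(String messageId) {
        dispatched.add(messageId);
    }

    // Step 5 (and recovery): load one segment, skipping messages already delivered.
    void preload(long segmentStartMs) {
        List<String> segment = scheduleLogs.getOrDefault(segmentStartMs, Collections.<String>emptyList());
        for (String id : segment) {
            if (!dispatched.contains(id)) {
                wheel.add(id);
            }
        }
    }
}
```

The dispatched set is what makes a reload after a crash safe: loading a segment twice does not put an already delivered message back into the wheel.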

I personally think that time wheel + disk storage is a cleaner and more unified design than the RocksDB approach above. It needs no other middleware, so availability is naturally higher. As for how Alibaba Cloud's RocketMQ implements it, either of these two approaches is possible.

3.3 Redis

Many companies in the community also use Redis to implement delayed messages. Redis has a data structure called ZSET, an ordered set, which can provide functionality similar to our priority queue. It is implemented with a skip list rather than a heap, but the insertion complexity is still O(logN). Since Redis is fast enough, this cost can be ignored (I have no benchmark to compare against; this is just a guess). Some readers will ask: isn't Redis a pure in-memory key-value store, so it is limited by memory as well? Why choose it?

In fact, Redis is easy to scale horizontally in this scenario. When the memory of one Redis instance is not enough, two or more can be used to meet our needs. The schematic diagram of Redis delayed messages (the original image is from https://www.cnblogs.com/lylife/p/7881950.html) is as follows:

  • Delayed Messages Pool: Redis Hash structure, the key is the message ID, and the value is the specific message. Of course, it can also be replaced by a disk or a database. This mainly stores the content of all our messages.
  • Delayed Queue: ZSET data structure, the value is the message ID, and the score is the execution time. Here the Delayed Queue can be extended horizontally to increase the amount of data we can support.
  • Worker Thread Pool: There are multiple workers, which can be deployed on multiple machines to form a cluster. All workers in the cluster coordinate through ZooKeeper (ZK) to divide the Delayed Queues among themselves.

How can we know that the message in the Delayed Queue has expired? There are two methods here:

  • Each worker scans periodically and takes the entry with the smallest execution time out of the ZSET once it is due. This wastes resources when there are few messages, and when there are many messages the delay becomes inaccurate because polling cannot keep up.
  • Because the first method has these problems, we can borrow an idea from Timer. With wait-notify we get a more accurate delay without wasting resources. First fetch the smallest execution time in the ZSET, then wait for (execution time - current time). No resources are wasted while waiting, and the worker wakes up automatically when the time arrives. If a new message enters the ZSET with an earlier time than the one we are waiting for, notify wakes the worker up, which fetches this earlier message and waits again, and so on.
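The wait-notify approach in the second bullet can be sketched as follows. A local TreeSet ordered by execution time stands in for the ZSET so that the example runs without a Redis server; with Redis, the smallest score would be read from the server and the notification would have to come from whoever adds the message. WaitNotifyDelayQueue is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Sketch only: a local sorted set plays the role of the ZSET.
class WaitNotifyDelayQueue {
    private static final class Entry {
        final long executeAtMs;
        final String id;

        Entry(long executeAtMs, String id) {
            this.executeAtMs = executeAtMs;
            this.id = id;
        }
    }

    private final TreeSet<Entry> zset = new TreeSet<>(
            Comparator.comparingLong((Entry e) -> e.executeAtMs).thenComparing(e -> e.id));

    synchronized void add(String id, long executeAtMs) {
        Entry entry = new Entry(executeAtMs, id);
        zset.add(entry);
        if (zset.first() == entry) {
            notifyAll(); // the new message is now the earliest: wake the waiting worker
        }
    }

    // Blocks until the earliest message is due, then removes and returns its id.
    synchronized String take() throws InterruptedException {
        while (true) {
            if (zset.isEmpty()) {
                wait();
                continue;
            }
            long waitMs = zset.first().executeAtMs - System.currentTimeMillis();
            if (waitMs <= 0) {
                return zset.pollFirst().id;
            }
            wait(waitMs); // execution time - current time
        }
    }

    // Adds an earlier message while a worker is already waiting on a later one.
    static List<String> demo() {
        WaitNotifyDelayQueue queue = new WaitNotifyDelayQueue();
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        long now = System.currentTimeMillis();
        queue.add("late", now + 600);
        Thread worker = new Thread(() -> {
            try {
                out.add(queue.take());
                out.add(queue.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            Thread.sleep(50);
            queue.add("early", now + 150);
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The loop in take() rechecks the head of the set after every wake-up, so both a timeout and a notification for an earlier message lead to the same path.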

Summary

This article introduced three ways to implement distributed delayed messages, in the hope of giving you some ideas when you implement your own. In general, the first two approaches are probably more widely applicable. After all, large message queue middleware such as RocketMQ also integrates other features, such as ordered messages and transactional messages, so delayed messages fit better as a feature of the distributed message queue than as a separate component. Of course, some details are not covered here. For those, please refer to the source code of QMQ and DDMQ.

If you found this article helpful, a follow and a share are the greatest support for me, O(∩_∩)O
