Distributed Delayed Messages

Background

The open-source version of RocketMQ only provides 18 fixed delay levels for delayed messages, which makes the feature feel rather limited, whereas Alibaba Cloud's RocketMQ supports arbitrary delays at second granularity for up to 40 days; some features really are only available if you pay. Of course, you might consider switching to another open-source message queue, but many of them do not support delayed messages either. RabbitMQ and Kafka, for example, can only achieve delays through workarounds. Why do so many message queues not implement this feature? Is it because it is technically complicated? Next, let's analyze how a delayed message can be implemented.

RocketMQ delayed messages are meant for scenarios where the producer wants a message to be consumed only after a certain interval. RocketMQ does not currently support an arbitrary delay time; you can only specify a delay level. There are 18 levels to choose from, corresponding to delays of 1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h.

RocketMQ stores delayed messages under a dedicated topic, SCHEDULE_TOPIC_XXXX, whose 18 message queues correspond to the 18 delay levels. When a message with a delay level (DelayTimeLevel) is delivered to the Broker, its topic is rewritten to SCHEDULE_TOPIC_XXXX and its queueId is changed to the queue for that delay level, while the original topic and queueId are saved in the message properties REAL_TOPIC and REAL_QID.
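
For reference, this is roughly how a producer specifies a delay level with the open-source RocketMQ Java client; the name server address, producer group and topic below are placeholders:

import java.nio.charset.StandardCharsets;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.common.message.Message;

public class DelayProducerDemo {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("delay_producer_group");
        producer.setNamesrvAddr("127.0.0.1:9876");
        producer.start();

        Message msg = new Message("TestTopic", "hello delay".getBytes(StandardCharsets.UTF_8));
        // Level 3 corresponds to a 10s delay in the default 18-level table.
        msg.setDelayTimeLevel(3);
        producer.send(msg);

        producer.shutdown();
    }
}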

Local Delay

Before implementing delayed messages in a distributed message queue, it is worth asking how we would normally implement a delay inside a single application. In Java, it can be done in the following ways:

  • ScheduledThreadPoolExecutor: ScheduledThreadPoolExecutor extends ThreadPoolExecutor. When we submit a task, it is first placed into DelayedWorkQueue, a priority queue sorted by expiration time. The priority queue is a heap structure, so each submission costs O(logN) for ordering. Worker threads then take the task at the top of the heap, which is the task with the smallest remaining delay. The advantage of ScheduledThreadPoolExecutor is that delayed tasks can run in parallel on multiple threads, because it inherits from ThreadPoolExecutor. (A small example follows this list.)
  • Timer: Timer also uses a priority queue internally, but it does not build on the thread pool framework; it is relatively independent, does not support multi-threading, and uses only a single background thread.
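
A minimal sketch of the two local approaches above:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LocalDelayDemo {
    public static void main(String[] args) {
        // ScheduledThreadPoolExecutor with 2 worker threads: delayed tasks can run in parallel.
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        pool.schedule(() -> System.out.println("executed after 5s"), 5, TimeUnit.SECONDS);

        // java.util.Timer: a single background thread, also heap-based internally.
        java.util.Timer timer = new java.util.Timer();
        timer.schedule(new java.util.TimerTask() {
            @Override public void run() { System.out.println("timer fired after 5s"); }
        }, 5_000);
    }
}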

Delayed Messages in a Distributed Message Queue

Implementing a local delay is fairly simple because Java provides ready-made tools, so what are the difficulties when we implement it in a distributed message queue?

Many people's first thought for delayed tasks in a distributed message queue is to reuse the local tools, ScheduledThreadPoolExecutor or Timer. That can work, but only if the message volume is very small. A distributed message queue is usually enterprise-level middleware with a very large amount of data, so a purely in-memory scheme will not hold up. The following schemes solve this problem.

Database

A database is usually the approach that comes to mind first. We can create a table like the one below:

CREATE TABLE `delay_message` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `execute_time` bigint(16) DEFAULT NULL COMMENT 'execution time, in ms',
  `body` varchar(4096) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'message body',
  PRIMARY KEY (`id`),
  KEY `time_index` (`execute_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

In this table, execute_time represents the actual execution time and is indexed. The message service then starts a scheduled task that periodically scans the database for messages whose execution time has arrived and delivers them; a minimal sketch of the polling task follows, and the overall flow is shown in the figure below:
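
A minimal sketch of that polling task, assuming a JDBC DataSource configured elsewhere and a hypothetical deliver() step that hands due messages over; it scans the table defined above once per second:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

public class DbDelayScanner {
    private final DataSource dataSource; // assumed to be configured elsewhere

    public DbDelayScanner(DataSource dataSource) { this.dataSource = dataSource; }

    public void start() {
        // Scan once per second for messages whose execute_time has passed.
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleWithFixedDelay(this::scanOnce, 0, 1, TimeUnit.SECONDS);
    }

    private void scanOnce() {
        String sql = "SELECT id, body FROM delay_message WHERE execute_time <= ? LIMIT 100";
        try (Connection c = dataSource.getConnection();
             PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setLong(1, System.currentTimeMillis());
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Hand the message over; after successful delivery the row
                    // should be deleted or marked as done (omitted here).
                    deliver(rs.getLong("id"), rs.getString("body"));
                }
            }
        } catch (Exception e) {
            e.printStackTrace(); // real code would log and retry
        }
    }

    private void deliver(long id, String body) { /* hypothetical delivery logic */ }
}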

(figure: scheduled task polling the delay_message table and delivering due messages)

The database approach is a fairly primitive one. Before delayed messages existed as a concept, features such as expiring an order after a certain number of minutes were usually implemented this way. However, this approach is usually limited to an individual business case; it is not enough if we want to build enterprise-grade middleware, because with MySQL's B+Tree index the cost of maintaining the secondary index keeps growing as data accumulates, writes become slower and slower, and so this scheme is normally not considered.

RocksDB/LevelDB

As mentioned earlier, the open-source version of RocketMQ only implements 18 delay levels, but many companies have built their own support for arbitrary delay times on top of it. Meituan's internal RocketMQ wrapper implements delayed messages with LevelDB, and Didi's open-source DDMQ uses RocksDB for the delayed-message part of its RocketMQ wrapper.

The principle is largely the same as with MySQL, as shown below:

(figure: DDMQ delayed-message architecture)

  • Step 1: When a message is sent to DDMQ, its proxy layer dispatches it. Since DDMQ supports several message queues internally (Kafka, RocketMQ, etc.), a message carrying a delay is first stored in RocksDB.
  • Step 2: A scheduled task periodically scans RocksDB and forwards due messages to the RocketMQ cluster.
  • Step 3: Consumers consume as usual.

Why is RocksDB a better fit than MySQL here? Because RocksDB is built on an LSM tree, which suits write-heavy workloads, and a message queue is exactly such a scenario. That is why both Didi and Meituan chose it as the storage medium for their delayed-message wrappers.
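
A minimal sketch of the idea, assuming the rocksdbjni library; the key layout and the forwarding step are assumptions, not DDMQ's actual code. Keys are prefixed with the big-endian execution timestamp so that iterating the database returns messages in time order:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksIterator;

public class RocksDelayStore {
    static { RocksDB.loadLibrary(); }

    private final RocksDB db;

    public RocksDelayStore(String path) throws Exception {
        this.db = RocksDB.open(new Options().setCreateIfMissing(true), path);
    }

    // Key = 8-byte big-endian execute time + message id, so keys sort by time.
    public void put(long executeTime, String msgId, byte[] body) throws Exception {
        byte[] idBytes = msgId.getBytes(StandardCharsets.UTF_8);
        ByteBuffer key = ByteBuffer.allocate(8 + idBytes.length);
        key.putLong(executeTime).put(idBytes);
        db.put(key.array(), body);
    }

    // Scan everything that is due and forward it; delivery itself is omitted.
    public void scanDue(long now) {
        try (RocksIterator it = db.newIterator()) {
            for (it.seekToFirst(); it.isValid(); it.next()) {
                long executeTime = ByteBuffer.wrap(it.key()).getLong();
                if (executeTime > now) break;      // keys are time-ordered
                forwardToRocketMQ(it.value());      // hypothetical forwarding step
                db.delete(it.key());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void forwardToRocketMQ(byte[] body) { /* send to the real topic */ }
}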

Time Wheel + Disk Storage

Before getting to the time wheel, let's go back to ScheduledThreadPoolExecutor and Timer. Both of our local delay implementations rely on a priority queue, which is essentially a heap, so inserting into it costs O(logN). If memory were unlimited, we could indeed keep delayed messages in a priority queue, but as the number of messages grows, insertions become slower and slower. How can we keep insertion cost from rising with the number of messages? The answer is the time wheel.

What is a time wheel? It can simply be seen as a multi-dimensional array. Many frameworks use a time wheel for scheduled tasks as a replacement for Timer. For example, in an earlier article on the local cache Caffeine I described Caffeine's two-level time wheel, i.e. a two-dimensional array: the first dimension represents the coarser time unit, such as seconds, minutes, hours or days, and the second dimension subdivides that unit into smaller intervals, such as slices within one second. Each slot TimeWheel[i][j] is actually a linked list that records our nodes. Caffeine uses the time wheel to record which entries expire at which time and then processes them. A minimal single-level sketch follows; the figure after it illustrates Caffeine's two-level layout.
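
A minimal sketch of a single-level hashed time wheel, just to show why insertion is O(1): adding a task only appends to one bucket, and each tick processes one bucket. Delays longer than one full turn of the wheel are ignored here; real implementations add a rounds counter or, as in Caffeine, multiple levels.

import java.util.ArrayList;
import java.util.List;

public class SimpleTimeWheel {
    private static final int SLOTS = 60;          // one slot per second, a one-minute wheel
    private final List<List<Runnable>> buckets = new ArrayList<>();
    private int currentSlot = 0;

    public SimpleTimeWheel() {
        for (int i = 0; i < SLOTS; i++) buckets.add(new ArrayList<>());
    }

    // O(1): just append to the bucket that is delaySeconds ticks ahead.
    public synchronized void add(Runnable task, int delaySeconds) {
        int slot = (currentSlot + delaySeconds) % SLOTS;
        buckets.get(slot).add(task);
    }

    // Called once per second by a driver thread: fire everything in the current bucket.
    public synchronized void tick() {
        List<Runnable> due = buckets.get(currentSlot);
        due.forEach(Runnable::run);
        due.clear();
        currentSlot = (currentSlot + 1) % SLOTS;
    }
}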

(figure: Caffeine's two-level time wheel)

Since the time wheel is an array structure, insertion into it is O(1). That solves the efficiency problem, but memory is still not unlimited, so how do we use the time wheel? The answer is, of course, the disk. Qunar's open-source QMQ already implements time wheel + disk storage; for ease of description I will explain it in terms of RocketMQ's structures. The implementation looks like this:

(figure: time wheel + disk storage, described with RocketMQ structures)

  • Step 1: The producer delivers the delayed message to the CommitLog, using the same topic-swapping trick described earlier to enable the later steps.
  • Step 2: A background Reput task periodically pulls the messages belonging to the delay topic.
  • Step 3: It checks whether the message falls within the current time wheel's range; if not, go to Step 4, otherwise the message is put straight into the time wheel.
  • Step 4: Find the scheduleLog segment the message belongs to and append it there. Qunar's default is one segment per hour, which can be adjusted to the business (a small sketch of the segment calculation follows this list).
  • Step 5: The time wheel periodically preloads the next time segment's scheduleLog into memory.
  • Step 6: When a message is due, its original topic is restored and it is delivered to the CommitLog again; if delivery succeeds, a dispatchLog record is written. The record is needed because the time wheel lives in memory: you cannot know how far it has progressed, and if the process crashes in the last second of a segment, all the earlier data in that wheel would have to be reloaded; the dispatchLog is used to filter out messages that have already been delivered.
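
A minimal sketch of Steps 3 and 4, assuming one-hour scheduleLog segments and an in-memory wheel that only covers the current hour; all names here are illustrative, not QMQ's actual code:

import java.util.concurrent.TimeUnit;

public class ScheduleLogRouter {
    private static final long SEGMENT_MS = TimeUnit.HOURS.toMillis(1); // one segment per hour

    // Step 3: is the message due inside the segment currently loaded into the wheel?
    public boolean inCurrentWheel(long executeTime, long now) {
        return segmentOf(executeTime) == segmentOf(now);
    }

    // Step 4: which scheduleLog segment (file) does the message belong to?
    public long segmentOf(long executeTime) {
        return executeTime / SEGMENT_MS;   // e.g. used as part of the segment file name
    }

    public void route(long executeTime, long now, byte[] msg) {
        if (inCurrentWheel(executeTime, now)) {
            addToTimeWheel(executeTime, msg);                 // deliver via the in-memory wheel
        } else {
            appendToScheduleLog(segmentOf(executeTime), msg); // persist for later preloading
        }
    }

    private void addToTimeWheel(long executeTime, byte[] msg) { /* omitted */ }
    private void appendToScheduleLog(long segment, byte[] msg) { /* omitted */ }
}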

Personally, I find time wheel + disk storage more orthodox than the RocksDB approach above: it completes the job without depending on any other middleware, so its availability is naturally higher. As for how Alibaba Cloud's RocketMQ actually implements this, either of the two schemes is possible.

Redis

Many companies in the community also build delayed messages on Redis. Redis has a data structure called ZSET, the sorted set, which can provide functionality similar to our priority queue; insertion into it is likewise O(logN), but Redis is fast enough that this can be ignored (no benchmark was run for this, it is just a guess). Some will ask: isn't Redis a purely in-memory key-value store? Shouldn't it be limited by memory as well, so why choose it?

In this scenario, Redis is actually easy to scale horizontally: when one Redis instance runs out of memory, two or more can be used to meet the need. The architecture of Redis-based delayed messages (original diagram from: https://www.cnblogs.com/lylife/p/7881950.html) is as follows:

(figure: Redis-based delayed message architecture)

  • Delayed Messages Pool: a Redis Hash whose key is the message ID and whose value is the full message; this part could also be replaced by disk or a database. It stores the content of every message.
  • Delayed Queue: a ZSET whose value is the message ID and whose score is the execution time. The Delayed Queue can be scaled horizontally to increase the volume we can support (a small enqueue sketch follows this list).
  • Worker Thread Pool: contains multiple Workers, which can be deployed on several machines to form a cluster; all Workers in the cluster coordinate through ZooKeeper and divide the Delayed Queues among themselves.
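
A minimal sketch of producing a delayed message into these two structures, assuming the Jedis client; the key names are arbitrary:

import redis.clients.jedis.Jedis;

public class RedisDelayProducer {
    private static final String POOL_KEY = "delay:pool";    // Hash: msgId -> message body
    private static final String QUEUE_KEY = "delay:queue";  // ZSET: msgId scored by execute time

    public void send(Jedis jedis, String msgId, String body, long executeTimeMillis) {
        jedis.hset(POOL_KEY, msgId, body);               // store the message content
        jedis.zadd(QUEUE_KEY, executeTimeMillis, msgId); // index it by execution time
    }
}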

How do we know that a message in the Delayed Queue is due? There are two approaches:

  • Each Worker scans the ZSET periodically for the smallest execution time and pops the message if it is due. This is especially wasteful when there are few messages, and when there are many messages the delay becomes inaccurate because polling cannot keep up.
  • Because the first approach has so many problems, we can borrow ideas from Timer and use wait-notify to get an accurate delay without wasting resources: first fetch the smallest execution time in the ZSET and wait(executeTime - now), so no resources are wasted and the thread wakes up automatically when the time arrives; if a new message with an even smaller execution time enters the ZSET, notify wakes the thread immediately, it fetches the smaller one, waits again, and so on (a sketch of this loop follows).
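
A minimal sketch of the wait/notify loop, again assuming Jedis; error handling, ZooKeeper coordination and multi-queue sharding are omitted, and the helper names (wakeUp, deliver, peekEarliest) are illustrative:

import redis.clients.jedis.Jedis;

public class RedisDelayWorker implements Runnable {
    private static final String POOL_KEY = "delay:pool";
    private static final String QUEUE_KEY = "delay:queue";

    private final Jedis jedis;                  // one connection per worker thread
    private final Object lock = new Object();   // producers notify on this after zadd

    public RedisDelayWorker(Jedis jedis) { this.jedis = jedis; }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                String msgId = peekEarliest();
                if (msgId == null) {                       // queue empty: wait until notified
                    synchronized (lock) { lock.wait(1000); }
                    continue;
                }
                Double score = jedis.zscore(QUEUE_KEY, msgId);
                if (score == null) continue;               // removed concurrently, re-check
                long delay = score.longValue() - System.currentTimeMillis();
                if (delay > 0) {                           // not due yet: wait until due or notified
                    synchronized (lock) { lock.wait(delay); }
                    continue;                              // a smaller message may have arrived
                }
                String body = jedis.hget(POOL_KEY, msgId);
                if (jedis.zrem(QUEUE_KEY, msgId) > 0) {    // only one worker wins the removal
                    jedis.hdel(POOL_KEY, msgId);
                    deliver(body);                         // hand over to the real consumer
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Called by the producer side after zadd when the new execute time is earlier
    // than the one currently being waited on.
    public void wakeUp() {
        synchronized (lock) { lock.notifyAll(); }
    }

    private String peekEarliest() {
        for (String id : jedis.zrange(QUEUE_KEY, 0, 0)) return id; // lowest score = earliest message
        return null;
    }

    private void deliver(String body) { /* omitted */ }
}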

Summary

This article described three ways to implement distributed delayed messages, hoping to give you some ideas for when you implement your own. Overall, the first two approaches may be applicable in more situations; after all, large message-queue middleware such as RocketMQ integrates other capabilities as well, such as ordered messages and transactional messages, so the delayed-message capability tends to live inside a distributed message queue rather than exist as a standalone component. Many details have not been covered one by one; for those, refer to the QMQ and DDMQ source code.

Origin www.cnblogs.com/yizhou35/p/12026180.html