What You Need to Know About RocketMQ

1. Overview

This article gives a comprehensive comparison of the key points of RocketMQ and Kafka. I hope you get something out of reading it.

RocketMQ was formerly called MetaQ and was renamed RocketMQ when MetaQ 3.0 was released. Its design is essentially similar to Kafka's, but unlike Kafka it is written in Java. Because Java has a much larger audience in China than Scala, RocketMQ is the first choice for many Java-based companies. Both RocketMQ and Kafka are top-level projects of the Apache Software Foundation, with very active communities and fast release iterations.

2. Getting Started Examples

2.1 Producer

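import org.apache.rocketmq.client.exception.MQClientException;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.SendResult;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.remoting.common.RemotingHelper;

public class Producer {
    public static void main(String[] args) throws MQClientException, InterruptedException {

        DefaultMQProducer producer = new DefaultMQProducer("ProducerGroupName");
        // The NameServer address comes from the environment (NAMESRV_ADDR)
        // unless it is set explicitly with producer.setNamesrvAddr(...).
        producer.start();

        for (int i = 0; i < 128; i++) {
            try {
                // Arguments: topic, tag, key, body.
                Message msg = new Message("TopicTest",
                    "TagA",
                    "OrderID188",
                    "Hello world".getBytes(RemotingHelper.DEFAULT_CHARSET));
                // Synchronous send: blocks until the broker responds.
                SendResult sendResult = producer.send(msg);
                System.out.printf("%s%n", sendResult);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        producer.shutdown();
    }
}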

Define a producer, create a Message, and call the send method.

2.2 Consumer

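import java.util.List;

import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyContext;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.client.exception.MQClientException;
import org.apache.rocketmq.common.consumer.ConsumeFromWhere;
import org.apache.rocketmq.common.message.MessageExt;

public class PushConsumer {

    public static void main(String[] args) throws InterruptedException, MQClientException {
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("CID_JODIE_1");
        // Subscribe to every tag ("*") of TopicTest.
        consumer.subscribe("TopicTest", "*");
        consumer.setConsumeFromWhere(ConsumeFromWhere.CONSUME_FROM_FIRST_OFFSET);
        // Expected format is yyyyMMddHHmmss; 2017_0422_221800 would be wrong.
        // The timestamp is only consulted with ConsumeFromWhere.CONSUME_FROM_TIMESTAMP.
        consumer.setConsumeTimestamp("20181109221800");
        consumer.registerMessageListener(new MessageListenerConcurrently() {

            @Override
            public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs, ConsumeConcurrentlyContext context) {
                System.out.printf("%s Receive New Messages: %s %n", Thread.currentThread().getName(), msgs);
                // Returning CONSUME_SUCCESS acknowledges the whole batch.
                return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
            }
        });
        consumer.start();
        System.out.printf("Consumer Started.%n");
    }
}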

3. RocketMQ architecture principles

Let's start with a few questions about RocketMQ:

  • What do topics and queues look like in RocketMQ, and how do they differ from Kafka's partitions?

  • What is RocketMQ's network model, and how does it compare with Kafka's?

  • What is RocketMQ's message storage model, how does it guarantee highly reliable storage, and how does it compare with Kafka's?

3.1 RocketMQ architecture diagram

At first glance the RocketMQ architecture looks much like Kafka's, but the two differ in many details, which the following sections cover one by one.

3.2 RocketMQ Glossary

In the architecture from section 3.1 there are multiple Producers, multiple master Brokers, and multiple slave Brokers. Each Producer can send to multiple Topics, and each Consumer can consume multiple Topics.

Brokers report their information to the NameServer, and Consumers pull Broker and Topic information from the NameServer.

  • Producer: the message producer, a client that sends messages to a Broker.

  • Consumer: the message consumer, a client that reads messages from a Broker.

  • Broker: the intermediate node that processes messages. This is where RocketMQ differs from Kafka. A Kafka Broker has no master/slave concept: any Broker can accept writes and also back up data from other nodes. In RocketMQ only master Brokers accept writes, and reads are normally served by the master as well. Reads go to a slave only when the master fails or in a few other special cases, much like a MySQL master-slave architecture.

  • Topic: the message subject, the first-level message type. Producers send messages to it and consumers read messages from it.

  • Group: divided into ProducerGroup and ConsumerGroup, each representing one class of producers or consumers. In general one service can act as one Group, and members of the same Group send or consume the same kind of messages.

  • Tag: Kafka has no such concept. A Tag is a second-level message type; related business messages can share a Topic and be told apart by Tag. For example, order messages might use Topic_Order, with tags such as Tag_FoodOrder, Tag_ClothingOrder, and so on.

  • Queue: called a Partition in Kafka. Messages inside each Queue are ordered. RocketMQ distinguishes read queues from write queues; the two counts are normally the same, and many problems arise if they are not.

  • NameServer: Kafka uses ZooKeeper to store Broker address information and to elect the Broker Leader. RocketMQ does not use a Broker election strategy, so it stores this information in a stateless NameServer. Because NameServers are stateless and the nodes of a NameServer cluster do not communicate with each other, data has to be uploaded to every node.

Many friends ask what stateless means. State is really a question of whether data is stored: a stateful service persists its data, while a stateless service can be thought of as an in-memory service. The NameServer is an in-memory service: all of its data is kept in memory and is lost on restart.

3.3 Topic and Queue

Every message in RocketMQ has a Topic, which distinguishes different kinds of messages. A Topic generally has multiple subscribers: when a producer publishes a message to a Topic, every consumer subscribed to that Topic can receive it.

A Topic contains multiple Queues, which are the smallest unit of the channel through which messages are sent and read. Sending a message means writing it to a specific Queue, and pulling messages means pulling from a specific Queue. Because each Queue is ordered, message order can be maintained at the Queue level. If you need global ordering, set the number of queues to 1 so that all data is ordered within that single queue.

The Producer selects a Queue using one of several strategies:

  • Unordered messages: these are usually sent round-robin across the queues.

  • Ordered messages: hash a key such as the common order ID or user ID, so that data of the same kind goes to the same queue, which preserves its order.
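The two producer-side strategies above can be sketched in plain Java. This is only an illustration under stated assumptions, not RocketMQ's implementation: the class QueueSelectionSketch and its methods are hypothetical names, and a queue is represented by nothing more than its index. In the real client, a custom strategy is supplied through the MessageQueueSelector argument of send.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: hypothetical names, not RocketMQ's selector classes. */
class QueueSelectionSketch {
    private final AtomicInteger counter = new AtomicInteger();

    /** Round-robin: spreads unordered messages evenly over the queues. */
    int selectRoundRobin(int queueCount) {
        // floorMod keeps the index non-negative even after the counter overflows.
        return Math.floorMod(counter.getAndIncrement(), queueCount);
    }

    /** Hash by business key: the same key always maps to the same queue. */
    static int selectByKey(String shardingKey, int queueCount) {
        return Math.floorMod(shardingKey.hashCode(), queueCount);
    }

    public static void main(String[] args) {
        QueueSelectionSketch selector = new QueueSelectionSketch();
        for (int i = 0; i < 5; i++) {
            System.out.println("round-robin -> queue " + selector.selectRoundRobin(4));
        }
        System.out.println("OrderID188 -> queue " + selectByKey("OrderID188", 4));
    }
}
```

Because the hash depends only on the key, every message for one order lands in one queue and keeps its relative order. Note that changing the number of queues changes the mapping.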

Within a Consumer group, Queues are likewise assigned by one of several strategies, such as the common even distribution or consistent hashing.

Note that when a Consumer goes offline or comes online, the queues have to be rebalanced. RocketMQ's rebalance mechanism works as follows:

  • Periodically pull Broker and Topic information.

  • Rebalance every 20s.

  • Randomly pick one master Broker of the current Topic. Note that not every master Broker is contacted on each rebalance, because a Topic can be spread across multiple Brokers.

  • Get the IDs of all machines in the current ConsumerGroup from that Broker.

  • Then assign queues according to the strategy.

Because rebalancing runs on a timer, two Consumers may briefly consume the same Queue at the same time, so messages can be delivered more than once.
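The assignment step can be sketched as follows, assuming the "even distribution" strategy. The class AverageAllocationSketch is a hypothetical name and the arithmetic is a simplified stand-in for the client's allocation strategy, not a copy of it. The key point is that each consumer computes its own share locally from the same sorted inputs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: even allocation computed independently by each consumer. */
class AverageAllocationSketch {
    /** Returns the queue IDs that consumer `self` assigns to itself. */
    static List<Integer> allocate(String self, List<String> consumerIds, int queueCount) {
        List<String> sorted = new ArrayList<>(consumerIds);
        Collections.sort(sorted); // every consumer must see the same order
        int index = sorted.indexOf(self);
        List<Integer> mine = new ArrayList<>();
        if (index < 0) {
            return mine; // not a member of the group
        }
        int base = queueCount / sorted.size();
        int extra = queueCount % sorted.size(); // the first `extra` consumers get one more
        int start = index * base + Math.min(index, extra);
        int size = base + (index < extra ? 1 : 0);
        for (int q = start; q < start + size; q++) {
            mine.add(q);
        }
        return mine;
    }
}
```

With 8 queues and consumers c1, c2, c3, this gives c1 queues 0 to 2 and c2 queues 3 to 5. If c3 goes offline and c1 has already rebalanced to the two-member view while c2 has not, c1 claims queues 0 to 3 while c2 still holds 3 to 5, so queue 3 is briefly consumed twice. That is the duplicate delivery described above.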

Kafka's rebalance mechanism differs from RocketMQ's. In Kafka, rebalancing is carried out through communication between the Consumers and the Coordinator. When the Coordinator detects a change in the consumer group, it signals a rebalance in its heartbeat responses. One consumer, the group leader, then computes the new assignment, and the Coordinator notifies all consumers of the result.

3.3.1 Mismatched read and write queue counts

RocketMQ divides Queues into read queues and write queues. When I first started using RocketMQ, I assumed that configuring different numbers of read and write queues would cause no problems; for example, with many consumer machines we might configure many read queues. In practice, however, we found that some messages could not be consumed and that some consumers had no messages to consume.

  • When the number of write queues is greater than the number of read queues, data in write queues whose ID is beyond the read queue range is never consumed, because those queues are never assigned to a consumer.

  • When the number of read queues is greater than the number of write queues, no messages are ever delivered to the queues beyond the write queue count.
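Both cases reduce to simple range arithmetic on queue IDs. The sketch below assumes, as the bullets describe, that producers write to IDs 0 to writeQueueNums - 1 and consumers are assigned IDs 0 to readQueueNums - 1; the class ReadWriteQueueSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: which queue IDs are affected by a read/write mismatch. */
class ReadWriteQueueSketch {
    /** Queue IDs that receive messages but are never assigned to a consumer. */
    static List<Integer> writtenButNeverRead(int writeQueueNums, int readQueueNums) {
        List<Integer> ids = new ArrayList<>();
        for (int id = readQueueNums; id < writeQueueNums; id++) {
            ids.add(id);
        }
        return ids;
    }

    /** Queue IDs that consumers poll but that never receive a message. */
    static List<Integer> readButNeverWritten(int writeQueueNums, int readQueueNums) {
        List<Integer> ids = new ArrayList<>();
        for (int id = writeQueueNums; id < readQueueNums; id++) {
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("8 write / 4 read, never consumed: " + writtenButNeverRead(8, 4));
        System.out.println("4 write / 8 read, always empty:   " + readButNeverWritten(4, 8));
    }
}
```

Only when the two counts are equal are both lists empty, which is why the two settings are almost always kept the same.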

This feature seems obviously useless to me. Since read and write queue counts are basically always set to the same value, why not unify them, which would also keep users from misconfiguring them?

This question has not received a good answer in RocketMQ's Issues either.

3.4 Consumption Model

Message queue consumption models generally fall into two types: the push-based model and the pull-based (poll) model.

In the push-based model, the message broker records the consumption state. After pushing a message to a consumer, the broker marks it as consumed, but this approach cannot guarantee good processing semantics. For example, suppose the broker has sent a message to a consumer, but the consumer process crashed or a network problem kept it from receiving the message. If the broker has already marked the message as consumed, the message is permanently lost. If the broker instead waits for the consumer to acknowledge each message, it has to record the consumption state of every message, which is not desirable.

Readers who have used RocketMQ will surely wonder: doesn't RocketMQ provide two kinds of consumers?

MQPullConsumer and MQPushConsumer: isn't MQPushConsumer exactly the push model? In fact, in both cases the client actively pulls messages. The implementations differ as follows:

  • MQPullConsumer: every pull has to pass in the offset to pull from and the maximum number of messages to pull. Where to pull from and how much to pull are controlled by the client.

  • MQPushConsumer: the client also actively pulls messages, but consumption progress is saved by the server. The Consumer periodically reports how far it has consumed, so the next time it starts consuming it can find the last consumed position. In general, with PushConsumer we do not need to care about offsets or how much data to pull; we can just use it directly.
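The relationship between the two styles can be sketched without any RocketMQ classes. Everything here is hypothetical (FakeQueue, pull, pushStyle, savedOffset) and stands in for the broker and client; the sketch only assumes what the bullets state, namely that "push" is a pull loop that tracks progress for you and hands messages to a listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only: a "push" consumer built on top of pull. */
class PushOverPullSketch {
    /** Stand-in for one queue on a broker, including the server-side saved progress. */
    static final class FakeQueue {
        final List<String> messages = new ArrayList<>();
        long savedOffset = 0;

        /** Pull style: the caller says where to start and how many to fetch. */
        List<String> pull(long offset, int maxNums) {
            int from = (int) Math.min(offset, messages.size());
            int to = Math.min(from + maxNums, messages.size());
            return new ArrayList<>(messages.subList(from, to));
        }
    }

    /** Push style: the same pull, wrapped in a loop that feeds a listener. */
    static void pushStyle(FakeQueue queue, int batchSize, Consumer<String> listener) {
        while (true) {
            List<String> batch = queue.pull(queue.savedOffset, batchSize);
            if (batch.isEmpty()) {
                return; // a real client would wait and pull again instead of returning
            }
            batch.forEach(listener);
            queue.savedOffset += batch.size(); // report progress back to the "server"
        }
    }
}
```

A caller of pull has to remember the offset itself, while a caller of pushStyle only supplies a listener, which mirrors the difference in how the two consumer types are used.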

3.4.1 Cluster consumption and broadcast consumption

There are two consumption modes, cluster consumption and broadcast consumption:

  • Cluster consumption: consumers with the same GroupId belong to one cluster, and in general a message is processed by only one consumer in the cluster.

  • Broadcast consumption: every consumer in the cluster consumes each broadcast message. Note, however, that because storing offsets on the server is too costly for broadcast consumption, a client consumes from the latest message every time it restarts, rather than from the last saved offset.

3.5 Network Model

Kafka uses native sockets for network communication, while RocketMQ uses the Netty framework. More and more middleware now chooses Netty rather than native sockets, mainly for the following reasons:

  • The API is simple to use, so there is no need to care much about network details and more attention can go to the middleware logic.

  • High performance.

  • Mature and stable; the JDK NIO bugs have been fixed.

Choosing a framework is one aspect; the network threading model is the other aspect of efficient network communication. Common models are 1 + N (1 acceptor thread, N IO threads) and 1 + N + M (1 acceptor thread, N IO threads, M worker threads). RocketMQ uses a 1 + N1 + N2 + M model:

There is 1 acceptor thread and N1 IO threads; N2 threads handle the handshake, SSL verification, and encoding/decoding; M threads handle business processing. The advantage is that encoding/decoding, SSL verification, and other potentially time-consuming operations run in a separate thread pool and do not occupy the business threads or the IO threads.

3.6 Highly reliable distributed storage model

For a good messaging system, high-performance storage and high availability are both essential.

3.6.1 High-performance log storage

The core storage designs of RocketMQ and Kafka differ greatly, so their write performance differs greatly as well. The following is a performance test that Alibaba's middleware team ran in 2016 on RocketMQ and Kafka with different numbers of Topics:

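The test results show:

  • When the number of Topics grew from 64 to 256, Kafka's throughput dropped by 98.37%.

  • When the number of Topics grew from 64 to 256, RocketMQ's throughput dropped by only 16%.

Why is that? In Kafka, all messages under a topic are stored as partitions distributed across multiple nodes. On each Kafka machine, every Partition corresponds to a log directory, and that directory contains multiple log segments. So when there are many Topics, even though Kafka writes each file sequentially, there are so many files that disk IO contention becomes very intense.

So why can RocketMQ still maintain high throughput with many Topics? Let's first look at the key files in RocketMQ: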

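There are four directories here (the explanations are taken directly from the official RocketMQ documentation):

  • commitLog: the main store for message bodies and metadata. It holds the message content written by Producers; messages are not fixed-length. Each file is 1 GB by default. The file name is 20 digits long, left-padded with zeros, and the remaining digits are the starting offset. For example, 00000000000000000000 is the first file, with starting offset 0 and file size 1 GB = 1073741824 bytes. When the first file is full, the second file is 00000000001073741824, with starting offset 1073741824, and so on. Messages are written to the log file sequentially; when a file is full, writing moves on to the next file.

  • config: stores configuration information, including Groups, Topics, and Consumer consumption offsets.

  • consumeQueue: the message consumption queue, introduced mainly to improve consumption performance. Because RocketMQ uses a topic-based subscription model, consumption is done per topic, and scanning the commitlog file to find messages by topic would be very inefficient. Instead, a Consumer looks up the messages to consume through the ConsumeQueue. The ConsumeQueue (logical consumption queue) serves as the index for consumption: for each queue under a given Topic it stores the message's starting physical offset in the CommitLog, the message size, and the HashCode of the message Tag. A consumequeue file can be seen as a topic-based index file for the commitlog, so the consumequeue folder is organized in three levels: topic/queue/file.

  • index: the index file (IndexFile), which lets messages be looked up by key or time range. It is stored at $HOME/store/index/${fileName}, where fileName is the timestamp at creation. A single IndexFile has a fixed size of about 400 MB and can hold 20 million index entries. The underlying storage is designed as a HashMap structure implemented in the file system, so RocketMQ's index file is a hash index.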
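The commitLog naming rule described above is plain arithmetic, sketched here in Java. The class CommitLogFileNameSketch and its methods are hypothetical names for illustration; only the 1 GB file size and the 20-digit zero-padded name come from the description.

```java
import java.util.Locale;

/** Illustrative only: maps a physical offset to the commit log file that holds it. */
class CommitLogFileNameSketch {
    static final long FILE_SIZE = 1024L * 1024 * 1024; // 1 GB = 1073741824 bytes

    /** Name of the file that contains the given physical offset. */
    static String fileNameFor(long physicalOffset) {
        long startOffset = physicalOffset - (physicalOffset % FILE_SIZE);
        // 20 digits, left-padded with zeros; the value is the file's starting offset.
        return String.format(Locale.ROOT, "%020d", startOffset);
    }

    /** Position of the offset inside that file. */
    static long positionInFile(long physicalOffset) {
        return physicalOffset % FILE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor(0L));
        System.out.println(fileNameFor(FILE_SIZE));
        System.out.println(fileNameFor(FILE_SIZE + 42) + " @ " + positionInFile(FILE_SIZE + 42));
    }
}
```

Naming files by their starting offset means that any physical offset can be resolved to a file and a position inside it without consulting a separate catalogue.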

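We can see that message bodies are not written to many files as in Kafka but to a single file, so write IO contention is very small and throughput stays high even with many Topics. Some readers will point out that the ConsumeQueue is also written continuously, and since ConsumeQueue files are created per Queue, there are still many files. However, the amount of data written to the ConsumeQueue is small: each message takes only 20 bytes, so 300,000 entries come to only about 6 MB. The impact is therefore much smaller than the interference between Topics in Kafka. The overall flow is as follows: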

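Producers keep appending new messages to the CommitLog. A scheduled task, ReputService, continuously scans the newly appended CommitLog entries and builds the ConsumeQueue and Index from them.

Note: all of this refers to ordinary hard disks. On SSDs, writing to many files concurrently and writing to a single file perform about the same.

Reading messages

In Kafka each Partition has its own files, so consuming messages results in nicely sequential reads. When the OS reads a file from the physical disk, it also prefetches the adjacent data blocks in order and puts them into the PageCache, so Kafka's read performance is good.

RocketMQ's read flow is as follows:

  • First read the entry in the ConsumeQueue to get the corresponding physical offset in the CommitLog.

  • Then read the CommitLog at that offset.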
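The two-step read can be sketched with in-memory buffers. This is a simplified model, not the real file format: the class TwoStepReadSketch is hypothetical, the commit log here stores bare message bodies with no headers, and the 8 + 4 + 8 byte split of the 20-byte index entry (offset, size, tag hash) is an assumption about how those three fields are laid out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: one shared log plus a fixed-width index for a single queue. */
class TwoStepReadSketch {
    static final int ENTRY_SIZE = 8 + 4 + 8; // offset + size + tag hash = 20 bytes

    final ByteBuffer commitLog = ByteBuffer.allocate(4096);    // shared by all topics
    final ByteBuffer consumeQueue = ByteBuffer.allocate(4096); // index for ONE queue
    int entries = 0;

    /** Appends a body to the shared log and indexes it if it belongs to our queue. */
    void append(String body, String tag, boolean belongsToOurQueue) {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        long offset = commitLog.position();
        commitLog.put(bytes);
        if (belongsToOurQueue) {
            consumeQueue.putLong(offset).putInt(bytes.length).putLong(tag.hashCode());
            entries++;
        }
    }

    /** Step 1: read the index entry. Step 2: read the body from the commit log. */
    String read(int logicalIndex) {
        if (logicalIndex < 0 || logicalIndex >= entries) {
            throw new IndexOutOfBoundsException("no entry " + logicalIndex);
        }
        int entryPos = logicalIndex * ENTRY_SIZE; // fixed width makes this a multiplication
        long offset = consumeQueue.getLong(entryPos);
        int size = consumeQueue.getInt(entryPos + 8);
        byte[] body = new byte[size];
        ByteBuffer view = commitLog.duplicate(); // independent position, shared content
        view.position((int) offset);
        view.get(body);
        return new String(body, StandardCharsets.UTF_8);
    }
}
```

Appending messages for two topics alternately shows the effect discussed next: consecutive entries of one queue point to non-adjacent offsets in the shared log.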

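Each Queue's ConsumeQueue is also a separate file, and it is small, so it can easily use the PageCache to improve performance. In the CommitLog, however, consecutive messages of the same Queue are not contiguous, which causes random reads. RocketMQ makes several optimizations for this:

  • Mmap-based reads: mmap avoids the overhead of traditional IO, which copies file data back and forth between a buffer in the kernel address space and a buffer in the user application's address space.

  • Use the Deadline IO scheduling algorithm together with SSD storage.

  • Because mmap mapping is limited by memory, when the requested data is not in the mapped portion (that is, too many messages have piled up; the default limit is 40% of memory), the request is sent to the SLAVE to relieve pressure on the Master.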

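3.6.2 Availability

3.6.2.1 Cluster modes

First we need to choose a cluster mode that matches the level of availability we can tolerate. Generally there are four:

  • Single Master: lowest availability but also lowest cost. Once it goes down, everything is unavailable. This mode is generally suitable only for local testing.

  • Single Master, multiple Slaves: moderate availability. If the master goes down, all writes are unavailable but reads still work. If the master's disk is damaged, the data on the slaves can be relied on.

  • Multiple Masters: moderate availability. If some masters go down, the messages on those masters cannot be consumed and no data can be written to them. If a Topic has queues on several Masters, the queues on the masters that are still up can be consumed and written normally. If a master's disk is damaged, messages are lost.

  • Multiple Masters, multiple Slaves: highest availability but also highest maintenance cost. When a master goes down, only the queues on that master become unwritable; reads still work, and if the master's disk is damaged, the data on the slaves can be relied on.

In production the fourth mode is generally chosen, to ensure the highest availability.

3.6.2.2 Message availability

Once a cluster mode has been chosen, the next concern is how to store and replicate the data. RocketMQ offers synchronous and asynchronous strategies for flushing messages to disk. With synchronous flushing, a flush timeout returns FLUSH_DISK_TIMEOUT; with asynchronous flushing, no flush-related status is returned. Choosing synchronous flushing gives the strongest guarantee that messages are not lost.

Besides the storage choice, master-slave synchronization also offers synchronous and asynchronous replication modes. Synchronous replication improves availability, but the message send RT gets about 10% worse.

3.6.3 DLedger

The analysis of the master-slave deployment modes above shows that when a master has a problem, writes become unavailable no matter what, until the master recovers or a slave is manually switched to master. As a result, a Slave mostly serves only reads. In recent versions RocketMQ introduced DLedger-based RocketMQ, which replicates the CommitLog using the Raft protocol and elects a master automatically, so writes remain available when a master goes down.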

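3.7 Scheduled/delayed messages

Scheduled and delayed messages are used a lot in real business scenarios, for example:

  • Automatically closing an order that has not been paid in time. In many scenarios inventory is locked once an order is placed, so the order has to be closed on timeout.

  • Operations that need a delay, such as fallback logic: after finishing some step, send a message delayed by, say, half an hour to run a fallback check and compensation.

  • Sending a message to a user at a certain time, which can also be done with delayed messages.

In the open-source version of RocketMQ, delayed messages do not support arbitrary delay times; one of several fixed delay levels has to be chosen. The current defaults are 1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h, which correspond to levels 1 through 18 from 1s to 2h. The Alibaba Cloud version (paid) supports any point in time within 40 days, with millisecond precision. Let's first look at how scheduled delivery works in RocketMQ: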

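  • Step 1: The Producer sets the desired delay level on the message it sends.

  • Step 2: The Broker sees that the message is a delayed message and replaces its Topic with the delay Topic. Each delay level is a separate queue, and the message's own Topic is stored as extra information.

  • Step 3: Build the ConsumeQueue.

  • Step 4: A scheduled task periodically scans the ConsumeQueue of each delay level.

  • Step 5: Take the CommitLog offset from the ConsumeQueue, fetch the message, and check whether its delivery time has been reached.

  • Step 6: If it has, restore the message's Topic and redeliver it. If it has not, run the task again after the remaining time has passed.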
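The level table and the due check in Steps 5 and 6 can be sketched as follows. The class DelayLevelSketch and its methods are hypothetical; only the default level string comes from the text, and the mapping of level N to queue N - 1 is an assumption made for illustration. On the producer side, the level is set with setDelayTimeLevel on the Message before sending.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative only: parses the delay level table and checks whether a message is due. */
class DelayLevelSketch {
    static final String DEFAULT_LEVELS =
        "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";

    private final long[] delayMillis;

    DelayLevelSketch(String levels) {
        String[] parts = levels.trim().split("\\s+");
        delayMillis = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i];
            long amount = Long.parseLong(part.substring(0, part.length() - 1));
            char unit = part.charAt(part.length() - 1);
            switch (unit) {
                case 's': delayMillis[i] = TimeUnit.SECONDS.toMillis(amount); break;
                case 'm': delayMillis[i] = TimeUnit.MINUTES.toMillis(amount); break;
                case 'h': delayMillis[i] = TimeUnit.HOURS.toMillis(amount); break;
                default: throw new IllegalArgumentException("unknown unit in " + part);
            }
        }
    }

    /** Levels are 1-based: level 1 is the first entry of the table. */
    long delayFor(int level) {
        return delayMillis[level - 1];
    }

    /** Step 2: one queue per level inside the delay topic (assumed mapping). */
    int queueIdFor(int level) {
        return level - 1;
    }

    /** Steps 5 and 6: 0 means "redeliver now", otherwise how long to wait before re-checking. */
    long remainingMillis(long storedAtMillis, int level, long nowMillis) {
        return Math.max(0, storedAtMillis + delayFor(level) - nowMillis);
    }
}
```

Because every message in one level's queue has the same delay, the messages in that queue become due in the order they were stored, so the scanner only ever needs to look at the head of each queue.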

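As we can see, delayed messages are implemented by creating a separate Topic and Queues. If we wanted to support any delay within 40 days at millisecond precision with this scheme, we would need 40 × 24 × 60 × 60 × 1000 = 3,456,000,000 queues, which is far too expensive. So how does the Alibaba Cloud version support arbitrary times? My guess is a persistent two-level TimeWheel: the two-level time wheel replaces the ConsumeQueue and stores the Commitlog offset, and the wheel keeps taking out the entries whose time has arrived and redelivers those messages. I will cover the implementation details in a separate article.

3.8 Transactional messages

Transactional messages are another major feature of RocketMQ. They help us achieve eventual consistency for distributed transactions. For more on distributed transactions, see my earlier articles, many of which cover the topic in detail.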

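The steps for using transactional messages are as follows:

  • Step 1: Call sendMessageInTransaction to send the transactional message.

  • Step 2: If the send succeeds, execute the local transaction.

  • Step 3: If the local transaction succeeds, send commit; if it fails, send rollback.

  • Step 4: If one of these phases fails, for example the commit fails to send, RocketMQ periodically checks back from the Broker to query the state of the local transaction.

The overall flow of transactional messages is more complex than that of the other message types. Transactional messages are implemented as follows: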

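  • Step 1: Send the transactional message, here also called a halfMessage. Its Topic is replaced with the HalfMessage Topic.

  • Step 2: Send commit or rollback. On commit, the earlier message is looked up and restored to its original Topic, and an OpMessage is sent to record that the message can now be deleted. On rollback, an OpMessage is sent directly to delete it.

  • Step 3: The Broker runs a scheduled task for transactional messages that periodically compares halfMessages with OpMessages. If an OpMessage exists and its state is deleted, the message must already have been committed or rolled back, so it can be deleted.

  • Step 4: If the transaction times out (6s by default) and there is still no OpMessage, the commit information has most likely been lost, so the Broker checks back with the Producer for the local transaction state.

  • Step 5: Perform Step 2 according to the state returned by the check.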
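The five broker-side steps can be modelled as a small state machine. This is a simplified sketch, not the Broker's code: HalfMessageSketch and all of its members are hypothetical, the half-message Topic and the OpMessage Topic are modelled as an in-memory map and set, and removing an entry from the map stands in for "this message can be deleted".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Illustrative only: half messages, op messages, and the timed check-back. */
class HalfMessageSketch {
    enum LocalState { COMMIT, ROLLBACK, UNKNOWN }

    static final class Half {
        final String realTopic;
        final long storedAt;
        Half(String realTopic, long storedAt) {
            this.realTopic = realTopic;
            this.storedAt = storedAt;
        }
    }

    final Map<String, Half> halfMessages = new LinkedHashMap<>(); // half-message topic
    final Set<String> opMessages = new HashSet<>();               // resolved transaction ids
    final List<String> delivered = new ArrayList<>();             // visible to consumers

    /** Step 1: store the message under the half topic, remembering the real topic. */
    void sendHalf(String txId, String realTopic, long now) {
        halfMessages.put(txId, new Half(realTopic, now));
    }

    /** Step 2: commit restores the real topic; commit and rollback both write an op message. */
    void endTransaction(String txId, LocalState state) {
        Half half = halfMessages.get(txId);
        if (half == null || state == LocalState.UNKNOWN || !opMessages.add(txId)) {
            return; // unknown id, undecided, or already resolved
        }
        if (state == LocalState.COMMIT) {
            delivered.add(half.realTopic + "/" + txId);
        }
    }

    /** Steps 3 to 5: drop resolved half messages; check back with the producer on timeout. */
    void checkOnce(long now, long timeoutMillis, Function<String, LocalState> producerCheck) {
        for (String txId : new ArrayList<>(halfMessages.keySet())) {
            if (opMessages.contains(txId)) {
                halfMessages.remove(txId);
            } else if (now - halfMessages.get(txId).storedAt >= timeoutMillis) {
                endTransaction(txId, producerCheck.apply(txId));
            }
        }
    }
}
```

The producerCheck callback plays the role of the Producer's local transaction check: if the original commit was lost, the Broker still reaches a decision once the timeout has elapsed.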

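We can see that RocketMQ implements transactional messages the same way as delayed messages: by rewriting the original Topic information and then acting as a consumer to run some special business logic. This approach can also be used to build further extensions on top of RocketMQ.

4. Summary

Now let's return to the questions raised at the start of the article:

  • What do topics and queues look like in RocketMQ, and how do they differ from Kafka's partitions?

  • What is RocketMQ's network model, and how does it compare with Kafka's?

  • What is RocketMQ's message storage model, how does it guarantee highly reliable storage, and how does it compare with Kafka's?

Having read this article, you probably already have the answers.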

Origin blog.51cto.com/14230003/2458094