Hello RocketMQ

http://rocketmq.apache.org/

Editor's note

In recent years the open source climate has steadily improved, and major IT companies have open sourced some of their in-house code. In 2012, Alibaba open sourced RocketMQ, its self-developed third-generation distributed messaging middleware. After several years of refinement, Alibaba says that its RocketMQ-based messaging now carries message volumes on the order of trillions on Double 11 (November 11).

In November 2016, Alibaba donated RocketMQ to the Apache Software Foundation, where it officially became an incubator project. Alibaba said it intends to grow it into a top-level project. This is a big step for Alibaba, because entering the foundation requires assessment and observation by its reviewers. Frankly, the industry still holds a stereotype about Chinese developers' participation in open source; of the 342 projects at the Apache Foundation, only four are led by Chinese engineers: Kylin, CarbonData, Eagle and RocketMQ.

On February 20, 2017, RocketMQ officially released version 4.0. Experts said the new version offers a programming model suited to e-commerce, finance, big data, and the Internet of Things.

What technology lies behind the RocketMQ project? Why did it win the foundation's initial recognition? What lessons does its entry into the foundation hold for the technical community? With these questions in mind, InfoQ conducted an exclusive interview with the project's two co-founders; the conversation is summarized below.

Interviewee profiles

Wang Xiaorui, Hua Mingxiaojia, head of Alibaba's middleware messaging team, has rich experience in building high-availability, high-reliability distributed systems and holds patents in the field. Co-founder of Apache RocketMQ. Contact: [email protected]

Feng Jia, nicknamed Weasel, is an Alibaba middleware architect with rich experience in distributed software architecture, high-concurrency website design, and performance tuning, and holds a number of patents in the distributed field. An open source enthusiast, he focuses on distributed systems and big data and follows big data technology stacks such as HBase/Hadoop/Spark/Flink. He is currently responsible for the ecosystem output of Alibaba's message middleware and its commercialization on the cloud, and is a co-founder of Apache RocketMQ. Contact: [email protected]

The origin of RocketMQ

To talk about RocketMQ's highlights, we have to start with the history of Alibaba's messaging engine, which has gone through three generations of evolution to reach its current form.

The first generation used push mode, with a relational database for storage. In this mode messages have low latency and distributed transactions are easy to support. It was very widely used, especially in high-frequency transaction scenarios such as Taobao. Typical representatives are Notify and Napoli.

The second generation used pull mode, with self-developed dedicated message storage. For log processing its throughput is comparable to Kafka's, but given Taobao's application scenarios, especially the high reliability requirements of its transaction links, the engine does not blindly pursue throughput and instead puts stability and reliability first. Because it pulls over long-lived connections, its message delivery is no less real-time than push mode. The typical representative is MetaQ.

The third generation, RocketMQ, is a high-performance, low-latency messaging engine that is based on pull mode while also supporting push mode. Building on the second generation, it adds reliable retry, distributed transactions based on file storage, and other features aimed at e-commerce and finance, along with extensive optimization. Since 2012 it has been tested on the core transaction links of every Double 11, and it has now been donated to the Apache Foundation. Today RocketMQ serves thousands of applications, large and small, across Alibaba Group. On Double 11 it carries a flow of trillions of messages, and it plays a pivotal role in the stability of the group's large middle-platform systems.

It is easy to see that RocketMQ grew up alongside Alibaba's entire ecosystem into a high-performance, low-latency message middleware that can also meet the most demanding scenarios in e-commerce and finance.

Technical overview of RocketMQ

In our view, its biggest innovation is that, through careful horizontal and vertical scaling, it keeps meeting the high-throughput, high-reliability, and low-latency requirements of an ever-growing volume of messages.

At present, RocketMQ is mainly composed of four parts: NameServer, Broker, Producer and Consumer, as shown in the following figure.

Every one of these clusters scales horizontally and has no single point of failure.

NameServer provides service discovery and routing in a lightweight way. Each NameServer stores the full routing information, serves reads and writes as an equal peer, and supports rapid scale-out and scale-in.
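To make the "full routing information on every peer" idea concrete, here is a minimal in-memory sketch. It is not RocketMQ code: the `NameServer` class, `register_everywhere`, the addresses, and the assumption that a broker registers with every NameServer are all illustrative.

```python
class NameServer:
    """Hypothetical stand-in for one NameServer node holding a full routing table."""

    def __init__(self):
        self.routes = {}  # topic -> set of broker addresses

    def register(self, broker_addr, topics):
        # Record that this broker serves each of the given topics.
        for topic in topics:
            self.routes.setdefault(topic, set()).add(broker_addr)

    def lookup(self, topic):
        # Return the brokers for a topic in a stable order.
        return sorted(self.routes.get(topic, ()))


def register_everywhere(name_servers, broker_addr, topics):
    # Assumption: each broker reports to every NameServer, so the nodes
    # never need to talk to each other to stay complete.
    for ns in name_servers:
        ns.register(broker_addr, topics)


cluster = [NameServer(), NameServer()]
register_everywhere(cluster, "10.0.0.1:10911", ["orders", "payments"])
register_everywhere(cluster, "10.0.0.2:10911", ["orders"])
```

Because every node holds the same table, a client in this sketch can ask any one of them, and adding or removing a node does not require moving data.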

Broker is responsible for message storage and supports lightweight queues organized by topic; a single machine can support tens of thousands of queues. It supports both push and pull message models, has a multi-replica fault tolerance mechanism (2 or 3 replicas), offers strong peak-shaving and valley-filling capacity together with the ability to accumulate hundreds of millions of messages, and can strictly guarantee message order. In addition, Broker provides same-city and cross-region disaster recovery, rich metrics, and an alerting mechanism. Traditional messaging systems cannot match these capabilities.

Producers are deployed by users in a distributed fashion. A Producer sends messages to the Broker cluster using one of several load-balancing modes, with low latency and support for fast failure.
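One way to picture load balancing with fast failure is a round-robin selector that skips brokers known to be down and raises immediately when none is left. The sketch below is an illustration under that assumption only; `RoundRobinSelector` and the broker names are hypothetical and do not describe RocketMQ's actual client.

```python
import itertools


class RoundRobinSelector:
    """Hypothetical round-robin queue selector that skips failed brokers."""

    def __init__(self, queues):
        self.queues = list(queues)  # list of (broker_name, queue_id) pairs
        self.failed_brokers = set()
        self._counter = itertools.count()

    def mark_failed(self, broker):
        self.failed_brokers.add(broker)

    def mark_recovered(self, broker):
        self.failed_brokers.discard(broker)

    def select(self):
        # Start at the next round-robin slot and try each queue at most once.
        start = next(self._counter)
        for offset in range(len(self.queues)):
            broker, queue_id = self.queues[(start + offset) % len(self.queues)]
            if broker not in self.failed_brokers:
                return broker, queue_id
        # Fail fast instead of blocking the caller.
        raise RuntimeError("no available broker")


selector = RoundRobinSelector(
    [("broker-a", 0), ("broker-a", 1), ("broker-b", 0), ("broker-b", 1)]
)
first = selector.select()
selector.mark_failed("broker-a")
after_failure = [selector.select() for _ in range(3)]
```

After `broker-a` is marked as failed, every selection in this sketch lands on a `broker-b` queue, so one bad broker does not add latency to each send.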

Consumers are also deployed by users. They support PUSH and PULL consumption modes, cluster consumption and broadcast consumption, and a real-time message subscription mechanism that covers most consumption scenarios.
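The difference between cluster and broadcast consumption can be sketched as a queue-assignment rule: in cluster mode the consumers of a group split the queues among themselves, while in broadcast mode every consumer reads every queue. The functions and the modulo-based split below are assumptions made for illustration, not RocketMQ's allocation strategy.

```python
def allocate_queues(queues, consumers, me):
    """Cluster mode: give each queue exactly one owner within the group."""
    ordered_consumers = sorted(consumers)
    my_index = ordered_consumers.index(me)
    return [
        queue
        for position, queue in enumerate(sorted(queues))
        if position % len(ordered_consumers) == my_index
    ]


def queues_for(mode, queues, consumers, me):
    if mode == "broadcast":
        # Broadcast mode: every consumer reads every queue.
        return sorted(queues)
    return allocate_queues(queues, consumers, me)


group = ["consumer-1", "consumer-2"]
cluster_view = {c: queues_for("cluster", range(4), group, c) for c in group}
broadcast_view = {c: queues_for("broadcast", range(4), group, c) for c in group}
```

In this sketch each message in cluster mode is handled by one member of the group, whereas in broadcast mode each member sees all of them.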

A hero tempered by Double 11

While preparing for Double 11 in 2016, the team focused on two things: optimizing slow requests and unifying the storage engine.

  • Optimize slow requests: this mainly addresses how to reduce the jitter and latency spikes that slow requests cause across the whole cluster under massive, highly concurrent load. It is very challenging technical work. After more than a month of tracking and tuning, the team brought 99.996% of latencies within 10 ms and 99.6% within 1 ms. The optimization focused on the algorithms of the RocketMQ storage layer and on JVM and operating system tuning. For more details, see the e-book chapter "Distributed Message Engine under Trillion-Level Data Floods" [1].
  • Unify the storage engine: this mainly addresses the high availability and cost of the messaging engine. While multiple generations of messaging engines still coexist, we fully ported and replaced Notify's storage module.

In this way, Alibaba's internal message middleware has fully adopted RocketMQ's low-latency storage engine. Thanks to this technical preparation, during Double 11 in 2016 Alibaba Group's total message flow reached about 1.2 trillion, almost double that of Double 11 in 2015. At peak, message production throughput was around 20 million messages per second (2000 w/s, where w means 10,000), and message consumption throughput was on the order of 15 million per second (1500 w/s). The whole promotion ran, in our own internal words, silky smooth.
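As a rough sanity check on these figures, the peak rates can be compared with the average rate implied by the daily total. The calculation below reads "w" as 10,000 and assumes the 1.2 trillion messages were spread over one 24-hour day; both readings are assumptions rather than statements from the interview.

```python
W = 10_000  # assumption: "w" is shorthand for ten thousand

total_messages = 1.2e12         # about 1.2 trillion messages, from the text
seconds_per_day = 24 * 60 * 60  # assumption: the total covers one 24-hour day

average_rate = total_messages / seconds_per_day
peak_produce = 2000 * W  # 2000 w/s
peak_consume = 1500 * W  # 1500 w/s
peak_to_average = peak_produce / average_rate

print(f"average: {average_rate / 1e6:.1f} million messages/s")
print(f"peak production: {peak_produce / 1e6:.0f} million messages/s")
print(f"peak consumption: {peak_consume / 1e6:.0f} million messages/s")
print(f"peak production / average: {peak_to_average:.2f}")
```

Under these assumptions the average works out to roughly 13.9 million messages per second, so the quoted production peak is about 1.4 times the daily average.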

RocketMQ vs. other message middleware

Could you compare RocketMQ, RabbitMQ, Kafka, ActiveMQ and ZeroMQ in terms of technical philosophy, real-world performance, and applicable scenarios? Beyond the technical comparison, could you also compare the community operations and commercial use cases behind these products?

1. Is it a client-server (CS) architecture?

If we are comparing similar products side by side, the first one we would set aside is ZeroMQ. As its name 0MQ suggests, ZeroMQ is more like an embedded networking library, a communication component focused on the transport layer, rather than an MQ with a client-server architecture in the traditional sense.

2. Which specification or protocol does it implement?

Next, let's compare RabbitMQ, ActiveMQ, Kafka and RocketMQ. In terms of design, RabbitMQ is the reference implementation of the AMQP specification. AMQP is a wire-level protocol that is comprehensive, systematic, and somewhat complex. RabbitMQ has become the preferred messaging service of the OpenStack IaaS platform, and the strength of the backing behind it speaks for itself.

ActiveMQ was originally developed by LogicBlaze and is now developed mainly by Red Hat. It is the reference implementation of the JMS specification and a long-established messaging engine at Apache. JMS is an API-level specification, yet it still imposes some implementation constraints, and it lacks multi-language support. ActiveMQ has a rich ecosystem: the Apache top-level project hosts several sub-projects, including Artemis, which evolved from HornetQ, and the Scala-based Apollo, billed as the next-generation ActiveMQ.

3. Which scenarios does it suit?

Kafka was originally designed for log processing. It is a big data pipeline through and through: it pursues high throughput and may lose messages. The R&D team behind it has also built a commercial offering around Kafka. It is currently widely used in small and medium-sized companies and has many loyal supporters in China.

RocketMQ was born for Internet finance. It pursues high reliability, high availability, high concurrency, and low latency, and it is a model product that Alibaba successfully incubated from the inside out. Besides the thousands of applications within Alibaba Group, by our incomplete count at least several hundred companies, research organizations, and educational institutions in China use it. For a more detailed feature comparison of these MQ products, see the description on our official website [2].

Three areas of technical focus

(1) Message ordering

Ordered messaging is undeniably one of RocketMQ's selling points. We currently guarantee global ordering. It should be stressed that "global" here has a precondition: it applies per unique identifier (or anything that can be hashed to one), such as a large seller's account or the orders for a certain class of product. The implementation principle is fairly simple: make sure each channel is operated by a single instance, for example written by a single thread in a single process and read by a single thread in a single process. ActiveMQ's Exclusive Consumer is a similar implementation.

It is easy to see that this implementation gives up some throughput. It also brings another problem: hot spots. On Double 11, for example, if a simple hashing strategy is used, the messages of a Tmall merchant whose sales pass 100 million within a short time all go to one channel, causing a hot spot on a single channel or even a single machine. In the upcoming RocketMQ version we will improve the current implementation to ease the single-channel hot spot problem caused by ordering; this feature is expected to be released in the middle of this year.
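The trade-off described above can be sketched in a few lines: a stable hash sends every message with the same key to the same queue, which preserves per-key order but concentrates a busy key's traffic. The `select_queue` helper, the use of CRC32, and the seller names are illustrative assumptions, not RocketMQ's selector.

```python
import zlib
from collections import Counter

QUEUE_COUNT = 8  # assumed number of queues for the topic


def select_queue(sharding_key: str, queue_count: int = QUEUE_COUNT) -> int:
    """Map a key to a queue with a stable hash so equal keys share one queue."""
    return zlib.crc32(sharding_key.encode("utf-8")) % queue_count


# One hot seller emits far more messages than everyone else combined.
messages = [("hot-seller", sequence) for sequence in range(1000)]
messages += [(f"seller-{n}", 0) for n in range(100)]

load = Counter(select_queue(key) for key, _ in messages)
hot_queue = select_queue("hot-seller")

print(f"queue {hot_queue} holds {load[hot_queue]} of {len(messages)} messages")
```

All 1,000 messages of the hot seller end up on one queue here, while the other queues share the remaining 100, which is the single-channel hot spot the text refers to.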

(2) Message deduplication

The messaging field defines QoS levels for message delivery: at most once, at least once, and exactly once.

Almost all MQ products claim at-least-once delivery. With at-least-once delivery, duplicate messages cannot be avoided, especially in a distributed network environment; in the final analysis this weakness comes with retry mechanisms such as the retransmission on failure that is part of TCP. Businesses are often sensitive to duplicate messages. The current version of RocketMQ does not support deduplication, and we usually recommend that users deduplicate through external global storage. A built-in solution is planned for the next generation of features. As for common practice in the industry, products such as Artemis and IronMQ check for duplicates against global storage on the server side. This is an IO-sensitive operation that puts a certain load on the server. RocketMQ hopes to reduce server IO effectively by adopting a two-level duplicate-check strategy.
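The recommendation to deduplicate through external storage amounts to making the consumer idempotent. Below is a minimal sketch in which a Python set stands in for the external global store; `IdempotentConsumer`, `on_message`, and the message keys are hypothetical names for illustration.

```python
class IdempotentConsumer:
    """Hypothetical consumer wrapper that drops messages it has already handled."""

    def __init__(self, handler, seen_store=None):
        self.handler = handler
        # Stand-in for external global storage. A real store would need an
        # atomic insert-if-absent, because check-then-add is not safe when
        # several consumers share the store.
        self.seen = seen_store if seen_store is not None else set()

    def on_message(self, message_key, body):
        if message_key in self.seen:
            return False  # duplicate delivery, skip the business logic
        self.handler(body)
        # Record the key only after the handler succeeds, so a failed
        # attempt can still be redelivered and retried.
        self.seen.add(message_key)
        return True


processed = []
consumer = IdempotentConsumer(processed.append)
results = [
    consumer.on_message("order-1001", "pay order 1001"),
    consumer.on_message("order-1001", "pay order 1001"),  # redelivery
    consumer.on_message("order-1002", "pay order 1002"),
]
```

With at-least-once delivery the broker may redeliver `order-1001`, but in this sketch the business handler runs only once per key.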

(3) Challenges of distribution

First, let us clarify what a distributed system is: a coherent software system made up of a set of decentralized, autonomous components that cooperate in parallel and concurrently over a network. It is characterized by resource sharing, parallelism and concurrency, reliable fault tolerance, transparency, and openness. CAP, BASE, Paxos, transactions, and similar theories together form the theoretical foundation of distributed systems.

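Let us revisit the CAP theorem here. CAP stands for Consistency, Availability, and Partition tolerance. Eric Brewer, who proposed the CAP theorem, defines consistency as a service either being carried out or not being carried out (in his words: "A service that is consistent operates fully or not at all"). Note that consistency here differs from the C in a database's ACID properties: at the database level, C means that operations on data must not violate the integrity constraints between data, such as foreign key constraints. In a distributed environment, C can be understood simply as multiple nodes seeing a single, identical copy of the data. Availability means that the service is available (in his words: "the service is available (to operate fully or not as above)"). Availability can be further divided into write availability and read availability. In a distributed environment it usually means that the system returns the result of a read or write within a bounded time, that is, both reads and writes are available. Partition tolerance means that, short of total network failure (such as a fiber being dug up), no other failure (packet loss, reordering, jitter, or even the crash of a node in a network partition) may cause the whole system to respond incorrectly (in his words: "No set of failures less than total network failure is allowed to cause the system to respond incorrectly").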

The CAP theorem can be seen as an exploration of how to balance consistency and availability for different applications.

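  • Without a partition: C and A can both be satisfied, along with full ACID transaction support. One may choose to sacrifice some C in exchange for better performance and scalability.
  • With a partition: choose A (focusing on recovery from the partition). This requires handling strategies for before the partition begins, while it is in progress, and after recovery, with suitable compensation mechanisms applied. A distributed messaging engine like RocketMQ leans toward AP. Even the strongest system has a capacity limit, and sufficient capacity is a precondition for availability. Typically, degradation, rate limiting, and circuit breaking are used to keep the system available under peak floods. For the technical details, see the e-book chapter [1].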
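Of the protections just mentioned, rate limiting is the easiest to show in a few lines. The token bucket below is a generic sketch of the technique, not RocketMQ's flow-control code; the class name, its parameters, and the injectable clock are assumptions made for illustration.

```python
import time


class TokenBucket:
    """Generic token-bucket limiter: refill at `rate` tokens/s up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behaviour can be tested
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self):
        # Refill according to the time elapsed since the last call.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: reject rather than queue up work


fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(3)]  # third request is rejected
fake_now[0] = 0.5                                 # half a second later
after_wait = [bucket.try_acquire() for _ in range(2)]
```

Rejecting the excess request immediately keeps the accepted ones fast, which is the sense in which limiting traffic protects availability once capacity is reached.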

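In addition, with typical high-frequency financial trading scenarios in mind, we also designed a CP mechanism for RocketMQ which, while still satisfying the partition tolerance of a distributed system, sacrifices availability to guarantee data consistency. It is implemented on top of the Zab consensus protocol, using distributed locks and a notification mechanism to keep multiple replicas consistent.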

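Open source donation and community operations

Today many companies in China and abroad open source their solutions to common problems, especially products that have stood the test of time and only grown stronger, hoping to gain in brand promotion and talent recruitment. Open sourcing RocketMQ, and going further to donate it to Apache, was likewise a carefully considered decision that went through layers of internal approval and discussion. We hope to work patiently on building an ecosystem, on standardization, on internationalization, and on commercialization.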

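The idea of donating actually dates back to 2014. At that time we approached several respected figures in the Apache community, but unfortunately, after repeated communication and many revisions of the draft, we suddenly lost contact. In 2015 we were fortunate to meet Jiang Xu, Kylin Principal Architect, and Luke, its VP, as well as Jiang Ning, Principal Software Engineer at Red Hat. We asked them about the pitfalls to avoid at Apache, and the donation process came back to life. The most important next step was recruiting champion candidates, and we were delighted that Bruce, the ActiveMQ VP, readily accepted our invitation. After nearly 100 emails back and forth, we finally began our Apache journey. The donation vote fell on Double 11 itself; we were well prepared and answered the committee's sharp questions well. Still, when we were suddenly challenged with the claim that "Chinese developers do not like communicating by email," we owe thanks to the Chinese members of the community who responded in our defense. After many trials the result finally came in, and it was not bad: 10 binding +1 votes (binding votes being the valid votes cast by IPMC members) and none against, so the project formally entered incubation. If incubation succeeds, it is expected to become the first Internet middleware from China to be an Apache top-level project, and an important member of the distributed messaging engine family worldwide after ActiveMQ and Kafka.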

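Next, we want to highlight intellectual property, an area unfamiliar to most engineers, especially patents, copyright, and trademarks. Abroad, quite a few infringement lawsuits arise from these issues every year. From the very start of open sourcing we were extremely careful about our choices and protections here, including the choice of open source license, the granting of rights, and code attribution. These are all good protections for intellectual work and are also part of our product's core competitiveness. Only by respecting knowledge and respecting property rights can we build a harmonious, positive open source climate and create branded products with genuinely independent intellectual property.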

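At Alibaba, we built Aliware MQ, a commercial version for cloud users, on top of the open source RocketMQ engine. Both products come from Alibaba's middleware messaging team. The commercial Aliware MQ supports access over TCP, HTTP, and MQTT, and on the feature side strengthens operations and management as well as ecosystem integration (including visual message tracing, resource reports and statistics, monitoring and alerting, and Kafka integration). On the public cloud it natively offers multi-datacenter deployment with same-city high-availability disaster recovery, with the aim of meeting enterprise requirements.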

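For community operations, we have adopted essentially the same strategy as Apache top-level projects. First, everything must rest on a high-quality product. Starting from release planning, we have set up a standardized and efficient development process: milestone discussion, feature design, coding and self-testing, paired review, integration testing, release discussion, release announcement, and so on. Second, at the community level, we run a series of activities to engage with the community, such as offline meetups, workshops, ApacheCon, and occasional hackathons, to bring in new Contributors and Committers.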

The next generation of RocketMQ, ready to launch

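Recently the team has also begun building the next generation of RocketMQ, hoping to create a vendor-neutral specification that unifies the wire layer and the API layer; this is the biggest highlight of the fourth-generation messaging engine. So far we have contacted the relevant technical leads at companies such as Twitter and Yahoo to jointly draft and refine this specification, and RocketMQ will be among the first products to serve as a reference implementation. We very much hope that MQ vendors in China, and distributed-systems enthusiasts, will join in and actively speak up for Chinese developers in the international open source community.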

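In addition, this week the team released the first version of the fourth-generation engine, which is also the first release since joining the Apache community. According to our plan, the upgrade and restructuring of the whole engine will be finished around April this year. Everyone is very welcome to use it, give feedback, and get involved.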

Finally, for more information, see the Apache RocketMQ website, the official middleware blog on the Yunqi Community, and the Alibaba e-book.

 
