Comparing message middleware (Kafka, RabbitMQ, RocketMQ, and others): message-sending performance and differences



Which message middleware has the best performance?


With this question in mind, our middleware test group compared the performance of three common message products (Kafka, RabbitMQ, RocketMQ).


Kafka is a distributed publish-subscribe messaging system open-sourced by LinkedIn and now an Apache top-level project. Its main characteristic is that message consumption is pull-based and the design pursues high throughput; it was originally built for log collection and transport. Version 0.8 added replication, but transactions are not supported. Kafka does not strictly guard against message duplication, loss, or errors, which makes it a good fit for data collection in Internet services that generate large volumes of data.



RabbitMQ is an open source message queue system written in Erlang and built on the AMQP protocol. The main characteristics of AMQP are message orientation, queuing, routing (both point-to-point and publish/subscribe), reliability, and security. AMQP is used mostly in enterprise systems, where data consistency, stability, and reliability come first and performance and throughput come second.




RocketMQ is Alibaba's open source message middleware. It is written in pure Java and offers high throughput and high availability, which makes it suitable for large-scale distributed systems. RocketMQ's design was inspired by Kafka, but it is not a copy of Kafka: it improves reliable delivery and transactional messaging. It is currently widely used in scenarios such as transactions, account top-ups, stream computing, message push, log stream processing, and binlog distribution.




Test purpose


The goal is to compare how Kafka, RabbitMQ, and RocketMQ perform when sending small messages (124 bytes). This stress test looks only at server-side performance, so the test criterion is:


Keep increasing the load from the senders until system throughput stops rising and response time starts to grow. At that point the server has hit its performance bottleneck, and the throughput observed there is the system's peak throughput.
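The stopping rule above can be sketched as a small search loop. This is only an illustration, not the test group's actual script: `measure` is a hypothetical callback that runs one round with a given number of senders and returns throughput, the doubling schedule and the 2% gain threshold are assumed values, and response time (which the real test also watched) is left out for brevity.

```python
from typing import Callable, Tuple


def find_peak_throughput(
    measure: Callable[[int], float],
    start_senders: int = 1,
    max_senders: int = 1024,
    min_gain: float = 0.02,
) -> Tuple[int, float]:
    """Double the sender count until throughput stops improving.

    `measure(n)` is a hypothetical callback: it runs one test round with
    n concurrent senders and returns the observed throughput (messages/s).
    """
    best_senders = start_senders
    best_tps = measure(start_senders)
    senders = start_senders * 2
    while senders <= max_senders:
        tps = measure(senders)
        # Stop once extra load no longer yields a meaningful gain:
        # the server, not the senders, is now the bottleneck.
        if tps < best_tps * (1 + min_gain):
            break
        best_senders, best_tps = senders, tps
        senders *= 2
    return best_senders, best_tps


# Toy stand-in for a real test round (assumption): throughput grows
# with load, then flattens at 173,000 messages/s.
def fake_measure(senders: int) -> float:
    return min(senders * 20_000.0, 173_000.0)


peak = find_peak_throughput(fake_measure)
```

With the toy `fake_measure`, the loop stops as soon as doubling the senders no longer raises throughput and reports the last load level that still improved it.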


Test scenarios


In the synchronous send scenario, the three products differ clearly in performance:


Kafka reaches a throughput of 173,000 messages/s (written 17.3w/s in the original, where w stands for 10,000), as befits the industry leader in high-throughput message middleware. This is mainly because its queue model keeps disk writes sequential. At this load, disk IO on the broker is the bottleneck.


RocketMQ also performs well, with a throughput of 116,000 messages/s (11.6w/s) and disk IO %util close to 100%. RocketMQ returns an ack as soon as a message is written to memory, and a dedicated thread flushes data to disk. All messages are written to files sequentially.


RabbitMQ reaches 59,500 messages/s (5.95w/s), with high CPU consumption. It supports AMQP, a very heavyweight protocol, and trades throughput for message reliability. We also tested RabbitMQ with message persistence enabled, and throughput was around 26,000 messages/s (2.6w/s).


Test conclusion



For server-side handling of synchronous sends: Kafka > RocketMQ > RabbitMQ.


Appendix:


Test environment


The server is deployed on a single machine, and the machine configuration is as follows:

[Image: machine configuration]

Software versions:

[Image: software versions]

Test script

[Image: test script]

So far we have compared only the simplest scenario, sending small messages, and Kafka wins for now. RocketMQ, however, has been battle-tested through several Double 11 shopping festivals and has more advantages in Internet application scenarios.


RabbitMQ


RabbitMQ is an open source message queue written in Erlang. It supports many protocols (AMQP, XMPP, SMTP, STOMP), which is exactly what makes it heavyweight and better suited to enterprise development.


It also implements a broker architecture, meaning that messages are queued in a central queue before being delivered to clients. It has good support for routing, load balancing, and data persistence.


Redis


Redis is an actively developed and maintained key-value NoSQL database. Although it is a key-value store, it also supports MQ functionality, so it can be used as a lightweight queue service.


In one test, RabbitMQ and Redis each performed 1 million enqueue and 1 million dequeue operations, with the execution time recorded every 100,000 operations. The test data came in four sizes: 128 bytes, 512 bytes, 1 KB, and 10 KB. The results: for enqueue, Redis outperforms RabbitMQ when the data is small, but becomes unbearably slow once the data size exceeds 10 KB; for dequeue, Redis performs very well regardless of data size, while RabbitMQ's dequeue performance is far below that of Redis.
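The shape of such a test can be sketched as a small timing harness. This is an illustration only: an in-process `deque` stands in for the queue (an assumption, so the numbers say nothing about Redis or RabbitMQ), and `timed_batches` is a hypothetical helper name. To reproduce the comparison, the two operations would be replaced with real client calls.

```python
import time
from collections import deque
from typing import Callable, List


def timed_batches(op: Callable[[], object], total: int, batch: int) -> List[float]:
    """Run `op` `total` times and return the elapsed seconds per batch."""
    timings = []
    for _ in range(total // batch):
        start = time.perf_counter()
        for _ in range(batch):
            op()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in queue (assumption): swap append/popleft for real client
# calls against RabbitMQ or Redis to reproduce the comparison.
queue = deque()
results = {}
for size in (128, 512, 1024, 10 * 1024):  # payload sizes in bytes
    payload = b"x" * size
    enqueue = timed_batches(lambda: queue.append(payload), 1_000_000, 100_000)
    dequeue = timed_batches(queue.popleft, 1_000_000, 100_000)
    results[size] = (enqueue, dequeue)
```

Each entry in `results` holds ten enqueue timings and ten dequeue timings, one per 100,000 operations, which is the granularity the test describes.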


ZeroMQ


ZeroMQ is billed as the fastest message queue system, especially for high-throughput scenarios. ZMQ can implement advanced or complex queues that RabbitMQ is not good at, but developers have to combine several technical frameworks themselves, and that technical complexity is a challenge to applying this MQ successfully.


ZeroMQ has a unique brokerless model: you do not need to install and run a message server or middleware, because your application itself plays that role. All you need is a reference to the ZeroMQ library (installable via NuGet, for example), and you can send messages between applications.


However, ZeroMQ provides only non-persistent queues, so if the machine goes down, the data is lost. Twitter's Storm uses ZeroMQ to transport its data streams.


ActiveMQ


ActiveMQ is an Apache sub-project. Like ZeroMQ, it can implement queues in both broker and peer-to-peer styles. Like RabbitMQ, it can implement advanced application scenarios efficiently with a small amount of code. RabbitMQ, ZeroMQ, and ActiveMQ all provide clients for commonly used languages such as C++, Java, .NET, Python, PHP, and Ruby.


Jafka/Kafka


Kafka is an Apache sub-project: a high-performance, cross-language, distributed publish/subscribe message queue system. Jafka was incubated on top of Kafka and is described as an upgraded version of it.


Kafka has the following characteristics:


  • Fast persistence: messages can be persisted with O(1) overhead;

  • High throughput: 100,000 messages/s can be reached on an ordinary server;

  • Fully distributed: Broker, Producer, and Consumer all support distribution natively and balance load automatically;

  • Parallel loading of data into Hadoop: a viable solution for log data and offline analysis systems such as Hadoop that nonetheless have real-time processing requirements.


Kafka unifies online and offline message processing through Hadoop's parallel loading mechanism, which also matters for the system studied in this topic. Compared with ActiveMQ, Apache Kafka is a very lightweight messaging system; besides performing very well, it is a distributed system that works well.


RabbitMQ is more reliable than Kafka, while Kafka is better suited to high-throughput IO workloads such as ELK log collection.

Kafka, like RabbitMQ, is a general-purpose message broker, and both are designed for distributed deployment. But they make very different assumptions about the message semantic model. I'm skeptical of the "AMQP is more mature" argument. Let's look at the facts and see which solution fits your problem.


Kafka is the better fit in the following scenario.


You have a fire hose of events (more than 100,000/sec) that must be delivered at least once, in partitioned order, to a mix of online and batch consumers; you want to be able to re-read messages; you can live with the current limitations on node-level high availability; or you don't mind supporting early-stage software yourself via forums/IRC.


RabbitMQ is the better fit in the following scenario.


You have fewer events (20k+/sec) that must reach consumers through complex routing logic; you want per-message delivery guarantees; you don't care about delivery order; you need high availability at the cluster-node level right now; or you need 24x7 paid support (in addition to forums/IRC).




Redis message push (based on distributed pub/sub) is mostly used for push scenarios with strict real-time requirements, and it does not guarantee reliability. Other MQs and Kafka guarantee reliability but add some latency (non-real-time systems make no latency guarantees).


Redis pub/sub data is wiped when the power goes off. Using a Redis list for message push gives you persistence, but it is too primitive, and it still cannot fully guarantee that messages are not lost. In addition, Redis publish/subscribe offers no grouping beyond separate topics. In Kafka, by contrast, the subscribers to a published message can be divided into groups, and only one subscriber in each group receives the message, which can serve as load balancing.


For example, suppose Kafka publishes topic = "publish post", data = "article 1", and there are one hundred servers behind it. Each server is a subscriber to this topic, but they are divided into three groups. Group A, with 50 servers, actually publishes the articles, and all 50 of its subscribers have subscribed to this topic.


Because they are in the same group, this message (topic = "publish post", data = "article 1") is received by only one currently idle machine in group A. The 25 servers in group B handle statistics, the 25 servers in group C handle archive backup, and in each of those groups, too, only one server receives the message.


The number of groups decides how many copies of each message go out (like CC recipients), and which subscribers in a group are busy or idle decides which server processes the message: this is the producer-consumer model. Redis has no such mechanism at all, and these two points are the biggest difference.
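The group behavior in this example can be modeled in a few lines. This is a toy simulation, not the Kafka client API: the `Topic` class and its methods are hypothetical, and round-robin stands in for the "idle server" choice (a real Kafka consumer group instead splits the topic's partitions among its members).

```python
class Topic:
    """Toy model of group-based delivery; not the Kafka client API."""

    def __init__(self):
        self._groups = {}  # group name -> list of subscribed servers
        self._turn = {}    # group name -> number of deliveries so far

    def subscribe(self, group, server):
        self._groups.setdefault(group, []).append(server)
        self._turn.setdefault(group, 0)

    def publish(self, data):
        """Deliver one copy per group, to exactly one member of each group."""
        deliveries = []
        for group, servers in self._groups.items():
            # Round-robin is an assumed stand-in for "pick an idle server".
            server = servers[self._turn[group] % len(servers)]
            self._turn[group] += 1
            deliveries.append((group, server, data))
        return deliveries


topic = Topic()
for group, size in (("A", 50), ("B", 25), ("C", 25)):
    for i in range(size):
        topic.subscribe(group, f"{group}-{i}")

first = topic.publish("article 1")
second = topic.publish("article 2")
```

Each `publish` call produces three deliveries, one per group, and successive messages land on different servers within a group, which is the fan-out plus load-balancing combination the text describes.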


Redis is an in-memory database! Its creator also built Disque, which may be worth a try. MQs generally use a publish-subscribe model. For performance, the main question is whether consumption is pull or push, and the biggest factor is probably the storage structure.


Kafka only delivers its full performance when the number of topics is below 64; this is determined by how it handles partitions. Messages can be lost in extreme cases, for example when the master node writes a message and then crashes with a damaged hard disk; I found this while reviewing the code. I don't know about RabbitMQ, but RocketMQ's performance is on the order of 10,000 messages per second, and it can scale out horizontally without limit. With 256 topics on a single machine, the performance loss is small. RocketMQ can be called a variant of Kafka: Alibaba developed MetaQ after thoroughly reviewing the Kafka code, and after continuous updates and fixes renamed MetaQ 3.0 to RocketMQ. RocketMQ is written in Java, which makes it easy to maintain.


In addition, RocketMQ and Kafka both have a near-unlimited capacity for message backlog. Think about it: a power outage does not lose messages, and a backlog of 200 million messages causes no strain. Impressive! With Kafka and RocketMQ, performance is simply not something you need to worry about.


In terms of application scenarios


RabbitMQ follows the AMQP protocol and is written in Erlang, a language built for high concurrency. It is used for real-time message delivery that demands high reliability.


Kafka is a publish-subscribe messaging system that LinkedIn open-sourced in December 2010. It is mainly used for processing active streaming data and large volumes of data.


In terms of architectural models


RabbitMQ follows the AMQP protocol. A RabbitMQ broker is made up of Exchanges, Bindings, and Queues; the exchange and its bindings, matched against the routing key, determine how a message is routed. The client Producer communicates with the server over a channel on a connection, and the Consumer takes messages from a queue for consumption (over a long-lived connection: when a message arrives in the queue it is pushed to the consumer, whose loop reads the data from the input stream). RabbitMQ is broker-centric, and it has a message acknowledgment mechanism.
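To make the exchange, binding, and queue relationship concrete, here is a toy in-memory model. It is a sketch under stated assumptions, not RabbitMQ's API: `DirectExchange` is a hypothetical class that only mimics exact-match routing, and the routing keys are made up for illustration.

```python
from collections import defaultdict, deque


class DirectExchange:
    """Toy direct exchange: routes by exact routing-key match."""

    def __init__(self):
        self._bindings = defaultdict(list)  # routing key -> bound queues

    def bind(self, queue, routing_key):
        self._bindings[routing_key].append(queue)

    def publish(self, routing_key, message):
        # Copy the message into every queue bound with this key and
        # return how many queues received it.
        queues = self._bindings.get(routing_key, [])
        for queue in queues:
            queue.append(message)
        return len(queues)


orders, audit = deque(), deque()
exchange = DirectExchange()
exchange.bind(orders, "order.created")
exchange.bind(audit, "order.created")
exchange.bind(audit, "order.cancelled")

exchange.publish("order.created", "order 1")
exchange.publish("order.cancelled", "order 2")
unrouted = exchange.publish("order.shipped", "order 3")
```

The producer only ever names an exchange and a routing key; which queues end up with the message is decided entirely by the bindings, and a key with no binding reaches no queue.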


Kafka follows the general MQ structure of producer, broker, and consumer, and it is consumer-centric: consumption state is kept on the client-side consumer, which pulls data from the broker in batches according to its consumption position (offset). There is no message acknowledgment mechanism.
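The consumer-centric pull model can be illustrated with a minimal sketch. The `Broker` and `PullConsumer` classes below are hypothetical stand-ins, not Kafka's client classes; the point is only that the broker keeps no per-consumer state while the consumer owns its position.

```python
from typing import List


class Broker:
    """Toy broker: an append-only list that never tracks consumers."""

    def __init__(self) -> None:
        self._log: List[str] = []

    def append(self, message: str) -> None:
        self._log.append(message)

    def fetch(self, offset: int, max_messages: int) -> List[str]:
        return self._log[offset:offset + max_messages]


class PullConsumer:
    def __init__(self, broker: Broker, batch_size: int = 3) -> None:
        self.broker = broker
        self.batch_size = batch_size
        self.offset = 0  # consumption position lives on the client

    def poll(self) -> List[str]:
        batch = self.broker.fetch(self.offset, self.batch_size)
        self.offset += len(batch)
        return batch


broker = Broker()
for i in range(5):
    broker.append(f"m{i}")

consumer = PullConsumer(broker)
first = consumer.poll()
second = consumer.poll()
consumer.offset = 0  # re-reading is just resetting the offset
replayed = consumer.poll()
```

Because the position is held by the consumer, re-reading old messages requires no cooperation from the broker beyond serving the requested range.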


In terms of throughput


Kafka has high throughput: internally it processes messages in batches and uses a zero-copy mechanism. Data is stored and read as sequential batch operations on the local disk, with O(1) complexity, so message processing is very efficient.

RabbitMQ is slightly behind Kafka in throughput because the two start from different goals. RabbitMQ supports reliable message delivery and transactions but not batch operations; depending on the reliability required, messages can be stored in memory or on disk.


In terms of availability


RabbitMQ supports mirrored queues: if the primary queue fails, a mirror takes over.

Kafka brokers support an active-standby mode.


In terms of cluster load balancing


Kafka uses ZooKeeper to manage the brokers and consumers in a cluster, and topics can be registered in ZooKeeper. Through ZooKeeper's coordination mechanism, a producer keeps the broker information for each topic and can send to brokers randomly or in round-robin fashion. A producer can also choose a partition based on message semantics, so that the message goes to a specific partition on a broker.
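The two producer-side strategies mentioned here, round-robin and semantic (key-based) partition choice, can be sketched as follows. `Partitioner` is a hypothetical class, and `zlib.crc32` is an assumed stand-in hash chosen only because it is deterministic; it is not claimed to be the hash Kafka uses.

```python
import itertools
import zlib
from typing import Optional


class Partitioner:
    """Toy partition chooser; crc32 is an assumed stand-in hash."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._counter = itertools.count()

    def choose(self, key: Optional[bytes] = None) -> int:
        if key is None:
            # No key: spread messages across partitions in turn.
            return next(self._counter) % self.num_partitions
        # Keyed: the same key always maps to the same partition.
        return zlib.crc32(key) % self.num_partitions


partitioner = Partitioner(num_partitions=4)
unkeyed = [partitioner.choose() for _ in range(6)]
keyed = [partitioner.choose(b"user-42") for _ in range(3)]
```

Unkeyed messages cycle through the partitions, while every message carrying the key `user-42` lands on one partition, which is what keeps related messages in order relative to each other.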


Load balancing for RabbitMQ requires a separate load balancer.


Kafka is a reliable distributed log storage service. Put simply, you can think of Kafka as a huge reel of tape that is written sequentially and can be rewound at any time, or fast-forwarded to a given point in time, and replayed.


First, the definition of a log: the log is the core of a database, a strictly ordered record of every change made to it, and a "table" is the result of those changes. Other names for the log include changelog, write-ahead log, commit log, redo log, and journal.
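The relationship between a log and a table can be shown with a short sketch. The record shape `(key, new_value)` with `None` marking a delete is an assumption made for illustration, and `replay` is a hypothetical helper, not part of any database or Kafka API.

```python
from typing import Dict, List, Optional, Tuple

# Each log record is (key, new_value); None marks a delete. Assumed shape.
Change = Tuple[str, Optional[int]]

changelog: List[Change] = [
    ("alice", 10),
    ("bob", 5),
    ("alice", 25),  # a later change overwrites the earlier one
    ("bob", None),  # delete
]


def replay(log: List[Change], upto: Optional[int] = None) -> Dict[str, int]:
    """Rebuild the table by applying the first `upto` changes in order."""
    table: Dict[str, int] = {}
    for key, value in log[:upto]:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


current = replay(changelog)          # state after every change
earlier = replay(changelog, upto=2)  # "rewind the tape" to offset 2
```

The table is never stored here; it is recomputed from the ordered changes, and stopping the replay early yields the table as it stood at that point in the log.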


The characteristics of Kafka are as follows:


High write speed: Kafka can write to this tape faster than a 1 Gbps NIC can deliver data (in practice up to SATA 3 speeds; see "Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)"). It makes full use of the physical characteristics of the disk: random writes are slow because the head must seek, while sequential writes are fast because the head stays on track.


High reliability: distributed consistency through ZooKeeper, replication to any number of disks, automatic failover with leader election, and self-healing.


High capacity: through horizontal scaling. LinkedIn stores up to 175 TB of new data and 800 billion messages in Kafka every day, and capacity can keep growing, much like splicing two tapes together.


The fundamental flaws of traditional business databases are:


  • Too slow: reads and writes are expensive, and random seeks are unavoidable. (A disk seek takes 5 ms at best, and SSDs are too expensive.)

  • They cannot keep up with continuously generated data streams, and they get slower the more they are used. (An index efficiency problem.)

  • They cannot scale horizontally. (Most use read-write splitting with one master and multiple replicas. That said, NewSQL achieves multi-master through consensus algorithms.)


In response to these problems, Kafka proposes a "log-centric approach": the traditional database is split into two independent systems, the log system and the index system. "Persistence and indexing are separated: logs land as fast as possible, and indexes catch up at their own speed."


With data reliability guaranteed by Kafka's fast, tape-like sequential recording, the presentation and use of data become very flexible. The data stream can be fed into search systems, RDBMS systems, data warehouses, graph databases, log analysis, and other database systems. Each of these systems is just one interpretation of the data on the Kafka tape: a facet, an index, a snapshot. If that data is lost, it doesn't matter; just replay the tape. More often, maintaining these database systems only requires taking regular snapshots and copying them to secure object storage (such as S3).


In one sentence: "The log is always the same log; each index is different." On stream computing: under a storage model with streams as the basic abstraction, you can join stream with stream, stream with state, and state with state, which is what Kafka Streams provides. A simple example: after a user triggers an event, the event is joined with the user table to augment the data, which then enters the data warehouse for correlation analysis. Simple window statistics and real-time analysis needs are also easy to meet. For example, add 1 to the online user count when a user login message arrives and subtract 1 when the user goes offline, and you have the total number of users currently online in the system.
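The online-user counter from this example can be sketched in plain Python. This is not the Kafka Streams API: `online_counts` is a hypothetical function, the `(user, action)` event shape is an assumption, and a set is used instead of a bare +1/-1 counter so that a repeated login message does not double count.

```python
from typing import Iterable, Iterator, Tuple


def online_counts(events: Iterable[Tuple[str, str]]) -> Iterator[int]:
    """Yield the number of online users after each (user, action) event."""
    online = set()
    for user, action in events:
        if action == "login":
            online.add(user)
        elif action == "logout":
            online.discard(user)
        yield len(online)


events = [
    ("alice", "login"),
    ("bob", "login"),
    ("alice", "login"),  # duplicate login does not double count
    ("alice", "logout"),
]
counts = list(online_counts(events))
```

The function consumes the event stream one record at a time and keeps only a small piece of state, which is the stream-plus-state pattern the paragraph describes.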


For this, see PipelineDB (https://www.pipelinedb.com/). Kafka will make you rethink how systems are built and makes possible what used to be impossible. It is, without exaggeration, the most important and central part of a system. In other words, system design should be built around Kafka.


So which message middleware has the best performance? By now you probably have your own answer. But when choosing for a product, there is no best, only the most suitable.



