Middleware technology selection

Commonly used MQ components include ActiveMQ, RabbitMQ, RocketMQ, ZeroMQ, MetaMQ, and Kafka, of which Kafka is arguably the most powerful. Although each MQ has its own characteristics and strengths, they all share certain characteristics inherent to message queues, which I introduce below.

Characteristic | ActiveMQ | RabbitMQ | RocketMQ | Kafka
Development language | Java | Erlang | Java | Scala/Java
Single-machine throughput | 10,000-level | 10,000-level | 100,000-level | 100,000-level
Latency | Milliseconds | Microseconds | Milliseconds | Within milliseconds
Availability | High (master-slave architecture) | High (master-slave architecture) | Very high (distributed architecture) | Very high (distributed architecture)
Features | Mature product used by many companies, with extensive documentation and support for many protocols | Strong concurrency, excellent performance, very low latency, and a rich management interface | Fairly complete MQ functionality and strong scalability | Supports only the core MQ functions, without message query, message backtracking, and similar features; built for big data and widely used in that field

Middleware selection


Kafka

Kafka is a distributed publish-subscribe messaging system open-sourced by LinkedIn and now a top-level Apache project. Its main characteristics: consumers pull messages (pull mode); messages are ordered within a partition; and with the right configuration it can guarantee that every message is consumed exactly once. It pursues high throughput and was originally built for log collection and transmission. It supports replication and transactions, and by default does not impose strict guarantees against message duplication, loss, or errors, which makes it suitable for data collection in Internet services that generate large volumes of data. It is fully distributed: Brokers, Producers, and Consumers all natively support distribution and balance load automatically. It supports parallel loading of data into Hadoop, which makes it a feasible solution for log data and offline analysis systems like Hadoop that also need real-time processing. Through Hadoop's parallel loading mechanism, Kafka unifies online and offline message processing, a property also valued by the system studied in this topic. Compared with ActiveMQ, Kafka is a very lightweight messaging system; besides very good performance, it is also a well-behaved distributed system.

Known as the killer app of big data, Kafka is unavoidable when it comes to message transmission in that field. Born for big data, this message middleware is famous for its million-level TPS throughput (a single machine can write roughly one million messages per second). It quickly became a favorite in the big data field and plays a pivotal role in data collection, transmission, and storage. Its availability is very high: Kafka is distributed and keeps multiple replicas of each piece of data, so if a small number of machines go down, no data is lost and the service stays available. There is an excellent third-party web management interface, Kafka-Manager.

Disadvantages: ① If a single Kafka machine hosts more than 64 queues/partitions, the load rises significantly: the more queues, the higher the load and the longer the response time for sending messages. ② Consumers pull data by polling, so real-time behavior depends on the polling interval. ③ There is no built-in retry when consumption fails. ④ Message ordering is supported, but when a broker goes down, messages can become out of order. ⑤ The community updates slowly.
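Because retry on consumption failure (point ③) is left to the application, teams usually add it themselves. The following is a minimal, hypothetical sketch of that idea: `RetryingConsumer` and its dead-letter list are illustrative names, not part of any Kafka client API. In practice the dead letters would typically go to a separate retry or dead-letter topic; an in-memory list stands in for it here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RetryingConsumer:
    """Hypothetical application-level wrapper: retry a handler a few times,
    then park the message in a dead-letter list instead of losing it."""

    handler: Callable[[str], None]
    max_attempts: int = 3
    dead_letters: List[str] = field(default_factory=list)

    def consume(self, message: str) -> bool:
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.handler(message)
                return True  # processed successfully
            except Exception:
                if attempt == self.max_attempts:
                    # Out of attempts: keep the message for inspection or replay.
                    self.dead_letters.append(message)
        return False


if __name__ == "__main__":
    failures = {"left": 1}

    def flaky_handler(message: str) -> None:
        # Fails once, then succeeds (simulates a transient error).
        if failures["left"] > 0:
            failures["left"] -= 1
            raise RuntimeError("transient failure")

    consumer = RetryingConsumer(handler=flaky_handler)
    print(consumer.consume("order-1"))  # True: succeeded on the second attempt
```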

RabbitMQ

RabbitMQ is an open-source message queuing system written in Erlang that supports many protocols, including AMQP, XMPP, SMTP, and STOMP. The main characteristics of AMQP are message orientation, queuing, routing (both point-to-point and publish/subscribe), reliability, and security. The AMQP protocol makes RabbitMQ quite heavyweight, so it is better suited to enterprise scenarios with high demands on routing, load balancing, data consistency, stability, and reliability, where performance and throughput are secondary. It is robust, stable, easy to use, cross-platform, supports many languages, and has complete documentation. Its open-source management interface is excellent and easy to use, and the community is very active.

Disadvantages: ① It is written in Erlang, so the source code is hard to understand; basic functionality depends on the open-source community for timely maintenance and bug fixes, which hinders secondary development and maintenance. ② Its heavier implementation gives it lower throughput. ③ It requires learning more complicated interfaces and protocols, so learning and maintenance costs are high.

RocketMQ

RocketMQ is Alibaba's open-source message middleware. Written in pure Java, it offers high throughput and high availability and suits large-scale distributed applications. Its design originated from Kafka, with Alibaba's own improvements, notably in reliable message delivery and transactions. Within Alibaba Group it is widely used for transactions, recharging, stream computing, message push, log stream processing, binlog distribution, and other scenarios. Message reliability is very high: with tuned configuration, messages can be delivered with zero loss. Its MQ feature set is fairly complete, it is distributed, and it scales well. It supports accumulating on the order of a billion messages without performance degradation. Because the source code is Java, we can read it ourselves and customize and control our own company's MQ.

Designed for the financial Internet, it meets high reliability requirements, especially in e-commerce scenarios such as order deduction and peak shaving, where a flood of transactions may arrive faster than the back end can handle. RocketMQ may be more trustworthy in terms of stability, since these scenarios have been tested many times during Alibaba's Double 11 shopping festival. If your business has concurrency scenarios like these, RocketMQ is a recommended choice.
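The peak-shaving idea can be made concrete with a small simulation. This is a hypothetical model, not RocketMQ code: requests arrive in bursts, the back end consumes at a fixed capacity per tick, and the queue absorbs the excess instead of letting it overload the back end. The arrival numbers and capacity are made up for illustration.

```python
from typing import List


def simulate_peak_shaving(arrivals: List[int], capacity_per_tick: int) -> List[int]:
    """Return the queue backlog at the end of each tick.

    arrivals[i] is a hypothetical number of requests arriving in tick i; the
    back end consumes at most capacity_per_tick messages per tick, and anything
    beyond that waits in the queue instead of overloading the back end.
    """
    backlog = 0
    history: List[int] = []
    for arrived in arrivals:
        backlog += arrived
        backlog -= min(capacity_per_tick, backlog)  # back end drains at its own pace
        history.append(backlog)
    return history


# A burst of 5,000 requests against a back end that handles 2,000 per tick:
print(simulate_peak_shaving([1000, 5000, 0, 0, 0], 2000))  # [0, 3000, 1000, 0, 0]
```

The back end never sees more than 2,000 messages per tick; the burst is smoothed out over the following ticks.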

Disadvantages: ① Few client languages are supported: currently Java and C++, and the C++ client is immature. ② Community activity is average. ③ The MQ core does not implement interfaces such as JMS, so migrating some systems requires changing a lot of code.

ZeroMQ

Known as the fastest message queuing system, ZeroMQ is especially suited to high-throughput scenarios. It can implement advanced or complex queues that RabbitMQ is not good at, but developers have to combine multiple technical frameworks themselves, and this technical complexity is a challenge to applying it successfully. ZeroMQ has a distinctive brokerless model: you do not need to install and run a message server or middleware, because your application plays that role. You simply reference the ZeroMQ library (in .NET, for example, it can be installed via NuGet) and can then send messages between applications. However, ZeroMQ only provides non-persistent queues, so if a machine goes down, data is lost. Early versions of Twitter's Storm, for example, used ZeroMQ to transport data streams.

ActiveMQ

Apache ActiveMQ is fast, supports many cross-language clients and protocols, offers easy-to-use Enterprise Integration Patterns and many advanced features, and fully supports JMS 1.1 and J2EE 1.4. It is released under the Apache 2.0 license. There is a small probability of losing data.

Disadvantages: the official community maintains ActiveMQ 5.x less and less, and it is rarely used in large-scale, high-throughput scenarios.

Redis

Redis is a key-value NoSQL database whose development and maintenance are very active. Although it is a key-value store, it supports MQ functionality, so it can serve as a lightweight queue service. In one benchmark, enqueue and dequeue operations were each executed 1 million times against RabbitMQ and Redis, with the elapsed time recorded every 100,000 operations, using four message sizes: 128 bytes, 512 bytes, 1 KB, and 10 KB. The results showed that for enqueue, Redis outperforms RabbitMQ when messages are small, but becomes unbearably slow once messages exceed 10 KB; for dequeue, Redis performs very well regardless of message size, while RabbitMQ's dequeue performance is far lower.
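The measurement procedure described above (1 million operations, elapsed time recorded every 100,000) can be sketched as a small timing harness. This is an assumption about how such a test might be structured, not the original benchmark code: an in-process `deque` stands in for the broker, and `timed_run` is a hypothetical helper. To reproduce the comparison you would replace the deque calls with real client calls (for example Redis `LPUSH`/`RPOP` or a RabbitMQ publish/get).

```python
import time
from collections import deque
from typing import Callable, List


def timed_run(op: Callable[[], None], total: int, checkpoint: int) -> List[float]:
    """Call op() `total` times; record elapsed seconds after every `checkpoint` calls."""
    marks: List[float] = []
    start = time.perf_counter()
    for i in range(1, total + 1):
        op()
        if i % checkpoint == 0:
            marks.append(time.perf_counter() - start)
    return marks


if __name__ == "__main__":
    # In-process deque as a stand-in for the broker (hypothetical setup).
    payload = b"x" * 128  # one of the four message sizes in the test
    q: deque = deque()
    enqueue_marks = timed_run(lambda: q.append(payload), 1_000_000, 100_000)
    dequeue_marks = timed_run(q.popleft, 1_000_000, 100_000)
    print(len(enqueue_marks), len(dequeue_marks))  # 10 10
```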

Stress test


We compared the performance of Kafka, RabbitMQ, and RocketMQ when sending 124-byte messages. Since the stress test focuses only on server-side performance, the method is to keep increasing the load from the sending side until throughput stops rising and response time starts to grow. At that point the server has hit a performance bottleneck, and the throughput reached is the system's best throughput. In the synchronous-send scenario, the three middlewares perform very differently:

Kafka

Kafka reached a throughput of 173,000 messages/s, leading the industry among high-throughput message middleware. This is mainly because its log-based queue model keeps writes to disk sequential. At this throughput, the Broker's disk I/O had become the bottleneck.

RocketMQ

RocketMQ also performed well, with a throughput of 116,000 messages/s and disk I/O close to 100%. RocketMQ returns an ack as soon as a message has been written to memory; a separate thread is dedicated to flushing to disk, and all messages are written to the file sequentially.
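The "ack after the memory write, flush on a separate thread" pattern can be illustrated with a simplified sketch. This only shows the idea; it is not RocketMQ's actual CommitLog implementation, and `AsyncFlushLog` is a hypothetical name. `append()` returns immediately once the message is buffered, and a background thread appends buffered messages to a single file in arrival order.

```python
import os
import queue
import tempfile
import threading


class AsyncFlushLog:
    """Simplified sketch of 'ack once the message is in memory, flush on a
    separate thread'. Illustrates the idea only; not RocketMQ's CommitLog."""

    def __init__(self, path: str) -> None:
        self._pending = queue.Queue()  # in-memory buffer of messages not yet on disk
        self._file = open(path, "ab")  # append mode: every write goes to the end
        self._flusher = threading.Thread(target=self._flush_loop, daemon=True)
        self._flusher.start()

    def append(self, message: bytes) -> str:
        self._pending.put(message)
        return "ACK"  # returned before the bytes are persisted

    def _flush_loop(self) -> None:
        while True:
            message = self._pending.get()
            if message is None:  # shutdown sentinel
                break
            self._file.write(message + b"\n")  # sequential append
            if self._pending.empty():
                self._file.flush()  # hand buffered bytes to the OS when idle
        self._file.flush()
        os.fsync(self._file.fileno())  # force data to disk on shutdown

    def close(self) -> None:
        self._pending.put(None)
        self._flusher.join()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commitlog")
        log = AsyncFlushLog(path)
        print([log.append(m) for m in (b"m1", b"m2", b"m3")])  # ['ACK', 'ACK', 'ACK']
        log.close()
        with open(path, "rb") as f:
            print(f.read())  # b'm1\nm2\nm3\n'
```

The trade-off is visible in `append()`: the producer gets its ack before the data is durable, so a crash between ack and flush can lose messages.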

RabbitMQ

RabbitMQ's throughput was 59,500 messages/s, with high CPU consumption. It implements the AMQP protocol, which is very heavyweight, and trades throughput for message reliability. We also tested RabbitMQ with message persistence enabled, and throughput was about 26,000 messages/s.

Test conclusion: the performance of synchronous sending is Kafka>RocketMQ>RabbitMQ
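The ramp-up method behind these numbers (keep increasing the sending load until throughput stops rising) can be sketched as a simple search. `find_saturation` and the `measure` hook are hypothetical: `measure(level)` is assumed to run one load round with `level` concurrent senders and return the observed throughput. The sketch doubles the load each round and stops when the gain drops below 2%; it tracks throughput only, not the response-time increase the test also watches for.

```python
from typing import Callable, Tuple


def find_saturation(measure: Callable[[int], float], start: int = 1,
                    max_level: int = 1024, min_gain: float = 0.02) -> Tuple[int, float]:
    """Double the number of concurrent senders until throughput stops rising.

    measure(level) is a hypothetical hook that runs one load round with `level`
    concurrent senders and returns the observed throughput (messages/s).
    Returns the last level whose throughput still improved by more than
    min_gain, together with that throughput.
    """
    best_level, best_tput = start, measure(start)
    level = start * 2
    while level <= max_level:
        tput = measure(level)
        if tput <= best_tput * (1 + min_gain):
            break  # plateau: the server has hit its bottleneck
        best_level, best_tput = level, tput
        level *= 2
    return best_level, best_tput


# Synthetic server model that saturates at 170,000 msg/s:
print(find_saturation(lambda n: min(n * 10_000, 170_000)))  # (32, 170000)
```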

In terms of architecture model


RabbitMQ

RabbitMQ follows the AMQP protocol. A RabbitMQ Broker consists of Exchanges, Bindings, and Queues; Exchanges and Bindings together route each message according to its routing key. Producers communicate with the server over Channels, and Consumers take messages from Queues for consumption (over a long-lived connection, messages on the Queue are pushed to the Consumer, which continuously reads them from the input stream). RabbitMQ is broker-centric and has a message acknowledgement mechanism.
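To show how Exchanges and Bindings route a message by its routing key, here is a toy in-memory model of a direct exchange. It is a simplified illustration of the AMQP routing rule, not the RabbitMQ client API; `DirectExchange` and the queue names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class DirectExchange:
    """Toy model of an AMQP direct exchange: a message is copied to every
    queue bound with a binding key equal to the message's routing key."""

    def __init__(self) -> None:
        self.bindings: Dict[str, Set[str]] = defaultdict(set)  # binding key -> queue names
        self.queues: Dict[str, Deque[str]] = {}

    def bind(self, queue: str, binding_key: str) -> None:
        self.queues.setdefault(queue, deque())
        self.bindings[binding_key].add(queue)

    def publish(self, routing_key: str, body: str) -> int:
        targets = self.bindings.get(routing_key, set())
        for name in targets:
            self.queues[name].append(body)
        return len(targets)  # 0 means unroutable; this sketch just drops it


exchange = DirectExchange()
exchange.bind("error-log", "error")
exchange.bind("all-log", "error")
exchange.bind("all-log", "info")
print(exchange.publish("error", "disk full"))  # 2
print(exchange.publish("debug", "noise"))      # 0
```

Other AMQP exchange types (fanout, topic, headers) differ only in the matching rule applied in `publish`.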

Kafka

Kafka follows the general MQ structure of Producer, Broker, and Consumer, but is consumer-centric: the consumer keeps track of its own consumption position (offset) and pulls data from the Broker in batches starting at that position, with no per-message acknowledgement mechanism.
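The consumer-centric pull model can be sketched with an append-only log and a consumer that owns its offset. This is a toy model with hypothetical class names, not the Kafka client API; real Kafka consumers usually commit their offsets back to the cluster, whereas here the offset simply lives in the consumer object, as the paragraph describes.

```python
from typing import List


class PartitionLog:
    """Append-only log for one partition; each record is addressed by its offset."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def fetch(self, offset: int, max_records: int) -> List[str]:
        return self._records[offset:offset + max_records]  # no per-consumer state here


class PullConsumer:
    """The consumer, not the broker, tracks where it is in the log."""

    def __init__(self, log: PartitionLog, batch_size: int = 2) -> None:
        self.log = log
        self.batch_size = batch_size
        self.offset = 0

    def poll(self) -> List[str]:
        batch = self.log.fetch(self.offset, self.batch_size)
        self.offset += len(batch)  # advance the consumer's own position
        return batch


log = PartitionLog()
for r in ("a", "b", "c"):
    log.append(r)
consumer = PullConsumer(log)
print(consumer.poll(), consumer.poll(), consumer.poll())  # ['a', 'b'] ['c'] []
consumer.offset = 0          # rewinding the offset replays old messages
print(consumer.poll())       # ['a', 'b']
```

Because the log is never modified by reads, rewinding the offset is all it takes to re-read messages.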

In terms of throughput


Kafka

Kafka has high throughput: internally it uses message batching and a zero-copy mechanism, and data is stored and read with sequential batch operations on local disk at O(1) complexity, so message processing is very efficient.
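Message batching is the easiest of these techniques to show in isolation. The sketch below is hypothetical: `BatchingProducer` is not a real client class, and it only loosely mirrors the role of Kafka's `batch.size` producer setting by sending one request once the buffered bytes reach a threshold. The `linger.ms` time limit is not modelled.

```python
from typing import List


class BatchingProducer:
    """Hypothetical sketch of producer-side batching: messages accumulate in a
    buffer and are sent as one request once the buffered size reaches
    batch_size bytes (loosely the role of Kafka's batch.size setting;
    the linger.ms time limit is not modelled here)."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._buffer: List[bytes] = []
        self._buffered_bytes = 0
        self.requests: List[List[bytes]] = []  # stands in for network sends

    def send(self, message: bytes) -> None:
        self._buffer.append(message)
        self._buffered_bytes += len(message)
        if self._buffered_bytes >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.requests.append(self._buffer)  # one round trip for many messages
            self._buffer = []
            self._buffered_bytes = 0


producer = BatchingProducer(batch_size=300)
for _ in range(10):
    producer.send(b"x" * 124)  # 124-byte messages, as in the stress test
producer.flush()
print([len(r) for r in producer.requests])  # [3, 3, 3, 1]
```

Ten messages cost four requests instead of ten, which is where batching saves network round trips and per-request overhead.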

RabbitMQ

RabbitMQ's throughput is somewhat lower than Kafka's, because the two start from different goals. RabbitMQ supports reliable message delivery and transactions, but not batch operations; to meet reliable-storage requirements, messages can be stored in memory or on disk.

In terms of availability


RabbitMQ

RabbitMQ supports mirrored queues: when the master queue fails, a mirror queue takes over.
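The mirrored-queue takeover can be illustrated with a toy model. In this sketch every publish is copied to the master and all mirrors, and when the master's node fails the oldest surviving mirror becomes the new master; the class and node names are hypothetical and this is not RabbitMQ code. Note also that recent RabbitMQ versions deprecate classic mirrored queues in favor of quorum queues.

```python
from typing import Dict, List


class MirroredQueue:
    """Toy model of a mirrored queue: each message is replicated to the master
    and every mirror; if the master's node fails, the oldest surviving mirror
    becomes the new master."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)  # nodes[0] is the master; mirrors follow in age order
        self.replicas: Dict[str, List[str]] = {n: [] for n in self.nodes}

    @property
    def master(self) -> str:
        return self.nodes[0]

    def publish(self, message: str) -> None:
        for node in self.nodes:
            self.replicas[node].append(message)

    def fail(self, node: str) -> None:
        self.nodes.remove(node)  # if it was nodes[0], the next-oldest mirror takes over
        del self.replicas[node]


q = MirroredQueue(["rabbit@a", "rabbit@b", "rabbit@c"])
q.publish("m1")
q.fail("rabbit@a")
print(q.master, q.replicas[q.master])  # rabbit@b ['m1']
```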

Kafka

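Kafka's Brokers support an active/standby mode: each partition has a leader replica and follower replicas, and a follower takes over when the leader fails.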

In terms of cluster load balancing


Kafka

Kafka uses ZooKeeper to manage the Brokers and Consumers in a cluster, and Topics can be registered in ZooKeeper. Through ZooKeeper's coordination mechanism, the Producer keeps track of which Brokers serve each Topic and can send messages to them randomly or round-robin. The Producer can also choose a partition based on message semantics (for example, a message key), so that the message goes to a specific partition on a Broker.
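Choosing a partition from a message key, with round-robin for unkeyed messages, can be sketched as follows. This is a simplified, hypothetical partitioner: Kafka's Java client hashes keys with murmur2, and newer clients use a "sticky" strategy rather than strict round-robin for unkeyed records. `zlib.crc32` is used here only because it is deterministic and in the standard library.

```python
import itertools
import zlib
from typing import Iterator, Optional


class SimplePartitioner:
    """Keyed messages go to a fixed partition (hash of the key), so messages
    with the same key stay in order; unkeyed messages are spread round-robin.
    Hypothetical sketch; crc32 stands in for Kafka's murmur2 hash."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._rr: Iterator[int] = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            return next(self._rr)  # no key: round-robin across partitions
        return zlib.crc32(key) % self.num_partitions  # same key -> same partition


p = SimplePartitioner(3)
print([p.partition(None) for _ in range(4)])             # [0, 1, 2, 0]
print(p.partition(b"user-42") == p.partition(b"user-42"))  # True
```

Pinning a key to one partition is what gives per-key ordering, since ordering is only guaranteed within a partition.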

RabbitMQ

RabbitMQ relies on a separate load balancer to spread load across cluster nodes.

To sum up


RabbitMQ is more reliable than Kafka, while Kafka is better suited to high-throughput I/O processing, such as log collection in an ELK stack.

Both Kafka and RabbitMQ are designed for distributed deployment, but they make very different assumptions about the message semantics model. I am skeptical of the argument that "AMQP is more mature". Let the facts speak: look at which solution actually solves your problem.
[1] Kafka is the better fit when: you have a large number of events (more than 100,000/sec); you need them delivered at least once, in partitioned order, to a mix of online and batch consumers; you want to be able to re-read messages; and you can accept the currently limited node-level high availability, or you don't mind relying on forum/IRC support for software that is still in its infancy.
[2] RabbitMQ is the better fit when: you have fewer events (20,000+/sec) and need complex routing logic to reach consumers; you want reliable per-message delivery; you don't care about delivery order; you need cluster-node-level high availability now; and you need 24/7 paid support (forum/IRC support is also available).


Origin blog.csdn.net/zhengzhaoyang122/article/details/109384180