This is my summary of the differences between ActiveMQ, RocketMQ, RabbitMQ, Kafka

Message Queue
Reposted from: http://www.cnblogs.com/williamjie/p/9481780.html
Out of respect for the original work, thanks to the author for writing this article.
The blogger has two friends, A and B:

Little A works in the traditional software industry (at an outsourcing company serving a social security bureau). His daily work is discussing requirements with the product team and changing business logic; otherwise he chats with operations, writes a few SQL statements, and generates reports. Or he receives a notice from customer service that feature XX is broken, fixes the data, and then deploys the fix after hours. He lives this kind of life every day, with zero technical growth.
Little B works at a state-owned enterprise. Although he has access to some middleware technologies, all he can do is subscribe to and publish messages; in plain terms, he just calls the API. Why use this middleware? How is high availability ensured? He does not fully understand.
Fortunately, both friends are very motivated, so the blogger wrote this article to help them review the main points of message queue middleware.

Key points for review
This article covers the following points:

Why use message queues?
What are the disadvantages of using message queues?
How to choose a message queue?
How to ensure that the message queue is highly available?
How to ensure that the message is not repeatedly consumed?
How to ensure reliable transmission of messages?
How to ensure the order of the message?
We elaborate on the seven points above. Note that this article is not a "message queues from beginner to expert" course. It only offers a way to organize your review and does not teach you how to call a message queue's API. Readers who are new to message queues should first read an introductory blog on the topic and then come back to this article to get more out of it.
Text
1. Why use Message Queuing?
Analysis: It is a bit awkward for someone who uses message queues not to know why they are used. Without reviewing this point, it is easy to be stumped by the question and start talking nonsense.
Answer: For this question, we only cover the three most important application scenarios (there are certainly others, but these are the main three): decoupling, asynchronous processing, and peak shaving.

(1) Decoupling
Traditional mode:
[Figure: traditional mode]
Disadvantages of the traditional mode:

The coupling between systems is too tight. As shown in the figure above, System A calls System B and System C directly in its code. If System D needs to be connected later, System A's code has to be modified, which is too much trouble.
Middleware mode:
[Figure: middleware mode]
Advantages of the middleware mode:

System A writes the message to the message queue, and each system that needs the message subscribes to it from the queue on its own, so System A does not need any modification.
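The decoupling idea can be sketched with a toy in-memory topic. This is only an illustration under stated assumptions: the `Topic` class and the names `DecouplingSketch` and `run` are hypothetical, and a real deployment would use an actual broker instead of a list of callbacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class DecouplingSketch {

    /** Toy in-memory topic standing in for a real message queue (hypothetical). */
    static class Topic {
        private final List<Consumer<String>> subscribers = new ArrayList<>();

        void subscribe(Consumer<String> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(String message) {
            // The publisher does not know who the subscribers are.
            for (Consumer<String> subscriber : subscribers) {
                subscriber.accept(message);
            }
        }
    }

    /** System A publishes once; Systems B, C, and D each receive the message. */
    public static List<String> run() {
        List<String> received = new ArrayList<>();
        Topic orders = new Topic();

        orders.subscribe(m -> received.add("B got " + m));
        orders.subscribe(m -> received.add("C got " + m));
        // System D is connected later by subscribing; System A's code is untouched.
        orders.subscribe(m -> received.add("D got " + m));

        orders.publish("order-1"); // this is all System A does
        return received;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Adding a new downstream system only adds a `subscribe` call on the consumer side, which is the point of the middleware mode.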
(2) Asynchronous
Traditional mode:
[Figure: traditional mode]
Disadvantages of the traditional mode:

Some non-essential business logic runs synchronously, which takes too much time.
Middleware mode:
[Figure: middleware mode]
Advantages of the middleware mode:

System A writes messages to the message queue, and the non-essential business logic runs asynchronously, which shortens the response time.
(3) Peak shaving
Traditional mode:
[Figure: traditional mode]
Disadvantages of the traditional mode:

When concurrency is high, all requests hit the database directly, causing database connection errors.
Middleware mode:
[Figure: middleware mode]
Advantages of the middleware mode:

System A pulls messages from the message queue at a steady pace that matches the concurrency the database can handle. In production, a short backlog during the peak is acceptable.
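A minimal sketch of peak shaving, assuming an in-memory queue stands in for the message queue and that the database can absorb a fixed number of writes per tick. The class name `PeakShavingSketch`, the method `drain`, and the parameter `maxPerTick` are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class PeakShavingSketch {

    /**
     * Drains the backlog at no more than maxPerTick messages per tick and
     * returns how many messages were handled in each tick.
     */
    public static List<Integer> drain(Queue<String> backlog, int maxPerTick) {
        List<Integer> handledPerTick = new ArrayList<>();
        while (!backlog.isEmpty()) {
            int handled = 0;
            while (handled < maxPerTick && !backlog.isEmpty()) {
                backlog.poll(); // stand-in for one database write
                handled++;
            }
            handledPerTick.add(handled);
        }
        return handledPerTick;
    }

    public static void main(String[] args) {
        Queue<String> backlog = new ArrayDeque<>();
        for (int i = 0; i < 10; i++) {
            backlog.add("request-" + i); // a burst of 10 requests arrives at once
        }
        // The database only ever sees 4 writes per tick.
        System.out.println(drain(backlog, 4)); // prints [4, 4, 2]
    }
}
```

The burst is absorbed by the queue, and the database load stays capped at the rate the consumer chooses.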

2. What are the disadvantages of using Message Queuing?
Analysis: If a project uses MQ and nobody has even considered this question, introducing MQ brings risk to the project. When we introduce a technology, we must fully understand its drawbacks so that we can guard against them. Remember: do not dig a pit for the company!
Answer: This one is also easy to answer, from the following two angles:

Reduced system availability: Think about it. Before, your system was fine as long as the other systems ran well. Now that a message queue has been added, if the message queue goes down, your system goes down with it. System availability is therefore reduced.
Increased system complexity: Many issues must be considered, such as consistency, how to ensure messages are not consumed repeatedly, and how to ensure reliable message delivery. There is more to think about, so system complexity increases.
Nevertheless, we still have to use it.

3. How to choose a message queue?
First, a caveat: the blogger only knows ActiveMQ, RabbitMQ, RocketMQ, and Kafka, and has no knowledge of other MQs such as ZeroMQ, so the answer is based on these four only.
Analysis: Since the project uses MQ, the popular MQs in the industry should have been investigated beforehand. If you pick an MQ by personal preference without knowing the strengths and weaknesses of each, you are digging a pit for the company. If the interviewer asks "Why did you use this MQ?" and you answer "The boss decided", that answer looks very weak. Again, do not dig a pit for the company.
Answer: First, let's visit the ActiveMQ community and see how often this MQ is updated:

Apache ActiveMQ 5.15.3 Release

Christopher L. Shannon posted on Feb 12, 2018

Apache ActiveMQ 5.15.2 Released

Christopher L. Shannon posted on Oct 23, 2017

Apache ActiveMQ 5.15.0 Released

Christopher L. Shannon posted on Jul 06, 2017

(Remaining records omitted)


We can see that ActiveMQ only publishes a version every few months. It is said that the team's focus has shifted to their next-generation product, Apollo.
Next, let's visit the RabbitMQ community and look at RabbitMQ's update frequency:

RabbitMQ 3.7.3 release 30 January 2018

RabbitMQ 3.6.15 release 17 January 2018

RabbitMQ 3.7.2 release 23 December 2017

RabbitMQ 3.7.1 release 21 December 2017

(Remaining records omitted)


We can see that RabbitMQ releases versions much more frequently than ActiveMQ. RocketMQ and Kafka are not shown here; in short, they are also more active than ActiveMQ, and you can check the details yourself.
Here is another performance comparison table:
[Figure: performance comparison table]
Combining the material above, we arrive at the following two points:
(1) Small and medium-sized software companies are advised to choose RabbitMQ. Erlang is inherently good at high concurrency, and RabbitMQ's management interface is very convenient to use. But as the saying goes, what makes it also breaks it: its weakness lies in the same place. RabbitMQ is open source, but how many programmers in China can customize Erlang code? Fortunately, the RabbitMQ community is very active and can help resolve bugs encountered during development, which matters a great deal to small and medium-sized companies. Kafka is ruled out because small and medium-sized software companies are not Internet companies and do not have such large data volumes, so they should prefer message middleware with more complete features. RocketMQ is ruled out because it is produced by Alibaba; if Alibaba stops maintaining it, small and medium-sized companies generally cannot spare people to customize RocketMQ, so it is not recommended.
(2) Large software companies should choose between RocketMQ and Kafka according to the specific use case. Large software companies have enough funds to build a distributed environment, and their data volumes are large enough. For RocketMQ, they can also assign people to do custom development; after all, quite a few people in China are able to modify Java source code. As for Kafka, if the business involves log collection, Kafka is definitely the first choice. Which one to pick depends on the usage scenario.

4. How to ensure that the message queue is highly available?
Analysis: As mentioned in the second point, introducing a message queue reduces system availability. In production, nobody runs a message queue in stand-alone mode, so a qualified programmer should understand message queue high availability in depth. Suppose the interviewer asks how your message middleware ensures high availability. If your answer only shows that you know how to subscribe to and publish messages, the interviewer will suspect you have merely toyed with it and never used it in production. Please be a programmer who loves to think, knows how to think, and understands how to think.
Answer: Answering this question requires a deep understanding of the message queue's cluster modes.
Taking RocketMQ as an example, its cluster modes are multi-master mode, multi-master multi-slave asynchronous replication mode, and multi-master multi-slave synchronous double-write mode. The deployment architecture of the multi-master multi-slave mode is shown below (found on the Internet; too lazy to draw it):
[Figure: RocketMQ multi-master multi-slave deployment architecture]
At first glance, the blogger found this diagram very similar to Kafka's, except for the NameServer cluster, where Kafka uses ZooKeeper instead. Both are used to store and discover masters and slaves. The communication process is as follows:
The Producer establishes a long-lived connection with one node in the NameServer cluster (chosen at random), periodically fetches Topic routing information from the NameServer, establishes a long-lived connection with the Broker Master that serves the Topic, and periodically sends heartbeats to the Broker. The Producer can only send messages to the Broker Master. The Consumer is different: it establishes long-lived connections with both the Master and the Slave that serve the Topic, and it can subscribe to messages from either the Broker Master or the Broker Slave.
For comparison, here is Kafka's topology diagram (also found online; too lazy to draw it):
[Figure: Kafka topology]
As shown in the figure above, a typical Kafka cluster contains several Producers (for example, page views generated by a web front end, or server logs, system CPU and memory metrics, and so on), several brokers (Kafka supports horizontal scaling; in general, the more brokers, the higher the cluster throughput), several Consumer Groups, and a ZooKeeper cluster. Kafka uses ZooKeeper to manage the cluster configuration, elect leaders, and rebalance when a Consumer Group changes. Producers publish messages to brokers in push mode, and Consumers subscribe to and consume messages from brokers in pull mode.
As for RabbitMQ, it also has an ordinary cluster mode and a mirrored cluster mode. These are relatively simple; you can study them on your own and understand them within two hours.
When answering high-availability questions, you should be able to draw your MQ's cluster architecture or describe it clearly and logically.

5. How to ensure that the message is not repeatedly consumed?
Analysis: Another way to ask this question is: how do you ensure the idempotency of message consumption? This can be considered a basic question in the message queue field. In other words, it tests your design ability. The answer depends on the specific business scenario, and there is no fixed answer.
Answer: Let me first explain what causes repeated consumption.
  No matter which message queue you use, the cause of repeated consumption is similar. Normally, after a consumer finishes consuming a message, it sends a confirmation to the message queue. The message queue then knows the message has been consumed and deletes it. Different message queues simply use different forms of confirmation. For example, RabbitMQ uses an ACK confirmation, and RocketMQ returns a CONSUME_SUCCESS status. Kafka has the concept of an offset (if this is unfamiliar, find an introductory Kafka tutorial): every message has an offset, and after a consumer consumes a message it must commit the offset so that the message queue knows the message has been consumed. So what causes repeated consumption? Because of network failures and similar problems, the confirmation never reaches the message queue, so the message queue does not know the message was consumed and delivers it again to another consumer.
  How do we solve this? Answer along the following lines, depending on the business scenario:
  (1) Suppose you take the message and perform a database insert. That is easy: give the message a unique primary key. Even if it is consumed repeatedly, the second insert causes a primary key conflict, which keeps dirty data out of the database.
  (2) Suppose you take the message and perform a Redis set operation. That is even easier and needs no fix, because the result is the same no matter how many times you set it; set is inherently an idempotent operation.
  (3) If neither of the above applies, bring out the big gun: prepare a third-party store for consumption records. Taking Redis as an example, assign each message a global id, and once the message has been consumed, write <id, message> to Redis as a key-value pair. Before a consumer starts consuming a message, it first checks Redis for a consumption record.
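Approach (3) above can be sketched as follows. This is a minimal illustration, assuming an in-memory set stands in for Redis; the names `IdempotentConsumerSketch`, `handle`, and `balance` are hypothetical. Note one deliberate simplification: the sketch records the id and checks for it in a single atomic step before processing, rather than checking first and writing afterwards.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentConsumerSketch {

    // Stand-in for the Redis consumption record, keyed by global message id.
    private final Set<String> consumedIds = ConcurrentHashMap.newKeySet();
    private int balance = 0;

    /**
     * Applies the message once. Returns true if the message was applied,
     * false if it was a duplicate delivery and therefore skipped.
     */
    public synchronized boolean handle(String messageId, int amount) {
        // add() returns false when the id is already present, so check and
        // record happen together.
        if (!consumedIds.add(messageId)) {
            return false;
        }
        balance += amount; // the actual business operation
        return true;
    }

    public synchronized int balance() {
        return balance;
    }

    public static void main(String[] args) {
        IdempotentConsumerSketch consumer = new IdempotentConsumerSketch();
        consumer.handle("msg-1", 100);
        consumer.handle("msg-1", 100); // redelivered after a lost confirmation
        System.out.println(consumer.balance()); // prints 100
    }
}
```

In a real system the record would also need to be removed or marked failed if the business operation throws, otherwise a failed message would be treated as already consumed.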

6. How to ensure reliable transmission of messages?
Analysis: When using a message queue, we must make sure that messages are neither consumed more than once nor lost. Without reliable transmission, the company may lose tens of millions. If you skip reliable transmission, you are digging a pit for the company: you can simply walk away, but who bears the loss? Again, take every project seriously and do not dig a pit for the company.
Answer: For reliable transmission, every MQ must be analyzed from three angles: the producer losing data, the message queue losing data, and the consumer losing data.

RabbitMQ
(1) Producers lose data
To keep producers from losing messages, RabbitMQ provides the transaction mode and the confirm mode.
With the transaction mechanism, you open a transaction before sending the message (channel.txSelect()) and then send the message. If an exception occurs while sending, the transaction is rolled back (channel.txRollback()); if sending succeeds, the transaction is committed (channel.txCommit()).
The drawback is that throughput drops. In the blogger's experience, confirm mode is therefore what most production systems use. Once a channel enters confirm mode, every message published on that channel is assigned a unique ID (starting from 1). Once the message has been delivered to all matching queues, RabbitMQ sends an Ack to the producer (containing the message's unique ID), which tells the producer that the message has correctly reached its destination queue. If RabbitMQ fails to process the message, it sends you a Nack, and you can retry. The code for handling Ack and Nack is as follows (a small snippet, posted on the sly):

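    import com.rabbitmq.client.ConfirmListener;
    import java.io.IOException;

    // Put the channel into confirm mode before registering the listener.
    channel.confirmSelect();

    channel.addConfirmListener(new ConfirmListener() {
        @Override
        public void handleNack(long deliveryTag, boolean multiple) throws IOException {
            // The broker could not process the message; retry the send here.
            System.out.println("nack: deliveryTag = " + deliveryTag + " multiple: " + multiple);
        }

        @Override
        public void handleAck(long deliveryTag, boolean multiple) throws IOException {
            // The message reached its destination queue(s).
            System.out.println("ack: deliveryTag = " + deliveryTag + " multiple: " + multiple);
        }
    });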
(2) Message queue loses data
To handle data loss inside the message queue, persistence to disk is generally turned on. Persistence can be combined with the confirm mechanism: the Ack is sent to the producer only after the message has been persisted to disk. That way, if RabbitMQ dies before the message is persisted, the producer never receives the Ack and can resend the message.
So how do you enable persistence? It is actually very easy and takes two steps:
1. Set the queue's durable flag to true, which makes it a persistent queue.
2. When sending a message, set deliveryMode = 2.
With these settings, even if RabbitMQ goes down, the data can be recovered after a restart.
(3) Consumers lose data
Consumers usually lose data because they use automatic acknowledgement mode. In this mode, the consumer automatically confirms receipt of a message, and RabbitMQ immediately deletes it. If the consumer then hits an exception and fails to process the message, the message is lost.
The solution is simply to switch to manual acknowledgement.

Kafka
First, here is a diagram of Kafka's replication data flow:
[Figure: Kafka replication data flow]
When a Producer publishes a message to a Partition, it first finds the Leader of that Partition through ZooKeeper. Then, whatever the Topic's Replication Factor is (that is, however many Replicas the Partition has), the Producer sends the message only to the Leader of the Partition. The Leader writes the message to its local Log, and each Follower pulls data from the Leader.
Based on this, we get the following analysis.
(1) Producer loses data
In production, a Kafka partition basically has one leader and multiple followers, and the followers synchronize the leader's data. To keep producers from losing data, make the following two configurations:

The first is to set acks = all on the producer side. This guarantees that a message is considered sent successfully only after the followers have finished synchronizing it.
The second is to set retries = MAX on the producer side, so that a failed write is retried indefinitely.
(2) Message queue loses data
The message queue loses data in essentially one way: the data has not been synchronized yet, the leader goes down, and ZooKeeper promotes another follower to leader, so the unsynchronized data is lost. For this situation, two configurations should be made:

The replication.factor parameter must be greater than 1, that is, each partition must have at least 2 replicas.
The min.insync.replicas parameter must be greater than 1, which requires the leader to see at least one follower that is still in sync with it.
Used together with the producer configuration above, these two settings basically ensure that Kafka does not lose data.
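The producer-side half of these settings can be sketched as a plain `Properties` object. This is only an illustration: `acks` and `retries` are Kafka producer configuration keys, while the class and method names are hypothetical, and `Integer.MAX_VALUE` is an assumption about what "MAX" means above. The replication factor and `min.insync.replicas` are set on the topic or broker side and are not shown.

```java
import java.util.Properties;

public class KafkaProducerSettingsSketch {

    /** Builds the producer-side reliability settings described above. */
    public static Properties reliableProducerProps() {
        Properties props = new Properties();
        // Wait until the in-sync replicas have the message before treating
        // the send as successful.
        props.setProperty("acks", "all");
        // Keep retrying failed writes (assumed meaning of "retries = MAX").
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        return props;
    }

    public static void main(String[] args) {
        Properties props = reliableProducerProps();
        System.out.println("acks=" + props.getProperty("acks"));
        System.out.println("retries=" + props.getProperty("retries"));
    }
}
```

These properties would be merged with the usual connection and serializer settings before being passed to a producer.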

(3) Consumers lose data
This usually happens when offsets are committed automatically and the consumer crashes partway through processing, so Kafka believes the message has already been handled.
Let me emphasize again what an offset is: it is the position that each consumer group has consumed up to in a Kafka topic partition. Simply put, each message corresponds to one offset. Each time you consume data you commit an offset, and the next consumption resumes from the committed position.
For example, suppose a topic holds 100 messages and I consume 50 of them (offsets 0 to 49, since offsets start from 0) and commit. The committed offset recorded by Kafka is then 50, the position of the next message to read, so the next consumption starts from offset 50.
The solution is also very simple: switch to manual commits, and commit only after processing has finished.
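The effect of committing manually after processing can be simulated without a broker. The sketch below is an assumption-laden model, not Kafka client code: a list stands in for one partition, `committedOffset` holds the position of the next message to read, and `crashAtOffset` is a hypothetical parameter that simulates the consumer dying before it processes that message.

```java
import java.util.List;

public class ManualCommitSketch {

    private final List<String> log;   // one partition; index = offset
    private int committedOffset = 0;  // position of the next message to consume

    public ManualCommitSketch(List<String> log) {
        this.log = log;
    }

    /**
     * Consumes from the committed offset onward and commits after each
     * message is processed. Stops early at crashAtOffset (use -1 for no
     * crash). Returns how many messages were processed in this run.
     */
    public int poll(int crashAtOffset) {
        int processed = 0;
        for (int offset = committedOffset; offset < log.size(); offset++) {
            if (offset == crashAtOffset) {
                return processed; // simulated crash: nothing committed for this message
            }
            handle(log.get(offset));
            processed++;
            committedOffset = offset + 1; // manual commit, only after processing
        }
        return processed;
    }

    private void handle(String message) {
        // Business logic would go here.
    }

    public int committedOffset() {
        return committedOffset;
    }

    public static void main(String[] args) {
        ManualCommitSketch consumer =
                new ManualCommitSketch(List.of("m0", "m1", "m2", "m3", "m4"));
        System.out.println(consumer.poll(3) + " processed, committed " + consumer.committedOffset());
        System.out.println(consumer.poll(-1) + " processed, committed " + consumer.committedOffset());
    }
}
```

Because the offset is committed only after `handle` returns, the message at offset 3 is picked up again on the second run instead of being lost. The price is that a crash between processing and committing leads to a repeat, which is why idempotent consumption matters.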

ActiveMQ and RocketMQ
Please look these up yourself.

7. How to ensure the order of the messages?
Analysis: Not every company has this business requirement, but the question is still worth reviewing.
Answer: Use some algorithm to route the messages that must stay in order to the same message queue (a partition in Kafka, a queue in RabbitMQ), and then have only one consumer consume that queue.
Some people may ask: what if multiple consumers are needed for throughput?
There is no fixed answer to this question. For example, suppose we have a Weibo workflow with three asynchronous operations: posting a Weibo, writing a comment, and deleting the Weibo. In such a business scenario, simply retrying is enough. Say one of your consumers executes the write-comment operation first, but the Weibo has not been posted yet. Writing the comment is bound to fail, so wait a while. Once another consumer has executed the post operation, retry the comment and it will succeed.
In short, my view on this problem is: make sure messages are enqueued in order. Ordering after dequeue is the consumers' responsibility, and there is no fixed recipe for it.
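The routing step from the answer above (same key, same queue) can be sketched with a simple hash. This is an illustrative scheme under stated assumptions, not the partitioner any particular MQ uses by default; the names `OrderedRoutingSketch` and `partitionFor` are hypothetical.

```java
public class OrderedRoutingSketch {

    /**
     * Maps an ordering key (for example a Weibo id) to a partition index in
     * the range [0, partitionCount). Messages with the same key always land
     * in the same partition, so a single consumer sees them in send order.
     */
    public static int partitionFor(String orderingKey, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(orderingKey.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        String[] operations = {"post", "comment", "delete"};
        for (String operation : operations) {
            // All three operations on the same Weibo share one partition.
            System.out.println(operation + " -> partition " + partitionFor("weibo-42", 4));
        }
    }
}
```

Only the relative order of messages that share a key is preserved; messages with different keys may still be processed in any order across partitions.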

RabbitMQ high availability

RabbitMQ is fairly representative because its high availability is based on master-slave replication (non-distributed), so we use RabbitMQ as an example of how the first kind of MQ achieves high availability.
RabbitMQ has three modes: stand-alone mode (demo level), ordinary cluster mode (no high availability), and mirrored cluster mode (with high availability).

Summary
Having written this far, I hope readers will prepare thoroughly for the questions raised in this article; generally speaking, they cover most of what there is to know about message queues. What if the interviewer does not ask these questions? Simple: explain them clearly on your own initiative to show how thoroughly you have thought things through.
Finally, I do not really advocate this kind of last-minute cramming. I hope you will build a solid foundation and be a programmer who loves to think, knows how to think, and understands how to think.

Origin blog.csdn.net/weixin_42531204/article/details/104808778