Review of distributed message queues

Introduction

Why write this article?

The blogger has two friends, Little A and Little B:

  1. Little A works in the traditional software industry (at an outsourcing company serving a social security bureau). His daily work is discussing requirements with the product team and tweaking business logic; otherwise he chats with operations, writes a few SQL queries, and generates the next report, or gets a notice from customer service that some feature is broken, patches the data, and deploys after hours. Living like this every day brings zero technical growth.
  2. Little B works at a state-owned enterprise. Although he has access to some middleware technologies, all he can do is subscribe to and publish messages; in plain terms, he just calls the API. Why use this middleware? How is high availability ensured? He has no clear idea.

Fortunately, both friends are very motivated, so the blogger wrote this article to help them review the main points of message queue middleware.

Review points

This article will roughly focus on the following points:

  1. Why use message queues?
  2. What are the disadvantages of using message queues?
  3. How to choose a message queue?
  4. How to ensure that the message queue is highly available?
  5. How to ensure that messages are not repeatedly consumed?
  6. How to ensure the reliable transmission of consumption?
  7. How to ensure the order of messages?

We will elaborate on these seven points. Note that this article is not a course like "Message Queues from Beginner to Master", so it offers a review outline rather than teaching you how to call a message queue's API. Readers who know nothing about message queues are advised to read a few introductory blog posts first and then come back to this article to get more out of it.

Main text

1. Why use message queues?

Analysis : Using a message queue without knowing why is rather embarrassing. Without reviewing this point, you are easily caught out by questions and left talking nonsense.
Answer : For this question, we answer with the three main application scenarios (others certainly exist, but these are the main three): decoupling, asynchrony, and peak clipping.

(1) Decoupling

Traditional mode:

Disadvantages of the traditional mode:

  • The coupling between systems is too strong. As shown in the figure above, system A directly calls the code of systems B and C. If system D is connected later, system A has to modify its code again, which is too much trouble!

Middleware pattern:

Advantages of the middleware pattern:

  • System A writes a message to the message queue, and each system that needs the message subscribes to the queue on its own, so system A needs no modification when consumers are added or removed.
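The decoupling above can be sketched with a minimal in-process publish/subscribe bus (a simplification for illustration only; `Bus` and the topic name are made up, and a real system would use an MQ client instead). System A only calls `publish`; attaching a new system D is just one more subscription, with no change to A.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class Bus {
    private final Map<String, List<Consumer<String>>> subs = new HashMap<>();

    // Systems B, C, D... subscribe themselves; the publisher never knows them.
    public void subscribe(String topic, Consumer<String> handler) {
        subs.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // System A only calls publish; adding a consumer needs no change here.
    public void publish(String topic, String message) {
        for (Consumer<String> h : subs.getOrDefault(topic, List.of())) {
            h.accept(message);
        }
    }
}
```

Note that the publisher side is untouched when system D joins: the subscription list grows, but `publish` stays the same.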

(2) Asynchronous

Traditional mode:

Disadvantages of the traditional mode:

  • Non-essential business logic runs synchronously, which makes responses too slow.

Middleware pattern:

Advantages of the middleware pattern:

  • Write messages to the message queue and let non-essential business logic run asynchronously, which speeds up the response.
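The asynchronous pattern can be sketched the same way (again a simplification with a plain in-memory queue standing in for the MQ; the operation names are invented): the request handler does only the essential work and enqueues the non-essential steps, so the response returns before they run.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class AsyncSketch {
    private final Queue<Runnable> sideTasks = new ArrayDeque<>();
    public final List<String> log = new ArrayList<>();

    // Essential work happens inline; non-essential work is only enqueued.
    public String handleRequest(String order) {
        log.add("saved " + order);                          // must happen now
        sideTasks.offer(() -> log.add("emailed " + order)); // can happen later
        sideTasks.offer(() -> log.add("scored " + order));
        return "ok:" + order;                               // fast response
    }

    // A background worker drains the queue after the response was sent.
    public void runWorker() {
        Runnable t;
        while ((t = sideTasks.poll()) != null) t.run();
    }
}
```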

(3) Peak clipping

Traditional mode:

Disadvantages of the traditional mode:

  • Under heavy concurrency, all requests hit the database directly, exhausting database connections.

Middleware pattern:

Advantages of the middleware pattern:

  • System A pulls messages from the message queue at whatever rate the database can handle. In production, a brief backlog during the peak is acceptable.
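Peak clipping can be sketched with a plain in-memory queue standing in for the middleware (a simplification; the class and method names are invented for illustration). The front end absorbs a burst instantly, while the consumer drains in small batches at the rate the database can sustain.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class PeakClippingSketch {
    // A plain queue standing in for the message queue middleware.
    private final Queue<String> queue = new ArrayDeque<>();
    private int processedByDb = 0;

    // Fast path: the web tier just enqueues and returns immediately.
    public void acceptRequest(String req) {
        queue.offer(req);
    }

    // Slow path: drain at most dbCapacityPerTick messages per tick,
    // matching what the database can actually handle.
    public int drainOneTick(int dbCapacityPerTick) {
        int handled = 0;
        while (handled < dbCapacityPerTick && !queue.isEmpty()) {
            queue.poll();          // pretend this is the DB write
            processedByDb++;
            handled++;
        }
        return handled;
    }

    public int backlog() { return queue.size(); }
    public int processed() { return processedByDb; }
}
```

A burst of 100 requests is absorbed immediately; with a capacity of 10 writes per tick, the backlog drains over 10 ticks instead of overwhelming the database.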

2. What are the disadvantages of using message queues?

Analysis : If a project introduces MQ without even considering this question, it brings risk to the project. When we introduce a technology, we must fully understand its drawbacks in order to guard against them. Remember: don't dig holes for the company!
Answer : This is also easy to answer, from the following two angles

  • Reduced system availability : Think about it: before, your system was fine as long as the other systems ran well. Now you add a message queue into the mix; if the queue goes down, your system goes down with it. So system availability is reduced.
  • Increased system complexity : Many aspects need to be considered, such as consistency issues, how to ensure that messages are not repeatedly consumed, and how to ensure reliable transmission of messages. Therefore, there are more things to consider and the complexity of the system increases.

However, we should still use it.

3. How to choose a message queue?

Let me say up front that the blogger only knows ActiveMQ, RabbitMQ, RocketMQ, and Kafka, and has no experience with other MQs such as ZeroMQ, so the answer is based on these four.
Analysis : Since MQ is used in the project, you must research the popular MQs in the industry in advance. If you don't even know the strengths and weaknesses of each MQ and simply pick one by preference, you are digging a pit for the project. If the interviewer asks "Why did you use this MQ?" and you answer "The leader decided", that answer is very weak. Again, don't dig holes for the company.
Answer : First, let's visit the ActiveMQ community and look at its release frequency:

Apache ActiveMQ 5.15.3 Release
Christopher L. Shannon posted on Feb 12, 2018
Apache ActiveMQ 5.15.2 Released
Christopher L. Shannon posted on Oct 23, 2017
Apache ActiveMQ 5.15.0 Released
Christopher L. Shannon posted on Jul 06, 2017
... (remaining entries omitted)

We can see that ActiveMQ only releases a version every few months, and it is said their research focus is on the next-generation product, Apollo.
Next, let's go to the RabbitMQ community to see the update frequency of RabbitMQ

RabbitMQ 3.7.3 release  30 January 2018
RabbitMQ 3.6.15 release  17 January 2018
RabbitMQ 3.7.2 release  23 December 2017
RabbitMQ 3.7.1 release  21 December 2017
... (remaining entries omitted)

We can see that RabbitMQ releases versions far more frequently than ActiveMQ. As for RocketMQ and Kafka, I won't show the listings here; in short, both communities are more active than ActiveMQ's. You can check the details yourself.
Here is another feature comparison:

Characteristic comparison (ActiveMQ / RabbitMQ / RocketMQ / Kafka):

  • Development language: Java / Erlang / Java / Scala
  • Single-machine throughput: tens of thousands of msgs/s / tens of thousands of msgs/s / ~100,000 msgs/s / ~100,000 msgs/s
  • Latency: millisecond level / microsecond level / millisecond level / within milliseconds
  • Availability: high (master-slave architecture) / high (master-slave architecture) / very high (distributed architecture) / very high (distributed architecture)
  • Features: mature product used by many companies, plentiful documentation, good support for various protocols / built on Erlang with very strong concurrency, excellent performance, very low latency, and a rich management interface / relatively complete MQ features and good extensibility / supports only the core MQ features (no message query, message backtracking, etc.); it was built for big data and is widely used in that field

Based on the material above, two conclusions follow:
(1) For small and medium-sized software companies, RabbitMQ is recommended. On the one hand, the Erlang language is born for high concurrency, and the management interface is very convenient to use. But as the saying goes, its strength is also its weakness: although RabbitMQ is open source, how many programmers in China can customize Erlang code? Fortunately, the RabbitMQ community is very active and helps resolve bugs encountered during development, which matters a great deal for small and medium-sized companies. Kafka is excluded because, unlike Internet companies, such companies rarely have huge data volumes, and when choosing message middleware they should prefer one with more complete features. RocketMQ is excluded because it is produced by Alibaba: if Alibaba ever abandoned its maintenance, small and medium-sized companies would generally be unable to hire people for customized RocketMQ development, so it is not recommended.
(2) Large software companies should choose between RocketMQ and Kafka according to the specific use case. On the one hand, large companies have enough funds to build distributed environments and large enough data volumes. For RocketMQ, a large company can also assign people to do customized development; after all, plenty of people in China can modify Java source code. As for Kafka, it depends on the business scenario: for log collection, Kafka is definitely the first choice. Which one to pick comes down to the usage scenario.

4. How to ensure that the message queue is highly available?

Analysis : As mentioned in point 2, introducing a message queue reduces system availability, and in production nobody runs a message queue in standalone mode. So, as a qualified programmer, you should have a deep understanding of message queue high availability. If an interviewer asks how your message middleware ensures high availability and your answer shows you can only subscribe and publish messages, the interviewer will suspect you have only toyed with it and never used it in production. Please be a programmer who loves to think, can think, and understands thinking.
Answer : Answering this question actually requires a deep understanding of the message queue's cluster modes.
Take RocketMQ as an example : its clusters support multi-master mode, multi-master multi-slave asynchronous replication mode, and multi-master multi-slave synchronous double-write mode. Here is a deployment architecture diagram for the multi-master multi-slave mode (found online; too lazy to draw one):
[Figure: RocketMQ multi-master multi-slave deployment architecture]
In fact, when the blogger first saw this diagram, it felt similar to Kafka's, except that the NameServer cluster plays the role ZooKeeper plays in Kafka: both are used to store metadata about, and discover, masters and slaves. The communication flow is as follows:
The Producer establishes a long-lived connection to one randomly selected node in the NameServer cluster, periodically fetches topic routing information from it, establishes long-lived connections to the Broker Masters serving the topic, and sends heartbeats to the Brokers regularly. A Producer can send messages only to a Broker Master; a Consumer is different: it connects to both the Masters and the Slaves serving the topic, so it can consume messages from either a Broker Master or a Broker Slave.
What about Kafka ? For comparison, here is a Kafka topology diagram (also found online; too lazy to draw):
[Figure: Kafka cluster topology]
As shown above, a typical Kafka cluster contains several Producers (page views generated by the web front end, server logs, system CPU and memory metrics, and so on), several Brokers (Kafka scales horizontally; generally, the more brokers, the higher the cluster throughput), several Consumer Groups, and a ZooKeeper cluster. Kafka manages cluster configuration through ZooKeeper, elects leaders, and rebalances when a Consumer Group changes. Producers publish messages to Brokers in push mode; Consumers subscribe to and consume messages from Brokers in pull mode.
As for RabbitMQ , it also has ordinary cluster and mirrored cluster modes. You can study them on your own; they are relatively simple and can be understood in a couple of hours.
When answering a high-availability question, you should be able to draw your MQ cluster architecture, or describe it, clearly and logically.

5. How to ensure that messages are not repeatedly consumed?

Analysis : Another way to ask this question is: how do you ensure the idempotency of message consumption? It can be considered a basic problem in the message queue field; in other words, it tests your design ability. The answer depends on the specific business scenario, and there is no single fixed answer.
Answer : First, why does repeated consumption happen?
  In fact, whatever the message queue, the causes of repeated consumption are similar. Normally, when a consumer finishes consuming a message, it sends a confirmation back to the message queue; the queue then knows the message has been consumed and deletes it. Different queues just use different confirmations: RabbitMQ sends an ACK, RocketMQ returns a CONSUME_SUCCESS flag, and Kafka uses the concept of an offset. Briefly (if this is unfamiliar, find an introductory Kafka tutorial): every message has an offset, and after consuming a message, Kafka needs the consumer to commit the offset so the queue knows it was consumed. Why, then, does repeated consumption occur? Because of network failures and the like, the confirmation never reaches the message queue, so the queue does not know the message was consumed and delivers it again to another consumer.
  How to solve it? Answer according to the business scenario, for example:
  (1) Suppose the message drives a database insert. Easy: give the message a unique primary key; even if it is consumed repeatedly, the duplicate causes a primary key conflict instead of dirty data in the database.
  (2) Suppose the message drives a Redis set operation. Also easy, and nothing needs solving, because a set produces the same result no matter how many times it runs; set is naturally idempotent.
  (3) If neither case applies, bring out the big gun: keep consumption records in a third-party medium. Taking Redis as an example, assign each message a global id, and whenever a message is consumed, write <id, message> to Redis as a key-value pair. Before consuming, a consumer checks Redis for an existing consumption record.
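Solution (3) can be sketched with an in-memory set standing in for Redis (a simplification for illustration; a real deployment would use something like a Redis SETNX-style check, possibly with a TTL). Each message carries a global id, and the consumer skips ids it has already recorded.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdempotentConsumer {
    // Stand-in for the third-party medium (e.g. Redis) holding consumed ids.
    private final Set<String> consumedIds = new HashSet<>();
    public final List<String> applied = new ArrayList<>();

    // Returns true if the message was applied, false if it was a duplicate.
    public boolean consume(String messageId, String payload) {
        if (!consumedIds.add(messageId)) {
            return false;            // already consumed: drop the redelivery
        }
        applied.add(payload);        // the actual business side effect
        return true;
    }
}
```

Even if the queue redelivers a message, the business side effect runs only once per message id.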

6. How to ensure the reliable transmission of consumption?

Analysis : When using message queues, we must make sure messages are neither consumed more times than intended nor lost. Failing at reliable transmission can cost the company tens of millions in losses. Likewise, ignoring reliable transmission during use is just digging another hole for the company. Again: take every project seriously, and don't dig holes for the company.
Answer : Reliable transmission must be analyzed for each MQ from three angles: the producer losing data, the message queue losing data, and the consumer losing data.

RabbitMQ

(1) The producer loses data
To keep the producer from losing messages, RabbitMQ provides transactions and confirm mode.
The transaction mechanism: open a transaction before sending (channel.txSelect()), then send the message; if anything goes wrong during sending, roll the transaction back (channel.txRollback()), and if sending succeeds, commit it (channel.txCommit()).
The downside, however, is that throughput drops. So in the blogger's experience, production mostly uses confirm mode. Once a channel enters confirm mode, every message published on it is assigned a unique ID (starting from 1). Once a message has been delivered to all matching queues, RabbitMQ sends the producer an Ack (containing the message's unique ID), letting the producer know the message arrived at the destination queue correctly. If RabbitMQ fails to process the message, it sends you a Nack, and you can retry. The code for handling Ack and Nack looks like this (borrowed rather than written from scratch):

channel.addConfirmListener(new ConfirmListener() {
    @Override
    public void handleNack(long deliveryTag, boolean multiple) throws IOException {
        // Broker failed to handle the message(s): retry or log here.
        System.out.println("nack: deliveryTag = " + deliveryTag + " multiple: " + multiple);
    }

    @Override
    public void handleAck(long deliveryTag, boolean multiple) throws IOException {
        // Broker confirmed the message(s) reached all matching queues.
        System.out.println("ack: deliveryTag = " + deliveryTag + " multiple: " + multiple);
    }
});

(2) The message queue loses data
To deal with the message queue itself losing data, persistence to disk is generally enabled. This persistence configuration can be combined with the confirm mechanism: send the Ack to the producer only after the message has been persisted to disk. That way, if RabbitMQ is killed before the message hits disk, the producer receives no Ack and automatically resends.
How to persist? It is actually very easy, just two steps:
1. When declaring the queue, set its durable flag to true, making it a durable queue
2. When sending a message, set deliveryMode = 2 (persistent)
With these settings, even if RabbitMQ crashes, the data can be recovered after a restart.
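With the RabbitMQ Java client, the two steps look roughly like this (a sketch, not tested against a broker; the queue name, message body, and connection details are placeholders):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class DurablePublish {
    public static void main(String[] args) throws Exception {
        Connection conn = new ConnectionFactory().newConnection();
        Channel channel = conn.createChannel();

        // Step 1: durable = true, so the queue definition survives a restart.
        channel.queueDeclare("orders", true /* durable */, false, false, null);

        // Step 2: PERSISTENT_TEXT_PLAIN sets deliveryMode = 2,
        // so the message itself is written to disk.
        channel.basicPublish("", "orders",
                MessageProperties.PERSISTENT_TEXT_PLAIN,
                "hello".getBytes("UTF-8"));

        channel.close();
        conn.close();
    }
}
```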
(3) The consumer loses data
Consumers generally lose data because of the automatic acknowledgment mode: the consumer acknowledges receipt automatically, and RabbitMQ deletes the message immediately. If the consumer then fails to process the message due to an exception, the message is lost.
The solution: switch to manual acknowledgment.
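Why manual acknowledgment fixes this can be sketched with a toy broker (a simulation, not the real RabbitMQ API, though the method name basicAck mirrors it): a delivered message is parked as "unacked" until the consumer confirms it, so a consumer crash before the ack causes redelivery instead of loss.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class ToyBroker {
    private final Queue<String> ready = new ArrayDeque<>();
    private final Map<Long, String> unacked = new HashMap<>();
    private long nextTag = 1;

    public void publish(String msg) { ready.offer(msg); }

    // Deliver one message; it is parked as unacked until basicAck is called.
    public Map.Entry<Long, String> deliver() {
        String msg = ready.poll();
        if (msg == null) return null;
        long tag = nextTag++;
        unacked.put(tag, msg);
        return Map.entry(tag, msg);
    }

    // Consumer confirms successful processing; only now is the message gone.
    public void basicAck(long tag) { unacked.remove(tag); }

    // Consumer connection died: requeue everything it never acked.
    public void consumerCrashed() {
        ready.addAll(unacked.values());
        unacked.clear();
    }

    public int readyCount() { return ready.size(); }
}
```

With auto-ack, the broker would have deleted the message at delivery time, and the crash would have lost it for good.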

kafka

First, a data-flow diagram of Kafka replication:
[Figure: Kafka replication data flow]
When a Producer publishes a message to a partition, it first finds the partition's Leader through ZooKeeper. Then, whatever the topic's replication factor (i.e., however many replicas the partition has), the Producer sends the message only to the partition's Leader. The Leader writes the message to its local log, and each Follower pulls data from the Leader.
Given the above, the analysis is as follows:
(1) The producer loses data
In Kafka production there is basically one leader and several followers, and the followers synchronize from the leader. To keep the producer from losing data, make the following two configurations:

  1. On the producer side, set acks=all. This ensures a send is considered successful only after the followers have synchronized the message.
  2. On the producer side, set retries=MAX: once a write fails, retry indefinitely.
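These two settings go into the producer's configuration. A minimal sketch using plain string keys (the bootstrap address is a placeholder, and "retries=MAX" here means Integer.MAX_VALUE):

```java
import java.util.Properties;

public class SafeProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder address
        // 1. Wait until all in-sync replicas have the record before acking.
        props.put("acks", "all");
        // 2. Retry failed sends (effectively) forever instead of dropping them.
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        return props;
    }
}
```

In a real application, this Properties object would be passed to the Kafka producer's constructor.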

(2) The message queue loses data
The message queue loses data in essentially one case: the leader dies before the data has been synchronized, ZooKeeper switches a follower to be the new leader, and the data is lost. For this case, make two configurations:

  1. replication.factor: this value must be greater than 1, i.e., each partition must have at least 2 replicas
  2. min.insync.replicas: this value must be greater than 1, which requires a leader to see at least one follower still in contact with it

Combined with the producer configuration above, these two settings basically ensure that Kafka does not lose data.

(3) The consumer loses data
This happens when the offset is committed automatically and the program crashes in the middle of processing: Kafka thinks you have already handled the message. To re-emphasize what the offset is:
offset : the index of how far each consumer group has consumed in a Kafka topic. Simply put, each message corresponds to an offset. If the offset is committed each time data is consumed, the next consumption starts from the committed offset plus one.
For example, a topic has 100 messages and I have consumed and committed 50 of them. The offset recorded on the Kafka server is then 49 (offsets start from 0), and the next consumption starts from offset 50.
The solution is also very simple, just change it to manual submission.
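The difference between auto and manual commit can be sketched with a toy consumer loop (a simulation, not the Kafka client API): with auto-commit, the offset is advanced before processing finishes, so a crash mid-processing skips a message; committing manually after processing makes the restart resume at the unprocessed message.

```java
import java.util.List;

public class CommitSketch {
    // Returns the offset the consumer would resume from after crashing
    // while processing the message at index crashAt.
    public static int resumeOffset(List<String> log, int crashAt, boolean autoCommit) {
        int committed = -1;                       // last committed offset
        for (int offset = 0; offset < log.size(); offset++) {
            if (autoCommit) committed = offset;   // committed before processing
            if (offset == crashAt) break;         // crash mid-processing
            if (!autoCommit) committed = offset;  // committed after processing
        }
        return committed + 1;                     // next fetch starts here
    }
}
```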

ActiveMQ and RocketMQ

Left for the reader to research.

7. How to ensure the order of messages?

Analysis : Not every company has this business requirement, but the question is still worth reviewing.
Answer : Use some algorithm to put the messages whose order must be preserved into the same message queue (a partition in Kafka, a queue in RabbitMQ), and then have only a single consumer consume that queue.
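The routing idea can be sketched as follows (a simplification for illustration; real clients hash the message key to pick a Kafka partition or RabbitMQ queue). Messages that share a key always land in the same partition, so a single consumer per partition sees them in order.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderedRouting {
    private final List<List<String>> partitions = new ArrayList<>();

    public OrderedRouting(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
    }

    // Same key -> same partition, so per-key order is preserved.
    public int route(String key, String message) {
        int p = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(p).add(message);
        return p;
    }

    public List<String> partition(int p) { return partitions.get(p); }
}
```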
Some will ask: what if multiple consumers are needed for throughput?
There is no fixed answer. For example, consider three asynchronous Weibo operations: posting a Weibo, writing a comment, and deleting the Weibo. In such a business scenario, just retry. If a consumer executes "write comment" before the Weibo has even been posted, the comment naturally fails; wait a while until the consumer handling "post Weibo" has run, then retry the comment and it succeeds.
In short, my view on this problem is: guarantee that the queue itself is ordered, and leave the ordering after messages leave the queue to the consumers; there is no fixed recipe.

Summary

Having written this far, I hope that after thoroughly preparing the questions raised in this article, readers will have covered most of the knowledge points about message queues. What if the interviewer asks something else? Simple: explain the question clearly and highlight how comprehensively you have considered it.
Finally, I do not actually advocate this kind of last-minute cramming. I hope everyone builds solid fundamentals and becomes a programmer who loves thinking, understands thinking, and can think.
