Distributed Message Queue Essentials

  1. Little A works in traditional software (an outsourcing company serving a social security bureau). His daily job is discussing requirements with product managers and changing business logic. Otherwise he chats with operations staff, writes some SQL, and generates reports. Or customer service reports that some function is broken, and he fixes the data and deploys the change after hours. Every day is like this, with zero technical growth.
  2. Little B works at a state-owned enterprise and does have access to some middleware. However, all he can do is publish and subscribe to messages; in plain terms, he just calls the API. Why use this middleware? How is high availability ensured? He has no clear idea.

Main points

This article will roughly focus on the following points:

  1. Why use a message queue?
  2. What are the disadvantages of using a message queue?
  3. How to choose a message queue?
  4. How to ensure that the message queue is highly available?
  5. How to ensure that messages are not consumed repeatedly?
  6. How to ensure reliable transmission of messages?
  7. How to ensure the order of messages?

We will go through these seven points in turn. Note that this article is not a "Message Queues from Beginner to Master" course: it offers a way to review the topic, not a tutorial on calling a message queue API. Readers who know nothing about message queues should read a few introductory blog posts first and then come back to this article to get more out of it.

Main text

1. Why use a message queue?

Analysis : Someone who uses a message queue without knowing why is in an awkward position. Without reviewing this point, it is easy to get flustered and start rambling when asked.
Answer : For this question we only cover the three most important application scenarios (there are certainly others, but these are the main three): decoupling, asynchronous processing, and peak clipping.

(1) Decoupling

Traditional mode:
 


Disadvantages of the traditional model :

  • The coupling between systems is too strong. System A calls system B and system C directly in its own code. If system D is connected in the future, system A's code has to be modified again, which is too much trouble!

Middleware pattern:
 


Advantages of the middleware pattern :

  • System A writes the message to the message queue, and each system that needs the message subscribes to the queue on its own, so system A needs no modification when a new system is connected.
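The decoupling idea can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `InMemoryTopic` and `DecouplingDemo` are hypothetical names, the "queue" lives in memory, and delivery is synchronous, whereas a real broker runs as a separate service and delivers asynchronously.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a message queue topic.
final class InMemoryTopic {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    // Downstream systems register themselves; the producer never sees them.
    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // The producer only knows the topic, not who consumes from it.
    void publish(String message) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }
}

final class DecouplingDemo {
    // System A: publishes an event and does nothing else.
    static void systemA(InMemoryTopic topic, String orderId) {
        topic.publish("order-created:" + orderId);
    }

    public static void main(String[] args) {
        InMemoryTopic topic = new InMemoryTopic();
        List<String> log = new ArrayList<>();
        topic.subscribe(m -> log.add("B got " + m));
        topic.subscribe(m -> log.add("C got " + m));
        // Connecting system D is one more subscribe call; systemA is untouched.
        topic.subscribe(m -> log.add("D got " + m));
        systemA(topic, "42");
        log.forEach(System.out::println);
    }
}
```

The point of the sketch is that `systemA` contains no reference to B, C, or D, so adding or removing a consumer never touches the producer's code.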

(2) Asynchronous

Traditional mode:
 


Disadvantages of the traditional model :

  • Some non-essential business logic is run in a synchronous manner, which is too time-consuming.

Middleware pattern:
 


Advantages of the middleware pattern :

  • Write messages to the message queue, and non-essential business logic runs asynchronously to speed up response.
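A minimal sketch of the asynchronous pattern, assuming a hypothetical signup flow in which sending a welcome email is the non-essential step. The class and method names are invented for illustration, and the in-memory queue stands in for a real broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class AsyncSignupDemo {
    // Stand-in for the message queue.
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final List<String> sentEmails = new ArrayList<>();

    // Request path: do the essential work, enqueue the rest, return at once.
    String handleSignup(String user) {
        // The essential step (for example saving the account) would run here.
        queue.add("welcome-email:" + user);
        return "ok:" + user;
    }

    // Worker side: runs one non-essential task off the request path.
    // Returns false when there was nothing to do.
    boolean drainOnce() {
        String task = queue.poll();
        if (task == null) {
            return false;
        }
        sentEmails.add(task);
        return true;
    }

    List<String> sentEmails() {
        return sentEmails;
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncSignupDemo demo = new AsyncSignupDemo();
        // The reply is available before the email has been handled.
        System.out.println(demo.handleSignup("alice"));
        Thread worker = new Thread(demo::drainOnce);
        worker.start();
        worker.join();
        System.out.println(demo.sentEmails());
    }
}
```

The response time of `handleSignup` no longer includes the email step; the cost is that the caller only knows the task was queued, not that it has completed.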

(3) Peak clipping

Traditional mode:
 


Disadvantages of the traditional model :

  • Under high concurrency, all requests hit the database directly, causing database connection errors.

Middleware pattern:
 


Advantages of the middleware pattern :

  • System A pulls messages from the message queue at a pace the database can handle. A brief backlog during the peak is acceptable in production.
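Peak clipping can be sketched as pulling bounded batches from a buffer. The batch size of 4 and the burst of 10 requests are arbitrary numbers chosen for illustration, and `PeakClippingDemo` is a hypothetical name; a `LinkedBlockingQueue` stands in for the broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class PeakClippingDemo {
    // Pull at most maxBatch messages: the amount the database is assumed
    // to handle per tick. Anything beyond that stays in the queue.
    static List<String> pullBatch(BlockingQueue<String> queue, int maxBatch) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // A burst of requests arrives at once and is buffered in the queue.
        for (int i = 0; i < 10; i++) {
            queue.add("request-" + i);
        }
        int tick = 0;
        while (!queue.isEmpty()) {
            List<String> batch = pullBatch(queue, 4);
            System.out.println("tick " + tick + ": wrote " + batch.size()
                    + " rows, backlog " + queue.size());
            tick++;
        }
    }
}
```

The database only ever sees `maxBatch` writes per tick, regardless of how large the burst was; the backlog drains over the following ticks.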

2. What are the disadvantages of using a message queue?

Analysis : If a project introduces MQ without even considering this question, it puts the project at risk. When introducing a technology, we must fully understand its drawbacks so that we can guard against them. Remember, don't dig holes for the company!
Answer : This one is also easy. Answer from the following two angles:

  • Reduced system availability : Before, your system worked as long as the other systems were running normally. Once you add a message queue, your system breaks whenever the message queue goes down. System availability is therefore reduced.
  • Increased system complexity : Many more things have to be considered, such as consistency, how to ensure that messages are not consumed repeatedly, and how to ensure reliable message transmission.

However, we still need to use it.

3. How to choose a message queue?

First of all, the blogger only knows ActiveMQ, RabbitMQ, RocketMQ, and Kafka, and has no experience with ZeroMQ or other MQs, so the answer is based on these four.
Analysis : Since the project uses MQ, the popular MQs in the industry should be researched in advance. Picking an MQ out of personal preference, without understanding the strengths and weaknesses of each, is digging a hole for the project. If the interviewer asks "Why did you use this MQ?" and you answer "The leader decided", that is a very poor answer. Again, don't dig holes for the company.
Answer : First, let's look at the ActiveMQ community to see how often it is updated:

```text
Apache ActiveMQ 5.15.3 Release
Christopher L. Shannon posted on Feb 12, 2018
Apache ActiveMQ 5.15.2 Released
Christopher L. Shannon posted on Oct 23, 2017
Apache ActiveMQ 5.15.0 Released
Christopher L. Shannon posted on Jul 06, 2017
(remaining records omitted)
...
```

We can see that ActiveMQ only releases a version every few months, and the development focus is said to be on its next-generation product, Apollo.
Next, let's look at the RabbitMQ community to see how often RabbitMQ is updated:

```text
RabbitMQ 3.7.3 release  30 January 2018
RabbitMQ 3.6.15 release  17 January 2018
RabbitMQ 3.7.2 release  23 December 2017
RabbitMQ 3.7.1 release  21 December 2017
(remaining records omitted)
...
```

We can see that RabbitMQ releases far more frequently than ActiveMQ. RocketMQ and Kafka are not shown here; in short, both are also much more active than ActiveMQ, and you can check the details yourself.
Here is a feature and performance comparison table:

| Characteristic | ActiveMQ | RabbitMQ | RocketMQ | Kafka |
| --- | --- | --- | --- | --- |
| Development language | Java | Erlang | Java | Scala |
| Single-machine throughput | 10,000 level | 10,000 level | 100,000 level | 100,000 level |
| Timeliness | ms level | µs level | ms level | within ms level |
| Availability | High (master-slave architecture) | High (master-slave architecture) | Very high (distributed architecture) | Very high (distributed architecture) |
| Features | Mature product used in many companies; more documentation; good support for various protocols | Built on Erlang, so strong concurrency, extremely good performance, low latency, and a rich management interface | Relatively complete MQ functionality and good scalability | Supports only the core MQ functions; features such as message query and message traceback are not provided. It was built for big data and is widely used in that field |

Based on the above materials, two conclusions can be drawn:
(1) For small and medium-sized software companies, RabbitMQ is recommended. On the one hand, Erlang is inherently suited to high concurrency, and RabbitMQ's management interface is very convenient. On the other hand, as the saying goes, "success came through Xiao He, and so did failure": its strength is also its weakness. RabbitMQ is open source, but how many programmers in China can customize Erlang code? Fortunately the RabbitMQ community is very active and can help resolve bugs encountered during development, which matters a lot to small and medium-sized companies. Kafka is ruled out because such companies do not have the data volume of Internet companies, and when choosing message middleware they should prefer one with relatively complete functionality. RocketMQ is ruled out because it is produced by Alibaba: if Alibaba stopped maintaining it, small and medium-sized companies generally could not spare people for customized development of RocketMQ.
(2) For large software companies, choose between RocketMQ and Kafka according to the specific use case. Large companies have the funds to build a distributed environment and have a large enough data volume. For RocketMQ, they can also dedicate staff to customized development; after all, plenty of people in China are able to modify Java source code. As for Kafka, if the business scenario involves log collection, Kafka is definitely the first choice. Which one to pick depends on the usage scenario.

4. How to ensure that the message queue is highly available?

Analysis : As mentioned in point 2, introducing a message queue reduces system availability. In production, nobody runs a message queue in standalone mode, so a competent programmer should understand message queue high availability in depth. If the interviewer asks how your message middleware ensures high availability and all you can say is that you publish and subscribe to messages, the interviewer will suspect you have only toyed with it and never used it in production. Please be a programmer who likes to think, knows how to think, and thinks things through.
Answer : Answering this question requires a solid understanding of the message queue's cluster modes.
Taking RocketMQ as an example , its cluster modes are multi-master, multi-master multi-slave with asynchronous replication, and multi-master multi-slave with synchronous double write. The multi-master multi-slave deployment architecture (diagram found online, too lazy to draw):

 

[Figure: RocketMQ multi-master multi-slave deployment architecture]


When the blogger first saw this diagram, it looked very similar to Kafka, except that the NameServer cluster takes the place that ZooKeeper has in Kafka: it stores and discovers the masters and slaves. The communication process is as follows:
The Producer establishes a long-lived connection with one randomly selected node in the NameServer cluster, periodically fetches Topic routing information from it, establishes a long-lived connection with the Broker Master that serves the Topic, and periodically sends heartbeats to that Broker. A Producer can only send messages to a Broker Master. A Consumer is different: it establishes long-lived connections with both the Master and the Slave that serve the Topic, and it can subscribe to messages from either.
As for Kafka , for comparison, here is the Kafka topology diagram (also found online, too lazy to draw):

 

[Figure: Kafka topology architecture]


As shown in the figure above, a typical Kafka cluster contains several Producers (page views generated by the web front end, server logs, system CPU and memory metrics, and so on), several brokers (Kafka scales horizontally; in general, the more brokers, the higher the cluster throughput), several Consumer Groups, and a ZooKeeper cluster. Kafka uses ZooKeeper to manage cluster configuration, elect leaders, and rebalance when a Consumer Group changes. Producers publish messages to brokers in push mode, and Consumers subscribe to and consume messages from brokers in pull mode.
As for RabbitMQ , it has an ordinary cluster mode and a mirrored cluster mode. These are relatively simple; you can study them yourself and understand them in a couple of hours.
When answering high-availability questions, you are expected to be able to draw your MQ cluster architecture, or describe it, clearly and logically.

5. How to ensure that messages are not consumed repeatedly?

Analysis : Another way to ask this question is: how do you ensure idempotent message consumption? It is a basic question in the message queue field and essentially tests your design ability. The answer depends on the specific business scenario; there is no fixed answer.
Answer : First, let's talk about why repeated consumption occurs.
  Whatever the message queue, the causes of repeated consumption are similar. Normally, when a consumer finishes consuming a message, it sends a confirmation to the message queue; the queue then knows the message has been consumed and deletes it. Different message queues simply use different forms of confirmation. RabbitMQ sends an ACK, and RocketMQ returns a CONSUME_SUCCESS flag. Kafka has the concept of an offset (if this is unclear, go find a Kafka beginner-to-expert tutorial): every message has an offset, and after consuming a message the consumer must commit the offset so that the message queue knows it has been consumed. So why does repeated consumption happen? Because of network failures and the like, the confirmation never reaches the message queue, so the queue does not know the message was consumed and delivers it again to another consumer.
  How to solve this problem? The answer depends on the business scenario:
  (1) Suppose you take the message and perform a database insert. That is easy: give the message a unique primary key. Then a repeated consumption only causes a primary key conflict, and no dirty data ends up in the database.
  (2) Suppose you take the message and perform a Redis set operation. That is also easy and needs no special handling: the result is the same no matter how many times you set it, because set is inherently idempotent.
  (3) If neither of the above applies, use the last resort: keep a consumption record in a third-party store. Taking Redis as an example, assign each message a global id. Once a message has been consumed, write <id, message> to Redis as a key-value pair. Before consuming a message, the consumer first checks Redis for an existing consumption record.
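Approach (3) can be sketched as follows. This is an illustration under assumptions, not a production design: a `ConcurrentHashMap` stands in for Redis, `IdempotentConsumer` is a hypothetical class, and the "business logic" is just appending to a list. The check and the write are combined into one atomic `putIfAbsent` so that two concurrent deliveries of the same id cannot both pass the check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class IdempotentConsumer {
    // Stand-in for Redis: global message id -> message body already consumed.
    private final Map<String, String> consumed = new ConcurrentHashMap<>();
    private final List<String> sideEffects = new ArrayList<>();

    // Returns true if the message was processed, false if it was a duplicate.
    boolean consume(String messageId, String body) {
        // putIfAbsent is atomic: only the first delivery of an id sees null.
        if (consumed.putIfAbsent(messageId, body) != null) {
            return false;
        }
        sideEffects.add(body); // the real business logic would run here
        return true;
    }

    List<String> sideEffects() {
        return sideEffects;
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        System.out.println(consumer.consume("msg-1", "deduct 10"));
        // The broker redelivers msg-1 because the first ack was lost.
        System.out.println(consumer.consume("msg-1", "deduct 10"));
        System.out.println(consumer.sideEffects().size());
    }
}
```

With a real Redis, the equivalent atomic step would typically be a `SET` with the `NX` option. Note one trade-off in this sketch: the record is written before the business logic runs, so a real consumer still has to decide what happens if processing fails after the id has been recorded.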

6. How to ensure reliable transmission of messages?

Analysis : When using a message queue, we must ensure that messages are consumed neither more nor fewer times than intended. If reliable transmission cannot be achieved, the company may suffer losses in the tens of millions. If you ignore reliable transmission, aren't you digging a hole for the company? You can walk away, but who bears the company's loss? Again, take every project seriously and don't dig holes for the company.
Answer : Reliable transmission has to be analyzed from three angles for each MQ: the producer losing data, the message queue losing data, and the consumer losing data.

RabbitMQ

(1) The producer loses data.
To keep the producer from losing messages, RabbitMQ provides transaction mode and confirm mode.
With the transaction mechanism, you open a transaction before sending a message (channel.txSelect()) and then send the message. If an exception occurs during sending, the transaction is rolled back (channel.txRollback()); if the message is sent successfully, the transaction is committed (channel.txCommit()).
The drawback is reduced throughput. In the blogger's experience, confirm mode is therefore used most often in production. Once a channel enters confirm mode, every message published on it is assigned a unique ID (starting from 1). Once the message has been delivered to all matching queues, RabbitMQ sends the producer an Ack containing the message's unique ID, so the producer knows the message reached its destination queues. If RabbitMQ fails to process the message, it sends a Nack, and you can retry. The code for handling Ack and Nack is as follows (I said there would be no code, but here is a little anyway):

```java
import com.rabbitmq.client.ConfirmListener;
import java.io.IOException;

// Assumes `channel` is an already opened com.rabbitmq.client.Channel.
channel.confirmSelect(); // put the channel into confirm mode first
channel.addConfirmListener(new ConfirmListener() {
    @Override
    public void handleNack(long deliveryTag, boolean multiple) throws IOException {
        // The broker could not handle the message: resend or log it here.
        System.out.println("nack: deliveryTag = " + deliveryTag + " multiple: " + multiple);
    }

    @Override
    public void handleAck(long deliveryTag, boolean multiple) throws IOException {
        // The broker accepted the message; `multiple` covers all tags up to this one.
        System.out.println("ack: deliveryTag = " + deliveryTag + " multiple: " + multiple);
    }
});
```

(2) The message queue loses data.
To guard against the message queue losing data, persistence to disk is usually enabled. Persistence can be combined with the confirm mechanism: RabbitMQ sends the Ack to the producer only after the message has been persisted to disk. If RabbitMQ dies before the message is persisted, the producer never receives the Ack and can resend the message.
How do you enable persistence? It is actually very easy; just follow these two steps:
1. Set the queue's durable flag to true, which makes it a durable queue.
2. When sending a message, set deliveryMode=2.
With these settings, the data can be recovered after a restart even if RabbitMQ goes down.
(3) The consumer loses data.
Consumer data loss is generally caused by automatic acknowledgement mode. In this mode the consumer acknowledges a message automatically on receipt, and RabbitMQ deletes the message immediately. If the consumer then hits an exception and fails to process the message, the message is lost.
The solution is simply to acknowledge messages manually.
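The difference between the two acknowledgement modes can be shown with a small simulation. It does not use the RabbitMQ client: `AckModeDemo` is a hypothetical in-memory queue that only mimics the removal timing. In the real Java client, the corresponding calls are `basicConsume` with `autoAck` set to `false`, followed by `basicAck` once processing has succeeded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

final class AckModeDemo {
    private final Deque<String> queue = new ArrayDeque<>();

    void publish(String message) {
        queue.addLast(message);
    }

    int size() {
        return queue.size();
    }

    // Delivers the head message to the handler.
    // Returns true if the handler finished without throwing.
    boolean deliver(boolean autoAck, Consumer<String> handler) {
        String message = queue.peekFirst();
        if (message == null) {
            return false;
        }
        if (autoAck) {
            queue.removeFirst(); // acknowledged on delivery, before processing
        }
        try {
            handler.accept(message);
        } catch (RuntimeException e) {
            return false; // with autoAck the message is already gone
        }
        if (!autoAck) {
            queue.removeFirst(); // manual ack: removed only after success
        }
        return true;
    }

    public static void main(String[] args) {
        AckModeDemo auto = new AckModeDemo();
        auto.publish("order-1");
        auto.deliver(true, m -> { throw new IllegalStateException("crash"); });
        System.out.println("auto-ack, messages left: " + auto.size());

        AckModeDemo manual = new AckModeDemo();
        manual.publish("order-1");
        manual.deliver(false, m -> { throw new IllegalStateException("crash"); });
        System.out.println("manual ack, messages left: " + manual.size());
    }
}
```

With manual acknowledgement the failed message stays in the queue and can be delivered again, which is also why consumers need the idempotency measures from point 5.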

Kafka

Here is a data flow diagram of Kafka replication.

[Figure: Kafka replication data flow]


When a Producer publishes a message to a Partition, it first finds the Partition's Leader through ZooKeeper. Whatever the Topic's replication factor (that is, however many Replicas the Partition has), the Producer sends the message only to the Leader. The Leader writes the message to its local log, and each Follower pulls data from the Leader.
From this, the following analysis can be drawn:
(1) The producer loses data.
In a production Kafka deployment, a partition basically has one leader and several followers, and the followers synchronize data from the leader. To prevent the producer from losing data, make the following two settings:

  1. Set acks=all on the producer side. With this setting, a message is considered sent only after all in-sync followers have replicated it.
  2. Set retries=MAX on the producer side, so that a failed write is retried indefinitely.
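The two producer settings can be written down as a plain `java.util.Properties` object, which is the form the Kafka producer accepts. This is a sketch: `ReliableProducerConfig` and the broker address are hypothetical, `Integer.MAX_VALUE` is used as one reading of "MAX", and a real producer would also need serializer settings that are left out here.

```java
import java.util.Properties;

final class ReliableProducerConfig {
    // Builds only the reliability-related settings discussed above.
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // A send succeeds only after all in-sync replicas have the message.
        props.setProperty("acks", "all");
        // Keep retrying failed writes instead of giving up.
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder address for this example.
        Properties props = build("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In recent client versions the total time spent retrying is additionally bounded by `delivery.timeout.ms`, so "infinite" retries are in practice limited by that timeout.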

(2) The message queue loses data.
The message queue loses data when the leader goes down before the data has been synchronized. A follower is then promoted to leader through ZooKeeper, and the unsynchronized data is lost. To guard against this, make two settings:

  1. The replication.factor parameter must be greater than 1, that is, each partition must have at least 2 replicas.
  2. The min.insync.replicas parameter must be greater than 1. This requires the leader to see at least one follower that is still in sync with it.

Combined with the producer settings above, these two settings basically ensure that Kafka does not lose data.

(3) The consumer loses data.
This usually happens when offsets are committed automatically and the program crashes while processing: Kafka believes the messages have already been handled.
To stress once more what the offset is for : it is the position each consumer group has reached in a Kafka topic partition. Put simply, each message has an offset. Each time you consume data and commit, the next consumption starts from the committed position.
For example, suppose a topic holds 100 messages and I consume 50 of them and commit. The last message consumed has offset 49 (offsets start from 0), the committed offset recorded by Kafka is 50, and the next consumption starts from offset 50.
The solution is also very simple: switch to manual commit.
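The commit semantics can be illustrated with a small simulation that does not use the Kafka client. `ManualCommitDemo` is a hypothetical class in which a list stands in for one partition. With the real consumer, the equivalent setup is `enable.auto.commit=false` plus a call to `commitSync()` after the records have been processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ManualCommitDemo {
    private final List<String> log;   // stand-in for one partition
    private long committed = 0;       // offset of the next record to read

    ManualCommitDemo(List<String> log) {
        this.log = log;
    }

    long committed() {
        return committed;
    }

    // Processes up to max records starting at the committed offset.
    // The offset advances only after a record was processed successfully.
    void poll(int max, Consumer<String> handler) {
        long end = Math.min(log.size(), committed + max);
        for (long offset = committed; offset < end; offset++) {
            handler.accept(log.get((int) offset)); // may throw: nothing committed
            committed = offset + 1;                // last processed offset + 1
        }
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            records.add("msg-" + i);
        }
        ManualCommitDemo consumer = new ManualCommitDemo(records);
        consumer.poll(50, record -> { });
        System.out.println("committed offset: " + consumer.committed());
    }
}
```

Because the commit happens after processing, a crash in the handler leaves the offset where it was and the record is read again on the next poll. That trades message loss for possible duplicates, which the idempotency measures from point 5 are meant to absorb.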

ActiveMQ and RocketMQ

Please check it out yourself

7. How to ensure the order of messages?

Analysis : Not every company has this business need, but the question is still worth reviewing.
Answer : Use some routing algorithm to put the messages that must stay in order into the same message queue (a partition in Kafka, a queue in RabbitMQ), and then consume that queue with a single consumer.
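One common choice for the routing algorithm is hashing a business key. The sketch below assumes the key is a post id and uses `String.hashCode()` for simplicity; `KeyRouter` is a hypothetical helper. Kafka's own default partitioner also hashes the record key, but with a murmur2 hash rather than `hashCode()`.

```java
final class KeyRouter {
    // Maps a key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String[] events = {"publish", "comment", "delete"};
        for (String event : events) {
            // Every event of the same post carries the same key,
            // so all of them land in the same partition, in send order.
            int partition = partitionFor("weibo-1001", 4);
            System.out.println(event + " -> partition " + partition);
        }
    }
}
```

Order is then guaranteed only within one key; events of different keys may still be processed in any relative order, which is usually acceptable.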
Some people may ask: what if multiple consumers are needed for the sake of throughput?
There is no fixed answer to this question. For example, take a Weibo workflow with three asynchronous operations: publishing a post, writing a comment, and deleting the post. In a scenario like this, simply retry. Suppose your consumer tries to write the comment first, before the post has been published; writing the comment is bound to fail. Wait a while until another consumer has published the post, then retry, and the comment will succeed.
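The retry idea can be sketched as putting an event back on the queue when its prerequisite has not arrived yet. Everything here is an assumption made for illustration: the `post:<id>` and `comment:<id>` event format, the `RetryOutOfOrderDemo` class, and the `maxAttempts` cap that prevents endless retries.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class RetryOutOfOrderDemo {
    // Processes events of the form "post:<id>" or "comment:<id>".
    // A comment whose post is not known yet goes back to the tail for a retry.
    // Returns the number of processing attempts that were made.
    static int process(Deque<String> events, Set<String> posts, int maxAttempts) {
        int attempts = 0;
        while (!events.isEmpty() && attempts < maxAttempts) {
            attempts++;
            String event = events.pollFirst();
            String[] parts = event.split(":", 2);
            if (parts[0].equals("post")) {
                posts.add(parts[1]);
            } else if (!posts.contains(parts[1])) {
                events.addLast(event); // post not seen yet: retry later
            }
            // Otherwise the comment is applied (omitted in this sketch).
        }
        return attempts;
    }

    public static void main(String[] args) {
        Deque<String> events = new ArrayDeque<>();
        events.add("comment:1"); // arrives before its post
        events.add("post:1");
        int attempts = process(events, new HashSet<>(), 10);
        System.out.println("attempts: " + attempts + ", left over: " + events.size());
    }
}
```

A real consumer would normally add a delay between retries and send events that exhaust `maxAttempts` to a separate place for inspection instead of dropping them.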
In short, my view on this issue is that it is enough to guarantee order on the way into the queue; ordering after messages leave the queue is up to the consumers themselves, and there is no fixed routine.

Origin blog.csdn.net/weixin_45925028/article/details/132984520