Introduction to MQ theory and comparison with mainstream MQ

1. What is MQ?

MQ stands for Message Queue. A queue is one of the basic data structures and works on a "first in, first out" basis. An MQ places the data to be transmitted (messages) in a queue and uses the queue mechanism to deliver them: the producer generates a message and puts it into the queue, and a consumer then processes it. Consumers can either pull messages from a specified queue, or subscribe to the queue and have the MQ server push messages to them.

Like a database, an MQ is an application that is deployed independently on a server and provides an interface for other systems to call.

It is mainly used to decouple communication between systems.

For example:
When a user logs in, the login system has to call the SMS system to send the user a text message confirming the login. At the same time, it has to call the log system to record the login, and call the points system to add points for logging in and signing in, and so on.
In this case, the login system is strongly coupled with the log system, SMS system, points system, etc. This introduces risks such as failed calls and lost information, and increases the complexity of the system.
For instance, if the call to the log system fails after login, the log record for this login is lost and cannot be recovered.
Moreover, executing the calls sequentially makes the login system slow.
With message middleware, the login system only needs to push a task into the message queue after login and does not have to worry about the rest. The other systems fetch tasks from the queue.
This achieves decoupling and asynchronous calls. (Asynchronous is the opposite of synchronous: a synchronous call waits, so when the system executes a task it must wait until the task ends before continuing; an asynchronous call does not wait.)
At the same time, this approach also allows horizontal scaling and improves safety and reliability.
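The login example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard-library `queue.Queue` stands in for a real MQ server, and the names `login`, `worker`, `task_queue` and `handled` are hypothetical, not taken from any MQ product.

```python
import queue
import threading

task_queue = queue.Queue()   # in-process stand-in for an MQ server (assumption)
handled = []                 # records what the consumer has processed


def login(user):
    """Producer: finish the login, enqueue follow-up tasks, return at once."""
    for kind in ("sms", "log", "points"):
        task_queue.put({"kind": kind, "user": user})
    return f"{user} logged in"


def worker():
    """Consumer: pull tasks in FIFO order until the None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            break
        handled.append((task["kind"], task["user"]))


consumer = threading.Thread(target=worker)
consumer.start()

result = login("alice")   # does not wait for SMS, log or points handling
task_queue.put(None)      # tell the worker to stop once the queue is drained
consumer.join()

print(result)
print(handled)
```

The login function never calls the SMS, log or points systems directly; it only knows about the queue, which is the decoupling described above.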


JMS (Java Message Service) is a set of Java application programming interfaces (Java API) that provides services for creating, sending, receiving, and reading messages. The JMS API, designed by Sun and its partners, defines a common set of interfaces and corresponding syntax that enables Java programs to communicate with other messaging components.
AMQP (Advanced Message Queuing Protocol) is an open application-layer standard designed for message-oriented middleware that provides unified messaging services. Clients and message middleware based on this protocol can exchange messages regardless of the client or middleware product or the development language used. RabbitMQ, written in Erlang, is one implementation.

2. Why use MQ (the role of MQ)

Think about what business scenarios you have in actual work, what technical challenges each scenario poses, how troublesome it would be without MQ, and what benefits using MQ brings.
The core roles of MQ: decoupling, asynchronous calls, and peak shaving.

2.1 System decoupling

Scenario 1: Initially, system A sends data to three systems, B, C and D, through interface calls. Suppose system D suddenly says it no longer needs the data and asks A to stop sending it. System A can only modify its code and delete the call to system D. Then system E says it needs the data, and the person in charge of system A has no choice but to change the code again... Worse still, system A must always consider what to do if any of systems B, C, D and E goes down: should it resend? Should it save the message...?

  • In the scenario above, systems B, C, D and E all need the data provided by system A, so system A is heavily coupled with the other four systems. It must always consider what to do if one of them fails and whether to resend data to it. Maintaining system A becomes exhausting.
  • After introducing MQ, system A only needs to put its data into MQ. Any system that wants the data consumes it from MQ, and a system that no longer wants it simply cancels its consumption. System A does not need to consider who consumes the data, does not need to change its code, and does not need to consider whether calls to other systems succeed, fail or time out.

Summary: With the publish/subscribe message model of MQ, system A is decoupled from the other systems.
Interview tip: Think about whether your own systems have a similar situation, where one system or module calls several others, the calls are complicated and troublesome to maintain, and they do not actually need to be synchronous interface calls. Consider whether MQ could be used to decouple those systems asynchronously in your project, and prepare an answer in your own words.
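Scenario 1 can be sketched with a tiny in-process publish/subscribe broker. This is a minimal sketch, not the API of any real MQ: the `Broker` class, the topic name `a.data` and the helper names are all hypothetical.

```python
from collections import defaultdict


class Broker:
    """Toy publish/subscribe broker (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(dict)  # topic -> {name: callback}

    def subscribe(self, topic, name, callback):
        self._subscribers[topic][name] = callback

    def unsubscribe(self, topic, name):
        self._subscribers[topic].pop(name, None)

    def publish(self, topic, message):
        # Iterate over a copy so callbacks may (un)subscribe during delivery.
        for callback in list(self._subscribers[topic].values()):
            callback(message)


received = defaultdict(list)  # system name -> messages it consumed


def make_consumer(name):
    return lambda message: received[name].append(message)


broker = Broker()
for name in ("B", "C", "D"):
    broker.subscribe("a.data", name, make_consumer(name))


def system_a_send(payload):
    """System A only publishes; it does not know who consumes."""
    broker.publish("a.data", payload)


system_a_send("order-1")
broker.unsubscribe("a.data", "D")                    # D no longer wants the data
broker.subscribe("a.data", "E", make_consumer("E"))  # E starts consuming
system_a_send("order-2")

print(dict(received))
```

System D leaving and system E joining required no change to `system_a_send`, which is the point of the scenario.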

2.2 Asynchronous calls

(For details, see the article on asynchronous transmission.)
Scenario 2: Again with the four systems A, B, C and D. System A receives a request and needs to write to its local database, and systems B, C and D also need to write to their databases. The local write in system A takes 3ms, but the writes in the other systems are slower: 200ms for system B, 350ms for system C, and 400ms for system D. The total time from request to response is therefore 3ms+200ms+350ms+400ms=953ms, close to one second. For users, waiting that long after clicking a button is basically unacceptable, and it also reflects poorly on the developer.
Internet companies generally require the response time for a user request to be between 100ms and 200ms, a delay that users barely perceive. The approach above is therefore not advisable.
If MQ is used, system A takes 3ms to write its local database and then sends three messages to MQ. If that takes 5ms, the total time from request to response is only 3ms+5ms=8ms, and the user experience is very good.
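The arithmetic of scenario 2 can be written out directly. The timings are the ones given in the text; the constant and function names are made up for this sketch.

```python
LOCAL_WRITE_MS = 3                                   # system A writes its own database
DOWNSTREAM_WRITE_MS = {"B": 200, "C": 350, "D": 400}
MQ_SEND_MS = 5                                       # sending all three messages to MQ


def sync_latency_ms():
    """A waits for every downstream write before responding."""
    return LOCAL_WRITE_MS + sum(DOWNSTREAM_WRITE_MS.values())


def async_latency_ms():
    """A responds as soon as the messages are handed to MQ."""
    return LOCAL_WRITE_MS + MQ_SEND_MS


print(sync_latency_ms())   # 953
print(async_latency_ms())  # 8
```

Note that the downstream writes still take 200ms to 400ms each; they simply no longer count toward the time the user waits.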

2.3 Traffic peak shaving

(Reduce the pressure on the server during the peak period)
Scenario 3: The outbreak of the novel coronavirus in 2020 caused masks to sell out in the apps of the major online shopping malls. JD Mall launched a flash sale for 3Q masks: users make a reservation at 3:00 pm every day and the sale starts at 8:00 pm. Xiao Ming tried to buy masks for nearly a week from the day JD launched this event, and in doing so witnessed a system with a million concurrent requests go from failing to working well. On the first day there were more than one million reservations, so about a million concurrent requests could be expected at 8:00 pm. When the sale started at 8:00 pm that day, the high concurrency crashed the JD servers and the app reported an exception. JD may not have expected such high concurrency when it launched the event and was caught off guard. The exceptions appeared only in the first day or two. In later sales the response became very slow, but the JD system no longer crashed. This is typically what happens when MQ is introduced (or when an existing MQ is replaced with one of higher throughput), and it is one of the three major benefits of MQ: peak shaving.
The JD system is quiet from 0:00 to 19:00 every day. But as soon as the flash sale starts at 8:00 pm, the number of concurrent requests reaches one million per second.
Assume that the JD database can handle 15,000 concurrent requests per second (not actual data, just for illustration). When the sale starts at 8 o'clock, about a million requests arrive per second, which directly causes system failures. After the rush, however, there may be only tens of thousands of users online and only a few hundred requests per second, which puts no pressure on the system.
If MQ is used, the million requests per second are written into MQ. Because the JD system can process 10,000+ requests per second, it pulls 10,000+ requests from MQ, processes them, and then pulls the next batch, never taking more than the maximum number of requests it can handle at once. This way the system does not go down even at the 8 o'clock peak. During the peak hour, however, the system cannot process requests as fast as users send them, so requests pile up in MQ, possibly tens of millions of them. After the peak, only about a thousand requests enter MQ per second while the JD system keeps processing 10,000+ per second, so the backlog is digested quickly. Users may wait a little longer, but the system never goes down.
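The backlog behaviour described above can be sketched with a second-by-second simulation. The rates follow the illustrative figures in the text (one million requests per second at the peak, about a thousand afterwards, 10,000 processed per second); the 10-second peak length and all names are assumptions chosen to keep the example small.

```python
def simulate(arrivals_per_second, capacity_per_second):
    """Return the MQ backlog at the end of each second."""
    backlog = 0
    history = []
    for arrived in arrivals_per_second:
        backlog += arrived                              # producers write into MQ
        processed = min(backlog, capacity_per_second)   # consumer pulls at most its capacity
        backlog -= processed
        history.append(backlog)
    return history


PEAK_RATE = 1_000_000     # requests per second during the rush
OFF_PEAK_RATE = 1_000     # requests per second afterwards
CAPACITY = 10_000         # requests the system can process per second
PEAK_SECONDS = 10         # assumed peak length for this sketch

arrivals = [PEAK_RATE] * PEAK_SECONDS + [OFF_PEAK_RATE] * 1_200
history = simulate(arrivals, CAPACITY)

peak_backlog = max(history)
# Seconds after the peak until the backlog is first empty.
drain_seconds = history.index(0) + 1 - PEAK_SECONDS

print(peak_backlog)
print(drain_seconds)
```

The consumer never handles more than `CAPACITY` requests in any second, so it is never overloaded; the cost is the waiting time while the backlog drains.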
Decoupling: When a business operation involves several modules, or a message has to be processed by several systems, the main business only needs to send one MQ message after it completes; the other modules consume the message to do their part, which reduces the coupling between modules.
Asynchronous: After the main business finishes, the subordinate business is executed asynchronously through MQ, which shortens the response time and improves the user experience.
Peak shaving: Under high concurrency, requests are processed asynchronously, which lets the system absorb peak traffic and avoids system paralysis.


3. Advantages and disadvantages of message queue

  • Advantages

Decoupling, asynchronous calls, peak shaving (system decoupling, asynchronous calls, traffic peak shaving)

  • Disadvantages
  1. Reduced system availability: The more external dependencies a system introduces, the more risk it faces. Take scenario 1 as an example. Originally systems A, B, C and D worked together without problems, but now an MQ sits in between. Although it brings many benefits, if MQ goes down, the whole system goes down with it.
  2. Increased system complexity: Once MQ is added, how do you ensure that messages are not consumed repeatedly? How do you handle message loss? How do you guarantee the order of message delivery? How do you keep messages consistent across multiple systems? There are many such problems to solve.
  3. Consistency problems: After system A finishes processing and hands the message to MQ, it returns success directly, and the user thinks the request succeeded. But if systems B and C write to their databases successfully and system D fails, what then? The data becomes inconsistent.

So a message queue is actually a very complex piece of architecture. While enjoying the benefits MQ brings, you also need various technical solutions for the series of problems it introduces. After all that is done, the complexity of the system has gone up by an order of magnitude, maybe several times over. But at critical moments you still have to use it...
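As one illustration of the extra work involved, repeated consumption is often handled by making the consumer idempotent. The sketch below shows one common approach, deduplication by message ID; it is not taken from the article, and the class name, message fields and in-memory `set` are assumptions (a real system would need a durable store for the seen IDs).

```python
class IdempotentConsumer:
    """Wraps a handler so each message ID is processed at most once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen_ids = set()   # assumption: in-memory only, lost on restart

    def on_message(self, message):
        """Return True if the message was processed, False if it was a duplicate."""
        msg_id = message["id"]
        if msg_id in self._seen_ids:
            return False
        self._handler(message)
        self._seen_ids.add(msg_id)
        return True


points = {"alice": 0}


def add_points(message):
    points[message["user"]] += message["points"]


consumer = IdempotentConsumer(add_points)
deliveries = [
    {"id": "m1", "user": "alice", "points": 10},
    {"id": "m1", "user": "alice", "points": 10},  # same message delivered again
    {"id": "m2", "user": "alice", "points": 5},
]
results = [consumer.on_message(m) for m in deliveries]

print(results)
print(points)
```

Without the ID check, the redelivered message would have added the points twice.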

What are the advantages and disadvantages of Kafka, ActiveMQ, RabbitMQ and RocketMQ?

| Feature | ActiveMQ | RabbitMQ | RocketMQ | Kafka |
| --- | --- | --- | --- | --- |
| Single-machine throughput | 10,000 level; an order of magnitude lower than RocketMQ and Kafka | 10,000 level; an order of magnitude lower than RocketMQ and Kafka | 100,000 level; RocketMQ is also an MQ that supports high throughput | 100,000 level; high throughput is the biggest advantage of Kafka. Generally used together with big data systems for real-time computing, log collection and similar scenarios |
| Impact of the number of topics on throughput | | | Topics can reach the hundreds or thousands with only a slight drop in throughput. This is a major advantage of RocketMQ: on the same machines it can support a large number of topics | When topics grow from dozens to hundreds, throughput drops significantly. On the same machines, keep the number of Kafka topics small; supporting many topics requires more machine resources |
| Timeliness | ms level | Microsecond level; the lowest latency, a major feature of RabbitMQ | ms level | Latency within ms level |
| Availability | High; master-slave architecture | High; master-slave architecture | Very high; distributed architecture | Very high; Kafka is distributed with multiple replicas of each piece of data, so a few machines going down causes neither data loss nor unavailability |
| Message reliability | A low probability of losing data | | Zero loss can be achieved after parameter tuning and configuration | Zero loss can be achieved after parameter tuning and configuration |
| Function support | The functions in the MQ field are extremely complete | Developed in Erlang, so concurrency is very strong, performance is extremely good and latency is very low | MQ functions are relatively complete; distributed, with good scalability | Functions are relatively simple and mainly cover basic MQ use. Used on a large scale for real-time computing and log collection in big data, where it is the de facto standard |

Summary of advantages and disadvantages:

  • ActiveMQ: Very mature and powerful, and used in a large number of companies and projects in the industry. Occasionally there is a low probability of losing messages. The community and domestic usage are shrinking, and the official community maintains ActiveMQ 5.x less and less, releasing a version only every few months. It is mainly used for decoupling and asynchronous calls, and rarely in large-scale throughput scenarios.
  • RabbitMQ: Developed in Erlang, with extremely good performance and very low latency. Throughput is at the 10,000 level, MQ functions are relatively complete, and the open-source management interface is very good and easy to use. The community is relatively active, with several releases almost every month. In recent years some domestic Internet companies have used RabbitMQ more, but the problems are obvious. Its throughput is lower because its implementation mechanism is relatively heavy. It is also written in Erlang, and few domestic companies have the strength to study and customize Erlang source code. Without that ability, occasional problems are hard to diagnose from the source, the company has very weak control over the product, and it basically depends on the open-source community for maintenance and bug fixes. Dynamic expansion of a RabbitMQ cluster is also troublesome, although this is acceptable. Most of these issues come from the Erlang language itself: the source is difficult to read, customize and control.
  • RocketMQ: The interface is simple and easy to use, and it has been applied on a large scale inside Alibaba, which stands behind it; it can process tens of billions of messages per day. It supports large-scale throughput with very good performance, distributed expansion is convenient, community maintenance is acceptable, reliability and availability are fine, and it supports a large number of topics and complex MQ business scenarios. A big advantage is that it is written in Java, so a company can read the source code, customize its own MQ and keep control of it. Community activity is relatively average but acceptable. The documentation is relatively simple, and the interface does not follow the standard JMS specification, so some systems need a lot of code changes to migrate.
  • Kafka: Its characteristics are clear: it offers fewer core functions, but provides ultra-high throughput, ms-level latency, high availability and reliability, and flexible distributed expansion. It is best to keep the number of topics small to preserve that throughput. The only disadvantage of Kafka is that messages may be consumed repeatedly, which has a very slight impact on data accuracy. In big data and log collection this slight impact can be ignored, so Kafka is naturally suited to big data real-time computing and log collection.

In summary, after various comparisons, my personal opinion:

General business systems that needed MQ started out with ActiveMQ, but hardly anyone uses it now. It has not been proven in large-scale throughput scenarios and its community is not very active, so I personally do not recommend it;

Later, everyone started to use RabbitMQ. The Erlang language does prevent most Java engineers from studying and controlling it in depth, so for a company it is almost a black box, but it is open source, provides relatively stable support, and has an active community;

Now more and more companies use RocketMQ, which is really good. RocketMQ is a distributed message middleware open sourced by Alibaba in 2012. It was later donated to the Apache Software Foundation and became an Apache top-level project on September 25, 2017. As a domestic middleware that has withstood Alibaba's Double Eleven many times with stable and outstanding performance, it has been adopted by more and more enterprises in recent years thanks to its high performance, low latency and high reliability.

Therefore, for small and medium-sized companies with average technical strength and no particularly demanding technical challenges, RabbitMQ is a good choice. RocketMQ can also be used. (Some people worry that the project may be abandoned and the community may die out: although RocketMQ has been donated to Apache, its activity on GitHub is not high. I personally think the chance of it dying out is relatively small.) For large companies with strong infrastructure R&D capabilities, RocketMQ is a good choice.

For scenarios such as real-time computing and log collection in the field of big data, Kafka is the industry standard and there is absolutely no problem using it. Its community is very active, and it is almost the de facto standard in this field.

A comparison from another point of view:
The figure below shows the worldwide search frequency of these MQs in Google Trends from 2018.12 to 2019.12, which reflects the popularity of these middleware products to some extent.
[Figure: Google Trends search frequency of these MQs, worldwide, 2018.12 to 2019.12]
From this figure, we can see that Kafka leads by a wide margin, followed by RabbitMQ; ActiveMQ and Apache Pulsar also have a certain share. The search volume of RocketMQ is almost negligible. For the MQ products other than RocketMQ, popularity can be compared based on this figure. RocketMQ has to be excluded, however: for various reasons many domestic users cannot search through Google, so the statistics for RocketMQ are not accurate.
Now compare the functional differences between RocketMQ and other MQs. Functional features mainly depend on product positioning. For example, Kafka is positioned for high-throughput logging and real-time computing scenarios, while ActiveMQ, RabbitMQ and similar products are positioned as enterprise-level message middleware, so they provide many features useful for enterprise development, such as delayed messages, transactional messages, message retry and message filtering. Kafka does not have these features, but the throughput of those products is significantly lower than that of Kafka.
RocketMQ combines the features of Kafka, ActiveMQ, and RabbitMQ. In terms of performance, it can compete with Kafka; in terms of enterprise-level MQ features, it has many of the features provided by ActiveMQ and RabbitMQ. Therefore, when enterprises choose message middleware, RocketMQ is a product worth considering.

Redis

It is a key-value NoSQL database that is actively developed and maintained. Although it is a key-value storage system, it supports MQ functionality itself, so it can be used as a lightweight queue service. In one test, enqueue and dequeue operations were each executed 1 million times on RabbitMQ and Redis, with the execution time recorded every 100,000 operations. The test data came in four sizes: 128 bytes, 512 bytes, 1K and 10K. The experiment showed that for enqueue, Redis outperforms RabbitMQ when the data is small, but becomes unbearably slow once the data size exceeds 10K; for dequeue, Redis performs very well regardless of data size, while the dequeue performance of RabbitMQ is much lower than that of Redis.

RabbitMQ

It is an open-source message queue written in Erlang. It supports many protocols: AMQP, XMPP, SMTP and STOMP. This is what makes it heavyweight and better suited to enterprise-level development. It implements a broker architecture, which means that messages are first queued in a central queue before being sent to clients. It has good support for routing, load balancing and data persistence.
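The idea of a broker that routes messages into queues can be sketched as follows. This is a simplified in-memory model of routing by exact key match, not RabbitMQ's client API; the `DirectBroker` class, queue names and routing keys are hypothetical.

```python
from collections import defaultdict, deque


class DirectBroker:
    """Toy broker: a message goes to every queue bound to its routing key."""

    def __init__(self):
        self.queues = {}                     # queue name -> deque of message bodies
        self._bindings = defaultdict(list)   # routing key -> list of queue names

    def declare_queue(self, name):
        self.queues.setdefault(name, deque())

    def bind(self, queue_name, routing_key):
        self.declare_queue(queue_name)
        self._bindings[routing_key].append(queue_name)

    def publish(self, routing_key, body):
        """Queue the message centrally; return how many queues received it."""
        targets = self._bindings.get(routing_key, [])
        for queue_name in targets:
            self.queues[queue_name].append(body)
        return len(targets)


broker = DirectBroker()
broker.bind("sms_queue", "user.login")
broker.bind("log_queue", "user.login")
broker.bind("log_queue", "user.logout")

delivered_login = broker.publish("user.login", "alice logged in")
delivered_logout = broker.publish("user.logout", "alice logged out")
delivered_other = broker.publish("user.signup", "bob signed up")  # no binding

print(delivered_login, delivered_logout, delivered_other)
print(list(broker.queues["log_queue"]))
```

Messages wait in the broker's queues until a consumer takes them, which is what "queued in a central queue before being sent to clients" refers to.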





Origin blog.csdn.net/JemeryShen/article/details/126742783