Notes on message queues: choosing an MQ

Before discussing how to choose a message queue, we first need to answer two questions: why use a message queue at all, and could another technology replace it?

  • Why use message queues?

There are three main reasons to use a message queue, and they correspond to its three core characteristics: decoupling, peak clipping, and asynchronous processing.

decoupling

To see what coupling looks like, consider an example. There are four systems, A, B, C, and D, and system A must send messages to B, C, and D so that each can perform its own operation. At first this works fine. After the system has been running for a while, a new system E is added. E also needs data from A, delivered through E's interface, so A's code has to be modified to send data to E. Later, system D tells A that it no longer needs the messages, and A has to comment out or delete the code that sends data to D. One or two such changes are easy, but as requirements accumulate, A becomes painful to modify and ends up tightly coupled to a growing tangle of systems. System A also has to consider failure cases: what if a downstream system is down? What if a call to system D times out? Is a retry mechanism needed? All of this raises A's coupling further and makes its processing more cumbersome.

Introducing an MQ changes the situation considerably. In the same scenario, the new system E only has to read system A's messages from the MQ; nobody needs to tell system A or ask it to change anything. When system D no longer needs A's messages, it simply cancels its subscription in the MQ. The whole process becomes:

1. System A generates a piece of data and sends it to the MQ.

2. Whichever system needs the data consumes it from the MQ on its own.

3. A new system that needs the data consumes it directly from the MQ.

4. A system that no longer needs the data simply stops consuming those messages from the MQ.

With an MQ, system A no longer needs to know who receives its data or maintain sending code for each receiver, and it no longer needs to care whether a call to a downstream system succeeded, failed, or timed out. In short, through the MQ's publish/subscribe model, system A is completely decoupled from the other systems.
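The four steps above can be sketched with a toy in-memory broker. This is only an illustration of the publish/subscribe idea, not how any real MQ is implemented: the `Broker` class, the topic name `a-events`, and the message strings are all hypothetical, and a real MQ would store messages and let consumers pull them rather than calling callbacks synchronously.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory stand-in for an MQ topic with subscribers (hypothetical)."""

    def __init__(self):
        # topic name -> {subscriber name: callback}
        self._subscribers = defaultdict(dict)

    def subscribe(self, topic, name, callback):
        self._subscribers[topic][name] = callback

    def unsubscribe(self, topic, name):
        self._subscribers[topic].pop(name, None)

    def publish(self, topic, message):
        # The producer does not know, or care, who receives the message.
        for callback in list(self._subscribers[topic].values()):
            callback(message)


def demo():
    broker = Broker()
    received = defaultdict(list)

    def make_consumer(name):
        return lambda msg: received[name].append(msg)

    for name in ("B", "C", "D"):
        broker.subscribe("a-events", name, make_consumer(name))

    broker.publish("a-events", "order-1")                  # system A publishes once
    broker.subscribe("a-events", "E", make_consumer("E"))  # E joins; A is unchanged
    broker.unsubscribe("a-events", "D")                    # D leaves; A is unchanged
    broker.publish("a-events", "order-2")
    return dict(received)


result = demo()
```

System A's publishing code is identical before and after E subscribes and D unsubscribes, which is the point of the decoupling argument.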

peak clipping

Traffic peak clipping (also called peak shaving), as the name suggests, smooths out traffic peaks: requests are processed in batches over time, which reduces the pressure on the server. A vivid analogy is a reservoir: the message queue stores the upstream flood and reduces the peak flow entering the downstream river, which limits flood damage. Typical examples are the rush for train tickets during the Spring Festival, when a large number of users try to buy at the same time, and Alibaba's well-known Double 11 flash sale, when hundreds of millions of users flood in within a short period and traffic spikes instantly.

MySQL's request limit illustrates the benefit of peak clipping. The number of requests MySQL can handle at once is limited and depends on the server; on my server it is 151, the default value of MySQL's max_connections setting. If two hundred users perform many operations on my website at the same moment, requests flood into the system, and the peak may reach three or four hundred concurrent requests. Because the system is backed by MySQL, that means three or four hundred SQL operations hitting MySQL at once. MySQL clearly cannot support that many, so it stops working normally and users can no longer use the website. Most of the time, however, concurrency is nowhere near that high and the system is under almost no pressure. This is only an example, of course: a production MySQL instance would not have such a low limit, but the same problem still arises.

What happens once an MQ is added? Suppose requests flood in at 500 per second and all of them are written to the MQ. Because MySQL can handle at most 151 concurrent requests (treated here, for simplicity, as 151 requests per second), the system pulls requests from the MQ at its own pace, 151 per second. Even at the highest peak the system does not go down, because every request is held in the MQ and MySQL is only ever given the maximum it can handle each second. During the peak, 500 requests enter the MQ per second and only 151 leave, so the backlog grows by 349 requests per second and may reach thousands of requests. That is not a problem: after the peak, MySQL keeps processing 151 requests per second, so once the peak has passed, the system works through the backlog fairly quickly.
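The arithmetic above can be checked with a small simulation. It is a sketch under stated assumptions: the 500 and 151 requests per second come from the example, while the 10-second peak duration and the function name `simulate_backlog` are made up for illustration.

```python
def simulate_backlog(arrivals_per_second, capacity_per_second):
    """Return the queue backlog at the end of each second.

    arrivals_per_second: request counts arriving in each second of the peak.
    capacity_per_second: how many requests the consumer drains per second.
    The simulation continues after arrivals stop, until the backlog is empty.
    """
    if capacity_per_second <= 0:
        raise ValueError("capacity_per_second must be positive")
    backlog = 0
    history = []
    second = 0
    while second < len(arrivals_per_second) or backlog > 0:
        if second < len(arrivals_per_second):
            backlog += arrivals_per_second[second]
        # The consumer never takes more than its capacity per second.
        backlog -= min(backlog, capacity_per_second)
        history.append(backlog)
        second += 1
    return history


# Assumed: a 10-second peak at 500 requests/s, drained at 151 requests/s.
history = simulate_backlog([500] * 10, 151)
peak_backlog = max(history)        # 3490 requests queued when the peak ends
seconds_to_drain = len(history)    # 34 seconds in total until the queue is empty
```

Under these assumptions the backlog peaks at 3,490 requests and is cleared 24 seconds after the peak ends, while MySQL never sees more than 151 requests in any second.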

asynchronous processing

Asynchronous processing means that the work triggered by a request is split into several steps, and those steps do not all have to run synchronously within the request. Take a request to a seckill (flash sale) system as an example.

A seckill goes through many steps, for example risk control, inventory locking, order generation, SMS notification, and statistics updates. Without a message queue, the normal flow is: the app sends a request through the gateway, the server calls these steps in turn, and only after they all complete is the result returned to the app through the gateway. Of these five steps, only two actually determine whether the seckill succeeds: risk control and inventory locking. As soon as the user's request passes risk control and the inventory is locked on the server side, the result can be returned to the user. The remaining steps (order generation, SMS notification, and statistics updates) do not have to finish within the seckill request.

Therefore, once the server has completed the first two steps and determined the result of the seckill request, it can return a response to the user immediately and put the request data into the message queue; consumers of the queue then perform the remaining steps asynchronously.

Handling a seckill request is thus reduced from five steps to two. Responses are faster, and during the seckill period most server resources can be devoted to handling seckill requests; after the seckill ends, resources go to the remaining steps. This makes full use of limited server resources to handle more seckill requests. In short, introducing a message queue returns results sooner, reduces waiting, lets the steps run concurrently, and improves the overall performance of the system.
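The split between the two synchronous steps and the three asynchronous ones might look like the following sketch. Python's standard `queue.Queue` and a background thread stand in for the message queue and its consumer; the function names, the `blocked-` prefix rule, and the single-item inventory are hypothetical placeholders, not part of any real seckill system.

```python
import queue
import threading

follow_up_queue = queue.Queue()   # stands in for the message queue
completed_steps = []              # records what the background worker did
stock = {"item-1": 1}             # hypothetical inventory: one unit for sale
stock_lock = threading.Lock()


def risk_control(user_id):
    # Placeholder rule; a real system would call a risk-control service.
    return not user_id.startswith("blocked-")


def lock_inventory(item_id):
    with stock_lock:
        if stock.get(item_id, 0) > 0:
            stock[item_id] -= 1
            return True
        return False


def handle_seckill(user_id, item_id):
    """Synchronous part: only the two steps that decide the result."""
    if not risk_control(user_id) or not lock_inventory(item_id):
        return "failed"
    # Hand the remaining steps to the queue and respond immediately.
    follow_up_queue.put({"user": user_id, "item": item_id})
    return "success"


def worker():
    """Asynchronous part: order generation, SMS notification, statistics."""
    while True:
        task = follow_up_queue.get()
        if task is None:              # sentinel value: stop the worker
            follow_up_queue.task_done()
            break
        for step in ("generate_order", "send_sms", "update_statistics"):
            completed_steps.append((step, task["user"]))
        follow_up_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

first = handle_seckill("alice", "item-1")   # passes both checks
second = handle_seckill("bob", "item-1")    # the only unit is already locked
follow_up_queue.join()                      # wait here only so the demo can inspect results
follow_up_queue.put(None)
thread.join()
```

`handle_seckill` returns as soon as the outcome is known; the order, SMS, and statistics steps run later in the worker, which a real deployment would run as a separate consumer process.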

  • What are the advantages and disadvantages of message queues?

The advantages of message queues are the three core characteristics described above. The disadvantages are the flip side of those advantages.

Reduced system availability: the more external dependencies a system introduces, the more likely it is to fail. Previously, system A simply called the interfaces of systems B, C, and D. Now an MQ sits in between, and if the MQ goes down, the whole system goes down with it.

Increased system complexity: once an MQ is introduced, new questions follow. How do you make sure messages are not consumed more than once? How do you handle lost messages? How do you preserve message order?

Consistency problems: system A returns success after its own processing, so the caller believes the request succeeded. But if any one of systems B, C, and D then fails to process the message, the data ends up inconsistent.
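For the repeated-consumption problem mentioned above, one common approach (not described in the original article) is to make the consumer idempotent by remembering which message IDs it has already applied. The sketch below assumes every message carries a unique `id` field; the class name and the `amount` field are hypothetical, and a real system would persist the seen IDs rather than keep them in memory.

```python
class IdempotentConsumer:
    """Applies each message at most once, keyed by a message ID."""

    def __init__(self):
        self._seen_ids = set()   # in-memory only; a real system would persist this
        self.balance = 0

    def handle(self, message):
        """Return True if the message was applied, False if it was a duplicate."""
        if message["id"] in self._seen_ids:
            return False
        self.balance += message["amount"]
        self._seen_ids.add(message["id"])
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"id": "m1", "amount": 10},
    {"id": "m2", "amount": 5},
    {"id": "m1", "amount": 10},   # the MQ redelivers m1
]
applied = [consumer.handle(m) for m in deliveries]
```

The redelivered message is recognized and skipped, so the balance reflects each message exactly once.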

  • Choosing a message queue

Common MQs now include ActiveMQ, RabbitMQ, RocketMQ, and Kafka. Each of the four MQs has its own advantages and disadvantages.

Stand-alone throughput

  • ActiveMQ: on the order of 10,000, an order of magnitude lower than RocketMQ and Kafka.
  • RabbitMQ: on the order of 10,000, an order of magnitude lower than RocketMQ and Kafka.
  • RocketMQ: on the order of 100,000. RocketMQ is an MQ that can support high throughput.
  • Kafka: on the order of 100,000. High throughput is Kafka's biggest advantage. It is generally paired with big data systems for real-time data computation, log collection, and similar scenarios.

Impact of the number of topics on throughput

  • ActiveMQ: (no entry)
  • RabbitMQ: (no entry)
  • RocketMQ: topics can number in the hundreds or thousands with only a slight drop in throughput. This is a major advantage of RocketMQ: on the same hardware, it can support a large number of topics.
  • Kafka: throughput drops significantly once topics number in the dozens to hundreds. On the same hardware, Kafka's topic count should therefore be kept modest; supporting a large number of topics requires more machine resources.

Timeliness

  • ActiveMQ: millisecond level.
  • RabbitMQ: microsecond level. This is a major feature of RabbitMQ; its latency is the lowest of the four.
  • RocketMQ: millisecond level.
  • Kafka: latency within the millisecond level.

Availability

  • ActiveMQ: high; high availability is based on a master-slave architecture.
  • RabbitMQ: high; high availability is based on a master-slave architecture.
  • RocketMQ: very high; distributed architecture.
  • Kafka: very high. Kafka is distributed and keeps multiple replicas of each piece of data, so a few machines going down causes neither data loss nor unavailability.

Message reliability

  • ActiveMQ: a low probability of losing data.
  • RabbitMQ: (no entry)
  • RocketMQ: zero loss can be achieved after parameter tuning and configuration.
  • Kafka: zero loss can be achieved after parameter tuning and configuration.

Function support

  • ActiveMQ: the feature set in the MQ field is extremely complete.
  • RabbitMQ: developed in Erlang, so its concurrency capability is very strong, its performance is extremely good, and its latency is very low.
  • RocketMQ: relatively complete MQ features; it is distributed and scales well.
  • Kafka: relatively simple features, mainly basic MQ functionality. It is used at large scale for real-time computing and log collection in big data, where it is the de facto standard.

Summary of advantages and disadvantages

  • ActiveMQ: very mature and powerful, and used by a large number of companies and projects in the industry. It occasionally loses messages, with low probability. Community activity and adoption in China are declining: the official community maintains ActiveMQ 5.x less and less and releases a version only every few months. It is mainly used for decoupling and asynchronous processing and is rarely used in large-scale, high-throughput scenarios.

  • RabbitMQ: developed in Erlang, with extremely good performance and very low latency. Throughput is on the order of 10,000 and the MQ feature set is relatively complete. The open-source management interface is very good and easy to use. The community is relatively active, with several releases almost every month, and some Internet companies in China have used RabbitMQ more in recent years. The drawbacks are also obvious. Throughput is lower because its implementation mechanism is heavier. It is written in Erlang, and few companies in China have the ability to study and customize Erlang source code. Without that ability, when problems occasionally occur it is hard to read and understand the source, the company has very little control over the system, and it depends on the open-source community for quick maintenance and bug fixes. Dynamically scaling a RabbitMQ cluster is also troublesome, though that is tolerable; the main problem comes from Erlang itself, which makes the source hard to read and hard to customize and control.

  • RocketMQ: the interface is simple and easy to use, and it has been used at large scale inside Alibaba, which vouches for it. It processes tens of billions of messages a day, supports large-scale throughput, and performs very well. Distributed scaling is convenient, community maintenance is acceptable, reliability and availability are fine, and it supports a large number of topics and complex MQ business scenarios. A big advantage is that it is written in Java, so we can read the source code ourselves and customize and control our own company's MQ. Community activity is about average, the documentation is relatively simple, and the interface does not follow the standard JMS specification, so some systems need substantial code changes to migrate. Because the technology comes from Alibaba, you have to be prepared for the risk that it is abandoned and the community fades away. If your company has the technical strength, RocketMQ is a very good choice.

  • Kafka: its characteristics are clear. It provides relatively few core features, but offers ultra-high throughput, millisecond-level latency, very high availability and reliability, and a distributed design that scales out freely. It is best to keep the number of topics small to preserve that ultra-high throughput. Its one disadvantage is that messages may be consumed more than once, which has a very slight impact on data accuracy; in big data and log collection, that impact can be ignored. These characteristics make it a natural fit for big data real-time computing and log collection.

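In summary, when ordinary business systems first needed an MQ, most chose ActiveMQ. It is rarely used now: it has not been proven in large-scale, high-throughput scenarios, and its community is not very active. Teams later moved to RabbitMQ. Because it is written in Erlang, which keeps most Java engineers from studying its internals, it is close to uncontrollable from a company's point of view; on the other hand, it is open source, relatively stable, well supported, and actively maintained. More and more companies now use RocketMQ, but there is a risk that its community could suddenly fade away, so a company without strong technical capability is better off staying with RabbitMQ. For real-time computing, log collection, and similar scenarios in big data, Kafka is the industry standard: it is a safe choice, its community is very active, and it is not going away.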

 

 


Origin blog.csdn.net/qq_35363507/article/details/105436286