[Message Queue] Introduction to General Basics

1. The essence of MQ

A message queue (hereinafter MQ) can be summed up as "send, store, consume"; put more bluntly, it is a "forwarder".

Whether it is RocketMQ, Kafka, or any other product, the essence of an MQ is the same: one party sends, the queue stores, and another party consumes.
With this essence as the root, let's explore the basics of MQ step by step, from the simple to the more involved.
[Figure: the most primitive message queue model]

The figure above shows the most primitive message queue model, which contains two key concepts: message and queue.

  1. Message: the data to be transmitted. It can be a simple text string or a custom complex format, as long as it can be parsed according to an agreed format.
  2. Queue: a familiar first-in, first-out data structure that serves as the container for messages. Messages enter at the tail and leave from the head; enqueuing corresponds to sending a message and dequeuing to receiving one.

2. Model evolution

If you look at the most commonly used message queue products today (RocketMQ, Kafka, etc.), you will find that they all extend the most primitive message model and introduce new terms such as topic, partition, and queue.

To cut through the complexity, let's start from the evolution of the message model:

2.1 Queue model

The original message queue was a queue in the strict sense: messages are read out in the order in which they were written.
However, a queue has no non-destructive "read" operation. Reading means dequeuing, that is, "deleting" the message from the head of the queue.
[Figure: the queue model]
The figure above shows the queue model. It allows multiple producers to send messages to the same queue. If there are multiple consumers, however, they compete with each other: each message can be received by only one of them, and it is deleted once it has been read.
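
The competing-consumer behavior can be illustrated with a minimal single-threaded sketch. The class name QueueModelDemo and the alternating-turn loop are assumptions made for illustration only; they are not taken from any MQ product.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration of the queue model; not a real MQ API.
class QueueModelDemo {
    private final Queue<String> queue = new ArrayDeque<>();

    // Any number of producers may call send(); messages join the tail.
    void send(String message) {
        queue.offer(message);
    }

    // Receiving is destructive: poll() removes the head, so no other
    // consumer can see this message again. Returns null when empty.
    String receive() {
        return queue.poll();
    }

    // Two consumers take turns receiving until the queue is drained.
    static List<List<String>> run() {
        QueueModelDemo mq = new QueueModelDemo();
        for (int i = 1; i <= 4; i++) {
            mq.send("m" + i);
        }
        List<String> consumerA = new ArrayList<>();
        List<String> consumerB = new ArrayList<>();
        boolean turnA = true;
        String message;
        while ((message = mq.receive()) != null) {
            (turnA ? consumerA : consumerB).add(message);
            turnA = !turnA;
        }
        return List.of(consumerA, consumerB);
    }
}
```

In this sketch consumer A ends up with m1 and m3 and consumer B with m2 and m4: the messages are split between the consumers, and neither sees the full set.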

2.2 Publish-subscribe model

There is one scenario the original queue model cannot satisfy: multiple consumers need the same message data, and each consumer needs to receive every message.

One workaround is to create a separate queue for each consumer and have the producer send a copy to each. This approach is clumsy: the same data is copied several times, which wastes space.

To solve this problem, a new message model evolved: the publish-subscribe model.
[Figure: the publish-subscribe model]
In the publish-subscribe model, the container that stores messages is called a "topic", and a subscriber must "subscribe to the topic" before it can receive messages. In the end, every subscriber receives all the messages on the same topic.

Compare this carefully with the queue model: the producer corresponds to the publisher, the queue to the topic, and the consumer to the subscriber. There is no essential difference between the two. The only difference is whether a given message can be consumed more than once.
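
One way to picture this difference is a topic that stores each message once and keeps a separate read position per subscriber. The sketch below is only an illustration under that assumption; the class TopicDemo, its method names, and the choice to start new subscribers at position 0 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the publish-subscribe model.
class TopicDemo {
    // Each message is stored exactly once.
    private final List<String> log = new ArrayList<>();
    // Subscriber name -> index of the next message it has not yet seen.
    private final Map<String, Integer> offsets = new HashMap<>();

    void subscribe(String subscriber) {
        offsets.putIfAbsent(subscriber, 0);
    }

    void publish(String message) {
        log.add(message);
    }

    // Returns every message this subscriber has not seen yet and advances
    // only this subscriber's position; nothing is deleted from the topic.
    List<String> poll(String subscriber) {
        Integer from = offsets.get(subscriber);
        if (from == null) {
            throw new IllegalStateException("not subscribed: " + subscriber);
        }
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        offsets.put(subscriber, log.size());
        return batch;
    }
}
```

Here reading does not remove anything, so subscribers "a" and "b" each receive the full sequence of messages, while the data itself is kept in a single copy.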

2.3 Conclusion

To summarize, the difference between the two models comes down to unicast versus broadcast.
Moreover, when the publish-subscribe model has only one subscriber, it behaves exactly like the queue model, so it is functionally fully compatible with the queue model. This explains why mainstream products such as RocketMQ and Kafka are built directly on the publish-subscribe model. It also explains why RabbitMQ has an Exchange module: the Exchange decides how messages are delivered, which lets RabbitMQ implement the publish-subscribe model indirectly.

Concepts you may have come across, such as "consumer group", "cluster consumption", and "broadcast consumption", all relate to these two models. So does the most common arrangement at the application level: broadcast between groups, unicast within a group.
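
"Broadcast between groups, unicast within a group" can be sketched by keeping one read position per consumer group rather than per consumer. This is a simplified illustration, not how any particular product implements consumer groups; GroupTopicDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: one shared read position per consumer group.
class GroupTopicDemo {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> groupOffsets = new HashMap<>();

    void publish(String message) {
        log.add(message);
    }

    // Any member of a group calls receive() with the group name.
    // Members of the same group share one position, so each message goes
    // to only one of them (unicast within the group). Different groups have
    // independent positions, so every group sees every message (broadcast
    // between groups). Returns null when the group has caught up.
    String receive(String group) {
        int offset = groupOffsets.getOrDefault(group, 0);
        if (offset >= log.size()) {
            return null;
        }
        groupOffsets.put(group, offset + 1);
        return log.get(offset);
    }
}
```

If two consumers in a group named "orders" call receive("orders") in turn, they get m1 and m2 respectively, while a consumer in another group such as "audit" still receives both m1 and m2.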

3. Application scenarios of MQ

MQ has many application scenarios. The three most frequently encountered are:

  1. System decoupling
  2. Asynchronous communication
  3. Traffic peak shaving

Other common scenarios include:

  • Delayed notification
  • Eventual consistency guarantees
  • Ordered messages
  • Stream processing

In terms of order, the application scenario (the problem) always comes before the message model (the solution).

MQ has developed from the most primitive queue model into today's wide variety of message middleware (platform-level solutions). However much these products vary, they stay true to the same core, thanks to the broad adaptability of the message model.

The message model can also be understood as a solution to the communication problem between producer and consumer.
[Figure: the RPC communication model compared with the MQ communication model]
The figure above compares the communication model of RPC with that of MQ.
The comparison shows two main points:

  • With MQ introduced, what used to be a single RPC becomes two RPCs, and the producer is coupled only to the queue; it does not need to know that the consumer exists at all.
  • The extra intermediate node, the "queue", stores and forwards messages, which effectively turns synchronous communication into asynchronous communication.
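
The two points above can be sketched in-process, with a JDK BlockingQueue standing in for the queue node and plain method calls standing in for the two RPCs. This is an assumption-laden illustration: AsyncHandoffDemo is a hypothetical name, and the latch exists only to simulate a consumer that is not ready when the producer sends.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration of the synchronous-to-asynchronous hand-off.
class AsyncHandoffDemo {
    static List<String> run() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch producerReturned = new CountDownLatch(1);

        // The consumer starts working only after the producer has finished,
        // simulating a consumer that is slow or temporarily unavailable.
        Thread consumer = new Thread(() -> {
            try {
                producerReturned.await();
                String message = queue.take(); // second "RPC": queue -> consumer
                events.add("consumed " + message);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // First "RPC": producer -> queue. The producer knows only the queue
        // and returns without waiting for the message to be consumed.
        queue.offer("order-1");
        events.add("producer returned");
        producerReturned.countDown();

        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(events);
    }
}
```

The recorded events show the producer returning before the message is consumed, which is the decoupling in time that a direct synchronous RPC does not provide.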

4. Design an MQ yourself

4.1 The prototype of MQ

Let's start with a simple version. If we only want a very crude MQ and ignore the requirements of a production environment, how should we design it?

As stated at the beginning of the article, any MQ boils down to send, store, consume; this is the core functional requirement of MQ. From a technical perspective, the MQ communication model can be understood as two RPCs plus message forwarding (store and forward).

With this understanding, anyone with a reasonable programming foundation should be able to write an MQ prototype in under an hour:

1. Use a mature RPC framework (Dubbo or Thrift) directly to implement two interfaces: one for sending messages and one for reading them.
2. Store the messages in local memory, using the ArrayBlockingQueue that ships with the JDK as the data structure.
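
A minimal sketch of the storage half of such a prototype might look as follows. The RPC layer is deliberately left out: in this sketch the two methods are called in-process, and exposing them through Dubbo or Thrift is assumed to happen elsewhere. The class name PrototypeBroker and the timeout parameter are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical prototype broker: the two methods correspond to the
// "send" and "read" interfaces that an RPC framework would expose.
class PrototypeBroker {
    private final BlockingQueue<String> store;

    PrototypeBroker(int capacity) {
        // Bounded in-memory storage; messages are lost if the process exits.
        this.store = new ArrayBlockingQueue<>(capacity);
    }

    // "Send" interface: returns false instead of blocking when the queue is full.
    boolean send(String message) {
        return store.offer(message);
    }

    // "Read" interface: waits up to timeoutMillis for a message and
    // returns null if none arrives in time.
    String read(long timeoutMillis) {
        try {
            return store.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

ArrayBlockingQueue is thread-safe, so calls arriving concurrently from multiple RPC worker threads would not corrupt the queue. Everything a production system needs, such as persistence, topics, and acknowledgements, is intentionally absent.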

4.2 Write an MQ suitable for production environment

Of course, our goal is not just an MQ prototype but message middleware that can be used in a production environment. The difficulty is on a different order of magnitude. Where do we start?

1. First grasp the key points of the problem

Suppose we still consider only the most basic functions: sending messages, storing messages, and consuming messages (with support for the publish-subscribe model).
What challenges do these basic functions face in a production environment? The following quickly come to mind:

1. How do we ensure the performance of sending and receiving messages under high concurrency?
2. How do we ensure high availability and high reliability of the message service?
3. How do we ensure that the service can scale horizontally?
4. How do we ensure that message storage can also scale horizontally?
5. How do we manage the various metadata (such as the nodes in the cluster, the topics, and the consumption relationships), and do we need to consider the consistency of this data?

As you can see, designing an MQ means facing the classic "three highs" problems of high-concurrency systems. How to meet non-functional requirements such as high performance and high reliability is the key to this problem.

2. Overall design ideas

Looking at the overall architecture, three types of roles are involved:
[Figure: overall architecture with the three roles]
In addition, refining the core "send, store, consume" process gives a more complete data flow:
[Figure: detailed data flow of sending, storing, and consuming]
Based on the two figures above, we can quickly work out the responsibilities of the three roles:

1. Broker (server): the core of MQ, where almost all of the core logic lives. It provides RPC interfaces to producers and consumers and is responsible for storing, backing up, and deleting messages, and for maintaining consumption relationships.
2. Producer: one of the MQ clients. It calls the RPC interface provided by the Broker to send messages.
3. Consumer: the other MQ client. It calls the RPC interface provided by the Broker to receive messages and to acknowledge consumption.

3. Detailed design

Below, we will discuss some specific technical difficulties and feasible solutions.

  • Difficulty 1: RPC communication

The problem to solve here is communication between the Broker and the Producer and Consumer. If you don't want to reinvent the wheel, you can use a mature RPC framework such as Dubbo or Thrift directly, so you don't have to deal with service registration and discovery, load balancing, communication protocols, serialization, and so on.

Alternatively, you can build the underlying communication on Netty, use ZooKeeper, Eureka, or similar as the registry, and then either define a custom communication protocol (as Kafka does) or implement a standardized MQ protocol such as AMQP (as RabbitMQ does). Compared with using an RPC framework directly, this approach offers more room for customization and optimization.

  • Difficulty 2: High-availability design

High availability involves two aspects: high availability of the Broker service and high availability of the storage solution. They can be discussed separately.

For the Broker service, it is enough that Brokers can scale horizontally and be deployed as a cluster. High availability is further backed by automatic service registration and discovery, load balancing, timeout and retry mechanisms, and an ack mechanism for sending and consuming messages.

For storage, there are two approaches:

1) Follow Kafka's partition plus multi-replica model. This requires a data replication and consistency scheme for the distributed setting (protocols such as Zab or Raft) and automatic failover.
2) Use a mainstream database, distributed file system, or KV system with persistence, all of which come with their own high-availability solutions.

  • Difficulty 3: Storage design

The message storage scheme is the core part of MQ. Reliability was covered under high-availability design; if reliability requirements are low, memory or a distributed cache can be used directly. The focus here is how to make storage fast, and the decisive factor is the design of the storage structure.

The current mainstream solution is an append-only log file (the data) plus index files, an approach used by many mainstream open source MQs. The index can be dense or sparse, messages can be located with a skip list, binary search, and so on, and disk read and write performance can be improved further with operating system techniques such as the page cache and zero copy.
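
To make the "append-only log plus sparse index" idea concrete, here is a small in-memory sketch. It is not the storage format of any real MQ: the byte array stands in for the log file, the record layout (4-byte length followed by the payload) and the index interval of 4 are arbitrary assumptions, and AppendLogDemo is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an append-only log with a sparse index.
class AppendLogDemo {
    private static final int INDEX_INTERVAL = 4; // index every 4th message (assumed)

    private final ByteArrayOutputStream data = new ByteArrayOutputStream(); // stands in for the log file
    private final List<long[]> index = new ArrayList<>(); // entries: {messageOffset, bytePosition}
    private long nextOffset = 0;

    // Appends one record (4-byte length + payload) and returns its message offset.
    long append(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        if (nextOffset % INDEX_INTERVAL == 0) {
            index.add(new long[] {nextOffset, data.size()});
        }
        byte[] header = ByteBuffer.allocate(4).putInt(payload.length).array();
        data.write(header, 0, header.length);
        data.write(payload, 0, payload.length);
        return nextOffset++;
    }

    // Finds a message by offset: binary search in the sparse index,
    // then a short sequential scan forward from the indexed position.
    String read(long offset) {
        if (offset < 0 || offset >= nextOffset) {
            return null;
        }
        int lo = 0;
        int hi = index.size() - 1;
        while (lo < hi) { // find the last index entry whose offset is <= target
            int mid = (lo + hi + 1) >>> 1;
            if (index.get(mid)[0] <= offset) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        long current = index.get(lo)[0];
        int position = (int) index.get(lo)[1];
        // A real implementation would read from the file instead of copying.
        ByteBuffer buffer = ByteBuffer.wrap(data.toByteArray());
        while (current < offset) { // skip records until the target is reached
            position += 4 + buffer.getInt(position);
            current++;
        }
        int length = buffer.getInt(position);
        return new String(buffer.array(), position + 4, length, StandardCharsets.UTF_8);
    }
}
```

The trade-off shown here is the usual one for a sparse index: the index stays small because only every fourth message is recorded, at the cost of scanning at most three records after the binary search.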

If high performance is not a goal, an existing distributed file system, KV store, or database is also an option.

  • Difficulty 4: Consumer relationship management

To support the publish-subscribe (broadcast) model, the Broker needs to know which Consumers subscribe to each topic and deliver messages based on this relationship.

Because Brokers are deployed as a cluster, the consumption relationship is usually kept in shared storage. It can be managed, with change notifications, through a coordination service or configuration center such as ZooKeeper or Apollo.
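
The shape of such a subscription registry can be sketched in memory. In the sketch below a concurrent map stands in for the shared storage and a listener callback stands in for the change notification; no ZooKeeper or Apollo API is used, and SubscriptionRegistry and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical in-memory stand-in for a shared subscription registry.
class SubscriptionRegistry {
    private final Map<String, Set<String>> subscribersByTopic = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, Set<String>>> listeners = new CopyOnWriteArrayList<>();

    // Brokers register a listener to learn about subscription changes.
    void onChange(BiConsumer<String, Set<String>> listener) {
        listeners.add(listener);
    }

    void subscribe(String topic, String consumer) {
        Set<String> subscribers =
                subscribersByTopic.computeIfAbsent(topic, t -> ConcurrentHashMap.newKeySet());
        if (subscribers.add(consumer)) { // notify only on an actual change
            notifyListeners(topic);
        }
    }

    void unsubscribe(String topic, String consumer) {
        Set<String> subscribers = subscribersByTopic.get(topic);
        if (subscribers != null && subscribers.remove(consumer)) {
            notifyListeners(topic);
        }
    }

    // Immutable snapshot of the current subscribers of a topic.
    Set<String> subscribersOf(String topic) {
        return Set.copyOf(subscribersByTopic.getOrDefault(topic, Set.of()));
    }

    private void notifyListeners(String topic) {
        Set<String> snapshot = subscribersOf(topic);
        for (BiConsumer<String, Set<String>> listener : listeners) {
            listener.accept(topic, snapshot);
        }
    }
}
```

A Broker would consult subscribersOf(topic) when delivering a message and use the listener to refresh any local cache when subscriptions change.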

  • Difficulty 5: High-performance design

High-performance storage was discussed above. Performance can be optimized further in other areas as well.

Examples include the Reactor network I/O model, the design of business thread pools, batch sending on the producer side, asynchronous disk flushing on the Broker side, and batch pulling on the consumer side.
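
Of these techniques, batch sending on the producer side is the easiest to sketch. The example below buffers messages and hands them to a transport callback in groups; the callback is a placeholder for one network call to the Broker, and BatchingProducer, the size-only flush policy, and the parameter names are assumptions for illustration. Real clients typically also flush on a timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of producer-side batching.
class BatchingProducer {
    private final int batchSize;
    // Placeholder for the network layer: one call represents one request to the Broker.
    private final Consumer<List<String>> transport;
    private final List<String> buffer = new ArrayList<>();

    BatchingProducer(int batchSize, Consumer<List<String>> transport) {
        this.batchSize = batchSize;
        this.transport = transport;
    }

    // Buffers the message and sends a batch once enough have accumulated.
    synchronized void send(String message) {
        buffer.add(message);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered, for example on shutdown.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        transport.accept(new ArrayList<>(buffer)); // hand over a copy
        buffer.clear();
    }
}
```

With a batch size of 3, sending 7 messages followed by a final flush results in three transport calls (3, 3, and 1 messages) instead of seven, trading a little latency for fewer round trips.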

4.3 Summary

To sum up, here is how to approach the question "how do you design an MQ?":

1. Start from both the functional requirements (sending and receiving messages) and the non-functional requirements (high performance, high availability, high scalability, and so on).

2. Functional requirements are not the hard part; covering the most basic MQ functions is enough. Advanced features such as delayed messages, transactional messages, and retry queues are just icing on the cake.

3. Most important is the ability to start from the functional requirements, lay out the overall data flow, and then follow that flow to work out how to meet the non-functional requirements. That is where the technical difficulty lies.

Origin blog.csdn.net/weixin_40849588/article/details/114903186