Message queue interview questions in one day

The content is taken from my learning website: topjavaer.cn

Why use message queues?

In short, there are three main reasons: decoupling, asynchronous processing, and peak shaving.

1. Decoupling. For example, after a user places an order, the order system has to notify the inventory system. If the order system calls the inventory system directly and the inventory system is unavailable, the stock deduction fails and the whole order fails with it: the two systems are coupled. With a message queue, the order system persists the message and returns success to the user right away. Once the inventory system recovers, it consumes the message and deducts the stock.

2. Asynchronous processing. Non-essential business logic is written to the message queue and runs asynchronously, so it does not slow down the main business flow.

3. Peak shaving. Consumers pull messages from the queue at the rate the database can handle, and a short-lived backlog during a peak is acceptable in production. For example, a flash sale usually causes a traffic surge that can bring the application down. With a message queue in front, the server writes each incoming request to the queue first. If the queue length exceeds the configured maximum, the request is discarded or the user is redirected to an error page.
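
The reject-when-full behaviour in the peak shaving case can be sketched with a bounded in-memory buffer. This is only an illustration using Python's standard library, not a real message queue: `MAX_PENDING`, `accept_request`, and `drain` are hypothetical names, and the capacity of 3 is chosen just to keep the example small.

```python
import queue

# Hypothetical capacity; in practice it is sized to what downstream can absorb.
MAX_PENDING = 3

pending = queue.Queue(maxsize=MAX_PENDING)


def accept_request(request_id):
    """Enqueue a request, or reject it immediately when the queue is full."""
    try:
        pending.put_nowait(request_id)
        return "queued"
    except queue.Full:
        # Corresponds to discarding the request or showing an error page.
        return "rejected"


def drain(batch_size):
    """Consumer side: pull at most batch_size requests at its own pace."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    return batch
```

The producer never blocks: it either gets a slot or is told at once that the system is saturated, while the consumer decides how fast to drain.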

What are the disadvantages of using message queues?

  • Lower system availability. Once a message queue is introduced, an outage of the queue can affect the availability of the business system.
  • Higher system complexity. With a message queue in place, there are more issues to handle, such as consistency, avoiding repeated consumption, and guaranteeing reliable delivery.

Common message queue comparison

Dimension: Overview
Throughput: ActiveMQ and RabbitMQ have throughput at the 10,000 level (ActiveMQ performs worst), an order of magnitude below RocketMQ and Kafka, which reach the 100,000 or even million level.
Availability: All four can be made highly available. ActiveMQ and RabbitMQ rely on a master-slave architecture. RocketMQ is based on a distributed architecture. Kafka is also distributed and keeps multiple replicas of each piece of data, so losing a few machines causes neither data loss nor unavailability.
Latency: RabbitMQ is built on Erlang, so it has strong concurrency, very good performance, and microsecond-level latency. The other three are at the millisecond level.
Feature support: ActiveMQ, RabbitMQ, and RocketMQ are relatively feature-complete. Kafka has a simpler feature set focused on core MQ functions, and it is the de facto standard for real-time computing and log collection in big data.
Message loss: With ActiveMQ and RabbitMQ the probability of losing messages is very low. With RocketMQ and Kafka, messages are in theory not lost.

Summary:

  • The ActiveMQ community is relatively mature, but by today's standards its performance is relatively poor and new versions are released very slowly, so it is not recommended.
  • RabbitMQ has somewhat lower throughput than Kafka and RocketMQ, but because it is built on Erlang it offers strong concurrency, excellent performance, and microsecond-level latency. For the same reason, few companies in China are able to research or customize it at the Erlang source level. If the business does not need very high concurrency (the 100,000 or million level), RabbitMQ should be your first choice among these four. For real-time computing and log collection in big data, Kafka is the industry standard: its community is very active, it is unlikely to be abandoned, and it is close to a de facto standard in this field worldwide.
  • RocketMQ comes from Alibaba and is an open-source Java project, so we can read the source code directly and customize an MQ for our own company. It has been tested by Alibaba's real business scenarios. Its community is reasonably active, though not outstanding. The documentation is relatively simple, and the interface does not follow the standard JMS specification, so migrating some systems requires a lot of code changes. Because it is a technology introduced by Alibaba, you also have to be prepared for the risk that it is abandoned and the community dies out. If your company has the technical strength, RocketMQ is a very good choice.
  • Kafka's characteristics are clear: it provides relatively few core features, but offers ultra-high throughput, millisecond-level latency, very high availability and reliability, and a distributed design that scales out easily. To keep that throughput, it is best to run Kafka with a small number of topics. Its main disadvantage is that messages may be consumed more than once, which has a very slight impact on data accuracy. In big data and log collection that impact is negligible, which makes Kafka a natural fit for real-time computing and log collection.

How to ensure high availability of message queues?

RabbitMQ: mirrored cluster mode

RabbitMQ's high availability is based on a master-slave design. RabbitMQ has three modes: standalone mode, normal cluster mode, and mirrored cluster mode. Standalone mode is rarely used in production. Normal cluster mode only improves throughput by letting multiple nodes in the cluster serve reads and writes for a queue. What actually provides high availability is mirrored cluster mode.

The difference from normal cluster mode is that in mirrored cluster mode a queue, both its metadata and its messages, exists on multiple instances, and every message written to the queue is automatically synchronized to the queue on each of those instances. The advantage is that the failure of any one machine does not affect the others. The disadvantages are: 1. High performance overhead: every message is synchronized to all machines, which puts heavy pressure on network bandwidth; 2. Poor scalability: if a queue is heavily loaded, adding a machine does not help, because the new machine also holds all of the queue's data, so the queue cannot be scaled linearly.

Kafka: partition and replica mechanism

Kafka's basic architecture consists of multiple brokers, each of which is a node. A topic can be divided into multiple partitions, each partition can live on a different broker, and each partition stores part of the data. This makes Kafka a naturally distributed message queue: the data of one topic is spread over multiple machines, and each machine stores part of it.

Before Kafka 0.8, there was no HA mechanism. If any broker went down, its partitions could not be written or read. There was no high availability to speak of.

Since Kafka 0.8 there is an HA mechanism: replicas. The data of each partition is synchronized to other machines to form multiple replicas. The replicas elect a leader, producers and consumers both talk to this leader, and the other replicas are followers. Data written to the leader is replicated to all followers, and reads are served directly from the leader. Kafka distributes the replicas of a partition evenly across different machines to improve fault tolerance.
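
To make the replica idea concrete, here is a toy model in Python. It is not Kafka's actual placement or election algorithm (Kafka uses a controller and an in-sync replica set); `assign_replicas` and `fail_broker` are hypothetical helpers that only show why spreading replicas over brokers lets a partition survive a broker failure.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on distinct brokers, round-robin.

    The first replica in each list is treated as the leader.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [
            brokers[(p + r) % len(brokers)] for r in range(replication_factor)
        ]
    return assignment


def fail_broker(assignment, dead):
    """Drop a dead broker; the next surviving replica becomes the leader."""
    return {
        p: [b for b in replicas if b != dead]
        for p, replicas in assignment.items()
    }
```

With three brokers and a replication factor of 2, every partition still has a replica left after any single broker fails, and a partition that lost its leader is served by the former follower.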

Common MQ protocols

  • AMQP: AMQP (Advanced Message Queuing Protocol) is an open application-layer standard designed for message-oriented middleware that provides unified messaging services. Clients and middleware that implement the protocol can exchange messages regardless of the client or middleware product or the development language used.

    Advantages: reliable, general-purpose

  • MQTT: MQTT (Message Queuing Telemetry Transport) is a lightweight messaging protocol developed by IBM that may become an important part of the Internet of Things. It is supported on all platforms, can connect almost any networked device to the outside world, and is used as a communication protocol for sensors and actuators (for example, connecting a house to the network through Twitter).

    Advantages: simple format, low bandwidth usage, suited to mobile communication, push, and embedded systems

  • STOMP: STOMP (Simple, or Streaming, Text Oriented Messaging Protocol) is a simple text protocol designed for MOM (message-oriented middleware). It provides an interoperable wire format that lets clients interact with any STOMP message broker.

    Advantages: command mode (non-topic/queue mode)

  • XMPP: XMPP (Extensible Messaging and Presence Protocol) is a protocol based on XML, mostly used for instant messaging (IM) and online presence detection, and suitable for near-real-time operations between servers. Its core is XML streaming, and it may eventually let Internet users send instant messages to anyone else on the Internet, even when their operating systems and browsers differ.

    Advantages: general-purpose and open, highly compatible, extensible, and secure. Disadvantage: the XML encoding takes up a lot of bandwidth

  • Other custom protocols based on TCP/IP: some systems (such as Redis, Kafka, and ZeroMQ) do not strictly follow an MQ specification. Instead, according to their own needs, they implement their own protocol on top of TCP/IP and transmit over network sockets to provide MQ functionality.

MQ communication modes

  1. Point-to-point communication: the most traditional and most common communication mode. It supports one-to-one, one-to-many, many-to-many, and many-to-one configurations, as well as topologies such as tree and mesh.
  2. Multicast: MQ suits many types of applications, and one of the most important and fastest-growing is multicast, that is, sending a message to multiple destinations (a destination list). A single MQ command can send one message to multiple destinations and ensure reliable delivery to each of them. MQ also distributes messages intelligently: when a message goes to multiple users on the same system, MQ sends one copy of the message together with the list of recipients on that system to the target MQ system, which replicates the message locally and puts it on the queues in the list, keeping network traffic to a minimum.
  3. Publish/subscribe: publish/subscribe frees message distribution from the geographical location of the destination queue. Messages are distributed by topic or even by content, and users or applications receive the messages they need on that basis. This loosens the coupling between sender and receiver: the sender does not need to know the receiver's destination address, the receiver does not need to know where the message came from, and both simply send and receive by topic. In the MQ product family, MQ Event Broker is the product dedicated to publish/subscribe data communication; it supports publish/subscribe both over queues and directly over TCP/IP.
  4. Cluster: to simplify system configuration in point-to-point mode, MQ provides a cluster solution. A cluster is similar to a domain. Queue managers within a cluster do not need message channels defined between each pair; they communicate with other members over cluster channels, which greatly simplifies configuration. Queue managers in a cluster can also balance load automatically, and when one fails, the others take over its work, which greatly improves the reliability of the system.
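
As an illustration of the publish/subscribe mode in item 3, the following Python sketch routes messages by topic only. It is an in-memory toy, not the API of any MQ product; `TopicBus` and its methods are hypothetical names.

```python
from collections import defaultdict


class TopicBus:
    """In-memory topic-based publish/subscribe."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver to every subscriber of the topic; return how many got it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```

The publisher only names a topic and never refers to a receiver, which is the loose coupling described above.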

How to guarantee message ordering?

RabbitMQ

Split the messages across multiple queues with one consumer per queue; or use a single queue with a single consumer, and have that consumer queue messages in memory and dispatch them to different underlying workers.

Kafka

  1. One topic, one partition, one consumer that consumes in a single thread. Single-threaded throughput is too low, so this is generally not used.

  2. Create N in-memory queues and route all data with the same key to the same in-memory queue; then start N threads, each consuming one in-memory queue. This preserves order.
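
The second approach can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not consumer code for a real Kafka client: `NUM_WORKERS`, `process_in_key_order`, and the `(key, payload)` message shape are hypothetical, and the messages come from a plain list instead of a partition.

```python
import queue
import threading
import zlib

NUM_WORKERS = 4  # hypothetical worker count


def process_in_key_order(messages, handle):
    """Route (key, payload) pairs to per-worker queues by key hash.

    All messages with the same key land in the same in-memory queue and are
    handled by the same thread, so their relative order is preserved.
    """
    queues = [queue.Queue() for _ in range(NUM_WORKERS)]

    def worker(q):
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work
                break
            handle(*item)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for key, payload in messages:
        # crc32 is stable across runs, unlike the built-in hash() for str.
        index = zlib.crc32(key.encode("utf-8")) % NUM_WORKERS
        queues[index].put((key, payload))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
```

Order is guaranteed only per key. Messages with different keys may be processed in any relative order, which is the price paid for parallelism.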

How to avoid repeated message consumption?

On the producer side, MQ internally generates a unique id for each message the producer sends and uses it as the basis for deduplication and idempotence (for when delivery fails and the message is resent), so that duplicate messages do not enter the queue.

On the consumer side, the message body must carry a globally unique id as the basis for deduplication and idempotence, so that the same message is not consumed twice.
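
A minimal Python sketch of consumer-side deduplication is shown below. The message layout (`id`, `amount`) and the `IdempotentConsumer` class are assumptions made for illustration, and the set of processed ids stands in for a durable store such as a database table with a unique index.

```python
class IdempotentConsumer:
    """Applies each message at most once, keyed by its globally unique id."""

    def __init__(self):
        # In production this would be a durable store; a set keeps the
        # sketch self-contained.
        self._processed_ids = set()
        self.balance = 0

    def consume(self, message):
        msg_id = message["id"]
        if msg_id in self._processed_ids:
            return False  # duplicate delivery: skip the side effect
        self.balance += message["amount"]
        self._processed_ids.add(msg_id)
        return True
```

For example, delivering `{"id": "m-1", "amount": 10}` twice changes the balance only once; the second call returns `False`.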

A large number of messages have been backlogged in MQ for a long time. How do you solve it?

In this situation, usually the only option is an urgent temporary scale-out. The steps and reasoning are as follows:

  1. First fix the consumer problem so that its consumption speed is restored, then stop all existing consumers;
  2. Create a new topic with 10 times the original number of partitions, that is, temporarily set up 10 times the original number of queues;
  3. Write a temporary consumer program that only redistributes data. Deploy it to consume the backlog; it does no time-consuming processing and simply writes the messages round-robin, evenly, into the 10x temporary queues;
  4. Temporarily requisition 10 times as many machines to deploy consumers, with each batch of consumers consuming one temporary queue. This is equivalent to temporarily expanding queue and consumer resources by 10 times and consuming at 10 times the normal speed;
  5. Once the backlog has been cleared, restore the original deployment architecture and go back to consuming with the original consumer machines.
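
The temporary redistribution program from step 3 might look like this Python sketch. Plain lists stand in for the backlog and for the temporary queues, and `redistribute` is a hypothetical name; a real version would read from and write to the MQ through its client library.

```python
import itertools


def redistribute(backlog, num_temp_queues):
    """Forward backlog messages round-robin without doing any real work."""
    temp_queues = [[] for _ in range(num_temp_queues)]
    targets = itertools.cycle(temp_queues)
    for message in backlog:
        next(targets).append(message)
    return temp_queues
```

Because the program does nothing but forward, it can drain the backlog far faster than the real consumers, which then work through the temporary queues in parallel.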

What should I do if the message in MQ expires?

With RabbitMQ you can set an expiration time (TTL). A message that stays in the queue longer than that is cleared by RabbitMQ, and the data is gone. The problem then is not that a large amount of data is backlogged in MQ, but that a large amount of data is lost outright. Adding consumers to work through a backlog does not help here, because there is no backlog left, only lost messages.

One solution is a bulk replay. When there is a large backlog, the data is written directly to the database; after the peak has passed, it is read back out batch by batch and refilled into MQ to make up for the lost messages.

How does message middleware achieve high availability?

Take Kafka as an example.

Kafka's basic cluster architecture consists of multiple brokers, each of which is a node. When you create a topic, it can be divided into multiple partitions; each partition holds part of the data and lives on a different broker. In other words, the data of one topic is spread over multiple machines, and each machine holds part of it.

Each partition stores part of the data. If its broker goes down, is that part of the data lost? If so, high availability cannot be guaranteed.

Since Kafka 0.8, a multi-replica mechanism provides high availability: the data of each partition is synchronized to other machines to form multiple replicas. The replicas elect a leader, which handles producers and consumers, and the other replicas are followers. Data written to the leader is replicated to all followers, and reads are served directly from the leader. How does this ensure high availability? If a broker goes down, the partitions on that broker still have replicas on other machines. And if the broker that went down hosted a leader? The remaining followers elect a new leader.

How to ensure data consistency and how to implement transaction messages?

An ordinary MQ message goes through roughly the following process from production to consumption:

[Figure: message consistency 1 (http://img.topjavaer.cn/img/message consistency 1.png)]

  1. The producer generates a message and sends it to the MQ server.
  2. After MQ receives the message, it persists the message to the storage system.
  3. The MQ server returns an ACK to the producer.
  4. The MQ server pushes the message to the consumer.
  5. The consumer consumes the message and responds with an ACK.
  6. After receiving the ACK, the MQ server considers the message successfully consumed and deletes it from storage.

Take placing an order as an example. After the order system creates the order, it sends a message to the downstream system. If the order is created successfully but the message is not sent, the downstream system never learns about the order, and the data becomes inconsistent.
How do we ensure data consistency? Transactional messages can be used. Here is how a transactional message works.

[Figure: message consistency 2 (http://img.topjavaer.cn/img/message consistency 2.png)]

  1. The producer generates a message and sends it to the MQ server as a half (prepared) transactional message.
  2. MQ receives the message and persists it to the storage system with the status "to be sent".
  3. The MQ server returns an ACK to the producer. At this point MQ does not push the message to consumers.
  4. The producer executes its local transaction.
  5. If the local transaction succeeds, the producer sends a commit to the MQ server; if it fails, it sends a rollback.
  6. On commit, the MQ server marks the message as deliverable; on rollback, it deletes the message.
  7. Once the message is deliverable, the MQ server pushes it to the consumer, and the consumer returns an ACK after consuming it.
  8. If the MQ server receives neither a commit nor a rollback for a long time, it queries the producer for the transaction state (a check-back) and then commits or rolls back according to the result.
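
The eight steps can be traced with a small in-memory model. This Python sketch is an assumption-laden toy, not the API of RocketMQ or any other broker: `TxBroker`, `send_in_transaction`, and the `query_local_state` callback are hypothetical names, and persistence is replaced by plain dictionaries and lists.

```python
class TxBroker:
    """Toy broker that tracks half (prepared) messages."""

    def __init__(self):
        self.half = {}         # msg_id -> body, stored but not deliverable
        self.deliverable = []  # committed messages, visible to consumers

    def send_half(self, msg_id, body):
        self.half[msg_id] = body
        return "ACK"

    def commit(self, msg_id):
        self.deliverable.append(self.half.pop(msg_id))

    def rollback(self, msg_id):
        self.half.pop(msg_id, None)

    def check_back(self, msg_id, query_local_state):
        """Resolve a half message whose commit/rollback never arrived."""
        if msg_id not in self.half:
            return
        if query_local_state(msg_id) == "committed":
            self.commit(msg_id)
        else:
            self.rollback(msg_id)


def send_in_transaction(broker, msg_id, body, local_tx):
    broker.send_half(msg_id, body)   # steps 1-3: half message stored, ACKed
    try:
        local_tx()                   # step 4: local transaction
    except Exception:
        broker.rollback(msg_id)      # steps 5-6: failure path
        return False
    broker.commit(msg_id)            # steps 5-6: success path
    return True
```

The key property is that a message becomes visible to consumers only after the local transaction has succeeded, and `check_back` covers the case where the producer crashes between step 4 and step 5.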

How to design a message queue?

Start with the overall flow of a message queue: the producer sends a message to the broker, the broker stores it and delivers it to the consumer, and the consumer acknowledges consumption.

The producer sending to the broker and the broker delivering to the consumer require two RPCs. How should the RPC be designed? You can refer to the open-source framework Dubbo and discuss service discovery, serialization protocols, and so on.

For the broker, consider how to persist messages (file system or database), whether messages can accumulate, and how to handle accumulation.

How is the consumption relationship stored? Point-to-point or broadcast? How is the broadcast relationship maintained: ZooKeeper (zk) or a config server?

How is message reliability ensured? If a message is delivered more than once, how is it handled idempotently?

How should high availability be designed? You can refer to Kafka's mechanism: multiple replicas -> leader and followers -> when a broker goes down, a new leader is elected to keep serving clients.

For transactional messaging, write the message to a local database table in the same transaction as the local business operation; once the message has been delivered to the server, delete the local record; a scheduled task scans the local message table and resends anything that is still pending.
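
The local message table idea can be sketched with Python's built-in `sqlite3` module. The table names (`orders`, `outbox`), the payload format, and the `publish` callback are all assumptions made for illustration; a real system would use its production database and MQ client.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT)")
    return db


def create_order(db, item):
    """Write the order and its pending message in one local transaction."""
    with db:  # commits on success, rolls back on exception
        cur = db.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        db.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (f"order_created:{cur.lastrowid}",),
        )


def relay_pending(db, publish):
    """Scheduled task: publish pending rows, delete each one once delivered."""
    rows = db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    sent = 0
    for row_id, payload in rows:
        try:
            publish(payload)
        except Exception:
            break  # leave remaining rows for the next scan
        with db:
            db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

If the process crashes after `publish` but before the delete, the row is sent again on the next scan, so this design delivers at least once and relies on consumers handling duplicates idempotently.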

MQ must be elastic and scalable. If messages back up or resources run short, how can it expand quickly and increase throughput? You can refer to Kafka's design of broker -> topic -> partition, where each partition sits on one machine and stores only part of the data. If resources are insufficient, add partitions to the topic, migrate data, and add machines, so that more data can be stored and higher throughput provided.

The difference between multithreaded asynchrony and MQ

  • CPU consumption. Multithreaded asynchrony can cause CPU contention on the application machine, whereas MQ does not consume that machine's CPU.
  • Asynchrony implemented with MQ is fully decoupled, which suits large-scale Internet projects.
  • Peak shaving and message accumulation. Under high concurrency, MQ can accumulate messages in the broker, whereas multithreading creates a large number of threads and may even trigger the thread pool's rejection policy.
  • Using MQ introduces middleware, which increases project complexity and operational difficulty.

In general, small projects can use multithreading for asynchrony, while large projects are better served by MQ.

Finally, I would like to share a GitHub repository with more than 300 classic computer science book PDFs compiled by Dabin, covering C, C++, Java, Python, front-end, databases, operating systems, computer networks, data structures and algorithms, machine learning, programming careers, and more. Feel free to star it.

GitHub address: https://github.com/Tyson0314/java-books

Origin blog.csdn.net/Tyson0314/article/details/131878508