22 common RocketMQ interview questions and answers


1. What is RocketMQ?

2. What is the role of RocketMQ?

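3. The architecture of RocketMQ

4. Advantages and disadvantages of RocketMQ

5. Characteristics of RocketMQ cluster

6. The mode of RocketMQ cluster

7. RocketMQ sequential messages, how to ensure the sequence?

8. How to implement message filtering?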

9. Message deduplication. If multiple duplicate messages are delivered to the consumer due to network and other reasons, how do you deduplicate messages?

10. How does RocketMQ implement distributed transaction messages?

11. What is a half-message in distributed messages?

12. Availability of messages, how can RocketMQ guarantee the availability/reliability of messages? (Another way of asking this question: how to ensure that messages are not lost)

13. What is the disk flushing mechanism of RocketMQ?

14. How does RocketMQ achieve load balancing?

15. What is a dead letter?

16. What is message idempotence?

17. What is push-pull message mode?

18. What should I do if any Broker suddenly goes down?

19. On which NameServer does the Broker register its own information?

20. Will the messages in RocketMQ Broker be deleted immediately after being consumed?

21. What is the storage and sending mechanism of messages?

22. What is the message storage structure of RocketMQ?


 The use of RocketMQ has developed rapidly in recent years, and it is often mentioned in the recruitment interviews of large enterprises.

1. What is RocketMQ?

        RocketMQ is a pure-Java message middleware built on a distributed queue model. Alibaba open-sourced it in 2012 and later donated it to the Apache Software Foundation, where it became an Apache top-level project on September 25, 2017. It has repeatedly withstood "super-scale" events such as Alibaba's Double Eleven with stable, outstanding performance, and it offers high performance, low latency, and high reliability. It is mainly used for decoupling, peak shaving, message distribution, and similar tasks.


2. What is the role of RocketMQ?

1. Application decoupling
The higher the coupling of a system, the lower its fault tolerance. Take an e-commerce application as an example: after a user creates an order, if the inventory, logistics, and payment systems are called in a tightly coupled way, then a failure or temporary unavailability (for example, during an upgrade) of any subsystem causes the order operation to fail and hurts the user experience. Decoupling through a message queue reduces the coupling of the system. For example, if the logistics system breaks down and takes a few minutes to repair, the data it needs to process is cached in the message queue during that time, and the user's order operation completes normally. Once the logistics system recovers, it only needs to process the order messages stored in the message queue, and the end user never notices that the logistics system was down for a few minutes.

2. Traffic peak clipping
If the application system encounters a sudden surge in request traffic, the surge may overwhelm the system. With a message queue, a large number of requests can be buffered and processed gradually over a longer period, which greatly improves system stability and user experience.

3. Data distribution
Through the message queue, data can flow among multiple systems. The data producer does not need to care who uses the data; it only needs to send the data to the message queue, and the data consumers fetch it directly from the queue.

3. The architecture of RocketMQ

Producer: the sender of messages; analogy: the person sending a letter
Consumer: the receiver of messages; analogy: the person receiving a letter
Broker: temporarily stores and transmits messages; analogy: the post office
NameServer: manages Brokers; analogy: the organization that manages the post offices
Topic: distinguishes the types of messages; a sender can send messages to one or more Topics, and a receiver can subscribe to one or more Topics
Message Queue: equivalent to a partition of a Topic; used to send and receive messages in parallel

4. Advantages and disadvantages of RocketMQ

Advantages: decoupling, peak clipping, data distribution

Disadvantages include the following:

1. Reduced system availability

The more external dependencies the system introduces, the worse the system stability will be. Once MQ goes down, it will affect the business. How to ensure the high availability of MQ?

2. Increased system complexity

Adding MQ greatly increases the complexity of the system. Previously, systems called each other through synchronous remote calls; now the calls are asynchronous and go through MQ. How do we ensure that messages are not consumed repeatedly? How do we deal with message loss? How do we ensure the order of message delivery?

3. Consistency issues

After processing its business logic, system A sends message data to three systems B, C, and D through MQ. If systems B and C process it successfully but system D fails, how do we ensure the consistency of message data processing?

5. Characteristics of RocketMQ cluster

  • NameServer is an almost stateless node. It can be deployed as a cluster, and the nodes do not synchronize any information with each other.
  • Broker deployment is relatively complicated. Brokers are divided into Masters and Slaves. One Master can have multiple Slaves, but one Slave can belong to only one Master. The Master-Slave relationship is defined by specifying the same BrokerName and different BrokerIds: BrokerId 0 indicates the Master, and a non-zero BrokerId indicates a Slave. Multiple Masters can also be deployed. Each Broker establishes a persistent connection with every node in the NameServer cluster and regularly registers its Topic information with all NameServers.
  • The Producer establishes a persistent connection with one node in the NameServer cluster (selected at random), periodically fetches Topic routing information from the NameServer, establishes a persistent connection to the Master that serves the Topic, and regularly sends heartbeats to the Master. The Producer is completely stateless and can be deployed as a cluster.
  • The Consumer establishes a persistent connection with one node in the NameServer cluster (selected at random), periodically fetches Topic routing information from the NameServer, establishes persistent connections to the Master and Slave that serve the Topic, and regularly sends heartbeats to both. Consumers can subscribe to messages from either the Master or the Slave; the subscription rules are determined by the Broker configuration.

6. The mode of RocketMQ cluster

1) Single Master mode

This method is risky. Once the Broker restarts or goes down, the entire service will be unavailable. It is not recommended to be used in an online environment, but can be used for local testing.

2) Multi-Master mode

A cluster has no Slaves and consists only of Masters, for example 2 or 3 Masters. The advantages and disadvantages of this mode are as follows:

Advantages: The configuration is simple, and the downtime or restart of a single Master has no impact on the application. When the disks are configured as RAID10, messages are not lost even if a machine goes down and cannot be recovered, because RAID10 disks are very reliable (with asynchronous flushing a small number of messages may be lost; with synchronous flushing not a single message is lost). This mode has the highest performance.
Disadvantages: While a machine is down, unconsumed messages on that machine cannot be subscribed to until it recovers, so the real-time delivery of messages is affected.

3) Multi-Master and multi-Slave mode (asynchronous replication)

Each Master is configured with a Slave, forming multiple Master-Slave pairs. HA uses asynchronous replication, so the Slave lags the Master by a short delay (millisecond level). The advantages and disadvantages of this mode are as follows:

Advantages: Even if a disk is damaged, very few messages are lost, and the real-time delivery of messages is not affected. After a Master goes down, consumers can still consume from the Slave. This process is transparent to the application and requires no manual intervention, and the performance is almost the same as in Multi-Master mode.
Disadvantages: If the Master goes down and its disk is damaged, a small number of messages are lost.

4) Multi-Master and multi-Slave mode (synchronous double write)

Each Master is configured with a Slave, forming multiple Master-Slave pairs. HA uses synchronous double write: success is returned to the application only after both the Master and the Slave have been written successfully. The advantages and disadvantages of this mode are as follows:

Advantages: Neither data nor service has a single point of failure. When the Master goes down, messages are not delayed, and both service availability and data availability are very high.
Disadvantages: The performance is slightly lower than in asynchronous replication mode (about 10% lower), the RT of sending a single message is slightly higher, and in the current version the Slave cannot automatically switch to Master after the Master goes down.

7. RocketMQ sequential messages, how to ensure the sequence?

        Messages travel from the producer to the broker in FIFO order, so sending is sequential and the messages within a single queue are ordered. Consuming from multiple queues at the same time cannot guarantee the overall order of messages. Therefore, messages that must stay in order should go to the same queue of the same topic: one thread sends them, and one thread consumes that queue. RocketMQ provides the MessageQueueSelector interface, which you can implement with your own algorithm, for example sending the message to queue1 if i%2==0 and to queue2 otherwise.
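The queue-selection idea can be sketched without the RocketMQ client. The class below is a hypothetical stand-in, not the real MessageQueueSelector interface: it only shows that mapping a business key (here an assumed numeric order ID) to a fixed queue index keeps all messages of one order in one queue, and therefore in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the RocketMQ API, just the selection rule.
class OrderedQueueSelectorSketch {
    // Maps an order ID to one queue index; the same ID always gets the same queue.
    static int selectQueue(long orderId, int queueCount) {
        // floorMod keeps the index non-negative even for negative IDs.
        return (int) Math.floorMod(orderId, (long) queueCount);
    }

    public static void main(String[] args) {
        int queueCount = 2;
        List<List<String>> queues = new ArrayList<>();
        for (int i = 0; i < queueCount; i++) {
            queues.add(new ArrayList<>());
        }
        long[] orderIds = {100, 101, 100, 101, 100};
        String[] steps = {"created", "created", "paid", "paid", "shipped"};
        for (int i = 0; i < orderIds.length; i++) {
            queues.get(selectQueue(orderIds[i], queueCount)).add(orderIds[i] + ":" + steps[i]);
        }
        // Each queue holds the steps of its orders in the order they were sent.
        System.out.println(queues);
    }
}
```

With the real client, the same rule would live inside a MessageQueueSelector implementation and pick an element from the list of queues it is given.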

8. How to implement message filtering?

        There are two solutions. One is to filter on the broker side according to the consumer's filtering logic. The advantage is that useless messages are never transmitted to the consumer side; the disadvantage is that it increases the burden on the broker and is relatively complicated to implement. The other is to filter on the consumer side, for example according to the tag set in the message. The advantage is that it is simple to implement; the disadvantage is that a large number of useless messages arrive at the consumer side and can only be discarded without processing.
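As a rough illustration of tag-based filtering, the sketch below evaluates a subscription expression of the form "TagA || TagB" (with "*" meaning all tags) against a message tag. The class and method names are hypothetical; the same predicate could run on either the broker side or the consumer side.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a tag filter; not part of the RocketMQ client.
class TagFilterSketch {
    // Parses "TagA || TagB" into a set of tags. An empty set means "match everything".
    static Set<String> parse(String expression) {
        Set<String> tags = new HashSet<>();
        if (expression == null || expression.trim().equals("*")) {
            return tags;
        }
        for (String part : expression.split("\\|\\|")) {
            String tag = part.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    // Returns true if a message carrying messageTag should be delivered.
    static boolean matches(String expression, String messageTag) {
        Set<String> tags = parse(expression);
        return tags.isEmpty() || tags.contains(messageTag);
    }

    public static void main(String[] args) {
        System.out.println(matches("TagA || TagB", "TagB"));
        System.out.println(matches("TagA || TagB", "TagC"));
    }
}
```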

9. Message deduplication. If multiple duplicate messages are delivered to the consumer due to network and other reasons, how do you deduplicate messages?

        Let's first talk about the principle of message idempotency: the result of multiple requests for the same operation is the same as that of a single request, and repeating the operation does not produce different results. As long as idempotency is maintained, the final processing result is the same no matter how many duplicate messages arrive. This has to be implemented on the consumer side. Deduplication solution: give every message a unique key (each message has a MessageId, or a business key can be used), and store that key under a primary key or unique constraint in the database, or as a key in a Redis cache. Before consuming a message, check whether its unique key already exists in the database or cache; if it does, do not process the message again. If consumption succeeds, make sure that the unique key is inserted into the deduplication table.
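A minimal sketch of the deduplication check, assuming an in-memory set stands in for the database unique index or Redis key described above. The class name and the counter are illustrative only.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: an in-memory "deduplication table".
class DedupConsumerSketch {
    // Stands in for a unique-constraint column or a Redis key set (assumption).
    private final Set<String> processedKeys = ConcurrentHashMap.newKeySet();
    private final AtomicInteger sideEffects = new AtomicInteger();

    // Returns true if the message was processed, false if it was a duplicate.
    boolean consume(String uniqueKey) {
        // add() is atomic: only the first caller for a given key sees true.
        if (!processedKeys.add(uniqueKey)) {
            return false;
        }
        sideEffects.incrementAndGet(); // the real business logic would run here
        return true;
    }

    int sideEffectCount() {
        return sideEffects.get();
    }

    public static void main(String[] args) {
        DedupConsumerSketch consumer = new DedupConsumerSketch();
        System.out.println(consumer.consume("ORDERID_100"));
        System.out.println(consumer.consume("ORDERID_100"));
        System.out.println(consumer.sideEffectCount());
    }
}
```

With a real database, inserting the key and applying the business change would normally happen in the same local transaction, so that a crash between the two steps cannot leave a key recorded without its effect.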

10. How does RocketMQ implement distributed transaction messages?

Distributed transaction messages rely on three mechanisms: half messages, a second confirmation, and message review (the Broker checking back with the Producer). The process is as follows:

1. The Producer sends a half message to the Broker.
2. The Producer receives a response confirming that the message was sent successfully. At this point the message is a half message, marked as "undeliverable", and the Consumer cannot consume it.
3. The Producer executes the local transaction.
4. Under normal circumstances, once the local transaction completes, the Producer sends a Commit or Rollback to the Broker. On Commit, the Broker marks the half message as a normal message, and the Consumer can consume it. On Rollback, the Broker discards the message.
5. Under abnormal circumstances, the Broker never receives the second confirmation. After a certain period of time, it scans all half messages and asks the Producer for the execution status of each one.
6. The Producer queries the status of the local transaction.
7. The Producer submits a Commit or Rollback to the Broker according to the status of the transaction. (Steps 5, 6, and 7 are the message review.)
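The seven steps can be modeled as a small state machine. The sketch below is a simplified in-memory model with hypothetical types; in the real client these roles are played by TransactionMQProducer and a TransactionListener, which are not used here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory model of half messages; not the RocketMQ API.
class HalfMessageSketch {
    enum State { HALF, COMMITTED, ROLLED_BACK }
    enum LocalTxResult { COMMIT, ROLLBACK, UNKNOWN }

    static class Broker {
        private final Map<String, State> messages = new HashMap<>();

        // Steps 1-2: store the message as a half message, invisible to consumers.
        void sendHalf(String id) {
            messages.put(id, State.HALF);
        }

        // Step 4 (or step 7): second confirmation from the producer.
        void confirm(String id, LocalTxResult result) {
            if (messages.get(id) != State.HALF) {
                return;
            }
            if (result == LocalTxResult.COMMIT) {
                messages.put(id, State.COMMITTED);
            } else if (result == LocalTxResult.ROLLBACK) {
                messages.put(id, State.ROLLED_BACK);
            }
            // UNKNOWN leaves the message as HALF so it is checked again later.
        }

        // Steps 5-7: ask the producer about every message that is still HALF.
        void checkBack(Function<String, LocalTxResult> producerCheck) {
            for (String id : new ArrayList<>(messages.keySet())) {
                if (messages.get(id) == State.HALF) {
                    confirm(id, producerCheck.apply(id));
                }
            }
        }

        boolean visibleToConsumer(String id) {
            return messages.get(id) == State.COMMITTED;
        }
    }

    public static void main(String[] args) {
        Broker broker = new Broker();
        broker.sendHalf("m1");
        broker.confirm("m1", LocalTxResult.UNKNOWN);     // confirmation lost or undecided
        System.out.println(broker.visibleToConsumer("m1"));
        broker.checkBack(id -> LocalTxResult.COMMIT);    // review finds the local tx committed
        System.out.println(broker.visibleToConsumer("m1"));
    }
}
```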

11. What is a half-message in distributed messages?

Half message: a message that temporarily cannot be consumed by the Consumer. The Producer has successfully sent the message to the Broker, but the message is marked as "temporarily undeliverable". Only after the Producer executes the local transaction and sends the second confirmation can the Consumer consume this message.

12. Availability of messages, how can RocketMQ guarantee the availability/reliability of messages? (Another way of asking this question: how to ensure that messages are not lost)

Answer from three aspects: Producer, Consumer and Broker.

From the perspective of the Producer, how to ensure that the message is successfully sent to the Broker?

1. Use synchronous sending: send a message and wait for the Broker to return a response before sending the next one. If the response is OK, the message was successfully sent to the Broker; a timeout or failure status triggers a retry.
2. Use the delivery method of distributed transaction messages.
3. If a message times out after being sent, you can also check whether it was successfully stored in the Broker by querying the log API.

In general, the Producer relies on synchronous sending to guarantee delivery.

From the perspective of Broker, how to ensure message persistence?

1. As long as the message is persisted to the CommitLog (log file), unconsumed messages can be recovered and consumed again even if the Broker goes down.

2. The Broker's disk flushing mechanism: synchronous flushing and asynchronous flushing. With either type, the message is first stored in the page cache (in memory), but synchronous flushing is more reliable: the Broker waits until the message sent by the Producer has been persisted to disk before returning the response to the Producer.

3. The Broker supports two replication modes: multi-Master multi-Slave with synchronous double write, and multi-Master multi-Slave with asynchronous replication. Messages are sent to the Master, but they can be consumed from either the Master or the Slave. The synchronous double-write mode ensures that even if the Master goes down, the message has already been backed up on the Slave, so it is not lost.

From the perspective of Consumer, how to ensure that messages are successfully consumed?

The Consumer maintains a persistent offset (corresponding to the min offset in the Message Queue) that marks the position of the last message that was successfully consumed and acknowledged to the Broker. If consumption fails, the Consumer sends a consumption-failure status back to the Broker and updates its own offset once the Broker responds successfully. If the Broker goes down while the Consumer is reporting back, the Consumer retries periodically. If the Consumer and the Broker go down together, the message is still stored on the Broker side and the Consumer's offset is also persisted, so after a restart the Consumer continues pulling and consuming messages from the saved offset.

13. What is the disk flushing mechanism of RocketMQ?

RocketMQ provides two flushing strategies: synchronous flushing and asynchronous flushing.

Synchronous flushing

After the message reaches the Broker's memory, it must be flushed to the CommitLog file before the write is considered successful; only then does the Broker tell the Producer that the message was sent successfully.

Asynchronous flushing

With asynchronous flushing, the Broker tells the Producer that the message was sent successfully as soon as the message reaches the Broker's memory, and wakes up a thread to persist the data to the CommitLog file.

Advantages and disadvantages analysis

Synchronous flushing ensures that messages are not lost, but its response time is about 10% longer than that of asynchronous flushing. It suits scenarios that require high message reliability. Asynchronous flushing has higher throughput and lower RT, but if the Broker loses power, data that is still in memory is lost. It suits scenarios with high throughput requirements.
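The trade-off can be shown with a toy model in which two lists stand in for the page cache and the disk. Everything in this sketch is hypothetical and in-memory; it performs no real I/O and only illustrates when the acknowledgement is returned relative to persistence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of sync vs. async flushing; no real disk I/O.
class FlushSketch {
    enum FlushMode { SYNC, ASYNC }

    static class Broker {
        private final FlushMode mode;
        private final List<String> pageCache = new ArrayList<>(); // lost on power failure
        private final List<String> disk = new ArrayList<>();      // survives power failure

        Broker(FlushMode mode) {
            this.mode = mode;
        }

        // Returning from put() models the "send OK" acknowledgement to the producer.
        void put(String message) {
            pageCache.add(message);
            if (mode == FlushMode.SYNC) {
                flush(); // sync: persist before acknowledging
            }
        }

        // In ASYNC mode a background thread would call this some time later.
        void flush() {
            disk.addAll(pageCache);
            pageCache.clear();
        }

        // Simulates a power failure: everything not yet flushed is gone.
        void powerFailure() {
            pageCache.clear();
        }

        List<String> recovered() {
            return new ArrayList<>(disk);
        }
    }

    public static void main(String[] args) {
        Broker async = new Broker(FlushMode.ASYNC);
        async.put("a");
        async.flush();
        async.put("b");       // acknowledged but not yet flushed
        async.powerFailure();
        System.out.println(async.recovered());
    }
}
```

In a real Broker the strategy is chosen through configuration (the flushDiskType property with the values SYNC_FLUSH and ASYNC_FLUSH) rather than in application code.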

14. How does RocketMQ achieve load balancing?

1. Producer load balancing

On the Producer side, each instance by default sends messages to all message queues in round-robin order, so that messages are spread evenly across the queues. Because the queues can be distributed across different Brokers, the messages are also sent to different Brokers.

For example, the publisher sends the first message to Queue 0, the second message to Queue 1, and so on.
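Round-robin selection amounts to a counter taken modulo the number of queues. The class below is a simplified, hypothetical sketch of that idea; the real client's selection logic may differ in details such as the starting index or how it avoids failed Brokers.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of round-robin queue selection on the producer side.
class RoundRobinProducerSketch {
    private final AtomicInteger counter = new AtomicInteger();
    private final int queueCount;

    RoundRobinProducerSketch(int queueCount) {
        this.queueCount = queueCount;
    }

    // Returns the queue index for the next message.
    int nextQueue() {
        // floorMod keeps the result valid even after the counter overflows.
        return Math.floorMod(counter.getAndIncrement(), queueCount);
    }

    public static void main(String[] args) {
        RoundRobinProducerSketch producer = new RoundRobinProducerSketch(4);
        for (int i = 0; i < 6; i++) {
            System.out.println("message " + i + " -> Queue " + producer.nextQueue());
        }
    }
}
```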

2. Consumer load balancing

1) Cluster mode

In the cluster consumption mode, each message only needs to be delivered to an instance under the Consumer Group that subscribes to this topic. RocketMQ uses active pulling to pull and consume messages. When pulling, it is necessary to specify which message queue to pull.

Whenever the number of instances changes, a load balancing of all instances will be triggered. At this time, queues will be evenly allocated to each instance according to the number of queues and the number of instances.

The default allocation algorithm is AllocateMessageQueueAveragely, which gives each instance a contiguous block of queues.

There is another average algorithm, AllocateMessageQueueAveragelyByCircle, which also allocates the queues evenly but deals them out to the instances one by one in a circular manner.

Note that in cluster mode a queue may be allocated to only one instance. If multiple instances consumed a queue at the same time, the same message would be consumed multiple times by different instances, because each consumer decides for itself which messages to pull. So the algorithm assigns one queue to only one consumer instance, while one consumer instance can be assigned several queues at the same time.
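The difference between the two strategies is easiest to see on a concrete case. The sketch below is a simplified reimplementation written for illustration, not the source of the two RocketMQ classes; queues and consumers are represented by plain indices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified versions of the two allocation strategies.
class QueueAllocationSketch {
    // Contiguous blocks: the first (queueCount % consumerCount) consumers get one extra queue.
    static List<Integer> allocateAveragely(int queueCount, int consumerCount, int consumerIndex) {
        int base = queueCount / consumerCount;
        int extra = queueCount % consumerCount;
        int size = base + (consumerIndex < extra ? 1 : 0);
        int start = consumerIndex * base + Math.min(consumerIndex, extra);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            result.add(start + i);
        }
        return result;
    }

    // Circular dealing: consumer k gets queues k, k + consumerCount, k + 2 * consumerCount, ...
    static List<Integer> allocateByCircle(int queueCount, int consumerCount, int consumerIndex) {
        List<Integer> result = new ArrayList<>();
        for (int q = consumerIndex; q < queueCount; q += consumerCount) {
            result.add(q);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int c = 0; c < 3; c++) {
            System.out.println("consumer " + c
                    + " averagely=" + allocateAveragely(8, 3, c)
                    + " byCircle=" + allocateByCircle(8, 3, c));
        }
    }
}
```

With 8 queues and 3 consumers, the first strategy yields the blocks 0-2, 3-5, and 6-7, while the circular one yields 0/3/6, 1/4/7, and 2/5. With more consumers than queues, both leave the surplus consumers with an empty list.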

By adding consumer instances to share the consumption of the queue, it can play a role in horizontally expanding the consumption capacity. When an instance goes offline, load balancing will be triggered again. At this time, the originally allocated queue will be allocated to other instances to continue consumption.

However, if the number of consumer instances is greater than the total number of message queues, the extra consumer instances will not be assigned to queues, and messages will not be consumed, and they will not be able to share the load. Therefore, it is necessary to control the total number of queues to be greater than or equal to the number of consumers.

2) Broadcast mode

Because broadcast mode requires each message to be delivered to all consumer instances under a consumer group, there is no notion of dividing messages among the instances.

In terms of implementation, one difference is that when queues are allocated, every consumer is assigned all queues.

15. What is a dead letter?

When a message fails to be consumed for the first time, RocketMQ automatically retries it. If consumption still fails after the maximum number of retries is reached, the consumer is unable to consume the message correctly under normal circumstances. At this point, RocketMQ does not discard the message immediately but sends it to a special queue corresponding to the consumer.

In RocketMQ, messages that cannot be consumed under normal circumstances are called Dead-Letter Messages, and the special queues that store them are called Dead-Letter Queues.
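The retry-then-dead-letter flow can be sketched as follows. This is a hypothetical in-memory model: the listener is a plain predicate that returns true on success, retries happen immediately rather than after a delay, and maxRetries is a parameter of the sketch rather than a RocketMQ default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of retries followed by dead-lettering.
class DeadLetterSketch {
    // Delivers each message once, then retries up to maxRetries times.
    // Messages that still fail are returned as dead letters.
    static List<String> consumeAll(List<String> messages, Predicate<String> listener, int maxRetries) {
        List<String> deadLetters = new ArrayList<>();
        for (String message : messages) {
            boolean ok = false;
            for (int attempt = 0; attempt <= maxRetries && !ok; attempt++) {
                ok = listener.test(message);
            }
            if (!ok) {
                deadLetters.add(message);
            }
        }
        return deadLetters;
    }

    // In Apache RocketMQ the dead-letter topic is named after the consumer group.
    static String deadLetterTopic(String consumerGroup) {
        return "%DLQ%" + consumerGroup;
    }

    public static void main(String[] args) {
        List<String> messages = new ArrayList<>();
        messages.add("order-1");
        messages.add("bad-payload");
        List<String> dead = consumeAll(messages, m -> !m.startsWith("bad"), 3);
        System.out.println(deadLetterTopic("order_group") + " <- " + dead);
    }
}
```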

1. Dead letter feature

Dead letter messages have the following properties:

  1. It will no longer be consumed by consumers normally.
  2. Its validity period is the same as that of a normal message: 3 days. It is automatically deleted after 3 days, so handle a dead-letter message within 3 days of its creation.

A dead letter queue has the following characteristics :

  1. A dead-letter queue corresponds to a Group ID, not to a single consumer instance.
  2. If a Group ID has not generated any dead-letter messages, RocketMQ does not create a dead-letter queue for it.
  3. A dead-letter queue contains all dead-letter messages generated by the corresponding Group ID, regardless of which Topic each message belongs to.

2. View dead letter information

  1. Query the topic information of the dead-letter queue in the console.
  2. Query the dead-letter messages by topic on the message page.
  3. Choose to resend the message.

When a message enters the dead-letter queue, something is preventing consumers from consuming it normally, so it usually needs special handling. After you have investigated the likely causes and solved the problem, you can resend the message from the RocketMQ console so that consumers can consume it again.

16. What is message idempotence?

After a RocketMQ consumer receives a message, it needs to process the message idempotently based on a business-unique key.

1. The necessity of message idempotency

In Internet applications, especially when the network is unstable, RocketMQ messages may be duplicated. The duplication falls into the following cases:

A. The message is repeated when sending

When a message has been successfully sent to the server and persisted, there is a sudden network disconnection or the client crashes, causing the server to fail to respond to the client. If the producer realizes that the message has failed to send and tries to send the message again, the consumer will receive two messages with the same content and the same Message ID.

B. The message is repeated during delivery

In the message consumption scenario, the message has been delivered to the consumer and the business processing has been completed. When the client responds to the server, the network is disconnected. In order to ensure that the message is consumed at least once, the message queue RocketMQ server will try to deliver the previously processed message again after the network is restored, and the consumer will subsequently receive two messages with the same content and the same Message ID.

C. Message duplication during load balancing (including but not limited to network jitter, Broker restart, and subscriber application restart)

When the Broker or client of the message queue RocketMQ restarts, expands or shrinks, Rebalance will be triggered, and consumers may receive duplicate messages at this time.

2. Processing method

Because Message ID may conflict (duplicate), it is not recommended to use Message ID as the basis for truly safe idempotent processing. The best way is to use the unique identifier of the business as the key basis for idempotent processing, and the unique identifier of the business can be set through the message Key:

Message message = new Message();
// Use the business-unique identifier as the message Key
message.setKey("ORDERID_100");
SendResult sendResult = producer.send(message);

When the subscriber receives the message, it can perform idempotent processing according to the Key of the message:

consumer.subscribe("ons_test", "*", new MessageListener() {
    public Action consume(Message message, ConsumeContext context) {
        String key = message.getKey();
        // Do idempotent processing according to the key uniquely identified by the business
        return Action.CommitMessage;
    }
});

17. What is push-pull message mode?

PULL: A pull consumer actively pulls messages from the Broker. As soon as messages are pulled, the consumption process starts. This is called active consumption.

PUSH: A push consumer registers a message listener, which the user has to implement. When messages reach the Broker, the listener is triggered with the messages and the consumption process starts. Under the hood, however, the client still pulls the messages from the Broker. This is called passive consumption.
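The relationship between the two modes can be sketched with a toy broker: the push style is just a pull call wrapped in code that hands each message to a user-supplied listener. All names here are hypothetical, and the real client adds long polling, flow control, and offset management on top of this idea.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch: push-style consumption built on top of pull.
class PushOverPullSketch {
    static class Broker {
        private final Queue<String> queue = new ArrayDeque<>();

        void put(String message) {
            queue.add(message);
        }

        // PULL: the caller decides when to ask and how many messages to take.
        List<String> pull(int maxNums) {
            List<String> batch = new ArrayList<>();
            while (batch.size() < maxNums && !queue.isEmpty()) {
                batch.add(queue.poll());
            }
            return batch;
        }
    }

    // PUSH: the application only supplies a listener; the pull happens inside.
    static int pollOnce(Broker broker, Consumer<String> listener) {
        List<String> batch = broker.pull(32);
        batch.forEach(listener);
        return batch.size();
    }

    public static void main(String[] args) {
        Broker broker = new Broker();
        broker.put("a");
        broker.put("b");
        pollOnce(broker, message -> System.out.println("listener got " + message));
    }
}
```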

18. What should I do if any Broker suddenly goes down?

The Broker uses a master-slave architecture and a multi-copy strategy. After the Master receives a message, it replicates the message to the Slave, so there is more than one copy of each message. If the Master goes down, the messages on the Slave are still available, which ensures the reliability and high availability of MQ. Moreover, since version 4.5.0 RocketMQ has supported DLedger mode, which is based on Raft and achieves real HA.

19. On which NameServer does the Broker register its own information?

This is obviously a trick question: the Broker registers its information with all NameServers, not just one. Every single one of them!

20. Will the messages in RocketMQ Broker be deleted immediately after being consumed?

No. Every message is persisted in the CommitLog. After a Consumer connects to the Broker, consumption progress information is maintained for it. When a message is consumed, only the consumption progress (offset) of that Consumer is updated; the message itself stays in the CommitLog.

21. What is the storage and sending mechanism of messages?

Message storage: RocketMQ stores messages in files. Used properly, a disk can be fast enough to match the transmission speed of the network. The sequential write speed of today's high-performance disks can reach 600MB/s, which exceeds the transmission speed of a typical network card. The random write speed of a disk, however, is only about 100KB/s, a difference of 6000 times compared with sequential writes! Because of this huge speed difference, a well-designed message queuing system can be orders of magnitude faster than an ordinary one. RocketMQ writes messages sequentially, which ensures the speed of message storage.

Message sending: this is the process of sending messages stored in files to the receiver. The Linux operating system is divided into user mode and kernel mode. File operations and network operations involve switching between these two modes, which inevitably requires copying data. When a server sends the content of a local disk file to a client, it generally takes two steps:

1) read; read local file content;

2) write; send the read content through the network.

These two seemingly simple operations actually perform 4 data copies:

  1. Copy the data from disk to kernel-mode memory;
  2. Copy it from kernel-mode memory to user-mode memory;
  3. Copy it from user-mode memory to the kernel-mode memory of the network driver;
  4. Finally, copy it from the kernel-mode memory of the network driver to the network card for transmission.

Using mmap removes the copy to user-mode memory and improves speed. In Java, this mechanism is implemented through MappedByteBuffer.

RocketMQ makes full use of these features, the so-called "zero copy" technology, to improve the speed of message storage and network sending.
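For readers who have not used MappedByteBuffer, the sketch below maps a temporary file into memory, writes through the mapping, and reads the bytes back from the file. It only demonstrates the standard java.nio API; the class name, the temporary file, and the helper method are assumptions of the sketch and not how RocketMQ organizes its own mapped files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of writing a file through a memory mapping.
class MmapSketch {
    // Writes text through a MappedByteBuffer, then reads the file back normally.
    static String writeAndReadBack(String text) {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".log");
            file.toFile().deleteOnExit();
            byte[] data = text.getBytes(StandardCharsets.UTF_8);
            try (FileChannel channel = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping in READ_WRITE mode grows the file to the mapped size.
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
                buffer.put(data);  // a plain memory write, no write() system call
                buffer.force();    // ask the OS to flush the mapped region to disk
            }
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadBack("hello commit log"));
    }
}
```

Here force() plays the role of a flush: until it is called, the written bytes may exist only in the page cache.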

22. What is the message storage structure of RocketMQ?

RocketMQ message storage is handled jointly by the ConsumeQueue and the CommitLog. The actual physical storage file of messages is the CommitLog. The ConsumeQueue is the logical queue of messages, similar to an index file in a database: it stores the addresses that point to the physical storage. Each Message Queue under each Topic has a corresponding ConsumeQueue file.

  1. CommitLog: stores the message bodies and metadata
  2. ConsumeQueue: stores the index of each message in the CommitLog
  3. IndexFile: provides a way to query messages by key or time interval. Searching for messages through the IndexFile does not affect the main process of sending and consuming messages
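The division of labor between the CommitLog and the ConsumeQueue can be sketched in memory. In the hypothetical model below, one append-only byte stream holds the messages of all topics, and each (topic, queue) pair has a list of small index entries that record where a message starts and how long it is. The real ConsumeQueue entry also carries a tag hash code, which is omitted here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of CommitLog plus ConsumeQueue.
class StorageSketch {
    // One index entry: position and length of a message in the commit log.
    static final class IndexEntry {
        final int commitLogOffset;
        final int size;

        IndexEntry(int commitLogOffset, int size) {
            this.commitLogOffset = commitLogOffset;
            this.size = size;
        }
    }

    // Single append-only log shared by all topics.
    private final ByteArrayOutputStream commitLog = new ByteArrayOutputStream();
    // One logical queue of index entries per topic and queue ID.
    private final Map<String, List<IndexEntry>> consumeQueues = new HashMap<>();

    void append(String topic, int queueId, String body) {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        int offset = commitLog.size();
        commitLog.write(bytes, 0, bytes.length); // sequential write
        consumeQueues.computeIfAbsent(topic + "-" + queueId, k -> new ArrayList<>())
                .add(new IndexEntry(offset, bytes.length));
    }

    // Reads the message at a logical position of one queue via its index entry.
    String read(String topic, int queueId, int queueOffset) {
        IndexEntry entry = consumeQueues.get(topic + "-" + queueId).get(queueOffset);
        return new String(commitLog.toByteArray(), entry.commitLogOffset, entry.size,
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StorageSketch store = new StorageSketch();
        store.append("TopicA", 0, "a1");
        store.append("TopicB", 0, "b1");
        store.append("TopicA", 0, "a2");
        System.out.println(store.read("TopicA", 0, 1));
    }
}
```

Because consumers only advance a position in their queue's index, consuming a message never rewrites the commit log.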

Origin blog.csdn.net/wanghaiping1993/article/details/131274523