Want to join a big tech company? First, survive these 11 life-or-death MQ interview questions!

After my earlier "life-or-death" MySQL series, I noticed the title format being borrowed a lot: life-or-death ZooKeeper, life-or-death multithreading, and so on. This time I am starting the MQ installment of the interview question series. Message queues are middleware we use every day, and they are one of the topics interviewers are sure to ask about. Let's take a look at the MQ interview questions.

Why do you use MQ? What are the specific usage scenarios?

The role of MQ is simple: peak shaving and valley filling. Take placing an order in an e-commerce system: the forward transaction flow may involve creating the order, deducting inventory, deducting the campaign budget, deducting points, and so on. If each interface call takes 100 ms, the whole order-placement path theoretically takes 400 ms, which is obviously too long.

If all of these operations are processed synchronously, first, the call chain is so long that it hurts interface performance, and second, the distributed transaction problem is hard to handle. Operations such as budget and points deduction do not have strict real-time consistency requirements, so they can be processed asynchronously through MQ. To handle the inconsistency that asynchrony may introduce, we can use a job that retries until the interface call succeeds. Companies generally also have a reconciliation platform as a last line of defense: for example, a case where the order succeeded but the points were not deducted can be caught by reconciliation.

After introducing MQ, the call chain becomes simpler, and because messages are sent asynchronously, the system as a whole can also withstand more load.
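To make the latency arithmetic concrete, here is a minimal sketch. It assumes that creating the order and deducting inventory stay synchronous while budget and points deduction move to MQ (the split is an illustration, not something the text prescribes), and it ignores the cost of enqueueing. The function names and the in-memory queue are hypothetical stand-ins for real services and a real MQ.

```python
import queue

STEP_MS = 100  # per-call latency taken from the example above


def place_order_sync():
    """All four steps run inline, so their latencies add up."""
    steps = ["create_order", "deduct_inventory", "deduct_budget", "deduct_points"]
    return len(steps) * STEP_MS


def place_order_async(mq):
    """Only the critical steps run inline; the rest become MQ messages."""
    latency_ms = 0
    for _step in ("create_order", "deduct_inventory"):
        latency_ms += STEP_MS   # still paid by the caller
    for step in ("deduct_budget", "deduct_points"):
        mq.put(step)            # handled later by a consumer
    return latency_ms


mq = queue.Queue()              # stand-in for a real message queue
sync_ms = place_order_sync()
async_ms = place_order_async(mq)
print(sync_ms, async_ms, mq.qsize())
```

Under these assumptions the caller waits for two calls instead of four, and the two deferred steps sit in the queue until a consumer picks them up.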

Which MQ do you use? What was the selection based on?

We mainly evaluated several mainstream MQs: Kafka, RabbitMQ, RocketMQ, and ActiveMQ. The selection was based mainly on the following points:

  1. Performance. Our system is under relatively heavy QPS pressure, so performance is the primary consideration.

  2. Development language. Our development language is Java, and a Java-based MQ makes secondary development easier.

  3. Distributed architecture. Our business scenarios are high-concurrency, so the MQ must support a distributed design.

  4. Feature completeness. Depending on the business scenario, we may need ordered messages, transactional messages, and so on.

Based on the above considerations, we finally chose RocketMQ.


| | Kafka | RocketMQ | RabbitMQ | ActiveMQ |
|---|---|---|---|---|
| Single-machine throughput | 100,000 level | 100,000 level | 10,000 level | 10,000 level |
| Development language | Scala | Java | Erlang | Java |
| High availability | Distributed architecture | Distributed architecture | Master-slave architecture | Master-slave architecture |
| Latency | ms level | ms level | µs level | ms level |
| Features | Only supports the main MQ functions | Complete feature set, including ordered messages and transactional messages | Strong concurrency, good performance, low latency | Mature community product, rich documentation |

You mentioned asynchronous sending above. How do you guarantee message reliability?

Message loss can occur at three points: when the producer sends the message, inside MQ itself, and when the consumer consumes it.

Producer loses the message

The producer can lose a message in two ways: the program fails to send, throws an exception, and does not retry; or the send call itself succeeds, but a network glitch means MQ never receives the message.

Since synchronous sending is generally not used in this way, we set it aside and focus on the asynchronous sending scenario.

Asynchronous sending comes in two forms: with a callback and without one. Without a callback, the producer sends the message and ignores the result, so the message may be lost. Combining asynchronous sending, callback notification, and a local message table gives us a solution. Take the order-placement scenario as an example:

  1. When the order is placed, save the business data and a row in the local MQ message table in the same local transaction. The message status at this point is "sending". If the local transaction fails, the order fails and the transaction is rolled back.

  2. If the order succeeds, return success to the client directly and send the MQ message asynchronously.

  3. MQ's callback reports the send result, and the send status in the database is updated accordingly.

  4. A job polls for messages that have not been sent successfully after a certain time (configured per business) and retries them.

  5. Use a monitoring platform or the job itself to handle messages that still fail after a certain number of retries: raise an alarm and bring in manual intervention.
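The five steps above can be sketched roughly as follows. This is an in-memory illustration only: `LocalMessageTable`, `retry_job`, the status names, and the `send` callable are all hypothetical, and a real implementation would keep the table in the same database as the order so that step 1 shares one local transaction.

```python
import time
from dataclasses import dataclass, field

SENDING, SENT, FAILED = "SENDING", "SENT", "FAILED"


@dataclass
class MessageRow:
    msg_id: str
    body: str
    status: str = SENDING          # step 1: rows start out as "sending"
    retries: int = 0
    created_at: float = field(default_factory=time.time)


class LocalMessageTable:
    """Stand-in for the MQ message table stored next to the order data."""

    def __init__(self):
        self.rows = {}

    def save(self, msg_id, body):
        self.rows[msg_id] = MessageRow(msg_id, body)

    def on_send_result(self, msg_id, ok):
        # Step 3: the MQ callback updates the send status.
        if ok:
            self.rows[msg_id].status = SENT

    def due_for_retry(self, timeout_s):
        now = time.time()
        return [r for r in self.rows.values()
                if r.status == SENDING and now - r.created_at >= timeout_s]


def retry_job(table, send, timeout_s, max_retries):
    """Steps 4 and 5: resend overdue messages, alarm on the ones that keep failing."""
    alarms = []
    for row in table.due_for_retry(timeout_s):
        if row.retries >= max_retries:
            row.status = FAILED
            alarms.append(row.msg_id)   # hand over to monitoring / a human
            continue
        row.retries += 1
        table.on_send_result(row.msg_id, send(row.body))
    return alarms


table = LocalMessageTable()
table.save("order-1", "order created")
table.on_send_result("order-1", ok=False)   # the first async send failed
alarms = retry_job(table, send=lambda body: True, timeout_s=0, max_retries=3)
```

In this run the retry succeeds, so the row ends up marked as sent and no alarm is raised; a `send` that keeps failing would eventually push the message ID into `alarms`.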

Generally speaking, asynchronous sending with a callback is enough for most scenarios; we build the complete solution only for scenarios where messages absolutely must not be lost.

MQ loses the message

Suppose the producer has made sure the message reached MQ. If MQ is still holding the message only in memory when the machine goes down, before it has been synchronized to a slave node, the message may be lost.

Take RocketMQ as an example:

RocketMQ offers two disk-flush modes: synchronous flushing and asynchronous flushing. The default is asynchronous flushing, so a message may be lost before it has been flushed to disk. You can switch to synchronous flushing (flushDiskType=SYNC_FLUSH in the broker configuration) to guarantee message reliability; then even if MQ goes down, messages can be recovered from disk when it comes back up.

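Kafka, for example, can also be configured:

acks=all: the producer is told the send succeeded only after every node participating in replication has received the message. With this setting, the message is lost only if all of those nodes go down.
replication.factor=N: set this to a value greater than 1, which requires each partition to have at least 2 replicas.
min.insync.replicas=N: set this to a value greater than 1, which requires the leader to see at least one follower that is still connected.
retries=N: set this to a very large value so that the producer keeps retrying when a send fails.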
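How acks and min.insync.replicas interact can be sketched with a small decision function. This is a simplified model rather than Kafka code: `write_accepted` and its parameters are hypothetical names, and it only captures the rule that with acks=all the leader refuses a write when fewer than min.insync.replicas replicas (leader included) are in sync.

```python
def write_accepted(acks, in_sync_replicas, min_insync_replicas):
    """Return True if the leader would accept and acknowledge the write.

    in_sync_replicas counts the leader itself plus the followers in sync.
    """
    if in_sync_replicas < 1:
        return False                      # no leader, nothing can be written
    if acks == "all":
        # The write is rejected when too few replicas are in sync.
        return in_sync_replicas >= min_insync_replicas
    return True                           # weaker acks settings skip this check


# Leader alone in sync, min.insync.replicas=2:
strict = write_accepted("all", in_sync_replicas=1, min_insync_replicas=2)
relaxed = write_accepted("1", in_sync_replicas=1, min_insync_replicas=2)
healthy = write_accepted("all", in_sync_replicas=2, min_insync_replicas=2)
```

The strict setting trades availability for safety: when only the leader is in sync, the producer gets an error and relies on retries instead of receiving an acknowledgement for a message stored on a single node.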

Configuration can make MQ itself highly reliable, but every one of these options costs performance. How to configure them has to be weighed against the needs of the business.

Consumer loses the message

The scenario: the consumer has just received the message when its server goes down. MQ believes the consumer has already consumed it and will not deliver it again, so the message is lost.

RocketMQ requires consumers to reply with an ack by default, whereas Kafka requires you to configure this manually by disabling automatic offset commits.

If the consumer does not return an ack, the message is redelivered; the retry interval and retry count differ by MQ type. Once the retry count is exceeded, the message goes to a dead-letter queue and has to be handled manually. (Kafka has none of this.)
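The ack, redelivery, and dead-letter behaviour described above can be sketched as a small loop. This is a toy model, not any MQ's real API: `consume`, `handler`, and `max_retries` are hypothetical names, a raised exception stands for "no ack returned", and retry intervals are left out.

```python
from collections import deque


def consume(messages, handler, max_retries):
    """Deliver each message; redeliver on failure, dead-letter after max_retries."""
    pending = deque((msg, 0) for msg in messages)   # (message, failed deliveries)
    acked, dead_letters = [], []
    while pending:
        msg, failures = pending.popleft()
        try:
            handler(msg)                 # business logic runs before the ack
        except Exception:
            failures += 1
            if failures > max_retries:
                dead_letters.append(msg)             # needs manual handling
            else:
                pending.append((msg, failures))      # redeliver later
            continue
        acked.append(msg)                # ack only after successful processing
    return acked, dead_letters


def handler(msg):
    if msg == "bad":
        raise ValueError("cannot process")


acked, dead_letters = consume(["a", "bad", "b"], handler, max_retries=2)
```

The key design point is the order of operations: the ack is recorded only after `handler` returns, so a crash in the middle of processing leads to redelivery rather than loss.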

You mentioned consumption failures on the consumer side. What if those failures have caused a message backlog?

If the consumer keeps failing, we can approach the problem from the following angles:

  1. The consumer's errors must come from a bug or some other problem. If it is easy to fix, fix it first and let the consumer resume normal consumption.

  2. If there is no time to fix it, or the fix is troublesome, do a forwarding step: write a temporary consumer that first consumes the messages and forwards them to a new topic on separate MQ resources. The machine resources for this new topic are requested separately and must be able to carry the current backlog.

  3. After the backlog has been handled, fix the consumer and let it consume both the new MQ and the existing MQ; once the new MQ has been fully consumed, restore the original setup.

What if the backlog hits the disk limit and messages get deleted?

Well... who on earth would delete my messages... Calm down and think it over... Right, there is a way.

From the start, we saved a record of every message we sent in the database, and the forwarded data was saved too. We can use this data to identify the missing messages and then run a separate script to resend them. If the forwarding program did not persist its data, compare against the consumer's records instead, although that process is a bit harder.
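The comparison step can be sketched as a simple set difference between the sent records and what was forwarded or consumed. The record layout (a dict with `msg_id` and `body`) and the function name `find_missing` are assumptions made for illustration.

```python
def find_missing(sent_records, processed_ids):
    """Return the sent records whose IDs never showed up downstream."""
    processed = set(processed_ids)
    return [rec for rec in sent_records if rec["msg_id"] not in processed]


sent_records = [
    {"msg_id": "m1", "body": "order 1"},
    {"msg_id": "m2", "body": "order 2"},
    {"msg_id": "m3", "body": "order 3"},
]
forwarded_ids = ["m1", "m3"]            # what the forwarding program saved

to_resend = find_missing(sent_records, forwarded_ids)
```

A resend script would then iterate over `to_resend`; because a message may in fact have been processed without being recorded, the consumer side should tolerate duplicates.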

After all that, can you explain how RocketMQ works?

RocketMQ consists of a NameServer registry cluster, a Producer cluster, a Consumer cluster, and several Brokers (RocketMQ processes). Its architecture works as follows:

  1. When a Broker starts, it registers with every NameServer, keeps long-lived connections to them, and sends a heartbeat every 30s.

  2. When a Producer sends a message, it obtains the Broker addresses from the NameServer and picks a Broker according to a load-balancing algorithm.

  3. When a Consumer consumes messages, it likewise obtains the Broker addresses from the NameServer and then actively pulls messages to consume.
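The registration, heartbeat, and broker-selection flow above can be modelled in a few lines. This is a toy model and not RocketMQ's client API: the class names, the round-robin policy, the broker addresses, and the 120-second expiry used in the example are all assumptions for illustration; only the 30s heartbeat interval comes from the text.

```python
import itertools


class NameServer:
    """Keeps the last heartbeat time per broker and drops stale ones."""

    def __init__(self, expire_after_s):
        self.expire_after_s = expire_after_s
        self.last_heartbeat = {}

    def heartbeat(self, broker_addr, now):
        # Registration and later heartbeats look the same in this model.
        self.last_heartbeat[broker_addr] = now

    def live_brokers(self, now):
        return sorted(addr for addr, seen in self.last_heartbeat.items()
                      if now - seen <= self.expire_after_s)


class Producer:
    """Asks the NameServer for brokers and rotates through them."""

    def __init__(self, name_server):
        self.name_server = name_server
        self.counter = itertools.count()

    def pick_broker(self, now):
        brokers = self.name_server.live_brokers(now)
        if not brokers:
            raise RuntimeError("no live broker")
        return brokers[next(self.counter) % len(brokers)]


ns = NameServer(expire_after_s=120)      # assumed expiry window
ns.heartbeat("broker-a:10911", now=0)
ns.heartbeat("broker-b:10911", now=0)
ns.heartbeat("broker-b:10911", now=30)   # broker-a stops sending heartbeats

producer = Producer(ns)
picks = [producer.pick_broker(now=30) for _ in range(3)]
still_alive = ns.live_brokers(now=130)
```

While both brokers are live, the producer alternates between them; once broker-a has missed heartbeats for longer than the expiry window, only broker-b is handed out.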

Why doesn't RocketMQ use ZooKeeper as its registry?

I think there are several reasons for not using ZooKeeper:

  1. According to the CAP theorem, at most two of the three properties can be satisfied at the same time. ZooKeeper chooses CP, which means it does not guarantee service availability. While ZooKeeper is holding an election, which can take a long time, the whole cluster is unavailable, and that is unacceptable for a registry. Service discovery should be designed for availability.

  2. Performance. NameServer's implementation is very lightweight, and it can be scaled horizontally by adding machines to increase the load the cluster can handle. ZooKeeper writes do not scale; the only way around this is to partition by domain into multiple ZooKeeper clusters. That is too complicated to operate, and it also violates the A in CAP, because services in different clusters are cut off from each other.

  3. Problems caused by the persistence mechanism. ZooKeeper's ZAB protocol writes a transaction log entry on every ZooKeeper node for every write request, and in addition periodically snapshots in-memory data to disk to guarantee data consistency and durability. For a simple service discovery scenario this is not really necessary; the scheme is too heavyweight. Also, the data the registry stores should be highly customizable.

  4. Message sending should depend only weakly on the registry, and RocketMQ's design is based on this idea. The producer obtains the Broker addresses from the NameServer the first time it sends a message and caches them locally, so if the whole NameServer cluster becomes unavailable, producers and consumers are not affected much for a short time.

How does the Broker store data?

RocketMQ's main storage files are the commitlog files, the consumequeue files, and the indexfile files.

After a Broker receives a message, it saves the message to the commitlog file. Because storage is distributed, each Broker holds part of a topic's data. For each message queue of each topic, a consumequeue file is generated that stores the physical offsets of messages in the commitlog, and the indexfile stores the mapping between message keys and offsets.

The commitlog files are stored in the ${Rocket_Home}/store/commitlog directory. As the figure shows, each file is named after its offset. Each file is 1G by default, and a new file is created automatically when the current one is full.

Messages of the same topic are not stored contiguously in the commitlog, so it would be very inefficient for consumers to read messages directly from it. Instead, the consumequeue stores the physical offset of each message in the commitlog. When consuming, the consumer first reads the offset from the consumequeue, uses it to locate the specific commitlog file, and then locates the message inside that file by a fixed rule (the offset modulo the file size).
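The lookup rule in the last paragraph can be worked through in code. This is a sketch under a few assumptions: the file name is taken to be the file's starting offset zero-padded to 20 digits, the consumequeue is reduced to a list of (commitlog offset, size) pairs, and the names `locate`, `consume_queue`, and `read_location` are hypothetical.

```python
FILE_SIZE = 1024 ** 3   # 1G per commitlog file, the default mentioned above


def locate(physical_offset, file_size=FILE_SIZE):
    """Map a global commitlog offset to (file name, position inside that file)."""
    position = physical_offset % file_size       # the "modulo" rule
    file_start = physical_offset - position      # offset where the file begins
    file_name = f"{file_start:020d}"             # assumed naming convention
    return file_name, position


# Hypothetical consumequeue for one message queue:
# entry i -> (offset of message i in the commitlog, message size in bytes)
consume_queue = [(0, 200), (FILE_SIZE + 4096, 150)]


def read_location(logical_index):
    """Resolve the i-th message of the queue to a file, position, and size."""
    commitlog_offset, size = consume_queue[logical_index]
    file_name, position = locate(commitlog_offset)
    return file_name, position, size


first = read_location(0)
second = read_location(1)
```

Here the second message lives 4096 bytes into the second commitlog file, which shows why the consumer needs only the small consumequeue entry to jump straight to the right place.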

How is data synchronized between Master and Slave?

Message synchronization between master and slaves follows the Raft protocol (this applies to DLedger-based deployments; classic master-slave mode uses plain synchronous or asynchronous replication instead):

  1. After the Broker receives a message, the message is marked as uncommitted.

  2. The message is then sent to all slaves.

  3. Each slave returns an ack to the master after receiving the message.

  4. After receiving acks from more than half of the nodes, the master marks the message as committed.

  5. The master sends a committed notification to all slaves, and the slaves also change the status to committed.
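The majority rule in step 4 can be sketched as follows. This is a simplified illustration, not a Raft implementation: the function `replicate` is hypothetical, and it assumes the majority is counted over the whole group with the master's own copy counting as one ack, as is usual in Raft-style replication.

```python
def replicate(slave_acks):
    """Return the message state after one replication round.

    slave_acks holds one boolean per slave: True if that slave acked.
    """
    state = "uncommitted"                   # step 1
    group_size = len(slave_acks) + 1        # slaves plus the master
    acks = 1 + sum(slave_acks)              # steps 2-3, master counts itself
    if acks > group_size // 2:              # step 4: more than half
        state = "committed"                 # step 5 would notify the slaves
    return state


one_slave_down = replicate([True, False])   # 2 of 3 nodes hold the message
both_slaves_down = replicate([False, False])
```

With one master and two slaves, losing a single slave still allows commits, while losing both leaves the message uncommitted.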

Do you know why RocketMQ is fast?

It is fast because it uses sequential storage, the page cache, and asynchronous flushing.

  1. The commitlog is written sequentially, so performance is much better than with random writes.

  2. Writes to the commitlog do not go straight to disk; they first go to the operating system's page cache.

  3. Finally, the operating system asynchronously flushes the cached data to disk.
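The write path above can be imitated with ordinary file I/O. This sketch only illustrates the idea of "append now, flush later": the helper names `append` and `flush_to_disk` and the temporary file are assumptions, and the actual RocketMQ storage layer is implemented differently (for example, it works with memory-mapped files).

```python
import os
import tempfile


def append(log_file, payload):
    """Sequential append: the bytes land in buffers, not necessarily on disk."""
    log_file.write(payload)


def flush_to_disk(log_file):
    """Force buffered data down to the storage device."""
    log_file.flush()                # Python buffer -> operating system
    os.fsync(log_file.fileno())     # OS page cache -> disk


path = os.path.join(tempfile.mkdtemp(), "commitlog")
with open(path, "ab") as log_file:
    for body in (b"m1", b"m2", b"m3"):
        append(log_file, body)      # cheap: no disk wait per message
    flush_to_disk(log_file)         # one flush covers the whole batch

with open(path, "rb") as f:
    data = f.read()
```

Calling `flush_to_disk` after every message would correspond to synchronous flushing; batching it, or leaving it to the operating system, is what makes the asynchronous mode faster and also what opens the window for loss on a crash.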

What are transactional messages and semi-transactional (half) messages? How are they implemented?

Transactional messages are MQ's XA-like distributed transaction capability. They make it possible to achieve eventual consistency across a distributed transaction.

A semi-transactional (half) message is a message that MQ has received from the producer but cannot deliver yet, because it has not received the second confirmation.

The implementation works as follows:

  1. The producer first sends a half message to MQ.

  2. MQ returns an ack after receiving the message.

  3. The producer starts executing the local transaction.

  4. If the transaction succeeds, the producer sends a commit to MQ; if it fails, the producer sends a rollback.

  5. If MQ does not receive the producer's second confirmation (commit or rollback) for a long time, MQ initiates a check-back request to the producer.

  6. The producer queries the final status of the transaction.

  7. The producer submits the second confirmation according to the status it found.

Finally, if MQ receives a commit as the second confirmation, it can deliver the message to the consumer. If it receives a rollback, the message is kept and deleted after 3 days.
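The seven steps above can be traced with a small in-memory model. It is a sketch and not the RocketMQ client API: the `Broker` class, `send_in_transaction`, the `confirmation_lost` flag, and the message bodies are hypothetical, and the 3-day cleanup of rolled-back messages is reduced to a list that is never delivered.

```python
class Broker:
    def __init__(self):
        self.half = {}          # msg_id -> body, received but not deliverable
        self.deliverable = []   # bodies visible to consumers
        self.rolled_back = []   # kept for later cleanup, never delivered

    def send_half(self, msg_id, body):
        self.half[msg_id] = body            # step 1
        return "ack"                        # step 2

    def end_transaction(self, msg_id, state):
        body = self.half.pop(msg_id)
        if state == "commit":
            self.deliverable.append(body)
        else:
            self.rolled_back.append(body)

    def check_back(self, query_state):
        # Steps 5-7: ask the producer about every unresolved half message.
        for msg_id in list(self.half):
            self.end_transaction(msg_id, query_state(msg_id))


def send_in_transaction(broker, msg_id, body, local_tx, confirmation_lost=False):
    broker.send_half(msg_id, body)
    state = "commit" if local_tx() else "rollback"   # step 3
    if not confirmation_lost:
        broker.end_transaction(msg_id, state)        # step 4
    return state


broker = Broker()
results = {}
results["tx1"] = send_in_transaction(broker, "tx1", "paid:1", lambda: True)
results["tx2"] = send_in_transaction(broker, "tx2", "paid:2", lambda: False)
results["tx3"] = send_in_transaction(broker, "tx3", "paid:3", lambda: True,
                                     confirmation_lost=True)
unresolved = dict(broker.half)              # tx3 is stuck as a half message
broker.check_back(lambda msg_id: results[msg_id])
```

The third transaction shows why the check-back exists: its local transaction succeeded but the confirmation never arrived, and the message becomes deliverable only after the broker asks the producer for the final status.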


Origin blog.csdn.net/bjweimengshu/article/details/110412272