[MQ] Introduction to message queues and solutions to common problems

MQ

The basic concept of MQ

MQ stands for Message Queue: a container that holds messages while they are in transit. It is mostly used for communication between distributed systems.

Distributed systems communicate in two ways: by calling each other directly, or indirectly with the help of a third party such as a message queue.

The sender is called the producer and the receiver is called the consumer.

Advantages and Disadvantages of MQ

Advantages:

  1. Application decoupling: improves fault tolerance and maintainability
  2. Asynchronous speed-up: improves user experience and system throughput
  3. Peak shaving and valley filling: improves system stability. For example, suppose a service can normally handle only tens of thousands of QPS. During a flash sale (such as those on Taobao or JD.com), the sudden flood of requests would overwhelm it. With a message queue, the flash-sale requests are not sent directly to the service. They go into the queue first, and the downstream services consume them at their own pace.
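
The peak-shaving idea can be sketched with Python's standard library. This is only an illustration under assumptions: `queue.Queue` stands in for a real MQ, and `run_spike_demo`, `burst_size` and `per_message_seconds` are hypothetical names chosen for this example.

```python
import queue
import threading
import time


def run_spike_demo(burst_size=50, per_message_seconds=0.001):
    """Absorb a burst of requests in a queue and let one slow consumer drain it."""
    buffer = queue.Queue()   # stands in for the message queue
    handled = []

    def consumer():
        while True:
            request = buffer.get()
            if request is None:              # sentinel: no more requests
                break
            time.sleep(per_message_seconds)  # simulated slow business logic
            handled.append(request)

    worker = threading.Thread(target=consumer)
    worker.start()

    start = time.perf_counter()
    for i in range(burst_size):              # the spike: everything arrives at once
        buffer.put(i)
    enqueue_seconds = time.perf_counter() - start

    buffer.put(None)
    worker.join()
    return enqueue_seconds, handled
```

The producer side returns as soon as the burst is enqueued, while the consumer works through the backlog at its own speed; nothing is dropped and the order of arrival is kept.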

Disadvantages:

  1. Reduced system availability: the system now depends on the MQ, so how do you keep the MQ itself highly available?
  2. Increased system complexity: How to ensure that messages are not consumed repeatedly? How to deal with message loss? How to ensure the order of message delivery?
  3. Consistency problem: How to ensure the consistency of message data?

Frequently Asked Questions about MQ

  1. How does MQ avoid message accumulation?

    Message accumulation: producers produce messages much faster than consumers can consume them, so a large backlog builds up in the message queue.

    Solutions:

    1. Increase the consumption rate (add consumers to the consumer cluster)
    2. Have consumers process messages in batches and with multiple threads
    3. Apply rate limiting, and make sure that only useful messages enter the queue
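
As a rough sketch of batch plus multi-threaded consumption, the following drains a backlog in batches and processes each batch with a thread pool. It assumes an in-process `queue.Queue` in place of a real broker; `drain_batch`, `consume_all`, `max_batch` and `workers` are hypothetical names.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def drain_batch(q, max_batch):
    """Take up to max_batch messages from the queue without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def consume_all(q, handle, max_batch=100, workers=4):
    """Drain the backlog batch by batch, handling each batch in parallel."""
    processed = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch = drain_batch(q, max_batch)
            if not batch:
                break
            # list() waits for the whole batch and re-raises handler errors
            list(pool.map(handle, batch))
            processed += len(batch)
    return processed
```

Note that parallel handling gives up ordering within a batch, so this fits only messages that do not depend on each other.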
  2. How to avoid duplicate consumption

    **Causes:**

    1. The producer sends the same message twice.
    2. A consumer consumes the same message more than once.

    Message retry: a retry generally happens when a consumer hits an exception (a network glitch, or the process hanging). The message is not confirmed as consumed, so it is sent again, which leads to duplicate consumption.

    The usual fix is to make consumption idempotent, for example with a distributed lock, or with a global ID combined with the business scenario to guarantee uniqueness. Any duplicate-submission problem can be solved with idempotence.

    To be on the safe side, you can also add a unique index in the database.
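
A minimal sketch of the global ID plus unique index approach, using an in-memory SQLite database in place of the real one. The table names (`processed`, `orders`) and the functions `make_store` and `handle_once` are assumptions made for this example.

```python
import sqlite3


def make_store():
    conn = sqlite3.connect(":memory:")
    # The primary key acts as the unique index on the global message id.
    conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE orders (order_id TEXT, amount INTEGER)")
    return conn


def handle_once(conn, message_id, order_id, amount):
    """Apply the message only if its id is new. Returns True if it was applied."""
    try:
        # One transaction: the dedup record and the business write
        # are committed together or rolled back together.
        with conn:
            conn.execute(
                "INSERT INTO processed (message_id) VALUES (?)", (message_id,)
            )
            conn.execute(
                "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
                (order_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: the unique index rejected it
```

Delivering the same message twice then changes the data only once: the second call returns `False` and leaves `orders` untouched.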

  3. How to ensure that messages are not lost

    1. Message confirmation mechanism: the producer must get confirmation that the message has been flushed to disk before it treats the send as successful.

      The acks parameter (a Kafka producer setting) specifies how many partition replicas must receive a message before the producer considers the write successful. It has a significant impact on the probability of message loss and has the following options.

      • If acks=0, the producer does not wait for any response from the server before treating the message as written. If something goes wrong and the server never receives the message, the producer has no way of knowing and the message is lost. However, because the producer does not wait for a response, it can send messages as fast as the network allows, which gives high throughput.

      • If acks=1, the producer receives a success response as soon as the leader node receives the message. If the message cannot reach the leader (for example, the leader has crashed and a new leader has not yet been elected), the producer receives an error response and resends the message to avoid data loss. However, if a node that has not received the message becomes the new leader, the message is still lost. Throughput in this mode depends on whether sending is synchronous or asynchronous. If the sending client waits for the server's response (by calling the Future object's get() method), latency clearly increases by one network round trip. If the client uses callbacks, the latency problem is eased, but throughput is still limited by the number of in-flight messages (that is, how many messages the producer may send before receiving a response from the server).

      • If acks=-1 (or all), the producer receives a success response only when all replicas participating in replication have received the message. This mode is the safest: it guarantees that more than one server has the message, so the cluster keeps running even if one server crashes. However, latency is higher than with acks=1, because the producer waits for more than one server node to receive the message.

      Choose based on the specific business scenario.
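
To make the trade-off concrete, here is a toy model of one partition with a leader and followers. It is not the Kafka client API: `ToyPartition`, `send`, `leader_up` and `fail_over` are hypothetical, and the model deliberately shows the worst case in which followers have not yet copied a message when the leader fails.

```python
class ToyPartition:
    """Toy model of one partition: a leader log plus follower logs."""

    def __init__(self, followers=2):
        self.leader = []
        self.followers = [[] for _ in range(followers)]

    def send(self, message, acks, leader_up=True):
        """Return True if the producer would treat the send as successful."""
        if acks == 0:
            # Fire and forget: the producer never learns what happened.
            if leader_up:
                self.leader.append(message)
            return True
        if not leader_up:
            return False                  # error response: the producer can retry
        self.leader.append(message)
        if acks == "all":
            for log in self.followers:    # wait until every replica has the message
                log.append(message)
        return True

    def fail_over(self):
        """The leader crashes and the first follower becomes the new leader."""
        self.leader = self.followers.pop(0)
```

In this model a message sent with `acks=0` to a dead leader is reported as sent but stored nowhere, a message sent with `acks=1` disappears after `fail_over()`, and a message sent with `acks="all"` survives it.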

    2. Message persistence: the MQ middleware persists messages to disk, because data held only in memory is lost on power failure.

    3. Consumer acknowledgment: the consumer must confirm that a message was consumed successfully. Otherwise the message is retried, and after a set number of retries the developers are notified so they can take compensating action.

    4. Turn off automatic commit and commit manually after the consumer has finished processing. With automatic commit, the MQ considers a message consumed as soon as the consumer receives it. If the consumer crashes at that point, the message is never actually processed.
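
Points 3 and 4 can be sketched together: acknowledge only after the handler succeeds, redeliver on failure, and set the message aside once the retry limit is reached. This is an in-memory illustration, not a real client; `consume_with_manual_ack`, `max_attempts` and the `dead_letters` list are assumptions for the example.

```python
from collections import deque


def consume_with_manual_ack(messages, handler, max_attempts=3):
    """Deliver each message until the handler succeeds or attempts run out."""
    pending = deque((m, 1) for m in messages)
    acked, dead_letters = [], []
    while pending:
        message, attempt = pending.popleft()
        try:
            handler(message)
        except Exception:
            if attempt < max_attempts:
                # Not acknowledged: put it back for redelivery.
                pending.append((message, attempt + 1))
            else:
                # Retries exhausted: hand over for manual compensation.
                dead_letters.append(message)
        else:
            # Acknowledge only after processing has succeeded.
            acked.append(message)
    return acked, dead_letters
```

Because a failed message is redelivered, the handler should be idempotent, which ties back to the duplicate-consumption question above.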

  4. How to ensure the sequential consistency of consumption

    Most projects do not need ordered consumption, but some special scenarios require it, for example when MQ is used to keep Redis and MySQL consistent.

    Bind the related messages to the same consumption queue. If the consumer uses multiple threads, avoid re-creating the message list; modify the original list instead, so that the order is not disturbed.
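
One common way to bind related messages to the same queue is to route by a business key. The sketch below assumes each message carries such a key (for example an order ID); `pick_queue` and `route` are hypothetical helper names, and CRC32 is just one stable hash choice.

```python
import zlib


def pick_queue(key, queue_count):
    """Map a business key to a fixed queue index (stable across processes)."""
    return zlib.crc32(key.encode("utf-8")) % queue_count


def route(messages, queue_count):
    """messages: iterable of (key, payload). Returns one ordered list per queue."""
    queues = [[] for _ in range(queue_count)]
    for key, payload in messages:
        queues[pick_queue(key, queue_count)].append((key, payload))
    return queues
```

All messages with the same key land in the same queue in their original order, so a single consumer thread per queue is enough to preserve that order.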

  5. How MQ is used to keep Redis and the database consistent

    1. After executing the update, send a message through MQ to tell consumers to update the data in Redis.

      Advantages: decoupling, faster interface response, and room for compensation strategies

      Disadvantages: relatively high latency

    2. Listen to the binlog and, combined with MQ, update Redis (implemented with canal).

      Pros: more decoupled

      Cons: higher latency

    3. Double deletion strategy
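
The first option (update the database, then notify through MQ) might look like the sketch below. Plain dictionaries stand in for MySQL and Redis and a `queue.Queue` stands in for the MQ topic; `update_record` and `sync_cache_once` are hypothetical names.

```python
import queue

db = {}                  # stands in for MySQL
cache = {}               # stands in for Redis
updates = queue.Queue()  # stands in for the MQ topic


def update_record(key, value):
    """Write path: update the database, then publish a notification."""
    db[key] = value
    updates.put(key)     # send only the key; the consumer re-reads the database


def sync_cache_once():
    """Consumer: apply every pending notification to the cache."""
    applied = 0
    while True:
        try:
            key = updates.get_nowait()
        except queue.Empty:
            return applied
        cache[key] = db[key]   # reading the current row avoids caching a stale value
        applied += 1
```

Between `update_record` and the next `sync_cache_once` run the cache is out of date, which is the latency cost listed above.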
  6. How do consumers know that MQ has new messages?

    Two options:

    1. MQ actively notifies the consumer (push)

      When MQ has a message, it notifies the consumer to consume it. This model has a fatal flaw: it copes badly with slow consumers.

    2. The consumer polls (pull)

      Consumers poll to check whether there are messages for them to consume. This model also has drawbacks, such as message delay and busy waiting.

      If consumers are much slower than producers, messages inevitably accumulate in MQ. Assuming the messages are useful and cannot be discarded, they have to stay on the MQ side. That is not the worst part, though. The worst part is that in push mode MQ pushes a pile of messages that consumers cannot process. Consumers either reject them or report errors, and the messages bounce back and forth.

      In pull mode, by contrast, consumers consume on demand and are not flooded with messages they cannot handle. Accumulating messages on the MQ side is also simpler: MQ does not need to record the delivery state of every message, only the queue and the offset. Therefore, when the message volume is limited and the arrival rate is uneven, pull mode is the better fit.

      Since the initiative is on the consumer side, the consumer cannot know exactly when to pull the latest messages. If a pull returns messages, it can pull again right away. If a pull returns nothing, it has to wait a while before pulling again.

      How long to wait is hard to decide, but that does not mean the delay has no solution. The more mature industry practice is to start with a short interval (without putting too much load on MQ) and then grow the wait exponentially: wait 5 ms, then 10 ms, then 20 ms, then 40 ms, and so on until a message arrives, then go back to 5 ms.

      Even so, delay remains: if a message arrives at the 50 ms mark, during the wait that runs from 40 ms to 80 ms, it is delayed by 30 ms. And for a message that arrives only once every half hour, all the polling in between is wasted overhead.

      Alibaba's RocketMQ has an optimization, long polling, that balances the shortcomings of the push and pull models. The basic idea: if a pull finds no messages, the server does not return immediately but holds the connection open and waits. When a new message arrives, the server returns it over that same connection. However, the cost of a large number of long-lived connections on the MQ system should not be underestimated, and the hold time should be evaluated carefully.
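
The exponential wait described above (5 ms, 10 ms, 20 ms, 40 ms, then back to 5 ms once a message arrives) can be sketched as follows. `backoff_delays` and `poll_schedule` are hypothetical helpers, and the optional `cap_ms` upper bound is an addition not mentioned in the text.

```python
def backoff_delays(initial_ms=5, cap_ms=None):
    """Yield 5, 10, 20, 40, ... optionally capped at cap_ms."""
    delay = initial_ms
    while True:
        yield delay
        delay *= 2
        if cap_ms is not None:
            delay = min(delay, cap_ms)


def poll_schedule(results, initial_ms=5):
    """results: booleans, True if that pull returned a message.

    Returns the wait in ms chosen after each pull; 0 means pull again at once.
    """
    waits = []
    delays = backoff_delays(initial_ms)
    for got_message in results:
        if got_message:
            waits.append(0)
            delays = backoff_delays(initial_ms)  # reset to the short interval
        else:
            waits.append(next(delays))
    return waits
```

For four empty pulls, one successful pull and another empty pull, `poll_schedule` returns `[5, 10, 20, 40, 0, 5]`. A cap keeps the worst-case delay bounded for queues that stay empty for long stretches.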

  7. If MQ goes down, how should the producer handle it?

    When the producer delivers a message to MQ, it can record the message to be delivered (by inserting a row in a database, or by writing a log record). A scheduled task can then periodically resend the messages that were not delivered successfully.
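
A minimal sketch of this record-then-resend approach, with an in-memory SQLite table as the local record. The `outbox` table, the functions `make_outbox`, `record_message` and `resend_pending`, and the use of `ConnectionError` to signal an unreachable broker are all assumptions for the example.

```python
import sqlite3


def make_outbox():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY, body TEXT, sent INTEGER DEFAULT 0)"
    )
    return conn


def record_message(conn, body):
    """Persist the message locally before attempting delivery."""
    with conn:
        conn.execute("INSERT INTO outbox (body) VALUES (?)", (body,))


def resend_pending(conn, send):
    """Scheduled job: try to deliver every unsent row. Returns how many succeeded."""
    rows = conn.execute(
        "SELECT id, body FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for row_id, body in rows:
        try:
            send(body)
        except ConnectionError:
            break   # broker still down: try again on the next run
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered
```

If the process crashes after `send` but before the row is marked as sent, the message is delivered again on the next run, so consumers still need to be idempotent.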

  8. MQ consumption strategies

    **Cluster consumption:** Within one consumer cluster, each message is consumed by only one consumer, but the same message can be consumed by multiple consumer clusters.

    **Broadcast consumption:** Every node in the cluster is notified to consume the message (used in scenarios that involve sharded data processing). A plain hash can be used for scenarios that are not sensitive to data, and a hash ring for scenarios that are.
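
The difference between the two strategies can be shown with a small routing function. This is only a model: `deliver`, the group names, and the modulo-based choice of one consumer per cluster are assumptions, not how any particular broker assigns messages.

```python
def deliver(message_id, groups, mode):
    """groups: {group_name: [consumer, ...]}.

    Returns {group_name: [consumers that receive the message]}.
    """
    result = {}
    for name, consumers in groups.items():
        if mode == "broadcast":
            result[name] = list(consumers)   # every node gets a copy
        elif mode == "cluster":
            # Exactly one node per cluster; chosen here by message id modulo size.
            result[name] = [consumers[message_id % len(consumers)]]
        else:
            raise ValueError(f"unknown mode: {mode}")
    return result
```

With two clusters, cluster mode hands the message to one consumer in each cluster, while broadcast mode hands it to every consumer.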

Conditions for using MQ

  1. Producers do not need feedback from consumers
  2. Tolerate brief inconsistencies
  3. The benefits of MQ (decoupling, asynchrony, peak shaving) outweigh its negative impact

Common MQ products

  1. RabbitMQ
  2. RocketMQ
  3. ActiveMQ
  4. Kafka
  5. ZeroMQ
  6. MetaMQ
  7. Redis can also be used as a message queue


Origin blog.csdn.net/qq_51383106/article/details/131499479