RabbitMQ pitfall: the producer's transaction commits after the consumer runs

I ran into a problem while using RabbitMQ: the consumer code started running before the producer's transaction had fully committed. Let me share the step-by-step troubleshooting process. If you have a better solution, comments and pointers are welcome.

The problem

There is a feature where users collect feed every day; when a user's daily total reaches 30g or 130g, we report the data to a partner and save a reporting record at the same time.
One day the partner reported that a user had obtained 30g of feed but no report data had arrived, so we set out to find the problem.
On checking, the missing reports turned out to be about 1% of the normal volume.

Code overview: we first save the user's feed record, then publish to a queue to decouple the reporting. The consumer queries the feed records, computes the user's cumulative feed for the day, and checks whether the total has reached 30g or 130g.
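In outline, the flow is roughly the following (a sketch with made-up names such as FeedRecord, feedRecordMapper, and reportToPartner, not the real code):

    // producer: save the feed record, then publish so reporting is decoupled
    public void grantFeed(Long userId, int grams) {
        feedRecordMapper.insert(new FeedRecord(userId, grams));
        sendMessage(QueueName.GAMECENTER, userId);
    }

    // consumer: sum today's feed and report when the total reaches a threshold
    public void onMessage(Long userId) {
        int total = feedRecordMapper.sumTodayGrams(userId);
        if (total >= 30) { // same idea for the 130g threshold
            reportToPartner(userId, total);
        }
    }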

1. Channel closure caused by exceeding TPS limit

2023-06-15 15:16:01.887 ERROR 70300 --- [.55.10.135:5672] o.s.a.r.c.CachingConnectionFactory       : Shutdown Signal: channel error; protocol method: #method<channel.close>(reply-code=530, reply-text=denied for too many requests, ErrorHelp[dce6ef75-0993-4361-8ec8-1ff1c951a887], class-id=60, method-id=40)
2023-06-15 15:16:01.893 ERROR 70300 --- [.55.10.135:5672] o.s.a.r.c.CachingConnectionFactory       : Shutdown Signal: channel error; protocol method: #method<channel.close>(reply-code=530, reply-text=denied for too many requests, ErrorHelp[ecf0f9e7-243e-49e0-b2db-5638cc8b2758], class-id=60, method-id=40)
2023-06-15 15:16:01.893 ERROR 70300 --- [.55.10.135:5672] o.s.a.r.c.CachingConnectionFactory       : Shutdown Signal: channel error; protocol method: #method<channel.close>(reply-code=530, reply-text=denied for too many requests, ErrorHelp[148ccb82-8d48-4833-bc6a-0fb534b42dc7], class-id=60, method-id=40)

We had seen this error often in an earlier flash-sale scenario: reply-text=denied for too many requests, literally "denied for too many requests". We use Alibaba Cloud's managed RabbitMQ, and when we asked Alibaba Cloud's support at the time, the answer was that our request rate exceeded the instance's TPS limit and we would have to pay to raise it.

The errors above were not from this incident, but since something similar had happened before, I suspected the TPS limit was occasionally being exceeded, closing the channel and losing messages. So I decided to use publisher confirms to make sure data would not be lost.
This reference shows a simpler way to write it: link

RabbitMQ's delivery path is: producer -> exchange -> queue -> consumer.
If a message fails to reach the exchange, the confirmCallback fires with ack=false; if the exchange cannot route it to a queue, the returnCallback fires. Here we use the confirmCallback.
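Note that these callbacks only fire if publisher confirms and returns are enabled on the connection factory. A minimal sketch of that setup (the host is a placeholder; in Spring Boot the same thing is done with the spring.rabbitmq.publisher-confirm-type=correlated and spring.rabbitmq.publisher-returns=true properties):

    // enable publisher confirms and returns on the Spring AMQP connection factory
    CachingConnectionFactory cf = new CachingConnectionFactory("rabbitmq-host");
    cf.setPublisherConfirmType(CachingConnectionFactory.ConfirmType.CORRELATED);
    cf.setPublisherReturns(true); // needed for the returnCallback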

Define confirmCallback

@Slf4j
@Component
public class ConfirmCallbackService implements RabbitTemplate.ConfirmCallback {

    @Autowired
    IAlipayGameCenterService alipayGameCenterService;

    @Override
    public void confirm(CorrelationData correlationData, boolean ack, String cause) {
        if (ack) {
            // broker accepted the message: nothing to do
        } else {
            // broker rejected the message: recover the body and re-run the business logic
            if (correlationData != null) {
                ReturnedMessage returnedMessage = correlationData.getReturned();
                if (returnedMessage != null) {
                    String key = returnedMessage.getRoutingKey();
                    Message message = returnedMessage.getMessage();
                    if (message != null) {
                        String body = new String(message.getBody(), StandardCharsets.UTF_8);
                        if (QueueName.GAMECENTER.equals(key)) {
                            // run the consumer code directly; business code omitted here
                            // JSON.parseObject(body, ?.class);
                        }
                    }
                }
            }
        }
    }
}

Use the confirmCallback when sending to the queue

    @Autowired
    private ConfirmCallbackService confirmCallbackService;

    public void sendMessageConfirm(String key, Object object) {
        // register the confirm callback
        masterRabbitTemplate.setConfirmCallback(confirmCallbackService);
        String uuid = IdUtil.randomUUID();
        CorrelationData correlationData = new CorrelationData(uuid);
        MessageProperties properties = new MessageProperties();
        properties.setMessageId(uuid); // unique message ID, used for idempotent consumption
        Message message = new Message(JSONObject.toJSONString(object).getBytes(StandardCharsets.UTF_8), properties);
        // stash the message in a ReturnedMessage so the confirm callback can read the body
        ReturnedMessage returnedMessage = new ReturnedMessage(message, 0, null, null, key);
        correlationData.setReturned(returnedMessage);
        masterRabbitTemplate.convertAndSend(key, message, correlationData);
    }

Writing this for real use is more complicated than the blog posts suggest: in the confirm callback, the message body inside CorrelationData was null, so I manually created a ReturnedMessage in the producer and set it on the CorrelationData. Those blogs seem to get the message body automatically without doing this; I don't know how.
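One common alternative (a sketch with assumed names, not the author's code) is to cache the outbound payload locally, keyed by the CorrelationData id, and look it up in the confirm callback. A successful confirm should remove the entry too, or the map grows without bound:

    // cache payloads by correlation id so the confirm callback can recover the body
    private final Map<String, String> pending = new ConcurrentHashMap<>();

    public void send(String key, Object payload) {
        String id = IdUtil.randomUUID();
        String body = JSONObject.toJSONString(payload);
        pending.put(id, body);
        masterRabbitTemplate.convertAndSend(key, body, new CorrelationData(id));
    }

    // inside the ConfirmCallback (in both the ack and nack branches):
    // String body = pending.remove(correlationData.getId());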

After writing and testing this, I deployed it to production and let it run for a while to see the effect.

2. Consumption starts before the producer's transaction commits

After running for a while, no production messages ever entered the confirmCallback, yet there were still users who obtained 30g of feed without any reported data. I went through the code again and still couldn't see the problem, so the only option left was to add a lot of logging to see which step was failing.
Rant: we run multiple distributed servers but have no centralized log service, so you have to search the logs machine by machine. If I didn't have to, I wouldn't want to be adding logs at all.

I added logs to both the producer and the consumer (screenshots omitted), deployed, and checked the data after it had run for a while.
The logs showed that the producer saved the feed record at 12:00:04, while the consumer queried the user's feed total at 12:00:03.568. The producer's database insert must execute before the message is published, yet the row's database timestamp is later than the consumer's query.
Most likely, after the insert executes the transaction has not yet committed; the message is published, the whole method returns, and only then does the transaction commit and the row actually land in the database. The consumer runs as soon as the message is published, before the producer's transaction has committed.
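In other words, the producer looked roughly like this (illustrative names; the comments mark the race):

    @DSTransactional // the transaction stays open for the whole method
    public void grantFeed(Long userId, int grams) {
        feedRecordMapper.insert(new FeedRecord(userId, grams)); // row not yet visible to others
        sendMessageConfirm(QueueName.GAMECENTER, userId);       // the consumer may run right now
    } // <- the transaction only commits here, after the message is already out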

3. Publish to the queue only after the transaction commits

Since the diagnosis is that the commit lands too late, the natural fix is to make sure the transaction commits before the message is published. There are two ways: commit the transaction manually, or register follow-up work to run after the transaction commits. Reference
However, the TransactionSynchronizationAdapter used there is deprecated.

We reworked the producer code accordingly (screenshot omitted).
If you want to copy the code, please go to the reference and copy it.
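Since the screenshot is not reproduced here, this is a minimal sketch of the idea using the non-deprecated TransactionSynchronization interface (it has default methods since Spring 5.3), again with made-up names:

    @Transactional
    public void grantFeed(FeedRecord record) {
        feedRecordMapper.insert(record);
        // defer publishing until the surrounding transaction has committed
        TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
            @Override
            public void afterCommit() {
                // the inserted row is now visible to the consumer
                sendMessageConfirm(QueueName.GAMECENTER, record);
            }
        });
    }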

Normally that should have solved the problem, but actual testing turned up trouble. Because we have multiple data sources, this method uses @DSTransactional, the multi-data-source transaction annotation from the MyBatis-Plus dynamic-datasource library. That transaction management is separate from Spring's native transaction management, so Spring's transaction synchronization features do not work under this annotation.

4. Have the consumer check whether the producer's row has been inserted

Since it couldn't be solved on the producer side, we had to find another way. We came up with two options:

  1. Use a delay queue to delay consumption, so that by the time the consumer code runs, the producer's data has already landed in MySQL (a sketch of this follows the list).
  2. Have the consumer check whether the producer's data has actually been persisted.
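For reference, the usual way to build such a delay without any plugin is a TTL queue that dead-letters into the real work queue; a sketch with assumed queue names:

    // messages sit in the delay queue for ~2s, then are dead-lettered
    // to the work queue the real consumer listens on
    @Bean
    Queue delayQueue() {
        return QueueBuilder.durable("gamecenter.delay")
                .withArgument("x-message-ttl", 2000)
                .withArgument("x-dead-letter-exchange", "")   // default exchange
                .withArgument("x-dead-letter-routing-key", "gamecenter.work")
                .build();
    }

    @Bean
    Queue workQueue() {
        return QueueBuilder.durable("gamecenter.work").build();
    }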

Because the lost data was only about 1%, we chose the second option.
When the producer saves the feed record, it gets back the record's primary key id and passes that id to the consumer through the queue. Before the consumer queries the user's feed total, it compares the latest feed record id it can see against the id carried in the message to decide whether it is reading the latest data. If the latest record is not yet visible, it waits 1 second and queries again.

With the approach settled, we changed the code again, in both the producer (which now puts the record id into the message) and the consumer's query of the user's feed total (screenshots omitted).
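Since those screenshots are not reproduced here either, this is a minimal sketch of the consumer-side check, with assumed names and an assumed retry cap (the original simply waits 1 second and checks again):

    public void onMessage(FeedMessage msg) throws InterruptedException {
        Long expectedId = msg.getRecordId(); // primary key sent by the producer
        for (int attempt = 0; attempt < 3; attempt++) {
            Long latestId = feedRecordMapper.selectLatestIdByUser(msg.getUserId());
            if (latestId != null && latestId >= expectedId) {
                break; // the producer's row is visible, safe to aggregate
            }
            Thread.sleep(1000L); // wait for the producer's commit, then re-check
        }
        // then sum today's feed and report at the 30g / 130g thresholds as before
    }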
After testing, we deployed to production. After it had run for a while, we queried the data and confirmed the problem was resolved.

Origin: blog.csdn.net/a805727141/article/details/131227577