Optimizing an ActiveMQ Message-Queue Data Backlog in Production

1 Overview

Recently, a large backlog built up in the message notification queue of the production environment, to the point where merchants across the platform could not complete transactions normally. In the end we could only relieve the backlog by temporarily taking the highest-volume merchants offline. Analysis showed that our message queue could not consume and process queued data quickly enough in the face of a sudden transaction peak. Since similar traffic spikes will happen again, the following records our optimization of the message queue (ActiveMQ).

 

2 Message Queue Communication Diagram

images/ZDmZJZFzGazb7tMxMS3FQG7Yhx56EEWZ.png

3 Problem location and analysis

 

3.1 Why is the message notification data backlogged?

Analysis: Every transaction on the platform may generate one or more message notification records, and these records are routed through the message queue (ActiveMQ) for processing. A sudden surge in transaction volume therefore generates a large number of messages; if the consumption side of the queue is blocked, throughput collapses and the backlog cannot be drained quickly!

 

3.2 Why does the backlog persist even though multiple ActiveMQ consumers are configured?

Analysis: Reviewing the code of the consumption module showed that messages are delivered by the listener (SessionAwareMessageListener) via an asynchronous callback to onMessage, but a synchronized lock had been added to that callback. This is the problem: because the entire onMessage method is locked, the program can only process messages serially (one at a time) instead of concurrently on multiple threads, which caps the consumption capacity of the whole queue.

public synchronized void onMessage(Message message, Session session)
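The effect of the lock can be shown without a broker. The following is a minimal, self-contained simulation (not the actual Spring listener): many consumer threads call a synchronized handler, and the observed peak concurrency never rises above 1, exactly the serial behavior described above.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Simulation: a synchronized handler forces one-message-at-a-time
// processing even when many consumer threads are available.
public class SyncListenerDemo {
    private final AtomicInteger active = new AtomicInteger();     // threads inside onMessage now
    private final AtomicInteger maxActive = new AtomicInteger();  // peak concurrency observed

    // Mirrors the problematic declaration: the whole method is locked,
    // so concurrent consumer threads queue up behind each other.
    public synchronized void onMessage(String message) {
        int now = active.incrementAndGet();
        maxActive.accumulateAndGet(now, Math::max);
        try { Thread.sleep(10); }            // simulated 10ms of work per message
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        active.decrementAndGet();
    }

    // Runs `threads` consumers, each delivering `messagesPerThread` messages,
    // and returns the peak number of threads that were ever inside onMessage.
    public int run(int threads, int messagesPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < messagesPerThread; i++) onMessage("msg");
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return maxActive.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new SyncListenerDemo().run(10, 5)); // prints 1: fully serial
    }
}
```

Removing the `synchronized` keyword from the simulated handler lets the peak climb toward the thread count, which is the behavior the fix in section 4.3.1 restores.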

3.3 Will removing the synchronized lock introduce thread-safety problems?

Analysis: First, the messages processed concurrently by different consumers are distinct, and the concurrent onMessage callbacks share no mutable state: all their data lives on each thread's own stack. In principle, then, removing the lock introduces no multi-threading safety issues.

 

3.4 Will the message be consumed multiple times?

Analysis:

(1) Reading the source of ActiveMQ's consumer-side message handling shows that whether a message counts as consumed is guaranteed by the ack confirmation mechanism. With asynchronous callback delivery, the ack is committed immediately after the onMessage callback returns; as long as onMessage does not throw (exceptions must be caught inside it), the message will not be redelivered.

(2) Our system first persists each received message to the database, and each message row carries a unique constraint, so even if a duplicate message does arrive, its insert fails and it is not processed again.
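The dedup logic can be sketched in a few lines. The real system relies on a database unique constraint on the message id; in this hedged sketch a ConcurrentMap stands in for that constraint, and all names are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of persist-then-process dedup: the map's putIfAbsent plays the
// role of the DB unique constraint on the message id.
public class MessageStore {
    private final ConcurrentMap<String, String> persisted = new ConcurrentHashMap<>();

    // Returns true only for the first delivery of a given message id;
    // a redelivered duplicate hits the "unique constraint" and is skipped.
    public boolean saveIfAbsent(String messageId, String payload) {
        return persisted.putIfAbsent(messageId, payload) == null;
    }
}
```

A consumer that only continues processing when `saveIfAbsent` returns true is therefore safe against occasional redelivery.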

 

4 Stage One Optimization Scheme

 

4.1 Prepare test data

Start multiple threads that together send 15,000 messages to the MQ queue, then start the consumer module to consume them, with the processing time of each message fixed at 10ms and the number of ActiveMQ consumers configured as concurrency=5-100.
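The producer side of the test can be sketched as follows. In the real test the threads send JMS messages to ActiveMQ; here, as an assumption for a self-contained example, a BlockingQueue stands in for the broker and all names are illustrative:

```java
import java.util.concurrent.*;

// Hedged sketch of the load generator used to prepare test data:
// `producers` threads share the work of sending `total` messages.
public class LoadGenerator {
    public static int produce(int producers, int total) throws Exception {
        BlockingQueue<String> broker = new LinkedBlockingQueue<>(); // stand-in for ActiveMQ
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        int perProducer = total / producers;                        // assumes even division
        for (int p = 0; p < producers; p++) {
            final int id = p;
            pool.submit(() -> {
                for (int i = 0; i < perProducer; i++)
                    broker.add("producer-" + id + "-msg-" + i);     // "send" one message
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return broker.size();                                       // messages now queued
    }

    public static void main(String[] args) throws Exception {
        System.out.println(produce(5, 15000)); // prints 15000
    }
}
```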

 

4.2 Performance test before optimization

 Test | Concurrent? | Messages | queuePrefetch | Busy consumers | Time
   1  |     No      |  15,000  |     1000      |       15       | 151s
   2  |     No      |  15,000  |     1000      |       16       | 151s
   3  |     No      |  15,000  |     1000      |       15       | 151s

The pre-optimization data shows that although concurrency=5-100 (dynamic consumer scaling) was configured, only about 15 consumers were ever busy and the messages were all executed serially: 15,000 messages took 151s. The efficiency is very poor. PS: who knows which developer added that synchronization lock!

Note: queuePrefetch is the number of messages an MQ consumer pulls from the Queue at one time (default 1000); consumers is the number of consumers actually processing messages.

 

4.3 Performance test after optimization

 

4.3.1 Cancel synchronization lock

Remove the synchronized lock from the listener's onMessage callback method.

 

4.3.2 Performance test after canceling the synchronization lock

 Test | Concurrent? | Messages | queuePrefetch | Busy consumers | Time
   1  |     Yes     |  15,000  |     1000      |       14       |  13s
   2  |     Yes     |  15,000  |     1000      |       15       |  13s
   3  |     Yes     |  15,000  |     1000      |       15       |  13s

The data above shows that with the synchronization lock removed, the 15,000 messages are processed in only 13s, nearly 12 times faster than before. But of the 5-100 configured consumers, still only about 15 are busy while the rest have no messages to process: the load is skewed. The next step is therefore to tune the queuePrefetch parameter.

 

4.3.3 Optimize the queuePrefetch parameter of ActiveMQ

The prefetch size is one of the important MQ tuning parameters. To improve network efficiency, ActiveMQ pushes messages to each Consumer in batches of 1000 by default, as the DEFAULT_QUEUE_PREFETCH field of the ActiveMQPrefetchPolicy class in the ActiveMQ source shows. Since our notification processing involves database operations, we weigh that against network efficiency and set queuePrefetch to 100. The value is configured on the ActiveMQ connection URL, for example:

tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=100
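Assuming a Spring JMS setup like the one implied by the SessionAwareMessageListener above, the prefetch URL and the consumer range might be wired together roughly as follows (bean ids, destination name, and listener ref are illustrative assumptions, not the project's actual config):

```xml
<!-- Illustrative Spring config; ids and names are assumptions -->
<bean id="connectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
    <!-- queuePrefetch lowered from the default 1000 to 100 -->
    <property name="brokerURL"
              value="tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=100"/>
</bean>

<jms:listener-container connection-factory="connectionFactory" concurrency="5-100">
    <jms:listener destination="notify.queue" ref="notifyListener"/>
</jms:listener-container>
```

With the smaller prefetch, the broker hands out messages in batches of 100 instead of 1000, so slow consumers hoard fewer messages and the 5-100 consumer pool is used more evenly.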

4.3.4 Performance test after optimizing queuePrefetch parameters

 Test | Concurrent? | Messages | queuePrefetch | Busy consumers | Time
   1  |     Yes     |  15,000  |      100      |       40       |   7s
   2  |     Yes     |  15,000  |      100      |       47       |   5s
   3  |     Yes     |  15,000  |      100      |       41       |   6s

With queuePrefetch changed to 100, nearly half of the consumers now process data, and the 15,000 messages finish in about 6s.

 

4.3.5 Conclusion

From the two optimization steps above: removing the synchronization lock alone increased the queue's consumption capacity by nearly 12 times, and combined with the queuePrefetch tuning, the overall consumption capacity improved by roughly 25-30 times (151s down to 5-6s).

images/P6apYdBQDDR8mHN4y5if3JGdrTf5yFDh.png

 

5 Stage Two Optimization Scheme

The stage two scheme builds on the optimizations carried out in stage one.

 

5.1 Single Queue Processing

images/KNRAKFBzwpGDAwmJcak3YWeF3W7N8xW6.png

 

Since our message notification business is idempotent, a notification is retried up to a configured number of times until it succeeds. The system's current practice is to temporarily store each received MQ message in a DelayQueue, have multiple threads poll messages out of it, and notify the other modules over HTTP; if a notification fails, the message is put back into the same delay queue for the next attempt.
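The single-queue retry design can be sketched with the JDK's own DelayQueue. This is a minimal illustration (the message type, ids, and delays are assumptions): an element only becomes visible to the polling workers once its delay expires, and a failed notification would be re-enqueued with a fresh delay.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// One pending notification in the delay queue: it becomes available to
// the polling threads only after its delay has elapsed.
public class DelayedNotify implements Delayed {
    final String messageId;
    final long fireAtNanos;   // absolute time at which this attempt is due

    DelayedNotify(String messageId, long delayMillis) {
        this.messageId = messageId;
        this.fireAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
    }

    @Override public long getDelay(TimeUnit unit) {
        return unit.convert(fireAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                            other.getDelay(TimeUnit.NANOSECONDS));
    }

    public static void main(String[] args) throws Exception {
        DelayQueue<DelayedNotify> queue = new DelayQueue<>();
        queue.put(new DelayedNotify("m-1", 50));  // retry this message in 50ms
        DelayedNotify next = queue.take();        // blocks until the delay expires
        System.out.println(next.messageId);       // prints m-1
        // On a failed HTTP notification the real code re-enqueues, e.g.:
        // queue.put(new DelayedNotify(next.messageId, nextBackoffMillis));
    }
}
```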

Note: the shortcoming of single-queue processing

Because a single queue is used, messages that succeed on the first notification are mixed with messages that have already failed several times. Failed messages sitting in the queue delay the messages behind them that could be delivered normally, and overall message throughput drops.

5.2 Double Queue Processing

images/5EFSdy2S7dRypxzThmjYr7WDM6SKjiiQ.png

 

Given the shortcoming described in 5.1, we redesign the single queue as a double queue. The core idea: when a notification from queue 1 fails, the message is no longer put back into queue 1 but moved to queue 2 for its retries. This separates the message data, so failed notifications no longer hold up the messages behind them that can be delivered successfully, improving the overall notification throughput of the queue!

 

6 Stage Three Optimization Scheme

 

6.1 Re-selection of MQ components

ActiveMQ is a veteran message-queue component whose throughput is not outstanding; it suits scenarios with modest business volume. Many mature, high-performance, high-throughput message-queue components are now available, such as RabbitMQ, RocketMQ, and Kafka, and ActiveMQ can be replaced with one of them as the situation requires.

 

7 Summary

To address the message-queue data backlog, we made three main optimizations: removing the synchronization lock, tuning the ActiveMQ parameters, and moving to a local double queue. Together, these three optimizations essentially solved the backlog problem.

Article source: https://my.oschina.net/feinik/blog/1674168

