Ultimate User Experience: On the Performance Optimization of Batch Processing Interfaces

Errata

In the previous article, on optimizing batch import requests, I made an error about the maximum number of connections hard-coded in the Elasticsearch source: a reader pointed out in the comments that it can be modified via HttpClientConfigCallback. I have confirmed that this is indeed the case, so please take note. Many thanks to that reader for the correction.

Now, on to this article's topic: performance optimization of batch processing interfaces.

Background

Like batch import, our system has a large number of batch processing interfaces, such as batch retrieval of waybills, batch delivery, and batch printing; there are about 10 such interfaces.

These requests typically share the following characteristics:

  1. Processing a single record takes a long time, generally 200ms or more.
  2. The batches are large. For example, the largest page size in our system is 1000 records, so the largest batch a user can select is 1000.
  3. The overall time is long. At 200ms per record and 1000 records, a batch takes 200s in total, which is far too long.
  4. The individual records cannot be merged and processed together.

Therefore, we need a uniform performance optimization for these batch processing interfaces.

But how do we optimize them? (This article was first published on the public account "Tongge reading the source code"; you are welcome to follow it.)

How to optimize

We know that a single machine's performance has an upper limit. Batch requests like these consume a lot of memory on the one hand and a lot of CPU on the other; handling everything in a single process inevitably drives processing time up even further. So for this kind of batch request, the best approach is divide and conquer.

What is divide and conquer?

Divide and conquer applies in many scenarios, such as the batch import we covered in the previous article. It generally breaks down into four parts:

  1. Receive the request
  2. Split and distribute the request
  3. Process the sub-requests
  4. Aggregate the results

So, how do we apply the divide-and-conquer idea to our batch processing?

First, we turn the one large batch request into many small requests. The "turning" here happens on the back end; the front end does not change and still issues a single large batch request.

Then these small requests are distributed to multiple machines for processing through some mechanism, for example using Kafka as the distributor.

Finally, we track the completion of each small request; when all of them have finished, we notify the front end that the whole request is complete.

The notification can go through a messaging module: after the splitting described above, the back end returns to the front end immediately, and an asynchronous notification is sent once everything has completed.
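As a minimal sketch of the split step, the back end can break the large request into one message per record and hand each to a producer. The function and field names below are made up for illustration, and `send_to_kafka` is a stand-in for a real producer call (e.g. `KafkaProducer.send` from a Kafka client library):

```python
import json
import uuid


def split_and_distribute(waybill_ids, action, send_to_kafka):
    """Turn one large batch request into per-record messages.

    `send_to_kafka` is a placeholder for a real producer call;
    a real implementation would publish to a Kafka topic.
    """
    batch_id = str(uuid.uuid4())  # identifies the whole batch
    for waybill_id in waybill_ids:
        message = {
            "batchId": batch_id,
            "action": action,              # e.g. "batch_print"
            "waybillId": waybill_id,
            "total": len(waybill_ids),     # lets consumers detect completion
        }
        send_to_kafka(json.dumps(message))
    return batch_id
```

The front end still sends one request; only the back end sees the per-record messages.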

Well, let's look directly at my architecture design diagram:

Overall it is fairly involved, so let's walk through each step:

  1. Receive the request: the front end calls the back end's bulk interface.
  2. Record the information of this batch request: assign a request ID, which user, which operation, total count, 0 successes, 0 failures, and so on.
  3. Batch-update the status of these records in the database to "xxx in progress", recording the original status. A MySQL batch update is used here, which is very fast.
  4. Send the large batch of data to Kafka one record at a time; Kafka acts as the distributor.
  5. Return a response to the front end indicating the request has been received and is being processed, so the UI shows these documents as "xxx in progress".
  6. Multiple service instances pull messages from Kafka and consume them.
  7. Process each record: check permissions and parameters, run the complex business logic, and write the result to MySQL.
  8. Record each record's result in Redis, e.g. increment the success count or the failure count.
  9. When all data has been processed, i.e. total count = success count + failure count, send a message to the message service.
  10. The message service pushes a notification to the front end: "the XXX operation you just performed has completed, with X successes and X failures".
  11. On receiving this notification, the front end can check whether the user is still on that page, automatically refresh it, and so on, giving a very friendly interaction.
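Steps 8 and 9 can be sketched as a completion counter. The class below is an in-memory stand-in; a real implementation would use atomic Redis `INCR` on per-batch keys so that multiple consumer instances can update the counts concurrently:

```python
class BatchProgress:
    """In-memory stand-in for the Redis counters in steps 8-9.

    A real implementation would use Redis INCR on keys such as
    batch:{id}:success and batch:{id}:fail (names are illustrative).
    """

    def __init__(self, batch_id, total):
        self.batch_id = batch_id
        self.total = total
        self.success = 0
        self.fail = 0

    def record(self, ok):
        """Record one result; return True when the whole batch is done."""
        if ok:
            self.success += 1
        else:
            self.fail += 1
        # Step 9: done when success + fail == total
        return self.success + self.fail == self.total

    def summary(self):
        """The text pushed to the front end in step 10."""
        return (f"Operation finished: {self.success} succeeded, "
                f"{self.fail} failed")
```

Whichever consumer records the final result triggers the notification to the message service.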

That is the overall flow for a batch request. How does it look, acceptable?

In addition, because our system has so many batch processing interfaces, implementing each one this way would produce a lot of repetitive code.

Therefore, we can build a generic batch interface driven by configuration metadata, in the format {action: xx operation, targetStatus: xx in progress}. Only the per-message business logic in the middle cannot be reused; every other part can be shared.
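A minimal sketch of that metadata-driven dispatch might look like the following. The interface names, metadata entries, and handlers are all hypothetical; the point is that only the registered handler differs per interface, while lookup and dispatch are shared:

```python
# Hypothetical metadata for each batch interface.
BATCH_METADATA = {
    "batch_print":   {"action": "print",   "targetStatus": "printing"},
    "batch_deliver": {"action": "deliver", "targetStatus": "delivering"},
}

HANDLERS = {}


def handler(action):
    """Register the one non-reusable part: per-record business logic."""
    def decorator(fn):
        HANDLERS[action] = fn
        return fn
    return decorator


@handler("print")
def handle_print(waybill_id):
    # Placeholder for the real business logic of step 7.
    return f"printed {waybill_id}"


def target_status(interface):
    """Reusable part: which status to batch-update records to (step 3)."""
    return BATCH_METADATA[interface]["targetStatus"]


def process_message(interface, waybill_id):
    """Reusable part: route one Kafka message to its handler (step 7)."""
    return HANDLERS[BATCH_METADATA[interface]["action"]](waybill_id)
```

Adding a new batch interface then only requires a metadata entry plus one handler function.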

Okay, let's review which scenarios this trick applies to.

Applicable scenarios

  1. Processing a single record takes a long time; if each record is very fast, this is unnecessary.
  2. The batches are large; if each batch is small, it is unnecessary.
  3. The overall time is long (the two factors above combined); if the overall time is short, it is unnecessary.
  4. The database cannot handle the whole operation as one batch update; if it can, this is unnecessary.

Finally, what further improvements can be made?

Improvements

I see two main improvements:

  1. Not every request is actually a large batch. For example, if a request carries fewer than 10 records, isn't it faster for the receiving machine to process it directly?
  2. Not every batch scenario needs this optimization; see the "unnecessary" cases in the applicable scenarios above.
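The first improvement amounts to a size check at the entry point. A sketch, with an assumed cutoff of 10 (the threshold should be tuned per system):

```python
SMALL_BATCH_THRESHOLD = 10  # assumed cutoff; tune for your system


def handle_batch(items, process_one, distribute):
    """Improvement 1: skip the Kafka round-trip for small batches.

    `process_one` and `distribute` are placeholders for the real
    business logic and the Kafka-based distribution flow.
    """
    if len(items) <= SMALL_BATCH_THRESHOLD:
        # Small batch: process synchronously and return results at once.
        return [process_one(item) for item in items]
    # Large batch: fall back to the asynchronous divide-and-conquer flow.
    distribute(items)
    return None  # response means "accepted, processing"
```

Small requests thus get an immediate result, while large ones follow the asynchronous flow above.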

OK, that's it for today's article. Let me ask: do you have batch processing scenarios like this in your system? How did you optimize them? Feel free to leave a comment and share; let's learn and improve together.

