How to optimize slow reads and writes on large tables (1) [cold-hot separation]

Today's topic is cold-hot separation. The concept is probably not unfamiliar, and its usage scenarios are familiar too, but it still deserves careful thought: there are many pitfalls in actual development.

Business scenario one

I once worked on an architecture overhaul for a supply-chain platform. The platform had an order feature whose main table held tens of millions of rows; counting the related tables, the total reached hundreds of millions.
Such a huge volume made order queries extremely slow: a single query took 20 to 30 seconds, and a few more clicks would bring the service down. For example, when a salesperson ran several queries in a row, the database CPU spiked immediately and server threads could not be freed.

At the time we tried optimizing the table structure, business code, indexes, and SQL statements to improve response time, but these measures treated the symptoms rather than the root cause; queries were still very slow.

Since we had other, higher-priority requirements to deal with, we told the business side: "If you can live without this feature for now, please set it aside for the time being." After a while, though, they really could not stand it any longer and gave us an ultimatum, so we had to give in.

In the end we settled on a cost-effective solution that solved the problem simply and conveniently: we split the data into two databases, a cold store and a hot store. Infrequently used data went into the cold store, and frequently used data into the hot store.

After this change, since salespeople mostly query recent, commonly used data, the volume of hot data shrank dramatically; the downtime disappeared and database response times improved greatly.

The approach above is exactly "cold-hot separation".

1. What is cold-hot separation?

Cold-hot separation means splitting the data into a cold store and a hot store: the cold store holds data that has reached its final state, while the hot store holds data that still needs to be modified.

2. When should cold-hot separation be used?

If the business requirements look like the following, a cold-hot separation solution is worth considering:

  • Once data reaches its final state it is only read, never written, e.g. completed orders;
  • Users accept querying new and old data separately. For example, some e-commerce sites only let you query orders from the last 3 months by default; to query older orders you must visit a separate page.

3. How to implement cold-hot separation

In practice, the overall implementation of cold-hot separation comes down to four questions:

1. How to judge whether a piece of data is cold data or hot data?

2. How to trigger the separation of hot and cold data?

3. How to realize the separation of hot and cold data?

4. How to use cold data?

Next, let's analyze these four questions in detail.

(1) How to judge whether a piece of data is cold data or hot data?

Generally, to judge whether a record is cold or hot, we use one field, or a combination of fields, from the main table as the distinguishing indicator. The field can be a time dimension, such as the "order time" field: orders placed more than 3 months ago are treated as cold data, and orders within 3 months as hot data.

The field can also be a status dimension, e.g. distinguishing by the "order status" field: completed orders are cold data, unfinished orders are hot data.

A combination of fields works as well. For example, we mark orders whose order time is more than 3 months ago and whose status is "Completed" as cold data, and everything else as hot data.

Which field to use in the end still depends on your actual business.
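As an illustration, the combined time-and-status rule above can be sketched as a small predicate. This is a minimal sketch; the function name, the field names, and the `COMPLETED` status value are illustrative, not taken from any real order system:

```python
from datetime import datetime, timedelta

# Hypothetical rule combining the two dimensions discussed above:
# an order is cold when it is older than 3 months AND completed.
COLD_AGE = timedelta(days=90)

def is_cold(order_time: datetime, status: str, now: datetime) -> bool:
    """Return True if the order counts as cold data."""
    return (now - order_time) > COLD_AGE and status == "COMPLETED"
```

In a real system this rule would usually live in the migration job's SQL WHERE clause rather than in application code, but the logic is the same.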

Two important points about the cold/hot judgment logic:

  • Once a record is marked as cold data, the business code must no longer write to it;
  • No business scenario should need to read hot and cold data in the same query.

(2) How to trigger the separation of hot and cold data?

With the judgment logic settled, we can consider how to trigger the separation of hot and cold data. Broadly, there are three kinds of trigger logic:

1. Modify the business code directly, so that every data modification triggers the separation (for example, every time an order's status is updated);

2. If you don't want to touch the original business code, monitor the database change log (binlog); database triggers also work;

3. Scan the data periodically (a database scheduled job or an application-level scheduled task).

Which of the three trigger logics is best? The comparison in the table below should make the answer clear.

| | Modify write-path business code | Monitor the database change log | Scan the database periodically |
| --- | --- | --- | --- |
| Advantages | 1. Code is flexible and controllable. 2. Real-time. | 1. Decoupled from business code. 2. Low latency achievable. | 1. Decoupled from business code. 2. Covers scenarios where hot/cold is distinguished by time. |
| Disadvantages | 1. Cannot distinguish hot/cold by time: data that turns cold purely with the passage of time triggers no operation. 2. Every write path in the business code must be modified. | 1. Cannot distinguish hot/cold by time either. 2. Concurrency must be considered: business code and migration code may operate on the same record at the same time. | 1. Not real-time. |

From this comparison we can derive the recommended scenario for each trigger logic.

  1. Modify write-path business code: recommended when the business code is fairly simple and hot/cold is not distinguished by time.
  2. Monitor the database change log: recommended when the business code is complex and cannot be changed freely, and hot/cold is not distinguished by time.
  3. Scan the database periodically: recommended when hot/cold is distinguished by time.

(3) How to separate hot and cold data?

The basic logic for separating hot and cold data is as follows:

1. Judge whether the data is cold or hot;

2. Insert the data to be moved into the cold database;

3. Delete the moved data from the hot database.

The logic looks simple, but when designing a real solution the following three points must be taken into account, and they are not simple at all.

(1) Consistency: how do we ensure data consistency when one operation modifies two databases?

The consistency requirement here means: how do we guarantee the data stays consistent if any step fails? The answer is to make every step retryable and idempotent. The logic breaks down into four steps:

  • In the hot database, add a flag to the data to be moved: flag=1 (1 means cold data, 0 means hot data).

  • Select all data to be moved (flag=1). Selecting by flag rather than by the original condition guarantees that rows left behind by a previously failed thread are picked up again.

  • Save a copy in the cold database, with a check in the save logic to guarantee idempotence (wrap this step in a transaction). In plain terms: if the row we are saving already exists in the cold database, the logic must still be able to proceed.

  • Delete the corresponding data from the hot database.
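The four steps can be sketched end to end. This is a minimal illustration using one in-memory SQLite database to stand in for both the hot and the cold store (in the real scenario they are separate MySQL databases); the table names and status value are made up for the example:

```python
import sqlite3

# Two tables stand in for the hot and cold databases (illustrative only).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE hot_orders  (id INTEGER PRIMARY KEY, status TEXT, flag INTEGER DEFAULT 0);
    CREATE TABLE cold_orders (id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO hot_orders (id, status) VALUES (1, 'COMPLETED'), (2, 'OPEN');
""")

# Step 1: mark the rows to move (flag=1 means cold).
db.execute("UPDATE hot_orders SET flag = 1 WHERE status = 'COMPLETED'")

# Step 2: select everything still flagged, including leftovers from earlier
# failed runs, so the whole procedure is safely retryable.
rows = db.execute("SELECT id, status FROM hot_orders WHERE flag = 1").fetchall()

# Steps 3 and 4 in one transaction: copy idempotently (INSERT OR IGNORE
# tolerates rows already copied by a previous attempt), then delete from hot.
with db:
    db.executemany("INSERT OR IGNORE INTO cold_orders (id, status) VALUES (?, ?)", rows)
    db.executemany("DELETE FROM hot_orders WHERE id = ?", [(r[0],) for r in rows])
```

Running the whole script a second time over the same data is harmless, which is exactly the idempotence the text asks for.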

(2) Large data volume: what if there is too much data to process in one pass? Do we need batching?

Of the three trigger mechanisms above, the first two rarely face a data-volume problem, because each run only touches the data that changed at that moment. With the periodic-scan approach, however, volume must be considered.

The implementation is also very simple: add batching logic where the data is moved. An example makes it clear.

Suppose we move 50 rows at a time:

a. In the hot database, mark the data to be moved: flag=1;

b. Select the first 50 rows to be moved (flag=1);

c. Save a copy in the cold database;

d. Delete the corresponding rows from the hot database;

e. Repeat from step b until no flagged rows remain.
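A minimal sketch of the batched loop above, again using in-memory SQLite tables as stand-ins for the hot and cold databases (table names are illustrative):

```python
import sqlite3

BATCH = 50  # move at most 50 rows per pass, as in the example above

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE hot  (id INTEGER PRIMARY KEY, flag INTEGER DEFAULT 0);
    CREATE TABLE cold (id INTEGER PRIMARY KEY);
""")
# Step a: 120 rows already marked flag=1 as data to be moved.
db.executemany("INSERT INTO hot (id, flag) VALUES (?, 1)", [(i,) for i in range(120)])

# Steps b-e: loop until no flagged rows remain.
while True:
    batch = db.execute("SELECT id FROM hot WHERE flag = 1 LIMIT ?", (BATCH,)).fetchall()
    if not batch:
        break
    with db:  # one transaction per batch keeps each pass retryable
        db.executemany("INSERT OR IGNORE INTO cold (id) VALUES (?)", batch)
        db.executemany("DELETE FROM hot WHERE id = ?", batch)
```

Because each pass both copies and deletes inside one transaction, a crash between passes simply leaves fewer flagged rows for the next run to pick up.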

(3) Concurrency: what if the data volume is so large that it must be processed in parallel?

In a scenario where hot and cold data are migrated on a schedule (say, daily), if the daily volume is too large for single-threaded batching to finish in time, we can run multiple threads concurrently. (Although multithreading is faster in most cases, I have seen the opposite: once the single-threaded batch size reached a certain value, it was faster than any multithreaded configuration. So take note: if multithreading turns out not to be faster, consider falling back to a single thread.)

When multiple threads move hot and cold data concurrently, the following implementation logic must be considered.

Step 1: how to start the threads?

Since we use a timer as the trigger, the cheapest approach is to set up several timers with short intervals between them, and have each timer start a thread that begins moving data.

A more appropriate approach is to build a thread pool and trigger the following on a schedule: first compute the amount of hot data to be moved, then compute the number of threads to start concurrently; if that exceeds the pool size, use the pool size instead. Call this number N, then loop N times, submitting migration work to the pool.
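The sizing logic of the thread-pool approach might look like this. It is a sketch with assumed pool and batch sizes; `move_one_batch` is a placeholder for the batched migration from the previous section:

```python
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4          # assumed fixed pool size
BATCH_PER_THREAD = 50  # rows one worker handles per round
pool = ThreadPoolExecutor(max_workers=POOL_SIZE)

def move_one_batch() -> None:
    """Placeholder for the batched move logic shown earlier."""
    ...

def planned_threads(pending_rows: int) -> int:
    # Threads needed to cover the backlog, capped at the pool size (this is N).
    if pending_rows <= 0:
        return 0
    return min(POOL_SIZE, -(-pending_rows // BATCH_PER_THREAD))  # ceil division

def run_round(pending_rows: int) -> int:
    n = planned_threads(pending_rows)
    futures = [pool.submit(move_one_batch) for _ in range(n)]
    for f in futures:
        f.result()  # surface any worker exception
    return n
```

With 120 pending rows and 50 rows per worker, `planned_threads` yields 3; with 1000 pending rows it is capped at the pool size of 4.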

Step 2: once a thread has claimed a piece of data, other threads must not touch it (locking).

Three properties of this locking logic need attention.

  • Lock acquisition must be atomic: when a thread finds an unlocked row and then locks it, the check and the lock must happen as one atomic operation, i.e. succeed or fail together. In practice, first add LockThread and LockTime fields to the table, then use a single SQL UPDATE that targets rows that are unlocked or whose lock has timed out and sets LockThread = current thread and LockTime = current time; MySQL's locking on UPDATE provides the atomicity.

  • Acquiring the lock must be consistent with starting to process: before processing a row, the thread must check again that it actually locked it. In practice, query for rows where LockThread = current thread and process only those.

  • Releasing the lock must be consistent with finishing processing: once the current thread has processed the data, it must release the lock.
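The atomic lock-and-recheck pattern can be sketched as follows. SQLite stands in for MySQL here; the single UPDATE plays the role of the MySQL update-lock mechanism described above, and the column names follow the LockThread/LockTime convention from the text:

```python
import sqlite3
import time

LOCK_TIMEOUT = 300  # seconds; assumed comfortable upper bound for one batch

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE hot (
    id INTEGER PRIMARY KEY, flag INTEGER DEFAULT 1,
    lock_thread TEXT, lock_time REAL)""")
db.execute("INSERT INTO hot (id) VALUES (1)")

def try_lock(row_id: int, thread_name: str) -> bool:
    """Atomically claim a row that is unlocked or whose lock has expired.
    The check and the claim are one UPDATE, so two threads cannot both
    see their update succeed on the same row."""
    now = time.time()
    cur = db.execute(
        """UPDATE hot SET lock_thread = ?, lock_time = ?
           WHERE id = ? AND flag = 1
             AND (lock_thread IS NULL OR lock_time < ?)""",
        (thread_name, now, row_id, now - LOCK_TIMEOUT))
    return cur.rowcount == 1

def owns(row_id: int, thread_name: str) -> bool:
    """The re-check before processing: do we really hold this row's lock?"""
    row = db.execute("SELECT lock_thread FROM hot WHERE id = ?", (row_id,)).fetchone()
    return row is not None and row[0] == thread_name
```

A second thread calling `try_lock` on an already-claimed row gets `False` until the lock's timeout expires, which is exactly the lock-timeout behavior discussed in Step 4 below.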

Step 3: after a thread finishes processing normally, the data is no longer in the hot store but has moved to the cold store. This is the normal path; nothing special needs attention here.

Step 4: what if a thread fails and exits without releasing the lock (lock timeout)?

The lock cannot be released: if the thread that locked the data exits abnormally before releasing the lock, other threads cannot process that data. What then? The solution is to give the lock a timeout: if the lock has not been released by the timeout, other threads may process the data normally.

When setting the timeout, we also have to ask: what if the thread has not exited at all, but is simply still processing when the timeout expires? The answer is to set the timeout comfortably above the reasonable processing time, and to make sure the migration code is idempotent.

Finally, the extreme case: the lock times out while the original thread is still processing, and another thread locks and processes the same data. What then? We only need to add fault tolerance at each step; since the migration code is simple and idempotent, the overlapping work will not break data consistency.

(4) How to use cold data

In the query UI, there is usually an option letting the user choose whether to query cold data or hot data; if the UI does not provide one, the business code can decide. (Note: whichever way the cold/hot decision is made, users must never read hot and cold data in the same query.)

How to migrate historical data?
Generally speaking, any architectural change at the persistence layer must consider historical-data migration: how do we make the old architecture's historical data fit the new architecture?

Because the separation logic above already covers this case through its failure-retry design, the solution is very simple: mark all historical data with flag=1, and the program will migrate it automatically.
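The backfill is then a single statement. Table and column names below are the illustrative ones used in the earlier examples, with SQLite again standing in for MySQL:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hot_orders (id INTEGER PRIMARY KEY, status TEXT, flag INTEGER DEFAULT 0)")
db.executemany("INSERT INTO hot_orders (id, status) VALUES (?, ?)",
               [(1, 'COMPLETED'), (2, 'OPEN'), (3, 'COMPLETED')])

# One backfill statement: flag every historical row already in its final
# state; the regular migration job then carries it to cold storage on its
# next scheduled run, with no extra code.
db.execute("UPDATE hot_orders SET flag = 1 WHERE status = 'COMPLETED'")
db.commit()
```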

Shortcomings of the cold-hot separation solution

It must be said that cold-hot separation does solve slow writes and slow hot-data reads, but it still has notable shortcomings.

Shortcoming 1: queries on cold data are still very slow. If the proportion of users querying cold data is low, say only 1%, this is acceptable.

Shortcoming 2: the business cannot modify cold data, and once the cold data itself grows beyond a certain point, the system cannot bear it either. (This can be addressed by sharding the cold storage, to be discussed later.)

Origin blog.51cto.com/11996285/2644153