A BAT veteran teaches you: how do you handle paging over a million rows of data?

Structure Daren 2019-06-02 01:12:16

I recently ran into this situation: data in the database kept accumulating over time, so the volume kept climbing, and every paged query from the backend system got noticeably slower. After digging in for a while, I found that paging in this case relied on the traditional physical paging approach, LIMIT n, m.

To make the demonstration easier to follow, I deliberately created a few example tables as an exercise:

A goods table, a user table, and a table recording which goods each user bought:

goods, user, g_u

The relationship among the three tables is fairly simple: a user's id and a goods id are combined into an association row, which is stored in g_u. The three tables are designed as follows:

The simulated scenario is very simple: users and goods are in a many-to-many relationship. To make subsequent testing easier, I generated 1.9 million rows of test data with JMeter, simulating queries at the million-row scale.

Now suppose the business needs paged queries over the purchase records in this table. For a conventional paged query, most people would probably reach for a statement like this:

SELECT * FROM g_u ORDER BY id LIMIT 1850000, 100

In testing, the query time came out as follows:


The further back the page we query, the slower the search becomes, so at this point creating an appropriate index becomes all the more important.
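To make the scenario concrete, here is a minimal runnable sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table layout follows the article, but the row count is scaled down from 1.9 million to a hypothetical 10,000, and the naive LIMIT/OFFSET page is fetched at a deep offset:

```python
import sqlite3

# In-memory SQLite database used as a lightweight stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE goods (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute('CREATE TABLE "user" (id INTEGER PRIMARY KEY, username TEXT)')
conn.execute("CREATE TABLE g_u (id INTEGER PRIMARY KEY, g_id INTEGER, u_id INTEGER)")

# Scaled-down data set: 10,000 purchase records instead of 1.9 million.
conn.executemany("INSERT INTO goods VALUES (?, ?)",
                 [(i, f"goods-{i}") for i in range(1, 101)])
conn.executemany('INSERT INTO "user" VALUES (?, ?)',
                 [(i, f"user-{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO g_u VALUES (?, ?, ?)",
                 [(i, (i - 1) % 100 + 1, (i - 1) % 100 + 1) for i in range(1, 10001)])

# Naive physical paging: the engine must walk past and discard all `offset`
# rows before returning the page, which is why deep pages get slower.
offset, page_size = 9000, 100
page = conn.execute("SELECT * FROM g_u ORDER BY id LIMIT ? OFFSET ?",
                    (page_size, offset)).fetchall()
print(page[0][0], page[-1][0])  # → 9001 9100
```

The deeper the offset, the more rows the engine scans only to throw away, which matches the slowdown described above.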

First, let's run the SQL through EXPLAIN once; the test results are as follows:


Since the query sorts by the primary-key index id, the key used is PRIMARY. A common first optimization is to locate the page-boundary id with a subquery:

SELECT * FROM g_u WHERE id >= (SELECT id FROM g_u ORDER BY id LIMIT 1850000, 1) ORDER BY id LIMIT 100

This improves the query quite a bit, but it is still slow:
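A sketch of the same idea, again with sqlite3 and the scaled-down table from before (table names from the article, row counts hypothetical). The inner query only needs the id column, so it can be satisfied from the primary-key index alone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE g_u (id INTEGER PRIMARY KEY, g_id INTEGER, u_id INTEGER)")
conn.executemany("INSERT INTO g_u VALUES (?, ?, ?)",
                 [(i, (i - 1) % 100 + 1, (i - 1) % 100 + 1) for i in range(1, 10001)])

# The inner query walks only the primary-key index to find the id at the
# page boundary; the outer query then does a cheap range scan from that id.
offset, page_size = 9000, 100
page = conn.execute(
    "SELECT * FROM g_u"
    " WHERE id >= (SELECT id FROM g_u ORDER BY id LIMIT 1 OFFSET ?)"
    " ORDER BY id LIMIT ?",
    (offset, page_size)).fetchall()
print(page[0][0], page[-1][0])  # → 9001 9100
```

The offset still has to be skipped, but only over the narrow index rather than over full rows, which is why this is faster yet not fast.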


Analyzing the execution plan with EXPLAIN shows:


The subquery uses the index, and the outer query's WHERE condition is also resolved through the index.

Next, we may as well try using the primary key id directly to improve our query efficiency:

SELECT * FROM g_u AS gu WHERE gu.id > ($firstId + $pageSize * $pageNum) ORDER BY gu.id LIMIT 100

The query time drops dramatically, all at once:


Analyzing this SQL with EXPLAIN:


This time the SQL uses the primary-key index when it runs, so efficiency is greatly improved.
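The article computes the boundary from $firstId and the page size; an equivalent and widely used variant is keyset (or "seek") paging, where the application simply remembers the last id of the previous page. A minimal sketch, again with sqlite3 as a stand-in (`fetch_page` is a hypothetical helper, not from the original):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE g_u (id INTEGER PRIMARY KEY, g_id INTEGER, u_id INTEGER)")
conn.executemany("INSERT INTO g_u VALUES (?, ?, ?)",
                 [(i, (i - 1) % 100 + 1, (i - 1) % 100 + 1) for i in range(1, 10001)])

def fetch_page(last_seen_id, page_size=100):
    """Range directly on the primary key: page N costs the same as page 1,
    because no rows are scanned and then discarded."""
    return conn.execute("SELECT * FROM g_u WHERE id > ? ORDER BY id LIMIT ?",
                        (last_seen_id, page_size)).fetchall()

first = fetch_page(0)
second = fetch_page(first[-1][0])  # resume from the last id of page 1
print(second[0][0], second[-1][0])  # → 101 200
```

The trade-off is that you can only step page by page (no random jump to page N), which is exactly the gap the third-party table below is meant to fill.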

But at this point you may have a doubt: if the ids in the table are not contiguous (for example, after rows have been deleted), how do we keep each page complete and consistent when paging this way?

Here we can try a different line of thinking: build a third table, g_u_index, and store the originally non-contiguous g_u ids in it. Through this table, the ordered, gap-free g_u_index.id can be mapped to the corresponding non-contiguous g_u.id. The SQL to build the table is as follows:

CREATE TABLE `g_u_index` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `index` int(11) DEFAULT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `idx_id_index` (`id`,`index`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1900024 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

PS: you can build a composite index on id and index together (as above) to improve query efficiency.

One thing we must guarantee here is that rows are inserted into g_u_index in the same order as they are inserted into g_u. Then, to page to a specified position, we can first look up the boundary like this:

SELECT g_u_index.`index` FROM g_u_index WHERE id = ($firstId + $pageSize * $pageNum) LIMIT 1

Running EXPLAIN on this, the result becomes:


Query time: 0.001s

With the help of this third-party table, the paging SQL can be adjusted as follows:

SELECT * FROM g_u AS gu WHERE gu.id >= (
    SELECT g_u_index.`index` FROM g_u_index WHERE id = ($firstId + $pageSize * $pageNum) LIMIT 1
) ORDER BY gu.id LIMIT 100

With the third-party table in place, the query time again drops sharply:
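Here is a sketch of the whole mapping-table idea with sqlite3 as a stand-in. The g_u ids are deliberately non-contiguous (every third value) to simulate deleted rows, and g_u_index is filled with ROW_NUMBER (requires SQLite 3.25+) rather than during inserts as the article does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Non-contiguous ids (every 3rd value) simulate a table with deleted rows.
conn.execute("CREATE TABLE g_u (id INTEGER PRIMARY KEY, g_id INTEGER, u_id INTEGER)")
conn.executemany("INSERT INTO g_u VALUES (?, ?, ?)",
                 [(i * 3, (i - 1) % 100 + 1, (i - 1) % 100 + 1) for i in range(1, 1001)])

# g_u_index maps a dense, gap-free id back to each real (gappy) g_u id.
conn.execute('CREATE TABLE g_u_index (id INTEGER PRIMARY KEY, "index" INTEGER)')
conn.execute('INSERT INTO g_u_index (id, "index")'
             " SELECT ROW_NUMBER() OVER (ORDER BY id), id FROM g_u")

# Page 6 of 100 rows: the dense position of its first row is 5 * 100 + 1.
page_num, page_size = 5, 100
boundary = page_num * page_size + 1
page = conn.execute(
    "SELECT g_u.* FROM g_u"
    ' WHERE g_u.id >= (SELECT "index" FROM g_u_index WHERE id = ?)'
    " ORDER BY g_u.id LIMIT ?",
    (boundary, page_size)).fetchall()
print(page[0][0], page[-1][0])  # → 1503 1800
```

Because g_u_index.id is dense, an arbitrary page number can still be translated into an exact starting id even though the underlying ids have gaps.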

To make the results more human-friendly, we usually don't need to display these meaningless ids; what's needed are the goods name and the username. Suppose we query in the most primitive way, joining without the third-party table; then the efficiency looks like this:

SELECT gu.id, goods.`name`, `user`.username FROM g_u AS gu, goods, `user`
WHERE goods.id = gu.g_id AND `user`.id = gu.u_id
ORDER BY gu.id LIMIT 1500000, 1000

Result:

So if we query through the third-party table instead, the SQL can be adjusted to the following form:

SELECT goods.`name`, `user`.username FROM g_u AS gu, goods, `user`
WHERE goods.id = gu.g_id AND `user`.id = gu.u_id
AND gu.id >= (
    SELECT g_u_index.`index` FROM g_u_index WHERE id = (9 + 1000 * 1900) LIMIT 1
) ORDER BY gu.id LIMIT 100

This greatly reduces the query time:

Running EXPLAIN on the execution plan again, the results are as follows:

In a real business scenario, for an original table with millions of rows, doing this kind of id split and keeping it synchronized to a third-party table is not that easy. Here I'd recommend one line of thought: use Alibaba's middleware canal to capture the database's binlog, and then build customized data-synchronization logic on top of it.

I also cover canal in another of my articles: a first look at the Alibaba Canal framework (data-synchronization middleware).

SQL optimization has to be combined with actual business requirements. In general, this is an area where hands-on practice is what makes you stronger.

Summary of common SQL optimization techniques (compiled with reference to material from the Java community):

1. With large data volumes, try to avoid full table scans; consider adding indexes on the columns involved in WHERE and ORDER BY, since an index can greatly speed up data retrieval.

2. Where appropriate, use EXPLAIN to analyze a SQL statement in depth.

3. Add LIMIT 1 when you only need a single row of data.

4. When using an indexed field as a condition, if the index is a composite index, you must use the index's first field in the condition to guarantee the index is used; otherwise it will not be. The field order in the condition should match the index order as much as possible.

5. Do not apply functions, arithmetic operations, or other expressions on the left side of "=" in the WHERE clause, or the system may fail to use the index.

6. Using a covering index at the appropriate time can improve query efficiency.
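Tips 1, 2, and 5 can be seen in miniature with SQLite's EXPLAIN QUERY PLAN, a rough analogue of MySQL's EXPLAIN. The exact plan text differs between engines, so treat this as an illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE g_u (id INTEGER PRIMARY KEY, g_id INTEGER, u_id INTEGER)")
conn.execute("CREATE INDEX idx_g_id ON g_u (g_id)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the step taken.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# A bare equality on the indexed column can use idx_g_id...
print(plan("SELECT * FROM g_u WHERE g_id = 42"))
# ...but wrapping the column in an expression defeats the index (tip 5),
# forcing a full scan of the table.
print(plan("SELECT * FROM g_u WHERE g_id + 0 = 42"))
```

The first plan reports an index search via idx_g_id, while the second falls back to scanning the whole table, which is exactly why expressions belong on the right side of the comparison.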

A final word:

Finally, writing all this up was not easy. If you made it this far, give it a like and a follow; bookmarking without following is just hooliganism!



Origin blog.csdn.net/weixin_34200628/article/details/91399332