MySQL large data volume paging query method and its optimization


---Method 1: Use the SQL statement provided by the database directly

---Statement style: In MySQL, the following can be used: SELECT * FROM table_name LIMIT M, N
---Adapted scenarios: suitable for small data volumes (tuples at the 100/1000 level)
---Reason/disadvantage: full table scan, so it is very slow, and on some databases the result set is unstable (for example, one run returns 1, 2, 3 and another returns 2, 1, 3). LIMIT takes N rows starting from offset M of the result set and discards the rest.
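As a concrete illustration of that cost (a minimal sketch against the product table used in the tests later in this article), the server still has to generate and step over the first M rows before returning the N you asked for:

-- Scans and discards the first 100 rows of the result set, then returns 20:
SELECT * FROM product LIMIT 100, 20;
-- Equivalent syntax, identical cost:
SELECT * FROM product LIMIT 20 OFFSET 100;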

---Method 2: Create a primary key or unique index and use the index (assuming 10 items per page)
---Statement style: In MySQL, the following can be used: SELECT * FROM table_name WHERE id_pk > (pageNum*10) LIMIT M (M is the page size)
---Adapted scenarios: suitable for large data volumes (tuples at the tens-of-thousands level)
---Reason: index scan, so it is very fast. A reader pointed out: because the query result is not sorted by id_pk, rows can be missed in some cases; for that, only Method 3 helps.

---Method 3: Reordering based on index
---Statement style: In MySQL, the following can be used: SELECT * FROM table_name WHERE id_pk > (pageNum*10) ORDER BY id_pk ASC LIMIT M
---Adapted scenarios: suitable for large data volumes (tuples at the tens-of-thousands level). It is best if the column after ORDER BY is the primary key or a unique index, so that the ORDER BY operation can be eliminated by using the index while the result set stays stable (for the meaning of stability, see Method 1)
---Reason: index scan, so it is very fast. But MySQL's index-assisted sorting at the time only truly supported ASC, not DESC (DESC was emulated rather than a real descending scan; true descending indexes only arrived later, in MySQL 8.0).
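For instance, page 5 at 10 rows per page looks like this (a sketch that assumes id_pk is a gap-free auto-increment key; with gaps or deletions the page boundaries drift, which is the instability Method 2 warns about):

-- pageNum = 5, 10 tuples per page: skip ids 1..50, return ids 51..60
SELECT * FROM table_name WHERE id_pk > 50 ORDER BY id_pk ASC LIMIT 10;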

---Method 4: Use prepare, based on the index (the first ? is pageNum, the second ? is the number of tuples per page)
---Statement style: In MySQL, the following can be used: PREPARE stmt_name FROM 'SELECT * FROM table_name WHERE id_pk > (? * ?) ORDER BY id_pk ASC LIMIT M'
---Adapted scenarios: large data volumes
---Reason: index scan, so it is very fast. A prepared statement is also a bit faster than an ordinary query statement.
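A runnable sketch of the full prepare/execute cycle (table_name, id_pk and the @ variables stand in for your own schema):

PREPARE stmt_name FROM 'SELECT * FROM table_name WHERE id_pk > (? * ?) ORDER BY id_pk ASC LIMIT 10';
SET @pageNum = 5, @pageSize = 10;   -- first ? = pageNum, second ? = tuples per page
EXECUTE stmt_name USING @pageNum, @pageSize;
DEALLOCATE PREPARE stmt_name;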

---Method 5: Use MySQL's support for satisfying ORDER BY through an index to quickly locate part of the tuples and avoid a full table scan

For example: read the tuples in rows 1000 to 1019 (pk is the primary/unique key):

SELECT * FROM your_table WHERE pk>=1000 ORDER BY pk ASC LIMIT 0,20
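The same idea gives "remember the last key" paging: each page starts right after the largest pk of the previous page, so no offset is ever scanned (a sketch; the value of @last_seen_pk is something your application carries between requests):

SET @last_seen_pk = 1019;   -- largest pk returned on the previous page (hypothetical)
SELECT * FROM your_table WHERE pk > @last_seen_pk ORDER BY pk ASC LIMIT 20;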

---Method 6: Use "subquery/join + index" to quickly locate the tuple's position and then read the tuples. Same principle as Method 5.

For example (id is the primary key/unique key; $page and $pagesize are application-side variables substituted before the query is sent):

Example using subqueries:

SELECT * FROM your_table WHERE id <=
(SELECT id FROM your_table ORDER BY id DESC LIMIT ($page-1)*$pagesize, 1)
ORDER BY id DESC LIMIT $pagesize;

Example using a join:

SELECT * FROM your_table AS t1
JOIN (SELECT id FROM your_table ORDER BY id DESC LIMIT ($page-1)*$pagesize, 1) AS t2
WHERE t1.id <= t2.id ORDER BY t1.id DESC LIMIT $pagesize;
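With concrete numbers substituted in (a sketch for page 1000 at 20 rows per page, so the anchor offset is (1000-1)*20 = 19980):

-- The subquery walks only the id index to find the anchor row,
-- then the outer query fetches just one page of full rows
SELECT * FROM your_table AS t1
JOIN (SELECT id FROM your_table ORDER BY id DESC LIMIT 19980, 1) AS t2
WHERE t1.id <= t2.id ORDER BY t1.id DESC LIMIT 20;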

 
When MySQL pages through a large data volume with LIMIT, query efficiency drops as the page number grows.

Test experiment

1. Directly use the limit start, count paging statement, which is also the method used in my program:

select * from product limit start, count
When the starting offset is small, the query has no performance problem. Let's look at the execution times when paging starts from offsets 10, 100, 1000, and 10000 (20 entries per page):

select * from product limit 10, 20      0.016 seconds
select * from product limit 100, 20     0.016 seconds
select * from product limit 1000, 20    0.047 seconds
select * from product limit 10000, 20   0.094 seconds

We can see that as the starting record increases, the time increases too, which shows that the cost of the LIMIT statement is strongly tied to the starting offset. So let's move the starting record up to 400,000 (roughly half of the records):

select * from product limit 400000, 20   3.229 seconds

Now look at the time taken to fetch the last page of records:
select * from product limit 866613, 20   37.44 seconds

For the page with the largest page number, that kind of time is obviously unbearable.

From this we can draw two conclusions:
1) The query time of a LIMIT statement is proportional to the position of the starting record.
2) MySQL's LIMIT statement is very convenient, but it is not suitable for direct use on tables with many records.

2. Performance optimization method for limit paging problem

Use a covering index on the table to speed up the paging query.
We all know that if a query touches only the index columns (a covering index), it will be very fast.

That is because the index search has an optimized access path, and the data the query needs lives in the index itself, so there is no need to look up the row by its address, which saves a lot of time. In addition, MySQL has an index cache, and making good use of that cache helps under high concurrency.
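You can verify that a query is covered with EXPLAIN: when the Extra column shows "Using index", MySQL is reading the index alone and never touches the full rows (a minimal sketch against the product table, assuming id is its primary key):

EXPLAIN SELECT id FROM product LIMIT 866613, 20;
-- Extra: Using index  -> covering index scan, no row fetches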

In our case, we know that the id field is the primary key, and naturally contains the default primary key index. Now let's see how a query using a covering index performs:

This time we query the data of the last page using the covering index, selecting only the id column:
select id from product limit 866613, 20   0.2 seconds
That is over 100 times faster than the 37.44 seconds it takes when all columns are queried.

Then, if we also want to query all the columns, there are two ways: one is the id >= form, the other uses a join. Let's look at the actual results:

SELECT * FROM product WHERE ID >= (select id from product limit 866613, 1) limit 20
The query time is 0.2 seconds!

Another way of writing it:
SELECT * FROM product a JOIN (select id from product limit 866613, 20) b ON a.ID = b.id
The query time is also very short!

3. Compound index optimization method

How high can MySQL performance go? MySQL is a database that definitely suits DBA-level experts. A small system with 10,000 news records can be written any way you like, with whatever xx framework enables rapid development. But when the data volume reaches 100,000, then a million, then ten million, will its performance still be that high? One small mistake may force the entire system to be rewritten, or even stop it from running at all! Enough talk; let the facts speak. Look at an example:

The data table collect (id, title, info, vtype) has these four fields: title is fixed length, info is text, id is auto-increment, vtype is tinyint, and vtype is indexed. This is a simple model of a basic news system. Now fill it with data: 100,000 news records. The final collect table holds 100,000 records, and the data table occupies 1.6 GB on disk.
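A sketch of the described model (column sizes are my assumption; only the shape matters):

CREATE TABLE collect (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title CHAR(200)    NOT NULL,   -- fixed length
  info  TEXT,
  vtype TINYINT      NOT NULL,
  KEY idx_vtype (vtype)          -- vtype is indexed
);

OK, now look at the following sql statements: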

select id, title from collect limit 1000, 10;   very fast, basically done in 0.01 seconds. Now see below:

select id, title from collect limit 90000, 10;   paging from 90,000. The result?

It takes 8-9 seconds to complete. My god, what went wrong? Actually, the answer to optimizing this can be found online. Look at the following statement:

select id from collect order by id limit 90000,10;

Very fast, 0.04 seconds. Why? Because using the id primary key as the index is of course faster. The revision found online is:

select id,title from collect where id>=(select id from collect order by id limit 90000,1) limit 10;

This is the result of using id as an index. But make the problem just a little more complicated and it all falls apart. See the following statement:

select id from collect where vtype=1 order by id limit 90000,10;   very slow, it took 8-9 seconds!

At this point, I believe many people will feel the same sense of collapse that I did! Isn't vtype indexed? How can it be slow? It is good that vtype is indexed; if you directly run

select id from collect where vtype=1 limit 1000,10;

it is very fast, basically 0.05 seconds. But scale the offset up 90 times to start from 90,000, and that predicts 0.05*90 = 4.5 seconds, the same order of magnitude as the measured 8-9 seconds. At this point someone proposed the idea of splitting tables, the same idea as the discuz forum. The idea is as follows:

Build an index table t (id, title, vtype), set its rows to fixed length, do the paging on it, then take the paged-out ids to collect to fetch info. Is it feasible? Let's find out by experiment.
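Such an index table might look like this (a sketch with assumed sizes; CHAR rather than VARCHAR keeps the rows fixed length):

CREATE TABLE t (
  id    INT UNSIGNED NOT NULL PRIMARY KEY,   -- same id as in collect
  title CHAR(200)    NOT NULL,               -- CHAR, not VARCHAR: fixed-length rows
  vtype TINYINT      NOT NULL,
  KEY idx_vtype (vtype)
);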

Store the 100,000 records in t (id, title, vtype); the data table is about 20 MB. Use

select id from t where vtype=1 order by id limit 90000,10;

Very fast, basically running in 0.1-0.2 seconds. Why? I guess it is because collect has so much data that paging it has a long way to travel, while LIMIT's cost is entirely tied to the size of the data table. In fact this is still a full table scan; it is fast only because the data volume is small, just 100,000 rows. OK, let's do a crazy experiment: grow it to 1 million rows and test the performance. With 10 times the data, the t table immediately exceeds 200 MB, still fixed length. The same query as before runs in 0.1-0.2 seconds! So the split-table performance is fine? Wrong! Because our LIMIT still starts at 90,000, that is why it is fast. Try a big one, starting at 900,000:

select id from t where vtype=1 order by id limit 900000,10;

Look at the result: the time is 1-2 seconds! Why?

Even after splitting the table, the time is still this long, which is very depressing! Some people say fixed length improves LIMIT performance. At first I thought so too: since each record has a fixed length, MySQL should be able to compute the position of row 900,000, right? But we overestimated MySQL's intelligence; it is not a commercial database, and it turns out fixed-length and non-fixed-length rows make little difference to LIMIT. No wonder people say discuz becomes very slow once it reaches 1 million records. I believe that is true; it comes down to database design!

Can't MySQL break the 1 million barrier??? Does paging really hit its limit at 1 million records?

The answer is: NO. The reason it cannot get past 1 million is that the MySQL schema was not designed well. So now let's introduce the non-split-table method, with one crazy test: a single table holding 1 million records and a 10 GB database, paged fast!
 
Well, our test goes back to the collect table, and the test conclusion is:

with 300,000 records, the split-table method is feasible; beyond 300,000 it becomes unbearably slow! Of course, combining split tables with my method would be absolutely perfect. But with my method alone, the problem is solved perfectly without splitting tables at all!
 
The answer is: compound indexes! Once, while designing a MySQL index, I discovered by accident that the index name can be chosen freely and that several fields can be included in one index. What is that good for? It started with

select id from collect order by id limit 90000,10;

being so fast because it uses the index; yet once a WHERE is added, the index is no longer used. With a let's-just-try-it attitude, I added an index like search(vtype, id), created as shown below.
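A one-line sketch of creating that compound index (the name search is simply the author's choice):

ALTER TABLE collect ADD INDEX search (vtype, id);

Then test: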

select id from collect where vtype=1 limit 90000,10; very fast! Done in 0.04 seconds!

Test again: select id, title from collect where vtype=1 limit 90000,10;   very sorry, 8-9 seconds; it did not use the search index!

Retest with search(id, vtype): even just select id is, regrettably, 0.5 seconds.

To sum up: when there is a WHERE condition and you want LIMIT to use an index, you must design a compound index that puts the WHERE column in first place and the primary key used by LIMIT in second place, and you can select only the primary key!

This perfectly solves the paging problem. If you can return the id quickly, there is hope of optimizing LIMIT; by this logic, a LIMIT into the millions should finish in 0.0x seconds. It seems that optimizing MySQL statements and indexes matters a great deal!
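Putting the two tricks together: the compound index returns the page of ids fast, and a join (as in section 2) then fetches the remaining columns for just those rows (a sketch assuming the search(vtype, id) index above):

-- The subquery is covered by search(vtype, id): it pages the ids without touching rows;
-- the join then fetches title for only the 10 matched ids
SELECT c.id, c.title
FROM collect c
JOIN (SELECT id FROM collect WHERE vtype = 1 ORDER BY id LIMIT 90000, 10) x
  ON c.id = x.id;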
