MySQL Paging Query Optimization for Large Data Volumes

Method 1: Use the SQL statement provided by the database directly

Statement style: in MySQL, the following form can be used:

SELECT * FROM table_name LIMIT M, N

Suitable scenarios: small data volumes (hundreds to a few thousand tuples).

Reason/disadvantage: full table scan, so it becomes very slow, and with some databases the result set is not stable (for example, returning 1, 2, 3 one time and 2, 1, 3 the next). LIMIT takes N rows starting at offset M of the result set and discards the rest.
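As a concrete illustration (assuming 10 rows per page and a hypothetical table_name), the offset M is computed as (page - 1) * pageSize:

-- Page 3 with 10 rows per page: M = (3 - 1) * 10 = 20, N = 10
SELECT * FROM table_name LIMIT 20, 10;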

 

Method 2: Create a primary key or unique index and use it (assuming 10 entries per page)

Statement style: in MySQL, the following form can be used:

SELECT * FROM table_name WHERE id_pk > (pageNum*10) LIMIT M

Suitable scenarios: large data volumes (tens of thousands of tuples).

Reason: index scan, so it is very fast. A reader pointed out that because the query result is not sorted by id_pk, rows can be missed, so only method 3 should be used.

 

Method 3: Reorder based on the index

Statement style: in MySQL, the following form can be used:

SELECT * FROM table_name WHERE id_pk > (pageNum*10) ORDER BY id_pk ASC LIMIT M

Suitable scenarios: large data volumes (tens of thousands of tuples). The column after ORDER BY should ideally be the primary key or a unique column, so that the ORDER BY work can be satisfied by the index while the result set stays stable (for the meaning of "stable", see method 1).

Reason: index scan, so it is very fast. Note that this relies on ascending order; in older MySQL versions a DESC index definition was parsed but ignored, and true descending indexes only arrived later (in MySQL 8.0).
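For instance, with 10 rows per page, fetching page 101 (pageNum = 100) of a hypothetical table_name with contiguous id_pk values looks like this:

-- pageNum * 10 = 1000 rows are skipped via the index instead of an offset scan
SELECT * FROM table_name WHERE id_pk > 1000 ORDER BY id_pk ASC LIMIT 10;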

 

Method 4: Use PREPARE based on the index

The first question mark stands for pageNum, and the second stands for the number of tuples per page.

Statement style: in MySQL, the following form can be used (note that PREPARE takes the statement text as a string):

PREPARE stmt_name FROM 'SELECT * FROM table_name WHERE id_pk > (? * ?) ORDER BY id_pk ASC LIMIT M'

Suitable scenarios: large data volumes.

Reason: index scan, so it is very fast. A prepared statement is also somewhat faster than an ordinary query statement because it is parsed only once.
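A minimal end-to-end sketch of the prepared-statement flow (assuming a hypothetical table your_table with primary key id_pk and 10 rows per page):

PREPARE page_stmt FROM 'SELECT * FROM your_table WHERE id_pk > (? * ?) ORDER BY id_pk ASC LIMIT 10';
SET @pageNum = 5, @pageSize = 10;             -- skip the first 5 * 10 = 50 ids
EXECUTE page_stmt USING @pageNum, @pageSize;
DEALLOCATE PREPARE page_stmt;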

 

Method 5: Use MySQL's index-backed ORDER BY to locate tuples quickly and avoid a full table scan

For example, read tuples 1000 to 1019 (pk is the primary/unique key):

SELECT * FROM your_table WHERE pk >= 1000 ORDER BY pk ASC LIMIT 0,20
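The same idea extends to "next page" navigation: remember the largest pk of the page just shown and seek past it instead of growing the offset (a sketch, assuming the previous page ended at pk = 1019):

-- Next page: continue from the last key seen rather than using a larger offset
SELECT * FROM your_table WHERE pk > 1019 ORDER BY pk ASC LIMIT 20;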

Method 6: Use "subquery/join + index" to quickly locate the position of the tuple, then read the tuple.

For example (id is the primary/unique key; $page and $pagesize are variables substituted by the application):

Subquery example:

SELECT * FROM your_table WHERE id <=
  (SELECT id FROM your_table ORDER BY id DESC LIMIT ($page-1)*$pagesize, 1)
ORDER BY id DESC
LIMIT $pagesize;

Join example:

SELECT * FROM your_table AS t1
JOIN (SELECT id FROM your_table ORDER BY id DESC LIMIT ($page-1)*$pagesize, 1) AS t2
WHERE t1.id <= t2.id ORDER BY t1.id DESC LIMIT $pagesize;
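To make the variables concrete, here is the join form with hypothetical values $page = 5001 and $pagesize = 20 substituted in:

-- (5001 - 1) * 20 = 100000
SELECT * FROM your_table AS t1
JOIN (SELECT id FROM your_table ORDER BY id DESC LIMIT 100000, 1) AS t2
WHERE t1.id <= t2.id ORDER BY t1.id DESC LIMIT 20;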

When MySQL pages a large data set with plain LIMIT, query efficiency drops as the page number grows. The following experiment illustrates this.

 

Test experiment

1. Directly use LIMIT start, count paging statements, which is also the method used in my program:

select * from product limit start, count 

When the starting offset is small, the query has no performance problem. Let's look at the execution time when paging starts at offsets 10, 100, 1000, and 10000 (20 entries per page),

as follows:

select * from product limit 10, 20      -- 0.016s
select * from product limit 100, 20     -- 0.016s
select * from product limit 1000, 20    -- 0.047s
select * from product limit 10000, 20   -- 0.094s

As the starting offset grows, the time grows as well, which shows that the cost of a LIMIT statement is closely tied to the starting offset. Now move the starting offset to 400,000, roughly the middle of the table's records:

select * from product limit 400000, 20   -- 3.229s

Now look at the time for the last page of records:

select * from product limit 866613, 20   -- 37.44s

Obviously this kind of time is unbearable for the last pages of such a paging scheme.

From this we can also summarize two things:

  • The query time of a LIMIT statement is proportional to the position of the starting record.

  • MySQL's LIMIT statement is very convenient, but it is not suitable for direct use on tables with many records.

 

2. Performance optimization for the LIMIT paging problem

Use the covering index of the table to speed up paging queries

We all know that if a query touches only indexed columns (a covering index), it can be answered from the index alone and runs very fast.

This is because index lookups use optimized search algorithms and the needed data sits in the index itself, so there is no extra lookup of each row's data address, which saves a lot of time. In addition, MySQL has index caches, and making better use of them helps under high concurrency.

In our example, we know that the id field is the primary key, so it naturally carries the default primary key index. Now let's see how a query that uses the covering index performs.

This time we query the data of the last page using the covering index, i.e. selecting only the id column, as follows:

select id from product limit 866613, 20   -- 0.2s

Compared with the 37.44 seconds needed to query all columns, this is roughly 180 times faster.
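One way to verify that such a query is served entirely from the index is to check the execution plan; a quick sketch against the same table:

EXPLAIN SELECT id FROM product LIMIT 866613, 20;
-- "Using index" in the Extra column indicates an index-only (covering) scan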

So if we want to query all columns, there are two approaches: one uses the id >= form, the other uses a join. Look at them in practice:

SELECT * FROM product WHERE ID >= (select id from product limit 866613, 1) limit 20

The query time is 0.2 seconds!

 

Another way of writing

SELECT * FROM product a JOIN (select id from product limit 866613, 20) b ON a.ID = b.id

The query time is also very short!

3. Composite index optimization method

How far can MySQL performance go? MySQL is a database where DBA-level experts can really shine. Generally, a small system with 10,000 news articles can be written any which way, and rapid development is easy with some xx framework.

But once the data volume reaches 100,000, or millions to tens of millions, can its performance stay that high? A small mistake may force the whole system to be rewritten, or even stop it from running at all. Enough talk.

 

Let the facts speak. Here is the example:

The table collect (id, title, info, vtype) has these 4 fields: title is fixed length, info is text, id is the auto-increment primary key, and vtype is a tinyint that is indexed.

This is a simple model of a basic news system. Now fill it with data: 100,000 news items. In the end collect holds 100,000 records, and the table occupies 1.6 GB of disk.
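A plausible DDL sketch of this test table (the exact column sizes are assumptions, not stated above):

CREATE TABLE collect (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title CHAR(100)    NOT NULL,   -- fixed length, as described
  info  TEXT,
  vtype TINYINT      NOT NULL,
  KEY idx_vtype (vtype)          -- vtype is indexed
);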

OK, look at the following SQL statement:

select id,title from collect limit 1000,10;

It's fast; it basically completes in 0.01 seconds. Then look at the following:

select id,title from collect limit 90000,10;

Paging starts from 90,000; what is the result?

It takes 8-9 seconds to complete. My god, what went wrong? To optimize this, an answer found online suggests the following statement:

select id from collect order by id limit 90000,10;

Fast: 0.04 seconds and it's done. Why? Because the lookup uses the id primary-key index, so of course it is fast.

The fix circulated online is:

select id,title from collect where id>=(select id from collect order by id limit 90000,1) limit 10;

This is the result of using the id index. But if the problem gets just a bit more complicated, it falls apart. Look at the following statement:

select id from collect where vtype=1 order by id limit 90000,10; 

Very slow: it took 8-9 seconds!

At this point, I believe many people will feel as stuck as I did. Isn't vtype indexed? How can it be slow? The vtype index itself is fine; if you run directly

select id from collect where vtype=1 limit 1000,10;

it is very fast, basically 0.05 seconds. But raise the offset 90 times to start from 90,000: at that rate it should take about 0.05 * 90 = 4.5 seconds, yet the measured result is 8-9 seconds, even worse than that linear estimate.

 

At this point someone proposed the idea of splitting the table, the same idea used by the Discuz forum. The idea is as follows:

Create an index table t (id, title, vtype) with fixed-length rows, do the paging on it, and then fetch info from collect for the resulting ids. Is this feasible? The experiment will tell.
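A sketch of what this separate index table might look like (column sizes are again assumptions); the key point is that it drops the TEXT column so every row is fixed length:

CREATE TABLE t (
  id    INT UNSIGNED NOT NULL PRIMARY KEY,   -- same ids as collect
  title CHAR(100)    NOT NULL,
  vtype TINYINT      NOT NULL,
  KEY idx_vtype (vtype)
);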

The 100,000 records stored in t (id, title, vtype) take about 20 MB of table space. Using

select id from t where vtype=1 order by id limit 90000,10;

it runs fast, basically in 0.1-0.2 seconds. Why is that?

I guess it is because collect has so much data that paging through it has a long way to travel; LIMIT cost is entirely tied to the size of the table. In fact this is still a full table scan; it is fast only because the data volume is small, just 100,000 rows. OK, let's do a crazy experiment and add 1 million rows to test the performance. After adding 10 times the data, table t immediately grew past 200 MB, still fixed length. With the same query as before, the time is still 0.1-0.2 seconds. So split-table performance is fine?

Wrong! It was fast only because our offset was still 90,000. Make it big and start at 900,000:

select id from t where vtype=1 order by id limit 900000,10;

Look at the result: the time is 1-2 seconds! Why?

Even after splitting the table it still takes this long, which is very depressing. Some people say fixed-length rows improve LIMIT performance; at first I also thought that, since every record has a fixed length, MySQL should be able to compute the position of record 900,000 directly. But we overestimated MySQL's intelligence; it is not a commercial database, and it turns out that fixed-length versus variable-length rows make little difference to LIMIT. No wonder some people say Discuz becomes very slow once it reaches 1 million records. I believe that is true; it comes down to database design.

Can't MySQL get past the 1-million-record barrier? Does it really hit its limit at 1 million records?

The answer is: NO. It cannot get past 1 million only because the tables were not designed for it. Now let's introduce the non-split-table method and run a crazy test: one table holding 1 million records, a 10 GB database, and see how to page it quickly!

OK, our test goes back to the collect table. The conclusion of the tests so far is:

With 300,000 rows, the split-table approach is feasible; beyond 300,000 it becomes slower than you can stand. Of course, combining split tables with my method would be absolutely ideal, but with my method alone it can be solved perfectly without splitting the table at all!

The answer is: a composite index! Once, while designing a MySQL index, I noticed by accident that the index name can be chosen freely and that several columns can be included in it. What is that good for?

The earlier query

select id from collect order by id limit 90000,10; 

is so fast because it walks the primary-key index, but once a WHERE clause is added, the index is no longer used that way. With a let's-give-it-a-try attitude I added a composite index like search(vtype, id).
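Expressed as DDL, the index added above is simply:

ALTER TABLE collect ADD INDEX search (vtype, id);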

Then test

select id from collect where vtype=1 limit 90000,10; 

Very fast! Completed in 0.04 seconds!

Test again:

select id ,title from collect where vtype=1 limit 90000,10; 

Unfortunately, it takes 8-9 seconds again; the search index is not used!

Test again with the columns reversed, search(id, vtype): even for the plain select id statement the result is a regrettable 0.5 seconds.

To sum up: if there is a WHERE condition and you want LIMIT to use an index, you must design the index so that the WHERE column comes first and the primary key used by LIMIT comes second, and you can select only the primary key!

With that, the paging problem is solved. If the ids can be returned quickly, there is hope of optimizing LIMIT; by this logic, a million-row LIMIT should finish within 0.0x seconds. It seems that optimizing MySQL statements and indexes really matters!
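Putting the rule into practice, here is a sketch that combines the composite index with the join technique from earlier: page the ids through the (vtype, id) index first, then join back to collect for the remaining columns (the selected columns are illustrative):

SELECT c.id, c.title
FROM collect AS c
JOIN (SELECT id FROM collect WHERE vtype = 1 ORDER BY id LIMIT 90000, 10) AS tmp
  ON c.id = tmp.id;
-- the subquery is covered by the search(vtype, id) index, so only 10 rows are fetched from collect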

 


Source: blog.csdn.net/bj_chengrong/article/details/103233267