Summary of paging query optimization solutions

Heavy use of functions, large numbers of scanned records, and similar factors all slow a query down, so knowing how to speed up SQL statements is important. Here are some optimization methods for SQL queries.

1. Commonly used paging query

(1) Without an index

For tables with small amounts of data, we often use the form (select * from table limit x,y) to complete paging queries. For example:

select * from areas limit 0,20;   -- page 1
select * from areas limit 20,20;  -- page 2
select * from areas limit 40,20;  -- page 3

A query written this way does not use an index, so a full table scan is performed. When the amount of data is large, it becomes very slow.
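The limit x,y pattern above can be derived from a page number. The following is a minimal sketch in Python; the helper names page_to_limit and paging_sql are hypothetical and only illustrate the arithmetic (offset = (page - 1) * page size).

```python
def page_to_limit(page: int, page_size: int = 20) -> tuple[int, int]:
    """Return (offset, row_count) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size, page_size


def paging_sql(table: str, page: int, page_size: int = 20) -> str:
    # The table name is interpolated directly, so it must come from
    # trusted code, never from user input.
    offset, count = page_to_limit(page, page_size)
    return f"select * from {table} limit {offset},{count};"
```

For example, paging_sql("areas", 3) builds the third-page query shown above.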


(2) Use index

When an index is used, the number of scanned records is significantly reduced. Use explain to inspect the query:

explain select * from areas order by id desc limit 300,20;

The purpose of order by id here is to use the primary key index on id; otherwise the query becomes a full table scan. The rows value in the explain output is the number of records scanned by the query. If the starting point of limit is 300, 320 records are scanned. If the starting point is 1000000, 1000020 records are scanned. In other words, the further back the starting point, the slower the query. If the starting point reaches the end of the table, the query is equivalent to a full table scan, and the index loses its value.

We use this kind of indexed query often, but it has to go back to the table, that is, a second lookup is required. To avoid going back to the table as much as possible, we recommend delayed association, which makes the greatest possible use of a covering index.

(3) Field interpretation in explain:

1) id: identifier

2) select_type: the type of query:

SIMPLE: Simple SELECT statement (not including UNION operation or subquery operation)
PRIMARY: The outermost SELECT in the query (for example, when two tables are combined with UNION or there are subqueries, the outer operation is PRIMARY and the inner operation is UNION)
SUBQUERY: The first SELECT in a subquery
DERIVED: A derived table, that is, a SELECT subquery in the FROM clause

3) table: the table that the output row refers to

4) partitions: matching partitions

5) type: the join (access) type of the table

Query performance, best to worst: system > const > eq_ref > ref > range > index > ALL
ALL: Full Table Scan. MySQL traverses the entire table to find matching rows.
index: Full Index Scan. The difference from ALL is that index only traverses the index tree. Both cover every row, but index reads from the index while ALL reads the data rows themselves.
range: Retrieves only the rows in a given range, using an index to select them.
ref: A non-unique index lookup. All rows whose index value matches the compared column or constant are read.

6) possible_keys: the indexes that could be used

7) key: the index actually used

8) key_len: the length of the index key actually used

9) ref: the columns or constants compared against the index

10) rows: the estimated number of rows scanned

11) filtered: the estimated percentage of rows that remain after filtering by the table conditions

12) Extra: additional information about how the query is executed

Using temporary: A temporary table is used to hold intermediate results. This commonly appears with order by sorting and group by grouping.
Using index: Indicates that a covering index is used in the corresponding select operation to avoid accessing the data rows of the table, and the efficiency is good.
Using where: indicates that where filtering is used

For a detailed explanation of explain, see: https://blog.csdn.net/why15732625998/article/details/80388236

2. Paging query optimization

1 Maximum id query method

select * from areas where id>2000 limit 20;

The test table holds 3750 records in total. With id>2000, only the records after id=2000 are scanned.

Note: The maximum id query method only applies to auto-increment primary keys. It is not suitable for primary keys generated by uuid.
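The maximum id method can be sketched as a loop that remembers the largest id of the previous page. This is only an illustration: SQLite (via Python's sqlite3) stands in for MySQL, the table contents are made up, and next_page is a hypothetical helper. An explicit order by id is added so that the page order does not depend on the storage engine.

```python
import sqlite3

# SQLite stands in for MySQL here; the areas rows are made-up test data.
conn = sqlite3.connect(":memory:")
conn.execute("create table areas (id integer primary key autoincrement, name text)")
conn.executemany("insert into areas (name) values (?)",
                 [(f"area-{i}",) for i in range(1, 101)])


def next_page(conn, last_id, page_size=20):
    """Fetch the page that follows last_id; pass 0 for the first page."""
    return conn.execute(
        "select id, name from areas where id > ? order by id limit ?",
        (last_id, page_size),
    ).fetchall()


first = next_page(conn, 0)
second = next_page(conn, first[-1][0])  # continue from the largest id seen
```

The caller only needs to keep the last id of the current page, so the cost of fetching a page does not grow with the page number.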

2 between and

select * from user where id between 2000 and 2010;

This method also applies only to auto-increment primary keys, and only when the ids have no gaps; otherwise it is not recommended. The BETWEEN AND query above returns 11 records, because BETWEEN AND includes the boundary values on both sides.
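The effect of gaps in the id sequence can be seen in a small sketch. It assumes SQLite (through Python's sqlite3) as a stand-in for MySQL and uses made-up rows in a user table: the same BETWEEN AND range returns a full page at first and a short page once some ids are deleted.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are made-up test data.
conn = sqlite3.connect(":memory:")
conn.execute("create table user (id integer primary key, name text)")
conn.executemany("insert into user values (?, ?)",
                 [(i, f"user-{i}") for i in range(2000, 2011)])

between_sql = "select id from user where id between 2000 and 2010"

# Both boundaries are included, so a gap-free range yields 11 rows.
full_page = conn.execute(between_sql).fetchall()

# After deleting two ids, the same range silently returns a shorter page.
conn.execute("delete from user where id in (2003, 2007)")
short_page = conn.execute(between_sql).fetchall()
```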

3 limit id

select * from areas where id>(select id from areas limit 2000,1)
limit 20;

Although 3750 rows are scanned this way, id is the primary key and has a primary key index, so this limit-based range query on id is much faster than select * from areas limit 2000,20;.

In the explain output, SUBQUERY marks a subquery that appears in the SELECT list or the WHERE clause.
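The relationship between the limit id form and plain limit x,y can be checked with a small sketch, again assuming SQLite as a stand-in for MySQL and made-up data. Note one detail: the sketch uses >= in the outer query so that both forms return exactly the same page; with >, as in the query above, the page starts one row later. An order by id is added to both queries so the comparison is well defined.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are made-up test data.
conn = sqlite3.connect(":memory:")
conn.execute("create table areas (id integer primary key, name text)")
# Only even ids are inserted, to show the method does not need gap-free ids.
conn.executemany("insert into areas values (?, ?)",
                 [(i, f"area-{i}") for i in range(2, 1002, 2)])

offset, size = 200, 20

# Plain offset paging.
plain = conn.execute(
    "select id from areas order by id limit ?, ?", (offset, size)
).fetchall()

# limit id paging: locate the first id of the page, then range-scan from it.
by_id = conn.execute(
    "select id from areas "
    "where id >= (select id from areas order by id limit ?, 1) "
    "order by id limit ?",
    (offset, size),
).fetchall()
```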

4 Delayed association

Covering index: InnoDB's secondary indexes store the row's primary key value in their leaf nodes, so a secondary index can cover a query that needs only the indexed columns and the primary key. The required columns are read from the index alone, with no need to go back to the table, that is, no lookup in the primary key index to find the data row.

Delayed association: To make the SQL statement as efficient as possible, let part of the statement run its query through a covering index.

We created an ordinary index (key) on the aid field.

select * from areas a join (select aid from areas 
limit 2000,20) b on a.aid = b.aid;

In the first stage of the query, the subquery in the from clause, select aid from areas limit 2000,20, uses the covering index to find the matching aid values. The outer query then uses these aid values to fetch all the column values. Although an index cannot cover the entire query, this is better than not using a covering index at all.

Note:

The limit id method in 3 also uses a covering index, but we recommend the delayed association join and avoiding subqueries. When the table has more fields and longer field types, delayed association has an even greater advantage.
When a covering index is used in the select operation, Using index is displayed in the Extra column of explain.
In explain, the larger the id value, the higher the execution priority.
A subquery in the FROM list is marked as DERIVED (derived); <derived2> refers to the derived table built from areas.
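A delayed association can be sketched as follows. This is an illustration under stated assumptions: SQLite stands in for MySQL, the data is made up, the index name idx_aid is hypothetical, and aid is assumed to be unique so that the join does not multiply rows. The sketch only checks that the delayed form returns the same page as plain offset paging; it does not measure speed.

```python
import sqlite3

# SQLite stands in for MySQL; rows and the index name are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table areas (id integer primary key, aid integer, name text)")
conn.execute("create index idx_aid on areas (aid)")
conn.executemany("insert into areas values (?, ?, ?)",
                 [(i, 1000 + i, f"area-{i}") for i in range(1, 501)])

offset, size = 200, 20

# The inner query touches only aid, so the index alone can answer it.
# The outer join then fetches full rows for just those 20 keys.
deferred = conn.execute(
    "select a.id, a.aid, a.name from areas a "
    "join (select aid from areas order by aid limit ?, ?) b on a.aid = b.aid "
    "order by a.aid",
    (offset, size),
).fetchall()

# Plain offset paging over full rows, for comparison.
plain = conn.execute(
    "select id, aid, name from areas order by aid limit ?, ?", (offset, size)
).fetchall()
```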

5 Sub-table query

A common rule of thumb is that a single MySQL table should hold no more than about 5 million rows. A general query over 4 million rows takes less than 1 second. If you need it to be faster, store the data in separate tables. There are two ways to do this: horizontal sub-tables and vertical sub-tables.

(1) Horizontal sub-table
If a table originally holds 9 million rows, the data can be stored in three tables of 3 million rows each, so each query puts far less pressure on the database and runs much more efficiently. So how do we implement horizontal sub-tables? One option is middleware such as mycat. Alibaba Cloud also provides database sharding technology. You can also split the tables by hand, but then you need to watch out for duplicate ids and decide how to determine which table a given id belongs to. A hash of the id is the recommended routing algorithm.
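Hand-written routing by hash can be sketched like this. The table prefix orders, the function name table_for, and the choice of zlib.crc32 as the hash are all assumptions made for illustration; any stable hash would serve the same purpose.

```python
import zlib

TABLE_COUNT = 3  # matches the three-table split described above


def table_for(record_id: str, prefix: str = "orders") -> str:
    """Pick the sub-table for an id using a stable hash."""
    # zlib.crc32 gives the same value on every run, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    bucket = zlib.crc32(record_id.encode("utf-8")) % TABLE_COUNT
    return f"{prefix}_{bucket}"
```

Because the same id always maps to the same table, reads and writes for a record agree on where it lives. Changing TABLE_COUNT later remaps most ids, so the table count is best fixed up front.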

(2) Vertical sub-table
With 1 million records, queries should not normally be too slow. But if the table has too many fields, many of them text-type fields, we can put the fields that take up less space in one table and the fields that take up more space in another. The two tables are linked one-to-one, so queries become much faster.

(3) Cold and hot tables
Cold and hot tables are another sub-table idea. For example, when you query bills at a bank, you will find that you can only query data from recent months; for earlier data you have to go to the counter to look up historical bills. Bank data queries use the hot and cold table design.

Create two identical tables. Table a stores the records of the past three months, and table b stores the data from more than three months ago. New records generated by users are stored in table a. Table a can be scanned on a schedule every morning, and any record more than three months old is migrated to table b. Users are most sensitive to query speed for the past three months and rarely query older data, so this design is entirely reasonable.
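The scheduled migration from table a to table b can be sketched as below. The table names bill_hot and bill_cold, the column layout, and the 90-day cutoff are hypothetical, and SQLite stands in for MySQL. The copy and the delete run in one transaction so a record is never lost or duplicated between the two tables.

```python
import sqlite3
from datetime import date, timedelta

# SQLite stands in for MySQL; bill_hot is table a, bill_cold is table b.
conn = sqlite3.connect(":memory:")
for name in ("bill_hot", "bill_cold"):
    conn.execute(
        f"create table {name} (id integer primary key, created text, amount real)"
    )


def migrate_old_rows(conn, today, keep_days=90):
    """Move rows older than keep_days from the hot table to the cold table."""
    cutoff = (today - timedelta(days=keep_days)).isoformat()
    with conn:  # one transaction: copy and delete succeed or fail together
        conn.execute(
            "insert into bill_cold select * from bill_hot where created < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "delete from bill_hot where created < ?", (cutoff,)
        ).rowcount
    return moved


today = date(2024, 6, 30)
conn.executemany("insert into bill_hot (created, amount) values (?, ?)", [
    ((today - timedelta(days=10)).isoformat(), 12.5),   # recent, stays hot
    ((today - timedelta(days=200)).isoformat(), 99.0),  # old, goes cold
])
moved = migrate_old_rows(conn, today)
```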

6 Index

Adding an index can improve query efficiency. If the paging query involves conditions, we can add an index on the condition columns. The database maintains a corresponding index table. A query first looks in the index table and then fetches the records directly based on what the index table returns, which also reduces the number of rows scanned. Note, however, that in any of the following situations the index may not be used, so be careful.

The query condition uses is not null.
Like statements with a leading wildcard, such as keyword like '%notebook'. The index becomes invalid, so % must not come first.
If either of the conditions before and after OR has no index, the entire table is scanned and the index becomes invalid.
Composite index: the query must include the first field of the composite index, otherwise the composite index does not take effect.
Comparison operators such as >, < and <>.
String values written without single quotes.

7 Cache

Cache the query results in redis, so that they are read directly from memory without querying the data on disk.

Note:
The focus of query optimization is scanning as few records as possible to return the query results. A cache reduces database access, but it treats the symptoms rather than the root cause. Only by writing beautiful SQL can the program be invincible.

Reference blog: https://blog.csdn.net/qq_33220089/article/details/105012663