MySQL Tuning Series (5) - Detailed Explanation of Indexes

1. Definition of index

A database index is like the table of contents at the front of a book: it speeds up lookups.
An index is a data structure that helps MySQL quickly locate the rows it needs within a large amount of data.
Note: in most cases an index lookup is faster than a full table scan, but when a table holds little data, an index may not bring a noticeable improvement.

Let's look back at the previous article: MySQL Tuning Series (4) - Execution Plan.
It contains two passages worth revisiting:
insert image description here
Pay special attention to them, because the execution plan shows how our indexes are used.

2. Advantages and disadvantages of indexes

Advantages:
1. Faster retrieval (less data has to be scanned), which follows from the B+ tree structure of the index.
2. Avoids sorting (for ORDER BY, the index is already sorted) and temporary tables (tables that hold intermediate query results).
3. Turns random IO into sequential IO.
4. A unique index guarantees the uniqueness of each row of data in the table.
Disadvantages:
1. Creating and maintaining indexes takes time. When rows in a table are inserted, deleted, or updated, every index on the affected columns must be updated as well, which slows down those statements.
2. Indexes are stored in physical files, so they consume additional space.

3. Classification of indexes

Before looking at the classification, first recall the data structures behind indexes. For details, refer to: "Tree" of Data Structure - Binary Tree, Red-Black Tree, B Tree, B+ Tree, B* Tree.
insert image description here
Three MySQL storage engines are commonly discussed. InnoDB and MyISAM both use B+ tree indexes (with differences that are covered below), while the Memory engine uses a hash index by default. For the hash table, please refer to: HashMap source code analysis ( jdk1 .
Indexes can be classified along different dimensions.
By usage:
1. Primary key index: the index on the primary key. The database creates it on the primary key column automatically.
2. Unique index: an index on a column whose values must be unique.
3. Ordinary index: an index on a column that is neither the primary key nor unique. Like every non-primary-key index in InnoDB, it is a secondary (auxiliary) index.
4. Composite index: an index on multiple columns.
5. Full-text index: tokenizes the text content and searches it. Only CHAR, VARCHAR, and TEXT columns support full-text indexes. It is rarely used because of its low efficiency; a search engine such as es (Elasticsearch) is usually used instead.
6. Covering index: an index that contains (covers) the values of all the fields a query needs.
By how the underlying storage engine stores data:
1. Clustered index: the index structure and the data are stored together. The primary key index in InnoDB is a clustered index.
2. Non-clustered index: the index structure and the data are stored separately. Secondary (auxiliary) indexes are non-clustered indexes. MySQL's MyISAM engine uses non-clustered indexes for both primary key and non-primary-key indexes.
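As a rough illustration of the categories listed above, the following sketch declares one index of each kind. The table idx_demo and all of its column and index names are hypothetical and do not come from the article.

```sql
-- Hypothetical table used only for illustration; names and types are assumptions.
CREATE TABLE idx_demo (
    id    INT NOT NULL AUTO_INCREMENT,
    email VARCHAR(64) NOT NULL,
    name  VARCHAR(32) NOT NULL,
    city  VARCHAR(32) NOT NULL,
    bio   TEXT,
    PRIMARY KEY (id),                 -- primary key index (the clustered index in InnoDB)
    UNIQUE KEY uk_email (email),      -- unique index
    KEY idx_name (name),              -- ordinary (secondary) index
    KEY idx_city_name (city, name),   -- composite index
    FULLTEXT KEY ft_bio (bio)         -- full-text index
) ENGINE = InnoDB;

-- Covering index: idx_city_name contains every column this query reads,
-- so the query can be answered from the index alone.
SELECT city, name FROM idx_demo WHERE city = 'x';
```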
We will not go into detail yet. First, let's look at how queries match an index.

4. Index matching methods - composite index - leftmost match

Create a table first, as shown in the figure:
insert image description here
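
The table definition was only shown as an image, so the sketch below is a guess rather than the author's schema. The column types are assumptions chosen to be consistent with the queries used later (name and area compared with strings, qq compared with numbers), and the sample rows are made up.

```sql
-- Assumed definition of suoyin_test; the original schema is not available in the text.
CREATE TABLE suoyin_test (
    id   INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(32),
    area VARCHAR(32),
    qq   INT,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- A few made-up rows so that EXPLAIN has something to work with.
INSERT INTO suoyin_test (name, area, qq) VALUES
    ('1',  '23', 11),
    ('A1', '23', 23),
    ('B2', '24', 35);
```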

Then create a composite index. Prefer columns with relatively short values, since they take up less space:

ALTER table suoyin_test add INDEX zuhe(name,area,qq);

insert image description here

1. Full value match

Full-value matching refers to matching against all columns in the index. Use as many indexed columns as possible.

explain select * from suoyin_test where name = '1' and area = '23' and qq = 'dev';

insert image description here
As can be seen from the figure:
(1) type is ref: a non-unique index is used to look up rows.
(2) possible_keys and key are both zuhe, the index currently in use.
(3) ref is const,const,const: all three columns of the index are used, and each one is compared with a constant.

2. Match the leftmost prefix

Only match the first few columns.

explain select * from suoyin_test where name = '1' and area = '23';

insert image description here

3. Match column prefix

Match the beginning of a column's value, similar to a fuzzy (LIKE) query.

explain select * from suoyin_test where name = '1';
explain select * from suoyin_test where name  like 'A%';
explain select * from suoyin_test where name  like '%A%';

insert image description here
insert image description here
insert image description here
As can be seen from the above, the type gets progressively worse across the three queries.
A column-prefix match (LIKE 'A%') reaches the range level and avoids scanning the whole index. A pattern with a leading wildcard (LIKE '%A%') drops to the ALL level: a full table scan is performed and the index is not used at all.
So here is a small optimization tip: avoid leading wildcards in LIKE patterns whenever possible.

4. Match a range of values

explain select * from suoyin_test where name > '1';

insert image description here

5. Exactly match a column and range match another column

Match the first column exactly and the second column by range.

explain select * from suoyin_test where name = '1' and area > '23';

insert image description here

6. Match by index order – detailed explanation of the leftmost match

The first column must be matched before the second column can be; the second column cannot be matched directly on its own.
In other words, the index is used in the order name, area, qq. For example:

explain select * from suoyin_test where name = '1' and qq > '23';

The result is:
insert image description here
Only the leftmost column name is matched; qq is not used, because area is skipped.
However, the order in which the conditions are written in the WHERE clause does not matter. If all three columns are present, MySQL reorders them to match the index automatically.

explain select * from suoyin_test where  qq = 11  and area = '23' and name = '1' ;

insert image description here

explain select * from suoyin_test where  qq = 11  and area = '23';

insert image description here

When using a composite index, MySQL matches the query conditions from left to right in the order of the index columns. If the conditions include the leftmost column of the index, MySQL uses it to filter a batch of rows and then continues with the next column, until all columns of the index are matched or a range condition (such as > or <) is encountered, at which point matching stops. For >=, <=, BETWEEN, and LIKE prefix matches, matching does not stop. Therefore, when designing a composite index, place highly selective columns on the far left so that more rows are filtered out early.
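
The rule above can be tried out with a handful of EXPLAIN statements. This is a sketch that assumes the zuhe(name, area, qq) index and a numeric qq column; the comments describe what the leftmost-match rule predicts, and the optimizer may still decide differently depending on table size and statistics.

```sql
-- Predicted: name, area and qq are all used.
EXPLAIN SELECT * FROM suoyin_test WHERE name = '1' AND area = '23' AND qq = 11;

-- Predicted: name and area are used (a leftmost prefix of the index).
EXPLAIN SELECT * FROM suoyin_test WHERE name = '1' AND area = '23';

-- Predicted: only name is used. area is skipped, so qq cannot narrow the index range.
EXPLAIN SELECT * FROM suoyin_test WHERE name = '1' AND qq = 11;

-- Predicted: name (equality) and area (range) are used; qq comes after the range column.
EXPLAIN SELECT * FROM suoyin_test WHERE name = '1' AND area > '23' AND qq = 11;

-- No condition on name, so the rule predicts no index lookup on zuhe.
EXPLAIN SELECT * FROM suoyin_test WHERE area = '23' AND qq = 11;
```

A convenient way to compare the results is the key_len column of the EXPLAIN output, which grows as more columns of the composite index take part in the lookup.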

7. Queries that only access the index

The query only needs to read the index and never touches the data rows. This is exactly what a covering index provides.

explain select name,area,qq from suoyin_test where name = '1' and area = '23' and qq = 'dev';

5. Detailed Explanation of Typical Indexes

1. Clustered index and non-clustered index

Both the clustered index and the non-clustered index are B+ trees, which are ordered; they differ in what the leaf nodes store.

(1) Clustered index

A clustered index is a way of storing the index structure and the data together, not a separate index type. The primary key index in InnoDB is a clustered index.
The data structure is shown in the figure:
insert image description here
Advantages and disadvantages: lookups are extremely fast, but updates are expensive.
If the value of an indexed column is modified, the index must be modified too. Because the leaf nodes of the clustered index also store the row data, such a change is relatively expensive, so primary key values should generally not be modified.

(2) Non-clustered index

A non-clustered index is a way of storing the index structure and the data separately, not a separate index type. Secondary (auxiliary) indexes are non-clustered indexes. MySQL's MyISAM engine uses non-clustered indexes for both primary key and non-primary-key indexes.
The leaf nodes of a non-clustered index do not necessarily store pointers to the data. In InnoDB, the leaf nodes of a secondary index store the primary key value, and the row is then looked up in the table by that primary key.

The composite index is as follows:
insert image description here
Advantages and disadvantages: lookups are slower (they may trigger a trip back to the table), but updates are cheaper.

(3) Back to the table

Going back to the table means first locating the matching entries through a secondary index, and then using the primary key id stored in those entries to fetch the columns that the index does not provide. In other words, a query on a non-primary-key index has to search one more index tree (the primary key index tree).
There is one case that does not go back to the table: a covering index.

2. Covering index

If an index contains (covers) the values of all the fields a query needs, we call it a covering index. In that case there is no need to search a second index tree.

explain select name,area,qq from suoyin_test where name = '1' and area = '23' and qq = 'dev';

insert image description here
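
Because an InnoDB secondary index stores the primary key in its leaf entries, a query that also selects id is still covered. The sketch below assumes that suoyin_test is an InnoDB table with primary key id and the zuhe(name, area, qq) index; the remark column is hypothetical and is added only to show a query that is not covered.

```sql
-- Covered: id is stored in every zuhe leaf entry, next to name, area and qq.
EXPLAIN SELECT id, name, area, qq FROM suoyin_test WHERE name = '1';

-- Not covered: remark is a hypothetical column that lives outside the index,
-- so each match requires a trip back to the table.
ALTER TABLE suoyin_test ADD COLUMN remark VARCHAR(64);
EXPLAIN SELECT id, name, remark FROM suoyin_test WHERE name = '1';

-- Remove the illustrative column again.
ALTER TABLE suoyin_test DROP COLUMN remark;
```

When the index covers the query, the Extra column of the EXPLAIN output reports Using index.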

3. Index push down

While traversing a non-clustered index, MySQL first evaluates the conditions on the fields contained in the index and filters out records that do not qualify, which reduces the number of trips back to the table. That is rather abstract, so let's start from the architecture of the database.
From MySQL Tuning Series (1) - Performance Monitoring, we know that the database architecture is client, server layer, storage engine layer. Index pushdown means that the data filtering is moved down to the storage engine layer instead of being done at the server layer.
For example:

explain select * from suoyin_test where name = '1' and area = '23';

Without index pushdown:
1. Fetch the rows matching name from the storage engine layer and load them into the server layer;
2. Filter on the area field in the server layer.
With index pushdown:
1. The storage engine filters on both name and area and returns only the qualifying rows, so no filtering is left for the server layer.
Index pushdown improves overall query efficiency. It is available since MySQL 5.6 and is enabled by default, with no configuration required.
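
To observe the effect, index condition pushdown can be switched off and on for the current session through the optimizer_switch variable. The query below is a variation chosen for illustration, assuming the zuhe(name, area, qq) index: the LIKE range on name ends the leftmost match, so area = '23' is a condition that can be checked inside the index.

```sql
-- With pushdown enabled (the default), the area condition can be checked in the index.
EXPLAIN SELECT * FROM suoyin_test WHERE name LIKE 'A%' AND area = '23';

-- Turn index condition pushdown off for this session and compare the Extra column.
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM suoyin_test WHERE name LIKE 'A%' AND area = '23';

-- Restore the default.
SET optimizer_switch = 'index_condition_pushdown=on';
```

When pushdown is applied, Extra shows Using index condition. Pushdown only matters when full rows have to be read, so on a table whose columns are all contained in the index you may see Using index instead.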

6. Index optimization details

1. Try not to use expressions

We first add an index to qq, and then query:

explain select * from suoyin_test where  qq  = 23;

insert image description here

explain select * from suoyin_test where  qq+1 = 23;

insert image description here
It can be seen that the type becomes ALL when the indexed column is wrapped in an expression: the index is not used and query efficiency drops.
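
The usual fix is to move the arithmetic to the constant side so that the column stands alone. The sketch below assumes a numeric qq column; idx_qq is a hypothetical name for the single-column index on qq mentioned above.

```sql
-- Hypothetical name for the index on qq.
ALTER TABLE suoyin_test ADD INDEX idx_qq (qq);

-- The expression wraps the column, so idx_qq cannot be used for a lookup.
EXPLAIN SELECT * FROM suoyin_test WHERE qq + 1 = 23;

-- Equivalent condition with the arithmetic moved to the constant side.
EXPLAIN SELECT * FROM suoyin_test WHERE qq = 23 - 1;
```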
2. Prefer queries on the primary key, which avoids going back to the table.
3. Use prefix indexes where appropriate

Sometimes you need to index a very long string, which makes the index large and slow. You can usually index only the leading part of the column, which saves a lot of index space and improves index efficiency, at the cost of lower index selectivity. Index selectivity is the ratio of the number of distinct indexed values to the total number of rows in the table (#T), and ranges from 1/#T to 1. The higher the selectivity, the more efficient the query, because a more selective index lets MySQL filter out more rows during the lookup.
A column prefix is generally selective enough for good query performance. For BLOB and TEXT columns, and for very long VARCHAR columns, a prefix index is required, because MySQL does not allow these columns to be indexed at their full length. The trick is to choose a prefix that is long enough to keep selectivity high, but no longer than necessary.
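
One way to pick a prefix length is to compare the selectivity of several prefixes with that of the full column. The following is a minimal sketch on the name column; the prefix lengths, the index name idx_name_prefix, and the final choice of 4 are illustrative assumptions, not values from the article.

```sql
-- Selectivity of the full column versus prefixes of increasing length.
-- The closer a prefix gets to full_sel, the less is lost by indexing only the prefix.
SELECT
    COUNT(DISTINCT name) / COUNT(*)          AS full_sel,
    COUNT(DISTINCT LEFT(name, 2)) / COUNT(*) AS sel_2,
    COUNT(DISTINCT LEFT(name, 4)) / COUNT(*) AS sel_4,
    COUNT(DISTINCT LEFT(name, 8)) / COUNT(*) AS sel_8
FROM suoyin_test;

-- Create a prefix index on the first 4 characters of name.
ALTER TABLE suoyin_test ADD INDEX idx_name_prefix (name(4));
```

Keep in mind that a prefix index holds only part of each value, so it cannot serve as a covering index and is not usable for ORDER BY on that column.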

4. Use index scanning for sorting
In a composite index, columns are sorted in ascending order by default, so an ORDER BY on the composite index columns must be all ascending or all descending for the index to be used.
If the type column in the explain output is index, MySQL is scanning the index to produce the sorted order.
extra: using filesort means that MySQL cannot use an index for sorting and has to run a sort algorithm instead, which costs extra resources.
extra: null means that the index order is used for sorting.

explain select * from suoyin_test  ORDER BY qq desc;
explain select * from suoyin_test  ORDER BY id;

insert image description here
insert image description here
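
The all-ascending or all-descending rule for composite indexes can be checked with the sketch below, which assumes the zuhe(name, area, qq) index. The comments state what is expected rather than guaranteed, since the optimizer may choose another plan on a small table.

```sql
-- Sort columns continue the index after the equality on name, all ascending:
-- the index order can be used.
EXPLAIN SELECT * FROM suoyin_test WHERE name = '1' ORDER BY area, qq;

-- All descending: the same index can be read backwards.
EXPLAIN SELECT * FROM suoyin_test WHERE name = '1' ORDER BY area DESC, qq DESC;

-- Mixed directions: the index order does not match, so expect Using filesort.
EXPLAIN SELECT * FROM suoyin_test WHERE name = '1' ORDER BY area ASC, qq DESC;
```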
5. UNION ALL, IN, and OR can all use indexes, but IN is recommended.
6. A range condition can use the index, but the columns after the range column in the index cannot, and at most one range column can use the index. Range conditions include <, >, <=, >=, and BETWEEN.
7. Implicit conversion can cause index failure

When the column on the left side of the comparison operator in the WHERE clause is a numeric type, an implicit conversion has little impact on efficiency, but it is still not recommended.
When the column on the left side is a character type, an implicit conversion makes the index unusable and results in a very inefficient full table scan.


explain select * from suoyin_test where  name = 23;
explain select * from suoyin_test where  name = '23';

8. Be careful about indexing frequently updated fields.
Maintaining an index is not cheap. If a field is rarely queried but frequently modified, it should not be indexed.
9. Choose appropriate fields to index.
Good candidates are fields that are NOT NULL, fields that are frequently queried, fields that are frequently sorted on, and fields that are frequently used in joins.
10. When tables must be joined, it is best not to join more than three. Fields that are frequently used in joins are especially suitable for indexing.
The three join algorithms are as follows (source: https://blog.csdn.net/main_Scanner01/article/details/123786007):
(1) Simple Nested-Loop Join
insert image description here
A brute-force join: each row of the driving table A is compared with every row of the driven table B. It is very inefficient and takes up a lot of memory.
(2) Index Nested-Loop Join
insert image description here
The join condition of the outer table is matched directly against an index on the inner table, which avoids comparing against every row of the inner table and greatly reduces the number of comparisons. This requires an index on the inner table, with the join condition on the indexed field: table A is scanned, and table B is then looked up through the index. If the driven table is indexed, this is very efficient. If that index is not the primary key index, however, a trip back to the table is still needed, so a primary key index on the driven table is more efficient.
(3) Block Nested-Loop Join
insert image description here
Rows of the driving table are no longer fetched one by one but block by block. A join buffer is introduced: the join-related columns of the driving table are cached in it (as many as the join buffer size allows), and then the driven table is scanned in full.
11. Use LIMIT whenever you can.
12. Limit the number of indexes on each table (preferably no more than 5).
13. Avoid redundancy in composite indexes (for example, the indexes (name, qq) and (name) are redundant, because (name) is a leftmost prefix of (name, qq)).
14. More indexes are not always better, and neither is premature optimization (it is not too late to optimize once business volume grows and a bottleneck appears).
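
For item 10, the effect of indexing a join column can be explored with a second table. The sketch below is an assumption made purely for illustration: orders_demo and its columns are hypothetical, and the optimizer decides on its own which table drives the join.

```sql
-- Hypothetical second table, introduced only to illustrate indexing a join column.
CREATE TABLE orders_demo (
    id      INT NOT NULL AUTO_INCREMENT,
    user_id INT NOT NULL,            -- refers to suoyin_test.id
    amount  DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_user_id (user_id)        -- index on the join column
) ENGINE = InnoDB;

-- With idx_user_id in place, rows of orders_demo can be matched through the index
-- (Index Nested-Loop Join) instead of scanning orders_demo in full for every row.
EXPLAIN
SELECT t.name, o.amount
FROM suoyin_test AS t
JOIN orders_demo AS o ON o.user_id = t.id
WHERE t.name = '1';
```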

7. Index monitoring

show status like 'Handler_read%';

insert image description here

Handler_read_first: the number of times the first entry of an index was read. A high value indicates that the server is performing many full index scans.
Handler_read_key: the number of times a row was read through an index. A high value is a good indication that the tables are properly indexed for the queries being executed.
Handler_read_last: the number of times the last entry of an index was read.
Handler_read_next: the number of times the next row was read in index order.
Handler_read_prev: the number of times the previous row was read in index order.
Handler_read_rnd: the number of times a row was read from a fixed position.
Handler_read_rnd_next: the number of times the next row was read from the data file. A high value indicates that many full table scans are being performed, which usually means that the tables lack appropriate indexes or that the queries do not take advantage of the existing ones.
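
To see how a single query moves these counters, one approach is to reset the session counters, run the query, and read the counters back. This is a sketch that assumes the account is allowed to run FLUSH STATUS (it normally requires the RELOAD privilege).

```sql
-- Reset the status counters of the current session.
FLUSH STATUS;

-- Run the query to be examined.
SELECT * FROM suoyin_test WHERE name = '1';

-- Read the counters for this session only.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

An index lookup should mainly raise Handler_read_key and Handler_read_next, while a full table scan shows up in Handler_read_rnd_next.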

Origin blog.csdn.net/liwangcuihua/article/details/130719864