"Efficient indexing skills: classification of indexes, use of MySQL indexes, leftmost matching principle, table return query, how to avoid table return, index pushdown, and things to note when creating an index"

index

Classification of MySQL indexes

1. Ordinary index and unique index

Ordinary index: The basic index type in MySQL. It allows duplicate values and NULL values in the indexed columns.
Unique index: The values in the indexed column must be unique, but NULL values are allowed. For a combined index, the combination of column values must be unique.
Primary key index: A special unique index that does not allow NULL values.

2. Single column index and combined index

Single-column index: An index only contains a single column, and a table can have multiple single-column indexes.
Combined index: An index created on a combination of multiple fields in the table.
Multiple single-column indexes versus a combined index: when a query has several conditions, the optimizer chooses the index strategy it estimates to be cheapest. It may use only one of the single-column indexes, or it may use several of them. However, each single-column index is a separate B+ tree, which takes up disk space and makes the lookup less efficient than a single tree that covers all the conditions. Therefore, if the columns are always queried together, it is best to build a combined index.
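As a rough illustration of this trade-off, the sketch below uses a hypothetical orders table. The table, its columns, and the index names are assumptions made for the example and do not come from the original article.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE orders (
  id          INT NOT NULL AUTO_INCREMENT,
  customer_id INT NOT NULL,
  status      TINYINT NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Option A: two single-column indexes, that is, two separate B+ trees.
ALTER TABLE orders ADD INDEX idx_customer (customer_id);
ALTER TABLE orders ADD INDEX idx_status (status);

-- Option B: one combined index that matches the two-condition query directly.
ALTER TABLE orders ADD INDEX idx_customer_status (customer_id, status);

-- Run this with only Option A in place, then with only Option B,
-- and compare the key column of the two plans.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42 AND status = 1;
```

Which index the optimizer picks depends on the data and its statistics, so the plans are best checked on a table with realistic contents.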

3. Full text index

The type of the full-text index is FULLTEXT.
It supports full-text search on the values of the indexed columns and allows duplicate values and NULL values in these columns.
Full-text indexes can be created on columns of type char, varchar and text.
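A minimal sketch of how a full-text index is declared and queried. The articles table and its columns are hypothetical; MATCH ... AGAINST is the standard MySQL syntax for full-text search.

```sql
-- Hypothetical table with a full-text index over two text columns.
CREATE TABLE articles (
  id    INT NOT NULL AUTO_INCREMENT,
  title VARCHAR(200),
  body  TEXT,
  PRIMARY KEY (id),
  FULLTEXT KEY ft_title_body (title, body)
) ENGINE=InnoDB;

-- The column list in MATCH() must be the same as the column list of the full-text index.
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('index' IN NATURAL LANGUAGE MODE);
```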

4. Spatial index

Spatial indexes are indexes built on columns of spatial data types.
The single-value spatial data types in MySQL are GEOMETRY, POINT, LINESTRING and POLYGON.
MySQL extends the index syntax with the SPATIAL keyword, so a spatial index can be created with syntax similar to that of a regular index.
Columns used in a spatial index must be declared NOT NULL. Before MySQL 5.7, spatial indexes could only be created on MyISAM tables; from 5.7 on, InnoDB supports them as well.

5. Prefix index

When creating an index on columns of type char, varchar, and text, you can specify how many leading characters of the column are indexed.
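For instance, a prefix index could be added to the name column of the student table that is defined later in this article. The prefix length 20 and the index name are arbitrary choices for the sketch.

```sql
-- Index only the first 20 characters of name (20 is an arbitrary example length).
ALTER TABLE student ADD INDEX idx_name_prefix (name(20));

-- One way to judge a prefix length: compare how many distinct values the prefix
-- keeps with how many the full column has. The closer the two ratios are,
-- the less selectivity the prefix gives up.
SELECT COUNT(DISTINCT LEFT(name, 20)) / COUNT(*) AS prefix_selectivity,
       COUNT(DISTINCT name) / COUNT(*)           AS full_selectivity
FROM student;
```

A shorter prefix saves space but matches more rows per index entry, so the length is a trade-off that depends on the data.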

Use of MySQL indexes

1. Add an ordinary index with INDEX

ALTER TABLE table_name ADD INDEX index_name ( column );

2. Add a primary key index with PRIMARY KEY

ALTER TABLE table_name ADD PRIMARY KEY ( column );

3. Add a unique index with UNIQUE

ALTER TABLE table_name ADD UNIQUE ( column );

4. Add a full-text index with FULLTEXT

ALTER TABLE table_name ADD FULLTEXT ( column );

5. Add a multi-column index

ALTER TABLE table_name ADD INDEX index_name ( column1, column2, column3 );

6. Delete an index

DROP INDEX index_name ON table_name;

7. View indexes

SHOW INDEX FROM table_name;

leftmost matching principle

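Leftmost priority: any run of consecutive index columns that starts from the leftmost column can be matched. Matching stops at the first range condition (>, <, between, like): the column with the range condition can still use the index, but the index columns after it cannot be used to narrow the scan.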
When matching all columns:
student table:

CREATE TABLE `student` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `gid` int(11) NOT NULL,
  `cid` int(11) DEFAULT NULL,
  `uid` int(11) DEFAULT NULL,
  `name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uni_Gid_Cid_SId` (`gid`,`cid`,`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Conclusion: changing the order of the conditions in the query gives the same result. This is because the MySQL optimizer automatically rearranges the conditions to match the column order of the index.
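The conclusion can be checked against the student table with statements along these lines. The literal values are placeholders; the columns worth comparing in each plan are key and key_len.

```sql
-- The same three equality conditions in different orders.
-- All three can use the full unique index (gid, cid, uid).
EXPLAIN SELECT * FROM student WHERE gid = 1 AND cid = 2 AND uid = 3;
EXPLAIN SELECT * FROM student WHERE uid = 3 AND cid = 2 AND gid = 1;
EXPLAIN SELECT * FROM student WHERE cid = 2 AND uid = 3 AND gid = 1;

-- A range condition on cid: gid and cid can still narrow the scan,
-- but uid, which comes after the range column, cannot.
EXPLAIN SELECT * FROM student WHERE gid = 1 AND cid > 2 AND uid = 3;
```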

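# CREATE INDEX user_index ON user(name, sex, age)
# The composite index can be used: the conditions include the leftmost column of the index, only their order differs. At execution time they are rearranged to match the leftmost prefix. The execution plan type of the following statements is ref.
select * from user where sex = ? and age = ? and name = ?
select * from user where age = ? and name = ?

# The composite index cannot be used for a lookup, because the leftmost column is missing.
select * from user where sex = ? and age = ?
select * from user where age = ?
select * from user where sex = ?
# When the leftmost column is missing but the query selects only the columns of the composite index instead of *, the index is still used. The execution plan type is then index.
select name,sex,age from user where sex = ? and age = ?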

The leftmost prefix principle can be bypassed by an index skip scan. Here is a brief summary.
This optimization was added in MySQL 8.0 (version 8.0.13).
With an index skip scan, a joint index can be used even when the where condition does not include its first column, provided that the first column has few distinct values. For example, suppose the joint index is (b, c, d) and b has only a few distinct values. A query that filters on c without mentioning b may still use the joint index: MySQL scans the index once for each distinct value of b. So from 8.0 on, a MySQL joint index sometimes follows the leftmost prefix matching principle and sometimes does not.
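A small sketch of a query that is eligible for a skip scan. The table t_skip and the index name are hypothetical; only the column names b, c, d come from the example above. Whether the optimizer actually chooses a skip scan is a cost-based decision, so the plan should be checked on real data.

```sql
-- Hypothetical table with a joint index on (b, c, d); b is meant to have few distinct values.
CREATE TABLE t_skip (
  b TINYINT NOT NULL,
  c INT NOT NULL,
  d INT NOT NULL,
  KEY idx_b_c_d (b, c, d)
) ENGINE=InnoDB;

-- No condition on b, and only indexed columns are selected.
-- On 8.0.13 or later, if a skip scan is chosen, Extra shows "Using index for skip scan".
EXPLAIN SELECT b, c, d FROM t_skip WHERE c > 40;

-- Conceptually, a skip scan behaves like one range scan per distinct value of b:
--   ... WHERE b = 1 AND c > 40
--   ... WHERE b = 2 AND c > 40
--   and so on.
```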

Table return query

This starts with the index implementation of InnoDB. InnoDB has two major types of indexes:

  • clustered index
  • Secondary index

The difference between the InnoDB clustered index and an ordinary index:

The leaf nodes of the InnoDB clustered index store the row records. Therefore, an InnoDB table has exactly one clustered index.

  1. If the table defines a PK, the PK is the clustered index;
  2. If the table does not define a PK, the first unique index whose columns are all NOT NULL is the clustered index;
  3. Otherwise, InnoDB creates a hidden row-id as the clustered index.

The leaf nodes of an InnoDB ordinary index store the primary key value.

Both the clustered index and the ordinary index are B+ trees. In a B+ tree, all data is stored in the leaf nodes, and adjacent leaf nodes are linked by pointers.

For example, suppose there is a table:
t(id PK, name KEY, sex, flag);
id is the clustered index and name is an ordinary index.
There are four records in the table:

1, shenjian, m, A
3, zhangsan, m, A
5, lisi, m, A
9, wangwu, f, B

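The clustered index and the ordinary index are organized as follows:
Clustered index on id: the leaf nodes hold the full row records in id order (1, 3, 5, 9).
Ordinary index on name: the leaf nodes hold each name together with its primary key value, in name order: lisi -> 5, shenjian -> 1, wangwu -> 9, zhangsan -> 3.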

When a query goes through an ordinary index but needs columns that are not stored in that index, it has to search the clustered index as well.
select * from t where name='lisi'; # this statement is executed in the two steps below

  1. Search the ordinary index on name for 'lisi' and obtain the primary key value id = 5.
  2. Search the clustered index for id = 5 and read the full row record.

This is the so-called table return query: it first locates the primary key value and then locates the row record. It is slower than a query that only has to scan one index tree.
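The example table can be reproduced with a definition like the one below. The column types and the index name idx_name are assumptions, since the original only gives the column names and the sample rows.

```sql
-- Column types are assumed; the article only lists t(id PK, name KEY, sex, flag).
CREATE TABLE t (
  id   INT NOT NULL,
  name VARCHAR(20),
  sex  CHAR(1),
  flag CHAR(1),
  PRIMARY KEY (id),
  KEY idx_name (name)
) ENGINE=InnoDB;

INSERT INTO t (id, name, sex, flag) VALUES
  (1, 'shenjian', 'm', 'A'),
  (3, 'zhangsan', 'm', 'A'),
  (5, 'lisi',     'm', 'A'),
  (9, 'wangwu',   'f', 'B');

-- Needs sex and flag, which idx_name does not store:
-- one lookup in idx_name, then one lookup in the clustered index (table return).
SELECT * FROM t WHERE name = 'lisi';

-- Needs only id and name, both of which are stored in idx_name: no table return.
SELECT id, name FROM t WHERE name = 'lisi';
```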

How to avoid table return

A covering index can be used. For example, take the existing user table:

CREATE TABLE `user`
(
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name`  varchar(10) DEFAULT NULL,
  `sex`  char(3)     DEFAULT NULL,
  `address`  varchar(10) DEFAULT NULL,
  `hobby`  varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  KEY `i_name` (`name`)
) ENGINE = InnoDB;

Suppose name and sex need to be queried frequently:
select id, name, sex from user where name = 'zhangsan';
This statement is used often in the business, and the other fields of the user table are used much less than these. In this case, instead of a single-column index on name, build a joint index (name, sex). When the query is executed again, the entries found in the secondary index (name, sex) already contain every field the query needs (the leaf nodes also store the primary key id), so no table return is required.
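A sketch of the change described here, assuming the index names shown (i_name_sex is a made-up name). In the EXPLAIN output, a covered query is indicated by "Using index" in the Extra column.

```sql
-- Replace the single-column index with a joint index on (name, sex).
ALTER TABLE `user`
  DROP INDEX `i_name`,
  ADD INDEX `i_name_sex` (`name`, `sex`);

-- id, name and sex are all available in the secondary index
-- (id because secondary index leaf nodes store the primary key).
EXPLAIN SELECT id, name, sex FROM `user` WHERE name = 'zhangsan';

-- address is not in the index, so this query still needs a table return.
EXPLAIN SELECT id, name, sex, address FROM `user` WHERE name = 'zhangsan';
```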

index pushdown

This is a feature introduced in MySQL 5.6.
Suppose we need to run select * from table1 where b like '3%' and c = 3.
The index is index(b,c).
Before 5.6

  • First use the joint index to find the entries whose b starts with 3, and get their primary keys.
  • Then take each primary key to the primary key index and return to the table, where c = 3 is checked against the full row. However many entries start with 3, that is how many times the table is returned to.

From 5.6 on

  • First use the secondary index to find the entries whose b starts with 3, then filter them by c = 3 inside the index, and get the primary keys of the entries that remain.
  • Return to the table with those primary keys only.

Both versions return to the table, but before 5.6 the secondary index was not fully used to filter data. If many entries start with 3, the table has to be returned to again and again. From 5.6 on, the subsequent index fields are also used for filtering. That is why index pushdown goes together with joint indexes: it makes full use of the fields of the joint index for filtering and minimizes the data that needs a table return, which improves query efficiency. The idea is quite simple.
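The effect can be observed with EXPLAIN by switching index condition pushdown on and off for the session. The definition of table1 below is hypothetical apart from the columns b, c and the index on (b, c).

```sql
-- Hypothetical definition; only b, c and index(b, c) come from the text above.
CREATE TABLE table1 (
  id INT NOT NULL AUTO_INCREMENT,
  a  INT,
  b  VARCHAR(20),
  c  INT,
  PRIMARY KEY (id),
  KEY idx_b_c (b, c)
) ENGINE=InnoDB;

-- With index pushdown enabled (the default), Extra is expected to show
-- "Using index condition" when idx_b_c is chosen.
EXPLAIN SELECT * FROM table1 WHERE b LIKE '3%' AND c = 3;

-- Turn the optimization off for the current session and compare the plan.
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM table1 WHERE b LIKE '3%' AND c = 3;

-- Restore the default.
SET optimizer_switch = 'index_condition_pushdown=on';
```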

How to avoid index failure

  • When using combined indexes, follow the leftmost matching principle
  • Do not perform any operations on index columns, such as calculations, functions, and type conversions
  • Try to use covering indexes
  • Try not to use not equal to (!= / <>) conditions, fuzzy queries that start with a wildcard (like '%abc'), or OR to connect conditions on index columns.
  • Put single quotes around string values (without them, an implicit conversion of the index column may occur, which makes the index fail)
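A few of these rules, shown as before-and-after pairs on the user table from the previous section. The sketch assumes that name is a string column with an index on it; the literal values are placeholders.

```sql
-- Operation on the indexed column: the primary key cannot be used for a lookup.
SELECT * FROM `user` WHERE id + 1 = 10;
-- Rewritten so that the column stands alone.
SELECT * FROM `user` WHERE id = 9;

-- Leading wildcard: the index on name cannot narrow the scan.
SELECT * FROM `user` WHERE name LIKE '%san';
-- Prefix pattern: the index on name can be used as a range.
SELECT * FROM `user` WHERE name LIKE 'zhang%';

-- Unquoted number compared with a string column: implicit conversion of the column.
SELECT * FROM `user` WHERE name = 123;
-- Quoted literal: no conversion, the index on name can be used.
SELECT * FROM `user` WHERE name = '123';
```

Running each statement under EXPLAIN shows the difference in the type and key columns.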

What each column of explain means


(1) id : reflects the order in which tables are read or select clauses are executed in the query.
① If the ids are the same, the execution order is from top to bottom.
② If the ids are different (for example, with subqueries the id number increases), the larger the id value, the higher the priority and the earlier it is executed.
③ If some ids are the same and others differ, rows with the same id form a group and are executed from top to bottom; among the groups, the larger the id value, the higher the priority and the earlier it is executed.

(2) select_type : Indicates the type of select, mainly used to distinguish ordinary queries, union queries, subqueries, and other complex queries.
① SIMPLE: A simple select query that contains no subqueries or unions.
② PRIMARY: If the query contains any complex subparts, the outermost query is marked as PRIMARY.
③ SUBQUERY: A subquery in the select list or the where clause.
④ DERIVED: A subquery in the from list. MySQL executes these subqueries recursively and puts the results in a temporary table.
⑤ UNION: The second and later selects in a union are marked as UNION; if the union is inside a subquery of the from clause, the first select of that union is marked as DERIVED.
⑥ UNION RESULT: The select that reads the result of the union.

(3) table : Displays the name of the table accessed in this step, that is, which table this row of the plan refers to. Sometimes it is not a real table name but a label for the result of earlier steps, such as <derivedN> or <unionM,N>.

(4) type : The access method, that is, the way MySQL finds the required rows in the table, also known as the "access type". Common access types are ALL, index, range, ref, eq_ref, const, system, and NULL (from left to right, performance goes from worst to best).
① ALL: Full table scan. MySQL traverses the entire table to find matching rows.
② index: Full index scan. The difference from ALL is that only the index tree is traversed.
③ range: Index range scan. An index is used to retrieve only the rows in a given range. It generally appears when the where clause contains between, <, >, in, and so on. A range scan is better than a full scan because it starts at one point in the index and ends at another, without scanning the entire index.
④ ref: Non-unique index scan. It returns all rows that match a single value. It is essentially an index lookup, but since several rows may match, it is a mixture of lookup and scan.
⑤ eq_ref: Similar to ref, except that the index is unique: for each index key, only one record in the table matches. It is common when a primary key or unique key is used as the join condition in a multi-table join.
⑥ const, system: Used when MySQL can optimize part of the query into a constant. If the condition compares a primary key or unique index with a constant, the row is found with a single index lookup. system is a special case of const, used when the table has only one row.
⑦ NULL: MySQL resolves the statement during optimization and does not need to access the table or index during execution. For example, selecting the minimum value of an indexed column can be done with a single index lookup.

(5) possible_keys : Indicates which index MySQL can use to find rows in the table. If there is an index on the field involved in the query, the index will be listed, but it will not necessarily be used by the query.

(6) key : Displays the index that MySQL actually decides to use. If no index is chosen, it shows NULL. To force MySQL to use or ignore an index listed in possible_keys, use FORCE INDEX or IGNORE INDEX in the query. If the query uses a covering index (the selected fields are all contained in the index), the index may appear only in key and not in possible_keys.
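As a short illustration, the two hints can be written as follows against the user table and its i_name index from the earlier section.

```sql
-- Ask the optimizer to use i_name if it can be used for this query at all.
EXPLAIN SELECT * FROM `user` FORCE INDEX (`i_name`) WHERE name = 'zhangsan';

-- Ask the optimizer not to consider i_name.
EXPLAIN SELECT * FROM `user` IGNORE INDEX (`i_name`) WHERE name = 'zhangsan';
```

Comparing the key and rows columns of the two plans shows what the hint changed.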

(7) key_len : Displays the number of bytes used in the index.
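A worked example may make key_len easier to read. The sketch assumes the usual MySQL accounting of 4 bytes for an int column plus 1 extra byte when the column is nullable, applied to the index uni_Gid_Cid_SId (gid, cid, uid) of the student table. The values in the comments are what this arithmetic predicts, provided the optimizer picks that index.

```sql
-- Bytes per column of uni_Gid_Cid_SId:
--   gid int NOT NULL      -> 4
--   cid int DEFAULT NULL  -> 4 + 1 = 5
--   uid int DEFAULT NULL  -> 4 + 1 = 5

-- Only gid is used: predicted key_len = 4
EXPLAIN SELECT * FROM student WHERE gid = 1;

-- gid and cid are used: predicted key_len = 4 + 5 = 9
EXPLAIN SELECT * FROM student WHERE gid = 1 AND cid = 2;

-- All three columns are used: predicted key_len = 4 + 5 + 5 = 14
EXPLAIN SELECT * FROM student WHERE gid = 1 AND cid = 2 AND uid = 3;
```

Read this way, key_len tells how many leading columns of a joint index a query actually uses.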

(8) ref : Shows which columns or constants are compared with the index named in key to look up rows.

(9) rows : Displays MySQL's estimate of the number of rows to be read to find the required records based on table statistics and index selection.

(10) Extra : This column contains instructions and descriptions of how MySQL solves the query, including extra information that is not suitable for display in other columns but is very important to the execution plan.
① Using where: The MySQL server filters the rows after the storage engine has retrieved them, because the where conditions could not all be applied through the index.
② Using temporary: MySQL needs a temporary table to hold intermediate results. This is common with sorting (order by) and grouping (group by).
③ Using filesort: The query contains an order by that cannot be satisfied by the index order, so MySQL performs an extra sort, known as "file sorting". An index keeps its entries sorted, so Using filesort generally means that the order by columns do not match a usable index. It is best to optimize such queries.
④ Using join buffer: The join buffer is used, which means the join could not use an index on the joined table. Depending on the query, adding an index on the join column usually helps; if many such joins are unavoidable, the join buffer size (join_buffer_size) in the configuration can be increased.
⑤ Using index: Only the information in the index tree is used, without reading the actual rows of the table. The select uses a covering index and avoids accessing the data rows, which is very efficient. Covering index: all the selected columns can be obtained from the index without reading the data rows, that is, the selected columns are a subset of the index columns. To use a covering index, select only the columns that are needed and do not use select *. On the other hand, if all fields are put into one index, the index file becomes too large and performance decreases.
⑥ Using index condition: Indicates that index condition pushdown (ICP) has been applied.

First, look at the type column. If ALL appears, the whole table is scanned and no index is used. Then look at the key column. If it is NULL, no index is used. Then look at the rows column. The larger the value, the more rows need to be scanned and the longer the response takes. Finally, look at the Extra column and try to avoid Using filesort and Using temporary, which greatly affect performance.
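This reading order can be practised on the student table defined earlier. The two statements below are a sketch; the comments describe what is likely to appear in Extra, not guaranteed output.

```sql
-- name is not part of any index, so the rows found through gid must be sorted afterwards.
-- Extra is likely to contain "Using filesort".
EXPLAIN SELECT * FROM student WHERE gid = 1 ORDER BY name;

-- cid is the next column of (gid, cid, uid) after the equality on gid,
-- so the index already delivers the rows in the requested order.
EXPLAIN SELECT * FROM student WHERE gid = 1 ORDER BY cid;
```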

Things to note when creating an index

1. Limit the number of indexes on the table. For a table with a large number of update operations, the number of indexes should generally not exceed 3, and should be no more than 5 at most. Although indexes improve access speed, too many indexes slow down data update operations.
2. Avoid building indexes on fields whose values grow in one direction (for example, date type fields); for compound indexes, avoid placing this type of field at the front. Since the value of such a field always grows in one direction, new records are always stored in the last leaf page of the index, which continuously causes access contention on that leaf page, allocation of new leaf pages, and splitting of intermediate branch pages. In addition, if the index is a clustered index, the data in the table is stored in index order, and all insert operations are concentrated on the last data page, causing insertion "hot spots".
3. For composite indexes, order the fields by how frequently they appear in query conditions. In a composite index, records are sorted by the first field first. Records with the same value in the first field are sorted by the second field, and so on. Therefore, the index can be used only if the first field of the composite index appears in the query condition. Placing frequently used fields at the front of the composite index lets the system use the index as much as possible.
4. Delete indexes that are no longer used or are rarely used. After the data in the table has been heavily updated, or the way the data is used has changed, some of the original indexes may no longer be needed. Database administrators should regularly identify these indexes and delete them to reduce their impact on update operations.
5. Do not create indexes on columns that are rarely used or referenced in queries. Since these columns are rarely used, indexing them does not improve query speed. On the contrary, the additional indexes slow down maintenance of the system and increase space requirements.
6. Do not create indexes on fields with a large number of identical values. These columns have very few distinct values, such as the gender column of a personnel table, so the rows in a result set account for a large proportion of the rows in the table, that is, a large proportion of the table has to be searched anyway. Adding an index will not significantly speed up retrieval.
7. Do not add indexes to columns defined as text, image and bit data types. The amount of data in these columns is either quite large or has very few distinct values.
8. When modification performance matters much more than retrieval performance, do not create indexes. Modification performance and retrieval performance pull against each other. Adding indexes improves retrieval performance but reduces modification performance. Removing indexes improves modification performance but reduces retrieval performance.

Origin blog.csdn.net/qq_45442178/article/details/129711297