Difficult parts of the index

Index introduction

An index is a data structure used to quickly query and retrieve data, and its essence can be regarded as a sorted data structure.

An index works like the table of contents of a book. For example, when we look up a word in a dictionary that has no table of contents, we can only search page by page, which is very slow. With a table of contents, we only need to find the position of the word there and then turn directly to that page.

The underlying data structure of an index can take many forms. Common index structures include the B-tree, B+ tree, hash, and red-black tree. In MySQL, both InnoDB and MyISAM use the B+ tree as the index structure.

Advantages and disadvantages of indexing

Advantages :

  • Using indexes can greatly speed up data retrieval (significantly reduce the amount of data retrieved), which is the main reason for creating indexes.
  • By creating a unique index, the uniqueness of each row of data in the database table can be guaranteed.

Disadvantages :

  • Creating and maintaining indexes takes a lot of time. When rows in a table are added, deleted, or modified, any index on the affected data must also be updated, which reduces the efficiency of SQL execution.
  • Indexes need to be stored in physical files and consume a certain amount of space.

However, does using an index necessarily improve query performance?

Index queries are faster than full table scans in most cases. But if the amount of data in the database is not large, using an index may not necessarily bring a significant improvement.

B-Tree & B+Tree

The B-tree (also written B-Tree) is a multi-way balanced search tree, and the B+ tree is a variant of the B-tree. The B in both names stands for Balanced.

At present, most database systems and file systems use B-Tree or its variant B+Tree as the index structure.

What are the similarities and differences between B tree & B+ tree?

  • All nodes of the B-tree store both keys and data, while only leaf nodes of the B+ tree store keys and data, and other internal nodes only store keys.
  • The leaf nodes of the B-tree are all independent; each leaf node of the B+ tree holds a reference to its adjacent leaf node, so the leaves form a chain.
  • A B-tree search is equivalent to a binary search over the keys of each node along the path, and it may end before reaching a leaf node. The search efficiency of the B+ tree is very stable: every search goes from the root node to a leaf node, and the leaf nodes can easily be read sequentially.
  • When performing a range query in the B-tree, first find the lower limit to be searched, and then perform an in-order traversal on the B-tree until the upper limit of the search is found; while in the range query of the B+ tree, you only need to traverse the linked list.

In summary, compared with B-tree, B+ tree has the advantages of fewer IO times, more stable query efficiency and more suitable for range query.
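The range query described above can be sketched with a toy model. This is a simplified illustration, not how MySQL implements it: the `Leaf` class, `build_leaves`, and `range_query` are hypothetical names, and a binary search over the first key of each leaf stands in for the descent from the root.

```python
from bisect import bisect_right


class Leaf:
    """A toy B+ tree leaf: sorted keys plus a link to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def build_leaves(sorted_keys, capacity):
    """Split sorted keys into linked leaves holding at most `capacity` keys."""
    leaves = [Leaf(sorted_keys[i:i + capacity])
              for i in range(0, len(sorted_keys), capacity)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_query(leaves, low, high):
    """Return all keys in [low, high]: locate one leaf, then follow the links."""
    if not leaves:
        return []
    # Stand-in for the root-to-leaf descent: find the leaf that may contain `low`.
    first_keys = [leaf.keys[0] for leaf in leaves]
    leaf = leaves[max(bisect_right(first_keys, low) - 1, 0)]
    result = []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # passed the upper limit, stop scanning
            if key >= low:
                result.append(key)
        leaf = leaf.next  # move along the leaf chain, never back up the tree
    return result


leaves = build_leaves(list(range(1, 21)), capacity=4)
print(range_query(leaves, 6, 13))  # [6, 7, 8, 9, 10, 11, 12, 13]
```

Once the starting leaf is found, the scan only moves sideways along the chain, which is the property that makes the B+ tree convenient for range queries.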

In MySQL, both the MyISAM engine and the InnoDB engine use B+Tree as the index structure, but the implementation methods of the two are different. (The following content is compiled from "The Way of Java Engineers' Practice")

In the MyISAM engine, the data field of a B+Tree leaf node stores the address of the data record. When searching the index, MyISAM first searches according to the B+Tree search algorithm. If the specified key exists, it takes the value of the data field and uses that value as the address to read the corresponding data record. This is called a "non-clustered index".

In the InnoDB engine, the data file is itself an index file. Unlike MyISAM, where the index file and data file are separate, the InnoDB table data file is an index structure organized as a B+Tree, and the data field of each leaf node stores the complete data record. The key of this index is the primary key of the table, so the InnoDB table data file itself is the primary index. This is called a "clustered index", and all other indexes are auxiliary (secondary) indexes. The data field of an auxiliary index stores the value of the corresponding primary key instead of an address, which also differs from MyISAM. When searching by the primary index, finding the node that holds the key yields the data directly; when searching by an auxiliary index, you first retrieve the primary key value and then search the primary index again. Therefore, when designing tables, it is not recommended to use overly long fields as primary keys, nor to use non-monotonic fields as primary keys, because that causes frequent splitting of the primary index.
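The two lookup paths can be contrasted with a toy model. This is only a sketch under simplifying assumptions: Python dicts stand in for B+ trees, a list stands in for the MyISAM data file, and all names and sample rows are hypothetical.

```python
# MyISAM-style: index leaves hold the address (here, a list offset) of the row.
myisam_data_file = [
    {"id": 1, "name": "Ann", "score": 90},
    {"id": 2, "name": "Bob", "score": 75},
]
myisam_primary_index = {1: 0, 2: 1}        # id   -> row address
myisam_name_index = {"Ann": 0, "Bob": 1}   # name -> row address

# InnoDB-style: primary index leaves hold the full row (clustered index);
# an auxiliary index leaf holds the primary key value, not an address.
innodb_clustered_index = {
    1: {"id": 1, "name": "Ann", "score": 90},
    2: {"id": 2, "name": "Bob", "score": 75},
}
innodb_name_index = {"Ann": 1, "Bob": 2}   # name -> primary key


def myisam_find_by_name(name):
    address = myisam_name_index[name]      # search the index for the address
    return myisam_data_file[address]       # read the record at that address


def innodb_find_by_name(name):
    primary_key = innodb_name_index[name]  # search the auxiliary index
    return innodb_clustered_index[primary_key]  # search the primary index again


print(myisam_find_by_name("Bob"))
print(innodb_find_by_name("Bob"))
```

In this model every MyISAM index, primary or not, ends in an address, while every InnoDB auxiliary lookup ends in a second search of the primary index.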

MySQL's default storage engine, InnoDB, uses the B+ tree to store indexes. The main reasons for choosing the B+ tree are: first, its order is higher, so the path from root to leaf is shorter; second, the cost of disk reads and writes is lower. Non-leaf nodes store only keys and pointers, and leaf nodes store the data.

Summary of Index Types

By underlying storage method:

  • Clustered index: The index structure and data are stored together. The primary key index in InnoDB is a clustered index.
  • Non-clustered index: The index structure and data are stored separately. Secondary (auxiliary) indexes are non-clustered indexes. MySQL's MyISAM engine uses non-clustered indexes for both primary key and non-primary key indexes.

Divided by application dimension:

  • Primary key index: accelerated query + unique column value (no NULL) + only one in the table.
  • Ordinary indexes: speed up queries only.
  • Unique index: speed up query + unique column value (can have NULL).
  • Covering index: An index that contains (or covers) the values of all fields that need to be queried.
  • Joint index: Multiple columns form one index, which is intended for combined searches and is more efficient than index merging.
  • Full-text index: Segments the text content into words and searches them. Currently, only CHAR, VARCHAR, and TEXT columns can have full-text indexes. It is rarely used because its efficiency is low; usually a search engine such as ElasticSearch is used instead.

Clustered and non-clustered indexes

Clustered index

Introduction to clustered index

A clustered index is an index in which the index structure and data are stored together, and is not a separate index type. The primary key index in InnoDB is a clustered index.

In MySQL, the .ibd file of an InnoDB table contains both the indexes and the data of the table. For an InnoDB table, each non-leaf node of the table's index (B+ tree) stores index keys, and each leaf node stores the index key together with the data that corresponds to it.

Advantages and disadvantages of clustered index

Advantages :

  • The query speed is very fast : the entire B+ tree is a multi-way balanced tree and its leaf nodes are ordered, so locating the index node is equivalent to locating the data. Compared with a non-clustered index, a clustered index needs one less IO operation to read the data.
  • Optimized for sort lookup and range lookup : The clustered index is very fast for sort lookup and range lookup of the primary key.

Disadvantages :

  • Relies on ordered data : Because the B+ tree is a multi-way balanced tree, data that does not arrive in order has to be placed in order on insertion. Integer keys are fine, but for long values that are expensive to compare, such as strings or UUIDs, insertion and search are slower.
  • The update cost is high : if the data in the indexed column is modified, the corresponding index must be modified too. Because the leaf nodes of the clustered index also store the row data, the cost of modification is relatively high, so the primary key is generally treated as unmodifiable.

Non-clustered index

Introduction to Nonclustered Indexes

A non-clustered index (Non-Clustered Index) is an index in which the index structure and data are stored separately, and it is not a separate index type. Secondary indexes (auxiliary indexes) are non-clustered indexes. MySQL's MyISAM engine, regardless of primary key or non-primary key, uses non-clustered indexes.

The leaf nodes of a non-clustered index do not necessarily store pointers to the data. In InnoDB, the leaf nodes of a secondary index store the primary key, which is then used to go back to the table and look up the data.

Pros and Cons of Nonclustered Indexes

Advantages :

  • The update cost is lower than that of the clustered index, because the leaf nodes of a non-clustered index do not store the row data.

Disadvantages :

  • Rely on ordered data : Like clustered indexes, nonclustered indexes also depend on ordered data
  • It is possible to query twice (back to the table) : This should be the biggest disadvantage of non-clustered indexes. After finding the pointer or primary key corresponding to the index, it may be necessary to query the data file or table according to the pointer or primary key.

Covering Indexes and Joint Indexes

Covering index

If an index contains (or covers) the values of all fields that need to be queried, we call it a covering index. In the InnoDB storage engine, the leaf nodes of any index other than the primary key index store the primary key plus the indexed column values. Normally the query still has to "return to the table", that is, search again through the primary key, which is slower. With a covering index, the columns to be queried are all contained in the index, so there is no need to return to the table.

The covering index means that the field to be queried happens to be the indexed field, so the data can be found directly according to the index without going back to the table for query.

Take the primary key index: if a SQL statement only needs the primary key, the primary key index alone is enough to answer it. An ordinary index works the same way: if a query only needs name and the name field has an index, the data can be found directly in the index without returning to the table.

Here we build a joint index on two fields, score and name:

ALTER TABLE `cus_order` ADD INDEX id_score_name(score, name);

After the index is created, use the EXPLAIN command to analyze the SQL statement again:

EXPLAIN SELECT `score`,`name` FROM `cus_order` ORDER BY `score` DESC; # sort in descending order

If the Extra column shows Using index, the SQL statement successfully used the covering index.

Joint index

An index created on multiple fields of a table is a joint index, also called a composite index or a combined index.

Create a joint index on the two fields score and name:

ALTER TABLE `cus_order` ADD INDEX id_score_name(score, name);

What is a back-to-table query

It is related to the clustered and non-clustered indexes just introduced. Returning to the table means finding the primary key value through the secondary index, and then using that primary key value to find the entire row in the clustered index.
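The difference between a query that returns to the table and one served by a covering index can be sketched as follows. This is a toy model, not MySQL's implementation: a dict stands in for the clustered index, a sorted list of tuples stands in for the joint index on (score, name), and the function name and sample rows are hypothetical.

```python
from bisect import bisect_left

# Clustered index: primary key -> full row.
clustered_index = {
    1: {"id": 1, "score": 90, "name": "Ann", "city": "Paris"},
    2: {"id": 2, "score": 75, "name": "Bob", "city": "Rome"},
    3: {"id": 3, "score": 90, "name": "Cid", "city": "Oslo"},
}

# Secondary joint index on (score, name); each entry also carries the primary key.
score_name_index = sorted(
    (row["score"], row["name"], pk) for pk, row in clustered_index.items()
)
INDEX_COLUMNS = {"score", "name", "id"}


def select_by_score(score, columns):
    """Return (rows, table_lookups) for rows whose score equals `score`."""
    covered = set(columns) <= INDEX_COLUMNS
    rows, table_lookups = [], 0
    start = bisect_left(score_name_index, (score,))
    for entry_score, name, pk in score_name_index[start:]:
        if entry_score != score:
            break
        if covered:
            # Covering index: the index entry already holds every requested column.
            row = {"score": entry_score, "name": name, "id": pk}
        else:
            # Back to the table: fetch the full row from the clustered index.
            table_lookups += 1
            row = clustered_index[pk]
        rows.append({column: row[column] for column in columns})
    return rows, table_lookups


print(select_by_score(90, ["score", "name"]))  # no lookups in the clustered index
print(select_by_score(90, ["name", "city"]))   # one clustered lookup per matching row
```

Asking for city, which is not part of the index, is what forces the extra lookups in this model.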

Leftmost prefix matching principle

The leftmost prefix matching principle means that when using a joint index, MySQL matches the query conditions from left to right in the order of the fields in the joint index. If a field in the query condition matches the leftmost field of the joint index, MySQL uses that field to filter a batch of data and continues until all fields in the joint index are matched, or stops matching when it encounters a range condition (such as > or <). For the range conditions >=, <=, BETWEEN, and LIKE prefix matching, matching does not stop. Therefore, when using a joint index, we can place the most discriminative field on the far left so that it filters out more data.
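Why the leftmost field matters can be seen in a small model. This is a deliberately simplified sketch that handles equality conditions only and ignores the range rules above; a real optimizer applies more rules than this. The function name and sample data are hypothetical.

```python
def usable_index_columns(index_columns, equality_columns):
    """Return the leading index columns that the equality conditions can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap stops matching; later columns cannot narrow the search
        used.append(column)
    return used


# A joint index on (score, name) is ordered by score first, then by name.
index = sorted([(90, "Bob"), (75, "Ann"), (90, "Ann"), (75, "Bob")])

# Entries with score = 90 sit next to each other, so one seek finds them all.
score_positions = [i for i, (score, _) in enumerate(index) if score == 90]
# Entries with name = "Bob" are scattered, so the ordering does not help.
name_positions = [i for i, (_, name) in enumerate(index) if name == "Bob"]

print(usable_index_columns(["score", "name"], {"score"}))          # ['score']
print(usable_index_columns(["score", "name"], {"name"}))           # []
print(usable_index_columns(["score", "name"], {"name", "score"}))  # ['score', 'name']
print(score_positions, name_positions)                             # [2, 3] [1, 3]
```

The order in which conditions are written in the query does not matter here; only the order of the fields in the index does.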

Some suggestions for using indexes correctly

Select the appropriate field to create an index

  • Fields that are not NULL : The data in an indexed field should avoid NULL as much as possible, because it is difficult for the database to optimize fields that contain NULL. If a field is frequently queried but cannot avoid NULL, it is recommended to use a short value with clear semantics, such as 0, 1, true, or false, as a substitute.
  • Frequently queried fields : The fields we create indexes should be fields that are frequently queried.
  • Fields queried as conditions : Fields queried as WHERE conditions should be considered for indexing.
  • Fields that frequently need to be sorted : An index is already sorted, so a query can use the order of the index to reduce sorting time.
  • Fields that are frequently used in joins : Fields that are often used in joins may be foreign key columns. Such a column does not need an actual foreign key constraint; it only needs to express a relationship between tables. For fields that are frequently used in join queries, consider building an index to improve the efficiency of multi-table joins.

Frequently updated fields should be carefully indexed

Although indexes improve query efficiency, the cost of maintaining them is not small. If a field is rarely queried but frequently modified, it should not be indexed.

Limit the number of indexes on each table

More indexes are not always better. It is recommended to use no more than 5 indexes on a single table. Indexes can decrease efficiency as well as increase it.

Indexes can increase query efficiency, but also reduce the efficiency of insertion and update, and even reduce query efficiency in some cases.

When the MySQL optimizer chooses how to optimize a query, it evaluates each available index based on statistical information in order to generate the best execution plan. If many indexes could be used for the same query, the optimizer needs more time to generate the execution plan, which also reduces query performance.

Consider creating joint indexes instead of single-column indexes as much as possible

Because the index needs to occupy disk space, it can be simply understood that each index corresponds to a B+ tree. If a table has too many fields and too many indexes, when the data in this table reaches a certain volume, the index will occupy a lot of space, and it will take a lot of time to modify the index. If it is a joint index and multiple fields are on one index, it will save a lot of disk space, and the operation efficiency of modifying data will also be improved.

Be careful to avoid redundant indexes

Redundant indexes are indexes that serve the same function: any query that can use index (a) can also use the leftmost column of index (a, b), so index (a) is redundant. For example, (name, city) and (name) are redundant indexes, because a query that can hit the latter can also hit the former. In most cases, you should try to extend an existing index instead of creating a new one.
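The redundancy rule can be checked mechanically. The sketch below assumes index definitions are available as ordered column lists and treats an index as redundant when it is a strict leftmost prefix of another; it ignores uniqueness constraints, which may justify keeping a shorter index. The function name and index names are hypothetical.

```python
def find_redundant_indexes(indexes):
    """Return (redundant, covering) pairs where one index is a strict
    leftmost prefix of another."""
    pairs = []
    for name_a, cols_a in indexes.items():
        for name_b, cols_b in indexes.items():
            if len(cols_a) < len(cols_b) and cols_b[:len(cols_a)] == cols_a:
                pairs.append((name_a, name_b))
    return pairs


indexes = {
    "idx_name": ["name"],
    "idx_name_city": ["name", "city"],
    "idx_city": ["city"],
}
print(find_redundant_indexes(indexes))  # [('idx_name', 'idx_name_city')]
```

Note that idx_city is not reported: city is the second column of (name, city), so a query on city alone cannot use that joint index.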

Use prefix indexes instead of ordinary indexes for string fields

Prefix indexes are limited to string types, and take up less space than ordinary indexes, so you can consider using prefix indexes instead of ordinary indexes.
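One common way to pick a prefix length, which the text above does not cover and is offered here only as an assumption-laden sketch, is to compare how many distinct values a prefix keeps against the total number of rows. The function name and sample values are hypothetical.

```python
def prefix_selectivity(values, length):
    """Fraction of distinct prefixes of the given length among all values."""
    return len({value[:length] for value in values}) / len(values)


emails = [
    "alice@example.com",
    "alina@example.com",
    "bob@example.com",
    "bobby@example.com",
]
print(prefix_selectivity(emails, 2))  # 0.5
print(prefix_selectivity(emails, 4))  # 1.0
```

In this sample, a 4-character prefix already distinguishes every value, so a longer prefix would take more space without filtering any better.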

Avoid index invalidation

Index failure is also one of the main reasons for slow queries. The common situations that lead to index failure are as follows:

  • Querying with SELECT *. SELECT * does not directly cause index failure (if the index is not used, the likely cause is that the WHERE range is too large), but it can bring other performance problems, such as wasted network transmission and data processing, and it prevents the use of a covering index;
  • A composite index is created, but the query condition does not follow the leftmost matching principle;
  • Operations such as calculations, functions, or type conversions are performed on indexed columns;
  • A LIKE query starts with %, such as LIKE '%abc';
  • OR is used in the query condition and a column on either side of the OR has no index, so none of the involved indexes are used;
  • An implicit type conversion occurs.
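The LIKE case in the list above can be illustrated with a toy model in which a sorted list stands in for an index on a string column. This is only a sketch of why the sort order stops helping; the function names and sample values are hypothetical.

```python
from bisect import bisect_left

# A sorted list stands in for the index on a string column.
name_index = sorted(["abc", "abcd", "abd", "xabc", "zabc", "bcd"])


def like_prefix(index, prefix):
    """LIKE 'prefix%': seek to the first candidate, read until the prefix stops matching."""
    matches, examined = [], 0
    for value in index[bisect_left(index, prefix):]:
        examined += 1
        if not value.startswith(prefix):
            break
        matches.append(value)
    return matches, examined


def like_suffix(index, suffix):
    """LIKE '%suffix': the sort order does not help, so every entry is examined."""
    matches = [value for value in index if value.endswith(suffix)]
    return matches, len(index)


print(like_prefix(name_index, "abc"))  # (['abc', 'abcd'], 3)
print(like_suffix(name_index, "abc"))  # (['abc', 'xabc', 'zabc'], 6)
```

The prefix search touches 3 entries while the suffix search touches all 6, and the gap grows with the size of the index.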

Know how to analyze whether the statement uses the index query

We can use the EXPLAIN command to analyze the execution plan of a SQL statement, so that we know whether the statement hits an index. The execution plan is the specific way a SQL statement will be executed after being optimized by the MySQL query optimizer.

EXPLAIN does not actually execute the statement. Instead, it analyzes the statement through the query optimizer, finds the optimal query plan, and displays the corresponding information.

The EXPLAIN output format is as follows:

mysql> EXPLAIN SELECT `score`,`name` FROM `cus_order` ORDER BY `score` DESC;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra          |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
|  1 | SIMPLE      | cus_order | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 997572 |   100.00 | Using filesort |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

The meaning of each field is as follows:

column name    meaning
id             Sequence identifier of the SELECT query
select_type    Query type corresponding to the SELECT keyword
table          Name of the table used
partitions     Matching partitions, or NULL for an unpartitioned table
type           Table access method
possible_keys  Indexes that may be used
key            Index actually used
key_len        Length of the selected index
ref            Column or constant compared to the index in an index equality query
rows           Estimated number of rows to read
filtered       Percentage of rows retained after filtering by the table conditions
Extra          Additional information

Partial modification, original link: https://javaguide.cn/database/mysql/mysql-index.html#%E7%9F%A5%E9%81%93%E5%A6%82%E4%BD%95%E5%88%86%E6%9E%90%E8%AF%AD%E5%8F%A5%E6%98%AF%E5%90%A6%E8%B5%B0%E7%B4%A2%E5%BC%95%E6%9F%A5%E8%AF%A2

Origin blog.csdn.net/weixin_52340450/article/details/131595134