Database index (detailed analysis)

Table of contents

1. What is an index?

2. Why use an index? Analysis of the advantages and disadvantages of the index

3. The difference between B tree and B+ tree

4. Analysis of the advantages and disadvantages of Hash index and B+ tree index

5. Index type

5.1. Primary key index (Primary Key)

5.2. Secondary index

6. Clustered index and non-clustered index

6.1. Clustered index

6.1.1. Advantages of clustered indexes

6.1.2. Disadvantages of clustered indexes

6.2. Non-clustered index

6.2.1. Advantages of non-clustered indexes

6.2.2. Disadvantages of non-clustered indexes

6.2.3. Does a non-clustered index always have to go back to the table (covering index)?

7. Covering index

8. Index creation principles

8.1. Single-column index

8.2. Joint index (multi-column index)

8.3. The leftmost prefix principle

9. Notes on index creation

9.1. Select the appropriate field

9.2. Fields that are not suitable for creating indexes

9.3. Will the use of indexes definitely improve query performance?


1. What is an index?

An index is a data structure used to quickly query and retrieve data. Common index structures are: B tree, B+ tree and Hash.

An index plays the same role as a table of contents. For example, when we look up a word in a dictionary that has no table of contents, we can only search page by page, which is very slow. With a table of contents, we only need to find the position of the word there and then turn directly to that page.
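To make the dictionary analogy concrete, here is a minimal Python sketch. It is not how a database implements an index; the rows, the names full_scan and index_lookup, and the sorted-list "index" are all assumptions made for illustration.

```python
import bisect

# Hypothetical "table": rows stored in arbitrary (insertion) order.
rows = [(42, "pear"), (7, "apple"), (19, "mango"), (3, "kiwi"), (88, "plum")]

def full_scan(rows, key):
    """No index: check row after row until the key is found (O(n))."""
    for row in rows:
        if row[0] == key:
            return row
    return None

# The "index": sorted (key, row position) pairs, like a table of contents.
index = sorted((row[0], pos) for pos, row in enumerate(rows))
index_keys = [k for k, _ in index]

def index_lookup(rows, key):
    """With an index: binary-search the sorted keys, then jump to the row (O(log n))."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return rows[index[i][1]]
    return None
```

Both functions return the same row; the difference is how many entries have to be examined on the way.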

2. Why use an index? Analysis of the advantages and disadvantages of the index

  • Advantages of indexes:

        An index can greatly speed up data retrieval by reducing the amount of data that has to be examined, which is the main reason for creating indexes. In most systems, read requests far outnumber write requests. In addition, creating a unique index guarantees the uniqueness of each row of data in the table.

  • Disadvantages of indexes:
  1. Creating and maintaining indexes takes time: when rows in a table are added, deleted, or modified, any index on that data must be updated as well, which reduces the efficiency of SQL execution.
  2. Indexes occupy physical storage space: an index is stored in physical files, which also consumes a certain amount of space.
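The trade-off above (uniqueness enforced, but extra work on every write) can be sketched with a toy table. The class Table, the email column, and the index_writes counter are hypothetical names used only for this illustration.

```python
import bisect

class Table:
    """Toy table with one unique index on `email` (all names are hypothetical)."""

    def __init__(self):
        self.rows = []          # the table data
        self.email_index = []   # sorted list of (email, row position)
        self.index_writes = 0   # extra work caused by maintaining the index

    def insert(self, user_id, email):
        # (email,) sorts just before any (email, position) entry.
        i = bisect.bisect_left(self.email_index, (email,))
        if i < len(self.email_index) and self.email_index[i][0] == email:
            raise ValueError(f"duplicate entry {email!r} for unique index")
        self.rows.append((user_id, email))
        # Every successful insert also has to update the index.
        self.email_index.insert(i, (email, len(self.rows) - 1))
        self.index_writes += 1
```

Each additional index on the table would add one more such update per write, which is where the maintenance cost comes from.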

3. The difference between B tree and B+ tree

  • Every node of a B-tree stores both keys and data, while a B+ tree stores keys and data only in its leaf nodes; its internal nodes store keys only.
  • The leaf nodes of a B-tree are independent of each other, while each leaf node of a B+ tree holds a reference to its adjacent leaf node, so the leaves form a chain.
  • Searching a B-tree amounts to a binary search over the keys of each node along the path, and the search may end before reaching a leaf node. The retrieval cost of a B+ tree is very stable: every search runs from the root node to a leaf node, and sequential retrieval across the leaf nodes is efficient because they are chained together.
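The leaf chain is easiest to see in code. The following is a hand-built, two-level sketch of a B+ tree (no insertion or splitting logic); the Leaf class, the sample keys, and the function names are assumptions for illustration only.

```python
import bisect

class Leaf:
    """A B+ tree leaf: sorted keys with their data, plus a link to the next leaf."""
    def __init__(self, keys, values):
        self.keys, self.values, self.next = keys, values, None

# Three hand-built leaves (hypothetical data), chained left to right.
leaves = [Leaf([1, 4], ["a", "d"]), Leaf([7, 9], ["g", "i"]), Leaf([12, 15], ["l", "o"])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right

# The single internal (root) node stores only separator keys, no row data:
# key < 7 -> leaf 0, 7 <= key < 12 -> leaf 1, key >= 12 -> leaf 2.
root_keys = [7, 12]

def find_leaf(key):
    """Every lookup descends from the root to a leaf."""
    return leaves[bisect.bisect_right(root_keys, key)]

def range_scan(low, high):
    """Locate the first leaf once, then follow the leaf chain to the right."""
    leaf, out = find_leaf(low), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return out
            if k >= low:
                out.append((k, v))
        leaf = leaf.next
    return out
```

For example, range_scan(4, 12) descends the tree once and then walks three leaves through their next links, without going back to the root.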

4. Analysis of the advantages and disadvantages of Hash index and B+ tree index

  • Hash index positioning is fast

        A Hash index is a hash table. Its biggest advantage is that the hash function locates the data for a given key in a very short time, typically faster than a B+ tree for equality lookups.

  • Hash conflict problem

        Readers familiar with HashMap or Hashtable know that their biggest weakness is hash collisions. For a database, however, this is not the biggest disadvantage.

  • Hash indexes do not support ordered or range queries (this is their biggest shortcoming)

Imagine a situation:

SELECT * FROM tb1 WHERE id < 500;

The B+ tree is ordered, which is a big advantage for this kind of range query: it is enough to traverse the leaf nodes whose keys are smaller than 500. A Hash index locates entries by hashing the key, so it has no ordering to exploit. It would have to hash and probe each value from 1 to 499 one by one. This is the biggest shortcoming of Hash indexes.
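A small Python sketch of the same comparison, using a dict as a stand-in for a Hash index and a sorted list as a stand-in for the ordered leaves of a B+ tree. The id values and function names are hypothetical.

```python
import bisect

ids = [901, 17, 499, 500, 250, 3]   # hypothetical id values in tb1

# Hash index: fast equality lookups, but no ordering between keys.
hash_index = {key: pos for pos, key in enumerate(ids)}

# Ordered index (stand-in for B+ tree leaves): keys kept sorted.
ordered_index = sorted(ids)

def range_with_hash(limit):
    """No order to exploit: every key in the hash table must be examined."""
    return sorted(k for k in hash_index if k < limit)

def range_with_ordered(limit):
    """One binary search finds the boundary; everything before it matches."""
    return ordered_index[:bisect.bisect_left(ordered_index, limit)]
```

Both functions answer id < 500 with the same keys, but the ordered version touches only the matching entries, while the hash version has to look at all of them.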

5. Index type

5.1. Primary key index (Primary Key)

The primary key column of the data table uses the primary key index.

A data table can only have one primary key, and the primary key cannot be null or repeated.

In a MySQL InnoDB table, when no primary key is explicitly specified, InnoDB checks whether the table has a unique index whose columns are all NOT NULL. If so, it uses the first such index as the default primary key; otherwise, InnoDB automatically creates a hidden 6-byte auto-increment row ID to serve as the primary key.
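The selection rule can be written down as a small decision function. This is only a sketch of the rule as I understand it from the MySQL documentation (including the requirement that the unique index be NOT NULL); the function name, the input format, and the HIDDEN_ROW_ID label are hypothetical.

```python
HIDDEN_ROW_ID = "hidden 6-byte row ID"   # label used only in this sketch

def choose_clustered_key(primary_key, unique_indexes):
    """Pick the key InnoDB would cluster the table on.

    primary_key:    name of the declared primary key, or None
    unique_indexes: list of (index_name, all_columns_not_null), in definition order
    """
    if primary_key is not None:
        return primary_key                # 1. an explicit primary key wins
    for name, all_not_null in unique_indexes:
        if all_not_null:
            return name                   # 2. first unique index without NULLable columns
    return HIDDEN_ROW_ID                  # 3. fall back to the generated row ID
```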

5.2. Secondary index

The secondary index is also called the auxiliary index. In InnoDB, the leaf nodes of a secondary index store the primary key, so a lookup through the secondary index yields the primary key value, which is then used to locate the row.

Indexes such as unique indexes, ordinary indexes, and prefix indexes are secondary indexes.

  1. Unique index (Unique Key): A unique index is also a constraint. The indexed column cannot contain duplicate values, but NULL is allowed, and a table can have multiple unique indexes. Most of the time, a unique index is created to guarantee the uniqueness of the data in the column, not for query efficiency.
  2. Ordinary index (Index): The only function of an ordinary index is to speed up queries. A table can have multiple ordinary indexes, and duplicate values and NULL are allowed.
  3. Prefix index (Prefix): A prefix index applies only to string-type data. It indexes only the first few characters of the text, so it is smaller than an ordinary index on the same column.
  4. Full-text index (Full Text): A full-text index is mainly used to retrieve keywords from large text data, a technique also used by search engines. Before MySQL 5.6, only the MyISAM engine supported full-text indexes; starting with 5.6, InnoDB supports them as well.
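As an illustration of the prefix index in item 3, here is a rough Python sketch: only the first few characters are used as the index key, so several rows can share one entry and the full value has to be compared afterwards. The sample e-mail addresses and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical column values.
emails = ["alice@example.com", "alina@example.com", "bob@example.com", "bobby@example.com"]

def build_prefix_index(values, length):
    """Index only the first `length` characters of each value."""
    index = defaultdict(list)
    for pos, value in enumerate(values):
        index[value[:length]].append(pos)
    return index

def lookup(values, index, length, target):
    """Use the prefix to narrow down candidates, then compare the full value."""
    candidates = index.get(target[:length], [])
    return [pos for pos in candidates if values[pos] == target]
```

With a prefix length of 3 the index has only two keys ("ali" and "bob"), which is what makes it smaller; the price is the extra comparison against the full value for each candidate.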

[Figure: secondary index]

6. Clustered index and non-clustered index

6.1. Clustered index

A clustered index is an index in which the index structure and the data are stored together. In InnoDB, the primary key index is the clustered index.

In MySQL, the .ibd file of an InnoDB table contains both the index and the data of the table. For an InnoDB table, each non-leaf node of the table's index (B+ tree) stores index keys, and each leaf node stores the index key together with the corresponding row data.

6.1.1. Advantages of clustered indexes

Queries through the clustered index are very fast, because the B+ tree is a multi-way balanced tree whose leaf nodes are kept in order. Locating the index entry is equivalent to locating the data.

6.1.2. Disadvantages of clustered indexes

  1. Relies on ordered data: because the B+ tree is a multi-way balanced tree that keeps its keys sorted, data that does not arrive in key order has to be placed into its sorted position when it is inserted. For integer keys this is fine, but for long values that are expensive to compare, such as strings or UUIDs, inserting and searching are slower.
  2. High update cost: if the data of the indexed column is modified, the index must be modified as well. Since the leaf nodes of the clustered index also store the row data, the modification is relatively expensive, so the primary key is generally treated as not modifiable.
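The effect of key order on insert cost can be approximated with a sorted list. This is only an analogy (a real B+ tree splits pages rather than shifting a whole array), and the key sets, the fixed random seed, and the name insert_cost are assumptions for this sketch.

```python
import bisect
import random

def insert_cost(keys):
    """Insert keys into a sorted list; count entries that had to move aside."""
    ordered, moved = [], 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        moved += len(ordered) - pos   # entries after the insertion point shift right
        ordered.insert(pos, key)
    return moved

# Like an auto-increment primary key: every new key goes at the end.
sequential = list(range(1000))

# Stand-in for UUID-like keys: new keys land anywhere in the existing order.
rng = random.Random(0)
scattered = rng.sample(range(10**6), 1000)
```

Sequential keys never displace an existing entry, while scattered keys keep landing in the middle of the structure, which is the reason auto-increment primary keys are usually preferred over random ones.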

6.2. Non-clustered index

A non-clustered index is an index in which the index structure and the data are stored separately.

Secondary indexes are non-clustered indexes.

For a MyISAM table, the .MYI file contains the indexes of the table. Each non-leaf node of an index (B+ tree) stores index keys, and each leaf node stores the index key and a pointer to the corresponding row in the .MYD data file.

The leaf nodes of a non-clustered index do not necessarily store data pointers: in InnoDB, the leaf nodes of a secondary index store the primary key, and the row is then fetched by going back to the table with that primary key.

6.2.1. Advantages of non-clustered indexes

Updates are less expensive than with a clustered index, because the leaf nodes of a non-clustered index do not store the row data.

6.2.2. Disadvantages of non-clustered indexes

  1. Like clustered indexes, non-clustered indexes also rely on ordered data.
  2. A query may need two lookups (going back to the table): this is probably the biggest disadvantage of non-clustered indexes. After the pointer or primary key for an index entry has been found, the data file or table may still have to be queried using that pointer or primary key.
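The two-step lookup can be sketched for both engines. Dicts and lists stand in for the B+ trees and the data file here; the sample rows and all names are hypothetical.

```python
# MyISAM-style: the data file is a list of rows, and index leaves store
# pointers (here: list positions) into that file.
myd_rows = [{"id": 1, "name": "guang19"}, {"id": 2, "name": "alice"}]
myisam_name_index = {"guang19": 0, "alice": 1}

def myisam_lookup(name):
    pos = myisam_name_index[name]   # step 1: index -> pointer
    return myd_rows[pos]            # step 2: pointer -> row in the data file

# InnoDB-style: the clustered index holds the rows, and secondary index
# leaves store the primary key instead of a pointer.
innodb_clustered = {1: {"id": 1, "name": "guang19"}, 2: {"id": 2, "name": "alice"}}
innodb_name_index = {"guang19": 1, "alice": 2}

def innodb_lookup(name):
    pk = innodb_name_index[name]    # step 1: secondary index -> primary key
    return innodb_clustered[pk]     # step 2: primary key -> row (back to the table)
```

In both cases a query by name takes one lookup in the non-clustered index and a second one to reach the row itself.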

[Figure: screenshot of the files of a MySQL table]

[Figure: clustered and non-clustered indexes]

6.2.3. Does a non-clustered index always have to go back to the table (covering index)?

A lookup through a non-clustered index does not necessarily have to go back to the table.

Imagine a situation where a user runs a SQL query for a username, and the username field happens to be indexed.

The key of this index is the name itself. Once the matching name is found in the index, it can be returned directly, without going back to the table.

SELECT name FROM `table` WHERE name = 'guang19';

The same holds for MyISAM. Its primary key index normally does have to go back to the table, because the leaf nodes of its primary key index store pointers. But what if the SQL query selects only the primary key?

SELECT id FROM `table` WHERE id = 1;

The key of the primary key index is the primary key itself, so it can simply be found and returned. This situation is called a covering index.

7. Covering index

        If an index contains (or covers) the values of all the fields a query needs, we call it a "covering index". In the InnoDB storage engine, the leaf nodes of an index that is not the primary key index store the indexed column values plus the primary key. To read any other column, the query must "return to the table", that is, search again through the primary key, which is slower. With a covering index, every column the query needs is already in the index, so there is no return to the table.

In other words, the fields being queried are exactly the fields in the index, so the data can be read directly from the index without going back to the table.

        Take the primary key index: if a SQL statement only needs the primary key, the primary key index alone is enough to answer it.

        Another example is an ordinary index: if a SQL statement only needs the name, and the name field happens to be indexed, the data can be read directly from the index without returning to the table.
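Whether a query is covered depends only on which columns it asks for. A minimal sketch, assuming an InnoDB-style secondary index on name whose entries hold the indexed value and the primary key; the data and the select function are hypothetical.

```python
# Clustered index stand-in: primary key -> full row (hypothetical data).
clustered = {
    1: {"id": 1, "name": "guang19", "age": 20},
    2: {"id": 2, "name": "alice", "age": 31},
}

# Secondary index on (name): each entry holds the indexed value and the primary key.
name_index = {
    "guang19": {"name": "guang19", "id": 1},
    "alice": {"name": "alice", "id": 2},
}

def select(columns, name):
    """Return (result, went_back_to_table) for a query filtered by name."""
    entry = name_index.get(name)
    if entry is None:
        return None, False
    if set(columns).issubset(entry):
        # Covering index: every requested column is already in the index entry.
        return {c: entry[c] for c in columns}, False
    # Otherwise fetch the full row through the primary key.
    row = clustered[entry["id"]]
    return {c: row[c] for c in columns}, True
```

Selecting name or id is answered from the index alone; as soon as age is requested, the second lookup through the primary key is needed.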

[Figure: covering index]

8. Index creation principles

8.1. Single-column index

A single-column index is an index that consists of one column of attributes.

8.2. Joint index (multi-column index)

A joint index is an index composed of multi-column attributes.

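8.3. The leftmost prefix principle

Suppose the joint index created consists of three fields:

ALTER TABLE `table` ADD INDEX index_name (num, name, age);

        Then the index takes effect when the query conditions are: num / (num AND name) / (num AND name AND age). Conditions that skip the leftmost field, such as name alone or (name AND age), cannot use the index to seek directly. Therefore, when creating a joint index, try to put the most frequently queried field leftmost (first), and make sure queries include a condition on that field. The order in which the conditions are written in the WHERE clause does not matter, because the optimizer reorders them.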
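Why only leftmost prefixes work becomes clear once the joint index is seen as a list of tuples sorted by num, then name, then age. The sketch below uses a sorted Python list in place of the B+ tree; the sample entries and function names are hypothetical.

```python
import bisect

# Joint index on (num, name, age): entries are sorted by num, then name, then age.
index = sorted([
    (1, "bob", 30), (1, "amy", 25), (2, "amy", 41), (2, "cat", 19), (3, "bob", 30),
])

def seek_prefix(prefix):
    """Entries starting with `prefix` form one contiguous run: binary-search to it."""
    n = len(prefix)
    lo = bisect.bisect_left(index, prefix)
    hi = lo
    while hi < len(index) and index[hi][:n] == prefix:
        hi += 1
    return index[lo:hi]

def scan_by_name(name):
    """name alone is not a leftmost prefix: matches are scattered, so scan everything."""
    return [entry for entry in index if entry[1] == name]
```

seek_prefix((2,)) and seek_prefix((1, "bob")) each jump straight to a contiguous block, whereas the two "bob" entries found by scan_by_name sit at opposite ends of the index.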

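However, on my MySQL version (8.0.x), the joint index I created appeared to behave as if the same index had been created on each of its fields: regardless of whether the leftmost prefix principle was met, the index was reported as being used.

[Figure: query plans showing the index in use for each field]

This does not necessarily mean that the leftmost prefix principle no longer applies. A likely explanation is that the optimizer scans the whole index instead of seeking into it, for example because the index covers the query or through the skip scan access method added in MySQL 8.0.13, and such a scan is still reported as using the index.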

9. Notes on index creation

9.1. Select the appropriate field

① Fields that are not NULL

        The data of an indexed field should avoid NULL as far as possible, because NULL values make the field harder for the database to optimize. If a field is frequently queried but cannot avoid missing values, it is recommended to store a short substitute value with clear semantics, such as 0, 1, true, or false, instead of NULL.

② Fields frequently queried

        The fields we index should be fields that are frequently queried.

③ Fields queried as conditions

        Fields used in WHERE conditions should be considered for indexing.

④ Fields that are frequently used for joins

        Fields that are often used for joins are typically foreign key columns. Such a column does not need a declared foreign key constraint; it only needs to express a relationship between tables. For fields that are frequently used in join queries, consider building an index to improve the efficiency of multi-table joins.

9.2. Fields that are not suitable for creating indexes

① Be careful about indexing fields that are frequently updated

        Although indexes improve query efficiency, the cost of maintaining them is not small. If a field is rarely queried but frequently modified, it should not be indexed.

② There is no need to create an index for fields that are not frequently queried

③ Consider building a joint index instead of several single-column indexes where possible

        An index occupies disk space; put simply, each index corresponds to one B+ tree. If a table has many fields and many indexes, then once the table grows to a certain volume the indexes take up a lot of space, and modifying them takes a lot of time. With a joint index, several fields share one index, which saves disk space and makes data modification more efficient.

④ Pay attention to avoid redundant indexes

        A redundant index is one whose function is already provided by another index: any query that can hit it can also hit the other one. For example, (name, city) and (name) are redundant, because a query that can hit the latter can certainly hit the former. In most cases, you should extend an existing index instead of creating a new one.
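The (name, city) versus (name) case boils down to a leftmost-prefix check on the column lists. A minimal sketch, assuming both indexes are ordinary (non-unique) indexes of the same type; the function name is hypothetical.

```python
def is_redundant(index_cols, other_cols):
    """True if index_cols is a leftmost prefix of a different, longer index."""
    n = len(index_cols)
    return n < len(other_cols) and tuple(other_cols[:n]) == tuple(index_cols)
```

Note that the order matters: (city) is not made redundant by (name, city), because city is not the leftmost field of that index.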

⑤ Consider using a prefix index instead of an ordinary index on string fields

        Prefix indexes are limited to string types and take up less space than ordinary indexes, so for long string fields you can consider using a prefix index instead of an ordinary index.
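One way to judge whether a prefix is long enough is to measure how many distinct prefixes it produces relative to the number of rows. The sketch below does this in Python on hypothetical city names; the name prefix_selectivity is an assumption for this example.

```python
def prefix_selectivity(values, length):
    """Fraction of distinct prefixes of the given length (1.0 means all unique)."""
    return len({v[:length] for v in values}) / len(values)

# Hypothetical column values.
cities = ["shanghai", "shenzhen", "shenyang", "beijing"]
```

Here a 2-character prefix gives a selectivity of 0.5, 3 characters give 0.75, and 5 characters reach 1.0, so a prefix shorter than 5 would leave some rows sharing an index entry.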

9.3. Will the use of indexes definitely improve query performance?

        Index queries are faster than full table scans in most cases, but not always. If the table holds only a small amount of data, or if the query matches a large share of the rows, using an index may bring little improvement, and the optimizer may choose a full table scan instead.


Origin blog.csdn.net/qq_54247497/article/details/131631584