Learn all the key points of MySQL indexes in one article

Table of contents

1. Introduction to Index

2. Advantages and disadvantages of indexes

Advantages

Disadvantages

3. The structure of the index

1. Hash structure

2. B tree

3. B+ tree

4. Implementation of MySQL indexes

1. MyISAM index

2. InnoDB index

3. Comparison between MyISAM and InnoDB

5. Index declaration and use

1. Index classification

2. Syntax operation of index

6. Index design principles

1. Which situations are suitable for indexing

2. Which situations are not suitable for indexing


1. Introduction to Index

  1. According to the official definition, an index is a data structure that helps MySQL retrieve data efficiently. Put simply, a database index is like the table of contents at the front of a book: it speeds up queries.
  2. An index is usually too large to be held entirely in memory, so it is stored in files on disk (either in a separate index file or together with the data in the data file).
  3. The indexes we usually talk about, including clustered, covering, composite, prefix, and unique indexes, are organized as B+ trees (multi-way search trees, not necessarily binary) unless stated otherwise.

2. Advantages and disadvantages of indexes

Advantages

  1. Indexes improve the efficiency of data retrieval and reduce the database's I/O cost, much like a book's table of contents.
  2. Because index columns are kept sorted, indexes reduce the cost of sorting data and lower CPU consumption.
  3. Indexed columns are sorted automatically, for both single-column and composite indexes, although the ordering of a composite index is more complex (entries are ordered by the first column, then by the second, and so on).
  4. When an ORDER BY clause follows the order of the index columns, sorting becomes much more efficient.

Disadvantages

  1. Indexes take up disk space.

  2. Although indexes improve query efficiency, they slow down table updates. Every time rows are inserted, deleted, or updated, MySQL must not only save the data but also update the corresponding index.


3. The structure of the index

MySQL uses a B+ tree in most cases.

1. Hash structure

Features of the Hash structure

  1. A hash index only supports equality lookups: =, <=>, and IN. Because the entries of a hash index are unordered, a range query degenerates to O(n), whereas the ordered structure of a tree keeps range queries at O(log n).
  2. For the same reason, a hash index cannot help with ORDER BY: the data has to be sorted again after it is fetched.
  3. In a composite index, the hash value is computed over all the indexed columns combined, so the index cannot be used to query by just one or a few of those columns.
  4. For equality queries a hash index is usually more efficient, but efficiency drops when the indexed column has many duplicate values. On a hash collision, the row pointers in the bucket must be traversed and compared one by one to find the matching key, which is time-consuming. Hash indexes are therefore usually not used on columns with many repeated values, such as gender or age.
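
The contrast between a hash index and an ordered index can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how MySQL implements either structure: a dict stands in for the hash index, a sorted list searched with bisect stands in for the ordered leaf level of a tree, and the rows and column names are made up.

```python
import bisect

# Hypothetical rows: (id, age). Values are made up for illustration.
rows = [(1, 30), (2, 25), (3, 41), (4, 25), (5, 37)]

# Hash-style index on age: key -> bucket of row ids.
hash_index = {}
for row_id, age in rows:
    hash_index.setdefault(age, []).append(row_id)

# Ordered index on age: (age, row_id) pairs kept sorted, like tree leaves.
ordered_index = sorted((age, row_id) for row_id, age in rows)


def hash_equal(age):
    """Equality lookup: one hash probe, then walk the bucket."""
    return hash_index.get(age, [])


def hash_range(low, high):
    """Range lookup on a hash index: every key must be inspected, O(n)."""
    return sorted(rid for age, ids in hash_index.items()
                  if low <= age <= high for rid in ids)


def ordered_range(low, high):
    """Range lookup on an ordered index: binary search, then a short scan."""
    start = bisect.bisect_left(ordered_index, (low, float("-inf")))
    result = []
    for age, rid in ordered_index[start:]:
        if age > high:
            break
        result.append(rid)
    return sorted(result)


# Composite hash key: the hash covers (city, age) together,
# so a lookup by city alone cannot use this index.
composite_hash = {("Paris", 30): [1], ("Rome", 25): [2]}
```

Both range functions return the same rows, but hash_range has to touch every key, while ordered_range stops as soon as it passes the upper bound. The composite_hash dict can be probed only with a complete (city, age) pair.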

Applicable scenarios of Hash index

  1. Hash indexes have many limitations, so B+ tree indexes are used far more widely in databases. Still, there are scenarios where a hash index is more efficient. For example, among key-value databases, the core of Redis storage is a hash table.
  2. The MEMORY storage engine in MySQL supports hash indexes. If a query needs a temporary table, we can choose the MEMORY engine and define a hash index on a column, for example a string column: after hashing, the key shrinks to a few bytes. When the column has few duplicates and equality queries are frequent, a hash index is a good choice.
  3. InnoDB does not support user-defined hash indexes, but it provides an adaptive hash index. When is it used? If some data is accessed frequently and certain conditions are met, InnoDB stores the address of its data page in a hash table, so the next query can go directly to that page. This gives the B+ tree some of the benefits of a hash index.
    # Check whether the adaptive hash index is enabled
    SHOW VARIABLES LIKE '%adaptive_hash_index%';

2. B tree

3. B+ tree

Features of B+ tree

  1. A node with k children has k keys, that is, the number of children equals the number of keys. In a B-tree, the number of children equals the number of keys plus 1.
  2. The keys of a non-leaf node also appear in its child nodes, where each is the maximum (or minimum) key of the corresponding child.
  3. Non-leaf nodes are used only for indexing and do not store data records; all record information is kept in the leaf nodes. In a B-tree, non-leaf nodes store both index keys and data records.
  4. Every key of the B+ tree appears in the leaf nodes, and the leaf nodes form an ordered linked list, linked in ascending key order. This makes the B+ tree well suited to range lookups.
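
The four features above can be sketched with a deliberately tiny model. It is an assumption-laden illustration rather than a real B+ tree: there is a single internal level, no insertion or node splitting, and the names Leaf, build_leaves, build_root, and range_query are hypothetical. The root holds one key per child (the child's maximum key) and no records, and the leaves are chained so a range query descends once and then follows the links.

```python
import bisect


class Leaf:
    """Leaf node: sorted keys with their records, plus a link to the next leaf."""

    def __init__(self, keys, records):
        self.keys = keys
        self.records = records
        self.next = None


def build_leaves(items, fanout=3):
    """Split sorted (key, record) pairs into linked leaves of at most `fanout` keys."""
    items = sorted(items)
    leaves = []
    for i in range(0, len(items), fanout):
        chunk = items[i:i + fanout]
        leaves.append(Leaf([k for k, _ in chunk], [r for _, r in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def build_root(leaves):
    """One internal level: only the maximum key of each child, no records."""
    return [leaf.keys[-1] for leaf in leaves]


def range_query(root, leaves, low, high):
    """Descend once to the leaf that may hold `low`, then follow the leaf links."""
    idx = bisect.bisect_left(root, low)
    if idx == len(leaves):
        return []
    leaf = leaves[idx]
    out = []
    while leaf is not None:
        for key, rec in zip(leaf.keys, leaf.records):
            if key > high:
                return out
            if key >= low:
                out.append(rec)
        leaf = leaf.next
    return out


items = [(k, f"row-{k}") for k in (5, 1, 9, 3, 7, 11, 2)]
leaves = build_leaves(items)
root = build_root(leaves)
```

With these sample keys the leaves are [1, 2, 3], [5, 7, 9], and [11], and calling range_query(root, leaves, 3, 9) crosses from the first leaf into the second through the next pointer without going back to the root.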

4. Implementation of MySQL indexes

1. MyISAM index

Primary key index:

  1. MyISAM stores data files and index files separately. When MyISAM builds an index as a B+ tree, the key stored in a leaf node is the value of the indexed column, and the data stored with it is the disk address of the row that holds that value.
  2. For a table named user, the index is stored in the index file user.MYI and the data is stored in the data file user.MYD.

Auxiliary index:

  1. In MyISAM, a secondary (auxiliary) index has the same structure as the primary key index: the leaf nodes store the disk address of the row record. The only difference is that keys in the primary key index are unique, while keys in a secondary index may repeat.
  2. Because secondary index keys are not unique, several records may share the same value, so even an equality query must search the secondary index tree in the manner of a range query.
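
As a rough model of this layout, the sketch below uses a dict keyed by a made-up file offset in place of user.MYD and two sorted lists in place of the index trees in user.MYI. The offsets, rows, and function names are assumptions for illustration only. Both indexes store row addresses, and the secondary lookup scans the run of equal keys, which is the range-style search described above.

```python
import bisect

# Stand-in for user.MYD: row records addressed by a made-up file offset.
data_file = {
    0: {"id": 1, "name": "ann", "age": 25},
    64: {"id": 2, "name": "bob", "age": 31},
    128: {"id": 3, "name": "cat", "age": 25},
}

# Stand-in for user.MYI: both indexes map key -> row address.
primary_index = sorted((row["id"], addr) for addr, row in data_file.items())
age_index = sorted((row["age"], addr) for addr, row in data_file.items())


def find_by_id(pk):
    """Primary key lookup: keys are unique, so at most one entry matches."""
    i = bisect.bisect_left(primary_index, (pk, -1))
    if i < len(primary_index) and primary_index[i][0] == pk:
        return data_file[primary_index[i][1]]
    return None


def find_by_age(age):
    """Secondary lookup: keys may repeat, so scan the run of equal keys."""
    i = bisect.bisect_left(age_index, (age, -1))
    rows = []
    while i < len(age_index) and age_index[i][0] == age:
        rows.append(data_file[age_index[i][1]])
        i += 1
    return rows
```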

2. InnoDB index

Primary key index (clustered index):
        Each InnoDB table has a clustered index. The clustered index is built as a B+ tree, and its leaf nodes store the entire row record. In general, the clustered index is the primary key index. When a table has no primary key, InnoDB falls back to a unique index or to an automatically generated hidden row ID field (ROWID). The specific rules InnoDB follows are:

  1. If a PRIMARY KEY is defined on the table, InnoDB uses the primary key index as the clustered index.
  2. If the table does not define a primary key, InnoDB uses the first UNIQUE index whose columns are all NOT NULL as the clustered index.
  3. If neither of the above is available, InnoDB builds the clustered index on an implicit 6-byte long-integer ROWID field. The ROWID value increases automatically as new rows are inserted.
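
The three rules can be written as a small decision function. This is a sketch of the selection order only; the function name, its parameters, and the placeholder returned for the hidden field are hypothetical and do not correspond to any MySQL API.

```python
def choose_clustered_key(primary_key, unique_indexes, not_null_columns):
    """Return the columns the clustered index would be built on.

    primary_key: list of column names, or None if no PRIMARY KEY is defined
    unique_indexes: list of column-name lists, in definition order
    not_null_columns: set of columns declared NOT NULL
    """
    if primary_key:
        return list(primary_key)                      # rule 1
    for cols in unique_indexes:
        if all(c in not_null_columns for c in cols):  # rule 2
            return list(cols)
    return ["<hidden 6-byte row id>"]                 # rule 3
```

For example, a table with no primary key and unique indexes on (email) and (phone), where only phone is NOT NULL, would be clustered on phone.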

Auxiliary index:

  1. All indexes other than the clustered index are called secondary (auxiliary) indexes. In InnoDB, the leaf nodes of a secondary index store the primary key value of the row. During retrieval, InnoDB uses this primary key value to look up the row record in the clustered index.
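
The two-step lookup can be modeled as follows. This is a simplified sketch with made-up rows: a dict stands in for the clustered index, a sorted list of (age, primary key) pairs stands in for a secondary index on a hypothetical age column, and a linear scan replaces the tree search a real engine would perform.

```python
# Clustered index: primary key -> full row (the leaf holds the whole record).
clustered = {
    1: {"id": 1, "name": "ann", "age": 25},
    2: {"id": 2, "name": "bob", "age": 31},
    3: {"id": 3, "name": "cat", "age": 25},
}

# Secondary index on age: sorted (age, primary key) pairs, no row data.
age_index = sorted((row["age"], pk) for pk, row in clustered.items())


def select_rows_by_age(age):
    """Two steps: the secondary index yields primary keys, the clustered index yields rows."""
    pks = [pk for a, pk in age_index if a == age]
    return [clustered[pk] for pk in pks]


def select_ids_by_age(age):
    """Only id is requested, so the secondary index alone answers the query."""
    return [pk for a, pk in age_index if a == age]
```

The second function never touches the clustered index, which is the situation a covering index describes: every requested column is already present in the secondary index.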

3. Comparison between MyISAM and InnoDB

MyISAM indexes are all "non-clustered", whereas every InnoDB table contains one clustered index. The differences between indexes in the two engines can be summarized as follows:

  1. In InnoDB, a single search of the clustered index by primary key value finds the record. In MyISAM, every lookup needs a back-to-table operation (following the address from the index to the data file), so all MyISAM indexes are effectively secondary indexes.
  2. In InnoDB the data file is itself an index file, whereas MyISAM keeps index files and data files separate, and the index file stores only the addresses of the data records.
  3. The data field of an InnoDB non-clustered index stores the primary key value of the corresponding record, whereas a MyISAM index stores the record's address. In other words, every InnoDB non-clustered index uses the primary key as its data field.
  4. MyISAM's back-to-table operation is very fast because it fetches data directly from the file at the address offset. InnoDB must first obtain the primary key and then search the clustered index for the record; this is not slow, but it is still not as fast as direct access by address.
  5. InnoDB requires every table to have a primary key (MyISAM does not). If none is specified explicitly, MySQL automatically selects a non-null column that uniquely identifies each record as the primary key. If no such column exists, MySQL generates an implicit 6-byte long-integer field as the primary key of the InnoDB table.

5. Index declaration and use

1. Index classification

  • Classified according to functional logic: unique index, full-text index, primary key index, ordinary index;
  • Classified by physical implementation: clustered index, non-clustered index;
  • Classified by number of columns: single-column index, composite (joint) index.

2. Syntax operation of index

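  1. Index creation
    # 1. Implicit creation: MySQL creates an index automatically on columns with a primary key, unique, or foreign key constraint
    
    
    # 2. Explicit creation when creating a table
    CREATE TABLE table_name (id INT, lname VARCHAR(20),
    [UNIQUE | FULLTEXT | SPATIAL]
    [INDEX | KEY]
    [index_name] (column_name [(length)] [ASC | DESC]));
    
    
    # 3. Adding an index to an existing table
    ALTER TABLE table_name ADD INDEX [index_name] (column_name [(length)]);
    
    CREATE INDEX index_name ON table_name (column_name [(length)]);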

  2. Viewing indexes
    # Method 1
    SHOW CREATE TABLE table_name;
    
    
    # Method 2
    SHOW INDEX FROM table_name;

  3. Dropping indexes
    # Method 1
    ALTER TABLE table_name DROP INDEX index_name;
    
    
    # Method 2
    DROP INDEX index_name ON table_name;

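  4. New features in MySQL 8.0
    1. Descending indexes: DESC in an index definition is supported, so key values can be stored in descending order
    2. Invisible (hidden) indexes
    # Create an invisible index when creating a table
    CREATE TABLE table_name (id INT,
    INDEX index_name (column_name) INVISIBLE);
    
    
    # Add an invisible index to an existing table
    ALTER TABLE table_name
    ADD INDEX index_name (column_name) INVISIBLE;
    
    CREATE INDEX index_name ON table_name (column_name) INVISIBLE;
    
    # Change the visibility of an index
    ALTER TABLE table_name ALTER INDEX index_name {VISIBLE | INVISIBLE};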
    

6. Index design principles

1. Which situations are suitable for indexing

  1. Columns with unique values
  2. Columns frequently used in WHERE conditions
  3. Columns frequently used in GROUP BY and ORDER BY
  4. Columns in the WHERE conditions of UPDATE and DELETE statements
  5. Columns used with DISTINCT
  6. For multi-table JOIN operations:
    1. Join no more than three tables.
    2. Create indexes on the WHERE condition columns so that rows are filtered early.
    3. Create indexes on the join columns, and make sure the join columns have the same type in both tables; otherwise implicit type conversion can prevent the index from being used.
  7. Prefer small column types for indexed columns
  8. For string columns, index a prefix of the string
    1. When indexing a long VARCHAR column, specify an index (prefix) length
    2. A prefix index cannot be used for index-based sorting
  9. Columns with high selectivity (many distinct values) are good index candidates
  10. In a composite index, put the most frequently used columns on the left
  11. When several columns need indexing, a composite index is usually better than separate single-column indexes
  12. Keep a single table to no more than 6 indexes
    1. Indexes take up disk space, and the more indexes there are, the more disk space is needed.
    2. Indexes slow down INSERT, DELETE, and UPDATE statements.
    3. When many indexes are available, the optimizer spends more time choosing among them.
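
Two of the principles above, high selectivity and prefix indexes, come down to simple arithmetic that can be sketched in Python. Selectivity is taken here as the number of distinct values divided by the total number of values. The helper names, the sample data, and the 0.9 target ratio are assumptions chosen for illustration, not values prescribed by MySQL.

```python
def selectivity(values):
    """Distinct values divided by total values; 1.0 means every value is unique."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


def shortest_prefix(values, target=0.9, max_len=20):
    """Smallest prefix length whose selectivity reaches `target` times the full column's."""
    values = list(values)
    full = selectivity(values)
    for n in range(1, max_len + 1):
        if selectivity(v[:n] for v in values) >= target * full:
            return n
    return max_len


emails = ["alice@example.com", "alina@example.com", "bob@example.com",
          "bella@example.com", "carol@example.com"]
genders = ["f", "f", "m", "f", "f"]
```

On this sample, the gender column has a selectivity of only 0.4, which makes it a poor index candidate, while the email column is fully unique and a prefix of 4 characters already distinguishes every value.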

2. Which situations are not suitable for indexing

  1. Columns that are never used in WHERE do not need an index
  2. Tables with little data do not benefit from indexes
  3. Do not index columns with many duplicate values
  4. Avoid creating too many indexes on tables or columns that are updated frequently
  5. Avoid indexing unordered values (such as ID card numbers)
  6. Drop indexes that are unused or rarely used
  7. Do not define redundant or duplicate indexes, for example a single-column index on a column that is already the leftmost column of a composite index
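
The last point can be checked mechanically. The sketch below flags an index as redundant when its column list is a leftmost prefix of another index, or an exact duplicate of one. It is a simplification with a hypothetical function name and made-up index names: it ignores index types and uniqueness constraints, which can make a shorter index necessary in practice.

```python
def redundant_indexes(indexes):
    """Return names of indexes whose columns are a leftmost prefix of another index.

    indexes: dict mapping index name -> tuple of column names, in index order
    """
    redundant = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            if name == other:
                continue
            is_prefix = other_cols[:len(cols)] == cols
            # For exact duplicates, report only the lexicographically larger name.
            if is_prefix and (len(cols) < len(other_cols) or name > other):
                redundant.append(name)
                break
    return redundant


indexes = {
    "idx_name": ("name",),
    "idx_name_age": ("name", "age"),
    "idx_age": ("age",),
    "idx_name_age_copy": ("name", "age"),
}
```

Here idx_name is covered by idx_name_age, and idx_name_age_copy duplicates it, while idx_age is kept because age is not the leftmost column of any other index.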


Origin blog.csdn.net/iuu77/article/details/128975599