MySQL's MyISAM and InnoDB indexing methods

Reprinted from https://www.cnblogs.com/renzherushe/p/4780226.html

In MySQL, indexes are implemented at the storage engine level, so different storage engines implement them in different ways. This article discusses how the MyISAM and InnoDB storage engines implement indexes.

MyISAM Index Implementation

The MyISAM engine uses B+Tree as the index structure, and the data field of the leaf node stores the address of the data record. The following figure is a schematic diagram of the MyISAM index:

The table here has three columns. Assuming Col1 is the primary key, the figure above shows the primary index of a MyISAM table. As it shows, a MyISAM index file stores only the addresses of data records. In MyISAM, the primary index and a secondary index have the same structure; the only difference is that the primary index requires unique keys, while keys in a secondary index may repeat. If we build a secondary index on Col2, its structure is shown in the following figure:

 

This index is also a B+Tree whose data field stores the address of a data record. Index retrieval in MyISAM therefore works as follows: first search the index with the B+Tree search algorithm; if the specified key exists, take the value of its data field, then use that value as an address to read the corresponding data record.
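
The retrieval steps above can be sketched in a few lines of Python. This is only an illustration, not MyISAM's actual code: a sorted list searched with bisect stands in for the B+Tree, list positions stand in for record addresses, and the sample rows and the names data_file, build_index and lookup are all made up for the example.

```python
import bisect

# Stand-in for the data file: each row lives at an "address" (its list position).
data_file = [
    (15, 34, "Bob"),
    (18, 77, "Alice"),
    (20, 5, "Jim"),
    (30, 91, "Eric"),
]

def build_index(rows, col):
    """Return sorted (key, address) pairs, playing the role of B+Tree leaf entries."""
    return sorted((row[col], addr) for addr, row in enumerate(rows))

def lookup(index, rows, key):
    """Search the index for key, then follow the stored address into the data file."""
    # (key, -1) sorts before every real entry for key, so this finds the first match.
    pos = bisect.bisect_left(index, (key, -1))
    if pos < len(index) and index[pos][0] == key:
        address = index[pos][1]   # value of the data field
        return rows[address]      # read the record at that address
    return None

primary_index = build_index(data_file, 0)    # index on the first column
secondary_index = build_index(data_file, 1)  # index on the second column
```

Both indexes have the same shape and both resolve to an address in the data file, which is the point the text makes about MyISAM primary and secondary indexes.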

MyISAM's indexing method is called "non-clustered" to distinguish it from InnoDB's clustered index. Although "clustered index" and "non-clustered index" both contain the word "index", they are not separate index types but ways of storing data. With clustered storage, row data is stored together with the primary key B+Tree, and a secondary key B+Tree stores only the secondary key and the primary key, so the primary key tree and the secondary key trees are two quite different kinds of tree. With non-clustered storage, the leaf nodes of the primary key B+Tree store pointers to the actual data rows instead of the rows themselves.

InnoDB Index Implementation

Although InnoDB also uses B+Tree as the index structure, the specific implementation is completely different from MyISAM.

The first major difference is that an InnoDB data file is itself an index file. As described above, MyISAM keeps the index file and the data file separate, and the index file stores only the addresses of data records. In InnoDB, the table data file is itself an index structure organized as a B+Tree, and the data field of each leaf node holds a complete data record. The key of this index is the table's primary key, so the InnoDB table data file is itself the primary index.

The above figure is a schematic diagram of the InnoDB primary index (which is also the data file). The leaf nodes contain complete data records. Such an index is called a clustered index. Because InnoDB data files are clustered by the primary key, every InnoDB table must have a primary key (a MyISAM table need not). If none is specified explicitly, MySQL uses the first UNIQUE index whose columns are all NOT NULL as the clustered index. If no such index exists, MySQL generates a hidden field for the InnoDB table to serve as the primary key: a 6-byte row ID that increases monotonically as rows are inserted.
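
The fallback order for the clustered key can be written as a small decision function. This is a sketch of the rule as documented in the MySQL reference manual (explicit primary key, else the first UNIQUE index whose columns are all NOT NULL, else a hidden index that the manual calls GEN_CLUST_INDEX). The function name and the way indexes are described as tuples are assumptions made for the example.

```python
def choose_clustered_key(primary_key, unique_indexes):
    """Pick the clustered key for a table description.

    primary_key: list of column names, or None if no PRIMARY KEY is declared.
    unique_indexes: list of (index_name, [(column_name, nullable), ...]),
                    in the order the indexes were defined.
    """
    if primary_key:
        return ("PRIMARY", list(primary_key))
    for name, columns in unique_indexes:
        # Only a UNIQUE index with no nullable column qualifies.
        if all(not nullable for _, nullable in columns):
            return (name, [col for col, _ in columns])
    # No candidate: InnoDB clusters on a hidden 6-byte row ID.
    return ("GEN_CLUST_INDEX", ["<hidden 6-byte row ID>"])
```

For example, a table with no primary key, a UNIQUE index on a nullable email column, and a later UNIQUE index on a NOT NULL code column would be clustered on the code index.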

The second difference from MyISAM is that the data field of an InnoDB secondary index stores the primary key value of the corresponding record rather than its address. In other words, every secondary index in InnoDB uses the primary key as its data field. For example, the following figure shows a secondary index defined on Col3:

Here, the ASCII codes of the English characters are used as the comparison criterion. The clustered index makes lookups by primary key very efficient, but a lookup through a secondary index has to search two indexes: first the secondary index, to obtain the primary key, and then the primary index, to retrieve the record by that primary key.
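
The two-step lookup can be illustrated with a small Python sketch. It is a model of the idea, not InnoDB code: sorted lists searched with bisect stand in for the two B+Trees, and the sample rows and the names clustered, secondary_col3, find_by_pk and find_by_col3 are invented for the example. Python compares these strings by character code, which matches the ASCII ordering mentioned above.

```python
import bisect

# Clustered index: leaf entries hold (primary key, complete row), sorted by key.
clustered = [
    (15, (15, 34, "Bob")),
    (18, (18, 77, "Alice")),
    (20, (20, 5, "Jim")),
    (30, (30, 91, "Eric")),
]

# Secondary index on the third column: leaf entries hold (Col3 value, primary key).
secondary_col3 = sorted((row[2], pk) for pk, row in clustered)

def find_by_pk(pk):
    """One search: the clustered index leaf already contains the row."""
    keys = [k for k, _ in clustered]
    pos = bisect.bisect_left(keys, pk)
    if pos < len(keys) and keys[pos] == pk:
        return clustered[pos][1]
    return None

def find_by_col3(name):
    """Two searches: secondary index to get the primary key, then the clustered index."""
    keys = [k for k, _ in secondary_col3]
    pos = bisect.bisect_left(keys, name)
    if pos == len(keys) or keys[pos] != name:
        return None
    pk = secondary_col3[pos][1]   # data field of the secondary index
    return find_by_pk(pk)         # second search, in the primary index
```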

Knowing how different storage engines implement indexes helps a great deal in using and optimizing them correctly. For example, once you know how InnoDB implements indexes, it is easy to see why an overly long field is not recommended as a primary key: every secondary index stores the primary key, so a long primary key makes every secondary index larger. Likewise, a non-monotonic field is a poor choice of primary key in InnoDB. The InnoDB data file is itself a B+Tree, so inserting records with non-monotonic primary keys forces frequent page splits and adjustments to preserve the B+Tree properties, which is very inefficient. An auto-increment field is a good choice of primary key.
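
The effect of key order on page splits can be seen in a toy simulation. The model below is a deliberate simplification and not InnoDB's real page management: leaf pages hold at most 4 keys, an insert past the right edge of a full last page simply opens a new page, and any other insert into a full page splits it in half. The function name simulate_inserts and all parameters are made up for the example, and keys are assumed to be distinct.

```python
import bisect
import random

def simulate_inserts(keys, page_capacity=4):
    """Insert distinct keys into sorted leaf pages; return (splits, page_count)."""
    pages = [[]]
    splits = 0
    for key in keys:
        # Route the key to the last page whose first key is <= key.
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        if len(page) < page_capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Append at the far right: open a fresh page, no rows move.
            pages.append([key])
        else:
            # Insert into the middle of a full page: split it, moving half the rows.
            mid = len(page) // 2
            left, right = page[:mid], page[mid:]
            bisect.insort(left if key < right[0] else right, key)
            pages[idx:idx + 1] = [left, right]
            splits += 1
    return splits, len(pages)

sequential = list(range(1000))
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)

# Monotonic keys: 0 splits and 250 completely full pages in this model.
seq_splits, seq_pages = simulate_inserts(sequential)
# Shuffled keys: splits occur and pages end up only partly full.
rand_splits, rand_pages = simulate_inserts(shuffled)
```

In this model the auto-increment style workload never moves a row, while the shuffled workload keeps splitting pages and leaves them partly empty, which is the inefficiency the paragraph above describes.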

Adaptive hash index for InnoDB

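The InnoDB storage engine monitors lookups against the indexes on a table. If it observes that building a hash index would speed things up, it builds one, which is why the feature is called adaptive. The adaptive hash index is constructed from the B+Trees in the buffer pool, so it is fast to build. There is also no need to build a hash index over the whole table: InnoDB automatically builds hash indexes for certain pages according to how often and in what pattern they are accessed. The default index type of MySQL's Heap storage engine is hash.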
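
The idea of an index that builds hash entries only for frequently searched keys can be sketched as follows. This is a conceptual toy, not InnoDB's actual heuristics: the per-key hit counter, the threshold of 3, and the class name AdaptiveLookup are all assumptions chosen for the illustration, and a sorted list searched with bisect stands in for the B+Tree.

```python
import bisect

class AdaptiveLookup:
    """Sorted-list search that adds a hash shortcut for frequently searched keys."""

    def __init__(self, entries, threshold=3):
        entries = sorted(entries)
        self.keys = [k for k, _ in entries]
        self.values = [v for _, v in entries]
        self.threshold = threshold   # hypothetical "hot key" cutoff
        self.hits = {}               # successful tree searches per key
        self.hash_index = {}         # key -> position, built on demand
        self.tree_searches = 0

    def get(self, key):
        # Fast path: the key is already hot, so skip the tree search.
        if key in self.hash_index:
            return self.values[self.hash_index[key]]
        # Slow path: ordinary ordered search.
        self.tree_searches += 1
        pos = bisect.bisect_left(self.keys, key)
        if pos == len(self.keys) or self.keys[pos] != key:
            return None
        self.hits[key] = self.hits.get(key, 0) + 1
        if self.hits[key] >= self.threshold:
            self.hash_index[key] = pos   # only hot keys get a hash entry
        return self.values[pos]
```

After a key has been found three times through the ordered search, later lookups for it go through the dictionary, while rarely used keys never get a hash entry.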

 

References:

  http://www.cnblogs.com/ylqmf/archive/2011/09/16/2179166.html

  http://blog.codinglabs.org/articles/theory-of-mysql-index.html

  http://www.codeceo.com/article/mysql-innodb-index.html

