MySQL index principles

1. Basic knowledge of indexing

What is an index?

An index is a separately stored data structure created to speed up the retrieval of rows in a table. With an index, you can quickly locate specific rows without scanning every record in the table. In essence, an index is a data structure.

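An everyday example of an index: a dictionary's index, which lets you jump straight to the right page instead of reading the whole book.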
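To make the analogy concrete, here is a minimal Python sketch of finding a row with and without an index. The table (`rows`) and the index (`index`, a separate sorted list of key/position pairs searched with binary search) are hypothetical stand-ins; real MySQL indexes are B+ trees, as the rest of this article explains.

```python
from bisect import bisect_left

# Hypothetical table: rows stored in insertion order.
rows = [
    {"id": 7, "name": "Carol"},
    {"id": 2, "name": "Alice"},
    {"id": 9, "name": "Dave"},
    {"id": 4, "name": "Bob"},
]

def full_scan(rows, key):
    """Check every row until the key is found; returns (row, rows_examined)."""
    for examined, row in enumerate(rows, start=1):
        if row["id"] == key:
            return row, examined
    return None, len(rows)

# The "index": a separately stored, sorted list of (key, position in rows).
index = sorted((row["id"], pos) for pos, row in enumerate(rows))
index_keys = [k for k, _ in index]

def index_lookup(key):
    """Binary-search the sorted index, then jump straight to the row."""
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return rows[index[i][1]]
    return None

print(full_scan(rows, 4))   # ({'id': 4, 'name': 'Bob'}, 4)
print(index_lookup(4))      # {'id': 4, 'name': 'Bob'}
```

The full scan had to examine all four rows; the index lookup touched only about log2(n) index entries before jumping to the row.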

Advantages of indexes:

  • It improves data retrieval efficiency and reduces the database's IO cost
  • Because an index keeps its columns in sorted order, queries that sort by those columns can skip a separate sort step, which reduces CPU consumption

Disadvantages of indexes:

  • Indexes take up additional disk space
  • Although indexes speed up queries, they slow down table updates (INSERT, UPDATE, DELETE), because every index must be maintained along with the data
  • On very small tables, using an index is not necessarily faster than a full table scan
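Both the ordering benefit and the write cost can be seen in a toy model. This is a hypothetical sketch, not how MySQL stores indexes: `Table` keeps each index as a separate sorted Python list, so the indexes take extra space, every insert has to update each of them, and reading rows in index order needs no sort step.

```python
from bisect import insort

class Table:
    """Toy table: rows in insertion order plus one sorted list per indexed column.

    Hypothetical illustration only; InnoDB keeps each index as a B+ tree.
    """

    def __init__(self, indexed_columns):
        self.rows = []
        # column name -> sorted list of (value, row position)
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # how many index entries had to be maintained

    def insert(self, row):
        pos = len(self.rows)
        self.rows.append(row)
        # Every index must be updated on every insert: this is the write cost.
        for col, entries in self.indexes.items():
            insort(entries, (row[col], pos))
            self.index_writes += 1

    def ordered_by(self, col):
        """Return rows in column order by walking the index: no sort step needed."""
        return [self.rows[pos] for _, pos in self.indexes[col]]

t = Table(indexed_columns=["name", "age"])
t.insert({"id": 1, "name": "Dave", "age": 31})
t.insert({"id": 2, "name": "Alice", "age": 25})
t.insert({"id": 3, "name": "Carol", "age": 40})

print([r["name"] for r in t.ordered_by("name")])  # ['Alice', 'Carol', 'Dave']
print(t.index_writes)  # 6: three rows x two indexes
```

Adding a third indexed column would make each insert touch three lists instead of two, which is the trade-off described in the list above.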

2. Choosing a data structure for MySQL indexes

When it comes to MySQL indexes, most people think of B+ tree indexes and hash indexes. Different MySQL storage engines support different index types, and InnoDB, the engine we use most often, uses B+ tree indexes. First, let's introduce storage engines and the indexes each of them supports.

What is a storage engine?

The storage engine is the underlying software component of a database: the database management system uses it to create, query, update, and delete data. Different storage engines provide different storage mechanisms, indexing techniques, locking levels, and other features, so the engine you choose determines which of these capabilities you get. On the MySQL command line, the SHOW ENGINES statement lists the storage engines your server supports, as shown in the figure below:

 

Why does InnoDB use a B+ tree for its indexes?

To answer this clearly, we start with the simplest structure, the binary search tree, analyze its characteristics, and improve it step by step until we arrive at the B+ tree.

The binary search tree has a simple structure, and its disadvantages are just as obvious:

  • It may degenerate into a linked list (for example, when keys are inserted in ascending order);
  • The tree can become too tall.
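A minimal sketch of the degeneration problem, assuming a plain unbalanced binary search tree (the `Node`, `insert`, `height`, and `build` helpers are illustrative, not from MySQL): inserting ids in ascending order, as an auto-increment key would, produces a tree whose height equals the number of keys, while the same 15 keys inserted in a favorable order fit in 4 levels.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain (unbalanced) binary search tree insert, written iteratively."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right

def height(root):
    """Number of levels, computed level by level to avoid deep recursion."""
    h = 0
    level = [root] if root else []
    while level:
        h += 1
        level = [c for n in level for c in (n.left, n.right) if c]
    return h

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

print(height(build(range(1, 16))))  # 15: ascending ids form a linked list
print(height(build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15])))  # 4
```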

Note: each disk block (tree node) stores keys (think of them as the table's id values), the associated data, and references to its child nodes.

Improving on the binary search tree gives us the balanced binary tree. If a balanced binary tree is used to organize the data, it solves the problem of degenerating into a linked list, but the tree is still too tall. A tall tree means many IO operations per lookup, which hurts read performance. In addition, the operating system reads from disk in pages (typically 4 KB), yet each node of a binary tree holds only one key, so every IO retrieves very little useful data. This fails to take advantage of how the operating system interacts with the disk and wastes IO. In summary, the balanced binary tree has two shortcomings:

  • The tree is too tall, so a lookup needs too many IO operations, which slows down reads;
  • It does not make good use of how the operating system reads from disk (page by page), which wastes IO.
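A rough back-of-the-envelope sketch of why height matters, assuming each level visited costs one disk read: a balanced tree whose nodes have at most `fanout` children (and so at most fanout - 1 keys each) can hold at most fanout**h - 1 keys in h levels. The fanout of 1000 below is a hypothetical round number for a node that fills a whole page, not a measured InnoDB value.

```python
def min_levels(n_keys: int, fanout: int) -> int:
    """Smallest number of levels a balanced search tree needs to hold n_keys.

    Assumes every node has at most `fanout` children and fanout - 1 keys,
    so h full levels hold at most fanout**h - 1 keys.
    """
    levels = 0
    while fanout ** levels - 1 < n_keys:
        levels += 1
    return levels

rows = 10_000_000
print(min_levels(rows, 2))     # 24 levels for a balanced binary tree
print(min_levels(rows, 1000))  # 3 levels if one page holds ~1000 child pointers
```

Under these assumptions, packing many keys into each page cuts a lookup from about 24 disk reads to about 3, which is exactly the gap the next structure closes.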

Building on the balanced binary tree, the next step is the multi-way balanced search tree, that is, the B-tree. Because each node holds many keys and has many children, the B-tree solves both of the previous problems well and looks like a much more suitable data structure (Oracle, for example, offers B-tree indexes). In a B-tree, however, every node stores data along with its keys, so a search can end at any level. If the target is found at the first level, the query is fast; if it sits deeper in the tree, the query is slower. Query time is therefore less predictable: sometimes fast, sometimes slow. This is neither strictly an advantage nor a disadvantage; it is simply a characteristic of the B-tree. InnoDB goes one step further and uses the B+ tree, an enhanced version of the B-tree.
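The uneven query time can be illustrated with a tiny hand-built B-tree. This is a simplified sketch (the `BTreeNode` class and `btree_search` function are hypothetical, with no insertion or node-splitting logic): because every key carries its data, a search returns as soon as it meets the key, so different keys are found after visiting different numbers of levels.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class BTreeNode:
    keys: list                                    # sorted keys in this node
    data: list                                    # B-tree: each key carries its row data
    children: list = field(default_factory=list)  # empty list for a leaf

def btree_search(node, key):
    """Return (data, levels_visited); the search may stop at any level."""
    levels = 0
    while node is not None:
        levels += 1
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.data[i], levels   # hit: stop without going deeper
        if not node.children:
            return None, levels           # reached a leaf without a hit
        node = node.children[i]
    return None, levels

# Hand-built two-level tree; data values are placeholder strings.
root = BTreeNode(
    keys=[20, 40], data=["row20", "row40"],
    children=[
        BTreeNode(keys=[5, 10], data=["row5", "row10"]),
        BTreeNode(keys=[25, 30], data=["row25", "row30"]),
        BTreeNode(keys=[45, 50], data=["row45", "row50"]),
    ],
)

print(btree_search(root, 20))  # ('row20', 1): found in the root
print(btree_search(root, 30))  # ('row30', 2): found one level deeper
```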

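Compared with the B-tree, the B+ tree has these characteristics:

  • Fewer levels and higher IO efficiency: internal nodes store only keys, so each page holds more keys and the tree stays shallower
  • Data is stored only in the leaf nodes
  • Leaf nodes are naturally ordered by key and linked together as a doubly linked list
  • Stronger scanning ability: a range scan only needs to walk along the linked leaves
  • Stable query speed: every lookup travels from the root down to a leaf, so every query visits the same number of levels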
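The B+ tree characteristics listed above can be sketched in the same way. Assumptions: the tree is hand-built with no insert or split logic, a leaf holds a few keys instead of a page's worth, and only a forward `next` pointer is kept, whereas InnoDB links leaf pages in both directions. Internal nodes carry separator keys only, so every lookup descends to a leaf (the same number of levels each time), and a range query descends once and then follows the leaf links. All class and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Optional

@dataclass
class Leaf:
    keys: list                     # sorted keys
    rows: list                     # row data lives only in the leaves
    next: Optional["Leaf"] = None  # link to the next leaf (forward only here)

@dataclass
class Internal:
    keys: list       # separator keys only, no row data
    children: list   # child i covers keys in [keys[i-1], keys[i])

def find_leaf(node, key):
    """Descend from the root to the leaf that may contain key."""
    levels = 1
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
        levels += 1
    return node, levels

def bplus_search(root, key):
    leaf, levels = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.rows[i], levels
    return None, levels

def range_scan(root, lo, hi):
    """Rows with lo <= key <= hi: one descent, then walk the leaf links."""
    leaf, _ = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for k, row in zip(leaf.keys, leaf.rows):
            if k > hi:
                return out
            if k >= lo:
                out.append(row)
        leaf = leaf.next
    return out

def make_leaf(keys):
    return Leaf(keys=keys, rows=[f"row{k}" for k in keys])

l1, l2, l3 = make_leaf([5, 10]), make_leaf([20, 25, 30]), make_leaf([40, 45, 50])
l1.next, l2.next = l2, l3
root = Internal(keys=[20, 40], children=[l1, l2, l3])

print(bplus_search(root, 20))    # ('row20', 2)
print(bplus_search(root, 5))     # ('row5', 2)
print(range_scan(root, 10, 40))  # ['row10', 'row20', 'row25', 'row30', 'row40']
```

Note that key 20 appears in the root only as a separator; its row is still read from the leaf, which is why every search in this tree takes exactly two levels.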

That is how InnoDB arrives at the B+ tree for its indexes. Starting the analysis from the simplest binary tree gives a deeper understanding of why the B+ tree was chosen.

3. Implementation of MySQL B+ tree indexes

If you create a table with each storage engine (on Windows here) and look at the files MySQL creates for it, you can see how each engine stores its data, as shown in the figure below

 

In the MyISAM engine, the *.MYI file stores all the index information (the index trees for the table), and the *.MYD file stores the data. The indexes on different columns all ultimately point to the address where the row is stored in the data file, as shown in the figure below; in other words, MyISAM indexes are non-clustered.
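A minimal sketch of this non-clustered layout, assuming an in-memory `io.BytesIO` as a stand-in for the .MYD data file and plain dicts as stand-ins for the index trees in the .MYI file (MyISAM actually uses tree structures there). The `insert` and `read_at` helpers are hypothetical. The point is that the primary key index and a secondary index store the same kind of entry: the byte offset of the row in the data file.

```python
import io

# Hypothetical stand-in for the .MYD file: rows appended one after another.
data_file = io.BytesIO()
# Hypothetical stand-ins for the index trees in the .MYI file.
primary_index = {}   # id   -> byte offset of the row in data_file
name_index = {}      # name -> byte offset of the same row

def insert(row_id, name):
    offset = data_file.seek(0, io.SEEK_END)   # append at the end of the data file
    data_file.write(f"{row_id},{name}\n".encode())
    # Every index, primary or not, stores the row's address, not the row itself.
    primary_index[row_id] = offset
    name_index[name] = offset

def read_at(offset):
    """Follow an index entry to the data file and read the row there."""
    data_file.seek(offset)
    return data_file.readline().decode().rstrip("\n")

insert(1, "Alice")
insert(2, "Bob")
print(read_at(primary_index[2]))              # 2,Bob
print(read_at(name_index["Bob"]))             # 2,Bob
print(primary_index[2] == name_index["Bob"])  # True: both point to offset 8
```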

In the InnoDB engine, the data is stored in the clustered index: the primary key index is the clustered index, and all other indexes are secondary (auxiliary) indexes. A search through the primary key index returns the row directly. A search through a secondary index must then go back to the primary key index to fetch the row. This extra step is called returning to the table, and it is needed because a secondary index does not store the full row, only the indexed columns plus the primary key value. This is shown in the figure below:
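A minimal sketch of the two InnoDB lookup paths, using plain dicts as hypothetical stand-ins for the B+ trees: the clustered index maps the primary key to the full row, while the secondary index on `name` maps only to the primary key. The second return value counts index lookups, so the extra "return to the table" step is visible. All names here are illustrative.

```python
# Clustered index (primary key index): the leaf entries hold the full rows.
clustered = {
    1: {"id": 1, "name": "Alice", "age": 25},
    2: {"id": 2, "name": "Bob", "age": 31},
}
# Secondary index on `name`: stores the primary key value, not the row.
name_idx = {"Alice": 1, "Bob": 2}

def by_primary_key(pk):
    """One lookup: the clustered index already contains the row."""
    return clustered.get(pk), 1

def by_name(name):
    """Secondary index -> primary key -> clustered index (return to the table)."""
    pk = name_idx.get(name)
    if pk is None:
        return None, 1
    return clustered[pk], 2

print(by_primary_key(2))  # ({'id': 2, 'name': 'Bob', 'age': 31}, 1)
print(by_name("Bob"))     # ({'id': 2, 'name': 'Bob', 'age': 31}, 2)
```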


Source: blog.csdn.net/MrChenLen/article/details/114324500