[MySQL Study Notes (6)] Detailed explanation of the index scheme in InnoDB and MyISAM

This article is published by the official account [Developing Pigeon]. Welcome to follow!



I. Index

(1) Overview

       Without an index, a search has to traverse every page and then every record in each page until it hits the target record, which is very inefficient. An index exists to make that search faster.


(2) Index scheme in InnoDB

1. Index Rules

       Records are stored across pages in no particular order, so a search has no clue about where to start. To find a record we first have to find the page it is on, so what should we use to locate that page? The primary key is a good choice. Imitating the page directory inside a page, we require that the primary key values of the user records in each page be greater than those of the user records in the previous page, so that the primary key determines the range covered by each page. To maintain this property, some records have to be moved between pages as records are inserted, deleted, and modified. Then we build a directory over all the pages: each page corresponds to one directory entry, and each directory entry has two parts, the smallest primary key value among the page's user records (key) and the page number (page_no).

       When looking up a record, we use its primary key value and the directory to determine which page it is on, then use the page directory inside that page to find the slot for the record's group, and finally traverse the records in that group. This is far more efficient than scanning every page.
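
       The lookup through such a directory can be sketched as follows. This is only an illustration, not InnoDB's actual code: the names directory, pages, find_page and find_record, as well as all keys and page numbers, are made up, and the scan inside the page stands in for the slot search.

```python
from bisect import bisect_right

# Hypothetical directory: one (smallest_primary_key, page_no) entry per page,
# kept sorted by key. All numbers are made up for illustration.
directory = [(1, 10), (5, 28), (12, 9), (20, 20)]

# Hypothetical page contents: page_no -> records sorted by primary key.
pages = {
    10: [(1, "a"), (3, "b"), (4, "c")],
    28: [(5, "d"), (8, "e"), (10, "f")],
    9: [(12, "g"), (15, "h")],
    20: [(20, "i"), (30, "j")],
}


def find_page(key):
    """Return the page_no whose key range covers `key`, or None."""
    min_keys = [min_key for min_key, _ in directory]
    # Index of the last directory entry whose smallest key is <= key.
    i = bisect_right(min_keys, key) - 1
    if i < 0:
        return None  # key is smaller than every key in the table
    return directory[i][1]


def find_record(key):
    """Locate the page through the directory, then search inside the page."""
    page_no = find_page(key)
    if page_no is None:
        return None
    # A real page would narrow this down with its slots; here we just scan.
    for primary_key, value in pages[page_no]:
        if primary_key == key:
            return value
    return None
```

       For example, find_record(15) goes to page 9 through the entry (12, 9) and never touches the other three pages.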

2. Index scheme in InnoDB

       InnoDB uses pages as the basic unit of storage space management, which means it guarantees at most 16KB of contiguous storage space. As the table grows, so does the number of pages, and sooner or later the directory entries for all of them no longer fit into 16KB of contiguous space. And if a page is deleted, its directory entry must either be removed from the directory or left behind as a redundant entry, which wastes storage space. These issues need to be resolved.

       InnoDB solves this by reusing the index pages that store user records to store directory entries as well. To tell the two apart, the records that represent directory entries are called directory entry records. Their record_type attribute in the record header is 1, while ordinary user records have 0. A directory entry record has only two columns: the primary key value and the page number.

       So the question arises again: how do we locate the pages that store directory entry records?

       These pages may not be adjacent in storage, and there may be many of them. How do we quickly locate the right page of directory entry records from a primary key value? The answer is to generate a higher-level directory for the pages that store directory entry records, just like a multi-level directory. The resulting structure looks like a tree and is called a B+ tree. User records are stored in the leaf nodes, and the non-leaf nodes store directory entry records. The leaf nodes are at level 0, and the level number increases towards the root. In general a B+ tree does not exceed 4 levels.
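
       A rough calculation shows why 4 levels are usually enough. The figures below are assumptions chosen purely for illustration (100 user records per leaf page, 1000 directory entry records per directory page); real capacities depend on record size. The function name max_records is hypothetical.

```python
def max_records(levels, records_per_leaf=100, entries_per_directory_page=1000):
    """Upper bound on user records in a B+ tree with the given number of levels.

    Both capacities are illustrative assumptions, not InnoDB constants.
    """
    if levels < 1:
        raise ValueError("a B+ tree has at least one level")
    # One level of leaves, and (levels - 1) levels of directory pages above it.
    return records_per_leaf * entries_per_directory_page ** (levels - 1)


for levels in range(1, 5):
    print(levels, max_records(levels))
```

       Under these assumptions every additional level multiplies the capacity by 1000, so the number of pages read per lookup grows very slowly compared with the table size.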


3. Clustered Index

       A clustered index is a B+ tree that meets the following conditions:

       (1) The records within a page are arranged in a singly linked list in primary key order. They are divided into several groups, and the offset of the record with the largest primary key value in each group is stored as a slot in the page directory.

       (2) The pages that store user records are arranged in a doubly linked list in primary key order.

       (3) The pages that store directory entry records are divided into levels, and the pages within one level are also arranged in a doubly linked list, ordered by the primary keys of the directory entry records they contain.

       (4) The leaf nodes of the B+ tree store complete user records, including hidden columns.

       A clustered index does not have to be created explicitly with an INDEX clause in a MySQL statement. In InnoDB it is simply the way the data is stored.
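
       Condition (1) above, the search inside a single page, can be sketched like this. It is a simplification under stated assumptions: a Python list stands in for the singly linked list, list positions stand in for record offsets, and the grouping, the keys and the name find_in_page are all made up.

```python
from bisect import bisect_left

# Primary keys of the records in one page, in primary key order
# (a stand-in for the singly linked list).
records = [2, 4, 7, 9, 13, 16, 21, 25, 30]

# Hypothetical page directory: each slot holds the position of the record
# with the largest key in its group.
# Groups here are [2, 4, 7], [9, 13, 16] and [21, 25, 30].
slots = [2, 5, 8]


def find_in_page(key):
    """Return the position of `key` in the page, or None if it is absent."""
    slot_max_keys = [records[pos] for pos in slots]
    # Binary search for the first group whose largest key is >= key.
    s = bisect_left(slot_max_keys, key)
    if s == len(slots):
        return None  # key is larger than every key in the page
    # The group starts right after the last record of the previous group.
    start = 0 if s == 0 else slots[s - 1] + 1
    for pos in range(start, slots[s] + 1):
        if records[pos] == key:
            return pos
    return None
```

       Only the slots are searched by bisection; the walk through the records is limited to one small group.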


4. Secondary Index

       A clustered index only helps when the search condition is on the primary key. What about searches on other columns?

       We can build more B+ trees, each sorting its data by a different rule. How does such a B+ tree differ from the clustered index?

       (1) The leaf nodes store not the complete user record but only the index column + primary key. After finding a matching index record, we use its primary key to look up the complete user record in the clustered index. This operation is called returning to the table. We then go back to the leaf node of this B+ tree and continue searching along the singly linked list for further matches. The advantage of this scheme is that it saves space. The extra lookup back to the table is also why such an index is called a secondary index.

       (2) A directory entry record consists of index column + primary key + page number. Without the primary key, when the index column contains many identical values, the directory entries for different pages could carry the same index column value. When a new record with that value is inserted, it would then be impossible to decide which page it belongs in, so the primary key is added to make every directory entry record unique.

       When we declare a column or combination of columns as UNIQUE, a secondary index is created for that column or combination. Even with the UNIQUE attribute, several index records may still carry the same key value, for example when the values are NULL, or because of records kept around to serve MVCC.
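
       The lookup through a secondary index followed by a return to the table can be sketched as below. This is a toy model, not InnoDB's implementation: a dict stands in for the clustered index, a sorted list of (c2, primary key) pairs stands in for the leaf level of the secondary index, and the table contents and the name find_by_c2 are invented.

```python
from bisect import bisect_left

# Stand-in for the clustered index: primary key (c1) -> complete user record.
clustered = {
    1: {"c1": 1, "c2": 4, "c3": "u"},
    3: {"c1": 3, "c2": 9, "c3": "d"},
    4: {"c1": 4, "c2": 4, "c3": "a"},
    5: {"c1": 5, "c2": 3, "c3": "y"},
    10: {"c1": 10, "c2": 4, "c3": "o"},
}

# Stand-in for the leaf level of a secondary index on c2:
# (index column, primary key) pairs, sorted by c2 and then by primary key.
secondary = sorted((row["c2"], pk) for pk, row in clustered.items())


def find_by_c2(value):
    """Return all complete records whose c2 equals `value`."""
    rows = []
    # (value,) sorts before every (value, pk), so this finds the first match.
    i = bisect_left(secondary, (value,))
    # Walk along the sorted entries while the index column still matches.
    while i < len(secondary) and secondary[i][0] == value:
        primary_key = secondary[i][1]
        rows.append(clustered[primary_key])  # "returning to the table"
        i += 1
    return rows
```

       Each matching index entry costs one extra lookup in the clustered index, which is the price paid for not storing complete records in the secondary index.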

5. Joint Index

       Several columns can be used together as the sorting rule, that is, an index can be built on multiple columns at once. For example, a B+ tree sorted by the c2 and c3 columns means:

       (1) First sort the records and pages according to the c2 column
       (2) When the c2 column of the record is the same, then use the c3 column to sort

       Each directory entry record is composed of the c2 column, the c3 column, and the page number. The user records in the leaf nodes are composed of the c2 column, the c3 column, and the primary key.
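
       The sorting rule of a joint index on (c2, c3) behaves like tuple ordering, which the following sketch illustrates. The rows are invented, c1 is assumed to be the primary key, and joint_index is a hypothetical name for the leaf-level entries.

```python
# Invented table rows; c1 is assumed to be the primary key.
rows = [
    {"c1": 1, "c2": 4, "c3": "u"},
    {"c1": 3, "c2": 9, "c3": "d"},
    {"c1": 5, "c2": 3, "c3": "y"},
    {"c1": 7, "c2": 4, "c3": "o"},
    {"c1": 10, "c2": 4, "c3": "a"},
]

# Leaf-level entries of a joint index on (c2, c3): c2, c3 and the primary key.
# Tuples compare by c2 first and fall back to c3 only when c2 is equal.
joint_index = sorted((r["c2"], r["c3"], r["c1"]) for r in rows)

for entry in joint_index:
    print(entry)
```

       Within the entries whose c2 is 4, the c3 values appear in order ("a", "o", "u"), while across the whole index c3 alone is not sorted.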


(3) Points to note

1. The root page will not move

       Creating a B+ tree index for a table creates a root node page for that index. While the table is empty, the root page contains neither user records nor directory entry records. User records are then inserted into the root page. When the free space in the root node is used up, all records in the root node are copied to a newly allocated page, and that new page is split to obtain another new page. The root node is thereby upgraded to a page that stores directory entry records, and the directory entry records for the two new pages are inserted into it.

       The root node of a B+ tree index never moves, that is, its page number never changes, and that page number is recorded in a fixed place. Whenever InnoDB needs to use the index, it reads the root node's page number from that fixed place and accesses the index from there.
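
       The first split of the root can be sketched as follows. This is a heavily simplified model under stated assumptions: PAGE_CAPACITY, the Page class, the page numbering and the function names are all hypothetical, the new key is merged in before the split rather than placed afterwards, and the sketch stops after the first root split.

```python
from bisect import insort

PAGE_CAPACITY = 4  # hypothetical number of records that fit in one page


class Page:
    def __init__(self, page_no):
        self.page_no = page_no
        self.is_leaf = True
        # Leaf page: sorted primary keys.
        # Directory page: (smallest key, page_no) directory entry records.
        self.records = []


pages = {}


def allocate_page():
    page = Page(len(pages) + 1)  # made-up page numbering
    pages[page.page_no] = page
    return page


root = allocate_page()


def insert(key):
    """Insert into the root while it is a leaf; split it once it is full."""
    if not root.is_leaf:
        raise NotImplementedError("sketch stops after the first root split")
    if len(root.records) < PAGE_CAPACITY:
        insort(root.records, key)
        return
    # Root is full: copy its records (plus the new key) to a new page,
    # then split that page to obtain a second new page.
    page_a = allocate_page()
    page_a.records = sorted(root.records + [key])
    page_b = allocate_page()
    mid = len(page_a.records) // 2
    page_a.records, page_b.records = page_a.records[:mid], page_a.records[mid:]
    # The root keeps its page number and now holds directory entry records.
    root.is_leaf = False
    root.records = [
        (page_a.records[0], page_a.page_no),
        (page_b.records[0], page_b.page_no),
    ]


for k in (1, 3, 5, 7, 9):
    insert(k)
```

       After the fifth insert the root is still page 1, but it now contains two directory entry records pointing at pages 2 and 3, which hold the user records.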


2. At least two records on a page

       If a large directory contained only one subdirectory, which again contained only one subdirectory, the directory hierarchy would become very deep. InnoDB therefore stipulates that a page holds at least two records.


(4) Indexing scheme in MyISAM

       MyISAM stores indexes and data separately. The records of a table are stored in one file, called the data file, in the order in which they were inserted. They are not sorted by primary key, so binary search cannot be used on them. The data file is not divided into data pages, and a record is reached directly by its row number.

       The index information is stored separately in the index file, and a separate index is created for the table's primary key. The leaf nodes of this index store not the complete user record but the primary key + row number: the index yields the row number, and the row number is then used to find the user record. In this sense, all indexes in MyISAM are secondary indexes.

       MyISAM records can use a fixed-length, variable-length, or compressed row format. With the fixed-length format, the address offset of a record in the data file can be calculated from its row number. With the variable-length format this does not work, so the index leaf nodes have to store the record's address offset in the data file directly. In this respect the MyISAM lookup is more efficient, because it uses the address offset to fetch the record from the file directly, whereas InnoDB has to search the clustered index by primary key.
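
       For the fixed-length case, the offset calculation can be sketched as below. This is not MyISAM's real file format: the record layout (a 4-byte integer plus an 8-byte name), the in-memory bytes object standing in for the data file, the dict standing in for the index, and the name fetch are all assumptions made for illustration.

```python
import struct

# Hypothetical fixed-length row: 4-byte int primary key + 8-byte name.
RECORD = struct.Struct("<i8s")

# Stand-in for the data file: rows stored in insertion order, not key order.
inserted = [(20, b"carol"), (5, b"alice"), (12, b"bob")]
data_file = b"".join(RECORD.pack(pk, name) for pk, name in inserted)

# Stand-in for the primary key index: primary key -> row number.
index = {pk: row_no for row_no, (pk, _) in enumerate(inserted)}


def fetch(primary_key):
    """Find the row number in the index, then compute the file offset."""
    row_no = index[primary_key]
    offset = row_no * RECORD.size  # fixed length makes this a multiplication
    pk, name = RECORD.unpack_from(data_file, offset)
    return pk, name.rstrip(b"\x00").decode()
```

       With variable-length rows the multiplication no longer works, which is why the index then has to store the offset itself instead of a row number.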


(5) Create and delete indexes in MySQL

       InnoDB and MyISAM automatically create an index for a primary key or UNIQUE column. Indexes on other columns have to be specified explicitly.

1. Create an index

CREATE TABLE table_name (
	(KEY | INDEX) index_name (one_or_more_columns)
);
ALTER TABLE table_name ADD (KEY | INDEX) index_name (one_or_more_columns);

2. Delete the index

ALTER TABLE table_name DROP (KEY | INDEX) index_name;

3. Joint Index

CREATE TABLE table_name (
	(KEY | INDEX) index_name (multiple_columns)
);

       The name of a joint index should preferably start with idx_, followed by the names of the columns it covers.


Origin blog.csdn.net/Mrwxxxx/article/details/113804497