Clustered index
First, create the following table:
CREATE TABLE `student` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'student ID',
`name` varchar(10) NOT NULL COMMENT 'student name',
`age` int(11) NOT NULL COMMENT 'student age',
PRIMARY KEY (`id`),
KEY `idx_name` (`name`)
) ENGINE=InnoDB;
Then insert the following rows:
insert into student (`name`, `age`) values('a', 10);
insert into student (`name`, `age`) values('c', 12);
insert into student (`name`, `age`) values('b', 9);
insert into student (`name`, `age`) values('d', 15);
insert into student (`name`, `age`) values('h', 17);
insert into student (`name`, `age`) values('l', 13);
insert into student (`name`, `age`) values('k', 12);
insert into student (`name`, `age`) values('x', 9);
The data is as follows
MySQL's InnoDB engine stores data in pages, and the default page size is 16 KB.
You can check the page size with the following statement:
show global status like 'innodb_page_size';
The result is 16384 bytes, i.e. 16 KB.
InnoDB organizes table data by primary key: the primary key is the index that determines how rows are stored. Within a page, records are linked into a singly linked list in ascending primary-key order.
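A toy Python model makes the in-page list concrete. This is a simplified sketch, not InnoDB code: the `Record` and `Page` classes are hypothetical, and real InnoDB pages also contain infimum/supremum records and a slot directory that this model ignores.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    id: int                           # primary key
    name: str
    age: int
    next: Optional[Record] = None     # next record in ascending primary-key order


class Page:
    """Toy data page: records form a singly linked list sorted by primary key."""

    def __init__(self) -> None:
        self.head: Optional[Record] = None

    def insert(self, rec: Record) -> None:
        # New smallest key: it becomes the new head.
        if self.head is None or rec.id < self.head.id:
            rec.next = self.head
            self.head = rec
            return
        # Otherwise walk forward until the next key is larger, then splice in.
        cur = self.head
        while cur.next is not None and cur.next.id < rec.id:
            cur = cur.next
        rec.next = cur.next
        cur.next = rec

    def ids(self) -> list[int]:
        """Primary keys in list order."""
        out: list[int] = []
        cur = self.head
        while cur is not None:
            out.append(cur.id)
            cur = cur.next
        return out


page = Page()
for pk, name, age in [(3, "b", 9), (1, "a", 10), (2, "c", 12)]:
    page.insert(Record(pk, name, age))
print(page.ids())   # [1, 2, 3]
```

Whatever the insertion order, walking from the head always yields keys in ascending order.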
You may ask: what if no primary key is specified when the table is created?
If a table is created without an explicitly defined primary key, InnoDB selects or creates one as follows:
- First, it checks whether the table has a unique index whose columns are all NOT NULL. If so, that index's columns serve as the primary key. If there are several such indexes, InnoDB uses the first one defined in the table
- If no such index exists, InnoDB automatically adds a hidden 6-byte row ID column (DB_ROW_ID) and builds the clustered index on it
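The selection rule above can be written as a small function. The `IndexDef` and `TableDef` classes and `clustered_index_key` are hypothetical illustrations, not InnoDB source; `DB_ROW_ID` is the name InnoDB uses for its hidden row ID column.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndexDef:                      # hypothetical model of an index definition
    name: str
    columns: list[str]
    unique: bool


@dataclass
class TableDef:                      # hypothetical model of a table definition
    not_null: dict[str, bool]        # column name -> declared NOT NULL?
    indexes: list[IndexDef] = field(default_factory=list)  # in definition order
    primary_key: list[str] | None = None


def clustered_index_key(t: TableDef) -> list[str]:
    """Return the columns the clustered index is built on, per the rules above."""
    if t.primary_key:                               # explicit PRIMARY KEY wins
        return t.primary_key
    for idx in t.indexes:                           # first qualifying index wins
        if idx.unique and all(t.not_null[c] for c in idx.columns):
            return idx.columns
    return ["DB_ROW_ID"]                            # hidden 6-byte row ID


t = TableDef(
    not_null={"email": False, "code": True, "name": True},
    indexes=[IndexDef("uk_email", ["email"], True),   # nullable column: skipped
             IndexDef("uk_code", ["code"], True)],
)
print(clustered_index_key(t))   # ['code']
print(clustered_index_key(TableDef({"name": True}, [IndexDef("idx_name", ["name"], False)])))  # ['DB_ROW_ID']
```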
Pages are linked together in a doubly linked list, and the primary key of every user record on a data page is greater than that of every user record on the previous data page.
Assuming that a page can hold only 3 records, the storage structure looks like this:
With this layout, querying or inserting a record means starting from the very first page and traversing each page's record list in turn, which is inefficient.
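The sketch below models that layout: sorted primary keys split into pages of 3 and linked in both directions, plus a lookup that has to walk from the first page. `DataPage`, `build_pages`, and `scan_for` are hypothetical names, and page numbers simply count from 0.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataPage:
    page_no: int
    keys: list[int] = field(default_factory=list)  # sorted primary keys on this page
    # Links to neighbouring pages (excluded from repr/eq to avoid walking the chain).
    prev: Optional[DataPage] = field(default=None, repr=False, compare=False)
    next: Optional[DataPage] = field(default=None, repr=False, compare=False)


def build_pages(keys: list[int], capacity: int = 3) -> DataPage:
    """Split sorted keys into pages of `capacity` records and link them both ways."""
    head: Optional[DataPage] = None
    prev: Optional[DataPage] = None
    for i in range(0, len(keys), capacity):
        page = DataPage(page_no=i // capacity, keys=keys[i:i + capacity], prev=prev)
        if prev is None:
            head = page
        else:
            prev.next = page
        prev = page
    if head is None:
        raise ValueError("no keys to store")
    return head


def scan_for(head: DataPage, key: int) -> tuple[Optional[int], int]:
    """Walk pages from the first one; return (page number or None, pages visited)."""
    visited = 0
    page: Optional[DataPage] = head
    while page is not None:
        visited += 1
        if key in page.keys:        # stands in for walking the page's record list
            return page.page_no, visited
        page = page.next
    return None, visited


head = build_pages(list(range(1, 9)))   # student ids 1..8, 3 per page
print(scan_for(head, 8))                # (2, 3)
```

The cost grows linearly with the number of pages: the last key is only found after visiting every page.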
We could build a directory for these pages that stores, for each page, its smallest primary key and its page number, and then locate the page holding a record with binary search. The prerequisite is that the mapping is stored in contiguous space, such as an array. Doing so causes the following problems:
- As the data grows, the directory needs more and more contiguous space, which is impractical
- When all the records on a page are deleted, the corresponding directory entry must be deleted too, and all following entries must be shifted forward, which is too costly
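A flat, array-based directory can be sketched with Python's `bisect` module. The two lists are hypothetical: each entry pairs a page's smallest primary key with its page number, and `drop_page` reports how many entries must shift forward when one is removed, which is the cost described in the second bullet.

```python
import bisect

# Hypothetical flat directory stored in contiguous, sorted lists:
# entry i = (smallest primary key on the page, page number).
directory_keys = [1, 4, 7]
directory_pages = [0, 1, 2]


def find_page(key: int) -> int:
    """Binary search: the last entry whose min key <= key owns the record."""
    i = bisect.bisect_right(directory_keys, key) - 1
    if i < 0:
        raise KeyError(key)
    return directory_pages[i]


def drop_page(page_no: int) -> int:
    """Delete a page's entry; return how many later entries had to shift forward."""
    i = directory_pages.index(page_no)
    shifted = len(directory_pages) - i - 1
    del directory_keys[i]        # list deletion moves every later element
    del directory_pages[i]
    return shifted


print(find_page(5))   # 1
print(drop_page(0))   # 2
```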
Instead, we can store directory entries in the same kind of structure as user records, as shown below. Each directory entry has 2 columns: a primary key (the smallest primary key on the target page) and a page number.
When there is a lot of data, there are many directory entries, and since a page is only 16 KB, they span several directory pages. We can then build directory entries on top of those directory pages, as shown in the figure below.
The picture comes from "How MySQL Works: Understanding MySQL from the Root"
This is in fact a B+ tree. It is also a clustered index, meaning the data and the index are stored together: the leaf nodes store complete rows, with all column values.
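To show how a lookup descends such a tree, here is a two-level directory over the student rows, 3 rows per leaf page. The page numbers and the `PAGES` and `lookup` names are made up for illustration; each internal entry holds the smallest primary key of a child page, and only leaf pages hold full rows.

```python
from __future__ import annotations

import bisect

# Hypothetical page numbers. Internal pages hold (smallest primary key on child, child
# page number); leaf pages hold complete rows (id, name, age) sorted by primary key.
PAGES: dict[int, tuple[str, list[tuple]]] = {
    1: ("internal", [(1, 2), (7, 3)]),            # root: a directory of directory pages
    2: ("internal", [(1, 10), (4, 11)]),
    3: ("internal", [(7, 12)]),
    10: ("leaf", [(1, "a", 10), (2, "c", 12), (3, "b", 9)]),
    11: ("leaf", [(4, "d", 15), (5, "h", 17), (6, "l", 13)]),
    12: ("leaf", [(7, "k", 12), (8, "x", 9)]),
}
ROOT = 1


def lookup(pk: int) -> tuple[tuple | None, int]:
    """Descend from the root to a leaf; return (row or None, pages read)."""
    page_no, reads = ROOT, 0
    while True:
        kind, entries = PAGES[page_no]
        reads += 1
        keys = [entry[0] for entry in entries]
        if kind == "leaf":
            i = bisect.bisect_left(keys, pk)
            row = entries[i] if i < len(keys) and keys[i] == pk else None
            return row, reads
        i = bisect.bisect_right(keys, pk) - 1     # last child whose min key <= pk
        if i < 0:
            return None, reads                    # smaller than every key in the tree
        page_no = entries[i][1]


print(lookup(5))   # ((5, 'h', 17), 3)
```

Every lookup reads one page per level, so the cost depends on the tree height rather than on the number of pages.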
Take an index on an integer field in InnoDB as an example: its N (the fanout, i.e. the number of children per node) is roughly 1200. When the tree height is 4, it can store 1200 to the 3rd power values, which is already about 1.7 billion. Since the data block at the root of the tree is always in memory, looking up a value in an integer-field index on a 1-billion-row table takes at most 3 disk accesses. In fact, the second level of the tree is also very likely to be in memory, so the average number of disk accesses is even lower. (Quoted from "45 Lectures on MySQL Actual Combat")
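The arithmetic in the quote can be checked directly. The 14-byte entry size (an 8-byte key plus a 6-byte page pointer, ignoring record headers) is a common back-of-envelope assumption, not something stated in the quote, and `capacity` follows the quote's own rough rule of N to the power (height - 1).

```python
PAGE_SIZE = 16 * 1024   # default InnoDB page size in bytes (16 KB)
ENTRY_BYTES = 8 + 6     # assumption: 8-byte key + 6-byte child page pointer


def capacity(fanout: int, height: int) -> int:
    """Rough count of values a tree can hold, per the quote: fanout ** (height - 1)."""
    return fanout ** (height - 1)


def disk_reads(height: int, cached_levels: int = 1) -> int:
    """Pages read from disk for one lookup when the top levels stay in memory."""
    return height - cached_levels


print(PAGE_SIZE // ENTRY_BYTES)  # 1170, close to the quote's N of about 1200
print(capacity(1200, 4))         # 1728000000, about 1.7 billion
print(disk_reads(4))             # 3 (root cached)
print(disk_reads(4, 2))          # 2 (root and second level cached)
```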
Nonclustered index
The leaf nodes of a non-clustered index store the indexed column values plus the primary key.
When we query the full information (student ID, name, age) of the student whose name is h, since name is indexed, InnoDB first finds the matching primary key id in the non-clustered index idx_name, and then uses that id to find the corresponding record in the clustered index.
This process, finding the primary key value in the non-clustered index and then fetching the corresponding record from the clustered index, is called going back to the table.
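The two-step lookup can be sketched with a dict standing in for the clustered index and a sorted list standing in for the leaf entries of idx_name. Both structures and `find_by_name` are simplifications for illustration; ids 1 to 8 assume AUTO_INCREMENT started at 1 on an empty table.

```python
import bisect

# Clustered index stand-in: primary key -> full row (id, name, age).
clustered = {1: (1, "a", 10), 2: (2, "c", 12), 3: (3, "b", 9), 4: (4, "d", 15),
             5: (5, "h", 17), 6: (6, "l", 13), 7: (7, "k", 12), 8: (8, "x", 9)}

# idx_name leaf entries: only (name, primary key), sorted by name.
idx_name = sorted((row[1], pk) for pk, row in clustered.items())


def find_by_name(name: str) -> list[tuple[int, str, int]]:
    """Find matching ids in idx_name, then go back to the table for each full row."""
    result = []
    i = bisect.bisect_left(idx_name, (name,))   # first entry with this name
    while i < len(idx_name) and idx_name[i][0] == name:
        pk = idx_name[i][1]
        result.append(clustered[pk])            # the "back to the table" step
        i += 1
    return result


print(find_by_name("h"))   # [(5, 'h', 17)]
```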
Joint index / covering index
Suppose the teacher table is defined as follows, with a joint index on the name and age columns:
CREATE TABLE `teacher` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'teacher ID',
`name` varchar(10) NOT NULL COMMENT 'teacher name',
`age` int(11) NOT NULL COMMENT 'teacher age',
`ismale` tinyint(3) NOT NULL COMMENT 'is male',
PRIMARY KEY (`id`),
KEY `idx_name_age` (`name`, `age`)
) ENGINE=InnoDB;
Then insert the following rows:
insert into teacher (`name`, `age`, `ismale`) values('aa', 10, 1);
insert into teacher (`name`, `age`, `ismale`) values('dd', 12, 0);
insert into teacher (`name`, `age`, `ismale`) values('cb', 9, 1);
insert into teacher (`name`, `age`, `ismale`) values('cb', 15, 1);
insert into teacher (`name`, `age`, `ismale`) values('bc', 17, 0);
insert into teacher (`name`, `age`, `ismale`) values('bb', 15, 1);
insert into teacher (`name`, `age`, `ismale`) values('dd', 15, 1);
insert into teacher (`name`, `age`, `ismale`) values('dd', 12, 0);
The joint index idx_name_age on (name, age) is organized as follows.
Each directory page entry consists of three parts: the name column, the age column, and a page number. Entries are sorted by name first, and by age only when names are equal.
Each data page record also consists of three parts: the name column, the age column, and the primary key value. Likewise, data pages are sorted by name first, and by age when names are equal.
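Python's tuple ordering compares element by element, so sorting (name, age, primary key) triples reproduces the leaf order of idx_name_age for the teacher rows above. The ids 1 to 8 assume a fresh AUTO_INCREMENT. The final tie-break on the primary key (visible for the two ('dd', 12) rows) reflects that InnoDB also orders non-unique secondary index entries by primary key, a detail the text above does not cover.

```python
# teacher rows (id, name, age, ismale) in insert order; ids assumed to be 1..8.
teachers = [
    (1, "aa", 10, 1), (2, "dd", 12, 0), (3, "cb", 9, 1), (4, "cb", 15, 1),
    (5, "bc", 17, 0), (6, "bb", 15, 1), (7, "dd", 15, 1), (8, "dd", 12, 0),
]

# Leaf entries of idx_name_age: (name, age, primary key).
# Sorted by name, then age, then primary key for full ties.
idx_name_age = sorted((name, age, pk) for pk, name, age, _ in teachers)

for entry in idx_name_age:
    print(entry)
```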
The following statement needs to go back to the table:
select * from teacher where name = 'aa';
The following statement does not:
select name, age from teacher where name = 'aa';
Why is going back to the table unnecessary here?
Because the leaf nodes of the idx_name_age index store the name value, the age value, and the primary key value, every column the query needs can be read from idx_name_age itself without going back to the table. This is called a covering index.
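Whether a query is covered comes down to a set comparison: every column it selects or filters on must be stored in the index leaf. `INDEX_COLUMNS` and `needs_table_lookup` are a hypothetical sketch of that check, not how the MySQL optimizer is implemented.

```python
from __future__ import annotations

# Columns stored in each index's leaf entries for the teacher table.
# A secondary index leaf holds its own columns plus the primary key (id);
# the clustered index leaf holds the full row.
INDEX_COLUMNS: dict[str, set[str]] = {
    "PRIMARY": {"id", "name", "age", "ismale"},
    "idx_name_age": {"name", "age", "id"},
}


def needs_table_lookup(index: str, selected: set[str], filtered: set[str]) -> bool:
    """True if the query uses any column the chosen index's leaf does not store."""
    return not (selected | filtered) <= INDEX_COLUMNS[index]


# select * from teacher where name = 'aa'
print(needs_table_lookup("idx_name_age", {"id", "name", "age", "ismale"}, {"name"}))  # True
# select name, age from teacher where name = 'aa'
print(needs_table_lookup("idx_name_age", {"name", "age"}, {"name"}))  # False
```

Selecting id as well would still be covered, since the primary key is stored in every secondary index leaf.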
Index condition pushdown
Consider the following statement:
select * from teacher where name like '张%' and age = 10 and ismale = 1;
Before MySQL 5.6, it executes as follows: InnoDB uses the idx_name_age index to find the primary key of each entry whose name starts with '张' (the surname Zhang), goes back to the table for each one to fetch the full row, and only then checks whether the other columns (age and ismale) satisfy the conditions.
MySQL 5.6 introduced index condition pushdown: while traversing the index, InnoDB evaluates the conditions on columns contained in the index (here age = 10) and filters out non-matching entries directly, which reduces the number of times it goes back to the table, as shown below.
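The effect on table lookups can be simulated on the teacher rows. Because the sample data contains no names starting with '张', this sketch uses the hypothetical prefix 'd' with age = 12 and ismale = 0. `run` and its `pushdown` flag are illustrative names; the scan uses `bisect` to start at the prefix and stops once names no longer match, as an index range scan would.

```python
from __future__ import annotations

import bisect

# Clustered index stand-in: id -> (id, name, age, ismale); ids assumed to be 1..8.
teachers = {
    1: (1, "aa", 10, 1), 2: (2, "dd", 12, 0), 3: (3, "cb", 9, 1), 4: (4, "cb", 15, 1),
    5: (5, "bc", 17, 0), 6: (6, "bb", 15, 1), 7: (7, "dd", 15, 1), 8: (8, "dd", 12, 0),
}
# idx_name_age leaf entries: (name, age, id), sorted.
idx_name_age = sorted((r[1], r[2], pk) for pk, r in teachers.items())


def run(prefix: str, age: int, ismale: int, pushdown: bool) -> tuple[list[int], int]:
    """Evaluate name LIKE 'prefix%' AND age = ? AND ismale = ?; count table lookups."""
    matches: list[int] = []
    lookups = 0
    i = bisect.bisect_left(idx_name_age, (prefix,))       # start of the prefix range
    while i < len(idx_name_age) and idx_name_age[i][0].startswith(prefix):
        _, idx_age, pk = idx_name_age[i]
        i += 1
        if pushdown and idx_age != age:    # ICP: age is in the index, reject here
            continue
        lookups += 1                       # go back to the table for the full row
        row = teachers[pk]
        if row[2] == age and row[3] == ismale:   # remaining conditions on the full row
            matches.append(pk)
    return matches, lookups


print(run("d", 12, 0, pushdown=False))   # ([2, 8], 3)
print(run("d", 12, 0, pushdown=True))    # ([2, 8], 2)
```

Both runs return ids 2 and 8, but with pushdown the ('dd', 15) entry is rejected inside the index, saving one lookup.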
References
"How MySQL Works: Understanding MySQL from the Root"
Algorithm visualizations
[0] https://www.cs.usfca.edu/~galles/visualization/Algorithms.html
[1] https://blog.csdn.net/qq_35190492/article/details/106915564
[2] https://blog.csdn.net/qq_35571554/article/details/82759668
Joint index
[3] https://www.cnblogs.com/rjzheng/p/12557314.html