What are clustered indexes, non-clustered indexes, covering indexes, going back to the table, and index condition pushdown?

Clustered index

We first create the following table

CREATE TABLE `student` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'student ID',
  `name` varchar(10) NOT NULL COMMENT 'student name',
  `age` int(11) NOT NULL COMMENT 'student age',
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB;

Insert the following rows

insert into student (`name`, `age`) value('a', 10);
insert into student (`name`, `age`) value('c', 12);
insert into student (`name`, `age`) value('b', 9);
insert into student (`name`, `age`) value('d', 15);
insert into student (`name`, `age`) value('h', 17);
insert into student (`name`, `age`) value('l', 13);
insert into student (`name`, `age`) value('k', 12);
insert into student (`name`, `age`) value('x', 9);

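The table now contains the following rows (ids are assigned by AUTO_INCREMENT in insertion order)

id  name  age
1   a     10
2   c     12
3   b     9
4   d     15
5   h     17
6   l     13
7   k     12
8   x     9

InnoDB stores data in pages, and the default size of each page is 16 KB.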

In MySQL, you can check the page size by executing the following statement

show global status like 'innodb_page_size';

The result is 16384 bytes, which is 16 KB

In the InnoDB storage engine, data is organized by the primary key index. Within a page, the records are linked as a singly linked list in ascending order of the primary key.

Some readers may ask: what if no primary key is specified when the table is created?

If no primary key is explicitly defined when the table is created, the InnoDB storage engine selects or creates one as follows.

  1. First check whether the table has a unique index whose columns are all NOT NULL. If it does, that index's column is used as the primary key. If there are several such indexes, InnoDB picks the first one defined when the table was created
  2. If there is no such index, InnoDB automatically creates a hidden 6-byte row ID and uses it as the index
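
As a rough illustration of the selection rules above, here is a minimal Python sketch. The function name and the way indexes are described (a name plus a flag saying whether all of its columns are NOT NULL) are assumptions made for this example, not InnoDB internals.

```python
def pick_clustered_key(primary_key, unique_indexes):
    """Return the name of the key used to organize the table.

    primary_key: name of the declared primary key, or None.
    unique_indexes: list of (index_name, all_columns_not_null) tuples,
                    in the order they were defined in CREATE TABLE.
    """
    if primary_key is not None:
        return primary_key
    for index_name, all_columns_not_null in unique_indexes:
        if all_columns_not_null:
            return index_name          # first NOT NULL unique index wins
    return "<hidden 6-byte row id>"    # nothing usable: a row id is generated


print(pick_clustered_key("PRIMARY", []))
print(pick_clustered_key(None, [("uk_email", False), ("uk_code", True)]))
print(pick_clustered_key(None, []))
```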

Pages are linked to each other as a doubly linked list, and every primary key value in a page must be greater than every primary key value in the previous page.

Assuming that a page can hold only 3 records, the eight rows above occupy three pages: ids 1 to 3, ids 4 to 6, and ids 7 to 8.
With only this structure, querying or inserting a record means starting from the very first page and walking the record list of each page in turn, which is not efficient.
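
To make the cost of that page-by-page walk concrete, here is a small Python sketch. It is only a model under the 3-records-per-page assumption: Python lists stand in for the linked pages, and `build_pages` and `linear_find` are hypothetical helper names.

```python
PAGE_CAPACITY = 3  # assumption from the text: a page holds only 3 records


def build_pages(keys, capacity=PAGE_CAPACITY):
    """Split primary keys, in ascending order, into fixed-size pages."""
    keys = sorted(keys)
    return [keys[i:i + capacity] for i in range(0, len(keys), capacity)]


def linear_find(pages, target):
    """Walk the pages from the first one.

    Returns (page_no, pages_visited); page_no is None if the key is absent.
    """
    for page_no, page in enumerate(pages):
        for key in page:              # records in a page are in key order
            if key == target:
                return page_no, page_no + 1
            if key > target:          # passed the spot where it would be
                return None, page_no + 1
    return None, len(pages)


pages = build_pages(range(1, 9))      # ids 1..8 from the student table
print(pages)
print(linear_find(pages, 8))
```

The number of pages visited grows with the position of the key, so a lookup near the end of a large table would touch almost every page.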
We can build a directory for these pages that maps primary keys to page numbers, and then use binary search to quickly find the page that holds a given record. The prerequisite, however, is that the mapping is stored in contiguous space, such as an array. Doing so causes the following problems

  1. As the data grows, the directory needs an ever larger contiguous space, which is not realistic
  2. When all the records in a page are deleted, the corresponding directory entry must also be deleted and all following entries moved forward, which is too costly
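
Setting those drawbacks aside for a moment, the array-based directory itself can be sketched in a few lines of Python. This sketch assumes that each directory entry stores the smallest primary key of a page together with the page number; `build_directory` and `find_page` are hypothetical names.

```python
from bisect import bisect_right


def build_directory(pages):
    """One entry per page: (smallest primary key in the page, page number)."""
    return [(page[0], page_no) for page_no, page in enumerate(pages)]


def find_page(directory, target):
    """Binary-search the directory for the page that may hold target."""
    min_keys = [min_key for min_key, _ in directory]
    pos = bisect_right(min_keys, target) - 1   # last entry with min_key <= target
    if pos < 0:
        return None                            # smaller than every stored key
    return directory[pos][1]


pages = [[1, 2, 3], [4, 5, 6], [7, 8]]
directory = build_directory(pages)
print(directory)
print(find_page(directory, 5))
```

The binary search only works because the entries sit in one sorted, contiguous list, which is exactly the requirement that becomes a problem as the table grows.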

We can instead store the directory entries in a structure similar to the one used for user data. Each directory entry has 2 columns: a primary key and a page number.
When there is a lot of data, there are also a lot of directory entries, and a single 16 KB page cannot hold them all. The directory entries therefore spread over several directory pages, and we build another level of directory entries on top of those pages, repeating this until a single top-level page remains.

(The original figure for this structure comes from "How MySQL Works: Understanding MySQL from the Root".)
This structure is a B+ tree, and it is also the clustered index: the index and the data are stored together, and the leaf nodes hold the values of all columns of each row.
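
The layered directory can be sketched in Python as follows. This is a simplified, read-only model rather than InnoDB's actual page format: nodes are plain dicts, the fan-out of 3 matches the 3-records-per-page assumption, and `build_tree` and `search` are hypothetical names.

```python
from bisect import bisect_right

FANOUT = 3  # assumption: at most 3 entries per page, as in the example above


def chunk(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_tree(rows, fanout=FANOUT):
    """rows: dict of primary key -> row. Builds the tree bottom-up."""
    items = sorted(rows.items())
    if not items:
        return {"leaf": True, "keys": [], "rows": []}
    # Leaf pages hold the full rows (this is what makes the index clustered).
    level = [{"leaf": True, "keys": [k for k, _ in part], "rows": [r for _, r in part]}
             for part in chunk(items, fanout)]
    # Each directory page stores the smallest key of every child page.
    while len(level) > 1:
        level = [{"leaf": False, "keys": [c["keys"][0] for c in part], "children": part}
                 for part in chunk(level, fanout)]
    return level[0]


def search(root, key):
    """Return (row or None, number of pages read on the way down)."""
    node, pages_read = root, 1
    while not node["leaf"]:
        pos = max(bisect_right(node["keys"], key) - 1, 0)
        node = node["children"][pos]
        pages_read += 1
    pos = bisect_right(node["keys"], key) - 1
    if pos >= 0 and node["keys"][pos] == key:
        return node["rows"][pos], pages_read
    return None, pages_read


students = {1: ("a", 10), 2: ("c", 12), 3: ("b", 9), 4: ("d", 15),
            5: ("h", 17), 6: ("l", 13), 7: ("k", 12), 8: ("x", 9)}
root = build_tree(students)
print(search(root, 5))
```

With 8 rows and a fan-out of 3 the tree has two levels, so every lookup reads exactly two pages regardless of which key is requested.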

Take an index on an integer column in InnoDB as an example: the fan-out N (the number of children per node) is roughly 1200. When the tree has a height of 4, it can hold 1200 to the 3rd power values, which is already about 1.7 billion. Since the root block of the tree is always in memory, finding a value in an integer-column index on a table with 1 billion rows takes at most 3 disk accesses. In fact, the second level of the tree is also very likely to be in memory, so the average number of disk accesses is even lower. (From "45 Lectures on MySQL Actual Combat")
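
The arithmetic in this quote is easy to check. The snippet below only reproduces the numbers given above (a fan-out of about 1200, a height of 4, and the root kept in memory); it does not measure a real index.

```python
FANOUT = 1200   # approximate children per node, the N from the quote
HEIGHT = 4      # tree height used in the quote

values_reachable = FANOUT ** (HEIGHT - 1)
disk_reads = HEIGHT - 1   # the root level is assumed to be cached in memory

print(values_reachable)   # 1728000000
print(disk_reads)         # 3
```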

Non-clustered index

The leaf nodes of a non-clustered index store the indexed column plus the primary key

When we query the information (student ID, name, age) of the student whose name is h, because an index is built on name, we first find the corresponding primary key id in the non-clustered index on name, and then use that id to find the full record in the clustered index.

This process of finding the primary key value in the non-clustered index and then finding the corresponding record in the clustered index is called going back to the table
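
The two-step lookup can be modelled in Python with the student data from earlier. This is only a sketch: a dict stands in for the clustered index, a sorted list of (name, id) pairs stands in for idx_name, and `select_by_name` is a hypothetical helper.

```python
from bisect import bisect_left

# Clustered index model: primary key -> full row (name, age).
clustered = {1: ("a", 10), 2: ("c", 12), 3: ("b", 9), 4: ("d", 15),
             5: ("h", 17), 6: ("l", 13), 7: ("k", 12), 8: ("x", 9)}

# Non-clustered index model: leaf entries are (indexed column, primary key).
idx_name = sorted((name, pk) for pk, (name, _age) in clustered.items())


def select_by_name(name):
    """Model of: select * from student where name = ?"""
    rows, table_lookups = [], 0
    pos = bisect_left(idx_name, (name,))       # first entry with this name
    while pos < len(idx_name) and idx_name[pos][0] == name:
        pk = idx_name[pos][1]
        row_name, age = clustered[pk]          # go back to the table
        table_lookups += 1
        rows.append((pk, row_name, age))
        pos += 1
    return rows, table_lookups


print(select_by_name("h"))
```

Each matching index entry costs one extra lookup in the clustered index, which is why going back to the table gets expensive when many entries match.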

Joint index / covering index

Assume that the teacher table is defined as follows, with a joint index on the name and age columns

CREATE TABLE `teacher` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'teacher ID',
  `name` varchar(10) NOT NULL COMMENT 'teacher name',
  `age` int(11) NOT NULL COMMENT 'teacher age',
  `ismale` tinyint(3) NOT NULL COMMENT 'whether male',
  PRIMARY KEY (`id`),
  KEY `idx_name_age` (`name`, `age`)
) ENGINE=InnoDB;

Insert the following rows

insert into teacher (`name`, `age`, `ismale`) value('aa', 10, 1);
insert into teacher (`name`, `age`, `ismale`) value('dd', 12, 0);
insert into teacher (`name`, `age`, `ismale`) value('cb', 9, 1);
insert into teacher (`name`, `age`, `ismale`) value('cb', 15, 1);
insert into teacher (`name`, `age`, `ismale`) value('bc', 17, 0);
insert into teacher (`name`, `age`, `ismale`) value('bb', 15, 1);
insert into teacher (`name`, `age`, `ismale`) value('dd', 15, 1);
insert into teacher (`name`, `age`, `ismale`) value('dd', 12, 0);

The B+ tree for the joint index on the name and age columns is organized as follows.

Each entry in a directory page consists of three parts: the name column, the age column, and a page number. The entries are sorted by the name column first, and by the age column only when the names are equal.

Each entry in a leaf (data) page consists of three parts: the name column, the age column, and the primary key value. Similarly, the entries are sorted by the name column first, and by the age column when the names are equal.
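
The ordering of the leaf entries can be reproduced with a plain sort in Python. This sketch uses the eight teacher rows inserted above and assumes that entries with the same name and age are further ordered by primary key.

```python
# (id, name, age, ismale) in insertion order, ids assigned by AUTO_INCREMENT.
teachers = [
    (1, "aa", 10, 1), (2, "dd", 12, 0), (3, "cb", 9, 1), (4, "cb", 15, 1),
    (5, "bc", 17, 0), (6, "bb", 15, 1), (7, "dd", 15, 1), (8, "dd", 12, 0),
]

# Leaf entries of idx_name_age: (name, age, primary key), sorted by name first,
# then by age, then by primary key as the tie-breaker.
idx_name_age = sorted((name, age, pk) for pk, name, age, _ismale in teachers)

for entry in idx_name_age:
    print(entry)
```

Note that the two cb entries are ordered by age (9 before 15), and that ismale does not appear in the index at all.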

When the following statement is executed, it has to go back to the table, because ismale is not stored in the index

select * from teacher where name = 'aa';

When the following statement is executed, it does not go back to the table

select name, age from teacher where name = 'aa';

Why is there no need to go back to the table?
Because the leaf nodes of the idx_name_age index store the name value, the age value, and the primary key value, all the required column values can be read from the idx_name_age index itself without going back to the table. This is a covering index (index coverage)
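
The difference between the two statements can be counted in a small Python model. This is a sketch under the same simplifications as before (a dict for the clustered index, a sorted list for idx_name_age); the function names are hypothetical.

```python
from bisect import bisect_left

# Clustered index model: id -> (name, age, ismale).
TEACHERS = {1: ("aa", 10, 1), 2: ("dd", 12, 0), 3: ("cb", 9, 1), 4: ("cb", 15, 1),
            5: ("bc", 17, 0), 6: ("bb", 15, 1), 7: ("dd", 15, 1), 8: ("dd", 12, 0)}

# idx_name_age leaf entries: (name, age, id).
IDX = sorted((name, age, pk) for pk, (name, age, _m) in TEACHERS.items())


def scan_index(name):
    """Yield the index entries whose name equals the given value."""
    pos = bisect_left(IDX, (name,))
    while pos < len(IDX) and IDX[pos][0] == name:
        yield IDX[pos]
        pos += 1


def select_name_age(name):
    """select name, age ...: answered from the index alone."""
    return [(n, a) for n, a, _pk in scan_index(name)], 0


def select_star(name):
    """select * ...: ismale is not in the index, so each hit needs a lookup."""
    rows, table_lookups = [], 0
    for _n, _a, pk in scan_index(name):
        rows.append((pk, *TEACHERS[pk]))
        table_lookups += 1
    return rows, table_lookups


print(select_name_age("dd"))
print(select_star("dd"))
```

The second element of each result is the number of lookups in the clustered index: zero for the covered query and one per matching entry for the `select *` query.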

Index condition pushdown

Consider the following statement

select * from teacher where name like '张%' and age = 10 and ismale = 1;

Before MySQL 5.6, the execution process is as follows: find the primary key values of the entries in the idx_name_age index whose name starts with '张', go back to the table for each of them to fetch the full row, and only then check whether the values of the other fields (age and ismale) meet the conditions.

MySQL 5.6 introduced the index condition pushdown optimization. While traversing the index, the conditions on the fields contained in the index (here, age) are checked first, and entries that do not meet them are filtered out directly, which reduces the number of times the query goes back to the table.
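
The saving can be illustrated by counting lookups in a Python model. Because the sample teacher data contains no names starting with '张', this sketch uses the prefix 'd', age 15, and ismale 1 as stand-in values; the `use_icp` flag and the function name are hypothetical.

```python
from bisect import bisect_left

# Clustered index model: id -> (name, age, ismale).
TEACHERS = {1: ("aa", 10, 1), 2: ("dd", 12, 0), 3: ("cb", 9, 1), 4: ("cb", 15, 1),
            5: ("bc", 17, 0), 6: ("bb", 15, 1), 7: ("dd", 15, 1), 8: ("dd", 12, 0)}

# idx_name_age leaf entries: (name, age, id).
IDX = sorted((name, age, pk) for pk, (name, age, _m) in TEACHERS.items())


def query(prefix, age, ismale, use_icp):
    """Model of: select * from teacher
       where name like '<prefix>%' and age = ? and ismale = ?"""
    rows, table_lookups = [], 0
    pos = bisect_left(IDX, (prefix,))          # start of the name range
    while pos < len(IDX) and IDX[pos][0].startswith(prefix):
        _name, idx_age, pk = IDX[pos]
        pos += 1
        if use_icp and idx_age != age:
            continue                           # filtered inside the index
        row = TEACHERS[pk]                     # go back to the table
        table_lookups += 1
        if row[1] == age and row[2] == ismale:
            rows.append((pk, *row))
    return rows, table_lookups


print(query("d", 15, 1, use_icp=False))
print(query("d", 15, 1, use_icp=True))
```

Both calls return the same row, but the second one touches the clustered index only for the entry whose age already matches.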

References

"How MySQL Works: Understanding MySQL from the Root"
Graphical algorithm
[0] https://www.cs.usfca.edu/~galles/visualization/Algorithms.html
[1] https://blog.csdn.net/qq_35190492/article/details/106915564
[2] https://blog.csdn.net/qq_35571554/article/details/82759668
Joint index
[3] https://www.cnblogs.com/rjzheng/p/12557314.html

Origin blog.csdn.net/zzti_erlie/article/details/110501008