The leftmost prefix index problem

Problem

Recently I was testing the leftmost prefix principle in MySQL and found something odd. According to the principle, the index should not have been usable and the query should have fallen back to a full table scan, yet the execution plan showed that the index was used.

The table structure is as follows (MySQL version 5.7.22):

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(32) COLLATE utf8mb4_bin DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `address` varchar(128) COLLATE utf8mb4_bin DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_user` (`name`,`age`,`address`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin

INSERT INTO user(`id`, `name`, `age`, `address`) VALUES (1, 'zs', 12, 'beijing');

The table has four fields. id is the primary key, and there is a joint (composite) index on name, age, and address. The storage engine is InnoDB, and one row of test data is inserted.

According to the leftmost prefix principle, the following SQL should not be able to use the index. (If you are not familiar with the leftmost prefix principle, it is explained later.)

EXPLAIN select * from user where address='beijing';

However, the result was not what I expected. The execution plan shows that the index is used.

This confused me. Is the leftmost prefix principle wrong? Or has MySQL become smart enough in newer versions that the leftmost prefix principle no longer matters?

Table of Contents

With this question in mind, let's find out. First, some background knowledge is needed. This article covers the following:

  • What are clustered index and non-clustered index?

  • What is a back-to-table query?

  • What is index coverage?

  • Leftmost Prefix Principle

  • Problem solving

Main text

Since InnoDB is the engine used almost everywhere now, the examples below use InnoDB, with MyISAM mentioned in passing.

What are clustered index and non-clustered index?

As we know, MySQL stores indexes as B+ trees, with the data held in the leaf nodes. In InnoDB, the primary key index and the row records are stored together, which is why it is called a clustered index.

PS: MyISAM stores row records separately from its indexes, so MyISAM has no clustered index.

All indexes other than the clustered index are called non-clustered indexes (secondary indexes). These include ordinary indexes, unique indexes, and so on.

Also note that an InnoDB table has exactly one clustered index. There are three cases:

  1. If the table has a primary key, the primary key index is the clustered index.

  2. If there is no primary key, the first unique index whose columns are all NOT NULL is used as the clustered index.

  3. Otherwise, InnoDB implicitly generates a hidden row ID and uses it as the clustered index.
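The three cases above can be sketched with three small tables. This is only an illustration: the table and index names (t_pk, t_uk, t_none, uk_code) are made up for this example, and the final query assumes MySQL 5.7, where the InnoDB metadata tables are named INNODB_SYS_TABLES and INNODB_SYS_INDEXES (later versions rename them) and reading them needs the PROCESS privilege.

```sql
-- Hypothetical tables, for illustration only.

-- Case 1: explicit primary key, so the clustered index is the primary key.
CREATE TABLE t_pk (
  id int NOT NULL,
  code varchar(32) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Case 2: no primary key, but a unique index on a NOT NULL column,
-- so that unique index serves as the clustered index.
CREATE TABLE t_uk (
  code varchar(32) NOT NULL,
  note varchar(32) DEFAULT NULL,
  UNIQUE KEY uk_code (code)
) ENGINE=InnoDB;

-- Case 3: neither, so InnoDB falls back to a hidden row ID.
CREATE TABLE t_none (
  note varchar(32) DEFAULT NULL
) ENGINE=InnoDB;

-- List the indexes InnoDB actually created for the three tables.
-- For t_none, a hidden index (documented as GEN_CLUST_INDEX) should appear.
SELECT t.NAME AS table_name, i.NAME AS index_name
FROM information_schema.INNODB_SYS_INDEXES i
JOIN information_schema.INNODB_SYS_TABLES t ON t.TABLE_ID = i.TABLE_ID
WHERE SUBSTRING_INDEX(t.NAME, '/', -1) IN ('t_pk', 't_uk', 't_none');
```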

To make this easier to understand, let's take InnoDB's primary key index and an ordinary index as examples and look at their storage structure.

Create a table with the following structure and add a few records (zs, ls, ww, sq, short for Zhang San, Li Si, Wang Wu, Sun Qi):

CREATE TABLE `student` (
  `id` int(11) NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_stu` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin

insert into student(id,name,age) values(1,'zs',12);
insert into student(id,name,age) values(5,'ls',14);
insert into student(id,name,age) values(9,'ww',12);
insert into student(id,name,age) values(11,'sq',13);

In InnoDB, a leaf node of the primary key index stores the primary key together with the full row record, while a leaf node of an ordinary index stores the indexed column value together with the primary key. (In MyISAM, a leaf node of the primary key index stores the primary key and a pointer to the corresponding row record, and a leaf node of an ordinary index stores the indexed column value and a pointer to the corresponding row record.)

Therefore, the index on id is the clustered index and the index on name is a non-clustered index. Their corresponding B+ tree structures are shown in the figure below.

[Figure: clustered index and secondary index]

What is a back-to-table query?

From the index storage structure above, we can see that in the primary key index tree, the data we need can be fetched in a single lookup by primary key, which is very fast.

Because the primary key and the row record are stored together, locating the primary key also locates the record, with all of the row's fields in place. (This is why it is recommended to always define a primary key when creating a table and to query by primary key where possible.)

For an ordinary index, such as name in the example, MySQL first searches the name index tree (non-clustered) to find the primary key stored in the matching leaf node, and then searches the primary key index tree with that primary key to get the record. This is called a back-to-table query.

Take the following SQL as an example.

select * from student where name='zs';

It needs to search two index trees.

  • Find the primary key id=1 through the non-clustered index.

  • Find the row record for id=1 through the clustered index.

The query process is shown in the figure below.

[Figure: back-to-table query]
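As a rough mental model, the two steps above can be written as two separate statements. MySQL performs both steps internally within the single query; the split below is only a sketch to make each lookup visible, using the student table and data defined earlier.

```sql
-- Step 1: the secondary index on name (idx_stu) yields the primary key.
-- This step can be answered from the name index tree alone.
SELECT id FROM student WHERE name = 'zs';

-- Step 2: the primary key found in step 1 (id = 1 for the sample data)
-- is looked up in the clustered index to fetch the full row.
SELECT * FROM student WHERE id = 1;
```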

What is index coverage?

A back-to-table query clearly reduces query efficiency. So some readers will ask: is there any way to avoid going back to the table?

The answer is yes: index coverage (a covering index).

Index coverage means that the index tree used by the query already contains, in its leaf nodes, every field the query asks for, so there is no need to go back to the table.

Taking the table above as an example, the idx_stu index tree currently stores only name and the primary key, so it cannot cover the age field. We can instead create a joint index, such as KEY (name, age), and when querying, explicitly list the fields covered by the joint index (name and age, plus the primary key id).

Create a joint index as follows,

KEY `idx_stu` (`name`,`age`)

The query statement is modified as follows,

-- Select only the fields covered by the joint index
select id,name,age from student where name='zs' and age=12;

This way, the query is answered from the joint index tree alone: all requested fields are found in one pass, with no need to go back to the table. The corresponding index tree structure is as follows:

[Figure: joint index]

PS: In the figure, both fields of the joint index (name, age) should appear in every node of the index tree. They are omitted to keep the drawing simple, since the data set is so small. The figure only illustrates that the leaf nodes store all of the joint index fields.
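To try this on the three-column student table defined earlier, one possible sketch is to swap the single-column index for the joint index and then inspect the plan. The expectation noted in the comment is the usual sign of a covering index in MySQL; verify it on your own instance.

```sql
-- Replace the single-column index with the joint index (name, age).
ALTER TABLE student DROP INDEX idx_stu, ADD INDEX idx_stu (name, age);

-- All selected fields (id, name, age) live in the joint index,
-- so the Extra column of the plan should contain "Using index".
EXPLAIN SELECT id, name, age FROM student WHERE name = 'zs' AND age = 12;
```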

Leftmost Prefix Principle

The leftmost prefix principle, as the name suggests, gives priority to the leftmost columns: a query can look up a joint index only if its conditions include a leftmost prefix of the index columns. For the table above, the joint index on (name, age) can serve queries as if there were both a single-column index on name and a joint index on (name, age). When querying, if the where condition contains the name field, this joint index can be used.

The same applies to a joint index on more fields. A joint index on (a, b, c) can serve queries as if there were an index on (a), an index on (a, b), and an index on (a, b, c).
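The (a, b, c) case can be sketched on a throwaway table. The table t_abc and its columns are hypothetical and exist only for this illustration; column d is included so that select * is not covered by the index. The comments describe which part of the index each condition can use in principle; the plan actually chosen also depends on the data and the optimizer.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE t_abc (
  id int NOT NULL,
  a int DEFAULT NULL,
  b int DEFAULT NULL,
  c int DEFAULT NULL,
  d int DEFAULT NULL,
  PRIMARY KEY (id),
  KEY idx_abc (a, b, c)
) ENGINE=InnoDB;

-- Conditions that form a leftmost prefix: (a), (a, b), (a, b, c).
EXPLAIN SELECT * FROM t_abc WHERE a = 1;
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b = 2;
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b = 2 AND c = 3;

-- The order in which conditions are written does not matter.
EXPLAIN SELECT * FROM t_abc WHERE b = 2 AND a = 1;

-- Only the (a) prefix can narrow the search; c is checked afterwards.
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND c = 3;

-- No leftmost prefix: the index cannot be searched by these conditions.
EXPLAIN SELECT * FROM t_abc WHERE b = 2;
EXPLAIN SELECT * FROM t_abc WHERE c = 3;
```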

To verify the leftmost prefix principle, we need to rebuild the original table. Add two more fields (address, sex), and create a three-column joint index (name, age, address).

drop table student;
CREATE TABLE `student` (
  `id` int(11) NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `address` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
  `sex` int(1) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_stu` (`name`,`age`,`address`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

insert into student(id,name,age,address,sex) values(1,'zs',12,'beijing',1);
insert into student(id,name,age,address,sex) values(5,'ls',14,'tianjin',0);
insert into student(id,name,age,address,sex) values(9,'ww',12,'shanghai',1);
insert into student(id,name,age,address,sex) values(11,'sq',13,'hebei',1);

The table data is as follows:

[Figure: table data]

The following three queries all conform to the leftmost prefix principle.

explain select * from student where name='zs';
explain select * from student where name='zs' and age=12;
explain select * from student where name='zs' and age=12 and address='beijing';

Then check their execution plans.

As you can see, all of them use the index. Now, what if the SQL is changed as follows?

explain select * from student where address='beijing';

As expected, this does not conform to the leftmost prefix principle, so the index cannot be used and a full table scan is performed.

PS: Food for thought: if the SQL is changed to the following, will it cause a full table scan? (Try it yourself.)

explain select * from student where name='zs' and address='beijing';

Problem solving

So far, everything behaves as the leftmost prefix principle predicts. So back to the question raised at the beginning: why did the principle not seem to apply there? (The joint index and the SQL statement are the same!)

Don't worry. Remember the index coverage mentioned earlier? This time we rely on index coverage and query only specific fields (the primary key and the joint index fields).

explain select id,name,age,address from student where address='beijing';

Check the execution plan again.

Here is the puzzle. The leftmost prefix principle is violated, but the query is covered by the index. Why is the index used?

Let's compare the execution plans when the leftmost column is used and when it is not.

You will find that when the leftmost prefix principle is not met, the type is index, and when it is met, the type is ref.

index means that the entire index tree is scanned. In the example, filtering only on the rightmost column, address, causes the entire index tree to be scanned.

ref means that MySQL locates the matching entries directly by searching the index tree, which is more efficient than a full index scan. However, this places a requirement on the index structure: the index entries must be ordered by the columns being searched. A joint index meets this requirement for its leftmost columns.
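The comparison described above can be reproduced with the following pair of statements on the five-column student table. The type values in the comments are the ones this article reports; the plan on another instance may differ with data volume and version.

```sql
-- Leftmost column present: the article reports type = ref.
EXPLAIN SELECT id, name, age, address FROM student WHERE name = 'zs';

-- Leftmost column absent, but every selected field is in the index:
-- the article reports type = index (a scan of the whole index tree).
EXPLAIN SELECT id, name, age, address FROM student WHERE address = 'beijing';
```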

The joint index is internally ordered. Think of it as a sort rule like order by name, age, address: entries are sorted by name first, entries with the same name are sorted by age, and so on.

This also explains why the leftmost prefix principle must be followed. A column on the right is ordered only among entries whose columns to its left are equal, so without a condition on the leftmost column, the values of address are scattered across the index and cannot be located by searching the tree.
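The ordering can be made visible by listing the sample rows in the same order the joint index would hold them. This is only a sketch of the logical order, not a dump of the actual index pages; the rows in the comment follow from the four inserts above and the utf8mb4_bin collation.

```sql
-- Logical order of entries in idx_stu (name, age, address), with the
-- primary key appended, as InnoDB does for secondary index entries.
SELECT name, age, address, id
FROM student
ORDER BY name, age, address, id;

-- name  age  address   id
-- ls    14   tianjin    5
-- sq    13   hebei     11
-- ww    12   shanghai   9
-- zs    12   beijing    1
```

Read top to bottom, the name column is sorted, but the address column (tianjin, hebei, shanghai, beijing) is not, which is why a condition on address alone cannot be answered by searching the tree.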

Next, if a query does not meet the leftmost prefix principle but is covered by the index, MySQL can scan the entire index tree and read the requested columns straight from it (avoiding going back to the table).

If a query neither meets the leftmost prefix principle nor is covered by the index (as with select * on the student table), the entire index tree would have to be scanned, and then each match would require going back to the table to fetch the row record.

In that case the query optimizer judges that scanning the index tree and then going back to the table is slower than a full table scan (without a usable leftmost prefix, the joint index cannot narrow the search the way an ordinary index lookup would). Therefore, a full table scan is performed.

Some readers will complain at this point: all this talk, and you still haven't answered the original question!

Actually, the analysis above already answers it. Look carefully at the initial user table and how it differs from the current student table.

Compared with the student table, the user table lacks the sex field. However, the joint index they define is the same: KEY (name, age, address).

Therefore, for the user table, our initial SQL statement is equivalent to:

-- Original SQL
EXPLAIN select * from user where address='beijing';
-- Equivalent to
EXPLAIN select id,name,age,address from user where address='beijing';

This is exactly the situation discussed above: the query does not meet the leftmost prefix principle, but it is covered by the index. In this case, the index is used.
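One way to check this explanation is to break the coverage on the user table and look at the plan again. The sketch below temporarily adds a sex column (mirroring the student table) so that select * asks for a field outside the index. By the reasoning above, the plan is expected to fall back to a full table scan, but confirm this on your own instance.

```sql
-- Add a column that is not part of idx_user, so select * is no longer covered.
ALTER TABLE `user` ADD COLUMN `sex` int(1) DEFAULT NULL;

-- By the article's reasoning this should now be a full table scan (type = ALL).
EXPLAIN SELECT * FROM `user` WHERE address = 'beijing';

-- Listing only indexed fields should still allow the index scan.
EXPLAIN SELECT id, name, age, address FROM `user` WHERE address = 'beijing';

-- Restore the original table structure.
ALTER TABLE `user` DROP COLUMN `sex`;
```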

Conclusion

So the conclusion is clear. It is not that the leftmost prefix principle has failed, nor that MySQL has become smarter. The table structure and the query simply happen to satisfy index coverage. It was a false alarm!


Origin blog.csdn.net/qq_39809613/article/details/107150475