Honestly, once you've read this, index failure won't catch you out again, so hurry up and learn it!

Hello everyone, my name is Xiaolin.

At work, when we want a statement to run faster, we usually build an index on the fields it queries.

But indexes are not a panacea. Creating an index does not guarantee that every query will use an index scan.

If you're not careful, the query you write may cause the index to fail and trigger a full table scan instead. The query result is still correct, but performance drops sharply.

Today I'm going to share six common scenarios that cause index failure.

Each one comes with an experiment, and I'll also explain clearly why the index fails in each case.

Go!

What does the index storage structure look like?

Let's first look at how indexes are stored, because knowing the storage structure makes it much easier to understand why an index fails.

The storage structure of an index depends on the storage engine MySQL uses. The storage engine is responsible for persisting data to disk, and different storage engines use different index data structures.

MySQL's default storage engine is InnoDB, which uses the B+ tree as its index data structure. For a detailed analysis of why the B+ tree was chosen, see my article: Why MySQL Likes B+ Trees?

When a table is created, InnoDB builds a primary key index by default. This is the clustered index, and all other indexes are secondary indexes.

MySQL's MyISAM storage engine supports several index data structures, such as B+ tree indexes, R-tree indexes, and full-text indexes. When MyISAM creates a table, the primary key index uses a B+ tree by default.

Both InnoDB and MyISAM support B+ tree indexes, but they store data differently:

  • InnoDB storage engine: the leaf nodes of the B+ tree index store the data itself;
  • MyISAM storage engine: the leaf nodes of the B+ tree index store the physical address of the data;

Next, I will give an example to show you the difference between the index storage structures of the two storage engines.

Take a t_user table as an example: the id field is the primary key, and the other fields are ordinary columns.

With the MyISAM storage engine, the leaf nodes of the B+ tree index store the physical address of the data, that is, a pointer to the user data, as shown in the following figure:

If the InnoDB storage engine is used, the leaf nodes of the B+ tree index save the data itself, as shown in the following figure:

InnoDB indexes come in two types: the clustered index (the figure above shows one) and secondary indexes. The difference is that the leaf nodes of the clustered index store the actual data, so every complete user row lives in the clustered index's leaf nodes. The leaf nodes of a secondary index store only the primary key value, not the actual data.

If an ordinary index is created on the name field, its secondary index looks like the figure below, and the leaf nodes store only the primary key value.

Now that we know how InnoDB stores its clustered and secondary indexes, let's walk through a few queries to see which index each one uses.

When the query condition is on the primary key and the data to be read lives in the leaf nodes of the clustered index, MySQL searches the clustered index B+ tree for the matching leaf node and reads the data directly, as in the following statement:

-- the id field is the primary key index
select * from t_user where id=1;

When the query condition is on a secondary index column but the data to be read lives only in the leaf nodes of the clustered index, two B+ trees must be searched:

  • First, search the secondary index B+ tree for the matching leaf node and get the primary key value;
  • Then, use that primary key value to search the clustered index B+ tree for the matching leaf node and read the data.

This process is called "going back to the table" (a table lookup), as in the following statement:

-- the name field is a secondary index
select * from t_user where name="林某";

When the query condition is on a secondary index column and all the data to be read is already in the secondary index's leaf nodes, MySQL only needs to search the secondary index B+ tree and read the data from the matching leaf node. This is called a covering index, as in the following statement:

-- the name field is a secondary index
select id from t_user where name="林某";
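The three lookup paths above can be modeled in a few lines of Python. This is a conceptual sketch, not how InnoDB is implemented. A dict stands in for the clustered index and a sorted list of (name, id) pairs stands in for the leaf level of the secondary index. The rows and the age column are hypothetical.

```python
from bisect import bisect_left

# Hypothetical t_user rows as (id, name, age); the age column is an assumption.
ROWS = [(1, "林某", 18), (2, "陈某", 20), (3, "周某", 22)]

# Clustered index: keyed by primary key, each leaf entry holds the whole row.
# (A dict stands in for the B+ tree here.)
CLUSTERED = {row[0]: row for row in ROWS}

# Secondary index on name: sorted (name, id) pairs, leaves hold only the id.
SECONDARY = sorted((name, pk) for pk, name, _ in ROWS)

def pk_lookup(pk):
    """where id = ?: one search of the clustered index returns the row."""
    return CLUSTERED.get(pk)

def secondary_ids(name):
    """Search the secondary index and collect the matching primary keys."""
    i = bisect_left(SECONDARY, (name,))
    ids = []
    while i < len(SECONDARY) and SECONDARY[i][0] == name:
        ids.append(SECONDARY[i][1])
        i += 1
    return ids

def select_star_by_name(name):
    """select * ... where name = ?: secondary search, then back to the table."""
    return [CLUSTERED[pk] for pk in secondary_ids(name)]

def select_id_by_name(name):
    """select id ... where name = ?: the secondary index already holds id,
    so this is a covering index and the clustered index is never touched."""
    return secondary_ids(name)

print(pk_lookup(1))                 # (1, '林某', 18)
print(select_star_by_name("林某"))   # [(1, '林某', 18)]
print(select_id_by_name("林某"))     # [1]
```

The only difference between the last two functions is whether the clustered index is consulted after the secondary search. That extra step is exactly what a covering index saves.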

The conditions in all of the queries above are on index columns, so each query uses an index.

But a query condition on an index column does not guarantee that the index will be used. Next, let's look at which conditions cause the index to fail and trigger a full table scan.

For all of the experiments below, I use MySQL 8.0.26.

Using left or left-and-right fuzzy matching on an index

When we use left or left-and-right fuzzy matching, that is, like '%xx' or like '%xx%', both forms cause the index to fail.

For example, the following like statement queries users whose name ends with "林" (Lin). The execution plan shows type=ALL, which means a full table scan without using the index.

-- the name field is a secondary index
select * from t_user where name like '%林';

If we instead query users whose name starts with "林", an index scan is performed. The execution plan shows type=range, which indicates an index range scan, and key=index_name shows that the index_name index is actually used:

-- the name field is a secondary index
select * from t_user where name like '林%';

Why can't left or left-and-right fuzzy matching with LIKE use the index?

Because the B+ tree stores entries sorted by index value, and that ordering can only be used to match by prefix.

For example, the secondary index in the figure below is sorted by the name field.

Suppose we want the rows whose name starts with "林", that is, name like '林%'. The index is scanned as follows:

  • Node 1: by pinyin order, 林 (Lin) is greater than 陈 (Chen), the first index value in node 1, but less than 周 (Zhou), the second index value, so the search continues in node 2;
  • Node 2: the first index value in node 2, 陈 (Chen), is less than 林, so we move to the next index value and find one whose prefix matches 林, so the search descends to leaf node 4;
  • Node 4: the first index value in node 4 starts with 林, so that row is read. Matching then continues to the right until no more index values start with 林.

If you query with name like '%林' instead, the matches could be names such as Chen Lin, Zhang Lin, or Zhou Lin, which are scattered across the index. There is no way to know which index value to start from, so the only option is a full table scan.
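The prefix-versus-suffix difference can be sketched in Python. In this model, a sorted list plays the role of the B+ tree's leaf level. The names are hypothetical, and pinyin strings stand in for Chinese names so that the sort order matches the walkthrough above; a real index would order them by the column's collation. A prefix lets us binary-search to a starting point and stop at the first non-match. A suffix gives no starting point, so every key must be checked.

```python
from bisect import bisect_left

# Hypothetical leaf level of the name index, kept in sorted order.
NAMES = sorted(["chen", "chenlin", "lin", "linhua", "linyi",
                "zhanglin", "zhou", "zhoulin"])

def like_prefix(names, prefix):
    """name like 'prefix%': seek to the first key >= prefix, scan right,
    and stop at the first key that no longer starts with the prefix."""
    examined, matches = 0, []
    i = bisect_left(names, prefix)
    while i < len(names):
        examined += 1
        if not names[i].startswith(prefix):
            break  # sorted order: no later key can match
        matches.append(names[i])
        i += 1
    return matches, examined

def like_suffix(names, suffix):
    """name like '%suffix': no seek position exists, so check every key."""
    matches = [n for n in names if n.endswith(suffix)]
    return matches, len(names)

print(like_prefix(NAMES, "lin"))  # (['lin', 'linhua', 'linyi'], 4)
print(like_suffix(NAMES, "lin"))  # (['chenlin', 'lin', 'zhanglin', 'zhoulin'], 8)
```

The second number in each result is how many index entries were examined. For the suffix match it always equals the size of the index.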

If you want to learn more about InnoDB's B+ tree query process, you can read this article I wrote: What is stored in the nodes in the B+ tree? What is the process of querying data?

Use functions on indexes

Sometimes we use MySQL's built-in functions to get the results we want. Be careful here: applying a function to an index column in the query condition causes the index to fail.

For example, the query condition in the following statement applies the LENGTH function to the name field, and the execution plan shows type=ALL, meaning a full table scan:

-- name is a secondary index
select * from t_user where length(name)=6;

Why does applying a function to an index column prevent the index from being used?

Because the index stores the original values of the column, not the values computed by the function, so there is nothing in the index to search for the computed value.

However, MySQL 8.0 (specifically 8.0.13 and later) added functional indexes. These let you build an index on the value computed by a function, so the index keys are the computed values and the data can be found by scanning the index.
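Conceptually, a functional index just uses the computed value as the sort key. The Python sketch below contrasts an ordinary name index, where every entry must be recomputed, with an index keyed on the computed length, where we can binary-search directly. The rows are hypothetical. MySQL's LENGTH() counts bytes, so the sketch approximates it with the UTF-8 byte length, assuming a utf8mb4 column in which each of these Chinese characters takes 3 bytes.

```python
from bisect import bisect_left

# Hypothetical rows as (id, name).
ROWS = [(1, "林某"), (2, "陈小明"), (3, "周某"), (4, "Tom")]

def mysql_length(s):
    """Approximate LENGTH(name) as the UTF-8 byte length (utf8mb4 assumed)."""
    return len(s.encode("utf-8"))

# Ordinary index: keys are the raw names, so length(name) = 6 cannot seek.
NAME_INDEX = sorted((name, pk) for pk, name in ROWS)

def full_scan_length(n):
    """Without a functional index: compute LENGTH for every entry."""
    return sorted(pk for name, pk in NAME_INDEX if mysql_length(name) == n)

# Functional index: keys are the precomputed LENGTH(name) values.
LENGTH_INDEX = sorted((mysql_length(name), pk) for pk, name in ROWS)

def seek_length(n):
    """With a functional index: binary-search straight to key n."""
    lo = bisect_left(LENGTH_INDEX, (n,))
    hi = bisect_left(LENGTH_INDEX, (n + 1,))
    return [pk for _, pk in LENGTH_INDEX[lo:hi]]

print(full_scan_length(6))  # [1, 3]
print(seek_length(6))       # [1, 3]
```

Both functions return the same rows. The difference is that `seek_length` never evaluates the function at query time, because that work was done when the index was built.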

For example, I use the following statement to create an index named idx_name_length on the result of length(name).

alter table t_user add key idx_name_length ((length(name)));

Then I run the same query again, and this time the index is used:

select * from t_user where length(name)=6;

Evaluate an expression on an index

Likewise, if the query condition performs an expression calculation on an index column, the index cannot be used.

For example, the execution plan of the following query shows type=ALL, meaning the data is retrieved with a full table scan:

explain select * from t_user where id + 1 = 10;

However, if the condition is changed to where id = 10 - 1, the calculation no longer involves the index column, so the index can be used.

Why can't the index be used when the condition performs a calculation on the index column?

The reason is similar to using functions on indexes.

The index stores the original values of the column, not the results of the id + 1 expression, so it cannot be searched for them. MySQL has to read each value of the column, evaluate the expression, and then test the condition row by row, which amounts to a full table scan.

Some readers may argue that for a simple expression like this, the code could handle it specially and rewrite id + 1 = 10 as id = 10 - 1, so that an index scan is possible.

True, that could be implemented, but MySQL takes the lazy route and doesn't do it.

My guess is that expression calculations come in so many forms that handling every case would bloat the code considerably. It is simpler to document this index-failure scenario and leave it to programmers to avoid calculations on index columns in query conditions.
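The cost difference between the two forms can be sketched in Python. Here a sorted list of 1,000 hypothetical ids models the primary key index. With `id + 1 = 10`, every entry has to be read and the expression evaluated. With the constant moved to the other side (`id = 10 - 1`), the right-hand side is computed once and the sorted keys can be binary-searched. A real B+ tree seek is a multi-way version of the binary search shown here.

```python
# Hypothetical primary key index on id: a sorted list of 1,000 keys.
IDS = list(range(1, 1001))

def scan_id_plus_one(ids, value):
    """where id + 1 = value: the index holds id, not id + 1, so every
    entry is read and the expression is evaluated row by row."""
    hits = [i for i in ids if i + 1 == value]
    return hits, len(ids)  # entries examined = the whole index

def seek_id(ids, key):
    """where id = key: binary search on the sorted keys, counting probes."""
    lo, hi, probes = 0, len(ids), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if ids[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    hits = [key] if lo < len(ids) and ids[lo] == key else []
    return hits, probes

print(scan_id_plus_one(IDS, 10))  # ([9], 1000)
print(seek_id(IDS, 10 - 1))       # ([9], 10)
```

Both calls find the same row. The scan touches all 1,000 entries, while the seek needs only 10 probes.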

Implicit type conversion for indexes

If the index column is a string type but the query condition passes an integer, the execution plan shows that the statement performs a full table scan.

I added a phone column of type varchar to the t_user table and created a secondary index on it.

Then I pass an integer in the query condition. The execution plan shows type=ALL, so the data is retrieved with a full table scan.

select * from t_user where phone = 1300000001;

However, if the index column is an integer type, passing a string in the query condition does not cause the index to fail, and an index scan can still be performed.

In the second example, id is an integer, yet the following statement, which passes a string, still uses an index scan.

explain select * from t_user where id = '1';

Why does the first example invalidate the index, but not the second?

To understand why, we first need to know MySQL's type conversion rule: when comparing a string with a number, does MySQL convert the string to a number, or the number to a string?

While reading the "MySQL 45 Lectures" course, I came across a simple test: check the result of select "10" > 9 to find out which conversion rule MySQL applies:

  • If MySQL converts the string to a number, the query is equivalent to select 10 > 9. This is a numeric comparison, so the result should be 1;
  • If MySQL converts the number to a string, the query is equivalent to select "10" > "9". This is a string comparison, and strings are compared character by character from left to right (by character code). So the "1" in "10" is compared with "9" first, and because "1" is less than "9", the result should be 0.

In MySQL, the result of the execution is as follows:

The result is 1, which shows that when MySQL compares a string with a number, it automatically converts the string to a number and then compares them.

Recall the query from Example 1, which I said performs a full table scan:

-- query from Example 1
select * from t_user where phone = 1300000001;

Because phone is a string column being compared with a number, MySQL automatically converts the string to a number, so this statement is effectively equivalent to:

select * from t_user where CAST(phone AS signed int) = 1300000001;

Here the CAST function is applied to phone, which is an index column. In other words, a function is applied to the index! As discussed earlier, applying a function to an index causes the index to fail.

The query from Example 2, by contrast, uses an index scan:

-- query from Example 2
select * from t_user where id = "1";

Here the string is the input parameter, so it is the parameter that gets converted to a number, and the statement is equivalent to:

select * from t_user where id = CAST("1" AS signed int);

No function is applied to the index column. CAST is applied only to the input parameter, so the index scan still works.
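Both examples can be reproduced with a small Python model of the conversion rule. The string-to-number function is a simplified approximation, not MySQL's actual code: it skips leading whitespace, takes the longest numeric prefix, and falls back to 0. The phone values and the id range are hypothetical. The sketch shows why converting the indexed column defeats the index: strings that are numerically equal to 1300000001 are scattered through the string-sorted index, so no single range seek can find them. In Example 2, the parameter is converted once and the integer index is searched normally.

```python
import re
from bisect import bisect_left

# Simplified approximation of MySQL's string-to-number conversion.
_NUM_PREFIX = re.compile(r"\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)")

def str_to_num(s):
    m = _NUM_PREFIX.match(s)
    return float(m.group(1)) if m else 0.0

def coerce(left, right):
    """String vs. number: convert the string side to a number."""
    if isinstance(left, str) and not isinstance(right, str):
        return str_to_num(left), right
    if isinstance(right, str) and not isinstance(left, str):
        return left, str_to_num(right)
    return left, right

def gt(left, right):
    l, r = coerce(left, right)
    return int(l > r)

def eq(left, right):
    l, r = coerce(left, right)
    return int(l == r)

# Example 1: a varchar phone index is sorted as strings (hypothetical data).
PHONE_INDEX = sorted(["1300000001", "01300000001", " 1300000001",
                      "1300000001abc", "02", "999"])

def phone_matches(number):
    """phone = <number>: every key must be converted and compared."""
    return [i for i, p in enumerate(PHONE_INDEX) if eq(p, number)]

# Example 2: an integer id index; only the parameter is converted, once.
ID_INDEX = list(range(1, 101))

def id_seek(param):
    """id = '1': convert the parameter, then binary-search the index."""
    key = str_to_num(param) if isinstance(param, str) else param
    i = bisect_left(ID_INDEX, key)
    return i if i < len(ID_INDEX) and ID_INDEX[i] == key else None

print(gt("10", 9), gt("10", "9"))  # 1 0
print(phone_matches(1300000001))   # [0, 1, 3, 4]
print(id_seek("1"))                # 0
```

The matching positions [0, 1, 3, 4] are not one contiguous run ("02" sits between them). That is why a string-ordered index cannot answer a numeric comparison on its column.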

Joint index without leftmost matching

An index built on the primary key field is called a clustered index, and an index built on a common field is called a secondary index.

An index built on a combination of several ordinary columns is called a joint index, also known as a composite index.

When creating a joint index, pay attention to the column order, because joint indexes (x, y, z) and (z, y, x) behave differently when used.

To use a joint index correctly, queries must follow the leftmost matching principle: index columns are matched starting from the leftmost one.

For example, with a joint index on (a, b, c), the following query conditions can use the index:

  • where a=1;
  • where a=1 and b=2 and c=3;
  • where a=1 and b=2;

Note that thanks to the query optimizer, the order in which the columns appear in the WHERE clause does not matter.

However, the following conditions do not satisfy the leftmost matching principle, so they cannot use the joint index and the index fails:

  • where b=2;
  • where c=3;
  • where b=2 and c=3;

There is a special case: does where a = 1 and c = 3 satisfy leftmost matching?

Strictly speaking, this is index truncation: only column a can be matched through the index, and different MySQL versions handle the rest differently.

In MySQL 5.5, only the leading column a uses the index. For each primary key value found in the joint index, MySQL goes back to the table, reads the row from the primary key index, and then compares the value of column c.

Since MySQL 5.6, index condition pushdown lets the engine evaluate conditions on columns contained in the index while traversing it. Records that don't match are filtered out directly, which reduces the number of lookups back to the table.

The general idea is that the condition on the truncated column is pushed down to the storage engine layer for evaluation (possible because the value of column c is stored in the (a, b, c) joint index). Only the qualifying entries are then returned to the server layer. Since much of the data is filtered out in the engine layer, there is no need to read full rows from the table just to test them. This reduces lookups back to the table and improves performance.

For example, for a statement with where a = 1 and c = 0, Extra=Using index condition in the execution plan shows that index condition pushdown is being used.
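The effect of index condition pushdown on the number of table lookups can be sketched in Python. A sorted list of (a, b, c, id) tuples models the joint index and a dict models the clustered index. Both are hypothetical data. Only column a can be used to seek. The sketch counts how many times each strategy has to go back to the table for where a = 1 and c = 3.

```python
from bisect import bisect_left

# Hypothetical joint index on (a, b, c): sorted (a, b, c, id) entries.
INDEX = sorted([(1, 1, 0, 10), (1, 2, 3, 11), (1, 3, 0, 12),
                (1, 4, 3, 13), (2, 1, 3, 14), (2, 5, 0, 15)])
# Hypothetical clustered index: id -> full row.
TABLE = {pk: {"a": a, "b": b, "c": c} for a, b, c, pk in INDEX}

def entries_for_a(a):
    """Only column a can be used to seek: yield the run where a matches."""
    i = bisect_left(INDEX, (a,))
    while i < len(INDEX) and INDEX[i][0] == a:
        yield INDEX[i]
        i += 1

def without_icp(a, c):
    """MySQL 5.5 behavior: go back to the table for every a match,
    then test c on the full row."""
    result, lookups = [], 0
    for *_, pk in entries_for_a(a):
        lookups += 1
        if TABLE[pk]["c"] == c:
            result.append(pk)
    return result, lookups

def with_icp(a, c):
    """Index condition pushdown: c is tested on the index entry itself,
    so only entries that pass trigger a lookup back to the table."""
    result, lookups = [], 0
    for _, _, entry_c, pk in entries_for_a(a):
        if entry_c != c:
            continue  # filtered inside the index, no table access
        lookups += 1  # the full row would be read from TABLE here
        result.append(pk)
    return result, lookups

print(without_icp(1, 3))  # ([11, 13], 4)
print(with_icp(1, 3))     # ([11, 13], 2)
```

Both strategies return the same rows. With pushdown, half of the lookups back to the table are skipped in this toy data.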

Why does a joint index fail when the leftmost matching principle is not followed?

Because in a joint index, entries are sorted by the first column, and entries are sorted by the second column only when their first-column values are equal.

So to use as many columns of a joint index as possible, the query conditions must cover consecutive columns starting from the leftmost one. If we search only by the second column, the index cannot be used.
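The sort order described above can be checked with a small Python model of a joint index on (a, b, c). Python's tuple ordering sorts by a, then by b, then by c, much like the index does. The data is hypothetical. A leftmost prefix such as (a,) or (a, b) maps to one contiguous run that a binary search can find. Column b on its own is not sorted across the whole index.

```python
from bisect import bisect_left

# Hypothetical joint index on (a, b, c): entries sorted by a, then b, then c.
INDEX = sorted([(2, 1, 1), (1, 2, 3), (1, 2, 1), (3, 2, 3), (1, 1, 9), (2, 2, 3)])

def seek_prefix(index, prefix):
    """Use the index for a leftmost prefix such as (a,) or (a, b):
    binary-search to the first match, then read consecutive entries."""
    n = len(prefix)
    i = bisect_left(index, prefix)
    out = []
    while i < len(index) and index[i][:n] == prefix:
        out.append(index[i])
        i += 1
    return out

def positions_of_b(index, b):
    """Where do entries with a given b sit? Not in one contiguous run."""
    return [i for i, entry in enumerate(index) if entry[1] == b]

print(seek_prefix(INDEX, (1,)))    # [(1, 1, 9), (1, 2, 1), (1, 2, 3)]
print(seek_prefix(INDEX, (1, 2)))  # [(1, 2, 1), (1, 2, 3)]
print([e[1] for e in INDEX])       # [1, 2, 2, 1, 2, 2]
print(positions_of_b(INDEX, 2))    # [1, 2, 4, 5]
```

The b values in index order go up and down, so a condition on b alone has no single place to start a seek. Only a full scan can find every match.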

OR in the WHERE clause

In the WHERE clause, if the column in the condition before OR is indexed but the column in the condition after OR is not, the index fails.

For example, in the following query, id is the primary key and age is an ordinary column. The execution plan shows a full table scan.

select * from t_user where id = 1 or age = 18;

This is because OR means a row qualifies if either condition holds. An index on only one of the columns doesn't help, since rows that match the unindexed condition could be anywhere in the table. As soon as one of the columns joined by OR is not indexed, a full table scan is performed.

The fix is simple: add an index on the age column.

The execution plan then shows type=index_merge. Index merge means MySQL scans the id and age indexes separately and then merges the two result sets, which avoids a full table scan.
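The union step of an index merge can be sketched in Python. This is a rough conceptual model, not MySQL's actual index-merge algorithm. Each index is searched on its own, and the two sets of primary keys are unioned. The rows are hypothetical. If age had no index, `ids_from_age` would itself need a full scan, so using the primary key for the id side would save nothing.

```python
from bisect import bisect_left

# Hypothetical t_user rows: id -> age.
ROWS = {1: 30, 2: 18, 3: 25, 4: 18, 5: 40}

# Secondary index on age: sorted (age, id) pairs.
AGE_INDEX = sorted((age, pk) for pk, age in ROWS.items())

def ids_from_pk(pk):
    """where id = pk, answered by the primary key index."""
    return {pk} if pk in ROWS else set()

def ids_from_age(age):
    """where age = value, answered by the age index: seek, then read the run."""
    i = bisect_left(AGE_INDEX, (age,))
    found = set()
    while i < len(AGE_INDEX) and AGE_INDEX[i][0] == age:
        found.add(AGE_INDEX[i][1])
        i += 1
    return found

def index_merge_union(pk, age):
    """where id = pk or age = value: union the two index results."""
    return sorted(ids_from_pk(pk) | ids_from_age(age))

print(index_merge_union(1, 18))  # [1, 2, 4]
```

The union gives the same ids as checking `id = 1 or age = 18` against every row, but it only touches the matching index entries.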

Summary

Today I introduced six situations in which an index fails:

  • When we use left or left-and-right fuzzy matching, that is, like '%xx' or like '%xx%', both forms cause the index to fail;
  • When we apply a function to an index column in the query condition, the index fails;
  • When we perform an expression calculation on an index column in the query condition, the index cannot be used;
  • When MySQL compares a string with a number, it automatically converts the string to a number first. If the index column is a string and the query passes a number, the index column undergoes implicit type conversion. Because this conversion works like applying CAST to the column, it is equivalent to using a function on the index column, so the index fails;
  • To use a joint index correctly, queries must follow the leftmost matching principle, matching index columns starting from the leftmost one; otherwise the index fails;
  • In the WHERE clause, if the column before OR is indexed but the column after OR is not, the index fails.

Finally, I leave a very interesting question for you to think about.

  • Question 1: A table has multiple columns. name is indexed, the other columns are not, and id is an auto-increment primary key.
  • Question 2: A table has only 2 columns: name, which is indexed, and id, an auto-increment primary key.

For the above two tables, execute the following query statements respectively:

  • select * from s where name like "xxx"
  • select * from s where name like "xxx%"
  • select * from s where name like "%xxx"
  • select * from s where name like "%xxx%"

For the tables in Question 1 and Question 2, which of these queries use the index and which do not?

Original link:
https://mp.weixin.qq.com/s/lEx6iRRP3MbwJ82Xwp675w

Author: Xiaolin coding

If you found this article helpful, feel free to share it and follow for more.


Origin blog.csdn.net/m0_67645544/article/details/124429552