[MySQL] The leftmost-prefix principle of indexes and index condition pushdown optimization

Table of Contents

I. Introduction

II. Covering indexes

A preliminary question: how is a composite index stored underneath, and how is it searched?

III. The leftmost-prefix principle

Definition of the leftmost-prefix principle

IV. Index condition pushdown

V. Summary


I. Introduction

Before starting, one concept should be clear: every node of the clustered index's B+ tree is an index page, and how many index values one page holds is decided by the degree fixed in advance for the tree.

 

Non-leaf nodes have only an index area (index entries); that is, they store only index data.

Leaf nodes have both an index area and a data area (data items). The index area stores the index values, while the content of the data area differs between the primary key index tree and a secondary index tree:

  • In an InnoDB table, the data area of a primary key index leaf node stores the entire row.
  • In an InnoDB table, the data area of a secondary index leaf node stores the primary key value of the corresponding row.

 

It is easy to see that the nodes of a B+ tree may store index values redundantly, but because index values occupy little space, this redundancy has little impact.

 

Now consider this question:

In table T below, if I execute select * from T where k between 3 and 5, how many tree searches are performed, and how many rows are scanned?

 

The following statements create and initialize the table.

mysql> create table T (
ID int primary key,
k int NOT NULL DEFAULT 0, 
s varchar(16) NOT NULL DEFAULT '',
index k(k))
engine=InnoDB;

insert into T values(100,1,'aa'),(200,2,'bb'),(300,3,'cc'),(500,5,'ee'),(600,6,'ff'),(700,7,'gg');

 

Figure: organization of the InnoDB indexes

Now let's walk through how this SQL query is executed:

  1. Search the k index tree for the record with k = 3 and obtain ID = 300;
  2. Look up ID = 300 in the ID index tree and find the corresponding row R3;
  3. Fetch the next value in the k index tree, k = 5, and obtain ID = 500;
  4. Go back to the ID index tree and look up ID = 500 to find the corresponding row R4;
  5. Fetch the next value in the k index tree, k = 6, which does not satisfy the condition, so the loop ends.

 

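Because the leaf nodes of a B+ tree are linked together in order by pointers, once the k index tree has located the leaf entry for 3, the scan simply follows the successor pointer to the entry for 5, and so on. The query ends as soon as an entry's k value falls outside the range given in the query condition.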

 

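In this process, going back to the primary key index tree to search is called going back to the table. As you can see, this query read 3 records from the k index tree (steps 1, 3 and 5) and went back to the table twice (steps 2 and 4).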
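The five steps above can be sketched in a few lines of Python. This is only a simulation under simplifying assumptions: a dict stands in for the primary key index tree, a sorted list of (k, ID) pairs stands in for the leaf level of the k index tree, and the function name select_star_between is made up for illustration.

```python
from bisect import bisect_left

# Rows of table T keyed by primary key ID (stand-in for the primary key index tree).
primary_index = {
    100: (100, 1, "aa"), 200: (200, 2, "bb"), 300: (300, 3, "cc"),
    500: (500, 5, "ee"), 600: (600, 6, "ff"), 700: (700, 7, "gg"),
}
# Secondary index k: sorted (k, ID) entries (stand-in for the leaf level of the k tree).
k_index = sorted((row[1], row[0]) for row in primary_index.values())


def select_star_between(low, high):
    """Simulate: select * from T where k between low and high."""
    index_reads = 0
    lookups = 0
    rows = []
    pos = bisect_left(k_index, (low,))       # locate the first entry with k >= low
    while pos < len(k_index):
        k, row_id = k_index[pos]
        index_reads += 1                      # one entry read from the k index
        if k > high:                          # first entry outside the range ends the loop
            break
        rows.append(primary_index[row_id])    # back to the table via the primary key
        lookups += 1
        pos += 1                              # follow the "successor pointer"
    return rows, index_reads, lookups


rows, index_reads, lookups = select_star_between(3, 5)
print(rows)                  # [(300, 3, 'cc'), (500, 5, 'ee')]
print(index_reads, lookups)  # 3 2
```

The counters match the walk-through: three entries read from the k index and two trips back to the table.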

 

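In this example, the data required by the query result exists only in the primary key index, so going back to the table is unavoidable. Is it possible to optimize the index so that the back-to-table step is avoided? This is where the concept of the covering index comes in.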

 

II. Covering indexes

If the statement is select ID from T where k between 3 and 5, only the ID value is needed, and the ID value is already in the data area of the leaf nodes of the k index tree (a secondary index tree stores the corresponding primary key value). The index can therefore provide the query result directly, with no need to go back to the table. In other words, in this query the index k already "covers" what the query needs, and we call it a covering index.

 

Because a covering index reduces the number of tree searches and significantly improves query performance, using covering indexes is a common performance optimization technique.

 

Note that inside the engine, using the covering index on k actually reads three records, R3 to R5 (the corresponding entries on index k). For the MySQL Server layer, however, it asked the engine for records and received two, so MySQL considers the number of rows scanned to be 2.
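To make the difference between "entries read by the engine" and "rows seen by the server" concrete, here is a small Python sketch of the covering-index query. It is a simulation only: the sorted list stands in for the k index of table T, and select_id_between is a made-up name.

```python
from bisect import bisect_left

# Secondary index k of table T: each entry holds (k, ID), kept in index order.
k_index = [(1, 100), (2, 200), (3, 300), (5, 500), (6, 600), (7, 700)]


def select_id_between(low, high):
    """Simulate: select ID from T where k between low and high."""
    entries_read = 0
    ids = []
    pos = bisect_left(k_index, (low,))   # first entry with k >= low
    while pos < len(k_index):
        k, row_id = k_index[pos]
        entries_read += 1                # the engine touches this entry
        if k > high:
            break
        ids.append(row_id)               # ID is already in the entry: no back-to-table
        pos += 1
    return ids, entries_read


ids, entries_read = select_id_between(3, 5)
print(ids)           # [300, 500]
print(entries_read)  # 3, entries the engine read
print(len(ids))      # 2, rows handed to the server layer
```

No primary key lookup appears anywhere in the loop, which is the whole point of a covering index.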

 

Based on the description of covering indexes above, let's discuss a question: in a citizen information table, is it necessary to create a composite index on the ID card number and the name (creating a composite index is one way to obtain a covering index)?

 

Assume the citizen table is defined as follows:

CREATE TABLE `tuser` (
  `id` int(11) NOT NULL,
  `id_card` varchar(32) DEFAULT NULL,
  `name` varchar(32) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `ismale` tinyint(1) DEFAULT NULL,
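  PRIMARY KEY (`id`), -- the primary key is a separate id rather than the ID card number, because the ID card number is long and would take too much space as the primary key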
  KEY `id_card` (`id_card`),
  KEY `name_age` (`name`,`age`)
) ENGINE=InnoDB

 

We know that the ID card number uniquely identifies a citizen. So if the requirement is to look up a citizen's information by ID card number, an index on the id_card field alone is enough: the lookup first finds the primary key id for that ID card number in the id_card index tree, then uses that id in the primary key index tree to find the citizen's row, so only two B+ tree searches are needed. Would adding a composite index on (ID card number, name) then be a waste of space? With such an index, a query that needs only those fields can be answered from the index entry itself without going back to the primary key index tree, but every additional index means another B+ tree and the data stored inside it.

 

If there is a high-frequency request that looks up a citizen's name by ID card number, this composite index makes sense. The request can use it as a covering index (the fields in the index entry cover the fields being queried), so the name can be read directly from the secondary index tree. Although the data area of a secondary index leaf node still stores only the corresponding primary key, the index area stores the (ID card number, name) entry, so the required data can be taken from the index area without going back to the table for the whole row, which reduces the execution time of the statement.
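As a rough way to reason about whether an index covers a query, the check below compares the columns a query needs with the columns an index entry carries. It is a sketch only: is_covering is a hypothetical helper, not a MySQL function, and it relies on the point made earlier that a secondary index entry also carries the primary key.

```python
def is_covering(index_columns, primary_key, needed_columns):
    """True if every needed column is in the index entry or in the primary key it carries."""
    available = set(index_columns) | set(primary_key)
    return set(needed_columns) <= available


pk = ["id"]

# select name from tuser where id_card = ?  needs id_card and name
print(is_covering(["id_card"], pk, ["id_card", "name"]))          # False
print(is_covering(["id_card", "name"], pk, ["id_card", "name"]))  # True

# select * from tuser where id_card = ?  needs every column
all_columns = ["id", "id_card", "name", "age", "ismale"]
print(is_covering(["id_card", "name"], pk, all_columns))          # False
```

With only the id_card index, the name lookup has to go back to the table; with (id_card, name) it does not, while select * still does.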

 

Of course, maintaining indexed fields always has a cost (see index maintenance for details). So creating a redundant index to support a covering index is a trade-off that has to be weighed. That is the job of the business DBA (database administrator) or the business data architect.

 

 

Before going on, a preliminary question: how is a composite index stored underneath, and how is it searched?

A composite index builds one B+ tree from several fields. The fields are placed into the index area of every node in the order in which the index defines them, as shown in the figure above. When the tree is searched, the fields are compared from left to right in that order (from top to bottom as drawn in the figure).

 

Taking the figure above as an example, the search falls into three cases:

  1. The first field of the search value differs from the first field of the index entry
  2. The first field of the search value equals the first field of the index entry
  3. The first two fields of the search value equal the first two fields of the index entry

 

  • Case 1: the where condition searches for (10003, XXX, XXX). Starting from the root, the search finds that the value is larger than 10002 and smaller than the next entry, 10004, so it descends directly into the data page on the right without comparing the two later fields of the index.
  • Case 2: the where condition searches for (10001, Assistant). As in case 1, the first field is compared first and the search goes into the data page on the left. There the first field of all three entries equals the first field of the search value, so the second field is compared next; the leftmost entry matches, so that entry is the result. The index area has three fields but the search supplies only two, so the third field is ignored. This is the leftmost-prefix principle, discussed below, at work.
  • Case 3: the where condition searches for (10003, Staff, XXX). The first field is compared as in case 1 and the search goes into the data page on the right; the second field is compared as in case 2 and found equal; because the search value has one more field, the third index field is then compared as well.
  • Additional case: the where condition searches for (10003) only. The search follows case 1 into the data page on the right and finds that the first field of every entry in that page is 10003. Because the search supplies only one field, both entries in that page are returned. This is also an application of the leftmost-prefix principle.
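The left-to-right comparison in these cases behaves like tuple comparison in Python, which gives a compact way to experiment with it. The sketch below is an illustration, not InnoDB's implementation: the entries are hypothetical three-field values (only 10001/Assistant and 10003/Staff come from the cases above), a sorted list stands in for the leaf level, and prefix_search is a made-up helper.

```python
from bisect import bisect_left

# Hypothetical entries of a three-field composite index, kept in index order.
index = [
    (10001, "Assistant", "1996-08-03"),
    (10001, "Engineer", "1990-01-01"),
    (10002, "Staff", "1996-08-03"),
    (10003, "Senior Staff", "1995-12-03"),
    (10003, "Staff", "1989-09-12"),
    (10004, "Engineer", "1986-12-01"),
]


def prefix_search(entries, prefix):
    """Return all entries whose leftmost len(prefix) fields equal prefix."""
    n = len(prefix)
    pos = bisect_left(entries, prefix)   # tuples compare field by field, left to right
    matches = []
    while pos < len(entries) and entries[pos][:n] == prefix:
        matches.append(entries[pos])
        pos += 1
    return matches


print(prefix_search(index, (10003,)))                # both 10003 entries
print(prefix_search(index, (10001, "Assistant")))    # one entry, third field ignored
print(prefix_search(index, (10003, "Staff", "1989-09-12")))  # all three fields compared
```

Searching with one, two, or three leading fields all use the same ordered structure; only the number of compared fields changes.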

 

III. The leftmost-prefix principle

At this point you may wonder: if an index is designed for every query, won't there be too many indexes? What if I want to look up a citizen's home address by ID card number? This query is not common in the business, but we can't let it do a full table scan either. On the other hand, creating a separate (ID card number, address) index for an infrequent request feels wasteful. What should we do?

 

Here is the conclusion first: a B+ tree index can use the "leftmost prefix" of the index to locate records.

 

To illustrate this concept, let's analyze the (name, age) composite index.

Figure: the (name, age) index

As you can see, the index entries are sorted by the fields in the order in which they appear in the index definition.

 

When the requirement is to find everyone whose name is "Zhang San", the index can quickly locate ID4 and then iterate forward to get all the results you need.

 

If you want everyone whose name starts with "Zhang", the condition in your SQL statement is "where name like 'Zhang%'". This can also use the index: the first matching record found is ID3, and the scan then iterates forward until the condition is no longer satisfied.

 

This is how the leftmost-prefix principle speeds up retrieval, and every leftmost-prefix search works this way. Suppose the search supplies fewer fields than the index has, for example only the name "Zhang San". If only one entry in the index tree matches, that entry is returned directly. If several entries match, all of them are returned: the search locates the first "Zhang San" entry and, because the index is ordered, the other "Zhang San" entries are stored right after it, so the scan moves to the right until it reaches an entry that does not match. Each matching entry then goes back to the table, one by one.
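Both lookups described above, the exact name and the name prefix, can be tried out with a sorted list. The sketch assumes sample entries that are merely consistent with the text (the first "Zhang..." entry is ID3, the first "Zhang San" entry is ID4); romanized names stand in for the names in the figure, and scan_from is a made-up helper.

```python
from bisect import bisect_left

# Assumed sample entries of the (name, age) index: (name, age, id), in index order.
name_age_index = [
    ("Li Si", 20, 1),
    ("Wang Wu", 10, 2),
    ("Zhang Liu", 30, 3),
    ("Zhang San", 10, 4),
    ("Zhang San", 10, 5),
    ("Zhang San", 20, 6),
]


def scan_from(start_key, still_matches):
    """Locate the first entry >= start_key, then walk right while names still match."""
    pos = bisect_left(name_age_index, (start_key,))
    ids = []
    while pos < len(name_age_index) and still_matches(name_age_index[pos][0]):
        ids.append(name_age_index[pos][2])
        pos += 1
    return ids


# where name = 'Zhang San'
print(scan_from("Zhang San", lambda name: name == "Zhang San"))   # [4, 5, 6]
# where name like 'Zhang%'
print(scan_from("Zhang", lambda name: name.startswith("Zhang")))  # [3, 4, 5, 6]
```

In both cases one positioning step is followed by a short walk to the right, which is only possible because matching entries sit next to each other in index order.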

 

Definition of the leftmost-prefix principle:

As you can see, a query does not have to match the full index definition; as long as it matches a leftmost prefix, the index can be used to speed up retrieval. The leftmost prefix can be the leftmost N fields of a composite index, or the leftmost M characters of a string index. If the N fields you want to query are the leftmost N fields of a composite index (the index may contain more fields than these N, but their order must match), you do not need to create a separate index on those N fields; the existing composite index can be used directly and has the same effect.

 

A pattern such as '%23', which fixes only the trailing characters, does not satisfy the leftmost-prefix principle, so it cannot use the index to locate records and the entries have to be traversed one by one. Index comparison proceeds from left to right, and when the pattern starts with a percent sign there is nothing to compare first.

 

In some cases, though, a suffix match of this kind is exactly what the business needs.

For example, a table stores URLs in a url column:

www.baidu.com

www.360.com

www.null.xyz

Suppose I now want all domains with the com suffix. The query needs like '%com', which cannot use the index and can therefore be very slow.

 

The solution is to store the domain data reversed:

moc.udiab.www

The query then becomes like 'moc%' (the suffix is reversed as well), which can use the index. This is a handy trick when working with databases.
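The reversal trick can be checked with the three URLs above. This is a simulation under stated assumptions: a sorted Python list stands in for an index on the reversed column, and find_by_suffix is a made-up helper that counts how many index entries it has to read.

```python
from bisect import bisect_left

urls = ["www.baidu.com", "www.360.com", "www.null.xyz"]

# Index on the reversed value: a suffix of the URL becomes a prefix of the key.
reversed_index = sorted(url[::-1] for url in urls)


def find_by_suffix(suffix):
    """Find URLs ending with suffix via a prefix scan on the reversed index."""
    prefix = suffix[::-1]                            # 'com' -> 'moc'
    pos = bisect_left(reversed_index, prefix)        # jump to the first candidate
    entries_read = 0
    matches = []
    while pos < len(reversed_index):
        entries_read += 1
        if not reversed_index[pos].startswith(prefix):
            break                                    # ordered index: nothing further can match
        matches.append(reversed_index[pos][::-1])    # reverse back for display
        pos += 1
    return matches, entries_read


print(find_by_suffix("com"))  # (['www.360.com', 'www.baidu.com'], 3)
print(find_by_suffix("xyz"))  # (['www.null.xyz'], 1)
```

A scan for like '%xyz' on the original column would have to inspect all three values; on the reversed index it reads a single entry. The price is that the application has to reverse values on write and reverse the pattern on read.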

 

Based on the description of the leftmost prefix above, let's discuss a question: when creating a composite index, how should the fields in the index be ordered?

 

The criterion here is how reusable the index is. Because the leftmost prefix is supported, once there is a composite index on (a, b) there is generally no need for a separate index on a. So the first principle is: if adjusting the order lets you maintain one fewer index, that order is usually the one to prefer.

 

Now you know the answer to the question at the start of this section. We create the (ID card number, name) composite index for the high-frequency request. "Look up the address by ID card number" is not a high-frequency need, so there is no need to maintain an (ID card number, address) composite index for it. To make that lookup fast, an index whose leftmost field is the ID card number is still required; otherwise MySQL has to traverse the entire B+ tree to find the content. This is where the leftmost-prefix principle comes in handy: the (ID card number, name) index created for the high-frequency request can double as an (ID card number) index and support "look up the address by ID card number" directly (the composite index quickly finds the primary key id for the ID card number, and the primary key index tree then quickly finds the leaf node for that id, which holds the whole row including the address).

 

What if there are queries on a and b together and also queries on a alone and on b alone? A statement whose condition contains only b cannot use the (a, b) composite index, because it does not satisfy the leftmost-prefix principle. In that case you have to maintain another index, that is, both (a, b) and (b).
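The rule for which conditions an (a, b) index can serve is easy to state as code. The helper below is a deliberately simplified sketch: usable_prefix_length is a made-up name, it considers equality conditions only, and it ignores everything else an optimizer weighs.

```python
def usable_prefix_length(index_columns, equality_columns):
    """Count the leading index columns that the equality conditions can use."""
    used = 0
    for column in index_columns:
        if column not in equality_columns:
            break                 # the prefix stops at the first missing column
        used += 1
    return used


print(usable_prefix_length(["a", "b"], {"a", "b"}))  # 2: where a = ? and b = ?
print(usable_prefix_length(["a", "b"], {"a"}))       # 1: where a = ?
print(usable_prefix_length(["a", "b"], {"b"}))       # 0: where b = ? cannot position in (a, b)
print(usable_prefix_length(["b"], {"b"}))            # 1: the extra (b) index serves it
```

A result of 0 means the index cannot be used to locate records for that condition, which is why the b-only query needs its own (b) index.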

 

At this point the principle to consider is space. In the citizen table above, the name field is larger than the age field, so I suggest creating a (name, age) composite index and a single-field (age) index. Do not create an (age, name) composite index plus a (name) index: the effect is the same, but the latter pair stores the large name field twice and takes more space.
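A quick back-of-the-envelope calculation shows where the space difference comes from. The byte sizes below are assumptions chosen for illustration (a 12-byte average name is hypothetical, and real InnoDB entries also carry record headers), so only the comparison between the two options matters, not the absolute numbers.

```python
# Assumed per-field sizes in bytes; real entries depend on character set and row format.
NAME_BYTES = 12   # hypothetical average length of a stored name
AGE_BYTES = 4     # int
ID_BYTES = 4      # int primary key carried by every secondary index entry

SIZES = {"name": NAME_BYTES, "age": AGE_BYTES}


def bytes_per_row(indexes):
    """Sum the field bytes that one table row contributes to each index."""
    return sum(sum(SIZES[col] for col in cols) + ID_BYTES for cols in indexes)


option_a = bytes_per_row([("name", "age"), ("age",)])   # name stored once
option_b = bytes_per_row([("age", "name"), ("name",)])  # name stored twice
print(option_a, option_b)  # 28 36
```

Under these assumptions the second option costs one extra copy of name per row, and the gap grows with the length of the name field.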

 

 

IV. Index condition pushdown

The previous section covered the leftmost-prefix principle: a leftmost prefix can be used to locate records in an index. You may now ask: what happens to the parts of a condition that do not match the leftmost prefix?

 

Let's again use the (name, age) composite index of the citizen table as the example. Suppose the requirement is: retrieve "all boys whose name starts with Zhang and whose age is 10". The SQL statement is written like this:

mysql> select * from tuser where name like '张%' and age=10 and ismale=1;

 

You already know the prefix rule, so when this statement searches the index tree it can use only "Zhang", and finds ID3 as the first record that satisfies the condition. That is still fine, and certainly better than a full table scan.

 

And then? Then, of course, the other conditions are checked.

 

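Before MySQL 5.6, the only option was to go back to the table for each record starting from ID3: find the row in the primary key index and then compare the field values. The index condition pushdown (ICP) optimization introduced in MySQL 5.6 can, during the index traversal, first evaluate the conditions on fields that the index contains and filter out records that do not satisfy them, reducing the number of trips back to the table.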

 

Figures 1 and 2 are the execution flowcharts of these two processes.

Figure 1: execution flow without index condition pushdown

Figure 2: execution flow with index condition pushdown

In Figures 1 and 2, each dashed arrow represents one trip back to the table.

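In Figure 1, I deliberately left out the age values in the (name, age) index: in this process InnoDB does not look at age at all, and simply takes the records whose "name starts with 'Zhang'" out one by one, in order, and goes back to the table for each. So it has to go back to the table 4 times.

The difference in Figure 2 is that InnoDB checks whether age equals 10 inside the (name, age) index and skips the records whose age is not 10 right there. In our example, only the records ID4 and ID5 need to go back to the table to fetch and check data, so only 2 trips back to the table are needed.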
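The two flows can be compared side by side with a small simulation. The data is an assumption chosen to be consistent with the text (four entries whose name starts with "Zhang", of which ID4 and ID5 have age 10); the ismale values, the romanized names, and the query function are made up for illustration, and a dict stands in for the primary key index.

```python
from bisect import bisect_left

# Assumed sample rows: id -> (name, age, ismale).
rows = {
    1: ("Li Si", 20, 1),
    2: ("Wang Wu", 10, 1),
    3: ("Zhang Liu", 30, 1),
    4: ("Zhang San", 10, 1),
    5: ("Zhang San", 10, 0),
    6: ("Zhang San", 20, 1),
}
# (name, age) index entries with the primary key attached, in index order.
name_age_index = sorted((name, age, row_id) for row_id, (name, age, _) in rows.items())


def query(use_icp):
    """Simulate: where name like 'Zhang%' and age = 10 and ismale = 1."""
    lookups = 0
    result = []
    pos = bisect_left(name_age_index, ("Zhang",))
    while pos < len(name_age_index) and name_age_index[pos][0].startswith("Zhang"):
        _name, age, row_id = name_age_index[pos]
        pos += 1
        if use_icp and age != 10:
            continue                       # filtered inside the index, no trip to the table
        _, row_age, ismale = rows[row_id]  # back to the table
        lookups += 1
        if row_age == 10 and ismale == 1:
            result.append(row_id)
    return result, lookups


print(query(use_icp=False))  # ([4], 4)
print(query(use_icp=True))   # ([4], 2)
```

Both variants return the same rows; the only difference is the number of trips back to the table, 4 without pushdown and 2 with it.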

 

V. Summary

In this article we continued the discussion of database index concepts, including covering indexes, prefix indexes, and index condition pushdown. As you can see, accessing as few resources as possible while still satisfying the statement is one of the important principles of database design. When we use a database, and especially when designing table structures, reducing resource consumption should likewise be the goal.




Reference: Lin Xiaobin, "MySQL 实战 45 讲" (a 45-lecture course on MySQL in practice)


Origin blog.csdn.net/cy973071263/article/details/104550117