Leftmost matching principle, clustered index, back-to-table query, covering index: do you really understand them?

A question

Suppose there is a table named test. Besides the primary key id, it has three columns: a, b, and c.

Assuming a composite index index_abc (a, b, c) is built on these three columns, which of the following queries will use index_abc?


1. Query 1

select * from test where a > 1000 and b > 1000;

2. Query 2

select * from test where a > 1000 and c > 1000;

3. Query 3

select * from test where b > 1000 and c > 1000;

This is a classic interview question, and it leads to several related ones: What is the leftmost matching principle? What is a clustered index? What is a covering index? What is a back-to-table query?

Let's test this together. The experiments below are based on MySQL 5.7 with InnoDB.

Leftmost matching principle

Continuing from the question above, back to the three queries: first of all, how do we know whether a query uses an index? Is there a command that can help us analyze a query statement? Of course there is: the explain command.

We run explain on each of the statements above and see what information comes back:

mysql> explain select * from test where a > 1000 and b > 1000;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | index_abc     | index_abc | 4       | NULL | 5060 |    33.33 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+

mysql> explain select * from test where a > 1000 and c > 1000;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | index_abc     | index_abc | 4       | NULL | 5060 |    33.33 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+

mysql> explain select * from test where b > 1000 and c > 1000;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | index | NULL          | index_abc | 12      | NULL | 10120 |    11.11 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+-------+----------+--------------------------+

 

We can see that explain returns 12 columns of information. The meaning of each column is as follows:

Column         Meaning
id             Query identifier
select_type    Query type
table          Table the output row refers to
partitions     Matching partitions
type           The join type; more precisely, the way the database engine looks up rows in the table
possible_keys  Indexes that could be chosen, though not necessarily the one actually used
key            Index actually chosen
key_len        Length in bytes of the chosen key
ref            Columns or constants compared to the index
rows           Estimated number of rows to be examined
filtered       Percentage of rows filtered by the table condition
Extra          Additional information

When analyzing an SQL statement, we usually focus only on type, possible_keys, key, and rows.

After running explain on the three queries, we find that:

  • where a > 1000 and b > 1000 and where a > 1000 and c > 1000 behave the same: type (the access method) is range, possible_keys lists index_abc, and key (the index actually used) is index_abc
  • for where b > 1000 and c > 1000, type is index, possible_keys is NULL, and key is index_abc
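
The key_len values in these outputs (4 for the two range plans, 12 for the index plan) can be read as byte counts. The sketch below reproduces the arithmetic under one assumption: that a, b, and c are all INT NOT NULL. The article does not show the table definition, so treat that as a guess. An INT key part takes 4 bytes, plus 1 byte if the column is nullable; the helper key_len is hypothetical and only handles INT columns.

```python
# Hypothetical helper: estimate EXPLAIN's key_len for INT key parts.
# Assumption: every indexed column is INT (4 bytes); a nullable column
# adds 1 byte for the NULL flag.
INT_BYTES = 4

def key_len(used_columns, nullable=()):
    """Return the byte length of the index prefix made of used_columns."""
    return sum(INT_BYTES + (1 if col in nullable else 0) for col in used_columns)

print(key_len(["a"]))            # only the leading column bounds the range
print(key_len(["a", "b", "c"]))  # the whole index entry is read
```

Read this way, a key_len of 4 suggests that in both range plans only column a bounds the range scan, while the conditions on b or c are checked against the index entries afterwards.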


What is the difference between range and index above?

  • range : Only rows in a given range are retrieved, using the index to select them
  • index : The index join type is the same as ALL, except that the index tree is scanned instead of the table. There are two cases:
    • If the index is a covering index for the query (described later) and can satisfy all the data required from the table, only the index tree is scanned. In this case the Extra column shows Using index. An index-only scan is usually faster than ALL, because the index is usually smaller than the table data
    • A full table scan is performed by reading from the index to look up data rows in index order. In this case Using index does not appear in the Extra column; that is, if the index is not a covering index, Extra does not show Using index

In other words,

range uses the index and can descend the index tree to find the start of the range quickly, then read only that range. index has no such starting point and can only scan the entire index tree.
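
The difference can be sketched with a sorted Python list standing in for the leaf level of index_abc. This is a toy model, not how InnoDB is implemented; the sample entries and the functions range_scan and index_scan are made up for illustration.

```python
import bisect

# A toy stand-in for the leaf level of index_abc: (a, b, c) entries kept sorted.
entries = sorted([(500, 2000, 1), (1001, 10, 3000), (1001, 1500, 7),
                  (1200, 900, 2500), (3000, 1100, 1100)])

def range_scan(lower_a):
    """range: binary-search to the first entry with a > lower_a, then read on."""
    start = bisect.bisect_right(entries, (lower_a, float("inf"), float("inf")))
    return entries[start:], len(entries) - start  # (rows, entries examined)

def index_scan(predicate):
    """index: no usable starting point, so every entry is examined."""
    return [e for e in entries if predicate(e)], len(entries)

rows_r, seen_r = range_scan(1000)
rows_i, seen_i = index_scan(lambda e: e[1] > 1000 and e[2] > 1000)
print(seen_r, seen_i)  # entries examined by each access method
```

The range scan skips everything before its starting point, while the index scan touches every entry no matter how few of them match.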

ALL was also mentioned above, so what are the more common values of type? They are listed below (for the other values, refer to the official documentation):

  • system : The table has only one row (= system table). This is a special case of the const join type
  • const : The row is found with a single index lookup. Because at most one row matches, it is very fast. If you put the primary key in the where list, MySQL can convert the query to a constant table with at most one matching row and read that row at the start of the query. Since there is only one row, the rest of the optimizer can treat the values in its columns as constants. const is used when all parts of a primary key or UNIQUE index are compared to constant values
  • eq_ref : Unique index scan. For each combination of rows from the previous tables, one row is read from this table. Commonly seen with primary key or unique index lookups in joins. Apart from system and const, this is the best join type
  • ref : Non-unique index scan. For each combination of rows from the previous tables, all rows with matching index values are read from this table
  • ALL : The entire table is traversed to find matching rows

OK, back to the three queries above. Why is the type range (an index range scan) when the where condition is a > 1000 and b > 1000 or a > 1000 and c > 1000, but index when the where condition is b > 1000 and c > 1000? The answer lies in how the index tree (B+ tree) is built and stored.

So what does the B+ tree of a composite index look like? Look at the picture; a picture is worth a hundred words.

A composite index simply has more columns than a single-column index, and all of these index columns appear in the index tree. For a composite index, the storage engine first sorts by the first index column. In the figure above, the first index column reads, for example, 1 1 4 15 18 ...; it never decreases. Where the first column is equal, entries are sorted by the second column, and so on, which yields the index tree in the figure.

Taking the created index index_abc (a, b, c) as an example, as shown in the above figure, each node has three key values, corresponding to the three index columns a, b, and c from top to bottom.

When the index tree is built, it is ordered by the first column of the multi-column index first. Taking index_abc (a, b, c) as an example, entries are sorted by column a first; when the values of column a are equal, they are sorted by column b; and when the values of column b are also equal, they are sorted by column c.

Therefore the first index column, a, can be said to increase monotonically from left to right, but columns b and c do not have this property: they are ordered only within the small range where the preceding columns are equal. Look at the node in the lower left corner of the figure above to see this.

Key point: Because a composite index is built from left to right in the order the columns were given when the index was created (index_abc (a, b, c)), it must also be used from left to right: a query has to constrain the leading column before the later columns can narrow the search. This is the leftmost matching principle of indexes.

So now do you understand why the queries with where a > 1000 and b > 1000 and where a > 1000 and c > 1000 have type range, while the one with where b > 1000 and c > 1000 has type index?
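
The ordering described above is the same one Python uses when it sorts tuples, so it can be checked with a few made-up rows. This is only an illustration of the sort order, not of the B+ tree structure itself; the sample values are invented.

```python
# Toy rows (a, b, c); sorting tuples compares a first, then b, then c,
# which mirrors the order described for index_abc (a, b, c).
rows = [(4, 2, 9), (1, 7, 3), (18, 1, 1), (1, 5, 8), (15, 6, 2), (1, 5, 4)]
index_abc = sorted(rows)

def is_sorted(values):
    """True if the sequence never decreases."""
    return all(x <= y for x, y in zip(values, values[1:]))

a_values = [a for a, _, _ in index_abc]
b_values = [b for _, b, _ in index_abc]
b_where_a_is_1 = [b for a, b, _ in index_abc if a == 1]

print(index_abc)
print(is_sorted(a_values))        # a is ordered across the whole index
print(is_sorted(b_values))        # b is not ordered globally
print(is_sorted(b_where_a_is_1))  # b is ordered among entries with equal a
```

Because b is only ordered inside a group of equal a values, a condition on b alone gives the engine no single place in the tree to start from.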

Back-to-table query, clustered index

As we all know, a characteristic of the B+ tree is that its leaf nodes store keys and data, while non-leaf nodes store only index keys. So in the B+ tree built for a composite index, what data do the leaf nodes store? Answer: the primary key value of the row.

Key point: Finding data through the composite index means first finding the primary key value (id) of the matching entry in the composite index's B+ tree (note: in MyISAM, index leaf nodes store a record pointer instead), then using that primary key value to search the primary key index tree (also a B+ tree) for the row record with that id (the leaf nodes of the primary key index tree store the keys together with the corresponding row data), at which point the search ends. This search process is called a back-to-table query.

Have you noticed that in these B+ trees, some leaf nodes store row records and others store primary key values?
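
The two-step search can be sketched with a dictionary standing in for the primary key tree and a sorted list standing in for the composite index. Everything here is a toy model: the sample rows, the extra column note (added so that the composite index alone cannot answer the query), and the function lookup are assumptions for illustration.

```python
# Toy model: the clustered (primary key) index maps id -> full row,
# and the secondary index stores (a, b, c) together with the primary key id.
clustered = {
    1: {"id": 1, "a": 1500, "b": 20, "c": 7, "note": "x"},
    2: {"id": 2, "a": 900, "b": 3000, "c": 1, "note": "y"},
    3: {"id": 3, "a": 2000, "b": 1200, "c": 5, "note": "z"},
}
secondary = sorted((r["a"], r["b"], r["c"], r["id"]) for r in clustered.values())

def lookup(min_a):
    """Find ids in the secondary index, then fetch each row by primary key."""
    ids = [pk for a, _, _, pk in secondary if a > min_a]  # step 1: secondary index
    return [clustered[pk] for pk in ids]                  # step 2: back to the table

for row in lookup(1000):
    print(row)
```

Step 2 is the extra cost: every matching entry triggers one more search in the primary key tree.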

Key points:

  • An index whose leaf nodes store row records is called a clustered index. An InnoDB table must have one, and only one, clustered index:
    • If a primary key is defined, the primary key index is the clustered index
    • If no primary key is defined, the first UNIQUE index whose columns are all NOT NULL is the clustered index
    • Otherwise, InnoDB creates a hidden row id to serve as the clustered index
  • An index whose leaf nodes store primary key values is called an ordinary (secondary) index, also known as a non-clustered index
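
The three rules for picking the clustered index can be written as a short decision function. This is only a restatement of the list above; the function name, its parameters, and the returned labels are hypothetical and are not InnoDB identifiers.

```python
def choose_clustered_index(primary_key, unique_indexes):
    """Pick the clustered index following the three rules listed above.

    primary_key: tuple of column names, or None if no primary key is defined.
    unique_indexes: list of (name, all_columns_not_null) in definition order.
    """
    if primary_key is not None:
        return "PRIMARY"
    for name, all_not_null in unique_indexes:
        if all_not_null:
            return name  # first UNIQUE index with only NOT NULL columns
    return "hidden row id"

print(choose_clustered_index(("id",), []))
print(choose_clustered_index(None, [("uk_x", False), ("uk_y", True)]))
print(choose_clustered_index(None, [("uk_x", False)]))
```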


Covering index

Still using the example above, let's look again at the explain output for the query whose condition is b > 1000 and c > 1000:

mysql> explain select * from test where b > 1000 and c > 1000;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | index | NULL          | index_abc | 12      | NULL | 10120 |    11.11 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+-------+----------+--------------------------+

 

According to the leftmost matching principle we just discussed, this query should not be able to use index_abc effectively. So why does the key column (the index actually used) show index_abc? This is where the covering index comes in.

What is a covering index? A covering index means that the query can get all the data it needs from the index alone, without finding the primary key through the secondary index and then fetching the row (that is, without a back-to-table query).

This is not hard to understand. Our test table has only four columns, id, a, b, and c; (a, b, c) form the composite index, id is the primary key, and the leaf nodes of the composite index tree store the primary key value. So all the data requested by select * from test where b > 1000 and c > 1000 can be obtained from the composite index tree, with no need to go back to the table, and that is why an index is used here. Which index tree is it? index_abc, of course, because columns b and c are part of that composite index.

Why is the possible_keys column (indexes that might be used) NULL? Because no index starts with column b, so the optimizer finds no index it could use to look up rows for this condition.

Because a covering index is used, the Extra column also shows Using index.
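
Whether an index covers a query comes down to a set comparison: every column the query needs must be stored in the index entry. A minimal sketch, assuming the query's needed columns are known up front; the function is_covering and the extra column d are hypothetical.

```python
def is_covering(selected_columns, index_columns, primary_key_columns):
    """True if every selected column is stored in the secondary index entry.

    An InnoDB secondary index entry holds its own columns plus the primary key.
    """
    available = set(index_columns) | set(primary_key_columns)
    return set(selected_columns) <= available

# select * on the article's table means id, a, b, c
print(is_covering(["id", "a", "b", "c"], ["a", "b", "c"], ["id"]))
# a hypothetical extra column d would force a back-to-table lookup
print(is_covering(["id", "a", "b", "c", "d"], ["a", "b", "c"], ["id"]))
```

This is why select * is covered here only by coincidence: the table has no columns beyond the primary key and the indexed ones.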


Finally, why do a > 1000 and b > 1000 and b > 1000 and a > 1000 produce the same explain result?

mysql> explain select * from test where a > 1000 and b > 1000;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | index_abc     | index_abc | 4       | NULL | 5060 |    33.33 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+

mysql> explain select * from test where b > 1000 and a > 1000;
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | index_abc     | index_abc | 4       | NULL | 5060 |    33.33 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+------+----------+--------------------------+

This is where the MySQL query optimizer comes in. The optimizer works out the order in which the conditions of the SQL statement should be evaluated for the highest efficiency, and only then generates the actual execution plan.
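
One rough way to see why the written order does not matter: AND is commutative, so the where clause can be treated as a set of per-column conditions and matched against the index columns in index order. The sketch below is a toy rule, not MySQL's optimizer. The function access_type is hypothetical, it only distinguishes the two cases seen in this article, and its "index" result assumes the index covers the query as it does here.

```python
INDEX_ABC = ("a", "b", "c")

def access_type(where_columns, index_columns=INDEX_ABC):
    """Toy rule: 'range' if the leading index column is constrained, else 'index'.

    where_columns lists the columns with a range condition, in the order
    they were written in the query; that order is deliberately ignored.
    """
    constrained = set(where_columns)
    return "range" if index_columns[0] in constrained else "index"

print(access_type(["a", "b"]))  # a > 1000 and b > 1000
print(access_type(["b", "a"]))  # b > 1000 and a > 1000
print(access_type(["b", "c"]))  # b > 1000 and c > 1000
```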


That concludes our look at the leftmost matching principle, clustered indexes, back-to-table queries, and covering indexes.

If anything here is wrong, please point it out and let's discuss.

 

If you liked it, leave a like before you go. Thanks!

Origin www.cnblogs.com/CNYYGJ/p/12677690.html