This article walks you through MySQL's single-table access methods

Foreword

For us MySQL users, MySQL is really just a piece of software, and the feature we use most is querying. From time to time the DBA throws some slow queries our way to optimize. If we don't even know how a query is executed, how can we optimize it? So it's time to learn how things really work.

1. Data preparation

In the previous article we learned that MySQL Server has a module called the query optimizer. After a query statement is parsed, it is handed to the query optimizer, which produces a so-called execution plan. The execution plan specifies which indexes to use and in what order the tables are joined. Finally, the server follows the steps in the plan, calling the methods provided by the storage engine to actually execute the query, and returns the result to the user. Query optimization is a big topic, though, and you have to walk before you can run, so in this chapter we first look at how MySQL executes single-table queries (queries with only one table after the FROM clause, the simplest kind). One thing to stress: read the earlier parts on record structure, data page structure, and indexes before studying this chapter. If you have not yet mastered that material, this chapter is not for you.

To make the discussion concrete, we first create a table:

mysql> USE testdb;

mysql> create table demo8 (    
id int not null auto_increment,    
key1 varchar(100),    
key2 int,    
key3 varchar(100),    
key_part1 varchar(100),    
key_part2 varchar(100),    
key_part3 varchar(100),    
common_field varchar(100), 
primary key (id),
key idx_key1 (key1),    
unique key idx_key2 (key2),    
key idx_key3 (key3),    
key idx_key_part(key_part1, key_part2, key_part3));

A total of 1 clustered (primary key) index and 4 secondary indexes have been created for the demo8 table:

  • A clustered index created for the id column;
  • A secondary index created for the key1 column;
  • A unique secondary index created for the key2 column;
  • A secondary index created for the key3 column;
  • A composite (joint) secondary index created for the key_part1, key_part2, and key_part3 columns.

Next we insert 20,000 records into this table. As the procedure below shows, id is generated by auto_increment, key2 is set to i+1 (so the unique index is never violated), and every other column gets a random value.

mysql> delimiter //
create procedure demo8data()
begin    
	declare i int;    
	set i=0;    
	while i<20000 do        
		insert into demo8(key1,key2,key3,key_part1,key_part2,key_part3,common_field) values(substring(md5(rand()),1,2),i+1,substring(md5(rand()),1,3),substring(md5(rand()),1,4),substring(md5(rand()),1,5),substring(md5(rand()),1,6),substring(md5(rand()),1,7));        
		set i=i+1;    
	end while;
end;
//
delimiter ;

mysql> call demo8data();
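
Before moving on, it can be worth confirming that the data loaded as intended. The statements below are a minimal sanity check, and the expected values in the comments assume that demo8data() was called exactly once on an empty demo8 table.

```sql
-- Row count; 20000 is expected if the procedure ran once on an empty table.
SELECT COUNT(*) FROM demo8;

-- key2 is set to i + 1 inside the loop, so it should span 1 to 20000.
SELECT MIN(key2), MAX(key2) FROM demo8;

-- Shows the indexes on the table: PRIMARY, idx_key1, idx_key2, idx_key3, idx_key_part.
SHOW INDEX FROM demo8;
```

If the counts differ, the procedure was probably called more than once, and the row counts quoted later in this article will not match yours.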

2. Single table access method

2.1 The concept of access method

We all looked words up in a dictionary in elementary school. When we meet a character we have never seen, we usually go to the dictionary's index, find the entry by pinyin or radical, get the page number, and turn straight to that page to read the definition. Alternatively, if we have plenty of time to spare, we can leaf through the several-hundred-page dictionary from the first page to the last and eventually find the character. Both ways produce the result we want, but the time and effort involved differ enormously. The same holds for MySQL: B+ tree indexes exist to speed up queries, but sometimes, for example when we want all the data, MySQL has to go through every data page, record by record and page by page.

For the query of a single table, the execution methods of MySQL query are roughly divided into the following two types:

2.1.1 Query using full table scan

This is the approach just described: traverse the records one by one, page by page, and add every record that satisfies the search conditions to the result set. It always works, but it is generally the least efficient option.

2.1.2 Querying with Indexes

Executing a query with a full table scan requires traversing many records, so the cost can be very high. Just as we look up characters through the dictionary index, if the search conditions in a query can use an index, executing the query through that index may be much faster. The ways of using an index to execute a query fall into several categories:

  • Equivalence queries against primary keys or unique secondary indexes
  • Equivalence queries against common secondary indexes
  • Query by range of indexed columns
  • By directly scanning the entire index

Summary

The way MySQL executes a query statement is called an access method or access type. The same query statement may be executed using a variety of different access methods. Although the final query results are the same, the execution efficiency varies widely.


2.2 const

Locate a record by its primary key column:

mysql> select * from demo8 where id=9999;
+------+------+------+------+-----------+-----------+-----------+--------------+
| id   | key1 | key2 | key3 | key_part1 | key_part2 | key_part3 | common_field |
+------+------+------+------+-----------+-----------+-----------+--------------+
| 9999 | 34   | 9999 | 6c7  | 1823      | 24955     | 5deed4    | 3aebe82      |
+------+------+------+------+-----------+-----------+-----------+--------------+
1 row in set (0.00 sec)

Recall the structure of the clustered index B+ tree from the article on MySQL B+ tree indexes: the leaf nodes store complete records, sorted by the value of the primary key column id in ascending order. A B+ tree is short and wide by nature, and the primary key is unique, ordered, and non-NULL, so locating a record by its primary key value is very fast. Locating a record through a unique secondary index column is similarly fast:

mysql> select * from demo8 where key2=8888;
+------+------+------+------+-----------+-----------+-----------+--------------+
| id   | key1 | key2 | key3 | key_part1 | key_part2 | key_part3 | common_field |
+------+------+------+------+-----------+-----------+-----------+--------------+
| 8888 | 12   | 8888 | fc9  | 7810      | 1c7ed     | 5d6dbc    | adbea8c      |
+------+------+------+------+-----------+-----------+-----------+--------------+
1 row in set (0.00 sec)

Based on the knowledge of previous articles, the execution of this query is divided into two steps:

  • The first step is to locate a secondary index record from the B+ tree index corresponding to idx_key2 according to the equivalence comparison condition between the key2 column and the constant.
  • In the second step, the complete user record is obtained from the clustered index according to the id value of the record.

MySQL can locate a record very quickly through an equality comparison between a constant and the primary key or a unique secondary index column, so it names this access method const: constant-level, with negligible cost.

However, the const access method applies only when the primary key column or unique secondary index column is compared for equality with a constant. If the primary key or unique secondary index consists of multiple columns, every column in the index must be compared for equality with a constant for const to apply (a single record can be pinned down only when all columns of the index are matched by equality).
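
One way to see which access method MySQL picks is to prefix a statement with EXPLAIN and read the type column of its output. The sketch below assumes the demo8 table created earlier; for both statements the type column would be expected to read const.

```sql
-- Equality on the primary key: one lookup in the clustered index.
EXPLAIN SELECT * FROM demo8 WHERE id = 9999;

-- Equality on the unique secondary index idx_key2,
-- followed by a single return to the table.
EXPLAIN SELECT * FROM demo8 WHERE key2 = 8888;
```

In the second case the key column of the output should name idx_key2, which matches the two-step description above.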

For a unique secondary index, searching for NULL in the indexed column is a special case:

mysql> select * from demo8 where key2 is null;

A unique secondary index does not limit the number of NULL values in its column, so the statement above may match multiple records and cannot be executed with the const access method. The access method it can use is covered in the next section.

2.3 ref

Ordinary secondary index columns are compared for equality with constants:

mysql> select * from demo8 where key1 = 'd9';
+-------+------+-------+------+-----------+-----------+-----------+--------------+
| id    | key1 | key2  | key3 | key_part1 | key_part2 | key_part3 | common_field |
+-------+------+-------+------+-----------+-----------+-----------+--------------+
|    59 | d9   |    59 | b7a  | 041b      | 5b1cf     | e342ac    | e103738      |
|   182 | d9   |   182 | 8bf  | 2b3c      | 08b7c     | a63ed0    | 40f2c52      |
|   401 | d9   |   401 | 137  | adbd      | dafba     | 581313    | ba72bf5      |
|  2114 | d9   |  2114 | d8b  | bf2a      | 117ae     | 69de3d    | f5467a5      |
|  2758 | d9   |  2758 | 2f7  | a159      | e3707     | f60f38    | 795ec06      |
|  3823 | d9   |  3823 | 682  | 347e      | f6195     | 6faa0d    | 5e55f78      |
|  4351 | d9   |  4351 | e92  | f3b4      | a159e     | d3e013    | e28ca48      |
|  5138 | d9   |  5138 | b41  | 1b10      | b9605     | cbe517    | b267144      |
|  5539 | d9   |  5539 | da1  | d30d      | c59b1     | 0c5d79    | ae57b7d      |
|  5604 | d9   |  5604 | ddc  | f0b0      | 00dbf     | f93c0e    | 3218cff      |
|  6050 | d9   |  6050 | 64e  | 22a4      | 69e69     | 7284ba    | 4a5b7d5      |
|  6147 | d9   |  6147 | 529  | cd38      | 71855     | 434168    | a426cbe      |
|  6428 | d9   |  6428 | a38  | c00f      | 9f710     | 9fb7c5    | a722b51      |
|  6456 | d9   |  6456 | eb7  | d208      | 30539     | ee3ae4    | a6c4870      |
|  6473 | d9   |  6473 | c15  | da29      | cc897     | 1e35c3    | 4b5f135      |
|  6860 | d9   |  6860 | b01  | bad8      | 4e3a9     | e83331    | 0cd3b9d      |
|  7257 | d9   |  7257 | fa6  | d537      | a4afe     | bbc5d8    | e11e937      |
|  7333 | d9   |  7333 | a97  | 5532      | 64097     | cb5d16    | 7e43077      |
|  7867 | d9   |  7867 | 3d7  | b341      | 0b0bb     | 7df721    | 8c64142      |
|  8265 | d9   |  8265 | a16  | 120b      | 9d372     | c17ce4    | c481ace      |
|  8371 | d9   |  8371 | 5be  | 7924      | 313f7     | 293487    | cb52072      |
|  8738 | d9   |  8738 | c05  | 7123      | b61b1     | c8d819    | e310cbf      |
|  9005 | d9   |  9005 | 44e  | e857      | 4075c     | 8460a0    | 409cb1d      |
|  9006 | d9   |  9006 | d8c  | 8c2b      | ed54f     | 3b8bfd    | 268fcce      |
|  9362 | d9   |  9362 | 5cb  | 9d70      | 05937     | 1b70d2    | a866a32      |
|  9449 | d9   |  9449 | ab8  | f9c6      | c1917     | 5ffe25    | ff88471      |
| 10146 | d9   | 10146 | 07f  | 31a7      | c30c4     | 7c2e48    | 6c5c562      |
| 10197 | d9   | 10197 | 85a  | 4796      | e5ff9     | d12af4    | 20be699      |
| 10223 | d9   | 10223 | 94b  | c57e      | adfb6     | b93c19    | a7c944b      |
| 10285 | d9   | 10285 | 9ab  | f33d      | 69e5c     | 35a651    | 0953db7      |
| 10621 | d9   | 10621 | a29  | a92b      | fbf80     | 83f1e2    | d167770      |
| 11133 | d9   | 11133 | 560  | 97af      | 35f38     | ceb1b9    | 6e89ca8      |
| 11265 | d9   | 11265 | fcc  | e7d7      | 0243e     | b52571    | 89ea417      |
| 11557 | d9   | 11557 | 1c0  | 7f66      | 0898f     | d41cfc    | e759975      |
| 11614 | d9   | 11614 | de5  | 00f7      | fb3b3     | 93dc1a    | bbe8993      |
| 11672 | d9   | 11672 | f3f  | 9fe4      | da2dd     | 82f711    | 436f3d4      |
| 12477 | d9   | 12477 | d58  | 0613      | 1df6d     | 40999a    | b748cd2      |
| 13789 | d9   | 13789 | 67b  | 5b30      | ab2f3     | 89f0ec    | 9e2d255      |
| 14050 | d9   | 14050 | 537  | bbdc      | 5e87e     | 4ac153    | 0346558      |
| 14363 | d9   | 14363 | 2ac  | 33f3      | e2b82     | 7e55c1    | 45ee579      |
| 14444 | d9   | 14444 | e47  | 6319      | 851b7     | 1d4c57    | e17a95b      |
| 14635 | d9   | 14635 | 16a  | 4d83      | 52b33     | 376017    | c853bc0      |
| 14646 | d9   | 14646 | 202  | 6fdd      | f2486     | 9900f3    | c29d0d6      |
| 15298 | d9   | 15298 | 074  | a7ee      | 6bc1d     | e96458    | 723b0f8      |
| 15489 | d9   | 15489 | 514  | 0bdc      | fb94c     | db5ce8    | 63797e8      |
| 16895 | d9   | 16895 | 4aa  | 921c      | 00b9e     | f07907    | bce779f      |
| 17587 | d9   | 17587 | 6aa  | 621b      | d521f     | a6c5ad    | 45fac89      |
| 18151 | d9   | 18151 | 87d  | cd74      | f7135     | 47d900    | 211303e      |
| 18255 | d9   | 18255 | 4dc  | b9e7      | 99bf2     | 55d0eb    | 3e6ce6c      |
| 18490 | d9   | 18490 | 6a2  | f0ff      | 85e86     | ed9bb8    | dca2cb4      |
| 18872 | d9   | 18872 | 404  | eeee      | 001c7     | 0e846d    | fae0876      |
| 19018 | d9   | 19018 | 142  | 80d7      | 2b9fd     | 77be32    | d6d8398      |
| 19228 | d9   | 19228 | a5b  | a125      | 795fa     | 108159    | 65acbf5      |
| 19537 | d9   | 19537 | 1ca  | 016a      | 3df13     | 3f5b9c    | 720de00      |
| 19940 | d9   | 19940 | 95c  | 6150      | 2696d     | 3f89b8    | d37d43a      |
| 19945 | d9   | 19945 | 538  | 6378      | a20a9     | 2b7b00    | 1865f1c      |
+-------+------+-------+------+-----------+-----------+-----------+--------------+
56 rows in set (0.00 sec)

An ordinary secondary index does not require its column values to be unique, so an equality comparison may match multiple records (56 rows in our example). In other words, the cost of executing the query through the secondary index depends on how many secondary index records match the value. If few records match, the cost of returning to the table is low, so MySQL may choose the index over a full table scan. The access method in which a secondary index column is compared for equality with a constant, and the query is executed through that secondary index, is called: ref.

Because an equality comparison on an ordinary secondary index may match several consecutive index records, rather than at most one as with a primary key or unique secondary index, ref is somewhat worse than const. It is still very efficient when few records match (if too many secondary index records match, the cost of returning to the table becomes too high).

Note the following cases:

  • The secondary index column is NULL: neither ordinary nor unique secondary indexes limit the number of NULL values in their columns, so a search condition of the form key IS NULL can use the ref access method at most, never const.
  • For a secondary index with multiple columns, the ref access method may be used as long as the leftmost consecutive index columns are each compared for equality with a constant. Taking idx_key_part(key_part1, key_part2, key_part3) as an example:
mysql> select * from demo8 where key_part1='6378';
+-------+------+-------+------+-----------+-----------+-----------+--------------+
| id    | key1 | key2  | key3 | key_part1 | key_part2 | key_part3 | common_field |
+-------+------+-------+------+-----------+-----------+-----------+--------------+
| 19945 | d9   | 19945 | 538  | 6378      | a20a9     | 2b7b00    | 1865f1c      |
+-------+------+-------+------+-----------+-----------+-----------+--------------+
1 row in set (0.00 sec)

mysql> select * from demo8 where key_part1='6378' and key_part2='a20a9';
+-------+------+-------+------+-----------+-----------+-----------+--------------+
| id    | key1 | key2  | key3 | key_part1 | key_part2 | key_part3 | common_field |
+-------+------+-------+------+-----------+-----------+-----------+--------------+
| 19945 | d9   | 19945 | 538  | 6378      | a20a9     | 2b7b00    | 1865f1c      |
+-------+------+-------+------+-----------+-----------+-----------+--------------+
1 row in set (0.00 sec)

mysql> select * from demo8 where key_part1='6378' and key_part2='a20a9' and key_part3='2b7b00';
+-------+------+-------+------+-----------+-----------+-----------+--------------+
| id    | key1 | key2  | key3 | key_part1 | key_part2 | key_part3 | common_field |
+-------+------+-------+------+-----------+-----------+-----------+--------------+
| 19945 | d9   | 19945 | 538  | 6378      | a20a9     | 2b7b00    | 1865f1c      |
+-------+------+-------+------+-----------+-----------+-----------+--------------+
1 row in set (0.00 sec)
  • If the leftmost consecutive index columns are not all equality comparisons, however, the access method is no longer ref:
mysql> select * from demo8 where key_part1='6378' and key_part2>'a20a9';
Empty set (0.00 sec)
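
The cases above can be compared by running them through EXPLAIN and reading the type, key, and key_len columns (key_len hints at how many columns of idx_key_part are actually used). The comments state what one would expect on the demo8 table; the optimizer decides by cost, so the real output may differ.

```sql
-- Equality on an ordinary secondary index: expected type ref, key idx_key1.
EXPLAIN SELECT * FROM demo8 WHERE key1 = 'd9';

-- Equality on the two leftmost columns of idx_key_part: still expected to be ref.
EXPLAIN SELECT * FROM demo8 WHERE key_part1 = '6378' AND key_part2 = 'a20a9';

-- Equality on key_part1 but a range on key_part2: no longer ref, more likely range.
EXPLAIN SELECT * FROM demo8 WHERE key_part1 = '6378' AND key_part2 > 'a20a9';
```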

2.4 ref_or_null

Sometimes we want the records where a secondary index column equals a certain constant, and also the records where that column is NULL:

mysql> select * from demo8 where key1 = 'd9' or key1 is null;

When this query is executed through the secondary index rather than a full table scan, the access method is called ref_or_null. It amounts to finding the two consecutive record ranges key1 IS NULL and key1 = 'd9' in the B+ tree of the idx_key1 index, and then returning to the table with the id values of those secondary index records to fetch the complete user records.
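
A quick way to check this on the demo8 table is the EXPLAIN statement below. This is only a sketch: the type column would be expected to read ref_or_null with key idx_key1, but the optimizer may choose differently depending on its cost estimates.

```sql
-- key1 is nullable, so the optimizer has to consider both
-- the key1 = 'd9' range and the key1 IS NULL range of idx_key1.
EXPLAIN SELECT * FROM demo8 WHERE key1 = 'd9' OR key1 IS NULL;
```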

2.5 range

The access methods introduced so far apply only when an index column is compared for equality with a constant (ref_or_null is the odd one out, since it also picks up NULL values). Sometimes the search conditions we face are more complex:

mysql> select * from demo8 where key2 in (5555,6666) or (key2>=1234 and key2<=1235);
+------+------+------+------+-----------+-----------+-----------+--------------+
| id   | key1 | key2 | key3 | key_part1 | key_part2 | key_part3 | common_field |
+------+------+------+------+-----------+-----------+-----------+--------------+
| 1234 | 67   | 1234 | a2d  | 2779      | 18191     | 96a5b2    | 86c5afa      |
| 1235 | 6b   | 1235 | c48  | 43b4      | 0e1d1     | 27f9a0    | 1c17810      |
| 5555 | 0b   | 5555 | dc2  | 6b61      | dac52     | a0451f    | 187011e      |
| 6666 | 00   | 6666 | 9c3  | 99f8      | d22de     | 283e92    | 656f2f1      |
+------+------+------+------+-----------+-----------+-----------+--------------+
4 rows in set (0.00 sec)

This query can be executed with a full table scan, or with a secondary index plus a return to the table. In the latter case the search condition no longer asks for the index column to equal a single constant; the index column must instead fall within one of several ranges. In this query, a record matches as long as its key2 value falls in any of the following three ranges:

  • The value of key2 is 5555
  • The value of key2 is 6666
  • The value of key2 is between 1234 and 1235

MySQL calls this access method, which uses an index for range matching: range
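
To see the range access method in practice, the same query can be passed to EXPLAIN. In this sketch the type column would be expected to read range with key idx_key2, and the rows column gives the optimizer's estimate of how many index records fall into the three intervals.

```sql
-- Two single-point intervals (5555 and 6666) plus the interval [1234, 1235].
EXPLAIN SELECT * FROM demo8
WHERE key2 IN (5555, 6666) OR (key2 >= 1234 AND key2 <= 1235);
```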

2.6 index

mysql> select key_part1,key_part2,key_part3 from demo8  where key_part2 = 'd22de';
+-----------+-----------+-----------+
| key_part1 | key_part2 | key_part3 |
+-----------+-----------+-----------+
| 99f8      | d22de     | 283e92    |
+-----------+-----------+-----------+
1 row in set (0.01 sec)

key_part2 is not the leftmost index column of the joint index idx_key_part, so we cannot use the ref or range access method to execute this statement. But this query meets the following two conditions:

  • Its query list has only 3 columns: key_part1, key_part2, key_part3, and the index idx_key_part contains these three columns
  • There is only key_part2 column in the search condition, this column is also included in the index idx_key_part

MySQL can therefore traverse the leaf node records of the idx_key_part index, check the condition key_part2 = 'd22de' against each one, and add the key_part1, key_part2, and key_part3 values of the matching records directly to the result set. Secondary index records are much smaller than clustered index records (a clustered index record stores all user-defined columns plus the so-called hidden columns, whereas a secondary index record stores only the index columns and the primary key), and no return to the table is needed, so traversing the whole secondary index costs much less than traversing the whole clustered index. This access method of traversing all secondary index records is called: index
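
The two conditions above can be tested by comparing a query that satisfies them with one that does not. The expectations in the comments are hedged: on MySQL versions where the skip_scan optimizer flag is on, the optimizer may instead report range with "Using index for skip scan" in the Extra column for the first statement.

```sql
-- All selected columns and the filter column live in idx_key_part:
-- expected type index, with "Using index" in the Extra column.
EXPLAIN SELECT key_part1, key_part2, key_part3
FROM demo8 WHERE key_part2 = 'd22de';

-- common_field is not in the index, so scanning the index alone is not enough;
-- a full table scan (type ALL) is the likely outcome.
EXPLAIN SELECT key_part1, key_part2, key_part3, common_field
FROM demo8 WHERE key_part2 = 'd22de';
```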

2.7 all

mysql> select * from demo8;

A query like this one, which fetches all the data, is executed as a full table scan. For an InnoDB table, that means scanning the clustered index directly. This access method of executing a query with a full table scan is called: all
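
A full table scan is also what we get whenever no index can help. In the sketch below, both statements would be expected to show ALL in the type column of the EXPLAIN output; the second one filters on common_field, which has no index in demo8.

```sql
-- No search condition at all: every record is returned.
EXPLAIN SELECT * FROM demo8;

-- A search condition on a column without an index:
-- every record still has to be read and checked.
EXPLAIN SELECT * FROM demo8 WHERE common_field = '3aebe82';
```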

3. Matters needing attention

3.1 Revisiting secondary index + return to table

In general, only a single secondary index can be used to execute queries, such as the following statement:

mysql> select * from demo8 where key1='00' and key2>15544;

The query optimizer recognizes two search conditions in this query:

  • key1='00'
  • key2>15544

Based on the statistics of the demo8 table, the optimizer generally estimates which condition would scan fewer rows in its secondary index and uses that condition to search the corresponding index. It then returns to the table with the results from the secondary index to fetch the complete user records, and filters those records by the remaining where conditions. Assuming the optimizer decides to use the idx_key1 index, the whole query proceeds in two steps:

  • Step 1: Use the secondary index to locate records: find the corresponding secondary index record from the B+ tree represented by the idx_key1 index according to the condition key1 = '00'
  • Step 2: Return to the table according to the primary key value of the record found in the previous step, that is, find the corresponding complete user record in the clustered index, and then continue to filter the complete user record according to the condition key2>15544. Return the records that finally meet the filter criteria to the user

Because the records in a secondary index contain only the index columns and the primary key, step 1 can use only the search condition on the key1 column when searching idx_key1. The other condition, key2 > 15544, is not used in step 1; it can be applied to the complete user records only after the return to the table in step 2.

Tips:
What we describe here is the general case: usually only a single secondary index is used to execute a query. The exception, index merge, is covered in section 4.
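
To see which index the optimizer prefers here, and what the alternative would cost, we can compare its free choice with the two forced choices. This is a sketch: FORCE INDEX is standard MySQL syntax, but which plan wins depends on the statistics of your copy of demo8.

```sql
-- Let the optimizer choose; possible_keys lists the candidates, key the winner.
EXPLAIN SELECT * FROM demo8 WHERE key1 = '00' AND key2 > 15544;

-- Force each candidate in turn and compare the rows estimates.
EXPLAIN SELECT * FROM demo8 FORCE INDEX (idx_key1)
WHERE key1 = '00' AND key2 > 15544;

EXPLAIN SELECT * FROM demo8 FORCE INDEX (idx_key2)
WHERE key1 = '00' AND key2 > 15544;
```

The index with the smaller rows estimate is the one that scans fewer secondary index records and therefore needs fewer returns to the table.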

3.2 Clarify the range interval used by the range access method

For a B+ tree index, a so-called range interval can be generated whenever an index column is compared with a constant using =, <=>, in, not in, is null, is not null, >, <, >=, <=, between, != (also written <>), or like.

Hint:

  • The like operator is special: it can use the index only when matching the complete string or a prefix of the string.
  • The in operator has the same effect as several equality comparisons (=) connected with or; that is, it generates multiple single-point intervals. For example, the following two statements have the same effect: select * from demo8 where key2 in (1222, 1333); and select * from demo8 where key2 = 1222 or key2 = 1333;

However, in regular work, the where clause of a query may have many small search conditions, and these search conditions need to be connected using and or or operators:

  • A and B, both A and B are true, the entire expression is true
  • A or B, if either A or B is true, the entire expression is true

When we want to use the range access method to execute a query, the key is to find the indexes available to the query and the range intervals that correspond to them. Let's look at how to extract the correct range intervals from complex search conditions built with and and or in the following cases:

3.2.1 All search conditions can use an index

mysql> select * from demo8 where key2 > 2222 and key2 > 3333;

Every search condition in this query can use idx_key2; that is, each search condition corresponds to a range interval of idx_key2. The two conditions are connected with and, so we take the intersection of the two range intervals. The intersection of key2 > 2222 and key2 > 3333 is of course key2 > 3333, so the range interval of idx_key2 for this query is (3333, +∞).

If the conditions are connected with or instead:

mysql> select * from demo8 where key2 > 2222 or key2 > 3333;

or means that the union of each range interval needs to be taken. The union of key2 > 2222 and key2 > 3333 is key2 > 2222. The range interval using idx_key2 in the above query is (2222, +∞)
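
The two simplifications can be checked by counting rows. The expected counts below assume that key2 holds exactly the values 1 to 20000, which is the case if demo8data() ran once on an empty table.

```sql
-- Intersection: both statements should return the same count (16667 = 20000 - 3333).
SELECT COUNT(*) FROM demo8 WHERE key2 > 2222 AND key2 > 3333;
SELECT COUNT(*) FROM demo8 WHERE key2 > 3333;

-- Union: both statements should return the same count (17778 = 20000 - 2222).
SELECT COUNT(*) FROM demo8 WHERE key2 > 2222 OR key2 > 3333;
SELECT COUNT(*) FROM demo8 WHERE key2 > 2222;
```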

3.2.2 Some search conditions cannot use the index

mysql> select * from demo8 where key2 > 2222 AND common_field = '039cb00';

The only index this query can use is idx_key2, and the secondary index records of idx_key2 do not contain the common_field column. The condition common_field = '039cb00' therefore plays no part while idx_key2 is used to locate records; it is applied after returning to the table, once the complete user records are available. A range interval is a concept that applies to fetching records from an index, so common_field = '039cb00' need not be considered when determining the range interval. When determining the range interval for an index, we simply replace every search condition that cannot use that index with true. (We replace them with true because we do not intend to search the index with these conditions: whether or not an index record satisfies them, we select it anyway and filter on them after returning to the table.) After the replacement, the query looks like this:

mysql> select * from demo8 where key2 > 2222 AND true;

which simplifies to:

mysql> select * from demo8 where key2 > 2222;

From this, the range interval for idx_key2 is (2222, +∞). The or case can be worked out in the same way:

mysql> select * from demo8 where key2 > 2222 or common_field = '039cb00';

mysql> select * from demo8 where key2 > 2222 or true;

mysql> select * from demo8 where true;

This shows that if we insisted on using idx_key2 to execute the query, the corresponding range interval would be (-∞, +∞); that is, every record of the secondary index would have to be returned to the table, which costs more than a direct full table scan. In other words, if a search condition that can use an index is connected by or to a search condition that cannot, that index cannot be used.
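
The difference between the and case and the or case can be observed with EXPLAIN. This sketch uses a narrower bound (key2 > 19000, a value chosen here for illustration) than the text, because key2 > 2222 covers most of the table and the optimizer may prefer a full table scan even where the index is usable.

```sql
-- and: idx_key2 can narrow the search to (19000, +∞);
-- type range is the likely outcome, with common_field filtered after the return to the table.
EXPLAIN SELECT * FROM demo8 WHERE key2 > 19000 AND common_field = '039cb00';

-- or: the interval for idx_key2 degenerates to (-∞, +∞);
-- a full table scan (type ALL) is expected.
EXPLAIN SELECT * FROM demo8 WHERE key2 > 19000 OR common_field = '039cb00';
```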

3.2.3 Find the range matching interval under complex search conditions

mysql> select * from demo8 
	where (key1 > 'ed' and key2 = 66 ) 
	or (key1 < 'zc' and key1 > 'zz') 
	or (key1 like '%33' and key1 > 'fa' and (key2 < 7777 or common_field = '97d435e')) ;

Don't panic at SQL with conditions this complex; let's analyze it step by step.

Step 1: First check which columns are involved in the search conditions in the where clause, and which columns may use indexes

The search conditions of this query involve three columns: key1, key2, and common_field. The key1 column has the ordinary secondary index idx_key1, and the key2 column has the unique secondary index idx_key2.

Step 2: For those indexes that may be used, analyze their range intervals

Step 3: Suppose we use idx_key1 to execute the query. Using the method above, we temporarily replace the search conditions that cannot use this index with true. That covers the conditions on key2 and common_field, and also key1 like '%33', which matches neither the complete string nor a prefix:

(key1 > 'ed' and true) or (key1 < 'zc' and key1 > 'zz') or (true and key1 > 'fa' and (true or true))

Simplifying further:

(key1 > 'ed') or (key1 < 'zc' and key1 > 'zz') or (key1 > 'fa')

Step 4: Replace the conditions that are always true or always false

mysql> select 'zc' > 'zz';
+-------------+
| 'zc' > 'zz' |
+-------------+
|           0 |
+-------------+
1 row in set (0.00 sec)

The result is 0 (false), so 'zc' is less than 'zz'. No value can be both less than 'zc' and greater than 'zz', so key1 < 'zc' and key1 > 'zz' is always false, and the search condition above can be written as:
(key1 > 'ed') or (key1 > 'fa')

Continue by simplifying the intervals:

mysql> select 'ed' > 'fa';
+-------------+
| 'ed' > 'fa' |
+-------------+
|           0 |
+-------------+
1 row in set (0.00 sec)

key1 > 'ed' and key1 > 'fa' are connected by or, which means taking the union; since 'ed' is less than 'fa', the final interval is key1 > 'ed'. In other words, if the query with the complex search conditions above is executed through the idx_key1 index, MySQL has to fetch all secondary index records satisfying key1 > 'ed', return to the table with the id of each one to get the complete user record, and then filter by the remaining search conditions.

Step 5: Suppose instead that we use idx_key2 to execute the query. We temporarily replace the search conditions that cannot use this index with true; here that means the conditions on key1 and common_field:

(true and key2 = 66) or (true and true) or (true and true and (key2 < 7777 or true))

key2 < 7777 or true is true, so simplifying further:

key2 = 66 or true

which simplifies again to: true

This result means that executing the query through the idx_key2 index would require scanning every record of the idx_key2 secondary index and then returning to the table. Compared with using idx_key1, the loss outweighs the gain, so idx_key2 will not be used for this query.
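
The hand analysis above can be cross-checked against the data. The first two counts follow from the reasoning itself and should be 0 regardless of the random values; the final EXPLAIN is only for comparison, since the optimizer may still prefer a full table scan if it judges key1 > 'ed' to match too many records.

```sql
-- The always-false group should match nothing.
SELECT COUNT(*) FROM demo8 WHERE key1 < 'zc' AND key1 > 'zz';

-- Every row matched by the original conditions must also lie in the
-- extracted interval key1 > 'ed', so no row should fall outside it.
SELECT COUNT(*) FROM demo8
WHERE ((key1 > 'ed' AND key2 = 66)
    OR (key1 < 'zc' AND key1 > 'zz')
    OR (key1 LIKE '%33' AND key1 > 'fa'
        AND (key2 < 7777 OR common_field = '97d435e')))
  AND NOT (key1 > 'ed');

-- Compare the optimizer's choice with the analysis: look at the key and rows columns.
EXPLAIN SELECT * FROM demo8
WHERE (key1 > 'ed' AND key2 = 66)
   OR (key1 < 'zc' AND key1 > 'zz')
   OR (key1 LIKE '%33' AND key1 > 'fa'
       AND (key2 < 7777 OR common_field = '97d435e'));
```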

4. Index Merge

MySQL ships with many built-in optimization features, each of which can be switched on or off manually. The current settings can be viewed like this:

mysql> show variables like 'optimizer_switch'\G
*************************** 1. row ***************************
Variable_name: optimizer_switch
        Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on
1 row in set (0.01 sec)

Let's briefly explain:

  • index_merge=on (index merge)
  • index_merge_union=on (Union index merge: takes the union of the id values obtained from non-clustered indexes)
  • index_merge_sort_union=on (Sort-Union index merge: sorts the id values obtained from non-clustered indexes first, then takes the union)
  • index_merge_intersection=on (Intersection index merge: takes the intersection of the id values obtained from non-clustered indexes)
  • engine_condition_pushdown=on (engine condition pushdown: only used by the NDB engine. When enabled, rows are filtered by the WHERE condition on the data nodes before being sent to the SQL node; when disabled, the rows of all data nodes are sent to the SQL node for processing.)
  • index_condition_pushdown=on (index condition pushdown, ICP)
  • mrr=on (Multi-Range Read, MRR: the main purpose of this optimization is to read from disk sequentially as much as possible)
  • mrr_cost_based=on (cost-based choice: whether the decision to use MRR is based on a cost estimate)
  • block_nested_loop=on (block nested-loop join, BNL)
  • batched_key_access=off (BKA: an optimization for the index nested-loop join (NLJ) algorithm)
  • materialization=on (materialization)
  • subquery_materialization_cost_based=on (whether the choice of subquery materialization is cost-based)
  • semijoin=on (semi-join)
  • loosescan=on (semi-join: loose scan)
  • firstmatch=on (semi-join: first match)
  • duplicateweedout=on (semi-join: duplicate weed-out)
  • use_index_extensions=on (use index extensions)
  • condition_fanout_filter=on (condition fanout filtering)
  • derived_merge=on (view/derived table merge, works together with Auto_key)
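When the value of optimizer_switch is read from a client program, it arrives as one long comma-separated string. A minimal sketch of splitting it into individual flags follows; the function name parse_optimizer_switch is hypothetical, and the sample string is a shortened copy of the output above.

```python
def parse_optimizer_switch(value):
    """Split 'flag=on,flag=off,...' into a dict mapping flag name to bool."""
    flags = {}
    for item in value.split(","):
        name, _, state = item.partition("=")
        flags[name.strip()] = state.strip() == "on"
    return flags


# Shortened sample of the optimizer_switch value shown above
value = ("index_merge=on,index_merge_union=on,index_merge_sort_union=on,"
         "index_merge_intersection=on,batched_key_access=off")
flags = parse_optimizer_switch(value)

# The four switches that control index merge
merge_flags = {name: on for name, on in flags.items()
               if name.startswith("index_merge")}
print(merge_flags)
```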

These optimization features will be covered in detail in later articles. Back to today's topic: we said earlier that MySQL normally uses at most one secondary index when executing a query, but in some special cases it may use several secondary indexes in a single query. MySQL calls this way of using multiple indexes to complete one query index merge, which corresponds to the first switches in the list above. Specifically, there are three index merge algorithms:

4.1 Intersection merge

Intersection means taking the intersection: a query uses multiple secondary indexes, and the results read from those secondary indexes are intersected. For example:

mysql> select * from demo8 where key1='4a' AND key3='c84';

If this query is executed using Intersection merge, the process is as follows:

  • Read the records with key1 = '4a' from the B+ tree of the idx_key1 secondary index.
  • Read the records with key3 = 'c84' from the B+ tree of the idx_key3 secondary index.
  • A secondary index record consists of the index column plus the primary key, so we can compute the intersection of the id values in the two result sets.
  • Go back to the table with the list of id values produced in the previous step, that is, fetch the complete user records with those id values from the clustered index and return them to the user.
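The four steps above can be sketched on toy data. Everything here is a made-up illustration: the rows, the list-of-tuples model of a secondary index, and the helper ids_equal_to are assumptions, and a plain set intersection stands in for the merge that MySQL actually performs.

```python
# Hypothetical clustered index: id -> full user record
clustered = {
    1: {"id": 1, "key1": "4a", "key3": "c84"},
    2: {"id": 2, "key1": "4a", "key3": "zzz"},
    3: {"id": 3, "key1": "9f", "key3": "c84"},
    4: {"id": 4, "key1": "4a", "key3": "c84"},
}

# Each secondary index holds (index column, primary key) pairs in sorted order
idx_key1 = sorted((row["key1"], pk) for pk, row in clustered.items())
idx_key3 = sorted((row["key3"], pk) for pk, row in clustered.items())


def ids_equal_to(index, value):
    """Primary keys of the secondary index records whose column equals value."""
    return [pk for key, pk in index if key == value]


ids1 = ids_equal_to(idx_key1, "4a")       # step 1: read idx_key1
ids3 = ids_equal_to(idx_key3, "c84")      # step 2: read idx_key3
common = sorted(set(ids1) & set(ids3))    # step 3: intersect the id values
rows = [clustered[pk] for pk in common]   # step 4: go back to the table
print(common)  # [1, 4]
```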

Some readers may wonder: why not just read one secondary index, idx_key1 or idx_key3, using one of the search conditions, go back to the table, and then filter on the other condition? Let's compare the cost of the two execution methods:

The cost of reading only one secondary index:

  • Read one secondary index according to one search condition;
  • Go back to the table with the primary key values obtained from that secondary index, then filter on the other search conditions;

The cost of reading multiple secondary indexes and then taking the intersection:

  • Read different secondary indexes according to different search conditions;
  • Take the intersection of the primary key values obtained from the secondary indexes, then go back to the table;

Reading multiple secondary indexes costs more than reading one, but reading a secondary index is sequential I/O while going back to the table is random I/O. So if reading only one secondary index leaves a very large number of records to look up in the table, while the intersection obtained from multiple secondary indexes is very small, the table lookups saved can outweigh the extra cost of accessing the additional indexes. In that case, reading multiple secondary indexes and intersecting them is cheaper than reading only one.
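To make the trade-off concrete, here is a deliberately crude cost model. The two cost constants and all record counts are invented for illustration and have nothing to do with the numbers MySQL's real cost model uses; the only point is that random table lookups dominate when many records match.

```python
SEQ_COST = 1      # assumed cost of reading one secondary index record (sequential I/O)
RANDOM_COST = 20  # assumed cost of one table lookup (random I/O)


def single_index_cost(matches):
    """Read one secondary index, then go back to the table for every match."""
    return matches * SEQ_COST + matches * RANDOM_COST


def intersection_cost(matches_per_index, intersection_size):
    """Read every secondary index, then go back to the table only for the intersection."""
    return sum(matches_per_index) * SEQ_COST + intersection_size * RANDOM_COST


# Many matches per index, tiny intersection: merging wins
print(single_index_cost(10000), intersection_cost([10000, 8000], 50))

# Few matches on one index: reading just that index wins
print(single_index_cost(100), intersection_cost([100, 5000], 90))
```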

MySQL can use Intersection index merge only in certain situations:

Case 1: The secondary index columns are matched by equality. For a joint index, every column of the joint index must be matched by equality; matching only some of its columns is not enough.

For example, the following query may use the two secondary indexes idx_key1 and idx_key_part for an Intersection index merge:

mysql> select * from demo8 where key1 = 'a' and key_part1 = 'a' and key_part2 = 'b' and key_part3 = 'c';

The following two queries cannot use Intersection index merge:

mysql> select * from demo8 where key1 > 'a' and key_part1 = 'a' and key_part2 = 'b' and key_part3 = 'c';
mysql> select * from demo8 where key1 = 'a' and key_part1 = 'a';

The first query cannot because key1 is matched by a range. The second cannot because the key_part2 and key_part3 columns of the joint index idx_key_part do not appear in the search conditions.
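Case 1 can be written down as a small check. This is a sketch of the rule as stated here, not of the optimizer's code; the dictionary that maps each column to its comparison operator and both function names are assumptions for illustration.

```python
IDX_KEY1 = ("key1",)
IDX_KEY_PART = ("key_part1", "key_part2", "key_part3")


def fully_matched_by_equality(index_columns, conditions):
    """True if every column of the index is compared with '=' in the query."""
    return all(conditions.get(column) == "=" for column in index_columns)


def may_intersect(conditions):
    """Case 1: both indexes must be fully matched by equality."""
    return (fully_matched_by_equality(IDX_KEY1, conditions)
            and fully_matched_by_equality(IDX_KEY_PART, conditions))


# The three queries above, reduced to column -> operator
q1 = {"key1": "=", "key_part1": "=", "key_part2": "=", "key_part3": "="}
q2 = {"key1": ">", "key_part1": "=", "key_part2": "=", "key_part3": "="}
q3 = {"key1": "=", "key_part1": "="}
print(may_intersect(q1), may_intersect(q2), may_intersect(q3))  # True False False
```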

Case 2: The primary key column can be a range match.

For example, the following query may use the primary key and idx_key1 for an Intersection index merge:

mysql> select * from demo8 where id > 100 and key1 = 'a';

In an InnoDB secondary index, records are sorted by the index column first; for a joint index, they are sorted by the columns of the joint index in order. A user record in a secondary index consists of the index columns plus the primary key. Many records may share the same index column values, and those records are sorted by primary key value. This is why Intersection index merge requires every secondary index column to be matched by equality: only then is the result set read from the secondary index sorted by primary key value.
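A toy index shows the difference between an equality match and a range match. The entries below are hypothetical (key1, id) pairs, and a sorted Python list stands in for the leaf level of the B+ tree.

```python
# Hypothetical idx_key1 entries, kept in index order: by key1, then by id
idx_key1 = sorted([("a", 7), ("b", 2), ("a", 3), ("c", 1), ("b", 9), ("a", 5)])


def ids_where(index, predicate):
    """Primary keys in the order the index scan would return them."""
    return [pk for key, pk in index if predicate(key)]


equal_ids = ids_where(idx_key1, lambda key: key == "a")
range_ids = ids_where(idx_key1, lambda key: key > "a")

print(equal_ids)  # [3, 5, 7]  already sorted by primary key
print(range_ids)  # [2, 9, 1]  sorted within each key1 value only
```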

So what? Why does it matter that the result set read from a secondary index is sorted by primary key value? Remember that Intersection index merge intersects the primary key values read from several secondary indexes. If each of those result sets is already sorted by primary key, finding the intersection is easy. Suppose a query uses Intersection index merge and reads primary key values from the two secondary indexes idx_key1 and idx_key2:

  • Sorted primary key values read from idx_key1: 1, 3, 5
  • Sorted primary key values read from idx_key2: 2, 3, 4

The intersection is then computed like this: compare the smallest remaining primary key values of the two result sets. If they are equal, add the value to the final intersection and advance in both result sets; otherwise discard the smaller one and move on to the next primary key value in the result set it came from. Repeat until one of the result sets is used up. Step by step:

  • First compare the smallest primary key value of each result set. Because 1 < 2, discard 1 from the idx_key1 result set and take its next value, 3.
  • Because 3 > 2, discard 2 from the idx_key2 result set and take its next value, 3.
  • Because 3 = 3, add 3 to the final intersection and move on to the next value in both result sets.
  • The remaining primary key values (5 and 4) are not equal either, so the final intersection contains only the primary key value 3.

Although the description is long, the process is actually very fast: its time complexity is O(n). But if the result sets read from the secondary indexes are not sorted by primary key, the primary key values in each result set have to be sorted first before running the process above, which takes extra time.
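The walk-through above is the classic two-pointer intersection of sorted lists. A minimal sketch, assuming both inputs are already sorted by primary key (the function name intersect_sorted is made up for this example):

```python
def intersect_sorted(a, b):
    """Intersect two lists of primary keys that are already sorted ascending."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])  # present in both result sets
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1               # discard the smaller value from a
        else:
            j += 1               # discard the smaller value from b
    return result


print(intersect_sorted([1, 3, 5], [2, 3, 4]))  # [3]
```

Each list is walked once and every comparison advances at least one pointer, which is where the O(n) bound comes from.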

Tip:
Fetching records from the table in the order of sorted primary key values has a name of its own: Rowid Ordered Retrieval, or ROR for short. You will recognize the term when you come across it elsewhere.

In addition, Intersection index merge is not limited to secondary indexes: the clustered index can take part as well. That is Case 2 above, where Intersection index merge can still be used when the search conditions contain a range match on the primary key. But why may the primary key be matched by a range? To answer that, go back to the use case and look at the query below:

mysql> select * from demo8 where key1 = 'a' and id > 100;

Assuming this query can use Intersection index merge, we might take it for granted that it reads some records from the clustered index using id > 100, reads some records from the idx_key1 secondary index using key1 = 'a', and then intersects the two. That overcomplicates things: there is no need to read anything from the clustered index. Remember that every secondary index record carries the primary key value, so the condition id > 100 can be applied directly to the primary key values read from idx_key1. It is that simple. So a search condition on the primary key only filters the result set obtained from the other secondary indexes, and it does not matter whether it is an equality match or not.
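In other words, the primary key condition acts as a filter on the secondary index entries. A small sketch with hypothetical idx_key1 entries:

```python
# Hypothetical idx_key1 entries: (key1, id), kept in index order
idx_key1 = sorted([("a", 42), ("a", 150), ("b", 101), ("a", 300), ("c", 7)])

# key1 = 'a' selects the entries; id > 100 is checked on the primary key
# value stored in the same entry, so the clustered index is not read here
ids = [pk for key, pk in idx_key1 if key == "a" and pk > 100]
print(ids)  # [150, 300]
```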

Of course, Cases 1 and 2 above are only necessary conditions for Intersection index merge, not sufficient ones. Even when both hold, Intersection index merge does not necessarily happen; it ultimately depends on the optimizer's choice. The optimizer uses Intersection index merge only when reading a single secondary index by its search condition alone would return so many records that going back to the table becomes too expensive, while merging through Intersection greatly reduces the number of records that need a table lookup.

4.2 Union merge

When writing a query, we often want the records that satisfy one search condition as well as the records that satisfy another. Such search conditions are related by OR. Sometimes the conditions joined by OR use different indexes, for example:

mysql> select * from demo8 where key1 < 'a' or key3 > 'z';

Intersection means intersection and applies when search conditions that use different indexes are connected by AND; Union means union and applies when search conditions that use different indexes are connected by OR. As with Intersection index merge, MySQL can use Union index merge only in certain situations:

Case 1: The secondary index columns are matched by equality. For a joint index, every column of the joint index must be matched by equality; matching only some of its columns is not enough.

For example, the query below may use the two secondary indexes idx_key1 and idx_key_part for a Union index merge:

mysql> select * from demo8 where key1 = 'a' or ( key_part1 = 'a' and key_part2 = 'b' and key_part3 = 'c');

However, the following two queries cannot use Union index merge:

mysql> select * from demo8 where key1 > 'a' or (key_part1 = 'a' and key_part2 = 'b' and key_part3 = 'c');
mysql> select * from demo8 where key1 = 'a' or key_part1 = 'a';

The first query cannot because key1 is matched by a range. The second cannot because the key_part2 and key_part3 columns of the joint index idx_key_part do not appear in the search conditions.

Case 2: The primary key column can be a range match.

Case 3: A search condition that itself uses Intersection index merge.

This case is easy to understand: part of the search condition produces a primary key set through Intersection index merge, and that set is then combined by union with the primary key sets obtained in other ways, as in this query:

mysql> select * from demo8 where key_part1 = 'a' and key_part2 = 'b' and key_part3 = 'c' or (key1 = 'a' and key3 = 'b');

The optimizer might execute this query in this way:

  • First, using the search condition key1 = 'a' AND key3 = 'b', obtain a primary key set from the idx_key1 and idx_key3 indexes by Intersection index merge.
  • Then, using the search condition key_part1 = 'a' AND key_part2 = 'b' AND key_part3 = 'c', obtain another primary key set from the joint index idx_key_part.
  • Combine the two primary key sets by Union index merge, go back to the table, and return the result to the user.
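The three steps can be sketched with sorted lists of primary keys. All id values are invented, both helper functions are hypothetical, and intersect_ids uses a set lookup as a shortcut where a sorted merge could be used instead.

```python
import heapq


def intersect_ids(a, b):
    """Keep the ids of a that also occur in b (a is sorted, so the result is too)."""
    b_ids = set(b)
    return [pk for pk in a if pk in b_ids]


def union_sorted(a, b):
    """Merge two sorted id lists into one sorted list without duplicates."""
    merged = []
    for pk in heapq.merge(a, b):
        if not merged or merged[-1] != pk:
            merged.append(pk)
    return merged


ids_key1 = [2, 5, 8, 11]    # hypothetical ids with key1 = 'a'
ids_key3 = [5, 8, 9]        # hypothetical ids with key3 = 'b'
ids_key_part = [1, 8, 20]   # hypothetical ids matching all three key_part columns

step1 = intersect_ids(ids_key1, ids_key3)     # Intersection index merge
final = union_sorted(step1, ids_key_part)     # Union index merge
print(step1, final)  # [5, 8] [1, 5, 8, 20]
```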

Of course, even when a query meets these conditions, Union index merge is not necessarily used; it still depends on the optimizer's choice. The optimizer uses Union index merge only when each secondary index returns relatively few records for its search condition and the cost of Union index merge is lower than that of a full table scan.

4.3 Sort-Union merge

The conditions for Union index merge are quite strict: every secondary index column must be matched by equality. For example, the following query cannot use Union index merge:

mysql> select * from demo8 where key1 < 'a' or key3 > 'z';

This is because the primary key values of the secondary index records read from idx_key1 using key1 < 'a' are not sorted, and neither are those read from idx_key3 using key3 > 'z'. But the two conditions key1 < 'a' and key3 > 'z' are still very tempting to use, so we can do this:

  • First read the records from the idx_key1 secondary index using the condition key1 < 'a', and sort them by primary key value.
  • Then read the records from the idx_key3 secondary index using the condition key3 > 'z', and sort them by primary key value.
  • Because the primary key values from both secondary indexes are now sorted, the remaining steps are the same as in Union index merge.

This approach, which first sorts the primary key values of the secondary index records and then proceeds as in Union index merge, is called Sort-Union index merge. Compared with plain Union index merge, Sort-Union index merge has one extra step: sorting the primary key values of the secondary index records.
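A sketch of Sort-Union on toy data follows. The index entries are hypothetical, and Python's string comparison stands in for the column's collation, which may order values differently in a real table.

```python
import heapq

# Hypothetical secondary index entries: (indexed value, id), kept in index order
idx_key1 = sorted([("0b", 9), ("1c", 4), ("5f", 12), ("b2", 3)])
idx_key3 = sorted([("y", 5), ("za", 12), ("zb", 1)])

# Range scans return primary keys in index order, which is not id order
ids1 = [pk for key, pk in idx_key1 if key < "a"]   # [9, 4, 12]
ids3 = [pk for key, pk in idx_key3 if key > "z"]   # [12, 1]

# The extra step of Sort-Union: sort each result set by primary key
sorted1 = sorted(ids1)
sorted3 = sorted(ids3)

# From here on it is a plain union of two sorted lists, dropping duplicates
union = []
for pk in heapq.merge(sorted1, sorted3):
    if not union or union[-1] != pk:
        union.append(pk)
print(union)  # [1, 4, 9, 12]
```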

Tip:
Why is there a Sort-Union index merge but no Sort-Intersection index merge? Indeed, there is no Sort-Intersection index merge. Sort-Union applies when each secondary index returns relatively few records for its search condition, so sorting those records by primary key value is cheap. Intersection index merge applies when a single secondary index returns so many records for its search condition that going back to the table is too expensive, and merging significantly reduces that overhead. Adding Sort-Intersection would mean sorting a large number of secondary index records by primary key value, which may cost more than the table lookups it saves, so Sort-Intersection was not introduced.

4.4 Notes on index merge

Using a joint index instead of Intersection index merge

mysql> select * from demo8 where key1 = 'a' and key3 = 'b';

The only reason this query may be executed with Intersection index merge is that idx_key1 and idx_key3 are two separate B+ tree indexes. If you build one joint index on these two columns, the query can simply use that joint index, and there is no need for any index merge, like this:

mysql> alter table demo8 drop index idx_key1, drop index idx_key3, add index idx_key1_key3(key1, key3);

This drops the now unnecessary idx_key1 and idx_key3 and adds a joint index idx_key1_key3. Querying through this joint index is fast and simple: there is no extra B+ tree to read and no result sets to merge. So why not?
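With a joint index, both equality conditions are resolved by one lookup in one sorted structure. A minimal sketch, using a sorted list of hypothetical (key1, key3, id) entries and a binary search in place of the B+ tree descent:

```python
from bisect import bisect_left

# Hypothetical idx_key1_key3 entries: (key1, key3, id), kept in index order
idx_key1_key3 = sorted([
    ("a", "b", 4), ("a", "b", 9), ("a", "c", 2), ("b", "b", 1),
])


def lookup(index, key1, key3):
    """Return the ids of all entries with the given key1 and key3."""
    # Jump to the first entry that is not smaller than (key1, key3)
    start = bisect_left(index, (key1, key3))
    ids = []
    for k1, k3, pk in index[start:]:
        if (k1, k3) != (key1, key3):
            break  # past the matching run: stop scanning
        ids.append(pk)
    return ids


print(lookup(idx_key1_key3, "a", "b"))  # [4, 9]
```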

Tip:
But be careful: if some business scenarios query the key3 column on its own, you still have to add a separate index on the key3 column.

Summary

Today we learned about MySQL's access methods for a single table and the index merge optimization feature. To summarize:

  • Access method for single table:

    • const: Locating a single record by comparing the primary key or a unique secondary index column with a constant for equality is called: const. It means constant level, and the cost is negligible.
    • ref: The search condition compares a secondary index column with a constant for equality, and the query is executed using that secondary index. This access method is called: ref.
    • ref_or_null: The search condition looks for records where a secondary index column equals a constant or is NULL. When such a query is executed using the secondary index instead of a full table scan, the access method is called: ref_or_null.
    • range: The access method that uses an index for range matching is called: range.
    • index: The access method that traverses all the records of a secondary index is called: index.
    • all: The access method that executes the query with a full table scan is called: all.
  • Three algorithms for index merge:

    • Intersection merge (index_merge_intersection).
    • Union merge (index_merge_union).
    • Sort-Union merge (index_merge_sort_union).

We also learned that the conditions for the three index merge algorithms are necessary but not sufficient: whether an algorithm is actually used is still decided by the optimizer. We also looked at the ranges used by the range access method and estimated, by manual analysis, the cost of executing SQL with different indexes. Today's content is fairly theoretical but easy to understand. Combine it with the earlier material on MySQL B+ tree indexes to appreciate the subtlety of MySQL's design.

That is all for today's study. I hope you become an indestructible version of yourself ~~~

You can’t connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future. You have to trust in something - your gut, destiny, life, karma, whatever. This approach has never let me down, and it has made all the difference in my life.

If my content is helpful to you, please like, comment, and bookmark it. Writing is not easy, and everyone's support is what keeps me going.


Origin blog.csdn.net/liang921119/article/details/130707882