Troubleshooting MySQL Query - Index Application

PS: This is an original article. If you reprint it, please credit the source. Thank you!

The address of this article: http://flyer0126.iteye.com/blog/2410145

 

    Recently, during development, I needed to look up a system id. I casually wrote two SQL statements and found that they returned different results.

select * from apps limit 1;
id city_code short_name company_code
1 410100 zz ZZXJ8888

 

select id from apps limit 1;
id
2

    It turned out that the two queries returned different rows!

    To find out why, look at the table data and its indexes:

id city_code short_name company_code
1 410100 zz ZZXJ8888
2 410100 zz HNFG6666

 

    The index is as follows:
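     A minimal sketch of a table definition consistent with the indexes discussed in this article. Only the table name, the column names, the primary key on id and the unique index uniq_company_code come from the text. The column types and lengths, the InnoDB engine, and the assumption that the index covers only company_code are illustrative guesses.

```sql
-- Hypothetical definition: types, lengths and ENGINE are assumed,
-- not taken from the real table.
CREATE TABLE apps (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT,
    city_code    VARCHAR(16)  NOT NULL,
    short_name   VARCHAR(32)  NOT NULL,
    company_code VARCHAR(32)  NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uniq_company_code (company_code)
) ENGINE = InnoDB;

-- The two sample rows shown above.
INSERT INTO apps (id, city_code, short_name, company_code) VALUES
    (1, '410100', 'zz', 'ZZXJ8888'),
    (2, '410100', 'zz', 'HNFG6666');

-- List the indexes that actually exist on the table.
SHOW INDEX FROM apps;
```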
 

     Let's see how MySQL itself explains the two statements:

explain select * from apps limit 1;


 

explain select id from apps limit 1;


 

     The first statement does not use a secondary index: rows are read in primary key order, so the row with id 1 is returned. The second statement uses the uniq_company_code index: entries are read in index order (by company_code), so the row with id 2 is returned.

     To sum up: depending on the columns in the select list, MySQL chooses a different access strategy. Because neither statement has an ORDER BY, the row returned by LIMIT 1 depends on that strategy, which is why the results differ.
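     If the application needs a predictable row, one common approach is to state the order explicitly instead of relying on the access path. A minimal sketch against the apps table from this article:

```sql
-- Without ORDER BY, the row returned by LIMIT 1 depends on the chosen access path.
-- An explicit ORDER BY on a unique column pins down which row comes back.
SELECT *  FROM apps ORDER BY id LIMIT 1;
SELECT id FROM apps ORDER BY id LIMIT 1;
```

     With the sample data above, both statements return the row with the smallest id, which is 1.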

   

     But two questions remain:

1. Why is the uniq_company_code index used in statement 2, even though the company_code column does not appear in it?
2. Why is the uniq_company_code index not used in statement 1?

    Before answering these questions, let's first look at how MySQL's commonly used storage engines implement indexes.

    An example table is as follows:

id company_code city_code ...
10 ZZXJ8888 410100 ...
21 HNFD6666 410100 ...
32 WH9999 420100 ...
43 CS99999 430100 ...

     Implementation of indexes for different table engines:
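     As a rough illustration for InnoDB: the clustered (primary key) index stores whole rows ordered by id, while a secondary index on company_code stores each indexed value together with the primary key value, ordered by the indexed value. MyISAM, by contrast, keeps rows in a separate data file, and its indexes point at row positions in that file. The queries below are only a logical picture of the InnoDB entries for the example rows. They assume the rows live in a table named apps with a secondary index on company_code and a common collation, and they do not show the physical layout.

```sql
-- Logical content of the clustered index: full rows in primary key order.
SELECT * FROM apps ORDER BY id;

-- Logical content of an InnoDB secondary index on company_code:
-- (company_code, id) pairs in company_code order.
SELECT company_code, id FROM apps ORDER BY company_code;
-- For the example rows this order is:
--   CS99999  -> 43
--   HNFD6666 -> 21
--   WH9999   -> 32
--   ZZXJ8888 -> 10
```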


     With that, the two questions can be answered:

1. In InnoDB, every secondary index entry also stores the primary key value, so the id column is contained in the uniq_company_code index. Statement 2 can therefore be answered from that index alone, and the optimizer chooses it;
2. In statement 1, select * requests columns that are not contained in the uniq_company_code index (for example city_code and short_name), so the index cannot serve as a covering index and is not used.

  

    To verify these conclusions, here is one more experiment:

explain select id, company_name from apps limit 1;


     This confirms the covering-index explanation: company_name is not covered by uniq_company_code, so that index is not used.

     So why does MySQL prefer a covering index? The MySQL documentation explains it as follows.

It is possible that key will name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan.
     The main point: if an index covers the selected columns, a scan of that index is preferred because it is more efficient.

     However, the primary key index contains all data columns, so it can also act as a covering index. Why, then, does the optimizer not choose the primary key index?

     In MySQL 5.1.46, the optimizer's index selection changed slightly:

“Performance: While looking for the shortest index for a covering index scan, the optimizer did not consider the full row length for a clustered primary key, as in InnoDB. Secondary covering indexes will now be preferred, making full table scans less likely.”
     The change concerns find_shortest_key(). Its job can be thought of as choosing, among the indexes that cover the query, the one with the shortest key length.

     The comment on find_shortest_key() in the MySQL source reads as follows:

“As far as clustered primary key entry data set is a set of all record fields (key fields and not key fields) and secondary index entry data is a union of its key fields and primary key fields (at least InnoDB and its derivatives don’t duplicate primary key fields there, even if the primary and the secondary keys have a common subset of key fields), then secondary index entry data is always a subset of primary key entry. Unfortunately, key_info[nr].key_length doesn’t show the length of key/pointer pair but a sum of key field lengths only, thus we can’t estimate index IO volume comparing only this key_length value of secondary keys and clustered PK. So, try secondary keys first, and choose PK only if there are no usable secondary covering keys or found best secondary key include all table fields (i.e. same as PK):”

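     Summary: because a secondary index entry is always a subset of the primary key (clustered index) entry, the secondary index is preferred in order to save I/O.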
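     One way to observe this preference is to compare the default plan with plans where an index hint names a specific index. This is only a sketch: the plans actually chosen can vary with the MySQL version, the storage engine and the table statistics.

```sql
-- Default plan: the optimizer is expected to pick the covering secondary index.
EXPLAIN SELECT id FROM apps LIMIT 1;

-- Index hints ask the optimizer to use a specific index,
-- which makes it possible to compare the two access paths.
EXPLAIN SELECT id FROM apps FORCE INDEX (PRIMARY) LIMIT 1;
EXPLAIN SELECT id FROM apps FORCE INDEX (uniq_company_code) LIMIT 1;
```

     Comparing the key and Extra columns of the three outputs shows which index each variant reads and whether it is used as a covering index.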

     Note: since MySQL stores its data in files, I/O here mainly means reading and writing the data files.

 

     With that, the problem is resolved.
