Troubleshooting a MySQL Query: Index Usage

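PS: This is an original article. If you repost it, please credit the source. Thank you!

Article URL: http://flyer0126.iteye.com/blog/2410145

While working on a recent feature I needed to look up a system id, so I dashed off two SQL statements and found that they returned different results.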

select * from apps limit 1;

id	city_code	short_name	company_code
1	410100	zz	ZZXJ8888

select id from apps limit 1;

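As it turned out, the two queries returned inconsistent results!

To get to the bottom of this, here are the table data and the relevant indexes: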

id	city_code	short_name	company_code
1	410100	zz	ZZXJ8888
2	410100	zz	HNFG6666

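The relevant indexes are the primary key on id and a unique index on company_code named uniq_company_code.

Now let's see how MySQL itself explains the two queries: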

explain select * from apps limit 1;

explain select id from apps limit 1;

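The EXPLAIN output shows that the first statement used no index and returned the first row in primary key order, while the second used the uniq_company_code index and, following the index order, returned the second row.

In summary: depending on which columns are selected, MySQL chooses a different access strategy. Because neither statement has an ORDER BY, the row that LIMIT 1 returns depends on that strategy, which is why the results differ.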
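The effect can be mimicked outside MySQL. The following is a minimal Python sketch, not MySQL's actual implementation. It assumes that the table scan walks rows in primary key order and that the index scan walks entries in company_code order, and it uses the two rows shown above. The function names are made up for illustration.

```python
# The two rows of `apps` shown above: (id, city_code, short_name, company_code).
rows = [
    (1, "410100", "zz", "ZZXJ8888"),
    (2, "410100", "zz", "HNFG6666"),
]

def first_by_primary_key(rows):
    """Model of a table scan: rows come back in primary key (id) order."""
    return min(rows, key=lambda r: r[0])

def first_by_company_code_index(rows):
    """Model of scanning uniq_company_code: entries are ordered by company_code."""
    # Each index entry is modeled as (company_code, id); sorting gives index order.
    entries = sorted((r[3], r[0]) for r in rows)
    company_code, row_id = entries[0]
    return row_id

print(first_by_primary_key(rows))         # the full row with id 1
print(first_by_company_code_index(rows))  # id 2, because "HNFG6666" sorts before "ZZXJ8888"
```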

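A few questions remain, though:

1. Why does statement 2 use the index on company_code (uniq_company_code) even though the company_code column does not appear in the query?
2. Why does statement 1 not use the uniq_company_code index?

Before answering these questions, let's look at how indexes are implemented in MySQL's commonly used storage engines.

The sample table: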

id	company_code	city_code	...
10	ZZXJ8888	410100	...
21	HNFD6666	410100	...
32	WH9999	420100	...
43	CS9999	430100	...

Index implementation in the different storage engines:
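As a rough illustration, the sketch below models the sample table in Python under simplifying assumptions: B-trees are flattened into sorted lists, an InnoDB secondary index entry stores the indexed value plus the primary key, and a MyISAM index entry stores the indexed value plus the row's position in the data file. The variable names and the use of list positions as row addresses are illustrative only and do not reflect the engines' real on-disk formats.

```python
# Sample table from above: (id, company_code, city_code).
table = [
    (10, "ZZXJ8888", "410100"),
    (21, "HNFD6666", "410100"),
    (32, "WH9999", "420100"),
    (43, "CS9999", "430100"),
]

# InnoDB (simplified): the primary key index is clustered, so each of its
# entries carries the whole row, ordered by id.
innodb_primary = sorted(table, key=lambda row: row[0])

# InnoDB secondary index on company_code: each entry holds the indexed value
# plus the primary key, which is then used to find the row in the clustered index.
innodb_secondary = sorted((row[1], row[0]) for row in table)

# MyISAM (simplified): rows live in a separate data file, and every index,
# primary or secondary, stores the indexed value plus the row's position there.
myisam_data_file = list(table)
myisam_primary = sorted((row[0], pos) for pos, row in enumerate(myisam_data_file))
myisam_secondary = sorted((row[1], pos) for pos, row in enumerate(myisam_data_file))

# Print the modeled InnoDB secondary index in index order.
for entry in innodb_secondary:
    print(entry)
```

In this model the InnoDB secondary index already holds every id, so a query that asks only for id never needs to touch the clustered index.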

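With that, the questions above can be answered:

1. Because the uniq_company_code index contains the id column, statement 2 can get its data directly from the uniq_company_code index, so the optimizer chooses that index;
2. Statement 1, by contrast, uses select * and therefore selects columns that are not in the uniq_company_code index, so the query cannot be answered from that index alone and the index is not used.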

To verify this conclusion, one more experiment:

explain select id, company_name from apps limit 1;

This confirms the covering index explanation: company_name is not covered by the uniq_company_code index, so that index cannot be used.
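The covering test itself is just a subset check. Here is a minimal Python sketch of it. The column list for uniq_company_code is an assumption based on the reasoning above (the indexed column company_code plus the primary key id that an InnoDB secondary index carries), and the covers helper is hypothetical, not a MySQL function.

```python
def covers(index_columns, selected_columns):
    """Return True if every selected column can be read from the index alone."""
    return set(selected_columns) <= set(index_columns)

# Assumption: uniq_company_code is on company_code and, being an InnoDB
# secondary index, also carries the primary key column id.
uniq_company_code = ["company_code", "id"]

# Columns returned by `select *` in the output shown earlier.
all_columns = ["id", "city_code", "short_name", "company_code"]

print(covers(uniq_company_code, ["id"]))                  # statement 2: covered
print(covers(uniq_company_code, ["id", "company_name"]))  # the experiment: not covered
print(covers(uniq_company_code, all_columns))             # statement 1: not covered
```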

So why use a covering index at all? The MySQL documentation explains it as follows:

It is possible that key will name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan.

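The gist: if an index covers the selected columns, a covering index scan is preferred because it is more efficient. But the primary key index contains all data columns, so it can act as a covering index too. Why, then, doesn't the optimizer choose the primary key index?

In MySQL 5.1.46 the optimizer's index selection was changed slightly: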

“Performance: While looking for the shortest index for a covering index scan, the optimizer did not consider the full row length for a clustered primary key, as in InnoDB. Secondary covering indexes will now be preferred, making full table scans less likely.”

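The function involved is find_shortest_key(), whose job can be thought of as choosing the index with the smallest key length that can satisfy the query.

The comment on find_shortest_key() in the MySQL source reads: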

“As far as clustered primary key entry data set is a set of all record fields (key fields and not key fields) and secondary index entry data is a union of its key fields and primary key fields (at least InnoDB and its derivatives don’t duplicate primary key fields there, even if the primary and the secondary keys have a common subset of key fields), then secondary index entry data is always a subset of primary key entry. Unfortunately, key_info[nr].key_length doesn’t show the length of key/pointer pair but a sum of key field lengths only, thus we can’t estimate index IO volume comparing only this key_length value of secondary keys and clustered PK. So, try secondary keys first, and choose PK only if there are no usable secondary covering keys or found best secondary key include all table fields (i.e. same as PK):”

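Summary: because a secondary index entry is always a subset of the clustered primary key entry, the optimizer prefers the secondary index in order to save I/O.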
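The selection rule described in that comment can be paraphrased in a few lines of Python. This is a simplified sketch of the idea, not the actual MySQL source: the function name pick_covering_index, the dictionary layout, and the key lengths in the example are all assumptions made for illustration.

```python
def pick_covering_index(indexes, clustered_pk, table_column_count):
    """Pick an index for a covering scan, trying secondary indexes first.

    indexes: dict mapping index name -> (key_length, number_of_key_columns),
             containing only the indexes that cover the query.
    clustered_pk: name of the clustered primary key, or None.
    table_column_count: total number of columns in the table.
    """
    best = None
    for name, (key_length, _) in indexes.items():
        if name == clustered_pk:
            continue  # secondary indexes are considered first
        if best is None or key_length < indexes[best][0]:
            best = name
    if clustered_pk in indexes:
        # Fall back to the primary key when no secondary index qualifies, or
        # when the best one contains every table column anyway.
        if best is None or indexes[best][1] >= table_column_count:
            best = clustered_pk
    return best

# Hypothetical key lengths: a 4-byte integer id and a longer company_code key.
usable = {"PRIMARY": (4, 1), "uniq_company_code": (32, 1)}
print(pick_covering_index(usable, "PRIMARY", 4))
print(pick_covering_index({"PRIMARY": (4, 1)}, "PRIMARY", 4))
```

Note that the secondary index wins in the first call even though its key length is larger than the primary key's, which matches the behavior observed for statement 2.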

Note: MySQL stores its data in files, so I/O here mainly means reading and writing the data files.

That wraps up the problem.
