MySQL索引的设计、使用与优化

一、使用索引的优缺点

索引是数据库用来提高性能的最常用工具，所有的MySQL列都可以被索引，对相关的列进行索引是提高select性能的最佳途径。使用索引的优点和缺点如下：
优点：

快速访问数据表中的特定信息，提高检索速度
创建唯一性索引，保证数据库表中每一行数据的唯一性。
加速表和表之间的连接

缺点：

创建索引和维护索引需要耗费时间，这个时间随着数据量的增加而增加
索引需要占用物理空间，不光是表需要占用数据空间，每个索引也需要占用物理空间
当对表进行增、删、改、的时候索引也要动态维护，这样就降低了数据的维护速度

二、设计索引的原则

为了提高索引的使用效率，设计索引时应遵循以下规则：

添加索引的列不一定是要搜索的列：最适合索引的列是出现在WHERE子句中的列，或连接子句中指定的列，而不是SELECT关键字后的选择列表中的列。
使用唯一索引：索引的列的基数（不重复数）越大，索引的效果越好。比如出生日期列数据各不相同，适合添加索引，而性别列只含有‘M’和‘F’两个值，则不适合添加索引。
使用短索引：如果对字符串列进行索引，应当制定一个前缀长度，例如，一个CHAR(200)的列如果前10个或者20个字符多数是唯一的，就不要对整个列进行索引。较短的索引涉及的磁盘IO较少，占用空间也小，查询更快。
不要过度索引：每个额外的索引都要占用额外的磁盘空间，并降低写操作性能。在修改表内容时，索引必须进行更新，可能还会重构，从而降低表的修改速度。

三、索引的存储分类

1、分类

索引是在MySQL存储引擎层实现的，而不是在服务器层，所以每种存储引擎的索引都不一定相同，MySQL目前有以下四种索引：

B-Tree索引：最常见的索引类型，大部分存储引擎都支持B树索引。
HASH索引：只有Memory、Heap引擎支持，使用场景简单。
R-Tree索引（空间索引）：MyISAM的一个特殊索引类型，使用较少。
Full-text（全文索引）：特殊索引类型，MyISAM和InnoDB支持。

MyISAM、InnoDB、Memory三个常用引擎支持的索引类型比较：

索引	MyISAM引擎	InnoDB引擎	Memory引擎
B-Tree索引	支持	支持	支持
HASH索引	不支持	不支持	支持
R-Tree索引	支持	不支持	不支持
Full-text索引	支持	支持	不支持

2、HASH索引与B-Tree索引

使用HASH索引时应注意：

HASH索引只能用于=或<=>操作符的等式比较。
优化器不能使用HASH索引来加速ORDER BY 操作。
HASH索引只能使用整个关键字来搜索一行，而不能使用通配符。

下列范围查询适用于HASH索引和B-Tree索引：

select * from demo where id=2 or id in(4,5,8,9);

下列范围查询只适用于B-Tree索引：

select * from demo where id>5 and id<10;

select * from demo where birthday like '1996-08-%' or salary between 5000 and 10000;

此处如果表类型是Memory/Heap，id是HASH索引，则查询使用全表扫描的方式，不会用到索引。

四、SQL优化中的索引问题

SQL优化中对索引的优化实际上是将对添加了索引，但sql执行计划没有选择使用索引的场景的优化。
以下内容默认使用B-Tree索引。
案例用表：

mysql> show create table demo \G;
*************************** 1. row ***************************
       Table: demo
Create Table: CREATE TABLE `demo` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) NOT NULL DEFAULT '张三' COMMENT '姓名',
  `sex` enum('男','女') NOT NULL DEFAULT '男' COMMENT '性别',
  `age` int(10) DEFAULT NULL,
  `birthday` date DEFAULT '1970-01-01',
  `address` varchar(50) DEFAULT NULL,
  `salary` double DEFAULT '0',
  PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

ERROR:
No query specified

mysql> select * from demo;
+----+--------+-----+------+------------+---------+--------+
| id | name   | sex | age  | birthday   | address | salary |
+----+--------+-----+------+------------+---------+--------+
|  2 | tom    | 男  |   22 | 1996-08-08 | 武汉    |   2000 |
|  4 | 赵六   | 男  |   60 | 1958-08-08 | 上海    |   3000 |
|  6 | 李四   | 男  |   23 | 1996-08-08 | 武汉    |  50000 |
|  7 | lucy   | 女  |   22 | 1996-07-08 | 武汉    |   8000 |
|  8 | 李四   | 男  |   36 | 1982-06-08 | 杭州    |   6000 |
|  9 | sunnie | 女  |   22 | 1996-08-28 | NULL    |      0 |
| 11 | 王五   | 女  |   30 | 1988-08-08 | 北京    |  10000 |
| 12 | sherly | 女  |   18 | 2000-02-20 | 深圳    |  12000 |
+----+--------+-----+------+------------+---------+--------+
8 rows in set (0.00 sec)

1、MySQL中能使用索引的场景

（1）匹配全值。

对索引中所有的列都指定具体值。例如：

mysql> create index member on demo(name,sex,address(10));
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select * from demo where name='王五' and sex='女' and address='北京' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ref
possible_keys: member
          key: member
      key_len: 96
          ref: const,const,const
         rows: 1
        Extra: Using index condition; Using where
1 row in set (0.00 sec)

explain语句输出结果中type=ref表示使用非唯一索引或唯一索引的前缀进行扫描，key=member表示使用了组合索引member。
Extra中的Using index condition表示使用ICP（Index Condition Pushdown）优化查询。

（2）匹配值的范围查询

对索引值能够进行范围查找。例如：

mysql> explain select * from demo where id>5 and id<10 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 3
        Extra: Using where
1 row in set (0.00 sec)

type为range说明优化器选择范围查询，key为primary说明使用主键查询。

（3）匹配最左前缀

仅仅使用复合索引中最左边的列进行查找，例如：

mysql> explain select * from demo where name='tom' and address in ('武汉','北京','上海') \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ref
possible_keys: member
          key: member
      key_len: 62
          ref: const
         rows: 1
        Extra: Using index condition; Using where
1 row in set (0.00 sec)

mysql> explain select * from demo where sex='男' and address in ('武汉','北京','上海') \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 7
        Extra: Using where
1 row in set (0.00 sec)

从（1）中知道创建的member索引包含三列：name、sex、address，只要查询条件包含第一列（最左前缀）时，优化器使用复合索引member进行查询，当查询条件不包含第一列时，执行计划不会利用到复合索引member。

（4）仅仅对索引进行查询

当查询的列都在索引的字段中时，查询效率更高，例如：

mysql> explain select name,sex from demo where name='tom' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ref
possible_keys: member
          key: member
      key_len: 62
          ref: const
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

要查询的name字段和sex字段都在member索引中，Extra中Using index，意味着直接访问索引就能获取到所需数据，也就是‘覆盖索引扫描’。Using where表示优化器需要通过索引回表查询数据。

（5）匹配列前缀

仅仅使用索引中的第一列的开头一部分信息进行查询，例如：

mysql> explain select * from demo where name like 'sh%' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: range
possible_keys: member
          key: member
      key_len: 62
          ref: NULL
         rows: 1
        Extra: Using index condition
1 row in set (0.00 sec)

（6）索引一部分精确匹配，其他部分范围匹配

例如：

mysql> explain select * from demo where name='李四' and address<'武汉' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ref
possible_keys: member
          key: member
      key_len: 62
          ref: const
         rows: 2
        Extra: Using index condition; Using where
1 row in set (0.00 sec)

key=member说明优化器使用了member索引。

（7）如果列名是索引，那么使用列名 is null 会使用索引。（区别于oracle）

例如：

mysql> create index city on demo(address);
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select * from demo where address is null \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ref
possible_keys: city
          key: city
      key_len: 153
          ref: const
         rows: 1
        Extra: Using index condition
1 row in set (0.00 sec)

这里给address列创建一个索引city，使用列名is null 会使用索引。

2、MySQL中存在索引但不能使用索引的场景

（1）以%开头的like查询不能利用B-Tree索引。

例如：

mysql> explain select * from demo where name like '%m' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 7
        Extra: Using where
1 row in set (0.00 sec)

以%或_开头的查询不能利用B-Tree索引的由于B-Tree索引的平衡树结构。
解决办法：
先获得满足条件的主键列表，之后根据主键回表检索记录，这样可以避免全表扫描产生大量的IO请求。
语句如下：

mysql> explain select * from (select id from demo where name like '%m') a,demo b where a.id=b.id \G;

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: b
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 7
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ref
possible_keys: <auto_key0>
          key: <auto_key0>
      key_len: 4
          ref: test.b.id
         rows: 2
        Extra: Using index
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: demo
         type: index
possible_keys: NULL
          key: member
      key_len: 96
          ref: NULL
         rows: 7
        Extra: Using where; Using index
3 rows in set (0.00 sec)

换一个数据量较大的数据库验证情况：

mysql> show profiles;
+----------+------------+---------------------------------------------------------------------------
------------------+
| Query_ID | Duration   | Query
                  |
+----------+------------+---------------------------------------------------------------------------
------------------+
|        3 | 0.03196575 | select * from tb_item where updated like '%30'
                  |
|        4 | 0.01050700 | select * from (select id from tb_item where updated like '%30') a,tb_item
b where a.id=b.id |
|        5 | 0.02138725 | select * from tb_item where id in (select id from tb_item where updated li
ke '%30')         |
+----------+------------+---------------------------------------------------------------------------
------------------+
5 rows in set, 1 warning (0.00 sec)

打开profiling观察语句执行用时，查询3是全表扫描方式，查询4是先得到主键列表，然后连接查询方式，查询5是用子查询方式。结果显示得到主键列表后连接查询最快，子查询次之，全表扫描最慢。

（2）数据类型出现隐式转换时不会使用索引

例如：

mysql> explain select * from demo where name=777 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ALL
possible_keys: member
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10
        Extra: Using where
1 row in set (0.00 sec)

name字段类型为字符串，查询时where name=777给的是数值类型，MySQL会发生隐式转换，将777转化为字符串类型，但是即使name字段存在索引，发生隐式转换后不会使用索引。由例子中key=null可见。
加上引号后，没有隐式转换，就会用到索引了：

mysql> explain select * from demo where name='777' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ref
possible_keys: member
          key: member
      key_len: 62
          ref: const
         rows: 2
        Extra: Using index condition
1 row in set (0.00 sec)

（3）复合索引不满足最左原则不会使用索引

见上一节第（3）小节

（4）如果MySQL估计使用索引比全表扫描更慢，则不会使用索引

在查询的时候，筛选性越高越容易用到索引，筛选性越低越不容易用到索引。

mysql> explain select * from demo where name like 's%' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: range
possible_keys: member
          key: member
      key_len: 62
          ref: NULL
         rows: 2
        Extra: Using index condition
1 row in set (0.00 sec)

正常情况下，使用索引member。执行以下语句，在name列所有值钱加一个s：

mysql> update demo set name=concat('s',name);

此时MySQL估计使用索引比全表扫描更慢，不使用索引：

mysql> explain select * from demo where name like 's%' \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ALL
possible_keys: member
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10
        Extra: Using where
1 row in set (0.00 sec)

（5）用or分割条件，如果or前的条件的列有索引，而后面的条件的列没有索引，那么前面的索引也不会被用到

mysql> explain select * from demo where name='tom' or age=22 \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ALL
possible_keys: member
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10
        Extra: Using where
1 row in set (0.00 sec)

name存在索引，age没有索引，结果没有使用索引。