How to Choose Between a Normal Index and a Unique Index? | CSDN Blog Selection

Author | NY & XX

Editor | Guo Rui

Produced | CSDN blog

There are already many articles online about the differences between unique indexes and normal indexes, so they are not repeated here. Instead, we look in depth at whether to choose a normal index or a unique index in different business scenarios. Suppose you maintain a social security administration system in which every person has a unique ID card number, and the business code already guarantees that no two identical ID card numbers are written. If the system needs to look up a name by ID card number, it executes an SQL statement like this:

select name from suser where id_card = 'xxxxxxxxxxx';

So you would naturally consider building an index on the id_card field. Because the ID card number field is relatively large, it is not a good choice for the primary key, which leaves two options: create a unique index on id_card, or create a normal index. Since the business code already guarantees that duplicate ID card numbers are never written, both options are logically correct. From a performance point of view, though, which one should you choose? Below we analyze how the two kinds of index affect the performance of queries and updates.
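As a concrete sketch of the two options, the statements below create each kind of index on id_card. Only the table name suser and the columns id_card and name come from the query above; the column types, the id primary key, and the index names uk_id_card and idx_id_card are assumptions made for illustration.

```sql
-- Hypothetical table definition: column types and the id column are assumed.
CREATE TABLE suser (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  id_card VARCHAR(18)     NOT NULL,
  name    VARCHAR(64)     NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Option 1: a unique index. InnoDB itself rejects duplicate id_card values.
ALTER TABLE suser ADD UNIQUE INDEX uk_id_card (id_card);

-- Option 2 (an alternative to option 1, not an addition to it): a normal index.
-- Uniqueness then relies entirely on the business code.
ALTER TABLE suser ADD INDEX idx_id_card (id_card);
```

In practice only one of the two ALTER TABLE statements would be run. Either index can serve the lookup by id_card.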

Query operation

Let's look at how InnoDB organizes its indexes. Assume the following query is executed:

select id from t where a=3

The lookup in the index tree proceeds as follows:

First, start from the root of the B+ tree and search level by level down to a leaf node, which locates a data page.
Then locate the record within that data page by binary search.

With a unique index, the search stops as soon as the first entry that satisfies the condition (for example (3,300)) is found. With a normal index, after the first matching entry is found, the search continues to the next entry until it meets the first entry that does not satisfy a = 3.

The performance gap caused by this difference is minimal, because InnoDB reads and writes data in units of pages. When it needs to read an entry, it does not read just that entry from disk; it reads the whole page containing it into memory. So by the time the entry with a = 3 is found, the data page it lives on is already in memory, and the extra "find the next entry and check whether it matches" step that a normal index performs costs only one pointer move and one comparison.
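One way to see the two lookup modes from the SQL side is EXPLAIN. The sketch below assumes a table t with a secondary index on a; the column types, the sample rows, and the index names idx_a and uk_a are hypothetical. The access types in the comments are what MySQL typically reports for an equality lookup, not guaranteed output.

```sql
-- Hypothetical table: only the names t, id and a come from the query above.
CREATE TABLE t (
  id INT NOT NULL,
  a  INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_a (a)              -- normal (non-unique) index
) ENGINE = InnoDB;

INSERT INTO t (id, a) VALUES (100, 1), (200, 2), (300, 3), (400, 4), (500, 5);

-- Normal index: the optimizer typically reports type = ref,
-- because more than one row could match a = 3.
EXPLAIN SELECT id FROM t WHERE a = 3;

-- Swap in a unique index on the same column.
ALTER TABLE t DROP INDEX idx_a, ADD UNIQUE INDEX uk_a (a);

-- Unique index: an equality match on the whole key typically reports
-- type = const, because at most one row can match.
EXPLAIN SELECT id FROM t WHERE a = 3;
```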

Update

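When a data page needs to be updated, InnoDB updates it directly if the page is already in the in-memory buffer pool, and writes the redo log at the same time. If the page is not in memory, InnoDB caches the update in the change buffer, provided this does not affect data consistency, and likewise writes the redo log.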

Change buffer

So what is the change buffer?

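Its main purpose is to cache data operations on secondary indexes, which reduces random I/O on those indexes and allows multiple operations to be merged.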

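Before MySQL 5.5, only insert operations could be cached, so it was originally called the insert buffer. Later versions added support for caching more operation types (insert, update and delete), and it was renamed the change buffer.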

Structurally, the change buffer is a B+ tree stored in the ibdata system tablespace; its root page is page 4 of ibdata (FSP_IBUF_TREE_ROOT_PAGE_NO).

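The process of applying the operations in the change buffer to the original data page to obtain the latest result is called merge. Until then the change buffer only caches the changes to the entries; the merge is the moment the data is actually updated. The more changes the change buffer has recorded for a data page before it is merged (that is, the more updates are pending on that page), the greater the benefit.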

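In general, a merge is triggered in the following situations:

The data page is accessed;
The master thread performs a merge insert buffer operation every second or every 10 seconds;
The database shuts down normally.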

Also, despite the name, the change buffer is actually persistent data: it has a copy in memory and is also written to disk.

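Viewing change buffer status

seg size: the total size of the insert buffer (number of pages × 16 KB);
merges: the number of merges that have been performed;
merged operations, insert: the number of insert operations that have been merged;
delete mark: the number of delete-mark operations that have been merged;
delete: the number of physical delete (purge) operations that have been merged.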
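The counters described above are part of the InnoDB status report. A minimal way to display them, assuming an account with the PROCESS privilege, is the statement below; in the output they appear in the section headed INSERT BUFFER AND ADAPTIVE HASH INDEX.

```sql
-- Print the InnoDB status report, then look for the section titled
-- "INSERT BUFFER AND ADAPTIVE HASH INDEX" to find seg size, merges
-- and the merged operations counters.
SHOW ENGINE INNODB STATUS;
```

In the mysql command-line client, ending the statement with \G instead of a semicolon prints the report vertically, which is easier to read.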

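The change buffer occupies the buffer pool

Reading a data page into memory takes up space in the buffer pool. Recording a change in the change buffer avoids reading that page in, which saves memory and improves memory utilization.

The change buffer uses memory from the buffer pool, so it cannot grow without limit. Its size is set with the parameter innodb_change_buffer_max_size, which is the maximum percentage of the buffer pool it may occupy. The default is 25 and the maximum is 50. It generally only needs adjusting for write-heavy, read-light workloads.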
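As a sketch of how the parameter can be inspected and adjusted at runtime: the value 40 below is an arbitrary example rather than a recommendation, and changing a global variable requires the appropriate administrative privilege.

```sql
-- Show the current limit, expressed as a percentage of the buffer pool.
SHOW VARIABLES LIKE 'innodb_change_buffer_max_size';

-- Raise the limit for a write-heavy workload (example value; valid range is 0 to 50).
SET GLOBAL innodb_change_buffer_max_size = 40;
```

A value set with SET GLOBAL does not survive a server restart unless it is also written to the configuration file.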

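What are the benefits of the change buffer?

If MySQL handles a large volume of DML operations, the change buffer is essential. It exists to minimize I/O cost: changes are merged in memory so that many operations turn into a small number of I/O operations, which speeds up updates.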

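Which scenarios are suited to the change buffer?

The change buffer can only be used with normal indexes; it does not apply to unique indexes. Why?

Suppose the entry (3,300) is to be inserted. InnoDB must first check whether this entry already exists in the table, and that check requires reading the data page into memory. Once the page is in memory, updating it directly is faster, so there is no need for the change buffer.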

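So what does InnoDB do when inserting the entry (3,300)?

If the data page to be updated is in memory:

For a unique index: find the position between 2 and 4, check that there is no conflict, insert the value, and finish.
For a normal index: find the position between 2 and 4, insert the value, and finish.

If the data page to be updated is not in memory:

For a unique index: read the data page into memory, check for a conflict, then insert the value.
For a normal index: simply record the update in the change buffer, and the statement is finished.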

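Not every scenario benefits from the change buffer

Not every workload on a normal index benefits from the change buffer. For write-heavy, read-light workloads, the probability that a page is accessed right after being written is small, and this is where the change buffer works best.

But suppose a workload queries the data immediately after writing it. Even if the update qualifies and is first recorded in the change buffer, the data page is accessed right away, which immediately triggers a merge. The number of random I/O accesses is not reduced, and the cost of maintaining the change buffer is added on top. For this kind of workload the change buffer is counterproductive.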
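For a write-then-read workload like this, one option is to restrict or turn off change buffering through the innodb_change_buffering system variable. The article itself does not mention this variable; it is brought in here as an assumption about how one might act on the point above. Its default value and availability differ between MySQL versions, so check the documentation for the version in use.

```sql
-- Check the current setting. Documented values include
-- none, inserts, deletes, changes, purges and all.
SHOW VARIABLES LIKE 'innodb_change_buffering';

-- Turn change buffering off for a workload that reads pages right after writing them.
SET GLOBAL innodb_change_buffering = 'none';

-- Turn buffering of all supported operation types back on.
SET GLOBAL innodb_change_buffering = 'all';
```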

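For example:

Suppose we execute insert into t values(id1,a1),(id2,a2);

Suppose the data page containing a1 is in memory (the InnoDB buffer pool) and the data page containing a2 is not, as shown in the figure: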