Interviewer: Can you explain how MySQL indexes are implemented?

In a database, too many indexes can hurt the application's performance, while too few hurt query performance. The goal is a balance between the two: enough indexes to speed up queries, but not so many that inserts, updates, and deletes carry excessive index-maintenance overhead.

This article covers four topics: the B+ tree index, index categories, hash indexes, and full-text indexes.

B+ tree index

  • Index lookup
  • Index insertion
  • Index deletion

Index categories

  • Clustered index
  • Secondary index
  • Composite index
  • Covering index

Hash indexes

  • Hash Algorithm
  • Adaptive hash indexes

Full-text index

  • Inverted index
  • Full-text search index cache
  • Some limitations of full-text indexing

InnoDB supports several common index types. This article explains three of them in detail: the B+ tree index, the hash index, and the full-text index.

B+ tree index

1. The "B" in B+ tree stands for "balance", not "binary". The B+ tree evolved from the balanced binary tree, but it is not itself a binary tree.

2. The B+ tree is a balanced search tree designed for disks and other direct-access secondary storage. All records are stored in leaf nodes on the same level, in key order, and the leaf nodes are linked to each other by pointers.

3. B+ trees in databases have a high fan-out, so their height is typically only 2 to 4 levels. Finding the row for a given key therefore takes at most 2 to 4 IOs. A current mechanical hard disk can do at least 100 IOs per second, so 2 to 4 IOs means a lookup takes only about 0.02 to 0.04 seconds.

4. A B+ tree index cannot locate the specific row for a given key. It only locates the page that holds the row. The database then reads that page into memory and searches within it to find the row.

5. B+ tree indexes in a database are divided into clustered and non-clustered indexes. Both are implemented as a B+ tree, that is, height-balanced with all data stored in the leaf nodes. The difference between them is whether the leaf nodes store the entire row. Each table can have only one clustered index.

6. The data pages (leaf nodes) of a B+ tree are linked by a doubly linked list, and the data in them is stored in primary-key order.
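The arithmetic in point 3 can be checked with a minimal Python sketch. The function name and the assumption of exactly one IO per tree level are illustrative; the figure of 100 IOs per second is the one quoted above.

```python
def lookup_seconds(tree_height, ios_per_second=100):
    """Worst-case lookup time, assuming one disk IO per tree level."""
    return tree_height / ios_per_second

for height in (2, 3, 4):
    print(height, "levels ->", lookup_seconds(height), "seconds")
```

In practice the upper levels of the tree are often already in memory, so the real cost is usually lower than this worst case.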

First, consider a B+ tree of height 2 in which each page holds 4 records and the fan-out is 5.

Figure: a B+ tree of height 2

 


Index lookup

A B+ tree index is searched with binary search (also called half-interval search). The basic idea: with the records sorted in ascending or descending order, the search jumps through the list instead of scanning it. It first compares the target with the element at the midpoint of the sorted sequence. If the target is smaller than that element, the search continues in the left half, otherwise in the right half. Each comparison halves the search interval.

As the figure shows, finding 48 in the ordered list takes only 3 steps:

Figure: binary search
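A minimal Python sketch of the procedure described above. The key list is hypothetical and is not necessarily the one in the figure; the exact number of comparisons depends on the list and on how the midpoint is rounded.

```python
def binary_search(sorted_keys, target):
    """Return (index, comparisons) for target, or (-1, comparisons) if absent."""
    low, high = 0, len(sorted_keys) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        if sorted_keys[mid] == target:
            return mid, comparisons
        if target < sorted_keys[mid]:
            high = mid - 1   # continue in the left half
        else:
            low = mid + 1    # continue in the right half
    return -1, comparisons

# Hypothetical sorted keys, not necessarily the list shown in the figure.
keys = [5, 10, 19, 21, 31, 37, 42, 45, 48, 50]
print(binary_search(keys, 48))
```

With this list the search compares against 31, then 45, then 48.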

 


Index insertion

B+ tree lookups are fast, but keeping the tree balanced is expensive. In general, one or more left or right rotations are needed to keep the tree balanced after an insertion.

To stay balanced on insertion, a B+ tree has to split many pages (leaf nodes), and a page split generally means disk operations, so page splits should be kept to a minimum. Using an auto-increment ID as the primary key greatly reduces page splits and improves performance.

Table: the three cases of B+ tree insertion

 


Figure: a B+ tree of height 2

 


Let us analyze B+ tree insertion through examples.

1. We insert the key 28. Neither the Leaf Page nor the Index Page is full, so we insert it directly.

 


2. Next we insert the key 70. The original Leaf Page is now full but the Index Page is not, which is the second case in the table of three insertion cases. After the insert the Leaf Page would hold 50, 55, 60, 65, 70, so we split the leaf node at the middle value 60 and put that middle value into the Index Page.

 


3. Finally we insert the record 95. This is the third case in the table of three insertion cases: both the Leaf Page and the Index Page are full, so two splits are needed. (Because of the limits of the illustrations, the doubly linked list pointers between leaf nodes are not drawn.)
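The leaf split in step 2 can be sketched in Python as follows. This is a simplified illustration, not InnoDB's page-split code: the function name is hypothetical, a page is modeled as a sorted list of keys, and the page capacity of 4 matches the example tree.

```python
import bisect

def insert_with_split(leaf_keys, new_key, capacity):
    """Insert new_key into a sorted leaf; split at the middle key on overflow.

    Returns (left_keys, right_keys, promoted_key). When no split is needed,
    right_keys is empty and promoted_key is None.
    """
    keys = list(leaf_keys)
    bisect.insort(keys, new_key)
    if len(keys) <= capacity:
        return keys, [], None
    mid = len(keys) // 2
    # Keys below the middle stay left; the middle key and above go right,
    # and a copy of the middle key goes up into the index page.
    return keys[:mid], keys[mid:], keys[mid]

print(insert_with_split([50, 55, 60, 65], 70, capacity=4))
```

For the keys in step 2 this yields a left page of 50, 55, a right page of 60, 65, 70, and 60 as the key promoted to the Index Page.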

 


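As you can see, however the tree changes, the B+ tree always stays balanced. To stay balanced, though, newly inserted keys may require many page splits. Since B+ trees are mainly used on disk, a page split means disk operations, and page splits should be avoided wherever possible. For this reason the B+ tree also provides rotation.

Index deletion

The B+ tree uses a fill factor to control how the tree changes on deletion; 50% is the minimum value the fill factor can take. Deletion must likewise leave the tree balanced. It may involve merging a leaf node with a sibling node, always in order to keep the tree balanced.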
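A minimal Python sketch of the merge decision, under simplifying assumptions: pages are sorted lists of keys, the function name and parameters are hypothetical, and the sketch only merges when the combined keys fit in one page (a real implementation would also update the index page and handle the case where they do not fit).

```python
def delete_key(leaf_keys, sibling_keys, key, capacity, fill_factor=0.5):
    """Delete key from a leaf; merge with the sibling if the leaf underflows.

    Returns (leaf_keys, sibling_keys, merged). This sketch only merges when
    the combined keys fit in one page and otherwise leaves both pages as is.
    """
    leaf = [k for k in leaf_keys if k != key]
    underfull = len(leaf) < capacity * fill_factor
    if underfull and len(leaf) + len(sibling_keys) <= capacity:
        return sorted(leaf + list(sibling_keys)), [], True
    return leaf, list(sibling_keys), False

print(delete_key([25, 30], [50, 55], 25, capacity=4))
```

Here the leaf drops to one key, which is below 50% of a four-key page, so it is merged with its sibling.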

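Index categories

Now that we understand the nature and implementation of the B+ tree index, let us look at the categories of index: clustered index, secondary index, composite index, and covering index.

Clustered index

A clustered index builds a B+ tree on the table's primary key, with the leaf nodes storing the row records of the whole table. The leaf nodes of a clustered index are therefore also called data pages. This property means the row data in the table is itself part of the index. As in the B+ tree data structure, each data page is linked to its neighbors by a doubly linked list.

Data pages can be sorted by only one B+ tree, so each table can have only one clustered index. Because the data pages define a logical order, a clustered index can perform range lookups quickly by following the pointers between data pages.

A clustered index is logically contiguous but not physically contiguous. As noted above, the order is maintained through the doubly linked list, and physical storage need not follow primary-key order.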

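Secondary index

In a secondary index (also called a non-clustered index), the leaf nodes do not contain the full row. Besides the key value, each leaf entry contains a bookmark that tells the InnoDB storage engine where to find the row that corresponds to the index entry. In InnoDB, the bookmark of a secondary index is the clustered index key of the row.

A table can have multiple secondary indexes. For example, if a secondary index tree has height 3, three lookups are needed to find the primary key. If the clustered index tree also has height 3, another three lookups in the clustered index are needed to reach the data page that holds the full row. In total, 6 IOs are needed to reach the final data page.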
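The two-stage lookup can be sketched in Python. This is only an illustration of the bookmark idea: plain dictionaries stand in for the two B+ trees, and the table, column names, and rows are hypothetical.

```python
# Hypothetical table: primary key id, secondary index on name.
clustered_index = {          # primary key -> full row (leaf of clustered index)
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
}
secondary_index = {          # indexed value -> bookmark (the primary key)
    "alice": 1,
    "bob": 2,
}

def find_by_name(name):
    """Look up a row by name: secondary index first, then clustered index."""
    primary_key = secondary_index.get(name)   # first tree traversal
    if primary_key is None:
        return None
    return clustered_index[primary_key]       # second tree traversal

print(find_by_name("bob"))
```

Each of the two steps corresponds to one full descent of a B+ tree, which is where the 3 + 3 = 6 IOs in the example come from.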

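Composite index

A composite index is an index on multiple columns of a table. It is still a B+ tree; the difference is that the number of key columns is not 1 but 2 or more. The keys of a composite index are also ordered in the B+ tree, and all the data can be read in logical order through the leaf nodes.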
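A small Python sketch of how composite keys are ordered. The index on columns (a, b) and its values are hypothetical, and a sorted list of tuples stands in for the leaf level of the B+ tree. Entries are sorted by a first and then by b, so the values of b on their own are not in sorted order.

```python
import bisect

# Hypothetical composite index on columns (a, b); keys are kept sorted.
index_ab = sorted([(2, 4), (1, 2), (3, 1), (1, 1), (2, 1), (3, 2)])

def rows_with_a(a):
    """Range-scan all entries whose first column equals a."""
    start = bisect.bisect_left(index_ab, (a,))
    end = bisect.bisect_left(index_ab, (a + 1,))
    return index_ab[start:end]

print(index_ab)
print(rows_with_a(2))
print([b for _, b in index_ab])
```

The last line prints the b values in leaf order, which shows why a condition on b alone cannot use the sort order of this index.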

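Covering index

InnoDB supports covering indexes (also called index covering): the queried data can be obtained directly from the secondary index, without a further lookup in the clustered index. The advantage is that a secondary index does not contain the whole row, so it is smaller than the clustered index and needs fewer IO operations.

In plain terms:

A covering index is a form of non-clustered composite index that includes all the columns used in the query's SELECT, JOIN, and WHERE clauses. In other words, the indexed columns exactly cover the columns involved in the select list and in the query conditions, so the index contains all the data the query is looking for.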
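Whether a query is covered comes down to a set comparison, sketched below in Python. The function and the column names are hypothetical; the sketch relies on the point made earlier that an InnoDB secondary index entry carries the clustered index key as its bookmark.

```python
def needs_clustered_lookup(query_columns, index_columns, primary_key_columns):
    """True if the secondary index alone cannot supply every queried column."""
    # An InnoDB secondary index entry holds its key columns plus the
    # clustered index key (the bookmark).
    available = set(index_columns) | set(primary_key_columns)
    return not set(query_columns) <= available

# Hypothetical index on (name, age) for a table whose primary key is id.
print(needs_clustered_lookup(["id", "name", "age"], ["name", "age"], ["id"]))
print(needs_clustered_lookup(["name", "email"], ["name", "age"], ["id"]))
```

The first query is covered by the index; the second also needs email, so it has to go back to the clustered index.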

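Hash indexes

Before studying hash indexes, let us cover some background: the hash algorithm. Hashing is a common technique with O(1) time complexity. It is used not only in indexes but throughout database applications.

Hash algorithm

InnoDB uses a hash algorithm for dictionary lookups. Hash collisions are resolved by chaining (linked lists), and the hash function uses the division method.

For example, if the parameter innodb_buffer_pool_size is 10M, there are 640 pages of 16K each. The hash table for the buffer pool pages needs 640 × 2 = 1280 slots. Since 1280 is not prime, a prime somewhat larger than 1280 is used, which should be 1399. So at startup a hash table with 1399 slots is allocated and used to look up pages in the buffer pool.

InnoDB then maps each lookup to one of the 1399 slots by division hashing.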
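A minimal Python sketch of division hashing with chained collisions, using the numbers from the example. The integer keys and the helper names are hypothetical; how InnoDB derives the key for a buffer pool page is not shown here.

```python
POOL_BYTES = 10 * 1024 * 1024      # 10M buffer pool, as in the example
PAGE_BYTES = 16 * 1024             # 16K pages
NUM_SLOTS = 1399                   # prime slot count quoted in the text

pages = POOL_BYTES // PAGE_BYTES   # 640
slots = [[] for _ in range(NUM_SLOTS)]

def slot_for(key):
    """Division method: the slot is the key modulo the table size."""
    return key % NUM_SLOTS

def put(key, value):
    slots[slot_for(key)].append((key, value))   # collisions extend the chain

def get(key):
    for stored_key, value in slots[slot_for(key)]:
        if stored_key == key:
            return value
    return None

put(7, "page A")
put(7 + NUM_SLOTS, "page B")       # same slot as key 7, different key
print(pages, slot_for(7), get(7 + NUM_SLOTS))
```

The two keys land in the same slot, and the lookup walks the chain to tell them apart.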

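Adaptive hash index

The adaptive hash index uses the hash table approach described above. A hash index is very fast for dictionary-style equality lookups, but it is of no help for range queries.

So a hash index can only be used for equality queries; range queries cannot use it.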
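The difference can be illustrated with a short Python sketch. The values are hypothetical; a dictionary stands in for the hash index and a sorted list for the ordered leaf level of a B+ tree.

```python
import bisect

ages = [18, 21, 25, 30, 34, 41]            # hypothetical indexed values

hash_index = {age: pos for pos, age in enumerate(ages)}   # unordered lookup
sorted_index = sorted(ages)                               # ordered, like B+ tree leaves

def equal_lookup(value):
    """Equality: one hash probe."""
    return hash_index.get(value)

def range_lookup(low, high):
    """Range: needs ordered keys; find both ends, then read in between."""
    start = bisect.bisect_left(sorted_index, low)
    end = bisect.bisect_right(sorted_index, high)
    return sorted_index[start:end]

print(equal_lookup(25))
print(range_lookup(20, 31))
```

A hash function scatters neighboring keys across unrelated slots, so the range query has nothing to follow in the hash table and must use the ordered structure.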

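Full-text index

As described earlier, a B+ tree index supports prefix lookups, for example:

select * from blog where content like 'xxx%';

As long as the content column has a B+ tree index (clustered or secondary), this query is fast. More often, though, in a blog or a search engine we want to find a word anywhere in the text, not just at the start, for example:

select * from blog where content like '%xxx%';

In this case the query is still a full table scan even with a B+ tree index. Full-text search is the technique for retrieving arbitrary content from within a whole book or article.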
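Why the two LIKE patterns behave so differently can be sketched in Python. The content values are hypothetical, and a sorted list stands in for the ordered keys of a B+ tree index.

```python
import bisect

# Hypothetical content values, kept sorted like the keys of a B+ tree index.
contents = sorted(["go stack", "mysql index", "mysql lock", "redis cache"])

def prefix_search(prefix):
    """LIKE 'prefix%': jump to the first match, read while the prefix holds."""
    start = bisect.bisect_left(contents, prefix)
    matches = []
    for value in contents[start:]:
        if not value.startswith(prefix):
            break
        matches.append(value)
    return matches

def substring_search(word):
    """LIKE '%word%': sort order does not help, so every value is checked."""
    return [value for value in contents if word in value]

print(prefix_search("mysql"))
print(substring_search("index"))
```

The prefix search touches only the matching run of keys, while the substring search has to examine every value.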

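A B+ tree index cannot support this kind of query. InnoDB has supported full-text indexes since version 1.2.x, with all the features of MyISAM full-text search.

The implementation is described next.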

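Inverted index

Full-text search is implemented with an inverted index. Like the B+ tree index, the inverted index is a data structure. It stores, in auxiliary tables, a mapping between words and their positions in one or more documents, usually implemented as an associative array.

An inverted index stores the tokenized words in an auxiliary table. To improve the parallel performance of full-text search there are 6 auxiliary tables. The auxiliary tables store the mapping between words and their positions in each row. There are two forms: the inverted file index and the full inverted index.

1. Inverted file index, in the form {word, IDs of the documents containing the word}

2. Full inverted index, in the form {word, (ID of the document containing the word, position in the document)}

Full-text search table

DocumentID  Text
1           Souyunku Technical team
2           Go Technical stack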


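Inverted file index: auxiliary table

With an inverted file index, the auxiliary table is stored as follows. The Documents column has the form {word, IDs of the documents containing the word}.

Number  Text       Documents
1       Souyunku   1
2       Technical  1,2
3       team       1
4       Go         2
5       stack      2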


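Full inverted index: auxiliary table

With a full inverted index, the auxiliary table is stored as follows. It takes more space, but it locates data more precisely and supports more search features. The Documents column has the form {word, (ID of the document containing the word : position in the document)}.

Number  Text       Documents
1       Souyunku   1:1
2       Technical  1:2, 2:2
3       team       1:3
4       Go         2:1
5       stack      2:3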
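Both forms can be built from the two example documents with a short Python sketch. The function name is hypothetical, and splitting on whitespace is a simplification of real tokenization.

```python
from collections import defaultdict

# The two documents from the full-text search table above.
documents = {1: "Souyunku Technical team", 2: "Go Technical stack"}

def build_indexes(docs):
    """Return (inverted file index, full inverted index) for docs."""
    inverted_file = defaultdict(list)   # word -> [document ID, ...]
    full_inverted = defaultdict(list)   # word -> [(document ID, position), ...]
    for doc_id, text in docs.items():
        # Whitespace splitting is a simplification of real tokenization.
        for position, word in enumerate(text.split(), start=1):
            if doc_id not in inverted_file[word]:
                inverted_file[word].append(doc_id)
            full_inverted[word].append((doc_id, position))
    return dict(inverted_file), dict(full_inverted)

inverted_file, full_inverted = build_indexes(documents)
print(inverted_file["Technical"])
print(full_inverted["Technical"])
```

For the word Technical this gives documents 1 and 2 in the inverted file index, and the pairs 1:2 and 2:2 in the full inverted index.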

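Full-text search index cache

The auxiliary tables are persistent tables stored on disk. Because disk I/O is slow, InnoDB provides the FTS Index Cache (full-text search index cache) to improve performance. The FTS Index Cache is a red-black tree sorted by (word, list). When data is inserted, the index is first updated in the cache, and InnoDB later flushes the updates to the auxiliary tables in batches.

When the database crashes, index cache data that had not yet been flushed to disk is automatically re-read and stored on recovery. The parameter innodb_ft_cache_size controls the size of the cache, 32M by default in the InnoDB version described here (later MySQL releases list a default of 8000000 bytes). Raising it can improve full-text search performance, but recovery after a failure then takes longer.

When data is deleted, InnoDB does not remove the entries from the index. It records them in the DELETED auxiliary table instead, so over time the index grows very large. The invalid entries can be removed manually with the optimize table command. If a great deal has to be removed, the operation can affect application availability, so the parameter innodb_ft_num_word_optimize limits how many words are processed per run. The default is 2000, and users can adjust it to control the scale of each cleanup.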

Some limitations of full-text indexing

1. Only MyISAM and InnoDB currently support full-text indexes.

2. Partitioned tables are not supported.

3. In a multi-column full-text index, all columns must use the same character set and collation.

4. Ideographic text is not supported by default; an ngram parser is needed to tokenize it.

5. All columns of a full-text index must be consistent with one another.

6. The column list in MATCH() must exactly match the column list defined in a FULLTEXT index.

7. The argument to AGAINST() must be a constant string.

8. Index hints are more limited for full-text searches.

9. In InnoDB, all DML operations (update, insert, delete) on full-text indexed columns are processed only when the transaction commits. Intermediate steps such as tokenization and marking happen before then.

10. The % wildcard cannot be used.

11. Languages without word delimiters, such as Chinese, Japanese, and Korean, are not supported.


Origin blog.csdn.net/mifffy_java/article/details/90768257