MySQL InnoDB Storage Engine Architecture - Index Advanced

Reprinted from: https://mp.weixin.qq.com/s/HNnzAgUtBoDhhJpsA0fjKQ

Only two things in the world can truly move the human heart: the lofty moral law within us and the brilliant starry sky above us. - Kant

 

Hello everyone. Today I'd like to share some MySQL knowledge, and I hope this article helps you in your work.

 

In interviews, the interviewer often asks about database optimization, for example: how do you speed up queries? The usual answers go along these lines:

  1. Add indexes

  2. Rewrite the SQL to drop unnecessary columns

  3. Use LIMIT

  4. Split databases and tables (sharding)

  5. And so on

 

These answers are quite superficial. Since indexes speed up queries, let's take a closer look at the B+ tree index in MySQL's InnoDB storage engine.

 

In MySQL's InnoDB engine, you can add an index to speed up lookups. An index works like the table of contents of a book: you use the table of contents to find the page that holds the content you want.

The index types InnoDB supports are summarized as follows:

  • B+ tree index

  • Full-text index

  • Hash index

        

As the author already covered in "MySQL InnoDB Storage Engine Architecture - Memory Management", InnoDB's hash index is adaptive: the engine builds it automatically and users cannot create one themselves, so it is not discussed here. This article focuses on the B+ tree index.

https://blog.csdn.net/nuoWei_SenLin/article/details/83034832

01 B+ tree data structure

Most of us learned binary search, binary search trees, and balanced binary trees in a university data structures course. On an ordered data set, binary search finds an item in O(log2 N) time. The balanced binary tree evolved from the binary search tree to solve one problem: in extreme cases, for example when keys are inserted in sorted order, a plain binary search tree degenerates into a linked list. So what about the B+ tree? Let's look at its structure.
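To make the log2 N claim and the degenerate-tree problem concrete, here is a minimal sketch in Python. Python is used for all illustrative sketches in this article because the ideas are language-agnostic, and the function names are hypothetical. `binary_search` counts how many probes it needs on a sorted list. `bst_height_after_inserts` shows that an unbalanced binary search tree built from already-sorted keys grows one level per key, which is exactly the linked-list case.

```python
def binary_search(sorted_keys, target):
    """Return (index or -1, number of probes) for target in an ascending list."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1, probes


def bst_height_after_inserts(keys):
    """Height of a plain (unbalanced) binary search tree built by inserting keys in order."""
    root = None            # each node is a list: [key, left_child, right_child]
    height = 0
    for key in keys:
        depth = 1
        if root is None:
            root = [key, None, None]
        else:
            node = root
            while True:
                depth += 1
                side = 1 if key < node[0] else 2   # 1 = left child, 2 = right child
                if node[side] is None:
                    node[side] = [key, None, None]
                    break
                node = node[side]
        height = max(height, depth)
    return height


keys = list(range(0, 2_000_000, 2))              # 1,000,000 sorted even numbers
index, probes = binary_search(keys, 1_234_568)
print(index, probes)                              # probes is at most floor(log2(N)) + 1 = 20
print(bst_height_after_inserts(range(1000)))     # 1000: sorted input degenerates into a list
```

A balanced tree keeps the height close to log2 N no matter the insertion order. That guarantee is what the B+ tree builds on.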

 

In a B+ tree, all data is stored in the leaf nodes, sorted from small to large. The figure shows a B+ tree of height 2 with a fan-out of 5, where each page holds 4 records. The first level is an index page and the second level holds the data pages. A database B+ tree index is essentially an implementation of the B+ tree inside the database. The height of a B+ tree is usually limited to 2 to 4 levels, so locating a record through the index takes only 2 to 4 disk I/O operations. That is why index lookups are fast.
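A rough capacity calculation shows why 2 to 4 levels are enough. This sketch uses the InnoDB default 16 KB page. Everything else is an illustrative assumption: an 8-byte BIGINT key plus a 6-byte child pointer per non-leaf entry, and about 1 KB per row. Page headers and fill factor are ignored.

```python
PAGE_SIZE = 16 * 1024     # InnoDB default page size (16 KB)
KEY_BYTES = 8             # assumption: BIGINT primary key
POINTER_BYTES = 6         # assumption: size of a child-page pointer
ROW_BYTES = 1024          # assumption: about 1 KB per row

FANOUT = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)   # entries per non-leaf page
ROWS_PER_LEAF = PAGE_SIZE // ROW_BYTES               # rows per leaf page


def max_rows(height):
    """Upper bound on rows a B+ tree of this height holds (one page read per level)."""
    return FANOUT ** (height - 1) * ROWS_PER_LEAF


def min_height(n_rows):
    """Smallest height whose capacity reaches n_rows, i.e. page reads per point lookup."""
    h = 1
    while max_rows(h) < n_rows:
        h += 1
    return h


for h in (1, 2, 3, 4):
    print(h, max_rows(h))   # height 3 already holds 21,902,400 rows under these assumptions
```

Under these assumptions, a table of 20 million rows needs a tree of height 3, so a primary-key lookup reads about 3 pages.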


02 B+ tree index

a. Clustered index

In the InnoDB engine, every table has a clustered index, which is usually the primary key. If the user does not explicitly specify a primary key, InnoDB uses the first unique index whose columns are all NOT NULL as the primary key. If there is no such index, it automatically creates a hidden 6-byte row ID to serve as the primary key.
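The selection rule above can be written as a small decision function. This is a minimal sketch: the `Index` and `Table` classes are hypothetical stand-ins for a table definition, not InnoDB data structures. `GEN_CLUST_INDEX` is the name MySQL documents for the hidden clustered index built on the synthetic row ID.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    name: str
    columns: list
    unique: bool = False
    primary: bool = False


@dataclass
class Table:
    not_null_columns: set
    indexes: list = field(default_factory=list)   # in definition order


def choose_clustered_index(table):
    # 1. An explicit PRIMARY KEY always becomes the clustered index.
    for idx in table.indexes:
        if idx.primary:
            return idx.name
    # 2. Otherwise: the first UNIQUE index whose key columns are all NOT NULL.
    for idx in table.indexes:
        if idx.unique and all(c in table.not_null_columns for c in idx.columns):
            return idx.name
    # 3. Otherwise: a hidden index on a synthetic 6-byte row ID.
    return "GEN_CLUST_INDEX"


users = Table({"email"}, [Index("uk_name", ["name"], unique=True),
                          Index("uk_email", ["email"], unique=True)])
print(choose_clustered_index(users))   # uk_email: uk_name is skipped because name may be NULL
```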


The figure is a schematic diagram of a clustered index. The tree has two levels: the first level is an index page, and the second level consists of data pages, where the actual rows are stored. Index pages do not store row data. They store keys and pointers (offsets) to the data pages. So when a SQL statement hits the index, InnoDB first searches the index page and then follows its pointer to the data page that holds the row.

Thought: a clustered index is not necessarily stored contiguously on disk, but it is logically contiguous. Pages at the same level are linked to one another by a doubly linked list, and the records inside each page are chained in primary-key order. Why a doubly linked list?

It makes range queries and sorting convenient. Once the index has located the data page where a range starts, InnoDB can simply walk the list forward, or walk it backward to return rows in reverse order. For example:

select * from tb where id > 10 and id < 1000;
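Here is a minimal sketch of why the page-level doubly linked list helps. It uses a hypothetical `LeafPage` class, not InnoDB's real page format. In a real tree the starting page is found by descending from the root; here it is passed in directly.

```python
class LeafPage:
    """A leaf page: sorted keys plus prev/next links to its sibling pages."""
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.prev = None
        self.next = None


def link_pages(key_groups):
    """Build leaf pages and chain neighbours into a doubly linked list."""
    pages = [LeafPage(group) for group in key_groups]
    for left, right in zip(pages, pages[1:]):
        left.next, right.prev = right, left
    return pages


def range_scan(start_page, low, high):
    """Keys with low < key < high, walking forward through next pointers."""
    out, page = [], start_page
    while page is not None:
        for k in page.keys:
            if k >= high:
                return out          # past the end of the range: stop early
            if k > low:
                out.append(k)
        page = page.next
    return out


def reverse_scan(last_page):
    """All keys in descending order, walking backward through prev pointers."""
    out, page = [], last_page
    while page is not None:
        out.extend(reversed(page.keys))
        page = page.prev
    return out


pages = link_pages([[1, 5, 9], [12, 20, 30], [40, 1000, 1500]])
print(range_scan(pages[0], 10, 1000))   # [12, 20, 30, 40]
print(reverse_scan(pages[-1]))          # [1500, 1000, 40, 30, 20, 12, 9, 5, 1]
```

`range_scan` mirrors `id > 10 and id < 1000`. Neither scan goes back up the tree; both just follow sibling links.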

 

b. Secondary index

The other kind of InnoDB index is the secondary index, also called a non-clustered index. The leaf nodes of a secondary index do not contain the full row. Besides the key values, each leaf entry holds a "bookmark" that tells InnoDB where to find the corresponding row. In InnoDB, that bookmark is the row's clustered index key (its primary key value). So a SQL query that hits a secondary index proceeds as follows:

1. Search the secondary index pages.

2. In the matching secondary index leaf page, read the clustered index key (primary key value) stored with the entry.

3. Use that key to search the clustered index and find the row.

Therefore, a lookup through a secondary index generally costs one more index search, and therefore more I/O, than a lookup through the clustered index.
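The three steps can be sketched with plain Python structures. This is a toy model under stated assumptions: a dict stands in for the clustered index and a sorted list of `(age, primary key)` tuples stands in for the secondary index leaves. The table and column names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical toy table. The clustered index maps primary key id -> full row.
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 30},
}

# Secondary index on age: sorted leaf entries of (age, primary key).
# The primary key is the "bookmark" pointing back into the clustered index.
secondary_age = sorted((row["age"], pk) for pk, row in clustered.items())


def lookup_by_age(age):
    """Emulate WHERE age = ?: return (rows, number of index searches performed)."""
    searches = 1                                   # steps 1-2: one search of the secondary index
    pos = bisect_left(secondary_age, (age,))       # first entry with this age
    rows = []
    while pos < len(secondary_age) and secondary_age[pos][0] == age:
        pk = secondary_age[pos][1]
        rows.append(clustered[pk])                 # step 3: one clustered-index lookup per match
        searches += 1
        pos += 1
    return rows, searches


rows, searches = lookup_by_age(30)
print([r["name"] for r in rows], searches)        # ['alice', 'carol'] 3
```

Each matching entry costs an extra clustered-index lookup. That is the additional I/O described above.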


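A point that DBAs easily overlook: when a SQL statement hits an index, the B+ tree index by itself cannot locate the specific row that matches the query condition. It can only find the page that contains the row. InnoDB then reads that page into memory and searches it there for the row. In addition, each page is 16 KB by default and holds many rows. Pages are linked by a doubly linked list and the rows within a page are chained in key order, so a range query or an ascending or descending sort only needs to walk these lists.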
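Inside a page, InnoDB does not have to compare against every record. It keeps a sparse page directory of slots, binary-searches the slots, and then scans a small group of records. The sketch below models that idea with a hypothetical `Page` class. The fixed group size of 4 is an assumption, since real groups hold roughly 4 to 8 records, and this is not InnoDB's on-disk format.

```python
from bisect import bisect_left

GROUP = 4   # assumption: one directory slot per group of 4 records


class Page:
    def __init__(self, keys):
        self.records = sorted(keys)
        # Sparse directory: the largest key of each group of GROUP records.
        self.slots = [self.records[min(i + GROUP, len(self.records)) - 1]
                      for i in range(0, len(self.records), GROUP)]

    def find(self, key):
        """Binary-search the directory, then scan at most GROUP records linearly."""
        slot = bisect_left(self.slots, key)    # first group whose largest key >= key
        if slot == len(self.slots):
            return None                        # key is larger than every record
        start = slot * GROUP
        for rec in self.records[start:start + GROUP]:
            if rec == key:
                return rec
        return None


page = Page(range(0, 100, 3))   # 34 records: 0, 3, ..., 99
print(page.find(33), page.find(34))   # 33 None
```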

 

03 Index management

For testing, let's create a table t and add some indexes:

create table t(

  a int primary key,

  b varchar(500),

  c int

);

alter table t add key idx_b (b(100));

alter table t add key idx_a_c (a,c);

alter table t add key idx_c (c);

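In table t, column a is the primary key and column b is a string of up to 500 characters. We create an index named idx_b on b that covers only the first 100 characters of b. We also create a composite index idx_a_c and an index idx_c.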
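A prefix index such as idx_b stores only the first 100 characters of b. Two different values can therefore share one index entry, and the full value must be checked against the row. For the same reason, a prefix index cannot serve as a covering index for b. The following minimal sketch uses hypothetical helper names and a dict in place of the index.

```python
PREFIX_LEN = 100   # mirrors idx_b, which indexes b(100)


def build_prefix_index(rows):
    """Map the first PREFIX_LEN characters of b to the primary keys (column a) sharing it."""
    index = {}
    for a, b in rows:
        index.setdefault(b[:PREFIX_LEN], []).append(a)
    return index


def find_by_b(rows_by_pk, index, value):
    """Fetch candidates by prefix, then compare the full b value read from the row."""
    candidates = index.get(value[:PREFIX_LEN], [])
    return [a for a in candidates if rows_by_pk[a] == value]


long1 = "x" * 100 + "A"
long2 = "x" * 100 + "B"
rows = [(1, long1), (2, long2), (3, "short")]
idx_b = build_prefix_index(rows)
print(idx_b["x" * 100])                       # [1, 2]: both rows share one prefix entry
print(find_by_b(dict(rows), idx_b, long2))    # [2]: only the full comparison separates them
```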

The following command shows the indexes that have been created on a table:

show index from t\G


Let's analyze the returned columns:

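  • Table: the name of the table the index belongs to

  • Non_unique: whether the index may contain duplicate values; it is 0 for PRIMARY, which means the index is unique

  • Key_name: the name of the index

  • Seq_in_index: the position of the column within the index, easiest to see for the composite index idx_a_c (a is 1, c is 2)

  • Column_name: the column name

  • Collation: how the column is sorted in the index, usually A (ascending); not important here

  • Cardinality: a very important field, explained in detail below

  • Sub_part: the number of indexed characters if only part of the column is indexed; column b is 500 characters long, but idx_b indexes only its first 100

  • Packed: not important

  • Null: whether the indexed column can contain NULL values

  • Index_type: the index type; BTREE for all of these indexes

  • Comment: a comment

  • Index_comment: not important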

     

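The returned data includes a Cardinality field, which the optimizer uses to decide whether to use an index. However, this value is not updated in real time, because doing so would be expensive. To refresh the Cardinality value, run:

analyze table t\G

What does Cardinality mean? It is an estimate of the number of distinct values in the index. The ratio Cardinality / count(*) should be as close to 1 as possible, meaning there are almost no duplicate values. If the ratio is very small, close to 0, most of the values in the indexed column are duplicates, and you should consider whether the index is worth creating.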
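The ratio can be computed directly on a column's values. This sketch counts distinct values exactly. InnoDB's Cardinality is a sampled estimate, so the real ratio from `show index` will only approximate this. The sample columns are hypothetical.

```python
def selectivity(values):
    """Cardinality / count(*): distinct values divided by total rows (0.0 for no rows)."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


user_ids = range(1, 10_001)                  # every value distinct
statuses = [i % 2 for i in range(10_000)]    # only two distinct values
print(selectivity(user_ids))   # 1.0
print(selectivity(statuses))   # 0.0002
```

A column like `statuses` is the low-selectivity case where an index is rarely worth its cost.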

 

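So when does InnoDB update the Cardinality value?

Recomputing Cardinality on every update would be very expensive, so InnoDB refreshes it only when one of the following conditions holds:

    • 1/16 of the data in the table has changed since the statistics were last updated

    • stat_modified_counter > 2000000000  #2 billion

The second condition covers a single row that is updated very frequently. In that case the amount of data in the table does not change and only that one row does, so the first condition alone would never trigger a refresh.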
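This hypothetical helper models the two triggers exactly as stated above. The parameter names are illustrative, and it is not InnoDB source code. Newer MySQL versions with persistent statistics enabled use a different automatic threshold (10% of rows changed).

```python
CHANGE_FRACTION = 1 / 16           # trigger 1: 1/16 of the table changed
COUNTER_LIMIT = 2_000_000_000      # trigger 2: stat_modified_counter above 2 billion


def should_recalculate(total_rows, rows_changed, stat_modified_counter):
    """Return True if either trigger, as described above, says the statistics are stale."""
    if total_rows > 0 and rows_changed > total_rows * CHANGE_FRACTION:
        return True
    return stat_modified_counter > COUNTER_LIMIT


print(should_recalculate(1600, 101, 0))              # True: more than 1/16 changed
print(should_recalculate(1600, 1, 2_000_000_001))    # True: one hot row, huge counter
print(should_recalculate(1600, 1, 500))              # False
```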

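How does InnoDB compute the Cardinality value?

  • Get the number of leaf nodes in the B+ tree, denoted A

  • Randomly pick 8 leaf nodes and count the number of distinct records in each page, denoted p1, p2, ..., p8

Cardinality = (p1 + p2 + ... + p8) * A / 8. Because the 8 leaf nodes are chosen at random, the computed Cardinality may differ each time it is calculated.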
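The sampling formula can be sketched directly. Leaf pages are modeled as hypothetical lists of keys, and the result uses integer division. Passing different random generators shows that the estimate changes from run to run whenever pages differ.

```python
import random


def estimate_cardinality(leaf_pages, sample_size=8, rng=random):
    """Estimate Cardinality as (p1 + ... + pk) * A / k, where A is the number of
    leaf pages and p_i is the number of distinct keys in the i-th sampled page."""
    a = len(leaf_pages)
    if a == 0:
        return 0
    k = min(sample_size, a)                    # cannot sample more pages than exist
    sample = rng.sample(leaf_pages, k)         # k distinct pages chosen at random
    distinct_per_page = [len(set(page)) for page in sample]
    return sum(distinct_per_page) * a // k     # integer estimate


# 100 hypothetical leaf pages: half hold 1 distinct key, half hold 20 distinct keys.
pages = [[7] * 20 if i % 2 else list(range(i * 20, i * 20 + 20)) for i in range(100)]
for seed in (1, 2, 3):
    print(seed, estimate_cardinality(pages, rng=random.Random(seed)))
```

With pages this uneven, each estimate falls somewhere between 100 and 2000, depending on which pages happen to be sampled.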

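Let's look at the Cardinality values from the database on our company's test server.

A slow query I once investigated at work:

A good friend of mine ran into a very simple single-table query at his company. The SQL was roughly: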

select * from tb where status=1 and shop_id=1;

 

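The table was not large, only 140,000 rows. There was an index on the status column and the SQL was simple, yet the query took nearly 20 seconds. I checked and found that the Cardinality of the status column was only 2, which is very low. The optimizer therefore did not use the status index and scanned the whole table instead.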
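A back-of-the-envelope sketch shows why the optimizer skipped the status index. It assumes rows are spread evenly across the distinct values, which is a simplification; MySQL's real cost model weighs more factors. Using the index would mean fetching roughly half the table through individual clustered-index lookups, so a full table scan looks cheaper.

```python
def rows_per_value(total_rows, cardinality):
    """Average number of rows expected per distinct index value (even spread assumed)."""
    return total_rows / cardinality


total_rows = 140_000       # row count from the case above
status_cardinality = 2     # Cardinality reported for the status index
expected = rows_per_value(total_rows, status_cardinality)
print(expected)                  # 70000.0
print(expected / total_rows)     # 0.5
```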

 

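About covering indexes:

  • The selected columns can be read from the index alone, without reading the table rows. In other words, the index being used covers every column the query needs.

  • If an index contains (covers) all the data needed for a query's columns and conditions, it is called a covering index.

  • When you run a query that is covered by an index (an index-covered query), the Extra column of EXPLAIN shows "Using index".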

 

For example, create table t with primary key a and a composite index b_c on (b, c), then insert some data:

create table t(
     a int primary key auto_increment,
     b int, 
     c int,
     d int,
    key b_c (b,c)
);
insert into t(b,c,d) values(1,1,1);
insert into t(b,c,d) values(2,2,2);
insert into t(b,c,d) values(3,3,3);
insert into t(b,c,d) values(4,4,4);
insert into t(b,c,d) values(5,5,5);


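example1: The query matches the primary key, and the Extra column shows Using index.

example2: The query matches the index (b_c), which covers it: key is b_c and the Extra column shows Using index.

example3: Although the query condition is on b, the selected column is d rather than b or c, so key is NULL and no index is used.

example4: The query returns b, c and d with a condition on b, so the index does not cover all of the returned columns.

example5: The query is not covered by the index.

example6: The index already contains the values of column c, so only the covering index is used, and the Extra column shows Using index.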
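The covering-index distinction can also be seen without a MySQL server, using Python's built-in sqlite3 module. Note that this is SQLite, not InnoDB. Its EXPLAIN QUERY PLAN reports "USING COVERING INDEX" instead of InnoDB's "Using index" in the Extra column, and SQLite's rowid table plays the role of the clustered index. SQLite also has no inline `key` clause in CREATE TABLE, so b_c is created separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t(
        a integer primary key autoincrement,  -- rowid alias, like InnoDB's clustered key
        b int,
        c int,
        d int
    );
    create index b_c on t(b, c);
""")
conn.executemany("insert into t(b, c, d) values (?, ?, ?)",
                 [(i, i, i) for i in range(1, 6)])


def plan(sql):
    """Join the 'detail' column (the last one) of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


print(plan("select b, c from t where b = 1"))   # expected: a COVERING INDEX b_c search
print(plan("select d from t where b = 1"))      # expected: INDEX b_c, then a row fetch for d
```

Because the index b_c also stores the rowid (a), `select a, b, c ... where b = 1` is covered as well, just like example6 above.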


Origin www.cnblogs.com/chengshan/p/10980259.html