InnoDB Indexes and the B+ Tree in Detail

What is an index

An index is a structure that sorts the values of one or more columns of a database table, so that specific rows in the table can be accessed quickly.

For example, when we looked up a character in a dictionary back in elementary school, we went through the table of contents first; that table of contents is effectively an index. Without it, the only way to find a character would be to read through the dictionary from the first page. An index is therefore a tool for speeding up retrieval.

Index structure in InnoDB

Reviewing the B+ tree

Before describing database indexes, let's take a look at the B+ tree. Why review it? Because InnoDB implements its indexes as B+ trees.

Readers can look up the B+ tree concepts in detail on their own. Here we only look at the features that make it suitable as a database index structure:

  1. An internal node with k subtrees contains k elements (k-1 in a B-tree). These elements store no data and are used only for indexing; all data is stored in the leaf nodes. Because internal nodes store no data, the same amount of space holds more index entries, so a single disk read yields more information and the number of disk I/Os drops. And because data is stored only in leaf nodes, every lookup descends from the root to a leaf, so lookup performance is very stable.
  2. The leaf nodes together contain all the elements, along with pointers to the records, and the leaf nodes themselves are linked in ascending key order. MySQL is a relational database in which range access is a common scenario; linking the leaf nodes in key order makes range access much more efficient.

Let's use a figure to get familiar with the search process in a B+ tree.

Suppose we want to find the number 4. The search proceeds as follows.

The first disk access

The second disk access

The third disk access
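The three accesses above can be sketched in code. This is a minimal sketch, not InnoDB's implementation: the `Node` class, the `search` function, and the three-level tree over keys 1 to 8 are all hypothetical (the tree is not the article's figure). It assumes the variant described earlier, where an internal node with k children holds k keys, each key being the largest key in the matching subtree, and it counts one node read per level as a stand-in for one disk access.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None, rows=None):
        self.keys = keys            # sorted keys in this node
        self.children = children    # child nodes (internal nodes only)
        self.rows = rows            # row data (leaf nodes only)

def search(root, target):
    """Return (row, nodes_read) for target, or (None, nodes_read) if absent."""
    node, reads = root, 1
    while node.children is not None:
        # Each key is the largest key in the matching child subtree,
        # so descend into the first child whose key is >= target.
        i = bisect_left(node.keys, target)
        if i == len(node.keys):
            return None, reads      # target is larger than every key
        node = node.children[i]
        reads += 1
    i = bisect_left(node.keys, target)
    if i < len(node.keys) and node.keys[i] == target:
        return node.rows[i], reads
    return None, reads

# Hypothetical three-level tree over keys 1..8.
leaves = [Node([1, 2], rows=["r1", "r2"]), Node([3, 4], rows=["r3", "r4"]),
          Node([5, 6], rows=["r5", "r6"]), Node([7, 8], rows=["r7", "r8"])]
mid = [Node([2, 4], children=leaves[:2]), Node([6, 8], children=leaves[2:])]
root = Node([4, 8], children=mid)

print(search(root, 4))   # ('r4', 3): root, internal node, leaf
```

Every successful lookup in this tree reads exactly three nodes, which is the stable lookup cost mentioned in the feature list.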

The most basic index: the primary key index

Within a data page, the records form a singly linked list ordered by primary key. Each data page has a fixed size; when one page is full, more pages are needed to hold the data, and the pages themselves are strung together in a linked list. Assume for now that one data page can hold at most two records. We can then guess what the stored data might look like.

With only this structure, though, the only way to find the data we need is a full scan that starts at the head and follows the list.

If we add an index built as a B+ tree on top of this data, can query efficiency improve? The answer is definitely yes. The structure then becomes the one in the figure below.

Some readers may ask: wasn't each data page supposed to hold two records? Why do some nodes in the figure hold three? Please ignore that; this is a B+ tree, and an internal node may hold several entries. (The real reason is that I was lazy and reused the earlier figure.)

The data in the leaf nodes above is arranged within each page by a certain value, which is usually the primary key. When a table is created, its clustered index is usually the primary key, and the rows in the data pages are stored in ascending clustered-index order.

Now think about it: if we insert a record with the value 8, how will the structure change?

We first find where 8 should be stored and discover that the page is already full. To keep the leaf nodes ordered, the only option is a page split, which noticeably hurts efficiency. That is why we usually use the database's default auto-increment sequence as the primary key. Note, though, that a clustered index is not necessarily a primary key index. (I leave that for readers to think about.)
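To see why ascending keys avoid page splits, here is a toy simulation. It is only a sketch under simplifying assumptions: pages are plain Python lists, `PAGE_CAPACITY` is the two-record limit assumed above, and `insert` is a hypothetical helper that models the leaf level only. Real InnoDB page management is considerably more involved.

```python
from bisect import bisect_left, insort

PAGE_CAPACITY = 2  # the article's assumed limit of two records per page

def insert(pages, key):
    """Insert key into an ordered list of pages; return True if a page was split."""
    if not pages:
        pages.append([key])
        return False
    # Locate the first page whose largest key is >= key (else the last page).
    maxima = [page[-1] for page in pages]
    i = min(bisect_left(maxima, key), len(pages) - 1)
    page = pages[i]
    if len(page) < PAGE_CAPACITY:
        insort(page, key)
        return False
    if i == len(pages) - 1 and key > page[-1]:
        pages.append([key])          # key goes at the tail: start a new page
        return False
    # The target page is full and the key belongs inside it: split the page.
    insort(page, key)
    half = len(page) // 2
    pages[i:i + 1] = [page[:half], page[half:]]
    return True

pages = []
print([insert(pages, k) for k in [2, 4, 6, 10, 12]])  # ascending keys: no splits
print(insert(pages, 8))   # True: 8 lands in the full page [6, 10]
print(pages)              # [[2, 4], [6], [8, 10], [12]]
```

With an auto-increment key every insert takes the tail path, so existing pages are never rearranged.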

Ordinary indexes

As the name suggests, a single-field index is an index on just one field, and a joint index is an index built on more than one field. Let's look at the following table structure.

CREATE TABLE `tb_predict_user_info` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'auto-increment primary key id, season id',
  `user_id` bigint(20) NOT NULL DEFAULT '0' COMMENT 'user ID',
  `app_id` int(11) NOT NULL DEFAULT '0' COMMENT 'app ID of the user',
  `bonus_amount` bigint(20) NOT NULL DEFAULT '0' COMMENT 'total red packet amount of the user, in fen (cents)',
  `is_del` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'soft-delete flag, 0: not deleted, 1: deleted',
  `create_time` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) COMMENT 'record creation time',
  `update_time` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3) COMMENT 'record last update time',
  PRIMARY KEY (`id`),
  KEY `index_user` (`user_id`),
  KEY `index_user_app` (`app_id`,`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COMMENT='user info table';
  • index_user is a single-field index
  • index_user_app is a joint index

How is the table above stored in the database?

For each index, InnoDB builds a B+ tree. To save space, however, only the leaf nodes of the clustered index hold the complete row data; the leaf nodes of the other B+ trees hold only the values of the indexed fields.

The format of a row in the database can be simplified as shown below.

record_type: the type of the record

  • 0: normal user record
  • 1: directory entry record
  • 2: minimum record
  • 3: maximum record

An index page is shown below.

A data page is shown below.

How InnoDB uses indexes in queries

We said above that data is stored in the database as B+ trees. So how is a row located when we query for it?

A lookup through the primary key index can be understood simply as the following process:

  1. Suppose we are looking for the row whose primary key is 10000.
  2. The entries within a page are already sorted, so a position within a page can be found by binary search.
  3. Starting at the root page of the index, binary search finds the entry that points to the page on the next level.
  4. Repeat this level by level until a leaf node is reached.
  5. Find the target row in the leaf node.

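We said above that InnoDB has as many B+ trees as there are indexes. Doesn't that duplicate the data? How is this problem solved?

The answer is that the data is not duplicated. Only the leaf nodes of the primary key index (the clustered index) store the full rows. The other indexes do not store all the data; they store only the indexed values and the primary key value of the record they belong to.

So what does a lookup through an ordinary index look like?

It is very similar to a lookup through the primary key index. But because the leaf nodes do not hold all the data of a record, a second lookup by primary key is needed. This is what we usually call going back to the table, that is, returning to the table for one more lookup.

Since going back to the table costs an extra lookup, how can we avoid it?

The leaf nodes do not store all the fields of a record, but they do store the values of the indexed fields. So if the query selects only indexed fields, can we skip going back to the table?

Exactly. If every field the query selects is covered by the fields of the index being used, there is no need to go back to the table. This is what we usually call a covering index.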
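The difference between going back to the table and using a covering index can be illustrated with a small model. This is a sketch only: the dictionary `clustered`, the list `index_user`, the sample rows, and the `select` helper are hypothetical stand-ins for the clustered index and the secondary index on user_id, not InnoDB structures.

```python
from bisect import bisect_left

# Clustered index: primary key -> full row (hypothetical sample rows).
clustered = {
    1: {"id": 1, "user_id": 30, "app_id": 7, "bonus_amount": 500},
    2: {"id": 2, "user_id": 10, "app_id": 7, "bonus_amount": 0},
    3: {"id": 3, "user_id": 20, "app_id": 9, "bonus_amount": 120},
}
# Secondary index on user_id: sorted (user_id, primary key) pairs only.
index_user = sorted((row["user_id"], pk) for pk, row in clustered.items())

def select(columns, user_id):
    """Return (rows, clustered_lookups) for: SELECT columns ... WHERE user_id = ?"""
    rows, lookups = [], 0
    i = bisect_left(index_user, (user_id,))
    while i < len(index_user) and index_user[i][0] == user_id:
        pk = index_user[i][1]
        if set(columns) <= {"user_id", "id"}:
            entry = {"user_id": user_id, "id": pk}   # covered by the index
        else:
            entry = clustered[pk]                    # back to the table
            lookups += 1
        rows.append({c: entry[c] for c in columns})
        i += 1
    return rows, lookups

print(select(["id", "user_id"], 20))    # ([{'id': 3, 'user_id': 20}], 0)
print(select(["bonus_amount"], 20))     # ([{'bonus_amount': 120}], 1)
```

The second element of each result counts the extra lookups in the clustered index; it stays at zero whenever the selected fields are all present in the secondary index entry.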

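Do range queries use an index?

So far we have only queried with an exact index condition, but in everyday development range queries (>, <, != and so on) are unavoidable. If you search online, some people say they do not use an index and some say they do (the internet is open, so you have to judge for yourself). So do range queries use an index or not? And if they do, what role does the index play in the query?

Let's go back to the B+ tree search process. Think about it first: how would you find entries with index value > 5000 in a B+ tree?

  1. In the database's B+ tree, the data is arranged in ascending order.
  2. We first find the leaf node that holds id = 5000, then follow the next pointers and traverse in order to collect the values that satisfy the condition.
  3. Other fields are fetched by going back to the table.

That is how an index is used for > and <. Note: because of the query optimizer's automatic optimization, it is best to add a limit when testing this scenario; if too many rows match, the optimizer may choose a full table scan instead.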
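The seek-then-walk pattern for a range condition can be sketched as follows. The sorted list `leaf_keys` is a hypothetical stand-in for the linked leaf level, and `scan_greater_than` is an illustrative helper; slicing the list plays the role of following the next pointers.

```python
from bisect import bisect_right

# Hypothetical leaf level of an index: keys in ascending order.
leaf_keys = list(range(4990, 5011))

def scan_greater_than(keys, bound, limit):
    """Return up to `limit` keys with key > bound by seeking once, then walking forward."""
    start = bisect_right(keys, bound)   # first position whose key is > bound
    return keys[start:start + limit]

print(scan_greater_than(leaf_keys, 5000, 3))   # [5001, 5002, 5003]
```

Only one binary search is needed to find the starting point; everything after it is a sequential read, which is why the ordered leaf list suits range conditions.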

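What about an operation like !=? Does that use an index? Let's think about how to find the data that is != some value through a B+ tree: first find the entries equal to that value, then return everything else?

Doesn't that feel about the same as a full table scan (and a full table scan doesn't even need to go back to the table)? We can consider this less efficient than a full table scan, so there is no point in using the index, and this kind of query generally does not use one.

Besides the where condition, is an index also used for the fields in order by?

Same approach as before; let's think it through. Suppose the query is select * from tb_table order by id asc limit 100. How is it executed?

  1. The data in the table is sorted by id in ascending order.
  2. So we only need to find the smallest id and then traverse in order until we have the required number of rows.
  3. If going back to the table is needed, do it then.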

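Does order by always use an index?

So order by can use an index too! Now suppose our index is userid_tradeDay and the query is select * from table order by userid desc, tradeDay asc, with the two sort directions differing. Is the index still used? As usual, let's see how we would carry out this query in the B+ tree.

The index fields in the B+ tree are all arranged in ascending order, but the query asks for the two fields in different directions, so neither a forward nor a backward walk of the index yields the requested order, and the index cannot be used for the sort.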
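A quick check of this reasoning, as a sketch with hypothetical (userid, tradeDay) entries: reading an ascending index forward or backward produces only two orders, and the mixed order matches neither.

```python
# Hypothetical (userid, tradeDay) index entries, stored in ascending order.
index = sorted([(1, 20200102), (1, 20200101), (2, 20200101), (2, 20200103)])

forward = index          # ORDER BY userid ASC,  tradeDay ASC
backward = index[::-1]   # ORDER BY userid DESC, tradeDay DESC

# ORDER BY userid DESC, tradeDay ASC cannot be read off the index directly;
# it needs a real sort of the entries.
mixed = sorted(index, key=lambda e: (-e[0], e[1]))

print(mixed == forward, mixed == backward)   # False False
```

Both directions being the same (all ascending or all descending) is the case where the existing index order can simply be walked.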

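Let's look at another case.

Our index is again userid_tradeDay, a joint index on userid and tradeDay. The first query is select * from table where user_id = ?, whose where condition contains only user_id. The second query is select * from table where trade_day = ?, whose condition contains only trade_day and no user_id.

Do these two queries use the index? Again, start from the B+ tree.

  1. The index is sorted by user_id in ascending order first, and then by trade_day in ascending order.
  2. The first query filters on user_id, so we can simply act as if trade_day were not there and look up through the index.
  3. The second query filters on trade_day. Sort by... hmm, entries with the same trade_day are scattered all over the tree, and I cannot think of any way to use the index.

This is the leftmost prefix matching principle. What if the where condition uses both user_id and trade_day, but in the reverse order? Is the index used? Yes, the query optimizer takes care of that for us.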
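The leftmost prefix principle follows directly from how tuples sort. In the sketch below, the entries and the helpers `by_user_id` and `by_trade_day` are hypothetical; the sorted list stands in for the leaf level of a joint index on (user_id, trade_day).

```python
from bisect import bisect_left

# Hypothetical (user_id, trade_day) entries in index order.
index = sorted([(1, 3), (1, 7), (2, 1), (2, 9), (3, 3), (3, 5)])

def by_user_id(user_id):
    """user_id is the leading field, so matching entries are contiguous."""
    lo = bisect_left(index, (user_id,))
    hi = bisect_left(index, (user_id + 1,))
    return index[lo:hi]

def by_trade_day(trade_day):
    """trade_day alone is not sorted across the index, so every entry is checked."""
    return [e for e in index if e[1] == trade_day]

trade_days = [e[1] for e in index]
print(by_user_id(2))                      # [(2, 1), (2, 9)]
print(trade_days == sorted(trade_days))   # False
print(by_trade_day(3))                    # [(1, 3), (3, 3)]
```

A condition on user_id can use binary search, while a condition on trade_day alone degenerates into checking every entry.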

Let's look at the following table.

CREATE TABLE `tb_user_info` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'auto-increment primary key id, season id',
  `user_id` bigint(20) NOT NULL DEFAULT '0' COMMENT 'user ID',
  `user_id_c` varchar(200) NOT NULL DEFAULT '' COMMENT 'user ID',
  PRIMARY KEY (`id`),
  KEY `index_user` (`user_id`),
  KEY `index_userc` (`user_id_c`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COMMENT='user info table';

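The table above has two main columns, user_id of an integer type and user_id_c of type varchar, each with its own index. Suppose the table contains the following data.

Quotes are used to indicate that a value is a string. Consider the following queries.

A: select * from tb_user_info where user_id = 0;
B: select * from tb_user_info where user_id = '0';
C: select * from tb_user_info where user_id = 'abc';
D: select * from tb_user_info where user_id_c = 0;
E: select * from tb_user_info where user_id_c = '0';
F: select * from tb_user_info where user_id_c = 'abc';

What do these six statements return, and do they use an index?

A: returns the row with id = 1; uses the index_user index.

B: returns the row with id = 1; uses the index_user index.

C: returns the row with id = 1; uses the index_user index.

D: returns the row with id = 1; does not use an index, because the varchar column has to be converted for the comparison with a number.

E: returns the row with id = 1; uses the index_userc index.

F: returns no rows; uses the index_userc index.

At this point you may have some questions:

  1. I created the indexes, so why are they sometimes used and sometimes not?
  2. I searched for = 'abc', so why did I get the row with value 0?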

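Here we introduce a concept: implicit conversion.

When the type in the query condition does not match the type of the column, MySQL applies the following rules:

  1. If at least one of the two arguments is NULL, the result of the comparison is NULL. The exception is the <=> operator, which returns 1 when comparing two NULLs. Neither case requires type conversion.
  2. If both arguments are strings, they are compared as strings, with no type conversion.
  3. If both arguments are integers, they are compared as integers, with no type conversion.
  4. A hexadecimal value compared with a non-number is treated as a binary string.
  5. If one argument is a TIMESTAMP or DATETIME and the other is a constant, the constant is converted to a timestamp.
  6. If one argument is a decimal: when the other is a decimal or an integer, the integer is converted to a decimal for the comparison; when the other is a floating-point number, the decimal is converted to floating point for the comparison.
  7. In all other cases, both arguments are converted to floating-point numbers before being compared.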

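With these rules in mind, let's return to our questions.

1. I created the indexes, so why are they sometimes used and sometimes not? Did you notice that every query that used an index either had matching types or targeted the int column? Using the index when the types match is no surprise, but why might the index go unused after a type conversion? Consider how numbers sort compared with how strings sort: as numbers the order is 3, 21, but as strings it is '21', '3'. The sort order differs once the types differ! So, as I understand it, you can simply assume that when an implicit conversion happens and the side being converted is the column, the index no longer works. (I'm not sure this is entirely right; corrections are welcome.)

2. I searched for = 'abc', so why did I get the row with value 0? When the types do not match, a conversion takes place. What number does 'abc' convert to? It cannot be converted, so it becomes the default 0. You can guess the rest.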
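Both answers can be reproduced with a rough model of the conversion. The `to_number` function below is a hypothetical approximation of how MySQL turns a string into a number for comparison (take the leading numeric prefix, or 0 if there is none); it is a sketch for illustration, not MySQL's actual routine, and edge cases may differ.

```python
import re

def to_number(value):
    """Approximate string-to-number conversion: leading numeric prefix, else 0."""
    if isinstance(value, (int, float)):
        return float(value)
    m = re.match(r"\s*[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?", value)
    return float(m.group(0)) if m else 0.0

# Question 2: 'abc' has no numeric prefix, so it compares equal to 0.
print(to_number("abc") == to_number(0))     # True

# Question 1: an index on a string column is kept in string order,
# which differs from the numeric order needed after conversion.
print(sorted(["3", "21"]))                  # ['21', '3']
print(sorted(["3", "21"], key=to_number))   # ['3', '21']
```

Because the stored string order and the converted numeric order disagree, a binary search over the string index cannot answer a numeric comparison, which matches the intuition that converting the column side defeats the index.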

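Summary:

When you are not sure whether a query will use an index, think about the B+ tree: if the query condition is xxx and the index is xxx, how would I carry out the lookup most efficiently?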


Origin juejin.im/post/5df64970f265da339d106059