Why are MySQL indexes implemented with B+ trees?

Foreword

When we need to find a specific item in a collection of data, the data structures we usually reach for are the hash table and the binary search tree. A table is essentially a collection of data, so MySQL implements its indexes with B+ trees and hash tables.

The B+ tree evolved from the binary search tree, by way of the balanced binary tree and the B tree (also written B-tree). The B in B+ tree does not stand for binary but for balance: although the B+ tree traces back to the balanced binary tree, it is not itself a binary tree.

Binary search trees and balanced binary trees

Binary search trees and balanced binary trees are already efficient to search, so why not use these two data structures to implement the index? Let's analyze.

A binary search tree is a binary tree with the following properties:

A non-leaf node has at most two child nodes

The value of a non-leaf node is greater than that of its left child and less than that of its right child

No two nodes have equal values

 

Take a lookup in the binary search tree in the figure, for example the record with key 5. Start at the root: its key is 6, which is greater than 5, so search the left subtree of 6 and reach 3; 5 is greater than 3, so search the right subtree of 3. The record is found in 3 steps. Likewise, finding the record with key 8 also takes 3 steps. The average number of steps over all keys is (1 + 2 + 3 + 3 + 2 + 3) / 6 ≈ 2.3. If the same keys are searched sequentially, the average is (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5 (in a sequential scan the first key takes 1 step, the second takes 2, and so on). Clearly the binary search tree is faster on average than a sequential search.

A binary search tree can be built in arbitrary shapes. Suppose it is built as follows:

 

The average number of steps is then (1 + 2 + 3 + 4 + 5 + 5) / 6 ≈ 3.33, about the same as a sequential search. For queries to be efficient, the binary search tree needs to be balanced, which leads to the balanced binary tree.
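The two averages can be recomputed with a small sketch. The figures are not reproduced here, so the exact tree shapes below are assumptions chosen to match the depths quoted in the text (keys 2, 3, 5, 6, 7, 8); the names `Node`, `search_steps` and `average_steps` are illustrative only.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def search_steps(root, key):
    """Return how many nodes are visited until key is found (None if absent)."""
    steps = 0
    node = root
    while node is not None:
        steps += 1
        if key == node.key:
            return steps
        node = node.left if key < node.key else node.right
    return None


def average_steps(root, keys):
    """Average number of visited nodes over all the given keys."""
    return sum(search_steps(root, k) for k in keys) / len(keys)


keys = [2, 3, 5, 6, 7, 8]

# Assumed shape of the first tree: 6 at the root, 3 and 7 below it, then 2, 5, 8.
balanced = Node(6, Node(3, Node(2), Node(5)), Node(7, None, Node(8)))

# Assumed shape of the second tree: a chain 2 -> 3 -> 5 -> 7, with 6 and 8 under 7.
skewed = Node(2, None, Node(3, None, Node(5, None, Node(7, Node(6), Node(8)))))

print(round(average_steps(balanced, keys), 2))
print(round(average_steps(skewed, keys), 2))
```

The same keys cost noticeably more to find in the second tree, only because of the order in which the tree happened to be built.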

Besides the three properties above, a balanced binary tree must satisfy one more:

For every node, the heights of the left and right subtrees differ by at most 1

A balanced binary tree is indeed fast to search, but keeping it balanced is expensive: an insert or update may require one or more left and right rotations to restore the balance. Here is a simple example.

The initial balanced binary tree

 

Insert 3

 

Rotate right once

 

Then rotate left once

 

Since this is an introductory article, the rotations are not analyzed in detail here; a few pictures of left and right rotations should be enough to understand them.

 

A left rotation at x means that x becomes a left child node

 

A right rotation at y means that y becomes a right child node

 

Now look back at the left and right rotations in the example above. Much clearer, isn't it?
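For readers who prefer code to pictures, the two rotations can be sketched as follows. This is a minimal illustration, not a full balanced-tree implementation: it does no height bookkeeping, and the starting tree (2 with right child 4, then 3 inserted under 4) is a hypothetical one chosen so that one right rotation followed by one left rotation restores the balance.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_left(x):
    """Left rotation at x: x's right child y moves up and x becomes y's left child."""
    y = x.right
    x.right = y.left
    y.left = x
    return y  # new root of this subtree


def rotate_right(y):
    """Right rotation at y: y's left child x moves up and y becomes x's right child."""
    x = y.left
    y.left = x.right
    x.right = y
    return x  # new root of this subtree


def inorder(node):
    """Keys in sorted order; rotations must never change this sequence."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# Hypothetical unbalanced tree after inserting 3: 2 -> right 4 -> left 3.
root = Node(2, None, Node(4, Node(3)))

root.right = rotate_right(root.right)  # 3 moves above 4
root = rotate_left(root)               # 3 moves above 2

print(root.key, root.left.key, root.right.key)
print(inorder(root))
```

Each rotation only rewires a few pointers, but on disk every rewired node is a page that has to be written back, which is where the maintenance cost comes from.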

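B trees and B+ trees

"B tree" and "B-tree" are the same tree. If the index were implemented with a balanced binary tree, lookups would already be efficient, and the number of I/O operations needed to find a node would be the depth at which that node sits in the tree. This is because the whole index cannot be loaded into memory, and the nodes are not laid out sequentially on disk. In the worst case, therefore, the number of disk I/O operations equals the height of the tree.

So although a balanced binary tree is efficient to search, frequent I/O is the real performance bottleneck. How can the number of I/O operations be reduced? Our predecessors cleverly proposed the principle of locality, which comes in two forms. Temporal locality: if you query the data of the user with id 1, you will probably query it again a little later, so that data is cached. Spatial locality: when you query the user with id 1, there is a good chance you will also query the users with ids 2, 3 and 4, so the data for ids 1, 2, 3 and 4 is read into memory in one go. The smallest unit of such a read is the page.

So the B tree you see looks like this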

 

And the B+ tree looks like this

 

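So what are the differences between a B tree and a B+ tree?

Unlike a B tree, the non-leaf nodes of a B+ tree do not store the data associated with the keys, which greatly increases the number of keys each node can hold;

The leaf nodes of a B+ tree hold all the keys of their parent nodes together with the data associated with those keys, and the leaf nodes are linked together in ascending key order;

In the B+ tree described here, the number of keys in the root node equals the number of its child nodes;

The non-leaf nodes of a B+ tree serve only as an index and never store the actual data for a key. All data has to be fetched from a leaf node, so every lookup takes the same number of steps;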
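To get a feel for why holding more keys per node matters, here is a rough back-of-the-envelope sketch. It assumes, for illustration only, a table of 10 million keys and a node that can hold about 1000 keys; both numbers are made up, and the model simply asks how many levels are needed before `fanout ** levels` covers all the keys.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels such that fanout ** levels >= num_keys."""
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


NUM_KEYS = 10_000_000  # assumed table size

print(levels_needed(NUM_KEYS, 2))     # two children per node, as in a binary tree
print(levels_needed(NUM_KEYS, 1000))  # assumed fan-out of a wide B+ tree node
```

Since each level costs roughly one disk read in the worst case, the wider node turns a couple of dozen reads into a handful.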

A couple of figures make this clearer. B tree

 

B+ tree

 

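In a B+ tree each node stores more keys and the tree has fewer levels, so queries are faster. All key pointers live in the leaf nodes, so every lookup takes the same number of steps, which makes query time more stable than with a B tree. On top of that, each B+ tree leaf node is linked to the leaf that follows it, which is very useful for range queries.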
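The benefit of linked leaves for range queries can be sketched like this. It is a simplified model, not InnoDB's page format: only the leaf level is built, a binary search over the leaves stands in for the root-to-leaf descent, and `Leaf`, `build_leaves` and `range_scan` are names invented for the example.

```python
import bisect


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys
        self.values = values
        self.next = None  # link to the next leaf in key order


def build_leaves(items, per_leaf):
    """Split sorted (key, value) pairs into leaves and chain them together."""
    leaves = []
    for i in range(0, len(items), per_leaf):
        chunk = items[i:i + per_leaf]
        leaves.append(Leaf([k for k, _ in chunk], [v for _, v in chunk]))
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return leaves


def range_scan(leaves, low, high):
    """Return the values with low <= key <= high by walking the leaf chain."""
    # Stand-in for the tree descent: first leaf whose largest key is >= low.
    last_keys = [leaf.keys[-1] for leaf in leaves]
    idx = bisect.bisect_left(last_keys, low)
    if idx == len(leaves):
        return []
    leaf = leaves[idx]
    result = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return result  # past the end of the range, stop early
            if k >= low:
                result.append(v)
        leaf = leaf.next  # follow the link instead of going back up the tree
    return result


items = [(i, f"row{i}") for i in range(1, 11)]
leaves = build_leaves(items, per_leaf=3)
print(range_scan(leaves, 3, 8))
```

After one descent to the starting leaf, the scan never revisits an inner node; it just follows the `next` links until it passes the upper bound.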

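Clustered indexes and composite indexes

The InnoDB storage engine organizes table data by the primary key index. Every InnoDB table has a primary key; if none is defined explicitly when the table is created, InnoDB selects or creates one as follows.

First it checks whether the table has a non-null unique index; if so, that column becomes the primary key

If no such index exists, InnoDB automatically creates a hidden 6-byte pointer (a row ID) to serve as the index

If there are several non-null unique indexes, InnoDB picks the first one defined when the table was created as the primary key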

Suppose we have the following data, with the user id as the primary key: (1, tom), (2, mike), (3, sam), (4, lisa), (5, li). The data is then stored like this (Figure 1)

 

Now suppose we build an index on the user name. How is the user name index stored? (Figure 2)

 

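The leaf nodes of the user name index store the primary key. So when a SQL statement looks up a record by user name, the process is as follows: first find the matching primary key in the name index, then use that primary key to find the record in the B+ tree that was built when the table was created. In other words, search Figure 2 first and then Figure 1.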
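The two-step lookup can be mimicked in a few lines. This is only a sketch: sorted Python lists searched with `bisect` stand in for the two B+ trees, and `find_by_name` is a hypothetical helper, not anything MySQL exposes. The rows are the ones from the example above.

```python
import bisect

# Clustered index (Figure 1): primary key -> full row, kept in primary-key order.
clustered = [(1, "tom"), (2, "mike"), (3, "sam"), (4, "lisa"), (5, "li")]

# Secondary index on name (Figure 2): name -> primary key, kept in name order.
name_index = sorted((name, user_id) for user_id, name in clustered)


def find_by_name(name):
    """Look up a row by name: name index first, clustered index second."""
    # Step 1: search the name index to get the primary key.
    i = bisect.bisect_left(name_index, (name,))
    if i == len(name_index) or name_index[i][0] != name:
        return None
    user_id = name_index[i][1]
    # Step 2: search the clustered index with that primary key to get the row.
    j = bisect.bisect_left(clustered, (user_id,))
    return clustered[j]


print(name_index)
print(find_by_name("lisa"))
```

The second search is the price of going through a secondary index: the name index alone only knows the primary key, not the rest of the row.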

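Clustered index: the physical order of the data rows is the same as the logical order of the column values (usually the primary key column). A table can have only one clustered index. Figure 1 uses a clustered index

Non-clustered index: the logical order of the index differs from the physical storage order of the rows on disk. A table can have multiple non-clustered indexes. Figure 2 uses a non-clustered index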

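Finally, a word on the composite index, which is an index over multiple columns of a table.

A composite index is also a B+ tree. The difference is that the key consists of not one column but two or more. A B+ tree with multi-column keys is stored as follows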

 

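As you can see, the keys are sorted. In the example above, (1,1) (1,2) (2,1) (2,4) (3,1) (3,2), the data is stored in (a, b) order.

So the query select * from table where a = xxx and b = xxx can obviously use the composite index (a, b). A query on column a alone, select * from table where a = xxx, can also use the (a, b) index. But a query on column b, select * from table where b = xxx, cannot use this B+ tree index. The b values in the leaf nodes are 1, 2, 1, 4, 1, 2, which are clearly not sorted, so a query on column b alone gets no help from the (a, b) index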
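The same reasoning can be checked on the six entries from the example. This is a sketch in which a sorted list of tuples stands in for the leaf level of the (a, b) index; the helper names are invented, and `lookup_a` assumes a is an integer so that `(a + 1,)` can serve as the upper bound.

```python
import bisect

# Leaf entries of the (a, b) index, in the order given in the text.
entries = [(1, 1), (1, 2), (2, 1), (2, 4), (3, 1), (3, 2)]


def lookup_a(a):
    """Entries with the given a, located by binary search on the sorted order."""
    lo = bisect.bisect_left(entries, (a,))
    hi = bisect.bisect_left(entries, (a + 1,))
    return entries[lo:hi]


def lookup_a_b(a, b):
    """True if (a, b) is present, again found by binary search."""
    i = bisect.bisect_left(entries, (a, b))
    return i < len(entries) and entries[i] == (a, b)


def lookup_b(b):
    """b alone has no useful ordering, so every entry has to be examined."""
    return [e for e in entries if e[1] == b]


b_values = [b for _, b in entries]

print(entries == sorted(entries))            # sorted by (a, b)
print(b_values, b_values == sorted(b_values))  # b on its own is not sorted
print(lookup_a(2), lookup_a_b(2, 4), lookup_b(1))
```

Binary search works for a and for (a, b) because the list is ordered that way; for b alone the only option left is to walk the whole list, which is what a full scan amounts to.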

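Adaptive hash index

The InnoDB storage engine monitors lookups against the index pages of each table. If it observes that building a hash index would speed things up, it builds one; this is called the adaptive hash index. The DBA cannot intervene in how the hash index is built and can only enable or disable the adaptive hash index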

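Databases generally use the division method of hashing: take the remainder of k divided by m to map the key k to one of m slots, so the hash function is h(k) = k mod m. When a collision occurs, that is, when two keys map to the same slot, chaining is used: the colliding keys are kept in a linked list, much like HashMap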
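The division method with chaining fits in a short sketch. This illustrates the general technique only, not InnoDB's internal code: the slot count m = 7 is arbitrary, keys are assumed to be integers, and each chain is a Python list rather than a true linked list.

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one chain per slot

    def _h(self, k):
        return k % self.m  # division method: h(k) = k mod m

    def put(self, k, value):
        chain = self.slots[self._h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                chain[i] = (k, value)  # key already present, overwrite
                return
        chain.append((k, value))  # new key, possibly after a collision

    def get(self, k):
        for key, value in self.slots[self._h(k)]:
            if key == k:
                return value
        return None


table = ChainedHashTable(m=7)
table.put(3, "row3")
table.put(10, "row10")  # 10 mod 7 == 3, so this collides with key 3

print(table.get(3), table.get(10), table.get(17))
print(len(table.slots[3]))
```

A lookup touches one slot and walks one short chain, which is why equality lookups are fast. Neighbouring keys land in unrelated slots, so nothing here helps with a range.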

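Once a hash index has been built on hot data, lookups no longer have to search the B+ tree, which can greatly improve performance. The adaptive hash index is very fast for dictionary-style (equality) lookups such as select * from table where id = xxx, but it is of no help for range lookups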

 


Origin www.cnblogs.com/longqin/p/11671960.html