Day 36: Indexes (binary tree, balanced binary tree, B-tree, B+ tree), InnoDB storage engine indexes, simple index usage, composite indexes

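What is an index?
    An index is a data structure in the storage engine, that is, a way of organizing data. It is also called a key.
    Building an index on data is like building a table of contents for a book.
    
Why use an index?
    To speed up queries.
    PS: once an index exists, inserts, deletes, and updates become slower, because the index has to be maintained too.
    The trade-off pays off because reads typically outnumber writes by about 10:1.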

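How should we think about indexes?
    Developers know the business best. Every piece of software has features that attract users,
    and behind those features is hot data, which developers understand better than anyone.
    Developers know which table columns hold that hot data, so they
    should add indexes on those columns while the software is being built. They should not
    wait until after launch for a DBA to spot slow SQL queries and deal with them, because:
    1. Slow software hurts the user experience, but there are many possible causes, so you cannot immediately tell
    that SQL is the problem. By the time the problem is traced to the SQL, it may
    already have dragged on for a long time.
    2. Most DBAs are operations DBAs rather than development DBAs. Even if a DBA sees the slow query in the logs,
    they do not know the business and will struggle to work out why it is slow. In the end the blame still lands on you, the developer:
    you can dodge it for a while, but not forever.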
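What data structure is an index? A B+ tree
    Binary tree -> balanced binary tree -> B-tree -> B+ tree
    Binary search tree rule: keys smaller than a node go to its left, keys larger go to its right
    Balanced binary tree: for every node, the heights of the left and right subtrees may differ by at most 1
    
    Advantage of a B-tree over a balanced binary tree: each node stores more data, which lowers the height of the tree
    
    Difference between a B+ tree and a B-tree: only leaf nodes hold the full data. The root and internal (branch) nodes hold only index values.
    This reduces the disk space each internal node uses, so each node can hold more index entries,
    which in turn lowers the height of the tree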
    
    Table -> book
    Record -> one page of content

    Index -> the book's table of contents

An index is organized as a B+ tree.

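Advantages of the B+ tree:
	For the same amount of data, a B+ tree is the shortest of the trees above. A lookup therefore reads the fewest nodes (disk pages) and is the fastest.
	
	Difference between a B-tree and a B+ tree for a range query:
	select * from user where id > 12 and id < 15;
	The leaf nodes of a B+ tree are sorted and linked together, so a B+ tree handles range queries faster than a B-tree.
	Once the first matching leaf node is found, the remaining matches are read by following the leaves in order, with no need to search again from the root.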
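Here is a minimal Python sketch of the range query above. The leaf pages are made up: the keys 10 to 17 and the row strings are hypothetical. The sketch finds the leaf holding the lower bound once, then follows the `next` pointers between leaves instead of restarting from the root. `find_leaf` is a simplified stand-in for the descent from the root.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]                # sorted keys stored in this leaf page
    rows: List[str]                # row data, parallel to keys
    next: Optional["Leaf"] = None  # pointer to the next leaf in key order


def build_leaves(pages):
    """Turn lists of (key, row) pairs into leaves linked left to right."""
    leaves = [Leaf([k for k, _ in p], [r for _, r in p]) for p in pages]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def find_leaf(leaves, key):
    """Stand-in for descending from the root: last leaf whose first key <= key."""
    firsts = [leaf.keys[0] for leaf in leaves]
    return leaves[max(bisect_right(firsts, key) - 1, 0)]


def range_scan(leaves, low, high):
    """Rows with low < key < high: one descent, then follow the leaf chain."""
    result = []
    leaf = find_leaf(leaves, low)
    while leaf is not None:
        start = bisect_right(leaf.keys, low)  # first key > low in this leaf
        for key, row in zip(leaf.keys[start:], leaf.rows[start:]):
            if key >= high:
                return result  # keys are sorted, so nothing further can match
            result.append(row)
        leaf = leaf.next
    return result


leaves = build_leaves([
    [(10, "row10"), (11, "row11"), (12, "row12")],
    [(13, "row13"), (14, "row14"), (15, "row15")],
    [(16, "row16"), (17, "row17")],
])
print(range_scan(leaves, 12, 15))  # ['row13', 'row14']
```

A B-tree has no such leaf chain. After the first match it would have to move back up through parent nodes to find the next key.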

Binary search tree
[Figure: binary search tree index on the user table]
As the figure shows, we have built a binary search tree index on the user table (user information table). Each circle in the figure is a node of the binary search tree and stores a key and data. The key is the id in the user table, and the data is the corresponding row of the user table. The defining property of a binary search tree is that every key in a node's left subtree is smaller than the node's key, and every key in its right subtree is larger. The top node is called the root node, and a node without children is called a leaf node.
To find the user with id=12 using this binary search tree index, the search proceeds as follows:

1. Take the root node as the current node and compare 12 with its key, 10. Since 12 is greater than 10, the right child of the current node becomes the current node.

2. Compare 12 with the current node's key, 13. Since 12 is less than 13, the left child of the current node becomes the current node.

3. Compare 12 with the current node's key, 12. They are equal, so the condition is met and we take the data from the current node: id=12, name=xm.

With the binary search tree, the matching row is found after only 3 comparisons. Scanning the table row by row would take 6 comparisons.
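Here is a minimal sketch of the lookup just described. Only the keys 10, 13, 12 and the name xm come from the walkthrough. The other keys and names are hypothetical fillers, because the figure is not reproduced here. `search` counts one comparison per node visited.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    data: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, data):
    """Insert following the BST rule: smaller keys go left, larger keys go right."""
    if root is None:
        return Node(key, data)
    if key < root.key:
        root.left = insert(root.left, key, data)
    elif key > root.key:
        root.right = insert(root.right, key, data)
    else:
        root.data = data  # same key: overwrite the data
    return root


def search(root, key):
    """Return (data, comparisons), or (None, comparisons) if the key is absent."""
    comparisons = 0
    node = root
    while node is not None:
        comparisons += 1
        if key == node.key:
            return node.data, comparisons
        node = node.left if key < node.key else node.right
    return None, comparisons


root = None
# hypothetical rows; only (10, ...), (13, ...), (12, "xm") mirror the text
for key, name in [(10, "a"), (13, "b"), (5, "c"), (12, "xm"), (17, "d"), (3, "e")]:
    root = insert(root, key, name)

print(search(root, 12))  # ('xm', 3)
```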

    select * from user where id = 12;   # uses the index on id
    select * from user where name = "xxx";  # does not use an index (name is not indexed)

Balanced Binary Tree
[Figure: the binary search tree degenerated into a linked list]
In this figure, the binary search tree has degenerated into a linked list. To find the user with id=17 we now need 7 comparisons, which is equivalent to a full table scan. The cause is that the binary search tree has become unbalanced. Its height is too large, which makes search efficiency unstable. To solve this, the binary search tree must always be kept balanced, and that is what a balanced binary tree does.
The balanced binary tree is also called an AVL tree. On top of the binary search tree property, it requires that for every node the heights of the left and right subtrees differ by at most 1. The following compares a balanced binary tree with an unbalanced binary tree:
[Figure: a balanced binary tree compared with an unbalanced binary tree]
By this definition, the binary tree in the first figure is in fact a balanced binary tree. A balanced binary tree keeps its structure balanced. When inserting or deleting data would make the tree unbalanced, it adjusts nodes on the tree to restore balance. The specific adjustment method is not covered here. Compared with a plain binary search tree, a balanced binary tree has more stable search performance and is faster overall.
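The sketch below makes the difference concrete. It inserts the same keys in ascending order into a plain binary search tree and into an AVL tree, then compares their heights and the comparisons needed to find 17. The AVL tree uses the standard single and double rotations. The key values are hypothetical; only the id=17 lookup echoes the text.

```python
class Node:
    """Tree node; `height` is the height of the subtree rooted here."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def bst_insert(node, key):
    """Plain binary search tree insert with no rebalancing."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = bst_insert(node.left, key)
    elif key > node.key:
        node.right = bst_insert(node.right, key)
    update(node)
    return node


def avl_insert(node, key):
    """BST insert, then rotate so that |left height - right height| <= 1."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = avl_insert(node.left, key)
    elif key > node.key:
        node.right = avl_insert(node.right, key)
    else:
        return node  # duplicate key: nothing to do
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                   # left subtree too tall
        if key > node.left.key:       # left-right case: rotate the child first
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                  # right subtree too tall
        if key < node.right.key:      # right-left case: rotate the child first
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def comparisons_to_find(node, key):
    steps = 0
    while node is not None:
        steps += 1
        if key == node.key:
            return steps
        node = node.left if key < node.key else node.right
    return steps


keys = [3, 5, 10, 12, 13, 15, 17]  # hypothetical ids, inserted in ascending order
plain = avl = None
for k in keys:
    plain = bst_insert(plain, k)
    avl = avl_insert(avl, k)

print(height(plain), comparisons_to_find(plain, 17))  # 7 7
print(height(avl), comparisons_to_find(avl, 17))      # 3 3
```

The plain tree turns into the linked list described earlier. The rotations keep the AVL tree at height 3 for the same seven keys.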

B-tree
Memory is volatile, so the data and indexes of the user table are normally stored on an external device such as a disk. Reading from disk is hundreds, thousands, or even tens of thousands of times slower than reading from memory, so we should minimize the number of disk reads. The disk is also read one disk block at a time, not one item at a time. If we pack as much data as possible into each disk block, a single read brings in more data and lookups become much faster. If a tree is used as the index structure, each step of a lookup reads one node from disk, and that node corresponds to a disk block. In a balanced binary tree, however, each node stores only one key and its data, so each disk block holds only one key and its data. With massive amounts of data, the binary tree would have a huge number of nodes and a very large height. Each lookup would then need many disk I/Os and would be very slow.
To fix this weakness of the balanced binary tree, we need a balanced tree in which a single node can store multiple keys and their data. That is the B-tree, discussed next.
A B-tree (the B is often read as "balance") is a balanced tree. The figure below shows a B-tree.
[Figure: an order-3 B-tree whose nodes are pages]

Note:
– p in the figure denotes a pointer to a child node. Binary search trees and balanced binary trees have such pointers as well; they were omitted from those figures for readability.
– Each node in the figure is called a page (16 KB by default in InnoDB). A page is the disk block described above. MySQL reads data in units of pages, so calling the nodes pages matches MySQL's underlying index structure more closely.

As the figure shows, each node stores more keys and data than in a balanced binary tree and has more children. The maximum number of children per node is called the order; the B-tree in the figure has order 3. As a result, the tree is much shorter. A B-tree search therefore reads the disk far fewer times and finds data much faster than a balanced binary tree.
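A little arithmetic shows how the order affects the height. The sketch assumes the usual definition: a node of an order-m tree has at most m children and m - 1 keys, so a full tree with h levels holds m^h - 1 keys. It finds the smallest h for a given number of keys. When nothing is cached, each level costs roughly one page read, so h approximates the disk reads per lookup.

```python
def min_height(n_keys: int, order: int) -> int:
    """Fewest levels a search tree of the given order needs to hold n_keys.

    Assumes each node has at most `order` children and order - 1 keys,
    so a full tree with h levels holds order**h - 1 keys.
    order=2 is a binary tree.
    """
    levels, capacity = 0, 0
    while capacity < n_keys:
        levels += 1
        capacity = order ** levels - 1
    return levels


for order in (2, 3, 100, 1000):
    print(order, min_height(1_000_000, order))
# 2 20
# 3 13
# 100 4
# 1000 3
```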
To find the user with id=28 in the B-tree above, the lookup proceeds as follows:

  1. Start at the root node, page 1. 28 lies between the keys 17 and 35, so follow pointer p2 in page 1 to page 3.

  2. Compare 28 with the keys in page 3. 28 lies between 26 and 30, so follow pointer p2 in page 3 to page 8.

  3. Compare 28 with the keys in page 8. There is a matching key 28, and the user information for it is (28, bv).
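Here is the same three-step lookup as a sketch. Pages 1, 3 and 8, and the path through p2 in page 1 and then p2 in page 3, follow the text. Every other page, key, and placeholder data string is hypothetical, added only to complete the tree. The code counts page reads, showing that disk reads equal the number of levels visited.

```python
from bisect import bisect_left

# Hypothetical order-3 B-tree: page id -> (sorted keys, child page ids; [] = leaf).
# Only the path page 1 -> page 3 -> page 8 comes from the text.
PAGES = {
    1: ([17, 35], [2, 3, 4]),
    2: ([8, 12], [5, 6, 11]),
    3: ([26, 30], [7, 8, 9]),
    4: ([65, 87], [10, 12, 13]),
    5: ([3, 5], []), 6: ([9, 10], []), 11: ([13, 15], []),
    7: ([18, 20], []), 8: ([28, 29], []), 9: ([31, 33], []),
    10: ([36, 60], []), 12: ([75, 79], []), 13: ([90, 99], []),
}
# In a B-tree, data sits next to its key in every node; placeholder data "u<key>",
# except key 28, whose row (28, bv) matches the text.
DATA = {key: f"u{key}" for keys, _ in PAGES.values() for key in keys}
DATA[28] = "bv"


def btree_search(key, root_page=1):
    """Return (data, pages_read) for key, reading one page per level."""
    page_id, pages_read = root_page, 0
    while True:
        keys, children = PAGES[page_id]
        pages_read += 1  # each visited page is one disk read if not cached
        if key in keys:
            return DATA[key], pages_read
        if not children:
            return None, pages_read  # reached a leaf without a match
        # pointer index = number of keys smaller than the search key (p1 -> 0, p2 -> 1, ...)
        page_id = children[bisect_left(keys, key)]


print(btree_search(28))  # ('bv', 3)
print(btree_search(17))  # ('u17', 1)
```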

Note:
– The structure of a B-tree is subject to some further rules, but they are not the focus of this article; interested readers can look them up.
– A B-tree is also balanced. When adding or deleting data would unbalance it, its nodes are adjusted as well.

B+ tree
The B+ tree is a further optimization of the B-tree. First, look at the structure of a B+ tree:
[Figure: structure of a B+ tree]
Based on the figure, here is how the B+ tree differs from the B-tree.

1. Non-leaf nodes of a B+ tree store only keys, not data, whereas B-tree nodes store both keys and data. The reason is that the page size in a database is fixed; InnoDB's default page size is 16KB. Without data, a page holds more keys, so the order of the tree (the number of children per node) is larger and the tree becomes shorter and wider. That further reduces the number of disk I/Os needed to find data, so queries get faster. In addition, the order of a B+ tree equals the number of keys per node. If one node of the B+ tree can hold 1000 keys, a 3-level B+ tree can hold 1000×1000×1000 = 1 billion rows. The root node usually stays in memory, so finding one row among 1 billion typically takes only 2 disk I/Os.
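The sketch below checks the capacity estimate above. It then repeats the calculation under explicitly assumed sizes: each internal-node entry is an 8-byte BIGINT primary key plus a 6-byte child pointer, and each leaf row is 1 KB. These sizes are assumptions for illustration, not values taken from InnoDB's on-disk format.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size: 16 KB


def capacity(fanout: int, levels: int, rows_per_leaf: int) -> int:
    """Rows a B+ tree can index with `levels` levels.

    Each of the levels - 1 non-leaf levels multiplies the number of reachable
    leaf pages by `fanout`; each leaf page holds `rows_per_leaf` rows.
    """
    return fanout ** (levels - 1) * rows_per_leaf


# The text's example: 1000 entries per node on every level, 3 levels.
print(capacity(1000, 3, 1000))  # 1000000000

# Assumed sizes: 8-byte key + 6-byte pointer per internal entry, 1 KB rows.
fanout = PAGE_SIZE // (8 + 6)      # 1170 entries per internal page
rows_per_leaf = PAGE_SIZE // 1024  # 16 rows per leaf page
print(fanout, capacity(fanout, 3, rows_per_leaf))  # 1170 21902400

# With the root page cached in memory, a 3-level lookup reads 3 - 1 = 2 pages from disk.
```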

2. All data in a B+ tree is stored in the leaf nodes, and it is kept in sorted order. This makes range searches, sorting, grouping, and de-duplication very simple. A B-tree cannot do these as easily because its data is scattered across nodes at every level.
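The sketch below shows why sorted leaves help with grouping and de-duplication. It assumes a hypothetical list of `(index key, primary key)` entries, as a walk along the leaf level would return them. Python's `itertools.groupby` only groups adjacent equal items, and sorted leaves guarantee exactly that adjacency. DISTINCT and GROUP BY therefore become a single pass with no hash table and no extra sort.

```python
from itertools import groupby

# Hypothetical leaf-level entries of an index on gender, already in key order,
# as a walk along a B+ tree's linked leaves would return them:
# (index key, primary key)
leaf_entries = [("female", 2), ("female", 5), ("female", 9), ("male", 1), ("male", 4)]


def by_key(entry):
    return entry[0]


# DISTINCT: equal keys are adjacent, so each group yields one distinct value.
distinct = [key for key, _ in groupby(leaf_entries, key=by_key)]

# GROUP BY ... COUNT(*): count each run of equal keys, again in one pass.
counts = [(key, sum(1 for _ in grp)) for key, grp in groupby(leaf_entries, key=by_key)]

print(distinct)  # ['female', 'male']
print(counts)    # [('female', 3), ('male', 2)]
```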

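InnoDB storage engine indexes

Index types in the InnoDB storage engine:
    1 Hash index (key:value form)
        Better suited to equality lookups, not to range queries
        (InnoDB does not let you create a hash index explicitly; it builds an internal adaptive hash index on its own)
        
    2 B+ tree index
        Clustered index -> an index that uses the primary key values as its key
        Secondary index: an index built on a non-primary-key column (a table can have several). Each entry stores the indexed column value and the primary key, so a secondary-index lookup returns those two values directly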
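Here is a small model of the hash-versus-B+-tree trade-off. A Python dict stands in for a hash index and a sorted list stands in for the B+ tree leaves. Both are simplifications, and the ids are hypothetical.

```python
from bisect import bisect_left, bisect_right

ids = [7, 3, 12, 15, 1, 13, 14]           # hypothetical primary key values
hash_index = {i: f"row{i}" for i in ids}  # hash index stand-in: key -> row, no key ordering
sorted_keys = sorted(ids)                 # B+ tree leaf stand-in: keys kept in order

# Equality lookup: a single hash probe.
print(hash_index[12])  # row12

# Range 12 < id < 15 on the hash index: hashing gives no key order,
# so every entry must be checked.
hash_range = sorted(k for k in hash_index if 12 < k < 15)

# Same range on sorted keys: binary-search the bounds, then read the contiguous slice.
lo = bisect_right(sorted_keys, 12)
hi = bisect_left(sorted_keys, 15)
btree_range = sorted_keys[lo:hi]

print(hash_range, btree_range)  # [13, 14] [13, 14]
```

Both give the same answer. The hash version touches every entry, while the sorted version touches only the bounds and the matching slice.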
            
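    InnoDB -> index-organized table (the rows themselves are stored in the leaf nodes of the clustered index)
    Back-to-table lookup: get the primary key value from a secondary index, then search the clustered index again from its root
    Covering query: all the data you need comes from the index, with no back-to-table lookup
    
    select name,age,gender from user where id = 3;  # clustered index lookup (id is the primary key)
    select name,age,gender from user where name = "nana";   # secondary index lookup (index on name) that also needs a back-to-table lookup
    select name,id from user where name = "nana";   # covering query: the index on name already holds name and id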
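Here is a toy model of the three queries above. It assumes `id` is the primary key and that there is a secondary index on `name`; the table contents are hypothetical. Each index is a sorted list searched by binary search. The function reports how many index lookups a query needed: two when it must go back to the clustered index, one when the secondary index covers the query.

```python
from bisect import bisect_left

# Hypothetical table user(id PRIMARY KEY, name, age, gender).
# Clustered index: sorted by id; each leaf entry holds the whole row.
clustered = [
    (1, {"id": 1, "name": "bob", "age": 20, "gender": "male"}),
    (3, {"id": 3, "name": "nana", "age": 18, "gender": "female"}),
    (5, {"id": 5, "name": "tom", "age": 22, "gender": "male"}),
]
# Secondary index on name: sorted by name; each entry holds only (name, primary key).
secondary_name = sorted((row["name"], pk) for pk, row in clustered)


def lookup(index, key):
    """Binary-search a sorted list of (key, value) pairs; return the value or None."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    return index[i][1] if i < len(index) and keys[i] == key else None


def query(name, columns):
    """Model of SELECT <columns> FROM user WHERE name = ?; returns (result, index lookups)."""
    pk = lookup(secondary_name, name)  # lookup 1: the secondary index
    if pk is None:
        return None, 1
    if set(columns) <= {"name", "id"}:
        # covering query: the secondary index entry already has name and id
        entry = {"name": name, "id": pk}
        return {c: entry[c] for c in columns}, 1
    row = lookup(clustered, pk)        # lookup 2: back to the clustered index
    return {c: row[c] for c in columns}, 2


print(query("nana", ["name", "age", "gender"]))  # ({'name': 'nana', 'age': 18, 'gender': 'female'}, 2)
print(query("nana", ["name", "id"]))             # ({'name': 'nana', 'id': 3}, 1)
```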

Simple use of indexes

    create table t1(
        id int,
        name varchar(10)
    );
    
    create index id_xx on t1(id);   # create an index
    drop index id_xx on t1;     # drop the index
    

Code to test how much an index speeds up queries

Create a large table s1
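#1. Prepare the table
create table s1(
id int,
name varchar(20),
gender char(6),
email varchar(50)
);

#2. Create a stored procedure that inserts records in bulk
# Declare $$ as the statement terminator while defining the procedure
delimiter $$
create procedure auto_insert1()
BEGIN
    declare i int default 1;
    while(i<3000000)do
        insert into s1 values(i,'nana','female',concat('nana',i,'@beautiful_girl'));
        set i=i+1;
    end while;
END$$
# Restore the semicolon as the statement terminator
delimiter ;

#3. View the stored procedure
show create procedure auto_insert1\G

#4. Call the stored procedure (inserts 2,999,999 rows, which can take a long time)
call auto_insert1();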
# Noted result: about 2.5 million records => .ibd file size of 167 MB
 
Verification queries
select count(*) from s1 where id = 33333;   # slow

create index id_xx on s1(id);
select count(*) from s1 where id = 33333;   # with the index, noticeably faster

explain select count(*) from s1 where id = 33333;   # show the query plan: does the query use the index?
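Running MySQL's EXPLAIN needs a server, so this sketch uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` as a stand-in. SQLite is a different engine. Its plan text (such as "SCAN s1" or "SEARCH s1 USING COVERING INDEX id_xx (id=?)") differs from MySQL's EXPLAIN columns and varies slightly between SQLite versions. The 50,000-row table is a scaled-down, hypothetical version of s1. The sketch checks that the plan switches from a scan to an index search once `id_xx` exists. It also shows that wrapping `id` in arithmetic turns the search back into a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table s1(id int, name varchar(20), gender char(6), email varchar(50))")
conn.executemany(
    "insert into s1 values (?, 'nana', 'female', ?)",
    ((i, f"nana{i}@beautiful_girl") for i in range(1, 50001)),
)


def plan(sql):
    """Join the 'detail' column (the last one) of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[-1] for row in conn.execute("explain query plan " + sql))


print(plan("select count(*) from s1 where id = 33333"))  # a SCAN of s1: no index yet
conn.execute("create index id_xx on s1(id)")
print(plan("select count(*) from s1 where id = 33333"))  # SEARCH ... INDEX id_xx
print(plan("select count(*) from s1 where id*12 = 3"))   # SCAN again: id is inside an expression
```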

select count(*) from s1 where id > 3;   # slow: the range is large
select count(*) from s1 where id > 3 and id < 700;    # fast: the range is fairly small
select count(*) from s1 where id = 3;   # fast
Conclusion: using an index does not guarantee a big speed-up
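 Summary:
        1. Build indexes on columns that are highly selective (many distinct values) and take up little space
        
        2. A range query that uses an index can still be slow if the range is large. How to fix it:
            either narrow the range,
            or fetch it in segments, one chunk at a time, until the whole range has been read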
            
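        3. Index condition pushdown (enabled by default)
            When part of the WHERE condition can be checked using columns stored in the index, MySQL has the storage engine filter index entries first and fetch full rows only for the entries that pass. (Choosing which index to use is a separate job: the optimizer automatically picks the index it estimates will be fastest.)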
            
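        4. Do not put the indexed column inside a function or an arithmetic expression
        select count(*) from s1 where id*12 = 3;   # cannot use the index on id
        select count(*) from s1 where id = 3/12;   # can use the index on id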
        
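        5. Covering index
            The most efficient case: everything the query needs is in the index, so no back-to-table lookup is needed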
            
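        6. Composite index
            Leftmost prefix matching rule
            Several columns are combined into one index
            For example, with a composite index on (name, gender), a query can use the index when filtering on name, or on name + gender, but not on gender alone
        create index idx_id_name_gender on s1(id,name,gender);
        Conditions that can use this index:
        id
        id name
        id gender   (only the id part is matched through the index)
        id name gender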
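Here is a sketch of the leftmost prefix rule. It models the composite index on (id, name, gender) as a sorted list of tuples, which is the order of its B+ tree leaves; the entries are hypothetical. Entries that share a leftmost prefix are contiguous, so binary search can find them. Entries that share only gender are scattered through the order, so finding them takes a full scan.

```python
from bisect import bisect_left, bisect_right

# Hypothetical entries of a composite index on (id, name, gender),
# sorted by id, then name, then gender, like the index's B+ tree leaves.
index = sorted([
    (3, "nana", "female"), (1, "bob", "male"), (2, "nana", "female"),
    (2, "amy", "female"), (1, "bob", "female"), (3, "tom", "male"),
])


def prefix_search(index, prefix):
    """Entries whose leading columns equal `prefix`, located by binary search.

    This only works for a leftmost prefix, because such entries are contiguous.
    (For brevity the key list is rebuilt on each call; a B+ tree would descend
    from the root instead.)
    """
    n = len(prefix)
    keys = [entry[:n] for entry in index]
    return index[bisect_left(keys, prefix):bisect_right(keys, prefix)]


print(prefix_search(index, (2,)))         # [(2, 'amy', 'female'), (2, 'nana', 'female')]
print(prefix_search(index, (2, "nana")))  # [(2, 'nana', 'female')]

# gender alone is not a leftmost prefix: matches are scattered, so every entry is checked.
print([e for e in index if e[2] == "male"])  # [(1, 'bob', 'male'), (3, 'tom', 'male')]

# id + gender: binary search narrows to id = 1, then gender is filtered entry by entry.
print([e for e in prefix_search(index, (1,)) if e[2] == "female"])  # [(1, 'bob', 'female')]
```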


Source: blog.csdn.net/Yosigo_/article/details/114026256