Indexes and slow query optimization


1. Why we need indexes

Recap: data is stored on disk, so querying it inevitably involves disk I/O operations.

In MySQL, indexes are also called "keys". An index is a data structure that the storage engine uses to find records quickly.

  • primary key

  • unique key

  • index key

Note that a foreign key does not speed up queries and is outside the scope of this discussion. Of the three keys above, the first two both speed up queries and add constraints (primary key: non-null and unique; unique key: unique), while an index key adds no constraint at all and only speeds up queries.

An index is a data structure similar to a book's table of contents: to find data, you first look it up in the index instead of reading through the data page by page.
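To make the constraint difference concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (the DDL is SQLite syntax, and the table `t` and helper `try_insert` are hypothetical). The primary key and the unique key both reject duplicates, while the plain index accepts them:

```python
import sqlite3

# SQLite stands in for MySQL; table t and try_insert are hypothetical examples
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)")
conn.execute("CREATE INDEX idx_name ON t(name)")  # plain index: faster lookups, no constraint

def try_insert(row):
    """Insert a row; return None on success, or the constraint error message."""
    try:
        conn.execute("INSERT INTO t VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

print(try_insert((1, "a@x.com", "jason")))  # succeeds
print(try_insert((1, "b@x.com", "jason")))  # rejected: duplicate primary key
print(try_insert((2, "a@x.com", "jason")))  # rejected: duplicate unique key
print(try_insert((3, "c@x.com", "jason")))  # succeeds: a duplicate name is allowed
```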

 

2. What an index is

An index works by repeatedly narrowing the range of data to search until only the wanted results remain, turning a random search into an orderly, sequential one.

In other words, with this indexing mechanism we can always locate data in the same way.
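Here is a minimal sketch of why narrowing the range pays off, in plain Python rather than anything MySQL actually runs. The hypothetical helpers below count comparisons for a full scan and for repeatedly halving a sorted list of keys:

```python
def linear_search(keys, target):
    """Full scan: check every key in order; return (index, comparisons)."""
    for i, k in enumerate(keys):
        if k == target:
            return i, i + 1
    return -1, len(keys)

def binary_search(keys, target):
    """Index-style lookup on sorted keys: halve the candidate range each step."""
    lo, hi, steps = 0, len(keys), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo if lo < len(keys) and keys[lo] == target else -1
    return found, steps

keys = list(range(1, 1_000_001))  # one million sorted ids
print(linear_search(keys, 999_999))  # (999998, 999999)
print(binary_search(keys, 999_999))
```

On a million sorted keys the halving search needs at most 20 comparisons, compared with up to a million for the scan. A B+ tree applies the same idea but with many more than two branches per node.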

 

3. Impact of indexes

  • When a table already holds a large amount of data, creating an index can be slow

  • Once the index exists, query performance on the table improves greatly, but write performance drops, because every insert, update, or delete must also update the index
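The following toy model illustrates the write cost. It assumes each index is simply a sorted Python list, which is a simplification because InnoDB maintains B+ trees. Each insert must update the row storage and every index, while a lookup on an indexed column can use binary search. `ToyTable` is hypothetical:

```python
import bisect

class ToyTable:
    """Hypothetical toy table: rows in a list plus one sorted list per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        # each index is a sorted list of (column value, row position) pairs
        self.indexes = {col: [] for col in indexed_columns}

    def insert(self, row):
        """Insert a row; return how many structures had to be written."""
        pos = len(self.rows)
        self.rows.append(row)
        writes = 1
        for col, idx in self.indexes.items():
            bisect.insort(idx, (row[col], pos))  # keep every index sorted
            writes += 1
        return writes

    def find(self, col, value):
        """Look up rows by an indexed column using binary search."""
        idx = self.indexes[col]
        i = bisect.bisect_left(idx, (value, -1))  # positions are >= 0, so this is the first match
        out = []
        while i < len(idx) and idx[i][0] == value:
            out.append(self.rows[idx[i][1]])
            i += 1
        return out

plain = ToyTable(indexed_columns=[])
indexed = ToyTable(indexed_columns=["name", "email"])
row = {"id": 1, "name": "jason", "email": "jason1@oldboy"}
print(plain.insert(row), indexed.insert(row))  # 1 3
print(indexed.find("email", "jason1@oldboy"))
```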

 

4. The index data structure

Earlier we covered the basic principles of indexes, the complexity of databases, and some operating-system background. The point is that no data structure appears out of thin air: each one has its own background and purpose. What we need from this one is actually simple. Every lookup should keep the number of disk I/Os at a very small magnitude, ideally a constant. So we ask whether a multi-way search tree with tightly controlled height would meet that need. This is how the B+ tree came about (it evolved from the binary search tree, through the balanced binary tree and then the B-tree).

Only leaf nodes store the actual data. Root and branch nodes store only navigation entries (keys and pointers to child nodes).

The number of lookups is determined by the number of levels in the tree: the fewer the levels, the fewer the lookups.

A disk block has a fixed size, so the amount of data it can hold is fixed as well. How do we keep the tree as low as possible? By making each data item stored in a block take up as little space as possible, so that more items fit into each block.

So which field of a table should we index to keep the index tree low? The primary key id field.
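The following sketch works through the arithmetic behind "smaller keys, lower tree". InnoDB's default page size is 16 KB. The other figures are assumptions chosen for illustration: a 6-byte child pointer, 16 rows per leaf page, and 20 million rows. Page headers and fill factor are ignored:

```python
import math

PAGE_SIZE = 16 * 1024   # InnoDB's default page size (innodb_page_size) is 16 KB
POINTER_SIZE = 6        # assumed bytes per child-page pointer

def fanout(key_size, pointer_size=POINTER_SIZE, page_size=PAGE_SIZE):
    """Number of (key, pointer) entries that fit in one internal page."""
    return page_size // (key_size + pointer_size)

def tree_height(n_rows, key_size, rows_per_leaf):
    """Levels needed: the leaf level plus internal levels until a single root page remains."""
    pages = math.ceil(n_rows / rows_per_leaf)
    per_node = fanout(key_size)
    height = 1
    while pages > 1:
        pages = math.ceil(pages / per_node)
        height += 1
    return height

# Assumed: 20 million rows, about 16 rows (~1 KB each) per 16 KB leaf page
for key_size in (4, 8, 255):   # INT, BIGINT, a long string key
    print(key_size, fanout(key_size), tree_height(20_000_000, key_size, 16))
# 4 1638 3
# 8 1170 3
# 255 62 5
```

Under these assumptions, a 4- or 8-byte integer key gives a fan-out above 1,000 and a three-level tree for 20 million rows. A 255-byte key reduces the fan-out to 62 and needs five levels, which means more disk reads per lookup.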

 

5. Clustered indexes and secondary indexes

1. Clustered index (primary key)

The clustered index is simply the table's primary key, and the InnoDB engine requires every table to have one (if none is declared, InnoDB uses the first unique NOT NULL index, or otherwise generates a hidden row ID). First, let's look at the storage engines.

How many files on disk does a MyISAM table correspond to? Three: one for the table structure, one for the data, and one for the indexes.

How many does an InnoDB table correspond to? Two. The .frm file stores the table structure and cannot hold the index, which means InnoDB keeps its indexes together with the data in the .ibd data file.

Characteristic: the leaf nodes store complete records.

Benefits of the clustered index

1. Sorted lookups and range lookups on the primary key are very fast, because the leaf nodes hold exactly the rows the user is querying.

2. Range queries: to find the rows whose primary key falls in a range, the upper (intermediate) nodes identify the range of leaf pages, and those pages can then be read directly.
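The following is a simplified sketch of the clustered index's leaf level. It assumes that leaf pages hold complete rows in primary-key order and that each page links to the next. The search over each page's last key stands in for descending the upper levels of the tree. `LeafPage`, `build_leaves`, and `range_query` are hypothetical names:

```python
import bisect
from dataclasses import dataclass
from typing import Optional

@dataclass
class LeafPage:
    keys: list                              # sorted primary keys stored in this page
    rows: list                              # complete rows, aligned with keys
    next_page: Optional["LeafPage"] = None  # link to the following leaf page

def build_leaves(rows, per_page):
    """Split rows (already sorted by 'id') into linked leaf pages."""
    pages = [LeafPage([r["id"] for r in rows[i:i + per_page]], rows[i:i + per_page])
             for i in range(0, len(rows), per_page)]
    for page, following in zip(pages, pages[1:]):
        page.next_page = following
    return pages

def range_query(pages, lo, hi):
    """Return (rows with lo <= id <= hi, number of leaf pages read)."""
    if not pages or lo > pages[-1].keys[-1]:
        return [], 0
    # Stand-in for descending the upper levels: first page whose last key is >= lo
    last_keys = [p.keys[-1] for p in pages]
    page = pages[bisect.bisect_left(last_keys, lo)]
    result, pages_read = [], 0
    while page is not None:
        pages_read += 1
        for key, row in zip(page.keys, page.rows):
            if key > hi:
                return result, pages_read
            if key >= lo:
                result.append(row)
        page = page.next_page  # follow the leaf link instead of going back up the tree
    return result, pages_read

rows = [{"id": i, "name": f"user{i}"} for i in range(1, 101)]
pages = build_leaves(rows, per_page=10)
found, read = range_query(pages, 25, 34)
print([r["id"] for r in found], read)
```

With ids 1 to 100 in pages of 10 rows, the query for ids 25 to 34 starts at the third page and follows one link to the next, so it reads two pages in total.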

2. Secondary index (unique key, index key)

Secondary index: queries do not always filter on id. They may also filter on other fields such as name or password, and in those cases the clustered index cannot speed up the query. We then need to build indexes on those other fields, and these are called secondary indexes.

Characteristic: each leaf node stores the indexed field's value together with the primary key of the corresponding record. For example, for an index on the name field, each leaf entry is {name value: primary key of the record with that name}.
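The following is a minimal sketch of a secondary-index lookup. A dict keyed by primary key stands in for the clustered index, and a sorted list of `(name, primary key)` pairs stands in for the secondary index's leaves. The data and helper names are hypothetical:

```python
import bisect

# Hypothetical rows; the clustered index maps primary key -> complete row
clustered = {
    1: {"id": 1, "name": "jason", "age": 18},
    2: {"id": 2, "name": "alice", "age": 25},
    3: {"id": 3, "name": "bob", "age": 30},
}

# Secondary index on name: its leaf entries hold only (name, primary key)
name_index = sorted((row["name"], pk) for pk, row in clustered.items())

def primary_keys_for(name):
    """Step 1: binary-search the secondary index and collect the matching primary keys."""
    i = bisect.bisect_left(name_index, (name,))  # (name,) sorts before every (name, pk)
    pks = []
    while i < len(name_index) and name_index[i][0] == name:
        pks.append(name_index[i][1])
        i += 1
    return pks

def rows_for(name):
    """Step 2: use each primary key to fetch the full row from the clustered index."""
    return [clustered[pk] for pk in primary_keys_for(name)]

print(primary_keys_for("jason"))  # [1]
print(rows_for("jason"))          # [{'id': 1, 'name': 'jason', 'age': 18}]
```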

select name from user where name='jason';

The statement above is served entirely by the index (a covering index): the leaf nodes of the secondary index on name already contain all the data we want.

select age from user where name='jason';

The statement above is not covered by the index. The query hits the index on name, but it also asks for age, which the secondary index does not store, so MySQL has to take the primary key found there and look up the full row in the clustered index.
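The difference shows up in a query plan. The following minimal check uses SQLite through Python's sqlite3 module as a stand-in for MySQL. In MySQL, `EXPLAIN` reports a covering index as `Using index` in the `Extra` column. The first plan should mention `COVERING INDEX idx_name` and the second only `INDEX idx_name`, because `age` is not stored in the index. The exact wording of SQLite's plan text varies between versions:

```python
import sqlite3

# SQLite stands in for MySQL here; the table mirrors the user table in the text
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_name ON user(name)")

def plan(sql):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

covered = plan("SELECT name FROM user WHERE name = 'jason'")
not_covered = plan("SELECT age FROM user WHERE name = 'jason'")
print(covered)
print(not_covered)
```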

 

 

6. Index experiments

1. Preparation

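#1. Create the table
create table s1(
id int,
name varchar(20),
gender char(6),
email varchar(50)
);

#2. Create a stored procedure that inserts records in bulk
# Declare $$ as the statement terminator while the procedure is defined
delimiter $$
create procedure auto_insert1()
BEGIN
    declare i int default 1;
    while(i<3000000)do
        insert into s1 values(i,'jason','male',concat('jason',i,'@oldboy'));
        set i=i+1;
    end while;
END$$
# Restore the semicolon as the statement terminator
delimiter ;

#3. Show the stored procedure
show create procedure auto_insert1\G

#4. Call the stored procedure (inserts 2,999,999 rows, so this takes a while)
call auto_insert1();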

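2. Starting with no index

# With no index on the table
select * from s1 where id=30000;
# Use count() to avoid the time spent printing rows
select count(id) from s1 where id = 30000;
select count(id) from s1 where id = 1;

# Make id the primary key
alter table s1 add primary key(id);  # slow: the index is built over every existing row

select count(id) from s1 where id = 1;  # orders of magnitude faster than before the index existed
select count(id) from s1 where name = 'jason';  # still slow: name is not indexed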


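# Range queries
# Adding an index does not mean every query on that field will be fast
select count(id) from s1 where id > 1;  # much slower than id = 1
select count(id) from s1 where id > 1 and id < 3;
select count(id) from s1 where id > 1 and id < 10000;
select count(id) from s1 where id != 3;

alter table s1 drop primary key;  # drop the primary key to study the name field on its own
select count(id) from s1 where name = 'jason';  # slow again

create index idx_name on s1(name);  # create an index on s1.name
select count(id) from s1 where name = 'jason';  # still slow!
# Recall how a B+ tree works: the indexed data needs high selectivity (many distinct values),
# but every name in this table is 'jason', so nothing can be told apart
# and the tree degenerates into a single "stick"
select count(id) from s1 where name = 'xxx';
# fast: the tree is one stick, the first comparison already fails, so there is no need to go further down
select count(id) from s1 where name like 'xxx';
select count(id) from s1 where name like 'xxx%';
select count(id) from s1 where name like '%xxx';  # slow: leftmost-prefix matching, a leading wildcard cannot use the index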

# Do not index fields with low selectivity
drop index idx_name on s1;

# Create a normal index on id
create index idx_id on s1(id);
select count(id) from s1 where id = 3;  # fast
select count(id) from s1 where id*12 = 3;  # slow: never put an indexed field inside a calculation

drop index idx_id on s1;
select count(id) from s1 where name='jason' and gender = 'male' and id = 3 and email = 'xxx';
# For several conditions joined by AND, MySQL first uses an indexed field with high selectivity
# to shrink the candidate range, then checks the remaining conditions
create index idx_name on s1(name);
select count(id) from s1 where name='jason' and gender = 'male' and id = 3 and email = 'xxx';  # no speedup

drop index idx_name on s1;
# Indexing low-selectivity fields such as name and gender does not speed up the query

create index idx_id on s1(id);
select count(id) from s1 where name='jason' and gender = 'male' and id = 3 and email = 'xxx';  # fast: id has already narrowed the data down to a single row
select count(id) from s1 where name='jason' and gender = 'male' and id > 3 and email = 'xxx';  # slow: id > 3 still matches many rows, which must then be checked against the other fields

drop index idx_id on s1;

create index idx_email on s1(email);
select count(id) from s1 where name='jason' and gender = 'male' and id > 3 and email = 'xxx';  # fast: the email condition pins the row down in one step
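Several of these effects can be checked by reading query plans, without waiting for three million inserts. The sketch below uses SQLite (through Python's sqlite3) as a stand-in for MySQL, so it shows plans rather than timings. A plan that starts with `SCAN` reads the whole table or index, while `SEARCH` descends the index. In MySQL, the comparable check is `EXPLAIN`, where `type` = `ALL` marks a full table scan:

```python
import sqlite3

# SQLite stands in for MySQL; only plans are compared, so no rows are inserted
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s1 (id INTEGER, name TEXT, gender TEXT, email TEXT)")

def plan(sql):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

q_eq = "SELECT count(id) FROM s1 WHERE id = 3"
q_calc = "SELECT count(id) FROM s1 WHERE id * 12 = 3"

no_index = plan(q_eq)        # no index yet: full table scan
conn.execute("CREATE INDEX idx_id ON s1(id)")
with_index = plan(q_eq)      # equality on the indexed column: index search
calculated = plan(q_calc)    # id is inside an expression: the index cannot be searched

for label, p in [("no index", no_index), ("id = 3", with_index), ("id * 12 = 3", calculated)]:
    print(f"{label:12} {p}")
```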
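The "single stick" problem is a matter of selectivity. The following minimal sketch measures selectivity as distinct values divided by total rows. The column data mimics `s1.name` and `s1.email`, and the helper names are hypothetical:

```python
from collections import Counter

def selectivity(values):
    """Distinct values divided by total rows: 1.0 means every value is unique."""
    return len(set(values)) / len(values)

def matching_entries(values, target):
    """Index entries equal to target, i.e. rows an index lookup still has to visit."""
    return Counter(values)[target]

n = 100_000
names = ["jason"] * n                                    # like s1.name: one value everywhere
emails = [f"jason{i}@oldboy" for i in range(1, n + 1)]   # like s1.email: all distinct

print(selectivity(names), matching_entries(names, "jason"))            # 1e-05 100000
print(selectivity(emails), matching_entries(emails, "jason3@oldboy"))  # 1.0 1
print(matching_entries(names, "xxx"))                                   # 0
```

When every name is `'jason'`, an index on `name` still leaves every row to visit. A lookup for `'xxx'` matches nothing and stops immediately, which is why that query was fast in the experiment.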

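Composite indexes

select count(id) from s1 where name='jason' and gender = 'male' and id > 3 and email = 'xxx';
# If all four fields were highly selective, an index on any one of them would speed up this query
# but suppose email is indexed and the query does not use email
select count(id) from s1 where name='jason' and gender = 'male' and id > 3;
# or name is indexed and the query does not use name
select count(id) from s1 where gender = 'male' and id > 3;
# or gender is indexed and the query does not use gender
select count(id) from s1 where id > 3;

# The result: every field ends up indexed, none of those indexes gets used, and we paid for four index builds
create index idx_all on s1(email,name,gender,id);  # leftmost-prefix rule: put the most selective fields on the left
select count(id) from s1 where name='jason' and gender = 'male' and id > 3 and email = 'xxx';  # faster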
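The query plans for the composite index show the leftmost-prefix rule at work. The sketch below uses SQLite through Python's sqlite3 as a stand-in for MySQL. When `email`, the leftmost column, appears in the WHERE clause, the index is searched. Without it, the index can only be scanned. The sketch assumes no `ANALYZE` statistics, because SQLite can otherwise sometimes use a skip-scan:

```python
import sqlite3

# SQLite stands in for MySQL; no ANALYZE is run, so no skip-scan is considered
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s1 (id INTEGER, name TEXT, gender TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_all ON s1(email, name, gender, id)")

def plan(sql):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

base = "SELECT count(id) FROM s1 WHERE "
all_four = plan(base + "name = 'jason' AND gender = 'male' AND id > 3 AND email = 'xxx'")
email_only = plan(base + "email = 'xxx'")
no_email = plan(base + "name = 'jason' AND gender = 'male' AND id > 3")

print(all_four, email_only, no_email, sep="\n")
```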

 


Origin www.cnblogs.com/xiongying4/p/11401476.html