MySQL Indexing Principles and Query Optimization

I. INTRODUCTION

1. Why do we need indexes?

In typical applications the read-to-write ratio is about 10:1, and inserts and ordinary updates rarely cause performance problems. In production, the problems we meet most often, and the ones most likely to cause trouble, are complex queries, so optimizing query statements is clearly the top priority. Any discussion of speeding up queries has to mention indexes.

2. What is an index?

In MySQL an index is also called a "key". It is a data structure that the storage engine uses to find records quickly. Indexes are critical for good performance, and their effect on performance grows as the amount of data in a table grows.
Index optimization is probably the most effective way to improve query performance. A good index can easily improve query performance by several orders of magnitude.
An index is like the phonetic lookup table of a dictionary: to look up a word without it, you would have to go through hundreds of pages one by one.

II. How indexes work

1. Index principle

The purpose of an index is to improve query efficiency, just like the table of contents of a book: first locate the chapter, then the section within that chapter, then the page. Similar examples include looking up a word in a dictionary, a train schedule, or a flight.

In essence: narrow down the range of data step by step until you reach the result you want, and turn random events into sequential ones. In other words, with an indexing mechanism we can always use the same search method to locate data.

Databases work the same way, but are obviously much more complex, because they face not only equality queries but also range queries (>, <, between, in), fuzzy queries (like), union queries (or), and so on. How should a database handle all of these? Recall the dictionary example: can we split the data into segments and then search segment by segment? In the simplest case, with 1,000 records, put 1 to 100 in the first segment, 101 to 200 in the second, 201 to 300 in the third, and so on. To find record 250 we only need to look in the third segment, which eliminates 90% of the irrelevant data at once. But with 10 million records, how many segments are best? Readers with some algorithms background will think of a search tree, whose average complexity is lg N and which gives good query performance. This overlooks a critical issue, though: complexity models assume that every operation costs the same. A database implementation is more complicated. On the one hand the data is stored on disk; on the other hand, to improve performance, part of the data can be read into memory for computation each time. Since a disk access costs roughly 100,000 times as much as a memory access, a simple search tree can hardly meet the needs of complex application scenarios.

2. Disk I/O and read-ahead

Because disk I/O is very expensive, the operating system makes an optimization: on each I/O it reads into the memory buffer not only the data at the requested disk address but also the adjacent data, because the principle of locality tells us that when a computer accesses the data at one address, the adjacent data is likely to be accessed soon. The data read by one I/O is called a page. The page size depends on the operating system and is typically 4 KB or 8 KB, so reading any data within one page costs only one I/O. This fact is very helpful when designing the index data structure.

3. Index data structure

No data structure appears out of thin air; each has its background and use cases. What we need from this data structure is actually simple: each lookup should keep the number of disk I/Os to a very small order of magnitude, ideally a constant. So we ask whether a multi-way search tree with a controllable height can meet this need. This is how the B+ tree came about.

As shown in the figure, this is a B+ tree (see a B+ tree reference for the full definition; only the key points are covered here). Each light-blue block is called a disk block. Each disk block contains several data items (dark blue) and pointers (yellow). For example, disk block 1 contains data items 17 and 35 and pointers P1, P2, and P3: P1 points to the disk block with values less than 17, P2 to the block with values between 17 and 35, and P3 to the block with values greater than 35. The real data lives in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes do not store real data; they store only data items that guide the search. For example, 17 and 35 do not actually exist in the table.

### B+ tree lookup process
As shown in the figure, to find data item 29, disk block 1 is first loaded from disk into memory, which is the first I/O. A binary search in memory determines that 29 lies between 17 and 35, which selects pointer P2 of disk block 1. The in-memory time is so short compared with a disk I/O that it can be ignored. Disk block 3 is then loaded from disk into memory using the disk address in pointer P2 of disk block 1, which is the second I/O. 29 lies between 26 and 30, which selects pointer P2 of disk block 3, and disk block 8 is loaded into memory through that pointer, which is the third I/O. A binary search in memory then finds 29 and the query ends, after three I/Os in total. In practice, a 3-level B+ tree can represent millions of records. If finding one record among millions takes only three I/Os, the performance gain is huge. Without an index, if each data item cost one I/O, the lookup would need millions of I/Os, which is obviously far too expensive.
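
The lookup above can be sketched in a few lines of Python. This is only an illustrative model, not how a storage engine is implemented: each `Node` stands for one disk block, and visiting a node counts as one I/O. The separators 17/35 and 26/30 and the leaf values come from the text; the separators 8/12 and 65/87 and the empty leaves are assumptions made to complete the tree.

```python
from bisect import bisect_right


class Node:
    """One disk block: separator keys plus child pointers, or leaf data items."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf node


def search(root, target):
    """Return (found, io_count); every block visited counts as one disk I/O."""
    node, io_count = root, 0
    while True:
        io_count += 1  # loading this block from disk
        if node.children is None:
            return target in node.keys, io_count
        # In-memory binary search picks the pointer to follow.
        node = node.children[bisect_right(node.keys, target)]


# Separators 17/35 and 26/30 come from the text; 8/12 and 65/87 are assumed.
block2 = Node([8, 12], [Node([3, 5]), Node([9, 10]), Node([13, 15])])
block3 = Node([26, 30], [Node([]), Node([28, 29]), Node([])])
block4 = Node([65, 87], [Node([36, 60]), Node([75, 79]), Node([90, 99])])
root = Node([17, 35], [block2, block3, block4])

print(search(root, 29))  # (True, 3): found after three block reads
print(search(root, 17))  # (False, 3): 17 only guides the search, it is not data
```

Every lookup in this model touches exactly one block per level, which is why the height of the tree, not the number of rows, determines the I/O cost.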

### B+ tree properties
1. Index fields should be as small as possible: from the analysis above, the number of I/Os depends on the height h of the B+ tree. Suppose the table currently holds N records and each disk block holds m data items; then $h = \log_{m+1} N$. For a fixed N, the larger m is, the smaller h is. And m = disk block size / data item size. The disk block size is the size of one data page and is fixed, so the less space each data item takes, the more data items fit in a block and the lower the tree. This is why each data item, that is, the index field, should be as small as possible: an int takes 4 bytes, half of the 8 bytes a bigint takes. It is also why a B+ tree puts the real data in the leaf nodes rather than the inner nodes: once real data is placed in inner nodes, the number of data items per disk block drops sharply and the tree grows taller. When each disk block holds only one data item, the tree degenerates into a linear list.
2. Leftmost matching (matching from left to right): when the data items of a B+ tree are composite, such as (name, age, sex), the tree builds its search order from left to right. When searching for data such as (Zhang, 20, F), the B+ tree compares name first to decide the next search direction; if the names are equal it compares age and then sex in turn, and finally finds the data. But for a search such as (20, F) that has no name, the B+ tree does not know which node to check next, because name is the first comparison factor used when building the tree, and a search has to start from name to know where to go next. For a search such as (Zhang, F), the B+ tree can use name to choose the search direction, but the next field, age, is missing, so it can only find all the data whose name equals Zhang and then filter for sex F. This is a very important property: the leftmost matching property of indexes.
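
Property 1 can be checked with a little arithmetic. The sketch below computes the smallest height h for which (m + 1)^h covers N rows, using integer arithmetic instead of a floating-point logarithm. The 4 KB page comes from the text; the 8-byte pointer size and the 200-byte key are assumptions chosen only to make the contrast visible.

```python
def tree_height(n_rows, page_size, entry_size):
    """Smallest h with (m + 1) ** h >= n_rows, where m = page_size // entry_size."""
    fanout = page_size // entry_size + 1  # m + 1 pointers per disk block
    height, capacity = 1, fanout
    while capacity < n_rows:
        capacity *= fanout
        height += 1
    return height


PAGE = 4096      # 4 KB page, as in the text
POINTER = 8      # assumed pointer size in bytes
N = 10_000_000   # 10 million rows

print(tree_height(N, PAGE, 4 + POINTER))    # 4-byte int key     -> 3
print(tree_height(N, PAGE, 200 + POINTER))  # 200-byte wide key  -> 6
```

Under these assumptions a narrow key needs three levels for 10 million rows, while a wide key doubles the height and therefore the number of I/Os per lookup.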

IV. MySQL index management

1. Function

# 1. The function of an index is to speed up lookups
# 2. In MySQL, primary key, unique, and composite unique are also indexes; besides speeding up lookups, they also act as constraints

2. MySQL index classification

Index categories
1. Ordinary index (index): speeds up lookups
2. Unique indexes
    primary key index (primary key): speeds up lookups + constraint (not null and unique)
    unique index (unique): speeds up lookups + constraint (unique)
3. Composite indexes
    -primary key(id, name): composite primary key index
    -unique(id, name): composite unique index
    -index(id, name): composite ordinary index
4. Full-text index (fulltext): works best when searching very long articles.
5. Spatial index (spatial): just be aware of it; it is almost never used
Usage scenarios for each index type

For example, suppose you are building a membership card system for a shopping mall.

The system has a membership table
with the following fields:
membership number INT
member name VARCHAR(10)
member ID number VARCHAR(18)
member phone VARCHAR(10)
member address VARCHAR(50)
member remarks TEXT

The membership number, as the primary key, uses PRIMARY
If you want to index the member name, use an ordinary INDEX
If you want to index the member ID number, you can choose UNIQUE (unique, no duplicates allowed)

# Besides these there is the full-text index, FULLTEXT
If you need to index the member remarks, you can choose a full-text index.
It works best when searching very long articles.
For relatively short text, just a line or two, an ordinary INDEX is enough.
In practice, though, we usually do not use MySQL's built-in index for full-text search and instead choose third-party software such as Sphinx that is dedicated to full-text search.

# The others, such as the spatial index SPATIAL, are good to know about but almost never used

 

3. The two index types: hash and btree

 
# When creating an index we can specify its type; there are two types
hash index: fast for single-value lookups, slow for range queries
btree index: a B+ tree; the amount of data it can hold grows exponentially with the number of levels (we use this type because InnoDB supports it by default)

# Different storage engines support different index types
InnoDB: supports transactions and row-level locking; supports B-tree and Full-text indexes; does not support Hash indexes;
MyISAM: does not support transactions; supports table-level locking; supports B-tree and Full-text indexes; does not support Hash indexes;
Memory: does not support transactions; supports table-level locking; supports B-tree and Hash indexes; does not support Full-text indexes;
NDB: supports transactions and row-level locking; supports Hash indexes; does not support B-tree or Full-text indexes;
Archive: does not support transactions; supports table-level locking; does not support B-tree, Hash, or Full-text indexes;
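
The difference between the two index types can be illustrated with a small Python model. This is a sketch under simplifying assumptions, not MySQL's implementation: a `dict` stands in for a hash index and a sorted list stands in for the ordered leaf level of a B+ tree. The `rows` data and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

rows = [(i, f"egon{i}") for i in range(1, 1001)]  # hypothetical (id, name) rows

# Hash-style index: id -> row position. An equality lookup is a single probe.
hash_index = {row_id: pos for pos, (row_id, _) in enumerate(rows)}

# B-tree-style index: ids kept in sorted order, so a range is contiguous.
sorted_ids = sorted(row_id for row_id, _ in rows)


def hash_range(low, high):
    """A hash index has no ordering, so a range query must test every key."""
    return sorted(k for k in hash_index if low <= k <= high)


def btree_range(low, high):
    """Two binary searches bound the range; the slice between them is the answer."""
    return sorted_ids[bisect_left(sorted_ids, low):bisect_right(sorted_ids, high)]


print(rows[hash_index[333]])  # (333, 'egon333')
print(btree_range(10, 15))    # [10, 11, 12, 13, 14, 15]
```

Both structures return the same range result, but the hash version inspects all 1,000 keys while the ordered version only needs two binary searches and a slice.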

 

 

 

4. Syntax for creating and dropping indexes

 

# Method 1: when creating the table
      CREATE TABLE table_name (
                column_name_1 data_type [integrity constraints ...],
                column_name_2 data_type [integrity constraints ...],
                [UNIQUE | FULLTEXT | SPATIAL] INDEX | KEY
                [index_name] (column_name[(length)] [ASC | DESC])
                );


# Method 2: CREATE an index on an existing table
        CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name
                     ON table_name (column_name[(length)] [ASC | DESC]);


# Method 3: ALTER TABLE to create an index on an existing table
        ALTER TABLE table_name ADD [UNIQUE | FULLTEXT | SPATIAL] INDEX
                             index_name (column_name[(length)] [ASC | DESC]);

# Drop an index: DROP INDEX index_name ON table_name;

Make good use of the help documentation
help create
help create index
 ==================
1. Create indexes
     - when creating the table (points to note)
    create table s1(
    id int, # primary key can be added here
    # id int index # not allowed: index is only an index and carries no constraint,
    # so unlike primary key and unique it cannot be added in the column definition
    name char(20),
    age int,
    email varchar(30),
    # primary key(id) # can also be added this way
    index(id) # can be added this way
    );
     - after the table has been created
    create index name on s1(name); # add an ordinary index
    create unique index age on s1(age); # add a unique index
    alter table s1 add primary key(id); # add a primary key index, i.e. add a primary key constraint to the id field
    create index id_name on s1(id,name); # add an ordinary composite index
2. Drop indexes
    drop index id on s1;
    drop index name on s1; # drop an ordinary index
    drop index age on s1; # drop a unique index the same way as an ordinary index; no need to write unique before index
    alter table s1 drop primary key; # drop the primary key (it was added with alter, so it is dropped with alter too)

 


 

 

 

5. Index test

1. Preparation

 

#1. Prepare the table
create table s1(
id int,
name varchar(20),
gender char(6),
email varchar(50)
);

#2. Create a stored procedure to insert records in bulk
# declare $$ as the statement terminator while the procedure is defined
delimiter $$
create procedure auto_insert1()
BEGIN
    declare i int default 1;
    while(i<3000000)do
        insert into s1 values(i,concat('egon',i),'male',concat('egon',i,'@oldboy'));
        set i=i+1;
    end while;
END$$
# restore the semicolon as the statement terminator
delimiter ;

#3. View the stored procedure
show create procedure auto_insert1\G 

#4. Call the stored procedure
call auto_insert1();

2. Test query speed without an index

 

# No index: the table is scanned from start to finish, so the query is slow
mysql> select * from s1 where id=333;
+------+---------+--------+----------------+
| id   | name    | gender | email          |
+------+---------+--------+----------------+
|  333 | egon333 | male   | egon333@oldboy |
|  333 | egon333 | f      | alex333@oldboy |
|  333 | egon333 | f      | alex333@oldboy |
+------+---------+--------+----------------+
rows in set (0.32 sec)

mysql> select * from s1 where email='egon333@oldboy';
....
... rows in set (0.36 sec)

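3. Add an index

#1. Always create indexes on the fields used in search conditions; for example, select * from t1 where age > 5; needs an index on age

#2. When the table already holds a lot of data, building an index is slow and takes disk space, and inserts, deletes, and updates all become slower; only queries get faster
For example, create index idx on s1(id); scans all the data in the table, builds the index structure with id as the data item, and stores it on disk.
Once it is built, queries become fast

#3. Note: the indexes of an innodb table are stored in the s1.ibd file, while a myisam table has a separate index file, table1.MYI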

 

 

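VI. Using indexes correctly

1. Covering indexes

 

# Analysis
select * from s1 where id=123;
This SQL hits the index but is not covered by it.
id=123 is used to locate, in the index data structure, the position of that id on disk, that is, its position in the table.
But the selected fields are *, so fields other than id are needed as well. Getting the id from the index structure is not enough;
we still have to use it to fetch the other field values of the row that id belongs to, which takes time. Clearly, if we select only id,
that extra work goes away, as follows
select id from s1 where id=123;
This query is covered by the index: it hits the index and takes the id value directly from the index data structure, so it is fast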
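
The difference between the two queries can be modelled in Python. This is a deliberately simplified sketch, not the storage engine's actual layout: `idx_id` stands in for the index on id, `rows` stands in for the table, and the step counts are a hypothetical cost measure.

```python
# Hypothetical table and an index on id that maps each id to its row position.
rows = [
    {"id": i, "name": f"egon{i}", "gender": "male", "email": f"egon{i}@oldboy"}
    for i in range(1, 1001)
]
idx_id = {row["id"]: pos for pos, row in enumerate(rows)}


def select_star(target_id):
    """select * from s1 where id = ?  -> index probe, then a fetch from the table."""
    pos = idx_id.get(target_id)
    if pos is None:
        return None, 1       # only the index was probed
    return rows[pos], 2      # index probe + row fetch


def select_id(target_id):
    """select id from s1 where id = ?  -> the index alone holds the answer."""
    return (target_id if target_id in idx_id else None), 1


print(select_star(123)[1])  # 2 steps: the index is hit but does not cover the query
print(select_id(123))       # (123, 1): answered from the index alone
```

The covered query never touches `rows`, which is the saving the analysis above describes.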

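2. Composite indexes

 

3. Index merge

# Index merge: use several single-column indexes together

# Analysis:
Anything a composite index can do can also be handled with index merge, for example
create index ne on s1(name,email); # composite index
We could just as well create separate indexes on name and email

The composite index can be hit by:
select * from s1 where name='egon' ;
select * from s1 where name='egon' and email='adf';

Index merge can be hit by:
select * from s1 where name='egon' ;
select * from s1 where email='adf';
select * from s1 where name='egon' and email='adf';

At first glance index merge looks better because it hits in more cases, but it depends on the situation: for name='egon' and email='adf',
the composite index is more efficient than index merge; for single-condition queries, index merge is the more reasonable choice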
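
Why the composite index wins for the two-condition query can be seen in a toy model. The sketch below is an assumption-laden illustration, not MySQL's index merge algorithm: each index is a Python dictionary from key to a set of row ids, and "entries read" is a hypothetical cost measure. The data is made up.

```python
from collections import defaultdict

# Hypothetical rows: 10 distinct names and 25 distinct emails over 1,000 rows.
rows = [(f"egon{i % 10}", f"mail{i % 25}") for i in range(1000)]

idx_name = defaultdict(set)        # single-column index on name
idx_email = defaultdict(set)       # single-column index on email
idx_name_email = defaultdict(set)  # composite index on (name, email)
for row_id, (name, email) in enumerate(rows):
    idx_name[name].add(row_id)
    idx_email[email].add(row_id)
    idx_name_email[(name, email)].add(row_id)


def via_merge(name, email):
    """Read both single-column indexes, then intersect the row-id sets."""
    by_name, by_email = idx_name[name], idx_email[email]
    return by_name & by_email, len(by_name) + len(by_email)


def via_composite(name, email):
    """One probe of the composite index returns exactly the matching rows."""
    matches = idx_name_email[(name, email)]
    return matches, len(matches)


merged, merge_cost = via_merge("egon3", "mail3")
direct, direct_cost = via_composite("egon3", "mail3")
print(merged == direct)         # True: same rows either way
print(merge_cost, direct_cost)  # 140 20
```

Both routes find the same 20 rows, but the merge route reads 140 index entries to get there, while the composite route reads only the 20 matches.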

 

 

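4. To get the expected speed-up from indexes, follow these principles when adding them

 
 

#1. Leftmost-prefix matching, a very important principle
create index ix_name_email on s1(name,email);
- Leftmost-prefix matching: columns must be matched in order from left to right
select * from s1 where name='egon'; # can use the index
select * from s1 where name='egon' and email='asdf'; # can use the index
select * from s1 where email='[email protected]'; # cannot use the index
mysql keeps matching to the right until it meets a range condition (>, <, between, like) and then stops.
For example, with a = 1 and b = 2 and c > 3 and d = 4 and an index built in the order (a,b,c,d),
d cannot use the index; with an index on (a,b,d,c) all four can, and the order of a, b, d can be changed freely.

#2. = and in conditions can appear in any order; for example, a = 1 and b = 2 and c = 3 can use an (a,b,c) index in any order, because the mysql query optimizer
rewrites the query into a form the index can recognize

#3. Prefer columns with high selectivity for indexes. The formula for selectivity is count(distinct col)/count(*),
the proportion of distinct values in the field; the larger the ratio, the fewer records we scan. A unique key has selectivity 1, while some status or
gender fields may have selectivity close to 0 on large data sets. Is there a rule-of-thumb value for this ratio? It depends on the use case,
so it is hard to pin down; for fields that need to be joined we generally require 0.1 or higher, that is, on average 10 records scanned per record returned

#4. Index columns must not take part in computation; keep the column "clean". For example, from_unixtime(create_time) = '2014-05-29'
cannot use the index. The reason is simple: the b+ tree stores the field values from the table,
so to do the comparison the function would have to be applied to every element, which is obviously too expensive.
The statement should therefore be written as create_time = unix_timestamp('2014-05-29');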
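
The selectivity formula in principle 3, count(distinct col)/count(*), is easy to try out. The Python sketch below applies it to two made-up columns; the data and the `selectivity` helper are hypothetical and only mirror the formula.

```python
def selectivity(values):
    """count(distinct col) / count(*) for one column's values."""
    return len(set(values)) / len(values)


ids = list(range(1, 1001))                               # unique key
genders = ["male" if i % 2 else "female" for i in ids]   # two distinct values

print(selectivity(ids))      # 1.0   -> ideal index column
print(selectivity(genders))  # 0.002 -> far below the 0.1 rule of thumb
```

With 1,000 rows the gender column already scores 0.002, and the ratio keeps shrinking as the table grows, which is why such columns are poor index candidates.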

 

 Leftmost-prefix demonstration

 

mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male';
Empty set (0.39 sec)

mysql> create index idx on s1(id,name,email,gender); # does not follow the leftmost prefix
Query OK, 0 rows affected (15.27 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male';
Empty set (0.43 sec)


mysql> drop index idx on s1;
Query OK, 0 rows affected (0.16 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> create index idx on s1(name,email,gender,id); # follows the leftmost prefix
Query OK, 0 rows affected (15.97 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male';
Empty set (0.03 sec)
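6. Leftmost-prefix matching
index(id,age,email,name)
# id must appear in the condition (as long as id appears, the query gets faster)
id
id age
id email
id name

email # no good: a condition that starts with this column alone gets no speed-up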
mysql> select count(*) from s1 where id=3000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.11 sec)

mysql> create index xxx on s1(id,name,age,email);
Query OK, 0 rows affected (6.44 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql>  select count(*) from s1 where id=3000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql>  select count(*) from s1 where name='egon';
+----------+
| count(*) |
+----------+
|   299999 |
+----------+
1 row in set (0.16 sec)

mysql>  select count(*) from s1 where email='[email protected]';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.15 sec)

mysql>  select count(*) from s1 where id=1000 and email='[email protected]';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql>  select count(*) from s1 where email='[email protected]' and id=3000;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Build a composite index; leftmost matching

Cases where the index cannot be hit need attention:
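
The leftmost-matching behaviour seen in the timings above follows from how composite entries are ordered. The Python sketch below is an illustrative model, not MySQL internals: a sorted list of (name, email) tuples stands in for a composite index, and "entries examined" is a hypothetical cost measure. The data is made up.

```python
from bisect import bisect_left, bisect_right

# Hypothetical composite index on (name, email): tuples sort by name first,
# then by email within each name, like entries in a composite B+ tree index.
index = sorted((f"egon{i % 50}", f"user{i}@oldboy") for i in range(1000))
names = [entry[0] for entry in index]  # leading column, already in sorted order


def by_name(name):
    """Leading column given: two binary searches bound one contiguous run."""
    lo, hi = bisect_left(names, name), bisect_right(names, name)
    return index[lo:hi], hi - lo  # (matches, entries examined)


def by_email(email):
    """Leading column missing: emails are scattered, so every entry is checked."""
    return [e for e in index if e[1] == email], len(index)


matches, examined = by_name("egon7")
print(len(matches), examined)   # 20 20
matches, examined = by_email("user7@oldboy")
print(len(matches), examined)   # 1 1000
```

Searching by name touches only the 20 matching entries, while searching by email alone has to walk all 1,000, which mirrors why a condition without the leftmost column gets no benefit from the composite index.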

 

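- like '%xx'
    select * from tb1 where email like '%cn';


- using a function
    select * from tb1 where reverse(email) = 'wupeiqi';


- or
    select * from tb1 where nid = 1 or name = '[email protected]';


    Special case: the index only fails when the or condition contains a column without an index; the following do use the index
            select * from tb1 where nid = 1 or name = 'seven';
            select * from tb1 where nid = 1 or name = '[email protected]' and email = 'alex'


- type mismatch
    If the column is a string type, the value in the condition must be quoted, otherwise...
    select * from tb1 where email = 999;

Not-equal conditions on an ordinary index do not use the index
- !=
    select * from tb1 where email != 'alex'

    Special case: if the column is the primary key, the index is still used
        select * from tb1 where nid != 123
- >
    select * from tb1 where email > 'alex'


    Special case: if the column is the primary key or the indexed column is an integer type, the index is still used
        select * from tb1 where nid > 123
        select * from tb1 where num > 123


# If the sort condition is an index, the selected fields must also be indexed fields, otherwise the index cannot be hit
- order by
    select name from s1 order by email desc;
    When sorting by an indexed column, the index is not used if the selected field is not indexed
    select email from s1 order by email desc;
    Special case: sorting by the primary key still uses the index:
        select * from tb1 order by nid desc;

- leftmost prefix of a composite index
    If the composite index is (name,email)
    name and email       -- uses the index
    name                 -- uses the index
    email                -- does not use the index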


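- count(1) or count(column) instead of count(*) no longer makes any difference in mysql

- create index xxxx  on tb(title(19)) # for a text column, a prefix length must be specified
- avoid select *
- count(1) or count(column) instead of count(*)
- prefer char over varchar when creating tables where possible
- put fixed-length fields first in the table's field order
- use a composite index instead of several single-column indexes (when queries often use several conditions)
- keep indexes as short as possible
- use joins (JOIN) instead of subqueries (Sub-Queries)
- make sure the condition types match when joining tables
- columns whose values have low distinctness (many duplicates) are not suitable for an index, e.g. gender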

 

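 VII. Basic steps for optimizing slow queries

0. Run the query first to see whether it is really slow; remember to set SQL_NO_CACHE
1. Query each table on its own with the where conditions to find the table that returns the fewest records. That is, apply the where clauses of the query starting from the table that returns the fewest records, and query each field of a single table separately to see which field has the highest selectivity
2. Use explain to view the execution plan and check whether it matches the expectation from step 1 (start the query from the table that yields fewer records)
3. For sql statements of the form order by limit, let the sorted table be queried first
4. Understand how the business side uses the query
5. When adding indexes, follow the main index-building principles
6. Observe the result; if it does not meet expectations, start the analysis again from step 0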

 

Origin www.cnblogs.com/111testing/p/11300574.html