day44 -- indexes, explain, slow queries, data backup, locks, transactions

day44

Index Introduction

Why have an index?

In typical applications the read-to-write ratio is about 10:1, and inserts and ordinary updates rarely cause performance problems. In production, the problems we meet most often, and the ones most likely to go wrong, are complex queries. Optimizing queries is therefore the top priority, and any discussion of speeding up queries has to start with indexes.

What is an index?

An index, also called a key in MySQL (primary key, unique key, index key), is a data structure that the storage engine uses to find records quickly. Indexes are critical for good performance, and the more data a table holds, the more they matter: they reduce the number of IO operations and speed up queries.

Of these, a primary key and a unique key impose constraints in addition to speeding up queries: a primary key is not null and unique, and a unique key is unique. An index key only speeds up queries and imposes no constraint.

Index optimization is probably the most effective way to improve query performance; a good index can easily improve query performance by several orders of magnitude.

An index is like the alphabetical lookup table of a dictionary: to look up a word without it, you would have to go through hundreds of pages one by one.

Emphasis: once a table has an index, queries should go through the index first and then locate the data from the result of the index lookup.

Index misconceptions

Indexes are an important aspect of application design and development. Too many indexes hurt application performance, and too few hurt query performance, so the number of indexes needs careful thought.

Indexes should be added when the database is first designed, rather than after a large amount of data has accumulated.

How indexes work

The purpose of an index is to improve query efficiency, in the same way as the table of contents of a book: locate the chapter, then the section within it, then the page. Similar examples are dictionaries, train timetables, and flight listings. If the details below are hard to follow, it is enough to understand the table-of-contents analogy.

A table of contents takes up pages of its own. In the same way an index has to be stored on disk and takes up disk space.

Building an index on an empty table is faster than building it on a table that already holds a lot of data, because in the second case all existing rows have to be traversed to build the index.

Writes are affected too. Inserting into a table without an index is faster than inserting into a table with one, because every new row means the index (the table of contents) has to be updated. So an index speeds up queries but reduces write efficiency.

The impact of an index
  • When a table already holds a large amount of data, creating an index is slow
  • After an index is created, query performance on the table improves greatly, but write performance drops
The essence of an index

An index works by repeatedly narrowing the range of data to examine until only the desired result is left, and by turning random events into sequential ones. In other words, with an index we can always use the same search method to locate data.

IO and disk read-ahead

A single disk access takes about 9 ms on average.

Because disk IO is so expensive, the operating system optimizes it: on each IO it reads not only the data at the requested disk address but also the adjacent data into a memory buffer, because the principle of locality tells us that when a computer accesses an address, the adjacent data is likely to be accessed soon. The data read by one IO is called a page. The page size depends on the operating system and is generally 4 KB or 8 KB, which means that reading any data within one page costs only one IO. This fact is very helpful when designing the data structure of an index.

Index data structure

B+ tree

Each lookup should need only a very small number of disk IOs, ideally a constant number. So we ask whether a multi-way search tree with a controllable height can meet this need. This is how the B+ tree came about (the B+ tree evolved from the binary search tree, through the balanced binary tree and the B-tree).

B+ tree lookup process

As shown in the figure, to find data item 29, disk block 1 is first loaded from disk into memory, which costs one IO. A binary search in memory determines that 29 lies between 17 and 35, which selects pointer P2 of disk block 1; the in-memory time is very short compared with a disk IO and can be ignored. Disk block 3 is then loaded into memory through the disk address in pointer P2 of disk block 1, which is the second IO. 29 lies between 26 and 30, which selects pointer P2 of disk block 3, and disk block 8 is loaded through that pointer, which is the third IO. A binary search in memory then finds 29 and the lookup ends, after three IOs in total. In practice a 3-level B+ tree can represent millions of rows. If finding one row among millions takes only three IOs, the performance gain is huge; without an index, each data item would cost one IO, millions in total, which is obviously far too expensive. Apart from the leaf nodes, the other nodes (the root and the branches) store only index entries, which establish the route to the data.
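The lookup above can be sketched in a few lines of Python. This is only an illustration: the keys on the path 1 -> 3 -> 8 (17/35, 26/30, 29) come from the text, while every other key and block number in BLOCKS is invented to make the tree complete.

```python
import bisect

# Hypothetical disk blocks. Inner blocks hold keys and child pointers,
# leaf blocks hold only keys. Only the path 1 -> 3 -> 8 follows the text.
BLOCKS = {
    1: {"keys": [17, 35], "children": [2, 3, 4]},
    2: {"keys": [8, 12], "children": [5, 6, 7]},
    3: {"keys": [26, 30], "children": [9, 8, 10]},
    4: {"keys": [65, 87], "children": [11, 12, 13]},
    5: {"keys": [3, 5]},
    6: {"keys": [9, 10]},
    7: {"keys": [13, 15]},
    8: {"keys": [28, 29]},
    9: {"keys": [18, 20]},
    10: {"keys": [31, 33]},
    11: {"keys": [36, 60]},
    12: {"keys": [75, 79]},
    13: {"keys": [90, 99]},
}


def find(key, root=1):
    """Return (found, path); each block on the path models one disk IO."""
    path = []
    block_id = root
    while True:
        block = BLOCKS[block_id]  # loading a block = one IO
        path.append(block_id)
        keys = block["keys"]
        if "children" not in block:
            # Leaf block: binary search in memory for the key itself.
            position = bisect.bisect_left(keys, key)
            found = position < len(keys) and keys[position] == key
            return found, path
        # Inner block: binary search picks the pointer (P1, P2, P3) to follow.
        block_id = block["children"][bisect.bisect_right(keys, key)]


found, path = find(29)
print(found, path, len(path))  # True [1, 3, 8] 3
```

The length of the returned path is the number of IOs, which equals the height of the tree no matter which key is searched.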

B+ tree properties
  • Index fields should be as small as possible

From the analysis above, the number of IOs depends on the height h of the B+ tree: the height is the number of IOs needed for each lookup. Suppose the table holds N rows and each disk block holds m data items. Then h = log_(m+1)(N). For a fixed N, the larger m is, the smaller h is. m = disk block size / size of one data item. The disk block size is the size of one data page and is fixed, so the less space each data item takes, the more items fit in a block and the lower the tree. This is why each data item, that is, the index field, should be as small as possible: an int takes 4 bytes, half of the 8 bytes of a bigint. It is also why the B+ tree puts the real data in leaf nodes rather than inner nodes: once data is placed in inner nodes, the number of items per disk block drops sharply and the tree grows taller. When each block holds only one data item, the tree degenerates into a linear list.

For example: if each leaf node holds only two data items and you want to store two more, what do you do?

We want the tree to be as low as possible. Since the size of each disk block is fixed, each individual record should be as small as possible. If a leaf's disk block is filled by two records and the records grow bigger, a block may hold only one, and as the data grows the tree gets taller. The more data there is, the more disk blocks are needed and the more levels the tree needs, so each disk block should hold as many data items as possible, which means each item should be small. So when a table has many fields, such as an id field, a name field, and a description field, which one should be indexed? The id field, because id is numeric and takes the least space.
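To make the relation h = log_(m+1)(N) concrete, here is a minimal sketch. It assumes a 4 KB page and ignores the space taken by pointers and page headers, so the numbers are illustrative rather than what a real storage engine would report; tree_height and PAGE_SIZE are names made up for this example.

```python
PAGE_SIZE = 4 * 1024  # assumed page size in bytes (the text mentions 4 KB or 8 KB)


def tree_height(total_items: int, items_per_block: int) -> int:
    """Smallest h such that (items_per_block + 1) ** h >= total_items."""
    fanout = items_per_block + 1
    height, capacity = 0, 1
    while capacity < total_items:
        capacity *= fanout
        height += 1
    return height


rows = 10_000_000
for label, item_size in (("4-byte int key", 4), ("200-byte string key", 200)):
    items = PAGE_SIZE // item_size  # m = block size / item size
    print(label, "-> m =", items, ", height =", tree_height(rows, items))
```

Integer multiplication is used instead of math.log to avoid floating-point rounding at exact powers. With these assumptions the small key needs 3 levels for ten million rows and the large key needs 6, which is the point of keeping index fields small.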

  • Leftmost-prefix matching

Simply put, matching starts from the leftmost column and proceeds to the right. It is enough to know this for now; the details follow. When the data items of a B+ tree are composite, such as (name, age, sex), the B+ tree builds the search tree by comparing from left to right. When a search such as (Zhang, 20, F) arrives, the B+ tree compares name first to decide the next search direction; if name is equal it compares age and then sex in turn, and finally finds the data. But when a search such as (20, F) arrives with no name, the B+ tree does not know which node to visit next, because name was the first comparison key when the search tree was built; it must search by name first to know where to go. When a search such as (Zhang, F) arrives, the B+ tree can use name to choose the search direction, but the next field, age, is missing, so it can only find all entries whose name is Zhang and then filter those for sex = F. This very important property is the leftmost-prefix matching property of an index.
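The three searches just described can be imitated with a sorted list of tuples, which orders entries the same way a composite (name, age, sex) index does. The rows below are invented for illustration, and the functions are a sketch of the idea, not of how MySQL is implemented.

```python
import bisect

# Hypothetical entries of a composite index on (name, age, sex),
# kept sorted the way B+ tree leaves are.
INDEX = sorted([
    ("Li", 22, "M"), ("Wang", 20, "F"), ("Zhang", 20, "F"),
    ("Zhang", 20, "M"), ("Zhang", 31, "F"), ("Zhao", 20, "F"),
])


def search_with_name(name, sex=None):
    """name is known: binary search jumps to the name range, then filters."""
    start = bisect.bisect_left(INDEX, (name,))
    examined, matches = 0, []
    for entry in INDEX[start:]:
        if entry[0] != name:
            break  # left the range of equal names, stop reading
        examined += 1
        if sex is None or entry[2] == sex:
            matches.append(entry)
    return matches, examined


def search_without_name(age, sex):
    """name is missing: the sort order is useless, every entry is examined."""
    matches = [entry for entry in INDEX if entry[1] == age and entry[2] == sex]
    return matches, len(INDEX)


print(search_with_name("Zhang", sex="F"))  # examines only the 3 Zhang entries
print(search_without_name(20, "F"))        # examines all 6 entries
```

The (Zhang, F) search touches only the entries whose name is Zhang and filters them by sex, while the (20, F) search has to read the whole list.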

Clustered index and secondary indexes

In a database, a B+ tree is generally 2 to 4 levels high, which means that finding the row for a given key takes at most 2 to 4 IOs. That is quite good: an ordinary mechanical hard drive can do at least 100 IOs per second, so 2 to 4 IOs take only 0.02 to 0.04 seconds.

B+ tree indexes in a database are divided into clustered indexes and secondary indexes.

What the clustered index and secondary indexes have in common: both are B+ trees internally, that is, height-balanced, with the data stored in the leaf nodes.

How they differ: whether the leaf nodes store the entire row.

Clustered index

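# InnoDB tables are index-organized: the rows of a table are stored in primary key order. The clustered index is a B+ tree built on the primary key of each table, and its leaf nodes hold the full row records of the table, so the leaf nodes of the clustered index are also called data pages. Because of this, in an index-organized table the data is itself part of the index. As in the B+ tree data structure, the data pages are linked to each other by a doubly linked list.

# If no primary key is defined, MySQL takes the first unique index that contains only NOT NULL columns as the primary key, and InnoDB uses it as the clustered index.

# If there is no such index, InnoDB generates a hidden 6-byte ID value of its own and uses it as the clustered index.

# Because the actual data pages can be sorted by only one B+ tree, each table can have only one clustered index. In most cases the query optimizer prefers the clustered index, because the data can be found directly in the leaf nodes of its B+ tree. In addition, because it defines the logical order of the data, the clustered index is especially fast for range queries.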

Clustered index benefits
  • Sorted lookups and range lookups on the primary key are very fast, because the leaf nodes hold exactly the data the user wants. For example, to query the last 10 users in a user table, the engine can quickly find the last data page, because the leaf pages of the B+ tree index are linked in a doubly linked list, and take the 10 records from it

  • Range queries: to find rows whose primary key falls within a range, the page range can be obtained from the intermediate nodes above the leaves, and the data pages can then be read directly

Secondary indexes

When the where clause of a query filters on a field other than id, for example where name = 'xx', the primary key index cannot help. In that case we add a secondary index on name.

All indexes on a table other than the clustered index are secondary indexes (also called non-clustered indexes), for example unique keys and index keys. The difference from the clustered index is that the leaf nodes of a secondary index do not contain the full row record.

A leaf node stores the indexed value together with the corresponding primary key value; the row itself is then found in the clustered index through that primary key.

That is, we find the primary key value in the leaves of the secondary index, then use it to locate the row in the clustered index. This second lookup is called going back to the table.
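The two-step lookup can be pictured with two dictionaries, one playing the clustered index and one playing a secondary index on name. The table contents and the helper find_by_name are hypothetical; real indexes are B+ trees, not hash maps, and the sketch only shows which structure is consulted at each step.

```python
# Clustered index: primary key -> full row (hypothetical data).
clustered = {
    1: {"id": 1, "name": "alex", "email": "alex@example.com"},
    2: {"id": 2, "name": "egon", "email": "egon@example.com"},
    3: {"id": 3, "name": "alex", "email": "alex2@example.com"},
}

# Secondary index on name: indexed value -> primary keys, no other columns.
secondary_name = {}
for primary_key, row in clustered.items():
    secondary_name.setdefault(row["name"], []).append(primary_key)


def find_by_name(name):
    primary_keys = secondary_name.get(name, [])    # step 1: secondary index
    return [clustered[pk] for pk in primary_keys]  # step 2: back to the table


for row in find_by_name("alex"):
    print(row["id"], row["email"])
```

Every matching name costs one extra lookup in the clustered index, which is why a query answered by the primary key alone is cheaper.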

MySQL Index Management

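Purpose

Indexes speed up finding the desired data.

Commonly used MySQL indexes
Ordinary index INDEX: speeds up lookups

Unique indexes:
    - Primary key index PRIMARY KEY: speeds up lookups + constraint (not null, no duplicates)
    - Unique index UNIQUE: speeds up lookups + constraint (no duplicates)

Composite indexes:
    - PRIMARY KEY(id,name): composite primary key index
    - UNIQUE(id,name): composite unique index
    - INDEX(id,name): composite ordinary index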
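Index operations
Adding a primary key index:
Note when adding any index: choose fields whose values are small in size and highly selective.
How to add the clustered (primary key) index
At table creation, either inline:
create table t1(
    id int primary key
);
or as a separate clause:
create table t1(
    id int,
    primary key(id)
);

After the table has been created:
alter table table_name add primary key(id);
Dropping the primary key index:
alter table table_name drop primary key;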


Unique index:
At table creation, either inline:
create table t1(
    id int unique
);
or as a named key:
create table t1(
    id int,
    unique key uni_name (id)
);

After the table has been created:
alter table s1 add unique key u_name(id);
Dropping it:
alter table s1 drop index u_name;

Ordinary index:
At table creation:
create table t1(
    id int,
    index index_name(id)
);
After the table has been created, either of:
alter table s1 add index index_name(id);
create index index_name on s1(id);

Dropping it, either of:
alter table s1 drop index index_name;
DROP INDEX index_name ON table_name;
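Index usage example
For example, suppose you are building a membership card system for a shopping mall.

The system has a member table
with the following fields:
member number INT
member name VARCHAR(10)
member ID card number VARCHAR(18)
member phone VARCHAR(10)
member address VARCHAR(50)
member remarks TEXT

The member number is the primary key, so use PRIMARY KEY
If the member name needs an index, use an ordinary INDEX
If the member ID card number needs an index, UNIQUE is a good choice (unique, no duplicates allowed)

# There is also the full-text index, FULLTEXT
If the member remarks need an index, full-text search is an option.
It works best when searching through a long article.
For short text of only a line or two, an ordinary INDEX is also fine.
In practice, though, we do not use MySQL's built-in full-text index for full-text search; we choose third-party software such as Sphinx that is dedicated to it.

# Others, such as the spatial index SPATIAL, are almost never used; it is enough to know they exist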
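Two index types: hash and btree
# When creating the indexes above we can specify an index type; there are two kinds
hash index: fast single-row lookups, slow range queries
btree index: a B+ tree; the amount of data it holds grows exponentially with the number of levels (this is the one we use, because InnoDB supports it by default)

# Different storage engines support different index types
InnoDB: supports transactions and row-level locking; supports B-tree and Full-text indexes; does not support Hash indexes;
MyISAM: no transactions, table-level locking; supports B-tree and Full-text indexes; does not support Hash indexes;
Memory: no transactions, table-level locking; supports B-tree and Hash indexes; does not support Full-text indexes;
NDB: supports transactions and row-level locking; supports Hash indexes; does not support B-tree or Full-text indexes;
Archive: no transactions, table-level locking; does not support B-tree, Hash, or Full-text indexes;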
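The difference between the two index types can be felt with plain Python containers: a dict stands in for a hash index and a sorted list with bisect stands in for a B+ tree. This is an analogy under that assumption, with made-up values, not a description of the storage engines themselves.

```python
import bisect

ages = [18, 21, 25, 30, 34, 41, 52]  # hypothetical indexed values

hash_index = {age: position for position, age in enumerate(ages)}
sorted_index = sorted(ages)


def hash_point(value):
    """Single-value lookup: one hash probe."""
    return value in hash_index


def hash_range(low, high):
    """A hash has no order, so a range query must test every key."""
    return sorted(value for value in hash_index if low <= value <= high)


def btree_range(low, high):
    """Ordered structure: two binary searches bound the range, then one slice."""
    start = bisect.bisect_left(sorted_index, low)
    end = bisect.bisect_right(sorted_index, high)
    return sorted_index[start:end]


print(hash_point(30))
print(hash_range(20, 35))
print(btree_range(20, 35))
```

Both range functions return the same values, but the hash version visits every key while the ordered version jumps straight to the boundaries.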
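Syntax for creating and dropping indexes
# Method 1: when creating the table
      CREATE TABLE table_name (
                column1  data_type [integrity constraints…],
                column2  data_type [integrity constraints…],
                [UNIQUE | FULLTEXT | SPATIAL ]   INDEX | KEY
                [index_name]  (column[(length)]  [ASC |DESC])
                );


# Method 2: CREATE INDEX on an existing table
        CREATE  [UNIQUE | FULLTEXT | SPATIAL ]  INDEX  index_name
                     ON table_name (column[(length)]  [ASC |DESC]) ;


# Method 3: ALTER TABLE on an existing table
        ALTER TABLE table_name ADD  [UNIQUE | FULLTEXT | SPATIAL ] INDEX
                             index_name (column[(length)]  [ASC |DESC]) ;

# Dropping an index: DROP INDEX index_name ON table_name;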

Examples

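# Method 1: when creating the table
create table t1(
    id int,
    name char,
    age int,
    sex enum('male','female'),
    unique key uni_id(id),
    index ix_name(name) # written as index, without the word key
);


# Method 2: create index
create index ix_age on t1(age);

# Method 3: alter table
alter table t1 add index ix_sex(sex);

# Inspect the result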
mysql> show create table t1;
| t1    | CREATE TABLE `t1` (
  `id` int(11) DEFAULT NULL,
  `name` char(1) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `sex` enum('male','female') DEFAULT NULL,
  UNIQUE KEY `uni_id` (`id`),
  KEY `ix_name` (`name`),
  KEY `ix_age` (`age`),
  KEY `ix_sex` (`sex`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Testing Index

For more details, see: https://www.cnblogs.com/clschao/articles/10049133.html#top

Using indexes correctly

Range conditions

The condition is not exact, that is, one of these operators or keywords appears in it: >, >=, <, <=, !=, between ... and ..., like

The larger the range, the slower the query.

like '%al' can be slow; like 'al%' is much faster, because a leading wildcard leaves the index no prefix to search on.

Selectivity

Index fields that have few duplicate values. When a field has many duplicate values, the lookup has to check the matching entries one by one, which seriously slows the query.

An index tree built on a low-selectivity field will be tall.

Order of equality conditions
  • = and in can appear in any order. For example, a = 1 and b = 2 and c = 3 can use an (a, b, c) index with the conditions written in any order; the MySQL query optimizer rearranges them into a form the index can use
Computation on indexed columns
  • An indexed column must not take part in a computation; keep the column "clean". For example, from_unixtime(create_time) = '2014-05-29' cannot use the index. The reason is simple: the B+ tree stores the field values as they are in the table, so to evaluate this condition the function would have to be applied to every element before comparing, which is obviously too expensive. The statement should be written create_time = unix_timestamp('2014-05-29') instead
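Selectivity can be put into a number: distinct values divided by total rows, the same ratio that count(distinct col) / count(*) gives in SQL. The helper below is a small sketch with invented sample columns.

```python
def selectivity(values):
    """Distinct values / total values; closer to 1.0 is a better index candidate."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


sex_column = ["male", "female"] * 500  # hypothetical: 1000 rows, 2 distinct values
id_column = list(range(1000))          # hypothetical: 1000 rows, all distinct

print(selectivity(sex_column))  # 0.002
print(selectivity(id_column))   # 1.0
```

A column like sex scores close to 0, which is why it appears later in these notes as an example of a field not worth indexing.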
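The cost difference between the two ways of writing the create_time condition can be imitated in Python. The functions unix_timestamp and from_unixtime below are UTC stand-ins written for this sketch, not MySQL's own functions, and the stored values are invented.

```python
import bisect
from datetime import datetime, timezone


def unix_timestamp(text):
    """UTC stand-in for converting a 'YYYY-MM-DD' string to epoch seconds."""
    moment = datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


def from_unixtime(seconds):
    """UTC stand-in for converting epoch seconds to a datetime string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


midnight = unix_timestamp("2014-05-29")
# Hypothetical indexed create_time values, stored sorted as in a B+ tree.
create_times = [midnight - 3600, midnight, midnight + 60, midnight + 86400]


def match_with_function(target):
    """Function wraps the column: it must run once for every stored value."""
    calls, hits = 0, []
    for value in create_times:
        calls += 1
        if from_unixtime(value) == target:
            hits.append(value)
    return hits, calls


def match_with_constant(target_seconds):
    """Column left clean: the constant is converted once, then binary search."""
    start = bisect.bisect_left(create_times, target_seconds)
    end = bisect.bisect_right(create_times, target_seconds)
    return create_times[start:end]


print(match_with_function("2014-05-29 00:00:00"))
print(match_with_constant(unix_timestamp("2014-05-29")))
```

Both calls find the same row, but the first one calls the conversion once per stored value while the second converts a single constant and lets the sorted order do the rest.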

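AND and OR
# 1. The logic of and and or
    condition1 and condition2: the result is true only if all conditions are true; if any one of them is false the result is false
    condition1 or condition2: the result is true as long as one condition is true

# 2. How and works
    Conditions:
        a = 10 and b = 'xxx' and c > 3 and d =4
    Index:
        a composite index (d,a,b,c)
    How it works:  # If you were searching yourself, you would probably compare the conditions one by one from left to right. But you cannot be sure that field a has an index, and even if it does, the index is not guaranteed to be hit (hitting an index means the index is actually used). MySQL is smarter than that; here is how it searches:
        The essence of an index is to keep narrowing the search range and only then process what is left. For a chain of and conditions, MySQL follows the composite index from left to right and picks a highly selective indexed field (so that it can quickly lock onto a small range) to speed up the query, that is, in the order d->a->b->c

# 3. How or works
    Conditions:
        a = 10 or b = 'xxx' or c > 3 or d =4
    Index:
        a composite index (d,a,b,c)

    How it works:
        One match is enough, so for a chain of or conditions MySQL evaluates them in the order written, from left to right, that is, a->b->c->d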
Range conditions and composite indexes

(See Section VIII.) The leftmost-prefix matching principle is very important: for a composite index, MySQL keeps matching to the right until it meets a range condition (>, <, between, like), where it stops matching (a range here means a wide range; with a wide range the index is slow as well). For example, for a = 1 and b = 2 and c > 3 and d = 4, with an index in the order (a, b, c, d) the d column cannot use the index; with an index (a, b, d, c) all four columns can use it, and the order of a, b, d in the index can be adjusted freely.
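The stop-at-the-first-range rule can be written as a small function. This is a simplified model of the rule as stated above, not of the real optimizer, and usable_index_columns is a name invented for the sketch.

```python
def usable_index_columns(index_columns, equality_columns, range_columns):
    """Walk the index left to right; stop after the first range column
    or at the first column that has no condition at all."""
    used = []
    for column in index_columns:
        if column in equality_columns:
            used.append(column)
        elif column in range_columns:
            used.append(column)  # the range column itself is used, then matching stops
            break
        else:
            break
    return used


# Conditions from the text: a = 1 and b = 2 and c > 3 and d = 4
equality, ranged = {"a", "b", "d"}, {"c"}
print(usable_index_columns(("a", "b", "c", "d"), equality, ranged))  # ['a', 'b', 'c']
print(usable_index_columns(("a", "b", "d", "c"), equality, ranged))  # ['a', 'b', 'd', 'c']
```

With (a, b, c, d) the walk stops at c and never reaches d; with (a, b, d, c) the range column comes last, so all four columns are used.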

   

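Other problems
- Using a function
    select * from tb1 where reverse(email) = 'egon';

- Type mismatch
    If the column is a string type, the value in the condition must be quoted, otherwise...
    select * from tb1 where email = 999;

# If the sort condition is an indexed column, the selected fields must also be indexed fields, otherwise the index is not hit
- order by
    select name from s1 order by email desc;
    When sorting by an indexed column, the query is still slow if the selected field is not indexed
    select email from s1 order by email desc;
    Special case: sorting by the primary key is still fast:
        select * from tb1 order by nid desc;

- Leftmost prefix of a composite index
    If the composite index is (name,email)
    name and email       -- hits the index
    name                 -- hits the index
    email                -- does not hit the index


- Using count(1) or count(column) instead of count(*) no longer makes a difference in MySQL

- create index xxxx  on tb(title(19)) # for a text column a length must be specified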

Other Considerations

  • Avoid select *
  • Use count(1) or count(column) instead of count(*)
  • Prefer char to varchar when creating a table
  • Put fixed-length fields first in the field order of a table
  • Use a composite index instead of several single-column indexes (when several conditions are often queried together)
  • Use short indexes where possible
  • Use joins (JOIN) instead of subqueries (Sub-Queries)
  • When joining tables, make sure the columns in the join condition have the same type
  • Fields with many repeated values are not suitable for indexing, for example sex

Composite indexes and covering indexes

Composite index

For more details, please see: https://www.cnblogs.com/clschao/articles/10049133.html#top

Composite indexes:
    - PRIMARY KEY(id,name): composite primary key index
    - UNIQUE(id,name): composite unique index
    - INDEX(id,name): composite ordinary index

One principle for building a composite index: indexes follow the leftmost-match principle, so put the most selective column on the left and the rest in descending order of selectivity, and put the columns used in range conditions as far to the right as possible.

Points to note (personal understanding)

  • If the first column of the composite index is not used in the condition, the later columns cannot be looked up through the index
  • If the second or a later column is used in a range condition, the columns after it cannot be looked up through the index
Covering index

The InnoDB storage engine supports covering indexes (also called index covering): the queried columns can be read from a secondary index, without looking up the record in the clustered index.

One advantage of a covering index: a secondary index does not contain the full row, so it is much smaller than the clustered index, which reduces the number of IO operations.

Note: covering-index support was first completed and implemented in the InnoDB Plugin, which means that for InnoDB versions below 1.0, or MySQL versions 5.0 and below, the InnoDB storage engine does not support covering indexes.

For an InnoDB secondary index, since it contains the primary key information, its leaf nodes store (primary key1, primary key2, ..., key1, key2, ...). A query that needs only these columns can be answered from the secondary index alone; any other column requires going back to the table.

Another benefit of covering indexes applies to some statistical queries.

For such a query, InnoDB does not choose the clustered index to compute the statistic. Because the buy_log table has secondary indexes, and a secondary index is much smaller than the clustered index, choosing a secondary index reduces IO operations, so the optimizer chooses the secondary index whose key is userid.
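A covering index can be illustrated by counting how often a query has to visit the clustered index. The data and the run_query helper are hypothetical; the sketch assumes a secondary index on name whose leaf entries hold only (name, id).

```python
# Clustered index: primary key -> full row (hypothetical data).
clustered = {
    1: {"id": 1, "name": "alex", "email": "alex@example.com"},
    2: {"id": 2, "name": "egon", "email": "egon@example.com"},
}
# Secondary index on name: leaf entries hold the name and the primary key only.
secondary_name = {"alex": [1], "egon": [2]}


def run_query(name, wanted_columns):
    """Return (rows, number of clustered-index lookups) for where name = ..."""
    clustered_lookups = 0
    rows = []
    for primary_key in secondary_name.get(name, []):
        in_index = {"name": name, "id": primary_key}
        if set(wanted_columns) <= set(in_index):
            # Covered: every wanted column is already in the secondary index.
            rows.append({column: in_index[column] for column in wanted_columns})
        else:
            # Not covered: go back to the table for the missing columns.
            clustered_lookups += 1
            full_row = clustered[primary_key]
            rows.append({column: full_row[column] for column in wanted_columns})
    return rows, clustered_lookups


print(run_query("alex", ["id", "name"]))   # 0 clustered lookups
print(run_query("alex", ["id", "email"]))  # 1 clustered lookup
```

Selecting only id and name is answered entirely from the secondary index, while asking for email forces one trip to the clustered index per matching row.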

Specific details, please see: https://www.cnblogs.com/clschao/articles/10049133.html#top

The query optimization tool: explain

Most readers are probably familiar with the explain command; for its usage and the meaning of its fields, see explain-output on the official website. What needs emphasis here is that rows is the core indicator: the vast majority of statements with a small rows value run fast (there are exceptions, mentioned below). So optimizing a statement is basically optimizing rows.

If you want to know more about explain, this blog post summarizes it well: http://www.cnblogs.com/yycc/p/7338894.html

Execution plan: have MySQL estimate how it will execute the statement (usually accurate)
    all < index < range < index_merge < ref_or_null < ref < eq_ref < system/const
    id,email

    Slow:
        select * from userinfo3 where name='alex'

        explain select * from userinfo3 where name='alex'
        type: ALL (full table scan)
            select * from userinfo3 limit 1;
    Fast:
        select * from userinfo3 where email='alex'
        type: const (uses the index)
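When comparing execution plans before and after adding an index, the ordering of access types listed above can be turned into a lookup table. The helper below is a convenience sketch built only from that ordering; is_improvement is a made-up name.

```python
# Access types from slowest to fastest, in the order given above.
ORDER = ["all", "index", "range", "index_merge",
         "ref_or_null", "ref", "eq_ref", "system/const"]

RANK = {}
for position, name in enumerate(ORDER):
    for alias in name.split("/"):  # system and const share the top rank
        RANK[alias] = position


def is_improvement(before, after):
    """True if the access type after a change ranks higher than before."""
    return RANK[after.lower()] > RANK[before.lower()]


print(is_improvement("ALL", "const"))  # True
print(is_improvement("ref", "range"))  # False
```

The type column is only one signal; as noted above, the rows estimate is the main thing to watch.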

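Basic steps of slow query optimization

0. Run the query first to see whether it is really slow; remember to set SQL_NO_CACHE
1. Query each table on its own with the where conditions and find the table that returns the fewest records. That is, apply the where clause of the query starting from the table that returns the fewest records, and query each field of that table separately to see which field has the highest selectivity
2. Use explain to view the execution plan and check whether it matches the expectation from step 1 (the query starts from the table with fewer matching records)
3. For SQL statements of the form order by ... limit, let the sorted table be queried first
4. Understand how the business side uses the query
5. When adding indexes, follow the main principles of index creation
6. Observe the result; if it does not meet expectations, go back to step 0 and analyze again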

Create a user and authorization

For more details, please see: https://www.cnblogs.com/clschao/articles/10050473.html

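Data backup

For more details, see: https://www.cnblogs.com/clschao/articles/10263425.html

Backup: mysqldump -uroot -p -B -d database_name > path (for example g:\av\av.sql)
Restore: mysql -uroot -p < path (for example g:\av\av.sql)
Note: -B treats the names that follow as databases, and -d dumps only the table structure without rows; leave out -d to back up the data as well.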
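If the backup is scripted, the command line can be assembled as an argument list. This sketch only builds and (optionally) runs the mysqldump command shown above; the function names, the database name shop, and the output path are placeholders, and running dump_to_file requires mysqldump to be installed and will prompt for the password.

```python
import subprocess


def build_dump_command(user, database, schema_only=False):
    """Argument list for the mysqldump call described above."""
    command = ["mysqldump", f"-u{user}", "-p", "-B"]
    if schema_only:
        command.append("-d")  # structure only, no rows
    command.append(database)
    return command


def dump_to_file(user, database, path, schema_only=False):
    """Run mysqldump and write its output to path (not called in this sketch)."""
    with open(path, "wb") as output:
        subprocess.run(build_dump_command(user, database, schema_only),
                       stdout=output, check=True)


print(" ".join(build_dump_command("root", "shop", schema_only=True)))
# mysqldump -uroot -p -B -d shop
```

Passing the arguments as a list avoids shell quoting problems with paths and database names.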

Locks

For more details, see: https://www.cnblogs.com/clschao/articles/10463743.html

The InnoDB storage engine uses row-level locks by default
MyISAM uses table locks
    Shared lock

 select * from xx where xx=xx for update;  -- exclusive lock

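Transactions

For more details, see: https://www.cnblogs.com/clschao/articles/10463743.html

Atomicity, Consistency, Isolation, Durability

Start a transaction: begin; or start transaction;

commit;  -- commit the transaction

rollback;  -- roll back the transaction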
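The effect of begin, rollback, and commit can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it shows the general transaction behaviour only, not MySQL's locking or isolation levels; the account table and its rows are invented.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the begin / commit / rollback statements below are issued by us.
connection = sqlite3.connect(":memory:", isolation_level=None)
connection.execute("create table account (name text primary key, balance integer)")
connection.execute("insert into account values ('a', 100), ('b', 0)")


def balances():
    return dict(connection.execute("select name, balance from account order by name"))


# A transfer that is abandoned halfway: rollback undoes the first update.
connection.execute("begin")
connection.execute("update account set balance = balance - 50 where name = 'a'")
connection.execute("rollback")
after_rollback = balances()

# A complete transfer: both updates become visible together on commit.
connection.execute("begin")
connection.execute("update account set balance = balance - 50 where name = 'a'")
connection.execute("update account set balance = balance + 50 where name = 'b'")
connection.execute("commit")
after_commit = balances()

print(after_rollback)  # {'a': 100, 'b': 0}
print(after_commit)    # {'a': 50, 'b': 50}
```

The rolled-back transfer leaves both balances untouched, which is atomicity in practice: either both updates happen or neither does.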

Origin www.cnblogs.com/NiceSnake/p/11575091.html