MySQL index classification, structure, usage scenarios

MySQL Index Classification

1. Primary key index: once a primary key is defined, the database automatically creates an index on it. In InnoDB this index is the clustered index.

Syntax:

Creating the index together with the table:
CREATE TABLE customer (id INT(10) UNSIGNED AUTO_INCREMENT, customer_no VARCHAR(200), customer_name VARCHAR(200), PRIMARY KEY (id)
);
UNSIGNED declares the column as unsigned.
A column that uses AUTO_INCREMENT must be indexed (any index on the column is enough).
CREATE TABLE customer2 (id INT(10) UNSIGNED, customer_no VARCHAR(200), customer_name VARCHAR(200), PRIMARY KEY (id)
);
Adding a primary key index separately (the table must not already have a primary key):
ALTER TABLE customer
 ADD PRIMARY KEY (customer_no);
Dropping the primary key index:
ALTER TABLE customer
 DROP PRIMARY KEY;
Modifying the primary key index:
You must first drop the existing index and then add the new one.
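
As a sketch of the drop-then-add sequence, the statements below move the primary key of customer2 (the table without AUTO_INCREMENT) from id to customer_no. Choosing customer_no as the new key is only an illustration, and it assumes the column holds no NULL or duplicate values.

```sql
-- Step 1: drop the existing primary key on id.
ALTER TABLE customer2
  DROP PRIMARY KEY;

-- Step 2: add the new primary key.
-- Assumption: customer_no has no NULL or duplicate values.
ALTER TABLE customer2
  ADD PRIMARY KEY (customer_no);

-- MySQL also accepts both steps in a single statement:
-- ALTER TABLE customer2 DROP PRIMARY KEY, ADD PRIMARY KEY (customer_no);
```

On the customer table, step 1 alone would be rejected, because its AUTO_INCREMENT column must keep an index.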

2. Single-column index: the index contains only one column. A table can have several single-column indexes.

Creating the index together with the table:
CREATE TABLE customer (id INT(10) UNSIGNED AUTO_INCREMENT, customer_no VARCHAR(200), customer_name VARCHAR(200),
  PRIMARY KEY (id), KEY (customer_name)
);
An index created together with the table and given no explicit name is named after its column (customer_name).
Creating a single-column index separately:
CREATE INDEX idx_customer_name ON customer (customer_name);
Dropping the index:
DROP INDEX idx_customer_name ON customer;

3. Unique index: the values in the indexed column must be unique, but NULL values are allowed.

Creating the index together with the table:
CREATE TABLE customer (id INT(10) UNSIGNED AUTO_INCREMENT, customer_no VARCHAR(200), customer_name VARCHAR(200),
  PRIMARY KEY (id),
  KEY (customer_name), UNIQUE (customer_no)
);
When a unique index is created, all existing values must be unique (NULL excepted). Duplicate data causes an error.
Creating a unique index separately:
CREATE UNIQUE INDEX idx_customer_no ON customer (customer_no);
Dropping the index:
DROP INDEX idx_customer_no ON customer;

4. Composite index: the index contains multiple columns.

Creating the index together with the table:
CREATE TABLE customer (id INT(10) UNSIGNED AUTO_INCREMENT, customer_no VARCHAR(200), customer_name VARCHAR(200),
  PRIMARY KEY (id),
  KEY (customer_name),
  UNIQUE (customer_no),
  KEY (customer_no, customer_name)
);

Creating the index separately:
CREATE INDEX idx_no_name ON customer (customer_no, customer_name);

Dropping the index:
DROP INDEX idx_no_name ON customer;

5. Basic syntax

Create:

ALTER TABLE mytable ADD [UNIQUE] INDEX [indexName] (columnname(length));

Drop:

DROP INDEX [indexName] ON mytable; 

View:

SHOW INDEX FROM table_name\G

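Non_unique: 0 if the index cannot contain duplicates (a unique index), 1 if it can.
Seq_in_index: the column's sequence number within the index, starting from 1. In a composite index (one index covering several columns), the columns are numbered in the order given when the index was created.
Collation: how the column is sorted in the index: A (ascending), D (descending) or NULL (not sorted).
Cardinality: an estimate of the number of unique values in the index.
Sub_part: the number of indexed characters if only a prefix of the column is indexed; NULL if the whole column is indexed.
Packed: how the key is packed; NULL if it is not packed.
Null: YES if the column may contain NULL values.
Comment: information about the index that has no column of its own, such as disabled.
Index_comment: the comment given with the COMMENT attribute when the index was created.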
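
To tie these columns to a concrete case, the sketch below reads the same metadata from information_schema.STATISTICS. It assumes the customer table carries the composite index idx_no_name on (customer_no, customer_name); the schema name mydb is a placeholder.

```sql
-- Same information as SHOW INDEX, but filterable with WHERE.
-- 'mydb' is a placeholder schema name.
SELECT index_name,
       non_unique,     -- 0 = unique, 1 = duplicates allowed
       seq_in_index,   -- position of the column inside the index
       column_name,
       cardinality,
       sub_part,
       nullable
FROM information_schema.STATISTICS
WHERE table_schema = 'mydb'
  AND table_name   = 'customer'
ORDER BY index_name, seq_in_index;
```

For idx_no_name the query returns two rows: customer_no with seq_in_index 1 and customer_name with seq_in_index 2.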

Using the ALTER command:

There are four ways to add an index to a table:
ALTER TABLE tbl_name ADD PRIMARY KEY (column_list): adds a primary key, which means the index values must be unique and cannot be NULL.
ALTER TABLE tbl_name ADD UNIQUE index_name (column_list): creates an index whose values must be unique (except NULL, which may appear several times).
ALTER TABLE tbl_name ADD INDEX index_name (column_list): adds an ordinary index; an index value may occur multiple times.
ALTER TABLE tbl_name ADD FULLTEXT index_name (column_list): creates a FULLTEXT index, used for full-text search.

MySQL Index Structure

1. BTree index (ordinary MyISAM index)

Schematic:

[Overview]
The figure shows a B-tree. Each light-blue block is a disk block, and each disk block contains several data items (dark blue) and pointers (yellow).
For example, disk block 1 contains the data items 17 and 35 and the pointers P1, P2 and P3.
P1 points to the disk block holding values less than 17, P2 to the block holding values between 17 and 35, and P3 to the block holding values greater than 35. The real data lives in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99.
Non-leaf nodes do not store real data. They store only the items that guide the search; 17 and 35, for example, do not actually exist in the table.
[Lookup process]
To find data item 29, disk block 1 is first loaded from disk into memory, which costs one I/O. A binary search in memory determines that 29 lies between 17 and 35, so the P2 pointer of disk block 1 is followed. The time spent in memory is negligible compared with a disk I/O. Disk block 3 is then loaded through the disk address in P2, which is the second I/O. 29 lies between 26 and 30, so the P2 pointer of disk block 3 is followed and disk block 8 is loaded, which is the third I/O. A binary search in memory then finds 29 and the query ends, after three I/Os in total.
In practice, a 3-level B+tree can represent millions of rows. If finding one row among millions takes only three I/Os, the gain is enormous. Without an index, every data item could cost one I/O, millions of I/Os in total, which is obviously far too expensive.
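
The claim that three levels cover millions of rows can be checked with rough arithmetic. The sketch below assumes InnoDB's default 16 KB page, an 8-byte BIGINT key plus a 6-byte child pointer per internal entry, and rows of about 1 KB. The entry and row sizes are illustrative assumptions, and real pages also spend space on headers and free room.

```sql
-- Entries per internal page: 16384 bytes / (8-byte key + 6-byte pointer).
SELECT FLOOR(16384 / (8 + 6)) AS fanout;           -- 1170

-- Rows per leaf page, assuming about 1 KB per row.
SELECT FLOOR(16384 / 1024) AS rows_per_leaf;       -- 16

-- Rows reachable through two internal levels and one leaf level.
SELECT 1170 * 1170 * 16 AS rows_in_three_levels;   -- 21902400
```

Under these assumptions a three-level tree addresses roughly 20 million rows, and each lookup touches one page per level.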

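About time complexity: the same problem can be solved by different algorithms, and the quality of an algorithm affects the efficiency of the algorithm and of the whole program. The purpose of analyzing algorithms is to choose a suitable one and to improve it.

1, N and logN each describe a relationship between the amount of data and the number of lookups.
Constant, 1*c: the fastest case. The number of lookups does not grow as the data grows.
Linear, N: the number of lookups grows in proportion to the amount of data.
Logarithmic, logN: the number of lookups grows with the logarithm of the amount of data. It lies between constant and N.
N*logN: the cost of a combined method, for example a logN lookup repeated N times.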
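
To get a feel for the gap between N and logN, the sketch below compares a linear scan with a binary search over one million sorted values. The figure of one million is an arbitrary example.

```sql
-- Worst-case comparisons needed to find one value among 1,000,000 sorted values.
SELECT 1000000             AS linear_scan_n,
       CEIL(LOG2(1000000)) AS binary_search_log_n;   -- 1000000 vs 20
```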

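2. B+Tree index (ordinary InnoDB index)

Schematic:

In a B+Tree, the data at the second level cannot be fetched directly; it serves only as an index. When memory is limited, a B+Tree answers queries more efficiently than a B-Tree.
In a B-Tree, the data at the second level can be fetched directly. The tree structure is heavier, and it has an advantage only when memory is unlimited.

Differences between the B-tree and the B+tree:

Conclusion: when memory is limited, a B+Tree is the better choice over a B-Tree. With unlimited memory, the B-Tree is more convenient.

 1) In a B-tree, keys and records are stored together; the leaf nodes can be regarded as external nodes that contain no information. In a B+tree, non-leaf nodes hold only keys and pointers to the next node, and records are stored only in the leaf nodes. (A single query may therefore need two I/O operations.)
 2) In a B-tree, the closer a record is to the root, the faster it is found, and finding the key is enough to confirm that the record exists. In a B+tree, every record takes about the same time to find: the search always runs from the root to a leaf, and the key must be compared again inside the leaf. From this angle the B-tree seems to perform better, yet in practice the B+tree does. Because the non-leaf nodes of a B+tree hold no actual data, each node holds more elements than a B-tree node and the tree is shorter, which reduces the number of disk accesses. A B+tree does need more comparisons to find a record, but one disk access costs as much as hundreds or thousands of in-memory comparisons, so the B+tree is likely to be faster in practice. In addition, the leaf nodes of a B+tree are linked by pointers, which makes sequential traversal easy (for example listing all files in a directory or all records in a table). This is why many databases and file systems use B+trees.

Question: why is the B+tree better suited than the B-tree to operating-system file indexes and database indexes in practice?
1) A B+tree has a lower disk read/write cost
  The internal nodes of a B+tree hold no pointers to the detailed information of a key, so they are smaller than the internal nodes of a B-tree. If all keys of one internal node are stored in the same disk block, that block holds more keys, and more of the keys to be searched are read into memory at once. The number of I/O reads and writes falls accordingly.
2) A B+tree has more stable query performance
  Non-leaf nodes do not point to the file content itself; they are only an index over the keys in the leaf nodes. Every key lookup must therefore follow a path from the root to a leaf. All lookups have the same path length, so every record is found with about the same efficiency.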

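3. Clustered and non-clustered indexes

A clustered index is not a separate index type but a way of storing data.
The term "clustered" means that data rows and adjacent key values are stored compactly together.
As the figure below shows, the index on the left is a clustered index, because the rows are laid out on disk in the same order as the index.

Benefits of a clustered index:

  • When a query returns a range of data in clustered-index order, the rows are stored next to each other, so the database does not have to fetch them from many separate data blocks. This saves a great deal of I/O.

Limitations of a clustered index:

  • In MySQL, currently only the InnoDB storage engine supports clustered indexes; MyISAM does not.
  • Data can be physically stored in only one order, so each MySQL table can have only one clustered index. Normally it is the table's primary key.
  • To take full advantage of clustering, the primary key of an InnoDB table should be an ordered, sequential id rather than an unordered one such as a UUID. (See the benefits of a clustered index above.)

This explains why primary keys are usually auto-incremented: 1. the business needs ordering; 2. the clustered index can be used effectively.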
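
The range-scan benefit can be sketched with a hypothetical table. The table name orders_seq and its columns are made up for illustration; only the pattern of an auto-increment primary key queried by range comes from the text above.

```sql
-- Hypothetical table: an auto-increment primary key means new rows are
-- appended at the end of the clustered index.
CREATE TABLE orders_seq (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  order_no   VARCHAR(64) NOT NULL,
  created_at DATETIME NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Range scan on the clustered index: the matching rows sit next to each
-- other in primary-key order.
-- On a populated table, EXPLAIN typically reports type = range and key = PRIMARY.
EXPLAIN
SELECT id, order_no, created_at
FROM orders_seq
WHERE id BETWEEN 1000 AND 2000;
```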

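4. Full-text index

A full-text index (also called full-text search) is a key technology used by search engines today. It uses [word segmentation] and other algorithms to analyze the frequency and importance of keywords in text, and then filters out the search results we want according to a set of rules.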

CREATE TABLE `article` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(200) DEFAULT NULL,
  `content` text,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `title` (`title`,`content`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

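Unlike a query that uses LIKE:
SELECT * FROM article WHERE content LIKE '%search string%';
a full-text index is queried with MATCH ... AGAINST:
SELECT * FROM article WHERE MATCH(title,content) AGAINST ('search string');
This noticeably improves query efficiency.
Limitations:
Before MySQL 5.6.4 only MyISAM supported full-text indexes. InnoDB has supported them since 5.6.4, but the official release had no Chinese word segmentation and needed a third-party segmentation plugin.
Since 5.7 (the built-in ngram parser, added in 5.7.6), Chinese word segmentation is officially supported.
With the arrival of the big-data era, relational databases can no longer keep up with full-text search needs and are gradually being replaced by dedicated search engines such as Solr and Elasticsearch.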
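
For Chinese text on InnoDB, a minimal sketch using the built-in ngram parser might look like the following. It assumes MySQL 5.7.6 or later; the names article_cn and ft_title_content are hypothetical.

```sql
-- Assumes MySQL 5.7.6 or later, where the built-in ngram parser is available.
-- article_cn and ft_title_content are hypothetical names.
CREATE TABLE article_cn (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  title   VARCHAR(200) DEFAULT NULL,
  content TEXT,
  PRIMARY KEY (id),
  FULLTEXT KEY ft_title_content (title, content) WITH PARSER ngram
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- The MATCH column list must be the same as the FULLTEXT index column list.
SELECT id, title
FROM article_cn
WHERE MATCH(title, content) AGAINST ('数据库' IN NATURAL LANGUAGE MODE);
```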

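5. Hash index

Hash indexes are supported only by the Memory and NDB engines, and the Memory engine uses them by default. If several keys produce the same hash value, a hash collision occurs and the colliding index entries are stored as a linked list.
NoSQL stores use this kind of index structure.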
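
A minimal sketch of a hash index on a MEMORY table follows. The table session_cache and the index idx_token are hypothetical names chosen for illustration.

```sql
-- session_cache and idx_token are hypothetical names.
CREATE TABLE session_cache (
  token   CHAR(32) NOT NULL,
  user_id INT UNSIGNED NOT NULL,
  KEY idx_token (token) USING HASH
) ENGINE=MEMORY;

-- An equality lookup can use the hash index.
SELECT user_id FROM session_cache WHERE token = 'abc123';

-- A range condition cannot use it, because a hash index keeps no key order.
SELECT user_id FROM session_cache WHERE token > 'abc';
```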

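6. R-Tree index

R-Tree indexes are rarely used in MySQL. They support only the geometry data types, and the only storage engines that support those types are MyISAM, BDB, InnoDB, NDB and Archive.
Compared with a B-tree, the advantage of an R-tree lies in range searches.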
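
As a sketch of such a range search, the example below builds a SPATIAL index on a POINT column and looks for points inside a rectangle. It assumes MySQL 5.7, where InnoDB supports SPATIAL indexes; on 8.0 the column would also need an SRID attribute for the optimizer to use the index. The names places and idx_location are hypothetical.

```sql
-- places and idx_location are hypothetical names.
-- A spatially indexed column must be declared NOT NULL.
CREATE TABLE places (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name     VARCHAR(100) NOT NULL,
  location POINT NOT NULL,
  PRIMARY KEY (id),
  SPATIAL INDEX idx_location (location)
) ENGINE=InnoDB;

INSERT INTO places (name, location)
VALUES ('a', ST_GeomFromText('POINT(1 1)')),
       ('b', ST_GeomFromText('POINT(5 5)'));

-- Range search: rows whose location falls inside the given rectangle.
SELECT name
FROM places
WHERE MBRContains(
        ST_GeomFromText('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))'),
        location);
```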

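Index Usage Scenarios

1. When to create an index

(1) A primary key automatically gets a unique index

(2) Columns frequently used as query conditions should be indexed (the conditions after WHERE)

(3) Columns that join to other tables in a query; index foreign-key relationships

Table A joins table B: A JOIN B. The join condition after ON is the condition by which table A looks up table B, so indexing the joined column of table B greatly improves query efficiency.
In a join, the table on the left of the JOIN takes each of its values and scans all the related data in table B, which amounts to one lookup per value.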
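
A sketch of this advice, with customer in the role of table A and a hypothetical orders table in the role of table B. The names orders, customer_id and idx_orders_customer_id are assumptions made for the example.

```sql
-- orders, customer_id and idx_orders_customer_id are hypothetical names.
CREATE TABLE orders (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
  customer_id INT UNSIGNED NOT NULL,
  amount      DECIMAL(10, 2) NOT NULL,
  PRIMARY KEY (id)
);

-- Index the column of the joined table that appears in the ON clause.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- For each customer row, MySQL can look up matching orders through the
-- index instead of scanning the whole orders table.
EXPLAIN
SELECT c.customer_name, o.amount
FROM customer AS c
JOIN orders   AS o ON o.customer_id = c.id;
```

The optimizer is free to pick the join order, so the plan on real data may differ; EXPLAIN shows which index each table actually uses.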

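(4) Single-column or composite index: which one? (Under high concurrency, prefer composite indexes)

(5) Columns used for sorting: accessing the sort column through an index greatly speeds up sorting

Indexing the columns after GROUP BY and ORDER BY greatly improves efficiency.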
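
Whether a sort is served by an index can be checked with EXPLAIN. The sketch below assumes the single-column index idx_customer_name on customer (customer_name) exists.

```sql
-- Assumes this index exists:
-- CREATE INDEX idx_customer_name ON customer (customer_name);

-- Without a usable index, the Extra column of EXPLAIN shows "Using filesort".
-- With the index, rows can be read in index order instead.
EXPLAIN
SELECT customer_name
FROM customer
ORDER BY customer_name
LIMIT 10;

-- GROUP BY on the same column can also read the index in order.
EXPLAIN
SELECT customer_name, COUNT(*) AS n
FROM customer
GROUP BY customer_name;
```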

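(6) Columns used for aggregation or grouping in queries

2. When not to create an index

(1) The table has too few rows

(2) Tables that are frequently inserted into, updated and deleted from

Why: an index speeds up queries but slows down changes to the table, such as INSERT, UPDATE and DELETE.
When the table is changed, MySQL must save not only the data but also the index file.

(3) Do not index columns that are never used in WHERE conditions

    Too many indexes hurt the efficiency of inserts, updates and deletes.

(4) Columns whose data is repetitive and evenly distributed. Index only the columns that are queried and sorted most often. Note that if a column contains many repeated values, indexing it has little practical effect.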
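
One common way to judge point (4) is to measure how distinct a column's values are. The query below is a rough sketch of that check on the customer table; the term selectivity and the reading of its value are a rule of thumb, not something MySQL enforces.

```sql
-- Selectivity = distinct values / total rows, between 0 and 1.
-- Values close to 1 suggest a useful index; values close to 0 mean the
-- column holds many repeated values.
SELECT COUNT(DISTINCT customer_name) / COUNT(*) AS name_selectivity,
       COUNT(DISTINCT customer_no)   / COUNT(*) AS no_selectivity
FROM customer;
```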

    

 


Origin www.cnblogs.com/116970u/p/10978649.html