MySQL indexes (InnoDB storage engine)

Common index models

Indexes exist to make queries faster, but an index can be implemented in many ways, which is why the notion of an index model is introduced here. Many data structures can speed up reads and writes; three common and simple ones are covered below: the hash table, the ordered array, and the search tree.

  • Hash table 

       A hash table stores data as key-value pairs: given the key to search for, it returns the corresponding value. The idea is simple. Values are kept in an array, and a hash function converts each key into a position in that array, where the value is stored. Inevitably, the hash function will map some different keys to the same position (a collision). One way to handle this is to hang a linked list off that position (chaining).

Time complexity: The average time complexity of query/insert/modify/delete is O(1).

A hash table suits scenarios that only involve equality queries. For operations that depend on order, such as grouping (group by), sorting (order by), and comparisons (<, >), it cannot help, and the time complexity degenerates to O(n). (InnoDB does not let users create hash indexes; instead, the engine builds and uses adaptive hash indexes on its own.)
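The behaviour described above can be sketched with a toy hash table. This is only an illustration of chaining, not how any database implements hash indexes; the class name `ChainedHashTable` and the sample keys are made up for the example.

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions with per-bucket lists (chaining)."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash function maps a key to a position in the array.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # a collision just extends the chain

    def get(self, key):
        # Equality lookup: only one bucket is examined.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def range_query(self, low, high):
        # Keys are not stored in order, so every bucket must be scanned: O(n).
        hits = [(k, v) for bucket in self.buckets for k, v in bucket if low <= k <= high]
        return sorted(hits)


table = ChainedHashTable(num_buckets=4)
for user_id in (3, 7, 11, 20):  # 3, 7 and 11 collide when there are 4 buckets
    table.put(user_id, f"user{user_id}")

print(table.get(7))
print(table.range_query(5, 15))
```

The equality lookup touches a single bucket, while the range query has no choice but to visit all of them.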

  • Ordered array

     An ordered array keeps its elements sorted in a given order. In other words, the index is an array in ascending order.

    Time complexity: a lookup can use binary search, so the time complexity is O(log(N)).

    If you only look at query efficiency, an ordered array is the best data structure. Updates are the problem: inserting a record in the middle means moving every record after it, which is too costly. The ordered array index is therefore only suitable for static storage engines, for example when the data to store is the population information of a certain city in 2017, which will never be modified.
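A minimal sketch of the ordered-array idea, using Python's standard `bisect` module. The helper names `find` and `find_range` and the sample ids are assumptions made for illustration.

```python
import bisect

ids = [10, 20, 30, 40, 50]  # the "index": an array kept in ascending order


def find(sorted_ids, target):
    """Binary search; returns the position of target or -1."""
    pos = bisect.bisect_left(sorted_ids, target)
    if pos < len(sorted_ids) and sorted_ids[pos] == target:
        return pos
    return -1


def find_range(sorted_ids, low, high):
    """Range query: two binary searches, then a contiguous slice."""
    start = bisect.bisect_left(sorted_ids, low)
    end = bisect.bisect_right(sorted_ids, high)
    return sorted_ids[start:end]


print(find(ids, 30))
print(find_range(ids, 15, 40))

# Inserting in the middle keeps the order but shifts every later element.
bisect.insort(ids, 25)
print(ids)
```

Lookups and range scans are cheap, but `insort` has to move the tail of the array, which is the update cost described above.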

  • Search tree

       The simplest is the binary search tree, followed by balanced binary trees, B-trees, B+ trees, and other B-tree variants. B-trees and B+ trees are self-balancing search trees, much like an ordinary balanced binary tree. The difference is that a B-tree allows each node to have many more child nodes. The B-tree is designed for external storage such as disks. It performs well when reading and writing large blocks of data, so it is widely used in file systems and databases.

  Time complexity: for a balanced binary search tree, the time complexity of query/insert/modify/delete is O(log(n)).

 The B+ tree, the structure most commonly used for database indexes, has the following advantages:

  • It suits disk storage well and makes full use of the principle of locality. The idea behind disk read-ahead is that the disk is not read strictly on demand; it is read page by page, one page of data at a time, so that more data is loaded each time and later disk IO is reduced.
  • The tree height is very low, so it can store large amounts of data;
  • The memory occupied by the index itself is very small;
  • It supports single-point queries, range queries, and ordered queries well, because the leaf nodes of the B+ tree are linked together in a list.
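To get a feel for the "very low tree height" point, here is a rough estimate. It assumes a 16 KB page (InnoDB's default) and a hypothetical 16 bytes per key-plus-pointer entry, and it simplifies by treating every level, leaves included, as having the same fan-out.

```python
def btree_height(total_rows, fanout):
    """Smallest number of levels h such that fanout ** h >= total_rows."""
    height, capacity = 1, fanout
    while capacity < total_rows:
        capacity *= fanout
        height += 1
    return height


PAGE_SIZE = 16 * 1024   # InnoDB's default page size
ENTRY_SIZE = 16         # assumed: bytes per key plus child pointer
fanout = PAGE_SIZE // ENTRY_SIZE  # 1024 entries per page under these assumptions

for rows in (1_000, 1_000_000, 1_000_000_000):
    print(rows, btree_height(rows, fanout))

# For comparison, a balanced binary tree has a fan-out of 2.
print(btree_height(1_000_000_000, 2))
```

Under these assumptions a billion entries fit in three levels, while a binary tree would need thirty. Real trees are somewhat taller because leaf pages hold whole rows and pages are not completely full.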

InnoDB's index model

InnoDB uses the B+ tree index model, so the data is stored in the B+ tree. Each index corresponds to a B+ tree in InnoDB.

B+ tree indexes in the database can be divided into the primary key index (also known as the clustered index) and non-primary key indexes (also known as auxiliary, non-clustered, or secondary indexes).

In the B+ tree, the leaf nodes of the primary key index store entire rows of data, while the leaf nodes of a non-primary key index store the value of the primary key.

InnoDB's primary key index and row records are stored together, so it is called a clustered index (Clustered Index):

  • There is no separate area to store row records

  • The leaf nodes of the primary key index store the primary key and the corresponding row record (not a pointer)

Because of this feature, InnoDB tables must have a clustered index:

(1) If the table defines a PK, the PK is the clustered index;

(2) If the table does not define a PK, the first unique index whose columns are all NOT NULL is the clustered index;

(3) Otherwise, InnoDB creates a hidden row-id to serve as the clustered index;

There can be only one clustered index, because the data row can only have one clustered storage on the physical disk.
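The three rules can be read as a simple decision procedure. The sketch below only mirrors the rules as stated; the function name and the way a table is described (a primary key list plus a list of unique indexes) are hypothetical and not part of any MySQL API.

```python
def choose_clustered_index(primary_key, unique_indexes):
    """Apply the three rules in order.

    primary_key: list of column names, or None if the table defines no PK.
    unique_indexes: list of (index_name, columns, all_columns_not_null),
                    in the order they were defined.
    """
    if primary_key:
        return "PRIMARY"
    for name, _columns, all_not_null in unique_indexes:
        if all_not_null:
            return name
    return "hidden row-id"


print(choose_clustered_index(["id"], []))
print(choose_clustered_index(None, [("uni_email", ["email"], False),
                                    ("uni_code", ["code"], True)]))
print(choose_clustered_index(None, []))
```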

A query through a non-clustered index first searches the non-clustered index tree to find the primary key value, then uses that value to fetch the row from the clustered index. This second step, going back to search the primary key index tree, is called going back to the table. In other words, a query based on a non-primary key index has to search one more index tree. Therefore, we should prefer primary key queries in our applications where possible.
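The two-step lookup can be sketched as follows. Plain dictionaries stand in for the two B+ trees, and the table contents and function names are invented for the example; the second return value counts how many index structures were searched.

```python
# Clustered index: primary key -> full row (rows live in the leaf nodes).
clustered = {
    1: {"id": 1, "name": "Ann", "age": 30},
    2: {"id": 2, "name": "Bob", "age": 25},
    3: {"id": 3, "name": "Cid", "age": 41},
}
# Secondary index on name: indexed value -> primary key only.
idx_name = {"Ann": 1, "Bob": 2, "Cid": 3}


def select_by_id(pk):
    """One search: the clustered index already holds the row."""
    return clustered.get(pk), 1


def select_by_name(name):
    """Two searches: secondary index first, then back to the table."""
    pk = idx_name.get(name)
    if pk is None:
        return None, 1
    return clustered[pk], 2


print(select_by_id(2))
print(select_by_name("Bob"))
```

Both calls return the same row, but the lookup by name needs the extra trip through the clustered index.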

Two index properties:

1. The index field should be as small as possible: From the analysis above, the number of IOs depends on the height h of the B+ tree. Suppose the table holds N rows and each disk block holds m data items; then h = log_(m+1)(N). For a fixed N, the larger m is, the smaller h is. Since m = size of a disk block / size of a data item, and the size of a disk block is the size of a data page, which is fixed, smaller data items mean more items per block and a lower tree. This is why each data item, that is, the index field, should be as small as possible: for example, int takes 4 bytes, half of the 8 bytes bigint takes. It is also why the B+ tree keeps the real data in the leaf nodes rather than the inner nodes. If row data were placed in the inner nodes, the number of data items per disk block would drop sharply and the tree would grow taller. In the extreme case where each block holds a single data item, the tree degenerates into a linear list.
2. The leftmost matching feature of the index: When the data item of the B+ tree is a composite structure, such as (name, age, sex), the B+ tree builds the search tree by comparing fields from left to right. When searching for data like (Zhang San, 20, F), the B+ tree compares name first to decide the search direction; if the names are equal, it compares age and then sex, and finally reaches the data. But when the search is (20, F), with no name, the B+ tree does not know which node to visit next, because name is the first comparison factor used to build the tree and the search must start from name to know where to go. When searching for data like (Zhang San, F), the B+ tree can use name to pick the search direction, but the next field, age, is missing, so it can only find all the entries whose name equals Zhang San and then filter for those whose sex is F. This very important property is the leftmost matching feature of the index.
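The leftmost matching rule for the (name, age, sex) example can be written down in a few lines. This sketch only models equality conditions, and the function name `usable_prefix` is made up for illustration.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that equality conditions can use."""
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap ends the match; later columns must be filtered instead
        prefix.append(column)
    return prefix


index = ("name", "age", "sex")
print(usable_prefix(index, {"name", "age", "sex"}))  # all three columns
print(usable_prefix(index, {"name", "sex"}))         # only name
print(usable_prefix(index, {"age", "sex"}))          # nothing: name is missing
```

An empty result corresponds to the (20, F) case, where the tree gives no search direction at all.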

Joint index

A joint index refers to indexing multiple columns on a table. The method of creating a joint index is the same as that of a single index. The only difference is that there are multiple index columns.

mysql> create table t(
    -> a int,
    -> b int,
    -> primary key(a),
    -> key idx_a_b(a,b)
    -> );
Query OK, 0 rows affected (0.11 sec)

So when do you need a joint index? Before discussing that, let's look at the internal structure of a joint index. Essentially, a joint index is also a B+ tree; the difference is that the number of key columns is not 1 but >= 2. Next we discuss a joint index made of two integer columns, named a and b, as shown in the figure

It can be seen that this is no different from the single-key B+ tree we saw before. The key values are all sorted, and all the data can be read out in logical order through the leaf nodes. In the example above, that is (1,1), (1,2), (2,1), (2,4), (3,1), (3,2): the data is stored in the order of (a, b).

Therefore, the query select * from t where a=xxx and b=xxx can obviously use the (a, b) joint index, and the single-column query select * from t where a=xxx can use the (a, b) index as well.

But the query on column b alone, select * from t where b=xxx, cannot use the (a, b) index. The reason is not hard to find: the values of b on the leaf nodes are 1, 2, 1, 4, 1, 2, which are clearly not sorted, so the (a, b) index is not used for a query on column b.

The second advantage of the joint index is that, among entries with the same first key, the second key is already sorted, so a query such as where a=xxx order by b can read the rows in order without an extra sort.
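The same reasoning can be checked on the six entries listed above. A sorted Python list stands in for the leaf level of the (a, b) index; the helper names are illustrative, and `rows_with_a` assumes integer values of a.

```python
import bisect

# Leaf entries of the (a, b) index, in the order given above.
entries = [(1, 1), (1, 2), (2, 1), (2, 4), (3, 1), (3, 2)]


def rows_with_a(a):
    """Binary search on the first column, then read a contiguous run."""
    start = bisect.bisect_left(entries, (a,))
    end = bisect.bisect_left(entries, (a + 1,))
    return entries[start:end]


def rows_with_b(b):
    """No ordering on b alone, so every entry has to be checked."""
    return [entry for entry in entries if entry[1] == b]


print(rows_with_a(2))
print(rows_with_b(1))
print([b for _, b in entries])         # b across the whole index: not sorted
print([b for _, b in rows_with_a(2)])  # b within a single value of a: sorted
```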

Covering index

 The InnoDB storage engine supports covering indexes (also called index coverage): the queried columns can be read from the auxiliary index alone, without looking up the record in the clustered index, so the trip back to the table is avoided.

One advantage of a covering index is that the auxiliary index does not contain the full row, so it is much smaller than the clustered index and reading it requires far fewer IO operations.
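Whether a query is covered comes down to a set check: every selected column must be available in the auxiliary index, whose leaf nodes hold the indexed columns plus the primary key value. The function and the column names below are hypothetical and only illustrate the check.

```python
def is_covered(selected_columns, index_columns, primary_key_columns):
    """True if the auxiliary index alone can answer the query."""
    available = set(index_columns) | set(primary_key_columns)
    return set(selected_columns) <= available


# Hypothetical table: primary key (id), auxiliary index on (name, age).
print(is_covered(["id", "name"], ["name", "age"], ["id"]))
print(is_covered(["id", "name", "address"], ["name", "age"], ["id"]))
```

The first query can be answered from the index alone; the second needs `address`, so it has to go back to the table.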

 

Supplement: The following content is taken from https://www.cnblogs.com/Eva-J/articles/10126413.html#_label8

MySQL commonly used indexes

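Ordinary index INDEX: speeds up lookups

Unique indexes:
    -Primary key index PRIMARY KEY: speeds up lookups + constraint (not null, no duplicates)
    -Unique index UNIQUE: speeds up lookups + constraint (no duplicates)

Joint indexes:
    -PRIMARY KEY(id,name): joint primary key index
    -UNIQUE(id,name): joint unique index
    -INDEX(id,name): joint ordinary index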

Application scenarios of each index

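For example, suppose you are building a membership card system for a shopping mall.

The system has a member table
with the following fields:
member ID INT
member name VARCHAR(10)
member ID card number VARCHAR(18)
member phone VARCHAR(10)
member address VARCHAR(50)
member remarks TEXT

The member ID serves as the primary key, so it uses PRIMARY
If the member name needs an index, it is an ordinary INDEX
If the member ID card number needs an index, UNIQUE can be chosen (unique, duplicates not allowed)

# Besides these there is the full-text index, FULLTEXT
If the member remarks need an index, full-text search can be chosen.
It works best when searching through very long articles.
For short text of only a line or two, an ordinary INDEX is also fine.
In practice, though, for full-text search we do not use MySQL's built-in index; we choose third-party software such as Sphinx, which is dedicated to full-text search.

# Others, such as the spatial index SPATIAL, only need to be known about; they are almost never used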

Syntax for creating/deleting indexes

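# Method 1: when creating the table
      CREATE TABLE table_name (
                column_name1  data_type [integrity constraints…],
                column_name2  data_type [integrity constraints…],
                [UNIQUE | FULLTEXT | SPATIAL ]   INDEX | KEY
                [index_name]  (column_name[(length)]  [ASC |DESC])
                );


# Method 2: CREATE an index on an existing table
        CREATE  [UNIQUE | FULLTEXT | SPATIAL ]  INDEX  index_name
                     ON table_name (column_name[(length)]  [ASC |DESC]) ;


# Method 3: ALTER TABLE to add an index to an existing table
        ALTER TABLE table_name ADD  [UNIQUE | FULLTEXT | SPATIAL ] INDEX
                             index_name (column_name[(length)]  [ASC |DESC]) ;

# Drop an index: DROP INDEX index_name ON table_name;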


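# Method 1
create table t1(
    id int,
    name char,
    age int,
    sex enum('male','female'),
    unique key uni_id(id),
    index ix_name(name) # INDEX is written without the KEY keyword
);
# Alternative to the statement above, with the index name omitted
create table t1(
    id int,
    name char,
    age int,
    sex enum('male','female'),
    unique key uni_id(id),
    index(name) # INDEX is written without the KEY keyword
);


# Method 2
create index ix_age on t1(age);


# Method 3
alter table t1 add index ix_sex(sex);
alter table t1 add index(sex); # alternative form, with the index name omitted

# View the result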
mysql> show create table t1;
| t1    | CREATE TABLE `t1` (
  `id` int(11) DEFAULT NULL,
  `name` char(1) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `sex` enum('male','female') DEFAULT NULL,
  UNIQUE KEY `uni_id` (`id`),
  KEY `ix_name` (`name`),
  KEY `ix_age` (`age`),
  KEY `ix_sex` (`sex`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

 

 

Origin blog.csdn.net/u014608280/article/details/98319722