[MySQL] In-depth understanding of the clustered index and secondary indexes in InnoDB (B+ tree index)

Disclaimer: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/wrs120/article/details/91126531

1. B+ tree index classification

1.1 Clustered index

  A clustered index is a B+ tree built on the table's primary key. Its leaf nodes store the complete data rows of the table, which is why the leaf nodes of the clustered index are also called data pages. The data pages are linked to one another by a doubly linked list. Data pages store the complete record of each row, while the non-leaf index pages store only key values and offsets that point to data pages. A table can have only one clustered index.

  1. Data can be found directly in the leaf nodes.
  2. Sorted lookups and range lookups on the primary key are very fast, because the clustered index is logically continuous. For example, to query the last 10 rows, the last data page can be located quickly because the leaf pages of the B+ tree are doubly linked, and the 10 rows are then read from there.
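
  The "last 10 rows" case can be modeled with a toy chain of leaf pages. This is only a sketch: `LeafPage`, `build_leaf_chain`, and `last_n` are hypothetical names, keys stand in for whole rows, and real InnoDB pages are far more involved.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LeafPage:
    """One clustered-index leaf page: keys sorted by primary key."""
    rows: List[int]
    prev: Optional["LeafPage"] = field(default=None, repr=False)
    next: Optional["LeafPage"] = field(default=None, repr=False)


def build_leaf_chain(keys, page_size):
    """Split sorted keys into pages and link neighbours in both directions."""
    keys = sorted(keys)
    pages = [LeafPage(keys[i:i + page_size]) for i in range(0, len(keys), page_size)]
    for left, right in zip(pages, pages[1:]):
        left.next, right.prev = right, left
    return pages


def last_n(last_page, n):
    """Walk backwards from the last leaf page, collecting the n largest keys."""
    out = []
    page = last_page
    while page is not None and len(out) < n:
        for key in reversed(page.rows):
            out.append(key)
            if len(out) == n:
                break
        page = page.prev
    return out


pages = build_leaf_chain(range(1, 26), page_size=4)
print(last_n(pages[-1], 10))
```

  Only the pages at the tail of the chain are touched; the walk never revisits the root or scans from the first page.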

1.2 Secondary indexes

  Also known as a non-clustered index. A B+ tree is built on the indexed column(s) of the table, and its leaf nodes do not contain the complete data rows. Each leaf entry contains the key value and a bookmark, which tells the InnoDB storage engine where to find the row that corresponds to the index entry. In InnoDB the bookmark is the clustered index key of that row. A table can have multiple secondary indexes.

  If a query finds its data through a secondary index, the lookup proceeds as follows: first the secondary index is traversed down to the leaf node, where the bookmark gives the primary key; then the primary key (clustered) index is traversed to find the page that holds the complete row. Note that such a query does not take just one IO. For example, if the secondary index tree has a height of 3 and the clustered index tree has a height of 2, a lookup through the secondary index needs 3 + 2 logical IOs to reach the final data page.
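
  The two-step lookup and its 3 + 2 page reads can be sketched as follows. This is a simplified model, not InnoDB code: the dictionaries, the function name, and the sample row are all hypothetical, and each tree level is assumed to cost exactly one logical read.

```python
def lookup_via_secondary(secondary, clustered, key, secondary_height, clustered_height):
    """Return (row, logical_page_reads) for a lookup through a secondary index.

    secondary: maps indexed key -> primary key (the "bookmark")
    clustered: maps primary key -> full row
    """
    reads = secondary_height          # root-to-leaf walk of the secondary index
    primary_key = secondary.get(key)
    if primary_key is None:
        return None, reads            # no bookmark, so no clustered index walk
    reads += clustered_height         # root-to-leaf walk of the clustered index
    return clustered[primary_key], reads


# Hypothetical data: a secondary index on "name", primary key "id".
secondary = {"bob": 7}
clustered = {7: {"id": 7, "name": "bob", "city": "Oslo"}}

row, reads = lookup_via_secondary(secondary, clustered, "bob", 3, 2)
print(row, reads)
```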


2. Usage

2.1 Joint index

  A joint index is also a B+ tree; the difference is that each entry has more than one key column. In a joint index (a, b), b is ordered only relative to a, that is, within equal values of a. The keys can be thought of as two-dimensional data: for a key such as (3, 5), entries smaller than (3, 5) lie to its left and entries greater than or equal to (3, 5) lie to its right. The table is created with the following statements:

CREATE TABLE buy_log (
	userid INT UNSIGNED NOT NULL,
	num INT,
	buy_date DATE
);
-- Create two indexes: userid and userid_2
ALTER TABLE buy_log ADD KEY (userid);                 -- index name: userid
ALTER TABLE buy_log ADD KEY (userid, num, buy_date);  -- index name: userid_2

  1. select * from buy_log where userid=2; EXPLAIN shows that the userid index is used, because the leaf nodes of that secondary index contain only a single key column, so in theory a page can hold more records.
  2. select * from buy_log where userid=1 and num=2 order by buy_date desc limit 3; EXPLAIN shows that the joint index userid_2 on (userid, num, buy_date) is used, because for fixed userid and num the buy_date values in the joint index are already sorted, so no extra sort on buy_date is needed. If you force the userid index with FORCE INDEX, EXPLAIN shows an extra sort (Using filesort) on buy_date, because buy_date is unordered in the userid index.
  3. select * from buy_log where userid=1 order by buy_date desc limit 3; EXPLAIN shows that the userid index is used rather than the joint index, and that a sort is performed, because with only userid fixed the buy_date values in the joint index are unordered (num sits between the two columns).
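
  The ordering argument behind cases 2 and 3 can be checked with plain tuples, which compare column by column much like the keys of a joint index. The sample rows are invented for illustration and the helper names are hypothetical.

```python
from datetime import date

# Hypothetical rows of buy_log as (userid, num, buy_date).
rows = [
    (1, 2, date(2019, 3, 1)),
    (1, 1, date(2019, 5, 1)),
    (1, 2, date(2019, 1, 1)),
    (1, 3, date(2019, 2, 1)),
    (2, 1, date(2019, 4, 1)),
    (1, 2, date(2019, 6, 1)),
]

# Sorting the tuples mimics the entry order of the joint index
# (userid, num, buy_date).
joint_index = sorted(rows)


def dates_in_index_order(entries, userid, num=None):
    """buy_date values, in index order, for the given equality conditions."""
    return [d for (u, n, d) in entries if u == userid and (num is None or n == num)]


def is_sorted(values):
    """True if the values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


fixed_prefix = dates_in_index_order(joint_index, userid=1, num=2)
userid_only = dates_in_index_order(joint_index, userid=1)

print(is_sorted(fixed_prefix))  # userid and num fixed: dates already in order
print(is_sorted(userid_only))   # only userid fixed: dates interleaved by num
```

  With both userid and num fixed, the last three dates can be read straight off the end of the range; with only userid fixed, the dates come out interleaved and would need a separate sort.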

2.2 Covering index

  Also known as index covering: the records a query needs can be obtained from the secondary index alone, without looking up the records in the clustered index.
Using a covering index has two major benefits:

  1. A secondary index does not contain all the information of the entire row, so it is much smaller than the clustered index, which can save a large number of IO operations.
  2. Statistical queries (such as counts) do not have to go through the clustered index; they can be answered from a secondary index, which also reduces IO.

Using the table and indexes created above, the following examples illustrate covering indexes:

  1. select count(*) from buy_log; EXPLAIN shows that the userid secondary index is used, and Extra shows Using index, which indicates a covering index.
  2. select count(*) from buy_log where buy_date >= '2019-01-01' and buy_date < '2019-06-07'; EXPLAIN shows that the joint index (userid, num, buy_date) may be used. Under normal circumstances a condition on buy_date alone cannot use this joint index, because buy_date is not its leading column. But this is a statistical query whose result can be computed entirely from the information in the index, so the optimizer chooses the joint index.
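
  A rough way to reason about whether an index can cover a query is to check that every column the query references is stored in the index. The helper below is a hypothetical sketch, not the optimizer's actual logic; it relies on the fact that InnoDB secondary index entries also carry the primary key columns, and the column "note" is invented for the example.

```python
def is_covered(referenced_columns, index_columns, primary_key_columns=()):
    """True if every referenced column is stored in the secondary index.

    InnoDB secondary index entries also carry the primary key columns,
    so those count as available too.
    """
    available = set(index_columns) | set(primary_key_columns)
    return set(referenced_columns) <= available


userid_2 = ("userid", "num", "buy_date")

# COUNT(*) with no WHERE clause references no columns at all.
print(is_covered([], ("userid",)))
# COUNT(*) filtered on buy_date only needs buy_date.
print(is_covered(["buy_date"], userid_2))
# A query that also reads a hypothetical column "note" is not covered.
print(is_covered(["userid", "num", "buy_date", "note"], userid_2))
```

  Being covered only means the index is usable without bookmark lookups; whether the optimizer actually picks it still depends on its cost estimates.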

2.3 The optimizer chooses not to use the index: full table scan

  That is, the data is not found through a secondary index but by scanning the clustered index, i.e. a full table scan. This typically happens with range lookups, JOIN operations, and so on. The following query on an order details table illustrates it:

select * from orderdetails where orderid > 10000 and orderid<102000;

  That is, find the rows whose order number is greater than 10000 and less than 102000. The table has a joint index (OrderID, ProductID) and a single-column index (OrderID), so it would seem obvious that the statement above finds its data through the (OrderID) index. But EXPLAIN shows that the optimizer did not choose the OrderID index; it chose the clustered index, i.e. a full table scan. Why? Because the query selects entire rows, and the secondary index cannot cover the requested information: after the OrderID index locates the matching entries, a bookmark lookup is still needed for each one to fetch the whole row. Although the entries in the OrderID index are stored in order, the rows reached by the bookmark lookups are not, so the lookups become discrete (random) reads on disk. If only a small amount of data is accessed, the optimizer still chooses the secondary index; but when the accessed data makes up a sizable share of the table (typically around 20%), the optimizer chooses the clustered index, because sequential reads are much faster than discrete reads.
  So when the index cannot cover the query, the optimizer chooses the secondary index only if the amount of data found through it is small. However, if the disk is a solid-state disk, where random reads are fast, and you are confident that the secondary index gives better performance, you can use the FORCE INDEX keyword to force the use of an index:

select * from orderdetails FORCE INDEX(OrderID) where orderid > 10000 and orderid<102000;

  I ran into this problem before: a query filtered on a time field that was indexed, and at run time it sometimes hit the index and sometimes did not. I now understand that the misses happened when the amount of matching data was too large. Forcing the index helped in that case: the table had 300,000 rows, the query took about 3 seconds without the index, and only about 1 second with the forced index.
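
  The trade-off between a sequential full scan and random bookmark lookups can be made concrete with a toy cost comparison. This is not MySQL's cost model: the function name, the cost parameters, and their default values are all assumptions chosen for illustration, and every matching row is pessimistically assumed to need one random page read.

```python
def cheaper_plan(total_pages, matching_rows, seq_page_cost=1.0, random_page_cost=4.0):
    """Compare a full clustered-index scan with secondary index + bookmark lookups.

    All costs are made-up relative units for illustration only.
    """
    full_scan_cost = total_pages * seq_page_cost
    # Pessimistic assumption: every matching row needs one random page read.
    bookmark_cost = matching_rows * random_page_cost
    return "secondary index" if bookmark_cost < full_scan_cost else "full scan"


# Hypothetical table: 10,000 pages holding 1,000,000 rows.
print(cheaper_plan(total_pages=10_000, matching_rows=100))
print(cheaper_plan(total_pages=10_000, matching_rows=200_000))
# Cheaper random reads (e.g. an SSD) move the break-even point.
print(cheaper_plan(total_pages=10_000, matching_rows=5_000))
print(cheaper_plan(total_pages=10_000, matching_rows=5_000, random_page_cost=1.0))
```

  In this model, lowering the assumed random read cost is what makes the secondary index win for a mid-sized result, which is the situation where FORCE INDEX may be worth testing.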


For more on MySQL indexing problems, see https://blog.csdn.net/wrs120/article/details/80711800
