MySQL fast query secrets: understanding the B+ tree index

First, searching without an index

SELECT [column_list] FROM table_name WHERE column_name = xxx;

1. Find within a page

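Suppose the table currently holds so few records that they all fit in one page. The search then falls into two cases, depending on the search condition:

  • Search by primary key: binary search the page directory to locate the slot, then walk the few records in that slot's group to find the record.
  • Search by other columns: there is no page directory for non-primary-key columns, so start from the minimum record and walk the singly linked list, checking each record in turn.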

2. Find in many pages

In most cases a table holds a very large number of records, and many data pages are needed to store them. Finding a record across many pages can be divided into two steps:

  1. Navigate to the page where the record is located.
  2. Find the corresponding record from the page where it is located.

Without an index, whether we search by the primary key column or by any other column, we cannot quickly locate the page that holds the record. We can only start from the first page, follow the doubly linked list page by page, and search each page for the specified record. If the table holds a lot of records, this kind of search is very inefficient.
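The cost of such a scan can be sketched in a few lines of Python. This is only an illustrative model, not InnoDB code: the `Page` class, the page numbers, and the rows are assumptions made up for the example, and only the forward link of the doubly linked list is modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Page:
    page_no: int
    records: list                        # rows as (c1, c2, c3) tuples
    next_page: Optional["Page"] = None   # forward link only

def full_scan(first_page, column, value):
    """Walk every page from the first one and collect matching rows."""
    matches, pages_read = [], 0
    page = first_page
    while page is not None:
        pages_read += 1
        for row in page.records:
            if row[column] == value:
                matches.append(row)
        page = page.next_page
    return matches, pages_read

# Three hypothetical pages chained together.
p3 = Page(9, [(12, 7, 'x'), (20, 2, 'e')])
p2 = Page(28, [(5, 3, 'y'), (8, 7, 'a')], p3)
p1 = Page(10, [(1, 4, 'u'), (3, 9, 'd'), (4, 4, 'a')], p2)

rows, pages_read = full_scan(p1, column=0, value=20)
print(rows, pages_read)   # [(20, 2, 'e')] 3
```

Every page is read even though only one row matches, which is the inefficiency an index is meant to remove.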

Second, index

First create a table:

CREATE TABLE index_demo(
  c1 INT,
  c2 INT,
  c3 CHAR(1),
  PRIMARY KEY(c1)
 ) ROW_FORMAT = Compact;

The row format of this table is as follows:
[Figure: row format of the index_demo table]

record_type: the type of the record. 0 is an ordinary user record, 1 is a directory entry record, 2 is the minimum record, and 3 is the maximum record.

next_record: the address offset of the next record relative to this record. (For ease of understanding, the diagrams below use arrows to show which record comes next.)

Some records placed in a page look like this:
[Figure: records stored in a page]

1. Simple indexing scheme

Because the records in the pages follow no pattern, we cannot tell which pages hold records that match the search condition, so we have to traverse all the data pages in turn.

If we want to quickly locate the data pages that hold the records we are looking for, we can build a directory for the data pages. Building this directory requires two things:

  • The primary key values of the user records in a page must be greater than the primary key values of the user records in the previous page. (Assume that each data page can store at most 3 records.) Start by inserting three records:
INSERT INTO index_demo VALUES(1, 4, 'u'), (3, 9, 'd'), (5, 3, 'y');
Query OK, 3 rows affected (0.01 sec)

[Figure: the three records stored in page 10]
Now let's insert another record:

INSERT INTO index_demo VALUES(4, 4, 'a');

Since page 10 can hold at most 3 records, we have to allocate a new page:
[Figure: page 10 and the newly allocated page 28]

Newly allocated data pages may not have consecutive page numbers, which means the pages we use may not be adjacent in storage. They form a linked list only by each recording the numbers of the previous page and the next page.

The largest primary key value among the user records in page 10 is 5, while the record in page 28 has a primary key value of 4. Because 5 > 4, this violates the requirement that the primary key values of the user records in a page be greater than those in the previous page. Inserting the record with primary key 4 therefore has to be accompanied by a record move: the record with primary key 5 is moved to page 28, and the record with primary key 4 is then inserted into page 10.

This process is called page splitting:
[Figure: pages 10 and 28 after the page split]
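The record move in this example can be mimicked with two sorted lists. This is a minimal sketch under the text's assumption of 3 records per page; it is not InnoDB's actual split algorithm, and the function name `insert_with_split` is made up for illustration.

```python
import bisect

PAGE_CAPACITY = 3  # assumption taken from the example in the text

def insert_with_split(page, new_page, key):
    """Insert key into the sorted `page`; overflow moves to `new_page`.

    Both pages are plain sorted lists of primary keys. Afterwards every
    key in `new_page` is greater than every key in `page`.
    """
    bisect.insort(page, key)
    while len(page) > PAGE_CAPACITY:
        new_page.insert(0, page.pop())   # move the largest key to the next page
    return page, new_page

page_10 = [1, 3, 5]
page_28 = []
insert_with_split(page_10, page_28, 4)
print(page_10, page_28)   # [1, 3, 4] [5]
```

The ordering requirement holds again after the move: every key in page 28 is greater than every key in page 10.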

  • Create a directory entry for every page.
    Because the page numbers of data pages may not be consecutive, after many records have been inserted into the index_demo table the result may look like this:
    [Figure: data pages of index_demo after many inserts]
    To quickly locate a page among so many pages by primary key value, we build a directory for the pages, with one directory entry per page. Each directory entry holds two things: the smallest primary key value among the page's user records, and the page number.
    [Figure: one directory entry per data page]
    We only need to store these directory entries contiguously in physical storage, for example in an array, to quickly find a record by its primary key value.

That completes a simple directory for the data pages. This directory has another name: the index.
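A lookup through such a directory can be sketched with Python's `bisect` module. The directory entries below are illustrative assumptions (smallest key in the page, page number), not values taken from the article's figures.

```python
import bisect

# (smallest primary key in the page, page number), kept sorted by key.
directory = [(1, 10), (5, 28), (12, 9), (209, 20)]
keys = [entry[0] for entry in directory]

def find_page(primary_key):
    """Return the number of the page whose key range covers primary_key."""
    idx = bisect.bisect_right(keys, primary_key) - 1
    idx = max(idx, 0)   # keys below the first entry still map to the first page
    return directory[idx][1]

print(find_page(20))   # 9
```

The binary search works only because the entries sit in one contiguous, sorted array.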

2. Index scheme in InnoDB

The scheme above is called simple because it assumes that all directory entries can be stored contiguously in physical storage, so that binary search can quickly locate a directory entry when searching by primary key value. That assumption causes some problems:

  • InnoDB uses pages as the basic unit for managing storage space, so it can guarantee at most 16KB of contiguous storage. As the number of records in the table grows, storing all the directory entries would need a very large contiguous space, which is impractical for tables with very many records.
  • We often insert and delete records. Suppose we delete all the records in page 28. Page 28 then no longer needs to exist, so directory entry 2 no longer needs to exist either, which means every directory entry after entry 2 has to be moved forward.

Therefore, we can reuse the data pages that stored user records to store directory entries as well. To distinguish them from user records, we call the records that represent directory entries directory entry records.

How does InnoDB tell whether a record is an ordinary user record or a directory entry record?

  • Through the record_type attribute in the record header information:
    0: ordinary user record, 1: directory entry record, 2: minimum record, 3: maximum record
    [Figure: a page of directory entry records pointing to the data pages]

The differences between directory entry records and ordinary user records:

  1. The record_type value of a directory entry record is 1, while that of an ordinary user record is 0.
  2. A directory entry record has only two columns, the primary key value and the page number, whereas the columns of an ordinary user record are defined by the user and may be many, in addition to the hidden columns that InnoDB adds itself.
  3. The record header information has a min_rec_mask attribute. It is 1 only for the directory entry record with the smallest primary key value in a page that stores directory entry records; for all other records it is 0.

Taking the record with primary key value 20 as an example, finding a record by primary key value can therefore be roughly divided into two steps:

  1. First go to the page that stores the directory entry records, page 30 here, and use binary search to quickly locate the corresponding directory entry. Because 12 < 20 < 209, the record is in page 9.
  2. Then go to page 9, which stores user records, and use binary search to quickly locate the user record with primary key value 20.

Q: A directory entry record stores only a primary key value and a page number, so it needs far less space than a user record. But a page is still only 16KB, so the number of directory entry records it can hold is limited. What if the table holds so much data that one page cannot store all the directory entry records?
A: Add another page to store directory entry records.

Suppose a page that stores directory entry records can hold at most 4 of them. Then:
[Figure: two pages storing directory entry records]
Now that more than one page stores directory entry records, finding a user record by primary key value takes roughly 3 steps:

  1. Determine the page that holds the relevant directory entry record.
  2. Use that directory entry record page to determine the page where the user record actually is.
  3. Locate the specific record in the page that stores the user records.

Q: In step 1 we need to locate a page that stores directory entry records, but these pages may not be adjacent in storage either. If the table holds a lot of data, there will be many such pages. How do we quickly locate one of them by primary key value?
A: Generate a higher-level directory for the pages that store directory entry records.
[Figure: a higher-level directory page above the directory entry record pages]
This structure is a B+ tree.

Whether a data page stores user records or directory entry records, we keep it in the B+ tree data structure, so we also call these data pages nodes.

As the figure shows, the actual user records are all stored in the bottom-level nodes of the B+ tree. These nodes are called leaf nodes. The remaining nodes, which store directory entries, are called non-leaf nodes or inner nodes. The topmost node of the B+ tree is called the root node.
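The descent from the root node to a leaf node can be sketched as follows. The page numbers, keys, and rows are hypothetical values chosen for illustration, and each page is reduced to a `(record_type, records)` pair; real InnoDB pages carry far more structure.

```python
import bisect

# page number -> (record_type, sorted records)
# record_type 1: directory entry records (smallest key, child page number)
# record_type 0: user records (c1, c2, c3)
pages = {
    33: (1, [(1, 30), (320, 32)]),                      # root node
    30: (1, [(1, 10), (5, 28), (12, 9), (209, 20)]),
    32: (1, [(320, 31)]),
    10: (0, [(1, 4, 'u'), (3, 9, 'd'), (4, 4, 'a')]),
    28: (0, [(5, 3, 'y'), (8, 7, 'a'), (10, 4, 'o')]),
    9:  (0, [(12, 7, 'f'), (20, 2, 'e'), (100, 9, 'x')]),
    20: (0, [(209, 5, 'b'), (220, 6, 'i')]),
    31: (0, [(320, 1, 'g')]),
}
ROOT = 33

def search(primary_key):
    """Descend from the root to a leaf; return (row or None, pages visited)."""
    page_no, path = ROOT, []
    while True:
        record_type, records = pages[page_no]
        path.append(page_no)
        keys = [r[0] for r in records]
        if record_type == 1:
            # Directory page: follow the last entry whose key <= primary_key.
            idx = max(bisect.bisect_right(keys, primary_key) - 1, 0)
            page_no = records[idx][1]
        else:
            # Leaf page: binary search for an exact match.
            idx = bisect.bisect_left(keys, primary_key)
            found = idx < len(keys) and keys[idx] == primary_key
            return (records[idx] if found else None), path

row, path = search(20)
print(row, path)   # (20, 2, 'e') [33, 30, 9]
```

One page is visited per level, so the number of pages read grows with the height of the tree, not with the number of records.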

  • InnoDB stipulates that the bottom level, the one that stores user records, is level 0, and the level number increases by one for each level above it.
  • Under normal circumstances, the B+ trees we use do not exceed 4 levels.
  • Finding a record by primary key value therefore requires searching at most 4 pages (3 directory entry pages and 1 user record page). And because every page has a Page Directory, the record can also be located quickly inside each page with binary search.
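A back-of-the-envelope calculation shows why 4 levels is usually enough. The two capacities below are assumptions chosen for round numbers, not measured InnoDB values; real capacities depend on row size and key size.

```python
LEAF_CAPACITY = 100       # assumed user records per leaf page
INTERNAL_CAPACITY = 1000  # assumed directory entry records per non-leaf page

def max_records(levels):
    """Upper bound on user records in a B+ tree with the given number of levels."""
    return LEAF_CAPACITY * INTERNAL_CAPACITY ** (levels - 1)

for levels in range(1, 5):
    print(levels, max_records(levels))
# 1 100
# 2 100000
# 3 100000000
# 4 100000000000
```

Under these assumptions, a 4-level tree already holds up to 100 billion records.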

1. Clustered Index

The B+ tree introduced earlier is itself a directory, or an index itself. It has two features:

  1. Records and pages are sorted by primary key value, which means three things:
     • The records in a page form a singly linked list in ascending order of primary key.
     • The pages that store user records form a doubly linked list, ordered by the primary keys of the user records in them.
     • The pages that store directory entry records are divided into levels, and the pages within one level form a doubly linked list, ordered by the primary keys of the directory entry records in them.
  2. The leaf nodes of the B+ tree store complete user records.
     • A complete user record is one that stores the values of all columns (including hidden columns).

A B+ tree with these two properties is called a clustered index. All complete user records are stored in the leaf nodes of this clustered index. We do not need to create it explicitly with an INDEX clause in a MySQL statement; the InnoDB storage engine creates the clustered index for us automatically.

In the InnoDB storage engine, the clustered index is the way the data is stored (all user records are stored in the leaf nodes). This is what is meant by "the index is the data, and the data is the index".

2. Secondary index

The clustered index only helps when the search condition is the primary key value, because the data in the B+ tree is sorted by primary key. What if we want to use other columns as search conditions?

We can build several more B+ trees, with the data in each B+ tree sorted by a different rule. For example, we can sort the data pages and the records in them by the value of the c2 column and build a B+ tree from that, as shown in the figure:
[Figure: B+ tree built on the c2 column]
This B+ tree differs from the clustered index introduced above in several ways:

  • Records and pages are sorted by the value of the c2 column, which means three things:
    1. The records in a page form a singly linked list in ascending order of c2.
    2. The pages that store user records form a doubly linked list, ordered by the c2 values of the records in them.
    3. The pages that store directory entry records are divided into levels, and the pages within one level form a doubly linked list, ordered by the c2 values of the directory entry records in them.
  • The leaf nodes of the B+ tree do not store complete user records, only the values of two columns: c2 and the primary key.
  • A directory entry record is no longer the combination primary key + page number, but the combination c2 + page number.

When looking up user records through this B+ tree, the leaf nodes only give us c2 and the primary key, so to get the complete user record we must search the clustered index again by primary key value. This process is called a table lookup (literally "returning to the table").
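The two-step lookup can be sketched with a sorted list standing in for the leaf records of the c2 index and a dict standing in for the clustered index. Both stand-ins and the rows are simplifying assumptions; the point is only the order of operations.

```python
import bisect

# Clustered index, reduced to a dict: primary key c1 -> complete row.
clustered = {
    1: (1, 4, 'u'), 3: (3, 9, 'd'), 4: (4, 4, 'a'), 5: (5, 3, 'y'),
}
# Leaf records of the c2 index: (c2, c1), sorted by c2 and then by c1.
idx_c2 = sorted((row[1], pk) for pk, row in clustered.items())

def select_by_c2(value):
    """Find matching primary keys in the c2 index, then fetch full rows."""
    rows = []
    start = bisect.bisect_left(idx_c2, (value,))   # first entry with c2 >= value
    for c2, pk in idx_c2[start:]:
        if c2 != value:
            break
        rows.append(clustered[pk])   # second search, in the clustered index
    return rows

print(select_by_c2(4))   # [(1, 4, 'u'), (4, 4, 'a')]
```

Each matching entry in the secondary index costs one extra search in the clustered index.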

Q: Why not put the complete user record directly in the leaf nodes?
A: Storing complete user records in the leaf nodes would avoid the table lookup, but it would amount to copying every user record each time a B+ tree is built, which wastes too much storage space. Because a B+ tree built on non-primary-key columns needs a table lookup to reach the complete user record, it is also called a secondary index, or auxiliary index.

Since this B+ tree uses the value of the c2 column as its sort rule, we also call it the index on the c2 column.

3. Joint index

We can also sort by the values of several columns at once, that is, build one index on multiple columns. For example, if we want the B+ tree to be sorted by the c2 and c3 columns, this means two things:

  • First sort the records and pages by the c2 column.
  • Where records have the same c2 value, sort them by the c3 column.

[Figure: B+ tree built on the c2 and c3 columns]

Note the following about this B+ tree:

  • Each directory entry record consists of c2, c3 and a page number. Records are sorted first by the value of c2 and, where c2 is equal, by the value of c3.
  • The user records in the leaf nodes of the B+ tree consist of the columns c2 and c3 and the primary key c1.

A B+ tree sorted by the values of the c2 and c3 columns is called a joint index. It is essentially a secondary index too. It does not mean the same thing as indexing c2 and c3 separately:

  • Building a joint index builds only one B+ tree, as shown above.
  • Indexing c2 and c3 separately builds 2 B+ trees, one sorted by c2 and the other sorted by c3.
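The sort order of a joint index matches how Python compares tuples, which allows a small sketch. The rows are made-up sample data, and a single sorted list stands in for the leaf level of the joint index.

```python
import bisect

rows = [(1, 4, 'u'), (3, 9, 'd'), (4, 4, 'a'), (5, 3, 'y'), (8, 4, 'o')]

# Leaf records of the joint index: (c2, c3, c1), sorted by c2, then c3.
idx_c2_c3 = sorted((c2, c3, c1) for c1, c2, c3 in rows)

def primary_keys_for(c2, c3):
    """Return the primary keys of all rows with the given c2 and c3."""
    start = bisect.bisect_left(idx_c2_c3, (c2, c3))
    result = []
    for k2, k3, pk in idx_c2_c3[start:]:
        if (k2, k3) != (c2, c3):
            break
        result.append(pk)
    return result

print(idx_c2_c3)
# [(3, 'y', 5), (4, 'a', 4), (4, 'o', 8), (4, 'u', 1), (9, 'd', 3)]
print(primary_keys_for(4, 'o'))   # [8]
```

The c3 values are ordered only within a group of equal c2 values, so this one structure is not a substitute for a separate index on c3.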

Third, matters needing attention

When the B+ tree index was introduced earlier, for the convenience of understanding, the leaf nodes that store user records are drawn first, and then the inner nodes that store directory entry records are drawn, but in fact, the formation process of the B+ tree is as follows:

  1. Whenever a B+ tree index is created for a table (the clustered index is not created manually; it exists by default), a root node page is created for that index. While the table has no data, the root node of each B+ tree index holds neither user records nor directory entry records.
  2. When user records are then inserted into the table, they are stored in this root node first.
  3. When the free space in the root node is used up and another record is inserted, all records in the root node are copied to a newly allocated page, say page a, and a page split is then performed on that new page to obtain another new page, say page b. The newly inserted record is placed in page a or page b according to its key value (the primary key value in a clustered index, the value of the indexed column in a secondary index), and the root node is promoted to a page that stores directory entry records.

The root node of a B+ tree index never moves once it has been created. So when we create an index on a table, the page number of its root node is recorded in a fixed place, and whenever the InnoDB storage engine needs to use the index, it reads the root node's page number from that fixed place to access the index.
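The first split of the root node can be sketched as below. Everything here is a simplifying assumption: 3 records per page, a made-up root page number and page allocator, and the copy to page a and the split into page b collapsed into one step. Only the first split is covered.

```python
CAPACITY = 3        # assumed records per page, as in the earlier example
ROOT_PAGE_NO = 3    # hypothetical fixed page number of the root node

pages = {ROOT_PAGE_NO: {"type": "leaf", "records": []}}
next_free_page = [100]   # hypothetical page allocator

def allocate(page_type, records):
    page_no = next_free_page[0]
    next_free_page[0] += 1
    pages[page_no] = {"type": page_type, "records": records}
    return page_no

def insert_into_root(key):
    """Handle inserts while the root is still the only page of user records."""
    root = pages[ROOT_PAGE_NO]
    assert root["type"] == "leaf", "this sketch covers only the first split"
    records = sorted(root["records"] + [key])
    if len(records) <= CAPACITY:
        root["records"] = records
        return
    # Root is full: its records end up in new pages a and b, and the
    # root itself becomes a page of directory entry records.
    half = len(records) // 2
    page_a = allocate("leaf", records[:half])
    page_b = allocate("leaf", records[half:])
    root["type"] = "directory"
    root["records"] = [(records[0], page_a), (records[half], page_b)]

for key in (1, 3, 5, 4):
    insert_into_root(key)
print(pages[ROOT_PAGE_NO])
# {'type': 'directory', 'records': [(1, 100), (4, 101)]}
```

The root keeps page number 3 throughout; only its contents change from user records to directory entry records.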

The directory entry records in the inner nodes of a B+ tree index hold the combination index column + page number, but for secondary indexes this description is not quite precise. The index column values of a secondary index may repeat, so index column + page number alone cannot always decide which page a new record belongs in. We need to ensure that the directory entry records of the nodes in one level of the B+ tree are unique even when the page number field is ignored.

Therefore, the content recorded by the directory entry of the inner node of the secondary index is actually composed of three parts:

  • the value of the index column
  • primary key value
  • page number
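A small sketch shows why the primary key is needed as a tie-breaker. The directory entries are hypothetical: three pages whose smallest c2 value is the same, which c2 alone could not tell apart.

```python
import bisect

# Directory entry records of a c2 index: (c2, primary key, page number).
entries = [(1, 1, 42), (1, 5, 57), (1, 9, 61)]   # hypothetical values

def page_for(c2, primary_key):
    """Pick the page for a new record by comparing (c2, primary key) pairs."""
    keys = [(e[0], e[1]) for e in entries]
    idx = max(bisect.bisect_right(keys, (c2, primary_key)) - 1, 0)
    return entries[idx][2]

print(page_for(1, 7))   # 57
```

With c2 alone, all three entries would compare equal to the new record's key, and the target page would be ambiguous.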

In InnoDB the index is the data: the leaf nodes of the clustered index's B+ tree already contain all the complete user records. The MyISAM index scheme also uses a tree structure, but it stores the index and the data separately.

Fourth, create and delete indexes in MySQL

InnoDB and MyISAM will automatically create a B+ tree index for the primary key or column declared as UNIQUE, but if we want to create an index for other columns, we need to specify it explicitly.

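# Create an index when creating the table
CREATE TABLE table_name (
  column definitions ... ,
  [KEY|INDEX] index_name (one or more columns to index)
);

# Add an index to an existing table
ALTER TABLE table_name ADD [INDEX|KEY] index_name (one or more columns to index);

# Delete an index
ALTER TABLE table_name DROP [INDEX|KEY] index_name;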

For example, if we want to add a joint index to the c2 and c3 columns when we create the index_demo table, we can write the table creation statement like this:

CREATE TABLE index_demo(
	c1 INT,
	c2 INT,
	c3 CHAR(1),
	PRIMARY KEY(c1),
	INDEX idx_c2_c3 (c2, c3)
);

Origin blog.csdn.net/myjess/article/details/115550686