MySQL learning - B+ tree index

We already know that when searching for a record within a single data page by primary key, we can use binary search on the page directory to locate the slot, then walk through the records in that slot's group to find the specific record. What we do not yet know is how to determine which page, among tens of thousands of pages, holds the data. Without an index, whether we search by primary key value or by any other column, we cannot quickly locate the page containing the record. We can only start from the first page, follow the doubly linked list of pages, and search each page for the record using the in-page method just described.

A simple indexing scheme

To quickly locate the page that holds a record, an index relies on one condition: the primary key values of the user records in the next data page must be greater than the primary key values of the user records in the previous page. The following example shows how an index improves lookup efficiency:

Create a table:

mysql> CREATE TABLE index_demo(
    ->     c1 INT,
    ->     c2 INT,
    ->     c3 CHAR(1),
    ->     PRIMARY KEY(c1)
    -> ) ROW_FORMAT = Compact;
Query OK, 0 rows affected (0.03 sec)

Insert three records:

mysql> INSERT INTO index_demo VALUES(1, 4, 'u'), (3, 9, 'd'), (5, 3, 'y');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

Suppose a page can hold at most three user records, and the three index_demo records were inserted into the data page numbered 10. Now insert one more record:

mysql> INSERT INTO index_demo VALUES(4, 4, 'a');
Query OK, 1 row affected (0.00 sec)

Because page 10 can hold at most three records, a new page, page 28, has to be allocated.

The largest primary key among the user records in page 10 is 5, while the record placed in page 28 has primary key 4. Since 5 > 4, this violates the requirement that the primary key values of user records in the next page must be greater than those in the previous page. Inserting the record with primary key 4 therefore has to be accompanied by a record move: the record with primary key 5 is moved to page 28, and the record with primary key 4 is then inserted into page 10.

This shows that while records are inserted into, deleted from, and updated in pages, we must always use operations such as moving records to keep this condition true: the primary key values of user records in the next data page must be greater than those in the previous page. This process is also called a page split.

After more records are inserted, the table ends up with many data pages.
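The record movement described above can be sketched in a few lines of Python. This is a toy model under the text's assumption that a page holds at most three records; the name insert_with_split and the representation of a page as a sorted list of keys are illustrative only, and this is not InnoDB's real split algorithm.

```python
def insert_with_split(page_keys, new_key, capacity=3):
    """Insert new_key into a page; overflow the largest keys into a new page.

    Hypothetical helper: a page is modeled as a sorted list of primary keys.
    """
    keys = sorted(page_keys + [new_key])
    # The old page keeps the smallest keys, so every key in the new page
    # is greater than every key in the old page.
    return keys[:capacity], keys[capacity:]


page_10, page_28 = insert_with_split([1, 3, 5], 4)
print(page_10, page_28)  # [1, 3, 4] [5]
```

After the call, page 10 holds keys 1, 3, 4 and page 28 holds key 5, matching the move described in the text.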

To quickly locate, among so many pages, the page that contains a record with a given primary key value, we need to build a directory for the pages. Each page has one directory entry, and each directory entry has two parts:

The smallest primary key value among the user records in the page, which we call key.
The page number, which we call page_no.

We only need to store these directory entries contiguously in physical storage, for example in an array, to be able to search quickly by primary key value. Say we are looking for the record with primary key 20. The lookup takes two steps:

First, use binary search over the directory entries to determine that the record with primary key 20 belongs to directory entry 3 (since 12 < 20 < 209), whose page is page 9.
Then, inside page 9, locate the specific record using the in-page lookup method described earlier.

This directory of pages is what we call an index.
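The two-step lookup can be sketched as a binary search over an array of (key, page_no) pairs. In this minimal sketch the keys 12 and 209 and page 9 come from the text; the other entries and the name find_page are assumptions made for illustration.

```python
from bisect import bisect_right

# (key, page_no) pairs sorted by key; key is the smallest primary key in the page.
# Only the entry (12, 9) and the key 209 come from the text; the rest is made up.
DIRECTORY = [(1, 10), (5, 28), (12, 9), (209, 20)]


def find_page(directory, primary_key):
    """Return the page_no of the page that may contain primary_key."""
    keys = [key for key, _ in directory]
    # Index of the last entry whose key is <= primary_key.
    i = bisect_right(keys, primary_key) - 1
    if i < 0:
        return None  # smaller than every key in the table
    return directory[i][1]


print(find_page(DIRECTORY, 20))  # 9
```

The second step, finding the record inside page 9, would reuse the in-page binary search over slots mentioned at the start of the article.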

The indexing scheme in InnoDB

The scheme above is called a simple indexing scheme because, in order to use binary search over the directory entries when looking up by primary key, we assumed that all directory entries are stored contiguously in physical storage. This assumption causes two problems.

InnoDB uses the page as the basic unit for managing storage space, so it can guarantee at most 16KB of contiguous storage. As the number of records in a table grows, a very large contiguous area would be needed to hold all the directory entries. Moreover, records are frequently inserted and deleted. Suppose all the records in page 28 are deleted: page 28 is then no longer needed, so directory entry 2 is no longer needed either, and every directory entry after entry 2 has to be moved forward.

InnoDB therefore needs a flexible way to manage directory entries. Directory entries turn out to look much like ordinary user records, so they can be managed in the same way. The record_type attribute is what distinguishes a directory entry record from an ordinary user record. With this in mind, directory entry records are placed into pages and managed just like the records in data pages.

Differences between directory entry records and ordinary user records:

1. The record_type value of a directory entry record is 1, while the record_type value of an ordinary user record is 0.
2. A directory entry record has only two columns, the primary key value and the page number, while an ordinary user record has user-defined columns, possibly many of them, plus the hidden columns that InnoDB adds itself.
3. Only a directory entry record can have min_rec_mask set to 1: it is 1 for the directory entry record with the smallest primary key value in a page that stores directory entry records, and 0 for every other record.

Apart from these three differences, the two kinds of records are the same. So when a page that stores directory entry records fills up, another such page can be allocated to continue storing them, and a higher-level directory page is generated on top of these pages.

Abstracted into a simpler picture, this structure is called a B+ tree.

Whether a data page stores user records or directory entry records, we place it in the B+ tree data structure, so we also call these data pages nodes. The actual user records are stored in the bottom-level nodes of the B+ tree, which are called leaf nodes. The remaining nodes, which store directory entries, are called non-leaf nodes or internal nodes, and the topmost node of the B+ tree is called the root node. For ease of discussion, we define the bottom level, which stores the user records, as level 0, and number the levels upward from there.
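A lookup in such a tree starts at the root and follows directory entries down to a leaf. The sketch below models this with a dictionary of pages; every page number and record in it is hypothetical, and the layout is a simplification rather than InnoDB's on-disk format.

```python
from bisect import bisect_left, bisect_right

# Hypothetical three-level tree. Internal pages hold (min_key, child_page_no)
# entries; leaf pages hold (primary_key, c3) records. All lists are sorted.
PAGES = {
    33: ("internal", [(1, 30), (320, 32)]),  # root
    30: ("internal", [(1, 10), (5, 28), (12, 9), (209, 20)]),
    32: ("internal", [(320, 31)]),
    10: ("leaf", [(1, "u"), (3, "d"), (4, "a")]),
    28: ("leaf", [(5, "y"), (8, "a"), (10, "o")]),
    9: ("leaf", [(12, "x"), (20, "e"), (100, "x")]),
    20: ("leaf", [(209, "b"), (220, "i"), (300, "m")]),
    31: ("leaf", [(320, "s"), (400, "t")]),
}


def search(root_page_no, primary_key):
    """Descend from the root to a leaf; return (record or None, pages visited)."""
    page_no, visited = root_page_no, []
    while True:
        kind, items = PAGES[page_no]
        visited.append(page_no)
        keys = [key for key, _ in items]
        if kind == "leaf":
            i = bisect_left(keys, primary_key)
            found = i < len(keys) and keys[i] == primary_key
            return (items[i] if found else None), visited
        # Follow the last directory entry whose key is <= primary_key.
        i = max(bisect_right(keys, primary_key) - 1, 0)
        page_no = items[i][1]


print(search(33, 20))  # ((20, 'e'), [33, 30, 9])
```

The list of visited pages has one entry per level, which is why the number of levels bounds the number of pages read per lookup.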

A rough estimate of the capacity of the tree: assume that a leaf page holding user records can store 100 records and that an internal page holding directory entry records can store 1000 entries. Then:

If the B+ tree has only one level, that is, a single node storing user records, it can store at most 100 records.
If the B+ tree has two levels, it can store at most 1000 × 100 = 100,000 records.
If the B+ tree has three levels, it can store at most 1000 × 1000 × 100 = 100,000,000 records.
If the B+ tree has four levels, it can store at most 1000 × 1000 × 1000 × 100 = 100,000,000,000 records.

A table will generally not hold 100 billion records, so under normal circumstances the B+ tree has no more than four levels. Finding a record by primary key value then requires searching at most four pages (three directory entry pages and one user record page). And because every page has its own page directory, the record inside a page can also be located quickly by binary search.
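The arithmetic can be checked with a short script. It assumes 100 user records per leaf page and 1000 directory entry records per internal page, which is what the products in the list imply; real capacities depend on record sizes.

```python
LEAF_CAPACITY = 100       # user records per leaf page (illustrative assumption)
INTERNAL_CAPACITY = 1000  # directory entry records per internal page (assumption)


def max_records(levels):
    """Upper bound on records in a B+ tree with the given number of levels."""
    return INTERNAL_CAPACITY ** (levels - 1) * LEAF_CAPACITY


for levels in range(1, 5):
    print(levels, max_records(levels))
```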

Clustered index

The above tree structure has the following two characteristics:

1. Records and pages are sorted by primary key value.
2. The leaf nodes of the B+ tree store the complete user records.

A B+ tree with these two characteristics is called a clustered index. All complete user records are stored in the leaf nodes of the clustered index. We do not need to create it explicitly with an INDEX clause in a MySQL statement; the InnoDB storage engine creates the clustered index for us automatically.

The clustered index only helps when the search condition is the primary key. When a query's condition does not involve the primary key, do we have to scan all the data from start to finish?

Secondary index

Suppose our query filters on the value of column c2. We can then build another B+ tree that is sorted by the value of column c2.

This B+ tree differs from the clustered index introduced above in several ways:

  • Records and pages are sorted by the value of column c2, which has three aspects:

    • Records within a page are arranged in a singly linked list in order of their c2 values.
    • The pages that store user records are arranged in a doubly linked list in order of the c2 values of the records they hold.
    • The pages that store directory entry records are divided into levels, and the pages within one level are arranged in a doubly linked list in order of the c2 values of their directory entry records.
  • The leaf nodes of this B+ tree do not store complete user records, only the values of two columns: c2 and the primary key.

  • Directory entry records no longer pair a primary key with a page number; they pair a c2 value with a page number.

So if we want to find records by the value of column c2, we can use the B+ tree we just built. Taking a lookup for records whose c2 value is 4 as an example, the search proceeds as follows:

  • Determine the page of directory entry records. Starting from the root page, page 44, we can quickly locate the page that holds the relevant directory entry records, page 42 (since 2 < 4 < 9).

  • Determine the pages that actually hold the user records, using the directory entry records. Within page 42 we can quickly locate the pages that store the user records. Because column c2 has no uniqueness constraint, records with c2 value 4 may be spread over several data pages. Since 2 < 4 ≤ 4, the user records are determined to be stored in page 34 and page 35.

  • Locate the specific records within the pages that store them, that is, within page 34 and page 35.

  • But the leaf nodes of this B+ tree store only the two columns c2 and c1 (the primary key), so we must use the primary key value to search the clustered index once more for the complete user record. This step is called going back to the table. In other words, fetching a complete user record by the value of column c2 uses two B+ trees. The extra lookup is the price paid for saving space, since the leaf nodes do not duplicate the complete records.

Because a B+ tree built on a non-primary-key column needs this back-to-table step before the complete user record can be located, such a B+ tree is also called a secondary index or auxiliary index. Since this B+ tree uses the value of column c2 as its sort rule, we say that it is the index built on column c2.
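The two-tree lookup can be sketched with the four rows inserted into index_demo earlier. A dictionary stands in for the clustered index and a sorted list of (c2, c1) pairs stands in for the secondary index; both stand-ins and the name find_by_c2 are assumptions made for illustration, not InnoDB structures.

```python
from bisect import bisect_left

# The four rows inserted into index_demo earlier: (c1, c2, c3).
ROWS = [(1, 4, "u"), (3, 9, "d"), (5, 3, "y"), (4, 4, "a")]

# Clustered index stand-in: primary key -> complete row.
clustered = {row[0]: row for row in ROWS}
# Secondary index stand-in: (c2, c1) pairs sorted by c2, then by primary key.
secondary = sorted((c2, c1) for c1, c2, _ in ROWS)


def find_by_c2(value):
    """Look up primary keys in the secondary index, then go back to the table."""
    i = bisect_left(secondary, (value,))  # first entry with c2 >= value
    rows = []
    while i < len(secondary) and secondary[i][0] == value:
        primary_key = secondary[i][1]
        rows.append(clustered[primary_key])  # the back-to-table step
        i += 1
    return rows


print(find_by_c2(4))  # [(1, 4, 'u'), (4, 4, 'a')]
```

Two rows share c2 = 4, so the scan over the secondary index continues until the c2 value changes, and each hit costs one extra lookup in the clustered index.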

Joint index

We can also use the values of several columns together as the sort rule, that is, build one index on multiple columns. For example, if we want the B+ tree to be sorted by columns c2 and c3, this has two meanings:

First, records and pages are sorted by column c2.
When records have the same c2 value, they are sorted by column c3.

Notes on the resulting B+ tree:

  • Each directory entry record consists of three parts: c2, c3, and the page number. The records are sorted first by the value of column c2, and records with the same c2 value are sorted by the value of column c3.

  • The user records in the leaf nodes of the B+ tree consist of columns c2 and c3 plus the primary key column c1.

  • A B+ tree built with the values of columns c2 and c3 as the sort rule is called a joint index, and it is essentially a secondary index. This is not the same thing as building separate indexes on column c2 and on column c3.
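The sort order of a joint index on (c2, c3) corresponds to ordinary tuple ordering. The following sketch applies it to the four rows of index_demo inserted earlier; the flat list is a stand-in for the leaf level, not an actual B+ tree.

```python
ROWS = [(1, 4, "u"), (3, 9, "d"), (5, 3, "y"), (4, 4, "a")]  # (c1, c2, c3)

# Leaf entries of a joint index on (c2, c3): the indexed columns plus the
# primary key. Tuples compare by c2 first, then c3, then c1.
joint_index = sorted((c2, c3, c1) for c1, c2, c3 in ROWS)
print(joint_index)
# [(3, 'y', 5), (4, 'a', 4), (4, 'u', 1), (9, 'd', 3)]
```

The two rows with c2 = 4 are ordered by c3 ('a' before 'u'), which separate indexes on c2 and on c3 would not guarantee.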

Notes on InnoDB B+ tree indexes

The page number of the root page never changes

Whenever a B+ tree index is created for a table (the clustered index is not created by us; it exists by default), a root page is created for it. At first, when the table has no data, the root page contains no records.

When records are then inserted into the table, they are first stored in the root page. When the root page is full, all of its records are copied into a newly allocated page, say page a. That new page is then split to produce another page, say page b, and the root page becomes a page that stores directory entry records.

Whenever the InnoDB storage engine needs to use an index, it reads the page number of the root page from a fixed location and accesses the index from there.

Uniqueness of directory entry records in internal nodes

This is only an issue for secondary indexes. When a secondary index is sorted by a column, several records may have the same value in that column, so the primary key must be included as well to keep the directory entry records unique. A newly inserted record can then always be directed to exactly one page.

A page stores at least two records

If each page stored only one record, every directory entry would lead to a page holding a single record and the tree would need a very large number of levels. The result does not bear thinking about.

MySQL statements for creating and dropping indexes

InnoDB automatically builds a B+ tree index for the primary key and for columns declared UNIQUE, but if we want indexes on other columns we have to specify them explicitly.

When creating a table, we can specify a single column to be indexed, or a joint index on several columns:

CREATE TABLE table_name (
    column definitions ... ,
    [KEY|INDEX] index_name (one or more columns to index)
)

Or add an index when altering the table:

ALTER TABLE table_name ADD [INDEX|KEY] index_name (one or more columns to index);

You can also drop an index when altering the table:

ALTER TABLE table_name DROP [INDEX|KEY] index_name;

A concrete example:

CREATE TABLE index_demo(
    c1 INT,
    c2 INT,
    c3 CHAR(1),
    PRIMARY KEY(c1),
    INDEX idx_c2_c3 (c2, c3)
);
ALTER TABLE index_demo DROP INDEX idx_c2_c3;

Origin www.cnblogs.com/it-dennis/p/12611407.html