Behind the B+Tree Index

Summary

This article discusses database indexes based on MySQL 5.7. MySQL supports multiple storage engines through a pluggable architecture, and index support varies from engine to engine. This article covers only InnoDB's B+Tree index, because it is the most widely used index in MySQL; hash indexes and full-text indexes are not discussed.

Clustered index and secondary index

Every InnoDB table has a special index called the clustered index, which stores the row data. Typically, the clustered index is the primary key. To get the best performance from queries, you need to understand how InnoDB uses the clustered index to optimize them:

  • Define a primary key for each table. If there is no logically unique, non-null column or set of columns, add an auto-increment column to serve as the primary key.

  • If you do not define a primary key for a table, MySQL locates the first UNIQUE index whose key columns are all NOT NULL, and InnoDB uses it as the clustered index.

  • If the table has no primary key and no suitable UNIQUE index, InnoDB generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values (row_id). row_id is a 6-byte field that increases monotonically as new rows are inserted, so rows ordered by row_id are physically in insertion order.

    Note the range of row_id: it is declared as an unsigned long long, but only 6 bytes are stored, so the largest value that can be written is 2^48 - 1. When the counter goes past that value it keeps incrementing, but only the low 48 bits are written, so the stored row_id wraps around to 0. What happens if a long-running MySQL instance starts repeating row_id values? A row with a repeated row_id overwrites the existing row with the same row_id, resulting in data loss. (Note in particular that row_id comes from a single instance-wide counter shared by all tables that use it.)
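The wraparound can be illustrated with a minimal Python sketch. It assumes, as described above, a single counter whose value is truncated to its low 48 bits when written; HiddenRowIdAllocator and the dict standing in for the clustered index are hypothetical illustrations, not InnoDB code.

```python
# Toy model of the hidden row_id: one shared counter, stored in 6 bytes.
ROW_ID_BITS = 48
ROW_ID_MASK = (1 << ROW_ID_BITS) - 1  # largest storable value: 2**48 - 1


class HiddenRowIdAllocator:
    """Hypothetical stand-in for the instance-wide row_id counter."""

    def __init__(self, start: int = 0) -> None:
        self.counter = start  # conceptually an unsigned 64-bit integer

    def next_row_id(self) -> int:
        row_id = self.counter & ROW_ID_MASK  # only the low 48 bits are written
        self.counter += 1
        return row_id


# A clustered index keyed by row_id behaves like a map here:
# a repeated key overwrites the existing row instead of adding one.
allocator = HiddenRowIdAllocator(start=ROW_ID_MASK)  # about to wrap
table = {0: "row inserted long ago"}  # an old row that received row_id 0

table[allocator.next_row_id()] = "row A"  # gets row_id 2**48 - 1
table[allocator.next_row_id()] = "row B"  # wraps to 0 and overwrites

print(table[0])    # row B
print(len(table))  # 2: the old row is gone
```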

How the clustered index speeds up queries

Accessing a row through the clustered index is fast because the index search leads directly to the page that contains all of the row's data. This organization often saves disk I/O operations.

How secondary indexes relate to the clustered index

All indexes other than the clustered index are called secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns of the row as well as the columns specified for the secondary index. InnoDB uses this primary key value to find the row in the clustered index. If the primary key is long, the secondary indexes use more space, so a short primary key saves space. The examples later in this article show how the two work together.
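A back-of-the-envelope Python calculation shows why the primary key length matters. It assumes each secondary index entry stores the indexed column plus a full copy of the primary key and ignores record headers and page overhead; the column sizes and row count are illustrative.

```python
def secondary_index_bytes(rows: int, key_bytes: int, pk_bytes: int) -> int:
    """Approximate leaf-level size: each entry = indexed column + primary key."""
    return rows * (key_bytes + pk_bytes)


ROWS = 1_000_000
INT_KEY = 4     # INT column being indexed
BIGINT_PK = 8   # BIGINT primary key
UUID_PK = 36    # CHAR(36) UUID primary key, 1 byte per character (latin1)

short_pk = secondary_index_bytes(ROWS, INT_KEY, BIGINT_PK)
long_pk = secondary_index_bytes(ROWS, INT_KEY, UUID_PK)
print(short_pk)  # 12000000
print(long_pk)   # 40000000
```

Only the primary key changed, yet the secondary index grew by more than three times, and every additional secondary index pays the same cost.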

Physical structure

Except for spatial indexes, InnoDB indexes are B-tree data structures. The default size of an index page is 16KB; it can be changed with the innodb_page_size parameter (to 4KB, 8KB, 32KB, or 64KB, for example), which can only be set when the MySQL instance is initialized. When new records are inserted into an InnoDB clustered index, InnoDB tries to leave 1/16 of the page free for future insertions and updates of index records. If records are inserted in random order, pages end up anywhere from 1/2 to 15/16 full, so up to 1/2 of a page may be left free. This reserved space absorbs future data growth in the index. So what does a page actually look like?
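For the default 16KB page, the fractions above translate into the following byte counts. This is plain Python arithmetic that ignores the page's headers and trailer.

```python
PAGE_SIZE = 16 * 1024  # default innodb_page_size, in bytes

# Sequential inserts: about 1/16 of the page is left free.
reserved = PAGE_SIZE // 16
filled_sequential = PAGE_SIZE - reserved

# Random inserts: pages end up between 1/2 and 15/16 full.
fill_random_min = PAGE_SIZE // 2
fill_random_max = PAGE_SIZE * 15 // 16

print(reserved, filled_sequential)       # 1024 15360
print(fill_random_min, fill_random_max)  # 8192 15360
```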

IBD file structure

MySQL stores each table's data in its own IBD file, that is, a .ibd file (when innodb_file_per_table is enabled, the default in 5.7). Let's look at its structure.

[Figure: layout of an .ibd file]

As the figure shows, the first three pages (FSP_HDR, IBUF_BITMAP, and INODE) are allocated for bookkeeping, and index pages begin at page 3 (page numbers start at 0). Root pages are allocated in the order in which the table's indexes are defined:

  • Page 3 is the root page of the first index (usually the clustered index).
  • Page 4 is the root page of the second index (usually the first secondary index), and so on if the table has more indexes. In this example, node pages begin at page 5.
  • Node pages hold all pages of every index except the root pages and the leaf pages.
  • Leaf pages follow; the leaf pages of the clustered index store the row data. Because most InnoDB data structures are stored in the system tablespace, most pages allocated in a file-per-table tablespace are of type INDEX and store the table's data.

INDEX page structure

Since indexes are stored in INDEX pages, let's look at the structure of an INDEX page.

[Figure: structure of an INDEX page]

As the figure shows, an INDEX page begins with a FIL header and ends with a FIL trailer, structured as follows:

[Figure: fields of the FIL header and trailer]

The Previous Page and Next Page fields point to the previous and next pages at the same level of the index. With these two pointers, it is easy to see that the pages are linked into a doubly linked list. Offset is the page number, which is unique within the tablespace.
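A minimal Python sketch of the Previous Page / Next Page links, assuming a simplified header that keeps only the fields discussed here (PageHeader, scan_forward, and scan_backward are hypothetical names). It shows how sibling pages can be walked in either direction even when their page numbers are not contiguous.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PageHeader:
    """Simplified FIL header with only the fields discussed above."""
    offset: int               # page number within the tablespace
    prev_page: Optional[int]  # None stands in for "no previous page"
    next_page: Optional[int]  # None stands in for "no next page"


def scan_forward(pages: Dict[int, PageHeader], first: int) -> List[int]:
    """Follow Next Page pointers starting at page `first`."""
    order, current = [], first
    while current is not None:
        order.append(current)
        current = pages[current].next_page
    return order


def scan_backward(pages: Dict[int, PageHeader], last: int) -> List[int]:
    """Follow Previous Page pointers starting at page `last`."""
    order, current = [], last
    while current is not None:
        order.append(current)
        current = pages[current].prev_page
    return order


# Three sibling pages whose page numbers are not contiguous.
pages = {
    5: PageHeader(offset=5, prev_page=None, next_page=9),
    9: PageHeader(offset=9, prev_page=5, next_page=7),
    7: PageHeader(offset=7, prev_page=9, next_page=None),
}
print(scan_forward(pages, 5))   # [5, 9, 7]
print(scan_backward(pages, 7))  # [7, 9, 5]
```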

Back to the structure of the INDEX page: besides the FIL header and trailer, an important part is User Records, which stores the actual records in the page. Each record in a non-leaf page contains a pointer to a child page, and each record in a leaf page contains row data. Free Space is the page's unused space. The records must be kept in order as a singly linked list: they are not stored physically in key order; instead, the order is maintained by each record's next_record pointer.
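The difference between physical and logical order can be sketched in Python. Here the page body is a dict of hypothetical slots, and each record stores the slot of the next record instead of InnoDB's real relative byte offset; infimum and supremum are the pseudo-records that mark the start and end of the list.

```python
from typing import Dict, List, Optional, Tuple

INFIMUM, SUPREMUM = "infimum", "supremum"

# Hypothetical page body: slot -> (key, slot of the next record).
# Slots appear in physical (insertion) order: 30, then 10, then 20.
page: Dict[str, Tuple[Optional[int], Optional[str]]] = {
    INFIMUM: (None, "slot1"),  # pseudo-record before the smallest key
    "slot0": (30, SUPREMUM),
    "slot1": (10, "slot2"),
    "slot2": (20, "slot0"),
    SUPREMUM: (None, None),    # pseudo-record after the largest key
}


def physical_order(page) -> List[int]:
    """Keys in the order they sit in the page."""
    return [key for slot, (key, _) in page.items() if slot.startswith("slot")]


def logical_order(page) -> List[int]:
    """Keys in the order given by following next_record from infimum."""
    keys, slot = [], page[INFIMUM][1]
    while slot != SUPREMUM:
        key, slot = page[slot]
        keys.append(key)
    return keys


print(physical_order(page))  # [30, 10, 20]
print(logical_order(page))   # [10, 20, 30]
```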

The overall structure of an INDEX page looks like this:

[Figure: INDEX page layout]

Pages are divided into leaf pages and non-leaf pages. Leaf pages (in the clustered index) contain the actual row data, while non-leaf pages contain only pointers to other non-leaf pages or to leaf pages. InnoDB assigns each page a level: leaf pages are at level 0, and the level increases going up the tree.

Pages that are neither the root page nor leaf pages are called internal pages.

B+Tree

The following B+Tree diagram shows which page is the root page, which are internal pages, and which are leaf pages.

[Figure: B+Tree with root, internal, and leaf pages]

The figure clearly marks these three positions in the tree. Note also how User Records differ between leaf pages and non-leaf pages:

Leaf pages store row data (A and B in the figure above), while non-leaf pages store pointers to child pages (6 and 7 in the figure above).
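The three page roles and the level numbering can be modeled in a few lines of Python. The Page class, page numbers, and keys are made up for illustration: non-leaf pages hold (minimum key, child page number) pointers, leaf pages hold rows, and a page's role follows from whether it is the root and what its level is.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Page:
    page_no: int
    level: int  # 0 = leaf; increases toward the root
    # Non-leaf pages: (minimum key, child page number) pointer records.
    pointers: List[Tuple[int, int]] = field(default_factory=list)
    # Leaf pages of the clustered index: (key, row) records.
    rows: List[Tuple[int, str]] = field(default_factory=list)


def classify(page: Page, root_page_no: int) -> str:
    if page.page_no == root_page_no:
        return "root"
    if page.level == 0:
        return "leaf"
    return "internal"  # neither root nor leaf


tree = {
    3: Page(3, level=2, pointers=[(1, 6), (50, 7)]),
    6: Page(6, level=1, pointers=[(1, 10), (20, 11)]),
    7: Page(7, level=1, pointers=[(50, 12), (80, 13)]),
    10: Page(10, level=0, rows=[(1, "A"), (5, "B")]),
    11: Page(11, level=0, rows=[(20, "C")]),
    12: Page(12, level=0, rows=[(50, "D")]),
    13: Page(13, level=0, rows=[(80, "E")]),
}
kinds = {no: classify(p, root_page_no=3) for no, p in tree.items()}
print(kinds)
```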

Simplified diagram of a leaf page:

[Figure: simplified leaf page]

Simplified diagram of a non-leaf page:

[Figure: simplified non-leaf page]

In both leaf pages and non-leaf pages, each record in User Records contains a pointer to the next record (as an offset within the same page).

That covers the basic structure of an index.

Index Search

As mentioned above, there are two kinds of indexes: the clustered index and secondary indexes. How does MySQL find data through each of them? Let's walk through two examples; all index structures shown are simplified.

Clustered index

eg.1 In the following SQL statement, id is the clustered index key:

SELECT * FROM T WHERE T.id=1;

The search process is shown below:

[Figure: clustered index lookup for id = 1]

When the MySQL optimizer chooses the clustered index for a query on id, it searches the index tree, which is sorted by id, for the record with id = 1 as shown in the figure (the details of the search are omitted here), and reaches the leaf page containing id = 1. As mentioned in the discussion of leaf pages, the User Records of a clustered index leaf page store the row data, so the row can be returned directly.
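The descent from root to leaf can be sketched in Python, assuming pages held in a dict keyed by page number and each page's keys kept as a sorted list so that bisect can stand in for the in-page search; Page, search, and the page numbers are hypothetical. Because a clustered index leaf record is the row, the search returns the row directly.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Page:
    level: int                                         # 0 = leaf
    keys: List[int] = field(default_factory=list)      # sorted keys
    children: List[int] = field(default_factory=list)  # non-leaf pages only
    rows: List[dict] = field(default_factory=list)     # leaf pages only


def search(pages: Dict[int, Page], root: int, key: int) -> Tuple[Optional[dict], int]:
    """Descend from the root to a leaf; return (row or None, pages visited)."""
    page_no, visited = root, 0
    while True:
        page = pages[page_no]
        visited += 1
        if page.level == 0:
            # Clustered index leaf: the record holds the full row.
            i = bisect.bisect_left(page.keys, key)
            if i < len(page.keys) and page.keys[i] == key:
                return page.rows[i], visited
            return None, visited
        # Non-leaf page: follow the last child whose minimum key is <= key.
        i = bisect.bisect_right(page.keys, key) - 1
        page_no = page.children[max(i, 0)]


# A two-level clustered index on id (page numbers are made up).
pages = {
    3: Page(level=1, keys=[1, 5], children=[10, 11]),
    10: Page(level=0, keys=[1, 3], rows=[{"id": 1, "a": 7}, {"id": 3, "a": 12}]),
    11: Page(level=0, keys=[5, 6], rows=[{"id": 5, "a": 3}, {"id": 6, "a": 10}]),
}
print(search(pages, root=3, key=1))  # ({'id': 1, 'a': 7}, 2)
```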

Secondary index

eg.2 In the following SQL statement, id is the clustered index key and a is a secondary index:

SELECT * FROM T WHERE T.a=10;

The search process is shown below:

[Figure: secondary index lookup for a = 10, followed by the clustered index lookup]

When MySQL chooses the secondary index on a for the query, it first searches the secondary index tree on a, as shown in the figure, for the record with a = 10. That record stores the value of the clustered index key. Having obtained the clustered index value, MySQL then searches the clustered index tree for the record with id = 6, exactly as in eg.1. This second lookup is called going back to the table.
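A minimal Python sketch of eg.2 under simplifying assumptions: each index is a sorted list of tuples searched with bisect, and secondary index entries store only (a, id). The function name select_star_where_a is hypothetical. The second search, into clustered, is the trip back to the table.

```python
import bisect
from typing import Dict, List, Tuple

# Clustered index leaf level, sorted by id: (id, full row).
clustered: List[Tuple[int, Dict[str, int]]] = [
    (1, {"id": 1, "a": 7}),
    (3, {"id": 3, "a": 12}),
    (6, {"id": 6, "a": 10}),
]
clustered_ids = [pk for pk, _ in clustered]

# Secondary index on a: each entry is (a, id), sorted by (a, id).
secondary_a: List[Tuple[int, int]] = sorted((row["a"], pk) for pk, row in clustered)


def select_star_where_a(a_value: int) -> Tuple[List[Dict[str, int]], int]:
    """Model of SELECT * FROM T WHERE T.a = a_value via the index on a.
    Returns the rows and the number of back-to-table lookups."""
    rows, lookups = [], 0
    i = bisect.bisect_left(secondary_a, (a_value,))
    while i < len(secondary_a) and secondary_a[i][0] == a_value:
        pk = secondary_a[i][1]  # the secondary entry holds only a and id
        j = bisect.bisect_left(clustered_ids, pk)
        rows.append(clustered[j][1])  # back to the table for the full row
        lookups += 1
        i += 1
    return rows, lookups


print(select_star_where_a(10))  # ([{'id': 6, 'a': 10}], 1)
```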

In other words, a query through a secondary index has to search one more index tree, so it is generally slower than a query through the clustered index. Is there a way to avoid going back to the table? Yes: a covering index. eg.2 uses SELECT *, which queries all columns; but if we write SELECT id FROM T WHERE T.a=10; only id is needed, and because the index tree on a already stores the id value, there is no need to go back to the table. Covering indexes are a common SQL optimization technique. Index Condition Pushdown can also reduce the number of trips back to the table; for details, see the "Optimizing SELECT Statements" section of the MySQL manual.
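The effect of a covering index can be shown by counting back-to-table lookups. In this Python sketch, which simplifies what the optimizer actually does, the query is answered from the secondary index alone whenever every requested column is stored in the secondary entry (a and id); the dict standing in for the clustered index and the query function are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

# A dict stands in for the clustered index: id -> full row.
clustered: Dict[int, Dict[str, int]] = {
    1: {"id": 1, "a": 7, "b": 100},
    3: {"id": 3, "a": 12, "b": 300},
    6: {"id": 6, "a": 10, "b": 600},
    8: {"id": 8, "a": 10, "b": 800},
}
# Secondary index on a: (a, id) entries in sorted order.
secondary_a = sorted((row["a"], pk) for pk, row in clustered.items())
COVERED_COLUMNS = {"a", "id"}  # what each secondary index entry stores


def query(a_value: int, columns: List[str]) -> Tuple[List[Dict[str, int]], int]:
    """Return matching rows and the number of back-to-table lookups."""
    rows, lookups = [], 0
    covering = set(columns) <= COVERED_COLUMNS
    i = bisect.bisect_left(secondary_a, (a_value,))
    while i < len(secondary_a) and secondary_a[i][0] == a_value:
        a, pk = secondary_a[i]
        if covering:
            row = {"a": a, "id": pk}  # answered from the index alone
        else:
            row = clustered[pk]       # back to the table
            lookups += 1
        rows.append({c: row[c] for c in columns})
        i += 1
    return rows, lookups


print(query(10, ["id"]))               # ([{'id': 6}, {'id': 8}], 0)
print(query(10, ["id", "a", "b"])[1])  # 2
```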

Leftmost prefix

The indexes discussed so far each cover a single column. In fact, a MySQL index can cover multiple columns; such an index is called a composite (joint) index, and it is also a secondary index. Its keys take the form of tuples (col1, col2, ..., coln), where each element is a column of the table, and the tuples are ordered column by column from left to right.

eg. The following statement creates a composite index a_b_index on (a, b):

CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  KEY `a_b_index` (`a`,`b`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Lookups on a composite index work the same way as on a single-column secondary index, except that the key stored in each index node is a tuple rather than a single value.

eg.3

SELECT * FROM t WHERE t.a=10 AND t.b=10;

As shown below:

[Figure: composite index lookup for a = 10 and b = 10]
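Python tuples compare element by element from the left, just like the keys of a composite index, so a sorted list of (a, b, id) tuples is a convenient model of a_b_index; the entries and the seek_prefix helper are made up for illustration. The sketch also shows the leftmost prefix rule behind this section's title: a search can use the index when it fixes a leading prefix of the columns (a, or a and b), but b on its own is not sorted across the index.

```python
import bisect
from typing import List, Tuple

# Entries of a_b_index modeled as (a, b, primary key) tuples, kept sorted.
entries: List[Tuple[int, int, int]] = sorted([
    (10, 20, 4), (5, 30, 1), (10, 10, 2), (10, 10, 3), (7, 10, 5),
])


def seek_prefix(prefix: Tuple[int, ...]) -> List[Tuple[int, int, int]]:
    """Binary-search, then scan all entries whose leading columns equal prefix."""
    n = len(prefix)
    i = bisect.bisect_left(entries, prefix)
    out = []
    while i < len(entries) and entries[i][:n] == prefix:
        out.append(entries[i])
        i += 1
    return out


print(seek_prefix((10, 10)))  # a = 10 AND b = 10 -> [(10, 10, 2), (10, 10, 3)]
print(len(seek_prefix((10,))))  # a = 10 alone is a leftmost prefix -> 3
# b alone is not a leftmost prefix: b values are not sorted across the
# index, so a binary search on b is impossible and every entry is scanned.
print([e[1] for e in entries])  # [30, 10, 10, 10, 20]
```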

Advantages of B+Tree

As we know, B+Tree is the most commonly used index type in InnoDB, but why B+Tree? Why not a balanced binary tree, a hash table, or an array? To answer these questions, let's look at the advantages of B+Tree.

A hash index is built on a hash table, which does not support range queries; this shortcoming alone rules it out for most needs.

Arrays have an equally obvious drawback: inserting or deleting data requires moving large amounts of data, which is too costly.

A balanced binary tree satisfies both requirements above, but its height is too great. Indexes are generally large and cannot all be held in memory, so, like data, they are persisted to disk, as the IBD file structure described above shows. A balanced binary tree with 1 million nodes is about 20 levels high, so a single lookup may need to access 20 index pages. Although the spread of solid-state drives has made disk access faster, 20 accesses is still too many. How can we reduce the number of disk accesses? By reducing the height of the tree: a shorter tree naturally means fewer accesses. This is why InnoDB chose B+Tree.
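The height figure can be checked in Python: a binary tree with n nodes needs at least ceil(log2(n + 1)) levels, which equals n.bit_length() for non-negative integers. Assuming each node costs one page read, the height is also the number of page reads per lookup.

```python
def balanced_binary_tree_height(n: int) -> int:
    """Minimum number of levels of a binary tree holding n nodes.
    Equal to ceil(log2(n + 1)), computed exactly with integers."""
    return n.bit_length()


print(balanced_binary_tree_height(1_000_000))  # 20
```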

A B+Tree is in effect an N-ary tree. For the same height, an N-ary tree holds far more data than a binary tree. For an InnoDB index on an integer field, N is roughly 1200. With a tree height of 4, the tree can hold 1200^3 values, about 1.7 billion. Since the root page practically always stays in memory, finding a value requires at most 3 disk accesses.
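The same arithmetic for an N-ary tree, using the text's model that a tree of height h addresses fanout^(h - 1) values and the text's approximate fan-out of 1200 (both are estimates, not exact InnoDB figures):

```python
FANOUT = 1200  # approximate pointers per page for an integer key


def capacity(height: int, fanout: int = FANOUT) -> int:
    """Values addressable by a tree of `height` levels: fanout ** (height - 1)."""
    return fanout ** (height - 1)


def height_needed(n: int, fanout: int = FANOUT) -> int:
    """Smallest height whose capacity reaches n under the same model."""
    height = 1
    while capacity(height, fanout) < n:
        height += 1
    return height


print(capacity(4))               # 1728000000, about 1.7 billion
print(height_needed(1_000_000))  # 3
# With the root page cached, a lookup in a height-4 tree reads at most
# 4 - 1 = 3 pages from disk.
```

Under this model, the 1 million values that needed 20 levels in a binary tree fit in a tree of height 3.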

Index selectivity

Since indexes speed up queries, are more indexes always better? No. Everything has pros and cons: indexes speed up queries, but they consume storage space and add overhead to inserts, deletes, and updates, so you must weigh that cost against the gain in query performance. Creating an index is not recommended in two cases:

  • Tables with very little data. For a small table, a full table scan may perform better than an index lookup; this is common for tables with fewer than 10 rows and short row lengths.
  • Indexes with low selectivity. Selectivity is the ratio (number of distinct key values / total number of rows), which lies in the range (0, 1]; the larger the value, the more selective the index.
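Selectivity is easy to compute. This Python sketch uses made-up columns; in MySQL the same ratio can be estimated with a query such as SELECT COUNT(DISTINCT col) / COUNT(*) FROM t, where col is the column under consideration.

```python
from typing import Hashable, Sequence


def selectivity(column: Sequence[Hashable]) -> float:
    """Distinct values / total rows; lies in (0, 1] for a non-empty column."""
    if not column:
        raise ValueError("selectivity is undefined for an empty table")
    return len(set(column)) / len(column)


ids = list(range(1, 1001))  # every value unique
genders = ["F", "M"] * 500  # only two distinct values in 1000 rows

print(selectivity(ids))      # 1.0
print(selectivity(genders))  # 0.002
```

A column like genders here would make a poor index: each key value matches half the table.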

Index maintenance

As mentioned earlier, indexes add overhead to inserts, deletes, and updates. Why? Mainly because the index has to be kept in order. How does InnoDB keep an index ordered? Because User Records is a singly linked list, inserts, deletes, and updates in a leaf page follow exactly the rules of singly linked list operations.

Insert

First, InnoDB traverses from the root page through the node pages to find the right leaf page for the insert or update. Let's look at the insertion process:

[Figure: inserting a record into a leaf page]

During insertion, the singly linked list must stay ordered. Each page is 16KB by default, so as we keep inserting data, the page's remaining capacity gradually decreases. Although, as mentioned above, InnoDB reserves 1/16 of each page for future data growth, the space will eventually run out. What happens when we then need to insert data into this page?

At that point InnoDB must allocate a new page and move part of the data to it. This process is called a page split, and it naturally hurts performance.
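Insertion and page splitting can be sketched in Python. The sketch assumes a page limited to a fixed number of records instead of 16KB of bytes and splits a full page in half; LeafPage and its methods are hypothetical, and InnoDB's own choice of split point is more involved (for example for sequential inserts).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    key: int
    next: Optional["Record"] = None


class LeafPage:
    """Toy leaf page: a sorted singly linked list limited to `capacity`
    records (a stand-in for the byte limit). Assumes capacity >= 1."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.infimum = Record(key=0)  # sentinel before the first record; key unused
        self.count = 0
        self.next_page: Optional["LeafPage"] = None

    def keys(self) -> List[int]:
        out, rec = [], self.infimum.next
        while rec is not None:
            out.append(rec.key)
            rec = rec.next
        return out

    def insert(self, key: int) -> "LeafPage":
        """Insert key in order, splitting first if the page is full.
        Returns the page that received the key."""
        if self.count >= self.capacity:
            right = self.split()
            target = right if key >= right.infimum.next.key else self
            return target.insert(key)
        prev = self.infimum
        while prev.next is not None and prev.next.key < key:
            prev = prev.next
        prev.next = Record(key, prev.next)  # relink: prev -> new -> old next
        self.count += 1
        return self

    def split(self) -> "LeafPage":
        """Page split: move the upper half of the records to a new page."""
        right = LeafPage(self.capacity)
        keep = self.count // 2
        prev = self.infimum
        for _ in range(keep):
            prev = prev.next
        right.infimum.next, prev.next = prev.next, None
        right.count, self.count = self.count - keep, keep
        right.next_page, self.next_page = self.next_page, right
        return right


page = LeafPage(capacity=4)
for k in (10, 30, 20, 40):
    page.insert(k)
print(page.keys())  # [10, 20, 30, 40]
page.insert(25)     # the page is full: split, then insert
print(page.keys(), page.next_page.keys())  # [10, 20, 25] [30, 40]
```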

Delete

With insertion understood, let's look at how deletion works:

[Figure: deleting the record with i = 5]

To delete the record with i = 5, InnoDB must find the node whose k is 5 in the leaf page's User Records. Since User Records is a singly linked list, it traverses the list to find that node, unlinks it from its neighbors, and sets its delete flag (D: Yes). It then updates the garbage offset to @258, points the deleted record's Next record pointer to the node itself, and updates the total size of garbage. At this point InnoDB has only completed a logical delete; the record with i = 5 still exists physically on disk. When is it physically removed? Strictly speaking, delete-marked records are removed later by the purge operation.
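The steps of the delete can be modeled in Python. This is a simplification: the garbage list is a plain stack, record size is just the length of the value, and the exact pointer values InnoDB writes (such as the garbage offset @258) are not modeled; Page and Record are hypothetical. The deleted record is only unlinked and flagged, and its object still exists, mirroring the logical delete described above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Record:
    key: int
    value: str
    next: Optional["Record"] = None
    deleted: bool = False  # the delete flag ("D" in the figure)

    @property
    def size(self) -> int:
        return len(self.value)  # simplification: ignores record headers


class Page:
    def __init__(self, items: Iterable[Tuple[int, str]]) -> None:
        self.infimum = Record(key=0, value="")  # sentinel; key never compared
        self.garbage: Optional[Record] = None   # head of the garbage list
        self.garbage_size = 0                   # total size of deleted records
        tail = self.infimum
        for key, value in sorted(items):
            tail.next = Record(key, value)
            tail = tail.next

    def keys(self) -> List[int]:
        out, rec = [], self.infimum.next
        while rec is not None:
            out.append(rec.key)
            rec = rec.next
        return out

    def delete(self, key: int) -> bool:
        prev = self.infimum
        while prev.next is not None and prev.next.key != key:
            prev = prev.next
        victim = prev.next
        if victim is None:
            return False
        prev.next = victim.next     # unlink from the ordered record list
        victim.deleted = True       # set the delete flag
        victim.next = self.garbage  # push onto the garbage list
        self.garbage = victim
        self.garbage_size += victim.size
        return True


page = Page([(1, "a"), (5, "eeee"), (9, "i")])
page.delete(5)
print(page.keys())                          # [1, 9]
print(page.garbage.key, page.garbage_size)  # 5 4
```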

Update

Now let's look at how an update works:

[Figure: updating the record with i = 5]

To update the record with i = 5 to "abcde", InnoDB must find the node with k = 5 in the leaf page's User Records. Since User Records is a singly linked list, it traverses the list to find that node and then sets v = abcde.
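An in-place update follows the same traversal. In this minimal Python sketch (build, update, and items are hypothetical helpers), the walk stops early once it passes the target key, because the list is sorted; only the case described above, where the new value simply replaces the old one, is modeled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Record:
    key: int
    value: str
    next: Optional["Record"] = None


def build(pairs: Iterable[Tuple[int, str]]) -> Record:
    """Build a sorted singly linked list behind a sentinel head."""
    head = Record(key=0, value="")  # sentinel; its key is never compared
    tail = head
    for key, value in sorted(pairs):
        tail.next = Record(key, value)
        tail = tail.next
    return head


def update(head: Record, key: int, new_value: str) -> bool:
    """Walk the list to the record with `key` and overwrite its value."""
    rec = head.next
    while rec is not None and rec.key < key:
        rec = rec.next
    if rec is None or rec.key != key:
        return False
    rec.value = new_value
    return True


def items(head: Record) -> List[Tuple[int, str]]:
    out, rec = [], head.next
    while rec is not None:
        out.append((rec.key, rec.value))
        rec = rec.next
    return out


head = build([(1, "a"), (5, "e"), (9, "i")])
print(update(head, 5, "abcde"))  # True
print(update(head, 7, "x"))      # False: no record with key 7
print(items(head))               # [(1, 'a'), (5, 'abcde'), (9, 'i')]
```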

Conclusion

This article took a week to write. Because my knowledge of MySQL internals is limited, some topics are not explored in depth, but consulting the related material along the way deepened my understanding of MySQL. This article covers only InnoDB's B+Tree index and does not cover MySQL's other indexes, such as full-text and hash indexes; I will look for opportunities to add those later.

References

[1] MySQL 5.7 Reference Manual

[2] The basics of InnoDB space file layout

Source: blog.csdn.net/qq_36011946/article/details/104995332