Java Architecture: MySQL InnoDB Index Principles in Detail

This article introduces MySQL InnoDB indexes, from the various tree structures, to how indexes work, to the details of how they are stored.

InnoDB is MySQL's default storage engine (before MySQL 5.5.5 it was MyISAM, see the documentation). To keep the learning efficient, this article focuses on InnoDB and touches on MyISAM only briefly for comparison.

This article is a summary I wrote up while learning. The material comes mainly from books and blogs (references are given), with some of my own understanding added along the way. Please point out anything that is described inaccurately.

1 Tree structures

I did not plan to start from binary search trees, because there are already plenty of articles about them online. But clear diagrams help a great deal in understanding the problem, and for the sake of completeness I ended up including this part.

First, a look at several tree structures:

1 Binary search tree: each node has at most two children, so the height grows quickly as the amount of data increases. This structure is clearly unsuitable as the basis for storing large amounts of data.

2 B-tree: a B-tree of order m is a balanced m-way search tree. Its most important property is that the number of keys j in each non-root node satisfies ⌈m/2⌉ - 1 <= j <= m - 1. A node has one more child than it has keys, so each key acts as a dividing marker between child nodes. Diagrams usually draw each key between its child nodes, which is intuitive and also makes the B-tree easy to tell apart from the B+ tree described next. Because data is stored in both non-leaf and leaf nodes, the keys of a B-tree cannot be visited in order by a simple scan; an in-order traversal must be used.

3 B+ tree: a B+ tree of order m is a balanced m-way search tree. Its most important property is that the number of keys j in each non-root node satisfies ⌈m/2⌉ - 1 <= j <= m; a node can have at most as many subtrees as keys. A non-leaf node stores the smallest key of each subtree. Data exists only in the leaf nodes, and sibling pointers are added between leaf nodes, which makes traversing all the data in order very easy.

4 B* tree: a B* tree of order m is a balanced m-way search tree. Its two most important properties are: 1 the number of keys j in each non-root node satisfies ⌈2m/3⌉ - 1 <= j <= m; 2 sibling pointers are also added between non-leaf nodes.
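The key-count bounds listed above can be checked with a few lines of Python. This is only a sketch: the function name key_bounds is made up for illustration, and the formulas are the ones quoted in this article, which may differ slightly from the definitions used in other textbooks.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b, avoiding floating point."""
    return -(-a // b)

def key_bounds(kind, m):
    """Return (min_keys, max_keys) for a non-root node of an order-m tree.

    The bounds follow the formulas quoted in this article.
    """
    if kind == "B":
        return ceil_div(m, 2) - 1, m - 1
    if kind == "B+":
        return ceil_div(m, 2) - 1, m
    if kind == "B*":
        return ceil_div(2 * m, 3) - 1, m
    raise ValueError(f"unknown tree kind: {kind}")

for kind in ("B", "B+", "B*"):
    print(kind, key_bounds(kind, 5))
```

For the same order m, the B* tree has the highest minimum, which reflects its requirement that nodes stay about two-thirds full.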

The B / B+ / B* trees support similar operations, such as searching for, inserting, and deleting nodes. Here we focus only on insertion, and only on insertion into a node that is already full, because this case is somewhat more complicated and fully shows the differences between the trees. By contrast, searching is relatively easy to implement, and deletion is essentially the reverse of insertion (in practice deletion is not the exact inverse of insertion: often only the data is deleted and the space is kept for later use).

First the B-tree split. In the figure below, the red value is the node newly inserted at each step. Whenever a node is full, a split has to take place (splitting is a recursive process; in the figure, inserting 7 causes two splits). Because the non-leaf nodes of a B-tree also store key values, the values of the full node being split end up in three places: 1 the original node, 2 the parent of the original node, 3 a new sibling of the original node (see the insertion of 5 and 7). A split may increase the height of the tree (see the insertion of 3 and 7) or leave the height unchanged (see the insertion of 5 and 6).

B+ tree split: when a node is full, a new node is allocated, half of the original node's data is copied to the new node, and finally a pointer to the new node is added to the parent. A B+ tree split affects only the original node and its parent, not the sibling nodes, so it does not need pointers to sibling nodes.
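The B+ tree split just described can be sketched in Python. This is a simplified illustration, not how any real engine does it: a leaf is modelled as a sorted list of keys, the function name split_leaf is made up, and the parent update is reduced to returning the separator key that the parent would store.

```python
def split_leaf(keys, new_key):
    """Insert new_key into a full, sorted leaf and split it into two halves.

    Returns (left, right, separator). The separator is the smallest key of
    the new right node; the parent would store it next to a pointer to
    that node.
    """
    merged = sorted(keys + [new_key])
    mid = len(merged) // 2
    left, right = merged[:mid], merged[mid:]
    return left, right, right[0]

left, right, separator = split_leaf([10, 20, 30, 40], 25)
print(left, right, separator)
```

Only the original node (left), the new node (right), and the parent (which receives the separator) are touched, matching the description above.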

B* tree split: when a node is full and its next sibling is not full, part of its data is moved to the sibling, the new key is inserted into the original node, and finally the sibling's key in the parent is updated (because the sibling's key range has changed). If the sibling is also full, a new node is added between the original node and the sibling, one third of each node's data is copied to the new node, and finally a pointer to the new node is added to the parent. The B* tree split is quite clever. The B* tree must guarantee that nodes are still two-thirds full after a split, and simply cutting a full node in two, as the B+ tree does, would leave each node only half full, which does not meet the B* tree's requirement. So the B* tree's strategy is to keep inserting into the sibling after a node fills up (which is why the B* tree needs sibling links between non-leaf nodes) until the sibling is full as well. Then the node and its sibling each contribute one third of their elements to create a new node, so that all three nodes end up two-thirds full, which satisfies the B* tree's requirement.
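The second case, where both the node and its sibling are full, can be sketched as follows. This is a simplification under stated assumptions: nodes are plain sorted lists of keys, the separator keys held in the parent are ignored, and the function name split_two_into_three is made up for illustration.

```python
def split_two_into_three(node, sibling, new_key):
    """Spread two full sorted nodes plus one new key over three nodes.

    The keys are divided as evenly as possible, so each resulting node is
    roughly two-thirds full.
    """
    merged = sorted(node + sibling + [new_key])
    n = len(merged)
    first_cut = (n + 2) // 3                           # size of the first node
    second_cut = first_cut + (n - first_cut + 1) // 2  # end of the second node
    return merged[:first_cut], merged[first_cut:second_cut], merged[second_cut:]

node = [10, 20, 30, 40, 50, 60]          # full node, capacity 6
sibling = [70, 80, 90, 100, 110, 120]    # full sibling, capacity 6
parts = split_two_into_three(node, sibling, 35)
print([len(p) for p in parts])
```

With a capacity of 6 keys per node, the three resulting nodes hold 5, 4, and 4 keys, none of them only half full.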

The B+ tree is used as the underlying structure of databases because of the computer's two-level storage structure of memory and mechanical hard disk. Memory supports fast random access (random access means that, given any address, the data stored at that address is returned) but has a small capacity. Random access to a hard disk requires mechanical movement (1 moving the head, 2 rotating the platter), so it is orders of magnitude slower than memory access, but the disk has a much larger capacity. A typical database is far larger than the available memory, which means that retrieving a piece of data from a B+ tree is likely to take several disk IO operations. As shown in the figure below, each read of the next node may be a disk IO operation, but non-leaf nodes are usually loaded into memory early on to speed up access. In addition, to speed up lateral traversal between nodes, a real database may turn the CPU computation / memory read steps shown in blue in the figure into a binary search (the page directory mechanism in InnoDB).

A real B+ tree in a database should be very flat. You can verify how flat InnoDB's B+ tree really is by inserting enough rows into a table in sequential order. We create a test table with only simple fields using the CREATE statement shown below, and then keep adding rows to fill it. From the statistics below (source: Reference 1), several intuitive conclusions can be drawn that show the scale of a B+ tree in a database.

1 Each leaf node stores 468 rows of data, and each non-leaf node stores about 1200 key values. This is a balanced 1200-way search tree!

2 For a table of 22.1G, a B+ tree of height 3 is enough to store it, and that capacity already meets the needs of many applications. If the height grows to 4, the capacity of the B+ tree jumps to a huge 25.9T!

3 For a table of 22.1G with a B+ tree of height 3, loading all non-leaf nodes into memory takes less than 18.8M. (How do we reach this conclusion? A whole tree of height 2, with its 1203 leaf nodes, needs only 18.8M of space, and the 22.1G table of height 3 has 1204 non-leaf nodes. We assume a leaf node is at least as large as a non-leaf node, because leaf nodes store whole rows while non-leaf nodes store only keys and a small amount of data.) With so little memory, retrieving any row needs only one disk IO operation, which is very efficient.
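The figures in this list can be reproduced with a little arithmetic. The sketch below assumes that every page is 16 KiB and completely full, that each leaf page holds 468 rows, and that each non-leaf page points to 1203 children (the leaf count quoted above for a tree of height 2). The function name tree_stats is made up for illustration.

```python
PAGE_SIZE = 16 * 1024   # bytes per page (16 KiB)
ROWS_PER_LEAF = 468     # rows per leaf page, from the statistics above
FANOUT = 1203           # children per non-leaf page, from the statistics above

def tree_stats(height):
    """Return (non_leaf_pages, leaf_pages, rows, total_bytes) for a full tree."""
    leaf_pages = FANOUT ** (height - 1)
    non_leaf_pages = sum(FANOUT ** level for level in range(height - 1))
    rows = leaf_pages * ROWS_PER_LEAF
    total_bytes = (leaf_pages + non_leaf_pages) * PAGE_SIZE
    return non_leaf_pages, leaf_pages, rows, total_bytes

for height in (2, 3, 4):
    non_leaf, leaf, rows, size = tree_stats(height)
    print(height, non_leaf, leaf, rows, f"{size / 2**30:.1f} GiB")
```

Under these assumptions a tree of height 3 has 1204 non-leaf pages, and 1204 pages of 16 KiB come to about 18.8 MiB, which is where the memory estimate in item 3 comes from.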

2 MySQL storage engines and indexes

A database must have indexes; without one, retrieval degrades into a sequential scan, and O(n) time complexity is almost unbearable. It is easy to imagine how a table consisting of a single key can be indexed with a B+ tree: just store the key in the nodes of the tree. When a record contains several fields, one B+ tree can store only the primary key; if the search is on a non-primary-key field, the primary key index is useless and the search again becomes a sequential scan. In that case a second index should be built on the column to be searched, and this index is organized as an independent B+ tree. There are two common ways to solve the problem of several B+ trees accessing the same table data: one is called the clustered index, the other the non-clustered index (secondary index). Although both names contain the word index, they are not separate index types but ways of storing data. With clustered index storage, the row data is stored together with the primary key B+ tree, and a secondary key B+ tree stores only the secondary key and the primary key, so the primary key tree and the secondary key trees are two quite different kinds of tree. With non-clustered index storage, the leaf nodes of the primary key B+ tree store a pointer to the actual data row rather than the row data itself.

InnoDB uses a clustered index: the primary key is organized into a B+ tree and the row data is stored in the leaf nodes. A lookup by primary key with a condition such as "where id = 14" follows the B+ tree search algorithm to the corresponding leaf node and gets the row data there. A search on the Name column takes two steps: the first step searches the secondary index B+ tree for Name and reaches the leaf node that holds the corresponding primary key; the second step uses that primary key to run another search in the primary index B+ tree, reaching the leaf node that holds the entire row.
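The two-step lookup can be sketched with Python dictionaries standing in for the two B+ trees. This is only an illustration of the data flow: a dict is not a B+ tree, the sample rows are made up, and the names primary_index, name_index, find_by_id, and find_by_name are hypothetical.

```python
# Stand-in for the clustered (primary key) tree: primary key -> full row.
primary_index = {
    14: {"id": 14, "name": "Bob", "city": "Oslo"},
    15: {"id": 15, "name": "Alice", "city": "Lima"},
}

# Stand-in for the secondary tree on Name: secondary key -> primary key.
name_index = {"Bob": 14, "Alice": 15}

def find_by_id(pk):
    """One lookup: the clustered tree's leaf already holds the row."""
    return primary_index.get(pk)

def find_by_name(name):
    """Two lookups: secondary tree gives the primary key, clustered tree gives the row."""
    pk = name_index.get(name)          # step 1: secondary index
    if pk is None:
        return None
    return primary_index.get(pk)       # step 2: primary index

print(find_by_id(14))
print(find_by_name("Alice"))
```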

MyISAM uses non-clustered indexes. The two B+ trees of a non-clustered index look no different: the node structure is exactly the same and only the stored content differs, with the primary key index B+ tree storing primary keys and the secondary key index B+ tree storing secondary keys. The table data is stored in a separate place, and the leaf nodes of both B+ trees point to it using the real address of the table data; as far as the table data is concerned, the two keys are no different. Because the index trees are independent, a search by secondary key does not need to access the primary key index tree.

To illustrate the difference between these two kinds of index more vividly, imagine a table that stores 4 rows of data, with Id as the primary index and Name as the secondary index. The figure clearly shows the difference between the clustered index and the non-clustered index.

We focus on the clustered index. It looks clearly less efficient than the non-clustered index, because every search through a secondary index has to go through two B+ tree lookups. Isn't that unnecessary extra work? What are the advantages of the clustered index?

1 Since the row data is stored together with the leaf nodes, the primary key and the row data are loaded into memory together; once the leaf node is found, the row data can be returned immediately. If the data is organized by the primary key Id, it is retrieved faster.

2 The benefit of having secondary indexes use the primary key as a "pointer" instead of an address value is less maintenance work on the secondary indexes when a row moves or a data page splits. Using the primary key value as the pointer makes secondary indexes take up more space; in exchange, InnoDB does not have to update this "pointer" in the secondary indexes when a row moves. That is, the position of a row (located in the implementation through 16K Pages, covered later) changes as the data in the database is modified (through the B+ tree node splits described earlier and Page splits), and with a clustered index, however the nodes of the primary key B+ tree change, the secondary index trees are not affected.
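The maintenance difference in point 2 can be made concrete with a small sketch. Dictionaries again stand in for index trees, the data and all names (move_row_address_based, move_row_clustered, and so on) are made up, and integer addresses and page labels are placeholders rather than real storage locations.

```python
# Address-based layout: every index stores the row's physical address.
id_index = {14: 0, 15: 1}
name_index = {"Bob": 0, "Alice": 1}

def move_row_address_based(indexes, old_addr, new_addr):
    """Repoint every index entry that refers to old_addr; return how many changed."""
    changed = 0
    for index in indexes:
        for key, addr in list(index.items()):
            if addr == old_addr:
                index[key] = new_addr
                changed += 1
    return changed

# Primary-key "pointers": only the clustered tree knows where a row lives.
clustered_location = {14: "page 5", 15: "page 5"}
name_to_pk = {"Bob": 14, "Alice": 15}

def move_row_clustered(pk, new_page):
    """Moving a row touches the clustered tree only; secondary indexes keep the same pk."""
    clustered_location[pk] = new_page

changed = move_row_address_based([id_index, name_index], 0, 7)
before = dict(name_to_pk)
move_row_clustered(14, "page 9")
print(changed, name_to_pk == before)
```

In the address-based layout the number of entries to repoint grows with the number of indexes, while in the primary-key layout the secondary index is left exactly as it was.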

3 Page structure

If the preceding content leaned toward explaining principles, from here on we deal with the concrete implementation.

Understanding InnoDB's implementation requires talking about the Page structure. The Page is the most basic building block of InnoDB storage and the smallest unit of InnoDB disk management; everything related to the database is stored in Pages. Pages come in several types; common ones are the data page (B-tree Node), the Undo page (Undo Log Page), the system page (System Page), and the transaction data page (Transaction System Page). A single Page is 16K in size (controlled by the compile-time macro UNIV_PAGE_SIZE), and each Page is uniquely identified by a 32-bit int value, which corresponds exactly to InnoDB's maximum storage capacity of 64TB (16KiB * 2^32 = 64TiB). The basic structure of a Page is shown below:

Every Page has a common header and trailer, but the content in the middle varies with the Page type. The Page header holds some data we care about; the figure below shows the details of the Page header:

We focus on the fields related to how data is organized: the Page header stores two pointers, one to the previous Page and one to the next Page, as well as the Page type and the number that uniquely identifies the Page. From these two pointers it is easy to picture the Pages linked together as a doubly linked list.
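As a rough sketch of this doubly linked list, the header fields mentioned above can be modelled with a small Python dataclass. The class and field names (PageHeader, page_no, prev_page, next_page) are made up for illustration and are not InnoDB's actual field names; the last line also rechecks the 16KiB * 2^32 = 64TiB arithmetic.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 16 * 1024  # bytes per Page (16 KiB)

@dataclass
class PageHeader:
    page_no: int                      # number that uniquely identifies the Page
    page_type: str                    # e.g. "data", "undo"
    prev_page: Optional[int] = None   # page number of the previous Page
    next_page: Optional[int] = None   # page number of the next Page

def link(pages):
    """Chain the given headers into a doubly linked list, in list order."""
    for left, right in zip(pages, pages[1:]):
        left.next_page = right.page_no
        right.prev_page = left.page_no

def walk_forward(by_no, start):
    """Follow next_page pointers from start and return the page numbers visited."""
    order = []
    current = start
    while current is not None:
        order.append(current)
        current = by_no[current].next_page
    return order

pages = [PageHeader(n, "data") for n in (3, 47, 9)]
link(pages)
by_no = {p.page_no: p for p in pages}
print(walk_forward(by_no, 3))
print(PAGE_SIZE * 2**32 // 2**40, "TiB")
```

Note that the logical order of the list (3, 47, 9 here) does not have to match the numeric order of the page numbers.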

Now look at the body of the Page. We focus on how rows and index data are stored: they live in the User Records section of the Page, which takes up most of the Page's space. User Records is made up of individual Records, and each Record represents a node in an index tree (non-leaf or leaf). Inside a Page, the head and tail of the singly linked list of records are represented by two records with fixed content: the string "Infimum" marks the beginning and "Supremum" marks the end. These two Records, which mark the beginning and the end, are stored in the System Records section; System Records and User Records are two parallel sections. InnoDB has 4 different kinds of Record: 1 primary key index tree non-leaf node, 2 primary key index tree leaf node, 3 secondary key index tree non-leaf node, 4 secondary key index tree leaf node. The Record formats of these four node types differ somewhat, but all of them store a Next pointer to the next Record. The four node types are described in detail later; for now it is enough to regard a Record as a singly linked list node that stores data together with a Next pointer.

User Records exist in the Page as a singly linked list. Initially the data is laid out in insertion order, but as new data is inserted and old data is deleted, the physical order of the data becomes scrambled, while the logical order is still maintained.

Combining the way User Records are organized with several Pages gives a slightly more complete picture.

Now look at how to locate a Record:

1 Traverse a B+ tree index from the root, passing through the layers of non-leaf nodes until reaching a Page that stores leaf nodes.

2 Within the Page, traverse the singly linked list starting from the "Infimum" node (this traversal is usually optimized). If the key is found, return successfully. If the "Supremum" record is reached, the current Page holds no matching key; in that case follow the Page's Next Page pointer to the next Page and continue searching one by one from its "Infimum".
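Step 2 above can be sketched in Python. This is a minimal model, not InnoDB's real layout: the Record and Page classes and the find function are made up for illustration, the descent through the B+ tree in step 1 is left out, and the scan is a plain linear walk without the optimization mentioned in the text.

```python
class Record:
    """A singly linked list node holding a key, a value, and a Next pointer."""
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.next = None

class Page:
    """Records chained in key order between fixed Infimum and Supremum records."""
    def __init__(self, rows):
        self.infimum = Record("Infimum")
        self.supremum = Record("Supremum")
        self.next_page = None
        tail = self.infimum
        for key, value in sorted(rows, key=lambda row: row[0]):
            tail.next = Record(key, value)
            tail = tail.next
        tail.next = self.supremum

def find(page, key):
    """Scan from Infimum; on reaching Supremum, continue in the next Page."""
    while page is not None:
        record = page.infimum.next
        while record is not page.supremum:
            if record.key == key:
                return record.value
            record = record.next
        page = page.next_page
    return None

first = Page([(3, "c"), (1, "a")])
second = Page([(5, "e")])
first.next_page = second
print(find(first, 5), find(first, 4))
```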

Now look in detail at what data the different types of Record store. Depending on the B+ tree node, User Records can be divided into four formats, distinguished by color in the figure.

1 Primary index tree non-leaf node (green)

1 The smallest primary key value stored in the child node (Min Cluster Key on Child). The B+ tree requires it; its role is to locate the specific record within a Page.

2 The Page number where that minimum value is located (Child Page Number); its role is to locate the Record.

2 Primary index tree leaf node (yellow)

1 The primary key (Cluster Key Fields), required by the B+ tree and also part of the data row.

2 All columns other than the primary key (Non-Key Fields), that is, the set of all remaining columns of the data row once the primary key is removed.

Parts 1 and 2 together make up a complete data row.

3 Secondary index tree non-leaf node (blue)

1 The smallest secondary key value stored in the child node (Min Secondary-Key on Child). The B+ tree requires it; its role is to locate the specific record within a Page.

2 The primary key value (Cluster Key Fields). Why should a non-leaf node store the primary key? Because secondary key values are not necessarily unique, while the B+ tree requires key values to be unique, so the secondary key value and the primary key value are combined into the real key of the B+ tree to guarantee uniqueness. As a result, a non-leaf node of the secondary index B+ tree is actually 4 bytes larger than a leaf node. (That is, in the figure the blue node is 4 bytes larger than the red one.)

3 The Page number where that minimum value is located (Child Page Number); its role is to locate the Record.

4 Secondary index tree leaf node (red)

1 The secondary key value (Secondary Key Fields), required by the B+ tree.

2 The primary key value (Cluster Key Fields), used to do another B+ tree search in the primary index tree to find the whole record.
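The idea of combining the secondary key with the primary key to get a unique tree key can be sketched with sorted tuples. A sorted Python list searched with the standard bisect module stands in for the secondary B+ tree; the sample names and ids and the function name primary_keys_for are made up for illustration.

```python
import bisect

# Each entry is (secondary key, primary key). The pair is unique even
# though the secondary key "Bob" appears twice.
entries = sorted([("Bob", 15), ("Alice", 18), ("Bob", 14), ("Eric", 20)])

def primary_keys_for(name):
    """Return the primary keys of all entries whose secondary key equals name."""
    # (name,) sorts before every (name, pk), so this finds the first match.
    start = bisect.bisect_left(entries, (name,))
    result = []
    for secondary, primary in entries[start:]:
        if secondary != name:
            break
        result.append(primary)
    return result

print(primary_keys_for("Bob"))
```

Each primary key found this way would then be used for a second search in the primary index tree, as described for the leaf node above.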

What follows is the most important part of this article. Combining the B+ tree structure with the contents of the four kinds of Record described above, we can finally draw a panorama. Because the secondary index B+ tree has a structure similar to the primary key index, only the structure diagram of the primary key index tree is shown here; it contains only two kinds of node, "primary key non-leaf node" and "primary key leaf node", i.e. the green and yellow parts of the figure.

Reducing that figure to a more compact tree diagram gives the one below, which shows part of the B+ tree. Note that there is no one-to-one relationship between Pages and B+ tree nodes. A Page is only a storage container for Records; it exists to make batch management of disk space convenient. In the figure above, the Page numbered 47 is split into two separate nodes in the tree structure.

That is the end of this article. It only sorts out and summarizes the data structures and implementation of InnoDB indexes and does not cover practical MySQL experience, mainly for the following reasons:

1 Principles are the foundation: only by fully understanding how InnoDB indexes work can we use them efficiently.

2 Knowledge about principles is particularly well suited to diagrams, a form of expression I personally like.

3 On InnoDB optimization, the book "High Performance MySQL" has a more comprehensive treatment. Readers interested in MySQL optimization can study it on their own; I have not yet accumulated enough experience to share on this topic.

Reproduced from: https://juejin.im/post/5cf0c91df265da1bb27715f7

Origin blog.csdn.net/weixin_33973600/article/details/91441851