Understanding MySQL Indexes Thoroughly in One Article

Foreword

MyISAM and InnoDB are the two most commonly used MySQL storage engines, and this article describes and compares them in detail. Readers can look up MySQL's other storage engines on their own.

This article first illustrates how the index structures of the two engines differ and then explains the principles behind indexes. Once you understand it, you will understand the reasons behind the various index optimization rules.

Due to space limitations, topics not covered in this article will be explained one by one in subsequent posts, for example MySQL's locking mechanisms, the rules for when a multi-column index takes effect, and index optimization.

The table structure below is used when describing how the engines' structures differ. Read it carefully, as it makes the later sections easier to follow.

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'unique ID',
  `age` int(5) NOT NULL COMMENT 'age',
  `name` varchar(5) NOT NULL COMMENT 'name',
  PRIMARY KEY (`id`),
  KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=92 DEFAULT CHARSET=utf8mb4;

"B- tree", "B tree", and "B-tree" all name the same data structure. The hyphen in the English name is sometimes read as a minus sign in translation, so some people mistakenly think these are different kinds of tree. Many blog posts that explain tree data structures therefore mislead beginners completely, so readers should distinguish carefully.

Both MyISAM and InnoDB use the B+ tree data structure for their indexes, so the B-tree and the B+ tree are introduced next.

B-tree and B+ tree

B-tree

A B-tree is a multi-way search tree, defined as follows:

Any non-leaf node has at most M children, and M > 2.
The root has between 2 and M children.
A non-leaf node other than the root has between ceil(M/2) and M children.
Each node stores at least ceil(M/2) - 1 and at most M - 1 keys (the root may hold as few as one).
The number of keys in a non-leaf node = the number of child pointers - 1.
The keys of a non-leaf node are K[1], K[2], ..., K[M-1], with K[i] <= K[i+1].
The pointers of a non-leaf node are P[1], P[2], ..., P[M], where P[1] points to the subtree whose keys are smaller than K[1], P[M] points to the subtree whose keys are larger than K[M-1], and every other P[i] points to the subtree whose keys lie in (K[i-1], K[i]).
All leaf nodes are at the same level.

The figure below shows a B-tree of order M = 4.

A B-tree search starts at the root and runs a binary search over the (ordered) keys inside the node. If the key is found, the search ends. Otherwise it descends into the child whose range covers the key, and repeats until the corresponding child pointer is empty or the node is a leaf.

The process of looking up file 29:

Follow the root pointer to disk block 1, the root of the file directory, and load its contents into memory. (Disk I/O 1)
Memory now holds two file names, 17 and 35, plus three pointers to other disk pages. Since 17 < 29 < 35, we take pointer p2.
Following pointer p2, we locate disk block 3 and load its contents into memory. (Disk I/O 2)
Memory now holds two file names, 26 and 30, plus three pointers to other disk pages. Since 26 < 29 < 30, we take pointer p2.
Following pointer p2, we locate disk block 8 and load its contents into memory. (Disk I/O 3)
Memory now holds two file names, 28 and 29. We find file name 29 and with it the disk address of the file's data.
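
The lookup above can be sketched in a few lines of Python. This is a minimal in-memory model, not how a storage engine is implemented: each node visited stands for one disk read, the `Node` class and `search` function are names invented for this sketch, and every key that does not appear in the walkthrough (everything except 17/35, 26/30, and 28/29) is made up so that the tree is complete.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys held by this node
        self.children = children or []  # empty for a leaf

def search(root, key):
    """Return (found, nodes_read); each node visited stands for one disk read."""
    node, nodes_read = root, 0
    while node is not None:
        nodes_read += 1
        i = bisect_left(node.keys, key)  # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return True, nodes_read      # a hit may happen in a non-leaf node
        # Otherwise descend into the child whose range covers the key.
        node = node.children[i] if node.children else None
    return False, nodes_read

# Keys 17/35, 26/30 and 28/29 come from the walkthrough; all other keys are made up.
root = Node([17, 35], [
    Node([8, 12], [Node([3, 5]), Node([9, 10]), Node([13, 15])]),
    Node([26, 30], [Node([18, 20]), Node([28, 29]), Node([31, 33])]),
    Node([65, 87], [Node([36, 60]), Node([75, 79]), Node([90, 99])]),
])

print(search(root, 29))  # (True, 3): three nodes read, matching the three disk I/Os
print(search(root, 26))  # (True, 2): the search ends in a non-leaf node
```

The second call shows a property listed in the features below: because a B-tree keeps keys in every node, a search can finish before it reaches a leaf.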

The animation below shows insertions into a B-tree of order 4.

B-tree features:

Keys are distributed across all nodes of the tree.
Each key appears in exactly one node.
A search may end at a non-leaf node.
Search performance is equivalent to a binary search over the complete set of keys.

B+ tree

The figure below shows a B+ tree of order M = 3.

The B+ tree structures generally used in database systems and file systems are optimized versions of the classic B+ tree, with sequential access pointers added.

A B+ tree is a variant of the B-tree. In summary, the B+ tree used for database indexes differs from the B-tree as follows:

A non-leaf node has the same number of child pointers as keys.
The child pointer P[i] of a non-leaf node points to the subtree whose keys lie in [K[i], K[i+1]) (note that the interval is closed on the left and open on the right).
A chain pointer is added to every leaf node.
All keys appear in the leaf nodes.

B+ tree features:

All keys appear in the linked list of leaf nodes, and the keys in the list are in order.
A search can only hit in a leaf node.
The non-leaf nodes act as an index over the leaf nodes, and the leaf nodes are the data layer that stores the data for each key.
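
To make the leaf chain concrete, here is a small Python sketch of a two-level B+ tree. It assumes the convention listed above (a non-leaf node has as many child pointers as keys, and child i covers the half-open interval starting at key i). The class names, function names, and key values are invented for this illustration, and a real leaf would also carry the data for each key.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys; a real leaf also stores each key's data
        self.next = None   # chain pointer to the next leaf

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # keys only, no row data
        self.children = children  # children[i] covers [keys[i], keys[i+1])

def find_leaf(node, key):
    """Descend to the leaf that would contain key."""
    while isinstance(node, Internal):
        i = max(bisect_right(node.keys, key) - 1, 0)
        node = node.children[i]
    return node

def range_scan(root, low, high):
    """Return all keys in [low, high] by walking the leaf chain."""
    leaf, result = find_leaf(root, low), []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result

# Hypothetical key values, three keys per leaf.
leaf_a, leaf_b, leaf_c = Leaf([5, 8, 10]), Leaf([28, 30, 35]), Leaf([65, 73, 95])
leaf_a.next, leaf_b.next = leaf_b, leaf_c
root = Internal([5, 28, 65], [leaf_a, leaf_b, leaf_c])

print(range_scan(root, 8, 30))  # [8, 10, 28, 30]
```

The range scan descends the tree only once, to find the first leaf. After that it follows `next` pointers, which is what the chain pointer on the leaf nodes is for.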

Why B-trees and B+ trees are used for indexes

Before explaining this problem, we need to understand some basics.

The principle of locality and disk read-ahead

Because of the nature of the storage medium, disk access is much slower than main memory access. Once the cost of mechanical movement is added, disk access speed is often only a few hundredths of main memory speed, so disk I/O should be minimized to improve efficiency. To that end, a disk usually does not read strictly on demand but reads ahead every time: even if only one byte is needed, the disk starts from that position and sequentially reads a certain length of data into memory. The rationale is the well-known principle of locality in computer science:

When a piece of data is used, nearby data is usually used soon afterwards. The data a program needs while running tends to be clustered.

Because sequential disk reads are very efficient (no seek time and very little rotational latency), read-ahead improves I/O efficiency for programs that exhibit locality.

The read-ahead length is generally an integer multiple of the page size. A page is the logical block of computer memory management: hardware and operating systems usually divide main memory and disk storage into contiguous blocks of equal size, each called a page (in many operating systems a page is typically 4 KB), and main memory and disk exchange data in units of pages. When the data a program wants to read is not in main memory, a page fault is triggered. The system then sends a read signal to the disk, the disk finds the start of the data and reads one or more consecutive pages into memory, the fault handler returns, and the program continues running.

Performance analysis of B-/B+ tree indexes

Generally, the number of disk I/O operations is used to evaluate how good an index structure is. A B-tree lookup clearly needs to visit at most h nodes, where h is the height of the tree (see the lookup of file 29 above). Database designers cleverly exploit disk read-ahead by making the size of a node equal to one page, so that each node can be fully loaded with a single I/O.

To achieve this, real B-tree implementations use the following technique:

Every time a node is created, it requests a whole page of space directly. This guarantees that a node is physically stored within a single page, and since storage allocation is page-aligned, a node can be read with one I/O.
A B-tree lookup needs at most h-1 I/Os (the root node stays resident in memory). In practice the degree d (the number of branches per node) is very large, typically above 100, so h is very small, usually no more than 3.

In summary, a B-tree is a very efficient index structure.

With a red-black tree or another balanced binary tree, by contrast:

h is clearly much larger, so efficiency is lower.
Logically adjacent nodes (parent and child) may be physically far apart, so locality cannot be exploited.
Each node stores too little data, which wastes disk space and causes frequent I/O operations.

So these other tree structures are clearly much less efficient than a B-tree.
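
A rough calculation makes the height difference concrete. The sketch below is a simplification under stated assumptions: a fan-out of 100 (the "above 100" figure from the text), one million keys (a number picked only for illustration), and completely full nodes, which real trees do not have. Both function names are invented for this example.

```python
def btree_height(num_keys, fanout):
    """Levels needed when each level multiplies capacity by fanout (full nodes assumed)."""
    height, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height

def binary_tree_height(num_keys):
    """Levels of a perfectly balanced binary tree: h levels hold 2**h - 1 keys."""
    return num_keys.bit_length()

NUM_KEYS = 1_000_000  # hypothetical table size
FANOUT = 100          # the "typically above 100" figure from the text

print(btree_height(NUM_KEYS, FANOUT))  # 3
print(binary_tree_height(NUM_KEYS))    # 20
```

Even in the best case a binary tree needs about 20 levels for the same data, and every level may cost a separate disk read because its nodes are not packed into pages.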

Advantages of the B+ tree over the B-tree for indexes

B+ tree reads and writes cost less disk I/O: the internal nodes of a B+ tree hold no pointers to the data for each key, so they are smaller than B-tree internal nodes. If all the keys of one internal node are stored in the same disk block, the block can hold more keys, so more of the keys being searched are read into memory at once, and the number of I/O reads and writes drops accordingly.
B+ tree query efficiency is more stable: all data is stored in the leaf nodes, so every key lookup follows a path of the same length and every query costs about the same.
While the B-tree improves disk I/O performance, it does not solve the inefficiency of traversing elements, and the B+ tree emerged precisely to solve this problem. A B+ tree can traverse the whole tree just by walking its leaf nodes.

I believe the third reason is the main reason MySQL uses B+ tree rather than B-tree indexes. After all, MongoDB uses B-tree indexes, so neither data structure is absolutely better. It depends on actual business needs.

MyISAM

Disk Storage

MyISAM stores each table on disk as three files. Each file name begins with the table name, and the extension indicates the file type.

.frm: stores the table definition.
.MYD: stores the table data.
.MYI: stores the table indexes.

Index

Primary key index

The MyISAM engine uses a B+ tree as its index structure. The data field of a leaf node stores the address of the data record.

MyISAM keeps indexes and data in separate files. The index file only stores a pointer (the physical location) to the page where the record resides. The page is read through that address, and then the indexed row is read.

The leaves of the tree store the physical location of the corresponding row. With that value the storage engine can go back to the table and fetch the complete row. Each leaf page also stores a pointer to the next leaf page, which makes range traversal over the leaf nodes easy.

Secondary indexes

In MyISAM, the primary key index and secondary indexes have no structural difference. The only distinction is that primary key index keys must be unique, while secondary index keys may repeat.
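
The separation of index and data can be modeled in a few lines of Python. This is only a sketch: plain dicts stand in for the B+ trees in the .MYI file, a list stands in for the .MYD file with the list position playing the role of the physical address, and the sample rows and function names are made up.

```python
# Stand-in for the .MYD data file: rows in insertion order.
# The list position plays the role of the row's physical address.
data_file = [
    {"id": 15, "age": 34, "name": "Bob"},
    {"id": 18, "age": 77, "name": "Alice"},
    {"id": 20, "age": 5, "name": "Jim"},
    {"id": 30, "age": 91, "name": "Bob"},
]

# Stand-ins for the .MYI index file. Both indexes map a key to row addresses;
# the only difference is that primary keys are unique.
primary_index = {}  # id -> one address
name_index = {}     # name -> list of addresses (duplicates allowed)
for address, row in enumerate(data_file):
    primary_index[row["id"]] = address
    name_index.setdefault(row["name"], []).append(address)

def find_by_id(user_id):
    address = primary_index.get(user_id)
    return None if address is None else data_file[address]

def find_by_name(name):
    return [data_file[address] for address in name_index.get(name, [])]

print(find_by_id(18)["name"])                      # Alice
print([row["id"] for row in find_by_name("Bob")])  # [15, 30]
```

Both lookups end the same way: the index yields an address, and the row is then read from the data file.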

InnoDB

Starting with MySQL 5.5, InnoDB is the default storage engine.

Disk Storage

InnoDB has two storage modes: a shared tablespace and per-table tablespaces.

An InnoDB table has only a table structure file and data files.

The table structure file is the same as in MyISAM: its name begins with the table name and has the .frm extension.

The data files depend on the storage mode:

With a shared tablespace, the data and indexes of all tables are stored in one tablespace, which can consist of multiple files. The location and names of the shared tablespace files are set through the innodb_data_file_path and innodb_data_home_dir parameters, and the files are generally named ibdata1 through ibdataN.
With per-table tablespaces, each table has its own tablespace file that stores its data and indexes. The file name begins with the table name and has the .ibd extension.

Index

Primary key index

In the InnoDB primary key index, the leaf nodes store both the primary key value and the row data.

Secondary indexes

For a secondary index, InnoDB stores the primary key value in the leaf page and uses it to go back to the table (through the primary key index described above) to fetch the complete record. A lookup through a secondary index therefore actually performs two lookups and is necessarily less efficient than a lookup by primary key.
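
The two-step lookup can be sketched as follows. Dicts again stand in for B+ trees, the sample rows are made up, and the traversal counter is an illustrative device for this sketch rather than anything InnoDB reports.

```python
# Clustered (primary key) index: the leaf holds the full row.
clustered_index = {
    15: {"id": 15, "age": 34, "name": "Bob"},
    18: {"id": 18, "age": 77, "name": "Alice"},
    20: {"id": 20, "age": 5, "name": "Jim"},
    30: {"id": 30, "age": 91, "name": "Bob"},
}

# Secondary index on name: the leaf holds primary key values, not rows.
name_index = {"Alice": [18], "Bob": [15, 30], "Jim": [20]}

def find_by_id(user_id):
    """One index traversal: the row sits in the clustered index leaf."""
    return clustered_index.get(user_id), 1

def find_by_name(name):
    """One traversal of the secondary index, then one clustered lookup per match."""
    traversals, rows = 1, []
    for primary_key in name_index.get(name, []):
        rows.append(clustered_index[primary_key])
        traversals += 1
    return rows, traversals

print(find_by_id(18)[1])         # 1
print(find_by_name("Alice")[1])  # 2
```

Fetching one row by name costs two traversals instead of one, which is the extra lookup the paragraph above describes.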

Differences between MyISAM and InnoDB

1. Storage structure

MyISAM stores a table as three files: .frm (table structure), .MYD (table data), and .MYI (table indexes). For InnoDB, as described above, the storage structure depends on the storage mode.

2. Transaction support

MyISAM does not support transactions, whereas InnoDB does, providing transaction rollback and crash recovery.

3. Primary keys and foreign keys

MyISAM does not support foreign keys, whereas InnoDB does. MyISAM allows a table without a primary key, but an InnoDB table must have one: if none is specified, InnoDB automatically generates a hidden 6-byte primary key.

4. Lock

MyISAM supports only table-level locks, whereas InnoDB supports row-level locks and so handles concurrency comparatively well. However, InnoDB sets row locks on index records, so they stay narrow only when the WHERE clause can use an index (such as the primary key). A statement that cannot use an index scans and locks every row, which effectively locks the entire table.

5. Index

MyISAM uses a B+ tree as its index structure. Leaf nodes store the address of the data record. Primary key index keys are unique and secondary index keys may repeat, but the two are structurally identical. InnoDB also uses a B+ tree as its index structure, and the data table itself is organized as a B+ tree: the leaf nodes of the primary key index hold the complete data record in their data field, while the data field of a secondary index stores the primary key of the record.

FAQ

Why does MongoDB use B-tree indexes while MySQL uses B+ tree indexes?

MongoDB is not a traditional relational database but a NoSQL store that keeps data in a JSON-like format, aiming for high performance, high availability, and easy scaling. First, having moved away from the relational model, it has less need for range queries and traversal queries. Second, because MySQL's B+ tree keeps data in the leaf nodes, every query has to reach a leaf, whereas in the B-tree that MongoDB uses every node has a data field, so a query can return as soon as it finds the specified key.

Overall, MySQL's choice of the B+ tree and MongoDB's choice of the B-tree each follow from their own needs.

Index-related Glossary

Normal index

An index built on ordinary columns of a table, with no restrictions.

Unique index

The values of a unique index column must be unique, but NULL values are allowed. For a composite index, the combination of column values must be unique.

Primary key index

An index built on the primary key. It allows neither duplicates nor NULL values.

Full-text index

Originally available only for MyISAM tables (InnoDB has supported FULLTEXT indexes since MySQL 5.6). For large data sets, building a full-text index costs a lot of time and space: when a FULLTEXT index is built, a list of words is generated from the text, and indexing is done according to that word list.

Composite index

Also known as a joint index: an index built on a combination of multiple columns. MySQL itself permits NULL values in the indexed columns, although declaring them NOT NULL is common practice. The index can be specified when creating the table or added later by altering the table structure.

ALTER TABLE `table_name` ADD INDEX index_name (`col1`, `col2`, `col3`);

To get more efficiency out of MySQL, you can create a composite index, which follows the "leftmost prefix" principle. When creating a composite index, put the column used most frequently as a constraint on the far left, followed by the others in decreasing order of use. A composite index on (col1, col2, col3) is equivalent to having the three indexes (col1), (col1, col2), and (col1, col2, col3). A condition on col2 or col3 alone cannot use the index.

Leftmost prefix rule

Suppose a joint index consists of the columns (a, b, c). The column sequences that satisfy the leftmost prefix rule are: a; a, b; and a, b, c. SELECT, WHERE, ORDER BY, and GROUP BY can all match a leftmost prefix. Cases that do not satisfy the leftmost prefix rule will not use the joint index.
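
The rule can be expressed as a tiny function. This is a simplified model written for this article, not the MySQL optimizer: it considers only equality conditions and ignores range conditions and any optimizer shortcuts. `usable_prefix` is a made-up name.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that equality conditions can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # the first gap ends the usable prefix
        used.append(column)
    return used

index = ("a", "b", "c")

print(usable_prefix(index, {"a", "b"}))  # ['a', 'b']
print(usable_prefix(index, {"a", "c"}))  # ['a']: c cannot be used because b is missing
print(usable_prefix(index, {"b", "c"}))  # []: without a, the index cannot be used
```

The reason behind the rule is the sort order: entries in the index are ordered by a first, then by b within equal a, then by c. Without a value for a, the matching entries are scattered across the whole index.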

Clustered index

Definition: the physical order of the data rows is the same as the logical order of the column values (usually the primary key column). A table can have only one clustered index.

If a primary key is defined, InnoDB uses it as the clustered index. If no primary key is defined, InnoDB chooses a unique index that contains no NULL values as the clustered index. If there is no such unique index, InnoDB uses a built-in 6-byte row ID as a hidden clustered index. This row ID increases as records are written, like an auto-increment primary key, but it cannot be referenced or viewed and is used only inside the database engine.

InnoDB stores the data records themselves in the leaf nodes of the primary index (a B+ tree). This requires the records within each leaf node to be stored in primary key order, so whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key. If we use an auto-increment primary key, each new record is inserted in order after the previous one, appended at the tail of the current index node. When a page reaches its load factor (15/16 by default in InnoDB), a new page (node) is opened.

With a non-auto-increment primary key (such as an ID card number or a student number), each inserted primary key value is approximately random, so each new record has to be inserted somewhere in the middle of an existing index page. MySQL then has to move data to put the new record in the right position. The target page may even have been written back to disk and evicted from the cache, in which case it has to be read back from disk. This adds a lot of overhead, and the frequent moves and page splits cause heavy fragmentation and a less compact index structure, which later has to be fixed by rebuilding the table with OPTIMIZE TABLE to refill the pages.
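
A small simulation shows the effect of insert order on page usage. It is a toy model under stated assumptions, not InnoDB's actual page management: pages hold a hypothetical 16 rows, a full page is split in half when a row must go into its middle, and a new page is opened only when a row is appended past the end of the last page.

```python
import random
from bisect import bisect_right, insort

PAGE_CAPACITY = 16  # hypothetical number of rows per leaf page

def simulate(keys):
    """Insert keys one at a time into sorted pages; return (fill ratio, page splits)."""
    pages, splits = [[]], 0
    for key in keys:
        # Target page: the last page whose first key is <= key.
        first_keys = [page[0] for page in pages if page]
        idx = max(bisect_right(first_keys, key) - 1, 0)
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # tail insert: open a new page, move nothing
        else:
            mid = PAGE_CAPACITY // 2  # mid-page insert: split and move half the rows
            left, right = page[:mid], page[mid:]
            insort(right if key >= right[0] else left, key)
            pages[idx:idx + 1] = [left, right]
            splits += 1
    fill = sum(len(page) for page in pages) / (len(pages) * PAGE_CAPACITY)
    return fill, splits

sequential = list(range(1, 1601))
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

seq_fill, seq_splits = simulate(sequential)
rnd_fill, rnd_splits = simulate(shuffled)
print(seq_fill, seq_splits)  # 1.0 0: every page is full and nothing was moved
print(rnd_fill, rnd_splits)
```

With sequential keys every page ends up full and no rows are ever moved. With shuffled keys the same rows need page splits and leave the pages only partly filled, which is the fragmentation described above.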

Non-clustered index

Definition: the logical order of the index entries differs from the physical storage order of the rows on disk. A table can have multiple non-clustered indexes.

Apart from InnoDB's primary key index, all other indexes in MySQL tables are non-clustered indexes.

Covering index

A covering index means that the needed columns can be obtained from a secondary index alone, without looking up the record in the primary key index. The advantage is that a secondary index does not contain the whole row, so it is much smaller than the clustered index and can save a lot of I/O.

When a covering query fails

The SELECT list contains a column that is not in the index, i.e. the index does not cover all the selected columns.
The WHERE condition applies a LIKE operation to an indexed column.
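
The first failure case can be checked mechanically. The sketch below is a simplified model that assumes an InnoDB secondary index, whose leaves also carry the primary key value. It does not model the LIKE case, and `is_covered` is a name invented for this example. The column names come from the `user` table at the start of the article.

```python
def is_covered(index_columns, primary_key, selected, filtered):
    """True if the query can be answered from the secondary index alone."""
    # InnoDB secondary index leaves also store the primary key value.
    available = set(index_columns) | set(primary_key)
    return (set(selected) | set(filtered)) <= available

# The `user` table: PRIMARY KEY (id), KEY name (name).
print(is_covered(["name"], ["id"], selected=["id", "name"], filtered=["name"]))  # True
print(is_covered(["name"], ["id"], selected=["age"], filtered=["name"]))         # False
```

In the second call the `age` column lives only in the clustered index, so the query has to go back to the table for every matching row.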