The underlying implementation principles of MySQL indexes

Excellent blog posts:

  1. The data structure and algorithm principle behind MySQL index
  2. B tree, B-tree, B+ tree, B* tree (repost), MySQL index
  3. Those things about MySQL and B-trees

The nature of indexes

MySQL's official definition of an index is: an index is a data structure that helps MySQL obtain data efficiently. Extract the backbone of that sentence and you get the essence of an index: an index is a data structure.

We know that querying is one of the most important functions of a database, and we all want queries to be as fast as possible, so database designers optimize from the perspective of query algorithms. The most basic query algorithm is of course linear search, but an algorithm with O(n) complexity is obviously bad when the amount of data is large. Fortunately, computer science provides many better search algorithms, such as binary search and binary tree search. A little analysis shows that each search algorithm can only be applied to a specific data structure: binary search requires the searched data to be ordered, and binary tree search can only be applied to binary search trees, yet the data itself cannot be organized to satisfy every structure at once (for example, it is impossible to keep two columns physically ordered at the same time). Therefore, in addition to the data, the database system maintains data structures that satisfy specific search algorithms; these structures reference (point to) the data in some way, so that advanced search algorithms can be run on top of them. This kind of data structure is the index.

See an example:
[Figure: on the left, a data table with two columns and seven records; on the right, a binary search tree built on Col2 whose nodes point to the records' physical addresses]

The figure above shows one possible way of indexing. On the left is the data table, with two columns and seven records; the leftmost column is the physical address of each record (note that logically adjacent records are not necessarily physically adjacent on disk). To speed up searches on Col2, the binary search tree shown on the right can be maintained: each node contains an index key value and a pointer to the physical address of the corresponding data record, so that binary search can locate the data within \(O(\log_2 n)\) complexity.
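
To make the idea concrete, here is a minimal Python sketch (not anything MySQL actually does; the column values and "physical addresses" are invented for illustration) of an index kept as a separate, ordered structure that points back at the records:

import bisect

# Records stored in arbitrary physical order: address -> (Col1, Col2)
records = {0x07: (5, 34), 0x1D: (1, 77), 0x24: (7, 5),
           0x31: (2, 91), 0x4A: (4, 22), 0x56: (3, 89), 0x6F: (6, 45)}

# Index on Col2: a sorted list of (key, address) pairs kept apart from the data
index_col2 = sorted((col2, addr) for addr, (_, col2) in records.items())
keys = [k for k, _ in index_col2]

def lookup_col2(value):
    """Binary-search the index, then follow the pointer to the record."""
    i = bisect.bisect_left(keys, value)        # O(log n) comparisons
    if i < len(keys) and keys[i] == value:
        return records[index_col2[i][1]]       # one more access for the row itself
    return None

print(lookup_col2(89))   # -> (3, 89)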

While this is a bona fide index, practical database systems are rarely implemented using binary search trees or their evolutionary variant, the red-black tree, for reasons explained below.

Binary sort tree

Before introducing the B-tree, let's look at another tree first: the binary sort tree (also known as a binary search tree). First of all, it is a tree, and "binary" means that each node forks into at most two subtrees, so it is a binary tree defined recursively (as shown in the figure below). Moreover, the nodes on this tree are ordered. The specific ordering rules are as follows:

  • If the left subtree is not empty, the value of all nodes in the left subtree is less than the value of its root node
  • If the right subtree is not empty, the value of all nodes in the right subtree is greater than the value of its root node
  • Its left and right subtrees are themselves binary sort trees (a recursive definition)

As can be seen from the figure, a binary sort tree organizes data in a way that is convenient for searching, because every node visited can cut the remaining candidates roughly in half. In the extreme case, however, all nodes end up on one side of the tree (intuitively, a straight line), and query efficiency becomes very low. It is therefore necessary to balance the heights of the left and right subtrees of the binary tree, which leads to the balanced binary tree (Balanced Binary Tree).

The so-called "balance" means that the branches of the tree have uniform height: the absolute difference between the heights of the left and right subtrees is at most 1, so no branch becomes particularly long. When searching in such a balanced tree, the total number of node comparisons does not exceed the height of the tree, which guarantees query efficiency (time complexity O(log n)).
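
The difference between the degenerate case and the balanced case can be seen with a tiny sketch of a plain, non-self-balancing binary sort tree (the keys are arbitrary):

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Plain binary-sort-tree insert: smaller keys go left, larger go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(root):
    return 0 if root is None else 1 + max(height(root.left), height(root.right))

skewed = balanced = None
for k in [1, 2, 3, 4, 5, 6, 7]:        # already-sorted input: the worst case
    skewed = insert(skewed, k)
for k in [4, 2, 6, 1, 3, 5, 7]:        # same keys, an insertion order that balances
    balanced = insert(balanced, k)

print(height(skewed), height(balanced))   # 7 vs 3

The same seven keys produce a height of 7 when inserted in sorted order and a height of 3 when inserted in a balanced order, which is exactly the problem that balanced trees solve.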

B-tree

It is clearer to look at the picture directly. As shown, a B-tree is in fact a balanced multi-way search tree: each node may fork into at most m subtrees (m >= 2), and we call it an m-order B-tree. To show this blog's good conscience, instead of the 2-order B-tree you can see elsewhere, a 5-order B-tree is specially drawn here.

In general, the m-order B-tree satisfies the following conditions:

  • Each node can have at most m subtrees.
  • The root node has at least 2 subtrees (except in the extreme case where the tree consists of a single root node, a single-celled organism that is a root, a leaf, and the whole tree at once).
  • Every non-root, non-leaf node has at least Ceil(m/2) subtrees (Ceil means rounding up; in the 5-order B-tree in the figure, each such node has at least 3 subtrees, that is, at least 3 forks).
  • The information in a non-leaf node is [n, A0, K1, A1, K2, A2, ..., Kn, An] (sketched in code below), where n is the number of keywords stored in the node, the Ki are the keywords with Ki < Ki+1, and each Ai is a pointer to the root of a subtree.
  • Every path from the root to a leaf has the same length, that is, all leaf nodes are on the same level. These leaf nodes carry no information; in fact they represent search failure, i.e. the pointers to them are null.
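
Under these conditions, the node layout can be sketched roughly as follows (a toy Python structure for illustration only, not how any real storage engine lays out its pages):

import math

class BTreeNode:
    """One node of an m-order B-tree: n sorted keys and n+1 child pointers."""
    def __init__(self, m, keys, children=None):
        self.m = m
        self.keys = keys                   # K1 < K2 < ... < Kn
        self.children = children or []     # A0, A1, ..., An (empty in a leaf)

    def check(self, is_root=False):
        n = len(self.keys)
        assert n <= self.m - 1                                   # at most m subtrees
        assert all(a < b for a, b in zip(self.keys, self.keys[1:]))
        if self.children:                                        # internal node
            assert len(self.children) == n + 1                   # one more pointer than keys
            if not is_root:
                assert len(self.children) >= math.ceil(self.m / 2)

root = BTreeNode(5, [25], [BTreeNode(5, [10, 20]), BTreeNode(5, [30, 40])])
root.check(is_root=True)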

The query process of a B-tree is similar to that of a binary sort tree: starting from the root node, nodes are compared in turn. Because the keywords within a node are sorted and the subtrees on either side of each keyword are ordered with respect to it, it is enough to compare against the keywords in the node, either finding the target there or quickly following the appropriate pointer downward. If the search fails, it ends at a leaf node, that is, at a null pointer.

For example, to find the letter K in the figure:

  1. Starting from the root node P, the position of K is before P, and enters the left pointer.
  2. In the left subtree, compare C, F, J, M in turn, and find that K is between J and M.
  3. Follow the pointer between J and M into that subtree, compare its keywords in turn, and find K as the first keyword: the value being searched for has been found.

A simple pseudo-algorithm for B-tree search is as follows:

BTree_Search(node, key) {
    if (node == null) return null;
    for (i = 0; i < node.key_count; i++) {
        if (node.key[i] == key) return node.data[i];
        if (node.key[i] > key)  return BTree_Search(node.point[i], key);
    }
    return BTree_Search(node.point[node.key_count], key);
}

data = BTree_Search(root, my_key);

The characteristics of B-trees can be summarized as follows:

  1. The set of keywords is distributed throughout the tree.
  2. Any one keyword appears and only appears in one node.
  3. It is possible for the search to end at a non-leaf node.
  4. Its search performance is equivalent to doing a binary search in the keyword set.
  5. Inserting or deleting data records can break the B-Tree properties, so splits, merges, transfers and other operations on the tree are needed during insertion and deletion to maintain those properties.

Plus version — B+ tree

As an enhanced version of B-tree, the difference between B+ tree and B-tree is that

  • A node with n subtrees contains n keywords (some definitions use n-1 keywords instead).
  • All keywords are stored on the leaf nodes, and the leaf nodes themselves are connected in ascending order according to the keywords.
  • A non-leaf node can be regarded as the index part: it contains only the largest (or smallest) keyword of each of its subtrees.

The search process of the B+ tree is similar to that of the B tree, except that when searching, if the keyword on the non-leaf node is equal to the given value, it does not terminate, but continues to follow the pointer until the position of the leaf node. Therefore, in the B+ tree, no matter whether the search is successful or not, each search is a path from the root to the leaf node.

The characteristics of B+ tree are as follows:

  • All keywords are stored on the leaf nodes, and the linked list of leaves is ordered by keyword.
  • A search can never terminate with a hit at a non-leaf node.
  • Non-leaf nodes are equivalent to the index of leaf nodes, and leaf nodes are equivalent to the data layer that stores (keyword) data.
  • More suitable for file indexing system.

B+Tree with sequential access pointers

The B+Tree structures generally used in database systems or file systems are optimized on the basis of classic B+Trees, adding sequential access pointers.

As shown in the figure above, adding to each leaf node of a B+Tree a pointer to the adjacent leaf node forms a B+Tree with sequential access pointers. The purpose of this optimization is to improve the performance of interval access. For example, in Figure 4, to query all data records with keys from 18 to 49, once 18 has been found it is enough to traverse the leaf nodes and their pointers in order to reach all the matching data nodes in one pass, which greatly improves the efficiency of interval queries.
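
A rough sketch of how the leaf chain turns an interval query into a single descent plus a sequential walk (heavily simplified: in-memory lists stand in for pages, and the descent to the first leaf is reduced to a linear walk):

class Leaf:
    def __init__(self, keys):
        self.keys = keys          # sorted keys stored in this leaf "page"
        self.next = None          # sequential-access pointer to the adjacent leaf

# A toy chain of leaves: 15|18|22 -> 35|41|49 -> 52|60
leaves = [Leaf([15, 18, 22]), Leaf([35, 41, 49]), Leaf([52, 60])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b

def range_scan(first_leaf, lo, hi):
    """Find the leaf containing lo, then follow next-pointers instead of
    going back up the tree for every key in the interval."""
    leaf = first_leaf
    while leaf and leaf.keys[-1] < lo:
        leaf = leaf.next
    out = []
    while leaf:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

print(range_scan(leaves[0], 18, 49))   # [18, 22, 35, 41, 49]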

Why does MySQL use B-tree (B+ tree)

Data structures such as red-black trees can also be used to implement indexes, but file systems and database systems generally use B-trees or B+ trees. This section will discuss B-/+Trees as the theoretical basis for indexes in combination with the knowledge of computer composition principles.

In general, the index itself is also very large and cannot be stored entirely in memory, so an index is often stored on disk in the form of an index file. In this case, disk I/O is incurred during index lookups. Compared with memory access, an I/O access is several orders of magnitude more expensive, so the most important indicator for evaluating a data structure as an index is the asymptotic complexity of the number of disk I/O operations during a lookup. In other words, the structure of the index should minimize the number of disk I/O accesses during the lookup process. The following first introduces the principles of memory and disk access, and then uses these principles to analyze the efficiency of B-/+Tree as an index.

Main memory access principle

At present, the main memory used by computers is basically random access memory (RAM). The structure and access principle of modern RAM are relatively complex, so this article sets the specific details aside and abstracts a very simple access model to illustrate how RAM works.

From an abstract point of view, main memory is a matrix of a series of memory cells, each of which stores a fixed size of data. Each storage unit has a unique address. The addressing rules of modern main memory are more complicated. Here, it is simplified to a two-dimensional address: a row address and a column address can uniquely locate a storage unit. The image above shows a 4 x 4 main memory model.

The main memory access process is as follows:

When the system needs to read main memory, it places the address signal on the address bus and passes it to main memory. Main memory reads the address signal, parses it, locates the specified storage unit, and then places the data of that storage unit on the data bus for other components to read.

The process of writing to the main memory is similar. The system places the unit address and data to be written on the address bus and data bus respectively, and the main memory reads the contents of the two buses and performs the corresponding write operation.

It can be seen that main memory access time is linear only in the number of accesses. Because there is no mechanical operation, the "distance" between two accessed locations has no effect on the time: for example, reading A0 and then A1 takes the same time as reading A0 and then D3.

Disk access principle

As mentioned above, indexes are generally stored on disk in the form of files, and index retrieval requires disk I/O operations. Unlike main memory, disk I/O has mechanical movement costs, so the time consumption of disk I/O is huge.

The following figure is a schematic diagram of the overall structure of the disk:

A disk consists of coaxial circular platters of the same size, which rotate together (the platters must rotate in sync). On one side of the disk there is a head bracket that fixes a group of heads, and each head is responsible for accessing the contents of one platter. The heads cannot rotate, but they can move along the radius of the platter (in fact the movement is along an oblique tangent; there is also multi-head independent technology nowadays, but it is not considered here).

The following figure is a schematic diagram of the disk structure:

The platter is divided into a series of concentric rings, the center of which is the center of the platter, each concentric ring is called a track, and all tracks with the same radius form a cylinder. The track is divided into small segments along the radius line, each segment is called a sector, and each sector is the smallest storage unit of the disk. For simplicity, we assume below that the disk has only one platter and one head.

When data needs to be read from the disk, the system transmits the logical address of the data to the disk, and the disk's control circuitry translates the logical address into a physical address according to its addressing logic, that is, it determines which track and sector the data is on. To read that sector, the head must be positioned above it: first the head moves to align with the corresponding track, a process called seeking, whose duration is the seek time; then the platter rotates until the target sector passes under the head, and the time spent in this step is called the rotation time.
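
A back-of-the-envelope calculation with typical, assumed figures for a consumer 7200 RPM drive gives a feel for the cost (exact numbers vary by device):

rpm = 7200                              # assumed spindle speed
avg_seek_ms = 9.0                       # assumed average seek time

full_rotation_ms = 60_000 / rpm         # ~8.33 ms per revolution
avg_rotation_ms = full_rotation_ms / 2  # wait half a turn on average: ~4.17 ms

per_random_io_ms = avg_seek_ms + avg_rotation_ms
print(f"~{per_random_io_ms:.1f} ms per random I/O")   # roughly 13 ms

# Main-memory access is on the order of tens of nanoseconds, so a single
# random disk I/O costs several orders of magnitude more.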

The principle of locality and disk read-ahead

Due to the characteristics of the storage medium, and the cost of mechanical movement on top of that, accessing the disk is much slower than accessing main memory, often by several orders of magnitude. Therefore, to improve efficiency, the number of disk I/Os must be reduced as much as possible. To achieve this, the disk is usually not read strictly on demand; instead it reads ahead on every access: even if only one byte is required, the disk starts from that position and sequentially reads data of a certain length into memory. The rationale for this is the well-known principle of locality in computer science:

When a piece of data is used, its nearby data is usually used immediately.

Therefore, the data required during program operation should usually be concentrated.

Since disk sequential reads are very efficient (no seek time, only very little spin time), read-ahead can improve I/O efficiency for programs with locality.

The read-ahead length is generally an integer multiple of the page. A page is the logical block in which the computer manages memory: hardware and operating systems often divide main memory and disk storage into consecutive blocks of equal size, each called a page (in many operating systems a page is usually 4 KB), and main memory and disk exchange data in units of pages. When the data a program wants to read is not in main memory, a page fault exception is triggered; the system then sends a read signal to the disk, the disk finds the starting position of the data and reads one or several consecutive pages into memory, the exception returns, and the program continues to run.
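
On a Unix-like system the actual page size can be checked directly; a small sketch (4 KB is common, but not guaranteed):

import resource, mmap

print(resource.getpagesize())   # typically 4096 bytes on many systems
print(mmap.PAGESIZE)            # the same value, exposed by the mmap module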

Performance Analysis of B-/+Tree Index

At this point, we can finally analyze the performance of the B-/+Tree index.

As mentioned above, the number of disk I/Os is generally used to evaluate how good an index structure is. Starting with the B-Tree: by its definition, a retrieval visits at most h nodes, where h is the height of the tree. The designers of database systems cleverly exploit the principle of disk read-ahead by setting the size of a node equal to one page, so that each node can be fully loaded with a single I/O. To achieve this, the actual implementation of a B-Tree also uses the following technique:

Each time a new node is created, it directly applies for a page of space, which ensures that a node is also physically stored in a page. In addition, the computer storage allocation is page-aligned, so that only one I/O is required for a node.

A retrieval in a B-Tree requires at most h-1 I/Os (the root node is resident in memory), and the asymptotic complexity is \(O(h)=O(\log_d N)\). In practical applications the out-degree d is a very large number, usually more than 100, so h is very small (usually no more than 3). (h is the height of the tree; the out-degree d is the degree of the tree, that is, the maximum number of children of any node.)
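
A quick sanity check of that claim with assumed numbers (the exact out-degree depends on key and pointer sizes, as discussed below):

def height_for(n_keys, d):
    """Number of levels needed for a tree with out-degree d to index n_keys keys."""
    h, capacity = 1, d
    while capacity < n_keys:
        capacity *= d
        h += 1
    return h

for n in (10**6, 10**8):
    print(n, height_for(n, 100), height_for(n, 500))
# 1000000   -> 3 levels at d=100, 3 at d=500
# 100000000 -> 4 levels at d=100, 3 at d=500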

To sum up, it is very efficient to use B-Tree as an index structure.

In a red-black tree, h is obviously much larger. Moreover, since logically adjacent nodes (parent and child) may be physically far apart, locality cannot be exploited; so although the I/O asymptotic complexity of a red-black tree is also O(h), its efficiency is clearly much lower than that of a B-Tree.

As mentioned above, B+Tree is more suitable for external memory indexing, and the reason is related to the out-degree d of internal nodes. As can be seen from the above analysis, the larger the d, the better the performance of the index, and the upper limit of the out-degree depends on the size of the key and data in the node:

\[ d_{max}=floor(pagesize/(keysize+datasize+pointsize))\]

floor means round down. Since the data field is removed from the interior nodes of a B+Tree, those nodes can have a larger out-degree, and performance is correspondingly better.
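
Plugging in assumed numbers makes this concrete: a 16 KB page (InnoDB's default page size), an 8-byte key, a 6-byte pointer, and an illustrative 100-byte row:

page_size = 16 * 1024        # assumed page size (InnoDB's default)
key_size, point_size = 8, 6  # e.g. a BIGINT key and a 6-byte child pointer (assumed)
row_size = 100               # illustrative size of a full data record

d_btree = page_size // (key_size + row_size + point_size)   # data lives in every node
d_bplus = page_size // (key_size + point_size)              # interior nodes hold no data

print(d_btree, d_bplus)      # roughly 143 vs 1170

With an out-degree roughly eight times larger, the same number of keys fits into a much shallower tree, so far fewer pages (disk I/Os) are needed per lookup; this is the practical reason the B+Tree wins.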

I still don't understand why the B+ tree performs well! ! !

MySQL index implementation

In MySQL, indexes belong to the concept of storage engine level. Different storage engines implement indexes in different ways. This article mainly discusses the index implementation methods of MyISAM and InnoDB storage engines.

MyISAM Index Implementation

The MyISAM engine uses B+Tree as the index structure, and the data field of the leaf node stores the address of the data record. The following figure is a schematic diagram of the MyISAM index:

There are three columns in the table here. Assuming Col1 is the primary key, the figure above shows the primary index of a MyISAM table. It can be seen that the MyISAM index file only stores the addresses of data records. In MyISAM there is no structural difference between the primary index and a secondary index (secondary key), except that the primary index requires the key to be unique while the keys of a secondary index may repeat. If we build a secondary index on Col2, its structure is shown in the following figure:

It is also a B+ tree, and the data field holds the address of the data record. Therefore, the index retrieval algorithm in MyISAM is to first search the index according to the B+Tree search algorithm. If the specified Key exists, the value of the data field is taken out, and then the corresponding data record is read with the value of the data field as the address.

MyISAM's index method is also called "non-clustered", so called to distinguish it from InnoDB's clustered index.

InnoDB index implementation

Although InnoDB also uses B+Tree as the index structure, the specific implementation is completely different from MyISAM.

The first major difference is that InnoDB's data files are themselves index files. From the above, it is known that the MyISAM index file and the data file are separated, and the index file only saves the address of the data record. In InnoDB, the table data file itself is an index structure organized by B+Tree, and the data field of the leaf node of this tree saves complete data records. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.

The figure above is a schematic diagram of the InnoDB primary index (which is also the data file); you can see that the leaf nodes contain the complete data records. This kind of index is called a clustered index. Because InnoDB's data files are themselves clustered by primary key, InnoDB requires every table to have a primary key (MyISAM does not). If none is specified explicitly, MySQL automatically selects a column that can uniquely identify the data records as the primary key; if no such column exists, MySQL automatically generates a hidden field for the InnoDB table to serve as the primary key. This field is 6 bytes long and of a long integer type.

The second difference from MyISAM indexes is that InnoDB's secondary index data field stores the value of the corresponding record's primary key instead of its address. In other words, all secondary indexes in InnoDB refer to the primary key as the data field. For example, the picture above shows an auxiliary index defined on Col3:

Here, the ASCII code of English characters is used as the comparison criterion. The implementation of the clustered index makes the search by the primary key very efficient, but the secondary index search needs to retrieve the index twice: first, the secondary index is retrieved to obtain the primary key, and then the primary key is used to retrieve the records in the primary index.
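
The difference in what the "data" field holds can be caricatured with two plain dictionaries (a deliberately crude sketch; the real indexes are on-disk B+Trees, not hash maps, and all the values here are invented):

# MyISAM: every index (primary or secondary) maps key -> physical row address
myisam_primary   = {15: 0x2000, 18: 0x2040, 22: 0x2080}        # Col1 -> row address
myisam_secondary = {"Alice": 0x2040, "Bob": 0x2000}            # Col2 -> row address

# InnoDB: the clustered index maps primary key -> the full row itself;
# a secondary index maps key -> primary key, so a lookup goes through two indexes.
innodb_clustered = {15: ("Bob", 29), 18: ("Alice", 34), 22: ("Carol", 41)}
innodb_secondary = {"Alice": 18, "Bob": 15, "Carol": 22}       # Col2 -> primary key

def innodb_lookup_by_col2(name):
    pk = innodb_secondary[name]        # first search: secondary index -> primary key
    return innodb_clustered[pk]        # second search: primary index -> full record

print(innodb_lookup_by_col2("Alice"))  # ('Alice', 34)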

Knowing how indexes are implemented in different storage engines is very helpful for using and optimizing them correctly. For example, once you know InnoDB's index implementation, it is easy to understand why an overly long field is not recommended as a primary key: all secondary indexes reference the primary index, so a long primary key makes every secondary index large as well. As another example, using a non-monotonic field as the primary key is not a good idea in InnoDB: since the data file itself is a B+Tree, a non-monotonic primary key causes frequent splits and adjustments to maintain the B+Tree properties when new records are inserted, which is very inefficient, whereas using an auto-increment field as the primary key is a good choice.
