Why does MySQL use the B+ tree as its index structure?

Comparison of various structures

Reposted from: http://www.gxlcms.com/mysql-366759.html

1. Balanced binary tree (AVL): rotations are time-consuming

Disadvantage: because rotations are time-consuming, the AVL tree is inefficient when deleting data.

An AVL tree is strictly balanced: for every node, the heights of its left and right subtrees differ by at most 1. Search, insertion and deletion in an AVL tree all take O(log n) time in both the average and the worst case.
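To make the balance condition concrete, here is a minimal Python sketch that checks only the height rule. The `Node` class is a hypothetical illustration type with `left`/`right` fields, and key ordering is not verified. Each subtree's height is computed bottom-up, and a violation is reported as soon as two sibling subtrees differ in height by more than 1:

```python
from typing import Optional


class Node:
    """Plain binary-tree node used only for this illustration."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def avl_height(node: Optional[Node]) -> int:
    """Height of the subtree (empty subtree = 0), or -1 if any node breaks the AVL rule."""
    if node is None:
        return 0
    lh = avl_height(node.left)
    rh = avl_height(node.right)
    # the AVL rule: left and right subtree heights may differ by at most 1
    if lh < 0 or rh < 0 or abs(lh - rh) > 1:
        return -1
    return 1 + max(lh, rh)


def is_avl(root: Optional[Node]) -> bool:
    return avl_height(root) >= 0
```

For example, a tree with 2 at the root and 1 and 3 as its children passes, while the chain 1 → 2 → 3 (each key the right child of the previous one) fails at the root, whose subtrees have heights 0 and 2.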

Rotation is the key to keeping an AVL tree balanced: an insertion or a deletion can break the balance, and the tree then needs one or more rotations to restore it. An insertion requires at most one rotation (a single or a double rotation). A deletion, however, may unbalance every node on the path from the deleted node up to the root, so all of those nodes must be checked and possibly rebalanced, which can take O(log n) rotations.
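The following Python sketch shows AVL insertion with single and double rotations. `AVLTree`, `rotate_left`, and `rotate_right` are illustrative names, not a library API. It counts rotations so you can see that each insertion triggers at most one single or double rotation; deletion, the expensive case described above, is not implemented here.

```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # a single node has height 1


def height(node):
    return node.height if node else 0


def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y):
    # y's left child x becomes the subtree root; x's right subtree moves under y
    x = y.left
    y.left = x.right
    x.right = y
    update_height(y)
    update_height(x)
    return x


def rotate_left(x):
    # mirror image of rotate_right
    y = x.right
    x.right = y.left
    y.left = x
    update_height(x)
    update_height(y)
    return y


class AVLTree:
    def __init__(self):
        self.root = None
        self.rotations = 0  # number of single rotations performed so far

    def insert(self, key):
        self.root = self._insert(self.root, key)

    def _insert(self, node, key):
        if node is None:
            return AVLNode(key)
        if key < node.key:
            node.left = self._insert(node.left, key)
        else:
            node.right = self._insert(node.right, key)
        update_height(node)
        balance = height(node.left) - height(node.right)
        if balance > 1:  # left subtree too tall
            if height(node.left.left) < height(node.left.right):
                node.left = rotate_left(node.left)  # left-right case: double rotation
                self.rotations += 1
            self.rotations += 1
            return rotate_right(node)
        if balance < -1:  # right subtree too tall
            if height(node.right.right) < height(node.right.left):
                node.right = rotate_right(node.right)  # right-left case: double rotation
                self.rotations += 1
            self.rotations += 1
            return rotate_left(node)
        return node
```

Inserting 1 through 7 in ascending order (the worst case for an unbalanced binary search tree) performs four single rotations and ends with a perfectly balanced tree of height 3 rooted at 4. Inserting 3, 1, 2 triggers one double rotation (two single rotations) and leaves 2 at the root.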

When deletions are frequent, the cost of maintaining balance may outweigh its benefits, so AVL trees are not widely used in practice.

2. Red-black tree: the tree is too tall

Advantage: when the data lives in memory (e.g., Java's TreeMap and HashMap), red-black trees perform very well.

Disadvantage: when the data lives on disk or another secondary storage device (as in MySQL and other databases), red-black trees perform poorly, because they are still too tall.

When the data is on disk, disk IO becomes the main performance bottleneck, so the design goal is to minimize the number of IOs. The taller the tree, the more IOs each CRUD operation requires, which seriously hurts performance.
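A rough way to see why height matters: the sketch below computes the fewest levels a full tree needs to hold n keys when each node has up to `fanout` children. It assumes one disk IO per level and no caching, which is a simplification; `min_height` is a hypothetical helper.

```python
def min_height(n_keys: int, fanout: int) -> int:
    """Fewest levels a tree needs to hold n_keys if each node has up to `fanout` children.

    A full tree of height h (each node holding fanout - 1 keys) stores fanout**h - 1 keys.
    """
    h = 0
    while fanout ** h - 1 < n_keys:
        h += 1
    return h


n = 10_000_000
for name, fanout in [("binary tree", 2), ("1000-way tree", 1000)]:
    # assumption: one disk IO per level visited
    print(f"{name}: at least {min_height(n, fanout)} levels")
```

With 10 million keys, a binary tree needs at least 24 levels, while a tree whose nodes have up to 1000 children needs only 3.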

Unlike the AVL tree, the red-black tree does not insist on strict balance; it is only roughly balanced.

As a result, lookups in a red-black tree can be somewhat slower than in an AVL tree, because the looser balance allows a taller tree. Insertion and deletion, however, are much more efficient: thanks to the node colors, an insertion or deletion needs only O(1) rotations (plus recoloring) to restore balance.
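The looser balance can be made concrete by checking the red-black invariants directly. In this Python sketch the `RBNode` class and the `"R"`/`"B"` color constants are hypothetical. It checks that no red node has a red child and that every path from a node down to its empty leaves passes the same number of black nodes. The example tree satisfies both rules even though the root's two subtrees differ in height by 2, which an AVL tree would not allow.

```python
from typing import Optional

RED, BLACK = "R", "B"  # hypothetical color constants


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right


def black_height(node: Optional["RBNode"]) -> int:
    """Black height of a subtree (empty leaves count as black), or -1 if a rule is broken."""
    if node is None:
        return 1
    if node.color == RED:
        # rule: a red node may not have a red child
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return -1
    left, right = black_height(node.left), black_height(node.right)
    # rule: both sides must contain the same number of black nodes
    if left < 0 or right < 0 or left != right:
        return -1
    return left + (1 if node.color == BLACK else 0)


def is_red_black(root: Optional["RBNode"]) -> bool:
    # the root must be black (an empty tree is trivially valid)
    return root is None or (root.color == BLACK and black_height(root) > 0)


def height(node: Optional["RBNode"]) -> int:
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


tree = RBNode(10, BLACK,
              RBNode(5, BLACK),
              RBNode(20, RED,
                     RBNode(15, BLACK),
                     RBNode(30, BLACK, None, RBNode(40, RED))))
```

Here `is_red_black(tree)` holds, while `height(tree.left)` is 1 and `height(tree.right)` is 3.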

So in practice AVL trees are used relatively rarely, while red-black trees are used widely. For example, Java's TreeMap uses a red-black tree to keep keys sorted in memory, and Java 8's HashMap resolves collisions with a hash table plus red-black trees (a bucket with few colliding entries uses a linked list; once it has many, it switches to a red-black tree).

3. B-tree: designed for disks (also written "B- tree")

Advantages:

  1. Unlike a binary tree, each non-leaf node of a B-tree can have many subtrees. Therefore, for the same number of keys, a B-tree is much shorter than an AVL tree or a red-black tree (the B-tree is "short and wide"), which greatly reduces the number of disk IOs.
  2. Besides its low height, the B-tree also benefits from the principle of locality (i.e., the operating system reads memory in pages).

The principle of locality means that when a piece of data is used, nearby data is likely to be used soon afterwards. Accordingly, a single disk IO reads not only the requested data but also the adjacent data into the memory buffer.
A B-tree stores records with nearby keys in the same node. When the database accesses one record in a node, it reads the whole node into the cache, so when adjacent records are accessed next they can be read directly from the cache without another disk IO. In other words, B-trees achieve a higher cache hit rate.
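The effect of locality on IO can be simulated. The sketch below assumes that key k is stored on page `k // keys_per_page` (so neighboring keys share a page) and that the buffer keeps the most recently used pages with LRU eviction; both are illustrative assumptions, not how any particular database lays out its pages, and `count_disk_reads` is a hypothetical helper.

```python
from collections import OrderedDict


def count_disk_reads(accessed_keys, keys_per_page, cache_pages):
    """Count simulated page reads for a sequence of key accesses (LRU buffer)."""
    cache = OrderedDict()  # page number -> True, ordered from least to most recently used
    reads = 0
    for k in accessed_keys:
        page = k // keys_per_page  # assumption: neighboring keys share a page
        if page in cache:
            cache.move_to_end(page)  # hit: mark as most recently used
        else:
            reads += 1  # miss: one disk IO loads the whole page
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return reads
```

Scanning keys 0 to 99 with 10 keys per page costs 10 simulated disk reads, versus 100 when every key sits on its own page.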

The most important concept in the B-tree definition is the order. A B-tree of order m must satisfy the following:

  1. Each node has at most m children.
  2. If the root is not a leaf, it has at least 2 children; every non-leaf node other than the root has at least ⌈m/2⌉ children.
  3. A non-leaf node with k children contains k - 1 records.
  4. All leaf nodes are on the same level.

As you can see, the definition of a B-tree mainly constrains the number of children and the number of records in non-leaf nodes.
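The four rules can be checked mechanically. In this Python sketch, `BNode` is a hypothetical node type whose `children` list is empty for leaves. The checker also applies the usual limit of m - 1 keys per leaf, but it does not verify key ordering.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class BNode:
    keys: List[int]
    children: List["BNode"] = field(default_factory=list)  # empty list means leaf


def check_btree(root: BNode, m: int) -> bool:
    """Check the order-m rules above; key ordering is not verified."""
    leaf_depths = set()

    def visit(node: BNode, depth: int, is_root: bool) -> bool:
        k = len(node.children)
        if k == 0:  # leaf: record its depth for rule 4
            leaf_depths.add(depth)
            return len(node.keys) <= m - 1
        if k > m:  # rule 1: at most m children
            return False
        if k < (2 if is_root else math.ceil(m / 2)):  # rule 2: minimum children
            return False
        if len(node.keys) != k - 1:  # rule 3: k children -> k - 1 records
            return False
        return all(visit(child, depth + 1, False) for child in node.children)

    return visit(root, 0, True) and len(leaf_depths) <= 1  # rule 4: one leaf level
```

For m = 3, a root holding key 10 with leaves [5] and [15, 20] passes; a node with four children fails rule 1, and a tree whose leaves sit at depths 1 and 2 fails rule 4.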
B-trees do have applications; for example, MongoDB uses a B-tree structure for its indexes. However, many databases use a variant of the B-tree: the B+ tree.

4. B+ tree

The B+ tree is also a multiway balanced search tree. It differs from the B-tree mainly in the following ways:

  1. In a B-tree, every node, leaf or non-leaf, stores actual data; in a B+ tree, only leaf nodes store actual data, and non-leaf nodes store only keys.
  2. In a B-tree each record appears exactly once, while a key in a B+ tree may appear more than once: it always appears in a leaf node and may also be repeated in non-leaf nodes.
  3. The leaf nodes of a B+ tree are linked together in a doubly linked list.
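To illustrate these differences, here is a minimal Python sketch of a B+ tree built bottom-up from sorted data (bulk loading, rather than the insert-and-split procedure a real database uses). Internal nodes hold only separator keys, leaves hold the records and are doubly linked, and a range query descends once to the leaf for the lower bound and then walks the leaf list. `Leaf`, `Internal`, `bulk_load`, and the default capacities are hypothetical; the last node on a level may be underfull, which a real B+ tree would not allow.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    """Leaf node: stores the actual records and links to its neighbors."""
    def __init__(self, keys, values):
        self.keys, self.values = keys, values
        self.prev = self.next = None  # doubly linked list of leaves


class Internal:
    """Non-leaf node: separator keys only; keys[i] is the smallest key under children[i + 1]."""
    def __init__(self, keys, children):
        self.keys, self.children = keys, children


def bulk_load(items, leaf_cap=4, fanout=4):
    """Build a B+ tree bottom-up from a non-empty list of (key, value) pairs sorted by key."""
    leaves = [Leaf([k for k, _ in items[i:i + leaf_cap]], [v for _, v in items[i:i + leaf_cap]])
              for i in range(0, len(items), leaf_cap)]
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a
    level, mins = leaves, [leaf.keys[0] for leaf in leaves]
    while len(level) > 1:  # put up to `fanout` nodes under each parent until one root is left
        starts = range(0, len(level), fanout)
        level = [Internal(mins[i + 1:i + fanout], level[i:i + fanout]) for i in starts]
        mins = [mins[i] for i in starts]
    return level[0]


def find_leaf(node, key):
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return leaf.values[i] if i < len(leaf.keys) and leaf.keys[i] == key else None


def range_query(root, lo, hi):
    """Values for lo <= key <= hi: one descent, then a walk along the leaf list."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append(v)
        leaf = leaf.next
    return out


root = bulk_load([(k, f"row{k}") for k in range(0, 40, 2)])
```

With keys 0, 2, ..., 38 and the default capacities, `range_query(root, 5, 13)` descends to a single leaf and then follows one `next` pointer, returning the records for keys 6, 8, 10, and 12.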

In a B-tree, a non-leaf node has one fewer record than it has children; in a B+ tree, a non-leaf node has the same number of records as children.
This is the key point: compared with the B-tree, the B+ tree has the following advantages:

  1. Fewer IOs: the non-leaf nodes of a B+ tree contain only keys, not actual data, so each node can hold many more entries than a B-tree node (i.e., a larger order m). The B+ tree is therefore shorter and needs fewer IOs per access. And because each node holds more entries, the B+ tree makes better use of locality and gets a higher cache hit rate.
  2. Better for range queries: a range query in a B-tree first finds the lower bound and then does an in-order traversal of the tree until it reaches the upper bound, whereas a B+ tree only needs to walk the linked list of leaves.
  3. More predictable query cost: a B-tree lookup visits anywhere from 1 node up to the full tree height (for records in the root and in the leaves, respectively), while a B+ tree lookup always visits exactly as many nodes as the tree height, since all data is in the leaves.
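A back-of-the-envelope calculation shows how much the first advantage matters. The numbers below are assumptions, not taken from the text: a 16KB page (InnoDB's default page size), 8-byte keys, 6-byte child pointers, 1KB rows, and no page-header overhead. The helper names are hypothetical.

```python
PAGE_SIZE = 16 * 1024  # assumed page size in bytes
KEY_SIZE = 8           # assumed key size (e.g. a BIGINT primary key)
POINTER_SIZE = 6       # assumed size of a child-page pointer
ROW_SIZE = 1024        # assumed size of one full row


def bplus_capacity(height: int) -> int:
    """Rows a full B+ tree of this height holds: rows live only in the leaves."""
    fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)  # internal entry = key + pointer only
    rows_per_leaf = PAGE_SIZE // ROW_SIZE
    return fanout ** (height - 1) * rows_per_leaf


def btree_capacity(height: int) -> int:
    """Rows a full B-tree of this height holds: every node stores key + pointer + row."""
    per_node = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE + ROW_SIZE)
    children = per_node + 1
    nodes = sum(children ** level for level in range(height))  # 1 + c + c^2 + ...
    return nodes * per_node


print(bplus_capacity(3), btree_capacity(3))
```

Under these assumptions a 3-level B+ tree holds about 21.9 million rows (1170 × 1170 × 16), while a 3-level B-tree that stores rows in every node holds only 4095.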

The B+ tree also has a disadvantage: because keys are repeated, it takes up more space. Compared with the performance benefits, however, the extra space is usually acceptable, so the B+ tree is used more widely in databases than the B-tree.

5. The principle of locality (why tree height matters)

In modern operating systems, data is read from external storage into memory in units generally called "pages". Every read transfers a whole number of pages; you cannot read half a page or 0.8 of a page. The page size is determined by the operating system and is commonly 4KB = 4096 bytes. So whether we read one byte or 2KB, a full 4KB page must be read, and the cost of reading a node depends on the number of pages it spans.

In this situation, if a node is smaller than a page, part of each read is spent on data we don't need (data outside the node).
Binary trees waste a lot of time here: each node holds only one record, so each IO yields only one useful record. If a node holds more records, each IO yields more useful data and efficiency improves dramatically.
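The page arithmetic can be written out directly. This sketch uses the 4KB page size from the text; the helper names and the 40-byte binary-tree node in the example are assumptions for illustration.

```python
PAGE = 4096  # page size from the text (4KB)


def pages_read(offset: int, length: int, page: int = PAGE) -> int:
    """Number of whole pages the OS must read to fetch bytes [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // page
    last = (offset + length - 1) // page
    return last - first + 1


def useful_fraction(node_size: int, page: int = PAGE) -> float:
    """Share of the transferred bytes that belong to a page-aligned node."""
    return node_size / (pages_read(0, node_size, page) * page)
```

Reading 1 byte or 2KB from offset 0 costs one page, but a 200-byte node that straddles a page boundary (starting at byte 4000) costs two. A 40-byte node uses under 1% of the 4096 bytes transferred, while a node sized to fill a page uses all of them.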


Origin: blog.csdn.net/weixin_43751710/article/details/104671783