The B-trees that interviewers often ask about (the data structure underlying MySQL indexes)

1. Basic concepts of trees

A tree has one root; the root has many branches, each branch can have further branches, and the branches end in leaves. Most tree-shaped data structures share these characteristics.
A tree is a finite set of nodes.
Root node: a non-empty tree has exactly one designated root node.
Node: contains a data element and several branches pointing to its subtrees.
Nodes are related to each other as parent nodes, child nodes, and sibling nodes.
A tree may have no nodes at all, in which case it is called an empty tree.
A tree may have only one node, that is, only the root node.
Subtree, left subtree, right subtree.
Degree of a node: the number of its subtrees.
Degree of a tree: the maximum of all node degrees.
Leaf node: a node with degree 0.
Non-leaf node: a node whose degree is not 0.
Level: the root node is at level 1, the children of the root are at level 2, and so on (some tutorials count from level 0).
Depth of a node: the total number of nodes on the unique path from the root node to the current node.
Height of a node: the total number of nodes on the path from the current node to its farthest leaf node.
Depth of a tree: the maximum of all node depths.
Height of a tree: the maximum of all node heights.
The structure of a tree is recursive.
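
The terms above map directly onto a few recursive functions. The following is a minimal Python sketch, not a reference implementation: the Node class, the function names, and the example tree are all made up for illustration, and depth and height are counted in nodes, starting from 1, as in the definitions above.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """A general tree node: a data element plus branches to its subtrees."""
    value: str
    children: list[Node] = field(default_factory=list)


def degree(node: Node) -> int:
    """Degree of a node: the number of its subtrees."""
    return len(node.children)


def tree_degree(root: Node) -> int:
    """Degree of a tree: the maximum degree over all nodes."""
    return max([degree(root)] + [tree_degree(c) for c in root.children])


def height(node: Node) -> int:
    """Height: number of nodes on the path down to the farthest leaf."""
    return 1 + max((height(c) for c in node.children), default=0)


def depths(root: Node, depth: int = 1) -> dict[str, int]:
    """Depth of every node: number of nodes on the path from the root."""
    result = {root.value: depth}
    for child in root.children:
        result.update(depths(child, depth + 1))
    return result


# Hypothetical example tree:
#       A
#     / | \
#    B  C  D
#   / \
#  E   F
tree = Node("A", [Node("B", [Node("E"), Node("F")]), Node("C"), Node("D")])
```

Because every subtree is itself a tree, each function simply calls itself on the children, which is what "the structure of a tree is recursive" means in practice.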

2. Binary tree

The binary tree is an important type of tree structure and the cornerstone of many other data structures.
A tree in which each node has at most two child nodes is called a binary tree; in other words, no node is allowed more than two children.

Special types

1. Full binary tree: if a binary tree contains only nodes of degree 0 and nodes of degree 2, and all nodes of degree 0 are on the same level, it is a full binary tree. (Many English-language texts call this a perfect binary tree.)

2. Complete binary tree: a binary tree with depth k and n nodes is a complete binary tree if and only if each of its nodes corresponds one-to-one to the nodes numbered 1 to n (level by level, left to right) in a full binary tree of depth k [4].
In a complete binary tree, leaf nodes can appear only on the two deepest levels, and for any node the deepest level reached by descendants under its left branch is either equal to, or one greater than, the deepest level reached by descendants under its right branch [4].
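
The two special types can be checked mechanically. The sketch below is one possible way to do it in Python, assuming a hypothetical BNode class; "full" is used in this article's sense (every level completely filled). The completeness check relies on the numbering idea from the definition: in a level-order walk, no node may appear after a missing position.

```python
from __future__ import annotations
from collections import deque
from dataclasses import dataclass


@dataclass
class BNode:
    value: int
    left: BNode | None = None
    right: BNode | None = None


def height(node: BNode | None) -> int:
    """Number of levels in the tree (0 for an empty tree)."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def count(node: BNode | None) -> int:
    """Total number of nodes."""
    return 0 if node is None else 1 + count(node.left) + count(node.right)


def is_full(root: BNode | None) -> bool:
    """Full in the article's sense: a tree of depth k has 2**k - 1 nodes."""
    return count(root) == 2 ** height(root) - 1


def is_complete(root: BNode | None) -> bool:
    """Level-order walk: once a missing child is seen, no node may follow."""
    queue = deque([root])
    seen_gap = False
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True
        elif seen_gap:
            return False
        else:
            queue.append(node.left)
            queue.append(node.right)
    return True


# Hypothetical example trees
full_tree = BNode(1, BNode(2, BNode(4), BNode(5)), BNode(3, BNode(6), BNode(7)))
complete_tree = BNode(1, BNode(2, BNode(4), BNode(5)), BNode(3))
neither = BNode(1, BNode(2), BNode(3, BNode(6)))
```

Every full binary tree is also complete, but not the other way around: complete_tree above is complete without being full.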

3. B-tree

B-tree (also written B tree): some people habitually read "B-tree" as "B-minus tree". There is no such thing as a B-minus tree; the "-" is only a hyphen, and the difference is purely one of pronunciation.

Wikipedia defines the B-tree as follows: "In computer science, a B-tree is a tree data structure that stores data, keeps it sorted, and allows lookups, sequential reads, insertions, and deletions in O(log n) time.

A B-tree is, in short, a generalization of the binary search tree in which a node can have more than 2 child nodes. Unlike self-balancing binary search trees, B-trees are optimized for systems that read and write large blocks of data. The B-tree algorithm reduces the number of intermediate steps needed to locate a record, thereby speeding up access. It is commonly used in databases and file systems."
A B-tree can be seen as a generalization of the 2-3 search tree: a B-tree of order M allows each node to hold up to M-1 keys and therefore up to M child nodes.
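
To make "a search tree whose nodes have more than two children" concrete, here is a minimal lookup sketch in Python. It only searches; insertion and deletion with node splitting and merging are left out. The BTreeNode class, the btree_search function, and the sample order-3 tree (a 2-3 tree) are illustrative assumptions, not taken from any particular database.

```python
from __future__ import annotations
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    keys: list[int]      # sorted keys stored in this node
    values: list[str]    # values[i] is the data stored with keys[i]
    children: list[BTreeNode] = field(default_factory=list)  # empty for a leaf


def btree_search(node: BTreeNode, key: int) -> str | None:
    """Walk down from the root; each node narrows the search to one child."""
    while True:
        i = bisect_left(node.keys, key)      # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i]            # data can be found in any node
        if not node.children:
            return None                      # reached a leaf without a match
        node = node.children[i]              # child i holds keys in that gap


# Hypothetical order-3 B-tree: at most 2 keys and 3 children per node
root = BTreeNode(
    keys=[10, 20],
    values=["ten", "twenty"],
    children=[
        BTreeNode([5], ["five"]),
        BTreeNode([15], ["fifteen"]),
        BTreeNode([25, 30], ["twenty-five", "thirty"]),
    ],
)
```

Note that a lookup for 10 stops at the root, while a lookup for 15 has to descend one level: in a B-tree the data lives in every node, not only in the leaves.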

4. B+ tree

The B+ tree is a variant of the B-tree that arose from the needs of file systems.

Definition of the B+ tree
A B+ tree of order m satisfies the following conditions:
Each node has at most m child nodes.
Except for the root node, each non-leaf node has at least m/2 (rounded up) child nodes.
A root node that is not a leaf node has at least two child nodes.
A non-leaf node with k subtrees has k keys, arranged in increasing order. (Another common convention gives such a node k-1 keys.)
All leaf nodes are on the same level.
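
The conditions above can be turned into a small structural check. This is a hedged Python sketch under stated assumptions: BPlusNode and satisfies_definition are hypothetical names, the lower bound is taken as m/2 rounded up (the usual textbook convention), and the "k subtrees, k keys" variant of the definition is used, where each key in a non-leaf node is the largest key of the matching subtree.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class BPlusNode:
    keys: list[int]
    children: list[BPlusNode] = field(default_factory=list)  # empty for a leaf


def leaf_depths(node: BPlusNode, depth: int = 1) -> set[int]:
    """Collect the level of every leaf."""
    if not node.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))


def satisfies_definition(root: BPlusNode, m: int) -> bool:
    min_children = -(-m // 2)                     # m/2 rounded up (assumption)
    stack = [(root, True)]
    while stack:
        node, is_root = stack.pop()
        if node.keys != sorted(set(node.keys)):   # keys strictly increasing
            return False
        if node.children:
            k = len(node.children)
            if k > m:                             # at most m children
                return False
            if k < (2 if is_root else min_children):
                return False
            if len(node.keys) != k:               # k subtrees, k keys
                return False
            stack.extend((c, False) for c in node.children)
    return len(leaf_depths(root)) == 1            # all leaves on one level


# Hypothetical order-3 examples
good = BPlusNode([20, 50], [BPlusNode([10, 20]), BPlusNode([30, 50])])
one_child_root = BPlusNode([20], [BPlusNode([10, 20])])
uneven = BPlusNode([20, 50], [BPlusNode([10, 20]),
                              BPlusNode([50], [BPlusNode([30, 50])])])
```

The check covers only the shape of the tree; it does not verify that each key in a non-leaf node really matches the largest key of its subtree.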

5. Comparison between B-tree and B+ tree

1. Non-leaf nodes of a B+ tree store only keys, not data, whereas B-tree nodes store both keys and data.
The reason is that the database page size is fixed (the default InnoDB page size is 16 KB). If a node stores no data, it has room for more keys, so the order of the tree (the number of child nodes per node) is larger and the tree becomes shorter and wider. Finding a record then takes fewer disk I/Os, and queries run faster. In addition, in the B+ tree defined above, the order equals the number of keys in a node. If one node can hold 1000 keys, a 3-level B+ tree can index 1000 × 1000 × 1000 = 1 billion records.
The root node is usually resident in memory, so finding one record among 1 billion typically takes only 2 disk I/Os.
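
The arithmetic in this argument is easy to reproduce. The Python sketch below is a back-of-the-envelope estimate, not a description of InnoDB's actual page layout: the 8-byte key and 6-byte child pointer are assumptions chosen for illustration, and page headers and other overhead are ignored. Only the 16 KB page size and the 1000-keys-per-node figure come from the text.

```python
PAGE_SIZE = 16 * 1024   # default InnoDB page size in bytes (from the text)
KEY_BYTES = 8           # assumption: an 8-byte integer key
POINTER_BYTES = 6       # assumption: size of a pointer to a child page


def keys_per_node(page_size: int = PAGE_SIZE,
                  entry_bytes: int = KEY_BYTES + POINTER_BYTES) -> int:
    """Rough number of (key, pointer) entries that fit in one page."""
    return page_size // entry_bytes


def capacity(keys_in_node: int, levels: int) -> int:
    """Records reachable when every node on every level is completely full."""
    return keys_in_node ** levels


def disk_reads(levels: int, levels_in_memory: int = 1) -> int:
    """One page read per level, minus the levels already cached in memory."""
    return max(levels - levels_in_memory, 0)


print(keys_per_node())      # 1170 entries per 16 KB page under these assumptions
print(capacity(1000, 3))    # 1000000000, the 1 billion from the text
print(disk_reads(3))        # 2, because the root is assumed to be in memory
```

Under these assumptions a real page would hold slightly more than the 1000 keys used in the text, so the "1 billion records in 3 levels" figure is, if anything, conservative.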

2. All the data in a B+ tree index is stored in the leaf nodes, arranged in key order.
This makes range queries, sorting, grouping, and deduplication very simple in a B+ tree. In a B-tree the data is scattered across all nodes, so these operations are harder to implement.

Therefore, workloads with many range queries are well suited to B+ trees (databases, for example).

For workloads dominated by single-key lookups, a B-tree can be considered (for example MongoDB, a NoSQL database).

6. The data structure underlying MySQL indexes

The data structure underlying MySQL indexes is the B+ tree.

The B+ tree has three characteristics:
1. The B+ tree is a balanced multiway tree. In a balanced binary tree each node has at most two child nodes, whereas each node of a B+ tree can have many child nodes.
2. The leaf nodes of a B+ tree (the nodes on the bottom level, which have no children) are connected in a doubly linked list, so each leaf points to its left and right neighbors. This makes range searches convenient: to fetch the first 100 rows, locate the first leaf node and read 100 rows by following the links from leaf to leaf, with no need to search down from the root again.
3. The leaf nodes of a B+ tree hold the row data (that is, all the column values of the row in the database), while the non-leaf nodes hold only index data.
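
The second and third characteristics can be illustrated with a small range scan. This is a simplified Python sketch with made-up names (Leaf, Internal, link, find_leaf, range_scan), not MySQL code. It assumes the convention that key i of a non-leaf node is the largest key under child i.

```python
from __future__ import annotations
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Leaf:
    keys: list[int]
    rows: list[str]                       # row data lives only in the leaves
    prev: Leaf | None = field(default=None, repr=False, compare=False)
    next: Leaf | None = field(default=None, repr=False, compare=False)


@dataclass
class Internal:
    keys: list[int]                       # keys[i] = largest key under children[i]
    children: list[Internal | Leaf]


def link(leaves: list[Leaf]) -> None:
    """Chain the leaves into a doubly linked list, left to right."""
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a


def find_leaf(node: Internal | Leaf, key: int) -> Leaf:
    """Descend from the root to the leaf that would contain key."""
    while isinstance(node, Internal):
        i = min(bisect_left(node.keys, key), len(node.children) - 1)
        node = node.children[i]
    return node


def range_scan(root: Internal | Leaf, lo: int, hi: int) -> list[str]:
    """One descent to find lo, then follow next pointers until past hi."""
    leaf: Leaf | None = find_leaf(root, lo)
    out: list[str] = []
    while leaf is not None:
        for k, row in zip(leaf.keys, leaf.rows):
            if k > hi:
                return out
            if k >= lo:
                out.append(row)
        leaf = leaf.next
    return out


# Hypothetical index with three leaves
leaves = [Leaf([1, 3], ["row1", "row3"]),
          Leaf([5, 7], ["row5", "row7"]),
          Leaf([9, 11], ["row9", "row11"])]
link(leaves)
root = Internal([3, 7, 11], leaves)
```

The scan touches the non-leaf levels only once, at the start; everything after that is a walk along the leaf list, which is why range queries are cheap in a B+ tree.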

Why does MySQL use a B+ tree rather than a B-tree underneath?

There are two differences between the B-tree and the B+ tree. One is the doubly linked list connecting the leaf nodes; the other is that a B-tree stores row data in every node, not only in the leaf nodes.
The B+ tree drops the row data from the other nodes and keeps it only in the leaf nodes because of how disk I/O works. One I/O can fetch only one data page. If every node carried row data, one I/O might be enough to fetch only a single node, and a lookup might need something like a hundred I/Os before it finds its result. If the non-leaf nodes carry no row data, the index takes up less space and one I/O can fetch multiple nodes. This greatly reduces the number of I/Os, and because each I/O is expensive, using a B+ tree improves performance.
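
A rough calculation shows how much the size of each entry affects the number of levels, and hence the number of I/Os. The Python sketch below is an estimate under loudly stated assumptions: a 14-byte index entry (key plus pointer), 1 KB of row data per record, a tree whose capacity is approximated as fanout to the power of the level count, and no page overhead. None of these figures comes from MySQL itself; only the 16 KB page size appears earlier in this article.

```python
PAGE_SIZE = 16 * 1024   # bytes per page
INDEX_ENTRY = 14        # assumption: key + child pointer, in bytes
ROW_BYTES = 1024        # assumption: row data carried with each key


def levels_needed(records: int, fanout: int) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) >= records."""
    levels, reachable = 1, fanout
    while reachable < records:
        reachable *= fanout
        levels += 1
    return levels


keys_only_fanout = PAGE_SIZE // INDEX_ENTRY                 # nodes without row data
with_rows_fanout = PAGE_SIZE // (INDEX_ENTRY + ROW_BYTES)   # nodes carrying rows

for label, fanout in [("keys only", keys_only_fanout),
                      ("keys + row data", with_rows_fanout)]:
    print(label, fanout, levels_needed(1_000_000_000, fanout))
```

With these numbers, nodes that hold only keys reach 1 billion records in 3 levels, while nodes that also carry 1 KB rows need 8. Each extra level is potentially one more disk I/O per lookup.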

Origin blog.csdn.net/qq_54796785/article/details/126692766