[MySQL database] Database indexes and the concepts of B-trees and B+ trees

What is an index?

An index is a data structure that keeps the values of one or more columns of a database table in sorted order, which speeds up data retrieval. An index works like the table of contents of a book.

The default index for InnoDB and MyISAM is a B+ tree. The MEMORY engine uses a hash index by default.

What is a B-tree?

The B is commonly taken to mean "balanced".
A B-tree is a self-balancing tree data structure that keeps data ordered and allows searches, sequential access, insertions, and deletions in logarithmic time.
A B-tree is a generalization of a binary search tree in which a node can have more than two children. B-trees are well suited to storage systems that read and write relatively large blocks of data.

[Figure: a B-tree of order 4]

The order of a B-tree
The maximum number of children a node may have is called the order of the B-tree. The figure above shows a B-tree of order 4.

The root node of a B-tree
If the root node is not the only node in the tree, it has at least two children.

Internal nodes of a B-tree
Internal nodes are all nodes except the leaf nodes and the root node.

Leaf nodes of a B-tree
A leaf node has no children and holds no child pointers. In a B-tree of order m, the number of elements K in a leaf node satisfies ceil(m/2) - 1 <= K <= m - 1.

An internal node with K children contains K - 1 elements, where ceil(m/2) <= K <= m.
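
The limits above can be collected into a small helper. This is only a sketch: the function name `btree_key_bounds` and the dictionary keys are made up for illustration, and it assumes the common definition in which the order m is the maximum number of children.

```python
import math


def btree_key_bounds(m: int) -> dict:
    """Return the key and child limits for a B-tree of order m.

    Assumes "order" means the maximum number of children per node.
    """
    if m < 3:
        raise ValueError("order must be at least 3")
    min_children = math.ceil(m / 2)
    return {
        "max_children": m,
        "max_keys": m - 1,
        "min_children_internal": min_children,
        "min_keys_non_root": min_children - 1,
        "min_children_root": 2,  # applies only when the root is not a leaf
    }


bounds = btree_key_bounds(4)
print(bounds["min_keys_non_root"], bounds["max_keys"])  # 1 3
```

For the order-4 tree in the figure, every node other than the root therefore holds between 1 and 3 keys.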

Insertion

To insert an element, first search for it. If it does not already exist, the search ends at a leaf node, and the new element is inserted into that leaf node:

  1. If the number of keys in the node is less than m - 1, insert the key directly.
  2. If the number of keys is equal to m - 1, the node is split: it is divided in two around the middle key, which creates a new node, and the middle key is inserted into the parent node. The same steps are then repeated on the parent. In the worst case the splits propagate up to the root, and the whole B-tree gains one level.
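
The insertion steps can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how MySQL implements it: the `Node`, `BTree`, and `in_order` names are hypothetical, and the code inserts the key first and splits when a node overflows past m - 1 keys, which gives the same result as splitting a node that is already full.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # an empty list marks a leaf

    @property
    def is_leaf(self):
        return not self.children


class BTree:
    def __init__(self, order):
        self.order = order  # maximum number of children per node
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:
            # The root itself was split, so the tree gains one level.
            middle, right = split
            self.root = Node([middle], [self.root, right])

    def _insert(self, node, key):
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return None  # key already exists, nothing to insert
        if node.is_leaf:
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            # The child was split: take its middle key and new right sibling.
            middle, right = split
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
        if len(node.keys) <= self.order - 1:
            return None
        return self._split(node)

    def _split(self, node):
        # Divide the node around its middle key and hand that key to the parent.
        mid = len(node.keys) // 2
        middle = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return middle, right


def in_order(node):
    """Collect all keys in ascending order."""
    if node.is_leaf:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out.extend(in_order(child))
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


tree = BTree(order=4)
for k in range(1, 11):
    tree.insert(k)
print(tree.root.keys)       # [3, 6, 9]
print(in_order(tree.root))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Inserting 1 through 10 into an order-4 tree triggers three leaf splits, the first of which also creates a new root.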

Search

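For example, to search for 13:
  1. The first disk I/O loads the node containing 17 and 35. 13 is smaller than 17, so the search continues in the left subtree.
  2. The second disk I/O loads the node containing 8 and 12. 13 is larger than 12, so the search continues in the right subtree.
  3. The third disk I/O loads the node containing 13 and 15, and 13 is found.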

Key comparisons inside a node are performed in memory, so a larger number of comparisons is not a concern. What is expensive is disk I/O, and the number of disk I/Os grows with the depth of the tree.
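
The walk-through above can be reproduced with a short sketch that counts node visits, each standing in for one disk I/O. The `Node` class and `search` function are hypothetical, and apart from the nodes 17/35, 8/12 and 13/15 named in the example, the keys in the tree below are invented so that the tree is complete.

```python
import bisect


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # an empty list marks a leaf


def search(root, key):
    """Return (found, node_reads); each visited node models one disk I/O."""
    node, reads = root, 0
    while node is not None:
        reads += 1
        # The comparisons inside a node happen in memory.
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True, reads
        node = node.children[i] if node.children else None
    return False, reads


# Hypothetical order-3 tree; only 17/35, 8/12 and 13/15 come from the example.
root = Node([17, 35], [
    Node([8, 12], [Node([3, 5]), Node([9, 10]), Node([13, 15])]),
    Node([26, 30], [Node([20, 22]), Node([28, 29]), Node([31, 33])]),
    Node([65, 87], [Node([36, 60]), Node([75, 79]), Node([90, 99])]),
])

print(search(root, 13))  # (True, 3)
print(search(root, 17))  # (True, 1)
print(search(root, 14))  # (False, 3)
```

The number of reads never exceeds the height of the tree, which is why a shallow tree matters more than a small number of comparisons.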

Deletion

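Deletion is more complicated than insertion, because a node that falls below the minimum number of keys has to borrow a key from a sibling or merge with one.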

B+ tree

A B+ tree is a variant of the B-tree with better query efficiency. The differences between a B+ tree and a B-tree:
  1. Each internal node contains K elements and K children. The elements store no data and are used only for indexing; all data lives in the leaf nodes.
  2. The leaf nodes together contain all elements, along with pointers to the records for those elements, and the leaf nodes are linked to one another in ascending key order.
  3. Every element of an internal node also appears in one of its child nodes, where it is the largest (or smallest) element.

B+ tree search

The advantage of the B+ tree is its search efficiency. Because the internal nodes of a B+ tree hold no satellite data (the row data attached to a key), a node of the same size can store more elements. The B+ tree is therefore shorter and wider, and a lookup needs fewer I/O operations.
In a clustered index, the leaf nodes contain the satellite data directly. In a non-clustered index, the leaf nodes carry pointers to the satellite data.
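
A rough back-of-the-envelope calculation shows how much the extra fan-out helps. The sketch below uses InnoDB's default 16 KB page size; the key size, pointer size, and row size are assumptions chosen for illustration, and page headers and fill factors are ignored, so the results are estimates only.

```python
PAGE_SIZE = 16 * 1024  # bytes; InnoDB's default page size
KEY_SIZE = 8           # assumption: a BIGINT key
POINTER_SIZE = 6       # assumption: size of a child pointer
ROW_SIZE = 1024        # assumption: average row size, key included


def bplus_levels(total_rows):
    """Levels needed when internal nodes hold only keys and pointers."""
    fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
    rows_per_leaf = PAGE_SIZE // ROW_SIZE
    levels, capacity = 1, rows_per_leaf
    while capacity < total_rows:
        capacity *= fanout
        levels += 1
    return levels


def btree_levels(total_rows):
    """Levels needed when every node stores full rows next to its keys."""
    keys_per_node = PAGE_SIZE // (ROW_SIZE + POINTER_SIZE)
    levels, capacity = 1, keys_per_node
    while capacity < total_rows:
        levels += 1
        # A full tree with h levels holds (keys_per_node + 1) ** h - 1 rows.
        capacity = (keys_per_node + 1) ** levels - 1
    return levels


print(bplus_levels(10_000_000))  # 3
print(btree_levels(10_000_000))  # 6
```

Under these assumptions a B+ tree reaches ten million rows in 3 levels, while a B-tree that stores rows in every node needs 6, which roughly doubles the disk reads per lookup.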

Because satellite data can sit in any node of a B-tree, its query performance is unstable: a lookup may finish at the root node, or it may have to go all the way down to a leaf node.
A B+ tree lookup always descends to a leaf node, so its performance is stable.

Range searches are less efficient in a B-tree, which needs repeated in-order traversal between the lower and upper bounds. A B+ tree only needs to locate the lower bound of the range and then follow the linked list of leaf nodes until it passes the upper bound.
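
A minimal sketch of such a range query follows. The `Leaf`, `Internal`, and `range_query` names are hypothetical, the tree has only two levels, and it assumes the variant described earlier in which each internal key is the largest key of the matching child.

```python
import bisect


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None  # link to the next leaf in ascending key order


class Internal:
    def __init__(self, keys, children):
        self.keys = keys  # keys[i] is the largest key stored under children[i]
        self.children = children


def range_query(root, lo, hi):
    """Return all keys k with lo <= k <= hi."""
    node = root
    # Step 1: descend once to the leaf that may contain the lower bound.
    while isinstance(node, Internal):
        i = bisect.bisect_left(node.keys, lo)
        if i == len(node.keys):
            return []  # lo is larger than every key in the tree
        node = node.children[i]
    # Step 2: walk the leaf linked list until the upper bound is passed.
    result = []
    i = bisect.bisect_left(node.keys, lo)
    while node is not None:
        for key in node.keys[i:]:
            if key > hi:
                return result
            result.append(key)
        node, i = node.next, 0
    return result


# Hypothetical two-level B+ tree.
leaves = [Leaf([3, 5, 8]), Leaf([9, 12, 13]), Leaf([15, 17, 20]), Leaf([26, 30, 35])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([8, 13, 20, 35], leaves)

print(range_query(root, 10, 17))  # [12, 13, 15, 17]
```

Only the first step touches internal nodes; the rest of the query reads leaves sequentially.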

Insertion and deletion work in almost the same way as in a B-tree.

Summary

The advantages of a B+ tree over a B-tree:
  1. A single node stores more elements, so a query needs fewer I/O operations.
  2. Every query must reach a leaf node, so query performance is stable.
  3. All leaf nodes form an ordered linked list, which makes range queries easy.

Origin blog.csdn.net/weixin_44179010/article/details/124014522