MySQL index structure evolution history

What is an index

Index definition: An index relies on specific data structures and algorithms to organize data so that users can quickly retrieve the data they need.

For example, in the Xinhua Dictionary we can quickly find a character by its radical or its pinyin; the radicals and pinyin here are the indexes.

History of the data structures chosen for indexes

1. Ordered array

Advantages:

Data can be accessed randomly via subscripts, and because the data is ordered, a value can be located with binary search.

Disadvantages:

When searching for data, the entire table needs to be loaded into memory, which causes very high memory pressure.

Inserting data also requires shifting the elements after the insertion point so that the array stays ordered.
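
The trade-off of the ordered array can be sketched as follows. This is a minimal illustration using an in-memory Python list and the standard bisect module; the helper names find and insert are hypothetical and are not part of MySQL.

```python
import bisect

def find(sorted_keys, key):
    """Binary search: return the subscript of key, or -1 if it is absent."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i
    return -1

def insert(sorted_keys, key):
    """Insert key in order and return how many elements had to shift right."""
    i = bisect.bisect_left(sorted_keys, key)
    shifted = len(sorted_keys) - i
    sorted_keys.insert(i, key)
    return shifted

keys = [10, 20, 30, 40, 50]
print(find(keys, 30))   # 2
print(find(keys, 35))   # -1
print(insert(keys, 5))  # 5, every existing element moves one slot
```

Lookups are cheap, but an insert near the front moves almost the whole array, which is the cost described above.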

2. Linked list

Advantages:

  1. The previous or next node can be located quickly.
  2. Data can be deleted quickly by changing only the pointers, which is better than an array.

Disadvantages:

  1. Data cannot be accessed randomly through subscripts as in an array.
  2. To find data, you must traverse from the first node, which makes searching inefficient. A search is no faster than scanning data that has no index: in the worst case it traverses the whole list, which takes O(N).
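
A small sketch of both sides of the linked list. It assumes a singly linked list for brevity (the list described above also keeps a previous pointer), and the names Node, build, search and delete_after are hypothetical.

```python
class Node:
    def __init__(self, key, next=None):
        self.key = key
        self.next = next

def build(keys):
    """Build a linked list that holds keys in the given order."""
    head = None
    for key in reversed(keys):
        head = Node(key, head)
    return head

def search(head, key):
    """Walk from the head; return (node or None, number of nodes visited)."""
    steps = 0
    node = head
    while node is not None:
        steps += 1
        if node.key == key:
            return node, steps
        node = node.next
    return None, steps

def delete_after(node):
    """Remove the successor of node by changing a single pointer."""
    if node.next is not None:
        node.next = node.next.next

head = build([1, 2, 3, 4, 5])
print(search(head, 5)[1])  # 5, the last key costs a full traversal
delete_after(head)         # unlinks the node holding 2
print(head.next.key)       # 3
```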

3. Binary search tree

Disadvantages of binary search trees:

  1. Query efficiency is unstable. If the left and right sides of the tree are relatively balanced, a query takes O(logN). If the data is inserted in sorted order, the tree degenerates into a linked list and the query time becomes O(N).
  2. When the amount of data is large, the tree becomes tall. If each node corresponds to one disk block that stores a single piece of data, every extra level costs another disk IO, so the number of IOs grows significantly. This structure is therefore not suitable for storing the data.

Normal data (relatively balanced tree):

image-20230710150628613

Abnormal data (inserted in order, degenerates into a linked list):

image-20230710150649235
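
The difference between the two cases can be reproduced with a plain binary search tree that never rebalances. This is a minimal sketch: the key count of 500 and the random seed are arbitrary assumptions, and the function names are hypothetical.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into a plain binary search tree (no rebalancing)."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right

def height(root):
    """Count the levels of the tree by walking it level by level."""
    frontier = [root] if root is not None else []
    levels = 0
    while frontier:
        levels += 1
        frontier = [child for n in frontier
                    for child in (n.left, n.right) if child is not None]
    return levels

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

ordered = list(range(500))
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)

print(height(build(ordered)))   # 500, one node per level, like a linked list
print(height(build(shuffled)))  # far smaller; the exact value depends on the shuffle
```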

4. Balanced binary tree (AVL tree)

The balanced binary tree is a special kind of binary search tree, so it also satisfies the two characteristics of the binary search tree (every key in the left subtree is smaller than the node's key, and every key in the right subtree is larger), and it has one more characteristic:

The absolute value of the height difference between its left and right subtrees does not exceed 1, and both the left and right subtrees are themselves balanced binary trees.

Compared with a plain binary search tree, a balanced binary tree keeps its left and right sides relatively balanced and does not degenerate into a linked list. No matter in what order the data is inserted, adjustments (rotations) ensure that the height difference between the left and right sides is no more than 1.

But each node still stores only one element, so when the amount of data is very large the tree still becomes too tall, just like the binary search tree.
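
The adjustments mentioned above are rotations. Below is a minimal sketch of AVL insertion, written as an illustration only (no deletion, no duplicate handling); the function names are hypothetical. It inserts keys in sorted order, the case that turns a plain binary search tree into a linked list.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # a single node counts as one level

def h(node):
    return node.height if node else 0

def update(node):
    node.height = 1 + max(h(node.left), h(node.right))

def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y

def insert(node, key):
    """Insert key, then rebalance every node on the way back up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    balance = h(node.left) - h(node.right)
    if balance > 1:   # left side too tall
        if h(node.left.left) < h(node.left.right):    # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:  # right side too tall
        if h(node.right.right) < h(node.right.left):  # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def is_balanced(node):
    """Check the AVL rule on every node: subtree heights differ by at most 1."""
    if node is None:
        return True
    return (abs(h(node.left) - h(node.right)) <= 1
            and is_balanced(node.left) and is_balanced(node.right))

root = None
for key in range(1, 1024):  # sorted input, the worst case for a plain BST
    root = insert(root, key)
print(h(root), is_balanced(root))
```

The height stays logarithmic in the number of keys, but it still grows with the data, because each node holds a single element.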

5. B-tree

image-20230710150606776

The B-tree is derived from the balanced binary tree. Each node stores multiple elements and links to its child nodes through pointers, which solves the problem of the tree becoming too tall when the amount of data is large.

However, it does not solve the range search problem. In the figure above, for example, searching for the range [15,36] still requires accessing 7 disk blocks (1/2/7/3/8/4/9).
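
To see why storing multiple elements per node lowers the tree height, here is a rough estimate. It assumes a completely full tree in which every node has the same number of children; the fan-out of 100 and the figure of one million keys are illustrative assumptions, not MySQL values.

```python
def levels_needed(num_keys, fanout):
    """Levels required before a full tree with this fan-out can address num_keys."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels

print(levels_needed(1_000_000, 2))    # 20 levels for a binary tree
print(levels_needed(1_000_000, 100))  # 3 levels with 100 children per node
```

If each level costs one disk read, the wider node turns roughly 20 reads into 3 under these assumptions.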

6. B+ tree

image-20230710150550907

The B+ tree is an optimization of the B-tree: data is stored only in the leaf nodes, the other nodes store only keys, and the leaf nodes are linked to each other through bidirectional pointers.

This fixes the range lookup problem.

Search process:

First locate the leaf node that holds the lower bound of the range, then follow the linked list between the leaf nodes and read the data in each leaf until the upper bound of the range is passed.
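
The search process can be sketched as follows. This is a simplified model, not InnoDB's implementation: the internal nodes are replaced by a binary search over the first key of each leaf, and the keys and leaf size are made-up assumptions.

```python
import bisect

class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys; the row data would be stored alongside
        self.prev = None   # bidirectional pointers between neighbouring leaves
        self.next = None

def build_leaves(sorted_keys, per_leaf):
    """Split sorted keys into leaves and link the leaves in both directions."""
    leaves = [Leaf(sorted_keys[i:i + per_leaf])
              for i in range(0, len(sorted_keys), per_leaf)]
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left
    return leaves

def range_search(leaves, low, high):
    """Return (keys with low <= key <= high, number of leaves visited)."""
    if not leaves:
        return [], 0
    # Stand-in for the root-to-leaf descent: pick the leaf that may hold `low`.
    first_keys = [leaf.keys[0] for leaf in leaves]
    start = max(bisect.bisect_right(first_keys, low) - 1, 0)
    leaf, result, visited = leaves[start], [], 0
    while leaf is not None:
        visited += 1
        for key in leaf.keys:
            if key > high:
                return result, visited  # passed the upper bound, stop
            if key >= low:
                result.append(key)
        leaf = leaf.next                # move sideways, no need to go back up
    return result, visited

leaves = build_leaves(list(range(5, 100, 5)), per_leaf=3)
print(range_search(leaves, 15, 36))  # ([15, 20, 25, 30, 35], 3)
```

After the first leaf is found, the scan only moves along the leaf level, which is what removes the repeated trips through the upper levels that the B-tree needs.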

Origin blog.csdn.net/itScholar001/article/details/131639775