A brief analysis of MySQL index structure

index structure

image-20230808101522006

InnoDB B-tree

image-20230808101753808

The above is the structure of a binary tree and a red-black tree. In fact, the red-black tree is a self-balancing binary search tree , which can be used to solve the problem of forming an ordered linked list when the binary tree is inserted sequentially.

However, both have an obvious shortcoming, that is, when the amount of data is too large, the level is deeper and the retrieval speed is slow.

Let’s analyze the B-tree (==multi-path==balanced search tree)

image-20230808102152203

Noun analysis:

  • Degree: refers to the number of child nodes of a node.

The B-tree diagram above has a degree of 5 and is also a 5th order. It can store up to 4 keys and 5 pointers.

For example: If it is less than 20, it will go to the first pointer to find the child node [10, 15, 18]. If it is between 20 and 30, it will find the child node [23, 25, 28]. According to this inference, if a node N key, then there are N+1 pointers.

The advantages of such a data structure are very obvious. The amount of data that can be stored in each layer is increased, and the number of levels of the tree is effectively reduced.

image-20230808103204593

The figure above shows the evolution process of insertion splitting of a B number.

Assuming that the maximum degree of this B number is 5, it can store up to 4 keys and 5 pointers.

The core lies in the insertion order and splitting process: simply speaking, it is inserted in order. When the node is found to meet 4 keys during insertion, then it is necessary to enter the upward splitting process. This split is from the middle key of the current node. Upward splitting process.

eg: The element to be inserted [2456], the current node element [1800, 1888, 1980, 2000]

  1. Before inserting, it is found that the key stored in the current node has reached the maximum value. At this time, the current node needs to be split upward.
  2. After insertion, the node elements are [1800, 1888, 1980, 2000, 2456]. At this time, the middle element is selected and split upward to become the parent node of the current node.
  3. Split results: [1980] (parent node element), [1800, 1888] | [2000, 2456]
  4. The final effect is: pointers smaller than [1980] point to [1800, 1888], and pointers larger than [1980] point to [2000, 2456]

image-20230808104343480

InnoDB B+ number

First, take a look at the classic B+ tree structure:

image-20230808104914123

The above is a classic B+ tree. Its insertion order and splitting process are similar to those of a B tree. There are three main differences:

  1. Splitting process: During the splitting process, not only the middle element is split upward, but the element that is split upward is left in the split child node.
  2. All data is in leaf nodes.
  3. Leaf nodes form a single-line linked list.

The characteristics of this splitting process are mainly understood in combination with the fact that all data is in the leaf nodes .

Because the data I want to insert may cause the node to split upward , but because all the data in the B+ tree is in the leaf nodes , the data to be inserted needs to be saved in the leaf nodes .

B+ tree structure in InnoDB.

image-20230808112435443

From the picture above, you can see that MySQL is optimized for the classic B+ tree (based on the physical storage structure and ensuring query efficiency).

First, let’s take a look at the optimizations for data structures:

  1. The leaf nodes become a circular doubly linked list (to improve the efficiency of interval access)
  2. Each node is stored in pages (the basic unit of disk I/O for data in MySQL).

The concepts of MySQL table space and data area:

  • Table space: From the perspective of InnoDB's logical storage structure , all data is logically stored in a space, which is called tablespace. *Table space consists of segments, extents, and pages*. Create a table and have corresponding .ibdfiles on disk.

    There are many groups of data areas in the table space. A group of data areas ( 256MB ) is 256 data areas. Each data area ( 1MB ) contains 64 data pages, and each page ( 16KB )

  • Segment: It is divided into index segment, data segment, rollback segment, etc. Among them, the index segment is the non-leaf node part, and the data segment is the leaf node part, and the rollback segment is used for data rollback and multi-version control. A segment contains 256 areas (256M size).

  • Extent: An extent is a collection of pages, an extent contains 64 consecutive pages, and the default size is 1MB (64*16K).

  • Page: Page is the smallest unit managed by InnoDB. Common types include FSP_HDR, INODE, INDEX and other types. The structure of all pages is the same, divided into file header (first 38 bytes), page data and file tail (last 8 bytes). Page data varies depending on the type of page.

summary:

This time, a simple analysis of B-tree, B+ tree, and InnoDB B+ tree is carried out.

In order to better understand the InnoDB B+ tree, you may need to understand the specific implementation of the B+ tree and the principle of self-balancing (left-handed and right-handed) in detail.

Later, I am going to study the tuning of MySQL in detail, from all aspects of table creation specifications, query statement optimization, index optimization, and sub-database and sub-table.

Guess you like

Origin blog.csdn.net/Trouvailless/article/details/132168443