MySQL High Availability and High Performance Storage Application Series 1 - Index

index

The nature of indexing

  • It is a data structure that helps MySQL retrieve data efficiently
  • In MySQL, the data is ultimately stored on the hard disk

Each disk access is an I/O operation. MySQL organizes storage in pages: a page corresponds to one node of the index tree, and each read fetches one page, that is, the data of one node. By default a page holds 16 KB of data.

binary tree

Binary search tree definition:

  • All values in the left subtree are less than the root node
  • All values in the right subtree are greater than the root node
  • Each node has at most two child nodes

Balanced binary tree definition:

  • Relatively balanced, the absolute value of the depth difference between the left and right subtrees cannot exceed 1
  • The left and right subtrees must also be balanced binary trees
  • Avoids the extreme case in which a binary tree degenerates into a linked list

B-Tree structure

Features: multi-way (multi-branch)

  • One node can store more than 2 elements and can have more than 2 child nodes
  • Has some properties of a binary tree
  • Balanced, all subtrees of each node have the same height and are relatively short

Calculate the number of elements:

Known conditions: in an m-order B-tree each node has at most m child nodes. Let x be the number of elements stored in a node and y the number of its child nodes; ceil(m/2) means m/2 rounded up.

  • Number of elements in the root node: 1 <= x <= m-1
  • Number of elements in a non-root node: ceil(m/2) - 1 <= x <= m-1
  • Number of child nodes: y = x + 1. For a root node that is not a leaf: 2 <= y <= m
  • For a non-root, non-leaf node: ceil(m/2) <= y <= m
  • Each node has at most m child nodes
  • Apart from the root, each non-leaf node has at least ceil(m/2) child nodes
  • The root node is either a leaf (the tree has a single node) or has at least 2 child nodes
  • A node with k child nodes holds exactly k-1 keywords, so a node with m branches holds m-1 elements
  • All leaf nodes are at the same height
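The bounds above can be checked with a small helper. This is a minimal sketch that assumes the standard B-tree definition (a non-root node has at least ceil(m/2) child nodes and one fewer key than children); the function name btree_bounds is hypothetical.

```python
def btree_bounds(m: int) -> dict:
    """Return (min, max) key and child counts for an m-order B-tree."""
    if m < 3:
        raise ValueError("order m must be at least 3")
    min_children = (m + 1) // 2  # integer form of ceil(m / 2)
    return {
        "root_keys": (1, m - 1),
        "root_children": (2, m),            # applies when the root is not a leaf
        "node_keys": (min_children - 1, m - 1),
        "node_children": (min_children, m),
    }


for order in (3, 4, 5):
    print(order, btree_bounds(order))
```

For m = 5, for example, a non-root node holds between 2 and 4 keys and has between 3 and 5 child nodes.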

A single node can store multiple elements, so each page read returns more useful data. Because each node also has more branches, the tree has fewer levels and a lookup needs fewer page reads.

How many rows can a 3-level B-Tree store? As a rough estimate, assume one row occupies 1 KB (ignoring the space taken by keys and pointers for now), so a 16 KB node holds 16 rows and has about 16 branches. The number of rows a 3-level B-Tree can hold is then roughly:

16 * 16 * 16 = 4096
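The same rough estimate can be written as a few lines of code, which also makes it easy to ask how many levels a larger table would need. This is only a sketch of the simplified model used here (16 KB pages, 1 KB rows, keys and pointers ignored); the function names and the 5-million-row figure are illustrative.

```python
PAGE_SIZE_KB = 16  # default page size mentioned in the text
ROW_SIZE_KB = 1    # assumed row size from the estimate


def estimated_capacity(levels: int) -> int:
    """Rows held by a B-Tree with the given number of levels, per the rough model."""
    fan_out = PAGE_SIZE_KB // ROW_SIZE_KB  # 16 rows (and about 16 branches) per node
    return fan_out ** levels


def levels_needed(total_rows: int) -> int:
    """Smallest number of levels whose estimated capacity reaches total_rows."""
    levels = 1
    while estimated_capacity(levels) < total_rows:
        levels += 1
    return levels


print(estimated_capacity(3))     # 4096
print(levels_needed(5_000_000))  # 6
```

Under this model a table with 5 million rows needs 6 levels, so a single lookup may touch up to 6 pages.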

This exposes two problems. (1) If a table holds 5 million rows, the tree is still very deep, so a query still incurs a lot of disk I/O. (2) The data is spread across all levels of the tree in sorted order. In a range search, the larger the range, the more nodes must be visited; in the extreme case every row is visited once, which amounts to traversing the entire tree. More nodes mean more I/O operations, and performance hits a bottleneck.

B+Tree

B+Tree solves the problems of B-Tree structure.

  • Leaf nodes store the row data; non-leaf nodes store only keys and pointers
  • The number of elements stored in a node equals its number of child nodes (m), and each key range is left-closed, right-open
  • The leaf nodes are linked by pointers, which makes range searches convenient: just traverse the leaf nodes
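The last point can be illustrated with a toy model of the leaf level. This is a simplified sketch, not InnoDB's implementation: the Leaf class and both functions are hypothetical, only the forward link is modeled, and locating the starting leaf through the non-leaf nodes is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]                # sorted keys stored in this leaf
    next: Optional["Leaf"] = None  # pointer to the next leaf in key order


def build_leaves(sorted_keys: List[int], per_leaf: int) -> List[Leaf]:
    """Split sorted keys into leaves and link each leaf to its successor."""
    leaves = [Leaf(sorted_keys[i:i + per_leaf])
              for i in range(0, len(sorted_keys), per_leaf)]
    for current, following in zip(leaves, leaves[1:]):
        current.next = following
    return leaves


def range_scan(start: Optional[Leaf], low: int, high: int) -> List[int]:
    """Collect keys in [low, high] by walking the leaf chain only."""
    result = []
    leaf = start
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # keys are sorted, so the scan can stop here
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result


leaves = build_leaves(list(range(1, 21)), per_leaf=4)
print(range_scan(leaves[0], 6, 11))  # [6, 7, 8, 9, 10, 11]
```

Once the first matching leaf is found, the scan never goes back up the tree; it only follows the next pointers.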

Why does MySQL use B+Tree instead of B-Tree? Because non-leaf nodes hold no row data, each page holds more index entries, so one I/O narrows the search much further and the tree stays shallow. The leaf level holds all the data in index order and forms a doubly linked list, so sorting and range searches only walk the leaves and never need to revisit the upper nodes.
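To see how much "more index entries per page" matters, here is a back-of-the-envelope sketch. The entry sizes are assumptions, not figures from this article: an 8-byte key (for example a BIGINT) plus a 6-byte page pointer per non-leaf entry, and 1 KB rows in the leaves.

```python
PAGE_SIZE = 16 * 1024  # 16 KB page, in bytes
KEY_BYTES = 8          # assumption: BIGINT primary key
POINTER_BYTES = 6      # assumption: size of a child-page pointer
ROW_BYTES = 1024       # assumption: 1 KB per row, as in the B-Tree estimate


def bplus_capacity(levels: int) -> int:
    """Estimated rows in a B+Tree: non-leaf levels fan out, leaves hold rows."""
    fan_out = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)  # entries per non-leaf page
    rows_per_leaf = PAGE_SIZE // ROW_BYTES
    return fan_out ** (levels - 1) * rows_per_leaf


print(PAGE_SIZE // (KEY_BYTES + POINTER_BYTES))  # 1170
print(bplus_capacity(3))                         # 21902400
```

Under these assumptions a 3-level B+Tree holds about 21.9 million rows, compared with roughly 4096 rows in the 3-level B-Tree estimate, because its non-leaf pages carry only keys and pointers.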

How MySQL stores tables and indexes

MyISAM

*.frm Table definition
*.MYI Index data
*.MYD Row data

InnoDB

*.frm Table definition
*.ibd Index data and row data

Under the InnoDB engine, if a table does not define a primary key, InnoDB uses the first unique index whose columns are all NOT NULL as the clustered index; if there is none, it automatically creates a hidden one.

Back to the table

"Going back to the table" means a query first searches an ordinary index (any index we create ourselves, whether single-column or composite, is called an ordinary index here) to find the matching entries, and then uses each entry's primary key ID to fetch the columns that the index does not contain. It therefore happens only under certain conditions. If all the selected columns can be obtained from a single index search, there is no need to go back to the table. If the select needs columns that are not in the index, the lookup happens; that is, a query on a non-primary-key index has to search one additional index tree.

In MySQL terms: under the InnoDB storage engine, a secondary index stores only the indexed columns and the primary key. If the query needs other columns, it must go to the primary key index to retrieve them. This process is called going back to the table.

Example: a table has the fields Id, Name, Age, and so on, where Id is the primary key and Name is indexed. select Id,Name from Table can be answered from the Name index alone and is returned directly. select * from Table also needs the other fields, so it must use the primary key index to fetch them, which causes extra back-to-table operations.

Covering index: if a composite index contains every column the query needs, the query is answered from that index alone and avoids going back to the table.
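The difference between a covered query and one that goes back to the table can be modeled with two dictionaries. This is a toy sketch, not how InnoDB is implemented: the sample rows, the unique Name values, and the query function are all assumptions made for illustration.

```python
# Clustered (primary key) index: Id -> full row.
clustered_index = {
    1: {"Id": 1, "Name": "alice", "Age": 30},
    2: {"Id": 2, "Name": "bob", "Age": 25},
}
# Secondary index on Name: Name -> Id (Name assumed unique to keep the sketch short).
name_index = {"alice": 1, "bob": 2}


def query(name, columns):
    """Return (row, index_searches) for a lookup by Name."""
    searches = 1  # search the secondary index on Name
    pk = name_index.get(name)
    if pk is None:
        return None, searches
    if set(columns) <= {"Id", "Name"}:
        # Covered: the secondary index already holds every requested column.
        entry = {"Id": pk, "Name": name}
        return {col: entry[col] for col in columns}, searches
    searches += 1  # back to the table: search the primary key index as well
    row = clustered_index[pk]
    return {col: row[col] for col in columns}, searches


print(query("bob", ["Id", "Name"]))         # ({'Id': 2, 'Name': 'bob'}, 1)
print(query("bob", ["Id", "Name", "Age"]))  # ({'Id': 2, 'Name': 'bob', 'Age': 25}, 2)
```

The second query needs Age, which the Name index does not hold, so it costs one more index search.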

Index leftmost matching principle

If a composite index on (name, age, address) is created, the B+Tree is sorted strictly in that column order, so a query can use the index only through a leftmost prefix of those columns.

-- The index is used (all three columns)
Select * from user where name = ? AND age = ? AND address = ?

-- The index is used (name)
Select * from user where name = ?

-- The index is used, but only its name part
Select * from user where name = ? AND address = ?
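The rule behind these three cases can be sketched as a small function. It is a simplification that considers only equality conditions and ignores everything else the optimizer does; used_index_prefix is a hypothetical name.

```python
def used_index_prefix(index_columns, equality_columns):
    """Return the leading index columns that equality conditions can use."""
    conditions = set(equality_columns)
    used = []
    for column in index_columns:
        if column not in conditions:
            break  # the first gap ends the usable prefix
        used.append(column)
    return used


index = ("name", "age", "address")
print(used_index_prefix(index, ["name", "age", "address"]))  # ['name', 'age', 'address']
print(used_index_prefix(index, ["name"]))                    # ['name']
print(used_index_prefix(index, ["name", "address"]))         # ['name']
print(used_index_prefix(index, ["age", "address"]))          # []
```

The last call shows the case not listed above: without a condition on name, no prefix of the index matches.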

Questions

  • Why doesn't MySQL use a binary search tree or a balanced binary tree?
  • Why does MySQL use B+Tree instead of B-Tree?
  • Why does MySQL not recommend using a UUID as the primary key?
  • How should the clustered index and the sparse index in MySQL be understood?
  • Will like 'aaa%' always use the index?
  • Why is it not recommended to write select * from queries?
  • How should the leftmost matching principle be understood?
  • Why is an auto-increment primary key ID recommended, and what does it have to do with B+Tree?
  • Why does the InnoDB engine require a primary key index?

1. Why doesn't MySQL use a binary search tree or a balanced binary tree?
In the extreme case a binary search tree degenerates into a linked list, and finding the last element means traversing every node. In MySQL each node visit is a disk I/O operation. A balanced binary tree avoids the extreme case, but a node can still hold only one element, so each page read yields very little data, the tree is deep, and the extra I/O operations hurt performance.
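The extreme case is easy to reproduce: inserting keys in ascending order into a plain binary search tree produces a chain. The following is a minimal sketch with a hypothetical Node class; min_balanced_height gives the height a perfectly balanced tree would need for comparison.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert a key into an unbalanced binary search tree (iteratively)."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def height(root):
    """Number of levels in the tree, counted level by level."""
    levels = 0
    current = [root] if root is not None else []
    while current:
        levels += 1
        current = [child for node in current
                   for child in (node.left, node.right) if child is not None]
    return levels


def min_balanced_height(n):
    """Smallest height h such that a binary tree can hold n nodes (2**h - 1 >= n)."""
    h = 0
    while (1 << h) - 1 < n:
        h += 1
    return h


root = None
for key in range(1, 1001):  # ascending keys: the worst case
    root = insert(root, key)

print(height(root))               # 1000
print(min_balanced_height(1000))  # 10
```

With 1000 sorted keys the unbalanced tree is 1000 levels deep, while a balanced binary tree needs 10 levels, which is still one node visit (one I/O) per level.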

2. Why does MySQL use B+Tree instead of B-Tree?
1) Leaf nodes are linked by pointers, so sorting and range searches are more efficient: they walk the leaves instead of visiting nodes at every level. Index-based scans and index-based sorting both benefit.
2) Non-leaf nodes store no row data, only keys and pointers, so the same page holds more index entries, which reduces disk I/O.

3. Why doesn't MySQL choose B-Tree?
By the estimate above, a 3-level B-Tree stores very few rows, and the data is spread across the levels of the tree in sorted order. In a range search, the larger the range, the more nodes must be visited.
In the extreme case this amounts to traversing the entire tree. The more nodes there are, the more fetches and the more I/O operations occur, so performance hits a bottleneck.

4. Why does MySQL not recommend using a UUID as the primary key?
5. Why is an auto-increment primary key ID recommended, and what does it have to do with B+Tree?

  1. B+Tree keeps index entries in ascending order and places adjacent entries in the same page. With an incrementing key, new rows are appended at the end, so pages are filled completely and the tree has fewer nodes and branches (that is, fewer page reads per search).
  2. UUIDs are unordered, so new rows land at arbitrary positions in the tree. This wastes page space and leads to more nodes, which is why a UUID is a poor fit for the storage structure.
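The effect can be shown with a toy simulation of the leaf level. This is a rough model under stated assumptions, not InnoDB's actual algorithm: 16 rows per page, a full page is split in half when a key lands in its middle, and a key larger than everything stored simply starts a new page. Shuffled integers stand in for UUIDs.

```python
import bisect
import random

PAGE_CAPACITY = 16  # assumption: 16 rows fit in one page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys in the given order; return (page_count, split_count)."""
    pages = [[]]  # leaf pages in key order, each a sorted list of keys
    splits = 0
    for key in keys:
        if not pages[0]:
            pages[0].append(key)
            continue
        # Pick the last page whose first key is <= key (or the first page).
        firsts = [page[0] for page in pages]
        idx = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Appending past the end: start a fresh page, no split needed.
            pages.append([key])
        else:
            # Inserting into the middle of a full page: split it in half.
            splits += 1
            mid = capacity // 2
            left, right = page[:mid], page[mid:]
            pages[idx:idx + 1] = [left, right]
            bisect.insort(right if key >= right[0] else left, key)
    return len(pages), splits


sequential_keys = list(range(1, 2001))
random_keys = sequential_keys[:]
random.Random(42).shuffle(random_keys)  # stand-in for unordered UUIDs

seq_pages, seq_splits = simulate_inserts(sequential_keys)
rand_pages, rand_splits = simulate_inserts(random_keys)
print(seq_pages, seq_splits)  # 125 0
print(rand_pages, rand_splits)
```

Sequential keys fill every page completely (2000 rows in 125 pages, no splits), while the shuffled keys trigger splits and leave pages partly empty, so the same rows occupy more pages.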

6. Why is it not recommended to use the select * from Table statement to query data?

Suppose a table has the fields Id, Name, Age, and so on, where Id is the primary key and Name is indexed. select Id,Name from Table can be answered from the Name index alone and is returned directly. select * from Table also needs the other fields, so it must use the primary key index to fetch them, which causes extra back-to-table operations.

7. Why does the InnoDB engine require a primary key index?

This follows from InnoDB's storage structure: the table's row data is stored in the leaf nodes of the primary key index, organized by primary key ID.

8. Index leftmost matching principle

If a composite index on (name, age, address) is created, the B+Tree is sorted strictly in that column order, so a query can use the index only through a leftmost prefix of those columns.

-- The index is used (all three columns)
Select * from user where name = ? AND age = ? AND address = ?

-- The index is used (name)
Select * from user where name = ?

-- The index is used, but only its name part
Select * from user where name = ? AND address = ?

Origin blog.csdn.net/xuezhiwu001/article/details/129678652