A detailed explanation of MySQL index basics

1. What is an index?

In a relational database, an index is a separate storage structure that keeps the values of one or more columns of a table in sorted order. Each entry pairs the column value(s) with a logical pointer to the row in the table that holds them.

In effect, it is like the table of contents of a book: you can use it to quickly find the content you need.

2. Why use an index?

To speed up access to the data in the database.

3. Why can using indexes speed up access?

1. Where is the performance bottleneck when MySQL queries data?

MySQL table data is stored on disk and must be loaded from disk into memory before it can be used. Disk I/O is therefore the bottleneck of query performance.

2. Is only the needed data read from disk?

No. The smallest logical unit of exchange between disk and memory is a page (data page). Each read brings one page, or a whole number of pages, from disk into memory. This is called disk read-ahead.

The operating system's page size is usually 4 KB or 8 KB. InnoDB uses its own page size, 16 KB by default, so it reads data in 16 KB units.
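The effect of page-sized reads can be sketched with a little arithmetic. The following is only an illustration, assuming the 16 KB InnoDB page size mentioned above; the function names are hypothetical and not part of MySQL.

```python
PAGE_SIZE = 16 * 1024  # assumed page size in bytes (InnoDB default)


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole pages needed to cover the bytes [offset, offset + length)."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


def bytes_read(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes actually transferred, since reads happen in whole pages."""
    return pages_touched(offset, length, page_size) * page_size
```

Under these assumptions, asking for 100 bytes still transfers a full 16 KB page, and a 2-byte value that straddles a page boundary costs two pages.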

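tips: The principle behind disk read-ahead is the principle of locality.

Temporal locality: if a memory location is referenced once, it is very likely to be referenced again within a short time.

Spatial locality: if a memory location is referenced once, the memory locations near it are very likely to be referenced as well.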

3. When is the index loaded into memory?

The index is also stored on disk. When data is queried, the index is loaded into memory first.

4. What information will be stored in the index?

First of all, understand that an index is itself a table-like structure. It stores the indexed column values together with the primary key, and through them points to the records of the actual table (the real table data).

5. Why doesn't the index use a hash table as its data structure?

If the index were stored in a hash table, the key would be the indexed value and the value would be the corresponding table record. This has several drawbacks:

  1. Hash collisions distribute the data unevenly, so colliding entries have to be searched linearly, which wastes time.
  2. A hash table does not support range queries; a range query has to traverse the entries one by one.
  3. Memory requirements are relatively high, because the entire index has to be loaded into memory.

That said, it is not without benefits: a hash index is very fast for equality queries.
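The trade-off between equality and range lookups can be sketched in a few lines. This is a toy model only: a Python dict stands in for a hash index and a sorted list for an ordered index. The sample data and function names are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hash-style index: indexed value -> record (hypothetical sample data).
rows = {age: f"user_{age}" for age in range(18, 60)}


def hash_equal(index, key):
    """Equality lookup: a single hash probe."""
    return index.get(key)


def hash_range(index, low, high):
    """Range lookup on a hash index: every key has to be examined."""
    return sorted(k for k in index if low <= k <= high)


# Ordered index: keys kept in sorted order.
sorted_keys = sorted(rows)


def sorted_range(keys, low, high):
    """Range lookup on an ordered index: two binary searches, then a slice."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]
```

Both `hash_range(rows, 20, 22)` and `sorted_range(sorted_keys, 20, 22)` return the same keys, but the first inspects every entry while the second only locates the two ends of the range.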

Does MySQL have hash indexes?

① The MEMORY engine uses hash indexes.

② The InnoDB engine supports an adaptive hash index.

6. Why does MySQL not use a BST to store indexes?

BST: binary search tree. Every value in a node's left subtree is less than the node's value, and every value in its right subtree is greater.

Based on this rule, binary search can be used to speed up queries. But consider the case where rows are inserted in monotonically increasing or decreasing key order: the binary search tree degenerates into a linked list, and every lookup becomes a linear scan.
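The degeneration is easy to reproduce. The sketch below is a plain, unbalanced BST written for illustration (not anything MySQL uses); it compares the height of the tree after inserting 1000 keys in ascending order with the height after inserting the same keys in shuffled order.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced BST iteratively and return the root."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def height(root):
    """Number of levels in the tree, counted level by level."""
    levels = 0
    frontier = [root] if root is not None else []
    while frontier:
        levels += 1
        frontier = [child for node in frontier
                    for child in (node.left, node.right) if child is not None]
    return levels


ascending = list(range(1, 1001))
shuffled = ascending[:]
random.Random(0).shuffle(shuffled)

sorted_height = height(build(ascending))   # one node per level: a linked list
shuffled_height = height(build(shuffled))  # far fewer levels
```

With ascending keys every node has only a right child, so the height equals the number of keys and a lookup has to walk the whole chain.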


7. Why doesn't MySQL use an AVL tree to store indexes?

AVL: a balanced binary tree. For every node, the heights of the left and right subtrees differ by at most 1.

As mentioned above for the BST, the AVL tree solves the degeneration problem, but to keep the tree balanced it has to perform left or right rotations when data is inserted. In essence, this trades insertion performance for query performance.

If the workload reads far more than it writes, this structure has some advantages. What if reads and writes are equally frequent?
Solution: the red-black tree.
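To make the rotation step concrete, here is a minimal sketch of a single left rotation applied to a right-leaning chain of three nodes. It is only the rotation primitive, not a complete AVL implementation, and the class and function names are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Number of levels in the subtree rooted at node."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def rotate_left(x):
    """Promote x's right child; x becomes that child's left child."""
    y = x.right
    x.right = y.left  # y's old left subtree moves under x
    y.left = x
    return y          # y is the new root of this subtree


# A chain 1 -> 2 -> 3, as produced by inserting ascending keys into a BST.
chain = Node(1, right=Node(2, right=Node(3)))
height_before = height(chain)

balanced = rotate_left(chain)
height_after = height(balanced)
```

After the rotation, node 2 is the root with 1 and 3 as its children. Every such pointer rewrite is extra work on the insert path, which is the cost the text above refers to.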

8. Why does MySQL not use a red-black tree to store indexes?

Red-black tree: a self-balancing binary search tree with the following properties:


  1. Each node is either red or black.
  2. The root node is black.
  3. Every leaf node (here, a leaf is the NIL pointer or NULL node at the end of the tree) is black.
  4. If a node is red, both of its children are black.
  5. For any node, every path from it down to a NIL leaf contains the same number of black nodes.

These five properties ensure that the longest root-to-leaf path in a red-black tree is at most twice as long as the shortest one. This reduces the number of rotations while still keeping the tree reasonably balanced; it is essentially a compromise between query performance and insertion performance.

However, a problem remains: as the data grows, the depth of the tree grows too, which means more I/O operations and lower performance.
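A quick calculation shows how deep any binary tree gets. The sketch below computes the fewest levels a binary tree can have for a given number of keys (the perfectly balanced case); a red-black tree can only be deeper than this. It assumes, as a simplification, that each level visited costs one disk read. The function name is illustrative.

```python
def min_binary_tree_levels(n_keys: int) -> int:
    """Fewest levels a binary tree needs to hold n_keys keys.

    A tree with h levels holds at most 2**h - 1 keys, so the answer is the
    smallest h with 2**h - 1 >= n_keys, which is n_keys.bit_length().
    """
    return n_keys.bit_length()


levels_for_one_million = min_binary_tree_levels(1_000_000)
levels_for_ten_million = min_binary_tree_levels(10_000_000)
```

Under that assumption, one million keys need at least 20 levels and ten million need at least 24, so a single lookup could cost twenty or more reads.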

9. Why doesn't MySQL use a B-tree to store indexes?

B-tree: It is a balanced multi-way search tree


Properties of a B-tree of order m:
1. The root node has at least two children (unless it is itself a leaf).
2. Each intermediate node contains k-1 elements and k children, where ceil(m/2) ≤ k ≤ m.
3. Each leaf node contains k-1 elements, where ceil(m/2) ≤ k ≤ m.
4. All leaf nodes are on the same level.
5. The elements in each node are sorted in ascending order, and the k-1 elements of a node partition the value ranges of the elements held by its k children.
6. The structure of each node is (n, A0, K1, A1, K2, A2, …, Kn, An),
where Ki (1 ≤ i ≤ n) is a key, and Ki < K(i+1) for 1 ≤ i ≤ n-1.
Ai (0 ≤ i ≤ n) is a pointer to the root of a subtree. Every key in the subtree pointed to by Ai is greater than Ki (for i ≥ 1) and less than K(i+1) (for i ≤ n-1).
n is the number of keys in the node, with ceil(m/2)-1 ≤ n ≤ m-1.

However, every node of a B-tree stores data as well as keys. Suppose, ignoring the space taken by pointers, that each key+data entry is 1 KB. InnoDB reads 16 KB from disk at a time, so each read fetches only 16 key+data entries. As the data grows, the depth of the B-tree grows, and so does the number of I/O operations.
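The estimate above can be turned into a small calculation. This sketch uses the same simplifying assumptions as the text (16 KB pages, 1 KB per key+data entry, pointer space ignored) and assumes every node is completely full, which gives the smallest possible depth. The function names are illustrative.

```python
def entries_per_page(page_bytes: int = 16 * 1024, entry_bytes: int = 1024) -> int:
    """How many key+data entries fit in one page."""
    return page_bytes // entry_bytes


def btree_min_levels(n_rows: int, keys_per_node: int) -> int:
    """Smallest number of levels of a completely full B-tree holding n_rows keys.

    With keys_per_node keys and keys_per_node + 1 children per node,
    a tree of h levels holds (keys_per_node + 1)**h - 1 keys.
    """
    levels, capacity = 0, 0
    while capacity < n_rows:
        levels += 1
        capacity = (keys_per_node + 1) ** levels - 1
    return levels


keys = entries_per_page()                              # 16 entries per page
levels_for_one_million = btree_min_levels(1_000_000, keys)
```

With 16 entries per node, four levels hold at most 17**4 - 1 = 83,520 rows, so one million rows need at least five levels even in this best case.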

10. MySQL uses a B+ tree to store indexes

Characteristics of the B+ tree:

All leaf nodes of a B+ tree are on the same level and are connected by pointers into an ordered linked list, which makes range queries easy.
The non-leaf nodes of a B+ tree store only index information, not the row data itself. This makes non-leaf nodes smaller and improves memory utilization.

This is where the B+ tree has the advantage over the B-tree for storing indexes. Because non-leaf nodes hold no data, each page can hold more keys and pointers, so a B+ tree is shorter and wider than a B-tree holding the same data, which effectively reduces the number of I/O operations. In addition, the leaf nodes form a doubly linked list, so a range of data can be scanned in either direction.

Note: in general, a B+ tree of 3 or 4 levels covers most needs; beyond that, consider splitting the data across multiple tables.
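A rough capacity estimate shows why 3 or 4 levels go a long way. The numbers below are assumptions chosen for illustration, not values from the article: an 8-byte key (for example a BIGINT primary key), a 6-byte page pointer, 1 KB rows, and 16 KB pages, with page headers and fill factor ignored.

```python
PAGE_BYTES = 16 * 1024


def bplus_capacity(levels: int, key_bytes: int = 8, pointer_bytes: int = 6,
                   row_bytes: int = 1024, page_bytes: int = PAGE_BYTES) -> int:
    """Approximate number of rows a B+ tree with the given number of levels can hold."""
    if levels < 1:
        raise ValueError("a tree has at least one level")
    fanout = page_bytes // (key_bytes + pointer_bytes)  # children per non-leaf page
    rows_per_leaf = page_bytes // row_bytes             # rows per leaf page
    return fanout ** (levels - 1) * rows_per_leaf


three_levels = bplus_capacity(3)  # 1170 * 1170 * 16
four_levels = bplus_capacity(4)   # 1170 * 1170 * 1170 * 16
```

Under these assumptions each non-leaf page fans out to 1170 children, so three levels hold about 21.9 million rows and four levels about 25.6 billion, compared with 16 entries per node in the B-tree estimate earlier.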

4. What is a storage engine?

The storage engine is the component of the database responsible for data storage and retrieval. It defines how data is stored, organized, and accessed.

1. Which storage engines does MySQL support?

  • InnoDB: MySQL's default storage engine. Supports transactions, foreign keys, full-text indexes, clustered indexes (index and data stored in the same file), and both row-level and table-level locks.
  • MyISAM: does not support transactions or foreign keys; supports full-text indexes; does not support clustered indexes, only non-clustered indexes (index and data stored in separate files); does not support row-level locks, only table locks; has relatively low memory requirements, and queries can be faster than with InnoDB.
  • Memory: does not support persistence; all data is stored in memory. Not commonly used.
  • Others: CSV, Archive, Blackhole, NDB (MySQL Cluster), Merge, Federated, Example

2. Let's talk about InnoDB, MySQL's default engine

  • First, a note on MySQL's basic architecture: since MySQL 8.0, the Server layer no longer has a query cache.

  • Every InnoDB table has exactly one clustered index. Why only one? Because the clustered index stores the row data itself, so multiple clustered indexes would mean redundant copies of the data. The key of the clustered index is the primary key. If there is no primary key, InnoDB uses the first unique index whose columns are all NOT NULL. If there is no such index either, InnoDB automatically generates a hidden 6-byte row ID as the clustered index key; this row ID is not visible to the user. This behavior is described in the official MySQL documentation.
  • What is a table lookup (back-to-table query)?

A table lookup is the process of first finding a record's primary key through a secondary index and then using that primary key to fetch the complete record. When a query uses an index but asks for a column that is not in that index, the index alone yields only the indexed columns and the primary key, so the complete record has to be fetched with a second lookup by primary key. That second lookup is the table lookup.

A table lookup adds extra I/O, because the clustered index has to be accessed again by primary key to get the data, and this degrades performance.

To reduce table lookups, use a covering index. A covering index is one that contains every column the query needs, so the data can be returned from the index alone without a table lookup, which improves query performance.

Here are two examples to illustrate:

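select name,age from User where name=?;
# Suppose the User table has an ordinary index on the name column. The index entry points to the primary key id, and the complete record is then looked up in the clustered index by that id and returned. This is a table lookup.
select name,age from User where name = ? and age = ?;
# Suppose the User table has a composite index on name and age. The queried columns name and age are then fully covered by the index, so the data can be returned straight from the composite index without looking up the complete record in the clustered index by primary key. This is a covering index.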
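The rule behind these two examples can be written as a small check. This is a simplified model, not MySQL's actual planner logic: it relies on the fact that an InnoDB secondary index entry holds the indexed columns plus the primary key, and it assumes the primary key column is named `id`. The function name is hypothetical.

```python
def needs_table_lookup(selected_columns, index_columns, primary_key=("id",)):
    """Return True if the query must go back to the clustered index.

    The query is covered when every selected column is available in the
    secondary index entry (indexed columns plus primary key columns).
    """
    available = set(index_columns) | set(primary_key)
    return not set(selected_columns) <= available


# First query above: index on (name) only, but age is also selected.
lookup_with_name_index = needs_table_lookup(["name", "age"], ["name"])

# Second query above: composite index on (name, age) covers both columns.
lookup_with_composite = needs_table_lookup(["name", "age"], ["name", "age"])
```

In this model the first call returns True (a table lookup is needed) and the second returns False (the index covers the query). Selecting only `id` and `name` through the index on `name` would also be covered, because the primary key is stored in the index entry.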

  • Should the primary key be auto-increment?

Prefer an auto-increment primary key where possible: it effectively reduces page splits when inserting data. However, sequential IDs make it easy for others to enumerate and scrape the data, so the final choice depends on business needs.

  • What is index pushdown?

Index pushdown (index condition pushdown, ICP) is a query optimization. It pushes filter conditions that can be evaluated with the index columns down to the storage engine layer, so the engine discards non-matching index entries before reading the full rows. This reduces disk I/O and the amount of data passed from the storage engine to the Server layer.
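The saving can be illustrated with a toy simulation. Everything here is hypothetical: a composite index on (name, age) over a handful of made-up rows, and a query of the form `where name like 'A%' and age = 20`. The code counts how many full rows have to be fetched with and without pushdown; it does not reflect MySQL internals.

```python
# Hypothetical index entries: (name, age, primary key), sorted by (name, age).
index_entries = [
    ("Alice", 20, 1),
    ("Amy", 35, 2),
    ("Andy", 20, 3),
    ("Anna", 41, 4),
    ("Bob", 20, 5),
]


def run_query(entries, name_prefix, age, pushdown):
    """Return (matching primary keys, number of full-row fetches)."""
    fetches = 0
    matches = []
    # A real engine would seek to the name range instead of scanning all entries.
    for name, entry_age, pk in entries:
        if not name.startswith(name_prefix):
            continue                      # outside the range on the leading column
        if pushdown and entry_age != age:
            continue                      # filtered in the engine using the index entry
        fetches += 1                      # table lookup for the full row
        if entry_age == age:              # filter applied after the row is fetched
            matches.append(pk)
    return matches, fetches


without_icp = run_query(index_entries, "A", 20, pushdown=False)
with_icp = run_query(index_entries, "A", 20, pushdown=True)
```

Both runs return the same rows (primary keys 1 and 3), but without pushdown four rows are fetched and two are then thrown away, while with pushdown only the two matching rows are fetched.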

  • What is leftmost match?

When a query uses a multi-column index, matching starts from the leftmost column of the index; a later column can be used only if all the columns to its left are matched by equality conditions.

For example, suppose the User table has a composite index on (age, name).

  1. select * from User where age = ? and name = ?; -- index used
  2. select * from User where name = ? and age = ?; -- index used: the optimizer reorders the conditions
  3. select * from User where age = ?; -- index used
  4. select * from User where name = ?; -- index not used

tips: each entry in this composite index is a two-element tuple (age, name).
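The four cases can be reproduced with a simplified model of leftmost matching, assuming a composite index on (age, name) and equality conditions only. The function name is hypothetical, and real optimizers consider more than this (range conditions, cost estimates).

```python
def usable_index_prefix(index_columns, equality_columns):
    """Leading index columns that the equality conditions can use.

    Matching stops at the first index column with no condition; the order in
    which the conditions are written in the query does not matter.
    """
    wanted = set(equality_columns)
    prefix = []
    for column in index_columns:
        if column not in wanted:
            break
        prefix.append(column)
    return prefix


index = ("age", "name")
case_1 = usable_index_prefix(index, ["age", "name"])  # both columns used
case_2 = usable_index_prefix(index, ["name", "age"])  # same: order is irrelevant
case_3 = usable_index_prefix(index, ["age"])          # leading column used
case_4 = usable_index_prefix(index, ["name"])         # nothing usable

# Why case 4 fails: entries are sorted as (age, name) tuples, so names are
# ordered only within one age, not across the whole index.
entries = sorted([(30, "Bob"), (25, "Zoe"), (25, "Amy"), (30, "Al")])
names_in_index_order = [name for _, name in entries]
```

Here `names_in_index_order` is `["Amy", "Zoe", "Al", "Bob"]`, which is not sorted, so a condition on name alone cannot use binary search over the index.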

Origin blog.csdn.net/qq_56919740/article/details/132569154