Crusade against MySQL-Index

This article organizes the key points I noted while learning about MySQL indexes.

Table of Contents

1. Data structure of MySQL index

1.1 Linked List

1.2 Binary search tree

1.3 Red-black tree

1.4 B tree

1.5 B+ tree

2. MySQL's implementation of indexes

2.1 MyISAM storage engine

2.2 InnoDB storage engine

3 The relationship between indexes and keys

3.1 Rowid

3.2 Hash index

3.3 B+ tree index range search

3.4 Joint Index

4 Alibaba index specification


When a MySQL table holds a large number of records, there are two ways to query it. The first is a full table scan: read every record in the table, compare each one against the query conditions, and return the records that match. The second is to create an index on the table, use the index to find the index values that satisfy the query conditions, and then quickly locate the corresponding records in the table.

An index is a sorted data structure that helps the database retrieve data efficiently.

1. Data structure of MySQL index

The data structure most commonly used for MySQL indexes is the B+ tree.

The evolution toward the B+ tree: linked list -> binary search tree -> red-black tree -> B tree -> B+ tree

1.1 Linked List

If we store all the rows of a table in a linked list, every query on the table, whatever its conditions, has to perform a full table scan.

To improve query efficiency, we can turn to a binary search tree.

1.2 Binary search tree

A binary search tree is also called a binary sort tree. In general, its query efficiency is higher than that of a linked list.

For each node of the binary search tree:

  • If the left subtree is not empty, the values of all nodes in the left subtree are less than the value of the node
  • If the right subtree is not empty, the values of all nodes in the right subtree are greater than the value of the node
  • No two nodes have equal keys

The problem with a plain binary search tree is that it is not kept balanced: if keys arrive in sorted order, for example, every node gets only a right child and the tree degenerates into a linked list. The taller the tree, the more nodes a lookup has to visit and the worse the query performance.
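To make the height problem concrete, here is a minimal Python sketch of an unbalanced binary search tree. The names `Node`, `insert`, `height` and `build` are made up for this illustration; it only shows how insertion order changes the height and says nothing about MySQL internals.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into the tree rooted at root; duplicate keys are ignored."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                break
            cur = cur.left
        elif key > cur.key:
            if cur.right is None:
                cur.right = Node(key)
                break
            cur = cur.right
        else:
            break  # equal key: nothing to do
    return root

def height(root):
    """Number of levels in the tree, counted level by level."""
    level = [root] if root else []
    h = 0
    while level:
        h += 1
        level = [c for n in level for c in (n.left, n.right) if c]
    return h

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

balanced = build([4, 2, 6, 1, 3, 5, 7])   # a "lucky" insertion order
skewed = build([1, 2, 3, 4, 5, 6, 7])     # sorted insertion order
print(height(balanced), height(skewed))   # 3 7
```

The same seven keys give a height of 3 or 7 depending only on the order in which they were inserted.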

1.3 Red-black tree

A red-black tree is a self-balancing binary search tree: it keeps the tree balanced through specific operations (recoloring and rotation) during insertion and deletion, so that lookups stay fast.

The properties of a red-black tree are:

  • Every node is either red or black
  • The root node is black
  • All leaf nodes (the NIL nodes) are black
  • Both children of every red node are black
  • Every path from a node to any of its descendant leaves contains the same number of black nodes

More about red-black trees: https://www.jianshu.com/p/e136ec79235c

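Even in the worst case, a lookup in a red-black tree takes O(log n) comparisons, and it performs well in practice.

But each node still holds only one key, so with a large amount of data the tree is still tall: a million rows already need at least 20 levels, and for an index stored on disk each level can cost one disk read.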
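A rough calculation shows how tall a binary tree gets. The sketch below uses two standard facts: a binary tree with n nodes has at least ceil(log2(n+1)) levels, and a red-black tree with n nodes has at most 2*log2(n+1) levels. The function names and the row count of 10 million are chosen only for this example.

```python
import math

def min_binary_height(n):
    """Fewest levels any binary tree with n nodes can have."""
    return math.ceil(math.log2(n + 1))

def red_black_height_bound(n):
    """Upper bound on the height of a red-black tree with n nodes."""
    return math.floor(2 * math.log2(n + 1))

n = 10_000_000  # assumed number of rows, for illustration only
print(min_binary_height(n), red_black_height_bound(n))  # 24 46
```

So even a perfectly balanced binary tree over 10 million keys needs 24 levels, which is far too many if every level may require a disk read.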

1.4 B tree

If each node can store multiple keys instead of just one, the height of the tree can be kept very low. This is the idea behind the B tree.

For a B tree:

  • All leaf nodes are at the same depth, and the child pointers of leaf nodes are null
  • No index element is repeated
  • The index elements within a node are arranged in increasing order from left to right

(Figure: a B tree. The blank slots between the keys of a node store pointers to the child nodes.)
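A lookup in a B tree can be sketched in Python as follows. This is a simplified, read-only model: `BTreeNode` and `search` are hypothetical names, the tree is built by hand, and insertion and node splitting are left out.

```python
from bisect import bisect_left

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted, no duplicates
        self.children = children or []  # empty for a leaf, else len(keys) + 1 subtrees

def search(node, key):
    """Return how many nodes were visited to find key, or None if it is absent."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)        # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return visited
        if not node.children:                  # reached a leaf without a match
            return None
        node = node.children[i]                # follow the pointer between keys
    return None

root = BTreeNode(
    [20, 40],
    [BTreeNode([5, 10]), BTreeNode([25, 30]), BTreeNode([45, 50])],
)
print(search(root, 40))  # 1
print(search(root, 30))  # 2
print(search(root, 33))  # None
```

Because every node holds several keys, eight keys fit in two levels here, where a binary tree would need four.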

1.5 B+ tree

The B+ tree is a variant of the B tree, and is a commonly used data structure for MySQL indexes.

For a B+ tree:

  • Non-leaf nodes store no row data, only (redundant copies of) index keys, so each node can hold more index elements
  • Leaf nodes contain all index fields
  • Leaf nodes are connected by pointers to improve the performance of range access

MySQL reads data from disk one page at a time (following the principle of locality).

In InnoDB each B+ tree node occupies one page, whose default size is 16KB (16384 bytes).

If the primary key of a table is of type bigint (8 bytes) and the pointer to a child node takes 6 bytes, one 16KB page can hold about 16384/(8+6) ≈ 1170 index elements.
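The arithmetic above can be extended to estimate how many rows a B+ tree of a given height can index. The sketch below assumes that one row takes about 1KB (an assumption made only for this example) and ignores page headers and other per-page overhead, so the numbers are rough upper bounds.

```python
PAGE_SIZE = 16 * 1024     # 16KB page
KEY_BYTES = 8             # bigint primary key
POINTER_BYTES = 6         # pointer to a child page
ASSUMED_ROW_BYTES = 1024  # assumption: about 1KB per row

fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)  # index elements per non-leaf page
rows_per_leaf = PAGE_SIZE // ASSUMED_ROW_BYTES     # rows per leaf page

def max_rows(height):
    """Rough row capacity of a B+ tree with the given number of levels."""
    return fanout ** (height - 1) * rows_per_leaf

print(fanout)       # 1170
print(max_rows(2))  # 18720
print(max_rows(3))  # 21902400
```

Under these assumptions a tree only three levels high can already address about 21.9 million rows.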

 

2. MySQL's implementation of indexes

Data in MySQL is stored on disk (or in memory) using a variety of different technologies, and each technology is called a storage engine.

A storage engine is chosen per table, not per database.

There are two common storage engines in MySQL:

  • MyISAM storage engine
  • InnoDB storage engine

2.1 MyISAM storage engine

How the MyISAM storage engine implements indexes:

The MyISAM storage engine keeps the table data and the indexes of a table apart: the table data is stored in a .MYD file and the indexes in a .MYI file.

To query a record from the table, we first search the index in the .MYI file. Once the index entry is found, it gives us a physical address, and we use that address to read the target record from the .MYD file.

Because the MyISAM storage engine separates index files from data files, its index is called a non-clustered index.
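The two-step lookup described above can be sketched in Python. Here `data_file` and `index_file` are in-memory stand-ins for the data file and the index file, a list position plays the role of the physical address, and a sorted list replaces the real B+ tree; all of these are simplifications for illustration.

```python
from bisect import bisect_left

# Stand-in for the data file: rows stored in insertion order.
data_file = [
    {"id": 30, "name": "carol"},
    {"id": 10, "name": "alice"},
    {"id": 20, "name": "bob"},
]

# Stand-in for the index file: (key, position in data_file), sorted by key.
index_file = sorted((row["id"], pos) for pos, row in enumerate(data_file))

def lookup(key):
    """Step 1: find the address in the index. Step 2: read the row at that address."""
    keys = [k for k, _ in index_file]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None
    _, pos = index_file[i]
    return data_file[pos]

print(lookup(20))  # {'id': 20, 'name': 'bob'}
print(lookup(99))  # None
```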

2.2 InnoDB storage engine

The InnoDB storage engine is now the most commonly used storage engine for MySQL.

The InnoDB storage engine keeps the table data and indexes of a table in the same .ibd file, which is itself an index structure organized as a B+ tree.

In the primary key index, each leaf node stores the primary key together with all the other columns of the row.

Because the row data lives inside the index itself, this index is called a clustered index.
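As a rough contrast with a non-clustered index, the sketch below models a clustered index as a mapping from primary key directly to the full row, and an index on another column as a mapping from that column to the primary key. Plain dictionaries stand in for the B+ trees, and the table contents and names are invented for the example.

```python
# Clustered (primary key) index: the "leaf" holds the whole row.
clustered = {
    10: {"id": 10, "name": "alice", "age": 30},
    20: {"id": 20, "name": "bob", "age": 41},
    30: {"id": 30, "name": "carol", "age": 25},
}

# Index on name: the "leaf" holds only the primary key of the row.
name_index = {"alice": 10, "bob": 20, "carol": 30}

def find_by_id(pk):
    """One lookup: the row is stored in the clustered index itself."""
    return clustered.get(pk)

def find_by_name(name):
    """Two lookups: name -> primary key, then primary key -> row."""
    pk = name_index.get(name)
    return None if pk is None else clustered[pk]

print(find_by_id(20))
print(find_by_name("bob"))
```

The second lookup in `find_by_name` is the step usually described as going back to the table.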

 

3 The relationship between indexes and keys

3.1 Rowid

If a table has no primary key, InnoDB uses a unique index to organize the data; if there is neither a primary key nor a suitable unique index, it uses a hidden rowid.

Why is it recommended that every InnoDB table have a primary key, preferably an auto-increment integer?

InnoDB is designed to organize table data in a B+ tree. If a primary key is defined, the primary key index is used to organize the data of the entire table.

If there is no primary key, InnoDB looks for a unique index whose columns are all NOT NULL and uses that index to organize the data of the entire table.

If no such unique index exists, InnoDB maintains a hidden column for you: rowid. Defining the primary key explicitly avoids relying on this hidden column, and an auto-increment integer key is cheap to compare and always appends new rows at the end of the B+ tree instead of forcing inserts into the middle of existing pages.
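The selection order described above can be written as a small decision function. `choose_clustered_key` and its arguments are hypothetical; the function only mirrors the three rules in the text and is not how MySQL exposes this choice.

```python
def choose_clustered_key(primary_key, unique_indexes):
    """
    primary_key: name of the primary key, or None if the table has none.
    unique_indexes: list of (index_name, all_columns_not_null) in definition order.
    """
    if primary_key is not None:
        return primary_key                 # rule 1: use the primary key
    for name, all_not_null in unique_indexes:
        if all_not_null:
            return name                    # rule 2: first unique index without NULLs
    return "hidden rowid"                  # rule 3: fall back to the hidden column

print(choose_clustered_key("id", [("uk_email", True)]))
print(choose_clustered_key(None, [("uk_nick", False), ("uk_email", True)]))
print(choose_clustered_key(None, []))
```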

3.2 Hash index

A hash table is an array plus linked lists.

  • A hash of the index key is computed to locate where the data is stored.
  • For equality lookups, a hash index is often more efficient than a B+ tree index.

But hash indexes are rarely used in practice, because:

  • They only support "=" and "IN" lookups; range queries are not supported.
  • Hash collisions have to be handled.
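Both limitations show up in a toy hash index. The class `HashIndex` below is a minimal sketch for illustration (an array of buckets, each bucket a list), not MySQL's implementation; it relies on the fact that in Python a small non-negative integer hashes to itself.

```python
class HashIndex:
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]  # array of chains

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, row_ptr):
        self._bucket(key).append((key, row_ptr))

    def get(self, key):
        """Equality lookup: only one bucket has to be inspected."""
        return [ptr for k, ptr in self._bucket(key) if k == key]

    def range(self, lo, hi):
        """Range lookup: hashing destroys order, so every bucket is scanned."""
        return sorted((k, ptr) for b in self.buckets for k, ptr in b if lo < k < hi)

idx = HashIndex()
for key in (3, 11, 20):
    idx.put(key, f"row-{key}")

print(idx.get(11))            # ['row-11']
print(len(idx.buckets[3]))    # 2, because keys 3 and 11 collide in bucket 3
print(idx.range(2, 12))       # [(3, 'row-3'), (11, 'row-11')]
```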

3.3 B+ tree index range search

The leaf nodes of a B+ tree are linked in both directions: each leaf node stores pointers to its adjacent leaf nodes.

To find the data greater than 20 and less than 50, first locate 20 in the leaf level, then follow the leaf pointers and read the entries in order until 50 is reached.
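The leaf-level walk can be sketched in Python. For brevity the sketch models only the forward pointer and assumes the starting leaf has already been found by a normal root-to-leaf search; `Leaf`, `link` and `range_scan` are names invented for this example.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this leaf
        self.next = None   # pointer to the next leaf on the right

def link(leaves):
    """Chain the leaves from left to right."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves

def range_scan(start_leaf, lo, hi):
    """Collect keys with lo < key < hi by walking the leaf chain."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key >= hi:
                return result      # passed the upper bound: stop early
            if key > lo:
                result.append(key)
        leaf = leaf.next
    return result

leaves = link([Leaf([10, 15]), Leaf([20, 30]), Leaf([40, 50]), Leaf([60, 70])])
print(range_scan(leaves[1], 20, 50))  # [30, 40]
```

The scan never goes back up to the inner nodes, and it stops as soon as it meets a key that is not below the upper bound.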

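What are the differences between a B tree and a B+ tree?

The B tree does not maintain pointers between leaf nodes, so a range search cannot simply walk along the leaves.

The B tree has no redundant index keys: each key appears only once, and its data is stored in whichever node holds the key.

Because its nodes also store data, each B tree node holds fewer keys, so the tree may be much taller.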

3.4 Joint Index

A joint (composite) index is also a B+ tree.

Its entries are sorted by comparing the indexed fields one at a time, from left to right.

If all indexed fields are equal, entries are ordered by the primary key (or rowid).

Whether a query can use the index is determined by the leftmost prefix rule: the query conditions must cover the leading column(s) of the index.
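The leftmost prefix rule can be expressed as a small helper. The function `usable_prefix` is hypothetical and deliberately simplified: it considers only equality conditions and ignores everything else a real optimizer takes into account.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that the equality conditions can use."""
    used = []
    for col in index_columns:
        if col not in equality_columns:
            break              # a gap ends the usable prefix
        used.append(col)
    return used

index_abc = ["a", "b", "c"]
print(usable_prefix(index_abc, {"a", "b"}))  # ['a', 'b']
print(usable_prefix(index_abc, {"a", "c"}))  # ['a']
print(usable_prefix(index_abc, {"b", "c"}))  # []
```

An empty result means the conditions skip the first column, so under this simplified rule the index cannot be used to locate rows.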

 

4 Alibaba index specification

1. [Mandatory] A field that is unique from the business point of view, even if it is a combination of multiple fields, must have a unique index.

Note: Do not assume that a unique index hurts insert speed. The loss is negligible, while the gain in lookup speed is significant. Moreover, even with very thorough validation at the application layer, as long as there is no unique index, Murphy's law says dirty data will eventually be produced.

2. [Mandatory] Joins across more than three tables are prohibited. The data types of the joined fields must be exactly the same, and when several tables are queried together, the joined fields must be indexed.

Note: Even for a two-table join, pay attention to table indexes and SQL performance.

3. [Mandatory] When creating an index on a varchar field, an index length must be specified. There is no need to index the entire field; choose the index length according to how well the text prefix distinguishes rows.

Note: Index length and selectivity pull in opposite directions. For string data, an index of length 20 generally reaches a selectivity of 90% or more. The selectivity can be measured with count(distinct left(column_name, index_length))/count(*).

4. [Mandatory] Left-fuzzy or full-fuzzy matching is strictly prohibited in page searches; if it is really needed, use a search engine instead.

Note: Index files follow the leftmost-prefix matching rule of the B-Tree. If the leftmost part of the value is not fixed, the index cannot be used.

5. [Recommended] When a query uses order by, take advantage of the ordering of the index. Make the last order by field part of the composite index and place it at the end of the index column order, to avoid a filesort that would hurt query performance.
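The selectivity formula from the note can be mirrored in Python to see how the prefix length changes the result. The function name and the sample values are made up for this example; on a real table the calculation would be done with the SQL expression from the note.

```python
def prefix_selectivity(values, length):
    """Python equivalent of count(distinct left(col, length)) / count(*)."""
    return len({v[:length] for v in values}) / len(values)

emails = [
    "alice@example.com",
    "alina@example.com",
    "bob@example.com",
    "bobby@example.com",
]

print(prefix_selectivity(emails, 2))  # 0.5
print(prefix_selectivity(emails, 4))  # 1.0
```

A prefix of 2 characters distinguishes only half of these rows, while 4 characters already distinguish all of them, so a longer prefix would add index size without adding selectivity.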

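 Positive example: where a=? and b=? order by c; index: a_b_c
 Counter-example: if the index is used for a range lookup, its ordering cannot be exploited, e.g. WHERE a>10 ORDER BY b; the index a_b cannot be used for sorting.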
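Both cases can be reproduced with sorted tuples, which order their elements the same way a composite index orders its columns (first by the first column, then by the second, and so on). The data below is invented for the illustration.

```python
# Entries of a composite index on (a, b, c), kept in index order.
index_abc = sorted([(1, 2, 9), (1, 2, 3), (1, 1, 5), (2, 2, 1), (1, 2, 7)])

# where a=1 and b=2 order by c: the matching entries are already sorted by c.
c_values = [c for a, b, c in index_abc if a == 1 and b == 2]
print(c_values)                      # [3, 7, 9]
print(c_values == sorted(c_values))  # True

# Entries of a composite index on (a, b), kept in index order.
index_ab = sorted([(11, 5), (12, 1), (10, 9), (11, 2), (13, 3)])

# where a>10 order by b: b is only sorted within each single value of a.
b_values = [b for a, b in index_ab if a > 10]
print(b_values)                      # [2, 5, 1, 3]
print(b_values == sorted(b_values))  # False
```

In the second case the rows come out of the index in the wrong order for order by b, so an extra sort step is needed.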

6. [Recommended] Use a covering index for query operations to avoid going back to the table.

Explanation: If you need to know the title of Chapter 11 of a book, do you have to turn to the page where Chapter 11 starts? Browsing the table of contents is enough; the table of contents acts as a covering index.

Positive example: The kinds of index that can be created are primary key index, unique index, and ordinary index. A covering index is not another kind of index but an effect of a query: in the output of explain, the Extra column shows "Using index".

7. [Recommended] Use a deferred join or a subquery to optimize paging with very large page numbers.

Note: MySQL does not skip the offset rows; it fetches offset+N rows, discards the first offset rows, and returns N rows. When the offset is very large this is very inefficient, so either cap the total number of pages returned, or rewrite the SQL for page numbers beyond a certain threshold.

Positive example: first quickly locate the range of ids that is needed, then join: SELECT a.* FROM table1 a, (SELECT id FROM table1 WHERE condition LIMIT 100000,20) b WHERE a.id=b.id

8. [Recommended] Goal of SQL performance optimization: reach at least the range level, ref is the requirement, and const is best. Description:

1) const: at most one row matches in a single table (primary key or unique index), and the data can be read during the optimization phase.

2) ref: a normal (non-unique) index is used.

3) range: a range search is performed on the index.

Counter-example: explain shows type=index, which means a full scan of the index's physical file. This is very slow; the index level is below range and is barely better than a full table scan.

9. [Recommended] When building a composite index, put the column with the highest selectivity on the far left.

Note: When equality and non-equality conditions are mixed, put the columns with equality conditions first when building the index. For example, with where c>? and d=?, even if c has higher selectivity, d must come first in the index, that is, index idx_d_c.

Positive example: for where a=? and b=?, if column a is almost unique, building only the idx_a index is enough.

10. [Recommended] Prevent implicit type conversion caused by mismatched field types, which can make an index unusable.

11. [Reference] Avoid the following extreme misconceptions when creating an index:

1) Better too many than too few: believing that every query needs its own index.

2) Better too few than too many: believing that indexes consume space and seriously slow down updates and inserts.

3) Rejecting unique indexes: believing that business uniqueness must always be handled at the application layer with a "check, then insert" approach.

 

Learning video link:

https://www.bilibili.com/video/BV1wv4y1f7xs

Keep it up! (d • _ •) d


Origin blog.csdn.net/qq_42082161/article/details/113783645