MySQL Performance Optimization: Basic Principles of Indexing

This tutorial is based on my study notes. If you find any errors, please let me know. Thank you for reading.

1. Index Introduction

1. The role of the index

Indexes are essential for running MySQL efficiently. Once an index is created, the database no longer has to scan the whole table; instead, it uses the index, much like the table of contents of a book, to locate the relevant rows directly. Indexes can greatly improve MySQL's retrieval speed. Commonly used types include the primary key index, the unique index, and the ordinary (non-unique) index.

Essence: an index is a sorted data structure that helps MySQL retrieve data efficiently.
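To make the "sorted data structure" idea concrete, here is a minimal Python sketch (an illustration, not how MySQL is implemented). A hypothetical table of 10,000 rows is searched once by checking every row and once through a toy "index": a sorted list of (username, row position) pairs searched with `bisect`. The names `full_table_scan`, `build_index`, and `index_lookup` are hypothetical.

```python
import bisect
import random


def full_table_scan(rows, name):
    """Check every row in turn; return (row or None, comparisons made)."""
    comparisons = 0
    for row in rows:
        comparisons += 1
        if row[1] == name:
            return row, comparisons
    return None, comparisons


def build_index(rows):
    """Toy index: usernames kept sorted, each paired with its row position."""
    return sorted((name, pos) for pos, (_, name) in enumerate(rows))


def index_lookup(index, rows, name):
    """Binary-search the sorted index, then jump straight to the row."""
    i = bisect.bisect_left(index, (name,))
    if i < len(index) and index[i][0] == name:
        return rows[index[i][1]]
    return None


rng = random.Random(0)
names = [f"user{n:05d}" for n in range(10_000)]
rng.shuffle(names)                      # the table itself is not sorted by name
rows = [(row_id, name) for row_id, name in enumerate(names)]
index = build_index(rows)

target = rows[-1][1]                    # worst case for the scan: the last row
print(full_table_scan(rows, target)[1])             # 10000
print(index_lookup(index, rows, target) == rows[-1])  # True
```

The scan needs 10,000 comparisons for the last row, while the binary search over the sorted index needs only about log2(10,000), roughly 14 steps.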

2. How to create an index

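# Create an index on an existing table
# (username(10) is a prefix index on the first 10 characters; omit the length to index the whole column)
CREATE INDEX idx_username ON mytable(username(10));
# or, equivalently
ALTER TABLE mytable ADD INDEX idx_username (username(10));
# Create an index when creating the table
CREATE TABLE mytable(
    ID INT NOT NULL,
    username VARCHAR(16) NOT NULL,
    INDEX idx_username (username(10))
);
# Drop an index
DROP INDEX idx_username ON mytable;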

2. Introduction to commonly used tree structures

MySQL indexes use B+ trees. Let us analyze the pros and cons of each tree structure to see why.

1. Binary Search Trees

In this tree structure, each parent node has at most two child nodes. When inserting data, the new value is compared with the parent node: if it is greater, it is inserted on the right; if it is smaller, it is inserted on the left:
[Figure: Binary tree insertion]
When searching, you only need to compare against each parent node to quickly locate the child node:
[Figure: Binary tree search]
As the figure shows, without an index the database performs a full table scan: finding the node 9 may take up to 7 comparisons, whereas with a binary tree as the index only three comparisons are needed to find it.
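A minimal Python sketch of the insertion and search rules above, counting how many nodes are compared. The keys are hypothetical (not the ones in the figure); they are chosen so that 9 sits three levels down. Equal keys go left here, which is an arbitrary choice for the sketch.

```python
class Node:
    """One node of an (unbalanced) binary search tree."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key: go right if it is greater than the current node, otherwise left."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key > node.key:
            if node.right is None:
                node.right = new
                return root
            node = node.right
        else:
            if node.left is None:
                node.left = new
                return root
            node = node.left


def search(root, key):
    """Return how many nodes were compared to find key, or None if it is absent."""
    comparisons = 0
    node = root
    while node is not None:
        comparisons += 1
        if key == node.key:
            return comparisons
        node = node.right if key > node.key else node.left
    return None


root = None
for k in [5, 3, 8, 2, 4, 7, 9]:   # hypothetical keys, not the ones in the figure
    root = insert(root, k)

print(search(root, 9))   # 3  (5 -> 8 -> 9)
```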

However, MySQL does not use a binary search tree for its indexes. Let us look at a special case of the binary tree structure:
[Figure: Special case of a binary tree]
If the inserted data is always increasing (or decreasing), each new node is always attached to the right (or left) of the previous one, so the tree degenerates into a linked list and a query again amounts to a full table scan!
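The degenerate case is easy to measure. This Python sketch inserts unique keys into a plain, unbalanced binary search tree (child links kept in two dicts so no deep recursion is needed) and reports the tree height, which is the worst-case number of comparisons per lookup. `bst_height` is a hypothetical helper.

```python
import random


def bst_height(keys):
    """Insert unique keys into a plain (unbalanced) BST and return its height.

    Child links are stored as parent key -> child key in two dicts."""
    left, right = {}, {}
    root, height = None, 0
    for key in keys:
        if root is None:
            root, height = key, 1
            continue
        node, depth = root, 1
        while True:
            depth += 1
            children = right if key > node else left
            if node in children:
                node = children[node]
            else:
                children[node] = key
                break
        height = max(height, depth)
    return height


n = 1000
print(bst_height(range(1, n + 1)))   # 1000: ascending keys form a right-leaning chain
print(bst_height(range(n, 0, -1)))   # 1000: descending keys form a left-leaning chain
shuffled = list(range(1, n + 1))
random.Random(0).shuffle(shuffled)
print(bst_height(shuffled))          # far smaller for a random insertion order
```

With auto-incrementing keys, every insert extends the chain, so the "tree" is as tall as the table is long.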

2. Red-Black Trees

A red-black tree works on the same basic principle as a binary search tree, but adds a self-balancing algorithm: when the two sides of the tree become unbalanced, the tree is automatically restructured (by recoloring and rotating nodes) to restore balance. A red-black tree is a kind of balanced binary tree, although, compared with a strictly balanced binary tree such as an AVL tree, it does not keep every node perfectly balanced, which avoids excessive restructuring and wasted work!

Now let's revisit the same increasing-data case:
[Figure: Red-black tree]
The self-balancing algorithm of the red-black tree fixes the defect of the plain binary tree, but MySQL still does not use red-black trees for its indexes. As the example in the figure shows, even with automatic balancing each node holds only one value, so the tree is still tall, and the database would have to read from disk very frequently to locate each child node. Direct disk access is very slow, so MySQL does not use red-black trees as its index structure!
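A rough way to see why tree height matters: assuming each level visited costs about one disk read, this Python sketch computes the fewest levels a perfectly balanced tree needs for 20 million keys. A red-black tree can be up to about twice as tall as this best case, so the binary figure is a lower bound. The 1170 keys per node is the estimate derived later in this article for a BIGINT key on a 16 KB page; `min_levels` is a hypothetical helper.

```python
def min_levels(n_keys, keys_per_node):
    """Fewest levels a perfectly balanced tree needs to hold n_keys when every
    node holds keys_per_node keys (so each node has keys_per_node + 1 children).
    Assumption: each level visited costs roughly one disk read."""
    levels, capacity, nodes_at_level = 0, 0, 1
    while capacity < n_keys:
        levels += 1
        capacity += nodes_at_level * keys_per_node
        nodes_at_level *= keys_per_node + 1
    return levels


rows = 20_000_000
print(min_levels(rows, 1))      # 25: one key per node, as in a binary/red-black tree
print(min_levels(rows, 1170))   # 3: about 1170 keys per node, as in a 16 KB index page
```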

3. B Trees


Because frequently reading from disk is very slow, the B-tree stores multiple values in a single node, and each child node likewise stores multiple values. When a node is read from disk, all of its values are read together and then compared in memory.
[Figure: B-tree structure]
The B-tree was designed mainly to reduce the number of interactions between the system and the disk: a whole node's worth of data is read into memory at once and compared there. MySQL uses a variant of the B-tree, the B+ tree, for index storage.
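A minimal Python sketch of that idea, assuming an in-memory stand-in for disk pages: each visited node counts as one "disk read", and the comparisons inside a node are done in memory with a binary search over its sorted keys. `BTreeNode` and `btree_search` are hypothetical names.

```python
import bisect


class BTreeNode:
    """Hypothetical B-tree node: sorted keys, plus len(keys) + 1 children unless it is a leaf."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def btree_search(node, key):
    """Return (found, nodes_read). Each node visited stands for one disk read;
    the comparisons inside a node are done in memory."""
    nodes_read = 0
    while True:
        nodes_read += 1
        i = bisect.bisect_left(node.keys, key)        # in-memory search within the node
        if i < len(node.keys) and node.keys[i] == key:
            return True, nodes_read
        if not node.children:                          # reached a leaf without a match
            return False, nodes_read
        node = node.children[i]                        # child between keys[i-1] and keys[i]


root = BTreeNode([10, 20], [BTreeNode([1, 5]), BTreeNode([12, 15]), BTreeNode([25, 30])])
print(btree_search(root, 15))   # (True, 2)
print(btree_search(root, 13))   # (False, 2)
```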

3. B+ Trees in Detail

[Figure: B+ tree]
The B+ tree is a variant of the B-tree. The main difference is that non-leaf (parent) nodes store only index-column values and references to their child nodes, while all data is stored only in the leaf nodes. Some index values are stored redundantly, appearing in both parent and leaf nodes, but because parent nodes carry no row data, each one can hold more entries, which increases the storage capacity of the whole tree.

[Figure: B+ tree]
The figure shows that even with a large amount of data, each node is read into memory as a unit, so a query needs at most three disk reads. Since memory access is far faster than disk access, MySQL uses the B+ tree as its index structure.

For example, suppose a long integer (BIGINT) ID is the primary key. Let us estimate how much data such a tree can hold!

First, check the size MySQL uses for each node (an InnoDB page):

SHOW GLOBAL STATUS LIKE 'Innodb_page_size';

By default, each node (page) is 16384 bytes (16 KB).

A BIGINT ID takes 8 bytes, and each entry in a non-leaf node also stores a pointer to a node on the next level (6 bytes).

So the number of index entries each non-leaf node can store is:

16384 / (8 + 6) ≈ 1170 entries.

Assuming, for now, a tree of 3 levels, in the best case the two upper levels can point to:

1170 × 1170 = 1,368,900 leaf nodes

Each leaf node is 16 KB, so the leaves can hold:

1170 × 1170 × 16 KB = 21,902,400 KB ≈ 20.9 GB of data

If each row is 1 KB, each leaf holds 16 / 1 = 16 rows, so the number of rows that can be stored is:

1170 × 1170 × (16 / 1) = 21,902,400 rows

In other words, more than 20 million rows can be located with only three node reads, which is why a B+ tree makes database index searches so fast!
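The same back-of-the-envelope arithmetic in Python. Like the text, it ignores page headers, per-record overhead, and pages that are not completely full, so real capacities are somewhat lower; the 6-byte pointer and 1 KB row size are the assumptions stated above.

```python
PAGE_SIZE = 16384      # default InnoDB page size in bytes (16 KB)
KEY_SIZE = 8           # BIGINT primary key
POINTER_SIZE = 6       # child-node pointer size assumed in the text
ROW_SIZE = 1024        # assumed size of one row: 1 KB

entries_per_node = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)   # entries per non-leaf node
leaf_nodes = entries_per_node ** 2                          # leaves reachable in a 3-level tree
rows_per_leaf = PAGE_SIZE // ROW_SIZE                       # rows that fit in one leaf
total_rows = leaf_nodes * rows_per_leaf
total_gb = leaf_nodes * PAGE_SIZE / 1024 ** 3               # leaf space in GB

print(entries_per_node, leaf_nodes, rows_per_leaf, total_rows, round(total_gb, 1))
# 1170 1368900 16 21902400 20.9
```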

B+ tree search process:

1. Read one 16 KB node (page) from disk. A non-leaf node contains index-column values and references to the nodes on the next level (with a BIGINT ID, about 1170 entries are read at once).

2. Compare the search value with the values in the node, which are kept in sorted order. When the value falls between two adjacent entries, follow the reference between them down to the child node.

3. If the child is not a leaf node, repeat steps 1 and 2 on it. If it is a leaf node, compare the entries in the leaf to find the data.

For example, in the figure, the value 11 lies between 7 and 13, so the search descends to the corresponding child node and compares its entries; following the red line (entries greater than or equal to 11), it continues down until 11 is found.
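A minimal Python sketch of the three steps, using a hypothetical two-level tree whose root separators are 7 and 13. It assumes the convention that a value equal to a separator belongs to the child on its right; each node visited counts as one page read. `Leaf`, `Internal`, and `bplus_search` are illustrative names, not MySQL internals.

```python
import bisect


class Leaf:
    """Leaf node: sorted keys and the rows stored with them."""
    def __init__(self, keys, rows):
        self.keys, self.rows = keys, rows


class Internal:
    """Non-leaf node: separator keys only. children[i] holds keys in
    [keys[i-1], keys[i]); a value equal to a separator goes to its right."""
    def __init__(self, keys, children):
        self.keys, self.children = keys, children


def bplus_search(node, key):
    """Descend from the root to a leaf; return (row or None, nodes_read)."""
    nodes_read = 1
    while isinstance(node, Internal):
        i = bisect.bisect_right(node.keys, key)   # number of separators <= key
        node = node.children[i]
        nodes_read += 1
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.rows[i], nodes_read
    return None, nodes_read


def make_leaf(keys):
    return Leaf(keys, [f"row-{k}" for k in keys])


root = Internal([7, 13], [make_leaf([1, 4]), make_leaf([7, 9, 11]), make_leaf([13, 16])])
print(bplus_search(root, 11))   # ('row-11', 2): 11 lies between 7 and 13
```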

The role of pointers between leaf nodes

We can see that every leaf node stores pointers to adjacent leaf nodes.

Suppose we run the following SQL statement, where age is an indexed column:

SELECT * FROM user WHERE age > 20;

Because the index column is stored in sorted order, once the boundary value 20 is found, all the entries after it satisfy the condition. Thanks to the pointers between leaf nodes, the scan can move straight to the data in the following nodes instead of returning to the parent node and searching again!
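A minimal Python sketch of that range scan over hypothetical age values. It assumes the tree descent has already reached the leaf containing the boundary value 20; from there, only the `next` pointers between leaves are followed and no parent node is revisited.

```python
import bisect


class Leaf:
    """Leaf node holding sorted index values and a pointer to the next leaf."""
    def __init__(self, keys):
        self.keys = keys
        self.next = None


def link(leaves):
    """Chain leaves left to right, as the leaf-level pointers do."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def scan_greater_than(start_leaf, bound):
    """Collect every value > bound, starting from the leaf the tree descent
    reached for `bound` and then following only `next` pointers."""
    i = bisect.bisect_right(start_leaf.keys, bound)   # first value > bound
    result = list(start_leaf.keys[i:])
    leaf = start_leaf.next
    while leaf is not None:          # values are sorted, so every later leaf qualifies
        result.extend(leaf.keys)
        leaf = leaf.next
    return result


leaves = link([Leaf([15, 18, 20]), Leaf([21, 25, 28]), Leaf([30, 35])])
print(scan_greater_than(leaves[0], 20))   # [21, 25, 28, 30, 35]
```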

This is also one of the reasons why MySQL does not use hash indexes by default: hashing can locate a single value very quickly through the hash mapping, but it cannot be used for range searches.
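A small Python illustration of the trade-off, modelling a hypothetical hash index on age as a dict. An equality lookup is a single probe, but because hashing does not preserve key order, a range condition such as age > 20 has to examine every entry.

```python
# Hypothetical hash index modelled with a Python dict: age -> row ids.
hash_index = {42: [4], 18: [1], 31: [3], 25: [2, 5]}

# Equality lookup: a single hash probe finds the bucket directly.
print(hash_index.get(25))            # [2, 5]


def hash_range_gt(index, bound):
    """Range query on a hash index: hashing does not preserve key order,
    so every entry has to be examined. Returns (matching row ids, entries checked)."""
    checked, matches = 0, []
    for age, row_ids in index.items():
        checked += 1
        if age > bound:
            matches.extend(row_ids)
    return sorted(matches), checked


print(hash_range_gt(hash_index, 20))  # ([2, 3, 4, 5], 4)
```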

4. How B+ Trees Differ Across Storage Engines

MySQL provides several storage engines, and the storage engine is chosen per table. InnoDB is the default; MyISAM is also commonly used. Different storage engines store their B+ trees slightly differently.

1. Clustered vs. non-clustered indexes

  • Clustered index: rows are stored in the order defined by the index, and the leaf nodes contain the complete rows.
  • Non-clustered index: the index is stored separately from the data, and the leaf nodes store only references to the rows.

2. MyISAM storage engine

In MyISAM, the leaf nodes of the B+ tree store only a reference to each row, not the row itself, and the index file (.MYI) is stored separately from the data file (.MYD).

In other words, MyISAM data is stored on its own and is not sorted in any particular order. To find a row, MySQL first searches the B+ tree in the index file down to a leaf node, reads the address of the record stored in that leaf, and then goes directly to the row at that address.

Therefore, MyISAM indexes are non-clustered: no matter how many indexes (including the primary key) a table has, they all work the same way. Also, because leaf nodes hold only references, a tree of the same height can index more data.

However, MyISAM has to follow a pointer to reach the data, which is relatively slow, and the MyISAM engine lacks many features, such as transaction support, so it is generally not used!
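A minimal Python sketch of the MyISAM-style layout described above (a simplification, not the real .MYI/.MYD formats). Rows are appended to a separate "data file" in arrival order, and the index keeps only sorted (key, offset) pairs; a lookup searches the index and then follows the offset. All names are hypothetical.

```python
import bisect

# Hypothetical MyISAM-style layout (a sketch, not the real file formats):
data_file = []       # rows appended in arrival order, never re-sorted
index_entries = []   # sorted (key, offset) pairs, standing in for the index B+ tree leaves


def insert(key, row):
    """Append the row to the data file and record only its offset in the index."""
    offset = len(data_file)
    data_file.append(row)
    bisect.insort(index_entries, (key, offset))


def lookup(key):
    """Search the index, then follow the stored offset into the data file."""
    i = bisect.bisect_left(index_entries, (key,))
    if i < len(index_entries) and index_entries[i][0] == key:
        return data_file[index_entries[i][1]]
    return None


for key, row in [(30, "carol"), (10, "alice"), (20, "bob")]:
    insert(key, row)

print(data_file)       # ['carol', 'alice', 'bob']  (insertion order)
print(index_entries)   # [(10, 1), (20, 2), (30, 0)]  (sorted by key)
print(lookup(20))      # bob
```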


3. InnoDB storage engine

InnoDB stores rows directly in the leaf nodes, in order, so finding the leaf node means finding the data. Rows are sorted by the primary key and stored together with the index.

In other words, InnoDB stores the index and the data together, so no extra address lookup is needed and searches are faster, but a tree of the same height holds fewer rows. InnoDB uses the primary key as its clustered index, so rows must be sorted in a defined order, which requires a primary key. If a table has no primary key, InnoDB uses the first UNIQUE index whose columns are all NOT NULL; if there is no such index, it generates a hidden row ID and uses it to sort and store the rows.

No matter how many indexes a MyISAM table has, all of them are non-clustered. InnoDB is different: the primary key index is clustered, but every other (secondary) index is non-clustered. Each secondary index is a separate B+ tree whose leaf nodes store the indexed columns plus the primary key of the row; the row itself is then found through the primary key index.

In other words, a lookup through an InnoDB secondary index requires two searches: the first finds the primary key of the row, and the second uses that primary key to find the row itself!

Secondary indexes record the primary key instead of a physical address because InnoDB keeps its data sorted: inserting rows can restructure the tree (for example, by splitting pages), which changes where rows are stored, whereas a row's primary key normally does not change. That is why the data has to be found by a second search on the primary key.
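A minimal Python sketch of the two-step lookup, under the simplifying assumption that sorted Python lists stand in for the clustered and secondary B+ trees. Rows physically shift position as earlier keys are inserted, but the secondary index stores the primary key, so it never goes stale. All names are hypothetical.

```python
import bisect

# Hypothetical InnoDB-style layout (a sketch, not the on-disk format).
clustered_keys = []   # primary keys, kept sorted
clustered_rows = []   # full rows, stored in the same order as clustered_keys
secondary = []        # secondary index on `name`: sorted (name, primary key) pairs


def insert(row):
    """Insert into the clustered index (rows may shift position) and the secondary index."""
    i = bisect.bisect_left(clustered_keys, row["id"])
    clustered_keys.insert(i, row["id"])
    clustered_rows.insert(i, row)
    bisect.insort(secondary, (row["name"], row["id"]))


def find_by_name(name):
    """Secondary-index lookup: returns (row or None, number of index searches)."""
    j = bisect.bisect_left(secondary, (name,))       # search 1: name -> primary key
    if j == len(secondary) or secondary[j][0] != name:
        return None, 1
    pk = secondary[j][1]
    i = bisect.bisect_left(clustered_keys, pk)       # search 2: primary key -> row
    return clustered_rows[i], 2


insert({"id": 3, "name": "carol"})
insert({"id": 1, "name": "alice"})   # carol's row moves from position 0 to 1...
insert({"id": 2, "name": "bob"})     # ...and then to 2, but her primary key is unchanged

print(find_by_name("carol"))   # ({'id': 3, 'name': 'carol'}, 2)
```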

In addition, InnoDB primary keys should preferably be auto-incrementing integers rather than UUIDs.

Auto-incrementing values are always appended at the right end of the tree, so they disturb its structure very little. UUID values are effectively random, so a new row may land anywhere, on the left one time and on the right the next, which restructures the tree much more and makes inserts slower. Moreover, when searching, UUIDs are compared as strings while integers are compared directly, and integer comparison is faster!
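A small Python simulation of where new keys land, with a sorted list standing in for the clustered index order. The UUIDs are simulated: seeded random 128-bit values formatted as version-4 UUID strings (real `uuid.uuid4()` draws from the operating system's randomness), so the run is reproducible. `inserts_at_right_end` is a hypothetical helper.

```python
import bisect
import random
import uuid


def inserts_at_right_end(keys):
    """Insert keys one at a time into a sorted list (a stand-in for the
    clustered index order) and count how many land at the far right end."""
    ordered, at_end = [], 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos == len(ordered):
            at_end += 1
        ordered.insert(pos, key)
    return at_end


n = 1000
auto_increment_ids = range(1, n + 1)
rng = random.Random(42)   # seeded so the simulated UUIDs are reproducible
uuid_ids = [str(uuid.UUID(int=rng.getrandbits(128), version=4)) for _ in range(n)]

print(inserts_at_right_end(auto_increment_ids))   # 1000: every insert is an append
print(inserts_at_right_end(uuid_ids))             # only a handful; the rest land in the middle
```

Every middle insertion corresponds to a write into an existing page, which in a real B+ tree is what triggers page splits and reorganization.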


The essence of an index is to keep MySQL from performing a full table scan when querying data and to locate data quickly through a structure similar to a book's table of contents. Indexes can greatly improve search speed, but too many indexes take up disk space and waste resources, so it is important to create indexes judiciously. I hope this tutorial helps you understand indexes better. Thank you for reading.


Source: blog.csdn.net/qq_42628989/article/details/106652538