Performance optimization | In-depth understanding of MySQL index data structures and algorithms

What is an index?

In MySQL, an index is a sorted data structure that helps MySQL quickly find a particular row. It is maintained separately from the table data itself.

Which data structures can an index use?

Binary tree, red-black tree, hash table, and B-tree.

Here we mainly introduce the hash table and the B-tree.

Hash table

What is a hash?
A hash function maps an input value to another value, for example hash(100) = 1. Different hash algorithms may produce different values for the same input.
In MySQL, a hash table exists as a mapping between hash values and data. So how is the hash table built?
When a row is added to the table, MySQL first hashes the primary key and then records a mapping between the hash value and the address of that row. To look up the row by primary key, we only need to hash the primary key, obtain the hash value, and locate the row directly from it. In the ideal case (no collisions) a hash lookup therefore needs only one disk I/O, so the query is very fast.
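
The mapping described above can be sketched in a few lines of Python. This is only an illustration, not how MySQL implements it: NUM_BUCKETS, bucket_of, HashIndex, and the row "addresses" are all hypothetical names and values chosen for the example.

```python
# Minimal sketch of a hash index: hash value -> (key, row address) pairs.
# NUM_BUCKETS, bucket_of and the addresses are hypothetical, for illustration only.
NUM_BUCKETS = 8


def bucket_of(primary_key: int) -> int:
    """Map a primary key to a bucket number (a toy hash function)."""
    return hash(primary_key) % NUM_BUCKETS


class HashIndex:
    def __init__(self):
        # Each bucket holds (key, address) pairs so colliding keys can coexist.
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, primary_key: int, address: int) -> None:
        self.buckets[bucket_of(primary_key)].append((primary_key, address))

    def lookup(self, primary_key: int):
        # One hash computation picks the bucket; a short scan resolves collisions.
        for key, address in self.buckets[bucket_of(primary_key)]:
            if key == primary_key:
                return address
        return None


index = HashIndex()
index.insert(100, 0x1000)
index.insert(108, 0x2000)  # 100 and 108 land in the same bucket here
print(index.lookup(100), index.lookup(108), index.lookup(7))  # 4096 8192 None
```

Note that the lookup never compares one key's position with another's, which is why this structure is fast for exact matches but carries no ordering information.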


BTree

The B-tree is also called a multi-way tree: where a binary tree has at most two children per node, a B-tree node can have many. Let's look at a diagram of its structure.
[Figure: B-tree structure]

We can see from the figure that the B-tree has these characteristics:
1. The keys within a node are sorted in ascending order from left to right.
2. Each key is followed by a pointer to a node on the next level. That child node holds the records whose keys fall between the values on the left and right of the pointer.
3. The pointers of leaf nodes are empty.
4. Index keys are not repeated.
5. Every index node stores the record data (or the record's address) for the keys it holds.
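
To make these characteristics concrete, here is a minimal search sketch in Python. It assumes a hand-built, in-memory tree; BTreeNode, search, and the sample records such as "r25" are hypothetical names for illustration and do not reflect MySQL's on-disk layout.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, records, children=None):
        self.keys = keys                # sorted ascending, no duplicates
        self.records = records          # records[i] belongs to keys[i]
        self.children = children or []  # empty for a leaf, else len(keys) + 1 nodes


def search(node, key):
    """Return the record stored for key, or None if the key is absent."""
    while node is not None:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.records[i]      # in a B-tree, any node may hold the record
        if not node.children:
            return None                 # reached a leaf without finding the key
        node = node.children[i]         # follow the pointer between neighbouring keys
    return None


leaf_a = BTreeNode([5, 10], ["r5", "r10"])
leaf_b = BTreeNode([20, 25], ["r20", "r25"])
leaf_c = BTreeNode([40, 50], ["r40", "r50"])
root = BTreeNode([15, 30], ["r15", "r30"], [leaf_a, leaf_b, leaf_c])

print(search(root, 15), search(root, 25), search(root, 12))  # r15 r25 None
```

The key 15 is found in the root without descending, while 25 needs one more level, which shows that lookups in a B-tree can stop at different depths.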

B+Tree

The B+ tree is a variant of the B-tree with some improvements. All data records are moved to the leaf nodes, so non-leaf nodes hold only keys and pointers and each one can store more index entries. The cost is some redundancy: because the leaf nodes contain every index key, keys that appear in non-leaf nodes are stored again in the leaves.

[Figure: B+ tree structure]

As can be seen from the figure, the B+ tree has the following characteristics:
1. Leaf nodes contain all index keys.
2. Non-leaf nodes do not store data records.
3. Leaf nodes are connected by pointers, which makes range access easier.
4. The leftmost key in the node that a pointer leads to is greater than or equal to the key to the left of that pointer.

What does MySQL optimize in its B+ tree?

Let's see what the B+ tree in mysql looks like

[Figure: B+ tree as used in MySQL]

1. The pointers between leaf nodes are two-way (a doubly linked list).
2. The first and last leaf nodes are also connected by pointers.
The main purpose is to make range searches within the index more efficient. Without the doubly linked list, every step of a range search would have to go back to the root node and search again, which increases disk I/O and query time.
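
A range search over linked leaves can be sketched as follows. This is a simplified in-memory model: Leaf, link, and range_scan are hypothetical names, and the starting leaf is passed in directly, whereas a real lookup would first descend from the root once to find it.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys; the row data would sit beside each key
        self.prev = None
        self.next = None


def link(leaves):
    """Chain the leaves into a doubly linked list, left to right."""
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left


def range_scan(start_leaf, low, high):
    """Collect keys in [low, high], starting at the leaf that contains low."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys[bisect_left(leaf.keys, low):]:
            if key > high:
                return result
            result.append(key)
        leaf = leaf.next   # hop to the neighbouring leaf, no return to the root
    return result


leaves = [Leaf([1, 3, 5]), Leaf([7, 9, 11]), Leaf([13, 15, 17])]
link(leaves)
print(range_scan(leaves[0], 3, 13))  # [3, 5, 7, 9, 11, 13]
```

The prev pointers are not used by this forward scan; they would serve a scan in descending order in the same way.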

How to calculate the maximum amount of data supported by B+ tree

In MySQL, you can run SHOW GLOBAL STATUS LIKE 'Innodb_page_size%' to see the page size that InnoDB uses for index nodes. This value determines how much index data is loaded from disk in one read.
In version 5.7, Innodb_page_size defaults to 16384 bytes, that is, 16 KB.
Now let's calculate how many rows a table can hold in MySQL when the storage engine is InnoDB.
We calculate for a tree of height 3:

1. With a bigint key, each index entry in a non-leaf node needs at most 8 B.
2. Each index entry is followed by a pointer, whose size in InnoDB is 6 B.
3. Together that is 14 B, so the first-level (root) node can store 16 KB / 14 B ≈ 1170 index entries.
4. Each entry in the first-level node points to a second-level node that can again hold 1170 entries, so the first two levels together can address 1170 * 1170 = 1,368,900 leaf nodes.
5. The third level is the leaf level. A leaf node stores the primary key value plus the record data. Taking the record data to be up to 1 KB (so the 8 B primary key can be ignored), each leaf node stores up to 16 KB / 1 KB = 16 records.
6. So an InnoDB table with a three-level B+ tree can hold up to 1170 * 1170 * 16 = 21,902,400 rows, roughly 20 million. Beyond this size, splitting into multiple databases and tables is usually considered, and it is commonly recommended to keep the depth of the B+ tree to no more than 3.
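
The arithmetic above can be checked with a short Python sketch. The constants are the figures quoted in the text (16 KB page, 8 B bigint key, 6 B pointer, 1 KB per row); max_rows is a hypothetical helper name, and real capacity also depends on page headers and fill factor, which this estimate ignores.

```python
PAGE_SIZE = 16 * 1024   # 16384 bytes, the default page size quoted above
KEY_SIZE = 8            # bigint key
POINTER_SIZE = 6        # pointer size quoted above
ROW_SIZE = 1024         # assumed 1 KB per record


def max_rows(height: int) -> int:
    """Estimate the rows addressable by a B+ tree of the given height."""
    fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)   # entries per non-leaf page
    rows_per_leaf = PAGE_SIZE // ROW_SIZE             # records per leaf page
    return fanout ** (height - 1) * rows_per_leaf


print(PAGE_SIZE // (KEY_SIZE + POINTER_SIZE))  # 1170
print(max_rows(3))                             # 21902400
```

Changing ROW_SIZE shows how sensitive the estimate is to row width: wider rows mean fewer records per leaf and a smaller table for the same tree height.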

The hash algorithm is fast, why does MySQL rarely use hash indexes?

As mentioned above, a hash lookup needs only one disk I/O and is very fast, so why is it not recommended in MySQL? The main reasons are:
1. Hash collisions (a minor factor, because MySQL's hash algorithm is of fairly high quality and the probability of a collision is relatively low).
2. Range queries are not possible. The hash table stores hash values, not the data itself, so entries cannot be compared with each other. If you are sure the table will only ever be queried by exact match, a hash index is an option.
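
The second point can be illustrated with a small comparison. This is a sketch using a Python dict as a stand-in for a hash index and a sorted list as a stand-in for the ordered leaf level of a B+ tree; the rows and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

rows = {17: "q", 3: "a", 42: "z", 8: "c", 25: "m"}   # hypothetical key -> record

# Hash-style index: an exact match is direct, but buckets carry no key order.
hash_index = dict(rows)
exact = hash_index.get(8)


def range_via_hash(low, high):
    # No order to exploit: every entry has to be examined.
    return sorted(k for k in hash_index if low <= k <= high)


# Ordered index: two binary searches bound the range, the rest is a slice.
sorted_keys = sorted(rows)


def range_via_sorted(low, high):
    return sorted_keys[bisect_left(sorted_keys, low):bisect_right(sorted_keys, high)]


print(exact)                     # c
print(range_via_hash(5, 30))     # [8, 17, 25]
print(range_via_sorted(5, 30))   # [8, 17, 25]
```

Both functions return the same keys, but the hash version visits every entry, while the ordered version only touches the entries inside the range.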

What is the difference between B tree and B+ tree?

1. In a B+ tree the leaf nodes are linked into a list (doubly linked in MySQL), which makes range searches easier.
2. In a B+ tree only the leaf nodes store data records, so each non-leaf node can hold more index entries.

What is the difference between a clustered index and a non-clustered index?

Clustered index: the index and the data are stored together.
Non-clustered index: the index file and the data file are stored separately.

InnoDB storage engine implementation (primary key and secondary key)

Primary key index:
InnoDB uses a B+ tree by default, and the primary key index is the clustered index. The data area of each leaf node stores the entire row associated with the primary key.
Secondary key:
The data area of a secondary index stores the primary key value. That is, a query through a secondary index must finally look up the corresponding row by its primary key value.
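
The two-step lookup through a secondary index can be sketched like this. Plain dicts stand in for the two B+ trees, and the table contents, clustered, secondary_name, and find_by_name are hypothetical names for illustration only.

```python
# Clustered (primary key) index: primary key -> full row.
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 41},
}

# Secondary index on name: it stores only the primary key, not the row.
secondary_name = {row["name"]: pk for pk, row in clustered.items()}


def find_by_name(name):
    """Step 1: secondary index gives the primary key. Step 2: fetch the row by it."""
    pk = secondary_name.get(name)
    if pk is None:
        return None
    return clustered[pk]


print(find_by_name("bob"))    # {'id': 2, 'name': 'bob', 'age': 25}
print(find_by_name("dave"))   # None
```

Because the secondary index holds only primary key values, an update to a row's other columns touches the clustered index alone.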

In the MyISAM storage engine, both the primary key index and secondary indexes store the address of the associated row in their data area. MyISAM indexes are non-clustered: the index file and the data file are stored separately.

Why do InnoDB tables have to have primary keys? And why is an auto-increment integer primary key recommended?

1. Why does an InnoDB table have to have a primary key?
In an InnoDB table, MySQL builds the clustered index on the primary key. If there is no primary key, MySQL picks a column with a non-null unique index from the table and uses it as the primary key to build the clustered index;
if the table has no such column either, MySQL generates a hidden row_id and uses it as the primary key.
2. Why does MySQL recommend an integer as the primary key type?
MySQL builds the B+ tree in ascending key order. Integers can be compared directly. Values of other types, such as strings, have to be compared character by character according to their character codes (for example ASCII), which adds time to both index construction and queries.
3. Why should it be auto-increment?
This follows from two properties of MySQL:
   1. InnoDB reads one page into memory at a time, and the page size is 16384 B, so each node is at most 16 KB.
   2. The keys in the B+ tree are arranged in order from left to right.
[Figure: ordered keys in a B+ tree node]

Suppose the primary key is not auto-increment and a new value 11 is inserted. After comparison, 11 has to be stored between 10 and 12:
1. If that node is already at 16 KB, adding one more entry exceeds the page limit and the node splits into two. This extra work increases the time needed to maintain the index.
2. If the primary key is auto-increment, no value smaller than the current maximum is ever inserted. New entries are simply appended on the right, so existing nodes are not split.
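
The effect can be seen in a toy simulation. It is a deliberate simplification and not InnoDB's actual algorithm: pages are plain sorted lists, PAGE_CAPACITY is a hypothetical tiny limit of 4 keys, an insert on the far right simply opens a new page, and any other insert into a full page splits it in half.

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 4   # hypothetical tiny page so that splits are easy to trigger


def count_splits(keys):
    """Insert keys one by one; return (number of page splits, final pages)."""
    pages = [[]]
    splits = 0
    for key in keys:
        firsts = [page[0] for page in pages if page]
        i = max(bisect_right(firsts, key) - 1, 0)   # page whose range covers key
        page = pages[i]
        if len(page) < PAGE_CAPACITY:
            insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])          # append on the right: nothing is moved
        else:
            mid = len(page) // 2
            left, right = page[:mid], page[mid:]
            insort(left if key < right[0] else right, key)
            pages[i:i + 1] = [left, right]   # the full page becomes two pages
            splits += 1
    return splits, pages


sequential = list(range(1, 101))                               # auto-increment style
interleaved = list(range(2, 101, 2)) + list(range(1, 101, 2))  # odd keys fill gaps later

seq_splits, seq_pages = count_splits(sequential)
mix_splits, mix_pages = count_splits(interleaved)
print(seq_splits, len(seq_pages))   # 0 25
print(mix_splits > 0, len(mix_pages) > len(seq_pages))
```

With increasing keys every page ends up full and no page is ever split. With the interleaved order, keys such as 1 or 11 arrive after their neighbours and force full pages to split, leaving more pages that are only partly filled.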

Why do the leaf nodes of a non-primary-key index store the primary key value? (consistency and storage space)

1. If secondary indexes stored the row data itself, inconsistencies could arise: the primary key index and the secondary index would both have to maintain the row, and if one of the updates failed, the copies would disagree.
2. Storing the row data in both places would also waste storage space. Storing only the primary key lets a secondary index hold more entries per node, at the cost of a second lookup through the primary key index. This trades time for space.

The underlying storage structure of the joint index

[Figures: storage structure of a joint index]
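
As a rough sketch of the idea, assume a joint index keeps its entries sorted by the first column, then the second, then the third, with each entry pointing to a primary key. The column names, sample data, and find_by_prefix below are hypothetical and only illustrate that ordering.

```python
from bisect import bisect_left

# Hypothetical joint index on (last_name, first_name, age) -> primary key.
entries = sorted([
    (("li", "ming", 25), 3),
    (("wang", "fang", 31), 1),
    (("li", "hua", 40), 4),
    (("zhang", "wei", 28), 2),
    (("li", "hua", 22), 5),
])
keys = [key for key, _ in entries]   # index keys in sorted (tuple) order


def find_by_prefix(prefix):
    """Return primary keys of entries whose leading columns equal prefix."""
    start = bisect_left(keys, prefix)          # binary search on the sorted keys
    result = []
    for key, pk in entries[start:]:
        if key[:len(prefix)] != prefix:
            break                              # left the matching run
        result.append(pk)
    return result


print(keys[0])                          # ('li', 'hua', 22)
print(find_by_prefix(("li",)))          # [5, 4, 3]
print(find_by_prefix(("li", "hua")))    # [5, 4]
```

Because the entries are ordered by the leading column first, a search that supplies the leading columns can binary-search straight to the matching run, while a search on the last column alone could not use this ordering.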


The more you read the book without thinking, you will feel that you know a lot; and the more you read and think, the more clearly you will see that you know very little. --Voltaire


Origin blog.csdn.net/weixin_34311210/article/details/108982264