The nature of the index
An index is a sorted data structure that helps MySQL retrieve data efficiently.
Data structures
Binary tree, red-black tree, hash table, B-Tree, B+Tree
Binary tree
If a table has no index, a query compares rows one by one until it finds the requested data. As the amount of data grows, the query takes longer and longer.
In the figure below, Col2 is used as the index column. Finding Col2=89 takes only two comparisons in the index before the row is located and read. Without the index, six row-by-row comparisons are needed.
MySQL does not use a plain binary tree as its underlying index structure. If Col1 is used as the index column, the binary tree looks like the following figure: it grows on one side only. Even when the query goes through the index it still needs 6 lookups, so the index brings almost no gain in efficiency.
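The one-sided growth can be reproduced with a minimal sketch of an unbalanced binary search tree. The key values below are hypothetical stand-ins for the Col1 column in the figure, and `insert`, `build`, and `comparisons_to_find` are illustrative helpers, not MySQL internals.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced binary search tree and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def build(keys):
    """Build a tree by inserting the keys in the given order."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def comparisons_to_find(root, key):
    """Count the nodes visited while searching for key."""
    count = 0
    node = root
    while node is not None:
        count += 1
        if key == node.key:
            return count
        node = node.left if key < node.key else node.right
    return count


# Hypothetical keys: ascending inserts produce a chain that leans to the right.
sequential = build([1, 2, 3, 4, 5, 6])
# The same keys inserted in a different order give a much shallower tree.
mixed = build([4, 2, 6, 1, 3, 5])

print(comparisons_to_find(sequential, 6))  # 6 nodes visited
print(comparisons_to_find(mixed, 6))       # 2 nodes visited
```

With ascending keys the tree behaves like a linked list, so the number of lookups equals the number of rows.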
Red-black tree
The red-black tree in the figure below appears to solve the problem of the binary tree growing on one side only: index 6 is reached in just 3 lookups. But it is still not the best solution. The data set here is small. With several million or tens of millions of rows, the red-black tree keeps getting taller. If the tree height is 18, for example, finding one row can take 18 disk I/Os, which is not efficient.
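How quickly the height grows can be estimated with a short calculation. The sketch below computes the smallest possible height of any binary tree with n nodes, which is a best case: a real red-black tree can be up to about twice as tall. The function name and the row counts are illustrative assumptions.

```python
import math


def min_binary_tree_height(n):
    """Smallest possible number of levels in a binary tree holding n nodes."""
    return math.ceil(math.log2(n + 1))


for n in (6, 1_000_000, 20_000_000):
    print(n, min_binary_tree_height(n))
```

Even in this best case a table with millions of rows needs 20 or more levels, and each level may cost one disk read.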
B-Tree
- The leaf nodes have the same depth.
- Index elements are not duplicated.
- The index elements in each node are sorted in ascending order from left to right.
B+Tree (B-Tree variant)
This is the data structure MySQL uses.
- Non-leaf nodes store only index elements, not data, which leaves room for more index elements per node.
- The leaf nodes contain all index fields.
- The leaf nodes are linked by pointers, which improves range access performance.
Query statement: SHOW VARIABLES LIKE 'innodb_page_size';
By default a node (one InnoDB page) holds 16384 bytes, that is, 16 KB. Suppose one index key occupies 8 bytes and the pointer to the next-level node occupies 6 bytes. 16384 / (8 + 6) ≈ 1170, so a non-leaf node can store about 1170 index elements. Leaf nodes also have to store the data elements. Allowing roughly 1 KB per element, a leaf node can store 16 elements. With a tree height of 3, the tree can therefore hold 1170 × 1170 × 16 = 21,902,400 rows, which is more than 20 million. In other words, a row in a table of more than 20 million rows can be found with about three disk I/Os.
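The capacity estimate can be checked with a few lines of arithmetic. The constants below are the example sizes from the text (8-byte key, 6-byte pointer, about 1 KB per row), not values read from a real table, and the variable names are illustrative.

```python
PAGE_SIZE = 16384     # bytes, the default innodb_page_size
KEY_SIZE = 8          # bytes per index key (example value from the text)
POINTER_SIZE = 6      # bytes per pointer to a child node (example value)
ROW_SIZE = 1024       # bytes per data element in a leaf (assumed ~1 KB)

# Each non-leaf entry is one key plus one child pointer.
keys_per_internal_node = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
rows_per_leaf = PAGE_SIZE // ROW_SIZE

# Height 3: two levels of non-leaf nodes above one level of leaves.
rows_at_height_3 = keys_per_internal_node ** 2 * rows_per_leaf

print(keys_per_internal_node)  # 1170
print(rows_per_leaf)           # 16
print(rows_at_height_3)        # 21902400
```

Changing `KEY_SIZE` or `ROW_SIZE` shows how wider keys or larger rows reduce the number of rows a tree of height 3 can address.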
Hash
- A hash of the index key is computed to locate the data directly.
- For equality lookups, a hash index is often faster than a B+Tree index.
- It does not support range searches.
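The trade-off can be illustrated with a Python dictionary standing in for a hash index. This is only a sketch: the rows and the helper names `find_equal` and `find_greater_than` are hypothetical, and a real storage engine does not work on Python dictionaries.

```python
from collections import defaultdict

# Hypothetical rows as (row id, Col2 value).
rows = [(1, 34), (2, 77), (3, 5), (4, 91), (5, 22), (6, 89)]

# Hash index on Col2: value -> list of row ids.
hash_index = defaultdict(list)
for row_id, col2 in rows:
    hash_index[col2].append(row_id)


def find_equal(value):
    """Equality lookup: one hash computation leads straight to the bucket."""
    return hash_index.get(value, [])


def find_greater_than(value):
    """Range lookup: buckets have no order, so every key must be examined."""
    return sorted(
        row_id
        for key, row_ids in hash_index.items()
        if key > value
        for row_id in row_ids
    )


print(find_equal(89))         # [6]
print(find_greater_than(50))  # [2, 4, 6]
```

The equality lookup touches one bucket, while the range lookup degrades to a scan of the whole index.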
Why does MySQL choose B+Tree instead of B-Tree?
- B+Tree moves all the data elements from the non-leaf nodes of the B-Tree into the leaf nodes, which frees up space for more index elements in each non-leaf node. That is why the tree is only 3 levels high when it stores 20 million rows. A B-Tree storing the same 20 million rows would be well over 3 levels high.
- B+Tree leaf nodes are also linked by pointers. For a range search such as id > 15, the engine locates 15 and then follows the leaf pointers in order to read the entries after it, which improves range access performance.
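The range search over linked leaves can be sketched as follows. The leaf contents are hypothetical, and the `first_keys` list is a simplified stand-in for the non-leaf levels that would normally guide the descent to the starting leaf.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this leaf
        self.next = None   # pointer to the next leaf in key order


# Hypothetical leaf level of a B+Tree, linked left to right.
leaves = [Leaf([5, 8]), Leaf([10, 15]), Leaf([18, 20]), Leaf([30, 42])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right


def keys_greater_than(leaves, bound):
    """Locate the leaf that would hold bound, then follow the next pointers."""
    first_keys = [leaf.keys[0] for leaf in leaves]
    start = max(bisect_right(first_keys, bound) - 1, 0)
    leaf = leaves[start]
    # Keep only the keys after bound in the starting leaf.
    result = list(leaf.keys[bisect_right(leaf.keys, bound):])
    leaf = leaf.next
    while leaf is not None:
        result.extend(leaf.keys)   # every later leaf qualifies entirely
        leaf = leaf.next
    return result


print(keys_greater_than(leaves, 15))  # [18, 20, 30, 42]
```

After the starting leaf is found, no further descent from the root is needed: the scan just walks the leaf chain.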
MyISAM storage engine index implementation (non-clustered)
InnoDB storage engine index implementation (clustered)
Secondary index
The data element of the secondary index stores the primary key value. The query then goes back to the table (the clustered index) with that primary key id to fetch the row.
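The two-step lookup through a secondary index can be sketched with two dictionaries. Both structures, the column names, and the sample rows are hypothetical. In InnoDB both indexes are B+Trees, and the dictionaries only model which values each index stores.

```python
# Clustered index: primary key -> full row (hypothetical data).
clustered_index = {
    1: {"id": 1, "name": "Alice", "age": 30},
    2: {"id": 2, "name": "Bob", "age": 25},
    3: {"id": 3, "name": "Carol", "age": 41},
}

# Secondary index on name: indexed value -> primary key only.
secondary_index_name = {"Alice": 1, "Bob": 2, "Carol": 3}


def find_by_name(name):
    """Step 1: secondary index gives the primary key. Step 2: back to the table."""
    primary_key = secondary_index_name.get(name)
    if primary_key is None:
        return None
    return clustered_index[primary_key]


print(find_by_name("Bob"))  # {'id': 2, 'name': 'Bob', 'age': 25}
```

The secondary index never holds the row itself, so a query on `name` costs one lookup in each index.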
Composite (joint) index storage structure
Data structure visualization URL: https://www.cs.usfca.edu/~galles/visualization/Algorithms.html