One: Why does MySQL choose the B+ tree as its underlying data structure?
- Candidate structures for a MySQL index include B-trees, red-black trees, hash tables, and B+ trees. Why was the B+ tree chosen as the underlying physical structure of the database?
- In a binary search tree, every key in a node's left subtree is smaller than the node's key, and every key in its right subtree is larger.
- A plain binary search tree can degenerate. If keys arrive in ascending order, every left child is empty and all nodes hang off the right side, so a lookup walks the nodes one by one, just as if there were no index. The tree also becomes very deep, and since each level can cost one disk I/O, query efficiency is low.
- A red-black tree stays balanced, but each node still has at most two children, so the tree is deep for a large table. That depth means many disk I/Os per lookup, which hurts index performance.
- A hash index organizes data by hash value, so looking up a single record is very fast. Its disadvantage is that hashed keys are not kept in sorted order, so a hash index cannot serve range queries.
- The B+ tree structure is optimized for these situations.
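The depth problem described above can be sketched in a few lines of Python. This is only an illustration, not MySQL code: the names `Node`, `bst_insert`, `depth`, and `build` are hypothetical, and the tree is a plain unbalanced binary search tree. It inserts 1000 keys in ascending order and compares the resulting depth with the minimum depth a binary tree of that size could have.

```python
import math

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Insert key into an unbalanced binary search tree (iteratively)."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right

def depth(root):
    """Count the levels of the tree level by level (no recursion)."""
    levels = 0
    frontier = [root] if root is not None else []
    while frontier:
        levels += 1
        frontier = [child for n in frontier
                    for child in (n.left, n.right) if child is not None]
    return levels

def build(keys):
    root = None
    for key in keys:
        root = bst_insert(root, key)
    return root

n = 1000
sorted_depth = depth(build(range(1, n + 1)))   # ascending inserts: one long right spine
balanced_min = math.ceil(math.log2(n + 1))     # fewest levels any binary tree with n keys can have

print(sorted_depth, balanced_min)
```

With ascending inserts every node becomes the right child of the previous one, so the depth equals the number of keys. Even the best-balanced binary tree still needs about log2(n) levels, which is why a structure with a much larger fan-out per node is attractive for disk-based indexes.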
Consider a B+ tree three levels deep in which each node is one 16 KB page. Assume each index key takes 8 bytes and each child pointer takes 6 bytes, so a key plus its pointer takes 14 bytes and one non-leaf page holds 16 × 1024 / 14 ≈ 1170 entries. The first and second levels store only keys and pointers; the third (leaf) level stores the row data. If each row occupies about 1 KB, one leaf page holds at most 16 rows. Under these assumptions the tree holds at most 1170 × 1170 × 16 = 21,902,400 rows, a little over 21 million. This is why a single table is often said to hit a performance bottleneck at a little over 20 million rows: beyond that point the tree needs a fourth level, and every lookup costs one more page read.
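The arithmetic above can be checked with a short Python sketch. The constants are the assumptions stated in the text (16 KB pages, 8-byte keys, 6-byte pointers, 1 KB rows), and `max_rows` is a hypothetical helper name; real tables have different key and row sizes, so the result is an estimate only.

```python
PAGE_SIZE = 16 * 1024   # 16 KB page, as assumed in the text
KEY_BYTES = 8           # assumed size of one index key
POINTER_BYTES = 6       # assumed size of one child pointer
ROW_BYTES = 1024        # assumed average row size (1 KB)

def max_rows(levels):
    """Upper bound on rows in a B+ tree with the given number of levels."""
    fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)   # entries per non-leaf page
    rows_per_leaf = PAGE_SIZE // ROW_BYTES              # rows per leaf page
    # levels - 1 layers of non-leaf pages fan out to the leaf pages
    return fanout ** (levels - 1) * rows_per_leaf

for levels in (2, 3):
    print(levels, max_rows(levels))
```

Changing `ROW_BYTES` or `KEY_BYTES` shows how sensitive the "20 million" figure is to the row and key sizes of a particular table.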
Advantages:
1: The tree stays shallow, which reduces disk I/O, so the corresponding data can be found quickly
2: Adjacent leaf nodes are linked to each other by pointers, which greatly improves the efficiency of range queries
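The second advantage can be illustrated with a minimal Python sketch of linked leaf pages. It is a simplification under stated assumptions: `Leaf`, `build_leaves`, and `range_scan` are hypothetical names, only the leaf level is modeled, and the starting leaf is found by a linear search rather than by descending from a root as a real B+ tree would.

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, keys):
        self.keys = keys    # sorted keys stored in this leaf page
        self.next = None    # pointer to the right-hand sibling leaf

def build_leaves(sorted_keys, per_leaf):
    """Pack sorted keys into leaves and link each leaf to its right sibling."""
    leaves = [Leaf(sorted_keys[i:i + per_leaf])
              for i in range(0, len(sorted_keys), per_leaf)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves

def range_scan(start_leaf, low, high):
    """Collect keys in [low, high] by following sibling pointers from start_leaf."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys[bisect_left(leaf.keys, low):]:
            if key > high:
                return result       # passed the upper bound: stop scanning
            result.append(key)
        leaf = leaf.next            # move right without going back up the tree
    return result

leaves = build_leaves(list(range(0, 100, 5)), per_leaf=4)
# Stand-in for the root-to-leaf descent: pick the first leaf that can contain 12.
first = next(leaf for leaf in leaves if leaf.keys[-1] >= 12)
found = range_scan(first, 12, 41)
print(found)
```

Once the first matching leaf is located, the scan only moves sideways through the leaf chain, which is the property a hash index lacks.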
Two: The difference between a clustered index and a non-clustered index
InnoDB's primary index is a clustered index, and MyISAM's primary index is a non-clustered index
- MyISAM database engine:
The data and indexes of the MyISAM database engine are stored separately.
The leaf nodes of the index store the address of the data row, not the row itself. This gives a non-clustered index, in which the index and the data are separated.
One further feature: adjacent leaf nodes are connected by pointers, which supports reading the keys in order.
- InnoDB database engine:
In the InnoDB database engine, indexes and data are stored together.
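The two layouts can be contrasted with a small Python sketch. It is a toy model under loud assumptions: plain lists and dictionaries stand in for the on-disk files and B+ trees, and the names `myd_rows`, `myi_index`, `clustered_index`, `myisam_lookup`, and `innodb_lookup` are hypothetical.

```python
# MyISAM-style (non-clustered): rows live in a separate data file,
# and the index maps each key to the row's position in that file.
myd_rows = [                       # stands in for the data file
    {"id": 30, "name": "c"},
    {"id": 10, "name": "a"},
    {"id": 20, "name": "b"},
]
myi_index = {row["id"]: pos for pos, row in enumerate(myd_rows)}  # stands in for the index file

def myisam_lookup(key):
    address = myi_index[key]       # step 1: the index yields the row's address
    return myd_rows[address]       # step 2: follow the address into the data file

# InnoDB-style (clustered): the primary key index entry holds the row itself.
clustered_index = {row["id"]: row for row in myd_rows}

def innodb_lookup(key):
    return clustered_index[key]    # the index entry is the row, no second hop

print(myisam_lookup(20), innodb_lookup(20))
```

Both lookups return the same row; the difference is whether the index entry contains the row or only points to where it is stored.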
- Storage formats of the different database storage engines
MyISAM storage format:
The .frm file stores the table structure
The .MYI file stores the index
The .MYD file stores the data
InnoDB storage format:
The .frm file stores the table structure
The .ibd file stores the index and data of the table
Three: The difference between the primary key index and the ordinary index
In MySQL, indexes are implemented at the storage engine layer, so there is no unified index standard and each engine may implement indexes differently. Since InnoDB is the most widely used storage engine in MySQL, the following uses InnoDB as an example to analyze the index model. In InnoDB, a table is stored as an index ordered by the primary key. InnoDB uses the B+ tree index model, so the data is stored in B+ trees, as shown in the figure:
As the figure shows, indexes are divided into primary key indexes and non-primary key indexes according to the content of their leaf nodes.
A primary key index is also called a clustered index, and its leaf nodes store entire rows of data. A non-primary key index is called a secondary index, and its leaf nodes store the value of the primary key.
If you query by primary key, you only need to search the ID B+ tree.
If you query through a non-primary key index, you first search the k index tree to find the corresponding primary key value, and then search the ID index tree again. This second lookup is called going back to the table.
In summary, a query on a non-primary key index has to search one more index tree, so it is relatively less efficient.
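The two lookup paths can be sketched in Python as follows. This is a toy model, not InnoDB code: dictionaries stand in for the two B+ trees, the column names `ID` and `k` follow the text, and the `name` column, the sample rows, and the functions `find_by_id` and `find_by_k` are invented for illustration.

```python
# Clustered (primary key) index: ID -> entire row.
primary_index = {
    100: {"ID": 100, "k": 1, "name": "x"},
    200: {"ID": 200, "k": 2, "name": "y"},
    300: {"ID": 300, "k": 3, "name": "z"},
}
# Secondary index on k: k -> primary key value only.
k_index = {row["k"]: pk for pk, row in primary_index.items()}

def find_by_id(pk):
    """One tree search: the primary index leaf already holds the row."""
    return primary_index[pk], 1

def find_by_k(k):
    """Two tree searches: the k index first, then back to the primary index."""
    pk = k_index[k]                 # search 1: secondary index yields the primary key
    return primary_index[pk], 2     # search 2: going back to the table

row_a, searches_a = find_by_id(200)
row_b, searches_b = find_by_k(2)
print(row_a == row_b, searches_a, searches_b)
```

Both calls return the same row, but the query through `k` performs one extra search, which is the cost the text describes.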
Four: How is a joint index implemented underneath?
- Table Structure
CREATE TABLE `city` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'City ID',
`province_id` int(10) unsigned NOT NULL COMMENT 'Province ID',
`city_name` varchar(25) DEFAULT NULL COMMENT 'City name',
`description` varchar(25) DEFAULT NULL COMMENT 'Description',
PRIMARY KEY (`id`),
KEY `index_pro_ci_de` (`province_id`,`city_name`,`description`) USING BTREE COMMENT 'Joint index'
) ENGINE=InnoDB AUTO_INCREMENT=1850386 DEFAULT CHARSET=utf8;
A joint index is an ordered index: entries are sorted by the first column, then by the second column among entries with the same first column, and so on. The column order of the joint index therefore determines which queries can use it. Take surname and first name as an example: with an index on (surname, first name), you first find the matching surname and then find the person by first name within that surname, so the joint index is effective. If a query supplies only the first name and not the surname, the index cannot be used to narrow the search and the joint index becomes ineffective. This is why the column order must be chosen carefully when creating a joint index.
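The ordering rule can be demonstrated with sorted tuples in Python. This is a sketch under assumptions: a sorted list stands in for the leaf level of `index_pro_ci_de`, the sample rows and city names are invented, and `seek_province` and `scan_city` are hypothetical helper names.

```python
from bisect import bisect_left

# Index entries as (province_id, city_name, description), kept in sorted order
# the same way a joint index orders its columns from left to right.
entries = sorted([
    (1, "Alpha", "a"),
    (1, "Beta", "b"),
    (2, "Beta", "c"),
    (2, "Gamma", "d"),
    (3, "Alpha", "e"),
])

def seek_province(province_id):
    """Leftmost column given: matches are contiguous and found by binary search."""
    lo = bisect_left(entries, (province_id,))
    hi = bisect_left(entries, (province_id + 1,))
    return entries[lo:hi]

def scan_city(city_name):
    """Leftmost column missing: matches are scattered, so every entry is checked."""
    return [pos for pos, entry in enumerate(entries) if entry[1] == city_name]

print(seek_province(2))
print(scan_city("Alpha"))
```

Entries for one `province_id` sit next to each other, so a binary search finds them directly. Entries for one `city_name` appear at unrelated positions, so without the leading column the sort order gives no starting point.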
Five: Why is an auto-increment integer recommended as the primary key in MySQL?
- Why MySQL's primary key is recommended to be an integer
The advantages of an integer primary key are as follows:
1: An integer takes up less space, so each index page can hold more keys.
2: Comparing two integers is cheaper than comparing two UUIDs.
- The benefit of an auto-increment integer primary key:
UUID values arrive in effectively random order, so if a UUID is used as the primary key, new rows land in the middle of the B+ tree, which causes page splits and rearrangement of existing entries.
This problem does not exist with an auto-increment integer: each new row is simply added at the end of the B+ tree.
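The effect of insert order can be simulated with a rough Python sketch. It is a simplified model, not InnoDB's actual page management: pages hold only 4 keys (`PAGE_CAPACITY` is an arbitrary small value), only the leaf level is modeled, shuffled integers stand in for UUID-like random keys, and `insert_all` is a hypothetical helper. It counts how many existing rows have to move because of page splits.

```python
import random
from bisect import bisect_right, insort

PAGE_CAPACITY = 4   # tiny, hypothetical page size so that splits happen quickly

def insert_all(keys):
    """Insert unique keys into sorted leaf pages; return (pages, rows moved by splits)."""
    pages, moved = [[]], 0
    for key in keys:
        # Locate the page whose key range should contain the new key.
        firsts = [p[0] for p in pages[1:]]
        idx = bisect_right(firsts, key)
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            insort(page, key)
            continue
        if idx == len(pages) - 1 and key > page[-1]:
            # Appending past the largest key: start a new page at the end.
            pages.append([key])
            continue
        # Inserting into the middle of a full page: split it and move half the rows.
        half = PAGE_CAPACITY // 2
        right = page[half:]
        del page[half:]
        moved += len(right)
        pages.insert(idx + 1, right)
        insort(page if key < right[0] else right, key)
    return pages, moved

sequential = list(range(1, 101))          # auto-increment style keys
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)        # random-order keys, standing in for UUIDs

seq_pages, seq_moved = insert_all(sequential)
rnd_pages, rnd_moved = insert_all(shuffled)
print(len(seq_pages), seq_moved)
print(len(rnd_pages), rnd_moved)
```

In this model sequential keys never move an existing row and leave every page full, while random-order keys trigger splits that move rows and leave pages partly empty, so the same data occupies more pages.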