Analysis of the underlying principle of MySQL

One: Why does MySQL choose the B+ tree as its underlying data structure

  • Candidate data structures for a MySQL index include binary search trees, red-black trees, hash tables, and B+ trees. Why was the B+ tree chosen as the underlying physical structure of the database?
  • In a binary search tree, every key in a node's left subtree is smaller than the node's key, and every key in its right subtree is larger.
  • A binary search tree can degenerate. If keys are inserted in ascending order, every left child is empty and all nodes hang off the right side. A lookup then walks the nodes one by one, which is no better than having no index, and the tree becomes very deep, so a query needs many disk I/Os and is slow.
    (Figure: binary search tree structure)
  • A red-black tree rebalances itself, but each node still holds only one key, so the tree remains deep for a large table. Every level visited can cost a disk I/O, which drags down index performance.
    (Figure: red-black tree)
  • A hash index organizes data by hash value, so looking up a single record is very fast. Its disadvantage is that hashed keys are not stored in sorted order, so a hash index cannot serve range queries.
    (Figure: hash structure)
  • The B+ tree structure is optimized for these situations.
    Take a B+ tree that is three levels deep, where each node is one 16 KB page. Assume each index key takes 8 bytes and each pointer takes 6 bytes, so one key plus its pointer takes 14 bytes and one page holds 16 × 1024 / 14 ≈ 1170 entries. The first and second levels store only keys and pointers; the third level (the leaf level) stores the row data. If each row takes about 1 KB, one leaf page holds 16 rows. The tree can therefore index at most 1170 × 1170 × 16 = 21,902,400 rows, a little over 21 million. This is why a single table is often said to hit a performance bottleneck at around 20 million rows: beyond that, the tree needs a fourth level.
    (Figure: B+ tree)
    Advantages:
    1: The tree stays shallow, which keeps disk I/O low, so the requested data is found quickly
    2: Each leaf node has pointers to its left and right neighbors, which greatly improves the efficiency of range queries
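
The capacity estimate above can be reproduced as a quick calculation. This is only a sketch under the same assumptions as the text (16 KB pages, 8-byte keys, 6-byte pointers, rows of about 1 KB). The column aliases are made up for illustration, and real pages also carry headers and free space, so actual capacity is somewhat lower.

```sql
-- InnoDB page size in bytes (16384 unless the server was configured otherwise).
SELECT @@innodb_page_size;

-- Three-level B+ tree estimate, using the assumptions from the text:
-- 8-byte key + 6-byte pointer per internal entry, about 1 KB per row.
SELECT
  (16 * 1024) DIV (8 + 6)        AS entries_per_internal_page,  -- 1170
  (16 * 1024) DIV 1024           AS rows_per_leaf_page,         -- 16
  ((16 * 1024) DIV (8 + 6))
    * ((16 * 1024) DIV (8 + 6))
    * ((16 * 1024) DIV 1024)     AS max_rows_three_levels;      -- 21902400
```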

Two: The difference between clustered and non-clustered indexes

The InnoDB primary index is a clustered index, and the MyISAM primary index is a non-clustered index.

  • MyISAM database engine:
    (Figure: MyISAM database storage engine)
    In the MyISAM engine, data and indexes are stored separately.
    Each leaf node of the index stores an address that points to the data row. Because the index and the data live apart, this is a non-clustered index.
    The leaf nodes are also linked to one another by pointers, which supports reading the entries in order.

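  • InnoDB database engine:
    In the InnoDB engine, the index and the data are stored together: the leaf nodes of the primary key index hold the rows themselves, which makes it a clustered index.
    (Figure: InnoDB database engine)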

  • Storage formats of the two storage engines
    MyISAM storage format:
    (Figure: MyISAM storage format)
    The .frm file stores the table structure
    The .MYI file stores the index
    The .MYD file stores the data

    InnoDB storage format:
    (Figure: InnoDB storage format)
    The .frm file stores the table structure
    The .ibd file stores the table's index and data

    (.frm files exist up to MySQL 5.7; MySQL 8.0 keeps table definitions in its data dictionary instead.)
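
To see the two layouts on a real server, one could create the same table under both engines and then look in the data directory. The table names below are hypothetical, and the sketch assumes that MyISAM is available and that `innodb_file_per_table` is ON (otherwise InnoDB places the table in the shared system tablespace rather than in its own .ibd file).

```sql
-- Same columns, two engines (table names are made up for this example).
CREATE TABLE engine_demo_myisam (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(25) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=MyISAM;

CREATE TABLE engine_demo_innodb (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(25) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Confirm which engine each table uses.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('engine_demo_myisam', 'engine_demo_innodb');

-- Where the files live, and whether InnoDB uses one .ibd file per table.
SELECT @@datadir, @@innodb_file_per_table;
```

In the schema's subdirectory of the data directory, the MyISAM table should show up as separate .MYD and .MYI files, and the InnoDB table as a single .ibd file.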

Three: The difference between the primary key index and an ordinary index

In MySQL, indexes are implemented at the storage engine layer, so there is no unified index standard. Since InnoDB is the most widely used storage engine in MySQL, the following uses InnoDB as an example to analyze the index model. In InnoDB, a table is stored as an index ordered by primary key. InnoDB uses the B+ tree index model, so the data itself is stored in a B+ tree, as shown in the figure:
(Figure: the difference between primary key index and ordinary index)
As the figure shows, index types are divided into primary key indexes and non-primary key indexes according to what their leaf nodes contain.
A primary key index is also called a clustered index, and its leaf nodes store entire rows of data. A non-primary key index is called a secondary index, and its leaf nodes store the value of the primary key.
If you query by primary key, you only need to search the ID B+ tree.
If you query through a non-primary key index, you first search the k index tree (the secondary index) to find the corresponding primary key, and then search the ID index tree again. This second lookup is called going back to the table.
In summary, a query through a non-primary key index has to search one more index tree, which is relatively inefficient.
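
A minimal sketch of the two lookup paths, assuming a hypothetical table `t` with primary key `id` and a secondary index on column `k`. The names are chosen to match the ID and k trees mentioned above; the actual table behind the figure is not given in the text.

```sql
CREATE TABLE t (
  id   INT NOT NULL,
  k    INT NOT NULL,
  name VARCHAR(25) DEFAULT NULL,
  PRIMARY KEY (id),
  KEY k (k)
) ENGINE=InnoDB;

INSERT INTO t (id, k, name) VALUES (100, 1, 'a'), (200, 2, 'b'), (300, 3, 'c');

-- Primary key query: one search of the ID (clustered) tree returns the whole row.
SELECT * FROM t WHERE id = 200;

-- Secondary index query: search the k tree to get id = 300,
-- then search the ID tree again to fetch the row (back to the table).
SELECT * FROM t WHERE k = 3;

-- If the query only needs columns stored in the k tree (k and the primary key),
-- the second search can be skipped. EXPLAIN usually reports "Using index" here.
EXPLAIN SELECT id FROM t WHERE k = 3;
```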

Four: How is a joint index implemented at the bottom layer?

  • Table Structure
CREATE TABLE `city` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'city ID',
  `province_id` int(10) unsigned NOT NULL COMMENT 'province ID',
  `city_name` varchar(25) DEFAULT NULL COMMENT 'city name',
  `description` varchar(25) DEFAULT NULL COMMENT 'description',
  PRIMARY KEY (`id`),
  KEY `index_pro_ci_de` (`province_id`,`city_name`,`description`) USING BTREE COMMENT 'joint index'
) ENGINE=InnoDB AUTO_INCREMENT=1850386 DEFAULT CHARSET=utf8;

(Figure: joint index)
A joint index is sorted by its columns in the order they are listed, so the column order determines the priority of each column in the index. Take surname and first name as an example: first find the matching surname, then find the person by first name within that surname, and the joint index is effective. If a query supplies only the first name and no surname, the index cannot be used to narrow the search and the joint index is ineffective. This is why the column order must be chosen carefully when building a joint index.
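
This leftmost-prefix behaviour can be checked against the `city` table defined above with `EXPLAIN`. It is only a sketch: the literal values are made up, and the exact plan depends on the MySQL version and on the data in the table.

```sql
-- Leading column present: the index can be used to seek on province_id.
EXPLAIN SELECT * FROM city WHERE province_id = 1;

-- First two columns present: the seek narrows on province_id, then on city_name.
EXPLAIN SELECT * FROM city WHERE province_id = 1 AND city_name = 'Hangzhou';

-- Leading column missing: city_name alone is not a prefix of
-- (province_id, city_name, description), so the index cannot be used for a seek.
EXPLAIN SELECT * FROM city WHERE city_name = 'Hangzhou';
```

In the first two plans the `key` column should show `index_pro_ci_de`, with a larger `key_len` for the second. For the third, MySQL cannot seek in the index. Because this particular index happens to contain every column of the table, the optimizer may still read the whole index rather than the table, but it has to examine every entry.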

Five: Why is an auto-increment integer recommended as the MySQL primary key

  • Why an integer is recommended as MySQL's primary key
    The advantages of an integer primary key are as follows:
    1: An integer takes up less space, so each index page can hold more keys.
    2: Comparing two integers is more efficient than comparing two UUIDs.
    The benefit of making the integer primary key auto-increment:
    If a UUID is used as the primary key, new keys land in the middle of the B+ tree, which causes pages to split and entries to be rearranged.
    An auto-increment integer does not have this problem, because each new key is simply appended at the end of the B+ tree.
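
A rough way to see the size difference is to put the two kinds of key side by side. The table names below are hypothetical, and storing the UUID as `CHAR(36)` text is just one common choice assumed for this sketch.

```sql
-- Auto-increment integer key: 8 bytes per key, new keys go at the end of the tree.
CREATE TABLE pk_demo_auto (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  payload VARCHAR(25) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- UUID key stored as text: 36 characters per key.
CREATE TABLE pk_demo_uuid (
  id      CHAR(36) NOT NULL,
  payload VARCHAR(25) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

INSERT INTO pk_demo_auto (payload) VALUES ('a'), ('b'), ('c');
INSERT INTO pk_demo_uuid (id, payload) VALUES (UUID(), 'a'), (UUID(), 'b'), (UUID(), 'c');

-- Length of one textual UUID in characters.
SELECT CHAR_LENGTH(UUID());   -- 36

-- New integer keys are handed out in ascending order.
SELECT id FROM pk_demo_auto ORDER BY id;
```

Because every InnoDB secondary index also stores the primary key in its leaf nodes, a wider primary key makes every secondary index larger as well.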

Origin blog.csdn.net/qq_37469055/article/details/105780516