[MySQL] B+ tree index - index scheme in InnoDB; comparison of the index schemes in MyISAM and InnoDB

1. Index scheme in InnoDB

1. Clustered index

Clustered indexes have two characteristics:

  1. Records and pages are sorted by primary key value, which has three implications.

    (1) The records within a page (leaf node or internal node) are arranged in a singly linked list in ascending primary key order. The records in a page are divided into groups, and the in-page offset of the record with the largest primary key value in each group is stored as a slot, in order, in the page directory (the Supremum record is, of course, larger than any user record). We can then binary-search the page directory to quickly locate the record whose primary key equals a given value.
    (2) The pages that store user records are linked into a doubly linked list, ordered by the primary key values of the user records they contain.
    (3) The pages that store directory entry records are divided into levels, and the pages within one level are also linked into a doubly linked list, ordered by the primary key values of the directory entry records they contain.

  2. The leaf nodes of the B+ tree store complete user records.
    A complete user record is one that stores the values of all columns (including hidden columns).

We call a B+ tree with these two characteristics a clustered index; all complete user records are stored in its leaf nodes. A clustered index does not have to be created explicitly with an INDEX clause in a SQL statement (index-related statements are introduced later): the InnoDB storage engine creates it for us automatically. Another interesting point is that in InnoDB the clustered index is itself the way the data is stored (all user records live in its leaf nodes), which is the meaning of the saying "the index is the data, and the data is the index".
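
The in-page lookup described in point (1) can be sketched in a few lines of Python. This is only a simplified model, not InnoDB's actual page format: a sorted list stands in for the singly linked list, list positions stand in for byte offsets, the Infimum and Supremum records are left out, and the names `build_page`, `find_in_page` and the fixed `group_size` are assumptions made for illustration.

```python
def build_page(keys, group_size=4):
    """Return (records, slots) for one simplified page.

    records: primary keys in ascending order (stands in for the linked list).
    slots:   position of the largest record of each group (the page directory).
    """
    records = sorted(keys)
    slots = [min(start + group_size, len(records)) - 1
             for start in range(0, len(records), group_size)]
    return records, slots


def find_in_page(records, slots, key):
    """Binary-search the slots, then walk the single group that may hold key."""
    lo, hi = 0, len(slots)
    while lo < hi:
        mid = (lo + hi) // 2
        if records[slots[mid]] < key:   # largest key of this group is too small
            lo = mid + 1
        else:
            hi = mid
    if lo == len(slots):
        return None                     # key is larger than every record in the page
    start = slots[lo - 1] + 1 if lo > 0 else 0
    for i in range(start, slots[lo] + 1):   # short linear walk inside one group
        if records[i] == key:
            return i
    return None
```

With the ten keys 1, 3, 5, ..., 19 and a group size of 4, the slots are the positions 3, 7 and 9; searching for 13 compares against two slots and then walks only the second group.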

2. Secondary Index

1) Secondary index (or, auxiliary index)

A B+ tree sorted by the values of a non-primary-key column requires a back-to-table lookup to locate the complete user record. Such a B+ tree is called a secondary index or auxiliary index.

2) Index column

Because this B+ tree is sorted by the values of column c2, we say it is the index built on column c2, and we call c2 the index column.

3) Back to the table

  1. Back to the table: the process of taking the primary key value found in the secondary index and using it to look up the complete user record in the clustered index.

  2. Why is a back-to-table step needed at all? Wouldn't it be simpler to put the complete user records in the leaf nodes? That would indeed remove the back-to-table step, but it takes up too much space: it amounts to copying every user record each time a B+ tree is built, which wastes storage.

  3. Application: any B+ tree sorted by a non-primary-key column needs the back-to-table step to locate the complete user record.
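
The two-step lookup can be sketched as follows. This is a toy model under stated assumptions, not InnoDB code: the table layout (primary key c1, ordinary columns c2 and c3), the sample rows, and the function names are all hypothetical, and sorted Python lists stand in for the two B+ trees.

```python
from bisect import bisect_left

# Hypothetical rows (c1, c2, c3); c1 is assumed to be the primary key.
rows = [(1, 4, "u"), (3, 9, "d"), (5, 3, "y"), (9, 4, "a")]

# Clustered index: sorted by primary key, leaves hold complete records.
clustered = sorted(rows, key=lambda r: r[0])
clustered_keys = [r[0] for r in clustered]

# Secondary index on c2: leaves hold only (c2, primary key).
secondary_c2 = sorted((r[1], r[0]) for r in rows)


def lookup_by_pk(pk):
    """One search of the clustered index returns the complete record."""
    i = bisect_left(clustered_keys, pk)
    if i < len(clustered_keys) and clustered_keys[i] == pk:
        return clustered[i]
    return None


def lookup_by_c2(value):
    """Search the secondary index, then go back to the table for each match."""
    result = []
    i = bisect_left(secondary_c2, (value,))   # first entry with c2 >= value
    while i < len(secondary_c2) and secondary_c2[i][0] == value:
        pk = secondary_c2[i][1]
        result.append(lookup_by_pk(pk))       # the back-to-table step
        i += 1
    return result
```

Calling `lookup_by_c2(4)` finds two entries in the secondary index and performs two clustered-index searches, one per primary key, which is the extra cost the back-to-table step introduces.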

4) Similarities and differences between secondary index and clustered index

Secondary index records and clustered index records use the same row format, but a secondary index record stores fewer columns than a clustered index record.
The records in the leaf nodes of both clustered and secondary indexes are called user records. To distinguish them:

  • The records in the leaf nodes of the clustered index are called complete user records.
    • Complete user record: a record that stores the values of all columns (including hidden columns).

  • The records in the leaf nodes of a secondary index are called incomplete user records.

3. Joint index

A B+ tree sorted by the values of columns c2 and c3 together is called a joint index, also known as a composite index or multi-column index. It is essentially a secondary index whose index columns are c2 and c3.

Note: "creating a joint index sorted by columns c2 and c3" and "creating an index for each of columns c2 and c3" mean different things. The differences are as follows.

  • Building a joint index creates only one B+ tree, as shown in Figure 6-15.
  • Creating separate indexes for columns c2 and c3 builds two B+ trees, one sorted by c2 and the other sorted by c3.
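
The difference between one joint index and two separate indexes can be made concrete with a small sketch. The sample rows, the assumption that c1 is the primary key, and the helper `find_pks` are hypothetical; sorted lists of tuples stand in for the B+ trees.

```python
from bisect import bisect_left

# Hypothetical rows (c1, c2, c3); c1 is assumed to be the primary key.
rows = [(1, 4, "u"), (3, 9, "d"), (5, 3, "y"), (9, 4, "a")]

# One joint index on (c2, c3): a single sorted structure.
# Entries are ordered by c2 first, then by c3 among equal c2 values;
# the primary key is carried along for the back-to-table step.
joint_c2_c3 = sorted((c2, c3, c1) for c1, c2, c3 in rows)

# Two separate indexes: two independent sorted structures.
index_c2 = sorted((c2, c1) for c1, c2, _ in rows)
index_c3 = sorted((c3, c1) for c1, _, c3 in rows)


def find_pks(c2, c3):
    """Primary keys of rows matching both columns, using only the joint index."""
    i = bisect_left(joint_c2_c3, (c2, c3))
    pks = []
    while i < len(joint_c2_c3) and joint_c2_c3[i][:2] == (c2, c3):
        pks.append(joint_c2_c3[i][2])
        i += 1
    return pks
```

In the joint index the two rows with c2 = 4 sit next to each other, ordered by c3, so a condition on both columns is answered by one search. With two separate indexes neither structure alone knows the combined order.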

4. Precautions for B+ tree index in InnoDB

1) The root page never moves

Once a B+ tree index has been created, its root node never moves again (that is, its page number never changes).

2) Uniqueness of directory entry records in internal nodes

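A directory entry record in an internal node of a secondary index actually consists of three parts:

 - the value of the index column
 - the primary key value
 - the page number.

	Secondary index records are sorted first by the value of the secondary index column,
	and then by primary key value when the secondary index column values are equal.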
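
A brief sketch of why the primary key matters here: if many records share the same index column value, directory entries made of only (index column value, page number) could be identical in their sort key, and an insert could not tell which child page to descend into. The directory contents, page numbers and function name below are hypothetical.

```python
from bisect import bisect_right

# Each directory entry: (smallest c2 in the child page,
#                        primary key of that record,
#                        child page number).
# All three child pages start with c2 = 1, so only the primary key
# makes the entries distinguishable.
directory = [(1, 1, 4), (1, 5, 5), (1, 9, 6)]


def choose_child_page(c2, pk):
    """Return the page number of the child page that should hold (c2, pk)."""
    keys = [(entry[0], entry[1]) for entry in directory]
    i = bisect_right(keys, (c2, pk)) - 1   # last entry whose key <= (c2, pk)
    return directory[max(i, 0)][2]
```

A new record with c2 = 1 and primary key 7 sorts after the entry (1, 5) and before (1, 9), so it goes to page 5. Without the primary key, all three entries would compare equal.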

3) A page contains at least 2 records

This prevents the B+ tree from growing too many levels.

2. Comparison of the index scheme in MyISAM and the index scheme in InnoDB

1) Structure

  • In InnoDB, "the index is the data": the leaf nodes of the clustered index B+ tree already contain
    all complete user records.
  • In MyISAM, "the index is the index, and the data is the data". The MyISAM index scheme also uses a tree structure, but it stores the index and the data separately.
    • The records of a table are stored in a separate file (called the data file) in insertion order. This file is not divided into data pages; records are simply appended to it one after another. As a result, a record can be accessed quickly by its row number.
    • A table using the MyISAM storage engine stores its index information in another file (called the index file). MyISAM builds a separate index for the table's primary key, but the leaf nodes of that index store the combination of primary key value and row number rather than the complete user record. That is, the index is first used to find the row number, and the row number is then used to find the record.

2) Query method

  • In the InnoDB storage engine, a single search of the clustered index by primary key value is enough to find the record.
  • In MyISAM, however, a back-to-table step is always required, which means that every index built in MyISAM is effectively a secondary index.
    • If necessary, we can also create separate indexes or joint indexes on other columns. The principle is similar to indexes in InnoDB, except that the leaf nodes store the corresponding column values plus the row number. These indexes are all secondary indexes as well.
    • MyISAM row formats include the fixed-length record format (Static), the variable-length record format (Dynamic), the compressed record format (Compressed), and others.
      • In the fixed-length record format, every record occupies the same amount of storage, so the address offset of a record in the data file can easily be calculated from its row number.
      • In the variable-length record format, MyISAM stores the record's address offset in the data file directly in the index leaf node. This shows that the back-to-table step in MyISAM is very fast, because it fetches the data straight from the file at a known offset, whereas InnoDB has to search the clustered index after obtaining the primary key. That is not slow, but it is still not as fast as direct access by address.
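
The fixed-length case can be sketched like this. It is a simplified model, not MyISAM's on-disk format: the record layout (two 4-byte integers and a 4-byte string), the sample rows and the function names are assumptions for illustration, and a `bytes` object stands in for the data file.

```python
import struct
from bisect import bisect_left

# Hypothetical fixed-length row: c1 (int), c2 (int), c3 (4 bytes) = 12 bytes.
RECORD = struct.Struct("<ii4s")

# Data file: records appended in insertion order, not divided into pages.
rows = [(20, 7, b"aaaa"), (5, 2, b"bbbb"), (12, 9, b"cccc")]
data_file = b"".join(RECORD.pack(*r) for r in rows)

# Primary key index: sorted (primary key, row number) pairs stand in for
# the leaf nodes of the index tree.
pk_index = sorted((r[0], n) for n, r in enumerate(rows))


def read_row(row_number):
    """Fixed-length records: offset is simply row number * record size."""
    offset = row_number * RECORD.size
    return RECORD.unpack_from(data_file, offset)


def lookup(pk):
    """Index search first, then direct access to the data file."""
    i = bisect_left(pk_index, (pk,))
    if i < len(pk_index) and pk_index[i][0] == pk:
        return read_row(pk_index[i][1])
    return None
```

The second step is a single multiplication and a read at the resulting offset, which is why the back-to-table step is cheap here. For the variable-length format, the index entry would hold the offset itself instead of a row number.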

Just taking notes; this summary is excerpted from "How MySQL Works".

Origin blog.csdn.net/xiaoyue_/article/details/130215727