Introduction to MySQL indexes and their underlying data structure, the B+ tree

1. Index overview

1.1 Index

  • An index is a data structure that helps MySQL retrieve data quickly;
  • Creating an index can effectively reduce the number of disk I/O operations;

1.2 Advantages and disadvantages of indexes

1.2.1 Advantages

  • Indexes reduce the number of disk I/O operations and make data retrieval more efficient;
  • A unique index guarantees the uniqueness of each row in the table;
  • Indexes speed up joins between tables, which also helps with referential integrity: queries that join a dependent child table with its parent table run faster;
  • When a query uses grouping or sorting clauses, an index can significantly reduce the time spent on grouping and sorting, and therefore reduce CPU consumption;

1.2.2 Disadvantages

  • Creating and maintaining indexes takes time;
  • Indexes take up disk space;
  • Indexes make updates to the table slower, because every insert, update, and delete must also maintain the indexes;

2. Indexes in InnoDB

2.1 Primary key index design scheme

  • Data page: stores the real data; the records in the page are arranged in order of primary key;
  • Directory entry page: one directory entry is created for each data page, and a page made up of these directory entry records is a directory entry page;
  • Higher-level directory entry page: a directory entry page whose entries point to other directory entry pages, so it covers a wider range of keys;

Note: Each page builds a page directory based on the primary key, so a binary search can be performed at each level in turn to improve search efficiency;
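
The lookup described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the page numbers, keys, and the function name `find_row` are invented, and each page is modeled as a plain sorted list rather than InnoDB's real page format.

```python
from bisect import bisect_right

# Hypothetical data pages: page number -> records sorted by primary key.
data_pages = {
    10: [(1, "a"), (3, "b"), (5, "c")],
    28: [(8, "d"), (12, "e"), (20, "f")],
    9: [(35, "g"), (40, "h"), (50, "i")],
}

# Directory entries: (smallest key in the page, page number), sorted by key.
directory = [(1, 10), (8, 28), (35, 9)]


def find_row(key):
    """Locate a row by primary key using two binary searches."""
    # Step 1: find the last directory entry whose smallest key is <= key.
    min_keys = [min_key for min_key, _ in directory]
    slot = bisect_right(min_keys, key) - 1
    if slot < 0:
        return None  # key is smaller than every stored key
    page_no = directory[slot][1]

    # Step 2: binary search inside the chosen data page.
    page = data_pages[page_no]
    keys = [k for k, _ in page]
    pos = bisect_right(keys, key) - 1
    if pos >= 0 and keys[pos] == key:
        return page[pos][1]
    return None
```

Calling `find_row(12)` first picks page 28 from the directory and then finds the record inside that page; a key that is not stored, such as 13, returns `None`.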

2.2 Index underlying data structure

  • In the InnoDB storage engine, indexes are implemented with a B+ tree;
  • A B+ tree is a multi-way balanced search tree. Its low height greatly reduces the number of disk I/O operations and improves search efficiency;
  • In this structure, only leaf nodes are real data pages, and the default data page size is 16KB; non-leaf nodes are directory entry pages, which store the smallest primary key of each child page together with that page's page number;
  • The leaf nodes are connected to each other by a doubly linked list, and the records within a data page are connected by a singly linked list;
  • The leaf nodes are level 0, and the level number increases from bottom to top;
  • Each data page has its own page directory, so binary search can be used to locate a record quickly within the page;
  • In real projects, the height of a B+ tree generally does not exceed 4 levels, because a tree of height 4 can already store a very large amount of data;
  • There are three points to note about how a B+ tree is built:
    1) The root page never moves once it is created; page copying and page splitting keep it in place as the tree grows;
    2) Directory entries in internal nodes must be unique (for a secondary index, this is why the primary key value is also part of each directory entry);
    3) A page stores at least two records;
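
The claim above that four levels are already enough can be checked with rough arithmetic. The sketch below assumes the 16KB page size from the text; the record size of about 1KB and the directory entry size of 14 bytes are illustrative assumptions, and page headers and other overhead are ignored, so the results are upper-bound estimates only.

```python
PAGE_SIZE = 16 * 1024  # default InnoDB data page size, as stated in the text
ROW_SIZE = 1024        # assumption: about 1 KB per user record
ENTRY_SIZE = 14        # assumption: bytes per directory entry (key + page number)


def max_rows(height):
    """Rough upper bound on the rows held by a B+ tree with `height` levels."""
    rows_per_leaf = PAGE_SIZE // ROW_SIZE
    fanout = PAGE_SIZE // ENTRY_SIZE
    # height - 1 levels of directory entry pages sit above one level of leaves.
    return fanout ** (height - 1) * rows_per_leaf


for h in range(1, 5):
    print(h, max_rows(h))
```

Under these assumptions a 3-level tree holds about 21.9 million rows and a 4-level tree about 25.6 billion, which is why trees taller than 4 levels are rarely needed.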

2.3 Common indexes

  • Common indexes can be divided into clustered indexes and non-clustered indexes;
  • The clustered index generally refers to the primary key index. If the table has no primary key, a unique index whose columns are all NOT NULL is used instead. If there is no such unique index either, InnoDB implicitly defines a hidden primary key and builds the clustered index on it;
  • Non-clustered indexes are also known as secondary indexes or auxiliary indexes;
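
The selection rule above can be written as a small decision function. This is a sketch of the rule only, not of any MySQL API: the function name, the argument shapes, and the returned labels are all invented for illustration.

```python
def choose_clustered_key(primary_key, unique_indexes):
    """Describe which key the clustered index would be built on.

    primary_key: list of column names, or None if the table has none.
    unique_indexes: list of (columns, all_not_null) pairs in definition order.
    """
    if primary_key:
        return ("primary key", primary_key)
    for columns, all_not_null in unique_indexes:
        if all_not_null:
            # First unique index whose columns are all NOT NULL.
            return ("unique not-null index", columns)
    # No usable key: a hidden row id is used instead.
    return ("hidden row id", [])
```

For example, a table with no primary key but a unique index on a NOT NULL `email` column would be clustered on `email`.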

2.3.1 Clustered index

  • A clustered index generally refers to the index built on the primary key;
  • The complete user records are stored in the leaf nodes;
  • InnoDB creates the clustered index automatically;
  • The records in each data page are sorted by primary key;
  • For a clustered index, "the data is the index, and the index is the data";
  • InnoDB supports clustered indexes, while MyISAM does not;
  • Data can be physically stored in only one sort order, so each table has exactly one clustered index, which is generally the primary key index;
  • To make full use of the clustering property, tables generally use an ordered, increasing id as the primary key;

2.3.1.1 Advantages

  • Data access is faster: unlike a lookup through a secondary index, there is no need to go back to the table;
  • Sorted lookups and range lookups on the primary key are very fast;
  • The clustering property reduces the number of disk I/O operations;

2.3.1.2 Disadvantages

  • Insertion speed depends heavily on insertion order;
  • Updating a primary key is expensive;
  • Looking up data through a secondary index requires two index lookups, because it involves a back-to-table operation;
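
Why insertion order matters can be seen in a toy model. The sketch below is a deliberate simplification and not InnoDB's actual algorithm: it assumes a tiny page capacity, assumes that a key larger than everything stored simply opens a fresh page, and assumes that a key landing in the middle of a full page splits that page in half. The names `PAGE_CAPACITY` and `insert_all` are invented for illustration.

```python
import random

PAGE_CAPACITY = 4  # hypothetical: records per page, tiny for illustration


def insert_all(keys):
    """Insert keys into sorted pages; return how many records splits had to move."""
    pages = [[]]  # pages kept in key order; each page is a sorted list
    moved = 0
    for key in keys:
        # Pick the first page whose largest key is >= key, else the last page.
        idx = len(pages) - 1
        for i, page in enumerate(pages):
            if page and key <= page[-1]:
                idx = i
                break
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            page.append(key)
            page.sort()
            continue
        if idx == len(pages) - 1 and key > page[-1]:
            # Appending past the end: open a fresh page, nothing moves.
            pages.append([key])
            continue
        # Full page and the key belongs inside it: split the page in half.
        half = PAGE_CAPACITY // 2
        right = page[half:]
        del page[half:]
        moved += len(right)
        pages.insert(idx + 1, right)
        target = page if key <= page[-1] else right
        target.append(key)
        target.sort()
    return moved


sequential = list(range(1, 101))
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)
print("records moved, sequential keys:", insert_all(sequential))
print("records moved, shuffled keys:  ", insert_all(shuffled))
```

In this model, keys inserted in increasing order never move an existing record, while out-of-order keys keep splitting pages and relocating records, which is the motivation for choosing an increasing id as the primary key.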

2.3.2 Non-clustered index

  • A non-clustered index is an index built on non-primary-key columns;
  • Its non-leaf nodes are structured much like those of the clustered index. The difference is in the leaf nodes, which store only the values of the indexed columns and the primary key value of the corresponding record;
  • The records in the data pages of a non-clustered index are sorted by the indexed columns;
  • When looking up data through a non-clustered index, MySQL first finds the primary key value that corresponds to the column value, and then fetches the complete record from the clustered index. This second step is called a back-to-table operation;
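
The two-step lookup can be sketched as follows. The sample rows, the `name` column, and the function name `find_by_name` are invented for illustration, and a Python dict stands in for the clustered index B+ tree.

```python
from bisect import bisect_left

# Clustered index stand-in: primary key -> complete row (hypothetical data).
clustered = {
    1: {"id": 1, "name": "dave", "age": 31},
    2: {"id": 2, "name": "alice", "age": 25},
    3: {"id": 3, "name": "carol", "age": 40},
    4: {"id": 4, "name": "alice", "age": 52},
}

# Secondary index on `name`: (name, primary key) pairs, sorted by name then key.
name_index = sorted((row["name"], pk) for pk, row in clustered.items())


def find_by_name(name):
    """Return full rows for a name: secondary index first, then back to the table."""
    rows = []
    # Step 1: binary search the secondary index for the first matching entry.
    pos = bisect_left(name_index, (name,))
    while pos < len(name_index) and name_index[pos][0] == name:
        pk = name_index[pos][1]
        # Step 2: use the primary key to fetch the complete record.
        rows.append(clustered[pk])
        pos += 1
    return rows
```

A query that needs only `name` and `id` could stop after step 1, since both values are already present in the secondary index entries.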

2.3.3 Joint index

  • A joint index is, in essence, also a secondary index;
  • A secondary index built on a combination of several non-primary-key columns is called a joint index;
  • The records in its data pages are sorted by the indexed columns in order: by the first column, then by the second column among records with equal first values, and so on;
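
The multi-column ordering behaves like sorting tuples. The sketch below assumes a hypothetical joint index on `(last_name, first_name)` with the primary key appended to each entry; the data and the function name `by_last_name` are invented for illustration.

```python
from bisect import bisect_left

# Hypothetical joint index entries: (last_name, first_name, primary key).
entries = sorted([
    ("smith", "zoe", 7),
    ("jones", "amy", 3),
    ("smith", "bob", 5),
    ("jones", "tom", 9),
    ("adams", "kim", 1),
])


def by_last_name(last_name):
    """Range scan on the leading column using binary search."""
    pos = bisect_left(entries, (last_name,))
    matches = []
    while pos < len(entries) and entries[pos][0] == last_name:
        matches.append(entries[pos])
        pos += 1
    return matches


# The second column is ordered only within equal values of the first column.
first_names = [first for _, first, _ in entries]
print(first_names == sorted(first_names))
```

Because `first_name` is sorted only within each `last_name`, a search on `first_name` alone cannot use binary search over this index, whereas a search on `last_name`, or on both columns together, can.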

3. Indexes in MyISAM

  • The MyISAM storage engine also uses a B+ tree to implement indexes by default;
  • MyISAM does not support clustered indexes; all of its indexes can be understood as secondary indexes;
  • In MyISAM, each record in an index leaf node stores the address of the corresponding data record;
  • MyISAM stores index files separately from data files (the .MYI index file and the .MYD data file);
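
The idea of a leaf entry holding an address rather than the row itself can be sketched like this. The fixed-length record layout, the sample rows, and the function name `lookup` are assumptions for illustration and do not reflect MyISAM's real file formats.

```python
import struct
from bisect import bisect_left

# Hypothetical fixed-length record: 4-byte id followed by an 8-byte name.
RECORD = struct.Struct("<I8s")

# Stand-in for the data file: records stored in insertion order, not key order.
rows = [(30, b"carol"), (10, b"alice"), (40, b"dave"), (20, b"bob")]
data_file = b"".join(RECORD.pack(row_id, name) for row_id, name in rows)

# Stand-in for the index file: (key, byte offset in the data file), sorted by key.
index_file = sorted(
    (row_id, pos * RECORD.size) for pos, (row_id, _) in enumerate(rows)
)


def lookup(row_id):
    """Find the address in the index, then read the record from the data file."""
    pos = bisect_left(index_file, (row_id,))
    if pos == len(index_file) or index_file[pos][0] != row_id:
        return None
    offset = index_file[pos][1]
    _, name = RECORD.unpack_from(data_file, offset)
    return name.rstrip(b"\x00").decode()
```

Here every index lookup ends with a read from the separate data file, which is why all MyISAM indexes behave like secondary indexes.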

3.1 Comparison between MyISAM and InnoDB

[Figure: comparison of MyISAM and InnoDB]

Origin blog.csdn.net/qq_43665602/article/details/131543786