[MySQL Series]-Do you really understand table return and covering index?

Interviews often include conceptual questions like these. They rarely come up directly in day-to-day development, but you need to know them to show the depth of your knowledge. I ran into the same questions recently while preparing for the MySQL certification exam, so I have collected the relevant MySQL concepts in this post.

1. MySQL index structure

1.1 The concept of an index

MySQL's official definition is: an index is a data structure that helps MySQL retrieve data efficiently. In essence, an index is a data structure; it can be understood simply as "a pre-sorted data structure that supports fast lookups". These data structures point to the data in some way, which makes efficient search algorithms possible.

1.2 Characteristics of indexes

  1. An index is a sorted data structure, which is what lets it speed up database retrieval.
  2. Indexes slow down write operations such as INSERT, UPDATE, and DELETE, because every index on the table must be updated along with the data.
  3. MySQL indexes can only be created on tables, not views.
  4. When the query processor executes an SQL statement, it generally uses only one index per table, although the index merge optimization can combine several.

1.3 Advantages of indexing

  1. Improve data retrieval efficiency and reduce database I/O costs.
  2. A unique index guarantees that each row in the table is unique.
  3. Speed up joins between tables.
  4. For queries with grouping and sorting clauses (GROUP BY, ORDER BY), indexes can significantly reduce the time spent grouping and sorting.

1.4 Disadvantages of indexes

  1. Creating and maintaining indexes takes time, and this time increases with the amount of data.
  2. Indexes occupy physical space. Besides the space taken by the table data, each index takes additional space, and a clustered index requires even more.
  3. When data in the table is inserted, deleted, or modified, the indexes must be maintained as well, which slows these write operations down.

2. B-Tree and B+Tree

2.1 B-Tree

A B-Tree (B-tree) is a self-balancing tree that keeps data in sorted order, so searches, sequential access, insertions, and deletions all complete in logarithmic time. Put simply, a B-tree is a generalization of the binary search tree in which a node can have more than two children. Unlike self-balancing binary search trees, B-trees are optimized for systems that read and write large blocks of data: because each node holds many keys, locating a record passes through fewer intermediate nodes, which speeds up access. This makes the B-tree well suited to data kept on external storage such as disks.

2.2 B+Tree

B+Tree is an optimization of B-Tree. Non-leaf nodes store only key values, not row data, so more keys and pointers fit into the limited space of a node (a page). All data is stored in the leaf nodes, and the leaf nodes are linked together by pointers (in InnoDB, a doubly linked list), which makes range queries and sorting convenient.

2.3 Differences between B-Tree and B+Tree

  1. In B-Tree, all nodes will have pointers to specific records; in B+Tree, only leaf nodes will have pointers to specific records.
  2. Leaf nodes in a B-Tree are not linked to each other; all leaf nodes in a B+Tree are linked together by pointers.
  3. In a B-Tree, the pointer to a record may be found at a non-leaf node, so lookup cost varies from key to key; in a B+Tree, the pointer to a record is always found at a leaf node, so every lookup traverses the same number of levels and the cost is stable.

In B+Tree, since non-leaf nodes do not have pointers to specific records, more index items can be stored in non-leaf nodes, which can effectively reduce the height of the tree and improve search efficiency.
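
To put numbers on this, the sketch below estimates how many rows a B+Tree can hold at a given height. The figures are assumptions for illustration only: a 16 KB page, non-leaf entries made of an 8-byte BIGINT key plus a 6-byte child pointer, and about 16 rows of roughly 1 KB per leaf page. Real pages also spend space on headers and free space. The function name bplus_tree_capacity is hypothetical.

```python
def bplus_tree_capacity(page_bytes, entry_bytes, rows_per_leaf, height):
    """Rough upper bound on the rows a B+Tree of the given height can hold.

    Non-leaf pages hold only (key, child pointer) entries, so the fan-out
    is page_bytes // entry_bytes; only the leaf level stores rows.
    """
    fanout = page_bytes // entry_bytes
    return fanout ** (height - 1) * rows_per_leaf


# Assumed figures: 16 KB page, 8-byte key + 6-byte pointer, 16 rows per leaf.
print(bplus_tree_capacity(16 * 1024, 8 + 6, 16, 3))  # 21902400
```

Under these assumptions, about 21.9 million rows fit in three levels, so a primary key lookup reads only about three pages. If non-leaf entries also carried row data, as in a B-Tree, entry_bytes would grow and the fan-out would shrink.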

In a B+Tree, the leaf nodes are linked by pointers, so a range scan is easy: find the first matching leaf, then follow the links. In a B-Tree, a range scan has to keep moving back and forth between leaf and non-leaf nodes.
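
A toy model of the leaf level shows why the links matter. This sketch does not use InnoDB's real page format. Leaves hold sorted keys and a next pointer, and a flat separators list of each leaf's first key stands in for the non-leaf levels. Leaf, build_leaves, and range_scan are hypothetical names. One binary search finds the starting leaf, and after that the scan only follows next pointers.

```python
import bisect


class Leaf:
    """A leaf page: sorted keys, their rows, and a link to the next leaf."""

    def __init__(self, keys, rows):
        self.keys = keys
        self.rows = rows
        self.next = None


def build_leaves(items, per_leaf):
    """Pack sorted (key, row) pairs into linked leaf pages."""
    leaves = []
    for i in range(0, len(items), per_leaf):
        chunk = items[i:i + per_leaf]
        leaves.append(Leaf([k for k, _ in chunk], [r for _, r in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    # stand-in for the non-leaf level: the first key of each leaf
    separators = [leaf.keys[0] for leaf in leaves]
    return leaves, separators


def range_scan(leaves, separators, low, high):
    """Return rows with low <= key <= high.

    One descent (a binary search over separators) finds the starting leaf;
    after that only the next pointers are followed.
    """
    leaf = leaves[max(bisect.bisect_right(separators, low) - 1, 0)]
    result = []
    while leaf is not None:
        for key, row in zip(leaf.keys, leaf.rows):
            if key > high:
                return result
            if key >= low:
                result.append(row)
        leaf = leaf.next
    return result


items = [(k, f"row{k}") for k in range(1, 21)]
leaves, seps = build_leaves(items, per_leaf=4)
print(range_scan(leaves, seps, 6, 11))  # ['row6', 'row7', 'row8', 'row9', 'row10', 'row11']
```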

2.4 Why should InnoDB primary keys be inserted in order?

The primary key index in InnoDB is a clustered index: all row data is stored in the leaf nodes of its B+Tree. If primary key values arrive in random order, each new row lands at a random position among the leaf nodes. When the target leaf page is already full, InnoDB must split the page and move part of its records to a new page, and these page splits degrade performance. If primary key values are increasing, each new row is appended after the last record of the rightmost leaf, and pages are filled one after another in order, so mid-page splits do not occur. That is why auto-increment primary keys perform better in storage engines such as InnoDB that use B+Tree indexes.
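
The effect of insertion order can be seen in a small simulation. This is a simplified assumption, not InnoDB's actual split algorithm. Leaf pages hold at most page_capacity keys. An insert past the last key of the rightmost page simply opens a new page. An insert into any other full page splits that page in half. The helper count_page_splits is hypothetical.

```python
import bisect
import random


def count_page_splits(keys, page_capacity=4):
    """Toy model of clustered-index leaf pages.

    Returns (mid_splits, page_count). Inserting past the last key of the
    rightmost page opens a fresh page; inserting into any other full page
    splits it in half (a "mid split") and moves records to a new page.
    """
    pages = [[]]  # each page is a sorted list of primary keys
    mid_splits = 0
    for key in keys:
        # locate the page whose key range should contain this key
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        if len(page) < page_capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # sequential insert: no records are moved
        else:
            bisect.insort(page, key)  # out-of-order insert into a full page
            half = len(page) // 2
            pages.insert(idx + 1, page[half:])
            del page[half:]
            mid_splits += 1
    return mid_splits, len(pages)


print(count_page_splits(range(1, 101)))  # (0, 25)

shuffled = list(range(1, 101))
random.Random(0).shuffle(shuffled)
print(count_page_splits(shuffled))  # compare with the sequential case
```

Sequential keys never trigger a mid split and leave every page full. Out-of-order keys trigger splits and leave pages partly empty, so the same data typically needs more pages.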

3. Table return query

A table return query is one in which MySQL performs two lookups internally. It first uses a secondary index to find the primary key value of the matching row, and then uses that primary key to locate the full row record.

To understand table return, we must first look at how InnoDB implements indexes. InnoDB indexes fall into two categories: the clustered index and secondary indexes.

3.1 InnoDB clustered index

A clustered index is an index in which the index structure and data are stored together. The primary key index is a clustered index.

The leaf nodes of the InnoDB clustered index store the row records themselves, so every InnoDB table has exactly one clustered index, chosen as follows:

  1. If the table defines a primary key (PK), the PK is the clustered index;
  2. If the table does not define a PK, the first UNIQUE index whose columns are all NOT NULL becomes the clustered index;
  3. Otherwise, InnoDB generates a hidden clustered index on a hidden row ID column.

Because the leaf nodes hold the rows themselves, a lookup by PK reaches the row record directly, which makes PK-based queries very fast.
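
The selection rule above can be written as a small function. This is a toy sketch. The table description format (primary_key, unique_indexes, not_null_columns) is hypothetical, and unique_indexes is assumed to be listed in the order InnoDB considers them. GEN_CLUST_INDEX is the name InnoDB gives its hidden clustered index.

```python
def choose_clustered_index(primary_key, unique_indexes, not_null_columns):
    """Toy version of InnoDB's rule for picking the clustered index.

    primary_key: list of PK column names, or None if the table has no PK.
    unique_indexes: (index_name, [columns]) pairs, in the order considered.
    not_null_columns: set of column names declared NOT NULL.
    """
    if primary_key:
        return "PRIMARY"
    for name, columns in unique_indexes:
        # first UNIQUE index whose key columns are all NOT NULL
        if all(col in not_null_columns for col in columns):
            return name
    # otherwise InnoDB builds a hidden index on an internal row ID
    return "GEN_CLUST_INDEX"


print(choose_clustered_index(["id"], [], set()))  # PRIMARY
print(choose_clustered_index(None, [("uk_email", ["email"]), ("uk_phone", ["phone"])], {"phone"}))  # uk_phone
print(choose_clustered_index(None, [], set()))  # GEN_CLUST_INDEX
```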

3.2 InnoDB non-clustered index

A non-clustered index is an index whose structure is stored separately from the data. Secondary (auxiliary) indexes are non-clustered indexes.

The leaf nodes of a non-clustered index do not point directly at the row data. In InnoDB, the leaf nodes of a secondary index store the primary key value. To read other columns, MySQL must use that primary key to look the row up again in the clustered index, which is the table return.

3.3 InnoDB table return

Table return means first finding the matching primary key through a non-clustered index, and then finding the corresponding row through the primary key (clustered) index. The query therefore traverses two B+Trees.
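
A toy model makes the two lookups visible. A dict keyed by primary key stands in for the clustered index, and a sorted list of (name, primary key) pairs stands in for the leaf level of a secondary index. The user table and the function select_star_by_name are hypothetical. Each matching secondary index entry costs one extra lookup in the clustered index.

```python
import bisect

# Hypothetical table user(id PRIMARY KEY, name, age) with a secondary index on name.
clustered = {  # clustered index: primary key -> full row
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "alice", "age": 41},
}
# secondary index leaf entries: (indexed column, primary key), kept sorted
idx_name = sorted((row["name"], pk) for pk, row in clustered.items())


def select_star_by_name(name):
    """SELECT * FROM user WHERE name = ?  -> (rows, clustered-index lookups)."""
    start = bisect.bisect_left(idx_name, (name,))  # first B+Tree: secondary index
    rows, table_returns = [], 0
    for key, pk in idx_name[start:]:
        if key != name:
            break
        rows.append(clustered[pk])  # second B+Tree: table return by primary key
        table_returns += 1
    return rows, table_returns


rows, returns = select_star_by_name("alice")
print([r["id"] for r in rows], returns)  # [1, 3] 2
```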

4. Covering index

If a query can get every value it needs from one index without a second B+Tree lookup, no table return is needed. In other words, the index "covers" the query, and such an index is called a covering index.

Because a covering index reduces the number of B+Tree searches and improves query performance, it is a common optimization. The most common way to get one is to create a joint (composite) index that contains all the columns the query needs.
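
Extending the same kind of toy model to a joint index on (name, age) shows when the table return disappears. The table, INDEX_COLUMNS, and select_by_name are hypothetical. As in InnoDB, each secondary index entry also carries the primary key, so a query that asks only for id, name, and age can be answered from the index alone.

```python
import bisect

# Hypothetical table user(id PK, name, age, email) with a joint index idx_name_age(name, age).
clustered = {
    1: {"id": 1, "name": "alice", "age": 30, "email": "a@example.com"},
    2: {"id": 2, "name": "bob", "age": 25, "email": "b@example.com"},
    3: {"id": 3, "name": "alice", "age": 41, "email": "c@example.com"},
}
INDEX_COLUMNS = ("name", "age")
# secondary index entries also carry the primary key ("id")
idx_name_age = sorted((r["name"], r["age"], pk) for pk, r in clustered.items())


def select_by_name(name, columns):
    """Return (rows, table_returns) for SELECT <columns> FROM user WHERE name = ?."""
    covered = set(columns) <= set(INDEX_COLUMNS) | {"id"}
    start = bisect.bisect_left(idx_name_age, (name,))
    rows, table_returns = [], 0
    for n, age, pk in idx_name_age[start:]:
        if n != name:
            break
        if covered:
            entry = {"name": n, "age": age, "id": pk}  # answered from the index alone
        else:
            entry = clustered[pk]  # a needed column is missing: go back to the table
            table_returns += 1
        rows.append({c: entry[c] for c in columns})
    return rows, table_returns


print(select_by_name("alice", ["id", "age"]))    # covering: 0 table returns
print(select_by_name("alice", ["id", "email"]))  # not covering: 2 table returns
```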

Run EXPLAIN on the query; if the Extra column shows "Using index", a covering index is being used.

5. The leftmost prefix principle

The leftmost prefix principle describes when an index can speed up retrieval. The prefix can be the leftmost N columns of a joint index, or the leftmost M characters of a string index. In other words, a query can use a joint index only if the columns it filters on form a leftmost prefix of the index's columns. Put simply, index entries are sorted by their leftmost column first, so the index can only be searched in that order.
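
How a joint index is sorted explains why only a leftmost prefix works. This sketch keeps the entries of a hypothetical index on (col1, col2) as sorted tuples. Because col1 is globally sorted, an equality condition on col1 can use binary search. Because col2 is sorted only within each col1 group, a condition on col2 alone has to scan every entry. The function names are illustrative.

```python
import bisect

# Entries of a hypothetical joint index on (col1, col2), kept in B+Tree order.
entries = sorted([("a", 3), ("b", 1), ("a", 1), ("c", 2), ("b", 2), ("a", 2)])
col1_values = [e[0] for e in entries]  # already sorted, because tuples sort by col1 first


def seek_col1(value):
    """WHERE col1 = value: matches are contiguous, so binary search finds them."""
    lo = bisect.bisect_left(col1_values, value)
    hi = bisect.bisect_right(col1_values, value)
    return entries[lo:hi]


def scan_col2(value):
    """WHERE col2 = value: col2 is sorted only within each col1 group, so scan all."""
    return [e for e in entries if e[1] == value]


def is_sorted(values):
    return all(a <= b for a, b in zip(values, values[1:]))


print(seek_col1("b"))                      # [('b', 1), ('b', 2)]
print(scan_col2(2))                        # [('a', 2), ('b', 2), ('c', 2)]
print(is_sorted(col1_values))              # True
print(is_sorted([e[1] for e in entries]))  # False
```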

Summary of the leftmost prefix principle

  1. Suppose a joint index is built on (col1, col2, col3). Queries that filter on (col1), (col1, col2), or (col1, col2, col3) can use it.
  2. A commonly debated case is a filter on (col1, col3). Such a query can still use the index, but only col1 narrows the search because col2 is skipped; col3 can at most be checked against the index entries (for example through index condition pushdown, described in section 7).
  3. Changing the order of conditions in the WHERE clause does not affect whether the index is used, because the MySQL query optimizer rearranges them automatically.
  4. In the WHERE clause, matching stops at the first range condition (>, <, BETWEEN, LIKE) or at the first column that breaks the leftmost prefix from point 1. The range column itself still uses the index, but the columns after it do not.
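
The matching rules in points 1 to 4 can be summarized as a short function. It is a simplification: it ignores optimizations such as index condition pushdown and the skip scan access method (listed as skip_scan in the optimizer_switch output in section 7). usable_key_parts and the "eq"/"range" labels are hypothetical.

```python
def usable_key_parts(index_columns, conditions):
    """Which leading columns of a joint index can narrow the B+Tree search.

    conditions maps column -> "eq" or "range". Its order does not matter,
    mirroring how the optimizer ignores the order of WHERE terms.
    """
    used = []
    for col in index_columns:
        kind = conditions.get(col)
        if kind is None:
            break  # gap in the leftmost prefix: stop matching
        used.append(col)
        if kind == "range":
            break  # the range column is used, later columns are not
    return used


idx = ("col1", "col2", "col3")
print(usable_key_parts(idx, {"col1": "eq", "col3": "eq"}))                   # ['col1']
print(usable_key_parts(idx, {"col3": "eq", "col2": "eq", "col1": "eq"}))     # ['col1', 'col2', 'col3']
print(usable_key_parts(idx, {"col1": "eq", "col2": "range", "col3": "eq"}))  # ['col1', 'col2']
print(usable_key_parts(idx, {"col2": "eq"}))                                 # []
```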

6. Index failure

Even after an index is built, badly written SQL can prevent it from being used. The following scenarios cause the index to be skipped:

  1. If the conditions are combined with OR and one of them is on a column without an index, the index is generally not used for the other conditions either, since a full scan is needed anyway;
  2. A LIKE pattern starts with %, for example LIKE '%abc';
  3. If the column is a string type, the value in the condition must be quoted; otherwise the implicit type conversion prevents the index from being used;
  4. Performing a calculation or applying a function on an indexed column prevents the index from being used;
  5. The query violates the leftmost prefix principle;
  6. If MySQL estimates that a full table scan is faster than using the index, the index is not used;
  7. The rule that a B-tree index is not used for IS NULL but is used for IS NOT NULL, while a bitmap index is used for both, comes from Oracle, whose B-tree indexes do not store entries in which all indexed columns are NULL (MySQL has no bitmap indexes). In MySQL, InnoDB stores NULL values in its B+Tree indexes, so IS NULL can use an index much like an equality condition;
  8. The similar rules about IS NULL on joint indexes (that IS NULL uses the index only together with a condition on the first index column) also describe Oracle. In MySQL, IS NULL and IS NOT NULL on the columns of a joint index follow the same leftmost prefix and cost rules as other conditions.
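
Rule 2 follows from the same ordering argument. In the sketch below, names holds the sorted values of a hypothetical string index. LIKE 'al%' matches one contiguous run that binary search can find. LIKE '%e' matches values scattered throughout the sort order, so every entry must be checked. like_prefix and like_suffix are illustrative names.

```python
import bisect

# Sorted values of a hypothetical string index on the name column.
names = sorted(["alice", "alan", "bob", "carol", "albert", "dave"])


def like_prefix(prefix):
    """LIKE 'prefix%': matches form one contiguous run starting at the seek position."""
    lo = bisect.bisect_left(names, prefix)
    hi = lo
    while hi < len(names) and names[hi].startswith(prefix):
        hi += 1
    return names[lo:hi]


def like_suffix(suffix):
    """LIKE '%suffix': matches are scattered in sort order, so check every entry."""
    return [n for n in names if n.endswith(suffix)]


print(like_prefix("al"))  # ['alan', 'albert', 'alice']
print(like_suffix("e"))   # ['alice', 'dave']
```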

7. Index pushdown

Index condition pushdown (ICP), introduced in MySQL 5.6, optimizes table return queries. Without ICP, when a query uses a non-primary-key index (also called an ordinary index or secondary index), the storage engine locates rows through the index, reads the full rows by going back to the table, and returns them to the MySQL server, which then checks whether they meet the remaining conditions. With ICP, if part of the WHERE condition can be evaluated using only the indexed columns, the MySQL server pushes that part down to the storage engine. The storage engine checks the condition against the index entry first, and only reads the full row and returns it to the server for entries that pass.

  • Check the status of index pushdown
show VARIABLES like '%optimizer_switch%';
-------------------------------------------------------
optimizer_switch	index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on
  • Turn off index pushdown
# Index pushdown is a feature introduced in MySQL 5.6 to reduce table returns; versions before 5.6 do not support it
set optimizer_switch='index_condition_pushdown=off';
  • Enable index pushdown
set optimizer_switch='index_condition_pushdown=on';
  • Summary
    1. Index pushdown is an optimization introduced in MySQL 5.6 to reduce table returns; it is available in 5.6 and later and is not supported in earlier versions;
    2. Index pushdown only reduces the number of table returns; the number of index entries scanned stays the same.
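
The difference ICP makes can be counted in a toy model. Assume a hypothetical table user(id, name, age, email) with a joint index on (name, age) and the query SELECT * FROM user WHERE name LIKE 'al%' AND age = 30. Because LIKE is a range condition, only name narrows the index scan. Without ICP, every matching entry is returned to the table before age is checked. With ICP, the age test runs on the index entry first. The query function is hypothetical.

```python
import bisect

# Hypothetical table user(id PK, name, age, email) with a joint index on (name, age).
clustered = {
    1: {"id": 1, "name": "alan", "age": 30, "email": "a@example.com"},
    2: {"id": 2, "name": "albert", "age": 22, "email": "b@example.com"},
    3: {"id": 3, "name": "alice", "age": 30, "email": "c@example.com"},
    4: {"id": 4, "name": "bob", "age": 30, "email": "d@example.com"},
}
# secondary index entries: (name, age, primary key), sorted like B+Tree leaves
index_entries = sorted((r["name"], r["age"], pk) for pk, r in clustered.items())


def query(prefix, age, use_icp):
    """SELECT * FROM user WHERE name LIKE 'prefix%' AND age = ?

    Returns (rows, index_entries_scanned, table_returns).
    """
    rows, scanned, returns = [], 0, 0
    i = bisect.bisect_left(index_entries, (prefix,))  # seek on the name prefix
    while i < len(index_entries) and index_entries[i][0].startswith(prefix):
        _, entry_age, pk = index_entries[i]
        i += 1
        scanned += 1
        if use_icp and entry_age != age:
            continue  # ICP: engine rejects the entry without reading the row
        row = clustered[pk]  # table return
        returns += 1
        if row["age"] == age:  # server-side check (always true here when ICP ran)
            rows.append(row)
    return rows, scanned, returns


print(query("al", 30, use_icp=False)[1:])  # (3, 3)
print(query("al", 30, use_icp=True)[1:])   # (3, 2)
```

Both runs scan the same three index entries and return the same rows. ICP cuts the table returns from 3 to 2, which matches the summary above.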

Origin blog.csdn.net/songjianlong/article/details/132352142