MySQL interview essentials: indexes

 

Index definition

An index is a data structure that a database uses to speed up queries. It typically keeps the values of one or more columns of a table in sorted order, so matching rows can be located quickly instead of scanning the whole table.
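As a rough illustration of the idea, the sketch below compares a full scan with a lookup through a sorted list of keys. The table contents and the names rows, name_index, full_scan, and index_lookup are made up for this example; a real MySQL index is a tree stored on disk pages, not a Python list.

```python
import bisect

# Hypothetical table: (id, name) rows in insertion order.
rows = [(3, "carol"), (1, "alice"), (4, "dave"), (2, "bob")]

def full_scan(name):
    """Check every row; the cost grows linearly with the table size."""
    return [row for row in rows if row[1] == name]

# A toy "index" on name: sorted (name, row position) pairs.
name_index = sorted((row[1], pos) for pos, row in enumerate(rows))
index_keys = [key for key, _ in name_index]

def index_lookup(name):
    """Binary-search the sorted keys, then fetch only the matching rows."""
    lo = bisect.bisect_left(index_keys, name)
    hi = bisect.bisect_right(index_keys, name)
    return [rows[pos] for _, pos in name_index[lo:hi]]

print(index_lookup("dave"))
```

Both functions return the same rows; the difference is that the indexed lookup inspects only a logarithmic number of keys.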

Advantages and disadvantages of indexing

The main advantage of an index is that it can greatly speed up queries, especially on large tables. A unique index also enforces the uniqueness of the indexed values, which helps protect data integrity. Indexes have costs as well. First, every index occupies storage space, and creating too many of them can take up a significant amount. Second, each index must be updated whenever rows are inserted, updated, or deleted, so in write-heavy workloads indexes can reduce performance.

When do you need to build an index

On large tables, building an index on the columns you search by can make queries much faster. Good candidates are columns that are frequently used for filtering, sorting (ORDER BY), and grouping (GROUP BY), as well as primary keys and foreign keys. Indexes should not be added indiscriminately, though. Choose them according to the actual workload, because too many indexes waste storage space and slow down writes.

When you don't need to build an index

  1. Columns that are never used in WHERE conditions are not suitable for indexing.
  2. The table has few records. For example, with only a few hundred rows there is usually no need to add an index.
  3. The table is inserted into, updated, or deleted from very frequently. Evaluate whether the query benefit outweighs the maintenance cost before indexing.
  4. Columns that only appear inside calculations or functions in query conditions are not suitable, because an ordinary index on the column cannot be used for such conditions.
  5. Columns with low selectivity are poor candidates, such as gender, which has only three values (male/female/unknown). An index on such a column rarely improves query efficiency.
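The point about low-selectivity columns can be made concrete by computing selectivity as the number of distinct values divided by the number of rows. This is a minimal sketch with made-up data; the function name selectivity and the sample columns are assumptions for illustration only.

```python
def selectivity(values):
    """Distinct values divided by total rows; 1.0 means every value is unique."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0

# Hypothetical columns of a 3000-row table.
gender = ["male", "female", "unknown"] * 1000   # only 3 distinct values
user_id = list(range(3000))                     # every value is distinct

print(selectivity(gender), selectivity(user_id))
```

A value close to 0 means an equality condition on the column still matches a large share of the table, so the index saves little work.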

Index data structures

Common index data structures include the B-tree, the B+ tree, and the hash table. A B-tree is a self-balancing multi-way search tree that supports fast lookups, but it is less efficient than a B+ tree for range scans. A B+ tree is a variant of the B-tree that stores data only in its leaf nodes, so each internal node holds more keys and the tree stays shallower. A hash index locates entries through a hash function. It is well suited to equality lookups, but it does not support range queries or sorting.

B+ tree index

The B+ tree is a B-tree whose leaf nodes are additionally linked by sequential access pointers. It keeps the balance of the B-tree, and the sequential pointers improve the performance of range queries.

In a B+ tree, the keys in a node are arranged in increasing order from left to right. If the keys to the left and right of a pointer are key_i and key_{i+1} respectively, then all keys in the subtree that the pointer refers to are greater than or equal to key_i and less than key_{i+1}.

A search starts with a binary search in the root node to find the pointer whose key range contains the target key, and then continues recursively in the node that the pointer refers to. Once a leaf node is reached, a binary search within the leaf finds the data item that corresponds to the key.
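The search procedure can be sketched in a few lines. This is a simplified in-memory model, not InnoDB's implementation: the classes Leaf and Internal, the hand-built two-level tree, and the convention that a key equal to a separator goes to the right-hand child are all assumptions made for the example. Insertion and node splitting are left out.

```python
import bisect

class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values, self.next = keys, values, None

class Internal:
    def __init__(self, keys, children):
        # children[i] holds keys k with keys[i-1] <= k < keys[i]
        self.keys, self.children = keys, children

def find_leaf(node, key):
    """Descend from the root, binary-searching each internal node."""
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node

def search(root, key):
    """Return the value stored for key, or None if it is absent."""
    leaf = find_leaf(root, key)
    i = bisect.bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None

def range_scan(root, low, high):
    """Yield (key, value) with low <= key <= high by walking the leaf chain."""
    leaf = find_leaf(root, low)
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return
            if k >= low:
                yield k, v
        leaf = leaf.next

# Hand-built two-level tree with linked leaves.
l1, l2, l3 = Leaf([1, 4], ["a", "d"]), Leaf([7, 9], ["g", "i"]), Leaf([12, 15], ["l", "o"])
l1.next, l2.next = l2, l3
root = Internal([7, 12], [l1, l2, l3])

print(search(root, 9), list(range_scan(root, 4, 12)))
```

The range scan descends the tree only once, to find the first leaf, and then follows the leaf pointers, which is what the sequential access pointers are for.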

The difference between Hash index and B+ tree index

Hash indexes and B+ tree indexes are implemented very differently. A hash index maps the value of the indexed column to a slot in a hash table through a hash function and then looks up the corresponding data there. This locates a single value very quickly, but because hashing does not preserve order, it cannot serve range queries or sorting. A B+ tree index is a balanced multi-level tree that locates data by descending from the root, using binary search within each node. Because its keys are kept in order, it supports range queries and sorting as well as equality lookups, so it has a wider scope of application.
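A small sketch of the contrast, using a Python dict as a stand-in for a hash index and a sorted list as a stand-in for an ordered (tree-style) index. The sample data and the function names are hypothetical.

```python
import bisect

# Hypothetical column: name -> age.
ages = {"alice": 31, "bob": 25, "carol": 42, "dave": 25, "erin": 37}

# Hash-style index: age -> names. Fast for equality, but keys have no order.
hash_index = {}
for name, age in ages.items():
    hash_index.setdefault(age, []).append(name)

# Ordered index: (age, name) pairs kept sorted by age.
ordered_index = sorted((age, name) for name, age in ages.items())
ordered_keys = [age for age, _ in ordered_index]

def hash_equal(age):
    """Equality lookup: a single probe into the hash table."""
    return sorted(hash_index.get(age, []))

def hash_range(low, high):
    """Range query on a hash index: every key has to be examined."""
    return sorted(n for a, names in hash_index.items()
                  if low <= a <= high for n in names)

def ordered_range(low, high):
    """Range query on an ordered index: two binary searches and one slice."""
    lo = bisect.bisect_left(ordered_keys, low)
    hi = bisect.bisect_right(ordered_keys, high)
    return [name for _, name in ordered_index[lo:hi]]

print(hash_equal(25), ordered_range(30, 40))
```

Both range functions return the same names, but only the ordered index finds them without visiting every key, and its result already comes back in age order.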

Why is B+ tree more suitable for implementing database index than B tree?

A B+ tree offers better storage and query efficiency than a B-tree. First, the non-leaf nodes of a B+ tree store only keys, not row data, so each node holds more keys, which reduces the height of the tree and the number of nodes read per lookup. Second, the leaf nodes of a B+ tree are linked by pointers into an ordered list, which makes range queries and sorting straightforward. In addition, a B+ tree makes better use of disk I/O: each node is read as a whole page, and since internal pages contain only keys, every page read narrows the search further than a B-tree page of the same size would.
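A back-of-the-envelope calculation shows how much the fan-out matters. All sizes below are assumptions chosen for illustration (a 16 KB page, 8-byte keys, 6-byte child pointers, 1 KB rows), and the model is deliberately rough: it treats every level of the tree as having the same fan-out.

```python
def tree_height(total_keys, fanout):
    """Smallest number of levels h such that fanout ** h >= total_keys."""
    height, capacity = 1, fanout
    while capacity < total_keys:
        capacity *= fanout
        height += 1
    return height

PAGE_SIZE = 16 * 1024   # assumption: 16 KB page
KEY_SIZE = 8            # assumption: 8-byte key
POINTER_SIZE = 6        # assumption: 6-byte child pointer
ROW_SIZE = 1024         # assumption: 1 KB of row data per entry

# B+ tree internal node: keys and pointers only.
bplus_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
# B-tree node: each entry also carries its row data.
btree_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE + ROW_SIZE)

print(bplus_fanout, tree_height(10_000_000, bplus_fanout))
print(btree_fanout, tree_height(10_000_000, btree_fanout))
```

Under these assumptions, ten million keys fit in about 3 levels with key-only nodes and need about 6 levels when every node entry also carries row data, and each extra level is potentially one more page read per lookup.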

Index classification

Indexes can be classified according to different characteristics. Common classifications include clustered index and non-clustered index, unique index and non-unique index, single-column index and multi-column index, etc.

Leftmost matching principle

The leftmost matching principle means that a composite index can be used only when the query conditions cover a leading (leftmost) run of its columns, with no column skipped. For example, the index (a,b,c) can serve conditions on (a), on (a,b), and on (a,b,c), but not conditions on (b,c) or (c) alone. A query that filters on a and c can use the index only for the a part.
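The rule can be expressed as a small function that walks the index columns from the left and stops at the first one the query does not constrain. This is a simplified model that considers equality conditions only; the function name usable_prefix is made up for the example, and the real optimizer takes more factors into account.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that the equality conditions can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap ends the usable prefix
        used.append(column)
    return used

index = ("a", "b", "c")
print(usable_prefix(index, {"a", "b"}))   # both leading columns
print(usable_prefix(index, {"b", "c"}))   # nothing: a is missing
print(usable_prefix(index, {"a", "c"}))   # only a: b is skipped
```

Because the conditions are passed as a set, the order in which they are written does not matter, only which columns they cover.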

Clustered index

A clustered index is a special kind of index in which the index order matches the physical storage order, that is, the rows themselves are stored in index order. A lookup through the clustered index is fast because finding the index entry also finds the row, with no second lookup. In MySQL, each table can have only one clustered index, usually the primary key index.

Covering index

A covering index is an index that contains every column a query needs, so the query can be answered from the index alone without reading the table rows. For example, for the statement SELECT id FROM table WHERE name='abc', if the table has a composite index on (name, id), the query can be completed entirely within the index, which avoids the extra row lookup and improves query speed.
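The saving can be illustrated with a toy model that counts how often a query has to go back to the full rows. The dictionaries, the email column, and the function run_select are hypothetical; the linear loop over the index stands in for what would be a tree search in a real engine.

```python
# Hypothetical full-row store: primary key -> complete row.
clustered = {
    1: {"id": 1, "name": "abc", "email": "abc@example.com"},
    2: {"id": 2, "name": "xyz", "email": "xyz@example.com"},
}

# Hypothetical composite index on (name, id).
INDEX_COLUMNS = ("name", "id")
secondary = [("abc", 1), ("xyz", 2)]

def run_select(columns, name):
    """Return (result rows, number of lookups into the full-row store)."""
    covered = set(columns) <= set(INDEX_COLUMNS)
    rows, lookups = [], 0
    for entry in secondary:
        if entry[0] != name:
            continue
        if covered:
            source = dict(zip(INDEX_COLUMNS, entry))   # answer from the index
        else:
            source = clustered[entry[1]]               # go back to the full row
            lookups += 1
        rows.append(tuple(source[c] for c in columns))
    return rows, lookups

print(run_select(["id"], "abc"))
print(run_select(["id", "email"], "abc"))
```

Selecting only id needs zero row lookups, while adding the email column, which the index does not contain, forces one lookup per matching entry.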

Index design principles

When designing an index, there are several principles to consider:

  • Prefer columns with high selectivity (many distinct values) as index columns, so that each index entry matches few rows.
  • Prefer columns with small values or compact data types, to reduce the space the index occupies.
  • Prefer columns that are queried frequently, to get the most benefit from the index.
  • Avoid creating too many indexes, because each one adds maintenance cost and can slow down write operations.
  • For composite indexes, order the columns with the leftmost matching principle in mind, so that the most common query conditions form a leftmost prefix.
  • For clustered indexes, the primary key is usually chosen as the clustered index.
  • For a covering index, choose the index columns according to the columns the target query filters on and reads.

When will the index become invalid?

The situations in which an index cannot be used mainly include the following:

  • A function or expression is applied to the indexed column in the query condition, so the index cannot be used.
  • The condition uses the not-equal operator (<>) or NOT IN, which may prevent the index from being used.
  • The condition uses LIKE with a pattern that starts with a wildcard (for example '%abc'), so the index may not be used.
  • The condition uses OR. The indexes can be used only if every branch of the OR has a usable index; otherwise they cannot be used.
  • The data type of the query value differs from the data type of the indexed column, so an implicit conversion is applied and the index may not be used.
  • The data in the table is unevenly distributed, so the optimizer estimates that a full table scan is cheaper than using the index.
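Two of these cases follow directly from the fact that an index is ordered by the raw column value. The sketch below uses a sorted Python list as a stand-in for an index on a name column; the sample names and helper functions are made up for the illustration.

```python
import bisect

# Hypothetical index on a name column: keys kept in sorted order.
names = sorted(["adam", "alice", "bob", "carol", "malice"])

def like_prefix(prefix):
    """LIKE 'prefix%': the matches form one contiguous slice of the sorted keys."""
    lo = bisect.bisect_left(names, prefix)
    hi = bisect.bisect_left(names, prefix + "\uffff")
    return names[lo:hi]

def like_suffix(suffix):
    """LIKE '%suffix': matches are scattered, so every key has to be checked."""
    return [n for n in names if n.endswith(suffix)]

# Applying a function to the column: the results are no longer in index order,
# so binary search on the index cannot locate, say, all names of length 5.
lengths = [len(n) for n in names]

print(like_prefix("al"), like_suffix("alice"), lengths)
```

The prefix pattern maps to a single range of the sorted keys, whereas the suffix pattern and the function result have no relationship to the index order, which is why they fall back to checking every entry.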

What is a prefix index?

A prefix index indexes only the leading part of a column's value, which reduces the space the index occupies. For example, for a VARCHAR column, only the first few characters can be indexed. The drawback is lower selectivity: different values that share the same prefix cannot be told apart by the index alone, so more rows must be read and checked. Query results remain correct, but a prefix index cannot act as a covering index and cannot be used for ORDER BY or GROUP BY.
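One common way to choose a prefix length is to compare the selectivity of different prefix lengths against the selectivity of the full column. The sketch below does this on a tiny made-up list of email addresses; the function name prefix_selectivity and the data are assumptions for illustration.

```python
def prefix_selectivity(values, length):
    """Distinct prefixes of the given length divided by the number of rows."""
    prefixes = {v[:length] for v in values}
    return len(prefixes) / len(values)

# Hypothetical VARCHAR column.
emails = [
    "alice@example.com",
    "alicia@example.com",
    "bob@example.com",
    "bobby@example.com",
]

for length in (3, 4, 5):
    print(length, prefix_selectivity(emails, length))
```

In this sample a 3-character prefix separates only half of the values, while a 5-character prefix already distinguishes all of them, so indexing more characters would add space without adding selectivity.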

What is index pushdown?

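Index condition pushdown (ICP) is an optimization, available since MySQL 5.6, in which the storage engine evaluates the parts of the WHERE condition that involve only indexed columns while it scans the index, so rows that fail those conditions are discarded before the full row is read. It is covered in more detail in the separate article "Mysql, what is index pushdown, and understand it in a minute".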

Origin blog.csdn.net/Dark_orange/article/details/130493068