【Interview Notes】Database Indexes (MySQL)

An index is a database structure that speeds up data queries and retrieval.

1. The data structures used by indexes

There are two main data structures: hash tables (hash indexes) and B+ trees.

InnoDB, MySQL's default storage engine, uses B+ tree indexes by default.

2. Why use a B+ tree? What are its advantages and disadvantages compared to a hash index?

The underlying structure of a hash index is a hash table, which stores data as key-value pairs; the entries have no ordering relationship in storage. A range query therefore cannot use the index directly and requires a full table scan, so a hash index is only suitable for equality queries. Other issues: a hash index cannot be used for sorting, it does not support the leftmost matching rule of multi-column composite indexes, and it suffers from hash collisions.
A B+ tree is a multi-way balanced search tree. Its keys are kept in sorted order, so range queries do not require a full table scan.
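
The difference can be sketched in SQL. This is only an illustration under assumptions: the table hash_vs_btree_sample and its columns are hypothetical, and the MEMORY engine is used because it allows the index type to be chosen explicitly (InnoDB builds B+ tree indexes). The exact EXPLAIN output depends on the MySQL version and the data in the table.

```sql
-- Hypothetical table; MEMORY is used only so that the index type can be chosen.
CREATE TABLE hash_vs_btree_sample (
    id   INT NOT NULL,
    code INT NOT NULL,
    INDEX idx_id_hash (id) USING HASH,
    INDEX idx_code_btree (code) USING BTREE
) ENGINE = MEMORY;

-- Equality query: the hash index on id can serve this lookup.
EXPLAIN SELECT * FROM hash_vs_btree_sample WHERE id = 42;

-- Range query on the hash-indexed column: the hash index has no ordering,
-- so it cannot narrow the range.
EXPLAIN SELECT * FROM hash_vs_btree_sample WHERE id BETWEEN 10 AND 20;

-- Range query on the BTREE-indexed column: the sorted keys can be scanned in order.
EXPLAIN SELECT * FROM hash_vs_btree_sample WHERE code BETWEEN 10 AND 20;
```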

3. The leaf nodes of the B+ tree

The leaf nodes of a B+ tree may store the entire row of data, or may store only the primary key value:

Clustered index: defined by how the data is physically stored. The leaf nodes store the entire row, so the index describes the physical layout of the records. Because the rows can only be laid out in one order, a table can have only one clustered index. A clustered index quickly narrows a large search range down to a small one, but the record itself still has to be found by scanning within that small range.
Non-clustered index: the leaf nodes of the B+ tree store the primary key value; it is also known as a non-primary-key (secondary) index. A non-clustered index acts like a small map of a large area: you first find the location of the information you are looking for on the map, and then use that location to fetch the record you need.

When querying data, a clustered index is generally faster than a non-clustered index, because a lookup in a non-clustered index only yields the primary key value, and the table must then be queried a second time using that primary key. This second lookup is called going back to the table.
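
A minimal sketch of the two lookup paths, assuming a hypothetical InnoDB table user_sample (the table, column, and index names are made up for illustration):

```sql
-- Hypothetical table: the primary key is the clustered index,
-- idx_name is a non-clustered (secondary) index.
CREATE TABLE user_sample (
    id    INT NOT NULL PRIMARY KEY,
    name  VARCHAR(50) NOT NULL,
    email VARCHAR(100),
    INDEX idx_name (name)
) ENGINE = InnoDB;

-- One lookup: the clustered index leaf already holds the whole row.
SELECT * FROM user_sample WHERE id = 1;

-- Two lookups: idx_name yields the primary key value, then the clustered
-- index is searched with that value to fetch email (back to the table).
SELECT * FROM user_sample WHERE name = 'alice';
```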

Non-primary-key indexes can also avoid going back to the table by acting as covering indexes.

Covering index: everything a query needs can be obtained from the index alone, without reading the data table. This is also described as achieving index coverage.

When executing a query, if the required data is already available in the index, there is no need to go back to the data table after the index lookup, which reduces I/O and improves efficiency.

For example:

The table covering_index_sample has an ordinary index idx_key1_key2(key1,key2).

When we run the SQL statement select key2 from covering_index_sample where key1 ='keytest';, key2 can be read from the covering index and returned without going back to the table.
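
The example above can be tried out roughly as follows. The column types and the extra columns id and key3 are assumptions added for illustration; only the table name, the index name, and key1/key2 come from the example.

```sql
-- Column types, id, and key3 are assumed for illustration.
CREATE TABLE covering_index_sample (
    id   INT NOT NULL PRIMARY KEY,
    key1 VARCHAR(32),
    key2 VARCHAR(32),
    key3 VARCHAR(32),
    INDEX idx_key1_key2 (key1, key2)
) ENGINE = InnoDB;

-- Covered: key1 and key2 are both in idx_key1_key2.
-- The Extra column of EXPLAIN typically shows "Using index".
EXPLAIN SELECT key2 FROM covering_index_sample WHERE key1 = 'keytest';

-- Not covered: key3 is only stored in the table row,
-- so each match requires going back to the table.
EXPLAIN SELECT key3 FROM covering_index_sample WHERE key1 = 'keytest';
```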

4. Composite indexes and leftmost matching

Because of the leftmost matching principle, when creating a composite index we should order the columns from left to right by how frequently they are used and by how dispersed (selective) their values are, from high to low. This minimizes the cases where the index cannot be used and a sequential scan is needed.

The leftmost matching principle: the conditions in the WHERE clause are matched against the index columns in the order in which the columns were defined when the index was created (this does not mean the AND conditions must be written in that order). If a column in the middle has no condition, or a column is filtered with LIKE, the columns after it cannot use the index.
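
A sketch of the rule, assuming a hypothetical table leftmost_sample with a composite index on (a, b, c). The column d is not indexed and is there so that SELECT * cannot be answered from the index alone.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE leftmost_sample (
    id INT NOT NULL PRIMARY KEY,
    a  INT,
    b  INT,
    c  INT,
    d  VARCHAR(50),
    INDEX idx_a_b_c (a, b, c)
) ENGINE = InnoDB;

-- All three index columns can be used.
EXPLAIN SELECT * FROM leftmost_sample WHERE a = 1 AND b = 2 AND c = 3;

-- The written order of the AND conditions does not matter.
EXPLAIN SELECT * FROM leftmost_sample WHERE c = 3 AND b = 2 AND a = 1;

-- b is missing: only the a part of the index can narrow the search.
EXPLAIN SELECT * FROM leftmost_sample WHERE a = 1 AND c = 3;

-- The leftmost column a is missing: the index cannot be used to locate rows.
EXPLAIN SELECT * FROM leftmost_sample WHERE b = 2 AND c = 3;
```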

What fields are suitable for indexing?
1. Fields frequently used as query conditions
2. Fields frequently used to join tables
3. Fields that often appear after order by, group by, distinct

Under what circumstances does an index fail, so that a full table scan is performed?
1. A LIKE fuzzy query with "%" (any characters) on the far left, such as "%abc%", "%__", etc.;
2. The columns on both sides of an OR are not all indexed;
3. Implicit data type conversion (for example, a varchar column compared with a value written without single quotes, which is treated as a number);
4. For multi-column indexes, the leftmost matching principle is not satisfied.
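
The first three cases can be illustrated as follows. The table failure_sample and all of its columns and indexes are hypothetical; whether the optimizer actually falls back to a full scan can be checked with EXPLAIN and depends on the MySQL version and the data.

```sql
-- Hypothetical table: name and phone are indexed, age is not.
CREATE TABLE failure_sample (
    id    INT NOT NULL PRIMARY KEY,
    name  VARCHAR(50),
    phone VARCHAR(20),
    age   INT,
    INDEX idx_name (name),
    INDEX idx_phone (phone)
) ENGINE = InnoDB;

-- 1. Leading wildcard: idx_name cannot be used to locate rows.
EXPLAIN SELECT * FROM failure_sample WHERE name LIKE '%abc';
--    A prefix pattern can use the index.
EXPLAIN SELECT * FROM failure_sample WHERE name LIKE 'abc%';

-- 2. OR with an unindexed column (age): the whole condition needs a scan.
EXPLAIN SELECT * FROM failure_sample WHERE name = 'alice' OR age = 30;

-- 3. Implicit conversion: phone is a varchar but is compared with a number.
EXPLAIN SELECT * FROM failure_sample WHERE phone = 13800000000;
--    Quoting the value keeps the comparison on strings, so idx_phone applies.
EXPLAIN SELECT * FROM failure_sample WHERE phone = '13800000000';
```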

5. Disadvantages of indexing

Considered mainly in terms of time and space:

Time: creating and maintaining indexes takes time. Specifically, when rows in a table are inserted, deleted, or updated, the indexes must also be maintained dynamically, which slows down data modification;
Space: The index needs to occupy physical space.

6. Index optimization in MySQL 5.6

Index Condition Pushdown

MySQL 5.6 introduced the index condition pushdown (ICP) optimization, which is enabled by default.

Use SET optimizer_switch = 'index_condition_pushdown=off'; to turn it off.

The examples and explanations given in the official documents are as follows:

The people table has an index on (zipcode, lastname, firstname).

SELECT * FROM people WHERE zipcode='95054' AND lastname LIKE '%etrunia%' AND address LIKE '%Main Street%';

Without index condition pushdown, the storage engine uses zipcode='95054' to locate the matching index entries, reads the full row for each of them, and returns the rows to the MySQL server. The server then checks lastname LIKE '%etrunia%' and address LIKE '%Main Street%' to decide whether each row meets the conditions.

With index condition pushdown, the storage engine first finds the index entries that satisfy zipcode='95054' and then checks lastname LIKE '%etrunia%' against each index entry, since lastname is part of the index. Only if the entry passes is the full row read from the table; otherwise it is discarded immediately. The address condition is not covered by the index, so it is still checked after the row has been read. Index condition pushdown thus reduces the number of trips back to the table for queries with LIKE conditions such as this one.
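
To observe the effect, something like the following can be tried. The column types, the id column, and the index name idx_zip_last_first are assumptions; the table name, the indexed columns, and the query come from the example above. When ICP is applied, the Extra column of EXPLAIN shows "Using index condition"; whether the optimizer chooses it also depends on the data in the table.

```sql
-- Column types, id, and the index name are assumed for illustration.
CREATE TABLE people (
    id        INT NOT NULL PRIMARY KEY,
    zipcode   VARCHAR(10),
    lastname  VARCHAR(50),
    firstname VARCHAR(50),
    address   VARCHAR(200),
    INDEX idx_zip_last_first (zipcode, lastname, firstname)
) ENGINE = InnoDB;

-- ICP enabled (the default).
SET optimizer_switch = 'index_condition_pushdown=on';
EXPLAIN SELECT * FROM people
WHERE zipcode = '95054'
  AND lastname LIKE '%etrunia%'
  AND address LIKE '%Main Street%';

-- ICP disabled: the lastname check moves back to the server layer.
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM people
WHERE zipcode = '95054'
  AND lastname LIKE '%etrunia%'
  AND address LIKE '%Main Street%';

-- Restore the default setting.
SET optimizer_switch = 'index_condition_pushdown=on';
```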
