In-depth understanding of MySQL indexes (4): index usage principles

In the previous article, I covered the two specific types of index. This article covers the principles of index usage. In day-to-day SQL tuning, the first thing that comes to mind is often "add an index". But have you considered whether this approach has any problems? Too much of anything backfires, and more is not always better. Indexes are no exception.

We all know that in MySQL an index is also stored in a file, and that an index is a tree data structure that has to be maintained. If a table has too many indexes (as a rule of thumb, a table should not have more than 10), the indexes take up a lot of space, and every insert, delete, or update on the table also has to maintain them, which consumes resources and can cause performance problems.

Principles of Index Use

1. Column dispersion

Formula:
count(distinct(column_name)) : count(*), the ratio of the number of distinct values in the column to the total number of rows.
For the same number of rows, the larger the numerator, the higher the dispersion of the column.

In plain terms: the more repeated values a column has, the lower its dispersion; the fewer repeated values, the higher its dispersion.
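
As a rough illustration of this ratio, here is a minimal Python sketch. The function name `dispersion` and the sample values are made up for this example; on a real table the same ratio would come from a count(distinct ...) and count(*) query.

```python
def dispersion(values):
    """Return the distinct-value count divided by the row count (0.0 if empty)."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical sample: 8 rows, gender has 2 distinct values, name has 8.
genders = [0, 1, 0, 1, 1, 0, 0, 1]
names = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"]

print("gender:", dispersion(genders))
print("name:", dispersion(names))
```

A column whose ratio is close to 1 narrows a lookup to very few rows, while a ratio close to 0 means each value still matches a large share of the table.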
Example: create an index on name and on gender separately.
When we use the index on gender to retrieve data, there are so many duplicate values that many rows have to be scanned. For example, we create an index on the gender column and then look at the execution plan.

ALTER TABLE user_innodb DROP INDEX idx_user_gender;
ALTER TABLE user_innodb ADD INDEX idx_user_gender (gender); -- takes a relatively long time
EXPLAIN SELECT * FROM `user_innodb` WHERE gender = 0;

The dispersion of name is higher. For a name such as '张三' (Zhang San), only one row needs to be scanned.

ALTER TABLE user_innodb DROP INDEX idx_user_name;
ALTER TABLE user_innodb ADD INDEX idx_user_name (name);
EXPLAIN SELECT * FROM `user_innodb` WHERE name = '张三';

Conclusion: build indexes on fields with higher dispersion (selectivity).
If the B+Tree contains too many duplicate values and the MySQL optimizer finds that using the index is not much cheaper than a full table scan, the index will not necessarily be used even though it exists.

2. The leftmost matching principle of the joint index

Two points need to be explained here:

  • A single-column index can be regarded as a special joint index
  • A joint index is a single index

We create a joint index on name and phone in the user table.

ALTER TABLE user_innodb DROP INDEX comidx_name_phone;
ALTER TABLE user_innodb ADD INDEX comidx_name_phone (name,phone);


The joint index is a composite data structure in the B+Tree. The search tree is built in order from left to right (name on the left, phone on the right).

In this structure, name is ordered across the whole index, while phone on its own is not. Phone is ordered only among entries whose names are equal.

So when we query with where name='Mic' and phone = '133xx', the B+Tree compares name first to decide whether the search continues to the left or to the right. If the names are equal, it then compares phone. If the query condition has no name, there is no way to know which node to visit in the first step, because name is the first comparison factor used when building the search tree. In that case the index is not used.

You can try it in your own database.

In a joint index (A, B, C), a where clause on A, on A and B, or on A, B, and C can use the index. A where clause on B alone, or on B and C, cannot. A where clause on A and C can use the index only for the A part, because B is skipped.
Conclusion: when creating a joint index, put the most commonly used column on the far left. The first field cannot be skipped, and the columns must be used in order without gaps.

If there is a joint index (a, b), a where clause written as where b = XX and a = XX can also use the index. Why? Because the optimizer reorders the conditions automatically and recognizes that the (a, b) joint index applies.
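
The ordering argument can be tried out with a small Python model. This is only a sketch: a sorted list of (name, phone) tuples stands in for the B+Tree, and the names, phone numbers, and helper functions are invented for illustration.

```python
import bisect

# Toy joint index on (name, phone): entries are kept sorted by name first,
# then by phone. All names and numbers are invented for illustration.
index = sorted([
    ("Mic", "135"), ("Bob", "137"), ("Ann", "139"),
    ("Zed", "130"), ("Mic", "133"), ("Bob", "131"),
])


def find_by_name(name):
    """Leftmost column: binary search to the first match, then read forward."""
    start = bisect.bisect_left(index, (name,))
    rows = []
    for entry in index[start:]:
        if entry[0] != name:
            break
        rows.append(entry)
    return rows


def find_by_phone(phone):
    """No name given: phone is not ordered on its own, so check every entry."""
    return [entry for entry in index if entry[1] == phone]


phones = [phone for _, phone in index]
print("phone ordered across the whole index:", phones == sorted(phones))
print(find_by_name("Mic"))
print(find_by_phone("133"))
```

A lookup on name can jump straight to the right position, while a lookup on phone alone has nothing to binary-search on and degrades to reading every entry.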


Another point:

CREATE INDEX idx_name on user_innodb(name);
CREATE INDEX idx_name_phone on user_innodb(name,phone);
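When we create a joint index, the leftmost matching principle means that a query on the left field, name, can also use it, so the first index is completely unnecessary.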

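If we create an index on three fields, index(a,b,c), it is equivalent to creating three indexes:
index(a)
index(a,b)
index(a,b,c)
Although it is equivalent to three indexes, **note**: a joint index counts as one index.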

The above is the leftmost matching principle of MySQL joint index.

3. Covering index


**Back to table:** With a non-primary-key index, we first find the primary key value through the index, and then use that primary key value to find the data that is not in the index. This scans one more index tree than a query on the primary key index. The process is called going back to the table.

With an auxiliary (secondary) index, whether single-column or joint, if all the selected columns can be obtained from the index alone, nothing has to be read from the data area. The index used in this case is called a covering index, and it avoids going back to the table.

Let's first create a joint index:

-- Create a joint index
ALTER TABLE user_innodb DROP INDEX comidx_name_phone;
ALTER TABLE user_innodb ADD INDEX `comidx_name_phone` (`name`,`phone`);

Covering indexes are used in these three query statements:

EXPLAIN SELECT name,phone FROM user_innodb WHERE name= '张三' AND phone = '13888888888';
EXPLAIN SELECT name FROM user_innodb WHERE name= '张三' AND phone = '13888888888';
EXPLAIN SELECT phone FROM user_innodb WHERE name= '张三' AND phone = '13888888888';

The value "Using index" in Extra means that a covering index is used.
With select *, the covering index is not used.
Because a covering index reduces the number of I/O operations and the amount of data accessed, it can greatly improve query efficiency.
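
The difference between a covered query and one that goes back to the table can be sketched in Python. This is a toy model, not how InnoDB is implemented: a dict stands in for the primary key index, a sorted list of (name, phone, id) tuples stands in for the joint index, and the second row, the `id` and `gender` columns, and the `select` helper are assumptions made for the example.

```python
# Toy model: the clustered index maps primary key -> full row; the secondary
# index on (name, phone) stores only those columns plus the primary key.
clustered = {
    1: {"id": 1, "name": "张三", "phone": "13888888888", "gender": 0},
    2: {"id": 2, "name": "李四", "phone": "13999999999", "gender": 1},
}
secondary = sorted((r["name"], r["phone"], pk) for pk, r in clustered.items())
INDEX_COLUMNS = ("name", "phone", "id")


def select(columns, name, phone):
    """Return (rows, back_to_table_lookups) for name = ... AND phone = ..."""
    covered = all(c in INDEX_COLUMNS for c in columns)
    rows, lookups = [], 0
    for entry in secondary:  # linear scan keeps the toy short
        if entry[0] == name and entry[1] == phone:
            if covered:
                from_index = dict(zip(INDEX_COLUMNS, entry))
                rows.append({c: from_index[c] for c in columns})
            else:
                lookups += 1  # one extra read on the primary key index
                full = clustered[entry[2]]
                rows.append({c: full[c] for c in columns})
    return rows, lookups


print(select(["name", "phone"], "张三", "13888888888"))
print(select(["name", "phone", "gender"], "张三", "13888888888"))
```

The first call is answered from the index entries alone. The second needs `gender`, which is not in the index, so each matching entry costs one more lookup in the primary key index.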

4. Index condition pushdown (ICP)

Index Condition Pushdown (good to be aware of) is a feature available from MySQL 5.6 onward. It applies only to secondary indexes. The goal of ICP is to reduce the number of full-row reads from the table, thereby reducing I/O operations.

The push-down mentioned here actually means that the filtering action is done in the storage engine, without the need to filter at the Server layer.

Example:
Take the following table and create a joint index on last_name and first_name.

drop table employees;
CREATE TABLE `employees`
(
    emp_no     int(11)        NOT NULL,
    birth_date date           NULL,
    first_name varchar(14)    NOT NULL,
    last_name  varchar(16)    NOT NULL,
    gender     enum ('M','F') NOT NULL,
    hire_date  date           NULL,
    PRIMARY KEY (emp_no)
) ENGINE = InnoDB
  DEFAULT CHARSET = utf8;

alter table employees add index idx_lastname_firstname(last_name,first_name);

Now we want to query all employees whose last name is wang and whose first name ends with zi, such as Fatty Wang (wang pangzi) and Thin Wang (wang shouzi). The query SQL:

select * from employees where last_name='wang' and first_name LIKE '%zi'

Normally, because strings are compared from left to right, a pattern with % at the front cannot be compared against the index, so only the last_name (surname) field can be used for index comparison and filtering.

So the query process is like this:

  1. Find all the secondary index entries for wang in the joint index (3 primary key values: 6, 7, 8).
  2. Go back to the table and read all the matching rows (3 rows) from the primary key index.
  3. Return these 3 rows to the Server layer, which filters out the employees whose first name ends with zi.

Note that index comparisons are performed in the storage engine, while comparisons on data records are performed at the Server layer. When the first_name condition cannot be used for index filtering, the Server layer does not pass it to the storage engine, so two unnecessary records are read.
If 100,000 records satisfied last_name='wang' and only 1 of them matched, 99,999 records would be read unnecessarily. So, can the filtering on the first_name field be done at the storage engine layer?

The second query method:

  1. Find all the secondary index entries for wang in the joint index (3 primary key values: 6, 7, 8).
  2. From those secondary index entries, keep only the ones whose first_name ends with zi (1 entry).
  3. Go back to the table, read the matching row (1 row) from the primary key index, and return it to the Server layer.

Obviously, the second method reads less data from the primary key index.
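
The two query methods can be compared with a small Python model that counts full-row reads. It is a sketch under stated assumptions: the first names, the extra zhao row, and the `query` helper are invented, and the loop only mimics which layer applies the first_name filter.

```python
# Secondary index entries on (last_name, first_name), each with its primary key.
# The first names and the zhao row are invented for illustration.
index_entries = [
    ("wang", "pangzi", 7),
    ("wang", "xiaoming", 8),
    ("wang", "zilong", 6),
    ("zhao", "liuzi", 9),
]
table = {
    pk: {"emp_no": pk, "last_name": ln, "first_name": fn}
    for ln, fn, pk in index_entries
}


def query(last_name, suffix, icp):
    """Return (matching rows, number of full-row reads from the table)."""
    rows, row_reads = [], 0
    for ln, fn, pk in index_entries:
        if ln != last_name:
            continue
        if icp and not fn.endswith(suffix):
            continue  # filtered in the "storage engine" using the index entry
        row = table[pk]  # back to the table
        row_reads += 1
        if row["first_name"].endswith(suffix):  # "Server layer" check
            rows.append(row)
    return rows, row_reads


print(query("wang", "zi", icp=False))
print(query("wang", "zi", icp=True))
```

Both calls return the same single row. Only the number of full-row reads differs, which is the saving that ICP aims for.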

ICP is turned on by default. In other words, for a secondary index, any condition that can be pushed down to the storage engine is pushed down without our intervention:

set optimizer_switch = 'index_condition_pushdown=on';

The execution plan at this time shows Using index condition. After first_name LIKE '%zi' is pushed down to the storage engine, only the 1 required record is read from the data table.

Turn off ICP:

set optimizer_switch = 'index_condition_pushdown=off';

View parameters:

show variables like 'optimizer_switch';

Execute the following SQL. Extra now shows Using where:

explain select * from employees where last_name='wang' and first_name LIKE '%zi';

Using where means that not all of the data retrieved from the storage engine meets the conditions, so it has to be filtered at the Server layer.
The last_name condition is used first to scan the index range and read the table records, and each record is then checked against the first_name LIKE '%zi' condition. Here, only one of the three records meets the condition.

Index creation and use

Because indexes play a huge role in improving query performance, our goal is to use indexes as much as possible.

Index creation

  1. Create indexes on the fields used in where conditions, order by sorting, join (on) conditions, and group by.
  2. Do not create too many indexes (generally no more than 10): they waste space and slow down updates.
  3. Fields with low dispersion (discrimination), such as gender, should not be indexed: too many rows are scanned, and the optimizer may not use the index at all.
  4. Frequently updated values should not be used as primary keys or indexed: they cause page splits.
  5. In a composite index, put the column with the highest dispersion (discrimination) first.
  6. When a composite index will do, do not create a separate single-column index.
  7. Unordered values (such as ID card numbers or UUIDs) are not recommended as indexes: they are unordered and cause page splits.
  8. For fields that are too long, create a prefix index:
    CREATE TABLE `pre_test` (
    `content` varchar(20) DEFAULT NULL,
    KEY `pre_idx` (`content` (6))
    )ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
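
The article does not say how to pick the prefix length. One possible approach, offered here only as a suggestion, is to reuse the dispersion ratio from principle 1 and compare prefixes of different lengths against the full column. The `prefix_dispersion` helper and the sample values below are made up for this sketch.

```python
def prefix_dispersion(values, length):
    """Distinct prefixes of the given length divided by the row count."""
    values = list(values)
    if not values:
        return 0.0
    prefixes = {value[:length] for value in values}
    return len(prefixes) / len(values)


# Invented sample values for a `content` column.
contents = ["alpha-001", "alpha-002", "alphabet", "beta-001", "beta-002", "gamma"]

full = prefix_dispersion(contents, max(len(c) for c in contents))
for length in (2, 5, 8, 9):
    print(length, round(prefix_dispersion(contents, length), 2), "full:", full)
```

A prefix whose ratio is already close to that of the full column keeps most of the selectivity while storing fewer characters per index entry.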
    

When the index cannot be used

  1. A function (replace\SUBSTR\CONCAT\sum count avg), expression, or calculation (+ - * /) is applied to the indexed column:
explain SELECT * FROM `student` where id+1 = 4;
  2. A string value is not quoted, so an implicit conversion occurs:
ALTER TABLE user_innodb DROP INDEX comidx_name_phone;
ALTER TABLE user_innodb add INDEX comidx_name_phone (name,phone);

explain SELECT * FROM `user_innodb` where name = 136; -- index not used
explain SELECT * FROM `user_innodb` where name = '136'; -- index used

Note: the reverse case is fine. When the field type is int and the value in the where condition is wrapped in single quotes, the index can still be used. For example, where id = '123' can use the index.

  3. A like condition with % at the front. Index entries are compared from the leftmost character, so a pattern that starts with % gives the search no starting point, and the cost of filtering is too high for the index to be used. A full-text index can be considered in this case.
  4. Negative queries
    1. NOT LIKE cannot use the index:
    explain select * from employees where last_name not like 'wang'
    
    2. != (<>) and NOT IN can use the index in some cases:
    explain select * from employees where emp_no not in (1)
    explain select * from employees where emp_no <> 1
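
For the implicit-conversion case in the list above (where name = 136 on a string column), the underlying reason is that many different strings convert to the same number, and they are not adjacent in string order. The Python sketch below illustrates this. `loose_number` is a rough, hypothetical stand-in for a string-to-number conversion, not MySQL's actual rules, and the sample values are invented.

```python
import re

# Hypothetical stand-in for a loose string-to-number conversion: take the
# leading numeric part of the string and treat anything else as 0.
_LEADING_NUMBER = re.compile(r"\s*[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def loose_number(text):
    match = _LEADING_NUMBER.match(text)
    return float(match.group(0)) if match else 0.0


# A string index keeps values in string order (invented sample values).
name_index = sorted(
    ["136", " 136", "0136", "1.36e2", "100", "136abc", "137", "zhang"]
)

# where name = 136: every entry has to be converted before it can be compared.
numeric_hits = [i for i, s in enumerate(name_index) if loose_number(s) == 136]

# where name = '136': a plain string comparison, which the sort order supports.
string_hits = [i for i, s in enumerate(name_index) if s == "136"]

print(name_index)
print(numeric_hits, string_hits)
```

The numeric matches are scattered across the sorted list, with a non-matching value between them, so no single position in the index leads to all of them. The quoted comparison matches exactly one position.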
    

Note: whether a SQL statement uses an index depends on the database version, the data volume, and the selectivity of the data.

In the end, whether an index is used is decided by the optimizer.

What does the optimizer base its decision on?
It is cost-based (Cost-Based Optimizer), not rule-based (Rule-Based Optimizer), and not based on semantics. It chooses whichever plan has the lowest cost.

There are basic principles for using indexes, but no fixed rules. No rule says an index must be used in a given situation, and no rule says it must not be used in another.


Origin blog.csdn.net/nonage_bread/article/details/108432157