[MySQL] MySQL Learning Tutorial (4): Index Failure Scenarios

1. Reasons for MySQL index failure

An index can fail to be used for the following reasons:

  1. The query condition contains OR, which may cause index failure
  2. Implicit type conversion may cause index failure
  3. A LIKE pattern that starts with the wildcard "%" causes index failure
  4. Applying MySQL built-in functions to an indexed column invalidates the index
  5. Performing arithmetic on an indexed column (such as +, -, *, /) invalidates the index
  6. Negative conditions (NOT, !=, <>, NOT IN, NOT LIKE) on an indexed field may cause index failure
  7. The indexed field is nullable: using IS NULL or IS NOT NULL may cause index failure
  8. The query violates the leftmost-prefix matching principle (for a joint index, if the condition does not include the first column of the index, the index is not used)
  9. The joined fields in a LEFT JOIN or RIGHT JOIN use different character encodings, which may cause index failure
  10. MySQL estimates that a full table scan is faster than using the index, so the index is not used

1.1 Query conditions containing or may cause index failure

userId has an index and age does not: the index is not used.

select * from user where userId = '123' or age = 18

[Explanation]: Even if MySQL uses the index on userId, it still needs a full table scan to evaluate the age condition, so the query would take three steps: index scan + full table scan + merge. A single full table scan from the start is cheaper. For the sake of cost, the MySQL optimizer therefore abandons the index when it meets such an OR condition, which is reasonable.

[Note]: If every column in the OR condition is indexed, the index may or may not be used; the optimizer decides based on cost.

If the OR condition does not use the index, consider splitting the statement into two SQL queries.

[Optimization]: Rewrite the query with UNION or UNION ALL, splitting it into multiple SQL statements that each use their own index.

The difference between UNION and UNION ALL:
  1. Duplicate handling: UNION removes duplicate rows; UNION ALL does not
  2. Extra processing: to remove duplicates, UNION has to compare the rows of both result sets (MySQL typically uses a temporary table for this); UNION ALL simply concatenates the two result sets. Neither guarantees the order of the result without an ORDER BY
  3. Efficiency: because of the deduplication work, UNION is slower than UNION ALL
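
As a sketch of this rewrite, the OR query from the start of this section can be split as follows. It assumes an index is also added on age (the name idx_age is hypothetical and not part of the original example); without it, the second branch still scans the whole table.

```sql
-- Assumption: age is not indexed in the original example, so add an index first.
-- idx_age is a hypothetical name.
CREATE INDEX idx_age ON user (age);

-- Each branch can now use its own index.
-- UNION (rather than UNION ALL) keeps a row that matches both
-- conditions from appearing twice in the result.
SELECT * FROM user WHERE userId = '123'
UNION
SELECT * FROM user WHERE age = 18;
```

If the two branches can never return the same row, UNION ALL gives the same result without the deduplication cost.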

1.2 Implicit type conversion may cause index failure

The MySQL 8.0 official documentation describes how comparison operations convert their operands. The rules relevant here are:

  1. If both arguments in the comparison operation are strings, they are compared as strings
  2. If both arguments are integers, compare them as integers
  3. In all other cases, the arguments are compared as floating-point (double-precision) numbers. For example, a comparison of string and numeric operands is performed as a comparison of floating-point numbers

The num_int field is of type int and the num_str field is of type varchar; each has its own index.

# int type
SELECT * FROM `test1` WHERE num_int = 10000;
SELECT * FROM `test1` WHERE num_int = '10000';
# varchar type
SELECT * FROM `test1` WHERE num_str = 10000;
SELECT * FROM `test1` WHERE num_str = '10000';

With a large data volume, the first, second, and fourth SQL statements take almost the same time to execute, while the third does not use the index and takes longer.

According to the documentation, implicit conversion occurs in the second and third statements: the operands on both sides of "=" are converted to floating-point numbers and then compared.

The second sql:

SELECT * FROM `test1` WHERE num_int = '10000';

The left side is of type int, and only the int value 10000 converts to the floating-point number 10000; the right side is the varchar '10000', which also converts to the floating-point number 10000. The conversion is unique and deterministic on both sides, so index use is not affected.

The third sql:

SELECT * FROM `test1` WHERE num_str = 10000;

The left side is of type varchar, and '10000' is not the only string that converts to the floating-point number 10000: other strings such as '10000a' and '010000' convert to 10000 as well. The right side is of type int and converts to the floating-point number 10000. Because many different varchar values map to the same floating-point number, MySQL cannot use the index and has to convert and compare every row.
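
The many-to-one conversion can be checked directly. This is a minimal sketch that uses only literals; MySQL also raises truncation warnings for the strings that are not clean numbers.

```sql
SELECT '10000a' = 10000,  -- 1: '10000a' is truncated to 10000
       '010000' = 10000,  -- 1: the leading zero does not change the numeric value
       'abc'    = 0;      -- 1: a string with no leading digits converts to 0
```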

Summary
  1. When the data types on the two sides of an operator differ, implicit conversion occurs
  2. When the queried field is a numeric type, implicit conversion has little impact on efficiency, but it is still not recommended
  3. When the queried field is a character type, implicit conversion invalidates the index and forces an inefficient full table scan
  4. When a string is converted to a numeric type, a string that starts with a non-digit is converted to 0, and a string that starts with a digit is converted using the characters from the first character up to the first non-numeric character

Therefore, we must develop good habits when writing SQL. Whatever type of field is being queried, the condition on the right side of the equal sign should be written as the corresponding type. Especially when the query field is a string, the condition on the right side of the equal sign must be enclosed in quotation marks to indicate that it is a string, otherwise it will cause the index to fail and trigger a full table scan.
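
One way to see the difference is to compare the two plans with EXPLAIN. This is only a sketch: the exact output depends on the data and the MySQL version, but the unquoted form would typically show a full scan (type ALL) and the quoted form an index lookup (type ref).

```sql
-- Number compared with a varchar column: the column values are implicitly converted
EXPLAIN SELECT * FROM test1 WHERE num_str = 10000;

-- String compared with a varchar column: no conversion, so the index can be used
EXPLAIN SELECT * FROM test1 WHERE num_str = '10000';
```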

1.3 A like wildcard may cause index failure

select * from user where userId like '%123';
  • % on the left ('%123'): matches the end of the string. The index is not ordered by string endings, so the lookup cannot follow index order and the index is not used.
  • % on the right ('123%'): the B+ tree index orders strings starting from their first characters, and a trailing % leaves the prefix fixed, so MySQL can search the B+ tree in order and use the index.
  • % on both sides ('%123%'): matches the pattern at any position. The index is ordered only from the first character onward, so other positions are effectively unordered and the index is not used.

Question:

What if the wildcard must be on the left and the index must still be used?

Solution: use a covering index

For example: Create a composite index: (name, age)

select name
from test
where name like '%5'
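
A sketch of this setup, with idx_name_age as a hypothetical index name. Because the query selects only name, MySQL can answer it from the index alone. It still has to scan the whole index to evaluate the leading %, but it avoids reading the table rows, which is usually cheaper than a full table scan.

```sql
-- idx_name_age is a hypothetical name for the composite index (name, age)
CREATE INDEX idx_name_age ON test (name, age);

-- Only indexed columns are selected, so the index covers the query.
-- EXPLAIN would typically report type = index and "Using index" in Extra.
EXPLAIN SELECT name FROM test WHERE name LIKE '%5';
```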

1.4 When using MySQL's built-in functions on index columns, the index becomes invalid.

The create_time field is indexed.

# Index is not used
select count(*) from tradelog where month(create_time) = 7;

# Index is used
select count(*) from tradelog where create_time = '2018-7-1';

[Figure: create_time index diagram. The number above each box is the month() value for that entry.]

If the condition in your SQL statement is where create_time = '2018-7-1', the engine follows the green arrow route in the diagram and quickly locates the rows with create_time = '2018-7-1'.

In fact, the fast positioning capability provided by the B+ tree comes from the orderliness of sibling nodes on the same layer.

If the condition is computed with the month() function, however, then when 7 is passed in, MySQL does not know which way to go even at the first level of the tree.

That is, applying a function to an indexed field may destroy the ordering of the index values, so the optimizer gives up the tree search.
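
If the goal is to count the rows of one specific month, the function can be avoided by comparing the raw column against a range. This sketch assumes only July 2018 is wanted, which is narrower than month(create_time) = 7; covering July of every year would need one range per year, joined with OR.

```sql
-- A range on the bare column keeps the index ordering usable
SELECT COUNT(*)
FROM tradelog
WHERE create_time >= '2018-07-01'
  AND create_time <  '2018-08-01';
```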

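1.5 When performing arithmetic on index columns (such as +, -, *, /), the index becomes invalid.

The reason is the same as in 1.4: the index stores the original column values, not the results of the expression.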
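
For instance, reusing the indexed num_int column of test1 from section 1.2 (the specific values are illustrative), moving the arithmetic to the constant side leaves the column bare:

```sql
-- Arithmetic on the column: the index on num_int cannot be used for a lookup
SELECT * FROM test1 WHERE num_int + 1 = 10001;

-- Equivalent condition with the column left untouched: the index can be used
SELECT * FROM test1 WHERE num_int = 10001 - 1;
```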

1.6 When using negative queries (NOT, !=, <>, NOT IN, NOT LIKE) on index fields, it may cause index failure

This also comes down to the MySQL optimizer. If the optimizer estimates that it would still have to scan a large number of rows even with the index, it judges the index not cost-effective and skips it.

1.7 The index field can be null. Using is null or is not null may cause the index to fail.

  1. Nullable columns make indexes, index statistics, and value comparisons more complex and harder for MySQL to optimize
  2. NULL requires special handling inside MySQL, which makes record processing more complex; other things being equal, a table with many NULL fields is noticeably slower to process
  3. NULL values require more storage: a nullable column needs extra space in every row, in both the table and the index, to mark whether it is NULL
  4. NULL can only be tested with is null or is not null, not with operators such as in, <, <>, !=, or not in
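
The last point can be seen with literals alone. In this small sketch, every ordinary comparison against NULL yields NULL rather than true or false:

```sql
SELECT NULL = NULL,    -- NULL, not 1
       NULL <> 1,      -- NULL
       1 IN (NULL),    -- NULL
       NULL IS NULL;   -- 1
```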

1.8 Violates the leftmost matching principle of the index

When MySQL builds a joint index, it follows the leftmost-prefix matching principle. A joint index takes effect only when the query conditions satisfy this principle, that is, when they start from the leftmost column of the index.

Composite index (name, age, pos)

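# Case 1: the index is used
select name
from test
where name = 'zzc' and age = 22;

# Case 2: the index cannot be used for the lookup (the leftmost column name is missing)
select name
from test
where age = 22 and pos = '11';

# Case 3: the name part of the index is used; the pos part is not
select name
from test
where name = 'zzc' and pos = '11';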
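
To check these three cases yourself, a sketch like the following can be used; idx_name_age_pos is a hypothetical index name. In EXPLAIN output, key_len indicates how many leading columns of the composite index are used for the lookup.

```sql
-- idx_name_age_pos is a hypothetical name for the composite index
CREATE INDEX idx_name_age_pos ON test (name, age, pos);

-- Case 1: name and age are both used for the lookup
EXPLAIN SELECT name FROM test WHERE name = 'zzc' AND age = 22;

-- Case 2: no condition on name, so no lookup by index prefix is possible
EXPLAIN SELECT name FROM test WHERE age = 22 AND pos = '11';

-- Case 3: only the name prefix is used for the lookup; pos is filtered afterwards
EXPLAIN SELECT name FROM test WHERE name = 'zzc' AND pos = '11';
```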

1.9 The encoding formats of fields associated with left join queries or right join queries are different, which may lead to index failure.
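
A sketch of how this can happen, using hypothetical tables t_order (order_no stored as utf8mb4) and t_item (order_no stored as utf8 and indexed). When the two sides of the join condition use different character sets, MySQL converts the utf8 side to utf8mb4 before comparing, which acts like a function applied to the indexed column (see 1.4).

```sql
-- Hypothetical schema: t_order.order_no is utf8mb4; t_item.order_no is utf8 with an index.
-- The join condition is evaluated roughly as
--   CONVERT(i.order_no USING utf8mb4) = o.order_no
-- so the index on i.order_no cannot be used for the lookup.
SELECT i.*
FROM t_order o
LEFT JOIN t_item i ON i.order_no = o.order_no
WHERE o.id = 1;

-- One possible fix: convert the other side instead, leaving the indexed column bare.
SELECT i.*
FROM t_order o
LEFT JOIN t_item i ON i.order_no = CONVERT(o.order_no USING utf8)
WHERE o.id = 1;
```

Aligning the character sets of both columns removes the conversion altogether; the CONVERT rewrite assumes that every order_no value can be represented in utf8.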

1.10 MySQL estimates that a full table scan is faster than using an index, so the index is not used.

Solutions:

  • Use force index to force the choice of an index
  • Rewrite your SQL to steer the optimizer toward the index you expect
  • Optimize your business logic
  • Optimize your indexes: create more suitable ones or delete misleading ones
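
As a sketch of the first option, reusing the user table from 1.1 with a hypothetical index name idx_userId:

```sql
-- idx_userId is a hypothetical name for the index on userId.
-- FORCE INDEX tells the optimizer to treat a table scan as very expensive,
-- so it uses the named index whenever the index can serve the query.
SELECT * FROM user FORCE INDEX (idx_userId) WHERE userId = '123';
```

Forcing an index bypasses the optimizer's cost estimate, so it is worth confirming with EXPLAIN and real timings that the forced plan is actually faster.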

Origin blog.csdn.net/sco5282/article/details/135081370