MySQL Advanced - Index

This article introduces the structure, syntax, and usage rules of MySQL indexes.

Index introduction

An index is an ordered data structure that helps MySQL retrieve data efficiently. In addition to the data itself, the database system maintains data structures that suit specific search algorithms. These structures reference (point to) the data in some way, so that advanced search algorithms can run on them. Such a data structure is an index.
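Advantages and disadvantages:

  • Advantages: faster data retrieval with lower disk I/O cost; sorting through an index lowers the CPU cost of sorting.
  • Disadvantages: indexes take up extra space; they speed up queries but slow down INSERT, UPDATE, and DELETE, because the index has to be maintained as well.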

index structure

MySQL indexes are implemented at the storage engine layer, and different storage engines support different index structures, mainly B+Tree (the most common), Hash, R-tree (spatial), and Full-text indexes.

binary tree

Disadvantages of binary tree:
When inserted sequentially, it will degenerate into a one-way linked list, and the performance will be greatly reduced. When the amount of data is large, the level of the binary tree is deep, and the speed of retrieving data is slow.
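
The degeneration described above can be reproduced with a small sketch. This is an illustrative, unbalanced binary search tree written for this article (the names Node, insert, height, and build are assumptions, not MySQL code); it compares the height after sequential insertion with the height after a mixed insertion order.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced binary search tree (iteratively)."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def height(root):
    """Number of levels in the tree, computed level by level."""
    level = 0
    frontier = [] if root is None else [root]
    while frontier:
        level += 1
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    return level


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


sequential = build(range(1, 8))        # keys 1..7 inserted in order
mixed = build([4, 2, 6, 1, 3, 5, 7])   # same keys, middle values first
print(height(sequential), height(mixed))
```

With sequential keys every node only has a right child, so the tree is as deep as the number of keys, which is the linked-list case.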

The shortcomings of a binary tree can be mitigated by a red-black tree, but with large data volumes a red-black tree also becomes deep and slow to search.

B-Tree

B-Tree (multi-way balanced search tree): take a B-Tree with a maximum degree (max-degree, the maximum number of child nodes per node) of 5 (order 5) as an example.

  • Each node stores up to 4 keys and 5 pointers.
  • Once the number of keys in a node reaches 5, the node splits, and the middle key moves up into the parent node.
  • In B-Tree, both non-leaf nodes and leaf nodes store data.
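
The split rule in the list above can be sketched for a single node. This is a simplified illustration, not a full B-Tree: the function insert_into_node and its return shape are assumptions made for this example, and parent handling is left out.

```python
import bisect

MAX_DEGREE = 5               # at most 5 child pointers per node
MAX_KEYS = MAX_DEGREE - 1    # so at most 4 keys per node


def insert_into_node(keys, key):
    """Insert key into one node's sorted key list.

    Returns (keys, None, None) if the node still fits, otherwise
    (left_keys, promoted_key, right_keys) after the node splits.
    """
    keys = sorted(keys)
    bisect.insort(keys, key)
    if len(keys) <= MAX_KEYS:
        return keys, None, None
    mid = len(keys) // 2     # the middle key moves up to the parent
    return keys[:mid], keys[mid], keys[mid + 1:]


print(insert_into_node([10, 20], 15))
print(insert_into_node([10, 20, 30, 40], 25))
```

The second call reaches 5 keys, so the node splits into two nodes of 2 keys each and the middle key is handed to the parent.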

B+Tree

B+Tree is a variant of B-Tree. Take a B+Tree with a maximum degree (max-degree) of 4 (order 4) as an example. Differences from B-Tree:

  • All data will appear in the leaf nodes
  • Leaf nodes form a singly linked list

MySQL's index structure is an optimized version of the classic B+Tree. On top of the original B+Tree, it adds linked-list pointers between adjacent leaf nodes, forming a B+Tree with sequential pointers, which improves the performance of range access.
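
Why linked leaves help range access can be shown with a minimal sketch. It models only the leaf level (the Leaf class, link, and range_scan are hypothetical names for this example) and starts from the first leaf, whereas a real lookup would first descend the tree to the leaf holding the lower bound.

```python
class Leaf:
    """A B+Tree leaf: sorted (key, row) pairs plus a pointer to the next leaf."""

    def __init__(self, entries):
        self.entries = entries
        self.next = None


def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(start_leaf, low, high):
    """Collect rows with low <= key <= high by walking the leaf chain."""
    rows = []
    leaf = start_leaf
    while leaf is not None:
        for key, row in leaf.entries:
            if key > high:
                return rows      # keys are sorted, so the scan can stop here
            if key >= low:
                rows.append(row)
        leaf = leaf.next
    return rows


first = link([
    Leaf([(1, "a"), (3, "b")]),
    Leaf([(5, "c"), (7, "d")]),
    Leaf([(9, "e"), (11, "f")]),
])
print(range_scan(first, 3, 9))
```

The scan never goes back up to inner nodes: once it is on the leaf level, following the next pointers is enough.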

Hash

The hash index uses a certain Hash algorithm to convert the key value into a new Hash value, maps it to the corresponding slot, and then stores it in the Hash table.

If two (or more) key values are mapped to the same slot, a hash conflict (also known as a hash collision) occurs, which can be resolved through a linked list.

Features:

  • Hash indexes can only be used for equality comparisons (=, in); range queries (between, >, <, ...) are not supported
  • Unable to complete the sort operation using the index
  • High query efficiency, usually (in the absence of hash conflicts), only one retrieval is required, and the efficiency is usually higher than that of B+Tree index
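
A toy hash index with chaining illustrates these features. This is a sketch under stated assumptions: the HashIndex class, its slot count, and the row identifiers are invented for illustration and do not mirror the MEMORY engine's implementation.

```python
class HashIndex:
    """Maps keys to row ids; colliding keys share a slot and are chained in a list."""

    def __init__(self, slots=4):
        self.buckets = [[] for _ in range(slots)]

    def _slot(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, row_id):
        self.buckets[self._slot(key)].append((key, row_id))

    def get(self, key):
        # Equality lookup: jump to one slot, then walk its (usually short) chain.
        return [rid for k, rid in self.buckets[self._slot(key)] if k == key]


idx = HashIndex(slots=4)
idx.put(1, "row-1")
idx.put(5, "row-5")    # 1 and 5 land in the same slot when there are 4 slots
idx.put(2, "row-2")
print(idx.get(5), len(idx.buckets[1]))
```

Because neighbouring keys end up in unrelated slots, the structure gives no help for range conditions or sorting; only exact matches are cheap.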


In MySQL, the MEMORY storage engine supports hash indexes. InnoDB has an adaptive hash index feature: under certain conditions, the InnoDB storage engine automatically builds a hash index on top of the B+Tree index.

index classification

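In the InnoDB storage engine, indexes can be divided into the following two types according to how they are stored:

  • Clustered index: the data is stored together with the index, and the leaf nodes of the index hold the row data. A table has exactly one.
  • Secondary index: the data is stored separately from the index, and the leaf nodes hold the corresponding primary key. A table can have several.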

Each leaf node of the clustered index stores the full data of the row.
Each leaf node of a secondary index stores the primary key value that corresponds to the indexed field value.


Take the query select * from user where name = 'Arm'; as an example. Because the query filters on the name field, MySQL first looks up name = 'Arm' in the secondary index on name. The secondary index only yields the primary key value 10 that corresponds to Arm.
Because the query returns *, MySQL then uses the primary key value 10 to search the clustered index and finds the row for 10.
Finally, it returns the data of that row.

Table-back query: looking up the secondary index first to find the primary key value, and then fetching the row from the clustered index by that primary key value, is called a table-back query (going back to the table).
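
The two-step lookup can be modelled with two dictionaries. This is a sketch only: the sample rows, the gender values, and the names clustered, secondary_name, and select_all_by_name are hypothetical and stand in for the two B+Trees.

```python
# Clustered index: primary key -> full row (hypothetical sample data).
clustered = {
    1: {"id": 1, "name": "Tom", "gender": "M"},
    10: {"id": 10, "name": "Arm", "gender": "F"},
}

# Secondary index on name: field value -> primary key values.
secondary_name = {"Tom": [1], "Arm": [10]}


def select_all_by_name(name):
    """Model of: select * from user where name = ?  Returns (rows, lookups)."""
    lookups = 1                              # step 1: search the secondary index
    rows = []
    for pk in secondary_name.get(name, []):
        lookups += 1                         # step 2: back to the clustered index
        rows.append(clustered[pk])
    return rows, lookups


def select_all_by_id(pk):
    """Model of: select * from user where id = ?  One clustered index lookup."""
    row = clustered.get(pk)
    return ([row] if row else []), 1


print(select_all_by_name("Arm"))
print(select_all_by_id(10))
```

Both calls return the same row, but the lookup by name touches two index structures while the lookup by id touches one.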

Clustered index selection rules:

  • If there is a primary key, the primary key index is a clustered index
  • If no primary key exists, the first unique (UNIQUE) index will be used as the clustered index
  • If the table has neither a primary key nor a suitable unique index, InnoDB automatically generates a rowid as a hidden clustered index

thinking questions

  1. Which of the following SQL statements is more efficient? Why?
select * from user where id = 10;
select * from user where name = 'Arm';
-- Note: id is the primary key, and the name field has an index

Answer: The first statement. The second has to go back to the table: it searches the secondary index on name and then the clustered index, which amounts to two steps.

  2. What is the B+Tree height of the InnoDB primary key index?

Answer: Assume each row is 1 KB, so one 16 KB page can hold 16 rows. An InnoDB pointer takes 6 bytes, and the primary key is assumed to be a bigint, which takes 8 bytes. This gives the formula n * 8 + (n + 1) * 6 = 16 * 1024, where 8 is the number of bytes taken by a bigint, n is the number of keys stored in the current node, and (n + 1) is the number of pointers (one more than the number of keys). Solving gives n ≈ 1170.

If the height of the tree is 2, it can store roughly 1171 * 16 = 18736 rows;
If the height of the tree is 3, it can store roughly 1171 * 1171 * 16 = 21939856 rows.
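
The arithmetic above can be checked with a short script. It uses the same assumptions as the answer (16 KB page, 1 KB rows, 8-byte bigint key, 6-byte pointer); the constant and function names are chosen for this example, and n is rounded to 1170 as in the text.

```python
PAGE_SIZE = 16 * 1024   # bytes in one InnoDB page (assumed 16 KB)
KEY_SIZE = 8            # bigint primary key
POINTER_SIZE = 6        # InnoDB pointer
ROW_SIZE = 1024         # assumed size of one row

# Solve n * KEY_SIZE + (n + 1) * POINTER_SIZE = PAGE_SIZE for n,
# rounding to the nearest integer as the article does.
KEYS_PER_NODE = round((PAGE_SIZE - POINTER_SIZE) / (KEY_SIZE + POINTER_SIZE))
FANOUT = KEYS_PER_NODE + 1              # pointers per non-leaf node
ROWS_PER_LEAF = PAGE_SIZE // ROW_SIZE   # rows stored in one leaf page


def capacity(height):
    """Approximate number of rows a tree of the given height can hold."""
    return FANOUT ** (height - 1) * ROWS_PER_LEAF


print(KEYS_PER_NODE, capacity(2), capacity(3))
```

Each extra level multiplies the capacity by the fan-out of about 1171, which is why a height of 3 already covers roughly 22 million rows under these assumptions.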

In addition, if a table holds far more rows than this, splitting it into multiple tables should be considered, which touches on operations and maintenance.

Syntax

Create an index:
CREATE [ UNIQUE | FULLTEXT ] INDEX index_name ON table_name (index_col_name, ...);
If no index type is given after CREATE, a regular index is created.

View index:
SHOW INDEX FROM table_name;

Delete the index:
DROP INDEX index_name ON table_name;

# name holds a person's name, and its values may repeat; create an index on name
CREATE INDEX idx_user_name ON tb_user(name);

# phone values are non-null and unique; create a unique index on phone
CREATE UNIQUE INDEX idx_user_phone ON tb_user(phone);

# Create a joint index on profession, age, and status
CREATE INDEX idx_user_pro_age_status ON tb_user(profession,age,status);

# Create a suitable index on email to speed up queries
CREATE INDEX idx_user_email ON tb_user(email);

# Drop the idx_user_email index
DROP INDEX idx_user_email ON tb_user;

SQL performance analysis

SQL execution frequency

After the MySQL client connects successfully, the server status information can be provided through the show [session|global] status command. Through the following command, you can view the access frequency of INSERT, UPDATE, DELETE, and SELECT of the current database:

# session shows data for the current session;
# global shows server-wide data;
SHOW GLOBAL STATUS LIKE 'Com_______';

The result shows whether the database mainly serves queries or mainly serves inserts, updates, and deletes, which provides a reference for optimization. If it is write-heavy, index optimization can take lower priority. If it is query-heavy, optimizing the indexes is worth considering.

slow query log

The slow query log records all SQL statements whose execution time exceeds a configured threshold (long_query_time, in seconds; the default is 10 seconds).

MySQL's slow query log is not enabled by default; the slow_query_log system variable shows whether it is on.

To enable the slow query log, add the following to the MySQL configuration file (/etc/my.cnf):

# Enable the slow query log
slow_query_log=1
# Set the threshold to 1 second
long_query_time=1

After changing the configuration, restart the MySQL server, then check what has been recorded in the slow log file:

# Restart the MySQL server
systemctl restart mysqld

# View the slow query log
cat /var/lib/mysql/localhost-slow.log

Profile details

SHOW PROFILES helps show where time is spent when optimizing SQL. The have_profiling parameter indicates whether the current MySQL supports profiling. Profiling is off by default and can be turned on for the current session with SET profiling = 1;

SELECT @@have_profiling;
# View the basic time cost of each SQL statement
SHOW PROFILES;

# View the time spent in each stage of the statement with the given query_id
SHOW PROFILE FOR QUERY query_id;

# View the CPU usage of the statement with the given query_id
SHOW PROFILE CPU FOR QUERY query_id;

EXPLAIN execution plan

The EXPLAIN or DESC command obtains information about how MySQL executes the SELECT statement, including how tables are joined and the order in which they are joined during the execution of the SELECT statement.

# Put the keyword explain / desc directly before the select statement
EXPLAIN SELECT column_list FROM table_name WHERE condition;


Index Usage Rules

leftmost prefix rule

If an index covers multiple columns (a joint index), the leftmost prefix rule applies: the query must start from the leftmost column of the index and must not skip columns in it. If a column is skipped, the index is only partially used: the index columns after the skipped one cannot be used.

If a range condition (<, >) appears on a column of a joint index, the index columns to the right of that column cannot be used. Where the business logic allows, using >= or <= instead avoids this problem.
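
The rule can be expressed as a small function. This is a simplified model of the rule as stated above, not the MySQL optimizer: usable_prefix and its predicate format are assumptions for this example, and only =, > and < are modelled. EXPLAIN remains the way to see what MySQL really does.

```python
def usable_prefix(index_columns, predicates):
    """Return the index columns a query can use under the leftmost prefix rule.

    index_columns: columns of the joint index, in index order.
    predicates: dict mapping a column name to its operator ('=', '>' or '<').
    """
    used = []
    for col in index_columns:
        op = predicates.get(col)
        if op is None:
            break                # column skipped: everything to the right is unusable
        used.append(col)
        if op in (">", "<"):
            break                # range condition: columns to the right are unusable
    return used


idx = ["profession", "age", "status"]   # columns of idx_user_pro_age_status
print(usable_prefix(idx, {"profession": "=", "age": "=", "status": "="}))
print(usable_prefix(idx, {"profession": "=", "status": "="}))
print(usable_prefix(idx, {"age": "=", "status": "="}))
print(usable_prefix(idx, {"profession": "=", "age": ">", "status": "="}))
```

Note that the order of conditions in the WHERE clause does not matter in this model; only which index columns have a condition does.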

Index failure

Index column operations

Do not apply functions or calculations to an indexed column in a condition; doing so prevents the index from being used.

EXPLAIN SELECT * FROM tb_user WHERE SUBSTRING(phone,10,2) = '15';

string without quotes

If a string column is compared with a value that is not wrapped in quotes, MySQL performs an implicit type conversion and the index is not used.

explain select * from tb_user where profession = '软件工程' and age = 31 and status = '0';

explain select * from tb_user where profession = '软件工程' and age = 31 and status = 0;

explain select * from tb_user where phone = '17799990015';

explain select * from tb_user where phone = 17799990015;

fuzzy query

If only the tail of the pattern is a wildcard (a prefix match), the index is still used. If the pattern starts with a wildcard, the index is not used.

explain select * from tb_user where profession like '软件%';# index used

explain select * from tb_user where profession like '%工程';# index not used

explain select * from tb_user where profession like '%工%';# index not used
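
The reason lies in the sort order of the index. The sketch below uses a sorted Python list as a stand-in for an index (the sample values and function names are hypothetical, and Python's string ordering is not a MySQL collation): a prefix can be located by binary search, while a suffix forces a look at every entry.

```python
import bisect

# A sorted list stands in for the ordered keys of an index (hypothetical values).
professions = sorted([
    "software engineering",
    "civil engineering",
    "software testing",
    "communication engineering",
    "english",
])


def prefix_search(sorted_values, prefix):
    """LIKE 'prefix%': binary search to the first match, then read neighbours."""
    i = bisect.bisect_left(sorted_values, prefix)
    matches = []
    while i < len(sorted_values) and sorted_values[i].startswith(prefix):
        matches.append(sorted_values[i])
        i += 1
    return matches


def suffix_search(values, suffix):
    """LIKE '%suffix': sort order does not help, so every value is checked."""
    return [v for v in values if v.endswith(suffix)]


print(prefix_search(professions, "software"))
print(suffix_search(professions, "engineering"))
```

Values sharing a prefix sit next to each other in sorted order, whereas values sharing a suffix are scattered across the whole list.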

OR conditions

For conditions joined by or, the indexes are used only if every column involved has an index. If the column before or has an index but the column after it does not, none of the involved indexes are used.

explain select * from tb_user where id = 10 or age = 23;

explain select * from tb_user where phone = '17799990017' or age = 23;

Because age has no index, the indexes on id and phone are not used either. To fix this, create an index on age.

Data Distribution Impact

select * from tb_user where phone >= '17799990005';

select * from tb_user where phone >= '17799990015';

When executing a query, MySQL compares the estimated cost of using the index with that of a full table scan. If the full table scan is estimated to be faster, it abandons the index and scans the table. An index is meant to locate a small portion of the data; if the query would return most of the table through the index, a full table scan is faster, so the index is not used.

SQL hints

SQL hints are an important tool for optimizing the database. Put simply, a hint is added to the SQL statement by hand to steer the optimizer.

Suggest an index (use index):
explain select * from tb_user use index(idx_user_pro) where profession="软件工程";
Ignore an index (ignore index):
explain select * from tb_user ignore index(idx_user_pro) where profession="软件工程";
Force an index (force index):
explain select * from tb_user force index(idx_user_pro) where profession="软件工程";

use index is only a suggestion: MySQL still weighs the cost and may choose a different index. force index makes MySQL use the specified index.

Covering index & query back to the table

Prefer covering indexes (the query uses an index, and every column it needs to return can be found in that index), and cut down on select *.

The meaning of the Extra field in explain:
Using index condition: the lookup uses the index, but has to go back to the table to fetch data
Using where; Using index: the lookup uses the index, and all required data can be found in the index columns, so there is no need to go back to the table

  • If the row can be found directly in the clustered index, the row data is returned directly and only one lookup is required, even for select *;
  • If the query goes through a secondary index and only needs the primary key and the indexed column, for example select id, name from xxx where name='xxx';, the id is found through the secondary index on name and both values come from that index, so only one lookup is required;
  • If the query goes through a secondary index but needs other fields, it has to go back to the table, for example select id, name, gender from xxx where name='xxx';

So avoid select * where possible: it easily leads to a table-back query and lower efficiency, unless there is a joint index that contains all the fields.

Interview question: a table has four fields (id, username, password, status). Because of the large amount of data, the following SQL statement needs to be optimized. What is the best approach?
select id, username, password from tb_user where username='itcast';

Solution: create a joint index on the username and password fields. Assuming id is the primary key, it is already stored in the leaf nodes of that index, so the index covers the query and no table-back query is needed.
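
Whether a query is covered can be reasoned about as a set check. The helper below is an illustrative model (needs_table_lookup is a name invented here, and id is assumed to be the primary key): a secondary index offers its own columns plus the primary key, and anything else requires going back to the table.

```python
def needs_table_lookup(index_columns, primary_key, selected_columns):
    """True if a query using this secondary index must go back to the table.

    A secondary index leaf holds the indexed columns and the primary key,
    so the query is covered only if it selects nothing beyond those.
    """
    available = set(index_columns) | set(primary_key)
    return not set(selected_columns) <= available


wanted = ["id", "username", "password"]
print(needs_table_lookup(["username"], ["id"], wanted))              # index on username only
print(needs_table_lookup(["username", "password"], ["id"], wanted))  # joint index
```

With the single-column index, password is missing from the index, so each match costs an extra clustered index lookup; the joint index removes that step.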

prefix index

When the field type is a string (varchar, text, etc.), it is sometimes necessary to index very long strings, which makes the index large, wastes disk I/O during queries, and hurts query efficiency. In that case, index only a prefix of the string, which saves a great deal of index space and improves index efficiency.

Syntax: create index idx_xxxx on table_name(column(n));

Prefix length: choose it based on the selectivity of the index, which is the ratio of distinct index values (cardinality) to the total number of records in the table. The higher the selectivity, the more efficient the query. A unique index has a selectivity of 1, which is the best possible selectivity and performance.
Queries to compute selectivity:

select count(distinct email) / count(*) from tb_user;
select count(distinct substring(email, 1, 5)) / count(*) from tb_user;

The Sub_part column in the SHOW INDEX output shows the prefix length that was indexed.
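
The same selectivity calculation can be tried offline to get a feel for choosing a prefix length. The email addresses below are made up for illustration, and selectivity is a helper defined for this sketch; on a real table, the two SQL queries above are the way to measure it.

```python
def selectivity(values, prefix_len=None):
    """Distinct (optionally prefix-truncated) values divided by total rows."""
    keys = values if prefix_len is None else [v[:prefix_len] for v in values]
    return len(set(keys)) / len(values)


# Hypothetical sample data standing in for tb_user.email.
emails = [
    "alice@example.com",
    "alina@example.com",
    "bob@example.com",
    "bobby@example.com",
    "carol@example.com",
]

for n in (2, 4):
    print(n, selectivity(emails, n))
print("full", selectivity(emails))
```

Here a prefix of 4 characters already reaches the selectivity of the full column, so a longer prefix would only make the index bigger.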

Single Column Index & Joint Index

Single-column index: an index that contains only one column
Joint index: an index that contains multiple columns
In business scenarios where a query has multiple conditions, prefer a joint index over several single-column indexes when indexing the queried fields.

With single-column indexes only:
explain select id, phone, name from tb_user where phone = '17799990010' and name = '韩信';
This statement uses only the index on phone.

Note

  • When a query has multiple conditions, the MySQL optimizer evaluates which index is more efficient and chooses that index to run the query

Design Principles

  1. Create indexes for tables with large amounts of data and frequent queries.
  2. Create indexes for fields that are often used as query conditions (WHERE), sorting (ORDER BY), and grouping (GROUP BY) operations.
  3. Prefer columns with high selectivity as index columns, and build unique indexes where possible. The higher the selectivity, the more efficient the index.
  4. For string fields with long values, build a prefix index based on the characteristics of the field.
  5. Prefer joint indexes over single-column indexes. A joint index can often act as a covering index, which saves storage space, avoids going back to the table, and improves query efficiency.
  6. Keep the number of indexes under control. More indexes are not always better: the more indexes there are, the higher the cost of maintaining them, which slows down inserts, deletes, and updates.
  7. If an indexed column cannot store NULL values, constrain it with NOT NULL when creating the table. When the optimizer knows whether each column contains NULL values, it can better determine which index is most efficient to use for queries.


Origin blog.csdn.net/baidu_33256174/article/details/130673664