SQL indexes in MySQL


1. Index overview

Introduction

An index is an ordered data structure that helps MySQL retrieve data efficiently. In addition to the data itself, the database system maintains data structures that satisfy specific search algorithms. These structures reference (point to) the data in some way, so that advanced search algorithms can be run on them. Such a data structure is an index.

demo

Remarks: The binary tree index structure shown for this demo is only a schematic diagram, not a real index structure.

Advantages and disadvantages

[Image: advantages and disadvantages of indexes]

2. Index structure

MySQL indexes are implemented at the storage engine layer, and different storage engines support different index structures. Unless otherwise specified, the index we usually refer to is the one organized as a B+tree.

binary tree

Disadvantages of a binary tree: when keys are inserted in sequential order, the tree degenerates into a linked list and query performance drops sharply. With a large amount of data, the tree becomes deep and retrieval is slow.
Red-black tree: with a large amount of data, the tree is still deep and retrieval is slow.
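The degeneration described above can be reproduced with a small sketch. It assumes a plain binary search tree with no rebalancing; the class and function names are made up for illustration and have nothing to do with MySQL internals.

```python
# Minimal unbalanced binary search tree, used only to measure its height.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key without any rebalancing and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def height(root):
    """Number of levels in the tree, counted level by level."""
    levels = 0
    frontier = [root] if root is not None else []
    while frontier:
        levels += 1
        frontier = [child for node in frontier
                    for child in (node.left, node.right) if child is not None]
    return levels


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


sequential = build(range(1, 101))             # keys arrive in sorted order
mixed = build([50, 25, 75, 12, 37, 62, 87])   # keys arrive in a balanced order

print(height(sequential))  # 100: every node has only a right child
print(height(mixed))       # 3 levels for 7 keys
```

With sorted input, a lookup for the last key has to walk through all 100 nodes, which is the linked-list behaviour the text warns about.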

B-Tree (multi-way balanced search tree)

Take a B-Tree with a maximum degree (max-degree) of 5 (order 5) as an example: each node can store up to 4 keys and 5 pointers.

Tip: the degree of a tree is the number of child nodes of a node.

Insert the following data as an example: 100 65 169 368 900 556 780 35 215 1200 234 888 158 90 1000 88 120 268 250.
The step-by-step insertion process can be viewed at: https://www.cs.usfca.edu/~galles/visualization/BTree.html

B+Tree

Take a B+Tree with a maximum degree (max-degree) of 4 (order 4) as an example:

Insert the following data as an example: 100 65 169 368 900 556 780 35 215 1200 234 888 158 90 1000 88 120 268 250.

Differences from B-Tree:

① All data appears in the leaf nodes.
② The leaf nodes form a singly linked list.

The MySQL index data structure optimizes the classic B+Tree. On the basis of the original B+Tree, a linked list pointer pointing to the adjacent leaf nodes is added to form a B+Tree with sequential pointers, which improves the performance of interval access.
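To see why linked leaves help interval access, here is a rough sketch. It models only the leaf level of a B+Tree as a chain of small sorted pages and uses a binary search as a stand-in for descending the upper levels; the `Leaf` class, the page size of 3, and the sample keys are assumptions made for illustration.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this leaf page
        self.next = None   # sequential pointer to the next leaf


def build_leaves(sorted_keys, per_leaf):
    """Split sorted keys into leaf pages and link neighbouring pages."""
    leaves = [Leaf(sorted_keys[i:i + per_leaf])
              for i in range(0, len(sorted_keys), per_leaf)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves, low, high):
    """Return all keys in [low, high] by locating one leaf, then following next pointers."""
    # Stand-in for the tree descent: first leaf whose largest key is >= low.
    last_keys = [leaf.keys[-1] for leaf in leaves]
    start = bisect_left(last_keys, low)
    leaf = leaves[start] if start < len(leaves) else None
    result = []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result      # past the interval, stop scanning
            if key >= low:
                result.append(key)
        leaf = leaf.next           # move sideways, no new descent from the root
    return result


keys = sorted([100, 65, 169, 368, 900, 556, 780, 35, 215, 1200,
               234, 888, 158, 90, 1000, 88, 120, 268, 250])
leaves = build_leaves(keys, 3)

print(range_scan(leaves, 100, 250))  # [100, 120, 158, 169, 215, 234, 250]
```

The scan touches only the leaves that overlap the interval, which is the benefit the sequential pointers are meant to provide.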

Hash

The hash index uses a certain hash algorithm to convert the key value into a new hash value, maps it to the corresponding slot, and then stores it in the hash table.
If two (or more) key values are mapped to the same slot, a hash conflict (also known as a hash collision) occurs, which can be resolved with a linked list.

Hash index features

  1. Hash indexes can only be used for equality comparisons (=, in); range queries (between, >, <, ...) are not supported.
  2. The index cannot be used to complete sort operations.
  3. Query efficiency is high: usually only one lookup is needed (when there are no hash collisions), which is usually faster than a B+tree index.
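The features listed above can be illustrated with a toy hash index. This is a simplified sketch, not how the Memory engine is implemented; the `HashIndex` class, the slot count, and the sample ages are invented for the example.

```python
class HashIndex:
    """Toy hash index: key -> row ids, with chaining to resolve collisions."""

    def __init__(self, slots=8):
        self.slots = [[] for _ in range(slots)]

    def _slot(self, key):
        return self.slots[hash(key) % len(self.slots)]

    def add(self, key, row_id):
        self._slot(key).append((key, row_id))

    def find_equal(self, key):
        # Only one slot is inspected, however many keys are stored.
        return [row_id for k, row_id in self._slot(key) if k == key]

    def find_range(self, low, high):
        # Slot order is unrelated to key order, so every slot must be visited.
        return sorted(row_id for slot in self.slots
                      for k, row_id in slot if low <= k <= high)


index = HashIndex()
for row_id, age in enumerate([31, 23, 45, 31, 19], start=1):
    index.add(age, row_id)

print(index.find_equal(31))      # [1, 4]
print(index.find_range(20, 35))  # [1, 2, 4]
```

The equality lookup goes straight to one slot, while the range lookup degenerates into scanning the whole table of slots, which is why a hash index gives no help for range conditions or sorting.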

Storage engine support

In MySQL, the Memory engine supports hash indexes, while InnoDB has an adaptive hash index feature: under certain conditions, the storage engine automatically builds a hash index on top of the B+Tree index.

Why does the InnoDB storage engine choose to use the B+tree index structure?

  • Compared with the binary tree, it has fewer levels and higher search efficiency;
  • In a B-tree, both leaf and non-leaf nodes store data, so fewer keys (and therefore fewer pointers) fit in one page. Storing the same amount of data then requires a taller tree, which degrades performance;
  • Compared with Hash index, B+tree supports range matching and sorting operations;

3. Index classification

In the InnoDB storage engine, indexes can be divided into two types according to how they are stored: clustered indexes and secondary indexes.

Clustered index selection rules:

  • If a primary key exists, the primary key index is the clustered index.
  • If no primary key exists, the first UNIQUE index is used as the clustered index.
  • If the table has neither a primary key nor a suitable unique index, InnoDB automatically generates a hidden rowid to serve as the clustered index.

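Thinking question
1. Which of the following SQL statements is more efficient, and why?

select * from user where id = 10;
select * from user where name = 'Arm';
Remarks: id is the primary key, and the name field has an index.
Answer: the query on id is more efficient than the query on name. The query on id searches the clustered index directly and finds the whole row there, while the query on name first searches the secondary index on name to get the primary key and then goes back to the clustered index to fetch the row.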
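The difference between the two statements can be modelled with two dictionaries. This is only a conceptual sketch: the sample rows are invented, and a real InnoDB index is a B+tree rather than a hash map, so the point here is the number of index searches, not their cost.

```python
# Clustered index: primary key -> full row (the leaf holds the row data).
clustered = {
    1: {"id": 1, "name": "Lee", "gender": "M"},
    10: {"id": 10, "name": "Arm", "gender": "M"},
    15: {"id": 15, "name": "Tom", "gender": "F"},
}

# Secondary index on name: name -> primary key only.
secondary_name = {"Arm": 10, "Lee": 1, "Tom": 15}


def select_by_id(pk):
    """One search: the clustered index already contains the row."""
    return clustered[pk], 1


def select_by_name(name):
    """Two searches: secondary index first, then back to the clustered index."""
    pk = secondary_name[name]   # search 1: find the primary key
    row = clustered[pk]         # search 2: fetch the row by primary key
    return row, 2


print(select_by_id(10))
print(select_by_name("Arm"))
```

Both calls return the same row, but the lookup by name needs one more index search than the lookup by id.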

How tall is the B+tree of an InnoDB primary key index?
Suppose:
A row of data is 1 KB in size, so one 16 KB page can store 16 rows. An InnoDB pointer occupies 6 bytes, and the primary key occupies 8 bytes even if it is a bigint.
Height 2:
A non-leaf page holding n keys also holds n + 1 pointers, so
n * 8 + (n + 1) * 6 = 16 * 1024, which gives n ≈ 1170
1171 * 16 = 18736
so a tree of height 2 can store about 18,736 rows.
Height 3:
1171 * 1171 * 16 = 21939856
so a tree of height 3 can store about 21.9 million rows.
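The arithmetic above can be checked with a few lines of code. The sketch follows the assumptions stated in the text (16 KB page, 8-byte key, 6-byte pointer, 16 rows per leaf page) and rounds n to 1170 as the text does; strictly speaking only 1169 whole keys fit, so the result is an estimate. The constant and function names are illustrative.

```python
PAGE_SIZE = 16 * 1024   # bytes in one page
KEY_SIZE = 8            # bigint primary key
POINTER_SIZE = 6        # pointer to a child page
ROWS_PER_LEAF = 16      # 1 KB rows in a 16 KB leaf page

# Solve n * 8 + (n + 1) * 6 = 16 * 1024 for n, rounded as in the text.
n = round((PAGE_SIZE - POINTER_SIZE) / (KEY_SIZE + POINTER_SIZE))
fanout = n + 1          # n keys separate n + 1 child pointers


def max_rows(height):
    """Approximate number of rows a tree of the given height can hold."""
    return fanout ** (height - 1) * ROWS_PER_LEAF


print(n, fanout)     # 1170 1171
print(max_rows(2))   # 18736
print(max_rows(3))   # 21939856
```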

4. Index syntax

create index

CREATE [UNIQUE | FULLTEXT] INDEX index_name ON table_name (index_col_name, ...);

view index

SHOW INDEX FROM table_name ;

delete index

DROP INDEX index_name ON table_name ;

According to the following requirements, complete the creation of the index

  1. The name field stores names, and its values may repeat; create an index for this field.
  2. The phone field is non-null and unique; create a unique index for this field.
  3. Create a joint index for profession, age, status.
  4. Build an appropriate index for email to improve query efficiency.

CREATE INDEX idx_user_name ON tb_user(name);
CREATE UNIQUE INDEX idx_user_phone ON tb_user(phone);
CREATE INDEX idx_user_pro_age_sta ON tb_user(profession,age,status);
CREATE INDEX idx_email ON tb_user(email);

5. SQL performance analysis

SQL execution frequency

After a MySQL client connects successfully, the SHOW [SESSION | GLOBAL] STATUS command provides server status information. The following command shows how often INSERT, UPDATE, DELETE, and SELECT statements have been executed on the current database (the pattern uses seven underscores, each matching exactly one character):

SHOW GLOBAL STATUS LIKE 'Com_______';

slow query log

The slow query log records all SQL statements whose execution time exceeds the specified threshold (long_query_time, in seconds, default 10 seconds). MySQL's slow query log is not enabled by default; to enable it, add the following to the MySQL configuration file (/etc/my.cnf):

# Enable the MySQL slow query log
slow_query_log=1
# Set the slow query threshold to 2 seconds: SQL statements that run longer than 2 seconds are treated as slow queries and written to the slow query log
long_query_time=2

After the configuration is complete, restart the MySQL server, run some tests, and check the information recorded in the slow log file /var/lib/mysql/localhost-slow.log.

profile details

show profiles helps us understand where time is spent when optimizing SQL. The have_profiling parameter shows whether the current MySQL supports profile operations:

SELECT @@have_profiling;

Profiling is off by default; it can be enabled at the session or global level with a SET statement:
SET profiling = 1;

Execute a series of business SQL operations, and then view the execution time of the statements with the following commands:

# View the basic time cost of each SQL statement
show profiles;
# View the time spent in each stage of the SQL statement with the given query_id
show profile for query query_id;
# View the CPU usage of the SQL statement with the given query_id
show profile cpu for query query_id;

explain execution plan

The EXPLAIN or DESC command shows how MySQL executes a SELECT statement, including how tables are joined and in what order.
Syntax:

# Add the keyword explain / desc directly before the select statement
EXPLAIN SELECT field_list FROM table_name WHERE condition;

The meaning of each field in the EXPLAIN execution plan:

id

The serial number of the select query, indicating the order in which select clauses or table operations are executed in the query (rows with the same id are executed from top to bottom; for different ids, the larger the value, the earlier the execution).

select_type

Indicates the type of SELECT. Common values include SIMPLE (simple table, that is, no table join or subquery), PRIMARY (main query, that is, the outer query), UNION (the second or a later query statement in a UNION), SUBQUERY (a subquery contained after SELECT/WHERE), etc.

type

Indicates the join type (how the table is accessed). From best to worst performance, the types are NULL, system, const, eq_ref, ref, range, index, and ALL.

possible_keys

Displays the indexes, one or more, that may be applied to this table.

key

The index actually used; if NULL, no index is used.

key_len

Indicates the number of bytes used in the index. This value is the maximum possible length of the index field, not the actual length used. On the premise of not losing accuracy, the shorter the length, the better.

rows

The number of rows that MySQL believes it must examine to execute the query. For InnoDB tables this is an estimate and may not always be accurate.

filtered

Indicates the number of rows returned in the result as a percentage of the number of rows read. The larger the value of filtered, the better.

Index Usage Principles

Validate Index Efficiency

Before the index is created, execute the following SQL statement and note how long it takes.

SELECT * FROM tb_sku WHERE sn = '100000003145001';

Create an index on the field

create index idx_sku_sn on tb_sku(sn) ;

leftmost prefix rule

If an index covers multiple columns (a joint index), the leftmost prefix rule must be followed. The rule means that the query must start from the leftmost column of the index and must not skip columns in the index. If a column is skipped, the index becomes partially invalid (the index columns after the skipped one are not used).

explain select * from tb_user where profession = 'Software Engineering' and age = 31 and status = '0';
explain select * from tb_user where profession = 'Software Engineering' and age = 31;
explain select * from tb_user where profession = 'Software Engineering';
explain select * from tb_user where age = 31 and status = '0';
explain select * from tb_user where status = '0';
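The rule can be expressed as a small function. This is a deliberately simplified model that looks only at equality conditions on the joint index (profession, age, status); the real optimizer weighs many more factors, and the function name is made up for this sketch.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns usable for the given equality conditions."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break              # a skipped column ends the usable prefix
        used.append(column)
    return used


INDEX = ["profession", "age", "status"]

print(usable_prefix(INDEX, {"profession", "age", "status"}))  # all three columns
print(usable_prefix(INDEX, {"profession", "age"}))            # profession, age
print(usable_prefix(INDEX, {"profession"}))                   # profession
print(usable_prefix(INDEX, {"age", "status"}))                # [] (leftmost column missing)
print(usable_prefix(INDEX, {"status"}))                       # [] (leftmost column missing)
print(usable_prefix(INDEX, {"profession", "status"}))         # profession only (age skipped)
```

The order of conditions in the WHERE clause does not matter in this model, only which columns are present, which matches the rule as stated: what counts is whether the leftmost columns appear at all.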

Index column operation

Do not apply operations or functions to indexed columns; doing so makes the index invalid.

explain select * from tb_user where substring(phone,10,2)= '15';

string without quotes

When a string field is compared with a value that is not enclosed in quotation marks, the index becomes invalid.

explain select * from tb_user where profession = 'Software Engineering' and age = 31 and status = 0;
explain select * from tb_user where phone = 17799990015;

fuzzy query

If only the tail of the pattern is fuzzy (the wildcard is at the end), the index remains valid. If the head of the pattern is fuzzy (the wildcard is at the start), the index becomes invalid.

explain select * from tb_user where profession like 'Software%';
explain select * from tb_user where profession like '%Engineering';
explain select * from tb_user where profession like '%Engineer%';

Conditions joined by or

For conditions joined by or, if the column in the condition before or has an index but the column after or does not, none of the indexes involved will be used.

explain select * from tb_user where id = 10 or age = 23;
explain select * from tb_user where phone = '17799990017' or age = 23;

Since age has no index, the indexes on id and phone are not used even though they exist. To make these queries use indexes, an index must also be built for age.

Data Distribution Impact

If MySQL estimates that using an index would be slower than a full table scan, it does not use the index.

select * from tb_user where phone >= '17799990005';
select * from tb_user where phone >= '17799990015';

SQL hints

SQL hints are an important means of optimizing the database. Simply put, they are manual hints added to a SQL statement to achieve the purpose of optimizing the operation.

use index:
explain select * from tb_user use index(idx_user_pro) where profession = 'Software Engineering';

ignore index:
explain select * from tb_user ignore index(idx_user_pro) where profession = 'Software Engineering';

force index:
explain select * from tb_user force index(idx_user_pro) where profession = 'Software Engineering';

covering index

Try to use a covering index (the query uses the index, and all the columns that need to be returned can be found in the index), and reduce the use of select *.

explain select id, profession from tb_user where profession = 'Software Engineering' and age = 31 and status = '0';
explain select id, profession, age, status from tb_user where profession = 'Software Engineering' and age = 31 and status = '0';
explain select id, profession, age, status, name from tb_user where profession = 'Software Engineering' and age = 31 and status = '0';
explain select * from tb_user where profession = 'Software Engineering' and age = 31 and status = '0';
Tips:
Using index condition: the lookup used the index, but a table lookup (going back to the table) is needed to fetch the data.
Using where; Using index: the lookup used the index, and all the required data can be found in the index columns, so no table lookup is needed.
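Whether a query is covered can be reasoned about with a simple set check. The sketch assumes the joint index (profession, age, status) and relies on the fact that an InnoDB secondary index also stores the primary key (id); the function name and the column list used to model select * are assumptions for illustration.

```python
def needs_table_lookup(selected_columns, index_columns, primary_key="id"):
    """True if some selected column is not available in the secondary index."""
    available = set(index_columns) | {primary_key}
    return not set(selected_columns) <= available


INDEX = ["profession", "age", "status"]
# Hypothetical column list standing in for "select *".
ALL_COLUMNS = ["id", "name", "phone", "email", "profession", "age", "status"]

print(needs_table_lookup(["id", "profession"], INDEX))                           # False
print(needs_table_lookup(["id", "profession", "age", "status"], INDEX))          # False
print(needs_table_lookup(["id", "profession", "age", "status", "name"], INDEX))  # True
print(needs_table_lookup(ALL_COLUMNS, INDEX))                                    # True
```

The first two queries can be answered from the index alone, while adding name (or using select *) forces a trip back to the clustered index.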


prefix index

When the field type is a string (varchar, text, etc.), a very long string sometimes needs to be indexed, which makes the index large and wastes a lot of disk I/O during queries, hurting query efficiency. In this case, only a prefix of the string can be indexed, which greatly reduces index space and improves index efficiency.

Syntax

create index idx_xxxx on table_name(column(n));

prefix length

The prefix length can be chosen according to the selectivity of the index. Selectivity is the ratio of the number of distinct index values (cardinality) to the total number of records in the table. The higher the selectivity, the higher the query efficiency. A unique index has a selectivity of 1, which is the best possible selectivity and gives the best performance.

select count(distinct email) / count(*) from tb_user ;
select count(distinct substring(email,1,5))/ count(*) from tb_user ;
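The same ratio can be explored outside the database to get a feel for how prefix length affects selectivity. This is a small sketch over an invented list of email addresses; the helper names are hypothetical, and on a real table the SQL statements above remain the way to measure it.

```python
def selectivity(values, prefix_len=None):
    """Distinct values (or distinct prefixes) divided by the total row count."""
    keys = values if prefix_len is None else [v[:prefix_len] for v in values]
    return len(set(keys)) / len(values)


def shortest_prefix(values, target):
    """Smallest prefix length whose selectivity reaches the target, or None."""
    for length in range(1, max(len(v) for v in values) + 1):
        if selectivity(values, length) >= target:
            return length
    return None


# Invented sample data for illustration only.
emails = [
    "alice@example.com",
    "alina@example.com",
    "bob@example.com",
    "bobby@example.org",
    "carol@example.net",
]

print(selectivity(emails))           # 1.0 for the full column
print(selectivity(emails, 2))        # 0.6: only 3 distinct 2-character prefixes
print(shortest_prefix(emails, 1.0))  # 4
```

In this sample a 4-character prefix is already as selective as the full column, so indexing more characters would only make the index larger.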

Single column index and joint index

Single-column index: an index that contains only one column. Joint index: an index that contains multiple columns.
In a business scenario with multiple query conditions, it is recommended to build a joint index on the queried fields rather than several single-column indexes.
Single-column index case:

explain select id, phone, name from tb_user where phone = '17799990010' and name = '韩信';

When a query combines several conditions, the MySQL optimizer evaluates which field's index is more efficient and uses that index to complete the query.

Joint index case: [image not preserved]

Index Design Principles

  1. Create indexes for tables with large amounts of data and frequent queries.
  2. Create indexes for fields that are often used as query conditions (where), for sorting (order by), and for grouping (group by). Try to choose columns with high selectivity for the index, and try to build unique indexes. The higher the selectivity, the more efficient the index.
  3. If a string field is relatively long, a prefix index can be built according to the characteristics of the field.
  4. Use joint indexes as much as possible and reduce single-column indexes. When querying, a joint index can often act as a covering index, which saves storage space, avoids going back to the table, and improves query efficiency.
  5. Control the number of indexes: more indexes are not always better. The more indexes there are, the greater the cost of maintaining the index structures, which affects the efficiency of inserts, deletes, and updates.
  6. If an indexed column cannot store NULL values, constrain it with NOT NULL when creating the table. When the optimizer knows whether each column contains NULL values, it can better determine which index is most efficient to use for queries.

Origin blog.csdn.net/hsuehgw/article/details/130170218