Data Structure MySQL - Index

Table of contents

1. Index overview

2. Index structure

3. Index classification

4. Index syntax

5. SQL performance analysis

    1. Check the execution frequency
    2. Slow query log
    3. show profiles command
    4. explain execution plan

6. Index usage rules

    1. Verify index efficiency
    2. Leftmost prefix rule
    3. Range query
    4. Index invalidation
    5. SQL hints
    6. Covering indexes
    7. Prefix index
    8. Selection of single-column index and joint index

7. Index design principles


1. Index overview

An index is an ordered data structure that helps MySQL retrieve data efficiently. In addition to the data itself, the database system maintains data structures that satisfy specific search algorithms. These structures reference (point to) the data in some way, so that advanced search algorithms can be run on them. Such a data structure is an index.

  • Demo:
select * from user where age = 45;

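 Without an index on age, MySQL has to scan the whole table to answer this query; with one, it can locate the matching rows by searching the index.

 Note: The binary tree used to illustrate this query in the original post is only a schematic diagram, not a real index structure.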

  • Advantages and disadvantages:

Advantages:
  1. Improves data retrieval efficiency and reduces the database's I/O cost.
  2. Sorting data through the index reduces the cost of sorting and lowers CPU consumption.

Disadvantages:
  1. Index columns also take up storage space.
  2. Indexes greatly improve query efficiency but slow down table updates: INSERT, UPDATE, and DELETE all become less efficient because the index has to be maintained as well.

2. Index structure

The MySQL index is implemented at the storage engine layer. Different storage engines have different structures, mainly including the following:

Index structure | Description
B+Tree index | The most common index type; most engines support B+Tree indexes.
Hash index | Implemented with a hash table; only exact-match queries on the indexed columns can use it, and range queries are not supported.
R-tree (spatial index) | A special index type of the MyISAM engine, mainly used for geospatial data types; rarely used.
Full-text (full-text index) | Matches documents quickly by building an inverted index, similar to Lucene, Solr, and ES.

  • B+Tree index

MySQL optimizes the classic B+Tree: each leaf node gets a linked-list pointer to its adjacent leaf node, forming a B+Tree with sequential pointers that improves the performance of range access.

  •  Hash index

The hash index applies a hash algorithm to the key value to produce a hash value, maps it to the corresponding slot, and stores the entry in the hash table.
If two (or more) key values map to the same slot, a hash conflict (also known as a hash collision) occurs, which can be resolved with a linked list.

① Hash index features:

  1. Hash indexes can only be used for equality comparisons (=, in); range queries (between, >, <, ...) are not supported.
  2. The index cannot be used to complete sort operations.
  3. Query efficiency is high: a lookup usually needs only one search, which is typically faster than a B+Tree index.

② Storage engine support: In MySQL, the MEMORY engine supports hash indexes, while InnoDB has an adaptive hash feature: the storage engine automatically builds a hash index on top of the B+Tree index under certain conditions.
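
The hash index features above can be tried out with a small sketch. The table tb_session_demo and the index name are hypothetical, made up only for this illustration; the point is to compare the EXPLAIN output of an equality lookup with that of a range predicate.

```sql
-- Hypothetical MEMORY table, used only for this illustration.
create table tb_session_demo (
    session_id varchar(64) not null,
    user_id    int         not null
) engine = memory;

-- Explicitly request a hash index (the MEMORY engine also accepts USING BTREE).
create index idx_session_id using hash on tb_session_demo (session_id);

-- Equality lookup: the kind of predicate a hash index can serve.
explain select * from tb_session_demo where session_id = 'abc';

-- Range predicate: a hash index cannot be used for this.
explain select * from tb_session_demo where session_id > 'abc';

-- InnoDB's adaptive hash index is controlled by a server variable, not by DDL.
show variables like 'innodb_adaptive_hash_index';
```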

Thinking: Why does the InnoDB storage engine use the B+Tree index structure? (interview question)

  • Compared with a binary tree, a B+Tree has fewer levels, so searches are more efficient.
  • In a B-tree, both leaf and non-leaf nodes store data, so fewer keys (and therefore fewer pointers) fit in a page. Storing a large amount of data then requires a taller tree, which degrades performance.
  • Compared with a hash index, a B+Tree supports range matching and sorting.

3. Index classification

Classification | Meaning | Features | Keyword
Primary key index | Index created on the primary key of the table | Created automatically by default; only one per table | PRIMARY
Unique index | Prevents duplicate values in a column of the same table | There can be multiple | UNIQUE
Regular index | Quickly locates specific data | There can be multiple | (none)
Full-text index | Looks up keywords in the text rather than comparing index values | There can be multiple | FULLTEXT

In the InnoDB storage engine, indexes can be divided into the following two types according to how they are stored:

Classification | Meaning | Features
Clustered index | Stores the data together with the index: the leaf nodes of the index structure hold the row data | Must exist, and there is exactly one
Secondary index | Stores the data separately from the index: the leaf nodes of the index structure hold the corresponding primary key | There can be multiple

Clustered index selection rules:

  • If a primary key exists, the primary key index is the clustered index.
  • If no primary key exists, the first UNIQUE index is used as the clustered index.
  • If the table has neither a primary key nor a suitable unique index, InnoDB automatically generates a rowid as a hidden clustered index.

Back-to-table query: first obtain the primary key value from the secondary index, then use that value to fetch the row from the clustered index.

Thinking: Of two queries on the same table, the first filtering on id and the second filtering on name, which executes more efficiently, and why?

 Answer: The first one. Filtering on id searches the B+Tree of the clustered (primary key) index directly and returns the row data as soon as it is found. Filtering on name first searches the B+Tree of the secondary index to find the primary key value for that name, and then goes back to the table, looking up the row data in the B+Tree of the clustered index.
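
The two access paths in the answer can be sketched as follows. This assumes the tb_user table used in the later sections, with primary key id and a secondary index on name; the literal values are made up for illustration.

```sql
-- Path 1: a single search of the clustered index returns the whole row.
select * from tb_user where id = 10;

-- Path 2: search the secondary index on name to get the id,
-- then look the row up in the clustered index (back-to-table query).
select * from tb_user where name = 'Tom';

-- Variant: only id and name are requested. Both are stored in the secondary
-- index on name, so no back-to-table lookup is needed.
select id, name from tb_user where name = 'Tom';
```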

4. Index syntax

  • Create an index: CREATE [ UNIQUE | FULLTEXT ] INDEX index_name ON table_name ( index_col_name, ... );
  • View the index: SHOW INDEX FROM table_name;
  • Drop an index: DROP INDEX index_name ON table_name;

Example exercise: Create indexes according to the following requirements

  1. name is the name field, and its values may repeat. Create an index for this field.
  2. phone is the mobile phone number field, and its values are non-null and unique. Create a unique index for this field.
  3. Create a joint index on profession, age, and status.
  4. Create a suitable index on email to improve query efficiency.

-- View the existing indexes on the table.
show index from tb_user;

-- 1. name is the name field, and its values may repeat; create an index for it.
create index idx_user_name on tb_user (name);

-- 2. phone values are non-null and unique; create a unique index for it.
create unique index idx_user_phone on tb_user (phone);

-- 3. Create a joint index on profession, age, and status.
create index idx_user_pro_age_sta on tb_user (profession, age, status);

-- 4. Create a suitable index on email to improve query efficiency.
create index idx_user_email on tb_user (email);
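
The exercise does not use the view and drop syntax listed above, so here is a minimal sketch that does. It reuses the email index from step 4; the ALTER TABLE statement is standard MySQL syntax but does not appear in the original post.

```sql
-- Confirm which indexes now exist on the table.
show index from tb_user;

-- Remove an index that is no longer needed.
drop index idx_user_email on tb_user;

-- Recreate it with the equivalent ALTER TABLE form,
-- so the table ends up in the same state as before.
alter table tb_user add index idx_user_email (email);
```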

5. SQL performance analysis

1. Check the execution frequency

After the MySQL client connects successfully, the show [ session | global ] status command provides server status information. The following command shows how often INSERT, UPDATE, DELETE, and SELECT statements are executed on the current database: SHOW GLOBAL STATUS LIKE 'Com_______'; (seven underscores; each underscore matches exactly one character)

 Checking SQL execution frequency in this way shows which statement types dominate, which tells us where to focus SQL optimization.
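
A short sketch of the variations on this command. The pattern is the one from the text; because each underscore matches any single character, it also returns other ten-character counters such as Com_commit.

```sql
-- Counters for the whole server since startup.
show global status like 'Com_______';

-- Counters for the current connection only.
show session status like 'Com_______';

-- A single counter, here the number of SELECT statements.
show global status like 'Com_select';
```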

2. Slow query log

  • Check the status of the slow query log

The slow query log records every SQL statement whose execution time exceeds the specified threshold (long_query_time, unit: seconds, default 10 seconds). MySQL's slow query log is not enabled by default; it has to be configured in the MySQL configuration file (/etc/my.cnf) by setting slow_query_log = 1 and setting long_query_time to the desired threshold.

After the configuration is complete, restart the MySQL server, run some test queries, and check what has been recorded in the slow log file /var/lib/mysql/localhost-slow.log.
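
The same settings can also be inspected and changed at runtime through system variables, which is handy for a quick test. This is only a sketch: the threshold of 2 seconds is an arbitrary example value, the statements need sufficient privileges, and runtime changes are lost when the server restarts.

```sql
-- Check whether the slow query log is enabled, where it is written,
-- and what the current threshold is.
show variables like 'slow_query_log';
show variables like 'slow_query_log_file';
show variables like 'long_query_time';

-- Enable the log at runtime.
set global slow_query_log = 'ON';

-- The global value applies to connections opened afterwards,
-- so also set it for the current session.
set global long_query_time = 2;
set session long_query_time = 2;

-- A statement that takes longer than the threshold and should appear in the log.
select sleep(3);
```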

3. show profiles command

  • show profiles helps us see where time is spent when optimizing SQL. The have_profiling parameter shows whether the current MySQL supports profiling: SELECT @@have_profiling;

  • Profiling is off by default; enable it at the session or global level with a set statement: SET profiling = 1;
  • Execute a series of business SQL statements, and then use show profiles and show profile to check how long each statement took.
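
The profiling steps above can be sketched as one session. The two SELECT statements only stand in for real business SQL, and the query ID 1 is a placeholder to be replaced with a Query_ID reported by show profiles.

```sql
-- Check support, then turn profiling on for this session.
select @@have_profiling;
set profiling = 1;

-- Statements to be measured.
select * from tb_user where id = 1;
select count(*) from tb_user;

-- List the recent statements with their Query_ID and total duration.
show profiles;

-- Time spent in each stage of one statement.
show profile for query 1;

-- The same breakdown, with CPU usage per stage.
show profile cpu for query 1;
```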

 4. explain execution plan

 The EXPLAIN or DESC command shows how MySQL executes a SELECT statement, including how tables are joined and in what order. Syntax: put EXPLAIN or DESC directly in front of the SELECT statement, for example EXPLAIN SELECT field_list FROM table_name WHERE condition;

 The meaning of each field of the EXPLAIN execution plan:

  • id: The sequence number of the select query, indicating the order in which select clauses or table operations are executed (same id: executed from top to bottom; different ids: the larger the value, the earlier it is executed).

> Many-to-many multi-table join: the ids are the same, so the execution order is from top to bottom

> Subquery (query students who have taken the "MySQL" course): the ids differ, and the larger the value, the earlier it is executed

  • select_type: The type of SELECT. Common values include SIMPLE (a simple query with no joins or subqueries), PRIMARY (the main, outermost query), UNION (the second or a later query in a UNION), and SUBQUERY (a subquery inside SELECT/WHERE).
  • type: The access (join) type. From best to worst performance: NULL, system, const, eq_ref, ref, range, index, ALL.
  • possible_keys: One or more indexes that might be applied to this table.
  • key: The index actually used; NULL means no index is used.
  • key_len: The number of bytes of the index that are used. This is the maximum possible length of the index fields, not the actual length used. Without loss of accuracy, shorter is better.
  • rows: The number of rows MySQL estimates it must examine to execute the query. For InnoDB tables this is an estimate and may not be accurate.
  • filtered: The percentage of rows read that are returned in the result. The larger the value, the better.
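
To see these fields in practice, a few plans on the tb_user table from the index syntax section can be compared. The comments describe what is typically reported; the actual output depends on the data and the MySQL version.

```sql
-- Primary key equality: expect type = const and key = PRIMARY.
explain select * from tb_user where id = 1;

-- Equality on the leading column of a secondary index: typically type = ref.
explain select * from tb_user where profession = '软件工程';

-- A function applied to the column prevents index use: typically type = ALL.
explain select * from tb_user where substring(phone, 10, 2) = '15';

-- DESC is a synonym for EXPLAIN.
desc select * from tb_user where id = 1;
```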

6. Index Usage Rules

1. Verify index efficiency

  • Before creating the index, execute the following SQL statement and note how long it takes: SELECT * FROM tb_sku WHERE sn = '100000003145001';

We found that the query takes 20.78 seconds, which is extremely inefficient. The reason: id is the table's primary key and therefore has a primary key index by default, but the sn field has no index, so the lookup has to scan the table.

  • Create an index for the field: create index idx_sku_sn on tb_sku(sn); (builds the B+Tree index structure)

  • Then execute the same SQL statement again and check how long it takes: SELECT * FROM tb_sku WHERE sn = '100000003145001';

 PS: Comparing the two timings shows that the index improves query efficiency.

2. Leftmost prefix rule

  • If an index covers multiple columns (joint index), the leftmost prefix rule applies: the query must start from the leftmost column of the index and must not skip columns in the index.
  • If a column is skipped, the index is partially invalidated (the index columns after the skipped one are not used).
  • The position of the leftmost column within the WHERE clause does not matter; it only has to be present.

Example: Joint index idx_user_pro_age_sta on (profession, age, status)

-- All three index columns are present: the whole index can be used.
explain select * from tb_user
where profession = '软件工程' and age = 31 and status = '0';

-- Only the leftmost column is present: the profession part of the index is used.
explain select * from tb_user
where profession = '软件工程';

-- The leftmost column is missing: the joint index cannot be used.
explain select * from tb_user where age = 31 and status = '0';

-- The order of the conditions does not matter: the whole index can still be used.
explain select * from tb_user
where age = 31 and status = '0' and profession = '软件工程';
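
The examples above do not show a skipped middle column, which is the partial invalidation case. A minimal sketch on the same joint index: compare key_len between the two plans to see how much of the index each query uses.

```sql
-- profession is present but age is skipped:
-- only the profession part of the joint index can be used.
explain select * from tb_user
where profession = '软件工程' and status = '0';

-- For comparison, the first two columns with nothing skipped.
explain select * from tb_user
where profession = '软件工程' and age = 31;
```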

 3. Range query

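In a joint index, when a range condition (>, <) appears, the index columns to the right of the range column are not used.

-- age > 30 is a range condition, so the status part of the index is not used.
explain select * from tb_user
where profession = '软件工程' and age > 30 and status = '0';

-- The same query with >= instead of >: compare key_len with the plan above.
explain select * from tb_user
where profession = '软件工程' and age >= 30 and status = '0';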

4. Index invalidation

  • Do not apply functions or operations to indexed columns; doing so prevents the index from being used.
explain select * from tb_user where substring(phone,10,2) = '15';

  • When a string column is compared with a value that is not in quotation marks, an implicit type conversion takes place and the index is not used for that column.
explain select * from tb_user
where profession='软件工程' and age = 31 and status = 0;

  • A trailing fuzzy match (the wildcard at the end) can still use the index; a leading fuzzy match (the wildcard at the start) cannot.
explain select * from tb_user where profession like '软件%';

explain select * from tb_user where profession like '%工程';

  • For conditions joined by or: if the column on one side of or has an index but the column on the other side does not, none of the indexes involved are used.
explain select * from tb_user where id = 10 or age = 23;

Since age has no index, the index on id is not used either. To fix this, create an index on age.
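
The fix described for the or case can be sketched as follows. The index name idx_user_age is an assumption made for this example and is not part of the original post.

```sql
-- Give the second column in the OR condition its own index.
create index idx_user_age on tb_user (age);

-- Re-run the plan: both sides of the OR now have an index the optimizer can consider.
explain select * from tb_user where id = 10 or age = 23;
```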

5. SQL Prompt

Learn about SQL hints with a small example:

We already know that a query on profession uses the joint index. Now suppose we also create a single-column index on profession:

create index idx_user_pro on tb_user (profession);

When we run the query again, will MySQL choose the joint index or the single-column index?

The optimizer makes that choice on its own, and SQL hints are how we influence it. SQL hints are an important means of optimizing the database: simply put, we add explicit hints to the SQL statement to steer the optimizer.

  • use index: suggests that MySQL use an index (the optimizer may still decide otherwise)

  • ignore index: tells MySQL not to use an index

  • force index: forces MySQL to use an index
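
A sketch of the three hints applied to the profession query. It assumes the single-column index is named idx_user_pro and the joint index idx_user_pro_age_sta, as in the examples above; the key column of each plan shows which index was chosen.

```sql
-- Suggest the single-column index.
explain select * from tb_user use index (idx_user_pro)
where profession = '软件工程';

-- Exclude the single-column index from consideration.
explain select * from tb_user ignore index (idx_user_pro)
where profession = '软件工程';

-- Force the joint index.
explain select * from tb_user force index (idx_user_pro_age_sta)
where profession = '软件工程';
```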

 6. Covering indexes

As mentioned in the basics, try not to use select *: it is less readable, and it is also less efficient. Instead, try to use covering indexes (the query uses an index, and all the columns to be returned can be found in that index).

explain select id, profession, age, status from tb_user
where profession = '软件工程' and age = 31 and status = '0';

Explanation: profession, age, and status form a joint index, which is a secondary index whose leaf nodes also hold the primary key id. All requested columns can therefore be read from the secondary index and returned directly, without searching the clustered index.

explain select id, profession, age, status, name from tb_user
where profession = '软件工程' and age = 31 and status = '0';

 Explanation: id, profession, age, and status can be read from the secondary index, but name cannot. The name field has to be looked up in the clustered index by id, which is a back-to-table query.

Tips:

  • using index condition: the search uses an index, but has to go back to the table to fetch the data.
  • using where; using index: the search uses an index, and all the required data can be found in the index columns, so there is no need to go back to the table.
7. Prefix index

When the field type is a string (varchar, text, etc.), sometimes a very long string needs to be indexed. This makes the index large and wastes a lot of disk I/O at query time, which hurts query efficiency. In that case, only a prefix of the string can be indexed, which greatly reduces index space and improves index efficiency.

  • Syntax: create index idx_xxxx on table_name(column(n));
  • Prefix length: Choose it according to the selectivity of the index. Selectivity is the ratio of distinct index values (cardinality) to the total number of rows in the table. The higher the selectivity, the higher the query efficiency. A unique index has a selectivity of 1, which is the best possible selectivity and gives the best performance.
  • Formula:

select count(distinct email) / count(*)
from tb_user;

select count(distinct substring(email,1,5)) / count(*)
from tb_user;

-- Create a prefix index of length 5
create index idx_email_5 on tb_user (email(5));

explain select * from tb_user where email = '[email protected]' ;
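
When choosing the prefix length, the selectivity formula above can be evaluated for several candidate lengths in one query. This is a sketch: the lengths 5, 8, and 10 are arbitrary candidates, and the column aliases are made up for readability.

```sql
-- Selectivity of the full column next to several candidate prefix lengths.
-- Pick the shortest prefix whose selectivity is close to that of the full column.
select
    count(distinct email) / count(*)                   as sel_full,
    count(distinct substring(email, 1, 5)) / count(*)  as sel_5,
    count(distinct substring(email, 1, 8)) / count(*)  as sel_8,
    count(distinct substring(email, 1, 10)) / count(*) as sel_10
from tb_user;
```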

 8. Selection of single-column index and joint index

  • Single-column index: That is, an index contains only a single column.
  • Joint index: That is, an index contains multiple columns.
  • When a business scenario involves multiple query conditions, it is recommended to build a joint index rather than single-column indexes on the queried fields.

Single-column index case:

explain select id, phone, name from tb_user 
where phone = '123456789' and name = '张三';

When a query combines multiple conditions, the MySQL optimizer evaluates which field's index is more efficient and uses that index to complete the query.

If we want the query to use a joint index that we create ourselves instead of the single-column indexes:

create unique index idx_user_phone_name on tb_user (phone, name);
explain select id, phone, name from tb_user use index (idx_user_phone_name)
where phone = '123456789' and name = '张三';

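7. Index design principles

  1. Create indexes for tables that hold a large amount of data and are queried frequently.
  2. Create indexes on columns that are frequently used in query conditions (where), sorting (order by), and grouping (group by).
  3. Prefer highly selective columns for indexes, and create unique indexes where possible; the higher the selectivity, the more efficient the index.
  4. For long string columns, create a prefix index suited to the characteristics of the field.
  5. Prefer joint indexes over single-column indexes. At query time a joint index can often act as a covering index, which saves storage space, avoids back-to-table queries, and improves query efficiency.
  6. Control the number of indexes. More is not always better: the more indexes there are, the higher the cost of maintaining them, which slows down inserts, updates, and deletes.
  7. If an indexed column cannot store NULL values, constrain it with NOT NULL when creating the table. When the optimizer knows whether each column contains NULL values, it can better determine which index is most effective for a query.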
     


Origin blog.csdn.net/hdakj22/article/details/129769146