Chapter 08_Index Creation and Design Principles

1. Declaration and use of index

1.1 Classification of indexes

MySQL's indexes include ordinary indexes, unique indexes, full-text indexes, single-column indexes, multi-column indexes and spatial indexes, etc.

  • By functional logic, there are four main types of indexes: ordinary indexes, unique indexes, primary key indexes, and full-text indexes.

  • By physical implementation, indexes are divided into two types: clustered indexes and non-clustered indexes.

  • By the number of columns they cover, indexes are divided into single-column indexes and joint indexes.

  1. Ordinary index

  2. unique index

  3. primary key index

  4. Single column index

  5. Multi-column (combination, union) index

    leftmost prefix

  6. Full text index

  7. Supplement: spatial index

    Use the SPATIAL parameter to declare an index as a spatial index. Spatial indexes can only be built on spatial data types, and they improve the efficiency of retrieving spatial data. Spatial data types in MySQL include GEOMETRY, POINT, LINESTRING, POLYGON, etc. MyISAM supports spatial indexes, and InnoDB has also supported them since MySQL 5.7; the indexed columns must be declared NOT NULL. Beginners rarely need this type of index.

**Summary: Different storage engines support different index types**

InnoDB: supports B-tree, Full-text and other indexes, but does not support Hash index;

MyISAM: supports B-tree, Full-text and other indexes, but does not support Hash index;

Memory: supports B-tree, Hash and other indexes, but does not support Full-text index;

NDB: supports Hash index, does not support B-tree, Full-text and other indexes;

Archive: does not support B-tree, Hash, Full-text and other indexes;

1.2 Create index

MySQL supports several ways to create an index on one or more columns: specify the index columns in the CREATE TABLE statement when creating the table, use an ALTER TABLE statement to create an index on an existing table, or use a CREATE INDEX statement to add an index to an existing table.

1. Create an index when creating a table

When using CREATE TABLE to create a table, in addition to defining the data type of each column, you can also define primary key constraints, foreign key constraints or unique constraints. Whichever constraint is used, defining it is equivalent to creating an index on the specified column.

Example:

CREATE TABLE dept(
dept_id INT PRIMARY KEY AUTO_INCREMENT,
dept_name VARCHAR( 20 )
);
CREATE TABLE emp(
emp_id INT PRIMARY KEY AUTO_INCREMENT,
emp_name VARCHAR( 20 ) UNIQUE,
dept_id INT,
CONSTRAINT emp_dept_id_fk FOREIGN KEY(dept_id) REFERENCES dept(dept_id)
);

However, if you create an index explicitly when creating a table, the basic syntax is as follows:

CREATE TABLE table_name [col_name data_type]
[UNIQUE | FULLTEXT | SPATIAL][INDEX |KEY][index_name] (col_name [length]) [ASC | DESC]

  • UNIQUE, FULLTEXT and SPATIAL are optional parameters, representing unique index, full-text index and spatial index respectively;
  • INDEX and KEY are synonyms. They have the same function and are used to specify the creation of an index;
  • index_name specifies the name of the index, which is an optional parameter. If not specified, MySQL defaults to col_name as the index name;
  • col_name is the field column that needs to be indexed, and this column must be selected from multiple columns defined in the data table;
  • length is an optional parameter, indicating the length of the index. Only string type fields can specify the index length;
  • ASC or DESC specifies index value storage in ascending or descending order.

1. Create a normal index

Create a normal index on the book_name field in the book table. The SQL statement is as follows:

# Create the index explicitly
# 1. Create a normal index
CREATE TABLE book (
    book_id INT,
    book_name VARCHAR(100),
    AUTHORS VARCHAR(100),
    info VARCHAR(100),
    COMMENT VARCHAR(100),
    year_publication YEAR,
    # Declare the index
    INDEX idx_bname (book_name)
);

# View the index with a command
# Method 1:
mysql> show  create table book \G
*************************** 1. row ***************************
       Table: book
Create Table: CREATE TABLE `book` (
  `book_id` int(11) DEFAULT NULL,
  `book_name` varchar(100) DEFAULT NULL,
  `AUTHORS` varchar(100) DEFAULT NULL,
  `info` varchar(100) DEFAULT NULL,
  `COMMENT` varchar(100) DEFAULT NULL,
  `year_publication` year(4) DEFAULT NULL,
  KEY `idx_bname` (`book_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

# Method 2:
show index from book;

This command is very useful and worth remembering.

2. Create a unique index

Example:

# Create a unique index
CREATE TABLE book (
    book_id INT,
    book_name VARCHAR(100),
    # Declare the index
    UNIQUE INDEX uk_idx_bname (book_name)
);

show index from book;

After the statement is executed, use SHOW CREATE TABLE to view the table structure:

3. Primary key index

After a primary key is set, the database automatically creates an index on it. In InnoDB, the primary key index is the clustered index. Syntax:

CREATE TABLE book (
    # Create the primary key index
    book_id INT PRIMARY KEY,
    book_name VARCHAR(100)
);

Delete primary key index:

ALTER TABLE student
drop PRIMARY KEY ;

Modify the primary key index: you must first delete (drop) the original index, and then create (add) the index
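
The drop-then-add step can be written as a single ALTER TABLE statement. The following is only a sketch: the column student_no is a hypothetical name, assumed to be unique and NOT NULL, and the existing primary key column is assumed not to be AUTO_INCREMENT.

```sql
# Replace the primary key of the student table in one statement.
# student_no is a hypothetical column used only for illustration.
ALTER TABLE student
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (student_no);
```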

4. Create a combined index

# Create a combined index
CREATE TABLE book (
    book_id INT,
    book_name VARCHAR(100),
    author VARCHAR(100),
    # Declare the index
    INDEX union_key_ba (book_name, author)
);

show index from book;

5. Create a full-text index
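
As a minimal sketch of a full-text index, the statements below declare one with the FULLTEXT keyword and query it with MATCH ... AGAINST. The table articles, its columns and the index name ft_title_body are hypothetical names chosen for illustration.

```sql
# Hypothetical table with a full-text index over two text columns
CREATE TABLE articles (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200),
    body TEXT,
    FULLTEXT INDEX ft_title_body (title, body)
) ENGINE=InnoDB;

# A full-text index is queried with MATCH ... AGAINST rather than LIKE;
# the column list in MATCH must match the columns of the index
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('index design' IN NATURAL LANGUAGE MODE);
```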

6. Create spatial index

When creating a spatial index, fields of spatial type must be non-null.

Example: Create table test5 and create a spatial index on the field with spatial type GEOMETRY. The SQL statement is as follows:
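
One possible form of such a statement is sketched here. The column name geo, the index name spa_idx_geo and the choice of the MyISAM engine are assumptions made for illustration.

```sql
# The spatial column must be NOT NULL to carry a SPATIAL index
CREATE TABLE test5 (
    geo GEOMETRY NOT NULL,
    SPATIAL INDEX spa_idx_geo (geo)
) ENGINE=MyISAM;

# Check that the index was created
SHOW INDEX FROM test5;
```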

2. Create an index on an existing table

To create an index on an existing table, use the ALTER TABLE statement or the CREATE INDEX statement.

  1. Create an index using the ALTER TABLE statement. The basic syntax of the ALTER TABLE statement to create an index is as follows:

    ALTER TABLE table_name ADD [UNIQUE | FULLTEXT | SPATIAL] [INDEX | KEY]
    [index_name] (col_name[(length)] [ASC | DESC],...)
    
    ALTER TABLE book ADD INDEX index_name(book_name);
    ALTER TABLE book ADD UNIQUE uk_idx_bname(book_name);
    ALTER TABLE book ADD UNIQUE mul_bid_na(book_name,author);
    
  2. Use CREATE INDEX to create an index. The CREATE INDEX statement can add an index to an existing table. In MySQL, CREATE INDEX is mapped to an ALTER TABLE statement. The basic syntax structure is:

    CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name
    ON table_name (col_name[(length)] [ASC | DESC],...)
    
    # General form: create <index type> <index name> on <table name>(<columns>);
    # Each index on the same table needs its own name
    create index idx_cmt on book(comment);
    create unique index uk_idx_cmt on book(comment);
    create index idx_cmt_author on book(comment,author);
    
3. Delete index
  1. Use ALTER TABLE to delete an index. The basic syntax format of ALTER TABLE to delete an index is as follows:

    ALTER TABLE table_name DROP INDEX index_name;
    
  2. Use the DROP INDEX statement to delete an index. The basic syntax format of DROP INDEX to delete an index is as follows:

    DROP INDEX index_name ON table_name;
    

When you need to delete or modify a large amount of table data, consider dropping the index first and re-creating it after the data has been modified.

The unique index on a column with the AUTO_INCREMENT constraint cannot be deleted.

Tip: When you delete a column in a table, if the column you want to delete is part of an index, the column is also deleted from the index. If all columns that make up the index are dropped, the entire index will be dropped.
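
The behavior described in this tip can be observed with a small throwaway table. The table idx_demo and its index name are hypothetical and exist only for this sketch.

```sql
CREATE TABLE idx_demo (
    a INT,
    b INT,
    c INT,
    INDEX idx_a_b (a, b)
);

# Dropping one indexed column removes it from the index: idx_a_b now covers only b
ALTER TABLE idx_demo DROP COLUMN a;
SHOW INDEX FROM idx_demo;

# Dropping the last remaining indexed column drops the whole index
ALTER TABLE idx_demo DROP COLUMN b;
SHOW INDEX FROM idx_demo;
```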

2. MySQL 8.0 new index features

2.1 Support descending index

A descending index stores key values in descending order. Although the syntax for descending indexes has been accepted since MySQL 4, the DESC definition was in fact ignored, and it was not until MySQL 8.x that descending indexes were actually supported (limited to the InnoDB storage engine).

Before version 8.0, MySQL still created ascending indexes and performed reverse scans when they were used, which greatly reduced the efficiency of the database. In some scenarios a descending index makes sense. For example, if a query sorts by multiple columns with different sort directions, a descending index lets the database avoid an extra filesort operation, thus improving performance.

Example: Create data table ts1 in MySQL 5.7 and MySQL 8.0 respectively. The statement is as follows:

CREATE TABLE ts1(a int, b int, index idx_a_b(a, b desc) ) ;

Check the structure of data table ts1 in MySQL 5.7. The results are as follows:

mysql> show create table ts1 \G
*************************** 1. row ***************************
       Table: ts1
Create Table: CREATE TABLE `ts1` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  KEY `idx_a_b` (`a`,`b`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

As can be seen from the results, the index is still in the default ascending order.

Check the structure of data table ts1 in MySQL 8.0. The results are as follows:

mysql> show create table ts1 \G
*************************** 1. row ***************************
       Table: ts1
Create Table: CREATE TABLE `ts1` (
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  KEY `idx_a_b` (`a`,`b` DESC)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
1 row in set (0.00 sec)

As can be seen from the results, the index is already in descending order. Let's continue to test the performance of descending index in the execution plan.

Insert about 800 rows of random data into data table ts1 in MySQL 5.7 and MySQL 8.0 respectively. The statements are as follows:

CREATE TABLE ts1(a int,b int,index idx_a_b(a,b desc));
DELIMITER //
CREATE PROCEDURE ts_insert()
BEGIN
    DECLARE i INT DEFAULT 1;
    # i runs from 1 to 799, so the loop inserts 799 rows
    WHILE i < 800 DO
        INSERT INTO ts1 SELECT rand() * 80000, rand() * 80000;
        SET i = i + 1;
    END WHILE;
    COMMIT;
END //
DELIMITER ;
# Call the procedure
CALL ts_insert();

Check the execution plan of data table ts1 in MySQL version 5.7. The results are as follows:

mysql> explain select * from ts1 order by a, b desc limit 5;
+----+------+----------+-----------------------------+
| id | rows | filtered | Extra                       |
+----+------+----------+-----------------------------+
|  1 | 1598 |   100.00 | Using index; Using filesort |
+----+------+----------+-----------------------------+
1 row in set, 1 warning (0.01 sec)

As can be seen from the results, the number of scans in the execution plan is 1598, and Using filesort is used.

Tip: Using filesort is a relatively slow external sort in MySQL, and it is best to avoid it. In most cases, administrators can optimize indexes to avoid Using filesort, thereby improving database execution speed.

View the execution plan of data table ts1 in MySQL version 8.0.

mysql> explain select * from ts1 order by a, b desc limit 5;
+----+---------+-----+----------+-------------+
| id | key     |rows | filtered | Extra       |
+----+---------+-----+----------+-------------+
|  1 | idx_a_b |   5 |   100.00 | Using index |
+----+---------+-----+----------+-------------+
1 row in set, 1 warning (0.03 sec)

As can be seen from the results, the number of scans in the execution plan is 5, and Using filesort is not used.

Note that the descending index is only effective for the specific sort order in the query. If used improperly, the query efficiency will be lower. For example, if the above query sorting condition is changed to order by a desc, b desc, the execution plan of MySQL 5.7 is significantly better than that of MySQL 8.0.
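
To check how a different sort order interacts with the idx_a_b index, the execution plans can be compared directly. This is a sketch for MySQL 8.0; the extra index idx_a_desc_b_desc is a hypothetical addition, and the actual plans depend on the server version and data.

```sql
# Sort order that does not match idx_a_b (a ASC, b DESC): look for Using filesort in Extra
EXPLAIN SELECT * FROM ts1 ORDER BY a DESC, b DESC LIMIT 5;

# Exactly the reverse of the index order: MySQL 8.0 may serve this with a backward index scan
EXPLAIN SELECT * FROM ts1 ORDER BY a DESC, b ASC LIMIT 5;

# If the first sort order matters for the workload, an index with matching directions can be added
ALTER TABLE ts1 ADD INDEX idx_a_desc_b_desc (a DESC, b DESC);
EXPLAIN SELECT * FROM ts1 ORDER BY a DESC, b DESC LIMIT 5;
```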

2.2 Hidden index

In MySQL 5.7 and earlier, indexes can only be deleted explicitly. At this time, if an error occurs after deleting the index, the deleted index can only be created back by explicitly creating the index. If the amount of data in the data table is very large, or the data table itself is relatively large, this operation will consume too many system resources and the operation cost will be very high.

Starting from MySQL 8.x, hidden indexes (invisible indexes) are supported. You only need to mark the index to be deleted as hidden, and the query optimizer will no longer use it (even FORCE INDEX will not make the optimizer use it). Once you have confirmed that the system is unaffected after the index was hidden, you can delete the index completely. This approach of first hiding an index and then deleting it is a soft delete.

Likewise, if you want to verify the impact on query performance of deleting an index, you can temporarily hide the index.

Notice:

Primary keys cannot be set as hidden indexes. When there is no explicit primary key in the table, the first unique non-null index in the table becomes the implicit primary key and cannot be set as a hidden index.

Indexes are visible by default. When using statements such as CREATE TABLE, CREATE INDEX or ALTER TABLE, you can set the visibility of an index with the VISIBLE or INVISIBLE keyword.

1. Create a hidden index when creating the table

Hidden indexes are declared with the SQL keyword INVISIBLE. The syntax is as follows:

CREATE TABLE tablename(
	propname1 type1 [ CONSTRAINT1],propname2 type2[ CONSTRAINT2],
    ...
	propnamen typen,
	INDEX [indexname ](propname1 [ ( length)]) INVISIBLE
);

Compared with an ordinary index definition, the syntax above has one more keyword, INVISIBLE, which marks the index as an invisible index.

The book2 table used in the following examples is created without a secondary index:

create table book2(
    id int primary key,
    book_name varchar(32)
);

2. Create on an existing table

Hidden indexes can be set for existing tables. The syntax is as follows:

CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name ON table_name (col_name[length] [ASC | DESC] ,...) [INVISIBLE|VISIBLE]

3. Created through ALTER TABLE statement

ALTER TABLE book2 ADD index idx_name(book_name) INVISIBLE;

4. Switch the index visible state

Existing indexes can be switched to visible status through the following statement:

ALTER TABLE book2 alter index idx_name visible; # Switch to a visible (non-hidden) index
ALTER TABLE book2 alter index idx_name invisible; # Switch to a hidden index

If you switch the idx_name index to the visible state and view the execution plan through explain, you will find that the optimizer selects the idx_name index.

Note that when an index is hidden, its contents are still updated in real time like a normal index. If an index needs to stay hidden for a long time, it is better to delete it, because its existence still affects the performance of inserts, updates, and deletes.

By toggling the visibility of an index, you can see how much the index helps with tuning.
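
The soft-delete workflow can be sketched as follows, assuming the book2 table and the idx_name index from the statements above. The lookup value 'mysql' is an arbitrary placeholder.

```sql
# Step 1: hide the index instead of dropping it
ALTER TABLE book2 ALTER INDEX idx_name INVISIBLE;

# Step 2: confirm the visibility flag recorded in the data dictionary
SELECT index_name, is_visible
FROM information_schema.statistics
WHERE table_schema = DATABASE()
  AND table_name = 'book2';

# Step 3: watch the workload; the plan should no longer list idx_name as the key
EXPLAIN SELECT * FROM book2 WHERE book_name = 'mysql';

# Step 4a: if nothing regressed, drop the index for real
ALTER TABLE book2 DROP INDEX idx_name;

# Step 4b: otherwise make it visible again (cheap, no index rebuild)
# ALTER TABLE book2 ALTER INDEX idx_name VISIBLE;
```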

5. Make hidden indexes visible to the query optimizer

This is only a single optimizer-wide switch for visibility, so in practice it is of limited use.

In the MySQL 8.x version, a new way to test indexes is provided. You can turn on a certain setting through a switch of the query optimizer (use_invisible_indexes) to make hidden indexes visible to the query optimizer. If use_invisible_indexes is set to off (default), the optimizer ignores hidden indexes. If set to on, the optimizer still considers hidden indexes when generating execution plans, even if they are not visible.

(1) Execute the following command on the MySQL command line to view the switch settings of the query optimizer.

mysql> select @@optimizer_switch \G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on
1 row in set (0.12 sec)

Find the following attribute configuration in the output result information.

use_invisible_indexes=off

The configuration value of this attribute is off, which means that the hidden index is not visible to the query optimizer by default.

(2) To make the hidden index visible to the query optimizer, you need to execute the following command on the MySQL command line:

mysql> set session optimizer_switch="use_invisible_indexes=on" ;
Query OK, 0 rows affected (0.06 sec)

The SQL statement is executed successfully. Check the query optimizer switch settings again.

At this time, you can see the following attribute configuration in the output result.

mysql> select @@optimizer_switch \G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=on,skip_scan=on,hash_join=on,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on
1 row in set (0.03 sec)

The value of the use_invisible_indexes attribute is on, indicating that the hidden index is visible to the query optimizer at this time.
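
With the switch turned on, a hidden index can be tried out for the current session only. The sketch assumes the book2 table with a hidden idx_name index; the lookup value is a placeholder.

```sql
# Let the optimizer consider invisible indexes in this session
SET SESSION optimizer_switch = 'use_invisible_indexes=on';

# The plan may now choose idx_name even though it is invisible
EXPLAIN SELECT * FROM book2 WHERE book_name = 'mysql';

# Restore the default so later tests are not affected
SET SESSION optimizer_switch = 'use_invisible_indexes=off';
```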

3. Index design principles

3.1 Data preparation

Step 1: Create database and create tables

CREATE DATABASE atguigudb1;
USE atguigudb1;

# 1. Create the student table and the course table
CREATE TABLE `student_info` (
`id` INT( 11 ) NOT NULL AUTO_INCREMENT,
`student_id` INT NOT NULL ,
`name` VARCHAR( 20 ) DEFAULT NULL,
`course_id` INT NOT NULL ,
`class_id` INT( 11 ) DEFAULT NULL,
`create_time` DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT= 1 DEFAULT CHARSET=utf8;

CREATE TABLE `course` (
`id` INT( 11 ) NOT NULL AUTO_INCREMENT,
`course_id` INT NOT NULL ,
`course_name` VARCHAR( 40 ) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT= 1 DEFAULT CHARSET=utf8;

Step 2: Create the stored functions necessary to simulate the data

# Function 1: generate a random string

DELIMITER //
CREATE FUNCTION rand_string(n INT)
    RETURNS VARCHAR( 255 ) # This function returns a string
BEGIN
    DECLARE chars_str VARCHAR( 100 ) DEFAULT
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    DECLARE return_str VARCHAR( 255 ) DEFAULT '';
    DECLARE i INT DEFAULT 0 ;
    WHILE i < n DO
        SET return_str =CONCAT(return_str,SUBSTRING(chars_str,FLOOR( 1 +RAND()* 52 ), 1 ));
        SET i = i + 1 ;
    END WHILE;
    RETURN return_str;
END //
DELIMITER ;
# Function 2: generate a random number
DELIMITER //
CREATE FUNCTION rand_num (from_num INT ,to_num INT) RETURNS INT( 11 )
BEGIN
DECLARE i INT DEFAULT 0 ;
SET i = FLOOR(from_num +RAND()*(to_num - from_num+ 1 )) ;
RETURN i;
END //
DELIMITER ;

If an error like the following is reported when creating a function:

This function has none of DETERMINISTIC......

This happens because the binary log (bin-log) is enabled, so we must specify a characteristic for our function.

In master-slave replication, the master records write operations in the bin-log. The slave reads the bin-log and executes the statements to synchronize data. If functions are used to manipulate data, the operation times on the slave and the master will be inconsistent. Therefore, by default, MySQL does not allow functions to be created.

  • Check whether mysql allows creating functions:

    show variables like 'log_bin_trust_function_creators';
    
  • Enable function creation with the following command:

    set global log_bin_trust_function_creators= 1 ;  # This is a global variable, so the GLOBAL keyword is required.
    
  • When mysqld restarts, the above setting is lost. To make it permanent:

    • On Windows: add the following under [mysqld] in my.ini:

      log_bin_trust_function_creators= 1
      
    • On Linux: add the following under [mysqld] in /etc/my.cnf:

      log_bin_trust_function_creators= 1
      

Step 3: Create a stored procedure that inserts mock data

# Stored procedure 1: insert rows into the course table
DELIMITER //
CREATE PROCEDURE insert_course( max_num INT )
BEGIN
    DECLARE i INT DEFAULT 0 ;
    SET autocommit = 0 ;  # Commit the transaction manually
    REPEAT # Loop
    SET i = i + 1 ;  # Increment the counter
    INSERT INTO course (course_id, course_name ) VALUES
    (rand_num( 10000 , 10100 ),rand_string( 6 ));
    UNTIL i = max_num
    END REPEAT;
    COMMIT;  # Commit the transaction
END //
DELIMITER ;
# Stored procedure 2: insert rows into the student_info table

DELIMITER //
CREATE PROCEDURE insert_stu( max_num INT )
BEGIN
    DECLARE i INT DEFAULT 0 ;
    SET autocommit = 0 ;  # Commit the transaction manually
    REPEAT # Loop
    SET i = i + 1 ;  # Increment the counter
    INSERT INTO student_info (course_id, class_id ,student_id ,NAME ) VALUES
    (rand_num( 10000 , 10100 ),rand_num( 10000 , 10200 ),rand_num( 1 , 200000 ),rand_string( 6 ));
    UNTIL i = max_num
    END REPEAT;
    COMMIT;  # Commit the transaction
END //
DELIMITER ;

Step 4: Call the stored procedure

CALL insert_course( 100 );
CALL insert_stu( 1000000 );

3.2 Under what circumstances is it suitable to create an index?

1. The value of the field has uniqueness restrictions.

Fields with unique business characteristics, even combined fields, must be built into unique indexes. (Source: Alibaba)

Note: Do not think that the unique index affects the insert speed. This speed loss can be ignored, but the search speed is obviously improved.

2. Fields frequently used as WHERE query conditions

If a certain field is frequently used in the WHERE condition of the SELECT statement, then you need to create an index for this field. Especially when the amount of data is large, creating a common index can greatly improve the efficiency of data query.

For example, in the student_info data table (containing 1 million pieces of data), suppose we want to query the user information of student_id=123110.
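
A before-and-after comparison could look like the sketch below. The index name idx_sid is an assumption, and the plans and timings depend on the generated data.

```sql
# Without an index on student_id the plan is expected to show a full table scan (type = ALL)
EXPLAIN SELECT course_id, class_id, name, create_time, student_id
FROM student_info
WHERE student_id = 123110;

# Add an ordinary index on the filtered column
ALTER TABLE student_info ADD INDEX idx_sid (student_id);

# The same query can now locate the rows through idx_sid
EXPLAIN SELECT course_id, class_id, name, create_time, student_id
FROM student_info
WHERE student_id = 123110;
```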

3. Frequent GROUP BY and ORDER BY columns

Indexes allow data to be stored or retrieved in a certain order, so when we use GROUP BY to group data or ORDER BY to sort data, we need to index the grouping or sorting columns. If there are multiple columns to be sorted, a combined index can be built on these columns.
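
As a rough illustration on the student_info table, a combined index can serve both a grouping and a multi-column sort. The index name idx_sid_cid is an assumption made for this sketch.

```sql
# Combined index whose column order matches the grouping and sorting columns
ALTER TABLE student_info ADD INDEX idx_sid_cid (student_id, course_id);

# Grouping on the leading column can read the index in order
EXPLAIN SELECT student_id, COUNT(*) AS num
FROM student_info
GROUP BY student_id;

# Sorting on both columns, in index order, can avoid a filesort
EXPLAIN SELECT student_id, course_id
FROM student_info
ORDER BY student_id, course_id
LIMIT 100;
```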

4. WHERE conditional columns of UPDATE and DELETE

When data is first located by a certain condition and then updated or deleted, an index on the WHERE column can greatly improve efficiency. The reason is that the record must first be retrieved based on the WHERE condition column before it can be updated or deleted. If the columns being updated are non-indexed columns, the improvement is even more obvious, because updating non-indexed columns requires no index maintenance.

5. The DISTINCT field needs to be indexed

Sometimes we need to deduplicate a certain field. Using DISTINCT, creating an index on this field will also improve query efficiency.

For example, we want to query the distinct student_id values in the student_info table. If we do not create an index on student_id, execute the SQL statement:

SELECT DISTINCT(student_id) FROM `student_info`;

Running results (600637 records, running time 0.683s).

If we create an index on student_id and then execute the same SQL statement:

# ... (statement that adds the index)
SELECT DISTINCT(student_id) FROM `student_info`;

Running results (600637 records, running time 0.010s).

You can see that the SQL query efficiency has improved, and that the student_id values are displayed in ascending order. This is because the index keeps the data sorted in a certain order, so deduplication is much faster: equal values sit right next to each other, which makes removing duplicates particularly convenient.

6. Precautions when creating indexes during multi-table JOIN connection operations

First, the number of joined tables should preferably not exceed 3, because each additional table is equivalent to adding one more level of nested loop; the order of magnitude grows very quickly and seriously affects query efficiency.

Second, create indexes on the WHERE conditions, because WHERE filters the data. If the amount of data is very large, running without WHERE filtering is terrible.

Finally, create indexes on the columns used for the join, and make sure the column has the same type in all of the tables. For example, course_id is of type int(11) in both the student_info table and the course table; it must not be int in one table and varchar in the other.

For example, if we only create an index on student_id, execute the SQL statement:

SELECT course_id,name,student_info.student_id, course_name
FROM student_info JOIN course
ON student_info .course_id = course.course_id
WHERE name = '462eed7ac6e791292a79' ;

Running results (1 piece of data, running time 0.189s):

Here we create an index on name and then execute the above SQL statement. The running time is 0.002s.
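
The index on name mentioned here could be created as in the following sketch. The index names idx_name and idx_cid are assumptions, and the second index on the join column of course is an optional extra in line with the advice above.

```sql
# Index on the WHERE column of the driving table
ALTER TABLE student_info ADD INDEX idx_name (name);

# Optional: index on the join column of the joined table
ALTER TABLE course ADD INDEX idx_cid (course_id);

# Re-check the plan of the join
EXPLAIN SELECT student_info.course_id, name, student_info.student_id, course_name
FROM student_info JOIN course
ON student_info.course_id = course.course_id
WHERE name = '462eed7ac6e791292a79';
```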

7. Create an index using a column type that is small

The type size discussed here refers to the size of the data range that the type can represent.

When we define the table structure, we must explicitly specify the column types. Taking integer types as an example, there are TINYINT, MEDIUMINT, INT, BIGINT, etc. The storage space they occupy increases in that order, and so does the range of integers they can represent. If we want to index an integer column, then as long as the range of values allows, use the smaller type for the indexed column: if INT is enough, don't use BIGINT; if MEDIUMINT is enough, don't use INT. This is because:

  • The smaller the data type, the faster the comparison operations at query time.
  • The smaller the data type, the less storage space the index takes up, so more records fit in one data page. This reduces the performance cost of disk I/O and also means more data pages can be cached in memory, which speeds up reads and writes.

This suggestion applies even more to a table's primary key, because the primary key value is stored not only in the clustered index but also in the nodes of every secondary index. A smaller primary key type therefore saves more storage space and enables more efficient I/O.

8. Create index using string prefix

Suppose a string is very long, so storing it takes a lot of space. When we index such a string column, the corresponding B+ tree has two problems:

  • The records in the B+ tree index need to store the complete string of the column, which is more time-consuming. And the longer the string, the more storage space it occupies in the index.
  • If the strings stored in the index column of the B+ tree are very long, string comparisons take more time.

We can create an index on only the leading part of the field, which is called a prefix index. In this way, although a lookup cannot pinpoint the record directly, it can locate the matching prefix and then go back to the table, using the primary key values of the records with that prefix, to fetch the complete string value. This both saves space and reduces string comparison time, and it can also roughly solve the sorting problem.

For example, searching the full content of TEXT and BLOB type fields wastes time. If only the first few characters of the field are searched, retrieval speed can be improved.

Create a merchant table. Because the address field is relatively long, create a prefix index on the address field.

create table shop(address varchar( 120 ) not null);

alter table shop add index(address( 12 ));

The question is how many characters to keep. If you keep too many, the goal of saving index storage space is not achieved; if you keep too few, there is too much duplicate content and the selectivity (degree of dispersion) of the field drops. How do we calculate the selectivity for different lengths?

Let's first look at the field's selectivity over all the data:

select count(distinct address) / count(*) from shop;

Then calculate it for different prefix lengths and compare with the selectivity of the whole column:

Formula:

count(distinct left(column_name, index_length))/count(*)

For example:

select count(distinct left(address, 10 )) / count(*) as sub10, -- selectivity of the first 10 characters
count(distinct left(address, 15 )) / count(*) as sub15, -- selectivity of the first 15 characters
count(distinct left(address, 20 )) / count(*) as sub20, -- selectivity of the first 20 characters
count(distinct left(address, 25 )) / count(*) as sub25 -- selectivity of the first 25 characters
from shop;

This leads to another question: the impact of an index column prefix on sorting

If an index column prefix is used, for example only the first 12 characters of the address column are placed in the secondary index, the following query becomes a bit awkward:

SELECT * FROM shop
ORDER BY address  # ORDER BY cannot rely on the index if only the first 12 characters are indexed
LIMIT 12;

Because the secondary index does not contain the complete address values, records with the same first 12 characters but different later characters cannot be ordered by the index. In other words, an index column prefix cannot support sorting by index, and filesort has to be used.

Expansion: Alibaba "Java Development Manual"

[Mandatory] When creating an index on a varchar field, the index length must be specified. It is not necessary to index the entire field; the index length is determined by the actual distinction of the text.

Note: Index length and distinction pull in opposite directions. Generally, for string type data, an index with a length of 20 reaches a distinction above 90%. You can use count(distinct left(column_name, index_length))/count(*) to determine the degree of distinction.

9. Columns with high distinction (high hashability) are suitable as indexes

The cardinality of a column refers to the number of distinct values in the column. For example, a column contains the values 2, 5, 8, 2, 5, 8, 2, 5, 8. Although there are 9 records, the cardinality of the column is 3. In other words, for a fixed number of rows, the larger the cardinality of a column, the more dispersed its values; the smaller the cardinality, the more concentrated its values. Column cardinality is an important metric that directly affects whether the index can be used effectively. It is best to index columns with a large cardinality; an index on a column with too small a cardinality may not be effective.

You can use the formula select count(distinct a)/count(*) from t1 to calculate the distinction. The closer to 1, the better. Generally, if it exceeds 33%, it is considered a relatively efficient index.
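
Applied to the student_info table from the data preparation step, the formula can be evaluated for several candidate columns in one pass. The column aliases are hypothetical; the values depend on the generated data.

```sql
# Selectivity of three candidate columns; values closer to 1 indicate better index candidates
SELECT
    COUNT(DISTINCT student_id) / COUNT(*) AS sel_student_id,
    COUNT(DISTINCT course_id)  / COUNT(*) AS sel_course_id,
    COUNT(DISTINCT class_id)   / COUNT(*) AS sel_class_id
FROM student_info;
```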

Extension: The joint index puts columns with high distinction (high hashability) in the front.

10. Place the most frequently used columns on the left side of the joint index

This also allows fewer indexes to be created. At the same time, due to the "leftmost prefix principle", the usage of joint indexes can be increased.
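
The leftmost prefix principle can be checked with a joint index on student_info. This is a sketch: the index name idx_sid_cid_cls and the literal values are assumptions, and the optimizer's actual choices depend on version and statistics.

```sql
ALTER TABLE student_info ADD INDEX idx_sid_cid_cls (student_id, course_id, class_id);

# Uses the leading column: the joint index is usable
EXPLAIN SELECT * FROM student_info WHERE student_id = 10013;

# Uses the first two columns: the joint index is usable with a longer key prefix
EXPLAIN SELECT * FROM student_info WHERE student_id = 10013 AND course_id = 10050;

# Skips the leading column: the joint index generally cannot be used for the lookup
EXPLAIN SELECT * FROM student_info WHERE course_id = 10050;
```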

11. When multiple fields need to be indexed, a joint index is better than a single value index.

3.3 Limit the number of indexes

In actual work, we also need to pay attention to balance: more indexes are not always better. We need to limit the number of indexes on each table, and it is recommended that a single table have no more than 6 indexes. Reasons:

① Each index occupies disk space. The more indexes, the more disk space is required.

② Indexes affect the performance of statements such as INSERT, DELETE and UPDATE, because when the data in the table changes, the indexes must also be adjusted and updated, which adds overhead.

③ When the optimizer chooses how to optimize a query, it evaluates each available index based on statistics in order to generate the best execution plan. If many indexes can be used for the query at the same time, the MySQL optimizer takes longer to generate the execution plan, which reduces query performance.
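
One way to find tables that exceed the recommended limit is to count index names in information_schema. The threshold of 6 follows the recommendation above; the query is a sketch that inspects only the current database.

```sql
# List tables in the current database that have more than 6 indexes
SELECT table_name, COUNT(DISTINCT index_name) AS index_count
FROM information_schema.statistics
WHERE table_schema = DATABASE()
GROUP BY table_name
HAVING COUNT(DISTINCT index_name) > 6;
```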

3.4 What situations are not suitable for creating indexes?

1. Do not create indexes on fields that are not used in WHERE

Fields that are not used in WHERE conditions (nor in GROUP BY or ORDER BY) do not need to be indexed. The value of an index lies in locating rows quickly, so a field that is never used to locate rows usually does not need an index. For example:

SELECT course_id,student_id, create_time
FROM student_info
WHERE student_id = 41251;

Because we are retrieving based on student_id, there is no need to create indexes on other fields, even if these fields appear in the SELECT field.

2. It is best not to create indexes on tables with small data volumes

If a table has very few records, for example fewer than 1,000, there is no need to create an index. With so few records, whether or not an index exists has little effect on query efficiency. A full scan may even take less time than traversing the index, so the index may bring no optimization at all.

Example: Create table 1:

CREATE TABLE t_without_index(
a INT PRIMARY KEY AUTO_INCREMENT,
b INT
);

Provide stored procedure 1:

# Create the stored procedure

DELIMITER //
CREATE PROCEDURE t_wout_insert()
BEGIN
    DECLARE i INT DEFAULT 1;
    WHILE i <= 900
    DO
        INSERT INTO t_without_index(b) SELECT RAND() * 10000;
        SET i = i + 1;
    END WHILE;
    COMMIT;
END //
DELIMITER ;

# Call it
CALL t_wout_insert();

Create table 2:

CREATE TABLE t_with_index(
a INT PRIMARY KEY AUTO_INCREMENT,
b INT,
INDEX idx_b(b)
);

Create stored procedure 2:

# Create the stored procedure

DELIMITER //
CREATE PROCEDURE t_with_insert()
BEGIN
    DECLARE i INT DEFAULT 1;
    WHILE i <= 900
    DO
        INSERT INTO t_with_index(b) SELECT RAND() * 10000;
        SET i = i + 1;
    END WHILE;
    COMMIT;
END //
DELIMITER ;

# Call it
CALL t_with_insert();

Query comparison:

You can see that both queries take the same time (0.00 sec): when the amount of data is small, the index makes no measurable difference.

mysql> select * from t_without_index where b = 9879 ;
+------+------+
| a    | b    |
+------+------+
| 1242 | 9879 |
+------+------+
1 row in set (0.00 sec)

mysql> select * from t_with_index where b = 9879 ;
+-----+------+
| a   | b    |
+-----+------+
| 112 | 9879 |
+-----+------+
1 row in set (0.00 sec)

Conclusion: When the number of data rows in the data table is relatively small, such as less than 1,000 rows, there is no need to create an index.
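
Since both timings round to 0.00 sec, the timing alone does not show what the optimizer did. One way to look closer, sketched here with the two tables defined above, is to compare the execution plans with EXPLAIN. The comments describe what is typically expected, not output captured from a real run.

```sql
-- No index on b: the plan is expected to show a full table scan (type = ALL).
EXPLAIN SELECT * FROM t_without_index WHERE b = 9879;

-- idx_b exists: the plan is expected to list idx_b under possible_keys,
-- and usually under key as well.
EXPLAIN SELECT * FROM t_with_index WHERE b = 9879;
```

Even when the second plan uses idx_b, scanning about 900 rows is already so cheap that the difference is not visible in the reported time.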

3. Do not create indexes on columns with a large amount of duplicate data

Indexes belong on columns that have many distinct values and are often used in conditional expressions. If a field contains a large amount of duplicate data, there is no need to index it. For example, the "gender" field of a student table has only two distinct values, "male" and "female", so there is no need to create an index on it. Such an index not only fails to improve query efficiency, it also seriously slows down data updates.

Example 1: To find 500,000 rows (such as the male records) among 1 million rows of data, a query that uses the index must first access the index 500,000 times and then go back to the data table 500,000 times. This overhead may be greater than not using the index at all.

Example 2: Suppose there is a student table, the total number of students is 1 million, and there are only 10 males, which is 1/100,000 of the total population.

The student table student_gender has the following structure. The value of the student_gender field in the data table is 0 or 1, 0 represents female, and 1 represents male.

CREATE TABLE student_gender(
student_id INT( 11 ) NOT NULL,
student_name VARCHAR( 50 ) NOT NULL,
student_gender TINYINT( 1 ) NOT NULL,
PRIMARY KEY(student_id)
)ENGINE = INNODB;	

If we want to filter out the males in this student table, we can use:

SELECT * FROM student_gender WHERE student_gender = 1;

Running result: 10 rows returned, running time 0.696s.

You can see that without creating an index, the operation efficiency is not high. What if we create an index on the student_gender field?
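
The original does not show the statement used to create this index. A minimal sketch, assuming a hypothetical index name idx_gender, might look like this:

```sql
-- Add an ordinary (non-unique) index on the student_gender column.
-- The index name idx_gender is an assumption made for illustration.
ALTER TABLE student_gender ADD INDEX idx_gender (student_gender);

-- Optional check that the index now exists.
SHOW INDEX FROM student_gender;
```

With the index in place, the same query is run again: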

SELECT * FROM student_gender WHERE student_gender = 1;

The same 10 rows are returned, but the running time drops to 0.052s, which greatly improves query efficiency.

These two experiments show that the value of an index lies in locating rows quickly. If the rows you want to locate make up a large share of the table, the index loses its value, which is usually the case for a gender field.

In this example, however, males are so rare that the index is useful for locating them quickly.

4. Avoid creating too many indexes on frequently updated tables

First: frequently updated fields do not necessarily need an index. When data is updated, the index must be updated as well. If there are too many indexes, maintaining them adds overhead to every update and reduces efficiency.

Second: avoid creating too many indexes on frequently updated tables, and keep the number of columns in each index as small as possible. Although such indexes speed up queries, they slow down updates to the table.

5. It is not recommended to use unordered values ​​as indexes

For example, ID card numbers, UUIDs (which must be converted to ASCII for index comparisons and may cause page splits during insertion), MD5 values, hashes, unordered long strings, and so on.
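
If a UUID key cannot be avoided, MySQL 8.0 offers UUID_TO_BIN() with a swap flag that rearranges the time parts of a version 1 UUID so that consecutively generated values sort roughly in insertion order. The sketch below assumes MySQL 8.0 or later; the table order_demo and its columns are hypothetical.

```sql
-- Hypothetical table that stores a UUID compactly as 16 bytes.
CREATE TABLE order_demo (
    id BINARY(16) NOT NULL PRIMARY KEY,
    note VARCHAR(50)
) ENGINE = INNODB;

-- The second argument (1) swaps the time-low and time-high parts,
-- so values generated by UUID() become roughly time-ordered.
INSERT INTO order_demo (id, note)
VALUES (UUID_TO_BIN(UUID(), 1), 'first row');

-- Convert back with the same flag to read the original UUID text.
SELECT BIN_TO_UUID(id, 1) AS id_text, note FROM order_demo;
```

Where nothing external requires a UUID, an AUTO_INCREMENT integer primary key remains the simpler ordered choice.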

6. Delete indexes that are no longer used or rarely used

After the data in the table is heavily updated, or the way the data is used is changed, some of the original indexes may no longer be needed. Database administrators should regularly identify these indexes and delete them to reduce the impact of the indexes on update operations.
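
One possible way to find such indexes is the sys schema view schema_unused_indexes, which lists indexes with no recorded use since the server started (it relies on the Performance Schema). The sketch below reuses t_with_index and idx_b from the earlier example purely as stand-ins; making an index invisible requires MySQL 8.0 or later.

```sql
-- List indexes in the current database with no recorded use since startup.
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = DATABASE();

-- MySQL 8.0+: hide a candidate index from the optimizer first,
-- so the effect of removing it can be observed without dropping it.
ALTER TABLE t_with_index ALTER INDEX idx_b INVISIBLE;

-- If nothing regresses, drop the index.
DROP INDEX idx_b ON t_with_index;
```

Because the view only covers the period since the last restart, it is worth checking after the server has handled a representative workload.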

7. Don’t define redundant or duplicate indexes

① Redundant index

Example: The table creation statement is as follows

CREATE TABLE person_info(
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR( 100 ) NOT NULL,
    birthday DATE NOT NULL,
    phone_number CHAR( 11 ) NOT NULL,
    country varchar( 100 ) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_name_birthday_phone_number (name( 10 ), birthday, phone_number),
    KEY idx_name (name( 10 ))
);		

We know that the idx_name_birthday_phone_number index already allows fast searches on the name column, so an index created specifically for the name column is a redundant index. Maintaining it only increases maintenance cost and brings no benefit to searches.

② Duplicate index

In another case, we might create duplicate indexes on the same column, like this:

CREATE TABLE repeat_index_demo (
col1 INT PRIMARY KEY,
col2 INT,
UNIQUE uk_idx_c1 (col1),
INDEX idx_c1 (col1)
);

We see that col1 is not only the primary key, but is also given a unique index and an ordinary index. However, the primary key itself already generates a clustered index, so the unique index and the ordinary index are duplicates. This situation should be avoided.
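
The redundant and duplicate indexes in the two examples above could be detected and removed roughly as follows. This is a sketch; it assumes the sys schema is available (MySQL 5.7 and later) and that both example tables exist in the current database.

```sql
-- Report indexes that are made redundant by another index on the same table.
-- sql_drop_index contains a ready-made statement for removing each one.
SELECT table_name, redundant_index_name, dominant_index_name, sql_drop_index
FROM sys.schema_redundant_indexes
WHERE table_schema = DATABASE();

-- Remove the redundant index from the first example.
ALTER TABLE person_info DROP INDEX idx_name;

-- Remove both duplicates of the primary key from the second example.
ALTER TABLE repeat_index_demo
    DROP INDEX uk_idx_c1,
    DROP INDEX idx_c1;
```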

3.5 Summary

An index is a double-edged sword: it can improve query efficiency, but it also slows down inserts and updates and takes up disk space.

The ultimate purpose of choosing an index is to make queries faster. The principles given above are the most basic guidelines, but they should not be applied rigidly. Keep practicing in future study and work, and analyze each case based on the actual application in order to select the most appropriate indexing approach.

Origin blog.csdn.net/github_36665118/article/details/134139799