1 Introduction to Indexing
1.1 What is a MySQL index
Official definition: an index is a data structure that helps MySQL efficiently obtain data
From the above definition, we can analyze that the index is essentially a data structure. Its function is to help us obtain data efficiently. Before we formally introduce the index, let’s first understand the basic data structure.
2 Index data structure
2.1 Hash index
Hash index is a relatively common index. It calculates the hash value corresponding to the record, and then stores it in the corresponding location according to the calculation result. When querying, the location is quickly found according to the hash value. His single record query is very efficient and has a time complexity of 1. However, Hash index is not the most commonly used type of database index, especially our commonly used Mysql Innodb engine does not support hash index.
The hash index is fast in the equivalent query, but has the following two problems
- Range queries are not supported
- Hash conflict, when the hash values of two records are the same, a hash conflict occurs, which needs to be stored in a linked list later
2.2 Binary tree
2.2.1 Classic binary tree
1. A node can only have two child nodes
2. The value of the left child node is smaller than the value of the parent node, and the value of the right child node is greater than the value of the parent node. Binary search is used, which is faster
An extreme example of a classic binary tree is a linked list, and the node data is getting bigger and bigger. In this case, binary tree search performance will be reduced
2.2.2 Balanced Binary Tree
A balanced binary tree is also called an AVL tree . It can be an empty tree, or a binary sorted tree with the following properties :
- The absolute value of the difference between the heights of its left subtree and right subtree (balance factor) does not exceed 1
- Its left subtree and right subtree are both balanced binary trees.
The numbers 1-6 are illustrated in a balanced binary tree as follows:
2.3 B-tree
B-tree is a multi-fork tree, also known as a balanced multi-way search tree, which can have multiple forks and has the following characteristics
(1) Sorting method: all node keywords are arranged in ascending order, and follow the principle of small on the left and large on the right;
(2) Number of child nodes: The number of child nodes of non-leaf nodes (root nodes and branch nodes) > 1, and the number of child nodes <= M, and M > = 2, except for empty trees (Note: M order means that a tree node has the most How many search paths are there, M=M path, when M=2 is a binary tree, M=3 is a 3-fork);
(3) Number of keywords: the number of keywords in a branch node is greater than or equal to ceil(m/2)-1 and less than or equal to M-1 (Note: ceil() is a function rounded towards positive infinity, such as ceil(1.1 ) results in 2);
(4) All leaf nodes are on the same layer. In addition to pointers to keywords and keyword records, leaf nodes also have pointers to their child nodes, but their pointer addresses are all null, corresponding to the spaces in the last layer of nodes in the figure below. son;
The B-tree storage structure in MySQL is as follows:
2.4 B+ tree
B+ tree is another improvement on the basis of B tree. It mainly improves two aspects, one is the stability of the query, and the other is more friendly in data sorting. The underlying data structure of MySQL index is B+ tree
(1) The non-leaf nodes of the B+ tree do not save specific data, but only save the index of keywords , and all data will eventually be saved to the leaf nodes. Because all data must be obtained from the leaf nodes, the number of data queries each time is the same, so that the query speed of the B+ tree will be relatively stable, and in the search process of the B tree, different keywords search The number of times is likely to be different (some data may be in the root node, and some data may be in the lowest leaf node), so at the application level of the database, the B+ tree is more suitable.
(2) The keywords of the B+ tree leaf nodes are arranged in order from small to large, and the data at the end of the left will save the pointer to the data at the beginning of the right node. Because the leaf nodes are arranged in an orderly manner, the B+ tree has better support for sorting data.
2.5 B* tree
The B-tree is a variant of the B+ tree. It is based on the B+ tree and connects the index layers with pointers (the B+ tree just connects the data layers with pointers), making the search for values faster
Summarize
After analyzing the above data structures, MySQL uses a B+ tree to store indexes. On the comprehensive level, this query efficiency is the best. Oracle uses B* tree
3 Index classification
MySQL indexes mainly include the following types
- primary key index
- unique index
- normal index
- composite index
- full text index
3.1 Primary key index
The primary key index is a special index. Generally, a primary key is set for the table when the table is created, and MySQL will add an index to the primary key by default. The primary key index leaf node stores a row of data in the data table . When the table does not create a primary key index, InnDB will automatically create a ROWID field for building a clustered index. The rules are as follows:
- Define the primary key PRIMARY KEY on the table, and InnoDB uses the primary key index as the clustered index.
- If the table does not define a primary key, InnoDB will choose the first unique index column that is not NULL as the clustered index.
- If neither of the above two is available, InnoDB will use a 6-byte long integer implicit field ROWID field to build a clustered index. The ROWID field is automatically incremented when new rows are inserted.
How to create:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`name` varchar(255) NOT NULL ,
PRIMARY KEY (`id`)
);
Why does MySQL use an implicit field ROWID to build a clustered index by default without specifying a primary key when creating a table? We will mention this later
3.2 Unique Index
Similar to the previous ordinary index, the difference is that the value of the index column must be unique, but null values are allowed. If it is a composite index, the combination of column values must be unique.
Creation method
CREATE UNIQUE INDEX indexName ON user(column)
或者
ALTER TABLE table_name ADD UNIQUE indexName ON (column)
3.3 Common Index
MySQL basic index, no restrictions
How to create:
CREATE INDEX index_name ON user(column)
或者
ALTER TABLE user ADD INDEX index_name ON (column)
3.4 Composite Index
Composite index, as the name suggests, adds indexes to multiple fields of MySQL at the same time, and must follow the leftmost matching principle when using it
How to create:
CREATE INDEX index_name ON user(column1,column2) -- 给 column1 和 column2 加上索引
3.5 Full-text indexing
The full-text index is mainly used to find keywords in the text, not directly compared with the index value. It is similar to our common search engines (such as elasticsearch, solr, etc.) in function. The performance of MySQL full-text index is average, so it is generally not used, just for understanding
How to create:
CREATE FULLTEXT INDEX index_column ON user(column)
或者
ALTER TABLE user ADD FULLTEXT index_column(column)
4 Index Design
4.1 Samsung Index
The Samsung index is a specification when we design the MySQL index, and the index design that conforms to the Samsung index is usually a better design
One star: the index rows related to the query in the index are contiguous, or at least close enough together
Two stars: the order of the data columns in the index is the same as the sort order in the lookup
Samsung: The columns in the index contain all the columns needed in the query. The index contains the data columns required by the query, and no longer performs full table lookup and table return operations
Let me give you an example to introduce what the Samsung index looks like
Now there is a table with the following structure
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(10) NOT NULL,
`age` int(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
one star
We now index age
create index idx_age on user (age);
Inquire
select * from user where age in (10,20,35,43)
This statement does not necessarily match one star, because age is a range, and the data may be scattered
select * from user where age = 20;
This statement is in line with one star, because the index is sorted by age from small to large, so the data with age = 20 must be together
two stars
select * from user where age = 20 order by name;
This statement fits one star, but not two stars, because the order of the data columns is sorted by age. If it is now sorted by name, the index order may be different from the order by sorting result. The result is as follows:
select * from user where age = 20 order by age
This query statement is in line with one-star and two-star
Samsung
select * from user where age = 20
This statement does not conform to Samsung, because there are only id and age in the index column, no name
select age from user where age = 20
This statement is in line with Samsung, because only age is queried, age exists in the index, and there is no need to return the table
4.2 Return form
The above Samsung index mentioned a return form, so what is the return form?
To put it simply, the columns required in the query statement are not included in the index, and they need to be queried again according to the primary key id to obtain them. Returning to the table is equivalent to one more query, and we should try to avoid returning to the table when querying again.
Because ordinary indexes only contain the values of the corresponding columns and primary keys, such as the age index, the data contained in the age index includes age and id. If you need a name at this time, you need to find the corresponding id through the age index first, and then find the name on the primary key index, which contains the values of all records in a row. Here is the answer to the above question, why MySQL must have a primary key index, because the primary key index child node contains all the data
4.3 Index coverage
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(10) NOT NULL,
`age` int(1) DEFAULT NULL,
`sex` varchar(2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_name_age` (`name`,`age`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4;
select name,age from user where name = "张三"
-- 这条语句就使用了索引覆盖,因为 name 和 age 再 idx_name_age 索引中都有,不需要回表查询
select name,age,sex from user where name = "张三"
-- 如果加上了 sex,那么就需要回表查询了,因为索引中不存在 sex 字段
5 Index Optimization
5.1 Slow query
5.1.1 Introduction
The slow query log is a log record provided by MySQL to record all slow SQL statements. We can define what kind of SQL is slow SQL by setting the time threshold of slow query long_query_time. Through the slow query log, we can find out the SQL that needs to be optimized. The next step is to optimize the SQL
5.1.2 Slow query configuration
Step 1: We can use the show variables like 'slow_query_log' statement to check whether the slow query is enabled, the default is OFF
slow_query_log_file is the location where the slow query log is stored. If it is a window, it is usually in the Data directory of your installation folder
Step 2: Open slow query
set global slow_query_log = 1;
Step 3: Set the slow query threshold
What kind of query is called slow query? 1s, 5s or 10s, MySQL does not know this, so we need to set the long_query_time parameter through configuration
Check the slow query time by command show variables like '%long_query_time%', the default is 10 s
If you need to modify it, you can set global long_query_time = 5
set it by command
**Note:** After set global long_query_time = 5
setting the slow query time here, I checked again and found that the slow query time is still 10s. Could it be that the setting has not taken effect?
After using this command to modify, you need to reconnect or open a new session to see the modified configuration
Or show global variables like '%long_query_time%'
check with the command
5.1.3 Slow query log analysis
We have just set the slow query threshold to 5s, now we execute a sql statement like this
select sleep(6);
The execution time of this statement is 6s. We can find that some data has been added by opening the slow query log
# Time: 2022-10-02T09:16:23.194396Z
# User@Host: root[root] @ localhost [::1] Id: 6
# Query_time: 6.011569 Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 0
SET timestamp=1664675770;
select sleep(6);
Let's analyze what each line represents one by one:
User@Host: the user who executes the SQL and the IP address of the slow query
Query_time: statement execution time
Lock_time: the length of time to acquire the lock
Rows_sent: the number of rows returned by MySQL to the client
Rows_examined: the number of rows scanned by MySQL
timestamp: indicates the timestamp of the slow SQL record
select sleep(6): it is slow query SQL
Let's analyze a real slow query SQL, a SQL statement in the previous test
# Time: 2022-07-27T09:26:44.440318Z
# User@Host: root[root] @ localhost [127.0.0.1] Id: 249
# Query_time: 68.461112 Lock_time: 0.000938 Rows_sent: 877281 Rows_examined: 877303
SET timestamp=1658914004;
SELECT id,prd_line_id,shift_name,shift_id,app_id,weight,upload_time,operator,status,prd_line_name FROM prd_weight
WHERE (upload_time > '2022-07-27 00:00' AND upload_time < '2022-07-27 17:24');
Query_time: total query time 68.461112s
Lock_time:0.000938s
Rows_examined: Scanned row 877281
Rows_sent: 877303 returned
Of course, this is for testing, and such outrageous SQL statements generally do not appear in production
5.1.4 Precautions
- In MySQL, management statements are not recorded in the slow query log by default, such as:
alter table, analyze table, check table, etc. However, it can be set via the following property:
set global log_slow_admin_statements = "ON" - In MySQL, you can also set to record the SQL statements that have not been indexed in the slow log query file (the default is off). It can be set through the following properties:
set global log_queries_not_using_indexes = "ON" - In MySQL, the log output format is supported: FILE (default) and TABLE, which can be used in combination. As follows:
set global log_output = "FILE, TABLE"
This setting will be written in FILE and slow_log table in MySQL library at the same time. However, logging to the system's dedicated log table consumes more system resources than logging to files. Therefore, if you need to enable slow query logging and obtain higher system performance, it is recommended to log to files first.
5.2 Explain execution plan
Through the above slow query log analysis, we can know what slow SQL statements there are. But where these SQLs are slow and how to optimize them, we also need a more detailed analysis plan. Here, MySQL provides us with the Explain keyword, through which we can analyze the detailed execution information of the SQL statement.
5.2.1 Use of Explain
We create a user table in the database for testing
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for user
-- ----------------------------
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
`password` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
`sex` varchar(2) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
`phone` varchar(11) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
`dept_id` int(10) NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE,
INDEX `idx_name`(`name`) USING BTREE,
INDEX `idx_dept_id`(`dept_id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 6 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of user
-- ----------------------------
INSERT INTO `user` VALUES (1, '张三', '123', '男', '12323432', 1);
INSERT INTO `user` VALUES (2, '李四', '456', '男', '178873937', 1);
INSERT INTO `user` VALUES (3, '小花', '123', '女', '1988334554', 2);
INSERT INTO `user` VALUES (4, '小芳', '334', '女', '18765287937', 2);
INSERT INTO `user` VALUES (5, NULL, '122', NULL, NULL, NULL);
DROP TABLE IF EXISTS `dept`;
CREATE TABLE `dept` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dept_name` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 3 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of dept
-- ----------------------------
INSERT INTO `dept` VALUES (1, '开发部');
INSERT INTO `dept` VALUES (2, '销售部');
explain is also very simple to use, just add the explain keyword in front of the query statement:
EXPLAIN SELECT * FROM user where id = 1;
From the figure, we can see that MySQL returns a row of records, let's analyze what each field means
field | meaning |
---|---|
id | The unique identifier of the select statement in a query process |
select_type | Query type, including four simple, primary, subquery, derived |
table | Which table is queried |
partitions | Table partition information |
type | Access type, analysis performance is mainly through this field |
possible_keys | Indexes that may be used |
key | the actual index used |
key_len | the number of bytes used in the index |
ref | This column shows the column or constant used for the table lookup value in the index of the key column record |
rows | MySQL Estimated Scan Rows |
filtered | After MySQL filtering, the proportion of the number of records that meet the conditions |
Extra | shows some additional information |
5.2.2 Explain in detail
1、id
id is the unique identifier in the query statement. The larger the value of id, the earlier the SQL statement corresponding to the id will be executed
explain select * from dept where id = (select dept_id from user where id = 1);
From the perspective of the execution plan, the statement select dept_id from user where id = 1 is executed first, because the outer query needs the result of this query statement
2、select_type
Query types, including four
simple: simple query. Query does not contain subquery and union
primary: the outermost select in complex queries
subquery: the subquery included in select (not in the from clause)
derived: The subquery contained in the from clause. MySQL will store the results in a temporary table, also known as a derived table (the English meaning of derived)
3、table
It is easier to understand which table is being queried
4、partitions
Partition information matched during query, the value is NULL for a non-partitioned table, when the query is a partitioned table, partitions displays the partitions hit by the partitioned table.
5、type
type: What type of query is used, which is a very important indicator in SQL optimization. The following performances are in order from good to bad:system > const > eq_ref > ref > ref_or_null > range > index > ALL
-
system is a special case of const, that is, when there is only one record in the table, the type is system
-
const, constant query, id is the primary key, all information can be queried through id
-
eq_ref, in the connection query, all parts of the primary key or unique key index are used by the connection
Note: The id of dept and the id of user are not related here, just to demonstrate the query type
The id of user and the id of dept are both primary keys. In the connection query, both primary keys are used
- ref, do not use a unique index, use a common index or a unique index, may find the value of multiple conditions, idx_name is a common index
- ref_of_null, similar to the ref function, the difference is that the index will additionally search for values containing NULL, the name field is a normal index, and there are data whose name is null in the database
- range, use range query on the index field, common queries are >, <, in, like, etc.
- index, full table scan through index tree
- ALL, full table scan, do not pass the index tree, because this is a select * query
6、possible_keys
MySQL analyzes the indexes that may be used in this query, but may not be used in actual queries
The analysis may use the idx_name index. The index is not used in the actual query, and the full table scan is performed.
7、key
The key actually used in the query
The index idx_name is actually used in the query
8、ken_len
Indicates the length of the index column used in the query
We use this index to analyze, how does key_len come from 63?
When creating the user table, I wonder if you have noticed that the character set of name is utf8
For MySQL version 5.0 and above, each character in the utf8 character set occupies 3 bytes, and varchar(20) occupies 60 bytes. At the same time, because varchar is a variable-length string, an additional byte is required to store the length of the character, a total of two Bytes, in addition, the name field can be a null value, and a null value occupies one byte alone, adding up to a total of 63 bytes
9、ref
When using the index column equivalence matching conditions to execute the query, that is, when the access method is one of const
, eq_ref
, ref
, ref_or_null
, unique_subquery
, , the column displays the specific information that is equivalent to the index column, such as just a constant or a column.index_subquery
ref
10、rows
functions expected to be scanned
11、filtered
filtered This is a percentage value, the percentage of the number of records in the table that meet the conditions. To put it simply, this field indicates the proportion of the number of records that meet the conditions after filtering the data returned by the storage engine.
12、Extra
Extra
It is used to explain some amount of information, so as to help us understand the query more accurately
5.3 High performance index usage strategy
5.3.1 Do not perform any operations on indexed columns
explain select * from user where left(name,2) = '小芳';
This sql performs a function operation on the name field, causing the index to fail
5.3.2 Leftmost prefix rule
When using a joint index query, the leftmost prefix principle should be followed, which means that the query starts from the leftmost front column of the index and does not skip columns in the index.
Create a goods table with a joint index containing three fields: name, price, and mark
DROP TABLE IF EXISTS `goods`;
CREATE TABLE `goods` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
`price` int(10) NULL DEFAULT NULL,
`mark` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE,
INDEX `idx_all`(`name`, `price`, `mark`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 4 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of goods
-- ----------------------------
INSERT INTO `goods` VALUES (1, '手机', 5678, '华为手机');
INSERT INTO `goods` VALUES (2, '电脑', 9888, '苹果电脑');
INSERT INTO `goods` VALUES (3, '衣服', 199, '好看的衣服');
Execute the following query statement:
explain select * from goods where name = '手机' and price = 5678 and mark = '华为手机';
From the figure above, you can see that the type is ref.
Now we don't start the query from the leftmost, just skip the name field
explain select * from goods where price = 5678 and mark = '华为手机';
The type has changed from ref to index. This is because MySQL sorts the fields in the combined index according to the order of the fields in the composite index. If a field in the middle is skipped, it is not necessarily in order.
5.3.3 Try to use covering index
Covering index, the fields that need to be queried are all included in the index column, and there is no need to return to the table for query