1. Slow SQL optimization ideas.
- Use the slow query log to locate slow SQL
- Use `explain` to analyze the SQL execution plan
- Use `profile` to analyze where execution time is spent
- Use Optimizer Trace to analyze optimizer details
- Identify the problem and take appropriate action
1.1 Locate slow SQL with the slow query log
How to locate slow SQL? We can find it through the slow query log. By default, MySQL does not enable the slow query log, so we need to turn it on manually.
To view the slow query log configuration, we can use the `show variables like 'slow_query_log%'` command, as follows:
- `slow_query_log`: indicates whether the slow query log is enabled
- `slow_query_log_file`: indicates where the slow query log file is stored
We can also use the `show variables like 'long_query_time'` command to check the threshold above which a query is recorded, as follows:
- `long_query_time`: the number of seconds a query must take before it is recorded in the slow query log
By checking the slow query log, we can locate the SQL statements with poor execution efficiency and focus our analysis on them.
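As a sketch, these switches can also be set at runtime. The 1-second threshold below is an illustrative assumption, and settings made with `SET GLOBAL` are lost on restart unless they are also written to `my.cnf`:

```sql
-- Check the current slow query log configuration
SHOW VARIABLES LIKE 'slow_query_log%';
SHOW VARIABLES LIKE 'long_query_time';

-- Enable the slow query log at runtime (persist in my.cnf for restarts)
SET GLOBAL slow_query_log = 'ON';
-- Record any query taking longer than 1 second (illustrative threshold)
SET GLOBAL long_query_time = 1;
```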
1.2 Use explain to view and analyze the SQL execution plan
After locating the inefficient SQL, we can use `explain` to view its execution plan.
When `explain` is used with a SQL statement, MySQL displays information from the optimizer about the statement's execution plan. That is, MySQL explains how it will process the statement, including how the tables are joined and in what order.
For a simple SQL statement, the effect of using `explain` is as follows:
In general, we need to focus on the `type`, `rows`, `filtered`, `extra`, and `key` columns.
1.2.1 type
The `type` column indicates the join (access) type and is an important indicator of index performance. From best to worst, the types are: system > const > eq_ref > ref > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL
- system: the table has only one row; this is a special case of `const` and generally does not appear in practice.
- const: the row is found with a single index lookup, generally when a primary key or unique index is used as the condition. This type of scan is extremely efficient and very fast.
- eq_ref: commonly seen with primary key or unique index scans, generally in joins on the primary key.
- ref: commonly seen with scans on non-primary-key, non-unique indexes.
- ref_or_null: similar to `ref`, except that MySQL additionally searches for rows containing NULL values.
- index_merge: the index merge optimization is used; the query uses two or more indexes.
- unique_subquery: similar to `eq_ref`, where the condition uses an `in` subquery.
- index_subquery: like `unique_subquery` but for non-unique indexes, so it can return duplicate values.
- range: commonly seen with range queries, such as `between ... and` or `in` operations.
- index: full index scan.
- ALL: full table scan.
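A minimal sketch of how different queries map to `type` values, using a hypothetical table `t` (names invented; the type actually chosen can also vary with data volume and statistics):

```sql
-- Hypothetical table: primary key id, ordinary secondary index on a
CREATE TABLE t (
  id INT PRIMARY KEY,
  a INT,
  b INT,
  KEY idx_a (a)
);

EXPLAIN SELECT * FROM t WHERE id = 1;             -- type = const (primary key lookup)
EXPLAIN SELECT * FROM t WHERE a = 1;              -- type = ref (non-unique index)
EXPLAIN SELECT * FROM t WHERE a BETWEEN 1 AND 9;  -- type = range
EXPLAIN SELECT a FROM t;                          -- type = index (full index scan)
EXPLAIN SELECT * FROM t WHERE b = 1;              -- type = ALL (no index on b)
```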
1.2.2 rows
This column represents the number of rows MySQL estimates it needs to read to find the record we need. For InnoDB tables, this number is an estimate and not necessarily an exact value.
1.2.3 filtered
This column is a percentage value: the proportion of the examined rows that satisfy the remaining conditions. Simply put, it indicates what fraction of the rows returned by the storage engine survives server-level filtering.
1.2.4 extra
This field contains additional information about how MySQL parses the query, and it generally has these values:
- Using filesort: sorting is done with an external sort ("file sort") rather than the index, usually because the requested ordering does not match any index order. Generally seen with `order by` statements.
- Using index: a covering index is used; the query is satisfied from the index alone.
- Using temporary: a temporary table is used. Performance is particularly poor and needs optimization. Generally seen with `group by` or `union` statements.
- Using where: rows are filtered with a `where` condition at the server layer.
- Using index condition: index condition pushdown, new in MySQL 5.6. Filtering is performed at the storage engine layer instead of the server layer, using columns already present in the index to reduce back-to-table reads.
1.2.5 key
This column shows the index actually used. It is generally read together with the `possible_keys` column.
1.3 Use profile to analyze where execution time goes
`explain` only shows the estimated execution plan. If you want to know the real execution state of the SQL and where the time is actually spent, you need `profiling`. Once the `profiling` parameter is enabled, subsequently executed SQL statements record their resource overhead, including IO, context switches, CPU, and memory. Based on these overheads, we can further analyze the bottleneck of the current slow SQL and then optimize it.
`profiling` is off by default. We can check whether it is on with `show variables like '%profil%'`, as follows:
We can enable it with `set profiling=ON`. After enabling it, run a few SQL statements and then use `show profiles` to inspect them.
`show profiles` displays the statements most recently sent to the server; how many are kept is defined by the `profiling_history_size` variable, which defaults to 15. To analyze a single statement, use `show profile` for the most recent one, or `show profile for query id` (where id is the QUERY_ID shown by `show profiles`) for a specific statement.
In addition to the basic profile, you can also view the CPU and IO overhead, as shown above.
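The profiling commands above, gathered into one session sketch. The query id `1` and the profiled `SELECT` are illustrative; use the QUERY_ID values your own `show profiles` output reports:

```sql
SHOW VARIABLES LIKE '%profil%';          -- check whether profiling is available/on
SET profiling = ON;                      -- enable for the current session

SELECT COUNT(*) FROM user;               -- run the SQL you want to analyze (example)

SHOW PROFILES;                           -- list recent statements with their QUERY_IDs
SHOW PROFILE FOR QUERY 1;                -- stage-by-stage timing for one statement
SHOW PROFILE CPU, BLOCK IO FOR QUERY 1;  -- include CPU and IO overhead
```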
1.4 Optimizer Trace analysis details
The profile can only show a SQL statement's execution time; it cannot show the actual process of execution, that is, how the MySQL optimizer chose the execution plan. For that we can use Optimizer Trace, which traces the whole process of parsing, optimizing, and executing the statement.
We can turn the switch on with `set optimizer_trace="enabled=on"`, then execute the SQL to be traced, and finally run `select * from information_schema.optimizer_trace` to view the trace, as follows:
You can view and analyze its execution tree, which will include three stages:
- join_preparation: preparation phase
- join_optimization: analysis phase
- join_execution: execution phase
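The tracing steps above as one session sketch (the traced `SELECT` is an illustrative example):

```sql
SET optimizer_trace = "enabled=on";               -- turn tracing on for this session

SELECT * FROM user WHERE userId = '10086';        -- the statement to trace (example)

SELECT * FROM information_schema.optimizer_trace; -- inspect join_preparation,
                                                  -- join_optimization, join_execution
SET optimizer_trace = "enabled=off";              -- remember to turn it off
```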
1.5 Identify the problem and take corresponding measures
Finally, confirm the problem and take corresponding measures.
- Most slow SQL is related to indexes: no index, an index that does not take effect, an unreasonable index design, and so on. In those cases we can optimize the index.
- We can also optimize the SQL statement itself, for example: too many elements in an `in` list (query in batches), deep paging (filter from the last row seen), or splitting queries by time range.
- If the SQL cannot be optimized well, consider serving the query from ES (Elasticsearch) or a data warehouse instead.
- If a single table holds so much data that queries are slow, consider sharding (splitting into multiple databases and tables).
- If the database is flushing dirty pages and causing slow queries, consider whether some parameters can be tuned, and discuss the optimization plan with your DBA.
- If the volume of historical data is too large, consider whether some of it can be archived.
2. Classic case analysis of slow query
2.1 Case 1: Implicit conversion
We create a user table
CREATE TABLE user (
id int(11) NOT NULL AUTO_INCREMENT,
userId varchar(32) NOT NULL,
age varchar(16) NOT NULL,
name varchar(255) NOT NULL,
PRIMARY KEY (id),
KEY idx_userid (userId) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The `userId` field is a string type with an ordinary B+ tree index. If a number is passed in the query condition, the index becomes invalid, as follows:
If you add quotes around the number, so that a string is passed, the index is of course used, as shown in the figure below:
Why doesn't the first statement, without quotes, use the index? Because without the quotes it compares a string column with a number, and the types do not match, so MySQL performs an implicit type conversion, converting both sides to floating-point numbers for comparison. With an implicit type conversion applied to the column, the index becomes invalid.
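The figures are missing here, so the two queries from this explanation are sketched against the `user` table just created, with the expected `type` values noted as comments:

```sql
-- userId is varchar; passing a number forces an implicit cast on the column,
-- so the index idx_userid cannot be used (expect type = ALL, a full table scan)
EXPLAIN SELECT * FROM user WHERE userId = 123;

-- Passing a string matches the column type, so the index is used
-- (expect type = ref on idx_userid)
EXPLAIN SELECT * FROM user WHERE userId = '123';
```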
2.2 Case 2: leftmost match
When MySQL builds a composite index, it follows the leftmost prefix matching principle, that is, leftmost first. If you create a composite index on `(a,b,c)`, it is equivalent to creating the three indexes `(a)`, `(a,b)`, and `(a,b,c)`.
Suppose you have the following table structure:
CREATE TABLE user (
id int(11) NOT NULL AUTO_INCREMENT,
user_id varchar(32) NOT NULL,
age varchar(16) NOT NULL,
name varchar(255) NOT NULL,
PRIMARY KEY (id),
KEY idx_userid_name (user_id,name) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Given the composite index `idx_userid_name`, if we now execute the following SQL with only `name` as the query condition, the index is not used:
explain select * from user where name ='捡田螺的小男孩';
Because `name` is not the first column of the composite index, the leftmost matching principle is not satisfied, so `idx_userid_name` does not take effect. The composite index only works when the query condition satisfies the leftmost matching principle, as it does when the condition column is `user_id`.
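For comparison, a sketch of queries that do and do not satisfy the leftmost prefix of `idx_userid_name` (the literal values are illustrative):

```sql
-- Leftmost column alone: index works
EXPLAIN SELECT * FROM user WHERE user_id = '10086';

-- Both columns, leftmost first: index works
EXPLAIN SELECT * FROM user WHERE user_id = '10086' AND name = '捡田螺的小男孩';

-- Only the second column: leftmost prefix not satisfied, index is not used
EXPLAIN SELECT * FROM user WHERE name = '捡田螺的小男孩';
```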
2.3 Case 3: Deep paging problem
Deep paging with `limit` leads to slow queries; this should be familiar to everyone.
Why does limit deep paging become slow? Suppose there is a table structured as follows:
CREATE TABLE account (
id int(11) NOT NULL AUTO_INCREMENT COMMENT '主键Id',
name varchar(255) DEFAULT NULL COMMENT '账户名',
balance int(11) DEFAULT NULL COMMENT '余额',
create_time datetime NOT NULL COMMENT '创建时间',
update_time datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (id),
KEY idx_name (name),
KEY idx_create_time (create_time) -- index
) ENGINE=InnoDB AUTO_INCREMENT=1570068 DEFAULT CHARSET=utf8 ROW_FORMAT=REDUNDANT COMMENT='账户表';
Do you know what the execution process is like for the following SQL?
select id,name,balance from account where create_time> '2020-09-19' limit 100000,10;
The execution flow of this SQL is as follows:
- Through the ordinary secondary index tree `idx_create_time`, filter on the `create_time` condition to find the primary key `id` values that satisfy it.
- With each primary key `id`, go back to the primary key index tree, find the matching row, and take out the columns to be displayed (the "back to table" step).
- Scan the `100010` rows that meet the condition, then throw away the first `100000` rows and return the rest.
Therefore, deep paging with limit makes SQL slow for two reasons:
- A `limit` statement first scans `offset+n` rows, then discards the first `offset` rows and returns the next `n` rows. That is, `limit 100000,10` scans `100010` rows, while `limit 0,10` scans only `10` rows.
- `limit 100000,10` scans more rows, which also means more back-to-table lookups.
How to optimize the deep paging problem?
We can optimize by reducing the number of times back to the table. Generally, there are label recording method and delayed correlation method .
label recording method
The idea is to record which row was read last time, and on the next query start scanning from that row. It is like reading a book: fold the corner or place a bookmark where you stopped, and next time open the book right there.
Assuming the last id recorded was `100000`, the SQL can be modified as:
select id,name,balance FROM account where id > 100000 limit 10;
In this case, no matter how many pages you turn later, performance stays good, because the query hits the `id` index. But this method has a limitation: it requires a field like a continuous auto-increment id.
delayed association method
The delayed association method moves the condition to the primary key index tree and thereby reduces back-to-table lookups, as follows:
select acct1.id,acct1.name,acct1.balance FROM account acct1 INNER JOIN (SELECT a.id FROM account a WHERE a.create_time > '2020-09-19' limit 100000, 10) AS acct2 on acct1.id= acct2.id;
The optimization idea is to first search the secondary index tree `idx_create_time` for the qualifying primary key IDs, and then join back to the original table on the primary key ID. From that point on the primary key index is used directly, and back-to-table reads are reduced.
2.4 Case 4: Too many in elements
When using `in`, even if the column in the condition is indexed, you still need to keep the number of elements in the `in` list small. It is generally recommended not to exceed `200` elements; beyond that, split them into batches of 200 per query.
Counterexample:
select user_id,name from user where user_id in (1,2,3...1000000);
If we place no limit on the `in` condition, the query may fetch a huge amount of data at once, which easily causes the interface to time out. It is especially dangerous with subqueries: you don't know how many rows the subquery after `in` will return, so it is an easy trap. Consider the following subquery:
select * from user where user_id in (select author_id from artilce where type = 1);
What if there are a thousand, or even tens of thousands, of rows with `type = 1`? That is definitely slow SQL. Even with an index, it is generally recommended to query in batches of 200, for example:
select user_id,name from user where user_id in (1,2,3...200);
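The batching advice above can be sketched on the application side. This is an illustrative helper (table and column names follow the example, but the code is an assumption, not the article's), shown against Python's built-in sqlite3 so it runs standalone:

```python
import sqlite3

def query_in_batches(conn, user_ids, batch_size=200):
    """Split a large IN list into batches of at most batch_size,
    query each batch separately, and merge the results."""
    results = []
    for i in range(0, len(user_ids), batch_size):
        batch = user_ids[i:i + batch_size]
        placeholders = ",".join("?" * len(batch))
        cur = conn.execute(
            f"SELECT user_id, name FROM user WHERE user_id IN ({placeholders})",
            batch,
        )
        results.extend(cur.fetchall())
    return results

# Demo with an in-memory database mirroring the example schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO user (user_id, name) VALUES (?, ?)",
    [(i, f"u{i}") for i in range(1, 1001)],
)

rows = query_in_batches(conn, list(range(1, 501)))  # 500 ids -> 3 batches of <=200
print(len(rows))  # 500
```

Each individual statement now carries at most 200 values, which keeps the optimizer's cost estimation (discussed next) within the index-dive threshold.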
Why is the in query slow?
This is because at the bottom layer MySQL evaluates an `in` query with roughly an n*m lookup, similar to a `union`. When estimating the cost of an `in` query (cost = number of tuples * average IO), MySQL obtains the number of tuples by probing the values in the `in` list one by one, so the estimation itself can be slow. MySQL therefore defines a threshold, `eq_range_index_dive_limit` (since 5.6, default 200): once the number of values exceeds it, those per-value probes no longer participate in the cost calculation, which can make the execution plan inaccurate. In other words, when an `in` condition has more than 200 values, the cost estimation for `in` degrades, and MySQL may choose the wrong index.
2.5 Case 5: slow query caused by order by file sorting
If order by uses file sorting, slow queries may occur. Let's look at the following SQL:
select name,age,city from staff where city = '深圳' order by age limit 10;
What it means is: query the first 10 employees from Shenzhen with their names, ages, and cities, and sort them in ascending order of age.
Looking at the explain execution plan, the Extra column contains `Using filesort`, which indicates that file sorting is used.
Why is the efficiency of order by file sorting low?
You can see this picture below:
An `order by` sort is done in one of two ways: full-field sorting (全字段排序) or rowid sorting (rowid排序). MySQL compares the length of the result row with `max_length_for_sort_data`: if the row data is longer than `max_length_for_sort_data`, rowid sorting is used; otherwise, full-field sorting is used.
2.5.1 rowid sorting
With rowid sorting, MySQL generally has to go back to the table to fetch the qualifying rows' remaining columns, so it is slower. If the following SQL uses rowid sorting, the execution flow is:
select name,age,city from staff where city = '深圳' order by age limit 10;
- MySQL initializes a `sort_buffer` for the thread and puts into it the field to be sorted, `age`, plus the primary key `id`;
- From the index tree `idx_city`, find the first primary key `id` that satisfies the condition `city='深圳'`, say `id=X`;
- Go to the primary key index tree for the row with `id=X`, take the values of `age` and the primary key `id`, and store them in `sort_buffer`;
- Fetch the next record's primary key `id` from the `idx_city` index tree, say `id=Y`;
- Repeat steps 3 and 4 until the value of `city` is no longer 深圳;
- The previous steps have found all the rows for 深圳. In `sort_buffer`, sort all the data by `age`, traverse the sorted result, take the first 10 rows, go back to the original table by `id`, fetch the three fields `city`, `name`, and `age`, and return them to the client.
2.5.2 Full-field sorting
The same SQL, if sorted with full-field sorting, goes like this:
select name,age,city from staff where city = '深圳' order by age limit 10;
- MySQL initializes a `sort_buffer` for the thread and puts into it the fields to be returned: `name`, `age`, and `city`;
- From the index tree `idx_city`, find the first primary key id that satisfies the condition `city='深圳'`, say `id=X`;
- Go to the primary key index tree for the row with `id=X`, take the values of the three fields `name`, `age`, and `city`, and store them in `sort_buffer`;
- Fetch the next record's primary key `id` from the `idx_city` index tree, say `id=Y`;
- Repeat steps 3 and 4 until the value of `city` is no longer 深圳;
- The previous steps have found all the rows for 深圳. In `sort_buffer`, sort all the data by age;
- Take the first 10 rows of the sorted result and return them to the client.
The size of `sort_buffer` is controlled by the parameter `sort_buffer_size`:
- If the data to be sorted is smaller than `sort_buffer_size`, sorting is done in memory inside `sort_buffer`;
- If the data to be sorted is larger than `sort_buffer_size`, sorting falls back to disk files.
Sorting with disk files is slower. The data is first put into `sort_buffer`; whenever it is nearly full, it is sorted and written out to a temporary disk file. Once all the qualifying data has been fetched and sorted this way, a merge algorithm merges the sorted temporary files on disk into one large ordered file.
2.5.3 How to optimize the file sorting of order by
When `order by` uses file sorting, efficiency drops. How do we optimize it?
- Sorting is needed because the data is unordered; if the data were already ordered, file sorting would be unnecessary. Index data is inherently ordered, so we can optimize the `order by` statement by building a suitable index.
- We can also tune parameters such as `max_length_for_sort_data` and `sort_buffer_size`.
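For the staff query above, the index-based fix is to make the index order match the required sort order. A sketch (the composite index name is invented):

```sql
-- city filters, age sorts: a composite index (city, age) lets MySQL read rows
-- already ordered by age within each city, so no filesort is needed
ALTER TABLE staff ADD INDEX idx_city_age (city, age);

-- Extra should no longer show "Using filesort"
EXPLAIN SELECT name, age, city FROM staff WHERE city = '深圳' ORDER BY age LIMIT 10;
```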
2.6 Case 6: Using is null or is not null on an indexed field may invalidate the index
Table Structure:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`card` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_name` (`name`) USING BTREE,
KEY `idx_card` (`card`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
With a single-column index on `name`, querying for non-null `name` values does use the index, as follows:
With a single-column index on `card`, querying for non-null `card` values also uses the index, as follows:
But if the two conditions are connected with `or`, the index becomes invalid, as follows:
In many cases it is also the data volume that makes the MySQL optimizer give up on an index. Likewise, when analyzing SQL with `explain`, pay attention when you see `type=range`, because depending on the data volume the optimizer may abandon the index for a range scan too.
2.7 Case 7: Using != or <> on an indexed field may invalidate the index
Suppose there is a table structure:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`userId` int(11) NOT NULL,
`age` int(11) DEFAULT NULL,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_age` (`age`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
Although age is indexed, when `!=`, `<>`, or `not in` is used, the index is as good as absent, as follows:
This is again related to the MySQL optimizer: if it judges that even with the index it would still have to scan very many rows, it decides using the index is not worth it and skips it. So be careful when using `!=`, `<>`, or `not in`.
2.8 Case 8: In a left or right join, the joined fields have different character encodings
Create two new tables, `user` and `user_job`:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,
`age` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_name` (`name`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
CREATE TABLE `user_job` (
`id` int(11) NOT NULL,
`userId` int(11) NOT NULL,
`job` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_name` (`name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The `name` field of the `user` table is encoded as `utf8mb4`, while the `name` field of the `user_job` table is encoded as `utf8`.
Executing the left outer join query, the `user_job` table is still scanned in full, as follows:
If the two name fields are changed to the same encoding, the same SQL does use the index.
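A sketch of the fix: align the two `name` columns on the same character set. Converting `user_job.name` up to `utf8mb4` (the direction shown here) is an assumption about the desired target; converting `user.name` down to `utf8` would equalize them too, at the cost of losing 4-byte characters:

```sql
-- Convert user_job.name to the same encoding as user.name (utf8mb4),
-- so the join no longer needs a character-set conversion on the driven table
ALTER TABLE user_job MODIFY `name` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL;

-- The left join should now use idx_name on user_job instead of a full scan
EXPLAIN SELECT * FROM `user` u LEFT JOIN user_job j ON u.name = j.name;
```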
2.9 Case 9: group by using temporary table
group by is generally used for grouped statistics; the logic it expresses is grouping rows by certain rules. We use it frequently in daily development, and it easily produces slow SQL if you are not careful.
2.9.1 group by execution process
Suppose there is a table structure:
CREATE TABLE `staff` (
`id` bigint(11) NOT NULL AUTO_INCREMENT COMMENT '主键id',
`id_card` varchar(20) NOT NULL COMMENT '身份证号码',
`name` varchar(64) NOT NULL COMMENT '姓名',
`age` int(4) NOT NULL COMMENT '年龄',
`city` varchar(64) NOT NULL COMMENT '城市',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COMMENT='员工表';
Let's look at the execution plan of this SQL:
explain select city ,count(*) as num from staff group by city;
- The Extra field shows `Using temporary`, indicating that a temporary table is used for grouping
- The Extra field shows `Using filesort`, indicating that file sorting is used
How does group by use a temporary table and sorting? Let's walk through the execution flow of this SQL:
select city ,count(*) as num from staff group by city;
- Create an in-memory temporary table with two fields, `city` and `num`;
- Scan the whole staff table, fetching rows with city = 'X' in turn;
- Check whether the temporary table already has a row with `city='X'`; if not, insert a record `(X,1)`;
- If a row with `city='X'` exists, add 1 to the num value of row X;
- After the traversal, sort by the `city` field and return the result set to the client. The execution diagram of this process is as follows:
What does sorting the temporary table involve? The fields to be sorted are put into the sort buffer and returned after sorting. Note that sorting is divided into full-field sorting and rowid sorting:
- With full-field sorting, the fields the query returns are put into the sort buffer, sorted by the sort field, and returned directly;
- With rowid sorting, only the fields that need sorting are put into the sort buffer; one extra back-to-table lookup is then needed before returning.
2.9.2 Where might group by be slow?
Improper use of `group by` easily causes slow SQL, because by default it uses both a temporary table and sorting, and sometimes a disk temporary table as well.
- If, during execution, the in-memory temporary table reaches its size limit (controlled by the parameter `tmp_table_size`), it is converted into a disk temporary table.
- If the data volume is large, the disk temporary table required by the query is likely to occupy a large amount of disk space.
2.9.3 How to optimize group by
From which direction to optimize?
- Direction 1: since group by sorts by default, we can simply skip the sort when we don't need it.
- Direction 2: Since the temporary table is the X factor that affects the performance of group by, can we not use the temporary table?
Let's think about it together: why does executing a `group by` statement need a temporary table? The semantics of `group by` are to count the occurrences of distinct values. If those values arrived already ordered, we could just scan downward and count as we go, and no temporary table would be needed to record and accumulate the results.
These optimizations are possible:
- Index the fields after group by
- Add `order by null` to skip the sort
- Try to stay within in-memory temporary tables
- Use SQL_BIG_RESULT
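The four directions above, sketched as concrete statements (the index name is invented and the `tmp_table_size` value is illustrative):

```sql
-- 1. Index the grouped column: city values then arrive in order,
--    so no temporary table or sort is needed
ALTER TABLE staff ADD INDEX idx_city (city);

-- 2. If grouping without an index, skip the implicit sort
SELECT city, COUNT(*) AS num FROM staff GROUP BY city ORDER BY NULL;

-- 3. Raise the in-memory temporary table limit (session scope; value illustrative)
SET tmp_table_size = 64 * 1024 * 1024;

-- 4. Tell the optimizer the result is big: go straight to disk sorting,
--    skipping the in-memory temporary table
SELECT SQL_BIG_RESULT city, COUNT(*) AS num FROM staff GROUP BY city;
```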
2.10 Case 10: delete + in subquery does not use the index!
I once ran into a slow SQL problem in production: when delete meets an in subquery, the index is not used even though it exists, while the corresponding select + in subquery does use the index.
The MySQL version is 5.7, assuming that there are currently two tables account and old_account, the table structure is as follows:
CREATE TABLE `old_account` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键Id',
`name` varchar(255) DEFAULT NULL COMMENT '账户名',
`balance` int(11) DEFAULT NULL COMMENT '余额',
`create_time` datetime NOT NULL COMMENT '创建时间',
`update_time` datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`),
KEY `idx_name` (`name`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1570068 DEFAULT CHARSET=utf8 ROW_FORMAT=REDUNDANT COMMENT='老的账户表';
CREATE TABLE `account` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键Id',
`name` varchar(255) DEFAULT NULL COMMENT '账户名',
`balance` int(11) DEFAULT NULL COMMENT '余额',
`create_time` datetime NOT NULL COMMENT '创建时间',
`update_time` datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`),
KEY `idx_name` (`name`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1570068 DEFAULT CHARSET=utf8 ROW_FORMAT=REDUNDANT COMMENT='账户表';
The executed SQL is as follows:
delete from account where name in (select name from old_account);
Check the execution plan and find that the index is not taken:
But if delete is replaced by select, the index will be taken. as follows:
Why does the `select + in` subquery use the index while the `delete + in` subquery does not?
Let's execute the following SQL to see:
explain select * from account where name in (select name from old_account);
show WARNINGS; -- view the final SQL actually executed after optimization
The result is as follows:
select `test2`.`account`.`id` AS `id`,`test2`.`account`.`name` AS `name`,`test2`.`account`.`balance` AS `balance`,`test2`.`account`.`create_time` AS `create_time`,`test2`.`account`.`update_time` AS `update_time` from `test2`.`account`
semi join (`test2`.`old_account`)
where (`test2`.`account`.`name` = `test2`.`old_account`.`name`)
We can see that during actual execution, MySQL optimizes the select + in subquery by rewriting it as a (semi) join, so the index can be used. Unfortunately, for the `delete + in` subquery, MySQL does not apply this optimization.
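A workaround consistent with that finding is to hand-write the join that MySQL would have produced for the select, using multi-table DELETE syntax (the alias names are illustrative):

```sql
-- Rewrite delete + in as an explicit join, mirroring the semijoin rewrite
-- MySQL applies to the select version; idx_name can then be used
DELETE acct
FROM account AS acct
INNER JOIN old_account AS old_acct ON acct.name = old_acct.name;
```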