MySQL optimization ideas + classic case analysis

1. Slow SQL optimization ideas.

  1. Use the slow query log to locate slow SQL
  2. Use explain to analyze the SQL execution plan
  3. Use profile to analyze where execution time goes
  4. Use Optimizer Trace to analyze optimizer details
  5. Identify the problem and take appropriate action

1.1 Locate slow SQL with the slow query log

How do we locate slow SQL? We can find it through the slow query log. By default, MySQL does not enable the slow query log (slow_query_log), so we need to turn it on manually.

To view the slow query log configuration, we can use the show variables like 'slow_query_log%' command, as follows:

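A minimal sketch of checking and enabling it from the MySQL client (the log file path shown is illustrative):

show variables like 'slow_query_log%';
-- Example output (values are illustrative):
-- slow_query_log      | OFF
-- slow_query_log_file | /var/lib/mysql/host-slow.log

set global slow_query_log = 'ON';  -- lasts until restart unless persisted in my.cnf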

  • slow_query_log indicates whether the slow query log is enabled
  • slow_query_log_file indicates where the slow query log is stored

We can also use the show variables like 'long_query_time' command to check the time threshold for recording slow queries:


  • long_query_time indicates how many seconds a query must take before it is recorded in the slow query log.
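For example, to record any statement that runs longer than one second (the default threshold is 10 seconds; 1 second here is just an illustrative choice):

set global long_query_time = 1;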

By checking the slow query log, we can locate the SQL statements with low execution efficiency and focus our analysis on them.

1.2 Use explain to analyze the SQL execution plan

After locating the inefficient SQL, we can use explain to view the SQL execution plan.

When explain is used with a SQL statement, MySQL displays information from the optimizer about the statement's execution plan. That is, MySQL explains how it will process the statement, including how the tables are joined and in what order.

For a simple SQL statement, the effect of using explain is as follows:

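A minimal sketch, assuming a user table with primary key id:

explain select * from user where id = 1;
-- The output includes the columns: id, select_type, table, partitions, type,
-- possible_keys, key, key_len, ref, rows, filtered, Extra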

In general, we need to focus on the type, rows, filtered, extra, and key columns.

1.2.1 type

type indicates the join type, an important indicator of how well the index performs. From best to worst: system > const > eq_ref > ref > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL.

  • system: The table has at most one row; a special case of const that rarely appears in practice.
  • const: The data is found with a single index lookup, generally with a primary key or unique index as the condition. This type of scan is extremely efficient and very fast.
  • eq_ref: Commonly seen in primary key or unique index scans; generally means a join on a primary key.
  • ref: Commonly seen in scans on ordinary (non-unique) indexes.
  • ref_or_null: Similar to ref, except that MySQL additionally searches for rows containing NULL values.
  • index_merge: The index merge optimization is used; the query uses two or more indexes.
  • unique_subquery: Similar to eq_ref; the condition is an in subquery over a unique index.
  • index_subquery: Like unique_subquery but for non-unique indexes; duplicate values can be returned.
  • range: Commonly seen in range queries, such as between ... and or in conditions.
  • index: Full index scan.
  • ALL: Full table scan.

1.2.2 rows

This column is the number of rows MySQL estimates it must read to find the records we need. For InnoDB tables, this number is an estimate, not necessarily exact.

1.2.3 filtered

This column is a percentage: the proportion of rows returned by the storage engine that survive server-level filtering. In other words, it indicates what fraction of the examined rows actually satisfies the conditions.

1.2.4 extra

This field contains additional information about how MySQL resolves the query. It commonly takes these values:

  • Using filesort: Sorting is done outside the index (a "file sort"), which usually happens when the requested order does not match the index order. Generally seen with order by statements.
  • Using index: A covering index is used.
  • Using temporary: A temporary table is used. Performance is particularly poor and needs optimization. Generally seen with group by or union statements.
  • Using where: Filtering with a where condition.
  • Using index condition: Index condition pushdown, added in MySQL 5.6. Filtering is done at the storage engine layer instead of the server layer, using data already in the index to reduce back-to-table lookups.

1.2.5 key

This column shows the index actually used. It is generally read together with the possible_keys column.

1.3 Use profile to analyze execution time

explain only shows the estimated execution plan for a SQL statement. If you want to know the real execution thread state and the time consumed, you need profiling. Once the profiling parameter is enabled, subsequently executed SQL statements record their resource overhead, including IO, context switches, CPU, and memory. From these costs we can analyze where the current slow SQL's bottleneck lies and optimize further.

profiling is off by default. We can use show variables like '%profil%' to check whether it is on.


It can be enabled with set profiling=ON. After enabling it, run a few SQL statements and then check with show profiles, as in the sketch below.
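A minimal sketch of a profiling session (query ids and statements are illustrative):

show variables like '%profil%';           -- profiling | OFF
set profiling = ON;                       -- enable for the current session

select count(*) from user;                -- run the SQL to analyze
show profiles;                            -- recent statements with Query_ID and Duration
show profile for query 1;                 -- per-stage time for Query_ID 1
show profile cpu, block io for query 1;   -- also show CPU and block IO overhead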


show profiles displays the statements recently sent to the server. How many are kept is defined by the profiling_history_size variable, which defaults to 15. To analyze a single statement, show profile shows the most recent SQL, or show profile for query id (where id is the QUERY_ID from show profiles) shows a specific statement.


In addition to the default profile, you can also view CPU and IO costs, as shown above.

1.4 Use Optimizer Trace to analyze optimizer details

The profile only shows how long SQL execution took; it cannot show the actual process of execution, that is, how the MySQL optimizer chose the execution plan. For that we can use Optimizer Trace, which traces the whole process of parsing, optimizing, and executing a statement.

We can turn on the switch with set optimizer_trace="enabled=on", then execute the SQL to be traced, and finally query select * from information_schema.optimizer_trace to read the trace, as follows:

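A minimal sketch of a tracing session (the traced statement is illustrative):

set optimizer_trace = "enabled=on";    -- session scope
select * from user where id = 1;       -- the statement to trace
select * from information_schema.optimizer_trace\G   -- \G formats output vertically in the mysql client
set optimizer_trace = "enabled=off";   -- turn it off when done, tracing has overhead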

You can view and analyze its execution tree, which will include three stages:

  • join_preparation: preparation phase
  • join_optimization: optimization phase
  • join_execution: execution phase


1.5 Identify the problem and take corresponding measures

Finally, confirm the problem and take corresponding measures.

  • Most slow SQL is index-related: no index, an index that does not take effect, an unreasonable index design, and so on. In these cases we can optimize the indexes.
  • We can also optimize the SQL statement itself, e.g., too many in elements (query in batches), deep paging (filter from the last seen row), or splitting a query into time segments.
  • If the SQL itself cannot be optimized well, consider serving the query from ES or a data warehouse instead.
  • If a single table is so large that queries are slow, consider splitting databases and tables (sharding).
  • If the database is flushing dirty pages and causing slow queries, consider whether some parameters can be tuned, and discuss the plan with the DBA.
  • If there is too much historical data, consider whether some of it can be archived.

2. Classic case analysis of slow query

2.1 Case 1: Implicit conversion

We create a user table:

CREATE TABLE user (
  id int(11) NOT NULL AUTO_INCREMENT,
  userId varchar(32) NOT NULL,
  age  varchar(16) NOT NULL,
  name varchar(255) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_userid (userId) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The userId field is a string type with an ordinary B+ tree index. If a number is passed as the query condition, the index becomes invalid, as the sketch below shows:

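A sketch of the failing case (assuming the table above; the value is illustrative):

explain select * from user where userId = 123;
-- type = ALL: full table scan, idx_userid is not used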

If you add quotes around the number, so that a string is passed, the index is of course used:

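A sketch of the working case:

explain select * from user where userId = '123';
-- type = ref, key = idx_userid: the index is used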

Why doesn't the first statement use the index without single quotes? Because without quotes it is a comparison between a string column and a number: the types do not match, so MySQL performs an implicit type conversion, converting both sides to floating point for the comparison. With an implicit type conversion applied to the indexed column, the index becomes invalid.

2.2 Case 2: Leftmost prefix matching

When MySQL builds a composite index, it follows the leftmost prefix matching principle, that is, leftmost first. Creating an (a,b,c) composite index is equivalent to creating the three indexes (a), (a,b), and (a,b,c).

Suppose you have the following table structure:

CREATE TABLE user (
  id int(11) NOT NULL AUTO_INCREMENT,
  user_id varchar(32) NOT NULL,
  age  varchar(16) NOT NULL,
  name varchar(255) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_userid_name (user_id,name) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Given the composite index idx_userid_name, if we execute the following SQL with name as the only query condition, the index is invalid:

explain select * from user where name ='捡田螺的小男孩';


Because the condition column name is not the first column of the idx_userid_name composite index, the leftmost matching principle is not satisfied, and the index does not take effect. The composite index only takes effect when the query condition satisfies leftmost matching, e.g., when the condition column is user_id, as the sketch below shows:

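A sketch (the user_id value is illustrative):

explain select * from user where user_id = '10086';
-- key = idx_userid_name: the leftmost column matches, so the index takes effect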

2.3 Case 3: Deep paging problem

limit deep paging causing slow queries should be a familiar problem to everyone.

Why does limit deep paging slow down? Suppose there is a table structure as follows:

CREATE TABLE account (
  id int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
  name varchar(255) DEFAULT NULL COMMENT 'account name',
  balance int(11) DEFAULT NULL COMMENT 'balance',
  create_time datetime NOT NULL COMMENT 'creation time',
  update_time datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
  PRIMARY KEY (id),
  KEY idx_name (name),
  KEY idx_create_time (create_time) -- the index in question
) ENGINE=InnoDB AUTO_INCREMENT=1570068 DEFAULT CHARSET=utf8 ROW_FORMAT=REDUNDANT COMMENT='account table';

Do you know what the execution process is like for the following SQL?

select id,name,balance from account where create_time> '2020-09-19' limit 100000,10;

The execution flow of this SQL is as follows:

  1. Through the ordinary secondary index tree idx_create_time, filter on the create_time condition and find the primary key ids that satisfy it.
  2. With those primary key ids, go back to the primary key index tree, find the full rows, and take out the columns to be displayed (the back-to-table process).
  3. Scan the 100010 rows that satisfy the conditions, then throw away the first 100000 rows and return the rest.


Therefore, limit deep paging slows SQL down for two reasons:

  • The limit statement first scans offset+n rows, then discards the first offset rows and returns the next n rows. With limit 100000,10, 100010 rows are scanned; with limit 0,10, only 10 rows are scanned.
  • limit 100000,10 scans more rows, which also means more back-to-table lookups.

How to optimize the deep paging problem?

We can optimize by reducing back-to-table lookups. The common approaches are the label recording method and the delayed association method.

label recording method

Record which row the last query ended at; the next query then starts scanning from that row. It is like reading a book: fold the corner or place a bookmark where you stopped, and next time you open straight to that page.

Assuming the last query ended at id 100000, the SQL can be modified to:

select  id,name,balance FROM account where id > 100000 limit 10;

This way, no matter how many pages further you turn, performance stays good because the id index is hit. But this method has a limitation: it needs a field that is continuous and auto-increment-like.

delayed association method

The delayed association method moves the condition to the primary key index tree, reducing back-to-table lookups. As follows:

select  acct1.id,acct1.name,acct1.balance FROM account acct1 INNER JOIN (SELECT a.id FROM account a WHERE a.create_time > '2020-09-19' limit 100000, 10) AS acct2 on acct1.id= acct2.id;

The idea is to first find the primary key IDs that satisfy the condition through the idx_create_time secondary index tree, then use an inner join back to the original table on those primary key IDs. The later lookup goes directly through the primary key index, and back-to-table lookups are reduced.

2.4 Case 4: Too many in elements

Even if the column after in is indexed, be careful that the number of in elements is not too large. It is generally recommended to keep it under 200; beyond that, it is better to query in batches of 200 at a time.

Counterexample:

select user_id,name from user where user_id in (1,2,3...1000000); 

If we put no restrictions on the in condition, the query statement may fetch a great deal of data at once, easily causing interface timeouts. This is especially dangerous with subqueries: you don't know in advance how many values the in subquery will return. Consider the following subquery:

select * from user where user_id in (select author_id from article where type = 1);

What if there are a thousand, or even tens of thousands, of rows with type = 1? It will definitely be slow SQL. It is generally recommended to query in batches of 200, for example:

select user_id,name from user where user_id in (1,2,3...200);

Why is the in query slow?

This is because the in query is evaluated at the bottom of MySQL as an n*m search, similar to a union. When estimating the cost of an in query (cost = number of tuples * average IO), the number of tuples is obtained by probing the values in the in list one by one, so the estimation itself becomes slow. MySQL therefore defines a threshold, eq_range_index_dive_limit; from 5.6 on, once the number of in values exceeds this threshold, those per-value probes no longer participate in the cost calculation, which makes the chosen execution plan less accurate. The default is 200, so an in condition with more than 200 values can throw off the cost estimate and may cause MySQL to select the wrong index.

2.5 Case 5: slow query caused by order by file sorting

If order by uses file sorting, slow queries may occur. Let's look at the following SQL:

select name,age,city from staff where city = '深圳' order by age limit 10;

It means: query the name, age, and city of the first 10 employees whose city is Shenzhen, sorted in ascending order of age.


Looking at the explain execution plan, the Extra column contains Using filesort, which indicates that file sorting is used.

Why is the efficiency of order by file sorting low?


order by sorting comes in two flavors: full-field sorting and rowid sorting. MySQL compares max_length_for_sort_data with the length of the result row: if the result row exceeds max_length_for_sort_data, rowid sorting is used; otherwise full-field sorting is used.

2.5.1 rowid sorting

Rowid sorting generally has to go back to the table to fetch the data that satisfies the conditions, so it is slower. Sorted by rowid, the following SQL executes like this:

select name,age,city from staff where city = '深圳' order by age limit 10;
  1. MySQL initializes sort_buffer for the corresponding thread, to hold the sort field age and the primary key id;
  2. From the idx_city index tree, find the first primary key id that satisfies the condition city = '深圳', say id = X;
  3. Go to the primary key index tree for the row with id = X, take the values of age and the primary key id, and store them in sort_buffer;
  4. Fetch the next record's primary key id from the idx_city index tree, say id = Y;
  5. Repeat steps 3 and 4 until the city value is no longer Shenzhen;
  6. The previous five steps have found all the rows for Shenzhen. Sort all the data in sort_buffer by age, traverse the sort result, take the first 10 rows, go back to the original table by id, fetch the three fields city, name, and age, and return them to the client.


2.5.2 Full Field Sorting

For the same SQL, full-field sorting goes like this:

select name,age,city from staff where city = '深圳' order by age limit 10;
  1. MySQL initializes sort_buffer for the corresponding thread, to hold the queried fields name, age, and city;
  2. From the idx_city index tree, find the first primary key id that satisfies the condition city = '深圳', say id = X;
  3. Go to the primary key index tree for the row with id = X, take the values of the three fields name, age, and city, and store them in sort_buffer;
  4. Fetch the next record's primary key id from the idx_city index tree, say id = Y;
  5. Repeat steps 3 and 4 until the city value is no longer Shenzhen;
  6. The previous five steps have found all the data for Shenzhen; sort all the rows in sort_buffer by age;
  7. Take the first 10 rows of the sort result and return them to the client.


The size of sort_buffer is controlled by a parameter: sort_buffer_size.

  • If the data to sort is smaller than sort_buffer_size, sorting is done in memory, inside sort_buffer.
  • If the data to sort is larger than sort_buffer_size, sorting has to use disk files.

Sorting with disk files is slower. Data is first put into sort_buffer; when it is nearly full, it is sorted and then written out to a temporary disk file. Once all the qualifying data has been fetched and sorted this way, a merge algorithm merges the sorted small files on disk into one ordered large file.

2.5.3 How to optimize the file sorting of order by

When order by uses file sorting, efficiency drops. How do we optimize it?

  • File sorting happens because the data is unordered; if the data is already ordered, no file sort is needed. Index data is inherently ordered, so we can optimize the order by statement by building a suitable index (see the sketch below).
  • We can also tune parameters such as max_length_for_sort_data and sort_buffer_size.
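A sketch of the index approach on the staff table from this example (the index name is illustrative): a composite index covering both the filter column and the sort column lets MySQL read the Shenzhen rows already ordered by age.

alter table staff add index idx_city_age (city, age);

explain select name, age, city from staff where city = '深圳' order by age limit 10;
-- Extra no longer contains Using filesort: rows come out of idx_city_age pre-sorted by age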

2.6 Case 6: Using is null or is not null on indexed fields may invalidate the index

Table Structure:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `card` varchar(255) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`) USING BTREE,
  KEY `idx_card` (`card`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

With an index on the single field name, a query for name is not null actually uses the index.


Likewise, with an index on the single field card, a query for card is not null also uses the index.


But if the two conditions are connected with or, the index becomes invalid, as the sketch below shows:

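A sketch of the three cases (whether the optimizer picks the index for the single-column conditions also depends on the data distribution):

explain select * from user where name is not null;                       -- key = idx_name
explain select * from user where card is not null;                       -- key = idx_card
explain select * from user where name is not null or card is not null;  -- type = ALL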

In many cases it is also the data volume that makes the MySQL optimizer give up on the index. For the same reason, when analyzing SQL with explain, pay attention whenever type = range, because with enough data the index there may be abandoned too.

2.7 Case 7: Using != or <> on an indexed field may invalidate the index

Suppose there is a table structure:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `userId` int(11) NOT NULL,
  `age` int(11) DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_age` (`age`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

Although age is indexed, once != or <> or not in is used, the index is as good as useless, as the sketch below shows:

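A sketch (the constant is illustrative):

explain select * from user where age != 18;
-- type = ALL: the optimizer expects too many matching rows for idx_age to pay off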

This, too, is down to the MySQL optimizer: if it estimates that even with the index it would still scan very many rows, it decides the index is not worth it and skips it. So be careful whenever you use !=, <>, or not in.

2.8 Case 8: Left/right joins where the joined fields have different character encodings

Create two new tables, user and user_job:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,
  `age` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

CREATE TABLE `user_job` (
  `id` int(11) NOT NULL,
  `userId` int(11) NOT NULL,
  `job` varchar(255) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The name field of the user table is encoded as utf8mb4, while the name field of the user_job table is encoded as utf8.


Executing a left outer join query, the user_job table is still scanned in full, as the sketch below shows:

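A sketch of the join (the selected columns and aliases are illustrative):

explain select u.name, uj.job from user u left join user_job uj on u.name = uj.name;
-- user_job shows type = ALL: converting uj.name from utf8 to utf8mb4 for the
-- comparison prevents its idx_name index from being used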

If their name fields are changed to the same encoding, the same SQL uses the index again.

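One possible fix (a sketch; altering column charsets on production data needs care):

alter table user_job modify name varchar(255) character set utf8mb4 default null;
-- With both name columns in utf8mb4, the join can use idx_name on user_job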

2.9 Case 9: group by using temporary table

group by is generally used for grouped statistics; the logic it expresses is grouping by some rule. We use it often in daily development, and a moment of carelessness easily produces slow SQL.

2.9.1 group by execution process

Suppose there is a table structure:

CREATE TABLE `staff` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
  `id_card` varchar(20) NOT NULL COMMENT 'ID card number',
  `name` varchar(64) NOT NULL COMMENT 'name',
  `age` int(4) NOT NULL COMMENT 'age',
  `city` varchar(64) NOT NULL COMMENT 'city',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COMMENT='staff table';

Let's look at the execution plan of this SQL:

explain select city ,count(*) as num from staff group by city;


  • Using temporary in the Extra field indicates that a temporary table is used for the grouping
  • Using filesort in the Extra field indicates that file sorting is used

How does group by use temporary tables and sorting? Let's walk through the execution flow of this SQL:

select city ,count(*) as num from staff group by city;
  1. Create an in-memory temporary table with two fields, city and num;
  2. Scan the whole staff table, taking each row's city value 'X' in turn:
     • if there is no row with city = 'X' in the temporary table, insert a record (X, 1);
     • if there is, add 1 to the num value of row X;
  3. After the traversal completes, sort the result by the city field and return the result set to the client.


What about sorting the temporary table?

The fields that need sorting are put into the sort buffer and returned after sorting. Note that sorting again divides into full-field sorting and rowid sorting:

  • With full-field sorting, all the fields the query returns are put into the sort buffer, sorted by the sort field, and returned directly.
  • With rowid sorting, only the fields that need sorting are put into the sort buffer; one extra back-to-table lookup is needed before returning.

2.9.2 Where might group by be slow?

Improper use of group by easily produces slow SQL, because by default it uses both a temporary table and sorting, and sometimes it may spill to a disk temporary table:

  • If, during execution, the in-memory temporary table reaches its size limit (controlled by the parameter tmp_table_size), it is converted into a disk temporary table.
  • With a large data volume, the disk temporary table the query needs may occupy a lot of disk space.

2.9.3 How to optimize group by

From which direction to optimize?

  • Direction 1: since it sorts by default, just skip the sort if we don't need the order.
  • Direction 2: since the temporary table is the X factor hurting group by performance, can we avoid using a temporary table?

Let's think about it: why does a group by statement need a temporary table at all? The semantics of group by are to count the occurrences of distinct values. If the values arrived already ordered, we could just scan forward and count directly, with no temporary table needed to record intermediate results.

These optimizations are possible:

  • Index the fields after group by (see the sketch below)
  • Use order by null to skip the sorting
  • Try to stay within in-memory temporary tables (e.g., by tuning tmp_table_size)
  • Use SQL_BIG_RESULT
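A sketch of the first two ideas on the staff table above (the index name is illustrative):

alter table staff add index idx_city (city);
-- With the index, city values arrive pre-grouped: no temporary table, no filesort
explain select city, count(*) as num from staff group by city;

-- If no suitable index exists, at least skip the implicit sort:
select city, count(*) as num from staff group by city order by null;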

2.10 Case 10: delete + in subquery does not use the index!

I once ran into a slow SQL problem in production: when delete meets an in subquery, it does not use the index even when one exists, while the corresponding select + in subquery does use it.

The MySQL version is 5.7. Suppose there are two tables, account and old_account, with the following structure:

CREATE TABLE `old_account` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
  `name` varchar(255) DEFAULT NULL COMMENT 'account name',
  `balance` int(11) DEFAULT NULL COMMENT 'balance',
  `create_time` datetime NOT NULL COMMENT 'creation time',
  `update_time` datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1570068 DEFAULT CHARSET=utf8 ROW_FORMAT=REDUNDANT COMMENT='old account table';

CREATE TABLE `account` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
  `name` varchar(255) DEFAULT NULL COMMENT 'account name',
  `balance` int(11) DEFAULT NULL COMMENT 'balance',
  `create_time` datetime NOT NULL COMMENT 'creation time',
  `update_time` datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1570068 DEFAULT CHARSET=utf8 ROW_FORMAT=REDUNDANT COMMENT='account table';

The executed SQL is as follows:

delete from account where name in (select name from old_account);

Checking the execution plan, we find the index is not used.


But if delete is replaced with select, the index is used, as the sketch below shows:

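A sketch of the two plans:

explain delete from account where name in (select name from old_account);
-- account: type = ALL, the idx_name index is not used

explain select * from account where name in (select name from old_account);
-- the subquery is rewritten as a semi join and idx_name can be used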

Why does the select + in subquery use the index, while the delete + in subquery does not?

Let's execute the following SQL to see:

explain select * from account where name in (select name from old_account);
show WARNINGS; -- shows the final SQL actually executed after optimization

The result is as follows:

select `test2`.`account`.`id` AS `id`,`test2`.`account`.`name` AS `name`,`test2`.`account`.`balance` AS `balance`,`test2`.`account`.`create_time` AS `create_time`,`test2`.`account`.`update_time` AS `update_time` from `test2`.`account` 
semi join (`test2`.`old_account`)
where (`test2`.`account`.`name` = `test2`.`old_account`.`name`)

We can see that during actual execution, MySQL optimizes the select in subquery, rewriting it as a (semi) join, so the index can be used. Unfortunately, for the delete in subquery, MySQL performs no such optimization.
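A common workaround (a sketch; check the plan against your own data) is to write the join explicitly so the delete can also use the index:

delete a
from account a
inner join old_account o on a.name = o.name;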


Origin blog.csdn.net/zhw21w/article/details/129563446