High-performance MySQL in practice (3): Performance optimization


This article introduces methods for optimizing slow SQL. Before the specific optimization measures, I want to cover EXPLAIN first: it is an essential step when analyzing a query, and understanding its output helps us optimize SQL. For readability, the following conventions are used below: key1 denotes a secondary index column, key_part1 denotes the first column of a composite index, unique_key1 denotes a unique secondary index column, and primary_key denotes the primary key. High-performance MySQL in practice (1): table structure and High-performance MySQL in practice (2): indexes are prerequisite reading for this article, and everyone is welcome to read them.

1. EXPLAIN in detail

EXPLAIN is commonly used before optimizing slow SQL. It shows the specific query plan, letting us optimize purposefully. This section explains what each column of the EXPLAIN output is for. First, a brief look at each column:
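As a quick illustration, EXPLAIN is simply prefixed to the statement being analyzed (specific_table and key1 follow the naming convention above):

```sql
-- show the query plan instead of executing the query
explain select * from specific_table where key1 = 'a';
```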

| Column | Description |
| --- | --- |
| id | Each SELECT keyword in a large query gets a unique id. In a join query, all records share the same id; in a query with multiple SELECT keywords, the optimizer may rewrite a subquery so that several SELECTs end up with the same id. |
| select_type | Query type |
| table | Table name |
| partitions | Matching partition information |
| type | Access method for a single table |
| possible_keys | Indexes that might be used |
| key | Index actually used |
| key_len | Length of the index actually used |
| ref | For index-column equality comparisons, what the index column is matched against |
| rows | Estimated number of records to read |
| filtered | Of the estimated records to read, the percentage remaining after the other search conditions are applied. It is of little use for a single-table query; in a join query it can be used to estimate how many times the driven table must be queried once the driving table finishes. |
| Extra | Additional notes |

Most of these columns are explained well enough by their descriptions. Below we detail the ones that warrant more discussion:

1.1 select_type
  • SIMPLE: a query that contains no UNION and no subquery

  • PRIMARY: in a large query containing UNION, UNION ALL or a subquery, which is composed of several small queries, the select_type of the leftmost small query is PRIMARY

  • UNION: in a large query containing UNION or UNION ALL, every small query except the leftmost one has select_type UNION

  • UNION RESULT: when MySQL uses a temporary table to deduplicate a UNION, the query against that temporary table has select_type UNION RESULT

  • DEPENDENT UNION: UNION-related type for dependent queries

  • SUBQUERY, DEPENDENT SUBQUERY, MATERIALIZED: subquery-related types

  • DERIVED: in a query containing a derived table, the query executed to materialize the derived table

1.2 type
  • const: a single record is located through an equality comparison of the primary key or a unique secondary index with a constant. For a composite unique index, const access applies only when every column of the index is compared with a constant for equality.

  • ref: an equality comparison between a secondary index column and a constant; the resulting scan interval is a single-point interval.

  • ref_or_null: like ref, but records whose secondary index column is NULL are also scanned.

  • range: the query uses an index and the corresponding scan intervals are several single-point or range intervals.

  • index: a covering index is used and all secondary index records are scanned. In addition, for a table using the InnoDB engine, a full-table-scan query with an ORDER BY on the primary key is also reported as index access.

  • fulltext: full-text index access.

  • all: full table scan.

  • eq_ref: in a join query, the driven table is accessed through equality matching on the primary key or on a unique secondary index that does not allow NULL.

In an outer join, the ON clause exists specifically for the scenario where a record in the driving table has no matching record in the driven table, in which case each field of the corresponding driven-table record is filled with NULL; in an inner join, ON and WHERE have the same effect.

  • unique_subquery: for some queries containing an IN subquery, if the optimizer converts the IN subquery into an EXISTS subquery and, after conversion, the subquery can be executed by equality matching on the primary key or a unique secondary index.

  • index_subquery: like unique_subquery, except an ordinary secondary index is used for the access.

  • index_merge: index merging is used.

  • system: the table has exactly one record and the storage engine keeps exact statistics (e.g. MyISAM and MEMORY).

1.3 ref

When the access method is one of const, ref, ref_or_null, eq_ref, unique_subquery and index_subquery, the ref column shows what the index column is matched against for equality:

  • const: a constant

  • func: a function

  • DBName.TableName.columnName: a column of a table in some database

1.4 Extra
  • No tables used: the query has no FROM clause

  • Impossible WHERE: the WHERE condition in the query is always FALSE

  • No matching min/max row: the query contains a MIN or MAX aggregate function but no record satisfies the WHERE condition

  • Using index: a covering index is used

  • Using index condition: index condition pushdown is used when executing the query

Index condition pushdown is an optimization for secondary-index query conditions: while reading the secondary index, all conditions involving the index columns are evaluated first, and the table return (clustered-index lookup) is performed only for records that pass; records that fail are skipped without a table return. This reduces the number of table returns and therefore the I/O.

Examples below:

select * from specific_table where key1 > 'a' and key1 like '%b'; 

With index condition pushdown, both conditions on key1 are evaluated against the index record, instead of performing a table return as soon as key1 > 'a' matches.

  • Using join buffer (Block Nested Loop): in a join query, the driven table cannot use an index effectively, so a block of memory (the join buffer) is used to speed up the query.

  • Using intersect(index_name, ...), Using union(index_name, ...) and Using sort_union(index_name, ...): Intersection, Union or Sort-Union index merging is used to execute the query (introduced later in this article).

  • Using filesort: the sort cannot use an index and must be done in memory or on disk.

  • Using temporary: an internal temporary table is used during the query.

2. Optimization considerations

Optimize based on access type

We introduced the access type (type) column of the EXPLAIN output in detail above. If a query's access type is not what we expect, the simplest and most direct remedy is to add an appropriate index on the search-condition columns.
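For instance, if EXPLAIN shows type ALL for a query filtering on a column (column_a and the index name here are illustrative placeholders), a secondary index usually fixes it:

```sql
-- add a secondary index on the filtered column
alter table specific_table add index idx_column_a (column_a);
```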

Optimization to reduce the number of scanned rows

In some cases, simply adding an index does not solve the problem, for example with the following SQL:

select key1, count(*) from specific_table group by key1;

This SQL may return only a few rows, but because of the COUNT aggregate the number of rows scanned can be in the thousands, depending on the total amount of data in the table. For this pattern of scanning a large amount of data while returning only a few rows, a common optimization is to add a separate summary table. Of course, this requires logic at the application layer to keep the summary table up to date.
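A minimal sketch of such a summary table (all names and types here are assumptions for illustration); the application increments cnt when inserting into the base table and decrements it when deleting:

```sql
-- hypothetical summary table maintained by application logic
create table specific_table_summary (
    key1 varchar(64) not null primary key,
    cnt  int unsigned not null default 0
) engine=InnoDB;
```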

In addition, complex queries can be optimized by rewriting them. Below are the directions to consider when rewriting queries:

One complex query or multiple simple queries?

This is worth considering: splitting a complex query into several simple queries reduces the database's work and moves some processing logic to the application layer. Because MySQL handles simple queries very efficiently, this can often improve overall efficiency.

Segmentation

In practice, when archiving (or deleting) data from a table with a large amount of data, segmentation is usually used to split one large query into many small ones. Each small query does the same job on a different slice of the data, and together they complete the task of the large query.

Archiving a large amount of data in one statement may lock many rows, fill the transaction log, exhaust system resources and block many small queries. To avoid this, an archiving task usually operates on only about 10,000 rows at a time. This has the least impact on the server, and after each batch completes the job can pause briefly before the next one. Spreading the work over a longer period greatly reduces the impact on the server and shortens the time locks are held.
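A minimal sketch of one batch of such an archiving job (the created_at column and the cutoff value are assumptions for illustration); the application repeats it, pausing between batches, until no rows are affected:

```sql
-- remove at most 10,000 old rows per batch to keep locks and undo log small
delete from specific_table
where created_at < '2023-01-01'
limit 10000;
```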

Optimize join queries

If too many tables are joined, we should split the statement into several queries or several single-table queries (single-table queries also make better use of the cache). Decomposing the query also reduces lock contention between queries. In addition, pay attention to the following two points in join queries:

  • Make sure there are indexes on the columns in the ON or USING clauses

  • Make sure any expression in GROUP BY or ORDER BY refers to columns from only one table, so that MySQL has a chance to use an index to optimize the query

IN() condition and OR condition

We generally think of IN() as completely equivalent to a series of OR conditions, but in MySQL the two differ. When handling an IN() list, MySQL first sorts the values and then uses binary search to check whether a value matches, an O(log n) operation. An equivalent chain of OR comparisons is O(n), so with a large number of values in the list, the IN() form is processed faster.

Whether the index is invalid during query
  • If the search does not start from the leftmost column of the index, the index cannot be used

  • If columns of a composite index are skipped, the index cannot be used or only part of it can be used. Consider the following SQL, where key_part1, key_part2 and key_part3 form a composite index in that order:

select key_part1, key_part2, key_part3 from specific_table
where key_part1 = 1 and key_part3 = 3;

Because key_part2 is omitted from the conditions, only the first column of the index can be used. If key_part1 were omitted instead, the composite index could not be used at all.

  • If a column has a range condition, none of the columns to its right in the index can use the index for filtering or sorting. In this case, if the range column has only a limited number of possible values, the range condition can be replaced with several equality matches connected by OR (or an IN() list)
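A sketch under the assumption that key_part2 has only a few relevant values; rewriting its range as equalities lets key_part3 use the index as well:

```sql
-- instead of: key_part2 between 2 and 3 (which stops index use at key_part2)
select * from specific_table
where key_part1 = 1
  and key_part2 in (2, 3)
  and key_part3 = 5;
```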

  • If a column does not appear by itself in the search condition, but inside an expression or function, the index cannot be used. In the following SQL, key1 appears as key1 * 2, so the index is not used:

select * from specific_table where key1 * 2 > 4;

  • A fuzzy query starting with % on a string column cannot use the index. This is easy to understand: the index sorts strings character by character from the left, so with a leading % the comparison cannot be anchored and only a full table scan is possible.

Whether the index is invalid during sorting
  • If the columns after the ORDER BY statement are not in the column order of the composite index, the index cannot be used

  • If ASC and DESC are mixed, the index cannot be used

Consider the following SQL, where key_part1 and key_part2 form a composite index in that order; the index cannot be used during execution:

select key_part1, key_part2 from specific_table 
order by key_part1, key_part2 desc;

Since MySQL 8.0, descending index columns are supported, so mixed ASC and DESC sorts can use an index if a matching index is defined.
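A sketch of such an 8.0-style index (the index name is a hypothetical placeholder) matching the mixed-direction ORDER BY above:

```sql
-- MySQL 8.0+: a descending key part lets the mixed ORDER BY use the index
alter table specific_table
add index idx_part1_part2_desc (key_part1 asc, key_part2 desc);
```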

  • If the sort columns belong to different indexes, the index cannot be used, as in the following SQL:

select id, key1, key2 from specific_table order by key1, key2;

Because key1 and key2 belong to different indexes, rows with equal key1 are not stored sorted by key2, so neither index can produce the required order.

  • If the sort columns all come from one composite index but are not consecutive in it, the index cannot be used. In the following SQL, the index entries sorted by key_part1 are not then sorted by key_part3, so the index cannot be used:

select key_part1, key_part3 
from specific_table 
order by key_part1, key_part3;
  • If a sort column does not appear in the ORDER BY statement as a bare column name, the index cannot be used. The following SQL sorts by a function of the column, so the index cannot be used:

select id, key1, key2 from specific_table order by upper(key1);
Optimization of NOT NULL index columns

MIN() and MAX() are more efficient on indexed columns that are NOT NULL. For example, to find the minimum value of such a column, only the leftmost record of the corresponding B-Tree index needs to be read. The query optimizer treats the expression as a constant, and the Extra column of the EXPLAIN result shows "Select tables optimized away".
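For example, with a NOT NULL secondary index on key1, both aggregates below resolve from the two ends of the index without touching the table:

```sql
-- EXPLAIN shows "Select tables optimized away" in the Extra column
select min(key1), max(key1) from specific_table;
```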

Duplicate and redundant indexes

A duplicate index is an index of the same type created on the same columns in the same order, as in the following SQL:

create table specific_table (
    id int not null primary key,
    unique key(id)
)engine=InnoDB;

This creates two equivalent indexes on the id column; the unique index should be removed.

A redundant index usually appears when a new index is added to a table, for example adding an index on (column_a, column_b) when an index on (column_a) already exists. The single-column index becomes redundant, because the composite index can serve every query the single-column index can.

In most cases redundant indexes are unnecessary; we should try to extend existing indexes rather than create new ones.
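A sketch of extending instead of adding (the index names are illustrative placeholders):

```sql
-- replace the now-redundant single-column index with the composite one
alter table specific_table
    drop index idx_column_a,
    add index idx_column_a_b (column_a, column_b);
```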

Is there an index merge?

Creating independent single-column indexes on several columns does not, in most cases, improve MySQL query performance.

MySQL has an "index merge" strategy that can use several single-column indexes of a table to locate the target rows and merge the scan results. Index merging is sometimes a good choice, but more often it is a sign that the table's indexes are poorly designed:

  • When the optimizer needs to merge several indexes, it usually means the table needs one composite index covering all the relevant columns, rather than several independent single-column indexes

  • Merging several indexes can consume a lot of CPU and memory for the buffering, sorting and merging steps of the algorithm, especially when some index columns have low selectivity and large result sets must be merged

  • The optimizer does not count these operations in the query cost, which makes the cost "underestimated" and can make the chosen plan worse than a full table scan

Generally, we should consider rebuilding the indexes or rewriting the query with UNION. In addition, index merging can be turned off through the optimizer_switch parameter, as in the following SQL:

SELECT @@optimizer_switch;


-- turn off index merging (flags not mentioned keep their current values)
SET optimizer_switch = 'index_merge=off';

We can also use the IGNORE INDEX syntax to make the optimizer ignore certain indexes, preventing it from building an index-merge plan that uses them:

select * from specific_table ignore index(index_name)
where column_name = #{value};

Besides ignoring indexes when an unwanted index merge appears, also consider ignoring an index in favor of a full table scan when the query cannot form a suitable scan interval and the index therefore cannot reduce the number of records scanned.

Below we introduce the three kinds of index merging, so that everyone has a complete picture: Intersection index merging, Union index merging and Sort-Union index merging.

Intersection index merge

Consider the following query:

select * from specific_table where key1 = 'a' and key2 = 'b';

As we know, secondary index records with equal index column values are sorted by primary key value. So the primary key values found through key1 and those found through key2 can be intersected, and the table returns are performed only for the intersection. This costs less than performing table returns separately for everything matched by key1 and by key2. In such cases the Intersection index-merge strategy is used.

Union index merge

Consider the following query:

select * from specific_table where key1 = 'a' or key2 = 'b';

Take the union of the primary key values found through key1 and those found through key2, then perform the table returns for the result. This is Union index merging, which can cost less than a direct full table scan. Note that Union index merging requires the primary key values produced by each secondary index to be in order; if they are out of order, Sort-Union index merging must be considered.

Sort-Union index merge

Consider the following query:

select * from specific_table where key1 < 'a' or key2 > 'b';

Here the conditions have become range conditions, so the primary key values produced by each index are no longer in order and Union index merging cannot be used. Sort-Union index merging adds a sorting step on top of Union index merging: the primary key values found through key1 and those found through key2 are each sorted first, after which the Union merge can proceed.

Optimize COUNT()

When we need to count non-NULL values, we specify a column name in COUNT(); when we need to count all rows, we use COUNT(*), which ignores the column values and counts the rows directly (COUNT(1) and other non-NULL constants behave the same way). Understanding these two points lets us state our intent clearly when doing statistics.

Generally, a COUNT() query has to scan many rows to produce an exact result, so it is hard to optimize. If the business scenario does not require exact numbers, the estimated row count from EXPLAIN can be used instead; or some constraints can be dropped from the query conditions, such as removing DISTINCT to avoid sorting. These practices may improve the performance of statistical queries.
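A small illustration of the difference, assuming key1 allows NULL:

```sql
-- COUNT(key1) skips rows where key1 IS NULL; COUNT(*) counts every row
select count(key1), count(*) from specific_table;
```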

Optimizing UNION queries

When we use UNION and do not need to eliminate duplicate rows, we must write UNION ALL: without the ALL keyword, MySQL adds DISTINCT to the temporary table to deduplicate the data, which is relatively expensive. In addition, applying WHERE, LIMIT and ORDER BY inside each sub-query lets MySQL optimize them better.
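A sketch of pushing conditions into each branch (t1 and t2 are placeholder table names):

```sql
-- each branch is filtered and limited first; UNION ALL then concatenates
-- the results without a deduplication pass
(select id from t1 where key1 = 'a' order by id limit 20)
union all
(select id from t2 where key1 = 'a' order by id limit 20);
```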

Optimize OFFSET

In paged queries, OFFSET makes MySQL scan a large number of unnecessary rows and then discard them. For example, LIMIT 1000, 20 reads 1020 rows and throws away the first 1000, which is very costly.

Instead, we can keep a "bookmark" recording where the previous read stopped, so that the next query starts scanning from that position and OFFSET is avoided. For example, with 20 rows per page and rows read in descending id order, if the last row on the current page has id 181, the next page is fetched as follows:

select * from specific_table
where id <= 180
order by id desc
limit 20;

This approach has a drawback: it cannot jump to an arbitrary page. For example, to view page 5 directly, we cannot compute that page's id range, unless ids are guaranteed to be monotonically increasing with no deleted rows; in that case the ids are continuous and we can easily work out that page 5 starts at id 120. The advantage is that performance stays excellent no matter how far back the pages go.

Use WITH ROLLUP to optimize GROUP BY

We usually use GROUP BY for grouped aggregation queries. If the grouped results must also be totaled, WITH ROLLUP can do it in SQL, but a better approach is often to move that summing to the application layer.
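For reference, WITH ROLLUP appends a super-aggregate row whose grouping column is NULL:

```sql
-- one row per key1 value, plus a final row with key1 = NULL holding the grand total
select key1, count(*) from specific_table
group by key1 with rollup;
```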

OPTIMIZE TABLE

If we delete a lot of data, or insert rows out of primary key order, a lot of fragmentation is likely to accumulate and hurt query efficiency. This is because when data is deleted, MySQL does not immediately reclaim and compact the space; it only marks the records as deleted. OPTIMIZE TABLE can be used to reorganize the space and reduce fragmentation.

The InnoDB engine does not support the OPTIMIZE TABLE operation directly and prints the following message:

OPTIMIZE TABLE specific_table;


-- Table does not support optimize, doing recreate + analyze instead

We can achieve the same rebuild with a "no-op" ALTER statement:

alter table specific_table engine=InnoDB;

Afterwards, we can check the result with the following SQL; if the Data_free column is 0, the space defragmentation succeeded:

show table status from specific_db like 'specific_table';

In most cases, however, none of this is necessary.

Find and repair corrupted tables

An index may be corrupted by hardware problems, bugs in MySQL itself or operating system problems. This is very rare. Most table and index errors can be detected with the following SQL:

check table specific_table;

If a problem is found, it can be repaired with the following SQL:

repair table specific_table;


-- if the storage engine does not support REPAIR TABLE, a table rebuild works too
alter table specific_table engine=InnoDB;


-end-

Origin: blog.csdn.net/jdcdev_/article/details/133565446