"High Performance MySQL" Reading Notes (Part 2)

Table of contents

Optimization of MySQL query performance

Slow query basics

Optimize data access

Did you request unnecessary data from the database?

Unnecessary records were queried

Return all columns in multi-table join query

Is MySQL scanning for additional records

How to rewrite the query

Segment query (emphasis)

Decomposing join queries (emphasis)

How MySQL executes join queries

Query optimizer

Sorting optimization (emphasis)

Limitations of the MySQL Query Optimizer

Optimize specific types of queries

Optimize count() query (key)

Optimizing Join Queries

Optimizing limit pagination for large amounts of data (emphasis)

Optimize union query (key)


Optimization of MySQL query performance

First, we need to understand why a query is slow. If we think of a query as a task, that task is made up of a series of subtasks, and each subtask takes some time to execute. To optimize a query, we really optimize its subtasks: eliminate some of them, reduce the number of times they run, or make them run faster.

Put simply, the life cycle of a query is: it travels from the client to the server, the server parses it and generates an execution plan, executes the plan, and returns the result to the client. Execution can be considered the most important stage of this life cycle; it includes a large number of calls to the storage engine to retrieve data, plus processing of the retrieved data such as sorting and grouping.

In every query that takes a lot of time, we can find some unnecessary work: some operations are repeated many times, some run too slowly, and so on. The purpose of query optimization is to eliminate the time these operations take.

Slow query basics

Optimize data access

If a query performs poorly, the most common reason is that it accesses too much data. An inefficient query can usually be analyzed in two steps:

  • Confirm whether the application is retrieving more data than it needs (how to judge this is discussed below). This usually means too many rows are being accessed, but sometimes too many columns are being accessed as well.

  • Confirm whether the MySQL server is analyzing (e.g. sorting or grouping) a large number of unnecessary rows.

Did you request unnecessary data from the database?

Some queries request more data than they actually need, and the application then throws the extra data away. This adds network overhead and also consumes CPU and memory on the application server.

Unnecessary records were queried

A common mistake is to assume that MySQL returns results on demand. In fact, MySQL computes and returns the complete result set. For example, a query may match 100 rows while the page only needs to display the first 10; MySQL still fetches all 100 rows and sends them to the client, which then discards most of them. The simplest fix is to add a LIMIT clause to the query.

Return all columns in multi-table join query

Avoid select * whenever you can, and query only the columns you actually need. If you fetch every column, the optimizer cannot apply optimizations such as covering-index scans, and the server also spends extra I/O, memory, and CPU.

Is MySQL scanning for additional records

After confirming that the query returns only the data it needs, the next step is to check whether it scans too much data to produce that result. For MySQL, the three simplest metrics for measuring query cost are:

  • Response time (hard to measure precisely; it can only be roughly estimated with empirical methods, which are not covered here)

  • Number of rows scanned

  • Number of rows returned

All three are recorded in MySQL's slow query log. Ideally, the number of rows scanned equals the number of rows returned; in practice, the ratio of rows scanned to rows returned is usually small, typically between 1:1 and 10:1, but it can sometimes be off by orders of magnitude.
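To make the scanned-to-returned ratio concrete, here is a minimal Python sketch that reads the Rows_sent and Rows_examined fields from a slow-log statistics line and computes the ratio. The sample line is made up for illustration, the exact slow-log layout can vary between MySQL versions (so the regular expression is an assumption), and the 10:1 cutoff is an assumed threshold for flagging a query.

```python
import re

# Matches the statistics line MySQL writes for each slow-log entry, e.g.
# "# Query_time: 1.2  Lock_time: 0.0 Rows_sent: 10  Rows_examined: 50000"
STATS_RE = re.compile(
    r"Query_time:\s*(?P<query_time>[\d.]+)\s+"
    r"Lock_time:\s*(?P<lock_time>[\d.]+)\s+"
    r"Rows_sent:\s*(?P<rows_sent>\d+)\s+"
    r"Rows_examined:\s*(?P<rows_examined>\d+)"
)


def parse_stats(line):
    """Return the timing and row counters from one slow-log stats line, or None."""
    m = STATS_RE.search(line)
    if m is None:
        return None
    return {
        "query_time": float(m["query_time"]),
        "lock_time": float(m["lock_time"]),
        "rows_sent": int(m["rows_sent"]),
        "rows_examined": int(m["rows_examined"]),
    }


def scan_ratio(stats):
    """Rows examined per row returned; max(rows_sent, 1) avoids dividing by zero."""
    return stats["rows_examined"] / max(stats["rows_sent"], 1)


# Hypothetical sample line (not taken from a real server).
sample = "# Query_time: 1.250000  Lock_time: 0.000120 Rows_sent: 10  Rows_examined: 50000"
stats = parse_stats(sample)
ratio = scan_ratio(stats)     # 5000.0 rows scanned per row returned
suspicious = ratio > 10       # assumed cutoff for "scans far too much"
```

A query like this one, which scans 5,000 rows for every row it returns, is a good candidate for the indexing techniques described next.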

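A related indicator is the access type shown in the type column of EXPLAIN output. ALL means a full table scan and const means a constant lookup (for example, an equality match on the primary key); the other common types, such as index, range, and ref, use an index to find rows.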

In general, MySQL can apply WHERE conditions in three ways, from best to worst:

  1. Apply the conditions in the index lookup itself to eliminate non-matching rows. This happens in the storage engine.

  2. Use a covering index (Using index appears in the Extra column) to avoid reading rows: non-matching rows are filtered directly from the index and only the hits are returned. This happens at the MySQL server layer, with no need to go back to the table.

  3. Read rows from the table and then filter out the ones that do not match (Using where appears in the Extra column). This happens at the MySQL server layer, and MySQL must read each row from the table before it can filter it.
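The three levels can be observed in miniature with Python's built-in sqlite3 module. SQLite is only a stand-in here: its EXPLAIN QUERY PLAN wording ("USING INDEX", "USING COVERING INDEX", "SCAN") differs from MySQL's Extra column, and the film table and idx_title index are hypothetical, but the distinction between an index lookup, a covering-index lookup, and a filtered table scan is the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, description TEXT)"
)
conn.execute("CREATE INDEX idx_title ON film (title)")


def plan(sql):
    """Return the detail text of SQLite's query plan for `sql`, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("x",)).fetchall()
    return " | ".join(row[3] for row in rows)


# 1. Condition applied in the index lookup, then the full row is read from the table.
plan_index = plan("SELECT * FROM film WHERE title = ?")
# 2. Covering index: film_id (the rowid) and title are both in idx_title,
#    so the table itself is never read.
plan_covering = plan("SELECT film_id, title FROM film WHERE title = ?")
# 3. No usable index: every row is read and the condition is checked afterwards.
plan_scan = plan("SELECT * FROM film WHERE description = ?")

for label, p in [("index", plan_index), ("covering", plan_covering), ("scan", plan_scan)]:
    print(f"{label:9} {p}")
```

The second query is also why select * hurts: asking for description forces the row lookup that the covering index would otherwise avoid.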

How to rewrite the query

An important question to consider when designing queries is whether a complex query should be split into several simpler ones. MySQL was designed to make connecting and disconnecting very lightweight and to return small query results efficiently, and modern networks are much faster than before, with much lower latency, so running several small queries is no longer much of a problem. All else being equal, it is of course better to use as few queries as possible, but sometimes a large query does need to be broken into several smaller ones.

Segment query (emphasis)

Sometimes a large query needs to be divided into smaller queries: each one does exactly the same thing, but completes only a small part of the work and returns only a small part of the results each time. The most common case is deleting old data. For example, when a large amount of data has to be purged regularly, doing it in one big statement may lock a lot of rows at once, fill up the transaction log, exhaust system resources, and block many small but important queries. Instead, we can split one large DELETE into many smaller ones (using LIMIT to cap how many rows each statement deletes, and driving the repetition with a loop in the application), which affects MySQL's performance as little as possible.

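-- For example, suppose the following SQL statement has to run every month.
-- date_sub() performs date arithmetic with a time offset, now() returns the current date and time
-- in the configured time zone, and interval specifies the amount of time to add or subtract.
delete from TableA where created < date_sub(now(), interval 3 month);

-- The logic above can be rewritten as follows (application-side pseudocode):
rows_affected = 0;
do {
  -- do_query() is a method that executes an SQL statement and returns the number of affected rows
  rows_affected = do_query(
    "delete from TableA where created < date_sub(now(), interval 3 month) limit 10000");
} while (rows_affected > 0);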

There is another advantage of deleting data in this way: the original one-time pressure on the server is distributed over a long period of time, which can greatly reduce the impact on the server, and can also greatly reduce the holding time of the lock when deleting.
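As a runnable counterpart to the pseudocode loop, here is a minimal Python sketch using the standard sqlite3 module. Assumptions: the table name table_a, its ISO-date created column, and the 10,000-row batch size are illustrative. Because SQLite's default build does not accept DELETE ... LIMIT, each batch deletes a bounded set of rowids chosen by a subquery; against MySQL you would send the DELETE ... LIMIT statement directly through your driver.

```python
import sqlite3
import time


def purge_in_batches(conn, cutoff, batch_size=10_000, pause_s=0.0):
    """Delete rows of table_a with created < cutoff, at most batch_size rows per statement.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM table_a WHERE rowid IN "
            "(SELECT rowid FROM table_a WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # end the transaction so locks are released between batches
        if cur.rowcount == 0:  # nothing left to delete
            break
        total += cur.rowcount
        if pause_s:
            time.sleep(pause_s)  # optional pause to spread the load over time
    return total


# Demo data (hypothetical): 25,000 old rows and 5 recent rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO table_a (created) VALUES (?)",
    [("2020-01-01",)] * 25_000 + [("2099-01-01",)] * 5,
)
conn.commit()

deleted = purge_in_batches(conn, cutoff="2024-01-01")  # batches of 10,000, 10,000, 5,000
remaining = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
```

The optional pause_s argument is one way to realize the "spread the pressure over time" advantage; its value would be tuned to the workload.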

Decomposing join queries (emphasis)

If a multi-table join is very slow, we can instead run a single-table query against each table and join the results in the application (for example, by running several single-table queries from Java code). Using multiple single-table queries has the following advantages:

  • Caching can be more efficient.

  • After the query is decomposed, each single-table query can reduce lock contention.

  • Redundant row accesses can be reduced. Doing the join in the application means each record needs to be fetched only once, whereas a join in the database may access some data repeatedly.

  • When querying multiple values, we can control the access order; for example, MySQL can read the values in an IN() list sequentially, which is much more efficient than random reads.

  • Doing the join in the application makes it easier to split the data and scale the application.
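The decomposition can be sketched with Python's sqlite3 module standing in for MySQL. The tag, tag_post and post schema and its rows are hypothetical. The sketch runs one three-table join, then reproduces the same result with three single-table queries, sorting the post IDs before placing them in the IN() list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag      (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE tag_post (tag_id INTEGER, post_id INTEGER);
    CREATE TABLE post     (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO tag VALUES (1, 'mysql'), (2, 'python');
    INSERT INTO tag_post VALUES (1, 30), (1, 10), (2, 20), (1, 20);
    INSERT INTO post VALUES (10, 'Indexes'), (20, 'EXPLAIN'), (30, 'Joins');
""")

# One join executed by the database.
joined = conn.execute("""
    SELECT post.id, post.title FROM tag
    JOIN tag_post ON tag_post.tag_id = tag.id
    JOIN post ON tag_post.post_id = post.id
    WHERE tag.tag = 'mysql' ORDER BY post.id
""").fetchall()

# The same result from three single-table queries joined in the application.
tag_id = conn.execute("SELECT id FROM tag WHERE tag = ?", ("mysql",)).fetchone()[0]
post_ids = sorted(  # sorted ids give the database an ordered IN() list to read
    row[0] for row in conn.execute("SELECT post_id FROM tag_post WHERE tag_id = ?", (tag_id,))
)
placeholders = ",".join("?" * len(post_ids))  # one "?" per id in the IN() list
decomposed = conn.execute(
    f"SELECT id, title FROM post WHERE id IN ({placeholders}) ORDER BY id", post_ids
).fetchall()
```

In a real application, the tag lookup and the post rows are natural candidates for an application-level cache, which is where the caching advantage comes from.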

Note: not every join needs to be split; don't split for the sake of splitting. Doing the join in the application may be faster in the following scenarios:

  • When you can cache and reuse the results of earlier queries

  • When you can use IN() lists on large tables instead of joins

  • When the same table is referenced multiple times in a single query

  • When data is distributed across multiple servers

Note: even within a join, if the join condition compares columns that have the same name in both tables, you can write it with a USING clause instead of ON: USING(id) is equivalent to ON A.id = B.id. USING is more concise, and with SELECT * the join column appears only once in the result, whereas with ON it appears once for each table.
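A quick way to see the practical difference between USING and ON is to compare the columns that SELECT * returns. This sketch uses SQLite through Python's sqlite3 module as a stand-in (MySQL likewise returns a USING column only once); the tables a and b are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, x TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, y TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 'b1'), (3, 'b3');
""")


def columns_and_rows(sql):
    """Return (column names, rows) for a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()


# Both forms match exactly the same rows; only the projected columns differ.
cols_using, rows_using = columns_and_rows("SELECT * FROM a JOIN b USING (id)")
cols_on, rows_on = columns_and_rows("SELECT * FROM a JOIN b ON a.id = b.id")
```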

How MySQL executes join queries

(An important concept here is the temporary table.)

MySQL treats every query as a join. For a UNION, MySQL first executes each single query and puts its results into a temporary table, then reads the rows back out of that table. For a join, MySQL uses nested loops: it finds a row in the first table, then loops through the next table looking for matching rows, and so on until it has found a matching row in every table. Finally, it builds each result row from the columns the query asks for.
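The nested-loop strategy can be sketched in a few lines of Python. This is a simplified model under stated assumptions (tables are in-memory lists of dicts, each join condition is a Python function, no indexes). It is not MySQL's executor; it only illustrates "find a row, then loop over the next table for matches". The actor, film_actor and film rows are hypothetical.

```python
def nested_loop_join(tables, conditions, select):
    """Join `tables` left to right with nested loops.

    tables:     list of tables, each a list of dict rows, in join order
    conditions: conditions[i](combined_row, row) decides whether a row of
                tables[i + 1] matches the rows joined so far
    select:     column names to return from the combined row
    """
    results = []

    def recurse(depth, combined):
        if depth == len(tables):  # found a matching row in every table: emit it
            results.append(tuple(combined[col] for col in select))
            return
        for row in tables[depth]:  # loop over the next table
            if depth == 0 or conditions[depth - 1](combined, row):
                # Later tables overwrite same-named columns (fine for this sketch).
                recurse(depth + 1, {**combined, **row})

    recurse(0, {})
    return results


actor = [{"actor_id": 1, "name": "Ann"}, {"actor_id": 2, "name": "Bob"}]
film_actor = [
    {"actor_id": 1, "film_id": 10},
    {"actor_id": 1, "film_id": 11},
    {"actor_id": 2, "film_id": 11},
]
film = [{"film_id": 10, "title": "Alpha"}, {"film_id": 11, "title": "Beta"}]

rows = nested_loop_join(
    [actor, film_actor, film],
    [
        lambda acc, r: r["actor_id"] == acc["actor_id"],
        lambda acc, r: r["film_id"] == acc["film_id"],
    ],
    select=["name", "title"],
)
```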

The basics of query execution (roughly):

  1. The client sends an SQL statement to the server.

  2. The server side parses and preprocesses the SQL statement, and then the optimizer generates the corresponding execution plan.

  3. Following the execution plan generated by the optimizer, MySQL calls the storage engine API to execute the query.

  4. Return the results of the query to the client.

The communication protocol between the MySQL client and server is "half-duplex": at any given moment, either the server is sending data to the client or the client is sending data to the server, but the two cannot happen at the same time. This also means that once one end starts sending a message, the other end cannot respond until it has received the entire message, so MySQL cannot do flow control.

Therefore, if we only want the first few (or last few) rows of a result, the best approach is to restrict the result with LIMIT (with the appropriate ORDER BY). Otherwise, even if we need only 10 rows, the MySQL server will send the entire result set to the client.

Query optimizer

The following statement shows the optimizer's estimated cost of the last query executed in the current session; the estimate is in the Value column of the result. The value is roughly the number of random data-page reads the optimizer expects the query to need. It is only a rough reference, because the optimizer does not take caching into account: at this point MySQL does not know which data is in memory and which is on disk.


show status like 'Last_query_cost';

The following are some of the optimizations the MySQL optimizer can perform:

  1. Reordering the tables in a join

  2. Converting outer joins to inner joins

  3. Applying algebraic equivalence rules

  4. Optimizing COUNT(), MIN(), and MAX(). The minimum or maximum value of an indexed column is the leftmost or rightmost entry in the index, so MySQL can read it directly and treat it as a constant during optimization.

  5. Evaluating and reducing constant expressions. For example, when the WHERE condition compares a primary key column with a constant, the optimizer knows the value is already fixed and converts the access type to const (you can see const in the type column of EXPLAIN).

  6. Covering index scans: when an index contains all the columns a query needs, MySQL can return the data from the index without reading the corresponding rows (that is, without going back to the table).

  7. Subquery optimization

  8. Early termination: as soon as MySQL has found the result it needs, it stops executing the query. A typical case is a query with LIMIT.

  9. Equality propagation

  10. IN() list comparisons. In many other databases, IN() is just shorthand for several OR conditions, but not in MySQL: MySQL sorts the values in the IN() list and uses a binary search to check whether a value is in the list. Each check is O(log n), whereas the equivalent OR chain is O(n), so when the list contains many values, IN() is much faster.
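To illustrate the complexity argument in item 10, here is a small Python model: the IN() list is sorted once and each candidate value is checked with a binary search (via the standard bisect module), while the OR form compares against each value in turn. It only models the lookup cost; it is not MySQL's code, and the comparison counter exists purely for illustration.

```python
from bisect import bisect_left


def make_in_checker(values):
    """Model of IN(): sort the list once, then binary-search it for each row."""
    sorted_vals = sorted(values)

    def check(x):
        i = bisect_left(sorted_vals, x)  # O(log n) comparisons
        return i < len(sorted_vals) and sorted_vals[i] == x

    return check


def or_check(values, x):
    """Model of a chain of OR conditions: compare x with each value in turn (O(n)).

    Returns (matched, number_of_comparisons).
    """
    comparisons = 0
    for v in values:
        comparisons += 1
        if v == x:
            return True, comparisons
    return False, comparisons


in_list = list(range(0, 200_000, 2))  # 100,000 even numbers
in_check = make_in_checker(in_list)
print(in_check(199_998), in_check(7))  # True False
print(or_check(in_list, 7))            # (False, 100000)
```

With 100,000 values, the binary search needs at most about 17 comparisons (log2 of 100,000 is about 16.6), while a value missing from the OR chain costs all 100,000.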

Sorting optimization (emphasis)

Sorting is an expensive operation in itself, so from a performance standpoint you should avoid sorting whenever possible, or at least avoid sorting large amounts of data.

When MySQL cannot use an index to produce sorted results, it must sort the rows itself. If the data fits in the sort buffer, MySQL sorts it in memory with quicksort. If it does not fit, MySQL splits the data into chunks, sorts each chunk with quicksort, writes the sorted chunks to disk, then merges them and returns the sorted result. MySQL calls this whole process a filesort, even when the sort happens entirely in memory.
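The "sort chunks, spill them to disk, then merge" procedure is an external merge sort. The following Python sketch shows the idea under simplifying assumptions: values are integers, the "sort buffer" is just a maximum number of rows held in memory (buffer_rows, a made-up parameter), Python's built-in sort replaces quicksort, and temporary files hold one number per line. MySQL's real filesort is considerably more involved.

```python
import heapq
import itertools
import random
import tempfile


def filesort(rows, buffer_rows):
    """Sort an iterable of ints, holding at most `buffer_rows` of them in memory per chunk."""
    it = iter(rows)
    chunk_files = []
    try:
        while True:
            chunk = list(itertools.islice(it, buffer_rows))  # fill the "sort buffer"
            if not chunk:
                break
            chunk.sort()  # in-memory sort of one chunk
            if not chunk_files and len(chunk) < buffer_rows:
                return chunk  # the whole input fit in the buffer: no temporary files
            f = tempfile.TemporaryFile(mode="w+")
            f.writelines(f"{v}\n" for v in chunk)  # spill the sorted chunk to disk
            f.seek(0)
            chunk_files.append(f)
        # Merge the sorted chunks; each temporary file is read lazily, line by line.
        streams = [(int(line) for line in f) for f in chunk_files]
        return list(heapq.merge(*streams))
    finally:
        for f in chunk_files:
            f.close()


random.seed(42)
data = [random.randint(-1000, 1000) for _ in range(1_050)]
result = filesort(data, buffer_rows=100)  # 11 sorted chunks are written and merged
```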

If a join query needs sorting, MySQL handles it in one of two ways:

  • If all the columns in the ORDER BY clause come from the first table in the join, MySQL filesorts that table while processing it, and EXPLAIN shows "Using filesort" in the Extra column.

  • In all other cases, MySQL first stores the join result in a temporary table and sorts it after the whole join has finished; EXPLAIN then shows "Using temporary; Using filesort" in the Extra column. If the query has a LIMIT, the LIMIT is applied after the sort.

Limitations of the MySQL Query Optimizer

UNION limitation: MySQL cannot push conditions down from outside a UNION into the individual queries inside it.

If you want each part of a UNION to return only part of its result (via LIMIT), or you want to sort the results before merging them, you must put those clauses inside each part of the UNION separately. Example: you want to combine the results of two subqueries and then take the first 20 rows (MySQL stores the rows from both tables in the same temporary table and then takes the first 20 rows from it). The two ways of writing this are shown below:

(select first_name,last_name from actor order by last_name)
union all
(select first_name,last_name from customer order by last_name)
limit 20;
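This version first puts all the records from the actor table and the customer table into a temporary table, and then takes the first 20 rows from that temporary table. If each of the two tables has 1,000 rows, the temporary table will hold about 2,000 rows.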
(select first_name,last_name from actor order by last_name limit 20)
union all
(select first_name,last_name from customer order by last_name limit 20)
limit 20;
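With this version, the temporary table holds only 40 rows, which greatly reduces unnecessary scanning. Note, however, that the order in which rows come out of the temporary table is not guaranteed; to get the correct order, you also need to add a global ORDER BY before the final LIMIT.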
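The rewrite is safe because the first 20 rows of the combined result must come from the first 20 rows of each branch. The sketch below checks this with Python's sqlite3 module, using hypothetical actor and customer tables of 1,000 random rows each. SQLite does not allow ORDER BY or LIMIT directly on a branch of a compound SELECT, so each branch is wrapped in a subquery; the final ORDER BY is the global sort mentioned above.

```python
import random
import sqlite3

random.seed(0)
conn = sqlite3.connect(":memory:")
for table in ("actor", "customer"):
    conn.execute(f"CREATE TABLE {table} (first_name TEXT, last_name TEXT)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?)",
        [(f"{table}{i}", f"name{random.randrange(100_000):05d}") for i in range(1_000)],
    )

# Version 1: LIMIT only on the outside (the form that fills a 2,000-row temporary table in MySQL).
naive = conn.execute("""
    SELECT first_name, last_name FROM actor
    UNION ALL
    SELECT first_name, last_name FROM customer
    ORDER BY last_name LIMIT 20
""").fetchall()

# Version 2: each branch contributes at most 20 rows, then a global sort + LIMIT.
branch = "SELECT * FROM (SELECT first_name, last_name FROM {} ORDER BY last_name LIMIT 20)"
pushed_sql = f"{branch.format('actor')} UNION ALL {branch.format('customer')}"
union_rows = conn.execute(pushed_sql).fetchall()  # only 40 rows before the final sort
pushed = conn.execute(pushed_sql + " ORDER BY last_name LIMIT 20").fetchall()
```

Comparing only the sort keys keeps the check valid even if two rows share a last_name and a different one of them is picked.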

Optimize specific types of queries

Optimize count() query (key)

First, we need to understand what COUNT() does:

  • It counts the number of values in a column or expression, or it counts rows. The argument can be a column name or a column expression.

NULLs are not counted when counting column values. For example, select count(od.setmeal_id) from order_detail od returns 2. In other words, if you put a column or an expression inside COUNT(), it counts the number of rows for which that expression has a value (is not NULL).

  • COUNT() can also count the number of rows in the result set. When MySQL knows the expression inside the parentheses can never be NULL, it is really counting rows. The most common example is COUNT(*): here the wildcard * does not expand to all the columns; MySQL ignores the column values and simply counts the rows in the result set.

-- Returns 11: all the rows in the table
select count(*) from order_detail od;

-- Returns 3: the number of rows in the result set
select count(*) from order_detail od where od.order_id = '1522581871770824706';
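The difference between COUNT(column) and COUNT(*) is easy to reproduce. This Python/sqlite3 sketch uses a hypothetical, much smaller order_detail table with a nullable setmeal_id column; SQLite follows the same NULL-skipping rule for COUNT() as MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_detail (id INTEGER PRIMARY KEY, order_id TEXT, setmeal_id INTEGER)"
)
conn.executemany(
    "INSERT INTO order_detail (order_id, setmeal_id) VALUES (?, ?)",
    [("A", 7), ("A", None), ("A", None), ("B", 8), ("B", None)],
)

# Counts every row in the table.
count_rows = conn.execute("SELECT COUNT(*) FROM order_detail").fetchone()[0]
# Counts only the rows where setmeal_id is not NULL.
count_col = conn.execute("SELECT COUNT(setmeal_id) FROM order_detail").fetchone()[0]
# Counts the rows of a filtered result set.
count_a = conn.execute(
    "SELECT COUNT(*) FROM order_detail WHERE order_id = ?", ("A",)
).fetchone()[0]
```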

Simple COUNT() optimizations:

Case: counting, in a single query, how many values of the same column fall into different categories (a fairly common need), for example how many items cost more than 500 and how many cost less than 100.

The following two SQL statements return the same results:

select sum(if(od.amount > 500,1,0)) as expensive_goods ,sum(if(od.amount < 100,1,0)) as fair_goods
from order_detail od 

select count(od.amount > 500 or null) as expensive_goods, count(od.amount < 100 or null) as fair_goods
from order_detail od 

Note that with SUM(), the expression adds 1 when the condition is true and 0 when it is false. With COUNT(), a true condition needs no special handling, but a false condition must be turned into NULL so that it is not counted; that is why "or null" must be appended. This works because "false or null" evaluates to NULL, while "true or null" evaluates to true.
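Both forms can be checked side by side. The sketch below runs them in SQLite via Python's sqlite3 module on a few hypothetical amounts. MySQL's IF() is written as the standard CASE expression, because SQLite spells it differently. The "or null" trick works unchanged, since SQLite uses the same three-valued logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_detail (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO order_detail (amount) VALUES (?)",
    [(650,), (20,), (99.5,), (300,), (1200,), (100,)],
)

# SUM form: add 1 when the condition holds, 0 otherwise.
with_sum = conn.execute("""
    SELECT SUM(CASE WHEN amount > 500 THEN 1 ELSE 0 END) AS expensive_goods,
           SUM(CASE WHEN amount < 100 THEN 1 ELSE 0 END) AS fair_goods
    FROM order_detail
""").fetchone()

# COUNT form: a false condition becomes NULL via "OR NULL" and is skipped.
with_count = conn.execute("""
    SELECT COUNT(amount > 500 OR NULL) AS expensive_goods,
           COUNT(amount < 100 OR NULL) AS fair_goods
    FROM order_detail
""").fetchone()
```

Here 650 and 1200 count as expensive, and 20 and 99.5 as fair (100 is not below 100), so both queries return (2, 2).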

Optimizing Join Queries

Note that the MySQL query optimizer can reorder the tables in a join. A multi-table join can usually be executed in several different join orders that produce the same result; MySQL's join optimizer estimates the cost of each order and chooses the cheapest one. In one sentence: let the small table drive the large table (the small table is the driving table, and each of its rows is read only once, while rows of the driven table are looked up many times). This lets the query do less backtracking and re-reading. If you don't want the order chosen by the optimizer, you can rewrite the query with the STRAIGHT_JOIN keyword.

  • Make sure the columns in the ON or USING clause are indexed, and take the join order into account when creating the index. When tables A and B are joined on column c and the optimizer's join order is B, A, there is no need to index the column in table B; unused indexes only add overhead. In general, you only need an index on the corresponding column of the second table in the join order (the order chosen by the optimizer).

  • Make sure any GROUP BY or ORDER BY expression refers only to columns from a single table, so that MySQL can use an index to optimize that step.
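Why the index belongs on the second table in the join order can be shown by counting how many inner-table rows a nested-loop join touches. In this Python sketch a dict stands in for a B-tree index on the driven table's join column (an assumption for illustration only; a real index already exists instead of being built per query). Without it, every outer row scans the whole inner table.

```python
from collections import defaultdict


def join_count(outer, inner, key, use_index):
    """Nested-loop join of outer (driving) and inner (driven) tables on `key`.

    Returns (matched_pairs, inner_rows_examined).
    """
    examined = 0
    matches = []
    if use_index:
        index = defaultdict(list)  # stand-in for an index on inner[key]
        for row in inner:
            index[row[key]].append(row)
        for o in outer:
            hits = index.get(o[key], [])
            examined += len(hits)  # only the matching index entries are visited
            matches.extend((o, i) for i in hits)
    else:
        for o in outer:
            for i in inner:  # full scan of the driven table for every outer row
                examined += 1
                if i[key] == o[key]:
                    matches.append((o, i))
    return matches, examined


outer = [{"c": k} for k in range(100)]           # driving table: 100 rows
inner = [{"c": k % 1000} for k in range(5_000)]  # driven table: 5,000 rows
with_idx = join_count(outer, inner, "c", use_index=True)
without_idx = join_count(outer, inner, "c", use_index=False)
```

Both runs find the same 500 pairs, but the indexed run examines 500 inner rows instead of 100 × 5,000 = 500,000.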

Optimizing limit pagination for large amounts of data (emphasis)

When the offset is large, as in limit 10000, 20, MySQL has to fetch 10,020 rows and return only the last 20, discarding the first 10,000, which is very expensive. If the table is large, we can use a covering index scan wherever possible instead of fetching full rows.

-- Ordinary query
select film.film_id,film.description from sakila.film order by title limit 50,5;

-- Optimized query: uses a deferred join
select film.film_id,film.description from sakila.film
inner join (
    select film.film_id from sakila.film order by title limit 50,5
) as lim using(film_id);

The optimized query is efficient because it lets the server examine as little data as possible in the index without accessing rows (the inner subquery only looks up film_id, so it can be answered directly from the index). Then, once the required rows have been found, it joins them back to the full table to retrieve the other columns needed from those rows.
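A small sqlite3 experiment can confirm that the deferred join returns the same rows as the plain query. The film table, its idx_title index and the generated data are hypothetical stand-ins for sakila.film, and SQLite's LIMIT 5 OFFSET 50 is the same page as MySQL's limit 50,5. Whether the inner query is actually answered from the index depends on the engine and should be confirmed with EXPLAIN (MySQL) or EXPLAIN QUERY PLAN (SQLite).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, description TEXT)")
conn.execute("CREATE INDEX idx_title ON film (title)")
conn.executemany(
    "INSERT INTO film (film_id, title, description) VALUES (?, ?, ?)",
    # 7919 is coprime with 1000, so the 1,000 titles are all distinct.
    [(i, f"title{(i * 7919) % 1000:04d}", "long text " * 50) for i in range(1, 1001)],
)

# Plain pagination: wide rows for the page at offset 50.
plain = conn.execute(
    "SELECT film_id, description FROM film ORDER BY title LIMIT 5 OFFSET 50"
).fetchall()

# Deferred join: find the 5 ids through the narrow (title, film_id) index first,
# then fetch the wide columns for just those ids.
deferred = conn.execute("""
    SELECT film.film_id, film.description FROM film
    INNER JOIN (
        SELECT film_id FROM film ORDER BY title LIMIT 5 OFFSET 50
    ) AS lim USING (film_id)
    ORDER BY film.title
""").fetchall()
```

The outer ORDER BY in the sketch is added only to make the comparison deterministic; the original query relies on the join returning rows in a usable order.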

Optimize union query (key)

MySQL always executes a UNION by creating and filling a temporary table. Unless you need MySQL to eliminate duplicate rows for you, you should use UNION ALL. Without the ALL keyword, MySQL adds the DISTINCT option to the temporary table, which means every row in the temporary table is checked for duplicates. Note, though, that even with ALL, MySQL still uses a temporary table to store the results.

UNION: combines the results of queries over corresponding columns, possibly from different tables, into one result set (duplicate rows removed).

UNION ALL: combines the results in the same way, but keeps duplicate rows.
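The difference shows up as soon as both inputs contain the same row. Here is a minimal sqlite3 sketch with two hypothetical single-column tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (name TEXT);
    CREATE TABLE t2 (name TEXT);
    INSERT INTO t1 VALUES ('alice'), ('bob');
    INSERT INTO t2 VALUES ('bob'), ('carol');
""")

# UNION removes the duplicate 'bob'; UNION ALL keeps both copies.
union_rows = conn.execute(
    "SELECT name FROM t1 UNION SELECT name FROM t2 ORDER BY name"
).fetchall()
union_all_rows = conn.execute(
    "SELECT name FROM t1 UNION ALL SELECT name FROM t2 ORDER BY name"
).fetchall()
```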

Origin blog.csdn.net/weixin_53142722/article/details/129209922