[MySQL study notes (11)] Cost-based optimization: calculating the cost of single-table queries and join queries

This article was first published on the official account [Developing Pigeon].



I. Cost-based optimization

(1) Overview

       The execution cost of a query statement has two parts. The first is the I/O cost. The commonly used InnoDB and MyISAM storage engines store data and indexes on disk, so before the records in a table can be queried, the data or index must be loaded into memory. The time spent moving it from disk to memory is the I/O cost. The second is the CPU cost: the time spent reading records and checking whether they satisfy the search conditions.

       By default, the cost for InnoDB to read one page is 1.0, and the cost of reading one record and checking whether it meets the search conditions is 0.2. These numbers are called cost constants. The values above are the MySQL 5.7 defaults. They are stored in the mysql.engine_cost and mysql.server_cost tables and can be changed.

(2) Optimization steps

       MySQL's optimizer finds every candidate plan for executing the statement, compares them, and selects the cheapest one as the execution plan. It then calls the interfaces provided by the storage engine to execute the query.

1. Find all possible indexes based on the search conditions

       For each search condition, the optimizer checks whether it can produce a usable scan interval on some index. The indexes for which this is possible are the possible indexes.
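Here is a minimal Python sketch of step 1. It makes several simplifying assumptions for illustration. Conditions are AND-ed (column, operator) pairs. An index qualifies when its leftmost column has a condition whose operator can form a scan interval. The operator set and the function name `possible_indexes` are hypothetical. OR conditions, LIKE prefixes, functions applied to columns and multi-column ranges are all ignored.

```python
# Operators assumed, for this sketch, to yield a scan interval on an indexed column.
INTERVAL_OPERATORS = {"=", "<", "<=", ">", ">=", "BETWEEN", "IN", "IS NULL"}

def possible_indexes(conditions, indexes):
    """Return the names of indexes that could be used for the query.

    conditions: list of (column, operator) pairs, all combined with AND.
    indexes: dict mapping index name -> ordered list of indexed columns.
    Only the leftmost column of each index is considered (leftmost-prefix rule).
    """
    candidates = []
    for name, columns in indexes.items():
        leading_column = columns[0]
        if any(column == leading_column and op in INTERVAL_OPERATORS
               for column, op in conditions):
            candidates.append(name)
    return candidates

indexes = {
    "PRIMARY": ["id"],
    "idx_key1": ["key1"],
    "idx_key2": ["key2"],
    "idx_key_part": ["key_part1", "key_part2"],
}
conditions = [("key1", "IN"), ("key2", ">"), ("key_part2", "="), ("common_field", "=")]
print(possible_indexes(conditions, indexes))  # ['idx_key1', 'idx_key2']
```

The condition on key_part2 does not qualify idx_key_part, because key_part2 is not that index's leftmost column. The condition on common_field matches no index at all.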

2. Calculate the cost of a full table scan

       For InnoDB, a full table scan compares the records in the clustered index one by one against the given search conditions and adds the matching records to the result set. This requires loading the clustered index pages into memory and then checking each record. Since query cost = I/O cost + CPU cost, calculating the cost of a full table scan requires two pieces of information: the number of pages occupied by the clustered index and the number of records in the table.

       Both values can be obtained from the table's statistics, which report the number of records in the table and the number of bytes of storage the table occupies:

SHOW TABLE STATUS LIKE 'table_name';

       Rows is the number of records in the table (for InnoDB this is only an estimate). Data_length is the number of bytes the table occupies. For MyISAM it is the size of the data file. For InnoDB it is the size of the clustered index, and dividing it by 16 KB (the default page size) gives the number of clustered index pages. The I/O cost is then:

I/O cost = number of clustered index pages x 1.0 + 1.1

       1.0 is the cost constant for loading one page from disk into memory, and 1.1 is a fine-tuning value.

CPU cost = number of records x 0.2 + 1.0

       0.2 is the cost constant for reading and checking one record, and 1.0 is a fine-tuning value.
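The full table scan formulas can be sketched in Python as follows. The function name and the sample figures in the usage line are hypothetical. The page count assumes the default 16 KB InnoDB page size, as the text above does.

```python
PAGE_SIZE_BYTES = 16 * 1024  # default InnoDB page size (innodb_page_size)

IO_BLOCK_READ_COST = 1.0   # cost constant for reading one page
ROW_EVALUATE_COST = 0.2    # cost constant for reading and checking one record

def full_table_scan_cost(rows, data_length):
    """Return (io_cost, cpu_cost) for a full table scan.

    rows: the Rows value from SHOW TABLE STATUS.
    data_length: the Data_length value (bytes of the clustered index for InnoDB).
    """
    pages = data_length / PAGE_SIZE_BYTES
    io_cost = pages * IO_BLOCK_READ_COST + 1.1   # 1.1 is the fine-tuning value
    cpu_cost = rows * ROW_EVALUATE_COST + 1.0    # 1.0 is the fine-tuning value
    return io_cost, cpu_cost

io, cpu = full_table_scan_cost(rows=9693, data_length=1589248)
print(round(io, 2), round(cpu, 2), round(io + cpu, 2))
```

With these hypothetical inputs the table has 97 pages. The I/O cost is therefore 98.1 and the CPU cost is 1939.6. The CPU term dominates because every record is checked.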

3. Calculate the cost of executing queries using different indexes

       Next, the optimizer analyzes the cost of executing the query with each possible index, and finally checks whether index merge can be used. A query through a secondary index usually consists of a secondary index scan followed by table lookups (returning to the table). The cost of such a query depends on two figures: the number of scan intervals and the number of records that must be looked up in the table. However many pages a scan interval spans in the secondary index, the optimizer crudely charges it the I/O cost of reading a single page.

       Each table lookup uses the primary key value of a record to fetch the full record from the clustered index. When estimating this cost, MySQL assumes that each lookup accesses one page, so there are as many page I/Os as there are records to look up. Once the complete user record has been fetched, the remaining search conditions are checked, and this check is charged at the cost of reading and checking one record.

       Finally, the optimizer decides whether to merge indexes, based on whether the secondary index records obtained from the different indexes are sorted by primary key.
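The two paragraphs above can be combined into a rough cost formula for a secondary index scan plus table lookups. This Python sketch follows the text's rules: one page per scan interval, one page per table lookup, and 0.2 per record read or checked. It deliberately omits the small fine-tuning constants MySQL adds, and the function name is hypothetical.

```python
IO_PAGE_COST = 1.0   # cost of reading one page
ROW_COST = 0.2       # cost of reading and checking one record

def index_with_lookup_cost(scan_intervals, records):
    """Estimate the cost of a secondary index scan followed by table lookups.

    scan_intervals: number of scan intervals on the secondary index.
    records: estimated number of secondary index records in those intervals
             (each one requires a table lookup).
    """
    # Each scan interval is charged as one page read, however many pages it spans.
    interval_io = scan_intervals * IO_PAGE_COST
    # Reading each secondary index record costs ROW_COST.
    index_cpu = records * ROW_COST
    # Each table lookup is charged as one page read on the clustered index.
    lookup_io = records * IO_PAGE_COST
    # Each full record fetched is checked against the remaining conditions.
    check_cpu = records * ROW_COST
    return interval_io + index_cpu + lookup_io + check_cpu

print(index_with_lookup_cost(scan_intervals=1, records=95))
```

For one interval containing 95 records, this gives 1 + 19 + 95 + 19 = 134. The per-record lookup I/O dominates, which is one reason an index that matches many records can lose to a full table scan.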
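Why does primary key order matter? Take intersection merge as the example. If each index already returns its primary keys in sorted order, the intersection can be computed in a single linear merge pass without sorting anything. Here is a minimal Python sketch of that idea. It is not MySQL's implementation, and the function names are hypothetical.

```python
def is_sorted(keys):
    """Return True if keys are in non-decreasing order."""
    return all(a <= b for a, b in zip(keys, keys[1:]))

def intersect_sorted_pks(pks_a, pks_b):
    """Intersect two primary key lists that are each sorted, in one pass."""
    if not (is_sorted(pks_a) and is_sorted(pks_b)):
        raise ValueError("both inputs must be sorted by primary key")
    result = []
    i = j = 0
    while i < len(pks_a) and j < len(pks_b):
        if pks_a[i] == pks_b[j]:
            result.append(pks_a[i])  # key present in both indexes
            i += 1
            j += 1
        elif pks_a[i] < pks_b[j]:
            i += 1  # advance the list with the smaller key
        else:
            j += 1
    return result

print(intersect_sorted_pks([1, 3, 5, 7, 9], [3, 4, 5, 9, 10]))  # [3, 5, 9]
```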

4. Compare the costs of all candidate plans and choose the cheapest one

       The optimizer compares the cost of the full table scan with the cost of each index-based plan and uses the one with the lowest cost.
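Step 4 amounts to taking the minimum over the candidate costs. In this Python sketch the plan names and cost figures are made-up placeholders, not values MySQL reports.

```python
def cheapest_plan(plan_costs):
    """Return (plan_name, cost) with the lowest cost from a dict of candidates."""
    return min(plan_costs.items(), key=lambda item: item[1])

candidates = {
    "full table scan": 2037.7,
    "idx_key1 + table lookups": 134.0,
    "idx_key2 + table lookups": 250.5,
}
print(cheapest_plan(candidates))  # ('idx_key1 + table lookups', 134.0)
```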


(3) Cost of join queries

1. Condition filtering

       The cost of a join query has two parts: the cost of one query of the driving table, and the cost of the repeated queries of the driven table. The number of records obtained from querying the driving table is called the fan-out of the driving table. In two cases the fan-out has to be guessed. The first is when the driving table is read with a full table scan: the optimizer must guess how many records satisfy the search conditions. The second is when the driving table is read through an index: the optimizer must guess how many records satisfy the search conditions other than those used by the index. This guessing process is called condition filtering.
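Condition filtering boils down to one multiplication. Take the number of rows the chosen access method reads from the driving table, and multiply it by a guessed fraction of rows that satisfy the remaining conditions. This mirrors the rows and filtered columns of EXPLAIN output. The Python function below and its inputs are hypothetical, and the sketch does not model how MySQL arrives at the guessed percentage.

```python
def estimate_fanout(rows_read, filtered_percent):
    """Estimate the fan-out of the driving table.

    rows_read: rows the chosen access method is expected to read.
    filtered_percent: guessed percentage (0-100) of those rows that also
                      satisfy the search conditions not used by the access method.
    """
    if not 0 <= filtered_percent <= 100:
        raise ValueError("filtered_percent must be between 0 and 100")
    return rows_read * filtered_percent / 100

print(estimate_fanout(rows_read=9693, filtered_percent=10))
```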

2. Two-table join query cost analysis

Join query cost = cost of a single access to the driving table + fan-out of the driving table x cost of a single access to the driven table
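The formula translates directly into code. Here is a minimal Python sketch with made-up inputs.

```python
def two_table_join_cost(driving_access_cost, driving_fanout, driven_access_cost):
    """Join cost = one access to the driving table
    + fan-out of the driving table x cost of one access to the driven table."""
    return driving_access_cost + driving_fanout * driven_access_cost

# Hypothetical figures: the driving table costs 100 to read and yields 50 rows,
# and each lookup into the driven table costs 2.5.
print(two_table_join_cost(100.0, 50, 2.5))  # 225.0
```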

       For left and right outer joins, the driving table is fixed. The optimal plan is therefore obtained simply by choosing the cheapest access method for the driving table and for the driven table. For inner joins, the driving and driven tables can be swapped, so the optimal join order must be considered before access methods are chosen. In this case the cost has to be calculated and analyzed for each possible join order.

       The optimization goal of a join query is therefore twofold: keep the fan-out of the driving table as small as possible, and make each access to the driven table as cheap as possible. In practice, reducing the cost of accessing the driven table is the most effective lever. For example, you can create an index on the join column of the driven table so that the ref access method can be used. In the best case, the join column is the primary key or a unique secondary index of the driven table. Each access then fetches at most one row, which EXPLAIN reports as the eq_ref access method.
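For an inner join of two tables, the comparison can be sketched by evaluating the formula once per order and keeping the cheaper result. This Python sketch treats three values as fixed, hypothetical inputs: each table's cost as the driving table, its fan-out, and its per-access cost as the driven table. In MySQL these depend on the chosen access methods and on condition filtering.

```python
from dataclasses import dataclass

@dataclass
class TableEstimate:
    name: str
    driving_cost: float  # cost of one access when this table drives the join
    fanout: float        # estimated fan-out when this table drives the join
    driven_cost: float   # cost of one access when this table is the driven table

def best_inner_join_order(a, b):
    """Return ((driving, driven), cost) for the cheaper of the two join orders."""
    cost_ab = a.driving_cost + a.fanout * b.driven_cost
    cost_ba = b.driving_cost + b.fanout * a.driven_cost
    if cost_ab <= cost_ba:
        return (a.name, b.name), cost_ab
    return (b.name, a.name), cost_ba

# s1 has an index on the join column (cheap to probe); s2 does not.
s1 = TableEstimate("s1", driving_cost=200.0, fanout=100, driven_cost=2.0)
s2 = TableEstimate("s2", driving_cost=500.0, fanout=400, driven_cost=300.0)
print(best_inner_join_order(s1, s2))  # (('s2', 's1'), 1300.0)
```

In this example s2 yields more rows, yet driving from s2 still wins because every probe into the indexed s1 is cheap. This is the same reasoning behind indexing the driven table's join column.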

3. Multi-table join query cost analysis

       A join of n tables has n! possible join orders, and in principle the cost of each one must be calculated. MySQL uses several techniques to reduce the overhead of computing query costs under different join orders.

(1) Stop the cost calculation of a join order early

       Before calculating the costs of the various join orders, MySQL maintains a global variable holding the lowest join query cost found so far. As soon as the cost of a join order exceeds this value, MySQL stops evaluating that order and moves on to the next one.
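Here is a quick Python illustration of how fast the number of join orders grows. The table names are placeholders.

```python
from itertools import permutations
from math import factorial

tables = ["t1", "t2", "t3"]
orders = list(permutations(tables))
print(len(orders))  # 6, i.e. 3!
for order in orders:
    print(" -> ".join(order))

# The count grows factorially with the number of tables.
for n in (5, 10, 15):
    print(n, factorial(n))
```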
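The early exit described above is essentially branch-and-bound pruning. This Python sketch assumes a simplified, hypothetical cost model: each table has a fixed cost per access and a fixed fan-out, regardless of its position in the join order. MySQL's real model is more detailed. Here best_cost plays the role of the global minimum, and a partial order is abandoned as soon as its accumulated cost reaches it.

```python
import math

def best_order_with_pruning(tables):
    """Find the cheapest join order, pruning partial orders early.

    tables: dict mapping table name -> (cost_per_access, fanout).
    Hypothetical model: the first table is accessed once; every later table
    is accessed once per row produced by the tables before it.
    """
    best_cost = math.inf
    best_order = None

    def extend(order, cost, prefix_rows):
        nonlocal best_cost, best_order
        if cost >= best_cost:
            return  # already no cheaper than the best complete order: stop here
        if len(order) == len(tables):
            best_cost, best_order = cost, order
            return
        for name, (access_cost, fanout) in tables.items():
            if name not in order:
                extend(order + [name],
                       cost + prefix_rows * access_cost,
                       prefix_rows * fanout)

    extend([], 0.0, 1.0)
    return best_order, best_cost

TABLES = {"t1": (10, 5), "t2": (100, 2), "t3": (1, 50)}
print(best_order_with_pruning(TABLES))  # (['t2', 't1', 't3'], 130.0)
```

With these inputs, every order that starts with t3 is abandoned after its second table, because the partial cost already exceeds 130.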


(2) System variable optimizer_search_depth

       To keep the optimizer from spending too long analyzing join orders, MySQL provides this system variable as an upper limit. If the number of tables in the join is smaller than its value, every join order is analyzed exhaustively. Otherwise, only as many tables as the value specifies are analyzed exhaustively.
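Here is a rough Python sketch of a depth-limited search. It uses the same kind of hypothetical cost model, with a fixed per-access cost and fan-out per table. At each step it exhaustively evaluates look-ahead orders of at most `search_depth` remaining tables, commits only the first table of the best look-ahead, and repeats. The sketch is loosely modeled on the idea behind optimizer_search_depth and is not MySQL's actual code. With a depth at least as large as the number of tables, it finds the same order as an exhaustive search.

```python
from itertools import permutations

def depth_limited_order(tables, search_depth):
    """Greedy join ordering with a bounded look-ahead.

    tables: dict mapping table name -> (cost_per_access, fanout).
    search_depth: how many tables to look ahead exhaustively at each step.
    """
    committed = []
    cost, rows = 0.0, 1.0          # cost so far, rows produced by the committed prefix
    remaining = list(tables)
    while remaining:
        depth = min(search_depth, len(remaining))
        best_cost, best_first = None, None
        # Exhaustively evaluate every look-ahead sequence of `depth` tables.
        for prefix in permutations(remaining, depth):
            c, r = cost, rows
            for name in prefix:
                access_cost, fanout = tables[name]
                c += r * access_cost
                r *= fanout
            if best_cost is None or c < best_cost:
                best_cost, best_first = c, prefix[0]
        # Commit only the first table of the best look-ahead sequence.
        access_cost, fanout = tables[best_first]
        cost += rows * access_cost
        rows *= fanout
        committed.append(best_first)
        remaining.remove(best_first)
    return committed, cost

TABLES = {"t1": (10, 5), "t2": (100, 2), "t3": (1, 50)}
print(depth_limited_order(TABLES, search_depth=3))  # (['t2', 't1', 't3'], 130.0)
print(depth_limited_order(TABLES, search_depth=1))  # (['t3', 't1', 't2'], 25501.0)
```

The depth-1 run commits to t3 because t3 is the cheapest table on its own, and it ends up far more expensive. That is the trade-off a smaller search depth makes in exchange for less planning time.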


Origin blog.csdn.net/Mrwxxxx/article/details/113836380