MySQL Advanced Road (21) - Understand the optimizer and cost model in MySQL in 5 minutes

I. Overview

In the previous article we looked at SQL execution plans. The possible_keys field shows that a statement may have several indexes available when it runs a query; in other words, there is more than one way to complete the query. One execution plan has to be chosen, and making that choice is the job of the query optimizer. This article introduces how the query optimizer works. After reading it you will have a deeper understanding of the information output by EXPLAIN, which will help with your future SQL optimization.

II. Optimizer

The task of the optimizer is to find the best plan for executing a SQL statement, because the performance difference between a "good" plan and a "bad" plan can be orders of magnitude. Most query optimizers search for the optimal plan among many candidate plans. For join queries over multiple tables, the number of candidate plans grows exponentially with the number of tables. If too many tables are joined, the time spent on optimization itself can easily become the major performance bottleneck.
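To get a feel for how quickly the search space grows, here is a minimal sketch. It assumes the optimizer only has to pick an order for the tables (a left-deep join order) and ignores the choice of access method for each table, so it is a simplified illustration rather than the count MySQL actually works with. The function name join_order_count is made up for this example.

```python
from math import factorial


def join_order_count(table_count: int) -> int:
    """Number of possible join orders for table_count tables.

    Simplifying assumption: a plan is just an ordering of the tables,
    so there are table_count! candidates.
    """
    return factorial(table_count)


# Print how the number of candidate orders grows with the number of tables.
for n in (2, 5, 10, 15):
    print(n, join_order_count(n))
```

Even under this simplified assumption, 10 tables already give more than three million orderings, which is why limiting the search matters.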

One solution to this problem is to let users control how exhaustively the optimizer searches for a query plan. In layman's terms, the user controls how many plans are considered. The fewer plans the optimizer examines, the less time it spends compiling the query. The trade-off is that, because the optimizer skips some plans, it may miss the best one.

We can control the behavior of the optimizer through two system variables, optimizer_prune_level and optimizer_search_depth. The first variable controls whether heuristic pruning is enabled. Its default value is 1, which means that plans with poor prospects are discarded during the search. If it is set to 0, the heuristic is disabled and an exhaustive search is performed. The second variable controls the maximum search depth used by the optimizer. Its default value is 62 (which is also the maximum). If it is set to 0, the system chooses a reasonable value automatically.

# View the system variables
SHOW VARIABLES LIKE 'optimizer_prune_level';
SHOW VARIABLES LIKE 'optimizer_search_depth';

III. Cost Model

To judge the quality of each plan, the plans have to be made comparable. Every execution plan, whatever its shape, is composed of many atomic operations. If we assign a cost to each atomic operation, we can calculate the cost of any execution plan by adding up the costs of its operations.

The cost of each operation is stored in the server_cost and engine_cost tables of the mysql system database, and you can configure it yourself. When the server starts, the cost model is read into memory, and the in-memory values are used at runtime. You can also make the server re-read the cost tables by executing FLUSH OPTIMIZER_COSTS after changing them. The cost tables are specific to one server instance; changes to them are not replicated to replicas.

Among these cost items, a few are used most often (the default values quoted below are those of MySQL 5.7):

  • row_evaluate_cost (default 0.2)

    The cost of processing one row of data. The total grows with the number of rows and is often referred to as the CPU cost: CPU cost = rows * 0.2

  • io_block_read_cost (default 1.0) and memory_block_read_cost (default 1.0)

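    The cost of reading one data block from disk and from the in-memory buffer, respectively. In other words, this is the IO cost of one data page (16 KB): IO cost = (total data size in bytes / 16384) * 1.0, that is, the number of data pages multiplied by 1.0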

    With the following SQL you can view the information of a given table; the Data_length field gives the total data size of the table in bytes

    SHOW TABLE STATUS LIKE 'user';
    

The cost can be customized by the following statement:

UPDATE mysql.engine_cost
  SET cost_value = 2.0
  WHERE cost_name = 'io_block_read_cost';
FLUSH OPTIMIZER_COSTS;

IV. Example of Cost Calculation

In the cost calculation examples below, the small fine-tuning constants that MySQL adds to the cost are ignored (they are tiny, so they are not shown).

1. Full table scan

Take the following example:

[Figure: SHOW TABLE STATUS output for the example table, with Rows = 14 and Data_length = 16384]

From the Rows field we can see that the table has 14 rows in total, so the CPU cost is 14 * 0.2 = 2.8

From the Data_length field, 16384 ÷ 1024 = 16 (KB), so the total data volume is 16 KB, which is exactly one data page; the IO cost is therefore 1 * 1.0 = 1

Then the full table scan cost of this table is 2.8 + 1 = 3.8
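The arithmetic above can be written as a small function. This is only a sketch of the simplified model used in this article (16 KB pages, the default cost constants quoted earlier, fine-tuning values ignored); the function and constant names are made up for the example and are not MySQL identifiers.

```python
PAGE_SIZE = 16 * 1024        # size of one data page in bytes (16 KB)
ROW_EVALUATE_COST = 0.2      # default row_evaluate_cost used in this article
IO_BLOCK_READ_COST = 1.0     # default io_block_read_cost


def full_scan_cost(rows: int, data_length: int) -> float:
    """Estimate the full table scan cost from SHOW TABLE STATUS values."""
    cpu_cost = rows * ROW_EVALUATE_COST          # one evaluation per row
    pages = data_length / PAGE_SIZE              # number of data pages
    io_cost = pages * IO_BLOCK_READ_COST         # one read per page
    return cpu_cost + io_cost


# Values from the example table: Rows = 14, Data_length = 16384.
print(round(full_scan_cost(rows=14, data_length=16384), 1))  # 3.8
```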

2. Index lookup followed by a return to the table (back-table query)

In MySQL, the cost of scanning one range of an index is the same as the IO cost of one data page: both are 1. When going back to the table, MySQL assumes that each row lives on its own data page.

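[Figure: EXPLAIN output for the example query, which uses an IN condition with two values on a secondary index; rows = 291]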

IO cost of the secondary index: the IN list contains two values, which means two ranges, so 2 * 1.0 = 2.0

CPU cost of the secondary index: the rows field shows 291 rows, so 291 * 0.2 = 58.2

IO cost of going back to the table: one row is assumed to occupy one data page, so 291 * 1.0 = 291.0

CPU cost of going back to the table: 291 * 0.2 = 58.2

Total cost: 2.0 + 58.2 + 291 + 58.2 = 409.4
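The four components can be added up in code as follows. Again, this is a sketch of the simplified model described in this article, not MySQL's internal formula; the names are hypothetical, and RANGE_SCAN_COST simply encodes the statement that one range costs the same as one page read.

```python
ROW_EVALUATE_COST = 0.2      # default row_evaluate_cost used in this article
IO_BLOCK_READ_COST = 1.0     # default io_block_read_cost
RANGE_SCAN_COST = 1.0        # one index range costs the same as one page read


def index_with_lookup_cost(ranges: int, rows: int) -> float:
    """Estimate the cost of a secondary index scan plus the return to the table."""
    index_io = ranges * RANGE_SCAN_COST       # one IO per scanned range
    index_cpu = rows * ROW_EVALUATE_COST      # evaluate each index entry
    lookup_io = rows * IO_BLOCK_READ_COST     # assume one page read per row
    lookup_cpu = rows * ROW_EVALUATE_COST     # evaluate each full row
    return index_io + index_cpu + lookup_io + lookup_cpu


# Values from the example: 2 ranges in the IN list, rows = 291.
print(round(index_with_lookup_cost(ranges=2, rows=291), 1))  # 409.4
```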

3. Multi-table join query

The two examples above calculate the cost for a single table: either a full table scan, or an index scan followed by a return to the table (which is not always needed). If several indexes are available, the cost of each is calculated separately and the one with the lowest cost is selected. Once you have mastered the single-table calculation, you only need to know the following formula for a multi-table join:

Total cost = the cost of the driving table + the number of records queried by the driving table * the cost of the driven table
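The formula above can be sketched as a one-line function and used to compare two join orders. The table costs and row counts below are invented purely for illustration and do not come from a real EXPLAIN output.

```python
def join_cost(driving_cost: float, driving_rows: int, driven_cost: float) -> float:
    """Total cost = cost of the driving table
    + rows produced by the driving table * cost of one access to the driven table."""
    return driving_cost + driving_rows * driven_cost


# Hypothetical numbers: table a yields 10 rows, table b yields 500 rows,
# and each access to the driven table costs 2.
a_drives_b = join_cost(driving_cost=5, driving_rows=10, driven_cost=2)
b_drives_a = join_cost(driving_cost=120, driving_rows=500, driven_cost=2)

# The order with the lower total cost is the one the model would prefer.
print("a drives b:", a_drives_b)
print("b drives a:", b_drives_a)
```

With these made-up numbers, driving from the smaller table is much cheaper, because the cost of the driven table is multiplied by the number of rows the driving table returns.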

V. Summary

That is all for this article. Today we introduced the optimizer and the cost model in MySQL, and how costs are calculated. The idea is actually quite simple: for every candidate index listed in possible_keys in the EXPLAIN output, the optimizer calculates a cost, and it also calculates the cost of a full table scan. It then compares these costs and chooses the plan with the lowest one. The cost of each operation can also be adjusted to suit your situation.

Origin blog.csdn.net/weixin_44829930/article/details/121658314