Issue 06: Use OPTIMIZER_TRACE to spy on the secrets of MySQL index selection

This series records material from learning-sharing sessions. The articles are maintained on GitHub: studeyang/leanrning-share.

Optimizing query performance is an important aspect of MySQL database management. When optimizing query performance, choosing the right index is critical to reducing the response time of queries and improving system performance. But how do you determine MySQL's index selection strategy? How does MySQL's optimizer choose indexes?

In the article "Index is dead? Check out these few common situations!", we noted that low index selectivity can cause an index to go unused, but "low" was never quantified there. In fact, MySQL estimates the cost of each candidate execution plan and executes the cheapest one. We will use a case to illustrate.

The case

Using the person table again as the example, let's look at how the optimizer selects an index.

The table creation statement is as follows:

CREATE TABLE `person` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(64) NOT NULL,
  `score` int(11) NOT NULL,
  `age` int(11) NOT NULL,
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `idx_name_score` (`name`,`score`) USING BTREE,
  KEY `idx_age` (`age`) USING BTREE,
  KEY `idx_create_time` (`create_time`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4;

Then insert about 100,000 rows:

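-- DELIMITER is needed in the mysql command-line client so that the
-- semicolons inside the procedure body do not end the statement early.
DELIMITER //
create PROCEDURE `insert_person`()
begin
    declare c_id integer default 3;
    while c_id <= 100000 do
        insert into person values(c_id, concat('name',c_id), c_id + 100, c_id + 10, date_sub(NOW(), interval c_id second));
        -- Note: because now() is used, you will need to adjust the conditions in the
        -- SQL of the later examples yourself, otherwise you may not see the results shown in this article.
        set c_id = c_id + 1;
    end while;
end //
DELIMITER ;
CALL insert_person();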

After the insert, the earliest create_time is 2023-04-14 13:03:44.

We query the person table through the following SQL statement:

explain select * from person where NAME>'name84059' and create_time>'2023-04-15 13:00:00'

The execution plan shows type=ALL, which means a full table scan. Next, we change the time in the create_time condition from 13:00 to 15:00 and run the query again:

explain select * from person where NAME>'name84059' and create_time>'2023-04-15 15:00:00'

This execution plan shows type=range, key=idx_create_time, indicating that the MySQL optimizer chose the idx_create_time index to execute this query instead of the idx_name_score composite index.

This may be surprising. Let's analyze the reasons behind it.

Introduction to the OPTIMIZER_TRACE tool

To better understand how the MySQL optimizer works, we can use a powerful debugging tool: OPTIMIZER_TRACE. It is available in MySQL 5.6 and later, and it exposes detailed information about how a query is planned, including the optimizer's decisions, the index it selected, the join order, and its estimated row counts.

When OPTIMIZER_TRACE is turned on, MySQL will record the execution plan of the query and generate a detailed report. This report can be provided to developers or database administrators for analysis to understand how MySQL decides to execute queries for performance optimization.

OPTIMIZER_TRACE is enabled by setting a session variable before the query and turning it off afterwards, as follows:

SET optimizer_trace='enabled=on';
SELECT * FROM mytable WHERE id=1;
SET optimizer_trace='enabled=off';

After the query is executed, MySQL generates a report in JSON format, which can be read from the information_schema.OPTIMIZER_TRACE table.

It should be noted that enabling OPTIMIZER_TRACE will increase query execution time and resource consumption, so it should only be used when debugging and optimizing query performance is required.

The official documentation is here: https://dev.mysql.com/doc/dev/mysql-server/latest/PAGE_OPT_TRACE.html

Total cost of full table scan

Before querying data, MySQL first enumerates the possible execution plans and then decides which one to use based on cost. The cost has two components, IO cost and CPU cost:

  • IO cost is the cost of loading data from disk into memory. By default, the IO cost constant for reading a data page is 1 (that is, reading 1 page costs 1).
  • CPU cost is the cost of CPU operations such as checking whether a row meets the conditions and sorting. By default, the cost of evaluating one row is 0.2.

MySQL maintains table statistics, which can be viewed with the following command:

SHOW TABLE STATUS LIKE 'person'

This command returns information including the number of rows, the data length, and the index size of the table. The MySQL optimizer uses this information to choose a better execution plan. We use the above command to view the statistics of the person table.

The reported row count is 100064 (MySQL's statistics are estimates, so a figure 64 higher than the actual count is normal), which gives a CPU cost of about 100064 * 0.2 = 20012.8.

The data length is 5783552 bytes. For the InnoDB storage engine, 5783552 is the space occupied by the clustered index, which is equal to the number of pages of the clustered index * the size of each page. The size of each page of InnoDB is 16KB, so we can calculate that the number of pages is 353, so the IO cost is about 353.

So, the total cost of the full table scan is around 20365.8.
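The arithmetic above can be reproduced in a few lines. This is a minimal sketch of the calculation described in this section, not MySQL's actual code: the function name full_scan_cost and its parameter names are made up for illustration, and the constants are the default values quoted above.

```python
import math

# Constants quoted in the text above
IO_COST_PER_PAGE = 1.0        # cost of reading one data page
CPU_COST_PER_ROW = 0.2        # cost of evaluating one row
INNODB_PAGE_SIZE = 16 * 1024  # 16KB per InnoDB page


def full_scan_cost(rows, data_length):
    """Estimate the full table scan cost from SHOW TABLE STATUS figures."""
    pages = math.ceil(data_length / INNODB_PAGE_SIZE)
    io_cost = pages * IO_COST_PER_PAGE
    cpu_cost = rows * CPU_COST_PER_ROW
    return io_cost + cpu_cost


cost = full_scan_cost(rows=100064, data_length=5783552)
print(round(cost, 1))  # 20365.8
```

Plugging in the Rows and Data_length values of another table gives a rough idea of what the optimizer will consider a full scan of that table to cost.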

Track the process of MySQL selecting an index

select * from person where NAME>'name84059' and create_time>'2023-04-15 13:00:00'

The possible execution strategies for the above statement are:

  • use the idx_name_score index;
  • use the idx_create_time index;
  • perform a full table scan.

Then we enable OPTIMIZER_TRACE tracking:

SET OPTIMIZER_TRACE="enabled=on",END_MARKERS_IN_JSON=on;
SET optimizer_trace_offset=-30, optimizer_trace_limit=30;

Execute the following statements in sequence.

select * from person where NAME >'name84059';
select * from person where create_time>'2023-04-15 13:00:00';
select * from person;

Then view the trace results:

select * from information_schema.OPTIMIZER_TRACE;
SET optimizer_trace="enabled=off";
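With END_MARKERS_IN_JSON=on, the trace contains end markers such as /* ranges */, which make it easier to read but mean that a standard JSON parser will reject it. If you want to post-process a trace in a script, one option is to strip the markers first. The following is a minimal sketch: the helper name parse_trace_fragment is hypothetical, and the sample text is a shortened version of the first fragment shown below.

```python
import json
import re


def parse_trace_fragment(text):
    """Strip /* ... */ end markers, then parse the remainder as JSON.

    Caveat: the regex would also remove a /* ... */ sequence that appears
    inside a string value, so treat this as a convenience, not a parser.
    """
    without_markers = re.sub(r"/\*.*?\*/", "", text)
    return json.loads(without_markers)


fragment = """
{
    "index": "idx_name_score",
    "ranges": [
        "name84059 < name"
    ] /* ranges */,
    "rows": 26420,
    "cost": 31705,
    "chosen": true
}
"""

parsed = parse_trace_fragment(fragment)
print(parsed["index"], parsed["rows"], parsed["cost"])
```

Alternatively, leaving END_MARKERS_IN_JSON off avoids the problem at the price of a trace that is harder to read by eye.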

From the execution results of OPTIMIZER_TRACE, I extracted several important fragments to focus on analysis:

1. Using idx_name_score for a range scan on the condition name84059 < name needs to scan 26420 rows, at a cost of 31705.

31705 is the sum of the IO cost and CPU cost of querying the secondary index, plus the IO cost and CPU cost of going back to the clustered index to fetch the full rows.

{
    "index": "idx_name_score",
    "ranges": [
        "name84059 < name"
    ] /* ranges */,
    "index_dives_for_eq_ranges": true,
    "rowid_ordered": false,
    "using_mrr": false,
    "index_only": false,
    "rows": 26420,
    "cost": 31705,
    "chosen": true
}

2. Using idx_create_time for a range scan needs to scan 27566 rows, at a cost of 33080.

{
    "index": "idx_create_time",
    "ranges": [
        "2023-04-15 13:00:00 < create_time"
    ] /* ranges */,
    "index_dives_for_eq_ranges": true,
    "rowid_ordered": false,
    "using_mrr": false,
    "index_only": false,
    "rows": 27566,
    "cost": 33080,
    "chosen": true
}

3. A full table scan of 100064 rows costs 20366.

{
    "considered_execution_plans": [
        {
            "plan_prefix": [
            ] /* plan_prefix */,
            "table": "`person`",
            "best_access_path": {
                "considered_access_paths": [
                    {
                        "access_type": "scan",
                        "rows": 100064,
                        "cost": 20366,
                        "chosen": true
                    }
                ] /* considered_access_paths */
            } /* best_access_path */,
            "cost_for_plan": 20366,
            "rows_for_plan": 100064,
            "chosen": true
        }
    ] /* considered_execution_plans */
}

Both index range scans (31705 and 33080) cost more than the full table scan (20366), so MySQL chose the full table scan as the execution plan.
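The decision itself is just a minimum over the candidate costs. As a small illustration (the dictionary and its labels are made up here; the numbers are copied from the three trace fragments above):

```python
# Costs copied from the three trace fragments above
candidate_costs = {
    "idx_name_score range scan": 31705,
    "idx_create_time range scan": 33080,
    "full table scan": 20366,
}

# The optimizer executes the plan with the lowest estimated cost
cheapest = min(candidate_costs, key=candidate_costs.get)
print(cheapest)  # full table scan
```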

Change the create_time condition in SQL from 13:00 to 15:00, and analyze OPTIMIZER_TRACE again to see:

{
    "index": "idx_create_time",
    "ranges": [
        "2023-04-15 15:00:00 < create_time"
    ] /* ranges */,
    "index_dives_for_eq_ranges": true,
    "rowid_ordered": false,
    "using_mrr": false,
    "index_only": false,
    "rows": 6599,
    "cost": 7919.8,
    "chosen": true
}

Because the condition now selects only more recent data, the number of rows to scan through the idx_create_time index drops from 27566 to 6599, and the cost drops from 33080 to 7919.8. That is less than the 20366 of the full table scan, and also less than the 31705 of the idx_name_score index.

So this time the execution plan uses the idx_create_time index.
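One pattern is worth noting: all three range scan costs in the traces above match rows * (1 + 0.2) + 1 to within the precision the trace prints. This is only a fit to the numbers in this article, not a formula taken from the MySQL source code, and it should not be assumed to hold for other versions, covering indexes, or non-default cost constants. A plausible reading, consistent with the cost constants quoted earlier, is that each row fetched through a secondary index is charged about one IO for going back to the table plus 0.2 of CPU. The sketch below checks the fit and derives the break-even row count against the full scan. The function name fitted_range_cost is hypothetical.

```python
def fitted_range_cost(rows):
    """Empirical fit to the trace figures in this article (an assumption)."""
    return rows * (1.0 + 0.2) + 1.0


# (label, rows, cost) copied from the trace fragments above
trace_figures = [
    ("idx_name_score", 26420, 31705),
    ("idx_create_time, 13:00", 27566, 33080),
    ("idx_create_time, 15:00", 6599, 7919.8),
]

for label, rows, traced_cost in trace_figures:
    print(label, round(fitted_range_cost(rows), 1), traced_cost)

# Under this fit, a range scan is cheaper than the full scan (cost 20366)
# only while it touches fewer rows than this:
FULL_SCAN_COST = 20366
break_even_rows = (FULL_SCAN_COST - 1.0) / 1.2
print(int(break_even_rows))  # 16970
```

Under these assumptions, an index on this table stops paying off once a range covers roughly 17% of the rows, which puts a rough number on the "low selectivity" mentioned at the start of the article.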

Manual intervention

Because of inaccurate statistics or cost estimation problems, the cost estimated by the optimizer can sometimes differ substantially from the actual cost, causing MySQL to choose the wrong index or to fall back to a full table scan. In that case manual intervention is required, for example by forcing an index.

For example, force the idx_name_score index like this:

explain select * from person FORCE INDEX(idx_name_score) where NAME >'name84059' and create_time>'2023-04-15 13:00:00'

Origin blog.csdn.net/yang237061644/article/details/130317436