MySQL Optimizer MRR

What is MRR

MRR stands for Multi-Range Read optimization. The optimizer uses it to turn random I/O into largely sequential I/O, which reduces I/O overhead during query execution. Let's compare the execution plans with mrr=on and mrr=off.

The table structure is as follows:

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `mrrx` (`a`,`b`),
  KEY `xx` (`c`)
) ENGINE=MyISAM AUTO_INCREMENT=11 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

First, turn MRR off and look at the plan:

mysql> set optimizer_switch='mrr=off';
Query OK, 0 rows affected (0.00 sec)

mysql>  explain select * from test.t1 where (a between 1 and 10) and (c between 9 and 10) ;
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                              |
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+
|  1 | SIMPLE      | t1    | range | mrrx,xx       | xx   | 5       | NULL |    2 | Using index condition; Using where |
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------+
1 row in set (0.00 sec)

With MRR off, the plan uses index xx (on column c). It reads one entry from index xx, then goes back to the table ("table return") to fetch the complete row. When the table is large and the matching rows are scattered, this causes a lot of random I/O and poor performance. Now turn MRR on:

mysql> set optimizer_switch='mrr=on';
Query OK, 0 rows affected (0.00 sec)

mysql>  explain select * from test.t1 where (a between 1 and 10) and (c between 9 and 10) ;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                         |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+
|  1 | SIMPLE      | t1    | range | mrrx,xx       | xx   | 5       | NULL |    2 | Using index condition; Using where; Using MRR |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------------------+
1 row in set (0.00 sec)

The Extra column now also shows "Using MRR". This means the optimizer applies MRR optimization to reduce I/O overhead.

MRR principle

Without MRR, the executor must go back to the table ("table return") for every record returned by the secondary index. This generally involves a lot of random I/O. With MRR, the statement is executed as follows:

  • The primary keys (row IDs) of the records found through the secondary index are placed into a buffer;
  • When the secondary index scan reaches the end or the buffer is full, the buffer contents are sorted by primary key using quicksort;
  • The user thread calls the MRR interface to take the sorted primary keys from the buffer, then fetches the row data by primary key;
  • When every primary key in the buffer has been used, the process starts again from the first step (refill, sort, fetch) until the scan is complete.

MRR sorts the row IDs collected from the secondary index, so the scattered lookups become lookups in primary-key order. This converts random I/O into largely sequential I/O and improves performance.

MRR source code analysis

First, let's look at the data structure behind MRR, the DsMrr_impl class:

class DsMrr_impl
{
  ...
  handler *h;
  TABLE *table; /* Always equal to h->table */
private:
  /* Secondary handler object.  It is used for scanning the index */
  handler *h2;

  /* Buffer to store rowids, or (rowid, range_id) pairs */
  uchar *rowids_buf;
  uchar *rowids_buf_cur;   /* Current position when reading/writing */
  uchar *rowids_buf_last;  /* When reading: end of used buffer space */
  uchar *rowids_buf_end;   /* End of the buffer */

  bool dsmrr_eof; /* TRUE <=> We have reached EOF when reading index tuples */

  int dsmrr_init(handler *h, RANGE_SEQ_IF *seq_funcs, void *seq_init_param,
                 uint n_ranges, uint mode, HANDLER_BUFFER *buf);
  ...
  int dsmrr_fill_buffer();
  int dsmrr_next(char **range_info);
  bool get_disk_sweep_mrr_cost(uint keynr, ha_rows rows, uint flags, uint *buffer_size, Cost_estimate *cost);
  ...
};

A brief explanation:

- h2 is the secondary handler object used to scan the secondary index.
- h is the handler that fetches complete rows using the primary keys (row IDs) obtained through h2.
- rowids_buf is the buffer that holds the sorted primary keys during MRR execution. Its size is set by the MySQL variable read_rnd_buffer_size.

Next, let's walk through the source code, following the program's execution flow.

1) Collecting the ordered primary keys in MRR

The optimizer analyzes the query conditions and picks a suitable secondary index. It extracts the conditions on that index and assembles them into a DYNAMIC_ARRAY of ranges. At execution time, the ranges are passed to the initialization function ha_myisam::multi_range_read_init, which calls dsmrr_fill_buffer through DsMrr_impl::dsmrr_init. Inside dsmrr_fill_buffer, the secondary-index handler finds the entries that match the ranges and appends their primary keys to rowids_buf. When the scan completes or the buffer is full, rowids_buf is sorted with quicksort. For details, see the function dsmrr_fill_buffer. Its call stack is as follows:

 #0  DsMrr_impl::dsmrr_fill_buffer (this=0x2aab0000cf00)
 #1  0x00000000006e49dd in DsMrr_impl::dsmrr_init(...)
 #2  0x00000000017d35e4 in ha_myisam::multi_range_read_init(...)
 #3  0x0000000000d134c6 in QUICK_RANGE_SELECT::reset (this=0x2aab00014070)
 #4  0x00000000009a266f in join_init_read_record (tab=0x2aab0000f5b8)
 #5  0x000000000099d6d4 in sub_select
 #6  0x000000000099c914 in do_select (join=0x2aab000064b0)
 #7  0x00000000009982f8 in JOIN::exec (this=0x2aab000064b0)
 #8  0x0000000000a5bd7c in mysql_execute_select
 ........

2) Consuming the primary-key buffer in MRR

In the physical execution phase, ha_myisam::multi_range_read_next is called. With MRR, it takes the next primary key from the buffer of ordered primary keys collected in step 1), then calls the engine-level rnd_pos to fetch the row directly. The call stack with MRR is as follows:

 #0  DsMrr_impl::dsmrr_next (this=0x2aab0000cf00, range_info=0x2aaafc03de70)
 #1  0x00000000017d3634 in ha_myisam::multi_range_read_next (this=0x2aab0000ca40, range_info=0x2aaafc03de70)
 #2  0x0000000000d138cc in QUICK_RANGE_SELECT::get_next (this=0x2aab00014070)
 #3  0x0000000000d46908 in rr_quick (info=0x2aab0000f648)
 #4  0x00000000009a2791 in join_init_read_record (tab=0x2aab0000f5b8)
 #5  0x000000000099d6d4 in sub_select (join=0x2aab000064b0, join_tab=0x2aab0000f5b8, end_of_records=false)
 #6  0x000000000099c914 in do_select (join=0x2aab000064b0)

The secondary-index handler (h2) and the primary handler (h) are coordinated through rowids_buf_cur. During initialization, h2 first fills rowids_buf. Whenever all data in the buffer has been consumed, dsmrr_fill_buffer is called again to collect more primary keys and sort rowids_buf. This repeats until h2 reaches the end of the index scan. For details, see the function DsMrr_impl::dsmrr_next.

From the analysis above, MRR looks somewhat like a join between the secondary index and the primary key. In that sense it is similar to BKA (Batched Key Access).

MRR usage scenarios

Scenario A: For index range scans and equijoin operations of InnoDB and MyISAM tables, MRR optimization can be used.

Implementation process:

  • A portion of the index tuples is accumulated in a buffer.
  • The tuples in the buffer are sorted by their row IDs.
  • Data rows are accessed in the sorted order of the index tuples.

Scenario example:

1) Index range scan: Suppose a table named orders has order_id and order_date columns, with an index on order_date. Consider the following query:

SELECT order_id FROM orders WHERE order_date BETWEEN '2022-01-01' AND '2022-12-31';

MRR optimization can use a range scan of the index to collect the matching index tuples into a buffer and sort them by row ID. The data rows are then accessed in that sorted order. The table lookups still happen, but largely sequentially instead of randomly. This assumes the query needs data that the order_date index does not contain. If the index alone covers the query, no table lookups are needed and MRR brings no benefit.

2) Equijoin: Suppose two tables, orders and customers, are joined on the customer_id column. Consider the following query:

SELECT order_id FROM orders INNER JOIN customers ON orders.customer_id = customers.customer_id;

MRR can also be applied to the equijoin. Join keys are collected into a buffer, the matching index tuples are sorted by row ID, and the data rows are then accessed in that sorted order to perform the join. In MySQL, this batched access happens through the Batched Key Access (BKA) join algorithm.

Scenario B: MRR optimization can be used for multi-range index scans of NDB tables or equijoins by attributes.

Implementation process:

  • A portion of the ranges (possibly a single key range) is accumulated in a buffer on the central node where the query is submitted.
  • The ranges are sent to the execution nodes that access the data rows.
  • The accessed rows are packed into packets and sent back to the central node.
  • The received packets containing data rows are placed in a buffer.
  • Data rows are read from the buffer.

Scenario example:

1) Multi-range index scan: Suppose an NDB table products has product_id and price columns, with an index on the price column. Consider the following query:

SELECT product_id FROM products WHERE price BETWEEN 10 AND 100;

MRR optimization can accumulate part of the matching ranges (possibly a single key range) in a buffer on the central node where the query is submitted. These ranges are sent to the execution nodes that access the data rows. Each execution node packs the rows it reads into packets and sends them back to the central node. The central node places the received packets into a buffer, and the data rows are then read from that buffer.

2) Equijoin by attribute: Suppose two NDB tables, orders and customers, are joined on the customer_id column. Consider the following query:

SELECT order_id FROM orders INNER JOIN customers ON orders.customer_id = customers.customer_id;

MRR optimization applies the same batching to the equijoin. Part of the ranges (perhaps a single key range) derived from the join condition is accumulated in a buffer on the central node and sent to the execution nodes that access the data rows. The execution nodes pack the rows into packets and send them back to the central node, which places them into a buffer. The data rows are then read from that buffer.

How to use MRR

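-- If this is not turned on, MRR will never be used
set optimizer_switch='mrr=on';
-- Use MRR whenever possible instead of making a cost-based decision
set optimizer_switch='mrr_cost_based=off';
-- Buffer used by MRR to hold the sorted row IDs (32 MB here)
set read_rnd_buffer_size = 32 * 1024 * 1024;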

mrr_cost_based (on/off) tells the optimizer whether to make a cost-based choice about using MRR for a given SQL statement. When it is on, MRR is used only if its estimated cost makes it worthwhile.

Clearly, MRR is unnecessary for a query that returns only one row. With mrr_cost_based set to off, the optimizer uses MRR whenever it can, which is a poor choice in some cases. It is therefore recommended to leave this setting on, since the optimizer's cost estimates are right in most cases.


Origin blog.csdn.net/weixin_47156401/article/details/134607583