"MySQL Series - InnoDB Engine 35" Index and Algorithm - Use of B+ Tree Index

Use of B+ tree index

1 The use of B+ tree index in different applications

In OLTP applications, a B+ tree index is worthwhile only when queries use it to fetch a small portion of the table's rows. Otherwise, even if the index exists, the optimizer may choose not to use it.

In OLAP applications, complex queries usually join multiple tables, so indexes on the join columns make sense. If the joins are executed as hash joins, however, those indexes become less important. Even so, OLAP workloads usually need an index on the time column, because most statistics filter data by a time dimension.

2 Joint index

A joint index is an index on multiple columns of a table. It is created in the same way as a single-column index, except that it lists several index columns.

For example, after creating the index index_a_b (a, b), the query select * from table where a=xx and b=xx can use the (a, b) index. The single-column query select * from table where a=xxx can also use it, because a is the leftmost column of the index. But select * from table where b=xxx cannot use the joint index to look up rows, because the b values in the leaf nodes are sorted only within each value of a.
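The leftmost-prefix behavior can be checked with explain. This is a minimal sketch: the table name t_demo, its id and c columns, and the literal values are assumptions made for illustration; only the index index_a_b (a, b) comes from the text above. The column c is there so that select * cannot be answered from the index alone.

```sql
-- Hypothetical demo table; only index_a_b (a, b) is taken from the text.
create table t_demo (
    id int unsigned not null auto_increment primary key,
    a  int not null,
    b  int not null,
    c  varchar(32),
    key index_a_b (a, b)
) engine=innodb;

insert into t_demo (a, b, c) values (1, 10, 'x'), (1, 20, 'y'), (2, 10, 'z');

-- Both columns match the index prefix (a, b): index_a_b is a candidate.
explain select * from t_demo where a = 1 and b = 10;

-- a alone is the leftmost prefix: index_a_b is still a candidate.
explain select * from t_demo where a = 1;

-- b alone is not a leftmost prefix: expect index_a_b to be missing
-- from possible_keys.
explain select * from t_demo where b = 10;
```

On a table this small the optimizer may prefer a different plan than it would on real data, so compare the possible_keys column across the three statements rather than only the key column.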

Another benefit of a joint index is that the second key column is already sorted within each value of the first. For example, with the joint index (userid, sys_date), a query that filters on userid and orders by sys_date can read the rows in index order and avoid an extra sort.

1) Create an index

mysql> create table index_test(
    -> userid int unsigned not null,
    -> sys_date date,
    -> key key_u (userid),
    -> key key_u_s (userid,sys_date)
    -> )engine=innodb;
Query OK, 0 rows affected (0.03 sec)

2) Insert data

mysql> insert into index_test values(1,'2020-01-01');
mysql> insert into index_test values(2,'2022-01-01');
mysql> insert into index_test values(3,'2022-05-01');
mysql> insert into index_test values(4,'2021-05-01');

3) Query userid, no sorting is required

When querying only by userid, possible_keys lists two usable indexes: key_u, the single-column index on userid, and key_u_s, the joint index on (userid, sys_date). The optimizer ends up choosing key_u, because its leaf nodes contain a single key value, so in theory each page can store more records.

mysql> explain select * from index_test where userid = 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: index_test
   partitions: NULL
         type: ref
possible_keys: key_u,key_u_s
          key: key_u
      key_len: 4
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

4) Query userid and ask for sorting

When querying by userid and sorting by sys_date, possible_keys again lists both key_u and key_u_s. This time the optimizer chooses key_u_s, because sys_date is already sorted within this joint index. The rows can be fetched in index order, with no additional sort on sys_date.

mysql> explain select * from index_test where userid = 2 order by sys_date desc limit 3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: index_test
   partitions: NULL
         type: ref
possible_keys: key_u,key_u_s
          key: key_u_s
      key_len: 4
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

5) Query userid, require sorting, and force the use of key_u index

If the key_u index is forced, Using filesort appears in the Extra column: an additional sort on the sys_date column is needed to complete the query.

mysql> explain select * from index_test force index(key_u) where userid = 2 order by sys_date desc limit 3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: index_test
   partitions: NULL
         type: ref
possible_keys: key_u
          key: key_u
      key_len: 4
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index condition; Using filesort
1 row in set, 1 warning (0.00 sec)

3 Covering Index

The InnoDB storage engine supports covering indexes: the queried columns can be obtained from the auxiliary index alone, without looking up the record in the clustered index. One advantage of a covering index is that the auxiliary index does not contain the whole row, so it is much smaller than the clustered index and reading it takes far fewer IO operations.

For example, when a table has both a clustered index and an auxiliary index, InnoDB can answer a counting query such as select count(*) by scanning the smaller auxiliary index instead of the clustered index.
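A covering read shows up in explain as Using index in the Extra column. The sketch below reuses the index_test table defined earlier. Which auxiliary index the optimizer picks for the count depends on the server version and statistics, so no output is shown.

```sql
-- Both selected columns are stored in key_u_s (userid, sys_date),
-- so the clustered index does not have to be read.
explain select userid, sys_date from index_test where userid = 2;

-- A row count needs no column values from the row itself,
-- so an auxiliary index is enough to produce it.
explain select count(*) from index_test;
```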

4 Cases where the optimizer chooses not to use an index

In some cases, when you analyze a SQL statement with the explain command, you will find that the optimizer does not use an auxiliary index to find the data but scans the clustered index instead, that is, it scans the entire table. This often happens with range searches, joins, and similar operations: when the query needs whole rows and touches a large share of the table, the random clustered-index lookups behind each auxiliary index entry can cost more than one sequential table scan.
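One way to observe this is to compare the plan for a narrow range with the plan for a wide one. The following is a sketch only: the table order_detail_demo, its columns, and the index idx_product are hypothetical, and the plans you get depend entirely on the data loaded and on the table statistics.

```sql
-- Hypothetical table: whole rows live in the clustered index (primary key),
-- idx_product is an auxiliary index that does not cover select *.
create table order_detail_demo (
    order_id   int unsigned not null,
    product_id int unsigned not null,
    price      decimal(10,2),
    note       varchar(64),
    primary key (order_id, product_id),
    key idx_product (product_id)
) engine=innodb;

-- Narrow range on the auxiliary key: a good candidate for idx_product.
explain select * from order_detail_demo where product_id between 10 and 12;

-- Range that may match a large share of the table: the optimizer may report
-- type: ALL even though idx_product is listed in possible_keys.
explain select * from order_detail_demo where product_id > 10;

-- Forcing the auxiliary index lets you compare the two plans.
explain select * from order_detail_demo force index (idx_product)
where product_id > 10;
```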

5 Index Hints

The MySQL database supports index hints, which explicitly tell the optimizer which index to use. There are roughly two situations where index hints are needed:

  • The optimizer of the MySQL database selects the wrong index, causing the SQL statement to run very slowly. With current MySQL versions you will rarely encounter this problem. If it does occur, you can force the optimizer to use a particular index to speed up the statement.
  • A SQL statement has many candidate indexes. In this case the time the optimizer spends choosing an execution plan may exceed the cost of running the statement itself. For example, analyzing a range query is a relatively time-consuming operation for the optimizer. Here you can use an index hint so that the optimizer does not evaluate the cost of every execution path and instead uses the index you specify.

You can use force index to specify the index.
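MySQL accepts three forms of index hint after the table name. The statements below are a short sketch against the index_test table from section 2; the choice of key_u_s and key_u is only for illustration.

```sql
-- use index: the optimizer considers only the listed indexes,
-- but may still decide on a table scan.
explain select * from index_test use index (key_u_s) where userid = 2;

-- ignore index: the listed indexes are removed from consideration.
explain select * from index_test ignore index (key_u) where userid = 2;

-- force index: like use index, and a table scan is treated as very
-- expensive, so it is chosen only if the index cannot be used at all.
explain select * from index_test force index (key_u_s) where userid = 2;
```

Comparing the key column of the three plans shows how each hint narrows the optimizer's choice.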

6 Multi-Range Read Optimization

MySQL 5.6 began to support Multi-Range Read (MRR) optimization. Its purpose is to reduce random disk access by converting random access into more sequential access, which can bring large performance improvements to IO-bound SQL queries. Multi-Range Read optimization applies to queries of the range, ref, and eq_ref types.

Benefits of MRR optimization:

  • MRR makes data access more sequential. When querying through an auxiliary index, the matching entries are first sorted by primary key, and the bookmark lookups are then performed in primary key order.
  • It reduces the number of times pages in the buffer pool are replaced.
  • It batches the lookup operations on key values.

For range queries and JOIN operations on the InnoDB and MyISAM storage engines, MRR works as follows:

  • The auxiliary index key values obtained by the query are stored in a buffer. At this point the buffered data is sorted by auxiliary index key value.
  • The key values in the buffer are sorted by RowID.
  • The actual data file is accessed in RowID order.

In addition, if the buffer pool of the InnoDB or MyISAM storage engine is not large enough to hold all the data in a table, frequent scattered reads cause pages to be evicted from the buffer pool and then read back in again and again. Accessing the rows in primary key order keeps this repeated work to a minimum.

Whether Multi-Range Read optimization is enabled is controlled by flags in the optimizer_switch parameter. Setting mrr=on enables Multi-Range Read optimization. The mrr_cost_based flag determines whether the optimizer decides to use MRR on a cost basis.

If mrr is set to on and mrr_cost_based is set to off, multi-range read optimization is always enabled. For example, the multi-range read optimization can be set to always be enabled by the following command:

mysql> set @@optimizer_switch='mrr=on,mrr_cost_based=off';
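Because this setting changes optimizer behavior, it can help to inspect the flags first and restore them afterwards. A minimal sketch, assuming a session-level change is what is wanted; the table mrr_demo and its columns are hypothetical.

```sql
-- Show the current flags, including mrr and mrr_cost_based.
select @@optimizer_switch;

-- Always use MRR when it is applicable (current session only).
set session optimizer_switch = 'mrr=on,mrr_cost_based=off';

-- Hypothetical table: idx_k is an auxiliary index and pad is not indexed,
-- so select * needs a lookup in the clustered index for every match.
create table mrr_demo (
    id  int unsigned not null primary key,
    k   int not null,
    pad varchar(64),
    key idx_k (k)
) engine=innodb;

-- When MRR is chosen, the Extra column of the plan includes Using MRR.
explain select * from mrr_demo where k >= 1000 and k < 2000;

-- Return both flags to their default values.
set session optimizer_switch = 'mrr=default,mrr_cost_based=default';
```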

The read_rnd_buffer_size parameter controls the size of the buffer for key values. When the buffered key values exceed this size, the executor sorts the buffered data by RowID and fetches the row data by RowID. The default value is 256K.

mysql> select @@read_rnd_buffer_size\G
*************************** 1. row ***************************
@@read_rnd_buffer_size: 262144
1 row in set (0.00 sec)
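The buffer can also be resized for a single session before running a large MRR query. The 4 MB value below is an arbitrary illustration, not a recommendation.

```sql
-- Raise the buffer to 4 MB (4194304 bytes) for this session only.
set session read_rnd_buffer_size = 4194304;

-- Confirm the new value.
select @@read_rnd_buffer_size;

-- Go back to the global value.
set session read_rnd_buffer_size = default;
```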

7 Index Condition Pushdown (ICP) optimization

Like Multi-Range Read, Index Condition Pushdown is a query optimization supported since MySQL 5.6. Previously, an index query first located records through the index and then filtered them against the where conditions. With Index Condition Pushdown, the MySQL database checks, while it reads the index, whether the where conditions can be applied to the index entry, that is, part of the where filtering is pushed down to the storage engine layer. For certain queries this greatly reduces the number of records the upper SQL layer has to request (fetch), improving the overall performance of the database.

Index Condition Pushdown optimization supports queries of the range, ref, eq_ref, and ref_or_null types, and currently supports the MyISAM and InnoDB storage engines. When the optimizer selects Index Condition Pushdown, Using index condition appears in the Extra column of the execution plan.
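The effect is easiest to see by turning the optimization off and on for the same query. This is a sketch under assumptions: the table people_demo, its columns, the index idx_zip_last, and the literal values are all hypothetical.

```sql
-- Hypothetical table with a joint index on (zipcode, lastname).
create table people_demo (
    id        int unsigned not null auto_increment primary key,
    zipcode   varchar(10) not null,
    lastname  varchar(40) not null,
    firstname varchar(40),
    address   varchar(100),
    key idx_zip_last (zipcode, lastname)
) engine=innodb;

-- zipcode drives the index lookup. lastname is stored in the index, but a
-- leading-wildcard LIKE cannot narrow the range, so it is a candidate for
-- pushdown. address is not in the index and is checked after the row is read.
explain select * from people_demo
where zipcode = '95054'
  and lastname like '%etrunia%'
  and address like '%Main Street%';

-- Disable ICP for this session and compare the Extra column.
set session optimizer_switch = 'index_condition_pushdown=off';
explain select * from people_demo
where zipcode = '95054'
  and lastname like '%etrunia%'
  and address like '%Main Street%';

-- Restore the default.
set session optimizer_switch = 'index_condition_pushdown=default';
```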

Origin blog.csdn.net/m0_51197424/article/details/129778504