Interviewer: What is index pushdown? What is MRR optimization? How can we better create indexes for tables?

Chat: What is index pushdown?

Index pushdown, also known as Index Condition Pushdown (ICP), is a feature added in MySQL 5.6 to optimize data queries.



Before MySQL 5.6, when querying through a non-primary-key (secondary) index, the storage engine located rows through the index and returned them to the MySQL server layer, which then judged whether they met the remaining conditions.

In later versions, index pushdown can be used: when a filter condition involves indexed columns, the MySQL server passes that part of the condition down to the storage engine.

The storage engine then filters at the index level, discarding index entries that do not satisfy the pushed-down condition, performs the table lookup only for the surviving entries, and returns the results to the server.

With index pushdown, the storage engine filters data before the table lookup whenever the conditions allow it, which reduces the number of table lookups the engine must perform.


Suppose there is a table user with four fields: id, name, level, tool.

id  name  level  tool
1   大王   1      telephone
2   小王   2      cell phone
3   小李   3      pager
4   大李   4      horse

Create a joint index (name, level)

To match users whose name starts with "大" and whose level is 1, the SQL statement is:

select * from user where name like '大%' and level = 1

Before 5.6, the execution process is as shown below

According to the "leftmost prefix principle" mentioned above, searching the index tree with this statement can only match records whose name starts with '大'. What happens next?

The engine starts from the entries for ID 1 and ID 4, goes back to the table one by one, finds the corresponding record on the primary key index, and then checks whether the level field matches.

In Figure 1, within the (name, level) index, every record whose name starts with '大' is taken out and looked up in the table in turn.

Therefore, two table lookups are needed.

But MySQL 5.6 introduced index condition pushdown: during the index traversal, fields contained in the index are checked first and unqualified records are filtered out, reducing the number of table lookups.
Figures 1 and 2 below show these two situations respectively.

5.6 and later, the execution flow chart is as follows

The difference between Figure 2 and Figure 1 is that InnoDB checks whether level equals 1 inside the (name, level) index itself, and simply skips the entries whose level is not 1.

In our example, of the two matching entries (ID 1 and ID 4), only ID 1 satisfies level = 1, so only one table lookup is needed.

With index pushdown, the number of table lookups drops from two to one.
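The effect can be sketched with a toy simulation (not MySQL's actual code; the table data and the (name, level) index are the example above):

```python
# Toy model of the user table and the (name, level) secondary index.
rows = {1: ("大王", 1), 2: ("小王", 2), 3: ("小李", 3), 4: ("大李", 4)}
index = sorted((name, level, rid) for rid, (name, level) in rows.items())

def query(use_icp):
    """Count table lookups for: name like '大%' and level = 1."""
    lookups, result = 0, []
    for name, level, rid in index:
        if not name.startswith("大"):
            continue                 # leftmost-prefix match on the index
        if use_icp and level != 1:
            continue                 # ICP: filter on level inside the index
        lookups += 1                 # one "back to the table" via primary key
        if rows[rid][1] == 1:
            result.append(rid)
    return lookups, result

print(query(use_icp=False))   # (2, [1]) -> two table lookups
print(query(use_icp=True))    # (1, [1]) -> one table lookup
```

Either way the same row is returned; ICP only changes how many index entries survive to the table-lookup step.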

Summary

Without index pushdown (ICP) optimization,

an index query first locates records via the index and then filters the fetched rows against the WHERE condition at the server layer.

With ICP, MySQL first checks which WHERE conditions can be evaluated against the index columns, filters there, and only then completes the query.

In other words, part of the WHERE filtering is executed earlier. In some scenarios this greatly reduces the number of table lookups and improves overall performance.

Chat: What are the precautions when using MySQL indexes?

MySQL indexes are used to speed up finding the rows that match a WHERE condition. There are some details and pitfalls to be aware of when using them.

1. Do not apply functions or arithmetic to indexed columns

Applying a function to a column prevents the index from being used and forces a full table scan:

select * from news where year(publish_time) < 2017

To keep the index usable and avoid a full table scan, it can be rewritten as:

select * from news where publish_time < '2017-01-01'

Likewise, do not perform arithmetic on the column; this also invalidates the index and forces a full table scan:

select * from news where id / 100 = 1

To keep the index usable and avoid a full table scan, it can be rewritten as:

select * from news where id = 1 * 100

2. Try to avoid negation operators such as !=, NOT IN, or <>

Avoid using the !=, NOT IN, or <> operators in the WHERE clause,

because these operators generally prevent index use and cause a full table scan.

Also try to avoid joining conditions with OR in the WHERE clause, since this can prevent index use and cause a full table scan:

select * from news where id = 1 or id = 2

3. Multiple single-column indexes are not the best choice; when a query has several conditions, use a composite index

For a given table access, MySQL generally uses only one index, choosing the most selective among the candidates,

so creating separate single-column indexes on several columns does not improve MySQL query performance.

Suppose there are two single-column indexes named news_year_idx(news_year) and news_month_idx(news_month).

Now suppose we need to query news by year and month; the SQL statement can be written as:

select * from news where news_year = 2017 and news_month = 1

In fact, MySQL can use only one of these single-column indexes.

In order to improve performance, you can use the composite index news_year_month_idx (news_year, news_month) to ensure that both news_year and news_month columns are covered by the index.

4. The leftmost prefix principle of composite index

A composite index follows the "leftmost prefix" principle: the index is used only if the query condition constrains the leading field of the composite index.

Therefore, the order of the columns in a composite index is critical.

An index cannot be used unless the search starts from its leftmost column.

Suppose a scenario only needs to query news by month; the SQL statement can be written as:

select * from news where news_month = 1

In this case, the news_year_month_idx(news_year, news_month) index cannot be used: following the "leftmost prefix" principle, the index is skipped because the query does not constrain its first field.
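Why the leading column matters can be sketched with a sorted-list model of the composite index (hypothetical data, not MySQL internals):

```python
import bisect

# A composite index on (news_year, news_month) is ordered by year first, then month.
# Each entry is (news_year, news_month, row_id); the data is made up.
index = sorted([(2016, 12, 1), (2017, 1, 2), (2017, 1, 3), (2017, 2, 4)])

# news_year = 2017 AND news_month = 1: the leading column is bound,
# so binary search jumps straight to the matching range.
lo = bisect.bisect_left(index, (2017, 1))
hi = bisect.bisect_right(index, (2017, 1, float("inf")))
print([rid for (_, _, rid) in index[lo:hi]])        # [2, 3] — only 2 entries touched

# news_month = 1 alone: the sort order gives no starting point,
# so every entry must be examined (a full index scan).
print([rid for (y, m, rid) in index if m == 1])     # [2, 3] — all 4 entries touched
```

Both queries return the same rows; only the bounded-prefix form lets the ordering do the work.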

Benefits of Covering Indexes

If an index contains the values of all the fields a query needs, the result can be returned directly from the index without reading the table, which greatly improves performance.

Therefore, it can be worthwhile to add an extra column to an index purely so the query is covered, even if that column is not otherwise useful for filtering.

5. The impact of range query on multi-column query

If one column in the query is constrained by a range, none of the index columns to its right can be used to narrow the search.

For example, suppose there is a scenario where you need to query news articles published this week, the condition is that they must be enabled and the release time is within this week. Then, the SQL statement can be written as:

select * from news where publish_time >= '2017-01-02' and publish_time <= '2017-01-08' and enable = 1

In this case, because of the range condition, the columns to the right of publish_time in the news_publish_idx(publish_time, enable) index cannot be used to narrow the search.

In other words, the news_publish_idx(publish_time, enable) index is equivalent to news_publish_idx(publish_time).

My suggestion for this situation: be aware of this side effect of range queries, use them as sparingly as possible, and look for workarounds that still satisfy the business scenario.

For example, the requirement above is to query news published this week, so a news_weekth field can be added to store the week number of each article. The range query then becomes an equality query, and the SQL can be rewritten as:

select * from news where news_weekth = 1 and enable = 1
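One way to populate such a news_weekth column (a hypothetical helper; the ISO week number is one reasonable definition of "week"):

```python
from datetime import date

def news_weekth(d: date) -> int:
    # ISO-8601 week number of the year
    return d.isocalendar()[1]

# 2017-01-02 (Monday) through 2017-01-08 (Sunday) is ISO week 1 of 2017,
# so the range query on publish_time becomes the equality news_weekth = 1.
print(news_weekth(date(2017, 1, 2)), news_weekth(date(2017, 1, 8)))   # 1 1
```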

However, not every range query can be rewritten this way. Where a range query is unavoidable, my suggestion is:

do not try to solve everything in SQL. Other data storage technologies can handle the time dimension,

for example Redis sorted sets (SortedSet) scored by time, or simply caching query results to improve performance.

6. Be careful with NULL values in indexed columns

Columns containing NULL values complicate index use and comparisons, and a NULL-able column in a composite index can render that column ineffective for the index.

Therefore, when designing the schema, unless there is a very good reason to allow NULL, do not let fields default to NULL.

7. The impact of implicit conversion

Implicit conversion occurs when the types on the left and right sides of the query condition do not match.

The impact of implicit conversion is that it may cause the index to fail and perform a full table scan.

In the following case, date_str is a string column, but it is compared with an integer, so an implicit conversion occurs on the column and the index cannot be used:

select * from news where date_str = 201701

Therefore, keep in mind the danger of implicit conversion and always compare values of the same type.

8. Index invalidation with LIKE

A query of the form LIKE 'value%' can use the index, but LIKE '%value%' forces a full table scan.

On tables with little data this is not a problem, but on massive data a full table scan is a terrible thing.

Therefore, depending on business needs, it is worth considering a full-text search engine such as ElasticSearch or Solr.

Let's chat: How to create an effective index?

1. To index a very long string, consider a prefix index

A prefix index uses only a leading part of the string as the index. To reason about it, we need the concept of index selectivity.

Index selectivity is the ratio of distinct index values to the total number of rows in the table. The higher the selectivity, the more efficient the query.

Selectivity of 1 is the most efficient,

but for a long string column an index with selectivity 1 comes at a high price: the index becomes very large.

In that case we index only a prefix of the string; usually a well-chosen prefix is still quite selective.

How to choose the prefix length: compute the selectivity of the full column, then pick the shortest prefix whose selectivity is close to that of the full column.
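That procedure can be sketched in a few lines (the sample values are made up; in MySQL you would compute the same ratios with COUNT(DISTINCT LEFT(col, n)) / COUNT(*)):

```python
def selectivity(values, prefix_len=None):
    # ratio of distinct (prefix) values to total rows
    if prefix_len is not None:
        values = [v[:prefix_len] for v in values]
    return len(set(values)) / len(values)

cities = ["Shanghai", "Shandong", "Shenzhen", "Beijing", "Guangzhou", "Guilin"]
full = selectivity(cities)                       # 1.0 for this sample

# shortest prefix whose selectivity approaches the full column's
n = next(k for k in range(1, 20) if selectivity(cities, k) >= 0.95 * full)
print(n, selectivity(cities, n))                 # 5 1.0
```

Here a 5-character prefix already distinguishes every row, so indexing the full strings would buy nothing extra.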

2. Use multi-column indexes

Try not to create separate single-column indexes on several columns, because at most one of them can be used for a given query;

in most cases this does not improve MySQL query performance, and a covering composite index is the better choice.

Index merge, introduced in MySQL 5.0, can to some extent combine several single-column indexes to locate the result.

In versions before 5.0, however, if the WHERE clause had conditions on several single-column-indexed columns, MySQL could not use them together; a UNION was the better option.

3. Select the appropriate index column order

The rule of thumb is to put the most selective column first in the index, so the query filters down to a smaller result set sooner.

But this is not always best: once GROUP BY or ORDER BY is involved, or the data distribution is skewed in special scenarios, this rule of thumb may not apply.

4. Covering index

A covering index is an index that contains all the fields referenced by the query, so no table lookup is needed.

Covering indexes are very effective for both MyISAM and InnoDB, reducing system calls and data copying.

Tip: avoid select * so that queries can be covered.

5. Use index scan for sorting

MySQL can produce ordered results in two ways: by an explicit sort operation, or by scanning in index order.

Sorting consumes a lot of CPU and memory, while scanning an index in order performs very well, so when querying ordered results, try to let an index scan produce the ordered result set.

How do you ensure an index-order scan is used?

  • The index column order matches the ORDER BY order
  • All columns are sorted in the same direction (all ascending or all descending)
  • If multiple tables are joined, the index can be used for sorting only when all fields referenced by the ORDER BY clause come from the first table, and the leftmost-prefix requirement of the index must still be met

6. Compressed index

MyISAM uses prefix compression, which shrinks the index so that more of it fits in memory. By default this applies only to strings, but it can be configured for integers as well.

This helps in some cases but can hurt in others: because each compressed key depends on the previous one, binary search within a block is impossible and only a sequential scan works, so reverse-order scans in particular may perform poorly.
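A sketch of the idea (simplified; the real MyISAM on-disk format differs): each key is stored as the length of the prefix it shares with the previous key plus its remaining suffix, which is exactly why decoding must proceed sequentially.

```python
def compress(keys):
    # store each key as (shared-prefix length with previous key, suffix)
    out, prev = [], ""
    for k in keys:
        n = 0
        while n < min(len(prev), len(k)) and prev[n] == k[n]:
            n += 1
        out.append((n, k[n:]))
        prev = k
    return out

def decompress(entries):
    # each key depends on the previous one, so only sequential decoding works
    keys, prev = [], ""
    for n, suffix in entries:
        prev = prev[:n] + suffix
        keys.append(prev)
    return keys

keys = ["perform", "performance", "performer", "permit"]
packed = compress(keys)   # (0,'perform'), (7,'ance'), (7,'er'), (3,'mit')
assert decompress(packed) == keys
```

You cannot jump to the middle of the packed list and reconstruct a key without replaying everything before it, which is the trade-off the text describes.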

7. Reduce duplicate, redundant, and unused indexes

MySQL implements UNIQUE constraints and primary keys with indexes, so there is no need to create an additional index on a column that already has a primary key or unique constraint; that is a duplicate index.

Similarly, if the index (A, B) already exists, creating an index on (A) is a duplicate, because by the leftmost-prefix rule (A, B) already serves queries on A alone; an index on (B), however, is not a duplicate.

Prefer extending an existing index over adding new ones, because every extra index slows down INSERT, UPDATE, DELETE, and similar operations.

Also consider dropping unused indexes. One way to locate them: enable the userstat server variable in Percona Server or MariaDB, let the server run for a while, and then query the usage frequency of each index.

8. Indexes and locks

InnoDB supports both row locks and table locks and uses row locks by default, while MyISAM uses table locks.

Using an index lets a query lock fewer rows, which also improves query performance.

Suppose a query locks 1,000 rows but actually uses only 100.

Before 5.1, all those locks were held until the transaction committed; since 5.1, locks on rows filtered out at the server layer can be released early, although some lock conflicts still occur.

9. Reduce index and data fragmentation

First, why does fragmentation occur? When InnoDB deletes data, the space it occupied is left empty.

If a large amount of data is deleted over time, the empty space grows beyond what is actually stored. New inserts let MySQL try to reuse this space, but it can never fill it completely, so fragments remain.

The consequence of fragmentation is, of course, reduced query performance, because it leads to random disk access.

The data can be reorganized with OPTIMIZE TABLE or by dumping and re-importing the table.

Chat: How to use indexes to optimize queries?

1. Should you create single-column or multi-column indexes?

If the WHERE, ORDER BY, and GROUP BY clauses of a query involve multiple fields, it is generally necessary to create a multi-column index.

for example:

select * from user where nick_name = 'ligoudan' and job = 'dog';

2. How to choose the order of multi-column indexes?

In general, put the most selective fields first.

For example, for the query:

select * from user where age = '20' and name = 'zh' order by nick_name;

the first field of the index should be age, because age narrows the result down to fewer rows, i.e. it is more selective.

But note: tuning the index for one query scenario may make another query scenario slower.

3. Avoid range queries

In many cases, range queries can make parts of an index unusable.

4. Try to avoid querying unnecessary data

explain select * from user where job like 'ligoudan%';
explain select job from user where job like 'ligoudan%';

The same query with different select lists: the second statement can be served entirely from a covering index, while the first must go back to the table for the full rows.

5. Use the correct data type in queries

explain select * from user where create_date >= now();
explain select * from user where create_date >= '2020-05-01 00:00:00';

Depending on the column type, one of these forms may fail to use the create_date index, so make sure the type of the literal matches the type of the column.

Let's chat: What is MySQL's MRR optimization?

What is MRR optimization?

MRR stands for "Multi-Range Read" optimization.

Without MRR, the optimizer must go back to the table for each record returned by the secondary index, a process that generally involves a lot of random IO.

When using MRR, the execution process of the SQL statement is as follows:

1. First, the values retrieved through the secondary index are cached in a buffer.

This buffer is called read_rnd_buffer, or rowid buffer for short.

2. Then the buffered entries are sorted by primary key ID.

When the secondary index scan reaches the end of the index or the buffer is full, quicksort is used to sort the buffer contents by primary key.

3. Finally, rows are fetched from the clustered index in ID order.

The thread calls the MRR interface to fetch rowids, then fetches row data by rowid;

when the buffered rowids are exhausted, steps 2 and 3 repeat until the scan ends.

The essence of MRR:

During the table-lookup phase, scattered, unordered lookups are turned into sorted, ordered lookups, so that random disk reads become sequential reads as far as possible.

Through this process, the optimizer sorts the random IO produced by the secondary index into primary-key order, converting random IO into sequential IO and improving performance.

As you can see, a single sort turns random IO into sequential IO, making data access much more efficient.

read_rnd_buffer_size controls how much data the buffer can hold; if one pass is not enough, the work is completed in several passes.
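Steps 1-3 can be sketched as follows (a toy model, not the server's actual code; buffer_size stands in for what read_rnd_buffer_size controls):

```python
def mrr_fetch(index_entries, buffer_size):
    """Simulate MRR: collect rowids into a buffer, sort when full, fetch in order."""
    batches, buf = [], []
    for _, rowid in index_entries:       # (key, rowid) pairs from the secondary index
        buf.append(rowid)
        if len(buf) == buffer_size:
            batches.append(sorted(buf))  # one ordered sweep over the clustered index
            buf = []
    if buf:
        batches.append(sorted(buf))      # final partial buffer
    return batches

# rowids arrive in secondary-index key order, i.e. scattered over the primary key
entries = [(10, 97), (11, 3), (12, 58), (13, 7), (14, 42)]
print(mrr_fetch(entries, 3))   # [[3, 58, 97], [7, 42]]
```

Each inner list is one sorted batch of table lookups; within a batch the clustered index is read in ascending key order instead of jumping around.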

The essence of MRR optimization

To put it simply: MRR improves the performance of index queries by converting "random disk reads" into "sequential disk reads".


The next question is:

  • Why convert random reads into sequential reads?
  • How is the conversion done?
  • Why does sequential reading improve read performance?

First, let's start with an ordinary table-lookup query without MRR optimization.

Perform a range query:

mysql > explain select * from stu where age between 10 and 20;
+----+-------------+-------+-------+------+---------+------+------+-----------------------+
| id | select_type | table | type  | key  | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+------+---------+------+------+-----------------------+
|  1 | SIMPLE      | stu   | range | age  | 5       | NULL |  960 | Using index condition |
+----+-------------+-------+-------+------+---------+------+------+-----------------------+

When this SQL runs, MySQL reads data from disk (assuming it is not already in the buffer pool) as shown in the figure below.

The red line in the figure is the query path, and the blue line is the path the disk head travels.

To simplify the drawing, the figure uses the MyISAM index structure.

InnoDB adds a secondary index plus a clustered index and involves a table lookup, which MyISAM here does not, so it would be more complicated to draw; I won't spend time on that here.

However, the disk-movement principle shown for MyISAM applies to InnoDB as well.

For MyISAM, the left side is the secondary index on the age field, and the right side is where the complete row data is stored.

The search process is:

First, in the secondary index on the left, find the first record that meets the condition (in reality each node is a page and a page holds many records; here we assume one record per page).

Then go to the right side to read that row's complete record.

After reading it, go back to the left and find the next record that meets the condition.

Then go right again to read it.

Records are fetched one by one in this way.

Now comes the problem:

as you read, you discover that the physical storage location of one row and the next are far apart!

Every read forces the disk platter and head to travel a long way.

There is no way around it:

the platter and head must make these mechanical movements, running back and forth to fetch the next piece of data for you.

The simplified structure of the disk can be seen as follows:

It can be imagined that in order to execute your sql statement, the disk must rotate continuously, and the magnetic head must continuously move.

These mechanical movements are very time-consuming.

A 10,000 RPM (Revolutions Per Minute) mechanical hard drive can perform about 167 disk reads per second.

So in the extreme case, MySQL can return only about 167 rows per second, not counting CPU queueing time.

For InnoDB it is the same; InnoDB uses a clustered index:

  • If no table lookup is involved, just replace the right side with a B+ tree whose leaf nodes hold complete rows.
  • If a table lookup is involved, replace the right side with the clustered-index B+ tree, plus a secondary-index B+ tree whose leaf nodes hold primary key values.

Calculation rules for disk IOPS

The three parameters that matter are the average seek time, the rotational speed, and the maximum transfer rate.

First, seek time.

The data to be read may lie on any track, from the innermost circle of the platter (shortest seek) to the outermost (longest seek),

so in the calculation we use the average seek time given in the disk's specifications; here we take 5 ms, typical of mainstream 10k rpm drives.

The seek time Tseek refers to the time required to move the read-write head to the correct track.

The shorter the seek time, the faster the I/O operation. At present, the average seek time of the disk is generally 3-15ms.

Second, rotational delay.

As with seeking, once the head is on the right track it may be directly above the target sector, in which case the data can be read immediately with no extra delay; in the worst case, the platter must make almost a full revolution before the data passes under the head.

So again we use the average rotational delay, which for a 10k rpm disk is (60 s / 10000) × 1/2 = 3 ms.

Third, transfer time.

Disk specifications give a maximum transfer rate, which is rarely achieved in practice,

but it represents the raw speed of reading and writing the platter. Given a single IO size, the data transfer time is IO chunk size / max transfer rate (with the rate in MB/s).

The data transfer time Ttransfer refers to the time required to complete the transfer of the requested data, which depends on the data transfer rate, and its value is equal to the data size divided by the data transfer rate.

At present, IDE/ATA can reach 133MB/s, and SATA II can reach 300MB/s interface data transfer rate, and the data transfer time is usually much shorter than the first two parts.

Therefore, the theoretical maximum IOPS of a disk can be calculated as IOPS = 1000 ms / (Tseek + Trotation), ignoring the data transfer time.

Suppose the average physical seek time of the disk is 3ms, and the disk speed is 7200, 10K, 15K rpm,

Then the theoretical maximum value of disk IOPS is respectively,

IOPS = 1000 / (3 + 60000/7200/2) = 140

IOPS = 1000 / (3 + 60000/10000/2) = 167

IOPS = 1000 / (3 + 60000/15000/2) = 200
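The three figures can be reproduced directly from the formula:

```python
def max_iops(avg_seek_ms: float, rpm: int) -> float:
    # IOPS = 1000 ms / (Tseek + Trotation); average rotational delay = half a revolution
    rotation_ms = 60_000 / rpm / 2
    return 1000 / (avg_seek_ms + rotation_ms)

for rpm in (7200, 10_000, 15_000):
    print(rpm, round(max_iops(3, rpm)))   # 7200 140 / 10000 167 / 15000 200
```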

Now you know how expensive random disk access is, so the obvious move is to convert random access into sequential access.

Sequential reading: the revolution

This is exactly what enabling MRR does: it converts random access into sequential access.

Enable MRR and re-execute the SQL statement, and "Using MRR" appears in the Extra column:

mysql > set optimizer_switch='mrr=on';
Query OK, 0 rows affected (0.06 sec)

mysql > explain select * from stu where age between 10 and 20;
+----+-------------+-------+-------+------+---------+------+------+----------------+
| id | select_type | table | type  | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+-------+------+---------+------+------+----------------+
|  1 | SIMPLE      | stu   | range | age  | 5       | NULL |  960 | ...; Using MRR |
+----+-------------+-------+-------+------+---------+------+------+----------------+

Now the MySQL query process will become like this:

For MyISAM, before fetching the complete rows from disk, the rowids are sorted and the data file is then read sequentially.

For InnoDB, the entries are sorted by clustered-index key value and the clustered index is then read sequentially.

Sequential reading brings several benefits:

1. The platter and head no longer make back-and-forth mechanical movements;

2. Disk read-ahead can be fully utilized

For example, when a page of data is requested, the following few pages can be returned along with it and placed in the data buffer pool.

If the next page is needed soon after, it no longer has to be read from disk.

The theoretical basis is the well-known principle of locality in computer science: when a piece of data is used, nearby data is usually needed soon.

3. Within one query, each page is read from disk only once

After MySQL reads a page from disk, it places it in the buffer pool; if the page is used again, it is served from memory without another disk read.

Without sorting, you might read the data on page 1, then pages 2, 3, and 4,

and then need page 1 again, only to find it has already been evicted from the cache, forcing another disk read of page 1.

With sequential reading, all the accesses to page 1 happen consecutively. Under MySQL's cache-eviction mechanism the page stays cached until you have consumed all of its data,

and because the read is sequential, you know this page will not be needed again for the rest of the query, so it can safely be dropped.

Through these three effects, sequential reading maximizes the efficiency of index reads.

Don't forget: the index itself exists to reduce disk IO and speed up queries, and MRR further amplifies the index's ability to reduce disk IO.

Split query conditions and perform batch queries

In addition, MRR can split certain range conditions into a series of key-value lookups executed in batch.

The advantage is that rows that cannot match are filtered out during the split itself.

SELECT * FROM t WHERE key_part1 >= 1000 AND key_part1 < 2000 AND key_part2 = 1000;

Table t has a composite index on (key_part1, key_part2), so the index is ordered by key_part1 first, then key_part2.

Without MRR, a range scan extracts every row with key_part1 between 1000 and 2000, even where key_part2 is not equal to 1000,

and filters on key_part2 only afterwards, so useless rows are fetched.

With MRR enabled, performance improves greatly: the optimizer first splits the condition into the pairs (1000,1000), (1001,1000), (1002,1000) ... (1999,1000) and then queries with these exact conditions.
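The split the optimizer performs can be written out explicitly:

```python
# key_part1 >= 1000 AND key_part1 < 2000 AND key_part2 = 1000
# becomes a batch of exact (key_part1, key_part2) lookups:
pairs = [(k1, 1000) for k1 in range(1000, 2000)]

print(pairs[0], pairs[-1], len(pairs))   # (1000, 1000) (1999, 1000) 1000
```

Each pair is a point lookup in the (key_part1, key_part2) index, so entries with key_part2 != 1000 are never touched at all.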

Configuration related to MRR

Whether to enable MRR optimization can be controlled by the flag in the parameter optimizer_switch.

1. MRR switch: mrr = (on | off)

For example, to turn on the switch of MRR:

mysql > set optimizer_switch='mrr=on';

2. mrr_cost_based = (on | off) tells the optimizer whether to decide, based on cost, when to use MRR.

For example, to force MRR everywhere:

SET GLOBAL optimizer_switch='mrr=on,mrr_cost_based=off';

With cost-based choice, the optimizer weighs whether MRR is worthwhile for each specific SQL statement.

Obviously, a query that returns a single row has no need for MRR; with mrr_cost_based set to off, the optimizer uses MRR unconditionally,

which is a poor choice in some cases, so it is recommended to leave this setting on; the optimizer is right most of the time.

3. Set the size of the memory used to sort the rowid: read_rnd_buffer_size, the default value is 256KB

To view the configuration:

show VARIABLES like 'read_rnd_buffer_size';

Obviously, MRR is essentially an algorithm that trades space for time.

MySQL cannot give you unlimited memory for sorting. When read_rnd_buffer fills up, the accumulated rowids are sorted and their rows read from disk, the buffer is cleared, and filling resumes until it hits the configured limit again, and so on.

In the absence of MRR, how many rows are obtained in the secondary index, and how many times the primary key index must be accessed (this cannot be said completely, because MySQL implements BNL, when the records of the driven table are loaded into memory, once Matching with records in multiple driving tables, which can greatly reduce the cost of repeatedly loading the driven table from the disk), and with MRR, the number of times is reduced to approximately the previous number of times t / buffer_size.

It can be simply understood as:

MRR aggregates scattered back-to-table lookups into batched ones, helped by the principle of spatial locality and underlying mechanisms such as disk read-ahead.
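As a hedged sketch of how this shows up in a plan (the table t and a secondary index on (key_part1, key_part2) are assumed, matching the earlier example):

```sql
-- Force MRR regardless of cost, then inspect the plan
SET optimizer_switch = 'mrr=on,mrr_cost_based=off';

EXPLAIN SELECT * FROM t
WHERE key_part1 BETWEEN 1000 AND 2000
  AND key_part2 = 1000;
-- When MRR applies to the range scan on the secondary index,
-- the Extra column typically shows "Using index condition; Using MRR"
```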

MRR usage restrictions

MRR applies to range, ref, eq_ref queries

Chat: How to use the EXPLAIN keyword?

In daily work we record SQL statements that take a long time to execute, but finding those statements is not the end of the job.

We often use the explain command to inspect their execution plans: whether the statement uses an index, and whether it performs a full table scan. This requires a solid understanding of MySQL's cost-based optimizer.

What is the EXPLAIN keyword

Using the EXPLAIN keyword can simulate the optimizer to execute SQL query statements, so as to know how MySQL processes your SQL statements.

It helps analyze the performance bottlenecks of your query statements or table structure.

Through EXPLAIN, we can analyze the following results:

  • table read order
  • The operation type of the data read operation
  • Which indexes can be used
  • which indexes are actually used
  • References between tables
  • How many rows per table are queried by the optimizer

The EXPLAIN keyword is used as follows:

EXPLAIN + SQL statement

explain select * from t_member where member_id = 1;

After executing the explain command, the output contains a total of 12 columns.

Information contained in the execution plan

They are:

  • id: selection identifier
  • select_type: query type
  • table: the table for the output result set
  • partitions: the matching partitions
  • type: the join type of the table
  • possible_keys: Indexes that may be used when querying
  • key: the index actually used
  • key_len: the length of the index field
  • ref: which columns or constants are compared against the index
  • rows: the estimated number of rows scanned
  • filtered: percentage of rows filtered by the table condition
  • extra: additional execution information

The meaning of each field in the execution plan

1. id: the order in which the select clauses or tables are executed in the query

The sequence number of each select in the query; together these numbers indicate the order in which the select clauses (or the tables they operate on) are executed.

When ids are equal, execution proceeds top to bottom; across groups, a larger id means higher priority and earlier execution.

There are 3 situations in the result of id

  • The id is the same, the execution order is from top to bottom

[Summary] In the example, the table column shows the order in which the tables are loaded: t1, t3, t2

  • The id is different. If it is a subquery, the serial number of the id will increase. The larger the id value, the higher the priority, and the earlier it will be executed

  • Identical and different ids exist at the same time

As shown in the figure above, the row with id 1 displays <derived2> in its table column, which refers to the result of the query with id 2, i.e. the derived table built from t3.

2. select_type: The type of each select clause.

Common values are as follows; they indicate the query's type and mainly distinguish ordinary queries, union queries, and subqueries:

  • SIMPLE: a simple select that contains no subquery or UNION
  • PRIMARY: if the query contains any complex subpart, the outermost query is marked PRIMARY
  • SUBQUERY: a subquery appears in the SELECT or WHERE list
  • DERIVED: a subquery in the FROM list is marked DERIVED; MySQL executes these subqueries recursively and places the result in a temporary table
  • UNION: if a second SELECT appears after UNION, it is marked UNION; if the UNION is contained in a subquery of the FROM clause, the outer SELECT is marked DERIVED
  • UNION RESULT: the SELECT that retrieves the result from the UNION temporary table
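A single statement can exercise several of these values at once. A sketch (t1, t2, t3 are placeholder tables; note that MySQL 5.7+ may merge simple derived tables, in which case DERIVED does not appear):

```sql
EXPLAIN
SELECT d1.name,
       (SELECT id FROM t3) AS sub      -- scalar subquery: SUBQUERY
FROM (SELECT id, name FROM t1) AS d1   -- FROM-clause subquery: DERIVED
UNION                                  -- outermost select: PRIMARY
SELECT name, id FROM t2;               -- second branch: UNION (+ a UNION RESULT row)
```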

3. table

The table the current row of output refers to

4. type query type

The type column shows which access type the query uses. The common values, from best to worst, are:

system > const > eq_ref > ref > range > index > ALL

Generally speaking, it is necessary to ensure that the query reaches at least the range level, and it is best to reach the ref.

  • system: the table has only one row (a system table). A special case of const; it rarely appears and can be ignored.

  • const: the index is hit at most once; const appears when comparing against a primary key or unique index.

    Because only one row matches, it is very fast. If the primary key appears in the where list, MySQL can convert that part of the query into a constant.

    In the example, a subquery first produces a temporary result table d1; the subquery condition id = 1 is a constant, so its type is const. The outer query reads d1, which holds a single row, so its type is system.

  • eq_refUnique index scan, for each index key, only one record in the table matches it.

    Common for primary key or unique index scans

  • ref: non-unique index scan, returning all rows that match a single value.

    Essentially also indexed access; however, it may find multiple matching rows, so it is a hybrid of lookup and scan.

  • range: retrieves only a given range of rows, using an index to select them; the key column shows which index is used.

    It generally appears when between, <, >, or in occurs in the where clause. A range scan is better than a full table scan,

    because it only needs to start at one point of the index and end at another, instead of scanning the entire index.

  • index: Full Index Scan.

    The difference from ALL is that the index type traverses only the index tree. This is usually faster than ALL because index files are usually smaller than data files.

    (That is, both ALL and index read the whole table, but index reads from the index while ALL reads the data rows from disk.)

    In the example, id is the primary key, so the primary key index is scanned.

  • ALL: Full Table Scan; traverses the whole table to find matching rows
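A quick way to see the main type values, using the test table created in the practice section below (id is its primary key):

```sql
EXPLAIN SELECT * FROM test WHERE id = 1;              -- type=const (primary key equality)
EXPLAIN SELECT * FROM test WHERE id BETWEEN 1 AND 3;  -- type=range
EXPLAIN SELECT id FROM test;                          -- type=index (scans only the index tree)
EXPLAIN SELECT * FROM test;                           -- type=ALL (full table scan)
```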

5. possible_keys 和 key

possible_keys displays the index or indexes that might apply to this table.

If an index exists on a field involved in the query, it is listed here, but it is not necessarily actually used by the query.

key
  • The index actually used; NULL means no index was used (possible reasons: no index exists, or the index failed to apply).

  • If the query uses a covering index (the fields after select exactly match the fields of a created index), that index appears only in the key column.

6. key_len: number of bytes used in the index

Indicates the number of bytes used in the index; this column helps work out how much of a composite index the query actually uses. Without losing accuracy, shorter is better. The value shown is the maximum possible length of the index fields, not the actual length used.

That is, key_len is calculated from the table definition, not retrieved from the table.

7. ref: which columns are compared with the index

Shows which columns or constants are used to look up values on the indexed columns; a constant (const) is best when possible.

8. rows The number of rows that need to be read

Based on table statistics and the chosen index, a rough estimate of the number of rows that must be read to find the required records; the fewer the better.

9. Extra

Contains important additional information that does not fit in the other columns.

9.1 Using filesort

Means MySQL sorts the data with an external sort instead of reading rows in index order.

A sort that cannot be completed using an index is called a "file sort" in MySQL.

9.2 Using temporary

Uses a temporary table to save intermediate results; MySQL resorts to temporary tables when sorting query results.

It is common in sorting order by and grouping query group by.

9.3 Using index

Indicates that the corresponding select operation uses a covering index (Covering Index) to avoid accessing the data rows of the table, and the efficiency is good.

If using where appears at the same time, it indicates that the index is used to perform the search of the index key value;

If there is no using where at the same time, it indicates that the index is used to read data instead of performing search actions.


One way to understand:

That is, the selected columns can be obtained from the index alone, without reading the data rows: MySQL can return the fields in the select list straight from the index, without reading the data file again. In other words, the queried columns must be covered by a created index.

Understanding method two:

Indexes are a way to find rows efficiently, but a database can also use an index to fetch a column's data directly, so it does not have to read the entire row.

After all, index leaf nodes store the data they index: when the desired data can be obtained from the index itself, there is no need to read the row.

An index that contains (covers) all the data needed to satisfy the query is called a covering index.

Note:
To use a covering index, select only the required columns; never select *,

because if all fields were indexed together, the index file would grow too large and query performance would drop.
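A minimal sketch, assuming a composite index whose leading columns are (c1, c2) on the test table used later:

```sql
-- Covered: both selected columns live in the index -> Extra: Using index
EXPLAIN SELECT c1, c2 FROM test WHERE c1 = 'a1';

-- Not covered: * forces a read of the full data row
EXPLAIN SELECT * FROM test WHERE c1 = 'a1';
```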

9.4 Using where

Indicates where filtering is used

9.5 Using join buffer

Indicates the join buffer (connection cache) is used; for example, when a query joins many tables, consider increasing join_buffer_size in the configuration file.

9.6 impossible where

The where clause is always false and cannot select any rows

SELECT * FROM t_user WHERE id = '1' and id = '2'

9.7 select tables optimized away

Without a GROUP BY clause, a MIN/MAX operation is optimized away via the index, or COUNT(*) is optimized for the MyISAM storage engine. The computation does not wait for the execution stage; the optimization is completed while the query execution plan is generated.

9.8 distinct

Optimize the distinct operation, stop finding the same value after finding the first matching tuple

Case Analysis

Execution sequence 1:

id is 4, select_type is UNION,

This means the fourth select is the second select in the UNION, and [select name, id from t2] executes first.

Execution sequence 2:

The id is 3: the third select in the overall query.

Because the query is contained in the FROM clause, it is DERIVED: 【select id, name from t1 where other_column=''】

Execution sequence 3:

The id is 2: the subquery in the select list, whose select_type is SUBQUERY.

It is the second select in the overall query: [select id from t3]

Execution sequence 4:

The id is 1: the first select in the UNION. PRIMARY in the select_type column means it is the outermost query. The table column shows <derived3>, meaning the result comes from a derived table; the 3 in derived3 says that derived table is produced by the select with id 3. 【select d1.name...】

Execution sequence 5:

The id is NULL: the stage that reads rows from the UNION temporary table. <union1,4> in the table column indicates that the results of the selects with id 1 and id 4 are combined by the UNION operation. [union of the two results]
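The explain screenshot for this walkthrough is not reproduced; stitching the bracketed fragments above together, the statement being analyzed was presumably:

```sql
EXPLAIN
SELECT d1.name, (SELECT id FROM t3) d2
FROM (SELECT id, name FROM t1 WHERE other_column = '') d1
UNION
SELECT name, id FROM t2;
```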

Actual combat: MySQL index optimization in-depth actual combat

Preface: This essay analyzes index-related interview questions through some cases.

0. Prepare

1. Create a test table named test.

drop table if exists test; 
 
create table test( 
	id int primary key auto_increment, 
	c1 varchar(10), 
	c2 varchar(10), 
	c3 varchar(10), 
	c4 varchar(10), 
	c5 varchar(10) 
) ENGINE=INNODB default CHARSET=utf8;
insert into test(c1,c2,c3,c4,c5) values('a1','a2','a3','a4','a5');
insert into test(c1,c2,c3,c4,c5) values('b1','b2','b3','b4','b5');
insert into test(c1,c2,c3,c4,c5) values('c1','c2','c3','c4','c5');
insert into test(c1,c2,c3,c4,c5) values('d1','d2','d3','d4','d5');
insert into test(c1,c2,c3,c4,c5) values('e1','e2','e3','e4','e5');

2. Create an index.
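The index statement itself is not shown; based on Case 1 below ("the order of creating composite indexes is c1, c2, c3, c4"), it was presumably a single composite index (the name idx_test_c1234 is assumed):

```sql
create index idx_test_c1234 on test(c1,c2,c3,c4);
```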

3. General queries

1. Analyze index usage in the following cases.

Case 1:
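The explain screenshots are missing; queries consistent with the analysis below would be equality conditions on all four columns in varying orders, e.g.:

```sql
explain select * from test where c1='a1' and c2='a2' and c3='a3' and c4='a4';
explain select * from test where c2='a2' and c1='a1' and c4='a4' and c3='a3';
```

key_len=132 fits four columns of 33 bytes each (utf8 varchar(10): 10×3 + 2 length bytes + 1 NULL byte).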

Analysis:

① The composite index was created in the order c1, c2, c3, c4.

② The four explain results above are identical: type=ref, key_len=132, ref=const,const,const,const.

Conclusion:

For constant equality queries, changing the order of the index columns does not change the explain result, because MySQL's optimizer rewrites the conditions. It is still recommended to write SQL statements in index order.

Case 2:
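A reconstructed query matching this analysis (range on c3, equality on the rest):

```sql
explain select * from test where c1='a1' and c2='a2' and c3>'a3' and c4='a4';
```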

Analysis:

With a range on c3, type=range and key_len=99, more than the 66 bytes that equality on c1 and c2 alone would use, showing the c3 index is used.

However, compared with the Case 1 results, the index on c4 no longer applies.

Conclusion: index columns to the right of a range condition are not used, but the column the range itself sits on (c3) is, as key_len=99 proves.

Case 2.1:
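Reconstructed query (the range is now on the last index column, c4):

```sql
explain select * from test where c1='a1' and c2='a2' and c3='a3' and c4>'a4';
```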

Analysis:

Compared with the previous explain result, key_len=132 shows all 4 index columns are used.

That is because MySQL's optimizer rewrites this statement:

index columns to the right of a range are unused, but there is no index column to the right of c4 (index order: c1, c2, c3, c4), so nothing is invalidated and all 4 columns are used.

Conclusion:

Index columns to the right of a range are unused. In order c1, c2, c3, c4: a range on c3 invalidates c4; a range on c4 invalidates nothing, so all columns are used.

Case 2.2:
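Reconstructed query (the range sits on the leading column c1):

```sql
explain select * from test where c1>'a1' and c2='a2' and c3='a3' and c4='a4';
```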

Analysis:

With a range on c1, type=ALL and key=NULL: the index fails and a full table scan occurs.

This violates the leftmost-prefix rule; the leading column is "dead" because c1 is used only for the range, not for lookup.

The workaround is a covering index.

Conclusion: under the leftmost-prefix rule, if the index on the leftmost (leading) column fails, all subsequent index columns fail too.

Case 3:

Analysis:

By the leftmost-prefix rule, the chain must not be broken in the middle: c1 and c2 are used for the lookup (key_len=66, ref=const,const), and the c3 column is consumed by the sort.

Case 3.1:

Analysis:

From the explain result, key_len=66 and ref=const,const: only c1 and c2 are used for the lookup, while c3 is used for sorting.

Case 3.2:

Analysis:

From the explain result, key_len=66 and ref=const,const: the lookup uses c1 and c2, but because the sort is on c4, skipping c3, Using filesort appears.
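The screenshots for Cases 3, 3.1 and 3.2 are missing; queries consistent with the key_len and Extra values discussed would be:

```sql
-- Case 3: c1,c2 for the lookup (key_len=66), c3 consumed by the sort
explain select * from test where c1='a1' and c2='a2' and c4='a4' order by c3;

-- Case 3.1
explain select * from test where c1='a1' and c2='a2' order by c3;

-- Case 3.2: sorting on c4 skips c3 -> Using filesort
explain select * from test where c1='a1' and c2='a2' order by c4;
```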

Case 4:

Analysis:

The lookup uses only the c1 index, while c2 and c3 are consumed by the sort, so there is no Using filesort.

Case 4.1:

Analysis:

The explain result is the same as in Case 4 except that Using filesort appears, because the index was created in order c1, c2, c3, c4 but the sort lists c3 before c2.

Case 4.2:

Analysis:

c5 is added to the query, yet the explain result is unchanged, because c5 has no index.

Case 4.3:

Analysis:

Compared with Case 4.1, Using filesort no longer appears in Extra: c2 is fixed to a constant by the where clause, so the optimizer drops it from the sort and the index order is not really reversed.
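Queries consistent with the Case 4 family (reconstructed; the screenshots are missing):

```sql
-- Case 4: c1 for the lookup, c2,c3 for the sort -> no filesort
explain select * from test where c1='a1' and c5='a5' order by c2,c3;

-- Case 4.1: sort columns reversed -> Using filesort
explain select * from test where c1='a1' and c5='a5' order by c3,c2;

-- Case 4.2: unindexed c5 added, result unchanged
explain select * from test where c1='a1' and c2='a2' and c5='a5' order by c2,c3;

-- Case 4.3: c2 is a constant, so order by c3,c2 needs no filesort
explain select * from test where c1='a1' and c2='a2' and c5='a5' order by c3,c2;
```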

Case 5:

Analysis:

Only the index on c1 is used, because c4 breaks the chain in the middle. By the leftmost-prefix rule, key_len=33 and ref=const mean just one index column is used.

Case 5.1:

Analysis:

Compared with Case 5, c2 and c3 are swapped in the group by, producing Using temporary and Using filesort, which is extremely bad. Reason: c3, c2 is the reverse of the index creation order.
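Reconstructed queries for Cases 5 and 5.1:

```sql
-- Case 5: only c1 used for the lookup (key_len=33); c2,c3 consumed by group by
explain select * from test where c1='a1' and c4='a4' group by c2,c3;

-- Case 5.1: reversed group-by order -> Using temporary; Using filesort
explain select * from test where c1='a1' and c4='a4' group by c3,c2;
```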

Case 6
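Reconstructed queries matching the analysis below:

```sql
-- Range directly on c1, then sort on c1 -> may become type=ALL with Using filesort
explain select * from test where c1>'a1' order by c1;

-- Covering-index workaround: select only indexed columns
explain select c1 from test where c1>'a1' order by c1;
```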

Analysis:

① Indexes exist on c1, c2, c3, c4, and a range is applied directly to c1, which makes the index fail. (In fact MySQL optimizes here too: when the where column is the first index column and the range is very large, almost all rows match, so MySQL chooses a full table scan; when the range is small, it still uses the index.)

Full table scan: type=ALL, ref=NULL, because here c1 is mainly used for sorting rather than lookup.

② Sorting on c1 after the index has failed produces Using filesort.

③ Solution: use a covering index.

That is, make the queried fields a subset of the index fields so the index covers the query; then MySQL uses the index instead of scanning the whole table.

Case 7:
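A reconstructed query matching the analysis below (mixed sort directions):

```sql
explain select * from test order by c1 asc, c2 desc;
```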

Analysis:

Although the sort columns are in the same order as the index, order by defaults to ascending, and c2 desc makes the sort direction differ from the index.

Because all fields of an index are sorted in a single direction, mixing directions invalidates the pre-sorted index and Using filesort appears. The type is still index: a full index scan (key_len=132 shows all four index fields were read; ALL scans the whole table, and index is slightly faster than ALL).

Case 8:

EXPLAIN extended select c1 from test where c1 in ('a1','b1') ORDER BY c2,c3;

Analysis:

For sorting purposes, multiple equality conditions (IN) also count as a range query, so c2 and c3 cannot use the index for ordering and Using filesort appears.

Here type is index: a full scan of the index.

Summary

  • MySQL supports two ways of sorting: filesort and index. Using index means MySQL completes the sort by scanning the index itself, which is efficient; filesort is inefficient.
  • order by uses Using index if either condition holds:
    • The order by statement uses the leftmost prefix of the index.
    • The where clause and order by clause together cover the leftmost prefix of the index.
  • Try to complete sorting on the index columns, following the leftmost-prefix rule (the order in which the index was created).
  • If the order by column is not an index column, Using filesort is produced.
  • group by is very similar to order by: it essentially sorts first and then groups, and follows the same leftmost-prefix rule. Note that where has higher priority than having; conditions that can be written in where should not be placed in having.

Through the analysis of the above cases, the following conclusions are made:

① Best left prefix rule.

1. In equality queries, changing the order of index columns does not affect the explain result, because MySQL's optimizer handles the reordering.

2. When using order by, pay attention to the index order, constants, and situations that may cause Using filesort.

②group by is easy to generate Using temporary.

③Popular understanding of formulas:

The full value matches my favorite, and the leftmost prefix must be followed;

The leading brother cannot die, and the middle brother cannot be broken;

Less calculations on the index column, all invalid after the range;

LIKE is written on the far right, and the covering index does not write stars;

There is also or for unequal null values, and the index invalidation should be used less.



Origin blog.csdn.net/crazymakercircle/article/details/130492417