What is the MySQL execution plan (Explain keyword)?

The author of this article is Wang Liangchen, an architect at JD Zhongtai who specializes in the architecture and design of distributed, high-availability, high-concurrency systems. He has developed a number of general-purpose scaffolds for enterprises and advocates using technical means to improve development efficiency and constrain development behavior.


What is Explain

Explain produces what is called an execution plan. When you add the explain keyword before a statement, MySQL sets a flag on the query and has the optimizer simulate executing the SQL statement: it returns the execution plan information instead of running the SQL. (Note that in older MySQL versions, before 5.6, if the from clause contains a subquery, the subquery is still executed and its result is placed in a temporary table.)

Explain can be used to analyze the performance bottlenecks of SQL statements and table structures. From its results you can learn the order in which tables are queried, the access type of each operation, which indexes could be used, which indexes are actually used, and approximately how many rows are examined in each table.


Explain command extension


explain extended

Provides additional query information on top of explain. After running explain extended, you can retrieve the optimized query statement with the show warnings command and see what the optimizer has done. The extra filtered column also helps estimate how many rows each table contributes to a join. In MySQL 5.7 and later this extended output is produced by default and the extended keyword is deprecated; MySQL 8.0 removes it.


explain partitions

Used to analyze queries on partitioned tables; it shows the partitions that the query may access.


Two important tips


1. Explain results are based on the existing data in the data table.


2. The Explain result depends heavily on the MySQL version, because the optimizer's strategies differ between versions.

 

Database tables used in the examples in this article


Explain command (keyword)

explain simple example

mysql> explain select * from t_user;

Each "table" in the query produces one row of output. "Table" is meant broadly here: it can be a database table, a subquery, a union result, and so on.


explain result column description


[Id column]

The id column is a sequential number that identifies each select in the query. There are as many ids as there are selects, and ids are assigned in the order in which the selects appear. Execution order follows three rules: rows with a larger id have higher priority and execute first; rows with the same id execute from top to bottom; rows whose id is NULL execute last.
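The three ordering rules can be sketched in a few lines of Python. This is only an illustration of the rules as stated here: the function name, the dict layout, and the sample rows are assumptions, not real MySQL output.

```python
def execution_order(rows):
    """Reorder EXPLAIN rows into the execution order described by the id rules.

    rows: list of dicts with an "id" key (int or None), in output order.
    """
    with_id = [r for r in rows if r["id"] is not None]
    without_id = [r for r in rows if r["id"] is None]
    # sorted() is stable, so rows sharing an id keep their top-to-bottom order.
    return sorted(with_id, key=lambda r: -r["id"]) + without_id


# Hypothetical plan rows (illustrative only).
plan = [
    {"id": 1, "table": "t_group_user"},
    {"id": 1, "table": "t_group"},
    {"id": 2, "table": "t_user"},
    {"id": None, "table": "<union1,2>"},
]
for row in execution_order(plan):
    print(row["id"], row["table"])
```

The row with id 2 comes first, the two rows with id 1 keep their original order, and the NULL row comes last.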


[select_type column]

The value of the select_type column indicates the type of query:

1) simple: indicates that the select corresponding to the current row is a simple query, excluding subqueries and union

2) primary: indicates that the select corresponding to the current row is the outermost select in the complex query

3) subquery: indicates that the select corresponding to the current row is a subquery contained in the select (not in the from clause)

4) derived: indicates that the select corresponding to the current row is a subquery contained in the from clause.

MySQL will create a temporary table to store the query results of the subquery. Use the following statement to illustrate:

explain select (select 1 from t_user where user_id=1) from (select * from t_group where group_id=1) tmp;

*Note: while collecting material for this article, different versions of MySQL were found to behave inconsistently. After repeated comparisons, the output of 5.7 and later versions is as follows:

Obviously, MySQL has optimized this case.

*Note that Explain output differs greatly between MySQL versions. In some scenarios a statement looks as if it should use an index, but if the optimizer, taking the existing data in the table into account, estimates that a full table scan performs better, it uses a full table scan.

5) Union: indicates that the select corresponding to the current row is the second and subsequent select in the union

6) Union result: indicates that the select corresponding to the current row is the select that retrieves the result from the union temporary table

explain select 1 union all select 2 from dual;

MySQL 5.7 and later optimize this case as well.

[Table column]

The table column shows which table the select of the current row accesses. When the from clause contains a subquery, the table column has the form <derivedN>, meaning that the current select depends on the result of the row with id=N, so the query with id=N must execute first. For a union, the table column of the UNION RESULT row is <unionN1,N2>, where N1 and N2 are the ids of the select rows that take part in the union.
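When reading a plan programmatically, the <derivedN> and <unionN1,N2> forms can be resolved back to the ids they point at. The helper below is a minimal sketch; its name and the regular expression are assumptions made for illustration, and it only handles the two forms described above.

```python
import re

# Matches "<derived2>" or "<union1,2>" and captures the id list.
_REFERENCE = re.compile(r"<(?:derived|union)(\d+(?:,\d+)*)>")


def referenced_ids(table_value):
    """Return the select ids that a table-column value refers to."""
    match = _REFERENCE.fullmatch(table_value)
    if match is None:
        return []  # an ordinary table name or alias
    return [int(n) for n in match.group(1).split(",")]


print(referenced_ids("<derived2>"))
print(referenced_ids("<union1,2>"))
print(referenced_ids("t_user"))
```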


[Type column]

The type column shows the join type or access type of the select in the current row, that is, how the optimizer decides to find rows in the table and roughly what range of rows it reads. From best to worst, the values are: null > system > const > eq_ref > ref > range > index > ALL. Generally speaking, a query should reach at least the range level, and preferably ref.

1) null: the MySQL optimizer resolves the query during the optimization phase, so the result is already available and no table or index needs to be accessed during the execution phase.
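The ranking above can be turned into a simple check, for example in a script that reviews plans automatically. This sketch only knows the eight values listed in this article (MySQL has other access types that are not modelled here), and the names ACCESS_TYPES and at_least are made up for the example.

```python
# Best to worst, exactly as listed in this article.
ACCESS_TYPES = ["null", "system", "const", "eq_ref", "ref", "range", "index", "all"]


def at_least(actual, target="range"):
    """Return True if the access type is as good as or better than the target.

    actual may be None, which is how a NULL type usually arrives from a driver.
    """
    name = "null" if actual is None else actual.lower()
    return ACCESS_TYPES.index(name) <= ACCESS_TYPES.index(target.lower())


print(at_least("ref"))             # ref is better than range
print(at_least("ALL"))             # a full table scan is worse than range
print(at_least("range", "ref"))    # range does not reach the ref level
```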

explain select min(user_id) from t_user;

Here the min function picks the minimum value of the indexed column user_id, which can be found by looking at the index alone, so the table is not accessed at execution time.

2) const and system: const appears when all columns of a primary key or unique key are compared with constants. The optimizer converts that part of the query into a constant; at most one row matches, and it is read only once, which is very fast. system is a special case of const that applies when the table contains only one row. You can use explain extended plus show warnings to view the rewritten query.

explain extended select * from (select * from t_user where user_id = 1) tmp;

show warnings;

After optimization in MySQL 5.7 and later:

3) eq_ref: all parts of a primary key or unique key index are used by the join, and at most one matching row is returned for each row from the previous table. This is the best join type after const.

explain select * from t_group_user gu left join t_group g on g.group_id = gu.group_id;

4) ref: unlike eq_ref, ref does not use a unique index such as a primary key or unique key. It uses an ordinary index, or only a leftmost prefix of a joint unique index. The index is compared with a value, and multiple matching rows may be found.

1. In the following example, group_name has an ordinary index

explain select * from t_group where group_name = 'group1';

2. Join query

explain select g.group_id from t_group g left join t_group_user gu on gu.group_id = g.group_id;

5) range: appears with operators such as in(), between, >, <, and >=. An index is used to retrieve rows in a given range.

6) index: a full scan of an index. It differs from ALL in that index reads from the index (every queried field is indexed), whereas ALL reads the table rows from disk, so index is usually faster than ALL.

explain select * from t_group;

7) ALL: a full table scan, which reads the table from beginning to end to find the required rows. In this case, add an index to optimize the query.

explain select * from t_user;


[possible_keys column]

This column shows which indexes the query could use. Sometimes possible_keys lists indexes while the key column below shows null. This happens when the table holds little data: the optimizer decides that the index would not help and performs a full table scan instead.

If possible_keys is null, no relevant index exists. In that case you can improve query performance by optimizing the where clause and adding appropriate indexes.


[Key column]

This column shows which index the optimizer actually uses to access the table. If no index is used, the column is null.


[key_len column]

This column shows the number of bytes of the index that are used. From this value you can work out roughly how many leading columns of a joint index the query uses.

The key_len calculation rules are not covered in detail here; note that different data types occupy different numbers of bytes.
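As a rough aid, the commonly cited key_len rules can be sketched as follows: integer types use a fixed size, char(n) uses n times the bytes per character of the charset, varchar(n) adds a 2-byte length prefix, and a nullable column adds 1 byte. The function names and the column definitions below are assumptions for illustration (the article does not give the column types of its tables), and only a few data types are covered.

```python
# Maximum bytes per character for a few common character sets.
CHARSET_BYTES = {"latin1": 1, "gbk": 2, "utf8": 3, "utf8mb4": 4}
# Storage size in bytes of the integer types.
INT_BYTES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def column_key_len(col_type, length=0, charset="utf8mb4", nullable=False):
    """Estimate the key_len contribution of one index column."""
    if col_type in INT_BYTES:
        size = INT_BYTES[col_type]
    elif col_type == "char":
        size = length * CHARSET_BYTES[charset]
    elif col_type == "varchar":
        size = length * CHARSET_BYTES[charset] + 2  # 2-byte length prefix
    else:
        raise ValueError(f"type not covered by this sketch: {col_type}")
    return size + (1 if nullable else 0)  # NULL flag costs one byte


def index_key_len(columns):
    """Sum the contributions of the index columns the query actually uses."""
    return sum(column_key_len(**col) for col in columns)


# Hypothetical joint index on three varchar(10) NOT NULL utf8mb4 columns.
c = {"col_type": "varchar", "length": 10}
print(index_key_len([c]))        # only the first column is used
print(index_key_len([c, c, c]))  # all three columns are used
```

Comparing the key_len reported by explain with these per-column sums shows how many leading columns of the index took part in the lookup.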


[Ref column]

This column shows the columns or constants that are compared with the index named in the key column to look up rows. Common values are const (a constant) and a column name such as user.user_id.


[Rows column]

This column shows the optimizer's estimate of the number of rows it must read and examine. In most cases it does not match the actual number of rows.


[Extra column]

As the name suggests, this column shows additional information, and its value is very useful for optimizing SQL. Common important values are as follows:

1) Using index: all queried fields are index columns (a covering index), and the where condition uses the leading column of the index. This is a high-performance result.

explain select group_id,group_name from t_group;

2) Using where: the queried columns are not covered by an index and the where condition is not on the leading column of an index. The MySQL executor receives rows from the storage engine and then applies "post-filtering": it reads the whole row first, then checks whether the row satisfies the where clause, keeping it if so and discarding it otherwise.

explain select * from t_user where user_name='user1';

3) Using where; Using index: the queried columns are covered by the index, and the where condition is on one of the index columns but not the leading column, so matching rows cannot be located directly through the index.

explain select * from t_group where group_name = 'group1';

4) null: the queried columns are not fully covered by the index, but the where condition is on the leading column of the index. The index is used, but because some columns are not in it, the rows must be fetched through a "back to table" lookup. The query does not rely on the index alone, yet the index is not useless either.

explain select * from t_user where user_id='1';

5) Using index condition: similar to Using where. The queried columns are not fully covered by the index, and the where condition is a range on the leading column. The examples here do not reproduce this case, which may be related to the MySQL version.

6) Using temporary: the query needs a temporary table to be processed. This generally calls for optimization, usually with an index. Operations that can create a temporary table include distinct, group by, order by, and subqueries.

explain select distinct user_name from t_user;

explain select distinct group_name from t_group; -- group_name is the index column

7) Using filesort: for an order by, MySQL performs an extra sort pass instead of reading rows from the table in index order. It goes through all matching rows according to the join type, saves the sort key and a row pointer for each, sorts the keys, and then retrieves the rows in sorted order. In this case, consider using an index to optimize.

explain select * from t_user order by user_name;

explain select * from t_group order by group_name; -- group_name is the index column

Query optimization suggestions

Based on the description above, first look at the type column. If the type is ALL, a full table scan is expected, which is usually expensive. It is recommended to create an appropriate index so that rows are retrieved through the index instead.

Next, look at the Extra column. Pay special attention if Using temporary or Using filesort appears:

Using temporary means that a temporary table has to be created, usually because the GROUP BY column has no index, or because the GROUP BY and ORDER BY columns differ. It is recommended to add an appropriate index.

Using filesort means that an index cannot be used to complete the sort. This can also happen when the sort field does not belong to the driving table of a multi-table join. It is recommended to add an appropriate index.

Using where usually appears when a full table scan or full index scan (type ALL or index) is combined with a WHERE condition. It is recommended to add an appropriate index.
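The checklist above can be applied mechanically to each row of a plan. The following is a minimal sketch: the function name, the dict keys (taken from the explain column names), and the wording of the messages are assumptions, and a real review would still need human judgment.

```python
def review_plan_row(row):
    """Return optimization hints for one EXPLAIN row, following the checklist.

    row: dict with the "type" and "Extra" columns of the row (either may be None).
    """
    access_type = (row.get("type") or "").lower()
    extra = row.get("Extra") or ""
    hints = []
    if access_type == "all":
        hints.append("full table scan: consider an appropriate index")
    if "Using temporary" in extra:
        hints.append("temporary table: check GROUP BY / ORDER BY columns and indexes")
    if "Using filesort" in extra:
        hints.append("filesort: the sort cannot use an index")
    if "Using where" in extra and access_type in ("all", "index"):
        hints.append("rows filtered after a full scan: consider an index for the WHERE")
    return hints


# Hypothetical row, not taken from a real server.
for hint in review_plan_row({"type": "ALL", "Extra": "Using where; Using filesort"}):
    print(hint)
```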



Index usage analysis


Database Table

Primary key index: demo_id

Joint index: c1, c2, c3

Example description


Example 1:

explain select * from t_demo where c1='d1' and c2='d2' and c3='d3';

explain select * from t_demo where c2='d2' and c1='d1' and c3='d3';

explain select * from t_demo where c3='d3' and c1='d1' and c2='d3';

All of these statements produce the same plan:

type=ref,ref=const,const,const

For equality comparisons against constants, changing the order of the conditions on the index columns does not change the explain result, because the optimizer reorders them. It is still recommended to write the conditions in index order.

Example 2:

explain select * from t_demo where c1='d1' and c2>'d2' and c3='d3';

explain select * from t_demo where c1='d1' and c3>'d3' and c2='d2';

In the first statement, the index column to the right of the range condition cannot be used, so two index columns are used.

In the second statement, all three index columns are used thanks to the optimizer.
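The behavior in Examples 1 and 2 follows the leftmost prefix rule, which can be modelled in a few lines. This is a simplified sketch, not the optimizer's actual algorithm: it only distinguishes equality and range conditions, ignores cost-based decisions, and the function name and the "eq"/"range" labels are invented for the example.

```python
def usable_index_columns(index_columns, conditions):
    """Return the leading index columns a lookup can use.

    index_columns: columns of the joint index, in index order.
    conditions: dict mapping a column name to "eq" or "range".
    """
    used = []
    for column in index_columns:
        kind = conditions.get(column)
        if kind is None:
            break  # gap in the prefix: later columns cannot be used
        used.append(column)
        if kind == "range":
            break  # columns to the right of a range condition cannot be used
    return used


index = ["c1", "c2", "c3"]
# First statement of Example 2: range on c2.
print(usable_index_columns(index, {"c1": "eq", "c2": "range", "c3": "eq"}))
# Second statement of Example 2: range on c3, written before c2 in the SQL.
print(usable_index_columns(index, {"c1": "eq", "c3": "range", "c2": "eq"}))
```

Because the conditions are passed as a mapping, the order in which they are written does not matter, which mirrors the reordering done by the optimizer in Example 1.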


Example 3:

explain select * from t_demo where c1>'c' and c2='d2' and c3='d3';

explain select * from t_demo where c1>'e' and c2='d2' and c3='d3';

These two statements both apply a range condition to the leftmost index column. In some cases the index is not used and a full table scan is performed (the first statement); in other cases the index is used (the second statement).

Repeated testing suggests the following rules (not necessarily reliable; the behavior may also depend on the first row or the minimum value of the data):

1. The result depends on the stored data.

2. With a greater-than condition, the index is not used if the condition value is smaller than the column data, and it is used if the condition value is greater than the column data.

Keep this in mind when designing query conditions.

For the first statement, a covering index can be used to optimize.


Example 4:

explain select * from t_demo where c1='d1' and c2='d2' order by c3;

explain select * from t_demo where c1='d1' order by c3;

explain select * from t_demo where c1='d1' and c3='d3' order by c2;

These statements show both cases: order by sorting that uses the index and sorting that does not.


Example 5:

explain select * from t_demo where c1='d1' and c4='d4' order by c1,c2;

The where clause contains a column that is not indexed, and Using filesort appears.


Example 6:

explain select * from t_demo where c1='d1' and c4='d4' group by c1,c2;

A very poorly performing case: both Using temporary and Using filesort appear.



Summary

1. There are two ways to sort: filesort and index. Using index means that MySQL completes the sort by scanning the index itself. Index sorting is efficient; filesort is inefficient.

2. An order by uses the index in either of the following two cases.

1) The order by clause uses the leftmost column of the index.

2) The columns in the where clause and the order by clause together form a leftmost prefix of the index.

3. Try to complete sorting on index columns, following the leftmost prefix rule of the index (the order in which the index columns were created).

4. Group by is very similar to order by: it sorts first and then groups, and follows the same leftmost prefix rule.
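The two order by cases in point 2 can be expressed as one check: after skipping the index columns fixed by equality conditions, the order by columns must continue the index in order. The sketch below is a simplified model under stated assumptions: all sort directions are the same, only equality conditions on index columns are considered, and the optimizer may still choose a filesort for cost reasons. The function name is hypothetical.

```python
def order_by_can_use_index(index_columns, equality_columns, order_by_columns):
    """Check whether ORDER BY can follow the index order (simplified model)."""
    # Skip the leading index columns that the WHERE clause fixes to constants.
    pos = 0
    while pos < len(index_columns) and index_columns[pos] in equality_columns:
        pos += 1
    # Columns fixed to a constant do not affect the sort order.
    remaining = [c for c in order_by_columns if c not in equality_columns]
    return index_columns[pos:pos + len(remaining)] == remaining


index = ["c1", "c2", "c3"]
# The three statements of Example 4.
print(order_by_can_use_index(index, {"c1", "c2"}, ["c3"]))
print(order_by_can_use_index(index, {"c1"}, ["c3"]))
print(order_by_can_use_index(index, {"c1", "c3"}, ["c2"]))
```

In this model the second statement is the one that cannot sort through the index, because c2 is skipped between the equality column c1 and the sort column c3.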


Origin blog.csdn.net/bjweimengshu/article/details/109088693