[7,000 words] How to use EXPLAIN to analyze query statements in MySQL

Analyzing query statements: EXPLAIN


1 Overview

After locating a slow SQL query, you can analyze it with EXPLAIN or DESCRIBE. The two are used in the same way and produce the same output.

MySQL has an optimizer module dedicated to optimizing SQL statements. Its main job is to analyze the statistics collected by the system and produce what it considers the optimal execution plan for the query sent by the client. Because the plan is derived automatically, it is not necessarily the one a DBA or developer would consider optimal.

The execution plan shows how the query will be carried out: the join order of multiple tables, the access method used for each table, and so on. MySQL's EXPLAIN statement displays the execution plan of a given query, and its output columns let you improve the query's performance in a targeted way.

What can EXPLAIN tell you?

  • The order in which tables are read
  • The type of each data read operation
  • Which indexes could be used
  • Which indexes are actually used
  • The reference relationships between tables
  • How many rows of each table the optimizer estimates it will examine

Version differences

  • Before MySQL 5.6.3, only EXPLAIN SELECT is available. From 5.6.3 on, EXPLAIN also works with UPDATE and DELETE (as well as INSERT and REPLACE)
  • In versions before 5.7, you need EXPLAIN PARTITIONS and EXPLAIN EXTENDED to see the partitions and filtered columns. From 5.7 on, both columns are displayed by default
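
A short sketch of the first point, assuming MySQL 5.6.3 or later and the s1 table from the data preparation section. The literal values are placeholders chosen only for illustration. EXPLAIN displays the plan without executing the statement, so no rows are changed or deleted:

```sql
-- Show the plan of data-modifying statements without running them.
EXPLAIN UPDATE s1 SET common_field = 'x' WHERE key1 = 'a';
EXPLAIN DELETE FROM s1 WHERE id = 10005;

-- DESCRIBE is a synonym for EXPLAIN and produces the same output.
DESCRIBE SELECT * FROM s1 WHERE id = 10005;
```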

Data preparation

Create the tables

CREATE TABLE s1 (
	id INT AUTO_INCREMENT,
	key1 VARCHAR(100),
	key2 INT,
	key3 VARCHAR(100),
	key_part1 VARCHAR(100),
	key_part2 VARCHAR(100),
	key_part3 VARCHAR(100),
	common_field VARCHAR(100),
	PRIMARY KEY (id),
	INDEX idx_key1 (key1),
	UNIQUE INDEX idx_key2(key2),
	INDEX idx_key3(key3),
	INDEX idx_key_part(key_part1, key_part2, key_part3)
) ENGINE=INNODB CHARSET=utf8;


CREATE TABLE s2 (
	id INT AUTO_INCREMENT,
	key1 VARCHAR(100),
	key2 INT,
	key3 VARCHAR(100),
	key_part1 VARCHAR(100),
	key_part2 VARCHAR(100),
	key_part3 VARCHAR(100),
	common_field VARCHAR(100),
	PRIMARY KEY (id),
	INDEX idx_key1 (key1),
	UNIQUE INDEX idx_key2(key2),
	INDEX idx_key3(key3),
	INDEX idx_key_part(key_part1, key_part2, key_part3)
) ENGINE=INNODB CHARSET=utf8;

Create the stored function

-- Returns a random string of n letters
DELIMITER //

CREATE FUNCTION `rand_string`(n INT) RETURNS varchar(255) CHARSET utf8mb4
BEGIN 
	DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
	DECLARE return_str VARCHAR(255) DEFAULT '';
	DECLARE i INT DEFAULT 0;
	WHILE i < n DO 
       SET return_str =CONCAT(return_str,SUBSTRING(chars_str,FLOOR(1+RAND()*52),1));
       SET i = i + 1;
    END WHILE;
    RETURN return_str;
END //
DELIMITER ;

If creating the function fails (error 1418, which can occur when binary logging is enabled), first make sure that the log_bin_trust_function_creators variable is 1, then create the function again:

SELECT @@log_bin_trust_function_creators;

SET GLOBAL log_bin_trust_function_creators = 1;

Create the stored procedures

Stored procedures for adding data to the s1 and s2 tables:

DELIMITER //
CREATE PROCEDURE insert_s1 (IN min_num INT (10), IN max_num INT(10))
BEGIN
	DECLARE i INT DEFAULT 0;
	SET autocommit = 0;
	REPEAT
	SET i = i + 1;
	INSERT INTO s1 VALUES(
		(min_num + i),
		rand_string(6),
		(min_num + 30* i + 5),
		rand_string(6),
		rand_string(10),
		rand_string(5),
		rand_string(10),
		rand_string(10)
	);
	UNTIL i = max_num
	END REPEAT;
	COMMIT;
END //
DELIMITER ;



DELIMITER //
CREATE PROCEDURE insert_s2 (IN min_num INT (10), IN max_num INT(10))
BEGIN
	DECLARE i INT DEFAULT 0;
	SET autocommit = 0;
	REPEAT
	SET i = i + 1;
	INSERT INTO s2 VALUES(
		(min_num + i),
		rand_string(6),
		(min_num + 30* i + 5),
		rand_string(6),
		rand_string(10),
		rand_string(5),
		rand_string(10),
		rand_string(10)
	);
	UNTIL i = max_num
	END REPEAT;
	COMMIT;
END //
DELIMITER ;

Run the stored procedures to load the data

CALL insert_s1(10001, 10000);
CALL insert_s2(10001, 10000);
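
A quick sanity check after loading, offered as a minimal sketch. It assumes each procedure filled its own table with the arguments shown above. Under that assumption each table should hold 10000 rows, with id running from 10002 to 20001, because the loop inserts min_num + i for i = 1 to max_num:

```sql
-- Row counts: 10000 each if both loads ran as intended.
SELECT COUNT(*) AS s1_rows FROM s1;
SELECT COUNT(*) AS s2_rows FROM s2;

-- Range of generated primary keys in s1.
SELECT MIN(id) AS min_id, MAX(id) AS max_id FROM s1;
```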

EXPLAIN output columns

Column name     Description
id              Each SELECT keyword in a large query corresponds to a unique id
select_type     The type of query that the SELECT keyword corresponds to
table           Table name
partitions      Matching partition information
type            Access method for the single table
possible_keys   Indexes that might be used
key             Index actually used
key_len         Length of the index actually used
ref             When an index column is used in an equality lookup, the object matched against the index column
rows            Estimated number of records to be read
filtered        Percentage of records remaining after the table is filtered by the search conditions
Extra           Additional information

1 id

In a large query statement, each SELECT keyword corresponds to a unique id, so a statement with several SELECT keywords has several ids:

EXPLAIN SELECT * FROM s1

EXPLAIN SELECT * FROM s1 INNER JOIN s2

Each of the two statements above has only one SELECT, so there is only one id.

EXPLAIN SELECT * FROM s1 WHERE key1 IN (SELECT key1 FROM s2) OR key3 = 'a'

This statement has two SELECT keywords, so the plan shows two ids, 1 and 2.

The query optimizer may rewrite statements that contain subqueries:

EXPLAIN SELECT * FROM s1 WHERE key1 IN (SELECT key2 FROM s2 WHERE common_field = 'a')

After seeing the subquery, the optimizer judges that it can be turned into a multi-table join to reduce the complexity (O(n^2) -> O(n)). This is safe here because key2 is unique in s2, so each s1 row can match at most once:

SELECT s1.* FROM s1 INNER JOIN s2 ON s1.key1 = s2.key2 WHERE s2.common_field = 'a'

The rewritten statement has a single SELECT, so the plan still shows only one id.

But if the subquery selects key1 from s2 instead, the plan changes:

EXPLAIN SELECT * FROM s1 WHERE key1 IN (SELECT key1 FROM s2 WHERE common_field = 'a')

UNION deduplication

EXPLAIN SELECT * FROM s1 UNION SELECT * FROM s2;

UNION has to deduplicate, so it uses an intermediate temporary table, and the plan contains a row whose table is <union1,2>.

In my output the temporary-table row also had id = 3, which does not appear in Master Kong's video. Is it a version difference? In other words, a SELECT is also performed on the intermediate table.

With UNION ALL, which does not deduplicate, the plan is:

EXPLAIN SELECT * FROM s1 UNION ALL SELECT * FROM s2;

Summary:

  • Rows with the same id belong to the same group of queries and are executed in order from top to bottom
  • When ids differ, the row with the larger id has higher priority and is executed first
  • Each distinct id value represents one independent query. The fewer independent queries in a statement, the better

2 select_type

A large query can contain multiple SELECT keywords. Each SELECT keyword represents a small query, each small query can join several tables, and each table corresponds to one row in the EXPLAIN output. Tables under the same SELECT keyword share the same id.

select_type: the type of query that the SELECT keyword corresponds to. Once we know the select_type of a small query, we know the role it plays in the large query.

Common select_type values:

  • SIMPLE: a query that contains neither UNION nor subqueries is of type SIMPLE

  • UNION, PRIMARY, UNION RESULT: a statement containing UNION or UNION ALL is made up of several small queries. The leftmost small query has select_type PRIMARY, the others are UNION, and the SELECT on the temporary table is UNION RESULT

  • SUBQUERY: if the statement containing the subquery cannot be converted into a semi-join (that is, the optimizer cannot turn the subquery into a table join), and the subquery is not a correlated subquery (one that references the outer table), then the query represented by the subquery's first SELECT keyword has select_type SUBQUERY

    • EXPLAIN SELECT * FROM s1 WHERE key1 IN (SELECT key1 FROM s2) OR key3 = 'a'

    • This subquery is not correlated, so can the statement be rewritten as a table join like the following?

    • SELECT * FROM s1 INNER JOIN s2 ON s1.key1 = s2.key1

    • The answer is no, because the two statements are not equivalent. Suppose s1 contains a key1 value once and s2 contains that value twice: with IN the s1 row is returned once, whereas the equality join returns it twice. The OR key3 = 'a' condition also matters: MySQL only considers a semi-join when the IN subquery is a top-level condition of the WHERE clause (possibly combined with other conditions by AND). So this statement keeps two SELECTs and two ids: one is PRIMARY and the subquery is SUBQUERY

  • DEPENDENT SUBQUERY: if the statement containing the subquery cannot be converted into a semi-join and the subquery references the outer table (that is, it is a correlated subquery), then the query represented by the subquery's first SELECT keyword has select_type DEPENDENT SUBQUERY

    • EXPLAIN SELECT * FROM s1 WHERE key1 IN (SELECT key1 FROM s2 WHERE s1.key2 = s2.key2) OR key3 = 'a'

    • A query whose select_type is DEPENDENT SUBQUERY may be executed multiple times

  • DEPENDENT UNION: in a statement containing UNION or UNION ALL, if every small query depends on the outer query, then every small query except the leftmost one has select_type DEPENDENT UNION

    • EXPLAIN SELECT * FROM s1 WHERE key1 IN (SELECT key1 FROM s2 WHERE key1 = 'a' UNION SELECT key1 FROM s1 WHERE key1 = 'b')

    • The second small query of the UNION is marked DEPENDENT UNION. That seems easy to accept, because it reads s1, the table of the outer query

    • But why is the first small query, which does not use the outer table, also a DEPENDENT SUBQUERY?

    • This comes from how the optimizer rewrites IN:

    • WHERE EXISTS (... s1.key1 = s2.key1 ...). The outer column is pushed into the subquery, which turns it into a correlated subquery. I do not know why the optimizer does this

  • DERIVED: the subquery that produces a derived table has select_type DERIVED

    • EXPLAIN SELECT * FROM (SELECT key1, COUNT(*) AS c FROM s1 GROUP BY key1) AS derived_s1 WHERE c > 1

    • That is the row with id 2, the derived table

  • MATERIALIZED: when the optimizer chooses to materialize the subquery and then join the materialized result with the outer query, the subquery's select_type is MATERIALIZED

    • EXPLAIN SELECT * FROM s1 WHERE key1 IN (SELECT key1 FROM s2)

    • The result of SELECT key1 FROM s2 is a set of records that is then joined with the outer table. These records form a materialized table, and the subquery's select_type is MATERIALIZED

    • The outer SELECT treats the materialized table as an ordinary table, so its select_type is SIMPLE

    • This statement looks almost the same as the non-correlated SUBQUERY example above. The only difference is OR key3 = 'a': without it the subquery is materialized, and with it the subquery is reported as SUBQUERY

    • EXPLAIN SELECT * FROM s1 WHERE key1 IN (SELECT key1 FROM s2) OR key3 = 'a'
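
The claim under SUBQUERY that an IN subquery and an inner join are not interchangeable when the inner table holds duplicates can be checked on a tiny data set. This is a minimal sketch. The temporary tables demo_outer and demo_inner are hypothetical names introduced only for this illustration and are not part of the original setup:

```sql
-- One value in the outer table, the same value twice in the inner table.
CREATE TEMPORARY TABLE demo_outer (k VARCHAR(10));
CREATE TEMPORARY TABLE demo_inner (k VARCHAR(10));
INSERT INTO demo_outer VALUES ('a');
INSERT INTO demo_inner VALUES ('a'), ('a');

-- IN only asks whether a match exists: the outer row is counted once (1).
SELECT COUNT(*) AS in_rows
FROM demo_outer
WHERE k IN (SELECT k FROM demo_inner);

-- The inner join pairs the outer row with every match: counted twice (2).
SELECT COUNT(*) AS join_rows
FROM demo_outer AS o
INNER JOIN demo_inner AS i ON o.k = i.k;

DROP TEMPORARY TABLE demo_outer, demo_inner;
```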

3 table

table: the name of the table

  • Each row of the EXPLAIN output corresponds to a single table

    • EXPLAIN SELECT * FROM s1

    • EXPLAIN SELECT * FROM s1, s2

  • The two rows of the second plan have the same id, because they belong to the same large query (there is only one SELECT)

  • s2 is listed before s1, so s2 is the driving table and s1 is the driven table. This cannot be judged from the SQL text, because the optimizer may change the table order

4 partitions

  • Shows which partitions are hit for a partitioned table. For non-partitioned tables the value is NULL. In general, the partitions column in our execution plans is NULL.

5 type

Each row of the execution plan represents how MySQL accesses one table when executing the query. This access method, also called the access type, is what the type column shows. For example, a type of ref means MySQL will use the ref access method to query that table.

The complete list of access methods, from best to worst: system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL. The earlier a method appears in the list, the more efficient it is.

The goal of SQL performance optimization: reach at least the range level, aim for the ref level, and ideally reach the const level.

  • system: when the table contains exactly one record and the storage engine keeps exact row-count statistics (MyISAM and MEMORY, for example), the access method is system

    • CREATE TABLE t(i INT) ENGINE=MYISAM; INSERT INTO t VALUES(1); EXPLAIN SELECT * FROM t

    • "Exact statistics" refers to, for example, the row count that the MyISAM storage engine records for the table

    • system is the best-performing case

    • With a second record the type becomes ALL, and with InnoDB even a single-row table is ALL

    • Likewise, COUNT(*) on this InnoDB table is ALL

    • CREATE TABLE tt(i INT) ENGINE=INNODB; INSERT INTO tt VALUES(1); EXPLAIN SELECT COUNT(*) FROM tt

  • const: when the primary key or a unique secondary index is matched for equality against a constant, the access method is const, meaning constant-level cost

    • EXPLAIN SELECT * FROM s1 WHERE id = 10005; EXPLAIN SELECT * FROM s1 WHERE key2 = 10066;

    • With key3 and a numeric literal the type is ALL

    • EXPLAIN SELECT * FROM s1 WHERE key3 = 1006;

    • This is index invalidation caused by implicit conversion: key3 is VARCHAR but is compared with a number, so the column values have to be converted (in effect a function is applied to the column), the index cannot be used, and the query falls back to ALL

  • eq_ref: in a join, if the driven table is accessed by equality match on its primary key or a unique secondary index (for a composite primary key or unique index, every column of the index must be matched), the access method for the driven table is eq_ref

    • EXPLAIN SELECT * FROM s1 INNER JOIN s2 WHERE s1.key2 = s2.key2

    • key2 is a secondary index with a unique constraint, so the access method of the driven table s2 is eq_ref

    • In that row, the ref column shows where the lookup value comes from: a column of s1, which is itself read with ALL

  • ref: when a table is queried by equality match between an ordinary secondary index and a constant, the access method may be ref

    • EXPLAIN SELECT * FROM s1 WHERE key3 = 'CUTLVwqweqweq';

    • key3 has an ordinary index without a unique constraint. The plan shows that idx_key3 is used, so the type is ref

  • ref_or_null: when a table is queried by equality match between an ordinary secondary index and a constant, and rows where the column is NULL are also wanted, the access method may be ref_or_null

    • EXPLAIN SELECT * FROM s1 WHERE key3 = 'CUTLVwqweqweq' OR key3 IS NULL;

  • index_merge: in some cases a single-table access can use the Intersection, Union, or Sort-Union merge methods

    • EXPLAIN SELECT * FROM s1 WHERE key1 = 'a' OR key2 = 123131

    • Both key1 and key2 are indexed. A SELECT normally uses only one index per table, so here the two indexes are merged: both index trees are scanned to collect primary key values, the union of those values is taken, and the rows are then fetched from the table

    • With AND, however, only one index is used (here the unique secondary index, so the type is const)

    • EXPLAIN SELECT * FROM s1 WHERE key1 = 'rCLXEg' AND key2 = 10036

  • unique_subquery: for some statements containing an IN subquery, if the optimizer decides to convert the IN subquery into an EXISTS subquery and the subquery can use an equality match on the primary key, the subquery's type is unique_subquery

    • EXPLAIN SELECT * FROM s1 WHERE key2 IN (SELECT id FROM s2 WHERE s1.key1 = s2.key1) OR key3 = 'a'

  • range: when an index is used to fetch records within one or more ranges, the access method may be range

    • EXPLAIN SELECT * FROM s1 WHERE key1 IN ('a', 'b', 'c')

    • For a column without an index the type is ALL

  • index: when a covering index can be used but all of its records have to be scanned, the access method is index

    • EXPLAIN SELECT key_part2 FROM s1 WHERE key_part3 = 'a'

    • The key column still shows the composite index. By the leftmost-prefix rule the index could only be used for a lookup if the condition were on key_part1. It is used here because both the condition column and the selected column belong to the composite index, so MySQL scans all records of that index and never has to go back to the table for other columns

    • Finding the required data without going back to the table is called index coverage (a covering index)

    • Now add another column:

    • EXPLAIN SELECT key1, key_part2 FROM s1 WHERE key_part3 = 'a'

    • The result is ALL, because key1 is not part of the composite index and the rows must be fetched from the table to read it

  • ALL: full table scan
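
The implicit-conversion case described under const can be compared side by side. This is a minimal sketch against the s1 table from the data preparation section. The expected type values in the comments follow the explanation in the text and may differ with other data or versions:

```sql
-- Numeric literal against a VARCHAR column: the column values are converted,
-- so idx_key3 cannot be used for a lookup (type is expected to be ALL).
EXPLAIN SELECT * FROM s1 WHERE key3 = 1006;

-- String literal: the types match, so idx_key3 can be used (type is expected to be ref).
EXPLAIN SELECT * FROM s1 WHERE key3 = '1006';
```

Quoting the literal so that it matches the column type is usually the simplest fix.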

6 possible_keys and key

In the EXPLAIN output, possible_keys lists the indexes that might be used for the single-table access. Generally, if an index exists on a column involved in the query it is listed here, but it is not necessarily used by the query.

key is the index the optimizer finally decides to use after calculating the cost of the different candidate indexes.

EXPLAIN SELECT * FROM s1 WHERE key1 > 'z' AND key3 = 'a'

Both key1 and key3 have ordinary secondary indexes, but the condition on key3 is an equality match, which costs less, so the optimizer chooses idx_key3.

EXPLAIN SELECT * FROM s1 WHERE key1 > 'z' OR key3 = 'a'

If AND is changed to OR, the plan becomes the index_merge described earlier: primary key values are collected from both index trees and merged, and then the rows are fetched through the clustered index in one pass.

EXPLAIN SELECT key1, key3 FROM s1 WHERE key1 > 'z' OR key3 = 'a'

As an aside: even when the selected columns could each be read from an index (that is, their values can be found in an index tree), this query still has to go back to the table, so the two queries get the same execution plan.
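
One way to see the difference between possible_keys and key is to restrict the optimizer's choices and compare the plans. The sketch below uses MySQL index hints, which the original text does not cover. The resulting plans depend on your data and version, so no output is shown:

```sql
-- Default: the optimizer chooses between idx_key1 and idx_key3 by cost.
EXPLAIN SELECT * FROM s1 WHERE key1 > 'z' AND key3 = 'a';

-- Hide idx_key3 from the optimizer, so it can no longer be chosen as key.
EXPLAIN SELECT * FROM s1 IGNORE INDEX (idx_key3) WHERE key1 > 'z' AND key3 = 'a';

-- Tell the optimizer that a table scan is very expensive compared with idx_key1.
EXPLAIN SELECT * FROM s1 FORCE INDEX (idx_key1) WHERE key1 > 'z' AND key3 = 'a';
```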

7 key_len (composite index analysis)

key_len: the length in bytes of the index actually used. The larger the value of key_len, the better.

"The larger the better" is a comparison of a query with itself and matters mainly for composite indexes: a larger key_len means more of the composite index is used, so fewer data pages have to be read and the query is more efficient.

EXPLAIN SELECT * FROM s1 WHERE id = 10005

Why 4: id is an INT, so the data takes 4 bytes. A primary key cannot be NULL, so no byte is needed for a NULL flag, and INT is fixed-length, so no bytes are needed for a length. The total is 4.

EXPLAIN SELECT * FROM s1 WHERE key2 = 10126;

key2 is an INT, which takes 4 bytes. It has a unique constraint but can be NULL, so 1 byte is added for the NULL flag: 5 bytes in total.

EXPLAIN SELECT * FROM s1 WHERE key1 = 'a';

First, key1 is VARCHAR(100) and the table uses utf8 (utf8mb3), so the data can occupy up to 100 * 3 = 300 bytes. The column is variable-length, which adds 2 bytes for the length, and it can be NULL, which adds 1 byte for the NULL flag: 303 bytes in total.

Similarly, of the two queries below, the first gives 303 and the second gives 606. This is where key_len is useful: the second statement uses more of the composite index than the first.

EXPLAIN SELECT * FROM s1 WHERE key_part1 = 'a';
EXPLAIN SELECT * FROM s1 WHERE key_part1 = 'a' AND key_part2 = 'b';
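
The key_len values discussed in this section can be reproduced with plain arithmetic. The sketch below only restates the rule used in the text (bytes of data, plus 2 for a variable-length column, plus 1 for a nullable column). The column aliases are names made up for this illustration:

```sql
SELECT
    4                       AS key_len_id,          -- INT, NOT NULL: 4
    4 + 1                   AS key_len_key2,        -- INT, nullable: 5
    100 * 3 + 2 + 1         AS key_len_key1,        -- VARCHAR(100), utf8mb3, nullable: 303
    2 * (100 * 3 + 2 + 1)   AS key_len_two_parts;   -- key_part1 + key_part2: 606
```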

8 ref

ref shows what the index column is matched against when an index column is used in an equality lookup.

EXPLAIN SELECT * FROM s1 WHERE key1 = 'a';

key1 is an ordinary secondary index, so the type is ref (a unique secondary index would give const). The value it is matched against is a constant, so the ref column shows const.

EXPLAIN SELECT * FROM s1 INNER JOIN s2 ON s1.id = s2.id;

Because this is a join, there is only one SELECT and one id. Because the join is on the primary key, the access type of the second table is eq_ref (an ordinary index would give ref). The value matched against is a column of the other table, so the ref column shows that column: atguigu1.s2.id.

EXPLAIN SELECT * FROM s1 INNER JOIN s2 ON s2.key1 = UPPER(s1.key1);

key1 is an ordinary secondary index, so the type is ref. The value matched against is the return value of a function, so the ref column shows func.

9 rows

rows: the estimated number of records to be read. The smaller the value, the better.

The smaller the value, the more likely the records sit in the same data page, and the fewer I/O operations are needed.

10 filtered (analyze together with rows)

filtered: the percentage of records remaining after the table is filtered by the conditions. The larger the value, the better.

EXPLAIN SELECT * FROM s1 WHERE key1 > 'z';

The output shows that after filtering by the condition, 100% of the records read meet the requirement.

Why larger is better: suppose 40 records remain after filtering. If filtered is 100%, only 40 records were read. If filtered is 10%, 400 records were read to end up with those 40. Reading 40 records touches fewer data pages than reading 400.

If the table is scanned through an index, then besides estimating the records that satisfy the index condition, the optimizer also estimates how many of them satisfy the other conditions.

EXPLAIN SELECT * FROM s1 WHERE key1 > 'z' AND common_field = 'b';

In the statement above, rows = 303 is the estimated number of records to read that satisfy the condition on the indexed column key1, and filtered is the estimated percentage of those records that also satisfy the condition on common_field.

For a single-table query the filtered column is not very useful. It matters in joins: the filtered value in the driving table's row determines how many times the driven table is accessed.

EXPLAIN SELECT * FROM s1 INNER JOIN s2 ON s1.key1 = s2.key1 WHERE s1.common_field = 'a';

First, a multi-table join is a single SELECT, so both rows have the same id. Second, the join condition uses an ordinary secondary index, so the driving table's type is ALL and the driven table's type is ref. Finally, the estimated rows for s1 is 10152, and after filtering, 10152 * 10% ≈ 1015 records are matched against s2, so s2 is accessed about 1015 times.
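
The estimate above is simply rows multiplied by filtered. As a worked check, using the rows and filtered values quoted in the text for the s1 row (the column alias is made up for this illustration):

```sql
-- rows = 10152 and filtered = 10.00 (percent) come from the driving table's row.
SELECT 10152 * 10.00 / 100 AS estimated_s2_lookups;  -- about 1015
```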

11 Extra

Extra holds additional information that does not fit in the other columns but is still important. With it you can tell more precisely how MySQL executes a given query.

  • No tables used: the query has no FROM clause, so no table is used

    • EXPLAIN SELECT 1

  • Impossible WHERE: the WHERE condition is always false

    • EXPLAIN SELECT * FROM s1 WHERE 1 != 1

    • The table is not read at all, because no row can satisfy the condition

  • Using where: when a table is read by full table scan and the statement has a search condition on that table, Extra shows Using where

    • EXPLAIN SELECT * FROM s1 WHERE common_field = 'a'

    • common_field is an ordinary column without an index, so the type is ALL, and Extra shows that rows are filtered by the WHERE condition

  • No matching min/max row: shown when the select list contains MIN or MAX but no record satisfies the WHERE condition

    • EXPLAIN SELECT MIN(key1) FROM s1 WHERE key1 = 'adqwdqweqwe'

    • When records do satisfy the WHERE condition (or there is no WHERE clause at all), Extra shows Select tables optimized away, meaning the optimizer can obtain the result without reading the table during execution

    • EXPLAIN SELECT MIN(key1) FROM s1

  • Using index: a covering index is used, that is, the selected columns and the condition columns are all contained in the index, so there is no need to go back to the table

    • EXPLAIN SELECT key1 FROM s1 WHERE key1 = 'a'

    • Selecting the primary key as well is still covered by the index

  • Using index condition: index condition pushdown. Consider the following query:

    • EXPLAIN SELECT * FROM s1 WHERE key1 > 'z' AND key1 LIKE '%a%'

    • Without this optimization the query would run as follows: use the idx_key1 index tree to find the primary key values of all records with key1 > 'z' (385 records here), go back to the table for each of them, read the full rows from the clustered index, and only then check the remaining filter conditions

    • Index condition pushdown optimizes a special case: if a remaining filter condition involves only columns of the index, it is checked on the index entry before going back to the table, which reduces the number of table lookups. The rows estimate is still 385

  • Using join buffer (block nested loop): when the driven table cannot use an index effectively to speed up access, MySQL allocates a block of memory called the join buffer to speed up the join. This is the block-based nested-loop algorithm

    • EXPLAIN SELECT * FROM s1 INNER JOIN s2 ON s1.common_field = s2.common_field

    • common_field is a column without an index

  • Not exists: in a left join, when the WHERE clause requires a column of the driven table to be NULL and that column is declared NOT NULL, Extra shows Not exists

    • EXPLAIN SELECT * FROM s1 LEFT JOIN s2 ON s1.key1 = s2.key1 WHERE s2.id IS NULL

    • Note that it must be a column of the driven table. If the same condition is placed on the driving table, the plan shows Impossible WHERE directly and the driven table is never considered

  • Using union(...): the OR condition uses two indexes, which is the index_merge access method described under type. The primary key values found in the two index trees are merged, then the rows are fetched from the table and filtered by the WHERE condition

    • EXPLAIN SELECT * FROM s1 WHERE key1 = 'a' OR key3 = 'a'

  • Zero limit: the LIMIT is 0

  • Using filesort (file sorting):

    • In some cases the sort can use an index:

    • EXPLAIN SELECT * FROM s1 ORDER BY key1 LIMIT 10;

    • This query uses the idx_key1 index to read 10 records in key1 order directly, then goes back to the table with each record's primary key value to get all the columns. In many more cases, however, the sort cannot use an index and has to be done in memory (when there are few records) or on disk. MySQL calls sorting in memory or on disk file sorting (filesort).

    • One thing I do not understand: why does the plan switch to filesort when the LIMIT is removed or made larger?

    • EXPLAIN SELECT * FROM s1 ORDER BY key1 LIMIT 97;

    • Personal guess: as the LIMIT grows, rows grows too, and around LIMIT 95 it suddenly jumps. Possibly, when the LIMIT is small, the primary key values read in index order are fairly concentrated, so going back to the table is close to a sequential read. When the LIMIT is large or absent, the primary key values are widely scattered (the index is ordered by key1, so key1 values are adjacent but primary key values are not), and going back to the table amounts to random reads. After comparing costs, the optimizer then decides that sorting directly in memory or on disk is cheaper.

    • For a query on a column without an index, filesort is naturally the only option:

    • EXPLAIN SELECT * FROM s1 ORDER BY common_field LIMIT 10;

  • Using temporary: when MySQL performs operations such as deduplication or sorting and cannot use an index effectively, it may need to build an internal temporary table

    • EXPLAIN SELECT DISTINCT common_field FROM s1;

    • A temporary table in the execution plan is not a good sign, because creating and maintaining it is costly. Where possible, use an index so that the temporary table is not needed
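
The effect of index condition pushdown, described under Using index condition in the list above, can be observed by switching the optimization off for the current session. This is a sketch, assuming the s1 table from the data preparation section. The user variable @saved_optimizer_switch is a name chosen for this illustration, and the Extra values in the comments are what the explanation above leads one to expect:

```sql
-- Remember the current settings so they can be restored afterwards.
SET @saved_optimizer_switch = @@optimizer_switch;

-- Pushdown enabled (the default): Extra is expected to show "Using index condition".
SET optimizer_switch = 'index_condition_pushdown=on';
EXPLAIN SELECT * FROM s1 WHERE key1 > 'z' AND key1 LIKE '%a%';

-- Pushdown disabled: the LIKE condition is checked only after the table lookup,
-- so Extra is expected to show "Using where" instead.
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM s1 WHERE key1 > 'z' AND key1 LIKE '%a%';

-- Restore the original settings.
SET optimizer_switch = @saved_optimizer_switch;
```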


Summary

  • EXPLAIN does not take caches into account (it does not consider how records are loaded, only the SQL statement)
  • EXPLAIN cannot show the optimization work MySQL does while actually executing the query
  • EXPLAIN does not show the effect of triggers, stored procedures, or user-defined functions on the query
  • Some of the information is estimated rather than exact

Further use of EXPLAIN

Four output formats of EXPLAIN

EXPLAIN has four output formats: traditional, JSON, TREE, and visual.

1 Traditional format

This is the EXPLAIN output used so far, which summarizes the query plan.

2 JSON format

The traditional EXPLAIN output lacks an important attribute for judging a plan: its cost. The JSON format is the most detailed of the four formats and includes execution cost information. Next, compare the traditional and JSON output for the same query:

EXPLAIN SELECT * FROM s1 INNER JOIN s2 on s1.key1 = s2.key2 WHERE s1.common_field = 'a'

EXPLAIN FORMAT=JSON SELECT * FROM s1 INNER JOIN s2 on s1.key1 = s2.key2 WHERE s1.common_field = 'a'
{
  "query_block": {
    "select_id": 1, // the id column
    "cost_info": {
      "query_cost": "1394.77" // total query cost
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "s1", // table
          "access_type": "ALL", // type
          "possible_keys": [
            "idx_key1"
          ],
          "rows_examined_per_scan": 10152, // rows
          "rows_produced_per_join": 1015, // rows * filtered
          "filtered": "10.00",
          "cost_info": {
            "read_cost": "937.93",
            "eval_cost": "101.52",
            "prefix_cost": "1039.45", // read + eval
            "data_read_per_join": "1M" // amount of data read
          },
          "used_columns": [ // columns queried
            "id",
            "key1",
            "key2",
            "key3",
            "key_part1",
            "key_part2",
            "key_part3",
            "common_field"
          ],
          "attached_condition": "((`atguigudb1`.`s1`.`common_field` = 'a') and (`atguigudb1`.`s1`.`key1` is not null))" // query condition
        }
      },
      {
        "table": {
          "table_name": "s2",
          "access_type": "eq_ref",
          "possible_keys": [
            "idx_key2"
          ],
          "key": "idx_key2",
          "used_key_parts": [
            "key2"
          ],
          "key_length": "5",
          "ref": [
            "atguigudb1.s1.key1"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 1015,
          "filtered": "100.00",
          "index_condition": "(cast(`atguigudb1`.`s1`.`key1` as double) = cast(`atguigudb1`.`s2`.`key2` as double))",
          "cost_info": {
            "read_cost": "253.80",
            "eval_cost": "101.52",
            "prefix_cost": "1394.77",
            "data_read_per_join": "1M"
          },
          "used_columns": [
            "id",
            "key1",
            "key2",
            "key3",
            "key_part1",
            "key_part2",
            "key_part3",
            "common_field"
          ]
        }
      }
    ]
  }
}
  • read_cost consists of two parts: the I/O cost, and the CPU cost of examining rows * (1 - filtered) records
  • eval_cost: the cost of examining rows * filtered records
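
The cost figures in the JSON output above add up as follows. This is only a worked check of the numbers shown there, read as "prefix_cost is the running total up to and including this table". The column aliases are made up for this illustration:

```sql
SELECT
    937.93 + 101.52             AS s1_prefix_cost,  -- read_cost + eval_cost of s1 = 1039.45
    1039.45 + 253.80 + 101.52   AS s2_prefix_cost;  -- s1 prefix + read_cost + eval_cost of s2 = 1394.77
```

The last prefix_cost, 1394.77, equals the query_cost reported at the top of the output.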

3 Tree format

The TREE format was introduced in version 8.0.16. It describes how the query is executed in terms of the relationships between its parts and the order in which they run.

EXPLAIN FORMAT=TREE SELECT * FROM s1 INNER JOIN s2 on s1.key1 = s2.key2 WHERE s1.common_field = 'a'
-> Nested loop inner join  (cost=1394.77 rows=1015)
    -> Filter: ((s1.common_field = 'a') and (s1.key1 is not null))  (cost=1039.45 rows=1015)
        -> Table scan on s1  (cost=1039.45 rows=10152)
    -> Single-row index lookup on s2 using idx_key2 (key2=s1.key1), with index condition: (cast(s1.key1 as double) = cast(s2.key2 as double))  (cost=0.25 rows=1)

4 Visual output

Requires MySQL Workbench to be installed.

Use of SHOW WARNINGS

After using EXPLAIN to view a query's execution plan, you can run SHOW WARNINGS to see extended information about that plan, for example:

EXPLAIN SELECT s1.key1, s2.key1 FROM s1 LEFT JOIN s2 on s1.key1 = s2.key1 WHERE s2.common_field IS NOT NULL;

As written, s1 LEFT JOIN s2 would make s1 the driving table and s2 the driven table, but the execution plan shows the opposite. The WHERE condition s2.common_field IS NOT NULL rejects every row in which s2 has no match, so the optimizer converts the left join into an inner join and then chooses the driving table by comparing the cost of each order. SHOW WARNINGS lets you see this rewrite:

mysql> show warnings \G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `atguigudb1`.`s1`.`key1` AS `key1`,`atguigudb1`.`s2`.`key1` AS `key1` from `atguigudb1`.`s1` join `atguigudb1`.`s2` where ((`atguigudb1`.`s1`.`key1` = `atguigudb1`.`s2`.`key1`) and (`atguigudb1`.`s2`.`common_field` is not null))
1 row in set (0.00 sec)

That is hard to read. Simplified, it is:

select s1.key1, s2.key1
from s1 join s2
where s1.key1 = s2.key1 and s2.common_field is not null;
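
The same technique can be applied to the IN subquery from the id section to see how the optimizer rewrote it. This is a minimal sketch. SHOW WARNINGS has to be run in the same session immediately after the EXPLAIN, and the rewritten text depends on the MySQL version, so no output is shown:

```sql
-- Step 1: ask for the plan; the rewritten query is stored as a Note (code 1003).
EXPLAIN SELECT * FROM s1 WHERE key1 IN (SELECT key2 FROM s2 WHERE common_field = 'a');

-- Step 2: read the Note; its Message column contains the statement after optimizer rewrites.
SHOW WARNINGS;
```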


Origin blog.csdn.net/dyuan134/article/details/130242418