[MySQL study notes (12)] query optimizer rule-based optimization and subquery optimization

This article is published by the official account [Developing Pigeon]. Welcome to follow!


One. Query optimizer rule-based optimization

(1) Overview

       The query optimizer optimizes the query statement according to the corresponding rules and converts it into a form that can be executed efficiently. This process is called query rewriting.

(2) Condition simplification

1. Remove unnecessary parentheses

       Remove the extra parentheses in the expression.

2. Constant propagation

       If a column is compared for equality with a constant, and that expression is joined by AND to other expressions involving the same column, the column in those other expressions can be replaced by the constant.

For example, the condition

a = 5 AND b > a

can be rewritten as

a = 5 AND b > 5
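
The equivalence of the two forms can be checked on a tiny data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only to keep the example self-contained), and the table t and its rows are hypothetical. It only shows that both conditions select the same rows; it does not show how MySQL performs the rewrite internally.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(5, 3), (5, 9), (4, 9), (5, 5), (7, 8)],
)

# Original condition: b is compared with column a.
original = conn.execute(
    "SELECT a, b FROM t WHERE a = 5 AND b > a ORDER BY a, b"
).fetchall()

# After constant propagation: column a is replaced by the constant 5.
propagated = conn.execute(
    "SELECT a, b FROM t WHERE a = 5 AND b > 5 ORDER BY a, b"
).fetchall()

print(original)
print(propagated)
```

Both queries return the same rows, which is what allows the optimizer to work with the second, simpler form.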

3. Remove known conditions

       The optimizer simplifies away expressions that are always TRUE or always FALSE.

4. Expression calculation

       Before the query is executed, if the expression contains only constants, its value will be calculated first.

5. Constant table detection

       A table is called a constant table if it contains zero or one record, or if it is queried with an equality match on its primary key or on a unique secondary index; both cases take very little time to query. When the query optimizer analyzes a statement, it first checks for constant tables, replaces every condition that involves such a table with constants, and then analyzes the query cost of the remaining tables.


(3) Outer join elimination

       Why eliminate outer joins? In an outer join the positions of the driving table and the driven table are fixed, whereas for an inner join the optimizer can evaluate the cost of different join orders and choose the cheapest one to execute the query.

       Now let's see how the elimination works. Records that do not satisfy the WHERE clause never appear in the result. So if the WHERE clause requires a column of the driven table to be non-NULL (such a condition is called null-rejecting), then the driving-table records that found no match under the ON clause, whose driven-table columns are filled with NULL, are filtered out of the final result set. In that case the outer join and the inner join return the same result, so the outer join can be converted into an inner join.
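
To make the null-rejection argument concrete, here is a minimal sketch that compares a left join, the same left join with a null-rejecting WHERE condition, and an inner join. It runs on Python's built-in sqlite3 module as a stand-in for MySQL (an assumption), and the tables t1 and t2 with their columns are hypothetical. It demonstrates only that the results coincide, not the optimizer's internal conversion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)

# Hypothetical tables: t1 is the driving table, t2 the driven table.
conn.execute("CREATE TABLE t1 (m1 INTEGER, n1 TEXT)")
conn.execute("CREATE TABLE t2 (m2 INTEGER, n2 TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(2, "b"), (3, "c"), (4, "d")])

# Plain left join: the unmatched t1 row (1, 'a') is kept, padded with NULLs.
left_join = conn.execute(
    "SELECT * FROM t1 LEFT JOIN t2 ON t1.m1 = t2.m2 ORDER BY t1.m1"
).fetchall()

# Left join plus a null-rejecting condition on the driven table.
left_join_rejecting = conn.execute(
    "SELECT * FROM t1 LEFT JOIN t2 ON t1.m1 = t2.m2 "
    "WHERE t2.n2 IS NOT NULL ORDER BY t1.m1"
).fetchall()

# Inner join with the same ON condition.
inner_join = conn.execute(
    "SELECT * FROM t1 INNER JOIN t2 ON t1.m1 = t2.m2 ORDER BY t1.m1"
).fetchall()

print(left_join)
print(left_join_rejecting)
print(inner_join)
```

The NULL-padded row appears only in the first result; the second and third results are identical.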


Two. Subquery optimization

(1) Subquery

1. Overview

       A subquery is a query nested inside another query statement. It can appear in the FROM clause, the WHERE clause, the ON clause, and so on, and the enclosing query is called the outer query. A subquery placed in the FROM clause is called a derived table, because its result is queried as if it were a table.

2. Classification according to the returned result set

(1) Scalar subquery

       A subquery that returns a single value.

(2) Row subquery

       A subquery that returns a single record containing multiple columns.

(3) Column subquery

       A subquery that returns a single column containing multiple records.

(4) Table subquery

       A subquery whose result contains multiple records and multiple columns.


3. Classification according to the relationship with the outer query

(1) Uncorrelated subqueries

       The subquery can run independently to produce its result and does not depend on any value from the outer query.

(2) Correlated subqueries

       The execution of the subquery depends on values from the outer query.

4. The use of subqueries in Boolean expressions

(1) Use with comparison operators

(2) IN/NOT IN
operand [NOT] IN (subquery)

       Tests whether the operand is (or is not) in the set formed by the subquery's result set.

(3) ANY/SOME
operand comparison_operator ANY/SOME (subquery)

       The whole expression is TRUE as long as at least one value in the subquery's result set makes the comparison with the operand TRUE.

(4) ALL
operand comparison_operator ALL (subquery)

       The whole expression is TRUE only if the comparison between the operand and every value in the subquery's result set is TRUE.

(5) EXISTS
[NOT] EXISTS (subquery)

       Only checks whether the subquery's result set contains any records; what the records are does not matter.
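
The four forms above can be modeled in a few lines of Python. This is a simplified sketch: the function names are hypothetical, the subquery result is represented as a plain list of values, and SQL's three-valued logic for NULL is deliberately ignored.

```python
import operator


def in_subquery(operand, rows):
    """operand IN (subquery): is the operand a member of the result set?"""
    return operand in rows


def any_subquery(operand, compare, rows):
    """operand <op> ANY/SOME (subquery): TRUE if at least one comparison holds."""
    return any(compare(operand, value) for value in rows)


def all_subquery(operand, compare, rows):
    """operand <op> ALL (subquery): TRUE only if every comparison holds."""
    return all(compare(operand, value) for value in rows)


def exists_subquery(rows):
    """EXISTS (subquery): TRUE if the result set has at least one record."""
    return len(rows) > 0


rows = [3, 7, 10]  # hypothetical single-column subquery result

print(in_subquery(7, rows))                # 7 IN (...)
print(any_subquery(5, operator.gt, rows))  # 5 > ANY (...)
print(all_subquery(5, operator.gt, rows))  # 5 > ALL (...)
print(exists_subquery(rows))               # EXISTS (...)
```

The NOT variants are simply the negation of the corresponding function in this NULL-free model.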

5. Precautions for Subqueries

       (1) A subquery must be enclosed in parentheses

       (2) A subquery in the SELECT clause must be a scalar subquery

       (3) A single statement may not insert, delete, or update records of a table while also running a subquery against that same table


(2) How subqueries are executed in MySQL

1. Scalar subqueries and row subqueries

(1) Uncorrelated subqueries

       Execute the subquery on its own first, then use its result as a parameter of the outer query and execute the outer query. In effect this is just a sequence of single-table queries.

(2) Correlated subqueries

       First fetch one record from the outer query, take the values the subquery refers to from that record, and execute the subquery. Then check, based on the subquery's result, whether the outer query's condition holds: if it does, the outer record is added to the result set, otherwise it is discarded. Repeat this for each record of the outer query.
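
The two execution patterns can be sketched in plain Python. Everything here is illustrative: the tables s1 and s2, their column names, and the two functions are hypothetical, and each function only imitates the evaluation order described above rather than MySQL's actual executor.

```python
# Hypothetical tables represented as lists of dicts.
s1 = [
    {"key1": "a", "key3": "x"},
    {"key1": "b", "key3": "b"},
    {"key1": "c", "key3": "z"},
]
s2 = [
    {"common_field": "b", "key3": "b"},
    {"common_field": "q", "key3": "x"},
]


def uncorrelated(outer_rows, inner_rows):
    """Models: SELECT * FROM s1 WHERE key1 =
    (SELECT common_field FROM s2 WHERE key3 = 'b' LIMIT 1)"""
    # Step 1: the subquery runs exactly once, independently of s1.
    sub_result = next(
        (r["common_field"] for r in inner_rows if r["key3"] == "b"), None
    )
    # Step 2: the outer query runs with the subquery result as a constant.
    return [r for r in outer_rows if r["key1"] == sub_result]


def correlated(outer_rows, inner_rows):
    """Models: SELECT * FROM s1 WHERE key1 =
    (SELECT common_field FROM s2 WHERE s1.key3 = s2.key3 LIMIT 1)"""
    result = []
    for outer in outer_rows:
        # The subquery is re-executed for every outer record.
        sub_result = next(
            (r["common_field"] for r in inner_rows if r["key3"] == outer["key3"]),
            None,
        )
        # Keep the outer record only if the outer condition holds.
        if sub_result is not None and outer["key1"] == sub_result:
            result.append(outer)
    return result


print(uncorrelated(s1, s2))
print(correlated(s1, s2))
```

The key difference is where the subquery sits: outside the loop for the uncorrelated case, inside it for the correlated case.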

2. IN subquery optimization

(1) Materialized tables

       For an uncorrelated IN subquery, a very large result set causes performance problems: it may not fit in memory, and the outer query cannot use an index effectively with so many IN arguments. So instead of using the result set directly as the parameters of the outer query, MySQL writes it into a temporary table and deduplicates the written records to save space. The temporary table normally uses the in-memory MEMORY storage engine with a hash index. If the result set is particularly large, the temporary table is converted to a disk-based storage engine and its index becomes a B+ tree index. This process of storing the subquery's result set in a temporary table is called materialization, and the temporary table is called a materialized table.

       Once the materialized table exists, the query can be treated as an inner join between the materialized table and the outer table, and the optimizer can pick the cheapest join order by comparing costs.
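
As a rough analogy, a Python set behaves like the in-memory materialized table: it removes duplicates and supports hash lookups. The sketch below is only a model under that assumption; the names materialize and in_with_materialization, the table s1, and the sample values are hypothetical.

```python
def materialize(subquery_rows):
    """Model of materialization: store the subquery result once, deduplicated.

    A Python set stands in for a MEMORY temporary table with a hash index.
    """
    return set(subquery_rows)


def in_with_materialization(outer_rows, column, subquery_rows):
    """Evaluate `column IN (subquery)` by probing the materialized result."""
    materialized = materialize(subquery_rows)
    # Each outer record needs one hash lookup instead of a scan of the result.
    return [row for row in outer_rows if row[column] in materialized]


s1 = [{"key1": "a"}, {"key1": "b"}, {"key1": "c"}]
s2_common_field = ["b", "b", "c", "c", "c", "z"]  # subquery result with duplicates

materialized = materialize(s2_common_field)
matches = in_with_materialization(s1, "key1", s2_common_field)

print(sorted(materialized))
print(matches)
```

Six subquery values shrink to three distinct entries, and each record of s1 is checked with a single lookup.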

(2) Semi-join

       Materialization still carries the cost of creating a temporary table, so MySQL introduces the semi-join, which converts the subquery directly into a join. A semi-join of table s1 with table s2 means that for each record in s1 we only care whether a matching record exists in s2, not how many records match, and the final result set keeps only records from s1.

       There are 5 execution strategies for semi-joins. Semi-join is only an optimization used inside MySQL, not syntax available to users.
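
The difference between an ordinary inner join and semi-join semantics can be shown with two small loops. This is a sketch of the semantics only: stopping at the first match is one simple way to obtain them and is not meant to reproduce any particular MySQL strategy. The tables and function names are hypothetical.

```python
s1 = [{"key1": "a"}, {"key1": "b"}, {"key1": "c"}]
s2 = [{"common_field": "b"}, {"common_field": "b"}, {"common_field": "c"}]


def inner_join(outer_rows, inner_rows):
    """Ordinary inner join: one output row per matching pair."""
    return [
        a for a in outer_rows for b in inner_rows if a["key1"] == b["common_field"]
    ]


def semi_join(outer_rows, inner_rows):
    """Semi-join: each outer row appears at most once, however many matches."""
    result = []
    for a in outer_rows:
        for b in inner_rows:
            if a["key1"] == b["common_field"]:
                result.append(a)
                break  # one match is enough; ignore any further matches
    return result


print(inner_join(s1, s2))
print(semi_join(s1, s2))
```

The inner join returns the "b" record twice because s2 contains two matches, while the semi-join returns it once, which matches the result of the original IN subquery.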


3. [NOT] EXISTS subquery

       If it is an uncorrelated subquery, MySQL executes the subquery first to obtain TRUE or FALSE and rewrites the original query with that result. If it is a correlated subquery, it can only be executed in the same way as the correlated scalar subquery described earlier, although the EXISTS subquery can still use an index to speed up the lookup.

4. Optimization of derived tables

       A subquery placed in the FROM clause is a derived table, and MySQL provides two strategies for executing it. The first is to materialize the derived table: write its result to a temporary table and let that materialized table take part in the query like an ordinary table. MySQL uses delayed materialization here, meaning the derived table is materialized only when the query actually needs it. The second is to merge the derived table into the outer query, rewriting the statement into a form without a derived table: the derived table's tables are moved into the outer FROM clause and its search conditions are merged into the outer WHERE clause.

       MySQL first tries to merge the derived table into the outer query. If that is not possible, it materializes the derived table and then executes the query.
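
The merge strategy can be illustrated by writing the same query in both forms and comparing the results. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption), and the table s1, its columns, and the alias derived_s1 are hypothetical. It shows only that the two forms are equivalent, not which strategy an optimizer actually picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE s1 (key1 TEXT, key2 INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO s1 VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 3), ("c", 4)],
)

# Form with a derived table in the FROM clause.
with_derived = conn.execute(
    "SELECT * FROM (SELECT * FROM s1 WHERE key1 = 'a') AS derived_s1 "
    "WHERE key2 > 1 ORDER BY key2"
).fetchall()

# Merged form: the derived table's source table moves into the outer FROM
# clause and its search condition is combined into the outer WHERE clause.
merged = conn.execute(
    "SELECT * FROM s1 WHERE key1 = 'a' AND key2 > 1 ORDER BY key2"
).fetchall()

print(with_derived)
print(merged)
```

Both statements return the same rows, and the merged form needs no temporary table.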

Origin blog.csdn.net/Mrwxxxx/article/details/113839762