How the MySQL query optimizer optimizes subqueries

The following SQL statements contain subqueries:

mysql> select * from t1 where a in (select a from t2);
mysql> select * from (select * from t1) as t;

Classifying subqueries by the result they return

1, scalar subqueries

A subquery that returns only a single value is called a scalar subquery. For example:

select * from t1 where a in (select max(a) from t2);

2, row subqueries

The subquery returns a single record, and that record contains multiple columns. For example:

select * from t1 where (a, b) = (select a, b from t2 limit 1);

3, column subqueries

The subquery returns a single column of data that may contain multiple records. For example:

select * from t1 where a in (select a from t2);

4, table subqueries

The subquery result contains both multiple records and multiple columns. For example:

select * from t1 where (a, b) in (select a,b from t2);

Classifying subqueries by their relationship to the outer query

1, correlated subqueries

If executing the subquery depends on a value from the outer query, the subquery is called a correlated subquery. For example:

select * from t1 where a in (select a from t2 where t1.a = t2.a);

2, uncorrelated subqueries

If the subquery can be run on its own to produce a result, without depending on any value from the outer query, it is called an uncorrelated subquery. All the subqueries shown in the previous section are uncorrelated.

How MySQL executes subqueries

1, uncorrelated scalar subqueries or row subqueries

For example: select * from t1 where a = (select a from t2 limit 1);

The execution steps are:

1) Execute the subquery select a from t2 limit 1 on its own.

2) Use the result of the subquery from the previous step as a parameter of the outer query, then execute the outer query select * from t1 where a = ...;

2, correlated scalar subqueries or row subqueries

For example: select * from t1 where b = (select b from t2 where t1.a = t2.a limit 1);

The execution steps are:

1) Fetch a record from the outer query; in this example, fetch a record from table t1.

2) From that record, find the values the subquery refers to; in this example, take the value of column t1.a from the t1 record, and then execute the subquery.

3) Use the result of the subquery to check whether the outer query's WHERE condition holds. If it does, add the outer record to the result set; otherwise discard it.

4) Go back to the first step and fetch the next record from the outer query, and so on.
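The four steps above can be sketched in plain Python. This is only an illustration of the row-by-row evaluation, not MySQL's implementation; the lists t1 and t2, their contents, and the helper scalar_subquery are made up for the example.

```python
# Toy rows standing in for tables t1 and t2 (made-up data).
t1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
t2 = [{"a": 1, "b": 10}, {"a": 2, "b": 99}]


def scalar_subquery(outer_a):
    """Hypothetical helper: select b from t2 where t2.a = outer_a limit 1."""
    for row in t2:
        if row["a"] == outer_a:
            return row["b"]
    return None  # an empty subquery result behaves like NULL


result = []
for outer in t1:                             # steps 1 and 4: fetch the next t1 record
    sub_value = scalar_subquery(outer["a"])  # step 2: run the subquery with t1.a
    # step 3: test the outer WHERE condition b = (subquery)
    if sub_value is not None and outer["b"] == sub_value:
        result.append(outer)

print(result)
```

The subquery runs once per outer record, which is why a correlated subquery over a large outer table can be expensive.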

3, IN subquery optimization

MySQL optimizes IN subqueries.

For example: select * from t1 where a in (select a from t2);

For an uncorrelated IN subquery, if the subquery's result set contains few records, treating the subquery and the outer query as two separate single-table queries is still quite efficient. But if executing the subquery on its own produces a very large result set, the following problems arise:

• The result set is too large and may not fit in memory.

• For the outer query, a very large subquery result set means an IN clause with a very large number of parameters, which leads to:

    • The index cannot be used effectively, so the outer query can only do a full table scan.

    • During that full table scan, because the IN clause has so many parameters, checking whether each record matches the parameters of the IN clause takes too long.

For this reason, MySQL does not pass the result set of an uncorrelated subquery directly to the outer query as parameters. Instead, it writes the result set to a temporary table. Writing the temporary table works like this:

1) The columns of the temporary table are the columns of the subquery's result set.

2) Records written to the temporary table are deduplicated. An IN expression only tests whether an operand is in a set, and duplicate values in the set do not affect the outcome, so deduplicating the result set while writing it makes the temporary table smaller. The temporary table is also a table, so creating a primary key or unique index over all of its columns is enough to deduplicate it.

3) In general the subquery's result set is not particularly large, so MySQL creates an in-memory temporary table using the MEMORY storage engine and builds a hash index on it. Since IN is in essence a test of whether an operand is in a set, a hash index over the set's data makes the match very fast.

4) If the subquery's result set is very large and exceeds the system variable tmp_table_size or max_heap_table_size, the temporary table switches to a disk-based storage engine to hold the records, and the index type changes accordingly to a B+ tree index.

Saving the records of the subquery's result set into a temporary table is called materialization (Materialize), and the temporary table that holds the result set is called a materialized table. Because the records in a materialized table are indexed (a hash index for a memory-based materialized table, a B+ tree index for a disk-based one), using the index to check whether an operand is in the subquery's result set is very fast, which improves the performance of the subquery statement.
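As a rough analogy for the in-memory case, a Python set can stand in for the deduplicated, hash-indexed materialized table. The column values below are made up, and the sketch says nothing about how MySQL actually lays out the temporary table.

```python
# Made-up column values: t1.a for the outer query, t2.a for the subquery.
t1_a = [1, 2, 2, 5, 7]
t2_a = [2, 2, 2, 7, 9, 9]

# "Materialize" the subquery result: a set deduplicates the values and
# gives hash-based lookups, like a MEMORY table with a hash index.
materialized = set(t2_a)

# Evaluate "a in (select a from t2)" by probing the materialized set
# once for each outer record.
result = [a for a in t1_a if a in materialized]

print(sorted(materialized), result)
```

Six subquery rows shrink to three distinct values, and each probe is a hash lookup rather than a comparison against a long parameter list.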

Take the SQL above again:

mysql> select * from t1 where a in (select a from t2);

When this query is executed with materialization, assume the materialized table for the subquery is named materialized_table and the column holding the subquery's result set is named m_val. The query can then be viewed from two angles:

• From the point of view of table t1, the query means: for each record in t1, if the value of its column a exists in the subquery's materialized table, add that record to the final result set.

• From the point of view of the materialized table, the query means: for each value in the materialized table, if records whose column a equals that value can be found in t1, add those records to the final result set.

In other words, the query above is equivalent to an inner join between table t1 and the subquery's materialized table materialized_table:

select * from t1 inner join materialized_table on t1.a = m_val;

Once the query has been turned into an inner join, the query optimizer can estimate the cost of each join order and choose the cheapest one to execute.

Materializing the subquery first carries the cost of creating a temporary table, but turning the subquery into a join can still be more efficient. So can the subquery be converted into a join directly, without materializing it?

Compare the following two SQL statements:

select * from t1 where a in (select a from t2);
select t1.* from t1 inner join t2 on t1.a = t2.a;

The results of the two statements are very similar, but the result set of the second statement is not deduplicated, so an IN subquery and a join between the two tables are not exactly equivalent. Converting the subquery into a join does let the optimizer do its job fully, however, so MySQL introduced a new concept: the semi-join. A semi-join of table t1 with table t2 means: for a record of t1, we only care whether a matching record exists in t2, not how many records match, and the final result set keeps only records from t1. Semi-join is only a way MySQL executes subqueries internally; MySQL does not provide a user-facing semi-join syntax.
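The difference between the two statements is easy to reproduce on a tiny data set. The sketch below uses Python's built-in sqlite3 module rather than MySQL, purely as a convenient way to run the two queries; the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (a integer);
    create table t2 (a integer);
    insert into t1 values (1), (2);
    insert into t2 values (1), (1), (3);  -- the value 1 appears twice in t2
""")

# IN subquery: each t1 record appears at most once.
in_rows = conn.execute(
    "select a from t1 where a in (select a from t2) order by a"
).fetchall()

# Plain inner join: the t1 record with a = 1 matches two t2 records.
join_rows = conn.execute(
    "select t1.a from t1 inner join t2 on t1.a = t2.a order by t1.a"
).fetchall()

print(in_rows, join_rows)
```

The IN form returns the matching t1 record once, while the inner join returns it once per matching t2 record. Bridging that gap is the job of a semi-join.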

So how is a semi-join implemented?

(1) Table pullout (pulling the subquery's table out)

When the subquery's select list contains only primary key or unique index columns, the table in the subquery can be pulled directly into the FROM clause of the outer query, and the subquery's search conditions merged into the outer query's search conditions.
For example: select * from t1 where a in (select a from t2 where t2.b = 1); -- t2.a is the primary key

We can pull table t2 directly into the FROM clause of the outer query and merge the subquery's search conditions into those of the outer query. After the pullout the query looks like this:

select t1.* from t1 inner join t2 on t1.a = t2.a where t2.b = 1; -- t2.a is the primary key

(2) DuplicateWeedout execution strategy (eliminating duplicate values)

Consider this query:

select * from t1 where a in (select e from t2 where t2.b = 1); -- e is just an ordinary column

After conversion to a semi-join, one record of t1 may match multiple records of t2, so that record could be added to the final result set multiple times. To eliminate the duplicates we can create a temporary table, for example one that looks like this:

CREATE TABLE tmp (
    id INT PRIMARY KEY  -- primary key value of the t1 record (INT is assumed here)
);

While the join is being executed, each time a record of t1 is about to be added to the result set, its primary key value is first inserted into this temporary table. If the insert succeeds, the t1 record has not been added to the final result set before, so it is added now. If the insert fails, the t1 record is already in the final result set and is simply discarded. This way of using a temporary table to eliminate duplicate values from a semi-join result set is called DuplicateWeedout.
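A minimal Python sketch of the idea follows. It assumes t1 has a primary key column named id (as in the tmp table above) and uses a set named seen_ids in place of the temporary table; the data and names are invented for illustration.

```python
# Made-up rows; t1.id plays the role of the primary key stored in tmp.
t1 = [{"id": 1, "a": 10}, {"id": 2, "a": 20}, {"id": 3, "a": 30}]
t2 = [{"e": 10, "b": 1}, {"e": 10, "b": 1}, {"e": 30, "b": 1}, {"e": 20, "b": 0}]

seen_ids = set()  # stands in for the temporary table with a primary key on id
result = []

for inner in t2:
    if inner["b"] != 1:                  # subquery condition t2.b = 1
        continue
    for outer in t1:
        if outer["a"] != inner["e"]:     # join condition t1.a = t2.e
            continue
        if outer["id"] in seen_ids:      # the "insert" would fail: duplicate
            continue
        seen_ids.add(outer["id"])        # the "insert" succeeds: first time seen
        result.append(outer)

print([row["id"] for row in result])
```

The t1 record with id 1 matches two t2 records but reaches the result only once.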

(3) FirstMatch execution strategy (first match)

FirstMatch is the most primitive way to execute a semi-join, and it is the approach we started with: take a record from the outer query, then look in the subquery's table for records that satisfy the matching condition. If one is found, put the outer record into the final result set and stop looking for further matches; if none is found, discard the outer record. Then take the next record from the outer query and repeat the process.
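The early exit is the whole point of the strategy, and a short Python sketch makes it visible. The function first_match, the probe counter, and the data are hypothetical; the matching condition mirrors the earlier example query (t1.a = t2.e with t2.b = 1).

```python
def first_match(outer_rows, inner_rows):
    """Return the outer rows that have at least one match, plus the number of
    inner rows examined (illustrative only)."""
    result = []
    probes = 0
    for outer in outer_rows:
        for inner in inner_rows:
            probes += 1
            if inner["b"] == 1 and inner["e"] == outer["a"]:
                result.append(outer)
                break  # first match found: stop scanning for this outer row
    return result, probes


t1 = [{"a": 10}, {"a": 20}]
t2 = [{"e": 10, "b": 1}, {"e": 10, "b": 1}, {"e": 20, "b": 0}]

result, probes = first_match(t1, t2)
print(result, probes)
```

For the first outer row the scan stops after one probe even though two t2 rows match, so no duplicate is ever produced and no temporary table is needed.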

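(4) LooseScan (loose index scan)

The subquery's table is scanned through a non-unique index. A non-unique index can contain the same value many times, but equal values sit next to each other in the index, so the scan can take only the first entry of each group of equal values and thereby deduplicate the subquery's result.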
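The following Python sketch mimics that scan, assuming the index is available as a sorted list of key values. The list t2_index_on_a and the rows in t1 are made up, and itertools.groupby stands in for "skip to the next distinct key".

```python
from itertools import groupby

# A non-unique index on t2.a, viewed as a sorted list of key values (made up).
t2_index_on_a = [1, 1, 1, 2, 5, 5]
t1 = [{"a": 1}, {"a": 3}, {"a": 5}]

# Take one entry per group of equal keys; groupby relies on the sorted order.
distinct_keys = [key for key, _group in groupby(t2_index_on_a)]

# Probe t1 once per distinct key, so no t1 row can be produced twice.
result = [row for key in distinct_keys for row in t1 if row["a"] == key]

print(distinct_keys, result)
```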

Semi-join also applies to some statements with a correlated IN subquery, for example this query:

select * from t1 where a in (select b from t2 where t1.b = t2.b);

It can be converted to a semi-join (written here in pseudo-syntax, since MySQL has no user-facing semi-join syntax):

select t1.* from t1 semi join t2 on t1.a = t2.b and t1.b = t2.b;

A subquery cannot be converted to a semi-join in the following situations:

  • The IN subquery is combined with other search conditions in the outer query's WHERE clause using OR

  • NOT IN is used instead of IN

  • The subquery contains GROUP BY, HAVING, or aggregate functions

  • The subquery contains UNION

For subqueries that cannot be converted to a semi-join, there are other ways to optimize:

  • An uncorrelated subquery can be materialized first and then take part in the query, such as the following SQL that uses NOT IN:

select * from t1 where a not in (select a from t2 where t2.a = 1);

Note that once the subquery is materialized here, it cannot be converted into a join with the outer query's table. The only option is to scan table t1 first and then, for each record of t1, check whether the record's value of a is absent from the materialized table.

  • Whether the subquery is correlated or uncorrelated, MySQL can try to convert the IN subquery into an EXISTS subquery. In fact, any IN subquery can be converted to an EXISTS subquery. The general form is:

outer_expr IN (SELECT inner_expr FROM … WHERE subquery_where)

Can be converted to:

EXISTS (SELECT inner_expr FROM … WHERE subquery_where AND outer_expr=inner_expr)

The benefit of this conversion is that a query that could not use an index before the conversion may be able to use one afterwards, for example:

select * from t1 where a in (select a from t2 where t2.e = t1.e);

The subquery in this SQL cannot use an index. After the conversion it becomes:

select * from t1 where exists (select 1 from t2 where t2.e = t1.e and t1.a = t2.a)

After the conversion, the index on column a of table t2 can be used.
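That the two forms return the same rows can be checked on a small data set. The sketch below runs both statements through Python's sqlite3 module instead of MySQL, with made-up rows that contain no NULL values, so it only illustrates the rewrite and says nothing about index usage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (a integer, e integer);
    create table t2 (a integer, e integer);
    insert into t1 values (1, 1), (2, 2), (3, 3);
    insert into t2 values (1, 1), (2, 9), (3, 3), (3, 3);
""")

# Original form: correlated IN subquery.
in_form = conn.execute(
    "select a, e from t1 where a in (select a from t2 where t2.e = t1.e) order by a"
).fetchall()

# Rewritten form: the IN comparison is pushed into the subquery's WHERE clause.
exists_form = conn.execute(
    "select a, e from t1 where exists "
    "(select 1 from t2 where t2.e = t1.e and t1.a = t2.a) order by a"
).fetchall()

print(in_form, exists_form)
```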

So if an IN subquery does not meet the conditions for conversion to a semi-join, and it either cannot be materialized or materializing it would cost too much, it is converted to an EXISTS query.

Optimizing derived tables

select * from (select a, b from t1) as t;

In the SQL above the subquery is placed after FROM. The result of the subquery acts as a derived table; the table is named t and has two columns, a and b.

There are two ways to execute a query with a derived table:

(1) Materialize the derived table

We can write the derived table's result set into a temporary table and then treat this materialized table like an ordinary table in the query. When materializing a derived table, MySQL uses a strategy called delayed materialization: it materializes the derived table only when the query actually needs to use it, rather than materializing it before query execution has started. For example:

select * from (select * from t1 where a = 1) as derived1 inner join t2 on derived1.a = t2.a where t2.a =10;

If this query is executed by materializing the derived table, MySQL first looks in table t2 for records that satisfy t2.a = 10. If none are found, no t2 records take part in the join, so the result set of the whole query is empty and there is no need to materialize the derived table at all.

(2) Merge the derived table with the outer table, that is, rewrite the query into a form without a derived table

For example, the following sql:

select * from (select * from t1 where a = 1) as t;

is equivalent to the following SQL:

select * from t1 where a = 1;

A slightly more complicated SQL:

select * from (select * from t1 where a = 1) as t inner join t2 on t.a = t2.a where t2.b = 1;

We can merge the derived table with the outer query's table and move the derived table's search condition into the outer query's search conditions, as follows:

select * from t1 inner join t2 on t1.a = t2.a where t1.a = 1 and t2.b = 1;
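As a quick sanity check that the rewrite preserves the result, both forms can be run on a toy data set. The sketch uses Python's sqlite3 module rather than MySQL, and the rows are made up; the select list names the columns explicitly so the two results are directly comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (a integer, b integer);
    create table t2 (a integer, b integer);
    insert into t1 values (1, 10), (2, 20);
    insert into t2 values (1, 1), (1, 2), (2, 1);
""")

# Form with a derived table.
derived_form = conn.execute(
    "select t.a, t.b, t2.a, t2.b from (select * from t1 where a = 1) as t "
    "inner join t2 on t.a = t2.a where t2.b = 1"
).fetchall()

# Merged form: the derived table's condition moves into the outer WHERE clause.
merged_form = conn.execute(
    "select t1.a, t1.b, t2.a, t2.b from t1 "
    "inner join t2 on t1.a = t2.a where t1.a = 1 and t2.b = 1"
).fetchall()

print(derived_form, merged_form)
```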

Merging the outer query with the derived table in this way eliminates the derived table, which means we no longer pay the cost of creating and accessing a temporary table. But not every query with a derived table can be merged with the outer query. A derived table that contains any of the following cannot be merged:

• Aggregate functions such as MAX(), MIN(), SUM()
• DISTINCT
• GROUP BY
• HAVING
• LIMIT
• UNION or UNION ALL
• Another subquery in the SELECT clause of the derived table's subquery

So when MySQL executes a query with a derived table, it first tries to merge the derived table with the outer query; if that is not possible, it materializes the derived table and then executes the query.

Origin www.cnblogs.com/tongxuping/p/12330122.html