The slow MySQL subquery problem

When you use EXPLAIN to view the execution plan of an SQL statement and "DEPENDENT SUBQUERY" appears in the select_type column, pay attention: you have fallen into the "pit" of slow MySQL subqueries. Let's look at a specific example.

Take the following query:

SELECT gid, COUNT(id) AS count FROM shop_goods g1 WHERE status = 0 AND gid IN (SELECT gid FROM shop_goods g2 WHERE sid IN (1519066,1466114,1466110,1466102,1466071,1453929)) GROUP BY gid;

Running EXPLAIN on it shows the keyword "DEPENDENT SUBQUERY", which means that the first SELECT of the subquery depends on the outer query. For comparison:

SUBQUERY: the first SELECT in a subquery. DEPENDENT SUBQUERY: the first SELECT in a subquery that depends on the outer query.

In other words, the way the subquery reads g2 depends on the outer query over g1. Execution happens in two steps:

In the first step, MySQL builds a large result set t1 from select gid,count(id) from shop_goods where status=0 group by gid. EXPLAIN reports its size as rows=850672.

In the second step, each record in t1 is combined with the subquery SQL to form a new statement: select gid from shop_goods where sid in (15...blabla..29) and gid=%t1.gid%. That means the subquery is executed roughly 850,000 times. Even if both steps use an index, it is no surprise that the query is slow.

As a result, the cost of the subquery is bounded by the number of records in the outer query, so it is better to split the statement into two independent queries and execute them one after the other.
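The split into two independent queries can be sketched in application code. This is only an illustration under assumptions: an in-memory SQLite database stands in for MySQL so the snippet runs anywhere, the sample rows are invented, and the helper name count_by_gid_two_step is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shop_goods ("
    "id INTEGER PRIMARY KEY, gid INTEGER, sid INTEGER, status INTEGER)"
)
conn.executemany(
    "INSERT INTO shop_goods (gid, sid, status) VALUES (?, ?, ?)",
    [
        (10, 1519066, 0),
        (10, 1466114, 0),
        (10, 7, 1),
        (20, 1453929, 0),
        (30, 8, 0),
    ],
)


def count_by_gid_two_step(conn, sids):
    """Run the former subquery once, then feed its gids to the outer query."""
    if not sids:
        return {}
    marks = ",".join("?" * len(sids))
    # Step 1: the inner query, executed exactly once.
    gids = [
        row[0]
        for row in conn.execute(
            f"SELECT DISTINCT gid FROM shop_goods WHERE sid IN ({marks})", sids
        )
    ]
    if not gids:
        return {}  # an empty IN () list is a syntax error, so stop early
    # Step 2: the outer query, filtered by a literal list of gids.
    marks = ",".join("?" * len(gids))
    rows = conn.execute(
        "SELECT gid, COUNT(id) FROM shop_goods "
        f"WHERE status = 0 AND gid IN ({marks}) GROUP BY gid",
        gids,
    )
    return dict(rows)


sids = [1519066, 1466114, 1466110, 1466102, 1466071, 1453929]
result = count_by_gid_two_step(conn, sids)
print(result)
```

Each step runs once, whatever the size of the outer table. If step 1 can return a very large number of gids, the literal IN list becomes unwieldy, and a join is probably the better fit.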

The general optimization strategy for such statements is to split them into two queries. If you do not want to do that, you can instead join against a temporary (derived) table, as in the following optimized SQL:

SELECT g1.gid, COUNT(1) FROM shop_goods g1, (SELECT DISTINCT gid FROM shop_goods WHERE sid IN (1519066,1466114,1466110,1466102,1466071,1453929)) g2 WHERE g1.status = 0 AND g1.gid = g2.gid GROUP BY g1.gid;

The DISTINCT keeps the derived table to one row per gid, so the join counts each g1 row once, just as the IN version does.

Running EXPLAIN again shows a new select_type, "DERIVED". It is used when there is a subquery in the FROM clause: MySQL executes such subqueries recursively, puts the results in a temporary table, and then performs the join.
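One thing worth checking before relying on a derived-table join: if the inner SELECT can return the same gid more than once, the join repeats the matching g1 rows and the counts grow. The sketch below compares the IN form with the join form, with and without DISTINCT in the derived table. It assumes an in-memory SQLite database as a stand-in for MySQL and uses invented rows, and it only compares results, not execution plans.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shop_goods ("
    "id INTEGER PRIMARY KEY, gid INTEGER, sid INTEGER, status INTEGER)"
)
conn.executemany(
    "INSERT INTO shop_goods (gid, sid, status) VALUES (?, ?, ?)",
    [(10, 1519066, 0), (10, 1466114, 0), (20, 1453929, 0), (30, 8, 0)],
)

SIDS = "(1519066,1466114,1466110,1466102,1466071,1453929)"

# The original IN form from the article.
in_query = f"""
    SELECT gid, COUNT(id) FROM shop_goods g1
    WHERE status = 0
      AND gid IN (SELECT gid FROM shop_goods g2 WHERE sid IN {SIDS})
    GROUP BY gid"""


def derived_join(inner_select):
    """Build the join form; inner_select is the SELECT list of the derived table."""
    return f"""
    SELECT g1.gid, COUNT(1) FROM shop_goods g1
    JOIN ({inner_select} FROM shop_goods WHERE sid IN {SIDS}) g2
      ON g1.gid = g2.gid
    WHERE g1.status = 0
    GROUP BY g1.gid"""


expected = dict(conn.execute(in_query))
with_distinct = dict(conn.execute(derived_join("SELECT DISTINCT gid")))
without_distinct = dict(conn.execute(derived_join("SELECT gid")))

print("IN form:         ", expected)
print("join + DISTINCT: ", with_distinct)
print("join, duplicates:", without_distinct)
```

With this sample data, gid 10 appears twice in the derived table, so the join without DISTINCT counts each of its rows twice.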

"High Performance MySQL" has a similar discussion in Section 4.4, "Limitations of the MySQL Query Optimizer", under 4.4.1, "Correlated Subqueries": when MySQL processes a subquery, it rewrites it. We usually expect the subquery to be evaluated first, from the inside out, and its result then used to drive the lookup in the outer query's table.

For example: select * from test where tid in (select fk_tid from sub_test where gid=10); Intuitively, we assume the statement executes in this order: first look up the matching fk_tid values in sub_test by gid (say 2,3,4,5,6), then go to test with tid = 2,3,4,5,6 to fetch the rows.

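But what MySQL actually does (in versions before 5.6, which added semijoin and materialization strategies for IN subqueries) is rewrite the statement as: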

select * from test where exists (select * from sub_test where gid=10 and sub_test.fk_tid=test.tid);

MySQL scans every row in test and, for each row, runs the subquery against sub_test. The subquery is not executed first, so if the test table is large, performance suffers.
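The rewrite can be checked on a toy data set. The sketch below confirms that the IN form and the EXISTS form return the same rows, then emulates the dependent strategy by hand to show that the number of subquery probes equals the number of rows in test. It assumes an in-memory SQLite database as a stand-in for MySQL, with invented table contents, so it illustrates the logic rather than MySQL's actual plan.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table contents are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (tid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sub_test (id INTEGER PRIMARY KEY, fk_tid INTEGER, gid INTEGER);
    INSERT INTO test (tid, name) VALUES
        (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e'),(6,'f'),(7,'g');
    INSERT INTO sub_test (fk_tid, gid) VALUES
        (2,10),(3,10),(4,10),(5,10),(6,10),(7,99);
""")

in_form = (
    "SELECT tid FROM test "
    "WHERE tid IN (SELECT fk_tid FROM sub_test WHERE gid = 10) ORDER BY tid"
)
exists_form = (
    "SELECT tid FROM test WHERE EXISTS ("
    "SELECT * FROM sub_test WHERE gid = 10 AND sub_test.fk_tid = test.tid"
    ") ORDER BY tid"
)

in_rows = [r[0] for r in conn.execute(in_form)]
exists_rows = [r[0] for r in conn.execute(exists_form)]

# Emulate the dependent strategy: scan test, probe sub_test once per row.
probes = 0
matched = []
for (tid,) in conn.execute("SELECT tid FROM test ORDER BY tid").fetchall():
    probes += 1
    hit = conn.execute(
        "SELECT 1 FROM sub_test WHERE gid = 10 AND fk_tid = ? LIMIT 1", (tid,)
    ).fetchone()
    if hit:
        matched.append(tid)

print(in_rows, exists_rows, matched, probes)
```

The probe count follows the size of test, not the size of the subquery result, which is why a large outer table makes the statement slow.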
