Performance Analysis | MySQL slow query analysis

Notes on a MySQL slow query analysis

I recently ran into a slow query problem in MySQL. I investigated it, and this post summarizes the process.

Locating the cause

I first looked at MySQL's slow query log and found one query that took a long time (roughly a second or more) and scanned a large number of rows (more than 100,000, almost the whole table):

SELECT * FROM tgdemand_demand t1
WHERE
(
t1.id IN
(
SELECT t2.demand_id
FROM tgdemand_job t2
WHERE (t2.state = 'working' AND t2.wangwang = 'abc')
)
AND
NOT (t1.state = 'needConfirm')
)
ORDER BY t1.create_date DESC

The query is not complicated. Logically, it first runs a subquery that selects from tgdemand_job the demand_id of every job whose state is 'working' and whose associated user (wangwang) is 'abc', that is, the demand ids of the tasks belonging to that designer. It then looks up that set of ids in the main table tgdemand_demand, keeps the demands whose state is not 'needConfirm', and finally sorts the result.

In theory, once the subquery has produced the ids, the main table is filtered directly by primary key, so it should be fast. I also checked the indexes on the subquery's table, tgdemand_job, and every column used in its conditions was already indexed. So what was going on?

So I ran EXPLAIN (which outputs the execution plan of a SQL statement) on this query to see what MySQL's execution plan looked like. The output was as follows:

[Image: EXPLAIN output for the query above]

The first row is table t1: type is ALL (a full table scan), rows (the estimated number of rows to examine) is 157089, and no index is used. The second row is table t2, which does use an index. This execution order was completely different from what I had expected!

Why does MySQL not execute the subquery first, and instead do a full scan of t1? Look closely at select_type in the second row: its value is DEPENDENT SUBQUERY, meaning that the subquery depends on the outer query. What does that mean?

In fact, MySQL rewrites this kind of subquery. The SQL above is rewritten into the following form:

SELECT * FROM tgdemand_demand t1 WHERE EXISTS (
SELECT * FROM tgdemand_job t2 WHERE t1.id = t2.demand_id AND (t2.state = 'working' AND t2.wangwang = 'abc')
) AND NOT (t1.state = 'needConfirm')
ORDER BY t1.create_date DESC;

This means the statement scans every row of tgdemand_demand and, for each row, passes its id into the subquery against tgdemand_job. The subquery is not executed once up front; it is executed 157089 times (the number of rows in the outer table). Fortunately, the subquery had the necessary indexes, otherwise the result would have been even worse.
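The difference between the two execution orders can be imitated in plain Python. This is only a sketch of the idea, not of MySQL's internals: the rows are made up, and the function names dependent_plan and ids_first_plan are hypothetical. It counts how many times the inner lookup runs under each strategy.

```python
# Toy rows standing in for the two tables (values are made up for illustration).
demands = [{"id": i, "state": "open"} for i in range(1, 1001)]
jobs = [
    {"demand_id": 3, "state": "working", "wangwang": "abc"},
    {"demand_id": 7, "state": "working", "wangwang": "abc"},
    {"demand_id": 9, "state": "done", "wangwang": "abc"},
]


def dependent_plan(demands, jobs):
    """Scan every outer row and re-run the inner lookup for each one."""
    inner_runs = 0
    result = []
    for d in demands:
        inner_runs += 1  # one subquery execution per outer row
        matched = any(
            j["demand_id"] == d["id"]
            and j["state"] == "working"
            and j["wangwang"] == "abc"
            for j in jobs
        )
        if matched and d["state"] != "needConfirm":
            result.append(d["id"])
    return result, inner_runs


def ids_first_plan(demands, jobs):
    """Run the inner query once, then look rows up by primary key."""
    inner_runs = 1
    ids = {
        j["demand_id"]
        for j in jobs
        if j["state"] == "working" and j["wangwang"] == "abc"
    }
    by_id = {d["id"]: d for d in demands}  # stands in for the primary-key index
    result = [
        i for i in sorted(ids)
        if i in by_id and by_id[i]["state"] != "needConfirm"
    ]
    return result, inner_runs


slow_ids, slow_runs = dependent_plan(demands, jobs)
fast_ids, fast_runs = ids_first_plan(demands, jobs)
print(slow_ids, slow_runs)
print(fast_ids, fast_runs)
```

Both strategies return the same ids, but the first runs the inner lookup once per outer row, which is the pattern the DEPENDENT SUBQUERY plan describes.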

This behavior is a real trap, and very counterintuitive. When dealing with slow queries, do not take anything for granted; run EXPLAIN often and see how the database actually executes the statement.

Fixing the problem

Since the subquery gets rewritten, the simplest solution is to avoid the subquery altogether: run the inner query as a separate SQL statement to obtain the ids, then run a second SQL statement with those results to fetch the actual data. Roughly like this (the following statements are not valid SQL, just illustrative):

ids = SELECT t2.demand_id
FROM tgdemand_job t2
WHERE (t2.state = 'working' AND t2.wangwang = 'abc');

SELECT * FROM tgdemand_demand t1
WHERE
(
t1.id IN ids
AND
NOT (t1.state = 'needConfirm')
)
ORDER BY t1.create_date DESC;

No sooner said than done. I found the following code (written in Python):

demand_ids = Job.objects.filter(wangwang=user['wangwang'], state='working').values_list("demand_id", flat=True)

demands = Demand.objects.filter(id__in=demand_ids).exclude(state__in=['needConfirm']).order_by('-create_date')

What! Isn't this exactly what I had in mind? First fetch the demand ids (first line of code), then use that set of ids to run the actual query (second line of code). Why does the SQL that comes out of the ORM look different?

With this question in mind, I did a lot of searching. It turns out that the QuerySet objects produced by Django's built-in ORM are lazy (lazily evaluated): a QuerySet can be passed around freely, and no SQL is actually executed until its results are needed.

For example, in our code the QuerySet Job.objects.filter(wangwang=user['wangwang'], state='working').values_list("demand_id", flat=True) is not actually executed; it is passed, still unevaluated, as a parameter to id__in. Only when the QuerySet Demand.objects.filter(id__in=demand_ids).exclude(state__in=['needConfirm']).order_by('-create_date') is evaluated is any SQL generated and executed, and by then the inner QuerySet has been embedded as a subquery. That is how the SQL statement shown at the beginning was produced.

That being the case, the goal is to force the inner QuerySet to be evaluated early so that we get a concrete result set. According to the documentation, a QuerySet is evaluated when it is iterated over, sliced with a step, passed to len(), or converted with list(). So I changed the code to look like this:

demand_ids = list(Job.objects.filter(wangwang=user['wangwang'], state='working').values_list("demand_id", flat=True))

demands = Demand.objects.filter(id__in=demand_ids).exclude(state__in=['needConfirm']).order_by('-create_date')

After this change, page load time finally returned to normal.
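The effect of wrapping the inner query in list() can be imitated with a toy lazy object. The sketch below is not Django's implementation: LazyQuery and filter_id_in are hypothetical names invented for illustration, and the "database" is just a list that records which statements were executed.

```python
class LazyQuery:
    """Hypothetical stand-in for a lazy query object; not Django's implementation."""

    def __init__(self, sql, rows, log):
        self.sql = sql      # SQL text this object represents
        self._rows = rows   # rows the pretend database would return
        self._log = log     # records every statement that is "executed"

    def __iter__(self):
        # Iterating (which list() does) is what triggers execution.
        self._log.append(self.sql)
        return iter(self._rows)


def filter_id_in(ids, log):
    """Hypothetical helper that builds and "runs" the outer id IN (...) statement."""
    if isinstance(ids, LazyQuery):
        # Unevaluated query: embed its SQL as a subquery, the inner query never runs alone.
        inner = ids.sql
    else:
        # Concrete values: embed them as literals.
        inner = ", ".join(str(i) for i in ids)
    sql = f"SELECT * FROM tgdemand_demand WHERE id IN ({inner})"
    log.append(sql)
    return sql


inner_sql = "SELECT demand_id FROM tgdemand_job WHERE state = 'working'"

# Version 1: pass the lazy object straight through (one statement with a subquery).
log_lazy = []
lazy_sql = filter_id_in(LazyQuery(inner_sql, [3, 7], log_lazy), log_lazy)

# Version 2: force evaluation with list() first (two separate statements).
log_eager = []
eager_sql = filter_id_in(list(LazyQuery(inner_sql, [3, 7], log_eager)), log_eager)

print(log_lazy)
print(log_eager)
```

In the first version a single statement containing a subquery is sent; in the second, the inner statement runs on its own and the outer one receives literal ids. The trade-off is an extra round trip and an id list held in application memory, which is reasonable only while that list stays small.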

In fact, the problem can also be solved by rewriting the SQL:

select * from tgdemand_demand t1, (select t.demand_id from tgdemand_job t where t.state = 'working' and t.wangwang = 'abc') t2
where t1.id=t2.demand_id and not (t1.state = 'needConfirm')
order by t1.create_date DESC

The idea is to get rid of the subquery in the WHERE clause and obtain the data by joining the two tables instead. I will not go into detail here.
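One thing worth checking before adopting the join form is whether it returns the same rows as the IN form. The sketch below does that with Python's built-in sqlite3 module on a handful of made-up rows. SQLite is used here only because it runs in memory; it says nothing about MySQL's execution plan or performance, and the simplified table schemas are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified schemas and made-up rows; the real tables have more columns.
conn.executescript("""
    CREATE TABLE tgdemand_demand (id INTEGER PRIMARY KEY, state TEXT, create_date TEXT);
    CREATE TABLE tgdemand_job (id INTEGER PRIMARY KEY, demand_id INTEGER, state TEXT, wangwang TEXT);
    INSERT INTO tgdemand_demand VALUES
        (1, 'open', '2019-01-01'),
        (2, 'needConfirm', '2019-01-02'),
        (3, 'open', '2019-01-03');
    INSERT INTO tgdemand_job VALUES
        (10, 1, 'working', 'abc'),
        (11, 2, 'working', 'abc'),
        (12, 3, 'working', 'abc'),
        (13, 3, 'working', 'abc'),
        (14, 1, 'done', 'xyz');
""")

in_sql = """
    SELECT t1.id FROM tgdemand_demand t1
    WHERE t1.id IN (
        SELECT t2.demand_id FROM tgdemand_job t2
        WHERE t2.state = 'working' AND t2.wangwang = 'abc'
    ) AND NOT (t1.state = 'needConfirm')
    ORDER BY t1.create_date DESC
"""

# {distinct} is filled with either "" or "DISTINCT" below.
join_sql = """
    SELECT t1.id FROM tgdemand_demand t1,
        (SELECT {distinct} t.demand_id FROM tgdemand_job t
         WHERE t.state = 'working' AND t.wangwang = 'abc') t2
    WHERE t1.id = t2.demand_id AND NOT (t1.state = 'needConfirm')
    ORDER BY t1.create_date DESC
"""

in_ids = [r[0] for r in conn.execute(in_sql)]
join_ids = [r[0] for r in conn.execute(join_sql.format(distinct=""))]
join_distinct_ids = [r[0] for r in conn.execute(join_sql.format(distinct="DISTINCT"))]

print(in_ids)
print(join_ids)
print(join_distinct_ids)
```

With this data, demand 3 has two matching jobs, so the plain join returns it twice while the IN form returns it once. If several jobs can point at the same demand, adding DISTINCT to the derived table (or an equivalent de-duplication) keeps the two forms equivalent.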

Reflections

A framework improves productivity only if you understand well enough how it works underneath. Otherwise the application is very likely to expose hidden problems at some point, possibly fatal ones (problems that may simply go unnoticed while the system is small). Keeping an application robust is a hard problem, and there is a lot worth exploring.

Origin www.cnblogs.com/wyf0518/p/11456864.html