I. The problem
This started with a request to compute the no-omission rate of reports. Two query results are involved: the items that have already been added to a report, and the items that should be added to it. The task is to report the no-omission rate.
What does "no omission" mean? It means that every item that should be added to a report has been added.
The no-omission rate is the ratio of the number of complete reports (reports with no omission) to the total number of reports.
Here are two example reports: one with every required item added, and one with an omission.
First, query the first result set: the items that should be added to each report.
SELECT
r.id AS report_id, m.project_id AS expected_project
FROM
report r
INNER JOIN application a ON r.app_id=a.id
INNER JOIN application_sample s ON a.id=s.app_id
INNER JOIN application_sample_item si ON s.id=si.sample_id
INNER JOIN set_project_mapping m ON si.set_id=m.set_id
WHERE r.id IN ('44930','44927')
ORDER BY r.id,m.project_id;
Then, query the second result set: the items that have already been added to each report.
SELECT r.id AS report_id, i.project_id AS added_project
FROM report r
INNER JOIN report_item i ON r.id=i.report_id
WHERE r.id IN ('44930','44927');
These are the two result sets we need to compare. Report 44927 is complete. Report 44930 has as many items as it should, but it contains item 758, which is not required, and lacks item 112, so it has an omission.
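As a quick check of the arithmetic, suppose these two reports were the whole population: one of the two is complete, so the rate is 1/2. The sketch below uses literal numbers for illustration only, formatted with the same CONCAT and FORMAT combination that the final queries in this article use.

```sql
-- 1 complete report out of 2 in total (illustrative literals, not read from any table).
SELECT CONCAT(FORMAT(1 / 2 * 100, 2), '%') AS no_omission_rate;
-- returns '50.00%'
```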
II. The solution
Viewed abstractly, this is a subset test. We can traverse the added items and the items that should be added separately: if every item that should be added can be matched among the added items, then the items that should be added are a subset of the added items, and nothing is missing.
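A minimal sketch of that item-by-item matching on toy data may help. The tables demo_expected and demo_added are hypothetical stand-ins invented for this example, not the article's schema; report 2 mirrors the 44930 case, with the right number of items but one wrong item.

```sql
-- Hypothetical toy tables for illustration only.
DROP TABLE IF EXISTS demo_expected, demo_added;
CREATE TABLE demo_expected (report_id INT, project_id INT);
CREATE TABLE demo_added    (report_id INT, project_id INT);

INSERT INTO demo_expected VALUES (1, 10), (1, 11), (2, 10), (2, 12);
INSERT INTO demo_added    VALUES (1, 10), (1, 11), (2, 10), (2, 99);

-- Match every expected item against the added items of the same report.
-- An expected item with no match leaves a.project_id NULL.
SELECT e.report_id,
       SUM(a.project_id IS NULL) AS missing_items
FROM demo_expected e
LEFT JOIN demo_added a
       ON a.report_id  = e.report_id
      AND a.project_id = e.project_id
GROUP BY e.report_id
ORDER BY e.report_id;
-- report 1 -> 0 missing items (complete)
-- report 2 -> 1 missing item (12 was never added)
```

A report is complete exactly when missing_items is 0, which is the subset condition stated above.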
Traversing and comparing does solve the problem, but in SQL a cross join (Cartesian product) often means heavy overhead and slow queries. Is there a way to avoid it?
Option 1:
This option relies on the functions FIND_IN_SET and GROUP_CONCAT, so let's look at both of them first.
· FIND_IN_SET(str,strlist)
str: the string to search for
strlist: a string of values separated by ASCII commas (","), such as '1,2,6,8,10,22'
FIND_IN_SET returns the 1-based position of str within strlist, or 0 if str is not in the list.
· GROUP_CONCAT([DISTINCT] expr [ORDER BY expr ASC|DESC] [SEPARATOR 'sep'])
GROUP_CONCAT concatenates the values of one column across the rows of a group into a single string. The default separator is an ASCII comma (",").
However, the result of GROUP_CONCAT is limited by group_concat_max_len, which defaults to 1024.
If the concatenated string is longer than that, it is truncated, so the limit has to be raised:
SET GLOBAL group_concat_max_len=102400;
SET SESSION group_concat_max_len=102400;
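The behaviour of both functions can be checked on literal values and a small table. In the sketch below, demo_item is a hypothetical table made up for this example.

```sql
-- FIND_IN_SET returns the 1-based position of a whole element, or 0 when it is absent.
SELECT FIND_IN_SET('6', '1,2,6,8,10,22');  -- 3
SELECT FIND_IN_SET('7', '1,2,6,8,10,22');  -- 0
SELECT FIND_IN_SET('1', '10,21');          -- 0: elements are compared whole, not as substrings

-- Hypothetical toy table for GROUP_CONCAT.
DROP TABLE IF EXISTS demo_item;
CREATE TABLE demo_item (report_id INT, project_id INT);
INSERT INTO demo_item VALUES (1, 22), (1, 6), (1, 8), (2, 10);

SELECT report_id,
       GROUP_CONCAT(project_id ORDER BY project_id SEPARATOR ',') AS project_list
FROM demo_item
GROUP BY report_id
ORDER BY report_id;
-- 1 -> '6,8,22'
-- 2 -> '10'

-- Inspect the limit that applies to the current session.
SELECT @@SESSION.group_concat_max_len;
```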
As the descriptions above show, FIND_IN_SET and GROUP_CONCAT both use an ASCII comma as the separator.
We can therefore use GROUP_CONCAT to join a report's added items into one string, and then use FIND_IN_SET to check, one by one, whether each item that should be added appears in that string.
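Here is a small sketch of that combination on toy data. The tables demo_expected and demo_added are hypothetical and exist only for this example; a position of 0 marks an expected item that was never added.

```sql
-- Hypothetical toy tables for illustration only.
DROP TABLE IF EXISTS demo_expected, demo_added;
CREATE TABLE demo_expected (report_id INT, project_id INT);
CREATE TABLE demo_added    (report_id INT, project_id INT);

INSERT INTO demo_expected VALUES (1, 10), (1, 11), (2, 10), (2, 12);
INSERT INTO demo_added    VALUES (1, 10), (1, 11), (2, 10), (2, 99);

-- Build one comma-separated list of added items per report,
-- then look up each expected item in the list of its own report.
SELECT e.report_id,
       e.project_id,
       FIND_IN_SET(e.project_id, l.added_list) AS position_in_list
FROM demo_expected e
INNER JOIN (
    SELECT report_id,
           GROUP_CONCAT(project_id ORDER BY project_id) AS added_list
    FROM demo_added
    GROUP BY report_id
) l ON l.report_id = e.report_id
ORDER BY e.report_id, e.project_id;
-- 1, 10 -> 1
-- 1, 11 -> 2
-- 2, 10 -> 1
-- 2, 12 -> 0   (12 is not in '10,99')
```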
1. Modify the second query from the problem description so that GROUP_CONCAT joins the added items into one string
SELECT r.id, GROUP_CONCAT(i.project_id ORDER BY i.project_id) AS added_project_list
FROM report r
LEFT JOIN report_item i ON r.id=i.report_id
WHERE r.id IN ('44930','44927')
GROUP BY r.id;
2. Use FIND_IN_SET to check, one by one, whether each item that should be added appears in that string
SELECT Q.id, FIND_IN_SET(W.expected_project, Q.added_project_list) AS found_position
FROM
(
    -- items already added to each report
    SELECT r.id, GROUP_CONCAT(i.project_id ORDER BY i.project_id) AS added_project_list
    FROM report r
    LEFT JOIN report_item i ON r.id=i.report_id
    WHERE r.id IN ('44930','44927')
    GROUP BY r.id
) Q,
(
    -- items that should be added to each report
    SELECT r.id, s.app_id, m.project_id AS expected_project
    FROM report r
    INNER JOIN application a ON r.app_id=a.id
    INNER JOIN application_sample s ON a.id=s.app_id
    INNER JOIN application_sample_item si ON s.id=si.sample_id
    INNER JOIN set_project_mapping m ON si.set_id=m.set_id
    WHERE r.id IN ('44930','44927')
) W
WHERE Q.id=W.id;
3. Filter out the reports with omissions, keeping only the complete ones
SELECT Q.id,
       COUNT(*) AS expected_count,
       SUM(CASE WHEN FIND_IN_SET(W.expected_project, Q.added_project_list) > 0 THEN 1 ELSE 0 END) AS found_count
FROM
(
    -- items already added to each report
    SELECT r.id, GROUP_CONCAT(i.project_id ORDER BY i.project_id) AS added_project_list
    FROM report r
    LEFT JOIN report_item i ON r.id=i.report_id
    WHERE r.id IN ('44930','44927')
    GROUP BY r.id
) Q,
(
    -- items that should be added to each report
    SELECT r.id, s.app_id, m.project_id AS expected_project
    FROM report r
    INNER JOIN application a ON r.app_id=a.id
    INNER JOIN application_sample s ON a.id=s.app_id
    INNER JOIN application_sample_item si ON s.id=si.sample_id
    INNER JOIN set_project_mapping m ON si.set_id=m.set_id
    WHERE r.id IN ('44930','44927')
) W
WHERE Q.id=W.id
GROUP BY Q.id
-- a report is complete when every expected item was found
HAVING expected_count = found_count;
4. The final goal is the no-omission rate
SELECT COUNT(X.id) AS complete_reports, Y.total AS total_reports, CONCAT(FORMAT(COUNT(X.id)/Y.total*100,2),'%') AS no_omission_rate
FROM
(
    -- complete reports (the query from step 3)
    SELECT Q.id,
           COUNT(*) AS expected_count,
           SUM(CASE WHEN FIND_IN_SET(W.expected_project, Q.added_project_list) > 0 THEN 1 ELSE 0 END) AS found_count
    FROM
    (
        -- items already added to each report
        SELECT r.id, GROUP_CONCAT(i.project_id ORDER BY i.project_id) AS added_project_list
        FROM report r
        LEFT JOIN report_item i ON r.id=i.report_id
        WHERE r.id IN ('44930','44927')
        GROUP BY r.id
    ) Q,
    (
        -- items that should be added to each report
        SELECT r.id, s.app_id, m.project_id AS expected_project
        FROM report r
        INNER JOIN application a ON r.app_id=a.id
        INNER JOIN application_sample s ON a.id=s.app_id
        INNER JOIN application_sample_item si ON s.id=si.sample_id
        INNER JOIN set_project_mapping m ON si.set_id=m.set_id
        WHERE r.id IN ('44930','44927')
    ) W
    WHERE Q.id=W.id
    GROUP BY Q.id
    HAVING expected_count = found_count
) X
RIGHT JOIN
(
    -- total number of reports
    SELECT COUNT(DISTINCT r.id) AS total
    FROM report r
    WHERE r.id IN ('44930','44927')
) Y ON TRUE
GROUP BY Y.total;
Option 2:
Option 1 avoids an explicit row-by-row traversal, but in essence it still compares items one by one. Is there a way to avoid the comparison altogether?
Yes. We can decide whether one set fully contains the other from counts alone.
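The counting idea can be sketched on toy data before applying it to the real tables. The tables demo_expected and demo_added are hypothetical, and the sketch assumes that neither table contains the same (report, item) pair twice; otherwise a pair could be counted as an overlap without appearing on both sides.

```sql
-- Hypothetical toy tables for illustration only.
DROP TABLE IF EXISTS demo_expected, demo_added;
CREATE TABLE demo_expected (report_id INT, project_id INT);
CREATE TABLE demo_added    (report_id INT, project_id INT);

INSERT INTO demo_expected VALUES (1, 10), (1, 11), (2, 10), (2, 12);
INSERT INTO demo_added    VALUES (1, 10), (1, 11), (2, 10), (2, 99);

SELECT e.report_id,
       e.expected_count,
       COALESCE(o.overlap_count, 0) AS overlap_count,
       COALESCE(o.overlap_count, 0) = e.expected_count AS is_complete
FROM (
    -- how many items each report should contain
    SELECT report_id, COUNT(*) AS expected_count
    FROM demo_expected
    GROUP BY report_id
) e
LEFT JOIN (
    -- a pair present on both sides appears twice after UNION ALL
    SELECT report_id, COUNT(*) AS overlap_count
    FROM (
        SELECT report_id, project_id
        FROM (
            SELECT report_id, project_id FROM demo_expected
            UNION ALL
            SELECT report_id, project_id FROM demo_added
        ) u
        GROUP BY report_id, project_id
        HAVING COUNT(*) > 1
    ) both_sides
    GROUP BY report_id
) o ON o.report_id = e.report_id
ORDER BY e.report_id;
-- 1 -> expected 2, overlap 2, is_complete 1
-- 2 -> expected 2, overlap 1, is_complete 0
```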
1. Use UNION ALL to combine the items that should be added with the added items, without removing duplicates
(
    -- items that should be added
    SELECT r.id, m.project_id
    FROM report r
    INNER JOIN application a ON r.app_id=a.id
    INNER JOIN application_sample s ON a.id=s.app_id
    INNER JOIN application_sample_item si ON s.id=si.sample_id
    INNER JOIN set_project_mapping m ON si.set_id=m.set_id
    WHERE r.id IN ('44930','44927')
)
UNION ALL
(
    -- items already added
    SELECT r.id, i.project_id
    FROM report r
    INNER JOIN report_item i ON r.id=i.report_id
    WHERE r.id IN ('44930','44927')
    GROUP BY r.id, i.project_id
);
The result shows that some items appear twice under the same report. A duplicated item is one that should be added and has also been added, provided each branch lists a given item at most once per report.
2. From the combined result, count the overlapping items in each report
# number of items that should be added and have also been added
SELECT tt.id, COUNT(*) AS overlap_count
FROM
(
    SELECT t.id, t.project_id, COUNT(*) AS occurrences
    FROM
    (
        (
            -- items that should be added
            SELECT r.id, m.project_id
            FROM report r
            INNER JOIN application a ON r.app_id=a.id
            INNER JOIN application_sample s ON a.id=s.app_id
            INNER JOIN application_sample_item si ON s.id=si.sample_id
            INNER JOIN set_project_mapping m ON si.set_id=m.set_id
            WHERE r.id IN ('44930','44927')
        )
        UNION ALL
        (
            -- items already added
            SELECT r.id, i.project_id
            FROM report r
            INNER JOIN report_item i ON r.id=i.report_id
            WHERE r.id IN ('44930','44927')
            GROUP BY r.id, i.project_id
        )
    ) t
    GROUP BY t.id, t.project_id
    HAVING COUNT(*) > 1
) tt
GROUP BY tt.id;
3. Compare the count from step 2 with the number of items that should be added. If the two are equal, nothing is missing
SELECT bb.id, aa.overlap_count AS added_count, bb.expected_count,
       CASE WHEN aa.overlap_count = bb.expected_count THEN 1
            ELSE 0
       END AS is_complete
FROM
(
    # number of items that should be added and have also been added
    SELECT tt.id, COUNT(*) AS overlap_count
    FROM
    (
        SELECT t.id, t.project_id, COUNT(*) AS occurrences
        FROM
        (
            (
                -- items that should be added
                SELECT r.id, m.project_id
                FROM report r
                INNER JOIN application a ON r.app_id=a.id
                INNER JOIN application_sample s ON a.id=s.app_id
                INNER JOIN application_sample_item si ON s.id=si.sample_id
                INNER JOIN set_project_mapping m ON si.set_id=m.set_id
                WHERE r.id IN ('44930','44927')
            )
            UNION ALL
            (
                -- items already added
                SELECT r.id, i.project_id
                FROM report r
                INNER JOIN report_item i ON r.id=i.report_id
                WHERE r.id IN ('44930','44927')
                GROUP BY r.id, i.project_id
            )
        ) t
        GROUP BY t.id, t.project_id
        HAVING COUNT(*) > 1
    ) tt
    GROUP BY tt.id
) aa
RIGHT JOIN
(
    -- number of items that should be added
    SELECT r.id, COUNT(m.project_id) AS expected_count
    FROM report r
    INNER JOIN application a ON r.app_id=a.id
    INNER JOIN application_sample s ON a.id=s.app_id
    INNER JOIN application_sample_item si ON s.id=si.sample_id
    INNER JOIN set_project_mapping m ON si.set_id=m.set_id
    WHERE r.id IN ('44930','44927')
    GROUP BY r.id
) bb ON aa.id = bb.id
ORDER BY bb.id;
4. Compute the no-omission rate
SELECT
    SUM(asr.is_complete) AS complete_reports,
    COUNT(asr.id) AS total_reports,
    CONCAT(FORMAT(SUM(asr.is_complete)/COUNT(asr.id)*100,5),'%') AS no_omission_rate
FROM
(
    SELECT bb.id, aa.overlap_count AS added_count, bb.expected_count,
           CASE WHEN aa.overlap_count = bb.expected_count THEN 1
                ELSE 0
           END AS is_complete
    FROM
    (
        # number of items that should be added and have also been added
        SELECT tt.id, COUNT(*) AS overlap_count
        FROM
        (
            SELECT t.id, t.project_id, COUNT(*) AS occurrences
            FROM
            (
                (
                    -- items that should be added
                    SELECT r.id, m.project_id
                    FROM report r
                    INNER JOIN application a ON r.app_id=a.id
                    INNER JOIN application_sample s ON a.id=s.app_id
                    INNER JOIN application_sample_item si ON s.id=si.sample_id
                    INNER JOIN set_project_mapping m ON si.set_id=m.set_id
                    WHERE r.id IN ('44930','44927')
                )
                UNION ALL
                (
                    -- items already added
                    SELECT r.id, i.project_id
                    FROM report r
                    INNER JOIN report_item i ON r.id=i.report_id
                    WHERE r.id IN ('44930','44927')
                    GROUP BY r.id, i.project_id
                )
            ) t
            GROUP BY t.id, t.project_id
            HAVING COUNT(*) > 1
        ) tt
        GROUP BY tt.id
    ) aa
    RIGHT JOIN
    (
        -- number of items that should be added
        SELECT r.id, COUNT(m.project_id) AS expected_count
        FROM report r
        INNER JOIN application a ON r.app_id=a.id
        INNER JOIN application_sample s ON a.id=s.app_id
        INNER JOIN application_sample_item si ON s.id=si.sample_id
        INNER JOIN set_project_mapping m ON si.set_id=m.set_id
        WHERE r.id IN ('44930','44927')
        GROUP BY r.id
    ) bb ON aa.id = bb.id
) asr;