MySQL: determining whether one set of rows is a subset of another

One, the problem

The story began with a request to compute a report omission rate. There are two query results: the items that have already been added to each report, and the items that should be added to it. From these, we need the rate of reports with no omissions.

What does "no omission" mean? It means that every item that should be added to a report has in fact been added.

The report no-omission rate is the number of reports with no omissions divided by the total number of reports.
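The definition can be written compactly as follows. The notation is introduced here for illustration only: R is the set of reports, S_r is the set of items that should be added to report r, and A_r is the set of items actually added.

```latex
\text{no-omission rate} = \frac{\left|\{\, r \in R : S_r \subseteq A_r \,\}\right|}{|R|} \times 100\%
```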

Below are two example reports, one complete and one with an omission.
 

First, query the first result set: the items that should be added to each report.

SELECT 
        r.id AS report_id, m.project_id AS item_to_add
FROM 
        report r 
        INNER JOIN application a ON r.app_id=a.id
        INNER JOIN application_sample s ON a.id=s.app_id
        -- the WHERE filter on r.id discards unmatched rows anyway, so INNER JOIN is equivalent to the original RIGHT JOINs
        INNER JOIN application_sample_item si ON s.id=si.sample_id
        INNER JOIN set_project_mapping m ON si.set_id=m.set_id
WHERE r.id IN ('44930','44927')
ORDER BY r.id, m.project_id;

Then query the second result set: the items that have already been added to each report.

SELECT r.id AS report_id, i.project_id AS added_item 
FROM report r 
INNER JOIN report_item i ON r.id=i.report_id
WHERE r.id IN ('44930','44927')
ORDER BY r.id, i.project_id;

These are the two result sets to compare. Report 44927 clearly has no omissions. Report 44930 has the same number of items, but it contains item 758 instead of item 112, so it has an omission.

 

Two, the solution

Seen this way, the task is to decide whether one set is a subset of another. We can iterate over the items that should be added and look for each one among the added items. If every one is found, the items that should be added form a subset of the added items, which means nothing is missing.
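As a reference point before looking at SQL, here is a minimal sketch of the subset test in plain Python. It assumes each report's item lists are already in memory. The report ids and item numbers are hypothetical, chosen to mirror the story above: report 2 has 758 in place of 112.

```python
def is_complete(should_add, added):
    """A report has no omissions when the items that should be added
    form a subset of the items that were added."""
    return set(should_add) <= set(added)


def no_omission_rate(reports):
    """reports maps report_id -> (should_add, added).
    Returns the percentage of complete reports (0.0 for no reports)."""
    if not reports:
        return 0.0
    complete = sum(1 for should, added in reports.values() if is_complete(should, added))
    return complete / len(reports) * 100


# Hypothetical data: same list lengths, but report 2 lacks 112 and has 758 instead.
reports = {
    1: ([101, 112, 205], [101, 112, 205]),
    2: ([101, 112, 205], [101, 205, 758]),
}
print({rid: is_complete(*pair) for rid, pair in reports.items()})  # {1: True, 2: False}
print(no_omission_rate(reports))  # 50.0
```

Comparing list lengths alone would wrongly accept report 2. That is why the test has to be set containment.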

A traverse-and-compare approach does work, but in SQL it amounts to a Cartesian-product cross join, which is often expensive and slow. Is there a way to avoid it?

Option One:

This option relies on the functions FIND_IN_SET and GROUP_CONCAT, so let's first look at how they work.

· FIND_IN_SET(str,strlist)

str: the string to search for

strlist: a string whose elements are separated by commas (","), such as '1,2,6,8,10,22'

FIND_IN_SET returns the 1-based position of str among the elements of strlist, 0 if str is not in the list, and NULL if either argument is NULL.

· GROUP_CONCAT([DISTINCT] expr [ORDER BY expr ASC|DESC] [SEPARATOR 'separator'])

GROUP_CONCAT() concatenates the values of one column across multiple rows into a single string. By default the values are separated by a comma (",").
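The FIND_IN_SET behavior described above can be sketched in Python. This is an approximation, not MySQL's implementation. It compares strings exactly, whereas MySQL compares using the collation, which is often case-insensitive. It also takes the first argument as a string.

```python
def find_in_set(s, strlist):
    """Approximates MySQL FIND_IN_SET(str, strlist):
    1-based position of s among the comma-separated elements,
    0 if absent or if strlist is empty, None (SQL NULL) if either input is None."""
    if s is None or strlist is None:
        return None
    if strlist == "":
        return 0
    elements = strlist.split(",")
    return elements.index(s) + 1 if s in elements else 0


print(find_in_set("8", "1,2,6,8,10,22"))  # 4
print(find_in_set("1", "10,22"))          # 0: whole elements only, no substring match
print(find_in_set("3", None))             # None
```

The second call shows why FIND_IN_SET suits this task better than a substring search. Item 1 is not mistaken for item 10.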

However, the result of GROUP_CONCAT() is limited by the system variable group_concat_max_len, which defaults to 1024 (bytes).

If the concatenated string is longer than that, it is truncated (MySQL only issues a warning), so the limit must be raised:

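-- GLOBAL applies to sessions opened afterwards and requires the appropriate privilege
SET GLOBAL group_concat_max_len=102400;
-- SESSION applies to the current connection
SET SESSION group_concat_max_len=102400;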
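Here is a small sketch of why truncation matters for this method. It models GROUP_CONCAT(... ORDER BY ...) plus the group_concat_max_len cut-off in Python. MySQL truncates by bytes and this sketch truncates by characters, which is the same for ASCII ids. The tiny max_len of 10 is chosen only to make the effect visible.

```python
def group_concat(values, separator=",", max_len=1024):
    """Sketch of GROUP_CONCAT(expr ORDER BY expr SEPARATOR separator)
    followed by truncation to max_len characters."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # GROUP_CONCAT yields NULL when there are no non-NULL values
    joined = separator.join(str(v) for v in sorted(present))
    return joined[:max_len]


full = group_concat([102, 100, 101])
cut = group_concat([102, 100, 101], max_len=10)
print(full)  # 100,101,102
print(cut)   # 100,101,10
print("102" in cut.split(","), "10" in cut.split(","))  # False True
```

A truncated list both loses a real item (102) and invents a fake one (10). A subset check would then report false omissions, or in other data false matches.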

From the descriptions above, note that FIND_IN_SET and GROUP_CONCAT both use a comma (",") as the separator.

Therefore we can use GROUP_CONCAT to join each report's added items into one string. Then we use FIND_IN_SET to check, item by item, whether each item that should be added appears in that string.

1. Modify the added-items query from the problem statement so that GROUP_CONCAT joins each report's added items into a single string

SELECT r.id, GROUP_CONCAT(i.project_id ORDER BY i.project_id SEPARATOR ',') AS added_item_list 
FROM report r 
LEFT JOIN report_item i ON r.id=i.report_id
WHERE r.id IN ('44930','44927')
GROUP BY r.id;

2. Use FIND_IN_SET to check, one by one, whether each item that should be added appears in that string

SELECT Q.id, W.should_add_item,
       FIND_IN_SET(W.should_add_item, Q.added_item_list) AS found_position  -- 0 means the item is missing
            FROM 
            (
            -- items already added to each report
                        SELECT r.id, GROUP_CONCAT(i.project_id ORDER BY i.project_id SEPARATOR ',') AS added_item_list 
                        FROM report r 
                        LEFT JOIN report_item i ON r.id=i.report_id
                        WHERE r.id IN ('44930','44927')
                        GROUP BY r.id
            ) Q,
            (
            -- items that should be added to each report
                        SELECT 
                                    r.id, s.app_id, m.project_id AS should_add_item
                        FROM 
                                    report r 
                                    INNER JOIN application a ON r.app_id=a.id
                                    INNER JOIN application_sample s ON a.id=s.app_id
                                    INNER JOIN application_sample_item si ON s.id=si.sample_id
                                    INNER JOIN set_project_mapping m ON si.set_id=m.set_id
                        WHERE r.id IN ('44930','44927')
            ) W
            WHERE Q.id=W.id;

3. Keep only the reports with no omissions, meaning every item that should be added was found

SELECT Q.id,
       COUNT(*) AS should_add_count,
       SUM(CASE WHEN FIND_IN_SET(W.should_add_item, Q.added_item_list) > 0 THEN 1 ELSE 0 END) AS added_count
            FROM 
            (
            -- items already added to each report
                        SELECT r.id, GROUP_CONCAT(i.project_id ORDER BY i.project_id SEPARATOR ',') AS added_item_list 
                        FROM report r 
                        LEFT JOIN report_item i ON r.id=i.report_id
                        WHERE r.id IN ('44930','44927')
                        GROUP BY r.id
            ) Q,
            (
            -- items that should be added to each report
                        SELECT 
                                    r.id, s.app_id, m.project_id AS should_add_item
                        FROM 
                                    report r 
                                    INNER JOIN application a ON r.app_id=a.id
                                    INNER JOIN application_sample s ON a.id=s.app_id
                                    INNER JOIN application_sample_item si ON s.id=si.sample_id
                                    INNER JOIN set_project_mapping m ON si.set_id=m.set_id
                        WHERE r.id IN ('44930','44927')
            ) W
            WHERE Q.id=W.id
            GROUP BY Q.id
            -- complete only if every required item was found in the added list
            HAVING SUM(CASE WHEN FIND_IN_SET(W.should_add_item, Q.added_item_list) > 0 THEN 1 ELSE 0 END) = COUNT(*);
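To try the step 3 logic without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module in place of MySQL. It rests on three assumptions. First, the tables should_add and report_item are hypothetical simplifications: the five-table join is collapsed into should_add. Second, the ids are made up. Third, SQLite has group_concat but no FIND_IN_SET, so a Python emulation is registered through Connection.create_function.

```python
import sqlite3

# Hypothetical simplified schema and data (report 2 has 758 instead of 112).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE should_add (report_id INTEGER, project_id INTEGER);
CREATE TABLE report_item (report_id INTEGER, project_id INTEGER);
INSERT INTO should_add VALUES (1,101),(1,112),(1,205),(2,101),(2,112),(2,205);
INSERT INTO report_item VALUES (1,101),(1,112),(1,205),(2,101),(2,205),(2,758);
""")


def find_in_set(s, strlist):
    # Emulates MySQL FIND_IN_SET: 1-based position, 0 if absent, NULL on NULL input.
    if s is None or strlist is None:
        return None
    elements = strlist.split(",") if strlist else []
    return elements.index(str(s)) + 1 if str(s) in elements else 0


conn.create_function("FIND_IN_SET", 2, find_in_set)

# Same shape as step 3: concatenate added items, then require every
# should-add item to be found in that list.
complete = conn.execute("""
SELECT W.report_id
FROM (SELECT report_id, group_concat(project_id, ',') AS added_list
      FROM report_item GROUP BY report_id) Q
JOIN should_add W ON Q.report_id = W.report_id
GROUP BY W.report_id
HAVING SUM(CASE WHEN FIND_IN_SET(W.project_id, Q.added_list) > 0 THEN 1 ELSE 0 END) = COUNT(*)
ORDER BY W.report_id
""").fetchall()
print(complete)  # [(1,)]
```

In this sketch a report with no added items at all drops out of the inner join. When only complete reports are being counted, that has the same effect as marking it incomplete.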

4. Our ultimate goal is the no-omission rate

SELECT X.complete_reports,
       Y.total AS total_reports,
       CONCAT(FORMAT(X.complete_reports / Y.total * 100, 2), '%') AS no_omission_rate
FROM
(
        -- number of reports with no omissions (the query from step 3, counted)
        SELECT COUNT(*) AS complete_reports FROM
        (
            SELECT Q.id
            FROM 
            (
            -- items already added to each report
                        SELECT r.id, GROUP_CONCAT(i.project_id ORDER BY i.project_id SEPARATOR ',') AS added_item_list 
                        FROM report r 
                        LEFT JOIN report_item i ON r.id=i.report_id
                        WHERE r.id IN ('44930','44927')
                        GROUP BY r.id
            ) Q,
            (
            -- items that should be added to each report
                        SELECT 
                                    r.id, m.project_id AS should_add_item
                        FROM 
                                    report r 
                                    INNER JOIN application a ON r.app_id=a.id
                                    INNER JOIN application_sample s ON a.id=s.app_id
                                    INNER JOIN application_sample_item si ON s.id=si.sample_id
                                    INNER JOIN set_project_mapping m ON si.set_id=m.set_id
                        WHERE r.id IN ('44930','44927')
            ) W
            WHERE Q.id=W.id
            GROUP BY Q.id
            HAVING SUM(CASE WHEN FIND_IN_SET(W.should_add_item, Q.added_item_list) > 0 THEN 1 ELSE 0 END) = COUNT(*)
        ) C
) X,
(
        -- total number of reports
        SELECT COUNT(DISTINCT r.id) AS total FROM report r
        WHERE r.id IN ('44930','44927')
) Y;

 

Option Two:

Option One avoids a row-by-row traversal, but it still checks the items one at a time. Is there a way to avoid comparing items altogether?

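Yes: we can decide whether one set is fully contained in the other purely by counting. If each side is deduplicated and the two are combined with UNION ALL, an item appears twice exactly when it is both required and added. A report is therefore complete when the number of such doubled items equals the number of distinct items that should be added.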
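The counting idea can be sketched in Python with collections.Counter standing in for UNION ALL plus GROUP BY ... HAVING COUNT(*) > 1. The item ids are hypothetical. The last line shows what goes wrong when a list is not deduplicated first.

```python
from collections import Counter


def overlap_count(should_add, added):
    """Concatenate both (deduplicated) lists like UNION ALL, then count
    values occurring more than once, i.e. items that are both required and added."""
    combined = list(set(should_add)) + list(set(added))
    return sum(1 for n in Counter(combined).values() if n > 1)


def is_complete(should_add, added):
    # Complete when the overlap covers every distinct item that should be added.
    return overlap_count(should_add, added) == len(set(should_add))


print(is_complete([101, 112, 205], [101, 112, 205]))  # True
print(is_complete([101, 112, 205], [101, 205, 758]))  # False
# Without deduplication, a repeated required item alone looks like an overlap:
print(sum(1 for n in Counter([112, 112] + [205]).values() if n > 1))  # 1
```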

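1. Use UNION ALL to combine the items that should be added with the items already added, keeping duplicates. Deduplicate each side first with SELECT DISTINCT. Otherwise an item listed twice on one side, for example when two samples are mapped to the same project, would later be counted as an overlap.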

(
 -- items that should be added (deduplicated)
SELECT DISTINCT
        r.id, m.project_id
FROM 
         report r 
INNER JOIN application a ON r.app_id=a.id
INNER JOIN application_sample s ON a.id=s.app_id
INNER JOIN application_sample_item si ON s.id=si.sample_id
INNER JOIN set_project_mapping m ON si.set_id=m.set_id
WHERE r.id IN ('44930','44927')
)
UNION ALL
(
 -- items already added (deduplicated)
SELECT DISTINCT r.id, i.project_id FROM report r, report_item i 
WHERE r.id = i.report_id AND r.id IN ('44930','44927')
)
-- sort the combined result so that duplicates sit next to each other
ORDER BY id, project_id;

The result shows that an item appearing twice under the same report is one that should be added and has also been added.

2. From the combined result, count the overlapping items for each report

-- number of items per report that are both required and already added
SELECT tt.id, COUNT(*) AS overlap_count FROM 
(
            SELECT t.id, t.project_id, COUNT(*) AS cnt FROM 
            (
                        (
                                -- items that should be added (deduplicated)
                                SELECT DISTINCT
                                        r.id, m.project_id
                                FROM 
                                        report r 
                                        INNER JOIN application a ON r.app_id=a.id
                                        INNER JOIN application_sample s ON a.id=s.app_id
                                        INNER JOIN application_sample_item si ON s.id=si.sample_id
                                        INNER JOIN set_project_mapping m ON si.set_id=m.set_id
                                WHERE r.id IN ('44930','44927')
                        )
                        UNION ALL
                        (
                                -- items already added (deduplicated)
                                SELECT DISTINCT r.id, i.project_id FROM report r, report_item i 
                                WHERE r.id = i.report_id AND r.id IN ('44930','44927')
                        )
            ) t
            GROUP BY t.id, t.project_id
            HAVING COUNT(*) > 1 
) tt GROUP BY tt.id;

3. Compare the count from step 2 with the number of items that should be added. If they are equal, nothing is missing.

SELECT bb.id,
       COALESCE(aa.overlap_count, 0) AS added_count,
       bb.should_add_count,
       CASE WHEN aa.overlap_count = bb.should_add_count THEN 1
            ELSE 0
       END AS is_complete
FROM  
(
-- number of items per report that are both required and already added
SELECT tt.id, COUNT(*) AS overlap_count FROM 
(
            SELECT t.id, t.project_id, COUNT(*) AS cnt FROM 
            (
                        (
                                -- items that should be added (deduplicated)
                                SELECT DISTINCT
                                        r.id, m.project_id
                                FROM 
                                        report r 
                                        INNER JOIN application a ON r.app_id=a.id
                                        INNER JOIN application_sample s ON a.id=s.app_id
                                        INNER JOIN application_sample_item si ON s.id=si.sample_id
                                        INNER JOIN set_project_mapping m ON si.set_id=m.set_id
                                WHERE r.id IN ('44930','44927')
                        )
                        UNION ALL
                        (
                                -- items already added (deduplicated)
                                SELECT DISTINCT r.id, i.project_id FROM report r, report_item i 
                                WHERE r.id = i.report_id AND r.id IN ('44930','44927')
                        )
            ) t
            GROUP BY t.id, t.project_id
            HAVING COUNT(*) > 1 
) tt GROUP BY tt.id 
) aa RIGHT JOIN
(
        -- number of distinct items that should be added to each report
        SELECT 
                r.id, COUNT(DISTINCT m.project_id) AS should_add_count
        FROM 
                report r 
                INNER JOIN application a ON r.app_id=a.id
                INNER JOIN application_sample s ON a.id=s.app_id
                INNER JOIN application_sample_item si ON s.id=si.sample_id
                INNER JOIN set_project_mapping m ON si.set_id=m.set_id
        WHERE r.id IN ('44930','44927')
        GROUP BY r.id
) bb ON aa.id = bb.id 
ORDER BY bb.id;
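Here is a runnable sketch of steps 1 to 3 of this option, again using Python's sqlite3 in place of MySQL. It assumes the same hypothetical simplified tables, with the five-table join collapsed into should_add and made-up ids. Report 2 deliberately repeats a required item to show that DISTINCT keeps the count honest. SQLite builds before 3.39 lack RIGHT JOIN, so the sketch uses the equivalent LEFT JOIN from bb to aa.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE should_add (report_id INTEGER, project_id INTEGER);
CREATE TABLE report_item (report_id INTEGER, project_id INTEGER);
INSERT INTO should_add VALUES (1,101),(1,112),(1,205),(2,101),(2,112),(2,112),(2,205);
INSERT INTO report_item VALUES (1,101),(1,112),(1,205),(2,101),(2,205),(2,758);
""")

rows = conn.execute("""
SELECT bb.report_id,
       COALESCE(aa.overlap_count, 0) AS added_count,
       bb.should_add_count,
       CASE WHEN aa.overlap_count = bb.should_add_count THEN 1 ELSE 0 END AS is_complete
FROM (
    -- distinct items each report should have
    SELECT report_id, COUNT(DISTINCT project_id) AS should_add_count
    FROM should_add GROUP BY report_id
) bb
LEFT JOIN (
    -- items appearing on both deduplicated sides
    SELECT report_id, COUNT(*) AS overlap_count
    FROM (
        SELECT report_id, project_id
        FROM (SELECT DISTINCT report_id, project_id FROM should_add
              UNION ALL
              SELECT DISTINCT report_id, project_id FROM report_item) t
        GROUP BY report_id, project_id
        HAVING COUNT(*) > 1
    ) tt
    GROUP BY report_id
) aa ON aa.report_id = bb.report_id
ORDER BY bb.report_id
""").fetchall()
print(rows)  # [(1, 3, 3, 1), (2, 2, 3, 0)]
```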

4. Compute the no-omission rate

SELECT 
                SUM(asr.is_complete) AS complete_reports,
                COUNT(asr.id) AS total_reports,
                CONCAT(FORMAT(SUM(asr.is_complete) / COUNT(asr.id) * 100, 5), '%') AS no_omission_rate
FROM 
(
        SELECT bb.id,
               COALESCE(aa.overlap_count, 0) AS added_count,
               bb.should_add_count,
               CASE WHEN aa.overlap_count = bb.should_add_count THEN 1
                    ELSE 0
               END AS is_complete
        FROM  
        (
        -- number of items per report that are both required and already added
        SELECT tt.id, COUNT(*) AS overlap_count FROM 
        (
                    SELECT t.id, t.project_id, COUNT(*) AS cnt FROM 
                    (
                                (
                                        -- items that should be added (deduplicated)
                                        SELECT DISTINCT
                                                r.id, m.project_id
                                        FROM 
                                                report r 
                                                INNER JOIN application a ON r.app_id=a.id
                                                INNER JOIN application_sample s ON a.id=s.app_id
                                                INNER JOIN application_sample_item si ON s.id=si.sample_id
                                                INNER JOIN set_project_mapping m ON si.set_id=m.set_id
                                        WHERE r.id IN ('44930','44927')
                                )
                                UNION ALL
                                (
                                        -- items already added (deduplicated)
                                        SELECT DISTINCT r.id, i.project_id FROM report r, report_item i 
                                        WHERE r.id = i.report_id AND r.id IN ('44930','44927')
                                )
                    ) t
                    GROUP BY t.id, t.project_id
                    HAVING COUNT(*) > 1 
        ) tt GROUP BY tt.id 
        ) aa RIGHT JOIN
        (
                -- number of distinct items that should be added to each report
                SELECT 
                        r.id, COUNT(DISTINCT m.project_id) AS should_add_count
                FROM 
                        report r 
                        INNER JOIN application a ON r.app_id=a.id
                        INNER JOIN application_sample s ON a.id=s.app_id
                        INNER JOIN application_sample_item si ON s.id=si.sample_id
                        INNER JOIN set_project_mapping m ON si.set_id=m.set_id
                WHERE r.id IN ('44930','44927')
                GROUP BY r.id
        ) bb ON aa.id = bb.id 
) asr;


Source: blog.csdn.net/kk_gods/article/details/112894187