MySQL one-to-many table joins: how to deal with duplicate data

In MySQL, a one-to-many join between a main table and a subtable often produces rows that look duplicated. This happens because the join returns one row for every matching subtable record, so the main table's columns are repeated once per match (a row-multiplication effect of the join, not a full Cartesian product). To handle this situation, one of the following methods can be used:

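  • Use the DISTINCT keyword: Adding DISTINCT to the query removes rows that are identical across every selected column. It only helps when the selected subtable columns do not differ between matches; if t2.column3 varies, each distinct value still produces its own row. For example: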
SELECT DISTINCT t1.column1, t1.column2, t2.column3
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.table1_id;
  • Use a subquery (nested query): A correlated subquery in the SELECT list can fold the subtable records into a single value per main-table row, so each main-table row appears exactly once. For example:
SELECT t1.column1, t1.column2, (
    SELECT GROUP_CONCAT(t2.column3)
    FROM table2 t2
    WHERE t2.table1_id = t1.id
) AS child_data
FROM table1 t1;

The above query uses the subquery to fetch the subtable records for each main-table row, and the GROUP_CONCAT function combines them into a single comma-separated string.

  • Use the GROUP BY clause: Grouping by the main table's key collapses the multiple subtable rows for each main-table record into a single row, and an aggregate function decides how the subtable values are combined. For example:
SELECT t1.column1, t1.column2, GROUP_CONCAT(t2.column3)
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.table1_id
GROUP BY t1.id;

The above query groups by the id of the main table and uses the GROUP_CONCAT function to combine the matching subtable records into one string. Because it uses an inner JOIN, main-table rows with no subtable records are left out of the result.
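
A variation on the GROUP BY approach, offered only as a sketch against the same placeholder tables (table1, table2) used above: LEFT JOIN keeps main-table rows that have no subtable records, and the DISTINCT, ORDER BY, and SEPARATOR options of GROUP_CONCAT control what ends up in the combined string. The alias child_data is reused from the subquery example; whether t1.id is the primary key of table1 is an assumption.

```sql
-- Sketch: one row per main-table record, including those without children.
-- Assumes t1.id is the primary key of table1, so selecting t1.column1 and
-- t1.column2 is allowed when ONLY_FULL_GROUP_BY is enabled.
SELECT t1.column1,
       t1.column2,
       GROUP_CONCAT(DISTINCT t2.column3
                    ORDER BY t2.column3
                    SEPARATOR ', ') AS child_data  -- NULL when there are no children
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.table1_id
GROUP BY t1.id;
```

Note that GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so long lists may need that session variable raised.
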
Beyond the DISTINCT keyword, subqueries, and the GROUP BY clause, there are other options for dealing with duplicate data:

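  • Choose the JOIN type deliberately: MySQL offers several JOIN types, such as INNER JOIN, LEFT JOIN, and RIGHT JOIN. The JOIN type decides which rows are kept (for example, LEFT JOIN keeps main-table rows that have no subtable match), so select the one that fits the table relationship and the query requirements. The JOIN type alone does not collapse multiple matching subtable rows; combine it with one of the other techniques when each main-table row should appear only once.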
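  • Use subqueries to deduplicate: Duplicate data can be removed by deduplicating or aggregating the subtable inside a subquery (with DISTINCT, or with aggregate functions such as MAX and MIN) before joining it to the main table. For example: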
SELECT t1.column1, t1.column2, t2.column3
FROM table1 t1
JOIN (
    SELECT DISTINCT table1_id, column3
    FROM table2
) t2 ON t1.id = t2.table1_id;

In the above query, the subquery removes duplicate data in the subtable by using the DISTINCT keyword, and then associates it with the main table.
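
The same idea can be sketched with an aggregate function instead of DISTINCT, which guarantees at most one row per main-table record. This uses the same placeholder tables as above; the alias max_column3 is made up for illustration, and MAX is just one possible choice (MIN works the same way).

```sql
-- Sketch: reduce table2 to one row per table1_id before joining.
SELECT t1.column1, t1.column2, t2.max_column3
FROM table1 t1
JOIN (
    SELECT table1_id, MAX(column3) AS max_column3
    FROM table2
    GROUP BY table1_id          -- exactly one row per parent id
) t2 ON t1.id = t2.table1_id;
```

Unlike the DISTINCT version, which still returns several rows for a parent whose children have different column3 values, this version keeps only the largest value per parent.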

  • Use temporary tables: You can store intermediate results in a temporary table (CREATE TEMPORARY TABLE) and then process them to remove duplicate data. MySQL has no table variables, so temporary tables fill that role. This approach requires several statements and adds some extra work.
  • Handle duplicate data in the application: If the duplicates cannot be resolved in the database query, they can be handled in the application, for example by collecting the rows into a set or a map keyed by the main table's id.
  • Use window functions: MySQL 8.0 and later support window functions. ROW_NUMBER() (or another window function) can number the rows within each group in a chosen order, and filtering on that number keeps a single row per group.
  • Use the DISTINCT ON syntax (applicable only to specific databases): Some databases, such as PostgreSQL, support DISTINCT ON, which deduplicates the result set by the specified columns. MySQL does not support this syntax, so it is only an option if the query runs on a database that does.
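
As a rough sketch of the window-function option from the list above (MySQL 8.0 or later), the query below keeps one subtable row per main-table record. It reuses the placeholder tables from the earlier examples and additionally assumes that table2 has an id column whose highest value marks the row to keep; that column, and the aliases ranked and rn, are assumptions made for illustration.

```sql
-- Sketch: number the child rows of each parent, then keep only row 1.
SELECT t1.column1, t1.column2, ranked.column3
FROM table1 t1
JOIN (
    SELECT table1_id,
           column3,
           ROW_NUMBER() OVER (
               PARTITION BY table1_id   -- restart numbering for each parent
               ORDER BY id DESC         -- assumed column: highest id ranks first
           ) AS rn
    FROM table2
) ranked ON ranked.table1_id = t1.id
        AND ranked.rn = 1;
```

Changing the ORDER BY inside OVER (...) changes which child row is kept, so it should reflect whatever "the desired record" means for the query at hand.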

Origin blog.csdn.net/yuanchengfu0910/article/details/131209626