[MySQL] Deleting duplicate data

Recently I found that some tables in our database contained dirty data and needed maintenance. The dirty data consisted of duplicate rows, which had to be removed.

This probably happened because the table was not thought through when it was created. Several fields needed a (composite) unique index that was never added, and because more than one program inserts data into the table, duplicate rows appeared.

Now these duplicates have to be deleted. I looked at examples others had posted online, but most of them did not work for me: many rely on rowid, which MySQL does not have.

 

Now suppose there is a table t_test whose primary key is id and which also has the fields date, time, cnt1, cnt2 and cnt3. Each combination of date and time may appear in only one record (that is, a composite unique index should be built on date and time). The data in the table is as follows:


 

Clearly, the table contains duplicate records that violate this rule.
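To try the statements in this article on a scratch database, a minimal t_test can be set up as below. This is a sketch under assumptions: the column types (DATE, TIME, INT) are guesses, and the five rows are made up for illustration, not the author's data. The table deliberately has no unique index on (date, time), which is exactly the situation described above.

```sql
-- Hypothetical schema: the article names the columns but not their types.
-- No unique index on (date, time), so duplicates can be inserted.
CREATE TABLE t_test (
  id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  date DATE NOT NULL,
  time TIME NOT NULL,
  cnt1 INT,
  cnt2 INT,
  cnt3 INT
);

-- Made-up sample rows: ids 1/2 and ids 4/5 share the same (date, time) pair.
INSERT INTO t_test (id, date, time, cnt1, cnt2, cnt3) VALUES
  (1, '2020-01-01', '10:00:00', 1, 2, 3),
  (2, '2020-01-01', '10:00:00', 1, 2, 3),
  (3, '2020-01-01', '11:00:00', 4, 5, 6),
  (4, '2020-01-02', '10:00:00', 7, 8, 9),
  (5, '2020-01-02', '10:00:00', 7, 8, 9);
```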

 

First, let's find the duplicate records (duplicated on the two fields date and time):

SELECT * FROM t_test WHERE (DATE, TIME) IN(SELECT DATE,TIME FROM t_test GROUP BY DATE, TIME HAVING COUNT(1)>1);

 The result is as follows:



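A more compact view of the same duplicates shows one line per duplicated (date, time) pair, with the number of copies and their ids, using MySQL's GROUP_CONCAT. This is a sketch against the t_test columns named above; the smallest id in each list is the row that a MIN(id)-based cleanup keeps.

```sql
-- One result row per duplicated (date, time) pair:
-- how many copies exist and which ids they have, in ascending order.
SELECT date,
       time,
       COUNT(*)                     AS copies,
       GROUP_CONCAT(id ORDER BY id) AS ids
FROM t_test
GROUP BY date, time
HAVING COUNT(*) > 1;
```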
Next, I tried the deletion method found online:

DELETE FROM t_test a WHERE (a.date, a.time) IN(SELECT DATE,TIME FROM t_test GROUP BY DATE, TIME HAVING COUNT(1)>1)
AND rowid NOT IN(SELECT MIN(rowid) FROM t_test GROUP BY DATE, TIME HAVING COUNT(1)>1)

It does not work at all, because MySQL, unlike Oracle, has no rowid.
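Because id is the primary key, it can take over the role rowid plays in the Oracle-style query: keep MIN(id) per (date, time) and delete the rest. Written directly, with t_test in both the DELETE and its subquery, MySQL rejects the statement with error 1093 ("You can't specify target table 't_test' for update in FROM clause"). A commonly used workaround, sketched below, wraps the aggregate in a derived table; since that derived table contains GROUP BY, MySQL materializes it before deleting, which the MySQL manual lists as an exception to the 1093 rule. The alias keep_ids is an arbitrary name.

```sql
-- Delete every row except the one with the smallest id in its (date, time) group.
-- The inner SELECT is wrapped in a derived table (keep_ids) so MySQL materializes
-- it first; selecting from t_test directly in the subquery would raise error 1093.
DELETE FROM t_test
WHERE id NOT IN (
  SELECT id
  FROM (
    SELECT MIN(id) AS id
    FROM t_test
    GROUP BY date, time
  ) AS keep_ids
);
```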

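Another point to note: before MySQL 8.0.16, the single-table DELETE syntax does not accept a table alias, so DELETE FROM t_test a ... is rejected even apart from rowid. A self-join on the table being cleaned is still possible, but only through the multi-table DELETE syntax, which does allow aliases.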
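The multi-table DELETE syntax accepts aliases and joins, so a self-join does work in MySQL when written that way. A minimal sketch: join t_test to itself on (date, time) and delete each row for which a row with a smaller id exists, which again keeps the smallest id per group. The aliases a and b are arbitrary. One difference from a GROUP BY approach: rows whose date or time is NULL never satisfy the equality conditions, so they are never deleted here.

```sql
-- Delete from alias a only: every row that has a twin (same date and time)
-- with a smaller id. A row matched by several twins is still deleted just once.
DELETE a
FROM t_test AS a
JOIN t_test AS b
  ON  a.date = b.date
  AND a.time = b.time
  AND a.id   > b.id;
```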

 

Solution: go through an intermediate temporary table.

First, create a temporary table as follows:

CREATE TEMPORARY TABLE tmp AS SELECT MIN(id) AS id FROM t_test GROUP BY date, time;

 View the contents of the temporary table tmp:

SELECT * FROM tmp

This gives:



This temporary table holds the smallest id of each (date, time) group: for duplicated groups that is the one record to keep, and for groups without duplicates it is simply that record's id.
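Before running the DELETE, it can be worth previewing exactly which rows it will remove by putting the same condition in a SELECT. This sketch assumes tmp was created as above. If t_test uses the InnoDB engine, the DELETE can also be run between START TRANSACTION and COMMIT, so that it can still be undone with ROLLBACK before committing.

```sql
-- Preview: the rows the DELETE would remove (their ids are not kept in tmp).
SELECT *
FROM t_test
WHERE id NOT IN (SELECT * FROM tmp)
ORDER BY date, time, id;
```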

Next, delete every record whose id is not in tmp:

DELETE FROM t_test WHERE id NOT IN(SELECT * FROM tmp)

Check the remaining records:

SELECT * FROM t_test

The result:



The records are finally "clean": the duplicates were deleted successfully!
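To confirm the cleanup, the duplicate-finding grouping can be run once more; it should now return no rows. The temporary table disappears automatically when the session ends, but it can also be dropped explicitly once it is no longer needed. A short sketch of both steps:

```sql
-- Expect an empty result: no (date, time) pair occurs more than once.
SELECT date, time, COUNT(*) AS copies
FROM t_test
GROUP BY date, time
HAVING COUNT(*) > 1;

-- The TEMPORARY keyword ensures only a temporary table named tmp is dropped.
DROP TEMPORARY TABLE IF EXISTS tmp;
```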

Of course, it is safer to create a composite unique index on date and time when the table is first built. Alternatively, delete the duplicate records first and then add the unique index with ALTER TABLE.
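Adding the composite unique index after the cleanup is a single ALTER TABLE statement. The index name uk_date_time is a hypothetical choice. The statement fails with a duplicate-entry error (1062) if any duplicates are still present, so it doubles as a final check; once it succeeds, a program that tries to insert an existing (date, time) pair gets that same error instead of silently creating a duplicate.

```sql
-- Hypothetical index name; enforces at most one row per (date, time).
ALTER TABLE t_test ADD UNIQUE INDEX uk_date_time (date, time);

-- Confirm the index exists (one row per indexed column).
SHOW INDEX FROM t_test WHERE Key_name = 'uk_date_time';
```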


Origin http://10.200.1.11:23101/article/api/json?id=327003708&siteId=291194637