如何删除表中重复记录

最近项目中遇到了一个需求，即“如何删除表中重复记录，使得所有重复的记录都只保留一行？”。在Google了半个小时之后，发现居然没有一个是正常可用的，于是乎只好自己动手写了一个。因为即便是Grails提供了很好的GORM，但是使用SQL仍然不可能完全避免，因此把它共享出来，也符合咱们网站的“共享开发经验”的宗旨。

具体的分析就不多说了，这里面主要是SQL的功夫，在此就直接列出解法，实验是在MySQL上做的，实验用的表结构是(id, first_name, last_name, parent)。

基础语句，列出重复记录：

select * 
from db1.person 
group by first_name,last_name,parent
having count(*)>1

第一种做法，使用not in：

delete from db1.person where id not in (
    select id 
    from(
        (select max(id) id 
         from db1.person 
         group by first_name, last_name, parent
         having count(id)>1)
        union
        (select id 
         from db1.person 
         group by first_name, last_name, parent
         having count(id)=1)
    ) temp
)

上面的语句的意思应该非常直白了，简直就是大白话。可问题是not in的效率并不是特别高，因此这就有了第二种解法，使用外联结+is null + exists：

delete from db1.person where exists(
    select * from
    (
        select a.id aid, b.id bid
        from db1.person a left outer join
        ((select max(id) id 
          from db1.person 
          group by first_name, last_name, parent
          having count(id)>1)
         union
         (select id 
          from db1.person 
          group by first_name, last_name, parent
          having count(id)=1)) b on a.id= b.id
    ) tmp
    where aid= id and bid is null
)

对于有主键或有唯一性约束的表，以上解法应该足够了。但有时候如果没有主键怎么办，虽然这一点似乎有点不可思议，但实际总会遇到，尤其在表不是你设计的时候。这时，只好采用一个稍微有点“恶趣味”的做法：把distinct的结果放到另一张表中，drop掉原来的表，再把那张表重命名：

create table newperson select distinct * from person;
drop table db1.person;
ALTER TABLE `db1`.`newperson` RENAME TO `db1`.`person`;

以上就是基本的解决方案，但在实际中可能情况比较复杂，尤其是在有外键关联的时候，可能就有些问题了。但话说回来，既然决定要“删除”记录，那么在作出这个决策的时候，应该就已经考虑过外键的情形了吧;)

转载于 http://www.groovyq.net/node/162

如何删除表中重复记录

猜你喜欢