Finding duplicate rows in PostgreSQL, including rows where NULL columns count as duplicates

Copyright notice: this is an original article by the blogger and may not be reposted without the blogger's permission. https://blog.csdn.net/u011099093/article/details/78596034

To find rows duplicated across several columns, SELECT A,B,C FROM TABLE WHERE CONDITION GROUP BY A,B,C HAVING COUNT(*)>1 is enough. The requirement here, however, is:

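The final query returns more columns than the grouping columns, and NULLs in the same column must also count as duplicates. GROUP BY already places NULLs in the same group, but an ordinary = comparison involving NULL never evaluates to true, so matching full rows back to the duplicated groups needs a NULL-safe comparison. After reading a lot of material online and asking colleagues, I arrived at the following SQL: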

SELECT T1.A,T1.B,T1.C,T1.D,T1.E FROM TABLE T1 WHERE EXISTS
    (SELECT T2.A,T2.B,T2.C FROM TABLE T2 WHERE CONDITION
     AND COALESCE(T1.A,'0')=COALESCE(T2.A,'0')
     AND COALESCE(T1.B,'0')=COALESCE(T2.B,'0')
     AND COALESCE(T1.C,'0')=COALESCE(T2.C,'0')
     GROUP BY T2.A,T2.B,T2.C HAVING COUNT(*)>1);

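Note: the second argument of coalesce() in the SQL above is a placeholder value you pick yourself. Its type must match the type of the first argument, and it should be a value that never occurs in the real data, otherwise a real '0' would be treated as equal to NULL.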
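The effect of the COALESCE comparison can be checked on a small table. The following is a minimal sketch, not the author's setup: it uses Python's built-in sqlite3 module as a lightweight stand-in for PostgreSQL, and the table t, its columns, and the helper names are all hypothetical. PostgreSQL also provides IS NOT DISTINCT FROM, which compares NULLs as equal without a placeholder value, but the sketch stays with the COALESCE form used above.

```python
import sqlite3

# SQLite is used only as a lightweight stand-in for PostgreSQL.
# Table t and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, d TEXT)"
)
conn.executemany(
    "INSERT INTO t (id, a, b, c, d) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "x", "y", "z", "d1"),
        (2, "x", "y", "z", "d2"),   # same (a, b, c) as row 1
        (3, "x", None, "z", "d3"),
        (4, "x", None, "z", "d4"),  # same as row 3 only if NULLs count as equal
        (5, "q", "y", "z", "d5"),   # unique
    ],
)


def find_duplicates(compare):
    """Return (id, d) for rows whose (a, b, c) occurs more than once.

    compare(col) builds the SQL predicate that matches column col between
    the outer row (t1) and the inner rows (t2).
    """
    predicates = " AND ".join(compare(col) for col in ("a", "b", "c"))
    sql = f"""
        SELECT t1.id, t1.d
        FROM t AS t1
        WHERE EXISTS (
            SELECT 1
            FROM t AS t2
            WHERE {predicates}
            GROUP BY t2.a, t2.b, t2.c
            HAVING COUNT(*) > 1
        )
        ORDER BY t1.id
    """
    return conn.execute(sql).fetchall()


def plain_equals(col):
    # Ordinary equality: a NULL on either side never matches.
    return f"t1.{col} = t2.{col}"


def coalesce_equals(col):
    # NULL-tolerant equality using the placeholder value '0'.
    return f"COALESCE(t1.{col}, '0') = COALESCE(t2.{col}, '0')"


plain_rows = find_duplicates(plain_equals)
null_safe_rows = find_duplicates(coalesce_equals)
print(plain_rows)
print(null_safe_rows)
```

With plain equality only rows 1 and 2 are reported; with the COALESCE comparison rows 3 and 4, whose b is NULL, are reported as well.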


To clean up the duplicates found under the same condition, the following SQL can be used:

DELETE FROM TABLE WHERE ID NOT IN(SELECT ID FROM
    (SELECT MIN(ID) AS ID,A,B,C FROM TABLE WHERE CONDITION GROUP BY A,B,C HAVING COUNT(*)>1) KEPT)
AND ID IN(SELECT T1.ID FROM TABLE T1 WHERE EXISTS
    (SELECT T2.A,T2.B,T2.C FROM TABLE T2 WHERE CONDITION
     AND COALESCE(T1.A,'0')=COALESCE(T2.A,'0')
     AND COALESCE(T1.B,'0')=COALESCE(T2.B,'0')
     AND COALESCE(T1.C,'0')=COALESCE(T2.C,'0')
     GROUP BY T2.A,T2.B,T2.C HAVING COUNT(*)>1));
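The delete can be tried out the same way. This is a rough sketch under the same assumptions as before: sqlite3 stands in for PostgreSQL, and the table t, its columns, and the alias kept are hypothetical. It keeps the row with the smallest id in each duplicated group and deletes the rest, leaving unique rows untouched.

```python
import sqlite3

# SQLite is used only as a lightweight stand-in for PostgreSQL.
# Table t and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, d TEXT)"
)
conn.executemany(
    "INSERT INTO t (id, a, b, c, d) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "x", "y", "z", "d1"),
        (2, "x", "y", "z", "d2"),   # duplicate of row 1
        (3, "x", None, "z", "d3"),
        (4, "x", None, "z", "d4"),  # duplicate of row 3 (NULL counted as equal)
        (5, "q", "y", "z", "d5"),   # unique, must survive
    ],
)

conn.execute(
    """
    DELETE FROM t
    WHERE id NOT IN (
        -- smallest id of every duplicated group: these rows are kept
        SELECT keep_id FROM (
            SELECT MIN(id) AS keep_id
            FROM t
            GROUP BY a, b, c
            HAVING COUNT(*) > 1
        ) AS kept
    )
    AND id IN (
        -- every row that belongs to a duplicated group
        SELECT t1.id
        FROM t AS t1
        WHERE EXISTS (
            SELECT 1
            FROM t AS t2
            WHERE COALESCE(t1.a, '0') = COALESCE(t2.a, '0')
              AND COALESCE(t1.b, '0') = COALESCE(t2.b, '0')
              AND COALESCE(t1.c, '0') = COALESCE(t2.c, '0')
            GROUP BY t2.a, t2.b, t2.c
            HAVING COUNT(*) > 1
        )
    )
    """
)
conn.commit()

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
print(remaining_ids)
```

Rows 2 and 4 are removed, and rows 1, 3 and 5 remain. On real data it is worth running the matching SELECT first, or doing the delete inside a transaction, to check which rows would go.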

This touches on the differences between IN and EXISTS, and between NOT IN and NOT EXISTS; interested readers can look them up.
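One of those differences is easy to show: NOT IN returns no rows at all once the subquery yields a NULL, while NOT EXISTS is unaffected. The sketch below is only an illustration, again using sqlite3 as a stand-in for PostgreSQL; the tables items and excluded are hypothetical.

```python
import sqlite3

# SQLite is used only as a lightweight stand-in for PostgreSQL.
# Tables items and excluded are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER);
    CREATE TABLE excluded (id INTEGER);
    INSERT INTO items VALUES (1), (2), (3);
    INSERT INTO excluded VALUES (2), (NULL);
    """
)

# NOT IN: "1 NOT IN (2, NULL)" evaluates to NULL rather than true,
# so no row passes the filter.
not_in_rows = conn.execute(
    """
    SELECT id FROM items
    WHERE id NOT IN (SELECT id FROM excluded)
    ORDER BY id
    """
).fetchall()

# NOT EXISTS: the NULL row in excluded simply never matches,
# so only id 2 is filtered out.
not_exists_rows = conn.execute(
    """
    SELECT i.id FROM items AS i
    WHERE NOT EXISTS (SELECT 1 FROM excluded AS e WHERE e.id = i.id)
    ORDER BY i.id
    """
).fetchall()

print(not_in_rows)
print(not_exists_rows)
```

The NOT IN query returns nothing, while the NOT EXISTS query returns ids 1 and 3. In the delete statement above this does not bite, because MIN(ID) over a group is NULL only when every ID in that group is NULL.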

Although this does find and remove the duplicates, it runs very slowly on large data volumes, and the database itself also plays a part.

