MySQL: finding and removing duplicate rows

1. Finding duplicates

Test table:
+----+--------+------+---------+---------+------------+
| id | firmid | name | balance | holdsum | exchangeid |
+----+--------+------+---------+---------+------------+
|  1 |  10001 | aa   |  100.10 |     100 |          2 |
|  2 |  10001 | aa   |  899.20 |     100 |          2 |
|  3 |  10002 | bb   |   98.20 |     100 |          2 |
|  4 |  10002 | bb   |   11.20 |     100 |          1 |
|  5 |  10001 | aa   |  899.20 |     100 |          4 |
|  6 |  10003 | cc   | 1101.20 |     100 |          1 |
+----+--------+------+---------+---------+------------+
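To follow along, the test table can be recreated with something like the sketch below. The column types and the primary key are assumptions, since the original shows only the data. The copies `02test` and `03test` are assumed to start with the same six rows, which is consistent with the results shown for the later examples.

```sql
-- Assumed schema: the types are guesses based on the sample values.
CREATE TABLE `01test` (
    id         INT PRIMARY KEY,
    firmid     INT NOT NULL,
    name       VARCHAR(20),
    balance    DECIMAL(10, 2),
    holdsum    INT,
    exchangeid INT
);

INSERT INTO `01test` (id, firmid, name, balance, holdsum, exchangeid) VALUES
    (1, 10001, 'aa',  100.10, 100, 2),
    (2, 10001, 'aa',  899.20, 100, 2),
    (3, 10002, 'bb',   98.20, 100, 2),
    (4, 10002, 'bb',   11.20, 100, 1),
    (5, 10001, 'aa',  899.20, 100, 4),
    (6, 10003, 'cc', 1101.20, 100, 1);

-- Working copies for the delete examples (assumed to hold the same data).
CREATE TABLE `02test` LIKE `01test`;
INSERT INTO `02test` SELECT * FROM `01test`;

CREATE TABLE `03test` LIKE `01test`;
INSERT INTO `03test` SELECT * FROM `01test`;
```

Working on copies keeps `01test` intact, so each delete example can be rerun from a known state.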
Several columns in this table contain repeated values. Taking firmid as the first example, find the duplicated data:

SELECT
    firmid,
    count(*)
FROM
    `01test`
GROUP BY
    firmid
HAVING
    count(firmid) > 1;

Result:
+--------+----------+
| firmid | count(*) |
+--------+----------+
|  10001 |        3 |
|  10002 |        2 |
+--------+----------+
To find rows that are duplicated across several columns at once, group by all of those columns, for example:

SELECT
    firmid,
    exchangeid,
    count(firmid)
FROM
    `01test`
GROUP BY
    firmid,
    exchangeid
HAVING
    count(firmid) > 1;

Result:
+--------+------------+---------------+
| firmid | exchangeid | count(firmid) |
+--------+------------+---------------+
|  10001 |          2 |             2 |
+--------+------------+---------------+

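Because GROUP BY collapses each group into a single row, these results are merged summaries of the duplicates rather than the individual rows. To delete duplicates, you need an extra condition that decides which one row of each group to keep.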

2. Removing duplicates

Example 1: delete the rows in the test table whose firmid is duplicated, keeping the row with the smallest id.

  • Query the duplicate rows that need to be deleted
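SELECT *
FROM `02test`
WHERE firmid IN (
    -- firmid values that occur more than once
    SELECT firmid
    FROM `02test`
    GROUP BY firmid
    HAVING count(firmid) > 1
)
AND id NOT IN (
    -- the smallest id of each duplicated firmid (the row to keep)
    SELECT min(id)
    FROM `02test`
    GROUP BY firmid
    HAVING count(firmid) > 1
);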

The rows returned are exactly the ones we need to delete:
+----+--------+------+---------+---------+------------+
| id | firmid | name | balance | holdsum | exchangeid |
+----+--------+------+---------+---------+------------+
|  2 |  10001 | aa   |  899.20 |     100 |          2 |
|  4 |  10002 | bb   |   11.20 |     100 |          1 |
|  5 |  10001 | aa   |  899.20 |     100 |          4 |
+----+--------+------+---------+---------+------------+

  • Delete those rows
    Using the primary key ids returned by the query above, delete from the table:
DELETE FROM `02test`
WHERE id IN (
    SELECT id
    FROM `02test`
    WHERE firmid IN (
        SELECT firmid
        FROM `02test`
        GROUP BY firmid
        HAVING count(firmid) > 1
    )
    AND id NOT IN (
        SELECT min(id)
        FROM `02test`
        GROUP BY firmid
        HAVING count(firmid) > 1
    )
);

However, this fails with an error:

ERROR 1093 (HY000): You can't specify target table '02test' for update in FROM clause

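This error means that MySQL does not allow a statement to modify a table while selecting from that same table in a subquery. The usual workaround is to wrap the subquery in a derived table (aliased temp here), so that the ids are read from an intermediate result, which works as long as MySQL materializes the derived table instead of merging it into the outer query. The statement becomes: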

DELETE FROM `02test`
WHERE id IN (
    SELECT id FROM (
        -- same query as before, now wrapped in a derived table
        SELECT id
        FROM `02test`
        WHERE firmid IN (
            SELECT firmid
            FROM `02test`
            GROUP BY firmid
            HAVING count(firmid) > 1
        )
        AND id NOT IN (
            SELECT min(id)
            FROM `02test`
            GROUP BY firmid
            HAVING count(firmid) > 1
        )
    ) AS temp
);

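The statement succeeds. Checking the table again:
+----+--------+------+---------+---------+------------+
| id | firmid | name | balance | holdsum | exchangeid |
+----+--------+------+---------+---------+------------+
|  1 |  10001 | aa   |  100.10 |     100 |          2 |
|  3 |  10002 | bb   |   98.20 |     100 |          2 |
|  6 |  10003 | cc   | 1101.20 |     100 |          1 |
+----+--------+------+---------+---------+------------+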
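As an alternative to the nested subqueries, the same "keep the smallest id per firmid" rule can be sketched with MySQL's multi-table DELETE and a self-join. This is a different technique from the one used above, not the original author's method. The aliases `dup` and `keep` are names chosen here for illustration.

```sql
-- Delete every row for which another row with the same firmid
-- and a smaller id exists; only the smallest id per firmid survives.
DELETE dup
FROM `02test` AS dup
JOIN `02test` AS keep
  ON  dup.firmid = keep.firmid
  AND dup.id > keep.id;
```

Run against the original six rows, this removes ids 2, 4 and 5 and leaves ids 1, 3 and 6, the same rows as in the table above. Because the table is joined rather than referenced in a subquery, this form does not run into error 1093.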

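Example 2: delete the rows in the test table that share the same firmid and exchangeid, keeping the one with the largest balance (duplicates defined by more than one column).
First, query the duplicated rows: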

SELECT *
FROM `03test`
WHERE (firmid, exchangeid) IN (
    SELECT firmid, exchangeid
    FROM `03test`
    GROUP BY firmid, exchangeid
    HAVING count(firmid) > 1
);

+----+--------+------+---------+---------+------------+
| id | firmid | name | balance | holdsum | exchangeid |
+----+--------+------+---------+---------+------------+
|  1 |  10001 | aa   |  100.10 |     100 |          2 |
|  2 |  10001 | aa   |  899.20 |     100 |          2 |
+----+--------+------+---------+---------+------------+

Then query the rows that need to be deleted:

SELECT *
FROM `03test`
WHERE (firmid, exchangeid) IN (
    -- (firmid, exchangeid) pairs that occur more than once
    SELECT firmid, exchangeid
    FROM `03test`
    GROUP BY firmid, exchangeid
    HAVING count(firmid) > 1
)
AND id NOT IN (
    -- ids of the rows holding the largest balance in each duplicated pair
    SELECT id
    FROM `03test`
    WHERE (firmid, exchangeid, balance) IN (
        SELECT firmid, exchangeid, max(balance)
        FROM `03test`
        GROUP BY firmid, exchangeid
        HAVING count(firmid) > 1
    )
);

+----+--------+------+---------+---------+------------+
| id | firmid | name | balance | holdsum | exchangeid |
+----+--------+------+---------+---------+------------+
|  1 |  10001 | aa   |  100.10 |     100 |          2 |
+----+--------+------+---------+---------+------------+

Delete the rows:

DELETE FROM `03test`
WHERE id IN (
    SELECT id FROM (
        -- derived table, so the delete does not select from its own target directly
        SELECT id
        FROM `03test`
        WHERE (firmid, exchangeid) IN (
            SELECT firmid, exchangeid
            FROM `03test`
            GROUP BY firmid, exchangeid
            HAVING count(firmid) > 1
        )
        AND id NOT IN (
            SELECT id
            FROM `03test`
            WHERE (firmid, exchangeid, balance) IN (
                SELECT firmid, exchangeid, max(balance)
                FROM `03test`
                GROUP BY firmid, exchangeid
                HAVING count(firmid) > 1
            )
        )
    ) AS temp
);
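On MySQL 8.0 or later, the same cleanup can also be sketched with the ROW_NUMBER() window function. This is an alternative technique, not the method used above, and it behaves differently in one case: when two rows in a group tie for the largest balance, the statement above keeps both, while this sketch keeps only the one with the smaller id. The alias `ranked` and the column name `rn` are chosen here for illustration.

```sql
-- Requires MySQL 8.0+ (window functions).
-- Rank the rows inside each (firmid, exchangeid) group, largest balance first;
-- ties are broken by the smaller id. Every row ranked after the first is deleted.
DELETE FROM `03test`
WHERE id IN (
    SELECT id FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (
                PARTITION BY firmid, exchangeid
                ORDER BY balance DESC, id
            ) AS rn
        FROM `03test`
    ) AS ranked
    WHERE rn > 1
);
```

With the six-row test data, the only group with more than one row is firmid 10001 with exchangeid 2, so this deletes id 1 and keeps id 2, matching the result of the query above.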


Reposted from blog.csdn.net/weixin_40283570/article/details/80402029