1. Finding duplicates
Test table:
+----+--------+------+---------+---------+------------+
| id | firmid | name | balance | holdsum | exchangeid |
+----+--------+------+---------+---------+------------+
| 1  | 10001  | aa   | 100.10  | 100     | 2          |
| 2  | 10001  | aa   | 899.20  | 100     | 2          |
| 3  | 10002  | bb   | 98.20   | 100     | 2          |
| 4  | 10002  | bb   | 11.20   | 100     | 1          |
| 5  | 10001  | aa   | 899.20  | 100     | 4          |
| 6  | 10003  | cc   | 1101.20 | 100     | 1          |
+----+--------+------+---------+---------+------------+
Several columns in the table contain repeated values. Starting with firmid, find the duplicated values:
SELECT
    firmid,
    count(*)
FROM
    `01test`
GROUP BY
    firmid
HAVING
    count(firmid) > 1;
Result:
+--------+----------+
| firmid | count(*) |
+--------+----------+
| 10001  | 3        |
| 10002  | 2        |
+--------+----------+
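You can try the query without a MySQL server. The sketch below uses Python's built-in sqlite3 module, with SQLite standing in for MySQL. That substitution is an assumption, and the two SQL dialects differ in places. ROWS and make_db are hypothetical helpers that load the six test rows shown above. The ORDER BY is added only so the output order is deterministic.

```python
import sqlite3

# Hypothetical fixture: the six rows of the test table shown above.
ROWS = [
    (1, 10001, "aa", 100.10, 100, 2),
    (2, 10001, "aa", 899.20, 100, 2),
    (3, 10002, "bb", 98.20, 100, 2),
    (4, 10002, "bb", 11.20, 100, 1),
    (5, 10001, "aa", 899.20, 100, 4),
    (6, 10003, "cc", 1101.20, 100, 1),
]


def make_db(table):
    """Hypothetical helper: an in-memory SQLite database holding ROWS in `table`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE `{table}` (id INTEGER PRIMARY KEY, firmid INTEGER, "
        "name TEXT, balance REAL, holdsum INTEGER, exchangeid INTEGER)"
    )
    conn.executemany(f"INSERT INTO `{table}` VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


conn = make_db("01test")
# Same GROUP BY / HAVING query as above, plus ORDER BY for a stable order.
dupes = conn.execute(
    """
    SELECT firmid, count(*)
    FROM `01test`
    GROUP BY firmid
    HAVING count(firmid) > 1
    ORDER BY firmid
    """
).fetchall()
print(dupes)  # [(10001, 3), (10002, 2)]
```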
When a duplicate is defined by several columns at once, GROUP BY all of them, for example:
SELECT
    firmid, exchangeid, count(firmid)
FROM
    `01test`
GROUP BY
    firmid, exchangeid
HAVING
    count(firmid) > 1;
Result:
+--------+------------+---------------+
| firmid | exchangeid | count(firmid) |
+--------+------------+---------------+
| 10001  | 2          | 2             |
+--------+------------+---------------+
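Here is a minimal sketch of the multi-column GROUP BY. It again uses Python's sqlite3 module with SQLite as a stand-in for MySQL. It assumes 01test holds the six test rows, and ROWS and make_db are hypothetical helpers.

```python
import sqlite3

# Hypothetical fixture: the six rows of the test table.
ROWS = [
    (1, 10001, "aa", 100.10, 100, 2),
    (2, 10001, "aa", 899.20, 100, 2),
    (3, 10002, "bb", 98.20, 100, 2),
    (4, 10002, "bb", 11.20, 100, 1),
    (5, 10001, "aa", 899.20, 100, 4),
    (6, 10003, "cc", 1101.20, 100, 1),
]


def make_db(table):
    """Hypothetical helper: an in-memory SQLite database holding ROWS in `table`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE `{table}` (id INTEGER PRIMARY KEY, firmid INTEGER, "
        "name TEXT, balance REAL, holdsum INTEGER, exchangeid INTEGER)"
    )
    conn.executemany(f"INSERT INTO `{table}` VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


conn = make_db("01test")
# A group is a duplicate only if BOTH firmid and exchangeid repeat.
pairs = conn.execute(
    """
    SELECT firmid, exchangeid, count(firmid)
    FROM `01test`
    GROUP BY firmid, exchangeid
    HAVING count(firmid) > 1
    """
).fetchall()
print(pairs)  # [(10001, 2, 2)]
```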
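Because GROUP BY collapses each set of duplicates into a single result row, the queries above only report which values are duplicated. To delete duplicates, you need an extra condition that picks the one row to keep in each group.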
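2. Removing duplicates
Example 1: delete the rows whose firmid is duplicated, keeping the row with the smallest id.
- Find the duplicate rows that need to be deleted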
SELECT * FROM `02test` WHERE firmid IN
    (SELECT
        firmid
    FROM
        `02test`
    GROUP BY
        firmid
    HAVING count(firmid) > 1)
AND id NOT IN
    (SELECT
        min(id)
    FROM
        `02test`
    GROUP BY
        firmid
    HAVING count(firmid) > 1);
The rows returned are exactly the ones to delete:
+----+--------+------+---------+---------+------------+
| id | firmid | name | balance | holdsum | exchangeid |
+----+--------+------+---------+---------+------------+
| 2  | 10001  | aa   | 899.20  | 100     | 2          |
| 4  | 10002  | bb   | 11.20   | 100     | 1          |
| 5  | 10001  | aa   | 899.20  | 100     | 4          |
+----+--------+------+---------+---------+------------+
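This sketch checks the same selection logic under stated assumptions. SQLite, via Python's sqlite3 module, stands in for MySQL. The table 02test is assumed to hold the six test rows, since the result above matches them. ROWS and make_db are hypothetical helpers. The query selects only id rather than *, to keep the output short.

```python
import sqlite3

# Hypothetical fixture: assumed contents of 02test (the six test rows).
ROWS = [
    (1, 10001, "aa", 100.10, 100, 2),
    (2, 10001, "aa", 899.20, 100, 2),
    (3, 10002, "bb", 98.20, 100, 2),
    (4, 10002, "bb", 11.20, 100, 1),
    (5, 10001, "aa", 899.20, 100, 4),
    (6, 10003, "cc", 1101.20, 100, 1),
]


def make_db(table):
    """Hypothetical helper: an in-memory SQLite database holding ROWS in `table`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE `{table}` (id INTEGER PRIMARY KEY, firmid INTEGER, "
        "name TEXT, balance REAL, holdsum INTEGER, exchangeid INTEGER)"
    )
    conn.executemany(f"INSERT INTO `{table}` VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


conn = make_db("02test")
# Rows in a duplicated firmid group, minus the smallest id of each group.
to_delete = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM `02test`
        WHERE firmid IN (SELECT firmid FROM `02test`
                         GROUP BY firmid HAVING count(firmid) > 1)
          AND id NOT IN (SELECT min(id) FROM `02test`
                         GROUP BY firmid HAVING count(firmid) > 1)
        ORDER BY id
        """
    )
]
print(to_delete)  # [2, 4, 5]
```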
- Delete those rows
Using the primary-key ids returned by the query above, delete the rows from the table:
DELETE FROM `02test` WHERE id IN (
    SELECT id FROM `02test` WHERE firmid IN
        (SELECT
            firmid
        FROM
            `02test`
        GROUP BY
            firmid
        HAVING count(firmid) > 1)
    AND id NOT IN
        (SELECT
            min(id)
        FROM
            `02test`
        GROUP BY
            firmid
        HAVING count(firmid) > 1)
);
However, MySQL rejects this statement with an error:
ERROR 1093 (HY000): You can't specify target table '02test' for update in FROM clause
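This error means that MySQL does not let a DELETE statement read from the table it is deleting from inside a subquery. MySQL does not support this directly, but a temporary table works around it. Wrap the subquery in one more SELECT over a derived table (AS temp). MySQL then builds the list of ids in a temporary table first, so the DELETE no longer selects from 02test directly. The statement becomes: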
DELETE FROM `02test` WHERE id IN (SELECT id FROM (
    SELECT id FROM `02test` WHERE firmid IN
        (SELECT
            firmid
        FROM
            `02test`
        GROUP BY
            firmid
        HAVING count(firmid) > 1)
    AND id NOT IN
        (SELECT
            min(id)
        FROM
            `02test`
        GROUP BY
            firmid
        HAVING count(firmid) > 1)
) AS temp);
This runs successfully. Querying the table again gives:
+----+--------+------+---------+---------+------------+
| id | firmid | name | balance | holdsum | exchangeid |
+----+--------+------+---------+---------+------------+
| 1  | 10001  | aa   | 100.10  | 100     | 2          |
| 3  | 10002  | bb   | 98.20   | 100     | 2          |
| 6  | 10003  | cc   | 1101.20 | 100     | 1          |
+----+--------+------+---------+---------+------------+
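The sketch below runs the derived-table DELETE end to end, under several assumptions. SQLite, via Python's sqlite3 module, stands in for MySQL, and 02test holds the six test rows. ROWS and make_db are hypothetical helpers. SQLite does not raise MySQL's error 1093, so this cannot reproduce the error. It only confirms that the wrapped statement removes the expected rows.

```python
import sqlite3

# Hypothetical fixture: assumed contents of 02test (the six test rows).
ROWS = [
    (1, 10001, "aa", 100.10, 100, 2),
    (2, 10001, "aa", 899.20, 100, 2),
    (3, 10002, "bb", 98.20, 100, 2),
    (4, 10002, "bb", 11.20, 100, 1),
    (5, 10001, "aa", 899.20, 100, 4),
    (6, 10003, "cc", 1101.20, 100, 1),
]


def make_db(table):
    """Hypothetical helper: an in-memory SQLite database holding ROWS in `table`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE `{table}` (id INTEGER PRIMARY KEY, firmid INTEGER, "
        "name TEXT, balance REAL, holdsum INTEGER, exchangeid INTEGER)"
    )
    conn.executemany(f"INSERT INTO `{table}` VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


conn = make_db("02test")
# The derived table "temp" mirrors the MySQL workaround; SQLite would also
# accept the unwrapped form.
conn.execute(
    """
    DELETE FROM `02test` WHERE id IN (SELECT id FROM (
        SELECT id FROM `02test`
        WHERE firmid IN (SELECT firmid FROM `02test`
                         GROUP BY firmid HAVING count(firmid) > 1)
          AND id NOT IN (SELECT min(id) FROM `02test`
                         GROUP BY firmid HAVING count(firmid) > 1)
    ) AS temp)
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM `02test` ORDER BY id")]
print(remaining)  # [1, 3, 6]
```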
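Example 2: delete rows that share both firmid and exchangeid, keeping the one with the largest balance (duplicates defined by several columns).
First, find the duplicate rows: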
SELECT * FROM `03test` WHERE (firmid, exchangeid) IN (
    SELECT
        firmid, exchangeid
    FROM
        `03test`
    GROUP BY
        firmid,
        exchangeid
    HAVING
        count(firmid) > 1);
+----+--------+------+---------+---------+------------+
| id | firmid | name | balance | holdsum | exchangeid |
+----+--------+------+---------+---------+------------+
| 1  | 10001  | aa   | 100.10  | 100     | 2          |
| 2  | 10001  | aa   | 899.20  | 100     | 2          |
+----+--------+------+---------+---------+------------+
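Here is a minimal sketch of the row-value comparison (firmid, exchangeid) IN (subquery). It uses Python's sqlite3 module with SQLite standing in for MySQL. Row values require SQLite 3.15.0 or newer. The sketch assumes 03test holds the six test rows, which is consistent with the result above, and ROWS and make_db are hypothetical helpers.

```python
import sqlite3

# Hypothetical fixture: assumed contents of 03test (the six test rows).
ROWS = [
    (1, 10001, "aa", 100.10, 100, 2),
    (2, 10001, "aa", 899.20, 100, 2),
    (3, 10002, "bb", 98.20, 100, 2),
    (4, 10002, "bb", 11.20, 100, 1),
    (5, 10001, "aa", 899.20, 100, 4),
    (6, 10003, "cc", 1101.20, 100, 1),
]


def make_db(table):
    """Hypothetical helper: an in-memory SQLite database holding ROWS in `table`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE `{table}` (id INTEGER PRIMARY KEY, firmid INTEGER, "
        "name TEXT, balance REAL, holdsum INTEGER, exchangeid INTEGER)"
    )
    conn.executemany(f"INSERT INTO `{table}` VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


conn = make_db("03test")
# Row-value IN: match rows whose (firmid, exchangeid) pair is duplicated.
dup_rows = conn.execute(
    """
    SELECT * FROM `03test`
    WHERE (firmid, exchangeid) IN (
        SELECT firmid, exchangeid FROM `03test`
        GROUP BY firmid, exchangeid
        HAVING count(firmid) > 1)
    ORDER BY id
    """
).fetchall()
print([row[0] for row in dup_rows])  # [1, 2]
```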
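Next, find the rows to delete: every duplicate except the row with the largest balance in its (firmid, exchangeid) group. If several rows in a group tie for the largest balance, this query keeps all of them.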
SELECT * FROM `03test` WHERE (firmid, exchangeid) IN (
    SELECT
        firmid, exchangeid
    FROM
        `03test`
    GROUP BY
        firmid,
        exchangeid
    HAVING
        count(firmid) > 1)
AND id NOT IN (
    SELECT id FROM `03test` WHERE (firmid, exchangeid, balance) IN (
        SELECT
            firmid, exchangeid, max(balance)
        FROM
            `03test`
        GROUP BY
            firmid,
            exchangeid
        HAVING
            count(firmid) > 1)
);
+----+--------+------+---------+---------+------------+
| id | firmid | name | balance | holdsum | exchangeid |
+----+--------+------+---------+---------+------------+
| 1  | 10001  | aa   | 100.10  | 100     | 2          |
+----+--------+------+---------+---------+------------+
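This sketch checks the "keep the largest balance" selection. SQLite, via Python's sqlite3 module, stands in for MySQL, and row values need SQLite 3.15.0 or newer. It assumes 03test holds the six test rows, and ROWS and make_db are hypothetical helpers. Only the id column is selected.

```python
import sqlite3

# Hypothetical fixture: assumed contents of 03test (the six test rows).
ROWS = [
    (1, 10001, "aa", 100.10, 100, 2),
    (2, 10001, "aa", 899.20, 100, 2),
    (3, 10002, "bb", 98.20, 100, 2),
    (4, 10002, "bb", 11.20, 100, 1),
    (5, 10001, "aa", 899.20, 100, 4),
    (6, 10003, "cc", 1101.20, 100, 1),
]


def make_db(table):
    """Hypothetical helper: an in-memory SQLite database holding ROWS in `table`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE `{table}` (id INTEGER PRIMARY KEY, firmid INTEGER, "
        "name TEXT, balance REAL, holdsum INTEGER, exchangeid INTEGER)"
    )
    conn.executemany(f"INSERT INTO `{table}` VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


conn = make_db("03test")
# Duplicated (firmid, exchangeid) rows, minus the max-balance row(s) per group.
to_delete = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM `03test`
        WHERE (firmid, exchangeid) IN (
            SELECT firmid, exchangeid FROM `03test`
            GROUP BY firmid, exchangeid HAVING count(firmid) > 1)
          AND id NOT IN (
            SELECT id FROM `03test`
            WHERE (firmid, exchangeid, balance) IN (
                SELECT firmid, exchangeid, max(balance) FROM `03test`
                GROUP BY firmid, exchangeid HAVING count(firmid) > 1))
        ORDER BY id
        """
    )
]
print(to_delete)  # [1]
```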
Delete the rows, again wrapping the subquery in a derived table (AS temp) to avoid error 1093:
DELETE FROM `03test` WHERE id IN (SELECT id FROM (
    SELECT id FROM `03test` WHERE (firmid, exchangeid) IN (
        SELECT
            firmid, exchangeid
        FROM
            `03test`
        GROUP BY
            firmid,
            exchangeid
        HAVING
            count(firmid) > 1)
    AND id NOT IN (
        SELECT id FROM `03test` WHERE (firmid, exchangeid, balance) IN (
            SELECT
                firmid, exchangeid, max(balance)
            FROM
                `03test`
            GROUP BY
                firmid,
                exchangeid
            HAVING
                count(firmid) > 1)
    )
) AS temp);
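Here is a minimal end-to-end sketch of the Example 2 delete. SQLite, via Python's sqlite3 module, stands in for MySQL, and row values need SQLite 3.15.0 or newer. The sketch assumes 03test holds the six test rows, and ROWS and make_db are hypothetical helpers. The remaining ids it prints follow from that assumed data; the original text does not show them.

```python
import sqlite3

# Hypothetical fixture: assumed contents of 03test (the six test rows).
ROWS = [
    (1, 10001, "aa", 100.10, 100, 2),
    (2, 10001, "aa", 899.20, 100, 2),
    (3, 10002, "bb", 98.20, 100, 2),
    (4, 10002, "bb", 11.20, 100, 1),
    (5, 10001, "aa", 899.20, 100, 4),
    (6, 10003, "cc", 1101.20, 100, 1),
]


def make_db(table):
    """Hypothetical helper: an in-memory SQLite database holding ROWS in `table`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE `{table}` (id INTEGER PRIMARY KEY, firmid INTEGER, "
        "name TEXT, balance REAL, holdsum INTEGER, exchangeid INTEGER)"
    )
    conn.executemany(f"INSERT INTO `{table}` VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


conn = make_db("03test")
# Same statement as above: the id list is built inside the derived table "temp".
conn.execute(
    """
    DELETE FROM `03test` WHERE id IN (SELECT id FROM (
        SELECT id FROM `03test`
        WHERE (firmid, exchangeid) IN (
            SELECT firmid, exchangeid FROM `03test`
            GROUP BY firmid, exchangeid HAVING count(firmid) > 1)
          AND id NOT IN (
            SELECT id FROM `03test`
            WHERE (firmid, exchangeid, balance) IN (
                SELECT firmid, exchangeid, max(balance) FROM `03test`
                GROUP BY firmid, exchangeid HAVING count(firmid) > 1))
    ) AS temp)
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM `03test` ORDER BY id")]
print(remaining)  # [2, 3, 4, 5, 6]
```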