A Few Things About Deduplication in MySQL

Requirement

A table contains a batch of data that business staff need to work through. Because the table holds a large number of duplicate rows, we need to deduplicate it, keeping just one row per group, to lighten the staff's workload.

First, I searched the web and found a few approaches, listed below:

Use SQL to delete the redundant duplicate rows, keeping only one row per group.
1. Find the redundant duplicate records, where a duplicate is judged by a single field (teamId):

select * from team where teamId in (select teamId from team group by teamId having count(teamId) > 1)

2. Delete the redundant duplicates (here grouped by teamName), keeping only the record with the smallest teamId:

delete from team where
teamName in (select teamName from team group by teamName having count(teamName) > 1)
and teamId not in (select min(teamId) from team group by teamName having count(teamName) > 1)
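One caveat before trying the statement above: MySQL rejects a DELETE whose subquery reads the same table being deleted from (ERROR 1093); wrapping the subquery in a derived table sidesteps this. The sketch below exercises that keep-the-smallest-id pattern against an in-memory SQLite database (SQLite is used purely for illustration; the table and column names follow the examples above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (teamId INTEGER, teamName TEXT)")
conn.executemany("INSERT INTO team VALUES (?, ?)",
                 [(1, "alpha"), (2, "alpha"), (3, "beta"), (4, "alpha"), (5, "beta")])

# Keep only the row with the smallest teamId per teamName.
# The extra "SELECT ... FROM (...) t" wrapper is what makes this legal in
# MySQL, which otherwise refuses to modify a table referenced in a subquery.
conn.execute("""
    DELETE FROM team
    WHERE teamId NOT IN (
        SELECT minId FROM (
            SELECT MIN(teamId) AS minId FROM team GROUP BY teamName
        ) t
    )
""")
rows = conn.execute("SELECT teamId, teamName FROM team ORDER BY teamId").fetchall()
print(rows)  # [(1, 'alpha'), (3, 'beta')]
```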

1. Find the redundant duplicate records (multiple fields):
select * from team t
where (t.teamId, t.teamOrg) in (select teamId, teamOrg from team group by teamId, teamOrg having count(*) > 1)

2. Delete the redundant duplicate records (multiple fields), keeping only the record with the smallest rowid (note: rowid is Oracle syntax; in MySQL substitute the table's primary key):
delete from team t
where (t.teamId, t.teamOrg) in (select teamId, teamOrg from team group by teamId, teamOrg having count(*) > 1)
and rowid not in (select min(rowid) from team group by teamId, teamOrg having count(*) > 1)

3. Find the redundant duplicate records (multiple fields), excluding the one with the smallest rowid:
select * from team t
where (t.teamId, t.teamOrg) in (select teamId, teamOrg from team group by teamId, teamOrg having count(*) > 1)
and rowid not in (select min(rowid) from team group by teamId, teamOrg having count(*) > 1)
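Since MySQL has no rowid, an alternative on MySQL 8.0+ (which supports window functions) is to rank the rows inside each duplicate group with ROW_NUMBER() and treat everything ranked above 1 as a duplicate. A sketch, run here against SQLite (3.25+) purely for illustration, with a surrogate primary key `id` standing in for rowid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (id INTEGER PRIMARY KEY, teamId INTEGER, teamOrg TEXT)")
conn.executemany("INSERT INTO team (teamId, teamOrg) VALUES (?, ?)",
                 [(1, "org1"), (1, "org1"), (2, "org1"), (1, "org2")])

# Rank rows within each (teamId, teamOrg) group; rank > 1 means duplicate.
dupes = conn.execute("""
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY teamId, teamOrg ORDER BY id) AS rn
        FROM team
    ) ranked
    WHERE rn > 1
""").fetchall()
dup_ids = [d[0] for d in dupes]
print(dup_ids)  # [2]
```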

1. Remove the first character from the left of a field:

update tableName set Title = right(Title, char_length(Title) - 1) where Title like '村%'

2. Remove the first character from the right of a field:

update tableName set Title = left(Title, char_length(Title) - 1) where Title like '%村'
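To sanity-check the trimming pattern, here is a small sketch against SQLite, where substr() plays the role of MySQL's RIGHT()/LEFT(); the table name, column, and the literal 'x' are made-up demo values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Title TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("xABC",), ("ABCx",), ("keep",)])

# Strip a leading 'x' (substr(Title, 2) drops the first character).
conn.execute("UPDATE t SET Title = substr(Title, 2) WHERE Title LIKE 'x%'")
# Strip a trailing 'x' (keep everything but the last character).
conn.execute("UPDATE t SET Title = substr(Title, 1, length(Title) - 1) WHERE Title LIKE '%x'")
titles = [r[0] for r in conn.execute("SELECT Title FROM t ORDER BY rowid").fetchall()]
print(titles)  # ['ABC', 'ABC', 'keep']
```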

1. Soft-delete the redundant duplicate records by flagging them instead of removing them:
update team set ispass = -1
where teamId in (select teamId from team group by teamId having count(teamId) > 1)

1. In Oracle, use rowid to delete the redundant data:

(1) In Oracle, every record has a rowid. The rowid is unique across the database and pins down which data file, block, and row the record occupies.

(2) Among duplicate records, every column value may be identical, but the rowids always differ. So it is enough to find, within each group of duplicates, the record with the largest rowid, and delete all the others:

delete from team t
where t.rowid < (select max(rowid) from team b where b.teamName = t.teamName)

However, while running the statements above I hit a serious problem: they all lean on IN subqueries. In particular, delete from team t where (t.teamId, t.teamOrg) in (select teamId, teamOrg from team group by teamId, teamOrg having count(*) > 1) and rowid not in (select min(rowid) from team group by teamId, teamOrg having count(*) > 1) is barely runnable in practice, especially when the table is large and teamId and teamOrg are neither a primary key nor indexed.

The second deduplication approach is implemented in Java. The idea: collect the ids of the duplicate rows, then finish with delete from team where id in (the collected duplicate ids).
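That final batch delete can be issued in chunks so the IN list never grows unbounded. A sketch with SQLite; the ids and the chunk size of 3 are arbitrary sample values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO team VALUES (?)", [(i,) for i in range(1, 11)])

dup_ids = [1, 2, 5, 7, 8, 9]  # ids collected beforehand (sample values)
CHUNK = 3                     # arbitrary batch size bounding the IN list
for i in range(0, len(dup_ids), CHUNK):
    chunk = dup_ids[i:i + CHUNK]
    placeholders = ",".join("?" * len(chunk))
    conn.execute(f"DELETE FROM team WHERE id IN ({placeholders})", chunk)
remaining = [r[0] for r in conn.execute("SELECT id FROM team ORDER BY id")]
print(remaining)  # [3, 4, 6, 10]
```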

The test code is as follows:

Mapper layer

    /**
     * Find the ids of the duplicate rows within one (taskNo, link) group,
     * ordered by id so the surviving row is deterministic.
     *
     * @param taskNo task number
     * @param link   link
     * @return list of ids
     * @date 2018/8/7 16:02
     */
    @Select("SELECT id FROM team " +
            "WHERE task_no = #{taskNo} AND link = #{link} ORDER BY id")
    List<Integer> selectDuplicateIds(@Param("taskNo") String taskNo, @Param("link") String link);

    /**
     * Find the (task_no, link) pairs that occur more than once.
     * The LIMIT caps the batch size, for database performance and user experience.
     * @return list
     * @date 2018/8/7 16:42
     */
    @Select("SELECT task_no, `link` FROM `team` " +
            "GROUP BY `link`, task_no HAVING count(*) > 1 "
            + "LIMIT 1000"
    )
    List<Team> selectDuplicate();

The service layer interface is omitted here.

    /**
     * Remove duplicate rows from the database.
     * @return the number of duplicate rows deleted
     * @date 2018/8/7 16:47
     */
    @Override
    public Integer removeDuplicate() {
        int result = 0;
        List<Team> list = teamMapper.selectDuplicate();
        for (Team info : list) {
            List<Integer> ids = teamMapper.selectDuplicateIds(info.getTaskNo(), info.getLink());
            // Delete every id except the last, so exactly one row per group survives.
            for (int i = 0; i < ids.size() - 1; i++) {
                teamMapper.deleteByPrimaryKey(ids.get(i));
                result++;
            }
        }
        }
        return result;
    }
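The service loop above boils down to "group rows by (task_no, link), keep the last id, delete the rest". That core logic can be mirrored in plain code and unit-tested without touching a database (a sketch; the tuple layout and sample values are assumptions for the demo):

```python
def pick_duplicate_ids(rows):
    """rows: list of (id, task_no, link) tuples; returns the ids to delete,
    keeping the last id seen in each (task_no, link) group -- mirroring the
    service loop, which deletes all but the final id."""
    groups = {}
    for row_id, task_no, link in rows:
        groups.setdefault((task_no, link), []).append(row_id)
    to_delete = []
    for ids in groups.values():
        to_delete.extend(ids[:-1])  # keep the last one, drop the rest
    return to_delete

rows = [
    (1, "T1", "http://a"),
    (2, "T1", "http://a"),
    (3, "T2", "http://b"),
    (4, "T1", "http://a"),
]
print(pick_duplicate_ids(rows))  # [1, 2]
```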

The controller layer is omitted.

On the front end there is a dedup button: when duplicate data is spotted, clicking it runs one round of deduplication and returns the number of rows removed; once the count comes back 0, no duplicates remain.

The above implements the interface; below is a test that collects the duplicate ids.


@RunWith(SpringRunner.class)
@SpringBootTest
public class AppTests {

    @Resource
    private TeamMapper teamMapper;

    @Test
    public void contextLoads() {
        System.out.println("hello world");
    }


    /**
     * Collect the ids of duplicate rows (or delete them directly by
     * uncommenting the line in the loop).
     */
    @Test
    public void remDuplicate() {
        List<Team> list = teamMapper.selectDuplicate();
        List<Integer> all = new ArrayList<>();
        for (Team info : list) {
            List<Integer> ids = teamMapper.selectDuplicateIds(info.getTaskNo(), info.getLink());
            for (int i = 0; i < ids.size() - 1; i++) {
//                teamMapper.deleteByPrimaryKey(ids.get(i));
                all.add(ids.get(i));
            }
        }
        System.out.println(all.toString());
    }

}

Reposted from blog.csdn.net/hacker_Lees/article/details/81502046