Querying duplicate data in a SQL table

SQL query for duplicate data

a. Example table: emp

   emp_no         name    age     
    001           Tom      17     
    002           Sun      14     
    003           Tom      15     
    004           Tom      16

Requirement:

List all records of people whose name is duplicated

(1) The most intuitive idea: to list every record whose name is duplicated, you must first know which names are duplicated:

select name from emp group by name having count(*) > 1;

All records with a repeated name are then:

select * from emp where name in
(select name from emp group by name having count(*) > 1);
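
The two steps above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for the real server; the variable names dup_names and dup_rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (emp_no text, name text, age integer)")
conn.executemany(
    "insert into emp values (?, ?, ?)",
    [("001", "Tom", 17), ("002", "Sun", 14), ("003", "Tom", 15), ("004", "Tom", 16)],
)

# Step 1: names that occur more than once.
dup_names = conn.execute(
    "select name from emp group by name having count(*) > 1"
).fetchall()

# Step 2: every record that carries one of those names.
dup_rows = conn.execute(
    "select * from emp where name in "
    "(select name from emp group by name having count(*) > 1) "
    "order by emp_no"
).fetchall()

print(dup_names)
print(dup_rows)
```

With the sample data, only Tom is reported as a duplicated name, and the three Tom records are returned.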

(2) With a little more thought: compare each record's name against the whole table. If more than one record (the record itself included) has that name, the record qualifies.

select * from emp where (select count(*) from emp e where e.name = emp.name) > 1;

-- Pay attention to the > 1 here. Think about what you get with = 1, = 2 or > 2. If e were a different table and the test were = 0, the result would be even more interesting :)

When the row with employee number 001 is being checked, its name (emp.name) is taken first, and then every e.name in the table is compared with it and the matches are counted.

Note that e is an alias for emp.
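
To see what the different thresholds return, here is a rough sketch, again assuming Python's sqlite3 module with an in-memory database in place of the real server. The helper emp_nos is hypothetical and exists only for this experiment.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (emp_no text, name text, age integer)")
conn.executemany(
    "insert into emp values (?, ?, ?)",
    [("001", "Tom", 17), ("002", "Sun", 14), ("003", "Tom", 15), ("004", "Tom", 16)],
)


def emp_nos(condition):
    """Return the emp_no of every row whose same-name count satisfies `condition`."""
    sql = (
        "select emp_no from emp "
        "where (select count(*) from emp e where e.name = emp.name) "
        + condition
        + " order by emp_no"
    )
    return [row[0] for row in conn.execute(sql)]


# Try the thresholds mentioned in the comment above.
results = {cond: emp_nos(cond) for cond in ("> 1", "= 1", "= 2", "> 2")}
for cond, nos in results.items():
    print(cond, nos)
```

With this data, = 1 returns only the unique name (002), = 2 returns nothing because Tom appears three times, and > 1 and > 2 both return the three Tom rows.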

Thinking a little further: if another record exists with the same name but a different employee number, then this record meets the requirement:

select * from emp where exists
(select * from emp e where e.name = emp.name and e.emp_no <> emp.emp_no);

The join form of this idea is:

select distinct emp.* from emp, emp e where emp.name = e.name and emp.emp_no <> e.emp_no;
/* distinct is needed because a name shared by three or more records would otherwise return each record once per matching partner.
The more standard way to write this statement uses an explicit join:
select distinct emp.* from emp inner join emp e on emp.name = e.name and emp.emp_no <> e.emp_no;
But I prefer the former; to me it reads more clearly. */
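
The exists form and the join form can be compared side by side. This is only a sketch, assuming Python's sqlite3 module and an in-memory SQLite database instead of the real server; the variable names are invented for the comparison. It shows that a self-join returns a row once for every same-name partner unless distinct is used.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (emp_no text, name text, age integer)")
conn.executemany(
    "insert into emp values (?, ?, ?)",
    [("001", "Tom", 17), ("002", "Sun", 14), ("003", "Tom", 15), ("004", "Tom", 16)],
)

condition = "emp.name = e.name and emp.emp_no <> e.emp_no"

# exists: each qualifying row appears exactly once.
rows_exists = conn.execute(
    "select * from emp where exists "
    "(select * from emp e where " + condition + ") order by emp_no"
).fetchall()

# Self-join without distinct: a row appears once per same-name partner.
rows_join = conn.execute(
    "select emp.* from emp inner join emp e on " + condition
    + " order by emp.emp_no"
).fetchall()

# Self-join with distinct: one copy per row again.
rows_distinct = conn.execute(
    "select distinct emp.* from emp inner join emp e on " + condition
    + " order by emp.emp_no"
).fetchall()

print(len(rows_exists), len(rows_join), len(rows_distinct))
```

Each Tom record has two same-name partners, so the join without distinct yields six rows where the other two forms yield three.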


  
b. Example table: emp     
name     age     
Tom       16     
Sun       14     
Tom       16     
Tom       16

-------------------- Remove duplicates --------------------
Filter out all redundant duplicate records
(1) We know that distinct and group by both filter out duplicates, so the most intuitive approach is:

select distinct * from emp;
or
select name, age from emp group by name, age;

These return the required data. To actually clean the table, if you can use a temporary table, one solution is:

select distinct * into #tmp from emp;
delete from emp;
insert into emp select * from #tmp;
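
The same three steps can be sketched outside SQL Server. The version below assumes Python's sqlite3 module and an in-memory SQLite database; since SQLite has no select ... into #tmp, it uses create temp table ... as select as the closest equivalent, and the table name tmp is an assumption.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (name text, age integer)")
conn.executemany(
    "insert into emp values (?, ?)",
    [("Tom", 16), ("Sun", 14), ("Tom", 16), ("Tom", 16)],
)

# Copy the distinct rows aside (SQLite's counterpart of select ... into #tmp).
conn.execute("create temp table tmp as select distinct * from emp")
# Empty the original table, then refill it from the de-duplicated copy.
conn.execute("delete from emp")
conn.execute("insert into emp select * from tmp")
conn.execute("drop table tmp")

remaining = sorted(conn.execute("select name, age from emp").fetchall())
print(remaining)
```

After the refill the table holds one Tom row and one Sun row.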

(2) But what if you can't use a temporary table?
The duplicate rows cannot be told apart: they differ only in physical location, which SQL Server does not expose, so to a query they are identical. The idea is therefore to find a way to distinguish them. Since none of the existing columns can do that, the only option is to add another column. Which column should be added? The best choice is an identity column:

alter table emp add chk int identity(1,1);

Table example:

    name   age   chk     
    Tom     16     1     
    Sun     14     2     
    Tom     16     3     
    Tom     16     4

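Duplicate records can be expressed as follows (in this sample, rows with the same name also have the same age, so comparing name alone is enough):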

select * from emp where (select count(*) from emp e where e.name = emp.name)>1;

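The rows to remove are those that have a later copy (a row with the same name and a larger chk), so the copy with the largest chk in each group is kept: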

delete from emp
where (select count(*) from emp e where e.name = emp.name and e.chk >= emp.chk)>1;

Then drop the added column to get the final result:

alter table emp drop column chk;
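
Here is a rough sketch of this approach, assuming Python's sqlite3 module and an in-memory SQLite database rather than SQL Server. SQLite cannot add an identity column to an existing table, so as a substitute a plain chk column is added and filled from SQLite's built-in rowid. The final drop column step is left out because older SQLite versions do not support it.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (name text, age integer)")
conn.executemany(
    "insert into emp values (?, ?)",
    [("Tom", 16), ("Sun", 14), ("Tom", 16), ("Tom", 16)],
)

# Substitute for "add chk int identity(1,1)": a plain column copied from rowid.
conn.execute("alter table emp add column chk integer")
conn.execute("update emp set chk = rowid")

# Delete every row that still has a same-name row with an equal or larger chk
# besides itself, which leaves the row with the largest chk per name.
conn.execute(
    "delete from emp where "
    "(select count(*) from emp e where e.name = emp.name and e.chk >= emp.chk) > 1"
)

remaining = conn.execute("select name, age, chk from emp order by chk").fetchall()
print(remaining)
```

Of the three Tom rows, only the one with the largest chk survives, and the single Sun row is untouched.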


(3) Another idea: look at

select min(chk) from emp group by name having count(*) > 1;

This gives the minimum chk for each group of duplicate records, so you can keep only the row with the minimum chk for each name:

delete from emp where chk not in (select min(chk) from emp group by name);
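
A small sketch of this variant follows, assuming Python's sqlite3 module and an in-memory SQLite database in place of SQL Server. Here the chk column is created together with the table as an autoincrement key, which is an assumption made to keep the example short.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table emp (name text, age integer, "
    "chk integer primary key autoincrement)"
)
conn.executemany(
    "insert into emp (name, age) values (?, ?)",
    [("Tom", 16), ("Sun", 14), ("Tom", 16), ("Tom", 16)],
)

# Smallest chk of each duplicated name (only Tom is duplicated here).
group_minimums = [
    row[0]
    for row in conn.execute(
        "select min(chk) from emp group by name having count(*) > 1"
    )
]

# Keep the row with the smallest chk for every name and delete the rest.
conn.execute(
    "delete from emp where chk not in (select min(chk) from emp group by name)"
)

remaining = conn.execute("select name, age, chk from emp order by chk").fetchall()
print(group_minimums)
print(remaining)
```

Note the difference from approach (2): this version keeps the first copy in each group (smallest chk), while the count-based delete keeps the last one.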

It can also be written in the form of a join.

c. Example table: emp

    emp_no         name    age     
    001            Tom      17     
    002            Sun      14     
    003            Tom      15     
    004            Tom      16

◆ Requirement: generate a serial number
(1) The simplest method, following the solution to problem b:

alter table emp add chk int identity(1,1);
or
select *, identity(int,1,1) chk into #tmp from emp;

◆ What if you need to control the order of the numbering?

select top 100000 *, identity(int,1,1) chk into #tmp from emp order by age;

(2) What if the table structure cannot be changed?
If the records cannot be uniquely distinguished, there is no way. When each record can be uniquely distinguished, the counting idea from problem a solves this:

select emp.*, (select count(*) from emp e where e.emp_no <= emp.emp_no)   
    from emp
    order by  (select count(*) from emp e where e.emp_no <= emp.emp_no);
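
The counting idea can be checked with a short sketch, assuming Python's sqlite3 module and an in-memory SQLite database instead of the real server. The column alias seq is an assumption added for readability, and the rows are inserted out of order on purpose to show that the numbering depends on emp_no, not on insertion order.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (emp_no text, name text, age integer)")
# Insert the sample rows out of order on purpose.
conn.executemany(
    "insert into emp values (?, ?, ?)",
    [("003", "Tom", 15), ("001", "Tom", 17), ("004", "Tom", 16), ("002", "Sun", 14)],
)

# A row's serial number is the count of rows whose emp_no is not larger than its own.
numbered = conn.execute(
    "select emp.*, "
    "(select count(*) from emp e where e.emp_no <= emp.emp_no) as seq "
    "from emp order by seq"
).fetchall()

for row in numbered:
    print(row)
```

Because emp_no is unique, every row gets a distinct number from 1 to 4. If the comparison column had ties, tied rows would share the same number.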

Reprinted from http://www.cnblogs.com/yellowapplemylove/archive/2011/04/19/2021519.html