SQL query for duplicate data

a. Example table: emp

    emp_no  name  age
    001     Tom   17
    002     Sun   14
    003     Tom   15
    004     Tom   16

Requirement: list all records of people whose name is duplicated.

(1) The most intuitive idea: to list every record with a duplicated name, you must first know which names are duplicated:

    select name from emp group by name having count(*) > 1;

All records carrying those names are then:

    select * from emp where name in (select name from emp group by name having count(*) > 1);

(2) With a little more thought: compare each record's name against the whole table. If more than one record (the record itself included) carries that name, the record qualifies.

    select * from emp where (select count(*) from emp e where e.name = emp.name) > 1;
    -- Pay attention to this > 1: think about what you get with = 1, = 2 or > 2.
    -- If e were another table and the test were = 0, the result would be even more fun :)

How this works: when the record with employee number 001 is being judged, its name (emp.name) is taken first, and then every e.name in the table is compared against it. Note that e is an alias for emp.

Thinking a little further: if there is another record with the same name but a different employee number, this record meets the requirement:

    select * from emp where exists (select * from emp e where e.name = emp.name and e.emp_no <> emp.emp_no);

The join form of this idea is:

    select distinct emp.* from emp, emp e where emp.name = e.name and emp.emp_no <> e.emp_no;
    /* distinct is needed here: a name that appears three times, like Tom, matches two other rows,
       so without it each Tom record would be returned twice.
       The more standard way to write this statement is with an explicit join:
       select distinct emp.* from emp inner join emp e on emp.name = e.name and emp.emp_no <> e.emp_no;
       But I prefer the former way of writing; the point is that it is clearer. */

b.
Example table: emp

    name  age
    Tom   16
    Sun   14
    Tom   16
    Tom   16

Clear duplicates: filter out all redundant duplicate records.

(1) We know that distinct and group by both filter out duplicates, so the most intuitive queries are:

    select distinct * from emp;

or

    select name, age from emp group by name, age;

These return the required data. If you can use a temporary table, the cleanup itself is:

    select distinct * into #tmp from emp;
    delete from emp;
    insert into emp select * from #tmp;

(2) But what if you cannot use a temporary table? The rows cannot be told apart (they differ only in physical location, which makes no difference to SQL Server). Since none of the existing columns can distinguish them, the only way is to add another column that does. The best choice is an identity column:

    alter table emp add chk int identity(1,1);

Table after adding the column:

    name  age  chk
    Tom   16   1
    Sun   14   2
    Tom   16   3
    Tom   16   4

Duplicate records can be expressed as:

    select * from emp where (select count(*) from emp e where e.name = emp.name) > 1;

The records to remove are those that still have a same-named record with a larger chk, so the record with the largest chk in each group is kept:

    delete from emp where (select count(*) from emp e where e.name = emp.name and e.chk >= emp.chk) > 1;

Then drop the added column, and the result appears:
    alter table emp drop column chk;

(3) Another idea: look at

    select min(chk) from emp group by name having count(*) > 1;

This returns the minimum chk of each group of duplicate records, so you can delete everything else:

    delete from emp where chk not in (select min(chk) from emp group by name);

It can also be written in the form of a join.

c. Example table: emp

    emp_no  name  age
    001     Tom   17
    002     Sun   14
    003     Tom   15
    004     Tom   16

◆ Requirement: generate a serial number

(1) The simplest method, following the solution to problem b:

    alter table emp add chk int identity(1,1);

or

    select *, identity(int,1,1) chk into #tmp from emp;

◆ What if you need to control the order?

    select top 100000 *, identity(int,1,1) chk into #tmp from emp order by age;

(2) What if the table structure cannot be changed? If the records cannot be uniquely distinguished, there is no way. When each record can be uniquely distinguished, the count idea from a solves the problem:

    select emp.*, (select count(*) from emp e where e.emp_no <= emp.emp_no) from emp order by (select count(*) from emp e where e.emp_no <= emp.emp_no);

Reprinted from http://www.cnblogs.com/yellowapplemylove/archive/2011/04/19/2021519.html
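The main queries above can be tried end to end without a SQL Server instance. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) in place of SQL Server. The function name demo, the second table name emp_b, and the column alias seq are hypothetical, and an autoincrement primary key stands in for the identity(1,1) column, which SQLite does not have.

```python
import sqlite3


def demo():
    """Run three of the article's queries against in-memory sample data."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Table from section a: emp_no is unique, names repeat.
    cur.execute("create table emp (emp_no text, name text, age integer)")
    cur.executemany(
        "insert into emp values (?, ?, ?)",
        [("001", "Tom", 17), ("002", "Sun", 14),
         ("003", "Tom", 15), ("004", "Tom", 16)],
    )

    # Records whose name occurs more than once (correlated count).
    dup_rows = cur.execute(
        "select emp_no from emp "
        "where (select count(*) from emp e where e.name = emp.name) > 1 "
        "order by emp_no"
    ).fetchall()

    # Serial number without altering the table:
    # count the rows whose emp_no is <= the current one.
    numbered = cur.execute(
        "select emp_no, "
        "(select count(*) from emp e where e.emp_no <= emp.emp_no) as seq "
        "from emp order by seq"
    ).fetchall()

    # Table from section b: fully identical rows.
    # Assumption: chk as an autoincrement key replaces identity(1,1).
    cur.execute(
        "create table emp_b ("
        "chk integer primary key autoincrement, name text, age integer)"
    )
    cur.executemany(
        "insert into emp_b (name, age) values (?, ?)",
        [("Tom", 16), ("Sun", 14), ("Tom", 16), ("Tom", 16)],
    )

    # Keep only the row with the smallest chk for each name.
    cur.execute(
        "delete from emp_b "
        "where chk not in (select min(chk) from emp_b group by name)"
    )
    remaining = cur.execute(
        "select chk, name, age from emp_b order by chk"
    ).fetchall()

    conn.close()
    return dup_rows, numbered, remaining


if __name__ == "__main__":
    for result in demo():
        print(result)
```

With the sample data, the delete leaves one Tom row and one Sun row. The temporary-table and identity() variants are specific to SQL Server and are not reproduced in this sketch.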