Oracle: deleting duplicate data

 
 
 
 
SQL statements for finding and deleting duplicate records
 
1. Find the redundant duplicate records in a table, where duplicates are determined by a single field (Id)

select Id from table_name group by Id having count(Id) > 1 -- the values of this field that occur more than once

select * from table_name where Id in (select Id from table_name group by Id having count(Id) > 1) -- all records whose Id value is duplicated
 
2. Delete the redundant duplicate records from the table, where duplicates are determined by a single field (Id), keeping only the record with the smallest ROWID

DELETE FROM table_name WHERE (id) IN (SELECT id FROM table_name GROUP BY id HAVING COUNT(id) > 1) AND ROWID NOT IN (SELECT MIN(ROWID) FROM table_name GROUP BY id HAVING COUNT(*) > 1);

Duplicates are identified by that field. The redundant rows are removed and, in each group, only the row with the smallest ROWID is kept. (ROWID is Oracle's row-address pseudocolumn, not a row count.)
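
The same "keep the smallest ROWID" rule can also be written as a correlated subquery, a commonly used Oracle idiom. This is only a sketch: table_name and id are the same placeholder names used above, and it has not been run against real data.

```sql
-- Placeholder names: table_name, id (duplicates are judged on id alone).
-- A row is deleted when another row with the same id has a smaller ROWID,
-- so exactly one row (the one with the smallest ROWID) survives per id.
DELETE FROM table_name a
 WHERE a.ROWID > (SELECT MIN(b.ROWID)
                    FROM table_name b
                   WHERE b.id = a.id);
```

Rows whose id is NULL are left untouched, because b.id = a.id is never true for NULL values.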
 
 
 
3. Find the redundant duplicate records in a table (duplicates determined by multiple fields)

select * from table_name a where (a.Id, a.seq) in (select Id, seq from table_name group by Id, seq having count(*) > 1)

4. Delete the redundant duplicate records from the table (multiple fields), keeping only the record with the smallest ROWID

delete from table_name a where (a.Id, a.seq) in (select Id, seq from table_name group by Id, seq having count(*) > 1) and rowid not in (select min(rowid) from table_name group by Id, seq having count(*) > 1)

5. Find the redundant duplicate records in the table (multiple fields), excluding the record with the smallest ROWID

select * from table_name a where (a.Id, a.seq) in (select Id, seq from table_name group by Id, seq having count(*) > 1) and rowid not in (select min(rowid) from table_name group by Id, seq having count(*) > 1)
 
 

I: Duplicates determined by a single field

1. First, find the redundant rows in the table by querying on the key field (Name).

select * from OA_ADDRESS_BOOK where name in (select name from OA_ADDRESS_BOOK group by name having count(name)>1)

 

2. Delete the duplicate rows from the table, where duplicates are determined by a single field (Name), keeping only the record with the smallest ROWID

delete from OA_ADDRESS_BOOK where (Name) in
(select Name from OA_ADDRESS_BOOK group by Name having count(Name) > 1)
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name having count(Name) > 1)

 

II: Duplicates determined by multiple fields

1. First, find the duplicate rows in the table by querying on the key fields (Name, UNIT_ID).

select * from OA_ADDRESS_BOOK book1 where (book1.name,book1.unit_id) in 
(select book2.name,book2.unit_id from OA_ADDRESS_BOOK book2 group by  book2.name,book2.unit_id  having count(*)>1)

 

2. Delete the duplicate rows from the table, where duplicates are determined by multiple fields (Name, UNIT_ID), keeping only the record with the smallest ROWID

 

 

delete from OA_ADDRESS_BOOK a where (a.Name,a.UNIT_ID) in 
(select Name,UNIT_ID from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*) > 1) 
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*)>1)

3. Find the duplicate rows in the table, where duplicates are determined by multiple fields (Name, UNIT_ID), excluding the record with the smallest ROWID
 
select name,unit_id from OA_ADDRESS_BOOK a where (a.Name,a.UNIT_ID) in 
(select Name,UNIT_ID from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*) > 1) 
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*)>1)
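
As an alternative to the pair of subqueries, the surplus rows can be numbered with the ROW_NUMBER() analytic function. The sketch below assumes the same OA_ADDRESS_BOOK table and the same (Name, UNIT_ID) key as above; the aliases rid and rn are made up for illustration.

```sql
-- Preview: every row except the smallest-ROWID row of each (Name, UNIT_ID) group.
SELECT name, unit_id
  FROM (SELECT name,
               unit_id,
               ROW_NUMBER() OVER (PARTITION BY name, unit_id
                                  ORDER BY ROWID) AS rn
          FROM OA_ADDRESS_BOOK)
 WHERE rn > 1;

-- Delete the same rows: rn = 1 is the row kept in each group.
DELETE FROM OA_ADDRESS_BOOK
 WHERE ROWID IN (SELECT rid
                   FROM (SELECT ROWID AS rid,
                                ROW_NUMBER() OVER (PARTITION BY name, unit_id
                                                   ORDER BY ROWID) AS rn
                           FROM OA_ADDRESS_BOOK)
                  WHERE rn > 1);
```

One behavioral difference to keep in mind: PARTITION BY places NULL values in the same group, whereas the (Name, UNIT_ID) IN (...) form never matches rows whose key contains NULL. If either column is nullable, the two approaches may delete different rows.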
 
 

1. Problem Description

The BBSCOMMENT table is a child table of BBSDETAIL and records reviews of businesses. Because the data has been moved back and forth several times, it contains a lot of duplicate rows. The table structure is as follows:

COMMENT_ID NOT NULL NUMBER -- primary key
DETAIL_ID NOT NULL NUMBER -- foreign key, references the BBSDETAIL table
COMMENT_BODY NOT NULL VARCHAR2(500) -- review content

-- other fields omitted

The primary key itself is never duplicated. What is duplicated is the combination DETAIL_ID + COMMENT_BODY + ... and the other fields, that is, the reviews of some businesses appear more than once.

2. Resolution steps

2.1 Find the redundant duplicate records in the table

-- Query all duplicated data
SELECT DETAIL_ID, COMMENT_BODY, COUNT(*)
FROM BBSCOMMENT
GROUP BY DETAIL_ID, COMMENT_BODY
HAVING COUNT(*) > 1
ORDER BY DETAIL_ID, COMMENT_BODY; -- 1,955 rows

2.2 Show all non-duplicate data

-- This statement shows all data with the duplicates collapsed
SELECT MIN(COMMENT_ID) AS COMMENT_ID, DETAIL_ID, COMMENT_BODY
FROM BBSCOMMENT
GROUP BY DETAIL_ID, COMMENT_BODY; -- 21,453 rows. Why is this not equal to the total number of records in the table minus 1,955? Because some of those 1,955 records are repeated more than once.
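
The relationship between these counts can be checked with a single query. This is a sketch against the same BBSCOMMENT table; the column aliases are made up for illustration.

```sql
-- One inner row per (DETAIL_ID, COMMENT_BODY) group, with its size in cnt.
SELECT SUM(cnt)                                 AS total_rows,        -- all rows in the table
       COUNT(*)                                 AS rows_to_keep,      -- one row per group
       SUM(cnt) - COUNT(*)                      AS rows_to_delete,    -- surplus copies
       SUM(CASE WHEN cnt > 1 THEN 1 ELSE 0 END) AS duplicated_groups  -- groups with cnt > 1
  FROM (SELECT COUNT(*) AS cnt
          FROM BBSCOMMENT
         GROUP BY DETAIL_ID, COMMENT_BODY);
```

With the figures quoted above, rows_to_keep corresponds to 21,453 and duplicated_groups to 1,955. rows_to_delete equals duplicated_groups only when every duplicated group contains exactly two rows; any group with three or more copies makes it larger.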

2.3 If the number of records is small (on the order of thousands), the statement above can be used as a subquery for the delete

-- If the table does not hold much data (fewer than about 1,000 rows), the statement above can be used as a subquery for the delete
DELETE FROM BBSCOMMENT WHERE COMMENT_ID NOT IN (
    SELECT MIN(COMMENT_ID)
    FROM BBSCOMMENT
    GROUP BY DETAIL_ID, COMMENT_BODY
); -- 782 seconds in my case, with about 20,000 records and more than 2,000 duplicates (far too slow!)

2.4 Another way to delete

-- This statement achieves the same result, but it is untested because my data had already been deleted
-- Delete condition 1: the row has duplicates; condition 2: keep the row with the smallest ROWID.
DELETE FROM BBSCOMMENT a
WHERE
    (a.DETAIL_ID, a.COMMENT_BODY) IN (SELECT DETAIL_ID, COMMENT_BODY FROM BBSCOMMENT GROUP BY DETAIL_ID, COMMENT_BODY HAVING COUNT(*) > 1)
    AND ROWID NOT IN (SELECT MIN(ROWID) FROM BBSCOMMENT GROUP BY DETAIL_ID, COMMENT_BODY HAVING COUNT(*) > 1);

2.5 For large amounts of data, PL/SQL is more convenient

DECLARE
    -- Record structure (the FOR loop below declares its own loop record implicitly)
    TYPE bbscomment_type IS RECORD
    (
        comment_id   BBSCOMMENT.COMMENT_ID%TYPE,
        detail_id    BBSCOMMENT.DETAIL_ID%TYPE,
        comment_body BBSCOMMENT.COMMENT_BODY%TYPE
    );
    bbscomment_record bbscomment_type;

    -- Variables holding the previous row's values for comparison
    v_comment_id   BBSCOMMENT.COMMENT_ID%TYPE;
    v_detail_id    BBSCOMMENT.DETAIL_ID%TYPE;
    v_comment_body BBSCOMMENT.COMMENT_BODY%TYPE;

    -- Other variables
    v_batch_size INTEGER := 5000;
    v_counter    INTEGER := 0;

    CURSOR cur_dupl IS
        -- Fetch every record that has duplicates
        SELECT COMMENT_ID, DETAIL_ID, COMMENT_BODY
        FROM BBSCOMMENT
        WHERE (DETAIL_ID, COMMENT_BODY) IN (
            -- These records are duplicated
            SELECT DETAIL_ID, COMMENT_BODY
            FROM BBSCOMMENT
            GROUP BY DETAIL_ID, COMMENT_BODY
            HAVING COUNT(*) > 1)
        -- COMMENT_ID is added to the sort so the row kept per group is the smallest COMMENT_ID
        ORDER BY DETAIL_ID, COMMENT_BODY, COMMENT_ID;
BEGIN
    FOR bbscomment_record IN cur_dupl LOOP
        IF v_detail_id IS NULL
           OR bbscomment_record.detail_id != v_detail_id
           OR NVL(bbscomment_record.comment_body, ' ') != NVL(v_comment_body, ' ') THEN
            -- First row overall, or first row of a new group: remember it and keep it
            v_detail_id := bbscomment_record.detail_id;
            v_comment_body := bbscomment_record.comment_body;
        ELSE
            -- Every further row of the same group is deleted
            DELETE FROM BBSCOMMENT WHERE COMMENT_ID = bbscomment_record.comment_id;
            v_counter := v_counter + 1;

            IF MOD(v_counter, v_batch_size) = 0 THEN
                -- Commit once every v_batch_size deletions
                COMMIT;
            END IF;
        END IF;
    END LOOP;

    IF v_counter > 0 THEN
        -- Final commit
        COMMIT;
    END IF;

    DBMS_OUTPUT.PUT_LINE(TO_CHAR(v_counter) || ' records were deleted!');
EXCEPTION
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('SQLERRM -> ' || SQLERRM);
        ROLLBACK;
END;
/
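
For larger tables, the row-by-row loop above could also be replaced by a bulk variant that collects the ROWIDs of the surplus rows in batches and deletes each batch with FORALL. The block below is a sketch only and has not been run against the author's data. The names rowid_tab, v_rowids, v_total, rid and rn are made up for illustration, and it keeps the row with the smallest COMMENT_ID in each group, as section 2.3 does.

```sql
DECLARE
    -- Hypothetical collection type holding one batch of ROWIDs
    TYPE rowid_tab IS TABLE OF ROWID INDEX BY PLS_INTEGER;
    v_rowids     rowid_tab;
    v_batch_size PLS_INTEGER := 5000;
    v_total      PLS_INTEGER := 0;

    CURSOR cur_dupl IS
        -- rn = 1 is the row kept per (DETAIL_ID, COMMENT_BODY) group
        SELECT rid
          FROM (SELECT ROWID AS rid,
                       ROW_NUMBER() OVER (PARTITION BY DETAIL_ID, COMMENT_BODY
                                          ORDER BY COMMENT_ID) AS rn
                  FROM BBSCOMMENT)
         WHERE rn > 1;
BEGIN
    OPEN cur_dupl;
    LOOP
        -- Fetch up to v_batch_size ROWIDs at a time
        FETCH cur_dupl BULK COLLECT INTO v_rowids LIMIT v_batch_size;
        EXIT WHEN v_rowids.COUNT = 0;

        -- Delete the whole batch in one round trip to the SQL engine
        FORALL i IN 1 .. v_rowids.COUNT
            DELETE FROM BBSCOMMENT WHERE ROWID = v_rowids(i);

        v_total := v_total + SQL%ROWCOUNT;
    END LOOP;
    CLOSE cur_dupl;

    -- Single commit at the end, so the open cursor is never fetched across a commit
    COMMIT;
    DBMS_OUTPUT.PUT_LINE(TO_CHAR(v_total) || ' records were deleted!');
EXCEPTION
    WHEN OTHERS THEN
        ROLLBACK;
        DBMS_OUTPUT.PUT_LINE('SQLERRM -> ' || SQLERRM);
END;
/
```

Committing only once keeps the whole delete in a single transaction, which needs enough undo space for all deleted rows. Whether that trade-off is acceptable depends on the size of the table.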


Origin www.cnblogs.com/JIKes/p/11583687.html