I. Server information
1. Memory
[oracle@xmldb ~]$ free -g
             total       used       free     shared    buffers     cached
Mem:           125         92         33          0          0         59
-/+ buffers/cache:         32         93
Swap:           80          0         80
2. CPU
[oracle@xmldb ~]$ cat /proc/cpuinfo| grep "cpu cores"| uniq
cpu cores : 8
[oracle@yundingora ~]$ cat /proc/cpuinfo| grep "processor"| wc -l
32
3. I/O
Server I/O
[oracle@xmldb ~]$ dd if=/home/oracle/linuxx64_12201_database.zip of=/home/oracle/linuxx64_12201_database.zip.dd
6745501+1 records in
6745501+1 records out
3453696911 bytes (3.5 GB) copied, 25.2508 s, 137 MB/s
Database I/O
The data files reside on a separate disk array; these figures will be added later.
II. Database information
1. Database memory information
SQL> show parameter ga;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
allow_group_access_to_sga            boolean     FALSE
lock_sga                             boolean     FALSE
pga_aggregate_target                 big integer 0
sga_max_size                         big integer 1088M
sga_target                           big integer 0
unified_audit_sga_queue_size         integer     1048576
III. Table information
1. Table statistics
select a.table_name,a.partitioned,a.degree,b.num_cols,a.num_rows,round(a.blocks*8/1024,2) as size_m,a.logging,a.last_analyzed
from all_tables a,(select table_name, count(*) as num_cols from user_tab_columns group by table_name) b
where a.table_name='TB_DELETE_TEST'
and a.table_name=b.table_name;
TABLE_NAME     PAR   DEGREE       NUM_COLS    NUM_ROWS   SIZE_M LOG   LAST_ANAL
-------------- ----- ---------- ---------- ----------- -------- ----- ---------
TB_DELETE_TEST NO    1                  11   107946703 11011.56 YES   21-APR-20
Elapsed: 00:00:00.79
-- The table has no indexes.
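The SIZE_M figure above multiplies BLOCKS by 8, which assumes an 8 KB block size, and BLOCKS comes from optimizer statistics as of LAST_ANALYZED. As a cross-check, the following is a minimal sketch that reads the allocated size from the segment views instead. It assumes the table is owned by the current schema and that the session can query USER_SEGMENTS and USER_TABLESPACES; the column aliases are illustrative.

```sql
-- Allocated size of the table segment(s), independent of optimizer statistics.
select s.segment_name,
       t.block_size,                               -- actual block size of the tablespace, in bytes
       sum(s.blocks)                    as alloc_blocks,
       round(sum(s.bytes)/1024/1024, 2) as alloc_size_m
  from user_segments s
  join user_tablespaces t
    on t.tablespace_name = s.tablespace_name
 where s.segment_name = 'TB_DELETE_TEST'
   and s.segment_type like 'TABLE%'                -- TABLE, TABLE PARTITION, ...
 group by s.segment_name, t.block_size;
```

The allocated size is normally somewhat larger than the statistics-based figure, because it also counts blocks above the high-water mark.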
2.1 Case with few duplicate records
FI_QRY@orcl>select count(*) as distinct_2_cols_cnts from (select /*+parallel(30)*/ distinct acc,med_no,med_op_date from TB_DELETE_TEST);
DISTINCT_2_COLS_CNTS
--------------------
107946694
Elapsed: 00:00:29.56
select 107946703-107946694 as repeat_cnts from dual;
REPEAT_CNTS
-----------
9
Elapsed: 00:00:00.00
2.2 Case with many duplicate records
FI_QRY@orcl>select count(*) as distinct_2_cols_cnts from (select /*+parallel(30)*/ distinct ACC,PAPER_NO from TB_DELETE_TEST);
DISTINCT_2_COLS_CNTS
--------------------
94681760
Elapsed: 00:00:30.88
-- Number of duplicate records
FI_QRY@orcl>select 107946703-94681760 as repeat_cnts from dual;
REPEAT_CNTS
-----------
13264943
Elapsed: 00:00:00.00
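The two-step calculation above (total rows minus distinct keys) can also be done in a single pass. The following is a sketch, not the query used in this test; the aliases dup_keys and repeat_cnts are illustrative, and the hint degree is simply copied from the queries above.

```sql
-- For every key that occurs more than once, the surplus is (rows in group - 1).
select /*+ parallel(30) */
       count(*)     as dup_keys,      -- number of keys that occur more than once
       sum(cnt - 1) as repeat_cnts    -- rows a keep-one deduplication would remove
  from (select count(*) as cnt
          from TB_DELETE_TEST
         group by acc, paper_no
        having count(*) > 1);
```

On unchanged data, repeat_cnts should agree with the subtraction above. It is NULL rather than 0 when there are no duplicates at all, so wrap it in NVL if a number is required.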
IV. Efficient deduplication
3.1 With few duplicate records, a DML statement is sufficient
(Note the hints in every query block.)
FI_QRY@orcl>delete /*+RULE parallel(8)*/ from TB_DELETE_TEST a
where exists (select /*+parallel(8)*/ 1
from ( select /*+parallel(30)*/ rowid rid,row_number() over (partition by acc,med_no,med_op_date order by rowid) rn from TB_DELETE_TEST) b
where b.rn <> 1 and a.rowid=b.rid);
9 rows deleted.
Elapsed: 00:03:28.89
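One caveat, stated as general Oracle behavior rather than something measured in this test: a parallel hint on DELETE parallelizes the delete operation itself only when parallel DML is enabled for the session. Otherwise only the query part runs in parallel. The following is a minimal sketch of the same keep-one delete with parallel DML enabled. It uses ROWID IN instead of EXISTS, which is an equivalent formulation chosen here for readability and not the statement that was timed above.

```sql
-- Parallel DML is off by default and has to be enabled per session.
alter session enable parallel dml;

-- Keep the row with the lowest ROWID in each (acc, med_no, med_op_date) group
-- and delete the rest.
delete /*+ parallel(8) */ from TB_DELETE_TEST a
 where a.rowid in (
       select rid
         from (select rowid as rid,
                      row_number() over (partition by acc, med_no, med_op_date
                                         order by rowid) as rn
                 from TB_DELETE_TEST)
        where rn > 1);

-- After a parallel DML statement the session must commit (or roll back)
-- before it can read the table again.
commit;
```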
3.2 With many duplicate records, rebuilding the table with DDL is recommended
(Note the hints in every query block.)
FI_QRY@orcl>create /*+parallel(30)*/ table TB_DELETE_TEST_NEW as select /*+parallel(30)*/ DISTINCT * from TB_DELETE_TEST;
Table created.
Elapsed: 00:01:26.46
FI_QRY@orcl>rename TB_DELETE_TEST to TB_DELETE_TEST_OLD;
Table renamed.
Elapsed: 00:00:00.99
FI_QRY@orcl>rename TB_DELETE_TEST_NEW to TB_DELETE_TEST;
Table renamed.
Elapsed: 00:00:00.02
FI_QRY@orcl>drop table TB_DELETE_TEST_OLD purge;
Table dropped.
Elapsed: 00:00:00.56
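Note that DISTINCT * removes only rows that are identical in every column, whereas section 2.2 counted duplicates by (ACC, PAPER_NO). If the goal is to keep exactly one row per key, the rebuild can select by key instead. The following is a sketch under these assumptions: the key is (ACC, PAPER_NO), any one row per key may be kept, and the degree of 30 is copied from the statements above. It also uses the PARALLEL and NOLOGGING clauses of CREATE TABLE, because to my knowledge a comment placed after CREATE is not in a hint position.

```sql
-- Build the deduplicated copy: one row (the lowest ROWID) per (acc, paper_no).
create table TB_DELETE_TEST_NEW
  parallel 30 nologging
as
select /*+ parallel(30) */ *
  from TB_DELETE_TEST
 where rowid in (select min(rowid)
                   from TB_DELETE_TEST
                  group by acc, paper_no);

-- Restore normal attributes so later statements do not inherit them.
alter table TB_DELETE_TEST_NEW noparallel;
alter table TB_DELETE_TEST_NEW logging;
```

The rename and drop steps then proceed as shown above. CREATE TABLE AS SELECT does not copy indexes, grants, triggers, or most constraints; the test table has no indexes, but on a real table these would need to be recreated before the old table is dropped.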
For comparison, running the DELETE from 3.1 against this many duplicates takes far longer:
FI_QRY@orcl>delete /*+RULE parallel(8)*/ from TB_DELETE_TEST1 a
where exists (select /*+parallel(8)*/ 1
from ( select /*+parallel(30)*/ rowid rid,row_number() over (partition by acc,paper_no order by rowid) rn from TB_DELETE_TEST1) b
where b.rn <> 1 and a.rowid=b.rid);
13264943 rows deleted.
Elapsed: 03:01:20.54
Other options: for example, partition the table and have a program delete the duplicates one partition at a time.
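The partition-by-partition idea could look roughly like the PL/SQL sketch below. It is hypothetical: the test table in this article is not partitioned, and the approach is only correct if all rows with the same key land in the same partition (for example, a table hash-partitioned on ACC). The key (ACC, PAPER_NO) is taken from section 2.2.

```sql
begin
  -- Walk the partitions in order and deduplicate each one in its own transaction.
  for p in (select partition_name
              from user_tab_partitions
             where table_name = 'TB_DELETE_TEST'
             order by partition_position)
  loop
    -- Partition names are quoted because the dictionary stores them case-exact.
    execute immediate
      'delete from TB_DELETE_TEST partition ("' || p.partition_name || '") a
        where a.rowid in (
              select rid
                from (select rowid as rid,
                             row_number() over (partition by acc, paper_no
                                                order by rowid) as rn
                        from TB_DELETE_TEST partition ("' || p.partition_name || '"))
               where rn > 1)';
    commit;  -- keeps each transaction, and its undo, limited to one partition
  end loop;
end;
/
```

Committing per partition keeps undo usage bounded and lets the job be stopped and resumed, at the cost of the table being partially deduplicated while it runs.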
In summary: when there are few duplicates, the DELETE shown above removes them quickly while keeping one row per group. When there are many duplicates, rebuilding the table with DDL is recommended.
In addition, if the table holds fewer than tens of millions of rows and the duplicates number in the millions, the DELETE above can still complete the deduplication within a few minutes. For details, see:
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:15258974323143