When you need to delete a large amount of data from a table, generally meaning more than 10% of its records, how should the delete be performed so that it runs efficiently and has relatively little impact on the rest of the system?
Below is a test, followed by an analysis of the results and the conclusions drawn from them.
1. Create a database
use master
go

if exists(select * from sys.databases where name = 'test')
    drop database test
go

create database test
go
2. Create a table
use test
go

if exists(select * from sys.tables where name = 't')
    drop table t
go

create table t(
    i int,
    v varchar(100) default replicate('a', 100),
    vv varchar(100) default replicate('a', 100),
    vvv varchar(100) default replicate('a', 100)
);
3. Insert data
Insert 100,000 records with the following code; it takes 9 seconds:
declare @i int;

set @i = 1

begin tran

while @i <= 100000
begin
    insert into t(i) values(@i)
    set @i = @i + 1
end

commit tran
If the following code is used instead to insert the same 100,000 records, it takes 43 seconds:
declare @i int;

set @i = 1

while @i <= 100000
begin
    begin tran
    insert into t(i) values(@i)  -- committing after every single insert is inefficient
    commit tran
    set @i = @i + 1
end
Then double the data repeatedly by inserting the table into itself; running this 6 times takes 1 minute 38 seconds:
insert into t
select * from t
go 6
This leaves a total of 6.4 million rows in the table.
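As a quick sanity check (not part of the original timing), the final row count can be verified; 100,000 initial rows doubled 6 times gives 100,000 × 2^6 = 6,400,000:

-- Sanity check: 100,000 initial rows doubled 6 times = 6,400,000
select count(*) from t
go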
4. Create an index
create index idx_t_idx1 on t(i)
5. Apply the following settings to keep SQL Server from using too much memory, which could crash the system
sp_configure 'show advanced options', 1
go
reconfigure
go

sp_configure 'max server memory (MB)', 3584
go
reconfigure
go
6. Copy the data from table t into two new tables, t1 and t2, and index t1 only
if exists(select * from sys.tables where name = 't1')
    drop table t1
go

select * into t1 from t

create index idx_t1_idx1 on t1(i)
go

if exists(select * from sys.tables where name = 't2')
    drop table t2
go

select * into t2 from t
7. Delete from t1 in batches through the index. Each batch covers 1,000 values of i, and each value appears 64 times (the data was doubled 6 times), so each batch removes 64,000 rows. The loop runs 10 batches, deleting 640,000 rows in total, and takes 82 seconds:
dbcc dropcleanbuffers
go

declare @i int = 20000;
declare @start_time datetime; -- = getdate();

while @i < 30000
begin
    set @start_time = GETDATE();

    delete from t1 where i >= @i and i <= @i + 999

    set @i += 1000

    select DATEDIFF(second, @start_time, getdate())
end
8. Delete the same range from t2 (no index) in a single statement; it takes 44 seconds:
delete from t2
where i >= 20000 and i < 30000
The tests above show the following:
1. For a large number of insert operations, committing once after all the work is done is far more efficient than committing after every single insert.
2. When deleting a large amount of data, even with an index, and even combining the index with batching, the delete is not as fast as a direct delete that scans the whole table without an index.
However, a table scan locks the entire table and blocks other transactions, which can bring the business of a large system to a standstill.
So although the direct method is faster, deleting in batches through the index locks only the batch of rows currently being removed; the remaining rows stay unlocked, and the blocking problem is far smaller.
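One way to observe this difference (an illustrative check, not part of the original test) is to query sys.dm_tran_locks from a second session while each delete is running: the full-scan delete on t2 tends to escalate to an object-level lock on the table, while a small indexed batch on t1 holds key and page locks instead:

-- Run from a second session while a delete is in progress (illustrative sketch):
-- a full-scan delete typically shows an OBJECT-level lock on the table,
-- while an indexed batch delete shows KEY/PAGE locks instead.
select resource_type, request_mode, request_status, count(*) as lock_count
from sys.dm_tran_locks
where resource_database_id = db_id('test')
group by resource_type, request_mode, request_status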
3. Combining the two points above: for a high-volume operation, a single commit at the end gives the best overall throughput, but it can cause blocking, because the transaction holds its locks until it commits and other transactions must wait on them.
Likewise, a direct delete may be more efficient, but it locks the whole table and can cause serious blocking. Deleting in batches through the index is somewhat slower, but it is the equivalent of committing batch by batch: each batch locks, via the index, only the rows it needs to remove, other rows stay unlocked, and blocking is unlikely.
So for a huge delete operation: if it has to go through a full table scan, run it at night or during a maintenance window when the system is idle; if it must run during the day, consider batching through the index to reduce blocking, as in the sketch below, though the system will still feel some impact, particularly on memory.
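As a practical template for the daytime scenario, here is a minimal sketch of an indexed, batched delete that also pauses between batches so blocked transactions can proceed; the batch size of 1,000 rows and the 1-second delay are illustrative assumptions to tune, not values measured above:

-- Minimal sketch: delete in small batches through the index on i,
-- pausing between batches. Batch size and delay are assumptions to tune.
declare @rows int = 1;

while @rows > 0
begin
    delete top (1000) from t1 where i >= 20000 and i < 30000;
    set @rows = @@ROWCOUNT;   -- 0 when no matching rows remain
    waitfor delay '00:00:01'; -- give blocked transactions a chance to run
end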