How to delete duplicate data in batches through SQL


Preface

This is a short note on a problem I ran into at work.
Today I found a lot of duplicate rows in a database table. This article shows how to delete duplicate data in batches with SQL statements.


1. What are GROUP BY and HAVING?

Let's first look at GROUP BY and HAVING.
GROUP BY is a clause of the SQL SELECT statement, not an aggregate function. Literally, it means "grouping (GROUP) according to (BY) certain rules": it divides a data set into several smaller groups by the given rules and then processes each group, usually with an aggregate function such as COUNT or MIN. GROUP BY is optional and groups query results by one or more fields, so the result set contains one row per group instead of one row per record.
HAVING is used together with the GROUP BY clause to filter the grouped results. WHERE filters individual rows before aggregation, while HAVING filters groups after aggregation.

In short, GROUP BY groups query results by one or more fields, and HAVING then filters the resulting groups.
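
To make the difference between WHERE and HAVING concrete, here is a minimal sketch. The table demo_log and its rows are hypothetical and exist only for illustration; they are not part of the original data.

```sql
-- Hypothetical demo table; the name and columns are assumptions for illustration.
CREATE TABLE demo_log (
    task_id   INTEGER PRIMARY KEY,
    direction VARCHAR(20),
    weather   VARCHAR(20)
);

INSERT INTO demo_log (task_id, direction, weather) VALUES
    (1, 'north', 'sunny'),
    (2, 'north', 'sunny'),
    (3, 'south', 'rain'),
    (4, 'north', 'rain'),
    (5, 'north', 'sunny');

-- WHERE runs first and removes row 3 before any grouping happens.
-- GROUP BY then builds the groups (north, sunny) and (north, rain).
-- HAVING runs last and keeps only the groups with more than one row.
SELECT direction, weather, COUNT(*) AS row_count
FROM demo_log
WHERE direction = 'north'
GROUP BY direction, weather
HAVING COUNT(*) > 1;
-- Result: one row (north, sunny, 3)
```

The group (north, rain) contains a single row, so HAVING drops it. That is the same mechanism used to find duplicates: group by the columns that define "the same record" and keep the groups whose count is greater than 1.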

2. Write SQL

1. Query duplicate data

The code is as follows (example):

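-- Rows count as duplicates when all nine columns in GROUP BY are equal.
-- For each duplicated group, return the smallest task_id.
SELECT MIN(a.task_id) AS id
FROM
	PLMS_T_D_TaskExecuteLog a
GROUP BY
	a.car_desc,
	a.direction,
	a.distance,
	a.log_begin_time,
	a.log_desc,
	a.log_status,
	a.mai_subName,
	a.person_str,
	a.weather
HAVING
	COUNT(*) > 1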
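
Before deleting anything, it can help to see how many copies each duplicated group has. The sketch below is one possible way to do that. The table dup_demo and its data are hypothetical, and only two of the grouping columns are used to keep it short.

```sql
-- Hypothetical demo table with two of the grouping columns from the article.
CREATE TABLE dup_demo (
    task_id  INTEGER PRIMARY KEY,
    car_desc VARCHAR(20),
    weather  VARCHAR(20)
);

INSERT INTO dup_demo (task_id, car_desc, weather) VALUES
    (1, 'A', 'sunny'),
    (2, 'A', 'sunny'),
    (3, 'A', 'sunny'),
    (4, 'B', 'rain'),
    (5, 'C', 'fog'),
    (6, 'C', 'fog');

-- One row per duplicated group: how many copies exist, which id is the
-- smallest, and how many rows are surplus if one copy is kept.
SELECT car_desc,
       weather,
       COUNT(*)     AS copies,
       MIN(task_id) AS first_id,
       COUNT(*) - 1 AS extra_rows
FROM dup_demo
GROUP BY car_desc, weather
HAVING COUNT(*) > 1
ORDER BY MIN(task_id);
-- Result:
--   A | sunny | 3 | 1 | 2
--   C | fog   | 2 | 5 | 1
```

Adding up extra_rows gives the number of rows a full cleanup should remove, which is a useful number to compare against after the delete.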

2. Delete SQL

The code is as follows (example):

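-- Deletes the row with the smallest task_id from every duplicated group.
-- The target table must be the same table the subquery reads from.
-- A group with more than two copies loses only one row per run, so repeat
-- the statement until the query in step 1 returns no rows.
DELETE
FROM
	PLMS_T_D_TaskExecuteLog
WHERE
	task_id IN (
	SELECT MIN(a.task_id) AS id
	FROM
		PLMS_T_D_TaskExecuteLog a
	GROUP BY
		a.car_desc,
		a.direction,
		a.distance,
		a.log_begin_time,
		a.log_desc,
		a.log_status,
		a.mai_subName,
		a.person_str,
		a.weather
	HAVING
		COUNT(*) > 1)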
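
As a possible variation, and not the statement used above, the cleanup can be done in a single pass by keeping the smallest task_id of every group and deleting everything else. The sketch below uses a hypothetical table dedupe_demo and assumes task_id is a non-null unique key. The extra derived table (keepers) is there because some databases, MySQL for example, reject a DELETE whose subquery reads the target table directly.

```sql
-- Hypothetical demo table; the name and data are assumptions for illustration.
CREATE TABLE dedupe_demo (
    task_id  INTEGER PRIMARY KEY,
    car_desc VARCHAR(20),
    weather  VARCHAR(20)
);

INSERT INTO dedupe_demo (task_id, car_desc, weather) VALUES
    (1, 'A', 'sunny'),
    (2, 'A', 'sunny'),
    (3, 'A', 'sunny'),
    (4, 'B', 'rain'),
    (5, 'C', 'fog'),
    (6, 'C', 'fog');

-- Keep the smallest task_id of every group (duplicated or not)
-- and delete every other row in one statement.
DELETE FROM dedupe_demo
WHERE task_id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(task_id) AS keep_id
        FROM dedupe_demo
        GROUP BY car_desc, weather
    ) AS keepers
);

-- Check what is left.
SELECT task_id, car_desc, weather
FROM dedupe_demo
ORDER BY task_id;
-- Result:
--   1 | A | sunny
--   4 | B | rain
--   5 | C | fog
```

Whichever form is used, it is safer to run the delete inside a transaction or on a backup copy first and to confirm the remaining row count before committing.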

Summary

That is the statement I use to delete duplicate data in batches with SQL. Colleagues are welcome to share other methods.

Origin blog.csdn.net/hsuehgw/article/details/133089912