Preface
This is a short note on a problem I ran into at work.
Today I found a lot of duplicate rows in a database table. This article shows how to delete duplicate data in bulk with SQL statements.
1. What are GROUP BY and HAVING?
Let's first look at GROUP BY and HAVING.
GROUP BY is a clause in the SQL language, not an aggregate function, although it is usually used together with aggregate functions such as COUNT or MIN. The literal meaning in English is "grouping (Group) according to (by) certain rules": it divides a data set into several smaller groups by those rules and then processes each group separately. In SQL, GROUP BY is an optional clause of the SELECT statement that groups query results by one or more fields. The result contains one row per group, so it usually has fewer rows than the input.
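To make the grouping concrete, here is a minimal sketch on a made-up table. The table demo_task_log and its rows are assumptions for illustration only and are not part of the real schema; the multi-row INSERT form is assumed to be supported by your database.

```sql
-- Hypothetical demo table, not part of the original schema.
CREATE TABLE demo_task_log (
    task_id   INT PRIMARY KEY,
    direction VARCHAR(20)
);

INSERT INTO demo_task_log (task_id, direction) VALUES
    (1, 'north'),
    (2, 'north'),
    (3, 'south');

-- Three input rows collapse into one output row per distinct direction.
SELECT direction, COUNT(*) AS row_count
FROM demo_task_log
GROUP BY direction
ORDER BY direction;

-- Result:
--   north | 2
--   south | 1
```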
HAVING is used together with the GROUP BY clause to filter the grouped results. The WHERE keyword filters individual rows before aggregation, while the HAVING keyword filters groups after aggregation, so a HAVING condition can use aggregate functions such as COUNT(*).
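The difference between the two filters is easier to see side by side. The following is a small sketch; the table demo_weather_log, its columns, and its rows are invented for this example.

```sql
-- Hypothetical demo table, not part of the original schema.
CREATE TABLE demo_weather_log (
    task_id  INT PRIMARY KEY,
    weather  VARCHAR(20),
    distance INT
);

INSERT INTO demo_weather_log (task_id, weather, distance) VALUES
    (1, 'sunny', 10),
    (2, 'sunny', 20),
    (3, 'rainy', 5),
    (4, 'rainy', 7),
    (5, 'rainy', 30),
    (6, 'foggy', 15);

SELECT weather, COUNT(*) AS row_count
FROM demo_weather_log
WHERE distance < 25        -- row filter: drops task 5 before grouping
GROUP BY weather
HAVING COUNT(*) > 1        -- group filter: drops 'foggy', which has one row
ORDER BY weather;

-- Result:
--   rainy | 2
--   sunny | 2
```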
In short, GROUP BY groups query results by one or more fields, and HAVING then filters the resulting groups.
2. Writing the SQL
1. Query duplicate data
The code is as follows (example):
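-- For each set of rows that match on every listed column,
-- return the smallest task_id. HAVING keeps only sets with more than one row.
SELECT MIN(a.task_id) AS id
FROM
    PLMS_T_D_TaskExecuteLog a
GROUP BY
    a.car_desc,
    a.direction,
    a.distance,
    a.log_begin_time,
    a.log_desc,
    a.log_status,
    a.mai_subName,
    a.person_str,
    a.weather
HAVING
    COUNT(*) > 1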
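Before deleting anything, it can help to see how many copies each duplicate group has. The variant below is only a sketch of the same query with two extra columns; the aliases first_id, last_id and copies are names chosen here for illustration.

```sql
-- Sketch: one row per duplicate group, largest groups first.
SELECT
    MIN(a.task_id) AS first_id,   -- lowest id in the group
    MAX(a.task_id) AS last_id,    -- highest id in the group
    COUNT(*)       AS copies      -- number of identical rows
FROM PLMS_T_D_TaskExecuteLog a
GROUP BY
    a.car_desc, a.direction, a.distance,
    a.log_begin_time, a.log_desc, a.log_status,
    a.mai_subName, a.person_str, a.weather
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

A group whose copies value is greater than 2 holds more than one extra row, which matters when choosing how to delete.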
2. Delete duplicate data
The code is as follows (example):
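-- Deletes the row with the smallest task_id in every duplicate group.
-- Each run removes one row per group, so groups with more than two
-- copies need the statement to be run again until no rows are affected.
DELETE
FROM
    PLMS_T_D_TaskExecuteLog
WHERE
    task_id IN (
        SELECT MIN(a.task_id) AS id
        FROM
            PLMS_T_D_TaskExecuteLog a
        GROUP BY
            a.car_desc,
            a.direction,
            a.distance,
            a.log_begin_time,
            a.log_desc,
            a.log_status,
            a.mai_subName,
            a.person_str,
            a.weather
        HAVING
            COUNT(*) > 1
    )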
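The statement above removes one row per duplicate group each time it runs. A possible variant removes every extra copy in one pass. It is a sketch, not the method used above, and it assumes that task_id is a unique, non-null key and that keeping the row with the lowest task_id in each group is acceptable. The subquery is wrapped in a derived table (the alias k and the column keep_id are names chosen here) because MySQL otherwise rejects a DELETE whose subquery reads the table being deleted from (error 1093).

```sql
-- Sketch: keep the row with the lowest task_id in each group, delete the rest.
-- Assumes task_id is unique and never NULL.
DELETE FROM PLMS_T_D_TaskExecuteLog
WHERE task_id NOT IN (
    SELECT k.keep_id
    FROM (
        -- No HAVING here: rows without duplicates are their own group
        -- and must also appear in the keep list.
        SELECT MIN(a.task_id) AS keep_id
        FROM PLMS_T_D_TaskExecuteLog a
        GROUP BY
            a.car_desc, a.direction, a.distance,
            a.log_begin_time, a.log_desc, a.log_status,
            a.mai_subName, a.person_str, a.weather
    ) k
);
```

It is safer to first run the same WHERE condition in a SELECT COUNT(*) to check how many rows would be removed, or to run the DELETE inside a transaction so it can be rolled back.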
Summary
The above is the SQL I used to delete duplicate data in bulk. If you know other methods, you are welcome to share them.