MySQL: Handling Duplicate Data

1. Handling duplicate data in MySQL

Some MySQL tables may contain duplicate records. In some cases duplicates are acceptable, but in others they need to be removed.
This chapter explains how to prevent duplicate data from entering a table, how to count and filter duplicates, and how to delete duplicates that are already there.

 

2. Prevent duplicate data from appearing in the table

To guarantee that data is unique, declare the relevant column(s) of a MySQL table as a PRIMARY KEY or add a UNIQUE index on them.
Let's look at an example: the table below has neither an index nor a primary key, so it allows any number of duplicate records.

CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);
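
Because nothing in this definition enforces uniqueness, MySQL accepts the same row as many times as it is inserted. A quick check, assuming the table starts empty; the name and sex values are made-up sample data:

```sql
-- Both statements succeed: this person_tbl has no primary key
-- and no unique index, so identical rows are allowed.
INSERT INTO person_tbl (last_name, first_name, sex) VALUES ('Jay', 'Thomas', 'M');
INSERT INTO person_tbl (last_name, first_name, sex) VALUES ('Jay', 'Thomas', 'M');

-- Returns 2 (if the table was empty beforehand): the row is stored twice.
SELECT COUNT(*) FROM person_tbl
WHERE last_name = 'Jay' AND first_name = 'Thomas';
```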

If the combination of first_name and last_name must never repeat, you can make the two columns a composite (two-column) primary key. Primary key columns cannot contain NULL, so declare them NOT NULL, as follows:

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

Once a primary key or unique index is in place, an INSERT that would duplicate an existing key value fails with an error and the row is not inserted.
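
A minimal sketch of that failure, assuming person_tbl was created with the composite primary key above and is initially empty; the names are illustrative only:

```sql
-- First insert succeeds.
INSERT INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Thomas');

-- Second insert repeats the primary key (last_name, first_name) and is
-- rejected with error 1062 (ER_DUP_ENTRY, "Duplicate entry ...").
INSERT INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Thomas');
```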

The difference between INSERT IGNORE INTO and INSERT INTO is how they treat rows that already exist. If a new row would violate the primary key or a unique index, INSERT IGNORE skips it instead of failing; otherwise it inserts the row as usual. This way the existing data is preserved and only the missing rows are filled in.

The following example uses INSERT IGNORE INTO. Both statements execute without error, but the second one inserts nothing because the row already exists:

mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)

When uniqueness is enforced, INSERT IGNORE INTO does not return an error for a duplicate row; it returns only a warning. REPLACE INTO works differently: if a row with the same primary key or unique index value already exists, it deletes that row first and then inserts the new record.
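
The two behaviours can be compared in a short session. This sketch assumes person_tbl has the composite primary key on (last_name, first_name) shown earlier and already contains the row ('Jay', 'Thomas') from the example above; the sex values are illustrative:

```sql
-- Duplicate key: the row is skipped and a warning is recorded instead of an error.
INSERT IGNORE INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'M');

-- Lists the warning raised by the previous statement.
SHOW WARNINGS;

-- Duplicate key: the existing row is deleted and the new one inserted,
-- so sex becomes 'F' and the client reports 2 rows affected
-- (1 deleted + 1 inserted).
REPLACE INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'F');
```
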
Another way to enforce uniqueness is to add a UNIQUE index, like this:

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);
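
If the table already exists without such an index (for example the first person_tbl definition above), the same constraint can be added afterwards with ALTER TABLE. A sketch; the index name uq_person_name is a hypothetical choice, and the statement fails if the table already holds duplicate name pairs, so those must be removed first:

```sql
-- Add a composite UNIQUE index to an existing person_tbl.
-- uq_person_name is a hypothetical index name; any name works.
ALTER TABLE person_tbl
    ADD UNIQUE INDEX uq_person_name (last_name, first_name);
```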

 

3. Count duplicate data

The following query counts how many times each last_name/first_name combination appears in the table and returns only the combinations that occur more than once:

mysql> SELECT COUNT(*) as repetitions, last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name
    -> HAVING repetitions > 1;

The query above returns each duplicated name pair in person_tbl together with the number of times it occurs. In general, to find duplicate values:
Determine which columns may contain duplicate values.
List those columns in the SELECT list together with COUNT(*).
List the same columns in the GROUP BY clause.
Add a HAVING clause that keeps only groups whose count is greater than 1.
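
Applying those steps to a single column, for example to find last names shared by more than one person, gives a query of the same shape. A sketch against the person_tbl table above:

```sql
-- Step 1: the column to check is last_name.
-- Step 2: select it together with COUNT(*).
SELECT last_name, COUNT(*) AS repetitions
FROM person_tbl
-- Step 3: group by the same column.
GROUP BY last_name
-- Step 4: keep only values that occur more than once.
HAVING COUNT(*) > 1;
```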

4. Filter duplicate data

To read only unique rows, use the DISTINCT keyword in the SELECT statement to filter out duplicates:

mysql> SELECT DISTINCT last_name, first_name
    -> FROM person_tbl
    -> ORDER BY last_name;
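
DISTINCT removes a row only when every selected column matches another row. As an illustration, adding sex to the select list means that two rows with the same name but different sex values are both returned:

```sql
-- DISTINCT compares the whole (last_name, first_name, sex) combination,
-- so ('Jay', 'Thomas', 'M') and ('Jay', 'Thomas', 'F') would both appear
-- if both rows were present.
SELECT DISTINCT last_name, first_name, sex
FROM person_tbl
ORDER BY last_name;
```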

You can also use GROUP BY to read unique rows from a table:

mysql> SELECT last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name;

 

5. Delete duplicate data

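To delete duplicate rows, you can copy one row per name pair into a new table, drop the original table, and rename the copy. MAX(sex) picks a single sex value for each pair; an aggregate is required here because MySQL's default ONLY_FULL_GROUP_BY mode rejects non-aggregated columns that are not listed in GROUP BY:

mysql> CREATE TABLE tmp SELECT last_name, first_name, MAX(sex) AS sex
    ->                  FROM person_tbl
    ->                  GROUP BY last_name, first_name;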
mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;
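
CREATE TABLE ... SELECT copies column definitions and data but does not create primary keys or indexes, so the rebuilt person_tbl would accept duplicates again. A follow-up sketch that restores the composite key, assuming neither name column contains NULL:

```sql
-- Re-create the uniqueness guarantee lost by CREATE TABLE ... SELECT.
ALTER TABLE person_tbl
    ADD PRIMARY KEY (last_name, first_name);

-- Confirm that the PRIMARY key now exists.
SHOW INDEX FROM person_tbl;
```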

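Alternatively, you can add a PRIMARY KEY or UNIQUE index with ALTER IGNORE TABLE: while building the key, MySQL keeps one row from each set of duplicates and deletes the rest. Note that the IGNORE clause of ALTER TABLE was deprecated in MySQL 5.6 and removed in MySQL 5.7, so this statement works only on MySQL 5.6 and earlier: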

mysql> ALTER IGNORE TABLE person_tbl
    -> ADD PRIMARY KEY (last_name, first_name);
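
On MySQL 5.7 and later, a similar result can be obtained by copying the rows into a keyed copy of the table with INSERT IGNORE. A sketch, assuming person_tbl has no primary key yet and its name columns contain no NULL values; the table names person_tbl_dedup and person_tbl_old are hypothetical:

```sql
-- Empty copy of person_tbl with the same columns and indexes.
CREATE TABLE person_tbl_dedup LIKE person_tbl;

-- Add the key that defines what counts as a duplicate.
ALTER TABLE person_tbl_dedup
    ADD PRIMARY KEY (last_name, first_name);

-- Copy the data; rows whose key already exists in the copy are skipped.
-- Which duplicate survives depends on the order SELECT returns rows in.
INSERT IGNORE INTO person_tbl_dedup
SELECT * FROM person_tbl;

-- Swap the tables in a single statement, then drop the original.
RENAME TABLE person_tbl TO person_tbl_old,
             person_tbl_dedup TO person_tbl;
DROP TABLE person_tbl_old;
```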

 

 
