Relational Database (10): Handling Duplicate Data in MySQL

MySQL tables can contain duplicate records. Sometimes duplicates are acceptable, but sometimes they must be removed. This article shows how to prevent duplicate data in a table and how to delete duplicates that already exist.


data uniqueness

You can define a PRIMARY KEY or a UNIQUE index on the relevant columns of a MySQL table to guarantee that their values are unique.

Example:

# The table below has no index or primary key, so it allows multiple duplicate records.
CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);

The columns of a primary key, including a composite (two-column) key, cannot contain NULL, so declare them NOT NULL.

# Suppose the first_name and last_name values in the table must not repeat.

# One way to enforce uniqueness is to define a PRIMARY KEY, as shown below:
CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);


# Another way to enforce uniqueness is to add a UNIQUE index, as shown below:
CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);
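
If the table already exists, the same constraint can be added afterwards. The statement below is a minimal sketch, assuming person_tbl was created without a key on these columns; the index name uq_person_name is a hypothetical choice, not something from the original example.

```sql
# Assumes person_tbl already exists without a key on these columns.
# uq_person_name is a hypothetical index name chosen for illustration.
# The statement fails with a duplicate-key error if the table
# already contains duplicate (last_name, first_name) pairs.
ALTER TABLE person_tbl
    ADD UNIQUE INDEX uq_person_name (last_name, first_name);
```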

Once a primary key or unique index is in place, a plain INSERT INTO that would create a duplicate row fails with an error.

INSERT IGNORE INTO skips rows that already exist: if no matching row is present, the new row is inserted; otherwise the row is dropped and the statement returns a warning instead of an error.

REPLACE INTO first deletes any existing row with the same primary key or unique index value, then inserts the new row.

# With INSERT IGNORE INTO, the statement does not raise an error and no duplicate row is inserted:
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)

mysql> REPLACE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 1 rows affected (0.00 sec)
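
To see the difference between an error and a warning, the statements below can be run one at a time in the mysql client. This is a sketch that assumes person_tbl was created with PRIMARY KEY (last_name, first_name) and already holds the row inserted above.

```sql
# Assumes person_tbl has PRIMARY KEY (last_name, first_name)
# and already contains the row ('Jay', 'Thomas').

# Plain INSERT: rejected with a duplicate-key error (MySQL error 1062).
INSERT INTO person_tbl (last_name, first_name)
VALUES ('Jay', 'Thomas');

# INSERT IGNORE: the row is skipped and the error is downgraded to a warning.
INSERT IGNORE INTO person_tbl (last_name, first_name)
VALUES ('Jay', 'Thomas');

# Show the warning raised by the previous statement.
SHOW WARNINGS;
```

Run SHOW WARNINGS immediately after the INSERT IGNORE, since the warning list is reset by the next statement that touches a table.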

count duplicate data

Example:

# Count the duplicate records for first_name and last_name in the table:

mysql> SELECT COUNT(*) as repetitions, last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name
    -> HAVING repetitions > 1;

In general, to query for duplicate values, do the following:

  • Determine which columns may contain duplicate values.
  • List those columns in the select list, together with COUNT(*).
  • List the same columns in the GROUP BY clause.
  • Add a HAVING clause that keeps only the groups whose count is greater than 1.
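
The steps above can be tried on a small throwaway table. The sketch below uses a hypothetical table person_demo and made-up sample rows, neither of which comes from the original example.

```sql
# Hypothetical table without any key, so duplicates are accepted.
CREATE TABLE person_demo
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);

# Made-up sample rows: the first two are identical.
INSERT INTO person_demo (last_name, first_name, sex) VALUES
    ('Jay', 'Thomas', 'M'),
    ('Jay', 'Thomas', 'M'),
    ('Lee', 'Anna', 'F');

# Returns one row: repetitions = 2, last_name = 'Jay', first_name = 'Thomas'.
SELECT COUNT(*) AS repetitions, last_name, first_name
FROM person_demo
GROUP BY last_name, first_name
HAVING repetitions > 1;
```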

filter duplicate data

If you need to read unique data, you can use the DISTINCT keyword in the SELECT statement to filter duplicate data.

mysql> SELECT DISTINCT last_name, first_name FROM person_tbl;

You can also use GROUP BY to read unique data in the data table:

mysql> SELECT last_name, first_name FROM person_tbl
    -> GROUP BY last_name, first_name;

deduplicate data

If you want to delete duplicate data in the data table, you can use the following SQL statement:

# First create a new table (tmp) that holds no duplicate rows
mysql> CREATE TABLE tmp SELECT last_name, first_name, sex FROM person_tbl GROUP BY last_name, first_name, sex;
# Drop the original table
mysql> DROP TABLE person_tbl;
# Rename the new table to the original table name
mysql> ALTER TABLE tmp RENAME TO person_tbl;
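
A slightly more cautious variant of the same idea keeps the original table until the copy has been checked. This is a sketch, assuming person_tbl still exists with its duplicates; person_tbl_old is a hypothetical name for the backup.

```sql
# Assumes person_tbl still exists and contains duplicate rows.
# person_tbl_old is a hypothetical name for the backup copy.
CREATE TABLE tmp
    SELECT DISTINCT last_name, first_name, sex FROM person_tbl;

# Swap both names in one statement, so person_tbl is never missing.
RENAME TABLE person_tbl TO person_tbl_old,
             tmp TO person_tbl;

# Drop the old copy once the new table has been checked.
DROP TABLE person_tbl_old;
```

Keep in mind that CREATE TABLE ... SELECT does not create any indexes on the new table, so a primary key or unique index has to be added again afterwards.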

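You can also remove duplicate records by adding an INDEX or a PRIMARY KEY to the table with the IGNORE option, which discards rows that would violate the new key. Note that this only works on older servers: the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4. The method is as follows: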

mysql> ALTER IGNORE TABLE person_tbl
    -> ADD PRIMARY KEY (last_name, first_name);
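
On servers where ALTER IGNORE TABLE is not available, a similar result can be reached by copying the rows into a keyed table with INSERT IGNORE. The sketch below is one possible approach, not the original author's method; person_tbl_new and person_tbl_old are hypothetical names, and it assumes first_name and last_name contain no NULL values.

```sql
# Hypothetical replacement table with the desired primary key.
CREATE TABLE person_tbl_new
(
    first_name CHAR(20) NOT NULL,
    last_name CHAR(20) NOT NULL,
    sex CHAR(10),
    PRIMARY KEY (last_name, first_name)
);

# Copy the rows; any row whose (last_name, first_name) pair is
# already present in person_tbl_new is skipped with a warning.
INSERT IGNORE INTO person_tbl_new (first_name, last_name, sex)
    SELECT first_name, last_name, sex FROM person_tbl;

# Swap the tables; person_tbl_old keeps the original rows as a backup.
RENAME TABLE person_tbl TO person_tbl_old,
             person_tbl_new TO person_tbl;
```

When several rows share the same name but differ in sex, only one of them is kept, and without an ORDER BY on the SELECT there is no guarantee which one.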

Origin blog.csdn.net/weixin_43145427/article/details/124189810