SQL deduplication (blogger interview questions)

In SQL, the DISTINCT keyword handles deduplication. Placed in a SELECT statement, it removes duplicate rows from the query results.

For example, suppose a table called students stores each student's name and age, and we want the names of all students with duplicates removed. The following SQL statement does this:

SELECT DISTINCT name FROM students;

This statement returns every student name exactly once, no matter how many rows carry that name.
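To try the query without a database server, here is a minimal sketch that runs it against an in-memory SQLite database through Python's standard sqlite3 module. The table layout (id, name, age) and the sample rows are assumptions made for illustration; the original only says the table holds names and ages.

```python
import sqlite3

# Hypothetical sample data; the id column and all rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students (id, name, age) VALUES (?, ?, ?)",
    [(1, "Alice", 18), (2, "Bob", 19), (3, "Alice", 19), (4, "Carol", 18)],
)

# DISTINCT collapses the two 'Alice' rows into a single result row.
# ORDER BY is added only to make the output order predictable.
distinct_names = [
    row[0]
    for row in conn.execute("SELECT DISTINCT name FROM students ORDER BY name")
]
print(distinct_names)  # ['Alice', 'Bob', 'Carol']
```

Four rows go in and three names come out, because the two students named Alice produce identical result rows once only the name column is selected.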

Note that DISTINCT applies to the entire result row. If the query selects several columns, two rows are merged only when they are identical in every selected column. To deduplicate on only some of the columns, use a GROUP BY clause together with aggregate functions. For example, to count the distinct student names in each age group, use the following SQL statement:

SELECT age, COUNT(DISTINCT name) FROM students GROUP BY age;

This statement returns, for each age, the number of distinct student names. If two students of the same age share a name, however, the count will be too low.

COUNT(DISTINCT name) treats two people with the same name as the same person, so they are counted only once.
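The undercount is easy to reproduce. The sketch below again uses an in-memory SQLite database with assumed sample rows, in which two different students aged 18 are both named Alice, and compares COUNT(DISTINCT name) with a plain COUNT(*) per age group.

```python
import sqlite3

# Hypothetical sample data: ids 1 and 2 are two different students named Alice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students (id, name, age) VALUES (?, ?, ?)",
    [(1, "Alice", 18), (2, "Alice", 18), (3, "Bob", 18), (4, "Carol", 19)],
)

# For each age: number of distinct names versus number of rows.
per_age = conn.execute(
    """
    SELECT age,
           COUNT(DISTINCT name) AS distinct_names,
           COUNT(*)             AS total_rows
    FROM students
    GROUP BY age
    ORDER BY age
    """
).fetchall()
print(per_age)  # [(18, 2, 3), (19, 1, 1)]
```

For age 18 there are three students but only two distinct names, so COUNT(DISTINCT name) reports 2. Whether that is a bug depends on the question being asked: it is the right answer for "how many names", and the wrong one for "how many students".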

If you need to tell these two people apart, deduplicate with the help of another column, such as the ID column or any other unique identifier. For example:

SELECT COUNT(DISTINCT CONCAT(name, '-', id)) FROM students;

Here CONCAT(name, '-', id) joins the name and id into a single string, and the deduplicated count is taken over that string, so two people with the same name but different ids are counted separately.
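As a rough check of this idea, the sketch below counts the same assumed sample rows three ways. It runs on SQLite, so the `||` operator stands in for CONCAT, which not every SQLite version provides; the sample data is hypothetical. It also includes COUNT(DISTINCT id), which is a simpler alternative whenever id is unique and never NULL.

```python
import sqlite3

# Hypothetical sample data: two different students share the name Alice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students (id, name, age) VALUES (?, ?, ?)",
    [(1, "Alice", 18), (2, "Alice", 18), (3, "Bob", 18)],
)

# SQLite's || operator is used here in place of CONCAT(name, '-', id).
by_name, by_name_and_id, by_id = conn.execute(
    """
    SELECT COUNT(DISTINCT name),
           COUNT(DISTINCT name || '-' || id),
           COUNT(DISTINCT id)
    FROM students
    """
).fetchone()
print(by_name, by_name_and_id, by_id)  # 2 3 3
```

Counting by name alone gives 2, while both the concatenated key and the bare id give 3. When the identifier is already unique, counting it directly avoids building a string for every row.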

Of course, if you do not need to tell these two people apart and only want to know how many different names there are, COUNT(DISTINCT name) is enough.

Origin blog.csdn.net/qq_46138492/article/details/129508492