MongoDB - find duplicate records in a collection

Background

The project uses MongoDB. When test data is stored, an auto-increment id is generated from the source data, so the same record gets different ids in the production and test environments. Some data can only be linked to other data when its id matches the production one, so I wrote a script that changes the ids in the test environment to match production for the same data. The script was probably not robust enough, and because the id field had no unique index, some records with duplicate ids were written to the database. Duplicate data causes ETL tasks to fail, so I needed a way to query MongoDB for records that share the same value in a given field.

Reviewing the MySQL approach first

Let's first look at how duplicate records are queried in MySQL.

Take the database of the MeterSphere platform as an example. To find every interface that has two or more valid (non-Trash) test cases written under it, the query is:

SELECT api_definition_id, COUNT(*)
FROM api_test_case
WHERE `status` <> 'Trash'
GROUP BY api_definition_id
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC

The results are as follows:

[Screenshot: query result listing each api_definition_id with its COUNT(*)]
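
The filter, group, count, and keep-if-more-than-one logic of the query above can be pictured in a few lines of plain JavaScript. This is only an illustrative sketch: the sample rows and status values are made up, `findDuplicates` is a hypothetical helper, and this is not how MySQL actually executes the query.

```javascript
// Hypothetical sample rows standing in for api_test_case (all values are made up).
const rows = [
  { api_definition_id: "A", status: "Prepare" },
  { api_definition_id: "A", status: "Underway" },
  { api_definition_id: "B", status: "Prepare" },
  { api_definition_id: "B", status: "Trash" },
  { api_definition_id: "C", status: "Prepare" },
  { api_definition_id: "C", status: "Prepare" },
  { api_definition_id: "C", status: "Prepare" }
];

// Hypothetical helper mirroring the clauses of the SQL statement.
function findDuplicates(rows) {
  const counts = new Map();
  for (const row of rows) {
    if (row.status === "Trash") continue;          // WHERE status <> 'Trash'
    const key = row.api_definition_id;             // GROUP BY api_definition_id
    counts.set(key, (counts.get(key) || 0) + 1);   // COUNT(*)
  }
  return [...counts.entries()]
    .filter(([, count]) => count > 1)              // HAVING COUNT(*) > 1
    .sort((a, b) => b[1] - a[1])                   // ORDER BY COUNT(*) DESC
    .map(([api_definition_id, count]) => ({ api_definition_id, count }));
}

console.log(findDuplicates(rows));
```

With these sample rows, interface C (three cases) comes first, then A (two cases). B is dropped because only one of its cases is not in the Trash.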

Usage in MongoDB

Next, let's look at grouping, counting, and filtering in MongoDB. The individual operators are not explained in detail here; the query is shown directly.

For example, to find the names that appear more than once among the documents in the user collection whose age is greater than 15:

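db.user.aggregate(
  [
    { $match: { age: { $gt: 15 } } },                  // keep only documents with age > 15
    { $group: { _id: "$name", count: { $sum: 1 } } },  // count documents per name
    { $match: { count: { $gt: 1 } } },                 // keep names that occur more than once
    { $sort: { count: -1 } },                          // most-duplicated names first
    { $limit: 5000 }                                   // cap the number of results
  ],
  { allowDiskUse: true }
)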

The result of the operation is as follows:

[Screenshot: result documents, each with _id (the duplicated name) and count]
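
The same pipeline can be reused for any field, for example the duplicated id field described in the background. The following is a minimal sketch, not the author's script: `buildDuplicatePipeline` is a hypothetical helper, the `docIds` output field is an assumed name, and the field and collection names in the usage comment are assumptions. Compared with the query above, it also collects the `_id` of every document in each group with `$push`, so the duplicates can be inspected afterwards.

```javascript
// Hypothetical helper (not part of MongoDB): builds a duplicate-finding
// pipeline for any field, mirroring the stages of the query above.
function buildDuplicatePipeline(field, filter = {}, limit = 5000) {
  return [
    { $match: filter },                  // optional pre-filter; {} matches every document
    {
      $group: {
        _id: "$" + field,                // group by the field being checked
        count: { $sum: 1 },              // number of documents sharing the value
        docIds: { $push: "$_id" }        // _id of every document in the group
      }
    },
    { $match: { count: { $gt: 1 } } },   // keep only values that occur more than once
    { $sort: { count: -1 } },            // most-duplicated values first
    { $limit: limit }                    // cap the number of results
  ];
}

// In mongosh, assuming a collection named "user" and a field named "id":
// db.user.aggregate(buildDuplicatePipeline("id"), { allowDiskUse: true })
```

Calling `buildDuplicatePipeline("name", { age: { $gt: 15 } })` produces the same stages as the query above, plus the extra `docIds` array. Be aware that collecting ids makes each group larger, which matters on big collections.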

Note:

By default, MongoDB tries to complete an aggregation in memory. Each pipeline stage is limited to 100 MB of RAM, so on a large dataset a stage can run out of memory and the aggregation fails. The allowDiskUse option lets MongoDB write intermediate results to temporary files on disk, which avoids these out-of-memory failures and makes it possible to process larger datasets. (In MongoDB 6.0 and later, stages that exceed the limit spill to disk by default.) Disk is usually much slower than memory, so the aggregation may run more slowly; enable allowDiskUse only when it is actually needed.
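
The background mentioned that the id field had no unique index, which is what allowed duplicate ids to be written in the first place. Once the duplicates found by the query have been cleaned up, a unique index would prevent new ones. The sketch below is a minimal illustration under stated assumptions: `ensureUniqueIndex` is a hypothetical helper, and the collection and field names in the usage comment are assumptions. Creating an index with `unique: true` fails while duplicate values still exist, so it only succeeds after cleanup.

```javascript
// Hypothetical helper: asks MongoDB for a unique ascending index on one field.
// `collection` is any object exposing createIndex(keys, options), such as
// db.user in mongosh or a Collection object from the Node.js driver.
function ensureUniqueIndex(collection, field) {
  return collection.createIndex({ [field]: 1 }, { unique: true });
}

// In mongosh, assuming the collection is "user" and the field is "id":
// ensureUniqueIndex(db.user, "id")
```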

Can ChatGPT help us write this statement?

Before ChatGPT appeared, I had to search Baidu to learn how to write this kind of complex statement, which was somewhat tedious. Another option is the paid version of Studio 3T, which lets you write queries directly in SQL syntax and can also convert them into MongoDB's JavaScript query syntax.

With ChatGPT, this is no longer difficult:

[Three screenshots of the ChatGPT conversation that generated the query]

It is quite smart, and it even adds comments to the statement.

End

For other basic MongoDB usage, see my earlier articles:

MongoDB

MongoDB - Introduction to MongoDB

MongoDB- build a mongodb database for practice through docker

MongoDB-Install a mongodb database locally on a windows computer

MongoDB - use the mongo/mongosh command line to connect to the database

MongoDB-Quick start with some simple operations on the MongoDB command line

Introduction to the meaning of the MongoDB-_id field

MongoDB-insert data insert, insertOne, insertMany, save usage introduction

Introduction to the basic usage of MongoDB-table data query

Introduction to the usage of >, >=, <, <=, =, !=, in, and not in in MongoDB-query statements

Introduction to the usage of logical operators not, and, or, and nor in MongoDB-query statements

Introduction to the use of $exists and the combination of $ne, $nin, $nor, and $not in MongoDB-query statements

MongoDB-Use $type to query whether the type of a field is xxx

Introduction to the usage of $all in MongoDB-query

Origin blog.csdn.net/liboshi123/article/details/129095955