Background
The project uses a MongoDB database. When test data is stored, an auto-increment id is generated from the source data, so the same record ends up with different ids in the online and test environments. Some data can only be linked to related records when its id matches the online one, so I wrote a script that changes the ids in the test environment to match the online ids for the same data. However, perhaps because the script was not robust enough, some records with duplicate ids were written to the database, and because the id field has no unique index, nothing prevented this. The duplicate data makes ETL tasks fail that would otherwise run normally. I therefore needed to query MongoDB for records whose value in a given field is duplicated.
A quick review of MySQL first
First, let's look at how you would query duplicate records in MySQL.
For example, using the database of the MeterSphere platform: to find the interfaces (API definitions) that have two or more valid (not trashed) test cases, you can run:
SELECT api_definition_id, COUNT(*) FROM api_test_case
WHERE `status` <> 'Trash'
GROUP BY api_definition_id
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;
The results found are as follows:
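To make the role of each clause concrete, here is a small JavaScript sketch that applies the same steps to an in-memory array. It is only an illustration of the query's semantics, not how MySQL executes it. The column names come from the query above; the sample rows and the status value "Normal" are made up.

```javascript
// Hypothetical sample rows; only the column names and "Trash" come from the SQL query.
const apiTestCase = [
  { id: 1, api_definition_id: "A", status: "Normal" },
  { id: 2, api_definition_id: "A", status: "Normal" },
  { id: 3, api_definition_id: "A", status: "Trash" },
  { id: 4, api_definition_id: "B", status: "Normal" },
  { id: 5, api_definition_id: "B", status: "Trash" },
  { id: 6, api_definition_id: "C", status: "Normal" },
  { id: 7, api_definition_id: "C", status: "Normal" },
  { id: 8, api_definition_id: "C", status: "Normal" },
];

function findDuplicateDefinitions(rows) {
  // WHERE `status` <> 'Trash': drop trashed test cases before grouping
  const valid = rows.filter((row) => row.status !== "Trash");

  // GROUP BY api_definition_id, computing COUNT(*) for each group
  const counts = new Map();
  for (const row of valid) {
    counts.set(row.api_definition_id, (counts.get(row.api_definition_id) ?? 0) + 1);
  }

  return [...counts]
    // HAVING COUNT(*) > 1: keep only groups with at least two rows
    .filter(([, count]) => count > 1)
    // ORDER BY COUNT(*) DESC: largest groups first
    .sort((a, b) => b[1] - a[1])
    .map(([apiDefinitionId, count]) => ({ api_definition_id: apiDefinitionId, count }));
}

console.log(findDuplicateDefinitions(apiTestCase));
```

With these rows, definition B is dropped because only one of its test cases is valid, while C (three valid cases) and A (two) are reported in that order.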
Usage in MongoDB
Next, let's look at how to group, count, and filter in MongoDB. I won't go into the details of each operator here and will show the query directly.
For example, to find the records in the user collection whose age is greater than 15 and whose name is duplicated:
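db.user.aggregate(
  [
    { $match: { age: { $gt: 15 } } },                  // keep only users older than 15
    { $group: { _id: "$name", count: { $sum: 1 } } },  // count documents per name
    { $match: { count: { $gt: 1 } } },                 // keep names that appear more than once
    { $sort: { count: -1 } },                          // most duplicated names first
    { $limit: 5000 }                                   // return at most 5000 groups
  ],
  { allowDiskUse: true }                               // let stages spill to disk if they exceed the memory limit
)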
The result of the operation is as follows:
Note:
MongoDB tries to run each aggregation stage in memory, and each stage is limited to 100 megabytes of RAM. Without allowDiskUse, a stage that exceeds this limit, for example a $group over a large collection, fails with an error (since MongoDB 6.0, spilling to disk is enabled by default through the allowDiskUseByDefault server parameter). The allowDiskUse option lets such stages write temporary data to disk instead, which avoids out-of-memory failures and makes it possible to process larger data sets. Disk is much slower than memory, however, so a pipeline that spills to disk can run noticeably slower; enable allowDiskUse only when you actually need it.
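To see how the stages feed into one another, below is a hypothetical and heavily simplified in-memory imitation of the pipeline in plain JavaScript (it runs in Node.js, no database needed). It is not MongoDB's implementation and only understands the operators used in the query above. Read in order, the stages line up with the clauses of the MySQL example: the first $match acts like WHERE, $group like GROUP BY, the second $match like HAVING, $sort like ORDER BY, and $limit like LIMIT. The user documents are invented for illustration.

```javascript
// Hypothetical, minimal imitation of the aggregation stages used above.
// Supports only: $match with equality or { $gt }, $group with "$field" keys and
// { $sum: <number> } accumulators, $sort on one numeric field, and $limit.

function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return doc[field] > cond.$gt;
    }
    return doc[field] === cond; // plain equality match
  });
}

function runStage(docs, stage) {
  const [op, spec] = Object.entries(stage)[0]; // each stage has exactly one operator
  switch (op) {
    case "$match":
      return docs.filter((doc) => matches(doc, spec));
    case "$group": {
      const keyField = spec._id.slice(1); // "$name" -> "name"
      const groups = new Map();
      for (const doc of docs) {
        const key = doc[keyField];
        const group = groups.get(key) ?? { _id: key };
        for (const [outField, acc] of Object.entries(spec)) {
          if (outField === "_id") continue;
          group[outField] = (group[outField] ?? 0) + acc.$sum; // { $sum: 1 } counts documents
        }
        groups.set(key, group);
      }
      return [...groups.values()];
    }
    case "$sort": {
      const [field, dir] = Object.entries(spec)[0]; // dir is 1 (ascending) or -1 (descending)
      return [...docs].sort((a, b) => (a[field] - b[field]) * dir);
    }
    case "$limit":
      return docs.slice(0, spec);
    default:
      throw new Error(`unsupported stage ${op}`);
  }
}

// Each stage consumes the previous stage's output, as in a MongoDB pipeline.
function aggregate(docs, pipeline) {
  return pipeline.reduce(runStage, docs);
}

const users = [
  { name: "Tom", age: 20 },
  { name: "Tom", age: 25 },
  { name: "Tom", age: 12 },
  { name: "Amy", age: 18 },
  { name: "Amy", age: 30 },
  { name: "Amy", age: 40 },
  { name: "Bob", age: 16 },
  { name: "Bob", age: 10 },
];

const result = aggregate(users, [
  { $match: { age: { $gt: 15 } } },
  { $group: { _id: "$name", count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } },
  { $sort: { count: -1 } },
  { $limit: 5000 },
]);

console.log(result);
```

Because each stage only sees the output of the previous one, the order matters: putting the age filter before $group means only users older than 15 are counted, just as WHERE is applied before GROUP BY in SQL. Here Bob is dropped because only one of his records passes the age filter.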
Can ChatGPT help us write this statement?
Before ChatGPT appeared, for a complex statement like this I had to search Baidu to learn the syntax, which was somewhat tedious. Alternatively, you can use the paid version of Studio 3T, which lets you query directly with MySQL-style SQL and can also convert that SQL into MongoDB's JavaScript query syntax for you.
Now that ChatGPT is available, none of this is difficult:
This is really smart; it even adds comments to the statement.
End
If you want to learn other basic MongoDB usage, you can check out my earlier articles:
MongoDB
MongoDB - Introduction to MongoDB
MongoDB - Build a MongoDB database for practice with Docker
MongoDB - Install a MongoDB database locally on a Windows computer
MongoDB - Use the mongo/mongosh command line to connect to the database
MongoDB - Quick start with some simple operations on the MongoDB command line
MongoDB - The meaning of the _id field
MongoDB - Usage of insert, insertOne, insertMany, and save for inserting data
MongoDB - Basic usage of querying collection data
MongoDB - Using >, >=, <, <=, =, !=, in, and not in in query statements
MongoDB - Using the logical operators not, and, or, and nor in query statements
MongoDB - Using $type to check whether a field is of type xxx
MongoDB - Using $all in queries