MongoDB: remove duplicate data

db.t_user_task.aggregate([
  {
    // Group by uid + taskId, count how many documents share the pair,
    // and collect every _id of the group into the dups array
    $group: {
      _id: {
        uid: '$uid',
        taskId: '$taskId'
      },
      count: {
        $sum: 1
      },
      dups: {
        $addToSet: '$_id'
      }
    }
  },
  {
    // Keep only the groups that contain more than one document
    $match: {
      count: {
        $gt: 1
      }
    }
  }
]).forEach(function(doc){
  // Keep the first _id and delete the remaining duplicates
  doc.dups.shift();
  db.t_user_task.remove({
    _id: {
      $in: doc.dups
    }
  });
})

1. Group by uid and taskId and count the documents in each group. $group only returns the fields that take part in the grouping, so $addToSet is used to collect each document's _id into the dups array of the result.

2. Use $match to keep only the groups whose count is greater than 1, i.e. the groups that actually contain duplicates.

3. doc.dups.shift() removes the first value of the array; its purpose is to take one _id out of each set of duplicates, so that the remove statement that follows does not delete every copy.

4. Use a forEach loop to delete the remaining documents by _id (a mongosh variant with deleteMany is sketched below).
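
In newer shells (mongosh), remove() is deprecated in favour of deleteMany(). A minimal sketch of the same deduplication with deleteMany(), assuming the same t_user_task collection and uid/taskId fields:

db.t_user_task.aggregate([
  { $group: { _id: { uid: '$uid', taskId: '$taskId' }, count: { $sum: 1 }, dups: { $addToSet: '$_id' } } },
  { $match: { count: { $gt: 1 } } }
]).forEach(function(doc){
  doc.dups.shift();                                      // keep one copy per (uid, taskId)
  db.t_user_task.deleteMany({ _id: { $in: doc.dups } }); // delete the rest
});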


The $addToSet operator adds a value to an array only if the value does not already exist in the array; if the value is already present, $addToSet leaves the array unchanged. In the $group stage above it is used as an accumulator that collects the distinct _id values of each (uid, taskId) group into the dups array.
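
As a small illustration of the accumulator's behaviour compared with $push (a sketch on a hypothetical scores collection, not part of the task data above):

db.scores.aggregate([
  {
    $group: {
      _id: '$grade',
      uniqueScores: { $addToSet: '$score' },  // duplicate scores collapsed into one entry
      allScores: { $push: '$score' }          // every score kept, duplicates included
    }
  }
])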


Note: the camel case of forEach and $addToSet cannot be written in lowercase, because MongoDB is strictly case-sensitive. MongoDB is strictly case-sensitive. MongoDB is strictly case-sensitive. The important thing is said three times!


To check for duplicates only among the records whose startTime is greater than 20180205, add a $match stage in front of the $group:

db.t_user_task.aggregate([
  { $match: { startTime: { $gt: 20180205 } } },
  { $group: { _id: { uid: '$uid', taskId: '$taskId' }, count: { $sum: 1 }, dups: { $addToSet: '$_id' } } },
  { $match: { count: { $gt: 1 } } }
])
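
If those filtered duplicates should also be removed, the same forEach pattern can be appended (a sketch based on the commands above; 20180205 is just the example cutoff used in the query):

db.t_user_task.aggregate([
  { $match: { startTime: { $gt: 20180205 } } },
  { $group: { _id: { uid: '$uid', taskId: '$taskId' }, count: { $sum: 1 }, dups: { $addToSet: '$_id' } } },
  { $match: { count: { $gt: 1 } } }
]).forEach(function(doc){
  doc.dups.shift();                                  // keep one document per (uid, taskId)
  db.t_user_task.remove({ _id: { $in: doc.dups } });
});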


Depending on the amount of data, the script can take a while to run; just wait patiently for it to finish.
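
For larger collections, one option (an assumption, not part of the original commands) is to let the $group stage spill to disk with allowDiskUse and to batch the deletes with bulkWrite:

var ops = [];
db.t_user_task.aggregate([
  { $group: { _id: { uid: '$uid', taskId: '$taskId' }, count: { $sum: 1 }, dups: { $addToSet: '$_id' } } },
  { $match: { count: { $gt: 1 } } }
], { allowDiskUse: true }).forEach(function(doc){    // allowDiskUse lets $group exceed the in-memory limit
  doc.dups.shift();                                  // keep one document per (uid, taskId)
  ops.push({ deleteMany: { filter: { _id: { $in: doc.dups } } } });
  if (ops.length >= 1000) {                          // flush the queued deletes in batches
    db.t_user_task.bulkWrite(ops);
    ops = [];
  }
});
if (ops.length > 0) {
  db.t_user_task.bulkWrite(ops);
}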
