Finding/Counting Duplicate Values in Array in MongoDB

Ajinkya Karode :

I am new to MongoDB and am using Robo 3T.
I need to find duplicate values inside an array, based on channel_id.
From my research, aggregation is needed to group the values and get the count for each.
I have written the following query, but the results are not as expected.

Sample Documents:

{
    "_id" : ObjectId("59b674d141b47e5401897d31"),
    "subscribed_channels" : [ 
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        }, 
        {
            "channel_id" : "1002",
            "channel_name" : "StarGold",
            "channelPrice":"75"
        }, 
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        },
        {
            "channel_id" : "1003",
            "channel_name" : "SetMax",
            "channelPrice":"80"
        }
    ],
    "viewer_account_id" : "59b6745b41b47e5401143b3d",
    "public_id_type" : "PHONE_NUMBER",
    "viewer_id" : "+919322264403",
    "role" : "CONSUMER",
    "active" : true,
    "date_time_created" : NumberLong(1505129681330),
    "date_time_modified" : NumberLong(1569320824387)
}

{
    "_id" : ObjectId("59b674d141b47e5401897d31"),
    "subscribed_channels" : [ 
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        }, 
        {
            "channel_id" : "1002",
            "channel_name" : "StarGold",
            "channelPrice":"75"
        }, 
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        },
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        }
    ],
    "viewer_account_id" : "59b6745b41b47e5401143c56",
    "public_id_type" : "PHONE_NUMBER",
    "viewer_id" : "+919322264404",
    "role" : "CONSUMER",
    "active" : true,
    "date_time_created" : NumberLong(1505129681330),
    "date_time_modified" : NumberLong(1569320824387)
}

The above are just two documents from the viewers collection.

Query :

db.getCollection('viewers').aggregate([
    {
        "$group": {
            _id: {
                //viewer_id:"$consumer_id",
                enterprise_id: "$subscribed_channels.channel_id"
            },
            "viewer_id": {
                $first: "$viewer_id"
            },
            count: { $sum: 1 }
        }
    },
    {
        "$match": { "count": { "$gt": 1 } }
    }
])

Actual Output :

{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1001", 
            "1002",
            "1003"
        ]
    },
    "consumer_id" : "+919322264403",
    "count" : 2.0
}
{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1002", 
            "1001",
            "1001"
        ]
    },
    "consumer_id" : "+919322264404",
    "count" : 2.0
}

Expected Output :

I want to group by subscribed_channels.channel_id and get the corresponding count:

{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1001", 
            "1002",
            "1003"
        ]
    },
    "consumer_id" : "+919322264403",
    "count" : 2.0
}
{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1001", 
            "1001",
            "1002"
        ]
    },
    "consumer_id" : "+919322264404",
    "count" : 3.0
}

Grouping is not happening on channel_id, and the count is incorrect.
The count gives me neither the number of channel_ids subscribed nor the number of duplicate channel_ids.

Please guide me in building a query that gives the correct result.

whoami :

Try the query below :

Query :

db.collection.aggregate([
  /** project only needed fields & transform fields as you like */
  {
    $project: {
      customer_id: "$viewer_id",
      enterprise_id: "$subscribed_channels.channel_id",
      count: {
        /** Subtract size of original array & newly formed array which has unique values to get count of duplicates */
        $subtract: [
          {
            $size: "$subscribed_channels.channel_id" // get size of original array
          },
          {
            $size: {
              $setUnion: ["$subscribed_channels.channel_id", []] // This will give you an array with unique elements & get size of it
            }
          }
        ]
      }
    }
  }
]);
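
The arithmetic in that $project stage can be checked outside MongoDB. The following is a minimal plain JavaScript sketch, not part of the query itself: the function name countDuplicateChannels and the two arrays are hypothetical stand-ins for the channel_id values of the two sample documents, and a Set is assumed to play the role of $setUnion.

```javascript
// Hypothetical helper mirroring the $subtract / $size / $setUnion logic:
// size of the original array minus the number of distinct values in it.
function countDuplicateChannels(channelIds) {
  const uniqueCount = new Set(channelIds).size; // like $size of $setUnion
  return channelIds.length - uniqueCount;       // like $subtract
}

// channel_id values copied from the two sample documents in the question.
const firstViewerChannels = ["1001", "1002", "1001", "1003"];
const secondViewerChannels = ["1001", "1002", "1001", "1001"];

console.log(countDuplicateChannels(firstViewerChannels));  // 1
console.log(countDuplicateChannels(secondViewerChannels)); // 2
```

Note that this counts the extra copies (1 and 2 for the sample documents), not the total number of occurrences of the repeated channel_id (2 and 3).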

Test : MongoDB-Playground
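
If the number of occurrences per channel_id is what is needed, one possible approach is to $unwind the array before grouping, so that each array element becomes its own document. This is only a sketch under assumptions: the variable name duplicateChannelsPipeline is hypothetical, and the collection name viewers is taken from the question.

```javascript
// Hypothetical pipeline: count how often each channel_id occurs per viewer
// and keep only the channel_ids that occur more than once.
const duplicateChannelsPipeline = [
  // One output document per element of subscribed_channels.
  { $unwind: "$subscribed_channels" },
  // Group by viewer and channel_id, counting the occurrences.
  {
    $group: {
      _id: {
        viewer_id: "$viewer_id",
        channel_id: "$subscribed_channels.channel_id"
      },
      count: { $sum: 1 }
    }
  },
  // Keep only the duplicated channel_ids.
  { $match: { count: { $gt: 1 } } }
];

// In the mongo shell or Robo 3T (assumes the collection is named viewers):
// db.getCollection('viewers').aggregate(duplicateChannelsPipeline)
```

On the two sample documents this should report channel_id 1001 with a count of 2 for viewer +919322264403 and a count of 3 for viewer +919322264404.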
