MongoDB, PyMongo: how to limit results to N documents per unique field value?

Максим Дихтярь :

MongoDB contains the following set of data:

[{"user": "a", "domain": "some.com"},
{"user": "b", "domain": "some.com"},
{"user": "b1", "domain": "some.com"},
{"user": "c", "domain": "test.com"},
{"user": "d", "domain": "work.com"},
{"user": "aaa", "domain": "work.com"},
{"user": "some user", "domain": "work.com"} ] 

I need to select the first items per domain, with no more than 2 documents with the same domain in the result. After the Mongo query, the result should look like this:

[{"user": "a", "domain": "some.com"},
{"user": "b", "domain": "some.com"},
{"user": "c", "domain": "test.com"},
{"user": "d", "domain": "work.com"},
{"user": "aaa", "domain": "work.com"}]

Only 2 results with the same domain should be kept; any other documents with that domain must be skipped. Is this possible to do with the aggregation framework, $filter, or something else?

Is there a way to group by domain and get only the first N users (2 in this example)? Example:

[{"domain": "some.com", "users": ["a", "b"]}]

so

{"user": "b1", "domain": "some.com"} will be skipped.
Valijon :

You can get the desired result with a MongoDB aggregation pipeline.

It consists of four stages:
1. We group by the domain field and accumulate all documents with the same domain into a data array ($push of $$ROOT)
2. Then, we slice that array with $slice to keep at most 2 items per domain
3. We flatten the data field with the $unwind operator
4. We restore the original document structure with the $replaceRoot operator
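To see what each stage does, here is a minimal in-memory sketch in plain Python that mimics the four stages on the sample data, assuming Python 3.7+ (where dicts keep insertion order). The helper names group_and_slice and unwind_and_replace_root are hypothetical and only illustrate the semantics; MongoDB does not execute the pipeline this way, and unlike this sketch, the real $group stage does not guarantee the order of its output documents. The output of stages 1 and 2 already has the grouped shape asked about in the question, except that data holds the full documents rather than just the user names.

```python
# Sample data from the question.
DOCS = [
    {"user": "a", "domain": "some.com"},
    {"user": "b", "domain": "some.com"},
    {"user": "b1", "domain": "some.com"},
    {"user": "c", "domain": "test.com"},
    {"user": "d", "domain": "work.com"},
    {"user": "aaa", "domain": "work.com"},
    {"user": "some user", "domain": "work.com"},
]


def group_and_slice(docs, n=2):
    """Stages 1-2: group by domain ($group + $push $$ROOT), keep the first n ($slice)."""
    groups = {}
    for doc in docs:
        # Like $push, documents are appended in the order they arrive.
        groups.setdefault(doc["domain"], []).append(doc)
    # Python dicts keep insertion order; MongoDB's $group output order is not guaranteed.
    return [{"_id": domain, "data": data[:n]} for domain, data in groups.items()]


def unwind_and_replace_root(groups):
    """Stages 3-4: one output per array element ($unwind), promoted to the top level ($replaceRoot)."""
    return [doc for group in groups for doc in group["data"]]


grouped = group_and_slice(DOCS, n=2)
result = unwind_and_replace_root(grouped)
for doc in result:
    print(doc)
```

Running it prints the five documents from the expected result; the b1 and "some user" documents are dropped by the slice in stage 2.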

db.collection.aggregate([
  {
    "$group": {
      "_id": "$domain",
      "data": { "$push": "$$ROOT" }
    }
  },
  {
    "$addFields": {
      "data": {
        "$slice": [ "$data", 0, 2 ]
      }
    }
  },
  {
    "$unwind": "$data"
  },
  {
    "$replaceRoot": { "newRoot": "$data" }
  }
])
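Since the question asks about PyMongo, here is a sketch of the same pipeline expressed as Python data for PyMongo, where the stage names are dictionary keys just as in the shell. The function build_pipeline and its flatten and sort parameters are hypothetical conveniences, not part of PyMongo: flatten=False stops after stage 2 and returns the grouped {"_id": domain, "data": [...]} form, and sort optionally prepends a $sort stage. Without a $sort before $group, which documents count as "first" depends on the order in which they reach the $group stage, which is not guaranteed.

```python
def build_pipeline(n=2, flatten=True, sort=None):
    """Build an aggregation pipeline keeping at most n documents per domain.

    sort: optional dict such as {"_id": 1} (hypothetical parameter), placed
    before $group so that "first" is well defined.
    """
    pipeline = []
    if sort:
        pipeline.append({"$sort": sort})
    pipeline += [
        # 1. One document per domain; every matching document is pushed into "data".
        {"$group": {"_id": "$domain", "data": {"$push": "$$ROOT"}}},
        # 2. Keep only the first n elements of each "data" array.
        {"$addFields": {"data": {"$slice": ["$data", 0, n]}}},
    ]
    if flatten:
        pipeline += [
            # 3. One output document per remaining array element.
            {"$unwind": "$data"},
            # 4. Promote the original document back to the top level.
            {"$replaceRoot": {"newRoot": "$data"}},
        ]
    return pipeline
```

With PyMongo, the pipeline would then be passed to Collection.aggregate, for example list(client["test"]["collection"].aggregate(build_pipeline(2))), where client is a pymongo.MongoClient and "test"/"collection" are placeholder names.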

