When using MongoDB, simple queries cover most business scenarios. But suppose you want to count how many times a particular item appears in the data matched by a query. The naive approach is to fetch the data with find and then call count() on it. That works for a plain count, but what if you need to sum one of the fields? Would you have to traverse all the matching data and add the values up yourself?
In MySQL we often use count, group by, and similar queries; in MongoDB we can do the same with aggregate queries.

Imagine a collection of price data:
[Figure: price]

This collection records the price of each fruit, and I want to count how many times each fruit appears in it. Without an aggregate query, the idea would be: fetch all the data from the collection, initialize a dictionary, then traverse each row, read its fName, and update the count in the dictionary. The time complexity of this method is O(N), and every document has to be pulled out of the database first, which is not great when the amount of data is large. Let's look at how an aggregate query handles it.
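The dictionary-based approach described above can be sketched as follows. The sample rows are hypothetical stand-ins for the documents a find call would return, and the helper name count_by_name is made up for illustration.

```python
from collections import Counter

# Hypothetical sample rows standing in for documents returned by find()
rows = [
    {"fName": "apple", "price": 74},
    {"fName": "pear", "price": 77},
    {"fName": "apple", "price": 52},
    {"fName": "banana", "price": 66},
]

def count_by_name(documents):
    """Count how often each fName occurs by walking every document once."""
    counts = Counter()
    for doc in documents:
        counts[doc["fName"]] += 1
    return dict(counts)

fruit_counts = count_by_name(rows)
print(fruit_counts)
```

Every document has to reach the client before the loop can run, which is the cost the aggregate query avoids.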

Aggregate queries use the aggregate function, whose parameter is a pipeline. In a pipeline, the output of the current stage becomes the input of the next one, and the stages run in order: data that does not pass an earlier stage never reaches the later ones, so you must pay attention to the order of the stages. For the problem above we want statistics over all the data, so no $match is needed.

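from pymongo import MongoClient

# mongoDBhost, mongoDBport and mongoDBname are your own connection settings
client = MongoClient(host=['%s:%s' % (mongoDBhost, mongoDBport)])
G_mongo = client[mongoDBname]['FruitPrice']

# Group by fName and add 1 for every document in each group
pipeline = [
    {'$group': {'_id': "$fName", 'count': {'$sum': 1}}},
]
for i in G_mongo['test'].aggregate(pipeline):
    print(i)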

 

You can construct your own test data; the point here is the usage of aggregate.
The result is:

{u'count': 8, u'_id': u'banana'}
{u'count': 9, u'_id': u'pear'}
{u'count': 14, u'_id': u'apple'}

 

As you can see, the statistics are obtained in a single step.

What if you only want statistics for the records with a price of 50 or more?
Then add a $match stage to the pipeline before $group:

pipeline = [
    {'$match': {'price': {'$gte': 50}}},
    {'$group': {'_id': "$fName", 'count': {'$sum': 1}}},
]

 

The order of the stages matters: $match has to come before $group here.
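To see why the order matters, here is a minimal pure-Python sketch that imitates the two stages on a hypothetical list of documents. The helper names match_stage and group_count_stage and the sample data are made up for illustration; this models the data flow only and is not how MongoDB implements the stages.

```python
from collections import Counter

# Hypothetical sample documents (not the author's data)
docs = [
    {"fName": "apple", "price": 74},
    {"fName": "apple", "price": 30},
    {"fName": "pear", "price": 77},
    {"fName": "banana", "price": 45},
]

def match_stage(documents, min_price):
    """Imitates {'$match': {'price': {'$gte': min_price}}}."""
    return [d for d in documents if d.get("price", float("-inf")) >= min_price]

def group_count_stage(documents):
    """Imitates {'$group': {'_id': '$fName', 'count': {'$sum': 1}}}."""
    counts = Counter(d["fName"] for d in documents)
    return [{"_id": name, "count": n} for name, n in counts.items()]

# $match first: cheap documents are dropped before counting.
filtered_then_grouped = group_count_stage(match_stage(docs, 50))

# $group first: the grouped rows only carry '_id' and 'count', so a later
# filter on 'price' finds no field to compare and lets nothing through.
grouped_then_filtered = match_stage(group_count_stage(docs), 50)

print(filtered_then_grouped)
print(grouped_then_filtered)
```

In this sketch the second ordering returns an empty list, because the price field no longer exists once the documents have been grouped.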

In fact, $match takes the same conditions as the find function.
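Because the condition syntax is shared, one filter dictionary can be handed to find or placed inside a $match stage. The sketch below only builds the arguments; the helper name price_filter is hypothetical, and the commented-out calls assume a pymongo collection object named collection.

```python
def price_filter(min_price):
    """Build a query condition: price greater than or equal to min_price."""
    return {"price": {"$gte": min_price}}

condition = price_filter(50)

# The same dictionary can serve as a find() filter ...
find_filter = condition

# ... and as the body of a $match stage in an aggregation pipeline.
pipeline = [
    {"$match": condition},
    {"$group": {"_id": "$fName", "count": {"$sum": 1}}},
]

# With a real pymongo collection (assumed name: collection):
# cursor = collection.find(find_filter)
# results = collection.aggregate(pipeline)
print(pipeline)
```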

Now for the key part, the $group stage. Grouping means dividing the data into groups according to certain fields. The example above used {'$group': {'_id': "$fName", 'count': {'$sum': 1}}}. _id specifies what to group by, here the fName field. In 'count': {'$sum': 1}, $sum means summation and the value 1 means "add 1 for each occurrence", which gives us a count. If you want the sum of the price field instead, write it like this:

{'$group': {'_id': "$fName", 'count': {'$sum': '$price'}}}

 

Note that the field name here is prefixed with $. What about the average price? You could compute the total price first and then divide by the number of items, but there is an $avg operator for exactly this:

pipeline = [
    {'$match': {'price': {'$gte': 50}}},
    {'$group': {'_id': "$fName", 'avg': {'$avg': '$price'}}},
]

 

The results obtained

{u'_id': u'banana', u'avg': 66.200000000000003}
{u'_id': u'pear', u'avg': 77.0}
{u'_id': u'apple', u'avg': 74.0}

 

There are many operators similar to $avg; the more commonly used ones are $min (minimum value) and $max (maximum value).

pipeline = [
    {'$match': {'price': {'$gte': 50}}},
    {'$group': {'_id': "$fName",
                'count': {'$sum': 1},
                'priceAll': {'$sum': '$price'},
                'avg': {'$avg': '$price'},
                'min': {'$min': '$price'},
                'max': {'$max': '$price'}
                }
     },
]
for i in G_mongo['test'].aggregate(pipeline):
    print(i)

For all supported operators, refer to the official documentation: Group support operations

When grouping, the field to group by must be given as _id.
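A related case worth knowing: when _id is a constant such as None (null in MongoDB), all input documents fall into a single group, which gives overall statistics instead of per-fruit ones. The sketch below only builds the pipeline; the commented-out call assumes the G_mongo['test'] collection used in the earlier examples.

```python
# Overall statistics: a constant _id puts every document into one group.
overall_pipeline = [
    {"$match": {"price": {"$gte": 50}}},
    {"$group": {
        "_id": None,                      # no grouping field: a single group
        "count": {"$sum": 1},             # number of matching documents
        "priceAll": {"$sum": "$price"},   # total of the price field
    }},
]

# With the collection from the examples above (assumed to exist):
# for row in G_mongo['test'].aggregate(overall_pipeline):
#     print(row)
group_spec = overall_pipeline[1]["$group"]
print(group_spec)
```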

Next, let's look at grouping by multiple keys.
The group queries above used a single field as _id. Now suppose the database holds the following data, which includes user information:
[Figure: With user data]

There is an additional user field. What if I want to group by both user and fName?
In that case you can pass a dictionary as _id:

pipeline = [
    {'$match': {'price': {'$gte': 50}}},
    {'$group': {'_id': {'fName': '$fName', 'user': '$user'},
                'count': {'$sum': 1},
                'priceAll': {'$sum': '$price'},
                'avg': {'$avg': '$price'},
                'min': {'$min': '$price'},
                'max': {'$max': '$price'}
                }
     },
]
for i in G_mongo['test2'].aggregate(pipeline):
    print(i)

 

The results obtained were as follows:

{u'count': 1, u'avg': 93.0, u'min': 93, u'max': 93, u'_id': {u'user': u'fanjieying', u'fName': u'pear'}, u'priceAll': 93}
{u'count': 2, u'avg': 88.0, u'min': 87, u'max': 89, u'_id': {u'user': u'yangyanxing', u'fName': u'banana'}, u'priceAll': 176}
{u'count': 2, u'avg': 70.0, u'min': 69, u'max': 71, u'_id': {u'user': u'yangyanxing', u'fName': u'pear'}, u'priceAll': 140}
{u'count': 2, u'avg': 65.5, u'min': 58, u'max': 73, u'_id': {u'user': u'fanjieying', u'fName': u'banana'}, u'priceAll': 131}
{u'count': 3, u'avg': 92.333333333333329, u'min': 86, u'max': 97, u'_id': {u'user': u'fantuan', u'fName': u'banana'}, u'priceAll': 277}
{u'count': 2, u'avg': 78.5, u'min': 73, u'max': 84, u'_id': {u'user': u'yangyanxing', u'fName': u'apple'}, u'priceAll': 157}
{u'count': 3, u'avg': 56.666666666666664, u'min': 51, u'max': 60, u'_id': {u'user': u'fantuan', u'fName': u'pear'}, u'priceAll': 170}
{u'count': 2, u'avg': 81.5, u'min': 73, u'max': 90, u'_id': {u'user': u'fanjieying', u'fName': u'apple'}, u'priceAll': 163}
{u'count': 2, u'avg': 69.5, u'min': 53, u'max': 86, u'_id': {u'user': u'fantuan', u'fName': u'apple'}, u'priceAll': 139}

 

These results show, in one pass, which goods each user bought, how much they spent in total, and the average, maximum, and minimum prices. Traversing the data ourselves with for loops would be far more cumbersome and would mean pulling every document to the client first.
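Since each row carries a dictionary as its _id, the output is easy to reshape on the client. As a small sketch, the code below takes three of the result rows shown above (copied as plain dictionaries, with only the fields it needs) and nests them per user; the helper name spending_by_user is hypothetical.

```python
from collections import defaultdict

# Three of the result rows shown above, reduced to the fields used here
results = [
    {"_id": {"user": "fanjieying", "fName": "pear"}, "count": 1, "priceAll": 93},
    {"_id": {"user": "yangyanxing", "fName": "banana"}, "count": 2, "priceAll": 176},
    {"_id": {"user": "yangyanxing", "fName": "pear"}, "count": 2, "priceAll": 140},
]

def spending_by_user(rows):
    """Nest the grouped rows as {user: {fName: priceAll}}."""
    nested = defaultdict(dict)
    for row in rows:
        key = row["_id"]
        nested[key["user"]][key["fName"]] = row["priceAll"]
    return dict(nested)

per_user = spending_by_user(results)
# Total spent by each user across the rows included here
totals = {user: sum(fruits.values()) for user, fruits in per_user.items()}
print(per_user)
print(totals)
```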

This post only briefly covers the usage of $group and $match. Aggregate queries support many other operations (called stages), which can be found in the official documentation:
Stages in pymongo pipelines
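As one example of combining further stages, the sketch below appends $sort and $limit to the counting pipeline so that only the most frequent fruits are returned. Both are standard MongoDB stages, but the limit of 2 and the choice to sort by count are assumptions made for illustration, and the commented-out call assumes the G_mongo['test'] collection from the examples above.

```python
# Count fruits priced at 50 or more, then keep the two most frequent ones.
pipeline = [
    {"$match": {"price": {"$gte": 50}}},
    {"$group": {"_id": "$fName", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},   # -1 sorts in descending order
    {"$limit": 2},              # keep only the first two rows
]

# Each stage is a dictionary with a single key naming the stage.
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)

# With the collection from the examples above (assumed to exist):
# for row in G_mongo['test'].aggregate(pipeline):
#     print(row)
```

The stages still run strictly in order, so $sort has to follow $group: the count field it sorts on does not exist before grouping.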

Reference article:
Group by methods in pymongo