When working with MongoDB, simple queries cover most business scenarios. But suppose you want to count how many times a particular item appears in the data matched by a query. The naive approach is to pull the data out with find() and then call count() on it. That works for a plain count, but what if you need the sum of one of the fields? You would have to iterate over the matching documents and add the values up yourself.
In MySQL we often use count, group by, and similar queries. MongoDB offers the same capability through aggregate queries.
Suppose a collection records the price of each fruit, and I want to count how many times each fruit appears in it. Without an aggregate query, the approach would be: fetch all the data from the collection, initialize a dictionary, iterate over every row, read its fName, and update the count in the dictionary. This is O(N) work on the client, and every document has to be transferred first, so it does not scale well to large data sets. Let's look at how an aggregate query handles the same task.
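The manual approach described above can be sketched as follows. The sample rows are made up for illustration and stand in for whatever find() would return; only the field names fName and price come from the article.

```python
# Hypothetical sample rows standing in for the documents returned by find().
rows = [
    {"fName": "apple", "price": 30},
    {"fName": "banana", "price": 66},
    {"fName": "apple", "price": 55},
    {"fName": "pear", "price": 93},
    {"fName": "banana", "price": 70},
]


def count_by_name(documents):
    """Count how many times each fName appears, visiting every document once."""
    counts = {}
    for doc in documents:
        name = doc["fName"]
        counts[name] = counts.get(name, 0) + 1
    return counts


counts = count_by_name(rows)
print(counts)
```

With a real collection, every document would have to be fetched before this loop could even start, which is the cost the aggregate query avoids.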
Aggregate queries use the aggregate function, whose parameter is a pipeline. In a pipeline, the output of the current stage becomes the input of the next one, and the stages run in order. Data that is filtered out by an earlier stage never reaches the later stages, so you must pay attention to the order of the pipeline operations. For the problem above we want statistics over the whole collection, so no $match stage is needed.
from pymongo import MongoClient
You can construct your own data; the point here is mainly to see how aggregate is used.
The result is:
{u'count': 8, u'_id': u'banana'}
As you can see, the corresponding statistics come back in a single step.
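For reference, here is a minimal sketch of what such a counting query might look like end to end. The connection URI and the database and collection names (test, fruit) are placeholders, not values from the article, and the helper names are hypothetical.

```python
# Group every document by fName and add 1 per document to get a count.
pipeline = [
    {"$group": {"_id": "$fName", "count": {"$sum": 1}}},
]


def open_collection(uri="mongodb://localhost:27017", db="test", name="fruit"):
    """Return a pymongo collection. The URI, db and name are placeholders."""
    # Imported here so the pipeline can be inspected without pymongo installed.
    from pymongo import MongoClient
    return MongoClient(uri)[db][name]


def count_by_name(collection):
    """Run the pipeline and return a {fName: count} dictionary."""
    return {doc["_id"]: doc["count"] for doc in collection.aggregate(pipeline)}
```

Against a running server, count_by_name(open_collection()) would return one entry per distinct fName.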
What if you want the same statistics only for items priced above 50? Then the pipeline needs a $match stage added before $group.
pipeline = [
Pay attention to the order of the stages. The conditions accepted by $match are in fact the same as those accepted by the find function.
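A pipeline of this shape could look like the sketch below. It assumes that "more than 50" maps to the $gt operator on the price field; build_pipeline and its min_price parameter are hypothetical names introduced for illustration.

```python
def build_pipeline(min_price=50):
    """Filter first, then group: only documents with price > min_price are counted."""
    return [
        {"$match": {"price": {"$gt": min_price}}},
        {"$group": {"_id": "$fName", "count": {"$sum": 1}}},
    ]


pipeline = build_pipeline()

# The $match condition is an ordinary query document, so the same dictionary
# could also be passed to find(), e.g. collection.find(match_filter).
match_filter = pipeline[0]["$match"]
print(match_filter)
```

If the two stages were swapped, $match would run on the grouped output, which no longer has a price field, so nothing would match.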
The following focuses on the $group stage. Grouping means partitioning the documents according to one or more fields. The example above uses {'$group': {'_id': "$fName", 'count': {'$sum': 1}}}. Here _id specifies what to group by, in this case the fName field. In 'count': {'$sum': 1}, $sum means summation and the value 1 means that each occurrence adds 1, which gives us the count. If you want the sum of the price field instead, write it like this:
{'$group': {'_id': "$fName", 'count': {'$sum': '$price'}}}
Note that the field name here must be prefixed with $. What about the average price? You could compute the total price first and then divide by the number of items, but there is an $avg operator that does this directly:
pipeline = [
The result is:
{u'_id': u'banana', u'avg': 66.200000000000003}
There are many operators similar to $avg. The more commonly used ones are $min (the minimum value) and $max (the maximum value).
pipeline = [
All supported operators are listed in the official documentation: Group support operations
When grouping, the field to group by must be specified through _id.
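Putting the accumulators discussed so far into a single $group stage might look like this. It is only a sketch: the output field names (count, priceAll, avg, min, max) are free choices, and summarize is a hypothetical helper that expects a pymongo collection.

```python
pipeline = [
    {"$group": {
        "_id": "$fName",                 # group key: one output document per fruit
        "count": {"$sum": 1},            # number of documents in the group
        "priceAll": {"$sum": "$price"},  # total of the price field
        "avg": {"$avg": "$price"},       # average price
        "min": {"$min": "$price"},       # lowest price
        "max": {"$max": "$price"},       # highest price
    }},
]


def summarize(collection):
    """Run the pipeline on a pymongo collection and return the result documents."""
    return list(collection.aggregate(pipeline))
```

Every accumulator is evaluated once per group, so a single pass over the collection produces all five statistics.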
Next, let's look at grouping by multiple keys.
In the $group examples above, the _id used for grouping was a single field. Now suppose the documents in the database also carry a user field, and I want to group by both user and fName. How does that work?
Here you can pass a dictionary as _id:
pipeline = [
The results are as follows:
{u'count': 1, u'avg': 93.0, u'min': 93, u'max': 93, u'_id': {u'user': u'fanjieying', u'fName': u'pear'}, u'priceAll': 93}
The results show which goods each user bought and how much they spent in total, together with the average, maximum, and minimum price, all in one query. Doing the same with a hand-written for loop would take far more code and would be much slower.
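A multi-key pipeline that yields documents of this shape might look like the sketch below. The accumulator names are inferred from the keys in the result shown above, and total_spent is a hypothetical helper added for illustration.

```python
pipeline = [
    {"$group": {
        # A dictionary as _id groups by the combination of both fields.
        "_id": {"user": "$user", "fName": "$fName"},
        "count": {"$sum": 1},
        "priceAll": {"$sum": "$price"},
        "avg": {"$avg": "$price"},
        "min": {"$min": "$price"},
        "max": {"$max": "$price"},
    }},
]


def total_spent(collection):
    """Map each (user, fName) pair to the total price, given a pymongo collection."""
    return {
        (doc["_id"]["user"], doc["_id"]["fName"]): doc["priceAll"]
        for doc in collection.aggregate(pipeline)
    }
```

Flattening the compound _id into a tuple makes the result easy to look up by user and fruit name on the client side.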
This post only briefly covers the usage of $group and $match. Aggregate queries support many other operations (called stages), which you can look up in the official documentation:
Pipeline stages in pymongo
Reference article:
group by methods in pymongo