In practical database work, we often need aggregate operations to process, summarize, and consolidate data.
In this article we will learn how to use aggregate operations in MongoDB.
1. Aggregate operators and the aggregation pipeline
The basic syntax for running an aggregation pipeline is as follows:
db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)
Common aggregate operators are as follows:
These operators process data during aggregation, for example by summing or averaging, and return the computed result.
Operator | Description |
---|---|
$sum | Computes the sum |
$avg | Computes the average |
$min | Finds the minimum |
$max | Finds the maximum |
$first | Returns the value from the first document in each group |
$last | Returns the value from the last document in each group |
$push | Appends a value to an array |
Common pipeline stages are as follows:
The result of one pipeline stage is passed on as the input of the next stage.
Stage | Description |
---|---|
$group | Groups documents |
$project | Modifies the document structure; it can rename, add, or remove fields |
$match | Filters out documents that do not meet the criteria |
$sort | Sorts the documents before they are output |
$limit | Limits the number of documents passed on |
$skip | Skips a specified number of documents |
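The way stages feed one another can be modelled in plain JavaScript. The sketch below is only an illustration of the idea: the helpers `match`, `groupAvg`, `sortBy`, `limit`, and `runPipeline` are hypothetical stand-ins written for this example, not MongoDB APIs, and the sample documents are made up.

```javascript
// Plain-JavaScript model of a pipeline: each stage takes an array of
// documents and returns a new array. None of these helpers are MongoDB APIs.

// Models $match: keep only the documents that satisfy the predicate.
const match = (predicate) => (docs) => docs.filter(predicate);

// Models $group with a single $avg accumulator.
const groupAvg = (keyField, valueField) => (docs) => {
  const buckets = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(doc[valueField]);
  }
  return [...buckets].map(([key, values]) => ({
    _id: key,
    avg: values.reduce((a, b) => a + b, 0) / values.length,
  }));
};

// Models $sort on one numeric field (1 = ascending, -1 = descending).
const sortBy = (field, direction) => (docs) =>
  [...docs].sort((a, b) => direction * (a[field] - b[field]));

// Models $limit: pass on at most n documents.
const limit = (n) => (docs) => docs.slice(0, n);

// Runs the stages in order, feeding each stage's output to the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical sample documents.
const sample = [
  { department: 'Computer', salary: 10000 },
  { department: 'Computer', salary: 15000 },
  { department: 'Software', salary: 12000 },
  { department: 'Architecture', salary: 8000 },
];

const result = runPipeline(sample, [
  match((doc) => doc.salary >= 10000),
  groupAvg('department', 'salary'),
  sortBy('avg', -1),
  limit(1),
]);

console.log(result); // [ { _id: 'Computer', avg: 12500 } ]
```

Changing the order of the stages changes the result, which is why `$match` is usually placed as early as possible.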
Now let's experiment a bit. First, prepare some test data:
> use university
> db.teacher.insert([
  {
    'tid': '19001',
    'name': 'Alice',
    'age': 32,
    'department': 'Computer',
    'salary': 10000
  },
  {
    'tid': '19002',
    'name': 'Bob',
    'age': 48,
    'department': 'Computer',
    'salary': 15000
  },
  {
    'tid': '19003',
    'name': 'Alice',
    'age': 42,
    'department': 'Software',
    'salary': 12000
  },
  {
    'tid': '19004',
    'name': 'Christy',
    'age': 38,
    'department': 'Software',
    'salary': 14000
  },
  {
    'tid': '19005',
    'name': 'Daniel',
    'age': 28,
    'department': 'Architecture',
    'salary': 8000
  }
])
Compute the total salary of all teachers:
db.teacher.aggregate([
  {
    $group: {
      _id: null, // no grouping: all documents form a single group
      total_salary: { $sum: '$salary' } // sum the values of the salary field
    }
  },
  {
    $project: {
      _id: 0, // do not output the _id field
      total_salary: 1 // output the total_salary field
    }
  }
])
// Result
// { "total_salary" : 59000 }
Count the teachers whose salary is greater than 10,000:
db.teacher.aggregate([
  {
    $match: {
      salary: { $gt: 10000 } // keep documents whose salary is greater than 10000
    }
  },
  {
    $group: {
      _id: null, // no grouping: all documents form a single group
      total_teacher: { $sum: 1 } // add 1 for every document, i.e. count them
    }
  },
  {
    $project: {
      _id: 0, // do not output the _id field
      total_teacher: 1 // output the total_teacher field
    }
  }
])
// Result
// { "total_teacher" : 3 }
Compute the average salary of each department, and output the results in ascending order of average salary:
db.teacher.aggregate([
  {
    $group: {
      _id: '$department', // group by the value of the department field
      avg_salary: { $avg: '$salary' } // average the values of the salary field
    }
  },
  {
    $project: {
      _id: 0, // do not output the _id field
      dept_name: '$_id', // add a dept_name field that takes the value of _id
      avg_salary: 1 // output the avg_salary field
    }
  },
  {
    $sort: {
      avg_salary: 1 // sort by avg_salary in ascending order
    }
  }
])
// Result
// { "avg_salary" : 8000, "dept_name" : "Architecture" }
// { "avg_salary" : 12500, "dept_name" : "Computer" }
// { "avg_salary" : 13000, "dept_name" : "Software" }
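The other operators from the first table are used inside `$group` in the same way. The following is a minimal sketch, assuming the teacher collection created earlier; the output field names `min_salary`, `max_salary`, and `names` are illustrative choices, not anything required by MongoDB.

```javascript
// Group teachers by department and compute several statistics in one stage.
const pipeline = [
  {
    $group: {
      _id: '$department',              // group by department
      min_salary: { $min: '$salary' }, // lowest salary in the group
      max_salary: { $max: '$salary' }, // highest salary in the group
      names: { $push: '$name' },       // array of all names in the group
    },
  },
  { $sort: { _id: 1 } },               // order the groups by department name
];

// In the mongo shell, where `db` is defined, this would be run as:
// db.teacher.aggregate(pipeline)
```

The order of the names collected by `$push` follows the order in which documents reach `$group`, so add a `$sort` stage before it if a particular order matters.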
Output the IDs of the three teachers with the highest salaries:
db.teacher.aggregate([
  {
    $sort: {
      salary: -1 // sort by salary in descending order
    }
  },
  {
    $limit: 3 // pass on only the first 3 documents
  },
  {
    $project: {
      _id: 0, // do not output the _id field
      tid: 1 // output the tid field
    }
  }
])
// Result
// { "tid" : "19002" }
// { "tid" : "19004" }
// { "tid" : "19003" }
2. Map-Reduce
Besides the aggregation pipeline, MongoDB offers another, more flexible aggregation operation: Map-Reduce.
Map-Reduce is a computing model that breaks a large job into smaller pieces (map) and then merges the partial results (reduce) into the final result.
Its basic syntax is as follows:
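db.COLLECTION_NAME.mapReduce(
  function() { emit(key, value) }, // map function: emits key-value pairs, which are grouped by key and passed to reduce
  function(key, values) { return reduceFunction }, // reduce function: combines the values of one key into a single value
  {
    query: <query>, // filter: only matching documents are passed to the map function
    sort: <document>, // sorts the documents before they are passed to the map function
    limit: <number>, // limits the number of documents passed to the map function
    finalize: <function>, // post-processes each result before it is stored
    out: <collection> // where to store the results: a collection name, or { inline: 1 } to return them inline
  }
)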
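The flow behind this syntax can be sketched in plain JavaScript. This is a simplified in-memory model under stated assumptions, not how MongoDB implements it: `simulateMapReduce` is a hypothetical helper, `emit` is passed to the map function as an argument instead of being a global, and `query` is a predicate function rather than a query document. Like MongoDB, the model skips reduce for keys with a single value. Unlike this model, MongoDB may call reduce more than once for the same key.

```javascript
// Hypothetical in-memory model of the map-reduce flow; not a MongoDB API.
function simulateMapReduce(docs, map, reduce, options = {}) {
  const {
    query = () => true,               // keep every document by default
    finalize = (key, value) => value, // leave values unchanged by default
  } = options;
  const groups = new Map();

  // emit stores each value under its key, building (key, values) groups.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Step 1: filter. Step 2: map each document, with `this` bound to it.
  for (const doc of docs.filter(query)) {
    map.call(doc, emit);
  }

  // Step 3: reduce each group. Step 4: finalize. Step 5: collect the output.
  const output = [];
  for (const [key, values] of groups) {
    // Keys with a single value are not reduced.
    const reduced = values.length === 1 ? values[0] : reduce(key, values);
    output.push({ _id: key, value: finalize(key, reduced) });
  }
  return output;
}

// The same teacher data as in the article, reduced to the fields used here.
const teachers = [
  { department: 'Computer', salary: 10000 },
  { department: 'Computer', salary: 15000 },
  { department: 'Software', salary: 12000 },
  { department: 'Software', salary: 14000 },
  { department: 'Architecture', salary: 8000 },
];

// Total salary per department.
const totals = simulateMapReduce(
  teachers,
  function (emit) { emit(this.department, this.salary); },
  (key, values) => values.reduce((a, b) => a + b, 0)
);

console.log(totals);
// [ { _id: 'Computer', value: 25000 },
//   { _id: 'Software', value: 26000 },
//   { _id: 'Architecture', value: 8000 } ]
```

Because reduce can run repeatedly on partial results in a real deployment, a reduce function should return a value of the same shape as the values it receives.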
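Let's look at an example.
Find the departments whose teachers over the age of 30 have an average salary above 10,000, without outputting the salary figures. Note that the code below only computes and then hides the averages; it does not filter on the 10,000 threshold, which both resulting departments happen to exceed.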
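db.teacher.mapReduce(
  // 2. Run the map function. Its core job is to call emit, which supplies the input for reduce.
  // The first argument of emit is the grouping key; the second is the value to aggregate.
  // Here documents are grouped by department (the key), and the salary values of each group are collected into an array (the values).
  // Each group's (key, values) pair is then passed to the reduce function.
  function() { emit(this.department, this.salary) },
  // 3. Run the reduce function. Its core job is to turn (key, values) into (key, value).
  // Its arguments (key, values) come from the map step, and it returns a single processed value.
  // That value is paired with the key as (key, value) and passed on.
  // Here it returns the average of values, computed with Array.avg.
  function(key, values) { return Array.avg(values) },
  {
    // 1. First run query to filter out non-matching documents; the matching ones are sent to the map function.
    query: { age: { $gt: 30 } },
    // 4. Run the finalize function to process each result before it is stored in the out collection.
    // Its arguments (key, value) come from the reduce step, and it returns the final value.
    // Here the average salary is hidden by setting value to null.
    finalize: function(key, value) {
      return null
    },
    // 5. Store the final results in the total_teacher collection.
    out: 'total_teacher'
  }
)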
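You will see output like the following:
{
  "result" : "total_teacher", // name of the collection that stores the results
  "timeMillis" : 276, // time taken, in milliseconds
  "counts" : {
    "input" : 4, // number of documents sent to the map function after filtering
    "emit" : 4, // number of key-value pairs emitted by the map function
    "reduce" : 2, // number of calls to the reduce function
    "output" : 2 // number of documents in the result collection
  },
  "ok" : 1
}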
Then view the results:
> show collections
// teacher
// total_teacher
> db.total_teacher.find()
// { "_id" : "Computer", "value" : null }
// { "_id" : "Software", "value" : null }
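The same question can also be answered with an aggregation pipeline, which is the recommended approach now that map-reduce is deprecated (as of MongoDB 5.0). The following is a minimal sketch, assuming the teacher collection created earlier; the field name `avg_salary` is an illustrative choice. Unlike the map-reduce version, this sketch filters explicitly on the 10,000 threshold.

```javascript
// Departments whose teachers over 30 earn more than 10000 on average,
// without exposing the salary figures.
const pipeline = [
  { $match: { age: { $gt: 30 } } },                  // keep teachers older than 30
  { $group: {
      _id: '$department',                            // group by department
      avg_salary: { $avg: '$salary' },               // average salary per group
  } },
  { $match: { avg_salary: { $gt: 10000 } } },        // keep averages above 10000
  { $project: { _id: 1 } },                          // output only the department
];

// In the mongo shell, where `db` is defined, this would be run as:
// db.teacher.aggregate(pipeline)
```

To store the results in a collection instead of returning them, an `$out` stage naming the target collection can be appended as the last stage.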