MongoDB study notes (c) Aggregation

In real-world database applications, we often need aggregation operations to process, summarize, and consolidate data

In this article we will learn how to use aggregation operations in MongoDB

1. The aggregate function and the aggregation pipeline

The basic syntax of the aggregate function with an aggregation pipeline is as follows:

db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

Common aggregation operators are as follows:

Aggregation operators process the data of each group, for example by summing or averaging, and return the final computed result

Operator    Description
$sum        Computes the sum
$avg        Computes the average
$min        Finds the minimum
$max        Finds the maximum
$first      Takes the value from the first document of the group
$last       Takes the value from the last document of the group
$push       Appends a value to an array

Common aggregation pipeline stages are as follows:

In an aggregation pipeline, the result of one stage is passed to the next stage for further processing

Operator    Description
$group      Groups documents
$project    Reshapes documents: fields can be renamed, added, or removed
$match      Filters out documents that do not meet the criteria
$sort       Sorts the documents before output
$limit      Limits the number of documents that are read
$skip       Skips a given number of documents
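
To make the idea of stages feeding one another concrete, here is a minimal sketch in plain JavaScript (no MongoDB involved). The helper names match, sortBy, skip, limit, and runPipeline, as well as the sample array, are assumptions made up for this illustration; they only imitate what the corresponding stages do

```javascript
// Sample documents, assumed for illustration only.
const teachers = [
  { tid: '19001', name: 'Alice', salary: 10000 },
  { tid: '19002', name: 'Bob', salary: 15000 },
  { tid: '19003', name: 'Alice', salary: 12000 },
  { tid: '19004', name: 'Christy', salary: 14000 },
  { tid: '19005', name: 'Daniel', salary: 8000 },
];

// Hypothetical stage builders: each returns a function from documents to documents.
const match = (predicate) => (docs) => docs.filter(predicate);
const sortBy = (field, direction) => (docs) =>
  [...docs].sort((a, b) => direction * (a[field] - b[field]));
const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);

// Run the stages in order, feeding the output of each stage into the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const result = runPipeline(teachers, [
  match((doc) => doc.salary > 10000), // like { $match: { salary: { $gt: 10000 } } }
  sortBy('salary', -1),               // like { $sort: { salary: -1 } }
  skip(1),                            // like { $skip: 1 }
  limit(2),                           // like { $limit: 2 }
]);

console.log(result.map((doc) => doc.tid)); // [ '19004', '19003' ]
```

The order of the stages matters: placing the filter first means the later stages only see the documents that passed it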

Now let's experiment a bit; first, prepare some test data

> use university
> db.teacher.insert([
    {
        'tid': '19001',
        'name': 'Alice',
        'age': 32,
        'department': 'Computer',
        'salary': 10000
    },
    {
        'tid': '19002',
        'name': 'Bob',
        'age': 48,
        'department': 'Computer',
        'salary': 15000
    },
    {
        'tid': '19003',
        'name': 'Alice',
        'age': 42,
        'department': 'Software',
        'salary': 12000
    },
    {
        'tid': '19004',
        'name': 'Christy',
        'age': 38,
        'department': 'Software',
        'salary': 14000
    },
    {
        'tid': '19005',
        'name': 'Daniel',
        'age': 28,
        'department': 'Architecture',
        'salary': 8000
    }
])

Compute the total salary of all teachers

db.teacher.aggregate([
    {
        $group: {
            _id: null, // no grouping: all documents form a single group
            total_salary: { $sum: '$salary' } // sum the values of the salary field
        }
    },
    {
        $project: {
            _id: 0, // do not output the _id field
            total_salary: 1 // output the total_salary field
        }
    }
])

// Query result
// { "total_salary" : 59000 }

Count the teachers whose salary is more than 10,000

db.teacher.aggregate([
    {
        $match: {
            salary: { $gt: 10000 } // keep documents whose salary is greater than 10000
        }
    },
    {
        $group: {
            _id: null, // no grouping: all documents form a single group
            total_teacher: { $sum: 1 } // add 1 for each document
        }
    },
    {
        $project: {
            _id: 0, // do not output the _id field
            total_teacher: 1 // output the total_teacher field
        }
    }
])

// Query result
// { "total_teacher" : 3 }

Compute the average salary of each department and output the results in ascending order of average salary

db.teacher.aggregate([
    {
        $group: {
            _id: '$department', // group by the value of the department field
            avg_salary: { $avg: '$salary' } // average the values of the salary field
        }
    },
    {
        $project: {
            _id: 0, // do not output the _id field
            dept_name: '$_id', // add a dept_name field whose value is taken from _id
            avg_salary: 1 // output the avg_salary field
        }
    },
    {
        $sort: {
            avg_salary: 1 // sort by avg_salary in ascending order
        }
    }
])

// Query result
// { "avg_salary" : 8000, "dept_name" : "Architecture" }
// { "avg_salary" : 12500, "dept_name" : "Computer" }
// { "avg_salary" : 13000, "dept_name" : "Software" }

Output the tid of the three teachers with the highest salaries

db.teacher.aggregate([
    {
        $sort: {
            salary: -1 // sort by salary in descending order
        }
    },
    {
        $limit: 3 // read only 3 documents
    },
    {
        $project: {
            _id: 0, // do not output the _id field
            tid: 1 // output the tid field
        }
    }
])

// Query result
// { "tid" : "19002" }
// { "tid" : "19004" }
// { "tid" : "19003" }
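
The examples above only use $sum and $avg. As a rough illustration of what the other operators from the table ($min, $max, $first, $last, $push) compute inside a $group stage, here is a plain JavaScript sketch that runs without MongoDB. The function groupByDepartment and the output field names are hypothetical and chosen only for this example

```javascript
// The relevant fields of the teacher test data.
const teachers = [
  { name: 'Alice', department: 'Computer', salary: 10000 },
  { name: 'Bob', department: 'Computer', salary: 15000 },
  { name: 'Alice', department: 'Software', salary: 12000 },
  { name: 'Christy', department: 'Software', salary: 14000 },
  { name: 'Daniel', department: 'Architecture', salary: 8000 },
];

// Roughly what this $group stage would compute:
// { $group: { _id: '$department',
//             min_salary: { $min: '$salary' }, max_salary: { $max: '$salary' },
//             first_name: { $first: '$name' }, last_name: { $last: '$name' },
//             names: { $push: '$name' } } }
// Hypothetical helper: folds the documents into one accumulator object per department.
function groupByDepartment(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const group = groups.get(doc.department);
    if (group === undefined) {
      // First document of a new group: initialize every accumulator from it.
      groups.set(doc.department, {
        _id: doc.department,
        min_salary: doc.salary,
        max_salary: doc.salary,
        first_name: doc.name,
        last_name: doc.name,
        names: [doc.name],
      });
    } else {
      group.min_salary = Math.min(group.min_salary, doc.salary);
      group.max_salary = Math.max(group.max_salary, doc.salary);
      group.last_name = doc.name; // the latest document seen so far
      group.names.push(doc.name);
    }
  }
  return [...groups.values()];
}

const groups = groupByDepartment(teachers);
console.log(groups);
```

Note that the results of $first and $last depend on the order in which documents arrive, so in a real pipeline they are usually meaningful only after a $sort stage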

2. Map Reduce

Besides the aggregate function and its pipeline, MongoDB offers another, more flexible aggregation mechanism: Map Reduce

Map Reduce is a computing model that breaks a large job into smaller pieces (map) and then merges the partial results (reduce) into the final result

Its basic syntax is as follows:

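db.COLLECTION_NAME.mapReduce(
    function() { emit(key, value) }, // map function: emits key-value pairs that become the input of the reduce function
    function(key, values) { return reducedValue }, // reduce function: combines the values of one key into a single value
    {
        query: <document>, // filter condition: only matching documents are passed to the map function
        sort: <document>, // sorts the documents before the map function is called
        limit: <number>, // limits the number of documents sent to the map function
        finalize: <function>, // modifies the data before it is stored in the result collection
        out: <collection> // where to store the results: a collection name, or { inline: 1 } to return them directly
    }
)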

Let's give an example

For teachers over the age of 30, compute the average salary of each department (in the test data both remaining departments average more than 10,000), but do not output the salary information

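db.teacher.mapReduce(
    // 2. Run the map function; its core job is to call emit, which supplies the arguments of the reduce function
    // The first argument of emit is the field to group by; the second is the field to aggregate
    // Here documents are grouped by the value of department (the key), and the salary values of each group are collected into an array (the values)
    // Each group's (key, values) pair is then passed to the reduce function
    function() { emit(this.department, this.salary) },
    // 3. Run the reduce function; its core job is to turn (key, values) into (key, value)
    // Its arguments (key, values) come from the map function, and it returns one processed value
    // That value is paired with key as (key, value) and passed on
    // Here it returns the average of values, computed with the avg function
    function(key, values) { return Array.avg(values) },
    {
        // 1. query runs first: it filters out non-matching documents and sends the rest to the map function
        query: { age: { $gt: 30 } },
        // 4. Run the finalize function, which processes each result before it is stored in the out collection
        // Its arguments (key, value) come from the reduce function, and it returns a processed value
        // Here the average salary is hidden by setting value to null
        finalize: function(key, value) {
            return null
        },
        // 5. Store the final results in the total_teacher collection
        out: 'total_teacher'
    }
)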

You can see the following output

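{
    "result" : "total_teacher", // name of the collection that stores the results
    "timeMillis" : 276, // time spent, in milliseconds
    "counts" : {
        "input" : 4, // number of documents sent to the map function after filtering
        "emit" : 4, // number of key-value pairs emitted by the map function
        "reduce" : 2, // number of times the reduce function was called
        "output" : 2 // number of documents in the result collection
    },
    "ok" : 1
}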

Then view the results

> show collections
// teacher
// total_teacher
> db.total_teacher.find()
// { "_id" : "Computer", "value" : null }
// { "_id" : "Software", "value" : null }
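
To make the five numbered steps of the example easier to follow, here is a minimal in-memory imitation in plain JavaScript. It is only a sketch: mapReduceSketch, average, and the shape of the options object are hypothetical names invented for this illustration, and the map function receives the document and emit as arguments instead of using this and a global emit as MongoDB does

```javascript
// The relevant fields of the teacher test data.
const teachers = [
  { name: 'Alice', age: 32, department: 'Computer', salary: 10000 },
  { name: 'Bob', age: 48, department: 'Computer', salary: 15000 },
  { name: 'Alice', age: 42, department: 'Software', salary: 12000 },
  { name: 'Christy', age: 38, department: 'Software', salary: 14000 },
  { name: 'Daniel', age: 28, department: 'Architecture', salary: 8000 },
];

// Hypothetical imitation of the five steps; this is not how MongoDB implements them.
function mapReduceSketch(docs, { query, map, reduce, finalize }) {
  // 1. query: keep only the matching documents.
  const selected = docs.filter(query);

  // 2. map: call map once per document and collect the emitted values by key.
  const valuesByKey = new Map();
  const emit = (key, value) => {
    if (!valuesByKey.has(key)) valuesByKey.set(key, []);
    valuesByKey.get(key).push(value);
  };
  for (const doc of selected) map(doc, emit);

  const output = [];
  for (const [key, values] of valuesByKey) {
    // 3. reduce: collapse the values of one key into a single value.
    //    A key with only one value skips reduce.
    const reduced = values.length > 1 ? reduce(key, values) : values[0];
    // 4. finalize: post-process the reduced value before it is stored.
    // 5. out: here the "collection" is simply the returned array.
    output.push({ _id: key, value: finalize(key, reduced) });
  }
  return output;
}

const average = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;

const options = {
  query: (doc) => doc.age > 30,
  map: (doc, emit) => emit(doc.department, doc.salary),
  reduce: (key, values) => average(values),
};

// With an identity finalize, the averages stay visible.
const visible = mapReduceSketch(teachers, { ...options, finalize: (key, value) => value });
// With a finalize that returns null, as in the example above, every value is hidden.
const hidden = mapReduceSketch(teachers, { ...options, finalize: () => null });

console.log(visible); // Computer: 12500, Software: 13000
console.log(hidden);  // both values are null
```

One design caveat: MongoDB may call the reduce function more than once for the same key, feeding earlier results back in, so averaging directly in reduce is only safe for small inputs like this one. A more robust approach would probably emit a sum and a count from map and compute the average in finalize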

[For more articles in this MongoDB series, see MongoDB study notes]

Origin www.cnblogs.com/wsmrzx/p/11583074.html